While the base containers do the grunt work of holding data for most programmers, there are times when something with a bit more functionality is required. Collections are built-in tools that provide specialized alternatives to the regular containers. Most of them are subclasses of, or wrappers around, existing containers. They add new features and more options, so a developer can spend less time writing boilerplate code and more time getting the work done.
Before we get into collections, we will take a little bit of time to review the existing containers so we know what is, and is not, provided with them. This will allow us to better understand the capabilities and potential limitations of collections.
Sequence types include lists, tuples, and ranges, though only lists and tuples are relevant here. Sequence types implement the __iter__ method by default, so they can naturally be iterated over to reach the objects they contain.
Lists are mutable sequences, that is, they can be modified in-place. They most commonly hold homogeneous items, but this is not a requirement. Lists are probably the most commonly used container in Python, as it is easy to add a new item to the end of a list by simply using <list>.append.
Tuples are immutable, meaning they cannot be modified in-place and a new tuple must be created if a modification is to occur. They frequently hold heterogeneous data, such as multiple return values from a function. Because they cannot be modified, they are also useful when you want to ensure that a sequence isn't changed by accident.
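The difference between the two sequence types can be sketched as follows. This is a minimal illustration only; the variable names (numbers, point, moved_point) are made up for the example.

```python
# A list can be changed in place; it remains the same object afterwards.
numbers = [1, 2, 3]
list_id_before = id(numbers)
numbers.append(4)
same_list = id(numbers) == list_id_before

# A tuple cannot be changed in place; item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 10
    tuple_error = None
except TypeError as exc:
    tuple_error = exc

# "Modifying" a tuple really means building a new tuple from the old one.
moved_point = (10,) + point[1:]

print(numbers)      # [1, 2, 3, 4]
print(same_list)    # True
print(point)        # (1, 2)
print(moved_point)  # (10, 2)
```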
Dictionaries map keys to values. They are known as hash tables, associative arrays, or by other names in different programming languages. Dictionaries are mutable, just like lists, so they can be changed in-place without having to create a new dictionary. A key feature of dictionaries is that keys must be hashable, that is, the hash value of the object cannot change during its lifetime. Thus, mutable objects, such as lists or other dictionaries, cannot be used as keys. However, they can be used as values mapped to the keys.
Sets are similar to dictionaries in that their elements must be hashable, but a set holds only values; no keys exist in a set, and its elements are unordered. Sets are used for testing membership, removing duplicates from sequences, and performing mathematical operations such as union, intersection, and difference.
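As a small sketch of the hashability rule, the following uses made-up sample data (the grid and routes dictionaries are purely illustrative). A tuple of integers works as a key, a list does not, and a list is perfectly acceptable as a value.

```python
# Tuples of hashable items are themselves hashable, so they work as keys.
grid = {(0, 0): "start"}
grid[(2, 3)] = "goal"

# A list is mutable and therefore unhashable; using it as a key fails.
try:
    grid[[1, 1]] = "wall"
    key_error = None
except TypeError as exc:
    key_error = exc

# Mutable objects are fine as values, and can be changed in place later.
routes = {"home": ["office", "gym"]}
routes["home"].append("store")

print(grid)    # {(0, 0): 'start', (2, 3): 'goal'}
print(routes)  # {'home': ['office', 'gym', 'store']}
```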
Sets are mutable objects, while frozensets are immutable. Since sets can be modified, they are not suitable for dictionary keys or as elements of another set. Frozensets, being unchanging, can be used as dictionary keys or as a set element.
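The set behavior described above might look like the following in practice. The data and names here are illustrative assumptions, not part of any particular program.

```python
letters = ["a", "b", "a", "c", "b"]

# Converting to a set removes duplicates (element order is not guaranteed).
unique = set(letters)
has_a = "a" in unique            # membership test

# Mathematical set operations.
vowels = {"a", "e", "i", "o", "u"}
common = unique & vowels         # intersection
combined = unique | vowels       # union
consonants = unique - vowels     # difference

# A mutable set is unhashable, so it cannot be an element of another set.
try:
    nested = {unique}
    set_error = None
except TypeError as exc:
    set_error = exc

# A frozenset is hashable, so it can be a dictionary key or a set element.
frozen = frozenset(unique)
labels = {frozen: "letters seen"}
nested = {frozen, frozenset(vowels)}

print(sorted(unique))      # ['a', 'b', 'c']
print(common)              # {'a'}
print(sorted(consonants))  # ['b', 'c']
```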
Sequence objects (lists and tuples) have the following common operations. Note: s and t are sequences of the same type; n, i, j, and k are integer values, and x is an object that meets the restrictions required by s:
- x in s: This returns True if an item in sequence s is equal to x; otherwise, it returns False
- x not in s: This returns True if no item in sequence s is equal to x; otherwise, it returns False
- s + t: This concatenates sequence s with sequence t (concatenating immutable sequences creates a new object)
- s * n: This concatenates n copies of s (items in the sequence are not copied, but referenced multiple times)
- s[i]: This retrieves the ith item in sequence s, with count starting from 0 (negative numbers start counting from the end of the sequence, rather than the beginning)
- s[i:j]: This retrieves a slice of s, from i (inclusive) to j (exclusive)
- s[i:j:k]: This retrieves a slice of s, from i (inclusive) to j (exclusive), taking every kth item
- len(s): This returns the length of s
- min(s): This returns the smallest item in s
- max(s): This returns the largest item in s
- s.index(x[, i[, j]]): This returns the index of the first instance of x in s; optionally, the search is limited to positions at or after index i and (optionally) before index j
- s.count(x): This returns the total count of x instances in s
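The common operations listed above can be tried out with a short sketch such as the following. It uses lists, but the same calls work on tuples; the sample values are arbitrary.

```python
s = [10, 20, 30, 20]
t = [40, 50]

print(20 in s)                 # True
print(99 not in s)             # True
print(s + t)                   # [10, 20, 30, 20, 40, 50]
print(t * 2)                   # [40, 50, 40, 50]
print(s[0], s[-1])             # 10 20
print(s[1:3])                  # [20, 30]
print(s[0:4:2])                # [10, 30]
print(len(s), min(s), max(s))  # 4 10 30
print(s.index(20))             # 1
print(s.index(20, 2))          # 3 (search starts at index 2)
print(s.count(20))             # 2

# Repetition references the same item n times rather than copying it,
# which matters when the item is itself mutable.
rows = [[0]] * 3
rows[0].append(1)
print(rows)                    # [[0, 1], [0, 1], [0, 1]]
```

The last example shows why s * n should be used with care on nested lists: all three entries in rows are the same inner list.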
Mutable sequence objects, such as lists, have the following specific operations available to them (note: s is a mutable sequence, t is an iterable object, i and j are integer values, and the x object meets any sequence restrictions).
- s[i] = x: This replaces the object at index position i with object x
- s[i:j] = t: The slice from i (inclusive) to j (exclusive) is replaced with the contents of object t
- del s[i:j]: This deletes the contents of s from indexes i to j
- s[i:j:k] = t: This replaces the items selected by the slice from i to j (stepping by k) with the items of t (t must have the same length as the slice it replaces)
- del s[i:j:k]: This deletes elements of the sequence, as determined by the slice indexes and stepping, if present
- s.append(x): This adds x to the end of s
- s.clear(): This deletes all elements from the sequence
- s.copy(): This creates a shallow copy of s
- s.extend(t): This extends s with the contents of t (can also use s += t)
- s *= n: This is used to update s with its contents repeated n times
- s.insert(i, x): This inserts x into s at position i
- s.pop([i]): This is used to extract an item at index i from s, returning it as a result and removing it from s (defaults to removing the last item from s)
- s.remove(x): This is used to delete the first item from s that matches x (raises a ValueError exception if x is not present)
- s.reverse(): This is used to reverse s in-place
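A quick walk through the mutable sequence operations above, applied one after another to the same list, might look like this. The comments show the state of s after each step; the starting values are arbitrary.

```python
s = [0, 1, 2, 3, 4, 5]

s[0] = 10                  # [10, 1, 2, 3, 4, 5]
s[1:3] = ["a", "b", "c"]   # [10, 'a', 'b', 'c', 3, 4, 5] (slice may change length)
del s[1:4]                 # [10, 3, 4, 5]
s[::2] = [-1, -2]          # [-1, 3, -2, 5] (extended slice: lengths must match)
del s[::2]                 # [3, 5]
s.append(6)                # [3, 5, 6]
s.extend([7, 8])           # [3, 5, 6, 7, 8]
s.insert(1, 4)             # [3, 4, 5, 6, 7, 8]
last = s.pop()             # returns 8; s is [3, 4, 5, 6, 7]
first = s.pop(0)           # returns 3; s is [4, 5, 6, 7]
s.remove(6)                # [4, 5, 7]
s.reverse()                # [7, 5, 4]
backup = s.copy()          # shallow copy: a new list with the same items
s *= 2                     # [7, 5, 4, 7, 5, 4]
s.clear()                  # []

print(last, first)         # 8 3
print(backup)              # [7, 5, 4]
print(s)                   # []
```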
Nearly every container in Python has special methods associated with it. While the methods described previously are universal for their respective containers, some containers have methods that apply just to them.
In addition to implementing all common and mutable sequence operations, lists also have the following special method available to them (tuples, being immutable, do not):
- sort(*, key=None, reverse=False): This is used to sort a list in-place, using the < comparator. Reverse comparison, that is, high-to-low, can be accomplished by using reverse=True. The optional key argument specifies a one-argument function that is called on each item to produce the value used for comparison.
As an example of how to use the key argument, assume you have a list of lists:
>>> l = [[3, 56], [2, 34], [6, 98], [1, 43]]
To sort this list, call the sort() method on the list, and then print the list. Because sort() works in-place and returns None, there is no single call that both sorts and displays the list, so the two steps have to be performed separately. This is actually a feature, as sorted lists are normally operated on programmatically, rather than always printed out:
>>> l.sort()
>>> l
[[1, 43], [2, 34], [3, 56], [6, 98]]
If you wanted a different sorting, such as sorting by the...