Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud .
In this second part of the chapter, we look closely at the built-in list
type, which is probably the most commonly used sequence data type in practice.
list
Type¶As already seen multiple times, to create a list
object, we use the literal notation and list all elements within brackets [
and ]
.
empty = []
simple = [40, 50]
The elements do not need to be of the same type, and list
objects may also be nested.
nested = [empty, 10, 20.0, "Thirty", simple]
PythonTutor shows how
nested
holds references to the empty
and simple
objects. Technically, it holds three more references to the 10
, 20.0
, and "Thirty"
objects as well. However, to simplify the visualization, these three objects are shown right inside the nested
object. That may be done because they are immutable and "flat" data types. In general, the 0s and 1s inside a list
object in memory always constitute references to other objects only.
nested
[[], 10, 20.0, 'Thirty', [40, 50]]
Let's not forget that nested
is an object on its own with an identity and data type.
id(nested)
139688113982848
type(nested)
list
Alternatively, we use the built-in list() constructor to create a
list
object out of any (finite) iterable we pass to it as the argument.
For example, we can wrap the range() built-in with list()
: As described in Chapter 4
,
range
objects, like range(1, 13)
below, are iterable and generate int
objects "on the fly" (i.e., one by one). The list() around it acts like a
for
-loop and materializes twelve int
objects in memory that then become the elements of the newly created list
object. PythonTutor shows this difference visually.
list(range(1, 13))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Beware of passing a range
object over a "big" horizon as the argument to list() as that may lead to a
MemoryError
and the computer crashing.
list(range(999_999_999_999))
--------------------------------------------------------------------------- MemoryError Traceback (most recent call last) Cell In[8], line 1 ----> 1 list(range(999_999_999_999)) MemoryError:
As another example, we create a list
object from a str
object, which is iterable, as well. Then, the individual characters become the elements of the new list
object!
list("iterable")
['i', 't', 'e', 'r', 'a', 'b', 'l', 'e']
list
objects are sequences. To reiterate that, we briefly summarize the four behaviors of a sequence and provide some more list
-specific details below:
Container
:in
operatorIterable
:for
or while
statementsReversible
:Sized
:The "length" of nested
is five because empty
and simple
count as one element each. In other words, nested
holds five references to other objects, two of which are list
objects.
len(nested)
5
With a for
-loop, we can iterate over all elements in a predictable order, forward or backward. As list
objects hold references to other objects, these have an indentity and may even be of different types; however, the latter observation is rarely, if ever, useful in practice.
for element in nested:
print(str(element).ljust(10), str(id(element)).ljust(18), type(element))
[] 139688113991744 <class 'list'> 10 94291567220768 <class 'int'> 20.0 139688114016656 <class 'float'> Thirty 139688114205808 <class 'str'> [40, 50] 139688113982208 <class 'list'>
for element in reversed(nested):
print(element, end=" ")
[40, 50] Thirty 20.0 10 []
The in
operator checks if a given object is "contained" in a list
object. It uses the ==
operator behind the scenes (i.e., not the is
operator) conducting a linear search : So, Python implicitly loops over all elements and only stops prematurely if an element evaluates equal to the searched object. A linear search may, therefore, be relatively slow for big
list
objects.
10 in nested
True
20
compares equal to the 20.0
in nested
.
20 in nested
True
30 in nested
False
Because of the predictable order and the finiteness, each element in a sequence can be labeled with a unique index, an int
object in the range 0≤index<|sequence|.
Brackets, [
and ]
, are the literal syntax for accessing individual elements of any sequence type. In this book, we also call them the indexing operator in this context.
nested[0]
[]
The last index is one less than len(nested)
, and Python raises an IndexError
if we look up an index that is not in the range.
nested[5]
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) Cell In[17], line 1 ----> 1 nested[5] IndexError: list index out of range
Negative indices are used to count in reverse order from the end of a sequence, and brackets may be chained to access nested objects. So, to access the 50
inside simple
via the nested
object, we write nested[-1][1]
.
nested[-1]
[40, 50]
nested[-1][1]
50
Slicing list
objects works analogously to slicing str
objects: We use the literal syntax with either one or two colons :
inside the brackets []
to separate the start, stop, and step values. Slicing creates a new list
object with the elements chosen from the original one.
For example, to obtain the three elements in the "middle" of nested
, we slice from 1
(including) to 4
(excluding).
nested[1:4]
[10, 20.0, 'Thirty']
To obtain "every other" element, we slice from beginning to end, defaulting to 0
and len(nested)
when omitted, in steps of 2
.
nested[::2]
[[], 20.0, [40, 50]]
The literal notation with the colons :
is syntactic sugar. It saves us from using the slice() built-in to create
slice
objects. slice() takes start, stop, and step arguments in the same way as the familiar range()
, and the
slice
objects it creates are used just as indexes above.
In most cases, the literal notation is more convenient to use; however, with slice
objects, we can give names to slices and reuse them across several sequences.
middle = slice(1, 4)
type(middle)
slice
nested[middle]
[10, 20.0, 'Thirty']
slice
objects come with three read-only attributes start
, stop
, and step
on them.
middle.start
1
middle.stop
4
If not passed to slice() , these attributes default to
None
. That is why the cell below has no output.
middle.step
A good trick to know is taking a "full" slice: This copies all elements of a list
object into a new list
object.
nested_copy = nested[:]
nested_copy
[[], 10, 20.0, 'Thirty', [40, 50]]
At first glance, nested
and nested_copy
seem to cause no pain. For list
objects, the comparison operator ==
goes over the elements in both operands in a pairwise fashion and checks if they all evaluate equal (cf., the "List Comparison" section below for more details).
We confirm that nested
and nested_copy
compare equal as could be expected but also that they are different objects.
nested == nested_copy
True
nested is nested_copy
False
However, as PythonTutor reveals, only the references to the elements are copied, and not the objects in
nested
themselves! Because of that, nested_copy
is a so-called shallow copy of
nested
.
We could also see this with the id() function: The respective first elements in both
nested
and nested_copy
are the same object, namely empty
. So, we have three ways of accessing the same address in memory. Also, we say that nested
and nested_copy
partially share the same state.
nested[0] is nested_copy[0]
True
id(nested[0])
139688113991744
id(nested_copy[0])
139688113991744
Knowing this becomes critical if the elements in a list
object are mutable objects (i.e., we can change them in place), and this is the case with nested
and nested_copy
, as we see in the next section on "Mutability".
As both the original nested
object and its copy reference the same list
objects in memory, any changes made to them are visible to both! Because of that, working with shallow copies can easily become confusing.
Instead of a shallow copy, we could also create a so-called deep copy of
nested
: Then, the copying process recursively follows every reference in a nested data structure and creates copies of every object found.
To explicitly create shallow or deep copies, the copy module in the standard library
provides two functions, copy()
and deepcopy()
. We must always remember that slicing creates shallow copies only.
import copy
nested_deep_copy = copy.deepcopy(nested)
nested == nested_deep_copy
True
Now, the first elements of nested
and nested_deep_copy
are different objects, and PythonTutor shows that there are six
list
objects in memory.
nested[0] is nested_deep_copy[0]
False
id(nested[0])
139688113991744
id(nested_deep_copy[0])
139688113233152
As this StackOverflow question shows, understanding shallow and deep copies is a common source of confusion, independent of the programming language.
In contrast to str
objects, list
objects are mutable: We may assign new elements to indices or slices and also remove elements. That changes the references in a list
object. In general, if an object is mutable, we say that it may be changed in place.
nested[0] = 0
nested
[0, 10, 20.0, 'Thirty', [40, 50]]
When we re-assign a slice, we can even change the size of the list
object.
nested[:4] = [100, 100, 100]
nested
[100, 100, 100, [40, 50]]
len(nested)
4
The list
object's identity does not change. That is the main point behind mutable objects.
id(nested) # same memory location as before
139688113982848
nested_copy
is unchanged!
nested_copy
[[], 10, 20.0, 'Thirty', [40, 50]]
Let's change the nested [40, 50]
via nested_copy
into [1, 2, 3]
by replacing all its elements.
nested_copy[-1][:] = [1, 2, 3]
nested_copy
[[], 10, 20.0, 'Thirty', [1, 2, 3]]
That has a surprising side effect on nested
.
nested
[100, 100, 100, [1, 2, 3]]
That is precisely the confusion we talked about above when we said that nested_copy
is a shallow copy of nested
. PythonTutor shows how both reference the same nested
list
object that is changed in place from [40, 50]
into [1, 2, 3]
.
Lastly, we use the del
statement to remove an element.
del nested[-1]
nested
[100, 100, 100]
The del
statement also works for slices. Here, we remove all references nested
holds.
del nested[:]
nested
[]
Mutability for sequences is formalized by the MutableSequence
ABC in the collections.abc module.
The list
type is an essential data structure in any real-world Python application, and many typical list
-related algorithms from computer science theory are already built into it at the C level (cf., the documentation or the tutorial
for a full overview; unfortunately, not all methods have direct links). So, understanding and applying the built-in methods of the
list
type not only speeds up the development process but also makes programs significantly faster.
In contrast to the str
type's methods in Chapter 6 (e.g., .upper()
or .lower()
), the
list
type's methods that mutate an object do so in place. That means they never create new list
objects and return None
to indicate that. So, we must never assign the return value of list
methods to the variable holding the list!
Let's look at the following names
example.
names = ["Carl", "Peter"]
To add an object to the end of names
, we use the .append()
method. The code cell shows no output indicating that None
must be the return value.
names.append("Eckardt")
names
['Carl', 'Peter', 'Eckardt']
With the .extend()
method, we may also append multiple elements provided by an iterable. Here, the iterable is a list
object itself holding two str
objects.
names.extend(["Karl", "Oliver"])
names
['Carl', 'Peter', 'Eckardt', 'Karl', 'Oliver']
Similar to .append()
, we may add a new element at an arbitrary position with the .insert()
method. .insert()
takes two arguments, an index and the element to be inserted.
names.insert(1, "Berthold")
names
['Carl', 'Berthold', 'Peter', 'Eckardt', 'Karl', 'Oliver']
sorted(names)
['Berthold', 'Carl', 'Eckardt', 'Karl', 'Oliver', 'Peter']
As the previous code cell created a new list
object, names
is still unsorted.
names
['Carl', 'Berthold', 'Peter', 'Eckardt', 'Karl', 'Oliver']
Let's sort the elements in names
instead.
names.sort()
names
['Berthold', 'Carl', 'Eckardt', 'Karl', 'Oliver', 'Peter']
names.sort(reverse=True)
names
['Peter', 'Oliver', 'Karl', 'Eckardt', 'Carl', 'Berthold']
The .sort() method and the sorted()
function sort the elements in
names
in alphabetical order, forward or backward. However, that does not hold in general.
We mention above that list
objects may contain objects of any type and even of mixed types. Because of that, the sorting is delegated to the elements in a
list
object. In a way, Python "asks" the elements in a list
object to sort themselves. As names
contains only str
objects, they are sorted according the the comparison rules explained in Chapter 6 .
To customize the sorting, we pass a keyword-only key
argument to .sort() or sorted()
, which must be a
function
object accepting one positional argument. Then, the elements in the list
object are passed to that one by one, and the return values are used as the sort keys. The key
argument is also a popular use case for lambda
expressions.
For example, to sort names
not by alphabet but by the names' lengths, we pass in a reference to the built-in len() function as
key=len
. Note that there are no parentheses after len
!
names.sort(key=len)
If two names have the same length, their relative order is kept as is. That is why "Karl"
comes before "Carl"
below. A sorting algorithm with that property is called stable
.
Sorting is an important topic in programming, and we refer to the official HOWTO for a more comprehensive introduction.
names
['Karl', 'Carl', 'Peter', 'Oliver', 'Eckardt', 'Berthold']
.sort(reverse=True)
is different from the .reverse()
method. Whereas the former applies some sorting rule in reverse order, the latter simply reverses the elements in a list
object.
names.reverse()
names
['Berthold', 'Eckardt', 'Oliver', 'Peter', 'Carl', 'Karl']
The .pop()
method removes the last element from a list
object and returns it. Below we capture the removed
element to show that the return value is not None
as with all the methods introduced so far.
removed = names.pop()
removed
'Karl'
names
['Berthold', 'Eckardt', 'Oliver', 'Peter', 'Carl']
.pop()
takes an optional index argument and removes that instead.
So, to remove the second element, "Eckhardt"
, from names
, we write this.
removed = names.pop(1)
removed
'Eckardt'
names
['Berthold', 'Oliver', 'Peter', 'Carl']
Instead of removing an element by its index, we can also remove it by its value with the .remove()
method. Behind the scenes, Python then compares the object to be removed, "Peter"
in the example, sequentially to each element with the ==
operator and removes the first one that evaluates equal to it. .remove()
does not return the removed element.
names.remove("Peter")
names
['Berthold', 'Oliver', 'Carl']
Also, .remove()
raises a ValueError
if the value is not found.
names.remove("Peter")
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[80], line 1 ----> 1 names.remove("Peter") ValueError: list.remove(x): x not in list
list
objects implement an .index()
method that returns the position of the first element that compares equal to its argument. It fails loudly with a ValueError
if no element compares equal.
names
['Berthold', 'Oliver', 'Carl']
names.index("Oliver")
1
names.index("Karl")
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[83], line 1 ----> 1 names.index("Karl") ValueError: 'Karl' is not in list
The .count()
method returns the number of elements that compare equal to its argument.
names.count("Carl")
1
names.count("Karl")
0
Two more methods, .copy()
and .clear()
, are syntactic sugar and replace working with slices.
.copy()
creates a shallow copy. So, names.copy()
below does the same as taking a full slice with names[:]
, and the caveats from above apply, too.
names_copy = names.copy()
names_copy
['Berthold', 'Oliver', 'Carl']
.clear()
removes all references from a list
object. So, names_copy.clear()
is the same as del names_copy[:]
.
names_copy.clear()
names_copy
[]
Many methods introduced in this section are mentioned in the collections.abc module's documentation as well: While the
.index()
and .count()
methods come with any data type that is a Sequence
, the .append()
, .extend()
, .insert()
, .reverse()
, .pop()
, and .remove()
methods are part of any MutableSequence
type. The .sort()
, .copy()
, and .clear()
methods are list
-specific.
So, being a sequence does not only imply the four behaviors specified above, but also means that a data type comes with certain standardized methods.
As with str
objects, the +
and *
operators are overloaded for concatenation and always return a new list
object. The references in this newly created list
object reference the same objects as the two original list
objects. So, the same caveat as with shallow copies from above applies!
names
['Berthold', 'Oliver', 'Carl']
names + ["Diedrich", "Yves"]
['Berthold', 'Oliver', 'Carl', 'Diedrich', 'Yves']
2 * names
['Berthold', 'Oliver', 'Carl', 'Berthold', 'Oliver', 'Carl']
Besides being an operator, the *
symbol has a second syntactical use, as explained in PEP 3132 and PEP 448
: It implements what is called iterable unpacking. It is not an operator syntactically but a notation that Python reads as a literal.
In the example, Python interprets the expression as if the elements of the iterable names
were placed between "Achim"
and "Xavier"
one by one. So, we do not obtain a nested but a flat list.
["Achim", *names, "Xavier"]
['Achim', 'Berthold', 'Oliver', 'Carl', 'Xavier']
Effectively, Python reads that as if we wrote the following.
["Achim", names[0], names[1], names[2], "Xavier"]
['Achim', 'Berthold', 'Oliver', 'Carl', 'Xavier']
The relational operators also work with list
objects; yet another example of operator overloading.
Comparison is made in a pairwise fashion until the first pair of elements does not evaluate equal or one of the list
objects ends. The exact comparison rules depend on the elements and not the list
objects. As with .sort() or sorted()
above, comparison is delegated to the objects to be compared, and Python "asks" the elements in the two
list
objects to compare themselves. Usually, all elements are of the same type, and comparison is straightforward.
names
['Berthold', 'Oliver', 'Carl']
names == ["Berthold", "Oliver", "Carl"]
True
names != ["Berthold", "Oliver", "Karl"]
True
names < ["Berthold", "Oliver", "Karl"]
True
If two list
objects have a different number of elements and all overlapping elements compare equal, the shorter list
object is considered "smaller."
["Berthold", "Oliver"] < names < ["Berthold", "Oliver", "Carl", "Xavier"]
True