Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud .

Chapter 7: Sequential Data (continued)¶

In this second part of the chapter, we look closely at the built-in list type, which is probably the most commonly used sequence data type in practice.

The `list` Type¶

As already seen multiple times, to create a list object, we use the literal notation and list all elements within brackets [ and ].

In [1]:

empty = []

In [2]:

simple = [40, 50]

The elements do not need to be of the same type, and list objects may also be nested.

In [3]:

nested = [empty, 10, 20.0, "Thirty", simple]

PythonTutor shows how nested holds references to the empty and simple objects. Technically, it holds three more references to the 10, 20.0, and "Thirty" objects as well. However, to simplify the visualization, these three objects are shown right inside the nested object. That may be done because they are immutable and "flat" data types. In general, the $0$ s and $1$ s inside a list object in memory always constitute references to other objects only.

In [4]:

nested

Out[4]:

[[], 10, 20.0, 'Thirty', [40, 50]]

Let's not forget that nested is an object on its own with an identity and data type.

In [5]:

id(nested)

Out[5]:

139688113982848

In [6]:

type(nested)

Out[6]:

list

Alternatively, we use the built-in list() constructor to create a list object out of any (finite) iterable we pass to it as the argument.

For example, we can wrap the range() built-in with list() : As described in Chapter 4 , range objects, like range(1, 13) below, are iterable and generate int objects "on the fly" (i.e., one by one). The list() around it acts like a for-loop and materializes twelve int objects in memory that then become the elements of the newly created list object. PythonTutor shows this difference visually.

In [7]:

list(range(1, 13))

Out[7]:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

Beware of passing a range object over a "big" horizon as the argument to list() as that may lead to a MemoryError and the computer crashing.

In [8]:

list(range(999_999_999_999))

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[8], line 1
----> 1 list(range(999_999_999_999))

MemoryError:

As another example, we create a list object from a str object, which is iterable, as well. Then, the individual characters become the elements of the new list object!

In [9]:

list("iterable")

Out[9]:

['i', 't', 'e', 'r', 'a', 'b', 'l', 'e']

Sequence Behaviors¶

list objects are sequences. To reiterate that, we briefly summarize the four behaviors of a sequence and provide some more list-specific details below:

Container:
holds references to other objects in memory (with their own identity and type)
implements membership testing via the in operator
Iterable:
supports being looped over
works with the for or while statements
Reversible:
the elements come in a predictable order that we may loop over in a forward or backward fashion
works with the reversed() built-in
Sized:
the number of elements is finite and known in advance
works with the built-in len() function

The "length" of nested is five because empty and simple count as one element each. In other words, nested holds five references to other objects, two of which are list objects.

In [10]:

len(nested)

Out[10]:

With a for-loop, we can iterate over all elements in a predictable order, forward or backward. As list objects hold references to other objects, these have an indentity and may even be of different types; however, the latter observation is rarely, if ever, useful in practice.

In [11]:

for element in nested:
    print(str(element).ljust(10), str(id(element)).ljust(18), type(element))

[]         139688113991744    <class 'list'>
10         94291567220768     <class 'int'>
20.0       139688114016656    <class 'float'>
Thirty     139688114205808    <class 'str'>
[40, 50]   139688113982208    <class 'list'>

In [12]:

for element in reversed(nested):
    print(element, end="     ")

[40, 50]     Thirty     20.0     10     []

The in operator checks if a given object is "contained" in a list object. It uses the == operator behind the scenes (i.e., not the is operator) conducting a linear search : So, Python implicitly loops over all elements and only stops prematurely if an element evaluates equal to the searched object. A linear search may, therefore, be relatively slow for big list objects.

In [13]:

10 in nested

Out[13]:

True

20 compares equal to the 20.0 in nested.

In [14]:

20 in nested

Out[14]:

True

In [15]:

30 in nested

Out[15]:

False

Indexing¶

Because of the predictable order and the finiteness, each element in a sequence can be labeled with a unique index, an int object in the range $0 \leq \text{index} < \lvert \text{sequence} \rvert$ .

Brackets, [ and ], are the literal syntax for accessing individual elements of any sequence type. In this book, we also call them the indexing operator in this context.

In [16]:

nested[0]

Out[16]:

[]

The last index is one less than len(nested), and Python raises an IndexError if we look up an index that is not in the range.

In [17]:

nested[5]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 nested[5]

IndexError: list index out of range

Negative indices are used to count in reverse order from the end of a sequence, and brackets may be chained to access nested objects. So, to access the 50 inside simple via the nested object, we write nested[-1][1].

In [18]:

nested[-1]

Out[18]:

[40, 50]

In [19]:

nested[-1][1]

Out[19]:

Slicing¶

Slicing list objects works analogously to slicing str objects: We use the literal syntax with either one or two colons : inside the brackets [] to separate the start, stop, and step values. Slicing creates a new list object with the elements chosen from the original one.

For example, to obtain the three elements in the "middle" of nested, we slice from 1 (including) to 4 (excluding).

In [20]:

nested[1:4]

Out[20]:

[10, 20.0, 'Thirty']

To obtain "every other" element, we slice from beginning to end, defaulting to 0 and len(nested) when omitted, in steps of 2.

In [21]:

nested[::2]

Out[21]:

[[], 20.0, [40, 50]]

The literal notation with the colons : is syntactic sugar. It saves us from using the slice() built-in to create slice objects. slice() takes start, stop, and step arguments in the same way as the familiar range() , and the slice objects it creates are used just as indexes above.

In most cases, the literal notation is more convenient to use; however, with slice objects, we can give names to slices and reuse them across several sequences.

In [22]:

middle = slice(1, 4)

In [23]:

type(middle)

Out[23]:

slice

In [24]:

nested[middle]

Out[24]:

[10, 20.0, 'Thirty']

slice objects come with three read-only attributes start, stop, and step on them.

In [25]:

middle.start

Out[25]:

In [26]:

middle.stop

Out[26]:

If not passed to slice() , these attributes default to None. That is why the cell below has no output.

In [27]:

middle.step

A good trick to know is taking a "full" slice: This copies all elements of a list object into a new list object.

In [28]:

nested_copy = nested[:]

In [29]:

nested_copy

Out[29]:

[[], 10, 20.0, 'Thirty', [40, 50]]

At first glance, nested and nested_copy seem to cause no pain. For list objects, the comparison operator == goes over the elements in both operands in a pairwise fashion and checks if they all evaluate equal (cf., the "List Comparison" section below for more details).

We confirm that nested and nested_copy compare equal as could be expected but also that they are different objects.

In [30]:

nested == nested_copy

Out[30]:

True

In [31]:

nested is nested_copy

Out[31]:

False

However, as PythonTutor reveals, only the references to the elements are copied, and not the objects in nested themselves! Because of that, nested_copy is a so-called shallow copy of nested.

We could also see this with the id() function: The respective first elements in both nested and nested_copy are the same object, namely empty. So, we have three ways of accessing the same address in memory. Also, we say that nested and nested_copy partially share the same state.

In [32]:

nested[0] is nested_copy[0]

Out[32]:

True

In [33]:

id(nested[0])

Out[33]:

139688113991744

In [34]:

id(nested_copy[0])

Out[34]:

139688113991744

Knowing this becomes critical if the elements in a list object are mutable objects (i.e., we can change them in place), and this is the case with nested and nested_copy, as we see in the next section on "Mutability".

As both the original nested object and its copy reference the same list objects in memory, any changes made to them are visible to both! Because of that, working with shallow copies can easily become confusing.

Instead of a shallow copy, we could also create a so-called deep copy of nested: Then, the copying process recursively follows every reference in a nested data structure and creates copies of every object found.

To explicitly create shallow or deep copies, the copy module in the standard library provides two functions, copy() and deepcopy() . We must always remember that slicing creates shallow copies only.

In [35]:

import copy

In [36]:

nested_deep_copy = copy.deepcopy(nested)

In [37]:

nested == nested_deep_copy

Out[37]:

True

Now, the first elements of nested and nested_deep_copy are different objects, and PythonTutor shows that there are six list objects in memory.

In [38]:

nested[0] is nested_deep_copy[0]

Out[38]:

False

In [39]:

id(nested[0])

Out[39]:

139688113991744

In [40]:

id(nested_deep_copy[0])

Out[40]:

139688113233152

As this StackOverflow question shows, understanding shallow and deep copies is a common source of confusion, independent of the programming language.

Mutability¶

In contrast to str objects, list objects are mutable: We may assign new elements to indices or slices and also remove elements. That changes the references in a list object. In general, if an object is mutable, we say that it may be changed in place.

In [41]:

nested[0] = 0

In [42]:

nested

Out[42]:

[0, 10, 20.0, 'Thirty', [40, 50]]

When we re-assign a slice, we can even change the size of the list object.

In [43]:

nested[:4] = [100, 100, 100]

In [44]:

nested

Out[44]:

[100, 100, 100, [40, 50]]

In [45]:

len(nested)

Out[45]:

The list object's identity does not change. That is the main point behind mutable objects.

In [46]:

id(nested)  # same memory location as before

Out[46]:

139688113982848

nested_copy is unchanged!

In [47]:

nested_copy

Out[47]:

[[], 10, 20.0, 'Thirty', [40, 50]]

Let's change the nested [40, 50] via nested_copy into [1, 2, 3] by replacing all its elements.

In [48]:

nested_copy[-1][:] = [1, 2, 3]

In [49]:

nested_copy

Out[49]:

[[], 10, 20.0, 'Thirty', [1, 2, 3]]

That has a surprising side effect on nested.

In [50]:

nested

Out[50]:

[100, 100, 100, [1, 2, 3]]

That is precisely the confusion we talked about above when we said that nested_copy is a shallow copy of nested. PythonTutor shows how both reference the same nested list object that is changed in place from [40, 50] into [1, 2, 3].

Lastly, we use the del statement to remove an element.

In [51]:

del nested[-1]

In [52]:

nested

Out[52]:

[100, 100, 100]

The del statement also works for slices. Here, we remove all references nested holds.

In [53]:

del nested[:]

In [54]:

nested

Out[54]:

[]

Mutability for sequences is formalized by the MutableSequence ABC in the collections.abc module.

List Methods¶

The list type is an essential data structure in any real-world Python application, and many typical list-related algorithms from computer science theory are already built into it at the C level (cf., the documentation or the tutorial for a full overview; unfortunately, not all methods have direct links). So, understanding and applying the built-in methods of the list type not only speeds up the development process but also makes programs significantly faster.

In contrast to the str type's methods in Chapter 6 (e.g., .upper() or .lower() ), the list type's methods that mutate an object do so in place. That means they never create new list objects and return None to indicate that. So, we must never assign the return value of list methods to the variable holding the list!

Let's look at the following names example.

In [55]:

names = ["Carl", "Peter"]

To add an object to the end of names, we use the .append() method. The code cell shows no output indicating that None must be the return value.

In [56]:

names.append("Eckardt")

In [57]:

names

Out[57]:

['Carl', 'Peter', 'Eckardt']

With the .extend() method, we may also append multiple elements provided by an iterable. Here, the iterable is a list object itself holding two str objects.

In [58]:

names.extend(["Karl", "Oliver"])

In [59]:

names

Out[59]:

['Carl', 'Peter', 'Eckardt', 'Karl', 'Oliver']

Similar to .append(), we may add a new element at an arbitrary position with the .insert() method. .insert() takes two arguments, an index and the element to be inserted.

In [60]:

names.insert(1, "Berthold")

In [61]:

names

Out[61]:

['Carl', 'Berthold', 'Peter', 'Eckardt', 'Karl', 'Oliver']

list objects may be sorted in place with the .sort() method. That is different from the built-in sorted() function that takes any finite and iterable object and returns a new list object with the iterable's elements sorted!

In [62]:

sorted(names)

Out[62]:

['Berthold', 'Carl', 'Eckardt', 'Karl', 'Oliver', 'Peter']

As the previous code cell created a new list object, names is still unsorted.

In [63]:

names

Out[63]:

['Carl', 'Berthold', 'Peter', 'Eckardt', 'Karl', 'Oliver']

Let's sort the elements in names instead.

In [64]:

names.sort()

In [65]:

names

Out[65]:

['Berthold', 'Carl', 'Eckardt', 'Karl', 'Oliver', 'Peter']

To sort in reverse order, we pass a keyword-only reverse=True argument to either the .sort() method or the sorted() function.

In [66]:

names.sort(reverse=True)

In [67]:

names

Out[67]:

['Peter', 'Oliver', 'Karl', 'Eckardt', 'Carl', 'Berthold']

The .sort() method and the sorted() function sort the elements in names in alphabetical order, forward or backward. However, that does not hold in general.

We mention above that list objects may contain objects of any type and even of mixed types. Because of that, the sorting is delegated to the elements in a list object. In a way, Python "asks" the elements in a list object to sort themselves. As names contains only str objects, they are sorted according the the comparison rules explained in Chapter 6 .

To customize the sorting, we pass a keyword-only key argument to .sort() or sorted() , which must be a function object accepting one positional argument. Then, the elements in the list object are passed to that one by one, and the return values are used as the sort keys. The key argument is also a popular use case for lambda expressions.

For example, to sort names not by alphabet but by the names' lengths, we pass in a reference to the built-in len() function as key=len. Note that there are no parentheses after len!

In [68]:

names.sort(key=len)

If two names have the same length, their relative order is kept as is. That is why "Karl" comes before "Carl" below. A sorting algorithm with that property is called stable .

Sorting is an important topic in programming, and we refer to the official HOWTO for a more comprehensive introduction.

In [69]:

names

Out[69]:

['Karl', 'Carl', 'Peter', 'Oliver', 'Eckardt', 'Berthold']

.sort(reverse=True) is different from the .reverse() method. Whereas the former applies some sorting rule in reverse order, the latter simply reverses the elements in a list object.

In [70]:

names.reverse()

In [71]:

names

Out[71]:

['Berthold', 'Eckardt', 'Oliver', 'Peter', 'Carl', 'Karl']

The .pop() method removes the last element from a list object and returns it. Below we capture the removed element to show that the return value is not None as with all the methods introduced so far.

In [72]:

removed = names.pop()

In [73]:

removed

Out[73]:

'Karl'

In [74]:

names

Out[74]:

['Berthold', 'Eckardt', 'Oliver', 'Peter', 'Carl']

.pop() takes an optional index argument and removes that instead.

So, to remove the second element, "Eckhardt", from names, we write this.

In [75]:

removed = names.pop(1)

In [76]:

removed

Out[76]:

'Eckardt'

In [77]:

names

Out[77]:

['Berthold', 'Oliver', 'Peter', 'Carl']

Instead of removing an element by its index, we can also remove it by its value with the .remove() method. Behind the scenes, Python then compares the object to be removed, "Peter" in the example, sequentially to each element with the == operator and removes the first one that evaluates equal to it. .remove() does not return the removed element.

In [78]:

names.remove("Peter")

In [79]:

names

Out[79]:

['Berthold', 'Oliver', 'Carl']

Also, .remove() raises a ValueError if the value is not found.

In [80]:

names.remove("Peter")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[80], line 1
----> 1 names.remove("Peter")

ValueError: list.remove(x): x not in list

list objects implement an .index() method that returns the position of the first element that compares equal to its argument. It fails loudly with a ValueError if no element compares equal.

In [81]:

names

Out[81]:

['Berthold', 'Oliver', 'Carl']

In [82]:

names.index("Oliver")

Out[82]:

In [83]:

names.index("Karl")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[83], line 1
----> 1 names.index("Karl")

ValueError: 'Karl' is not in list

The .count() method returns the number of elements that compare equal to its argument.

In [84]:

names.count("Carl")

Out[84]:

In [85]:

names.count("Karl")

Out[85]:

Two more methods, .copy() and .clear(), are syntactic sugar and replace working with slices.

.copy() creates a shallow copy. So, names.copy() below does the same as taking a full slice with names[:], and the caveats from above apply, too.

In [86]:

names_copy = names.copy()

In [87]:

names_copy

Out[87]:

['Berthold', 'Oliver', 'Carl']

.clear() removes all references from a list object. So, names_copy.clear() is the same as del names_copy[:].

In [88]:

names_copy.clear()

In [89]:

names_copy

Out[89]:

[]

Many methods introduced in this section are mentioned in the collections.abc module's documentation as well: While the .index() and .count() methods come with any data type that is a Sequence, the .append(), .extend(), .insert(), .reverse(), .pop(), and .remove() methods are part of any MutableSequence type. The .sort(), .copy(), and .clear() methods are list-specific.

So, being a sequence does not only imply the four behaviors specified above, but also means that a data type comes with certain standardized methods.

List Operations¶

As with str objects, the + and * operators are overloaded for concatenation and always return a new list object. The references in this newly created list object reference the same objects as the two original list objects. So, the same caveat as with shallow copies from above applies!

In [90]:

names

Out[90]:

['Berthold', 'Oliver', 'Carl']

In [91]:

names + ["Diedrich", "Yves"]

Out[91]:

['Berthold', 'Oliver', 'Carl', 'Diedrich', 'Yves']

In [92]:

2 * names

Out[92]:

['Berthold', 'Oliver', 'Carl', 'Berthold', 'Oliver', 'Carl']

Unpacking¶

Besides being an operator, the * symbol has a second syntactical use, as explained in PEP 3132 and PEP 448 : It implements what is called iterable unpacking. It is not an operator syntactically but a notation that Python reads as a literal.

In the example, Python interprets the expression as if the elements of the iterable names were placed between "Achim" and "Xavier" one by one. So, we do not obtain a nested but a flat list.

In [93]:

["Achim", *names, "Xavier"]

Out[93]:

['Achim', 'Berthold', 'Oliver', 'Carl', 'Xavier']

Effectively, Python reads that as if we wrote the following.

In [94]:

["Achim", names[0], names[1], names[2], "Xavier"]

Out[94]:

['Achim', 'Berthold', 'Oliver', 'Carl', 'Xavier']

List Comparison¶

The relational operators also work with list objects; yet another example of operator overloading.

Comparison is made in a pairwise fashion until the first pair of elements does not evaluate equal or one of the list objects ends. The exact comparison rules depend on the elements and not the list objects. As with .sort() or sorted() above, comparison is delegated to the objects to be compared, and Python "asks" the elements in the two list objects to compare themselves. Usually, all elements are of the same type, and comparison is straightforward.

In [95]:

names

Out[95]:

['Berthold', 'Oliver', 'Carl']

In [96]:

names == ["Berthold", "Oliver", "Carl"]

Out[96]:

True

In [97]:

names != ["Berthold", "Oliver", "Karl"]

Out[97]:

True

In [98]:

names < ["Berthold", "Oliver", "Karl"]

Out[98]:

True

If two list objects have a different number of elements and all overlapping elements compare equal, the shorter list object is considered "smaller."

In [99]:

["Berthold", "Oliver"] < names < ["Berthold", "Oliver", "Carl", "Xavier"]

Out[99]:

True