Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.

Chapter 7: Sequential Data (continued)¶

In this third part of the chapter, we first look at a major implication of the list type's mutability. Then, we see how its close relative, the tuple type, can mitigate this. Lastly, we see how Python's syntax assumes sequential data at various places: for example, when unpacking iterables during a for-loop or an assignment, or when working with function objects.

Modifiers vs. Pure Functions¶

As list objects are mutable, the caller of a function can see the changes made to a list object passed to the function as an argument. That is often a surprising side effect and should be avoided.

As an example, consider the add_xyz() function.

In [1]:

letters = ["a", "b", "c"]

In [2]:

def add_xyz(arg):
    """Append letters to a list."""
    arg.extend(["x", "y", "z"])
    return arg

While this function is being executed, two variables, namely letters in the global scope and arg inside the function's local scope, reference the same list object in memory. Furthermore, the passed-in arg is also the return value.

So, after the function call, letters_with_xyz and letters are aliases as well, referencing the same object. We can also visualize that with PythonTutor.

In [3]:

letters_with_xyz = add_xyz(letters)

In [4]:

letters_with_xyz

Out[4]:

['a', 'b', 'c', 'x', 'y', 'z']

In [5]:

letters

Out[5]:

['a', 'b', 'c', 'x', 'y', 'z']

A better practice is to first create a copy of arg within the function that is then modified and returned. If we are sure that arg contains immutable elements only, we get away with a shallow copy. The downside of this approach is the higher amount of memory necessary.

The revised add_xyz() function below is easier to reason about as it does not modify the passed-in arg. PythonTutor shows that as well. This approach follows the functional programming paradigm, which is currently going through a "renaissance." Two essential characteristics of functional programming are that a function never changes its inputs and always returns the same output given the same inputs.

For a beginner, it is probably better to stick to this idea and not change any arguments, unlike the original add_xyz() above. However, functions that modify and return the argument passed in are an important aspect of object-oriented programming, as explained in Chapter 11.

In [6]:

letters = ["a", "b", "c"]

In [7]:

def add_xyz(arg):
    """Create a new list from an existing one."""
    new_arg = arg[:]
    new_arg.extend(["x", "y", "z"])
    return new_arg

In [8]:

letters_with_xyz = add_xyz(letters)

In [9]:

letters_with_xyz

Out[9]:

['a', 'b', 'c', 'x', 'y', 'z']

In [10]:

letters

Out[10]:

['a', 'b', 'c']
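
The shallow copy made with arg[:] is only sufficient because letters holds immutable str objects. As a minimal sketch of what happens otherwise, the two hypothetical variants below (add_xyz_shallow() and add_xyz_deep() are names made up for this illustration) are applied to a list of lists; copy.deepcopy() from the standard library copies the nested objects as well.

```python
import copy


def add_xyz_shallow(arg):
    """Copy only the outer list; nested objects stay shared."""
    new_arg = arg[:]
    new_arg.extend(["x", "y", "z"])
    return new_arg


def add_xyz_deep(arg):
    """Copy the outer list and, recursively, everything it references."""
    new_arg = copy.deepcopy(arg)
    new_arg.extend(["x", "y", "z"])
    return new_arg


nested = [["a", "b"], ["c"]]

shallow = add_xyz_shallow(nested)
shallow[0].append("!")  # mutates the inner list that is shared with nested
print(nested)  # [['a', 'b', '!'], ['c']]

deep = add_xyz_deep(nested)
deep[0].append("?")  # only affects the independent copy
print(nested)  # still [['a', 'b', '!'], ['c']]
```

So, with nested mutable elements, only the deep copy keeps the function free of side effects, at the cost of even more memory.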

If we want to modify the argument passed in, it is best to return None and not arg, as does the final version of add_xyz() below. Then, the user of our function cannot accidentally create two aliases to the same object. That is also why list methods that only modify an object in place, such as .append() or .extend(), return None. PythonTutor shows how there is only one reference to letters after the function call.

In [11]:

letters = ["a", "b", "c"]

In [12]:

def add_xyz(arg):
    """Append letters to a list."""
    arg.extend(["x", "y", "z"])
    return  # None

In [13]:

add_xyz(letters)

In [14]:

letters

Out[14]:

['a', 'b', 'c', 'x', 'y', 'z']

If we call add_xyz() with letters as the argument again, we end up with an even longer list object.

In [15]:

add_xyz(letters)

In [16]:

letters

Out[16]:

['a', 'b', 'c', 'x', 'y', 'z', 'x', 'y', 'z']

Functions that only work on the argument passed in are called modifiers. Their primary purpose is to change the state of the argument. On the contrary, functions that have no side effects on the arguments are said to be pure.
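
Python's standard library follows the same distinction. As a small illustration (the variable names are chosen for this sketch only), the sorted() built-in behaves like a pure function, whereas the .sort() method on list objects is a modifier that returns None.

```python
values = [3, 1, 2]

# sorted() is pure: it returns a new list and leaves its argument untouched.
ordered = sorted(values)
print(ordered)  # [1, 2, 3]
print(values)   # [3, 1, 2]

# list.sort() is a modifier: it changes the list in place and returns None.
result = values.sort()
print(result)   # None
print(values)   # [1, 2, 3]
```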

The `tuple` Type¶

To create a tuple object, we can use the same literal notation as for list objects without the brackets and list all elements.

In [17]:

numbers = 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4

In [18]:

numbers

Out[18]:

(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

However, to be clearer, many Pythonistas write out the optional parentheses ( and ).

In [19]:

numbers = (7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

In [20]:

numbers

Out[20]:

(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

As before, numbers is an object on its own.

In [21]:

id(numbers)

Out[21]:

140248673535456

In [22]:

type(numbers)

Out[22]:

tuple

While we could use empty parentheses () to create an empty tuple object ...

In [23]:

empty_tuple = ()

In [24]:

empty_tuple

Out[24]:

()

In [25]:

type(empty_tuple)

Out[25]:

tuple

... we must use a trailing comma to create a tuple object holding one element. If we forget the comma, the parentheses are interpreted as the grouping operator and effectively useless!

In [26]:

one_tuple = (1,)  # we could omit the parentheses but not the comma

In [27]:

one_tuple

Out[27]:

(1,)

In [28]:

type(one_tuple)

Out[28]:

tuple

In [29]:

no_tuple = (1)

In [30]:

no_tuple

Out[30]:

1

In [31]:

type(no_tuple)

Out[31]:

int

Alternatively, we may use the tuple() built-in that takes any iterable as its argument and creates a new tuple from its elements.

In [32]:

tuple([1])

Out[32]:

(1,)

In [33]:

tuple("iterable")

Out[33]:

('i', 't', 'e', 'r', 'a', 'b', 'l', 'e')

Tuples are like "Immutable Lists"¶

Most operations involving tuple objects work in the same way as with list objects. The main difference is that tuple objects are immutable. So, if our program does not depend on mutability, we may and should use tuple and not list objects to model sequential data. That way, we avoid the pitfalls seen above.

tuple objects are sequences exhibiting the familiar four behaviors. So, numbers holds a finite number of elements ...

In [34]:

len(numbers)

Out[34]:

12

... that we can obtain individually by looping over it in a predictable forward or reverse order.

In [35]:

for number in numbers:
    print(number, end="   ")

7   11   8   5   3   12   2   6   9   10   1   4

In [36]:

for number in reversed(numbers):
    print(number, end="   ")

4   1   10   9   6   2   12   3   5   8   11   7

To check if a given object is contained in numbers, we use the in operator and conduct a linear search.

In [37]:

0 in numbers

Out[37]:

False

In [38]:

1 in numbers

Out[38]:

True

In [39]:

1.0 in numbers  # in relies on == behind the scenes

Out[39]:

True
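
To make the linear search explicit, the following sketch mimics what the in operator does for sequences like tuple objects. The contains() function is a hypothetical helper written for this illustration and not part of Python: it compares the searched object to one element after another, first by identity and then with ==.

```python
def contains(sequence, obj):
    """Mimic `obj in sequence` with an explicit linear search."""
    for element in sequence:
        # Identity is checked first, then equality via ==.
        if element is obj or element == obj:
            return True
    return False


numbers = (7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

print(contains(numbers, 0))    # False
print(contains(numbers, 1.0))  # True, because 1 == 1.0
```

In the worst case, the loop visits every element, which is why the search time grows with the length of the sequence.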

We may index and slice with the [] operator. The latter returns new tuple objects.

In [40]:

numbers[0]

Out[40]:

7

In [41]:

numbers[-1]

Out[41]:

4

In [42]:

numbers[6:]

Out[42]:

(2, 6, 9, 10, 1, 4)

Index assignment does not work because tuple objects are immutable: It results in a TypeError.

In [43]:

numbers[-1] = 99

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[43], line 1
----> 1 numbers[-1] = 99

TypeError: 'tuple' object does not support item assignment

The + and * operators work with tuple objects as well: They always create new tuple objects.

In [44]:

numbers + (99,)

Out[44]:

(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4, 99)

In [45]:

2 * numbers

Out[45]:

(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4, 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)
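
A consequence worth spelling out is that the augmented assignment += cannot change a tuple object in place either. In the sketch below (point and alias are names made up for this example), += creates a new tuple object and makes the variable reference it, while a second reference to the original object is unaffected.

```python
point = (1, 2)
alias = point  # a second reference to the same tuple object

point += (3,)  # shorthand for point = point + (3,)

print(point)           # (1, 2, 3)
print(alias)           # (1, 2)
print(point is alias)  # False: point now references a new tuple object
```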

Being immutable, tuple objects only provide the .count() and .index() methods of Sequence types. The .append(), .extend(), .insert(), .reverse(), .pop(), and .remove() methods of MutableSequence types are not available. The same holds for the list-specific .sort(), .copy(), and .clear() methods.

In [46]:

numbers.count(0)

Out[46]:

0

In [47]:

numbers.index(1)

Out[47]:

10

The relational operators work in the same way as for list objects.

In [48]:

numbers

Out[48]:

(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

In [49]:

numbers == (7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

Out[49]:

True

In [50]:

numbers != (99, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

Out[50]:

True

In [51]:

numbers < (99, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

Out[51]:

True
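
As a brief reminder of the rules behind these results, the sketch below uses short tuple objects: The elements are compared pairwise from left to right, and the first pair that differs decides the outcome.

```python
# The first pair of unequal elements decides the comparison.
print((1, 2, 3) < (1, 3, 0))  # True, because 2 < 3

# If one tuple is a prefix of the other, the shorter one is smaller.
print((1, 2) < (1, 2, 0))     # True

# Equality requires the same length and pairwise equal elements.
print((1, 2) == (1.0, 2.0))   # True
print((1, 2) == (1, 2, 0))    # False
```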

While tuple objects are immutable, this only relates to the references they hold. If a tuple object contains references to mutable objects, the entire nested structure is not immutable as a whole!

Consider the following stylized example not_immutable: It contains three elements, 1, [2, ..., 11], and 12, and the elements of the nested list object may be changed. While it is not practical to mix data types in a tuple object that is used as an "immutable list," we want to make the point that the mere usage of the tuple type does not guarantee a nested object to be immutable as a whole.

In [52]:

not_immutable = (1, [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 12)

In [53]:

not_immutable

Out[53]:

(1, [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 12)

In [54]:

not_immutable[1][:] = [99, 99, 99]

In [55]:

not_immutable

Out[55]:

(1, [99, 99, 99], 12)
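
The following sketch, with a shortened tuple object named record for illustration, separates the two levels: The reference stored in the tuple never changes, and trying to replace it fails, while the list object behind that reference can be mutated freely.

```python
record = (1, [2, 3], 12)
inner = record[1]

inner.append(4)  # mutates the list object, not the tuple

print(record[1] is inner)  # True: the tuple holds the same reference as before
print(record)              # (1, [2, 3, 4], 12)

try:
    record[1] = [99]  # replacing the reference itself is not allowed
except TypeError as error:
    print("TypeError:", error)
```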

Packing & Unpacking¶

In the "List Operations" section in the second part of this chapter, the * symbol unpacks the elements of a list object into another one. This idea of iterable unpacking is built into Python at various places, even without the * symbol.

For example, we may write variables on the left-hand side of a = statement in a literal tuple style. Then, any finite iterable on the right-hand side is unpacked. So, numbers is unpacked into twelve variables below.

In [56]:

n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12 = numbers

In [57]:

n1

Out[57]:

7

In [58]:

n2

Out[58]:

11

In [59]:

n3

Out[59]:

8

Having to type twelve variables on the left is already tedious. Furthermore, if the iterable on the right yields a number of elements different from the number of variables, we get a ValueError.

In [60]:

n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11 = numbers

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[60], line 1
----> 1 n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11 = numbers

ValueError: too many values to unpack (expected 11)

In [61]:

n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, n13 = numbers

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[61], line 1
----> 1 n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, n13 = numbers

ValueError: not enough values to unpack (expected 13, got 12)

So, to make iterable unpacking useful, we prepend the * symbol to one of the variables on the left: That variable then becomes a list object holding the elements not captured by the other variables. We say that the excess elements from the iterable are packed into this variable.

For example, let's get the first and last element of numbers and collect the rest in middle.

In [62]:

first, *middle, last = numbers

In [63]:

first

Out[63]:

7

In [64]:

middle  # always a list!

Out[64]:

[11, 8, 5, 3, 12, 2, 6, 9, 10, 1]

In [65]:

last

Out[65]:

4
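
The starred variable does not have to be in the middle. As a short sketch (the variable names are chosen for illustration), it may also come last or first, and it becomes an empty list object if no elements are left over.

```python
numbers = (7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)

# The starred variable may come last or first as well.
first, second, *rest = numbers
*init, last = numbers

print(first, second)  # 7 11
print(rest)           # [8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
print(last)           # 4

# If no elements are left over, the starred variable is an empty list.
head, *nothing, tail = (1, 2)
print(nothing)        # []
```

Only one starred variable is allowed per assignment, as Python could not decide otherwise how to distribute the excess elements.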

We already used unpacking before this section without knowing it. Whenever we loop over the zip() built-in, it generates a new tuple object in each iteration, which we unpack by listing several loop variables.

So, the name, position below acts like a left-hand side of an = statement and unpacks the tuple objects generated from "zipping" the names list and the positions tuple together.

In [66]:

names = ["Berthold", "Oliver", "Carl"]

In [67]:

positions = ("goalkeeper", "defender", "midfielder", "striker", "coach")

In [68]:

for name, position in zip(names, positions):
    print(name, "is a", position)

Berthold is a goalkeeper
Oliver is a defender
Carl is a midfielder

Without unpacking, zip() generates a series of tuple objects.

In [69]:

for pair in zip(names, positions):
    print(type(pair), pair, sep="   ")

<class 'tuple'>   ('Berthold', 'goalkeeper')
<class 'tuple'>   ('Oliver', 'defender')
<class 'tuple'>   ('Carl', 'midfielder')
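
The output stops after three pairs although positions holds five elements: zip() ends as soon as its shortest argument is exhausted. The sketch below makes that visible by materializing the pairs in a list object named pairs, a name chosen for this illustration.

```python
names = ["Berthold", "Oliver", "Carl"]
positions = ("goalkeeper", "defender", "midfielder", "striker", "coach")

pairs = list(zip(names, positions))

print(len(pairs))  # 3
print(pairs[-1])   # ('Carl', 'midfielder')
```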

Unpacking also works for nested objects. Below, we wrap zip() with the enumerate() built-in to have an index variable number inside the for-loop. In each iteration, a tuple object consisting of number and another tuple object is created. The inner one then holds the name and position.

In [70]:

for number, (name, position) in enumerate(zip(names, positions), start=1):
    print(f"{name} (jersey #{number}) is a {position}")

Berthold (jersey #1) is a goalkeeper
Oliver (jersey #2) is a defender
Carl (jersey #3) is a midfielder

Swapping Variables¶

A popular use case of unpacking is swapping two variables.

Consider a and b below.

In [71]:

a = 0
b = 1

Without unpacking, we must use a temporary variable temp to swap a and b.

In [72]:

temp = a
a = b
b = temp

del temp

In [73]:

a

Out[73]:

1

In [74]:

b

Out[74]:

0

With unpacking, the solution is more elegant. All expressions on the right-hand side are evaluated before any assignment takes place.

In [75]:

a, b = 0, 1

In [76]:

a, b = b, a

In [77]:

a, b

Out[77]:

(1, 0)
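
The same idea extends beyond two plain variables. In this sketch, with names made up for illustration, we swap two elements of a list object by index and rotate three variables in one statement. In both cases, the tuple on the right-hand side is built completely before any assignment takes place.

```python
items = ["a", "b", "c"]

# Both elements on the right are looked up before either index is assigned.
items[0], items[-1] = items[-1], items[0]
print(items)  # ['c', 'b', 'a']

# Rotating three variables works without any temporary variable as well.
x, y, z = 1, 2, 3
x, y, z = y, z, x
print(x, y, z)  # 2 3 1
```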

Example: Fibonacci Numbers (revisited)

Unpacking allows us to rewrite the iterative fibonacci() function from Chapter 4 in a concise way.

In [78]:

def fibonacci(i):
    """Calculate the ith Fibonacci number.

    Args:
        i (int): index of the Fibonacci number to calculate

    Returns:
        ith_fibonacci (int)
    """
    a, b = 0, 1

    for _ in range(i - 1):
        a, b = b, a + b

    return b

In [79]:

fibonacci(12)

Out[79]:

144

Function Definitions & Calls¶

The concepts of packing and unpacking are also helpful when writing and using functions.

For example, let's look at the product() function below. Its implementation suggests that args must be a sequence type. Otherwise, it would not make sense to index into it with [0] or take a slice with [1:]. In line with the function's name, the for-loop multiplies all elements of the args sequence. So, what does the * do in the header line, and what is the exact data type of args?

The * is again not an operator in this context but a special syntax that makes Python pack all positional arguments passed to product() into a single tuple object called args.

In [80]:

def product(*args):
    """Multiply all arguments."""
    result = args[0]

    for arg in args[1:]:
        result *= arg

    return result
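
To confirm the data type of args, we can write a tiny helper that returns what it receives. The show_args() function is hypothetical and only serves this illustration.

```python
def show_args(*args):
    """Return the type and contents of the packed positional arguments."""
    return type(args), args


print(show_args(2, 5, 10))  # (<class 'tuple'>, (2, 5, 10))
print(show_args())          # (<class 'tuple'>, ())
```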

So, we can pass an arbitrary (i.e., also none) number of positional arguments to product().

The product of just one number is the number itself.

In [81]:

product(42)

Out[81]:

42

Passing in several numbers works as expected.

In [82]:

product(2, 5, 10)

Out[82]:

100

However, this implementation of product() needs at least one argument due to the expression args[0] used internally. Otherwise, we see a runtime error, namely an IndexError. We emphasize that this error is raised inside the function's body and not by the header line.

In [83]:

product()

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[83], line 1
----> 1 product()

Cell In[80], line 3, in product(*args)
      1 def product(*args):
      2     """Multiply all arguments."""
----> 3     result = args[0]
      5     for arg in args[1:]:
      6         result *= arg

IndexError: tuple index out of range
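
One possible alternative, shown here only as a sketch and not as this notebook's version of product(), is to name the first parameter explicitly and pack only the remaining arguments. The hypothetical product_at_least_one() function below then fails with a TypeError during the call itself if no argument is passed in.

```python
def product_at_least_one(first, *rest):
    """Multiply all arguments; at least one argument is required."""
    result = first

    for arg in rest:  # rest is a tuple holding all further arguments
        result *= arg

    return result


print(product_at_least_one(42))        # 42
print(product_at_least_one(2, 5, 10))  # 100

try:
    product_at_least_one()  # fails before the function's body is executed
except TypeError as error:
    print("TypeError:", error)
```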

Another downside of this implementation is that we can easily generate semantic errors: For example, if we pass in an iterable object like the one_hundred list, no exception is raised. However, the return value is also not a numeric object as we expect. The reason for this is that during the function call, args becomes a tuple object holding one element, which is one_hundred, a list object. So, we created a nested structure by accident.

In [84]:

one_hundred = [2, 5, 10]

In [85]:

product(one_hundred)  # a semantic error!

Out[85]:

[2, 5, 10]

This error does not occur if we unpack one_hundred upon passing it as the argument.

In [86]:

product(*one_hundred)

Out[86]:

100

That is the equivalent of writing out the following tedious expression. Yet, that does not scale for iterables with many elements in them.

In [87]:

product(one_hundred[0], one_hundred[1], one_hundred[2])

Out[87]:

100

In the "Packing & Unpacking with Functions" exercise, we look at product() in more detail.

While we needed to unpack one_hundred above to avoid the semantic error, unpacking an argument in a function call may also be a convenience in general. For example, to print the elements of one_hundred in one line, we have needed a for statement until now. With unpacking, we get away without a loop.

In [88]:

print(one_hundred)  # prints the list; we do not want that

[2, 5, 10]

In [89]:

for number in one_hundred:
    print(number, end=" ")

2 5 10

In [90]:

print(*one_hundred)  # replaces the for-loop

2 5 10
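
Unpacking in a function call works with any function that accepts several positional arguments. As a sketch with a made-up bounds list, we pass the start, stop, and step values to the range() built-in in one go and combine unpacking with the sep keyword argument of print().

```python
bounds = [2, 11, 3]

# range(*bounds) is the same as range(2, 11, 3).
print(list(range(*bounds)))  # [2, 5, 8]

# Keyword arguments such as sep may still follow the unpacked iterable.
print(*bounds, sep=", ")     # 2, 11, 3
```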