Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
After introducing the Map-Filter-Reduce paradigm in the first part of this chapter, we first see how list comprehensions can replace the map() and filter() built-ins in many cases. Then, we learn how generator expressions are like list comprehensions but without materializing their elements in memory. We end this part with a short discussion of the built-in all() and any() functions.
list Comprehensions

Consider again the original "A simple Filter" example from Chapter 4, re-written such that both the mapping and the filtering are done in one for-loop instead of the two separate loops above.
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
evens_transformed = []
for number in numbers:
    if number % 2 == 0:
        evens_transformed.append((number ** 2) + 1)
sum(evens_transformed)
370
list comprehensions are expressions that derive new list objects out of existing ones (cf., the reference). Practically, this means that we place the for and if clauses inside brackets, [ and ].

So, the example above can be rewritten as a single expression, replacing the compound for statement.
[(n ** 2) + 1 for n in numbers if n % 2 == 0]
[65, 145, 5, 37, 101, 17]
A list comprehension may always be used in a place where otherwise a list object would work.

For example, let's rewrite the "A simple Filter" example from Chapter 4 in just one line. As a caveat, the code below materializes all elements in memory before summing them up and may, therefore, cause a MemoryError when executed with a bigger numbers list. We see with PythonTutor how a list object exists in memory at step 17 and then "gets lost" right after. As the next section shows, this downside may be mitigated.
sum([(n ** 2) + 1 for n in numbers if n % 2 == 0])
370
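As a small aside beyond the text, the same list comprehension may be passed to any other function that accepts a list object, for example, max():

max([(n ** 2) + 1 for n in numbers if n % 2 == 0])  # 145, the largest transformed even number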
List comprehensions may come with several for's and if's.

The cell below creates a list object that contains three other list objects with a series of numbers in them. The first and last numbers in each inner list object are offset by 1.
nested_numbers = [
    [1, 2, 3, 4, 5, 6, 7],
    [2, 3, 4, 5, 6, 7, 8],
    [3, 4, 5, 6, 7, 8, 9],
]
nested_numbers
[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9]]
To do something meaningful with the numbers, we have to remove the inner layer of list objects and flatten (i.e., "un-nest") the data. Without list comprehensions, we achieve that with two nested for-loops.
flat_numbers = []
for inner_numbers in nested_numbers:
    for number in inner_numbers:
        flat_numbers.append(number)
flat_numbers
[1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9]
That translates into a list comprehension like below. The order of the for's may be confusing at first but is the same as writing out the nested for-loops as above.
[number for inner_numbers in nested_numbers for number in inner_numbers]
[1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9]
sum([number for inner_numbers in nested_numbers for number in inner_numbers])
105
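As an aside not shown in the text, such a nested comprehension may also contain an if clause, for example, to keep only the even numbers while flattening:

[number for inner_numbers in nested_numbers for number in inner_numbers if number % 2 == 0]
# [2, 4, 6, 2, 4, 6, 8, 4, 6, 8]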
In this particular example, however, we can exploit the fact that any sum of numbers can be expressed as the sum of sums of mutually exclusive and collectively exhaustive subsets of these numbers and get away with just one for clause in the list comprehension.
sum([sum(inner_numbers) for inner_numbers in nested_numbers])
105
A popular use case of nested list comprehensions is applying a transformation to each n-tuple in the Cartesian product created from the elements of n iterables. In the generic illustration below, a transformation $f(x, y)$ is applied to each 2-tuple in the Cartesian product $X \times Y$, where $x$ is an element of $X = \{x_1, x_2, x_3\}$ and $y$ is an element of $Y = \{y_1, y_2, y_3\}$.
| Y \ X | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|
| $y_1$ | $f(x_1, y_1)$ | $f(x_2, y_1)$ | $f(x_3, y_1)$ |
| $y_2$ | $f(x_1, y_2)$ | $f(x_2, y_2)$ | $f(x_3, y_2)$ |
| $y_3$ | $f(x_1, y_3)$ | $f(x_2, y_3)$ | $f(x_3, y_3)$ |
For example, let's add 1 to each quotient obtained by taking the numerator x from [10, 20, 30] and the denominator y from [40, 50, 60], and then find the overall product. This transformation can be described mathematically as the function $z = f(x, y)$ below.

$z = f(x, y) = 1 + \frac{x}{y}$

Further, the table below visualizes the calculations: The result is the product of the nine entries.
| y \ x | 10 | 20 | 30 |
|---|---|---|---|
| 40 | 1.25 | 1.50 | 1.75 |
| 50 | 1.20 | 1.40 | 1.60 |
| 60 | 1.17 | 1.33 | 1.50 |
To express that in Python, we start by creating two list objects, first and second.
first = [10, 20, 30]
second = [40, 50, 60]
For a Cartesian product, we must loop over all possible 2-tuples where one element is drawn from first and the other from second. We achieve that with two nested for-loops, in which we calculate each quotient and append it to an initially empty cartesian_product list.
cartesian_product = []
for numerator in first:
    for denominator in second:
        quotient = 1 + (numerator / denominator)
        cartesian_product.append(quotient)
cartesian_product
[1.25, 1.2, 1.1666666666666667, 1.5, 1.4, 1.3333333333333333, 1.75, 1.6, 1.5]
Next, we convert the two explicit for-loops into a single list comprehension and use x and y as shorter variable names.
[1 + (x / y) for x in first for y in second]
[1.25, 1.2, 1.1666666666666667, 1.5, 1.4, 1.3333333333333333, 1.75, 1.6, 1.5]
The order of the for's is important: The list comprehension above divides numbers from first by numbers from second, whereas the list comprehension below does the opposite.
[1 + (x / y) for x in second for y in first]
[5.0, 3.0, 2.333333333333333, 6.0, 3.5, 2.666666666666667, 7.0, 4.0, 3.0]
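As an aside beyond the text, the standard library's itertools.product() function creates the same 2-tuples explicitly; some readers may find iterating over the pairs easier to follow than the two for clauses:

import itertools

[1 + (x / y) for x, y in itertools.product(first, second)]
# same result as [1 + (x / y) for x in first for y in second]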
To find the overall product, we unpack the list comprehension into the product() function from the "Function Definitions & Calls" sub-section in Chapter 7.
def product(*args):
"""Multiply all arguments."""
result = args[0]
for arg in args[1:]:
result *= arg
return result
product(*[1 + (x / y) for x in first for y in second])
20.58
Alternatively, the reduce() function from the functools module, combined with a lambda expression, computes the same product.

from functools import reduce
reduce(lambda x, y: x * y, [1 + (x / y) for x in first for y in second])
20.58
While this example is stylized, Cartesian products are hidden in many applications, and it shows how the various language features introduced in this chapter can be seamlessly combined to process sequential data.
generator Expressions

Because of their high memory consumption, Pythonistas avoid materialized list objects, and, thus, also list comprehensions, whenever possible. Instead, they prefer to work with generator expressions. Syntactically, they work like list comprehensions except that parentheses, ( and ), replace the brackets, [ and ].
Let's go back to the original "A simple Filter" example from Chapter 4 one more time, apply the transformation $y := x^2 + 1$ to all even numbers, and sum them up.
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
To filter and transform numbers, we wrote a list comprehension above ...
[(n ** 2) + 1 for n in numbers if n % 2 == 0]
[65, 145, 5, 37, 101, 17]
... that now becomes a generator expression.
((n ** 2) + 1 for n in numbers if n % 2 == 0)
<generator object <genexpr> at 0x7fcf703d84a0>
A generator expression evaluates to yet another "rule"-like object in memory that knows how to generate the individual objects in a series one by one. Whereas a list comprehension materializes its elements in memory when it is evaluated, the opposite holds true for generator expressions: No object is created in memory except the "rule" itself. Because of this behavior, we describe generator expressions as lazy and list comprehensions as eager.
To materialize the elements specified by a generator expression, we use the list() constructor as seen above.
list(((n ** 2) + 1 for n in numbers if n % 2 == 0))
[65, 145, 5, 37, 101, 17]
Whenever a generator expression is the only argument in a function call, we may merge the double parentheses, (( and )), into just ( and ).
list((n ** 2) + 1 for n in numbers if n % 2 == 0)
[65, 145, 5, 37, 101, 17]
A common use case is to reduce the elements into a single object instead, for example, by adding them up with sum() as in the original "A simple Filter" example. PythonTutor shows how the code cell below does not create a temporary list object in memory, whereas a list comprehension would do so (cf., step 17).
sum((n ** 2) + 1 for n in numbers if n % 2 == 0)
370
Let's assign the object to which the generator expression below evaluates to a variable gen and inspect it.
gen = ((n ** 2) + 1 for n in numbers if n % 2 == 0)
gen
<generator object <genexpr> at 0x7fcf703d8c10>
Unsurprisingly, generator expressions create objects of type generator.
type(gen)
generator
generator objects work just like the map and filter objects in the first part of this chapter. So, with the next() function, we can generate the elements one by one.
next(gen)
65
next(gen)
145
next(gen)
5
next(gen)
37
next(gen)
101
next(gen)
17
Once a generator object runs out of elements, it raises a StopIteration exception, and we say that the generator object is exhausted. To loop over its elements again, we must create a new one.
next(gen)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[36], line 1
----> 1 next(gen)

StopIteration:
next(gen)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[37], line 1
----> 1 next(gen)

StopIteration:
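As an aside not covered in the text, next() also accepts an optional second argument that is returned instead of raising StopIteration once the generator object is exhausted:

next(gen, None)  # returns None because gen is exhausted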
Calling the next() function repeatedly with the same generator object as the argument is essentially what a for-loop automates for us. So, generator objects are iterable. We look into this in detail further below in the "The for Statement (revisited)" section.
for number in ((n ** 2) + 1 for n in numbers if n % 2 == 0):
    print(number, end=" ")
65 145 5 37 101 17
If we are only interested in reducing nested_numbers into a single statistic, such as the overall sum in the "Nested Lists" example above, we should replace list objects and list comprehensions with generator expressions wherever possible. The result is the same, but no intermediate list objects are materialized! That makes our code scale to large amounts of data.

Let's adapt the nested_numbers example from above.
nested_numbers
[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9]]
Compared to the implementation with the list comprehension above, we simply remove the brackets, [ and ]: The argument to sum() becomes a generator expression.
sum(number for inner_numbers in nested_numbers for number in inner_numbers)
105
That also holds for the alternative formulation as a sum of sums.
sum(sum(inner_numbers) for inner_numbers in nested_numbers)
105
Because nested_numbers has an internal structure (i.e., the inner list objects are series of consecutive int objects), we can even provide an effectively memoryless implementation by expressing it as a generator expression derived from range objects. PythonTutor confirms that no list objects materialize at any point in time.
nested_numbers = ((range(x, y + 1)) for x, y in zip(range(1, 4), range(7, 10)))
nested_numbers
<generator object <genexpr> at 0x7fcf70347970>
sum(number for inner_numbers in nested_numbers for number in inner_numbers)
105
We must be careful when assigning a generator object to a variable: If we use nested_numbers again, for example, in the alternative formulation below, sum() returns 0 because nested_numbers is exhausted after executing the previous code cell. PythonTutor also shows that.
sum(sum(inner_numbers) for inner_numbers in nested_numbers)
0
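A common way around the exhaustion issue, sketched here as an aside and not part of the text, is to wrap the generator expression in a small helper function so that a fresh generator object can be created on demand; the name make_nested_numbers() is hypothetical:

def make_nested_numbers():
    """Return a fresh generator of the inner range objects."""
    return (range(x, y + 1) for x, y in zip(range(1, 4), range(7, 10)))

sum(sum(inner_numbers) for inner_numbers in make_nested_numbers())  # 105
sum(sum(inner_numbers) for inner_numbers in make_nested_numbers())  # 105 again, no exhaustion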
Let's also rewrite the "Working with Cartesian Products" example from above with generator expressions.
As a first optimization, we replace the materialized list objects, first and second, with memoryless range objects.
first = range(10, 31, 10) # "==" [10, 20, 30]
second = range(40, 61, 10) # "==" [40, 50, 60]
Of the two alternative solutions that follow, the first may be more appealing to many readers: In general, many practitioners seem to dislike lambda expressions.
In the first solution, we unpack the elements produced by (1 + (x / y) for x in first for y in second) into the product() function from the "Function Definitions & Calls" sub-section in Chapter 7. However, inside product(), the elements are packed into args, a materialized tuple object! So, all the memory efficiency gained by using a generator expression is lost! PythonTutor shows how a tuple object exists in steps 38-58.
product(*(1 + (x / y) for x in first for y in second)) # not memory efficient!
20.58
By contrast, the second solution, with the reduce() function from the functools module and the lambda expression, works without ever materializing all elements at the same time, and PythonTutor confirms that. So, only the second alternative is truly memory-efficient!
reduce(lambda x, y: x * y, (1 + (x / y) for x in first for y in second))
20.58
In summary, we learn from this example that unpacking generator objects may be a bad idea.
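As an aside beyond the text, Python 3.8 added the math.prod() function, which multiplies the elements of any iterable one by one; it accepts the generator expression directly and does not materialize the elements either:

from math import prod

prod(1 + (x / y) for x in first for y in second)  # roughly 20.58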
With the new concepts in this chapter, let's rewrite the book's introductory "Averaging all even Numbers in a List" example from Chapter 1 such that it efficiently handles a large sequence of numbers. We continue from its latest implementation, the average_evens() function in the "Keyword-only Arguments" section in Chapter 2.

We assume that average_evens() is called with a finite and iterable object that generates a stream of numeric objects that can be rounded to int objects. After all, the idea of even and odd numbers makes sense only in the context of whole numbers.
The generator expression (round(n) for n in numbers) implements the rounding, and, when it is evaluated during a function call, nothing happens except that a generator object is assigned to integers. Then, with the reduce() function from the functools module, we simultaneously add up and count the even numbers with the generator object to which the inner generator expression ((n, 1) for n in integers if n % 2 == 0) evaluates. That generator object takes the integers generator as its source and produces tuple objects consisting of the next even number in line and 1. Two such tuple objects are then iteratively passed to the function object to which the lambda expression evaluates. x represents the total and the count of the even numbers processed so far, while y's first element, y[0], is the next even number to be added to the running total. The result of calling reduce() is again a tuple object, namely the final total and count. Lastly, we calculate the simple average and scale it.
In summary, this implementation of average_evens() does not keep materialized list objects internally like its predecessors in Chapter 2 but processes the elements of the numbers argument on a one-by-one basis.
def average_evens(numbers, *, scalar=1):
"""Calculate the average of all even numbers.
Args:
numbers (iterable): a finite stream of the numbers to be averaged;
if non-whole numbers are provided, they are rounded
scalar (float, optional): multiplies the average; defaults to 1
Returns:
float: (scaled) average
"""
integers = (round(n) for n in numbers)
total, count = reduce(
lambda x, y: (x[0] + y[0], x[1] + y[1]),
((n, 1) for n in integers if n % 2 == 0)
)
return scalar * total / count
average_evens([7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4])
7.0
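To make the reduction step more concrete, here is a small hand-run illustration, not part of the original text, with the six (number, 1) tuples that the inner generator expression produces for this argument:

from functools import reduce

pairs = [(8, 1), (12, 1), (2, 1), (6, 1), (10, 1), (4, 1)]  # the even numbers, each paired with a 1
reduce(lambda x, y: (x[0] + y[0], x[1] + y[1]), pairs)  # (42, 6), and 42 / 6 == 7.0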
We may provide an optional scalar argument as before.
average_evens([7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4], scalar=2)
14.0
An argument with float objects works just as well.
average_evens([7., 11., 8., 5., 3., 12., 2., 6., 9., 10., 1., 4.])
7.0
To show that average_evens() can process a large stream of data, we simulate 10_000_000 randomly drawn integers between 0 and 100 with the randint() function from the random module. We use a generator expression derived from a range object as the numbers argument. So, at no point in time is there a materialized list object in memory. The result's being close to 50 is consistent with randint() drawing from a uniform distribution.
import random
random.seed(42)
average_evens(random.randint(0, 100) for _ in range(10_000_000))
49.994081434519636
To show that average_evens() filters out odd numbers, we simulate another stream of 10_000_000 randomly drawn odd integers between 1 and 99. As no function in the random module does that "out of the box," we must be creative: Doubling a number drawn from random.randint(0, 49) results in an even number between 0 and 98, and adding 1 makes it odd. Then, average_evens() raises a TypeError, essentially because the inner generator expression ((n, 1) for n in integers if n % 2 == 0) does not generate any element, and reduce() cannot process an empty iterable without an initial value.
average_evens(2 * random.randint(0, 49) + 1 for _ in range(10_000_000))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[56], line 1
----> 1 average_evens(2 * random.randint(0, 49) + 1 for _ in range(10_000_000))

Cell In[49], line 13, in average_evens(numbers, scalar)
      2 """Calculate the average of all even numbers.
      3
      4 Args:
   (...)
     10     float: (scaled) average
     11 """
     12 integers = (round(n) for n in numbers)
---> 13 total, count = reduce(
     14     lambda x, y: (x[0] + y[0], x[1] + y[1]),
     15     ((n, 1) for n in integers if n % 2 == 0)
     16 )
     17 return scalar * total / count

TypeError: reduce() of empty iterable with no initial value
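One possible way to fail more gracefully, sketched here as an aside and not part of the book's code, is to pass the initial value (0, 0) to reduce() and raise a descriptive error when no even numbers arrive; the name average_evens_defensive() is hypothetical:

from functools import reduce

def average_evens_defensive(numbers, *, scalar=1):
    """Like average_evens(), but complains explicitly if there are no even numbers."""
    integers = (round(n) for n in numbers)
    total, count = reduce(
        lambda x, y: (x[0] + y[0], x[1] + y[1]),
        ((n, 1) for n in integers if n % 2 == 0),
        (0, 0),  # the initial value avoids the TypeError on an empty stream
    )
    if count == 0:
        raise ValueError("numbers must contain at least one even number")
    return scalar * total / count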
tuple Comprehensions

There is no comprehension syntax to derive new tuple objects out of existing ones. However, we can mimic such a construct by combining the built-in tuple() constructor with a generator expression.

So, to convert the list comprehension [(n ** 2) + 1 for n in numbers if n % 2 == 0] from above into a "tuple comprehension," we write the following.
tuple((n ** 2) + 1 for n in numbers if n % 2 == 0)
(65, 145, 5, 37, 101, 17)
The all() and any() Built-ins

The built-in all() function checks whether every element of an iterable is truthy. To continue with the numbers example, let's check if the squares of all elements are below 100 or 150, respectively.

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
all(x ** 2 < 100 for x in numbers)
False
all(x ** 2 < 150 for x in numbers)
True
all() can be viewed as syntactic sugar replacing a for-loop: Internally, all() implements the short-circuiting strategy explained in Chapter 3, and we mimic that below by testing for the opposite condition in the if statement and stopping the for-loop early with the break statement. In the worst case, if threshold were, for example, 150, we would loop over all elements in the iterable argument, which must be finite for the code to work. So, all() is a linear search in disguise.
threshold = 100
for number in numbers:
    if number ** 2 >= threshold:  # the opposite of what we are checking for
        all_below_threshold = False
        break
else:
    all_below_threshold = True
all_below_threshold
False
The documentation of all() shows another way to express what it does in code: By placing a return statement inside a for-loop's body within a function, the iteration is stopped prematurely once an element does not meet the condition. That is the familiar early exit pattern at work.
def all_alt(iterable):
"""Alternative implementation of the built-in all() function."""
for element in iterable:
if not element: # the opposite of what we are checking for
return False
return True
all_alt(x ** 2 < 100 for x in numbers)
False
all_alt(x ** 2 < 150 for x in numbers)
True
Similarly, any() checks if at least one element in the iterable argument is truthy.

To continue the example, let's check if the square of any element in numbers is above 100 or 150, respectively.
any(x ** 2 > 100 for x in numbers)
True
any(x ** 2 > 150 for x in numbers)
False
The code cell below shows how any() works internally: It also follows the short-circuiting strategy. Here, we do not need to check for the opposite condition.
threshold = 100
for number in numbers:
    if number ** 2 > threshold:
        any_above_threshold = True
        break
else:
    any_above_threshold = False
any_above_threshold
True
The alternative formulation in the documentation of any() is straightforward and also uses the early exit pattern.
def any_alt(iterable):
"""Alternative implementation of the built-in any() function."""
for element in iterable:
if element:
return True
return False
any_alt(x ** 2 > 100 for x in numbers)
True
any_alt(x ** 2 > 150 for x in numbers)
False
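As a closing aside beyond the text, all() and any() mirror each other via negation: Asserting that all squares stay below a threshold is the same as asserting that no square reaches it.

not any(x ** 2 >= 100 for x in numbers)  # False, same as all(x ** 2 < 100 for x in numbers)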