
Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.

Chapter 4: Recursion & Looping (continued)

While what we learned about the for and while statements in the second part of this chapter suffices to translate any iterative algorithm into code, both statements come with some syntactic sugar to make life easier for the developer. This last part of the chapter shows how we can further customize the looping logic and introduces a "trick" for situations where we cannot come up with a stopping criterion in a while-loop.

Stopping Loops prematurely

This section introduces additional syntax to customize for and while statements in our code even further. It is mostly syntactic sugar: it does not let us express anything we could not express otherwise, but it makes our code more readable. We illustrate it for the for statement only. However, everything presented in this section also works for the while statement.

Example: Is the square of a number in `[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]` greater than `100`?

Let's say we have a list of numbers and want to check if the square of at least one of its elements is greater than 100. So, conceptually, we are asking whether a list of numbers as a whole satisfies a certain condition.

In [1]:

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

A first naive implementation could look like this: We loop over every element in numbers and set an indicator variable is_above, initialized as False, to True once we encounter an element satisfying the condition.

This implementation is inefficient: even if the first element in numbers has a square greater than 100, we loop until the last element, which could take a long time for a big list.

Moreover, we must initialize is_above before the for-loop and write some if-else logic after it to check the result. The actual business logic is not conveyed clearly.

In [2]:

is_above = False

for number in numbers:
    print(number, end="   ")  # added for didactical purposes
    if number ** 2 > 100:
        is_above = True

if is_above:
    print("=>   at least one number satisfies the condition")
else:
    print("=>   no number satisfies the condition")

7   11   8   5   3   12   2   6   9   10   1   4   =>   at least one number satisfies the condition

The `break` Statement

Python provides the break statement (cf., reference) that lets us stop a loop prematurely at any iteration. It is yet another means of controlling the flow of execution, and we say that we "break out of a loop."

In [3]:

is_above = False

for number in numbers:
    print(number, end="   ")  # added for didactical purposes
    if number ** 2 > 100:
        is_above = True
        break

if is_above:
    print("=>   at least one number satisfies the condition")
else:
    print("=>   no number satisfies the condition")

7   11   =>   at least one number satisfies the condition

This is a computational improvement. However, the code still consists of three sections: some initialization before the for-loop, the loop itself, and some finalizing logic. We prefer to convey the program's idea in one compound statement instead.
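To put a number on the improvement, the following minimal sketch counts how many elements the loop inspects with and without break. The helper count_checks() and its use_break flag are hypothetical names chosen for illustration only.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]


def count_checks(numbers, threshold, use_break):
    """Count how many elements the loop inspects before it ends."""
    checks = 0
    for number in numbers:
        checks += 1
        if number ** 2 > threshold and use_break:
            break  # stop right after the first element that qualifies
    return checks


print(count_checks(numbers, 100, use_break=False))  # 12
print(count_checks(numbers, 100, use_break=True))   # 2
```

With break, the loop stops after 11, the second element; without it, all twelve elements are visited.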

The `else`-clause

To express the logic in a prettier way, we add an else-clause at the end of the for-loop (cf., reference). The else-clause is executed only if the for-loop is not stopped by a break statement, that is, only if the loop ends because it ran out of elements (a break even in the last iteration skips the else-clause). The word "else" carries a somewhat unintuitive meaning here, and the clause might have been better named a then-clause. In most use cases, however, the else-clause logically goes together with some if statement in the loop's body.

Overall, the code's expressive power increases. Few programming languages support an optional else-clause for their for and while statements, and it turns out to be very useful in practice.
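The rule for when the else-clause runs has two edge cases worth seeing once. The sketch below uses a hypothetical helper, loop_with_else(), that records which parts of the loop execute: a break in the very last iteration still skips the else-clause, and a loop over an empty list runs it right away.

```python
def loop_with_else(items, stop_at):
    """Return a log of the visited items and whether the else-clause ran."""
    log = []
    for item in items:
        log.append(item)
        if item == stop_at:
            break
    else:
        # runs only if the loop finished without hitting break
        log.append("else")
    return log


print(loop_with_else([1, 2, 3], stop_at=2))  # [1, 2]
print(loop_with_else([1, 2, 3], stop_at=3))  # [1, 2, 3]
print(loop_with_else([1, 2, 3], stop_at=9))  # [1, 2, 3, 'else']
print(loop_with_else([], stop_at=9))         # ['else']
```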

In [4]:

for number in numbers:
    print(number, end="   ")  # added for didactical purposes
    if number ** 2 > 100:
        is_above = True
        break
else:
    is_above = False

if is_above:
    print("=>   at least one number satisfies the condition")
else:
    print("=>   no number satisfies the condition")

7   11   =>   at least one number satisfies the condition

Lastly, we incorporate the finalizing if-else logic into the for-loop, avoiding the is_above variable altogether.

In [5]:

for number in numbers:
    print(number, end="   ")  # added for didactical purposes
    if number ** 2 > 100:
        print("=>   at least one number satisfies the condition")
        break
else:
    print("=>   no number satisfies the condition")

7   11   =>   at least one number satisfies the condition

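Of course, if we raise the threshold an element's square has to exceed, for example, to 200, no element satisfies the condition, and we have to loop over all numbers. For an unsorted list, there is no way to optimize this linear search further: in the worst case, every element must be checked.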

In [6]:

for number in numbers:
    print(number, end="   ")  # added for didactical purposes
    if number ** 2 > 200:
        print("=>   at least one number satisfies the condition")
        break
else:
    print("=>   no number satisfies the condition")

7   11   8   5   3   12   2   6   9   10   1   4   =>   no number satisfies the condition
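As stated at the beginning of this section, break and the else-clause also work with the while statement. The sketch below packages the search into a reusable function built on a while-loop; the name has_square_above() and its threshold parameter are illustrative choices, not part of the original code.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]


def has_square_above(numbers, threshold):
    """Check if the square of at least one element exceeds threshold."""
    i = 0
    while i < len(numbers):
        if numbers[i] ** 2 > threshold:
            result = True
            break  # skips the else-clause below
        i += 1
    else:
        # the loop condition became False without a break
        result = False
    return result


print(has_square_above(numbers, 100))  # True
print(has_square_above(numbers, 200))  # False
```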

A first Glance at the Map-Filter-Reduce Paradigm

Often, we process some iterable with numeric data, for example, a list of numbers as in this book's introductory example in Chapter 1 or, more realistically, data from a CSV file with many rows and columns.

Processing numeric data usually comes down to operations that may be grouped into one of the following three categories:

mapping: transform a number according to some functional relationship $y = f(x)$
filtering: throw away individual numbers (e.g., statistical outliers in a sample)
reducing: collect individual numbers into summary statistics
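To make the three categories concrete, here is a minimal sketch that applies each one on its own to the numbers from above, using nothing but for-loops. The names mapped and evens are chosen for illustration; the examples that follow combine the steps in a single loop instead.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

# mapping: transform every number with y = f(x) = x ** 2 + 1
mapped = []
for number in numbers:
    mapped.append(number ** 2 + 1)

# filtering: keep only the even numbers, throwing the others away
evens = []
for number in numbers:
    if number % 2 == 0:
        evens.append(number)

# reducing: collapse all numbers into one summary statistic, their sum
total = 0
for number in numbers:
    total += number

print(mapped)  # [50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]
print(evens)   # [8, 12, 2, 6, 10, 4]
print(total)   # 78
```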

We study this map-filter-reduce paradigm extensively in Chapter 8 after introducing more advanced data types that are needed to work with "big" data.

Here, we focus on filtering out some numbers in a for-loop.

Example: A simple Filter

Calculate the sum of all even numbers in [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4] after squaring them and adding 1 to the squares:

"all" => loop over an iterable
"even" => filter out the odd numbers
"square and add $1$" => apply the map $y = f(x) = x^2 + 1$
"sum" => reduce the remaining and mapped numbers to their sum

In [7]:

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

In [8]:

total = 0

for number in numbers:
    if number % 2 == 0:  # only keep even numbers
        square = (number ** 2) + 1
        print(number, "->", square, end="   ")  # added for didactical purposes
        total += square

total

8 -> 65   12 -> 145   2 -> 5   6 -> 37   10 -> 101   4 -> 17

Out[8]:

370

The above code is easy to read as it involves only two levels of indentation.

In general, code gets harder to comprehend the more horizontal space it occupies. It is commonly considered good practice to grow a program vertically rather than horizontally. PEP 8 even limits lines to at most 79 characters!

Consider the next example, whose implementation in code already starts to look unbalanced.

Example: Several Filters

Calculate the sum of every third and even number in [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4] after squaring them and adding 1 to the squares:

"every" => loop over an iterable
"third" => filter out all numbers except every third
"even" => filter out the odd numbers
"square and add $1$" => apply the map $y = f(x) = x^2 + 1$
"sum" => reduce the remaining and mapped numbers to their sum

In [9]:

total = 0

for i, number in enumerate(numbers, start=1):
    if i % 3 == 0:  # only keep every third number
        if number % 2 == 0:  # only keep even numbers
            square = (number ** 2) + 1
            print(number, "->", square, end="   ")  # added for didactical purposes 
            total += square

total

8 -> 65   12 -> 145   4 -> 17

Out[9]:

227

With already three levels of indentation, less horizontal space is available for the actual code block. Of course, one could flatten the two if statements with the logical and operator, as shown in Chapter 3. Then, however, we trade horizontal space for a more "complex" if logic, which is not a real improvement.
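For comparison, here is a sketch of the flattened variant with the and operator: it saves one level of indentation at the cost of a compound condition that mixes two unrelated filters.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

total = 0

for i, number in enumerate(numbers, start=1):
    # both filters merged into a single condition
    if i % 3 == 0 and number % 2 == 0:
        square = (number ** 2) + 1
        total += square

print(total)  # 227
```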

The `continue` Statement

A Pythonista would instead make use of the continue statement (cf., reference), which makes a loop skip the rest of its code block and jump to the next iteration.

The revised code fragment below occupies more vertical space and less horizontal space: A good trade-off.

One caveat is that we need to negate the conditions in the if statements. Conceptually, we are now filtering "out" and not "in."

In [10]:

total = 0

for i, number in enumerate(numbers, start=1):
    if i % 3 != 0:  # skip all but every third number
        continue
    elif number % 2 != 0:  # skip odd numbers
        continue

    square = (number ** 2) + 1
    print(number, "->", square, end="   ")  # added for didactical purposes 
    total += square

total

8 -> 65   12 -> 145   4 -> 17

Out[10]:

227

This is yet another illustration of why programming is an art. The two preceding code cells do the same thing with identical time complexity. However, the latter is arguably easier for a human to read, even more so when the business logic grows beyond two filters.
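The continue statement also works in a while-loop, but there is a caveat: a manually managed counter must be advanced before continue is reached, or the loop never makes progress. A minimal sketch of the same two filters with an explicit index i:

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

total = 0
i = 0

while i < len(numbers):
    number = numbers[i]
    i += 1  # advance first: a continue below would otherwise skip this line
    if i % 3 != 0:  # i now counts from 1, like enumerate(..., start=1)
        continue
    elif number % 2 != 0:
        continue

    total += (number ** 2) + 1

print(total)  # 227
```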

Indefinite Loops

Sometimes we find ourselves in situations where we cannot know ahead of time how often or until which point in time a code block is to be executed.

Example: Guessing a Coin Toss

Let's consider a game where we randomly set a variable to either "Heads" or "Tails" and the user of our program has to guess it.

Python provides the built-in input() function that prints a message to the user, called the prompt, and reads in what was typed in response as a str object. We use it to process a user's "unreliable" input to our program (i.e., a user might type in some invalid response). Further, we use the random() function in the random module to model the coin toss.

A popular pattern for such indefinite loops is a while True statement, which on its own would cause Python to enter an infinite loop. Then, once a particular event occurs, we break out of the loop.
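The pattern does not depend on user input. As a non-interactive sketch, the hypothetical function below rolls a simulated six-sided die until it shows a six; how many rolls that takes cannot be known in advance. It draws from its own random.Random instance, an assumption made here so the example does not interfere with the module-level seed set via random.seed().

```python
import random


def roll_until_six(rng):
    """Roll a die until it shows 6 and return the number of rolls."""
    rolls = 0
    while True:  # no stopping criterion known ahead of time
        rolls += 1
        if rng.randint(1, 6) == 6:  # randint() includes both end points
            break  # the particular event occurred
    return rolls


rng = random.Random(42)  # seeded so that reruns give the same result
print(roll_until_six(rng))
```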

Let's look at a first, naive implementation.

In [11]:

import random

In [12]:

random.seed(42)

In [13]:

while True:
    guess = input("Guess if the coin comes up as heads or tails: ")

    if random.random() < 0.5:
        if guess == "heads":
            print("Yes, it was heads")
            break
        else:
            print("Ooops, it was heads")
    else:
        if guess == "tails":
            print("Yes, it was tails")
            break
        else:
            print("Ooops, it was tails")

Ooops, it was tails

Yes, it was heads

This version exhibits two severe issues that we should address:

If a user enters anything other than "heads" or "tails", for example, "Heads" or "Tails", the program keeps running without the user knowing about the mistake!
The code intermingles the coin tossing with the processing of the user's input: Mixing unrelated business logic in the same code block makes a program harder to read and, more importantly, maintain in the long run.

Example: Guessing a Coin Toss (revisited)

Let's refactor the code and make it modular.

First, we divide the business logic into two functions get_guess() and toss_coin() that are controlled from within a while-loop.

get_guess() not only reads in the user's input but also implements a simple input validation pattern: the .strip() and .lower() methods remove leading and trailing whitespace and convert the input to lower case, so the user may spell the input in any way (e.g., all upper or all lower case). Also, get_guess() checks if the user entered one of the two valid options. If so, it returns either "heads" or "tails"; if not, it returns None.

In [14]:

def get_guess():
    """Process the user's input.
    
    Returns:
        guess (str / NoneType): either "heads" or "tails"
            if the input can be parsed and None otherwise
    """
    guess = input("Guess if the coin comes up as heads or tails: ")
    # handle frequent cases of "misspelled" user input
    guess = guess.strip().lower()

    if guess in ["heads", "tails"]:
        return guess
    return None
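Because input() waits for a person at the keyboard, the validation step is hard to test on its own. One option, sketched below with the hypothetical name parse_guess(), is to move the parsing into a function that receives the raw text as an argument; get_guess() could then simply return parse_guess(input(...)).

```python
def parse_guess(text):
    """Normalize raw input text to "heads", "tails", or None."""
    guess = text.strip().lower()  # drop surrounding whitespace, ignore case
    if guess in ["heads", "tails"]:
        return guess
    return None


print(parse_guess("  HeAdS "))  # heads
print(parse_guess("tails"))     # tails
print(parse_guess("head"))      # None
```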

toss_coin() models a fair coin toss when called with default arguments.

In [15]:

def toss_coin(p_heads=0.5):
    """Simulate the tossing of a coin.

    Args:
        p_heads (optional, float): probability that the coin comes up "heads";
            defaults to 0.5 resembling a fair coin

    Returns:
        side_on_top (str): "heads" or "tails"
    """
    if random.random() < p_heads:
        return "heads"
    return "tails"
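Since toss_coin() exposes p_heads, we can sanity-check it by simulation. The sketch below repeats the function so that it runs on its own, tosses a biased coin 10,000 times, and reports the share of heads, which should land close to p_heads; the sample size and the value 0.7 are arbitrary choices.

```python
import random


def toss_coin(p_heads=0.5):
    """Same logic as above: "heads" with probability p_heads."""
    if random.random() < p_heads:
        return "heads"
    return "tails"


random.seed(42)
n = 10_000
heads = 0
for _ in range(n):
    if toss_coin(p_heads=0.7) == "heads":
        heads += 1

share = heads / n
print(share)  # close to 0.7
```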

Second, we rewrite the if-else logic to explicitly handle the case where get_guess() returns None: whenever the user enters something invalid, a warning is shown, and another try is granted. We use the is operator and not the == operator because None is a singleton object.
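Why does the choice of operator matter? A class can customize what == means, whereas is checks whether two names refer to the very same object, and there is only one None object. A small sketch with a deliberately odd, hypothetical class:

```python
class AlwaysEqual:
    """Hypothetical class whose == comparison always returns True."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: == defers to the custom __eq__() method
print(obj is None)  # False: is compares object identity
```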

The while-loop takes on the role of glue code that manages how other parts of the program interact with each other.

In [16]:

random.seed(42)

In [17]:

while True:
    guess = get_guess()
    result = toss_coin()

    if guess is None:
        print("Make sure to enter your guess correctly!")
    elif guess == result:
        print("Yes, it was", result)
        break
    else:
        print("Ooops, it was", result)

Make sure to enter your guess correctly!

Yes, it was heads

Now, the program's business logic is expressed more clearly. More importantly, we can now change it more easily. For example, we could make the toss_coin() function base the tossing on a probability distribution other than the uniform one (i.e., replace the random.random() function with another one). In general, a modular architecture makes software easier to maintain.
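To illustrate how the modular design pays off, the sketch below wraps the glue code in a hypothetical function play_game() that receives the two helpers as arguments and returns the number of rounds played. This lets us replay a game with scripted guesses and tosses instead of a live user, which is handy for testing.

```python
def play_game(get_guess, toss_coin):
    """Run the guessing loop and return the number of rounds played.

    get_guess and toss_coin are passed in as arguments, so either one
    can be swapped out without touching the loop itself.
    """
    rounds = 0
    while True:
        rounds += 1
        guess = get_guess()
        result = toss_coin()

        if guess is None:
            print("Make sure to enter your guess correctly!")
        elif guess == result:
            print("Yes, it was", result)
            break
        else:
            print("Ooops, it was", result)
    return rounds


# scripted stand-ins for the user and the coin (for illustration only)
scripted_guesses = iter([None, "heads", "tails"])
scripted_tosses = iter(["heads", "tails", "tails"])

rounds = play_game(lambda: next(scripted_guesses), lambda: next(scripted_tosses))
print(rounds)  # 3
```

The first round is rejected as invalid, the second guess is wrong, and the third one matches, so the loop ends after three rounds.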