Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
Just as Chapter 7 classifies concrete data types like list or str by how they behave abstractly in a given context, we do the same for the data types introduced in this chapter.
Here, the map, filter, and generator types all behave like "rules" in memory that govern how objects are produced "on the fly." Their main commonality is their support for the built-in next() function. In computer science terminology, such data types are called iterators, and the collections.abc module formalizes them with the Iterator ABC in Python.
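As a small illustration of the paragraph above, the sketch below builds one object of each of the three types and checks them against the Iterator ABC. The variable names (mapper, filterer, generator) are chosen here for illustration only.

```python
import collections.abc as abc

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

# Three "rules" that produce objects on demand instead of storing them.
mapper = map(lambda x: x ** 2, numbers)
filterer = filter(lambda x: x % 2 == 0, numbers)
generator = (x + 1 for x in numbers)

# All three satisfy the Iterator ABC ...
all_iterators = all(
    isinstance(obj, abc.Iterator) for obj in (mapper, filterer, generator)
)

# ... and, therefore, all three support next().
firsts = (next(mapper), next(filterer), next(generator))

print(all_iterators, firsts)  # True (49, 8, 8)
```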
So, one example of an iterator is evens_transformed below, an object of type generator.
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
evens_transformed = ((x ** 2) + 1 for x in numbers if x % 2 == 0)
Let's first confirm that evens_transformed is indeed an Iterator, "abstractly speaking."
import collections.abc as abc
isinstance(evens_transformed, abc.Iterator)
True
In Python, iterators are always also iterables. The reverse is not true! To be precise, iterators are specializations of iterables. That is what the "Inherits from" column means in the collections.abc module's documentation.
isinstance(evens_transformed, abc.Iterable)
True
Furthermore, we sharpen our definition of an iterable from Chapter 7: Just as we define an iterator to be any object that supports the next() function, we define an iterable to be any object that supports the built-in iter() function.
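To make the two definitions concrete, here is a minimal sketch showing that a list object is an iterable but not an iterator: iter() accepts it, whereas next() rejects it. The flag next_failed is introduced only for this illustration.

```python
import collections.abc as abc

numbers = [7, 11, 8]

is_iterable = isinstance(numbers, abc.Iterable)  # supports iter()
is_iterator = isinstance(numbers, abc.Iterator)  # would need to support next()

# Calling next() directly on a list raises a TypeError.
try:
    next(numbers)
except TypeError:
    next_failed = True
else:
    next_failed = False

# Going through iter() first works.
first = next(iter(numbers))

print(is_iterable, is_iterator, next_failed, first)  # True False True 7
```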
The confused reader may now be wondering how the two concepts relate to each other.
In short, the iter() function is the general way to create an iterator object out of a given iterable object. Then, the iterator object manages the iteration over the iterable object. In real-world code, we hardly ever see iter() because Python calls it for us in the background.
For illustration, let's do that ourselves and create two iterators out of the iterable numbers and see what we can do with them.
iterator1 = iter(numbers)
iterator2 = iter(numbers)
iterator1 and iterator2 are of type list_iterator.
type(iterator1)
list_iterator
Iterators are useful for only one thing: getting the next object from the associated iterable.
By calling next() three times with iterator1 as the argument, we obtain the first three elements of numbers.
next(iterator1), next(iterator1), next(iterator1)
(7, 11, 8)
iterator1 and iterator2 keep their states separate. So, we could loop over the same iterable several times in parallel.
next(iterator1), next(iterator2)
(5, 7)
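One everyday situation where separate iterator states matter is a nested loop over the same list: each for-loop obtains its own iterator, so the inner loop does not disturb the outer one. The sketch below uses a shortened list for brevity.

```python
numbers = [1, 2, 3]
pairs = []

for x in numbers:      # the outer loop works with its own iterator
    for y in numbers:  # the inner loop gets a fresh iterator on every pass
        pairs.append((x, y))

print(len(pairs))  # 9
print(pairs[:4])   # [(1, 1), (1, 2), (1, 3), (2, 1)]
```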
We can also play a "trick" and exchange some elements in numbers. iterator1 and iterator2 hold no copies of the elements: their positions are unaffected by the changes, and they present us with the new elements once they reach the changed indexes. So, iterators not only have state of their own but also keep it separate from the underlying iterable.
numbers[1], numbers[4] = 99, 99
next(iterator1), next(iterator2)
(99, 99)
Let's re-assign the elements in numbers so that they are in order. After that, the numbers returned from next() also tell us how often next() was called with iterator1 or iterator2. We conclude that list_iterator objects work by simply keeping track of the last index obtained from the underlying iterable.
numbers[:] = list(range(1, 13))
numbers
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
next(iterator1), next(iterator2)
(6, 3)
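The conclusion above can be sketched in pure Python. The class ListIteratorSketch below is hypothetical and is not how CPython implements list_iterator; it only mimics the observed behavior by storing a reference to the list plus an index.

```python
class ListIteratorSketch:
    """Hypothetical stand-in for list_iterator (illustration only)."""

    def __init__(self, data):
        self._data = data  # a reference to the list, not a copy
        self._index = 0    # position of the next element to hand out

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._data):
            raise StopIteration
        element = self._data[self._index]
        self._index += 1
        return element


numbers = [7, 11, 8, 5]
it1 = ListIteratorSketch(numbers)
it2 = ListIteratorSketch(numbers)

first_three = (next(it1), next(it1), next(it1))

# Mutating the list is visible to the iterator because it looks up
# elements by index at the time next() is called.
numbers[3] = 99
after_mutation = (next(it1), next(it2))

print(first_three, after_mutation)  # (7, 11, 8) (99, 7)
```

Because only the index lives inside each object, the two iterators advance independently while sharing the same underlying list.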
With the concepts introduced in this section, we can now understand the first sentence in the documentation on the zip() built-in better: "Make an iterator that aggregates elements from each of the iterables."
Because iterators are always also iterables, we may pass iterator1 and iterator2 as arguments to zip().
The returned zipper object is of type zip and, "abstractly speaking," an Iterator as well.
zipper = zip(iterator1, iterator2)
zipper
<zip at 0x7f2788118b00>
type(zipper)
zip
isinstance(zipper, abc.Iterator)
True
zipper_iterator = iter(zipper)
zipper_iterator
<zip at 0x7f2788118b00>
zipper_iterator references the same object as zipper! That is true for iterators in general: Any iterator created from an existing iterator with iter() is the iterator itself.
zipper is zipper_iterator
True
The Python core developers made that design decision so that iterators may also be looped over.
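A practical consequence of that design decision, sketched below with a fresh list: a for-loop (or a comprehension) over a partially consumed iterator simply continues where the iterator left off, and a second pass yields nothing.

```python
numbers = [1, 2, 3, 4, 5]
iterator = iter(numbers)

# iter() on an iterator hands back the very same object.
same_object = iter(iterator) is iterator

next(iterator)  # consume the first element by hand

remaining = [n for n in iterator]  # continues after the consumed element
second_pass = list(iterator)       # the iterator is exhausted by now
fresh_pass = list(numbers)         # the list itself can be looped over again

print(same_object, remaining, second_pass, fresh_pass)
# True [2, 3, 4, 5] [] [1, 2, 3, 4, 5]
```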
The for
-loop below prints out only six more tuple
objects derived from the now ordered numbers
because the iterator1
object hidden inside zipper
already returned its first six elements. So, the respective first elements of the tuple
objects printed range from 7
to 12
. Similarly, as iterator2
already returned its first three elements from numbers
, we see the respective second elements in the range from 4
to 9
.
for x, y in zipper:
    print(x, "and", y, end=" ")
7 and 4 8 and 5 9 and 6 10 and 7 11 and 8 12 and 9
zipper is now exhausted. So, the for-loop below does not run a single iteration.
for x, y in zipper:
    raise RuntimeError("We won't see this error")
We verify that iterator1 is exhausted by passing it to next() again, which raises a StopIteration exception.
next(iterator1)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[25], line 1
----> 1 next(iterator1)

StopIteration:
On the contrary, iterator2 is not yet exhausted.
next(iterator2)
10
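That iterator2 continues exactly at 10 depends on the order of the arguments: zip() asks its arguments for their next elements from left to right and stops as soon as one of them is exhausted. The following sketch, with made-up example data, shows that an element can be lost when the longer iterator comes first.

```python
# Shorter iterator first: zip() notices the exhaustion before
# touching the longer iterator again.
short = iter([1, 2])
long_ = iter([10, 20, 30, 40])
pairs = list(zip(short, long_))
next_long = next(long_)

# Longer iterator first: zip() already fetched 30 when it discovers
# that the shorter iterator is exhausted, so 30 is discarded.
long2 = iter([10, 20, 30, 40])
short2 = iter([1, 2])
pairs2 = list(zip(long2, short2))
next_long2 = next(long2)

print(pairs, next_long)    # [(1, 10), (2, 20)] 30
print(pairs2, next_long2)  # [(10, 1), (20, 2)] 40
```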
Understanding iterators and iterables is helpful for any data science practitioner who deals with large amounts of data. Even without that, these two terms occur everywhere in Python-related texts and documentation. So, a beginner should regularly review this section until it becomes second nature.
The for Statement (revisited)
In Chapter 4, we argue that the for statement is syntactic sugar, replacing the while statement in many common scenarios. In particular, a for-loop saves us two tasks: managing an index variable and obtaining the individual elements by indexing. In this sub-section, we look at a more realistic picture, using the new terminology as well.
Let's print out the elements of a list object as the iterable being looped over.
iterable = [0, 1, 2, 3, 4]
for element in iterable:
    print(element, end=" ")
0 1 2 3 4
Our previous and equivalent formulation with a while statement is like so.
index = 0
while index < len(iterable):
    element = iterable[index]
    print(element, end=" ")
    index += 1
del index
0 1 2 3 4
What actually happens behind the scenes in the Python interpreter is shown below.
First, Python calls iter() with the iterable to be looped over as the argument. The returned iterator contains the entire logic of how the iterable is looped over. In particular, the iterator may or may not pick the iterable's elements in a predictable order. That is up to the "rule" it models.
Second, Python enters an indefinite while-loop. It tries to obtain the next element with next(). If that succeeds, the for-loop's code block is executed. Below, that code is placed within the else-clause that runs only if no exception is raised in the try-clause. Then, Python jumps into the next iteration and tries to obtain the next element from the iterator, and so on. Once the iterator is exhausted, it raises a StopIteration exception, and Python stops the while-loop with the break statement.
iterator = iter(iterable)
while True:
    try:
        element = next(iterator)
    except StopIteration:
        break
    else:
        print(element, end=" ")
del iterator
0 1 2 3 4
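Since the pattern above relies only on iter() and next(), it works for any iterable, not just list objects. As a sketch, the hypothetical helper for_each() below wraps the same while-loop and is applied to a list, a range, and a generator.

```python
def for_each(iterable, action):
    """Hypothetical helper: call action(element) like a for-loop would."""
    iterator = iter(iterable)
    while True:
        try:
            element = next(iterator)
        except StopIteration:
            break
        else:
            action(element)


collected = []
for_each([0, 1, 2], collected.append)                   # a list
for_each(range(3, 5), collected.append)                 # a range
for_each((x * 10 for x in range(2)), collected.append)  # a generator

print(collected)  # [0, 1, 2, 3, 4, 0, 10]
```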
Now that we know the concept of an iterator, let's compare some of the built-ins introduced in Chapter 7 in detail and make sure we understand what is going on in memory. This section also serves as a summary of all the concepts in this chapter.
We use two simple examples, numbers and memoryless: numbers creates thirteen objects in memory and memoryless only one (cf., PythonTutor).
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
memoryless = range(1, 13)
The sorted() function takes a finite iterable argument and materializes its elements into a new list object that is returned.
The argument may already be materialized, as is the case with numbers, but may also be an iterable without any objects in it, such as memoryless. In both cases, we end up with materialized list objects with the elements sorted in forward order (cf., PythonTutor).
sorted(numbers)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
sorted(memoryless)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
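Because sorted() accepts any finite iterable, it also works with an iterator such as a generator. The sketch below, using an illustrative generator named squares, shows that sorting consumes the iterator in the process.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

# A generator producing the squares of the even numbers on the fly.
squares = (x ** 2 for x in numbers if x % 2 == 0)

result = sorted(squares)  # materializes all elements into a new list
leftover = list(squares)  # nothing is left: the generator is exhausted

print(result)    # [4, 16, 36, 64, 100, 144]
print(leftover)  # []
```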
By adding a keyword-only argument reverse=True, the materialized list objects are sorted in reverse order (cf., PythonTutor).
sorted(numbers, reverse=True)
[12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
sorted(memoryless, reverse=True)
[12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
The order in numbers remains unchanged, and memoryless is still not materialized.
numbers
[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
memoryless
range(1, 13)
The reversed() built-in takes a sequence argument and returns an iterator. The argument must be finite and reversible (i.e., iterable in reverse order) as otherwise reversed() could neither determine the last element that becomes the first nor loop in a predictable backward fashion. PythonTutor confirms that reversed() does not materialize any elements but only returns an iterator.
Side Note: Even though range objects, like memoryless here, do not "contain" references to other objects, they count as sequence types, and as such, they are also container types. The in operator works with range objects because, for an int to be checked, Python can determine arithmetically if it lies within the range object's start and stop values, taking a potential step value into account (cf., this blog post for more details on the range() built-in).
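The side note can be checked with a few expressions. The sketch below tests a range object against the Sequence and Container ABCs and uses len(), indexing, and the in operator on it, including a second range with a step of 2 made up for this example.

```python
import collections.abc as abc

memoryless = range(1, 13)

is_sequence = isinstance(memoryless, abc.Sequence)
is_container = isinstance(memoryless, abc.Container)

length = len(memoryless)
ends = (memoryless[0], memoryless[-1])

# Membership tests take the step value into account.
odds = range(1, 13, 2)
membership = (5 in memoryless, 13 in memoryless, 7 in odds, 8 in odds)

print(is_sequence, is_container, length, ends)  # True True 12 (1, 12)
print(membership)                               # (True, False, True, False)
```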
reversed(numbers)
<list_reverseiterator at 0x7f2788158880>
reversed(memoryless)
<range_iterator at 0x7f2788158fc0>
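To see why the argument must be reversible, the sketch below passes a generator, which can only be consumed front to back, to reversed(). That raises a TypeError. Materializing the generator first is one possible workaround, at the cost of the memory for the resulting list.

```python
numbers = [7, 11, 8]
generator = (x for x in numbers)

# A generator has no notion of a "last" element to start from.
try:
    reversed(generator)
except TypeError:
    not_reversible = True
else:
    not_reversible = False

# Workaround: materialize the elements first, then reverse.
backwards = list(reversed(list(generator)))

print(not_reversible, backwards)  # True [8, 11, 7]
```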
To materialize the elements, we can pass the returned iterators to, for example, the list() or tuple() constructors. That creates new list and tuple objects (cf., PythonTutor).
To reiterate some more new terminology from this chapter, we describe reversed() as lazy whereas list() and tuple() are eager. The former has no significant side effect in memory, while the latter may require a lot of memory.
list(reversed(numbers))
[4, 1, 10, 9, 6, 2, 12, 3, 5, 8, 11, 7]
tuple(reversed(memoryless))
(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
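The difference in memory use can be made tangible with sys.getsizeof(). The sketch below uses a range of one million numbers as an assumed example size. The exact byte counts depend on the Python version and platform, so only their relation is compared.

```python
import sys

big = range(1_000_000)

lazy = reversed(big)         # only a small iterator object
eager = list(reversed(big))  # one reference per element is stored

# getsizeof() reports the size of the object itself; for the list, that
# covers the stored references but not the int objects they point to.
lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)

print(lazy_size < eager_size)  # True
print(next(lazy), eager[0])    # 999999 999999
```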
Of course, we can also loop over the returned iterators instead.
That works because iterators are always iterable; in particular, as the previous "The for Statement (revisited)" sub-section explains, the for-loops below call iter(reversed(numbers)) and iter(reversed(memoryless)) behind the scenes. However, the iterators returned by iter() are the same as the reversed(numbers) and reversed(memoryless) iterators passed in! In summary, the for-loops below involve many subtleties that together make Python the expressive language it is.
for number in reversed(numbers):
    print(number, end=" ")
4 1 10 9 6 2 12 3 5 8 11 7
for element in reversed(memoryless):
    print(element, end=" ")
12 11 10 9 8 7 6 5 4 3 2 1
As with sorted(), the reversed() built-in does not mutate its argument.
numbers
[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
memoryless
range(1, 13)
To point out the potentially obvious, we compare the results of sorting numbers in reverse order with reversing it: These are different concepts!
numbers
[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
sorted(numbers, reverse=True)
[12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
list(reversed(numbers))
[4, 1, 10, 9, 6, 2, 12, 3, 5, 8, 11, 7]
Whereas neither sorted() nor reversed() mutates its argument, the mutable list type comes with two methods, sort() and reverse(), that implement the same logic but mutate an object, like numbers below, in place. To indicate that all changes occur in place, the sort() and reverse() methods always return None, which is not shown in JupyterLab.
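The None return value is a common source of bugs when the result of sort() is assigned to a variable by mistake. A minimal sketch with a short example list:

```python
numbers = [3, 1, 2]

# The method sorts in place and returns None ...
result = numbers.sort()

# ... whereas the built-in leaves its argument alone and returns a new list.
new_list = sorted(numbers, reverse=True)

print(result)    # None
print(numbers)   # [1, 2, 3]
print(new_list)  # [3, 2, 1]
```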
numbers
[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
The reverse() method on the list type is eager, as opposed to the lazy reversed() built-in. That means the mutations caused by the reverse() method are written into memory right away.
numbers.reverse()
numbers
[4, 1, 10, 9, 6, 2, 12, 3, 5, 8, 11, 7]
Sorting numbers in place occurs eagerly.
numbers.sort(reverse=True)
numbers
[12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
numbers.sort()
numbers
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]