The two are closely related: each time you iterate over an iterable, the interpreter actually creates a new iterator from the iterable, and loops over that. The mechanism is quite simple, but until you understand the details it can seem a little confusing.
Authors often keep these topics in the advanced sections of their books or articles on Python.
Think of a for loop as saying: "do this with every one of these."
This applies generally to anything iterated over with the for keyword.
P.S. This is not relevant for while loops, which simply repeat while a condition holds.
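As a quick illustration (a sketch, not from the original notebook), here is the same for-loop syntax consuming several different built-in iterables. The names samples and seen are made up for this example; sets are left out only because their iteration order is not guaranteed.

```python
# The same for-loop syntax works on any iterable, not just lists.
samples = {
    "str": "abc",
    "tuple": (1, 2),
    "dict": {"a": 1, "b": 2},  # iterating over a dict yields its keys
    "range": range(3),
}

seen = {}
for name, iterable in samples.items():
    items = []
    for item in iterable:  # each of these loops uses the iteration protocol
        items.append(item)
    seen[name] = items

print(seen)
```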
Let's now make a list of programming languages for obvious reasons 😅
def do_this_with(o):
    print("!", o, "!")

lang_list = ["Python", "Go", "Java"]
Originally (well, certainly in Python 1.5), for-loop iteration over an object x was quite simplistic. The interpreter would internally create a hidden integer variable, then repeatedly index x with it (by calling x's __getitem__ method with the hidden variable as the argument), incrementing it to produce successive values until the call raised an IndexError exception, which terminated the loop.
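# Emulating the old protocol by hand
private_var = 0  # the interpreter's hidden index
while True:
    try:
        i = lang_list.__getitem__(private_var)
    except IndexError:  # ran past the last index, so stop
        break
    do_this_with(i)
    private_var += 1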
! Python !
! Go !
! Java !
This mechanism, which we can think of as the old iteration protocol, was easy to understand but only worked for objects that could be numerically indexed (tuples, lists, and other sequence types). Since indices must run from 0 to N-1, it could not be used to iterate over unordered containers (sets, dicts).
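A small sketch of that limitation, using a made-up set called colors: a set cannot be indexed at all, so there is no position 0 for the old protocol to start from, yet today's for loop iterates over it without trouble.

```python
colors = {"red", "green", "blue"}

# Sets are unordered, so numeric indexing is not supported.
indexing_failed = False
try:
    colors[0]
except TypeError:
    indexing_failed = True

# The newer iterator protocol handles sets without any indexing.
visited = []
for color in colors:
    visited.append(color)

print(indexing_failed)  # True
print(sorted(visited))  # ['blue', 'green', 'red']
```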
To overcome the limitations of this old protocol, and specifically to allow iteration over objects that can't be numerically indexed, a newer protocol was introduced, which works with any iterable.
The protocol is quite simple, but not well understood. Suppose you write code like the following to iterate over an iterable such as a list:
for i in test_list:  # or some other iterable
    do_something_with(i)
The interpreter begins by calling the iterable's __iter__ method to create an iterator. If the object has no __iter__ method, the interpreter falls back to the old protocol (for backward compatibility). If there is no __getitem__ method either, the interpreter raises a TypeError exception, on the not unreasonable grounds that there is no way to iterate over the given value.
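Both fallbacks can be seen in a short sketch. The classes Squares and Opaque are hypothetical: Squares defines only __getitem__, so iter() falls back to the old protocol, while Opaque defines neither method, so iter() gives up with a TypeError.

```python
class Squares:
    """Hypothetical sequence-like class: defines __getitem__ but no __iter__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # Signal the end of the sequence the way the old protocol expects.
        if index >= self.count:
            raise IndexError(index)
        return index * index


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# iter() falls back to __getitem__, calling it with 0, 1, 2, ... until IndexError.
print(list(Squares(4)))  # [0, 1, 4, 9]

opaque_error = None
try:
    iter(Opaque())
except TypeError as exc:
    opaque_error = exc
    print(exc)  # 'Opaque' object is not iterable
```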
Let me show you what I mean.
for i in None:
    do_something_with(i)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-db0e4290e1ce> in <module>()
----> 1 for i in None:
      2     do_something_with(i)

TypeError: 'NoneType' object is not iterable
Why can't Python iterate over a NoneType object? Let's check:
oi = dir(None)
print("__iter__" in oi, "__getitem__" in oi)
False False
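NoneType has neither an __iter__ nor a __getitem__ method.
Our list, on the other hand, does have __iter__, so under the new protocol the earlier loop over lang_list is roughly equivalent to: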
iterator = lang_list.__iter__()  # creates an iterator
while True:
    try:
        i = iterator.__next__()
    except StopIteration:  # iterator is exhausted
        break
    do_this_with(i)
! Python !
! Go !
! Java !
Each time through the loop, the interpreter extracts the next value from the iterator by calling its __next__ method. (In Python 2 this method was called next, a design flaw, since the name did not mark it as a special method; it was renamed __next__ in Python 3.) In the case above, the results of the __next__ calls are successively bound to i until __next__ raises a StopIteration exception, which terminates the loop normally.
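The hand-written loop above can be generalized into a helper. This is only a sketch: manual_for is a hypothetical name, and the real for statement does this internally rather than through any such function.

```python
def manual_for(iterable, body):
    """Hypothetical helper: roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)  # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:  # the iterator is exhausted
            break
        body(item)


collected = []
manual_for(["Python", "Go", "Java"], collected.append)
print(collected)  # ['Python', 'Go', 'Java']

# An exhausted list iterator raises StopIteration on every further call.
it = iter([1])
next(it)
exhausted = False
try:
    next(it)
except StopIteration:
    exhausted = True
print(exhausted)  # True
```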
To check whether the new iteration protocol will work on an object, see whether it has an __iter__ method. If it does, it's an iterable. Lists are iterables, for example:
hasattr(lang_list, "__iter__")
True
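One caveat to the hasattr check: because iter() still falls back to __getitem__, an object that only supports the old protocol is iterable even though it has no __iter__. A sketch of a more permissive test simply asks iter() itself; is_iterable and OldStyle are hypothetical names.

```python
def is_iterable(obj):
    """Hypothetical helper: ask iter() itself instead of checking for __iter__."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OldStyle:
    """Hypothetical class that only supports the old __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


print(hasattr(OldStyle(), "__iter__"))  # False
print(is_iterable(OldStyle()))          # True
print(is_iterable(None))                # False
print(is_iterable(["Python", "Go"]))    # True
```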
So what kind of object does a call to that method return? A specific kind of iterator called a list iterator. The list itself is not an iterator, so calling next() on it directly fails:
x=[1,2,3]
print(next(x))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-015e4d5f862c> in <module>()
      1 x=[1,2,3]
----> 2 print(next(x))

TypeError: 'list' object is not an iterator
x=[1,2,3]
y=iter(x)
print(next(y))
print(next(y))
print(next(y))
1
2
3
type(x)
list
type(y)
list_iterator
In short: iterables (lists, tuples, dicts, sets, etc.) are turned into iterators by iter(), a built-in factory function.
A factory function is simply a function whose job is to return (new) objects.
Evidence for the above inferences
print(hasattr(x,'__iter__'),hasattr(x,'__next__'))
True False
print(hasattr(y,'__iter__'),hasattr(y,'__next__'))
True True
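The standard library encodes the same distinction in collections.abc: Iterable only requires __iter__, while Iterator additionally requires __next__. A short sketch checking the x and y from above:

```python
from collections.abc import Iterable, Iterator

x = [1, 2, 3]
y = iter(x)

# Iterable checks for __iter__; Iterator additionally requires __next__.
print(isinstance(x, Iterable), isinstance(x, Iterator))  # True False
print(isinstance(y, Iterable), isinstance(y, Iterator))  # True True
```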
Calling iter() on an iterable returns a new iterator each time:
iterator_1 = iter(lang_list)
iterator_2 = iter(lang_list)
print(id(iterator_1))
print(id(iterator_2))
2108767821664
2108767821496
Calling iter() on an iterator returns the iterator itself:
iterator_1 = iter(lang_list)
iterator_2 = iter(iterator_1)
print(id(iterator_1))
print(id(iterator_2))
2108767854664
2108767854664
iter(o) is roughly equivalent to o.__iter__()
next(o) is roughly equivalent to o.__next__()
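A quick sketch of that correspondence. "Roughly" is the operative word: the built-ins look the special method up on the object's type rather than on the instance, but for ordinary objects like a list the results match.

```python
lang_list = ["Python", "Go", "Java"]

a = iter(lang_list)       # built-in function
b = lang_list.__iter__()  # the special method it delegates to
print(type(a) is type(b))  # True: both are list_iterator objects

# next() and __next__() advance the same underlying iterator.
first = next(a)
second = a.__next__()
print(first, second)  # Python Go
```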
Now let me make it clear why iterators are iterables but not the other way around.
Iterators are iterables because they too have an __iter__ method (which returns the iterator itself). Iterables are not necessarily iterators, because they don't have a __next__ method.
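To make the split concrete, here is a sketch of a user-defined iterable paired with its own iterator class. Countdown and CountdownIterator are hypothetical names. Because Countdown.__iter__ hands out a fresh iterator each time, the iterable can be looped over repeatedly, while an iterator (whose __iter__ returns self) is used up after one pass.

```python
class Countdown:
    """Hypothetical iterable: each call to __iter__ returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Hypothetical iterator: has __next__, and __iter__ returns self."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


numbers = Countdown(3)
print(list(numbers), list(numbers))  # [3, 2, 1] [3, 2, 1]

it = iter(numbers)
print(list(it), list(it))  # [3, 2, 1] []
```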
From the outside, the iterator is like a lazy factory that is idle until you ask it for a value, which is when it starts to buzz and produce a single value, after which it turns idle again.
There are two types of generators in Python: generator functions and generator expressions. A generator function is any function whose body contains the keyword yield. A generator expression is the generator equivalent of a list comprehension.
P.S. Both yield and return hand a value back from a function.
The difference is that a return statement terminates the function entirely, while a yield statement pauses the function, saving all of its state, and continues from there on the next call.
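The pause-and-resume behaviour can be observed directly. In this sketch, chatty is a hypothetical generator that records into a list whenever its body actually runs; note that calling it runs none of the body until next() is called.

```python
log = []


def chatty():
    """Hypothetical generator that records when its body actually runs."""
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finishing")  # falling off the end raises StopIteration


gen = chatty()
print(log)        # []
print(next(gen))  # 1
print(log)        # ['started']
print(next(gen))  # 2
print(log)        # ['started', 'resumed after first yield']
```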
Here is what a generator function looks like. Each yield becomes the return value of the next call.
def rangedown(n):
    for i in reversed(range(n)):
        yield i
generator = rangedown(5)
for x in generator:
    print(x)
4
3
2
1
0
The cell below does not print anything, because the generator has already been exhausted by the loop above. (To iterate again, you would need a fresh generator from another call to rangedown(5).)
for x in generator:
    print(x)
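Since a generator is itself an iterator, it is single-use. One common way to get something that can be looped over repeatedly is to make __iter__ a generator method, so that every loop gets a brand-new generator. RangeDown is a hypothetical class modelled on rangedown above.

```python
class RangeDown:
    """Hypothetical re-iterable wrapper: __iter__ is a generator method."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Each call creates a new generator, so the object can be looped over repeatedly.
        for i in reversed(range(self.n)):
            yield i


r = RangeDown(5)
print(list(r))  # [4, 3, 2, 1, 0]
print(list(r))  # [4, 3, 2, 1, 0]

g = iter(r)  # a single generator is still exhausted after one pass
print(list(g), list(g))  # [4, 3, 2, 1, 0] []
```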
A generator expression looks much like any other comprehension: the syntax is the same as that of a list comprehension, except that the square brackets are replaced with round parentheses.
numbers=[1,2,3,4,5]
lazy_squares = (x * x for x in numbers)
lazy_squares
<generator object <genexpr> at 0x000001EAFC59F7D8>
next(lazy_squares)
1
list(lazy_squares)
[4, 9, 16, 25]
Note that because we read the first value from lazy_squares with next(), its state is now at the second item, so when we consume the rest by calling list(), it returns only the remaining squares. (This is just to show the lazy behaviour.)
Simple generators can be created on the fly using generator expressions, which makes building them easy. Just as a lambda creates an anonymous function, a generator expression creates an anonymous generator.
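Generator expressions are especially handy as arguments to functions that consume iterables. A sketch: when the generator expression is the only argument, Python lets you drop the extra pair of parentheses, and no intermediate list is built.

```python
numbers = [1, 2, 3, 4, 5]

# A generator expression that is the sole argument needs no extra parentheses.
total = sum(x * x for x in numbers)
largest_even = max(x for x in numbers if x % 2 == 0)
print(total, largest_even)  # 55 4
```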
Why use generators in Python? There are several reasons that make generators an attractive implementation choice.
Easy to Implement
Generators can be implemented in a clear and concise way as compared to their iterator class counterpart.
As an example, the PowTwo class after this list implements a sequence of powers of 2 as an iterator class.
Memory Efficient
A normal function that returns a sequence must create the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large. A generator implementation of such a sequence is memory friendly and is preferred, since it only produces one item at a time.
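A rough way to see the difference is sys.getsizeof. This sketch compares a list of squares with an equivalent generator expression; the size n is arbitrary. Exact byte counts vary by Python version and platform, so none are claimed here.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]  # every item is built up front
squares_gen = (i * i for i in range(n))   # items are produced on demand

# sys.getsizeof reports only the container object itself, not the ints it refers to,
# but the gap is already large: the list's size grows with n, the generator's does not.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes, gen_bytes)

# Both produce the same values when consumed.
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)  # True
```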
Represent Infinite Stream
Generators are an excellent medium for representing an infinite stream of data. Infinite streams cannot be stored in memory, but since generators produce only one item at a time, they can represent an infinite stream of data.
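A sketch of such a stream: naturals is a hypothetical generator that never stops, and itertools.islice takes a finite window from it without ever materializing the whole sequence. Lazy stages can also be chained before anything is consumed.

```python
from itertools import islice


def naturals():
    """Hypothetical infinite stream: 0, 1, 2, ... (never raises StopIteration)."""
    n = 0
    while True:
        yield n
        n += 1


# islice takes a finite window from the infinite stream.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# Lazy pipelines compose: square the even naturals, then take three.
even_squares = (x * x for x in naturals() if x % 2 == 0)
first_three = list(islice(even_squares, 3))
print(first_three)  # [0, 4, 16]
```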
class PowTwo:
    """Iterator class yielding 2**0, 2**1, ..., 2**max."""

    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        self.n = 0  # (re)start the exponent counter
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration
        result = 2 ** self.n
        self.n += 1
        return result
That was lengthy. Now let's do the same using a generator function. Since generators keep track of these details automatically, the implementation is much more concise and cleaner.
def PowTwoGen(max=0):
    # Yields 2**0 through 2**max, matching the PowTwo class above.
    n = 0
    while n <= max:
        yield 2 ** n
        n += 1
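For a sequence this simple, a generator expression can do the same job in a single line. This is a sketch with a hypothetical function name, pow_two_expr; its parameter is called max_exp to avoid shadowing the built-in max. Like the PowTwo class, it yields 2**0 through 2**max_exp inclusive.

```python
def pow_two_expr(max_exp=0):
    """Hypothetical variant: powers of two from 2**0 to 2**max_exp as a generator expression."""
    return (2 ** n for n in range(max_exp + 1))


print(list(pow_two_expr(4)))  # [1, 2, 4, 8, 16]
```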