# Lazy sequences working hard¶

### Summary¶

• A lazy sequence yields values when they're needed
• This makes it possible to work with large and even infinite sequences using limited memory
• Lazy sequences can be chained together
• Python provides language and library level support for lazy sequences
In [2]:
G = 1000**100
R = range(G)
R

Out[2]:
range(0, 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
In [3]:
R[666]

Out[3]:
666
In [4]:
R[::1000]

Out[4]:
range(0, 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, 1000)
In [5]:
42 in R

Out[5]:
True
In [6]:
len(R)

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-6-3c42657e37a7> in <module>()
----> 1 len(R)

OverflowError: Python int too large to convert to C ssize_t
In [ ]:
sum(R)

In [7]:
def take(xs, n):
xs = iter(xs)
return [next(xs) for _ in range(n)]

rs = iter(R)
print(take(rs, 10), '\n', next(rs))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
10


### Procrastination¶

range() creates a range object.

It's not a collection in memory. The range stores start, stop, step values, so it can be iterated over etc. Activity is deferred.

Lazy working, not lazy thinking.

### Historical note¶

• In Python 2 range() created a list and xrange() behaved lazily.
• Python 3 simplifies things.
• Lazy is sufficient.
In [8]:
list(range(10))

Out[8]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

See also: map(), filter(), dict.keys() ...

## Lazy-ish¶

Range objects are not very lazy nor particularly sequential. They give the illusion of being all there all the time.

• You can get their length directly
• You can access the value at an index
• You can slice them
• You can check membership

## Generators¶

Generators offer language level support for lazy sequences.

Generators yield values one at a time.

Step using next(), iterate using for.

You cannot go backwards, query the length, etc.

In [9]:
# Generator function
import random

def flip_a_coin():
'Generate a random sequence of Heads/Tails'
while True:

flip_a_coin()

Out[9]:
<generator object flip_a_coin at 0x000000348A978C50>
In [10]:
ht = flip_a_coin()
print(next(ht), next(ht))
' '.join(next(ht) for _ in range(10))

Heads Heads

Out[10]:
'Heads Heads Heads Heads Heads Heads Heads Tails Tails Tails'
In [11]:
M = 1000 * 1000
heads = sum(1 for h, _ in zip(ht, range(M)) if h == 'Heads')

Out[11]:
(499857, 0.499857)
In [12]:
import collections

last10 = collections.deque(take(ht, 10), maxlen=10)

tosses = enumerate(ht)

last10.append(next(tosses)[1])

next(tosses)

Out[12]:
(282, 'Tails')
In [13]:
# Generator expressions look like list comprehensions
# but they create generators, not lists
(x * x for x in R)

Out[13]:
<generator object <genexpr> at 0x000000348A9B7048>
In [14]:
sqs = (x * x for x in R)
print(take(sqs,10))
print(next(sqs))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
100


## Standard Generators¶

Many functions in the Python standard library are generators.

In [15]:
import os

PY = 'C:/Python36'
tree = os.walk(PY)
next(tree)

Out[15]:
('C:/Python36',
['DLLs',
'Doc',
'include',
'Lib',
'libs',
'Scripts',
'selenium',
'tcl',
'Tools'],
'NEWS.txt',
'python.exe',
'python.pdb',
'python3.dll',
'python36.dll',
'python36.pdb',
'python36_d.dll',
'python36_d.pdb',
'python3_d.dll',
'pythonw.exe',
'pythonw.pdb',
'pythonw_d.exe',
'pythonw_d.pdb',
'python_d.exe',
'python_d.pdb',
'vcruntime140.dll'])
In [16]:
subtrees = (r for r, d, f in os.walk(PY))
take(subtrees, 5)

Out[16]:
['C:/Python36',
'C:/Python36\\DLLs',
'C:/Python36\\Doc',
'C:/Python36\\include',
'C:/Python36\\Lib']
In [17]:
def ilen(xs):
"Return the length of an iterable"
return sum(1 for _ in xs)

all_files = (os.path.join(r, f)
for r, _, ff in os.walk(PY)
for f in ff)

py_files = (py for py in all_files if py.endswith('.py'))

max(py_files, key=lambda py: ilen(open(py, errors='ignore')))

Out[17]:
'C:/Python36\\Lib\\pydoc_data\\topics.py'

## Shell Pipeline¶

$find /c/Python36 -name "*.py" | xargs wc -l | grep -v total$ | sort -rn | head -1
12966 /c/Python36/Lib/pydoc_data/topics.py

## Chunked¶

In [18]:
rs = iter(R)
r3 = [rs, rs, rs]
[next(r) for r in r3]

Out[18]:
[0, 1, 2]
In [19]:
[next(r) for r in r3]

Out[19]:
[3, 4, 5]
In [20]:
z = zip(*r3)
take(z, 3)

Out[20]:
[(6, 7, 8), (9, 10, 11), (12, 13, 14)]
In [21]:
def chunks(xs, n):
return zip(*[iter(xs)]*n)

take(chunks(R[::1000], 2), 5)

Out[21]:
[(0, 1000), (2000, 3000), (4000, 5000), (6000, 7000), (8000, 9000)]

## Flattened¶

In [22]:
import itertools
flatten = itertools.chain
flatten('abc', 'def')

Out[22]:
<itertools.chain at 0x3489510c88>
In [23]:
list(_)

Out[23]:
['a', 'b', 'c', 'd', 'e', 'f']
In [24]:
xs = chunks(R[::1000], 2)
flatten = itertools.chain.from_iterable
ys = flatten(xs)
take(ys, 5)

Out[24]:
[0, 1000, 2000, 3000, 4000]

## Itertools¶

• composable functions for working with lazy sequences
• chain()
• count()
• islice()
• cycle()
• repeat()
• etc
In [25]:
import itertools as its
def ilen(xs): return sum(1 for _ in xs)
def run_length_encode(xs):
''' Run length encodes the stream of values, xs

The result is a stream of (value, repeat-count) pairs
'''
return ((x, ilen(g)) for x, g in its.groupby(xs))

rle = run_length_encode(flip_a_coin())
list(its.islice(rle, 5))

Out[25]:
[('Tails', 1), ('Heads', 1), ('Tails', 2), ('Heads', 1), ('Tails', 3)]
In [26]:
def run_length_decode(vc):
'''Reverses run_length_encode

Expands a stream of (value, repeat-count) pairs into
a stream of values.
'''
return its.chain.from_iterable(its.starmap(its.repeat, vc))

scream = ('A', 6), ('R', 4), ('G', 5), ('H', 3), ('!', 13)
''.join(run_length_decode(scream))

Out[26]:
'AAAAAARRRRGGGGGHHH!!!!!!!!!!!!!'
In [27]:
import random

def story_points():
while True:
yield int(random.gauss(100, 25))

def smooth(xs, n=10):
b, e = its.tee(xs)
s = sum(its.islice(e, n))
for bn, en in zip(b, e):
s += en - bn
yield s / n

take(smooth(story_points(), 10000), 10)

Out[27]:
[99.2981,
99.3007,
99.2982,
99.3028,
99.3117,
99.3128,
99.3161,
99.3211,
99.3171,
99.3166]

## Emulating Shell Syntax¶

You have to read the expression:

take(smooth(story_points(), 10), 10)



inside out.

Wouldn't it be nice to borrow shell syntax?

story_points() | smooth(10) | take(10)
In [28]:
class pipe:
def __init__(self, fn):
self._fn = fn

def __ror__(self, stdin):
return self._fn(stdin)

def __call__(self, *v, **k):
return pipe(lambda x: self._fn(x, *v, **k))

# Credit: https://github.com/JulienPalard/Pipe

In [29]:
psmooth = pipe(smooth)
ptake = pipe(take)
story_points() | psmooth(2) | ptake(10)

Out[29]:
[113.0, 129.0, 123.0, 110.5, 84.0, 74.5, 63.0, 77.0, 101.0, 117.0]
In [30]:
@pipe
def bounds(xs):
m, M = 1e9, -1e9
for x in xs:
m, M = min(m, x), max(M, x)
yield m, M

story_points() | bounds() | ptake(10)

Out[30]:
[(101, 101),
(101, 102),
(29, 102),
(29, 128),
(29, 128),
(29, 128),
(29, 128),
(29, 128),
(29, 128),
(29, 128)]