About this notebook

This notebook/presentation has been prepared for the 2017 edition of http://python.g-node.org, the renowned Advanced Scientific Programming in Python summer school (Happy 10th Anniversary!!). I gratefully acknowledge the efforts of the entire Python community, whose excellent documentation I drew on heavily to create this notebook; the sources are listed at the end of the notebook. If I have missed anyone, my apologies: let me know and I'll add you to the list!

Although you should be able to run the notebook straight out of the box, bear in mind that it was designed to work with Python3, in conjunction with the following nbextensions:

The repository also contains exercises, with and without solutions, which I borrowed from last year's edition of the summer school.

I hope you enjoy it! By all means get in touch! :)

Etienne

In [1]:
import sys
print('Python version ' + sys.version)

import time
from IPython.display import display, Image
from IPython.core.display import HTML

def countdown(t, display_picture=False):
    """Displays countdown.

    Keyword arguments:
    t -- the amount of time to countdown in seconds
    display_picture -- if True, display ./picts/aspp2017.png when done
    """
    while t:
        mins, secs = divmod(t, 60)
        timeformat = '{:02d}:{:02d} left'.format(mins, secs)
        print(timeformat, end='\r')
        time.sleep(1)
        t -= 1
    print('Hands off of keyboards now!')
    if display_picture:
        display(Image(filename="./picts/aspp2017.png", width=400))
Python version 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]

Iterators, generators, decorators, and context managers

.. and all these little things you have used without really knowing what is going on

Etienne B. Roesch | University of Reading

http://etienneroes.ch

I ...

  • am/was an old fashioned software engineer
  • am a cognitive scientist, and passionate interdisciplinarian
    • visual perception (psychophysics) and experience (attention, emotion)
    • methods for neuroimaging: coupled EEG-fMRI, ERG
    • modelling at various scales: cells, brain areas, networks of areas
  • increasingly into open science practices
  • increasingly bigger data person (soon Google Cloud Platform certified)

Take-home message

  • Iterators are arcane mechanisms that support loops, and everything else;

  • Generators are kinds of iterators that provide a level of optimisation and interactivity;

  • Decorators are a mechanism to incrementally power-up existing code;

  • Context managers are semantically related to decorators, to manage resources properly.

Iterators

An iterable is any Python object that can be used with a for loop; an iterator is the object that actually produces the values, one at a time.

Iterators implement the iterator protocol, which describes two special methods, __iter__() and __next__(), used to step through a collection of objects. In Python 3, you find them everywhere, e.g. files, I/O streams, etc.

https://docs.python.org/3.6/whatsnew/2.2.html#pep-234-iterators

In [2]:
import numpy as np
nums = np.arange(2)    # ndarray contains [0, 1]
In [3]:
for x in nums:
    print(x, end=" ")
0 1 
In [4]:
iter(nums)             # ndarray is an iterable
Out[4]:
<iterator at 0x1040d00b8>
In [5]:
it = iter(nums)
it.__next__()          # One way to iterate
Out[5]:
0
In [6]:
next(it)    # Another way to iterate
Out[6]:
1
In [7]:
next(it)    # Raises StopIteration exception
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-7-fad128c8b2df> in <module>()
----> 1 next(it)    # Raises StopIteration exception

StopIteration: 

http://www.scipy-lectures.org/intro/language/exceptions.html#exceptions
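To make the protocol concrete, here is a rough sketch of what a for loop does behind the scenes. The helper name manual_for is made up for illustration, and this is only an approximation of the loop machinery, not how the interpreter is actually implemented.

```python
def manual_for(iterable):
    """Collect the items of `iterable` the way a for loop would visit them."""
    collected = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__()
        except StopIteration:    # signals that the iterator is exhausted
            break
        collected.append(item)   # stands in for the body of the loop
    return collected

print(manual_for([0, 1]))
print(manual_for("abc"))
```

The for statement catches StopIteration for you, which is why you only see the exception when you call next() by hand.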

Leonardo Filius Bonacci (1175-1245), aka Leonardo Fibonacci, defines the recurrence relation that now bears his name and fuels conspiracy theorists.

$F_{n} = F_{n-1} + F_{n-2}$ given $F_{0} = 0, F_{1} = 1$
![noimg](picts/FibonacciSpiral.png)
In [8]:
class Fib:
    '''Iterator Class to calculate the Fibonacci series'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):    # defines initial conditions
        self.a = 0
        self.b = 1
        return self        # returns a handle to the object

    def __next__(self):    # defines behaviour for next()
        fib = self.a
        if fib > self.max:
            raise StopIteration # is caught when in _for_ loop
        temp_b = self.a + self.b
        self.a = self.b
        self.b = temp_b
        return fib         # F_n = F_n-1 + F_n-2

# 33rd degree in Freemason Antient & Accepted Scottish Rite
for i in Fib(33):
    print(i, end=' ')   # literally calls the __next__() method
0 1 1 2 3 5 8 13 21 

Generators

Generators (generator-iterators, as they are called) are a mechanism to simplify this process.

Python provides the yield keyword to define generators, which takes care of __iter__() and __next__() for you.

https://www.python.org/dev/peps/pep-0255/

In [9]:
def fib_without_iterator_protocol(max):
    numbers = []          # Needs to return an array of values
    a, b = 0, 1           # a = 0  and  b = 1
    while a < max:
        numbers.append(a)
        a, b = b, a + b   # Evaluate right-hand side first and assign
    return numbers        # Returns full list of numbers

for i in fib_without_iterator_protocol(33):
    print(i, end=" ")     # iterates through array of values
0 1 1 2 3 5 8 13 21 

In real-life problems, this way of doing things is problematic because it forces us to compute all the numbers up front and to store them all at once.

yield expression_list

yield does something similar to return:

  • return gives back control to the caller function, and returns some content;
  • yield freezes execution temporarily, stores current context, and returns some content to .__next__()'s caller;

yield saves local state and variables, instruction pointer and internal evaluation stack; i.e. enough information so that .__next__() behaves like an external call.
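A small sketch can make the freezing and resuming visible. The names two_steps and events are made up for this illustration; the list simply records the order in which the pieces of the function body run.

```python
events = []                      # records the order in which things happen

def two_steps():
    events.append("started")     # runs only on the first next()
    yield 1
    events.append("resumed")     # runs only on the second next()
    yield 2
    events.append("finished")    # runs when the generator is exhausted

gen = two_steps()                # creates the generator, runs nothing
before = list(events)            # snapshot: still empty at this point
first = next(gen)                # runs up to the first yield
second = next(gen)               # resumes right after the first yield
leftover = list(gen)             # runs to the end; no more values
print(before, first, second, leftover, events)
```

Calling the function does not execute its body; each next() runs just enough code to reach the following yield.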

In [10]:
def fib_with_yield(max_limit):
    '''fib function using yield'''
    a, b = 0, 1          # a = 0  and  b = 1
    while a < max_limit:
        yield a          # freezes execution, returns current a
        a, b = b, a + b  # a = b  and  b = a + b

for i in fib_with_yield(33):
    print(i, end=" ")
0 1 1 2 3 5 8 13 21 
In [11]:
my_masonic_secret = fib_with_yield(33)
my_masonic_secret
Out[11]:
<generator object fib_with_yield at 0x104157620>
In [12]:
next(my_masonic_secret)
Out[12]:
0
In [13]:
next(my_masonic_secret)
Out[13]:
1
In [14]:
next(my_masonic_secret)
Out[14]:
1
In [15]:
next(my_masonic_secret)
Out[15]:
2

... and so on.

Exercise

Write a function that uses yield to draw numbers for the lottery--with replacement is fine! That's six numbers between 1-40; and if you feel ambitious, add one number between 1-15.

In [16]:
countdown(1)
Hands off of keyboards now!

Solution

In [17]:
import random

def super_million_lottery():
    # returns 6 numbers between 1 and 40
    for i in range(6):
        yield random.randint(1, 40)

    # returns a 7th number between 1 and 15
    yield random.randint(1,15)

for i in super_million_lottery():
    print(i, end=" ")
4 18 9 35 33 16 3 

Python's list comprehension, with [..], computes everything at once and can take a lot of memory.

In [18]:
squares = [i**2 for i in range(10)]
squares
Out[18]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Generator expressions, with (..), are computed on demand.

In [19]:
squares = (i**2 for i in range(10))
squares
Out[19]:
<generator object <genexpr> at 0x1041575c8>

https://www.python.org/dev/peps/pep-0289/
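As a rough illustration of the difference, the sketch below compares the two forms side by side. The variable names are made up, and the exact byte counts depend on your Python build, so none are quoted here.

```python
import sys

n = 100000
as_list = [i**2 for i in range(n)]       # all values exist in memory now
as_gen = (i**2 for i in range(n))        # no value has been computed yet

# getsizeof reports the container only, not the integers it refers to,
# which is still enough to see the difference.
print(sys.getsizeof(as_list), "bytes for the list")
print(sys.getsizeof(as_gen), "bytes for the generator")

total_from_gen = sum(as_gen)             # consumes the generator
total_again = sum(as_gen)                # already exhausted: sums nothing
print(total_from_gen == sum(as_list), total_again)
```

The price of laziness is that a generator can be walked through only once; if you need the values again, you have to create a new generator.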

On-demand calculation is important for the streamed processing of large amounts of data, where the size of the data is uncertain, the values of parameters are changing, etc., or when the processing steps might take a long time, yield errors, or enter infinite loops.
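One way to picture this is a generator that never ends, paired with itertools.islice to take only what is needed. The name fib_forever is made up for this sketch.

```python
from itertools import islice

def fib_forever():
    """Endless Fibonacci generator: never raises StopIteration on its own."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fib_forever(), 10))   # take only what is needed
print(first_ten)
```

Building the same thing as a list would never finish, whereas the generator only ever computes the ten values that are asked for.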

Generators are also an easier way to handle callbacks, and can be used to simulate concurrency.

https://www.python.org/dev/peps/pep-0342/

Bash pipeline to count the number of characters per line, omitting whitespace, in a given file:

In [20]:
!sed 's/ˆ//g' ./custom.css | tr -d ' ' | awk '{ printf "%i ", length($0); }'
# Noticed the magic "!"? Type %lsmagic in a cell to learn more
# https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/
5 41 15 14 21 1 28 11 1 

Same processing pipeline using native generators:

In [21]:
my_custom_css = open("./custom.css")
line_stripped = (line.replace(" ", "").rstrip('\n') for line in my_custom_css)
size_line = (len(line) for line in line_stripped)
for i in size_line:
    print(i, end=" ")
5 41 15 14 21 1 28 11 1 
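The same idea can be written with generator functions, which are easier to name, document and reuse than generator expressions. This is only a sketch: strip_spaces and lengths are made-up names, and an in-memory io.StringIO with invented content stands in for the file so that the cell runs anywhere.

```python
import io

def strip_spaces(lines):
    """Remove spaces and the trailing newline from each line."""
    for line in lines:
        yield line.replace(" ", "").rstrip("\n")

def lengths(lines):
    """Map each line to its length."""
    for line in lines:
        yield len(line)

# Made-up stand-in for a file on disk.
fake_css = io.StringIO("body {\n  color: red;\n}\n")

pipeline = lengths(strip_spaces(fake_css))   # nothing is read yet
sizes = list(pipeline)                       # lines flow through one by one
print(sizes)
```

Each stage pulls one line from the stage before it, so at no point is the whole file held in memory.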

In your research, you may only need to analyse one single .csv file.

More likely, you will be faced with increasingly bigger and more complex data, leaning towards so-called "big data", whatever that actually is.

This data won't fit in your workspace, may be "live" and constantly changing, and will require real-time or batch analysis methods; e.g., you won't be able to store raw data, but will have to filter it, compute "metrics", like averages, standard deviations, to then make a decision about what to do with the data.
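As an illustration of computing metrics without keeping the raw data, here is a minimal sketch using Welford's online algorithm, a standard technique that I am swapping in for the example rather than something taken from this notebook. The name running_stats and the sample values are made up.

```python
import math

def running_stats(values):
    """Yield (count, mean, std) after each value, storing no raw data."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)                   # sum of squared deviations
        yield count, mean, math.sqrt(m2 / count)   # population std

stream = iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data
for count, mean, std in running_stats(stream):
    pass                                     # a decision could be made here
print(count, mean, std)
```

Only three numbers are kept between values, so the same code works whether the stream holds eight values or eight billion.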

You'll enter the realm of big data techniques, which will attempt to decouple data handling from analysis, and pipeline steps of preprocessing to ease analyses proper.

Keywords: dataflow, processing pipelines and stream processors, MapReduce, lambda & kappa architectures, Dremel; e.g., Hadoop.

You can simulate concurrency, by interacting with instantiated (currently alive) functions.

In [22]:
def receiver():
    while True:
        item = yield
        print("I'm currently processing:", item)
        
recv = receiver()   # Instantiate function
next(recv)          # Starts function, alt. recv.send(None)
recv.send("Hello")  # Use Python's .send() to communicate..
recv.send("World")  # ..with the instantiated object
I'm currently processing: Hello
I'm currently processing: World
In [23]:
recv.close()        # Obviously, clean up after yourself
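Communication can go both ways: the value passed to .send() becomes the result of the yield expression, and whatever is yielded next is handed back to the caller. A minimal sketch, with the made-up name averager:

```python
def averager():
    """Coroutine that receives numbers and yields their running mean."""
    total, count, mean = 0.0, 0, None
    while True:
        value = yield mean       # hand back the mean, wait for a value
        total += value
        count += 1
        mean = total / count

avg = averager()
next(avg)                        # prime: run up to the first yield
results = [avg.send(v) for v in (10, 20, 30)]
avg.close()
print(results)
```

The generator keeps total and count alive between calls, which is what lets it behave like a small, long-lived worker.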

Generator-iterator cheatsheet


def my_generator():
    ...
    item = yield
    ...
    value = do_something(item)
    ...
    yield value   # return value

gen = my_generator()

next(gen)                  # Starts generator and advances to yield
value = gen.send(item)     # Sends and receives stuff
gen.close()                # Terminates
gen.throw(exc, val, tb)    # Raises exception
result = yield from gen    # Handles callback and returns content

https://www.python.org/dev/peps/pep-0342/
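Of the entries in the cheatsheet, yield from is probably the least obvious, so here is a small sketch of it. The names inner and outer are made up; the point is that values sent to the outer generator are forwarded to the inner one, and the inner generator's return value becomes the value of the yield from expression.

```python
def inner():
    received = yield "inner is ready"    # value comes from send()
    return received.upper()              # becomes the value of `yield from`

def outer():
    result = yield from inner()          # forwards next()/send() to inner
    yield "inner returned " + result

gen = outer()
first = next(gen)                        # runs until inner's yield
second = gen.send("hello")               # goes straight to inner
print(first, "|", second)
```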

Decorators

Functions are objects themselves.

In [24]:
def shout(word="hello world"):
    return word.capitalize() + "!"
print(shout())
Hello world!
In [25]:
yell = shout
print(yell())
Hello world!
In [26]:
del shout
try:     # this is how you catch an Exception
    print(shout())      # This won't work
except NameError as e:
    print(e)  
print(yell())           # But this still works
name 'shout' is not defined
Hello world!

Therefore, functions can be defined inside other functions.

In [27]:
def languaging():
    def whisper(word="Hello world"):
        return word.lower() + "..."
    print(whisper())
languaging()
hello world...
In [28]:
try:
    print(whisper())      # is outside the scope!
except NameError as e:
    print(e)
name 'whisper' is not defined
In [29]:
def languaging(type="shout"):
    def shout(word="hello world"):
        return word.capitalize() + "!"
    
    def whisper(word="hello world"):
        return word.lower() + "..."
    
    if type == "shout":
        return shout
    else:
        return whisper

speak = languaging()
print(speak)
<function languaging.<locals>.shout at 0x10416a0d0>
In [30]:
print(speak())
Hello world!
In [31]:
print(languaging("whisper")())
hello world...

If functions, as objects, can be returned, they can also be arguments!

In [32]:
def my_good_old_analysis():
    print("Ah, the way we've always done analysis.")
my_good_old_analysis()
Ah, the way we've always done analysis.
In [33]:
def deprecated(my_function):
    def wrapper():
        print("!!! You should not be using this function.")
        my_function()
        print("!!! Please, don't do it.")
    return wrapper
my_good_old_analysis = deprecated(my_good_old_analysis)
my_good_old_analysis()
!!! You should not be using this function.
Ah, the way we've always done analysis.
!!! Please, don't do it.

And this is exactly what decorators do!

In [34]:
def deprecated(my_function):
    def wrapper():
        print("!!! You should not be using this function.")
        my_function()
        print("!!! Please, don't do it")
    return wrapper

@deprecated  # <-- ain't this a pretty decorator?
def my_even_older_analysis():
    print("Aaaaah, please kill me.")

my_even_older_analysis()
!!! You should not be using this function.
Aaaaah, please kill me.
!!! Please, don't do it
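The wrapper above takes no arguments and discards any return value, which is fine for a demo. For real functions you would probably want something closer to the sketch below, which forwards arguments and results, and uses functools.wraps from the standard library to preserve the name and docstring. The function add is made up for the example.

```python
import functools

def deprecated(func):
    @functools.wraps(func)               # keep func's name and docstring
    def wrapper(*args, **kwargs):        # accept whatever func accepts
        print("!!! {} is deprecated.".format(func.__name__))
        return func(*args, **kwargs)     # pass the result back
    return wrapper

@deprecated
def add(a, b=0):
    """Add two numbers."""
    return a + b

print(add(1, b=2), add.__name__, add.__doc__)
```

Without functools.wraps, add.__name__ would be "wrapper", which makes tracebacks and help() confusing.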

Some built-in Python decorators ease _abstraction_ (only expose relevant information) and _encapsulation_ (combine data and functions in a usable unit). See: https://docs.python.org/3.6/howto/descriptor.html

In [35]:
class My_class:
    def __init__(self,x):
        self.x = x

    @property                # Built-in Python decorator
    def x(self):             # x is publicly accessible 
        return self._x       # _x is private

    @x.setter                # ".setter" built-in Python decorator
    def x(self, x):
        if x < 0:            # Implementation is hidden from end-users
            self._x = 0      # _x actually stores the data 
        elif x > 1000:       # "_" is a warning to end-users
            self._x = 1000   # that things under the hood may
        else:                # change in future releases, and that
            self._x = x      # it's dangerous to rely on them
            
my_instance = My_class(10000)
my_instance._x -= 1
print( my_instance._x )
999
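The cell above reaches into _x directly. As a complement, here is a sketch of the intended route through the public attribute, using a made-up class named Clamped with the same clamping rule written more compactly.

```python
class Clamped:
    """Same idea as above, with hypothetical names."""
    def __init__(self, x):
        self.x = x               # goes through the setter below

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = min(max(value, 0), 1000)   # keep within [0, 1000]

c = Clamped(10000)
c.x -= 1                         # getter, then setter
after_decrement = c.x
low = Clamped(-5).x
c._x = 5000                      # bypasses the setter: nothing stops this
print(after_decrement, low, c.x)
```

The underscore is only a convention: Python will not prevent anyone from writing to _x, it merely signals that doing so skips the checks.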

Exercise

Given a function that uses an ingredient "---Ham---" as a string, write decorators that will draw a sandwich, like this:


 /''''''\ 
@[email protected]
---Ham---
~~Salad~~
 \______/

In [36]:
countdown(1)
Hands off of keyboards now!

Solution

In [37]:
def bread(my_function):
    def wrapper():
        print(r" /''''''\ ")   # raw string: the backslash is literal
        my_function()
        print(r" \______/ ")
    return wrapper

def ingredients(my_function):
    def wrapper():
        print("@[email protected]")
        my_function()
        print("~~Salad~~")
    return wrapper
In [38]:
@bread         # Order matters
@ingredients   #
def sandwich(food="---Ham---"):
    print(food)
sandwich()
 /''''''\ 
@[email protected]
---Ham---
~~Salad~~
 \______/ 
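To see why order matters, it helps to write the stacking out by hand: the decorator closest to the function is applied first. The sketch below uses made-up functions that return lists instead of printing, so the layers are easy to compare.

```python
def bread(func):
    def wrapper():
        return ["top bun"] + func() + ["bottom bun"]
    return wrapper

def filling(func):
    def wrapper():
        return ["tomato"] + func() + ["salad"]
    return wrapper

def ham():
    return ["ham"]

@bread
@filling
def decorated_ham():
    return ["ham"]

by_hand = bread(filling(ham))            # same thing, written explicitly
inside_out = filling(bread(ham))         # swapped order: bread ends up inside
print(decorated_ham())
print(inside_out())
```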

Context managers

Context managers are semantically related to decorators.

They aim primarily to help you manage resources properly, i.e., keep memory tidy, avoid consumer bottlenecks, clean up after yourself, keep (database) connections alive, and other sensible things.

In [39]:
files = []
for x in range(100000):
    files.append(open("how_to_mess_up_my_memory.txt", "w"))
#.. at this point of the notebook, I have messed up my memory
# and won't be able to open any more files
ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
  File "//anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-39-bfc1ac5e0354>", line 3, in <module>
    files.append(open("how_to_mess_up_my_memory.txt", "w"))
OSError: [Errno 24] Too many open files: 'how_to_mess_up_my_memory.txt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "//anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1821, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'OSError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "//anaconda/lib/python3.6/site-packages/IPython/core/ultratb.py", line 1132, in get_records
  File "//anaconda/lib/python3.6/site-packages/IPython/core/ultratb.py", line 313, in wrapped
  File "//anaconda/lib/python3.6/site-packages/IPython/core/ultratb.py", line 358, in _fixed_getinnerframes
  File "//anaconda/lib/python3.6/inspect.py", line 1453, in getinnerframes
  File "//anaconda/lib/python3.6/inspect.py", line 1411, in getframeinfo
  File "//anaconda/lib/python3.6/inspect.py", line 666, in getsourcefile
  File "//anaconda/lib/python3.6/inspect.py", line 695, in getmodule
  File "//anaconda/lib/python3.6/inspect.py", line 679, in getabsfile
  File "//anaconda/lib/python3.6/posixpath.py", line 374, in abspath
OSError: [Errno 24] Too many open files
---------------------------------------------------------------------------

In real life, you are dealing with finite resources. When you allocate some resource to a particular task, you need to make sure you use only what you need, and when you are done, you release it for other tasks or people to use.

In [40]:
print(my_custom_css)       # Remember me? (see Section Generators)
if not my_custom_css.closed:
    print("Clean up, or you'll mess up your memory!")
<_io.TextIOWrapper name='./custom.css' mode='r' encoding='UTF-8'>
Clean up, or you'll mess up your memory!
In [41]:
my_custom_css.close()     # Always clean up after yourself!
del my_custom_css
my_custom_css
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-41-40ecda7ea446> in <module>()
      1 my_custom_css.close()     # Always clean up after yourself!
      2 del my_custom_css
----> 3 my_custom_css

NameError: name 'my_custom_css' is not defined

That's primarily what context managers do for you.

In [42]:
# First, I need to clean up the mess I created by opening
# 100K files, otherwise I won't be able to open files
files = []

#for name in dir():
#    if not name.startswith('files'):
#        del globals()[name]
In [43]:
with open("./custom.css") as my_custom_css:
    for line in my_custom_css:
        print(len(line), end=" ")
7 49 19 18 25 2 30 13 2 
In [44]:
if not my_custom_css.closed:
    print("Clean up, or you'll mess up your memory!")
else:
    print("It's already closed! Ain't that wonderful?")
It's already closed! Ain't that wonderful?

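That's all there is to it: the with..as statement binds a resource to a name for the duration of a block, and releases the resource when the block ends. The name itself survives, as the cell above shows, but the file behind it is closed.

It automatically calls the "management" methods __enter__() and __exit__() for you.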

You'll find context managers for files, locks, threads, database connections, and you can implement your own.
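Roughly speaking, a with block is a tidy way of writing try/finally. The sketch below is a simplification (the real statement also passes exception details to __exit__() and lets it suppress the exception), and it uses an in-memory io.StringIO so that no file is needed.

```python
import io

# With a context manager
with io.StringIO("some text") as managed:
    managed_text = managed.read()

# Roughly the same thing by hand
manual = io.StringIO("some text")
manual.__enter__()
try:
    manual_text = manual.read()
finally:
    manual.__exit__(None, None, None)    # runs even if read() had raised

print(managed.closed, manual.closed)
```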

In [45]:
class File():
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        self.open_file = open(self.filename, self.mode)
        return self.open_file

    def __exit__(self, *args):
        self.open_file.close()

files = []
for _ in range(100000):
    with File('that_shouldnt_mess_up_my_memory.txt', 'w') as myfile:
        files.append(myfile)
len(files)
Out[45]:
100000
In [46]:
for i in range(len(files)):
    if not files[i].closed:  
        print("Arrg, files[%i] is not closed!" % i)
        # Hopefully, there is no output to this cell!! :)
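Generators, decorators and context managers meet in contextlib.contextmanager from the standard library, which turns a generator function into a context manager. Here is a sketch of a file manager written that way; the name opened and the temporary file are made up for the example.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def opened(filename, mode):
    handle = open(filename, mode)        # plays the role of __enter__
    try:
        yield handle                     # value bound by "as"
    finally:
        handle.close()                   # plays the role of __exit__

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # throwaway location
with opened(path, "w") as myfile:
    myfile.write("hello")
print(myfile.closed)
```

Everything before the yield runs on entry, everything after it runs on exit, and the try/finally makes sure the clean-up happens even if the block raises.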

Exercise

Write a context manager that will measure the amount of time spent within its scope.

import time

class MyTimeIt():
    def __init__(self):
        ...   # Write something here to initialise time
    def __enter__(self):
        ...   # Here start the timer
    def __exit__(self, *args):
        ...   # Here measure the amount of time

with MyTimeIt():
    time.sleep(2)

In [47]:
countdown(1)
Hands off of keyboards now!

Solution

In [49]:
import time

class MyTimeIt():
    def __init__(self):
        self.t = 0.

    def __enter__(self):
        self.t = time.time()

    def __exit__(self, *args):
        print('This function took {:.2f} seconds.'.format(time.time() - self.t))

with MyTimeIt():
    time.sleep(2)
This function took 2.00 seconds.
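A possible variation, offered as a sketch rather than the reference solution: return self from __enter__() so the measurement can be read afterwards, and use time.perf_counter(), the standard library clock intended for measuring intervals. The class name Timer is made up.

```python
import time

class Timer:
    def __enter__(self):
        self.start = time.perf_counter() # clock meant for measuring intervals
        return self                      # makes "as timer" useful

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False                     # do not swallow exceptions

with Timer() as timer:
    time.sleep(0.1)
print("Took {:.2f} seconds.".format(timer.elapsed))

try:
    with Timer() as failing:
        raise ValueError("something went wrong")
except ValueError:
    print("Timed the failure too:", failing.elapsed >= 0)
```

Because __exit__() runs even when the block raises, the timing is recorded for failed runs as well.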

Take-home message

  • Iterators are arcane mechanisms that support loops, and everything else;

  • Generators are kinds of iterators that provide a level of optimisation and interactivity;

  • Decorators are a mechanism to incrementally power-up existing code;

  • Context managers are semantically related to decorators, to manage resources properly.

Further reading, sources of inspiration and grateful acknowledgements

Thanks for your attention!
