# CS 506/606 Research Programming (Kyle Gorman): Introduction to Python

Python is a popular general-purpose high-level programming language which has achieved particular success in research. It is an interpreted, dynamically-typed, garbage-collected language. While it is primarily an imperative language, it also supports constructs from functional programming and object-oriented programming, among others. Let's dive right into the language; next week, we'll cover Python implementations, style, tooling, and so on.

# Preliminaries

## Everything's an object. Really.

Every value in Python is an object, i.e., an instance of some class in the sense of object-oriented programming. This includes:

• numeric types
• functions, which store their documentation as instance variables
• modules of code, which map variables and functions onto instance variables and methods
• execution traces and exceptions

Many objects have instance variables. If the instance dog has a variable name, it is stored in dog.name.
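
As a rough illustration of the list above, the sketch below inspects a number, a function, and a module as ordinary objects. It is written to run under Python 3, and the function greet is a hypothetical example rather than part of the course materials.

```python
import math


def greet():
    """Return a friendly greeting."""
    return "hello"


# Numbers, functions, and modules are all instances of object.
print(isinstance(3, object))
print(isinstance(greet, object))
print(isinstance(math, object))

# A function stores its documentation in an attribute...
print(greet.__doc__)
# ...and a module exposes its variables and functions as attributes.
print(math.pi)
```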

## Calling functions

A function is simply a bit of code which takes zero or more arguments and either returns nothing (or more specifically, returns None) or returns one object.

To call (i.e., run) a function, state the function name followed by an open parenthesis (, any arguments (separated by commas), and a closed parenthesis ).

In [1]:
print int        # a reference to the callable itself (here, the int type); nothing is called

<type 'int'>

In [2]:
print int()      # calling a function with no arguments

0

In [3]:
print int(3.7)   # calling a function with one argument

3

In [4]:
print cmp(2, 3)  # calling a function with two arguments

-1


Many objects have instance methods. If the instance dog has a no-argument instance method bark, it can be called as dog.bark().

## Variable assignment

Python is a dynamically-typed language: types belong to objects rather than to variable names, and are checked at runtime. Therefore, there is no need to declare a variable's type, though there are ways to enforce type safety at runtime, or to check type safety off-line.

The = operator is used for variable assignment.

In [5]:
dog = 2


Variable names may consist of any ASCII alphabetic character or the underscore character, and in non-initial positions, digits.

In [6]:
_ = "syntax demands this variable, but I don't care about it"
snake_count = 3       # most common Python style
camelCount = 4        # much less common in Python
fromJustin2KellyReleaseDate = 2003


Python 3 change: Python 3 also allows Unicode characters in variable names (e.g., ε = 1e-6).

## Numeric types

Python supports a large hierarchy of numeric types, including integers, floating point numbers, fractions, and decimals.

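### Base-10 numbers

Python integers are effectively arbitrary-precision: in Python 2, an int that would overflow the machine word is automatically promoted to the arbitrary-precision long type.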

In [7]:
x = 3          # real integer


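Python floats are double-precision binary floating-point numbers with 53 bits of precision. Complex numbers are a separate type, complex, whose real and imaginary parts are each stored as a float.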

In [8]:
y = -4.7       # real floating point number

In [9]:
z = 8.5 - 2j   # complex floating point number

In [10]:
print (z.real, z.imag)

(8.5, -2.0)


The int and float functions are used for typecasting.

In [11]:
print int(y)         # cast float to int, truncating it towards 0

-4

In [12]:
print float(3)

3.0

In [13]:
print int("27")      # like atoi

27

In [14]:
print float("-.23")  # like atof

-0.23


Binary operators with one int operand and one float operand usually coerce the int to a float.

In [15]:
print x + y

-1.7


Caution: in Python 2, division of two integers produces an integer, rounding down (toward negative infinity) if necessary.

In [16]:
print 27 / 8    # =: 27 // 8

3


Python 3 change: / produces a floating point number, and // produces a floored integer.
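
The difference between flooring and truncating only shows up for negative operands. A minimal sketch, using only built-ins and the math module (the variable names are illustrative):

```python
import math

# Floor division rounds toward negative infinity...
floored = -7 // 2
# ...whereas int() and math.trunc() discard the fractional part,
# i.e., they round toward zero.
truncated = int(-3.5)
also_truncated = math.trunc(-3.5)
# divmod returns the floored quotient together with the remainder.
quotient, remainder = divmod(-7, 2)

print(floored, truncated, also_truncated, (quotient, remainder))
```

The // operator behaves this way in both Python 2 and Python 3.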

### Basic mathematical functions

The math module provides C99-compatible mathematical functions for real numbers.

In [17]:
from math import hypot, log  # we'll cover modules and namespaces later

In [18]:
print hypot(23, 25)

33.9705755029


If a math function would produce a complex number, an exception is raised.

In [19]:
print log(-1)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-f76f59a8c27e> in <module>()
----> 1 print log(-1)

ValueError: math domain error

The cmath module provides core mathematical functions defined for complex numbers.

In [20]:
from cmath import log, polar
print polar(2.7 - 3j)

(4.036087214122113, -0.83798122500839)


Unlike math, cmath functions may return complex numbers.

In [21]:
print log(-1)

3.14159265359j


### Exact representations of rational numbers

Many rational numbers (like $1.1 = \frac{11}{10}$) lack an exact binary floating-point representation.

In [22]:
print repr(1.1 + 2.2)

3.3000000000000003


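To represent rational numbers exactly, use fractions.Fraction. For numbers with a finite decimal expansion (like 1.1), decimal.Decimal is also exact, up to a configurable precision.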

In [23]:
from fractions import Fraction
print Fraction('11/10') + Fraction('22/10')

33/10

In [24]:
from decimal import Decimal
print Decimal("1.1") + Decimal("2.2")

3.3
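
The sketch below contrasts the three representations. It assumes the default decimal context (28 significant digits), and the variable names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floats cannot represent 0.1, 0.2, or 0.3 exactly.
float_ok = (0.1 + 0.2 == 0.3)

# Fractions store an exact integer numerator and denominator.
fraction_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Decimals are exact for finite decimal fractions such as 0.1...
decimal_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# ...but 1/3 has no finite decimal expansion, so the quotient is rounded
# to the context precision, and multiplying back does not recover 1.
third_decimal_ok = (Decimal(1) / Decimal(3) * 3 == Decimal(1))
third_fraction_ok = (Fraction(1, 3) * 3 == 1)

print(float_ok, fraction_ok, decimal_ok, third_decimal_ok, third_fraction_ok)
```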


## Booleans

In Python, the boolean literals are True and False. To cast an arbitrary item to a boolean, use bool. All the following are cast to False:

• numeric types that are equivalent to zero
• empty containers
• empty strings
• None

Most everything else is cast to True.

In [25]:
print bool(-0.0)

False

In [26]:
print bool("non-null string")

True
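
The same casting rules apply implicitly wherever a condition is expected, such as in an if statement. A small sketch (the helper describe is hypothetical):

```python
def describe(value):
    """Report whether a value counts as true in a boolean context."""
    if value:  # implicitly applies the same rules as bool(value)
        return "truthy"
    return "falsy"


examples = [0, 0.0, [], {}, "", None, 1, -0.5, [0], " "]
for value in examples:
    print(repr(value), describe(value))
```

Note that a list containing a zero and a string containing a space are both non-empty, and therefore true.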


## Iteration & flow control

Python makes extensive use of for loops. Unlike traditional C-style for loops, however, they do not have an explicit termination condition; rather, they are more akin to a "foreach", in which a variable is bound on each iteration of the loop until an iterable has been exhausted. The syntax is:

for [variable to be bound] in [iterable object]:
    ...



In Python, whitespace is semantically meaningful: indentation is mandatory, and lines which are differentially indented are not part of the same block. For example, if one (non-null) line is more indented than the next (non-null) line, the first line belongs to a code block which has narrower scope than the second. In for loops, the statements in the loop body must have a consistent indentation which is greater than that of the for statement itself.
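
A minimal sketch of how indentation determines block membership, using nested loops (the variable names are illustrative):

```python
total = 0
for i in range(3):
    for j in range(2):
        total += i * j   # innermost block: runs 3 * 2 = 6 times
    total += 100         # body of the outer loop only: runs 3 times
print(total)             # not indented: runs once, after both loops
```

Moving the `total += 100` line one level deeper or shallower would change how often it runs, and therefore the result.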

In [27]:
n = 5
for i in xrange(n):  # C-style loop
    print i

0
1
2
3
4

In [28]:
grunge_bands = ["Alice In Chains", "Nirvana",  # this syntax will be
                "Pearl Jam", "Soundgarden"]    # explained shortly
for band in grunge_bands:
    print band

Alice In Chains
Nirvana
Pearl Jam
Soundgarden


To simultaneously iterate and preserve a number index, use the enumerate built-in function.

In [29]:
for (i, band) in enumerate(grunge_bands):
    print "{}: {}".format(i, band)

0: Alice In Chains
1: Nirvana
2: Pearl Jam
3: Soundgarden


To iterate over two or more iterable objects simultaneously, use the built-in function zip.

In [30]:
for (i, j) in zip(xrange(n), xrange(n, n + n)):
    print (i, j)

(0, 5)
(1, 6)
(2, 7)
(3, 8)
(4, 9)


Caution: if the objects passed to zip are not the same length, zip terminates after the shortest object has been exhausted.
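
The sketch below shows the truncation, along with one way to pad the shorter iterable instead: itertools.zip_longest (named izip_longest in Python 2, hence the guarded import). The variable names are illustrative.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

numbers = [1, 2]
letters = ["a", "b", "c"]

# zip stops as soon as the shorter input runs out; "c" is silently dropped.
truncated = list(zip(numbers, letters))

# zip_longest keeps going, filling the gaps with fillvalue.
padded = list(zip_longest(numbers, letters, fillvalue=None))

print(truncated)
print(padded)
```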

Python also supports while loops.

In [31]:
i = 0
while i < 23:
    i += 1
print i

23


Python for and while loops support the continue and break keywords.

The continue keyword causes the current iteration of the loop to be abandoned.

In [32]:
for value in range(10):
    if value % 2 == 0:
        continue
    print "{} is odd".format(value)

1 is odd
3 is odd
5 is odd
7 is odd
9 is odd


The break keyword terminates the entire loop.

In [33]:
for value in range(1, 1 + 10):
    if value % 7 == 0:
        print "{} is a multiple of 7.".format(value)
        print "And I'm outta here."
        break
    print "{} is not a multiple of 7.".format(value)

1 is not a multiple of 7.
2 is not a multiple of 7.
3 is not a multiple of 7.
4 is not a multiple of 7.
5 is not a multiple of 7.
6 is not a multiple of 7.
7 is a multiple of 7.
And I'm outta here.


## Container types

### list

The simplest container type in Python is the list, which is a

• mutable
• dynamically-sized
• ordered

sequence of zero or more arbitrary objects. It is created using square brackets.

In [34]:
empty_list = []
grunge_bands = ["Alice In Chains", "Nirvana",
                "Pearl Jam", "Soundgarden"]


While a Python list is mutable, it is not a linked list, but rather a dynamically allocated array; thus, lists make excellent stacks (via list.append and list.pop), since adding or removing the final element is fast.

In [35]:
stack = [5, 6, 7]
stack.append(8)
stack.append(9)
print stack.pop()
print stack

9
[5, 6, 7, 8]


Caution: On the other hand, insertion to and deletion of objects in non-final positions is relatively slow. If you need to do many insertions and/or deletions in initial position, use the collections.deque ("double-ended queue") object instead.
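
A minimal sketch of collections.deque, which supports fast insertion and removal at both ends (the variable names are illustrative):

```python
from collections import deque

queue = deque([5, 6, 7])
queue.appendleft(4)        # insert at the left end
queue.append(8)            # insert at the right end
first = queue.popleft()    # remove from the left end
last = queue.pop()         # remove from the right end
remaining = list(queue)

print(first, last, remaining)
```

The list equivalents of the left-end operations, list.insert(0, item) and list.pop(0), have to shift every remaining element.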

To perform a "bulk append", use list.extend.

In [36]:
grunge_bands.extend(["Temple Of The Dog", "Hole",
                     "Stone Temple Pilots"])
print grunge_bands

['Alice In Chains', 'Nirvana', 'Pearl Jam', 'Soundgarden', 'Temple Of The Dog', 'Hole', 'Stone Temple Pilots']


To sort a list in place, use list.sort.

In [37]:
grunge_bands.sort()  # strings are sorted lexicographically, of course
print grunge_bands

['Alice In Chains', 'Hole', 'Nirvana', 'Pearl Jam', 'Soundgarden', 'Stone Temple Pilots', 'Temple Of The Dog']


To reverse a list in place, use list.reverse.

In [38]:
grunge_bands.reverse()
print grunge_bands

['Temple Of The Dog', 'Stone Temple Pilots', 'Soundgarden', 'Pearl Jam', 'Nirvana', 'Hole', 'Alice In Chains']


### Indexing and slicing

Python uses zero-based indexing for ordered containers.

In [39]:
print grunge_bands[0]

Temple Of The Dog


To get the length of a list, use the built-in function len.

In [40]:
last_element = grunge_bands[len(grunge_bands) - 1]  # not great


Negative indices are used to access elements with reference to the final element rather than with reference to the first.

In [41]:
last_element = grunge_bands[-1]  # much more idiomatic


Instead of a single integer index, we can use a pair of indices to obtain a list of multiple items. This is called slicing. A two-place slice consists of a lower and upper index, separated by a colon. The resulting slice is inclusive on the first index but exclusive on the second index.

In [42]:
print grunge_bands[2:5]  # returns three, not four, objects

['Soundgarden', 'Pearl Jam', 'Nirvana']


If the first index of a two-place slice is not specified, it is assumed to be 0 and thus, left-aligned.

In [43]:
print grunge_bands[:2]  # first two

['Temple Of The Dog', 'Stone Temple Pilots']


Similarly, if the second index of a two-place slice is not specified, it is assumed to be the length of the list, and thus, right-aligned.

In [44]:
print grunge_bands[1:]  # all but the first

['Stone Temple Pilots', 'Soundgarden', 'Pearl Jam', 'Nirvana', 'Hole', 'Alice In Chains']

In [45]:
print grunge_bands[-2:]  # the last two

['Hole', 'Alice In Chains']


It is also possible to perform a three-place slice. The third argument is the step size, expressed as a signed (non-zero) integer.

In [46]:
print grunge_bands[::2]   # every other element, starting with the first (index 0)

['Temple Of The Dog', 'Soundgarden', 'Nirvana', 'Alice In Chains']

In [47]:
print grunge_bands[::-1]  # a copy of the list in reverse

['Alice In Chains', 'Hole', 'Nirvana', 'Pearl Jam', 'Soundgarden', 'Stone Temple Pilots', 'Temple Of The Dog']


### tuple

A tuple is an

• immutable
• statically-sized
• ordered

sequence of zero or more arbitrary objects. Parentheses and commas are used to initialize tuples.

In [48]:
favorite_snacks = ("Fresh Pineapple", "Salt & Vinegar Chips")


Caution: a single-value tuple cannot be defined simply by wrapping the argument with parentheses; a single comma must also be used.

In [49]:
even_primes_bad = (2)
print type(even_primes_bad)   # not going to work
even_primes_good = (2,)
print type(even_primes_good)

<type 'int'>
<type 'tuple'>


Tuples are often used as a complex return value for a function.

In [50]:
from math import modf
print modf(3.7)  # modf splits a real-valued float into fractional and integer components

(0.7000000000000002, 3.0)


Unlike lists, tuples are immutable.

In [51]:
favorite_snacks[1] = "Quinoa"  # not going to work

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-51-a702b4cf07e8> in <module>()
----> 1 favorite_snacks[1] = "Quinoa"  # not going to work

TypeError: 'tuple' object does not support item assignment

### set and frozenset

The set and frozenset types are unordered containers backed by an open-addressed hashtable, allowing for (average-case) constant-time lookup.

Instances of set are

• dynamically-sized
• mutable

and implement methods like add and remove.

In [52]:
nineties_gift_ideas = set(["Nirvana CD", "CK One",
                           "Plaid backpack", "Slap bracelet"])
nineties_gift_ideas.add("Trapper Keeper")
print nineties_gift_ideas

set(['CK One', 'Nirvana CD', 'Slap bracelet', 'Trapper Keeper', 'Plaid backpack'])


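Instances of frozenset are similar to set, but are

• statically-sized
• immutable

and so do not implement mutating methods like add; attempting to call add on a frozenset raises an AttributeError, as in the traceback below.

In [53]: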
my_parents = frozenset(["Mom", "Dad"])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-53-67883567a708> in <module>()

AttributeError: 'frozenset' object has no attribute 'add'

Both types of set support set logic operations.

In [54]:
tall_people = set(["Abraham Lincoln", "Lyndon B. Johnson",
                   "Dracula", "Wolfman",
                   "Uma Thurman", "Bob Saget"])
monsters = set(["Mummy", "The Thing", "Dracula", "Wolfman",
                "Jaws"])

In [55]:
print tall_people & monsters  # intersection, i.e., tall monsters

set(['Dracula', 'Wolfman'])

In [56]:
print monsters <= tall_people  # is subset, i.e., are all monsters tall?

False
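
The other common set operations follow the same pattern. The sketch below uses abbreviated versions of the two sets above so that it is self-contained; the result variable names are illustrative.

```python
tall_people = set(["Abraham Lincoln", "Dracula", "Wolfman", "Uma Thurman"])
monsters = set(["Mummy", "Dracula", "Wolfman", "Jaws"])

everyone = tall_people | monsters      # union: in either set
tall_humans = tall_people - monsters   # difference: tall but not a monster
exactly_one = tall_people ^ monsters   # symmetric difference: in one set only

print(sorted(everyone))
print(sorted(tall_humans))
print(sorted(exactly_one))
```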


Neither sets nor frozensets support indexing.

In [57]:
print monsters[0]  # not going to work

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-dfb9da438998> in <module>()
----> 1 print monsters[0]  # not going to work

TypeError: 'set' object does not support indexing

### dict

The Python dict ("dictionary") is a dynamically allocated open-addressed hash-table storing unordered key-value pairs. Keys must be unique and hashable, which in practice means immutable (like a number, str, tuple, or frozenset); values can be virtually anything (including other dictionaries). Curly braces { and } are used to initialize dictionaries. Key-value pairs are separated by commas, and key and value are separated by a colon.

In [58]:
empty_dict = {}

In [59]:
prez2vp = {"George W. Bush": "Dick Cheney",
           "Barack Obama": "Joe Biden"}


Dictionaries are used to map keys to values. A key's corresponding value is accessed by indexing the dictionary with the key.

In [60]:
print prez2vp["George W. Bush"]

Dick Cheney

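New key-value pairs are added by assigning to an index. Caution: assignment is a statement, not an expression, so it cannot be combined with print; the attempt below is a syntax error, and the dictionary is left unchanged.

In [61]:
print prez2vp["Bill Clinton"] = "Al Gore"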

  File "<ipython-input-61-17676ca9bcde>", line 1
print prez2vp["Bill Clinton"] = "Al Gore"
^
SyntaxError: invalid syntax


If a key is not present, a KeyError exception is raised.

In [62]:
print prez2vp["Kanye West"]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-62-6210ac1ec05c> in <module>()
----> 1 print prez2vp["Kanye West"]

KeyError: 'Kanye West'

To avoid this, check for the presence of a key before attempting to index with it.

In [63]:
print "Ross Perot" in prez2vp

False
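
A minimal sketch of two ways to guard a lookup: an explicit membership test, and dict.get, which returns a default (None unless specified) when the key is absent. The variable names are illustrative.

```python
prez2vp = {"George W. Bush": "Dick Cheney", "Barack Obama": "Joe Biden"}

# Explicit membership test before indexing.
if "Ross Perot" in prez2vp:
    running_mate = prez2vp["Ross Perot"]
else:
    running_mate = None

# dict.get does the same in one step, with an optional default value.
same_result = prez2vp.get("Ross Perot")
with_default = prez2vp.get("Ross Perot", "unknown")
present = prez2vp.get("Barack Obama", "unknown")

print(running_mate, same_result, with_default, present)
```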


### Iteration

There are several ways to view the elements in a dictionary. One can iterate over

• keys with dict.iterkeys()
• values with dict.itervalues()
• key-value pairs with dict.iteritems()

In [64]:
# over keys
for prez in prez2vp.iterkeys():  # or: for prez in prez2vp:
    print prez

George W. Bush
Barack Obama

In [65]:
# over values
for vp in prez2vp.itervalues():
    print vp

Dick Cheney
Joe Biden

In [66]:
# over key-value pairs; more on this syntax later
for (prez, vp) in prez2vp.iteritems():
    print (prez, vp)

('George W. Bush', 'Dick Cheney')
('Barack Obama', 'Joe Biden')


Caution: While it's tempting to iterate over keys, and look up the values in the loop, dict.iteritems() is both faster and more explicit.

In [67]:
# over key-value pairs, the BAD WAY
for prez in prez2vp.iterkeys():
    vp = prez2vp[prez]
    print (prez, vp)

('George W. Bush', 'Dick Cheney')
('Barack Obama', 'Joe Biden')


### Container type-casting

The functions list, tuple, set, frozenset, and dict can all be used to cast one container to a different type.

In [68]:
x = ["fi", "fie", "foe", "fum"]
print tuple(x)

('fi', 'fie', 'foe', 'fum')


## Strings and regular expressions

### String literals

Strings (instances of class str) are defined using single or double quotes. I prefer double quotes.

In [69]:
abacab = "Abacadabra"
shaz = 'Shazam'


For multi-line strings (such as documentation strings), use triple double quotes (""").

In [70]:
def prove_fermat():
    """
    This function uses a flux capacitor and dilithium crystals to prove
    Fermat's last theorem. Unfortunately, the complexity of this operation
    is at least factorial.
    """
    raise NotImplementedError

help(prove_fermat)

Help on function prove_fermat in module __main__:

prove_fermat()
    This function uses a flux capacitor and dilithium crystals to prove
    Fermat's last theorem. Unfortunately, the complexity of this operation
    is at least factorial.



### String methods

In Python, string instances have many more methods than most languages you are probably familiar with. These are often used where you might use a simple regular expression in some other language.

In [71]:
print abacab == shaz           # value equality?

False

In [72]:
print abacab.startswith("Aba")  # does the string start with "Aba"?

True

In [73]:
print abacab.endswith("ra")     # does the string end with "ra"?

True

In [74]:
print "cad" in abacab           # does the substring exist?

True

In [75]:
print abacab.islower()          # is the string entirely lowercase alphabetic characters?

False

In [76]:
print abacab.upper()            # convert ASCII lowercase characters to uppercase

ABACADABRA

In [77]:
print "    Words        ".strip()  # remove leading or trailing whitespace

Words

In [78]:
print str(23).zfill(4)           # left-pad a number string with zeros

0023

In [79]:
print "It's the end of the world and I feel fine".split()  # whitespace split

["It's", 'the', 'end', 'of', 'the', 'world', 'and', 'I', 'feel', 'fine']

In [80]:
print "{:.04f} is a value I'd like to interpolate".format(3. / 4.)

0.7500 is a value I'd like to interpolate


### String encoding and storage

In Python 2, str objects are represented internally as arrays of bytes. str.decode returns a unicode object, and unicode.encode returns str.

Python 3 change: str objects are encoded internally as Unicode. str.encode returns a bytes object (an array of bytes), and bytes.decode returns a str object. Most people find this more intuitive, so if you work with Unicode data a lot, you may want to try out Python 3.

If this is all Greek to you, no worries; we'll cover it in more depth in the NLP class.

### Regular expressions

In [81]:
from re import finditer, findall, match, search, split, sub
farm_animal_noises = "ba ba ba oink oink moo oink oink oink"


The re module provides regular expressions using similar syntax to many other languages; by convention, regular expressions are written as raw string literals (prefixed with an r), so that backslashes are passed to the regular expression engine unchanged. It is usually safe to state a regular expression in string form even if it will be used many times, because the re module caches recently compiled regular expressions.

In [82]:
sheep_regex = r"(ba(\s+ba)*)"
pig_regex = r"(oink(\s+oink)*)"
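
When a pattern is reused heavily, it can also be compiled explicitly with re.compile. The sketch below repeats the pig pattern and the example string so that it is self-contained; the names pig_pattern, first, and all_runs are illustrative.

```python
import re

farm_animal_noises = "ba ba ba oink oink moo oink oink oink"

# Compile once; the resulting pattern object has its own match, search,
# finditer, and sub methods, which take the string to be searched.
pig_pattern = re.compile(r"(oink(\s+oink)*)")

first = pig_pattern.search(farm_animal_noises)
all_runs = [m.group() for m in pig_pattern.finditer(farm_animal_noises)]

print(first.group())
print(all_runs)
```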


The match function applies a regular expression to a string, and returns a MatchObject instance if the regular expression matches the left edge of the string.

In [83]:
m = match(sheep_regex, farm_animal_noises)
if m:
    print m.group()

ba ba ba

In [84]:
m = match(pig_regex, farm_animal_noises)  # no match
if m:
    print m.group()


The search function applies a regular expression to a string, and returns a MatchObject instance if the regular expression matches any part of the string; the resulting MatchObject will describe the first match.

In [85]:
m = search(pig_regex, farm_animal_noises)
if m:
    print m.group()  # but not the second sequence of oinks

oink oink


The finditer function applies a regular expression to a string, and returns an iterator of MatchObject instances representing all non-overlapping matches.

In [86]:
for m in finditer(pig_regex, farm_animal_noises):
    print m.group()

oink oink
oink oink oink


The findall function is similar to finditer, but instead of returning an iterator of MatchObject instances, it returns a list; when the regular expression contains more than one capture group, as here, each element of the list is a tuple containing the capture groups.

In [87]:
for groups in findall(pig_regex, farm_animal_noises):
    print groups

('oink oink', ' oink')
('oink oink oink', ' oink')


The sub function applies a regular expression to a string, and when a match is found, the match is replaced with another string. Backreferences to capture groups are also supported. Note the argument order: the pattern comes first, then the replacement string, then the string to be searched.

In [88]:
print sub(pig_regex, "*OINK*", farm_animal_noises)

ba ba ba *OINK* moo *OINK*
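
A short sketch of backreferences in the replacement string, where \1 and \2 stand for the text captured by the first and second groups. The patterns here are illustrative variants, not the ones defined above.

```python
import re

farm_animal_noises = "ba ba ba oink oink moo oink oink oink"

# \1 refers to the first capture group, here the first "oink" of each run;
# the whole run is replaced by that single word in angle brackets.
collapsed = re.sub(r"(oink)(\s+oink)*", r"<\1>", farm_animal_noises)

# Backreferences can also reorder captured text.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(collapsed)
print(swapped)
```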


The split function applies a regular expression to a string, splitting it into substrings at the point of any match.

In [89]:
print split(r"\s+", farm_animal_noises) # =: farm_animal_noises.split()

['ba', 'ba', 'ba', 'oink', 'oink', 'moo', 'oink', 'oink', 'oink']


## Functions

The def keyword is used to define functions.

Unlike some languages (e.g., Java), Python does not permit function overloading. However, it does permit keyword arguments with default values.

In [90]:
def sum_of_squares(iterable, default_value=0):
    return sum((item * item for item in iterable), default_value)

print sum_of_squares([2, 5, 7, 9])

159


Caution: do not attempt to initialize any sort of mutable instance within a default argument; the default value is evaluated only once, at function definition time, and that same object is then shared by every call rather than being re-created each time the function is called.

In [91]:
def add5tolist_crazy(mylist=[]):   # doesn't do what you expect!
    mylist.append(5)
    return mylist

print add5tolist_crazy()
print add5tolist_crazy()
print add5tolist_crazy()


[5]
[5, 5]
[5, 5, 5]


If you want the default argument to be initialized every time the function is called, set it to None and then initialize it in the function body.

In [92]:
def add5tolist_sane(mylist=None):   # probably what you had in mind
    if mylist is None:
        mylist = []
    mylist.append(5)
    return mylist

print add5tolist_sane()
print add5tolist_sane()
print add5tolist_sane()


[5]
[5]
[5]


Functions may accept an arbitrary number of arguments. To enable an arbitrary number of non-keyword arguments, add *args to the argument list; all non-keyword arguments in that position will be bound to a tuple called args.

In [93]:
def first_and_rest(*args):
    return (args[0], args[1:])

print first_and_rest(0, 5, 6, 7, 9, 12, 14, 15)

(0, (5, 6, 7, 9, 12, 14, 15))


Similarly, to admit an arbitrary number of keyword arguments, add **kwargs to the argument list; all keyword arguments not yet bound will be bound to a dictionary called kwargs.
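
A minimal sketch combining the two; the function collect and its arguments are hypothetical. The same * and ** prefixes can also be used at the call site to unpack a sequence or a dictionary into arguments.

```python
def collect(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    return (args, kwargs)


result = collect(1, 2, color="red", size=3)
positional = result[0]   # a tuple of the non-keyword arguments
keyword = result[1]      # a dict of the keyword arguments

# At the call site, * unpacks a sequence and ** unpacks a dictionary.
options = {"color": "blue"}
unpacked = collect(*[7, 8], **options)

print(positional, keyword)
print(unpacked)
```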

## Destructuring

Functions can only return one object per call, and iterators can only return one object per iteration. However, Python supports destructuring (i.e., unpacking) of complex objects returned by function calls or yielded by iterators so that these complex objects are mapped onto multiple values.

In [94]:
(first, rest) = first_and_rest(2, 5, 9, 23, 27)
print rest

(5, 9, 23, 27)

In [95]:
for (prez, vp) in prez2vp.iteritems():   # repeated from above
    print (prez, vp)

('George W. Bush', 'Dick Cheney')
('Barack Obama', 'Joe Biden')


Python 3 change: the * prefix before a variable name in an unpacking expression causes that variable to be bound to a list of the remaining items from the right-hand argument (e.g., (head, *tail) = iterable).
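
A short sketch of starred unpacking. This one runs under Python 3 only (it is a syntax error in Python 2), and the variable names are illustrative.

```python
# The starred name collects whatever is left over, always as a list.
head, *tail = [2, 5, 9, 23, 27]

# The starred name may appear in any position, and the right-hand side
# may be any iterable, including a string.
*init, last = "abc"

print(head, tail)
print(init, last)
```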

## Generators and itertools

### Generators

Generators are a special class of functions which return generator objects. Generators are essentially functions that are lazily evaluated, meaning that the return values are computed when needed, not when the function call is evaluated; similarly, a generator object (the type of object that is returned by a generator) is a container for these lazily computed return values. The generator object produces these return values by storing the state of an incompletely evaluated function.

There are many uses for generators:

• infinite functions: traditional functions must halt, but generator calls need not halt
• buffered input-output: generators are useful when the function does work that is faster when done in large batches
• memory management: traditional functions must produce all return values before returning, but generator objects can produce one return value at a time

Generators can be identified by the use of yield statements. Each yield statement that is executed will produce one object in the resulting generator object.

In [96]:
def fibonacci():
    """
    Neverending Fibonacci number generator according to the
    OEIS:A000045 definition.
    """
    lower = 0
    upper = 1
    while True:
        yield lower
        summed = lower + upper
        lower = upper
        upper = summed


When we call a generator like fibonacci, a generator object is created and argument variables are bound, but no other work is done.

In [97]:
fibgen = fibonacci()
print fibgen

<generator object fibonacci at 0x1098a8cd0>


The next function causes the generator function to evaluate until it hits a yield statement. At this point, the value of the yield statement is returned and control is returned to the caller until next is called again.

In [98]:
print next(fibgen)

0

In [99]:
print next(fibgen)

1

In [100]:
print next(fibgen)

1

In [101]:
print next(fibgen)

2


A generator object can be advanced with next like this until (or unless) the generator function executes an empty return statement, reaches the end of its body, or raises an exception; once it has finished, further calls to next raise StopIteration. In the case of the fibonacci generator, there is no such exit: it will keep producing values until the system runs out of memory.
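
The sketch below shows one way to take a bounded number of values from an infinite generator (itertools.islice), and what happens when a finite generator is exhausted. The fibonacci function is repeated in condensed form so that the example is self-contained; countdown is a hypothetical finite generator.

```python
from itertools import islice


def fibonacci():
    """Neverending Fibonacci number generator (condensed)."""
    lower, upper = 0, 1
    while True:
        yield lower
        lower, upper = upper, lower + upper


def countdown(n):
    """Finite generator: it ends when the function body finishes."""
    while n > 0:
        yield n
        n -= 1


# islice lazily takes the first ten values, then stops asking for more.
first_ten = list(islice(fibonacci(), 10))

gen = countdown(2)
values = [next(gen), next(gen)]
try:
    next(gen)            # nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(first_ten)
print(values, exhausted)
```

A for loop catches StopIteration automatically, which is why it rarely needs to be handled by hand.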

Even when a function is finite, generators tend to be more terse than their traditional alternative, since there is no need to initialize containers.

In [102]:
def triangular(x):
    """
    The xth triangular number is the number of dots composing a
    triangle with x dots on a side
    """
    return (x * x + x) // 2

def first_triangular_traditional(n):
    """
    Traditional function returning the first n triangular numbers
    """
    retval = []
    for x in range(1, 1 + n):
        retval.append(triangular(x))
    return retval

def first_triangular_generator(n):
    """
    Generator producing the first n triangular numbers
    """
    for x in range(1, 1 + n):
        yield triangular(x)


Finite generator objects can be evaluated all at once by casting them to a container type, like a list or set.

In [103]:
print list(first_triangular_generator(10))

[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]


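This can be faster than the traditional variant that builds and returns a container, though in the timings below the difference is small (about 6%). The more dependable benefit is memory: the generator produces one value at a time, and a container is only built if the caller asks for one.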

In [104]:
%timeit list(first_triangular_traditional(10000))

100 loops, best of 3: 6.48 ms per loop

In [105]:
%timeit list(first_triangular_generator(10000))

100 loops, best of 3: 6.07 ms per loop


Generator objects can also be used in iteration contexts.

In [106]:
for tri in first_triangular_generator(5):
    print tri

1
3
6
10
15


### itertools

The itertools module contains many functions producing complex iterators, which behave much like generator objects. A few examples follow.

In [107]:
from itertools import combinations, combinations_with_replacement, permutations, product, repeat

In [108]:
print " ".join(repeat("na", 5))

na na na na na

In [109]:
print list(product("ABCD", "XYZ"))

[('A', 'X'), ('A', 'Y'), ('A', 'Z'), ('B', 'X'), ('B', 'Y'), ('B', 'Z'), ('C', 'X'), ('C', 'Y'), ('C', 'Z'), ('D', 'X'), ('D', 'Y'), ('D', 'Z')]

In [110]:
print list(combinations("ABCD", 2))

[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

In [111]:
print list(combinations_with_replacement("ABCD", 2))

[('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'C'), ('C', 'D'), ('D', 'D')]

In [112]:
print list(permutations("ABCD", 2))

[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'A'), ('B', 'C'), ('B', 'D'), ('C', 'A'), ('C', 'B'), ('C', 'D'), ('D', 'A'), ('D', 'B'), ('D', 'C')]


## Object-oriented programming idioms

Python supports the creation of classes. Instances of a class have their own state (instance variables) and functions (instance methods) which have access to this state. Methods of a class may also be inherited from one or more superclasses. The topmost node in the class hierarchy is object; all (new-style) classes inherit from it.

### Defining classes

The following is a (trivial) class declaration.

In [113]:
class Dog(object):

    def __init__(self, name):
        self.name = str(name)

    def bark(self):
        return "woof!"

    def whos_a_good_boy(self):
        return self.name


To initialize an instance of the Dog class, we call it as if it were a function using the signature of the constructor instance method, called __init__.

In [114]:
fido = Dog("Fido")
print fido.bark()

woof!


Every instance method takes self as its first argument. The ultimate reason for this is that Python treats classes as namespaces, and instance methods as syntactic sugar; for instance, fido.bark() is shorthand for Dog.bark(fido).

As seen in the __init__ and whos_a_good_boy functions above, self is also used as a prefix for defining or accessing instance variables.
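
The equivalence between the two call forms can be checked directly. The sketch below repeats the Dog class so that it is self-contained; the variable names via_instance and via_class are illustrative.

```python
class Dog(object):

    def __init__(self, name):
        self.name = str(name)

    def bark(self):
        return "woof!"

    def whos_a_good_boy(self):
        return self.name


fido = Dog("Fido")

# Calling through the instance passes fido as self automatically...
via_instance = fido.whos_a_good_boy()
# ...while calling through the class requires passing it explicitly.
via_class = Dog.whos_a_good_boy(fido)

print(via_instance, via_class)
```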

### Inheritance

Rather than inheriting from object directly, a class can inherit methods and/or variables from another subclass of object.

In [115]:
class FarmAnimal(object):

    """
    A farm animal with a name
    """

    def __init__(self, name):
        self.name = str(name)

class Dog(FarmAnimal):

    """
    A dog that barks and knows it's a good boy
    """

    # __init__ is now inherited from superclass

    def bark(self):
        return 'woof!'

    def whos_a_good_boy(self):
        return self.name

fido = Dog("Fido") # calls FarmAnimal.__init__


Multi-level inheritance, in which a subclass is itself subclassed, is also permitted.

In [116]:
class ShepherdDog(Dog):

    """
    A Dog which can also herd (sort) sheep
    """

    def herd(self, sheep):
        return sorted(sheep)

rex = ShepherdDog("Rex") # still calls FarmAnimal.__init__


Classes also can inherit from multiple superclasses which are not themselves in a superclass/subclass relationship, assuming Python is able to determine an unambiguous method resolution order. This is perhaps most useful when the superclasses are partially abstract, and are "mixed in" to grant multiple unrelated classes the same abilities.

In [117]:
class Oinker(object):

    """
    Abstract class with oinking ability
    """

    def oink(self):
        return "oink!"

class OinkingDog(Dog, Oinker):

    """
    A Dog which can oink
    """

    pass # the syntax demands a statement here

rover = OinkingDog("Rover")
print rover.oink()

oink!
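
The method resolution order that Python computes for a class can be inspected through its __mro__ attribute. The sketch below repeats the class hierarchy in condensed form (docstrings and most methods omitted) so that it is self-contained.

```python
class FarmAnimal(object):
    def __init__(self, name):
        self.name = str(name)


class Dog(FarmAnimal):
    def bark(self):
        return "woof!"


class Oinker(object):
    def oink(self):
        return "oink!"


class OinkingDog(Dog, Oinker):
    pass


# Attribute lookups on an OinkingDog search these classes from left to right.
resolution_order = [cls.__name__ for cls in OinkingDog.__mro__]
print(resolution_order)
```

Dog and its superclass FarmAnimal are searched before Oinker because Dog is listed first in the class statement; object always comes last.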


Class definitions may also redefine (overload) the behavior of operators or certain built-in functions (like len) via magic methods.

In [118]:
class Band(object):

    """
    A band represented by a band name and a list of members
    """

    def __init__(self, name, members):
        self.name = str(name)
        self.members = list(members)

    def add_member(self, member):
        """
        Used when a new member joins the band
        """
        self.members.append(member)

    def remove_member(self, member):
        """
        Used when a member quits the band
        """
        self.members.remove(member)

    def __repr__(self):
        """
        Produces the string used to print the object (in most cases)
        """
        return "{}(name={!r}, members={!r})".format(self.__class__.__name__,
                                                    self.name,
                                                    self.members)

    def __len__(self):
        """
        Returns the length of the instance when called as the argument
        to len
        """
        return len(self.members)

    def __iter__(self):
        """
        Iterates over instance, when instance is placed in an iteration
        context or when called as the argument to iter
        """
        return iter(self.members)

    def __contains__(self, member):
        return member in self.members

    def __delitem__(self, member):
        """
        Removes a member from the band, when del self[member] is
        called; this just mimics self.remove_member(member)
        """
        self.remove_member(member)

    def jam(self):
        raise NotImplementedError

In [119]:
band = Band("Holy Rollers", ["Yulia Chang", "Chris Schmutte",
                             "Sluggo", "Rebop"])
band.remove_member("Rebop")            # he spontaneously combusted
band.add_member("Claire St. Huffins")  # so we needed a new drummer

In [120]:
for member in band:   # iteration context
    print member

Yulia Chang
Chris Schmutte
Sluggo
Claire St. Huffins

In [121]:
print "Woody Smith" in band  # __contains__ context

False
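
The len, del, and operator contexts work the same way. Here is a minimal sketch, written to run under both Python 2.7 and Python 3. Roster is a hypothetical, trimmed-down stand-in for Band, and its `__add__` method is an addition for illustration that Band does not define.

```python
class Roster(object):
    """Hypothetical stand-in for Band, trimmed to a few magic methods"""

    def __init__(self, members):
        self.members = list(members)

    def __len__(self):
        return len(self.members)

    def __delitem__(self, member):
        self.members.remove(member)

    def __add__(self, other):
        # overloads the + operator: merging two rosters yields a new one
        return Roster(self.members + other.members)


duo = Roster(["Sluggo", "Rebop"])
print(len(duo))       # calls Roster.__len__
del duo["Rebop"]      # calls Roster.__delitem__
print(len(duo))
trio = duo + Roster(["Yulia Chang", "Chris Schmutte"])  # calls Roster.__add__
print(trio.members)
```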


For a full list of magic methods, see the excellent A Guide To Magic Methods by R. Kettler.

## Functional programming idioms¶

Python supports a number of functional programming idioms. This is possible since functions are objects too, and can be passed as arguments to other functions.
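
As a quick illustration of functions being ordinary objects, the sketch below binds a function to a new name and passes another as an argument. It is written to run under both Python 2.7 and Python 3; apply_twice and add_three are hypothetical helpers introduced only for this example.

```python
def apply_twice(fnc, arg):
    """Hypothetical helper: applies a one-place function twice"""
    return fnc(fnc(arg))


def add_three(x):
    return x + 3


shout = str.upper                  # a function can be bound to a new name
print(shout("oink"))
print(apply_twice(add_three, 10))  # add_three is passed, not called, here
```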

### filter¶

The filter function provides a general schema for filtering out items from an iterable. It works a bit like the following:

In [122]:
def my_filter(fnc, iterable):
    retval = []
    for item in iterable:
        if fnc(item):
            retval.append(item)
    return retval


The first argument to filter is a one-place function which is applied to every item in the second argument, an iterable. If the return value evaluates to True, the item is copied into the new list; otherwise it is left out.

The lambda keyword is used to define simple one-line anonymous functions, such as the ones you might want to pass to filter.

In [123]:
print filter(lambda x: x % 2 == 0, xrange(1, 1 + 20))

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]


Python 3 change: filter now returns a lazy iterator rather than a list; wrap the call in list() to obtain a list.

### map¶

The map function provides a general schema for applying a one-place function to all elements in an iterable. It works a bit like the following:

In [124]:
def my_map(fnc, iterable):
    retval = []
    for item in iterable:
        retval.append(fnc(item))
    return retval


Like filter, the first argument to map is a one-place function which is applied to every item in the second argument, an iterable. And like filter, the lambda keyword is often useful here.

In [125]:
print map(lambda x: (x * x + x) // 2, xrange(1, 1 + 20))

[1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210]


Python 3 change: map now returns a lazy iterator rather than a list; wrap the call in list() to obtain a list.
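
One way to write filter and map calls that behave the same under both versions is to wrap the result in list(). This is a minimal sketch, and the names evens, halves, and halves_list are introduced only for this example.

```python
evens = filter(lambda x: x % 2 == 0, range(1, 1 + 20))
halves = map(lambda x: x // 2, evens)

# Under Python 3, nothing has been computed yet; list() forces evaluation.
# Under Python 2, both calls already returned lists, and list() just copies.
halves_list = list(halves)
print(halves_list)
```

Note that a Python 3 iterator can only be consumed once, so the result should be stored if it is needed more than once.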

### filter and map via comprehensions¶

That said, filter and map have been largely superseded by list, dictionary, and set comprehensions.

In [126]:
print [x for x in xrange(1, 1 + 20) if x % 2 == 0]

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

In [127]:
print [triangular(x) for x in xrange(1, 1 + 20)]

[1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210]

In [128]:
print [triangular(x) for x in xrange(1, 1 + 20) if x % 2 == 0]

[3, 10, 21, 36, 55, 78, 105, 136, 171, 210]
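
Dictionary and set comprehensions follow the same pattern, using curly braces. The sketch below runs under both Python 2.7 and Python 3. The definition of triangular given here is an assumed stand-in, included so that the example is self-contained.

```python
def triangular(n):
    """Stand-in definition (assumed): the n-th triangular number"""
    return (n * n + n) // 2


# dictionary comprehension: maps each n to its triangular number
n2triangular = {n: triangular(n) for n in range(1, 1 + 5)}
print(n2triangular[4])

# set comprehension: the distinct last digits of those numbers
last_digits = {t % 10 for t in n2triangular.values()}
print(sorted(last_digits))
```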


### sum and reduce¶

The built-in function sum simply adds together the numbers in an iterable.

In [129]:
print sum(xrange(1, 1 + 20))

210


In an imperative style, we would write this using a single for loop and an accumulator variable (retval) defined outside of the loop.

In [130]:
def my_sum(iterable):
    retval = 0
    for item in iterable:
        retval = retval + item  # equivalently: retval += item
    return retval


The reduce function generalizes this schema to allow for an arbitrary two-place function inside the loop. It works a bit like the following:

In [131]:
def my_reduce(fnc, iterable, default_val):
    retval = default_val
    for item in iterable:
        retval = fnc(retval, item)
    return retval


This is in fact the signature of reduce, except that the third argument (default_val) is optional. If it is present, it is treated as if it were the first item in iterable; if it is absent, the first item of iterable itself serves as the starting value.

In [132]:
print reduce(lambda x, y: x + y, range(1, 1 + 20))

210
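
The optional third argument matters most when the iterable may be empty, or when the accumulator should start from something other than the first item. This is a minimal sketch; the import from functools works under Python 2.6 and later as well as Python 3, and the name factorial_5 is introduced only for this example.

```python
from functools import reduce  # also a built-in under Python 2

# with an initial value, reduce works even on an empty iterable
print(reduce(lambda x, y: x + y, [], 0))

# a non-additive example: 5! as a running product, seeded with 1
factorial_5 = reduce(lambda x, y: x * y, range(1, 1 + 5), 1)
print(factorial_5)
```

Without the initial value, calling reduce on an empty iterable raises a TypeError.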


Python 3 change: reduce has been relegated to the functools module, on the grounds that reduce calls tend to be somewhat inscrutable and are often better written via explicit loops.

### Functional tricks with dictionaries¶

Dictionaries are often used to store per-item numeric "scores". When combined with built-in functions like max, which accept a function via the key argument, and anonymous functions defined using the lambda keyword, it is very easy to extract summary information from such a dictionary.

In [133]:
oldestpeople2age = {"Misao Okawa": 116.627,
                    "Gertrude Weaver": 116.296,
                    "Jeralean Talley": 115.412,
                    "Nabi Tajima": 114.211}

In [134]:
(oldest_person, oldest_age) = max(oldestpeople2age.iteritems(),
                                  key=lambda kv: kv[1])
print oldest_person

Misao Okawa

In [135]:
from heapq import nlargest  # faster than full sorting
k_oldest = nlargest(2, oldestpeople2age.iteritems(),
                    key=lambda kv: kv[1])
print k_oldest

[('Misao Okawa', 116.627), ('Gertrude Weaver', 116.296)]

In [136]:
oldest_in_order = sorted(oldestpeople2age.iteritems(),
                         key=lambda kv: kv[1],
                         reverse=True)
print oldest_in_order

[('Misao Okawa', 116.627), ('Gertrude Weaver', 116.296), ('Jeralean Talley', 115.412), ('Nabi Tajima', 114.211)]
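
The same idioms can be written so that they run under both Python 2.7 and Python 3. The sketch below uses .items(), which exists in both versions (iteritems is Python 2 only), and operator.itemgetter as a substitute for the lambda. The dictionary is copied from above so that the example is self-contained; youngest_person and names_by_age are names introduced only for this example.

```python
from operator import itemgetter

# ages copied from the dictionary defined above
oldestpeople2age = {"Misao Okawa": 116.627,
                    "Gertrude Weaver": 116.296,
                    "Jeralean Talley": 115.412,
                    "Nabi Tajima": 114.211}

# itemgetter(1) behaves like lambda kv: kv[1]
(youngest_person, youngest_age) = min(oldestpeople2age.items(),
                                      key=itemgetter(1))
print(youngest_person)

# names only, from oldest to youngest
names_by_age = [name for (name, age) in sorted(oldestpeople2age.items(),
                                               key=itemgetter(1),
                                               reverse=True)]
print(names_by_age)
```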