CS 506/606 Research Programming (Kyle Gorman): Introduction to Python

Python is a popular general-purpose high-level programming language which has achieved particular success in research. It is an interpreted, dynamically-typed, garbage-collected language. While it is primarily an imperative language, it also supports constructs from functional programming and object-oriented programming, among others. Let's dive right into the language; next week, we'll cover Python implementations, style, tooling, and so on.


Everything's an object. Really.

Every variable in Python is an instance of an object in the sense of object-oriented programming. This includes:

  • numeric types
  • functions, which store their documentation as instance variables
  • modules of code, which map variables and functions onto instance variables and methods
  • execution traces and exceptions
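
The claims in the list above can be checked directly. The following is a minimal sketch, written so that it also runs under Python 3; the function greet is made up for illustration.

```python
import math

def greet():  # hypothetical example function
    """Say hello."""
    return "hello"

# Numbers, functions, and modules are all instances of `object`.
all_objects = all(isinstance(thing, object) for thing in (3, 4.5, greet, math))

# A function stores its documentation in the `__doc__` attribute.
doc = greet.__doc__

# A module exposes its variables and functions as attributes.
pi_rounded = round(math.pi, 2)

# Even an integer has methods.
bits = (255).bit_length()

print(all_objects)
print(doc)
```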

Many objects have instance variables. If the instance dog has a variable name, it is stored in dog.name.

Calling functions

A function is simply a bit of code which takes zero or more arguments and either returns nothing (or more specifically, returns None) or returns one object.

To call (i.e., run) a function, state the function name followed by an open parenthesis (, any arguments (separated by commas), and a closed parenthesis ).

In [1]:
print int        # reference to a callable (here, the built-in int type)
<type 'int'>
In [2]:
print int()      # calling a function with no arguments
In [3]:
print int(3.7)   # calling a function with one argument
In [4]:
print cmp(2, 3)  # calling a function with two arguments

Many objects have instance methods. If the instance dog has a no-argument instance method bark, it can be called as dog.bark().

Variable assignment

Python is a dynamically-typed language: types belong to objects rather than to variable names, and they are checked at runtime. Therefore, there is no need to declare a variable's type, though there are ways to enforce type safety at runtime, or to check type safety off-line.
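
As a rough illustration of dynamic typing, the sketch below rebinds one name to objects of two different types, and then enforces a type at runtime with isinstance. The function double is a made-up example, not part of any library.

```python
thing = 2                            # the name is bound to an int
first_type = type(thing).__name__
thing = "two"                        # the same name is rebound to a str
second_type = type(thing).__name__

def double(value):  # hypothetical example function
    """Doubles a number, checking the argument's type at runtime."""
    if not isinstance(value, (int, float)):
        raise TypeError("expected a number, got {!r}".format(value))
    return 2 * value

print(first_type)
print(second_type)
print(double(4))
```

Calling double("x") raises a TypeError rather than silently returning "xx".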

The = operator is used for variable assignment.

In [5]:
dog = 2

Variable names may consist of any ASCII alphabetic character or the underscore character, and in non-initial positions, digits.

In [6]:
_ = "syntax demands this variable, but I don't care about it"
snake_count = 3       # most common Python style
camelCount = 4        # much less common in Python
fromJustin2KellyReleaseDate = 2003

Python 3 change: Python 3 also allows Unicode characters in variable names (e.g., ε = 1e-6).

Numeric types

Python supports a large hierarchy of numeric types, including integers, floating point numbers, fractions, and decimals.

Base-10 numbers

Python ints are arbitrary-precision real-valued integers.

In [7]:
x = 3          # real integer

Python floats are double-precision binary floating-point numbers with 53 bits of precision. Complex numbers (instances of complex) store their real and imaginary parts as two such floats.

In [8]:
y = -4.7       # real floating point number
In [9]:
z = 8.5 - 2j   # complex floating point number
In [10]:
print (z.real, z.imag)
(8.5, -2.0)

The int and float functions are used for typecasting.

In [11]:
print int(y)         # cast float to int, truncating it towards 0
In [12]:
print float(3)
In [13]:
print int("27")      # like `atoi`
In [14]:
print float("-.23")  # like `atof` 

Binary operators with one int operand and one float operand usually coerce the int to a float.

In [15]:
print x + y

Caution: division of two integers produces an integer, rounding down (flooring) if necessary.

In [16]:
print 27 / 8    # =: 27 // 8

Python 3 change: / produces a floating point number, and // produces a floored integer (i.e., rounded towards negative infinity).
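
The difference between the two operators is easiest to see with a negative operand. This sketch assumes Python 3 semantics; the variable names are illustrative only.

```python
true_quotient = 27 / 8               # true division: always a float in Python 3
floor_quotient = 27 // 8             # floor division: an int for int operands
negative_floor = -27 // 8            # rounds towards negative infinity, not zero
truncated = int(-27 / 8)             # int() truncates towards zero instead
quotient, remainder = divmod(27, 8)  # floor quotient and remainder together

print(true_quotient)
print(negative_floor)
print(truncated)
```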

Basic mathematical functions

The math module provides C99-compatible mathematical functions for real numbers.

In [17]:
from math import hypot, log  # we'll cover modules and namespaces later
In [18]:
print hypot(23, 25)

If a math function would produce a complex number, an exception is raised.

In [19]:
print log(-1)
ValueError                                Traceback (most recent call last)
<ipython-input-19-f76f59a8c27e> in <module>()
----> 1 print log(-1)

ValueError: math domain error

The cmath module provides core mathematical functions defined for complex numbers.

In [20]:
from cmath import log, polar
print polar(2.7 - 3j)
(4.036087214122113, -0.83798122500839)

Unlike math, cmath functions may return complex numbers.

In [21]:
print log(-1)

Exact representations of rational numbers

Many rational numbers (like $1.1 = \frac{11}{10}$) lack an exact binary floating-point representation.

In [22]:
print repr(1.1 + 2.2)

To represent rational numbers exactly, use fractions.Fraction or decimal.Decimal.

In [23]:
from fractions import Fraction
print Fraction('11/10') + Fraction('22/10')
In [24]:
from decimal import Decimal
print Decimal("1.1") + Decimal("2.2")


In Python, the boolean literals are True and False. To cast an arbitrary item to a boolean, use bool. All the following are cast to False:

  • numeric types that are equivalent to zero
  • empty containers
  • empty strings
  • None

Most everything else is cast to True.

In [25]:
print bool(-0.0)
In [26]:
print bool("non-null string")
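
The casting rules above can be checked in bulk. The two lists in this sketch are my own illustrative choices; note that a non-empty container or string is true even when its contents look "false".

```python
# Values that are cast to False: zeros, empty containers, the empty string, None.
falsy_values = [0, 0.0, 0j, [], (), {}, set(), "", None]

# Values that are cast to True, including some that might look false.
truthy_values = [1, -0.5, [0], (None,), {"key": None}, " ", "False"]

all_falsy = not any(bool(value) for value in falsy_values)
all_truthy = all(bool(value) for value in truthy_values)

print(all_falsy)
print(all_truthy)
```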

Iteration & flow control

Python makes extensive use of for loops. Unlike traditional (C-style) for loops, however, they have no explicit termination condition; rather, they are more akin to a "foreach", in which a variable is bound on each iteration of the loop until an iterable has been exhausted. The syntax is:

for [variable to be bound] in [iterable object]:

In Python, whitespace is semantically meaningful: indentation is mandatory, and lines which are differentially indented are not part of the same block. For example, if one (non-null) line is more indented than the next (non-null) line, the first line belongs to a code block which has narrower scope than the second. In for loops, the statements in the loop body must have a consistent indentation which is greater than that of the for statement itself.

In [27]:
n = 5
for i in xrange(n):  # C-style loop
    print i
In [28]:
grunge_bands = ["Alice In Chains", "Nirvana",  # this syntax will be 
                "Pearl Jam", "Soundgarden"]    # explained shortly
for band in grunge_bands:
    print band
Alice In Chains
Nirvana
Pearl Jam
Soundgarden

To simultaneously iterate and preserve a number index, use the enumerate built-in function.

In [29]:
for (i, band) in enumerate(grunge_bands):
    print "{}: {}".format(i, band)          
0: Alice In Chains
1: Nirvana
2: Pearl Jam
3: Soundgarden

To iterate over two or more iterable objects simultaneously, use the built-in function zip.

In [30]:
for (i, j) in zip(xrange(n), xrange(n, n + n)):
    print (i, j)
(0, 5)
(1, 6)
(2, 7)
(3, 8)
(4, 9)

Caution: if the objects passed to zip are not the same length, zip terminates after the shortest object has been exhausted.
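
A small sketch of this behavior follows, along with one way around it. It assumes Python 3, where the padding variant is called itertools.zip_longest (in Python 2 it is itertools.izip_longest); the lists are invented for illustration.

```python
from itertools import zip_longest  # named izip_longest in Python 2

letters = ["a", "b", "c", "d"]
numbers = [1, 2]

# zip stops as soon as the shorter list is exhausted.
truncated = list(zip(letters, numbers))

# zip_longest pads the shorter list with a fill value instead.
padded = list(zip_longest(letters, numbers, fillvalue=None))

print(truncated)
print(padded)
```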

Python also supports while loops.

In [31]:
i = 0
while i < 23:
    i += 1
print i

Python for and while loops support the continue and break keywords.

The continue keyword causes the current iteration of the loop to be abandoned.

In [32]:
for value in range(10):
    if value % 2 == 0:
        continue
    print "{} is odd".format(value)
1 is odd
3 is odd
5 is odd
7 is odd
9 is odd

The break keyword terminates the entire loop.

In [33]:
for value in range(1, 1 + 10):
    if value % 7 == 0:
        print "{} is a multiple of 7.".format(value)
        print "And I'm outta here."
        break
    print "{} is not a multiple of 7.".format(value)
1 is not a multiple of 7.
2 is not a multiple of 7.
3 is not a multiple of 7.
4 is not a multiple of 7.
5 is not a multiple of 7.
6 is not a multiple of 7.
7 is a multiple of 7.
And I'm outta here.

Container types


The simplest container type in Python is the list, which is a

  • mutable
  • dynamically-sized
  • ordered

sequence of zero or more arbitrary objects. It is declared using square brackets.

In [34]:
empty_list = []
grunge_bands = ["Alice In Chains", "Nirvana", 
                "Pearl Jam", "Soundgarden"]

While a Python list is mutable, it is not a linked list, but rather a dynamically allocated array; thus, lists make excellent stacks (via list.append and list.pop).

In [35]:
stack = [5, 6, 7]
stack.append(8)      # push onto the end
print stack
print stack.pop()    # pop from the end
print stack
[5, 6, 7, 8]
8
[5, 6, 7]

Caution: On the other hand, insertion to and deletion of objects in non-final positions is relatively slow. If you need to do many insertions and/or deletions in initial position, use the collections.deque ("double-ended queue") object instead.
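
A minimal sketch of collections.deque in use, with made-up contents:

```python
from collections import deque

queue = deque(["b", "c"])
queue.appendleft("a")     # constant-time insertion at the left end
queue.append("d")         # constant-time insertion at the right end
first = queue.popleft()   # constant-time removal from the left end
last = queue.pop()        # constant-time removal from the right end
remaining = list(queue)

print(first)
print(last)
print(remaining)
```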

To perform a "bulk append", use list.extend.

In [36]:
grunge_bands.extend(["Temple Of The Dog", "Hole", 
                     "Stone Temple Pilots"])
print grunge_bands
['Alice In Chains', 'Nirvana', 'Pearl Jam', 'Soundgarden', 'Temple Of The Dog', 'Hole', 'Stone Temple Pilots']

To sort a list in place, use list.sort.

In [37]:
grunge_bands.sort()  # strings are sorted lexicographically, of course
print grunge_bands
['Alice In Chains', 'Hole', 'Nirvana', 'Pearl Jam', 'Soundgarden', 'Stone Temple Pilots', 'Temple Of The Dog']

To reverse a list in place, use list.reverse.

In [38]:
grunge_bands.reverse()
print grunge_bands
['Temple Of The Dog', 'Stone Temple Pilots', 'Soundgarden', 'Pearl Jam', 'Nirvana', 'Hole', 'Alice In Chains']

Indexing and slicing

Python uses zero-based indexing for ordered containers.

In [39]:
print grunge_bands[0]
Temple Of The Dog

To get the length of a list, use the built-in function len.

In [40]:
last_element = grunge_bands[len(grunge_bands) - 1]  # not great

Negative indices are used to access elements with reference to the final element rather than with reference to the first.

In [41]:
last_element = grunge_bands[-1]  # much more idiomatic

Instead of a single integer index, we can obtain a list of multiple items. This is called slicing. A two-place slice consists of a lower and upper index, separated by a colon. The resulting slice is inclusive on the first index but exclusive on the second index.

In [42]:
print grunge_bands[2:5]  # returns three, not four, objects
['Soundgarden', 'Pearl Jam', 'Nirvana']

If the first index of a two-place slice is not specified, it is assumed to be 0 and thus, left-aligned.

In [43]:
print grunge_bands[:2]  # first two
['Temple Of The Dog', 'Stone Temple Pilots']

Similarly, if the second index of a two-place slice is not specified, it is assumed to be the length of the list, and thus, right-aligned.

In [44]:
print grunge_bands[1:]  # all but the first
['Stone Temple Pilots', 'Soundgarden', 'Pearl Jam', 'Nirvana', 'Hole', 'Alice In Chains']
In [45]:
print grunge_bands[-2:]  # the last two
['Hole', 'Alice In Chains']

It is also possible to perform a three-place slice. The third argument is the step size, expressed as a signed (non-zero) integer.

In [46]:
print grunge_bands[::2]   # every other element, starting with the first
['Temple Of The Dog', 'Soundgarden', 'Nirvana', 'Alice In Chains']
In [47]:
print grunge_bands[::-1]  # a copy of the list in reverse
['Alice In Chains', 'Hole', 'Nirvana', 'Pearl Jam', 'Soundgarden', 'Stone Temple Pilots', 'Temple Of The Dog']


A tuple is an

  • immutable
  • statically-sized
  • ordered

sequence of zero or more arbitrary objects. Parentheses and commas are used to initialize tuples.

In [48]:
favorite_snacks = ("Fresh Pineapple", "Salt & Vinegar Chips")

Caution: a single-value tuple cannot be defined simply by wrapping the argument with parentheses; a single comma must also be used.

In [49]:
even_primes_bad = (2)
print type(even_primes_bad)   # not going to work
even_primes_good = (2,)
print type(even_primes_good)
<type 'int'>
<type 'tuple'>

Tuples are often used as a complex return value for a function.

In [50]:
from math import modf
print modf(3.7)  # `modf` splits a real-valued float into fractional and integer components
(0.7000000000000002, 3.0)

Unlike lists, tuples are immutable.

In [51]:
favorite_snacks[1] = "Quinoa"  # not going to work
TypeError                                 Traceback (most recent call last)
<ipython-input-51-a702b4cf07e8> in <module>()
----> 1 favorite_snacks[1] = "Quinoa"  # not going to work

TypeError: 'tuple' object does not support item assignment

set and frozenset

The set and frozenset types are unordered containers backed by an open-addressed hashtable, allowing for (average-case) constant-time lookup.

Instances of set are

  • dynamically-sized
  • mutable

and implement methods like add and remove.

In [52]:
nineties_gift_ideas = set(["Nirvana CD", "CK One", 
                           "Plaid backpack", "Slap bracelet"])
nineties_gift_ideas.add("Trapper Keeper")
print nineties_gift_ideas
set(['CK One', 'Nirvana CD', 'Slap bracelet', 'Trapper Keeper', 'Plaid backpack'])

Instances of frozenset are similar to set, but are

  • statically-sized
  • immutable

In [53]:
my_parents = frozenset(["Mom", "Dad"])
my_parents.add("Sally???")
AttributeError                            Traceback (most recent call last)
<ipython-input-53-67883567a708> in <module>()
      1 my_parents = frozenset(["Mom", "Dad"])
----> 2 my_parents.add("Sally???")

AttributeError: 'frozenset' object has no attribute 'add'

Both types of set support set logic operations.

In [54]:
tall_people = set(["Abraham Lincoln", "Lyndon B. Johnson", 
                   "Dracula", "Wolfman", 
                   "Uma Thurman", "Bob Saget"])
monsters = set(["Mummy", "The Thing", "Dracula", "Wolfman"])
In [55]:
print tall_people & monsters  # intersection, i.e., tall monsters
set(['Dracula', 'Wolfman'])
In [56]:
print monsters <= tall_people  # is subset, i.e., are all monsters tall?
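
The remaining set logic operators work the same way. The sketch below uses two invented sets of small integers and the set literal syntax, which is available in Python 2.7 and later.

```python
evens = {0, 2, 4, 6, 8}           # set literal syntax
primes = frozenset([2, 3, 5, 7])  # a frozenset can be mixed with a set

union = evens | primes            # members of either
difference = evens - primes       # members of evens that are not in primes
symmetric = evens ^ primes        # members of exactly one of the two
is_subset = {2, 4} <= evens       # is every member of {2, 4} in evens?

print(sorted(union))
print(sorted(difference))
print(sorted(symmetric))
```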

Neither sets nor frozensets support indexing.

In [57]:
print monsters[0]  # not going to work
TypeError                                 Traceback (most recent call last)
<ipython-input-57-dfb9da438998> in <module>()
----> 1 print monsters[0]  # not going to work

TypeError: 'set' object does not support indexing


The Python dict ("dictionary") is a dynamically allocated open-addressed hash-table storing unordered key-value pairs. Keys must be unique and hashable, which in practice means immutable (like a number, str, tuple, or frozenset); values can be virtually anything (including other dictionaries). Curly braces { and } are used to initialize dictionaries. Key-value pairs are separated by commas, and key and value are separated by a colon.

In [58]:
empty_dict = {}
In [59]:
prez2vp = {"George W. Bush": "Dick Cheney",
           "Barack Obama": "Joe Biden"}

Dictionaries are used to map keys to values. A key's corresponding value is accessed by indexing the dictionary with the key.

In [60]:
print prez2vp["George W. Bush"]
Dick Cheney
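
Caution: assignment is a statement, not an expression, so it cannot be passed to print. To add or update a key, write the assignment on its own line, without print.

In [61]: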
print prez2vp["Bill Clinton"] = "Al Gore"
  File "<ipython-input-61-17676ca9bcde>", line 1
    print prez2vp["Bill Clinton"] = "Al Gore"
SyntaxError: invalid syntax

If a key is not present, a KeyError exception is raised.

In [62]:
print prez2vp["Kanye West"]
KeyError                                  Traceback (most recent call last)
<ipython-input-62-6210ac1ec05c> in <module>()
----> 1 print prez2vp["Kanye West"]

KeyError: 'Kanye West'

To avoid this, check for the presence of a key before attempting to index with it.

In [63]:
print "Ross Perot" in prez2vp
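
The membership check can also be folded into the lookup with dict.get, which returns a default value instead of raising an exception. A minimal sketch follows; the "(unknown)" default is my own placeholder.

```python
prez2vp = {"George W. Bush": "Dick Cheney",
           "Barack Obama": "Joe Biden"}

# Membership test before indexing.
if "Ross Perot" in prez2vp:
    perot_vp = prez2vp["Ross Perot"]
else:
    perot_vp = None

# dict.get performs the same check in one step, with an optional default.
kanye_vp = prez2vp.get("Kanye West", "(unknown)")
obama_vp = prez2vp.get("Barack Obama", "(unknown)")

print(perot_vp)
print(kanye_vp)
print(obama_vp)
```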


There are several ways to view the elements in a dictionary. One can iterate over

  • keys with dict.iterkeys()
  • values with dict.itervalues()
  • key-value pairs with dict.iteritems()
In [64]:
# over keys
for prez in prez2vp.iterkeys():  # or: `for prez in prez2vp:`
    print prez
George W. Bush
Barack Obama
In [65]:
# over values
for vp in prez2vp.itervalues():
    print vp
Dick Cheney
Joe Biden
In [66]:
# over key-value pairs; more on this syntax later
for (prez, vp) in prez2vp.iteritems():
    print (prez, vp)
('George W. Bush', 'Dick Cheney')
('Barack Obama', 'Joe Biden')

Caution: While it's tempting to iterate over keys, and look up the values in the loop, dict.iteritems() is both faster and more explicit.

In [67]:
# over key-value pairs, the BAD WAY
for prez in prez2vp.iterkeys():
    vp = prez2vp[prez]
    print (prez, vp)
('George W. Bush', 'Dick Cheney')
('Barack Obama', 'Joe Biden')

Container type-casting

The functions list, tuple, set, frozenset, and dict can all be used to cast one container to a different type.

In [68]:
x = ["fi", "fie", "foe", "fum"]
print tuple(x)
('fi', 'fie', 'foe', 'fum')
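
A few more conversions, as a sketch with invented data: casting to set drops duplicates, and dict accepts any iterable of key-value pairs.

```python
words = ["fi", "fie", "foe", "fum", "fie"]

unique_words = set(words)              # duplicates are dropped
sorted_unique = sorted(unique_words)   # sorted() always returns a list

pairs = [("one", 1), ("two", 2)]
lookup = dict(pairs)                   # each pair becomes a key-value entry
back_to_pairs = sorted(lookup.items())

print(sorted_unique)
print(back_to_pairs)
```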

Strings and regular expressions

String literals

Strings (instances of class str) are defined using single or double quotes. I prefer double quotes.

In [69]:
abacab = "Abacadabra"
shaz = 'Shazam'

For multi-line strings (such as documentation strings), delimit the string with triple double quotes.

In [70]:
def prove_fermat():
    """
    This function uses a flux capacitor and dilithium crystals to prove 
    Fermat's last theorem. Unfortunately, the complexity of this operation 
    is at least factorial.
    """
    raise NotImplementedError

help(prove_fermat)
Help on function prove_fermat in module __main__:

prove_fermat()
    This function uses a flux capacitor and dilithium crystals to prove 
    Fermat's last theorem. Unfortunately, the complexity of this operation 
    is at least factorial.

String methods

In Python, string instances have many more methods than in most languages you are probably familiar with. These are often used where you might use a simple regular expression in some other language.

In [71]:
print abacab == shaz           # value equality?
In [72]:
print abacab.startswith("Aba")  # does the string start with "Aba"?
In [73]:
print abacab.endswith("ra")     # does the string end with "ra"?
In [74]:
print "cad" in abacab           # does the substring exist?
In [75]:
print abacab.islower()          # is the string entirely lowercase alphabetic characters?
In [76]:
print abacab.upper()            # convert ASCII lowercase characters to uppercase
In [77]:
print "    Words        ".strip()  # remove leading or trailing whitespace
In [78]:
print str(23).zfill(4)           # left-pad a number string with zeros
In [79]:
print "It's the end of the world and I feel fine".split()  # whitespace split
["It's", 'the', 'end', 'of', 'the', 'world', 'and', 'I', 'feel', 'fine']
In [80]:
print "{:.04f} is a value I'd like to interpolate".format(3. / 4.)
0.7500 is a value I'd like to interpolate

String encoding and storage

In Python 2, str objects are represented internally as arrays of bytes. str.decode returns a unicode object, and unicode.encode returns str.

Python 3 change: str objects are encoded internally as Unicode. str.encode returns a bytes object (an array of bytes), and bytes.decode returns a str object. Most people find this more intuitive, so if you work with Unicode data a lot, you may want to try out Python 3.
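
A minimal sketch of the Python 3 round trip between str and bytes (this block is Python 3 only; the example word is my own):

```python
text = "na\u00efve"                 # a str of five Unicode code points
encoded = text.encode("utf-8")      # bytes; the non-ASCII character takes two bytes
decoded = encoded.decode("utf-8")   # back to the original str

print(len(text))
print(len(encoded))
print(decoded == text)
```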

If this is all Greek to you, no worries; we'll cover it in more depth in the NLP class.

Regular expressions

In [81]:
from re import finditer, findall, match, search, split, sub
farm_animal_noises = "ba ba ba oink oink moo oink oink oink"

The re module provides regular expressions using similar syntax to many other languages; by convention, regular expressions are written as raw string literals (prefixed with an r), so that backslashes reach the regular expression engine unchanged. It is safe to state a regular expression in string form even if it will be used many times, because Python caches compiled regular expressions.

In [82]:
sheep_regex = r"(ba(\s+ba)*)"
pig_regex = r"(oink(\s+oink)*)"
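
Although the cache makes it unnecessary in most cases, a regular expression can also be compiled explicitly with re.compile; the resulting pattern object has match, search, and similar methods of its own. A sketch, with an invented test string:

```python
import re

sheep_pattern = re.compile(r"ba(\s+ba)*")

# The pattern object's methods mirror the module-level functions.
m = sheep_pattern.match("ba ba ba oink")
matched = m.group() if m else None

# match only succeeds at the left edge of the string.
no_match = sheep_pattern.match("oink ba")

print(matched)
print(no_match)
```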

The match function applies a regular expression to a string, and returns a MatchObject instance if the regular expression matches the left edge of the string.

In [83]:
m = match(sheep_regex, farm_animal_noises)
if m:
    print m.group()
ba ba ba
In [84]:
m = match(pig_regex, farm_animal_noises)  # no match
if m:
    print m.group()

The search function applies a regular expression to a string, and returns a MatchObject instance if the regular expression matches any part of the string; the resulting MatchObject will describe the first match.

In [85]:
m = search(pig_regex, farm_animal_noises)
if m:
    print m.group()  # but not the second sequence of oinks
oink oink

The finditer function applies a regular expression to a string, and returns an iterator of MatchObject instances representing all non-overlapping matches.

In [86]:
for m in finditer(pig_regex, farm_animal_noises):
    print m.group()
oink oink
oink oink oink

The findall function is similar to finditer, but instead of returning an iterator of MatchObject instances, it returns a list of tuples containing the capture groups.

In [87]:
for groups in findall(pig_regex, farm_animal_noises):
    print groups
('oink oink', ' oink')
('oink oink oink', ' oink')

The sub function applies a regular expression to a string, and when a match is found, the match is replaced with another string. Backreferences to capture groups are also supported. Note the argument order: the pattern comes first, then the replacement string, then the string to be searched.

In [88]:
print sub(pig_regex, "*OINK*", farm_animal_noises)
ba ba ba *OINK* moo *OINK*
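
Backreferences can be sketched as follows. In the replacement string, \1 stands for the text of the first capture group; sub also accepts a function in place of the replacement string. The patterns here are illustrative variants of the ones above.

```python
import re

farm_animal_noises = "ba ba ba oink oink moo oink oink oink"

# `\1` in the replacement refers to the first capture group of the match.
bracketed = re.sub(r"(moo)", r"<\1>", farm_animal_noises)

# A replacement function receives each match object instead.
counted = re.sub(r"oink(\s+oink)*",
                 lambda m: "oink x{}".format(len(m.group().split())),
                 farm_animal_noises)

print(bracketed)
print(counted)
```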

The split function applies a regular expression to a string, splitting it into substrings at the point of any match.

In [89]:
print split(r"\s+", farm_animal_noises) # =: `farm_animal_noises.split()`
['ba', 'ba', 'ba', 'oink', 'oink', 'moo', 'oink', 'oink', 'oink']


The def keyword is used to define functions.

Unlike some languages (e.g., Java), Python does not permit function overloading. However, it does permit keyword arguments with default values.

In [90]:
def sum_of_squares(iterable, default_value=0):
    return sum((item * item for item in iterable), default_value)

print sum_of_squares([2, 5, 7, 9])

Caution: do not attempt to initialize any sort of mutable instance within a default argument; the default value is evaluated only once, at function definition time, and the same object is then shared by every call that omits the argument.

In [91]:
def add5tolist_crazy(mylist=[]):   # doesn't do what you expect!
    mylist.append(5)
    return mylist

print add5tolist_crazy()
print add5tolist_crazy()
print add5tolist_crazy()
[5]
[5, 5]
[5, 5, 5]

If you want the default argument to be initialized every time the function is called, set it to None and then initialize it in the function body.

In [92]:
def add5tolist_sane(mylist=None):   # probably what you had in mind
    if mylist is None:
        mylist = []
    mylist.append(5)
    return mylist

print add5tolist_sane()
print add5tolist_sane()
print add5tolist_sane()

Functions may accept an arbitrary number of arguments. To enable an arbitrary number of non-keyword arguments, add *args to the argument list; all non-keyword arguments in that position will be bound to a tuple called args.

In [93]:
def first_and_rest(*args):
    return (args[0], args[1:])

print first_and_rest(0, 5, 6, 7, 9, 12, 14, 15)
(0, (5, 6, 7, 9, 12, 14, 15))

Similarly, to admit an arbitrary number of keyword arguments, add **kwargs to the argument list; all keyword arguments not yet bound will be bound to a dictionary called kwargs.
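
Both forms can be combined in one signature. In this sketch, describe is a made-up function that simply returns what it was given; the same * and ** prefixes can also be used at the call site to unpack a sequence or dictionary into arguments.

```python
def describe(*args, **kwargs):  # hypothetical example function
    """Returns the positional and keyword arguments it was called with."""
    return (args, kwargs)

(positional, keyword) = describe(1, 2, color="red", size=3)

# The same prefixes unpack a sequence or dictionary at the call site.
options = {"color": "blue"}
(positional2, keyword2) = describe(*[7, 8], **options)

print(positional)
print(positional2)
```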


Functions can only return one object per call, and iterators can only return one object per iteration. However, Python supports destructuring (i.e., unpacking) of complex objects returned by function calls or yielded by iterators so that these complex objects are mapped onto multiple values.

In [94]:
(first, rest) = first_and_rest(2, 5, 9, 23, 27)
print rest
(5, 9, 23, 27)
In [95]:
for (prez, vp) in prez2vp.iteritems():   # repeated from above
    print (prez, vp)
('George W. Bush', 'Dick Cheney')
('Barack Obama', 'Joe Biden')

Python 3 change: the * prefix before a variable name in an unpacking expression causes that variable to be bound to a list of the remaining items from the right-hand side (e.g., (head, *tail) = iterable).
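
A short sketch of starred unpacking (Python 3 only). Note that the starred name always receives a list, whatever the type of the right-hand side, and that it may appear in the middle of the targets.

```python
(head, *tail) = [2, 5, 9, 23, 27]
(first, *middle, last) = "abcde"    # any iterable works, including a str

print(head)
print(tail)
print(middle)
```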

Generators and itertools


Generators are a special class of functions which return generator objects. Generators are essentially functions that are lazily evaluated, meaning that the return values are computed when needed, not when the function call is evaluated; similarly, a generator object (the type of object that is returned by a generator) is a container for these lazily computed return values. The generator object produces these return values by storing the state of an incompletely evaluated function.

There are many uses for generators:

  • infinite functions: traditional functions must halt, but generator calls need not halt
  • buffered input-output: generators are useful when the function does work that is faster when done in large batches
  • memory management: traditional functions must produce all return values before returning, but generator objects can produce one return value at a time

Generators can be identified by the use of yield statements. Each yield statement that is executed will produce one object in the resulting generator object.

In [96]:
def fibonacci():
    """
    Neverending Fibonacci number generator according to the 
    OEIS:A000045 definition.
    """
    lower = 0
    upper = 1
    while True:
        yield lower
        summed = lower + upper
        lower = upper
        upper = summed

When we call a generator like fibonacci, an empty generator object is created and argument variables are bound, but no other work is done.

In [97]:
fibgen = fibonacci()
print fibgen
<generator object fibonacci at 0x1098a8cd0>

The next function causes the generator function to evaluate until it hits a yield statement. At this point, the value of the yield statement is returned and control is returned to the caller until next is called again.

In [98]:
print next(fibgen)
In [99]:
print next(fibgen)
In [100]:
print next(fibgen)
In [101]:
print next(fibgen)

Generator objects can be advanced like this until (or unless) the generator function returns, either by reaching an empty return statement or the end of its body, at which point next raises StopIteration, or until some other exception is raised. In the case of the fibonacci generator, it will run until the system runs out of memory.
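
Here is a sketch of both cases, using two made-up generators: countdown is finite and eventually raises StopIteration, while naturals never halts and so must be cut off by the caller, here with itertools.islice.

```python
from itertools import islice

def countdown(n):  # hypothetical finite generator
    """Generator yielding n, n - 1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
values = [next(gen), next(gen)]
try:
    next(gen)              # the function body has finished
    exhausted = False
except StopIteration:
    exhausted = True

def naturals():  # hypothetical infinite generator
    """Neverending generator of 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

# islice lazily takes the first five values without running forever.
first_five = list(islice(naturals(), 5))

print(values)
print(exhausted)
print(first_five)
```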

Even when a function is finite, generators tend to be more terse than their traditional alternative, since there is no need to initialize containers.

In [102]:
def triangular(x):
    """
    The `x`th triangular number is the number of dots composing a 
    triangle with `x` dots on a side
    """
    return (x * x + x) // 2

def first_triangular_traditional(n):
    """Traditional function returning the first `n` triangular numbers"""
    retval = []
    for x in range(1, 1 + n):
        retval.append(triangular(x))
    return retval

def first_triangular_generator(n):
    """Generator producing the first `n` triangular numbers"""
    for x in range(1, 1 + n):
        yield triangular(x)

Finite generator objects can be evaluated all at once by casting them to a container type, like a list or set.

In [103]:
print list(first_triangular_generator(10))
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]

This is sometimes slightly faster than the traditional function variant returning a container type, as in the timings below, though the difference is small and will vary by machine and Python version.

In [104]:
%timeit list(first_triangular_traditional(10000))
100 loops, best of 3: 6.48 ms per loop
In [105]:
%timeit list(first_triangular_generator(10000))
100 loops, best of 3: 6.07 ms per loop

Generator objects can also be used in iteration contexts.

In [106]:
for tri in first_triangular_generator(5):
    print tri


The itertools module contains many functions producing complex generator objects. A few examples follow.

In [107]:
from itertools import combinations, combinations_with_replacement, permutations, product, repeat 
In [108]:
print " ".join(repeat("na", 5))
na na na na na
In [109]:
print list(product("ABCD", "XYZ"))
[('A', 'X'), ('A', 'Y'), ('A', 'Z'), ('B', 'X'), ('B', 'Y'), ('B', 'Z'), ('C', 'X'), ('C', 'Y'), ('C', 'Z'), ('D', 'X'), ('D', 'Y'), ('D', 'Z')]
In [110]:
print list(combinations("ABCD", 2))
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
In [111]:
print list(combinations_with_replacement("ABCD", 2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'C'), ('C', 'D'), ('D', 'D')]
In [112]:
print list(permutations("ABCD", 2))
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'A'), ('B', 'C'), ('B', 'D'), ('C', 'A'), ('C', 'B'), ('C', 'D'), ('D', 'A'), ('D', 'B'), ('D', 'C')]

Object-oriented programming idioms

Python supports the creation of classes (i.e., user-defined types of object). Instances of a class have their own state (instance variables) and functions (instance methods) which have access to this state. Functions of a class may also be inherited from one or more superclasses. The topmost node in the class hierarchy is object; all classes inherit from it.

Defining classes

The following is a (trivial) class declaration.

In [113]:
class Dog(object):
    def __init__(self, name):
        self.name = str(name)
    def bark(self):
        return "woof!"
    def whos_a_good_boy(self):
        return self.name

To initialize an instance of the Dog class, we call it as if it were a function using the signature of the constructor instance method, called __init__.

In [114]:
fido = Dog("Fido")
print fido.bark()

Every instance method takes self as its first argument. The ultimate reason for this is that Python treats classes as namespaces, and instance methods as syntactic sugar; for instance, fido.bark() is shorthand for Dog.bark(fido).
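
The equivalence can be verified directly. This sketch repeats a pared-down version of the Dog class so that it runs on its own.

```python
class Dog(object):
    def __init__(self, name):
        self.name = str(name)

    def bark(self):
        return "woof!"

fido = Dog("Fido")
sugar = fido.bark()          # the instance is passed as `self` implicitly
explicit = Dog.bark(fido)    # the same call with `self` passed explicitly

print(sugar == explicit)
```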

As seen in the __init__ and whos_a_good_boy functions above, self is also used as a prefix for defining or accessing instance variables.


Rather than inheriting from object directly, a class can inherit methods and/or variables from another subclass of object.

In [115]:
class FarmAnimal(object):
    """A farm animal with a name"""
    def __init__(self, name):
        self.name = str(name)

class Dog(FarmAnimal):
    """A dog that barks and knows it's a good boy"""
    # __init__ is now inherited from superclass
    def bark(self):
        return 'woof!'
    def whos_a_good_boy(self):
        return self.name

fido = Dog("Fido") # calls FarmAnimal.__init__

Inheritance can also be chained over multiple levels.

In [116]:
class ShepherdDog(Dog):
    """A Dog which can also herd (sort) sheep"""
    def herd(self, sheep):
        return sorted(sheep)

rex = ShepherdDog("Rex") # still calls FarmAnimal.__init__

Classes also can inherit from multiple superclasses which are not themselves in a superclass/subclass relationship, assuming Python is able to determine an unambiguous method resolution order. This is perhaps most useful when the superclasses are partially abstract, and are "mixed in" to grant multiple unrelated classes the same abilities.

In [117]:
class Oinker(object):
    """Abstract class with oinking ability"""
    def oink(self):
        return "oink!"

class OinkingDog(Dog, Oinker):
    """A Dog which can oink"""
    pass # no new methods; everything is inherited

rover = OinkingDog("Rover")
print rover.oink()
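
The method resolution order that Python settles on can be inspected through the __mro__ attribute of a class. The sketch below redefines stripped-down versions of the classes above so that it runs on its own.

```python
class FarmAnimal(object):
    def __init__(self, name):
        self.name = str(name)

class Dog(FarmAnimal):
    def bark(self):
        return "woof!"

class Oinker(object):
    def oink(self):
        return "oink!"

class OinkingDog(Dog, Oinker):
    pass

# Attribute lookup walks these classes in order, from left to right.
mro_names = [cls.__name__ for cls in OinkingDog.__mro__]

rover = OinkingDog("Rover")
noises = (rover.bark(), rover.oink())

print(mro_names)
print(noises)
```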


Class definitions may also redefine (overload) the behavior of operators or certain built-in functions (like len) via magic methods.

In [118]:
class Band(object):
    """A band represented by a band name and a list of members"""
    def __init__(self, name, members):
        self.name = str(name)
        self.members = list(members)
    def add_member(self, member):
        """Used when a new member joins the band"""
        self.members.append(member)
    def remove_member(self, member):
        """Used when a member quits the band"""
        self.members.remove(member)
    def __repr__(self):
        """Produces the string used to print the object (in most cases)"""
        return "{}(name={!r}, members={!r})".format(self.__class__.__name__,
                                                    self.name, self.members)
    def __len__(self):
        """
        Returns the length of the instance when called as the argument 
        to `len`
        """
        return len(self.members)
    def __iter__(self):
        """
        Iterates over instance, when instance is placed in an iteration
        context or when called as the argument to `iter`
        """
        return iter(self.members)
    def __contains__(self, member):
        return member in self.members
    def __delitem__(self, member):
        """
        Removes a member from the band, when `del self[member]` is 
        called; this just mimics `self.remove_member(member)`
        """
        self.remove_member(member)
    def jam(self):
        raise NotImplementedError
In [119]:
band = Band("Holy Rollers", ["Yulia Chang", "Chris Schmutte", 
                             "Sluggo", "Rebop"])
band.remove_member("Rebop")            # he spontaneously combusted
band.add_member("Claire St. Huffins")  # so we needed a new drummer
In [120]:
for member in band:   # iteration context
    print member
Yulia Chang
Chris Schmutte
Sluggo
Claire St. Huffins
In [121]:
print "Woody Smith" in band  # __contains__ context

For a full list of magic methods, see the excellent A Guide To Magic Methods by R. Kettler.

Functional programming idioms

Python supports a number of functional programming idioms. This is possible since functions are objects too, and can be passed as arguments to other functions.
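
To make this concrete, here is a small sketch of a function being bound to a new name and passed as an argument. The names apply_twice, add_three, and shout are made up for illustration; the code should run under both Python 2.7 and Python 3.

```python
def apply_twice(fnc, value):
    """Applies the one-place function `fnc` to `value`, twice"""
    return fnc(fnc(value))

def add_three(x):
    return x + 3

shout = str.upper                  # a function is an object like any other
print(shout("oink"))               # OINK
print(apply_twice(add_three, 10))  # 16
```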


The filter function provides a general schema for filtering out items from an iterable. It works a bit like the following:

In [122]:
def my_filter(fnc, iterable):
    retval = []
    for item in iterable:
        if fnc(item):
            retval.append(item)
    return retval

The first argument to filter is a one-place function which is applied to every item in the second argument, an iterable. If the return value evaluates to True, the item is copied into the new iterable; otherwise it is left out.
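
Note that the function need not return a literal True or False; any value that evaluates to True will do. The sketch below (with a made-up word list) passes the built-in len as the first argument, so empty strings, whose length 0 evaluates to False, are dropped. The call is wrapped in list so that the result is a list under Python 3 as well as Python 2.

```python
words = ["pig", "", "dog", "", "cow"]  # hypothetical data
# len returns 0 for the empty strings, and 0 evaluates to False
nonempty = list(filter(len, words))
print(nonempty)  # ['pig', 'dog', 'cow']
```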

The lambda keyword is used to define simple one-line anonymous functions, such as the ones you might want to pass to filter.

In [123]:
print filter(lambda x: x % 2 == 0, xrange(1, 1 + 20))
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

Python 3 change: filter now returns a lazy iterator rather than a list; wrap the call in list() to recover the Python 2 behavior.


The map function provides a general schema for applying a one-place function to all elements in an iterable. It works a bit like the following:

In [124]:
def my_map(fnc, iterable):
    retval = []
    for item in iterable:
        retval.append(fnc(item))
    return retval

Like filter, the first argument to map is a one-place function which is applied to every item in the second argument, an iterable. And like filter, the lambda keyword is often useful here.

In [125]:
print map(lambda x: (x * x + x) // 2, xrange(1, 1 + 20))
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210]

Python 3 change: map now returns a lazy iterator rather than a list; wrap the call in list() to recover the Python 2 behavior.
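
One practical consequence, sketched here assuming Python 3, is that the object returned by map (or filter) can only be traversed once. The variable names are illustrative.

```python
# assumes Python 3; under Python 2, map returns a list and both passes match
squares = map(lambda x: x * x, range(1, 1 + 5))
first_pass = list(squares)   # [1, 4, 9, 16, 25]
second_pass = list(squares)  # []: the iterator is already exhausted
print(first_pass)
print(second_pass)
```

If the results are needed more than once, store the list rather than the iterator.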

filter and map via comprehensions

However, filter and map have been somewhat superseded by list, dictionary, and set comprehensions.

In [126]:
print [x for x in xrange(1, 1 + 20) if x % 2 == 0]
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
In [127]:
def triangular(x):  # same computation as the lambda passed to map above
    return (x * x + x) // 2
print [triangular(x) for x in xrange(1, 1 + 20)]
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210]
In [128]:
print [triangular(x) for x in xrange(1, 1 + 20) if x % 2 == 0]
[3, 10, 21, 36, 55, 78, 105, 136, 171, 210]
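
Dictionary and set comprehensions use the same syntax inside curly braces. The following is a minimal sketch; it assumes triangular computes the same value as the lambda passed to map earlier, and the variable names are made up for illustration. It should run under both Python 2.7 and Python 3.

```python
def triangular(x):  # assumed definition, matching the lambda used with map
    return (x * x + x) // 2

# dictionary comprehension: maps each x onto its triangular number
x2triangular = {x: triangular(x) for x in range(1, 1 + 5)}
print(x2triangular == {1: 1, 2: 3, 3: 6, 4: 10, 5: 15})  # True
# set comprehension: duplicate final digits are discarded
final_digits = {triangular(x) % 10 for x in range(1, 1 + 20)}
print(sorted(final_digits))  # [0, 1, 3, 5, 6, 8]
```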

sum and reduce

The built-in function sum adds together the numbers in an iterable.

In [129]:
print sum(xrange(1, 1 + 20))
210

In an imperative style, we would write this using a single for loop and an accumulator variable (retval) defined outside of the loop.

In [130]:
def my_sum(iterable):
    retval = 0
    for item in iterable:
        retval = retval + item  # equivalently: retval += item
    return retval

The reduce function generalizes this schema to allow for an arbitrary two-place function inside the loop. It works a bit like the following:

In [131]:
def my_reduce(fnc, iterable, default_val):
    retval = default_val
    for item in iterable:
        retval = fnc(retval, item)
    return retval

This is in fact the signature of reduce, except that the third argument (default_val) is optional. If it is present, it is treated as if it were the first item in iterable; if it is absent, the first item of iterable itself serves as the starting value.

In [132]:
print reduce(lambda x, y: x + y, range(1, 1 + 20))
210

Python 3 change: reduce has been relegated to the functools module, on the grounds that reduce calls tend to be somewhat inscrutable and are often better written via explicit loops.
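
As a sketch of both the import and the optional third argument, the following computes a product rather than a sum. It uses mul from the standard operator module in place of a lambda; the variable names are illustrative. Importing reduce from functools also works in Python 2.6 and later, so the code should be portable.

```python
from functools import reduce  # required in Python 3
from operator import mul      # two-place multiplication function

product = reduce(mul, range(1, 1 + 5))  # ((((1 * 2) * 3) * 4) * 5)
print(product)                          # 120
# with a starting value, an empty iterable simply yields that value;
# without one, reduce raises a TypeError on an empty iterable
empty_product = reduce(mul, [], 1)
print(empty_product)                    # 1
```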

Functional tricks with dictionaries

Dictionaries are often used to store per-item numeric "scores". When combined with functional built-in functions like max and anonymous functions defined using the lambda keyword, it is very easy to extract summary information from such a dictionary.

In [133]:
oldestpeople2age = {"Misao Okawa": 116.627,
                    "Gertrude Weaver": 116.296,
                    "Jeralean Talley": 115.412,
                    "Nabi Tajima": 114.211}
In [134]:
(oldest_person, oldest_age) = max(oldestpeople2age.iteritems(),
                                  key=lambda kv: kv[1])
print oldest_person
Misao Okawa
In [135]:
from heapq import nlargest  # faster than full sorting
k_oldest = nlargest(2, oldestpeople2age.iteritems(),
                    key=lambda kv: kv[1])
print k_oldest
[('Misao Okawa', 116.627), ('Gertrude Weaver', 116.296)]
In [136]:
oldest_in_order = sorted(oldestpeople2age.iteritems(),
                         key=lambda kv: kv[1],
                         reverse=True)
print oldest_in_order
[('Misao Okawa', 116.627), ('Gertrude Weaver', 116.296), ('Jeralean Talley', 115.412), ('Nabi Tajima', 114.211)]
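
The same idioms can be written portably. This sketch repeats the dictionary above and uses items in place of iteritems, which no longer exists in Python 3; it also shows an alternative that passes the dictionary's own get method as the key function. The variable names youngest_person and youngest_age are made up for illustration.

```python
oldestpeople2age = {"Misao Okawa": 116.627,
                    "Gertrude Weaver": 116.296,
                    "Jeralean Talley": 115.412,
                    "Nabi Tajima": 114.211}

# items() works under both Python 2 and Python 3
(youngest_person, youngest_age) = min(oldestpeople2age.items(),
                                      key=lambda kv: kv[1])
print(youngest_person)  # Nabi Tajima
# iterating over a dictionary yields its keys; get looks up each score
oldest_person = max(oldestpeople2age, key=oldestpeople2age.get)
print(oldest_person)    # Misao Okawa
```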

For more information...

  • J. Campbell, P. Gries, J. Montojo, and G. Wilson. 2013. Practical Programming: An Introduction to Computer Science Using Python 3. Frisco, TX: The Pragmatic Programmers.
  • R. Kettler. 2012. A Guide to Python's Magic Methods. URL: http://www.rafekettler.com/magicmethods.html
  • A. Kuchling. 2007. Python's Dictionary Implementation: Being All Things to All People. In A. Oram and G. Wilson (eds.), Beautiful Code: Leading Programmers Explain How They Think, 239-301. Sebastopol, CA: O'Reilly Media.
  • C. Lignos. 2014. Anti-patterns in Python. URL: http://lignos.org/py_antipatterns/
  • G. van Rossum, B. Warsaw, and N. Coghlan. 2001. PEP 8: Style Guide for Python Code. URL: http://legacy.python.org/dev/peps/pep-0008/

...and the Python 2 and Python 3 documentation.