A Fast (and Furious) Introduction to Python

The notebook is better viewed rendered as slides. You can convert it to slides and view them by:

This and other related IPython notebooks can be found at the course github repository:

A fair question: Why Python?

  • Multi-platform language.
  • Supports several programming styles: object-oriented, imperative, functional, and procedural.
  • Dynamic type system.
  • Automatic memory management.
  • Large and comprehensive standard library.

Python works out of the box

Many of the things I used to use a calculator for, I now use Python for:

In [1]:
23423+4345
Out[1]:
27768
In [2]:
(50-5*6)/4
Out[2]:
5.0

(If you're typing this into an IPython notebook, or otherwise using a notebook file, you hit shift-Enter to evaluate a cell.)

There are some gotchas compared to using a normal calculator.

In [3]:
7/3
Out[3]:
2.3333333333333335

In Python 2, integer division, like C or Fortran integer division, truncates the remainder and returns an integer, so 7/3 would give 2. In Python 3, which produced the output above, the / operator returns a floating point number. You can get the Python 3 behavior in Python 2 by importing it from the __future__ module:

from __future__ import division

Alternatively, you can convert one of the integers to a floating point number, in which case the division function returns another floating point number.

In [4]:
7/3.
Out[4]:
2.3333333333333335
In [5]:
7/float(3)
Out[5]:
2.3333333333333335
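
As a quick sketch of the operators involved (assuming Python 3, which the outputs in this notebook suggest), / performs true division, while // performs floor division and % gives the remainder:

```python
# Assumes Python 3 semantics for the division operators.
true_quotient = 7 / 3      # true division: always returns a float
floor_quotient = 7 // 3    # floor division: rounds down to an integer
remainder = 7 % 3          # remainder left over by the floor division

print(true_quotient, floor_quotient, remainder)

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 3
print(negative_floor)
```

Note that // rounds down, so -7 // 3 is -3 rather than -2, which differs from the truncating integer division of C.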

The import statement

In the last few lines, we have sped by a lot of things that we should stop for a moment and explore a little more fully. We've seen, however briefly, two different data types: integers, also known as whole numbers to the non-programming world, and floating point numbers, also known (incorrectly) as decimal numbers to the rest of the world.

We've also seen the first instance of an import statement. Python has a huge number of libraries included with the distribution. To keep things simple, most of these variables and functions are not accessible from a normal Python interactive session. Instead, you have to import the name. For example, there is a math module containing many useful functions. To access, say, the square root function, you can either first

from math import sqrt

and then

In [6]:
sqrt(81)
Out[6]:
9.0

or you can simply import the math library itself

In [7]:
import math
math.sqrt(81)
Out[7]:
9.0
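
The two import forms can also rename what they import. The following is a small sketch of that variation; the alias names m and square_root are arbitrary choices made for illustration:

```python
import math as m                            # bind the module to a shorter name
from math import pi, sqrt as square_root    # import names, optionally renamed

print(m.sqrt(81))        # access through the module alias
print(square_root(81))   # access through the renamed function
print(pi)                # a constant imported directly by name
```

Importing the module itself keeps the origin of each function visible at the call site, while importing individual names keeps expressions shorter.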

You can define variables using the equals (=) sign:

In [8]:
width = 20
length = 30
area = length*width
area
Out[8]:
600

If you try to access a variable that you haven't yet defined, you get an error:

In [9]:
volume
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-0c7fc58f9268> in <module>()
----> 1 volume

NameError: name 'volume' is not defined

and you need to define it:

In [10]:
depth = 10
volume = area*depth
volume
Out[10]:
6000

You can name a variable almost anything you want. It needs to start with an alphabetical character or "_", and can contain alphanumeric characters plus underscores ("_"). Certain words, however, are reserved for the language. In Python 3 these are (Python 3.7 and later also reserve async and await):

False, None, True, and, as, assert, break, class, continue, def, del, elif,
else, except, finally, for, from, global, if, import, in, is, lambda,
nonlocal, not, or, pass, raise, return, try, while, with, yield

Trying to define a variable using one of these will result in a syntax error:

In [11]:
return = 0
  File "<ipython-input-11-2b99136d4ec6>", line 1
    return = 0
           ^
SyntaxError: invalid syntax
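
Since the exact set of reserved words depends on the Python version, one way to check it is to ask the interpreter itself. A minimal sketch using the standard keyword module:

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
print(keyword.kwlist)

# keyword.iskeyword() checks a single candidate name.
print(keyword.iskeyword("return"))   # a reserved word
print(keyword.iskeyword("volume"))   # an ordinary identifier
```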

The Python Tutorial has more on using Python as an interactive shell. The IPython tutorial makes a nice complement to this, since IPython has a much more sophisticated interactive shell.

Strings

Strings are sequences of characters, and can be defined using either single quotes

In [12]:
'Hello, World!'
Out[12]:
'Hello, World!'

or double quotes

In [13]:
"Hello, World!"
Out[13]:
'Hello, World!'

But not both at the same time, unless you want one of the symbols to be part of the string.

In [14]:
"He's a Rebel"
Out[14]:
"He's a Rebel"
In [15]:
'''fasdfasdf
dsfasd
fasdf
asdfasdf'''
Out[15]:
'fasdfasdf\ndsfasd\nfasdf\nasdfasdf'

Just like the other two data objects we're familiar with (ints and floats), you can assign a string to a variable

In [16]:
greeting = "Hello, World!"

The print function is often used for printing character strings:

In [17]:
print(greeting)
Hello, World!

But it can also print data types other than strings:

In [18]:
print("The area is ", area)
The area is  600

In the above snippet, the number 600 (stored in the variable "area") is converted into a string before being printed out.

You can use the + operator to concatenate strings together:

In [19]:
statement = "Hello," + "World!"
print(statement)
Hello,World!

Don't forget the space between the strings, if you want one there.

In [20]:
statement = "Hello, " + "World!"
print(statement)
Hello, World!

You can use + to concatenate multiple strings in a single statement:

In [21]:
print("This " + "is " + "a " + "longer " + "statement.")
This is a longer statement.

If you have a lot of words to concatenate together, there are other, more efficient ways to do this. But this is fine for linking a few strings together.
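
One of those more efficient ways is the join() method of strings. A minimal sketch, assuming the words are already collected in a list:

```python
words = ["This", "is", "a", "longer", "statement."]

# str.join() builds the result in one pass instead of creating a new
# intermediate string for every + operation.
statement = " ".join(words)
print(statement)
```

The string that join() is called on (here a single space) is placed between the items, so there is no need to remember the trailing spaces.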

Lists

Very often in a programming language, one wants to keep a group of similar items together. Python does this using a data type called lists.

In [22]:
days_of_the_week = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]

You can access members of the list using the index of that item:

In [23]:
days_of_the_week[2]
Out[23]:
'Tuesday'

Python lists, like C, but unlike Fortran, use 0 as the index of the first element of a list. Thus, in this example, the 0 element is "Sunday", 1 is "Monday", and so on. If you need to access the nth element from the end of the list, you can use a negative index. For example, the -1 element of a list is the last element:

In [24]:
days_of_the_week[-1]
Out[24]:
'Saturday'

You can add additional items to the list using the .append() command:

In [25]:
languages = ["Fortran","C","C++"]
languages.append("Python")
print(languages)
['Fortran', 'C', 'C++', 'Python']

The range() command is a convenient way to make sequential lists of numbers:

In [26]:
range(10)
Out[26]:
range(0, 10)

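Note that range(n) starts at 0 and gives the sequence of integers less than n. In Python 3, range() returns a range object rather than a list, which is why the output is displayed as range(0, 10). If you want to start at a different number, use range(start,stop).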

In [27]:
range(2,8)
Out[27]:
range(2, 8)
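
The outputs above display the range itself rather than its elements. To see the individual numbers, one option is to convert the range with list(), as in this small sketch:

```python
first_ten = list(range(10))        # integers 0 through 9
two_to_seven = list(range(2, 8))   # the stop value 8 is not included

print(first_ten)
print(two_to_seven)
```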

The ranges created above have a step of 1 between elements. You can also give a fixed step size via a third argument:

In [28]:
evens = range(0,20,2)
evens
Out[28]:
range(0, 20, 2)
In [29]:
evens[3]
Out[29]:
6

You can find out how long a list is using the len() command:

In [30]:
len(evens)
Out[30]:
10

Lists do not have to hold the same data type. For example,

In [31]:
["Today",7,99.3,""]
Out[31]:
['Today', 7, 99.3, '']

However, it's good (but not essential) to use lists for similar objects that are somehow logically connected. If you want to group different data types together into a composite data object, it's best to use tuples, which we will learn about below.

Iteration, Indentation, and Blocks

One of the most useful things you can do with lists is to iterate through them, i.e. to go through each element one at a time. To do this in Python, we use the for statement:

In [32]:
for day in days_of_the_week:
    print(day)
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday

This code snippet goes through each element of the list called days_of_the_week and assigns it to the variable day. It then executes everything in the indented block (in this case only one line of code, the print statement) using those variable assignments. When the program has gone through every element of the list, it exits the block.

(Almost) every programming language defines blocks of code in some way. In Fortran, one uses END statements (ENDDO, ENDIF, etc.) to define code blocks. In C, C++, and Perl, one uses curly braces {} to define these blocks.

Python uses a colon (":"), followed by an increased indentation level, to define code blocks. Every line at that indentation level (or deeper) is taken to be in the same block. In the above example the block was only a single line, but we could have had longer blocks as well:

In [33]:
for day in days_of_the_week:
    statement = "Today is " + day
    print(statement)
Today is Sunday
Today is Monday
Today is Tuesday
Today is Wednesday
Today is Thursday
Today is Friday
Today is Saturday

The range() command is particularly useful with the for statement to execute loops of a specified length:

In [34]:
for i in range(20):
    print("The square of ", i, " is ", i*i)
The square of  0  is  0
The square of  1  is  1
The square of  2  is  4
The square of  3  is  9
The square of  4  is  16
The square of  5  is  25
The square of  6  is  36
The square of  7  is  49
The square of  8  is  64
The square of  9  is  81
The square of  10  is  100
The square of  11  is  121
The square of  12  is  144
The square of  13  is  169
The square of  14  is  196
The square of  15  is  225
The square of  16  is  256
The square of  17  is  289
The square of  18  is  324
The square of  19  is  361

Slicing

Lists and strings have something in common that you might not suspect: they can both be treated as sequences. You already know that you can iterate through the elements of a list. You can also iterate through the letters in a string:

In [35]:
for letter in "Sunday":
    print(letter)
S
u
n
d
a
y

This is only occasionally useful. Slightly more useful is the slicing operation, which you can also use on any sequence. We already know that we can use indexing to get the first element of a list:

In [36]:
days_of_the_week[0]
Out[36]:
'Sunday'

If we want the list containing the first two elements of a list, we can do this via

In [37]:
days_of_the_week[0:2]
Out[37]:
['Sunday', 'Monday']

or simply

In [38]:
days_of_the_week[:2]
Out[38]:
['Sunday', 'Monday']

If we want the last two items of the list, we can do this with negative slicing:

In [39]:
days_of_the_week[-2:]
Out[39]:
['Friday', 'Saturday']

which is somewhat logically consistent with negative indices accessing the last elements of the list.

You can also slice out the middle of a list:

In [40]:
workdays = days_of_the_week[1:6]
print(workdays)
['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

Since strings are sequences, you can also do this to them:

In [41]:
day = "Sunday"
abbreviation = day[:3]
print(abbreviation)
Sun

If we really want to get fancy, we can pass a third element into the slice, which specifies a step length (just like a third argument to the range() function specifies the step):

In [42]:
numbers = range(0,40)
evens = numbers[2::2]
evens
Out[42]:
range(2, 40, 2)

Note that in this example I was even able to omit the second argument, so that the slice started at 2, went to the end of the list, and took every second element, to generate the list of even numbers less than 40.
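
As a further sketch of the start:stop:step syntax (the list letters is made up for illustration), the same three-part slice works on lists and strings alike, and a negative step walks the sequence backwards:

```python
letters = list("abcdefgh")

print(letters[1:6:2])   # start at index 1, stop before index 6, step 2
print(letters[::3])     # every third element of the whole list
print(letters[::-1])    # a negative step walks the sequence backwards
print("Sunday"[::-1])   # the same syntax works on strings
```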

Booleans and Truth Testing

We have now learned a few data types. We have integers and floating point numbers, strings, and lists, a container that can hold any data type. We have learned to print things out, and to iterate over items in lists. We will now learn about boolean variables that can be either True or False.

We invariably need some concept of conditions in programming to control branching behavior, to allow a program to react differently to different situations. If it's Monday, I'll go to work, but if it's Sunday, I'll sleep in. To do this in Python, we use a combination of boolean variables, which evaluate to either True or False, and if statements, that control branching based on boolean values.

For example:

In [43]:
day = "Sunday"
if day == None:
    print("Sleep in")
else:
    print("Go to work")
Go to work

(Quick quiz: why did the snippet print "Go to work" here? What is the variable "day" set to?)

Let's take the snippet apart to see what happened. First, note the statement

In [44]:
day == "Sunday"
Out[44]:
True

If we evaluate it by itself, as we just did, we see that it returns a boolean value, True. The "==" operator performs equality testing. If the two items are equal, it returns True, otherwise it returns False. In this case, it is comparing the string "Sunday" with whatever is stored in the variable "day", which is also the string "Sunday", so the test is true. The test in the snippet above, day == None, instead compares "day" with the special value None; since the string "Sunday" is not equal to None, that truth test has the false value.

The if statement that contains the truth test is followed by a code block (a colon followed by an indented block of code). If the boolean is true, it executes the code in that block. Since it is false in the above example, we don't see that code executed.

The first block of code is followed by an else statement, which is executed if nothing else in the above if statement is true. Since the value was false, this code is executed, which is why we see "Go to work".

You can compare any data types in Python:

In [45]:
1 == 2
Out[45]:
False
In [46]:
50 == 2*25
Out[46]:
True
In [47]:
3 < 3.14159
Out[47]:
True
In [48]:
1 == 1.0
Out[48]:
True
In [49]:
1 != 0
Out[49]:
True
In [50]:
1 <= 2
Out[50]:
True
In [51]:
1 >= 1
Out[51]:
True

We see a few other comparison operators here, all of which should be self-explanatory: less than, equality, non-equality, and so on.

Particularly interesting is the 1 == 1.0 test, which is true, since even though the two objects are different data types (integer and floating point number), they have the same value. There is another boolean operator is, that tests whether two objects are the same object:

In [52]:
1 is 1.0
Out[52]:
False
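
The difference between == and is shows up most clearly with mutable objects such as lists. A small sketch, with variable names chosen only for illustration:

```python
a = [1, 2, 3]
b = a          # b is another name for the same list object
c = list(a)    # c is a new list with equal contents

print(a == c)  # compares values
print(a is c)  # compares identity: a and c are distinct objects
print(a is b)  # a and b are the same object under two names

# Because a and b are one object, a change made through b is visible through a.
b.append(4)
print(a)
print(c)
```

In everyday code, == is almost always the test you want; is is mainly used for comparisons against None.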

We can do boolean tests on lists as well:

In [53]:
[1,2,3] == [1,2,4]
Out[53]:
False
In [54]:
[1,2,3] < [1,2,4]
Out[54]:
True

Finally, note that you can also string multiple comparisons together, which can result in very intuitive tests:

In [55]:
hours = 5
0 < hours < 24
Out[55]:
True

If statements can have elif parts ("else if"), in addition to if/else parts. For example:

In [56]:
if day == "Sunday":
    print("Sleep in")
elif day == "Saturday":
    print("Do chores")
else:
    print("Go to work")
Sleep in

Of course we can combine if statements with for loops, to make a snippet that is almost interesting:

In [57]:
for day in days_of_the_week:
    statement = "Today is " + day
    print(statement)
    if day == "Sunday":
        print("   Sleep in")
    elif day == "Saturday":
        print("   Do chores")
    else:
        print("   Go to work")
Today is Sunday
   Sleep in
Today is Monday
   Go to work
Today is Tuesday
   Go to work
Today is Wednesday
   Go to work
Today is Thursday
   Go to work
Today is Friday
   Go to work
Today is Saturday
   Do chores

This is something of an advanced topic, but ordinary data types have boolean values associated with them, and, indeed, in early versions of Python there was not a separate boolean object. Essentially, anything that was a 0 value (the integer or floating point 0, an empty string "", or an empty list []) was False, and everything else was true. You can see the boolean value of any data object using the bool() function.

In [58]:
bool(1)
Out[58]:
True
In [59]:
bool(0)
Out[59]:
False
In [60]:
bool(["This "," is "," a "," list"])
Out[60]:
True
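
These boolean values are what an if statement consults when it is given an object rather than a comparison. A minimal sketch (describe is a hypothetical helper written for this example):

```python
def describe(items):
    "Return a short description of a list (hypothetical helper)."
    if items:                      # an empty list counts as False
        return "has " + str(len(items)) + " item(s)"
    return "is empty"

print("The list", describe([]))
print("The list", describe(["Python"]))

# Only empty or zero values are False; a non-empty string such as "0"
# or a list containing 0 is still True.
print(bool(""), bool("0"), bool(0.0), bool([0]))
```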

Code Example: The Fibonacci Sequence

The Fibonacci sequence is a sequence in math that starts with 0 and 1, and then each successive entry is the sum of the previous two. Thus, the sequence goes 0,1,1,2,3,5,8,13,21,34,55,89,...

A very common exercise in programming books is to compute the Fibonacci sequence up to some number n. First I'll show the code, then I'll discuss what it is doing.

In [61]:
n = 10
sequence = [0,1]
for i in range(2,n): # This is going to be a problem if we ever set n <= 2!
    sequence.append(sequence[i-1]+sequence[i-2])
print(sequence)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Let's go through this line by line. First, we define the variable n, and set it to the integer 10. n is the length of the sequence we're going to form, and should probably have a better variable name. We then create a variable called sequence, and initialize it to the list with the integers 0 and 1 in it, the first two elements of the Fibonacci sequence. We have to create these elements "by hand", since the iterative part of the sequence requires two previous elements.

We then have a for loop over the integers from 2 (the index of the next element of the list) up to, but not including, n (the length of the sequence). After the colon, we see a hash mark "#", and then a comment that if we had set n to some number less than 2 we would have a problem. Comments in Python start with #, and are good ways to make notes to yourself or to a user of your code explaining why you did what you did. Better than the comment here would be to test to make sure the value of n is valid, and to complain if it isn't; we'll try this later.

In the body of the loop, we append to the list an integer equal to the sum of the two previous elements of the list.

After exiting the loop (ending the indentation) we then print out the whole list. That's it!

Functions

We might want to use the Fibonacci snippet with different sequence lengths. We could cut and paste the code into another cell, changing the value of n, but it's easier and more useful to make a function out of the code. We do this with the def statement in Python:

In [65]:
def fibonacci(sequence_length):
    "Return the Fibonacci sequence of length *sequence_length*"
    sequence = [0,1]
    if sequence_length < 1:
        print("Fibonacci sequence only defined for length 1 or greater")
        return
    if 0 < sequence_length < 3:
        return sequence[:sequence_length]
    for i in range(2,sequence_length): 
        sequence.append(sequence[i-1]+sequence[i-2])
    return sequence

We can now call fibonacci() for different sequence_lengths:

In [66]:
fibonacci(2)
Out[66]:
[0, 1]
In [67]:
fibonacci(12)
Out[67]:
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

We've introduced several new features here. First, note that the function itself is defined as a code block (a colon followed by an indented block). This is the standard way that Python delimits things. Next, note that the first line of the function is a single string. This is called a docstring, and is a special kind of comment that is often available to people using the function through the help() function at the Python command line. Modules carry documentation in the same way, as the help for the math module shows:

In [68]:
help(math)
Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.6/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)
        
        Return the arc cosine (measured in radians) of x.
    
    acosh(...)
        acosh(x)
        
        Return the inverse hyperbolic cosine of x.
    
    asin(...)
        asin(x)
        
        Return the arc sine (measured in radians) of x.
    
    asinh(...)
        asinh(x)
        
        Return the inverse hyperbolic sine of x.
    
    atan(...)
        atan(x)
        
        Return the arc tangent (measured in radians) of x.
    
    atan2(...)
        atan2(y, x)
        
        Return the arc tangent (measured in radians) of y/x.
        Unlike atan(y/x), the signs of both x and y are considered.
    
    atanh(...)
        atanh(x)
        
        Return the inverse hyperbolic tangent of x.
    
    ceil(...)
        ceil(x)
        
        Return the ceiling of x as an Integral.
        This is the smallest integer >= x.
    
    copysign(...)
        copysign(x, y)
        
        Return a float with the magnitude (absolute value) of x but the sign 
        of y. On platforms that support signed zeros, copysign(1.0, -0.0) 
        returns -1.0.
    
    cos(...)
        cos(x)
        
        Return the cosine of x (measured in radians).
    
    cosh(...)
        cosh(x)
        
        Return the hyperbolic cosine of x.
    
    degrees(...)
        degrees(x)
        
        Convert angle x from radians to degrees.
    
    erf(...)
        erf(x)
        
        Error function at x.
    
    erfc(...)
        erfc(x)
        
        Complementary error function at x.
    
    exp(...)
        exp(x)
        
        Return e raised to the power of x.
    
    expm1(...)
        expm1(x)
        
        Return exp(x)-1.
        This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
    
    fabs(...)
        fabs(x)
        
        Return the absolute value of the float x.
    
    factorial(...)
        factorial(x) -> Integral
        
        Find x!. Raise a ValueError if x is negative or non-integral.
    
    floor(...)
        floor(x)
        
        Return the floor of x as an Integral.
        This is the largest integer <= x.
    
    fmod(...)
        fmod(x, y)
        
        Return fmod(x, y), according to platform C.  x % y may differ.
    
    frexp(...)
        frexp(x)
        
        Return the mantissa and exponent of x, as pair (m, e).
        m is a float and e is an int, such that x = m * 2.**e.
        If x is 0, m and e are both 0.  Else 0.5 <= abs(m) < 1.0.
    
    fsum(...)
        fsum(iterable)
        
        Return an accurate floating point sum of values in the iterable.
        Assumes IEEE-754 floating point arithmetic.
    
    gamma(...)
        gamma(x)
        
        Gamma function at x.
    
    gcd(...)
        gcd(x, y) -> int
        greatest common divisor of x and y
    
    hypot(...)
        hypot(x, y)
        
        Return the Euclidean distance, sqrt(x*x + y*y).
    
    isclose(...)
        isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) -> bool
        
        Determine whether two floating point numbers are close in value.
        
           rel_tol
               maximum difference for being considered "close", relative to the
               magnitude of the input values
            abs_tol
               maximum difference for being considered "close", regardless of the
               magnitude of the input values
        
        Return True if a is close in value to b, and False otherwise.
        
        For the values to be considered close, the difference between them
        must be smaller than at least one of the tolerances.
        
        -inf, inf and NaN behave similarly to the IEEE 754 Standard.  That
        is, NaN is not close to anything, even itself.  inf and -inf are
        only close to themselves.
    
    isfinite(...)
        isfinite(x) -> bool
        
        Return True if x is neither an infinity nor a NaN, and False otherwise.
    
    isinf(...)
        isinf(x) -> bool
        
        Return True if x is a positive or negative infinity, and False otherwise.
    
    isnan(...)
        isnan(x) -> bool
        
        Return True if x is a NaN (not a number), and False otherwise.
    
    ldexp(...)
        ldexp(x, i)
        
        Return x * (2**i).
    
    lgamma(...)
        lgamma(x)
        
        Natural logarithm of absolute value of Gamma function at x.
    
    log(...)
        log(x[, base])
        
        Return the logarithm of x to the given base.
        If the base not specified, returns the natural logarithm (base e) of x.
    
    log10(...)
        log10(x)
        
        Return the base 10 logarithm of x.
    
    log1p(...)
        log1p(x)
        
        Return the natural logarithm of 1+x (base e).
        The result is computed in a way which is accurate for x near zero.
    
    log2(...)
        log2(x)
        
        Return the base 2 logarithm of x.
    
    modf(...)
        modf(x)
        
        Return the fractional and integer parts of x.  Both results carry the sign
        of x and are floats.
    
    pow(...)
        pow(x, y)
        
        Return x**y (x to the power of y).
    
    radians(...)
        radians(x)
        
        Convert angle x from degrees to radians.
    
    sin(...)
        sin(x)
        
        Return the sine of x (measured in radians).
    
    sinh(...)
        sinh(x)
        
        Return the hyperbolic sine of x.
    
    sqrt(...)
        sqrt(x)
        
        Return the square root of x.
    
    tan(...)
        tan(x)
        
        Return the tangent of x (measured in radians).
    
    tanh(...)
        tanh(x)
        
        Return the hyperbolic tangent of x.
    
    trunc(...)
        trunc(x:Real) -> Integral
        
        Truncates x to the nearest Integral toward 0. Uses the __trunc__ magic method.

DATA
    e = 2.718281828459045
    inf = inf
    nan = nan
    pi = 3.141592653589793
    tau = 6.283185307179586

FILE
    /Users/lm/anaconda/lib/python3.6/lib-dynload/math.cpython-36m-darwin.so


If you define a docstring for all of your functions, it makes it easier for other people to use them, since they can get help on the arguments and return values of the function.
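
To illustrate with a function of our own, here is a minimal sketch (area_of_rectangle is a hypothetical example function) showing where the docstring ends up:

```python
def area_of_rectangle(width, length):
    "Return the area of a rectangle with the given width and length."
    return width * length

# The docstring is stored on the function object as the __doc__ attribute.
print(area_of_rectangle.__doc__)

# In an interactive session, help(area_of_rectangle) displays the same text
# together with the function's signature.
print(area_of_rectangle(20, 30))
```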

Next, note that rather than putting a comment in about what input values lead to errors, we have some testing of these values, followed by a warning if the value is invalid, and some conditional code to handle special cases.
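
One alternative to printing a warning, sketched here as an assumption about what a stricter version might look like (the name fibonacci_strict is hypothetical), is to raise an exception so that the caller cannot silently continue with a missing result:

```python
def fibonacci_strict(sequence_length):
    "Return the Fibonacci sequence of length *sequence_length*"
    if sequence_length < 1:
        # Raising stops execution here instead of returning None.
        raise ValueError("Fibonacci sequence only defined for length 1 or greater")
    sequence = [0, 1]
    for i in range(2, sequence_length):
        sequence.append(sequence[i-1] + sequence[i-2])
    # The slice handles the special case sequence_length == 1.
    return sequence[:sequence_length]

print(fibonacci_strict(1))
print(fibonacci_strict(12))

try:
    fibonacci_strict(0)
except ValueError as error:
    print("Invalid input:", error)
```

The try/except block at the end shows how a caller could react to the error rather than letting the program stop.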

Recursion and Factorials

Functions can also call themselves, something that is often called recursion. We're going to experiment with recursion by computing the factorial function. The factorial is defined for a positive integer n as

$$ n! = n(n-1)(n-2)\cdots 1 $$

First, note that we don't need to write a function at all, since this is a function built into the standard math library. Let's use the help function to find out about it:

In [69]:
from math import factorial
help(factorial)
Help on built-in function factorial in module math:

factorial(...)
    factorial(x) -> Integral
    
    Find x!. Raise a ValueError if x is negative or non-integral.

This is clearly what we want.

In [70]:
factorial(20)
Out[70]:
2432902008176640000

However, if we did want to write a function ourselves, we could do so recursively by noting that

$$ n! = n(n-1)!$$

The program then looks something like:

In [71]:
def fact(n):
    if n <= 0:
        return 1
    return n*fact(n-1)
In [72]:
fact(20)
Out[72]:
2432902008176640000

Recursion can be very elegant, and can lead to very simple programs.
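
One practical caveat, offered as a side note: the standard Python interpreter limits how deeply functions may call themselves, so a recursive factorial fails for large enough n. An iterative sketch (fact_iterative is a hypothetical name) avoids that limit:

```python
import math
import sys

def fact_iterative(n):
    "Return n! for a non-negative integer n using a loop."
    result = 1
    for k in range(2, n + 1):
        result = result * k
    return result

print(fact_iterative(20))
print(sys.getrecursionlimit())   # maximum recursion depth of this interpreter

# A loop handles inputs well beyond the usual default recursion limit.
print(fact_iterative(5000) == math.factorial(5000))
```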

Two More Data Structures: Tuples and Dictionaries

Before we end the Python overview, I wanted to touch on two more data structures that are very useful (and thus very common) in Python programs.

A tuple is a sequence object like a list or a string. It's constructed by grouping a sequence of objects together with commas, either without brackets, or with parentheses:

In [73]:
t = (1,2,'hi',9.0)
t
Out[73]:
(1, 2, 'hi', 9.0)

Tuples are like lists, in that you can access the elements using indices:

In [74]:
t[1]
Out[74]:
2

However, tuples are immutable: you can't append to them or change their elements:

In [75]:
t.append(7)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-75-50c7062b1d5f> in <module>()
----> 1 t.append(7)

AttributeError: 'tuple' object has no attribute 'append'
In [76]:
t[1]=77
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-03cc8ba9c07d> in <module>()
----> 1 t[1]=77

TypeError: 'tuple' object does not support item assignment

Tuples are useful anytime you want to group different pieces of data together in an object, but don't want to create a full-fledged class (see below) for them. For example, let's say you want the Cartesian coordinates of some objects in your program. Tuples are a good way to do this:

In [77]:
('Bob',0.0,21.0)
Out[77]:
('Bob', 0.0, 21.0)

Again, it's not a necessary distinction, but one way to distinguish tuples and lists is that tuples are a collection of different things, here a name, and x and y coordinates, whereas a list is a collection of similar things, like if we wanted a list of those coordinates:

In [78]:
positions = [
             ('Bob',0.0,21.0),
             ('Cat',2.5,13.1),
             ('Dog',33.0,1.2)
             ]

Tuples can be used when functions return more than one value. Say we wanted to compute the smallest x- and y-coordinates of the above list of objects. We could write:

In [79]:
def minmax(objects):
    minx = 1e20 # These are set to really big numbers
    miny = 1e20
    for obj in objects:
        name,x,y = obj
        if x < minx: 
            minx = x
        if y < miny:
            miny = y
    return minx,miny

x,y = minmax(positions)
print(x,y)
0.0 1.2

Here we did two things with tuples you haven't seen before. First, we unpacked an object into a set of named variables using tuple assignment:

>>> name,x,y = obj

We also returned multiple values (minx,miny), which were then assigned to two other variables (x,y), again by tuple assignment. This makes what would have been complicated code in C++ rather simple.

Tuple assignment is also a convenient way to swap variables:

In [80]:
x,y = 1,2
y,x = x,y
x,y
Out[80]:
(2, 1)

Dictionaries

Dictionaries are objects known as "mappings" or "associative arrays" in other languages. A list associates an integer index with each object it holds:

In [81]:
mylist = [1,2,9,21]

A dictionary, by contrast, lets you choose the index yourself. The index in a dictionary is called the key, and the corresponding dictionary entry is the value. A dictionary can use (almost) anything as the key. Whereas lists are formed with square brackets [], dictionaries use curly brackets {}:

In [82]:
ages = {"Rick": 46, "Bob": 86, "Fred": 21}
print("Rick's age is ", ages["Rick"])
Rick's age is  46

There's also a convenient way to create dictionaries without having to quote the keys.

In [83]:
dict(Rick=46,Bob=86,Fred=20)
Out[83]:
{'Bob': 86, 'Fred': 20, 'Rick': 46}
In [84]:
ages["Rick"] = 47
In [85]:
ages
Out[85]:
{'Bob': 86, 'Fred': 21, 'Rick': 47}

The len() function works on both tuples and dictionaries:

In [86]:
len(t)
Out[86]:
4
In [87]:
len(ages)
Out[87]:
3
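
A few other everyday dictionary operations, sketched here with the same ages data ("Alice" is a made-up key that is deliberately absent):

```python
ages = {"Rick": 46, "Bob": 86, "Fred": 21}

# The in operator tests whether a key is present.
print("Rick" in ages)
print("Alice" in ages)

# get() returns a default instead of raising KeyError for a missing key.
print(ages.get("Alice", "unknown"))

# items() yields (key, value) tuples, which unpack nicely in a for loop.
for name, age in ages.items():
    print(name, "is", age)
```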

"Advanced" Python Topics

The map() function

# csv is assumed to be a string of comma-separated text defined earlier
# array is assumed to come from numpy (from numpy import array)
data = []
for line in csv.splitlines():
    words = line.split(',')
    data.append(list(map(float, words)))
data = array(data)

There are two significant changes over what we did earlier. First, I'm passing the comma character ',' into the split function, so that it breaks to a new word every time it sees a comma. Next, to simplify things a bit, I'm using the map() command to repeatedly apply a single function (float()) to a list. In Python 3, map() returns an iterator rather than a list, so I wrap it in list() to collect the output.

In [88]:
help(map)
Help on class map in module builtins:

class map(object)
 |  map(func, *iterables) --> map object
 |  
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.

Optional arguments

You will recall that the numpy.linspace function can take either two arguments (for the starting and ending points):

In [89]:
import numpy
numpy.linspace(0,1)
Out[89]:
array([ 0.        ,  0.02040816,  0.04081633,  0.06122449,  0.08163265,
        0.10204082,  0.12244898,  0.14285714,  0.16326531,  0.18367347,
        0.20408163,  0.2244898 ,  0.24489796,  0.26530612,  0.28571429,
        0.30612245,  0.32653061,  0.34693878,  0.36734694,  0.3877551 ,
        0.40816327,  0.42857143,  0.44897959,  0.46938776,  0.48979592,
        0.51020408,  0.53061224,  0.55102041,  0.57142857,  0.59183673,
        0.6122449 ,  0.63265306,  0.65306122,  0.67346939,  0.69387755,
        0.71428571,  0.73469388,  0.75510204,  0.7755102 ,  0.79591837,
        0.81632653,  0.83673469,  0.85714286,  0.87755102,  0.89795918,
        0.91836735,  0.93877551,  0.95918367,  0.97959184,  1.        ])

or it can take three arguments, for the starting point, the ending point, and the number of points:

In [90]:
numpy.linspace(0,1,5)
Out[90]:
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

You can also pass in keywords to exclude the endpoint:

In [91]:
numpy.linspace(0,1,5,endpoint=False)
Out[91]:
array([ 0. ,  0.2,  0.4,  0.6,  0.8])

Right now, we only know how to specify functions that have a fixed number of arguments. We'll learn how to do the more general cases here.

If we're defining a simple version of linspace, we would start with:

In [92]:
def my_linspace(start,end):
    npoints = 50
    v = []
    d = (end-start)/float(npoints-1)
    for i in range(npoints):
        v.append(start + i*d)
    return v
my_linspace(0,1)
Out[92]:
[0.0,
 0.02040816326530612,
 0.04081632653061224,
 0.061224489795918366,
 0.08163265306122448,
 0.1020408163265306,
 0.12244897959183673,
 0.14285714285714285,
 0.16326530612244897,
 0.18367346938775508,
 0.2040816326530612,
 0.22448979591836732,
 0.24489795918367346,
 0.26530612244897955,
 0.2857142857142857,
 0.3061224489795918,
 0.32653061224489793,
 0.3469387755102041,
 0.36734693877551017,
 0.3877551020408163,
 0.4081632653061224,
 0.42857142857142855,
 0.44897959183673464,
 0.4693877551020408,
 0.4897959183673469,
 0.5102040816326531,
 0.5306122448979591,
 0.5510204081632653,
 0.5714285714285714,
 0.5918367346938775,
 0.6122448979591836,
 0.6326530612244897,
 0.6530612244897959,
 0.673469387755102,
 0.6938775510204082,
 0.7142857142857142,
 0.7346938775510203,
 0.7551020408163265,
 0.7755102040816326,
 0.7959183673469387,
 0.8163265306122448,
 0.836734693877551,
 0.8571428571428571,
 0.8775510204081632,
 0.8979591836734693,
 0.9183673469387754,
 0.9387755102040816,
 0.9591836734693877,
 0.9795918367346939,
 0.9999999999999999]

We can add an optional argument by specifying a default value in the argument list:

In [93]:
def my_linspace(start,end,npoints = 50):
    v = []
    d = (end-start)/float(npoints-1)
    for i in range(npoints):
        v.append(start + i*d)
    return v

This gives exactly the same result if we don't specify anything:

In [94]:
my_linspace(0,1)
Out[94]:
[0.0,
 0.02040816326530612,
 0.04081632653061224,
 0.061224489795918366,
 0.08163265306122448,
 0.1020408163265306,
 0.12244897959183673,
 0.14285714285714285,
 0.16326530612244897,
 0.18367346938775508,
 0.2040816326530612,
 0.22448979591836732,
 0.24489795918367346,
 0.26530612244897955,
 0.2857142857142857,
 0.3061224489795918,
 0.32653061224489793,
 0.3469387755102041,
 0.36734693877551017,
 0.3877551020408163,
 0.4081632653061224,
 0.42857142857142855,
 0.44897959183673464,
 0.4693877551020408,
 0.4897959183673469,
 0.5102040816326531,
 0.5306122448979591,
 0.5510204081632653,
 0.5714285714285714,
 0.5918367346938775,
 0.6122448979591836,
 0.6326530612244897,
 0.6530612244897959,
 0.673469387755102,
 0.6938775510204082,
 0.7142857142857142,
 0.7346938775510203,
 0.7551020408163265,
 0.7755102040816326,
 0.7959183673469387,
 0.8163265306122448,
 0.836734693877551,
 0.8571428571428571,
 0.8775510204081632,
 0.8979591836734693,
 0.9183673469387754,
 0.9387755102040816,
 0.9591836734693877,
 0.9795918367346939,
 0.9999999999999999]

But it also lets us override the default value with a third argument:

In [95]:
my_linspace(0,1,5)
Out[95]:
[0.0, 0.25, 0.5, 0.75, 1.0]
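
Optional arguments can also be passed by name, which makes the call easier to read. A minimal sketch, using a compact rewrite of my_linspace so that the block stands alone:

```python
def my_linspace(start, end, npoints=50):
    d = (end - start)/float(npoints - 1)
    return [start + i*d for i in range(npoints)]

# Override the default by name rather than by position
print(my_linspace(0, 1, npoints=5))            # [0.0, 0.25, 0.5, 0.75, 1.0]

# When every argument is named, the order no longer matters
print(my_linspace(end=1, start=0, npoints=3))  # [0.0, 0.5, 1.0]

# Leaving npoints out falls back to the default of 50
print(len(my_linspace(0, 1)))                  # 50
```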

We can add arbitrary keyword arguments to the function definition by including a **kwargs parameter:

In [96]:
def my_linspace(start,end,npoints=50,**kwargs):
    endpoint = kwargs.get('endpoint',True)
    v = []
    if endpoint:
        d = (end-start)/float(npoints-1)
    else:
        d = (end-start)/float(npoints)
    for i in range(npoints):
        v.append(start + i*d)
    return v
my_linspace(0,1,5,endpoint=False)
Out[96]:
[0.0, 0.2, 0.4, 0.6000000000000001, 0.8]

What the keyword argument construction does is to take any additional keyword arguments (i.e. arguments specified by name, like "endpoint=False"), and stick them into a dictionary called "kwargs" (you can call it anything you like, but it has to be preceded by two stars). You can then grab items out of the dictionary using the get command, which also lets you specify a default value. I realize it takes a little getting used to, but it is a common construction in Python code, and you should be able to recognize it.
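
To see exactly what ends up in the dictionary, here is a minimal sketch. The function describe and the keyword names units and n are hypothetical, chosen only for illustration.

```python
def describe(**kwargs):
    # kwargs is an ordinary dictionary of the keyword arguments passed in
    units = kwargs.get('units', 'bohr')   # default used when 'units' is absent
    return kwargs, units

print(describe())                       # ({}, 'bohr')
print(describe(units='angstrom', n=3))  # ({'units': 'angstrom', 'n': 3}, 'angstrom')
```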

There's an analogous *args that collects any additional positional arguments into a tuple called "args". Think about the range function: it can take one (the endpoint), two (starting and ending points), or three (starting, ending, and step) arguments. How would we define this?

In [97]:
def my_range(*args):
    start = 0
    step = 1
    if len(args) == 1:
        end = args[0]
    elif len(args) == 2:
        start,end = args
    elif len(args) == 3:
        start,end,step = args
    else:
        raise Exception("Unable to parse arguments")
    v = []
    value = start
    # Like range, stop before reaching end (assumes step > 0)
    while value < end:
        v.append(value)
        value += step
    return v
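
To see what *args collects, here is a minimal sketch; the function show_args is hypothetical. The same star also works in the other direction, unpacking a sequence into separate positional arguments when calling a function.

```python
def show_args(*args):
    # args holds every positional argument that was passed in
    return type(args).__name__, len(args), args

print(show_args())           # ('tuple', 0, ())
print(show_args(2, 10, 3))   # ('tuple', 3, (2, 10, 3))

# A sequence can be unpacked into positional arguments with the same star
limits = [2, 10, 3]
print(list(range(*limits)))  # [2, 5, 8]
```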

List Comprehensions and Generators

List comprehensions are a streamlined way to make lists. They look something like a list definition, with some logic thrown in. For example:

In [100]:
evens1 = [2*i for i in range(10)]
print(evens1)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

You can also put some boolean testing into the construct:

In [101]:
odds = [i for i in range(20) if i%2==1]
odds
Out[101]:
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

Here i%2 is the remainder when i is divided by 2, so that i%2==1 is true if the number is odd. Even though this is a relatively new addition to the language, it is now fairly common since it's so convenient.
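
It can help to see a comprehension next to the loop it replaces. The sketch below builds the same list both ways; the variable names are arbitrary.

```python
# Loop version
odds_loop = []
for i in range(20):
    if i % 2 == 1:
        odds_loop.append(i)

# Comprehension version: expression, then the loop, then the test
odds_comp = [i for i in range(20) if i % 2 == 1]
print(odds_loop == odds_comp)  # True

# The leading expression can be any computation, e.g. squares of odd numbers
odd_squares = [i**2 for i in range(10) if i % 2 == 1]
print(odd_squares)  # [1, 9, 25, 49, 81]
```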

Iterators are a way of making virtual sequence objects.

Consider if we had the nested loop structure:

for i in range(1000000):
    for j in range(1000000):

In Python 2, range returned a full list, so inside the main loop we would make a list of 1,000,000 integers just to loop over them one at a time. We don't need any of the additional things that a list gives us, like slicing or random access; we just need to go through the numbers one at a time. And we would be making 1,000,000 of these lists.

Iterators are a way around this.

For example, Python 2 provided xrange, the iterator version of range. It simply makes a counter that is looped through in sequence, so that the analogous loop structure would look like:

for i in xrange(1000000):
    for j in xrange(1000000):

Even though only two characters were added, this dramatically sped up the code, because it avoided making 1,000,000 big lists. In Python 3, which this notebook uses, range already produces its values lazily and xrange no longer exists, so the first version of the loop is already efficient.
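
A quick way to see the difference between a lazy counter and a full list, assuming Python 3, is to compare their memory footprints. This is only a sketch; the exact sizes depend on the interpreter, so just the comparison is shown.

```python
import sys

lazy = range(1000000)   # stores only start, stop, and step
full = list(lazy)       # stores a million references

print(sys.getsizeof(lazy) < sys.getsizeof(full))  # True

# A range still knows its length and supports indexing without building a list
print(lazy[10], len(lazy))  # 10 1000000
```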

We can define our own iterators using the yield statement:

In [103]:
def evens_below(n):
    for i in range(n):
        if i%2 == 0:
            yield i
    return

for i in evens_below(9):
    print(i)
0
2
4
6
8

We can always turn an iterator into a list using the list command:

In [104]:
list(evens_below(9))
Out[104]:
[0, 2, 4, 6, 8]
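
The values from a generator are produced one at a time, on demand. The sketch below repeats the evens_below definition so that it stands alone, and steps through it by hand with the built-in next function.

```python
def evens_below(n):
    for i in range(n):
        if i % 2 == 0:
            yield i

gen = evens_below(5)
print(next(gen))  # 0
print(next(gen))  # 2
print(next(gen))  # 4

# Asking for more after the last value raises StopIteration,
# which is how a for loop knows when to stop
try:
    next(gen)
except StopIteration:
    print("exhausted")

# A generator can only be consumed once
print(list(gen))  # []
```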

There's a special syntax called a generator expression that looks a lot like a list comprehension:

In [106]:
evens_gen = (i for i in range(9) if i%2==0)
for i in evens_gen:
    print(i)
0
2
4
6
8
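
Generator expressions are handy as arguments to functions that consume a sequence, because no intermediate list is ever built. A minimal sketch:

```python
# Sum of the squares of the even numbers below 10; no list is created
total = sum(i*i for i in range(10) if i % 2 == 0)
print(total)  # 0 + 4 + 16 + 36 + 64 = 120

# any() stops as soon as it finds a true value
print(any(i > 5 for i in range(10)))  # True
```

When a generator expression is the only argument to a function, the extra pair of parentheses can be dropped, as above.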

Factory Functions

A factory function is a function that returns a function. The function it returns goes by the fancy name lexical closure, which makes you sound really intelligent in front of your CS friends. But, despite the arcane names, factory functions can play a very practical role.

Suppose you want the Gaussian function centered at 0.5, with height 99 and width 1.0. You could write a general function.

In [107]:
from numpy import exp
In [108]:
def gauss(x,A,a,x0):
    return A*exp(-a*(x-x0)**2)

But what if you need a function with only one argument, like f(x) rather than f(x,y,z,...)? You can do this with Factory Functions:

In [109]:
def gauss_maker(A,a,x0):
    def f(x):
        return A*exp(-a*(x-x0)**2)
    return f
In [111]:
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
In [112]:
x = numpy.linspace(0,1)
g = gauss_maker(99.0,1.0,0.5)
plt.plot(x,g(x))
Out[112]:
[<matplotlib.lines.Line2D at 0x119326198>]

Everything in Python is an object, including functions. This means that functions can be returned by other functions. (They can also be passed into other functions, which is also useful, but a topic for another discussion.) In the gauss_maker example, the g function that is output "remembers" the A, a, x0 values it was constructed with, since they're all stored in the local memory space (this is what the lexical closure really refers to) of that function.
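
To make the "remembering" concrete, the sketch below builds two independent Gaussians from the same factory and then peeks at the stored values. It uses math.exp instead of numpy.exp only so that the block has no dependencies; the parameters of the second Gaussian are arbitrary.

```python
from math import exp

def gauss_maker(A, a, x0):
    def f(x):
        return A*exp(-a*(x - x0)**2)
    return f

g1 = gauss_maker(99.0, 1.0, 0.5)
g2 = gauss_maker(1.0, 2.0, 0.0)

# Each function keeps its own parameters
print(g1(0.5))  # 99.0
print(g2(0.0))  # 1.0

# The remembered values live in the function's closure cells
print([cell.cell_contents for cell in g1.__closure__])
```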

Factories are one of the more important of the Software Design Patterns, which are a set of guidelines to follow to make high-quality, portable, readable, stable software. It's beyond the scope of the current work to go more into either factories or design patterns, but I thought I would mention them for people interested in software design.

Functional programming

Functional programming is a very broad subject. The idea is to have a series of functions, each of which generates a new data structure from an input, without changing the input structure at all. By not modifying the input structure (something that is called not having side effects), many guarantees can be made about how independent the processes are, which can help parallelization and guarantees of program accuracy. There is a Python Functional Programming HOWTO in the standard docs that goes into more details on functional programming. I just wanted to touch on a few of the most important ideas here.

There is an operator module that has function versions of most of the Python operators. For example:

In [115]:
from operator import add, mul
add([1,2], [4])
Out[115]:
[1, 2, 4]
In [116]:
mul(3,4)
Out[116]:
12

These are useful building blocks for functional programming.
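
As a sketch of how these building blocks combine with other tools, the operator functions can be handed directly to map or sorted. The pairs list is made-up sample data.

```python
from operator import add, mul, itemgetter

# Element-wise operations on two lists: map accepts several iterables
print(list(map(add, [1, 2, 3], [10, 20, 30])))  # [11, 22, 33]
print(list(map(mul, [1, 2, 3], [10, 20, 30])))  # [10, 40, 90]

# itemgetter(1) builds a function that returns item 1 of its argument
pairs = [("Rick", 46), ("Bob", 86), ("Fred", 21)]
print(sorted(pairs, key=itemgetter(1)))
# [('Fred', 21), ('Rick', 46), ('Bob', 86)]
```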

The lambda operator allows us to build anonymous functions, which are simply functions that aren't defined by a normal def statement with a name. For example, a function that doubles the input is:

In [117]:
def doubler(x): return 2*x
doubler(17)
Out[117]:
34

We could also write this as:

In [118]:
lambda x: 2*x
Out[118]:
<function __main__.<lambda>>

And assign it to a function separately:

In [119]:
another_doubler = lambda x: 2*x
another_doubler(19)
Out[119]:
38

lambda is particularly convenient (as we'll see below) in passing simple functions as arguments to other functions.
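
Two common cases, sketched with made-up data: a lambda as the key for sorting, and a lambda as the test for filter.

```python
words = ["banana", "Cherry", "apple"]

# Default sorting puts uppercase letters before lowercase ones
print(sorted(words))                           # ['Cherry', 'apple', 'banana']

# A lambda key makes the comparison case-insensitive
print(sorted(words, key=lambda w: w.lower()))  # ['apple', 'banana', 'Cherry']

# filter keeps the items for which the function returns a true value
print(list(filter(lambda n: n % 3 == 0, range(10))))  # [0, 3, 6, 9]
```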

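map is a way to apply a function to every item of a list. In Python 3 it returns a lazy map object rather than a list, so pass the result to list() if you want to see the values: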

In [120]:
map(float,'1 2 3 4 5'.split())
Out[120]:
<map at 0x119357cc0>

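reduce is a way to repeatedly apply a two-argument function: it combines the first two items of the list, then combines that result with the next item, and so on until a single value remains. In Python 3, reduce lives in the functools module. There already is a sum function in Python that is a reduction: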

In [121]:
sum([1,2,3,4,5])
Out[121]:
15
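
For reductions other than a sum, functools.reduce can be combined with the operator functions. A minimal sketch:

```python
from functools import reduce
from operator import add, mul

values = [1, 2, 3, 4, 5]

# ((((1+2)+3)+4)+5), the same result as sum(values)
print(reduce(add, values))  # 15

# ((((1*2)*3)*4)*5), a product, for which there is no dedicated built-in
print(reduce(mul, values))  # 120

# The optional third argument is the starting value,
# which is also what is returned for an empty list
print(reduce(mul, [], 1))   # 1
```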

Object Oriented Programming

We've seen a lot of examples of objects in Python. We create a string object with quote marks:

In [123]:
mystring = "Hi there"

and we have a bunch of methods we can use on the object:

In [124]:
mystring.split()
Out[124]:
['Hi', 'there']
In [125]:
mystring.startswith('Hi')
Out[125]:
True
In [126]:
len(mystring)
Out[126]:
8

Object oriented programming simply gives you the tools to define objects and methods for yourself. It's useful anytime you want to keep some data (like the characters in the string) tightly coupled to the functions that act on the data (length, split, startswith, etc.).

As an example, we're going to bundle the functions we wrote for the 1d harmonic oscillator eigenfunctions into a class that handles arbitrary potentials, so we can pass in a function defining that potential, some additional specifications, and get out something that can plot the orbitals, as well as do other things with them, if desired.

In [127]:
from  numpy.linalg import eigh
import numpy
class Schrod1d:
    """\
    Schrod1d: Solver for the one-dimensional Schrodinger equation.
    """
    def __init__(self,V,start=0,end=1,npts=50,**kwargs):
        m = kwargs.get('m',1.0)
        self.x = numpy.linspace(start,end,npts)
        self.Vx = V(self.x)
        self.H = (-0.5/m)*self.laplacian() + numpy.diag(self.Vx)
        return
    
    def plot(self,*args,**kwargs):
        titlestring = kwargs.get('titlestring',"Eigenfunctions of the 1d Potential")
        xstring = kwargs.get('xstring',"Displacement (bohr)")
        ystring = kwargs.get('ystring',"Energy (hartree)")
        if not args:
            args = [3]
        x = self.x
        E,U = eigh(self.H)
        h = x[1]-x[0]

        # Plot the Potential
        plt.plot(x,self.Vx,color='k')

        for i in range(*args):
            # For each of the first few solutions, plot the energy level:
            plt.axhline(y=E[i],color='k',ls=":")
            # as well as the eigenfunction, displaced by the energy level so they don't
            # all pile up on each other:
            plt.plot(x,U[:,i]/numpy.sqrt(h)+E[i])
        plt.title(titlestring)
        plt.xlabel(xstring)
        plt.ylabel(ystring) 
        return
        
    def laplacian(self):
        x = self.x
        h = x[1]-x[0] # assume uniformly spaced points
        n = len(x)
        M = -2*numpy.identity(n,'d')
        for i in range(1,n):
            M[i,i-1] = M[i-1,i] = 1
        return M/h**2

The __init__() function specifies what operations go on when the object is created. The self argument is the object itself, and we don't pass it in. The only required argument is the function that defines the QM potential. We can also specify additional arguments that define the numerical grid that we're going to use for the calculation.
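
The mechanics of __init__ and self are easier to see on a much smaller class. The Tally class below is a hypothetical example, not part of the solver.

```python
class Tally:
    """A tiny class that keeps a count together with the method that changes it."""
    def __init__(self, start=0):
        # Runs when Tally(...) is called; self is the new object
        self.count = start

    def increment(self, step=1):
        self.count += step
        return self.count

t1 = Tally(10)   # Python passes the new object in as self for us
t1.increment()
t1.increment(5)
print(t1.count)  # 16

# t1.increment() is shorthand for Tally.increment(t1)
Tally.increment(t1)
print(t1.count)  # 17
```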

For example, to do an infinite square well potential, we have a function that is 0 everywhere. We don't have to specify the barriers: the grid only covers the inside of the well, so the wavefunction cannot be nonzero anywhere else.

In [128]:
square_well = Schrod1d(lambda x: 0*x,m=10)
square_well.plot(4,titlestring="Square Well Potential")

We can similarly redefine the Harmonic Oscillator potential.

In [129]:
ho = Schrod1d(lambda x: x**2,start=-3,end=3)
ho.plot(6,titlestring="Harmonic Oscillator")

Speeding Python: Timeit, Profiling, Cython, SWIG, and PyPy

The first rule of speeding up your code is not to do it at all. As Donald Knuth said:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

The second rule of speeding up your code is to only do it if you really think you need to do it. Python has two tools to help with this process: a timing program called timeit, and a very good code profiler. We will discuss both of these tools in this section, as well as techniques to use to speed up your code once you know it's too slow.

Timeit

timeit helps determine which of two similar routines is faster. Recall that some time ago we wrote a factorial routine, but also pointed out that Python had its own routine built into the math module. Is there any difference in the speed of the two? timeit helps us determine this. For example, timeit tells how long each method takes:

In [130]:
%timeit factorial(20)
The slowest run took 23.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 104 ns per loop

The little % sign that we have in front of the timeit call is an example of an IPython magic function (see IPython tutorial).

In any case, timeit ran the call 1,000,000 times per run, repeated the run three times, and reports the best result: about 104 ns to compute 20!. In contrast:

In [131]:
%timeit fact(20)
100000 loops, best of 3: 5.24 µs per loop

the factorial function we wrote is about a factor of 50 slower.
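
Outside IPython, the same measurement can be made with the standard timeit module. In the sketch below, fact is a simple loop-based stand-in for the factorial routine written earlier (its exact form is an assumption), and the loop count of 100,000 is arbitrary. Timings vary from machine to machine, so no output is shown.

```python
import timeit
from math import factorial

def fact(n):
    # Simple loop-based factorial, standing in for the routine written earlier
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

n_loops = 100000

# timeit.timeit calls the function n_loops times and returns the total seconds
t_builtin = timeit.timeit(lambda: factorial(20), number=n_loops)
t_ours = timeit.timeit(lambda: fact(20), number=n_loops)

print("math.factorial: %.1f ns per loop" % (1e9 * t_builtin / n_loops))
print("fact:           %.1f ns per loop" % (1e9 * t_ours / n_loops))
```

Wrapping the call in a lambda adds a small, constant overhead to both measurements, so the comparison between the two remains fair.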

Profiling

Profiling complements what timeit does by splitting the overall timing into the time spent in each function. It can give us a better understanding of what our program is really spending its time on.

Suppose we want to create a list of even numbers. Our first effort yields this:

In [132]:
def evens(n):
    "Return a list of even numbers below n"
    l = []
    for x in range(n):
        if x % 2 == 0:
            l.append(x)
    return l

Is this code fast enough? We find out by running the Python profiler on a longer run:

In [133]:
import cProfile
cProfile.run('evens(100000)')
         50004 function calls in 0.032 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.024    0.024    0.031    0.031 <ipython-input-132-9d23d9d62f6b>:1(evens)
        1    0.001    0.001    0.032    0.032 <string>:1(<module>)
        1    0.000    0.000    0.032    0.032 {built-in method builtins.exec}
    50000    0.007    0.000    0.007    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


This looks okay, 0.032 seconds isn't a huge amount of time, but the profile shows that the append function is taking about 20% of the time. Can we do better? Let's try a list comprehension.

In [134]:
def evens2(n):
    "Return a list of even numbers below n"
    return [x for x in range(n) if x % 2 == 0]
In [135]:
import cProfile
cProfile.run('evens2(100000)')
         5 function calls in 0.014 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.013    0.013 <ipython-input-134-cbb0d0b3fc58>:1(evens2)
        1    0.013    0.013    0.013    0.013 <ipython-input-134-cbb0d0b3fc58>:3(<listcomp>)
        1    0.001    0.001    0.014    0.014 <string>:1(<module>)
        1    0.000    0.000    0.014    0.014 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


By removing a small part of the code using a list comprehension, we've doubled the overall speed of the code!
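
For larger programs the profiler output gets long, and it helps to sort it and show only the most expensive entries. The sketch below does this with the standard pstats module, repeating the evens function so that the block stands alone; limiting the report to three entries is an arbitrary choice.

```python
import cProfile
import io
import pstats

def evens(n):
    "Return a list of even numbers below n"
    l = []
    for x in range(n):
        if x % 2 == 0:
            l.append(x)
    return l

# Profile only the code between enable() and disable()
profiler = cProfile.Profile()
profiler.enable()
evens(100000)
profiler.disable()

# Sort by time spent inside each function and print the top 3 entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('tottime').print_stats(3)
print(stream.getvalue())
```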

Other Ways to Speed Python

When we compared the fact and factorial functions, above, we noted that C routines are often faster because they're more streamlined. Once we've determined that one routine is a bottleneck for the performance of a program, we can replace it with a faster version by writing it in C. This is called extending Python, and there's a good section in the standard documents. This can be a tedious process if you have many different routines to convert. Fortunately, there are several other options.

Swig (the simplified wrapper and interface generator) is a method to generate bindings not only for Python but also for Matlab, Perl, Ruby, and other scripting languages. Swig can scan the header files of a C project and generate Python bindings for it. Using Swig is substantially easier than writing the wrapper code in C by hand.

Cython is a C-extension language. You can start by compiling a Python routine into a shared object library that can be imported as a faster version of the routine. You can then add static typing and make other restrictions to further speed up the code. Cython is generally easier than using Swig.

PyPy is the easiest way of obtaining fast code. PyPy is an alternative implementation of Python, itself written in a subset of the Python language called RPython, that uses a just-in-time compiler to optimize code as it runs. Over a wide range of tests, PyPy has been reported to be roughly 6 times faster than the standard Python distribution.

Conclusion of the Python Overview

There is, of course, much more to the language than I've covered here. I've tried to keep this brief enough so that you can jump in and start using Python to simplify your life and work. My own experience in learning new things is that the information doesn't "stick" unless you try and use it for something in real life.

You will no doubt need to learn more as you go. I've listed several other good references, including the Python Tutorial and Learn Python the Hard Way. Additionally, now is a good time to start familiarizing yourself with the Python Documentation, and, in particular, the Python Language Reference.

Learning Resources

Some Python topics to look at

  • IPython: simple shell, parallel computing.
  • IPython Notebooks: simple web interface for preparing report-like documents (these slides are made with it!)
  • matplotlib: a library for scientific plotting comparable to Matlab (see also bokeh).
  • pandas: efficient data handling.
  • scikit-learn: machine learning framework and algorithms.


Creative Commons License
This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).
In [136]:
# To install run: pip install version_information
%load_ext version_information
%version_information scipy, numpy, matplotlib, seaborn, deap
Out[136]:
Software     Version
Python       3.6.0 64bit [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
IPython      5.2.2
OS           Darwin 16.4.0 x86_64 i386 64bit
scipy        0.18.1
numpy        1.11.3
matplotlib   2.0.0
seaborn      0.7.1
deap         1.1
Sat Mar 04 03:47:39 2017 BRT