Lab 1 - Basic iPython Tutorial

Authors:

v1.0 (2014 Fall) Rishi Sharma ***, Sahaana Suri ***, Kangwook Lee ***, Kannan Ramchandran ***
v1.1 (2015 Fall) Kabir Chandrasekher **, Max Kanwal **, Kangwook Lee ***, Kannan Ramchandran ***
v1.2 (2016 Spring) Ashvin Nair *, Kabir Chandrasekher **, Kangwook Lee ***, Kannan Ramchandran ***

modified from Berkeley Python Bootcamp 2013 https://github.com/profjsb/python-bootcamp

and Python for Signal Processing http://link.springer.com/book/10.1007%2F978-3-319-01342-8

and EE 123 iPython Tutorial http://inst.eecs.berkeley.edu/~ee123/sp14/lab/python_tutorial.ipynb

General iPython Notebook usage instructions (overview)

  • Start by clicking Help >> User Interface Tour to get yourself familiar with the iPython Notebook environment.
  • Click the Play button to run and advance a cell. The short-cut for it is shift-enter.
  • To add a new cell, either select "Insert >> Insert New Cell Below" or click the Plus button.
  • You can change the cell mode from code to text in the pulldown menu. Use Markdown for writing text.
  • You can change the text in Markdown cells by double-clicking it. The short-cut for this is enter.
  • To save your notebook, either select "File >> Save and Checkpoint" or hit Command-s for Mac and Ctrl-s for Windows.
  • To undo edits within a cell, hit Command-z for Mac and Ctrl-z for Windows.
  • Help >> Keyboard Shortcuts has a list of all useful keyboard shortcuts.
  • The Help menu also has links to many reference docs you may find useful this semester (e.g. Markdown, Python, NumPy, Matplotlib, SciPy).

Installing Python

Follow the instructions on the class website to install python (if you're reading this, you've probably already done that):

http://ipython.org/install.html

Make sure you install the notebook dependencies for iPython in addition to the basic package

Tab Completion

One useful feature of iPython is tab completion:

In [ ]:
x = 1
y = 2
x_plus_y = x+y

# type `x_` then hit TAB to auto-complete the variable
# then press shift + enter to run the cell
print x_

Help

Another useful feature is the help command. Type any function followed by ? and run the cell to return a help window. Hit the x button to close it.

In [ ]:
abs?

Floats and Integers

Doing math in Python is easy, but note that there are int and float types in Python. Integer division returns the floor. Always check this when debugging!

In [ ]:
59 / 87
In [ ]:
59 / 87.0

The division operator ('/') was changed from floor division to float division in the update from Python 2.x to Python 3.x. So if you import division from the future, then everything works fine.

In [ ]:
from __future__ import division

59 / 87

Strings

  • Double quotes and single quotes are the same thing.
  • '+' concatenates strings.
In [ ]:
# This is a comment
"Hi " + 'Bye'

Printing

Here are some fancy ways of printing:

In [ ]:
numStudents = 83
In [ ]:
print "There are currently %d students enrolled in EE 126." % numStudents
In [ ]:
print "At least {0} of you will probably end up dropping. That is {1}%."\
    .format(15, 15.0/numStudents * 100 )
In [ ]:
print "Good Luck! Prepare to work hard and learn a lot of cool stuff!"

Lists

A list is a mutable array of data, i.e. it can constantly be modified. See http://stackoverflow.com/questions/8056130/immutable-vs-mutable-types-python for more info. If you are not careful, using mutable data structures can lead to bugs in code that passes common data to many different functions.

Important functions:

  • Created a list by using square brackets [ ].
  • '+' appends lists.
  • len(x) gets the length of list x.
In [ ]:
x = [1, 2, "asdf"] + [4, 5, 6]

print x
In [ ]:
print len(x)

Tuples

A tuple is an immutable list. They can be created using round brackets ( ).

They are usually used as inputs and outputs to functions

In [ ]:
t = (1, 2, "asdf") + (3, 4, 5)
print t
In [ ]:
# cannot do assignment
t[0] = 10

# errors in ipython notebook appear inline

Arrays (NumPy)

A numpy array is like a list with multidimensional support and more functions. We will be using it a lot.

Arithmetic operations on numpy arrays correspond to elementwise operations.

Important functions:

  • .shape returns the dimensions of the array

  • .ndim returns the number of dimensions.

  • .size returns the number of entries in the array

  • len() returns the first dimension

To use functions in numpy, we have to import numpy to our workspace. This is done by the command import numpy. By convention, we rename numpy as np for convenience.

In [ ]:
import numpy as np # by convention, import numpy as np

x = np.array( [ [1, 2, 3], [4, 5, 6] ] )

print x
In [ ]:
print "Number of dimensions:", x.ndim
In [ ]:
print "Dimensions:", x.shape
In [ ]:
print "Size:", x.size
In [ ]:
print "Length:", len(x)
In [ ]:
a = np.array([1, 2, 3])

print "a=", a

print "a*a=", a * a   #element-wise arithmetic
In [ ]:
b = np.array(np.ones((3,3)))*2
print "b=\n", b
c = np.array(np.ones((3,3)))
print "c=\n", c

Multiply elementwise:

In [ ]:
print "b*c =\n", b*c

Now multiply as matrices (not arrays):

In [ ]:
print "b*c =\n", np.dot(b,c)

Alternatively, we can just convert to or create a matrix instead of an array and then use normal multiplication:

In [ ]:
print "b*c =\n", np.matrix(b)*np.matrix(c)

d = np.matrix([[1,1j,0],[1,2,3]])
e = np.matrix([[1],[1j],[0]])

print "d*e =\n", d*e

Slicing for numpy arrays

Numpy uses pass-by-reference semantics so it creates views into the existing array, without implicit copying. This is particularly helpful with very large arrays because copying can be slow.

In [ ]:
x = np.array([1,2,3,4,5,6])
print x

We slice an array from a to b-1 with [a:b]

In [ ]:
y = x[0:4]
print y

Since slicing does not copy the array, changing y changes x:

In [ ]:
y[0] = 7
print x
print y

To actually copy x, we should use .copy():

In [ ]:
x = np.array([1,2,3,4,5,6])
y = x.copy()
y[0] = 7
print x
print y

Commonly used Numpy functions: r_ and c_

We use r_ to create integer sequences

r_[0:N] creates an array listing every integer from 0 to N-1

r_[0:N:m] creates an array listing every m th integer from 0 to N-1

In [ ]:
from numpy import r_ # import r_ function from numpy directly, so that we can call r_ directly instead of np.r_

print r_[-5:5] # every integer from -5 ... 4

print r_[0:5:2] # every other integer from 0 ... 4

print abs( r_[-5:5] )

r_ stands for row concatenation, and the function r_[-5:5] is saying to row concatenate every element generated in the range [-5:5]. This is just one use case for row concatenation, but as you can imagine there are many others. The same goes for its cousin the c_ function, which performs a column concatenation.

In [ ]:
from numpy import r_
from numpy import c_

row1 = [[1,0,0]]
row2 = [[0,1,0]]
row3 = [[0,0,1]]

# we want to stack these three rows to create a 3x3 identity matrix
# this is where the r_ function comes in handy

print np.r_[row1,row2,row3] # 3x3 identity matrix appending vectors as rows

What would have happened above if you had used c_ instead of r_ on row1, row2, and row3? Try it in the box below:

In [ ]:
print np.c_[# fill in # ]     # vector created by appending elements as new columns

Some more examples:

In [ ]:
X = np.eye(3) # 3x3 Identity Matrix
Y = np.ones([3,3]) # 3x3 Matrix of ones
print "X = "
print X
print "Y = "
print Y

Z = r_[X,Y] # concatenate y to x as rows
print "\n Row Concatenated [X ; Y] : "
print Z

W = c_[X,Y] # concatenate y to x as columns
print "\n Column Concatenated [X Y] : \n "
print W

Plotting

In this class we will use matplotlib.pyplot to plot signals and images.

To begin with, we import matplotlib.pyplot as plt (again for convenience).

In [ ]:
import numpy as np
import matplotlib.pyplot as plt # by convention, we import pyplot as plt
from numpy import r_ # import r_ function from numpy


x = r_[:1:0.01] # if you don't specify a number before the colon, the starting index defaults to 0
a = np.exp( -x )
b = np.sin( x*10.0 )/4.0 + 0.5

# plot in browser instead of opening new windows
%matplotlib inline

plt.plot(x, a) plots a against x

In [ ]:
plt.figure()
plt.plot( x, a )

Once you started a figure, you can keep plotting to the same figure

In [ ]:
plt.figure()
plt.plot( x, a, 'green' )
plt.plot( x, b )

To plot different plots, you can create a second figure

In [ ]:
plt.figure()
plt.plot( x, a )
plt.figure()
plt.plot( x, b )

To label the axes, use plt.xlabel() and plt.ylabel()

In [ ]:
plt.figure()
plt.plot( x, a )
plt.plot( x, b )

plt.xlabel( "time" )
plt.ylabel( "space" )

You can also add title and legends using plt.title() and plt.legend()

In [ ]:
plt.figure()
plt.plot( x, a )
plt.plot( x, b )
plt.xlabel( "time" )
plt.ylabel( "space" )

plt.title( "Most important graph in the world" )

plt.legend( ("e^-x", "sin(x)") )

There are many options you can specify in plot(), such as color and linewidth. You can also change the axis using plt.axis

In [ ]:
plt.figure()
plt.plot( x, a ,':r',linewidth=20)
plt.plot( x, b ,'--k')
plt.xlabel( "time" )
plt.ylabel( "space" )

plt.title( "Most important graph in the world" )

plt.legend( ("blue", "red") )

plt.axis( [0, 4, -2, 3] )

There are many other plotting functions. For example, we will use plt.imshow() for showing images and plt.stem() for plotting discretized signal

In [ ]:
# image
plt.figure()

data = np.outer( a, b ) # plotting the outer product of a and b

plt.imshow(data)
In [ ]:
# stem plot
plt.figure()
plt.stem(x[::5], a[::5]) # subsample by 5
In [ ]:
# xkcd style plots 
# Note: requires matplotlib version 1.3.1 or higher
plt.xkcd()
plt.plot( x, a )
plt.plot( x, b )
plt.xlabel( "time" )
plt.ylabel( "space" )

plt.title( "Most important graph in the world" )

plt.legend( ("blue", "red") )

To turn off xkcd style plotting, restart the kernel or run the command `plt.rcdefaults()`

Logic

For-Loop

Indentation matters in Python. Everything indented belongs to the loop:

In [ ]:
for i in [4, 6, "asdf", "jkl"]:
    print i
In [ ]:
for i in r_[0:10]:
    print i

If-Else

Same goes for If-Else:

In [ ]:
if 1 != 0:
    print "1 != 0"
elif 1 == 0: 
    print "1 = 0"
else:
    print "Huh?"

Random Library

The numpy random library should be your resource for all Monte Carlo simulations which require generating instances of random variables.

The documentation for the library can be found here: http://docs.scipy.org/doc/numpy/reference/routines.random.html

The function $\operatorname{rand}()$ can be used to generates a uniform random number in the range $[0,1)$

In [ ]:
from numpy import random

print random.rand() # random number
print random.rand(5) # random vector
print random.rand(3,3) # random matrix

Let's see how we can use this to generate a fair coin toss (i.e. a discrete $\mathcal{B}\left(\frac{1}{2}\right)$ random variable)

In [ ]:
x = round(random.rand()) # Bernoulli(1/2) random variable
print x

Now let's generate several fair coin tosses and plot a histogram of the results

In [ ]:
k = 100
x1 = [round(random.rand()) for _ in xrange(k)]
plt.figure()
plt.hist(x1)

# we could also use numpy's round function to element-wise round the vector
x2 = np.round(random.rand(k))
plt.figure()
plt.hist(x2)

We can do something similar for several other distributions, and allow the histogram to give us a sense of what the distribution looks like. As we increase the number of samples we take from the distribution $k$, the more and more our histogram looks like the actual distribution.

In [ ]:
k = 1000

discrete_uniform = random.randint(0,10,size=k) # k discrete uniform random variables between 0 and 9
plt.figure(figsize=(6,3))
plt.hist(discrete_uniform)
plt.title('discrete uniform')

continuous_uniform = random.rand(k)
plt.figure(figsize=(6,3))
plt.hist(continuous_uniform)
plt.title('continous uniform')

std_normal = random.randn(k) # randn generates elements from the standard normal
plt.figure(figsize=(6,3))
plt.hist(std_normal)
plt.title('standard normal')

# To generate a normal distribution with mean mu and standard deviation sigma, we must mean shift and scale the variable
mu = 100
sigma = 40
normal_mu_sigma = mu + random.randn(k)*sigma
plt.figure(figsize=(6,3))
plt.hist(normal_mu_sigma)
plt.title('N(' + str(mu) + ',' + str(sigma) + ')')

^ We could do this all day with all sorts of distributions. I think you get the point.

Specifying a Discrete Probability Distribution for Monte Carlo Sampling

The following function takes $n$ sample from a discrete probability distribution specified by the two arrays distribution and values.

As an example, let us suppose a random variable $X$ follows the following distribution:

$$ X = \begin{cases} 1 \ \text{w/ probability 0.1} \\ 2 \ \text{w/ probability 0.4} \\ 3 \ \text{w/ probability 0.2} \\ 4 \ \text{w/ probability 0.2} \\ 5 \ \text{w/ probability 0.05} \\ 6 \ \text{w/ probability 0.05} \end{cases} $$

Then we would have: distribution = $\begin{bmatrix} 0.1 , 0.4 , 0.2 , 0.2 , 0.05 , 0.05 \end{bmatrix}$ and values = $\begin{bmatrix} 1, 2, 3, 4, 5, 6 \end{bmatrix}$

In [ ]:
def nSample(distribution, values, n):
    if sum(distribution) != 1:
        distribution = [distribution[i] / sum(distribution) for i in range(len(distribution))]
    rand = [random.rand() for i in range(n)]
    rand.sort()
    samples = []
    samplePos, distPos, cdf = 0, 0, distribution[0]
    while samplePos < n:
        if rand[samplePos] < cdf:
            samplePos += 1
            samples.append(values[distPos])
        else:
            distPos += 1
            cdf += distribution[distPos]
    return samples
In [ ]:
# collect k samples from X and plot the histogram
samplesFromX = nSample([0.1, 0.4, 0.2, 0.2, 0.05, 0.05], [1, 2, 3, 4, 5, 6], k)
plt.hist(samplesFromX)
plt.ylim((0,1000))
print "Wow, if we normalized the y-axis that would be a PMF. Incredible! I should try that."
In [ ]:
# Normalize the samples from X to plot the probability mass function below:

$\mathcal{Q}$uestion 1: Sampling and Plotting a Binomial Random Variable

A binomial random variable $X \sim \text{Bin}(n,p)$ can be thought of as the number of heads in $n$ coin flips where each flip has probability $p$ of coming up heads. We can equivalently think of it as the sum of $n$ Bernoulli random variables: $X = \sum_{i=1}^{n}X_i$ where $X_i \sim \text{Bernoulli}(p)$.

In this question, you will put your new plotting skills to work and sample the values of a binomial random variable.

In [ ]:
def plot_binomial(n, trials=[10,50,100,1000,10000], p_values=[0.1,0.2,0.5,0.8]):
    """
    On different figures, plot a histogram of the results of the given
    number of trials of the binomial variable with parameters n and p for all
    values in the given list.
    """
    
plot_binomial(100) #Feel free to play around with other values of n

Now that you have plotted many values of a few different binomial random variables, do the results coincide with what you expect them to?

$\mathcal{Q}$uestion 2: Monty Hall Proof and Simulation

Simulate the Monty Hall problem for the case when you do switch doors and when you don't. Run the simulation 100,000 times for each case and determine the simulated probability of winning given each strategy. You can use the function below as a guide (or not). Make sure your simulation results match the analytical result you expect. If you need a refresher on the Monty Hall problem, the following video may be of some help. Additionally, provide a rigorous proof of the probability of winning the prize given a strategy of switching doors vs. a strategy of staying put.

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('Zr_xWfThjJ0')

Simulate the Monty Hall problem below:

A version of the simulation that does not use NumPy at all is given. Your job is to re-implement the simulation using as many functions from the NumPy library as you can!

In [ ]:
from __future__ import division
import time

def MontyHall(switch, n):
    # switch is a boolean which tells you whether or not to switch from the original door chosen
    # n is the number of times you want to simulate the action
    results = []
    for trial in range(n):
        car_door = int(random.random()*3+1)
        my_door = int(random.random()*3+1)
        other_goat_doors = [i for i in range(1,4) if (i!=car_door and i!=my_door)]
        if len(other_goat_doors)==1:
            revealed_door = other_goat_doors[0]
        else:
            revealed_door = other_goat_doors[random.randint(0,1)]
        if switch:
            my_door = [i for i in range(1,4) if (i!=my_door and i!=revealed_door)][0]
        else:
            my_door = my_door
        result = my_door == car_door
        results.append(result)
    return sum(results)/n
    
    
def MontyHall_numpy(switch, n):
    # switch is a boolean which tells you whether or not to switch from the original door chosen
    # n is the number of times you want to simulate the action
    
    # Your beautiful code here... #
    pass
   
In [ ]:
start = time.clock()
print "Probability of Winning if You Switch Doors: ", MontyHall(True,100000)
print "Probability of Winning if You Don't Switch: " , MontyHall(False,100000)
end = time.clock()
total1 = end - start
In [ ]:
start = time.clock()
print "Probability of Winning if You Switch Doors: ", MontyHall_numpy(True,100000)
print "Probability of Winning if You Don't Switch: " , MontyHall_numpy(False,100000)
end = time.clock()
total2 = end - start
In [ ]:
print "w/o NumPy:\t %f s\nw/ NumPy:\t %f s" %(total1,total2)
print "Total Speedup: " + str(total1/total2) + "x"

For reference, our Monte Carlo simulation using NumPy was actually slower than than without NumPy. Why should we bother with NumPy at all then?

While many of you may not have used NumPy previously, or not see the purpose in learning this library, we strongly urge you to force yourself to use NumPy as much as possible while doing virtual labs. It will benefit you significantly in this class. Furthermore, NumPy is widely used in industry and academic research, and so it will benefit you greatly to become comfortable with it this semester! NumPy (and using matrix operations rather than loops) is your friend when it comes to dealing with lots of data or doing elaborate simulations.

In this simple simulation there was basically no manipulation of data, and only one loop. If you find yourself using many loops or list comprehensions to process data, think about using NumPy. :)

Provide a rigorous proof for the results you observed: