Authors:
v1.0 (2014 Fall) Rishi Sharma **, Sahaana Suri **, Kangwook Lee **, Kannan Ramchandran **
v1.1 (2015 Fall) Kabir Chandrasekher *, Max Kanwal *, Kangwook Lee **, Kannan Ramchandran **
modified from Berkeley Python Bootcamp 2013 https://github.com/profjsb/python-bootcamp
and Python for Signal Processing http://link.springer.com/book/10.1007%2F978-3-319-01342-8
and EE 123 iPython Tutorial http://inst.eecs.berkeley.edu/~ee123/sp14/lab/python_tutorial.ipynb
Help >> User Interface Tour
to get yourself familiar with the iPython Notebook environment.Play
button to run and advance a cell. The short-cut for it is shift-enter
."Insert >> Insert New Cell Below"
or click the Plus
button.Markdown
for writing text.Markdown
cells by double-clicking it. The short-cut for this is enter
."File >> Save and Checkpoint"
or hit Command-s
for Mac and Ctrl-s
for Windows.Command-z
for Mac and Ctrl-z
for Windows.Help >> Keyboard Shortcuts
has a list of all useful keyboard shortcuts.Help
menu also has links to many reference docs you may find useful this semester (e.g. Markdown, Python, NumPy, Matplotlib, SciPy).Follow the instructions on the class website to install python (if you're reading this, you've probably already done that):
http://ipython.org/install.html
Make sure you install the notebook dependencies for iPython in addition to the basic package
One useful feature of iPython is tab completion:
x = 1
y = 2
x_plus_y = x+y
# type `x_` then hit TAB to auto-complete the variable
# then press shift + enter to run the cell
print x_
Another useful feature is the help command. Type any function followed by ?
and run the cell to return a help window. Hit the x
button to close it.
abs?
Doing math in Python is easy, but note that there are int
and float
types in Python. Integer division returns the floor. Always check this when debugging!
59 / 87
59 / 87.0
The division operator ('/
') was changed from floor division to float division in the update from Python 2.x to Python 3.x. So if you import division from the future, then everything works fine.
from __future__ import division
59 / 87
'+'
concatenates strings.# This is a comment
"Hi " + 'Bye'
Here are some fancy ways of printing:
numStudents = 94
print "There are currently %d students enrolled in EE 126." % numStudents
print "At least {0} of you will probably end up dropping. That is {1}%."\
.format(15, 15.0/numStudents * 100 )
print "Good Luck! Prepare to work hard and learn a lot of cool stuff!"
A list is a mutable array of data, i.e. it can constantly be modified. See http://stackoverflow.com/questions/8056130/immutable-vs-mutable-types-python for more info. If you are not careful, using mutable data structures can lead to bugs in code that passes common data to many different functions.
Important functions:
[ ]
.'+'
appends lists.len(x)
gets the length of list x
.x = [1, 2, "asdf"] + [4, 5, 6]
print x
print len(x)
A tuple is an immutable list. They can be created using round brackets ( ).
They are usually used as inputs and outputs to functions
t = (1, 2, "asdf") + (3, 4, 5)
print t
# cannot do assignment
t[0] = 10
# errors in ipython notebook appear inline
A numpy array is like a list with multidimensional support and more functions. We will be using it a lot.
Arithmetic operations on numpy arrays correspond to elementwise operations.
Important functions:
.shape
returns the dimensions of the array
.ndim
returns the number of dimensions.
.size
returns the number of entries in the array
len()
returns the first dimension
To use functions in numpy, we have to import numpy to our workspace. This is done by the command import numpy
. By convention, we rename numpy
as np
for convenience.
import numpy as np # by convention, import numpy as np
x = np.array( [ [1, 2, 3], [4, 5, 6] ] )
print x
print "Number of dimensions:", x.ndim
print "Dimensions:", x.shape
print "Size:", x.size
print "Length:", len(x)
a = np.array([1, 2, 3])
print "a=", a
print "a*a=", a * a #element-wise arithmetic
b = np.array(np.ones((3,3)))*2
print "b=\n", b
c = np.array(np.ones((3,3)))
print "c=\n", c
Multiply elementwise:
print "b*c =\n", b*c
Now multiply as matrices (not arrays):
print "b*c =\n", np.dot(b,c)
Alternatively, we can just convert to or create a matrix instead of an array and then use normal multiplication:
print "b*c =\n", np.matrix(b)*np.matrix(c)
d = np.matrix([[1,1j,0],[1,2,3]])
e = np.matrix([[1],[1j],[0]])
print "d*e =\n", d*e
Numpy uses pass-by-reference semantics so it creates views into the existing array, without implicit copying. This is particularly helpful with very large arrays because copying can be slow.
x = np.array([1,2,3,4,5,6])
print x
We slice an array from a to b-1 with [a:b]
y = x[0:4]
print y
Since slicing does not copy the array, changing y
changes x
:
y[0] = 7
print x
print y
To actually copy x, we should use .copy():
x = np.array([1,2,3,4,5,6])
y = x.copy()
y[0] = 7
print x
print y
We use r_
to create integer sequences
r_[0:N]
creates an array listing every integer from 0 to N-1
r_[0:N:m]
creates an array listing every m
th integer from 0 to N-1
from numpy import r_ # import r_ function from numpy directly, so that we can call r_ directly instead of np.r_
print r_[-5:5] # every integer from -5 ... 4
print r_[0:5:2] # every other integer from 0 ... 4
print abs( r_[-5:5] )
r_ stands for row concatenation, and the function r_[-5:5] is saying to row concatenate every element generated in the range [-5:5]. This is just one use case for row concatenation, but as you can imagine there are many others. The same goes for its cousin the c_ function, which performs a column concatenation.
from numpy import r_
from numpy import c_
row1 = [[1,0,0]]
row2 = [[0,1,0]]
row3 = [[0,0,1]]
# we want to stack these three rows to create a 3x3 identity matrix
# this is where the r_ function comes in handy
print np.r_[row1,row2,row3] # 3x3 identity matrix appending vectors as rows
What would have happened above if you had used c_
instead of r_
on row1
, row2
, and row3
?
Try it in the box below:
print np.c_[# fill in # ] # vector created by appending elements as new columns
Some more examples:
X = np.eye(3) # 3x3 Identity Matrix
Y = np.ones([3,3]) # 3x3 Matrix of ones
print "X = "
print X
print "Y = "
print Y
Z = r_[X,Y] # concatenate y to x as rows
print "\n Row Concatenated [X ; Y] : "
print Z
W = c_[X,Y] # concatenate y to x as columns
print "\n Column Concatenated [X Y] : \n "
print W
In this class we will use matplotlib.pyplot
to plot signals and images.
To begin with, we import matplotlib.pyplot
as plt
(again for convenience).
import numpy as np
import matplotlib.pyplot as plt # by convention, we import pyplot as plt
from numpy import r_ # import r_ function from numpy
x = r_[:1:0.01] # if you don't specify a number before the colon, the starting index defaults to 0
a = np.exp( -x )
b = np.sin( x*10.0 )/4.0 + 0.5
# plot in browser instead of opening new windows
%matplotlib inline
plt.plot(x, a)
plots a
against x
plt.figure()
plt.plot( x, a )
Once you started a figure, you can keep plotting to the same figure
plt.figure()
plt.plot( x, a, 'green' )
plt.plot( x, b )
To plot different plots, you can create a second figure
plt.figure()
plt.plot( x, a )
plt.figure()
plt.plot( x, b )
To label the axes, use plt.xlabel()
and plt.ylabel()
plt.figure()
plt.plot( x, a )
plt.plot( x, b )
plt.xlabel( "time" )
plt.ylabel( "space" )
You can also add title and legends using plt.title()
and plt.legend()
plt.figure()
plt.plot( x, a )
plt.plot( x, b )
plt.xlabel( "time" )
plt.ylabel( "space" )
plt.title( "Most important graph in the world" )
plt.legend( ("e^-x", "sin(x)") )
There are many options you can specify in plot()
, such as color and linewidth. You can also change the axis using plt.axis
plt.figure()
plt.plot( x, a ,':r',linewidth=20)
plt.plot( x, b ,'--k')
plt.xlabel( "time" )
plt.ylabel( "space" )
plt.title( "Most important graph in the world" )
plt.legend( ("blue", "red") )
plt.axis( [0, 4, -2, 3] )
There are many other plotting functions. For example, we will use plt.imshow()
for showing images and plt.stem()
for plotting discretized signal
# image
plt.figure()
data = np.outer( a, b ) # plotting the outer product of a and b
plt.imshow(data)
# stem plot
plt.figure()
plt.stem(x[::5], a[::5]) # subsample by 5
# xkcd style plots
# Note: requires matplotlib version 1.3.1 or higher
plt.xkcd()
plt.plot( x, a )
plt.plot( x, b )
plt.xlabel( "time" )
plt.ylabel( "space" )
plt.title( "Most important graph in the world" )
plt.legend( ("blue", "red") )
plt.rcdefaults()
Indentation matters in Python. Everything indented belongs to the loop:
for i in [4, 6, "asdf", "jkl"]:
print i
for i in r_[0:10]:
print i
Same goes for If-Else:
if 1 != 0:
print "1 != 0"
elif 1 == 0:
print "1 = 0"
else:
print "Huh?"
The numpy random library should be your resource for all Monte Carlo simulations which require generating instances of random variables.
The documentation for the library can be found here: http://docs.scipy.org/doc/numpy/reference/routines.random.html
The function $\operatorname{rand}()$ can be used to generates a uniform random number in the range $[0,1)$
from numpy import random
print random.rand() # random number
print random.rand(5) # random vector
print random.rand(3,3) # random matrix
Let's see how we can use this to generate a fair coin toss (i.e. a discrete $\mathcal{B}\left(\frac{1}{2}\right)$ random variable)
x = round(random.rand()) # Bernoulli(1/2) random variable
print x
Now let's generate several fair coin tosses and plot a histogram of the results
k = 100
x1 = [round(random.rand()) for _ in xrange(k)]
plt.figure()
plt.hist(x1)
# we could also use numpy's round function to element-wise round the vector
x2 = np.round(random.rand(k))
plt.figure()
plt.hist(x2)
We can do something similar for several other distributions, and allow the histogram to give us a sense of what the distribution looks like. As we increase the number of samples we take from the distribution $k$, the more and more our histogram looks like the actual distribution.
k = 1000
discrete_uniform = random.randint(0,10,size=k) # k discrete uniform random variables between 0 and 9
plt.figure(figsize=(6,3))
plt.hist(discrete_uniform)
plt.title('discrete uniform')
continuous_uniform = random.rand(k)
plt.figure(figsize=(6,3))
plt.hist(continuous_uniform)
plt.title('continous uniform')
std_normal = random.randn(k) # randn generates elements from the standard normal
plt.figure(figsize=(6,3))
plt.hist(std_normal)
plt.title('standard normal')
# To generate a normal distribution with mean mu and standard deviation sigma, we must mean shift and scale the variable
mu = 100
sigma = 40
normal_mu_sigma = mu + random.randn(k)*sigma
plt.figure(figsize=(6,3))
plt.hist(normal_mu_sigma)
plt.title('N(' + str(mu) + ',' + str(sigma) + ')')
^ We could do this all day with all sorts of distributions. I think you get the point.
The following function takes $n$ sample from a discrete probability distribution specified by the two arrays distribution
and values
.
As an example, let us suppose a random variable $X$ follows the following distribution:
$$ X = \begin{cases} 1 \ \text{w/ probability 0.1} \\ 2 \ \text{w/ probability 0.4} \\ 3 \ \text{w/ probability 0.2} \\ 4 \ \text{w/ probability 0.2} \\ 5 \ \text{w/ probability 0.05} \\ 6 \ \text{w/ probability 0.05} \end{cases} $$Then we would have:
distribution
= $\begin{bmatrix} 0.1 , 0.4 , 0.2 , 0.2 , 0.05 , 0.05 \end{bmatrix}$ and
values
= $\begin{bmatrix} 1, 2, 3, 4, 5, 6 \end{bmatrix}$
def nSample(distribution, values, n):
if sum(distribution) != 1:
distribution = [distribution[i] / sum(distribution) for i in range(len(distribution))]
rand = [random.rand() for i in range(n)]
rand.sort()
samples = []
samplePos, distPos, cdf = 0, 0, distribution[0]
while samplePos < n:
if rand[samplePos] < cdf:
samplePos += 1
samples.append(values[distPos])
else:
distPos += 1
cdf += distribution[distPos]
return samples
# collect k samples from X and plot the histogram
samplesFromX = nSample([0.1, 0.4, 0.2, 0.2, 0.05, 0.05], [1, 2, 3, 4, 5, 6], k)
plt.hist(samplesFromX)
plt.ylim((0,1000))
print "Wow, if we normalized the y-axis that would be a PMF. Incredible! I should try that."
# Normalize the samples from X to plot the probability mass function below:
A binomial random variable $X \sim \text{Bin}(n,p)$ can be thought of as the number of heads in $n$ coin flips where each flip has probability $p$ of coming up heads. We can equivalently think of it as the sum of $n$ Bernoulli random variables: $X = \sum_{i=1}^{n}X_i$ where $X_i \sim \text{Bernoulli}(p)$.
In this question, you will put your new plotting skills to work and sample the values of a binomial random variable.
def plot_binomial(n, trials=[10,50,100,1000,10000], p_values=[0.1,0.2,0.5,0.8]):
"""
On different figures, plot a histogram of the results of the given
number of trials of the binomial variable with parameters n and p for all
values in the given list.
"""
plot_binomial(100) #Feel free to play around with other values of n
Now that you have plotted many values of a few different binomial random variables, do the results coincide with what you expect them to?
Simulate the Monty Hall problem for the case when you do switch doors and when you don't. Run the simulation 100,000 times for each case and determine the simulated probability of winning given each strategy. You can use the function below as a guide (or not). Make sure your simulation results match the analytical result you expect. If you need a refresher on the Monty Hall problem, the following video may be of some help. Additionally, provide a rigorous proof of the probability of winning the prize given a strategy of switching doors vs. a strategy of staying put.
from IPython.display import YouTubeVideo
YouTubeVideo('Zr_xWfThjJ0')
A version of the simulation that does not use NumPy at all is given. Your job is to re-implement the simulation using as many functions from the NumPy library as you can!
from __future__ import division
import time
def MontyHall(switch, n):
# switch is a boolean which tells you whether or not to switch from the original door chosen
# n is the number of times you want to simulate the action
results = []
for trial in range(n):
car_door = int(random.random()*3+1)
my_door = int(random.random()*3+1)
other_goat_doors = [i for i in range(1,4) if (i!=car_door and i!=my_door)]
if len(other_goat_doors)==1:
revealed_door = other_goat_doors[0]
else:
revealed_door = other_goat_doors[random.randint(0,1)]
if switch:
my_door = [i for i in range(1,4) if (i!=my_door and i!=revealed_door)][0]
else:
my_door = my_door
result = my_door == car_door
results.append(result)
return sum(results)/n
def MontyHall_numpy(switch, n):
# switch is a boolean which tells you whether or not to switch from the original door chosen
# n is the number of times you want to simulate the action
# Your beautiful code here... #
pass
start = time.clock()
print "Probability of Winning if You Switch Doors: ", MontyHall(True,100000)
print "Probability of Winning if You Don't Switch: " , MontyHall(False,100000)
end = time.clock()
total1 = end - start
start = time.clock()
print "Probability of Winning if You Switch Doors: ", MontyHall_numpy(True,100000)
print "Probability of Winning if You Don't Switch: " , MontyHall_numpy(False,100000)
end = time.clock()
total2 = end - start
print "w/o NumPy:\t %f s\nw/ NumPy:\t %f s" %(total1,total2)
print "Total Speedup: " + str(total1/total2) + "x"
For reference, our Monte Carlo simulation using NumPy was actually slower than than without NumPy. Why should we bother with NumPy at all then?
While many of you may not have used NumPy previously, or not see the purpose in learning this library, we strongly urge you to force yourself to use NumPy as much as possible while doing virtual labs. It will benefit you significantly in this class. Furthermore, NumPy is widely used in industry and academic research, and so it will benefit you greatly to become comfortable with it this semester! NumPy (and using matrix operations rather than loops) is your friend when it comes to dealing with lots of data or doing elaborate simulations.
In this simple simulation there was basically no manipulation of data, and only one loop. If you find yourself using many loops or list comprehensions to process data, think about using NumPy. :)