Econ 873: Tutorial 2 - The basics

In this tutorial we will get our feet wet with some basic computational tasks in Python. Once again, a good resource is Tom Sargent and John Stachurski's Quant Econ tutorials http://quant-econ.net/. For this lecture you should consult Part 1: Programming in Python, especially An Introductory Example and Python Essentials. From Part 2: The Scientific Libraries, the sections on NumPy,SciPy and Matplotlib will be covered.

Another useful resource is http://learnpythonthehardway.org.

NumPy

We start our discussion with NumPy. This package introduces the array data type, which is like a matrix. However, an array can have many dimensions. For instance, a vector is an array, but so is a 3 dimensional matrix. The word matrix applies to the two dimensional case with rows and columns.

As you may recall, we first need to import NumPy. We will do so and make sure that it is working by creating an array.

In [41]:
import numpy as np

a = np.linspace(start = 1, stop = 10, num = 5)
print a
[  1.     3.25   5.5    7.75  10.  ]

It looks like NumPy is working. If you are interested in reading the documentation for NumPy you can visit http://docs.scipy.org/doc/numpy/contents.html. However, everything you need should be included in my lectures or the Sargent/Stachurski lectures.

Now lets go through the motions of creating some simple arrays.

In [42]:
'''
Create an array from a list object
'''

a = ([1.0, 2.0], [3.0, 4.0])
print 'The Type of a is:'
print type(a)
print ''

a = np.array(a)
print 'The type of a is now:'
print type(a)
print ''

print 'The shape of a is:'
print a.shape
print ''

print 'This is a:'
print a
print ''

print 'The transpose of a:'
print np.transpose(a)
print ''

print 'The inverse of a:'
print np.linalg.inv(a)
print ''
The Type of a is:
<type 'tuple'>

The type of a is now:
<type 'numpy.ndarray'>

The shape of a is:
(2, 2)

This is a:
[[ 1.  2.]
 [ 3.  4.]]

The transpose of a:
[[ 1.  3.]
 [ 2.  4.]]

The inverse of a:
[[-2.   1. ]
 [ 1.5 -0.5]]

In order to find out what NumPy is capable of, you can type np. + TAB. This tab completion feature is quite handy. Some other ways to create arrays:

In [43]:
b = np.zeros([2,2])

print 'The array b:'
print b
print ''

c = np.ones([2,1])

print 'The array c:'
print c
print ''

d = np.identity(2)

print 'The array d:'
print d
print ''
    
    
The array b:
[[ 0.  0.]
 [ 0.  0.]]

The array c:
[[ 1.]
 [ 1.]]

The array d:
[[ 1.  0.]
 [ 0.  1.]]

Suppose you want to extract an element of an array. This can be done as follows:

In [44]:
print 'The top left element of d'
print d[0,0]
print ''

print 'The left column of d'
print d[:,0]
print ''

print 'The bottom Right element of d:'
print d[1,1]
print 'Or alternatively'
print d[-1,-1]
The top left element of d
1.0

The left column of d
[ 1.  0.]

The bottom Right element of d:
1.0
Or alternatively
1.0

First, note that indexing starts at 0. Therefore, the first element is always 0 and so on. This takes some getting used to. As well, to pick out the last element of a sequence you can either use its index or the shortcut $-1$. To see this more clearly, I use the following example:

In [45]:
a = np.linspace(0,5,6)
print 'The array a:'
print a
print ''

print 'Extracting elements of a:'
print a[-1]
print a[-2]
print a[-3]
print a[2]
print a[1]
print a[0]
The array a:
[ 0.  1.  2.  3.  4.  5.]

Extracting elements of a:
5.0
4.0
3.0
2.0
1.0
0.0

There are some useful array methods discuessed in the Stachurski/Sargent notes. A few examples.

In [46]:
a = np.array([4, 2, 3, 1])

print 'The original array'
print a
print ''

print 'We can sum it up'
print a.sum()
print ''

print 'Find the mean'
print a.mean()
print ''

print 'Find the max'
print a.max()
print ''

print 'Find the standard deviation'
print a.std()
The original array
[4 2 3 1]

We can sum it up
10

Find the mean
2.5

Find the max
4

Find the standard deviation
1.11803398875

Next we may want to do standard operations on matrices. Adding, subtracting and scalar multiplication are pretty simple. Matrix multiplication presents a small hiccup.

In [47]:
A = np.ones([2,2])
B = np.ones([2,2])

print 'Scalar multiplication'
print A * 2
print ''

print 'Addition of two matrices'
print A + B
print ''

print 'Scalar addition'
print A + 2
print ''

print 'Bad matrix multiplication'
print A * B
print ''

print 'Good matrix multiplication'
print np.dot(A, B)
print ''
Scalar multiplication
[[ 2.  2.]
 [ 2.  2.]]

Addition of two matrices
[[ 2.  2.]
 [ 2.  2.]]

Scalar addition
[[ 3.  3.]
 [ 3.  3.]]

Bad matrix multiplication
[[ 1.  1.]
 [ 1.  1.]]

Good matrix multiplication
[[ 2.  2.]
 [ 2.  2.]]

The dot product method is the appropriate way to multiply matrices. Not surprisingly, it works for 1d arrays, or vectors as well. Furthermore, we can solve a system of equations, find the determinant and inverse as follows:

In [48]:
x = np.array([1,2,3])
y = np.array([3,2,1])

print 'The dot product of x and y'
print np.dot(x,y)
print ''

A = np.array([[2,1],[3,2]])
b = np.array([1,1])

print 'The matrix A is'
print A
print ''

print 'The vector b is:'
print b
print ''

print 'Find the determinant of A'
print np.linalg.det(A)
print ''

print 'Find the inverse of A'
print np.linalg.inv(A)
print ''

print 'Solve Ax = b'
print np.linalg.solve(A,b)
print''
The dot product of x and y
10

The matrix A is
[[2 1]
 [3 2]]

The vector b is:
[1 1]

Find the determinant of A
1.0

Find the inverse of A
[[ 2. -1.]
 [-3.  2.]]

Solve Ax = b
[ 1. -1.]

The above example solved the following system of equations: $$2x_1 + x_2 = 1 $$ $$3x_1 + 2x_2 = 1 $$

Or more compactly:

$$ A x = b $$

For:

$$A = \begin{bmatrix} 2 & 1 \\ 3 &2 \end{bmatrix} \qquad b = \begin{bmatrix} 1 \\ 1\end{bmatrix}$$

You may check that the solution is: $$ x_1 = 1 \qquad x_2 = -1 $$

Writing Loops and If Statements

A handy thing to do in any programming language is writing loops. This is useful in situations where we have to do something over and over again. Here is an example of a Python for loop.

In [49]:
for i in range(4):
    print i
0
1
2
3

We may also want to write while loops. However, be careful as these loops will go on forever if the while condition is not met.

In [50]:
iterate = 0
max_it  = 4

while iterate < max_it:
    print iterate
    iterate += 1
0
1
2
3

Notice that we often make use of logical statements like $x < y$. Some examples of doing so:

In [2]:
x = 2
y = 3

print x < y

print y < x

if x < y:
    print 'x is less than y'
else:
    print 'x is greater than y'
True
False
x is less than y

Matplotlib

In this section we will introduce the plotting package Matplotlib. It is very similar to the plotting capabilities of Matlab. It is very flexible. We will just scratch the surface of its capabilities.

First we need to import pyplot, which is the part of the module we will use for now. This is a good chance to create a function and then to plot a vectorized version of it. The function will be a simple polynomial:

$$y = 1 + 2 x + 3 x^2 + 4 x^3 $$

In [6]:
import numpy as np
from matplotlib import pyplot as plt

def poly(x):
    return 1.0 + 2.0 * x + 3.0 * x ** 2.0 + 4.0 * x ** 3.0

xgrid = np.linspace(start = -5, stop = 5, num = 50)

plt.plot(xgrid, poly(xgrid))
plt.show()

Notice that we put a plt. in front of any pyplot method. As well, we have to use the show method for the plot to be displayed. Otherwise the plot is created but we can't see it. We can modify the display as follows.

In [7]:
'''
Changing the line width and colour.  Also this is how your write a multi line comment!
See!!!
'''
plt.plot(xgrid, poly(xgrid), 'r-', linewidth = 4)
plt.show()
In [8]:
#This is a single line comment.  

#And this is how we change the line type

plt.plot(xgrid, poly(xgrid), 'g--', linewidth = 4)
plt.show()
In [9]:
'''
What about adding labels and a title?
'''
from matplotlib import pyplot as plt
%matplotlib inline
plt.plot(xgrid, poly(xgrid), 'g--', linewidth = 4, label ='f(x)')
plt.plot(xgrid, xgrid**2, 'r-', linewidth = 4, label ='g(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('A Cubic Polynomial Function')
plt.legend(loc = 'upper left') #there are many options for where the legend can go.
plt.show()

SciPy

SciPy is a very useful and large module that can help in lots of computational tasks. We will make use of the Stats package. First we will import it, though often it is helpful to just import the functions we want. Below I draw a single random normal variable from the $\mathcal{N}(\mu, \sigma^2)$ distribution with $\mu = 1$ and $\sigma^2 = 4$. Notice the scale parameter is the standard deviation.

In [10]:
import scipy.stats as ss

x = ss.norm.rvs(size = 1, loc = 1.0, scale = 2.0)

print 'A random normal draw'
print x
print ''
A random normal draw
[ 2.72658018]

Want to check that this is working? Lets draw a lot of them and see if the histogram approaches the .pdf of the known distribution.

In [55]:
import numpy as np 
import scipy.stats as ss         # import scipy.stats module under namespace ss.
import matplotlib.pyplot as plt  #import the pyplot module under namespace plt

#So that the plots show up in the web browser
%matplotlib inline 


x = ss.norm.rvs(size = 1000, loc = 1.0, scale = 2.0)
grid   = np.linspace(-6, 8, 30)

plt.hist(x, bins = 40, normed = True)
plt.plot(grid, ss.norm.pdf(grid, loc = 1.0, scale = 2.0),'r-', lw=4)
plt.show()

I will talk about all the .plt functions later, but for now lets analyze how we used scipy.stats to draw a lot of random numbers and evaluate the pdf. First, whenever we use SciPy.stats methods for the normal distribution we start the command with ss.norm. Within the normal distribution we have a whole bunch of methods which can be found here: http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.norm.html. We used the pdf and rvs methods.

Another thing you may want to do is to run a simple linear regression. To do so we will:

  • Create some data
  • Run a regression
  • Plot the least squares line

The model will be:

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Where $\varepsilon_i$ is distributed normally with mean 0 and variance $\sigma^2$. We will set the true $\beta = (1,2)$ and $\sigma^2 = 9$. The sample size $I = 1000$.

In [11]:
I     = 1000
beta0 = 1.0
beta1 = 2.0
sigma = 3.0

epsilon = ss.norm.rvs(size = I, loc = 0.0, scale = sigma)

x       = ss.norm.rvs(size = I, loc = 0.0, scale = 1.0)

y       = beta0 + beta1 * x + epsilon

beta1hat, beta0hat, r2, p_value, std_err = ss.linregress(x,y)

fitted = beta0hat + beta1hat * x 

plt.scatter(x,y, label = 'Data')
plt.plot(x, fitted, 'r-', linewidth = 4, label = 'Fitted Values')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Linear Regression')
plt.legend(loc = 'lower right')
plt.show()

print 'Intercept            Slope'
print beta0hat, beta1hat
Intercept            Slope
1.0569051497 2.03798609196

Homework

Question 1 is due by the next tutorial. Question 2 will be due as part of the computational assignment in the last week of class. However, the goal is that you have enough knowledge to do it now, so you may wish to get a jump on it!

  1. Download and install Python using the directions from the first tutorial. Create a notebook that adds 1 + 1 and features the statement 'Hello My Name is: '. Insert your name and follow the directions I gave for publishing the notebook. Send me the link.
  2. Solve the following system of equations using NumPy.

    $$2x_1 + 2x_2 - x_3 = -3 $$ $$4x_1 + 2x_3 = 8$$ $$6x_2 - 3x_3 = -12$$

    You should first check that the matrix $A$ is not singular.

  3. Consider a simple AR(1) model.

    $$y_t = \rho y_{t-1} + \varepsilon_t $$

    Where $\rho = 0.9$ and $\varepsilon$ is distributed normally with mean $0$ and $\sigma^2 = 2$. I want you to generate a realization of the AR(1) of length $T=250$ and plot it and then estimate $\rho$ using OLS. Assume that the time 0 realization of $y_t$ is $0$. You will need to write a loop to recursively generate the new value of $y_t$ at each time. Note that the unconditional mean is $0$ so that there is no intercept. If you estimate the process with an intercept then you should find that it is close to $0$. However, try and read the documentation for SciPy to find out how to estimate it without the intercept.