
Python/NumPy tutorial - Part 1: Arrays

Arrays are the typical data structure used in data science and statistical analysis. In NumPy, arrays are represented by objects called ndarrays, which support data storage, manipulation, processing and modeling.

This first tutorial shows the basic operations and uses of ndarrays in NumPy.

In [83]:

import numpy as np

Basic properties of NumPy arrays

An ndarray exposes information about its structure, shape and data content through its attributes.

In [84]:

a = np.array([[1,2,3],[4,5,6]])
print("a=\n", a, "\n", sep="")
print("a has shape:", a.shape)
print("a has", a.ndim, "dimensions")
print("a has", a.size, "elements")
print("a has type:", a.dtype)
# itemsize is the size of one element, not of the whole array
print("each element of a occupies", a.itemsize, "Bytes")

a=
[[1 2 3]
 [4 5 6]]

a has shape: (2, 3)
a has 2 dimensions
a has 6 elements
a has type: int64
each element of a occupies 8 Bytes
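
Since `itemsize` describes a single element, the memory used by the whole array is a separate quantity. The following is a minimal sketch of how the two relate, using the same 2x3 array; the `dtype` is set explicitly here (an assumption made for repeatability, since the default integer type can differ between platforms), and the names `bytes_per_element`, `total_bytes` and `a_small` are only illustrative.

```python
import numpy as np

# Same 2x3 array as above; dtype is fixed explicitly so the sizes below
# do not depend on the platform's default integer type.
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

bytes_per_element = a.itemsize      # size of one int64 element
total_bytes = a.nbytes              # size of the whole data buffer

print("bytes per element: {}".format(bytes_per_element))
print("total bytes: {}".format(total_bytes))

# The total is the element count times the element size.
same = (total_bytes == a.size * a.itemsize)

# Converting to a smaller dtype shrinks the memory footprint.
a_small = a.astype(np.int8)
print("total bytes as int8: {}".format(a_small.nbytes))
```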

Array creation and initialization

In [85]:

a = np.array([1,2,3]) # a row vector (1-D array)
print("\nRow vector:\n", a, sep="")
a = np.array([[1],[2],[3]]) # a column vector (3x1)
print("\nColumn vector:\n", a, sep="")
a = np.empty((4,4)) # 4x4 without initialization (contents are arbitrary)
print("\n4x4 uninitialized matrix:\n", a, sep="")
a = np.zeros((4,4)) # 4x4 with zeros
print("\n4x4 matrix filled with zeros:\n", a, sep="")
a = np.ones((4,4)) # 4x4 with ones
print("\n4x4 matrix filled with ones\n", a, sep="")
a = np.eye(4) # 4x4 identity matrix
print("\n4x4 identity matrix\n", a, sep="")
a = np.random.rand(4,4) # 4x4 with pseudo-random values in [0, 1)
print("\n4x4 matrix filled with random values\n", a, sep="")
a = np.arange(16) # sequence vector 0-15
print("\na vector sequence\n", a, sep="")
a = np.reshape(a,(4,4)) # reshaped as a 4x4 matrix
print("\nnow as matrix\n", a, sep="")
a = np.arange(16).reshape(4,4) # a shorter way
print("\nsame matrix:\n", a, sep="")
a = np.linspace(1,10,19) # 19 linearly spaced values from 1 to 10
print("\na linspace vector:\n", a, sep="")
a = np.logspace(0,10,11,base=2.0) # 11 values from 2^0 to 2^10, evenly spaced on a logarithmic scale
print("\na logspace vector:\n", a, sep="")

Row vector:
[1 2 3]

Column vector:
[[1]
 [2]
 [3]]

4x4 uninitialized matrix:
[[4.9e-324 1.5e-323      nan 9.9e-324]
 [     nan 2.0e-323 2.5e-323      nan]
 [4.9e-324 4.9e-324 4.9e-324 4.9e-324]
 [9.9e-324 9.9e-324 9.9e-324 9.9e-324]]

4x4 matrix filled with zeros:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

4x4 matrix filled with ones
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

4x4 identity matrix
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

4x4 matrix filled with random values
[[0.32263067 0.6615064  0.7554921  0.12567354]
 [0.78957221 0.15694903 0.00148748 0.0875882 ]
 [0.60052177 0.14539078 0.27666647 0.16746284]
 [0.80331606 0.12192216 0.46427821 0.95304171]]

a vector sequence
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]

now as matrix
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

same matrix:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

a linspace vector:
[ 1.   1.5  2.   2.5  3.   3.5  4.   4.5  5.   5.5  6.   6.5  7.   7.5
  8.   8.5  9.   9.5 10. ]

a logspace vector:
[1.000e+00 2.000e+00 4.000e+00 8.000e+00 1.600e+01 3.200e+01 6.400e+01
 1.280e+02 2.560e+02 5.120e+02 1.024e+03]
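
The difference between a row vector and a column vector above is only a matter of shape, and `linspace` and `arange` differ in how they treat the end value. A short sketch of both points follows; `np.full` is an extra constructor not used in the cell above, included here as an assumption about what may be useful, and the variable names are illustrative.

```python
import numpy as np

row = np.array([1, 2, 3])          # 1-D array, shape (3,)
col = row.reshape(-1, 1)           # 2-D column, shape (3, 1); -1 infers the length

# linspace includes both endpoints; arange excludes the stop value.
lin = np.linspace(1, 10, 19)       # 19 points from 1 to 10, step 0.5
ar = np.arange(1, 10, 0.5)         # 18 points, stops before 10

# np.full creates an array initialized with an arbitrary constant.
sevens = np.full((2, 3), 7)

print("row shape: {}".format(row.shape))
print("col shape: {}".format(col.shape))
```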

Basic operations with arrays

NumPy lets you manipulate and operate on arrays as a whole, whereas in languages such as C++ or Java you typically loop over arrays element by element.

This is similar to working with matrices in MATLAB; NumPy also makes broadcasting and element-wise operations easy, as shown in the following code.

In [86]:

a = np.array([[1,3,-2],[-5,4,5]])
print("a=\n", a, sep="")
b = np.array([[1,1,1],[2,2,2]])

### Broadcast operations ###
print("\na+1=\n", a+1, sep="")
print("\na/2=\n", a/2.0, sep="")
print("\na**2=\n", a**2, sep="")
print("\nexp(a)=\n", np.exp(a), sep="")

### Some built-in operations ###
print("\nmean:\n", a.mean(), sep="")
print("\nstd deviation:\n", a.std(), sep="")
print("\nmax:\n", a.max(), sep="")
print("\nmin:\n", a.min(), sep="")
print("\nsum of elements:\n", a.sum(), sep="")
a.sort(axis=1) # sorts each row in place
print("\nsorted:\n", a, sep="")

a=
[[ 1  3 -2]
 [-5  4  5]]

a+1=
[[ 2  4 -1]
 [-4  5  6]]

a/2=
[[ 0.5  1.5 -1. ]
 [-2.5  2.   2.5]]

a**2=
[[ 1  9  4]
 [25 16 25]]

exp(a)=
[[2.71828183e+00 2.00855369e+01 1.35335283e-01]
 [6.73794700e-03 5.45981500e+01 1.48413159e+02]]

mean:
1.0

std deviation:
3.5118845842842465

max:
5

min:
-5

sum of elements:
6

sorted:
[[-2  1  3]
 [-5  4  5]]
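
The cell above only combines an array with a scalar. Broadcasting also applies between arrays of different but compatible shapes, and reductions such as `mean` and `sum` accept an `axis` argument. The sketch below illustrates this with the same `a` and `b`; the helper arrays `row` and `col` are hypothetical values chosen for the example.

```python
import numpy as np

a = np.array([[1, 3, -2], [-5, 4, 5]])   # shape (2, 3)
b = np.array([[1, 1, 1], [2, 2, 2]])     # shape (2, 3)

# Same shape: the operation is applied element by element.
elementwise_sum = a + b
elementwise_prod = a * b                 # not a matrix product

# Different shapes: the smaller array is stretched to match.
row = np.array([10, 20, 30])             # shape (3,)   -> added to every row
col = np.array([[100], [200]])           # shape (2, 1) -> added to every column
plus_row = a + row
plus_col = a + col

# Reductions accept an axis argument.
col_means = a.mean(axis=0)               # one mean per column
row_sums = a.sum(axis=1)                 # one sum per row
```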

Array indexing and element access

Elements or subsets of NumPy arrays are accessed using square brackets "[ ]". Unlike MATLAB, NumPy indices start at 0 and end at n-1, where n is the length of the array along that axis.

In [87]:

a = np.random.randint(0,9,size=(5,5)) # random integers from 0 to 8 (upper bound excluded)
print("a=\n", a, sep="")

### Accessing rows ###
print("\nall elements in 'a' second row:\n", a[1,:], sep="")
### Accessing columns ###
print("\nall elements in 'a' third column:\n", a[:,2], sep="")
### First element ###
print("\nFirst element:\n", a[0,0], sep="")
### Last element ###
print("\nLast element:\n", a[-1,-1], sep="")
### 3x3 subset ###
print("\nmiddle 3x3 subset:\n", a[1:4,1:4], sep="")
### Masking ###
print("\nelements greater than 5:\n", a[a>5], sep="")

a=
[[0 8 6 0 5]
 [5 5 6 2 5]
 [0 4 8 5 4]
 [1 0 1 3 6]
 [8 1 8 0 6]]

all elements in 'a' second row:
[5 5 6 2 5]

all elements in 'a' third column:
[6 6 8 1 8]

First element:
0

Last element:
6

middle 3x3 subset:
[[5 6 2]
 [4 8 5]
 [0 1 3]]

elements greater than 5:
[8 6 6 8 6 8 8 6]
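
One detail worth knowing when taking subsets: basic slices are views on the original data, while `copy()` gives an independent array. The sketch below shows the difference and also uses a boolean mask for in-place assignment; the arrays and variable names are illustrative, not taken from the cell above.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# Basic slicing returns a view: writing to it changes the original.
middle = a[1:4, 1:4]
middle[0, 0] = -1
changed_original = (a[1, 1] == -1)

# Use copy() to get an independent array.
b = a[1:4, 1:4].copy()
b[0, 0] = 99
original_untouched = (a[1, 1] == -1)

# A boolean mask can also be used to assign in place.
c = np.array([[0, 8, 6], [5, 5, 6]])
c[c > 5] = 0
```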

Modifying the size of an array

Arrays can be concatenated with other arrays of compatible size. You can also split one array into two, or change its shape while keeping the same number of elements.

In [88]:

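a = np.array([[1,3,-2,2],[-5,4,5,-1]])
print("a=\n", a, sep="")
b = np.array([[1,1,1,1],[2,2,2,2]])
print("\nb=\n", b, sep="")

# hstack joins the arrays side by side (axis=1); vstack stacks them vertically (axis=0)
print("\nConcatenate 'a','b' along rows: \n", np.hstack((a,b)), sep="")
print("\nConcatenate 'a','b' along columns: \n", np.vstack((a,b)), sep="")
# hsplit and vsplit return a list of sub-arrays, so each split is computed once
a_left, a_right = np.hsplit(a,2)
print("\nSplit 'a' along columns: \n", a_left, "\nand\n", a_right, sep="")
b_top, b_bottom = np.vsplit(b,2)
print("\nSplit 'b' along rows: \n", b_top, " and ", b_bottom, sep="")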

a=
[[ 1  3 -2  2]
 [-5  4  5 -1]]

b=
[[1 1 1 1]
 [2 2 2 2]]

Concatenate 'a','b' along rows: 
[[ 1  3 -2  2  1  1  1  1]
 [-5  4  5 -1  2  2  2  2]]

Concatenate 'a','b' along columns: 
[[ 1  3 -2  2]
 [-5  4  5 -1]
 [ 1  1  1  1]
 [ 2  2  2  2]]

Split 'a' along columns: 
[[ 1  3]
 [-5  4]] 
and
[[-2  2]
 [ 5 -1]]

Split 'b' along rows: 
[[1 1 1 1]] and [[2 2 2 2]]
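
The text above also mentions changing the shape while keeping the same number of elements, which the cell does not show. A minimal sketch follows, together with `np.concatenate`, which expresses both stacking directions through its `axis` argument; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 3, -2, 2], [-5, 4, 5, -1]])   # shape (2, 4)
b = np.array([[1, 1, 1, 1], [2, 2, 2, 2]])      # shape (2, 4)

side_by_side = np.concatenate((a, b), axis=1)   # same result as np.hstack, shape (2, 8)
stacked = np.concatenate((a, b), axis=0)        # same result as np.vstack, shape (4, 4)

# reshape keeps the 8 elements of 'a' and only changes how they are arranged.
a_4x2 = a.reshape(4, 2)
a_flat = a.reshape(-1)                          # 1-D, shape (8,)

# A shape with a different element count is rejected.
try:
    a.reshape(3, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```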

Linear Algebra operations

In [89]:

a = np.random.randint(0,9,size=(5,5)) # a random matrix can be singular; re-run the cell if inv() fails
print("a=\n", a, sep="")

a_t = a.T
print("\na transpose\n", a_t, sep="")

a_inv = np.linalg.inv(a)
print("\na inverse\n", a_inv, sep="")

mat_prod = np.matmul(a,a_inv)
print("\nmatrix product (a x a⁻¹ = identity)\n", np.round(mat_prod), sep="")

a_det = np.linalg.det(a)
print("\na determinant\n", a_det, sep="")

a=
[[4 5 3 0 6]
 [0 6 2 6 3]
 [0 7 3 0 3]
 [0 0 0 3 8]
 [2 1 0 0 0]]

a transpose
[[4 0 0 0 2]
 [5 6 7 0 1]
 [3 2 3 0 0]
 [0 6 0 3 0]
 [6 3 3 8 0]]

a inverse
[[ 0.13392857  0.02678571 -0.15178571 -0.05357143  0.23214286]
 [-0.26785714 -0.05357143  0.30357143  0.10714286  0.53571429]
 [ 0.64880952  0.19642857 -0.44642857 -0.39285714 -1.29761905]
 [ 0.06349206  0.19047619 -0.19047619 -0.04761905 -0.12698413]
 [-0.02380952 -0.07142857  0.07142857  0.14285714  0.04761905]]

matrix product (a x a⁻¹ = identity)
[[ 1.  0. -0.  0.  0.]
 [-0.  1. -0.  0.  0.]
 [-0.  0.  1. -0.  0.]
 [-0.  0.  0.  1.  0.]
 [ 0.  0. -0. -0.  1.]]

a determinant
1008.0000000000002
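
Rounding the product hides small floating-point errors. An alternative check, sketched below, compares the product with the identity using `np.allclose`. The 5x5 matrix is the one printed above, typed in by hand so the example is repeatable; the 2x2 `singular` matrix is a hypothetical example added to show what happens when no inverse exists.

```python
import numpy as np

# The 5x5 matrix printed above, entered by hand so the example is repeatable.
a = np.array([[4, 5, 3, 0, 6],
              [0, 6, 2, 6, 3],
              [0, 7, 3, 0, 3],
              [0, 0, 0, 3, 8],
              [2, 1, 0, 0, 0]])

a_inv = np.linalg.inv(a)

# Compare against the identity with a floating-point tolerance
# instead of rounding the product.
is_identity = np.allclose(np.matmul(a, a_inv), np.eye(5))

# A singular matrix has no inverse; NumPy raises LinAlgError.
singular = np.array([[1.0, 2.0], [2.0, 4.0]])   # second row is twice the first
try:
    np.linalg.inv(singular)
    inverse_exists = True
except np.linalg.LinAlgError:
    inverse_exists = False
```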

Example 1: Solving linear equations

Solve the following system of equations:

$$\begin{aligned} x_1 + 2x_2 + x_3 - x_4 &= 5 \\ 3x_1 + 2x_2 + 4x_3 + 4x_4 &= 16 \\ 4x_1 + 4x_2 + 3x_3 + 4x_4 &= 22 \\ 2x_1 + x_3 + 5x_4 &= 15 \end{aligned}$$

Which can be represented with the following matrices

$A = \begin{bmatrix}1 & 2 & 1 & -1\\ 3 & 2 & 4 & 4\\ 4 & 4 & 3 & 4\\ 2 & 0 & 1 & 5\end{bmatrix}$ $~~~~~~~$ $b = \begin{bmatrix}5\\ 16\\ 22\\ 15 \end{bmatrix}$

Where

$Ax = b$

$x = A^{-1}b$

In [ ]:

###YOUR CODE
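
One possible solution is sketched below, not the only way to do it. The coefficients are copied from the four equations as written (the fourth equation has coefficient 1 for $x_3$ and no $x_2$ term). It computes $x = A^{-1}b$ directly and also with `np.linalg.solve`, which avoids forming the inverse explicitly.

```python
import numpy as np

# Coefficients and right-hand side copied from the four equations above.
A = np.array([[1, 2, 1, -1],
              [3, 2, 4, 4],
              [4, 4, 3, 4],
              [2, 0, 1, 5]], dtype=float)
b = np.array([5, 16, 22, 15], dtype=float)

# Direct translation of x = A^{-1} b.
x_inv = np.matmul(np.linalg.inv(A), b)

# np.linalg.solve gives the same answer without forming the inverse.
x = np.linalg.solve(A, b)

# Substituting back should reproduce b up to rounding error.
residual = np.matmul(A, x) - b
print("x = {}".format(x))
```

Up to rounding, the result should be $x = (16, -6, -2, -3)$, which can be confirmed by substituting it into each equation.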

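Example 2: Linear regression through least-squares optimization

The least-squares estimate of the coefficients is given below, where $X$ is the design matrix and $y$ is the vector of observed values.

$\beta = (X^TX)^{-1}X^Ty$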

In [102]:

import matplotlib.pyplot as plt
x = np.linspace(0,10,100) 
y = 3.5*x + 5 + np.random.randn(100)
plt.plot(x,y,'.')
plt.show()

#YOUR CODE
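
A possible way to complete the exercise is sketched below. It assumes a design matrix made of a column of ones (for the intercept) and a column with `x` (for the slope), fixes the random seed so the run is repeatable, and leaves out the plotting calls. The names `X`, `beta`, `intercept`, `slope` and `y_fit` are illustrative choices.

```python
import numpy as np

np.random.seed(0)                        # fixed seed so the sketch is repeatable
x = np.linspace(0, 10, 100)
y = 3.5 * x + 5 + np.random.randn(100)   # same synthetic model as above

# Design matrix: a column of ones (intercept) next to x (slope).
X = np.column_stack((np.ones_like(x), x))

# Normal equations: beta = (X^T X)^{-1} X^T y
XtX = np.matmul(X.T, X)
Xty = np.matmul(X.T, y)
beta = np.matmul(np.linalg.inv(XtX), Xty)
intercept, slope = beta

# Cross-check with NumPy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

y_fit = np.matmul(X, beta)               # fitted line, e.g. for plt.plot(x, y_fit)
print("intercept = {:.3f}, slope = {:.3f}".format(intercept, slope))
```

The estimates should land close to the values used to generate the data (slope 3.5, intercept 5), with a small deviation caused by the added noise.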
