In the first part of the exercise we'll use a subset of the MNIST dataset to train a logistic regression classifier that can recognize handwritten digits. In the second part of the exercise we'll use an already trained neural network to recognize handwritten digits (the training itself is the subject of exercise 4).

Start by importing the libraries and routines that we'll use.

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.io
from scipy.special import expit
from scipy.optimize import minimize
import time
```

Let's load the data and display it. We also prepend a bias term (a column of ones) to the observation matrix X.

In [2]:

```
#Load data stored in matlab format
mat = scipy.io.loadmat('ex3data1.mat')
#One can check that mat is a dictionary with keys 'X' and 'y'
#y contains the class labels, class 0 is labeled as '10', lets relabel it to 0.
y[y==10] = 0
(m, n) = np.shape(X)
# Add intercept term to X
Xt = np.concatenate((np.ones((m, 1)), X), axis=1)
```

In [3]:

```
#Each row in X represents a 20x20 image of a digit
#Randomly select nrows rows of matrix X and display them as images
def display_data(X, nrows):
    idxs = np.random.randint(0, X.shape[0], nrows)
    for i in idxs:
        plt.imshow(X[i, :].reshape((20, 20), order='F'), cmap='Greys')
        plt.show()

display_data(X, 5)
```

The implementation of the cost function for regularized logistic regression is the same as in exercise 2. We reproduce it here.

In [4]:

```
#Implementation of the logistic regression cost function and gradient,
#and of its regularized version.
def sigmoid(z):
    return expit(z)

def lr_cost(theta, X, y):
    """ Logistic regression cost function and gradient.
    Denoting g(z) = 1/(1+exp(-z)),
        h = g(X theta)    (m,)
    In:
        theta (n+1,) float array
        X (m, n+1) float array
        y (m,) binary array
    Out:
        cost (float): 1/m (-y^T log(h) - (1-y)^T log(1-h))
        grad (n+1,): 1/m X^T (g(X theta) - y)
    """
    m = X.shape[0]
    h = sigmoid(np.dot(X, theta))
    cost = 1./float(m) * (-np.dot(y, np.log(h)) - np.dot((1.-y), np.log(1.-h)))
    grad = 1./float(m) * np.dot(X.T, h-y)
    return cost, grad

def lr_cost_reg(theta, X, y, lam):
    """ Regularized logistic regression cost function and gradient.
    Denoting g(z) = 1/(1+exp(-z)),
        h = g(X theta)    (m,)
        t = theta[1:n+1]
    In:
        theta (n+1,) float array
        X (m, n+1) float array
        y (m,) binary array
        lam (float): regularization parameter
    Out:
        cost (float): 1/m (-y^T log(h) - (1-y)^T log(1-h)) + lam/(2m) * t^T t
        grad (n+1,): 1/m X^T (g(X theta) - y), with lam/m * t added to entries 1..n
    """
    m = X.shape[0]  # m must be defined here, not taken from the enclosing scope
    cost, grad = lr_cost(theta, X, y)
    t = theta[1:]
    cost += lam/(2*m) * np.dot(t, t)
    grad[1:] += lam/m * t
    return cost, grad
```
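Before handing an analytic gradient to an optimizer, it is worth checking it against finite differences. The sketch below re-implements the regularized cost on a small random problem (the data, `eps`, and tolerance are illustrative choices, not part of the exercise):

```python
import numpy as np
from scipy.special import expit

def lr_cost_reg_check(theta, X, y, lam):
    # regularized logistic regression cost and gradient, as above
    m = X.shape[0]
    h = expit(X @ theta)
    cost = (-(y @ np.log(h)) - (1 - y) @ np.log(1 - h)) / m
    cost += lam / (2 * m) * (theta[1:] @ theta[1:])
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]
    return cost, grad

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])  # bias column + 3 features
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=4)
cost, grad = lr_cost_reg_check(theta, X, y, 1.0)

# central finite differences, one coordinate at a time
eps = 1e-6
num_grad = np.zeros_like(theta)
for j in range(len(theta)):
    d = np.zeros_like(theta)
    d[j] = eps
    num_grad[j] = (lr_cost_reg_check(theta + d, X, y, 1.0)[0]
                   - lr_cost_reg_check(theta - d, X, y, 1.0)[0]) / (2 * eps)

print(np.max(np.abs(grad - num_grad)))  # should be tiny (around 1e-9)
```

If the analytic and numerical gradients disagree by more than roughly the square root of machine precision, the implementation has a bug.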

For each of the 10 classes we train a binary classifier that predicts the probability that an image belongs to class k.

In [5]:

```
def train_one_vs_all(X, y, lam, num_labels, method='BFGS'):
    """ Trains a one-vs-all logistic regression classifier.
    In:
        X : (m, n) float array containing the m observations, one per row.
            It is assumed that the bias terms are included
            (the first column is full of ones).
        y : (m, 1) array of ints between 0 and num_labels-1; entry k is
            the label of the observation at row k of X.
        lam (float): positive regularization parameter.
        num_labels (int): number of labels (or classes).
        method (str): method used by minimize, e.g. 'BFGS' or 'CG'; see the
            documentation of scipy.optimize.minimize.
    Out:
        theta_opt : (n, num_labels) float array; column k contains
            the optimal weights of classifier k.
    """
    options = {'maxiter': 800, 'disp': True}
    (m, n) = X.shape
    theta_opt = np.zeros((n, num_labels))
    theta0 = np.zeros(n)
    #for each class
    for k in range(num_labels):
        yk = (y == k).astype(np.float64)
        res = minimize(lr_cost_reg, theta0, args=(X, yk.flatten(), lam),
                       method=method, jac=True, options=options)
        theta_opt[:, k] = res['x']
    return theta_opt

lam = 1.0
#time the training
t1 = time.time()
theta_opt = train_one_vs_all(Xt, y, lam, 10)
t2 = time.time()
print("time taken to train: %.2f seconds" % (t2 - t1))
```
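The key step in the loop above is building the binary target vector `yk` for each class: ones where the label equals k, zeros elsewhere. A minimal illustration with a hypothetical label vector (same `(m, 1)` shape as `y`):

```python
import numpy as np

# hypothetical labels for 5 observations, shaped (m, 1) like y
y = np.array([[0], [1], [2], [1], [0]])

# one-vs-all target for class k = 1: 1.0 where the label equals k, 0.0 elsewhere
yk = (y == 1).astype(np.float64).flatten()
print(yk)  # [0. 1. 0. 1. 0.]
```

Each of the 10 classifiers is then just an ordinary binary logistic regression trained against its own `yk`.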

Evaluate the accuracy on the training set and define a function that does the actual prediction.

In [6]:

```
def predict_class(X, theta, num_labels):
    m = X.shape[0]
    prob = np.zeros((m, num_labels))
    #for each class k
    for k in range(num_labels):
        #evaluate the probability that each image belongs to class k
        prob[:, k] = sigmoid(np.dot(X, theta[:, k]))
    #predict for each image the class with the highest probability
    prediction = prob.argmax(axis=1)
    return prediction

def eval_accuracy(X, y, theta, num_labels):
    yf = np.int64(y.flatten())
    prediction = predict_class(X, theta, num_labels)
    accuracy = np.double(prediction == yf).mean() * 100
    return accuracy

acc = eval_accuracy(Xt, y, theta_opt, 10)
print("Accuracy on training set = %.2f percent" % acc)
```

In [7]:

```
#Convenience function to display an image from the dataset, its label and the predicted class.
def classify_img(Xt, y, theta, row):
    img = np.array(Xt[row, 1:])
    plt.imshow(img.reshape((20, 20), order='F'), cmap='Greys')
    plt.title('Digit at row %d' % row)
    print("Image at row %d is labeled as a %d:" % (row, y[row]))
    #slice as row:row+1 so predict_class receives a 2D array; use the theta argument
    pred = predict_class(Xt[row:row+1, :], theta, 10)
    print("Predicted class by logistic regression: %d" % pred[0])

classify_img(Xt, y, theta_opt, row=3500)
```

The neural network that performs multiclass classification of the handwritten digits has already been trained; all we need to do is load the weights and code up the prediction function.

In [8]:

```
#Neural network parameters
input_layer_size = 400 # 20x20 Input Images of Digits
hidden_layer_size = 25 # 25 hidden units
num_labels = 10
#Load weights
nn_weights = scipy.io.loadmat('ex3weights.mat')
```

In [9]:

```
theta1 = nn_weights['Theta1']
theta2 = nn_weights['Theta2']
print(theta1.shape)
print(theta2.shape)
```

In [11]:

```
def nn_predict(theta1, theta2, X):
    """ Forward propagation through the two-layer network.
    X must include the bias term; it is either a single example of shape
    (n+1,) or a matrix with one example per column, of shape (n+1, m). """
    a1 = sigmoid(np.dot(theta1, X))
    #prepend the bias to the hidden activations before the output layer
    bias = np.ones(1)
    if X.ndim == 2:
        bias = np.ones((1, a1.shape[1]))
    a1biased = np.concatenate((bias, a1), axis=0)
    a2 = sigmoid(np.dot(theta2, a1biased))
    return a2
```
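A quick way to convince yourself the shapes line up is to run the same forward pass with random weights of the documented sizes (theta1 is 25x401, theta2 is 10x26; the 7 example columns and the seed below are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
theta1 = rng.normal(size=(25, 401))  # hidden layer: 25 units, 400 pixels + bias
theta2 = rng.normal(size=(10, 26))   # output layer: 10 classes, 25 hidden units + bias
X = rng.normal(size=(401, 7))        # 7 examples as columns, bias row included

a1 = sigmoid(theta1 @ X)                              # (25, 7) hidden activations
a1b = np.vstack([np.ones((1, a1.shape[1])), a1])      # prepend bias row -> (26, 7)
a2 = sigmoid(theta2 @ a1b)                            # (10, 7) class scores per column
print(a2.shape)  # (10, 7)
```

Each column of the output holds the 10 class scores for one example, which is why the prediction later takes `argmax` along axis 0.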

In [17]:

```
#Example use of the neural network prediction and comparison with logistic regression
classify_img(Xt, y, theta_opt, row=0)
pred = nn_predict(theta1, theta2, Xt[0, :])
print("Output of Neural Network Prediction:", pred.argmax() + 1)
# note that the network predicts the 10th class, which encodes the digit 0.
```

In [18]:

```
# Evaluate accuracy
mat = scipy.io.loadmat('ex3data1.mat')
#The original vector of labels is indexed between 1 and 10 where 10 represents the digit 0
y = mat['y'][:,0]
pred = nn_predict(theta1, theta2, Xt.T)
```

In [20]:

```
#Check that the dimensions are correct, 1 column per datum
print(pred.shape)
```

In [21]:

```
idxs = np.argmax(pred, axis=0)
idxs += 1  #so that the range is 1..10, matching the original labels
print("Predicted Classes : ", idxs)
print("Ground Truth :", y)
nn_acc = np.float64(idxs == y).mean() * 100
print("Accuracy of Neural Network Multiclass classification = %.2f percent" % nn_acc)
```
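The two parts of the exercise use different label conventions: logistic regression was trained on digits 0-9, while the pretrained network uses classes 1-10 with 10 encoding the digit 0. If one wanted both on the 0-9 scale, a modulo maps between them (the sample argmax indices below are made up for illustration):

```python
import numpy as np

# hypothetical argmax indices (axis-0 positions, range 0..9) from the network output
raw = np.array([9, 0, 4])

# position j corresponds to class j+1; class 10 encodes the digit 0,
# so (j + 1) % 10 yields the actual digit
digits = (raw + 1) % 10
print(digits)  # [0 1 5]
```

With this mapping the network's predictions can be compared directly against the relabeled `y` used in the first part.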