# Exercise 3: Multiclass Classification and Neural Networks

In the first part of the exercise we'll use a subset of the MNIST dataset to train a logistic regression classifier that recognizes handwritten digits. In the second part we'll use an already trained neural network to recognize handwritten digits (the training itself is the subject of exercise 4).

## Part 1: Multiclass Classification using Logistic Regression

Start by importing the libraries and routines that we'll use.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.io
from scipy.special import expit
from scipy.optimize import minimize
import time


Let's load the data and display it. We also prepend the bias term (a column of ones) to the observation matrix X.

In [2]:
#Load data stored in MATLAB format
#(filename assumed to follow the course convention for exercise 3)
mat = scipy.io.loadmat('ex3data1.mat')
#One can check that mat is a dictionary with keys 'X' and 'y'
X, y = mat['X'], mat['y']
#y contains the class labels; class 0 is labeled as '10', so let's relabel it to 0.
y[y == 10] = 0
(m, n) = np.shape(X)
# Add intercept term to X
Xt = np.concatenate((np.ones((m, 1)), X), axis=1)

In [3]:
#Each row in X represents a 20x20 image of a digit
#Randomly select nrows rows of X and display them as images
def display_data(X, nrows):
    idxs = np.random.randint(0, X.shape[0], nrows)
    for i in idxs:
        plt.imshow(X[i, :].reshape((20, 20), order='F'), cmap='Greys')
        plt.show()
display_data(X, 5)


The implementation of the cost function for regularized logistic regression is the same as in exercise 2. We reproduce it here.

In [4]:
#Implementation of the logistic regression cost function and gradient,
#and of their regularized versions.
def sigmoid(z):
    return expit(z)

def lr_cost(theta, X, y):
    """ Logistic Regression cost function and gradient.
    Denoting g(z) = 1/(1+exp(-z))
    h = g(X theta) (m,)
    In:
    theta (n+1,) float array
    X (m, n+1) float array
    y (m,) binary array
    Out:
    cost (float): 1/m (-y^T log(h) - (1-y)^T log(1-h))
    grad (n+1,): 1/m X^T(g(X theta)-y)
    """
    m = X.shape[0]
    h = sigmoid(np.dot(X, theta))
    cost = 1./m * (-np.dot(y, np.log(h)) - np.dot(1.-y, np.log(1.-h)))
    grad = 1./m * np.dot(X.T, h - y)
    return cost, grad

def lr_cost_reg(theta, X, y, lam):
    """ Regularized Logistic Regression cost function and gradient.
    Denoting g(z) = 1/(1+exp(-z))
    h = g(X theta) (m,)
    t = theta[1:], i.e. theta without the bias term (the bias is not regularized)
    In:
    theta (n+1,) float array
    X (m, n+1) float array
    y (m,) binary array
    lam : scalar, regularization parameter
    Out:
    cost (float): 1/m (-y^T log(h) - (1-y)^T log(1-h)) + lam/(2m) * t^T t
    grad (n+1,): 1/m X^T(g(X theta)-y) + lam/m * [0; t]
    """
    m = X.shape[0]
    cost, grad = lr_cost(theta, X, y)
    t = theta[1:]
    cost += lam/(2*m) * np.dot(t, t)
    grad[1:] += lam/m * t
    return cost, grad

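Before training, it is worth sanity-checking the analytic gradient against a central-difference numerical gradient. The sketch below restates the same cost and gradient formulas in a standalone helper (a hypothetical `lr_cost_reg_check`, not part of the exercise code) and compares them on small random data:

```python
import numpy as np
from scipy.special import expit

def lr_cost_reg_check(theta, X, y, lam):
    # same formulas as lr_cost_reg, restated for a self-contained check;
    # the bias entry theta[0] is not regularized
    m = X.shape[0]
    h = expit(X @ theta)
    cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    t = theta.copy()
    t[0] = 0.0
    cost += lam / (2 * m) * (t @ t)
    grad += lam / m * t
    return cost, grad

rng = np.random.default_rng(0)
X = np.concatenate((np.ones((5, 1)), rng.normal(size=(5, 3))), axis=1)
y = rng.integers(0, 2, size=5).astype(float)
theta = rng.normal(size=4)
cost, grad = lr_cost_reg_check(theta, X, y, 1.0)

# central-difference approximation of the gradient
eps = 1e-6
num_grad = np.zeros_like(theta)
for j in range(len(theta)):
    e = np.zeros_like(theta)
    e[j] = eps
    num_grad[j] = (lr_cost_reg_check(theta + e, X, y, 1.0)[0]
                   - lr_cost_reg_check(theta - e, X, y, 1.0)[0]) / (2 * eps)
print(np.max(np.abs(grad - num_grad)))  # should be tiny, on the order of 1e-9
```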

### Training a one-vs-all classifier

For each of the 10 classes we train a binary classifier that predicts the probability that an image belongs to class k.
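Concretely, the binary target vector for classifier k is just the indicator `y == k`. A minimal sketch with hypothetical labels:

```python
import numpy as np

# hypothetical labels for 6 images and 3 classes
y = np.array([0, 2, 1, 0, 2, 2])
# binary target for the "class 2 vs rest" classifier: 1 where y is 2, else 0
y2 = (y == 2).astype(np.float64)
print(y2)  # [0. 1. 0. 0. 1. 1.]
```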

In [5]:
def train_one_vs_all(X, y, lam, num_labels, method='BFGS'):
    """ Trains a one-vs-all logistic regression classifier.
    In:
    X : (m, n) array of floats containing the m observations, one per row.
        It is assumed that the bias terms are included
        (the first column is full of ones).
    y : (m, 1) array of ints between 0 and num_labels-1; entry k is
        the label of the observation at row k of X.
    lam (float): positive, regularization parameter.
    num_labels (int): number of labels (or classes).
    method (str): method to be used by minimize, e.g. BFGS or CG, see the
        documentation of scipy.optimize.minimize.
    Out:
    theta_opt : (n, num_labels) array of floats, column k contains
        the optimal weights of classifier k.
    """
    options = {'maxiter': 800, 'disp': True}
    (m, n) = X.shape
    theta_opt = np.zeros((n, num_labels))
    theta0 = np.zeros(n)
    #for each class
    for k in range(num_labels):
        yk = (y == k).astype(np.float64)
        res = minimize(lr_cost_reg, theta0, args=(X, yk.flatten(), lam),
                       method=method, jac=True, options=options)
        theta_opt[:, k] = res['x']
    return theta_opt

lam = 1.0
#time it
t1 = time.time()
theta_opt = train_one_vs_all(Xt, y, lam, 10)
t2 = time.time()
print("time taken to train: %.2f seconds" % (t2 - t1))

Optimization terminated successfully.
Current function value: 0.020149
Iterations: 179
Function evaluations: 180
Optimization terminated successfully.
Current function value: 0.026961
Iterations: 164
Function evaluations: 165
Optimization terminated successfully.
Current function value: 0.068444
Iterations: 208
Function evaluations: 209
Optimization terminated successfully.
Current function value: 0.071707
Iterations: 208
Function evaluations: 209
Optimization terminated successfully.
Current function value: 0.052051
Iterations: 203
Function evaluations: 204
Optimization terminated successfully.
Current function value: 0.076859
Iterations: 221
Function evaluations: 222
Optimization terminated successfully.
Current function value: 0.034787
Iterations: 187
Function evaluations: 188
Optimization terminated successfully.
Current function value: 0.046726
Iterations: 191
Function evaluations: 192
Optimization terminated successfully.
Current function value: 0.092747
Iterations: 228
Function evaluations: 229
Optimization terminated successfully.
Current function value: 0.089312
Iterations: 218
Function evaluations: 219
time taken to train: 18.01 seconds


Evaluate the accuracy on the training set and define a function that does the actual prediction.

In [6]:
def predict_class(X, theta, num_labels):
    m = X.shape[0]
    prob = np.zeros((m, num_labels))
    #for each class k
    for k in range(num_labels):
        #evaluate the probability that each image is in class k
        prob[:, k] = sigmoid(np.dot(X, theta[:, k]))
    #predict for each image the class with the highest probability
    prediction = prob.argmax(axis=1)
    return prediction

def eval_accuracy(X, y, theta, num_labels):
    yf = np.int64(y.flatten())
    prediction = predict_class(X, theta, num_labels)
    accuracy = np.double(prediction == yf).mean() * 100
    return accuracy

acc = eval_accuracy(Xt, y, theta_opt, 10)
print("Accuracy on Training set = %.2f percent" % acc)

Accuracy on Training set = 94.46 percent
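A single overall accuracy can hide per-class differences; a confusion matrix shows which digits are confused with which. Below is a minimal sketch of such a matrix on hypothetical labels (the helper `confusion_matrix` and the toy data are illustrative, not part of the exercise):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_labels):
    # rows index the true class, columns the predicted class
    cm = np.zeros((num_labels, num_labels), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
# diagonal entries are correct predictions; overall accuracy is trace/total
print(np.trace(cm) / cm.sum())
```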

In [7]:
#Convenience function to display an image in the dataset, its label and the predicted class.
def classify_img(Xt, y, theta, row):
    img = np.array(Xt[row, 1:])
    plt.imshow(img.reshape((20, 20), order='F'), cmap='Greys')
    plt.title('Digit at row %d' % row)
    print("Image at row %d is labeled as a %d:" % (row, y[row]))
    #keep the row 2-D so predict_class sees a (1, n) array rather than a 1-D vector
    pred = predict_class(Xt[row:row+1, :], theta, 10)
    print("Predicted class by Logistic regression: %d" % pred[0])
classify_img(Xt, y, theta_opt, row=3500)

Image at row 3500 is labeled as a 7:
Predicted class by Logistic regression: 7


# Part 2: Neural Networks

The network that performs multiclass classification of the handwritten digits has already been trained; all we need to do is load its weights and code up the prediction function.

In [8]:
#Neural network parameters
input_layer_size  = 400  # 20x20 Input Images of Digits
hidden_layer_size = 25   # 25 hidden units
num_labels = 10

In [9]:
#Load the pretrained weights stored in MATLAB format
#(filename assumed to follow the course convention for exercise 3)
nn_weights = scipy.io.loadmat('ex3weights.mat')
theta1 = nn_weights['Theta1']
theta2 = nn_weights['Theta2']
print(theta1.shape)
print(theta2.shape)

(25, 401)
(10, 26)

In [11]:
def nn_predict(theta1, theta2, X):
    #forward pass; X is assumed to already include the input-layer bias entry
    a1 = sigmoid(np.dot(theta1, X))
    bias = np.ones(1)
    if X.ndim == 2:
        bias = np.ones((1, a1.shape[1]))
    a1biased = np.concatenate((bias, a1), axis=0)
    a2 = sigmoid(np.dot(theta2, a1biased))
    return a2
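The function handles both a single image (a 1-D input vector) and a batch with one image per column. A quick shape check with random weights of the same dimensions as in the exercise (the weights here are placeholders, not the trained ones):

```python
import numpy as np
from scipy.special import expit as sigmoid

def nn_predict(theta1, theta2, X):
    # forward pass; X already carries the input-layer bias entry
    a1 = sigmoid(np.dot(theta1, X))           # hidden-layer activations
    bias = np.ones(1)
    if X.ndim == 2:
        bias = np.ones((1, a1.shape[1]))      # one bias per column (per image)
    a1biased = np.concatenate((bias, a1), axis=0)
    return sigmoid(np.dot(theta2, a1biased))  # output-layer activations

rng = np.random.default_rng(0)
t1 = rng.normal(size=(25, 401))      # hidden-layer weights, shapes as in the exercise
t2 = rng.normal(size=(10, 26))       # output-layer weights
x_single = rng.normal(size=401)      # one image, bias entry included
X_batch = rng.normal(size=(401, 3))  # three images as columns

print(nn_predict(t1, t2, x_single).shape)  # (10,)
print(nn_predict(t1, t2, X_batch).shape)   # (10, 3)
```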

In [17]:
#Example use of the neural network prediction and comparison with logistic regression
classify_img(Xt, y, theta_opt, row=0)
pred = nn_predict(theta1, theta2, Xt[0, :])
print("Output of Neural Network Prediction:", pred.argmax() + 1)
#note that the network predicts class 10, which stands for the digit 0.

Image at row 0 is labeled as a 0:
Predicted class by Logistic regression: 0
Output of Neural Network Prediction: 10
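To compare the network's 1..10 class indices with the 0..9 labels used in Part 1, the indices can be remapped with a modulo, since class 10 stands for the digit 0:

```python
import numpy as np

# network output classes are 1..10, where 10 stands for the digit 0
nn_classes = np.array([10, 3, 10, 7, 1])
digits = nn_classes % 10  # maps 10 -> 0 and leaves 1..9 unchanged
print(digits)  # [0 3 0 7 1]
```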

In [18]:
# Evaluate accuracy
#The original vector of labels is indexed between 1 and 10 where 10 represents the digit 0
y = mat['y'][:,0]
pred = nn_predict(theta1, theta2, Xt.T)

In [20]:
#Check that the dimensions are correct, 1 column per datum
print(pred.shape)

(10, 5000)

In [21]:
idxs = np.argmax(pred, axis=0)
idxs += 1  #so that the range is 1..10, matching the original labels
print("Predicted Classes : ", idxs)
print("Ground Truth :", y)
nn_acc = np.float64(idxs == y).mean() * 100
print("Accuracy of Neural Network Multiclass classification = %.2f percent" % nn_acc)

Predicted Classes :  [10 10 10 ...,  9  9  9]
Ground Truth : [10 10 10 ...,  9  9  9]
Accuracy of Neural Network Multiclass classification = 97.52 percent