MNIST digit recognition with LeNet

In this practical session we will build a convolutional neural network that is able to recognise the digits 0-9 in images using TensorFlow and Keras.

You can run the code in a cell by selecting the cell and pressing Shift+Enter.

1) Import statements

First, import some of the packages we will need (run the cell below).

Documentation for each of these packages can be found online:
For numpy: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
For matplotlib: http://matplotlib.org/api/pyplot_api.html
For Keras: https://keras.io/
For random: https://docs.python.org/2/library/random.html

In [1]:
import pickle
import gzip
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow as tf
import keras

import time
import random
random.seed(0)
Using TensorFlow backend.

2) Loading the data

Download the data from: http://deeplearning.net/data/mnist/mnist.pkl.gz and save it somewhere on your disc. The function below loads the data from the location where you have saved it (path) and stores it in numpy arrays. The data is already split into a training set, a validation set and a test set. Each of these three sets is stored in two separate variables, one containing the labels and one containing the images. The labels are 1-dimensional arrays of numbers between 0 and 9. The images are 4-dimensional arrays with one entry per label and shape (samples, height, width, channels): the image dimensions are the second and third dimensions, and the last dimension is the single grey-value channel. The function also pads the 28 x 28 images with zeros to 32 x 32, the input size of the network built in 5).

Change the path in the second cell below to the location where you have saved it and run the two cells.

In [2]:
def loadMNIST(path):
    f = gzip.open(path, 'rb')
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    f.close()
    
    train_set_labels = train_set[1]
    train_set_images = np.resize(train_set[0],(len(train_set_labels),28,28,1))
    train_set_images = np.pad(train_set_images,((0,0),(2,2),(2,2),(0,0)),'constant', constant_values=0)
   
    valid_set_labels = valid_set[1]
    valid_set_images = np.resize(valid_set[0],(len(valid_set_labels),28,28,1))
    valid_set_images = np.pad(valid_set_images,((0,0),(2,2),(2,2),(0,0)),'constant', constant_values=0)

    test_set_labels = test_set[1]
    test_set_images = np.resize(test_set[0],(len(test_set_labels),28,28,1))
    test_set_images = np.pad(test_set_images,((0,0),(2,2),(2,2),(0,0)),'constant', constant_values=0)
    
    return train_set_labels, train_set_images, valid_set_labels, valid_set_images, test_set_labels, test_set_images
In [6]:
train_set_labels, train_set_images, valid_set_labels, valid_set_images, test_set_labels, test_set_images = loadMNIST(r'/Users/mitko/mnist.pkl.gz')
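
The reshaping and padding done in loadMNIST can be checked on a small synthetic array without downloading anything. The sketch below is only an illustration: flat_images is a made-up stand-in for the flat image array in the pickle file, and it uses reshape, which should behave the same as np.resize here because the number of elements does not change.

```python
import numpy as np

# Synthetic stand-in for the flat image array stored in mnist.pkl.gz:
# 3 "images" of 784 values each (made-up data, not real MNIST).
flat_images = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

# Turn each row of 784 values into a 28 x 28 image with 1 channel.
images = flat_images.reshape(len(flat_images), 28, 28, 1)

# Add 2 zero pixels on every side of the two spatial axes only.
# The sample axis and the channel axis get no padding.
padded = np.pad(images, ((0, 0), (2, 2), (2, 2), (0, 0)),
                'constant', constant_values=0)

print(images.shape)   # (3, 28, 28, 1)
print(padded.shape)   # (3, 32, 32, 1)
```

The original top-left pixel of each image ends up at position [2, 2] of the padded image.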

3) Visualising the data

Let's look at the data we've just loaded!

How many samples are in each set? (Use .shape to see the dimensions)

How large are the images?

How many samples are there for each of the 10 digits?

Show some of the images with plt.imshow (use cmap='gray_r' for black digits on a white background and interpolation='none' to see the real pixels). You can access one of the training images as: train_set_images[i,:,:,0].
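
As a hint for the questions above, here is one possible way to inspect array shapes and count the samples per digit. It is a sketch on made-up data: labels and images are random stand-ins for train_set_labels and train_set_images, so the counts it prints say nothing about MNIST itself.

```python
import numpy as np

# Random stand-ins for train_set_labels and train_set_images.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=200)
images = rng.random((200, 32, 32, 1))

print(images.shape)   # (samples, height, width, channels)
print(labels.shape)   # (samples,)

# np.bincount counts how often each non-negative integer occurs.
# minlength=10 guarantees one entry per digit, even for an absent digit.
counts = np.bincount(labels, minlength=10)
for digit, count in enumerate(counts):
    print('digit {}: {} samples'.format(digit, count))

# One image as a 2-D array, the form that plt.imshow expects.
first_image = images[0, :, :, 0]
print(first_image.shape)   # (32, 32)
```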

In [ ]:
 

4) One-hot encoding

Convert the labels from a number between 0 and 9 to 'one-hot encoding'. This means that for a label with number 3, there should be a 1 at element 3 and 0 everywhere else, i.e. [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. These are our target nodes: the node at position 3 should be active when the input image shows a 3. The code below does this for the training labels.

Do the same for the validation labels!

In [7]:
train_set_labels_output = np.zeros((len(train_set_labels),10),dtype=np.int16)        
for n in range(10):
    train_set_labels_output[:,n] = train_set_labels==n

print(train_set_labels[:10])
print(train_set_labels_output[:10])
[5 0 4 1 9 2 1 3 1 4]
[[0 0 0 0 0 1 0 0 0 0]
 [1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]
 [0 0 1 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]]
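
The loop above can also be written without a loop. The sketch below shows one alternative based on indexing the identity matrix and checks that it gives the same result as the loop for the ten labels printed above. The names one_hot_loop and one_hot_eye are only illustrative.

```python
import numpy as np

# The first ten training labels, as printed above.
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])

# Loop version, as in the cell above.
one_hot_loop = np.zeros((len(labels), 10), dtype=np.int16)
for n in range(10):
    one_hot_loop[:, n] = labels == n

# Vectorised version: row k of the 10 x 10 identity matrix is the
# one-hot vector for digit k, so indexing with the labels picks the rows.
one_hot_eye = np.eye(10, dtype=np.int16)[labels]

print(np.array_equal(one_hot_loop, one_hot_eye))   # True
```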

5) Building the network

The code below builds the LeNet network that we looked at in the lecture. Using the Sequential model from Keras, new layers can be added using .add. The print statements show the output dimensions after each layer.

Can you recognise all the elements of the network from the lecture?

In [8]:
cnn = keras.models.Sequential()

layer0 = keras.layers.Conv2D(6, (5, 5), activation='relu', input_shape=(32, 32, 1))
cnn.add(layer0)
print(layer0.input_shape)
print(layer0.output_shape)

layer1 = keras.layers.MaxPooling2D(pool_size=(2, 2))
cnn.add(layer1)
print(layer1.output_shape)

layer2 = keras.layers.Conv2D(16, (5, 5), activation='relu')
cnn.add(layer2)
print(layer2.output_shape)

layer3 = keras.layers.MaxPooling2D(pool_size=(2, 2))
cnn.add(layer3)
print(layer3.output_shape)

layer4 = keras.layers.Flatten() 
cnn.add(layer4)
print(layer4.output_shape)

layer5 = keras.layers.Dense(120, activation='relu')
cnn.add(layer5)
print(layer5.output_shape)

layer6 = keras.layers.Dense(84, activation='relu')
cnn.add(layer6)
print(layer6.output_shape)

layer7 = keras.layers.Dense(10, activation='softmax')
cnn.add(layer7)
print(layer7.output_shape)
(None, 32, 32, 1)
(None, 28, 28, 6)
(None, 14, 14, 6)
(None, 10, 10, 16)
(None, 5, 5, 16)
(None, 400)
(None, 120)
(None, 84)
(None, 10)
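
The printed shapes can be reproduced by hand. The sketch below assumes what the layers above use by default, namely convolutions without padding and with stride 1, and pooling windows that do not overlap. The helper names are made up for this illustration. It also counts the trainable parameters, weights plus biases, of each layer.

```python
def conv_output_size(size, kernel):
    """Spatial size after a convolution without padding and with stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool):
    """Spatial size after pooling with non-overlapping windows."""
    return size // pool


size = 32
size = conv_output_size(size, 5)   # 28
size = pool_output_size(size, 2)   # 14
size = conv_output_size(size, 5)   # 10
size = pool_output_size(size, 2)   # 5
flattened = size * size * 16       # 5 * 5 * 16 = 400

# Parameters per layer: weights plus one bias per filter or node.
params = [
    5 * 5 * 1 * 6 + 6,      # first convolution
    5 * 5 * 6 * 16 + 16,    # second convolution
    flattened * 120 + 120,  # first dense layer
    120 * 84 + 84,          # second dense layer
    84 * 10 + 10,           # output layer
]
total_params = sum(params)
print(size, flattened, total_params)   # 5 400 61706
```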

6) Define optimiser and loss function

We will use stochastic gradient descent with momentum as the optimiser and negative log likelihood (called categorical cross-entropy in Keras) as the loss function.

In [9]:
sgd = keras.optimizers.SGD(lr=0.001, momentum=0.9)
cnn.compile(loss='categorical_crossentropy', optimizer=sgd)
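
To make these two choices concrete, here is a small numpy sketch of what they compute. It is an illustration and not the Keras implementation: the function names are made up, the loss is averaged over the batch, and the update is the plain (non-Nesterov) momentum rule with the same lr and momentum values as above.

```python
import numpy as np


def categorical_crossentropy(targets, probs, eps=1e-7):
    """Mean negative log likelihood of the target class (one-hot targets)."""
    probs = np.clip(probs, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(targets * np.log(probs), axis=1)))


def sgd_momentum_step(weights, velocity, grad, lr=0.001, momentum=0.9):
    """One update: the velocity is a decaying sum of past gradient steps."""
    velocity = momentum * velocity - lr * grad
    return weights + velocity, velocity


# Two samples, three classes: only the probability of the target class counts.
targets = np.array([[1, 0, 0], [0, 1, 0]])
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
loss = categorical_crossentropy(targets, probs)
print(loss)   # equals (-log(0.7) - log(0.8)) / 2

# Two updates with the same gradient: the second step is larger
# because the velocity from the first step is carried over.
w = np.array([1.0, -2.0])
v = np.zeros(2)
grad = np.array([0.5, -0.5])
w, v = sgd_momentum_step(w, v, grad)
first_step = v.copy()
w, v = sgd_momentum_step(w, v, grad)
print(first_step, v)
```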

7) Training the network

Do the training in random batches of a specific number of samples (we set the values below to 250 batches of 100 samples).

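Use random.sample(a,n) to draw n distinct random elements from the list a. Here a is the list of sample indices (trainingsamples or validsamples), and the drawn indices can be used to select the corresponding images and labels.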

Next, use the cnn.train_on_batch(X,Y) function to perform an update of the network based on a random batch of training images X and training labels Y. Remember to use the one-hot encoding of the training labels!

The cnn.train_on_batch function returns the loss. Save the loss of each training batch in the variable 'losslist' so we can look at them later (you can use .append() to add the current loss to the list).

Also keep track of the loss for random batches from the validation set to check whether your network is overfitting on the training set. You can use cnn.test_on_batch(X,Y) to compute the loss on the validation set (without doing an update).

Remember that if you restart the training process from the beginning you also need to reinitialise the network by running the cells starting from 5) again.

In [ ]:
trainingsamples = list(range(len(train_set_labels))) #numbers from 0 until the number of samples
validsamples = list(range(len(valid_set_labels)))

minibatches = 250
minibatchsize = 100 

losslist = []
validlosslist = []

t0 = time.time()

for i in range(minibatches):
    # Select random training and validation samples and perform the training and validation steps here.
    pass  # placeholder so that the cell runs; replace it with your own code

t1 = time.time()
print('Training time: {} seconds'.format(t1-t0))
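
If you get stuck, the structure of the loop can be sketched as follows. To keep the example runnable without Keras or the MNIST file, it uses made-up pieces: TinySoftmaxModel is a small numpy stand-in for cnn that offers the same two calls, and make_set generates synthetic images. With the real network you would call cnn.train_on_batch and cnn.test_on_batch on the MNIST arrays instead.

```python
import random
import numpy as np


class TinySoftmaxModel:
    """Made-up stand-in for cnn: plain softmax regression that offers
    train_on_batch(X, Y) and test_on_batch(X, Y), both returning the loss."""

    def __init__(self, n_features, n_classes, lr=0.5):
        self.W = np.zeros((n_features, n_classes))
        self.lr = lr

    def _probs(self, flat):
        logits = flat @ self.W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        e = np.exp(logits)
        return e / e.sum(axis=1, keepdims=True)

    @staticmethod
    def _loss(p, Y):
        return float(-np.mean(np.sum(Y * np.log(p + 1e-7), axis=1)))

    def test_on_batch(self, X, Y):
        return self._loss(self._probs(X.reshape(len(X), -1)), Y)

    def train_on_batch(self, X, Y):
        flat = X.reshape(len(X), -1)
        p = self._probs(flat)
        self.W -= self.lr * flat.T @ (p - Y) / len(X)   # one gradient step
        return self._loss(p, Y)


def make_set(rng, n):
    """Synthetic 10 x 10 'images' whose label is marked by one bright pixel."""
    labels = rng.integers(0, 10, size=n)
    images = 0.1 * rng.random((n, 10, 10, 1))
    images[np.arange(n), 0, labels, 0] += 1.0
    return labels, images, np.eye(10, dtype=np.int16)[labels]


random.seed(0)
rng = np.random.default_rng(0)
train_labels, train_images, train_onehot = make_set(rng, 1000)
valid_labels, valid_images, valid_onehot = make_set(rng, 200)
model = TinySoftmaxModel(n_features=100, n_classes=10)

trainingsamples = list(range(len(train_labels)))
validsamples = list(range(len(valid_labels)))
minibatches = 50
minibatchsize = 20
losslist = []
validlosslist = []

for i in range(minibatches):
    # random.sample draws distinct indices; the list then selects those rows.
    batch = random.sample(trainingsamples, minibatchsize)
    losslist.append(model.train_on_batch(train_images[batch], train_onehot[batch]))

    # The validation batch only measures the loss and does not update the model.
    validbatch = random.sample(validsamples, minibatchsize)
    validlosslist.append(model.test_on_batch(valid_images[validbatch], valid_onehot[validbatch]))

print(losslist[0], losslist[-1])
```

Note that the images and the one-hot labels are indexed with the same list of indices, so every image stays paired with its own label.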

8) Loss curves

Plot the loss curves for the training and validation sets (use plt.plot(losslist) for the training loss).

Is 250 batches enough to train the network? How many do we need?

What happens if you change the learning rate in 6)?

What happens if you change the minibatchsize?

What happens if you use another optimiser?

Try to get the loss as low as possible!

What happens if you make changes to the network? For example, use more or fewer filters or nodes, remove a layer, etc.
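
The loss of individual minibatches is noisy, which can make the curves hard to compare. One option is to plot a moving average next to the raw values. The sketch below uses a made-up noisy curve as a stand-in for losslist, and moving_average is an illustrative helper, not part of any library.

```python
import numpy as np


def moving_average(values, window):
    """Mean over a sliding window; returns len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')


# Made-up noisy loss values standing in for losslist.
rng = np.random.default_rng(0)
steps = np.arange(250)
noisy_loss = 2.3 * np.exp(-steps / 80.0) + 0.1 * rng.standard_normal(250)

smooth_loss = moving_average(noisy_loss, window=25)
print(len(noisy_loss), len(smooth_loss))   # 250 226
```

The smoothed curve is shorter than the original by window - 1 points, so it starts later on the x-axis if both are plotted against the batch number.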

In [ ]:
 

9) Evaluation on the test set

Evaluate the network on the test set with the cnn.predict(X, batch_size=10000) function. Depending on the available memory the batch size can be as large as the whole test set (10 000 in this case). You can use np.argmax() to select the node with the highest probability.

How well did it do? How many of the 10 000 test samples did it label correctly?

Have a look at the samples that it did not label correctly. Look at which label it selected and which label it should have selected. Can you see why it made the error?
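
The bookkeeping for this evaluation could look like the sketch below. The array probs is a hand-made stand-in for the output of cnn.predict, with one row of 10 class probabilities per sample, and true_labels stands in for test_set_labels. The numbers are invented for the illustration.

```python
import numpy as np

# Hand-made network output for 6 samples: each row sums to 1 and
# has its highest probability at the class listed in most_likely.
most_likely = [7, 2, 1, 0, 4, 9]
probs = np.full((6, 10), 0.02)
probs[np.arange(6), most_likely] = 0.82
true_labels = np.array([7, 2, 1, 0, 4, 1])

# The predicted label is the node with the highest probability.
predicted = np.argmax(probs, axis=1)
correct = predicted == true_labels
accuracy = correct.mean()
print('{} of {} correct ({:.1%})'.format(correct.sum(), len(correct), accuracy))

# Indices of the misclassified samples, to inspect them one by one.
wrong = np.flatnonzero(~correct)
for i in wrong:
    print('sample {}: predicted {}, should be {}'.format(i, predicted[i], true_labels[i]))

# Confusion matrix: rows are true labels, columns are predicted labels.
confusion = np.zeros((10, 10), dtype=int)
np.add.at(confusion, (true_labels, predicted), 1)
```

The off-diagonal entries of the confusion matrix show which pairs of digits are mixed up most often.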

In [ ]:
 

10) Visualising what the network has learned

To see what is happening within the network we can visualise the learned filters and their feature maps. We now define additional functions that obtain the feature maps after each layer.

In [ ]:
layer0f = keras.backend.function([keras.backend.learning_phase(), cnn.layers[0].input], [cnn.layers[0].output])
layer1f = keras.backend.function([keras.backend.learning_phase(), cnn.layers[0].input], [cnn.layers[1].output])
layer2f = keras.backend.function([keras.backend.learning_phase(), cnn.layers[0].input], [cnn.layers[2].output])
layer3f = keras.backend.function([keras.backend.learning_phase(), cnn.layers[0].input], [cnn.layers[3].output])

11) Visualising the features

Let's look at the feature maps after the first layer for one of the images from the test set. We have defined the function 'layer0f' for that. Its first input is the learning phase (0 for test mode) and its second input is the images, so you can apply it to the test images with layer0f([0,test_set_images]).

Look at the shape of the output of this function.

Visualise the 6 feature maps for some of the 10 000 test samples with plt.imshow.
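
To get a feeling for what a single feature map of the first layer represents, the sketch below computes one by hand in numpy: it slides a 5 x 5 filter over a 32 x 32 image without padding and applies ReLU. The image and the filter are made up for the illustration, whereas the network learns its filters during training.

```python
import numpy as np


def feature_map(image, kernel, bias=0.0):
    """Slide the kernel over the image (no padding, stride 1), then apply ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel) + bias
    return np.maximum(out, 0.0)


# Made-up input: a bright vertical bar on a dark background.
image = np.zeros((32, 32))
image[:, 14:18] = 1.0

# Hand-made filter that responds to dark-to-bright edges from left to right.
kernel = np.zeros((5, 5))
kernel[:, :2] = -1.0
kernel[:, 3:] = 1.0

fmap = feature_map(image, kernel)
print(fmap.shape)   # (28, 28)
print(fmap.max())   # 10.0, reached at the left edge of the bar
```

The response is zero in the flat regions and at the right edge of the bar, where the filter output is negative and is cut off by the ReLU.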

In [ ]:
 

12) Visualising the filters

Let's look at the filters that are learned. We can use the layer method 'get_weights' for that, for example cnn.layers[0].get_weights(). These are the filters that are applied to the images to obtain the feature maps that we saw above.

Look at the shape of the filters and biases.

Visualise the 6 filters of the first layer with plt.imshow. Do you see any structure in the learned filters?
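
For a Conv2D layer, get_weights returns the filters as one 4-dimensional array of shape (height, width, input channels, number of filters), followed by the biases. The sketch below shows how individual filters can be sliced out of such an array. The random weights array is only a stand-in with the shape of the first layer, not learned values.

```python
import numpy as np

# Random stand-ins with the shapes of the first layer's filters and biases.
rng = np.random.default_rng(0)
weights = rng.standard_normal((5, 5, 1, 6))   # (height, width, input channels, filters)
biases = np.zeros(6)

print(weights.shape, biases.shape)

# Filter k for the single input channel is a 5 x 5 array.
third_filter = weights[:, :, 0, 3]
print(third_filter.shape)   # (5, 5)

# Move the filter axis to the front to loop over the filters easily.
filters = np.transpose(weights[:, :, 0, :], (2, 0, 1))
print(filters.shape)        # (6, 5, 5)
```

Each filters[k] can then be passed to plt.imshow in the same way as an image.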

In [ ]:
 

13) Saving a trained model to disc

Training a network can often take a very long time. It's therefore useful to be able to save a trained model to disc. You can use the functions below to save and load a trained network. This is especially useful for the project.

In [ ]:
cnn.save(r'D:\trained_network.h5')
In [ ]:
cnn = keras.models.load_model(r'D:\trained_network.h5')