Handwritten Digit Recognition using Neural Network Framework Lasagne

This IPython Notebook is an adaptation of the Lasagne tutorial. It demonstrates how to create, train, and evaluate neural networks using the Python packages Lasagne and Theano.

  • Author: Johannes Maucher
  • Last Update: 24.11.2015
In [3]:
from __future__ import print_function

import sys
import os
import time

import numpy as np
import theano
import theano.tensor as T

import lasagne
# urlretrieve moved to urllib.request in Python 3:
if sys.version_info[0] == 2:
    from urllib import urlretrieve
else:
    from urllib.request import urlretrieve
import gzip

Download and prepare the MNIST dataset

The following code loads the MNIST handwritten digit dataset (downloading it first if necessary). Each digit is represented by a $28\times28$ pixel grayscale image.

In [4]:
# ################## Download and prepare the MNIST dataset ##################
# This is just some way of getting the MNIST dataset from an online location
# and loading it into numpy arrays. It doesn't involve Lasagne at all.

def load_dataset():
    
    def download(filename, source='http://yann.lecun.com/exdb/mnist/'):
        print("Downloading %s" % filename)
        urlretrieve(source + filename, filename)

    # We then define functions for loading MNIST images and labels.
    # For convenience, they also download the requested files if needed.

    def load_mnist_images(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the inputs in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    # We can now download and read the training and test set images and labels.
    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')

    # We reserve the last 10000 training examples for validation.
    X_train, X_val = X_train[:-10000], X_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]

    # We just return all the arrays in order, as expected in main().
    # (It doesn't matter how we do this as long as we can read them again.)
    return X_train, y_train, X_val, y_val, X_test, y_test
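
As a quick sanity check (an illustrative sketch, not part of the original tutorial), the loader can be called once to verify the shapes and dtypes described above: 50,000 training, 10,000 validation, and 10,000 test images of shape (1, 28, 28).

In [ ]:
# Sanity check: load the data and inspect shapes/dtypes (illustrative only).
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
print(X_train.shape, X_train.dtype)  # (50000, 1, 28, 28) float32
print(y_train.shape, y_train.dtype)  # (50000,) uint8
print(X_val.shape, X_test.shape)     # (10000, 1, 28, 28) each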

Define Neural Network

In this section, three different types of neural networks are defined:

  • Multilayer Perceptron with 2 hidden layers
  • Multilayer Perceptron with 2 hidden layers and dropout
  • Convolutional Neural Network with 2 convolutional layers, max-pooling, and a fully connected layer

These network topologies are defined in the following functions, which are called in the main() function below.

MLP without dropout

This function creates an MLP with two hidden layers of 800 units each, followed by a softmax output layer of 10 units. The hidden layers use the ReLU activation function; the output layer uses softmax.

In [5]:
def build_mlp(input_var=None):
    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 1 channel, 28 rows and 28 columns) and
    # linking it to the given Theano variable `input_var`, if any:
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # Add a fully-connected layer of 800 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # Another 800-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify)


    # Finally, we'll add the fully-connected output layer, of 10 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2, num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out
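
To verify the topology, the layer output shapes and parameter count can be inspected with Lasagne's helper functions. A small sketch for illustration; the expected total is $784\cdot800+800 + 800\cdot800+800 + 800\cdot10+10 = 1{,}276{,}810$ parameters:

In [ ]:
# Illustrative sketch: inspect the MLP defined above.
mlp = build_mlp()
for layer in lasagne.layers.get_all_layers(mlp):
    print(type(layer).__name__, layer.output_shape)
print(lasagne.layers.count_params(mlp))  # expected: 1276810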

MLP with dropout

This function creates an MLP with two hidden layers of 800 units each, followed by a softmax output layer of 10 units. It applies 20% dropout to the input data and 50% dropout to the outputs of both hidden layers.

In [6]:
def build_mlp_dropout(input_var=None):
    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 1 channel, 28 rows and 28 columns) and
    # linking it to the given Theano variable `input_var`, if any:
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # Apply 20% dropout to the input data:
    l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)

    # Add a fully-connected layer of 800 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # We'll now add dropout of 50%:
    l_hid1_drop = lasagne.layers.DropoutLayer(l_hid1, p=0.5)

    # Another 800-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify)

    # 50% dropout again:
    l_hid2_drop = lasagne.layers.DropoutLayer(l_hid2, p=0.5)

    # Finally, we'll add the fully-connected output layer, of 10 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2_drop, num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out
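
Note that the dropout layers are only active in the stochastic (training) forward pass; for validation and testing, Lasagne disables them when get_output is called with deterministic=True, as done in main() below. A minimal sketch for illustration:

In [ ]:
# Sketch: dropout is applied in the default forward pass and disabled
# in the deterministic pass used for validation/testing.
x = T.tensor4('x')
net = build_mlp_dropout(x)
train_out = lasagne.layers.get_output(net)                     # dropout active
eval_out = lasagne.layers.get_output(net, deterministic=True)  # dropout disabled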

Create Convolutional Neural Network

This function creates a CNN with two convolution + pooling stages, followed by a fully connected hidden layer in front of the output layer.

In [7]:
def build_cnn(input_var=None):
    # Input layer, as usual:
    network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                        input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.

    # Convolutional layer with 6 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=6, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())
    # Expert note: Lasagne provides alternative convolutional layers that
    # override Theano's choice of which implementation to use; for details
    # please see http://lasagne.readthedocs.org/en/latest/user/tutorial.html.

    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=16, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # A fully-connected layer of 100 units with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=100,
            nonlinearity=lasagne.nonlinearities.rectify)

    # And, finally, the 10-unit output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    return network
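
Since the $5\times5$ convolutions are unpadded and each pooling halves the resolution, the spatial size shrinks from $28\times28$ to $24\times24$, $12\times12$, $8\times8$ and finally $4\times4$, so the 100-unit dense layer receives $16\cdot4\cdot4 = 256$ features. This can be checked with a small sketch (for illustration only):

In [ ]:
# Sketch: print the output shape of every layer in the CNN defined above.
cnn = build_cnn()
for layer in lasagne.layers.get_all_layers(cnn):
    print(type(layer).__name__, layer.output_shape)
# Expected spatial sizes: 28x28 -> 24x24 -> 12x12 -> 8x8 -> 4x4,
# i.e. the first DenseLayer sees 16*4*4 = 256 input features.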

Batch Iterator

This is just a simple helper function iterating over training data in mini-batches of a particular size, optionally in random order. It assumes data is available as numpy arrays.

In [8]:
def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets)
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]
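
For example (a small sketch, assuming the arrays returned by load_dataset() are available), iterating over the 50,000 training examples with a batch size of 500 yields 100 mini-batches per epoch; any trailing examples that do not fill a complete batch are dropped.

In [ ]:
# Sketch: iterate once over the training data in mini-batches of 500.
n_batches = 0
for inputs, targets in iterate_minibatches(X_train, y_train, 500, shuffle=True):
    n_batches += 1  # inputs: (500, 1, 28, 28), targets: (500,)
print(n_batches)    # 50000 / 500 = 100 batches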

Create, train and test a neural network

In [9]:
def main(model='cnn', num_epochs=50):
    # Load the dataset
    print("Loading data...")
    X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()

    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.ivector('targets')

    # Create the neural network model (depending on the model parameter)
    # and link the input variable to the network
    print("Building model and compiling functions...")
    if model == 'mlp':
        network = build_mlp(input_var)
    elif model == 'mlp_dropout':
        network = build_mlp_dropout(input_var)
    elif model == 'cnn':
        network = build_cnn(input_var)
    else:
        print("Unrecognized model type %r." % model)
        return

    # Create a loss expression for training, i.e., a scalar objective we want
    # to minimize (for our multi-class problem, it is the cross-entropy loss):
    prediction = lasagne.layers.get_output(network)
    # compare prediction and target values by applying cross-entropy loss function
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
    loss = loss.mean()
    # We could add some weight decay as well here, see lasagne.regularization.

    # Create update expressions for training, i.e., how to modify the
    # parameters at each training step. Here, we'll use Stochastic Gradient
    # Descent (SGD) with momentum; Lasagne offers plenty more update rules
    # (e.g. nesterov_momentum, adagrad, or adam).
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.momentum(
            loss, params, learning_rate=0.01, momentum=0.9)

    # Create a loss expression for validation/testing. The crucial difference
    # here is that we do a deterministic forward pass through the network,
    # disabling dropout layers.
    test_prediction = lasagne.layers.get_output(network, deterministic=True)
    test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                            target_var)
    test_loss = test_loss.mean()
    # As a bonus, also create an expression for the classification accuracy:
    test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                      dtype=theano.config.floatX)

    # Compile a function performing a training step on a mini-batch (by giving
    # the updates dictionary) and returning the corresponding training loss:
    train_fn = theano.function([input_var, target_var], loss, updates=updates)

    # Compile a second function computing the validation loss and accuracy:
    val_fn = theano.function([input_var, target_var], [test_loss, test_acc])

    # Finally, launch the training loop.
    print("Starting training...")
    # We iterate over epochs:
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        train_err = 0
        train_batches = 0
        start_time = time.time()
        for batch in iterate_minibatches(X_train, y_train, 500, shuffle=True):
            inputs, targets = batch
            train_err += train_fn(inputs, targets)
            train_batches += 1

        # And a full pass over the validation data:
        val_err = 0
        val_acc = 0
        val_batches = 0
        for batch in iterate_minibatches(X_val, y_val, 500, shuffle=False):
            inputs, targets = batch
            err, acc = val_fn(inputs, targets)
            val_err += err
            val_acc += acc
            val_batches += 1

        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(
            epoch + 1, num_epochs, time.time() - start_time))
        print("  training loss:\t\t{:.6f}".format(train_err / train_batches))
        print("  validation loss:\t\t{:.6f}".format(val_err / val_batches))
        print("  validation accuracy:\t\t{:.2f} %".format(
            val_acc / val_batches * 100))

    # After training, we compute and print the test error:
    test_err = 0
    test_acc = 0
    test_batches = 0
    for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False):
        inputs, targets = batch
        err, acc = val_fn(inputs, targets)
        test_err += err
        test_acc += acc
        test_batches += 1
    print("Final results:")
    print("  test loss:\t\t\t{:.6f}".format(test_err / test_batches))
    print("  test accuracy:\t\t{:.2f} %".format(
        test_acc / test_batches * 100))

    # Optionally, you could now dump the network weights to a file like this:
    # np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
    #
    # And load them again later on like this:
    # with np.load('model.npz') as f:
    #     param_values = [f['arr_%d' % i] for i in range(len(f.files))]
    # lasagne.layers.set_all_param_values(network, param_values)
In [10]:
main(model='mlp')
Loading data...
Building model and compiling functions...
Starting training...
Epoch 1 of 50 took 29.102s
  training loss:		0.903632
  validation loss:		0.355845
  validation accuracy:		90.16 %
Epoch 2 of 50 took 29.969s
  training loss:		0.343195
  validation loss:		0.277785
  validation accuracy:		92.08 %
Epoch 3 of 50 took 28.258s
  training loss:		0.284169
  validation loss:		0.246285
  validation accuracy:		92.99 %
Epoch 4 of 50 took 28.392s
  training loss:		0.249461
  validation loss:		0.221794
  validation accuracy:		93.77 %
Epoch 5 of 50 took 28.456s
  training loss:		0.223916
  validation loss:		0.203575
  validation accuracy:		94.25 %
Epoch 6 of 50 took 30.886s
  training loss:		0.203778
  validation loss:		0.186172
  validation accuracy:		95.11 %
Epoch 7 of 50 took 31.064s
  training loss:		0.187378
  validation loss:		0.172628
  validation accuracy:		95.28 %
Epoch 8 of 50 took 31.483s
  training loss:		0.171778
  validation loss:		0.161129
  validation accuracy:		95.64 %
Epoch 9 of 50 took 29.784s
  training loss:		0.159251
  validation loss:		0.153672
  validation accuracy:		95.75 %
Epoch 10 of 50 took 28.103s
  training loss:		0.147894
  validation loss:		0.144020
  validation accuracy:		96.16 %
Epoch 11 of 50 took 28.175s
  training loss:		0.138330
  validation loss:		0.138125
  validation accuracy:		96.26 %
Epoch 12 of 50 took 28.125s
  training loss:		0.129711
  validation loss:		0.131220
  validation accuracy:		96.48 %
Epoch 13 of 50 took 28.587s
  training loss:		0.121458
  validation loss:		0.128913
  validation accuracy:		96.43 %
Epoch 14 of 50 took 28.092s
  training loss:		0.114737
  validation loss:		0.120455
  validation accuracy:		96.73 %
Epoch 15 of 50 took 28.313s
  training loss:		0.107971
  validation loss:		0.116182
  validation accuracy:		96.84 %
Epoch 16 of 50 took 28.164s
  training loss:		0.100992
  validation loss:		0.116917
  validation accuracy:		96.99 %
Epoch 17 of 50 took 28.164s
  training loss:		0.095679
  validation loss:		0.108687
  validation accuracy:		97.02 %
Epoch 18 of 50 took 28.164s
  training loss:		0.091146
  validation loss:		0.106824
  validation accuracy:		97.13 %
Epoch 19 of 50 took 28.014s
  training loss:		0.086689
  validation loss:		0.103080
  validation accuracy:		97.20 %
Epoch 20 of 50 took 28.265s
  training loss:		0.081909
  validation loss:		0.101147
  validation accuracy:		97.22 %
Epoch 21 of 50 took 28.209s
  training loss:		0.077810
  validation loss:		0.098297
  validation accuracy:		97.35 %
Epoch 22 of 50 took 28.014s
  training loss:		0.073964
  validation loss:		0.095145
  validation accuracy:		97.43 %
Epoch 23 of 50 took 28.192s
  training loss:		0.070395
  validation loss:		0.095942
  validation accuracy:		97.41 %
Epoch 24 of 50 took 28.353s
  training loss:		0.066847
  validation loss:		0.091586
  validation accuracy:		97.47 %
Epoch 25 of 50 took 28.113s
  training loss:		0.063973
  validation loss:		0.091372
  validation accuracy:		97.37 %
Epoch 26 of 50 took 28.139s
  training loss:		0.061067
  validation loss:		0.088235
  validation accuracy:		97.55 %
Epoch 27 of 50 took 28.088s
  training loss:		0.058500
  validation loss:		0.087541
  validation accuracy:		97.52 %
Epoch 28 of 50 took 28.230s
  training loss:		0.055777
  validation loss:		0.086228
  validation accuracy:		97.58 %
Epoch 29 of 50 took 28.197s
  training loss:		0.053074
  validation loss:		0.085234
  validation accuracy:		97.62 %
Epoch 30 of 50 took 28.103s
  training loss:		0.050567
  validation loss:		0.084073
  validation accuracy:		97.65 %
Epoch 31 of 50 took 28.139s
  training loss:		0.048737
  validation loss:		0.085353
  validation accuracy:		97.60 %
Epoch 32 of 50 took 28.151s
  training loss:		0.046421
  validation loss:		0.082716
  validation accuracy:		97.66 %
Epoch 33 of 50 took 28.321s
  training loss:		0.044593
  validation loss:		0.080517
  validation accuracy:		97.75 %
Epoch 34 of 50 took 28.284s
  training loss:		0.042451
  validation loss:		0.080102
  validation accuracy:		97.70 %
Epoch 35 of 50 took 28.397s
  training loss:		0.040810
  validation loss:		0.079416
  validation accuracy:		97.69 %
Epoch 36 of 50 took 4889.192s
  training loss:		0.039078
  validation loss:		0.078870
  validation accuracy:		97.80 %
Epoch 37 of 50 took 12.135s
  training loss:		0.037451
  validation loss:		0.077587
  validation accuracy:		97.87 %
Epoch 38 of 50 took 11.255s
  training loss:		0.035854
  validation loss:		0.076390
  validation accuracy:		97.78 %
Epoch 39 of 50 took 10.716s
  training loss:		0.034452
  validation loss:		0.076579
  validation accuracy:		97.78 %
Epoch 40 of 50 took 10.935s
  training loss:		0.033176
  validation loss:		0.076523
  validation accuracy:		97.74 %
Epoch 41 of 50 took 10.701s
  training loss:		0.031940
  validation loss:		0.076172
  validation accuracy:		97.85 %
Epoch 42 of 50 took 10.690s
  training loss:		0.030412
  validation loss:		0.076182
  validation accuracy:		97.85 %
Epoch 43 of 50 took 10.742s
  training loss:		0.029133
  validation loss:		0.074855
  validation accuracy:		97.85 %
Epoch 44 of 50 took 10.863s
  training loss:		0.027963
  validation loss:		0.074856
  validation accuracy:		97.82 %
Epoch 45 of 50 took 10.880s
  training loss:		0.026809
  validation loss:		0.073697
  validation accuracy:		97.84 %
Epoch 46 of 50 took 10.842s
  training loss:		0.026144
  validation loss:		0.073049
  validation accuracy:		97.85 %
Epoch 47 of 50 took 10.719s
  training loss:		0.024812
  validation loss:		0.074120
  validation accuracy:		97.87 %
Epoch 48 of 50 took 10.673s
  training loss:		0.023928
  validation loss:		0.072541
  validation accuracy:		97.90 %
Epoch 49 of 50 took 10.701s
  training loss:		0.022997
  validation loss:		0.072175
  validation accuracy:		97.84 %
Epoch 50 of 50 took 10.789s
  training loss:		0.022588
  validation loss:		0.072812
  validation accuracy:		97.92 %
Final results:
  test loss:			0.067908
  test accuracy:		97.91 %

50 epochs MLP without dropout: 97.88% test accuracy

50 epochs MLP with dropout: 97.94% test accuracy

50 epochs CNN: 98.83% test accuracy
