Multilayer Perceptron (MLP) for handwritten digit recognition

  • Author: Johannes Maucher
  • Last Update: 10.11.2015

This example demonstrates the application of an MLP classifier to recognize the handwritten digits 0 to 9. Each handwritten digit is represented as an $8 \times 8$ greyscale image of 4 bit depth. An image displaying digit $i$ is labeled with the class index $i$, where $i \in \lbrace 0,1,2,\ldots,9\rbrace$. The entire dataset contains 1797 labeled images and is frequently used as a benchmark for evaluating and comparing machine learning algorithms. It is available from several sources, e.g. it is contained in the scikit-learn datasets module.
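
For the MLP each image is flattened into a 64-dimensional column vector and each class label is encoded as a 10-dimensional one-hot vector (see the function oneToMany() below), so the trained network realizes a mapping

$$f: \mathbb{R}^{64} \rightarrow (0,1)^{10}, \qquad \hat{i} = \arg\max_{j}\, f(\mathbf{x})_j,$$

where the predicted class $\hat{i}$ is the index of the output neuron with the highest activation.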

In [1]:
%matplotlib inline
import numpy as np
import random
from matplotlib import pyplot as plt

Miscellaneous functions

In [2]:
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))

def oneToMany(j,d=10):
    """Return a d-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere.  This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((d, 1))
    e[j] = 1.0
    return e
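
A quick sanity check of these helper functions (illustrative only, not part of the original notebook):

print(sigmoid(0.0))          # 0.5, the sigmoid is centred at z=0
print(sigmoid_prime(0.0))    # 0.25, the maximum slope of the sigmoid
print(oneToMany(3).ravel())  # 10-dimensional one-hot vector with a 1.0 at index 3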

Class MLP

This class implements a Multilayer Perceptron (MLP) with Stochastic Gradient Descent (SGD) learning. Gradients are calculated using backpropagation.
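
For a mini-batch of $m$ training examples with per-example cost $C_x$, the weights and biases are updated with the averaged gradients

$$w \leftarrow w - \frac{\eta}{m} \sum_{x} \frac{\partial C_x}{\partial w}, \qquad b \leftarrow b - \frac{\eta}{m} \sum_{x} \frac{\partial C_x}{\partial b},$$

where $\eta$ is the learning rate. The per-example gradients are computed by backprop() and summed in update_mini_batch() below.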

In [3]:
class MLP(object):

    def __init__(self, layerlist):
        """The list ``layerlist`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        
        self.num_layers = len(layerlist)
        self.layerlist = layerlist
        self.biases = [np.random.randn(y, 1) for y in layerlist[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(layerlist[:-1], layerlist[1:])]
        self.testCorrect=[]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                numCorrect=self.evaluate(test_data)
                self.testCorrect.append(numCorrect)
                print "Epoch {0}: {1} / {2}".format(j, numCorrect, n_test)     
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)
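
The backpropagation code above can be sanity-checked numerically: for the quadratic cost $C_x = \frac{1}{2}\lVert a^L - y \rVert^2$ (whose derivative is exactly cost_derivative), the analytic gradient returned by backprop should agree with a central finite-difference estimate. The following is a minimal sketch of such a check; it is not part of the original notebook and only inspects a single weight entry:

def numerical_gradient_check(net, x, y, eps=1e-5):
    """Compare one analytically computed weight gradient with a
    central finite-difference estimate of the quadratic cost
    C_x = 0.5*||a_L - y||^2."""
    def cost():
        return 0.5*np.sum((net.feedforward(x)-y)**2)
    nabla_b, nabla_w = net.backprop(x, y)   # analytic gradients
    i, j = 0, 0                             # check one entry of the first weight matrix
    orig = net.weights[0][i, j]
    net.weights[0][i, j] = orig + eps
    c_plus = cost()
    net.weights[0][i, j] = orig - eps
    c_minus = cost()
    net.weights[0][i, j] = orig             # restore the original weight
    return nabla_w[0][i, j], (c_plus - c_minus)/(2.0*eps)

# Illustrative usage on a random input and a one-hot target:
# net_check = MLP([64, 40, 10])
# print(numerical_gradient_check(net_check, np.random.rand(64, 1), oneToMany(3)))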

Load labeled data

The handwritten digits dataset is loaded from the scikit-learn datasets module. The load_digits() function returns a so-called Bunch, which contains two numpy arrays: the images and the corresponding labels:

In [4]:
from sklearn import datasets
digits = datasets.load_digits()
n_samples = len(digits.images)
print type(digits)
print type(digits.images)
print type(digits.target)
print "Number of labeled images: ",n_samples
print digits.images[0]
print digits.target[0]
<class 'sklearn.datasets.base.Bunch'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
Number of labeled images:  1797
[[  0.   0.   5.  13.   9.   1.   0.   0.]
 [  0.   0.  13.  15.  10.  15.   5.   0.]
 [  0.   3.  15.   2.   0.  11.   8.   0.]
 [  0.   4.  12.   0.   0.   8.   8.   0.]
 [  0.   5.   8.   0.   0.   9.   8.   0.]
 [  0.   4.  11.   0.   1.  12.   7.   0.]
 [  0.   2.  14.   5.  10.  12.   0.   0.]
 [  0.   0.   6.  13.  10.   0.   0.   0.]]
0

In order to understand the representation of the digits, the first 4 images are displayed in a matplotlib figure below. Moreover, the contents of the first image are printed above. Each image is an $8 \times 8$ numpy array with integer entries between $0$ (white) and $15$ (black).

In [5]:
plt.figure(figsize=(12, 10))
NIMAGES=4
for index in range(NIMAGES):
    plt.subplot(1,NIMAGES, index+1)
    plt.imshow(digits.images[index,:], cmap=plt.cm.gray_r)
    plt.title('Training sample of class: %i' % digits.target[index])

Split the labeled dataset into a training and a test partition

In [6]:
NUMTRAIN=1000
training_inputs = [np.reshape(x, (64, 1)) for x in digits.images[:NUMTRAIN]]  # flatten 8x8 images into 64x1 column vectors
training_results = [oneToMany(y) for y in digits.target[:NUMTRAIN]]           # encode labels as 10-dimensional one-hot vectors
training_data = zip(training_inputs, training_results)
test_inputs = [np.reshape(x, (64, 1)) for x in digits.images[NUMTRAIN:]]
test_data = zip(test_inputs, digits.target[NUMTRAIN:])                        # test labels remain integer class indices
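
With NUMTRAIN=1000 this yields 1000 training pairs (input vector, one-hot target) and $1797-1000=797$ test pairs (input vector, integer label), which matches the denominator in the evaluation output below. A quick check (illustrative only):

print(len(training_data))   # 1000
print(len(test_data))       # 797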

Generate, configure, train and test MLP

In [12]:
net = MLP([64, 40, 10])                          # 64 input, 40 hidden and 10 output neurons
net.SGD(training_data, 200, 10, 1.0, test_data)  # 200 epochs, mini-batch size 10, learning rate eta=1.0
if len(net.testCorrect)>0:
    plt.figure(figsize=(12,10))
    plt.plot(range(len(net.testCorrect)),net.testCorrect)
    plt.title("Number of correct classifications")
    plt.xlabel("epoch")
    # horizontal reference line at the total number of test samples
    plt.plot([0,len(net.testCorrect)],[n_samples-NUMTRAIN,n_samples-NUMTRAIN])
    plt.show()
Epoch 0: 310 / 797
Epoch 1: 407 / 797
Epoch 2: 463 / 797
Epoch 3: 526 / 797
Epoch 4: 527 / 797
Epoch 5: 533 / 797
Epoch 6: 575 / 797
Epoch 7: 557 / 797
Epoch 8: 583 / 797
Epoch 9: 580 / 797
Epoch 10: 597 / 797
Epoch 11: 606 / 797
Epoch 12: 610 / 797
Epoch 13: 628 / 797
Epoch 14: 627 / 797
Epoch 15: 614 / 797
Epoch 16: 608 / 797
Epoch 17: 610 / 797
Epoch 18: 635 / 797
Epoch 19: 634 / 797
Epoch 20: 628 / 797
Epoch 21: 640 / 797
Epoch 22: 633 / 797
Epoch 23: 636 / 797
Epoch 24: 632 / 797
Epoch 25: 622 / 797
Epoch 26: 638 / 797
Epoch 27: 641 / 797
Epoch 28: 628 / 797
Epoch 29: 637 / 797
Epoch 30: 640 / 797
Epoch 31: 630 / 797
Epoch 32: 640 / 797
Epoch 33: 636 / 797
Epoch 34: 645 / 797
Epoch 35: 645 / 797
Epoch 36: 647 / 797
Epoch 37: 653 / 797
Epoch 38: 642 / 797
Epoch 39: 646 / 797
Epoch 40: 647 / 797
Epoch 41: 641 / 797
Epoch 42: 643 / 797
Epoch 43: 649 / 797
Epoch 44: 644 / 797
Epoch 45: 639 / 797
Epoch 46: 642 / 797
Epoch 47: 695 / 797
Epoch 48: 700 / 797
Epoch 49: 700 / 797
Epoch 50: 708 / 797
Epoch 51: 703 / 797
Epoch 52: 712 / 797
Epoch 53: 703 / 797
Epoch 54: 704 / 797
Epoch 55: 706 / 797
Epoch 56: 710 / 797
Epoch 57: 714 / 797
Epoch 58: 714 / 797
Epoch 59: 712 / 797
Epoch 60: 707 / 797
Epoch 61: 710 / 797
Epoch 62: 715 / 797
Epoch 63: 711 / 797
Epoch 64: 706 / 797
Epoch 65: 705 / 797
Epoch 66: 707 / 797
Epoch 67: 710 / 797
Epoch 68: 701 / 797
Epoch 69: 714 / 797
Epoch 70: 703 / 797
Epoch 71: 709 / 797
Epoch 72: 707 / 797
Epoch 73: 708 / 797
Epoch 74: 711 / 797
Epoch 75: 707 / 797
Epoch 76: 713 / 797
Epoch 77: 710 / 797
Epoch 78: 718 / 797
Epoch 79: 709 / 797
Epoch 80: 709 / 797
Epoch 81: 718 / 797
Epoch 82: 713 / 797
Epoch 83: 713 / 797
Epoch 84: 714 / 797
Epoch 85: 708 / 797
Epoch 86: 710 / 797
Epoch 87: 711 / 797
Epoch 88: 712 / 797
Epoch 89: 713 / 797
Epoch 90: 712 / 797
Epoch 91: 713 / 797
Epoch 92: 713 / 797
Epoch 93: 711 / 797
Epoch 94: 713 / 797
Epoch 95: 713 / 797
Epoch 96: 713 / 797
Epoch 97: 710 / 797
Epoch 98: 708 / 797
Epoch 99: 713 / 797
Epoch 100: 713 / 797
Epoch 101: 715 / 797
Epoch 102: 713 / 797
Epoch 103: 715 / 797
Epoch 104: 715 / 797
Epoch 105: 723 / 797
Epoch 106: 710 / 797
Epoch 107: 718 / 797
Epoch 108: 713 / 797
Epoch 109: 707 / 797
Epoch 110: 718 / 797
Epoch 111: 717 / 797
Epoch 112: 716 / 797
Epoch 113: 711 / 797
Epoch 114: 718 / 797
Epoch 115: 716 / 797
Epoch 116: 714 / 797
Epoch 117: 712 / 797
Epoch 118: 712 / 797
Epoch 119: 714 / 797
Epoch 120: 716 / 797
Epoch 121: 714 / 797
Epoch 122: 715 / 797
Epoch 123: 714 / 797
Epoch 124: 714 / 797
Epoch 125: 716 / 797
Epoch 126: 715 / 797
Epoch 127: 714 / 797
Epoch 128: 713 / 797
Epoch 129: 715 / 797
Epoch 130: 715 / 797
Epoch 131: 713 / 797
Epoch 132: 712 / 797
Epoch 133: 714 / 797
Epoch 134: 714 / 797
Epoch 135: 714 / 797
Epoch 136: 714 / 797
Epoch 137: 715 / 797
Epoch 138: 714 / 797
Epoch 139: 716 / 797
Epoch 140: 716 / 797
Epoch 141: 715 / 797
Epoch 142: 713 / 797
Epoch 143: 716 / 797
Epoch 144: 717 / 797
Epoch 145: 713 / 797
Epoch 146: 715 / 797
Epoch 147: 714 / 797
Epoch 148: 714 / 797
Epoch 149: 714 / 797
Epoch 150: 715 / 797
Epoch 151: 714 / 797
Epoch 152: 714 / 797
Epoch 153: 715 / 797
Epoch 154: 715 / 797
Epoch 155: 714 / 797
Epoch 156: 714 / 797
Epoch 157: 712 / 797
Epoch 158: 713 / 797
Epoch 159: 714 / 797
Epoch 160: 715 / 797
Epoch 161: 714 / 797
Epoch 162: 714 / 797
Epoch 163: 713 / 797
Epoch 164: 715 / 797
Epoch 165: 712 / 797
Epoch 166: 714 / 797
Epoch 167: 714 / 797
Epoch 168: 713 / 797
Epoch 169: 713 / 797
Epoch 170: 714 / 797
Epoch 171: 712 / 797
Epoch 172: 713 / 797
Epoch 173: 714 / 797
Epoch 174: 714 / 797
Epoch 175: 714 / 797
Epoch 176: 714 / 797
Epoch 177: 713 / 797
Epoch 178: 712 / 797
Epoch 179: 712 / 797
Epoch 180: 711 / 797
Epoch 181: 712 / 797
Epoch 182: 712 / 797
Epoch 183: 712 / 797
Epoch 184: 712 / 797
Epoch 185: 712 / 797
Epoch 186: 713 / 797
Epoch 187: 711 / 797
Epoch 188: 712 / 797
Epoch 189: 715 / 797
Epoch 190: 714 / 797
Epoch 191: 714 / 797
Epoch 192: 712 / 797
Epoch 193: 713 / 797
Epoch 194: 713 / 797
Epoch 195: 713 / 797
Epoch 196: 713 / 797
Epoch 197: 712 / 797
Epoch 198: 712 / 797
Epoch 199: 712 / 797
In [13]:
print "Accuracy: ",net.testCorrect[-1]/float((n_samples-NUMTRAIN))
Accuracy:  0.893350062735