This example is from Stefan Wunsch (CERN IML TensorFlow and Keras workshop). See also the example on the Keras website.
The MNIST dataset is one of the most popular benchmark datasets in modern machine learning. It consists of 70000 images of handwritten digits and the associated labels, which can be used to train a neural network to perform image classification.
The following program presents the basic workflow of Keras and shows the most important details of the API.
from os import environ
environ["KERAS_BACKEND"] = "tensorflow"
import numpy as np
np.random.seed(1234)
import matplotlib.pyplot as plt
The code below downloads the dataset and scales the pixel values of the images. Because the images are encoded as 8-bit unsigned integers, we scale these values to floating-point values in the range [0, 1]
so that the inputs better match the activations of the neurons.
from keras.datasets import mnist
from keras.utils import to_categorical
# Download dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# The images are loaded as arrays of shape (num_images, 28, 28);
# we need to reshape them to add an explicit channel axis:
# (num_images, pixels_row, pixels_column, color_channels)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
# Convert the uint8 greyscale pixel values in range [0, 255]
# to floats in range [0, 1]
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255
# Convert digits to one-hot vectors, e.g.,
# 2 -> [0 0 1 0 0 0 0 0 0 0]
# 0 -> [1 0 0 0 0 0 0 0 0 0]
# 9 -> [0 0 0 0 0 0 0 0 0 1]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
Additionally, we store some example images to disk in order to demonstrate the inference part of the Keras API later on.
import png
num_examples = 6
# offset = 100
offset = 0
plt.figure(figsize=(num_examples*2, 2))
for i in range(num_examples):
    plt.subplot(1, num_examples, i+1)
    plt.axis('off')
    example = np.squeeze(np.array(x_test[offset+i]*255).astype("uint8"))
    plt.imshow(example, cmap="gray")
    w = png.Writer(28, 28, greyscale=True)
    with open("mnist_example_{}.png".format(i+1), 'wb') as f:
        w.write(f, example)
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)
The model definition in Keras can be done using the Sequential
or the functional API. Shown here is the Sequential
API, which allows you to stack neural network layers on top of each other and is sufficient for most neural network models. In contrast, the functional API allows multiple inputs and outputs, giving maximum flexibility to build custom models.
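For comparison, the same architecture can also be written with the functional API, where each layer is called on the output tensor of the previous layer and the model is defined by its input and output tensors (a sketch; the variable names are illustrative):

```python
from keras.models import Model
from keras.layers import Dense, Flatten, MaxPooling2D, Conv2D, Input

# Each layer transforms the output tensor of the previous layer
inputs = Input(shape=(28, 28, 1))
x = Conv2D(8, kernel_size=(3, 3), activation="relu")(inputs)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(16, activation="relu")(x)
outputs = Dense(10, activation="softmax")(x)

# Multiple inputs or outputs would simply be passed as lists here
functional_model = Model(inputs=inputs, outputs=outputs)
```

For a plain layer stack like this one, both APIs produce an identical network; the functional API only pays off once the graph branches.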
from keras.models import Sequential
from keras.layers import Dense, Flatten, MaxPooling2D, Conv2D, Input
# conv layer with 8 3x3 filters
model = Sequential(
    [
        Input(shape=input_shape),
        Conv2D(8, kernel_size=(3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(16, activation="relu"),
        Dense(num_classes, activation="softmax"),
    ]
)
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 8)         80
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 8)         0
_________________________________________________________________
flatten (Flatten)            (None, 1352)              0
_________________________________________________________________
dense (Dense)                (None, 16)                21648
_________________________________________________________________
dense_1 (Dense)              (None, 10)                170
=================================================================
Total params: 21,898
Trainable params: 21,898
Non-trainable params: 0
_________________________________________________________________
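The parameter counts in the summary can be verified by hand: a Conv2D layer has (kernel_height × kernel_width × input_channels + 1) parameters per filter, and a Dense layer has (number_of_inputs + 1) parameters per unit, where the +1 accounts for the bias:

```python
# Conv2D: 8 filters of size 3x3 on 1 input channel, plus one bias per filter
conv_params = 8 * (3 * 3 * 1 + 1)      # 80

# Spatial size: 28 - 3 + 1 = 26 after the 3x3 convolution,
# then 26 // 2 = 13 after 2x2 max pooling
flat_size = 13 * 13 * 8                # 1352 inputs to the first Dense layer

dense_params = (flat_size + 1) * 16    # 21648
dense_1_params = (16 + 1) * 10         # 170

print(conv_params + dense_params + dense_1_params)  # 21898, as in model.summary()
```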
In Keras, you have to compile
a model, which means attaching the loss function, the optimizer algorithm and validation metrics to your training setup.
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
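The string shortcuts "adam" and "categorical_crossentropy" are equivalent to passing the corresponding objects explicitly, which is useful once you want to change hyperparameters such as the learning rate. A minimal sketch with a toy model (the model and the learning-rate value are illustrative only):

```python
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import Adam

# A toy model, just to demonstrate compiling with explicit objects
toy = Sequential([Input(shape=(4,)), Dense(10, activation="softmax")])
toy.compile(loss="categorical_crossentropy",
            optimizer=Adam(learning_rate=1e-3),  # 1e-3 is the Adam default, shown explicitly
            metrics=["accuracy"])
```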
The cell below shows the training procedure of Keras using the model.fit(...)
method. Besides typical options such as batch_size
and epochs
, which control the number of gradient steps of your training, Keras allows you to register callbacks during training.
Callbacks are objects whose methods are called during training to perform tasks such as saving checkpoints of the model (ModelCheckpoint
) or stopping the training early if a convergence criterion is met (EarlyStopping
).
from keras.callbacks import ModelCheckpoint, EarlyStopping
checkpoint = ModelCheckpoint(
    filepath="mnist_keras_model.h5",
    save_best_only=True,
    verbose=1)
early_stopping = EarlyStopping(patience=2)
history = model.fit(x_train, y_train,                      # Training data
                    batch_size=200,                        # Batch size
                    epochs=50,                             # Maximum number of training epochs
                    validation_split=0.5,                  # Use 50% of the train dataset for validation
                    callbacks=[checkpoint, early_stopping])  # Register callbacks
Epoch 1/50
150/150 [==============================] - 8s 48ms/step - loss: 1.4121 - accuracy: 0.5809 - val_loss: 0.3696 - val_accuracy: 0.8939
Epoch 00001: val_loss improved from inf to 0.36958, saving model to mnist_keras_model.h5
Epoch 2/50
150/150 [==============================] - 6s 40ms/step - loss: 0.3357 - accuracy: 0.9037 - val_loss: 0.2702 - val_accuracy: 0.9197
Epoch 00002: val_loss improved from 0.36958 to 0.27020, saving model to mnist_keras_model.h5
Epoch 3/50
150/150 [==============================] - 6s 40ms/step - loss: 0.2423 - accuracy: 0.9298 - val_loss: 0.2242 - val_accuracy: 0.9338
Epoch 00003: val_loss improved from 0.27020 to 0.22415, saving model to mnist_keras_model.h5
Epoch 4/50
150/150 [==============================] - 6s 40ms/step - loss: 0.2000 - accuracy: 0.9427 - val_loss: 0.1918 - val_accuracy: 0.9428
Epoch 00004: val_loss improved from 0.22415 to 0.19176, saving model to mnist_keras_model.h5
Epoch 5/50
150/150 [==============================] - 6s 39ms/step - loss: 0.1729 - accuracy: 0.9496 - val_loss: 0.1687 - val_accuracy: 0.9514
Epoch 00005: val_loss improved from 0.19176 to 0.16871, saving model to mnist_keras_model.h5
Epoch 6/50
150/150 [==============================] - 6s 40ms/step - loss: 0.1395 - accuracy: 0.9604 - val_loss: 0.1534 - val_accuracy: 0.9553
Epoch 00006: val_loss improved from 0.16871 to 0.15342, saving model to mnist_keras_model.h5
Epoch 7/50
150/150 [==============================] - 6s 39ms/step - loss: 0.1213 - accuracy: 0.9654 - val_loss: 0.1428 - val_accuracy: 0.9582
Epoch 00007: val_loss improved from 0.15342 to 0.14284, saving model to mnist_keras_model.h5
Epoch 8/50
150/150 [==============================] - 6s 39ms/step - loss: 0.1066 - accuracy: 0.9697 - val_loss: 0.1336 - val_accuracy: 0.9603
Epoch 00008: val_loss improved from 0.14284 to 0.13360, saving model to mnist_keras_model.h5
Epoch 9/50
150/150 [==============================] - 6s 39ms/step - loss: 0.0996 - accuracy: 0.9722 - val_loss: 0.1218 - val_accuracy: 0.9641
Epoch 00009: val_loss improved from 0.13360 to 0.12185, saving model to mnist_keras_model.h5
Epoch 10/50
150/150 [==============================] - 7s 44ms/step - loss: 0.0940 - accuracy: 0.9742 - val_loss: 0.1258 - val_accuracy: 0.9618
Epoch 00010: val_loss did not improve from 0.12185
Epoch 11/50
150/150 [==============================] - 6s 41ms/step - loss: 0.0828 - accuracy: 0.9754 - val_loss: 0.1106 - val_accuracy: 0.9667
Epoch 00011: val_loss improved from 0.12185 to 0.11065, saving model to mnist_keras_model.h5
Epoch 12/50
150/150 [==============================] - 6s 40ms/step - loss: 0.0765 - accuracy: 0.9782 - val_loss: 0.1087 - val_accuracy: 0.9673
Epoch 00012: val_loss improved from 0.11065 to 0.10867, saving model to mnist_keras_model.h5
Epoch 13/50
150/150 [==============================] - 6s 40ms/step - loss: 0.0719 - accuracy: 0.9796 - val_loss: 0.1098 - val_accuracy: 0.9668
Epoch 00013: val_loss did not improve from 0.10867
Epoch 14/50
150/150 [==============================] - 6s 42ms/step - loss: 0.0664 - accuracy: 0.9812 - val_loss: 0.0996 - val_accuracy: 0.9705
Epoch 00014: val_loss improved from 0.10867 to 0.09956, saving model to mnist_keras_model.h5
Epoch 15/50
150/150 [==============================] - 5s 36ms/step - loss: 0.0604 - accuracy: 0.9818 - val_loss: 0.0972 - val_accuracy: 0.9705
Epoch 00015: val_loss improved from 0.09956 to 0.09724, saving model to mnist_keras_model.h5
Epoch 16/50
150/150 [==============================] - 5s 36ms/step - loss: 0.0589 - accuracy: 0.9834 - val_loss: 0.0996 - val_accuracy: 0.9704
Epoch 00016: val_loss did not improve from 0.09724
Epoch 17/50
150/150 [==============================] - 5s 35ms/step - loss: 0.0567 - accuracy: 0.9849 - val_loss: 0.0961 - val_accuracy: 0.9713
Epoch 00017: val_loss improved from 0.09724 to 0.09610, saving model to mnist_keras_model.h5
Epoch 18/50
150/150 [==============================] - 5s 36ms/step - loss: 0.0493 - accuracy: 0.9858 - val_loss: 0.0961 - val_accuracy: 0.9713
Epoch 00018: val_loss did not improve from 0.09610
Epoch 19/50
150/150 [==============================] - 5s 36ms/step - loss: 0.0466 - accuracy: 0.9868 - val_loss: 0.0943 - val_accuracy: 0.9722
Epoch 00019: val_loss improved from 0.09610 to 0.09427, saving model to mnist_keras_model.h5
Epoch 20/50
150/150 [==============================] - 6s 40ms/step - loss: 0.0453 - accuracy: 0.9866 - val_loss: 0.0931 - val_accuracy: 0.9722
Epoch 00020: val_loss improved from 0.09427 to 0.09314, saving model to mnist_keras_model.h5
Epoch 21/50
150/150 [==============================] - 6s 40ms/step - loss: 0.0437 - accuracy: 0.9879 - val_loss: 0.0903 - val_accuracy: 0.9734
Epoch 00021: val_loss improved from 0.09314 to 0.09031, saving model to mnist_keras_model.h5
Epoch 22/50
150/150 [==============================] - 6s 38ms/step - loss: 0.0398 - accuracy: 0.9885 - val_loss: 0.0878 - val_accuracy: 0.9744
Epoch 00022: val_loss improved from 0.09031 to 0.08782, saving model to mnist_keras_model.h5
Epoch 23/50
150/150 [==============================] - 6s 38ms/step - loss: 0.0379 - accuracy: 0.9904 - val_loss: 0.0909 - val_accuracy: 0.9739
Epoch 00023: val_loss did not improve from 0.08782
Epoch 24/50
150/150 [==============================] - 6s 41ms/step - loss: 0.0369 - accuracy: 0.9897 - val_loss: 0.0899 - val_accuracy: 0.9733
Epoch 00024: val_loss did not improve from 0.08782
# hint: use
# history.history["loss"]
# history.history["val_loss"]
# history.history["accuracy"]
# history.history["val_accuracy"]
epochs = range(1, len(history.history["loss"])+1)
plt.figure(figsize=(12,5))
plt.subplot(1, 2, 1)
plt.plot(epochs, history.history["loss"], label="Training loss")
plt.plot(epochs, history.history["val_loss"], label="Validation loss")
plt.legend(fontsize=15)
plt.xlabel("Epochs", fontsize=15)
plt.ylabel("Loss", fontsize=15)
plt.subplot(1, 2, 2)
plt.plot(epochs, history.history["accuracy"], label="Training accuracy")
plt.plot(epochs, history.history["val_accuracy"], label="Validation accuracy")
plt.legend(fontsize=15)
plt.xlabel("Epochs", fontsize=15)
plt.ylabel("Accuracy", fontsize=15)
Predictions on unseen data are made using the model.predict(inputs)
call. Below, a basic test of the model is performed by calculating the accuracy on the test dataset.
# Get predictions on test dataset
y_pred = model.predict(x_test)
# Compare predictions with ground truth
test_accuracy = np.sum(
    np.argmax(y_test, axis=1) == np.argmax(y_pred, axis=1)) / float(x_test.shape[0])
print("Test accuracy: {}".format(test_accuracy))
Test accuracy: 0.9758
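Finally, since the ModelCheckpoint callback wrote the best model to mnist_keras_model.h5, inference can also be run from the saved file, for example on one of the PNG examples stored to disk earlier. This is a sketch that assumes the files produced by the cells above are present in the working directory:

```python
import numpy as np
import png
from keras.models import load_model

# Load the best checkpoint written by the ModelCheckpoint callback
model = load_model("mnist_keras_model.h5")

# Read one of the greyscale PNGs written earlier
width, height, rows, info = png.Reader(filename="mnist_example_1.png").read()
image = np.vstack([np.array(row, dtype="uint8") for row in rows])

# Apply the same preprocessing as for the training data:
# add batch and channel axes, and scale the pixel values to [0, 1]
image = image.reshape(1, 28, 28, 1).astype("float32") / 255

prediction = model.predict(image)
print("Predicted digit:", np.argmax(prediction, axis=1)[0])
```

The preprocessing at inference time must match the training-time preprocessing exactly; otherwise the model sees inputs from a different distribution than it was trained on.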