- 🤖 See full list of Machine Learning Experiments on GitHub
- ▶️ Interactive Demo: try this model and other machine learning experiments in action
In this experiment we will build a Convolutional Neural Network (CNN) model using TensorFlow to recognize handwritten digits.
A convolutional neural network (CNN, or ConvNet) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from another.
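To make "learnable weights" a bit more concrete, here is a minimal sketch of what a single convolution filter computes, written in plain Python rather than TensorFlow. The function name `convolve2d_valid`, the toy image and the kernel values are illustrative assumptions; in a real CNN the kernel values are learned during training. Like Keras, the sketch slides the kernel without flipping it (strictly speaking a cross-correlation) and uses no padding, so an N×N input and a K×K kernel give an (N-K+1)×(N-K+1) output.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    image_size = len(image)
    kernel_size = len(kernel)
    output_size = image_size - kernel_size + 1
    output = []
    for row in range(output_size):
        output_row = []
        for col in range(output_size):
            total = 0
            for k_row in range(kernel_size):
                for k_col in range(kernel_size):
                    total += image[row + k_row][col + k_col] * kernel[k_row][k_col]
            output_row.append(total)
        output.append(output_row)
    return output


# Hypothetical 4x4 "image" with a vertical edge between columns 1 and 2.
toy_image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hypothetical 2x2 kernel that responds to left-to-right brightness increases.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d_valid(toy_image, edge_kernel)
for feature_row in feature_map:
    print(feature_row)
```

The resulting 3×3 feature map is non-zero only in the column where the edge sits. By the same arithmetic, a 28×28 input and a 5×5 kernel shrink to 24×24.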
# Selecting Tensorflow version v2 (the command is relevant for Colab only).
%tensorflow_version 2.x
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sn
import numpy as np
import pandas as pd
import math
import datetime
import platform
print('Python version:', platform.python_version())
print('Tensorflow version:', tf.__version__)
print('Keras version:', tf.keras.__version__)
Python version: 3.7.6
Tensorflow version: 2.1.0
Keras version: 2.2.4-tf
We will use TensorBoard to debug the model later.
# Load the TensorBoard notebook extension.
# %reload_ext tensorboard
%load_ext tensorboard
# Clear any logs from previous runs.
!rm -rf ./.logs/
The training dataset consists of 60000 28x28px images of hand-written digits from 0 to 9.
The test dataset consists of 10000 28x28px images.
mnist_dataset = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist_dataset.load_data()
print('x_train:', x_train.shape)
print('y_train:', y_train.shape)
print('x_test:', x_test.shape)
print('y_test:', y_test.shape)
x_train: (60000, 28, 28)
y_train: (60000,)
x_test: (10000, 28, 28)
y_test: (10000,)
# Save image parameters to the constants that we will use later for data re-shaping and for model training.
# Note: the array axes are (samples, rows, columns). MNIST images are square, so the width/height order does not matter here.
(_, IMAGE_WIDTH, IMAGE_HEIGHT) = x_train.shape
IMAGE_CHANNELS = 1
print('IMAGE_WIDTH:', IMAGE_WIDTH)
print('IMAGE_HEIGHT:', IMAGE_HEIGHT)
print('IMAGE_CHANNELS:', IMAGE_CHANNELS)
IMAGE_WIDTH: 28
IMAGE_HEIGHT: 28
IMAGE_CHANNELS: 1
Here is what each image in the dataset looks like. It is a 28x28 matrix of integers (from 0 to 255). Each integer represents the grayscale intensity of a pixel.
pd.DataFrame(x_train[0])
row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 175 | 26 | 166 | 255 | 247 | 127 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 36 | ... | 225 | 172 | 253 | 242 | 195 | 64 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 238 | 253 | ... | 93 | 82 | 82 | 56 | 39 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 219 | 253 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 156 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 150 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 253 | 187 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 253 | 249 | 64 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 253 | 207 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 250 | 182 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 66 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
22 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 171 | 219 | 253 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23 | 0 | 0 | 0 | 0 | 55 | 172 | 226 | 253 | 253 | 253 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
24 | 0 | 0 | 0 | 0 | 136 | 253 | 253 | 253 | 212 | 135 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
28 rows × 28 columns
This matrix of numbers may be drawn as follows:
plt.imshow(x_train[0], cmap=plt.cm.binary)
plt.show()
Let's print some more training examples to get a feel for how the digits were written.
numbers_to_display = 25
num_cells = math.ceil(math.sqrt(numbers_to_display))
plt.figure(figsize=(10,10))
for i in range(numbers_to_display):
    plt.subplot(num_cells, num_cells, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(y_train[i])
plt.show()
In order to use convolution layers we need to reshape our data and add a color channel to it. As you may have noticed, every digit currently has a shape of (28, 28), which means that it is a 28x28 matrix of color values from 0 to 255. We need to reshape it to a (28, 28, 1) shape so that each pixel can potentially have multiple channels (like Red, Green and Blue).
x_train_with_chanels = x_train.reshape(
x_train.shape[0],
IMAGE_WIDTH,
IMAGE_HEIGHT,
IMAGE_CHANNELS
)
x_test_with_chanels = x_test.reshape(
x_test.shape[0],
IMAGE_WIDTH,
IMAGE_HEIGHT,
IMAGE_CHANNELS
)
print('x_train_with_chanels:', x_train_with_chanels.shape)
print('x_test_with_chanels:', x_test_with_chanels.shape)
x_train_with_chanels: (60000, 28, 28, 1)
x_test_with_chanels: (10000, 28, 28, 1)
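As a small illustration of the reshaping step, the sketch below uses a made-up batch of two 3×3 "images" (an assumption for brevity, not MNIST data). It shows that `reshape` and `np.expand_dims` add the same trailing channel axis without changing any pixel values.

```python
import numpy as np

# Hypothetical batch of two 3x3 grayscale images (values chosen arbitrarily).
toy_batch = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

# Option 1: explicit reshape, as done for x_train and x_test above.
with_channel_reshape = toy_batch.reshape(toy_batch.shape[0], 3, 3, 1)

# Option 2: append a new axis of length 1 at the end.
with_channel_expand = np.expand_dims(toy_batch, axis=-1)

print(toy_batch.shape)             # (2, 3, 3)
print(with_channel_reshape.shape)  # (2, 3, 3, 1)

# Both approaches give identical arrays, and pixel values are untouched.
print(np.array_equal(with_channel_reshape, with_channel_expand))  # True
print(np.array_equal(with_channel_reshape[..., 0], toy_batch))    # True
```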
Here we rescale the pixel values from the range [0...255] to the range [0...1].
x_train_normalized = x_train_with_chanels / 255
x_test_normalized = x_test_with_chanels / 255
# Let's check just one row from the 0th image to see color channel values after normalization.
x_train_normalized[0][18]
array([[0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0.18039216], [0.50980392], [0.71764706], [0.99215686], [0.99215686], [0.81176471], [0.00784314], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ], [0. ]])
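One detail worth knowing about this step: MNIST pixels are stored as `uint8`, and dividing by `255` produces floating-point values. The sketch below uses a hypothetical three-pixel row to show the resulting range and dtype. The cast to `float32` is an optional extra shown for illustration, not something this notebook does.

```python
import numpy as np

# Hypothetical row of raw pixel intensities, stored like MNIST as uint8.
raw_pixels = np.array([0, 46, 255], dtype=np.uint8)

# True division promotes uint8 to float64 and maps [0, 255] onto [0, 1].
normalized = raw_pixels / 255
print(normalized.dtype)                    # float64
print(normalized.min(), normalized.max())  # 0.0 1.0

# Optional: float32 halves the memory footprint and matches Keras' default float type.
normalized_32 = normalized.astype(np.float32)
print(normalized_32.dtype)                 # float32
```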
We will use a Sequential Keras model.
It will have two pairs of Convolution2D and MaxPooling2D layers. The MaxPooling layer acts as a sort of downsampling that keeps the maximum value in each region instead of averaging.
After that we will use a Flatten layer to convert the multidimensional feature maps to a vector, followed by a Dense layer with 128 units and a Dropout layer that randomly zeroes 20% of its inputs during training to reduce overfitting.
The last layer will be a Dense layer with 10 Softmax outputs. The output represents the network's guess: the 0-th output is the probability that the input digit is 0, the 1-st output is the probability that the input digit is 1, and so on.
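To illustrate the difference between max pooling and averaging, here is a plain-Python sketch of 2×2 pooling with stride 2 on a made-up 4×4 feature map. The helper names `pool2x2` and `mean` and the input values are hypothetical; Keras' `MaxPooling2D` does the equivalent for every channel of a tensor.

```python
def pool2x2(feature_map, reducer):
    """Apply `reducer` to each non-overlapping 2x2 block (pool size 2, stride 2)."""
    pooled = []
    for row in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for col in range(0, len(feature_map[0]) - 1, 2):
            block = [
                feature_map[row][col], feature_map[row][col + 1],
                feature_map[row + 1][col], feature_map[row + 1][col + 1],
            ]
            pooled_row.append(reducer(block))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    """Arithmetic mean, used here to mimic average pooling."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
toy_feature_map = [
    [1, 3, 0, 0],
    [5, 7, 0, 8],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]

max_pooled = pool2x2(toy_feature_map, max)
avg_pooled = pool2x2(toy_feature_map, mean)

print(max_pooled)  # [[7, 8], [2, 4]]
print(avg_pooled)  # [[4.0, 2.0], [2.0, 4.0]]
```

Max pooling keeps the strongest activation in each block, while averaging dilutes it: the lone 8 in the top-right block survives max pooling but becomes 2.0 when averaged.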
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Convolution2D(
input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS),
kernel_size=5,
filters=8,
strides=1,
activation=tf.keras.activations.relu,
kernel_initializer=tf.keras.initializers.VarianceScaling()
))
model.add(tf.keras.layers.MaxPooling2D(
pool_size=(2, 2),
strides=(2, 2)
))
model.add(tf.keras.layers.Convolution2D(
kernel_size=5,
filters=16,
strides=1,
activation=tf.keras.activations.relu,
kernel_initializer=tf.keras.initializers.VarianceScaling()
))
model.add(tf.keras.layers.MaxPooling2D(
pool_size=(2, 2),
strides=(2, 2)
))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(
units=128,
activation=tf.keras.activations.relu
))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(
units=10,
activation=tf.keras.activations.softmax,
kernel_initializer=tf.keras.initializers.VarianceScaling()
))
Here is our model summary so far.
model.summary()
Model: "sequential"
Layer (type) | Output Shape | Param # |
---|---|---|
conv2d (Conv2D) | (None, 24, 24, 8) | 208 |
max_pooling2d (MaxPooling2D) | (None, 12, 12, 8) | 0 |
conv2d_1 (Conv2D) | (None, 8, 8, 16) | 3216 |
max_pooling2d_1 (MaxPooling2 | (None, 4, 4, 16) | 0 |
flatten (Flatten) | (None, 256) | 0 |
dense (Dense) | (None, 128) | 32896 |
dropout (Dropout) | (None, 128) | 0 |
dense_1 (Dense) | (None, 10) | 1290 |
Total params: 37,610
Trainable params: 37,610
Non-trainable params: 0
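The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below is plain Python with hypothetical helper names (`conv_output_size`, `conv_params`, `dense_params`). It assumes no padding, which matches the shrinking shapes in the summary, and uses the layer settings from the model definition above.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Spatial size after a convolution or pooling window with no padding."""
    return (input_size - kernel_size) // stride + 1


def conv_params(kernel_size, in_channels, filters):
    """Weights (kernel_size^2 * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


size = 28
size = conv_output_size(size, kernel_size=5)            # conv2d: 24
size = conv_output_size(size, kernel_size=2, stride=2)  # max_pooling2d: 12
size = conv_output_size(size, kernel_size=5)            # conv2d_1: 8
size = conv_output_size(size, kernel_size=2, stride=2)  # max_pooling2d_1: 4
flattened = size * size * 16                            # flatten: 256

param_counts = {
    'conv2d': conv_params(5, in_channels=1, filters=8),
    'conv2d_1': conv_params(5, in_channels=8, filters=16),
    'dense': dense_params(flattened, 128),
    'dense_1': dense_params(128, 10),
}
total_params = sum(param_counts.values())

print(flattened)     # 256
print(param_counts)  # {'conv2d': 208, 'conv2d_1': 3216, 'dense': 32896, 'dense_1': 1290}
print(total_params)  # 37610
```

Pooling, flatten and dropout layers have no trainable parameters, which is why they show `0` in the summary.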
In order to plot the model, graphviz should be installed. On macOS it may be installed using brew: brew install graphviz.
tf.keras.utils.plot_model(
model,
show_shapes=True,
show_layer_names=True,
)
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(
optimizer=adam_optimizer,
loss=tf.keras.losses.sparse_categorical_crossentropy,
metrics=['accuracy']
)
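The `sparse_categorical_crossentropy` loss takes integer labels (like `y_train`) rather than one-hot vectors. For a single example it reduces to the negative log of the probability that the model assigned to the correct class. Here is a minimal plain-Python sketch with hypothetical probabilities; the helper name `sparse_cce` is made up, and real implementations also guard against `log(0)`.

```python
import math


def sparse_cce(true_label, probabilities):
    """Negative log-probability that the model assigns to the true class."""
    return -math.log(probabilities[true_label])


# Hypothetical softmax outputs for three classes (each list sums to 1).
confident_and_right = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]
confident_and_wrong = [0.90, 0.05, 0.05]

true_label = 1
losses = {}
for name, probabilities in [
    ('confident and right', confident_and_right),
    ('unsure', unsure),
    ('confident and wrong', confident_and_wrong),
]:
    losses[name] = sparse_cce(true_label, probabilities)
    print(f'{name}: {losses[name]:.3f}')
```

The loss is close to zero when the model is confident and correct, and grows quickly when it is confident and wrong. That growth is what pushes the optimizer to fix such predictions first.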
log_dir=".logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
training_history = model.fit(
x_train_normalized,
y_train,
epochs=10,
validation_data=(x_test_normalized, y_test),
callbacks=[tensorboard_callback]
)
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 30s 494us/sample - loss: 0.2223 - accuracy: 0.9326 - val_loss: 0.0550 - val_accuracy: 0.9830
Epoch 2/10
60000/60000 [==============================] - 32s 538us/sample - loss: 0.0720 - accuracy: 0.9779 - val_loss: 0.0394 - val_accuracy: 0.9871
Epoch 3/10
60000/60000 [==============================] - 29s 488us/sample - loss: 0.0519 - accuracy: 0.9834 - val_loss: 0.0362 - val_accuracy: 0.9890
Epoch 4/10
60000/60000 [==============================] - 30s 494us/sample - loss: 0.0422 - accuracy: 0.9867 - val_loss: 0.0315 - val_accuracy: 0.9892
Epoch 5/10
60000/60000 [==============================] - 29s 484us/sample - loss: 0.0349 - accuracy: 0.9893 - val_loss: 0.0291 - val_accuracy: 0.9891
Epoch 6/10
60000/60000 [==============================] - 30s 492us/sample - loss: 0.0287 - accuracy: 0.9908 - val_loss: 0.0335 - val_accuracy: 0.9901
Epoch 7/10
60000/60000 [==============================] - 31s 517us/sample - loss: 0.0257 - accuracy: 0.9917 - val_loss: 0.0337 - val_accuracy: 0.9902
Epoch 8/10
60000/60000 [==============================] - 32s 531us/sample - loss: 0.0233 - accuracy: 0.9923 - val_loss: 0.0244 - val_accuracy: 0.9908
Epoch 9/10
60000/60000 [==============================] - 31s 522us/sample - loss: 0.0211 - accuracy: 0.9934 - val_loss: 0.0363 - val_accuracy: 0.9893
Epoch 10/10
60000/60000 [==============================] - 34s 574us/sample - loss: 0.0192 - accuracy: 0.9941 - val_loss: 0.0286 - val_accuracy: 0.9916
Let's see how the loss function changed during training. We expect it to get smaller with every epoch.
plt.xlabel('Epoch Number')
plt.ylabel('Loss')
plt.plot(training_history.history['loss'], label='training set')
plt.plot(training_history.history['val_loss'], label='test set')
plt.legend()
plt.xlabel('Epoch Number')
plt.ylabel('Accuracy')
plt.plot(training_history.history['accuracy'], label='training set')
plt.plot(training_history.history['val_accuracy'], label='test set')
plt.legend()
We need to compare the accuracy of our model on the training set and on the test set. We expect our model to perform similarly on both sets. If the performance on the test set is poor compared to the training set, it is an indicator that the model is overfitted and we have a "high variance" issue.
%%capture
train_loss, train_accuracy = model.evaluate(x_train_normalized, y_train)
print('Training loss: ', train_loss)
print('Training accuracy: ', train_accuracy)
Training loss: 0.00793841718863229
Training accuracy: 0.9974
%%capture
validation_loss, validation_accuracy = model.evaluate(x_test_normalized, y_test)
print('Validation loss: ', validation_loss)
print('Validation accuracy: ', validation_accuracy)
Validation loss: 0.028636791965250815
Validation accuracy: 0.9916
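One simple way to put a number on this comparison is the gap between training and test accuracy. The sketch below plugs in the values reported above. The helper name `accuracy_gap` and the `0.02` threshold are arbitrary assumptions for illustration, not a standard rule.

```python
def accuracy_gap(train_accuracy, test_accuracy):
    """How much better the model does on the data it was trained on."""
    return train_accuracy - test_accuracy


# Values copied from the evaluation output above.
reported_train_accuracy = 0.9974
reported_test_accuracy = 0.9916

gap = accuracy_gap(reported_train_accuracy, reported_test_accuracy)

# Hypothetical threshold: treat a gap above 2 percentage points as a warning sign.
OVERFITTING_GAP_THRESHOLD = 0.02
looks_overfitted = gap > OVERFITTING_GAP_THRESHOLD

print(f'accuracy gap: {gap:.4f}')
print('possible overfitting:', looks_overfitted)
```

Here the gap is about 0.6 percentage points, so by this (arbitrary) criterion the model does not look badly overfitted.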
We will save the entire model to an HDF5 file. The .h5 extension of the file indicates that the model should be saved in Keras format as an HDF5 file. To use this model on the front-end we will convert it (later in this notebook) to a format that JavaScript can understand (tfjs_layers_model with .json and .bin files) using tensorflowjs_converter, as specified in the main README.
model_name = 'digits_recognition_cnn.h5'
model.save(model_name, save_format='h5')
loaded_model = tf.keras.models.load_model(model_name)
To use the model that we've just trained for digit recognition we need to call the predict() method.
predictions_one_hot = loaded_model.predict([x_test_normalized])
print('predictions_one_hot:', predictions_one_hot.shape)
predictions_one_hot: (10000, 10)
Each prediction consists of 10 probabilities (one for each digit from 0 to 9). We need to pick the digit with the highest probability, since this is the digit that our model is most confident about.
# Predictions in the form of probability vectors (one probability per digit, so not strictly one-hot despite the variable name).
pd.DataFrame(predictions_one_hot)
row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2.174955e-15 | 2.053761e-08 | 1.115043e-10 | 1.815501e-11 | 4.382045e-09 | 1.084139e-11 | 7.955132e-17 | 1.000000e+00 | 2.871653e-11 | 5.481966e-11 |
1 | 8.536813e-10 | 1.098458e-06 | 9.999989e-01 | 2.133420e-10 | 1.870166e-11 | 3.222684e-15 | 2.280592e-08 | 2.037868e-10 | 9.249878e-11 | 2.877182e-15 |
2 | 7.979921e-12 | 9.999999e-01 | 1.698135e-09 | 1.035064e-13 | 3.937335e-08 | 1.830633e-08 | 1.814277e-09 | 3.928236e-08 | 8.205532e-08 | 1.508753e-11 |
3 | 9.997569e-01 | 1.507927e-10 | 3.036238e-08 | 7.496398e-11 | 1.072911e-10 | 3.716224e-07 | 2.426345e-04 | 4.577839e-09 | 1.174887e-08 | 9.708089e-08 |
4 | 2.499753e-09 | 1.776901e-11 | 7.716882e-12 | 1.750144e-14 | 9.988386e-01 | 1.832140e-09 | 1.151763e-08 | 1.740052e-10 | 1.392591e-09 | 1.161428e-03 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9995 | 1.009779e-14 | 1.738570e-09 | 9.999994e-01 | 1.041259e-10 | 5.272622e-16 | 2.269359e-19 | 2.778265e-16 | 5.792643e-07 | 4.054746e-12 | 5.626435e-18 |
9996 | 2.263675e-11 | 6.903200e-08 | 3.381161e-08 | 9.999995e-01 | 8.395995e-13 | 3.699991e-07 | 2.905099e-14 | 5.455035e-10 | 4.476401e-10 | 1.262823e-09 |
9997 | 9.689668e-24 | 5.208208e-12 | 3.422634e-17 | 4.813990e-21 | 1.000000e+00 | 6.983922e-16 | 6.201705e-17 | 2.069550e-11 | 2.440485e-11 | 1.458596e-10 |
9998 | 1.358465e-08 | 9.598834e-11 | 2.364747e-11 | 5.815844e-09 | 2.983105e-12 | 9.995547e-01 | 2.545367e-05 | 1.387848e-11 | 4.197330e-04 | 1.053049e-08 |
9999 | 9.229651e-12 | 5.644470e-12 | 4.068150e-12 | 1.636682e-17 | 2.020714e-12 | 9.393594e-12 | 1.000000e+00 | 3.027714e-21 | 7.427696e-13 | 6.800262e-16 |
10000 rows × 10 columns
# Let's extract the predictions with the highest probabilities and see which digits were actually recognized.
predictions = np.argmax(predictions_one_hot, axis=1)
pd.DataFrame(predictions)
row | 0 |
---|---|
0 | 7 |
1 | 2 |
2 | 1 |
3 | 0 |
4 | 4 |
... | ... |
9995 | 2 |
9996 | 3 |
9997 | 4 |
9998 | 5 |
9999 | 6 |
10000 rows × 1 columns
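What `np.argmax` does for a single row can be sketched in plain Python. The probabilities below are copied from row 4 of the predictions table above; the helper name `most_likely_digit` is just an illustrative choice.

```python
def most_likely_digit(probabilities):
    """Return the index of the largest probability (ties go to the first one)."""
    return max(range(len(probabilities)), key=lambda digit: probabilities[digit])


# Row 4 of the predictions table above.
row_4 = [
    2.499753e-09, 1.776901e-11, 7.716882e-12, 1.750144e-14, 9.988386e-01,
    1.832140e-09, 1.151763e-08, 1.740052e-10, 1.392591e-09, 1.161428e-03,
]

predicted_digit = most_likely_digit(row_4)

# The second most likely digit hints at what the model might confuse it with.
runner_up = sorted(range(10), key=lambda digit: row_4[digit], reverse=True)[1]

print(predicted_digit)  # 4
print(runner_up)        # 9

# Softmax outputs sum to 1 (up to the rounding of the displayed values).
print(abs(sum(row_4) - 1.0) < 1e-6)  # True
```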
So our model is predicting that the first example from the test set is 7.
print(predictions[0])
7
Let's print the first image from the test set to see if the model's prediction is correct.
plt.imshow(x_test_normalized[0].reshape((IMAGE_WIDTH, IMAGE_HEIGHT)), cmap=plt.cm.binary)
plt.show()
We see that our model made a correct prediction and successfully recognized the digit 7. Let's print some more test examples and the corresponding predictions to see how the model performs and where it makes mistakes.
numbers_to_display = 196
num_cells = math.ceil(math.sqrt(numbers_to_display))
plt.figure(figsize=(15, 15))
for plot_index in range(numbers_to_display):
    predicted_label = predictions[plot_index]
    # Create the subplot first so that the tick and grid settings apply to it.
    plt.subplot(num_cells, num_cells, plot_index + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    color_map = 'Greens' if predicted_label == y_test[plot_index] else 'Reds'
    plt.imshow(x_test_normalized[plot_index].reshape((IMAGE_WIDTH, IMAGE_HEIGHT)), cmap=color_map)
    plt.xlabel(predicted_label)
plt.subplots_adjust(hspace=1, wspace=0.5)
plt.show()
A confusion matrix shows which digits the model recognizes well and which digits it tends to confuse with each other. You may see that the model performs really well, but sometimes (28 times out of 10000) it may confuse number 5 with 3 or number 2 with 3.
confusion_matrix = tf.math.confusion_matrix(y_test, predictions)
f, ax = plt.subplots(figsize=(9, 7))
sn.heatmap(
confusion_matrix,
annot=True,
linewidths=.5,
fmt="d",
square=True,
ax=ax
)
plt.show()
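To show how such a matrix is built, here is a plain-Python sketch on a tiny made-up 3-class example (the labels are hypothetical, not taken from the MNIST run). As with `tf.math.confusion_matrix`, rows correspond to true labels and columns to predicted labels.

```python
def build_confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts examples whose true label is t and predicted label is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(true_labels, predicted_labels):
        matrix[true_label][predicted_label] += 1
    return matrix


# Hypothetical labels for a 3-class problem.
toy_true = [0, 0, 1, 1, 2, 2, 2]
toy_predicted = [0, 0, 1, 2, 2, 2, 1]

toy_matrix = build_confusion_matrix(toy_true, toy_predicted, num_classes=3)
for matrix_row in toy_matrix:
    print(matrix_row)

# Correct predictions sit on the diagonal; everything else is a mistake.
correct = sum(toy_matrix[i][i] for i in range(3))
accuracy = correct / len(toy_true)
print(f'accuracy: {accuracy:.3f}')
```

Off-diagonal cells tell you which pairs get mixed up: in this toy example, class 1 was once predicted as 2 and class 2 was once predicted as 1.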
TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics like loss and accuracy, visualizing the model graph, projecting embeddings to a lower dimensional space, and much more.
%tensorboard --logdir .logs/fit
Reusing TensorBoard on port 6008 (pid 48630), started 1 day, 21:00:20 ago. (Use '!kill 48630' to kill it.)
To use this model on the web we need to convert it into a format that TensorFlow.js understands. To do so we may use tfjs-converter as follows:
tensorflowjs_converter --input_format keras \
./experiments/digits_recognition_cnn/digits_recognition_cnn.h5 \
./demos/public/models/digits_recognition_cnn
You can find this experiment in the Demo app and play around with it right in your browser to see how the model performs in real life.