In this notebook we will cover evaluating a trained model, recognizing overfitting, and two ways of improving generalization: adding dropout and making the model more complex.
One of the most important parts of the data science workflow is evaluating the performance of a trained model and deciding whether it is overfitting, whether more training will help, and whether the model architecture itself needs to change.
Let's load up Keras and train an overly simple model on the CIFAR10 data.
import numpy as np
import warnings
warnings.filterwarnings('ignore') # Hide np.floating deprecation warning
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
# Prevent TensorFlow from grabbing all the GPU memory
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
import holoviews as hv
hv.extension('bokeh')
Using TensorFlow backend.
Same data preparation as before.
(Pro tip: If this wasn't a tutorial, we'd move this repetitive code to a Python module and import it in the notebook to ensure we do the preparation consistently in every experiment.)
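For example, a small helper module (the file name cifar10_prep.py and the function load_prepared_cifar10 are just illustrative) might look like this:
# cifar10_prep.py -- hypothetical helper module for consistent data preparation
import keras.utils
from keras.datasets import cifar10

NUM_CLASSES = 10

def load_prepared_cifar10():
    # Load CIFAR10, scale pixels to [0, 1], and one-hot encode the labels
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255
    y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
    y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)
    return (x_train, y_train), (x_test, y_test)
Each notebook could then call cifar10_prep.load_prepared_cifar10() and be guaranteed identical preprocessing.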
from keras.datasets import cifar10
import keras.utils
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Save an unmodified copy of y_test for later, flattened to one column
y_test_true = y_test[:,0].copy()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# The data only has numeric categories so we also have the string labels below
cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'])
This model resembles the one from the previous notebook, but we've removed one of the convolutional groups.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=x_train.shape[1:]))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
history = model.fit(x_train, y_train,
batch_size=128,
epochs=8,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/8
50000/50000 [==============================] - 5s 108us/step - loss: 1.6994 - acc: 0.3934 - val_loss: 1.4523 - val_acc: 0.4991
Epoch 2/8
50000/50000 [==============================] - 3s 68us/step - loss: 1.2048 - acc: 0.5760 - val_loss: 1.1240 - val_acc: 0.6010
Epoch 3/8
50000/50000 [==============================] - 3s 68us/step - loss: 0.9816 - acc: 0.6570 - val_loss: 0.9996 - val_acc: 0.6534
Epoch 4/8
50000/50000 [==============================] - 4s 73us/step - loss: 0.8252 - acc: 0.7130 - val_loss: 0.9990 - val_acc: 0.6519
Epoch 5/8
50000/50000 [==============================] - 4s 70us/step - loss: 0.6964 - acc: 0.7602 - val_loss: 0.9727 - val_acc: 0.6703
Epoch 6/8
50000/50000 [==============================] - 4s 71us/step - loss: 0.5696 - acc: 0.8022 - val_loss: 0.9478 - val_acc: 0.6905
Epoch 7/8
50000/50000 [==============================] - 3s 61us/step - loss: 0.4460 - acc: 0.8474 - val_loss: 0.9793 - val_acc: 0.6868
Epoch 8/8
50000/50000 [==============================] - 3s 64us/step - loss: 0.3296 - acc: 0.8901 - val_loss: 1.0692 - val_acc: 0.6869
train_acc = hv.Curve((history.epoch, history.history['acc']), 'epoch', 'accuracy', label='training')
val_acc = hv.Curve((history.epoch, history.history['val_acc']), 'epoch', 'accuracy', label='validation')
layout = (train_acc * val_acc).redim(accuracy=dict(range=(0.4, 1.1)))
layout.opts(
hv.opts.Curve(width=400, height=300, line_width=3),
hv.opts.Overlay(legend_position='top_left')
)
This model shows a huge discrepancy in accuracy between the training and validation data, a classic sign of overfitting. After the first few epochs, additional training barely improves validation accuracy while training accuracy keeps climbing: the model is increasingly memorizing the training data rather than learning to generalize better.
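One quick numeric check is to compare the final training and validation accuracies stored in the history object (a minimal sketch; this version of Keras records them under the keys 'acc' and 'val_acc'):
final_train_acc = history.history['acc'][-1]
final_val_acc = history.history['val_acc'][-1]
# A large positive gap is the numeric signature of overfitting
print('train: %.3f  validation: %.3f  gap: %.3f'
      % (final_train_acc, final_val_acc, final_train_acc - final_val_acc))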
When dealing with models that predict categories, it is helpful to look at the confusion matrix as well. This will show which categories are being predicted poorly, and what kind of mispredictions are happening.
As the confusion matrix is a standard tool throughout machine learning, the sklearn package provides a confusion_matrix function that computes it from an array of true category IDs and an array of predicted category IDs:
from sklearn.metrics import confusion_matrix
y_pred = model.predict_classes(x_test)
confuse = confusion_matrix(y_test_true, y_pred)
# Holoviews hack to tilt labels by 45 degrees
from math import pi
def angle_label(plot, element):
plot.state.xaxis.major_label_orientation = pi / 4
layout = hv.HeatMap((cifar10_labels, cifar10_labels, confuse)).redim.label(x='true', y='predict')
layout.opts(
hv.opts.HeatMap(width=500, height=400, tools=['hover'], finalize_hooks=[angle_label]),
)
From this we can see that dogs, deer, cats, and birds are particularly problematic classes, with the confusion between cats and dogs being especially high. Note that because the test data are already balanced to have an equal number of examples from each class, we do not need to do any special normalization of the matrix above.
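If the classes were not balanced, we would want to normalize each row by the number of true examples of that class before comparing cells; a minimal sketch of that normalization:
# Each row of the sklearn confusion matrix corresponds to a true class,
# so dividing by the row sums turns counts into per-class fractions
confuse_normalized = confuse.astype('float32') / confuse.sum(axis=1, keepdims=True)
The diagonal of the normalized matrix is then the recall for each class.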
Overfitting is more or less inevitable if we train long enough; the goal is to control it with tools like regularization or dropout. Dropout is a surprisingly effective technique in which a layer passes its inputs straight through to its output, except that a random subset of the outputs is forced to zero during training. The subset of zeroed outputs changes after every batch. When the model is used for prediction after training, the dropout layers have no effect.
For more details about dropout, see this paper.
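A minimal numpy sketch of what a dropout layer does during training (Keras uses this inverted form, scaling the surviving values by 1 / (1 - rate) so that no rescaling is needed at prediction time):
import numpy as np

def dropout_train(x, rate=0.5):
    # Zero a random subset of the inputs and scale up the survivors
    # so the expected value of each output matches its input
    mask = np.random.rand(*x.shape) >= rate
    return x * mask / (1.0 - rate)

print(dropout_train(np.ones((2, 4), dtype='float32'), rate=0.5))
# roughly half the entries are 0, the rest are 2.0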
model2 = Sequential()
model2.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=x_train.shape[1:]))
model2.add(Conv2D(64, (3, 3), activation='relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(Dropout(0.25))
model2.add(Flatten())
model2.add(Dense(128, activation='relu'))
model2.add(Dropout(0.5))
model2.add(Dense(num_classes, activation='softmax'))
model2.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
history2 = model2.fit(x_train, y_train,
batch_size=128,
epochs=11,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/11
50000/50000 [==============================] - 4s 84us/step - loss: 1.8079 - acc: 0.3489 - val_loss: 1.3708 - val_acc: 0.5100
Epoch 2/11
50000/50000 [==============================] - 4s 76us/step - loss: 1.3452 - acc: 0.5216 - val_loss: 1.1793 - val_acc: 0.5932
Epoch 3/11
50000/50000 [==============================] - 4s 78us/step - loss: 1.1686 - acc: 0.5888 - val_loss: 1.0362 - val_acc: 0.6399
Epoch 4/11
50000/50000 [==============================] - 4s 79us/step - loss: 1.0519 - acc: 0.6288 - val_loss: 0.9938 - val_acc: 0.6491
Epoch 5/11
50000/50000 [==============================] - 4s 78us/step - loss: 0.9733 - acc: 0.6584 - val_loss: 0.9187 - val_acc: 0.6788
Epoch 6/11
50000/50000 [==============================] - 4s 78us/step - loss: 0.9074 - acc: 0.6811 - val_loss: 0.9427 - val_acc: 0.6703
Epoch 7/11
50000/50000 [==============================] - 4s 78us/step - loss: 0.8460 - acc: 0.7048 - val_loss: 0.9151 - val_acc: 0.6846
Epoch 8/11
50000/50000 [==============================] - 4s 78us/step - loss: 0.8058 - acc: 0.7191 - val_loss: 0.8623 - val_acc: 0.7039
Epoch 9/11
50000/50000 [==============================] - 4s 76us/step - loss: 0.7526 - acc: 0.7386 - val_loss: 0.8827 - val_acc: 0.6947
Epoch 10/11
50000/50000 [==============================] - 4s 78us/step - loss: 0.7095 - acc: 0.7534 - val_loss: 0.8571 - val_acc: 0.7064
Epoch 11/11
50000/50000 [==============================] - 4s 77us/step - loss: 0.6744 - acc: 0.7620 - val_loss: 0.8625 - val_acc: 0.7066
train_acc = hv.Curve((history.epoch, history.history['acc']), 'epoch', 'accuracy', label='training without dropout')
val_acc = hv.Curve((history.epoch, history.history['val_acc']), 'epoch', 'accuracy', label='validation without dropout')
train_acc2 = hv.Curve((history2.epoch, history2.history['acc']), 'epoch', 'accuracy', label='training with dropout')
val_acc2 = hv.Curve((history2.epoch, history2.history['val_acc']), 'epoch', 'accuracy', label='validation with dropout')
layout = (train_acc * val_acc * train_acc2 * val_acc2).redim(accuracy=dict(range=(0.4, 1.1)))
layout.opts(
hv.opts.Curve(width=600, height=450, line_width=3),
hv.opts.Overlay(legend_position='top_left')
)
Here we can see some common features of a model with dropout: training accuracy climbs more slowly and stays well below its no-dropout counterpart, the gap between training and validation accuracy is much smaller, and validation accuracy keeps improving for more epochs before flattening out.
Unfortunately, the amount of improvement in this case is still not enough to increase accuracy by more than a few percent. It looks like we need a more complex model.
To increase the sophistication of this model, we're going to employ a few strategies: add a second group of convolutional layers, use 'same' padding on the first convolution of each group, widen the dense layer from 128 to 512 units, keep the dropout layers, and train for more epochs.
Unfortunately, deciding how much capacity to add is the hardest thing to figure out in practice. Sometimes we need more layers, sometimes we need bigger layers, and sometimes we need a different model entirely. Looking at what others have done is your best guide here until you develop some intuition of your own.
model3 = Sequential()
model3.add(Conv2D(32, kernel_size=(3, 3), padding='same',
activation='relu',
input_shape=x_train.shape[1:]))
model3.add(Conv2D(32, (3, 3), activation='relu'))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Dropout(0.25))
# Second layer of convolutions
model3.add(Conv2D(64, kernel_size=(3, 3), padding='same',
activation='relu'))
model3.add(Conv2D(64, (3, 3), activation='relu'))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Dropout(0.25))
model3.add(Flatten())
model3.add(Dense(512, activation='relu'))
model3.add(Dropout(0.5))
model3.add(Dense(num_classes, activation='softmax'))
model3.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
history3 = model3.fit(x_train, y_train,
batch_size=128,
epochs=15,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/15
50000/50000 [==============================] - 6s 112us/step - loss: 1.9037 - acc: 0.3054 - val_loss: 1.5701 - val_acc: 0.4516
Epoch 2/15
50000/50000 [==============================] - 5s 96us/step - loss: 1.4582 - acc: 0.4747 - val_loss: 1.2489 - val_acc: 0.5503
Epoch 3/15
50000/50000 [==============================] - 4s 89us/step - loss: 1.2333 - acc: 0.5642 - val_loss: 1.1798 - val_acc: 0.5882
Epoch 4/15
50000/50000 [==============================] - 4s 77us/step - loss: 1.0760 - acc: 0.6204 - val_loss: 0.9443 - val_acc: 0.6698
Epoch 5/15
50000/50000 [==============================] - 4s 79us/step - loss: 0.9653 - acc: 0.6623 - val_loss: 0.8570 - val_acc: 0.7048
Epoch 6/15
50000/50000 [==============================] - 4s 82us/step - loss: 0.8784 - acc: 0.6931 - val_loss: 0.8052 - val_acc: 0.7197
Epoch 7/15
50000/50000 [==============================] - 5s 90us/step - loss: 0.8105 - acc: 0.7185 - val_loss: 0.7943 - val_acc: 0.7279
Epoch 8/15
50000/50000 [==============================] - 5s 95us/step - loss: 0.7572 - acc: 0.7338 - val_loss: 0.7289 - val_acc: 0.7471
Epoch 9/15
50000/50000 [==============================] - 5s 96us/step - loss: 0.7092 - acc: 0.7520 - val_loss: 0.7364 - val_acc: 0.7473
Epoch 10/15
50000/50000 [==============================] - 5s 93us/step - loss: 0.6703 - acc: 0.7678 - val_loss: 0.7546 - val_acc: 0.7431
Epoch 11/15
50000/50000 [==============================] - 5s 94us/step - loss: 0.6342 - acc: 0.7786 - val_loss: 0.6812 - val_acc: 0.7698
Epoch 12/15
50000/50000 [==============================] - 5s 91us/step - loss: 0.5992 - acc: 0.7913 - val_loss: 0.7275 - val_acc: 0.7538
Epoch 13/15
50000/50000 [==============================] - 4s 88us/step - loss: 0.5727 - acc: 0.7999 - val_loss: 0.6538 - val_acc: 0.7828
Epoch 14/15
50000/50000 [==============================] - 5s 95us/step - loss: 0.5503 - acc: 0.8081 - val_loss: 0.6433 - val_acc: 0.7806
Epoch 15/15
50000/50000 [==============================] - 4s 88us/step - loss: 0.5208 - acc: 0.8164 - val_loss: 0.6313 - val_acc: 0.7852
val_acc2 = hv.Curve((history2.epoch, history2.history['val_acc']), 'epoch', 'accuracy', label='validation (simple model)')
train_acc3 = hv.Curve((history3.epoch, history3.history['acc']), 'epoch', 'accuracy', label='training (complex model)')
val_acc3 = hv.Curve((history3.epoch, history3.history['val_acc']), 'epoch', 'accuracy', label='validation (complex model)')
layout = (val_acc2 * val_acc3 * train_acc3).redim(accuracy=dict(range=(0.4, 1.1)))
layout.opts(
hv.opts.Curve(width=600, height=500, line_width=3),
hv.opts.Overlay(legend_position='top_left')
)
If you screw everything up, you can use File / Revert to Checkpoint to go back to the first version of the notebook and restart the Jupyter kernel with Kernel / Restart.
model3.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_5 (Conv2D)            (None, 32, 32, 32)        896
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 30, 30, 32)        9248
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 15, 15, 32)        0
_________________________________________________________________
dropout_3 (Dropout)          (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 15, 15, 64)        18496
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 13, 13, 64)        36928
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 6, 6, 64)          0
_________________________________________________________________
dropout_4 (Dropout)          (None, 6, 6, 64)          0
_________________________________________________________________
flatten_3 (Flatten)          (None, 2304)              0
_________________________________________________________________
dense_5 (Dense)              (None, 512)               1180160
_________________________________________________________________
dropout_5 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_6 (Dense)              (None, 10)                5130
=================================================================
Total params: 1,250,858
Trainable params: 1,250,858
Non-trainable params: 0
_________________________________________________________________
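As a sanity check, the parameter counts follow directly from the layer shapes: a Conv2D layer has (kernel height x kernel width x input channels + 1 bias) weights per filter, and a Dense layer has (inputs + 1 bias) weights per unit. Reproducing a few of the numbers above:
print((3 * 3 * 3 + 1) * 32)    # conv2d_5: 896
print((3 * 3 * 32 + 1) * 32)   # conv2d_6: 9248
print((3 * 3 * 32 + 1) * 64)   # conv2d_7: 18496
print((2304 + 1) * 512)        # dense_5: 1180160
print((512 + 1) * 10)          # dense_6: 5130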