In this notebook we will cover the following topics:
- Loading and preparing the CIFAR10 data for deep learning
- Defining a convolutional network with the Keras Sequential class
- Training the model with fit(), including callbacks such as early stopping
- Inspecting the training history and making predictions on the test set
import numpy as np
np.warnings.filterwarnings('ignore') # Hide np.floating warning
import holoviews as hv
hv.extension('bokeh')
import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)
Using the pattern we saw in the last notebook, we can load and transform the CIFAR10 data for deep learning.
from keras.datasets import cifar10
import keras.utils
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Save an unmodified copy of y_test for later, flattened to one column
y_test_true = y_test[:,0].copy()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# The data only has numeric categories so we also have the string labels below
cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'])
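The to_categorical call converts the integer labels into one-hot rows with num_classes columns. As a quick illustration (not part of the original pipeline), we can apply it to a couple of made-up labels:
# Label 3 ('cat') becomes a 1 in column 3, label 0 ('airplane') a 1 in column 0
print(keras.utils.to_categorical([3, 0], num_classes))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]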
The simplest way to define a deep learning model in Keras is using the Sequential class, which holds a stack of layers that are executed in sequence.
Keras has an extensive catalog of layers, making it very easy to recreate almost any network you find in the literature. The network we will use in this tutorial has the following kinds of layers:
- Conv2D: a 2D convolution layer, the standard building block for image networks
- MaxPooling2D: applies the max() function in 2 dimensions, also useful in image networks
- Flatten: reshapes a multidimensional input into a 1D vector
- Dense: a fully connected layer of neurons

Keras also has a large list of supported activation functions. For all of these examples, we will use the relu function as it has good performance.
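As a quick reminder (purely illustrative, not part of the notebook's workflow), relu simply replaces negative values with zero and passes positive values through unchanged:
# relu(x) = max(x, 0)
print(np.maximum([-2.0, -0.5, 0.0, 1.5], 0))  # -> [0. 0. 0. 1.5]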
We begin by importing the necessary classes:
from keras import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
Creating a Keras model has the following structure: we instantiate a Sequential object, add layers to it one at a time, and then compile the model with a loss function, an optimizer, and any metrics we want to track.
The choice of loss function depends on the kind of model we are training. Since we are doing categorization with more than two categories, categorical_crossentropy
is preferred.
The choice of optimizer is less straightforward. We're using Adadelta
because it is self-tuning and works pretty well on this problem.
Metrics are functions that score your model, but are not used to optimize it. The most common metric is accuracy, so we include it here.
model = Sequential()
### Convolution and max pool layers
# Group 1: Convolution
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=x_train.shape[1:]))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Group 2: Convolution
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Group 3: Dense layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
We can inspect various properties of the model, such as the number of free parameters:
model.summary()
Here we see that the majority of free parameters are introduced at the point where we switch from the convolutional layers to the dense layers. If we want to reduce the size of this model, we will either need to reduce the size of the dense layer or reduce the number of convolution kernels.
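We can sanity-check that with a little arithmetic (a rough back-of-the-envelope calculation, assuming the default 'valid' padding and stride of 1 used above):
# Each 3x3 'valid' convolution shrinks each spatial dimension by 2,
# and each 2x2 max pool halves it: 32 -> 30 -> 28 -> 14 -> 12 -> 10 -> 5
flat_size = 5 * 5 * 64                # Flatten output: 1600 values
dense_params = flat_size * 128 + 128  # weights + biases for Dense(128)
print(dense_params)                   # 204928, the bulk of the model's parameters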
To train a model, we use the fit()
method on a compiled model:
%%time
history = model.fit(x_train, y_train,
batch_size=256,
epochs=5,
verbose=1,
validation_data=(x_test, y_test))
The epochs
value controls how many passes through the training data are taken. The batch_size
determines how many training examples are processed in parallel. The model parameters are updated between each batch using backpropagation according to the optimizer's strategy. Batch size affects both training performance and model quality, as we'll discuss later.
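For example (a quick calculation, not part of the original notebook), with the 50,000 CIFAR10 training images and a batch size of 256, each epoch performs 196 parameter updates:
import math
# One optimizer update per batch, including a final partial batch
print(math.ceil(len(x_train) / 256))  # 196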
The validation data is not used by the optimizer for training, but it is scored between each epoch to give an independent assessment of the model quality. The results for the validation data are what you should keep an eye on to understand how well the model is generalizing.
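If you want the same kind of score outside of training, model.evaluate computes the loss and the compiled metrics on any dataset. A minimal sketch using the test set:
# Returns [loss, accuracy] because we compiled with metrics=['accuracy']
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(test_loss, test_acc)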
Note that the model object retains its state after training. If we wanted additional rounds of training, we could call fit()
again, and it would pick up where the last fit left off:
model.fit(x_train, y_train,
batch_size=256,
epochs=2,
verbose=1,
validation_data=(x_test, y_test))
One of the more powerful features of the fit()
method is the callbacks
argument. We can use prebuilt classes, or create our own, that are called after every batch and epoch to update status or cause the fit to terminate. For example, we can use the EarlyStopping callback to end the fit if no improvement larger than 5% is seen over 2 training epochs:
early_stop = keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0.05, patience=2, verbose=1)
model.fit(x_train, y_train,
batch_size=256,
epochs=10,
verbose=1,
validation_data=(x_test, y_test),
callbacks=[early_stop])
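To illustrate the "create our own" option mentioned above, a custom callback only needs to subclass keras.callbacks.Callback and override one of its hook methods. The class below is just a sketch (it is not used elsewhere in this notebook); it prints the validation accuracy at the end of each epoch:
class PrintValAccuracy(keras.callbacks.Callback):
    # Keras calls this at the end of every epoch with the logged metrics
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print('epoch', epoch, 'val_accuracy:', logs.get('val_accuracy'))

# It would be passed in the same list as early_stop, e.g.
# callbacks=[early_stop, PrintValAccuracy()]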
Now that the model is trained, we can use the model object in various ways. First, we can look at the training history object:
print(history.epoch)
history.history
The history.history dictionary has tracked several different values through the training process:
- accuracy: accuracy of the model on the training set, averaged over the batches
- val_accuracy: accuracy of the model on the validation set
- loss: value of the loss function on the training set, averaged over the batches
- val_loss: value of the loss function on the validation set

Note that the loss function on the training data is the only thing the optimizer is trying to minimize. The other metrics hopefully improve at the same time, but do not always.
We can plot the accuracy on the test and training data with Holoviews:
train_acc = hv.Curve((history.epoch, history.history['accuracy']), 'epoch', 'accuracy', label='training')
val_acc = hv.Curve((history.epoch, history.history['val_accuracy']), 'epoch', 'accuracy', label='validation')
layout = train_acc * val_acc
layout.opts(
hv.opts.Curve(width=400, height=300, line_width=3),
hv.opts.Overlay(legend_position='top_left')
)
We can also look at individual predictions. Let's run the final trained model over the validation set:
y_predict = model.predict(x_test)
y_predict[:5]
This is still using the one-hot encoding, where each input image produces 10 columns (for categories 0-9) of output. Normally, we would take the column with the largest output as the predicted category. We could do this with some NumPy magic, but Keras also includes a convenience method predict_classes()
, which does this automatically:
y_predict = model.predict_classes(x_test)
y_predict[:5]
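For reference, the "NumPy magic" mentioned above is just an argmax along the class axis of the raw predict() output, which is handy on Keras versions that do not provide predict_classes:
# Index of the largest probability in each row = predicted class
print(np.argmax(model.predict(x_test), axis=1)[:5])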
And then we can use our label array and NumPy fancy indexing to see these as strings:
y_predict_labels = cifar10_labels[y_predict]
y_true_labels = cifar10_labels[y_test_true]
print(y_predict_labels[:5])
print(y_true_labels[:5])
Holoviews makes it easy to look at the first few predictions:
images = [hv.RGB(x_test[i], label='%s(%s)' % (y_true_labels[i], y_predict_labels[i]) ) for i in range(12)]
hv.output(
hv.Layout(images).cols(4).opts(
hv.opts.RGB(xaxis=None, yaxis=None)
),
size=64
)
In fact, let's select out the failed predictions with more NumPy fancy indexing:
failed = y_predict != y_test_true
print('Number failed:', np.count_nonzero(failed))
images = [hv.RGB(x_test[failed][i], label='%d: %s(%s)' %
(i, y_true_labels[failed][i],
y_predict_labels[failed][i]) ) for i in range(12)]
hv.output(
hv.Layout(images).cols(4).opts(
hv.opts.RGB(xaxis=None, yaxis=None),
),
size=64
)
As an exercise, try changing the activation function in the network from relu to sigmoid and retraining to see how the results change. If you screw everything up, you can use File / Revert to Checkpoint to go back to the first version of the notebook and restart the Jupyter kernel with Kernel / Restart.