In this notebook we will load a trained Keras model for the CIFAR-10 dataset from disk and compare how quickly it makes predictions on the GPU and on the CPU.
import numpy as np
np.warnings.filterwarnings('ignore') # Hide np.floating warning
import keras
from keras.datasets import cifar10
# Prevent TensorFlow from grabbing all the GPU memory
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
import holoviews as hv
hv.extension('bokeh')
Using TensorFlow backend.
from keras.datasets import cifar10
import keras.utils
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Save an unmodified copy of y_test for later, flattened to one column
y_test_true = y_test[:,0].copy()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
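As a quick sanity check of the one-hot encoding (nothing below is required for the rest of the notebook), the label arrays now have one indicator column per class:
print(y_train.shape, y_test.shape)   # (50000, 10) and (10000, 10): one column per class
print(y_train[0])                    # a single one-hot row: 1.0 in the true class column, 0.0 elsewhere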
# The dataset only provides numeric class indices, so we define the corresponding string labels here
cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'])
Keras provides a load_model()
function which recreates the model in memory from the provided HDF5 file:
from keras.models import load_model
This can take a few seconds depending on how large the model is:
%%time
gpu_model = load_model('cifar10_model.hdf5')
CPU times: user 660 ms, sys: 112 ms, total: 772 ms
Wall time: 736 ms
gpu_model.predict_classes(x_test)
array([3, 8, 8, ..., 5, 4, 7])
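Since predict_classes() returns numeric class indices, we can use the cifar10_labels array defined earlier to turn them into readable names (predicted below is just a local variable for this example):
predicted = gpu_model.predict_classes(x_test)
print(cifar10_labels[predicted[:3]])    # e.g. ['cat' 'ship' 'ship'] for the indices 3, 8, 8 above
print(cifar10_labels[y_test_true[:3]])  # the corresponding true labels, for comparison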
While the acceleration provided by the GPU is very valuable for training, it is not always necessary when using a trained model for prediction only.
Whether that tradeoff is worthwhile depends on factors such as how many inputs you need to predict at a time, which we explore below.
To compare the CPU and GPU performance of our model in this notebook, we need to trick TensorFlow into running the model on the CPU even though a GPU is present. TensorFlow provides a context manager, tf.device(),
which can be used to control where operations happen. If we load the model while TensorFlow is pinning operations to the CPU device, our model will always run on the CPU:
with tf.device("/device:CPU:0"):
    cpu_model = load_model('cifar10_model.hdf5')
    print(cpu_model.predict_classes(x_test))
[3 8 8 ... 5 4 7]
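Both copies were loaded from the same HDF5 file, so aside from possible floating-point differences between devices they should produce the same predictions. A quick sanity check, just a sketch using the arrays already defined above:
matches = (gpu_model.predict_classes(x_test) == cpu_model.predict_classes(x_test)).mean()
print('Fraction of identical predictions: %.4f' % matches)  # at or very near 1.0; device-specific rounding can occasionally flip a prediction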
And now we can compare CPU and GPU performance on 10000 test images (the ideal case for the GPU):
print('GPU performance: %d images' % x_test.shape[0])
%timeit gpu_model.predict_classes(x_test)
print('CPU performance: %d images' % x_test.shape[0])
%timeit cpu_model.predict_classes(x_test)
GPU performance: 10000 images
467 ms ± 7.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
CPU performance: 10000 images
5.12 s ± 239 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Depending on your GPU and CPU, this spread can be 10x or more.
Now let's look at the performance for one image:
print('GPU performance: 1 image')
%timeit gpu_model.predict_classes(x_test[:1])
print('CPU performance: 1 image')
%timeit cpu_model.predict_classes(x_test[:1])
GPU performance: 1 image
900 µs ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
CPU performance: 1 image
1.95 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The spread is much smaller, indicating that the CPU might be sufficient for models which have to process inputs one at a time (for example, a real-time classifier).
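To explore the territory between these two extremes, you could time predictions over a range of batch sizes. A rough sketch, using time.perf_counter instead of %timeit so it fits in a single loop (one run per size, so expect some noise):
import time

for batch_size in (1, 10, 100, 1000, 10000):
    batch = x_test[:batch_size]
    start = time.perf_counter()
    cpu_model.predict_classes(batch)
    elapsed = time.perf_counter() - start
    # Report both total time and per-image time so the batching benefit is visible
    print('CPU, %5d images: %7.1f ms total, %.2f ms/image'
          % (batch_size, elapsed * 1e3, elapsed * 1e3 / batch_size))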
If you screw everything up, you can use File / Revert to Checkpoint to go back to the first version of the notebook and restart the Jupyter kernel with Kernel / Restart.