In this notebook we will cover the following topics:

* Loading predefined networks ("applications") that ship with Keras
* Training a predefined network (InceptionV3) from scratch on CIFAR-10
* Transfer learning: retraining the same network starting from pretrained ImageNet weights
Although we have been making our own networks from scratch to learn about deep learning components, in practice you will often want to use a standard network that has been found by deep learning researchers to be successful.
Keras comes with several popular networks already defined, and can even load them with weights pretrained on standard datasets. Keras calls these premade networks "applications". Many popular networks are included, like:

* Xception
* VGG16 and VGG19
* ResNet50
* InceptionV3
* MobileNet
* DenseNet
* NASNet
Let's try out the InceptionV3 network, which is a popular image recognition network.
import numpy as np
np.warnings.filterwarnings('ignore') # Hide np.floating warning
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, Input, Lambda
# Prevent TensorFlow from grabbing all the GPU memory
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
import holoviews as hv
hv.extension('bokeh')
Using TensorFlow backend.
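Note that ConfigProto and Session are TensorFlow 1.x APIs, matching the Keras backend used here. As an aside (not needed for this notebook), the rough TensorFlow 2 equivalent of allow_growth would be:

import tensorflow as tf
# Ask TF2 to allocate GPU memory on demand instead of grabbing it all up front
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)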
Same data preparation as before.
from keras.datasets import cifar10
import keras.utils
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Save an unmodified copy of y_test for later, flattened to one column
y_test_true = y_test[:,0].copy()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# The data only has numeric categories so we also have the string labels below
cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'])
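As a quick sanity check (these printouts are an addition, but the values are standard for CIFAR-10), the prepared arrays should have the following shapes:

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 10)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 10)
print(y_test_true.shape)             # (10000,)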
When we load a network, we have a number of options we can set. Some of the more important ones are:
* input_shape: Pretrained networks assume a particular image input size. If your data is not this shape, Keras will allow you to set it here, but some models have limitations; InceptionV3 cannot go below 75 by 75.
* weights: What weights to load with the model. The default is random weights, or 'imagenet', which loads weights from training on the ImageNet dataset. We will try the pretrained weights in a later section.
* include_top: Include the dense layers at the end of the network? If you are loading pretrained weights, you will likely need to replace the top layer with your own.
* classes: Number of classes to output. Needed if include_top is True and weights is None.

For our first attempt, we'll train the model from scratch. Because we are dealing with such small images, we'll need to build a custom first layer to scale the image up by a factor of 3. We can do this using a Lambda layer, which lets us call backend (TensorFlow in this case) tensor manipulation functions. Keras provides a resize_images function which will scale up the images.
We also use the "functional" API of Keras here, where we connect one layer to the next by treating each layer like a function and passing the preceding layer to it.
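As a tiny standalone illustration of the functional style (a sketch, separate from this notebook's model):

from keras.layers import Input, Dense
from keras.models import Model

# Each layer is called like a function on the tensor produced by the previous layer
inputs = Input(shape=(16,))
hidden = Dense(8, activation='relu')(inputs)
outputs = Dense(1, activation='sigmoid')(hidden)
tiny_model = Model(inputs=inputs, outputs=outputs)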
from keras import backend as K
from keras.layers import Input, Lambda, GlobalAveragePooling2D
from keras.models import Model
# Rescale input from 32x32 to 96x96
input_layer = Input(shape=(32,32,3), dtype=np.float32)
resize_layer = Lambda(lambda x: K.resize_images(x, 3, 3, 'channels_last', interpolation='nearest'))(input_layer)
# Load InceptionV3 with random initial weights
inception = keras.applications.InceptionV3(
input_shape=(96,96,3), # must be larger than 75x75
weights=None, # random weights
include_top=True,
classes=num_classes,
)(resize_layer)
model = Model(inputs=[input_layer], outputs=[inception])
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
Let's see how many parameters the model has:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 96, 96, 3)         0         
_________________________________________________________________
inception_v3 (Model)         (None, 10)                21823274  
=================================================================
Total params: 21,823,274
Trainable params: 21,788,842
Non-trainable params: 34,432
_________________________________________________________________
The InceptionV3 model is significantly deeper than our toy models before. Let's see how Keras handles it.
history = model.fit(x_train, y_train,
batch_size=256,
epochs=5,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/5
50000/50000 [==============================] - 57s 1ms/step - loss: 1.7074 - acc: 0.4050 - val_loss: 11.0392 - val_acc: 0.1181
Epoch 2/5
50000/50000 [==============================] - 36s 713us/step - loss: 1.4342 - acc: 0.4983 - val_loss: 2.8803 - val_acc: 0.2589
Epoch 3/5
50000/50000 [==============================] - 35s 699us/step - loss: 1.2737 - acc: 0.5642 - val_loss: 3.3360 - val_acc: 0.3981
Epoch 4/5
50000/50000 [==============================] - 35s 705us/step - loss: 1.0179 - acc: 0.6485 - val_loss: 2.1435 - val_acc: 0.3928
Epoch 5/5
50000/50000 [==============================] - 36s 711us/step - loss: 0.7504 - acc: 0.7379 - val_loss: 1.2512 - val_acc: 0.6089
train_acc = hv.Curve((history.epoch, history.history['acc']), 'epoch', 'accuracy', label='training')
val_acc = hv.Curve((history.epoch, history.history['val_acc']), 'epoch', 'accuracy', label='validation')
layout = (train_acc * val_acc).redim(accuracy=dict(range=(0.0, 1.1)))
layout.opts(
hv.opts.Curve(width=400, height=300, line_width=3),
hv.opts.Overlay(legend_position='bottom_right')
)
Training accuracy climbs steadily even from completely random starting weights, although validation accuracy lags behind and is noisy. Let's try seeding the model with a better starting point.
Given the relatively small size of our training dataset, it can be hard to retrain a complex predefined model entirely from scratch. Let's try to retrain a model starting from the ImageNet weights:
from keras import backend as K
from keras.layers import Input, Lambda, GlobalAveragePooling2D
from keras.models import Model
# Rescale input from 32x32 to 96x96
input_layer = Input(shape=(32,32,3), dtype=np.float32)
resize_layer = Lambda(lambda x: K.resize_images(x, 3, 3, 'channels_last', interpolation='nearest'))(input_layer)
# Load InceptionV3 with imagenet weights, but removing the top dense layers
inception = keras.applications.InceptionV3(
input_shape=(96,96,3), # our scaled up dimension >= 75
weights='imagenet', # pretrained ImageNet weights
include_top=False, # we are going to replace the top of the network with our own layers
)
#inception.trainable = False # uncomment this to freeze the loaded weights
# Add our own top layers to produce the 10 categories, with dropout to control overfitting
prediction = Flatten()(inception(resize_layer))
prediction = Dropout(0.25)(prediction)
prediction = Dense(num_classes, activation='softmax')(prediction)
model2 = Model(inputs=[input_layer], outputs=[prediction])
model2.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
model2.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
lambda_2 (Lambda)            (None, 96, 96, 3)         0         
_________________________________________________________________
inception_v3 (Model)         (None, 1, 1, 2048)        21802784  
_________________________________________________________________
flatten_1 (Flatten)          (None, 2048)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                20490     
=================================================================
Total params: 21,823,274
Trainable params: 21,788,842
Non-trainable params: 34,432
_________________________________________________________________
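Note that only 20,490 of these parameters are new: the final Dense layer has 2048 × 10 weights plus 10 biases. Everything else comes from InceptionV3.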
history2 = model2.fit(x_train, y_train,
batch_size=256,
epochs=5,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/5
50000/50000 [==============================] - 56s 1ms/step - loss: 0.7667 - acc: 0.7452 - val_loss: 0.8147 - val_acc: 0.7278
Epoch 2/5
50000/50000 [==============================] - 36s 719us/step - loss: 0.3082 - acc: 0.8982 - val_loss: 0.7629 - val_acc: 0.7641
Epoch 3/5
50000/50000 [==============================] - 36s 721us/step - loss: 0.1949 - acc: 0.9339 - val_loss: 0.7987 - val_acc: 0.7660
Epoch 4/5
50000/50000 [==============================] - 36s 722us/step - loss: 0.1375 - acc: 0.9541 - val_loss: 1.3982 - val_acc: 0.6551
Epoch 5/5
50000/50000 [==============================] - 36s 729us/step - loss: 0.0976 - acc: 0.9673 - val_loss: 1.3120 - val_acc: 0.7341
train_acc = hv.Curve((history2.epoch, history2.history['acc']), 'epoch', 'accuracy', label='training')
val_acc = hv.Curve((history2.epoch, history2.history['val_acc']), 'epoch', 'accuracy', label='validation')
layout = (train_acc * val_acc).redim(accuracy=dict(range=(0.0, 1.1)))
layout.opts(
hv.opts.Curve(width=400, height=300, line_width=3),
hv.opts.Overlay(legend_position='bottom_right')
)
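Starting from the ImageNet weights clearly helps: validation accuracy reaches the mid-70s within five epochs, versus about 61% when training from scratch, although the near-97% training accuracy suggests the model is overfitting. We can also spot-check a few predictions against the y_test_true labels we saved earlier (a small sketch using the names defined above):

probs = model2.predict(x_test[:8])
print('predicted:', cifar10_labels[np.argmax(probs, axis=1)])
print('actual:   ', cifar10_labels[y_test_true[:8]])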
Some things to try:

* Freeze the loaded InceptionV3 weights (uncomment inception.trainable = False). Does the training still work?
* Change the Lambda resize layer to use interpolation='bilinear'.
* If you screw everything up, you can use File / Revert to Checkpoint to go back to the first version of the notebook and restart the Jupyter kernel with Kernel / Restart.
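For the freezing exercise, remember that trainable must be set before compile() for it to take effect. A minimal sketch of the rebuilt model (same layers as above; frozen_model is a name introduced here):

inception.trainable = False  # freeze all InceptionV3 weights

# Rebuild the top: only the new Dense layer will train now
prediction = Flatten()(inception(resize_layer))
prediction = Dropout(0.25)(prediction)
prediction = Dense(num_classes, activation='softmax')(prediction)

frozen_model = Model(inputs=[input_layer], outputs=[prediction])
frozen_model.compile(loss=keras.losses.categorical_crossentropy,
                     optimizer=keras.optimizers.Adadelta(),
                     metrics=['accuracy'])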