In this notebook we will cover the following topics:

* Loading predefined networks ("applications") that ship with Keras
* Training a predefined network (InceptionV3) from scratch on CIFAR-10
* Transfer learning: retraining the same network starting from pretrained ImageNet weights
Although we have been making our own networks from scratch to learn about deep learning components, in practice you will often want to use a standard network that has been found by deep learning researchers to be successful.
Keras comes with several popular networks already defined, and can even load them with weights pretrained on standard datasets. Keras calls these premade networks "applications". Many popular networks are included, like:

* Xception
* VGG16 and VGG19
* ResNet50
* InceptionV3
* MobileNet
* DenseNet
* NASNet
Let's try out the InceptionV3 network, which is a popular image recognition network.
import numpy as np
np.warnings.filterwarnings('ignore') # Hide np.floating warning
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, Input, Lambda
# Prevent TensorFlow from grabbing all the GPU memory
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
import holoviews as hv
hv.extension('bokeh')
Using TensorFlow backend.
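Note that ConfigProto and Session are TensorFlow 1.x APIs, matching the Keras backend used here. As an aside (not needed for this notebook), the rough TensorFlow 2 equivalent of allow_growth would be:

import tensorflow as tf
# Ask TF2 to allocate GPU memory on demand instead of grabbing it all up front
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)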
Same data preparation as before.
from keras.datasets import cifar10
import keras.utils
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Save an unmodified copy of y_test for later, flattened to one column
y_test_true = y_test[:,0].copy()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# The data only has numeric categories so we also have the string labels below
cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'])
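As a quick sanity check (these printouts are an addition, but the values are standard for CIFAR-10), the prepared arrays should have the following shapes:

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 10)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 10)
print(y_test_true.shape)             # (10000,)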
When we load a network, we have a number of options we can set. Some of the more important ones are:
* input_shape: Pretrained networks assume a particular image input size. If your data is not this shape, Keras will allow you to set it here, but some models have limitations; InceptionV3 cannot go below 75 by 75.
* weights: What weights to load with the model. The default is random weights, or 'imagenet', which loads weights from training on the ImageNet dataset. We will try the pretrained weights in a later section.
* include_top: Include the dense layers at the end of the network? If you are loading pretrained weights, you will likely need to replace the top layer with your own.
* classes: Number of classes to output. Needed if include_top is True and weights is None.

For our first attempt, we'll train the model from scratch. Because we are dealing with such small images, we'll need to build a custom first layer to scale the image up by a factor of 3. We can do this using a Lambda layer, which lets us call backend (TensorFlow in this case) tensor manipulation functions. Keras provides a resize_images function which will scale up the images.
We also use the "functional" API of Keras here, where we connect one layer to the next by treating each layer like a function and passing the preceding layer to it.
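As a tiny standalone illustration of the functional style (a sketch, separate from this notebook's model):

from keras.layers import Input, Dense
from keras.models import Model

# Each layer is called like a function on the tensor produced by the previous layer
inputs = Input(shape=(16,))
hidden = Dense(8, activation='relu')(inputs)
outputs = Dense(1, activation='sigmoid')(hidden)
tiny_model = Model(inputs=inputs, outputs=outputs)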
from keras import backend as K
from keras.layers import Input, Lambda, GlobalAveragePooling2D
from keras.models import Model
# Rescale input from 32x32 to 96x96
input_layer = Input(shape=(32,32,3), dtype=np.float32)
resize_layer = Lambda(lambda x: K.resize_images(x, 3, 3, 'channels_last', interpolation='nearest'))(input_layer)
# Load InceptionV3 with random initial weights
inception = keras.applications.InceptionV3(
input_shape=(96,96,3), # must be larger than 75x75
weights=None, # random weights
include_top=True,
classes=num_classes,
)(resize_layer)
model = Model(inputs=[input_layer], outputs=[inception])
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
Let's see how many parameters the model has:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 96, 96, 3)         0         
_________________________________________________________________
inception_v3 (Model)         (None, 10)                21823274  
=================================================================
Total params: 21,823,274
Trainable params: 21,788,842
Non-trainable params: 34,432
_________________________________________________________________
The InceptionV3 model is significantly deeper than our toy models before. Let's see how Keras handles it.
history = model.fit(x_train, y_train,
batch_size=256,
epochs=5,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/5
50000/50000 [==============================] - 57s 1ms/step - loss: 1.7074 - acc: 0.4050 - val_loss: 11.0392 - val_acc: 0.1181
Epoch 2/5
50000/50000 [==============================] - 36s 713us/step - loss: 1.4342 - acc: 0.4983 - val_loss: 2.8803 - val_acc: 0.2589
Epoch 3/5
50000/50000 [==============================] - 35s 699us/step - loss: 1.2737 - acc: 0.5642 - val_loss: 3.3360 - val_acc: 0.3981
Epoch 4/5
50000/50000 [==============================] - 35s 705us/step - loss: 1.0179 - acc: 0.6485 - val_loss: 2.1435 - val_acc: 0.3928
Epoch 5/5
50000/50000 [==============================] - 36s 711us/step - loss: 0.7504 - acc: 0.7379 - val_loss: 1.2512 - val_acc: 0.6089
train_acc = hv.Curve((history.epoch, history.history['acc']), 'epoch', 'accuracy', label='training')
val_acc = hv.Curve((history.epoch, history.history['val_acc']), 'epoch', 'accuracy', label='validation')
layout = (train_acc * val_acc).redim(accuracy=dict(range=(0.0, 1.1)))
layout.opts(
hv.opts.Curve(width=400, height=300, line_width=3),
hv.opts.Overlay(legend_position='bottom_right')
)
Training accuracy climbs steadily even from completely random starting weights, although validation accuracy lags behind and is noisy. Let's try seeding the model with a better starting point.
Given the relatively small size of our training dataset, it can be hard to retrain a complex predefined model entirely from scratch. Let's try to retrain a model starting from the ImageNet weights:
from keras import backend as K
from keras.layers import Input, Lambda, GlobalAveragePooling2D
from keras.models import Model
# Rescale input from 32x32 to 96x96
input_layer = Input(shape=(32,32,3), dtype=np.float32)
resize_layer = Lambda(lambda x: K.resize_images(x, 3, 3, 'channels_last', interpolation='nearest'))(input_layer)
# Load InceptionV3 with imagenet weights, but removing the top dense layers
inception = keras.applications.InceptionV3(
input_shape=(96,96,3), # our scaled up dimension >= 75
weights='imagenet', # pretrained ImageNet weights
include_top=False, # we are going to replace the top of the network with our own layers
)
#inception.trainable = False # uncomment this to freeze the loaded weights
# Add our own top layers to produce the 10 categories, with dropout to control overfitting
prediction = Flatten()(inception(resize_layer))
prediction = Dropout(0.25)(prediction)
prediction = Dense(num_classes, activation='softmax')(prediction)
model2 = Model(inputs=[input_layer], outputs=[prediction])
model2.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
model2.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
lambda_2 (Lambda)            (None, 96, 96, 3)         0         
_________________________________________________________________
inception_v3 (Model)         (None, 1, 1, 2048)        21802784  
_________________________________________________________________
flatten_1 (Flatten)          (None, 2048)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                20490     
=================================================================
Total params: 21,823,274
Trainable params: 21,788,842
Non-trainable params: 34,432
_________________________________________________________________
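Note that only 20,490 of these parameters are new: the final Dense layer has 2048 × 10 weights plus 10 biases. Everything else comes from InceptionV3.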
history2 = model2.fit(x_train, y_train,
batch_size=256,
epochs=5,
verbose=1,
validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/5
50000/50000 [==============================] - 56s 1ms/step - loss: 0.7667 - acc: 0.7452 - val_loss: 0.8147 - val_acc: 0.7278
Epoch 2/5
50000/50000 [==============================] - 36s 719us/step - loss: 0.3082 - acc: 0.8982 - val_loss: 0.7629 - val_acc: 0.7641
Epoch 3/5
50000/50000 [==============================] - 36s 721us/step - loss: 0.1949 - acc: 0.9339 - val_loss: 0.7987 - val_acc: 0.7660
Epoch 4/5
50000/50000 [==============================] - 36s 722us/step - loss: 0.1375 - acc: 0.9541 - val_loss: 1.3982 - val_acc: 0.6551
Epoch 5/5
50000/50000 [==============================] - 36s 729us/step - loss: 0.0976 - acc: 0.9673 - val_loss: 1.3120 - val_acc: 0.7341
train_acc = hv.Curve((history2.epoch, history2.history['acc']), 'epoch', 'accuracy', label='training')
val_acc = hv.Curve((history2.epoch, history2.history['val_acc']), 'epoch', 'accuracy', label='validation')
layout = (train_acc * val_acc).redim(accuracy=dict(range=(0.0, 1.1)))
layout.opts(
hv.opts.Curve(width=400, height=300, line_width=3),
hv.opts.Overlay(legend_position='bottom_right')
)
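Starting from the ImageNet weights clearly helps: validation accuracy reaches the mid-70s within five epochs, versus about 61% when training from scratch, although the near-97% training accuracy suggests the model is overfitting. We can also spot-check a few predictions against the y_test_true labels we saved earlier (a small sketch using the names defined above):

probs = model2.predict(x_test[:8])
print('predicted:', cifar10_labels[np.argmax(probs, axis=1)])
print('actual:   ', cifar10_labels[y_test_true[:8]])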
Some things to try:

* Freeze the loaded InceptionV3 weights (uncomment inception.trainable = False). Does the training still work?
* Change the Lambda resize layer to use interpolation='bilinear'.
* If you screw everything up, you can use File / Revert to Checkpoint to go back to the first version of the notebook and restart the Jupyter kernel with Kernel / Restart.
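For the freezing exercise, remember that trainable must be set before compile() for it to take effect. A minimal sketch of the rebuilt model (same layers as above; frozen_model is a name introduced here):

inception.trainable = False  # freeze all InceptionV3 weights

# Rebuild the top: only the new Dense layer will train now
prediction = Flatten()(inception(resize_layer))
prediction = Dropout(0.25)(prediction)
prediction = Dense(num_classes, activation='softmax')(prediction)

frozen_model = Model(inputs=[input_layer], outputs=[prediction])
frozen_model.compile(loss=keras.losses.categorical_crossentropy,
                     optimizer=keras.optimizers.Adadelta(),
                     metrics=['accuracy'])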