# Rather than importing everything manually, we'll make things easy
# and load them all in utils3.py, and just import them from there.
%matplotlib inline
import utils3
from utils3 import *
Using Theano backend.
/home/karel/anaconda3/lib/python3.6/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6020 on context None
Mapped name None to device cuda: Graphics Device (0000:02:00.0)
We need to find a way to convert the imagenet predictions to a probability of being a cat or a dog, since that is what the Kaggle competition requires us to submit. We could use the imagenet hierarchy to download a list of all the imagenet categories in each of the dog and cat groups, and could then solve our problem in various ways, such as:
But these approaches have some downsides:
A very simple solution to both of these problems is to learn a linear model that is trained using the 1,000 predictions from the imagenet model for each image as input, and the dog/cat label as target.
%matplotlib inline
import os, json
from glob import glob
import numpy as np
import scipy
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import confusion_matrix
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
import utils3
from utils3 import plots, get_batches, plot_confusion_matrix, get_data

from numpy.random import random, permutation
from scipy import misc, ndimage
from scipy.ndimage.interpolation import zoom
import keras
from keras import backend as K
from keras.utils.data_utils import get_file
from keras.models import Sequential
from keras.layers import Input
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD, RMSprop
from keras.preprocessing import image
It turns out that each of the Dense() layers is just a linear model, followed by a simple activation function. We'll learn about the activation function later - first, let's review how linear models work.
A linear model is (as I'm sure you know) simply a model where each row is calculated as sum(row * weights) plus a constant bias, where the weights and the bias need to be learnt from the data and are the same for every row. For example, let's create some data that we know is linearly related:
x = random((30,2))
y = np.dot(x, [2., 3.]) + 1.
array([[ 0.0106, 0.363 ], [ 0.478 , 0.3925], [ 0.7236, 0.3616], [ 0.1891, 0.4651], [ 0.4663, 0.7481]])
array([ 2.1102, 3.1335, 3.532 , 2.7734, 4.1768])
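To make the sum(row * weights) description concrete, here is a minimal NumPy sketch. The names `x_demo`, `weights`, `bias` and `predict_row` are illustrative and not part of the notebook, and the fixed seed is only an assumption for repeatability. It checks that computing each row by hand gives the same result as the `np.dot` expression used to build `y`.

```python
import numpy as np

rng = np.random.RandomState(0)       # fixed seed, assumed only for repeatability
x_demo = rng.random_sample((30, 2))  # same shape as x above
weights = np.array([2., 3.])         # the weights used to build y above
bias = 1.                            # the constant term used to build y above

def predict_row(row, weights, bias):
    # One row of a linear model: weighted sum of the inputs plus a constant
    return np.sum(row * weights) + bias

y_by_row = np.array([predict_row(r, weights, bias) for r in x_demo])
y_by_dot = np.dot(x_demo, weights) + bias  # vectorised form of the same calculation

print(np.allclose(y_by_row, y_by_dot))
```

The row-by-row loop is only there for illustration; in practice the vectorised `np.dot` form is what we would use.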
We can use Keras to create a simple linear model (a Dense() layer with no activation) and optimize it using SGD to minimize mean squared error (MSE):
lm = Sequential([ Dense(1, input_shape=(2,)) ])
lm.compile(optimizer=SGD(lr=0.1), loss='mse')
(See the Optim Tutorial notebook and associated Excel spreadsheet to learn all about SGD and related optimization algorithms.)
The lm model now holds randomly initialised internal weights, which we can use to evaluate the loss function (MSE) before any training has taken place.
lm.evaluate(x, y, verbose=0)
lm.fit(x, y, epochs=5, batch_size=1)
Epoch 1/5
30/30 [==============================] - 0s - loss: 0.7089
Epoch 2/5
30/30 [==============================] - 0s - loss: 0.0475
Epoch 3/5
30/30 [==============================] - 0s - loss: 0.0224
Epoch 4/5
30/30 [==============================] - 0s - loss: 0.0094
Epoch 5/5
30/30 [==============================] - 0s - loss: 0.0042
<keras.callbacks.History at 0x7f9a7fcb51d0>
lm.evaluate(x, y, verbose=0)
And, of course, we can also take a look at the weights - after fitting, we should see that they are close to the weights we used to calculate y (2.0, 3.0, and 1.0).
[array([[ 2.0453], [ 2.7755]], dtype=float32), array([ 1.0656], dtype=float32)]
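To see roughly what the fit is doing, here is a hedged NumPy sketch of SGD with a batch size of 1 on the same kind of data. It is not Keras' implementation: the zero starting weights, the seed, and the names `w`, `b`, `x_sgd`, `y_sgd` and `mse` are assumptions made for illustration (Keras starts from a random initialisation).

```python
import numpy as np

rng = np.random.RandomState(42)      # seed is an assumption, for repeatability
x_sgd = rng.random_sample((30, 2))
y_sgd = np.dot(x_sgd, [2., 3.]) + 1.

w = np.zeros(2)   # hypothetical starting weights (Keras uses a random init)
b = 0.            # hypothetical starting bias
lr = 0.1          # same learning rate as the SGD optimizer above

def mse(w, b):
    # Mean squared error of the current weights over the whole data set
    return np.mean((np.dot(x_sgd, w) + b - y_sgd) ** 2)

loss_before = mse(w, b)
for epoch in range(5):
    for i in rng.permutation(len(x_sgd)):   # batch_size=1: one row per update
        err = np.dot(x_sgd[i], w) + b - y_sgd[i]
        w -= lr * 2 * err * x_sgd[i]        # gradient of err**2 with respect to w
        b -= lr * 2 * err                   # gradient of err**2 with respect to b
loss_after = mse(w, b)

print(loss_before, loss_after, w, b)
```

After five passes the weights should be heading towards 2.0 and 3.0 and the bias towards 1.0, in the same way as the Keras weights shown above.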
Using a Dense() layer in this way, we can easily convert the 1,000 predictions given by our model into a probability of dog vs cat--simply train a linear model to take the 1,000 predictions as input, and return dog or cat as output, learning from the Kaggle data. This should be easier and more accurate than manually creating a map from imagenet categories to one dog/cat category.
We start with some basic config steps. We copy a small amount of our data into a 'sample' directory, with the exact same structure as our 'train' directory--this is always a good idea in all machine learning, since we should do all of our initial testing using a dataset small enough that we never have to wait for it.
#path = "data/dogscats/sample/"
path = "data/dogscats/"
model_path = path + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)
We will process as many images at a time as our graphics card allows. This is a case of trial and error to find the max batch size - the largest size that doesn't give an out of memory error.
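The trial and error can be automated. The sketch below is purely illustrative: `find_max_batch_size` and `fake_try_batch` are hypothetical helpers, not part of utils3 or Keras, and it assumes that an out-of-memory condition surfaces as a `MemoryError` (the exception a real GPU backend raises may differ).

```python
def find_max_batch_size(try_batch, start=1, limit=1024):
    """Double the batch size until try_batch raises MemoryError or limit is reached.

    try_batch is any callable that runs one batch of the given size.
    Returns the largest size that succeeded, or None if even `start` failed.
    """
    best = None
    size = start
    while size <= limit:
        try:
            try_batch(size)
        except MemoryError:   # assumption: out-of-memory shows up as MemoryError
            break
        best = size
        size *= 2
    return best

# Stand-in for a real forward pass: pretend anything above 100 images runs out of memory
def fake_try_batch(size):
    if size > 100:
        raise MemoryError("simulated out-of-memory")

max_size = find_max_batch_size(fake_try_batch)
print(max_size)
```

Doubling only finds the largest power of two that fits; a finer search between the last success and the first failure could squeeze out a little more.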
We need to start with our VGG 16 model, since we'll be using its predictions and features.
from vgg16_3 import Vgg16
vgg = Vgg16()
model = vgg.model
Our overall approach here will be:
Let's start by grabbing training and validation batches.
# Use batch size of 1 since we're just doing preprocessing on the CPU
val_batches = get_batches(path+'valid', shuffle=False, batch_size=1)
batches = get_batches(path+'train', shuffle=False, batch_size=1)

Found 2000 images belonging to 2 classes.
Found 23000 images belonging to 2 classes.
Loading and resizing the images every time we want to use them isn't necessary - instead we should save the processed arrays. By far the fastest way to save and load numpy arrays is using bcolz. This also compresses the arrays, so we save disk space. Here are the functions we'll use to save and load using bcolz.
import bcolz
def save_array(fname, arr): c=bcolz.carray(arr, rootdir=fname, mode='w'); c.flush()
def load_array(fname): return bcolz.open(fname)[:]
We have provided a simple function that joins the arrays from all the batches - let's use this to grab the training and validation data:
val_data = get_data(path+'valid')
Found 2000 images belonging to 2 classes.
trn_data = get_data(path+'train')
Found 23000 images belonging to 2 classes.
(23000, 3, 224, 224)
save_array(model_path + 'train_data.bc', trn_data)
save_array(model_path + 'valid_data.bc', val_data)
We can load our training and validation data later without recalculating them:
trn_data = load_array(model_path+'train_data.bc')
val_data = load_array(model_path+'valid_data.bc')
(2000, 3, 224, 224)
Keras returns classes as a single column, so we convert to one-hot encoding:
def onehot(x): return np.array(OneHotEncoder().fit_transform(x.reshape(-1,1)).todense())
val_classes = val_batches.classes
trn_classes = batches.classes
val_labels = onehot(val_classes)
trn_labels = onehot(trn_classes)
array([0, 0, 0, 0], dtype=int32)
array([[ 1., 0.], [ 1., 0.], [ 1., 0.], [ 1., 0.]])
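For intuition, the same encoding can be sketched in plain NumPy. `onehot_np` is a hypothetical alternative to the `onehot()` helper, not something the notebook uses, and the demo labels are made up (0 standing for cat, 1 for dog).

```python
import numpy as np

def onehot_np(classes, n_classes=None):
    # Hypothetical NumPy-only alternative to the onehot() helper above
    classes = np.asarray(classes)
    if n_classes is None:
        n_classes = classes.max() + 1
    return np.eye(n_classes)[classes]   # row i of the identity matrix encodes class i

demo_classes = np.array([0, 0, 1, 1])   # illustrative labels: 0 = cat, 1 = dog
demo_labels = onehot_np(demo_classes)
print(demo_labels)
```

Taking `argmax(axis=1)` of the one-hot array recovers the original class column, which is handy when comparing against predicted classes later.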
...and their 1,000 imagenet probabilities from VGG16--these will be the features for our linear model:
trn_features = model.predict(trn_data, batch_size=batch_size)
val_features = model.predict(val_data, batch_size=batch_size)

save_array(model_path + 'train_lastlayer_features.bc', trn_features)
save_array(model_path + 'valid_lastlayer_features.bc', val_features)
We can load our training and validation features later without recalculating them:
trn_features = load_array(model_path+'train_lastlayer_features.bc')
val_features = load_array(model_path+'valid_lastlayer_features.bc')
Now we can define our linear model, just like we did earlier:
# 1000 inputs, since that's the saved features, and 2 outputs, for dog and cat
lm = Sequential([ Dense(2, activation='softmax', input_shape=(1000,)) ])
lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])
We're ready to fit the model!
lm.fit(trn_features, trn_labels, epochs=3, batch_size=batch_size, validation_data=(val_features, val_labels))
Train on 23000 samples, validate on 2000 samples
Epoch 1/3
23000/23000 [==============================] - 2s - loss: 0.1342 - acc: 0.9703 - val_loss: 0.1537 - val_acc: 0.9720
Epoch 2/3
23000/23000 [==============================] - 2s - loss: 0.1638 - acc: 0.9741 - val_loss: 0.1795 - val_acc: 0.9740
Epoch 3/3
23000/23000 [==============================] - 2s - loss: 0.1830 - acc: 0.9745 - val_loss: 0.1986 - val_acc: 0.9755
<keras.callbacks.History at 0x7f63be4909e8>
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_4 (Dense)              (None, 2)                 2002
=================================================================
Total params: 2,002.0
Trainable params: 2,002
Non-trainable params: 0.0
_________________________________________________________________
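The 2,002 parameters in the summary are 1,000 x 2 weights plus 2 biases. As a rough NumPy sketch of what a Dense layer with a softmax activation computes (the random `feats`, `W` and `b` below are stand-ins for illustration, not the trained values):

```python
import numpy as np

def softmax(z):
    # Subtract the row max before exponentiating, for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
feats = rng.random_sample((4, 1000))         # stand-in for 4 rows of imagenet probabilities
W = rng.normal(scale=0.01, size=(1000, 2))   # hypothetical weights, one column per class
b = np.zeros(2)                              # one bias per class

probs_demo = softmax(np.dot(feats, W) + b)   # linear model followed by softmax
n_params = W.size + b.size                   # 1000*2 weights + 2 biases

print(probs_demo.shape, n_params)
```

Each row of `probs_demo` sums to 1, so the two columns can be read as the probabilities of the two classes.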
Keras' fit() function conveniently shows us the value of the loss function, and the accuracy, after every epoch ("epoch" refers to one full run through all training examples). The most important metrics for us to look at are for the validation set, since we want to check for over-fitting.
As well as looking at the overall metrics, it's also a good idea to look at examples of each of:
Let's see what, if anything, we can learn from these. (In general, these are particularly useful for debugging problems in the model; since this model is so simple, there may not be too much to learn at this stage.)
Calculate predictions on validation set, so we can find correct and incorrect examples:
# We want both the classes...
preds = lm.predict_classes(val_features, batch_size=batch_size)
# ...and the probabilities of being a cat
probs = lm.predict_proba(val_features, batch_size=batch_size)[:,0]
probs[:8]
array([ 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)
array([0, 0, 0, 0, 0, 0, 0, 0])
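The relationship between the two outputs can be sketched with a few made-up rows: for a softmax model with more than one output, the predicted class is the column with the highest probability. The `demo_` arrays below are illustrative values, not model output.

```python
import numpy as np

# Illustrative softmax outputs: column 0 = cat, column 1 = dog
demo_probs = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.6, 0.4]])

demo_preds = demo_probs.argmax(axis=1)   # predicted class = most probable column
demo_cat_prob = demo_probs[:, 0]         # probability of being a cat

print(demo_preds, demo_cat_prob)
```

So a cat probability close to 1 goes with a predicted class of 0, which is what the first eight validation images show above.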
Get the filenames for the validation set, so we can view images:
filenames = val_batches.filenames
# Number of images to view for each visualization task
n_view = 4
Helper function to plot images by index in the validation set:
def plots_idx(idx, titles=None):
    plots([image.load_img(path + 'valid/' + filenames[i]) for i in idx], titles=titles)
#1. A few correct labels at random
# np.where returns a tuple of arrays, so take [0] to get the index array
correct = np.where(preds==val_labels[:,1])[0]
idx = permutation(correct)[:n_view]
plots_idx(idx, probs[idx])

#2. A few incorrect labels at random
incorrect = np.where(preds!=val_labels[:,1])[0]
idx = permutation(incorrect)[:n_view]
plots_idx(idx, probs[idx])

#3. The images we were most confident were cats, and are actually cats
correct_cats = np.where((preds==0) & (preds==val_labels[:,1]))[0]
most_correct_cats = np.argsort(probs[correct_cats])[::-1][:n_view]
plots_idx(correct_cats[most_correct_cats], probs[correct_cats][most_correct_cats])

# as above, but dogs
correct_dogs = np.where((preds==1) & (preds==val_labels[:,1]))[0]
most_correct_dogs = np.argsort(probs[correct_dogs])[:n_view]
plots_idx(correct_dogs[most_correct_dogs], 1-probs[correct_dogs][most_correct_dogs])

#4. The images we were most confident were cats, but are actually dogs
incorrect_cats = np.where((preds==0) & (preds!=val_labels[:,1]))[0]
most_incorrect_cats = np.argsort(probs[incorrect_cats])[::-1][:n_view]
plots_idx(incorrect_cats[most_incorrect_cats], probs[incorrect_cats][most_incorrect_cats])
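The selection pattern used in these cells can be tried out on toy arrays without any images. Everything prefixed `toy_` below is made up for illustration; the sketch picks out the correctly predicted cats and orders them from most to least confident.

```python
import numpy as np

# Toy stand-ins for preds, the true class indices, and probs (probability of cat)
toy_preds = np.array([0, 0, 1, 1, 0, 1])
toy_true  = np.array([0, 1, 1, 0, 0, 1])
toy_probs = np.array([0.60, 0.70, 0.10, 0.40, 0.95, 0.02])

# np.where returns a tuple of arrays; [0] extracts the index array itself
toy_correct_cats = np.where((toy_preds == 0) & (toy_preds == toy_true))[0]

# Sort those indices by probability of cat, highest first
order = np.argsort(toy_probs[toy_correct_cats])[::-1]
most_confident = toy_correct_cats[order]

print(most_confident)
```

Image 4 comes first because its cat probability (0.95) is higher than that of image 0 (0.60); image 1 is excluded because it was predicted as a cat but is actually a dog.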