In this notebook we're going to investigate a range of different architectures for the Kaggle fisheries competition. The video states that vgg.py and vgg_ft()
from utils.py have been updated to include VGG with batch normalization, but this is not the case. Instead, we've created a new file vgg16bn.py and an additional method vgg_ft_bn()
(which is already in utils.py); those are what we use in this notebook.
from __future__ import division, print_function
from theano.sandbox import cuda
%matplotlib inline
import utils; reload(utils)
from utils import *
#path = "data/fish/sample/"
path = "data/fish/"
batch_size=64
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)
(val_classes, trn_classes, val_labels, trn_labels,
val_filenames, filenames, test_filenames) = get_classes(path)
Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.
Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.
Found 1000 images belonging to 1 classes.
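get_batches and get_classes are helpers from utils.py. For reference, here is a minimal sketch of get_batches, assuming it matches the standard fast.ai helper (a thin wrapper over Keras's flow_from_directory):
def get_batches(dirname, gen=image.ImageDataGenerator(), shuffle=True,
                batch_size=4, class_mode='categorical', target_size=(224,224)):
    # Stream (image, label) batches straight from a directory tree
    return gen.flow_from_directory(dirname, target_size=target_size,
            class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)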
Sometimes it's helpful to have just the filenames, without the path.
raw_filenames = [f.split('/')[-1] for f in filenames]
raw_test_filenames = [f.split('/')[-1] for f in test_filenames]
raw_val_filenames = [f.split('/')[-1] for f in val_filenames]
We create the validation and sample sets in the usual way.
%cd data/fish
%cd train
%mkdir ../valid
g = glob('*')
for d in g: os.mkdir('../valid/'+d)
g = glob('*/*.jpg')
shuf = np.random.permutation(g)
for i in range(500): os.rename(shuf[i], '../valid/' + shuf[i])
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid
from shutil import copyfile
g = glob('*')
for d in g:
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)
g = glob('*/*.jpg')
shuf = np.random.permutation(g)
for i in range(400): copyfile(shuf[i], '../sample/train/' + shuf[i])
%cd ../valid
g = glob('*/*.jpg')
shuf = np.random.permutation(g)
for i in range(200): copyfile(shuf[i], '../sample/valid/' + shuf[i])
%cd ..
%mkdir results
%mkdir sample/results
%cd ../..
/data/jhoward/fast-image/nbs/data/fish
We start with our usual VGG approach, using VGG with batch normalization. We explained how to add batch normalization to VGG in the imagenet_batchnorm notebook. VGG with batch normalization is implemented in vgg16bn.py, and there is a version of vgg_ft
(our fine-tuning function) with batch norm, called vgg_ft_bn,
in utils.py.
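For reference, vgg_ft_bn itself is only a few lines; a minimal sketch, assuming it mirrors the existing vgg_ft helper in utils.py:
def vgg_ft_bn(out_dim):
    # Build the batchnorm VGG, then swap the final layer(s) for a new
    # out_dim-way classifier ready for fine-tuning
    vgg = Vgg16BN()
    vgg.ft(out_dim)
    return vgg.model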
First we create a simple fine-tuned VGG model to be our starting point.
from vgg16bn import Vgg16BN
model = vgg_ft_bn(8)
trn = get_data(path+'train')
val = get_data(path+'valid')
Found 3277 images belonging to 8 classes. Found 500 images belonging to 8 classes.
test = get_data(path+'test')
Found 1000 images belonging to 1 classes.
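get_data is another utils.py helper; a minimal sketch, assuming it loads every image (unshuffled, one at a time) into a single numpy array:
def get_data(path, target_size=(224,224)):
    # class_mode=None: yield just the images, in directory order
    batches = get_batches(path, shuffle=False, batch_size=1,
                          class_mode=None, target_size=target_size)
    return np.concatenate([batches.next() for i in range(batches.nb_sample)])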
save_array(path+'results/trn.dat', trn)
save_array(path+'results/val.dat', val)
save_array(path+'results/test.dat', test)
trn = load_array(path+'results/trn.dat')
val = load_array(path+'results/val.dat')
test = load_array(path+'results/test.dat')
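save_array and load_array wrap bcolz, so large arrays can be persisted to disk and read back quickly; a minimal sketch, assuming the standard fast.ai helpers:
import bcolz
def save_array(fname, arr):
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()
def load_array(fname):
    return bcolz.open(fname)[:]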
gen = image.ImageDataGenerator()
model.compile(optimizer=Adam(1e-3),
loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(trn, trn_labels, batch_size=batch_size, nb_epoch=3, validation_data=(val, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/3
3277/3277 [==============================] - 87s - loss: 2.8985 - acc: 0.4596 - val_loss: 1.0387 - val_acc: 0.7220
Epoch 2/3
3277/3277 [==============================] - 87s - loss: 1.6575 - acc: 0.6301 - val_loss: 0.6592 - val_acc: 0.8260
Epoch 3/3
3277/3277 [==============================] - 87s - loss: 1.2879 - acc: 0.6951 - val_loss: 0.4562 - val_acc: 0.8620
<keras.callbacks.History at 0x7fb6d3d5e810>
model.save_weights(path+'results/ft1.h5')
We pre-compute the output of the last convolution layer of VGG, since we're unlikely to need to fine-tune those layers. (All following analysis will be done on just the pre-computed convolutional features.)
model.load_weights(path+'results/ft1.h5')
conv_layers,fc_layers = split_at(model, Convolution2D)
conv_model = Sequential(conv_layers)
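split_at is also from utils.py; it splits the model's layer list just after the last layer of the given type. A minimal sketch, assuming the standard helper:
def split_at(model, layer_type):
    layers = model.layers
    # index of the last layer of the requested type
    layer_idx = [i for i, l in enumerate(layers) if type(l) is layer_type][-1]
    return layers[:layer_idx+1], layers[layer_idx+1:]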
conv_feat = conv_model.predict(trn)
conv_val_feat = conv_model.predict(val)
conv_test_feat = conv_model.predict(test)
save_array(path+'results/conv_val_feat.dat', conv_val_feat)
save_array(path+'results/conv_feat.dat', conv_feat)
save_array(path+'results/conv_test_feat.dat', conv_test_feat)
conv_feat = load_array(path+'results/conv_feat.dat')
conv_val_feat = load_array(path+'results/conv_val_feat.dat')
conv_test_feat = load_array(path+'results/conv_test_feat.dat')
conv_val_feat.shape
(500, 512, 14, 14)
We can now create our first baseline model - a simple 3-layer FC net.
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        BatchNormalization(axis=1),
        Dropout(p/4),
        Flatten(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(8, activation='softmax')
    ]
p=0.6
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=3,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/3
3277/3277 [==============================] - 2s - loss: 1.1471 - acc: 0.6570 - val_loss: 0.6800 - val_acc: 0.8720
Epoch 2/3
3277/3277 [==============================] - 1s - loss: 0.3055 - acc: 0.9057 - val_loss: 0.1853 - val_acc: 0.9480
Epoch 3/3
3277/3277 [==============================] - 1s - loss: 0.1909 - acc: 0.9439 - val_loss: 0.1124 - val_acc: 0.9740
<keras.callbacks.History at 0x7fb67bedc190>
# Note: plain attribute assignment (bn_model.optimizer.lr = 1e-4) doesn't update
# the already-compiled training function in Keras 1; set the underlying variable.
K.set_value(bn_model.optimizer.lr, 1e-4)
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=7,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/7
3277/3277 [==============================] - 1s - loss: 0.1002 - acc: 0.9695 - val_loss: 0.1023 - val_acc: 0.9840
Epoch 2/7
3277/3277 [==============================] - 1s - loss: 0.0666 - acc: 0.9826 - val_loss: 0.0974 - val_acc: 0.9840
Epoch 3/7
3277/3277 [==============================] - 1s - loss: 0.0633 - acc: 0.9814 - val_loss: 0.1327 - val_acc: 0.9780
Epoch 4/7
3277/3277 [==============================] - 1s - loss: 0.0443 - acc: 0.9875 - val_loss: 0.1313 - val_acc: 0.9820
Epoch 5/7
3277/3277 [==============================] - 1s - loss: 0.0393 - acc: 0.9866 - val_loss: 0.1056 - val_acc: 0.9880
Epoch 6/7
3277/3277 [==============================] - 1s - loss: 0.0475 - acc: 0.9850 - val_loss: 0.1051 - val_acc: 0.9880
Epoch 7/7
3277/3277 [==============================] - 1s - loss: 0.0383 - acc: 0.9884 - val_loss: 0.1047 - val_acc: 0.9880
<keras.callbacks.History at 0x7fb67bedc690>
bn_model.save_weights(path+'models/conv_512_6.h5')
bn_model.evaluate(conv_val_feat, val_labels)
500/500 [==============================] - 0s
[0.10466163465986028, 0.98799999952316286]
bn_model.load_weights(path+'models/conv_512_6.h5')
The images come in a range of sizes, and the size likely identifies the boat each image came from (since different boats use different cameras). Perhaps this creates some data leakage that we can take advantage of to get a better Kaggle leaderboard position? To find out, first we create arrays of the image sizes:
sizes = [PIL.Image.open(path+'train/'+f).size for f in filenames]
id2size = list(set(sizes))
size2id = {o:i for i,o in enumerate(id2size)}
import collections
collections.Counter(sizes)
Counter({(1192, 670): 169, (1244, 700): 23, (1276, 718): 192, (1280, 720): 1880, (1280, 750): 520, (1280, 924): 51, (1280, 974): 344, (1334, 750): 28, (1518, 854): 37, (1732, 974): 33})
Then we one-hot encode them (since we want to treat them as categorical) and normalize the data.
trn_sizes_orig = to_categorical([size2id[o] for o in sizes], len(id2size))
raw_val_sizes = [PIL.Image.open(path+'valid/'+f).size for f in val_filenames]
val_sizes = to_categorical([size2id[o] for o in raw_val_sizes], len(id2size))
trn_sizes = (trn_sizes_orig - trn_sizes_orig.mean(axis=0)) / trn_sizes_orig.std(axis=0)
val_sizes = (val_sizes - trn_sizes_orig.mean(axis=0)) / trn_sizes_orig.std(axis=0)
To use this additional "meta-data", we create a model with multiple input layers - sz_inp
will be our input for the size information.
p=0.6
inp = Input(conv_layers[-1].output_shape[1:])
sz_inp = Input((len(id2size),))
bn_inp = BatchNormalization()(sz_inp)
x = MaxPooling2D()(inp)
x = BatchNormalization(axis=1)(x)
x = Dropout(p/4)(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p)(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p/2)(x)
x = merge([x,bn_inp], 'concat')
x = Dense(8, activation='softmax')(x)
When we define the model, we have to specify all the input layers in an array.
model = Model([inp, sz_inp], x)
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
And when we train the model, we have to provide all the input layers' data in an array.
model.fit([conv_feat, trn_sizes], trn_labels, batch_size=batch_size, nb_epoch=3,
validation_data=([conv_val_feat, val_sizes], val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/3
3277/3277 [==============================] - 2s - loss: 1.1621 - acc: 0.6613 - val_loss: 0.6370 - val_acc: 0.8860
Epoch 2/3
3277/3277 [==============================] - 2s - loss: 0.3222 - acc: 0.8993 - val_loss: 0.2118 - val_acc: 0.9600
Epoch 3/3
3277/3277 [==============================] - 2s - loss: 0.1668 - acc: 0.9542 - val_loss: 0.1140 - val_acc: 0.9740
<keras.callbacks.History at 0x7fb673834b50>
K.set_value(model.optimizer.lr, 1e-4)
model.fit([conv_feat, trn_sizes], trn_labels, batch_size=batch_size, nb_epoch=8,
          validation_data=([conv_val_feat, val_sizes], val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/8
3277/3277 [==============================] - 1s - loss: 0.0666 - acc: 0.9774 - val_loss: 0.0928 - val_acc: 0.9900
Epoch 2/8
3277/3277 [==============================] - 1s - loss: 0.0363 - acc: 0.9905 - val_loss: 0.0997 - val_acc: 0.9840
Epoch 3/8
3277/3277 [==============================] - 1s - loss: 0.0292 - acc: 0.9924 - val_loss: 0.0840 - val_acc: 0.9900
Epoch 4/8
3277/3277 [==============================] - 1s - loss: 0.0375 - acc: 0.9905 - val_loss: 0.0862 - val_acc: 0.9900
Epoch 5/8
3277/3277 [==============================] - 1s - loss: 0.0218 - acc: 0.9939 - val_loss: 0.1155 - val_acc: 0.9880
Epoch 6/8
3277/3277 [==============================] - 1s - loss: 0.0292 - acc: 0.9905 - val_loss: 0.1033 - val_acc: 0.9900
Epoch 7/8
3277/3277 [==============================] - 1s - loss: 0.0173 - acc: 0.9921 - val_loss: 0.1030 - val_acc: 0.9900
Epoch 8/8
3277/3277 [==============================] - 1s - loss: 0.0186 - acc: 0.9936 - val_loss: 0.1058 - val_acc: 0.9900
<keras.callbacks.History at 0x7fb6738b0310>
Using the leakage did not improve the model, except in the early epochs. This is most likely because the boat a picture came from can be readily identified from the image itself, so the size meta-data adds no new information.
A Kaggle user has created bounding box annotations for each fish in each training set image. You can download them with get_annotations() below (the URLs point to attachments on the competition forum). We will see if we can utilize this additional information. First, we'll load in the data, and keep just the largest bounding box for each image.
import ujson as json
anno_classes = ['alb', 'bet', 'dol', 'lag', 'other', 'shark', 'yft']
def get_annotations():
    annot_urls = {
        '5458/bet_labels.json': 'bd20591439b650f44b36b72a98d3ce27',
        '5459/shark_labels.json': '94b1b3110ca58ff4788fb659eda7da90',
        '5460/dol_labels.json': '91a25d29a29b7e8b8d7a8770355993de',
        '5461/yft_labels.json': '9ef63caad8f076457d48a21986d81ddc',
        '5462/alb_labels.json': '731c74d347748b5272042f0661dad37c',
        '5463/lag_labels.json': '92d75d9218c3333ac31d74125f2b380a'
    }
    cache_subdir = os.path.abspath(os.path.join(path, 'annos'))
    url_prefix = 'https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/'
    if not os.path.exists(cache_subdir):
        os.makedirs(cache_subdir)
    for url_suffix, md5_hash in annot_urls.iteritems():
        fname = url_suffix.rsplit('/', 1)[-1]
        get_file(fname, url_prefix + url_suffix, cache_subdir=cache_subdir, md5_hash=md5_hash)
get_annotations()
Downloading data from https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/5463/lag_labels.json
16384/30731 [==============>...............] - ETA: 0s
Downloading data from https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/5459/shark_labels.json
49152/68097 [====================>.........] - ETA: 0s
Downloading data from https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/5461/yft_labels.json
212992/284511 [=====================>........] - ETA: 0s
Downloading data from https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/5458/bet_labels.json
49152/82471 [================>.............] - ETA: 0s
Downloading data from https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/5462/alb_labels.json
606208/775061 [======================>.......] - ETA: 0s
Downloading data from https://kaggle2.blob.core.windows.net/forum-message-attachments/147157/5460/dol_labels.json
16384/41584 [==========>...................] - ETA: 0s
bb_json = {}
for c in anno_classes:
    if c == 'other': continue  # no annotation file for the "other" class
    j = json.load(open('{}annos/{}_labels.json'.format(path, c), 'r'))
    for l in j:
        if 'annotations' in l.keys() and len(l['annotations']) > 0:
            bb_json[l['filename'].split('/')[-1]] = sorted(
                l['annotations'], key=lambda x: x['height']*x['width'])[-1]
bb_json['img_04908.jpg']
{u'class': u'rect', u'height': 246.75000000000074, u'width': 432.8700000000013, u'x': 465.3000000000014, u'y': 496.32000000000147}
file2idx = {o:i for i,o in enumerate(raw_filenames)}
val_file2idx = {o:i for i,o in enumerate(raw_val_filenames)}
For any images that have no annotations, we'll create an empty bounding box.
empty_bbox = {'height': 0., 'width': 0., 'x': 0., 'y': 0.}
for f in raw_filenames:
    if f not in bb_json: bb_json[f] = empty_bbox
for f in raw_val_filenames:
    if f not in bb_json: bb_json[f] = empty_bbox
Finally, we convert the dictionary into an array, and convert the coordinates to our resized 224x224 images.
bb_params = ['height', 'width', 'x', 'y']
def convert_bb(bb, size):
    bb = [bb[p] for p in bb_params]
    conv_x = (224. / size[0])
    conv_y = (224. / size[1])
    bb[0] = bb[0]*conv_y           # height scales with y
    bb[1] = bb[1]*conv_x           # width scales with x
    bb[2] = max(bb[2]*conv_x, 0)   # x, clipped to the image
    bb[3] = max(bb[3]*conv_y, 0)   # y, clipped to the image
    return bb
trn_bbox = np.stack([convert_bb(bb_json[f], s) for f,s in zip(raw_filenames, sizes)]
                   ).astype(np.float32)
val_bbox = np.stack([convert_bb(bb_json[f], s)
                     for f,s in zip(raw_val_filenames, raw_val_sizes)]).astype(np.float32)
Now we can check our work by drawing one of the annotations.
def create_rect(bb, color='red'):
    return plt.Rectangle((bb[2], bb[3]), bb[1], bb[0], color=color, fill=False, lw=3)
def show_bb(i):
    bb = val_bbox[i]
    plot(val[i])
    plt.gca().add_patch(create_rect(bb))
show_bb(0)
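plot and to_plot above are small utils.py helpers for displaying Theano-ordered (channels-first) image arrays with matplotlib; a minimal sketch, assuming the standard helpers:
def to_plot(img):
    # matplotlib expects channels last, so move axis 0 to the end
    return np.rollaxis(img, 0, 3).astype(np.uint8)
def plot(img):
    plt.imshow(to_plot(img))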
Since we're not allowed (by the Kaggle rules) to manually annotate the test set, we'll need to create a model that predicts the location of the bounding box in each image. To do so, we create a model with multiple outputs: it will predict both the type of fish (the 'class') and the 4 bounding box coordinates. We prefer this approach to predicting the bounding box coordinates alone, since we hope that giving the model more context about what it's looking for will help it with both tasks.
p=0.6
inp = Input(conv_layers[-1].output_shape[1:])
x = MaxPooling2D()(inp)
x = BatchNormalization(axis=1)(x)
x = Dropout(p/4)(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p)(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p/2)(x)
x_bb = Dense(4, name='bb')(x)
x_class = Dense(8, activation='softmax', name='class')(x)
Since we have multiple outputs, we need to provide them to the model constructor in an array, and we also need to say what loss function to use for each. We also weight the bounding box loss down by 1000x, since the scales of the cross-entropy and MSE losses are very different; scaling an early-epoch MSE of ~5000 by 0.001 gives a contribution of ~5, the same order of magnitude as the cross-entropy.
model = Model([inp], [x_bb, x_class])
model.compile(Adam(lr=0.001), loss=['mse', 'categorical_crossentropy'], metrics=['accuracy'],
loss_weights=[.001, 1.])
model.fit(conv_feat, [trn_bbox, trn_labels], batch_size=batch_size, nb_epoch=3,
validation_data=(conv_val_feat, [val_bbox, val_labels]))
Train on 3277 samples, validate on 500 samples
Epoch 1/3
3277/3277 [==============================] - 2s - loss: 6.1604 - bb_loss: 5030.2780 - class_loss: 1.1302 - bb_acc: 0.4007 - class_acc: 0.6710 - val_loss: 4.8844 - val_bb_loss: 4078.7171 - val_class_loss: 0.8057 - val_bb_acc: 0.4500 - val_class_acc: 0.8400
Epoch 2/3
3277/3277 [==============================] - 1s - loss: 5.0668 - bb_loss: 4743.1687 - class_loss: 0.3237 - bb_acc: 0.4809 - class_acc: 0.9057 - val_loss: 4.1392 - val_bb_loss: 3909.2589 - val_class_loss: 0.2300 - val_bb_acc: 0.5020 - val_class_acc: 0.9500
Epoch 3/3
3277/3277 [==============================] - 1s - loss: 4.4301 - bb_loss: 4272.6528 - class_loss: 0.1574 - bb_acc: 0.5362 - class_acc: 0.9503 - val_loss: 3.5525 - val_bb_loss: 3419.4472 - val_class_loss: 0.1331 - val_bb_acc: 0.5960 - val_class_acc: 0.9800
<keras.callbacks.History at 0x7fb666146050>
K.set_value(model.optimizer.lr, 1e-5)
model.fit(conv_feat, [trn_bbox, trn_labels], batch_size=batch_size, nb_epoch=10,
validation_data=(conv_val_feat, [val_bbox, val_labels]))
Train on 3277 samples, validate on 500 samples
Epoch 1/10
3277/3277 [==============================] - 1s - loss: 3.7350 - bb_loss: 3629.4665 - class_loss: 0.1055 - bb_acc: 0.5795 - class_acc: 0.9716 - val_loss: 3.0999 - val_bb_loss: 2971.5778 - val_class_loss: 0.1283 - val_bb_acc: 0.6280 - val_class_acc: 0.9800
Epoch 2/10
3277/3277 [==============================] - 1s - loss: 2.9639 - bb_loss: 2878.7579 - class_loss: 0.0851 - bb_acc: 0.6128 - class_acc: 0.9768 - val_loss: 2.4759 - val_bb_loss: 2343.8384 - val_class_loss: 0.1321 - val_bb_acc: 0.6140 - val_class_acc: 0.9800
Epoch 3/10
3277/3277 [==============================] - 1s - loss: 2.0806 - bb_loss: 2034.5301 - class_loss: 0.0461 - bb_acc: 0.6536 - class_acc: 0.9884 - val_loss: 1.8290 - val_bb_loss: 1723.2244 - val_class_loss: 0.1057 - val_bb_acc: 0.6600 - val_class_acc: 0.9800
Epoch 4/10
3277/3277 [==============================] - 1s - loss: 1.3482 - bb_loss: 1292.2710 - class_loss: 0.0559 - bb_acc: 0.6945 - class_acc: 0.9869 - val_loss: 1.1265 - val_bb_loss: 1000.0838 - val_class_loss: 0.1264 - val_bb_acc: 0.7420 - val_class_acc: 0.9760
Epoch 5/10
3277/3277 [==============================] - 1s - loss: 0.7963 - bb_loss: 738.2009 - class_loss: 0.0581 - bb_acc: 0.7592 - class_acc: 0.9835 - val_loss: 0.6311 - val_bb_loss: 510.5506 - val_class_loss: 0.1205 - val_bb_acc: 0.8040 - val_class_acc: 0.9760
Epoch 6/10
3277/3277 [==============================] - 1s - loss: 0.4994 - bb_loss: 458.9412 - class_loss: 0.0404 - bb_acc: 0.8071 - class_acc: 0.9899 - val_loss: 0.4939 - val_bb_loss: 378.9873 - val_class_loss: 0.1149 - val_bb_acc: 0.8460 - val_class_acc: 0.9820
Epoch 7/10
3277/3277 [==============================] - 1s - loss: 0.3590 - bb_loss: 325.8812 - class_loss: 0.0331 - bb_acc: 0.8142 - class_acc: 0.9918 - val_loss: 0.4052 - val_bb_loss: 301.7754 - val_class_loss: 0.1035 - val_bb_acc: 0.8580 - val_class_acc: 0.9840
Epoch 8/10
3277/3277 [==============================] - 1s - loss: 0.2903 - bb_loss: 268.2826 - class_loss: 0.0220 - bb_acc: 0.8166 - class_acc: 0.9945 - val_loss: 0.3705 - val_bb_loss: 279.8890 - val_class_loss: 0.0907 - val_bb_acc: 0.8620 - val_class_acc: 0.9860
Epoch 9/10
3277/3277 [==============================] - 1s - loss: 0.2753 - bb_loss: 250.3531 - class_loss: 0.0250 - bb_acc: 0.8053 - class_acc: 0.9927 - val_loss: 0.3722 - val_bb_loss: 260.1041 - val_class_loss: 0.1121 - val_bb_acc: 0.8600 - val_class_acc: 0.9820
Epoch 10/10
3277/3277 [==============================] - 1s - loss: 0.2570 - bb_loss: 231.1665 - class_loss: 0.0259 - bb_acc: 0.7943 - class_acc: 0.9930 - val_loss: 0.3726 - val_bb_loss: 249.7608 - val_class_loss: 0.1228 - val_bb_acc: 0.8460 - val_class_acc: 0.9780
<keras.callbacks.History at 0x7fb66f083990>
Excitingly, it turned out that the classification model is much improved by giving it this additional task. Let's see how well the bounding box model did by taking a look at its output.
pred = model.predict(conv_val_feat[0:10])
def show_bb_pred(i):
    bb = val_bbox[i]
    bb_pred = pred[0][i]
    plt.figure(figsize=(6,6))
    plot(val[i])
    ax = plt.gca()
    ax.add_patch(create_rect(bb_pred, 'yellow'))
    ax.add_patch(create_rect(bb))
The image shows that it can find fish that are tricky for us to see!
show_bb_pred(6)
model.evaluate(conv_val_feat, [val_bbox, val_labels])
480/500 [===========================>..] - ETA: 0s
[0.37256381034851072, 249.76081030273437, 0.12280298530682922, 0.84599999999999997, 0.97799999952316286]
model.save_weights(path+'models/bn_anno.h5')
model.load_weights(path+'models/bn_anno.h5')
Let's see if we get better results if we use larger images. We'll use 640x360, since it's the same shape as the most common size we saw earlier (1280x720), without being too big.
trn = get_data(path+'train', (360,640))
val = get_data(path+'valid', (360,640))
Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.
The image shows that things are much clearer at this size.
plot(trn[0])
test = get_data(path+'test', (360,640))
Found 1000 images belonging to 1 classes.
save_array(path+'results/trn_640.dat', trn)
save_array(path+'results/val_640.dat', val)
save_array(path+'results/test_640.dat', test)
trn = load_array(path+'results/trn_640.dat')
val = load_array(path+'results/val_640.dat')
We can now create our VGG model - we'll need to tell it we're not using the normal 224x224 images, which also means it won't include the fully connected layers (since they don't make sense for non-default sizes). We will also remove the last max pooling layer, since we don't want to throw away information yet.
vgg640 = Vgg16BN((360, 640)).model
vgg640.pop()
vgg640.input_shape, vgg640.output_shape
vgg640.compile(Adam(), 'categorical_crossentropy', metrics=['accuracy'])
We can now pre-compute the output of the convolutional part of VGG.
conv_val_feat = vgg640.predict(val, batch_size=32, verbose=1)
conv_trn_feat = vgg640.predict(trn, batch_size=32, verbose=1)
500/500 [==============================] - 57s
3277/3277 [==============================] - 390s
save_array(path+'results/conv_val_640.dat', conv_val_feat)
save_array(path+'results/conv_trn_640.dat', conv_trn_feat)
conv_test_feat = vgg640.predict(test, batch_size=32, verbose=1)
1000/1000 [==============================] - 115s
save_array(path+'results/conv_test_640.dat', conv_test_feat)
conv_val_feat = load_array(path+'results/conv_val_640.dat')
conv_trn_feat = load_array(path+'results/conv_trn_640.dat')
conv_test_feat = load_array(path+'results/conv_test_640.dat')
Since we're using a larger input, the output of the final convolutional layer is also larger. So we probably don't want to put a dense layer there - that would be a lot of parameters! Instead, let's use a fully convolutional net (FCN); this has the benefit that FCNs tend to generalize well, and it also seems like a good fit for our problem (since the fish are only a small part of the image).
conv_layers,_ = split_at(vgg640, Convolution2D)
I'm not using any dropout, since I found I got better results without it (p is set to 0 below, so the Dropout layer is a no-op).
nf=128; p=0.
def get_lrg_layers():
    return [
        BatchNormalization(axis=1, input_shape=conv_layers[-1].output_shape[1:]),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D((1,2)),
        Convolution2D(8,3,3, border_mode='same'),
        Dropout(p),
        GlobalAveragePooling2D(),
        Activation('softmax')
    ]
lrg_model = Sequential(get_lrg_layers())
lrg_model.summary()
lrg_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=2,
validation_data=(conv_val_feat, val_labels))
K.set_value(lrg_model.optimizer.lr, 1e-5)
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=6,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/6
3277/3277 [==============================] - 6s - loss: 0.0417 - acc: 0.9896 - val_loss: 0.2282 - val_acc: 0.9440
Epoch 2/6
3277/3277 [==============================] - 6s - loss: 0.0346 - acc: 0.9921 - val_loss: 0.1377 - val_acc: 0.9520
Epoch 3/6
3277/3277 [==============================] - 6s - loss: 0.0090 - acc: 0.9979 - val_loss: 0.1374 - val_acc: 0.9620
Epoch 4/6
3277/3277 [==============================] - 6s - loss: 0.0097 - acc: 0.9988 - val_loss: 0.1127 - val_acc: 0.9680
Epoch 5/6
3277/3277 [==============================] - 6s - loss: 0.0041 - acc: 0.9994 - val_loss: 0.0969 - val_acc: 0.9740
Epoch 6/6
3277/3277 [==============================] - 6s - loss: 8.9966e-04 - acc: 1.0000 - val_loss: 0.0976 - val_acc: 0.9760
<keras.callbacks.History at 0x7fb564eadf90>
When I submitted the results of this model to Kaggle, I got the best single-model results of any shown here (ranked 22nd on the leaderboard as of Dec 6, 2016).
lrg_model.save_weights(path+'models/lrg_nmp.h5')
lrg_model.load_weights(path+'models/lrg_nmp.h5')
lrg_model.evaluate(conv_val_feat, val_labels)
500/500 [==============================] - 0s
[0.097560357421636587, 0.97599999999999998]
Another benefit of this kind of model is that the last convolutional layer has to learn to classify each part of the image (since there's only an average pooling layer after). Let's create a function that grabs the output of this layer (which is the 4th-last layer of our model).
l = lrg_model.layers
conv_fn = K.function([l[0].input, K.learning_phase()], l[-4].output)
def get_cm(inp, label):
    conv = conv_fn([inp,0])[0, label]
    return scipy.misc.imresize(conv, (360,640), interp='nearest')
We have to add an extra dimension to our input since the CNN expects a 'batch' (even if it's just a batch of one).
inp = np.expand_dims(conv_val_feat[0], 0)
np.round(lrg_model.predict(inp)[0],2)
array([ 0.82, 0. , 0. , 0. , 0.17, 0. , 0. , 0. ], dtype=float32)
plt.imshow(to_plot(val[0]))
<matplotlib.image.AxesImage at 0x7faba62ae650>
cm = get_cm(inp, 0)
The heatmap shows that (at very low resolution) the model is finding the fish!
plt.imshow(cm, cmap="cool")
<matplotlib.image.AxesImage at 0x7faba61f8210>
To create a higher resolution heatmap, we'll remove all the max pooling layers, and repeat the previous steps.
def get_lrg_layers():
    return [
        BatchNormalization(axis=1, input_shape=conv_layers[-1].output_shape[1:]),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        Convolution2D(8,3,3, border_mode='same'),
        GlobalAveragePooling2D(),
        Activation('softmax')
    ]
lrg_model = Sequential(get_lrg_layers())
lrg_model.summary()
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
batchnormalization_2 (BatchNorma (None, 512, 22, 40)   1024        batchnormalization_input_1[0][0]
____________________________________________________________________________________________________
convolution2d_14 (Convolution2D) (None, 128, 22, 40)   589952      batchnormalization_2[0][0]
____________________________________________________________________________________________________
batchnormalization_3 (BatchNorma (None, 128, 22, 40)   256         convolution2d_14[0][0]
____________________________________________________________________________________________________
convolution2d_15 (Convolution2D) (None, 128, 22, 40)   147584      batchnormalization_3[0][0]
____________________________________________________________________________________________________
batchnormalization_4 (BatchNorma (None, 128, 22, 40)   256         convolution2d_15[0][0]
____________________________________________________________________________________________________
convolution2d_16 (Convolution2D) (None, 128, 22, 40)   147584      batchnormalization_4[0][0]
____________________________________________________________________________________________________
batchnormalization_5 (BatchNorma (None, 128, 22, 40)   256         convolution2d_16[0][0]
____________________________________________________________________________________________________
convolution2d_17 (Convolution2D) (None, 8, 22, 40)     9224        batchnormalization_5[0][0]
____________________________________________________________________________________________________
globalaveragepooling2d_1 (Global (None, 8)             0           convolution2d_17[0][0]
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 8)             0           globalaveragepooling2d_1[0][0]
====================================================================================================
Total params: 896136
____________________________________________________________________________________________________
lrg_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=2,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/2
3277/3277 [==============================] - 11s - loss: 0.9377 - acc: 0.7128 - val_loss: 2.7022 - val_acc: 0.6260
Epoch 2/2
3277/3277 [==============================] - 10s - loss: 0.2603 - acc: 0.9271 - val_loss: 0.4385 - val_acc: 0.9080
<keras.callbacks.History at 0x7faba24a3b90>
K.set_value(lrg_model.optimizer.lr, 1e-5)
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=6,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/6
3277/3277 [==============================] - 11s - loss: 0.1027 - acc: 0.9747 - val_loss: 0.3641 - val_acc: 0.9060
Epoch 2/6
3277/3277 [==============================] - 10s - loss: 0.0498 - acc: 0.9844 - val_loss: 0.2743 - val_acc: 0.9200
Epoch 3/6
3277/3277 [==============================] - 10s - loss: 0.0359 - acc: 0.9918 - val_loss: 0.2262 - val_acc: 0.9520
Epoch 4/6
3277/3277 [==============================] - 11s - loss: 0.0339 - acc: 0.9912 - val_loss: 0.1877 - val_acc: 0.9540
Epoch 5/6
3277/3277 [==============================] - 10s - loss: 0.0242 - acc: 0.9945 - val_loss: 0.2320 - val_acc: 0.9460
Epoch 6/6
3277/3277 [==============================] - 10s - loss: 0.0211 - acc: 0.9930 - val_loss: 0.1813 - val_acc: 0.9520
<keras.callbacks.History at 0x7faba24a8450>
lrg_model.save_weights(path+'models/lrg_0mp.h5')
lrg_model.load_weights(path+'models/lrg_0mp.h5')
l = lrg_model.layers
conv_fn = K.function([l[0].input, K.learning_phase()], l[-3].output)
def get_cm2(inp, label):
    conv = conv_fn([inp,0])[0, label]
    return scipy.misc.imresize(conv, (360,640))
inp = np.expand_dims(conv_val_feat[0], 0)
plt.imshow(to_plot(val[0]))
<matplotlib.image.AxesImage at 0x7faba0f47390>
cm = get_cm2(inp, 0)
cm = get_cm2(inp, 4)
plt.imshow(cm, cmap="cool")
<matplotlib.image.AxesImage at 0x7faba09bd350>
plt.figure(figsize=(10,10))
plot(val[0])
plt.imshow(cm, cmap="cool", alpha=0.5)
<matplotlib.image.AxesImage at 0x7faba24a3c50>
Here's an example of how to create and use "inception blocks" - as you see, they use multiple different convolution filter sizes and concatenate the results together. We'll talk more about these next year.
def conv2d_bn(x, nb_filter, nb_row, nb_col, subsample=(1, 1)):
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample, activation='relu', border_mode='same')(x)
    return BatchNormalization(axis=1)(x)

def incep_block(x):
    branch1x1 = conv2d_bn(x, 32, 1, 1, subsample=(2, 2))
    branch5x5 = conv2d_bn(x, 24, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 32, 5, 5, subsample=(2, 2))
    branch3x3dbl = conv2d_bn(x, 32, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 48, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 48, 3, 3, subsample=(2, 2))
    branch_pool = AveragePooling2D(
        (3, 3), strides=(2, 2), border_mode='same')(x)
    branch_pool = conv2d_bn(branch_pool, 16, 1, 1)
    return merge([branch1x1, branch5x5, branch3x3dbl, branch_pool],
                 mode='concat', concat_axis=1)
inp = Input(vgg640.layers[-1].output_shape[1:])
x = BatchNormalization(axis=1)(inp)
x = incep_block(x)
x = incep_block(x)
x = incep_block(x)
x = Dropout(0.75)(x)
x = Convolution2D(8,3,3, border_mode='same')(x)
x = GlobalAveragePooling2D()(x)
outp = Activation('softmax')(x)
lrg_model = Model([inp], outp)
lrg_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=2,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/2
3277/3277 [==============================] - 6s - loss: 1.3251 - acc: 0.5536 - val_loss: 1.4769 - val_acc: 0.5080
Epoch 2/2
3277/3277 [==============================] - 6s - loss: 0.4601 - acc: 0.8691 - val_loss: 0.9860 - val_acc: 0.7460
<keras.callbacks.History at 0x7fb61d3f30d0>
K.set_value(lrg_model.optimizer.lr, 1e-5)
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=6,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/6
3277/3277 [==============================] - 6s - loss: 0.0260 - acc: 0.9945 - val_loss: 0.2117 - val_acc: 0.9480
Epoch 2/6
3277/3277 [==============================] - 6s - loss: 0.0240 - acc: 0.9957 - val_loss: 0.3007 - val_acc: 0.9280
Epoch 3/6
3277/3277 [==============================] - 6s - loss: 0.0120 - acc: 0.9976 - val_loss: 0.2506 - val_acc: 0.9500
Epoch 4/6
3277/3277 [==============================] - 6s - loss: 0.0060 - acc: 0.9991 - val_loss: 0.2389 - val_acc: 0.9480
Epoch 5/6
3277/3277 [==============================] - 6s - loss: 0.0029 - acc: 1.0000 - val_loss: 0.2160 - val_acc: 0.9580
Epoch 6/6
3277/3277 [==============================] - 6s - loss: 0.0028 - acc: 0.9991 - val_loss: 0.2116 - val_acc: 0.9580
<keras.callbacks.History at 0x7fb62b9ffe10>
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=10,
validation_data=(conv_val_feat, val_labels))
Train on 3277 samples, validate on 500 samples
Epoch 1/10
3277/3277 [==============================] - 6s - loss: 0.0029 - acc: 1.0000 - val_loss: 0.1610 - val_acc: 0.9540
Epoch 2/10
3277/3277 [==============================] - 6s - loss: 0.0016 - acc: 1.0000 - val_loss: 0.1313 - val_acc: 0.9540
Epoch 3/10
3277/3277 [==============================] - 6s - loss: 0.0995 - acc: 0.9707 - val_loss: 0.4845 - val_acc: 0.8760
Epoch 4/10
3277/3277 [==============================] - 6s - loss: 0.1335 - acc: 0.9551 - val_loss: 0.3103 - val_acc: 0.9300
Epoch 5/10
3277/3277 [==============================] - 6s - loss: 0.0634 - acc: 0.9780 - val_loss: 0.2923 - val_acc: 0.9340
Epoch 6/10
3277/3277 [==============================] - 6s - loss: 0.0205 - acc: 0.9930 - val_loss: 0.2316 - val_acc: 0.9500
Epoch 7/10
3277/3277 [==============================] - 6s - loss: 0.0049 - acc: 0.9997 - val_loss: 0.2048 - val_acc: 0.9660
Epoch 8/10
3277/3277 [==============================] - 6s - loss: 0.0016 - acc: 1.0000 - val_loss: 0.1842 - val_acc: 0.9680
Epoch 9/10
3277/3277 [==============================] - 6s - loss: 0.0011 - acc: 1.0000 - val_loss: 0.1799 - val_acc: 0.9660
Epoch 10/10
3277/3277 [==============================] - 6s - loss: 7.4032e-04 - acc: 1.0000 - val_loss: 0.1740 - val_acc: 0.9640
<keras.callbacks.History at 0x7fb654a67650>
# Save under a new name so we don't overwrite the earlier FCN weights in lrg_nmp.h5
lrg_model.save_weights(path+'models/lrg_incep.h5')
lrg_model.load_weights(path+'models/lrg_incep.h5')
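The remaining cells use pseudo-labeling: we predict labels for the (unlabelled) test set, and mix those predicted labels into our training batches. MixIterator comes from utils.py; here is a minimal sketch of the idea, assuming it simply draws one batch from each underlying iterator and concatenates the results:
class MixIterator(object):
    def __init__(self, iters):
        self.iters = iters
        self.N = sum(it.N for it in iters)  # total samples per pass
    def __iter__(self):
        return self
    def next(self):
        # One batch from each source (training, pseudo-labeled test,
        # validation), concatenated into a single combined batch
        batches = [next(it) for it in self.iters]
        return (np.concatenate([b[0] for b in batches]),
                np.concatenate([b[1] for b in batches]))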
# bn_model supplies the pseudo-labels; conv_test_feat must be the 224x224 VGG
# features computed earlier (reload results/conv_test_feat.dat if necessary)
preds = bn_model.predict(conv_test_feat, batch_size=batch_size*2)
gen = image.ImageDataGenerator()
test_batches = gen.flow(conv_test_feat, preds, batch_size=16)
val_batches = gen.flow(conv_val_feat, val_labels, batch_size=4)
batches = gen.flow(conv_feat, trn_labels, batch_size=44)
mi = MixIterator([batches, test_batches, val_batches])
bn_model.fit_generator(mi, mi.N, nb_epoch=8, validation_data=(conv_val_feat, val_labels))
Epoch 1/8
4833/4777 [==============================] - 4s - loss: 0.2538 - acc: 0.9462 - val_loss: 0.1313 - val_acc: 0.9700
Epoch 2/8
192/4777 [>.............................] - ETA: 2s - loss: 0.1972 - acc: 0.9635
/usr/local/lib/python2.7/dist-packages/keras/engine/training.py:1494: UserWarning: Epoch comprised more than `samples_per_epoch` samples, which might affect learning results. Set `samples_per_epoch` correctly to avoid this warning.
  warnings.warn('Epoch comprised more than '
4833/4777 [==============================] - 4s - loss: 0.2231 - acc: 0.9491 - val_loss: 0.0820 - val_acc: 0.9820
Epoch 3/8
4833/4777 [==============================] - 3s - loss: 0.1860 - acc: 0.9545 - val_loss: 0.0580 - val_acc: 0.9840
Epoch 4/8
4833/4777 [==============================] - 3s - loss: 0.1559 - acc: 0.9663 - val_loss: 0.0521 - val_acc: 0.9840
Epoch 5/8
4825/4777 [==============================] - 3s - loss: 0.1366 - acc: 0.9693 - val_loss: 0.0422 - val_acc: 0.9840
Epoch 6/8
4833/4777 [==============================] - 3s - loss: 0.1303 - acc: 0.9704 - val_loss: 0.0195 - val_acc: 0.9940
Epoch 7/8
4833/4777 [==============================] - 3s - loss: 0.1283 - acc: 0.9708 - val_loss: 0.0133 - val_acc: 0.9940
Epoch 8/8
4833/4777 [==============================] - 4s - loss: 0.1199 - acc: 0.9752 - val_loss: 0.0247 - val_acc: 0.9900
<keras.callbacks.History at 0x7f8a38b07e50>
def do_clip(arr, mx): return np.clip(arr, (1-mx)/7, mx)
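do_clip keeps the submitted probabilities away from 0 and 1, limiting the log-loss penalty for confident mistakes. As a quick sanity check of the bounds (a hypothetical example, not from the original run): with mx=0.82 the floor is (1-0.82)/7 ≈ 0.025714, which is exactly the value filling the non-predicted columns in the submission preview below.
do_clip(np.array([[1., 0., 0., 0., 0., 0., 0., 0.]]), 0.82)
# -> array([[ 0.82, 0.025714, 0.025714, 0.025714, 0.025714, 0.025714, 0.025714, 0.025714]])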
lrg_model.evaluate(conv_val_feat, val_labels, batch_size*2)
500/500 [==============================] - 0s
[0.11417267167568207, 0.97199999332427978]
# For the submission we use the class output of the bounding-box model
# (output 0 is the predicted box; output 1 is the class probabilities)
preds = model.predict(conv_test_feat, batch_size=batch_size)
preds = preds[1]
# Leftovers from experimenting with other models, commented out since they
# would overwrite preds with conv features rather than class probabilities:
# test = load_array(path+'results/test_640.dat')
# test = load_array(path+'results/test.dat')
# preds = conv_model.predict(test, batch_size=32)
subm = do_clip(preds,0.82)
subm_name = path+'results/subm_bb.gz'
# classes = sorted(batches.class_indices, key=batches.class_indices.get)
# (batches now flows from arrays, which have no class_indices, so we hardcode them)
classes = ['ALB', 'BET', 'DOL', 'LAG', 'NoF', 'OTHER', 'SHARK', 'YFT']
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'image', raw_test_filenames)
submission.head()
|   | image         | ALB      | BET      | DOL      | LAG      | NoF      | OTHER    | SHARK    | YFT      |
|---|---------------|----------|----------|----------|----------|----------|----------|----------|----------|
| 0 | img_00005.jpg | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.820000 | 0.025714 | 0.025714 | 0.025714 |
| 1 | img_00007.jpg | 0.820000 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 |
| 2 | img_00009.jpg | 0.820000 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 |
| 3 | img_00018.jpg | 0.457916 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.539635 | 0.025714 | 0.025714 |
| 4 | img_00027.jpg | 0.820000 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.025714 | 0.102664 |
submission.to_csv(subm_name, index=False, compression='gzip')
FileLink(subm_name)