from theano.sandbox import cuda
cuda.use('gpu1')
Using gpu device 1: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
%matplotlib inline
from __future__ import print_function, division
#path = "data/state/"
path = "data/state/sample/"
import utils; reload(utils)
from utils import *
from IPython.display import FileLink
Using Theano backend.
batch_size=64
The following assumes you've already created your validation set - remember that the training and validation sets should contain different drivers, as mentioned on the Kaggle competition page.
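If you haven't built that split yet, here's a minimal sketch of one way to do it, using the competition's driver_imgs_list.csv (which maps each training image to a driver). The hold-out driver IDs below are purely illustrative - pick a few of your own:
import os, pandas as pd
from shutil import move

drivers = pd.read_csv('data/state/driver_imgs_list.csv')  # columns: subject, classname, img
val_drivers = ['p012', 'p014', 'p015']                    # hypothetical hold-out drivers
for _, row in drivers[drivers.subject.isin(val_drivers)].iterrows():
    src = 'data/state/train/{}/{}'.format(row.classname, row.img)
    dst = 'data/state/valid/{}/{}'.format(row.classname, row.img)
    if not os.path.exists(os.path.dirname(dst)): os.makedirs(os.path.dirname(dst))
    move(src, dst)   # move (not copy) so no driver appears in both sets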
%cd data/state
%cd train
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid
for d in glob('c?'):
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)
from shutil import copyfile
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/' + shuf[i])
%cd ../valid
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/' + shuf[i])
%cd ../../..
%mkdir data/state/results
%mkdir data/state/sample/test
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)
Found 1568 images belonging to 10 classes.
Found 1002 images belonging to 10 classes.
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames,
test_filename) = get_classes(path)
Found 1568 images belonging to 10 classes.
Found 1002 images belonging to 10 classes.
Found 0 images belonging to 0 classes.
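(The third line is the test set: we created sample/test above but haven't copied any test images into it, so it's empty for now.)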
First, we try the simplest model and use default parameters. Note the trick of making the first layer a batchnorm layer - that way we don't have to worry about normalizing the input ourselves.
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
As you can see below, this training is going nowhere...
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/2
1568/1568 [==============================] - 20s - loss: 13.8189 - acc: 0.1040 - val_loss: 13.5792 - val_acc: 0.1517
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 14.4052 - acc: 0.1052 - val_loss: 13.8349 - val_acc: 0.1397
<keras.callbacks.History at 0x7fbf34d441d0>
Let's first check the number of parameters to see that there are enough parameters to find some useful relationships:
model.summary()
____________________________________________________________________________________________________
Layer (type)                                 Output Shape         Param #     Connected to
====================================================================================================
batchnormalization_65 (BatchNormalization)   (None, 3, 224, 224)  6           batchnormalization_input_23[0][0]
____________________________________________________________________________________________________
flatten_23 (Flatten)                         (None, 150528)       0           batchnormalization_65[0][0]
____________________________________________________________________________________________________
dense_39 (Dense)                             (None, 10)           1505290     flatten_23[0][0]
====================================================================================================
Total params: 1505296
____________________________________________________________________________________________________
Over 1.5 million parameters - that should be enough. Incidentally, it's worth checking that you understand why this is the number of parameters in this layer:
10*3*224*224
1505280
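Add the 10 bias terms and you get the 1,505,290 parameters shown for the dense layer in the summary above.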
Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a zero prediction to as many classes as possible - that's the best approach for a model that is no better than random, and that's likely where we would end up with a high learning rate. So let's check:
np.round(model.predict_generator(batches, batches.N)[:10],2)
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]], dtype=float32)
Our hypothesis was correct. It's nearly always predicting class 1 or 6, with very high confidence. So let's try a lower learning rate:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/2
1568/1568 [==============================] - 7s - loss: 2.4180 - acc: 0.1575 - val_loss: 5.2975 - val_acc: 0.1477
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 1.7690 - acc: 0.4196 - val_loss: 4.0165 - val_acc: 0.1926
<keras.callbacks.History at 0x7fbf22eda150>
Great - we found our way out of that hole... Now we can increase the learning rate and see where we can get to.
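One explicit way to do that on an already-compiled Keras 1 model is to update the optimizer's backend variable (a minimal sketch; the cells below simply assign to the attribute instead):
from keras import backend as K

# The learning rate of a compiled Keras 1 optimizer lives in a backend variable,
# so it can be changed in place and read back like this:
K.set_value(model.optimizer.lr, 1e-3)
print(K.get_value(model.optimizer.lr))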
model.optimizer.lr=0.001
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.3763 - acc: 0.5816 - val_loss: 2.5994 - val_acc: 0.2884
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 1.0961 - acc: 0.7136 - val_loss: 1.9945 - val_acc: 0.3902
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.9395 - acc: 0.7730 - val_loss: 1.9828 - val_acc: 0.3822
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.7894 - acc: 0.8323 - val_loss: 1.8041 - val_acc: 0.3962
<keras.callbacks.History at 0x7fbf2294e210>
We're stabilizing at a validation accuracy of about 0.39. Not great, but a lot better than random. Before moving on, let's check that our validation set on the sample is large enough to give consistent results:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)
Found 1002 images belonging to 10 classes.
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]
np.round(val_res, 2)
array([[ 4.4 ,  0.49],
       [ 4.57,  0.49],
       [ 4.48,  0.48],
       [ 4.28,  0.51],
       [ 4.66,  0.48],
       [ 4.5 ,  0.49],
       [ 4.46,  0.49],
       [ 4.51,  0.47],
       [ 4.45,  0.51],
       [ 4.47,  0.49]])
Yup, pretty consistent - if we see improvements of 3% or more, it's probably not random, based on the above samples.
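To put a rough number on that spread, here's a quick check on the val_res list computed above (exact figures will vary slightly between runs):
accs = np.array(val_res)[:, 1]                 # validation accuracies from the ten runs
print('mean %.3f, std %.3f' % (accs.mean(), accs.std()))
# On the rounded values shown above this is roughly 0.49 +/- 0.01, so a ~3% jump
# is a couple of standard deviations - unlikely to be just noise.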
The previous model is over-fitting a lot, but we can't use dropout since we only have one layer. We can try to decrease overfitting by adding L2 regularization (i.e. adding the sum of squares of the weights to our loss function):
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax', W_regularizer=l2(0.01))
    ])
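For reference, W_regularizer=l2(0.01) asks Keras to add a penalty proportional to the sum of squared weights of that layer to the loss being minimized. Conceptually (a sketch of the idea, not the library's internal code):
from keras import backend as K

def l2_penalty(W, lam=0.01):
    # The extra term contributed to the training loss by l2(lam)
    # for a layer with weight matrix W.
    return lam * K.sum(K.square(W))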
model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/2
1568/1568 [==============================] - 7s - loss: 5.7173 - acc: 0.2583 - val_loss: 14.5162 - val_acc: 0.0988
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 2.5953 - acc: 0.6148 - val_loss: 4.8340 - val_acc: 0.3952
<keras.callbacks.History at 0x7fbf05ab7190>
model.optimizer.lr=0.001
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.5759 - acc: 0.8355 - val_loss: 4.3326 - val_acc: 0.3902
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 0.9414 - acc: 0.8552 - val_loss: 3.5898 - val_acc: 0.3872
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.4152 - acc: 0.9401 - val_loss: 2.3976 - val_acc: 0.4780
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.3282 - acc: 0.9726 - val_loss: 2.3441 - val_acc: 0.5100
<keras.callbacks.History at 0x7fbf1a862dd0>
Looks like we can get a bit over 50% accuracy this way. This will be a good benchmark for our future models - if we can't beat 50%, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.
The next simplest model is to add a single hidden layer.
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/2
1568/1568 [==============================] - 7s - loss: 2.0182 - acc: 0.3412 - val_loss: 3.4769 - val_acc: 0.2435
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 1.0104 - acc: 0.7379 - val_loss: 2.2270 - val_acc: 0.4361
Epoch 1/5
1568/1568 [==============================] - 7s - loss: 0.5350 - acc: 0.9043 - val_loss: 1.8474 - val_acc: 0.4621
Epoch 2/5
1568/1568 [==============================] - 5s - loss: 0.3459 - acc: 0.9458 - val_loss: 1.9591 - val_acc: 0.4222
Epoch 3/5
1568/1568 [==============================] - 5s - loss: 0.2296 - acc: 0.9802 - val_loss: 1.7887 - val_acc: 0.4441
Epoch 4/5
1568/1568 [==============================] - 5s - loss: 0.1591 - acc: 0.9936 - val_loss: 1.6847 - val_acc: 0.4830
Epoch 5/5
1568/1568 [==============================] - 5s - loss: 0.1204 - acc: 0.9943 - val_loss: 1.6344 - val_acc: 0.4910
<keras.callbacks.History at 0x7fbf0da4f990>
Not looking very encouraging... which isn't surprising since we know that CNNs are a much better choice for computer vision problems. So we'll try one.
Two conv layers with max pooling, followed by a simple dense network, make a good simple CNN to start with:
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Convolution2D(32,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Convolution2D(64,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])
    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    model.optimizer.lr = 0.001
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    return model
conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.3664 - acc: 0.6020 - val_loss: 1.8697 - val_acc: 0.3932
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.3201 - acc: 0.9388 - val_loss: 2.2294 - val_acc: 0.1537
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.0862 - acc: 0.9911 - val_loss: 2.5230 - val_acc: 0.1517
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.0350 - acc: 0.9994 - val_loss: 2.8057 - val_acc: 0.1497
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.0201 - acc: 1.0000 - val_loss: 2.9036 - val_acc: 0.1607
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.0124 - acc: 1.0000 - val_loss: 2.9390 - val_acc: 0.1647
<keras.models.Sequential at 0x7fbe53519190>
The model is very rapidly reaching a very high accuracy on the training set. So if we could regularize this, perhaps we could get a reasonable result.
So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.
To find the best data augmentation parameters, we can try each type of data augmentation, one at a time. For each type, we can try four very different levels of augmentation, and see which is the best. In the steps below we've only kept the single best result we found. We're using the CNN we defined above, since we have already observed it can model the data quickly and accurately.
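The search itself looked roughly like the sketch below; the candidate levels here are purely illustrative, not the exact values we tried.
# Hypothetical sweep for one augmentation type (width shift); repeat the same
# pattern for height shift, shear, rotation and channel shift.
for w in [0.05, 0.1, 0.2, 0.4]:
    gen_t = image.ImageDataGenerator(width_shift_range=w)
    batches = get_batches(path+'train', gen_t, batch_size=batch_size)
    conv1(batches)   # compare the final val_acc printed for each setting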
Width shift: move the image left and right -
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
Found 1568 images belonging to 10 classes.
model = conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 2.1802 - acc: 0.3316 - val_loss: 2.9037 - val_acc: 0.1038
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 1.0996 - acc: 0.6862 - val_loss: 2.1270 - val_acc: 0.2495
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.6856 - acc: 0.8106 - val_loss: 2.1610 - val_acc: 0.1487
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.4989 - acc: 0.8693 - val_loss: 2.0959 - val_acc: 0.2525
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.3715 - acc: 0.9120 - val_loss: 2.1168 - val_acc: 0.2385
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.2916 - acc: 0.9254 - val_loss: 2.1028 - val_acc: 0.3044
Height shift: move the image up and down -
gen_t = image.ImageDataGenerator(height_shift_range=0.05)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
Found 1568 images belonging to 10 classes.
model = conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.7843 - acc: 0.4458 - val_loss: 2.1259 - val_acc: 0.2375
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.7028 - acc: 0.7825 - val_loss: 2.0232 - val_acc: 0.3164
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.3586 - acc: 0.9152 - val_loss: 2.1772 - val_acc: 0.1806
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.2335 - acc: 0.9490 - val_loss: 2.1935 - val_acc: 0.1727
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.1626 - acc: 0.9656 - val_loss: 2.1944 - val_acc: 0.2106
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.1214 - acc: 0.9758 - val_loss: 2.3481 - val_acc: 0.1766
Random shear angles (max in radians) -
gen_t = image.ImageDataGenerator(shear_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
Found 1568 images belonging to 10 classes.
model = conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.6148 - acc: 0.5223 - val_loss: 2.2513 - val_acc: 0.2475
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.3915 - acc: 0.9203 - val_loss: 2.0757 - val_acc: 0.2725
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.1478 - acc: 0.9821 - val_loss: 2.1869 - val_acc: 0.3084
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.0831 - acc: 0.9904 - val_loss: 2.2449 - val_acc: 0.3164
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.0530 - acc: 0.9955 - val_loss: 2.2426 - val_acc: 0.3154
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.0343 - acc: 0.9994 - val_loss: 2.2609 - val_acc: 0.3234
Rotation: max in degrees -
gen_t = image.ImageDataGenerator(rotation_range=15)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
Found 1568 images belonging to 10 classes.
model = conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.9734 - acc: 0.3865 - val_loss: 2.1849 - val_acc: 0.3064
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.8523 - acc: 0.7411 - val_loss: 2.0310 - val_acc: 0.2655
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.4652 - acc: 0.8833 - val_loss: 2.0401 - val_acc: 0.2036
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.3448 - acc: 0.9101 - val_loss: 2.2149 - val_acc: 0.1317
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.2411 - acc: 0.9420 - val_loss: 2.2614 - val_acc: 0.1287
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.1722 - acc: 0.9636 - val_loss: 2.1208 - val_acc: 0.2106
Channel shift: randomly changing the R,G,B colors -
gen_t = image.ImageDataGenerator(channel_shift_range=20)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
Found 1568 images belonging to 10 classes.
model = conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.6381 - acc: 0.5191 - val_loss: 2.2146 - val_acc: 0.3483
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.3530 - acc: 0.9305 - val_loss: 2.0966 - val_acc: 0.2665
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.1036 - acc: 0.9923 - val_loss: 2.4195 - val_acc: 0.1766
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.0450 - acc: 1.0000 - val_loss: 2.6192 - val_acc: 0.1667
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.0259 - acc: 0.9994 - val_loss: 2.7227 - val_acc: 0.1816
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.0180 - acc: 0.9994 - val_loss: 2.7049 - val_acc: 0.2206
And finally, putting it all together!
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
Found 1568 images belonging to 10 classes.
model = conv1(batches)
Epoch 1/2
1568/1568 [==============================] - 12s - loss: 2.4533 - acc: 0.2258 - val_loss: 2.1042 - val_acc: 0.2265
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 1.7107 - acc: 0.4305 - val_loss: 2.1321 - val_acc: 0.2295
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.4329 - acc: 0.5478 - val_loss: 2.3451 - val_acc: 0.1427
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 1.2623 - acc: 0.5918 - val_loss: 2.4122 - val_acc: 0.1088
At first glance this isn't looking encouraging, since validation accuracy is poor and getting worse. But the training set is getting better, and still has a long way to go in accuracy - so we should try annealing our learning rate and running more epochs before we make any decisions.
model.optimizer.lr = 0.0001
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/5
1568/1568 [==============================] - 11s - loss: 1.1570 - acc: 0.6282 - val_loss: 2.4787 - val_acc: 0.1048
Epoch 2/5
1568/1568 [==============================] - 11s - loss: 1.0278 - acc: 0.6582 - val_loss: 2.4211 - val_acc: 0.1267
Epoch 3/5
1568/1568 [==============================] - 11s - loss: 0.9459 - acc: 0.6939 - val_loss: 2.5656 - val_acc: 0.1477
Epoch 4/5
1568/1568 [==============================] - 11s - loss: 0.9045 - acc: 0.6996 - val_loss: 2.2994 - val_acc: 0.2365
Epoch 5/5
1568/1568 [==============================] - 11s - loss: 0.8346 - acc: 0.7360 - val_loss: 2.1203 - val_acc: 0.2705
<keras.callbacks.History at 0x7fe52f0bef90>
Lucky we tried that - we're starting to make progress! Let's keep going.
model.fit_generator(batches, batches.nb_sample, nb_epoch=25, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Epoch 1/25
1568/1568 [==============================] - 11s - loss: 0.8055 - acc: 0.7423 - val_loss: 2.0895 - val_acc: 0.2984
Epoch 2/25
1568/1568 [==============================] - 11s - loss: 0.7538 - acc: 0.7621 - val_loss: 1.8985 - val_acc: 0.4212
Epoch 3/25
1568/1568 [==============================] - 11s - loss: 0.7037 - acc: 0.7774 - val_loss: 1.7200 - val_acc: 0.4411
Epoch 4/25
1568/1568 [==============================] - 11s - loss: 0.6865 - acc: 0.7966 - val_loss: 1.5225 - val_acc: 0.5180
Epoch 5/25
1568/1568 [==============================] - 11s - loss: 0.6404 - acc: 0.8036 - val_loss: 1.3924 - val_acc: 0.5319
Epoch 6/25
1568/1568 [==============================] - 11s - loss: 0.6116 - acc: 0.8144 - val_loss: 1.4472 - val_acc: 0.5259
Epoch 7/25
1568/1568 [==============================] - 11s - loss: 0.5671 - acc: 0.8361 - val_loss: 1.4703 - val_acc: 0.5549
Epoch 8/25
1568/1568 [==============================] - 11s - loss: 0.5559 - acc: 0.8265 - val_loss: 1.2402 - val_acc: 0.6337
Epoch 9/25
1568/1568 [==============================] - 11s - loss: 0.5434 - acc: 0.8406 - val_loss: 1.2765 - val_acc: 0.6297
Epoch 10/25
1568/1568 [==============================] - 11s - loss: 0.4877 - acc: 0.8533 - val_loss: 1.2366 - val_acc: 0.6267
Epoch 11/25
1568/1568 [==============================] - 11s - loss: 0.4944 - acc: 0.8406 - val_loss: 1.3992 - val_acc: 0.5349
Epoch 12/25
1568/1568 [==============================] - 11s - loss: 0.4694 - acc: 0.8597 - val_loss: 1.1821 - val_acc: 0.6277
Epoch 13/25
1568/1568 [==============================] - 11s - loss: 0.4251 - acc: 0.8858 - val_loss: 1.1803 - val_acc: 0.6427
Epoch 14/25
1568/1568 [==============================] - 11s - loss: 0.4501 - acc: 0.8680 - val_loss: 1.2752 - val_acc: 0.5908
Epoch 15/25
1568/1568 [==============================] - 11s - loss: 0.3922 - acc: 0.8846 - val_loss: 1.1758 - val_acc: 0.6457
Epoch 16/25
1568/1568 [==============================] - 11s - loss: 0.4406 - acc: 0.8629 - val_loss: 1.3147 - val_acc: 0.5808
Epoch 17/25
1568/1568 [==============================] - 11s - loss: 0.4075 - acc: 0.8788 - val_loss: 1.2941 - val_acc: 0.6148
Epoch 18/25
1568/1568 [==============================] - 11s - loss: 0.3890 - acc: 0.8948 - val_loss: 1.1871 - val_acc: 0.6567
Epoch 19/25
1568/1568 [==============================] - 11s - loss: 0.3708 - acc: 0.8890 - val_loss: 1.1560 - val_acc: 0.6756
Epoch 20/25
1568/1568 [==============================] - 11s - loss: 0.3539 - acc: 0.8973 - val_loss: 1.2621 - val_acc: 0.6537
Epoch 21/25
1568/1568 [==============================] - 11s - loss: 0.3582 - acc: 0.8909 - val_loss: 1.1357 - val_acc: 0.6677
Epoch 22/25
1568/1568 [==============================] - 11s - loss: 0.3232 - acc: 0.9056 - val_loss: 1.2114 - val_acc: 0.6287
Epoch 23/25
1568/1568 [==============================] - 11s - loss: 0.3286 - acc: 0.9011 - val_loss: 1.2917 - val_acc: 0.6377
Epoch 24/25
1568/1568 [==============================] - 11s - loss: 0.3080 - acc: 0.9139 - val_loss: 1.2519 - val_acc: 0.6248
Epoch 25/25
1568/1568 [==============================] - 11s - loss: 0.2999 - acc: 0.9152 - val_loss: 1.1980 - val_acc: 0.6647
<keras.callbacks.History at 0x7fe52f0bea50>
Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments.
To go further, we'll need to use the whole dataset, since dropout and data volumes are closely related - we can't tune dropout properly without using all the data.
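When we do switch to the full dataset, it's mostly a matter of repointing path at the full data directory and repeating the steps above (a sketch, assuming the same directory layout):
path = "data/state/"     # full dataset rather than the sample
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)
model = conv1(batches)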