Many of the exciting deep learning algorithms for computer vision require massive datasets for training. The most popular benchmark dataset, ImageNet, for example, contains over one million images spread across one thousand categories. But for most practical problems, we only have access to comparatively small datasets. In these cases, if we were to train a neural network's weights from scratch, starting from randomly initialized parameters, we would overfit the training set badly.
One approach to get around this problem is to first pretrain a deep network on a large-scale dataset, like ImageNet. Then, given a new dataset, we can start from these pretrained weights when training on our new task. This process is commonly called "fine-tuning". There are a number of variations of fine-tuning. Sometimes the initial neural network is used only as a feature extractor: we freeze every layer prior to the output layer and simply learn a new output layer (a minimal sketch follows). In another document, we explained how to do this kind of feature extraction. Another approach is to update all of the network's weights for the new task, and that's the approach we demonstrate in this document.
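For reference, here is a minimal sketch of the feature-extraction variant, assuming the ResNet-50 checkpoint we download later in this document (the layer name `fc_new` is hypothetical, and `flatten0` as the cut point matches the fine-tuning code below). MXNet's `Module` accepts a `fixed_param_names` argument listing parameters to freeze:

import mxnet as mx

# A sketch of using the pretrained network as a fixed feature extractor:
# cut off the old classifier, attach a new one, and freeze everything else.
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
net = sym.get_internals()['flatten0_output']
net = mx.symbol.FullyConnected(data=net, num_hidden=256, name='fc_new')
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')

# Freeze all parameters except those of the new output layer.
fixed = [name for name in net.list_arguments()
         if name not in ('data', 'softmax_label')
         and not name.startswith('fc_new')]
mod = mx.mod.Module(symbol=net, context=mx.cpu(), fixed_param_names=fixed)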
To fine-tune a network, we must first replace the last fully-connected layer with a new one that outputs the desired number of classes, initializing its weights randomly. Then we continue training as normal. It is common to use a smaller learning rate than when training from scratch, based on the intuition that we may already be close to a good result; see the sketch below.
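Concretely, the `fit` function later in this document simply passes a small global rate, `optimizer_params={'learning_rate': 0.01}`. As a further hedged sketch (not used elsewhere in this document; the name `fc_new_weight` is hypothetical), MXNet also supports per-parameter learning-rate multipliers, so the freshly initialized output layer can learn faster than the pretrained body:

import mxnet as mx

# Declare the new layer's weight explicitly so we can attach a per-parameter
# learning-rate multiplier: this weight trains at 10x the global rate, while
# the pretrained parameters keep the (small) global rate.
data = mx.symbol.Variable('data')
fc_w = mx.symbol.Variable('fc_new_weight', lr_mult=10.0)
net  = mx.symbol.FullyConnected(data=data, weight=fc_w, num_hidden=256,
                                name='fc_new')
net  = mx.symbol.SoftmaxOutput(data=net, name='softmax')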
In this demonstration, we'll fine-tune a model pretrained on ImageNet to the much smaller Caltech-256 dataset. Following this example, you can fine-tune to other datasets, even for strikingly different applications such as face identification.
We will show that, even with a simple hyper-parameter setting, we can match and even outperform state-of-the-art results on Caltech-256.
Network | Accuracy |
---|---|
ResNet-50 | 77.4% |
ResNet-152 | 86.4% |
We follow the standard protocol of sampling 60 images from each class as the training set, with the rest forming the validation set. We resize the images to 256x256 and pack them into RecordIO (.rec) files. The script to prepare the data is as follows:
# Download and unpack the Caltech-256 dataset.
wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar
tar -xf 256_ObjectCategories.tar

# Move 60 images per class into the training directory; the images left
# behind in 256_ObjectCategories/ become the validation set.
mkdir -p caltech_256_train_60
for i in 256_ObjectCategories/*; do
    c=`basename $i`
    mkdir -p caltech_256_train_60/$c
    for j in `ls $i/*.jpg | shuf | head -n 60`; do
        mv $j caltech_256_train_60/$c/
    done
done

# Build .lst index files, then pack the resized JPEGs into .rec files.
python ~/mxnet/tools/im2rec.py --list True --recursive True caltech-256-60-train caltech_256_train_60/
python ~/mxnet/tools/im2rec.py --list True --recursive True caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-train caltech_256_train_60/
The following code downloads the pre-generated .rec files. This may take a few minutes.
import os
import urllib.request

def download(url):
    # Save the file under its basename, skipping files we already have.
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
Next, we define a function that returns the training and validation data iterators:
import mxnet as mx

def get_iterators(batch_size, data_shape=(3, 224, 224)):
    # Training iterator: random crops and mirroring for data augmentation.
    train = mx.io.ImageRecordIter(
        path_imgrec = './caltech-256-60-train.rec',
        data_name   = 'data',
        label_name  = 'softmax_label',
        batch_size  = batch_size,
        data_shape  = data_shape,
        shuffle     = True,
        rand_crop   = True,
        rand_mirror = True)
    # Validation iterator: no augmentation.
    val = mx.io.ImageRecordIter(
        path_imgrec = './caltech-256-60-val.rec',
        data_name   = 'data',
        label_name  = 'softmax_label',
        batch_size  = batch_size,
        data_shape  = data_shape,
        rand_crop   = False,
        rand_mirror = False)
    return (train, val)
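As a quick, optional sanity check (a hedged sketch; the shapes assume the default `data_shape` above), we can pull a single batch from the training iterator and inspect it before launching a long training run:

# Fetch one batch, verify its shape, then rewind the iterator.
train, val = get_iterators(batch_size=16)
batch = train.next()
print(batch.data[0].shape)   # expected: (16, 3, 224, 224)
print(batch.label[0].shape)  # expected: (16,)
train.reset()                # rewind before training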
We then download a pretrained 50-layer ResNet model and load it into memory. Note that if `load_checkpoint` reports an error, we can remove the downloaded files and run `get_model` again.
def get_model(prefix, epoch):
    download(prefix+'-symbol.json')
    download(prefix+'-%04d.params' % (epoch,))

get_model('http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
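To see why we cut the network at `flatten0` in the next step, it helps to list the last few internal outputs of the loaded symbol (exact names depend on the checkpoint):

# The 'flatten0_output' entry feeds the original 1000-way classifier,
# which makes it a natural attachment point for a new output layer.
print(sym.get_internals().list_outputs()[-6:])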
Next, we define a function that replaces the last fully-connected layer of a given network:
def get_fine_tune_model(symbol, arg_params, num_classes, layer_name='flatten0'):
    """
    symbol: the pretrained network symbol
    arg_params: the argument parameters of the pretrained model
    num_classes: the number of classes for the fine-tuning dataset
    layer_name: the name of the layer before the last fully-connected layer
    """
    all_layers = symbol.get_internals()
    net = all_layers[layer_name+'_output']
    # Replace the original 1000-way classifier with a new fully-connected
    # layer sized for the target dataset.
    net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
    net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
    # Drop the pretrained weights of the old 'fc1' so the new layer gets
    # initialized from scratch.
    new_args = {k: arg_params[k] for k in arg_params if 'fc1' not in k}
    return (net, new_args)
Now we create a module. We pass in the argument parameters of the pretrained model, which covers every layer except the new fully-connected one; that layer's weights are instead drawn from the initializer (hence `allow_missing=True` in the `fit` call below).
import logging
head = '%(asctime)-15s %(message)s'
logging.basicConfig(level=logging.DEBUG, format=head)

def fit(symbol, arg_params, aux_params, train, val, batch_size, num_gpus):
    devs = [mx.gpu(i) for i in range(num_gpus)]
    mod = mx.mod.Module(symbol=symbol, context=devs)
    mod.fit(train, val,
        num_epoch=8,
        arg_params=arg_params,
        aux_params=aux_params,
        # allow_missing lets us pass parameters that have no entry for the
        # new 'fc1' layer; the initializer below fills in those weights.
        allow_missing=True,
        batch_end_callback = mx.callback.Speedometer(batch_size, 10),
        kvstore='device',
        optimizer='sgd',
        optimizer_params={'learning_rate':0.01},
        initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2),
        eval_metric='acc')
    metric = mx.metric.Accuracy()
    # score() returns [(metric_name, value)]; return the accuracy value.
    return mod.score(val, metric)[0][1]
Then we can start training. We use an AWS EC2 g2.8xlarge instance, which has 8 GPUs.
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
num_classes = 256
batch_per_gpu = 16
num_gpus = 8
(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)
batch_size = batch_per_gpu * num_gpus
(train, val) = get_iterators(batch_size)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.77, "Low validation accuracy."
2016-10-22 18:24:16,695 Already binded, ignoring bind()
2016-10-22 18:25:05,002 Epoch[0] Time cost=48.301
2016-10-22 18:25:24,502 Epoch[0] Validation-accuracy=0.297072
2016-10-22 18:26:10,666 Epoch[1] Time cost=46.163
2016-10-22 18:26:29,719 Epoch[1] Validation-accuracy=0.556066
2016-10-22 18:27:16,598 Epoch[2] Time cost=46.878
2016-10-22 18:27:34,905 Epoch[2] Validation-accuracy=0.651947
2016-10-22 18:28:21,098 Epoch[3] Time cost=46.192
2016-10-22 18:28:40,464 Epoch[3] Validation-accuracy=0.701943
2016-10-22 18:29:27,138 Epoch[4] Time cost=46.673
2016-10-22 18:29:45,791 Epoch[4] Validation-accuracy=0.736935
2016-10-22 18:30:32,218 Epoch[5] Time cost=46.426
2016-10-22 18:30:51,745 Epoch[5] Validation-accuracy=0.752450
2016-10-22 18:31:38,414 Epoch[6] Time cost=46.668
2016-10-22 18:31:57,459 Epoch[6] Validation-accuracy=0.768382
2016-10-22 18:32:43,833 Epoch[7] Time cost=46.373
2016-10-22 18:33:01,994 Epoch[7] Validation-accuracy=0.774422
(per-batch Train-accuracy lines omitted for brevity; throughput held steady around 330 samples/sec)
As you can see, after only 8 epochs we reach 77.4% validation accuracy. This matches state-of-the-art results trained on Caltech-256 alone, e.g., with VGG.
Next, we try another pretrained model. This one was trained on the full ImageNet-11k dataset, which is 10x larger than the common 1,000-class version, and it uses the 3x deeper ResNet-152 architecture.
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
get_model('http://data.mxnet.io/models/imagenet-11k/resnet-152/resnet-152', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.86, "Low validation accuracy."
2016-10-22 18:35:42,274 Already binded, ignoring bind()
2016-10-22 18:37:36,096 Epoch[0] Time cost=113.804
2016-10-22 18:38:08,728 Epoch[0] Validation-accuracy=0.829780
2016-10-22 18:39:57,751 Epoch[1] Time cost=109.022
2016-10-22 18:40:30,649 Epoch[1] Validation-accuracy=0.848608
2016-10-22 18:42:20,626 Epoch[2] Time cost=109.976
2016-10-22 18:42:53,414 Epoch[2] Validation-accuracy=0.853269
2016-10-22 18:44:42,450 Epoch[3] Time cost=109.035
2016-10-22 18:45:15,423 Epoch[3] Validation-accuracy=0.857587
2016-10-22 18:47:05,457 Epoch[4] Time cost=110.033
2016-10-22 18:47:38,303 Epoch[4] Validation-accuracy=0.862329
2016-10-22 18:49:27,400 Epoch[5] Time cost=109.095
2016-10-22 18:50:00,339 Epoch[5] Validation-accuracy=0.864102
2016-10-22 18:51:50,314 Epoch[6] Time cost=109.974
2016-10-22 18:52:23,287 Epoch[6] Validation-accuracy=0.864738
2016-10-22 18:54:12,365 Epoch[7] Time cost=109.077
2016-10-22 18:54:45,259 Epoch[7] Validation-accuracy=0.863905
(per-batch Train-accuracy lines omitted for brevity; throughput held steady around 140 samples/sec)
As can be seen, even after a single epoch this model reaches 83% validation accuracy. After 8 epochs, the validation accuracy rises to 86.4%.