```
# build/tools/caffe train \
# -solver models/finetune_flickr_style/solver.prototxt \
# -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
# -gpu 0
#
```

#
# However, we will train using Python in this example.
#
# We'll first define `run_solvers`, a function that takes a list of solvers and steps each one in a round robin manner, recording the accuracy and loss values each iteration. At the end, the learned weights are saved to a file.
# In[16]:
def run_solvers(niter, solvers, disp_interval=10):
"""Run solvers for niter iterations,
returning the loss and accuracy recorded each iteration.
`solvers` is a list of (name, solver) tuples."""
blobs = ('loss', 'acc')
loss, acc = ({name: np.zeros(niter) for name, _ in solvers}
for _ in blobs)
for it in range(niter):
for name, s in solvers:
s.step(1) # run a single SGD step in Caffe
loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy()
for b in blobs)
if it % disp_interval == 0 or it + 1 == niter:
loss_disp = '; '.join('%s: loss=%.3f, acc=%2d%%' %
(n, loss[n][it], np.round(100*acc[n][it]))
for n, _ in solvers)
print '%3d) %s' % (it, loss_disp)
# Save the learned weights from both nets.
weight_dir = tempfile.mkdtemp()
weights = {}
for name, s in solvers:
filename = 'weights.%s.caffemodel' % name
weights[name] = os.path.join(weight_dir, filename)
s.net.save(weights[name])
return loss, acc, weights
# Let's create and run solvers to train nets for the style recognition task. We'll create two solvers -- one (`style_solver`) will have its train net initialized to the ImageNet-pretrained weights (this is done by the call to the `copy_from` method), and the other (`scratch_style_solver`) will start from a *randomly* initialized net.
#
# During training, we should see that the ImageNet pretrained net is learning faster and attaining better accuracies than the scratch net.
# In[17]:
niter = 200 # number of iterations to train
# Reset style_solver as before.
style_solver_filename = solver(style_net(train=True))
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(weights)
# For reference, we also create a solver that isn't initialized from
# the pretrained ImageNet weights.
scratch_style_solver_filename = solver(style_net(train=True))
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)
print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained', style_solver),
('scratch', scratch_style_solver)]
loss, acc, weights = run_solvers(niter, solvers)
print 'Done.'
train_loss, scratch_train_loss = loss['pretrained'], loss['scratch']
train_acc, scratch_train_acc = acc['pretrained'], acc['scratch']
style_weights, scratch_style_weights = weights['pretrained'], weights['scratch']
# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers
# Let's look at the training loss and accuracy produced by the two training procedures. Notice how quickly the ImageNet pretrained model's loss value (blue) drops, and that the randomly initialized model's loss value (green) barely (if at all) improves from training only the classifier layer.
# In[18]:
plot(np.vstack([train_loss, scratch_train_loss]).T)
xlabel('Iteration #')
ylabel('Loss')
# In[19]:
plot(np.vstack([train_acc, scratch_train_acc]).T)
xlabel('Iteration #')
ylabel('Accuracy')
# Let's take a look at the testing accuracy after running 200 iterations of training. Note that we're classifying among 5 classes, giving chance accuracy of 20%. We expect both results to be better than chance accuracy (20%), and we further expect the result from training using the ImageNet pretraining initialization to be much better than the one from training from scratch. Let's see.
# In[20]:
def eval_style_net(weights, test_iters=10):
test_net = caffe.Net(style_net(train=False), weights, caffe.TEST)
accuracy = 0
for it in xrange(test_iters):
accuracy += test_net.forward()['acc']
accuracy /= test_iters
return test_net, accuracy
# In[21]:
test_net, accuracy = eval_style_net(style_weights)
print 'Accuracy, trained from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights)
print 'Accuracy, trained from random initialization: %3.1f%%' % (100*scratch_accuracy, )
# ### 4. End-to-end finetuning for style
#
# Finally, we'll train both nets again, starting from the weights we just learned. The only difference this time is that we'll be learning the weights "end-to-end" by turning on learning in *all* layers of the network, starting from the RGB `conv1` filters directly applied to the input image. We pass the argument `learn_all=True` to the `style_net` function defined earlier in this notebook, which tells the function to apply a positive (non-zero) `lr_mult` value for all parameters. Under the default, `learn_all=False`, all parameters in the pretrained layers (`conv1` through `fc7`) are frozen (`lr_mult = 0`), and we learn only the classifier layer `fc8_flickr`.
#
# Note that both networks start at roughly the accuracy achieved at the end of the previous training session, and improve significantly with end-to-end training. To be more scientific, we'd also want to follow the same additional training procedure *without* the end-to-end training, to ensure that our results aren't better simply because we trained for twice as long. Feel free to try this yourself!
# In[22]:
end_to_end_net = style_net(train=True, learn_all=True)
# Set base_lr to 1e-3, the same as last time when learning only the classifier.
# You may want to play around with different values of this or other
# optimization parameters when fine-tuning. For example, if learning diverges
# (e.g., the loss gets very large or goes to infinity/NaN), you should try
# decreasing base_lr (e.g., to 1e-4, then 1e-5, etc., until you find a value
# for which learning does not diverge).
base_lr = 0.001
style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(style_weights)
scratch_style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)
scratch_style_solver.net.copy_from(scratch_style_weights)
print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained, end-to-end', style_solver),
('scratch, end-to-end', scratch_style_solver)]
_, _, finetuned_weights = run_solvers(niter, solvers)
print 'Done.'
style_weights_ft = finetuned_weights['pretrained, end-to-end']
scratch_style_weights_ft = finetuned_weights['scratch, end-to-end']
# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers
# Let's now test the end-to-end finetuned models. Since all layers have been optimized for the style recognition task at hand, we expect both nets to get better results than the ones above, which were achieved by nets with only their classifier layers trained for the style task (on top of either ImageNet pretrained or randomly initialized weights).
# In[23]:
test_net, accuracy = eval_style_net(style_weights_ft)
print 'Accuracy, finetuned from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights_ft)
print 'Accuracy, finetuned from random initialization: %3.1f%%' % (100*scratch_accuracy, )
# We'll first look back at the image we started with and check our end-to-end trained model's predictions.
# In[24]:
plt.imshow(deprocess_net_image(image))
disp_style_preds(test_net, image)
# Whew, that looks a lot better than before! But note that this image was from the training set, so the net got to see its label at training time.
#
# Finally, we'll pick an image from the test set (an image the model hasn't seen) and look at our end-to-end finetuned style model's predictions for it.
# In[25]:
batch_index = 1
image = test_net.blobs['data'].data[batch_index]
plt.imshow(deprocess_net_image(image))
print 'actual label =', style_labels[int(test_net.blobs['label'].data[batch_index])]
# In[26]:
disp_style_preds(test_net, image)
# We can also look at the predictions of the network trained from scratch. We see that in this case, the scratch network also predicts the correct label for the image (*Pastel*), but is much less confident in its prediction than the pretrained net.
# In[27]:
disp_style_preds(scratch_test_net, image)
# Of course, we can again look at the ImageNet model's predictions for the above image:
# In[28]:
disp_imagenet_preds(imagenet_net, image)
# So we did finetuning and it is awesome. Let's take a look at what kind of results we are able to get with a longer, more complete run of the style recognition dataset. Note: the below URL might be occasionally down because it is run on a research machine.
#
# http://demo.vislab.berkeleyvision.org/