In this example, we'll explore learning with Caffe in Python, using the fully-exposed Solver
interface.
import os
os.chdir('..')
import sys
sys.path.insert(0, './python')
import caffe
from pylab import *
%matplotlib inline
We'll be running the provided LeNet example (make sure you've downloaded the data and created the databases, as below).
# Download and prepare data
!data/mnist/get_mnist.sh
!examples/mnist/create_mnist.sh
Downloading...
Unzipping... Done.
Creating lmdb... Done.
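We need two external files to help out: the net prototxt, which defines the architecture and points to the train/test data, and the solver prototxt, which defines the learning parameters.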
We start with the net. We'll write the net in a succinct and natural way as Python code that serializes to Caffe's protobuf model format.
This network expects to read from pregenerated LMDBs, but reading directly from ndarrays is also possible using MemoryDataLayer.
from caffe import layers as L
from caffe import params as P

def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    return n.to_proto()

with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))

with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))
The net has been written to disk in a more verbose but human-readable serialization format using Google's protobuf library. You can read, write, and modify this description directly. Let's take a look at the train net.
!cat examples/mnist/lenet_auto_train.prototxt
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392156862745
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
Now let's see the learning parameters, which are also written as a prototxt
file. We're using SGD with momentum, weight decay, and a specific learning rate schedule.
!cat examples/mnist/lenet_auto_solver.prototxt
# The train/test net protocol buffer definition
train_net: "examples/mnist/lenet_auto_train.prototxt"
test_net: "examples/mnist/lenet_auto_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
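To see what the "inv" schedule does with these numbers, here is a small sketch that evaluates base_lr * (1 + gamma * iter) ^ (-power), the formula documented for "inv" in the comments of Caffe's solver.proto. The function name inv_lr is our own illustration, not part of pycaffe; the defaults are the base_lr, gamma and power values from the solver file above.

```python
def inv_lr(it, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate at iteration `it` under the "inv" policy:
    base_lr * (1 + gamma * it) ** (-power)."""
    return base_lr * (1.0 + gamma * it) ** (-power)

# The rate decays smoothly rather than in steps.
for it in (0, 500, 5000, 10000):
    print(it, round(inv_lr(it), 6))
```

By max_iter the rate has fallen to 2^(-0.75) of its starting value, a little under 60% of base_lr.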
Let's pick a device and load the solver. We'll use SGD (with momentum), but Adagrad and Nesterov's accelerated gradient are also available.
caffe.set_device(0)
caffe.set_mode_gpu()
solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
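For reference, the update that SGD with momentum and L2 weight decay applies to each parameter can be sketched in NumPy as below. This is a minimal illustration of the rule in the style of Caffe's SGDSolver (decay added to the gradient, momentum history updated, history subtracted from the weights), not Caffe's own code; it assumes L2 regularization (Caffe's default) and ignores per-layer lr_mult and decay_mult. The function name sgd_momentum_step and the toy quadratic objective are hypothetical.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One momentum-SGD update with L2 weight decay; returns new (w, v)."""
    g = grad + weight_decay * w      # L2 regularization folded into the gradient
    v = momentum * v + lr * g        # momentum history (velocity)
    w = w - v                        # apply the update to the weights
    return w, v

# Toy objective 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([1.0, -2.0, 3.0])
w = np.zeros(3)
v = np.zeros(3)
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad=w - target, lr=0.1)

# Weight decay pulls the solution slightly toward zero: w -> target / (1 + decay).
print(w)
```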
To get an idea of the architecture of our net, we can check the dimensions of the intermediate features (blobs) and parameters (these will also be useful to refer to when manipulating data later).
# each output is (batch size, feature dim, spatial dim)
[(k, v.data.shape) for k, v in solver.net.blobs.items()]
[('data', (64, 1, 28, 28)), ('label', (64,)), ('conv1', (64, 20, 24, 24)), ('pool1', (64, 20, 12, 12)), ('conv2', (64, 50, 8, 8)), ('pool2', (64, 50, 4, 4)), ('ip1', (64, 500)), ('ip2', (64, 10)), ('loss', ())]
# just print the weight sizes (not biases)
[(k, v[0].data.shape) for k, v in solver.net.params.items()]
[('conv1', (20, 1, 5, 5)), ('conv2', (50, 20, 5, 5)), ('ip1', (500, 800)), ('ip2', (10, 500))]
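The spatial sizes above follow from simple window arithmetic. The sketch below reproduces them; the helper names are ours. As we read Caffe's source, convolution rounds the output size down while pooling rounds it up; with these even sizes and no padding the two agree, so treat the rounding detail as an assumption that only matters for other shapes.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Convolution output: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride, pad=0):
    # Pooling output (rounded up): ceil((size + 2*pad - kernel) / stride) + 1
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

h = 28                                 # MNIST images are 28x28
h = conv_out(h, kernel=5)              # conv1 -> 24
h = pool_out(h, kernel=2, stride=2)    # pool1 -> 12
h = conv_out(h, kernel=5)              # conv2 -> 8
h = pool_out(h, kernel=2, stride=2)    # pool2 -> 4
ip1_fan_in = 50 * h * h                # 50 channels of 4x4, flattened -> 800
print(h, ip1_fan_in)
```

The value 800 is exactly the second dimension of the ip1 weights, (500, 800).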
Before taking off, let's check that everything is loaded as we expect. We'll run a forward pass on the train and test nets and check that they contain our data.
solver.net.forward() # train net
solver.test_nets[0].forward() # test net (there can be more than one)
{'loss': array(2.301163673400879, dtype=float32)}
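The initial loss is itself a useful sanity check. An untrained 10-way classifier whose outputs are roughly uniform assigns probability of about 1/10 to the true label, so its softmax cross-entropy loss should be about -ln(1/10) = ln 10:

```python
import math

num_classes = 10
# Cross-entropy when the true label receives probability 1/num_classes.
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 4))   # 2.3026
```

The reported 2.3012 is very close, which suggests the xavier-initialized net starts near chance, as expected.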
# we use a little trick to tile the first eight images
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray')
print(solver.net.blobs['label'].data[:8])
[ 5. 0. 4. 1. 9. 2. 1. 3.]
imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray')
print(solver.test_nets[0].blobs['label'].data[:8])
[ 7. 2. 1. 0. 4. 1. 4. 9.]
Both train and test nets seem to be loading data, and to have correct labels.
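The tiling trick used for the image strips is plain NumPy. Swapping the batch and row axes puts row r of every image next to each other, so a C-order reshape lays the images side by side. A small sketch on three 2x2 arrays (the helper name tile_row is ours):

```python
import numpy as np

def tile_row(images):
    """Lay out an (n, h, w) stack of images side by side as one (h, n*w) array."""
    n, h, w = images.shape
    # (n, h, w) -> (h, n, w): each row now holds that row of every image in turn,
    # so reshaping concatenates the images horizontally.
    return images.transpose(1, 0, 2).reshape(h, n * w)

imgs = np.arange(3 * 2 * 2).reshape(3, 2, 2)   # three tiny 2x2 "images"
strip = tile_row(imgs)
print(strip)
```

With 8 images of 28x28 this is exactly the reshape(28, 8*28) above.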
Let's take one step of (minibatch) SGD and see what happens.
solver.step(1)
Do we have gradients propagating through our filters? Let's see the updates to the first layer, shown here as a $4 \times 5$ grid of $5 \times 5$ filters.
imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5)
       .transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray')
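The filter mosaic uses the same idea in two dimensions: the 20 filters are indexed as (grid row, grid column), then the axes are reordered to (grid row, y, grid column, x) before flattening. This sketch mirrors that reshape/transpose with a hypothetical helper and arange data standing in for the conv1 diffs:

```python
import numpy as np

def tile_grid(filters, rows, cols):
    """Arrange (rows*cols, h, w) filters into a (rows*h, cols*w) mosaic."""
    n, h, w = filters.shape
    if n != rows * cols:
        raise ValueError("need exactly rows * cols filters")
    grid = filters.reshape(rows, cols, h, w)   # (grid row, grid col, y, x)
    grid = grid.transpose(0, 2, 1, 3)          # (grid row, y, grid col, x)
    return grid.reshape(rows * h, cols * w)

filters = np.arange(20 * 5 * 5, dtype=float).reshape(20, 5, 5)  # stand-in data
mosaic = tile_grid(filters, rows=4, cols=5)
print(mosaic.shape)   # (20, 25)
```

Filter k lands in grid row k // 5 and grid column k % 5, reading left to right, top to bottom.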
Something is happening. Let's run the net for a while, keeping track of a few things as it goes.
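Note that this process will be the same as if training through the caffe binary. In particular, logging will continue to happen as normal, snapshots will be taken at the interval specified in the solver prototxt (here, every 5000 iterations), and testing will happen at the interval specified there (here, every 500 iterations).
Since we have control of the loop in Python, we're free to compute additional things as we go, as we show below, and to do many other things as well.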
%%time
niter = 200
test_interval = 25
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))
output = zeros((niter, 8, 10))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # store the output on the first test batch
    # (start the forward pass at conv1 to avoid loading new data)
    solver.test_nets[0].forward(start='conv1')
    output[it] = solver.test_nets[0].blobs['ip2'].data[:8]

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    # how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print('Iteration {} testing...'.format(it))
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['ip2'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4
Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
CPU times: user 12.3 s, sys: 3.96 s, total: 16.2 s
Wall time: 15.7 s
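The accuracy computation in the loop reduces to an argmax over the ip2 scores compared against the labels, summed over 100 batches of 100 images (the full 10,000-image test set). The sketch below isolates that logic with random stand-in scores and labels instead of the real blobs; batch_correct is a hypothetical helper name.

```python
import numpy as np

def batch_correct(scores, labels):
    """Count rows of an (N, classes) score array whose top score is the true label."""
    return int((scores.argmax(1) == labels).sum())

rng = np.random.RandomState(0)
num_batches, batch_size = 100, 100                 # 100 x 100 = 10,000 test images
correct = 0
for _ in range(num_batches):
    scores = rng.randn(batch_size, 10)             # stand-in for blobs['ip2'].data
    labels = rng.randint(0, 10, size=batch_size)   # stand-in for blobs['label'].data
    correct += batch_correct(scores, labels)
accuracy = correct / float(num_batches * batch_size)
print(accuracy)   # near 0.1 (chance) for random scores
```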
Let's plot the train loss and test accuracy.
_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
The loss seems to have dropped quickly and converged (except for stochasticity), while the accuracy rose correspondingly. Hooray!
Since we saved the results on the first test batch, we can watch how our prediction scores evolved. We'll plot time on the $x$ axis and each possible label on the $y$, with lightness indicating confidence.
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(output[:50, i].T, interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
We started with little idea about any of these digits, and ended up with correct classifications for each. If you've been following along, you'll see the last digit is the most difficult, a slanted "9" that's (understandably) most confused with "4".
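Beyond eyeballing the plots, the stored output array can be queried directly: output.argmax(2) gives the predicted label for each tracked image at each iteration. As an illustration, the hypothetical helper below finds the iteration from which a prediction stays correct through the end of training. It runs here on a small synthetic array rather than the real output, so the numbers carry no meaning beyond the example.

```python
import numpy as np

def first_stable_iteration(pred, label):
    """Earliest iteration from which the 1-D per-iteration predictions `pred`
    equal `label` through the end, or None if the last prediction is wrong."""
    wrong = np.nonzero(pred != label)[0]    # iterations with a wrong prediction
    if wrong.size == 0:
        return 0                            # correct from the very start
    if wrong[-1] == len(pred) - 1:
        return None                         # still wrong at the final iteration
    return int(wrong[-1]) + 1

# Synthetic stand-in for `output`: shape (iterations, images, labels).
output = np.zeros((6, 2, 10))
output[:, 0, 3] = 1.0       # image 0 scores label 3 highest at first...
output[3:, 0, 7] = 2.0      # ...then label 7 from iteration 3 onward
output[:, 1, 9] = 1.0       # image 1 is predicted as 9 throughout

predicted = output.argmax(2)                         # (iterations, images)
print(first_stable_iteration(predicted[:, 0], 7))    # 3
print(first_stable_iteration(predicted[:, 1], 9))    # 0
```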
Note that these are the "raw" output scores rather than the softmax-computed probability vectors. The latter, shown below, make it easier to see the confidence of our net (but harder to see the scores for less likely digits).
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(exp(output[:50, i].T) / exp(output[:50, i].T).sum(0), interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
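The normalization in the last plot is a softmax over the label axis of a (labels, iterations) array. Computing exp directly is fine for LeNet's modest scores, but it overflows once scores reach the hundreds. A common remedy, sketched below, subtracts each column's maximum first, which leaves the probabilities unchanged; the helper name is ours.

```python
import numpy as np

def softmax_over_labels(scores):
    """Softmax along axis 0 of a (labels, iterations) score array,
    shifted by the per-column max so exp() cannot overflow."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

scores = np.array([[1.0, 1000.0],
                   [2.0, 1001.0],
                   [3.0, 1002.0]])   # 3 labels x 2 iterations
probs = softmax_over_labels(scores)
print(probs.sum(axis=0))
```

Both columns have the same score differences, so they yield the same probabilities, whereas the unshifted exp(1000) would be infinite.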