This notebook explains in detail the steps needed to port network parameters trained with a fork of Caffe to Keras. The network was trained on the Sports1M dataset, and the Caffe version used was a fork that implements 3D convolutions. More information can be found in the paper.
The first step is to obtain the parameters file of the network. This file is stored as a caffemodel. To read it, we need to generate the Python output of the proto definition from the C3D fork of Caffe.
Compiling the proto file and then reading the caffemodel file from Python requires recompiling the protobuf library with an increased size limit. By default, protobuf is not designed to handle large amounts of data and is limited to reading files smaller than 64MB. We can increase that limit with the following steps:
git clone https://github.com/google/protobuf
diff --git a/src/google/protobuf/io/coded_stream.h b/src/google/protobuf/io/coded_stream.h
index c81a33a..eeb8863 100644
--- a/src/google/protobuf/io/coded_stream.h
+++ b/src/google/protobuf/io/coded_stream.h
@@ -609,7 +609,7 @@ class LIBPROTOBUF_EXPORT CodedInputStream {
// Return the size of the buffer.
int BufferSize() const;
- static const int kDefaultTotalBytesLimit = 64 << 20; // 64MB
+ static const int kDefaultTotalBytesLimit = 256 << 20; // 256MB
static const int kDefaultTotalBytesWarningThreshold = 32 << 20; // 32MB
Follow the instructions here to recompile the protoc compiler.
Follow the instructions here to install the Python protobuf package with the increased file size limit.
Compile the caffe.proto file for Python:
protoc --python_out=. caffe.proto
Once protobuf is built, all the dependencies of Keras and Theano are also required (Theano will be used as the Keras backend because it is the only one that supports 3D convolutions).
Following the Caffe proto definition of the C3D network used to train on Sports1M, and the paper, we define the same model in Keras.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution3D, MaxPooling3D, ZeroPadding3D
from keras.optimizers import SGD
def get_model(summary=False):
    """ Return the Keras model of the network
    """
    model = Sequential()
    # 1st layer group
    model.add(Convolution3D(64, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv1',
                            subsample=(1, 1, 1),
                            input_shape=(3, 16, 112, 112)))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2),
                           border_mode='valid', name='pool1'))
    # 2nd layer group
    model.add(Convolution3D(128, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv2',
                            subsample=(1, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2),
                           border_mode='valid', name='pool2'))
    # 3rd layer group
    model.add(Convolution3D(256, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv3a',
                            subsample=(1, 1, 1)))
    model.add(Convolution3D(256, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv3b',
                            subsample=(1, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2),
                           border_mode='valid', name='pool3'))
    # 4th layer group
    model.add(Convolution3D(512, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv4a',
                            subsample=(1, 1, 1)))
    model.add(Convolution3D(512, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv4b',
                            subsample=(1, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2),
                           border_mode='valid', name='pool4'))
    # 5th layer group
    model.add(Convolution3D(512, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv5a',
                            subsample=(1, 1, 1)))
    model.add(Convolution3D(512, 3, 3, 3, activation='relu',
                            border_mode='same', name='conv5b',
                            subsample=(1, 1, 1)))
    model.add(ZeroPadding3D(padding=(0, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2),
                           border_mode='valid', name='pool5'))
    model.add(Flatten())
    # FC layers group
    model.add(Dense(4096, activation='relu', name='fc6'))
    model.add(Dropout(.5))
    model.add(Dense(4096, activation='relu', name='fc7'))
    model.add(Dropout(.5))
    model.add(Dense(487, activation='softmax', name='fc8'))
    if summary:
        print(model.summary())
    return model

model = get_model(summary=True)
Using Theano backend.
--------------------------------------------------------------------------------
Initial input shape: (None, 3, 16, 112, 112)
--------------------------------------------------------------------------------
Layer (name)                    Output Shape                  Param #
--------------------------------------------------------------------------------
Convolution3D (conv1)           (None, 64, 16, 112, 112)      5248
MaxPooling3D (pool1)            (None, 64, 16, 56, 56)        0
Convolution3D (conv2)           (None, 128, 16, 56, 56)       221312
MaxPooling3D (pool2)            (None, 128, 8, 28, 28)        0
Convolution3D (conv3a)          (None, 256, 8, 28, 28)        884992
Convolution3D (conv3b)          (None, 256, 8, 28, 28)        1769728
MaxPooling3D (pool3)            (None, 256, 4, 14, 14)        0
Convolution3D (conv4a)          (None, 512, 4, 14, 14)        3539456
Convolution3D (conv4b)          (None, 512, 4, 14, 14)        7078400
MaxPooling3D (pool4)            (None, 512, 2, 7, 7)          0
Convolution3D (conv5a)          (None, 512, 2, 7, 7)          7078400
Convolution3D (conv5b)          (None, 512, 2, 7, 7)          7078400
ZeroPadding3D (zeropadding3d)   (None, 512, 2, 9, 9)          0
MaxPooling3D (pool5)            (None, 512, 1, 4, 4)          0
Flatten (flatten)               (None, 8192)                  0
Dense (fc6)                     (None, 4096)                  33558528
Dropout (dropout)               (None, 4096)                  0
Dense (fc7)                     (None, 4096)                  16781312
Dropout (dropout)               (None, 4096)                  0
Dense (fc8)                     (None, 487)                   1995239
--------------------------------------------------------------------------------
Total params: 79991015
--------------------------------------------------------------------------------
None
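As a quick sanity check, the parameter counts reported in the summary can be reproduced by hand: a 3D convolution with kernel size 3 has n_filters × in_channels × 27 weights plus one bias per filter, and a dense layer has n_in × n_out weights plus n_out biases. A minimal sketch (the helper names are ours, not part of the notebook):

```python
def conv3d_params(n_filters, in_channels, k=3):
    # weights: n_filters * in_channels * k^3, plus one bias per filter
    return n_filters * in_channels * k ** 3 + n_filters

def dense_params(n_in, n_out):
    # weights: n_in * n_out, plus one bias per output unit
    return n_in * n_out + n_out

assert conv3d_params(64, 3) == 5248          # conv1
assert conv3d_params(128, 64) == 221312      # conv2
assert conv3d_params(256, 128) == 884992     # conv3a
assert dense_params(8192, 4096) == 33558528  # fc6
assert dense_params(4096, 487) == 1995239    # fc8
```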
Now let's load all the parameters. One thing to consider: this operation consumes a lot of memory (around 3GB of RAM) due to inefficiencies of protobuf with large serialized objects.
Because Caffe stores the parameters in a different order, some transformations must be applied to the weight matrices.
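One of those transformations is the 180-degree spatial rotation of each convolution kernel slice (the rot90 helper in the code that follows). A minimal NumPy sketch of why this is equivalent to flipping both spatial axes, which converts a kernel between the cross-correlation and true-convolution conventions:

```python
import numpy as np

# A 180-degree rotation of a 2D kernel is the same as reversing
# both spatial axes. This flip converts a kernel stored for
# cross-correlation into one for true convolution (and vice versa).
k = np.arange(9, dtype=np.float32).reshape(3, 3)
rotated = np.rot90(k, 2)
assert np.array_equal(rotated, k[::-1, ::-1])
```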
import caffe_pb2 as caffe
import numpy as np
p = caffe.NetParameter()
p.ParseFromString(
    open('model/conv3d_deepnetA_sport1m_iter_1900000', 'rb').read()
)

def rot90(W):
    # Rotate every 2D spatial slice of the 5D kernel by 180 degrees
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            for k in range(W.shape[2]):
                W[i, j, k] = np.rot90(W[i, j, k], 2)
    return W
params = []
conv_layers_indx = [1, 4, 7, 9, 12, 14, 17, 19]
fc_layers_indx = [22, 25, 28]

for i in conv_layers_indx:
    layer = p.layers[i]
    weights_b = np.array(layer.blobs[1].data, dtype=np.float32)
    weights_p = np.array(layer.blobs[0].data, dtype=np.float32).reshape(
        layer.blobs[0].num, layer.blobs[0].channels, layer.blobs[0].length,
        layer.blobs[0].height, layer.blobs[0].width
    )
    weights_p = rot90(weights_p)
    params.append([weights_p, weights_b])

for i in fc_layers_indx:
    layer = p.layers[i]
    weights_b = np.array(layer.blobs[1].data, dtype=np.float32)
    weights_p = np.array(layer.blobs[0].data, dtype=np.float32).reshape(
        layer.blobs[0].num, layer.blobs[0].channels, layer.blobs[0].length,
        layer.blobs[0].height, layer.blobs[0].width)[0, 0, 0, :, :].T
    params.append([weights_p, weights_b])
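The transpose applied to the fully connected weights reflects the two layout conventions: Caffe stores them as (n_output, n_input), while a Keras Dense layer expects (n_input, n_output). A small self-contained sketch with hypothetical toy shapes:

```python
import numpy as np

# Hypothetical toy sizes, just to illustrate the layouts
n_in, n_out = 4, 3
caffe_w = np.arange(n_in * n_out, dtype=np.float32).reshape(n_out, n_in)  # (out, in)
keras_w = caffe_w.T                                                       # (in, out)
assert keras_w.shape == (n_in, n_out)

# Both layouts compute the same pre-activation output for an input x
x = np.ones(n_in, dtype=np.float32)
assert np.allclose(caffe_w.dot(x), x.dot(keras_w))
```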
Now that all the parameters are loaded, let's set them in the model.
model_layers_indx = [0, 2, 4, 5, 7, 8, 10, 11] + [15, 17, 19]  # conv + fc
for i, j in zip(model_layers_indx, range(11)):
    model.layers[i].set_weights(params[j])
import h5py
model.save_weights('sports1M_weights.h5', overwrite=True)
json_string = model.to_json()
with open('sports1M_model.json', 'w') as f:
f.write(json_string)
From now on, it is highly recommended to restart the kernel and reload the weights from the saved file. Doing so requires only about 300MB of memory instead of 3GB. Also compile the model with the SGD optimizer. The choice of loss and optimizer only matters for training, not for testing, but compiling the model is necessary so that Theano builds the tensor operations for the forward pass.
from keras.models import model_from_json
model = model_from_json(open('sports1M_model.json', 'r').read())
model.load_weights('sports1M_weights.h5')
model.compile(loss='mean_squared_error', optimizer='sgd')
Using Theano backend.
For testing, we are going to load all the labels of the Sports1M dataset, which can be found here.
with open('dataset/labels.txt', 'r') as f:
    labels = [line.strip() for line in f.readlines()]
print('Total labels: {}'.format(len(labels)))
Total labels: 487
import cv2
import numpy as np
cap = cv2.VideoCapture('dM06AMFLsrc.mp4')
vid = []
while True:
    ret, img = cap.read()
    if not ret:
        break
    vid.append(cv2.resize(img, (171, 128)))
vid = np.array(vid, dtype=np.float32)
Plot a frame of the video. As can be seen, the video corresponds to the label: basketball.
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(vid[2000]/256)
<matplotlib.image.AxesImage at 0x122b014a8>
Now extract a 16-frame clip from the video and crop its center to obtain a 3x16x112x112 clip.
X = vid[2000:2016, 8:120, 30:142, :].transpose((3, 0, 1, 2))
output = model.predict_on_batch(np.array([X]))
plt.plot(output[0][0])
[<matplotlib.lines.Line2D at 0x110e579e8>]
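The crop indices used above follow from the resized frame size (128 x 171 after cv2.resize) and the 112 x 112 spatial input the network expects. A small sketch of the arithmetic (note the notebook crops columns 30:142, one pixel off the exact center 29:141):

```python
# Center-crop offsets for a 112x112 window in a 128x171 frame
h, w, crop = 128, 171, 112
row0 = (h - crop) // 2  # row offset: matches the 8:120 slice above
col0 = (w - crop) // 2  # column offset: exactly centered crop would be 29:141
assert (row0, row0 + crop) == (8, 120)
assert (col0, col0 + crop) == (29, 141)
```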
print('Position of maximum probability: {}'.format(output[0].argmax()))
print('Maximum probability: {:.5f}'.format(max(output[0][0])))
print('Corresponding label: {}'.format(labels[output[0].argmax()]))
# sort top five predictions from softmax output
top_inds = output[0][0].argsort()[::-1][:5]  # reverse sort and take five largest items
print('\nTop 5 probabilities and labels:')
_ = [print('{:.5f} {}'.format(output[0][0][i], labels[i])) for i in top_inds]
Position of maximum probability: 367
Maximum probability: 0.45910
Corresponding label: basketball

Top 5 probabilities and labels:
0.45910 basketball
0.39566 streetball
0.02090 greco-roman wrestling
0.01479 freestyle wrestling
0.01391 slamball
As the results above show, the video has been classified correctly, with the basketball category as the top-scoring output.