This approach follows ideas described in Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," arXiv, 2013.
First of all, we'll need a little Python script to run the Matlab Selective Search code.
Let's run detection on an image of a couple of cats frolicking (one of the ImageNet detection challenge pictures), which we will download from the web.
Before you get started with this notebook, make sure to follow the instructions for getting the pretrained ImageNet model.
!mkdir _temp
!curl http://farm1.static.flickr.com/220/512450093_7717fb8ce8.jpg > _temp/cat.jpg
!echo `pwd`/_temp/cat.jpg > _temp/cat.txt
!../python/detect.py --crop_mode=selective_search --pretrained_model=imagenet/caffe_reference_imagenet_model --model_def=imagenet/imagenet_deploy.prototxt _temp/cat.txt _temp/cat.h5
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 212k 100 212k 0 0 852k 0 --:--:-- --:--:-- --:--:-- 858k WARNING: Logging before InitGoogleLogging() is written to STDERR I0520 12:14:46.505362 2522 net.cpp:75] Creating Layer conv1 I0520 12:14:46.505406 2522 net.cpp:85] conv1 <- data I0520 12:14:46.505462 2522 net.cpp:111] conv1 -> conv1 I0520 12:14:46.505530 2522 net.cpp:126] Top shape: 10 96 55 55 (2904000) I0520 12:14:46.505542 2522 net.cpp:152] conv1 needs backward computation. I0520 12:14:46.505550 2522 net.cpp:75] Creating Layer relu1 I0520 12:14:46.505556 2522 net.cpp:85] relu1 <- conv1 I0520 12:14:46.505563 2522 net.cpp:99] relu1 -> conv1 (in-place) I0520 12:14:46.505570 2522 net.cpp:126] Top shape: 10 96 55 55 (2904000) I0520 12:14:46.505578 2522 net.cpp:152] relu1 needs backward computation. I0520 12:14:46.505584 2522 net.cpp:75] Creating Layer pool1 I0520 12:14:46.505590 2522 net.cpp:85] pool1 <- conv1 I0520 12:14:46.505596 2522 net.cpp:111] pool1 -> pool1 I0520 12:14:46.505606 2522 net.cpp:126] Top shape: 10 96 27 27 (699840) I0520 12:14:46.505612 2522 net.cpp:152] pool1 needs backward computation. I0520 12:14:46.505620 2522 net.cpp:75] Creating Layer norm1 I0520 12:14:46.505626 2522 net.cpp:85] norm1 <- pool1 I0520 12:14:46.505632 2522 net.cpp:111] norm1 -> norm1 I0520 12:14:46.505640 2522 net.cpp:126] Top shape: 10 96 27 27 (699840) I0520 12:14:46.505646 2522 net.cpp:152] norm1 needs backward computation. I0520 12:14:46.505656 2522 net.cpp:75] Creating Layer conv2 I0520 12:14:46.505661 2522 net.cpp:85] conv2 <- norm1 I0520 12:14:46.505668 2522 net.cpp:111] conv2 -> conv2 I0520 12:14:46.506363 2522 net.cpp:126] Top shape: 10 256 27 27 (1866240) I0520 12:14:46.506383 2522 net.cpp:152] conv2 needs backward computation. 
I0520 12:14:46.506392 2522 net.cpp:75] Creating Layer relu2 I0520 12:14:46.506398 2522 net.cpp:85] relu2 <- conv2 I0520 12:14:46.506409 2522 net.cpp:99] relu2 -> conv2 (in-place) I0520 12:14:46.506417 2522 net.cpp:126] Top shape: 10 256 27 27 (1866240) I0520 12:14:46.506422 2522 net.cpp:152] relu2 needs backward computation. I0520 12:14:46.506429 2522 net.cpp:75] Creating Layer pool2 I0520 12:14:46.506435 2522 net.cpp:85] pool2 <- conv2 I0520 12:14:46.506441 2522 net.cpp:111] pool2 -> pool2 I0520 12:14:46.506448 2522 net.cpp:126] Top shape: 10 256 13 13 (432640) I0520 12:14:46.506454 2522 net.cpp:152] pool2 needs backward computation. I0520 12:14:46.506463 2522 net.cpp:75] Creating Layer norm2 I0520 12:14:46.506469 2522 net.cpp:85] norm2 <- pool2 I0520 12:14:46.506475 2522 net.cpp:111] norm2 -> norm2 I0520 12:14:46.506482 2522 net.cpp:126] Top shape: 10 256 13 13 (432640) I0520 12:14:46.506489 2522 net.cpp:152] norm2 needs backward computation. I0520 12:14:46.506496 2522 net.cpp:75] Creating Layer conv3 I0520 12:14:46.506502 2522 net.cpp:85] conv3 <- norm2 I0520 12:14:46.506508 2522 net.cpp:111] conv3 -> conv3 I0520 12:14:46.508342 2522 net.cpp:126] Top shape: 10 384 13 13 (648960) I0520 12:14:46.508359 2522 net.cpp:152] conv3 needs backward computation. I0520 12:14:46.508369 2522 net.cpp:75] Creating Layer relu3 I0520 12:14:46.508375 2522 net.cpp:85] relu3 <- conv3 I0520 12:14:46.508383 2522 net.cpp:99] relu3 -> conv3 (in-place) I0520 12:14:46.508389 2522 net.cpp:126] Top shape: 10 384 13 13 (648960) I0520 12:14:46.508395 2522 net.cpp:152] relu3 needs backward computation. I0520 12:14:46.508402 2522 net.cpp:75] Creating Layer conv4 I0520 12:14:46.508409 2522 net.cpp:85] conv4 <- conv3 I0520 12:14:46.508415 2522 net.cpp:111] conv4 -> conv4 I0520 12:14:46.509848 2522 net.cpp:126] Top shape: 10 384 13 13 (648960) I0520 12:14:46.509870 2522 net.cpp:152] conv4 needs backward computation. 
I0520 12:14:46.509877 2522 net.cpp:75] Creating Layer relu4 I0520 12:14:46.509884 2522 net.cpp:85] relu4 <- conv4 I0520 12:14:46.509891 2522 net.cpp:99] relu4 -> conv4 (in-place) I0520 12:14:46.509897 2522 net.cpp:126] Top shape: 10 384 13 13 (648960) I0520 12:14:46.509903 2522 net.cpp:152] relu4 needs backward computation. I0520 12:14:46.509912 2522 net.cpp:75] Creating Layer conv5 I0520 12:14:46.509917 2522 net.cpp:85] conv5 <- conv4 I0520 12:14:46.509923 2522 net.cpp:111] conv5 -> conv5 I0520 12:14:46.510815 2522 net.cpp:126] Top shape: 10 256 13 13 (432640) I0520 12:14:46.510850 2522 net.cpp:152] conv5 needs backward computation. I0520 12:14:46.510860 2522 net.cpp:75] Creating Layer relu5 I0520 12:14:46.510867 2522 net.cpp:85] relu5 <- conv5 I0520 12:14:46.510875 2522 net.cpp:99] relu5 -> conv5 (in-place) I0520 12:14:46.510884 2522 net.cpp:126] Top shape: 10 256 13 13 (432640) I0520 12:14:46.510890 2522 net.cpp:152] relu5 needs backward computation. I0520 12:14:46.510897 2522 net.cpp:75] Creating Layer pool5 I0520 12:14:46.510903 2522 net.cpp:85] pool5 <- conv5 I0520 12:14:46.510910 2522 net.cpp:111] pool5 -> pool5 I0520 12:14:46.510920 2522 net.cpp:126] Top shape: 10 256 6 6 (92160) I0520 12:14:46.510926 2522 net.cpp:152] pool5 needs backward computation. I0520 12:14:46.510936 2522 net.cpp:75] Creating Layer fc6 I0520 12:14:46.510942 2522 net.cpp:85] fc6 <- pool5 I0520 12:14:46.510949 2522 net.cpp:111] fc6 -> fc6 I0520 12:14:46.566017 2522 net.cpp:126] Top shape: 10 4096 1 1 (40960) I0520 12:14:46.566061 2522 net.cpp:152] fc6 needs backward computation. I0520 12:14:46.566076 2522 net.cpp:75] Creating Layer relu6 I0520 12:14:46.566084 2522 net.cpp:85] relu6 <- fc6 I0520 12:14:46.566092 2522 net.cpp:99] relu6 -> fc6 (in-place) I0520 12:14:46.566100 2522 net.cpp:126] Top shape: 10 4096 1 1 (40960) I0520 12:14:46.566140 2522 net.cpp:152] relu6 needs backward computation. 
I0520 12:14:46.566149 2522 net.cpp:75] Creating Layer drop6 I0520 12:14:46.566155 2522 net.cpp:85] drop6 <- fc6 I0520 12:14:46.566161 2522 net.cpp:99] drop6 -> fc6 (in-place) I0520 12:14:46.566174 2522 net.cpp:126] Top shape: 10 4096 1 1 (40960) I0520 12:14:46.566179 2522 net.cpp:152] drop6 needs backward computation. I0520 12:14:46.566187 2522 net.cpp:75] Creating Layer fc7 I0520 12:14:46.566193 2522 net.cpp:85] fc7 <- fc6 I0520 12:14:46.566200 2522 net.cpp:111] fc7 -> fc7 I0520 12:14:46.600733 2522 net.cpp:126] Top shape: 10 4096 1 1 (40960) I0520 12:14:46.600765 2522 net.cpp:152] fc7 needs backward computation. I0520 12:14:46.600777 2522 net.cpp:75] Creating Layer relu7 I0520 12:14:46.600785 2522 net.cpp:85] relu7 <- fc7 I0520 12:14:46.600793 2522 net.cpp:99] relu7 -> fc7 (in-place) I0520 12:14:46.600802 2522 net.cpp:126] Top shape: 10 4096 1 1 (40960) I0520 12:14:46.600808 2522 net.cpp:152] relu7 needs backward computation. I0520 12:14:46.600816 2522 net.cpp:75] Creating Layer drop7 I0520 12:14:46.600823 2522 net.cpp:85] drop7 <- fc7 I0520 12:14:46.600829 2522 net.cpp:99] drop7 -> fc7 (in-place) I0520 12:14:46.600836 2522 net.cpp:126] Top shape: 10 4096 1 1 (40960) I0520 12:14:46.600843 2522 net.cpp:152] drop7 needs backward computation. I0520 12:14:46.600850 2522 net.cpp:75] Creating Layer fc8 I0520 12:14:46.600857 2522 net.cpp:85] fc8 <- fc7 I0520 12:14:46.600864 2522 net.cpp:111] fc8 -> fc8 I0520 12:14:46.615557 2522 net.cpp:126] Top shape: 10 1000 1 1 (10000) I0520 12:14:46.615602 2522 net.cpp:152] fc8 needs backward computation. I0520 12:14:46.615614 2522 net.cpp:75] Creating Layer prob I0520 12:14:46.615623 2522 net.cpp:85] prob <- fc8 I0520 12:14:46.615631 2522 net.cpp:111] prob -> prob I0520 12:14:46.615649 2522 net.cpp:126] Top shape: 10 1000 1 1 (10000) I0520 12:14:46.615656 2522 net.cpp:152] prob needs backward computation. 
I0520 12:14:46.615664 2522 net.cpp:163] This network produces output prob I0520 12:14:46.615682 2522 net.cpp:181] Collecting Learning Rate and Weight Decay. I0520 12:14:46.615696 2522 net.cpp:174] Network initialization done. I0520 12:14:46.615702 2522 net.cpp:175] Memory required for Data 42022840 Loading input... selective_search({'/home/shelhamer/caffe/examples/_temp/cat.jpg'}, '/tmp/tmplkH92s.mat') Processed 223 windows in 16.525 s. /home/shelhamer/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py:2446: PerformanceWarning: your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block1_values] [items->['prediction']] warnings.warn(ws, PerformanceWarning) Saved to _temp/cat.h5 in 0.353 s.
Running this outputs a DataFrame with the filenames, selected windows, and their ImageNet scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)
import pandas as pd
df = pd.read_hdf('_temp/cat.h5', 'df')
print(df.shape)
print(df.iloc[0])
(223, 5) prediction [6.67012e-06, 1.26349e-06, 1.86075e-06, 1.0960... ymin 0 xmin 0 ymax 500 xmax 496 Name: /home/shelhamer/caffe/examples/_temp/cat.jpg, dtype: object
In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list one image per line in the images_file, and it will process all of them.
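As a minimal sketch of preparing such a file, the snippet below writes one absolute image path per line, mirroring what the echo command did for the single cat image. The helper name write_images_file and the example filenames are hypothetical; only the one-path-per-line layout comes from the text above.

```python
import os
import tempfile


def write_images_file(image_paths, images_file):
    """Write one absolute image path per line (hypothetical helper)."""
    with open(images_file, 'w') as f:
        for path in image_paths:
            f.write(os.path.abspath(path) + '\n')
    return images_file


# Example with made-up filenames; substitute your own images.
tmp_dir = tempfile.mkdtemp()
images_file = write_images_file(
    ['cat.jpg', 'dog.jpg'], os.path.join(tmp_dir, 'images.txt'))
with open(images_file) as f:
    lines = f.read().splitlines()
print(len(lines))
```

The resulting file can then be passed to detect.py in place of _temp/cat.txt.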
Although this guide gives an example of ImageNet detection, detect.py is clever enough to adapt to different Caffe models' input dimensions, batch size, and output categories. Run python detect.py --help to see the parameters that describe your data set; nothing needs to be hardcoded.
Anyway, let's now load the ImageNet class names and make a DataFrame of the predictions. Note that you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.
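import numpy as np

# Each line of synset_words.txt is "<synset_id> <name>[, <alias>, ...]".
with open('../data/ilsvrc12/synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
# sort_values returns a new DataFrame, so keep the result.
labels_df = labels_df.sort_values('synset_id')
# One row per window, one column per class name.
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])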
name tench 0.000007 goldfish 0.000001 great white shark 0.000002 tiger shark 0.000001 hammerhead 0.000007 electric ray 0.000004 stingray 0.000007 cock 0.000057 hen 0.002985 ostrich 0.000010 brambling 0.000004 goldfinch 0.000001 house finch 0.000004 junco 0.000002 indigo bunting 0.000001 ... daisy 0.000002 yellow lady's slipper 0.000002 corn 0.000019 acorn 0.000011 hip 0.000003 buckeye 0.000010 coral fungus 0.000005 agaric 0.000019 gyromitra 0.000039 stinkhorn 0.000002 earthstar 0.000025 hen-of-the-woods 0.000035 bolete 0.000036 ear 0.000008 toilet tissue 0.000019 Name: 0, Length: 1000, dtype: float32
Let's look at the activations.
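import matplotlib.pyplot as plt

# Rows are windows, columns are classes; brighter means a higher score.
plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')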
Now let's take max across all windows and plot the top classes.
# Best score per class over all windows, highest first.
max_s = predictions_df.max(0)
max_s = max_s.sort_values(ascending=False)
print(max_s[:10])
name proboscis monkey 0.920136 tiger cat 0.916973 milk can 0.791307 American black bear 0.625850 broccoli 0.609467 dhole 0.513998 platypus 0.507829 tiger 0.497029 lion 0.481180 dingo 0.474689 dtype: float32
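To make the reduction concrete, here is a small sketch on made-up scores (the class names and numbers are illustrative, not model output). It takes the per-class maximum over windows and also records which window produced it; the idxmax step is an addition that is not part of the cell above.

```python
import numpy as np
import pandas as pd

# Toy scores: 3 windows (rows) x 3 classes (columns); values are made up.
toy_df = pd.DataFrame(
    np.array([[0.10, 0.70, 0.20],
              [0.80, 0.05, 0.15],
              [0.30, 0.60, 0.10]]),
    columns=['cat', 'dog', 'bird'])

# Best score per class over all windows, highest first.
toy_max = toy_df.max(axis=0).sort_values(ascending=False)
# Row index of the window that achieved each class maximum.
toy_best_window = toy_df.idxmax(axis=0)

print(toy_max)
print(toy_best_window)
```

A high per-class maximum only says that some window scored well for that class, which is why unrelated classes can still appear near the top.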
Okay, there are indeed cats in there (and some nonsense). Picking good localizations is a work in progress; inspecting the windows manually, we see that the detections at zero-based ranks 3 and 13 in the score-sorted window order correspond to the two cats.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Rank windows by their highest class score, best first.
window_order = pd.Series(predictions_df.values.max(1)).sort_values(ascending=False)
# Zero-based ranks 3 and 13 are the two windows picked out by eye.
i = window_order.index[3]
j = window_order.index[13]

# Show top predictions for the window at rank 3.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Detection at rank 3:')
print(f.sort_values(ascending=False)[:5])
print('')

# Show top predictions for the window at rank 13.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Detection at rank 13:')
print(f.sort_values(ascending=False)[:5])

# Show the rank-3 window in red and the rank-13 window in blue.
im = plt.imread('_temp/cat.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Detection at rank 3: name tiger cat 0.882972 tiger 0.073158 tabby 0.025290 lynx 0.012881 Egyptian cat 0.004481 dtype: float32 Detection at rank 13: name tiger cat 0.677493 Pembroke 0.064214 dingo 0.050635 golden retriever 0.028331 tabby 0.021945 dtype: float32
That's cool. Both of these detections are tiger cats. Let's take all 'tiger cat' detections and apply non-maximum suppression (NMS) to get rid of overlapping windows.
import numpy as np


def nms_detections(dets, overlap=0.5):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        overlap ratio above which a window is suppressed (0.5 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    if np.shape(dets)[0] < 1:
        return dets

    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]

    # Use the same inclusive-pixel (+1) convention as the intersection below.
    area = (x2 - x1 + 1) * (y2 - y1 + 1)

    s = dets[:, 4]
    ind = np.argsort(s)

    pick = []
    while len(ind) > 0:
        # Pick the highest-scoring remaining window.
        last = len(ind) - 1
        i = ind[last]
        pick.append(i)

        # Intersection of the picked window with every remaining window.
        xx1 = np.maximum(x1[i], x1[ind[:last]])
        yy1 = np.maximum(y1[i], y1[ind[:last]])
        xx2 = np.minimum(x2[i], x2[ind[:last]])
        yy2 = np.minimum(y2[i], y2[ind[:last]])
        w = np.maximum(0., xx2 - xx1 + 1)
        h = np.maximum(0., yy2 - yy1 + 1)

        # Fraction of each remaining window covered by the picked window.
        o = w * h / area[ind[:last]]

        # Drop the picked window and every window it covers too much.
        to_delete = np.concatenate(
            (np.nonzero(o > overlap)[0], np.array([last])))
        ind = np.delete(ind, to_delete)

    return dets[pick, :]
scores = predictions_df['tiger cat']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
# Index the underlying array: a pandas Series does not support [:, np.newaxis].
dets = np.hstack((windows, scores.values[:, np.newaxis]))
nms_dets = nms_detections(dets)
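The suppression test inside the loop can be checked by hand on a pair of boxes. The sketch below is a standalone illustration with made-up boxes and a hypothetical helper, covered_fraction; it assumes inclusive pixel coordinates (hence the +1 terms). Note that the ratio is the intersection divided by the area of the candidate window, not intersection over union.

```python
def covered_fraction(picked, candidate):
    """Fraction of `candidate` covered by `picked`.

    Boxes are (xmin, ymin, xmax, ymax) with inclusive pixel coordinates
    (an assumption of this sketch).
    """
    iw = max(0.0, min(picked[2], candidate[2]) - max(picked[0], candidate[0]) + 1)
    ih = max(0.0, min(picked[3], candidate[3]) - max(picked[1], candidate[1]) + 1)
    area = (candidate[2] - candidate[0] + 1) * (candidate[3] - candidate[1] + 1)
    return iw * ih / area


picked = (0, 0, 9, 9)          # 10 x 10 box
mostly_inside = (5, 0, 12, 9)  # 8 x 10 box, 5 of its 8 columns overlap
far_away = (50, 50, 59, 59)    # no overlap

print(covered_fraction(picked, mostly_inside))  # 0.625 > 0.5: suppressed
print(covered_fraction(picked, far_away))       # 0.0: kept
```

With the default threshold of 0.5, the first candidate would be removed once the picked box is selected, while the second would survive to be considered on its own.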
Show the top 3 'tiger cat' detections that survive NMS in the image.
plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    # Rectangle takes (x, y), width, height, so convert from corner coordinates.
    currentAxis.add_patch(
        Rectangle((det[0], det[1]), det[2] - det[0], det[3] - det[1],
                  fill=False, edgecolor=c, linewidth=5)
    )
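matplotlib's Rectangle is specified by an anchor corner plus a width and a height, whereas the detections store two corners. A small conversion helper keeps that explicit; corners_to_xywh is a hypothetical name introduced for this sketch, and the example box is made up.

```python
def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to ((x, y), width, height)."""
    return (xmin, ymin), xmax - xmin, ymax - ymin


anchor, width, height = corners_to_xywh(20, 30, 120, 80)
print(anchor, width, height)
```

The three values can be passed straight to Rectangle(anchor, width, height, ...) when drawing a detection.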
Remove the temp directory to clean up.
import shutil
shutil.rmtree('_temp')