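R-CNN is an excellent object detection model. Its accuracy and efficiency fall somewhat short of many of today's state-of-the-art methods, but its ideas underlie many later algorithms. This post follows the official notebook and adds only the changes needed to run it locally, plus explanatory notes: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb
For details, see the authors' paper: Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.
---Last update: June 8, 2015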
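1. Download the pretrained R-CNN model, or train your own model with Caffe. The pretrained model is based on the ImageNet dataset and fine-tuned on ILSVRC13; it outputs 200 detection classes.
Download command: ~/caffe-master$ ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13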
2. Download Selective Search and compile its mex files with MATLAB.
(1) Download: https://github.com/sergeyk/selective_search_ijcv_with_python . Unpack the archive, rename the directory, and copy it to ~/caffe-master/python/selective_search_ijcv_with_python/
(2) Build: start MATLAB and run ~/caffe-master/python/selective_search_ijcv_with_python/demo.m . Once it finishes without errors, close MATLAB.
3. If running python/detect.py raises an error, the following fixes may help:
(1) Error: OSError: [Errno 2] No such file or directory (this typically means the matlab executable is not on the PATH, so call it by the absolute path of your own installation)
File to edit: ~/caffe-master/python/selective_search_ijcv_with_python/selective_search.py
Before: mc = "matlab -nojvm -r \"try; {}; catch; exit; end; exit\"".format(command)
After: mc = "/usr/local/MATLAB/R2014a/bin/matlab -nojvm -r \"try; {}; catch; exit; end; exit\"".format(command)
(2) Error: ValueError: 'axis' entry 2 is out of bounds (-2, 2) (the output array has only two dimensions, so axes 2 and 3 do not exist)
File to edit: ~/caffe-master/python/caffe/detector.py
Before: predictions = out[self.outputs[0]].squeeze(axis=(2, 3))
After: predictions = out[self.outputs[0]].squeeze()
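The second fix can be reproduced with plain NumPy. The sketch below assumes the network output is either a 4-D array shaped (N, C, 1, 1) or a 2-D array shaped (N, C); flatten_predictions is a hypothetical helper, not part of Caffe. It also shows that a bare squeeze() drops the batch axis when a batch holds a single window, so reshaping to (N, -1) is one possible, more defensive alternative.

```python
import numpy as np


def flatten_predictions(blob):
    """Hypothetical helper: return an (N, C) array for a blob shaped (N, C) or (N, C, 1, 1)."""
    blob = np.asarray(blob)
    return blob.reshape(blob.shape[0], -1)


four_d = np.zeros((10, 200, 1, 1), dtype=np.float32)  # 4-D output layout
two_d = np.zeros((10, 200), dtype=np.float32)         # 2-D output that triggers the ValueError
single = np.zeros((1, 200), dtype=np.float32)         # a batch holding one window

# squeeze(axis=(2, 3)) only works when axes 2 and 3 exist.
try:
    two_d.squeeze(axis=(2, 3))
except (ValueError, IndexError) as err:
    print('squeeze(axis=(2, 3)) failed:', err)

print(flatten_predictions(four_d).shape)
print(flatten_predictions(two_d).shape)
print(single.squeeze().shape)             # the batch axis is dropped as well
print(flatten_predictions(single).shape)  # the batch axis is kept
```

With thousands of windows per image, as in this walkthrough, the plain squeeze() fix above is sufficient; the reshape only matters for single-window batches.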
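Create a temporary directory and list the detection sample in an input file. Several samples can be listed at once, but they are then processed together as a single batch, which suits combining several preprocessed inputs.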
% cd '/home/ouxinyu/caffe-master'
! mkdir -p _temp
! echo examples/images/fish-bike.jpg > _temp/det_input.txt
/home/ouxinyu/caffe-master
Run Selective Search to generate region proposals, then run Caffe to classify each proposal. The command below runs in GPU mode; to run in CPU mode, remove --gpu.
! python/detect.py --crop_mode=selective_search --pretrained_model=models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
GPU mode WARNING: Logging before InitGoogleLogging() is written to STDERR I0608 10:32:38.067106 6131 net.cpp:42] Initializing net from parameters: name: "R-CNN-ilsvrc13" input: "data" input_dim: 10 input_dim: 3 input_dim: 227 input_dim: 227 state { phase: TEST } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" convolution_param { num_output: 96 kernel_size: 11 stride: 4 } } layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "norm1" type: "LRN" bottom: "pool1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "conv2" type: "Convolution" bottom: "norm1" top: "conv2" convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 } } layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "norm2" type: "LRN" bottom: "pool2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "conv3" type: "Convolution" bottom: "norm2" top: "conv3" convolution_param { num_output: 384 pad: 1 kernel_size: 3 } } layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" } layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 } } layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" } layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 } } layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" } layer { name: "pool5" type: "Pooling" bottom: "conv5" top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" inner_product_param { num_output: 4096 } } 
layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" } layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" inner_product_param { num_output: 4096 } } layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" } layer { name: "drop7" type: "Dropout" bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc-rcnn" type: "InnerProduct" bottom: "fc7" top: "fc-rcnn" inner_product_param { num_output: 200 } } I0608 10:32:38.067556 6131 net.cpp:370] Input 0 -> data I0608 10:32:38.067576 6131 layer_factory.hpp:74] Creating layer conv1 I0608 10:32:38.067585 6131 net.cpp:90] Creating Layer conv1 I0608 10:32:38.067589 6131 net.cpp:410] conv1 <- data I0608 10:32:38.067595 6131 net.cpp:368] conv1 -> conv1 I0608 10:32:38.067603 6131 net.cpp:120] Setting up conv1 I0608 10:32:38.108999 6131 net.cpp:127] Top shape: 10 96 55 55 (2904000) I0608 10:32:38.109035 6131 layer_factory.hpp:74] Creating layer relu1 I0608 10:32:38.109048 6131 net.cpp:90] Creating Layer relu1 I0608 10:32:38.109055 6131 net.cpp:410] relu1 <- conv1 I0608 10:32:38.109063 6131 net.cpp:357] relu1 -> conv1 (in-place) I0608 10:32:38.109076 6131 net.cpp:120] Setting up relu1 I0608 10:32:38.109233 6131 net.cpp:127] Top shape: 10 96 55 55 (2904000) I0608 10:32:38.109244 6131 layer_factory.hpp:74] Creating layer pool1 I0608 10:32:38.109257 6131 net.cpp:90] Creating Layer pool1 I0608 10:32:38.109263 6131 net.cpp:410] pool1 <- conv1 I0608 10:32:38.109269 6131 net.cpp:368] pool1 -> pool1 I0608 10:32:38.109277 6131 net.cpp:120] Setting up pool1 I0608 10:32:38.109311 6131 net.cpp:127] Top shape: 10 96 27 27 (699840) I0608 10:32:38.109318 6131 layer_factory.hpp:74] Creating layer norm1 I0608 10:32:38.109325 6131 net.cpp:90] Creating Layer norm1 I0608 10:32:38.109329 6131 net.cpp:410] norm1 <- pool1 I0608 10:32:38.109335 6131 net.cpp:368] norm1 -> norm1 I0608 10:32:38.109341 6131 
net.cpp:120] Setting up norm1 I0608 10:32:38.109349 6131 net.cpp:127] Top shape: 10 96 27 27 (699840) I0608 10:32:38.109352 6131 layer_factory.hpp:74] Creating layer conv2 I0608 10:32:38.109360 6131 net.cpp:90] Creating Layer conv2 I0608 10:32:38.109364 6131 net.cpp:410] conv2 <- norm1 I0608 10:32:38.109370 6131 net.cpp:368] conv2 -> conv2 I0608 10:32:38.109376 6131 net.cpp:120] Setting up conv2 I0608 10:32:38.109931 6131 net.cpp:127] Top shape: 10 256 27 27 (1866240) I0608 10:32:38.109947 6131 layer_factory.hpp:74] Creating layer relu2 I0608 10:32:38.109954 6131 net.cpp:90] Creating Layer relu2 I0608 10:32:38.109959 6131 net.cpp:410] relu2 <- conv2 I0608 10:32:38.109966 6131 net.cpp:357] relu2 -> conv2 (in-place) I0608 10:32:38.109972 6131 net.cpp:120] Setting up relu2 I0608 10:32:38.110002 6131 net.cpp:127] Top shape: 10 256 27 27 (1866240) I0608 10:32:38.110008 6131 layer_factory.hpp:74] Creating layer pool2 I0608 10:32:38.110014 6131 net.cpp:90] Creating Layer pool2 I0608 10:32:38.110018 6131 net.cpp:410] pool2 <- conv2 I0608 10:32:38.110024 6131 net.cpp:368] pool2 -> pool2 I0608 10:32:38.110030 6131 net.cpp:120] Setting up pool2 I0608 10:32:38.110136 6131 net.cpp:127] Top shape: 10 256 13 13 (432640) I0608 10:32:38.110144 6131 layer_factory.hpp:74] Creating layer norm2 I0608 10:32:38.110152 6131 net.cpp:90] Creating Layer norm2 I0608 10:32:38.110157 6131 net.cpp:410] norm2 <- pool2 I0608 10:32:38.110162 6131 net.cpp:368] norm2 -> norm2 I0608 10:32:38.110168 6131 net.cpp:120] Setting up norm2 I0608 10:32:38.110175 6131 net.cpp:127] Top shape: 10 256 13 13 (432640) I0608 10:32:38.110179 6131 layer_factory.hpp:74] Creating layer conv3 I0608 10:32:38.110187 6131 net.cpp:90] Creating Layer conv3 I0608 10:32:38.110191 6131 net.cpp:410] conv3 <- norm2 I0608 10:32:38.110198 6131 net.cpp:368] conv3 -> conv3 I0608 10:32:38.110203 6131 net.cpp:120] Setting up conv3 I0608 10:32:38.111160 6131 net.cpp:127] Top shape: 10 384 13 13 (648960) I0608 10:32:38.111176 6131 
layer_factory.hpp:74] Creating layer relu3 I0608 10:32:38.111183 6131 net.cpp:90] Creating Layer relu3 I0608 10:32:38.111189 6131 net.cpp:410] relu3 <- conv3 I0608 10:32:38.111194 6131 net.cpp:357] relu3 -> conv3 (in-place) I0608 10:32:38.111202 6131 net.cpp:120] Setting up relu3 I0608 10:32:38.111232 6131 net.cpp:127] Top shape: 10 384 13 13 (648960) I0608 10:32:38.111238 6131 layer_factory.hpp:74] Creating layer conv4 I0608 10:32:38.111243 6131 net.cpp:90] Creating Layer conv4 I0608 10:32:38.111248 6131 net.cpp:410] conv4 <- conv3 I0608 10:32:38.111253 6131 net.cpp:368] conv4 -> conv4 I0608 10:32:38.111260 6131 net.cpp:120] Setting up conv4 I0608 10:32:38.112344 6131 net.cpp:127] Top shape: 10 384 13 13 (648960) I0608 10:32:38.112357 6131 layer_factory.hpp:74] Creating layer relu4 I0608 10:32:38.112365 6131 net.cpp:90] Creating Layer relu4 I0608 10:32:38.112370 6131 net.cpp:410] relu4 <- conv4 I0608 10:32:38.112375 6131 net.cpp:357] relu4 -> conv4 (in-place) I0608 10:32:38.112381 6131 net.cpp:120] Setting up relu4 I0608 10:32:38.112411 6131 net.cpp:127] Top shape: 10 384 13 13 (648960) I0608 10:32:38.112416 6131 layer_factory.hpp:74] Creating layer conv5 I0608 10:32:38.112422 6131 net.cpp:90] Creating Layer conv5 I0608 10:32:38.112427 6131 net.cpp:410] conv5 <- conv4 I0608 10:32:38.112432 6131 net.cpp:368] conv5 -> conv5 I0608 10:32:38.112439 6131 net.cpp:120] Setting up conv5 I0608 10:32:38.113263 6131 net.cpp:127] Top shape: 10 256 13 13 (432640) I0608 10:32:38.113279 6131 layer_factory.hpp:74] Creating layer relu5 I0608 10:32:38.113286 6131 net.cpp:90] Creating Layer relu5 I0608 10:32:38.113291 6131 net.cpp:410] relu5 <- conv5 I0608 10:32:38.113297 6131 net.cpp:357] relu5 -> conv5 (in-place) I0608 10:32:38.113303 6131 net.cpp:120] Setting up relu5 I0608 10:32:38.113333 6131 net.cpp:127] Top shape: 10 256 13 13 (432640) I0608 10:32:38.113339 6131 layer_factory.hpp:74] Creating layer pool5 I0608 10:32:38.113347 6131 net.cpp:90] Creating Layer pool5 I0608 
10:32:38.113350 6131 net.cpp:410] pool5 <- conv5 I0608 10:32:38.113356 6131 net.cpp:368] pool5 -> pool5 I0608 10:32:38.113363 6131 net.cpp:120] Setting up pool5 I0608 10:32:38.113502 6131 net.cpp:127] Top shape: 10 256 6 6 (92160) I0608 10:32:38.113520 6131 layer_factory.hpp:74] Creating layer fc6 I0608 10:32:38.113528 6131 net.cpp:90] Creating Layer fc6 I0608 10:32:38.113533 6131 net.cpp:410] fc6 <- pool5 I0608 10:32:38.113538 6131 net.cpp:368] fc6 -> fc6 I0608 10:32:38.113545 6131 net.cpp:120] Setting up fc6 I0608 10:32:38.140440 6131 net.cpp:127] Top shape: 10 4096 (40960) I0608 10:32:38.140478 6131 layer_factory.hpp:74] Creating layer relu6 I0608 10:32:38.140492 6131 net.cpp:90] Creating Layer relu6 I0608 10:32:38.140498 6131 net.cpp:410] relu6 <- fc6 I0608 10:32:38.140506 6131 net.cpp:357] relu6 -> fc6 (in-place) I0608 10:32:38.140516 6131 net.cpp:120] Setting up relu6 I0608 10:32:38.140576 6131 net.cpp:127] Top shape: 10 4096 (40960) I0608 10:32:38.140583 6131 layer_factory.hpp:74] Creating layer drop6 I0608 10:32:38.140589 6131 net.cpp:90] Creating Layer drop6 I0608 10:32:38.140594 6131 net.cpp:410] drop6 <- fc6 I0608 10:32:38.140599 6131 net.cpp:357] drop6 -> fc6 (in-place) I0608 10:32:38.140605 6131 net.cpp:120] Setting up drop6 I0608 10:32:38.140611 6131 net.cpp:127] Top shape: 10 4096 (40960) I0608 10:32:38.140616 6131 layer_factory.hpp:74] Creating layer fc7 I0608 10:32:38.140622 6131 net.cpp:90] Creating Layer fc7 I0608 10:32:38.140630 6131 net.cpp:410] fc7 <- fc6 I0608 10:32:38.140636 6131 net.cpp:368] fc7 -> fc7 I0608 10:32:38.140643 6131 net.cpp:120] Setting up fc7 I0608 10:32:38.153045 6131 net.cpp:127] Top shape: 10 4096 (40960) I0608 10:32:38.153095 6131 layer_factory.hpp:74] Creating layer relu7 I0608 10:32:38.153105 6131 net.cpp:90] Creating Layer relu7 I0608 10:32:38.153112 6131 net.cpp:410] relu7 <- fc7 I0608 10:32:38.153120 6131 net.cpp:357] relu7 -> fc7 (in-place) I0608 10:32:38.153129 6131 net.cpp:120] Setting up relu7 I0608 
10:32:38.153200 6131 net.cpp:127] Top shape: 10 4096 (40960) I0608 10:32:38.153206 6131 layer_factory.hpp:74] Creating layer drop7 I0608 10:32:38.153214 6131 net.cpp:90] Creating Layer drop7 I0608 10:32:38.153219 6131 net.cpp:410] drop7 <- fc7 I0608 10:32:38.153224 6131 net.cpp:357] drop7 -> fc7 (in-place) I0608 10:32:38.153231 6131 net.cpp:120] Setting up drop7 I0608 10:32:38.153237 6131 net.cpp:127] Top shape: 10 4096 (40960) I0608 10:32:38.153242 6131 layer_factory.hpp:74] Creating layer fc-rcnn I0608 10:32:38.153249 6131 net.cpp:90] Creating Layer fc-rcnn I0608 10:32:38.153254 6131 net.cpp:410] fc-rcnn <- fc7 I0608 10:32:38.153259 6131 net.cpp:368] fc-rcnn -> fc-rcnn I0608 10:32:38.153267 6131 net.cpp:120] Setting up fc-rcnn I0608 10:32:38.154058 6131 net.cpp:127] Top shape: 10 200 (2000) I0608 10:32:38.154080 6131 net.cpp:194] fc-rcnn does not need backward computation. I0608 10:32:38.154085 6131 net.cpp:194] drop7 does not need backward computation. I0608 10:32:38.154090 6131 net.cpp:194] relu7 does not need backward computation. I0608 10:32:38.154095 6131 net.cpp:194] fc7 does not need backward computation. I0608 10:32:38.154100 6131 net.cpp:194] drop6 does not need backward computation. I0608 10:32:38.154105 6131 net.cpp:194] relu6 does not need backward computation. I0608 10:32:38.154110 6131 net.cpp:194] fc6 does not need backward computation. I0608 10:32:38.154115 6131 net.cpp:194] pool5 does not need backward computation. I0608 10:32:38.154129 6131 net.cpp:194] relu5 does not need backward computation. I0608 10:32:38.154134 6131 net.cpp:194] conv5 does not need backward computation. I0608 10:32:38.154139 6131 net.cpp:194] relu4 does not need backward computation. I0608 10:32:38.154145 6131 net.cpp:194] conv4 does not need backward computation. I0608 10:32:38.154150 6131 net.cpp:194] relu3 does not need backward computation. I0608 10:32:38.154155 6131 net.cpp:194] conv3 does not need backward computation. 
I0608 10:32:38.154160 6131 net.cpp:194] norm2 does not need backward computation. I0608 10:32:38.154165 6131 net.cpp:194] pool2 does not need backward computation. I0608 10:32:38.154170 6131 net.cpp:194] relu2 does not need backward computation. I0608 10:32:38.154175 6131 net.cpp:194] conv2 does not need backward computation. I0608 10:32:38.154180 6131 net.cpp:194] norm1 does not need backward computation. I0608 10:32:38.154193 6131 net.cpp:194] pool1 does not need backward computation. I0608 10:32:38.154198 6131 net.cpp:194] relu1 does not need backward computation. I0608 10:32:38.154203 6131 net.cpp:194] conv1 does not need backward computation. I0608 10:32:38.154208 6131 net.cpp:235] This network produces output fc-rcnn I0608 10:32:38.154220 6131 net.cpp:482] Collecting Learning Rate and Weight Decay. I0608 10:32:38.154227 6131 net.cpp:247] Network initialization done. I0608 10:32:38.154232 6131 net.cpp:248] Memory required for data: 62425920 E0608 10:32:38.221285 6131 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel I0608 10:32:38.324671 6131 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter Loading input... selective_search_rcnn({'/home/ouxinyu/caffe-master/examples/images/fish-bike.jpg'}, '/tmp/tmpu85WGa.mat') Processed 1570 windows in 17.131 s. /usr/lib/python2.7/dist-packages/pandas/io/pytables.py:2487: PerformanceWarning: your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block1_values] [items->['prediction']] warnings.warn(ws, PerformanceWarning) Saved to _temp/det_output.h5 in 0.025 s.
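For scripted runs, the same command line can be assembled in Python. This is only a sketch: build_detect_command is a hypothetical helper, and it uses nothing beyond the flags and paths that appear in the command above. It makes the GPU/CPU switch explicit by adding or omitting --gpu.

```python
def build_detect_command(input_list, output_h5, use_gpu=True,
                         model_dir='models/bvlc_reference_rcnn_ilsvrc13'):
    """Hypothetical helper: assemble the detect.py call shown above as an argument list."""
    cmd = [
        'python', 'python/detect.py',
        '--crop_mode=selective_search',
        '--pretrained_model={}/bvlc_reference_rcnn_ilsvrc13.caffemodel'.format(model_dir),
        '--model_def={}/deploy.prototxt'.format(model_dir),
        '--raw_scale=255',
    ]
    if use_gpu:
        cmd.append('--gpu')  # omit this flag to run on the CPU
    cmd.extend([input_list, output_h5])
    return cmd


cpu_cmd = build_detect_command('_temp/det_input.txt', '_temp/det_output.h5', use_gpu=False)
print(' '.join(cpu_cmd))
```

The resulting list could be passed to subprocess.check_call from the Caffe root directory.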
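The rest runs without problems once the paths are adjusted, so the explanations below are quoted directly from the original notebook.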
Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
(1570, 5)
prediction    [-2.64134, -2.90464, -2.84325, -3.23465, -1.97...
ymin                                                     79.846
xmin                                                       9.62
ymax                                                     246.31
xmax                                                    339.624
Name: /home/ouxinyu/caffe-master/examples/images/fish-bike.jpg, dtype: object
1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size -- selective search isn't scale invariant.
In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list an image per line in the images_file, and it will process all of them.
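As a rough illustration of the one-image-per-line input file, the sketch below writes such a list from Python. write_images_file is a hypothetical helper, and the second file name is a placeholder; only fish-bike.jpg is used in this walkthrough. The list is written to a temporary directory so that the example has no side effects.

```python
import os
import tempfile


def write_images_file(image_paths, list_path):
    """Hypothetical helper: write one image path per line."""
    with open(list_path, 'w') as f:
        for path in image_paths:
            f.write(path + '\n')
    return list_path


# Placeholder file names; substitute your own images.
images = ['examples/images/fish-bike.jpg', 'examples/images/another-image.jpg']
list_path = write_images_file(images, os.path.join(tempfile.mkdtemp(), 'det_input.txt'))

with open(list_path) as f:
    lines = f.read().splitlines()
print(lines)
```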
Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.
Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.
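with open('data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
# DataFrame.sort() was removed from pandas; sort_values() replaces it (pandas >= 0.17).
# As in the original, the result is not assigned, so labels_df keeps the file's line order.
labels_df.sort_values('synset_id')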
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
name
accordion       -2.641338
airplane        -2.904639
ant             -2.843245
antelope        -3.234649
apple           -1.976960
armadillo       -2.488007
artichoke       -2.218568
axe             -2.338795
baby bed        -2.755479
backpack        -2.180768
bagel           -2.697270
balance beam    -2.780527
banana          -2.433329
band aid        -1.631823
banjo           -2.317316
...
trombone        -2.587927
trumpet         -2.396858
turtle          -2.376043
tv or monitor   -2.763605
unicycle        -2.254395
vacuum          -1.918464
violin          -2.746913
volleyball      -2.758842
waffle iron     -2.421376
washer          -2.415665
water bottle    -2.175697
watercraft      -2.949454
whale           -3.157514
wine bottle     -2.790261
zebra           -2.768192
Name: 0, Length: 200, dtype: float32
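The label parsing above is compact, so here is the same split logic applied to two sample lines. This is a sketch: parse_label_line is a hypothetical helper, and the sample lines (including their ids) are made up for illustration rather than copied from det_synset_words.txt. It shows that multi-word names survive and that anything after the first comma is discarded.

```python
def parse_label_line(line):
    """Split one 'synset_id name[, alias, ...]' line the same way as the loop above."""
    parts = line.strip().split(' ')
    synset_id = parts[0]
    name = ' '.join(parts[1:]).split(',')[0]
    return synset_id, name


# Illustrative lines with placeholder ids.
sample_lines = [
    'n00000001 baby bed\n',
    'n00000002 tv or monitor, television\n',
]
parsed = [parse_label_line(l) for l in sample_lines]
print(parsed)
```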
Let's look at the activations.
plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')
Now let's take max across all windows and plot the top classes.
max_s = predictions_df.max(0)
max_s = max_s.sort_values(ascending=False)  # Series.sort() was removed from pandas
print(max_s[:10])
name
person          1.839884
bicycle         0.855625
unicycle        0.068060
motorcycle      0.003604
banjo          -0.001440
turtle         -0.030387
electric fan   -0.220595
cart           -0.225192
lizard         -0.365948
helmet         -0.477555
dtype: float32
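To make the "max across all windows" step concrete, here is a toy version with three windows and three classes. The scores are invented for illustration. max(axis=0) gives the best score per class, and idxmax(axis=0) additionally reports which window produced it.

```python
import numpy as np
import pandas as pd

# Rows are windows, columns are classes (toy scores, not real model output).
toy = pd.DataFrame(
    np.array([[ 1.8, -0.5, -2.0],
              [-1.0,  0.9, -1.5],
              [-2.2, -0.3, -0.1]]),
    columns=['person', 'bicycle', 'banjo'])

best_score = toy.max(axis=0).sort_values(ascending=False)  # best score per class
best_window = toy.idxmax(axis=0)                           # row index of that score

print(best_score)
print(best_window)
```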
The top detections are in fact a person and bicycle. Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.
# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()
# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.sort_values(ascending=False)[:5])  # Series.order() was removed from pandas
print('')
# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.sort_values(ascending=False)[:5])  # Series.order() was removed from pandas
# Show the top detection in red and the second-best detection in blue.
im = plt.imread('examples/images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()
det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))
det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Top detection:
name
person             1.839884
swimming trunks   -1.157806
turtle            -1.168884
tie               -1.217268
rubber eraser     -1.246662
dtype: float32

Second-best detection:
name
bicycle     0.855625
unicycle   -0.334367
scorpion   -0.824552
lobster    -0.965544
lamp       -1.076225
dtype: float32
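The plotting code converts each window from corner coordinates to the (origin, width, height) form that plt.Rectangle takes. A small sketch of that conversion, using the coordinates of the first proposal printed earlier (df.iloc[0]); window_to_rectangle is a hypothetical helper name.

```python
def window_to_rectangle(det):
    """Hypothetical helper: convert corner coordinates to ((x, y), width, height)."""
    return (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']


# Coordinates of the first proposal shown in the df.iloc[0] output.
first_window = {'ymin': 79.846, 'xmin': 9.62, 'ymax': 246.31, 'xmax': 339.624}
origin, width, height = window_to_rectangle(first_window)
print(origin, round(width, 3), round(height, 3))
```

The returned tuple can be unpacked directly, as in plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5).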
That's cool. Let's take all 'bicycle' detections and NMS them to get rid of overlapping windows.
def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        minimum overlap ratio (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]
scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores.values[:, np.newaxis]))  # index the NumPy array; recent pandas rejects Series[:, np.newaxis]
nms_dets = nms_detections(dets)
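The overlap measure inside nms_detections is intersection over union. As a worked example, the sketch below computes it for one kept box against a heavily overlapping box and a barely overlapping one. The box coordinates are invented, and iou is a standalone helper written for this illustration, using the same formula as the function above.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


kept = (0, 0, 10, 10)
near = (1, 1, 11, 11)  # intersection 81, union 119: about 0.68, suppressed at overlap=0.3
far = (8, 8, 18, 18)   # intersection 4, union 196: about 0.02, survives

for name, box in [('near', near), ('far', far)]:
    score = iou(kept, box)
    print(name, round(score, 3), 'suppressed' if score > 0.3 else 'kept')
```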
Show top 3 NMS'd detections for 'bicycle' in the image and note the gap between the top scoring box (red) and the remaining boxes.
plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
        fill=False, edgecolor=c, linewidth=5)
    )
print('scores: {}'.format(nms_dets[:3, 4]))
scores: [ 0.85562468 -0.73134422 -1.33959854]
This was an easy instance for bicycle as it was in the class's training set. However, the person result is a true detection since this was not in the set for that class.
You should try out detection on an image of your own next!
(Remove the temp directory to clean up, and we're done.)
!rm -rf _temp