This example has fallen into disrepair and no longer runs. People online recommend looking at rbg's faster-rcnn code instead.
R-CNN is a state-of-the-art detector that classifies region proposals with a fine-tuned Caffe model. For the full details of the R-CNN system and model, refer to its project site and the paper:
Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.
In this example, we do detection by a pure Caffe edition of the R-CNN model for ImageNet. The R-CNN detector outputs class scores for the 200 detection classes of ILSVRC13. Keep in mind that these are raw one vs. all SVM scores, so they are not probabilistically calibrated or exactly comparable across classes. Note that this off-the-shelf model is simply for convenience, and is not the full R-CNN model.
Let's run detection on an image of a bicyclist riding a fish bike in the desert (from the ImageNet challenge—no joke).
First, we'll need region proposals and the Caffe R-CNN ImageNet model:
Selective Search is the region proposer used by R-CNN. The selective_search_ijcv_with_python Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory selective_search_ijcv_with_python, run the demo in MATLAB to compile the necessary functions, then add it to your PYTHONPATH for importing. (If you have your own region proposals prepared, or would rather not bother with this step, detect.py accepts a list of images and bounding boxes as CSV.)
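If you go the CSV route, the file can be written with Python's standard csv module. The sketch below assumes the column names filename, ymin, xmin, ymax, xmax (the coordinate names that appear in the output DataFrame later in this example) and uses made-up box coordinates; check ./detect.py --help and the script source for the exact format your version expects.

```python
import csv
import os
import tempfile

# Hypothetical hand-made proposals: (filename, ymin, xmin, ymax, xmax).
# The column names are an assumption; verify them against your detect.py.
windows = [
    ("images/fish-bike.jpg", 0, 0, 200, 300),
    ("images/fish-bike.jpg", 50, 40, 240, 330),
]

csv_path = os.path.join(tempfile.mkdtemp(), "det_input.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "ymin", "xmin", "ymax", "xmax"])
    writer.writerows(windows)

# Print the file back to confirm its layout: one header row, one row per window.
with open(csv_path) as f:
    print(f.read())
```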
Run ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13 to get the Caffe R-CNN ImageNet model.
With that done, we'll call the bundled detect.py to generate the region proposals and run the network. For an explanation of the arguments, do ./detect.py --help.
import os
CURDIR = os.path.realpath(os.getcwd())
print(CURDIR)
import sys
sys.path.insert(0, '/home/tzx/dev/caffe-rc3')
!mkdir -p _temp
!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt
/home/tzx/dev/caffe-rc3/examples
import selective_search_ijcv_with_python
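# `!export` only affects its own subshell, so set PYTHONPATH through os.environ for the `!` commands that follow.
os.environ['PYTHONPATH'] = '/home/tzx/dev/caffe-rc3/python:/home/tzx/dev/caffe-rc3:' + os.environ.get('PYTHONPATH', '')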
!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
GPU mode WARNING: Logging before InitGoogleLogging() is written to STDERR I0621 16:58:20.702862 9538 net.cpp:49] Initializing net from parameters: name: "R-CNN-ilsvrc13" input: "data" state { phase: TEST } input_shape { dim: 10 dim: 3 dim: 227 dim: 227 } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" convolution_param { num_output: 96 kernel_size: 11 stride: 4 } } layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "norm1" type: "LRN" bottom: "pool1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "conv2" type: "Convolution" bottom: "norm1" top: "conv2" convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 } } layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "norm2" type: "LRN" bottom: "pool2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "conv3" type: "Convolution" bottom: "norm2" top: "conv3" convolution_param { num_output: 384 pad: 1 kernel_size: 3 } } layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" } layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 } } layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" } layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 } } layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" } layer { name: "pool5" type: "Pooling" bottom: "conv5" top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" inner_product_param { num_output: 4096 } } layer { 
name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" } layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" inner_product_param { num_output: 4096 } } layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" } layer { name: "drop7" type: "Dropout" bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc-rcnn" type: "InnerProduct" bottom: "fc7" top: "fc-rcnn" inner_product_param { num_output: 200 } } I0621 16:58:20.703330 9538 net.cpp:413] Input 0 -> data I0621 16:58:20.717985 9538 layer_factory.hpp:77] Creating layer conv1 I0621 16:58:20.718034 9538 net.cpp:106] Creating Layer conv1 I0621 16:58:20.718042 9538 net.cpp:454] conv1 <- data I0621 16:58:20.718052 9538 net.cpp:411] conv1 -> conv1 I0621 16:58:20.818161 9538 net.cpp:150] Setting up conv1 I0621 16:58:20.818195 9538 net.cpp:157] Top shape: 10 96 55 55 (2904000) I0621 16:58:20.818202 9538 net.cpp:165] Memory required for data: 11616000 I0621 16:58:20.818222 9538 layer_factory.hpp:77] Creating layer relu1 I0621 16:58:20.818238 9538 net.cpp:106] Creating Layer relu1 I0621 16:58:20.818244 9538 net.cpp:454] relu1 <- conv1 I0621 16:58:20.818251 9538 net.cpp:397] relu1 -> conv1 (in-place) I0621 16:58:20.818477 9538 net.cpp:150] Setting up relu1 I0621 16:58:20.818492 9538 net.cpp:157] Top shape: 10 96 55 55 (2904000) I0621 16:58:20.818497 9538 net.cpp:165] Memory required for data: 23232000 I0621 16:58:20.818502 9538 layer_factory.hpp:77] Creating layer pool1 I0621 16:58:20.818514 9538 net.cpp:106] Creating Layer pool1 I0621 16:58:20.818521 9538 net.cpp:454] pool1 <- conv1 I0621 16:58:20.818527 9538 net.cpp:411] pool1 -> pool1 I0621 16:58:20.818568 9538 net.cpp:150] Setting up pool1 I0621 16:58:20.818578 9538 net.cpp:157] Top shape: 10 96 27 27 (699840) I0621 16:58:20.818583 9538 net.cpp:165] Memory required for data: 26031360 I0621 16:58:20.818588 9538 layer_factory.hpp:77] 
Creating layer norm1 I0621 16:58:20.818598 9538 net.cpp:106] Creating Layer norm1 I0621 16:58:20.818603 9538 net.cpp:454] norm1 <- pool1 I0621 16:58:20.818608 9538 net.cpp:411] norm1 -> norm1 I0621 16:58:20.818758 9538 net.cpp:150] Setting up norm1 I0621 16:58:20.818771 9538 net.cpp:157] Top shape: 10 96 27 27 (699840) I0621 16:58:20.818776 9538 net.cpp:165] Memory required for data: 28830720 I0621 16:58:20.818781 9538 layer_factory.hpp:77] Creating layer conv2 I0621 16:58:20.818794 9538 net.cpp:106] Creating Layer conv2 I0621 16:58:20.818799 9538 net.cpp:454] conv2 <- norm1 I0621 16:58:20.818805 9538 net.cpp:411] conv2 -> conv2 I0621 16:58:20.820828 9538 net.cpp:150] Setting up conv2 I0621 16:58:20.820847 9538 net.cpp:157] Top shape: 10 256 27 27 (1866240) I0621 16:58:20.820852 9538 net.cpp:165] Memory required for data: 36295680 I0621 16:58:20.820863 9538 layer_factory.hpp:77] Creating layer relu2 I0621 16:58:20.820870 9538 net.cpp:106] Creating Layer relu2 I0621 16:58:20.820876 9538 net.cpp:454] relu2 <- conv2 I0621 16:58:20.820883 9538 net.cpp:397] relu2 -> conv2 (in-place) I0621 16:58:20.821020 9538 net.cpp:150] Setting up relu2 I0621 16:58:20.821033 9538 net.cpp:157] Top shape: 10 256 27 27 (1866240) I0621 16:58:20.821038 9538 net.cpp:165] Memory required for data: 43760640 I0621 16:58:20.821043 9538 layer_factory.hpp:77] Creating layer pool2 I0621 16:58:20.821050 9538 net.cpp:106] Creating Layer pool2 I0621 16:58:20.821055 9538 net.cpp:454] pool2 <- conv2 I0621 16:58:20.821061 9538 net.cpp:411] pool2 -> pool2 I0621 16:58:20.821097 9538 net.cpp:150] Setting up pool2 I0621 16:58:20.821105 9538 net.cpp:157] Top shape: 10 256 13 13 (432640) I0621 16:58:20.821110 9538 net.cpp:165] Memory required for data: 45491200 I0621 16:58:20.821115 9538 layer_factory.hpp:77] Creating layer norm2 I0621 16:58:20.821123 9538 net.cpp:106] Creating Layer norm2 I0621 16:58:20.821130 9538 net.cpp:454] norm2 <- pool2 I0621 16:58:20.821135 9538 net.cpp:411] norm2 -> norm2 I0621 
16:58:20.821372 9538 net.cpp:150] Setting up norm2 I0621 16:58:20.821388 9538 net.cpp:157] Top shape: 10 256 13 13 (432640) I0621 16:58:20.821393 9538 net.cpp:165] Memory required for data: 47221760 I0621 16:58:20.821398 9538 layer_factory.hpp:77] Creating layer conv3 I0621 16:58:20.821409 9538 net.cpp:106] Creating Layer conv3 I0621 16:58:20.821415 9538 net.cpp:454] conv3 <- norm2 I0621 16:58:20.821422 9538 net.cpp:411] conv3 -> conv3 I0621 16:58:20.823508 9538 net.cpp:150] Setting up conv3 I0621 16:58:20.823526 9538 net.cpp:157] Top shape: 10 384 13 13 (648960) I0621 16:58:20.823532 9538 net.cpp:165] Memory required for data: 49817600 I0621 16:58:20.823544 9538 layer_factory.hpp:77] Creating layer relu3 I0621 16:58:20.823552 9538 net.cpp:106] Creating Layer relu3 I0621 16:58:20.823559 9538 net.cpp:454] relu3 <- conv3 I0621 16:58:20.823565 9538 net.cpp:397] relu3 -> conv3 (in-place) I0621 16:58:20.823699 9538 net.cpp:150] Setting up relu3 I0621 16:58:20.823711 9538 net.cpp:157] Top shape: 10 384 13 13 (648960) I0621 16:58:20.823716 9538 net.cpp:165] Memory required for data: 52413440 I0621 16:58:20.823721 9538 layer_factory.hpp:77] Creating layer conv4 I0621 16:58:20.823731 9538 net.cpp:106] Creating Layer conv4 I0621 16:58:20.823736 9538 net.cpp:454] conv4 <- conv3 I0621 16:58:20.823745 9538 net.cpp:411] conv4 -> conv4 I0621 16:58:20.825955 9538 net.cpp:150] Setting up conv4 I0621 16:58:20.825978 9538 net.cpp:157] Top shape: 10 384 13 13 (648960) I0621 16:58:20.825985 9538 net.cpp:165] Memory required for data: 55009280 I0621 16:58:20.825994 9538 layer_factory.hpp:77] Creating layer relu4 I0621 16:58:20.826002 9538 net.cpp:106] Creating Layer relu4 I0621 16:58:20.826009 9538 net.cpp:454] relu4 <- conv4 I0621 16:58:20.826014 9538 net.cpp:397] relu4 -> conv4 (in-place) I0621 16:58:20.826148 9538 net.cpp:150] Setting up relu4 I0621 16:58:20.826159 9538 net.cpp:157] Top shape: 10 384 13 13 (648960) I0621 16:58:20.826164 9538 net.cpp:165] Memory required for data: 
57605120 I0621 16:58:20.826169 9538 layer_factory.hpp:77] Creating layer conv5 I0621 16:58:20.826179 9538 net.cpp:106] Creating Layer conv5 I0621 16:58:20.826184 9538 net.cpp:454] conv5 <- conv4 I0621 16:58:20.826191 9538 net.cpp:411] conv5 -> conv5 I0621 16:58:20.828217 9538 net.cpp:150] Setting up conv5 I0621 16:58:20.828235 9538 net.cpp:157] Top shape: 10 256 13 13 (432640) I0621 16:58:20.828241 9538 net.cpp:165] Memory required for data: 59335680 I0621 16:58:20.828251 9538 layer_factory.hpp:77] Creating layer relu5 I0621 16:58:20.828259 9538 net.cpp:106] Creating Layer relu5 I0621 16:58:20.828265 9538 net.cpp:454] relu5 <- conv5 I0621 16:58:20.828272 9538 net.cpp:397] relu5 -> conv5 (in-place) I0621 16:58:20.828408 9538 net.cpp:150] Setting up relu5 I0621 16:58:20.828419 9538 net.cpp:157] Top shape: 10 256 13 13 (432640) I0621 16:58:20.828424 9538 net.cpp:165] Memory required for data: 61066240 I0621 16:58:20.828429 9538 layer_factory.hpp:77] Creating layer pool5 I0621 16:58:20.828438 9538 net.cpp:106] Creating Layer pool5 I0621 16:58:20.828444 9538 net.cpp:454] pool5 <- conv5 I0621 16:58:20.828451 9538 net.cpp:411] pool5 -> pool5 I0621 16:58:20.828486 9538 net.cpp:150] Setting up pool5 I0621 16:58:20.828495 9538 net.cpp:157] Top shape: 10 256 6 6 (92160) I0621 16:58:20.828500 9538 net.cpp:165] Memory required for data: 61434880 I0621 16:58:20.828505 9538 layer_factory.hpp:77] Creating layer fc6 I0621 16:58:20.828517 9538 net.cpp:106] Creating Layer fc6 I0621 16:58:20.828522 9538 net.cpp:454] fc6 <- pool5 I0621 16:58:20.828528 9538 net.cpp:411] fc6 -> fc6 I0621 16:58:20.889945 9538 net.cpp:150] Setting up fc6 I0621 16:58:20.889986 9538 net.cpp:157] Top shape: 10 4096 (40960) I0621 16:58:20.889993 9538 net.cpp:165] Memory required for data: 61598720 I0621 16:58:20.890004 9538 layer_factory.hpp:77] Creating layer relu6 I0621 16:58:20.890018 9538 net.cpp:106] Creating Layer relu6 I0621 16:58:20.890027 9538 net.cpp:454] relu6 <- fc6 I0621 16:58:20.890034 9538 
net.cpp:397] relu6 -> fc6 (in-place) I0621 16:58:20.890375 9538 net.cpp:150] Setting up relu6 I0621 16:58:20.890391 9538 net.cpp:157] Top shape: 10 4096 (40960) I0621 16:58:20.890398 9538 net.cpp:165] Memory required for data: 61762560 I0621 16:58:20.890403 9538 layer_factory.hpp:77] Creating layer drop6 I0621 16:58:20.890414 9538 net.cpp:106] Creating Layer drop6 I0621 16:58:20.890420 9538 net.cpp:454] drop6 <- fc6 I0621 16:58:20.890427 9538 net.cpp:397] drop6 -> fc6 (in-place) I0621 16:58:20.890455 9538 net.cpp:150] Setting up drop6 I0621 16:58:20.890462 9538 net.cpp:157] Top shape: 10 4096 (40960) I0621 16:58:20.890467 9538 net.cpp:165] Memory required for data: 61926400 I0621 16:58:20.890472 9538 layer_factory.hpp:77] Creating layer fc7 I0621 16:58:20.890482 9538 net.cpp:106] Creating Layer fc7 I0621 16:58:20.890487 9538 net.cpp:454] fc7 <- fc6 I0621 16:58:20.890496 9538 net.cpp:411] fc7 -> fc7 I0621 16:58:20.917780 9538 net.cpp:150] Setting up fc7 I0621 16:58:20.917819 9538 net.cpp:157] Top shape: 10 4096 (40960) I0621 16:58:20.917824 9538 net.cpp:165] Memory required for data: 62090240 I0621 16:58:20.917836 9538 layer_factory.hpp:77] Creating layer relu7 I0621 16:58:20.917847 9538 net.cpp:106] Creating Layer relu7 I0621 16:58:20.917855 9538 net.cpp:454] relu7 <- fc7 I0621 16:58:20.917862 9538 net.cpp:397] relu7 -> fc7 (in-place) I0621 16:58:20.918051 9538 net.cpp:150] Setting up relu7 I0621 16:58:20.918063 9538 net.cpp:157] Top shape: 10 4096 (40960) I0621 16:58:20.918069 9538 net.cpp:165] Memory required for data: 62254080 I0621 16:58:20.918074 9538 layer_factory.hpp:77] Creating layer drop7 I0621 16:58:20.918086 9538 net.cpp:106] Creating Layer drop7 I0621 16:58:20.918090 9538 net.cpp:454] drop7 <- fc7 I0621 16:58:20.918097 9538 net.cpp:397] drop7 -> fc7 (in-place) I0621 16:58:20.918121 9538 net.cpp:150] Setting up drop7 I0621 16:58:20.918129 9538 net.cpp:157] Top shape: 10 4096 (40960) I0621 16:58:20.918134 9538 net.cpp:165] Memory required for data: 
62417920 I0621 16:58:20.918138 9538 layer_factory.hpp:77] Creating layer fc-rcnn I0621 16:58:20.918148 9538 net.cpp:106] Creating Layer fc-rcnn I0621 16:58:20.918154 9538 net.cpp:454] fc-rcnn <- fc7 I0621 16:58:20.918160 9538 net.cpp:411] fc-rcnn -> fc-rcnn I0621 16:58:20.919497 9538 net.cpp:150] Setting up fc-rcnn I0621 16:58:20.919513 9538 net.cpp:157] Top shape: 10 200 (2000) I0621 16:58:20.919518 9538 net.cpp:165] Memory required for data: 62425920 I0621 16:58:20.919528 9538 net.cpp:228] fc-rcnn does not need backward computation. I0621 16:58:20.919533 9538 net.cpp:228] drop7 does not need backward computation. I0621 16:58:20.919538 9538 net.cpp:228] relu7 does not need backward computation. I0621 16:58:20.919543 9538 net.cpp:228] fc7 does not need backward computation. I0621 16:58:20.919548 9538 net.cpp:228] drop6 does not need backward computation. I0621 16:58:20.919553 9538 net.cpp:228] relu6 does not need backward computation. I0621 16:58:20.919559 9538 net.cpp:228] fc6 does not need backward computation. I0621 16:58:20.919564 9538 net.cpp:228] pool5 does not need backward computation. I0621 16:58:20.919569 9538 net.cpp:228] relu5 does not need backward computation. I0621 16:58:20.919574 9538 net.cpp:228] conv5 does not need backward computation. I0621 16:58:20.919579 9538 net.cpp:228] relu4 does not need backward computation. I0621 16:58:20.919584 9538 net.cpp:228] conv4 does not need backward computation. I0621 16:58:20.919589 9538 net.cpp:228] relu3 does not need backward computation. I0621 16:58:20.919595 9538 net.cpp:228] conv3 does not need backward computation. I0621 16:58:20.919600 9538 net.cpp:228] norm2 does not need backward computation. I0621 16:58:20.919605 9538 net.cpp:228] pool2 does not need backward computation. I0621 16:58:20.919610 9538 net.cpp:228] relu2 does not need backward computation. I0621 16:58:20.919615 9538 net.cpp:228] conv2 does not need backward computation. 
I0621 16:58:20.919620 9538 net.cpp:228] norm1 does not need backward computation.
I0621 16:58:20.919625 9538 net.cpp:228] pool1 does not need backward computation.
I0621 16:58:20.919630 9538 net.cpp:228] relu1 does not need backward computation.
I0621 16:58:20.919634 9538 net.cpp:228] conv1 does not need backward computation.
I0621 16:58:20.919639 9538 net.cpp:270] This network produces output fc-rcnn
I0621 16:58:20.919652 9538 net.cpp:283] Network initialization done.
I0621 16:58:21.193883 9538 upgrade_proto.cpp:51] Attempting to upgrade input file specified using deprecated V1LayerParameter: ../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel
I0621 16:58:21.310328 9538 upgrade_proto.cpp:59] Successfully upgraded file specified using deprecated V1LayerParameter
Loading input...
selective_search_rcnn({'/home/tzx/dev/caffe-rc3/examples/images/fish-bike.jpg'}, '/tmp/tmpzKKXgc.mat')
No protocol specified
/home/tzx/dev/caffe-rc3/python/caffe/detector.py:140: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  crop = im[window[0]:window[2], window[1]:window[3]]
/home/tzx/dev/caffe-rc3/python/caffe/detector.py:174: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  context_crop = im[box[0]:box[2], box[1]:box[3]]
/home/tzx/dev/caffe-rc3/python/caffe/detector.py:177: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  crop[pad_y:(pad_y + crop_h), pad_x:(pad_x + crop_w)] = context_crop
Traceback (most recent call last):
  File "../python/detect.py", line 173, in <module>
    main(sys.argv)
  File "../python/detect.py", line 144, in main
    detections = detector.detect_selective_search(inputs)
  File "/home/tzx/dev/caffe-rc3/python/caffe/detector.py", line 123, in detect_selective_search
    return self.detect_windows(zip(image_fnames, windows_list))
  File "/home/tzx/dev/caffe-rc3/python/caffe/detector.py", line 86, in detect_windows
    predictions = out[self.outputs[0]].squeeze(axis=(2, 3))
ValueError: 'axis' entry 2 is out of bounds [-2, 2)
This run was in GPU mode. For CPU mode detection, call detect.py without the --gpu argument.
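The traceback in the output above points at squeeze(axis=(2, 3)) in detector.py. The log reports the fc-rcnn top shape as 10 200, a 2-D blob, so axes 2 and 3 do not exist. The squeeze call suggests the code was written for 4-D output of shape (N, C, 1, 1). The sketch below reproduces the error on a dummy array of the logged shape and shows one possible workaround, flattening everything after the batch axis with reshape. The helper name flatten_predictions is hypothetical, and this is an illustration under those assumptions rather than the upstream fix.

```python
import numpy as np

# Dummy stand-in for the network output: 10 windows x 200 classes,
# matching the "Top shape: 10 200" line in the log.
out_2d = np.zeros((10, 200), dtype=np.float32)

try:
    out_2d.squeeze(axis=(2, 3))
    squeeze_failed = False
except ValueError as err:
    # A 2-D array has no axes 2 and 3, so NumPy rejects the call.
    squeeze_failed = True
    print("squeeze failed:", err)

# The shape the squeeze call appears to expect: (N, C, 1, 1).
out_4d = np.zeros((10, 200, 1, 1), dtype=np.float32)


def flatten_predictions(blob):
    """Collapse every axis after the batch axis; accepts 2-D and 4-D blobs."""
    return blob.reshape(blob.shape[0], -1)


print(flatten_predictions(out_2d).shape)
print(flatten_predictions(out_4d).shape)
```

Both calls yield one row of class scores per window, which is the layout the rest of this example works with.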
Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
(1570, 5)
prediction    [-2.62247, -2.84579, -2.85122, -3.20838, -1.94...
ymin          79.846
xmin          9.62
ymax          246.31
xmax          339.624
Name: /Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg, dtype: object
1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size -- selective search isn't scale invariant.
In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list an image per line in the images_file, and it will process all of them.
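As a minimal sketch of preparing such a list from Python, the snippet below writes one absolute path per line, mirroring the single-image det_input.txt created earlier. The image paths are hypothetical placeholders; substitute your own files.

```python
import os
import tempfile

# Hypothetical image paths; replace these with the images you want to process.
image_paths = ["images/fish-bike.jpg", "images/another-image.jpg"]

list_dir = tempfile.mkdtemp()
images_file = os.path.join(list_dir, "det_input.txt")

# One absolute path per line, as in the single-image input file above.
with open(images_file, "w") as f:
    for path in image_paths:
        f.write(os.path.abspath(path) + "\n")

with open(images_file) as f:
    listed = [line.strip() for line in f]
print(listed)
```

The resulting file can then be passed to detect.py in place of _temp/det_input.txt.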
Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.
Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.
with open('../data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
# Keep labels_df in file order: the names are used as column labels for the
# model's 200 output scores, so reordering them could misalign the columns.
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
name
accordion       -2.622471
airplane        -2.845788
ant             -2.851219
antelope        -3.208377
apple           -1.949950
armadillo       -2.472935
artichoke       -2.201684
axe             -2.327404
baby bed        -2.737925
backpack        -2.176763
bagel           -2.681061
balance beam    -2.722538
banana          -2.390628
band aid        -1.598909
banjo           -2.298197
...
trombone        -2.582361
trumpet         -2.352853
turtle          -2.360859
tv or monitor   -2.761043
unicycle        -2.218467
vacuum          -1.907717
violin          -2.757079
volleyball      -2.723689
waffle iron     -2.418540
washer          -2.408994
water bottle    -2.174899
watercraft      -2.837425
whale           -3.120338
wine bottle     -2.772960
zebra           -2.742913
Name: 0, Length: 200, dtype: float32
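The parsing done in the list comprehension above can be checked on individual lines. The sketch below pulls that logic into a hypothetical helper, parse_synset_line, and runs it on made-up sample lines in the "synset_id name, alias" layout the code expects; the IDs are placeholders, not real synsets.

```python
def parse_synset_line(line):
    """Split a 'synset_id name[, alias...]' line into an ID and its first name."""
    parts = line.strip().split(' ')
    return {
        'synset_id': parts[0],
        # Re-join the remaining words, then keep only the text before the first comma.
        'name': ' '.join(parts[1:]).split(',')[0],
    }


# Made-up sample lines; the IDs are placeholders.
single = parse_synset_line("n00000001 tv or monitor\n")
aliased = parse_synset_line("n00000002 first name, second name\n")
print(single)
print(aliased)
```

Multi-word names survive intact, and only the first of several comma-separated aliases is kept.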
Let's look at the activations.
plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')
[Figure: matrix plot of detection scores, with classes on the x-axis and windows on the y-axis]
Now let's take max across all windows and plot the top classes.
max_s = predictions_df.max(0)
max_s = max_s.sort_values(ascending=False)
print(max_s[:10])
name
person          1.835771
bicycle         0.866110
unicycle        0.057080
motorcycle     -0.006122
banjo          -0.028209
turtle         -0.189831
electric fan   -0.206788
cart           -0.214235
lizard         -0.393519
helmet         -0.477942
dtype: float32
The top detections are in fact a person and bicycle. Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.
# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()
# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.sort_values(ascending=False)[:5])
print('')
# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.sort_values(ascending=False)[:5])
# Show top detection in red, second-best detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()
det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))
det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Top detection:
name
person             1.835771
swimming trunks   -1.150371
rubber eraser     -1.231106
turtle            -1.266037
plastic bag       -1.303265
dtype: float32

Second-best detection:
name
bicycle     0.866110
unicycle   -0.359139
scorpion   -0.811621
lobster    -0.982891
lamp       -1.096808
dtype: float32
[Figure: the fish-bike image with the top person detection outlined in red and the top bicycle detection outlined in blue]
That's cool. Let's take all 'bicycle' detections and NMS them to get rid of overlapping windows.
def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        overlap ratio (intersection over union) above which a
        lower-scoring detection is suppressed (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    # Ascending sort, so the highest-scoring detection is last.
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        # Keep the best remaining detection.
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        # Intersection of the kept box with every remaining box.
        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        # Drop the boxes that overlap the kept box too much.
        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]
scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
# Use the underlying array: recent pandas versions reject 2-D indexing on a Series.
dets = np.hstack((windows, scores.values[:, np.newaxis]))
nms_dets = nms_detections(dets)
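The overlap measure inside nms_detections is intersection over union (IoU): the intersection area divided by the sum of the two box areas minus the intersection. As a worked example on made-up boxes, the sketch below computes it for a single pair in plain Python; the helper name iou is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Width and height of the intersection, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Made-up boxes: b covers half of a, c does not touch a at all.
a = (0, 0, 10, 10)
b = (5, 0, 15, 10)
c = (20, 20, 30, 30)

print(iou(a, b))  # 50 / (100 + 100 - 50) = 1/3
print(iou(a, c))  # 0.0
```

With the default threshold of 0.3, box b would be suppressed if a scored higher, because 1/3 exceeds 0.3, while c would be kept.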
Show top 3 NMS'd detections for 'bicycle' in the image and note the gap between the top scoring box (red) and the remaining boxes.
plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
                      fill=False, edgecolor=c, linewidth=5)
    )
print('scores:', nms_dets[:3, 4])
scores: [ 0.86610985 -0.70051557 -1.34796357]
This was an easy instance for bicycle, since the image was in that class's training set. The person result, however, is a true detection, since the image was not in the training set for that class.
You should try out detection on an image of your own next!
(Remove the temp directory to clean up, and we're done.)
!rm -rf _temp