
Note

This example has not been maintained for years and no longer runs. The commonly recommended alternative is rbg's faster-rcnn code.

R-CNN is a state-of-the-art detector that classifies region proposals with a fine-tuned Caffe model. For the full details of the R-CNN system and model, refer to its project site and the paper:

Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. arXiv 2013.

In this example, we run detection with a pure Caffe edition of the R-CNN model for ImageNet. The R-CNN detector outputs class scores for the 200 detection classes of ILSVRC13. Keep in mind that these are raw one-vs-all SVM scores, so they are not probabilistically calibrated or exactly comparable across classes. Note that this off-the-shelf model is provided for convenience and is not the full R-CNN model.

Let's run detection on an image of a bicyclist riding a fish bike in the desert (from the ImageNet challenge, no joke).

First, we'll need region proposals and the Caffe R-CNN ImageNet model:

- Selective Search is the region proposer used by R-CNN. The selective_search_ijcv_with_python Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory selective_search_ijcv_with_python, run the demo in MATLAB to compile the necessary functions, then add it to your PYTHONPATH for importing. (If you have your own region proposals prepared, or would rather not bother with this step, detect.py accepts a list of images and bounding boxes as CSV.)
- Run ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13 to get the Caffe R-CNN ImageNet model.
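If you skip selective search, the windows can come from your own proposer and be handed to detect.py as a CSV. A minimal sketch of writing such a file, assuming detect.py's list crop mode (`--crop_mode=list`) reads a header row with the columns `filename, ymin, xmin, ymax, xmax` in pixel coordinates; confirm the exact column names against `./detect.py --help` and the script source before relying on it. The helper name `write_window_csv` is hypothetical.

```python
import csv

# Assumed column layout for detect.py's list mode: one row per window,
# naming the image file and the box corners in pixels.
COLUMNS = ["filename", "ymin", "xmin", "ymax", "xmax"]


def write_window_csv(path, windows_by_image):
    """Write precomputed windows to a CSV file (hypothetical helper).

    windows_by_image maps an image filename to a list of
    (ymin, xmin, ymax, xmax) boxes.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for filename, boxes in windows_by_image.items():
            for ymin, xmin, ymax, xmax in boxes:
                writer.writerow([filename, ymin, xmin, ymax, xmax])
```

For example, `write_window_csv('_temp/det_input.csv', {'/abs/path/fish-bike.jpg': [(10, 20, 200, 300)]})` writes one window; the `.csv` file would then replace the `.txt` input, with `--crop_mode=list` instead of `selective_search`.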

With that done, we'll call the bundled detect.py to generate the region proposals and run the network. For an explanation of the arguments, run ./detect.py --help.

In [1]:

import os
import sys

CURDIR = os.path.realpath(os.getcwd())
print(CURDIR)
# Caffe root on the original author's machine; adjust to your checkout.
sys.path.insert(0, '/home/tzx/dev/caffe-rc3')

!mkdir -p _temp
!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt

/home/tzx/dev/caffe-rc3/examples

In [2]:

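import selective_search_ijcv_with_python
# `!export` only affects a throwaway subshell, so set PYTHONPATH in this
# process instead; later `!` commands inherit os.environ.
os.environ['PYTHONPATH'] = ('/home/tzx/dev/caffe-rc3/python:/home/tzx/dev/caffe-rc3:'
                            + os.environ.get('PYTHONPATH', ''))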

In [5]:

!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5

GPU mode
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0621 16:58:20.702862  9538 net.cpp:49] Initializing net from parameters: 
name: "R-CNN-ilsvrc13"
input: "data"
state {
  phase: TEST
}
input_shape {
  dim: 10
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc-rcnn"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc-rcnn"
  inner_product_param {
    num_output: 200
  }
}
I0621 16:58:20.703330  9538 net.cpp:413] Input 0 -> data
I0621 16:58:20.717985  9538 layer_factory.hpp:77] Creating layer conv1
I0621 16:58:20.718034  9538 net.cpp:106] Creating Layer conv1
I0621 16:58:20.718042  9538 net.cpp:454] conv1 <- data
I0621 16:58:20.718052  9538 net.cpp:411] conv1 -> conv1
I0621 16:58:20.818161  9538 net.cpp:150] Setting up conv1
I0621 16:58:20.818195  9538 net.cpp:157] Top shape: 10 96 55 55 (2904000)
I0621 16:58:20.818202  9538 net.cpp:165] Memory required for data: 11616000
I0621 16:58:20.818222  9538 layer_factory.hpp:77] Creating layer relu1
I0621 16:58:20.818238  9538 net.cpp:106] Creating Layer relu1
I0621 16:58:20.818244  9538 net.cpp:454] relu1 <- conv1
I0621 16:58:20.818251  9538 net.cpp:397] relu1 -> conv1 (in-place)
I0621 16:58:20.818477  9538 net.cpp:150] Setting up relu1
I0621 16:58:20.818492  9538 net.cpp:157] Top shape: 10 96 55 55 (2904000)
I0621 16:58:20.818497  9538 net.cpp:165] Memory required for data: 23232000
I0621 16:58:20.818502  9538 layer_factory.hpp:77] Creating layer pool1
I0621 16:58:20.818514  9538 net.cpp:106] Creating Layer pool1
I0621 16:58:20.818521  9538 net.cpp:454] pool1 <- conv1
I0621 16:58:20.818527  9538 net.cpp:411] pool1 -> pool1
I0621 16:58:20.818568  9538 net.cpp:150] Setting up pool1
I0621 16:58:20.818578  9538 net.cpp:157] Top shape: 10 96 27 27 (699840)
I0621 16:58:20.818583  9538 net.cpp:165] Memory required for data: 26031360
I0621 16:58:20.818588  9538 layer_factory.hpp:77] Creating layer norm1
I0621 16:58:20.818598  9538 net.cpp:106] Creating Layer norm1
I0621 16:58:20.818603  9538 net.cpp:454] norm1 <- pool1
I0621 16:58:20.818608  9538 net.cpp:411] norm1 -> norm1
I0621 16:58:20.818758  9538 net.cpp:150] Setting up norm1
I0621 16:58:20.818771  9538 net.cpp:157] Top shape: 10 96 27 27 (699840)
I0621 16:58:20.818776  9538 net.cpp:165] Memory required for data: 28830720
I0621 16:58:20.818781  9538 layer_factory.hpp:77] Creating layer conv2
I0621 16:58:20.818794  9538 net.cpp:106] Creating Layer conv2
I0621 16:58:20.818799  9538 net.cpp:454] conv2 <- norm1
I0621 16:58:20.818805  9538 net.cpp:411] conv2 -> conv2
I0621 16:58:20.820828  9538 net.cpp:150] Setting up conv2
I0621 16:58:20.820847  9538 net.cpp:157] Top shape: 10 256 27 27 (1866240)
I0621 16:58:20.820852  9538 net.cpp:165] Memory required for data: 36295680
I0621 16:58:20.820863  9538 layer_factory.hpp:77] Creating layer relu2
I0621 16:58:20.820870  9538 net.cpp:106] Creating Layer relu2
I0621 16:58:20.820876  9538 net.cpp:454] relu2 <- conv2
I0621 16:58:20.820883  9538 net.cpp:397] relu2 -> conv2 (in-place)
I0621 16:58:20.821020  9538 net.cpp:150] Setting up relu2
I0621 16:58:20.821033  9538 net.cpp:157] Top shape: 10 256 27 27 (1866240)
I0621 16:58:20.821038  9538 net.cpp:165] Memory required for data: 43760640
I0621 16:58:20.821043  9538 layer_factory.hpp:77] Creating layer pool2
I0621 16:58:20.821050  9538 net.cpp:106] Creating Layer pool2
I0621 16:58:20.821055  9538 net.cpp:454] pool2 <- conv2
I0621 16:58:20.821061  9538 net.cpp:411] pool2 -> pool2
I0621 16:58:20.821097  9538 net.cpp:150] Setting up pool2
I0621 16:58:20.821105  9538 net.cpp:157] Top shape: 10 256 13 13 (432640)
I0621 16:58:20.821110  9538 net.cpp:165] Memory required for data: 45491200
I0621 16:58:20.821115  9538 layer_factory.hpp:77] Creating layer norm2
I0621 16:58:20.821123  9538 net.cpp:106] Creating Layer norm2
I0621 16:58:20.821130  9538 net.cpp:454] norm2 <- pool2
I0621 16:58:20.821135  9538 net.cpp:411] norm2 -> norm2
I0621 16:58:20.821372  9538 net.cpp:150] Setting up norm2
I0621 16:58:20.821388  9538 net.cpp:157] Top shape: 10 256 13 13 (432640)
I0621 16:58:20.821393  9538 net.cpp:165] Memory required for data: 47221760
I0621 16:58:20.821398  9538 layer_factory.hpp:77] Creating layer conv3
I0621 16:58:20.821409  9538 net.cpp:106] Creating Layer conv3
I0621 16:58:20.821415  9538 net.cpp:454] conv3 <- norm2
I0621 16:58:20.821422  9538 net.cpp:411] conv3 -> conv3
I0621 16:58:20.823508  9538 net.cpp:150] Setting up conv3
I0621 16:58:20.823526  9538 net.cpp:157] Top shape: 10 384 13 13 (648960)
I0621 16:58:20.823532  9538 net.cpp:165] Memory required for data: 49817600
I0621 16:58:20.823544  9538 layer_factory.hpp:77] Creating layer relu3
I0621 16:58:20.823552  9538 net.cpp:106] Creating Layer relu3
I0621 16:58:20.823559  9538 net.cpp:454] relu3 <- conv3
I0621 16:58:20.823565  9538 net.cpp:397] relu3 -> conv3 (in-place)
I0621 16:58:20.823699  9538 net.cpp:150] Setting up relu3
I0621 16:58:20.823711  9538 net.cpp:157] Top shape: 10 384 13 13 (648960)
I0621 16:58:20.823716  9538 net.cpp:165] Memory required for data: 52413440
I0621 16:58:20.823721  9538 layer_factory.hpp:77] Creating layer conv4
I0621 16:58:20.823731  9538 net.cpp:106] Creating Layer conv4
I0621 16:58:20.823736  9538 net.cpp:454] conv4 <- conv3
I0621 16:58:20.823745  9538 net.cpp:411] conv4 -> conv4
I0621 16:58:20.825955  9538 net.cpp:150] Setting up conv4
I0621 16:58:20.825978  9538 net.cpp:157] Top shape: 10 384 13 13 (648960)
I0621 16:58:20.825985  9538 net.cpp:165] Memory required for data: 55009280
I0621 16:58:20.825994  9538 layer_factory.hpp:77] Creating layer relu4
I0621 16:58:20.826002  9538 net.cpp:106] Creating Layer relu4
I0621 16:58:20.826009  9538 net.cpp:454] relu4 <- conv4
I0621 16:58:20.826014  9538 net.cpp:397] relu4 -> conv4 (in-place)
I0621 16:58:20.826148  9538 net.cpp:150] Setting up relu4
I0621 16:58:20.826159  9538 net.cpp:157] Top shape: 10 384 13 13 (648960)
I0621 16:58:20.826164  9538 net.cpp:165] Memory required for data: 57605120
I0621 16:58:20.826169  9538 layer_factory.hpp:77] Creating layer conv5
I0621 16:58:20.826179  9538 net.cpp:106] Creating Layer conv5
I0621 16:58:20.826184  9538 net.cpp:454] conv5 <- conv4
I0621 16:58:20.826191  9538 net.cpp:411] conv5 -> conv5
I0621 16:58:20.828217  9538 net.cpp:150] Setting up conv5
I0621 16:58:20.828235  9538 net.cpp:157] Top shape: 10 256 13 13 (432640)
I0621 16:58:20.828241  9538 net.cpp:165] Memory required for data: 59335680
I0621 16:58:20.828251  9538 layer_factory.hpp:77] Creating layer relu5
I0621 16:58:20.828259  9538 net.cpp:106] Creating Layer relu5
I0621 16:58:20.828265  9538 net.cpp:454] relu5 <- conv5
I0621 16:58:20.828272  9538 net.cpp:397] relu5 -> conv5 (in-place)
I0621 16:58:20.828408  9538 net.cpp:150] Setting up relu5
I0621 16:58:20.828419  9538 net.cpp:157] Top shape: 10 256 13 13 (432640)
I0621 16:58:20.828424  9538 net.cpp:165] Memory required for data: 61066240
I0621 16:58:20.828429  9538 layer_factory.hpp:77] Creating layer pool5
I0621 16:58:20.828438  9538 net.cpp:106] Creating Layer pool5
I0621 16:58:20.828444  9538 net.cpp:454] pool5 <- conv5
I0621 16:58:20.828451  9538 net.cpp:411] pool5 -> pool5
I0621 16:58:20.828486  9538 net.cpp:150] Setting up pool5
I0621 16:58:20.828495  9538 net.cpp:157] Top shape: 10 256 6 6 (92160)
I0621 16:58:20.828500  9538 net.cpp:165] Memory required for data: 61434880
I0621 16:58:20.828505  9538 layer_factory.hpp:77] Creating layer fc6
I0621 16:58:20.828517  9538 net.cpp:106] Creating Layer fc6
I0621 16:58:20.828522  9538 net.cpp:454] fc6 <- pool5
I0621 16:58:20.828528  9538 net.cpp:411] fc6 -> fc6
I0621 16:58:20.889945  9538 net.cpp:150] Setting up fc6
I0621 16:58:20.889986  9538 net.cpp:157] Top shape: 10 4096 (40960)
I0621 16:58:20.889993  9538 net.cpp:165] Memory required for data: 61598720
I0621 16:58:20.890004  9538 layer_factory.hpp:77] Creating layer relu6
I0621 16:58:20.890018  9538 net.cpp:106] Creating Layer relu6
I0621 16:58:20.890027  9538 net.cpp:454] relu6 <- fc6
I0621 16:58:20.890034  9538 net.cpp:397] relu6 -> fc6 (in-place)
I0621 16:58:20.890375  9538 net.cpp:150] Setting up relu6
I0621 16:58:20.890391  9538 net.cpp:157] Top shape: 10 4096 (40960)
I0621 16:58:20.890398  9538 net.cpp:165] Memory required for data: 61762560
I0621 16:58:20.890403  9538 layer_factory.hpp:77] Creating layer drop6
I0621 16:58:20.890414  9538 net.cpp:106] Creating Layer drop6
I0621 16:58:20.890420  9538 net.cpp:454] drop6 <- fc6
I0621 16:58:20.890427  9538 net.cpp:397] drop6 -> fc6 (in-place)
I0621 16:58:20.890455  9538 net.cpp:150] Setting up drop6
I0621 16:58:20.890462  9538 net.cpp:157] Top shape: 10 4096 (40960)
I0621 16:58:20.890467  9538 net.cpp:165] Memory required for data: 61926400
I0621 16:58:20.890472  9538 layer_factory.hpp:77] Creating layer fc7
I0621 16:58:20.890482  9538 net.cpp:106] Creating Layer fc7
I0621 16:58:20.890487  9538 net.cpp:454] fc7 <- fc6
I0621 16:58:20.890496  9538 net.cpp:411] fc7 -> fc7
I0621 16:58:20.917780  9538 net.cpp:150] Setting up fc7
I0621 16:58:20.917819  9538 net.cpp:157] Top shape: 10 4096 (40960)
I0621 16:58:20.917824  9538 net.cpp:165] Memory required for data: 62090240
I0621 16:58:20.917836  9538 layer_factory.hpp:77] Creating layer relu7
I0621 16:58:20.917847  9538 net.cpp:106] Creating Layer relu7
I0621 16:58:20.917855  9538 net.cpp:454] relu7 <- fc7
I0621 16:58:20.917862  9538 net.cpp:397] relu7 -> fc7 (in-place)
I0621 16:58:20.918051  9538 net.cpp:150] Setting up relu7
I0621 16:58:20.918063  9538 net.cpp:157] Top shape: 10 4096 (40960)
I0621 16:58:20.918069  9538 net.cpp:165] Memory required for data: 62254080
I0621 16:58:20.918074  9538 layer_factory.hpp:77] Creating layer drop7
I0621 16:58:20.918086  9538 net.cpp:106] Creating Layer drop7
I0621 16:58:20.918090  9538 net.cpp:454] drop7 <- fc7
I0621 16:58:20.918097  9538 net.cpp:397] drop7 -> fc7 (in-place)
I0621 16:58:20.918121  9538 net.cpp:150] Setting up drop7
I0621 16:58:20.918129  9538 net.cpp:157] Top shape: 10 4096 (40960)
I0621 16:58:20.918134  9538 net.cpp:165] Memory required for data: 62417920
I0621 16:58:20.918138  9538 layer_factory.hpp:77] Creating layer fc-rcnn
I0621 16:58:20.918148  9538 net.cpp:106] Creating Layer fc-rcnn
I0621 16:58:20.918154  9538 net.cpp:454] fc-rcnn <- fc7
I0621 16:58:20.918160  9538 net.cpp:411] fc-rcnn -> fc-rcnn
I0621 16:58:20.919497  9538 net.cpp:150] Setting up fc-rcnn
I0621 16:58:20.919513  9538 net.cpp:157] Top shape: 10 200 (2000)
I0621 16:58:20.919518  9538 net.cpp:165] Memory required for data: 62425920
I0621 16:58:20.919528  9538 net.cpp:228] fc-rcnn does not need backward computation.
I0621 16:58:20.919533  9538 net.cpp:228] drop7 does not need backward computation.
I0621 16:58:20.919538  9538 net.cpp:228] relu7 does not need backward computation.
I0621 16:58:20.919543  9538 net.cpp:228] fc7 does not need backward computation.
I0621 16:58:20.919548  9538 net.cpp:228] drop6 does not need backward computation.
I0621 16:58:20.919553  9538 net.cpp:228] relu6 does not need backward computation.
I0621 16:58:20.919559  9538 net.cpp:228] fc6 does not need backward computation.
I0621 16:58:20.919564  9538 net.cpp:228] pool5 does not need backward computation.
I0621 16:58:20.919569  9538 net.cpp:228] relu5 does not need backward computation.
I0621 16:58:20.919574  9538 net.cpp:228] conv5 does not need backward computation.
I0621 16:58:20.919579  9538 net.cpp:228] relu4 does not need backward computation.
I0621 16:58:20.919584  9538 net.cpp:228] conv4 does not need backward computation.
I0621 16:58:20.919589  9538 net.cpp:228] relu3 does not need backward computation.
I0621 16:58:20.919595  9538 net.cpp:228] conv3 does not need backward computation.
I0621 16:58:20.919600  9538 net.cpp:228] norm2 does not need backward computation.
I0621 16:58:20.919605  9538 net.cpp:228] pool2 does not need backward computation.
I0621 16:58:20.919610  9538 net.cpp:228] relu2 does not need backward computation.
I0621 16:58:20.919615  9538 net.cpp:228] conv2 does not need backward computation.
I0621 16:58:20.919620  9538 net.cpp:228] norm1 does not need backward computation.
I0621 16:58:20.919625  9538 net.cpp:228] pool1 does not need backward computation.
I0621 16:58:20.919630  9538 net.cpp:228] relu1 does not need backward computation.
I0621 16:58:20.919634  9538 net.cpp:228] conv1 does not need backward computation.
I0621 16:58:20.919639  9538 net.cpp:270] This network produces output fc-rcnn
I0621 16:58:20.919652  9538 net.cpp:283] Network initialization done.
I0621 16:58:21.193883  9538 upgrade_proto.cpp:51] Attempting to upgrade input file specified using deprecated V1LayerParameter: ../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel
I0621 16:58:21.310328  9538 upgrade_proto.cpp:59] Successfully upgraded file specified using deprecated V1LayerParameter
Loading input...
selective_search_rcnn({'/home/tzx/dev/caffe-rc3/examples/images/fish-bike.jpg'}, '/tmp/tmpzKKXgc.mat')
No protocol specified
/home/tzx/dev/caffe-rc3/python/caffe/detector.py:140: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  crop = im[window[0]:window[2], window[1]:window[3]]
/home/tzx/dev/caffe-rc3/python/caffe/detector.py:174: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  context_crop = im[box[0]:box[2], box[1]:box[3]]
/home/tzx/dev/caffe-rc3/python/caffe/detector.py:177: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  crop[pad_y:(pad_y + crop_h), pad_x:(pad_x + crop_w)] = context_crop
Traceback (most recent call last):
  File "../python/detect.py", line 173, in <module>
    main(sys.argv)
  File "../python/detect.py", line 144, in main
    detections = detector.detect_selective_search(inputs)
  File "/home/tzx/dev/caffe-rc3/python/caffe/detector.py", line 123, in detect_selective_search
    return self.detect_windows(zip(image_fnames, windows_list))
  File "/home/tzx/dev/caffe-rc3/python/caffe/detector.py", line 86, in detect_windows
    predictions = out[self.outputs[0]].squeeze(axis=(2, 3))
ValueError: 'axis' entry 2 is out of bounds [-2, 2)

This run was in GPU mode. For CPU mode detection, call detect.py without the --gpu argument.
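The traceback above comes from `detect_windows` in `python/caffe/detector.py`, which calls `.squeeze(axis=(2, 3))` on the `fc-rcnn` output. The network log shows that blob is 2-D (`Top shape: 10 200`), so there are no axes 2 and 3 to squeeze. The deprecation warnings point at a second problem: window coordinates are floats used as slice indices, which newer NumPy rejects. A minimal sketch of both workarounds, assuming you patch the corresponding lines of your local `detector.py` by hand; the helper names are hypothetical and not part of Caffe's API.

```python
import numpy as np


def squeeze_scores(blob):
    """Return a (num_windows, num_classes) score array.

    Some Caffe builds return the fc-rcnn blob as (N, C, 1, 1), others as
    (N, C). Only drop the trailing singleton axes when they exist.
    """
    blob = np.asarray(blob)
    if blob.ndim == 4:
        return blob.squeeze(axis=(2, 3))
    return blob


def crop_window(im, window):
    """Crop an H x W x C image with a (ymin, xmin, ymax, xmax) window.

    The window may hold floats; cast to int so NumPy accepts the slice.
    """
    ymin, xmin, ymax, xmax = (int(v) for v in window)
    return im[ymin:ymax, xmin:xmax]
```

In `detector.py`, the expression on line 86 could then become something like `squeeze_scores(out[self.outputs[0]])`, and the slicing at lines 140 and 174 would need the same integer cast.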

Running this writes a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)

In [2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])

(1570, 5)
prediction    [-2.62247, -2.84579, -2.85122, -3.20838, -1.94...
ymin                                                     79.846
xmin                                                       9.62
ymax                                                     246.31
xmax                                                    339.624
Name: /Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg, dtype: object

1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals varies from image to image with its contents and size; selective search isn't scale invariant.

In general, detect.py is most efficient when running on many images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then writes out the results. List one image per line in the images_file, and it will process all of them.
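Preparing the input for a batch run follows the same pattern as the `echo` in the first cell, just with more lines. A minimal sketch, assuming your images sit in a single directory and share one extension (the pattern and the helper name `write_image_list` are illustrative); it writes absolute paths, as the notebook's own input file does.

```python
from pathlib import Path


def write_image_list(image_dir, list_path, pattern="*.jpg"):
    """Write one absolute image path per line (the .txt input format).

    Returns the number of images listed.
    """
    paths = sorted(Path(image_dir).resolve().glob(pattern))
    list_path = Path(list_path)
    # Create the output directory (e.g. _temp) if it does not exist yet.
    list_path.parent.mkdir(parents=True, exist_ok=True)
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For example, `write_image_list('images', '_temp/det_input.txt')` would list every `.jpg` in the examples' `images` directory for a single detect.py run.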

Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.

Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.

In [3]:

with open('../data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f
    ])
# Keep the file order: it matches the order of the network's 200 outputs.
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])

name
accordion      -2.622471
airplane       -2.845788
ant            -2.851219
antelope       -3.208377
apple          -1.949950
armadillo      -2.472935
artichoke      -2.201684
axe            -2.327404
baby bed       -2.737925
backpack       -2.176763
bagel          -2.681061
balance beam   -2.722538
banana         -2.390628
band aid       -1.598909
banjo          -2.298197
...
trombone        -2.582361
trumpet         -2.352853
turtle          -2.360859
tv or monitor   -2.761043
unicycle        -2.218467
vacuum          -1.907717
violin          -2.757079
volleyball      -2.723689
waffle iron     -2.418540
washer          -2.408994
water bottle    -2.174899
watercraft      -2.837425
whale           -3.120338
wine bottle     -2.772960
zebra           -2.742913
Name: 0, Length: 200, dtype: float32

Let's look at the activations.

In [4]:

plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')

[Figure: detection scores plotted as a matrix, Windows (rows) vs. Classes (columns)]

Now let's take each class's maximum score across all windows and print the top classes.

In [5]:

# Highest score per class over all windows, best classes first.
max_s = predictions_df.max(0).sort_values(ascending=False)
print(max_s[:10])

name
person          1.835771
bicycle         0.866110
unicycle        0.057080
motorcycle     -0.006122
banjo          -0.028209
turtle         -0.189831
electric fan   -0.206788
cart           -0.214235
lizard         -0.393519
helmet         -0.477942
dtype: float32

The top detections are in fact a person and a bicycle. Picking good localizations is still a work in progress; here we simply pick the top-scoring person and bicycle detections.

In [6]:

# Find, print, and display the top detections: person and bicycle.
# Positional indices, since rows are looked up with .iloc below.
i = predictions_df['person'].values.argmax()
j = predictions_df['bicycle'].values.argmax()

# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.sort_values(ascending=False)[:5])
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.sort_values(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))

Top detection:
name
person             1.835771
swimming trunks   -1.150371
rubber eraser     -1.231106
turtle            -1.266037
plastic bag       -1.303265
dtype: float32

Second-best detection:
name
bicycle     0.866110
unicycle   -0.359139
scorpion   -0.811621
lobster    -0.982891
lamp       -1.096808
dtype: float32

[Figure: fish-bike.jpg with the top person detection boxed in red and the top bicycle detection boxed in blue]

Now let's take all 'bicycle' detections and apply non-maximum suppression (NMS) to get rid of overlapping windows.
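The overlap test inside `nms_detections` in the next cell is intersection-over-union (IoU): the area of the intersection divided by the area of the union. A standalone sketch of that same computation for two boxes in the `(xmin, ymin, xmax, ymax)` layout, using the same width = xmax - xmin convention (no +1) as the notebook's NMS code; the function name `box_iou` is hypothetical.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width/height of the overlap rectangle, clamped at zero if disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Two 10x10 boxes offset by 5 px: intersection 25, union 175.
print(round(box_iou((0, 0, 10, 10), (5, 5, 15, 15)), 4))  # 0.1429
```

With the default `overlap=0.3`, both of these boxes would survive NMS, because a detection is only suppressed when its IoU with an already selected box exceeds the threshold.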

In [7]:

def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

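    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        IoU threshold (0.3 default): a detection whose intersection-over-union
        with an already selected detection exceeds this is suppressed.

    Returns
    -------
    dets: ndarray
        rows remaining after suppression, in descending score order.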
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]

In [8]:

# Plain ndarray, so scores[:, np.newaxis] works (pandas rejects 2-D indexing).
scores = predictions_df['bicycle'].values
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)

Show the top 3 NMS'd detections for 'bicycle' in the image, and note the gap between the top-scoring box (red) and the remaining boxes.

In [9]:

plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2] - det[0], det[3] - det[1],
                      fill=False, edgecolor=c, linewidth=5)
    )
print('scores:', nms_dets[:3, 4])

scores: [ 0.86610985 -0.70051557 -1.34796357]

This was an easy instance for bicycle, since the image was in that class's training set. The person result, however, is a true detection, since the image was not in the training set for that class.

You should try out detection on an image of your own next!

(Remove the temp directory to clean up, and we're done.)

In [10]:

!rm -rf _temp