Object Detection (SSDLite, MobileNetV2, COCO)

Experiment overview

In this experiment we will use the pre-trained ssdlite_mobilenet_v2_coco model from the TensorFlow detection model zoo to detect objects in photos.

objects_detection_ssdlite_mobilenet_v2.jpg

_This notebook is inspired by the Object Detection API Demo_

Importing Dependencies

  • tensorflow - for developing and training ML models.
  • matplotlib - for plotting the data.
  • numpy - for linear algebra operations.
  • cv2 - for processing the images and drawing object detections on top of them.
  • PIL - for convenient image loading.
  • pathlib - for working with model files.
  • math - to do simple math operations while drawing the detection frames.
  • google.protobuf - for reading files in protobuf format.
In [ ]:
# Selecting Tensorflow version v2 (the command is relevant for Colab only).
%tensorflow_version 2.x
In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pathlib
import cv2
import math
from PIL import Image
from google.protobuf import text_format
import platform

print('Python version:', platform.python_version())
print('Tensorflow version:', tf.__version__)
print('Keras version:', tf.keras.__version__)
Tensorflow version: 2.1.0
Keras version: 2.2.4-tf

Loading the model

To do object detection we're going to use the ssdlite_mobilenet_v2_coco model from the TensorFlow detection model zoo.

The full name of the model is ssdlite_mobilenet_v2_coco_2018_05_09.

In [2]:
# Create cache folder.
!mkdir .tmp
mkdir: .tmp: File exists
In [3]:
# Downloads the model archive from the internet, unpacks it, and loads it as a TensorFlow SavedModel.
def load_model(model_name):
    model_url = 'http://download.tensorflow.org/models/object_detection/' + model_name + '.tar.gz'
    
    model_dir = tf.keras.utils.get_file(
        fname=model_name, 
        origin=model_url,
        untar=True,
        cache_dir=pathlib.Path('.tmp').absolute()
    )
    model = tf.saved_model.load(model_dir + '/saved_model')
    
    return model
In [4]:
MODEL_NAME = 'ssdlite_mobilenet_v2_coco_2018_05_09'
saved_model = load_model(MODEL_NAME)
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
In [5]:
# Exploring model signatures.
saved_model.signatures
Out[5]:
_SignatureMap({'serving_default': <tensorflow.python.eager.wrap_function.WrappedFunction object at 0x144f98d10>})
In [6]:
# Loading default model signature.
model = saved_model.signatures['serving_default']

Loading model labels

Depending on which dataset the model was trained on, we need to download the proper label set from the tensorflow/models repository.

The ssdlite_mobilenet_v2_coco model was trained on the COCO dataset, which has 90 object categories. We're going to download and explore this list of categories, which is stored in a label file named mscoco_label_map.pbtxt.
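Each entry in mscoco_label_map.pbtxt is an item written in protobuf text format that maps a numeric id to a human-readable display_name. For reference, the first entry of the file looks roughly like this:

item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}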

Compiling the protobuf label map

The label object structure is defined in the string_int_label_map.proto file in protobuf format.

In order to convert the mscoco_label_map.pbtxt file to a Python dictionary we need to load the string_int_label_map.proto file and compile it with protoc. Before doing that we need to install protoc.

One of the ways to install protoc is to load it manually:

PROTOC_ZIP=protoc-3.7.1-osx-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d .tmp/protoc
rm -f $PROTOC_ZIP

After that we may compile the proto files by running:

.tmp/protoc/bin/protoc ./protos/*.proto --python_out=.

☝🏻 For simplicity, string_int_label_map.proto and its compiled version string_int_label_map_pb2.py are already included in the protos directory, so let's just import the compiled package.

In [7]:
from protos import string_int_label_map_pb2

Loading and parsing the labels

In [8]:
def load_labels(labels_name):
    labels_url = 'https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/' + labels_name
    
    labels_path = tf.keras.utils.get_file(
        fname=labels_name,
        origin=labels_url,
        cache_dir=pathlib.Path('.tmp').absolute()
    )
    
    # Read the label map file (protobuf text format).
    with open(labels_path, 'r') as labels_file:
        labels_string = labels_file.read()
    
    # Parse the text into a StringIntLabelMap message
    # (falling back to binary parsing if the file is not in text format).
    labels_map = string_int_label_map_pb2.StringIntLabelMap()
    try:
        text_format.Merge(labels_string, labels_map)
    except text_format.ParseError:
        labels_map.ParseFromString(labels_string)
    
    # Convert the message into a simple {id: display_name} dictionary.
    labels_dict = {}
    for item in labels_map.item:
        labels_dict[item.id] = item.display_name
    
    return labels_dict
In [9]:
LABELS_NAME = 'mscoco_label_map.pbtxt'
labels = load_labels(LABELS_NAME)
labels
Out[9]:
{1: 'person',
 2: 'bicycle',
 3: 'car',
 4: 'motorcycle',
 5: 'airplane',
 6: 'bus',
 7: 'train',
 8: 'truck',
 9: 'boat',
 10: 'traffic light',
 11: 'fire hydrant',
 13: 'stop sign',
 14: 'parking meter',
 15: 'bench',
 16: 'bird',
 17: 'cat',
 18: 'dog',
 19: 'horse',
 20: 'sheep',
 21: 'cow',
 22: 'elephant',
 23: 'bear',
 24: 'zebra',
 25: 'giraffe',
 27: 'backpack',
 28: 'umbrella',
 31: 'handbag',
 32: 'tie',
 33: 'suitcase',
 34: 'frisbee',
 35: 'skis',
 36: 'snowboard',
 37: 'sports ball',
 38: 'kite',
 39: 'baseball bat',
 40: 'baseball glove',
 41: 'skateboard',
 42: 'surfboard',
 43: 'tennis racket',
 44: 'bottle',
 46: 'wine glass',
 47: 'cup',
 48: 'fork',
 49: 'knife',
 50: 'spoon',
 51: 'bowl',
 52: 'banana',
 53: 'apple',
 54: 'sandwich',
 55: 'orange',
 56: 'broccoli',
 57: 'carrot',
 58: 'hot dog',
 59: 'pizza',
 60: 'donut',
 61: 'cake',
 62: 'chair',
 63: 'couch',
 64: 'potted plant',
 65: 'bed',
 67: 'dining table',
 70: 'toilet',
 72: 'tv',
 73: 'laptop',
 74: 'mouse',
 75: 'remote',
 76: 'keyboard',
 77: 'cell phone',
 78: 'microwave',
 79: 'oven',
 80: 'toaster',
 81: 'sink',
 82: 'refrigerator',
 84: 'book',
 85: 'clock',
 86: 'vase',
 87: 'scissors',
 88: 'teddy bear',
 89: 'hair drier',
 90: 'toothbrush'}

Exploring the model

In [10]:
# List model files
!ls -la .tmp/datasets/ssdlite_mobilenet_v2_coco_2018_05_09
total 81680
drwxr-x---  9 trekhleb  staff       288 May 10  2018 .
drwxr-xr-x  5 trekhleb  staff       160 Jan 23 07:24 ..
-rw-r-----  1 trekhleb  staff        77 May 10  2018 checkpoint
-rw-r-----  1 trekhleb  staff  19911343 May 10  2018 frozen_inference_graph.pb
-rw-r-----  1 trekhleb  staff  18205188 May 10  2018 model.ckpt.data-00000-of-00001
-rw-r-----  1 trekhleb  staff     17703 May 10  2018 model.ckpt.index
-rw-r-----  1 trekhleb  staff   3665866 May 10  2018 model.ckpt.meta
-rw-r-----  1 trekhleb  staff      4199 May 10  2018 pipeline.config
drwxr-x---  4 trekhleb  staff       128 May 10  2018 saved_model
In [11]:
# Check model pipeline.
!cat .tmp/datasets/ssdlite_mobilenet_v2_coco_2018_05_09/pipeline.config
model {
  ssd {
    num_classes: 90
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      use_depthwise: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        use_depthwise: true
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.990000009537
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  num_steps: 200000
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
}
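A few values from this config are worth calling out: inputs are resized to 300x300, at most 100 detections are kept per image, and post-processing applies non-max suppression with a 0.3 score threshold and a 0.6 IoU threshold. As a rough sketch (the TensorFlow Object Detection API offers proper protobuf-based config parsing; a plain regex is enough for a quick look here), these values could be pulled out of the raw file like this:

import re

# Rough text-based extraction of a few pipeline.config values (sketch only;
# for real work, parse the config with the Object Detection API protos).
config_path = '.tmp/datasets/ssdlite_mobilenet_v2_coco_2018_05_09/pipeline.config'
with open(config_path, 'r') as config_file:
    config_text = config_file.read()

for key in ['num_classes', 'height', 'width', 'score_threshold', 'iou_threshold', 'max_total_detections']:
    match = re.search(key + r':\s*([\d.]+)', config_text)
    if match:
        print(key, '=', match.group(1))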
In [12]:
model.inputs
Out[12]:
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
In [13]:
model.outputs
Out[13]:
[<tf.Tensor 'detection_boxes:0' shape=(None, 100, 4) dtype=float32>,
 <tf.Tensor 'detection_classes:0' shape=(None, 100) dtype=float32>,
 <tf.Tensor 'detection_scores:0' shape=(None, 100) dtype=float32>,
 <tf.Tensor 'num_detections:0' shape=(None,) dtype=float32>]
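The signature takes a batch of uint8 image tensors and returns, for up to 100 detections per image, normalized box coordinates ([ymin, xmin, ymax, xmax] in the 0..1 range), class ids, and confidence scores. As a minimal sketch, assuming image_np is an H x W x 3 uint8 NumPy array (for example one of the test images loaded below), inference could look like this:

# Minimal inference sketch (assumes `image_np` is an H x W x 3 uint8 NumPy array).
input_tensor = tf.convert_to_tensor(image_np)[tf.newaxis, ...]  # add a batch dimension
detections = model(input_tensor)

num_detections = int(detections['num_detections'][0])
boxes = detections['detection_boxes'][0, :num_detections].numpy()    # normalized [ymin, xmin, ymax, xmax]
classes = detections['detection_classes'][0, :num_detections].numpy().astype(np.int64)
scores = detections['detection_scores'][0, :num_detections].numpy()

# Top detection (detections are sorted by score).
print(labels[int(classes[0])], scores[0])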

Loading test images

In [14]:
def display_image(image_np):
    plt.figure()
    plt.imshow(image_np)
In [15]:
TEST_IMAGES_DIR_PATH = pathlib.Path('data')
TEST_IMAGE_PATHS = sorted(list(TEST_IMAGES_DIR_PATH.glob('*.jpg')))
TEST_IMAGE_PATHS
Out[15]:
[PosixPath('data/appartment.jpg'),
 PosixPath('data/bicycle.jpg'),
 PosixPath('data/dog.jpg'),
 PosixPath('data/food.jpg'),
 PosixPath('data/football.jpg'),
 PosixPath('data/pedestrians.jpg'),
 PosixPath('data/street.jpg')]
In [16]:
for image_path in TEST_IMAGE_PATHS:
    image_np = mpimg.imread(image_path)
    display_image(image_np)