Notebook last updated on 2018-12-31 using a PyTorch 1.0 nightly build, with Android fixes. Code changes: export the model as squeeze_init_net_v1.pb and squeeze_predict_net_v1.pb and run them again for verification; use nn.AdaptiveAvgPool2d((1, 1)) in the classifier.
In this notebook we will show you how to export SqueezeNet, implemented and trained in PyTorch (fastai library support: TODO), to run on mobile devices.
Let's get started. First, you should have PyTorch and ONNX installed in your environment, and have git cloned the AICamera repo.
NOTE: the Caffe2 pre-built binaries are installed together with PyTorch, as the Caffe2 source code now lives in the PyTorch repository.
conda install pytorch-nightly cuda92 -c pytorch
Import some Python packages
import io
import numpy as np
import torch.onnx
Note: we are using PyTorch 1.0 preview (nightly) released on 2018-12-31.
print(torch.__version__)
1.0.0.dev20181231
NOTE: while the work to bridge ResNet-family models built with the fastai v1 library into pure PyTorch continues, the steps below use SqueezeNet, a mobile-first CNN available from torchvision, as the example. This model was developed in plain PyTorch (not in fastai v1).
SqueezeNet is a small CNN which achieves AlexNet level accuracy on ImageNet with 50x fewer parameters. Paper.
Use cases
SqueezeNet models perform image classification—they take images as input and classify the major object in the image into a set of pre-defined classes. They are trained on the ImageNet dataset, which contains images from 1000 classes. SqueezeNet models are highly efficient in terms of size and speed while providing good accuracy. This makes them ideal for platforms with strict constraints on size.
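For context, torchvision's pretrained models expect 224x224 RGB inputs scaled to [0, 1] and normalized with the standard ImageNet per-channel statistics. A minimal NumPy sketch (the preprocess helper and the all-zero image are ours, for illustration):

```python
import numpy as np

# Standard ImageNet normalization constants used by torchvision pretrained models.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc_uint8):
    """HWC uint8 image (224x224x3) -> NCHW float32 batch of size 1."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - MEAN) / STD                           # per-channel normalization
    return x.transpose(2, 0, 1)[None]              # HWC -> CHW, add batch dim

img = np.zeros((224, 224, 3), dtype=np.uint8)      # stand-in for a real photo
batch = preprocess(img)
print(batch.shape)  # (1, 3, 224, 224)
```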
SqueezeNet version 1.1
SqueezeNet 1.1 presented in the official SqueezeNet repo is an improved version of SqueezeNet 1.0 from the paper.
SqueezeNet version 1.1 requires 2.4x less computation than version 1.0, without sacrificing accuracy. [Jun 2016]
The following SqueezeNet implementation in PyTorch is by Marat Dukhan and is part of torchvision:
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.utils.model_zoo as model_zoo
__all__ = ['SqueezeNet', 'squeezenet1_0', 'squeezenet1_1']
model_urls = {
'squeezenet1_0': 'https://download.pytorch.org/models/squeezenet1_0-a815701f.pth',
'squeezenet1_1': 'https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth',
}
class Fire(nn.Module):
def __init__(self, inplanes, squeeze_planes,
expand1x1_planes, expand3x3_planes):
super(Fire, self).__init__()
self.inplanes = inplanes
self.squeeze = nn.Conv2d(inplanes, squeeze_planes, kernel_size=1)
self.squeeze_activation = nn.ReLU(inplace=True)
self.expand1x1 = nn.Conv2d(squeeze_planes, expand1x1_planes,
kernel_size=1)
self.expand1x1_activation = nn.ReLU(inplace=True)
self.expand3x3 = nn.Conv2d(squeeze_planes, expand3x3_planes,
kernel_size=3, padding=1)
self.expand3x3_activation = nn.ReLU(inplace=True)
def forward(self, x):
x = self.squeeze_activation(self.squeeze(x))
return torch.cat([
self.expand1x1_activation(self.expand1x1(x)),
self.expand3x3_activation(self.expand3x3(x))
], 1)
class SqueezeNet(nn.Module):
def __init__(self, version=1.0, num_classes=1000):
super(SqueezeNet, self).__init__()
if version not in [1.0, 1.1]:
raise ValueError("Unsupported SqueezeNet version {version}:"
"1.0 or 1.1 expected".format(version=version))
self.num_classes = num_classes
if version == 1.0:
self.features = nn.Sequential(
nn.Conv2d(3, 96, kernel_size=7, stride=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(96, 16, 64, 64),
Fire(128, 16, 64, 64),
Fire(128, 32, 128, 128),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(256, 32, 128, 128),
Fire(256, 48, 192, 192),
Fire(384, 48, 192, 192),
Fire(384, 64, 256, 256),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(512, 64, 256, 256),
)
else:
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=3, stride=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(64, 16, 64, 64),
Fire(128, 16, 64, 64),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(128, 32, 128, 128),
Fire(256, 32, 128, 128),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(256, 48, 192, 192),
Fire(384, 48, 192, 192),
Fire(384, 64, 256, 256),
Fire(512, 64, 256, 256),
)
# Final convolution is initialized differently from the rest
final_conv = nn.Conv2d(512, self.num_classes, kernel_size=1)
self.classifier = nn.Sequential(
nn.Dropout(p=0.5),
final_conv,
nn.ReLU(inplace=True),
nn.AdaptiveAvgPool2d((1, 1))
)
for m in self.modules():
if isinstance(m, nn.Conv2d):
if m is final_conv:
init.normal_(m.weight, mean=0.0, std=0.01)
else:
init.kaiming_uniform_(m.weight)
if m.bias is not None:
init.constant_(m.bias, 0)
def forward(self, x):
x = self.features(x)
x = self.classifier(x)
return x.view(x.size(0), self.num_classes)
def squeezenet1_0(pretrained=False, **kwargs):
r"""SqueezeNet model architecture from the `"SqueezeNet: AlexNet-level
accuracy with 50x fewer parameters and <0.5MB model size"
<https://arxiv.org/abs/1602.07360>`_ paper.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = SqueezeNet(version=1.0, **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['squeezenet1_0']))
return model
def squeezenet1_1(pretrained=False, **kwargs):
r"""SqueezeNet 1.1 model from the `official SqueezeNet repo
<https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1>`_.
SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters
than SqueezeNet 1.0, without sacrificing accuracy.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = SqueezeNet(version=1.1, **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['squeezenet1_1']))
return model
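As a quick sanity check on the parameter-count claims above, we can tally SqueezeNet 1.1's parameters directly from the layer definitions (a back-of-the-envelope sketch; the helper functions are ours):

```python
def conv_params(c_in, c_out, k):
    # weights + bias of a Conv2d layer
    return c_out * c_in * k * k + c_out

def fire_params(inplanes, s, e1, e3):
    return (conv_params(inplanes, s, 1)   # squeeze 1x1
            + conv_params(s, e1, 1)       # expand 1x1
            + conv_params(s, e3, 3))      # expand 3x3

# The SqueezeNet 1.1 layer shapes from the class definition above.
fires = [(64, 16, 64, 64), (128, 16, 64, 64),
         (128, 32, 128, 128), (256, 32, 128, 128),
         (256, 48, 192, 192), (384, 48, 192, 192),
         (384, 64, 256, 256), (512, 64, 256, 256)]

total = conv_params(3, 64, 3)             # stem conv
total += sum(fire_params(*f) for f in fires)
total += conv_params(512, 1000, 1)        # final classifier conv
print(total)  # 1235496
```

About 1.24M parameters, versus roughly 60M for AlexNet; at 4 bytes each that is about 4.7 MiB, which lines up with the ~5 MB ONNX file exported below.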
We can get the PyTorch model by calling the following function:
# Get pre-trained SqueezeNet model
torch_model = squeezenet1_1(True)
/home/ubuntu/anaconda3/envs/caffe2/lib/python3.6/site-packages/ipykernel_launcher.py:94: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
/home/ubuntu/anaconda3/envs/caffe2/lib/python3.6/site-packages/ipykernel_launcher.py:92: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
Downloading: "https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth" to /home/ubuntu/.torch/models/squeezenet1_1-f364aa15.pth
100%|██████████| 4966400/4966400 [00:01<00:00, 3173216.83it/s]
Note: we are using ONNX 1.3.0 here.
!cat ~/development/resources/onnx/VERSION_NUMBER
1.3.0
from torch.autograd import Variable
batch_size = 1 # just a random number
Input to the model:
x = Variable(torch.randn(batch_size, 3, 224, 224), requires_grad=True)
torch_out = torch.onnx._export(torch_model, # model being run
x, # model input (or a tuple for multiple inputs)
"squeezenet-v1.onnx", # where to save the model (can be a file or file-like object)
export_params=True) # store the trained parameter weights inside the model file
/home/ubuntu/anaconda3/envs/caffe2/lib/python3.6/site-packages/torch/onnx/symbolic.py:131: UserWarning: ONNX export failed on max_pool2d_with_indices because ceil_mode not supported
  warnings.warn("ONNX export failed on " + op + " because " + msg + " not supported")

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-8e5c012efbf1> in <module>
      2                            x, # model input (or a tuple for multiple inputs)
      3                            "squeezenet-v1.onnx", # where to save the model (can be a file or file-like object)
----> 4                            export_params=True) # store the trained parameter weights inside the model file
...
RuntimeError: ONNX export failed: Couldn't export operator aten::max_pool2d_with_indices
...
Graph we tried to export: graph(%0 : Float(1, 3, 224, 224) ...
NOTE: Fix the previous error by reverting the change to class SqueezeNet: revert ceil_mode=True to ceil_mode=False.
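Reverting ceil_mode costs nothing here: with a 224x224 input, (n - kernel) divides evenly by the stride at every max-pool in SqueezeNet 1.1, so ceil and floor modes yield identical feature-map sizes. A quick check of the pooling output-size formula (pool_out is our helper):

```python
import math

def pool_out(n, k=3, s=2, ceil_mode=False):
    # Output length of a 1-D max pool with no padding.
    div = (n - k) / s
    return (math.ceil(div) if ceil_mode else math.floor(div)) + 1

# Spatial sizes through SqueezeNet 1.1's three max-pool layers,
# starting from the 111x111 stem-conv output for a 224x224 input.
n = 111
for _ in range(3):
    floor_n = pool_out(n, ceil_mode=False)
    ceil_n = pool_out(n, ceil_mode=True)
    print(n, "->", floor_n, ceil_n)  # 111 -> 55 55, then 55 -> 27 27, then 27 -> 13 13
    n = floor_n
```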
torch_out = torch.onnx._export(torch_model, # model being run
x, # model input (or a tuple for multiple inputs)
"squeezenet-v1.onnx", # where to save the model (can be a file or file-like object)
export_params=True) # store the trained parameter weights inside the model file
This step will output a squeezenet-v1.onnx file (around 5 MB) on your server/computer.
After that, we can prepare and run the model and verify that the result of the model running on PyTorch matches the result running on ONNX (with Caffe2 backend).
import onnx
import caffe2.python.onnx.backend
from onnx import helper
Load the ONNX GraphProto object. Graph is a standard Python protobuf object.
model = onnx.load("squeezenet-v1.onnx")
Prepare the Caffe2 backend for executing the model. This converts the ONNX graph into a Caffe2 NetDef that can execute it.
prepared_backend = caffe2.python.onnx.backend.prepare(model)
Run the model in Caffe2.
Construct a map from input names to Tensor data.
The graph itself contains inputs for all weight parameters, followed by the input image.
Since the weights are already embedded, we just need to pass the input image.
Last parameter is the input to the graph.
W = {model.graph.input[0].name: x.data.numpy()}
Run the Caffe2 net:
c2_out = prepared_backend.run(W)[0]
Verify the numerical correctness up to 3 decimal places.
np.testing.assert_almost_equal(torch_out.data.cpu().numpy(), c2_out, decimal=3)
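For reference, NumPy's assert_almost_equal with decimal=3 checks that abs(desired - actual) < 1.5 * 10**-3, so small floating-point discrepancies between the two backends are tolerated. A toy illustration:

```python
import numpy as np

# A 1e-4 discrepancy passes at decimal=3 (threshold 1.5e-3)...
np.testing.assert_almost_equal(1.0001, 1.0, decimal=3)

# ...but fails at decimal=5 (threshold 1.5e-5).
try:
    np.testing.assert_almost_equal(1.0001, 1.0, decimal=5)
except AssertionError:
    print("decimal=5 is too strict for a 1e-4 difference")
```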
Leverage the cross-platform capability of Caffe2.
# Export to mobile
from caffe2.python.onnx.backend import Caffe2Backend as c2
Caffe2Backend
is the backend for running ONNX on Caffe2.
Rewrite ONNX graph to Caffe2 NetDef:
init_net, predict_net = c2.onnx_graph_to_caffe2_net(model)
with open("squeeze_init_net_v1.pb", "wb") as f:
f.write(init_net.SerializeToString())
with open("squeeze_predict_net_v1.pb", "wb") as f:
f.write(predict_net.SerializeToString())
You'll see 2 files, squeeze_init_net_v1.pb and squeeze_predict_net_v1.pb, in the same directory as this notebook. Let's make sure they can run with Predictor since that's what we'll use in the mobile app.
Optional, for reference:
Verify it runs with Predictor
Read the protobuf (*.pb
) files:
# with open("squeeze_init_net_v1.pb") as f:
# init_net = f.read()
# with open("squeeze_predict_net_v1.pb") as f:
# predict_net = f.read()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-16-5b4eb965df20> in <module>
      1 with open("squeeze_init_net.pb") as f:
----> 2     init_net = f.read()
      3 with open("squeeze_predict_net.pb") as f:
      4     predict_net = f.read()

~/anaconda3/envs/caffe2/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 24: invalid continuation byte
Fix the previous UnicodeDecodeError. Solution: add the rb flag when opening the files:
with open("squeeze_init_net_v1.pb", "rb") as f:
init_net = f.read()
with open("squeeze_predict_net_v1.pb", "rb") as f:
predict_net = f.read()
Workspace is a key component of Caffe2.
Workspace is a class that holds all the related objects created during runtime:
- all blobs, and
- all instantiated networks.
It is the owner of all these objects and deals with the scaffolding logistics.
I think this concept is somewhat similar to TensorFlow Session.
Use the Predictor
function in your Workspace
to load the blobs from the protobufs:
from caffe2.python import workspace
p = workspace.Predictor(init_net, predict_net) # create Predictor by using init NetDef and predict NetDef
Finally, run the net and get the results!
img = np.random.rand(1, 3, 224, 224).astype(np.float32) # create a random image tensor
result, = p.run([img])
print(result.shape) # our model produces a prediction for each of the 1000 ImageNet classes
(1, 1000)
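To turn that (1, 1000) score array into actual predictions, take the indices of the top-k scores (top_k is our helper; here random scores stand in for result[0], and mapping indices to human-readable labels would need the ImageNet class list):

```python
import numpy as np

def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores, best first."""
    idx = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in idx]

scores = np.random.rand(1000).astype(np.float32)  # stand-in for result[0]
for cls, score in top_k(scores):
    print(cls, score)
```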
Caffe2 is optimized for mobile integrations (both Android and iOS) and for running models on lower-powered devices.
In this notebook, we will go through what you need to know to implement Caffe2 in your mobile project.
After we are sure that it runs with Predictor
, we can copy squeeze_init_net_v1.pb
and squeeze_predict_net_v1.pb
to
AICamera/app/src/main/assets
directory.
Now we can launch Android Studio and import the AICamera project. Next, run the app by pressing the Shift + F10
shortcut keys.
You can check Caffe2 AI Camera tutorial for more details of how Caffe2 can be invoked in the Android mobile app.
We are building our mobile app using Android Studio version 3.2.1 and NDK version r18 and above.
The app links against the GNU STL shared library bundled with the NDK, for example:
/home/cedric/m/dev/android/sdk/ndk-bundle/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/libgnustl_shared.so
In the ClassifyCamera.java code, the app initializes the Caffe2 core C++ libraries and loads the squeeze_init_net_v1.pb and squeeze_predict_net_v1.pb protobuf files, together with the prebuilt Caffe2 native libraries (armeabi-v7a/libCaffe2.a, etc).
Check out a working Caffe2 implementation on mobile:
Android camera app demo (video)
A model consists of two parts: a set of weights that represent the learned parameters (updated during training), and a set of 'operations' that form a computation graph representing how to combine the input data (which varies with each graph pass) with the learned parameters (constant with each graph pass). The parameters (and intermediate states in the computation graph) live in a Caffe2 Workspace (like a TensorFlow Session), where a Blob represents an arbitrarily typed pointer, typically a TensorCPU, which is an n-dimensional array (like PyTorch's Tensor).
The core class is caffe2::Predictor, which exposes the constructor:
Predictor(const NetDef& init_net, const NetDef& predict_net)
where the two NetDef inputs are Google Protocol Buffer objects that represent the 2 computation graphs described above:
- init_net typically runs a set of operations that deserialize the weights into the Workspace
- predict_net specifies how to execute the computation graph for each input
The Predictor is a stateful class.
Currently Caffe2 is optimized for ARM CPUs with NEON (basically any ARM CPU since 2012). There are also advantages to offloading compute onto the GPU/DSP, and exposing these in Caffe2 is an active work in progress.
For a convolutional implementation, it is recommended to use NNPACK since that's substantially faster (around 2x-3x) than the standard im2col/sgemm
implementation used in most frameworks.
For non-convolutional (e.g. ranking) workloads, the key computational primitives are often fully-connected layers (e.g. FullyConnectedOp in Caffe2, InnerProductLayer in Caffe, nn.Linear in Torch). For these use cases, you can fall back to a BLAS library, specifically Accelerate on iOS and Eigen on Android.
The model for memory usage of an instantiated and run Predictor is that it’s the sum of the size of the weights and the total size of the activations. There is no ‘static’ memory allocated, all allocations are tied to the Workspace instance owned by the Predictor, so there should be no memory impact after all Predictor instances are deleted.
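To put rough numbers on this for SqueezeNet 1.1 at batch size 1 (a back-of-the-envelope sketch; 1,235,496 is the parameter count implied by the layer definitions earlier in this notebook):

```python
BYTES = 4  # float32
params = 1_235_496  # SqueezeNet 1.1 parameter count

weights_mb = params * BYTES / 2**20
# Largest single activation for a 224x224 input:
# the 64-channel 111x111 stem-conv output.
stem_act_mb = 1 * 64 * 111 * 111 * BYTES / 2**20

print(f"weights ~{weights_mb:.1f} MiB, largest activation ~{stem_act_mb:.1f} MiB")
```

So a Predictor instance needs on the order of 5 MiB for weights plus a few MiB of activations, all of which is freed once the Predictor (and its Workspace) is deleted.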