Notebook last updated on 2018-12-31 using a PyTorch 1.0 nightly build, with Android fixes. Code changes: export the model as squeeze_init_net_v1.pb and squeeze_predict_net_v1.pb and run them again for verification; use nn.AdaptiveAvgPool2d((1, 1)) in the classifier.
In this notebook we will show you how to export SqueezeNet, implemented and trained in PyTorch (fastai library support: TODO), to run on mobile devices.
Let's get started. First, you should have PyTorch and ONNX installed in your environment, and have git cloned the AICamera repo.
NOTE: the Caffe2 pre-built binaries are installed together with PyTorch, as the Caffe2 source code now lives in the PyTorch repository.
conda install pytorch-nightly cuda92 -c pytorch
Import some Python packages
import io
import numpy as np
import torch.onnx
Note: we are using PyTorch 1.0 preview (nightly) released on 2018-12-31.
print(torch.__version__)
1.0.0.dev20181231
NOTE: while the work to bridge ResNet-family models built with the fastai v1 library into pure PyTorch continues, the steps below use SqueezeNet, a mobile-first CNN available from torchvision, as the example. This model was developed in plain PyTorch (not in fastai v1).
SqueezeNet is a small CNN which achieves AlexNet level accuracy on ImageNet with 50x fewer parameters. Paper.
Use cases
SqueezeNet models perform image classification—they take images as input and classify the major object in the image into a set of pre-defined classes. They are trained on the ImageNet dataset, which contains images from 1000 classes. SqueezeNet models are highly efficient in terms of size and speed while providing good accuracy. This makes them ideal for platforms with strict constraints on size.
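For context, torchvision's pretrained models expect 224x224 RGB inputs scaled to [0, 1] and normalized with the standard ImageNet per-channel statistics. A minimal NumPy sketch (the preprocess helper and the all-zero image are ours, for illustration):

```python
import numpy as np

# Standard ImageNet normalization constants used by torchvision pretrained models.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc_uint8):
    """HWC uint8 image (224x224x3) -> NCHW float32 batch of size 1."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - MEAN) / STD                           # per-channel normalization
    return x.transpose(2, 0, 1)[None]              # HWC -> CHW, add batch dim

img = np.zeros((224, 224, 3), dtype=np.uint8)      # stand-in for a real photo
batch = preprocess(img)
print(batch.shape)  # (1, 3, 224, 224)
```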
SqueezeNet version 1.1
SqueezeNet 1.1 presented in the official SqueezeNet repo is an improved version of SqueezeNet 1.0 from the paper.
SqueezeNet version 1.1 requires 2.4x less computation than version 1.0, without sacrificing accuracy. [Jun 2016]
The following SqueezeNet implementation in PyTorch is by Marat Dukhan and is part of torchvision:
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.utils.model_zoo as model_zoo
__all__ = ['SqueezeNet', 'squeezenet1_0', 'squeezenet1_1']
model_urls = {
'squeezenet1_0': 'https://download.pytorch.org/models/squeezenet1_0-a815701f.pth',
'squeezenet1_1': 'https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth',
}
class Fire(nn.Module):
def __init__(self, inplanes, squeeze_planes,
expand1x1_planes, expand3x3_planes):
super(Fire, self).__init__()
self.inplanes = inplanes
self.squeeze = nn.Conv2d(inplanes, squeeze_planes, kernel_size=1)
self.squeeze_activation = nn.ReLU(inplace=True)
self.expand1x1 = nn.Conv2d(squeeze_planes, expand1x1_planes,
kernel_size=1)
self.expand1x1_activation = nn.ReLU(inplace=True)
self.expand3x3 = nn.Conv2d(squeeze_planes, expand3x3_planes,
kernel_size=3, padding=1)
self.expand3x3_activation = nn.ReLU(inplace=True)
def forward(self, x):
x = self.squeeze_activation(self.squeeze(x))
return torch.cat([
self.expand1x1_activation(self.expand1x1(x)),
self.expand3x3_activation(self.expand3x3(x))
], 1)
class SqueezeNet(nn.Module):
def __init__(self, version=1.0, num_classes=1000):
super(SqueezeNet, self).__init__()
if version not in [1.0, 1.1]:
raise ValueError("Unsupported SqueezeNet version {version}:"
"1.0 or 1.1 expected".format(version=version))
self.num_classes = num_classes
if version == 1.0:
self.features = nn.Sequential(
nn.Conv2d(3, 96, kernel_size=7, stride=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(96, 16, 64, 64),
Fire(128, 16, 64, 64),
Fire(128, 32, 128, 128),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(256, 32, 128, 128),
Fire(256, 48, 192, 192),
Fire(384, 48, 192, 192),
Fire(384, 64, 256, 256),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(512, 64, 256, 256),
)
else:
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=3, stride=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(64, 16, 64, 64),
Fire(128, 16, 64, 64),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(128, 32, 128, 128),
Fire(256, 32, 128, 128),
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=False),
Fire(256, 48, 192, 192),
Fire(384, 48, 192, 192),
Fire(384, 64, 256, 256),
Fire(512, 64, 256, 256),
)
# Final convolution is initialized differently from the rest
final_conv = nn.Conv2d(512, self.num_classes, kernel_size=1)
self.classifier = nn.Sequential(
nn.Dropout(p=0.5),
final_conv,
nn.ReLU(inplace=True),
nn.AdaptiveAvgPool2d((1, 1))
)
for m in self.modules():
if isinstance(m, nn.Conv2d):
if m is final_conv:
init.normal_(m.weight, mean=0.0, std=0.01)
else:
init.kaiming_uniform_(m.weight)
if m.bias is not None:
init.constant_(m.bias, 0)
def forward(self, x):
x = self.features(x)
x = self.classifier(x)
return x.view(x.size(0), self.num_classes)
def squeezenet1_0(pretrained=False, **kwargs):
r"""SqueezeNet model architecture from the `"SqueezeNet: AlexNet-level
accuracy with 50x fewer parameters and <0.5MB model size"
<https://arxiv.org/abs/1602.07360>`_ paper.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = SqueezeNet(version=1.0, **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['squeezenet1_0']))
return model
def squeezenet1_1(pretrained=False, **kwargs):
r"""SqueezeNet 1.1 model from the `official SqueezeNet repo
<https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1>`_.
SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters
than SqueezeNet 1.0, without sacrificing accuracy.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = SqueezeNet(version=1.1, **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['squeezenet1_1']))
return model
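As a quick sanity check on the parameter-count claims above, we can tally SqueezeNet 1.1's parameters directly from the layer definitions (a back-of-the-envelope sketch; the helper functions are ours):

```python
def conv_params(c_in, c_out, k):
    # weights + bias of a Conv2d layer
    return c_out * c_in * k * k + c_out

def fire_params(inplanes, s, e1, e3):
    return (conv_params(inplanes, s, 1)   # squeeze 1x1
            + conv_params(s, e1, 1)       # expand 1x1
            + conv_params(s, e3, 3))      # expand 3x3

# The SqueezeNet 1.1 layer shapes from the class definition above.
fires = [(64, 16, 64, 64), (128, 16, 64, 64),
         (128, 32, 128, 128), (256, 32, 128, 128),
         (256, 48, 192, 192), (384, 48, 192, 192),
         (384, 64, 256, 256), (512, 64, 256, 256)]

total = conv_params(3, 64, 3)             # stem conv
total += sum(fire_params(*f) for f in fires)
total += conv_params(512, 1000, 1)        # final classifier conv
print(total)  # 1235496
```

About 1.24M parameters, versus roughly 60M for AlexNet; at 4 bytes each that is about 4.7 MiB, which lines up with the ~5 MB ONNX file exported below.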
We can get the PyTorch model by calling the following function:
# Get pre-trained SqueezeNet model
torch_model = squeezenet1_1(True)
/home/ubuntu/anaconda3/envs/caffe2/lib/python3.6/site-packages/ipykernel_launcher.py:94: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
/home/ubuntu/anaconda3/envs/caffe2/lib/python3.6/site-packages/ipykernel_launcher.py:92: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
Downloading: "https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth" to /home/ubuntu/.torch/models/squeezenet1_1-f364aa15.pth
100%|██████████| 4966400/4966400 [00:01<00:00, 3173216.83it/s]
Note: we are using ONNX 1.3.0 here.
!cat ~/development/resources/onnx/VERSION_NUMBER
1.3.0
from torch.autograd import Variable
batch_size = 1 # just a random number
Input to the model:
x = Variable(torch.randn(batch_size, 3, 224, 224), requires_grad=True)
torch_out = torch.onnx._export(torch_model, # model being run
x, # model input (or a tuple for multiple inputs)
"squeezenet-v1.onnx", # where to save the model (can be a file or file-like object)
export_params=True) # store the trained parameter weights inside the model file
/home/ubuntu/anaconda3/envs/caffe2/lib/python3.6/site-packages/torch/onnx/symbolic.py:131: UserWarning: ONNX export failed on max_pool2d_with_indices because ceil_mode not supported
  warnings.warn("ONNX export failed on " + op + " because " + msg + " not supported")

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-8e5c012efbf1> in <module>
      2                            x, # model input (or a tuple for multiple inputs)
      3                            "squeezenet-v1.onnx", # where to save the model (can be a file or file-like object)
----> 4                            export_params=True) # store the trained parameter weights inside the model file
...
RuntimeError: ONNX export failed: Couldn't export operator aten::max_pool2d_with_indices
...
Graph we tried to export: graph(%0 : Float(1, 3, 224, 224) ...
NOTE: Fix the previous error by reverting the change to class SqueezeNet: revert ceil_mode=True to ceil_mode=False.
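Reverting ceil_mode costs nothing here: with a 224x224 input, (n - kernel) divides evenly by the stride at every max-pool in SqueezeNet 1.1, so ceil and floor modes yield identical feature-map sizes. A quick check of the pooling output-size formula (pool_out is our helper):

```python
import math

def pool_out(n, k=3, s=2, ceil_mode=False):
    # Output length of a 1-D max pool with no padding.
    div = (n - k) / s
    return (math.ceil(div) if ceil_mode else math.floor(div)) + 1

# Spatial sizes through SqueezeNet 1.1's three max-pool layers,
# starting from the 111x111 stem-conv output for a 224x224 input.
n = 111
for _ in range(3):
    floor_n = pool_out(n, ceil_mode=False)
    ceil_n = pool_out(n, ceil_mode=True)
    print(n, "->", floor_n, ceil_n)  # 111 -> 55 55, then 55 -> 27 27, then 27 -> 13 13
    n = floor_n
```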
torch_out = torch.onnx._export(torch_model, # model being run
x, # model input (or a tuple for multiple inputs)
"squeezenet-v1.onnx", # where to save the model (can be a file or file-like object)
export_params=True) # store the trained parameter weights inside the model file
This step will output a squeezenet-v1.onnx file (around 5 MB) on your server/computer.
After that, we can prepare and run the model and verify that the result of the model running on PyTorch matches the result running on ONNX (with Caffe2 backend).
import onnx
import caffe2.python.onnx.backend
from onnx import helper
Load the ONNX GraphProto object. Graph is a standard Python protobuf object.
model = onnx.load("squeezenet-v1.onnx")
Prepare the Caffe2 backend for executing the model. This converts the ONNX graph into a Caffe2 NetDef that can execute it.
prepared_backend = caffe2.python.onnx.backend.prepare(model)
Run the model in Caffe2.
Construct a map from input names to Tensor data.
The graph itself contains inputs for all weight parameters, followed by the input image.
Since the weights are already embedded, we just need to pass the input image.
Last parameter is the input to the graph.
W = {model.graph.input[0].name: x.data.numpy()}
Run the Caffe2 net:
c2_out = prepared_backend.run(W)[0]
Verify the numerical correctness up to 3 decimal places.
np.testing.assert_almost_equal(torch_out.data.cpu().numpy(), c2_out, decimal=3)
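For reference, NumPy's assert_almost_equal with decimal=3 checks that abs(desired - actual) < 1.5 * 10**-3, so small floating-point discrepancies between the two backends are tolerated. A toy illustration:

```python
import numpy as np

# A 1e-4 discrepancy passes at decimal=3 (threshold 1.5e-3)...
np.testing.assert_almost_equal(1.0001, 1.0, decimal=3)

# ...but fails at decimal=5 (threshold 1.5e-5).
try:
    np.testing.assert_almost_equal(1.0001, 1.0, decimal=5)
except AssertionError:
    print("decimal=5 is too strict for a 1e-4 difference")
```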
Leverage the cross-platform capability of Caffe2.
# Export to mobile
from caffe2.python.onnx.backend import Caffe2Backend as c2
Caffe2Backend
is the backend for running ONNX on Caffe2.
Rewrite ONNX graph to Caffe2 NetDef:
init_net, predict_net = c2.onnx_graph_to_caffe2_net(model)
with open("squeeze_init_net_v1.pb", "wb") as f:
f.write(init_net.SerializeToString())
with open("squeeze_predict_net_v1.pb", "wb") as f:
f.write(predict_net.SerializeToString())
You'll see 2 files, squeeze_init_net_v1.pb and squeeze_predict_net_v1.pb, in the same directory as this notebook. Let's make sure they can run with Predictor since that's what we'll use in the mobile app.
Optional, for reference:
Verify it runs with Predictor
Read the protobuf (*.pb
) files:
# with open("squeeze_init_net_v1.pb") as f:
# init_net = f.read()
# with open("squeeze_predict_net_v1.pb") as f:
# predict_net = f.read()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-16-5b4eb965df20> in <module>
      1 with open("squeeze_init_net.pb") as f:
----> 2     init_net = f.read()
      3 with open("squeeze_predict_net.pb") as f:
      4     predict_net = f.read()

~/anaconda3/envs/caffe2/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 24: invalid continuation byte
Fix the previous UnicodeDecodeError. Solution: add the rb flag when opening the files:
with open("squeeze_init_net_v1.pb", "rb") as f:
init_net = f.read()
with open("squeeze_predict_net_v1.pb", "rb") as f:
predict_net = f.read()
Workspace is a key component of Caffe2.
Workspace is a class that holds all the related objects created during runtime:
- all blobs, and
- all instantiated networks.
It is the owner of all these objects and deals with the scaffolding logistics.
I think this concept is somewhat similar to TensorFlow Session.
Use the Predictor
function in your Workspace
to load the blobs from the protobufs:
from caffe2.python import workspace
p = workspace.Predictor(init_net, predict_net) # create Predictor by using init NetDef and predict NetDef
Finally, run the net and get the results!
img = np.random.rand(1, 3, 224, 224).astype(np.float32) # create a random image tensor
result, = p.run([img])
print(result.shape) # our model produces a prediction for each of the 1000 ImageNet classes
(1, 1000)
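To turn that (1, 1000) score array into actual predictions, take the indices of the top-k scores (top_k is our helper; here random scores stand in for result[0], and mapping indices to human-readable labels would need the ImageNet class list):

```python
import numpy as np

def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores, best first."""
    idx = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in idx]

scores = np.random.rand(1000).astype(np.float32)  # stand-in for result[0]
for cls, score in top_k(scores):
    print(cls, score)
```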
Caffe2 is optimized for mobile integrations (both Android and iOS) and for running models on lower-powered devices.
In this notebook, we will go through what you need to know to implement Caffe2 in your mobile project.
After we are sure that it runs with Predictor
, we can copy squeeze_init_net_v1.pb
and squeeze_predict_net_v1.pb
to
AICamera/app/src/main/assets
directory.
Now we can launch Android Studio and import the AICamera project. Next, run the app by pressing the Shift + F10
shortcut keys.
You can check Caffe2 AI Camera tutorial for more details of how Caffe2 can be invoked in the Android mobile app.
We are building our mobile app using Android Studio version 3.2.1 and NDK version r18 and above.
The app links against the GNU STL shared library bundled with the NDK, for example:
/home/cedric/m/dev/android/sdk/ndk-bundle/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/libgnustl_shared.so
In the ClassifyCamera.java code, the app initializes the Caffe2 core C++ libraries and loads the squeeze_init_net_v1.pb and squeeze_predict_net_v1.pb protobuf files, together with the prebuilt Caffe2 native libraries (armeabi-v7a/libCaffe2.a, etc).
Check out a working Caffe2 implementation on mobile:
Android camera app demo (video)
A model consists of two parts: a set of weights that represent the learned parameters (updated during training), and a set of 'operations' that form a computation graph representing how to combine the input data (which varies with each graph pass) with the learned parameters (constant with each graph pass). The parameters (and intermediate states in the computation graph) live in a Caffe2 Workspace (like a TensorFlow Session), where a Blob represents an arbitrarily typed pointer, typically a TensorCPU, which is an n-dimensional array (like PyTorch's Tensor).
The core class is caffe2::Predictor, which exposes the constructor:
Predictor(const NetDef& init_net, const NetDef& predict_net)
where the two NetDef inputs are Google Protocol Buffer objects that represent the 2 computation graphs described above:
- init_net typically runs a set of operations that deserialize the weights into the Workspace
- predict_net specifies how to execute the computation graph for each input
The Predictor is a stateful class.
Currently Caffe2 is optimized for ARM CPUs with NEON (basically any ARM CPU since 2012). There are also advantages to offloading compute onto the GPU/DSP, and exposing these in Caffe2 is an active work in progress.
For a convolutional implementation, it is recommended to use NNPACK since that's substantially faster (around 2x-3x) than the standard im2col/sgemm
implementation used in most frameworks.
For non-convolutional (e.g. ranking) workloads, the key computational primitives are often fully-connected layers (e.g. FullyConnectedOp in Caffe2, InnerProductLayer in Caffe, nn.Linear in Torch). For these use cases, you can fall back to a BLAS library, specifically Accelerate on iOS and Eigen on Android.
The model for memory usage of an instantiated and run Predictor is that it’s the sum of the size of the weights and the total size of the activations. There is no ‘static’ memory allocated, all allocations are tied to the Workspace instance owned by the Predictor, so there should be no memory impact after all Predictor instances are deleted.
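To put rough numbers on this for SqueezeNet 1.1 at batch size 1 (a back-of-the-envelope sketch; 1,235,496 is the parameter count implied by the layer definitions earlier in this notebook):

```python
BYTES = 4  # float32
params = 1_235_496  # SqueezeNet 1.1 parameter count

weights_mb = params * BYTES / 2**20
# Largest single activation for a 224x224 input:
# the 64-channel 111x111 stem-conv output.
stem_act_mb = 1 * 64 * 111 * 111 * BYTES / 2**20

print(f"weights ~{weights_mb:.1f} MiB, largest activation ~{stem_act_mb:.1f} MiB")
```

So a Predictor instance needs on the order of 5 MiB for weights plus a few MiB of activations, all of which is freed once the Predictor (and its Workspace) is deleted.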