In this example, we will convert a pretrained PyTorch model. For more details on all the different ways to convert these models, see the coremltools PyTorch documentation.
For this example, we will load MobileNetV2 with pretrained weights from ImageNet. PyTorch assumes the input image pixel values have been normalized in a specific way, which coremltools can approximate during the conversion process. We also append a Softmax layer to the model since we are doing classification.
input_shape = (3, 224, 224)
import torch
import torchvision
torch_model = torch.nn.Sequential(
    torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True),
    torch.nn.Softmax(dim=1),
)
torch_model.eval();
torch_model
We also download the class labels so we can convert the model's predictions into human-readable category names.
# Download class labels (from a separate file)
import urllib
label_url = 'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt'
class_labels = urllib.request.urlopen(label_url).read().decode("utf-8").splitlines()
class_labels = class_labels[1:] # remove the first class which is background
assert len(class_labels) == 1000
As a quick check, let's load an image of a dog (source, CC BY-SA 4.0) and run it through the PyTorch model to ensure it is working.
import PIL
img = PIL.Image.open('Dog.jpg').resize(input_shape[1:])
img
import numpy as np
from torchvision import transforms
preprocess = transforms.Compose([
    # ToTensor converts the PIL Image to a float tensor with pixel range [0, 1]
    transforms.ToTensor(),
    # Normalize computes (image - mean) / std per channel
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
input_tensor = preprocess(img)
input_batch = input_tensor.unsqueeze(0)
with torch.no_grad():
    out = torch_model(input_batch)[0].numpy()
top_5_indices = np.flip(np.argsort(out))[:5]
for i in top_5_indices:
    print(class_labels[i], out[i])
The convert() function in coremltools takes a TensorFlow or PyTorch model in one of several forms, analyzes it, and generates a CoreML model wrapped in a ct.models.MLModel Python object. For PyTorch, convert() needs a trace from executing the model to build the operation graph, which we can generate using torch.jit.trace().
The convert() API has two optional arguments which we will use:
inputs: A list of type descriptions for the input layers. The types can be inferred from the input model in some cases, but here we want to interpret the (3, 224, 224) input tensor as an image. This lets us specify a scale and offset ("bias") for the pixel values, as required by the model, and also declare the color channel ordering, which the consuming application can use to ensure pixels are supplied in the correct order. Note that the scale and bias are applied in the opposite order from, and are the "inverse" of, the PyTorch Normalize transform: CoreML computes image * scale + bias. The scale factor additionally rescales the integer pixel values to the range [0, 1].

classifier_config: Since this is a classifier model, we include the class labels here so that the user of the CoreML model has that data as well.

The converter translates the PyTorch trace into a Model Intermediate Language (MIL) representation, optimizes it for performance, and finally emits the equivalent CoreML operations.
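As a sanity check, the scale and bias passed to ImageType below can be derived directly from the Normalize parameters. A quick sketch of the arithmetic (like the conversion call itself, this approximates the three per-channel stds with the single value 0.226, since ImageType accepts only one scale):

```python
# PyTorch preprocessing: y = (x / 255 - mean) / std   (x is an integer pixel in [0, 255])
# CoreML ImageType:      y = x * scale + bias
# Equating the two gives: scale = 1 / (255 * std), bias = -mean / std.
# A single scale is shared across channels, so 0.226 approximates the per-channel stds.
std_approx = 0.226
means = (0.485, 0.456, 0.406)

scale = 1.0 / 255.0 / std_approx          # ~0.01735
bias = [-m / std_approx for m in means]   # ~[-2.146, -2.018, -1.796]

# Compare against the exact PyTorch transform for one red-channel pixel value
x = 128
exact = (x / 255.0 - means[0]) / 0.229    # exact per-channel std for red
approx = x * scale + bias[0]
assert abs(exact - approx) < 0.01
```

The small residual in the assertion is exactly the error introduced by the single-scale approximation.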
import coremltools as ct
# Trace with random data
example_input = torch.rand(1, *input_shape)
traced_model = torch.jit.trace(torch_model, example_input)
cml_model = ct.convert(
    traced_model,
    inputs=[ct.ImageType(color_layout='RGB',
                         scale=1.0/255.0/0.226,
                         bias=(-0.485/0.226, -0.456/0.226, -0.406/0.226),
                         shape=example_input.shape)],
    classifier_config=ct.ClassifierConfig(class_labels),
)
We can see the metadata describing this model by looking at the string representation:
cml_model
Finally, we can write the model to disk in the CoreML format.
cml_model.save('MobileNet_v2_torch.mlmodel')
There are two ways to use the trained CoreML model for inference:

1. Load the model in Python using coremltools and call the predict() method. This requires macOS, as the model is executed by the CoreML framework. The OS will use whatever specialized hardware may be available to accelerate execution.
2. Load the model in a compiled macOS or iOS app using the CoreML APIs.
If you are running this notebook on a Mac (not available on Binder), you can test the model directly:
import sys

IS_MACOS = sys.platform == 'darwin'

if IS_MACOS:
    loaded_model = ct.models.MLModel('MobileNet_v2_torch.mlmodel')
    prediction = loaded_model.predict({'input.1': img})
    print('top prediction:', prediction['classLabel'])
else:
    prediction = 'Skipping prediction on non-macOS system'
If you are running this notebook remotely and browsing on an iOS device, you can download the model you just trained and test it out with the CoreMLCompare app (source code) and your device camera. (Note: the author of CoreMLCompare is not affiliated with coremltools or Apple. It is just a handy app for loading and comparing CoreML image classifiers.)
Note that this model is using ImageNet categories which might not correspond to the types of objects you have nearby. A pen on a white piece of paper provides a good test case. It will likely be classified as "Ballpoint" or "Fountain Pen".