In this example, we will convert a pretrained PyTorch model. For more details on all the different ways to convert these models, see the coremltools PyTorch documentation.
For this example, we will load MobileNetV2 with pretrained weights from ImageNet. PyTorch assumes the input image pixel values have been normalized in a specific way, which coremltools can approximate during the conversion process. We also append a Softmax layer to the model since we are doing classification.
input_shape = (3, 224, 224)
import torch
import torchvision
torch_model = torch.nn.Sequential(
    torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True),
    torch.nn.Softmax(dim=1),
)
torch_model.eval();
torch_model
We also download the class labels so we can convert the model's predictions into human-readable category names.
# Download class labels (from a separate file)
import urllib
label_url = 'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt'
class_labels = urllib.request.urlopen(label_url).read().decode("utf-8").splitlines()
class_labels = class_labels[1:] # remove the first class which is background
assert len(class_labels) == 1000
As a quick check, let's load an image of a dog (source, CC BY-SA 4.0) and run it through the PyTorch model to ensure it is working.
import PIL
img = PIL.Image.open('Dog.jpg').resize(input_shape[1:])
img
import numpy as np
from torchvision import transforms
preprocess = transforms.Compose([
    # ToTensor converts the PIL Image to a float tensor with pixel range [0, 1]
    transforms.ToTensor(),
    # Normalize computes (image - mean) / std per channel
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
input_tensor = preprocess(img)
input_batch = input_tensor.unsqueeze(0)
with torch.no_grad():
    out = torch_model(input_batch)[0].numpy()
top_5_indices = np.flip(np.argsort(out))[:5]
for i in top_5_indices:
    print(class_labels[i], out[i])
The convert() function in coremltools takes a TensorFlow or PyTorch model in one of several forms, analyzes it, and generates a CoreML model wrapped in a ct.models.MLModel Python object. For PyTorch, convert() needs a trace from executing the model to build the operation graph, which we can generate using torch.jit.trace().
The convert() API has two optional arguments which we will use:
inputs: A list of type descriptions for the input layers. The types can be inferred from the input model in some cases, but here we want to interpret the (3, 224, 224) input tensor as an image. This lets us specify a scale and offset ("bias") for the pixel values, as required by the model, and also declare the color channel ordering, which the consuming application can use to ensure pixels are supplied in the correct order. Note that the scale and bias are applied in the opposite order from, and are the "inverse" of, the PyTorch Normalize transform: CoreML computes image * scale + bias. The scale factor additionally rescales the integer pixel values to the range [0, 1].

classifier_config: Since this is a classifier model, we include the class labels here so that the user of the CoreML model has that data as well.

The converter translates the PyTorch trace into a Model Intermediate Language (MIL) representation, optimizes it for performance, and finally emits the equivalent CoreML operations.
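As a sanity check, the scale and bias passed to ImageType below can be derived directly from the Normalize parameters. A quick sketch of the arithmetic (like the conversion call itself, this approximates the three per-channel stds with the single value 0.226, since ImageType accepts only one scale):

```python
# PyTorch preprocessing: y = (x / 255 - mean) / std   (x is an integer pixel in [0, 255])
# CoreML ImageType:      y = x * scale + bias
# Equating the two gives: scale = 1 / (255 * std), bias = -mean / std.
# A single scale is shared across channels, so 0.226 approximates the per-channel stds.
std_approx = 0.226
means = (0.485, 0.456, 0.406)

scale = 1.0 / 255.0 / std_approx          # ~0.01735
bias = [-m / std_approx for m in means]   # ~[-2.146, -2.018, -1.796]

# Compare against the exact PyTorch transform for one red-channel pixel value
x = 128
exact = (x / 255.0 - means[0]) / 0.229    # exact per-channel std for red
approx = x * scale + bias[0]
assert abs(exact - approx) < 0.01
```

The small residual in the assertion is exactly the error introduced by the single-scale approximation.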
import coremltools as ct
# Trace with random data
example_input = torch.rand(1, *input_shape)
traced_model = torch.jit.trace(torch_model, example_input)
cml_model = ct.convert(
    traced_model,
    inputs=[ct.ImageType(color_layout='RGB',
                         scale=1.0/255.0/0.226,
                         bias=(-0.485/0.226, -0.456/0.226, -0.406/0.226),
                         shape=example_input.shape)],
    classifier_config=ct.ClassifierConfig(class_labels),
)
We can see the metadata describing this model by looking at the string representation:
cml_model
Finally, we can write the model to disk in the CoreML format.
cml_model.save('MobileNet_v2_torch.mlmodel')
There are two ways to use the trained CoreML model for inference:

1. Load the model in Python using coremltools and call the predict() method. This requires macOS, as the model is executed by the CoreML framework. The OS will use whatever specialized hardware may be available to accelerate execution.
2. Load the model in a compiled macOS or iOS app using the CoreML APIs.
If you are running this notebook on a Mac (not available on Binder), you can test the model directly:
import sys

IS_MACOS = sys.platform == 'darwin'

if IS_MACOS:
    loaded_model = ct.models.MLModel('MobileNet_v2_torch.mlmodel')
    prediction = loaded_model.predict({'input.1': img})
    print('top prediction:', prediction['classLabel'])
else:
    prediction = 'Skipping prediction on non-macOS system'
If you are running this notebook remotely and browsing on an iOS device, you can download the model you just trained and test it out with the CoreMLCompare app (source code) and your device camera. (Note: the author of CoreMLCompare is not affiliated with coremltools or Apple. It is just a handy app for loading and comparing CoreML image classifiers.)
Note that this model is using ImageNet categories which might not correspond to the types of objects you have nearby. A pen on a white piece of paper provides a good test case. It will likely be classified as "Ballpoint" or "Fountain Pen".