BentoML makes moving trained ML models to production easy:
BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and enables teams to deliver prediction services in a fast, repeatable, and scalable way.
Before reading this example project, be sure to check out the Getting started guide to learn about the basic concepts in BentoML.
This example notebook demonstrates how to use a model from the ONNX Model Zoo with BentoML. It defines a BentoService with a ResNet50 model and deploys it to AWS SageMaker as an API endpoint.
Original notebook: https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/resnet50_modelzoo_onnxruntime_inference.ipynb
%reload_ext autoreload
%autoreload 2
%matplotlib inline
!pip install -q bentoml "onnx>=1.7.0" "onnxruntime>=1.4.0"
import numpy as np   # we use numpy to process input and output data
import onnxruntime   # we use ONNX Runtime to run inference on ONNX models
import onnx
from onnx import numpy_helper
import urllib.request
import json
import time
# display images in notebook
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont
onnx_model_url = "https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz"
imagenet_labels_url = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
# retrieve our model from the ONNX Model Zoo
urllib.request.urlretrieve(onnx_model_url, filename="resnet50v2.tar.gz")
urllib.request.urlretrieve(imagenet_labels_url, filename="imagenet-simple-labels.json")
!curl https://raw.githubusercontent.com/onnx/onnx-docker/master/onnx-ecosystem/inference_demos/images/dog.jpg -o dog.jpg
!tar xvzf resnet50v2.tar.gz
test_data_dir = 'resnet50v2/test_data_set'
test_data_num = 3
import glob
import os
# Load inputs
inputs = []
for i in range(test_data_num):
    input_file = os.path.join(test_data_dir + '_{}'.format(i), 'input_0.pb')
    tensor = onnx.TensorProto()
    with open(input_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    inputs.append(numpy_helper.to_array(tensor))

print('Loaded {} inputs successfully.'.format(test_data_num))
# Load reference outputs
ref_outputs = []
for i in range(test_data_num):
    output_file = os.path.join(test_data_dir + '_{}'.format(i), 'output_0.pb')
    tensor = onnx.TensorProto()
    with open(output_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    ref_outputs.append(numpy_helper.to_array(tensor))

print('Loaded {} reference outputs successfully.'.format(test_data_num))
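Before packaging the model, we can sanity-check it by running the bundled test inputs through ONNX Runtime and comparing the results against the reference outputs (a minimal sketch; it assumes the archive extracted to resnet50v2/resnet50v2.onnx as above):
# Run each bundled test input through the model and compare the result
# against the corresponding reference output
session = onnxruntime.InferenceSession('resnet50v2/resnet50v2.onnx')
input_name = session.get_inputs()[0].name
for test_input, ref_output in zip(inputs, ref_outputs):
    output = session.run([], {input_name: test_input})[0]
    np.testing.assert_almost_equal(ref_output, output, decimal=4)
print('Model outputs match the reference outputs.')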
def load_labels(path):
    with open(path) as f:
        data = json.load(f)
    return np.asarray(data)
labels = load_labels('imagenet-simple-labels.json')
labels
%%writefile onnx_resnet50.py
from typing import List
import numpy as np
import bentoml
from bentoml.frameworks.onnx import OnnxModelArtifact
from bentoml.service.artifacts.common import PickleArtifact
from bentoml.adapters import ImageInput
@bentoml.env(infer_pip_packages=True)
@bentoml.artifacts([OnnxModelArtifact('model'), PickleArtifact('labels')])
class OnnxResnet50(bentoml.BentoService):

    def preprocess(self, input_data):
        # convert the input data into the float32 input
        img_data = np.stack(input_data).transpose(0, 3, 1, 2)

        # normalize
        mean_vec = np.array([0.485, 0.456, 0.406])
        stddev_vec = np.array([0.229, 0.224, 0.225])
        norm_img_data = np.zeros(img_data.shape).astype('float32')
        for i in range(img_data.shape[0]):
            for j in range(img_data.shape[1]):
                norm_img_data[i, j, :, :] = (img_data[i, j, :, :] / 255 - mean_vec[j]) / stddev_vec[j]

        # add batch channel
        norm_img_data = norm_img_data.reshape(-1, 3, 224, 224).astype('float32')
        return norm_img_data

    def softmax(self, x):
        x = x.reshape(-1)
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum(axis=0)

    def post_process(self, raw_result):
        return self.softmax(np.array(raw_result)).tolist()

    @bentoml.api(input=ImageInput(), batch=True)
    def predict(self, image_ndarrays: List[np.ndarray]) -> List[str]:
        input_datas = self.preprocess(image_ndarrays)
        input_name = self.artifacts.model.get_inputs()[0].name

        outputs = []
        for i in range(input_datas.shape[0]):
            raw_result = self.artifacts.model.run([], {input_name: input_datas[i:i + 1]})
            result = self.post_process(raw_result)
            # sort class indices by descending probability and return the top 5 labels
            sort_idx = np.flip(np.squeeze(np.argsort(result)))
            outputs.append(self.artifacts.labels[sort_idx[:5]])
        return outputs
from onnx_resnet50 import OnnxResnet50

# Create a BentoService instance and pack it with the model and labels
svc = OnnxResnet50()
svc.pack('labels', labels)
svc.pack('model', 'resnet50v2/resnet50v2.onnx')

# Save the prediction service for distribution
saved_path = svc.save()
saved_path
To start a REST API model server with the BentoService saved above, use the bentoml serve command:
!bentoml serve OnnxResnet50:latest
If you are running this notebook from Google Colab, you can start the dev server with the --run-with-ngrok
option to gain access to the API endpoint via a public URL managed by ngrok:
!bentoml serve OnnxResnet50:latest --run-with-ngrok
Send a POST request from the terminal:
curl -X POST "http://127.0.0.1:5000/predict" -F image=@dog.jpg
curl -X POST "http://127.0.0.1:5000/predict" -H "Content-Type: image/png" --data-binary @dog.jpg
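The same request can also be sent from Python (a minimal sketch using the requests library; it assumes the dev server is running locally on port 5000):
import requests

# POST the test image to the predict endpoint as a multipart form upload
with open('dog.jpg', 'rb') as f:
    response = requests.post(
        'http://127.0.0.1:5000/predict',
        files={'image': ('dog.jpg', f, 'image/jpeg')},
    )
print(response.text)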
Alternatively, visit http://127.0.0.1:5000 from your browser, then click /predict -> Try it out -> Choose File -> Execute to submit an image from your computer.
One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.
Note that docker is not available in Google Colab. You will need to download and run this notebook locally to try out this containerization feature.
If you already have docker configured, run the following commands to produce a docker image serving the OnnxResnet50 prediction service created above (the image tag printed by bentoml containerize will differ from the one shown here):
!bentoml containerize OnnxResnet50:latest
!docker run --rm -p 5000:5000 onnxresnet50:20200922124402_4DE94A
The BentoML CLI supports loading and running a packaged model directly. With the ImageInput adapter, the run command can read image input from a local file:
!bentoml run OnnxResnet50:latest predict --input-file dog.jpg
bentoml.load is the API for loading a BentoML packaged model in Python:
image = Image.open('dog.jpg')
image
from bentoml import load

# Load the saved BentoService and run a local prediction on a batch of 4 images
loaded_svc = load(saved_path)
image_data = np.array(image)
np.array(loaded_svc.predict([image_data] * 4))
If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions.
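For example, deploying this service to SageMaker could look like the following (a sketch based on the BentoML 0.x deployment CLI; the deployment name is hypothetical and the exact flags may vary across versions):
!bentoml sagemaker deploy onnx-resnet50-demo -b OnnxResnet50:latest --api-name predict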
If the cloud platform you are working with is not on the list above, try out these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms.
Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy.