BentoML makes moving trained ML models to production easy:

* Package models trained with any ML framework and reproduce them for model serving in production
* Deploy anywhere for online API serving or offline batch serving
* High-performance API model server with adaptive micro-batching support
* Central hub for managing models and the deployment process via Web UI and APIs
* Modular and flexible design, making it adaptable to your infrastructure
BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, enabling teams to deliver prediction services in a fast, repeatable, and scalable way. Before reading this example project, be sure to check out the Getting Started guide to learn about the basic concepts in BentoML.
This notebook demonstrates how to export your PyTorch model to ONNX and serve it with BentoML, building a Docker image with GPU support. Please refer to the GPU Serving guide for more information.
This is an extension of BentoML's PyTorch with GPU Serving example. Please refer to that tutorial before moving forward.
%reload_ext autoreload
%autoreload 2
!pip install -q bentoml torch==1.8.1+cu111 torchtext==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html onnxruntime-gpu onnx
!cp -r ../../pytorch/news-classification-gpu/model/ .
import torch
from torch import nn
from torchtext.datasets import AG_NEWS
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import Vocab
from bentoml import BentoService, api, env, artifacts
from bentoml.adapters import JsonInput, JsonOutput
from bentoml.frameworks.onnx import OnnxModelArtifact
from bentoml.service.artifacts.pickle import PickleArtifact
import onnx
from onnxruntime.capi.onnxruntime_pybind11_state import InvalidArgument
We need to define our PyTorch model and some helper functions; refer to BentoML's PyTorch with GPU Serving for details.
# https://www.onnxruntime.ai/python/auto_examples/plot_common_errors.html
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

EMSIZE = 64


class TextClassificationModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_class):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        # A single offset of 0 treats the whole input as one bag.
        self.offsets = torch.tensor([0]).to(device)
        self.init_weights()

    def init_weights(self):
        init_range = 0.5
        self.embedding.weight.data.uniform_(-init_range, init_range)
        self.fc.weight.data.uniform_(-init_range, init_range)
        self.fc.bias.data.zero_()

    def forward(self, text):
        embedded = self.embedding(text, offsets=self.offsets)
        return self.fc(embedded)


def get_tokenizer_vocab(dataset=AG_NEWS, tokenizer_fn='basic_english', root_data_dir='dataset'):
    print('Getting tokenizer and vocab...')
    tokenizer = get_tokenizer(tokenizer_fn)
    train_ = dataset(root=root_data_dir, split='train')
    counter = Counter()
    for (label, line) in train_:
        counter.update(tokenizer(line))
    vocab = Vocab(counter, min_freq=1)
    return tokenizer, vocab


def get_model_params(vocab):
    print('Setup model params...')
    train_iter = AG_NEWS(root='dataset', split='train')
    num_class = len(set([label for (label, text) in train_iter]))
    vocab_size = len(vocab)
    return vocab_size, EMSIZE, num_class
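As a quick sketch of what these helpers produce (the exact integer ids depend on the downloaded AG_NEWS data, so treat the printed values as illustrative):

tokenizer, vocab = get_tokenizer_vocab()
tokens = tokenizer("Wall St. slides on economic worries")
print(tokens)                      # ['wall', 'st', '.', 'slides', 'on', 'economic', 'worries']
print([vocab[t] for t in tokens])  # e.g. [452, 1843, 3, ...] — the integer ids fed to the model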
Please refer to our GPU Serving guide to set up your environment correctly.
We will be using the Docker image provided by BentoML, bentoml/model-server:0.12.1-py38-gpu, to prepare our CUDA-enabled images.
Since onnxruntime.InferenceSession only accepts NumPy arrays (refer to the ONNX API docs for more information), we need to convert our torch.Tensor to a NumPy array with the to_numpy helper below. The .detach() call ensures that a tensor with requires_grad=True is converted correctly.
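To see why the .detach() call matters, note that PyTorch refuses to convert a gradient-tracking tensor directly — a minimal illustration:

import torch

t = torch.ones(3, requires_grad=True)
# t.numpy() raises: RuntimeError: Can't call numpy() on Tensor that requires grad.
arr = t.detach().cpu().numpy()  # detach from the autograd graph first, then convert
print(arr)  # [1. 1. 1.]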
%%writefile bento_svc.py
import torch
from bentoml import BentoService, api, env, artifacts
from bentoml.adapters import JsonInput, JsonOutput
from bentoml.frameworks.onnx import OnnxModelArtifact
from bentoml.service.artifacts.pickle import PickleArtifact
from onnxruntime.capi.onnxruntime_pybind11_state import InvalidArgument
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
def get_pipeline(tokenizer, vocab):
    print('Setup pipeline...')
    text_pipeline = lambda x: [vocab[token] for token in tokenizer(x)]
    label_pipeline = lambda x: int(x) - 1
    return text_pipeline, label_pipeline


def to_numpy(tensor):
    # Detach from the autograd graph (if needed) before converting to a NumPy array.
    return tensor.detach().cpu().clone().numpy() if tensor.requires_grad else tensor.cpu().clone().numpy()


@env(infer_pip_packages=False, pip_packages=['onnxruntime-gpu'], requirements_txt_file="./requirements.txt", docker_base_image="bentoml/model-server:0.12.1-py38-gpu")
@artifacts(
    [OnnxModelArtifact('model', backend='onnxruntime-gpu'), PickleArtifact('tokenizer'), PickleArtifact('vocab')])
class OnnxService(BentoService):
    def __init__(self):
        super().__init__()
        self.news_label = {1: 'World',
                           2: 'Sports',
                           3: 'Business',
                           4: 'Sci/Tec'}

    def classify_categories(self, sentence):
        text_pipeline, _ = get_pipeline(self.artifacts.tokenizer, self.artifacts.vocab)
        text = to_numpy(torch.tensor(text_pipeline(sentence)).to(device))
        tensor_name = self.artifacts.model.get_inputs()[0].name
        output_name = self.artifacts.model.get_outputs()[0].name
        onnx_inputs = {tensor_name: text}
        print(f'providers: {self.artifacts.model.get_providers()}')
        try:
            r = self.artifacts.model.run([output_name], onnx_inputs)[0]
            return r.argmax(1).item() + 1
        except (RuntimeError, InvalidArgument) as e:
            print(f"ERROR with shape: {onnx_inputs[tensor_name].shape} - {e}")

    @api(input=JsonInput(), output=JsonOutput())
    def predict(self, parsed_json):
        sentence = parsed_json.get('text')
        return {'categories': self.news_label[self.classify_categories(sentence)]}
Overwriting bento_svc.py
from bento_svc import OnnxService
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
onnx_model_path = "model/pytorch_model.onnx"
tokenizer, vocab = get_tokenizer_vocab()
vocab_size, embedding_size, num_class = get_model_params(vocab)
model = TextClassificationModel(vocab_size, embedding_size, num_class).to(device)
model.load_state_dict(torch.load("model/pytorch_model.pt"))
model.eval()
# Convert our dummy input to a torch.cuda.LongTensor.
print("\nExporting torch model to onnx...")
inp = torch.rand(vocab_size).long().to(device)
# Set dynamic_axes since the length of a news piece can vary. The axis names in
# dynamic_axes must match the names chosen for the dummy input's dimensions;
# since we sized our dummy input with vocab_size, the dynamic axes below are named accordingly.
torch.onnx.export(model, inp, onnx_model_path, export_params=True, opset_version=11, do_constant_folding=True,
input_names=["input"], output_names=["output"],
dynamic_axes={"input": {0: "vocab_size"}, "output": {0: "vocab_size"}})
[2021-06-04 13:31:18,292] WARNING - Ignoring pip_packages as requirements_txt_file is set. [2021-06-04 13:31:18,405] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle. [2021-06-04 13:31:18,556] INFO - Using user specified docker base image: `bentoml/model-server:0.12.1-py38-gpu`, usermust make sure that the base image either has Python 3.8 or conda installed. Getting tokenizer and vocab... Setup model params... Exporting torch model to onnx...
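Before packing the service, it is worth verifying that the exported graph produces the same predictions as the original PyTorch model. A minimal sketch, assuming onnxruntime (installed above as onnxruntime-gpu) can create a session in this environment:

import numpy as np
import onnxruntime as ort

# Run the same dummy input through both models and compare the outputs.
sess = ort.InferenceSession(onnx_model_path)
ort_out = sess.run(None, {"input": inp.cpu().numpy()})[0]
torch_out = model(inp).detach().cpu().numpy()
np.testing.assert_allclose(torch_out, ort_out, rtol=1e-3, atol=1e-5)
print("PyTorch and ONNX outputs match")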
print("\n Loading model to check...")
onnx_model = onnx.load(onnx_model_path)
onnx.checker.check_model(onnx_model)
# check_model returns nothing if the ONNX model is valid; it raises an exception otherwise.
bento_svc = OnnxService()
bento_svc.pack("model", onnx_model_path)
bento_svc.pack("tokenizer", tokenizer)
bento_svc.pack("vocab", vocab)
saved_path = bento_svc.save()
Loading model to check... [2021-06-04 13:31:28,129] WARNING - pip package requirement `onnxruntime` not found in current python environment [2021-06-04 13:31:30,729] INFO - Detected non-PyPI-released BentoML installed, copying local BentoML modulefiles to target saved bundle path..
/home/aarnphm/.pyenv/versions/3.8.8/lib/python3.8/site-packages/setuptools/distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first. warnings.warn( warning: no previously-included files matching '*~' found anywhere in distribution warning: no previously-included files matching '*.pyo' found anywhere in distribution warning: no previously-included files matching '.git' found anywhere in distribution warning: no previously-included files matching '.ipynb_checkpoints' found anywhere in distribution warning: no previously-included files matching '__pycache__' found anywhere in distribution no previously-included directories found matching 'e2e_tests' no previously-included directories found matching 'tests' no previously-included directories found matching 'benchmark'
UPDATING BentoML-0.12.1+53.g9d8b599/bentoml/_version.py set BentoML-0.12.1+53.g9d8b599/bentoml/_version.py to '0.12.1+53.g9d8b599' [2021-06-04 13:31:35,111] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.1, but loading from BentoML version 0.12.1+53.g9d8b599 [2021-06-04 13:31:35,361] INFO - BentoService bundle 'OnnxService:20210604133128_5B736C' saved to: /home/aarnphm/bentoml/repository/OnnxService/20210604133128_5B736C
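The save() call returns the path of the self-contained bundle in the local BentoML repository, matching the location shown in the log above:

print(saved_path)
# e.g. /home/<user>/bentoml/repository/OnnxService/20210604133128_5B736C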
To start a REST API model server with the BentoService saved above, use the serve command:
!bentoml serve OnnxService:latest
[2021-06-04 13:31:36,615] INFO - Getting latest version OnnxService:20210604133128_5B736C [2021-06-04 13:31:36,639] INFO - Starting BentoML API proxy in development mode.. [2021-06-04 13:31:36,642] INFO - Starting BentoML API server in development mode.. [2021-06-04 13:31:36,674] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle. [2021-06-04 13:31:36,674] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle. [2021-06-04 13:31:36,749] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.1, but loading from BentoML version 0.12.1+53.g9d8b599 [2021-06-04 13:31:36,750] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.1, but loading from BentoML version 0.12.1+53.g9d8b599 [2021-06-04 13:31:36,752] INFO - Your system nofile limit is 4096, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection. ======== Running on http://0.0.0.0:5000 ======== (Press CTRL+C to quit) [2021-06-04 13:31:37,433] WARNING - Ignoring pip_packages as requirements_txt_file is set. [2021-06-04 13:31:37,481] INFO - Using user specified docker base image: `bentoml/model-server:0.12.1-py38-gpu`, usermust make sure that the base image either has Python 3.8 or conda installed. [2021-06-04 13:31:37,585] WARNING - pip package requirement `onnxruntime` not found in current python environment * Serving Flask app 'OnnxService' (lazy loading) * Environment: production WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. * Debug mode: off INFO:werkzeug: * Running on http://127.0.0.1:56609/ (Press CTRL+C to quit) INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET / HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET /static_content/main.css HTTP/1.1" 304 - INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET /static_content/swagger-ui.css HTTP/1.1" 304 - INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET /static_content/readme.css HTTP/1.1" 304 - INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET /static_content/swagger-ui-bundle.js HTTP/1.1" 304 - INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET /static_content/marked.min.js HTTP/1.1" 304 - INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:31:55] "GET /docs.json HTTP/1.1" 200 - Setup pipeline... [2021-06-04 13:32:05,997] INFO - Initializing onnxruntime InferenceSession from onnx file:'/home/aarnphm/bentoml/repository/OnnxService/20210604133128_5B736C/OnnxService/artifacts/model.onnx' 2021-06-04 13:32:06.254352914 [W:onnxruntime:, graph_utils.cc:121 CanUpdateImplicitInputNameInSubgraphs] Implicit input name 17 cannot be safely updated to 12 in one of the subgraphs. 2021-06-04 13:32:06.254434141 [W:onnxruntime:, graph_utils.cc:121 CanUpdateImplicitInputNameInSubgraphs] Implicit input name 17 cannot be safely updated to 12 in one of the subgraphs. 
2021-06-04 13:32:06.254626241 [W:onnxruntime:, graph_utils.cc:121 CanUpdateImplicitInputNameInSubgraphs] Implicit input name 17 cannot be safely updated to 12 in one of the subgraphs. providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'] [2021-06-04 13:32:07,835] INFO - {'service_name': 'OnnxService', 'service_version': '20210604133128_5B736C', 'api': 'predict', 'task': {'data': '{"text":"WASHINGTON — President Biden offered a series of concessions to try to secure a $1 trillion infrastructure deal with Senate Republicans in an Oval Office meeting this week, narrowing both his spending and tax proposals as negotiations barreled into the final days of what could be an improbable agreement or a blame game that escalates quickly."}', 'task_id': '8596ba67-807c-4e87-8198-8a882438144e', 'http_headers': (('Host', 'localhost:5000'), ('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0'), ('Accept', '*/*'), ('Accept-Language', 'en-US,en;q=0.5'), ('Accept-Encoding', 'gzip, deflate'), ('Referer', 'http://localhost:5000/'), ('Content-Type', 'application/json'), ('Origin', 'http://localhost:5000'), ('Content-Length', '357'), ('Dnt', '1'), ('Connection', 'keep-alive'), ('Cookie', 'username-localhost-8888="2|1:0|10:1622788250|23:username-localhost-8888|44:Yzg2OTNlMWRmYjBjNDY2NjkyNzNkZmUxYjQ2YjZkYmM=|df53891cd11d2c1ee8e5ec8603e79f4c37b67d178711e1778ac907d8afb5acf0"; _xsrf=2|33d87053|2d1249d56e8ad5d63c884ea9a243cc2a|1622740800; username-localhost-8889="2|1:0|10:1622781169|23:username-localhost-8889|44:MDYwN2JlMDFmYWMyNGRlNmIwOWRjOTNjZDI5MmMwOWQ=|9dbbfeba548b71497371eb9688b6aa246ca19720c5e2b043cfbaa436a77ce27a"'))}, 'result': {'data': '{"categories": "Business"}', 'http_status': 200, 'http_headers': (('Content-Type', 'application/json'),)}, 'request_id': '8596ba67-807c-4e87-8198-8a882438144e'} INFO:werkzeug:127.0.0.1 - - [04/Jun/2021 13:32:07] "POST /predict HTTP/1.1" 200 - ^C
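While the dev server is running, you can test the endpoint from another process; for example, with the requests library (an assumption here — it is not in the pip install cell above) and the server listening on the default port 5000:

import requests

response = requests.post(
    "http://127.0.0.1:5000/predict",
    json={"text": "Tesla stock jumps after record quarterly deliveries"},
)
print(response.json())  # e.g. {'categories': 'Business'}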
!nvidia-smi
Fri Jun 4 13:32:15 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A | | N/A 70C P2 29W / N/A | 819MiB / 6078MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 1191 G /usr/lib/Xorg 4MiB | | 0 N/A N/A 518032 C ...sions/3.8.8/bin/python3.8 811MiB | +-----------------------------------------------------------------------------+
If you are running this notebook from Google Colab, start the dev server with the --run-with-ngrok option to gain access to the API endpoint via a public endpoint managed by ngrok:
!bentoml serve OnnxService:latest --run-with-ngrok
One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.
Note that docker is not available in Google Colab. You will need to download and run this notebook locally to try out the containerization feature.
If you already have docker configured, simply run the following command to produce a docker container image serving the OnnxService GPU prediction service created above:
!bentoml containerize OnnxService:latest -t onnx-service-gpu:latest
[2021-06-04 13:32:22,759] INFO - Getting latest version OnnxService:20210604133128_5B736C Found Bento: /home/aarnphm/bentoml/repository/OnnxService/20210604133128_5B736C [2021-06-04 13:32:22,787] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle. [2021-06-04 13:32:22,854] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.1, but loading from BentoML version 0.12.1+53.g9d8b599 Containerizing OnnxService:20210604133128_5B736C with local YataiService and docker daemon from local environment|^C
!docker run --gpus all --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-modeset --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools -p 5000:5000 onnx-service-gpu
[2021-06-04 06:32:34,031] INFO - Starting BentoML proxy in production mode.. [2021-06-04 06:32:34,032] INFO - Starting BentoML API server in production mode.. [2021-06-04 06:32:34,046] INFO - Running micro batch service on :5000 [2021-06-04 06:32:34 +0000] [20] [INFO] Starting gunicorn 20.1.0 [2021-06-04 06:32:34 +0000] [20] [INFO] Listening at: http://0.0.0.0:54499 (20) [2021-06-04 06:32:34 +0000] [20] [INFO] Using worker: sync [2021-06-04 06:32:34 +0000] [21] [INFO] Booting worker with pid: 21 [2021-06-04 06:32:34,062] WARNING - Using BentoML not from official PyPI release. In order to find the same version of BentoML when deploying your BentoService, you must set the 'core/bentoml_deploy_version' config to a http/git location of your BentoML fork, e.g.: 'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}' [2021-06-04 06:32:34,081] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.1, but loading from BentoML version 0.12.1+53.g9d8b599 [2021-06-04 06:32:34 +0000] [1] [INFO] Starting gunicorn 20.1.0 [2021-06-04 06:32:34 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1) [2021-06-04 06:32:34 +0000] [1] [INFO] Using worker: aiohttp.worker.GunicornWebWorker [2021-06-04 06:32:34 +0000] [22] [INFO] Booting worker with pid: 22 [2021-06-04 06:32:34,179] WARNING - Using BentoML not from official PyPI release. In order to find the same version of BentoML when deploying your BentoService, you must set the 'core/bentoml_deploy_version' config to a http/git location of your BentoML fork, e.g.: 'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}' [2021-06-04 06:32:34,195] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.1, but loading from BentoML version 0.12.1+53.g9d8b599 [2021-06-04 06:32:34,197] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection. [2021-06-04 06:32:36,330] WARNING - Ignoring pip_packages as requirements_txt_file is set. [2021-06-04 06:32:36,377] INFO - Using user specified docker base image: `bentoml/model-server:0.12.1-py38-gpu`, usermust make sure that the base image either has Python 3.8 or conda installed. [2021-06-04 06:32:36,578] WARNING - pip package requirement `onnxruntime` not found in current python environment Setup pipeline... [2021-06-04 06:32:50,294] INFO - Initializing onnxruntime InferenceSession from onnx file:'./OnnxService/artifacts/model.onnx' 2021-06-04 06:32:50.602900911 [W:onnxruntime:, graph_utils.cc:121 CanUpdateImplicitInputNameInSubgraphs] Implicit input name 17 cannot be safely updated to 12 in one of the subgraphs. 2021-06-04 06:32:50.603000976 [W:onnxruntime:, graph_utils.cc:121 CanUpdateImplicitInputNameInSubgraphs] Implicit input name 17 cannot be safely updated to 12 in one of the subgraphs. 2021-06-04 06:32:50.603301562 [W:onnxruntime:, graph_utils.cc:121 CanUpdateImplicitInputNameInSubgraphs] Implicit input name 17 cannot be safely updated to 12 in one of the subgraphs. 
providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'] [2021-06-04 06:32:52,649] INFO - {'service_name': 'OnnxService', 'service_version': '20210604132022_031FC3', 'api': 'predict', 'task': {'data': '{"text":"WASHINGTON — President Biden offered a series of concessions to try to secure a $1 trillion infrastructure deal with Senate Republicans in an Oval Office meeting this week, narrowing both his spending and tax proposals as negotiations barreled into the final days of what could be an improbable agreement or a blame game that escalates quickly."}', 'task_id': '6ade33c6-0b8f-4988-90dc-6c9eb9e6b5c7', 'http_headers': (('Host', 'localhost:5000'), ('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0'), ('Accept', '*/*'), ('Accept-Language', 'en-US,en;q=0.5'), ('Accept-Encoding', 'gzip, deflate'), ('Referer', 'http://localhost:5000/'), ('Content-Type', 'application/json'), ('Origin', 'http://localhost:5000'), ('Content-Length', '357'), ('Dnt', '1'), ('Connection', 'keep-alive'), ('Cookie', 'username-localhost-8888="2|1:0|10:1622788250|23:username-localhost-8888|44:Yzg2OTNlMWRmYjBjNDY2NjkyNzNkZmUxYjQ2YjZkYmM=|df53891cd11d2c1ee8e5ec8603e79f4c37b67d178711e1778ac907d8afb5acf0"; _xsrf=2|33d87053|2d1249d56e8ad5d63c884ea9a243cc2a|1622740800; username-localhost-8889="2|1:0|10:1622781169|23:username-localhost-8889|44:MDYwN2JlMDFmYWMyNGRlNmIwOWRjOTNjZDI5MmMwOWQ=|9dbbfeba548b71497371eb9688b6aa246ca19720c5e2b043cfbaa436a77ce27a"'))}, 'result': {'data': '{"categories": "Business"}', 'http_status': 200, 'http_headers': (('Content-Type', 'application/json'),)}, 'request_id': '6ade33c6-0b8f-4988-90dc-6c9eb9e6b5c7'} ^C [2021-06-04 06:32:56 +0000] [1] [INFO] Handling signal: int [2021-06-04 06:32:56 +0000] [22] [INFO] Worker exiting (pid: 22) [2021-06-04 06:32:56 +0000] [20] [INFO] Handling signal: term [2021-06-04 06:32:56 +0000] [21] [INFO] Worker exiting (pid: 21)
If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:
If the cloud platform you are working with is not on the list above, try out these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:
Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy: