BentoML makes moving trained ML models to production easy:
BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and enables teams to deliver prediction services in a fast, repeatable, and scalable way. Before reading this example project, be sure to check out the Getting Started guide to learn about the basic concepts in BentoML.
This notebook demonstrates how to use BentoML to turn a PyTorch model into a docker image containing a REST API server serving this model, how to use your ML service built with BentoML as a CLI tool, and how to distribute it as a PyPI package.
This example is based on https://github.com/baldassarreFe/zalando-pytorch/blob/master/notebooks/4.0-fb-autoencoder.ipynb; if you are already familiar with it, skip ahead to the Model Serving using BentoML section.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
!pip install -q bentoml "torch==1.6.0" "torchvision==0.7.0" "scikit-learn>=0.23.2" "pillow==7.2.0" "pandas>=1.1.1" "numpy>=1.16.0"
import bentoml
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms
from torch.autograd import Variable
from sklearn.manifold import TSNE
from sklearn.metrics import accuracy_score
print("Torch version: ", torch.__version__)
print("CUDA: ", torch.cuda.is_available())
torchvision now includes FashionMNIST, so we can import the dataset directly.
from torchvision.datasets import FashionMNIST
FASHION_MNIST_CLASSES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Load the train and test sets in batches of 1000. The 28x28 images are padded to 29x29 (via CenterCrop) so that combining convolutions and transposed convolutions does not chop off pixels from the reconstructed images.
batch_size = 1000
train_dataset = FashionMNIST(
'../data', train=True, download=True,
transform=transforms.Compose([transforms.CenterCrop((29, 29)), transforms.ToTensor()]))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = FashionMNIST(
'../data', train=False, download=True,
transform=transforms.Compose([transforms.CenterCrop((29, 29)), transforms.ToTensor()]))
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=True)
Note that in this section we never use the image labels; the whole training is unsupervised. The two components of the autoencoder are defined by subclassing nn.Module, which gives more flexibility than nn.Sequential.
A series of convolutions with kernel_size=5 and stride=2 is used to squeeze the images into a volume of 40x1x1, then a fully connected layer turns this volume into a vector of size embedding_size, which can be specified externally. The decoder picks up where the encoder left off, first transforming the embedding of size embedding_size back into a volume of size 40x1x1, then applying a series of transposed convolutions to yield an image of the same size as the original input.
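To see why the spatial size collapses from 29x29 down to 1x1, here is the standard convolution output-size formula applied to the three layers (a small illustrative check, not part of the original notebook):
# Output spatial size of a conv layer with no padding: floor((n - kernel_size) / stride) + 1
def conv_out(n, kernel_size=5, stride=2):
    return (n - kernel_size) // stride + 1
print(conv_out(29), conv_out(13), conv_out(5))  # 13 5 1, matching the 29 -> 13 -> 5 -> 1 sizes in the Encoder below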
Before training, we can also display a few sample images from this DataLoader.
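A minimal sketch that plots the first eight images of one training batch (it relies only on the train_loader and FASHION_MNIST_CLASSES defined above):
# Preview a few images and their labels from one training batch
images, labels = next(iter(train_loader))
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for ax, img, label in zip(axes, images, labels):
    ax.imshow(img.squeeze().numpy(), cmap='gray')
    ax.set_title(FASHION_MNIST_CLASSES[label])
    ax.axis('off')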
class Encoder(nn.Module):
def __init__(self, embedding_size):
super(Encoder, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5, stride=2)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5, stride=2)
self.conv3 = nn.Conv2d(20, 40, kernel_size=5, stride=2)
self.fully = nn.Linear(40, embedding_size)
def forward(self, x):
# 1x29x29
x = torch.relu(self.conv1(x))
# 10x13x13
x = torch.relu(self.conv2(x))
# 20x5x5
x = torch.relu(self.conv3(x))
# 40x1x1
x = x.view(x.data.shape[0], 40)
# 40
x = self.fully(x)
# output_size
return x
class Decoder(nn.Module):
def __init__(self, input_size):
super(Decoder, self).__init__()
self.fully = nn.Linear(input_size, 40)
self.conv1 = nn.ConvTranspose2d(40, 20, kernel_size=5, stride=2)
self.conv2 = nn.ConvTranspose2d(20, 10, kernel_size=5, stride=2)
self.conv3 = nn.ConvTranspose2d(10, 1, kernel_size=5, stride=2)
def forward(self, x):
x = self.fully(x)
x = x.view(x.data.shape[0], 40, 1, 1)
x = torch.relu(self.conv1(x))
x = torch.relu(self.conv2(x))
x = torch.sigmoid(self.conv3(x))
return x
We are going to use an embedding size of 20. There is no particular reason for this number, except that it is in the same range as the number of classes. Naively, the network could learn to encode coarse-grained information (i.e. the kind of garment) in half of the embedding vector and then use the other half for fine-grained information.
embedding_size = 20
encoder = Encoder(embedding_size)
decoder = Decoder(embedding_size)
autoencoder = nn.Sequential(encoder, decoder)
A 29x29 black-and-white image passed through the autoencoder should give an output with the same dimensions:
x = Variable(torch.ones(1, 1, 29, 29))
e = encoder(x)
d = decoder(e)
print('Input\t ', list(x.data.shape))
print('Embedding', list(e.data.shape))
print('Output\t ', list(d.data.shape))
autoencoder.train()
loss_fn = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters())
epoch_loss = []
for epoch in range(5):
batch_loss = []
for batch_num, (data, _) in enumerate(train_loader):
data = Variable(data)
optimizer.zero_grad()
output = autoencoder(data)
loss = loss_fn(output, data)
loss.backward()
optimizer.step()
batch_loss.append(loss.item())
epoch_loss.append(sum(batch_loss) / len(batch_loss))
print('Epoch {}:\tloss {:.4f}'.format(epoch, epoch_loss[-1]))
plt.plot(epoch_loss)
plt.title('Final value {:.4f}'.format(epoch_loss[-1]))
plt.xlabel('Epoch')
plt.grid(True)
Reconstruction evaluation on a single batch
autoencoder.eval()
data, targets = next(iter(test_loader))
encodings = encoder(Variable(data))
outputs = decoder(encodings)
print('Test loss: {:.4f}'.format(loss_fn(outputs, Variable(data)).item()))
fig, axes = plt.subplots(8, 8, figsize=(16, 16))
axes = axes.ravel()
zip_these = axes[::2], axes[1::2], data.numpy().squeeze(), outputs.data.numpy().squeeze(), targets
for ax1, ax2, original, reconstructed, target in zip(*zip_these):
ax1.imshow(original, cmap='gray')
ax1.axis('off')
ax1.set_title(FASHION_MNIST_CLASSES[target])
ax2.imshow(reconstructed, cmap='gray')
ax2.axis('off')
The embeddings are 20-dimensional; t-SNE is used to visualize them as clusters in 2D space.
Even though the autoencoder learned the embeddings in a completely unsupervised way, we can observe the emergence of clusters:
tsne = TSNE(n_components=2)
encodings_2 = tsne.fit_transform(encodings.data.numpy())
plt.figure(figsize=(10, 10))
for k in range(len(FASHION_MNIST_CLASSES)):
class_indexes = (targets.numpy() == k)
plt.scatter(encodings_2[class_indexes, 0], encodings_2[class_indexes, 1], label=FASHION_MNIST_CLASSES[k])
plt.legend();
Once trained in an unsupervised fashion, the encoder module can be used to generate fashion embeddings (see what I did there?), which can then be used to train a simple classifier on the original labels.
The weights of the encoder are frozen, so only the classifier will be trained.
(Later on, when the classifier starts performing decently, we could unfreeze them and do some fine-tuning.)
for param in encoder.parameters():
param.requires_grad = False
classifier = nn.Sequential(
encoder,
nn.Linear(embedding_size, 15),
nn.ReLU(),
nn.Linear(15, len(FASHION_MNIST_CLASSES)),
    nn.LogSoftmax(dim=1)
)
classifier.train()
loss_fn = nn.NLLLoss()
optimizer = optim.Adam([p for p in classifier.parameters() if p.requires_grad])
epoch_loss = []
for epoch in range(5):
batch_loss = []
for batch_num, (data, targets) in enumerate(train_loader):
data, targets = Variable(data), Variable(targets)
optimizer.zero_grad()
output = classifier(data)
loss = loss_fn(output, targets)
loss.backward()
optimizer.step()
batch_loss.append(loss.item())
epoch_loss.append(sum(batch_loss) / len(batch_loss))
accuracy = accuracy_score(targets.data.numpy(), output.data.numpy().argmax(axis=1))
print('Epoch {}:\tloss {:.4f}\taccuracy {:.2%}'.format(epoch, epoch_loss[-1], accuracy))
plt.plot(epoch_loss)
plt.title('Final value {:.4f}'.format(epoch_loss[-1]))
plt.xlabel('Epoch')
plt.grid(True)
Classification evaluation on a single batch
classifier.eval()
data, targets = next(iter(test_loader))
outputs = classifier(Variable(data))
log_probs, output_classes = outputs.max(dim=1)
accuracy = accuracy_score(targets.numpy(), output_classes.data.numpy())
print('Accuracy: {:.2%}'.format(accuracy))
fig, axes = plt.subplots(8, 8, figsize=(16, 16))
zip_these = axes.ravel(), log_probs.data.exp(), output_classes.data, targets, data.numpy().squeeze()
for ax, prob, output_class, target, img in zip(*zip_these):
ax.imshow(img, cmap='gray' if output_class == target else 'autumn')
ax.axis('off')
ax.set_title('{} {:.1%}'.format(FASHION_MNIST_CLASSES[output_class], prob))
%%writefile pytorch_fashion_mnist.py
from typing import BinaryIO, List
import bentoml
from PIL import Image
import torch
from torchvision import transforms
from bentoml.frameworks.pytorch import PytorchModelArtifact
from bentoml.adapters import FileInput, JsonOutput
FASHION_MNIST_CLASSES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
@bentoml.env(pip_packages=['torch', 'numpy', 'torchvision', 'scikit-learn'])
@bentoml.artifacts([PytorchModelArtifact('classifier')])
class PyTorchFashionClassifier(bentoml.BentoService):
@bentoml.utils.cached_property # reuse transformer
def transform(self):
return transforms.Compose([transforms.CenterCrop((29, 29)), transforms.ToTensor()])
@bentoml.api(input=FileInput(), output=JsonOutput(), batch=True)
def predict(self, file_streams: List[BinaryIO]) -> List[str]:
img_tensors = []
for fs in file_streams:
img = Image.open(fs).convert(mode="L").resize((28, 28))
img_tensors.append(self.transform(img))
outputs = self.artifacts.classifier(torch.stack(img_tensors))
_, output_classes = outputs.max(dim=1)
return [FASHION_MNIST_CLASSES[output_class] for output_class in output_classes]
# 1) import the custom BentoService defined above
from pytorch_fashion_mnist import PyTorchFashionClassifier
# 2) `pack` it with required artifacts
bento_svc = PyTorchFashionClassifier()
bento_svc.pack('classifier', classifier)
# 3) save your BentoService
saved_path = bento_svc.save()
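As a quick sanity check, the saved bundle can be loaded back into Python and called directly. This is a minimal sketch: it assumes a sample_image.png file exists in the working directory and that the API is invoked with a list of open file handles, matching the FileInput batch signature above.
# Load the saved BentoService bundle and call its predict API locally
loaded_svc = bentoml.load(saved_path)
with open('sample_image.png', 'rb') as f:
    print(loaded_svc.predict([f]))  # prints a list of predicted class names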
To start a REST API model server with the BentoService saved above, use the bentoml serve command:
!bentoml serve PyTorchFashionClassifier:latest
If you are running this notebook from Google Colab, you can start the dev server with the --run-with-ngrok option to gain access to the API endpoint via a public endpoint managed by ngrok:
!bentoml serve PyTorchFashionClassifier:latest --run-with-ngrok
Sending a POST request from the terminal:
curl -X POST "http://127.0.0.1:5000/predict" -F image=@sample_image.png
curl -X POST "http://127.0.0.1:5000/predict" -H "Content-Type: image/png" --data-binary @sample_image.png
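Alternatively, here is a minimal sketch of the same request from Python using the requests library (sample_image.png is an assumed local file, matching the curl examples above):
import requests
# Upload an image to the running model server's /predict endpoint as multipart form data
with open('sample_image.png', 'rb') as f:
    response = requests.post('http://127.0.0.1:5000/predict',
                             files={'image': ('sample_image.png', f, 'image/png')})
print(response.text)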
Go visit http://127.0.0.1:5000/ from your browser and click /predict -> Try it out -> Choose File -> Execute to submit an image from your computer.
One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.
Note that docker is not available in Google Colab. You will need to download and run this notebook locally to try out this Docker containerization feature.
If you already have docker configured, simply run the following command to produce a docker container serving the PyTorchFashionClassifier prediction service created above:
!bentoml containerize PyTorchFashionClassifier:latest -t pytorch-fashion-mnist:latest
!docker run -p 5000:5000 pytorch-fashion-mnist
The BentoML CLI supports loading and running a packaged model directly. With the FileInput adapter used here, the run command can read the input image from a local file:
!bentoml run PyTorchFashionClassifier:latest predict --input-file sample_image.png
If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:
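For example, an AWS Lambda deployment from the BentoML 0.x CLI looks roughly like the following. This is a sketch based on that CLI as I recall it, not a command from this notebook; the deployment name my-fashion-mnist-deployment is arbitrary, and valid AWS credentials are assumed to be configured.
# Sketch: deploy the saved BentoService to AWS Lambda (BentoML 0.x CLI, assumed available)
!bentoml lambda deploy my-fashion-mnist-deployment -b PyTorchFashionClassifier:latest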
If the cloud platform you are working with is not on the list above, try out these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:
Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy: