Tutorial : Building an AWS SageMaker End-to-end Workflow with BentoML

BentoML makes moving trained ML models to production easy:

  • Package models trained with any ML framework and reproduce them for model serving in production
  • Deploy anywhere for online API serving or offline batch serving
  • High-Performance API model server with adaptive micro-batching support
  • Central hub for managing models and deployment process via Web UI and APIs
  • Modular and flexible design making it adaptable to your infrastrcuture

BentoML is a framework for serving, managing, and deploying machine learning models. It is aiming to bridge the gap between Data Science and DevOps, and enable teams to deliver prediction services in a fast, repeatable, and scalable way.



This tutorial provides an end-to-end guide using BentoML with AWS SageMaker -- a machine learning model training platform. It demonstrates the workflow of integrating BentoML with SageMaker, including: setting up a SageMaker notebook instance, model training, creating an S3 bucket, uploading the BentoService bundle into S3, and deploying the BentoML packaged model to SageMaker as an API endpoint using the BentoML CLI tool.

For demonstration, this tutorial uses the IMDB movie review sentiment dataset with BERT and Tensorflow 2.0.(please note: the following model is a modification of the original version: https://github.com/kpe/bert-for-tf2/blob/master/examples/gpu_movie_reviews.ipynb)


  • demonstrates an end-to-end workflow of using BentoML with AWS SageMaker
  • deploys and tests API endpoints to the cloud
  • provides two ways of API server local testing using BentoML CLI tool


1. Create a SageMaker notebook instance

For model training in SageMaker, we will start by creating a notebook instance. After logging into the AWS management console -- type SageMaker to launch the service. From the SageMaker dashboard, select Notebook instances. Go ahead enter a notebook name and select the instance type


Next,under Permissions and encryption, select Create a new role or choosing an existing role. This allows both the notebook instance and user to access and upload data to Amazon S3. Then, select Any S3 bucket, which allows your SageMaker to access all S3 buckets in your account.


After the notebook instance is created, the status will change from pending to InService. Select Open Jupyter under Actions, and choose Conda_python 3 under New tab to launch the Jupyter notebook within SageMaker.

Note: SageMaker also provides a local model through pip install SageMaker.


Finally to prepare for the model training, let's import some libraries -- Boto3 and SageMaker and set up the IAM role. Boto3 is the AWS SDK for Python, which makes it easier to integrate our model with AWS services such as Amazon S3

In [ ]:
import boto3, sagemaker
from sagemaker import get_execution_role

# Define IAM role
role = get_execution_role()
prefix = 'sagemaker/bert-moviereview-bento'
my_region = boto3.session.Session().region_name # set the region of the instance

In this step, we will create an S3 bucket named movie-review-dataset to store the dataset. Users could click on the bucket name and upload the dataset directly into S3. Alternatively, for cost-efficiency, users could train the model locally using the SageMaker local mode

In [7]:
bucket_name = 'movie-review-dataset'
s3 = boto3.resource('s3')


2. Model Training -- Movie review sentiment with BERT and TensorFlow2

The second step of this tutorial is model training. We will be using the IMDB movie review dataset to create a sentiment analysis model which contains 25K positive and negative movie reviews each. First, let's install the bert-for-tf2 package.

In [ ]:
!pip install -q bentoml bert-for-tf2>=0.14.5
Collecting bert-for-tf2
  Downloading https://files.pythonhosted.org/packages/35/5c/6439134ecd17b33fe0396fb0b7d6ce3c5a120c42a4516ba0e9a2d6e43b25/bert-for-tf2-0.14.4.tar.gz (40kB)
     |████████████████████████████████| 40kB 2.3MB/s 
Collecting py-params>=0.9.6
  Downloading https://files.pythonhosted.org/packages/a4/bf/c1c70d5315a8677310ea10a41cfc41c5970d9b37c31f9c90d4ab98021fd1/py-params-0.9.7.tar.gz
Collecting params-flow>=0.8.0
  Downloading https://files.pythonhosted.org/packages/a9/95/ff49f5ebd501f142a6f0aaf42bcfd1c192dc54909d1d9eb84ab031d46056/params-flow-0.8.2.tar.gz
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from params-flow>=0.8.0->bert-for-tf2) (1.18.5)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from params-flow>=0.8.0->bert-for-tf2) (4.41.1)
Building wheels for collected packages: bert-for-tf2, py-params, params-flow
  Building wheel for bert-for-tf2 (setup.py) ... done
  Created wheel for bert-for-tf2: filename=bert_for_tf2-0.14.4-cp36-none-any.whl size=30114 sha256=5f3374b29261a7a31e5ddeb2d661d5c326dd68e720d3897502670561b6fd2f74
  Stored in directory: /root/.cache/pip/wheels/cf/3f/4d/79d7735015a5f523648df90d871ce8e89a7df8185f7703eeab
  Building wheel for py-params (setup.py) ... done
  Created wheel for py-params: filename=py_params-0.9.7-cp36-none-any.whl size=7302 sha256=3454cb699e6561be3b0787bd7b0d6e9af9a4ed711b6d11138b3ea7b11f96bf20
  Stored in directory: /root/.cache/pip/wheels/67/f5/19/b461849a50aefdf4bab47c4756596e82ee2118b8278e5a1980
  Building wheel for params-flow (setup.py) ... done
  Created wheel for params-flow: filename=params_flow-0.8.2-cp36-none-any.whl size=19473 sha256=a43985d8a6541a645abc4514701aab9390d0a9d6c5840291c172d6e82a83f6a5
  Stored in directory: /root/.cache/pip/wheels/08/c8/7f/81c86b9ff2b86e2c477e3914175be03e679e596067dc630c06
Successfully built bert-for-tf2 py-params params-flow
Installing collected packages: py-params, params-flow, bert-for-tf2
Successfully installed bert-for-tf2-0.14.4 params-flow-0.8.2 py-params-0.9.7
In [5]:
import bert
from bert import BertModelLayer
from bert.loader import StockBertConfig, map_stock_config_to_params, load_stock_weights
from bert.tokenization.bert_tokenization import FullTokenizer
In [3]:
import os
import re
import sys
import math
import datetime
import pandas as pd 
import numpy as np
import tensorflow as tf
from tensorflow import keras
In [4]:
print ('Tensorflow: ', tf.__version__)
print ('Python: ', sys.version)
Tensorflow:  2.2.0
Python:  3.7.6 (default, Jan  8 2020, 13:42:34) 
[Clang 4.0.1 (tags/RELEASE_401/final)]

2.1 Download Movie Review Data and BERT Weights

Here, we will download, extracts and import the IMDB large movie review dataset.

In [ ]:
from tensorflow import keras 
import os 
import re

# load all files from the directory into a dataframe
def load_directory_data(directory):
    data ={}
    data['sentence'] = []
    data['sentiment'] = []
    for file_path in os.listdir(directory): 
        with tf.io.gfile.GFile(os.path.join(directory, file_path), 'r') as f: 
    return pd.DataFrame.from_dict(data)

# combine positive and negative reviews into a dataframe; add a polarity column 
def load_dataset(directory):
    pos_df = load_directory_data(os.path.join(directory, 'pos'))
    neg_df = load_directory_data(os.path.join(directory,'neg'))
    pos_df['polarity'] = 1
    neg_df['polarity'] = 0
    return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# download dataset from link
def download_and_load_datasets(force_download=False):
    dataset = tf.keras.utils.get_file(
        fname = 'acImbd.tar.gz',
        origin = 'http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz',
        extract = True)
    train_df = load_dataset(os.path.join(os.path.dirname(dataset),"aclImdb",'train'))
    test_df = load_dataset(os.path.join(os.path.dirname(dataset),"aclImdb",'test'))

    return train_df, test_df

Let's use the MovieReviewData class below, to prepare/encode the data for feeding into our BERT model, by:

  • tokenizing the text
  • trim or pad it to a max_seq_len length
  • append the special tokens [CLS] and [SEP]
  • convert the string tokens to numerical IDs using the original model's token encoding from vocab.txt
In [ ]:
class MovieReviewData:
    DATA_COLUMN = 'sentence'
    LABEL_COLUMN = 'polarity'
    def __init__(self, tokenizer: FullTokenizer, sample_size =None, max_seq_len =1024):
        self.tokenizer = tokenizer
        self.sample_size = sample_size
        self.max_seq_len = 0
        train, test = download_and_load_datasets()

        train,test =map(lambda df: df.reindex(df[MovieReviewData.DATA_COLUMN].str.len().sort_values().index),[train,test])

        if sample_size is not None:
            train, test = train.head(sample_size), test.head(sample_size)
        ((self.train_x, self.train_y),
        (self.test_x, self.test_y)) = map(self._prepare, [train,test])

        print('max_seq_len', self.max_seq_len)
        self.max_seq_len = min(self.max_seq_len,max_seq_len)
        ((self.train_x, self.train_x_token_types),
        (self.test_x, self.test_x_token_types)) = map(self._pad,[self.train_x , self.test_x])
    def _prepare(self,df):
        x,y =[],[]
        with tqdm(total =df.shape[0], unit_scale=True) as pbar:
            for ndx, row in df.iterrows():
                text, label = row[MovieReviewData.DATA_COLUMN], row[MovieReviewData.LABEL_COLUMN]
                tokens = self.tokenizer.tokenize(text)
                tokens = ['[CLS]'] + tokens + ['[SEP]']
                token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
                self.max_seq_len = max(self.max_seq_len , len(token_ids))

    return np.array(x),np.array(y)

    def _pad(self,ids):
        x,t = [],[]
        token_type_ids = [0] * self.max_seq_len
        for input_ids in ids:
            input_ids = input_ids[:min(len(input_ids),self.max_seq_len - 2)]
            input_ids = input_ids + [0] * (self.max_seq_len - len(input_ids))
        return np.array(x), np.array(t)

This tutorial uses the pre-trained BERT model -- BERT-Base,Uncased, which could be downloaded here. https://github.com/google-research/bert. Users could save the BERT weights in the S3 bucket created early or use the local SageMaker enviroment and save the weights locally.

In [ ]:
assert_path = 'asset'
bert_model_name = 'uncased_L-12_H-768_A-12'
bert_ckpt_dir = os.path.join(assert_path , bert_model_name)
bert_ckpt_file = os.path.join(bert_ckpt_dir, 'bert_model.ckpt')
bert_config_file = os.path.join(bert_ckpt_dir, 'bert_config.json')

After getting both the IMDB movie review data and the BERT weights ready, we will use the S3 bucket to store these data by directly uploading them.