BentoML Example

Titanic Survival Prediction with LightGBM

This is a BentoML demo project demonstrating how to package and serve a LightGBM model for production using BentoML.

BentoML is an open-source platform for machine learning model serving and deployment.

Let's get started!

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")
In [2]:
!pip install bentoml
!pip install lightgbm numpy pandas
In [3]:
import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
import bentoml

Prepare Dataset

Download the dataset from https://www.kaggle.com/c/titanic/data

In [6]:
!mkdir data
!curl https://raw.githubusercontent.com/agconti/kaggle-titanic/master/data/train.csv -o ./data/train.csv
!curl https://raw.githubusercontent.com/agconti/kaggle-titanic/master/data/test.csv -o ./data/test.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 60302  100 60302    0     0   168k      0 --:--:-- --:--:-- --:--:--  168k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 28210  100 28210    0     0  83461      0 --:--:-- --:--:-- --:--:-- 83215
In [7]:
train_df = pd.read_csv('./data/train.csv')
test_df = pd.read_csv('./data/test.csv')
train_df.head()
Out[7]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
In [8]:
y = train_df.pop('Survived')
cols = ['Pclass', 'Age', 'Fare', 'SibSp', 'Parch']
X_train, X_test, y_train, y_test = train_test_split(train_df[cols], 
                                                    y, 
                                                    test_size=0.2, 
                                                    random_state=42)
In [9]:
# Create an LGBM dataset for training
train_data = lgb.Dataset(data=X_train[cols],
                         label=y_train)

# Create an LGBM dataset from the test split
test_data = lgb.Dataset(data=X_test[cols],
                        label=y_test)

Model Training

In [10]:
lgb_params = {
    'boosting': 'dart',          # DART (dropout trees) often performs better
    'application': 'binary',     # Binary classification
    'learning_rate': 0.05,       # Learning rate, controls the size of a gradient descent step
    'min_data_in_leaf': 20,      # Dataset is quite small, so reduce this a bit
    'feature_fraction': 0.7,     # Proportion of features in each boost, controls overfitting
    'num_leaves': 41,            # Controls tree size, since LGBM uses leaf-wise splits
    'metric': 'binary_logloss',  # Binary log loss as the evaluation metric
    'drop_rate': 0.15
}

evaluation_results = {}
model = lgb.train(train_set=train_data,
                  params=lgb_params,
                  valid_sets=[train_data, test_data],
                  valid_names=['Train', 'Test'],
                  evals_result=evaluation_results,
                  num_boost_round=500,
                  early_stopping_rounds=100,
                  verbose_eval=20)
[20]	Train's binary_logloss: 0.55215	Test's binary_logloss: 0.587358
[40]	Train's binary_logloss: 0.510164	Test's binary_logloss: 0.559348
[60]	Train's binary_logloss: 0.500602	Test's binary_logloss: 0.551635
[80]	Train's binary_logloss: 0.490215	Test's binary_logloss: 0.547154
[100]	Train's binary_logloss: 0.486812	Test's binary_logloss: 0.547076
[120]	Train's binary_logloss: 0.479242	Test's binary_logloss: 0.542552
[140]	Train's binary_logloss: 0.469847	Test's binary_logloss: 0.539319
[160]	Train's binary_logloss: 0.471384	Test's binary_logloss: 0.542278
[180]	Train's binary_logloss: 0.453052	Test's binary_logloss: 0.535512
[200]	Train's binary_logloss: 0.442048	Test's binary_logloss: 0.533921
[220]	Train's binary_logloss: 0.436788	Test's binary_logloss: 0.534261
[240]	Train's binary_logloss: 0.427196	Test's binary_logloss: 0.532026
[260]	Train's binary_logloss: 0.420145	Test's binary_logloss: 0.531791
[280]	Train's binary_logloss: 0.413336	Test's binary_logloss: 0.527412
[300]	Train's binary_logloss: 0.406546	Test's binary_logloss: 0.529314
[320]	Train's binary_logloss: 0.402753	Test's binary_logloss: 0.525075
[340]	Train's binary_logloss: 0.39979	Test's binary_logloss: 0.523438
[360]	Train's binary_logloss: 0.403024	Test's binary_logloss: 0.525361
[380]	Train's binary_logloss: 0.398387	Test's binary_logloss: 0.528122
[400]	Train's binary_logloss: 0.394841	Test's binary_logloss: 0.529159
[420]	Train's binary_logloss: 0.390478	Test's binary_logloss: 0.528173
[440]	Train's binary_logloss: 0.384254	Test's binary_logloss: 0.526504
[460]	Train's binary_logloss: 0.381594	Test's binary_logloss: 0.525863
[480]	Train's binary_logloss: 0.371362	Test's binary_logloss: 0.527399
[500]	Train's binary_logloss: 0.369978	Test's binary_logloss: 0.525418
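The `evaluation_results` dict populated by `lgb.train` records the per-round log loss for each `valid_names` entry, which makes it easy to find the boosting round with the lowest test loss. A minimal sketch with a mocked-up dict of the same shape (the values below are illustrative, not the actual training history):

```python
# evaluation_results maps each valid_name to {'binary_logloss': [per-round values]};
# a small mock with the same structure as what lgb.train produces:
evaluation_results = {
    'Train': {'binary_logloss': [0.55, 0.51, 0.47, 0.44]},
    'Test':  {'binary_logloss': [0.59, 0.56, 0.54, 0.55]},
}

test_loss = evaluation_results['Test']['binary_logloss']

# Boosting rounds are 1-indexed in LightGBM's log output
best_round = min(range(len(test_loss)), key=test_loss.__getitem__) + 1
print(best_round, test_loss[best_round - 1])  # 3 0.54
```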
In [11]:
test_df['pred'] = model.predict(test_df[cols])
test_df[['Pclass', 'Age', 'Fare', 'SibSp', 'Parch','pred']].iloc[10:].head(2)
Out[11]:
Pclass Age Fare SibSp Parch pred
10 3 NaN 7.8958 0 0 0.052353
11 1 46.0 26.0000 0 0 0.308877
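Note that `model.predict` returns survival probabilities, not class labels, as the `pred` column above shows. To get hard 0/1 predictions, apply a threshold. A minimal sketch (the probabilities and the 0.5 cutoff are illustrative, not taken from the notebook output):

```python
import numpy as np

# Probabilities shaped like the output of model.predict (illustrative values)
probs = np.array([0.052353, 0.308877, 0.91])

# Threshold at 0.5 to obtain hard survived / not-survived labels
labels = (probs >= 0.5).astype(int)
print(labels.tolist())  # [0, 0, 1]
```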

Create a BentoService for model serving

In [12]:
%%writefile lightbgm_titanic_bento_service.py

import lightgbm as lgb

import bentoml
from bentoml.artifact import LightGBMModelArtifact
from bentoml.handlers import DataframeHandler

@bentoml.artifacts([LightGBMModelArtifact('model')])
@bentoml.env(pip_dependencies=['lightgbm'])
class TitanicSurvivalPredictionService(bentoml.BentoService):
    
    @bentoml.api(DataframeHandler)
    def predict(self, df):
        data = df[['Pclass', 'Age', 'Fare', 'SibSp', 'Parch']]
        return self.artifacts.model.predict(data)
Writing lightbgm_titanic_bento_service.py

Save BentoML service archive

In [13]:
# 1) import the custom BentoService defined above
from lightbgm_titanic_bento_service import TitanicSurvivalPredictionService

# 2) `pack` it with required artifacts
bento_service = TitanicSurvivalPredictionService.pack(model=model)

# 3) save your BentoService
saved_path = bento_service.save()
[2019-11-25 23:58:44,756] INFO - BentoService bundle 'TitanicSurvivalPredictionService:20191125235823_AC541E' created at: /private/var/folders/dc/dtsln2wx0s3202znr340xfdr0000gn/T/bentoml-temp-h0g7kdf8
[2019-11-25 23:58:45,304] INFO - BentoService bundle 'TitanicSurvivalPredictionService:20191125235823_AC541E' created at: /Users/hongjian/bentoml/repository/TitanicSurvivalPredictionService/20191125235823_AC541E

Load saved BentoService for serving

In [15]:
import bentoml

bento_model = bentoml.load(saved_path)

result = bento_model.predict(test_df)
test_df['pred'] = result
test_df[['Pclass', 'Age', 'Fare', 'SibSp', 'Parch','pred']].iloc[10:].head(2)
[2019-11-25 23:58:54,631] WARNING - Module `lightbgm_titanic_bento_service` already loaded, using existing imported module.
Out[15]:
Pclass Age Fare SibSp Parch pred
10 3 NaN 7.8958 0 0 0.052353
11 1 46.0 26.0000 0 0 0.308877

Model Serving via REST API

In your terminal, run the following command to start the REST API server:

In [16]:
!bentoml serve {saved_path}
/Users/hongjian/opt/anaconda3/lib/python3.7/site-packages/lightgbm/__init__.py:48: UserWarning: Starting from version 2.2.1, the library file in distribution wheels for macOS is built by the Apple Clang (Xcode_8.3.3) compiler.
This means that in case of installing LightGBM from PyPI via the ``pip install lightgbm`` command, you don't need to install the gcc compiler anymore.
Instead of that, you need to install the OpenMP library, which is required for running LightGBM on the system with the Apple Clang compiler.
You can install the OpenMP library by the following command: ``brew install libomp``.
  "You can install the OpenMP library by the following command: ``brew install libomp``.", UserWarning)
 * Serving Flask app "TitanicSurvivalPredictionService" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [25/Nov/2019 23:59:20] "POST /predict HTTP/1.1" 200 -
^C

Copy the following command to make a curl request to the REST API server:

curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[{"Pclass": 1, "Age": 30, "Fare": 200, "SibSp": 1, "Parch": 0}]' \
localhost:5000/predict
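The same request can also be made from Python. A sketch using only the standard library, assuming the server started by `bentoml serve` is still running on localhost:5000 (the `predict_survival` helper name is hypothetical, not part of BentoML):

```python
import json
import urllib.request

def predict_survival(passengers, url="http://localhost:5000/predict"):
    """POST a list of passenger dicts to the /predict endpoint,
    mirroring the curl command above."""
    body = json.dumps(passengers).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the server running, this returns a JSON array of survival probabilities:
# predict_survival([{"Pclass": 1, "Age": 30, "Fare": 200, "SibSp": 1, "Parch": 0}])
```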
In [ ]: