BentoML Example: H2O Loan Default Prediction

BentoML makes moving trained ML models to production easy:

  • Package models trained with any ML framework and reproduce them for model serving in production
  • Deploy anywhere for online API serving or offline batch serving
  • High-Performance API model server with adaptive micro-batching support
  • Central hub for managing models and deployment process via Web UI and APIs
  • Modular and flexible design making it adaptable to your infrastructure

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and to enable teams to deliver prediction services in a fast, repeatable, and scalable way.

Before reading this example project, be sure to check out the Getting started guide to learn about the basic concepts in BentoML.

This notebook demonstrates how to use BentoML to turn an H2O model into a Docker image containing a REST API server that serves the model, and how to distribute the model as a command-line tool or a pip-installable PyPI package.

The notebook was built based on: https://github.com/kguruswamy/H2O3-Driverless-AI-Code-Examples/blob/master/Lending%20Club%20Data%20-%20H2O3%20Auto%20ML%20-%20Python%20Tutorial.ipynb

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")
In [ ]:
!pip install -q bentoml "h2o>=3.24.0.2" "xlrd>=1.2.0" "scikit-learn>=0.23.2" "pandas>=1.1.1" "numpy>=1.18.4"
In [2]:
import h2o
import bentoml
import numpy as np
import pandas as pd

import requests
import math
from sklearn import model_selection

h2o.init(strict_version_check=False)
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "9.0.1"; Java(TM) SE Runtime Environment (build 9.0.1+11); Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
  Starting server from /usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/tmpm34g1lnd
  JVM stdout: /var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/tmpm34g1lnd/h2o_bozhaoyu_started_from_python.out
  JVM stderr: /var/folders/kn/xnc9k74x03567n1mx2tfqnpr0000gn/T/tmpm34g1lnd/h2o_bozhaoyu_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O cluster uptime: 02 secs
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.24.0.2
H2O cluster version age: 1 year, 5 months and 5 days !!!
H2O cluster name: H2O_from_python_bozhaoyu_392ekt
H2O cluster total nodes: 1
H2O cluster free memory: 4 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy: None
H2O internal security: False
H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version: 3.7.3 final

Prepare Dataset

In [3]:
%%bash

# Download training dataset
if [ ! -f ./LoanStats3c.csv.zip ]; then
    curl -O https://resources.lendingclub.com/LoanStats3c.csv.zip
fi
In [4]:
pd.set_option('expand_frame_repr', True)
pd.set_option('max_colwidth',9999)
pd.set_option('display.max_columns',9999)
pd.set_option('display.max_rows',9999)

data_dictionary = pd.read_excel("https://resources.lendingclub.com/LCDataDictionary.xlsx")
data_dictionary
Out[4]:
LoanStatNew Description
0 acc_now_delinq The number of accounts on which the borrower is now delinquent.
1 acc_open_past_24mths Number of trades opened in past 24 months.
2 addr_state The state provided by the borrower in the loan application
3 all_util Balance to credit limit on all trades
4 annual_inc The self-reported annual income provided by the borrower during registration.
5 annual_inc_joint The combined self-reported annual income provided by the co-borrowers during registration
6 application_type Indicates whether the loan is an individual application or a joint application with two co-borrowers
7 avg_cur_bal Average current balance of all accounts
8 bc_open_to_buy Total open to buy on revolving bankcards.
9 bc_util Ratio of total current balance to high credit/credit limit for all bankcard accounts.
10 chargeoff_within_12_mths Number of charge-offs within 12 months
11 collection_recovery_fee post charge off collection fee
12 collections_12_mths_ex_med Number of collections in 12 months excluding medical collections
13 delinq_2yrs The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years
14 delinq_amnt The past-due amount owed for the accounts on which the borrower is now delinquent.
15 desc Loan description provided by the borrower
16 dti A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
17 dti_joint A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
18 earliest_cr_line The month the borrower's earliest reported credit line was opened
19 emp_length Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
20 emp_title The job title supplied by the Borrower when applying for the loan.*
21 fico_range_high The upper boundary range the borrower’s FICO at loan origination belongs to.
22 fico_range_low The lower boundary range the borrower’s FICO at loan origination belongs to.
23 funded_amnt The total amount committed to that loan at that point in time.
24 funded_amnt_inv The total amount committed by investors for that loan at that point in time.
25 grade LC assigned loan grade
26 home_ownership The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER
27 id A unique LC assigned ID for the loan listing.
28 il_util Ratio of total current balance to high credit/credit limit on all install acct
29 initial_list_status The initial listing status of the loan. Possible values are – W, F
30 inq_fi Number of personal finance inquiries
31 inq_last_12m Number of credit inquiries in past 12 months
32 inq_last_6mths The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
33 installment The monthly payment owed by the borrower if the loan originates.
34 int_rate Interest Rate on the loan
35 issue_d The month which the loan was funded
36 last_credit_pull_d The most recent month LC pulled credit for this loan
37 last_fico_range_high The upper boundary range the borrower’s last FICO pulled belongs to.
38 last_fico_range_low The lower boundary range the borrower’s last FICO pulled belongs to.
39 last_pymnt_amnt Last total payment amount received
40 last_pymnt_d Last month payment was received
41 loan_amnt The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
42 loan_status Current status of the loan
43 max_bal_bc Maximum current balance owed on all revolving accounts
44 member_id A unique LC assigned Id for the borrower member.
45 mo_sin_old_il_acct Months since oldest bank installment account opened
46 mo_sin_old_rev_tl_op Months since oldest revolving account opened
47 mo_sin_rcnt_rev_tl_op Months since most recent revolving account opened
48 mo_sin_rcnt_tl Months since most recent account opened
49 mort_acc Number of mortgage accounts.
50 mths_since_last_delinq The number of months since the borrower's last delinquency.
51 mths_since_last_major_derog Months since most recent 90-day or worse rating
52 mths_since_last_record The number of months since the last public record.
53 mths_since_rcnt_il Months since most recent installment accounts opened
54 mths_since_recent_bc Months since most recent bankcard account opened.
55 mths_since_recent_bc_dlq Months since most recent bankcard delinquency
56 mths_since_recent_inq Months since most recent inquiry.
57 mths_since_recent_revol_delinq Months since most recent revolving delinquency.
58 next_pymnt_d Next scheduled payment date
59 num_accts_ever_120_pd Number of accounts ever 120 or more days past due
60 num_actv_bc_tl Number of currently active bankcard accounts
61 num_actv_rev_tl Number of currently active revolving trades
62 num_bc_sats Number of satisfactory bankcard accounts
63 num_bc_tl Number of bankcard accounts
64 num_il_tl Number of installment accounts
65 num_op_rev_tl Number of open revolving accounts
66 num_rev_accts Number of revolving accounts
67 num_rev_tl_bal_gt_0 Number of revolving trades with balance >0
68 num_sats Number of satisfactory accounts
69 num_tl_120dpd_2m Number of accounts currently 120 days past due (updated in past 2 months)
70 num_tl_30dpd Number of accounts currently 30 days past due (updated in past 2 months)
71 num_tl_90g_dpd_24m Number of accounts 90 or more days past due in last 24 months
72 num_tl_op_past_12m Number of accounts opened in past 12 months
73 open_acc The number of open credit lines in the borrower's credit file.
74 open_acc_6m Number of open trades in last 6 months
75 open_il_12m Number of installment accounts opened in past 12 months
76 open_il_24m Number of installment accounts opened in past 24 months
77 open_act_il Number of currently active installment trades
78 open_rv_12m Number of revolving trades opened in past 12 months
79 open_rv_24m Number of revolving trades opened in past 24 months
80 out_prncp Remaining outstanding principal for total amount funded
81 out_prncp_inv Remaining outstanding principal for portion of total amount funded by investors
82 pct_tl_nvr_dlq Percent of trades never delinquent
83 percent_bc_gt_75 Percentage of all bankcard accounts > 75% of limit.
84 policy_code publicly available policy_code=1\nnew products not publicly available policy_code=2
85 pub_rec Number of derogatory public records
86 pub_rec_bankruptcies Number of public record bankruptcies
87 purpose A category provided by the borrower for the loan request.
88 pymnt_plan Indicates if a payment plan has been put in place for the loan
89 recoveries post charge off gross recovery
90 revol_bal Total credit revolving balance
91 revol_util Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
92 sub_grade LC assigned loan subgrade
93 tax_liens Number of tax liens
94 term The number of payments on the loan. Values are in months and can be either 36 or 60.
95 title The loan title provided by the borrower
96 tot_coll_amt Total collection amounts ever owed
97 tot_cur_bal Total current balance of all accounts
98 tot_hi_cred_lim Total high credit/credit limit
99 total_acc The total number of credit lines currently in the borrower's credit file
100 total_bal_ex_mort Total credit balance excluding mortgage
101 total_bal_il Total current balance of all installment accounts
102 total_bc_limit Total bankcard high credit/credit limit
103 total_cu_tl Number of finance trades
104 total_il_high_credit_limit Total installment high credit/credit limit
105 total_pymnt Payments received to date for total amount funded
106 total_pymnt_inv Payments received to date for portion of total amount funded by investors
107 total_rec_int Interest received to date
108 total_rec_late_fee Late fees received to date
109 total_rec_prncp Principal received to date
110 total_rev_hi_lim Total revolving high credit/credit limit
111 url URL for the LC page with listing data.
112 verification_status Indicates if income was verified by LC, not verified, or if the income source was verified
113 verified_status_joint Indicates if the co-borrowers' joint income was verified by LC, not verified, or if the income source was verified
114 zip_code The first 3 numbers of the zip code provided by the borrower in the loan application.
115 revol_bal_joint Sum of revolving credit balance of the co-borrowers, net of duplicate balances
116 sec_app_fico_range_low FICO range (high) for the secondary applicant
117 sec_app_fico_range_high FICO range (low) for the secondary applicant
118 sec_app_earliest_cr_line Earliest credit line at time of application for the secondary applicant
119 sec_app_inq_last_6mths Credit inquiries in the last 6 months at time of application for the secondary applicant
120 sec_app_mort_acc Number of mortgage accounts at time of application for the secondary applicant
121 sec_app_open_acc Number of open trades at time of application for the secondary applicant
122 sec_app_revol_util Ratio of total current balance to high credit/credit limit for all revolving accounts
123 sec_app_open_act_il Number of currently active installment trades at time of application for the secondary applicant
124 sec_app_num_rev_accts Number of revolving accounts at time of application for the secondary applicant
125 sec_app_chargeoff_within_12_mths Number of charge-offs within last 12 months at time of application for the secondary applicant
126 sec_app_collections_12_mths_ex_med Number of collections within last 12 months excluding medical collections at time of application for the secondary applicant
127 sec_app_mths_since_last_major_derog Months since most recent 90-day or worse rating at time of application for the secondary applicant
128 hardship_flag Flags whether or not the borrower is on a hardship plan
129 hardship_type Describes the hardship plan offering
130 hardship_reason Describes the reason the hardship plan was offered
131 hardship_status Describes if the hardship plan is active, pending, canceled, completed, or broken
132 deferral_term Amount of months that the borrower is expected to pay less than the contractual monthly payment amount due to a hardship plan
133 hardship_amount The interest payment that the borrower has committed to make each month while they are on a hardship plan
134 hardship_start_date The start date of the hardship plan period
135 hardship_end_date The end date of the hardship plan period
136 payment_plan_start_date The day the first hardship plan payment is due. For example, if a borrower has a hardship plan period of 3 months, the start date is the start of the three-month period in which the borrower is allowed to make interest-only payments.
137 hardship_length The number of months the borrower will make smaller payments than normally obligated due to a hardship plan
138 hardship_dpd Account days past due as of the hardship plan start date
139 hardship_loan_status Loan Status as of the hardship plan start date
140 orig_projected_additional_accrued_interest The original projected additional interest amount that will accrue for the given hardship payment plan as of the Hardship Start Date. This field will be null if the borrower has broken their hardship payment plan.
141 hardship_payoff_balance_amount The payoff balance amount as of the hardship plan start date
142 hardship_last_payment_amount The last payment amount as of the hardship plan start date
143 disbursement_method The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY
144 debt_settlement_flag Flags whether or not the borrower, who has charged-off, is working with a debt-settlement company.
145 debt_settlement_flag_date The most recent date that the Debt_Settlement_Flag has been set
146 settlement_status The status of the borrower’s settlement plan. Possible values are: COMPLETE, ACTIVE, BROKEN, CANCELLED, DENIED, DRAFT
147 settlement_date The date that the borrower agrees to the settlement plan
148 settlement_amount The loan amount that the borrower has agreed to settle for
149 settlement_percentage The settlement amount as a percentage of the payoff balance amount on the loan
150 settlement_term The number of months that the borrower will be on the settlement plan
151 NaN NaN
152 NaN * Employer Title replaces Employer Name for all loans listed after 9/23/2013
In [5]:
# The very first row of the file holds non-header data, so skip it.
# Read the rest into a data frame and let pandas parse the Mon-Year strings in issue_d.
from datetime import datetime

def parse_dates(x):
    # Helper for parsing a single Mon-Year string by hand (not used by read_csv below)
    return datetime.strptime(x, "%b-%Y")

lc = pd.read_csv("LoanStats3c.csv.zip", skiprows=1, verbose=False, parse_dates=['issue_d'], low_memory=False)
lc.shape
Out[5]:
(235631, 144)
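The comment in the cell above describes `issue_d` as a Mon-Year column. As a minimal sketch of how the `%b-%Y` format handles such strings, the snippet below parses a few made-up sample values with the standard library only. The sample values and the helper name `parse_month_year` are assumptions for illustration, and `%b` expects English month abbreviations under the default locale.

```python
from datetime import datetime

def parse_month_year(text):
    """Parse a 'Mon-Year' string such as 'Dec-2014' into a datetime."""
    return datetime.strptime(text.strip(), "%b-%Y")

# Hypothetical sample values in Mon-Year form (not taken from the dataset)
samples = ["Dec-2014", "Jan-2014", " May-2014"]
parsed = [parse_month_year(s) for s in samples]

for raw, dt in zip(samples, parsed):
    # The day defaults to the first of the month
    print(repr(raw), "->", dt.date().isoformat())
```

A format such as `%b-%d` would instead read the trailing number as a day of the month and fail on four-digit years.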
In [6]:
lc.loan_status.unique()
Out[6]:
array(['Fully Paid', 'Charged Off', 'Current', 'In Grace Period',
       'Late (31-120 days)', 'Default', 'Late (16-30 days)', nan],
      dtype=object)
In [7]:
# Keep just "Fully Paid" and "Charged Off" to make it a simple 'Yes' or 'No' - binary classification problem

lc = lc[lc.loan_status.isin(['Fully Paid','Charged Off'])]
lc.loan_status.unique()
Out[7]:
array(['Fully Paid', 'Charged Off'], dtype=object)
In [8]:
# Drop the target-leakage columns from the data frame.
# Target-leakage columns are generally created in hindsight by analysts, data engineers, or operations
# after an outcome was observed in historical data. If we kept them, they would climb to the top of the
# feature list once a model is built and falsely inflate the accuracy to 95% :)
#
# In a production or real-life scoring environment, these columns are not available at scoring time,
# that is, when someone applies for a loan, so we don't train on them.

ignored_cols = [ 
                'out_prncp',                 # Remaining outstanding principal for total amount funded
                'out_prncp_inv',             # Remaining outstanding principal for portion of total amount 
                                             # funded by investors
                'total_pymnt',               # Payments received to date for total amount funded
                'total_pymnt_inv',           # Payments received to date for portion of total amount 
                                             # funded by investors
                'total_rec_prncp',           # Principal received to date 
                'total_rec_int',             # Interest received to date
                'total_rec_late_fee',        # Late fees received to date
                'recoveries',                # post charge off gross recovery
                'collection_recovery_fee',   # post charge off collection fee
                'last_pymnt_d',              # Last month payment was received
                'last_pymnt_amnt',           # Last total payment amount received
                'next_pymnt_d',              # Next scheduled payment date
                'last_credit_pull_d',        # The most recent month LC pulled credit for this loan
                'settlement_term',           # The number of months that the borrower will be on the settlement plan
                'settlement_date',           # The date that the borrower agrees to the settlement plan
                'settlement_amount',         # The loan amount that the borrower has agreed to settle for
                'settlement_percentage',     # The settlement amount as a percentage of the payoff balance amount on the loan
                'settlement_status',         # The status of the borrower’s settlement plan. Possible values are: 
                                             # COMPLETE, ACTIVE, BROKEN, CANCELLED, DENIED, DRAFT
                'debt_settlement_flag',      # Flags whether or not the borrower, who has charged-off, is working with 
                                             # a debt-settlement company.
                'debt_settlement_flag_date'  # The most recent date that the Debt_Settlement_Flag has been set
                ]

lc = lc.drop(columns=ignored_cols)
In [9]:
# After dropping the target-leakage columns, we have about 235K rows and 124 columns
lc.shape
Out[9]:
(235543, 124)
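By default, `DataFrame.drop` raises a `KeyError` when a listed column is absent, so it can help to compare the leakage list against the actual header before dropping. The following is a minimal sketch with plain Python lists and sets. The shortened column list and the helper name `split_leakage_columns` are hypothetical and stand in for the real frame.

```python
def split_leakage_columns(all_columns, leakage_columns):
    """Return (present, absent) leakage columns, preserving list order."""
    available = set(all_columns)
    present = [c for c in leakage_columns if c in available]
    absent = [c for c in leakage_columns if c not in available]
    return present, absent

# Hypothetical, shortened header; the real frame has far more columns
columns = ["loan_amnt", "int_rate", "loan_status", "total_pymnt", "recoveries"]
leakage = ["total_pymnt", "recoveries", "settlement_date"]

present, absent = split_leakage_columns(columns, leakage)
features = [c for c in columns if c not in present]

print("drop:", present)
print("not in frame:", absent)
print("kept:", features)
```

With pandas, `present` could then be passed to `lc.drop(columns=present)`, and a non-empty `absent` list would flag a leakage column that was renamed or removed upstream.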
In [10]:
import os

train_path = os.path.join(os.getcwd(), "train_lc.csv.zip")
test_path = os.path.join(os.getcwd(), "test_lc.csv.zip")

# Stratified 80/20 split so that both sets keep the same loan_status ratio
train_lc, test_lc = model_selection.train_test_split(lc, test_size=0.2, random_state=10, stratify=lc['loan_status'])
train_lc.to_csv(train_path, index=False, compression="zip")
test_lc.to_csv(test_path, index=False, compression="zip")
print('Train LC shape', train_lc.shape)
print('Test LC shape', test_lc.shape)

# Load the two CSV files written above into H2O frames
train = h2o.import_file(train_path)
test = h2o.import_file(test_path)

train.describe()
Train LC shape (188434, 124)
Test LC shape (47109, 124)
Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
Rows: 188434
Cols: 124

[Output of train.describe(): a 124-column summary table giving each column's type, mins, mean, maxs, sigma, zeros, and missing counts, followed by the first 10 rows (0 to 9) of the training frame.]
In [11]:
import os

# These two CSV files were created in the previous section
train_path = os.path.join(os.getcwd(), "train_lc.csv.zip")
test_path = os.path.join(os.getcwd(), "test_lc.csv.zip")

train = h2o.load_dataset(train_path)
test = h2o.load_dataset(test_path)
Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
In [12]:
train.describe()
Rows:188434
Cols:124


id member_id loan_amnt funded_amnt funded_amnt_inv term int_rate installment grade sub_grade emp_title emp_length home_ownership annual_inc verification_status issue_d loan_status pymnt_plan url desc purpose title zip_code addr_state dti delinq_2yrs earliest_cr_line inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status collections_12_mths_ex_med mths_since_last_major_derog policy_code application_type annual_inc_joint dti_joint verification_status_joint acc_now_delinq tot_coll_amt tot_cur_bal open_acc_6m open_act_il open_il_12m open_il_24m mths_since_rcnt_il total_bal_il il_util open_rv_12m open_rv_24m max_bal_bc all_util total_rev_hi_lim inq_fi total_cu_tl inq_last_12m acc_open_past_24mths avg_cur_bal bc_open_to_buy bc_util chargeoff_within_12_mths delinq_amnt mo_sin_old_il_acct mo_sin_old_rev_tl_op mo_sin_rcnt_rev_tl_op mo_sin_rcnt_tl mort_acc mths_since_recent_bc mths_since_recent_bc_dlq mths_since_recent_inq mths_since_recent_revol_delinq num_accts_ever_120_pd num_actv_bc_tl num_actv_rev_tl num_bc_sats num_bc_tl num_il_tl num_op_rev_tl num_rev_accts num_rev_tl_bal_gt_0 num_sats num_tl_120dpd_2m num_tl_30dpd num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq percent_bc_gt_75 pub_rec_bankruptcies tax_liens tot_hi_cred_lim total_bal_ex_mort total_bc_limit total_il_high_credit_limit revol_bal_joint sec_app_earliest_cr_line sec_app_inq_last_6mths sec_app_mort_acc sec_app_open_acc sec_app_revol_util sec_app_open_act_il sec_app_num_rev_accts sec_app_chargeoff_within_12_mths sec_app_collections_12_mths_ex_med sec_app_mths_since_last_major_derog hardship_flag hardship_type hardship_reason hardship_status deferral_term hardship_amount hardship_start_date hardship_end_date payment_plan_start_date hardship_length hardship_dpd hardship_loan_status orig_projected_additional_accrued_interest hardship_payoff_balance_amount hardship_last_payment_amount
type int int int int int enum real real enum enum enum enum enum real enum time enum enum int stringenum enum enum enum real int time int int int int int int real int enum int int int enum int int int int int int int int int int int int int int int int int int int int int int int int real int int int int int int int int int int int int int int int int int int int int int int int int int real real int int int int int int int int int int int int int int int int int enum enum enum enum int real time time time int int enum real real real
mins NaN NaN 1000.0 1000.0 950.0 0.06 23.36 3000.0 1388534400000.0 NaN NaN 0.0 0.0 -820540800000.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 1.0 NaN NaN NaN 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 16.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 1.47 1485907200000.0 1491004800000.0 1485907200000.0 3.0 0.0 4.41 174.15 0.04
mean 0.0 0.0 14884.78048016811814884.78048016811814879.984769203027 0.13768163813324627 443.02308527123586 74842.1021655859 1403772635222.944 0.0 NaN 18.0384319178067440.3439559739749727878010488536.0388 0.757559676066950733.40950190769873 70.73781512605042 11.6715773161955880.222459853317342316517.3175170085670.5562211189754319 26.019354256662876 0.015474914293598825 42.4452214452214 1.0 0.0 0.0 0.0 0.0056200048823460734280.02962841100884139916.782602926220.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 30777.3117271830120.0 0.0 0.0 4.403568358151916 13425.63404110787 8488.266061096774 64.592774200472150.010783616544784914 9.796506999798337128.53483025519225 185.81975121262616 13.078101616481097 8.0036458388613421.853216510820764 24.440105868864123 39.5963653177332 6.918806067907228 35.46866324059305 0.5053122048038038 3.686240275109585 5.803241453240905 4.646629589139977 8.54615939798552 8.57417451203075 8.27700945689208615.3042179224556125.767600326904917 11.6223186898330350.00095513605199453280.00365114575925788330.09438848615430329 2.0082309986520546 94.2433769914134850.68730479940775 0.13524098623390737 0.05543054862710553170413.4813303332848481.2246834436 20070.11836505092 39932.08656611851 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 114.215738636363641509994881818.182 1516242681818.182 1510936200000.0 3.0 14.59375 339.8909328358209 7947.003380681818 187.48076704545457
maxs NaN NaN 35000.0 35000.0 35000.0 0.2606 1409.99 7500000.0 1417392000000.0 NaN NaN 39.99 22.0 1320105600000.0 6.0 188.0 121.0 84.0 63.0 2560703.0 8.923 156.0 20.0 188.0 1.0 NaN NaN NaN 3.0 9152545.0 3840795.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9999999.0 NaN NaN NaN 53.0 497484.0 260250.0 255.2 7.0 65000.0 561.0 842.0 372.0 226.0 37.0 616.0 170.0 25.0 180.0 30.0 26.0 38.0 35.0 61.0 150.0 62.0 105.0 38.0 84.0 2.0 3.0 22.0 26.0 100.0 100.0 7.0 63.0 9999999.0 2688920.0 1090700.0 1027358.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 344.24 1556668800000.0 1561939200000.0 1556668800000.0 3.0 32.0 1032.72 20321.15 713.04
sigma -0.0 -0.0 8444.529842237767 8444.529842237767 8441.74965734052 0.043236714795126696245.55956150039466 55879.6254552511 8618925772.810982 -0.0 NaN 8.023289119934871 0.9000809312845888235685775429.60028 1.035364539099023 21.777363524527555 28.5075895180931 5.280407394680721 0.605396865550370321598.80009195033 0.2310240222484618611.891471363727664 0.14101201867640875 20.880974652987707 0.0 -0.0 -0.0 -0.0 0.07950349603856371 21174.68581402208 153006.06065658265-0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 38384.00007004573 -0.0 -0.0 -0.0 2.8642976762774732 16026.70277646250213412.41141441076626.424017661426290.1173765114760668 565.411644236004151.34992108739378 93.00236919417999 16.134812986440643 8.7590525517506632.161127563440871830.30743682243691 22.573879361083183 5.9354275780436705 22.304832497335703 1.2700994420156504 2.15271474268570673.14054498864890472.72003942827463654.8126947852769787.30895419633316754.3182236825395088.047168776251766 3.1221446203671164 5.278551769075766 0.03211042594205823 0.06423450317302083 0.49334840182534967 1.6084979574620704 8.48144727683834434.90775600526837 0.3756866400997473 0.4123003808101904 172512.507823174 46113.3699393363 20243.50759606858641490.84884582259 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 0.0 77.0094025673706 14500662214.885572 14471630662.23209 14520502785.954859 0.0 9.499873480938898 240.1814929663525 4637.06552271656 147.17888898625247
zeros 0 0 0 0 0 0 0 0 0 0 0 60 149487 34 100647 222 3 2 155114 448 480 0 185745 37 0 0 0 0 187438 159941 34 0 0 0 0 0 0 0 0 0 0 0 66 0 0 0 7733 31 3801 1595 186638 187742 2 0 2977 3084 72408 1245 97 15431 157 143588 3405 453 1839 330 5758 53 0 448 2 182006 187787 176622 32108 0 32367 164655 181918 3 68 2036 25148 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 68 0 0 0
missing188434188434 0 0 0 0 0 0 0 0 10581 9589 0 0 0 0 0 0 1884341762170 0 0 0 0 0 0 0 92769 155114 0 0 0 101 0 0 0 135238 0 0 188434 188434 188434 0 0 0 188434 188434 188434 188434 188434 188434 188434 188434 188434 188434 188434 0 188434 188434 188434 0 3 1943 2074 0 0 5748 0 0 0 0 1788 138691 17436 120718 0 0 0 0 0 0 0 0 0 0 6261 0 0 0 0 2036 0 0 0 0 0 0 188434 188434 188434 188434 188434 188434 188434 188434 188434 188434 188434 0 188082 188082 188082 188082 188082 188082 188082 188082 188082 188082 188082 188166 188082 188082
0 nan nan 18700.0 18700.0 18700.0 60 months0.1629 457.64 D D2 Assistant Manager 4 years MORTGAGE 52000.0 Not Verified 2014-05-01 00:00:00Fully Paid n nan credit_card Credit card refinancing630xx MO 11.65 0.0 1999-08-01 00:00:005.0 59.0 nan 20.0 0.0 16920.0 0.502 37.0 w 0.0 59.0 1.0 Individual nan nan nan 0.0 0.0 117999.0 nan nan nan nan nan nan nan nan nan nan nan 33700.0 nan nan nan 6.0 6210.0 4113.0 80.4 0.0 0.0 177.0 123.0 1.0 1.0 1.0 8.0 59.0 1.0 59.0 1.0 5.0 6.0 9.0 10.0 17.0 16.0 19.0 6.0 20.0 0.0 0.0 0.0 2.0 91.7 50.0 0.0 0.0 141329.0 51831.0 21000.0 39404.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
1 nan nan 20000.0 20000.0 20000.0 36 months0.0917 637.58 B B1 Engineering 4 years RENT 93000.0 Source Verified 2014-08-01 00:00:00Fully Paid n nan debt_consolidationDebt consolidation 334xx FL 19.15 0.0 1996-09-01 00:00:000.0 43.0 nan 9.0 0.0 10597.0 0.609 24.0 f 1.0 43.0 1.0 Individual nan nan nan 0.0 2305.0 61086.0 nan nan nan nan nan nan nan nan nan nan nan 17400.0 nan nan nan 2.0 6787.0 2303.0 82.1 0.0 0.0 215.0 178.0 33.0 5.0 0.0 33.0 nan 9.0 nan 2.0 3.0 3.0 4.0 9.0 11.0 5.0 12.0 3.0 9.0 0.0 0.0 0.0 1.0 91.7 50.0 0.0 0.0 79108.0 61086.0 12900.0 61708.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
2 nan nan 11000.0 11000.0 11000.0 36 months0.1099 360.08 B B2 Teacher director 10+ years RENT 30000.0 Not Verified 2014-02-01 00:00:00Fully Paid n nan debt_consolidationDebt consolidation 010xx MA 27.84 0.0 1994-11-01 00:00:001.0 nan nan 10.0 0.0 11523.0 0.546 21.0 w 0.0 nan 1.0 Individual nan nan nan 0.0 90.0 17778.0 nan nan nan nan nan nan nan nan nan nan nan 21100.0 nan nan nan 2.0 1778.0 2200.0 71.8 0.0 0.0 133.0 231.0 6.0 6.0 0.0 63.0 nan 6.0 nan 0.0 4.0 9.0 4.0 9.0 5.0 9.0 16.0 9.0 10.0 0.0 0.0 0.0 1.0 100.0 75.0 0.0 0.0 35076.0 17778.0 7800.0 13976.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
3 nan nan 15000.0 15000.0 15000.0 60 months0.1561 361.67 D D1 Accounting Assistant < 1 year RENT 50000.0 Verified 2014-07-01 00:00:00Fully Paid n nan credit_card Credit card refinancing076xx NJ 20.91 0.0 1997-07-01 00:00:000.0 nan 94.0 24.0 1.0 7884.0 0.276 31.0 w 0.0 nan 1.0 Individual nan nan nan 0.0 0.0 34486.0 nan nan nan nan nan nan nan nan nan nan nan 28600.0 nan nan nan 14.0 1499.0 5285.0 57.0 0.0 0.0 148.0 203.0 3.0 1.0 0.0 10.0 nan 7.0 nan 0.0 3.0 7.0 4.0 6.0 11.0 16.0 20.0 7.0 24.0 0.0 0.0 0.0 11.0 100.0 25.0 1.0 0.0 88501.0 34486.0 12300.0 59901.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
4 nan nan 6250.0 6250.0 6250.0 36 months0.0949 200.18 B B2 Full Time Active Duty Military10+ years MORTGAGE 78000.0 Source Verified 2014-11-01 00:00:00Fully Paid n nan debt_consolidationDebt consolidation 254xx WV 1.14 0.0 2000-12-01 00:00:000.0 55.0 nan 6.0 0.0 3132.0 0.344 14.0 w 0.0 55.0 1.0 Individual nan nan nan 0.0 0.0 393615.0 nan nan nan nan nan nan nan nan nan nan nan 9100.0 nan nan nan 2.0 78723.0 2968.0 51.3 0.0 0.0 167.0 117.0 12.0 12.0 4.0 35.0 55.0 20.0 55.0 3.0 1.0 1.0 2.0 5.0 2.0 4.0 8.0 1.0 6.0 0.0 0.0 0.0 1.0 71.4 50.0 0.0 0.0 454355.0 3132.0 6100.0 0.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
5 nan nan 15000.0 15000.0 15000.0 36 months0.0712 463.98 A A3 OWN 60000.0 Source Verified 2014-10-01 00:00:00Fully Paid n nan credit_card Credit card refinancing301xx GA 8.42 0.0 1976-11-01 00:00:000.0 40.0 20.0 9.0 2.0 15165.0 0.442 11.0 w 0.0 nan 1.0 Individual nan nan nan 0.0 0.0 15165.0 nan nan nan nan nan nan nan nan nan nan nan 34300.0 nan nan nan 3.0 1685.0 14340.0 51.2 0.0 0.0 nan 454.0 11.0 11.0 0.0 11.0 nan 17.0 40.0 0.0 4.0 6.0 5.0 6.0 0.0 9.0 11.0 6.0 9.0 0.0 0.0 0.0 1.0 90.9 40.0 0.0 2.0 34300.0 15165.0 29400.0 0.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
6 nan nan 35000.0 35000.0 35000.0 36 months0.1398 1195.88 C C3 Teacher 10+ years MORTGAGE 91886.0 Verified 2014-08-01 00:00:00Fully Paid n nan debt_consolidationDebt consolidation 920xx CA 20.97 0.0 1994-07-01 00:00:002.0 nan nan 10.0 0.0 38409.0 0.662 26.0 w 0.0 nan 1.0 Individual nan nan nan 0.0 0.0 474161.0 nan nan nan nan nan nan nan nan nan nan nan 58000.0 nan nan nan 3.0 52685.0 14206.0 61.1 0.0 0.0 138.0 240.0 11.0 5.0 8.0 71.0 nan 1.0 nan 0.0 3.0 7.0 3.0 7.0 6.0 8.0 12.0 7.0 10.0 0.0 0.0 0.0 3.0 100.0 66.7 0.0 0.0 505693.0 58911.0 36500.0 30694.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
7 nan nan 4000.0 4000.0 3950.0 36 months0.0917 127.52 B B1 Associate Professor 5 years MORTGAGE 55000.0 Source Verified 2014-05-01 00:00:00Charged Off n nan credit_card Credit card refinancing275xx NC 22.81 1.0 1993-12-01 00:00:001.0 21.0 nan 15.0 0.0 22042.0 0.249 36.0 f 0.0 nan 1.0 Individual nan nan nan 0.0 0.0 43231.0 nan nan nan nan nan nan nan nan nan nan nan 88400.0 nan nan nan 2.0 2882.0 56510.0 27.9 0.0 0.0 83.0 244.0 12.0 7.0 3.0 65.0 21.0 1.0 21.0 0.0 3.0 5.0 8.0 20.0 4.0 13.0 28.0 5.0 15.0 0.0 0.0 0.0 2.0 97.2 25.0 0.0 0.0 111174.0 43231.0 78400.0 17117.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
8 nan nan 10000.0 10000.0 10000.0 36 months0.0649 306.45 A A2 owner 10+ years OWN 100000.0 Source Verified 2014-06-01 00:00:00Fully Paid n nan home_improvement Home improvement 890xx NV 17.88 0.0 1991-02-01 00:00:000.0 nan nan 6.0 0.0 5414.0 0.722 23.0 w 0.0 nan 1.0 Individual nan nan nan 0.0 0.0 153711.0 nan nan nan nan nan nan nan nan nan nan nan 7500.0 nan nan nan 2.0 25619.0 99.0 98.0 0.0 0.0 127.0 280.0 56.0 7.0 7.0 107.0 nan 7.0 nan 0.0 1.0 2.0 1.0 5.0 9.0 2.0 7.0 2.0 6.0 0.0 0.0 0.0 2.0 100.0 100.0 0.0 0.0 178016.0 83729.0 5000.0 95922.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
9 nan nan 15000.0 15000.0 15000.0 36 months0.1099 491.01 B B3 Sr Business Analyst 10+ years MORTGAGE 78200.0 Not Verified 2014-05-01 00:00:00Fully Paid n nan home_improvement Home improvement 786xx TX 14.01 1.0 1995-09-01 00:00:001.0 18.0 nan 13.0 0.0 7559.0 0.411 28.0 w 1.0 nan 1.0 Individual nan nan nan 0.0 735.0 270873.0 nan nan nan nan nan nan nan nan nan nan nan 19600.0 nan nan nan 3.0 22573.0 7306.0 48.9 0.0 0.0 145.0 224.0 12.0 2.0 4.0 12.0 nan 2.0 18.0 0.0 4.0 6.0 5.0 8.0 8.0 10.0 15.0 6.0 13.0 0.0 0.0 0.0 3.0 96.2 40.0 0.0 0.0 318951.0 32615.0 14300.0 29351.0 nan nan nan nan nan nan nan nan nan nan nan N nan nan nan nan nan nan nan
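
The `missing` row of the summary shows that several columns (for example `id`, `member_id` and `url`) are missing in all 188434 rows, so they carry no information for training. The following is a minimal sketch of how such columns could be listed. The helper name `fully_missing_columns` is hypothetical, and the commented-out call assumes the counts are taken from the frame with `train.nacnt()`.

```python
def fully_missing_columns(names, na_counts, nrows):
    """Return the names of columns whose NA count equals the row count."""
    return [name for name, na in zip(names, na_counts) if na == nrows]


# Values copied from the summary above (Rows: 188434).
names = ["id", "member_id", "loan_amnt", "emp_title", "url", "desc"]
na_counts = [188434, 188434, 0, 10581, 188434, 176217]

empty = fully_missing_columns(names, na_counts, nrows=188434)
print(empty)  # ['id', 'member_id', 'url']

# With a live H2O frame the same inputs would come from (assumption):
#   fully_missing_columns(train.columns, train.nacnt(), train.nrows)
```

Columns found this way could be removed from the predictor list before training; whether that changes the result for this dataset was not measured here.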

Model Training

In [13]:
from h2o.automl import H2OAutoML

# Identify predictors and response
x = train.columns
y = "loan_status"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Run AutoML
aml = H2OAutoML(project_name='LP',
                max_models=1,           # train a single base model (for demo purposes)
                balance_classes=True,   # over/under-sample to balance the class distribution
                max_runtime_secs=3600,  # 1-hour limit (for demo purposes); with too short a limit no model may finish training
                seed=1234)              # set a seed for reproducibility
aml.train(x=x, y=y, training_frame=train)
AutoML progress: |████████████████████████████████████████████████████████| 100%

View the AutoML Leaderboard

In [14]:
lb = aml.leaderboard
lb.head(rows=lb.nrows)  # Print all rows instead of default (10 rows)
model_id                            auc        logloss   mean_per_class_error   rmse       mse
XGBoost_1_AutoML_20200922_122203    0.705681   0.4271    0.49588                0.366057   0.133997
Out[14]:
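
The `rmse` and `mse` columns are related by rmse = sqrt(mse), which gives a quick way to confirm that the values in the leaderboard row are being read correctly. A small check, using only the two numbers printed above:

```python
import math

# Values copied from the leaderboard row above.
mse = 0.133997
rmse_reported = 0.366057

rmse_from_mse = math.sqrt(mse)
print(round(rmse_from_mse, 6))

# The two agree up to the rounding used in the printed table.
agrees = abs(rmse_from_mse - rmse_reported) < 1e-5
print(agrees)  # True
```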

In [15]:
test_pc = aml.predict(test)
test_pc
xgboost prediction progress: |████████████████████████████████████████████| 100%
predict        Charged Off   Fully Paid
Fully Paid     0.0300723     0.969928
Fully Paid     0.038023      0.961977
Fully Paid     0.0306047     0.969395
Fully Paid     0.242732      0.757268
Fully Paid     0.0589932     0.941007
Fully Paid     0.133116      0.866884
Fully Paid     0.34577       0.65423
Charged Off    0.432134      0.567866
Fully Paid     0.379319      0.620681
Fully Paid     0.0814494     0.918551
Out[15]:
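
Note that the eighth row is labelled `Charged Off` although its `Charged Off` probability (0.432134) is below 0.5. For binomial models H2O assigns the label with a model-specific threshold rather than a fixed 0.5 (its documentation describes the default as the threshold that maximizes F1); the actual value used by this model is not shown in the output. The sketch below uses a hypothetical threshold of 0.4, chosen only because it reproduces the ten labels printed above.

```python
# Probabilities of "Charged Off" copied from the ten rows shown above,
# together with the label H2O printed in the `predict` column.
rows = [
    ("Fully Paid", 0.0300723),
    ("Fully Paid", 0.038023),
    ("Fully Paid", 0.0306047),
    ("Fully Paid", 0.242732),
    ("Fully Paid", 0.0589932),
    ("Fully Paid", 0.133116),
    ("Fully Paid", 0.34577),
    ("Charged Off", 0.432134),
    ("Fully Paid", 0.379319),
    ("Fully Paid", 0.0814494),
]


def label(p_charged_off, threshold):
    """Label a loan "Charged Off" when its probability reaches the threshold."""
    return "Charged Off" if p_charged_off >= threshold else "Fully Paid"


# HYPOTHETICAL threshold: any value in (0.379319, 0.432134] matches these rows.
threshold = 0.4
relabelled = [label(p, threshold) for _, p in rows]
matches = relabelled == [printed for printed, _ in rows]
print(matches)  # True

# A plain 0.5 cut-off would label the eighth row differently.
print(label(0.432134, 0.5))  # Fully Paid
```

When the service is consumed downstream, it may be safer to use the returned probability columns and apply an explicit, business-chosen threshold than to rely on the `predict` label alone.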

Define BentoService for model serving

In [16]:
%%writefile loan_prediction.py

import h2o

from bentoml import api, env, artifacts, BentoService
from bentoml.frameworks.h2o import H2oModelArtifact
from bentoml.adapters import DataframeInput

@env(
    pip_packages=['h2o==3.24.0.2', 'pandas'],
    conda_channels=['h2oai'],
    conda_dependencies=['h2o==3.24.0.2']
)
@artifacts([H2oModelArtifact('model')])
class LoanPrediction(BentoService):
    
    @api(input=DataframeInput(), batch=True)
    def predict(self, df):
        h2o_frame = h2o.H2OFrame(df, na_strings=['NaN'])
        predictions = self.artifacts.model.predict(h2o_frame)
        return predictions.as_data_frame()
Overwriting loan_prediction.py

Save BentoService to file archive

In [17]:
# 1) import the custom BentoService defined above
from loan_prediction import LoanPrediction

# 2) `pack` it with required artifacts
bentoml_svc = LoanPrediction()
bentoml_svc.pack('model', aml.leader)

# 3) save your BentoService
saved_path = bentoml_svc.save()
[2020-09-22 12:39:14,554] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle.
[2020-09-22 12:39:14,867] INFO - Using default docker base image: `None` specified inBentoML config file or env var. User must make sure that the docker base image either has Python 3.7 or conda installed.
[2020-09-22 12:39:14,870] WARNING - pip package requirement pandas already exist
[2020-09-22 12:39:14,873] WARNING - pip package requirement h2o already exist
[2020-09-22 12:39:15,844] INFO - Detected non-PyPI-released BentoML installed, copying local BentoML modulefiles to target saved bundle path..
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyo' found anywhere in distribution
warning: no previously-included files matching '.git' found anywhere in distribution
warning: no previously-included files matching '.ipynb_checkpoints' found anywhere in distribution
warning: no previously-included files matching '__pycache__' found anywhere in distribution
no previously-included directories found matching 'e2e_tests'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'benchmark'
UPDATING BentoML-0.9.0rc0+3.gcebf2015/bentoml/_version.py
set BentoML-0.9.0rc0+3.gcebf2015/bentoml/_version.py to '0.9.0.pre+3.gcebf2015'
[2020-09-22 12:39:19,606] INFO - BentoService bundle 'LoanPrediction:20200922123915_EEBBD2' saved to: /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2
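
The last log line shows how the bundle tag `LoanPrediction:20200922123915_EEBBD2` maps to a directory under the local repository. The sketch below reproduces that mapping; the helper `bundle_dir` is hypothetical and simply mirrors the path printed in the log, it is not BentoML's own implementation. In the notebook itself `saved_path` already holds this directory.

```python
import posixpath


def bundle_dir(tag, repository_root):
    """Map a 'Name:version' tag to '<repository_root>/<Name>/<version>'."""
    name, version = tag.split(":", 1)
    return posixpath.join(repository_root, name, version)


path = bundle_dir(
    "LoanPrediction:20200922123915_EEBBD2",
    "/Users/bozhaoyu/bentoml/repository",
)
print(path)
# /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2

# With BentoML installed, the bundle could then be loaded back and called
# directly (assumption: the `bentoml.load` API of the 0.9-era releases):
#   svc = bentoml.load(path)
#   svc.predict(some_pandas_dataframe)
```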

REST API Model Serving

To start a REST API model server with the BentoService saved above, use the bentoml serve command:

In [25]:
!bentoml serve LoanPrediction:latest
[2020-09-22 17:48:06,148] INFO - Getting latest version LoanPrediction:20200922123915_EEBBD2
[2020-09-22 17:48:06,148] INFO - Starting BentoML API server in development mode..
[2020-09-22 17:48:06,386] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle.
[2020-09-22 17:48:06,406] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.9.0.pre, but loading from BentoML version 0.9.0.pre+3.gcebf2015
[2020-09-22 17:48:06,777] INFO - Using default docker base image: `None` specified inBentoML config file or env var. User must make sure that the docker base image either has Python 3.7 or conda installed.
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
Warning: Your H2O cluster version is too old (1 year, 5 months and 5 days)! Please download and install the latest version from http://h2o.ai/download/
--------------------------  ---------------------------------------------------
H2O cluster uptime:         5 hours 27 mins
H2O cluster timezone:       America/Los_Angeles
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.2
H2O cluster version age:    1 year, 5 months and 5 days !!!
H2O cluster name:           H2O_from_python_bozhaoyu_392ekt
H2O cluster total nodes:    1
H2O cluster free memory:    3.906 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healthy
H2O connection url:         http://localhost:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             3.7.3 final
--------------------------  ---------------------------------------------------
[2020-09-22 17:48:08,298] WARNING - pip package requirement pandas already exist
[2020-09-22 17:48:08,298] WARNING - pip package requirement h2o already exist
 * Serving Flask app "LoanPrediction" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Parse progress: |█████████████████████████████████████████████████████████| 100%
xgboost prediction progress: |████████████████████████████████████████████| 100%
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_loan_status': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'initial_list_status': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'mths_since_last_delinq': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'mths_since_last_record': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_amount': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_start_date': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_end_date': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'payment_plan_start_date': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_dpd': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'orig_projected_additional_accrued_interest': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_payoff_balance_amount': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_last_payment_amount': substituting in a column of NaN
  warnings.warn(w)
INFO:werkzeug:127.0.0.1 - - [22/Sep/2020 17:48:11] "POST /predict HTTP/1.1" 400 -
^C
H2O session _sid_9aec closed.

If you are running this notebook from Google Colab, you can start the dev server with the --run-with-ngrok option to access the API through a public endpoint managed by ngrok:

In [ ]:
!bentoml serve LoanPrediction:latest --run-with-ngrok

Open http://127.0.0.1:5000 in your browser to see more information about the REST API server. You can also send a prediction request from the command line, for example with curl:

curl -i \
    --request POST \
    --header "Content-Type: text/csv" \
    --data @sample_data.csv \
    localhost:5000/predict
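
The same request can be built from Python. This is a minimal sketch using only the standard library (the notebook imports `requests`, which would work equally well). The helper name `build_predict_request` and the two-column payload are hypothetical; a real request should send rows with the columns the model was trained on.

```python
import urllib.request


def build_predict_request(csv_bytes, url="http://localhost:5000/predict"):
    """Build (but do not send) a POST request carrying CSV rows."""
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# HYPOTHETICAL payload: two columns only, to keep the sketch short.
payload = b"loan_amnt,term\n15000,36 months\n"
req = build_predict_request(payload)
print(req.get_method(), req.full_url)  # POST http://localhost:5000/predict

# To actually send it, the server started by `bentoml serve` must be running:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```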

Containerize model server with Docker

One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.

Note that Docker is not available in Google Colab. To try out the containerization step, download this notebook and run it locally.

If you already have Docker configured, simply run the following command to build a Docker image serving the LoanPrediction prediction service created above:

In [27]:
!bentoml containerize LoanPrediction:latest
[2020-09-22 17:52:45,149] INFO - Getting latest version LoanPrediction:20200922123915_EEBBD2
Found Bento: /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2
[2020-09-22 17:52:45,190] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle.
[2020-09-22 17:52:45,208] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.9.0.pre, but loading from BentoML version 0.9.0.pre+3.gcebf2015
Tag not specified, using tag parsed from BentoService: 'loanprediction:20200922123915_EEBBD2'
Building Docker image loanprediction:20200922123915_EEBBD2 from LoanPrediction:latest 
-we in here
processed docker file
(None, None)
root in create archive /Users/bozhaoyu/bentoml/repository/LoanPrediction/20200922123915_EEBBD2 ['Dockerfile', 'LoanPrediction', 'LoanPrediction/__init__.py', 'LoanPrediction/__pycache__', 'LoanPrediction/__pycache__/loan_prediction.cpython-37.pyc', 'LoanPrediction/artifacts', 'LoanPrediction/artifacts/__init__.py', 'LoanPrediction/artifacts/model', 'LoanPrediction/bentoml.yml', 'LoanPrediction/loan_prediction.py', 'MANIFEST.in', 'README.md', 'bentoml-init.sh', 'bentoml.yml', 'bundled_pip_dependencies', 'bundled_pip_dependencies/BentoML-0.9.0rc0+3.gcebf2015.tar.gz', 'docker-entrypoint.sh', 'environment.yml', 'python_version', 'requirements.txt', 'setup.py']
about to build
about to upgrade params
check each param and update
if use config proxy
if buildargs
if shmsize
if labels
if cache from
if target
if network_mode
if squash
if extra hosts is not None
if platform is not None
if isolcation is not None
if context is not None
setting auth {'Content-Type': 'application/tar'}
\docker build <tempfile._TemporaryFileWrapper object at 0x7fa3e54b4da0> {'t': 'loanprediction:20200922123915_EEBBD2', 'remote': None, 'q': False, 'nocache': False, 'rm': False, 'forcerm': False, 'pull': False, 'dockerfile': (None, None)}
|docker response <Response [200]>
context closes
\print responses
Step 1/15 : FROM bentoml/model-server:0.9.0.pre
 ---> a25066aa8b0e
Step 2/15 : ARG EXTRA_PIP_INSTALL_ARGS=
 ---> Using cache
 ---> 315719b8980e
Step 3/15 : ENV EXTRA_PIP_INSTALL_ARGS $EXTRA_PIP_INSTALL_ARGS
 ---> Using cache
 ---> a3b6c8107d94
Step 4/15 : COPY environment.yml requirements.txt setup.sh* bentoml-init.sh python_version* /bento/
 ---> Using cache
 ---> 8a93eb1a85af
Step 5/15 : WORKDIR /bento
 ---> Using cache
 ---> 22714b0bba7e
Step 6/15 : RUN chmod +x /bento/bentoml-init.sh
 ---> Using cache
 ---> 44c32c282581
Step 7/15 : RUN if [ -f /bento/bentoml-init.sh ]; then bash -c /bento/bentoml-init.sh; fi
 ---> Running in dde5e5814405
|+++ dirname /bento/bentoml-init.sh

++ cd /bento
++ pwd -P

+ SAVED_BUNDLE_PATH=/bento
+ cd /bento
+ '[' -f ./setup.sh ']'

+ '[' -f ./python_version ']'

++ cat ./python_version

+ PY_VERSION_SAVED=3.7.3
+ DESIRED_PY_VERSION=3.7

++ python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")'

\+ CURRENT_PY_VERSION=3.7
+ [[ 3.7 == \3\.\7 ]]
+ echo 'Python Version in docker base image 3.7 matches requirement python=3.7. Skipping.'
+ command -v conda
+ echo 'Updating conda base environment with environment.yml'
+ conda env update -n base -f ./environment.yml

Python Version in docker base image 3.7 matches requirement python=3.7. Skipping.
Updating conda base environment with environment.yml
\Collecting package metadata (repodata.json): ...working... 
\done
Solving environment: ...working... 
Examining python=3.7:   0%|          | 0/5 [00:00<?, ?it/s]  
Examining @/linux-64::__glibc==2.28=0:  40%|████      | 2/5 [00:00<00:01,  2.04it/s]
Examining h2o==3.24.0.2:  40%|████      | 2/5 [00:00<00:01,  2.04it/s]              
Examining pip:  80%|████████  | 4/5 [00:01<00:00,  2.78it/s]          
Examining openjdk:  80%|████████  | 4/5 [00:10<00:00,  2.78it/s]
Examining openjdk: 100%|██████████| 5/5 [00:10<00:00,  3.19s/it]
                                                                
Examining conflict for python h2o pip:   0%|          | 0/5 [00:00<?, ?it/s]
Examining conflict for python pip:  40%|████      | 2/5 [00:00<00:00,  7.88it/s]
Examining conflict for python openjdk:  60%|██████    | 3/5 [00:00<00:00,  8.36it/s]
Examining conflict for python openjdk h2o pip:  80%|████████  | 4/5 [00:00<00:00,  8.33it/s]
Examining conflict for h2o pip:  80%|████████  | 4/5 [00:00<00:00,  8.33it/s]               
Examining conflict for h2o pip: 100%|██████████| 5/5 [00:00<00:00,  5.08it/s]
                                                                             

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed
Solving environment: ...working... 
Examining python=3.7:   0%|          | 0/5 [00:00<?, ?it/s]  
Examining @/linux-64::__glibc==2.28=0:  40%|████      | 2/5 [00:01<00:01,  1.87it/s]
Examining h2o==3.24.0.2:  40%|████      | 2/5 [00:01<00:01,  1.87it/s]              
Examining pip:  80%|████████  | 4/5 [00:01<00:00,  2.56it/s]          
Examining openjdk: 100%|██████████| 5/5 [00:11<00:00,  3.29s/it]
                                                                
Examining conflict for python h2o pip:   0%|          | 0/5 [00:00<?, ?it/s]
Examining conflict for python pip:  40%|████      | 2/5 [00:00<00:00,  7.38it/s]
Examining conflict for python openjdk:  60%|██████    | 3/5 [00:00<00:00,  7.79it/s]
Examining conflict for python openjdk h2o pip:  80%|████████  | 4/5 [00:00<00:00,  7.81it/s]
Examining conflict for h2o pip:  80%|████████  | 4/5 [00:00<00:00,  7.81it/s]               
Examining conflict for h2o pip: 100%|██████████| 5/5 [00:00<00:00,  4.75it/s]
                                                                             

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - h2o==3.24.0.2 -> python[version='>=2.7,<2.8.0a0|>=3.5,<3.6.0a0|>=3.6,<3.7.0a0']

Your python: python=3.7

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package setuptools conflicts for:
pip -> setuptools
python=3.7 -> pip -> setuptools

Package _libgcc_mutex conflicts for:
python=3.7 -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1',build='main|conda_forge']
openjdk -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1',build='main|conda_forge']

Package certifi conflicts for:
h2o==3.24.0.2 -> requests[version='>=2.10'] -> certifi[version='>=2017.4.17']
pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']

Package libstdcxx-ng conflicts for:
python=3.7 -> libstdcxx-ng[version='>=4.9|>=7.3.0|>=7.5.0|>=7.2.0']
h2o==3.24.0.2 -> python[version='>=3.6,<3.7.0a0'] -> libstdcxx-ng[version='>=4.9|>=7.3.0|>=7.5.0|>=7.2.0']
pip -> python[version='>=3'] -> libstdcxx-ng[version='>=4.9|>=7.3.0|>=7.5.0|>=7.2.0']
openjdk -> libstdcxx-ng[version='>=7.3.0|>=7.5.0']


Error: bentoml-cli containerize failed: The command '/bin/sh -c if [ -f /bento/bentoml-init.sh ]; then bash -c /bento/bentoml-init.sh; fi' returned a non-zero code: 1
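The build fails because the conda package h2o==3.24.0.2 lists Python constraints that exclude the 3.7 interpreter in the base image. The check conda performs can be approximated with the sketch below. It is a simplified illustration, not conda's actual solver: it handles only the `>=` and `<` operators that appear in the error message and ignores pre-release suffixes such as `a0`. The helper names `parse_version` and `satisfies` are hypothetical.

```python
import operator
import re

# Constraint copied from the conda error for h2o==3.24.0.2.
H2O_PYTHON_SPEC = ">=2.7,<2.8.0a0|>=3.5,<3.6.0a0|>=3.6,<3.7.0a0"

# Only the two operators that appear in the constraint are handled.
OPERATORS = {">=": operator.ge, "<": operator.lt}


def parse_version(text, width=3):
    """Convert '3.7.0a0' to (3, 7, 0); pre-release suffixes are ignored."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad so that '3.7' and '3.7.0' compare as equal.
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def satisfies(version, spec):
    """Return True if version matches any '|'-separated alternative in spec."""
    candidate = parse_version(version)
    for alternative in spec.split("|"):
        clauses = [re.match(r"(>=|<)(.+)", c).groups() for c in alternative.split(",")]
        if all(OPERATORS[op](candidate, parse_version(bound)) for op, bound in clauses):
            return True
    return False


for python_version in ("3.6", "3.7"):
    print(python_version, satisfies(python_version, H2O_PYTHON_SPEC))
```

Under these simplifications the check passes for Python 3.6 and fails for Python 3.7, which matches the error. This suggests that the bundle needs either a base image whose Python version falls inside the listed ranges or an h2o build published for Python 3.7; neither option is verified here.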
In [ ]:
!docker run -p 5000:5000 loanprediction
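Once a container is running and publishing port 5000, the model can be queried over HTTP. The sketch below only builds the request and does not send it. It assumes the service exposes its predict API at /predict and accepts a JSON list of records, which is how the DataframeInput adapter is commonly configured; both points are assumptions to verify against your own service. The column names in `sample_records` are hypothetical placeholders.

```python
import json
import urllib.request


def build_predict_request(records, host="localhost", port=5000):
    """Build (but do not send) a POST request carrying records as JSON."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/predict",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical input rows; real requests need the columns the model was trained on.
sample_records = [{"loan_amnt": 10000, "term": "36 months"}]
request = build_predict_request(sample_records)
print(request.get_method(), request.full_url)

# To send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```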

Load saved BentoService

bentoml.load is the API for loading a BentoML packaged model in Python:

In [125]:
import pandas as pd

loaded_bentoml_svc = bentoml.load(saved_path)
sample_data = pd.read_csv('sample_data.csv')
result = loaded_bentoml_svc.predict(sample_data)
print(result)
[2020-02-24 17:21:25,109] WARNING - BentoML local changes detected - Local BentoML repository including all code changes will be bundled together with the BentoService bundle. When used with docker, the base docker image will be default to same version as last PyPI release at version: 0.6.2. You can also force bentoml to use a specific version for deploying your BentoService bundle, by setting the config 'core/bentoml_deploy_version' to a pinned version or your custom BentoML on github, e.g.:'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
[2020-02-24 17:21:25,121] WARNING - Saved BentoService bundle version mismatch: loading BentoServie bundle create with BentoML version 0.6.2,  but loading from BentoML version 0.6.2+16.g7795c2f
[2020-02-24 17:21:25,122] WARNING - Module `loan_prediction` already loaded, using existing imported module.
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
Warning: Your H2O cluster version is too old (10 months and 7 days)! Please download and install the latest version from http://h2o.ai/download/
H2O cluster uptime: 2 hours 30 mins
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.24.0.2
H2O cluster version age: 10 months and 7 days !!!
H2O cluster name: H2O_from_python_bozhaoyu_7bamxr
H2O cluster total nodes: 1
H2O cluster free memory: 3.805 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy: None
H2O internal security: False
H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version: 3.7.3 final
[2020-02-24 17:21:25,414] WARNING - BentoML local changes detected - Local BentoML repository including all code changes will be bundled together with the BentoService bundle. When used with docker, the base docker image will be default to same version as last PyPI release at version: 0.6.2. You can also force bentoml to use a specific version for deploying your BentoService bundle, by setting the config 'core/bentoml_deploy_version' to a pinned version or your custom BentoML on github, e.g.:'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
Parse progress: |█████████████████████████████████████████████████████████| 100%
xgboost prediction progress: |████████████████████████████████████████████| 100%
       predict  Charged Off  Fully Paid
0  Charged Off     0.436739    0.563261
1   Fully Paid     0.056414    0.943586
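Row 0 is labeled Charged Off even though Fully Paid has the higher probability (0.563261). A likely explanation, stated here as an assumption rather than something confirmed by this notebook, is that H2O binomial models assign labels with a model-specific threshold on the positive-class probability (by default the threshold that maximizes F1) instead of a fixed 0.5. The sketch below reproduces the labels from the probabilities; the threshold value 0.6 is hypothetical and was picked only because it lies between the two observed probabilities.

```python
# Rows copied from the prediction output above.
PREDICTIONS = [
    {"predict": "Charged Off", "Charged Off": 0.436739, "Fully Paid": 0.563261},
    {"predict": "Fully Paid", "Charged Off": 0.056414, "Fully Paid": 0.943586},
]

# Hypothetical value; the real threshold is stored with the trained model.
HYPOTHETICAL_THRESHOLD = 0.6


def label_from_probability(p_fully_paid, threshold):
    """Assign the positive class only when its probability reaches the threshold."""
    return "Fully Paid" if p_fully_paid >= threshold else "Charged Off"


relabeled = [
    label_from_probability(row["Fully Paid"], HYPOTHETICAL_THRESHOLD)
    for row in PREDICTIONS
]
naive = [label_from_probability(row["Fully Paid"], 0.5) for row in PREDICTIONS]

print("with threshold 0.6:", relabeled)
print("with threshold 0.5:", naive)
```

With the hypothetical threshold the labels match the predict column, while a 0.5 cutoff would label both rows Fully Paid. Downstream code should therefore rely on the predict column or on an explicitly chosen threshold, not on whichever probability is larger.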

Launch inference job from CLI

The BentoML CLI supports loading and running a packaged model directly from the command line. With the DataframeInput adapter, the command can read the input DataFrame from a CLI argument or from a local CSV or JSON file:

In [127]:
!bentoml run LoanPrediction:latest predict --input sample_data.csv
[2020-02-24 17:30:05,013] INFO - Getting latest version LoanPrediction:20200224153935_977ED8
[2020-02-24 17:30:05,014] WARNING - BentoML local changes detected - Local BentoML repository including all code changes will be bundled together with the BentoService bundle. When used with docker, the base docker image will be default to same version as last PyPI release at version: 0.6.2. You can also force bentoml to use a specific version for deploying your BentoService bundle, by setting the config 'core/bentoml_deploy_version' to a pinned version or your custom BentoML on github, e.g.:'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
[2020-02-24 17:30:05,028] WARNING - Saved BentoService bundle version mismatch: loading BentoServie bundle create with BentoML version 0.6.2,  but loading from BentoML version 0.6.2+16.g7795c2f
[2020-02-24 17:30:05,114] WARNING - BentoML local changes detected - Local BentoML repository including all code changes will be bundled together with the BentoService bundle. When used with docker, the base docker image will be default to same version as last PyPI release at version: 0.6.2. You can also force bentoml to use a specific version for deploying your BentoService bundle, by setting the config 'core/bentoml_deploy_version' to a pinned version or your custom BentoML on github, e.g.:'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
Warning: Your H2O cluster version is too old (10 months and 7 days)! Please download and install the latest version from http://h2o.ai/download/
--------------------------  ---------------------------------------------------
H2O cluster uptime:         2 hours 38 mins
H2O cluster timezone:       America/Los_Angeles
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.2
H2O cluster version age:    10 months and 7 days !!!
H2O cluster name:           H2O_from_python_bozhaoyu_7bamxr
H2O cluster total nodes:    1
H2O cluster free memory:    3.805 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healthy
H2O connection url:         http://localhost:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             3.7.3 final
--------------------------  ---------------------------------------------------
[2020-02-24 17:30:06,948] WARNING - BentoML local changes detected - Local BentoML repository including all code changes will be bundled together with the BentoService bundle. When used with docker, the base docker image will be default to same version as last PyPI release at version: 0.6.2. You can also force bentoml to use a specific version for deploying your BentoService bundle, by setting the config 'core/bentoml_deploy_version' to a pinned version or your custom BentoML on github, e.g.:'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
Parse progress: |█████████████████████████████████████████████████████████| 100%
xgboost prediction progress: |████████████████████████████████████████████| 100%
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset column 'emp_title' has levels not trained on: [Sr, Project Coordinator]
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset column 'hardship_reason' has levels not trained on: [nan]
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset column 'hardship_loan_status' has levels not trained on: [nan]
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset column 'hardship_status' has levels not trained on: [nan]
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'initial_list_status': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'mths_since_last_delinq': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'mths_since_last_record': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_amount': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_start_date': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_end_date': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'payment_plan_start_date': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_dpd': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'orig_projected_additional_accrued_interest': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_payoff_balance_amount': substituting in a column of NaN
  warnings.warn(w)
/usr/local/anaconda3/envs/dev-py3/lib/python3.7/site-packages/h2o/job.py:69: UserWarning: Test/Validation dataset is missing column 'hardship_last_payment_amount': substituting in a column of NaN
  warnings.warn(w)
       predict  Charged Off  Fully Paid
0  Charged Off     0.436739    0.563261
1   Fully Paid     0.056414    0.943586
H2O session _sid_80c4 closed.
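The UserWarning lines above show that the input file lacks several columns the model was trained on, so H2O substitutes NaN for them. A small pre-flight check can surface this before inference runs. The sketch below uses only the standard library; the helper `missing_columns` is hypothetical, the expected column names are a subset copied from the warnings, and the CSV content is made up for illustration.

```python
import csv
import io

# Subset of the columns named in the warnings above.
EXPECTED_COLUMNS = [
    "initial_list_status",
    "mths_since_last_delinq",
    "hardship_amount",
]


def missing_columns(csv_text, expected_columns):
    """Return the expected columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = set(header)
    return [name for name in expected_columns if name not in present]


# Hypothetical file content; in practice read sample_data.csv from disk.
sample_csv = "loan_amnt,initial_list_status\n10000,w\n"
absent = missing_columns(sample_csv, EXPECTED_COLUMNS)
print("missing columns:", absent)
```

Whether a missing column matters depends on the model; the check only reports the gap so that NaN substitution is a deliberate choice and not a surprise.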

Deployment Options

If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:

If the cloud platform you are working with is not on the list above, try out these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:

Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy:
