Copyright 2017 - 2020 Patrick Hall and the H2O.ai team
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
DISCLAIMER: This notebook is not legal compliance advice.
A key to building interpretable models is to limit their complexity. The more complex a model is, the harder it is to explain and understand. Overly complex models can also make unstable predictions on new data, which is both difficult to explain and makes models harder to trust. Monotonicity constraints not only simplify models, but do so in a way that is somewhat natural for human reasoning, increasing the transparency of predictive models. Under monotonicity constraints, model predictions can only increase or only decrease as an input variable value increases, and the direction of the constraint is typically specified by the user for logical reasons. For instance, a model might be constrained to produce only increasing probabilities of a certain medical condition as a patient's age increases, or to make only increasing predictions for home prices as a home's square footage increases.
In this notebook a gradient boosting machine (GBM) is trained with monotonicity constraints to predict credit card payment defaults, using the UCI credit card default data, Python, NumPy, Pandas, and XGBoost. First, the credit card default data is loaded and prepared. Then Pearson correlation with the prediction target is used to determine the direction of the monotonicity constraints for each input variable and the model is trained. After the model is trained, partial dependence and individual conditional expectation (ICE) plots are used to analyze and verify the model's monotonic behavior. Finally an example of creating regulator mandated reason codes from high fidelity Shapley explanations for any model prediction is presented. This combination of monotonic XGBoost, partial dependence, ICE, and Shapley explanations is probably the most direct way to create an interpretable machine learning model today.
Note: As of the h2o 3.24 "Yates" release, Shapley values are supported in h2o, in addition to GBM monotonicity constraints and partial dependence. To see Shapley values and monotonicity constraints for an h2o GBM in action please see: https://github.com/jphall663/interpretable_machine_learning_with_python/blob/master/dia.ipynb.
Let's start with Python package imports. NumPy is used for basic array, vector, and matrix calculations. Pandas is used for data frame manipulation, shap is used for Shapley value calculations, Matplotlib is used for plotting, and XGBoost is used to train a GBM with monotonicity constraints.
import numpy as np # array, vector, matrix calculations
import pandas as pd # DataFrame handling
import shap # for consistent, signed variable importance measurements
import xgboost as xgb # gradient boosting machines (GBMs)
import matplotlib.pyplot as plt # plotting
pd.options.display.max_columns = 999 # enable display of all columns in notebook
# enables display of plots in notebook
%matplotlib inline
np.random.seed(12345) # set random seed for reproducibility
UCI credit card default data: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
The UCI credit card default data contains demographic and payment information about credit card customers in Taiwan in the year 2005. The data set contains 23 input variables:
- LIMIT_BAL: Amount of given credit (NT dollar)
- SEX: 1 = male; 2 = female
- EDUCATION: 1 = graduate school; 2 = university; 3 = high school; 4 = others
- MARRIAGE: 1 = married; 2 = single; 3 = others
- AGE: Age in years
- PAY_0, PAY_2-PAY_6: History of past payment. PAY_0 = the repayment status in September, 2005; PAY_2 = the repayment status in August, 2005; ...; PAY_6 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
- BILL_AMT1-BILL_AMT6: Amount of bill statement (NT dollar). BILL_AMT1 = amount of bill statement in September, 2005; BILL_AMT2 = amount of bill statement in August, 2005; ...; BILL_AMT6 = amount of bill statement in April, 2005.
- PAY_AMT1-PAY_AMT6: Amount of previous payment (NT dollar). PAY_AMT1 = amount paid in September, 2005; PAY_AMT2 = amount paid in August, 2005; ...; PAY_AMT6 = amount paid in April, 2005.

These 23 input variables are used to predict the target variable: whether or not a customer defaulted on their credit card bill in late 2005. Because XGBoost accepts only numeric inputs, all variables will be treated as numeric.
The credit card default data is available as an .xls file. Pandas reads .xls files automatically, so it's used to load the credit card default data and give the prediction target a shorter name: DEFAULT_NEXT_MONTH.
# import XLS file
path = 'default_of_credit_card_clients.xls'
data = pd.read_excel(path,
skiprows=1) # skip the first row of the spreadsheet
# remove spaces from target column name
data = data.rename(columns={'default payment next month': 'DEFAULT_NEXT_MONTH'})
The shorthand name y is assigned to the prediction target. X is assigned to all other input variables in the credit card default data except the row identifier, ID.
# assign target and inputs for GBM
y = 'DEFAULT_NEXT_MONTH'
X = [name for name in data.columns if name not in [y, 'ID']]
print('y =', y)
print('X =', X)
y = DEFAULT_NEXT_MONTH X = ['LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']
The Pandas describe() function displays a brief description of the credit card default data. The input variables SEX, EDUCATION, MARRIAGE, PAY_0-PAY_6, and the prediction target DEFAULT_NEXT_MONTH are really categorical variables, but they have already been encoded into meaningful numeric, integer values, which is great for XGBoost. Also, there are no missing values in this dataset.
data[X + [y]].describe() # display descriptive statistics for all columns
LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | DEFAULT_NEXT_MONTH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 3.000000e+04 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 3.000000e+04 | 30000.00000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 |
mean | 167484.322667 | 1.603733 | 1.853133 | 1.551867 | 35.485500 | -0.016700 | -0.133767 | -0.166200 | -0.220667 | -0.266200 | -0.291100 | 51223.330900 | 49179.075167 | 4.701315e+04 | 43262.948967 | 40311.400967 | 38871.760400 | 5663.580500 | 5.921163e+03 | 5225.68150 | 4826.076867 | 4799.387633 | 5215.502567 | 0.221200 |
std | 129747.661567 | 0.489129 | 0.790349 | 0.521970 | 9.217904 | 1.123802 | 1.197186 | 1.196868 | 1.169139 | 1.133187 | 1.149988 | 73635.860576 | 71173.768783 | 6.934939e+04 | 64332.856134 | 60797.155770 | 59554.107537 | 16563.280354 | 2.304087e+04 | 17606.96147 | 15666.159744 | 15278.305679 | 17777.465775 | 0.415062 |
min | 10000.000000 | 1.000000 | 0.000000 | 0.000000 | 21.000000 | -2.000000 | -2.000000 | -2.000000 | -2.000000 | -2.000000 | -2.000000 | -165580.000000 | -69777.000000 | -1.572640e+05 | -170000.000000 | -81334.000000 | -339603.000000 | 0.000000 | 0.000000e+00 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 50000.000000 | 1.000000 | 1.000000 | 1.000000 | 28.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 3558.750000 | 2984.750000 | 2.666250e+03 | 2326.750000 | 1763.000000 | 1256.000000 | 1000.000000 | 8.330000e+02 | 390.00000 | 296.000000 | 252.500000 | 117.750000 | 0.000000 |
50% | 140000.000000 | 2.000000 | 2.000000 | 2.000000 | 34.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 22381.500000 | 21200.000000 | 2.008850e+04 | 19052.000000 | 18104.500000 | 17071.000000 | 2100.000000 | 2.009000e+03 | 1800.00000 | 1500.000000 | 1500.000000 | 1500.000000 | 0.000000 |
75% | 240000.000000 | 2.000000 | 2.000000 | 2.000000 | 41.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 67091.000000 | 64006.250000 | 6.016475e+04 | 54506.000000 | 50190.500000 | 49198.250000 | 5006.000000 | 5.000000e+03 | 4505.00000 | 4013.250000 | 4031.500000 | 4000.000000 | 0.000000 |
max | 1000000.000000 | 2.000000 | 6.000000 | 3.000000 | 79.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | 964511.000000 | 983931.000000 | 1.664089e+06 | 891586.000000 | 927171.000000 | 961664.000000 | 873552.000000 | 1.684259e+06 | 896040.00000 | 621000.000000 | 426529.000000 | 528666.000000 | 1.000000 |
Monotonic relationships are much easier to explain to colleagues, bosses, customers, and regulators than more complex, non-monotonic relationships. Monotonicity constraints may also prevent overfitting and reduce error due to variance on new data.
To train a transparent monotonic classifier, constraints must be supplied to XGBoost that determine whether the learned relationship between an input variable and the prediction target DEFAULT_NEXT_MONTH will be increasing or decreasing as that input variable increases. Pearson correlation provides a linear measure of the direction of the relationship between each input variable and the target. If the pair-wise Pearson correlation between an input and DEFAULT_NEXT_MONTH is positive, the input will be constrained to have an increasing relationship with the predictions for DEFAULT_NEXT_MONTH. If the pair-wise Pearson correlation is negative, the input will be constrained to have a decreasing relationship with the predictions for DEFAULT_NEXT_MONTH.
Constraints are supplied to XGBoost in the form of a Python tuple with length equal to the number of inputs. Each item in the tuple is associated with an input variable based on its index in the tuple. The first constraint in the tuple is associated with the first variable in the training data, the second constraint in the tuple is associated with the second variable in the training data, and so on. The constraints themselves take the form of a 1 for a positive relationship and a -1 for a negative relationship.
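As a quick illustration of this mapping, with three hypothetical inputs whose correlations with the target are made-up values, NumPy's sign function converts each correlation into its constraint:

```python
import numpy as np

# hypothetical pair-wise Pearson correlations between three inputs and the target
correlations = np.array([0.32, -0.15, 0.02])

# 1 = constrain to an increasing relationship, -1 = constrain to a decreasing one
mono_constraints = tuple(int(i) for i in np.sign(correlations))
print(mono_constraints)  # (1, -1, 1)
```

The same pattern is applied to the real correlation values below.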
The Pandas .corr() function returns the pair-wise Pearson correlation between variables in a Pandas DataFrame. Because DEFAULT_NEXT_MONTH is the last column in the data DataFrame, the last column of the Pearson correlation matrix indicates the direction of the linear relationship between each input variable and the prediction target, DEFAULT_NEXT_MONTH. According to the calculated values, as a customer's balance limit (LIMIT_BAL), bill amounts (BILL_AMT1-BILL_AMT6), and payment amounts (PAY_AMT1-PAY_AMT6) increase, their probability of default tends to decrease. However, as a customer's number of late payments increases (PAY_0, PAY_2-PAY_6), their probability of default usually increases. In general, the Pearson correlation values make sense, and they will be used to ensure that the modeled relationships make sense as well. (Pearson correlation values between the target variable, DEFAULT_NEXT_MONTH, and each input variable are displayed directly below.)
# displays last column of Pearson correlation matrix as Pandas DataFrame
pd.DataFrame(data[X + [y]].corr()[y]).iloc[:-1]
DEFAULT_NEXT_MONTH | |
---|---|
LIMIT_BAL | -0.153520 |
SEX | -0.039961 |
EDUCATION | 0.028006 |
MARRIAGE | -0.024339 |
AGE | 0.013890 |
PAY_0 | 0.324794 |
PAY_2 | 0.263551 |
PAY_3 | 0.235253 |
PAY_4 | 0.216614 |
PAY_5 | 0.204149 |
PAY_6 | 0.186866 |
BILL_AMT1 | -0.019644 |
BILL_AMT2 | -0.014193 |
BILL_AMT3 | -0.014076 |
BILL_AMT4 | -0.010156 |
BILL_AMT5 | -0.006760 |
BILL_AMT6 | -0.005372 |
PAY_AMT1 | -0.072929 |
PAY_AMT2 | -0.058579 |
PAY_AMT3 | -0.056250 |
PAY_AMT4 | -0.056827 |
PAY_AMT5 | -0.055124 |
PAY_AMT6 | -0.053183 |
The last column of the Pearson correlation matrix is transformed from a numeric column in a Pandas DataFrame into a Python tuple of 1s and -1s that will be used to specify monotonicity constraints for each input variable in XGBoost. If the Pearson correlation between an input variable and DEFAULT_NEXT_MONTH is positive, a positive monotonic relationship constraint is specified for that variable using 1. If the correlation is negative, a negative monotonic constraint is specified using -1. (Specifying 0 indicates that no constraint should be used.) The resulting tuple will be passed to XGBoost when the GBM model is trained.
# creates a tuple in which positive correlation values are assigned a 1
# and negative correlation values are assigned a -1
mono_constraints = tuple([int(i) for i in np.sign(data[X + [y]].corr()[y].values[:-1])])
# (-1, -1, 1, -1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1)
XGBoost is a very accurate, open source GBM library for regression and classification tasks. XGBoost can learn complex relationships between input variables and a target variable, but here the monotone_constraints tuning parameter is used to enforce monotonicity between inputs and the prediction for DEFAULT_NEXT_MONTH. XGBoost's early stopping functionality is also used to limit overfitting to the training data.
XGBoost is available from: https://github.com/dmlc/xgboost and the implementation of XGBoost is described in detail here: http://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf.
After training, GBM variable importance is calculated and displayed. GBM variable importance is a global measure of the overall impact of an input variable on the GBM model predictions. Global variable importance values give an indication of the magnitude of a variable's contribution to model predictions for all observations. To enhance trust in the GBM model, variable importance values should typically conform to human domain knowledge and reasonable expectations.
The credit card default data is split into training and test sets to monitor and prevent overtraining. Reproducibility is another important factor in creating trustworthy models, and randomly splitting datasets can introduce randomness in model predictions and other results. A random seed is used here to ensure the data split is reproducible.
np.random.seed(12345) # set random seed for reproducibility
split_ratio = 0.7 # 70%/30% train/test split
# execute split
split = np.random.rand(len(data)) < split_ratio
train = data[split]
test = data[~split]
# summarize split
print('Train data rows = %d, columns = %d' % (train.shape[0], train.shape[1]))
print('Test data rows = %d, columns = %d' % (test.shape[0], test.shape[1]))
Train data rows = 20946, columns = 25 Test data rows = 9054, columns = 25
To train an XGBoost classifier, the training and test data must be converted from Pandas DataFrames into XGBoost's internal DMatrix format. The DMatrix() function in the XGBoost package is used to convert the data. Many XGBoost tuning parameters must be specified as well. Typically a grid search would be performed to identify the best parameters for a given modeling task. For brevity's sake, a previously-discovered set of good tuning parameters is specified here. Notice that the monotonicity constraints are passed to XGBoost using the monotone_constraints parameter. Because gradient boosting methods typically resample training data, an additional random seed is also specified for XGBoost using the seed parameter to create reproducible predictions, error rates, and variable importance values. To avoid overfitting, the early_stopping_rounds parameter is used to stop the training process after the test area under the curve (AUC) statistic fails to increase for 50 iterations.
# XGBoost uses its own DMatrix data structure, not NumPy arrays or Pandas DataFrames
dtrain = xgb.DMatrix(train[X], train[y])
dtest = xgb.DMatrix(test[X], test[y])
# used to calibrate predictions to mean of y
base_y = train[y].mean()
# tuning parameters
params = {
'objective': 'binary:logistic', # produces 0-1 probabilities for binary classification
'booster': 'gbtree', # base learner will be decision tree
'eval_metric': 'auc', # stop training based on maximum AUC, AUC always between 0-1
'eta': 0.08, # learning rate
'subsample': 0.9, # use 90% of rows in each decision tree
'colsample_bytree': 0.9, # use 90% of columns in each decision tree
'max_depth': 15, # allow decision trees to grow to depth of 15
'monotone_constraints':mono_constraints, # 1 = increasing relationship, -1 = decreasing relationship
'base_score': base_y, # calibrate predictions to mean of y
'seed': 12345 # set random seed for reproducibility
}
# watchlist is used for early stopping
watchlist = [(dtrain, 'train'), (dtest, 'eval')]
# train model
xgb_model = xgb.train(params, # set tuning parameters from above
dtrain, # training data
1000, # maximum of 1000 iterations (trees)
evals=watchlist, # use watchlist for early stopping
early_stopping_rounds=50, # stop after 50 iterations (trees) without increase in AUC
verbose_eval=True) # display iteration progress
[0] train-auc:0.738066 eval-auc:0.733449 Multiple eval metrics have been passed: 'eval-auc' will be used for early stopping. Will train until eval-auc hasn't improved in 50 rounds. [1] train-auc:0.772991 eval-auc:0.769311 [2] train-auc:0.775536 eval-auc:0.772624 [3] train-auc:0.776248 eval-auc:0.771985 [4] train-auc:0.777216 eval-auc:0.772796 [5] train-auc:0.777782 eval-auc:0.773066 [6] train-auc:0.777783 eval-auc:0.773471 [7] train-auc:0.777856 eval-auc:0.773591 [8] train-auc:0.777633 eval-auc:0.773209 [9] train-auc:0.777417 eval-auc:0.772892 [10] train-auc:0.777519 eval-auc:0.772666 [11] train-auc:0.778476 eval-auc:0.773346 [12] train-auc:0.778541 eval-auc:0.773312 [13] train-auc:0.778351 eval-auc:0.773266 [14] train-auc:0.778519 eval-auc:0.773695 [15] train-auc:0.779383 eval-auc:0.774314 [16] train-auc:0.779555 eval-auc:0.77473 [17] train-auc:0.780178 eval-auc:0.775043 [18] train-auc:0.780743 eval-auc:0.775246 [19] train-auc:0.78146 eval-auc:0.776021 [20] train-auc:0.781944 eval-auc:0.776477 [21] train-auc:0.782181 eval-auc:0.776696 [22] train-auc:0.782319 eval-auc:0.776915 [23] train-auc:0.78263 eval-auc:0.776998 [24] train-auc:0.782965 eval-auc:0.777144 [25] train-auc:0.783396 eval-auc:0.777498 [26] train-auc:0.783791 eval-auc:0.777791 [27] train-auc:0.784151 eval-auc:0.778091 [28] train-auc:0.784382 eval-auc:0.778375 [29] train-auc:0.784678 eval-auc:0.778567 [30] train-auc:0.784912 eval-auc:0.778755 [31] train-auc:0.785231 eval-auc:0.778938 [32] train-auc:0.785418 eval-auc:0.779125 [33] train-auc:0.785598 eval-auc:0.779226 [34] train-auc:0.785838 eval-auc:0.779421 [35] train-auc:0.7861 eval-auc:0.779432 [36] train-auc:0.786347 eval-auc:0.779549 [37] train-auc:0.786633 eval-auc:0.779575 [38] train-auc:0.786833 eval-auc:0.779668 [39] train-auc:0.787093 eval-auc:0.779743 [40] train-auc:0.787307 eval-auc:0.779926 [41] train-auc:0.787602 eval-auc:0.780015 [42] train-auc:0.787819 eval-auc:0.780089 [43] train-auc:0.787943 eval-auc:0.780196 [44] train-auc:0.788092 
eval-auc:0.780239 [45] train-auc:0.788194 eval-auc:0.780198 [46] train-auc:0.788298 eval-auc:0.780235 [47] train-auc:0.788422 eval-auc:0.780305 [48] train-auc:0.78862 eval-auc:0.780372 [49] train-auc:0.788803 eval-auc:0.780392 [50] train-auc:0.789049 eval-auc:0.780558 [51] train-auc:0.789221 eval-auc:0.780659 [52] train-auc:0.789405 eval-auc:0.780756 [53] train-auc:0.789553 eval-auc:0.780721 [54] train-auc:0.78968 eval-auc:0.780791 [55] train-auc:0.789779 eval-auc:0.780849 [56] train-auc:0.789875 eval-auc:0.780932 [57] train-auc:0.79002 eval-auc:0.780963 [58] train-auc:0.790156 eval-auc:0.781038 [59] train-auc:0.790292 eval-auc:0.781085 [60] train-auc:0.790403 eval-auc:0.781079 [61] train-auc:0.790509 eval-auc:0.781091 [62] train-auc:0.790554 eval-auc:0.781061 [63] train-auc:0.790635 eval-auc:0.781148 [64] train-auc:0.790679 eval-auc:0.781166 [65] train-auc:0.790822 eval-auc:0.781155 [66] train-auc:0.790896 eval-auc:0.781178 [67] train-auc:0.790977 eval-auc:0.781188 [68] train-auc:0.791155 eval-auc:0.781211 [69] train-auc:0.791239 eval-auc:0.781179 [70] train-auc:0.7914 eval-auc:0.781347 [71] train-auc:0.791525 eval-auc:0.78134 [72] train-auc:0.791578 eval-auc:0.781312 [73] train-auc:0.791691 eval-auc:0.781325 [74] train-auc:0.791747 eval-auc:0.781323 [75] train-auc:0.791801 eval-auc:0.781304 [76] train-auc:0.791844 eval-auc:0.781313 [77] train-auc:0.7919 eval-auc:0.781325 [78] train-auc:0.792056 eval-auc:0.781448 [79] train-auc:0.792088 eval-auc:0.781397 [80] train-auc:0.792151 eval-auc:0.78138 [81] train-auc:0.792173 eval-auc:0.781388 [82] train-auc:0.792236 eval-auc:0.781301 [83] train-auc:0.792327 eval-auc:0.781355 [84] train-auc:0.792372 eval-auc:0.781356 [85] train-auc:0.792402 eval-auc:0.781321 [86] train-auc:0.79247 eval-auc:0.781286 [87] train-auc:0.792521 eval-auc:0.781283 [88] train-auc:0.792543 eval-auc:0.781265 [89] train-auc:0.792595 eval-auc:0.781255 [90] train-auc:0.792618 eval-auc:0.781242 [91] train-auc:0.792673 eval-auc:0.781309 [92] 
train-auc:0.792766 eval-auc:0.781357 [93] train-auc:0.792826 eval-auc:0.781381 [94] train-auc:0.792914 eval-auc:0.781387 [95] train-auc:0.792967 eval-auc:0.781385 [96] train-auc:0.793016 eval-auc:0.781375 [97] train-auc:0.793053 eval-auc:0.781353 [98] train-auc:0.79312 eval-auc:0.78137 [99] train-auc:0.793145 eval-auc:0.781413 [100] train-auc:0.793191 eval-auc:0.781456 [101] train-auc:0.793256 eval-auc:0.781435 [102] train-auc:0.793282 eval-auc:0.781382 [103] train-auc:0.793322 eval-auc:0.781385 [104] train-auc:0.793346 eval-auc:0.781361 [105] train-auc:0.793399 eval-auc:0.781418 [106] train-auc:0.793436 eval-auc:0.781398 [107] train-auc:0.793511 eval-auc:0.781358 [108] train-auc:0.793578 eval-auc:0.78137 [109] train-auc:0.793655 eval-auc:0.781336 [110] train-auc:0.793683 eval-auc:0.781299 [111] train-auc:0.793701 eval-auc:0.781277 [112] train-auc:0.79371 eval-auc:0.78128 [113] train-auc:0.793745 eval-auc:0.781298 [114] train-auc:0.793817 eval-auc:0.781307 [115] train-auc:0.793838 eval-auc:0.781301 [116] train-auc:0.793867 eval-auc:0.781308 [117] train-auc:0.793877 eval-auc:0.781315 [118] train-auc:0.793917 eval-auc:0.781246 [119] train-auc:0.793947 eval-auc:0.781302 [120] train-auc:0.794 eval-auc:0.781358 [121] train-auc:0.794026 eval-auc:0.781357 [122] train-auc:0.794053 eval-auc:0.781277 [123] train-auc:0.794064 eval-auc:0.781257 [124] train-auc:0.794209 eval-auc:0.781289 [125] train-auc:0.794219 eval-auc:0.781288 [126] train-auc:0.794287 eval-auc:0.781347 [127] train-auc:0.79429 eval-auc:0.781347 [128] train-auc:0.794327 eval-auc:0.781331 [129] train-auc:0.794336 eval-auc:0.781349 [130] train-auc:0.794367 eval-auc:0.781347 [131] train-auc:0.794364 eval-auc:0.78134 [132] train-auc:0.794385 eval-auc:0.781363 [133] train-auc:0.794387 eval-auc:0.781326 [134] train-auc:0.794472 eval-auc:0.78132 [135] train-auc:0.794483 eval-auc:0.781326 [136] train-auc:0.794495 eval-auc:0.781306 [137] train-auc:0.794583 eval-auc:0.781293 [138] train-auc:0.794589 eval-auc:0.781293 
[139] train-auc:0.7946 eval-auc:0.781286 [140] train-auc:0.794642 eval-auc:0.7813 [141] train-auc:0.794656 eval-auc:0.781303 [142] train-auc:0.794654 eval-auc:0.781287 [143] train-auc:0.794695 eval-auc:0.781273 [144] train-auc:0.794705 eval-auc:0.781268 [145] train-auc:0.794708 eval-auc:0.781268 [146] train-auc:0.79471 eval-auc:0.781258 [147] train-auc:0.794775 eval-auc:0.781284 [148] train-auc:0.794782 eval-auc:0.781277 [149] train-auc:0.794794 eval-auc:0.781283 [150] train-auc:0.794815 eval-auc:0.78126 Stopping. Best iteration: [100] train-auc:0.793191 eval-auc:0.781456
By setting pred_contribs=True, XGBoost's predict() function will return Shapley values for each row of the test set. Instead of relying on traditional single-value variable importance measures, local Shapley values for each input will be plotted below to get a more holistic and consistent measurement of the global importance of each input variable. Shapley values are introduced in greater detail in Section 6 below, but for now notice the monotonicity of the input variable contributions displayed in the Shapley summary plot.
# dtest is DMatrix
# shap_values is Numpy array
shap_values = xgb_model.predict(dtest, pred_contribs=True, ntree_limit=xgb_model.best_ntree_limit)
# plot Shapley variable importance summary
shap.summary_plot(shap_values[:, :-1], test[xgb_model.feature_names])
The variable importance ranking should be parsimonious with human domain knowledge and reasonable expectations. In this case, PAY_0 is by far the most important variable. As someone's most recent behavior is a very good indicator of their future behavior, this checks out.
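The global importance behind a ranking like this can be computed directly from the local Shapley values: the mean absolute Shapley contribution per column. A minimal NumPy sketch on a made-up contribution matrix (rows = observations, columns = inputs; not the model's actual values):

```python
import numpy as np

# hypothetical local Shapley contributions for 4 observations and 3 inputs
shap_values = np.array([[ 0.6, -0.1,  0.2],
                        [-0.4,  0.0,  0.1],
                        [ 0.5,  0.2, -0.3],
                        [-0.7,  0.1,  0.0]])

# global importance = mean absolute local contribution for each input
global_importance = np.abs(shap_values).mean(axis=0)

# rank inputs from most to least important
ranking = np.argsort(global_importance)[::-1]
print(global_importance)  # [0.55 0.1  0.15]
print(ranking)            # [0 2 1]
```

Because the measure is built from signed local contributions, it stays consistent with the per-row explanations used later for reason codes.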
Partial dependence plots are used to view the global, average prediction behavior of a variable under the monotonic model. Partial dependence plots show the average prediction of the monotonic model as a function of specific values of an input variable of interest, indicating how the monotonic GBM predictions change based on the values of the input variable of interest, while taking nonlinearity into consideration and averaging out the effects of all other input variables. Partial dependence plots enable increased transparency into the monotonic GBM's mechanisms and enable validation and debugging of the monotonic GBM by comparing a variable's average predictions across its domain to known standards and reasonable expectations. Partial dependence plots are described in greater detail in The Elements of Statistical Learning, section 10.13: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf.
Individual conditional expectation (ICE) plots, a newer and less well-known adaptation of partial dependence plots, can be used to create more localized explanations for a single observation of data using the same basic ideas as partial dependence plots. ICE is also a type of nonlinear sensitivity analysis in which the model predictions for a single observation are measured while a feature of interest is varied over its domain. ICE increases understanding and transparency by displaying the nonlinear behavior of the monotonic GBM. ICE also enhances trust, accountability, and fairness by enabling comparisons of nonlinear behavior to human domain knowledge and reasonable expectations. ICE, as a type of sensitivity analysis, can also engender trust when model behavior on simulated or extreme data points is acceptable. A detailed description of ICE is available in this arXiv preprint: https://arxiv.org/abs/1309.6392.
Because partial dependence and ICE are measured on the same scale, they can be displayed in the same line plot to compare the global, average prediction behavior for the entire model and the local prediction behavior for certain rows of data. Overlaying the two types of curves enables analysis of both global and local behavior simultaneously and provides an indication of the trustworthiness of the average behavior represented by partial dependence. (Partial dependence can be misleading in the presence of strong interactions or correlation. ICE curves diverging from the partial dependence curve can be indicative of such problems.) Histograms are also presented with the partial dependence and ICE curves, to enable a rough measure of epistemic uncertainty for model predictions: predictions based on small amounts of training data are likely less dependable.
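The relationship between the two curve types can be sketched directly: if ICE curves are stored as rows of a matrix (one row per observation, one column per grid point of the variable of interest), partial dependence is just the column-wise mean. A small NumPy illustration with made-up curves:

```python
import numpy as np

# hypothetical ICE curves for 3 observations over 4 grid points
ice_curves = np.array([[0.10, 0.20, 0.30, 0.40],
                       [0.20, 0.30, 0.40, 0.50],
                       [0.30, 0.40, 0.50, 0.60]])

# partial dependence is the average of the ICE curves at each grid point
partial_dependence = ice_curves.mean(axis=0)
print(partial_dependence)  # [0.2 0.3 0.4 0.5]

# large deviations of any ICE curve from the partial dependence curve can
# flag interactions or correlations that make the average misleading
max_divergence = np.abs(ice_curves - partial_dependence).max()
```

Here the curves are parallel, so the divergence is small; widely crossing ICE curves would be the warning sign described above.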
Since partial dependence and ICE will be calculated for several important variables in the GBM model, it's convenient to have a function doing so. It's probably best to analyze partial dependence and ICE for all variables in a model, but only the top three most important input variables will be investigated here. It's also a good idea to analyze partial dependence and ICE on the test data, or other holdout datasets, to see how the model will perform on new data.
This simple function is designed to return partial dependence when it is called for an entire dataset and ICE when it is called for a single row. The bins
argument will be used later to calculate ICE values at the same places in an input variable domain that partial dependence is calculated directly below.
def par_dep(xs, frame, model, resolution=20, bins=None):
""" Creates Pandas DataFrame containing partial dependence for a
single variable.
Args:
xs: Variable for which to calculate partial dependence.
frame: Pandas DataFrame for which to calculate partial dependence.
model: XGBoost model for which to calculate partial dependence.
resolution: The number of points across the domain of xs for which
to calculate partial dependence, default 20.
bins: List of values at which to set xs, default 20 equally-spaced
points between column minimum and maximum.
Returns:
Pandas DataFrame containing partial dependence values.
"""
# turn off pesky Pandas copy warning
pd.options.mode.chained_assignment = None
# initialize empty Pandas DataFrame with correct column names
par_dep_frame = pd.DataFrame(columns=[xs, 'partial_dependence'])
# cache original column values
col_cache = frame.loc[:, xs].copy(deep=True)
# determine values at which to calculate partial dependence
    if bins is None:
min_ = frame[xs].min()
max_ = frame[xs].max()
by = (max_ - min_)/resolution
bins = np.arange(min_, max_, by)
# calculate partial dependence
# by setting column of interest to constant
# and scoring the altered data and taking the mean of the predictions
for j in bins:
frame.loc[:, xs] = j
dframe = xgb.DMatrix(frame)
par_dep_i = pd.DataFrame(model.predict(dframe, ntree_limit=model.best_ntree_limit))
par_dep_j = par_dep_i.mean()[0]
par_dep_frame = par_dep_frame.append({xs:j,
'partial_dependence': par_dep_j},
ignore_index=True)
# return input frame to original cached state
frame.loc[:, xs] = col_cache
return par_dep_frame
The partial dependence for LIMIT_BAL can be seen to decrease as credit balance limits increase. This finding is aligned with the expectation that model predictions will be monotonically decreasing with increasing LIMIT_BAL and parsimonious with well-known business practices in credit lending. Partial dependence for other important variables is displayed in plots further below.
par_dep_PAY_0 = par_dep('PAY_0', test[X], xgb_model) # calculate partial dependence for PAY_0
par_dep_LIMIT_BAL = par_dep('LIMIT_BAL', test[X], xgb_model) # calculate partial dependence for LIMIT_BAL
par_dep_BILL_AMT1 = par_dep('BILL_AMT1', test[X], xgb_model) # calculate partial dependence for BILL_AMT1
# display partial dependence for LIMIT_BAL
par_dep_LIMIT_BAL
LIMIT_BAL | partial_dependence | |
---|---|---|
0 | 10000.0 | 0.266888 |
1 | 59500.0 | 0.243301 |
2 | 109000.0 | 0.224759 |
3 | 158500.0 | 0.216966 |
4 | 208000.0 | 0.214594 |
5 | 257500.0 | 0.209798 |
6 | 307000.0 | 0.206901 |
7 | 356500.0 | 0.198712 |
8 | 406000.0 | 0.197883 |
9 | 455500.0 | 0.197086 |
10 | 505000.0 | 0.194449 |
11 | 554500.0 | 0.194407 |
12 | 604000.0 | 0.190247 |
13 | 653500.0 | 0.190209 |
14 | 703000.0 | 0.190188 |
15 | 752500.0 | 0.190188 |
16 | 802000.0 | 0.190188 |
17 | 851500.0 | 0.190188 |
18 | 901000.0 | 0.190188 |
19 | 950500.0 | 0.190188 |
ICE can be calculated for any row in the training or test data, but without intimate knowledge of a data source it can be difficult to know where to apply ICE. Calculating and analyzing ICE curves for every row of the training and test datasets can be overwhelming, even for the example credit card default dataset. One place to start with ICE is to calculate ICE curves at every decile of predicted probabilities in a dataset, giving an indication of local prediction behavior across the dataset. The function below finds and returns the row indices for the maximum, minimum, and deciles of one column in terms of another -- in this case, the model predictions (p_DEFAULT_NEXT_MONTH) and the row identifier (ID), respectively.
def get_percentile_dict(yhat, id_, frame):

    """ Returns the percentiles of a column, yhat, as the indices based on
        another column, id_.

    Args:
        yhat: Column in which to find percentiles.
        id_: Id column that stores indices for percentiles of yhat.
        frame: Pandas DataFrame containing yhat and id_.

    Returns:
        Dictionary of percentile values and index column values.

    """

    # create a copy of frame and sort it by yhat
    sort_df = frame.copy(deep=True)
    sort_df.sort_values(yhat, inplace=True)
    sort_df.reset_index(inplace=True)

    # find top and bottom percentiles
    percentiles_dict = {}
    percentiles_dict[0] = sort_df.loc[0, id_]
    percentiles_dict[99] = sort_df.loc[sort_df.shape[0] - 1, id_]

    # find 10th-90th percentiles
    inc = sort_df.shape[0] // 10
    for i in range(1, 10):
        percentiles_dict[i * 10] = sort_df.loc[i * inc, id_]

    return percentiles_dict
The values for ID that correspond to the maximum, minimum, and deciles of p_DEFAULT_NEXT_MONTH are displayed below. ICE will be calculated for the rows of the test dataset associated with these ID values.
# merge GBM predictions onto test data
yhat_test = pd.concat([test.reset_index(drop=True), pd.DataFrame(xgb_model.predict(dtest,
ntree_limit=xgb_model.best_ntree_limit))],
axis=1)
yhat_test = yhat_test.rename(columns={0:'p_DEFAULT_NEXT_MONTH'})
# find percentiles of predictions
percentile_dict = get_percentile_dict('p_DEFAULT_NEXT_MONTH', 'ID', yhat_test)
# display percentiles dictionary
# ID values for rows
# from lowest prediction
# to highest prediction
percentile_dict
{0: 23477, 10: 6226, 20: 25603, 30: 12890, 40: 715, 50: 14517, 60: 4908, 70: 7411, 80: 6219, 90: 18421, 99: 17757}
ICE values represent a model's prediction for a single row of data while an input variable of interest is varied across its domain. The values of the input variable are chosen to match the values at which partial dependence was calculated earlier, and ICE is calculated for the three most important variables and for rows at each percentile of the test dataset.
# retrieve bins from original partial dependence calculation
bins_PAY_0 = list(par_dep_PAY_0['PAY_0'])
bins_LIMIT_BAL = list(par_dep_LIMIT_BAL['LIMIT_BAL'])
bins_BILL_AMT1 = list(par_dep_BILL_AMT1['BILL_AMT1'])

# for each percentile in percentile_dict
# create a new column in the par_dep frame
# representing the ICE curve for that percentile
# and the variables of interest
for i in sorted(percentile_dict.keys()):

    col_name = 'Percentile_' + str(i)

    # ICE curves for PAY_0 across percentiles at bins_PAY_0 intervals
    par_dep_PAY_0[col_name] = par_dep('PAY_0',
                                      test[test['ID'] == int(percentile_dict[i])][X],
                                      xgb_model,
                                      bins=bins_PAY_0)['partial_dependence']

    # ICE curves for LIMIT_BAL across percentiles at bins_LIMIT_BAL intervals
    par_dep_LIMIT_BAL[col_name] = par_dep('LIMIT_BAL',
                                          test[test['ID'] == int(percentile_dict[i])][X],
                                          xgb_model,
                                          bins=bins_LIMIT_BAL)['partial_dependence']

    # ICE curves for BILL_AMT1 across percentiles at bins_BILL_AMT1 intervals
    par_dep_BILL_AMT1[col_name] = par_dep('BILL_AMT1',
                                          test[test['ID'] == int(percentile_dict[i])][X],
                                          xgb_model,
                                          bins=bins_BILL_AMT1)['partial_dependence']
#### LIMIT_BAL

Partial dependence and ICE values for rows at the minimum, maximum, and deciles (0%, 10%, 20%, ..., 90%, 99%) of predictions for DEFAULT_NEXT_MONTH, and at the values of LIMIT_BAL used for partial dependence, are shown here. Each column of ICE values will be a curve in the plots below. Each ICE value represents a prediction for a row of test data at the percentile of interest noted in the column name, with LIMIT_BAL set to the value shown in the LIMIT_BAL column. Notice that monotonic decreasing prediction behavior for LIMIT_BAL holds at each displayed percentile of predicted DEFAULT_NEXT_MONTH, helping to validate that the trained GBM's predictions are monotonic for LIMIT_BAL.
par_dep_LIMIT_BAL
 | LIMIT_BAL | partial_dependence | Percentile_0 | Percentile_10 | Percentile_20 | Percentile_30 | Percentile_40 | Percentile_50 | Percentile_60 | Percentile_70 | Percentile_80 | Percentile_90 | Percentile_99 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10000.0 | 0.266888 | 0.009752 | 0.103610 | 0.114130 | 0.161717 | 0.178196 | 0.176423 | 0.215932 | 0.237970 | 0.482309 | 0.600873 | 0.963266 |
1 | 59500.0 | 0.243301 | 0.007765 | 0.095805 | 0.106769 | 0.151972 | 0.155897 | 0.162622 | 0.177366 | 0.206651 | 0.337691 | 0.589542 | 0.943703 |
2 | 109000.0 | 0.224759 | 0.004514 | 0.073701 | 0.089827 | 0.122272 | 0.147910 | 0.129995 | 0.164410 | 0.186139 | 0.330370 | 0.575241 | 0.939937 |
3 | 158500.0 | 0.216966 | 0.003510 | 0.061587 | 0.082731 | 0.108702 | 0.139923 | 0.117246 | 0.158431 | 0.174685 | 0.330370 | 0.571536 | 0.939715 |
4 | 208000.0 | 0.214594 | 0.003208 | 0.059036 | 0.081863 | 0.106850 | 0.137178 | 0.115268 | 0.158413 | 0.171402 | 0.330370 | 0.569508 | 0.938817 |
5 | 257500.0 | 0.209798 | 0.002925 | 0.054268 | 0.078228 | 0.102988 | 0.134296 | 0.113019 | 0.155023 | 0.166766 | 0.330130 | 0.559060 | 0.936782 |
6 | 307000.0 | 0.206901 | 0.002824 | 0.053263 | 0.075723 | 0.100607 | 0.133915 | 0.109701 | 0.152706 | 0.162116 | 0.330130 | 0.555402 | 0.935711 |
7 | 356500.0 | 0.198712 | 0.001872 | 0.046365 | 0.066366 | 0.083248 | 0.128932 | 0.090932 | 0.150245 | 0.147120 | 0.329356 | 0.552223 | 0.935394 |
8 | 406000.0 | 0.197883 | 0.001872 | 0.045775 | 0.064863 | 0.081453 | 0.127613 | 0.090882 | 0.149558 | 0.146809 | 0.324522 | 0.550167 | 0.934938 |
9 | 455500.0 | 0.197086 | 0.001872 | 0.044761 | 0.063219 | 0.079312 | 0.126955 | 0.088818 | 0.149372 | 0.146495 | 0.322438 | 0.550167 | 0.934894 |
10 | 505000.0 | 0.194449 | 0.001847 | 0.044276 | 0.062503 | 0.078314 | 0.121430 | 0.087403 | 0.143454 | 0.143816 | 0.314405 | 0.549231 | 0.934661 |
11 | 554500.0 | 0.194407 | 0.001847 | 0.044262 | 0.062484 | 0.078040 | 0.121430 | 0.087377 | 0.143394 | 0.143775 | 0.314405 | 0.549195 | 0.934661 |
12 | 604000.0 | 0.190247 | 0.001763 | 0.042238 | 0.059409 | 0.074256 | 0.119003 | 0.083055 | 0.139884 | 0.137249 | 0.307714 | 0.547538 | 0.933731 |
13 | 653500.0 | 0.190209 | 0.001763 | 0.042231 | 0.059384 | 0.074229 | 0.119003 | 0.082968 | 0.139812 | 0.137186 | 0.307588 | 0.547495 | 0.933731 |
14 | 703000.0 | 0.190188 | 0.001763 | 0.042211 | 0.059356 | 0.074195 | 0.119003 | 0.082929 | 0.139812 | 0.137126 | 0.307588 | 0.547370 | 0.933731 |
15 | 752500.0 | 0.190188 | 0.001763 | 0.042211 | 0.059356 | 0.074195 | 0.119003 | 0.082929 | 0.139812 | 0.137126 | 0.307588 | 0.547370 | 0.933731 |
16 | 802000.0 | 0.190188 | 0.001763 | 0.042211 | 0.059356 | 0.074195 | 0.119003 | 0.082929 | 0.139812 | 0.137126 | 0.307588 | 0.547370 | 0.933731 |
17 | 851500.0 | 0.190188 | 0.001763 | 0.042211 | 0.059356 | 0.074195 | 0.119003 | 0.082929 | 0.139812 | 0.137126 | 0.307588 | 0.547370 | 0.933731 |
18 | 901000.0 | 0.190188 | 0.001763 | 0.042211 | 0.059356 | 0.074195 | 0.119003 | 0.082929 | 0.139812 | 0.137126 | 0.307588 | 0.547370 | 0.933731 |
19 | 950500.0 | 0.190188 | 0.001763 | 0.042211 | 0.059356 | 0.074195 | 0.119003 | 0.082929 | 0.139812 | 0.137126 | 0.307588 | 0.547370 | 0.933731 |
Overlaying partial dependence onto ICE in a plot is a convenient way to validate and understand both global and local monotonic behavior. Plots of partial dependence curves overlaid onto ICE curves for several percentiles of predictions for DEFAULT_NEXT_MONTH are used to validate monotonic behavior, to describe the GBM's mechanisms, and to compare the most extreme GBM behavior with the average GBM behavior in the test data. Partial dependence and ICE plots are displayed for the three most important variables in the GBM: PAY_0, LIMIT_BAL, and BILL_AMT1.
#### Function to plot partial dependence and ICE
def plot_par_dep_ICE(xs, par_dep_frame):

    """ Plots ICE overlaid onto partial dependence for a single variable.

    Args:
        xs: Name of variable for which to plot ICE and partial dependence.
        par_dep_frame: Name of Pandas DataFrame containing ICE and partial
                       dependence values.

    """

    # initialize figure and axis
    fig, ax = plt.subplots()

    # plot ICE curves
    par_dep_frame.drop('partial_dependence', axis=1).plot(x=xs,
                                                          colormap='gnuplot',
                                                          ax=ax)

    # overlay partial dependence, annotate plot
    par_dep_frame.plot(title='Partial Dependence and ICE for ' + str(xs),
                       x=xs,
                       y='partial_dependence',
                       style='r-',
                       linewidth=3,
                       ax=ax)

    # add legend
    _ = plt.legend(bbox_to_anchor=(1.05, 0),
                   loc=3,
                   borderaxespad=0.)
#### LIMIT_BAL

The monotonic prediction behavior displayed in the partial dependence and ICE tables for LIMIT_BAL is also visible in this plot. Monotonic decreasing behavior is evident at every percentile of predictions for DEFAULT_NEXT_MONTH. Most percentiles of predictions show that the sharpest decreases in probability of default occur when LIMIT_BAL increases just slightly from its lowest values in the test set. However, for the customers most likely to default according to the GBM, no increase in LIMIT_BAL has a strong impact on probability of default. As mentioned previously, the displayed relationship between credit balance limits and probability of default is not uncommon in credit lending. As the displayed histogram shows, prediction behavior above roughly NT$500,000 may have been learned from extremely small samples of data.
plot_par_dep_ICE('LIMIT_BAL', par_dep_LIMIT_BAL) # plot partial dependence and ICE for LIMIT_BAL
_ = train['LIMIT_BAL'].plot(kind='hist', bins=20, title='Histogram: LIMIT_BAL')
#### PAY_0

Monotonic increasing prediction behavior for PAY_0 is displayed for all percentiles of model predictions. Prediction behavior differs across deciles, but is not abnormal or vastly different from the average prediction behavior represented by the red partial dependence curve. The largest jump in predicted probability appears to occur at PAY_0 = 2, or when a customer becomes two months late on their most recent payment. Above PAY_0 = 3 there are few examples from which the model could learn.
plot_par_dep_ICE('PAY_0', par_dep_PAY_0) # plot partial dependence and ICE for PAY_0
_ = train['PAY_0'].plot(kind='hist', bins=20, title='Histogram: PAY_0')
#### BILL_AMT1

Monotonic decreasing prediction behavior for BILL_AMT1 is also displayed for all percentiles. This mild decrease in probability of default as the most recent bill amount increases could be related to wealthier, big-spending customers taking on more debt but also being able to pay it off reliably. Also, customers with negative bills are more likely to default, potentially indicating that charge-offs are being recorded as negative bills. In a mission-critical situation, this issue would require more debugging. Predictions below NT$0 and above NT$400,000 are also based on very little training data.
plot_par_dep_ICE('BILL_AMT1', par_dep_BILL_AMT1) # plot partial dependence and ICE for BILL_AMT1
_ = train['BILL_AMT1'].plot(kind='hist', bins=20, title='Histogram: BILL_AMT1')
Now that the monotonic behavior of the GBM has been verified and compared against domain knowledge and reasonable expectations, a method called Shapley explanations will be used to calculate the local variable importance for any one prediction: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions. Shapley explanations are the only possible consistent local variable importance values. (Here, consistency means that if a variable is more important than another variable in a given prediction, the more important variable's Shapley value will not be smaller in magnitude than the less important variable's Shapley value.) Crucially, Shapley values also always sum to the actual prediction of the XGBoost model. When used in a model-specific context for decision tree models, Shapley values are likely the most accurate local variable importance method available today. In this notebook, XGBoost itself is used to create Shapley values with the pred_contribs parameter to predict(), but the shap package is also available for other types of models: https://github.com/slundberg/shap.
The numeric Shapley values in each column are an estimate of how much each variable contributed to each prediction. Shapley contributions can indicate how a variable and its values were weighted in any given decision by the model. These values are crucially important for machine learning interpretability and are related to "local feature importance", "reason codes", or "turn-down codes." The latter phrases are borrowed from credit scoring. Credit lenders in the U.S. must provide reasons for automatically rejecting a credit application. Reason codes can be easily extracted from Shapley local variable contribution values by ranking the variables that played the largest role in any given decision.
To find the index corresponding to a particular row of interest later, the index of the test DataFrame is reset to begin at 0 and increase sequentially. Without resetting the index, the test DataFrame row indices would still correspond to the original raw data from which the test set was sampled.
test.reset_index(drop=True, inplace=True)
One person who might be of immediate interest is the customer most likely to default in the test data. This customer's row will be selected, and local variable importance for the corresponding prediction will be analyzed.
decile = 99
row = test[test['ID'] == percentile_dict[decile]]
The most interesting Shapley values are probably those that push this customer's probability of default higher, i.e., the largest positive Shapley values. Those values are plotted below.
# select the riskiest customer's row of shap_values, drop the bias column,
# and sort to find the largest positive contributions
s_df = pd.DataFrame(shap_values[row.index[0], :][:-1].reshape(-1, 1),
                    columns=['Reason Codes'],
                    index=X)
s_df.sort_values(by='Reason Codes', ascending=False, inplace=True)
s_df
s_df
 | Reason Codes |
---|---|
PAY_0 | 1.526435 |
PAY_5 | 0.511360 |
PAY_6 | 0.328183 |
PAY_2 | 0.289542 |
LIMIT_BAL | 0.287896 |
PAY_4 | 0.227838 |
AGE | 0.225641 |
BILL_AMT1 | 0.197412 |
PAY_3 | 0.178575 |
MARRIAGE | 0.171177 |
PAY_AMT3 | 0.133252 |
PAY_AMT1 | 0.107046 |
PAY_AMT2 | 0.082654 |
BILL_AMT3 | 0.068008 |
EDUCATION | 0.060491 |
PAY_AMT4 | 0.060251 |
BILL_AMT2 | 0.057245 |
BILL_AMT4 | 0.039342 |
PAY_AMT5 | 0.036137 |
PAY_AMT6 | 0.025210 |
BILL_AMT6 | 0.005666 |
BILL_AMT5 | 0.002853 |
SEX | -0.096203 |
_ = s_df[:5].plot(kind='bar',
title='Top Five Reason Codes for a Risky Customer\n',
legend=False)
For the customer in the test dataset that the GBM predicts as most likely to default, the most important input variables in the prediction are, in descending order, PAY_0, PAY_5, PAY_6, PAY_2, and LIMIT_BAL.
The local contributions for this customer appear reasonable, especially considering her payment information. Her most recent payment was three months late, and her payments from five and six months previous were seven months late. Her credit limit was also extremely low, so it is logical that these factors would weigh heavily in the model's prediction of default for this customer.
row # helps understand reason codes
 | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | DEFAULT_NEXT_MONTH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5399 | 17757 | 10000 | 2 | 3 | 1 | 51 | 3 | 2 | 2 | 7 | 7 | 7 | 2400 | 2400 | 2400 | 2400 | 2400 | 2400 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
To generate reason codes for the model's decision, each locally important variable and its value are used together. If this customer were denied future credit based on this model and data, the top five Shapley-based reason codes for the automated decision would be:

1. PAY_0 = 3 (most recent payment three months late)
2. PAY_5 = 7 (payment five months previous seven months late)
3. PAY_6 = 7 (payment six months previous seven months late)
4. PAY_2 = 2 (payment two months previous two months late)
5. LIMIT_BAL = 10,000 (extremely low credit limit)
(Of course, credit limits are set by the lender and are used to price risk into credit decisions, so using credit limits as reason codes, or even in a probability of default model, is likely questionable. However, in this small example dataset, all input columns were used to achieve a better model fit. For a slightly more careful treatment of gradient boosting in the context of credit scoring, please see: https://github.com/jphall663/interpretable_machine_learning_with_python/blob/master/dia.ipynb)
In this notebook, a highly transparent, nonlinear, monotonic GBM classifier was trained to predict credit card defaults and the monotonic behavior of the classifier was analyzed and validated. To do so, Pearson correlation between each input and the target was used to determine the direction for monotonicity constraints for each input variable in the XGBoost classifier. GBM variable importance, partial dependence, and ICE were calculated, plotted, and compared to one another, domain knowledge, and reasonable expectations. Shapley values were then used to explain the model predictions for the single most risky customer in the test set. These techniques should generalize well for many types of business and research problems, enabling you to train a monotonic GBM model and analyze, validate, and explain it to your colleagues, bosses, and potentially, external regulators.