#!/usr/bin/env python
# coding: utf-8

# ## License 
# 
# Copyright 2017 - 2020 Patrick Hall and the H2O.ai team
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# **DISCLAIMER:** This notebook is not legal compliance advice.

# # Explain Your Predictive Models to Business Stakeholders using LIME with Python and H2O
# #### Describing complex models and generating reason codes with Local Interpretable Model-agnostic Explanations (LIME) and LIME-variants
# 
# Local Interpretable Model-agnostic Explanations (LIME) shed light on how almost any machine learning model makes decisions for specific rows of data. LIME builds local linear surrogate models around observations of interest and leverages the highly interpretable properties of linear models to increase transparency and accountability for the corresponding model predictions. In this notebook, an h2o GBM is trained on the UCI credit card default data and then predictions for a highly risky customer are explained using linear model coefficients and LIME-derived reason codes. The notebook concludes by introducing a variant of LIME that is easier to execute on new data and that can be analyzed alongside observed (i.e., not simulated) data.
# 
# **Note**: As of the h2o 3.24 "Yates" release, Shapley values are supported in h2o. Shapley values can be used in place of or along with LIME. To see Shapley values for an h2o GBM in action please see: https://github.com/jphall663/interpretable_machine_learning_with_python/blob/master/dia.ipynb.

# #### Python imports
# In general, NumPy and Pandas will be used for data manipulation purposes and h2o will be used for modeling tasks. 

# In[1]:


# h2o Python API with specific classes
import h2o 
from h2o.estimators.glm import H2OGeneralizedLinearEstimator # for LIME
from h2o.grid.grid_search import H2OGridSearch               # for LIME
from h2o.estimators.gbm import H2OGradientBoostingEstimator  # for GBM


import operator # for sorting dictionaries

import numpy as np   # array, vector, matrix calculations
import pandas as pd  # DataFrame handling

# display plots in notebook
get_ipython().run_line_magic('matplotlib', 'inline')


# #### Start h2o
# H2o is both a library and a server. The machine learning algorithms in the library take advantage of the multithreaded and distributed architecture provided by the server to train machine learning algorithms extremely efficiently. The API for the library was imported above in cell 1, but the server still needs to be started.

# In[2]:


h2o.init(max_mem_size='2G')       # start h2o
h2o.remove_all()                  # remove any existing data structures from h2o memory


# ## 1. Download, explore, and prepare UCI credit card default data
# 
# UCI credit card default data: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
# 
# The UCI credit card default data contains demographic and payment information about credit card customers in Taiwan in the year 2005. The data set contains 23 input variables: 
# 
# * **`LIMIT_BAL`**: Amount of given credit (NT dollar)
# * **`SEX`**: 1 = male; 2 = female
# * **`EDUCATION`**: 1 = graduate school; 2 = university; 3 = high school; 4 = others 
# * **`MARRIAGE`**: 1 = married; 2 = single; 3 = others
# * **`AGE`**: Age in years 
# * **`PAY_0`, `PAY_2` - `PAY_6`**: History of past payment; `PAY_0` = the repayment status in September, 2005; `PAY_2` = the repayment status in August, 2005; ...; `PAY_6` = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. 
# * **`BILL_AMT1` - `BILL_AMT6`**: Amount of bill statement (NT dollar). `BILL_AMNT1` = amount of bill statement in September, 2005; `BILL_AMT2` = amount of bill statement in August, 2005; ...; `BILL_AMT6` = amount of bill statement in April, 2005. 
# * **`PAY_AMT1` - `PAY_AMT6`**: Amount of previous payment (NT dollar). `PAY_AMT1` = amount paid in September, 2005; `PAY_AMT2` = amount paid in August, 2005; ...; `PAY_AMT6` = amount paid in April, 2005. 
# 
# These 23 input variables are used to predict the target variable, whether or not a customer defaulted on their credit card bill in late 2005.
# 
# Because h2o accepts both numeric and character inputs, some variables will be recoded into more transparent character values.

# #### Import data and clean
# The credit card default data is available as an `.xls` file. Pandas reads `.xls` files automatically, so it's used to load the credit card default data and give the prediction target a shorter name: `DEFAULT_NEXT_MONTH`.

# In[3]:


# import XLS file
path = 'default_of_credit_card_clients.xls'
data = pd.read_excel(path,
                     skiprows=1)

# remove spaces from target column name 
data = data.rename(columns={'default payment next month': 'DEFAULT_NEXT_MONTH'}) 


# #### Assign modeling roles
# The shorthand name `y` is assigned to the prediction target. `X` is assigned to all other input variables in the credit card default data except the row indentifier, `ID`.

# In[4]:


# assign target and inputs for GBM
y = 'DEFAULT_NEXT_MONTH'
X = [name for name in data.columns if name not in [y, 'ID']]
print('y =', y)
print('X =', X)


# #### Helper function for recoding values in the UCI credict card default data
# This simple function maps longer, more understandable character string values from the UCI credit card default data dictionary to the original integer values of the input variables found in the dataset. These character values can be used directly in h2o decision tree models, and the function returns the original Pandas DataFrame as an h2o object, an H2OFrame. H2o models cannot run on Pandas DataFrames. They require H2OFrames.

# In[5]:


def recode_cc_data(frame):
    
    """ Recodes numeric categorical variables into categorical character variables
    with more transparent values. 
    
    Args:
        frame: Pandas DataFrame version of UCI credit card default data.
        
    Returns: 
        H2OFrame with recoded values.
        
    """
    
    # define recoded values
    sex_dict = {1:'male', 2:'female'}
    education_dict = {0:'other', 1:'graduate school', 2:'university', 3:'high school', 
                      4:'other', 5:'other', 6:'other'}
    marriage_dict = {0:'other', 1:'married', 2:'single', 3:'divorced'}
    pay_dict = {-2:'no consumption', -1:'pay duly', 0:'use of revolving credit', 1:'1 month delay', 
                2:'2 month delay', 3:'3 month delay', 4:'4 month delay', 5:'5 month delay', 6:'6 month delay', 
                7:'7 month delay', 8:'8 month delay', 9:'9+ month delay'}
    
    # recode values using Pandas apply() and anonymous function
    frame['SEX'] = frame['SEX'].apply(lambda i: sex_dict[i])
    frame['EDUCATION'] = frame['EDUCATION'].apply(lambda i: education_dict[i])    
    frame['MARRIAGE'] = frame['MARRIAGE'].apply(lambda i: marriage_dict[i]) 
    for name in frame.columns:
        if name in ['PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']:
            frame[name] = frame[name].apply(lambda i: pay_dict[i])            
                
    return h2o.H2OFrame(frame)

data = recode_cc_data(data)


# #### Ensure target is handled as a categorical variable
# In h2o, a numeric variable can be treated as numeric or categorical. The target variable `DEFAULT_NEXT_MONTH` takes on values of `0` or `1`. To ensure this numeric variable is treated as a categorical variable, the `asfactor()` function is used to explicitly declare that it is a categorical variable. 

# In[6]:


data[y] = data[y].asfactor() 


# #### Display descriptive statistics
# The h2o `describe()` function displays a brief description of the credit card default data. For the categorical input variables `LIMIT_BAL`, `SEX`, `EDUCATION`, `MARRIAGE`, and `PAY_0`-`PAY_6`, the new character values created above in cell 5 are visible. Basic descriptive statistics are displayed for numeric inputs.

# In[7]:


data.describe()


# ## 2. Train an H2O GBM classifier

# #### Split data into training and test sets for early stopping
# The credit card default data is split into training and test sets to monitor and prevent overtraining. Reproducibility is also an important factor in creating trustworthy models, and randomly splitting datasets can introduce randomness in model predictions and other results. A random seed is used here to ensure the data split is reproducible.

# In[8]:


# split into training and validation
train, test = data.split_frame([0.7], seed=12345)

# summarize split
print('Train data rows = %d, columns = %d' % (train.shape[0], train.shape[1]))
print('Test data rows = %d, columns = %d' % (test.shape[0], test.shape[1]))


# #### Train h2o GBM classifier
# Many tuning parameters must be specified to train a GBM using h2o. Typically a grid search would be performed to identify the best parameters for a given modeling task using the `H2OGridSearch` class. For brevity's sake, a previously-discovered set of good tuning parameters are specified here. Because gradient boosting methods typically resample training data, an additional random seed is also specified for the h2o GBM using the `seed` parameter to create reproducible predictions, error rates, and variable importance values. To avoid overfitting, the `stopping_rounds` parameter is used to stop the training process after the test error fails to decrease for 5 iterations.

# In[9]:


# initialize GBM model
model = H2OGradientBoostingEstimator(ntrees=150,            # maximum 150 trees in GBM
                                     max_depth=4,           # trees can have maximum depth of 4
                                     sample_rate=0.9,       # use 90% of rows in each iteration (tree)
                                     col_sample_rate=0.9,   # use 90% of variables in each iteration (tree)
                                     stopping_rounds=5,     # stop if validation error does not decrease for 5 iterations (trees)
                                     score_tree_interval=1, # for reproducibility, set higher for bigger data
                                     seed=12345)            # random seed for reproducibility

# train a GBM model
model.train(y=y, x=X, training_frame=train, validation_frame=test)

# print AUC
print('GBM Test AUC = %.2f' % model.auc(valid=True))


# ## 3. Use LIME to generate descriptions for a local region with a perturbed sample
# 
# LIME was originally described in the context of explaining image or text classification decisions here: http://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf. It can certainly also be applied to business or customer data, as will be done in the remaining sections of this notebook. Multiple Python implementations of LIME are available from the original authors of LIME, from the eli5 package, from the skater package, and probably others. However, this notebook uses a simple, step-by-step implementation of LIME for instructional purposes. 
# 
# A linear model cannot be built on a single observation, so LIME typically requires that a set of rows similar to the row of interest be simulated. This set of records are scored using the complex model to be explained. Then the records are weighted by their closeness to the record of interest, and a regularized linear model is trained on this weighted explanatory set. The parameters of the linear model and LIME-derived reason codes are then used to explain the prediction for the selected record. Because simulation of new points can seem abstract to some practicioners and simulation and distance calculations can be somewhat burdensome for creating explanations quickly in mission-critical applications, this notebook also presents a variation of LIME in which a more practical sample, instead of a perturbed, simulated sample, is used to create a local region in which to fit a linear model.

# #### Display the most risky customer
# In the Oriole notebook *Increase Transparency and Accountability in Your Machine Learning Project with Python and H2O*, row index 29116 was found to contain the riskiest customer in the test dataset according to the h2o GBM model. Sections 3-7 focus on deriving reason codes and other explanations for this customer's GBM prediction. The riskiest customer is selected first for analysis as an exercise in boundary testing.

# In[10]:


row = test[test['ID'] == 29116]
row


# To use LIME, a sample of similar (i.e., near or local) points is simulated around the customer of interest. This simple function draws numeric values from normal distributions centered around the customer of interest and draws categorical values at random from the variable values in the test set.

# In[11]:


def generate_local_sample(row, frame, X, N=1000):
    
    """ Generates a perturbed sample around a row of interest.
    
    Args:
        row: Row of H2OFrame to be explained.
        frame: H2OFrame in which row is stored.
        X: List of model input variables.
        N: Number of samples to generate.
    
    Returns:
        Pandas DataFrame containing perturbed sample.
    
    """
    
    # initialize Pandas DataFrame
    sample_frame = pd.DataFrame(data=np.zeros(shape=(N, len(X))), columns=X)
    
    # generate column vectors of 
    # randomly drawn levels for categorical variables
    # normally distributed numeric values around mean of column for numeric variables
    for key, val in frame[X].types.items():
        if val == 'enum': # 'enum' means categorical
            rs = np.random.RandomState(11111) # random seed for reproducibility
            draw = rs.choice(frame[key].levels()[0], size=(1, N))[0]
        else:
            rs = np.random.RandomState(11111) # random seed for reproducibility
            loc = row[key][0, 0]
            sd = frame[key].sd()
            draw = rs.normal(loc, sd, (N, 1))
            draw[draw < 0] = loc # prevents unrealistic values when std. dev. is large
        
        sample_frame[key] = draw
        
    return sample_frame

# run and display results
perturbed_sample = generate_local_sample(row, test, X)
perturbed_sample.head(n=3)


# #### Calculate distance between row of interest and perturbed sample
# Once the sample is simulated, then distances from the point of interest are used to weigh each point before fitting a penalized regression model. Since Euclidean distance calculations require numeric quanitites, categorical input variables are one-hot encoded. (Pandas has convenient functionality for one-hot encoding, and the H2OFrames are temporarily casted back to Pandas DataFrames to perform the encoding.) To prevent the disparate scales of numeric values, such as `AGE` and `LIMIT_BAL`, from  skewing Euclidean distances, numeric input variables are standardized. 
# 
# First, the row containing the riskiest customer is encoded and standardized.

# In[12]:


# scaling and one-hot encoding for calculating Euclidian distance
# for the row of interest

# scale numeric
numeric = list(set(X) - set(['ID', 'SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0', 'PAY_2',
                             'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'DEFAULT_NEXT_MONTH']))

scaled_test = test.as_data_frame()
scaled_test[numeric] = (scaled_test[numeric] - scaled_test[numeric].mean())/scaled_test[numeric].std()
    
# encode categorical
row_df = scaled_test[scaled_test['ID'] == 22760]
row_dummies = pd.concat([row_df.drop(['ID', 'SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0', 'PAY_2',
                                      'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'DEFAULT_NEXT_MONTH'], axis=1),
                        pd.get_dummies(row_df[['SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0',
                                               'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']])], 
                        axis=1)

# convert to H2OFrame
row_dummies = h2o.H2OFrame(row_dummies)
row_dummies


# Then the simulated sample is encoded and standardized.

# In[13]:


# scaling and one-hot encoding for calculating Euclidian distance 
# for the simulated sample

# scale
scaled_perturbed_sample = perturbed_sample[numeric].copy(deep=True)
scaled_perturbed_sample = (scaled_perturbed_sample - scaled_perturbed_sample.mean())/scaled_perturbed_sample.std()

# encode
perturbed_sample_dummies = pd.concat([scaled_perturbed_sample,
                                      pd.get_dummies(perturbed_sample[['SEX', 'EDUCATION', 'MARRIAGE', 'PAY_0',
                                                                       'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']])],
                                     axis=1)

# convert to H2OFrame
perturbed_sample_dummies = h2o.H2OFrame(perturbed_sample_dummies[row_dummies.columns])
perturbed_sample_dummies.head(rows=3)


# Distance is calculated using h2o. The distance is substracted from the maximum distance, changing the distance values into similarity values. Now the observations with the highest values are those that are closest to the observation of interest and they will carry the most weight in the local explanatory linear model. A few sample similarity values are displayed directly below.

# In[14]:


# calculate distance using H2OFrame distance function
distance = row_dummies.distance(perturbed_sample_dummies, measure='l2').transpose()
distance.columns = ['distance']          # rename 
distance = distance.max() - distance     # lower distances, higher weight in LIME
distance.head(rows=3)


# #### Bind distance weights onto perturbed sample
# To fit an h2o linear model using the similarities as observation weights, the distance column must reside in the same H2OFrame as the simulated sample data.

# In[15]:


perturbed_sample = h2o.H2OFrame(perturbed_sample).cbind(distance)
perturbed_sample.head(rows=3)


# #### Bind model predictions onto perturbed sample
# For LIME, the target of the explanatory local linear model is the predictions of the GBM model in the local simulated sample. The values are calculated and column-bound to the simulated sample. 

# In[16]:


yhat = 'p_DEFAULT_NEXT_MONTH'
preds1 = model.predict(perturbed_sample).drop(['predict', 'p0'])
preds1.columns = [yhat]
perturbed_sample = perturbed_sample.cbind(preds1)
perturbed_sample.head(rows=3)


# #### Train penalized linear model in local region 
# Once the simulated sample has been weighted with distances and contains the GBM model predictions, a linear model is fit to the original inputs and the GBM model predictions, weighted by similarity to the row of interest. The trained GLM coefficients are helpful for understanding the local region of response function around the riskiest customer.

# In[17]:


# initialize
local_glm1 = H2OGeneralizedLinearEstimator(weights_column='distance',
                                           seed=12345)
# train 
local_glm1.train(x=X, y=yhat, training_frame=perturbed_sample)

# coefs
print('\nLocal Positive GLM Coefficients:')
for c_name, c_val in sorted(local_glm1.coef().items(), key=operator.itemgetter(1)):
    if c_val > 0.0:
        print('%s %s' % (str(c_name + ':').ljust(25), c_val))
        
# r2
print('\nLocal GLM R-square:\n%.2f' % local_glm1.r2())


# The coefficients of the local linear model describe the average behavior of the GBM response function around the riskiest customer. In this local region, customers who missed payments are treated as the most likely to default.

# ## 4. Generate reason codes with LIME based on a perturbed sample
# This basic function uses the coefficients of the local linear explanatory model and the values in the row of interest to plot reason code values in a bar chart. The local GLM coefficient multiplied by the value in a specific row are estimates of how much each variable contributed to each prediction decision. These values can tell you how a variable and its values were weighted in any given decision by the model. These values are crucially important for machine learning interpretability and are often to referred to "local feature importance", "reason codes", or "turn-down codes." The latter phrases are borrowed from credit scoring. Credit lenders in the U.S. must provide reasons for turning down certain credit applications in an automated fashion. Reason codes can be easily extracted from LIME local feature importance values by simply ranking the variables that played the largest role in any given decision.

# In[18]:


def plot_local_contrib(row, model, X, g_pred=None, scale=False): 

    """ Plots reason codes in a bar chart. 
    
    Args:
    
        row: Row of H2OFrame to be explained.
        model: H2O linear model used for generating reason codes.
        X: List of model input variables.
        g_pred: Prediction of model to be explained, sometimes denoted g, used for scaling.
        scale: Whether to rescale contributions to sum to model predictions.
    
    """
    
    # initialize Pandas DataFrame to store results
    local_contrib_frame = pd.DataFrame(columns=['Name', 'Local Contribution', 'Sign'])
    
    # multiply values in row by local glm coefficients    
    for key, val in sorted(row[X].types.items()):
        contrib = 0
        name = ''
        if val == 'enum':
                level = row[key][0, 0]
                name = '.'.join([str(key), str(level)])
                if name in model.coef():
                    contrib = model.coef()[name]
        else:
            name = key
            if name in model.coef():
                contrib = row[name][0, 0]*model.coef()[name]
        
        # save only non-zero values
        if contrib != 0.0:
            local_contrib_frame = local_contrib_frame.append({'Name': name,
                                                              'Local Contribution': contrib,
                                                              'Sign': contrib > 0}, 
                                                             ignore_index=True) 

    if scale:
        scaler = (g_pred - model.coef()['Intercept']) /\
            local_contrib_frame['Local Contribution'].sum()
        local_contrib_frame['Local Contribution']  *= scaler
        
    # plot
    _ = local_contrib_frame.plot(x='Name',
                                 y='Local Contribution',
                                 kind='bar', 
                                 title='Reason Codes', 
                                 color=local_contrib_frame.Sign.map({True:'b', False:'g'}), 
                                 legend=False) 
    

# #### Display reason codes
# Here it can be seen that the riskiest customer's prediction is driven by her values for payment variables. Specifically, the top five LIME-derived reason codes contributing to her high probability of default are:
# 
# 1. Most recent payment is 3 months delayed.
# 2. 3rd most recent payment is 3 months delayed.
# 3. 2nd most recent payment is 2 months delayed.
# 4. 4th most recent payment is 2 months delayed.
# 5. Customer is married.
# 
# (Of course variables like `MARRIAGE`, `AGE`, and `SEX`should not be used in credit lending decisions. For a slightly more careful treatment of GBM in the context of fair lending see: https://github.com/jphall663/interpretable_machine_learning_with_python/blob/master/dia.ipynb)
# 
# This result is somewhat aligned with LOCO-derived reason codes found in section 5 of the *Increase Transparency and Accountability in Your Machine Learning Project with Python and H2O* Oriole notebook. Both perspectives weigh the riskiest customer's most recent and 3rd most recent payments very heavily in the model's prediction. A minor discrepancy between LOCO- and LIME-derived reason codes is somewhat expected. LIME explanations are linear, do not consider interactions, and represent offsets from the local linear model intercept. LOCO importance values are nonlinear, do consider interactions, and do not explicitly consider a linear intercept or offset. **Because most currently-available explanatory techniques are approximate, it is recommended that users employ several different explanatory techniques and trust only consisent results across techniques. Also, as of h2o 3.24, Shapley values are supported for h2o GBM. Use Shapley values along with or instead of LIME for any high-stakes application.**
# 
# It is also imperative to compare these results to domain knowledge and reasonable expectations. In this case, the LIME reason codes and linear model coefficients tell a relatively parsimonious story about the GBM's prediction behavior. If this was not so, steps should be taken to either reconcile or remove inconsistencies and unreasonable predictions.

# In[19]:


plot_local_contrib(row, local_glm1, X)


# ## 5. Use LIME to generate descriptions for a local region with a practical sample
# Using a previously-existing local sample based on clusters, deciles, or other more natural segments to create LIME explanations is computationally cheaper and perhaps more straightforward than using a simulated perturbed sample, but it does have one major drawback. If the sample is too large, the explanatory linear model maybe not be accurate enough to explain all predictions in the sample. The remaining sections of this notebook will explore the idea of generating LIME explanations using a practical sample. 

# #### Create a local region based on values of SEX and merge with GBM model predictions
# Instead of using a perturbed simulated sample, a linear model will be fit on all women in the test set, and the sample is not weighted by distance from any one point. A few lines of the all female sample are displayed directly below.

# In[20]:


preds2 = model.predict(test).drop(['predict', 'p0'])
preds2.columns = [yhat]
practical_sample = test.cbind(preds2)
practical_sample = practical_sample[practical_sample['SEX'] == 'female']
practical_sample.head(rows=3)


# #### Train penalized linear model in local region 
# A penalized linear model is trained in the local region defined by women in the test set. Because fit is a concern in this much larger explanatory sample, users should always check the R<sup>2</sup> or other goodness-of-fit measures to ensure surrogate model is accurate in the sample. 

# In[21]:


# initialize
local_glm2 = H2OGeneralizedLinearEstimator(seed=12345)

# train 
local_glm2.train(x=X, y=yhat, training_frame=practical_sample)

# coefs
print('\nLocal Positive GLM Coefficients:')
for c_name, c_val in sorted(local_glm2.coef().items(), key=operator.itemgetter(1)):
    if c_val > 0.0:
        print('%s %s' % (str(c_name + ':').ljust(25), c_val))
        
# r2
print('\nLocal GLM R-square:\n%.2f' % local_glm2.r2())


# The R<sup>2</sup> is quite high for this local sample and linear model. Because the sample is simply the women in the test set and because the model fit is acceptable, the trained linear model and coefficients can be used to understand the average behavior of women in the test set. On average, late payments, particulary `PAY_0`, `PAY_2`, and `PAY_6`, are the most likely to push the GBM model towards higher probability of default values for women.

# ## 6. Generate a ranked predictions plot to assess validity of local explanatory model
# A *ranked predictions plot* can also be used to ensure the local linear surrogate model is a good fit for the model inputs and predictions. A ranked predictions plot is a way to visually check whether the surrogate model is a good fit for the complex model. The y-axis is the numeric prediction of both models for a given point. The x-axis is the rank of a point when the predictions are sorted by their GBM prediction, from lowest on the left to highest on the right. When both sets of predictions are aligned, as they are below, this a good indication that the linear model fits the complex, nonlinear GBM well in the practical sample.

# In[22]:


# ranked predictions plot
pred_frame = local_glm2.predict(practical_sample).cbind(practical_sample)\
                       .as_data_frame()[['predict', yhat]]

pred_frame.columns = ['Surrogate Preds.', 'ML Preds.']
pred_frame.sort_values(by='ML Preds.', inplace=True)
pred_frame.reset_index(inplace=True, drop=True)
_ = pred_frame.plot(title='Ranked Predictions Plot')


# Both the R<sup>2</sup> and ranked predictions plot show the linear model is a good fit in the practical, approximately local sample. This means the regression coefficients are likely a very accurate representation of the behavior of the nonlinear model in this region.

# ## 7. Generate reason codes using a practical sample

# #### Create explanations (or 'reason codes') for a row in the local set
# Reason codes are generated for the model based on the practical sample, just as they were for the model based on the perturbed sample. Again, the woman's value for `PAY_0` is the most important local contributor to her GBM prediction. As seen in other attempts to explain this prediction, `PAY_0` followed by other payment variables, marital status, and her age play a role in the model decision. Also, for better local accuracy and explainability, LIME contributions are scaled such that contributions for each prediction plus the LIME intercept term sum to the GBM's predictions.

# In[23]:


g_pred_ = model.predict(row)['p1'][0, 0]
plot_local_contrib(row, local_glm2, X, g_pred=g_pred_, scale=True)                                                             


# #### Shutdown H2O
# After using h2o, it's typically best to shut it down. However, before doing so, users should ensure that they have saved any h2o data structures, such as models and H2OFrames, or scoring artifacts, such as POJOs and MOJOs.

# In[24]:


# be careful, this can erase your work!
h2o.cluster().shutdown(prompt=True)


# #### Summary
# In this notebook, LIME was used to explain and generate reason codes for a complex GBM classifier. To do so, local linear models were fit to appropriate, representative samples and linear model coefficients were used to explain the average behavior in the samples and to create reason codes. Reason codes were assesed against domain knowledge and reasonable expectations. A ranked prediction plot was also introduced to compare surrogate linear model predictions to GBM model predictions. These techniques should generalize well for many types of business and research problems, enabling you to train a complex machine learning model and analyze, validate, and explain it to your colleagues, bosses, and potentially, external regulators.