ELI5 is a Python library that lets you visualize and debug various machine learning models through a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.
In this notebook we will cover explainability methods for:
- a white-box linear model (scikit-learn Logistic Regression): explaining the weights and individual predictions,
- a black-box neural network (Keras): permutation importance,
- a convolutional neural network (Keras MobileNetV2): Grad-CAM visual explanations.
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
We will work with the German credit dataset ('credit-g' from OpenML). It classifies people, described by a set of attributes, as good or bad credit risks.
from sklearn.datasets import fetch_openml
original_features, labels = fetch_openml('credit-g', return_X_y=True, as_frame=True)
original_features.head()
 | checking_status | duration | credit_history | purpose | credit_amount | savings_status | employment | installment_commitment | personal_status | other_parties | residence_since | property_magnitude | age | other_payment_plans | housing | existing_credits | job | num_dependents | own_telephone | foreign_worker |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | <0 | 6.0 | critical/other existing credit | radio/tv | 1169.0 | no known savings | >=7 | 4.0 | male single | none | 4.0 | real estate | 67.0 | none | own | 2.0 | skilled | 1.0 | yes | yes |
1 | 0<=X<200 | 48.0 | existing paid | radio/tv | 5951.0 | <100 | 1<=X<4 | 2.0 | female div/dep/mar | none | 2.0 | real estate | 22.0 | none | own | 1.0 | skilled | 1.0 | none | yes |
2 | no checking | 12.0 | critical/other existing credit | education | 2096.0 | <100 | 4<=X<7 | 2.0 | male single | none | 3.0 | real estate | 49.0 | none | own | 1.0 | unskilled resident | 2.0 | none | yes |
3 | <0 | 42.0 | existing paid | furniture/equipment | 7882.0 | <100 | 4<=X<7 | 2.0 | male single | guarantor | 4.0 | life insurance | 45.0 | none | for free | 1.0 | skilled | 2.0 | none | yes |
4 | <0 | 24.0 | delayed previously | new car | 4870.0 | <100 | 1<=X<4 | 3.0 | male single | none | 4.0 | no known property | 53.0 | none | for free | 2.0 | skilled | 2.0 | none | yes |
labels.head()
0     good
1      bad
2     good
3     good
4      bad
Name: class, dtype: category
Categories (2, object): [good, bad]
# map the target categories to integers
label_to_int = {'good': 1, 'bad': 0}
int_to_label = {1: 'good', 0: 'bad'}
# one-hot-encoding for the categorical features
encoded_features = pd.get_dummies(original_features, prefix_sep=': ')
labels = labels.map(label_to_int)
from sklearn.model_selection import train_test_split
# train test split
features_train, features_test, labels_train, labels_test = train_test_split(encoded_features, labels, test_size=0.2)
Min-max scaling to the [0, 1] range (rather than standardizing to zero mean and unit variance) gives the resulting linear model weights a more intuitive meaning: each weight reflects the effect of moving a feature from its minimum to its maximum value.
from sklearn.preprocessing import MinMaxScaler
# feature scaling
scaler = MinMaxScaler()
features_train = pd.DataFrame(scaler.fit_transform(features_train),
columns=encoded_features.columns)
features_test = pd.DataFrame(scaler.transform(features_test),
columns=encoded_features.columns)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
regressor = LogisticRegression(max_iter=10000)
regressor.fit(features_train, labels_train)
print(accuracy_score(labels_test, regressor.predict(features_test)))
0.715
import eli5
Global model interpretation (Magnitude + Direction)
Let's see, in general, which kinds of features are related to good credit and which to bad credit.
For a linear model these are just the model weights, but presented sorted and colored for clarity!
eli5.explain_weights(regressor, feature_names=encoded_features.columns.to_list(),
target_names=int_to_label)
y=good top features
Weight | Feature |
---|---|
+2.194 | <BIAS> |
+1.001 | credit_history: critical/other existing credit |
+0.814 | checking_status: no checking |
+0.802 | purpose: domestic appliance |
+0.695 | purpose: used car |
+0.567 | age |
+0.492 | other_parties: guarantor |
+0.442 | employment: 4<=X<7 |
… 22 more positive … | |
… 20 more negative … | |
-0.428 | savings_status: <100 |
-0.431 | property_magnitude: no known property |
-0.544 | other_parties: co applicant |
-0.587 | credit_history: all paid |
-0.609 | installment_commitment |
-0.619 | existing_credits |
-0.646 | credit_history: no credits/all paid |
-0.747 | checking_status: <0 |
-0.856 | purpose: education |
-0.873 | purpose: new car |
-1.340 | credit_amount |
-1.463 | duration |
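As a sanity check, the same ranking can be reproduced directly from the fitted coefficients. A minimal sketch (not eli5's implementation; eli5 adds the bias term and the formatting):

# sort the raw logistic regression coefficients ourselves;
# positive weights push the prediction towards 'good' (class 1)
weights = pd.Series(regressor.coef_[0], index=encoded_features.columns)
weights.sort_values(ascending=False).head(10)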
Local model interpretation (Magnitude + Direction)
For a specific prediction we can highlight which features led the model to its decision.
For this linear model, each feature's contribution is simply the corresponding model weight multiplied by the sample's feature value.
test_sample_idx = 0
print('True label: {}'.format(int_to_label[labels_test.iloc[test_sample_idx]]))
eli5.explain_prediction(regressor, features_test.iloc[test_sample_idx], target_names=int_to_label)
True label: good
y=good (probability 0.792, score 1.339) top features
Contribution | Feature |
---|---|
+2.194 | <BIAS> |
+0.814 | checking_status: no checking |
+0.283 | age |
+0.237 | personal_status: male single |
+0.199 | housing: own |
+0.146 | credit_history: existing paid |
+0.109 | property_magnitude: life insurance |
+0.076 | own_telephone: yes |
+0.056 | employment: >=7 |
+0.051 | other_parties: none |
-0.026 | residence_since |
-0.028 | other_payment_plans: bank |
-0.086 | num_dependents |
-0.118 | purpose: furniture/equipment |
-0.159 | job: skilled |
-0.232 | credit_amount |
-0.301 | duration |
-0.413 | existing_credits |
-0.426 | foreign_worker: yes |
-0.428 | savings_status: <100 |
-0.609 | installment_commitment |
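As a rough cross-check (a sketch of the underlying arithmetic, not eli5's code), these contributions can be reproduced by multiplying the sample by the coefficients:

# contribution of each feature = model weight * (scaled) feature value
sample = features_test.iloc[test_sample_idx]
contributions = pd.Series(regressor.coef_[0] * sample.values, index=sample.index)
score = contributions.sum() + regressor.intercept_[0]  # the intercept is the <BIAS> term
probability = 1 / (1 + np.exp(-score))                 # sigmoid turns the score into P(good)
print(contributions.sort_values(ascending=False).head())
print(score, probability)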
test_sample_idx = 2
print('True label: {}'.format(int_to_label[labels_test.iloc[test_sample_idx]]))
eli5.explain_prediction(regressor, features_test.iloc[test_sample_idx], target_names=int_to_label)
True label: good
y=good (probability 0.612, score 0.455) top features
Contribution | Feature |
---|---|
+2.194 | <BIAS> |
+0.814 | checking_status: no checking |
+0.367 | other_payment_plans: none |
+0.237 | personal_status: male single |
+0.233 | age |
+0.199 | housing: own |
+0.146 | credit_history: existing paid |
+0.109 | property_magnitude: life insurance |
+0.076 | own_telephone: yes |
+0.056 | employment: >=7 |
+0.051 | other_parties: none |
-0.026 | residence_since |
-0.121 | job: high qualif/self emp/mgmt |
-0.203 | installment_commitment |
-0.426 | foreign_worker: yes |
-0.428 | savings_status: <100 |
-0.745 | credit_amount |
-0.873 | purpose: new car |
-1.205 | duration |
from keras.layers import Dense, Dropout
from keras.models import Sequential, load_model
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint
nn = Sequential()
# a small MLP: one hidden ReLU layer with dropout, sigmoid output for binary classification
nn.add(Dense(10, activation='relu', input_shape=[encoded_features.shape[1],]))
nn.add(Dropout(0.5))
nn.add(Dense(1, activation='sigmoid'))
nn.compile(optimizer=RMSprop(lr=0.01),
loss='binary_crossentropy',
metrics=['accuracy'])
# in case we want to reset our weights
nn.save('init_weights.h5')
callbacks = [ModelCheckpoint('credit_trained.h5', monitor='val_acc', verbose=1, save_best_only=True, mode='max')]
nn.fit(features_train, labels_train,
batch_size=32,
validation_split=0.2,
callbacks=callbacks,
epochs=1000)
Permutation importance computes feature importances for any black-box estimator by measuring how the score decreases when a feature is not available (its values are shuffled); the method is also known as “Mean Decrease Accuracy” (MDA).
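Conceptually the computation is simple. A minimal sketch of the idea (not eli5's actual implementation):

# shuffle one column at a time and record how much the score drops
def manual_permutation_importance(score_func, X, y, seed=0):
    rng = np.random.RandomState(seed)
    base_score = score_func(X, y)                        # score with all features intact
    drops = []
    for col in range(X.shape[1]):
        X_shuffled = X.copy()
        rng.shuffle(X_shuffled[:, col])                  # destroy the information in one feature
        drops.append(base_score - score_func(X_shuffled, y))
    return np.array(drops)                               # larger drop = more important feature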
from eli5.permutation_importance import get_score_importances
# nn = load_model('credit_trained.h5')
# the more 'correct' but computationally expensive option: re-train the model from scratch for every feature permutation
def score_with_train(features, labels):
nn.load_weights('init_weights.h5')
nn.fit(features, labels,
epochs=1000)
predictions = nn.predict_classes(features)
return accuracy_score(labels, predictions)
# the faster option: no re-training, just permute (shuffle) features on the test set
def score_without_train(features, labels):
predictions = nn.predict_classes(features)
return accuracy_score(labels, predictions)
base_score, score_decreases = get_score_importances(score_without_train,
features_test.values,
labels_test.values,
n_iter=3)
feature_importances = np.mean(score_decreases, axis=0)
feature_importances = pd.DataFrame({'importance': feature_importances},
index=encoded_features.columns)
feature_importances.sort_values(by='importance').plot.barh()
[horizontal bar plot: permutation importance for every feature]
The feature names on the y-axis are too cluttered to read, but we get a general idea of the overall feature importance trend, and we can even observe the peculiar behaviour of features with negative importance!
We will plot the 10 most and least important features to get a better impression.
# plot the top 10 most important features
feature_importances.sort_values(by='importance')[-10:].plot.barh()
[horizontal bar plot: the 10 most important features]
Please note that this feature importance only captures magnitude; we don't know the direction (positive / negative) of the impact each feature has on the prediction.
# plot the top 10 least important features
feature_importances.sort_values(by='importance')[:10].plot.barh()
[horizontal bar plot: the 10 least important features]
This weird phenomenon, where you're better off without some of the features, is actually quite common when you perform one-hot-encoding on categorical features.
Every one-hot-encoding operation creates one redundant feature, because all the information is already contained in n-1 of the n encoded columns (a 0 in the first n-1 categories implies a 1 in the nth). Keeping these redundant features gives your model extra degrees of freedom to overfit the training data and generalize worse on the test set (on which this permutation importance is computed); one possible remedy is sketched below.
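A minimal sketch of that remedy (note: drop_first changes the column names, so the rest of the notebook would need re-running with the reduced encoding):

# keep n-1 dummy columns per categorical feature instead of n
encoded_features_reduced = pd.get_dummies(original_features, prefix_sep=': ', drop_first=True)
print(encoded_features.shape, encoded_features_reduced.shape)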
From Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization:
" We propose a technique for producing "visual explanations" for decisions from a large class of CNN-based models, making them more transparent. Our approach - Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept, flowing into the final convolutional layer to produce a coarse localization map highlighting important regions in the image for predicting the concept. "
from keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
# we'll inspect a model already trained on ImageNet
model = MobileNetV2(include_top=True, weights='imagenet', classes=1000)
from PIL import Image
image = 'woof_meow.jpg'
im = Image.open(image)
display(im)
# prepare image as input to MobileNet model
dims = model.input_shape[1:3] # (height, width)
im = load_img(image, target_size=dims)
doc = img_to_array(im)
doc = np.expand_dims(doc, axis=0)
doc = preprocess_input(doc)
# predict classes for image
predictions = model.predict(doc)
# check the top 5 predicted classes
decode_predictions(predictions)
[[('n02108089', 'boxer', 0.3668898), ('n02129165', 'lion', 0.12986171), ('n02108422', 'bull_mastiff', 0.11059377), ('n02093256', 'Staffordshire_bullterrier', 0.07208044), ('n02112137', 'chow', 0.022743672)]]
# let's further examine with ELI5 why the model chose 'boxer' as its top prediction
eli5.show_prediction(model, doc)
We can see that the model is indeed focused on the boxer's head for its top prediction! The activation map is also concentrated on a very specific area.
cat_idx = 282 # ImageNet index for 'tiger_cat'
eli5.show_prediction(model, doc, targets=[cat_idx]) # pass the class id
We can see that the cat portion of the image did trigger activations for the cat class; it's just that the dog-breed classes were getting stronger signals, which pushed the final classification towards 'boxer'.
As demonstrated in Visualizing and Understanding Convolutional Networks, the first convolutional layers tend to learn relatively simple features, whereas the deeper layers tend to learn more complex, class-specific features.
With ELI5 we can visualize the activation map at different depths of the network to get a better sense of what each layer responds to:
for layer in ['block_2_expand', 'block_9_expand', 'Conv_1']:
print(layer)
display(eli5.show_prediction(model, doc, layer=layer))
block_2_expand
block_9_expand
Conv_1
Indeed, we can see that the first layers activate on simple features such as edges; the middle layer already seems to target both animals without differentiating between them; and the final layer activates specifically on the dog.