This notebook contains code and comments from Section 3.2 of the book Ensemble Methods for Machine Learning. Please see the book for additional details on this topic. This notebook and code are released under the MIT license.
First, we copy over the data generation code and the functions fit and predict_individual exactly as they appear in Section 3.1. (An alternative is to import the functions directly from that notebook using the ipynb package.)
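As a rough sketch of that alternative: the ipynb package exposes notebook definitions as importable modules. The notebook filename below is hypothetical and depends on how the Section 3.1 notebook is saved.
# Alternative (sketch): import the shared definitions from the Section 3.1 notebook.
# The module path below is hypothetical; it must match the saved notebook's filename.
# from ipynb.fs.defs.Section_3_1 import fit, predict_individual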
# Ignore FutureWarnings
# Currently generated by sklearn.neighbors which uses scipy.mode
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(600, noise=0.25, random_state=13)
X, Xval, y, yval = train_test_split(X, y, test_size=0.25) # Set aside 25% of data for validation
Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.25) # Set aside a further 25% of data for hold-out test
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
estimators = [('dt', DecisionTreeClassifier(max_depth=5)),
('svm', SVC(gamma=1.0, C=1.0, probability=True)),
('gp', GaussianProcessClassifier(RBF(1.0))),
('3nn', KNeighborsClassifier(n_neighbors=3)),
('rf',RandomForestClassifier(max_depth=3, n_estimators=25)),
('gnb', GaussianNB())]
def fit(estimators, X, y):
    for model, estimator in estimators:
        estimator.fit(X, y)

    return estimators
import numpy as np
def predict_individual(X, estimators, proba=False):
    n_estimators = len(estimators)
    n_samples = X.shape[0]

    y = np.zeros((n_samples, n_estimators))
    for i, (model, estimator) in enumerate(estimators):
        if proba:
            # Use each estimator's predicted probability of the positive class (Class 1)
            y[:, i] = estimator.predict_proba(X)[:, 1]
        else:
            y[:, i] = estimator.predict(X)

    return y
Fit the estimators to the synthetic data set.
estimators = fit(estimators, Xtrn, ytrn)
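As a quick sanity check (not part of the book's listings), predict_individual returns one column of predictions per fitted base estimator:
# Sanity check: one row per example, one column per base estimator
yval_individual = predict_individual(Xval, estimators, proba=False)
print(yval_individual.shape)   # (150, 6): 150 validation examples, 6 base estimators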
Listing 3.3: Combine predictions using majority vote. The listing below combines the individual predictions y_individual from a heterogeneous set of base estimators using majority voting. Note that since the weights of the base estimators are all equal, we do not explicitly compute them.
from scipy.stats import mode
def combine_using_majority_vote(X, estimators):
    y_individual = predict_individual(X, estimators, proba=False)
    y_final = mode(y_individual, axis=1, keepdims=False)
    return y_final[0].reshape(-1, )
from sklearn.metrics import accuracy_score
ypred = combine_using_majority_vote(Xtst, estimators)
tst_err = 1 - accuracy_score(ytst, ypred)
tst_err
0.06194690265486724
Listing 3.4: Combine using accuracy weighting
Accuracy weighting is a performance-based weighting strategy, where higher-performing base estimators in the ensemble are assigned higher weights, so that they contribute more to the final prediction.
Once we have trained each base classifier, we evaluate its performance on a validation set. Let $\alpha_t$ be the validation accuracy of the $t$-th classifier, $H_t$. The weight of each base classifier is then computed as $w_t = \frac{\alpha_t}{\sum_{t=1}^{m} \alpha_t}$.
The denominator is a normalization term: the sum of all the individual validation accuracies. This computation ensures that a classifier’s weight is proportional to its accuracy and all the weights sum to 1.
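For instance (a toy illustration, not from the book), three classifiers with validation accuracies of 0.80, 0.75, and 0.85 would get weights 0.80/2.40 ≈ 0.33, 0.75/2.40 ≈ 0.31, and 0.85/2.40 ≈ 0.35, which sum to 1.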
def combine_using_accuracy_weighting(X, estimators, Xval, yval):
    n_estimators = len(estimators)
    yval_individual = predict_individual(Xval, estimators, proba=False)
    wts = [accuracy_score(yval, yval_individual[:, i])
           for i in range(n_estimators)]
    wts /= np.sum(wts)

    ypred_individual = predict_individual(X, estimators, proba=False)
    y_final = np.dot(ypred_individual, wts)
    return np.round(y_final)
ypred = combine_using_accuracy_weighting(Xtst, estimators, Xval, yval)
tst_err = 1 - accuracy_score(ytst, ypred)
tst_err
0.03539823008849563
The entropy weighting approach is another performance-based weighting strategy, except that it uses entropy as the evaluation metric to judge the value of each base estimator. Entropy is a measure of uncertainty or impurity in a set; a more disorderly set will have higher entropy.
Listing 3.5: Computing entropy
def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = np.array(counts.astype('float') / len(y))
    ent = -p.T @ np.log2(p)
    return ent
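As a quick check of this helper (toy inputs, not from the book): an evenly split set of predictions has maximal entropy, while a pure set has zero entropy.
entropy(np.array([0, 1, 0, 1]))   # 1.0: evenly split, maximally uncertain
entropy(np.array([1, 1, 1, 1]))   # -0.0, i.e. zero: a pure set, no uncertainty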
Listing 3.6: Combine using entropy weighting
Let $E_t$ be the validation entropy of the $t$-th classifier, $H_t$. The weight of each base classifier is then computed as $w_t = \frac{1/E_t}{\sum_{t=1}^{m} (1/E_t)}$. Contrast this with accuracy weighting, and observe that the entropies are inverted. This is because lower entropies are desirable, much like higher accuracies are desirable in a model.
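For example (toy numbers, not from the book), validation entropies of 0.5, 0.25, and 1.0 invert to 2, 4, and 1, giving normalized weights of 2/7, 4/7, and 1/7; the lowest-entropy classifier gets the largest weight.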
def combine_using_entropy_weighting(X, estimators, Xval, yval):
    n_estimators = len(estimators)
    yval_individual = predict_individual(Xval, estimators, proba=False)
    wts = [1 / entropy(yval_individual[:, i])
           for i in range(n_estimators)]
    wts /= np.sum(wts)

    ypred_individual = predict_individual(X, estimators, proba=False)
    y_final = np.dot(ypred_individual, wts)
    return np.round(y_final)
ypred = combine_using_entropy_weighting(Xtst, estimators, Xval, yval)
tst_err = 1 - accuracy_score(ytst, ypred)
tst_err
0.03539823008849563
Dempster-Shafer theory (DST) is a generalization of probability theory that supports reasoning under uncertainty and with incomplete knowledge. While the foundations of DST are beyond the scope of this book, the theory itself provides a way to fuse beliefs and evidence from multiple sources into a single belief.
DST uses a number between 0 and 1 to indicate belief in a proposition, such as “the test example x belongs to Class 1”. This number is known as a basic probability assignment (BPA) and expresses the certainty that the test example x belongs to Class 1. BPA values closer to 1 characterize decisions made with more certainty. The BPA allows us to translate an estimator’s confidence to a belief over what the true label is.
Listing 3.7: Combining using Dempster-Shafer
def combine_using_Dempster_Schafer(X, estimators):
    p_individual = predict_individual(X, estimators, proba=True)

    # Basic probability assignments for Class 0 and Class 1
    # (the small constant keeps the BPAs strictly below 1, avoiding division by zero below)
    bpa0 = 1.0 - np.prod(p_individual, axis=1) - 1e-6
    bpa1 = 1.0 - np.prod(1.0 - p_individual, axis=1) - 1e-6

    # Belief for each class; predict the class with the higher belief
    belief = np.vstack([bpa0 / (1 - bpa0), bpa1 / (1 - bpa1)]).T
    y_final = np.argmax(belief, axis=1)

    return y_final
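To trace this computation with toy numbers (not from the book): suppose two estimators predict P(Class 1) = 0.8 and 0.7 for one example. Then bpa1 ≈ 1 - 0.2 × 0.3 = 0.94 and bpa0 ≈ 1 - 0.8 × 0.7 = 0.44, giving beliefs of roughly 0.94/0.06 ≈ 15.7 for Class 1 and 0.44/0.56 ≈ 0.79 for Class 0, so the example is assigned to Class 1.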
ypred = combine_using_Dempster_Schafer(Xtst, estimators)
tst_err = 1 - accuracy_score(ytst, ypred)
tst_err
0.053097345132743334
Unlisted: Logarithmic opinion pooling (LOP), first described in Heskes (1998).
def combine_using_LOP(X, estimators, Xval, yval):
    n_estimators = len(estimators)
    yval_individual = predict_individual(Xval, estimators, proba=False)
    wts = [1 - accuracy_score(yval, yval_individual[:, i]) for i in range(n_estimators)]
    wts /= np.sum(wts)

    p_individual = predict_individual(X, estimators, proba=True)
    p_LOP = np.exp(np.dot(np.log(p_individual), wts))
    return np.round(p_LOP)
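The LOP combiner can be evaluated in the same way as the other validation-weighted combiners (a usage sketch; it is commented out of the comparison plot below):
# Usage sketch: evaluate LOP on the hold-out test set
ypred = combine_using_LOP(Xtst, estimators, Xval, yval)
tst_err = 1 - accuracy_score(ytst, ypred)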
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as col
combination_methods = [('Majority Vote', combine_using_majority_vote),
('Dempster-Shafer', combine_using_Dempster_Schafer),
('Accuracy Weighting', combine_using_accuracy_weighting),
('Entropy Weighting', combine_using_entropy_weighting)]
# ('Logarithmic Opinion Pool', combine_using_LOP)]
nrows, ncols = 2, 2
# cm = get_colors(colormap='RdBu')
cmap = cm.get_cmap('Blues')
colors = cmap(np.linspace(0, 0.5, num=2))
# Init plotting
xMin, xMax = Xtrn[:, 0].min() - 0.25, Xtrn[:, 0].max() + 0.25
yMin, yMax = Xtrn[:, 1].min() - 0.25, Xtrn[:, 1].max() + 0.25
xMesh, yMesh = np.meshgrid(np.arange(xMin, xMax, 0.05),
np.arange(yMin, yMax, 0.05))
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(8, 8))
for i, (method, combiner) in enumerate(combination_methods):
    c, r = divmod(i, 2)

    # Majority vote and Dempster-Shafer need no validation set;
    # the performance-weighted combiners do
    if i < 2:
        zMesh = combiner(np.c_[xMesh.ravel(), yMesh.ravel()], estimators)
        ypred = combiner(Xtst, estimators)
    else:
        zMesh = combiner(np.c_[xMesh.ravel(), yMesh.ravel()], estimators, Xval, yval)
        ypred = combiner(Xtst, estimators, Xval, yval)

    zMesh = zMesh.reshape(xMesh.shape)
    ax[r, c].contourf(xMesh, yMesh, zMesh, cmap='Blues', alpha=0.2)
    ax[r, c].contour(xMesh, yMesh, zMesh, [0.5], colors='k', linewidths=2.5)

    ax[r, c].scatter(Xtrn[ytrn == 0, 0], Xtrn[ytrn == 0, 1], marker='o',
                     c=col.rgb2hex(colors[0]), edgecolors='k', s=80, alpha=0.5)
    ax[r, c].scatter(Xtrn[ytrn == 1, 0], Xtrn[ytrn == 1, 1], marker='s',
                     c=col.rgb2hex(colors[1]), edgecolors='k', s=80, alpha=0.5)
    ax[r, c].set_xlabel('$x_1$', fontsize=12)
    ax[r, c].set_ylabel('$x_2$', fontsize=12)

    tst_err = 1 - accuracy_score(ytst, ypred)
    title = '{0} (err = {1:4.2f}%)'.format(method, tst_err*100)
    ax[r, c].set_title(title, fontsize=12)
fig.tight_layout()
# plt.savefig('./figures/CH03_F10_Kunapuli.png', format='png', dpi=300, bbox_inches='tight');
# plt.savefig('./figures/CH03_F10_Kunapuli.pdf', format='pdf', dpi=300, bbox_inches='tight');