Ensemble learning is a method that uses multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent algorithms alone (usually, though not in every case).
The most common techniques are bagging, boosting, and stacking.
In this post we look at some meta-algorithms for building ensembles, which can improve a metric, speed up experimentation, and simplify code.
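As a toy illustration of why combining models can help, consider three imperfect classifiers, each right only part of the time, combined by majority vote (the predictions below are made up for the example):

```python
import numpy as np

# Hard-coded predictions of three hypothetical models on five samples
preds = np.array([
    [0, 1, 1, 1, 1],  # model A: 80% accurate
    [0, 1, 0, 0, 1],  # model B: 80% accurate
    [1, 1, 1, 0, 0],  # model C: 60% accurate
])
y_true = np.array([0, 1, 1, 0, 1])

# Majority vote across the three models (at least 2 of 3 must agree)
vote = (preds.sum(axis=0) >= 2).astype(int)
print(vote)                     # [0 1 1 0 1]
print((vote == y_true).mean())  # 1.0 -- better than any single model
```

Because the models make *different* mistakes, the vote corrects each individual error; this is the intuition all the libraries below build on.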
I would like to show several Python libraries for ensembling: mlxtend, mlens, and DESlib.
We will walk through the same simple example with each library and plot decision boundaries to visualize the differences.
!pip install mlxtend
!pip install mlens
!pip install deslib
Prepare the notebook for further experiments:
We'll use the Iris dataset as an example.
Features: sepal length, sepal width, petal length, petal width (in cm).
Number of samples: 150.
Target variable (discrete): {50x Setosa, 50x Versicolor, 50x Virginica}.
import itertools
import warnings
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
# common libraries
import numpy as np
from deslib.dcs import MCB
from deslib.des.knora_e import KNORAE
from deslib.static import StaticSelection
from mlens.ensemble import (BlendEnsemble, SequentialEnsemble, Subsemble,
SuperLearner)
from mlxtend.classifier import (EnsembleVoteClassifier, StackingClassifier,
StackingCVClassifier)
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
warnings.filterwarnings("ignore")
# random seed
seed = 10
# Loading example data
X, y = iris_data()
X = X[:, [0, 2]]
# split the data into training and test data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.25, random_state=seed
)
# Initializing several classifiers
clf1 = LogisticRegression(random_state=seed)
clf2 = RandomForestClassifier(random_state=seed)
clf3 = SVC(random_state=seed, probability=True)
def compare(classifier, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test):
    # Plot decision regions for the three base classifiers and the ensemble
    gs = gridspec.GridSpec(2, 2)
    fig = plt.figure(figsize=(10, 8))
    # Labels for our classifiers
    labels = ["Logistic Regression", "Random Forest", "RBF kernel SVM", "Ensemble"]
    classifiers = [clf1, clf2, clf3, classifier]
    for clf, label, grid in zip(
        classifiers, labels, itertools.product([0, 1], repeat=2)
    ):
        # Fit on the training split only, so the test set stays unseen
        clf.fit(X_train, y_train)
        ax = plt.subplot(gs[grid[0], grid[1]])
        fig = plot_decision_regions(X=X_train, y=y_train, clf=clf, legend=2)
        plt.title(label)
    plt.show()
    for clf, label in zip(classifiers, labels):
        print(label)
        # classification_report expects (y_true, y_pred) in that order
        print(classification_report(y_test, clf.predict(X_test)))
Mlxtend provides three ensemble classes: EnsembleVoteClassifier, StackingClassifier, and StackingCVClassifier.
Let's see how to use them through examples.
For more information, see https://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[2, 1, 1], voting="soft")
compare(eclf)
For more information, see https://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/
sclf = StackingClassifier(
classifiers=[clf1, clf2, clf3], meta_classifier=LogisticRegression()
)
compare(sclf)
For more information, see https://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/
scvclf = StackingCVClassifier(
classifiers=[clf1, clf2, clf3], meta_classifier=LogisticRegression()
)
compare(scvclf)
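To make the difference between plain stacking and CV-stacking concrete, here is a rough sketch in plain scikit-learn (this is an illustration, not the mlxtend internals): the out-of-fold predictions of the base models become the input features of the meta-classifier, so the meta-classifier never sees a base model's prediction on a sample that model was trained on.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
    SVC(probability=True, random_state=0),
]

# Out-of-fold class-probability predictions, stacked column-wise:
# these are the meta-features for the second-level model
meta_X = np.hstack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")
    for m in base_models
])
meta_clf = LogisticRegression(max_iter=1000).fit(meta_X, y)
print(meta_X.shape)  # (150, 9): 3 models x 3 class probabilities
```

StackingClassifier skips the cross-validation step and trains the meta-classifier on in-sample base predictions, which is faster but more prone to overfitting.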
Mlens has several useful classes: SuperLearner, Subsemble, BlendEnsemble, and SequentialEnsemble.
To get more information follow the link http://ml-ensemble.com/info/start/ensembles.html#super-learner
sl = SuperLearner(folds=5, random_state=seed, verbose=2)
# Build the first layer
sl.add([clf1, clf2, clf3])
# Attach the final meta-estimator
sl.add_meta(LogisticRegression())
compare(sl)
To get more information follow the link http://ml-ensemble.com/info/start/ensembles.html#subsemble
sub = Subsemble(partitions=3, random_state=seed, verbose=2, shuffle=True)
# Build the first layer
sub.add([clf1, clf2, clf3])
sub.add_meta(SVC())
compare(sub)
To get more information follow the link http://ml-ensemble.com/info/start/ensembles.html#blend-ensemble
be = BlendEnsemble(test_size=0.7, random_state=seed, verbose=2, shuffle=True)
# Build the first layer
be.add([clf1, clf2, clf3])
be.add_meta(LogisticRegression())
compare(be)
To get more information follow the link http://ml-ensemble.com/info/start/ensembles.html#sequential-ensemble
se = SequentialEnsemble(random_state=seed, shuffle=True)
# The initial layer is a blended layer, same as a layer in the BlendEnsemble
se.add("blend", [clf1, clf2])
# The second layer is a stacked layer, same as a layer of the SuperLearner
se.add("stack", [clf1, clf3])
# The meta estimator is added as in any other ensemble
se.add_meta(SVC())
compare(se)
DESlib offers 23 different ensemble techniques, split into three groups: dynamic ensemble selection (DES), dynamic classifier selection (DCS), and static ensembles.
Let's try an algorithm from each group:
To get more information follow the link https://deslib.readthedocs.io/en/latest/api.html
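The idea behind dynamic classifier selection can be sketched by hand (a toy version, not DESlib's implementation; `predict_dcs` is a hypothetical helper): for each test point, pick the pool classifier that is most accurate on that point's k nearest neighbours in a held-out validation set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# A small pool of classifiers trained on one half of the data
pool = [
    LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr),
]

# Precompute each pool member's predictions on the validation set
nn = NearestNeighbors(n_neighbors=7).fit(X_val)
val_preds = np.array([clf.predict(X_val) for clf in pool])  # (n_clf, n_val)

def predict_dcs(x):
    # Local competence: accuracy of each classifier on x's neighbourhood
    _, idx = nn.kneighbors([x])
    local_acc = (val_preds[:, idx[0]] == y_val[idx[0]]).mean(axis=1)
    best = np.argmax(local_acc)  # most locally competent classifier wins
    return pool[best].predict([x])[0]

print(predict_dcs(X_val[0]))
```

DESlib's DCS methods (such as MCB below) refine this basic recipe; DES methods select a *set* of competent classifiers per sample and combine them, while static methods pick the subset once at fit time.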
To get more information follow the link https://deslib.readthedocs.io/en/latest/modules/des/knora_e.html
kne = KNORAE([clf1, clf2, clf3])
compare(kne)
To get more information follow the link https://deslib.readthedocs.io/en/latest/modules/dcs/mcb.html
mcb = MCB([clf1, clf2, clf3])
compare(mcb)
To get more information follow the link https://deslib.readthedocs.io/en/latest/modules/static/static_selection.html
ss = StaticSelection([clf1, clf2, clf3])
compare(ss)
We looked at different algorithms across several libraries, which can save you a lot of time when you need ensemble techniques. Ensemble modeling is a powerful way to improve the performance of your machine learning models. If you want to reach the top of the leaderboard in a machine learning competition, or simply improve the models you are working on, ensembles are the way to go.
P.S. There is no silver bullet... try different tools and algorithms.