"""Ensemble three classifiers on Titanic data and tune the VotingClassifier.

Adapted from a notebook session: bare REPL output lines have been converted
to comments, and results are now printed explicitly so the script is
meaningful when run non-interactively.
"""
import pandas as pd

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Load the Titanic training set and select four numeric, NaN-free features.
df = pd.read_csv('http://bit.ly/kaggletrain')
cols = ['Pclass', 'Parch', 'SibSp', 'Fare']
X = df[cols]
y = df['Survived']

# Three diverse base models; random_state fixed for reproducibility.
lr = LogisticRegression(solver='liblinear', random_state=1)
rf = RandomForestClassifier(max_features=None, random_state=1)
nb = MultinomialNB()

# create an ensemble of 3 classifiers
vc = VotingClassifier([('clf1', lr), ('clf2', rf), ('clf3', nb)])
print(cross_val_score(vc, X, y).mean())
# expected output: 0.6970560542338836

# define VotingClassifier parameters to search:
# hard vs. soft voting, and up-weighting each classifier in turn
params = {'voting': ['hard', 'soft'],
          'weights': [(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2)]}

# find the best set of parameters
grid = GridSearchCV(vc, params)
grid.fit(X, y)
print(grid.best_params_)
# expected output: {'voting': 'soft', 'weights': (1, 2, 1)}

# accuracy has improved over the untuned ensemble
print(grid.best_score_)
# expected output: 0.7262820915196786
# © 2020 Data School. All rights reserved.