"""Grid-search hyperparameters for a Titanic survival classifier.

Builds a Pipeline = (column transformer: one-hot 'Sex', bag-of-words 'Name',
passthrough 'Pclass') -> logistic regression, then exhaustively searches
C and penalty with 5-fold cross-validated accuracy and prints the ranked
results.
"""
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Titanic training data (Kaggle), hosted by Data School
df = pd.read_csv('http://bit.ly/kaggletrain')
X = df[['Pclass', 'Sex', 'Name']]
y = df['Survived']

# Preprocessing: one-hot encode 'Sex', count-vectorize 'Name' text,
# pass 'Pclass' through unchanged (remainder='passthrough')
ohe = OneHotEncoder()
vect = CountVectorizer()
clf = LogisticRegression(solver='liblinear', random_state=1)
ct = make_column_transformer((ohe, ['Sex']), (vect, 'Name'), remainder='passthrough')
pipe = Pipeline([('preprocessor', ct), ('model', clf)])

# specify parameter values to search
# ('model__' prefixes target the pipeline step named 'model';
#  liblinear supports both 'l1' and 'l2' penalties)
params = {
    'model__C': [0.1, 1, 10],
    'model__penalty': ['l1', 'l2'],
}

# try all possible combinations of those parameter values
grid = GridSearchCV(pipe, params, cv=5, scoring='accuracy')
grid.fit(X, y)

# convert results into a DataFrame, sort by test score, and display
# (sort_values returns a new frame — it must be assigned, not discarded)
results = pd.DataFrame(grid.cv_results_)[['params', 'mean_test_score', 'rank_test_score']]
results = results.sort_values('rank_test_score')
print(results)
# Output (sorted by rank_test_score):
# |   | params                                    | mean_test_score | rank_test_score |
# |---|-------------------------------------------|-----------------|-----------------|
# | 2 | {'model__C': 1, 'model__penalty': 'l1'}   | 0.821512        | 1               |
# | 4 | {'model__C': 10, 'model__penalty': 'l1'}  | 0.820413        | 2               |
# | 5 | {'model__C': 10, 'model__penalty': 'l2'}  | 0.817055        | 3               |
# | 3 | {'model__C': 1, 'model__penalty': 'l2'}   | 0.812573        | 4               |
# | 1 | {'model__C': 0.1, 'model__penalty': 'l2'} | 0.791225        | 5               |
# | 0 | {'model__C': 0.1, 'model__penalty': 'l1'} | 0.788984        | 6               |
# © 2020 Data School. All rights reserved.