🤖⚡ scikit-learn tip #35 (video)

There's no need to use ".values" when passing a pandas DataFrame or Series to scikit-learn: it converts them to NumPy arrays internally!

See example 👇

In [1]:

import pandas as pd
df = pd.read_csv('http://bit.ly/kaggletrain', usecols=['Survived', 'Pclass', 'Fare'])

In [2]:

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()

In [3]:

X = df[['Pclass', 'Fare']]
y = df['Survived']

In [4]:

print(type(X))
print(type(y))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

In [5]:

# there's no need to use X.values or y.values
clf.fit(X, y)

Out[5]:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
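As a quick sanity check of the tip, here is a minimal sketch that fits the same model twice, once on pandas objects and once on their ".values", and compares the learned parameters. The toy DataFrame is made up for illustration (it is not the Titanic data), and the names toy, clf_pandas and clf_numpy are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Small made-up dataset (hypothetical values, not the Titanic data)
toy = pd.DataFrame({
    'Pclass': [1, 3, 2, 3, 1, 2, 3, 1],
    'Fare': [71.3, 7.9, 13.0, 8.1, 53.1, 26.0, 7.2, 80.0],
    'Survived': [1, 0, 1, 0, 1, 0, 0, 1],
})
X_toy = toy[['Pclass', 'Fare']]
y_toy = toy['Survived']

# Fit once on the pandas objects and once on the underlying NumPy arrays
clf_pandas = LogisticRegression().fit(X_toy, y_toy)
clf_numpy = LogisticRegression().fit(X_toy.values, y_toy.values)

# Both fits see the same numbers, so the parameters should match
same_coef = np.allclose(clf_pandas.coef_, clf_numpy.coef_)
same_intercept = np.allclose(clf_pandas.intercept_, clf_numpy.intercept_)
print(same_coef, same_intercept)
```

If both comparisons come back True, the ".values" call added nothing to the fit.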

