We'll use the famous Boston house prices dataset. (Note: `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the snippet below requires an older scikit-learn version.)
import pandas as pd
from sklearn.datasets import load_boston
boston = load_boston()
df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
df.head()
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 |
dfX = df[['RM', 'AGE', 'DIS', 'LSTAT']]
X = dfX.values
y = boston['target']
We could explore the data with more plots first, but let's get straight to the machine learning!
First, we'll just use one feature:
import matplotlib.pyplot as plt

x = X[:, 0].reshape(-1, 1)  # RM (average number of rooms), as a 2-D column
plt.plot(x, y, 'o')
(Scatter plot of RM against median house price.)
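The `reshape(-1, 1)` above matters: scikit-learn expects a 2-D feature matrix even when there is only one feature, and the `-1` lets NumPy infer the number of rows. A quick illustration:

```python
import numpy as np

a = np.arange(5)           # shape (5,) - a 1-D vector
col = a.reshape(-1, 1)     # shape (5, 1) - a single-column matrix
print(a.shape, col.shape)  # (5,) (5, 1)
```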
import numpy as np
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x, y)
y_hat = reg.predict(x)
reg.coef_, reg.intercept_
(array([9.10210898]), -34.67062077643857)
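The slope (about 9.1) and intercept (about −34.7) are everything the fitted model stores: a prediction is just `coef_ * x + intercept_`. A small synthetic sketch (the names `xs`, `ys`, `lin` are ours, not from the dataset above) confirming that `predict` matches the manual formula:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3x - 2 with no noise,
# so the fit should recover the slope and intercept exactly.
rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, size=(50, 1))
ys = 3 * xs.ravel() - 2

lin = LinearRegression().fit(xs, ys)
# For a one-feature linear model, predict() is coef_ * x + intercept_
manual = xs.ravel() * lin.coef_[0] + lin.intercept_
print(np.allclose(manual, lin.predict(xs)))  # True
print(lin.coef_[0], lin.intercept_)          # ~3.0, ~-2.0
```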
model_x = np.arange(3, 10, 0.1).reshape(-1, 1)
model_y = reg.predict(model_x)
plt.plot(x, y, 'o')
plt.plot(model_x, model_y)
(Scatter plot with the fitted regression line overlaid.)
from sklearn.metrics import r2_score
r2_score(y, y_hat)
0.48352545599133423
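An R² of about 0.48 means the single RM feature explains roughly half the variance in prices. Under the hood, `r2_score` computes R² = 1 − SS_res/SS_tot; a hand-rolled version (with illustrative made-up values) matches sklearn's:

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

yt = np.array([3.0, -0.5, 2.0, 7.0])
yp = np.array([2.5, 0.0, 2.0, 8.0])
print(r2_manual(yt, yp), r2_score(yt, yp))  # both ~0.9486
```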