# 03 - Linear Regression

version 1.3, June 2018

## Part of the class Applied Deep Learning

In [1]:
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import itertools
plt.style.use('ggplot')

In [2]:
print(plt.style.available)

['seaborn-pastel', 'seaborn-dark-palette', 'seaborn-darkgrid', '_classic_test', 'bmh', 'seaborn-muted', 'fast', 'fivethirtyeight', 'seaborn-dark', 'ggplot', 'seaborn-bright', 'seaborn-talk', 'dark_background', 'seaborn-poster', 'seaborn-notebook', 'seaborn-ticks', 'seaborn-deep', 'seaborn-white', 'seaborn', 'Solarize_Light2', 'classic', 'grayscale', 'seaborn-whitegrid', 'seaborn-colorblind', 'seaborn-paper']

In [3]:
plt.style.use('fivethirtyeight')

In [4]:
# Test Dataset
import zipfile
with zipfile.ZipFile('../datasets/houses_portland.csv.zip', 'r') as z:
    f = z.open('houses_portland.csv')
    data = pd.read_csv(f)
data.head()

Out[4]:
   area  bedroom   price
0  2104        3  399900
1  1600        3  329900
2  2400        3  369000
3  1416        2  232000
4  3000        4  539900
In [5]:
data.columns

Out[5]:
Index(['area', 'bedroom', ' price'], dtype='object')
In [6]:
y = data[' price'].values
X = data['area'].values
plt.scatter(X, y)
plt.xlabel('Area')
plt.ylabel('Price')

Out[6]:
Text(0,0.5,'Price')

## $$x = \frac{x - \overline{x}}{\sigma_x}$$

In [7]:
y_mean, y_std = y.mean(), y.std()
X_mean, X_std = X.mean(), X.std()

y = (y - y_mean)/ y_std
X = (X - X_mean)/ X_std

plt.scatter(X, y)
plt.xlabel('Area')
plt.ylabel('Price')

Out[7]:
Text(0,0.5,'Price')
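After this transformation, both variables have (approximately) zero mean and unit standard deviation. A quick sketch of that check, using the area values from the dataset head shown above:

```python
import numpy as np

# The five area values shown in data.head() above
v = np.array([2104.0, 1600.0, 2400.0, 1416.0, 3000.0])

# Standardize as in the formula: subtract the mean, divide by the std
v_std = (v - v.mean()) / v.std()

print(np.isclose(v_std.mean(), 0))  # -> True
print(np.isclose(v_std.std(), 1))   # -> True
```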

## $$h_\beta(x) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n$$

• $h_\beta(x)$ is the response
• $\beta_0$ is the intercept
• $\beta_1$ is the coefficient for $x_1$ (the first feature)
• $\beta_n$ is the coefficient for $x_n$ (the nth feature)

The $\beta$ values are called the model coefficients:

• These values are estimated (or "learned") during the model fitting process using the least squares criterion.
• Specifically, we find the line (mathematically) that minimizes the sum of squared residuals (or "sum of squared errors").
• And once we've learned these coefficients, we can use the model to predict the response.
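While this notebook explores the cost surface directly, the least squares coefficients can also be computed in closed form via the normal equation $\beta = (X^TX)^{-1}X^Ty$. A minimal sketch on hypothetical toy data (not the housing dataset):

```python
import numpy as np

# Hypothetical toy data generated from y = 2 + 3x (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x

# Design matrix with an intercept column of ones
X = np.c_[np.ones_like(x), x]

# Normal equation: solve (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
# Recovers the intercept 2 and slope 3 (up to floating point)
print(beta)
```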

In a typical least squares diagram:

• The black dots are the observed values of x and y.
• The blue line is our least squares line.
• The red lines are the residuals, which are the vertical distances between the observed values and the least squares line.

### Cost function

The goal becomes to estimate the parameters $\beta$ that minimize the sum of squared residuals:

## $$J(\beta_0, \beta_1)=\frac{1}{2n}\sum_{i=1}^n (h_\beta(x_i)-y_i)^2$$

In [8]:
# Add an intercept column of ones to X
n_samples = X.shape[0]
X_ = np.c_[np.ones(n_samples), X]


Let's assume the following betas:

In [9]:
beta_ini = np.array([-1, 1])

In [10]:
# Hypothesis h_beta(x)
def lr_h(beta, x):
    return np.dot(beta, x.T)

In [11]:
# Scatter plot of the data
plt.scatter(X, y, c='b')

# Plot the linear regression line
x = np.c_[np.ones(2), [X.min(), X.max()]]
plt.plot(x[:, 1], lr_h(beta_ini, x), 'r', lw=5)
plt.xlabel('Area')
plt.ylabel('Price')

Out[11]:
Text(0,0.5,'Price')

Let's calculate the error of this regression:

In [12]:
# Cost function (loop version; can be vectorized)
def lr_cost_func(beta, x, y):
    res = 0
    for i in range(x.shape[0]):
        res += (lr_h(beta, x[i, :]) - y[i]) ** 2
    res *= 1 / (2 * x.shape[0])
    return res

lr_cost_func(beta_ini, X_, y)

Out[12]:
0.6450124071218747
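The comment in the cell above notes that the cost can be vectorized. A sketch of such a version, checked on small hypothetical data rather than the housing set:

```python
import numpy as np

def lr_cost_func_vec(beta, x, y):
    # Compute all residuals at once, then half the mean of their squares
    residuals = x @ beta - y
    return (residuals ** 2).sum() / (2 * x.shape[0])

# Hypothetical check data: 3 samples with an intercept column
X_demo = np.c_[np.ones(3), np.array([0.0, 1.0, 2.0])]
y_demo = np.array([0.0, 1.0, 2.0])
print(lr_cost_func_vec(np.array([-1.0, 1.0]), X_demo, y_demo))  # -> 0.5
```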

### Understanding the cost function

Let's see what the cost function looks like for different values of $\beta$:

In [13]:
beta0 = np.arange(-15, 20, 1)

beta1 = 2

In [14]:
cost_func = []
for beta_0 in beta0:
    cost_func.append(lr_cost_func(np.array([beta_0, beta1]), X_, y))

plt.plot(beta0, cost_func)
plt.xlabel('beta_0')
plt.ylabel('J(beta)')

Out[14]:
Text(0,0.5,'J(beta)')
In [15]:
beta0 = 0
beta1 = np.arange(-15, 20, 1)

In [16]:
cost_func = []
for beta_1 in beta1:
    cost_func.append(lr_cost_func(np.array([beta0, beta_1]), X_, y))

plt.plot(beta1, cost_func)
plt.xlabel('beta_1')
plt.ylabel('J(beta)')

Out[16]:
Text(0,0.5,'J(beta)')

Analyzing both at the same time

In [17]:
beta0 = np.arange(-5, 7, 0.2)
beta1 = np.arange(-5, 7, 0.2)

In [18]:
cost_func = pd.DataFrame(index=beta0, columns=beta1)

for beta_0 in beta0:
    for beta_1 in beta1:
        cost_func.loc[beta_0, beta_1] = lr_cost_func(np.array([beta_0, beta_1]), X_, y)

In [19]:
betas = np.transpose([np.tile(beta0, beta1.shape[0]), np.repeat(beta1, beta0.shape[0])])
fig = plt.figure(figsize=(10, 10))
ax = fig.gca(projection='3d')
ax.plot_trisurf(betas[:, 0], betas[:, 1], cost_func.T.values.flatten(), cmap=cm.jet, linewidth=0.1)
ax.set_xlabel('beta_0')
ax.set_ylabel('beta_1')
ax.set_zlabel('J(beta)')
plt.show()
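The `tile`/`repeat` construction above builds every $(\beta_0, \beta_1)$ pair by hand for `plot_trisurf`; `np.meshgrid` is a common alternative that produces the same grid as 2-D arrays ready for `plot_surface` or `contour`. A sketch, using `linspace` to pin the grid to the same bounds:

```python
import numpy as np

# Same grid bounds as above, with the point count made explicit
beta0 = np.linspace(-5, 7, 61)
beta1 = np.linspace(-5, 7, 61)

# meshgrid returns two 2-D arrays whose element-wise pairs cover
# every (beta_0, beta_1) combination
B0, B1 = np.meshgrid(beta0, beta1)
print(B0.shape)  # -> (61, 61)
```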


It can also be seen as a contour plot

In [20]:
contour_levels = [0, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 7, 10, 12, 15, 20]
plt.contour(beta0, beta1, cost_func.T.values, contour_levels)
plt.xlabel('beta_0')
plt.ylabel('beta_1')

Out[20]:
Text(0,0.5,'beta_1')

Let's see how different values of beta appear on the contour plot:

In [21]:
betas = np.array([[0, 0],
[-1, -1],
[-5, 5],
[3, -2]])

In [22]:
plt.style.use('seaborn-notebook')

In [23]:
for beta in betas:
    print('\n\nLinear Regression with betas ', beta)
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
    ax2.contour(beta0, beta1, cost_func.T.values, contour_levels)
    ax2.set_xlabel('beta_0')
    ax2.set_ylabel('beta_1')
    ax2.scatter(beta[0], beta[1], c='b', s=50)

    # Scatter plot of the data
    ax1.scatter(X, y, c='b')

    # Plot the linear regression line
    x = np.c_[np.ones(2), [X.min(), X.max()]]
    ax1.plot(x[:, 1], lr_h(beta, x), 'r', lw=5)
    ax1.set_xlabel('Area')
    ax1.set_ylabel('Price')
    plt.show()

Linear Regression with betas  [0 0]

Linear Regression with betas  [-1 -1]

Linear Regression with betas  [-5  5]

Linear Regression with betas  [ 3 -2]