This simple dataset contains information about insect, fish, and bird species and whether or not they can fly:
| Name | Class | Can fly? |
|---|---|---|
| Pileated woodpecker | Birds | Yes |
| Emu | Birds | No |
| Northern cardinal | Birds | Yes |
| Blacktip shark | Cartilaginous fishes | No |
| Bluntnose stingray | Cartilaginous fishes | No |
| Black drum | Bony fishes | No |
| Florida carpenter ant | Insects | No |
| Periodical cicada | Insects | Yes |
| Luna moth | Insects | Yes |
Your task: Develop a model to classify whether or not an animal can fly, based on information available in the dataset.
Does this model make any mistakes? If so, can we improve it?
Aha! That model classifies each training example perfectly!
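One way a model can classify every training example perfectly is to simply memorize the table. The sketch below (entirely my own construction, not the workshop's answer) shows why "perfect on the training set" is not the same as "useful":

```python
# A "model" that memorizes the training data: a lookup table keyed by name.
train = {
    "Pileated woodpecker": True, "Emu": False, "Northern cardinal": True,
    "Blacktip shark": False, "Bluntnose stingray": False, "Black drum": False,
    "Florida carpenter ant": False, "Periodical cicada": True, "Luna moth": True,
}

def can_fly(name):
    # Perfect on every training example...
    if name in train:
        return train[name]
    # ...but it has no idea what to do with an unseen animal.
    raise KeyError(f"never saw {name!r} during training")

accuracy = sum(can_fly(n) == y for n, y in train.items()) / len(train)
print(accuracy)  # 1.0 on the training set
```

This is overfitting in its purest form: the model's "complexity" (one rule per animal) matches the size of the training set, and it generalizes to nothing.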
For this lesson, we will focus on two widely used regularization methods: L1 and L2 regularization. Both of these methods represent model complexity as a function of the model's feature weights.
Reminder: The general linear regression model looks like this:
$$ y = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_k x_k $$

The L1 regularization penalty is:
$$L_1\text{ }regularization\text{ }penalty = \lambda\sum_{i=1}^k |w_i|$$

import numpy as np
weights = [-0.5, -0.2, 0.5, 0.7, 1.0, 2.5]
The L2 regularization penalty is:
$$L_2\text{ }regularization\text{ }penalty = \lambda\sum_{i=1}^k w_i^2$$
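Using the `weights` list defined above, each penalty is one line of NumPy. (The value $\lambda = 0.1$ here is an arbitrary choice for illustration.)

```python
import numpy as np

weights = np.array([-0.5, -0.2, 0.5, 0.7, 1.0, 2.5])
lam = 0.1  # arbitrary lambda, chosen only for illustration

l1_penalty = lam * np.sum(np.abs(weights))  # 0.1 * 5.4  = 0.54
l2_penalty = lam * np.sum(weights ** 2)     # 0.1 * 8.28 = 0.828
print(l1_penalty, l2_penalty)
```

Notice how the large weight (2.5) dominates the L2 penalty (6.25 of the 8.28 before scaling), while the L1 penalty treats each unit of weight equally. This is why L2 pushes hard against large weights, whereas L1 tends to zero out small ones.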
Recall that the usual loss function for linear regression is the mean square error:
$$ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - (w_0 + w_1 x_{i,1} + w_2 x_{i,2} + \ldots + w_k x_{i,k}))^2 $$

To add L1 regularization, we want to minimize:
$$ MSE + \lambda\sum_{i=1}^k |w_i|$$

Let's analyze a dataset called `regularization.csv`, which you can find in the `nb-datasets` folder.
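Before turning to that file, the combined objective can be checked by hand. This is a sketch on a tiny invented dataset (every number is made up for illustration):

```python
import numpy as np

# Tiny invented dataset: y is roughly 2x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

w0, w1 = 1.0, 2.0  # a candidate intercept and slope
lam = 0.5          # regularization strength (arbitrary)

pred = w0 + w1 * x
mse_val = np.mean((y - pred) ** 2)
l1_objective = mse_val + lam * abs(w1)  # the intercept w0 is not penalized
print(mse_val, l1_objective)
```

Note that the sum in the penalty starts at $i = 1$: by convention the intercept $w_0$ is excluded, since penalizing it would bias predictions toward zero rather than reduce model complexity.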
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse
Let's try using regularization on a real dataset. We'll again use the iris dataset that you've already seen in previous lessons. We might not have time for this example during the workshop, and if not, I encourage you to explore it on your own.
idata = pd.read_csv('../nb-datasets/iris_dataset.csv')
idata['species'] = idata['species'].astype('category')
# Convert the categorical variable "species" to 1-hot encoding (AKA "dummy variables"),
# but eliminate the first dummy variable because it is collinear with the other two
# and does not provide any additional information.
idata_enc = pd.get_dummies(idata, drop_first=True)
# Separate the x and y values.
x = idata_enc.drop(columns='petal_length')
y = idata_enc['petal_length']
# Split the train and test sets.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
# See what we have.
idata_enc.head()
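The fitting code itself isn't shown in this excerpt. Here is a sketch of what it might look like; I use scikit-learn's built-in iris loader (with renamed columns) as a stand-in for `iris_dataset.csv`, and the `alpha` values are arbitrary starting points:

```python
import pandas as pd
from sklearn.datasets import load_iris  # stand-in for the workshop's CSV file
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse

iris = load_iris(as_frame=True)
idata = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length", "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length", "petal width (cm)": "petal_width",
    "target": "species",
})
idata_enc = pd.get_dummies(idata.astype({"species": "category"}), drop_first=True)
x = idata_enc.drop(columns="petal_length")
y = idata_enc["petal_length"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=0)

# Ridge applies the L2 penalty; Lasso applies the L1 penalty.
for name, model in [("OLS", LinearRegression()),
                    ("Ridge (L2)", Ridge(alpha=1.0)),
                    ("Lasso (L1)", Lasso(alpha=0.1))]:
    model.fit(x_train, y_train)
    print(f"{name}: test MSE = {mse(y_test, model.predict(x_test)):.4f}")
```

In scikit-learn, the constructor parameter `alpha` plays the role of $\lambda$ in the formulas above.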
Try experimenting with the value of `alpha`/$\lambda$ in the code above, for both L1 regularization and L2 regularization. As you do so, consider these questions: