Exercises from Chapter 6 of *An Introduction to Statistical Learning* by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
import numpy as np
Best subset selection will exhibit the smallest training RSS because, for each value of k, it considers every possible combination of k predictors. Forward or backward stepwise selection may happen to identify the same model, but they cannot find a model with lower training RSS than best subset selection, since each considers only a subset of the models that best subset selection searches.
For test RSS we cannot say for certain which method wins. Best subset selection minimises training RSS, but that model may overfit; a model identified by forward or backward stepwise selection could, by chance, generalise better, since low training RSS does not guarantee low test RSS.
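To make the scale of the two searches concrete, here is a quick count (a minimal sketch; 2^p and 1 + p(p+1)/2 are the standard model counts from Section 6.1 for best subset and stepwise selection respectively):
for p in [5, 10, 20]:
    # Best subset fits every combination; stepwise fits the null model
    # plus p, p-1, ..., 1 candidate models across its p steps
    print(f'p={p}: best subset fits {2**p} models, '
          f'forward/backward stepwise fits {1 + p*(p + 1)//2}')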
i. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection. True: each forward step simply adds one predictor to the previous model.
ii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection. True: each backward step simply removes one predictor from the (k+1)-variable model.
iii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection. False: the two directions search different sequences of models and can settle on entirely different predictor sets (see the sketch below).
iv. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection. False, for the same reason as iii.
v. The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k+1)-variable model identified by best subset selection. False: best subset selection searches all models of each size independently, so the best (k+1)-variable model need not contain the best k-variable model's predictors.
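Statements iii and iv can be checked empirically. Below is a minimal sketch using scikit-learn's SequentialFeatureSelector on synthetic data (the dataset and the sizes chosen are my own assumptions, not from the book); on many draws the two directions select different columns, confirming that neither model need be nested in the other:
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic regression problem with a few informative, noisy features
X, y = make_regression(n_samples=100, n_features=6, n_informative=3,
                       noise=10, random_state=0)
for direction in ['forward', 'backward']:
    sfs = SequentialFeatureSelector(LinearRegression(),
                                    n_features_to_select=3,
                                    direction=direction).fit(X, y)
    print(direction, np.flatnonzero(sfs.get_support()))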
(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
Answer: iii. The lasso constrains the coefficient estimates, so it is less flexible than least squares; it improves prediction accuracy when the resulting increase in bias is smaller than the decrease in variance.
(b) Ridge regression, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
Answer: iii. As with the lasso, the ridge penalty shrinks the coefficients and reduces flexibility, trading a small increase in bias for a larger decrease in variance.
(c) Non-linear methods, relative to least squares, are:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
Answer: ii. Non-linear methods are more flexible than least squares; they improve prediction accuracy when the increase in variance is smaller than the decrease in bias (see the simulation sketch below).
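These trade-offs can be seen on simulated data. A hedged sketch (all data-generating choices below are my own assumptions): with more predictors than the sample comfortably supports, the less flexible lasso often beats least squares on test error.
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(scale=2, size=n)  # only the first predictor matters
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in [LinearRegression(), Lasso(alpha=0.5)]:
    model.fit(X_tr, y_tr)
    print(type(model).__name__,
          mean_squared_error(y_te, model.predict(X_te)))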
As we increase s we increase the model's flexibility; in the training setting this leads towards overfitting and a steady, monotonic decrease in training RSS (illustrated in the sketch after these answers).
The increased flexibility of the model improves its performance in the test setting up until the point where the model starts to overfit the data, after which test RSS rises again, giving a characteristic U-shape.
A looser constraint on the sum of the coefficients means a monotonic increase in flexibility, and hence a steady increase in variance.
Likewise, a looser constraint means a steady, monotonic decrease in bias.
The irreducible error represents the inherent noise in the data; because this noise is random it contains no useful information, and it remains constant regardless of the model's flexibility.
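A quick empirical check of the training-RSS claim (a sketch under my own data-generating assumptions; note that scikit-learn parameterises the lasso by the penalty weight alpha rather than the budget s, so a small alpha corresponds to a large s):
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)
# Decreasing alpha = relaxing the constraint = increasing s
for alpha in [10, 1, 0.1, 0.01]:
    model = Lasso(alpha=alpha).fit(X, y)
    print(f'alpha={alpha}: training RSS = {np.sum((y - model.predict(X))**2):.1f}')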
As λ increases, a heavier penalty is placed on large coefficients, so the model becomes less flexible and less able to fit variability in the training data; training RSS therefore increases steadily (see the ridge sketch below).
As λ increases the model's bias increases, reducing any overfitting of the training data and so initially decreasing the test RSS. At some point the bias grows so large that the model can no longer represent the true relationships in the data, and test RSS begins to rise again, giving a U-shape.
As λ increases, a higher penalty is placed on model flexibility, and so variance steadily decreases.
As λ increases, the coefficient estimates are shrunk further towards zero, and so bias steadily increases.
The irreducible error is not affected by the model, so it remains constant.
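The same can be checked for ridge regression (again a sketch with assumed data; scikit-learn's alpha plays the role of λ): as the penalty grows, the coefficients shrink and training RSS rises.
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([3., -2., 1., 0.5, 0.]) + rng.normal(size=100)
for alpha in [0.01, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f'alpha={alpha}: training RSS = {np.sum((y - model.predict(X))**2):.1f}, '
          f'coefficient L2 norm = {np.linalg.norm(model.coef_):.2f}')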
import matplotlib.pyplot as plt
import seaborn as sns
# Ridge cost for a single observation with p = 1: (y1 - β1)^2 + λβ1^2
def rr(y1, β1, λ):
    return np.power(y1 - β1, 2) + λ*(β1**2)

y1 = 5
λ = 1
β = list(range(-10, 15))
results = [rr(y1, β1, λ) for β1 in β]

# Analytic ridge minimiser: βR = y1 / (1 + λ)
βR = y1/(1+λ)

ax = sns.scatterplot(x=β, y=results)
ax.axvline(x=βR, color='r')  # mark the minimiser on the cost curve
plt.xlabel('β1')
plt.ylabel('Cost');
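As a sanity check on the plot, the cost can also be minimised numerically (a small sketch using scipy.optimize, which is not otherwise used in this notebook); the numeric minimiser should agree with βR = y1/(1+λ) = 2.5:
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda b: rr(y1, b, λ))
print(res.x, βR)  # both should be approximately 2.5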
# Lasso cost for a single observation with p = 1: (y1 - β1)^2 + λ|β1|
def lasso(y1, β1, λ):
    return np.power(y1 - β1, 2) + λ*np.absolute(β1)

y1 = -1
λ = 3
β = list(range(-10, 15))
results = [lasso(y1, β1, λ) for β1 in β]

# Soft-thresholding solution for the lasso
if y1 > λ/2:
    print('y1 > λ/2')
    βL = y1 - λ/2
if y1 < -λ/2:
    print('y1 < -λ/2')
    βL = y1 + λ/2
if np.absolute(y1) <= λ/2:
    print('np.absolute(y1) <= λ/2')
    βL = 0

ax = sns.scatterplot(x=β, y=results)
ax.axvline(x=βL, color='r')  # mark the soft-threshold minimiser
plt.xlabel('β1')
plt.ylabel('Cost');
# Confirm which case applies here: |y1| = 1 <= λ/2 = 1.5, so βL = 0
np.absolute(y1) <= λ/2
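The same numeric check works for the lasso cost (again a scipy sketch of mine; the minimiser of the non-smooth cost should land at the soft-threshold value βL = 0 for y1 = -1, λ = 3):
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda b: lasso(y1, b, λ))
print(res.x, βL)  # numeric minimiser should be close to βL = 0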
np.random.seed(1)
x1 = np.random.normal(0, 1, 100)
# y_hat depends linearly on x1, plus a small amount of noise
y_hat = x1 + np.random.normal(0, 1, 100)*0.2
ax = sns.scatterplot(x=x1, y=y_hat)
plt.xlabel('x1')
plt.ylabel('y_hat');
np.random.seed(1)
x1 = np.random.normal(0, 1, 100)
# y_hat is pure noise, unrelated to x1
y_hat = np.random.normal(0, 1, 100)*0.2
sns.scatterplot(x=x1, y=y_hat)
plt.xlabel('x1')
plt.ylabel('y_hat');
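To quantify the contrast between the two scatterplots (my own quick check, not part of the exercise text): the sample correlation between x1 and y_hat is near 1 in the first simulation and near 0 in this second, noise-only one.
print(np.corrcoef(x1, y_hat)[0, 1])  # close to 0 for the noise-only y_hat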