We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain p + 1 models, containing 0, 1, 2, . . . , p predictors. Explain your answers:
(a) Which of the three models with k predictors has the smallest training RSS?
The model obtained by best subset selection has training RSS equal to or smaller than that of the other two, since the best subset procedure considers every possible model with k predictors, whereas forward and backward stepwise each examine only a small fraction of them.
For k=1, best subset and forward stepwise will always obtain the same model.
For k=p, best subset and backward stepwise will always obtain the same model.
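A minimal simulation sketch of the training-RSS claim in (a), assuming scikit-learn is available (the data and the greedy forward-stepwise loop are illustrative, not from the text): for every k, the best-subset training RSS is no larger than the forward-stepwise training RSS.

import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def rss(features):
    # Training RSS of an ordinary least squares fit on the given feature subset.
    Xs = X[:, list(features)]
    pred = LinearRegression().fit(Xs, y).predict(Xs)
    return np.sum((y - pred) ** 2)

fwd = []
for k in range(1, p + 1):
    # Best subset: examine every subset of size k.
    best = min(itertools.combinations(range(p), k), key=rss)
    # Forward stepwise: grow the previous model greedily by one predictor.
    fwd = min((fwd + [j] for j in range(p) if j not in fwd), key=rss)
    print(k, round(rss(best), 2), "<=", round(rss(fwd), 2))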
(b) Which of the three models with k predictors has the smallest test RSS?
This cannot be determined in general. Best subset is guaranteed to win on training RSS, but its more exhaustive search can also overfit, so on a given data set any of the three could have the smallest test RSS.
(c) True or False (a small empirical check of statements i. and v. follows the list):
i. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection. True: forward stepwise builds each model by adding one predictor to the previous one.
ii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection. True: backward stepwise obtains the k-variable model by deleting one predictor from the (k+1)-variable model.
iii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection. False: the two procedures follow different search paths, so there is no such guarantee.
iv. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection. False, for the same reason.
v. The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k+1)-variable model identified by best subset selection. False: best subset re-searches all subsets at each size, so the chosen models need not be nested.
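A small empirical check of statements i. and v. on synthetic data (scikit-learn assumed; the near-duplicated column is only there to make a non-nested best-subset path more likely). The forward-stepwise path is nested by construction, while the best-subset path need not be.

import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 60, 6
X = rng.normal(size=(n, p))
# Correlated copy of column 0, to encourage best subset to "jump" between sizes.
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=n)

def rss(features):
    Xs = X[:, list(features)]
    pred = LinearRegression().fit(Xs, y).predict(Xs)
    return np.sum((y - pred) ** 2)

fwd, fwd_path, best_path = [], [], []
for k in range(1, p + 1):
    fwd = min((fwd + [j] for j in range(p) if j not in fwd), key=rss)
    fwd_path.append(set(fwd))
    best_path.append(set(min(itertools.combinations(range(p), k), key=rss)))

# Statement i: each forward-stepwise model is nested in the next (always True).
print(all(a <= b for a, b in zip(fwd_path, fwd_path[1:])))
# Statement v: best-subset models need not be nested (this may print False).
print(all(a <= b for a, b in zip(best_path, best_path[1:])))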
(a) The lasso, relative to least squares, is:
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Repeat (a) for ridge regression relative to least squares.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
(c) Repeat (a) for non-linear methods relative to least squares.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
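A hedged simulation sketch of (a): with many irrelevant predictors and a modest sample size, the lasso's drop in variance can more than pay for its extra bias. scikit-learn is assumed, and the data-generating process and alpha are illustrative choices, not tuned values.

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
n, p = 50, 40
beta = np.zeros(p)
beta[:5] = 1.0  # only a few predictors truly matter

def test_mse(model, reps=200):
    errs = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(size=n)
        X_te = rng.normal(size=(1000, p))
        y_te = X_te @ beta + rng.normal(size=1000)
        pred = model.fit(X, y).predict(X_te)
        errs.append(np.mean((y_te - pred) ** 2))
    return np.mean(errs)

print("OLS   test MSE:", test_mse(LinearRegression()))
print("Lasso test MSE:", test_mse(Lasso(alpha=0.1)))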
Suppose we estimate the regression coefficients in a linear regression model by minimizing $\sum_{i=1}^{n}\big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\big)^2$ subject to $\sum_{j=1}^{p}|\beta_j| \le s$, for a particular value of s. For parts (a) through (e), indicate which of i. through v. is correct. Justify your answer.
(a) As we increase s from 0, the training RSS will:
iv. Steadily decrease. At s = 0 every coefficient is forced to zero (the null model); as s grows the constraint region expands, so the best achievable fit to the training data can only improve, eventually reaching the least squares fit once the constraint stops binding.
(b) Repeat (a) for test RSS.
ii. Decrease initially, and then eventually start increasing in a U shape.
(c) Repeat (a) for variance.
iii. Steadily increase. A larger budget s allows the coefficients to grow toward their least squares values, giving a more flexible fit and hence higher variance.
(d) Repeat (a) for (squared) bias.
iv. Steadily decrease. Less shrinkage means the fitted model can track the true relationship more closely, so the squared bias falls.
(e) Repeat (a) for the irreducible error.
v. Remain constant.
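A sketch of part (a) above using scikit-learn's Lasso: its penalty parameter alpha is inversely related to the budget s, so the loop below fits over a grid of alphas, recovers s as the sum of absolute fitted coefficients, and plots training RSS against s (synthetic, illustrative data). The curve should fall steadily as s grows.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

s_vals, train_rss = [], []
for alpha in np.logspace(-3, 1, 50):
    fit = Lasso(alpha=alpha).fit(X, y)
    s_vals.append(np.sum(np.abs(fit.coef_)))              # the implied budget s
    train_rss.append(np.sum((y - fit.predict(X)) ** 2))   # training RSS at that s

plt.plot(s_vals, train_rss)
plt.xlabel("s = sum of |beta_j|")
plt.ylabel("training RSS")
plt.show()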
(Ridge optimisation objective: minimize $\sum_{i=1}^{n}\big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$, as we increase λ from 0.)
Increasing λ shrinks the coefficients toward zero, the opposite direction of travel to increasing s in the previous question, so the directional answers flip: (a) iii. Steadily increase; (b) ii. Decrease initially, then increase in a U shape; (c) iv. Steadily decrease; (d) iii. Steadily increase; (e) v. Remain constant.
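And a matching sketch for the ridge objective, with scikit-learn's Ridge alpha standing in for λ: training error should rise steadily with λ while test error traces a U shape (synthetic data, illustrative grid).

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, p = 60, 40
beta = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta + 3 * rng.normal(size=n)
X_te = rng.normal(size=(1000, p))
y_te = X_te @ beta + 3 * rng.normal(size=1000)

lambdas = np.logspace(-2, 4, 60)
train_mse, test_mse = [], []
for lam in lambdas:
    fit = Ridge(alpha=lam).fit(X, y)
    # Mean squared errors (proportional to RSS at a fixed sample size).
    train_mse.append(np.mean((y - fit.predict(X)) ** 2))
    test_mse.append(np.mean((y_te - fit.predict(X_te)) ** 2))

plt.semilogx(lambdas, train_mse, label="training MSE")
plt.semilogx(lambdas, test_mse, label="test MSE")
plt.xlabel("lambda (Ridge alpha)")
plt.ylabel("mean squared error")
plt.legend()
plt.show()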
Suppose that $n = 2$, $p = 2$, $x_{11} = x_{12}$, $x_{21} = x_{22}$. Furthermore, suppose that $y_1 + y_2 = 0$ and $x_{11} + x_{21} = 0$ and $x_{12} + x_{22} = 0$, so that the estimate for the intercept in a least squares, ridge regression, or lasso model is zero: $\hat\beta_0 = 0$.
(a) Write out the ridge regression optimization problem in this setting. (b) Argue that in this setting, the ridge coefficient estimates satisfy $\hat\beta_1 = \hat\beta_2$.
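One way to write out (a) and (b) under these simplifications, as a sketch: with $\hat\beta_0 = 0$ and $x_{11} = x_{12}$, $x_{21} = x_{22}$, the ridge problem is

$$\min_{\beta_1,\beta_2}\ \big(y_1 - (\beta_1+\beta_2)x_{11}\big)^2 + \big(y_2 - (\beta_1+\beta_2)x_{21}\big)^2 + \lambda\big(\beta_1^2 + \beta_2^2\big).$$

The data enter only through the sum $\beta_1 + \beta_2$. For any fixed sum $c$, the penalty $\beta_1^2 + \beta_2^2$ is strictly minimised at $\beta_1 = \beta_2 = c/2$, so every ridge minimiser must satisfy $\hat\beta_1 = \hat\beta_2$.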
(c) Write out the lasso optimization problem in this setting. (d) Argue that in this setting, the lasso coefficients $\hat\beta_1$ and $\hat\beta_2$ are not unique; in other words, there are many possible solutions to the optimization problem in (c). Describe these solutions.
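Analogously for (c) and (d), the lasso problem replaces the penalty above with $\lambda(|\beta_1| + |\beta_2|)$, which is constant along any line $\beta_1 + \beta_2 = c$ with $\beta_1$ and $\beta_2$ of the same sign, so the whole segment of such points attaining the optimal sum solves the problem. A small numerical sketch of both claims on data satisfying the exercise's conditions, assuming scikit-learn is available (the particular numbers are arbitrary):

import numpy as np
from sklearn.linear_model import Ridge

# x11 = x12, x21 = x22, y1 + y2 = 0, x11 + x21 = 0, x12 + x22 = 0
X = np.array([[2.0, 2.0],
              [-2.0, -2.0]])
y = np.array([3.0, -3.0])
lam = 1.0

# Ridge fitted without an intercept (beta_0 = 0 here): the two coefficients agree.
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print("ridge coefficients:", ridge.coef_)  # the two entries are equal

# Lasso objective evaluated directly: for a fixed sum beta1 + beta2,
# every nonnegative split gives exactly the same objective value.
def lasso_obj(b1, b2):
    resid = y - X @ np.array([b1, b2])
    return np.sum(resid ** 2) + lam * (abs(b1) + abs(b2))

print(lasso_obj(0.6, 0.0), lasso_obj(0.3, 0.3), lasso_obj(0.1, 0.5))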
(a) Consider (6.12) with p = 1. For some choice of y1 and λ > 0, plot (6.12) as a function of β1. Your plot should confirm that (6.12) is solved by (6.14).
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

y = 10
lmbda = 2

def eq_612(beta):
    # (6.12) with p = 1: (y1 - beta1)^2 + lambda * beta1^2
    return (y - beta) ** 2 + lmbda * beta ** 2

betas = np.linspace(-3, 10, 101)
f_of_betas = [eq_612(b) for b in betas]
sns.lineplot(x=betas, y=f_of_betas)
# (6.14): the ridge minimiser is y1 / (1 + lambda)
plt.axvline(x=y / (1 + lmbda), c='y')
plt.show()
(b) Consider (6.13) with p = 1. For some choice of y1 and λ > 0, plot (6.13) as a function of β1. Your plot should confirm that (6.13) is solved by (6.15).
y = 10
lmbda = 2

def eq_613(beta):
    # (6.13) with p = 1: (y1 - beta1)^2 + lambda * |beta1|
    return (y - beta) ** 2 + lmbda * np.abs(beta)

betas = np.linspace(-3, 20, 101)
f_of_betas = [eq_613(b) for b in betas]
sns.lineplot(x=betas, y=f_of_betas)

# (6.15): the lasso minimiser is y1 soft-thresholded by lambda / 2
if y > lmbda / 2:
    solution = y - lmbda / 2
elif y < -lmbda / 2:
    solution = y + lmbda / 2
else:
    solution = 0

plt.axvline(x=solution, c='y')
plt.show()