It is mentioned in Section 8.2.3 that boosting using depth-one trees (or stumps) leads to an additive model: that is, a model of the form $f(X) = \sum_{j=1}^{p} f_j(X_j)$.
Explain why this is the case. You can begin with (8.12) in Algorithm 8.2.
Each tree fit in Algorithm 8.2 is a stump, i.e. a single split on a single predictor, so every term added to the model is a function of just one of the predictors. The boosted model in (8.12) is a sum of these stumps (each scaled by λ), and collecting the terms that split on the same predictor into one function per predictor gives exactly the additive form above.
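Writing this out (the grouping step below is our own notation, not from the text; $j_b$ denotes the predictor on which the $b$-th stump splits):

$$
\hat{f}(x) \;=\; \sum_{b=1}^{B} \lambda \hat{f}^{\,b}(x)
\;=\; \sum_{j=1}^{p} \underbrace{\sum_{b:\,j_b = j} \lambda \hat{f}^{\,b}(x)}_{=\,f_j(X_j)}
\;=\; \sum_{j=1}^{p} f_j(X_j)
$$

Each $\hat{f}^{\,b}(x)$ depends only on $X_{j_b}$, so the inner sum is a function of $X_j$ alone, which we call $f_j(X_j)$.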
Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of pˆm1. Hint: In a setting with two classes, pˆm1 = 1 − pˆm2. You could make this plot by hand, but it will be much easier to make in R.
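For reference, the three impurity measures computed below are the ones defined in Section 8.1.2 of ISL, for a node $m$ with class proportions $\hat{p}_{mk}$ (classification error, Gini index, and entropy respectively):

$$
E = 1 - \max_k \hat{p}_{mk}, \qquad
G = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}), \qquad
D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}
$$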
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# binary classification: pm1 is the complement of pm2
pm1 = np.linspace(0, 1, 100)
pm2 = 1 - pm1
pmk = pd.DataFrame({'pm1':pm1,'pm2':pm2})
# binary classification: k = 2
# Gini
g1 = pmk.pm1 * (1 - pmk.pm1)
g2 = pmk.pm2 * (1 - pmk.pm2)
g = g1 + g2
# Entropy; replace p = 0 with 1 so the term becomes 0 * log(1) = 0
# rather than 0 * log(0) = NaN
e1 = pmk.pm1 * np.log(pmk.pm1.replace([0], 1))
e2 = pmk.pm2 * np.log(pmk.pm2.replace([0], 1))
e = -(e1 + e2)
# Classification error: the fraction of observations not belonging to the
# most common class; with two classes this is simply min(pm1, pm2)
ce = pmk.min(axis=1)
# plot
line_df = pd.DataFrame({'gini' : g,
'pm1': pmk.pm1,
'class_err': ce,
'entropy' : e})
sns.lineplot(x='pm1', y='gini', data=line_df, color='tab:red', label='Gini index')
sns.lineplot(x='pm1', y='entropy', data=line_df, color='tab:blue', label='Entropy')
sns.lineplot(x='pm1', y='class_err', data=line_df, color='tab:green', label='Classification error')
plt.ylabel('Impurity measure')
plt.legend()
(a) Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand panel of Figure 8.12. The numbers inside the boxes indicate the mean of Y within each region.
(b) Create a diagram similar to the left-hand panel of Figure 8.12, using the tree illustrated in the right-hand panel of the same figure. You should divide up the predictor space into the correct regions, and indicate the mean for each region.
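Since Figure 8.12 itself is not reproduced here, no answer is written out for (a) and (b). As a starting point, the following is only a minimal matplotlib sketch of how a partition diagram like the one asked for in (b) could be drawn; the split points and region means in it are placeholders, not values read from the figure, and need to be replaced with the ones shown in the right-hand panel.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5, 5))
ax.set_xlim(0, 3)
ax.set_ylim(0, 3)
# placeholder splits: one split on X1, then a split on X2 within X1 < 1
ax.axvline(x=1, color='black')                 # X1 = 1 (placeholder)
ax.hlines(y=2, xmin=0, xmax=1, color='black')  # X2 = 2 for X1 < 1 (placeholder)
# placeholder region means of Y
ax.text(0.5, 1.0, '5', ha='center')
ax.text(0.5, 2.5, '10', ha='center')
ax.text(2.0, 1.5, '15', ha='center')
ax.set_xlabel('X1')
ax.set_ylabel('X2')
plt.show()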
Suppose we produce ten bootstrapped samples from a data set containing red and green classes. We then apply a classification tree to each bootstrapped sample and, for a specific value of X, produce 10 estimates of P(Class is Red|X): 0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, and 0.75.
There are two common ways to combine these results together into a single class prediction. One is the majority vote approach discussed in this chapter. The second approach is to classify based on the average probability. In this example, what is the final classification under each of these two approaches?
# majority vote: predict Red if more than half of the 10 trees predict Red
ps = np.array([0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75])
res = 'Red' if (((ps > 0.5).sum() / len(ps)) > 0.5) else 'Green'
print('majority vote: {}'.format(res))
# average probability: predict Red if the mean of the 10 probabilities exceeds 0.5
res = 'Red' if (ps.mean() > 0.5) else 'Green'
print('mean probability: {}'.format(res))
majority vote: Red
mean probability: Green
If we choose a small value for the minimum number of observations in the terminal nodes, (1) will yield a tree with low bias and high variance. To obtain a lower-variance model, one can use cost-complexity pruning to collapse nodes, accepting some increase in bias in the hope of a larger reduction in variance and thus better overall predictive performance. In practice, pruning a single tree is not the most common approach in ML: the preferred means of reducing variance is an ensemble method such as bagging, random forests, or an additive boosted tree model, although these techniques are not as interpretable as a single tree.
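As a rough illustration of this bias-variance point, here is a minimal sketch (not part of the original answer) comparing a deep tree, a cost-complexity-pruned tree, and a random forest by cross-validated MSE. The make_friedman1 data set, the 5-fold CV setup, and all hyperparameter values are arbitrary choices for the demonstration; the pruning level is selected by cross-validation over scikit-learn's ccp_alpha path.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
# synthetic regression data (arbitrary choice for illustration)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
def cv_mse(est):
    # 5-fold cross-validated mean squared error
    return -cross_val_score(est, X, y, cv=5,
                            scoring='neg_mean_squared_error').mean()
# (1) a deep tree: small terminal nodes, low bias, high variance
deep = DecisionTreeRegressor(min_samples_leaf=2, random_state=0)
# cost-complexity pruning: pick the alpha with the best CV error
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[::10]  # thin the path to keep the search cheap
best_alpha = min(alphas,
                 key=lambda a: cv_mse(DecisionTreeRegressor(ccp_alpha=a,
                                                            random_state=0)))
pruned = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0)
# ensemble alternative: averaging many trees reduces variance further
forest = RandomForestRegressor(n_estimators=200, random_state=0)
for name, est in [('deep tree', deep),
                  ('pruned tree', pruned),
                  ('random forest', forest)]:
    print('{:<13} CV MSE: {:.2f}'.format(name, cv_mse(est)))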