Supervised learning, in which the data comes with additional attributes that we want to predict. This problem can be either:
classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning, where there is a limited number of categories and, for each of the n samples provided, one tries to label it with the correct category or class.
regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.
MNIST dataset: a set of 70,000 small images of handwritten digits. You can read more at The MNIST Database.
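# Note: fetch_mldata was removed in scikit-learn 0.22 because mldata.org went offline.
# Newer versions can use fetch_openml('mnist_784', version=1) instead, but it returns
# string labels in a different order, so the outputs shown below would differ.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')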
mnist
{'COL_NAMES': ['label', 'data'], 'DESCR': 'mldata.org dataset: mnist-original', 'data': array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8), 'target': array([0., 0., 0., ..., 9., 9., 9.])}
len(mnist['data'])
70000
X, y = mnist['data'], mnist['target']
X
array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
y
array([0., 0., 0., ..., 9., 9., 9.])
X[69999]
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 89, 156, 231, 255, 163, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 165, 253, 253, 253, 254, 253, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 153, 224, 253, 253, 180, 174, 254, 253, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 70, 237, 253, 207, 71, 19, 2, 0, 254, 253, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 147, 253, 253, 177, 23, 0, 0, 0, 0, 254, 253, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 61, 217, 254, 254, 131, 0, 0, 0, 0, 0, 83, 255, 254, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 87, 229, 254, 251, 135, 3, 0, 0, 0, 44, 132, 244, 254, 253, 129, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 85, 247, 253, 235, 124, 0, 0, 0, 0, 112, 229, 253, 253, 254, 253, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 175, 253, 253, 120, 0, 0, 52, 212, 235, 250, 253, 253, 253, 254, 167, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 235, 253, 253, 240, 195, 195, 248, 253, 254, 253, 253, 253, 253, 231, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 254, 254, 254, 255, 254, 254, 222, 120, 38, 5, 156, 254, 254, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 136, 233, 241, 241, 225, 135, 25, 0, 0, 103, 253, 253, 207, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 30, 0, 0, 0, 0, 19, 196, 253, 240, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 112, 253, 253, 146, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 231, 253, 222, 12, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 158, 255, 254, 152, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 199, 254, 236, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 253, 254, 135, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 227, 253, 207, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 159, 253, 60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)
y[69999]
9.0
X.shape
(70000, 784)
y.shape
(70000,)
%matplotlib inline
import matplotlib.pyplot as plt
some_digit = X[1000]
some_digit_image = some_digit.reshape(28, 28)  # each row of X is a flattened 28x28 image
plt.imshow(some_digit_image, cmap='binary');
y[1000]
0.0
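num_split = 60000
# MNIST is already split: the first 60,000 images form the training set, the last 10,000 the test set
X_train, X_test, y_train, y_test = X[:num_split], X[num_split:], y[:num_split], y[num_split:]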
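Tip: we typically shuffle the training set so that every cross-validation fold gets a similar mix of digits. Here the data is sorted by label, so unshuffled folds would be badly unbalanced; some learning algorithms are also sensitive to the order of the training instances. However, shuffling is a bad idea for time-series data.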
import numpy as np
shuffle_index = np.random.permutation(num_split)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]
To simplify the problem, we turn it into a "zero" versus "non-zero" exercise, making it a two-class problem.
We first convert the targets to boolean labels: True for the digit zero, False for any other digit.
y_train_0 = (y_train == 0)
y_train_0
array([False, False, False, ..., False, False, False])
y_test_0 = (y_test == 0)
y_test_0
array([ True, True, True, ..., False, False, False])
At this point we can pick any classifier and train it. This is the iterative part: choosing and testing classifiers and tuning their hyperparameters.
from sklearn.linear_model import SGDClassifier
clf = SGDClassifier(random_state=0)
clf.fit(X_train, y_train_0)
C:\Anaconda3\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3. "and default tol will be 1e-3." % type(self), FutureWarning)
SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=None, n_iter=None, n_jobs=1, penalty='l2', power_t=0.5, random_state=0, shuffle=True, tol=None, verbose=0, warm_start=False)
clf.predict(X[1000].reshape(1, -1))
array([ True])
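Let's implement cross-validation by hand with StratifiedKFold.
StratifiedKFold performs stratified sampling to create multiple folds, each containing a representative ratio of each class. At each iteration, the code below clones the classifier, trains the clone on the training folds, and makes predictions on the test fold.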
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone
clf = SGDClassifier(random_state=0)
# No shuffling, so no random_state: it has no effect without shuffle=True,
# and newer scikit-learn versions raise an error if it is set anyway
skfolds = StratifiedKFold(n_splits=3)
for train_index, test_index in skfolds.split(X_train, y_train_0):
    clone_clf = clone(clf)  # fresh, unfitted copy for each fold
    X_train_fold = X_train[train_index]
    y_train_fold = y_train_0[train_index]
    X_test_fold = X_train[test_index]
    y_test_fold = y_train_0[test_index]
    clone_clf.fit(X_train_fold, y_train_fold)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print("{0:.4f}".format(n_correct / len(y_pred)))  # accuracy on this fold
0.9812
0.9880
0.9882
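To make "stratified" concrete, here is a simplified from-scratch sketch of how stratified folds can be built: each class is split into nearly equal chunks and one chunk is dealt to each test fold. The helper `stratified_fold_indices` and the toy labels are hypothetical, and scikit-learn's actual `StratifiedKFold` differs in details such as how leftover samples are distributed, so treat this as an illustration rather than a replacement.

```python
import numpy as np


def stratified_fold_indices(y, n_splits=3):
    """Return (train_idx, test_idx) pairs whose test folds keep the class ratio of y.

    Simplified sketch: no shuffling, and each class is split into n_splits
    nearly equal chunks, one chunk per test fold.
    """
    y = np.asarray(y)
    test_folds = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)  # positions of this class
        for k, chunk in enumerate(np.array_split(cls_idx, n_splits)):
            test_folds[k].extend(chunk.tolist())  # deal one chunk to each fold
    all_idx = np.arange(len(y))
    splits = []
    for fold in test_folds:
        test_idx = np.array(sorted(fold), dtype=int)
        train_idx = np.setdiff1d(all_idx, test_idx)  # everything not in the test fold
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical skewed labels: 10% positives, like "zero" vs "non-zero"
y_demo = np.array([True] * 30 + [False] * 270)
for train_idx, test_idx in stratified_fold_indices(y_demo, n_splits=3):
    # each test fold has 100 samples and keeps the 10% share of positives
    print(len(train_idx), len(test_idx), y_demo[test_idx].mean())
```

Every test fold keeps the same 10% share of positives as the full label array, which is what prevents a fold from being accidentally starved of zeros.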
cross_val_score using K-fold Cross-Validation
K-fold cross-validation splits the training set into K folds, then makes predictions and evaluates them on each fold using a model trained on the remaining folds.
from sklearn.model_selection import cross_val_score
cross_val_score(clf, X_train, y_train_0, cv=3, scoring='accuracy')
array([0.98120094, 0.988 , 0.98819941])
Let's compare this with a dumb classifier that always predicts "not zero":
1 - sum(y_train_0) / len(y_train_0)
0.9012833333333333
A simple check shows that 90.13% of the images are not zero, so a classifier that always guesses "not zero" is right 90.13% of the time.
Bear this in mind when you are dealing with skewed datasets: accuracy can look high even for a useless classifier, which is why it is generally not the preferred performance measure for classifiers.
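A minimal sketch of such a dumb classifier, assuming nothing beyond NumPy: it ignores its input and always answers "not zero". The class `NeverZeroClassifier` and the randomly generated labels are hypothetical; scikit-learn ships a comparable baseline as `sklearn.dummy.DummyClassifier(strategy='most_frequent')`.

```python
import numpy as np


class NeverZeroClassifier:
    """Hypothetical baseline that ignores its input and always predicts "not zero"."""

    def fit(self, X, y):
        return self  # nothing to learn

    def predict(self, X):
        return np.zeros(len(X), dtype=bool)  # False means "not zero"


# Hypothetical skewed labels: roughly 10% of the samples are zeros (True)
rng = np.random.default_rng(0)
y_demo = rng.random(10_000) < 0.10
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant to this baseline

y_pred_demo = NeverZeroClassifier().fit(X_demo, y_demo).predict(X_demo)
accuracy = np.mean(y_pred_demo == y_demo)
print(accuracy, 1 - y_demo.mean())  # the same value: accuracy = share of non-zeros
```

On labels that are about 90% negative, this baseline scores roughly 0.9 accuracy without learning anything.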
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(clf, X_train, y_train_0, cv=3)
from sklearn.metrics import confusion_matrix
confusion_matrix(y_train_0, y_train_pred)
array([[53578, 499], [ 353, 5570]], dtype=int64)
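Each row represents an actual class; each column represents a predicted class.
First row: non-zero images (the negative class). 53,578 were correctly classified as non-zero (true negatives) and 499 were wrongly classified as zero (false positives).
Second row: images of zeros (the positive class). 353 were wrongly classified as non-zero (false negatives) and 5,570 were correctly classified as zero (true positives).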
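The layout described above can be reproduced by counting the four cases directly. For boolean labels like `y_train_0`, the classes are ordered [False, True], so the negative class comes first. This is a sketch with a hypothetical helper `binary_confusion_matrix` and made-up labels, where True means "zero".

```python
import numpy as np


def binary_confusion_matrix(y_true, y_pred):
    """2x2 matrix [[TN, FP], [FN, TP]]: rows = actual class, columns = predicted class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tn = np.sum(~y_true & ~y_pred)  # non-zeros correctly classified as non-zero
    fp = np.sum(~y_true & y_pred)   # non-zeros wrongly classified as zero
    fn = np.sum(y_true & ~y_pred)   # zeros wrongly classified as non-zero
    tp = np.sum(y_true & y_pred)    # zeros correctly classified as zero
    return np.array([[tn, fp], [fn, tp]])


# Hypothetical labels (True = "zero")
y_true_demo = [True, True, False, False, False, True, False]
y_pred_demo = [True, False, False, True, False, True, False]
print(binary_confusion_matrix(y_true_demo, y_pred_demo))
```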
Precision measures the accuracy of the positive predictions: $\textrm{precision} = \frac{TP}{TP + FP}$, where TP is the number of true positives and FP the number of false positives.
from sklearn.metrics import precision_score, recall_score
precision_score(y_train_0, y_train_pred) # 5570 / (499 + 5570)
0.9177788762563849
5570 / (499 + 5570)
0.9177788762563849
Precision is typically used with recall, also called sensitivity or true positive rate: the ratio of positive instances that are correctly detected by the classifier, $\textrm{recall} = \frac{TP}{TP + FN}$, where FN is the number of false negatives.
recall_score(y_train_0, y_train_pred) # 5570 / (353 + 5570)
0.9404018234003039
5570 / (353 + 5570)
0.9404018234003039
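The $F_1$ score is the harmonic mean of precision and recall. Whereas the regular mean gives equal weight to all values, the harmonic mean gives much more weight to low values, so a classifier only gets a high $F_1$ score if both precision and recall are high.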
$$F_1=\frac{2}{\frac{1}{\textrm{precision}}+\frac{1}{\textrm{recall}}}=2\times \frac{\textrm{precision}\times \textrm{recall}}{\textrm{precision}+ \textrm{recall}}=\frac{TP}{TP+\frac{FN+FP}{2}}$$The $F_1$ score favours classifiers that have similar precision and recall.
from sklearn.metrics import f1_score
f1_score(y_train_0, y_train_pred)
0.92895263509006
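As a sanity check, the three expressions for $F_1$ above can be evaluated on the counts from the confusion matrix earlier in this section. They agree with each other and with the `precision_score`, `recall_score` and `f1_score` outputs. This is plain arithmetic on the copied counts, not a call into scikit-learn.

```python
# Counts from the SGD confusion matrix above, laid out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = 53578, 499, 353, 5570

precision = tp / (tp + fp)  # 5570 / 6069
recall = tp / (tp + fn)     # 5570 / 5923

# The three equivalent forms of the F1 formula
f1_harmonic = 2 / (1 / precision + 1 / recall)
f1_product = 2 * precision * recall / (precision + recall)
f1_counts = tp / (tp + (fn + fp) / 2)

print(f"precision={precision:.4f} recall={recall:.4f}")
print(f"F1: {f1_harmonic:.4f} {f1_product:.4f} {f1_counts:.4f}")
```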
Increasing precision reduces recall, and vice versa: this is the precision/recall tradeoff.
Our classifier is designed to pick up zeros.
Picture 12 observations ranked by decision score, 6 of which are zeros. Each arrow marks a candidate decision threshold, and every observation scoring above it is classified as a zero.
Central Arrow
Suppose the decision threshold is positioned at the central arrow. At this threshold, the precision is $\frac{4}{5}=80\%$.
However, out of the 6 zeros, the classifier only picked up 4, so the recall is $\frac{4}{6}=67\%$.
Right Arrow
At this threshold, the precision is $\frac{3}{3}=100\%$. However, out of the 6 zeros, the classifier only picked up 3, so the recall is $\frac{3}{6}=50\%$.
Left Arrow
At this threshold, the precision is $\frac{6}{8}=75\%$. Out of the 6 zeros, the classifier picked up all 6, so the recall is $\frac{6}{6}=100\%$.
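The arrow example can be reproduced in a few lines. The 12 scores and labels below are hypothetical, chosen only so that the counts match the three thresholds described above; an observation is classified as a zero when its score is above the threshold.

```python
import numpy as np

# Hypothetical scores for 12 observations, from highest to lowest, and whether
# each one really is a zero (6 zeros in total)
scores = np.arange(12, 0, -1)  # 12, 11, ..., 1
is_zero = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0], dtype=bool)


def precision_recall_at(threshold):
    """Classify everything scoring above `threshold` as a zero."""
    predicted_zero = scores > threshold
    tp = np.sum(predicted_zero & is_zero)
    return tp / np.sum(predicted_zero), tp / np.sum(is_zero)


# Thresholds playing the role of the right, central and left arrows
for name, threshold in [("right", 9), ("central", 7), ("left", 4)]:
    precision, recall = precision_recall_at(threshold)
    print(f"{name:>7}: precision={precision:.0%} recall={recall:.0%}")
```

Moving the threshold left (lower) raises recall and, here, lowers precision; moving it right does the opposite.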
clf = SGDClassifier(random_state=0)
clf.fit(X_train, y_train_0)
SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=None, n_iter=None, n_jobs=1, penalty='l2', power_t=0.5, random_state=0, shuffle=True, tol=None, verbose=0, warm_start=False)
y[1000]
0.0
y_scores = clf.decision_function(X[1000].reshape(1, -1))
y_scores
array([113837.93381089])
threshold = 0
y_some_digits_pred = (y_scores > threshold)
y_some_digits_pred
array([ True])
threshold = 40000
y_some_digits_pred = (y_scores > threshold)
y_some_digits_pred
array([ True])
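For a linear model such as `SGDClassifier`, `decision_function()` returns the signed score $w \cdot x + b$, and for a binary problem `predict()` returns True exactly when that score is above 0, which is why `threshold = 0` reproduces `clf.predict()`. The weights, bias and inputs below are hypothetical, used only to show how moving the threshold changes predictions.

```python
import numpy as np

# Hypothetical weights and bias of a linear classifier with 3 features
w = np.array([0.5, -1.0, 2.0])
b = -0.5
X_demo = np.array([[1.0, 0.0, 1.0],
                   [0.0, 2.0, 0.5]])

scores = X_demo @ w + b  # the signed score a linear model's decision_function returns
print(scores)

for threshold in (0, 1, 3):
    # threshold 0 mimics predict(); raising it turns more predictions to False
    print(threshold, scores > threshold)
```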
y_scores = cross_val_predict(clf, X_train, y_train_0, cv=3, method='decision_function')
plt.figure(figsize=(12,8)); plt.hist(y_scores, bins=100);
With the decision scores, we can compute precision and recall for all possible thresholds using the precision_recall_curve() function:
from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_train_0, y_scores)
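To see what `precision_recall_curve` computes, here is a simplified sketch that sorts the scores once and uses cumulative sums to get precision and recall at every distinct score, predicting positive when score >= threshold. The helper `pr_at_all_thresholds` and the toy data are hypothetical. Unlike the scikit-learn function, this sketch does not append the final point (precision 1, recall 0) that has no matching threshold, so its three arrays have the same length.

```python
import numpy as np


def pr_at_all_thresholds(y_true, scores):
    """Precision and recall when predicting positive for score >= t, for each
    distinct score t; thresholds are returned in increasing order."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    s, y = scores[order], y_true[order]
    tps = np.cumsum(y)   # true positives if we cut just after each position
    fps = np.cumsum(~y)  # false positives at the same cut
    # keep only the last position of each run of tied scores
    last_of_run = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tps, fps, thresholds = tps[last_of_run], fps[last_of_run], s[last_of_run]
    precisions = tps / (tps + fps)
    recalls = tps / y.sum()
    # reverse so thresholds increase, as in precision_recall_curve
    return precisions[::-1], recalls[::-1], thresholds[::-1]


y_demo = [False, True, False, True, True]
scores_demo = [0.1, 0.4, 0.35, 0.8, 0.6]
p, r, t = pr_at_all_thresholds(y_demo, scores_demo)
print(t)
print(p)
print(r)
```

Sorting once and accumulating counts keeps this O(n log n), which matters with 60,000 training scores; recomputing the counts separately for every threshold would be quadratic.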
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precisions and recalls have one more entry than thresholds, so drop the last one
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="Recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([-0.5, 1.5])
plt.figure(figsize=(12,8));
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
plt.show()
With this chart, you can select the threshold value that gives you the best precision/recall tradeoff for your task.
Some tasks call for higher precision (accuracy of positive predictions). For example, a classifier that decides which videos are safe for kids should set a high bar before letting any content through: it is better to reject many good videos than to let a single piece of adult content reach children.
Some tasks call for higher recall (ratio of positive instances that are correctly detected by the classifier), such as detecting shoplifters or intruders in surveillance images, where anything that remotely resembles a "positive" instance should be picked up.
One can also plot precision against recall to help with threshold selection:
plt.figure(figsize=(12,8));
plt.plot(recalls, precisions);
plt.xlabel('Recall');
plt.ylabel('Precision');
plt.title('PR Curve: precision/recall tradeoff');
Let's aim for 90% precision.
len(precisions)
59431
len(thresholds)
59430
plt.figure(figsize=(12,8));
plt.plot(thresholds, precisions[:-1]);  # precisions has one extra trailing entry with no threshold
idx = len(precisions[precisions < 0.9])
thresholds[idx]
-23405.08249946004
y_train_pred_90 = (y_scores > 21454)  # hand-picked threshold, not the thresholds[idx] value above
precision_score(y_train_0, y_train_pred_90)
0.9278991596638655
recall_score(y_train_0, y_train_pred_90)
0.9321289886881647
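Counting how many precision values fall below the target, as above, only finds the right index if precision increases steadily with the threshold, which is not guaranteed: precision can dip as the threshold rises. A more robust sketch takes the first threshold whose precision actually reaches the target, using `np.argmax` on a boolean mask. The helper `threshold_for_precision` and the non-monotonic toy values are hypothetical.

```python
import numpy as np


def threshold_for_precision(precisions, thresholds, target):
    """Lowest threshold whose precision reaches `target`.

    `precisions` has one more entry than `thresholds` (as returned by
    precision_recall_curve), so its last entry is ignored.
    """
    meets_target = precisions[:-1] >= target
    if not meets_target.any():
        raise ValueError("no threshold reaches the target precision")
    return thresholds[np.argmax(meets_target)]  # argmax returns the first True


# Hypothetical, non-monotonic precision values (note the dip at index 3)
precisions_demo = np.array([0.50, 0.70, 0.92, 0.88, 0.95, 1.00])
thresholds_demo = np.array([-3.0, -1.0, 0.5, 1.5, 2.0])

naive_idx = len(precisions_demo[precisions_demo < 0.9])  # counting approach used above
print(thresholds_demo[naive_idx], precisions_demo[naive_idx])  # lands on the dip: 0.88 < 0.9
print(threshold_for_precision(precisions_demo, thresholds_demo, 0.9))
```

Because recall can only fall as the threshold rises, the lowest threshold that meets the precision target is also the one that keeps the most recall.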
Let's aim for 99% precision.
idx = len(precisions[precisions < 0.99])
thresholds[idx]
290099.5550851834
y_train_pred_99 = (y_scores > thresholds[idx])
precision_score(y_train_0, y_train_pred_99)
0.9900105152471083
recall_score(y_train_0, y_train_pred_99)
0.6358264393044065
Instead of plotting precision versus recall, the ROC curve plots the true positive rate (another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to one minus the true negative rate (TNR), which is the ratio of negative instances that are correctly classified as negative. The TNR is also called specificity. Hence the ROC curve plots sensitivity (recall) versus 1 - specificity.
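Plugging the counts from the SGD confusion matrix earlier in this section into these definitions gives one point on the ROC curve, which should correspond to the default threshold of 0 used by `predict()`. This is plain arithmetic on the copied counts.

```python
# Counts from the SGD confusion matrix above, laid out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = 53578, 499, 353, 5570

tpr = tp / (tp + fn)  # true positive rate = recall = sensitivity
tnr = tn / (tn + fp)  # true negative rate = specificity
fpr = fp / (fp + tn)  # false positive rate = 1 - specificity

print(f"TPR={tpr:.4f} FPR={fpr:.4f} 1-TNR={1 - tnr:.4f}")
```

With so many negatives, even 499 false positives give an FPR below 1%, which is part of why the ROC curve hugs the top-left corner.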
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_train_0, y_scores)
def plot_roc_curve(fpr, tpr, label=None):
plt.plot(fpr, tpr, linewidth=2, label=label)
plt.plot([0,1], [0,1], 'k--')
plt.axis([0, 1, 0, 1])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.figure(figsize=(12,8));
plot_roc_curve(fpr, tpr)
plt.show();
from sklearn.metrics import roc_auc_score
roc_auc_score(y_train_0, y_scores)
0.9930522466368523
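The AUC has a useful interpretation: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from all positive/negative pairs. `auc_by_pairs` and the toy data are hypothetical, and because it builds a positives-by-negatives matrix it is only practical for small arrays; for MNIST-sized data, `roc_auc_score` is the way to go.

```python
import numpy as np


def auc_by_pairs(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive gets the higher score; ties count as half a pair."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true][:, None]   # column vector of positive scores
    neg = scores[~y_true][None, :]  # row vector of negative scores
    wins = np.sum(pos > neg) + 0.5 * np.sum(pos == neg)
    return wins / (pos.size * neg.size)


y_demo = [True, True, True, False, False, False]
scores_demo = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc_by_pairs(y_demo, scores_demo))  # 8 of the 9 pairs are ranked correctly
```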
Use the PR curve whenever the positive class is rare or when you care more about the false positives than the false negatives.
Use the ROC curve whenever the negative class is rare or when you care more about the false negatives than the false positives.
In the example above, the ROC curve (AUC above 0.99) suggests that the classifier is very good, but that is largely because there are few positives (zeros) compared with negatives (non-zeros). The PR curve, in contrast, shows that there is still room for improvement.
from sklearn.ensemble import RandomForestClassifier
f_clf = RandomForestClassifier(random_state=0)  # default n_estimators was 10 here; it became 100 in scikit-learn 0.22
y_probas_forest = cross_val_predict(f_clf, X_train, y_train_0,
cv=3, method='predict_proba')
y_scores_forest = y_probas_forest[:, 1]  # random forests have no decision_function(): use P(class True) as the score
fpr_forest, tpr_forest, threshold_forest = roc_curve(y_train_0, y_scores_forest)
plt.figure(figsize=(12,8));
plt.plot(fpr, tpr, "b:", label="SGD")
plot_roc_curve(fpr_forest, tpr_forest, "Random Forest")
plt.legend(loc="lower right")
plt.show();
roc_auc_score(y_train_0, y_scores_forest)
0.9967904848855614
f_clf.fit(X_train, y_train_0)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1, oob_score=False, random_state=0, verbose=0, warm_start=False)
y_train_rf = cross_val_predict(f_clf, X_train, y_train_0, cv=3)
precision_score(y_train_0, y_train_rf)
0.9918757898537642
recall_score(y_train_0, y_train_rf)
0.9275704879284147
confusion_matrix(y_train_0, y_train_rf)
array([[54032, 45], [ 429, 5494]], dtype=int64)
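Using the two confusion matrices from this section, the comparison can be summarised in one place. The counts are copied from the outputs above; `precision_recall_f1` is a hypothetical convenience function.

```python
def precision_recall_f1(tn, fp, fn, tp):
    """Precision, recall and F1 from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the two confusion matrices above: [[TN, FP], [FN, TP]]
results = {
    "SGD": precision_recall_f1(53578, 499, 353, 5570),
    "Random Forest": precision_recall_f1(54032, 45, 429, 5494),
}
for name, (precision, recall, f1) in results.items():
    print(f"{name:>13}: precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

With these counts, the random forest gives up a little recall for much higher precision and ends up with the higher $F_1$ score.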