# Cross-validation

## What is cross-validation?

• A robust way to evaluate predictive accuracy.
• Gives a mean and a standard deviation of the score, not just a single number.
• Makes good use of all the data: every sample serves in a test set at some point.
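Concretely, k-fold cross-validation partitions the sample indices into k disjoint test folds and trains on the remaining samples each time. A minimal pure-Python sketch of the splitting logic (the helper name `kfold_indices` is made up for illustration; scikit-learn's `KFold` does this for you):

```python
# A sketch of k-fold index splitting (illustrative; use sklearn's KFold in practice).
def kfold_indices(n_samples, n_folds):
    indices = list(range(n_samples))
    # distribute any remainder over the first folds
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

for train, test in kfold_indices(10, 5):
    print(test)  # each sample appears in exactly one test fold
```

Every index appears in exactly one test fold, so averaging the per-fold scores uses all the data.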
In [ ]:
from sklearn.model_selection import KFold
n_samples = 200
cv = KFold(n_splits=5)

In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Plot, for each fold, which samples end up in the training and test set
for training_set, test_set in cv.split(np.arange(n_samples)):
    plt.figure(figsize=(20, 1))
    plt.plot(training_set, np.ones(len(training_set)), "o", color='blue', label="training set")
    plt.plot(test_set, np.ones(len(test_set)), "o", color='red', label="test set")
    plt.legend(loc="best")
    plt.axis("off")


## Using cross-validation in scikit-learn

In [ ]:
from sklearn.model_selection import cross_val_score, train_test_split

In [ ]:
from sklearn.datasets import load_digits

In [ ]:
digits = load_digits()

In [ ]:
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)

In [ ]:
from sklearn.svm import SVC

In [ ]:
cross_val_score(SVC(C=1), X_train, y_train, cv=3)

In [ ]:
cross_val_score(SVC(C=10), X_train, y_train, cv=3, scoring="f1_macro")


Let's switch to a binary task for a moment (even vs. odd digits):

In [ ]:
cross_val_score(SVC(C=10), X_train, y_train % 2, cv=3)

In [ ]:
cross_val_score(SVC(C=10), X_train, y_train % 2, cv=3, scoring="average_precision")

In [ ]:
cross_val_score(SVC(C=10), X_train, y_train % 2, cv=3, scoring="roc_auc")


There are other ways to do cross-validation:

In [ ]:
from sklearn.model_selection import ShuffleSplit
cross_val_score(SVC(C=10), X_train, y_train, cv=ShuffleSplit(n_splits=10, test_size=0.4))

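Another option worth knowing (not shown above; added here as an aside): `StratifiedKFold`, which preserves the class proportions in every fold. This matters when the labels are imbalanced:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold

digits = load_digits()
skf = StratifiedKFold(n_splits=5)
# each test fold keeps roughly the same class distribution as the full dataset
for train_idx, test_idx in skf.split(digits.data, digits.target):
    print(np.bincount(digits.target[test_idx]))
```

For classification tasks, `cross_val_score` actually uses stratified folds by default when you pass an integer `cv`.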

## Tasks

1. Select a good gamma and C for SVC on digits using cross-validation.
2. Validate your findings on the test set.
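One possible approach to the tasks, sketched as a plain grid search with `cross_val_score` (the parameter grids are arbitrary choices; `GridSearchCV` would do the same with less code):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Task 1: pick gamma and C by cross-validation on the training set only
best_score, best_params = -np.inf, None
for C in [0.1, 1, 10, 100]:
    for gamma in [1e-4, 1e-3, 1e-2]:
        scores = cross_val_score(SVC(C=C, gamma=gamma), X_train, y_train, cv=3)
        if scores.mean() > best_score:
            best_score, best_params = scores.mean(), dict(C=C, gamma=gamma)

print("best CV accuracy: %.3f with %r" % (best_score, best_params))

# Task 2: refit on the full training set and validate once on the held-out test set
final = SVC(**best_params).fit(X_train, y_train)
print("test accuracy: %.3f" % final.score(X_test, y_test))
```

The test set is touched exactly once, after all model selection is done, so the final number is an honest estimate.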