This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
import numpy as np
import sklearn
import sklearn.datasets as ds
import sklearn.model_selection as cv  # sklearn.cross_validation was removed in scikit-learn 0.20
import sklearn.neighbors as nb
import matplotlib.pyplot as plt
%matplotlib inline
Let's load the digits dataset from the datasets module of scikit-learn. This dataset contains hand-written digits that have been manually labeled.
digits = ds.load_digits()
X = digits.data
y = digits.target
print((X.min(), X.max()))
print(X.shape)
In the matrix X, each row contains the $8 \times 8=64$ pixels (in grayscale, with values between 0 and 16). The pixels are stored in row-major order.
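Because the rows are flattened in row-major (C) order, reshaping a row back to an 8x8 array recovers the original image. A quick sanity check, assuming a standard scikit-learn installation:

```python
import numpy as np
import sklearn.datasets as ds

digits = ds.load_digits()
X = digits.data
# Each row of X holds 64 grayscale values; reshaping in row-major
# (the NumPy default) order recovers the original 8x8 image,
# which scikit-learn also exposes as digits.images.
img = X[0].reshape(8, 8)
assert np.array_equal(img, digits.images[0])
```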
nrows, ncols = 2, 5
plt.figure(figsize=(6,3));
plt.gray()
for i in range(ncols * nrows):
    ax = plt.subplot(nrows, ncols, i + 1)
    ax.matshow(digits.images[i,...])
    plt.xticks([]); plt.yticks([]);
    plt.title(digits.target[i]);
(X_train, X_test,
y_train, y_test) = cv.train_test_split(X, y, test_size=.25)
knc = nb.KNeighborsClassifier()
knc.fit(X_train, y_train);
knc.score(X_test, y_test)
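The score from a single train/test split depends on how the split fell; cross-validation averages over several splits and gives a more robust estimate. A minimal sketch, assuming a modern scikit-learn where cross_val_score lives in sklearn.model_selection:

```python
import sklearn.datasets as ds
import sklearn.neighbors as nb
from sklearn.model_selection import cross_val_score

digits = ds.load_digits()
knc = nb.KNeighborsClassifier()
# 5-fold cross-validation: train on 4/5 of the data, score on the
# remaining 1/5, rotating the held-out fold each time.
scores = cross_val_score(knc, digits.data, digits.target, cv=5)
print(scores.mean())
```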
# Let's draw a 1.
one = np.zeros((8, 8))
one[1:-1, 4] = 16 # The image values are in [0, 16].
one[2, 3] = 16
plt.figure(figsize=(2,2));
plt.imshow(one, interpolation='none');
plt.grid(False);
plt.xticks([]); plt.yticks([]);
plt.title("One");
knc.predict(one.reshape(1, -1))  # predict expects a 2D array of samples
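To see what drives this prediction, you can inspect the nearest training samples directly with the classifier's kneighbors method. A sketch, using a fixed random_state (an assumption, for reproducibility) on the same kind of split as above:

```python
import numpy as np
import sklearn.datasets as ds
import sklearn.neighbors as nb
from sklearn.model_selection import train_test_split

digits = ds.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=.25, random_state=0)
knc = nb.KNeighborsClassifier()
knc.fit(X_train, y_train)

# The same hand-drawn "1" as in the recipe.
one = np.zeros((8, 8))
one[1:-1, 4] = 16
one[2, 3] = 16

# kneighbors returns the distances to, and indices of, the k closest
# training samples; their labels show which digits the vote is based on.
dist, idx = knc.kneighbors(one.reshape(1, -1))
print(y_train[idx[0]])  # labels of the 5 nearest training digits
```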
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).