MNIST with SciKit-Learn and skorch

This notebook shows how to define and train a simple neural network with PyTorch and use it via skorch with SciKit-Learn.


Note: If you are running this in a Colab notebook, we recommend you enable a free GPU by going to:

Runtime   →   Change runtime type   →   Hardware Accelerator: GPU

If you are running in Colab, you should install the dependencies and download the dataset by running the following cell:

In [1]:
! [ ! -z "$COLAB_GPU" ] && pip install torch scikit-learn==0.20.* skorch
In [2]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

Loading Data

Using SciKit-Learn's fetch_openml to load the MNIST data.

In [3]:
mnist = fetch_openml('mnist_784', cache=False)
In [4]:
mnist.data.shape
Out[4]:
(70000, 784)

Preprocessing Data

Each image of the MNIST dataset is encoded in a 784 dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to the grey-value of a pixel.
The fetch_openml call above returns data and target as uint8, which we convert to float32 and int64 respectively.

In [5]:
X = mnist.data.astype('float32')
y = mnist.target.astype('int64')

To avoid large weights driven by raw pixel values in the range [0, 255], we scale X down. A commonly used range is [0, 1].

In [6]:
X /= 255.0
In [7]:
X.min(), X.max()
Out[7]:
(0.0, 1.0)

Note: the data is only scaled to [0, 1]; it is not normalized to zero mean and unit variance.
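If you do want standardized features (zero mean, unit variance per pixel), one option is SciKit-Learn's StandardScaler. A minimal sketch, not used in the rest of this notebook:

from sklearn.preprocessing import StandardScaler

# Optional: standardize each pixel column; the notebook continues with the
# [0, 1]-scaled data instead.
X_standardized = StandardScaler().fit_transform(X)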

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
In [9]:
assert X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0]
In [10]:
X_train.shape, y_train.shape
Out[10]:
((52500, 784), (52500,))
In [11]:
def plot_example(X, y):
    """Plot the first 5 images and their labels in a row."""
    for i, (img, y) in enumerate(zip(X[:5].reshape(5, 28, 28), y[:5])):
        plt.subplot(151 + i)
        plt.imshow(img)
        plt.xticks([])
        plt.yticks([])
        plt.title(y)
In [12]:
plot_example(X_train, y_train)

Build Neural Network with PyTorch

A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer has 98 neurons (= 784 / 8), and the output layer has 10 neurons, representing the digits 0 - 9.

In [13]:
import torch
from torch import nn
import torch.nn.functional as F
In [14]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
In [15]:
mnist_dim = X.shape[1]
hidden_dim = int(mnist_dim/8)
output_dim = len(np.unique(mnist.target))
In [16]:
mnist_dim, hidden_dim, output_dim
Out[16]:
(784, 98, 10)

A neural network in PyTorch's framework:

In [17]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            input_dim=mnist_dim,
            hidden_dim=hidden_dim,
            output_dim=output_dim,
            dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)

        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X

skorch allows us to use PyTorch networks in the SciKit-Learn setting:

In [18]:
from skorch import NeuralNetClassifier
In [19]:
torch.manual_seed(0)

net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device=device,
)
In [20]:
net.fit(X_train, y_train);
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.8299       0.8893        0.4037  0.8951
      2        0.4331       0.9113        0.3075  0.7862
      3        0.3619       0.9240        0.2614  0.8314
      4        0.3237       0.9305        0.2379  0.7739
      5        0.2914       0.9371        0.2173  0.7721
      6        0.2739       0.9413        0.1979  0.7949
      7        0.2569       0.9449        0.1859  0.7691
      8        0.2420       0.9461        0.1813  0.7871
      9        0.2337       0.9496        0.1708  0.7730
     10        0.2195       0.9532        0.1604  0.7992
     11        0.2151       0.9547        0.1514  0.7955
     12        0.2065       0.9560        0.1476  0.7734
     13        0.2015       0.9563        0.1455  0.8316
     14        0.1943       0.9587        0.1389  0.7913
     15        0.1883       0.9578        0.1381  0.8092
     16        0.1848       0.9596        0.1323  0.7907
     17        0.1838       0.9606        0.1312  0.7806
     18        0.1776       0.9623        0.1261  0.7657
     19        0.1738       0.9625        0.1250  0.7690
     20        0.1704       0.9627        0.1238  0.7627
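Since net follows the SciKit-Learn estimator API, it also works with tools such as GridSearchCV. A minimal sketch with an arbitrary, hypothetical parameter grid (not run here):

from sklearn.model_selection import GridSearchCV

# lr is passed to the optimizer; module__hidden_dim is routed to
# ClassifierModule via skorch's double-underscore notation.
params = {
    'lr': [0.05, 0.1],
    'module__hidden_dim': [49, 98],
}
gs = GridSearchCV(net, params, cv=3, scoring='accuracy')
# gs.fit(X_train, y_train)  # retrains the network once per parameter combination and fold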

Prediction

In [21]:
from sklearn.metrics import accuracy_score
In [22]:
y_pred = net.predict(X_test)
In [23]:
accuracy_score(y_test, y_pred)
Out[23]:
0.9627428571428571

An accuracy of about 96% for a network with only one hidden layer is not too bad.
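If you need class probabilities rather than hard labels, the classifier also exposes predict_proba. A quick sketch:

# Per-class probability estimates for each test example, shape (n_samples, 10).
y_proba = net.predict_proba(X_test)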

Let's take a look at some predictions that went wrong:

In [24]:
error_mask = y_pred != y_test
In [25]:
plot_example(X_test[error_mask], y_pred[error_mask])

Convolutional Network

PyTorch expects a 4 dimensional tensor as input for its 2D convolution layer. The dimensions represent:

  • Batch size
  • Number of channels
  • Height
  • Width

The first dimension holds the batch of examples. MNIST data has only one channel. As stated above, each MNIST vector represents a 28x28 pixel image. Hence, the resulting shape for the PyTorch tensor needs to be (x, 1, 28, 28).

In [26]:
XCnn = X.reshape(-1, 1, 28, 28)
In [27]:
XCnn.shape
Out[27]:
(70000, 1, 28, 28)
In [28]:
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)
In [29]:
XCnn_train.shape, y_train.shape
Out[29]:
((52500, 1, 28, 28), (52500,))
In [30]:
class Cnn(nn.Module):
    def __init__(self, dropout=0.5):
        super(Cnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d(p=dropout)
        self.fc1 = nn.Linear(1600, 100) # 1600 = number of channels * width * height (64 * 5 * 5)
        self.fc2 = nn.Linear(100, 10)
        self.fc1_drop = nn.Dropout(p=dropout)

    def forward(self, x):
        x = torch.relu(F.max_pool2d(self.conv1(x), 2))
        x = torch.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        
        # flatten over channel, height and width = 1600
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))
        
        x = torch.relu(self.fc1_drop(self.fc1(x)))
        x = torch.softmax(self.fc2(x), dim=-1)
        return x
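The 1600 input features of fc1 follow from the feature map sizes: 28x28 becomes 26x26 after conv1 (3x3 kernel, no padding), 13x13 after pooling, 11x11 after conv2, and 5x5 after the second pooling, so 64 * 5 * 5 = 1600. A quick sanity check with a dummy batch:

# Trace the shapes through the convolution/pooling blocks of an untrained Cnn.
cnn_module = Cnn()
out = F.max_pool2d(cnn_module.conv1(torch.randn(1, 1, 28, 28)), 2)  # (1, 32, 13, 13)
out = F.max_pool2d(cnn_module.conv2(out), 2)                        # (1, 64, 5, 5)
out.shape  # 64 * 5 * 5 = 1600 features after flattening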
In [31]:
torch.manual_seed(0)

cnn = NeuralNetClassifier(
    Cnn,
    max_epochs=10,
    lr=0.002,
    optimizer=torch.optim.Adam,
    device=device,
)
In [32]:
cnn.fit(XCnn_train, y_train);
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.4298       0.9729        0.0898  4.9916
      2        0.1577       0.9799        0.0646  4.8633
      3        0.1261       0.9824        0.0564  4.9402
      4        0.1120       0.9848        0.0507  4.8320
      5        0.1006       0.9855        0.0446  4.8227
      6        0.0924       0.9862        0.0415  4.8191
      7        0.0844       0.9886        0.0375  4.8168
      8        0.0828       0.9854        0.0414  4.8333
      9        0.0779       0.9885        0.0368  4.8199
     10        0.0768       0.9891        0.0350  4.8129
In [33]:
y_pred_cnn = cnn.predict(XCnn_test)
In [34]:
accuracy_score(y_test, y_pred_cnn)
Out[34]:
0.9889714285714286

An accuracy of >98% should suffice for this example!

Let's see how we fare on the examples that went wrong before:

In [35]:
accuracy_score(y_test[error_mask], y_pred_cnn[error_mask])
Out[35]:
0.7745398773006135

About 77% of the previously misclassified images are now correctly identified.

In [37]:
plot_example(X_test[error_mask], y_pred_cnn[error_mask])