This notebook shows how to define and train a simple neural network with PyTorch and use it, via skorch, with scikit-learn.
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
import numpy as np
Using scikit-learn's fetch_mldata to load the MNIST data.
mnist = fetch_mldata('MNIST original', data_home='../datasets/')
mnist
{'DESCR': 'mldata.org dataset: mnist-original', 'COL_NAMES': ['label', 'data'], 'target': array([0., 0., 0., ..., 9., 9., 9.]), 'data': array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)}
mnist.data.shape
(70000, 784)
Each image of the MNIST dataset is encoded in a 784 dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to the grey-value of a pixel.
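To make this concrete, a single row can be reshaped back into a 28 x 28 image and displayed, for instance with matplotlib (a minimal sketch; it assumes matplotlib is installed and is not needed for the rest of the notebook):
import matplotlib.pyplot as plt

# Reshape the first 784-dimensional row back into its 28 x 28 pixel grid
first_image = mnist.data[0].reshape(28, 28)
plt.imshow(first_image, cmap='gray')
plt.title('label: {:.0f}'.format(mnist.target[0]))
plt.show()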
The fetch_mldata call above returns data and target as uint8, which we convert to float32 and int64, respectively.
X = mnist.data.astype('float32')
y = mnist.target.astype('int64')
As we will use ReLU as the activation in combination with softmax over the output layer, we need to scale X down. An often used range is [0, 1].
X /= 255.0
X.min(), X.max()
(0.0, 1.0)
Note: the data is only scaled to [0, 1]; it is not normalized to zero mean and unit variance.
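If zero mean and unit variance per feature were desired, scikit-learn's StandardScaler could be used instead of the simple division above (a sketch only; the rest of this notebook keeps the [0, 1] scaling):
from sklearn.preprocessing import StandardScaler

# Standardize each pixel column to zero mean and unit variance;
# cast back to float32 since StandardScaler returns float64
X_std = StandardScaler().fit_transform(X).astype('float32')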
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
assert X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0]
X_train.shape, y_train.shape
((52500, 784), (52500,))
A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer 98 (= 784 / 8), and the output layer 10 neurons, representing the digits 0-9.
import torch
from torch import nn
import torch.nn.functional as F
torch.manual_seed(0);
device = 'cuda' if torch.cuda.is_available() else 'cpu'
mnist_dim = X.shape[1]
hidden_dim = int(mnist_dim/8)
output_dim = len(np.unique(mnist.target))
mnist_dim, hidden_dim, output_dim
(784, 98, 10)
A neural network, implemented as a PyTorch nn.Module.
class ClassifierModule(nn.Module):
    def __init__(
        self,
        input_dim=mnist_dim,
        hidden_dim=hidden_dim,
        output_dim=output_dim,
        dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
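As a quick sanity check, the module can be called directly on a dummy batch; because of the final softmax, each output row should sum to 1 (a small sketch, independent of skorch):
module = ClassifierModule()
dummy_batch = torch.zeros(4, mnist_dim)  # a fake batch of 4 flattened images
probs = module(dummy_batch)
probs.shape, probs.sum(dim=-1)  # torch.Size([4, 10]), each row summing to 1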
skorch allows us to use PyTorch networks in the scikit-learn setting.
from skorch import NeuralNetClassifier
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device=device,
)
net.fit(X_train, y_train);
epoch    train_loss    valid_acc    valid_loss    dur
-------  ------------  -----------  ------------  ------
1        0.8284        0.9010       0.3771        5.4626
2        0.4376        0.9199       0.2879        5.6434
3        0.3664        0.9308       0.2458        5.1058
4        0.3239        0.9385       0.2150        4.9935
5        0.2971        0.9448       0.1947        6.2501
6        0.2755        0.9474       0.1823        5.3603
7        0.2643        0.9514       0.1712        5.3241
8        0.2443        0.9541       0.1585        5.5232
9        0.2346        0.9557       0.1500        4.8883
10       0.2257        0.9577       0.1447        4.6561
11       0.2165        0.9594       0.1394        6.2466
12       0.2093        0.9600       0.1338        5.2418
13       0.2045        0.9610       0.1297        5.6362
14       0.1969        0.9620       0.1263        5.5742
15       0.1931        0.9629       0.1223        5.1409
16       0.1893        0.9647       0.1191        6.2617
17       0.1849        0.9651       0.1185        6.5456
18       0.1803        0.9657       0.1155        6.8025
19       0.1765        0.9665       0.1136        5.0039
20       0.1721        0.9667       0.1103        5.0675
predicted = net.predict(X_test)
np.mean(predicted == y_test)
0.9653142857142857
An accuracy of about 96.5% for a network with only one hidden layer is not too bad.
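Since the skorch net is a regular scikit-learn estimator, the usual model selection tools work with it out of the box. For example, a small grid search over the learning rate and number of epochs could look like this (a sketch; the fit call is commented out because it retrains the network for every parameter combination):
from sklearn.model_selection import GridSearchCV

# lr and max_epochs are regular skorch parameters, so GridSearchCV can tune them
params = {'lr': [0.05, 0.1], 'max_epochs': [10, 20]}
gs = GridSearchCV(net, params, cv=3, scoring='accuracy')
# gs.fit(X_train, y_train)
# gs.best_params_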
A convolutional network should improve on this. PyTorch expects a four-dimensional tensor as input to its 2D convolution layers, with the dimensions representing (batch size, channels, height, width). The batch dimension holds the number of examples, and MNIST images have a single channel. As stated above, each MNIST vector represents a 28x28 pixel image, so the resulting PyTorch tensor needs the shape (x, 1, 28, 28).
XCnn = X.reshape(-1, 1, 28, 28)
XCnn.shape
(70000, 1, 28, 28)
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)
XCnn_train.shape, y_train.shape
((52500, 1, 28, 28), (52500,))
class Cnn(nn.Module):
    def __init__(self):
        super(Cnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(1600, 128)  # 1600 = channels * height * width = 64 * 5 * 5
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))  # flatten over channels, height and width = 1600
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        x = F.softmax(x, dim=-1)
        return x
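The 1600 input features of fc1 follow from the shapes: each 28x28 image shrinks to 26x26 after the first 3x3 convolution, to 13x13 after pooling, to 11x11 after the second convolution, and to 5x5 after the second pooling, leaving 64 * 5 * 5 = 1600 values per example. A dummy forward pass confirms this (a small sketch; a wrong fc1 size would raise a shape error here):
cnn_module = Cnn()
dummy_images = torch.zeros(2, 1, 28, 28)  # a fake batch of 2 single-channel images
cnn_module(dummy_images).shape  # torch.Size([2, 10])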
cnn = NeuralNetClassifier(
    Cnn,
    max_epochs=15,
    lr=1,
    optimizer=torch.optim.Adadelta,
    device=device,
)
cnn.fit(XCnn_train, y_train);
epoch    train_loss    valid_acc    valid_loss    dur
-------  ------------  -----------  ------------  ------
1        0.4692        0.9730       0.0873        7.4336
2        0.1503        0.9818       0.0601        6.6657
3        0.1177        0.9834       0.0525        6.5910
4        0.1037        0.9846       0.0476        7.9510
5        0.0889        0.9847       0.0446        6.5556
6        0.0808        0.9873       0.0407        6.4084
7        0.0724        0.9878       0.0384        6.1549
8        0.0680        0.9875       0.0379        5.7811
9        0.0646        0.9885       0.0376        6.2944
10       0.0582        0.9883       0.0370        5.6687
11       0.0578        0.9879       0.0350        5.7188
12       0.0542        0.9879       0.0380        6.4705
13       0.0523        0.9904       0.0326        6.1535
14       0.0493        0.9884       0.0343        6.5948
15       0.0498        0.9900       0.0316        5.5662
cnn_pred = cnn.predict(XCnn_test)
np.mean(cnn_pred == y_test)
0.9912571428571428
An accuracy of 99.1% should suffice for this example!
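Because the predictions are plain NumPy arrays, any scikit-learn metric can be applied directly, for instance a per-digit breakdown (a sketch using sklearn.metrics):
from sklearn.metrics import classification_report

# Precision, recall and F1 score for each of the ten digits
print(classification_report(y_test, cnn_pred))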