This notebook shows how to define and train a simple neural network with PyTorch and use it, via skorch, with scikit-learn.
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
import numpy as np
Using scikit-learn's fetch_mldata to load the MNIST data.
mnist = fetch_mldata('MNIST original', data_home='../datasets/')
mnist
{'DESCR': 'mldata.org dataset: mnist-original', 'COL_NAMES': ['label', 'data'], 'target': array([0., 0., 0., ..., 9., 9., 9.]), 'data': array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)}
mnist.data.shape
(70000, 784)
Each image of the MNIST dataset is encoded in a 784 dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to the grey-value of a pixel.
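To make this concrete, a single row can be reshaped back into a 28 x 28 image and displayed, for instance with matplotlib (a minimal sketch; it assumes matplotlib is installed and is not needed for the rest of the notebook):
import matplotlib.pyplot as plt

# Reshape the first 784-dimensional row back into its 28 x 28 pixel grid
first_image = mnist.data[0].reshape(28, 28)
plt.imshow(first_image, cmap='gray')
plt.title('label: {:.0f}'.format(mnist.target[0]))
plt.show()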
The fetch_mldata call above returns data and target as uint8, which we convert to float32 and int64, respectively.
X = mnist.data.astype('float32')
y = mnist.target.astype('int64')
As we will use ReLU as the activation in combination with softmax over the output layer, we need to scale X down. An often used range is [0, 1].
X /= 255.0
X.min(), X.max()
(0.0, 1.0)
Note: the data is only scaled to [0, 1]; it is not normalized to zero mean and unit variance.
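If zero mean and unit variance per feature were desired, scikit-learn's StandardScaler could be used instead of the simple division above (a sketch only; the rest of this notebook keeps the [0, 1] scaling):
from sklearn.preprocessing import StandardScaler

# Standardize each pixel column to zero mean and unit variance;
# cast back to float32 since StandardScaler returns float64
X_std = StandardScaler().fit_transform(X).astype('float32')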
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
assert X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0]
X_train.shape, y_train.shape
((52500, 784), (52500,))
A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer 98 (= 784 / 8), and the output layer 10 neurons, representing the digits 0-9.
import torch
from torch import nn
import torch.nn.functional as F
torch.manual_seed(0);
device = 'cuda' if torch.cuda.is_available() else 'cpu'
mnist_dim = X.shape[1]
hidden_dim = int(mnist_dim/8)
output_dim = len(np.unique(mnist.target))
mnist_dim, hidden_dim, output_dim
(784, 98, 10)
A neural network, implemented as a PyTorch nn.Module.
class ClassifierModule(nn.Module):
    def __init__(
        self,
        input_dim=mnist_dim,
        hidden_dim=hidden_dim,
        output_dim=output_dim,
        dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
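As a quick sanity check, the module can be called directly on a dummy batch; because of the final softmax, each output row should sum to 1 (a small sketch, independent of skorch):
module = ClassifierModule()
dummy_batch = torch.zeros(4, mnist_dim)  # a fake batch of 4 flattened images
probs = module(dummy_batch)
probs.shape, probs.sum(dim=-1)  # torch.Size([4, 10]), each row summing to 1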
skorch allows us to use PyTorch networks in the scikit-learn setting.
from skorch import NeuralNetClassifier
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device=device,
)
net.fit(X_train, y_train);
epoch    train_loss    valid_acc    valid_loss    dur
-------  ------------  -----------  ------------  ------
1        0.8284        0.9010       0.3771        5.4626
2        0.4376        0.9199       0.2879        5.6434
3        0.3664        0.9308       0.2458        5.1058
4        0.3239        0.9385       0.2150        4.9935
5        0.2971        0.9448       0.1947        6.2501
6        0.2755        0.9474       0.1823        5.3603
7        0.2643        0.9514       0.1712        5.3241
8        0.2443        0.9541       0.1585        5.5232
9        0.2346        0.9557       0.1500        4.8883
10       0.2257        0.9577       0.1447        4.6561
11       0.2165        0.9594       0.1394        6.2466
12       0.2093        0.9600       0.1338        5.2418
13       0.2045        0.9610       0.1297        5.6362
14       0.1969        0.9620       0.1263        5.5742
15       0.1931        0.9629       0.1223        5.1409
16       0.1893        0.9647       0.1191        6.2617
17       0.1849        0.9651       0.1185        6.5456
18       0.1803        0.9657       0.1155        6.8025
19       0.1765        0.9665       0.1136        5.0039
20       0.1721        0.9667       0.1103        5.0675
predicted = net.predict(X_test)
np.mean(predicted == y_test)
0.9653142857142857
An accuracy of about 96.5% for a network with only one hidden layer is not too bad.
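Since the skorch net is a regular scikit-learn estimator, the usual model selection tools work with it out of the box. For example, a small grid search over the learning rate and number of epochs could look like this (a sketch; the fit call is commented out because it retrains the network for every parameter combination):
from sklearn.model_selection import GridSearchCV

# lr and max_epochs are regular skorch parameters, so GridSearchCV can tune them
params = {'lr': [0.05, 0.1], 'max_epochs': [10, 20]}
gs = GridSearchCV(net, params, cv=3, scoring='accuracy')
# gs.fit(X_train, y_train)
# gs.best_params_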
A convolutional network should improve on this. PyTorch expects a four-dimensional tensor as input to its 2D convolution layers, with the dimensions representing (batch size, channels, height, width). The batch dimension holds the number of examples, and MNIST images have a single channel. As stated above, each MNIST vector represents a 28x28 pixel image, so the resulting PyTorch tensor needs the shape (x, 1, 28, 28).
XCnn = X.reshape(-1, 1, 28, 28)
XCnn.shape
(70000, 1, 28, 28)
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)
XCnn_train.shape, y_train.shape
((52500, 1, 28, 28), (52500,))
class Cnn(nn.Module):
    def __init__(self):
        super(Cnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(1600, 128)  # 1600 = channels * height * width = 64 * 5 * 5
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))  # flatten over channels, height and width = 1600
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        x = F.softmax(x, dim=-1)
        return x
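The 1600 input features of fc1 follow from the shapes: each 28x28 image shrinks to 26x26 after the first 3x3 convolution, to 13x13 after pooling, to 11x11 after the second convolution, and to 5x5 after the second pooling, leaving 64 * 5 * 5 = 1600 values per example. A dummy forward pass confirms this (a small sketch; a wrong fc1 size would raise a shape error here):
cnn_module = Cnn()
dummy_images = torch.zeros(2, 1, 28, 28)  # a fake batch of 2 single-channel images
cnn_module(dummy_images).shape  # torch.Size([2, 10])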
cnn = NeuralNetClassifier(
    Cnn,
    max_epochs=15,
    lr=1,
    optimizer=torch.optim.Adadelta,
    device=device,
)
cnn.fit(XCnn_train, y_train);
epoch    train_loss    valid_acc    valid_loss    dur
-------  ------------  -----------  ------------  ------
1        0.4692        0.9730       0.0873        7.4336
2        0.1503        0.9818       0.0601        6.6657
3        0.1177        0.9834       0.0525        6.5910
4        0.1037        0.9846       0.0476        7.9510
5        0.0889        0.9847       0.0446        6.5556
6        0.0808        0.9873       0.0407        6.4084
7        0.0724        0.9878       0.0384        6.1549
8        0.0680        0.9875       0.0379        5.7811
9        0.0646        0.9885       0.0376        6.2944
10       0.0582        0.9883       0.0370        5.6687
11       0.0578        0.9879       0.0350        5.7188
12       0.0542        0.9879       0.0380        6.4705
13       0.0523        0.9904       0.0326        6.1535
14       0.0493        0.9884       0.0343        6.5948
15       0.0498        0.9900       0.0316        5.5662
cnn_pred = cnn.predict(XCnn_test)
np.mean(cnn_pred == y_test)
0.9912571428571428
An accuracy of 99.1% should suffice for this example!
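Because the predictions are plain NumPy arrays, any scikit-learn metric can be applied directly, for instance a per-digit breakdown (a sketch using sklearn.metrics):
from sklearn.metrics import classification_report

# Precision, recall and F1 score for each of the ten digits
print(classification_report(y_test, cnn_pred))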