This notebook shows how to define and train a simple neural network with PyTorch and use it via skorch with scikit-learn.
Note: If you are running this in a Colab notebook, we recommend you enable a free GPU by going to:
Runtime → Change runtime type → Hardware Accelerator: GPU
If you are running in Colab, you should install the dependencies by running the following cell (the dataset itself is downloaded later via fetch_openml):
import subprocess

# Installation on Google Colab
try:
    import google.colab
    subprocess.run(['python', '-m', 'pip', 'install', 'skorch', 'torch'])
except ImportError:
    pass
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
Using scikit-learn's fetch_openml to load the MNIST data.
mnist = fetch_openml('mnist_784', as_frame=False, cache=False)
mnist.data.shape
(70000, 784)
Each image in the MNIST dataset is encoded as a 784-dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to its grey value.
The fetch_openml call above returns data and target as uint8, which we convert to float32 and int64 respectively.
X = mnist.data.astype('float32')
y = mnist.target.astype('int64')
To avoid large weights caused by pixel values in the range [0, 255], we scale X down. A commonly used range is [0, 1].
X /= 255.0
X.min(), X.max()
(0.0, 1.0)
Note: the data is only scaled to [0, 1], not normalized to zero mean and unit variance.
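For illustration, here is a small sketch (on synthetic uint8 data, not the MNIST arrays) contrasting the simple scaling used above with standardization to zero mean and unit variance:

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(100, 784)).astype('float32')

# Min-max scaling to [0, 1], as done above with X /= 255.0
scaled = pixels / 255.0
print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True

# Standardization (zero mean, unit variance) would look like this instead;
# this tutorial sticks with the simpler scaling.
standardized = (pixels - pixels.mean()) / pixels.std()
print(float(standardized.mean()), float(standardized.std()))
```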
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
assert X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0]
X_train.shape, y_train.shape
((52500, 784), (52500,))
def plot_example(X, y):
    """Plot the first 5 images and their labels in a row."""
    for i, (img, label) in enumerate(zip(X[:5].reshape(5, 28, 28), y[:5])):
        plt.subplot(151 + i)
        plt.imshow(img)
        plt.xticks([])
        plt.yticks([])
        plt.title(label)
plot_example(X_train, y_train)
A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28 x 28), the hidden layer 98 (= 784 / 8), and the output layer 10 neurons, representing the digits 0 - 9.
import torch
from torch import nn
import torch.nn.functional as F
device = 'cuda' if torch.cuda.is_available() else 'cpu'
mnist_dim = X.shape[1]
hidden_dim = int(mnist_dim/8)
output_dim = len(np.unique(mnist.target))
mnist_dim, hidden_dim, output_dim
(784, 98, 10)
A neural network defined as a PyTorch nn.Module:
class ClassifierModule(nn.Module):
    def __init__(
        self,
        input_dim=mnist_dim,
        hidden_dim=hidden_dim,
        output_dim=output_dim,
        dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
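As a quick sanity check, the same layer stack can be sketched with nn.Sequential and probed with a random batch; the sizes below just mirror the module above (784 → 98 → 10), and the check itself is not part of the tutorial:

```python
import torch
from torch import nn

# Same architecture as ClassifierModule above, built with nn.Sequential
model = nn.Sequential(
    nn.Linear(784, 98),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(98, 10),
    nn.Softmax(dim=-1),
)
model.eval()  # disable dropout for the check

x = torch.randn(5, 784)  # a batch of 5 fake flattened images
out = model(x)
print(out.shape)  # torch.Size([5, 10])
print(torch.allclose(out.sum(dim=-1), torch.ones(5)))  # True: softmax rows sum to 1
```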
skorch allows you to use PyTorch networks in the scikit-learn setting:
from skorch import NeuralNetClassifier
torch.manual_seed(0)
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device=device,
)
net.fit(X_train, y_train);
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.8387       0.8800        0.4174  3.8169
      2        0.4332       0.9103        0.3133  0.8510
      3        0.3612       0.9233        0.2684  0.8208
      4        0.3233       0.9309        0.2317  0.8079
      5        0.2938       0.9353        0.2173  0.8074
      6        0.2738       0.9390        0.2039  0.8277
      7        0.2600       0.9454        0.1868  0.8224
      8        0.2427       0.9484        0.1757  0.8623
      9        0.2362       0.9503        0.1683  0.8312
     10        0.2226       0.9512        0.1621  0.8221
     11        0.2184       0.9529        0.1565  0.8158
     12        0.2090       0.9541        0.1508  0.7974
     13        0.2067       0.9570        0.1446  0.8123
     14        0.1978       0.9570        0.1412  0.8304
     15        0.1923       0.9582        0.1392  0.8421
     16        0.1889       0.9582        0.1342  0.8153
     17        0.1855       0.9612        0.1297  0.8458
     18        0.1786       0.9613        0.1266  0.8827
     19        0.1728       0.9615        0.1250  0.8335
     20        0.1698       0.9613        0.1248  0.8112
from sklearn.metrics import accuracy_score
y_pred = net.predict(X_test)
accuracy_score(y_test, y_pred)
0.9631428571428572
An accuracy of about 96% for a network with only one hidden layer is not too bad.
Let's take a look at some predictions that went wrong:
error_mask = y_pred != y_test
plot_example(X_test[error_mask], y_pred[error_mask])
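Beyond eyeballing individual errors, a confusion matrix summarizes which digits get mixed up with which. Here is a sketch on toy labels (on the real data the call would be confusion_matrix(y_test, y_pred)):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels standing in for y_test / y_pred
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hat = np.array([0, 2, 2, 2, 1, 0])

# Row i, column j counts examples of class i predicted as class j;
# the diagonal holds the correct predictions.
cm = confusion_matrix(y_true, y_hat)
print(cm)
```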
PyTorch expects a 4-dimensional tensor as input for its 2D convolution layers. The dimensions represent: batch size, number of channels, image height, and image width. The first dimension is the number of examples in the batch. MNIST data has only one channel. As stated above, each MNIST vector represents a 28 x 28 pixel image. Hence, the resulting shape of the PyTorch tensor needs to be (x, 1, 28, 28).
XCnn = X.reshape(-1, 1, 28, 28)
XCnn.shape
(70000, 1, 28, 28)
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)
XCnn_train.shape, y_train.shape
((52500, 1, 28, 28), (52500,))
class Cnn(nn.Module):
    def __init__(self, dropout=0.5):
        super(Cnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d(p=dropout)
        self.fc1 = nn.Linear(1600, 100)  # 1600 = number of channels * width * height
        self.fc2 = nn.Linear(100, 10)
        self.fc1_drop = nn.Dropout(p=dropout)

    def forward(self, x):
        x = torch.relu(F.max_pool2d(self.conv1(x), 2))
        x = torch.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))

        # flatten over channel, height and width = 1600
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))

        x = torch.relu(self.fc1_drop(self.fc1(x)))
        x = torch.softmax(self.fc2(x), dim=-1)
        return x
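The 1600 in fc1 follows from the convolution and pooling arithmetic: each 3x3 convolution shrinks the image by 2 pixels per side, and each 2x2 max-pool halves it, rounding down. This standalone sketch traces the shapes with a dummy input:

```python
import torch
from torch import nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 32, kernel_size=3)
conv2 = nn.Conv2d(32, 64, kernel_size=3)

x = torch.zeros(1, 1, 28, 28)  # one dummy MNIST image
x = F.max_pool2d(conv1(x), 2)  # 28 -> 26 (conv) -> 13 (pool)
print(x.shape)                 # torch.Size([1, 32, 13, 13])
x = F.max_pool2d(conv2(x), 2)  # 13 -> 11 (conv) -> 5 (pool)
print(x.shape)                 # torch.Size([1, 64, 5, 5])
print(64 * 5 * 5)              # 1600, the input size of fc1
```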
torch.manual_seed(0)
cnn = NeuralNetClassifier(
    Cnn,
    max_epochs=10,
    lr=0.002,
    optimizer=torch.optim.Adam,
    device=device,
)
cnn.fit(XCnn_train, y_train);
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.4319       0.9721        0.0891  5.8088
      2        0.1628       0.9794        0.0641  2.1617
      3        0.1349       0.9815        0.0568  1.8369
      4        0.1153       0.9844        0.0507  1.4844
      5        0.1006       0.9863        0.0441  1.4542
      6        0.0962       0.9881        0.0397  1.4394
      7        0.0861       0.9872        0.0423  1.4464
      8        0.0853       0.9863        0.0410  1.4599
      9        0.0805       0.9880        0.0384  1.4535
     10        0.0753       0.9888        0.0392  1.4857
y_pred_cnn = cnn.predict(XCnn_test)
accuracy_score(y_test, y_pred_cnn)
0.9883428571428572
An accuracy of >98% should suffice for this example!
Let's see how we fare on the examples that went wrong before:
accuracy_score(y_test[error_mask], y_pred_cnn[error_mask])
0.7705426356589147
About 77% of the previously misclassified images are now correctly identified.
plot_example(X_test[error_mask], y_pred_cnn[error_mask])