skorch is designed to maximize interoperability between sklearn and pytorch. The aim is to keep 99% of the flexibility of pytorch while being able to leverage most features of sklearn. Below, we show the basic usage of skorch and how it can be combined with sklearn.
This notebook shows you how to use the basic functionality of skorch.
import subprocess

# Installation on Google Colab
try:
    import google.colab
    subprocess.run(['python', '-m', 'pip', 'install', 'skorch', 'torch'])
except ImportError:
    pass
import torch
from torch import nn
import torch.nn.functional as F
torch.manual_seed(0)
torch.cuda.manual_seed(0)
We load a toy classification task from sklearn.
import numpy as np
from sklearn.datasets import make_classification
# This is a toy dataset for binary classification, 1000 data points with 20 features each
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X, y = X.astype(np.float32), y.astype(np.int64)
X.shape, y.shape, y.mean()
((1000, 20), (1000,), 0.5)
pytorch classification module

We define a vanilla neural network with two hidden layers. The output layer should have 2 output units since there are two classes. In addition, it should have a softmax nonlinearity, because later, when calling predict_proba, the output from the forward call will be used.
class ClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
            dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = F.relu(self.dense1(X))
        X = F.softmax(self.output(X), dim=-1)
        return X
We use NeuralNetClassifier because we're dealing with a classification task. The first argument should be the pytorch module. As additional arguments, we pass the number of epochs and the learning rate (lr), but those are optional.

Note: To use the CUDA backend, pass device='cuda' as an additional argument.
from skorch import NeuralNetClassifier

net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    # device='cuda',  # uncomment this to train with CUDA
)
As in sklearn, we call fit, passing the input data X and the targets y. By default, NeuralNetClassifier makes a StratifiedKFold split on the data (80/20) to track the validation loss. This is shown, as well as the train loss and the accuracy on the validation set.
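If you need a different validation strategy, you can override this behavior via the train_split argument. A minimal sketch, assuming a recent skorch version where ValidSplit lives in skorch.dataset (older versions call it CVSplit; net_custom_split is just an illustrative name):

from skorch.dataset import ValidSplit

# Hold out 1/10 of the data for validation instead of the default 1/5
net_custom_split = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    train_split=ValidSplit(10, stratified=True),
)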
# Training the network
net.fit(X, y)
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6905       0.6150        0.6749  0.2269
      2        0.6740       0.6200        0.6668  0.0308
      3        0.6594       0.6750        0.6554  0.0185
      4        0.6482       0.6900        0.6452  0.0183
      5        0.6423       0.7050        0.6333  0.0189
      6        0.6231       0.7000        0.6188  0.0198
      7        0.6081       0.7100        0.6064  0.0199
      8        0.6003       0.7000        0.5940  0.0183
      9        0.5937       0.7250        0.5836  0.0216
     10        0.5830       0.7150        0.5725  0.0194
     11        0.5686       0.7100        0.5660  0.0202
     12        0.5701       0.7150        0.5577  0.0200
     13        0.5751       0.7200        0.5499  0.0201
     14        0.5662       0.7250        0.5438  0.0167
     15        0.5422       0.7250        0.5354  0.0204
     16        0.5363       0.7250        0.5310  0.0268
     17        0.5378       0.7350        0.5263  0.0281
     18        0.5517       0.7250        0.5276  0.0193
     19        0.5448       0.7250        0.5291  0.0182
     20        0.5280       0.7350        0.5247  0.0219
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)
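All of these values are also recorded in net.history, which can be sliced by epoch and by logged key; a quick sketch:

# Reading values back out of the training history
print(net.history[-1, 'valid_acc'])   # validation accuracy of the last epoch
print(net.history[:, 'valid_loss'])   # validation losses of all epochs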
Also, as in sklearn, you may call predict or predict_proba on the fitted model.
# Making predictions for the first 5 data points of X
y_pred = net.predict(X[:5])
y_pred
array([0, 0, 0, 0, 0])
# Checking the probability of each class for the first 5 data points of X
y_proba = net.predict_proba(X[:5])
y_proba
array([[0.5603605 , 0.43963954],
       [0.782588  , 0.21741195],
       [0.6924924 , 0.30750763],
       [0.8895971 , 0.1104029 ],
       [0.7074626 , 0.2925373 ]], dtype=float32)
from sklearn.datasets import make_regression
# This is a toy dataset for regression, 1000 data points with 20 features each
X_regr, y_regr = make_regression(1000, 20, n_informative=10, random_state=0)
X_regr = X_regr.astype(np.float32)
y_regr = y_regr.astype(np.float32) / 100
y_regr = y_regr.reshape(-1, 1)
X_regr.shape, y_regr.shape, y_regr.min(), y_regr.max()
((1000, 20), (1000, 1), -6.4901485, 6.154505)
Note: Regression requires the target to be 2-dimensional, hence the need to reshape.
pytorch regression module

Again, we define a vanilla neural network with two hidden layers. The main difference is that the output layer only has one unit and does not apply a softmax nonlinearity.
class RegressorModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
    ):
        super(RegressorModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        X = self.output(X)
        return X
Training a regressor is almost the same as training a classifier. Mainly, we use NeuralNetRegressor instead of NeuralNetClassifier (this is the same terminology as in sklearn).
from skorch import NeuralNetRegressor

net_regr = NeuralNetRegressor(
    RegressorModule,
    max_epochs=20,
    lr=0.1,
    # device='cuda',  # uncomment this to train with CUDA
)
net_regr.fit(X_regr, y_regr)
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        4.4794        3.4054  0.0327
      2        2.6630        0.5670  0.0201
      3        0.3102        0.2004  0.0278
      4        0.2250        0.5211  0.0184
      5        0.3249        0.1675  0.0190
      6        0.1716        0.2142  0.0201
      7        0.1175        0.1192  0.0222
      8        0.1917        0.2653  0.0210
      9        0.1455        0.1196  0.0212
     10        0.1144        0.0803  0.0228
     11        0.0434        0.0712  0.0218
     12        0.0819        0.0690  0.0211
     13        0.0419        0.0737  0.0219
     14        0.0748        0.0498  0.0222
     15        0.0310        0.0586  0.0217
     16        0.0522        0.0312  0.0263
     17        0.0189        0.0419  0.0223
     18        0.0357        0.0219  0.0204
     19        0.0134        0.0345  0.0215
     20        0.0277        0.0161  0.0203
<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=RegressorModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=1, bias=True)
  ),
)
You may call predict or predict_proba on the fitted model. For regression, both methods return the same value.
# Making predictions for the first 5 data points of X
y_pred = net_regr.predict(X_regr[:5])
y_pred
array([[ 0.62908685],
       [-1.5245112 ],
       [-0.48306593],
       [-0.27282855],
       [-0.42769447]], dtype=float32)
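We can verify this claim directly; a quick check:

# For regressors, predict and predict_proba return the same values
assert np.allclose(net_regr.predict(X_regr), net_regr.predict_proba(X_regr))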
Saving and loading a model

Save and load either the whole model by using pickle, or just the learned model parameters by calling save_params and load_params.
import pickle

file_name = '/tmp/mymodel.pkl'

with open(file_name, 'wb') as f:
    pickle.dump(net, f)

with open(file_name, 'rb') as f:
    new_net = pickle.load(f)
In contrast to pickling the whole net, save_params only saves and loads the module parameters, meaning that hyperparameters such as lr and max_epochs are not saved. Therefore, to load the model, we have to re-initialize it beforehand.
net.save_params(f_params=file_name)  # a file handler also works

# first initialize the model
new_net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
).initialize()

new_net.load_params(file_name)
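To convince ourselves that the round trip worked, we can compare the predictions of both nets; a quick check:

# The restored net should behave exactly like the original
assert np.array_equal(net.predict(X[:5]), new_net.predict(X[:5]))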
sklearn Pipeline

It is possible to put the NeuralNetClassifier inside an sklearn Pipeline, as you would with any sklearn classifier.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', net),
])
pipe.fit(X, y)
Re-initializing module.
Re-initializing criterion.
Re-initializing optimizer.
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.7188       0.4350        0.7027  0.0177
      2        0.6989       0.4500        0.6988  0.0184
      3        0.7031       0.4750        0.6956  0.0254
      4        0.6956       0.5250        0.6931  0.0204
      5        0.6892       0.5250        0.6912  0.0196
      6        0.6905       0.5300        0.6890  0.0233
      7        0.6888       0.5400        0.6866  0.0302
      8        0.6842       0.5700        0.6842  0.0293
      9        0.6815       0.5950        0.6815  0.0193
     10        0.6761       0.5900        0.6787  0.0200
     11        0.6777       0.5850        0.6761  0.0195
     12        0.6677       0.6050        0.6730  0.0207
     13        0.6646       0.6250        0.6695  0.0188
     14        0.6620       0.6350        0.6647  0.0195
     15        0.6560       0.6350        0.6604  0.0230
     16        0.6478       0.6400        0.6562  0.0208
     17        0.6499       0.6400        0.6519  0.0198
     18        0.6275       0.6450        0.6459  0.0216
     19        0.6342       0.6550        0.6412  0.0230
     20        0.6192       0.6550        0.6356  0.0221
Pipeline(steps=[('scale', StandardScaler()),
                ('net', <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
))])
y_proba = pipe.predict_proba(X[:5])
y_proba
array([[0.45305932, 0.5469407 ],
       [0.6833223 , 0.3166777 ],
       [0.7487094 , 0.2512906 ],
       [0.7011023 , 0.29889768],
       [0.7423797 , 0.25762028]], dtype=float32)
To save the whole pipeline, including the pytorch module, use pickle.
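For example (the file name here is just for illustration):

# Pickling the pipeline captures the scaler and the net, including module weights
with open('/tmp/mypipeline.pkl', 'wb') as f:
    pickle.dump(pipe, f)

with open('/tmp/mypipeline.pkl', 'rb') as f:
    new_pipe = pickle.load(f)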
Callbacks

Adding a new callback to the model is straightforward. Below we show how to add a new callback that computes the area under the ROC curve (AUC).
from skorch.callbacks import EpochScoring
There is a scoring callback in skorch, EpochScoring, which we use for this. We have to specify which score to calculate. We have 3 choices:

- Passing a string: this should be a valid sklearn metric. For a list of all existing scores, look here.
- Passing None: if you implement your own .score method on your neural net, passing scoring=None will tell skorch to use that.
- Passing a function or callable: if we want to define our own scoring function, we pass a function with the signature func(model, X, y) -> score, which is then used (see the sketch below).

Note that this works exactly the same as scoring in sklearn does.
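For instance, the third option could look like this (my_accuracy is just an illustrative name, not a skorch API):

# A custom scoring function with the signature func(model, X, y) -> score
def my_accuracy(model, X, y):
    return (model.predict(X) == y).mean()

acc = EpochScoring(my_accuracy, lower_is_better=False, name='my_accuracy')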
For our case here, since sklearn already implements AUC, we just pass the correct string 'roc_auc'. We should also tell the callback that higher scores are better (to get the correct colors printed below; by default, lower scores are assumed to be better). Furthermore, we may specify a name argument for EpochScoring, and whether to use training data (by setting on_train=True) or validation data (which is the default).
auc = EpochScoring(scoring='roc_auc', lower_is_better=False)
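To additionally track AUC on the training data, we could create a second callback and pass it along as well; a sketch using the name and on_train arguments mentioned above:

train_auc = EpochScoring(
    scoring='roc_auc',
    lower_is_better=False,
    name='train_auc',
    on_train=True,  # compute the score on the training data
)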
Finally, we pass the scoring callback to the callbacks parameter as a list and then call fit. Notice that we get the printed scores and color highlighting for free.
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    callbacks=[auc],
)
net.fit(X, y)
  epoch    roc_auc    train_loss    valid_acc    valid_loss     dur
-------  ---------  ------------  -----------  ------------  ------
      1     0.6936        0.7299       0.5550        0.6742  0.0176
      2     0.7103        0.6848       0.6600        0.6601  0.0208
      3     0.7155        0.6550       0.6900        0.6536  0.0244
      4     0.7255        0.6355       0.7200        0.6485  0.0179
      5     0.7340        0.6380       0.7250        0.6422  0.0186
      6     0.7373        0.6268       0.7400        0.6363  0.0200
      7     0.7445        0.6157       0.7400        0.6317  0.0244
      8     0.7477        0.6128       0.7450        0.6258  0.0195
      9     0.7573        0.6068       0.7150        0.6153  0.0182
     10     0.7616        0.5958       0.7350        0.6105  0.0266
     11     0.7684        0.5819       0.7300        0.6010  0.0188
     12     0.7712        0.5847       0.7000        0.5935  0.0197
     13     0.7719        0.5659       0.7250        0.5895  0.0168
     14     0.7746        0.5561       0.7300        0.5831  0.0202
     15     0.7789        0.5632       0.7400        0.5779  0.0194
     16     0.7839        0.5352       0.7250        0.5730  0.0197
     17     0.7839        0.5495       0.7350        0.5693  0.0214
     18     0.7816        0.5473       0.7350        0.5721  0.0225
     19     0.7903        0.5436       0.7450        0.5592  0.0167
     20     0.7948        0.5430       0.7450        0.5519  0.0186
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)
For information on how to write custom callbacks, have a look at the Advanced_Usage notebook.
GridSearchCV

The NeuralNet class allows you to directly access parameters of the pytorch module by using the module__ prefix. So, e.g., if you defined the module to have a num_units parameter, you can set it via the module__num_units argument. This is exactly the same logic that allows you to access estimator parameters in sklearn Pipelines and FeatureUnions.
This feature is useful in several ways. For one, it allows you to set those parameters in the model definition. Furthermore, it allows you to set parameters in an sklearn GridSearchCV, as shown below.
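Outside of a grid search, the same double-underscore routing works with the standard sklearn set_params; for example:

# Change the module's number of hidden units directly on the net
net.set_params(module__num_units=20)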
In addition to the parameters prefixed by module__, you may access a couple of other attributes, such as those of the optimizer, by using the optimizer__ prefix (again, see below). All those special prefixes are stored in the prefixes_ attribute:
print(', '.join(net.prefixes_))
iterator_train, iterator_valid, callbacks, dataset, module, criterion, optimizer
Below we show how to perform a grid search over the learning rate (lr), the module's number of hidden units (module__num_units), the module's dropout rate (module__dropout), and whether the SGD optimizer should use Nesterov momentum or not (optimizer__nesterov).
from sklearn.model_selection import GridSearchCV

net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    optimizer__momentum=0.9,
    verbose=0,
    train_split=False,
)
Note: We set the verbosity level to zero (verbose=0) to prevent too much print output from being shown. Also, we disable the skorch-internal train-validation split (train_split=False) because GridSearchCV already splits the training data for us. We only have to leave the skorch-internal split enabled for some specific uses, e.g. to perform EarlyStopping.
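As an aside, a net with early stopping might look like this (a sketch; patience=5 is just an example value):

from skorch.callbacks import EarlyStopping

# EarlyStopping monitors the validation loss by default, so the
# skorch-internal train/valid split must remain enabled for it to work
net_es = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=100,
    lr=0.1,
    callbacks=[EarlyStopping(patience=5)],
)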
params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optimizer__nesterov': [False, True],
}
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)
gs.fit(X, y)
Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.05, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=10, optimizer__nesterov=False; total time=   0.2s
[CV] END lr=0.1, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=10, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=10, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=False; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
[CV] END lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True; total time=   0.3s
GridSearchCV(cv=3,
             estimator=<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.ClassifierModule'>,
),
             param_grid={'lr': [0.05, 0.1], 'module__dropout': [0, 0.5],
                         'module__num_units': [10, 20],
                         'optimizer__nesterov': [False, True]},
             refit=False, scoring='accuracy', verbose=2)
print(gs.best_score_, gs.best_params_)
0.8780367193540846 {'lr': 0.1, 'module__dropout': 0, 'module__num_units': 20, 'optimizer__nesterov': False}
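Since we passed refit=False, the grid search did not retrain a final model, but we can easily do that ourselves; a sketch:

# Re-fit a net using the best hyperparameters found by the grid search
net.set_params(**gs.best_params_)
net.fit(X, y)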
Of course, we could further nest the NeuralNetClassifier within an sklearn Pipeline, in which case we just prefix the parameter by the name of the net (e.g. net__module__num_units).
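Such a nested grid search could be set up like this (a sketch reusing the pipeline from above; note the extra net__ prefix):

pipe_params = {
    'net__lr': [0.05, 0.1],
    'net__module__num_units': [10, 20],
}
gs_pipe = GridSearchCV(pipe, pipe_params, refit=False, cv=3, scoring='accuracy')
# gs_pipe.fit(X, y)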