skorch is designed to maximize interoperability between sklearn and pytorch. The aim is to keep 99% of the flexibility of pytorch while being able to leverage most features of sklearn. Below, we show the basic usage of skorch and how it can be combined with sklearn.
This notebook shows you how to use the basic functionality of skorch.
! [ ! -z "$COLAB_GPU" ] && pip install torch skorch
import torch
from torch import nn
import torch.nn.functional as F
torch.manual_seed(0);
We load a toy classification task from sklearn.
import numpy as np
from sklearn.datasets import make_classification
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X, y = X.astype(np.float32), y.astype(np.int64)
X.shape, y.shape, y.mean()
((1000, 20), (1000,), 0.5)
A pytorch classification module

We define a vanilla neural network with two hidden layers. The output layer should have two output units since there are two classes. In addition, it should have a softmax nonlinearity, because later, when calling predict_proba, the output from the forward call will be used.
class ClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
            dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = F.relu(self.dense1(X))
        X = F.softmax(self.output(X), dim=-1)
        return X
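The softmax at the end of forward turns the raw outputs into probabilities that sum to one over the classes, which is what predict_proba will later return. A minimal numpy sketch of that final step (the standalone softmax function here is just for illustration, not part of skorch):

```python
import numpy as np

def softmax(z, axis=-1):
    # subtract the row-wise max for numerical stability before exponentiating
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 0.5], [-1.0, 3.0]])
probs = softmax(logits)  # each row now sums to 1
```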
We use NeuralNetClassifier because we're dealing with a classification task. The first argument should be the pytorch module. As additional arguments, we pass the number of epochs and the learning rate (lr), but those are optional.

Note: To use the CUDA backend, pass device='cuda' as an additional argument.
from skorch import NeuralNetClassifier
net = NeuralNetClassifier(
ClassifierModule,
max_epochs=20,
lr=0.1,
# device='cuda', # uncomment this to train with CUDA
)
As in sklearn, we call fit, passing the input data X and the targets y. By default, NeuralNetClassifier makes a stratified 80/20 split on the data to track the validation loss. The validation loss is shown below, alongside the train loss and the accuracy on the validation set.
net.fit(X, y)
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6905       0.6150        0.6749  0.0235
      2        0.6648       0.6450        0.6633  0.0213
      3        0.6619       0.6750        0.6533  0.0219
      4        0.6429       0.6800        0.6399  0.0207
      5        0.6307       0.6950        0.6254  0.0192
      6        0.6291       0.7000        0.6134  0.0202
      7        0.6102       0.7100        0.6033  0.0220
      8        0.6050       0.7000        0.5931  0.0210
      9        0.5966       0.7000        0.5844  0.0217
     10        0.5636       0.7100        0.5689  0.0226
     11        0.5757       0.7200        0.5628  0.0196
     12        0.5757       0.7200        0.5520  0.0190
     13        0.5559       0.7300        0.5459  0.0218
     14        0.5541       0.7300        0.5424  0.0206
     15        0.5659       0.7350        0.5378  0.0215
     16        0.5364       0.7350        0.5322  0.0192
     17        0.5456       0.7300        0.5239  0.0221
     18        0.5476       0.7450        0.5260  0.0188
     19        0.5499       0.7500        0.5249  0.0213
     20        0.5273       0.7350        0.5251  0.0206
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)
Also, as in sklearn, you may call predict or predict_proba on the fitted model.
y_pred = net.predict(X[:5])
y_pred
array([0, 0, 0, 0, 0])
y_proba = net.predict_proba(X[:5])
y_proba
array([[0.5349464 , 0.46505365],
       [0.8685093 , 0.1314907 ],
       [0.6860039 , 0.31399614],
       [0.9126012 , 0.08739878],
       [0.69675475, 0.30324525]], dtype=float32)
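Because the module ends in a softmax, each row of predict_proba sums to one, and predict corresponds to the argmax over the class axis. A small numpy check on the (rounded) values printed above:

```python
import numpy as np

# the probabilities printed above, rounded for illustration
y_proba = np.array([
    [0.5349, 0.4651],
    [0.8685, 0.1315],
    [0.6860, 0.3140],
    [0.9126, 0.0874],
    [0.6968, 0.3032],
], dtype=np.float32)

row_sums = y_proba.sum(axis=1)     # each close to 1.0
y_pred = y_proba.argmax(axis=-1)   # matches net.predict: array([0, 0, 0, 0, 0])
```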
from sklearn.datasets import make_regression
X_regr, y_regr = make_regression(1000, 20, n_informative=10, random_state=0)
X_regr = X_regr.astype(np.float32)
y_regr = y_regr.astype(np.float32) / 100
y_regr = y_regr.reshape(-1, 1)
X_regr.shape, y_regr.shape, y_regr.min(), y_regr.max()
((1000, 20), (1000, 1), -6.4901485, 6.154505)
Note: Regression currently requires the target to be 2-dimensional, hence the need to reshape. This should be fixed with an upcoming version of pytorch.
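The reshape itself is a one-liner: reshape(-1, 1) turns a 1-dimensional target into a column vector, with -1 telling numpy to infer the number of rows:

```python
import numpy as np

y = np.arange(5, dtype=np.float32)  # shape (5,)
y_2d = y.reshape(-1, 1)             # shape (5, 1), as required for regression
```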
A pytorch regression module

Again, we define a vanilla neural network with two hidden layers. The main difference is that the output layer only has one unit and does not apply a softmax nonlinearity.
class RegressorModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
    ):
        super(RegressorModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        X = self.output(X)
        return X
Training a regressor is almost the same as training a classifier. Mainly, we use NeuralNetRegressor instead of NeuralNetClassifier (this is the same terminology as in sklearn).
from skorch import NeuralNetRegressor
net_regr = NeuralNetRegressor(
RegressorModule,
max_epochs=20,
lr=0.1,
# device='cuda', # uncomment this to train with CUDA
)
net_regr.fit(X_regr, y_regr)
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        4.4168        3.0788  0.0292
      2        2.0120        0.4565  0.0270
      3        0.3343        0.2262  0.0263
      4        0.1851        0.2223  0.0257
      5        0.1491        0.1068  0.0242
      6        0.0946        0.1207  0.0263
      7        0.0739        0.0663  0.0290
      8        0.0554        0.0706  0.0298
      9        0.0437        0.0461  0.0337
     10        0.0372        0.0469  0.0273
     11        0.0291        0.0343  0.0263
     12        0.0270        0.0333  0.0285
     13        0.0207        0.0265  0.0281
     14        0.0196        0.0249  0.0344
     15        0.0152        0.0215  0.0286
     16        0.0151        0.0198  0.0281
     17        0.0120        0.0182  0.0283
     18        0.0119        0.0167  0.0266
     19        0.0100        0.0159  0.0266
     20        0.0097        0.0149  0.0259
<class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=RegressorModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=1, bias=True)
  ),
)
You may call predict or predict_proba on the fitted model. For regressions, both methods return the same value.
y_pred = net_regr.predict(X_regr[:5])
y_pred
array([[ 0.4903931 ],
       [-1.4224019 ],
       [-0.77500594],
       [-0.06901944],
       [-0.3867012 ]], dtype=float32)
Save and load either the whole model by using pickle, or just the learned model parameters by calling save_params and load_params.
import pickle
file_name = '/tmp/mymodel.pkl'
with open(file_name, 'wb') as f:
pickle.dump(net, f)
/Users/thomasfan/anaconda3/lib/python3.7/site-packages/torch/serialization.py:241: UserWarning: Couldn't retrieve source code for container of type ClassifierModule. It won't be checked for correctness upon loading. "type " + obj.__name__ + ". It won't be checked "
with open(file_name, 'rb') as f:
new_net = pickle.load(f)
This only saves and loads the proper module parameters, meaning that hyperparameters such as lr and max_epochs are not saved. Therefore, to load the model, we have to re-initialize it beforehand.
net.save_params(f_params=file_name) # a file handler also works
# first initialize the model
new_net = NeuralNetClassifier(
ClassifierModule,
max_epochs=20,
lr=0.1,
).initialize()
new_net.load_params(file_name)
sklearn Pipeline

It is possible to put the NeuralNetClassifier inside an sklearn Pipeline, as you would with any sklearn classifier.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe = Pipeline([
('scale', StandardScaler()),
('net', net),
])
pipe.fit(X, y)
Re-initializing module!
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.7243       0.5000        0.7105  0.0184
      2        0.7057       0.5000        0.6996  0.0207
      3        0.6971       0.5000        0.6949  0.0192
      4        0.6936       0.5050        0.6929  0.0224
      5        0.6923       0.5400        0.6916  0.0210
      6        0.6905       0.5000        0.6906  0.0189
      7        0.6894       0.5100        0.6899  0.0194
      8        0.6891       0.5150        0.6892  0.0186
      9        0.6899       0.5250        0.6885  0.0202
     10        0.6844       0.5300        0.6876  0.0189
     11        0.6853       0.5650        0.6865  0.0199
     12        0.6842       0.5700        0.6855  0.0183
     13        0.6821       0.5850        0.6844  0.0199
     14        0.6821       0.6050        0.6832  0.0189
     15        0.6820       0.6100        0.6820  0.0206
     16        0.6769       0.6100        0.6800  0.0188
     17        0.6784       0.6200        0.6780  0.0219
     18        0.6763       0.6450        0.6761  0.0233
     19        0.6704       0.6550        0.6729  0.0254
     20        0.6691       0.6750        0.6699  0.0252
Pipeline(memory=None,
     steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
            ('net', <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
))])
y_proba = pipe.predict_proba(X[:5])
y_proba
array([[0.5064775 , 0.49352255],
       [0.53243965, 0.46756038],
       [0.57306874, 0.42693123],
       [0.54179883, 0.45820117],
       [0.5528906 , 0.44710937]], dtype=float32)
To save the whole pipeline, including the pytorch module, use pickle.
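A minimal sketch of that round trip; a plain sklearn estimator stands in for the net here so the example runs without skorch or pytorch installed, and an in-memory buffer stands in for a file:

```python
import pickle
from io import BytesIO

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(100, 5, random_state=0)

# stand-in estimator; with skorch, the net would take the place of
# LogisticRegression and pickling works the same way
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression()),
]).fit(X, y)

buf = BytesIO()
pickle.dump(pipe, buf)      # serializes every step, scaler included
buf.seek(0)
restored = pickle.load(buf)
```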
Adding a new callback to the model is straightforward. Below we show how to add a new callback that computes the area under the ROC curve (AUC).
from skorch.callbacks import EpochScoring
There is a scoring callback in skorch, EpochScoring, which we use for this. We have to specify which score to calculate. We have 3 choices:

- Passing a string: this should be a valid sklearn metric. For a list of all existing scores, look here.
- Passing None: if you implement your own .score method on your neural net, passing scoring=None will tell skorch to use that.
- Passing a function or callable: we may pass a function with the signature func(model, X, y) -> score, which is then used.

Note that this works exactly the same as scoring in sklearn does.
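For the string option, skorch relies on sklearn's scorer registry. The sketch below shows how a metric string resolves to a callable in sklearn itself, and what a custom func(model, X, y) -> score function looks like (my_accuracy is a made-up example, not part of either library):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = make_classification(200, 5, random_state=0)
model = LogisticRegression().fit(X, y)

# a string such as 'roc_auc' is looked up in sklearn's scorer registry
# and becomes a callable with the signature scorer(model, X, y) -> score
auc_scorer = get_scorer('roc_auc')
auc = auc_scorer(model, X, y)

# a custom scoring function uses the same signature
def my_accuracy(model, X, y):
    return (model.predict(X) == y).mean()

acc = my_accuracy(model, X, y)
```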
For our case here, since sklearn already implements AUC, we just pass the correct string 'roc_auc'. We should also tell the callback that higher scores are better (to get the correct colors printed below -- by default, lower scores are assumed to be better). Furthermore, we may specify a name argument for EpochScoring, and whether to use training data (by setting on_train=True) or validation data (which is the default).
auc = EpochScoring(scoring='roc_auc', lower_is_better=False)
Finally, we pass the scoring callback to the callbacks parameter as a list and then call fit. Notice that we get the printed scores and color highlighting for free.
net = NeuralNetClassifier(
ClassifierModule,
max_epochs=20,
lr=0.1,
callbacks=[auc],
)
net.fit(X, y)
  epoch    roc_auc    train_loss    valid_acc    valid_loss     dur
-------  ---------  ------------  -----------  ------------  ------
      1     0.6112        0.7076       0.5550        0.6802  0.0188
      2     0.6766        0.6750       0.6150        0.6626  0.0204
      3     0.7031        0.6560       0.6500        0.6498  0.0244
      4     0.7201        0.6364       0.6650        0.6381  0.0193
      5     0.7316        0.6176       0.6900        0.6285  0.0203
      6     0.7447        0.6094       0.7200        0.6183  0.0222
      7     0.7522        0.6170       0.7200        0.6090  0.0188
      8     0.7567        0.5786       0.7150        0.6032  0.0197
      9     0.7630        0.5850       0.7100        0.5954  0.0214
     10     0.7706        0.5770       0.7200        0.5889  0.0207
     11     0.7735        0.5740       0.7050        0.5842  0.0188
     12     0.7729        0.5771       0.7100        0.5859  0.0186
     13     0.7792        0.5557       0.7000        0.5745  0.0178
     14     0.7825        0.5810       0.7050        0.5691  0.0204
     15     0.7824        0.5634       0.7200        0.5691  0.0194
     16     0.7817        0.5778       0.7150        0.5704  0.0205
     17     0.7871        0.5624       0.7150        0.5633  0.0188
     18     0.7855        0.5613       0.7200        0.5660  0.0202
     19     0.7792        0.5637       0.7250        0.5722  0.0194
     20     0.7823        0.5516       0.7150        0.5681  0.0184
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)
For information on how to write custom callbacks, have a look at the Advanced_Usage notebook.
GridSearchCV

The NeuralNet class allows you to directly access parameters of the pytorch module by using the module__ prefix. So, e.g., if you defined the module to have a num_units parameter, you can set it via the module__num_units argument. This is exactly the same logic that allows you to access estimator parameters in sklearn Pipelines and FeatureUnions.

This feature is useful in several ways. For one, it allows you to set those parameters in the model definition. Furthermore, it allows you to set parameters in an sklearn GridSearchCV, as shown below.
In addition to the parameters prefixed by module__, you may access a couple of other attributes, such as those of the optimizer, by using the optimizer__ prefix (again, see below). All those special prefixes are stored in the prefixes_ attribute:
print(', '.join(net.prefixes_))
module, iterator_train, iterator_valid, optimizer, criterion, callbacks, dataset
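This double-underscore routing is the standard sklearn convention; the same mechanism at work in a plain Pipeline (the step names scale and clf are arbitrary):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(C=1.0)),
])

# '<step name>__<param>' routes the value to that step, just as
# 'module__<param>' routes to the pytorch module in skorch
pipe.set_params(clf__C=0.5)
```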
Below we show how to perform a grid search over the learning rate (lr), the module's number of hidden units (module__num_units), the module's dropout rate (module__dropout), and whether the SGD optimizer should use Nesterov momentum or not (optimizer__nesterov).
from sklearn.model_selection import GridSearchCV
net = NeuralNetClassifier(
ClassifierModule,
max_epochs=20,
lr=0.1,
verbose=0,
optimizer__momentum=0.9,
)
params = {
'lr': [0.05, 0.1],
'module__num_units': [10, 20],
'module__dropout': [0, 0.5],
'optimizer__nesterov': [False, True],
}
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)
gs.fit(X, y)
Fitting 3 folds for each of 16 candidates, totalling 48 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False
[CV] lr=0.05, module__dropout=0, module__num_units=10, optimizer__nesterov=False, total=   0.3s
(... 48 fits in total, each taking roughly 0.3s ...)
[CV] lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True
[CV] lr=0.1, module__dropout=0.5, module__num_units=20, optimizer__nesterov=True, total=   0.3s
[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:   15.7s finished
GridSearchCV(cv=3, error_score='raise-deprecating',
       estimator=<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.ClassifierModule'>,
),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'lr': [0.05, 0.1], 'module__num_units': [10, 20],
                   'module__dropout': [0, 0.5],
                   'optimizer__nesterov': [False, True]},
       pre_dispatch='2*n_jobs', refit=False, return_train_score='warn',
       scoring='accuracy', verbose=2)
print(gs.best_score_, gs.best_params_)
0.862 {'lr': 0.05, 'module__dropout': 0, 'module__num_units': 20, 'optimizer__nesterov': False}
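The 48 fits reported above follow directly from the grid: a quick sanity check of the number of candidates times the number of folds:

```python
from itertools import product

params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optimizer__nesterov': [False, True],
}

# the Cartesian product of all value lists gives the candidates
n_candidates = len(list(product(*params.values())))  # 2*2*2*2 = 16
n_fits = n_candidates * 3                            # times cv=3 folds = 48
```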
Of course, we could further nest the NeuralNetClassifier within an sklearn Pipeline, in which case we just prefix the parameter by the name of the net (e.g. net__module__num_units).