This short notebook shows how to save a skorch model checkpoint to the Hugging Face Hub.
import subprocess

# Installation on Google Colab
try:
    import google.colab
    subprocess.run(['python', '-m', 'pip', 'install', 'skorch', 'huggingface_hub', 'transformers'])
except ImportError:
    pass
import os
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from torch import nn
from skorch import NeuralNetClassifier
from skorch.callbacks import TrainEndCheckpoint
from skorch.hf import HfHubStorage
from huggingface_hub import create_repo, HfApi
If not installed already, please install the Hugging Face Hub library:

$ python -m pip install huggingface_hub

You also need skorch>=0.12, or skorch installed from the current master branch on GitHub.
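To double-check, you can print the installed skorch version (HfHubStorage lives in skorch.hf, which requires skorch>=0.12):

import skorch

# skorch.hf.HfHubStorage requires skorch>=0.12
print(skorch.__version__)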
# set the token as an environment variable called HF_TOKEN, e.g. `HF_TOKEN=hf_...`
# the token can be found at: https://huggingface.co/settings/tokens
TOKEN = os.environ['HF_TOKEN']
# choose name for the whole model and for the model weights
# typically, you only need one of the two, we use both for demonstration purposes
MODEL_NAME = 'skorch-model.pkl'
WEIGHTS_NAME = 'weights.pt'
# choose a repo name within your user account or organization
REPO_NAME = 'sawradip/demo-skorch'
torch.manual_seed(0)
np.random.seed(0)
We use a toy dataset for this demo.
X, y = make_classification(10000, 20, n_informative=10, random_state=0)
X, y = X.astype(np.float32), y.astype(np.int64)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
class ClassifierModule(nn.Module):
    def __init__(
        self,
        num_units=30,
        nonlin=nn.ReLU(),
        dropout=0.5,
    ):
        super().__init__()
        self.num_units = num_units

        self.dense0 = nn.Linear(20, num_units)
        self.nonlin = nonlin
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, num_units)
        self.output = nn.Linear(num_units, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = self.nonlin(self.dense1(X))
        X = self.softmax(self.output(X))
        return X
Assuming the repo doesn't exist yet, create a new one using this function:
skorch_repo = create_repo(
    REPO_NAME,
    private=True,  # set to False if it should be public
    token=TOKEN,
    exist_ok=True,
)
skorch_repo
'https://huggingface.co/sawradip/demo-skorch'
Creating a HfHubStorage instance to use with the TrainEndCheckpoint callback

The ingredient we need to save models on the Hub is skorch.hf.HfHubStorage. This object can be used instead of a file name when you use skorch.callbacks.TrainEndCheckpoint (or skorch.callbacks.Checkpoint, but more on that later). Therefore, you can continue to use your existing checkpoints; the only difference is that the models are stored on the Hugging Face Hub instead of locally.

As a first step, we need to create a HfApi instance, which is used by the HfHubStorage to perform the upload.
hf_api = HfApi()
Then, we create a hub_pickle_storer, which is used by the checkpoint callback to write the whole skorch model as a pickle file to the indicated repository. We indicate the file path, the repository name, and the Hugging Face token. Optionally, we can also set verbose=1 to print a message when a file has been uploaded.
hub_pickle_storer = HfHubStorage(
    hf_api,
    path_in_repo=MODEL_NAME,
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
)
Instead of writing the whole skorch model to the Hub, we can also decide to write only specific components, e.g. the module. This saves the state_dict of the module to the Hub using torch.save under the hood.

Also, by default, the parameters are stored in an in-memory buffer. If you want to avoid that memory overhead, it is possible to save them on disk using the local_storage argument. Below, we choose to store the model weights in a local file called my-model-weights.pt.
hub_params_storer = HfHubStorage(
    hf_api,
    path_in_repo=WEIGHTS_NAME,
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
    local_storage='my-model-weights.pt',
)
The other attributes (optimizer, criterion, training history) are not saved for this demo. That's why we set their values to None when initializing the TrainEndCheckpoint below.
checkpoint = TrainEndCheckpoint(
    f_pickle=hub_pickle_storer,
    f_params=hub_params_storer,
    f_optimizer=None,
    f_criterion=None,
    f_history=None,
)
Finally, let's create our net and fit it on the training data. The checkpoint callback will automatically store the parameters on the Hugging Face Hub at the end of training.
net = NeuralNetClassifier(
    ClassifierModule,
    lr=0.1,
    device='cpu',
    iterator_train__shuffle=True,
    callbacks=[checkpoint],
)
net.fit(X_train, y_train)
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6627       0.7573        0.5772  0.8426
      2        0.5550       0.8593        0.4154  0.1066
      3        0.4622       0.8973        0.3253  0.0958
      4        0.4119       0.9073        0.2840  0.1034
      5        0.3739       0.9113        0.2569  0.1067
      6        0.3489       0.9213        0.2368  0.1011
      7        0.3331       0.9240        0.2328  0.1049
      8        0.3115       0.9287        0.2187  0.1066
      9        0.3117       0.9300        0.2087  0.0959
     10        0.2983       0.9320        0.2102  0.0977
Uploaded file to https://huggingface.co/sawradip/demo-skorch/blob/main/weights.pt
Uploaded file to https://huggingface.co/sawradip/demo-skorch/blob/main/skorch-model.pkl
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (nonlin): ReLU()
    (dense0): Linear(in_features=20, out_features=30, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=30, out_features=30, bias=True)
    (output): Linear(in_features=30, out_features=2, bias=True)
    (softmax): Softmax(dim=-1)
  ),
)
As you can see, both the weights of the PyTorch module and the whole skorch model were saved to the Hub. Visit the printed URLs to see them there.
As a next step, think about adding a Model Card to your repository to provide further information about the model.
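One rough, hypothetical sketch of how that could look, using the same HfApi instance (the card content and metadata below are purely illustrative):

# hypothetical minimal model card; adjust the metadata and text to your model
model_card = """---
library_name: skorch
---

# demo-skorch

A toy skorch classifier trained on synthetic data from sklearn's make_classification.
"""

hf_api.upload_file(
    path_or_fileobj=model_card.encode('utf-8'),
    path_in_repo='README.md',
    repo_id=REPO_NAME,
    token=TOKEN,
)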
Right now, we use TrainEndCheckpoint, which uploads the model only once, at the end of training. Instead, we could use Checkpoint, which uploads the model each time the monitored metric improves. Note, however, that at the moment the upload is synchronous, i.e. we wait for it to finish. So if uploading the model takes a long time compared to training it, your training process could be slowed down considerably, depending on how often the model improves.

If you still decide to use Checkpoint, you might want to keep a version of each uploaded file instead of having the latest one overwrite the previous one. This is possible by choosing a templated model name, e.g. 'skorch-model-{}.pkl'. This way, the first upload creates the file 'skorch-model-0.pkl', the second one creates 'skorch-model-1.pkl', etc.
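Below is a minimal sketch of how this could look, reusing hf_api, REPO_NAME, and TOKEN from above; the variable names are made up for illustration:

from skorch.callbacks import Checkpoint

# templated file name: '{}' is replaced by an incrementing counter on each upload
versioned_pickle_storer = HfHubStorage(
    hf_api,
    path_in_repo='skorch-model-{}.pkl',
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
)

# uploads a new pickle each time the monitored metric (valid_loss by default) improves
checkpoint_on_improvement = Checkpoint(
    f_pickle=versioned_pickle_storer,
    f_params=None,
    f_optimizer=None,
    f_criterion=None,
    f_history=None,
)

This callback could then be passed to NeuralNetClassifier via callbacks=[checkpoint_on_improvement], just like the TrainEndCheckpoint above.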
import pickle
from huggingface_hub import hf_hub_download
from sklearn.metrics import accuracy_score
The skorch model is just a normal pickle file and can be loaded like this:
hub_pickle_storer.latest_url_
'https://huggingface.co/sawradip/demo-skorch/blob/main/skorch-model.pkl'
path = hf_hub_download(REPO_NAME, MODEL_NAME, token=TOKEN)
with open(path, 'rb') as f:
    net_loaded = pickle.load(f)
accuracy_score(y, net_loaded.predict(X))
0.9334
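Note that this accuracy is computed on the full dataset, which includes the training split. For a held-out estimate, you could score the loaded model on the test split instead:

# accuracy on the held-out test split only
accuracy_score(y_test, net_loaded.predict(X_test))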
The model weights are stored as a PyTorch state_dict.
hub_params_storer.latest_url_
'https://huggingface.co/sawradip/demo-skorch/blob/main/weights.pt'
path = hf_hub_download(REPO_NAME, WEIGHTS_NAME, token=TOKEN)
with open(path, 'rb') as f:
    weights_loaded = torch.load(f)
for key, val in weights_loaded.items():
    print(f"Parameter name '{key}' and shape {val.shape}")
Parameter name 'dense0.weight' and shape torch.Size([30, 20])
Parameter name 'dense0.bias' and shape torch.Size([30])
Parameter name 'dense1.weight' and shape torch.Size([30, 30])
Parameter name 'dense1.bias' and shape torch.Size([30])
Parameter name 'output.weight' and shape torch.Size([2, 30])
Parameter name 'output.bias' and shape torch.Size([2])
Typically, when you store the whole skorch model, you don't need to store the weights separately, as they are already part of the whole model:
for key, val in net_loaded.module_.state_dict().items():
    print(f"Parameter name '{key}' and shape {val.shape}")
Parameter name 'dense0.weight' and shape torch.Size([30, 20])
Parameter name 'dense0.bias' and shape torch.Size([30])
Parameter name 'dense1.weight' and shape torch.Size([30, 30])
Parameter name 'dense1.bias' and shape torch.Size([30])
Parameter name 'output.weight' and shape torch.Size([2, 30])
Parameter name 'output.bias' and shape torch.Size([2])
However, there can be situations where you don't need the whole skorch model, in which case you can store only the model weights.
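As a final illustration, here is a minimal sketch of using only the downloaded weights, assuming the ClassifierModule definition from above is available:

# rebuild the bare PyTorch module and load the downloaded state_dict into it
module = ClassifierModule()
module.load_state_dict(weights_loaded)
module.eval()  # disable dropout for inference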