This short notebook shows how to save a skorch model checkpoint to the Hugging Face Hub.
import subprocess

# Installation on Google Colab
try:
    import google.colab
    subprocess.run(['python', '-m', 'pip', 'install', 'skorch', 'huggingface_hub', 'transformers'])
except ImportError:
    pass
import os
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from torch import nn
from skorch import NeuralNetClassifier
from skorch.callbacks import TrainEndCheckpoint
from skorch.hf import HfHubStorage
from huggingface_hub import create_repo, HfApi
If not installed already, please install the Hugging Face Hub library:

$ python -m pip install huggingface_hub

You also need skorch>=0.12, or skorch installed from the current master branch on GitHub.
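To double-check, you can print the installed skorch version (HfHubStorage lives in skorch.hf, which requires skorch>=0.12):

import skorch

# skorch.hf.HfHubStorage requires skorch>=0.12
print(skorch.__version__)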
# set the token as an environment variable called HF_TOKEN, e.g. `HF_TOKEN=hf_...`
# the token can be found at: https://huggingface.co/settings/tokens
TOKEN = os.environ['HF_TOKEN']
# choose name for the whole model and for the model weights
# typically, you only need one of the two, we use both for demonstration purposes
MODEL_NAME = 'skorch-model.pkl'
WEIGHTS_NAME = 'weights.pt'
# choose a repo name within your user account or organization
REPO_NAME = 'sawradip/demo-skorch'
torch.manual_seed(0)
np.random.seed(0)
We use a toy dataset for this demo.
X, y = make_classification(10000, 20, n_informative=10, random_state=0)
X, y = X.astype(np.float32), y.astype(np.int64)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
class ClassifierModule(nn.Module):
    def __init__(
        self,
        num_units=30,
        nonlin=nn.ReLU(),
        dropout=0.5,
    ):
        super().__init__()
        self.num_units = num_units

        self.dense0 = nn.Linear(20, num_units)
        self.nonlin = nonlin
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, num_units)
        self.output = nn.Linear(num_units, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = self.nonlin(self.dense1(X))
        X = self.softmax(self.output(X))
        return X
Assuming the repo doesn't exist yet, create a new one using this function:
skorch_repo = create_repo(
    REPO_NAME,
    private=True,  # set to False if it should be public
    token=TOKEN,
    exist_ok=True,
)
skorch_repo
'https://huggingface.co/sawradip/demo-skorch'
Creating a HfHubStorage instance to use with the TrainEndCheckpoint callback

The ingredient we need to save models on the Hub is skorch.hf.HfHubStorage. This object can be used instead of a file name when you use skorch.callbacks.TrainEndCheckpoint (or skorch.callbacks.Checkpoint, but more on that later). Therefore, you can continue to use your existing checkpoints; the only difference is that the models are stored on the Hugging Face Hub instead of locally.

As a first step, we need to create a HfApi instance, which is used by the HfHubStorage to perform the upload.
hf_api = HfApi()
Then, we create a hub_pickle_storer, which is used by the checkpoint callback to write the whole skorch model as a pickle file to the indicated repository. We indicate the file path, the repository name, and the Hugging Face token. Optionally, we can also set verbose=1 to print a message when a file has been uploaded.
hub_pickle_storer = HfHubStorage(
    hf_api,
    path_in_repo=MODEL_NAME,
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
)
Instead of writing the whole skorch model to the Hub, we can also decide to write only specific components, e.g. the module. This saves the state_dict of the module to the Hub using torch.save under the hood.

Also, by default, the parameters are stored in an in-memory buffer. If you want to avoid that memory overhead, it is possible to save them on disk using the local_storage argument. Below, we choose to store the model weights in a local file called my-model-weights.pt.
hub_params_storer = HfHubStorage(
    hf_api,
    path_in_repo=WEIGHTS_NAME,
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
    local_storage='my-model-weights.pt',
)
The other attributes (optimizer, criterion, training history) are not saved for this demo. That's why we set their values to None when initializing the TrainEndCheckpoint below.
checkpoint = TrainEndCheckpoint(
    f_pickle=hub_pickle_storer,
    f_params=hub_params_storer,
    f_optimizer=None,
    f_criterion=None,
    f_history=None,
)
Finally, let's create our net and fit it on the training data. The checkpoint callback will automatically store the parameters on the Hugging Face Hub at the end of training.
net = NeuralNetClassifier(
    ClassifierModule,
    lr=0.1,
    device='cpu',
    iterator_train__shuffle=True,
    callbacks=[checkpoint],
)
net.fit(X_train, y_train)
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6627       0.7573        0.5772  0.8426
      2        0.5550       0.8593        0.4154  0.1066
      3        0.4622       0.8973        0.3253  0.0958
      4        0.4119       0.9073        0.2840  0.1034
      5        0.3739       0.9113        0.2569  0.1067
      6        0.3489       0.9213        0.2368  0.1011
      7        0.3331       0.9240        0.2328  0.1049
      8        0.3115       0.9287        0.2187  0.1066
      9        0.3117       0.9300        0.2087  0.0959
     10        0.2983       0.9320        0.2102  0.0977
Uploaded file to https://huggingface.co/sawradip/demo-skorch/blob/main/weights.pt
Uploaded file to https://huggingface.co/sawradip/demo-skorch/blob/main/skorch-model.pkl
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (nonlin): ReLU()
    (dense0): Linear(in_features=20, out_features=30, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=30, out_features=30, bias=True)
    (output): Linear(in_features=30, out_features=2, bias=True)
    (softmax): Softmax(dim=-1)
  ),
)
As you can see, both the weights of the PyTorch module and the whole skorch model were saved to the Hub. Visit the printed URLs to see them there.
As a next step, think about adding a Model Card to your repository to provide further information about the model.
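One rough, hypothetical sketch of how that could look, using the same HfApi instance (the card content and metadata below are purely illustrative):

# hypothetical minimal model card; adjust the metadata and text to your model
model_card = """---
library_name: skorch
---

# demo-skorch

A toy skorch classifier trained on synthetic data from sklearn's make_classification.
"""

hf_api.upload_file(
    path_or_fileobj=model_card.encode('utf-8'),
    path_in_repo='README.md',
    repo_id=REPO_NAME,
    token=TOKEN,
)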
Right now, we use TrainEndCheckpoint, which uploads the model only once, at the end of training. Instead, we could use Checkpoint, which uploads the model each time the monitored metric improves. Note, however, that at the moment the upload is synchronous, i.e. we wait for it to finish. So if uploading the model takes a long time compared to training it, your training process could be slowed down considerably, depending on how often the model improves.

If you still decide to use Checkpoint, you might want to keep a version of each uploaded file instead of having the latest one overwrite the previous one. This is possible by choosing a templated model name, e.g. 'skorch-model-{}.pkl'. This way, the first upload creates the file 'skorch-model-0.pkl', the second one creates 'skorch-model-1.pkl', etc.
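Below is a minimal sketch of how this could look, reusing hf_api, REPO_NAME, and TOKEN from above; the variable names are made up for illustration:

from skorch.callbacks import Checkpoint

# templated file name: '{}' is replaced by an incrementing counter on each upload
versioned_pickle_storer = HfHubStorage(
    hf_api,
    path_in_repo='skorch-model-{}.pkl',
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
)

# uploads a new pickle each time the monitored metric (valid_loss by default) improves
checkpoint_on_improvement = Checkpoint(
    f_pickle=versioned_pickle_storer,
    f_params=None,
    f_optimizer=None,
    f_criterion=None,
    f_history=None,
)

This callback could then be passed to NeuralNetClassifier via callbacks=[checkpoint_on_improvement], just like the TrainEndCheckpoint above.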
import pickle
from huggingface_hub import hf_hub_download
from sklearn.metrics import accuracy_score
The skorch model is just a normal pickle file and can be loaded like this:
hub_pickle_storer.latest_url_
'https://huggingface.co/sawradip/demo-skorch/blob/main/skorch-model.pkl'
path = hf_hub_download(REPO_NAME, MODEL_NAME, token=TOKEN)
with open(path, 'rb') as f:
    net_loaded = pickle.load(f)
accuracy_score(y, net_loaded.predict(X))
0.9334
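Note that this accuracy is computed on the full dataset, which includes the training split. For a held-out estimate, you could score the loaded model on the test split instead:

# accuracy on the held-out test split only
accuracy_score(y_test, net_loaded.predict(X_test))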
The model weights are stored as a PyTorch state_dict.
hub_params_storer.latest_url_
'https://huggingface.co/sawradip/demo-skorch/blob/main/weights.pt'
path = hf_hub_download(REPO_NAME, WEIGHTS_NAME, token=TOKEN)
with open(path, 'rb') as f:
    weights_loaded = torch.load(f)
for key, val in weights_loaded.items():
    print(f"Parameter name '{key}' and shape {val.shape}")
Parameter name 'dense0.weight' and shape torch.Size([30, 20])
Parameter name 'dense0.bias' and shape torch.Size([30])
Parameter name 'dense1.weight' and shape torch.Size([30, 30])
Parameter name 'dense1.bias' and shape torch.Size([30])
Parameter name 'output.weight' and shape torch.Size([2, 30])
Parameter name 'output.bias' and shape torch.Size([2])
Typically, when you store the whole skorch model, you don't need to store the weights separately, as they are already part of the whole model:
for key, val in net_loaded.module_.state_dict().items():
    print(f"Parameter name '{key}' and shape {val.shape}")
Parameter name 'dense0.weight' and shape torch.Size([30, 20])
Parameter name 'dense0.bias' and shape torch.Size([30])
Parameter name 'dense1.weight' and shape torch.Size([30, 30])
Parameter name 'dense1.bias' and shape torch.Size([30])
Parameter name 'output.weight' and shape torch.Size([2, 30])
Parameter name 'output.bias' and shape torch.Size([2])
However, there can be situations where you don't need the whole skorch model, in which case you can store only the model weights.
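As a final illustration, here is a minimal sketch of using only the downloaded weights, assuming the ClassifierModule definition from above is available:

# rebuild the bare PyTorch module and load the downloaded state_dict into it
module = ClassifierModule()
module.load_state_dict(weights_loaded)
module.eval()  # disable dropout for inference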