The objective of this notebook is to explain how fastai2 maps the lr_max argument of fit_one_cycle() to the Learning Rates of the parameter groups, and how to control those groups with a custom splitter function.
Author: Pierre Guillou on May 25, 2020.
If you want to change the fastai2 default configuration (3 parameter groups), you need to define a splitter function and pass it to the Learner.
Example
def splitter(m):
    groups = [group for group in m.children()]  # one group per top-level child module
    groups = L(groups)
    return groups.map(params)                   # map each group to its list of parameters
learn = Learner(dls, my_model, splitter=splitter, metrics=error_rate)
(source) splitter is a function that takes self.model and returns a list of parameter groups (or just one parameter group if there are no different parameter groups). The default is trainable_params, which returns all trainable parameters of the model.
def trainable_params(m):
    "Return all trainable parameters of `m`"
    return [p for p in m.parameters() if p.requires_grad]
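For instance, here is a quick sketch with a hypothetical toy model (any nn.Module works the same way):
import torch.nn as nn
# hypothetical toy model, just to illustrate trainable_params
toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
# one flat list of parameters = one single parameter group
print(len(trainable_params(toy)))  # 4 tensors: 2 weights + 2 biases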
There are 4 possibilities. For example, with 3 parameter groups, an unfrozen Learner and learn.fit_one_cycle() give the following Learning Rates:
1. if lr_max = 1e-3 -> [0.001, 0.001, 0.001]
2. if lr_max = slice(1e-3) -> [0.0001, 0.0001, 0.001]
3. if lr_max = slice(1e-5, 1e-3) -> array([1.e-05, 1.e-04, 1.e-03]) # LRs evenly geometrically spaced
4. if lr_max = [1e-5, 1e-4, 1e-3] -> array([1.e-05, 1.e-04, 1.e-03]) # LRs taken as given (evenly spaced or not)
Note that points 3 and 4 are not equivalent when there are more than 3 parameter groups! The mapping can be reproduced in a few lines, as shown in the sketch below.
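Here is a minimal sketch (not the fastai2 source) of how lr_max becomes one Learning Rate per parameter group; it mimics what fastai2's Optimizer.set_hyper does:
import numpy as np

# Sketch of the lr_max -> per-group LRs mapping (illustration only)
def expand_lr_max(lr_max, n_groups):
    if isinstance(lr_max, slice):
        if lr_max.start is not None:
            # slice(start, stop): LRs evenly geometrically spaced
            return np.geomspace(lr_max.start, lr_max.stop, n_groups)
        # slice(stop): stop/10 for every group except the last one
        return np.array([lr_max.stop/10]*(n_groups-1) + [lr_max.stop])
    if isinstance(lr_max, (list, tuple)):
        # explicit list: must contain exactly one LR per group
        assert len(lr_max) == n_groups
        return np.array(lr_max)
    return np.array([lr_max]*n_groups)  # single float: same LR everywhere

print(expand_lr_max(1e-3, 3))                # [0.001 0.001 0.001]
print(expand_lr_max(slice(1e-3), 3))         # [0.0001 0.0001 0.001]
print(expand_lr_max(slice(1e-5, 1e-3), 3))   # [1.e-05 1.e-04 1.e-03]
print(expand_lr_max([1e-5, 1e-4, 1e-3], 3))  # [1.e-05 1.e-04 1.e-03]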
Check the following graph to understand what passing a slice with 2 values as lr_max means (point 3).
from IPython.display import Image
Image("images/lrs.png")
#hide
from utils import *
print(f'cuda device: {torch.cuda.current_device()}')
print(f'cuda device name: {torch.cuda.get_device_name(0)}')
cuda device: 0
cuda device name: Tesla V100-PCIE-32GB
import warnings
warnings.filterwarnings('ignore')
# 1cycle schedule parameters (div, div_final and pct_start are the fit_one_cycle() defaults)
lr_max = 1e-2
div = 25.
div_final = 1e5
pct_start = 0.25
# combined_cos builds the warmup/annealing curve of the 1cycle policy
p = torch.linspace(0., 1, 100)
f = combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final)
plt.plot(p, [f(o) for o in p]);
# last values of cosine annealing for the 1cycle policy for lr_max = 1e-2
print(p[-2],f(p[-2]))
print(p[-1],f(p[-1]))
tensor(0.9899) 4.574851048952042e-06
tensor(1.) 9.999999999940612e-08
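As a sanity check (a sketch added here), the schedule should indeed end at lr_max/div_final:
# Sanity check (sketch): the 1cycle schedule anneals down to lr_max/div_final
print(lr_max / div_final)  # 1e-07, which matches f(p[-1]) above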
from fastai2.vision.all import *
path = untar_data(URLs.PETS)
#hide
Path.BASE_PATH = path
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")
dls.show_batch(nrows=1, ncols=3)
dls.vocab
(#37) ['Abyssinian','Bengal','Birman','Bombay','British_Shorthair','Egyptian_Mau','Maine_Coon','Persian','Ragdoll','Russian_Blue'...]
Change the output layer of the pretrained resnet18 from 1000 classes to 37.
# model resnet18
m = resnet18(pretrained=True)
# last layer of m
# list(m.children())[-1]
m.fc
Linear(in_features=512, out_features=1000, bias=True)
# number of classes of my dls
num_classes = len(dls.vocab)
num_classes
37
# HEAD of the model: replace the last linear layer (1000 classes) with one of 37 classes
# source: https://discuss.pytorch.org/t/resnet-last-layer-modification/33530/2
my_model = m
my_model.fc = nn.Linear(512, num_classes)
# last layer of m
list(my_model.children())[-1]
Linear(in_features=512, out_features=37, bias=True)
# number of layer groups of my_model
len(list(my_model.children()))
10
Therefore, if we don't pass a splitter function to our Learner, a single Learning Rate will be applied to all the layers of my_model. As there are 10 layer groups in my_model, let's create a splitter function that distributes the parameters of my_model into 10 parameter groups.
def splitter(m):
    groups = [group for group in m.children()]  # one group per top-level child module
    groups = L(groups)
    return groups.map(params)                   # map each group to its list of parameters
# number of parameter groups of my_model
len(splitter(my_model))
10
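Before passing it to the Learner, we can peek at what the splitter returns (a quick sketch):
# Sketch: size of each of the 10 parameter groups built by our splitter
for i, g in enumerate(splitter(my_model)):
    print(i, len(g), 'parameter tensors')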
Without a splitter, there is 1 parameter group by default and therefore 1 Learning Rate (1e-3 by default).
learn = Learner(dls, my_model, metrics=error_rate)
# Check the number of parameter groups...
learn.create_opt()
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (before their update by the Optimizer in fit_one_cycle())
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 1
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
With our splitter, there are 10 parameter groups and, automatically, the same Learning Rate (1e-3 by default) for each group.
learn = Learner(dls, my_model, splitter=splitter, metrics=error_rate)
# Check the number of parameter groups...
learn.create_opt()
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (before their update by the Optimizer in fit_one_cycle())
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
learn = Learner(dls, my_model, splitter=splitter, metrics=error_rate)
learn.freeze()
learn.summary()
ResNet (Input shape: ['64 x 3 x 224 x 224'])
================================================================
Layer (type)        Output Shape          Param #    Trainable
================================================================
Conv2d              64 x 64 x 112 x 112   9,408      False
BatchNorm2d         64 x 64 x 112 x 112   128        True
... (layer-by-layer table elided: every Conv2d layer is frozen, every BatchNorm2d layer stays trainable) ...
AdaptiveAvgPool2d   64 x 512 x 1 x 1      0          False
Linear              64 x 37               18,981     True
________________________________________________________________
Total params: 11,195,493
Total trainable params: 28,581
Total non-trainable params: 11,166,912
Optimizer used: <function Adam at 0x7f17e9b13320>
Loss function: FlattenedLoss of CrossEntropyLoss()
Model frozen up to parameter group number 9
Callbacks:
  - TrainEvalCallback
  - Recorder
  - ProgressCallback
List of Learning Rates before their update by the Optimizer in fit_one_cycle()
# Check the number of parameter groups...
learn.create_opt()
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (before their update by the Optimizer in fit_one_cycle())
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}
learn.lr_find()
SuggestedLRs(lr_min=0.00831763744354248, lr_steep=0.010964781977236271)
learn.fit_one_cycle(1, lr_max=1e-2)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.704656 | 0.333663 | 0.115020 | 00:13 |
learn.save('my_resnet18_finetuned')
List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = 1e-2
# Check the number of parameter groups...
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = 1e-2)
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}
Conclusion
We can verify that the Learning Rates of all parameter groups are identical, but only the last parameter group has been updated, as the Learner was frozen.
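We can make the freezing visible with a small check (a sketch; learn.opt.param_lists is the optimizer's list of parameter groups, as seen in the fastai2 source):
# Sketch: count trainable parameter tensors per group after freeze().
# fastai keeps BatchNorm parameters trainable by default (train_bn=True),
# so the frozen groups are not entirely at zero.
for i, g in enumerate(learn.opt.param_lists):
    n_trainable = sum(p.requires_grad for p in g)
    print(i, f'{n_trainable}/{len(g)} parameter tensors trainable')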
Hypothesis: all parameter groups get the same (max) Learning Rate (lr_max = 1e-3).
learn.load('my_resnet18_finetuned')
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=1e-3)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.861377 | 3.378591 | 0.577808 | 00:16 |
1 | 0.615114 | 0.678442 | 0.223275 | 00:16 |
2 | 0.307324 | 0.321883 | 0.110284 | 00:16 |
List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = 1e-3
# Check the number of parameter groups...
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = 1e-3)
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
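A quick check (sketch) confirms it:
# Sketch: collect the final LRs; a single distinct value is expected
lrs = [h['lr'] for h in learn.opt.hypers]
print(len(set(lrs)))  # 1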
Conclusion
We can verify that the Learning Rates of all parameter groups are identical.
Hypothesis: lr_max = slice(1e-3), i.e. lr_max/10 for all parameter groups except the last one, which gets lr_max.
learn.load('my_resnet18_finetuned')
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-3))
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.346102 | 0.320928 | 0.106901 | 00:16 |
1 | 0.242630 | 0.249166 | 0.086604 | 00:16 |
2 | 0.151128 | 0.233383 | 0.087957 | 00:16 |
List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-3)
# Check the number of parameter groups...
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-3))
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
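A quick ratio check (sketch) makes the factor of 10 explicit:
# Sketch: last group's LR vs the others' -> factor of 10 (lr vs lr/10)
lrs = [h['lr'] for h in learn.opt.hypers]
print(lrs[-1] / lrs[0])  # ~10.0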
Conclusion
All parameter groups share the same Learning Rate except the last one, which gets a value 10 times higher (lr_max for the last group, lr_max/10 for the others).
Hypothesis: lr_max = slice(1e-5, 1e-3), i.e. Learning Rates evenly geometrically spaced between the 2 values.
learn.load('my_resnet18_finetuned')
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-3/100,1e-3))
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.435387 | 0.542696 | 0.178620 | 00:17 |
1 | 0.306988 | 0.319851 | 0.106225 | 00:17 |
2 | 0.151053 | 0.234533 | 0.075778 | 00:16 |
List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-5,1e-3)
# Check the number of parameter groups...
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-5,1e-3))
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103818059e-10, 'mom': 0.9499942417973141, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.1273265478161222e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.880494020012536e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 3.1368530849825837e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.232586316170941e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 8.7284800449677e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.4559982251924255e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 2.428751421610113e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 4.051401551101539e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
# collect the last LR value of each parameter group
lr_list = list()
for h in learn.opt.hypers:
    lr_list.append(h['lr'])
# linear interpolation between the first and the last LRs, for comparison
lr_first = lr_list[0]
lr_last = lr_list[-1]
inter = (lr_last - lr_first) / (len(learn.opt.param_groups) - 1)
lr_list_calculated = [lr_first + i*inter for i in range(len(learn.opt.param_groups))]
# plot the actual last LRs against the linearly spaced ones
fig, ax = plt.subplots()
p = np.linspace(0, 9, 10)
ax.plot(p, lr_list, label='last lr values')
ax.plot(p, lr_list_calculated, label='calculated last lr values')
leg = ax.legend();
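A numeric check (sketch) makes the geometric spacing explicit: consecutive LRs share a constant ratio, not a constant difference.
# Sketch: constant ratio between consecutive LRs -> geometric spacing
ratios = [lr_list[i+1] / lr_list[i] for i in range(len(lr_list) - 1)]
print(ratios)  # all ~1.668, i.e. (1e-3/1e-5) ** (1/9)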
Conclusion
The Learning Rates of the parameter groups are evenly geometrically spaced between the 2 values of the slice: the actual last values move away from the straight line of the linear interpolation.
Hypothesis: each parameter group gets its own Learning Rate, from the smallest to the highest value given in a list.
learn.load('my_resnet18_finetuned')
learn.unfreeze()
lr_lastlayer = 1e-3
lr_firstlayer = lr_lastlayer / 100
inter = (lr_lastlayer - lr_firstlayer) / (len(learn.opt.param_groups) - 1) # 9 intervals
lr_max = [lr_firstlayer + i*inter for i in range(len(learn.opt.param_groups) - 1)]
lr_max.append(lr_lastlayer)
len(lr_max), lr_max
(10, [1e-05, 0.00012, 0.00023, 0.00034, 0.00045000000000000004, 0.0005600000000000001, 0.00067, 0.0007800000000000001, 0.0008900000000000001, 0.001])
learn.fit_one_cycle(3, lr_max=lr_max)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.688318 | 1.270768 | 0.347091 | 00:16 |
1 | 0.501638 | 0.422770 | 0.140731 | 00:16 |
2 | 0.239655 | 0.272979 | 0.088633 | 00:16 |
List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = [lr1,lr2,lr3,lr4,lr5,lr6,lr7,lr8,lr9,lr10]
# Check the number of parameter groups...
print(f'number of parameters groups: {len(learn.opt.param_groups)}')
# ... and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = [lr1,lr2,lr3,lr4,lr5,lr6,lr7,lr8,lr9,lr10])
for i, h in enumerate(learn.opt.hypers):
    print(i, h)
number of parameters groups: 10
0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103818059e-10, 'mom': 0.9499942417973141, 'eps': 1e-05}
1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 8.10977412458167e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}
2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.554373373877137e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 2.297769335300173e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 3.041165296717788e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 3.7845612581299815e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 4.527957219563859e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.271353180976053e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.014749142399089e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}
# collect the last LR value of each parameter group
lr_list = list()
for h in learn.opt.hypers:
    lr_list.append(h['lr'])
# linear interpolation between the first and the last LRs, for comparison
lr_first = lr_list[0]
lr_last = lr_list[-1]
inter = (lr_last - lr_first) / (len(learn.opt.param_groups) - 1)
lr_list_calculated = [lr_first + i*inter for i in range(len(learn.opt.param_groups))]
# plot the actual last LRs against the linearly spaced ones
fig, ax = plt.subplots()
p = np.linspace(0, 9, 10)
ax.plot(p, lr_list, 'o--', label='last lr values')
ax.plot(p, lr_list_calculated, label='calculated last lr values')
leg = ax.legend();
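The same check (sketch) on the differences confirms the linear spacing this time:
# Sketch: constant difference between consecutive LRs -> linear spacing,
# matching the evenly spaced list we passed as lr_max
diffs = [lr_list[i+1] - lr_list[i] for i in range(len(lr_list) - 1)]
print(diffs)  # all ~7.43e-09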
Conclusion
Each parameter group got exactly the Learning Rate given in the list: the final values are evenly linearly spaced and the 2 curves overlap.
learn.load('my_resnet18_finetuned')
learn.unfreeze()
lr_lastlayer = 1e-3
lr_firstlayer = lr_lastlayer / 100
inter = (lr_lastlayer - lr_firstlayer) / ( len(learn.opt.param_groups) - 2 ) # 8 intervals
lr_max = [lr_firstlayer + i*inter for i in range(len(learn.opt.param_groups) - 2)]
lr_max.append(lr_lastlayer)
len(lr_max), lr_max
(9, [1e-05, 0.00013375, 0.0002575, 0.00038125, 0.000505, 0.00062875, 0.0007525, 0.0008762500000000001, 0.001])
learn.fit_one_cycle(3, lr_max=lr_max)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-45-e448c5702398> in <module>
----> 1 learn.fit_one_cycle(3, lr_max=lr_max)

~/.conda/envs/fastai2/lib/python3.7/site-packages/fastcore/utils.py in _f(*args, **kwargs)
    429         init_args.update(log)
    430         setattr(inst, 'init_args', init_args)
--> 431         return inst if to_return else f(*args, **kwargs)
    432     return _f
    433

~/.conda/envs/fastai2/lib/python3.7/site-packages/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    107     "Fit `self.model` for `n_epoch` using the 1cycle policy."
    108     if self.opt is None: self.create_opt()
--> 109     self.opt.set_hyper('lr', self.lr if lr_max is None else lr_max)
    110     lr_max = np.array([h['lr'] for h in self.opt.hypers])
    111     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),

~/.conda/envs/fastai2/lib/python3.7/site-packages/fastai2/optimizer.py in set_hyper(self, k, v)
     42         v = L(v, use_list=None)
     43         if len(v)==1: v = v*len(self.param_lists)
---> 44         assert len(v) == len(self.hypers), f"Trying to set {len(v)} values for {k} but there are {len(self.param_lists)} parameter groups."
     45         self._set_hyper(k, v)
     46

AssertionError: Trying to set 9 values for lr but there are 10 parameter groups.
AssertionError explanation: we cannot pass a list of 9 Learning Rates because there are 10 parameter groups.
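A minimal fix (sketch, reusing the variables defined above): build the list over 9 intervals so that it contains exactly one value per parameter group:
# Sketch of the fix: one LR per parameter group (10 values, 9 intervals)
n_groups = len(learn.opt.param_groups)  # 10
inter = (lr_lastlayer - lr_firstlayer) / (n_groups - 1)
lr_max = [lr_firstlayer + i * inter for i in range(n_groups)]
assert len(lr_max) == n_groups  # fit_one_cycle(3, lr_max=lr_max) now works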