The objective of this notebook is to show that the size of the pkl file created by `learn.export()` in fastai v2 differs depending on the batch size used during training. This is odd, no?
#hide
from utils import *
from fastai2.vision.widgets import *
# !kaggle competitions download -c dogs-vs-cats-redux-kernels-edition
Downloading dogs-vs-cats-redux-kernels-edition.zip to /mnt/home/pierre/course-v4/nbs
100%|████████████████████████████████████████| 814M/814M [02:12<00:00, 6.43MB/s]
#!mv dogs-vs-cats-redux-kernels-edition.zip data/
#%cd data
#!unzip dogs-vs-cats-redux-kernels-edition.zip
#!unzip train.zip
#!unzip test.zip
#!mkdir dogs-vs-cats
#!mv train/ train.zip test/ test.zip sample_submission.csv dogs-vs-cats/
Archive:  dogs-vs-cats-redux-kernels-edition.zip
  inflating: sample_submission.csv
  inflating: test.zip
  inflating: train.zip
total 1667592
-rw-rw-rw- 1 pierre pierre 853083403 May  6 12:54 dogs-vs-cats-redux-kernels-edition.zip
drwxrwxrwx 3 pierre pierre      4096 May  6 12:54 flowers
-rw-rw-r-- 1 pierre pierre    113903 Dec 11 04:18 sample_submission.csv
-rw-rw-r-- 1 pierre pierre 284478493 Dec 11 04:18 test.zip
-rw-rw-r-- 1 pierre pierre 569918665 Dec 11 04:18 train.zip
%pwd
'/mnt/home/pierre/course-v4/nbs'
path_data = Path('data/dogs-vs-cats/')
fns = get_image_files(path_data/'train')
fns_test = get_image_files(path_data/'test')
fns, len(fns)
((#25000) [Path('data/dogs-vs-cats/train/cat.1000.jpg'),Path('data/dogs-vs-cats/train/cat.10046.jpg'),Path('data/dogs-vs-cats/train/cat.10140.jpg'),Path('data/dogs-vs-cats/train/cat.10155.jpg'),Path('data/dogs-vs-cats/train/cat.10158.jpg'),Path('data/dogs-vs-cats/train/cat.10635.jpg'),Path('data/dogs-vs-cats/train/cat.10705.jpg'),Path('data/dogs-vs-cats/train/cat.10847.jpg'),Path('data/dogs-vs-cats/train/cat.10880.jpg'),Path('data/dogs-vs-cats/train/cat.10983.jpg')...], 25000)
dest = fns[0]
im = Image.open(dest)
print(dest)
im.to_thumb(128,128)
data/dogs-vs-cats/train/cat.1000.jpg
failed = verify_images(fns)
failed
(#0) []
# failed.map(Path.unlink);
path_data.ls()
(#2) [Path('data/dogs-vs-cats/test'),Path('data/dogs-vs-cats/train')]
fname = fns[0]
fname.name, re.findall(r'(.+)\.\d+\.jpg$', fname.name)
('cat.1000.jpg', ['cat'])
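The labelling pattern (with the extension dot escaped) can be sanity-checked on a few synthetic file names with plain `re`, no fastai needed:

```python
import re

# Same pattern used for labelling: the label is everything before the
# numeric index and the .jpg extension.
pat = re.compile(r'(.+)\.\d+\.jpg$')

for name in ['cat.1000.jpg', 'dog.12499.jpg']:
    print(pat.findall(name))  # ['cat'], then ['dog']
```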
dblock = DataBlock(
blocks=(ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=using_attr(RegexLabeller(r'(.+)\.\d+\.jpg$'), 'name'),
item_tfms=RandomResizedCrop(460, min_scale=0.5),
batch_tfms= aug_transforms(size=224, min_scale=0.75))
dls_512 = dblock.dataloaders(path_data/'train', bs=512)
dls_256 = dblock.dataloaders(path_data/'train', bs=256)
dls_128 = dblock.dataloaders(path_data/'train', bs=128)
dls_64 = dblock.dataloaders(path_data/'train', bs=64)
dls_32 = dblock.dataloaders(path_data/'train', bs=32)
dls_16 = dblock.dataloaders(path_data/'train', bs=16)
dls_8 = dblock.dataloaders(path_data/'train', bs=8)
/mnt/home/pierre/.conda/envs/fastai2/lib/python3.7/site-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
dls_512.train.show_batch(max_n=8, nrows=2, unique=True)
dls_512.valid.show_batch(max_n=4, nrows=1)
learn = cnn_learner(dls_512, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.133391 | 0.040558 | 0.012000 | 00:41 |
learn.export(fname='export_512.pkl')
learn = cnn_learner(dls_256, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.111129 | 0.040826 | 0.013200 | 00:35 |
learn.export(fname='export_256.pkl')
learn = cnn_learner(dls_128, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.093232 | 0.036539 | 0.012000 | 00:35 |
learn.export(fname='export_128.pkl')
learn = cnn_learner(dls_64, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.094818 | 0.033555 | 0.012600 | 00:37 |
learn.export(fname='export_64.pkl')
learn = cnn_learner(dls_32, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.080218 | 0.033993 | 0.012200 | 00:42 |
learn.export(fname='export_32.pkl')
learn = cnn_learner(dls_16, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.135674 | 0.044286 | 0.013800 | 01:05 |
learn.export(fname='export_16.pkl')
learn = cnn_learner(dls_8, resnet18, metrics=error_rate)
learn.fit(n_epoch=1)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.183321 | 0.057115 | 0.018600 | 02:05 |
learn.export(fname='export_8.pkl')
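As an aside, a pickled object's size can also be measured in memory with `len(pickle.dumps(...))`, without writing to disk or parsing `ls` output. A stdlib-only sketch with a hypothetical stand-in object:

```python
import pickle

# `obj` is a hypothetical stand-in for anything picklable; its serialized
# size is simply the length of the dumps() byte string.
obj = {'weights': list(range(1000))}
print(f'{len(pickle.dumps(obj))} bytes')
```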
files = !ls -lh *.pkl
# Pull the file name and the human-readable size (in MB) out of each `ls -lh` line
fnames = [re.findall(r'.+(export[_\d]+\.pkl)$', f)[0] for f in files]
fsizes = [re.findall(r'.+pierre +(\d+)M.+$', f)[0] for f in files]
fdict = dict(zip(fnames, fsizes))
# Sort the files by size, ascending
fdict_sorted = dict()
for k, v in sorted(fdict.items(), key=lambda item: int(item[1])):
    fdict_sorted[k] = v + 'M'
pd.DataFrame.from_dict(fdict_sorted, orient='index', columns=['size'])
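An alternative that avoids the username-dependent regex: read the sizes directly from the filesystem with `pathlib` (a sketch, assuming the pkl files sit in the working directory):

```python
from pathlib import Path

# Collect each export file's size in MiB straight from the filesystem,
# sorted ascending; no parsing of `ls` output needed.
sizes = sorted(
    ((p.name, p.stat().st_size / 2**20) for p in Path('.').glob('export_*.pkl')),
    key=lambda t: t[1],
)
for name, mib in sizes:
    print(f'{name}: {mib:.0f}M')
```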
file | size |
---|---|
export_128.pkl | 52M |
export_16.pkl | 52M |
export_32.pkl | 52M |
export_64.pkl | 52M |
export_8.pkl | 52M |
export_256.pkl | 125M |
export_512.pkl | 272M |
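Why would the export size track the batch size? A plausible explanation (an assumption, not verified here) is that the `Learner` still references the last batch it processed (attributes such as `learn.xb`/`learn.yb`), and those tensors get pickled along with the weights; a single 512-image float32 batch of 3×224×224 images is about 294 MiB, the same order as the 272M file above. The mechanism can be mimicked with plain `pickle` and NumPy:

```python
import pickle

import numpy as np

def export_size(bs):
    # Hypothetical stand-in for an exported learner that accidentally keeps
    # a reference to its last batch: the payload grows with batch size while
    # the "weights" stay fixed.
    fake_learner = {
        'weights': np.zeros(1_000_000, dtype=np.float32),            # fixed-size "model"
        'last_batch': np.zeros((bs, 3, 224, 224), dtype=np.float32), # scales with bs
    }
    return len(pickle.dumps(fake_learner))

for bs in (8, 128, 512):
    print(f'bs={bs:3d}: {export_size(bs) / 2**20:6.1f} MiB')
```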