In this notebook, we show how transfer learning can be applied to detecting the sentiment of Amazon reviews, i.e. classifying them as positive or negative.
This notebook builds on ULMFiT, the work of Howard and Ruder. The idea of the paper (and of its implementation, explained in the fast.ai deep learning course) is to first train a language model on a very large corpus, e.g. a Wikipedia dump. The intuition is that a model able to predict the next word at each position of a text must have learnt something about the structure of the language.
Word2vec and similar methods have led to large improvements on many NLP tasks. They can be seen as a first step towards transfer learning, in which only the embedding layer is transferred, in the form of pre-trained word vectors. The ambition of ULMFiT (and of others such as ELMo or the recently introduced Transformer language model) is to bring NLP to where Computer Vision got thanks to the ImageNet challenge: today it is easy to download a model pre-trained on a massive dataset of images, remove the last layer and replace it with a classifier or a regressor, depending on the task.
With ULMFiT, the goal is for everyone to be able to take a pre-trained language model and use it as a backbone, on top of which we add a classifier or a regressor. The game-changing aspect of transfer learning is that we are much less constrained by the amount of labelled training data: with only a fraction of the data that was previously necessary, we can train a classifier or regressor and still get very good results.
Given that labelled text data is hard to get, whereas unlabelled text is almost unlimited, transfer learning is likely to radically change the field of NLP and help it reach a maturity closer to that of computer vision.
The language model used in ULMFiT is the AWD-LSTM language model by Merity et al.
While we use this language model for this experiment, we keep an eye on a recently proposed character-level language model producing Contextual String Embeddings, by Akbik et al.
This notebook illustrates the power of ULMFiT on a dataset of Amazon reviews available on Kaggle at https://www.kaggle.com/bittlingmayer/amazonreviews/home. We reuse code from the excellent fast.ai course and apply it to a different dataset. The original code is available at https://github.com/fastai/fastai/tree/master/courses/dl2
The data consists of 4M reviews, each either positive or negative. Training a FastText classifier on it gives an F1 score of 0.916. We show that, using only a fraction of this dataset, we can reach similar or even better results.
We encourage you to try it on your own tasks! Note that if you are interested in regression rather than classification, you can also do it by following this advice.
The notebook is organized as follows:
We then have the backbone of our algorithm: a pre-trained language model fine-tuned on Amazon reviews.
We end this notebook by looking at the specific effect of the training size on overall performance, to test the hypothesis that ULMFiT does not need much labelled data to perform well.
Before starting, download the data from https://www.kaggle.com/bittlingmayer/amazonreviews, extract the files into an Amazon folder located wherever you like, and use that folder's path in this notebook.
We also recommend working in a dedicated environment (e.g. mkvirtualenv fastai). Then clone the fastai GitHub repo https://github.com/fastai/fastai and install its requirements. Note that the code below uses the pre-1.0 fastai API (fastai 0.7).
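from fastai.text import *  # pre-1.0 fastai text API (Tokenizer, LanguageModelLoader, RNN_Learner, ...)
import collections
import html
import os
import pandas as pd
import pickle
import random
import re
from collections import Counter
from functools import partial
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, \
    confusion_matrix
from sklearn.model_selection import train_test_split
from time import time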
path = '/your/path/to/folder/Amazon'
train = []
with open(os.path.join(path, 'train.ft.txt'), 'r') as file:
    for line in file:
        # Append the current line: calling file.readline() here would skip every other review
        train.append(line)
test = []
with open(os.path.join(path, 'test.ft.txt'), 'r') as file:
    for line in file:
        test.append(line)
print(f'The train data contains {len(train)} examples')
print(f'The test data contains {len(test)} examples')
BOS = 'xbos' # beginning-of-sentence tag
FLD = 'xfld' # data field tag
PATH=Path('/your/path/to/folder/Amazon')
CLAS_PATH=PATH/'amazon_class'
CLAS_PATH.mkdir(exist_ok=True)
LM_PATH=PATH/'amazon_lm'
LM_PATH.mkdir(exist_ok=True)
# Each item is '__label__1/2' and then the review so we split to get texts and labels
trn_texts,trn_labels = [text[10:] for text in train], [text[:10] for text in train]
trn_labels = [0 if label == '__label__1' else 1 for label in trn_labels]
val_texts,val_labels = [text[10:] for text in test], [text[:10] for text in test]
val_labels = [0 if label == '__label__1' else 1 for label in val_labels]
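The slicing above relies on every line starting with a 10-character label, and it keeps the space that follows the label as well as the trailing newline. As a minimal sketch, a slightly more explicit alternative splits each line on its first space. `parse_fasttext_line` is a hypothetical helper written for illustration, not part of fastai:

```python
def parse_fasttext_line(line):
    """Split a fastText-format line such as '__label__2 Great product' into (label, text).

    Hypothetical helper: maps '__label__1' to 0 (negative) and '__label__2' to 1 (positive),
    the same mapping as in the cell above.
    """
    prefix, text = line.rstrip('\n').split(' ', 1)
    if not prefix.startswith('__label__'):
        raise ValueError(f'unexpected label prefix: {prefix!r}')
    return (0 if prefix == '__label__1' else 1), text


label, text = parse_fasttext_line('__label__2 Great product, works as advertised.\n')
print(label, text)  # 1 Great product, works as advertised.
```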
# Following fast.ai recommendations we put our data in pandas dataframes
col_names = ['labels','text']
df_trn = pd.DataFrame({'text':trn_texts, 'labels':trn_labels}, columns=col_names)
df_val = pd.DataFrame({'text':val_texts, 'labels':val_labels}, columns=col_names)
df_trn.head(10)
df_trn.to_csv(CLAS_PATH/'train.csv', header=False, index=False)
df_val.to_csv(CLAS_PATH/'test.csv', header=False, index=False)
CLASSES = ['neg', 'pos']
(CLAS_PATH/'classes.txt').open('w').writelines(f'{o}\n' for o in CLASSES)
# We're going to fine tune the language model so it's ok to take some of the test set in our train data
# for the lm fine-tuning
trn_texts,val_texts = train_test_split(np.concatenate([trn_texts,val_texts]), test_size=0.1)
df_trn = pd.DataFrame({'text':trn_texts, 'labels':[0]*len(trn_texts)}, columns=col_names)
df_val = pd.DataFrame({'text':val_texts, 'labels':[0]*len(val_texts)}, columns=col_names)
df_trn.to_csv(LM_PATH/'train.csv', header=False, index=False)
df_val.to_csv(LM_PATH/'test.csv', header=False, index=False)
# Here we use functions from the fast.ai course to get data
chunksize=24000
re1 = re.compile(r' +')
def fixup(x):
    x = x.replace('#39;', "'").replace('amp;', '&').replace('#146;', "'").replace(
        'nbsp;', ' ').replace('#36;', '$').replace('\\n', "\n").replace('quot;', "'").replace(
        '<br />', "\n").replace('\\"', '"').replace('<unk>','u_n').replace(' @.@ ','.').replace(
        ' @-@ ','-').replace('\\', ' \\ ')
    return re1.sub(' ', html.unescape(x))
def get_texts(df, n_lbls=1):
    # The first n_lbls columns hold the labels, the remaining columns hold text fields
    labels = df.iloc[:,range(n_lbls)].values.astype(np.int64)
    texts = f'\n{BOS} {FLD} 1 ' + df[n_lbls].astype(str)
    for i in range(n_lbls+1, len(df.columns)):
        texts += f' {FLD} {i-n_lbls} ' + df[i].astype(str)
    texts = list(texts.apply(fixup).values)
    # Tokenize in parallel, one partition per CPU core
    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
    return tok, list(labels)

def get_all(df, n_lbls):
    tok, labels = [], []
    for i, r in enumerate(df):
        print(i)  # index of the chunk being processed, as a progress indicator
        tok_, labels_ = get_texts(r, n_lbls)
        tok += tok_
        labels += labels_
    return tok, labels
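`get_texts` marks up every document before tokenization: a beginning-of-sentence tag (`xbos`), then each text field prefixed with a field tag (`xfld`) and its 1-based field number. The plain-Python sketch below reproduces only that string-building step for a single row, without the `fixup` clean-up or the tokenizer, so you can see what the model actually reads. `mark_fields` is a hypothetical name:

```python
BOS = 'xbos'  # beginning-of-sentence tag, as defined above
FLD = 'xfld'  # data field tag, as defined above


def mark_fields(fields):
    """Build the marked-up string that get_texts creates for one row of text fields."""
    text = f'\n{BOS} {FLD} 1 ' + fields[0]
    # Subsequent fields are numbered 2, 3, ... just like `i-n_lbls` in get_texts
    for number, field in enumerate(fields[1:], start=2):
        text += f' {FLD} {number} ' + field
    return text


print(repr(mark_fields(['Great product'])))  # '\nxbos xfld 1 Great product'
print(repr(mark_fields(['Title', 'Body'])))  # '\nxbos xfld 1 Title xfld 2 Body'
```

Our Amazon CSV files have a single text column, so every review only gets the `xfld 1` marker.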
df_trn = pd.read_csv(LM_PATH/'train.csv', header=None, chunksize=chunksize)
df_val = pd.read_csv(LM_PATH/'test.csv', header=None, chunksize=chunksize)
# This cell can take quite some time if your dataset is large
# Run it once and comment it for later use
tok_trn, trn_labels = get_all(df_trn, 1)
tok_val, val_labels = get_all(df_val, 1)
# Run this cell once and comment everything but the load statements for later use
(LM_PATH/'tmp').mkdir(exist_ok=True)
np.save(LM_PATH/'tmp'/'tok_trn.npy', tok_trn)
np.save(LM_PATH/'tmp'/'tok_val.npy', tok_val)
tok_trn = np.load(LM_PATH/'tmp'/'tok_trn.npy')
tok_val = np.load(LM_PATH/'tmp'/'tok_val.npy')
# Check the most common tokens
freq = Counter(p for o in tok_trn for p in o)
freq.most_common(25)
# Check the least common tokens
freq.most_common()[-25:]
# Build the vocabulary by keeping only the most common tokens, i.e. those appearing more than min_freq times,
# and cap its size. We follow here the 60k recommendation.
max_vocab = 60000
min_freq = 2
itos = [o for o,c in freq.most_common(max_vocab) if c>min_freq]
itos.insert(0, '_pad_')
itos.insert(0, '_unk_')
stoi = collections.defaultdict(lambda:0, {v:k for k,v in enumerate(itos)})
len(itos)
trn_lm = np.array([[stoi[o] for o in p] for p in tok_trn])
val_lm = np.array([[stoi[o] for o in p] for p in tok_val])
np.save(LM_PATH/'tmp'/'trn_ids.npy', trn_lm)
np.save(LM_PATH/'tmp'/'val_ids.npy', val_lm)
pickle.dump(itos, open(LM_PATH/'tmp'/'itos.pkl', 'wb'))
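For reference, here is the vocabulary construction above as a self-contained function on toy data. Note that `c>min_freq` keeps tokens seen strictly more than `min_freq` times (at least 3 times with `min_freq = 2`), and that the `defaultdict` maps every out-of-vocabulary token to index 0, i.e. `_unk_`. `build_vocab` is a hypothetical helper, not a fastai function:

```python
import collections


def build_vocab(token_lists, max_vocab=60000, min_freq=2):
    """Return (itos, stoi) like the cell above: frequent tokens plus '_unk_' (0) and '_pad_' (1)."""
    freq = collections.Counter(tok for toks in token_lists for tok in toks)
    # Keep at most max_vocab tokens, each seen strictly more than min_freq times
    itos = [tok for tok, c in freq.most_common(max_vocab) if c > min_freq]
    itos.insert(0, '_pad_')
    itos.insert(0, '_unk_')
    # Unknown tokens fall back to index 0, i.e. '_unk_'
    stoi = collections.defaultdict(lambda: 0, {tok: i for i, tok in enumerate(itos)})
    return itos, stoi


docs = [['the', 'movie', 'was', 'great'],
        ['the', 'plot', 'was', 'thin'],
        ['the', 'end', 'was', 'near']]
itos, stoi = build_vocab(docs, max_vocab=10, min_freq=2)
print(itos)  # ['_unk_', '_pad_', 'the', 'was']
print([stoi[t] for t in ['the', 'was', 'zebra']])  # [2, 3, 0]
```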
# Reload everything saved above (in later sessions, you only need to run this cell)
trn_lm = np.load(LM_PATH/'tmp'/'trn_ids.npy')
val_lm = np.load(LM_PATH/'tmp'/'val_ids.npy')
itos = pickle.load(open(LM_PATH/'tmp'/'itos.pkl', 'rb'))
vs=len(itos)
vs,len(trn_lm)
# Uncomment this cell to download the pre-trained model.
# It will be placed into the PATH that you defined earlier.
# ! wget -nH -r -np -P {PATH} http://files.fast.ai/models/wt103/
# Load the weights of the model
em_sz,nh,nl = 400,1150,3
PRE_PATH = PATH/'models'/'wt103'
PRE_LM_PATH = PRE_PATH/'fwd_wt103.h5'
wgts = torch.load(PRE_LM_PATH, map_location=lambda storage, loc: storage)
# Check the word embedding layer and keep a 'mean word' for unknown tokens
enc_wgts = to_np(wgts['0.encoder.weight'])
row_m = enc_wgts.mean(0)
enc_wgts.shape
(238462, 400)
# Load the vocabulary on which the pre-trained model was trained
# Define an embedding matrix with the vocabulary of our dataset
itos2 = pickle.load((PRE_PATH/'itos_wt103.pkl').open('rb'))
stoi2 = collections.defaultdict(lambda:-1, {v:k for k,v in enumerate(itos2)})
new_w = np.zeros((vs, em_sz), dtype=np.float32)
for i,w in enumerate(itos):
    r = stoi2[w]
    new_w[i] = enc_wgts[r] if r>=0 else row_m
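The loop above aligns the pre-trained WikiText-103 embeddings with our own vocabulary: a token known to the pre-trained model keeps its vector, and a token it has never seen starts from the mean vector `row_m`. Here is the same logic in plain Python on a toy vocabulary, to make the indexing explicit. `remap_embeddings` is a hypothetical helper, and real embedding rows are 400-dimensional NumPy arrays rather than short lists:

```python
def remap_embeddings(new_itos, pretrained_itos, pretrained_rows):
    """Return one embedding row per token of new_itos.

    Tokens known to the pre-trained vocabulary keep their pre-trained row;
    unknown tokens get the mean of all pre-trained rows (like row_m above).
    """
    n, dim = len(pretrained_rows), len(pretrained_rows[0])
    mean_row = [sum(row[j] for row in pretrained_rows) / n for j in range(dim)]
    lookup = {w: i for i, w in enumerate(pretrained_itos)}
    new_rows = []
    for w in new_itos:
        r = lookup.get(w, -1)  # -1 plays the role of stoi2's default value
        new_rows.append(list(pretrained_rows[r]) if r >= 0 else list(mean_row))
    return new_rows


pre_itos = ['_unk_', 'the', 'good']
pre_rows = [[0.0, 0.0], [3.0, 3.0], [6.0, 0.0]]
print(remap_embeddings(['the', 'feedly'], pre_itos, pre_rows))  # [[3.0, 3.0], [3.0, 1.0]]
```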
# Use the new embedding matrix for the pre-trained model
wgts['0.encoder.weight'] = T(new_w)
wgts['0.encoder_with_dropout.embed.weight'] = T(np.copy(new_w))
wgts['1.decoder.weight'] = T(np.copy(new_w))
# Define the learner object to do the fine-tuning
# Here we will freeze everything except the embedding layer, so that we can have a better
# embedding for unknown words than just the mean embedding on which we initialise it.
wd=1e-7
bptt=70
bs=52
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
trn_dl = LanguageModelLoader(np.concatenate(trn_lm), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)
drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])*0.7
learner= md.get_model(opt_fn, em_sz, nh, nl,
dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])
learner.metrics = [accuracy]
learner.freeze_to(-1)
learner.model.load_state_dict(wgts)
lr=1e-3
lrs = lr
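fastai's `LanguageModelLoader` turns the concatenated token ids into language-modelling batches. As far as we understand its behaviour, it splits the stream into `bs` contiguous sub-streams and walks through them `bptt` tokens at a time, the targets being the inputs shifted by one token: this is the next-word prediction objective that motivates ULMFiT. The sketch below illustrates the idea under simplifying assumptions: a fixed sequence length (fastai also randomizes `bptt` slightly) and batch-major Python lists rather than time-major tensors. `batchify` and `lm_batches` are hypothetical names:

```python
def batchify(ids, bs):
    """Split a flat list of token ids into bs contiguous streams, dropping leftover tokens."""
    n = len(ids) // bs  # tokens per stream
    return [ids[k * n:(k + 1) * n] for k in range(bs)]


def lm_batches(ids, bs, bptt):
    """Yield (inputs, targets) pairs; targets are the inputs shifted by one token."""
    streams = batchify(ids, bs)
    n = len(streams[0])
    for i in range(0, n - 1, bptt):
        seq_len = min(bptt, n - 1 - i)  # the last batch may be shorter
        inputs = [s[i:i + seq_len] for s in streams]
        targets = [s[i + 1:i + 1 + seq_len] for s in streams]
        yield inputs, targets


# Toy stream of 10 token ids standing in for np.concatenate(trn_lm)
for inputs, targets in lm_batches(list(range(10)), bs=2, bptt=3):
    print(inputs, targets)
# [[0, 1, 2], [5, 6, 7]] [[1, 2, 3], [6, 7, 8]]
# [[3], [8]] [[4], [9]]
```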
# Run one epoch of fine-tuning
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)
# Save the fine-tuned model and unfreeze everything to later fine-tune the whole model
learner.save('lm_last_ft')
learner.load('lm_last_ft')
learner.unfreeze()
learner.lr_find(start_lr=lrs/10, end_lr=lrs*10, linear=True)
learner.sched.plot()
# Run this if you want to thoroughly fine-tune the LM on the Amazon data (15 epochs)
# use_clr controls the shape of the cyclical (triangular) learning rate
learner.fit(lrs, 1, wds=wd, use_clr=(20,10), cycle_len=15)
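The `use_clr` argument switches on a cyclical learning-rate schedule. The ULMFiT paper describes it as a slanted triangular learning rate: a short linear warm-up from `lr_max / ratio` to `lr_max`, followed by a long linear decay back to `lr_max / ratio`. The sketch below implements the paper's formula in plain Python. Our reading that fastai 0.7's `use_clr=(div, cut_div)` corresponds to `ratio=div` and `cut_frac=1/cut_div` is an assumption, so check it against the fastai source before relying on it; `slanted_triangular_lr` is a hypothetical name:

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max, cut_frac=0.1, ratio=32):
    """Learning rate at iteration t (0-based) of a slanted triangular schedule.

    Follows the formula of the ULMFiT paper: linear increase during the first
    cut_frac of the iterations, then linear decrease. `ratio` is how much
    smaller the lowest learning rate is than lr_max.
    """
    # Guard against cut == 0 for very short schedules (not part of the paper's formula)
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio


# Illustrative settings: 100 iterations, peak 1e-3, warm-up over 10% of them, ratio 20
schedule = [slanted_triangular_lr(t, 100, 1e-3, cut_frac=0.1, ratio=20) for t in range(100)]
peak = schedule.index(max(schedule))
print(peak)  # 10
```

The short warm-up lets the model quickly reach a good region of the parameter space, and the long decay then refines it.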
# Save the backbone (the fine-tuned encoder) for the classification step
learner.save('lm1')
learner.save_encoder('lm1_enc')
learner.sched.plot_loss()
Now that we have spent some time fine-tuning the language model on our Amazon data, let's see whether we can easily classify these reviews. As before, some cells only need to be run once; afterwards, you can simply reload the saved arrays.
df_trn = pd.read_csv(CLAS_PATH/'train.csv', header=None, chunksize=chunksize)
df_val = pd.read_csv(CLAS_PATH/'test.csv', header=None, chunksize=chunksize)
tok_trn, trn_labels = get_all(df_trn, 1)
tok_val, val_labels = get_all(df_val, 1)
(CLAS_PATH/'tmp').mkdir(exist_ok=True)
np.save(CLAS_PATH/'tmp'/'tok_trn.npy', tok_trn)
np.save(CLAS_PATH/'tmp'/'tok_val.npy', tok_val)
np.save(CLAS_PATH/'tmp'/'trn_labels.npy', trn_labels)
np.save(CLAS_PATH/'tmp'/'val_labels.npy', val_labels)
tok_trn = np.load(CLAS_PATH/'tmp'/'tok_trn.npy')
tok_val = np.load(CLAS_PATH/'tmp'/'tok_val.npy')
itos = pickle.load((LM_PATH/'tmp'/'itos.pkl').open('rb'))
stoi = collections.defaultdict(lambda:0, {v:k for k,v in enumerate(itos)})
len(itos)
60002
trn_clas = np.array([[stoi[o] for o in p] for p in tok_trn])
val_clas = np.array([[stoi[o] for o in p] for p in tok_val])
np.save(CLAS_PATH/'tmp'/'trn_ids.npy', trn_clas)
np.save(CLAS_PATH/'tmp'/'val_ids.npy', val_clas)
In this part, we adopt an unusual train/test split. While it is common to train on a large dataset and test on a small one, here we want to test the hypothesis that the model can learn from few training examples. Hence we use less data for training than for testing.
# Here we randomly sample trn_size reviews for training and val_size reviews for validation
# The paper claims that it is possible to achieve very good results with few labelled examples
# So let's try with 100 examples for training and 5000 examples for validation.
# We encourage you to try different values to see the effect of data size on performance.
trn_size = 100
val_size = 5000
trn_clas = np.load(CLAS_PATH/'tmp'/'trn_ids.npy')
val_clas = np.load(CLAS_PATH/'tmp'/'val_ids.npy')
trn_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'trn_labels.npy'))
val_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'val_labels.npy'))
train = random.sample(list(zip(trn_clas, trn_labels)), trn_size)
trn_clas = np.array([item[0] for item in train])
trn_labels = np.array([item[1] for item in train])
del train
validation = random.sample(list(zip(val_clas, val_labels)), val_size)
val_clas = np.array([item[0] for item in validation])
val_labels = np.array([item[1] for item in validation])
del validation
bptt,em_sz,nh,nl = 70,400,1150,3
vs = len(itos)
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
bs = 48
min_lbl = trn_labels.min()
trn_labels -= min_lbl
val_labels -= min_lbl
c=int(trn_labels.max())+1
# Check that the datasets are well balanced, so that accuracy is a meaningful metric
# We'll also check other usual binary classification metrics (precision, recall, F1 score)
print(len(trn_labels[trn_labels == 1]) / len(trn_labels))
print(len(val_labels[val_labels == 1]) / len(val_labels))
trn_ds = TextDataset(trn_clas, trn_labels)
val_ds = TextDataset(val_clas, val_labels)
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))
trn_dl = DataLoader(trn_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=trn_samp)
val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
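Both data loaders group reviews of similar length into the same batch, which reduces the amount of padding. The validation set can simply be sorted by length (`SortSampler`), but for training some randomness is wanted, hence the "sortish" sampler. The following is a simplified sketch of the idea, not fastai's exact implementation: shuffle, split into large chunks, sort each chunk by length, cut into batches, then shuffle the batch order. `sortish_order` and its `chunk_mult` parameter are hypothetical:

```python
import random


def sortish_order(lengths, bs, chunk_mult=50, seed=None):
    """Return a permutation of range(len(lengths)) where each batch holds items of similar length."""
    rng = random.Random(seed)
    idxs = list(range(len(lengths)))
    rng.shuffle(idxs)
    # Sort within large chunks (longest first) so that neighbouring items have similar lengths
    chunk = bs * chunk_mult
    sorted_idxs = []
    for start in range(0, len(idxs), chunk):
        block = idxs[start:start + chunk]
        sorted_idxs.extend(sorted(block, key=lambda i: lengths[i], reverse=True))
    # Cut into batches and shuffle the batch order to keep some randomness
    batches = [sorted_idxs[s:s + bs] for s in range(0, len(sorted_idxs), bs)]
    rng.shuffle(batches)
    return [i for batch in batches for i in batch]


lengths = [5, 1, 9, 3, 7, 2]  # e.g. number of tokens per review
order = sortish_order(lengths, bs=2, seed=0)
print([lengths[i] for i in order])
```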
# We define the model: here it is a classifier on top of the RNN language model encoder
# We load the language model encoder that we fine-tuned before, and freeze everything
# but the last layer, so that only the classification layer is trained at first.
md = ModelData(PATH, trn_dl, val_dl)
dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])
m = get_rnn_classifier(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
layers=[em_sz*3, 50, c], drops=[dps[4], 0.1],
dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])
opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
learn = RNN_Learner(md, TextModel(to_gpu(m)), opt_fn=opt_fn)
learn.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
learn.clip=25.
learn.metrics = [accuracy]
lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
lrs=np.array([1e-4,1e-4,1e-4,1e-3,1e-2])
wd = 1e-7
wd = 0
learn.load_encoder('lm1_enc')
learn.freeze_to(-1)
learn.lr_find(lrs/1000)
learn.sched.plot()
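The first `lrs` array above implements discriminative fine-tuning: each layer group gets its own learning rate, and each earlier group's rate is divided by `lrm = 2.6`, the factor suggested in the ULMFiT paper (the notebook then overrides it with a hand-picked array). A minimal sketch of the general rule, with `discriminative_lrs` as a hypothetical helper:

```python
def discriminative_lrs(lr_last, n_groups, factor=2.6):
    """Learning rate per layer group, from the first (embeddings) to the last (classifier).

    The last group gets lr_last; each earlier group gets the next group's rate divided by factor.
    """
    return [lr_last / factor ** (n_groups - 1 - k) for k in range(n_groups)]


# Same values as np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr]) with lr=3e-3
lrs = discriminative_lrs(3e-3, 5)
for k, group_lr in enumerate(lrs):
    print(f'group {k}: {group_lr:.2e}')
```

The lower layers capture general features learnt during pre-training, so they are updated more gently than the task-specific classifier head.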
# Run one epoch on the classification layer
learn.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
# Save the trained model
learn.save('clas_0')
learn.load('clas_0')
# Gradually unfreeze another layer to train a bit more parameters than just the classifier layer
learn.freeze_to(-2)
learn.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
# Save the trained model
learn.save('clas_1')
learn.load('clas_1')
# Unfreeze everything and train for a few epochs on the whole set of parameters of the model
learn.unfreeze()
learn.fit(lrs, 1, wds=wd, cycle_len=14, use_clr=(32,10))
learn.sched.plot_loss()
# Save the model
learn.save('clas_2')
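The three training stages above follow ULMFiT's gradual unfreezing: first only the last layer group is trained, then the last two, then the whole network. The sketch below only illustrates which layer groups are updated at each stage, assuming (as we understand fastai 0.7's `freeze_to(n)`) that groups before index `n` are frozen and groups from `n` onwards are trained. The five groups mirror the five entries of `lrs`; `trainable_groups` is a hypothetical helper:

```python
def trainable_groups(n_groups, freeze_to=None):
    """Indices of the layer groups that are trained.

    Assumes the freeze_to(n) convention: groups before index n are frozen,
    groups from n onwards are trained; None stands for unfreeze().
    """
    groups = list(range(n_groups))
    return groups if freeze_to is None else groups[freeze_to:]


# Stages used above: freeze_to(-1), freeze_to(-2), then unfreeze()
for stage in [-1, -2, None]:
    print(stage, trainable_groups(5, stage))
# -1 [4]
# -2 [3, 4]
# None [0, 1, 2, 3, 4]
```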
Now, let's play with the model we've just trained!
m = get_rnn_classifier(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
                       layers=[em_sz*3, 50, c], drops=[dps[4], 0.1],
                       dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])
opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
learn = RNN_Learner(md, TextModel(to_gpu(m)), opt_fn=opt_fn)
learn.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
learn.clip=25.
learn.metrics = [accuracy]
lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
wd = 1e-7
wd = 0
learn.load_encoder('lm1_enc')
learn.load('clas_2')
def get_sentiment(input_str: str):
    # Predictions are done on batches of inputs.
    # We only have a single input, so wrap it in a list
    texts = [input_str]
    # Simple whitespace tokenization. Note that this does not reproduce the spaCy-based
    # fastai Tokenizer used for training (commented out below), so some tokens
    # (e.g. capitalized words or punctuation) may not match the vocabulary
    tok = [t.split() for t in texts]
    # tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
    # Turn each token into its integer id (unknown tokens map to 0, i.e. _unk_)
    encoded = [stoi[p] for p in tok[0]]
    # Shape (seq_len, 1): time-major input with a batch of size 1
    idx = np.array(encoded)[None]
    idx = np.transpose(idx)
    tensorIdx = VV(idx)
    m.eval()
    m.reset()
    p = m.forward(tensorIdx)
    # p[0] holds the class scores; return the most likely class of our single example
    return np.argmax(p[0][0].data.cpu().numpy())

def prediction(texts):
    """Do the prediction on a list of already-encoded texts (lists of token ids)
    """
    y = []
    for i, text in enumerate(texts):
        if i % 1000 == 0:
            print(i)
        encoded = text
        idx = np.array(encoded)[None]
        idx = np.transpose(idx)
        tensorIdx = VV(idx)
        m.eval()
        m.reset()
        p = m.forward(tensorIdx)
        y.append(np.argmax(p[0][0].data.cpu().numpy()))
    return y
sentence = "I like Feedly"
start = time()
print(get_sentiment(sentence))
print(time() - start)
y = prediction(list(val_clas))
# Show relevant metrics for binary classification
# We encourage you to try training the classifier with different data sizes and see the effect on performance
# scikit-learn metrics expect (y_true, y_pred), in that order
print(f'Accuracy --> {accuracy_score(val_labels, y)}')
print(f'Precision --> {precision_score(val_labels, y)}')
print(f'F1 score --> {f1_score(val_labels, y)}')
print(f'Recall score --> {recall_score(val_labels, y)}')
print(confusion_matrix(val_labels, y))
print(classification_report(val_labels, y))
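scikit-learn's metric functions take `(y_true, y_pred)` in that order. Accuracy and F1 are symmetric in their arguments, but precision and recall swap roles if the arguments are reversed, and the confusion matrix is transposed. This small pure-Python sketch (`binary_metrics` is a hypothetical helper, not the scikit-learn implementation) makes that visible:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0]  # three positive reviews, one negative
y_pred = [1, 0, 0, 0]  # a cautious classifier that rarely predicts "positive"
print(binary_metrics(y_true, y_pred))  # (1.0, 0.3333333333333333, 0.5)
print(binary_metrics(y_pred, y_true))  # arguments swapped: precision and recall trade places
```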
The language model has already learnt a lot about syntax: it knows a great deal about the contexts in which words appear in sentences. However, the language model has no real notion of meaning. This problem is well summarised in Emily Bender's tweet during a very interesting Twitter thread on meaning in NLP that took place in July. A good summary of this thread can be found in the Hugging Face blog post. Hence meaning in language is most likely learned through supervision, with the help of ground-truth examples.
However, for some NLP tasks, such as sentiment analysis in our example, both syntax and meaning matter! The idea is that you can save a lot of time by first learning a lot of syntax "blindly", and then learning meaning. Think of starting to learn a completely new field: it is far easier to learn it in your mother tongue than in a language you master less well.
The big practical gain here is that once you "know" a language, you need fewer supervised examples to learn something new! In our example, it means we need fewer labelled reviews to learn a relevant classifier.
Let's verify this hypothesis by training a classifier with several training sizes and seeing how the size affects performance!
trn_clas = np.load(CLAS_PATH/'tmp'/'trn_ids.npy')
val_clas = np.load(CLAS_PATH/'tmp'/'val_ids.npy')
trn_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'trn_labels.npy'))
val_labels = np.squeeze(np.load(CLAS_PATH/'tmp'/'val_labels.npy'))
def experiment(trn_size, val_size):
    # Randomly sample the training and validation subsets
    train = random.sample(list(zip(trn_clas, trn_labels)), trn_size)
    aux_trn_clas = np.array([item[0] for item in train])
    aux_trn_labels = np.array([item[1] for item in train])
    del train
    validation = random.sample(list(zip(val_clas, val_labels)), val_size)
    aux_val_clas = np.array([item[0] for item in validation])
    aux_val_labels = np.array([item[1] for item in validation])
    del validation
    bptt,em_sz,nh,nl = 70,400,1150,3
    vs = len(itos)
    opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
    bs = 48
    min_lbl = aux_trn_labels.min()
    aux_trn_labels -= min_lbl
    aux_val_labels -= min_lbl
    c=int(aux_trn_labels.max())+1
    # Load data in relevant structures
    trn_ds = TextDataset(aux_trn_clas, aux_trn_labels)
    val_ds = TextDataset(aux_val_clas, aux_val_labels)
    trn_samp = SortishSampler(aux_trn_clas, key=lambda x: len(aux_trn_clas[x]), bs=bs//2)
    val_samp = SortSampler(aux_val_clas, key=lambda x: len(aux_val_clas[x]))
    trn_dl = DataLoader(trn_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=trn_samp)
    val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
    # Define the model and load the backbone language model
    md = ModelData(PATH, trn_dl, val_dl)
    dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])
    m = get_rnn_classifier(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
                           layers=[em_sz*3, 50, c], drops=[dps[4], 0.1],
                           dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])
    opt_fn = partial(optim.Adam, betas=(0.7, 0.99))
    learn = RNN_Learner(md, TextModel(to_gpu(m)), opt_fn=opt_fn)
    learn.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
    learn.clip=25.
    learn.metrics = [accuracy]
    lr=3e-3
    lrm = 2.6
    lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
    lrs=np.array([1e-4,1e-4,1e-4,1e-3,1e-2])
    wd = 1e-7
    wd = 0
    learn.load_encoder('lm1_enc')
    learn.freeze_to(-1)
    # Find the learning rate
    learn.lr_find(lrs/1000)
    # Run one epoch on the classification layer
    learn.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
    # Save the trained model
    learn.save(f'{trn_size}clas_0')
    learn.load(f'{trn_size}clas_0')
    # Gradually unfreeze another layer to train a bit more parameters than just the classifier layer
    learn.freeze_to(-2)
    learn.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8,3))
    # Save the trained model
    learn.save(f'{trn_size}clas_1')
    learn.load(f'{trn_size}clas_1')
    # Unfreeze everything and train for a few epochs on the whole set of parameters of the model
    learn.unfreeze()
    learn.fit(lrs, 1, wds=wd, cycle_len=14, use_clr=(32,10))
    # Plot the losses and save the model
    learn.sched.plot_loss()
    learn.save(f'{trn_size}clas_2')
from time import time
val_size = 100000
for trn_size in [50, 100, 500, 1000, 5000, 10000, 20000, 50000]:
    print('#'*50)
    print(f'Experiment with training size {trn_size}')
    start = time()
    experiment(trn_size, val_size)
    t = time() - start
    print(f'Time cost: {t}')
##################################################
Experiment with training size 50
epoch  trn_loss  val_loss  accuracy
0  0.739306  0.713452  0.515713
epoch  trn_loss  val_loss  accuracy
0  0.780253  0.682528  0.60368
epoch  trn_loss  val_loss  accuracy
0  0.480616  0.665205  0.64434
epoch  trn_loss  val_loss  accuracy
1  0.60659  0.642443  0.60112
2  0.61519  0.619721  0.69608
3  0.642923  0.626678  0.61732
4  0.652647  0.660426  0.51752
5  0.602682  0.620081  0.5915
6  0.594284  0.584023  0.66818
7  0.58685  0.559354  0.73106
8  0.55382  0.540782  0.77018
9  0.52772  0.527295  0.79862
10  0.518118  0.487917  0.83798
11  0.518521  0.461052  0.84442
12  0.53044  0.453327  0.84558
13  0.510322  0.468408  0.83854
Time cost: 3158.540988445282
##################################################
Experiment with training size 100
epoch  trn_loss  val_loss  accuracy
0  0.598895  78.284828  0.482651
epoch  trn_loss  val_loss  accuracy
0  0.62033  0.664568  0.71318
epoch  trn_loss  val_loss  accuracy
0  0.602062  0.617134  0.74574
epoch  trn_loss  val_loss  accuracy
0  0.509279  0.616894  0.58494
1  0.528293  0.574365  0.69924
2  0.496826  0.544474  0.75798
3  0.478803  0.559163  0.6684
4  0.442439  0.568413  0.64396
5  0.45688  0.435576  0.82176
6  0.438374  0.401803  0.87232
7  0.435346  0.382793  0.86982
8  0.430963  0.38687  0.86138
9  0.421749  0.363613  0.86442
10  0.404818  0.347554  0.87324
11  0.402366  0.34878  0.8688
12  0.420744  0.341431  0.86758
13  0.405834  0.34154  0.86362
Time cost: 3164.5589134693146
##################################################
Experiment with training size 500
epoch  trn_loss  val_loss  accuracy
0  0.531424  0.558967  0.85856
epoch  trn_loss  val_loss  accuracy
0  0.427045  0.402448  0.88602
epoch  trn_loss  val_loss  accuracy
0  0.43276  0.325113  0.88386
1  0.439859  0.350954  0.85564
2  0.420882  0.301699  0.88072
3  0.408916  0.243965  0.91232
4  0.385137  0.265443  0.8924
5  0.374238  0.249731  0.89888
6  0.397431  0.265853  0.90392
7  0.388508  0.256725  0.90612
8  0.405042  0.269658  0.90676
9  0.3749  0.278558  0.89718
10  0.378312  0.280107  0.89688
11  0.368829  0.269968  0.90122
12  0.412016  0.274945  0.90104
13  0.399776  0.281551  0.89786
Time cost: 3095.5910897254944
##################################################
Experiment with training size 1000
epoch  trn_loss  val_loss  accuracy
0  0.538816  0.369876  0.90136
epoch  trn_loss  val_loss  accuracy
0  0.453464  0.315374  0.88258
epoch  trn_loss  val_loss  accuracy
0  0.404357  0.259631  0.90256
1  0.419865  0.254745  0.89808
2  0.445964  0.268253  0.89904
3  0.427022  0.229095  0.91462
4  0.414167  0.228874  0.91148
5  0.407483  0.219707  0.91912
6  0.381847  0.216046  0.9203
7  0.365503  0.219289  0.91962
8  0.358103  0.213313  0.92152
9  0.328652  0.219443  0.91694
10  0.360773  0.225698  0.9129
11  0.325618  0.216891  0.91786
12  0.358954  0.213793  0.91994
13  0.324676  0.217357  0.91804
Time cost: 3222.9498105049133
##################################################
Experiment with training size 5000
epoch  trn_loss  val_loss  accuracy
0  0.476658  0.251208  0.91892
epoch  trn_loss  val_loss  accuracy
0  0.433952  0.231621  0.92414
epoch  trn_loss  val_loss  accuracy
0  0.441624  0.26157  0.91548
1  0.39728  0.216438  0.92384
2  0.409002  0.224356  0.92368
3  0.422964  0.215129  0.92208
5  0.323477  0.190459  0.92822
6  0.359594  0.204132  0.9299
7  0.364609  0.197063  0.92962
8  0.335434  0.195078  0.93054
9  0.344869  0.193901  0.93174
10  0.355132  0.204457  0.92736
11  0.361977  0.196434  0.92986
12  0.335396  0.200645  0.92896
13  0.327323  0.20609  0.92624
Time cost: 4408.779232263565
##################################################
Experiment with training size 10000
epoch  trn_loss  val_loss  accuracy
0  0.442663  0.237719  0.91894
epoch  trn_loss  val_loss  accuracy
0  0.431919  0.23883  0.92334
epoch  trn_loss  val_loss  accuracy
0  0.423774  0.199739  0.92554
1  0.400266  0.206542  0.92344
2  0.327765  0.191927  0.93002
3  0.355688  0.193465  0.92908
4  0.336286  0.182849  0.93128
5  0.324608  0.18046  0.93278
6  0.314902  0.183413  0.93328
7  0.328284  0.178485  0.93288
8  0.337061  0.180216  0.93436
9  0.308937  0.179975  0.9341
10  0.290357  0.178364  0.93366
11  0.301147  0.175089  0.93584
12  0.267383  0.176672  0.934
13  0.305133  0.17432  0.93538
Time cost: 5908.472403526306
##################################################
Experiment with training size 20000
epoch  trn_loss  val_loss  accuracy
0  0.425248  0.229867  0.91804
epoch  trn_loss  val_loss  accuracy
0  0.410012  0.210839  0.92766
epoch  trn_loss  val_loss  accuracy
0  0.418405  0.202191  0.92848
1  0.385172  0.21752  0.92934
2  0.341867  0.1879  0.93032
3  0.343511  0.176737  0.93358
4  0.299173  0.169992  0.9357
[output truncated: IOPub message rate exceeded]
9  0.313465  0.162371  0.93966
10  0.2692  0.162227  0.93946
11  0.272758  0.159716  0.94032
[output truncated: IOPub message rate exceeded]
epoch  trn_loss  val_loss  accuracy
0  0.441473  0.254497  0.9168
[output truncated: IOPub message rate exceeded]
1  0.309567  0.170769  0.9367
[output truncated: IOPub message rate exceeded]
4  0.257701  0.153826  0.9416
[output truncated: IOPub message rate exceeded]
6  0.24764  0.148807  0.94436
7  0.239934  0.146907  0.9456
8  0.224837  0.156241  0.94496
[output truncated: IOPub message rate exceeded]
10  0.212315  0.145792  0.94616
11  0.221374  0.14564  0.9458
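The notebook output got truncated here (the notebook server's IOPub message rate limit was exceeded), so you might want to run this cell from a Python script instead.
Let's look at how the accuracy evolves as we increase the size of the training data. For each training size, we report the best accuracy across the different epochs. For the 20000 and 50000 runs, whose output was truncated, this is the best accuracy among the epochs that were displayed.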
import matplotlib.pyplot as plt
best_acc = [0.84558, 0.87324, 0.91232, 0.92152, 0.93174, 0.93584, 0.94032, 0.94616]
sizes = [50, 100, 500, 1000, 5000, 10000, 20000, 50000]
plt.plot(sizes, best_acc)
plt.title('Evolution of performance when increasing the training size')
plt.xlabel('Training size')
plt.ylabel('Accuracy')
plt.show()
plt.plot(sizes, best_acc)
plt.title('Evolution of performance when increasing the training size, Zoom on the [0-10000] size zone')
plt.xlabel('Training size')
plt.ylabel('Accuracy')
plt.xlim([0, 10000])
plt.show()
plt.plot(np.log(sizes)/np.log(10), best_acc)
plt.title('Evolution of performance when increasing the training size, with log scale for size')
plt.xlabel('Training size (log)')
plt.ylabel('Accuracy')
plt.show()