Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
Demo of a simple RNN for sentiment classification (here: a binary classification problem with two labels, positive and negative). Note that a simple RNN usually doesn't work very well due to vanishing and exploding gradient problems. Also, this implementation uses padding to deal with variable-length inputs; hence, the shorter a sentence is, the more <pad> placeholders are appended to match the length of the longest sentence in the batch.
Note that this RNN trains about 4 times slower than the equivalent with packed sequences, ./rnn-simple-packed-imdb.ipynb.
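As a small illustration of what this padding looks like (an added sketch, not part of the original notebook; the toy sequences and indices are made up), the snippet below pads two sequences of different lengths to a common length with PyTorch's pad_sequence; the packed-sequence variant linked above additionally uses pack_padded_sequence so the RNN skips the padded positions:

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

seq_a = torch.tensor([4, 7, 9, 2, 5])  # length 5
seq_b = torch.tensor([3, 8])           # length 2

# Pad both sequences to the length of the longest one (index 1 acts as <pad> here).
padded = pad_sequence([seq_a, seq_b], padding_value=1)
print(padded)
# tensor([[4, 3],
#         [7, 8],
#         [9, 1],
#         [2, 1],
#         [5, 1]])  -> shape [max sentence len, batch size]

# A packed sequence stores only the real tokens plus per-time-step batch sizes,
# so no RNN computation is spent on the padding positions.
packed = pack_padded_sequence(padded, lengths=[5, 2])
print(packed.batch_sizes)  # tensor([2, 2, 1, 1, 1])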
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch
import torch
import torch.nn.functional as F
from torchtext import data
from torchtext import datasets
import time
import random
torch.backends.cudnn.deterministic = True
Sebastian Raschka
CPython 3.7.1
IPython 7.4.0
torch 1.0.1.post2
RANDOM_SEED = 123
torch.manual_seed(RANDOM_SEED)
VOCABULARY_SIZE = 20000
LEARNING_RATE = 1e-4
BATCH_SIZE = 128
NUM_EPOCHS = 15
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
EMBEDDING_DIM = 128
HIDDEN_DIM = 256
OUTPUT_DIM = 1
Load the IMDB Movie Review dataset:
TEXT = data.Field(tokenize = 'spacy')
LABEL = data.LabelField(dtype = torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
train_data, valid_data = train_data.split(random_state=random.seed(RANDOM_SEED),
                                          split_ratio=0.8)
print(f'Num Train: {len(train_data)}')
print(f'Num Valid: {len(valid_data)}')
print(f'Num Test: {len(test_data)}')
downloading aclImdb_v1.tar.gz
aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:03<00:00, 22.3MB/s]
Num Train: 20000
Num Valid: 5000
Num Test: 25000
Build the vocabulary based on the top "VOCABULARY_SIZE" words:
TEXT.build_vocab(train_data, max_size=VOCABULARY_SIZE)
LABEL.build_vocab(train_data)
print(f'Vocabulary size: {len(TEXT.vocab)}')
print(f'Number of classes: {len(LABEL.vocab)}')
Vocabulary size: 20002
Number of classes: 2
The TEXT.vocab dictionary will contain the word counts and indices. The reason why the number of words is VOCABULARY_SIZE + 2 is that it contains two special tokens for unknown words and padding: <unk> and <pad>.
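To make this concrete, here is a short added snippet (not from the original run; the printed values are only illustrative) that inspects the vocabulary objects:

print(TEXT.vocab.freqs.most_common(3))  # most frequent training-set tokens with their counts
print(TEXT.vocab.itos[:5])              # index -> token; typically starts with ['<unk>', '<pad>', ...]
print(TEXT.vocab.stoi['<pad>'])         # token -> index; <pad> is usually mapped to 1
print(LABEL.vocab.stoi)                 # label -> index mapping, e.g. {'neg': 0, 'pos': 1}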
Make dataset iterators:
train_loader, valid_loader, test_loader = data.BucketIterator.splits(
(train_data, valid_data, test_data),
batch_size=BATCH_SIZE,
device=DEVICE)
Testing the iterators (note that the number of rows depends on the longest document in the respective batch):
print('Train')
for batch in train_loader:
    print(f'Text matrix size: {batch.text.size()}')
    print(f'Target vector size: {batch.label.size()}')
    break

print('\nValid:')
for batch in valid_loader:
    print(f'Text matrix size: {batch.text.size()}')
    print(f'Target vector size: {batch.label.size()}')
    break

print('\nTest:')
for batch in test_loader:
    print(f'Text matrix size: {batch.text.size()}')
    print(f'Target vector size: {batch.label.size()}')
    break
Train
Text matrix size: torch.Size([880, 128])
Target vector size: torch.Size([128])

Valid:
Text matrix size: torch.Size([61, 128])
Target vector size: torch.Size([128])

Test:
Text matrix size: torch.Size([42, 128])
Target vector size: torch.Size([128])
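Since every document in a batch is padded up to the longest document in that batch, part of each text matrix consists of <pad> entries. The following added sketch (it assumes the default pad token stored in TEXT.pad_token) estimates that fraction for one training batch:

PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]  # index of the <pad> token, usually 1

for batch in train_loader:
    num_pad = (batch.text == PAD_IDX).sum().item()
    total = batch.text.numel()
    print(f'Padding tokens in this batch: {num_pad}/{total} ({100. * num_pad / total:.1f}%)')
    break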
import torch.nn as nn
class RNN(nn.Module):

    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, text):
        # [sentence len, batch size] => [sentence len, batch size, embedding size]
        embedded = self.embedding(text)

        # [sentence len, batch size, embedding size] =>
        #   output: [sentence len, batch size, hidden size]
        #   hidden: [1, batch size, hidden size]
        output, hidden = self.rnn(embedded)

        return self.fc(hidden.squeeze(0)).view(-1)
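As a quick sanity check of the shapes annotated in forward() above, the following small sketch (not part of the original notebook; the dimensions are arbitrary) pushes random token indices through a throwaway instance of the model:

_tmp_model = RNN(input_dim=100, embedding_dim=16, hidden_dim=32, output_dim=1)
# random word indices with shape [sentence len, batch size] = [20, 4]
_dummy_text = torch.randint(0, 100, (20, 4))
# one logit per example in the batch => torch.Size([4])
print(_tmp_model(_dummy_text).size())
del _tmp_model, _dummy_text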
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 64
HIDDEN_DIM = 128
OUTPUT_DIM = 1
torch.manual_seed(RANDOM_SEED)
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)
model = model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
def compute_binary_accuracy(model, data_loader, device):
    model.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for batch_idx, batch_data in enumerate(data_loader):
            logits = model(batch_data.text)
            predicted_labels = (torch.sigmoid(logits) > 0.5).long()
            num_examples += batch_data.label.size(0)
            correct_pred += (predicted_labels == batch_data.label.long()).sum()
        return correct_pred.float()/num_examples * 100
start_time = time.time()
for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, batch_data in enumerate(train_loader):

        ### FORWARD AND BACK PROP
        logits = model(batch_data.text)
        cost = F.binary_cross_entropy_with_logits(logits, batch_data.label)
        optimizer.zero_grad()

        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        ### LOGGING
        if not batch_idx % 50:
            print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
                  f'Batch {batch_idx:03d}/{len(train_loader):03d} | '
                  f'Cost: {cost:.4f}')

    with torch.set_grad_enabled(False):
        print(f'training accuracy: '
              f'{compute_binary_accuracy(model, train_loader, DEVICE):.2f}%'
              f'\nvalid accuracy: '
              f'{compute_binary_accuracy(model, valid_loader, DEVICE):.2f}%')

    print(f'Time elapsed: {(time.time() - start_time)/60:.2f} min')

print(f'Total Training Time: {(time.time() - start_time)/60:.2f} min')
print(f'Test accuracy: {compute_binary_accuracy(model, test_loader, DEVICE):.2f}%')
Epoch: 001/015 | Batch 000/157 | Cost: 0.7111
Epoch: 001/015 | Batch 050/157 | Cost: 0.6912
Epoch: 001/015 | Batch 100/157 | Cost: 0.6856
Epoch: 001/015 | Batch 150/157 | Cost: 0.6970
training accuracy: 49.94%
valid accuracy: 49.96%
Time elapsed: 0.42 min
Epoch: 002/015 | Batch 000/157 | Cost: 0.6905
Epoch: 002/015 | Batch 050/157 | Cost: 0.6980
Epoch: 002/015 | Batch 100/157 | Cost: 0.6934
Epoch: 002/015 | Batch 150/157 | Cost: 0.6927
training accuracy: 49.99%
valid accuracy: 49.86%
Time elapsed: 0.83 min
Epoch: 003/015 | Batch 000/157 | Cost: 0.6947
Epoch: 003/015 | Batch 050/157 | Cost: 0.6938
Epoch: 003/015 | Batch 100/157 | Cost: 0.7035
Epoch: 003/015 | Batch 150/157 | Cost: 0.6942
training accuracy: 49.99%
valid accuracy: 50.60%
Time elapsed: 1.26 min
Epoch: 004/015 | Batch 000/157 | Cost: 0.6927
Epoch: 004/015 | Batch 050/157 | Cost: 0.6920
Epoch: 004/015 | Batch 100/157 | Cost: 0.6916
Epoch: 004/015 | Batch 150/157 | Cost: 0.6947
training accuracy: 50.07%
valid accuracy: 49.80%
Time elapsed: 1.68 min
Epoch: 005/015 | Batch 000/157 | Cost: 0.6885
Epoch: 005/015 | Batch 050/157 | Cost: 0.6907
Epoch: 005/015 | Batch 100/157 | Cost: 0.6939
Epoch: 005/015 | Batch 150/157 | Cost: 0.6881
training accuracy: 50.09%
valid accuracy: 49.86%
Time elapsed: 2.09 min
Epoch: 006/015 | Batch 000/157 | Cost: 0.6939
Epoch: 006/015 | Batch 050/157 | Cost: 0.6928
Epoch: 006/015 | Batch 100/157 | Cost: 0.6917
Epoch: 006/015 | Batch 150/157 | Cost: 0.6915
training accuracy: 49.99%
valid accuracy: 50.54%
Time elapsed: 2.53 min
Epoch: 007/015 | Batch 000/157 | Cost: 0.6927
Epoch: 007/015 | Batch 050/157 | Cost: 0.6935
Epoch: 007/015 | Batch 100/157 | Cost: 0.6931
Epoch: 007/015 | Batch 150/157 | Cost: 0.6917
training accuracy: 50.05%
valid accuracy: 50.18%
Time elapsed: 2.95 min
Epoch: 008/015 | Batch 000/157 | Cost: 0.6921
Epoch: 008/015 | Batch 050/157 | Cost: 0.6940
Epoch: 008/015 | Batch 100/157 | Cost: 0.6923
Epoch: 008/015 | Batch 150/157 | Cost: 0.6877
training accuracy: 50.06%
valid accuracy: 49.82%
Time elapsed: 3.37 min
Epoch: 009/015 | Batch 000/157 | Cost: 0.6926
Epoch: 009/015 | Batch 050/157 | Cost: 0.6980
Epoch: 009/015 | Batch 100/157 | Cost: 0.6970
Epoch: 009/015 | Batch 150/157 | Cost: 0.6900
training accuracy: 50.19%
valid accuracy: 49.36%
Time elapsed: 3.80 min
Epoch: 010/015 | Batch 000/157 | Cost: 0.6954
Epoch: 010/015 | Batch 050/157 | Cost: 0.6926
Epoch: 010/015 | Batch 100/157 | Cost: 0.6916
Epoch: 010/015 | Batch 150/157 | Cost: 0.6926
training accuracy: 50.01%
valid accuracy: 50.16%
Time elapsed: 4.22 min
Epoch: 011/015 | Batch 000/157 | Cost: 0.6933
Epoch: 011/015 | Batch 050/157 | Cost: 0.6933
Epoch: 011/015 | Batch 100/157 | Cost: 0.6947
Epoch: 011/015 | Batch 150/157 | Cost: 0.6922
training accuracy: 50.17%
valid accuracy: 49.88%
Time elapsed: 4.64 min
Epoch: 012/015 | Batch 000/157 | Cost: 0.6927
Epoch: 012/015 | Batch 050/157 | Cost: 0.6934
Epoch: 012/015 | Batch 100/157 | Cost: 0.6931
Epoch: 012/015 | Batch 150/157 | Cost: 0.6934
training accuracy: 50.15%
valid accuracy: 49.92%
Time elapsed: 5.08 min
Epoch: 013/015 | Batch 000/157 | Cost: 0.6938
Epoch: 013/015 | Batch 050/157 | Cost: 0.6946
Epoch: 013/015 | Batch 100/157 | Cost: 0.6956
Epoch: 013/015 | Batch 150/157 | Cost: 0.6925
training accuracy: 50.10%
valid accuracy: 50.38%
Time elapsed: 5.51 min
Epoch: 014/015 | Batch 000/157 | Cost: 0.6940
Epoch: 014/015 | Batch 050/157 | Cost: 0.6917
Epoch: 014/015 | Batch 100/157 | Cost: 0.6902
Epoch: 014/015 | Batch 150/157 | Cost: 0.6961
training accuracy: 50.13%
valid accuracy: 50.36%
Time elapsed: 5.93 min
Epoch: 015/015 | Batch 000/157 | Cost: 0.6985
Epoch: 015/015 | Batch 050/157 | Cost: 0.6916
Epoch: 015/015 | Batch 100/157 | Cost: 0.6879
Epoch: 015/015 | Batch 150/157 | Cost: 0.6934
training accuracy: 50.16%
valid accuracy: 50.68%
Time elapsed: 6.35 min
Total Training Time: 6.35 min
Test accuracy: 46.38%
import spacy
nlp = spacy.load('en')


def predict_sentiment(model, sentence):
    # based on:
    # https://github.com/bentrevett/pytorch-sentiment-analysis/blob/
    # master/2%20-%20Upgraded%20Sentiment%20Analysis.ipynb
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(DEVICE)
    tensor = tensor.unsqueeze(1)
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()
print('Probability positive:')
predict_sentiment(model, "I really love this movie. This movie is so great!")
Probability positive:
0.5701386332511902