Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.

Model Zoo -- Simple RNN

Demo of a simple RNN for sentiment classification (here: a binary classification problem with two labels, positive and negative). Note that a simple RNN usually doesn't work very well due to vanishing and exploding gradients. Also, this implementation uses padding to deal with variable-length inputs; hence, the shorter a sentence is, the more <pad> placeholders are appended to match the length of the longest sentence in the batch.
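The padding mechanism itself is not specific to this notebook; the short sketch below (using torch.nn.utils.rnn.pad_sequence; the padding index 1 is an assumption that matches torchtext's default <pad> index) illustrates how a shorter sequence is padded up to the length of the longest sequence in a batch:

In [ ]:
import torch
from torch.nn.utils.rnn import pad_sequence

# two "sentences" of token indices with different lengths
seq_a = torch.tensor([4, 15, 8])
seq_b = torch.tensor([4, 15, 8, 23, 42])

# pads every sequence with padding_value up to the longest sequence in the batch;
# the default layout is [sentence len, batch size], matching the iterators used below
batch = pad_sequence([seq_a, seq_b], padding_value=1)
print(batch.size())   # torch.Size([5, 2])
print(batch[:, 0])    # tensor([ 4, 15,  8,  1,  1]) -- two padding entries appended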

Note that this RNN trains about 4 times slower than the equivalent with packed sequences, ./rnn-simple-packed-imdb.ipynb.

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

import torch
import torch.nn.functional as F
from torchtext import data
from torchtext import datasets
import time
import random

torch.backends.cudnn.deterministic = True
Sebastian Raschka 

CPython 3.7.1
IPython 7.4.0

torch 1.0.1.post2

General Settings

In [2]:
RANDOM_SEED = 123
torch.manual_seed(RANDOM_SEED)

VOCABULARY_SIZE = 20000
LEARNING_RATE = 1e-4
BATCH_SIZE = 128
NUM_EPOCHS = 15
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

EMBEDDING_DIM = 128
HIDDEN_DIM = 256
OUTPUT_DIM = 1

Dataset

Load the IMDB Movie Review dataset:

In [3]:
TEXT = data.Field(tokenize = 'spacy')
LABEL = data.LabelField(dtype = torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
train_data, valid_data = train_data.split(random_state=random.seed(RANDOM_SEED),
                                          split_ratio=0.8)

print(f'Num Train: {len(train_data)}')
print(f'Num Valid: {len(valid_data)}')
print(f'Num Test: {len(test_data)}')
downloading aclImdb_v1.tar.gz
aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:03<00:00, 22.3MB/s]
Num Train: 20000
Num Valid: 5000
Num Test: 25000

Build the vocabulary based on the top "VOCABULARY_SIZE" words:

In [4]:
TEXT.build_vocab(train_data, max_size=VOCABULARY_SIZE)
LABEL.build_vocab(train_data)

print(f'Vocabulary size: {len(TEXT.vocab)}')
print(f'Number of classes: {len(LABEL.vocab)}')
Vocabulary size: 20002
Number of classes: 2

The TEXT.vocab dictionary contains the word counts and indices. The reason why the number of words is VOCABULARY_SIZE + 2 is that it contains two special tokens for padding and unknown words: <unk> and <pad>.
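A quick way to verify this (a small sketch based on the vocabulary built above) is to look at the first entries of the index-to-string mapping and the indices of the two special tokens:

In [ ]:
# the first two entries are the special tokens, followed by the most frequent words
print(TEXT.vocab.itos[:5])
print(TEXT.vocab.stoi['<unk>'], TEXT.vocab.stoi['<pad>'])
print(TEXT.vocab.freqs.most_common(3))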

Make dataset iterators:

In [ ]:
train_loader, valid_loader, test_loader = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size=BATCH_SIZE,
    device=DEVICE)

Testing the iterators (note that the number of rows depends on the longest document in the respective batch):

In [6]:
print('Train')
for batch in train_loader:
    print(f'Text matrix size: {batch.text.size()}')
    print(f'Target vector size: {batch.label.size()}')
    break
    
print('\nValid:')
for batch in valid_loader:
    print(f'Text matrix size: {batch.text.size()}')
    print(f'Target vector size: {batch.label.size()}')
    break
    
print('\nTest:')
for batch in test_loader:
    print(f'Text matrix size: {batch.text.size()}')
    print(f'Target vector size: {batch.label.size()}')
    break
Train
Text matrix size: torch.Size([880, 128])
Target vector size: torch.Size([128])

Valid:
Text matrix size: torch.Size([61, 128])
Target vector size: torch.Size([128])

Test:
Text matrix size: torch.Size([42, 128])
Target vector size: torch.Size([128])

Model

In [ ]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        
        super().__init__()
        
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, text):

        #[sentence len, batch size] => [sentence len, batch size, embedding size]
        embedded = self.embedding(text)
        
        #[sentence len, batch size, embedding size] => 
        #  output: [sentence len, batch size, hidden size]
        #  hidden: [1, batch size, hidden size]
        output, hidden = self.rnn(embedded)
        
        return self.fc(hidden.squeeze(0)).view(-1)
In [ ]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 64   # note: overrides the value from the General Settings cell
HIDDEN_DIM = 128     # note: overrides the value from the General Settings cell
OUTPUT_DIM = 1

torch.manual_seed(RANDOM_SEED)
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)
model = model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
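As a quick sanity check (a sketch; the sequence length of 30 and batch size of 4 below are arbitrary assumptions), a random batch of token indices can be passed through the untrained model to confirm that it returns one logit per example:

In [ ]:
with torch.no_grad():
    # dummy input in the same [sentence len, batch size] layout as the iterators
    dummy_text = torch.randint(0, INPUT_DIM, (30, 4)).to(DEVICE)
    print(model(dummy_text).size())  # expected: torch.Size([4]) -- one logit per example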

Training

In [ ]:
def compute_binary_accuracy(model, data_loader, device):
    model.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for batch_idx, batch_data in enumerate(data_loader):
            logits = model(batch_data.text)
            predicted_labels = (torch.sigmoid(logits) > 0.5).long()
            num_examples += batch_data.label.size(0)
            correct_pred += (predicted_labels == batch_data.label.long()).sum()
        return correct_pred.float()/num_examples * 100
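Note that the model returns raw logits, and the training loop below uses F.binary_cross_entropy_with_logits, which applies the sigmoid internally and is numerically more stable than calling torch.sigmoid followed by F.binary_cross_entropy. A small sketch with made-up values illustrating that the two formulations agree:

In [ ]:
# made-up logits and labels; both formulations yield the same loss (approx. 0.45)
example_logits = torch.tensor([1.5, -0.3, 0.2])
example_labels = torch.tensor([1., 0., 1.])

print(F.binary_cross_entropy_with_logits(example_logits, example_labels).item())
print(F.binary_cross_entropy(torch.sigmoid(example_logits), example_labels).item())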
In [10]:
start_time = time.time()

for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, batch_data in enumerate(train_loader):
        
        ### FORWARD AND BACK PROP
        logits = model(batch_data.text)
        cost = F.binary_cross_entropy_with_logits(logits, batch_data.label)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print (f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
                   f'Batch {batch_idx:03d}/{len(train_loader):03d} | '
                   f'Cost: {cost:.4f}')

    with torch.set_grad_enabled(False):
        print(f'training accuracy: '
              f'{compute_binary_accuracy(model, train_loader, DEVICE):.2f}%'
              f'\nvalid accuracy: '
              f'{compute_binary_accuracy(model, valid_loader, DEVICE):.2f}%')
        
    print(f'Time elapsed: {(time.time() - start_time)/60:.2f} min')
    
print(f'Total Training Time: {(time.time() - start_time)/60:.2f} min')
print(f'Test accuracy: {compute_binary_accuracy(model, test_loader, DEVICE):.2f}%')
Epoch: 001/015 | Batch 000/157 | Cost: 0.7111
Epoch: 001/015 | Batch 050/157 | Cost: 0.6912
Epoch: 001/015 | Batch 100/157 | Cost: 0.6856
Epoch: 001/015 | Batch 150/157 | Cost: 0.6970
training accuracy: 49.94%
valid accuracy: 49.96%
Time elapsed: 0.42 min
Epoch: 002/015 | Batch 000/157 | Cost: 0.6905
Epoch: 002/015 | Batch 050/157 | Cost: 0.6980
Epoch: 002/015 | Batch 100/157 | Cost: 0.6934
Epoch: 002/015 | Batch 150/157 | Cost: 0.6927
training accuracy: 49.99%
valid accuracy: 49.86%
Time elapsed: 0.83 min
Epoch: 003/015 | Batch 000/157 | Cost: 0.6947
Epoch: 003/015 | Batch 050/157 | Cost: 0.6938
Epoch: 003/015 | Batch 100/157 | Cost: 0.7035
Epoch: 003/015 | Batch 150/157 | Cost: 0.6942
training accuracy: 49.99%
valid accuracy: 50.60%
Time elapsed: 1.26 min
Epoch: 004/015 | Batch 000/157 | Cost: 0.6927
Epoch: 004/015 | Batch 050/157 | Cost: 0.6920
Epoch: 004/015 | Batch 100/157 | Cost: 0.6916
Epoch: 004/015 | Batch 150/157 | Cost: 0.6947
training accuracy: 50.07%
valid accuracy: 49.80%
Time elapsed: 1.68 min
Epoch: 005/015 | Batch 000/157 | Cost: 0.6885
Epoch: 005/015 | Batch 050/157 | Cost: 0.6907
Epoch: 005/015 | Batch 100/157 | Cost: 0.6939
Epoch: 005/015 | Batch 150/157 | Cost: 0.6881
training accuracy: 50.09%
valid accuracy: 49.86%
Time elapsed: 2.09 min
Epoch: 006/015 | Batch 000/157 | Cost: 0.6939
Epoch: 006/015 | Batch 050/157 | Cost: 0.6928
Epoch: 006/015 | Batch 100/157 | Cost: 0.6917
Epoch: 006/015 | Batch 150/157 | Cost: 0.6915
training accuracy: 49.99%
valid accuracy: 50.54%
Time elapsed: 2.53 min
Epoch: 007/015 | Batch 000/157 | Cost: 0.6927
Epoch: 007/015 | Batch 050/157 | Cost: 0.6935
Epoch: 007/015 | Batch 100/157 | Cost: 0.6931
Epoch: 007/015 | Batch 150/157 | Cost: 0.6917
training accuracy: 50.05%
valid accuracy: 50.18%
Time elapsed: 2.95 min
Epoch: 008/015 | Batch 000/157 | Cost: 0.6921
Epoch: 008/015 | Batch 050/157 | Cost: 0.6940
Epoch: 008/015 | Batch 100/157 | Cost: 0.6923
Epoch: 008/015 | Batch 150/157 | Cost: 0.6877
training accuracy: 50.06%
valid accuracy: 49.82%
Time elapsed: 3.37 min
Epoch: 009/015 | Batch 000/157 | Cost: 0.6926
Epoch: 009/015 | Batch 050/157 | Cost: 0.6980
Epoch: 009/015 | Batch 100/157 | Cost: 0.6970
Epoch: 009/015 | Batch 150/157 | Cost: 0.6900
training accuracy: 50.19%
valid accuracy: 49.36%
Time elapsed: 3.80 min
Epoch: 010/015 | Batch 000/157 | Cost: 0.6954
Epoch: 010/015 | Batch 050/157 | Cost: 0.6926
Epoch: 010/015 | Batch 100/157 | Cost: 0.6916
Epoch: 010/015 | Batch 150/157 | Cost: 0.6926
training accuracy: 50.01%
valid accuracy: 50.16%
Time elapsed: 4.22 min
Epoch: 011/015 | Batch 000/157 | Cost: 0.6933
Epoch: 011/015 | Batch 050/157 | Cost: 0.6933
Epoch: 011/015 | Batch 100/157 | Cost: 0.6947
Epoch: 011/015 | Batch 150/157 | Cost: 0.6922
training accuracy: 50.17%
valid accuracy: 49.88%
Time elapsed: 4.64 min
Epoch: 012/015 | Batch 000/157 | Cost: 0.6927
Epoch: 012/015 | Batch 050/157 | Cost: 0.6934
Epoch: 012/015 | Batch 100/157 | Cost: 0.6931
Epoch: 012/015 | Batch 150/157 | Cost: 0.6934
training accuracy: 50.15%
valid accuracy: 49.92%
Time elapsed: 5.08 min
Epoch: 013/015 | Batch 000/157 | Cost: 0.6938
Epoch: 013/015 | Batch 050/157 | Cost: 0.6946
Epoch: 013/015 | Batch 100/157 | Cost: 0.6956
Epoch: 013/015 | Batch 150/157 | Cost: 0.6925
training accuracy: 50.10%
valid accuracy: 50.38%
Time elapsed: 5.51 min
Epoch: 014/015 | Batch 000/157 | Cost: 0.6940
Epoch: 014/015 | Batch 050/157 | Cost: 0.6917
Epoch: 014/015 | Batch 100/157 | Cost: 0.6902
Epoch: 014/015 | Batch 150/157 | Cost: 0.6961
training accuracy: 50.13%
valid accuracy: 50.36%
Time elapsed: 5.93 min
Epoch: 015/015 | Batch 000/157 | Cost: 0.6985
Epoch: 015/015 | Batch 050/157 | Cost: 0.6916
Epoch: 015/015 | Batch 100/157 | Cost: 0.6879
Epoch: 015/015 | Batch 150/157 | Cost: 0.6934
training accuracy: 50.16%
valid accuracy: 50.68%
Time elapsed: 6.35 min
Total Training Time: 6.35 min
Test accuracy: 46.38%
In [ ]:
import spacy
nlp = spacy.load('en')

def predict_sentiment(model, sentence):
    # based on:
    # https://github.com/bentrevett/pytorch-sentiment-analysis/blob/
    # master/2%20-%20Upgraded%20Sentiment%20Analysis.ipynb
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(DEVICE)
    tensor = tensor.unsqueeze(1)
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()
In [12]:
print('Probability positive:')
predict_sentiment(model, "I really love this movie. This movie is so great!")
Probability positive:
Out[12]:
0.5701386332511902
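For comparison, the same helper can also be applied to an obviously negative review (a usage sketch; the output is omitted here and will depend on the trained weights):

In [ ]:
print('Probability positive:')
predict_sentiment(model, "This movie was boring and a complete waste of time.")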