Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch
Sebastian Raschka 

CPython 3.7.3
IPython 7.6.1

torch 1.2.0
  • Runs on CPU or GPU (if available)

Gradient Clipping

Certain types of deep neural networks, especially simple ones without any other type of regularization and with a relatively large number of layers, can suffer from exploding gradient problems. The exploding gradient problem is a scenario where large loss gradients accumulate during backpropagation, eventually resulting in very large weight updates during training. As a consequence, the updates become very unstable and fluctuate a lot, which often causes severe problems during training. This is a particular issue for unbounded activation functions such as ReLU.

One common, classic technique for avoiding exploding gradient problems is the so-called gradient clipping approach. Here, we simply set gradient values above or below a certain threshold to a user-specified min or max value. In PyTorch, there are several ways to perform gradient clipping.

1 - Basic Clipping

The simplest approach to gradient clipping in PyTorch is to use the torch.nn.utils.clip_grad_value_ function. For example, if we have instantiated a PyTorch model from a model class based on torch.nn.Module (as usual), we can add the following line of code in order to clip the gradients to the [-1, 1] range:

torch.nn.utils.clip_grad_value_(parameters=model.parameters(), 
                                clip_value=1.)

However, notice that with this approach we can only specify a single clip value, which is used for both the upper and lower bound, such that gradients are clipped to the range [-clip_value, clip_value].
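As a quick sanity check (a minimal sketch, not part of the original notebook; the toy parameter and values are chosen purely for illustration), we can apply clip_grad_value_ to a single parameter whose raw gradient exceeds the bounds:

import torch

w = torch.nn.Parameter(torch.tensor([1.0, 1.0]))
(w * torch.tensor([10.0, -10.0])).sum().backward()   # raw gradient: [10., -10.]

torch.nn.utils.clip_grad_value_(parameters=[w], clip_value=1.)
print(w.grad)  # tensor([ 1., -1.]) -- values clipped to the [-1, 1] range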

2 - Custom Lower and Upper Bounds

If we want to clip the gradients to an asymmetric interval around zero, say [-0.1, 1.0], we can take a different approach and register a backward hook:

for param in model.parameters():
    param.register_hook(lambda gradient: torch.clamp(gradient, -0.1, 1.0))

This backward hook only needs to be registered once after instantiating the model. Then, each time backward is called, the hook clips the gradients before the optimizer.step() update is applied.
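To illustrate the hook's effect (again, a minimal sketch with a made-up toy parameter, not part of the original notebook), the gradient is clamped to the asymmetric [-0.1, 1.0] range during the backward pass:

import torch

w = torch.nn.Parameter(torch.tensor([1.0, 1.0]))
w.register_hook(lambda grad: torch.clamp(grad, -0.1, 1.0))

(w * torch.tensor([5.0, -5.0])).sum().backward()   # raw gradient would be [5., -5.]
print(w.grad)  # tensor([ 1.0000, -0.1000]) -- clamped to [-0.1, 1.0]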

3 - Norm-clipping

Lastly, there's a third clipping option, torch.nn.utils.clip_grad_norm_, which clips the gradients using a vector norm as follows:

torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2)

Clips gradient norm of an iterable of parameters. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
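In other words, whenever the combined norm exceeds max_norm, all parameter gradients are rescaled by the same factor, which preserves the direction of the overall gradient vector. A minimal sketch with toy parameters (chosen purely for illustration, not part of the original notebook):

import torch

w1 = torch.nn.Parameter(torch.tensor([1.0]))
w2 = torch.nn.Parameter(torch.tensor([1.0]))
(w1 * 3. + w2 * 4.).sum().backward()   # gradients: 3. and 4., combined L2 norm: 5.

total_norm = torch.nn.utils.clip_grad_norm_([w1, w2], max_norm=1., norm_type=2)
print(total_norm)        # 5.0 -- the norm measured before clipping
print(w1.grad, w2.grad)  # tensor([0.6000]) tensor([0.8000]) -- both scaled by ~1/5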

Imports

In [2]:
import time
import numpy as np
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch


if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True

Settings and Dataset

In [3]:
##########################
### SETTINGS
##########################

# Device
device = torch.device("cuda:2" if torch.cuda.is_available() else "cpu")

# Hyperparameters
random_seed = 1
learning_rate = 0.01
num_epochs = 10
batch_size = 64

# Architecture
num_features = 784
num_hidden_1 = 256
num_hidden_2 = 128
num_hidden_3 = 64
num_hidden_4 = 32
num_classes = 10


##########################
### MNIST DATASET
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.MNIST(root='data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data', 
                              train=False, 
                              transform=transforms.ToTensor())


train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size, 
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=batch_size, 
                         shuffle=False)

# Checking the dataset
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break
Image batch dimensions: torch.Size([64, 1, 28, 28])
Image label dimensions: torch.Size([64])
In [4]:
def compute_accuracy(net, data_loader):
    net.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for features, targets in data_loader:
            features = features.view(-1, 28*28).to(device)
            targets = targets.to(device)
            logits, probas = net(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()
        return correct_pred.float()/num_examples * 100
    
In [5]:
##########################
### MODEL
##########################

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()
        
        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)

        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)

        ### 3rd hidden layer
        self.linear_3 = torch.nn.Linear(num_hidden_2, num_hidden_3)
        
        ### 4th hidden layer
        self.linear_4 = torch.nn.Linear(num_hidden_3, num_hidden_4)
        
        
        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_4, num_classes)

        
    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        out = self.linear_2(out)
        out = F.relu(out)
        out = self.linear_3(out)
        out = F.relu(out)
        out = self.linear_4(out)
        out = F.relu(out)
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas

1 - Basic Clipping

In [6]:
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)

model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  

###################################################################

start_time = time.time()
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)
            
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        
        #########################################################
        #########################################################
        ### GRADIENT CLIPPING
        torch.nn.utils.clip_grad_value_(model.parameters(), 1.)
        #########################################################
        #########################################################
        
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    with torch.set_grad_enabled(False):
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))
        
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Epoch: 001/010 | Batch 000/938 | Cost: 2.3054
Epoch: 001/010 | Batch 050/938 | Cost: 0.6427
Epoch: 001/010 | Batch 100/938 | Cost: 0.3220
Epoch: 001/010 | Batch 150/938 | Cost: 0.3492
Epoch: 001/010 | Batch 200/938 | Cost: 0.4505
Epoch: 001/010 | Batch 250/938 | Cost: 0.1510
Epoch: 001/010 | Batch 300/938 | Cost: 0.2062
Epoch: 001/010 | Batch 350/938 | Cost: 0.1287
Epoch: 001/010 | Batch 400/938 | Cost: 0.1714
Epoch: 001/010 | Batch 450/938 | Cost: 0.3522
Epoch: 001/010 | Batch 500/938 | Cost: 0.4268
Epoch: 001/010 | Batch 550/938 | Cost: 0.0133
Epoch: 001/010 | Batch 600/938 | Cost: 0.1868
Epoch: 001/010 | Batch 650/938 | Cost: 0.2312
Epoch: 001/010 | Batch 700/938 | Cost: 0.1471
Epoch: 001/010 | Batch 750/938 | Cost: 0.1321
Epoch: 001/010 | Batch 800/938 | Cost: 0.2776
Epoch: 001/010 | Batch 850/938 | Cost: 0.2223
Epoch: 001/010 | Batch 900/938 | Cost: 0.1812
Epoch: 001/010 training accuracy: 94.72%
Time elapsed: 0.25 min
Epoch: 002/010 | Batch 000/938 | Cost: 0.2080
Epoch: 002/010 | Batch 050/938 | Cost: 0.2177
Epoch: 002/010 | Batch 100/938 | Cost: 0.1090
Epoch: 002/010 | Batch 150/938 | Cost: 0.1225
Epoch: 002/010 | Batch 200/938 | Cost: 0.2514
Epoch: 002/010 | Batch 250/938 | Cost: 0.1093
Epoch: 002/010 | Batch 300/938 | Cost: 0.0626
Epoch: 002/010 | Batch 350/938 | Cost: 0.1242
Epoch: 002/010 | Batch 400/938 | Cost: 0.0168
Epoch: 002/010 | Batch 450/938 | Cost: 0.2678
Epoch: 002/010 | Batch 500/938 | Cost: 0.1761
Epoch: 002/010 | Batch 550/938 | Cost: 0.2607
Epoch: 002/010 | Batch 600/938 | Cost: 0.1324
Epoch: 002/010 | Batch 650/938 | Cost: 0.2334
Epoch: 002/010 | Batch 700/938 | Cost: 0.1510
Epoch: 002/010 | Batch 750/938 | Cost: 0.1456
Epoch: 002/010 | Batch 800/938 | Cost: 0.2882
Epoch: 002/010 | Batch 850/938 | Cost: 0.1485
Epoch: 002/010 | Batch 900/938 | Cost: 0.2007
Epoch: 002/010 training accuracy: 96.83%
Time elapsed: 0.49 min
Epoch: 003/010 | Batch 000/938 | Cost: 0.0550
Epoch: 003/010 | Batch 050/938 | Cost: 0.0555
Epoch: 003/010 | Batch 100/938 | Cost: 0.1040
Epoch: 003/010 | Batch 150/938 | Cost: 0.2290
Epoch: 003/010 | Batch 200/938 | Cost: 0.0506
Epoch: 003/010 | Batch 250/938 | Cost: 0.1028
Epoch: 003/010 | Batch 300/938 | Cost: 0.0381
Epoch: 003/010 | Batch 350/938 | Cost: 0.1593
Epoch: 003/010 | Batch 400/938 | Cost: 0.0637
Epoch: 003/010 | Batch 450/938 | Cost: 0.0127
Epoch: 003/010 | Batch 500/938 | Cost: 0.4391
Epoch: 003/010 | Batch 550/938 | Cost: 0.0110
Epoch: 003/010 | Batch 600/938 | Cost: 0.1959
Epoch: 003/010 | Batch 650/938 | Cost: 0.1020
Epoch: 003/010 | Batch 700/938 | Cost: 0.0206
Epoch: 003/010 | Batch 750/938 | Cost: 0.2747
Epoch: 003/010 | Batch 800/938 | Cost: 0.1192
Epoch: 003/010 | Batch 850/938 | Cost: 0.0115
Epoch: 003/010 | Batch 900/938 | Cost: 0.2476
Epoch: 003/010 training accuracy: 97.65%
Time elapsed: 0.74 min
Epoch: 004/010 | Batch 000/938 | Cost: 0.0875
Epoch: 004/010 | Batch 050/938 | Cost: 0.0335
Epoch: 004/010 | Batch 100/938 | Cost: 0.0530
Epoch: 004/010 | Batch 150/938 | Cost: 0.4291
Epoch: 004/010 | Batch 200/938 | Cost: 0.0634
Epoch: 004/010 | Batch 250/938 | Cost: 0.0437
Epoch: 004/010 | Batch 300/938 | Cost: 0.0547
Epoch: 004/010 | Batch 350/938 | Cost: 0.1602
Epoch: 004/010 | Batch 400/938 | Cost: 0.1071
Epoch: 004/010 | Batch 450/938 | Cost: 0.0351
Epoch: 004/010 | Batch 500/938 | Cost: 0.0712
Epoch: 004/010 | Batch 550/938 | Cost: 0.1261
Epoch: 004/010 | Batch 600/938 | Cost: 0.1212
Epoch: 004/010 | Batch 650/938 | Cost: 0.0802
Epoch: 004/010 | Batch 700/938 | Cost: 0.0844
Epoch: 004/010 | Batch 750/938 | Cost: 0.1496
Epoch: 004/010 | Batch 800/938 | Cost: 0.1543
Epoch: 004/010 | Batch 850/938 | Cost: 0.0182
Epoch: 004/010 | Batch 900/938 | Cost: 0.0433
Epoch: 004/010 training accuracy: 97.08%
Time elapsed: 0.98 min
Epoch: 005/010 | Batch 000/938 | Cost: 0.1570
Epoch: 005/010 | Batch 050/938 | Cost: 0.0291
Epoch: 005/010 | Batch 100/938 | Cost: 0.0363
Epoch: 005/010 | Batch 150/938 | Cost: 0.0320
Epoch: 005/010 | Batch 200/938 | Cost: 0.0322
Epoch: 005/010 | Batch 250/938 | Cost: 0.0720
Epoch: 005/010 | Batch 300/938 | Cost: 0.0497
Epoch: 005/010 | Batch 350/938 | Cost: 0.1058
Epoch: 005/010 | Batch 400/938 | Cost: 0.2139
Epoch: 005/010 | Batch 450/938 | Cost: 0.0602
Epoch: 005/010 | Batch 500/938 | Cost: 0.0689
Epoch: 005/010 | Batch 550/938 | Cost: 0.1355
Epoch: 005/010 | Batch 600/938 | Cost: 0.1659
Epoch: 005/010 | Batch 650/938 | Cost: 0.1504
Epoch: 005/010 | Batch 700/938 | Cost: 0.0403
Epoch: 005/010 | Batch 750/938 | Cost: 0.3422
Epoch: 005/010 | Batch 800/938 | Cost: 0.3299
Epoch: 005/010 | Batch 850/938 | Cost: 0.2327
Epoch: 005/010 | Batch 900/938 | Cost: 0.0171
Epoch: 005/010 training accuracy: 97.51%
Time elapsed: 1.23 min
Epoch: 006/010 | Batch 000/938 | Cost: 0.0548
Epoch: 006/010 | Batch 050/938 | Cost: 0.2781
Epoch: 006/010 | Batch 100/938 | Cost: 0.0657
Epoch: 006/010 | Batch 150/938 | Cost: 0.0444
Epoch: 006/010 | Batch 200/938 | Cost: 0.0057
Epoch: 006/010 | Batch 250/938 | Cost: 0.1058
Epoch: 006/010 | Batch 300/938 | Cost: 0.1610
Epoch: 006/010 | Batch 350/938 | Cost: 0.0353
Epoch: 006/010 | Batch 400/938 | Cost: 0.2474
Epoch: 006/010 | Batch 450/938 | Cost: 0.1038
Epoch: 006/010 | Batch 500/938 | Cost: 0.2918
Epoch: 006/010 | Batch 550/938 | Cost: 0.1360
Epoch: 006/010 | Batch 600/938 | Cost: 0.1977
Epoch: 006/010 | Batch 650/938 | Cost: 0.0314
Epoch: 006/010 | Batch 700/938 | Cost: 0.0968
Epoch: 006/010 | Batch 750/938 | Cost: 0.2215
Epoch: 006/010 | Batch 800/938 | Cost: 0.0328
Epoch: 006/010 | Batch 850/938 | Cost: 0.2423
Epoch: 006/010 | Batch 900/938 | Cost: 0.1192
Epoch: 006/010 training accuracy: 97.47%
Time elapsed: 1.48 min
Epoch: 007/010 | Batch 000/938 | Cost: 0.0126
Epoch: 007/010 | Batch 050/938 | Cost: 0.0735
Epoch: 007/010 | Batch 100/938 | Cost: 0.2426
Epoch: 007/010 | Batch 150/938 | Cost: 0.0736
Epoch: 007/010 | Batch 200/938 | Cost: 0.1387
Epoch: 007/010 | Batch 250/938 | Cost: 0.2173
Epoch: 007/010 | Batch 300/938 | Cost: 0.0127
Epoch: 007/010 | Batch 350/938 | Cost: 0.1131
Epoch: 007/010 | Batch 400/938 | Cost: 0.2219
Epoch: 007/010 | Batch 450/938 | Cost: 0.0127
Epoch: 007/010 | Batch 500/938 | Cost: 0.0905
Epoch: 007/010 | Batch 550/938 | Cost: 0.2466
Epoch: 007/010 | Batch 600/938 | Cost: 0.0065
Epoch: 007/010 | Batch 650/938 | Cost: 0.1477
Epoch: 007/010 | Batch 700/938 | Cost: 0.0183
Epoch: 007/010 | Batch 750/938 | Cost: 0.0534
Epoch: 007/010 | Batch 800/938 | Cost: 0.1139
Epoch: 007/010 | Batch 850/938 | Cost: 0.1177
Epoch: 007/010 | Batch 900/938 | Cost: 0.0662
Epoch: 007/010 training accuracy: 97.74%
Time elapsed: 1.72 min
Epoch: 008/010 | Batch 000/938 | Cost: 0.0276
Epoch: 008/010 | Batch 050/938 | Cost: 0.1275
Epoch: 008/010 | Batch 100/938 | Cost: 0.2151
Epoch: 008/010 | Batch 150/938 | Cost: 0.0204
Epoch: 008/010 | Batch 200/938 | Cost: 0.2154
Epoch: 008/010 | Batch 250/938 | Cost: 0.0271
Epoch: 008/010 | Batch 300/938 | Cost: 0.0523
Epoch: 008/010 | Batch 350/938 | Cost: 0.1604
Epoch: 008/010 | Batch 400/938 | Cost: 0.0888
Epoch: 008/010 | Batch 450/938 | Cost: 0.0045
Epoch: 008/010 | Batch 500/938 | Cost: 0.0288
Epoch: 008/010 | Batch 550/938 | Cost: 0.1140
Epoch: 008/010 | Batch 600/938 | Cost: 0.0849
Epoch: 008/010 | Batch 650/938 | Cost: 0.0216
Epoch: 008/010 | Batch 700/938 | Cost: 0.0294
Epoch: 008/010 | Batch 750/938 | Cost: 0.0995
Epoch: 008/010 | Batch 800/938 | Cost: 0.1159
Epoch: 008/010 | Batch 850/938 | Cost: 0.1599
Epoch: 008/010 | Batch 900/938 | Cost: 0.1317
Epoch: 008/010 training accuracy: 98.29%
Time elapsed: 1.97 min
Epoch: 009/010 | Batch 000/938 | Cost: 0.1071
Epoch: 009/010 | Batch 050/938 | Cost: 0.0580
Epoch: 009/010 | Batch 100/938 | Cost: 0.1777
Epoch: 009/010 | Batch 150/938 | Cost: 0.2850
Epoch: 009/010 | Batch 200/938 | Cost: 0.1229
Epoch: 009/010 | Batch 250/938 | Cost: 0.0672
Epoch: 009/010 | Batch 300/938 | Cost: 0.2009
Epoch: 009/010 | Batch 350/938 | Cost: 0.0110
Epoch: 009/010 | Batch 400/938 | Cost: 0.2604
Epoch: 009/010 | Batch 450/938 | Cost: 0.0801
Epoch: 009/010 | Batch 500/938 | Cost: 0.0092
Epoch: 009/010 | Batch 550/938 | Cost: 0.1360
Epoch: 009/010 | Batch 600/938 | Cost: 0.0664
Epoch: 009/010 | Batch 650/938 | Cost: 0.0886
Epoch: 009/010 | Batch 700/938 | Cost: 0.0630
Epoch: 009/010 | Batch 750/938 | Cost: 0.0784
Epoch: 009/010 | Batch 800/938 | Cost: 0.1736
Epoch: 009/010 | Batch 850/938 | Cost: 0.0855
Epoch: 009/010 | Batch 900/938 | Cost: 0.2815
Epoch: 009/010 training accuracy: 97.74%
Time elapsed: 2.21 min
Epoch: 010/010 | Batch 000/938 | Cost: 0.0024
Epoch: 010/010 | Batch 050/938 | Cost: 0.0497
Epoch: 010/010 | Batch 100/938 | Cost: 0.0888
Epoch: 010/010 | Batch 150/938 | Cost: 0.1719
Epoch: 010/010 | Batch 200/938 | Cost: 0.1729
Epoch: 010/010 | Batch 250/938 | Cost: 0.0543
Epoch: 010/010 | Batch 300/938 | Cost: 0.3770
Epoch: 010/010 | Batch 350/938 | Cost: 0.0270
Epoch: 010/010 | Batch 400/938 | Cost: 0.1400
Epoch: 010/010 | Batch 450/938 | Cost: 0.0526
Epoch: 010/010 | Batch 500/938 | Cost: 0.1984
Epoch: 010/010 | Batch 550/938 | Cost: 0.1677
Epoch: 010/010 | Batch 600/938 | Cost: 0.0550
Epoch: 010/010 | Batch 650/938 | Cost: 0.0294
Epoch: 010/010 | Batch 700/938 | Cost: 0.0465
Epoch: 010/010 | Batch 750/938 | Cost: 0.1103
Epoch: 010/010 | Batch 800/938 | Cost: 0.0272
Epoch: 010/010 | Batch 850/938 | Cost: 0.1376
Epoch: 010/010 | Batch 900/938 | Cost: 0.0279
Epoch: 010/010 training accuracy: 98.09%
Time elapsed: 2.46 min
Total Training Time: 2.46 min
In [7]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
Test accuracy: 96.80%

2 - Custom Lower and Upper Bounds

In [8]:
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)

#########################################################
#########################################################
### GRADIENT CLIPPING
for p in model.parameters():
    p.register_hook(lambda grad: torch.clamp(grad, -0.1, 1.0))
#########################################################
#########################################################
    
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  

###################################################################

start_time = time.time()
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)
            
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    with torch.set_grad_enabled(False):
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))
        
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Epoch: 001/010 | Batch 000/938 | Cost: 2.3054
Epoch: 001/010 | Batch 050/938 | Cost: 0.5977
Epoch: 001/010 | Batch 100/938 | Cost: 0.4369
Epoch: 001/010 | Batch 150/938 | Cost: 0.3053
Epoch: 001/010 | Batch 200/938 | Cost: 0.3661
Epoch: 001/010 | Batch 250/938 | Cost: 0.1908
Epoch: 001/010 | Batch 300/938 | Cost: 0.2845
Epoch: 001/010 | Batch 350/938 | Cost: 0.1928
Epoch: 001/010 | Batch 400/938 | Cost: 0.2715
Epoch: 001/010 | Batch 450/938 | Cost: 0.2338
Epoch: 001/010 | Batch 500/938 | Cost: 0.3923
Epoch: 001/010 | Batch 550/938 | Cost: 0.0973
Epoch: 001/010 | Batch 600/938 | Cost: 0.3142
Epoch: 001/010 | Batch 650/938 | Cost: 0.5024
Epoch: 001/010 | Batch 700/938 | Cost: 0.1549
Epoch: 001/010 | Batch 750/938 | Cost: 0.1906
Epoch: 001/010 | Batch 800/938 | Cost: 0.3325
Epoch: 001/010 | Batch 850/938 | Cost: 0.2060
Epoch: 001/010 | Batch 900/938 | Cost: 0.1301
Epoch: 001/010 training accuracy: 94.76%
Time elapsed: 0.24 min
Epoch: 002/010 | Batch 000/938 | Cost: 0.2553
Epoch: 002/010 | Batch 050/938 | Cost: 0.1858
Epoch: 002/010 | Batch 100/938 | Cost: 0.2514
Epoch: 002/010 | Batch 150/938 | Cost: 0.1413
Epoch: 002/010 | Batch 200/938 | Cost: 0.3071
Epoch: 002/010 | Batch 250/938 | Cost: 0.6133
Epoch: 002/010 | Batch 300/938 | Cost: 0.1657
Epoch: 002/010 | Batch 350/938 | Cost: 0.0828
Epoch: 002/010 | Batch 400/938 | Cost: 0.0733
Epoch: 002/010 | Batch 450/938 | Cost: 0.3012
Epoch: 002/010 | Batch 500/938 | Cost: 0.1857
Epoch: 002/010 | Batch 550/938 | Cost: 0.3618
Epoch: 002/010 | Batch 600/938 | Cost: 0.0777
Epoch: 002/010 | Batch 650/938 | Cost: 0.2648
Epoch: 002/010 | Batch 700/938 | Cost: 0.0242
Epoch: 002/010 | Batch 750/938 | Cost: 0.1050
Epoch: 002/010 | Batch 800/938 | Cost: 0.2148
Epoch: 002/010 | Batch 850/938 | Cost: 0.0817
Epoch: 002/010 | Batch 900/938 | Cost: 0.1354
Epoch: 002/010 training accuracy: 97.04%
Time elapsed: 0.49 min
Epoch: 003/010 | Batch 000/938 | Cost: 0.1346
Epoch: 003/010 | Batch 050/938 | Cost: 0.0825
Epoch: 003/010 | Batch 100/938 | Cost: 0.0771
Epoch: 003/010 | Batch 150/938 | Cost: 0.2360
Epoch: 003/010 | Batch 200/938 | Cost: 0.0730
Epoch: 003/010 | Batch 250/938 | Cost: 0.1499
Epoch: 003/010 | Batch 300/938 | Cost: 0.0410
Epoch: 003/010 | Batch 350/938 | Cost: 0.2091
Epoch: 003/010 | Batch 400/938 | Cost: 0.0738
Epoch: 003/010 | Batch 450/938 | Cost: 0.0889
Epoch: 003/010 | Batch 500/938 | Cost: 0.3630
Epoch: 003/010 | Batch 550/938 | Cost: 0.0312
Epoch: 003/010 | Batch 600/938 | Cost: 0.0782
Epoch: 003/010 | Batch 650/938 | Cost: 0.1753
Epoch: 003/010 | Batch 700/938 | Cost: 0.0286
Epoch: 003/010 | Batch 750/938 | Cost: 0.2166
Epoch: 003/010 | Batch 800/938 | Cost: 0.0627
Epoch: 003/010 | Batch 850/938 | Cost: 0.0204
Epoch: 003/010 | Batch 900/938 | Cost: 0.2867
Epoch: 003/010 training accuracy: 96.72%
Time elapsed: 0.73 min
Epoch: 004/010 | Batch 000/938 | Cost: 0.0207
Epoch: 004/010 | Batch 050/938 | Cost: 0.0499
Epoch: 004/010 | Batch 100/938 | Cost: 0.1858
Epoch: 004/010 | Batch 150/938 | Cost: 0.2015
Epoch: 004/010 | Batch 200/938 | Cost: 0.0285
Epoch: 004/010 | Batch 250/938 | Cost: 0.0029
Epoch: 004/010 | Batch 300/938 | Cost: 0.1746
Epoch: 004/010 | Batch 350/938 | Cost: 0.3149
Epoch: 004/010 | Batch 400/938 | Cost: 0.1773
Epoch: 004/010 | Batch 450/938 | Cost: 0.1013
Epoch: 004/010 | Batch 500/938 | Cost: 0.1665
Epoch: 004/010 | Batch 550/938 | Cost: 0.1540
Epoch: 004/010 | Batch 600/938 | Cost: 0.1822
Epoch: 004/010 | Batch 650/938 | Cost: 0.1506
Epoch: 004/010 | Batch 700/938 | Cost: 0.0224
Epoch: 004/010 | Batch 750/938 | Cost: 0.1400
Epoch: 004/010 | Batch 800/938 | Cost: 0.2262
Epoch: 004/010 | Batch 850/938 | Cost: 0.0679
Epoch: 004/010 | Batch 900/938 | Cost: 0.0020
Epoch: 004/010 training accuracy: 97.63%
Time elapsed: 0.98 min
Epoch: 005/010 | Batch 000/938 | Cost: 0.0508
Epoch: 005/010 | Batch 050/938 | Cost: 0.0585
Epoch: 005/010 | Batch 100/938 | Cost: 0.1441
Epoch: 005/010 | Batch 150/938 | Cost: 0.0862
Epoch: 005/010 | Batch 200/938 | Cost: 0.0284
Epoch: 005/010 | Batch 250/938 | Cost: 0.0977
Epoch: 005/010 | Batch 300/938 | Cost: 0.0565
Epoch: 005/010 | Batch 350/938 | Cost: 0.0272
Epoch: 005/010 | Batch 400/938 | Cost: 0.2603
Epoch: 005/010 | Batch 450/938 | Cost: 0.1202
Epoch: 005/010 | Batch 500/938 | Cost: 0.0612
Epoch: 005/010 | Batch 550/938 | Cost: 0.0833
Epoch: 005/010 | Batch 600/938 | Cost: 0.1666
Epoch: 005/010 | Batch 650/938 | Cost: 0.2642
Epoch: 005/010 | Batch 700/938 | Cost: 0.1884
Epoch: 005/010 | Batch 750/938 | Cost: 0.1608
Epoch: 005/010 | Batch 800/938 | Cost: 0.1029
Epoch: 005/010 | Batch 850/938 | Cost: 0.1178
Epoch: 005/010 | Batch 900/938 | Cost: 0.0709
Epoch: 005/010 training accuracy: 97.58%
Time elapsed: 1.23 min
Epoch: 006/010 | Batch 000/938 | Cost: 0.0642
Epoch: 006/010 | Batch 050/938 | Cost: 0.3518
Epoch: 006/010 | Batch 100/938 | Cost: 0.1134
Epoch: 006/010 | Batch 150/938 | Cost: 0.0821
Epoch: 006/010 | Batch 200/938 | Cost: 0.0645
Epoch: 006/010 | Batch 250/938 | Cost: 0.0486
Epoch: 006/010 | Batch 300/938 | Cost: 0.0972
Epoch: 006/010 | Batch 350/938 | Cost: 0.2861
Epoch: 006/010 | Batch 400/938 | Cost: 0.1126
Epoch: 006/010 | Batch 450/938 | Cost: 0.1479
Epoch: 006/010 | Batch 500/938 | Cost: 0.2181
Epoch: 006/010 | Batch 550/938 | Cost: 0.0674
Epoch: 006/010 | Batch 600/938 | Cost: 0.0705
Epoch: 006/010 | Batch 650/938 | Cost: 0.1032
Epoch: 006/010 | Batch 700/938 | Cost: 0.1529
Epoch: 006/010 | Batch 750/938 | Cost: 0.2484
Epoch: 006/010 | Batch 800/938 | Cost: 0.0432
Epoch: 006/010 | Batch 850/938 | Cost: 0.0821
Epoch: 006/010 | Batch 900/938 | Cost: 0.1152
Epoch: 006/010 training accuracy: 97.09%
Time elapsed: 1.47 min
Epoch: 007/010 | Batch 000/938 | Cost: 0.0418
Epoch: 007/010 | Batch 050/938 | Cost: 0.0527
Epoch: 007/010 | Batch 100/938 | Cost: 0.3778
Epoch: 007/010 | Batch 150/938 | Cost: 0.1742
Epoch: 007/010 | Batch 200/938 | Cost: 0.0725
Epoch: 007/010 | Batch 250/938 | Cost: 0.1187
Epoch: 007/010 | Batch 300/938 | Cost: 0.0980
Epoch: 007/010 | Batch 350/938 | Cost: 0.0077
Epoch: 007/010 | Batch 400/938 | Cost: 0.1274
Epoch: 007/010 | Batch 450/938 | Cost: 0.1387
Epoch: 007/010 | Batch 500/938 | Cost: 0.1959
Epoch: 007/010 | Batch 550/938 | Cost: 0.0874
Epoch: 007/010 | Batch 600/938 | Cost: 0.2559
Epoch: 007/010 | Batch 650/938 | Cost: 0.1413
Epoch: 007/010 | Batch 700/938 | Cost: 0.1285
Epoch: 007/010 | Batch 750/938 | Cost: 0.1931
Epoch: 007/010 | Batch 800/938 | Cost: 0.1151
Epoch: 007/010 | Batch 850/938 | Cost: 0.1889
Epoch: 007/010 | Batch 900/938 | Cost: 0.5518
Epoch: 007/010 training accuracy: 86.62%
Time elapsed: 1.72 min
Epoch: 008/010 | Batch 000/938 | Cost: 0.3283
Epoch: 008/010 | Batch 050/938 | Cost: 0.1818
Epoch: 008/010 | Batch 100/938 | Cost: 0.1827
Epoch: 008/010 | Batch 150/938 | Cost: 0.0844
Epoch: 008/010 | Batch 200/938 | Cost: 0.4017
Epoch: 008/010 | Batch 250/938 | Cost: 0.0129
Epoch: 008/010 | Batch 300/938 | Cost: 0.0155
Epoch: 008/010 | Batch 350/938 | Cost: 0.1844
Epoch: 008/010 | Batch 400/938 | Cost: 0.1146
Epoch: 008/010 | Batch 450/938 | Cost: 0.0566
Epoch: 008/010 | Batch 500/938 | Cost: 0.0895
Epoch: 008/010 | Batch 550/938 | Cost: 0.1851
Epoch: 008/010 | Batch 600/938 | Cost: 0.1134
Epoch: 008/010 | Batch 650/938 | Cost: 0.0838
Epoch: 008/010 | Batch 700/938 | Cost: 0.1157
Epoch: 008/010 | Batch 750/938 | Cost: 0.2275
Epoch: 008/010 | Batch 800/938 | Cost: 0.5753
Epoch: 008/010 | Batch 850/938 | Cost: 0.8735
Epoch: 008/010 | Batch 900/938 | Cost: 0.7114
Epoch: 008/010 training accuracy: 85.51%
Time elapsed: 1.97 min
Epoch: 009/010 | Batch 000/938 | Cost: 0.4851
Epoch: 009/010 | Batch 050/938 | Cost: 0.4595
Epoch: 009/010 | Batch 100/938 | Cost: 0.1939
Epoch: 009/010 | Batch 150/938 | Cost: 0.1813
Epoch: 009/010 | Batch 200/938 | Cost: 0.4969
Epoch: 009/010 | Batch 250/938 | Cost: 0.4874
Epoch: 009/010 | Batch 300/938 | Cost: 0.1605
Epoch: 009/010 | Batch 350/938 | Cost: 0.0899
Epoch: 009/010 | Batch 400/938 | Cost: 0.3318
Epoch: 009/010 | Batch 450/938 | Cost: 0.0524
Epoch: 009/010 | Batch 500/938 | Cost: 0.0215
Epoch: 009/010 | Batch 550/938 | Cost: 0.0997
Epoch: 009/010 | Batch 600/938 | Cost: 0.0541
Epoch: 009/010 | Batch 650/938 | Cost: 0.3480
Epoch: 009/010 | Batch 700/938 | Cost: 0.0736
Epoch: 009/010 | Batch 750/938 | Cost: 0.1682
Epoch: 009/010 | Batch 800/938 | Cost: 0.2877
Epoch: 009/010 | Batch 850/938 | Cost: 0.0539
Epoch: 009/010 | Batch 900/938 | Cost: 0.2708
Epoch: 009/010 training accuracy: 95.67%
Time elapsed: 2.21 min
Epoch: 010/010 | Batch 000/938 | Cost: 0.0531
Epoch: 010/010 | Batch 050/938 | Cost: 0.0453
Epoch: 010/010 | Batch 100/938 | Cost: 1.8852
Epoch: 010/010 | Batch 150/938 | Cost: 0.1455
Epoch: 010/010 | Batch 200/938 | Cost: 0.2089
Epoch: 010/010 | Batch 250/938 | Cost: 0.0155
Epoch: 010/010 | Batch 300/938 | Cost: 0.9183
Epoch: 010/010 | Batch 350/938 | Cost: 0.2231
Epoch: 010/010 | Batch 400/938 | Cost: 0.3704
Epoch: 010/010 | Batch 450/938 | Cost: 0.1086
Epoch: 010/010 | Batch 500/938 | Cost: 0.3775
Epoch: 010/010 | Batch 550/938 | Cost: 0.4196
Epoch: 010/010 | Batch 600/938 | Cost: 0.2836
Epoch: 010/010 | Batch 650/938 | Cost: 0.1170
Epoch: 010/010 | Batch 700/938 | Cost: 0.2631
Epoch: 010/010 | Batch 750/938 | Cost: 0.1400
Epoch: 010/010 | Batch 800/938 | Cost: 0.1048
Epoch: 010/010 | Batch 850/938 | Cost: 0.7937
Epoch: 010/010 | Batch 900/938 | Cost: 0.2107
Epoch: 010/010 training accuracy: 87.98%
Time elapsed: 2.46 min
Total Training Time: 2.46 min
In [9]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
Test accuracy: 86.94%

3 - Norm-clipping

In [10]:
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)

model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  

###################################################################

start_time = time.time()
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)
            
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        
        #########################################################
        #########################################################
        ### GRADIENT CLIPPING
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1., norm_type=2)
        #########################################################
        #########################################################
        
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    with torch.set_grad_enabled(False):
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))
        
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Epoch: 001/010 | Batch 000/938 | Cost: 2.3054
Epoch: 001/010 | Batch 050/938 | Cost: 0.5121
Epoch: 001/010 | Batch 100/938 | Cost: 0.3424
Epoch: 001/010 | Batch 150/938 | Cost: 0.2765
Epoch: 001/010 | Batch 200/938 | Cost: 0.5126
Epoch: 001/010 | Batch 250/938 | Cost: 0.1481
Epoch: 001/010 | Batch 300/938 | Cost: 0.2240
Epoch: 001/010 | Batch 350/938 | Cost: 0.1948
Epoch: 001/010 | Batch 400/938 | Cost: 0.0655
Epoch: 001/010 | Batch 450/938 | Cost: 0.1893
Epoch: 001/010 | Batch 500/938 | Cost: 0.4133
Epoch: 001/010 | Batch 550/938 | Cost: 0.0375
Epoch: 001/010 | Batch 600/938 | Cost: 0.2691
Epoch: 001/010 | Batch 650/938 | Cost: 0.3342
Epoch: 001/010 | Batch 700/938 | Cost: 0.1662
Epoch: 001/010 | Batch 750/938 | Cost: 0.0702
Epoch: 001/010 | Batch 800/938 | Cost: 0.4246
Epoch: 001/010 | Batch 850/938 | Cost: 0.2282
Epoch: 001/010 | Batch 900/938 | Cost: 0.0459
Epoch: 001/010 training accuracy: 94.99%
Time elapsed: 0.25 min
Epoch: 002/010 | Batch 000/938 | Cost: 0.2188
Epoch: 002/010 | Batch 050/938 | Cost: 0.3042
Epoch: 002/010 | Batch 100/938 | Cost: 0.1391
Epoch: 002/010 | Batch 150/938 | Cost: 0.1453
Epoch: 002/010 | Batch 200/938 | Cost: 0.3031
Epoch: 002/010 | Batch 250/938 | Cost: 0.1398
Epoch: 002/010 | Batch 300/938 | Cost: 0.0868
Epoch: 002/010 | Batch 350/938 | Cost: 0.1679
Epoch: 002/010 | Batch 400/938 | Cost: 0.0480
Epoch: 002/010 | Batch 450/938 | Cost: 0.2823
Epoch: 002/010 | Batch 500/938 | Cost: 0.2307
Epoch: 002/010 | Batch 550/938 | Cost: 0.1610
Epoch: 002/010 | Batch 600/938 | Cost: 0.0972
Epoch: 002/010 | Batch 650/938 | Cost: 0.3210
Epoch: 002/010 | Batch 700/938 | Cost: 0.0697
Epoch: 002/010 | Batch 750/938 | Cost: 0.0879
Epoch: 002/010 | Batch 800/938 | Cost: 0.2113
Epoch: 002/010 | Batch 850/938 | Cost: 0.2496
Epoch: 002/010 | Batch 900/938 | Cost: 0.2453
Epoch: 002/010 training accuracy: 96.15%
Time elapsed: 0.49 min
Epoch: 003/010 | Batch 000/938 | Cost: 0.1779
Epoch: 003/010 | Batch 050/938 | Cost: 0.0618
Epoch: 003/010 | Batch 100/938 | Cost: 0.0570
Epoch: 003/010 | Batch 150/938 | Cost: 0.2510
Epoch: 003/010 | Batch 200/938 | Cost: 0.1193
Epoch: 003/010 | Batch 250/938 | Cost: 0.2530
Epoch: 003/010 | Batch 300/938 | Cost: 0.1220
Epoch: 003/010 | Batch 350/938 | Cost: 0.2401
Epoch: 003/010 | Batch 400/938 | Cost: 0.0520
Epoch: 003/010 | Batch 450/938 | Cost: 0.0262
Epoch: 003/010 | Batch 500/938 | Cost: 0.2961
Epoch: 003/010 | Batch 550/938 | Cost: 0.0030
Epoch: 003/010 | Batch 600/938 | Cost: 0.1998
Epoch: 003/010 | Batch 650/938 | Cost: 0.1968
Epoch: 003/010 | Batch 700/938 | Cost: 0.0499
Epoch: 003/010 | Batch 750/938 | Cost: 0.1742
Epoch: 003/010 | Batch 800/938 | Cost: 0.1034
Epoch: 003/010 | Batch 850/938 | Cost: 0.0437
Epoch: 003/010 | Batch 900/938 | Cost: 0.1414
Epoch: 003/010 training accuracy: 97.30%
Time elapsed: 0.74 min
Epoch: 004/010 | Batch 000/938 | Cost: 0.1098
Epoch: 004/010 | Batch 050/938 | Cost: 0.0060
Epoch: 004/010 | Batch 100/938 | Cost: 0.3551
Epoch: 004/010 | Batch 150/938 | Cost: 0.3143
Epoch: 004/010 | Batch 200/938 | Cost: 0.0527
Epoch: 004/010 | Batch 250/938 | Cost: 0.0204
Epoch: 004/010 | Batch 300/938 | Cost: 0.0289
Epoch: 004/010 | Batch 350/938 | Cost: 0.2386
Epoch: 004/010 | Batch 400/938 | Cost: 0.0694
Epoch: 004/010 | Batch 450/938 | Cost: 0.1200
Epoch: 004/010 | Batch 500/938 | Cost: 0.0797
Epoch: 004/010 | Batch 550/938 | Cost: 0.0891
Epoch: 004/010 | Batch 600/938 | Cost: 0.3322
Epoch: 004/010 | Batch 650/938 | Cost: 0.1640
Epoch: 004/010 | Batch 700/938 | Cost: 0.1170
Epoch: 004/010 | Batch 750/938 | Cost: 0.2028
Epoch: 004/010 | Batch 800/938 | Cost: 0.2188
Epoch: 004/010 | Batch 850/938 | Cost: 0.0575
Epoch: 004/010 | Batch 900/938 | Cost: 0.0180
Epoch: 004/010 training accuracy: 96.86%
Time elapsed: 0.98 min
Epoch: 005/010 | Batch 000/938 | Cost: 0.0779
Epoch: 005/010 | Batch 050/938 | Cost: 0.1183
Epoch: 005/010 | Batch 100/938 | Cost: 0.1184
Epoch: 005/010 | Batch 150/938 | Cost: 0.0815
Epoch: 005/010 | Batch 200/938 | Cost: 0.0691
Epoch: 005/010 | Batch 250/938 | Cost: 0.0784
Epoch: 005/010 | Batch 300/938 | Cost: 0.1464
Epoch: 005/010 | Batch 350/938 | Cost: 0.1488
Epoch: 005/010 | Batch 400/938 | Cost: 0.2636
Epoch: 005/010 | Batch 450/938 | Cost: 0.0839
Epoch: 005/010 | Batch 500/938 | Cost: 0.1343
Epoch: 005/010 | Batch 550/938 | Cost: 0.0514
Epoch: 005/010 | Batch 600/938 | Cost: 0.1802
Epoch: 005/010 | Batch 650/938 | Cost: 0.0681
Epoch: 005/010 | Batch 700/938 | Cost: 0.0986
Epoch: 005/010 | Batch 750/938 | Cost: 0.0930
Epoch: 005/010 | Batch 800/938 | Cost: 0.1829
Epoch: 005/010 | Batch 850/938 | Cost: 0.1694
Epoch: 005/010 | Batch 900/938 | Cost: 0.0440
Epoch: 005/010 training accuracy: 97.22%
Time elapsed: 1.22 min
Epoch: 006/010 | Batch 000/938 | Cost: 0.0142
Epoch: 006/010 | Batch 050/938 | Cost: 0.3528
Epoch: 006/010 | Batch 100/938 | Cost: 0.0710
Epoch: 006/010 | Batch 150/938 | Cost: 0.0553
Epoch: 006/010 | Batch 200/938 | Cost: 0.0084
Epoch: 006/010 | Batch 250/938 | Cost: 0.1178
Epoch: 006/010 | Batch 300/938 | Cost: 0.1271
Epoch: 006/010 | Batch 350/938 | Cost: 0.0404
Epoch: 006/010 | Batch 400/938 | Cost: 0.1435
Epoch: 006/010 | Batch 450/938 | Cost: 0.1568
Epoch: 006/010 | Batch 500/938 | Cost: 0.2100
Epoch: 006/010 | Batch 550/938 | Cost: 0.0019
Epoch: 006/010 | Batch 600/938 | Cost: 0.1721
Epoch: 006/010 | Batch 650/938 | Cost: 0.0943
Epoch: 006/010 | Batch 700/938 | Cost: 0.0913
Epoch: 006/010 | Batch 750/938 | Cost: 0.1211
Epoch: 006/010 | Batch 800/938 | Cost: 0.0890
Epoch: 006/010 | Batch 850/938 | Cost: 0.0390
Epoch: 006/010 | Batch 900/938 | Cost: 0.0521
Epoch: 006/010 training accuracy: 97.79%
Time elapsed: 1.47 min
Epoch: 007/010 | Batch 000/938 | Cost: 0.0059
Epoch: 007/010 | Batch 050/938 | Cost: 0.0371
Epoch: 007/010 | Batch 100/938 | Cost: 0.2702
Epoch: 007/010 | Batch 150/938 | Cost: 0.1142
Epoch: 007/010 | Batch 200/938 | Cost: 0.0900
Epoch: 007/010 | Batch 250/938 | Cost: 0.1922
Epoch: 007/010 | Batch 300/938 | Cost: 0.0062
Epoch: 007/010 | Batch 350/938 | Cost: 0.0435
Epoch: 007/010 | Batch 400/938 | Cost: 0.0503
Epoch: 007/010 | Batch 450/938 | Cost: 0.1411
Epoch: 007/010 | Batch 500/938 | Cost: 0.1547
Epoch: 007/010 | Batch 550/938 | Cost: 0.1858
Epoch: 007/010 | Batch 600/938 | Cost: 0.0108
Epoch: 007/010 | Batch 650/938 | Cost: 0.0569
Epoch: 007/010 | Batch 700/938 | Cost: 0.0254
Epoch: 007/010 | Batch 750/938 | Cost: 0.0635
Epoch: 007/010 | Batch 800/938 | Cost: 0.2539
Epoch: 007/010 | Batch 850/938 | Cost: 0.1338
Epoch: 007/010 | Batch 900/938 | Cost: 0.3336
Epoch: 007/010 training accuracy: 98.25%
Time elapsed: 1.71 min
Epoch: 008/010 | Batch 000/938 | Cost: 0.0215
Epoch: 008/010 | Batch 050/938 | Cost: 0.2800
Epoch: 008/010 | Batch 100/938 | Cost: 0.2627
Epoch: 008/010 | Batch 150/938 | Cost: 0.0538
Epoch: 008/010 | Batch 200/938 | Cost: 0.2164
Epoch: 008/010 | Batch 250/938 | Cost: 0.0025
Epoch: 008/010 | Batch 300/938 | Cost: 0.0021
Epoch: 008/010 | Batch 350/938 | Cost: 0.1489
Epoch: 008/010 | Batch 400/938 | Cost: 0.0997
Epoch: 008/010 | Batch 450/938 | Cost: 0.0055
Epoch: 008/010 | Batch 500/938 | Cost: 0.0181
Epoch: 008/010 | Batch 550/938 | Cost: 0.1672
Epoch: 008/010 | Batch 600/938 | Cost: 0.0538
Epoch: 008/010 | Batch 650/938 | Cost: 0.0842
Epoch: 008/010 | Batch 700/938 | Cost: 0.0941
Epoch: 008/010 | Batch 750/938 | Cost: 0.0171
Epoch: 008/010 | Batch 800/938 | Cost: 0.0638
Epoch: 008/010 | Batch 850/938 | Cost: 0.2507
Epoch: 008/010 | Batch 900/938 | Cost: 0.0568
Epoch: 008/010 training accuracy: 98.31%
Time elapsed: 1.96 min
Epoch: 009/010 | Batch 000/938 | Cost: 0.0844
Epoch: 009/010 | Batch 050/938 | Cost: 0.1087
Epoch: 009/010 | Batch 100/938 | Cost: 0.0584
Epoch: 009/010 | Batch 150/938 | Cost: 0.0544
Epoch: 009/010 | Batch 200/938 | Cost: 0.0352
Epoch: 009/010 | Batch 250/938 | Cost: 0.0189
Epoch: 009/010 | Batch 300/938 | Cost: 0.0356
Epoch: 009/010 | Batch 350/938 | Cost: 0.1357
Epoch: 009/010 | Batch 400/938 | Cost: 0.2133
Epoch: 009/010 | Batch 450/938 | Cost: 0.0081
Epoch: 009/010 | Batch 500/938 | Cost: 0.0710
Epoch: 009/010 | Batch 550/938 | Cost: 0.0652
Epoch: 009/010 | Batch 600/938 | Cost: 0.0136
Epoch: 009/010 | Batch 650/938 | Cost: 0.0772
Epoch: 009/010 | Batch 700/938 | Cost: 0.0744
Epoch: 009/010 | Batch 750/938 | Cost: 0.0388
Epoch: 009/010 | Batch 800/938 | Cost: 0.0208
Epoch: 009/010 | Batch 850/938 | Cost: 0.0114
Epoch: 009/010 | Batch 900/938 | Cost: 0.0706
Epoch: 009/010 training accuracy: 97.76%
Time elapsed: 2.20 min
Epoch: 010/010 | Batch 000/938 | Cost: 0.0773
Epoch: 010/010 | Batch 050/938 | Cost: 0.0362
Epoch: 010/010 | Batch 100/938 | Cost: 0.0406
Epoch: 010/010 | Batch 150/938 | Cost: 0.0900
Epoch: 010/010 | Batch 200/938 | Cost: 0.3629
Epoch: 010/010 | Batch 250/938 | Cost: 0.0016
Epoch: 010/010 | Batch 300/938 | Cost: 0.0314
Epoch: 010/010 | Batch 350/938 | Cost: 0.0677
Epoch: 010/010 | Batch 400/938 | Cost: 0.0821
Epoch: 010/010 | Batch 450/938 | Cost: 0.0717
Epoch: 010/010 | Batch 500/938 | Cost: 0.2704
Epoch: 010/010 | Batch 550/938 | Cost: 0.1784
Epoch: 010/010 | Batch 600/938 | Cost: 0.0899
Epoch: 010/010 | Batch 650/938 | Cost: 0.0578
Epoch: 010/010 | Batch 700/938 | Cost: 0.1572
Epoch: 010/010 | Batch 750/938 | Cost: 0.0106
Epoch: 010/010 | Batch 800/938 | Cost: 0.0714
Epoch: 010/010 | Batch 850/938 | Cost: 0.0125
Epoch: 010/010 | Batch 900/938 | Cost: 0.0235
Epoch: 010/010 training accuracy: 98.38%
Time elapsed: 2.45 min
Total Training Time: 2.45 min
In [11]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
Test accuracy: 96.89%
In [12]:
%watermark -iv
numpy       1.16.4
torch       1.2.0
torchvision 0.4.0a0+6b959ee