Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch
Sebastian Raschka 

CPython 3.6.8
IPython 7.2.0

torch 1.0.0

Model Zoo -- CNN Gender Classifier (ResNet-50 Architecture, CelebA) with Data Parallelism

Network Architecture

The network in this notebook is an implementation of the ResNet-50 [1] architecture, trained as a gender classifier on the CelebA face dataset [2].

References

  • [1] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).

  • [2] Zhang, K., Tan, L., Li, Z., & Qiao, Y. (2016). Gender and smile classification using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 34-38).

The ResNet-50 architecture is similar to the ResNet-34 architecture described in [1], but the residual blocks use a bottleneck design.

In a plain residual block, the input passed via the shortcut matches the dimensions of the main path's output, which allows the network to learn identity functions. Where the dimensions change between stages, ResNet instead resizes the input passed via the shortcut to the dimensions of the main path's output, using a 1x1 convolution with stride 2 (a projection shortcut).

In the bottleneck block used by ResNet-50, the main path first reduces the number of channels with a 1x1 convolution, applies a 3x3 convolution to this narrower representation, and then expands the channels again with a second 1x1 convolution (by a factor of 4). This keeps the computation per block low despite the large channel counts.

For a more detailed explanation, see the companion notebook, resnet-ex-1.ipynb.
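As a back-of-the-envelope comparison (a minimal sketch; the 256/64 channel counts are illustrative), the bottleneck drastically reduces the number of weights per block:

# Weights in a basic block vs. a bottleneck block at 256 channels
# (bias-free convolutions, as in ResNet):
basic      = 2 * (3*3 * 256 * 256)            # two 3x3 convs -> 1,179,648
bottleneck = 256*64 + 3*3*64*64 + 64*256      # 1x1 reduce, 3x3, 1x1 expand -> 69,632
print(basic / bottleneck)                     # roughly 17x fewer weights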

Imports

In [2]:
import os
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from torchvision import datasets
from torchvision import transforms

import matplotlib.pyplot as plt
from PIL import Image


if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True

Settings

In [3]:
##########################
### SETTINGS
##########################

# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
NUM_EPOCHS = 10

# Architecture
NUM_FEATURES = 128*128
NUM_CLASSES = 2
BATCH_SIZE = 128*torch.cuda.device_count()  # 128 per GPU; assumes at least one visible GPU
DEVICE = 'cuda:0' # default GPU device
GRAYSCALE = False



Dataset

Downloading the Dataset

Note that the CelebA dataset of ~200,000 face images is relatively large (~1.3 GB). The download links below are provided by the authors on the official CelebA website at http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.

1) Download and unzip the file img_align_celeba.zip, which contains the images in JPEG format.

2) Download the list_attr_celeba.txt file, which contains the class labels.

3) Download the list_eval_partition.txt file, which contains the training/validation/test partitioning info.
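Before continuing, it is worth verifying that all three files ended up in the notebook's working directory (a minimal sketch; adjust the paths if you unzipped the archive elsewhere):

import os

# Check that the image folder and both annotation files are present:
for path in ('img_align_celeba', 'list_attr_celeba.txt', 'list_eval_partition.txt'):
    print(path, '->', 'found' if os.path.exists(path) else 'MISSING')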

Preparing the Dataset

In [4]:
df1 = pd.read_csv('list_attr_celeba.txt', sep=r"\s+", skiprows=1, usecols=['Male'])

# Make 0 (female) & 1 (male) labels instead of -1 & 1
df1.loc[df1['Male'] == -1, 'Male'] = 0

df1.head()
Out[4]:
Male
000001.jpg 0
000002.jpg 0
000003.jpg 1
000004.jpg 0
000005.jpg 0
In [5]:
df2 = pd.read_csv('list_eval_partition.txt', sep=r"\s+", skiprows=0, header=None)
df2.columns = ['Filename', 'Partition']
df2 = df2.set_index('Filename')

df2.head()
Out[5]:
Partition
Filename
000001.jpg 0
000002.jpg 0
000003.jpg 0
000004.jpg 0
000005.jpg 0
In [6]:
df3 = df1.merge(df2, left_index=True, right_index=True)
df3.head()
Out[6]:
Male Partition
000001.jpg 0 0
000002.jpg 0 0
000003.jpg 1 0
000004.jpg 0 0
000005.jpg 0 0
In [7]:
df3.to_csv('celeba-gender-partitions.csv')
df4 = pd.read_csv('celeba-gender-partitions.csv', index_col=0)
df4.head()
Out[7]:
Male Partition
000001.jpg 0 0
000002.jpg 0 0
000003.jpg 1 0
000004.jpg 0 0
000005.jpg 0 0
In [8]:
df4.loc[df4['Partition'] == 0].to_csv('celeba-gender-train.csv')
df4.loc[df4['Partition'] == 1].to_csv('celeba-gender-valid.csv')
df4.loc[df4['Partition'] == 2].to_csv('celeba-gender-test.csv')
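As a quick check of the split sizes (a minimal sketch; the counts follow the official partition file):

# Print the number of images in each partition
for name in ('train', 'valid', 'test'):
    n = len(pd.read_csv('celeba-gender-%s.csv' % name, index_col=0))
    print('%s: %d images' % (name, n))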
In [9]:
img = Image.open('img_align_celeba/000001.jpg')
print(np.asarray(img, dtype=np.uint8).shape)
plt.imshow(img);
(218, 178, 3)
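The aligned images are 218x178 pixels (height x width, with three color channels); the transform defined further below center-crops them to 178x178 and then downscales them to the 128x128 resolution the network expects.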

Implementing a Custom Dataset Class

In [10]:
class CelebaDataset(Dataset):
    """Custom Dataset for loading CelebA face images"""

    def __init__(self, csv_path, img_dir, transform=None):
    
        df = pd.read_csv(csv_path, index_col=0)
        self.img_dir = img_dir
        self.csv_path = csv_path
        self.img_names = df.index.values
        self.y = df['Male'].values
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_dir,
                                      self.img_names[index]))
        
        if self.transform is not None:
            img = self.transform(img)
        
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]
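Before wiring the dataset into DataLoaders, a single record can be fetched directly to verify that the paths and labels line up (a minimal sketch; without a transform, __getitem__ returns the raw PIL image):

# Fetch one (image, label) pair straight from the Dataset:
tmp_dataset = CelebaDataset(csv_path='celeba-gender-train.csv',
                            img_dir='img_align_celeba/')
img, label = tmp_dataset[0]
print(img.size, label)  # e.g., (178, 218) 0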
In [11]:
# Note that transforms.ToTensor()
# already divides pixels by 255. internally

custom_transform = transforms.Compose([transforms.CenterCrop((178, 178)),
                                       transforms.Resize((128, 128)),
                                       #transforms.Grayscale(),                                       
                                       #transforms.Lambda(lambda x: x/255.),
                                       transforms.ToTensor()])

train_dataset = CelebaDataset(csv_path='celeba-gender-train.csv',
                              img_dir='img_align_celeba/',
                              transform=custom_transform)

valid_dataset = CelebaDataset(csv_path='celeba-gender-valid.csv',
                              img_dir='img_align_celeba/',
                              transform=custom_transform)

test_dataset = CelebaDataset(csv_path='celeba-gender-test.csv',
                             img_dir='img_align_celeba/',
                             transform=custom_transform)


train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=True,
                          num_workers=4)

valid_loader = DataLoader(dataset=valid_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=False,
                          num_workers=4)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=BATCH_SIZE,
                         shuffle=False,
                         num_workers=4)
In [12]:
torch.manual_seed(0)

for epoch in range(2):

    for batch_idx, (x, y) in enumerate(train_loader):
        
        print('Epoch:', epoch+1, end='')
        print(' | Batch index:', batch_idx, end='')
        print(' | Batch size:', y.size()[0])
        
        x = x.to(DEVICE)
        y = y.to(DEVICE)
        time.sleep(1)
        break
Epoch: 1 | Batch index: 0 | Batch size: 512
Epoch: 2 | Batch index: 0 | Batch size: 512
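The reported batch size of 512 confirms the setting above: BATCH_SIZE was defined as 128 per GPU times torch.cuda.device_count(), and this run uses four GPUs. nn.DataParallel, set up in the next section, splits each such batch into four 128-image chunks, one per GPU.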

Model

The following code cell, which implements the ResNet-50 architecture, is a derivative of the code provided at https://pytorch.org/docs/0.4.0/_modules/torchvision/models/resnet.html.

In [13]:
##########################
### MODEL
##########################


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out




class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes, grayscale):
        self.inplanes = 64
        if grayscale:
            in_dim = 1
        else:
            in_dim = 3
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # for 128x128 inputs, layer4 yields 4x4 feature maps; the pooling
        # below (kernel 7, stride 1, padding 2) reduces them to 2x2
        self.avgpool = nn.AvgPool2d(7, stride=1, padding=2)
        # 2*2*2048 = 8192 = 2048 * block.expansion features into the classifier
        self.fc = nn.Linear(2048 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, (2. / n)**.5)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.fc(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas



def resnet50(num_classes, grayscale):
    """Constructs a ResNet-50 model."""
    model = ResNet(block=Bottleneck, 
                   layers=[3, 4, 6, 3],
                   num_classes=num_classes,
                   grayscale=grayscale)
    return model
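Because the inputs here are 128x128 rather than the 224x224 images used in [1], the feature map entering the average pooling is 4x4, which the pooling reduces to 2x2; the flattened vector therefore has 2*2*2048 = 8192 = 2048 * block.expansion entries, matching the fully connected layer. A quick shape check on a random input (a minimal sketch, runnable on CPU):

# Sanity-check the output shapes for a single 128x128 RGB input:
_model = resnet50(num_classes=2, grayscale=False)
_model.eval()  # use running batch-norm statistics for the single-sample check
_logits, _probas = _model(torch.randn(1, 3, 128, 128))
print(_logits.shape, _probas.shape)  # torch.Size([1, 2]) torch.Size([1, 2])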
In [14]:
torch.manual_seed(RANDOM_SEED)

##########################
### COST AND OPTIMIZER
##########################



model = resnet50(NUM_CLASSES, GRAYSCALE)


#### DATA PARALLEL START ####
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs")
    model = nn.DataParallel(model)
#### DATA PARALLEL END ####


model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  
Using 4 GPUs
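nn.DataParallel replicates the model on each visible GPU during the forward pass, scatters the input batch along its first dimension, and gathers the outputs back on the default device (cuda:0 here); the backward pass then accumulates the gradients from all replicas into the original model's parameters.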

Training

In [15]:
def compute_accuracy(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
            
        features = features.to(device)
        targets = targets.to(device)

        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
    

start_time = time.time()
for epoch in range(NUM_EPOCHS):
    
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
            
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f' 
                  % (epoch+1, NUM_EPOCHS, batch_idx, 
                     len(train_loader), cost))

        

    model.eval()
    with torch.set_grad_enabled(False): # save memory during inference
        print('Epoch: %03d/%03d | Train: %.3f%% | Valid: %.3f%%' % (
              epoch+1, NUM_EPOCHS, 
              compute_accuracy(model, train_loader, device=DEVICE),
              compute_accuracy(model, valid_loader, device=DEVICE)))
        
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Epoch: 001/010 | Batch 0000/0318 | Cost: 0.7110
Epoch: 001/010 | Batch 0050/0318 | Cost: 0.2674
Epoch: 001/010 | Batch 0100/0318 | Cost: 0.1290
Epoch: 001/010 | Batch 0150/0318 | Cost: 0.1481
Epoch: 001/010 | Batch 0200/0318 | Cost: 0.0997
Epoch: 001/010 | Batch 0250/0318 | Cost: 0.1753
Epoch: 001/010 | Batch 0300/0318 | Cost: 0.0891
Epoch: 001/010 | Train: 94.134% | Valid: 94.478%
Time elapsed: 3.88 min
Epoch: 002/010 | Batch 0000/0318 | Cost: 0.1089
Epoch: 002/010 | Batch 0050/0318 | Cost: 0.0767
Epoch: 002/010 | Batch 0100/0318 | Cost: 0.0772
Epoch: 002/010 | Batch 0150/0318 | Cost: 0.0767
Epoch: 002/010 | Batch 0200/0318 | Cost: 0.0857
Epoch: 002/010 | Batch 0250/0318 | Cost: 0.1162
Epoch: 002/010 | Batch 0300/0318 | Cost: 0.0885
Epoch: 002/010 | Train: 97.543% | Valid: 97.418%
Time elapsed: 7.64 min
Epoch: 003/010 | Batch 0000/0318 | Cost: 0.0712
Epoch: 003/010 | Batch 0050/0318 | Cost: 0.0749
Epoch: 003/010 | Batch 0100/0318 | Cost: 0.0626
Epoch: 003/010 | Batch 0150/0318 | Cost: 0.0785
Epoch: 003/010 | Batch 0200/0318 | Cost: 0.0838
Epoch: 003/010 | Batch 0250/0318 | Cost: 0.0342
Epoch: 003/010 | Batch 0300/0318 | Cost: 0.0614
Epoch: 003/010 | Train: 97.869% | Valid: 97.559%
Time elapsed: 11.41 min
Epoch: 004/010 | Batch 0000/0318 | Cost: 0.0599
Epoch: 004/010 | Batch 0050/0318 | Cost: 0.0360
Epoch: 004/010 | Batch 0100/0318 | Cost: 0.0597
Epoch: 004/010 | Batch 0150/0318 | Cost: 0.0482
Epoch: 004/010 | Batch 0200/0318 | Cost: 0.0533
Epoch: 004/010 | Batch 0250/0318 | Cost: 0.0642
Epoch: 004/010 | Batch 0300/0318 | Cost: 0.0456
Epoch: 004/010 | Train: 98.215% | Valid: 97.629%
Time elapsed: 15.16 min
Epoch: 005/010 | Batch 0000/0318 | Cost: 0.0662
Epoch: 005/010 | Batch 0050/0318 | Cost: 0.0551
Epoch: 005/010 | Batch 0100/0318 | Cost: 0.0672
Epoch: 005/010 | Batch 0150/0318 | Cost: 0.0475
Epoch: 005/010 | Batch 0200/0318 | Cost: 0.0452
Epoch: 005/010 | Batch 0250/0318 | Cost: 0.0444
Epoch: 005/010 | Batch 0300/0318 | Cost: 0.0242
Epoch: 005/010 | Train: 98.503% | Valid: 97.826%
Time elapsed: 18.90 min
Epoch: 006/010 | Batch 0000/0318 | Cost: 0.0268
Epoch: 006/010 | Batch 0050/0318 | Cost: 0.0272
Epoch: 006/010 | Batch 0100/0318 | Cost: 0.0235
Epoch: 006/010 | Batch 0150/0318 | Cost: 0.0317
Epoch: 006/010 | Batch 0200/0318 | Cost: 0.0520
Epoch: 006/010 | Batch 0250/0318 | Cost: 0.0575
Epoch: 006/010 | Batch 0300/0318 | Cost: 0.0702
Epoch: 006/010 | Train: 97.707% | Valid: 97.458%
Time elapsed: 22.67 min
Epoch: 007/010 | Batch 0000/0318 | Cost: 0.0414
Epoch: 007/010 | Batch 0050/0318 | Cost: 0.0319
Epoch: 007/010 | Batch 0100/0318 | Cost: 0.0201
Epoch: 007/010 | Batch 0150/0318 | Cost: 0.0375
Epoch: 007/010 | Batch 0200/0318 | Cost: 0.0270
Epoch: 007/010 | Batch 0250/0318 | Cost: 0.0323
Epoch: 007/010 | Batch 0300/0318 | Cost: 0.0256
Epoch: 007/010 | Train: 98.801% | Valid: 97.891%
Time elapsed: 26.44 min
Epoch: 008/010 | Batch 0000/0318 | Cost: 0.0163
Epoch: 008/010 | Batch 0050/0318 | Cost: 0.0303
Epoch: 008/010 | Batch 0100/0318 | Cost: 0.0285
Epoch: 008/010 | Batch 0150/0318 | Cost: 0.0217
Epoch: 008/010 | Batch 0200/0318 | Cost: 0.0481
Epoch: 008/010 | Batch 0250/0318 | Cost: 0.0198
Epoch: 008/010 | Batch 0300/0318 | Cost: 0.0312
Epoch: 008/010 | Train: 98.960% | Valid: 98.032%
Time elapsed: 30.20 min
Epoch: 009/010 | Batch 0000/0318 | Cost: 0.0182
Epoch: 009/010 | Batch 0050/0318 | Cost: 0.0204
Epoch: 009/010 | Batch 0100/0318 | Cost: 0.0472
Epoch: 009/010 | Batch 0150/0318 | Cost: 0.0239
Epoch: 009/010 | Batch 0200/0318 | Cost: 0.0161
Epoch: 009/010 | Batch 0250/0318 | Cost: 0.0384
Epoch: 009/010 | Batch 0300/0318 | Cost: 0.0197
Epoch: 009/010 | Train: 99.349% | Valid: 98.012%
Time elapsed: 33.96 min
Epoch: 010/010 | Batch 0000/0318 | Cost: 0.0238
Epoch: 010/010 | Batch 0050/0318 | Cost: 0.0251
Epoch: 010/010 | Batch 0100/0318 | Cost: 0.0144
Epoch: 010/010 | Batch 0150/0318 | Cost: 0.0133
Epoch: 010/010 | Batch 0200/0318 | Cost: 0.0441
Epoch: 010/010 | Batch 0250/0318 | Cost: 0.0358
Epoch: 010/010 | Batch 0300/0318 | Cost: 0.0277
Epoch: 010/010 | Train: 99.374% | Valid: 97.966%
Time elapsed: 37.70 min
Total Training Time: 37.70 min

Evaluation

In [16]:
with torch.set_grad_enabled(False): # save memory during inference
    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader, device=DEVICE)))
Test accuracy: 97.40%
In [17]:
# grab the first batch from the test loader for visualization
for features, targets in test_loader:
    break

plt.imshow(np.transpose(features[0], (1, 2, 0)))
Out[17]:
<matplotlib.image.AxesImage at 0x7f6e036358d0>
In [18]:
model.eval()
logits, probas = model(features.to(DEVICE)[0, None])
print('Probability Female %.2f%%' % (probas[0][0]*100))
Probability Female 100.00%
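Here probas[0][0] is the softmax probability of class 0, which corresponds to the female label defined during dataset preparation (probas[0][1] would be the probability of the male label).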
In [19]:
%watermark -iv
numpy       1.15.4
pandas      0.23.4
torch       1.0.0
PIL.Image   5.3.0