In [ ]:

#default_exp style_transfer

Before we run anything we want to ensure we have a P100 GPU:

In [ ]:

!nvidia-smi

Sat Feb 15 20:18:30 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P0    36W / 250W |   1795MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

If not, run the following:

(NOTE: This may change due to Colab's new policy)

In [ ]:

import torch
data = torch.ones(1000,1000,1000).cuda()

05 - Style Transfer¶

This notebook is largely based on the work by Lucas Vazquez on the fastai forum

What will we cover?¶

Custom loss functions
Custom models
Utilizing the nb_dev library
Deployment Code

Intro to `nbdev`¶

Library by Jeremy and Sylvain for writing libraries
Used to make the entire fastai library
Converts a ipynb to documentation as well as .py with only specific cells
Few base terminology:
- #default_exp
- #hide
- #export

In [ ]:

#hide
#Run once per session
!pip install fastai

In [ ]:

#hide
from nbdev.showdoc import *

We always need the showdoc to export

fastai libraries¶

In [ ]:

#export
from fastai.vision.all import *

What is Style Transfer?¶

Feature Loss
- Aka perception loss
Image Transformation network
- Input images -> Output images
Pre-Trained Loss Network
- Perceptual loss functions
  - A measurement of the differences in content and style between two images

alt text

Source: https://arxiv.org/abs/1603.08155

Our Pre-Trained Network:¶

vgg-19:
- Small CNN pre-trained on ImageNet
- Only 19 layers deep

In [ ]:

#export
from torchvision.models import vgg19, vgg16

In [ ]:

feat_net = vgg19(pretrained=True).features.cuda().eval()

We'll get rid of the head and use the internal activations (and our generator model's loss). As a result, we want to set every layer to un-trainable

In [ ]:

for p in feat_net.parameters(): p.requries_grad=False

We will be using feature detections that our model picks up, which is like our heatmaps generated for our Classification models

In [ ]:

layers = [feat_net[i] for i in [1, 6, 11, 20, 29, 22]]; layers

Out[ ]:

[ReLU(inplace=True),
 ReLU(inplace=True),
 ReLU(inplace=True),
 ReLU(inplace=True),
 ReLU(inplace=True),
 ReLU(inplace=True)]

The outputs are ReLU layers. Below is a configuration for the 16 and 19 models

In [ ]:

#export
_vgg_config = {
    'vgg16' : [1, 11, 18, 25, 20],
    'vgg19' : [1, 6, 11, 20, 29, 22]
}

Let's write a quick get_layers function to grab our network and the layers

In [ ]:

#export 
def _get_layers(arch:str, pretrained=True):
  "Get the layers and arch for a VGG Model (16 and 19 are supported only)"
  feat_net = vgg19(pretrained=pretrained).cuda() if arch.find('9') > 1 else vgg16(pretrained=pretrained).cuda()
  config = _vgg_config.get(arch)
  features = feat_net.features.cuda().eval()
  for p in features.parameters(): p.requires_grad=False
  return feat_net, [features[i] for i in config]

Now let's make it all in one go utilizing our private functions to pass in an architecture name and a pretrained parameter

In [ ]:

#export
def get_feats(arch:str, pretrained=True):
  "Get the features of an architecture"
  feat_net, layers = _get_layers(arch, pretrained)
  hooks = hook_outputs(layers, detach=False)
  def _inner(x):
    feat_net(x)
    return hooks.stored
  return _inner

In [ ]:

feats = get_feats('vgg19')

The Loss Function¶

Our loss fuction needs:

Our original image
Some artwork / style
Activation features from our encoder

What image will we be using?

alt text

Let's grab the image

In [ ]:

#export
url = 'https://static.greatbigcanvas.com/images/singlecanvas_thick_none/megan-aroon-duncanson/little-village-abstract-art-house-painting,1162125.jpg'

In [ ]:

!wget {url} -O 'style.jpg'

--2020-02-15 20:18:14--  https://static.greatbigcanvas.com/images/singlecanvas_thick_none/megan-aroon-duncanson/little-village-abstract-art-house-painting,1162125.jpg
Resolving static.greatbigcanvas.com (static.greatbigcanvas.com)... 3.212.96.207, 52.73.94.154
Connecting to static.greatbigcanvas.com (static.greatbigcanvas.com)|3.212.96.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 445921 (435K) [image/jpeg]
Saving to: ‘style.jpg’

style.jpg           100%[===================>] 435.47K  --.-KB/s    in 0.04s   

2020-02-15 20:18:14 (9.65 MB/s) - ‘style.jpg’ saved [445921/445921]

In [ ]:

fn = 'style.jpg'

We can now make a PipeLine to convert our image into a Tensor to use in our loss function. We'll want to use the Datasets for this

In [ ]:

dset = Datasets(fn, tfms=[PILImage.create])

In [ ]:

dl = dset.dataloaders(after_item=[ToTensor()], after_batch=[IntToFloatTensor(), Normalize.from_stats(*imagenet_stats)], bs=1)

In [ ]:

dl.show_batch(figsize=(7,7))

In [ ]:

style_im = dl.one_batch()[0]

In [ ]:

style_im.shape

Out[ ]:

torch.Size([1, 3, 824, 1000])

In [ ]:

#export
def get_style_im(url):
  download_url(url, 'style.jpg')
  fn = 'style.jpg'
  dset = Datasets(fn, tfms=[PILImage.create])
  dl = dset.dataloaders(after_item=[ToTensor()], after_batch=[IntToFloatTensor(), Normalize.from_stats(*imagenet_stats)], bs=1)
  return dl.one_batch()[0]

We can then grab the features using our feats function we made earlier

In [ ]:

im_feats = feats(style_im)

Let's look at their sizes

In [ ]:

for feat in im_feats:
  print(feat.shape)

torch.Size([1, 64, 824, 1000])
torch.Size([1, 128, 412, 500])
torch.Size([1, 256, 206, 250])
torch.Size([1, 512, 103, 125])
torch.Size([1, 512, 51, 62])
torch.Size([1, 512, 103, 125])

Now we can bring those images down to the channel size

In [ ]:

#export
def gram(x:Tensor):
  "Transpose a tensor based on c,w,h"
  n, c, h, w = x.shape
  x = x.view(n, c, -1)
  return (x @ x.transpose(1, 2))/(c*w*h)

In [ ]:

im_grams = [gram(f) for f in im_feats]

In [ ]:

for feat in im_grams:
  print(feat.shape)

torch.Size([1, 64, 64])
torch.Size([1, 128, 128])
torch.Size([1, 256, 256])
torch.Size([1, 512, 512])
torch.Size([1, 512, 512])
torch.Size([1, 512, 512])

In [ ]:

#export
def get_stl_fs(fs): return fs[:-1]

We're almost there! Let's look at why that was important

In [ ]:

#export
def style_loss(inp:Tensor, out_feat:Tensor):
  "Calculate style loss, assumes we have `im_grams`"
  # Get batch size
  bs = inp[0].shape[0]
  loss = []
  # For every item in our inputs
  for y, f in zip(*map(get_stl_fs, [im_grams, inp])):
    # Calculate MSE
    loss.append(F.mse_loss(y.repeat(bs, 1, 1), gram(f)))
  # Multiply their sum by 30000
  return 3e5 * sum(loss)

Great, so what now? Let's make a loss function for fastai!

Remember, we do not care to use any initial metrics

In [ ]:

#export
class FeatureLoss(Module):
  "Combines two losses and features into a useable loss function"
  def __init__(self, feats, style_loss, act_loss):
    store_attr(self, 'feats, style_loss, act_loss')
    self.reset_metrics()

  def forward(self, pred, targ):
    # First get the features of our prediction and target
    pred_feat, targ_feat = self.feats(pred), self.feats(targ)
    # Calculate style and activation loss
    style_loss = self.style_loss(pred_feat, targ_feat)
    act_loss = self.act_loss(pred_feat, targ_feat)
    # Store the loss
    self._add_loss(style_loss, act_loss)
    # Return the sum
    return style_loss + act_loss

  def reset_metrics(self):
    # Generates a blank metric
    self.metrics = dict(style = [], content = [])

  def _add_loss(self, style_loss, act_loss):
    # Add to our metrics
    self.metrics['style'].append(style_loss)
    self.metrics['content'].append(act_loss)

In [ ]:

#export
def act_loss(inp:Tensor, targ:Tensor):
  "Calculate the MSE loss of the activation layers"
  return F.mse_loss(inp[-1], targ[-1])

Let's declare our loss function by passing in our features and our two 'mini' loss functions

In [ ]:

loss_func = FeatureLoss(feats, style_loss, act_loss)

The Model Architecture¶

Let's now build our model

In [ ]:

#export
class ReflectionLayer(Module):
    "A series of Reflection Padding followed by a ConvLayer"
    def __init__(self, in_channels, out_channels, ks=3, stride=2):
        reflection_padding = ks // 2
        self.reflection_pad = nn.ReflectionPad2d(reflection_padding)
        self.conv2d = nn.Conv2d(in_channels, out_channels, ks, stride)

    def forward(self, x):
        out = self.reflection_pad(x)
        out = self.conv2d(out)
        return out

In [ ]:

ReflectionLayer(3, 3)

Out[ ]:

ReflectionLayer(
  (reflection_pad): ReflectionPad2d((1, 1, 1, 1))
  (conv2d): Conv2d(3, 3, kernel_size=(3, 3), stride=(2, 2))
)

In [ ]:

#export
class ResidualBlock(Module):
    "Two reflection layers and an added activation function with residual"
    def __init__(self, channels):
          self.conv1 = ReflectionLayer(channels, channels, ks=3, stride=1)
          self.in1 = nn.InstanceNorm2d(channels, affine=True)
          self.conv2 = ReflectionLayer(channels, channels, ks=3, stride=1)
          self.in2 = nn.InstanceNorm2d(channels, affine=True)
          self.relu = nn.ReLU()

    def forward(self, x):
          residual = x
          out = self.relu(self.in1(self.conv1(x)))
          out = self.in2(self.conv2(out))
          out = out + residual
          return out

In [ ]:

ResidualBlock(3)

Out[ ]:

ResidualBlock(
  (conv1): ReflectionLayer(
    (reflection_pad): ReflectionPad2d((1, 1, 1, 1))
    (conv2d): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1))
  )
  (in1): InstanceNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (conv2): ReflectionLayer(
    (reflection_pad): ReflectionPad2d((1, 1, 1, 1))
    (conv2d): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1))
  )
  (in2): InstanceNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (relu): ReLU()
)

In [ ]:

#export
class UpsampleConvLayer(Module):
    "Upsample with a ReflectionLayer"
    def __init__(self, in_channels, out_channels, ks=3, stride=1, upsample=None):
        self.upsample = upsample
        reflection_padding = ks // 2
        self.reflection_pad = nn.ReflectionPad2d(reflection_padding)
        self.conv2d = nn.Conv2d(in_channels, out_channels, ks, stride)

    def forward(self, x):
        x_in = x
        if self.upsample:
            x_in = torch.nn.functional.interpolate(x_in, mode='nearest', scale_factor=self.upsample)
        out = self.reflection_pad(x_in)
        out = self.conv2d(out)
        return out

Let's put everything together into a model

In [ ]:

#export
class TransformerNet(Module):
    "A simple network for style transfer"
    def __init__(self):
        # Initial convolution layers
        self.conv1 = ReflectionLayer(3, 32, ks=9, stride=1)
        self.in1 = nn.InstanceNorm2d(32, affine=True)
        self.conv2 = ReflectionLayer(32, 64, ks=3, stride=2)
        self.in2 = nn.InstanceNorm2d(64, affine=True)
        self.conv3 = ReflectionLayer(64, 128, ks=3, stride=2)
        self.in3 = nn.InstanceNorm2d(128, affine=True)
        # Residual layers
        self.res1 = ResidualBlock(128)
        self.res2 = ResidualBlock(128)
        self.res3 = ResidualBlock(128)
        self.res4 = ResidualBlock(128)
        self.res5 = ResidualBlock(128)
        # Upsampling Layers
        self.deconv1 = UpsampleConvLayer(128, 64, ks=3, stride=1, upsample=2)
        self.in4 = nn.InstanceNorm2d(64, affine=True)
        self.deconv2 = UpsampleConvLayer(64, 32, ks=3, stride=1, upsample=2)
        self.in5 = nn.InstanceNorm2d(32, affine=True)
        self.deconv3 = ReflectionLayer(32, 3, ks=9, stride=1)
        # Non-linearities
        self.relu = nn.ReLU()

    def forward(self, X):
        y = self.relu(self.in1(self.conv1(X)))
        y = self.relu(self.in2(self.conv2(y)))
        y = self.relu(self.in3(self.conv3(y)))
        y = self.res1(y)
        y = self.res2(y)
        y = self.res3(y)
        y = self.res4(y)
        y = self.res5(y)
        y = self.relu(self.in4(self.deconv1(y)))
        y = self.relu(self.in5(self.deconv2(y)))
        y = self.deconv3(y)
        return y

In [ ]:

net = TransformerNet()

DataLoaders and Learner¶

We will be using the COCO_SAMPLE dataset

In [ ]:

path = untar_data(URLs.COCO_SAMPLE)

Our DataBlock needs to be Image -> Image

In [ ]:

dblock = DataBlock(blocks=(ImageBlock, ImageBlock),
                   get_items=get_image_files,
                   splitter=RandomSplitter(0.1, seed=42),
                   item_tfms=[Resize(224)],
                   batch_tfms=[Normalize.from_stats(*imagenet_stats)])

If you do not pass in a get_y, fastai will assume your input = output

In [ ]:

dls = dblock.dataloaders(path, bs=22)

In [ ]:

dls.show_batch()

We now can make our Learner!

In [ ]:

learn = Learner(dls, TransformerNet(), loss_func=loss_func)

In [ ]:

learn.summary()

Out[ ]:

TransformerNet (Input shape: ['22 x 3 x 224 x 224'])
================================================================
Layer (type)         Output Shape         Param #    Trainable 
================================================================
ReflectionPad2d      22 x 3 x 232 x 232   0          False     
________________________________________________________________
Conv2d               22 x 32 x 224 x 224  7,808      True      
________________________________________________________________
InstanceNorm2d       22 x 32 x 224 x 224  64         True      
________________________________________________________________
ReflectionPad2d      22 x 32 x 226 x 226  0          False     
________________________________________________________________
Conv2d               22 x 64 x 112 x 112  18,496     True      
________________________________________________________________
InstanceNorm2d       22 x 64 x 112 x 112  128        True      
________________________________________________________________
ReflectionPad2d      22 x 64 x 114 x 114  0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   73,856     True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReLU                 22 x 128 x 56 x 56   0          False     
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReLU                 22 x 128 x 56 x 56   0          False     
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReLU                 22 x 128 x 56 x 56   0          False     
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReLU                 22 x 128 x 56 x 56   0          False     
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReflectionPad2d      22 x 128 x 58 x 58   0          False     
________________________________________________________________
Conv2d               22 x 128 x 56 x 56   147,584    True      
________________________________________________________________
InstanceNorm2d       22 x 128 x 56 x 56   256        True      
________________________________________________________________
ReLU                 22 x 128 x 56 x 56   0          False     
________________________________________________________________
ReflectionPad2d      22 x 128 x 114 x 11  0          False     
________________________________________________________________
Conv2d               22 x 64 x 112 x 112  73,792     True      
________________________________________________________________
InstanceNorm2d       22 x 64 x 112 x 112  128        True      
________________________________________________________________
ReflectionPad2d      22 x 64 x 226 x 226  0          False     
________________________________________________________________
Conv2d               22 x 32 x 224 x 224  18,464     True      
________________________________________________________________
InstanceNorm2d       22 x 32 x 224 x 224  64         True      
________________________________________________________________
ReflectionPad2d      22 x 32 x 232 x 232  0          False     
________________________________________________________________
Conv2d               22 x 3 x 224 x 224   7,779      True      
________________________________________________________________
ReLU                 22 x 32 x 224 x 224  0          False     
________________________________________________________________

Total params: 1,679,235
Total trainable params: 1,679,235
Total non-trainable params: 0

Optimizer used: <function Adam at 0x7fe082f1ef28>
Loss function: FeatureLoss()

Callbacks:
  - TrainEvalCallback
  - Recorder
  - ProgressCallback

Let's find a learning rate and fit for one epoch

In [ ]:

learn.lr_find()

In [ ]:

learn.fit_one_cycle(1, 1e-3)

epoch	train_loss	valid_loss	time
0	28.079075	28.128078	07:40

And take a look at some of our results!

In [ ]:

learn.show_results()

In [ ]:

learn.save('stage1')

Now let's try learn.predict

In [ ]:

pred = learn.predict('cat.jpg')

In [ ]:

pred[0].show()

Out[ ]:

<matplotlib.axes._subplots.AxesSubplot at 0x7fe014c07e10>

Well while that looks cool, we lost a lot of the features! How can we fix this? Let's try something similar to what we did for our style_im

In [ ]:

learn.load('stage1')

Out[ ]:

<fastai.learner.Learner at 0x7ff7890267f0>

In [ ]:

dset = Datasets('cat.jpg', tfms=[PILImage.create])
dl = dset.dataloaders(after_item=[ToTensor()], after_batch=[IntToFloatTensor(), Normalize.from_stats(*imagenet_stats)], bs=1)

In [ ]:

t_im = dl.one_batch()[0]

In [ ]:

with torch.no_grad():
  res = learn.model(t_im)

Now let's try that again

In [ ]:

TensorImage(res[0]).show()

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Out[ ]:

<matplotlib.axes._subplots.AxesSubplot at 0x7ff788d839e8>

Much better! But the colours seem 'off', because the output activations had not been 'decoded' with the reverse-tfms before being shown as an image, so we will do that below.

In [ ]:

dec_res = dl.decode_batch(tuplify(res))[0][0]
dec_res.show();

In [ ]:

learn.save('224')

Now we can increase our size to 512 similar to how we could do in the segmentation example (this is homework, we will not do this as the epoch will take ~40 minutes)

Homework¶

In [ ]:

dblock = DataBlock(blocks=(ImageBlock, ImageBlock),
                   get_items=get_image_files,
                   splitter=RandomSplitter(0.1, seed=42),
                   item_tfms=[Resize(448)],
                   batch_tfms=[Normalize.from_stats(*imagenet_stats)])

In [ ]:

dls = dblock.dataloaders(path, bs=8)

In [ ]:

learn = Learner(dls, net, loss_func=loss_func).load('224')

In [ ]:

learn.fit_one_cycle(1, 1e-3)

In [ ]:

learn.show_results()

In [ ]:

learn.save('final')

Rest of the lesson¶

Let's export our model to use.

In [ ]:

learn.loss_func = CrossEntropyLossFlat()

In [ ]:

learn.export('myModel')

From here: Download the notebook and upload it to the main

Export¶

In [ ]:

#hide
from nbdev.imports import *
from nbdev.export import reset_nbdev_module, notebook2script

create_config('myLib', user='muellerzr', path='.', cfg_name='settings.ini')
cfg = Config(cfg_name='settings.ini')

In [ ]:

reset_nbdev_module()

In [ ]:

#hide
from nbdev.export import notebook2script
notebook2script('05_StyleTransfer.ipynb')

Converted 05_StyleTransfer.ipynb.

In [ ]:

05 - Style Transfer¶

What will we cover?¶

Intro to nbdev¶

fastai libraries¶

What is Style Transfer?¶

Our Pre-Trained Network:¶

The Loss Function¶

The Model Architecture¶

DataLoaders and Learner¶

Homework¶

Rest of the lesson¶

Export¶

Intro to `nbdev`¶