DS4420

The joy of autograd; a brief primer on PyTorch for automatic differentiation.

Consider this exercise from the Mathematics for Machine Learning (MML) book.

(Exercise from MML: compute the gradient of $f(x) = \log(1 + x^\top x)$ with respect to $x$, and likewise for $f(x) = \sin(Ax + b)$.)

On the board we worked this out manually:

$\frac{2x}{1+z} = \frac{2x}{1+x^\top x}$

But this would get tedious; fortunately, modern libraries use automatic differentiation. In this class you'll want to get familiar with at least one such library (the obvious two choices being PyTorch and TensorFlow; I have a personal bias for the former).
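
Before diving into the exercise, here is the smallest possible warm-up (a toy sketch of my own, not from MML): autograd should recover $\frac{d}{dx} x^2 = 2x$, so the gradient at $x = 3$ is $6$.

In [ ]:
import torch

x0 = torch.tensor(3.0, requires_grad=True)  # scalar leaf variable
y = x0 ** 2                                 # y = x0^2
y.backward()                                # scalar output: implicit gradient of 1
print(x0.grad)                              # tensor(6.) = 2 * 3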

In [11]:
import numpy as np
import torch
import torch.optim as optim
import torch.nn as nn
In [12]:
D = 10 # arbitrary dimension.
x = torch.tensor(np.random.random(D), requires_grad=True)
print(x)
tensor([0.1801, 0.3444, 0.4006, 0.6215, 0.2568, 0.3863, 0.5832, 0.4745, 0.8343,
        0.0492], dtype=torch.float64, requires_grad=True)
In [13]:
z = torch.dot(x, x)
f = torch.log(1 + z)

So from our manual result above, we should end up with: $\frac{2x}{1 + z}$

In [14]:
(x / (1 + z))*2
Out[14]:
tensor([0.1134, 0.2168, 0.2522, 0.3913, 0.1617, 0.2432, 0.3672, 0.2987, 0.5252,
        0.0310], dtype=torch.float64, grad_fn=<MulBackward0>)

By taking a backward pass (the terminology comes from backprop, which we will discuss in some detail in later lectures), we collect the gradient automatically.

In [15]:
f.backward()
In [16]:
print(x.grad)
tensor([0.1134, 0.2168, 0.2522, 0.3913, 0.1617, 0.2432, 0.3672, 0.2987, 0.5252,
        0.0310], dtype=torch.float64)
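
As a quick sanity check (a small check I'm adding here), the autograd gradient should match the closed-form $\frac{2x}{1+z}$ computed above, up to numerical tolerance:

In [ ]:
# compare the autograd gradient to the closed-form 2x / (1 + z)
print(torch.allclose(x.grad, 2 * x / (1 + z)))  # expect True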

And now, part (b).

From our derivation we had:

$\cos(Ax + b) \cdot A$
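
To make the chain-rule step explicit (my own elaboration, using the notation above): with $z = Ax + b$, the Jacobian of $f = \sin(z)$ with respect to $x$ is $\frac{\partial f}{\partial x} = \text{diag}(\cos(Ax + b))\, A$. The expression above is the sum of its rows, which is exactly what we will recover below by calling `backward` with a vector of ones.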

In [33]:
D = 5 # arbitrary dimension.
x = torch.tensor(np.random.random(D), requires_grad=True)
print(x)
tensor([0.0143, 0.9790, 0.6805, 0.8771, 0.0953], dtype=torch.float64,
       requires_grad=True)
In [34]:
E = 10
A = torch.tensor(np.random.random((E, D)))
A
Out[34]:
tensor([[0.9670, 0.5487, 0.0196, 0.4444, 0.3262],
        [0.3794, 0.7918, 0.9998, 0.0907, 0.8979],
        [0.0238, 0.1552, 0.9408, 0.4540, 0.7213],
        [0.0461, 0.6931, 0.9383, 0.2263, 0.5735],
        [0.1622, 0.9763, 0.6564, 0.8489, 0.6759],
        [0.1780, 0.4534, 0.0268, 0.5234, 0.3507],
        [0.1702, 0.2508, 0.9425, 0.4099, 0.5830],
        [0.7704, 0.2154, 0.5592, 0.2581, 0.9374],
        [0.5028, 0.6831, 0.5410, 0.7700, 0.9633],
        [0.1757, 0.9128, 0.7975, 0.6231, 0.2041]], dtype=torch.float64)
In [35]:
b = torch.tensor(np.random.random((E)))
print(b)
tensor([0.2230, 0.8327, 0.9930, 0.4835, 0.0355, 0.7217, 0.7977, 0.3526, 0.9574,
        0.4611], dtype=torch.float64)
In [36]:
z = torch.mv(A, x) + b
f = torch.sin(z)
In [37]:
torch.sin(z)
Out[37]:
tensor([0.9350, 0.6309, 0.7764, 0.8853, 0.7785, 0.9942, 0.8621, 0.9553, 0.3643,
        0.6254], dtype=torch.float64, grad_fn=<SinBackward>)
In [38]:
z.shape[0]
Out[38]:
10
In [39]:
# `f` is vector-valued, so `backward` needs an explicit gradient argument.
# (For a scalar loss, `loss.backward()` implicitly uses a gradient of 1.)
# Passing a vector of ones computes the vector-Jacobian product 1^T J,
# i.e. it sums the gradients of the individual outputs.
f.backward(torch.ones_like(f))
x.grad
Out[39]:
tensor([-0.5727, -2.9134, -3.6513, -2.2281, -2.8390], dtype=torch.float64)
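
Equivalently (a sketch; note that the first `backward` call freed the computation graph, so we rebuild $f$ and clear the accumulated gradient first), summing the outputs and calling `backward()` on the resulting scalar gives the same gradient:

In [ ]:
x.grad = None                        # clear the gradient accumulated above
f2 = torch.sin(torch.mv(A, x) + b)   # rebuild the graph; the old one was freed by backward()
f2.sum().backward()                  # scalar output, so no gradient argument is needed
x.grad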

How does this compare to our bespoke solution?

In [40]:
torch.matmul(torch.cos(torch.mv(A, x) + b), A)
Out[40]:
tensor([-0.5727, -2.9134, -3.6513, -2.2281, -2.8390], dtype=torch.float64,
       grad_fn=<SqueezeBackward3>)

OK all is sane!
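
If you want the full Jacobian rather than just the sum of its rows, newer PyTorch versions expose `torch.autograd.functional.jacobian` (a sketch, assuming PyTorch >= 1.5 where that module is available):

In [ ]:
from torch.autograd.functional import jacobian

def g(v):
    # the same map as above: sin(Av + b)
    return torch.sin(torch.mv(A, v) + b)

J = jacobian(g, x)                           # shape (E, D) = (10, 5)
print(J.shape)
print(torch.allclose(J.sum(dim=0), x.grad))  # summing the rows of J matches the backward result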
