DS4420: A note on multivariate differentiation

In the in-class exercise from 1/9, one step involved finding $\nabla_{\bf x} z$ where $z = {\bf x}^t {\bf x}$ and ${\bf x} \in \mathbb{R}^d$. Many of you broke up the dot product, trying something like $\sum_{i=1}^d \frac{d}{d{\bf x}_i} {\bf x}_i^2$. I think folks mostly had the right idea, but ended up collapsing the result into a single dimension by summing over the $2{\bf x}_i$ terms.

This is understandable; you may not be used to multivariate calculus (or may be rusty in general) -- so don't worry! Just remember that the gradient with respect to ${\bf x}$ needs to have the same dimension as ${\bf x}$! Hence the warning: Mind your dimensions!

Recall from lecture that when we are interested in $\nabla_{\bf x} f({\bf x})$, we are looking for a vector (a collection) of partial derivatives, one with respect to each ${\bf x}_i$.
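
Written out for ${\bf x} \in \mathbb{R}^d$, this is

$$\nabla_{\bf x} f({\bf x}) = \left[ \frac{\partial f}{\partial {\bf x}_1}, \frac{\partial f}{\partial {\bf x}_2}, \ldots, \frac{\partial f}{\partial {\bf x}_d} \right],$$

a vector with $d$ entries -- the same dimension as ${\bf x}$.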

Consider ${\bf x}_1$: $\frac{\partial z}{\partial {\bf x}_1}$. What is this? $\frac{\partial}{\partial {\bf x}_1} {\bf x}^t {\bf x} = \sum_{i=1}^d \frac{\partial}{\partial {\bf x}_1} {\bf x}_i^2$; the only term that won't be zeroed out here is the one with $i=1$: $\frac{\partial}{\partial {\bf x}_1}{\bf x}_1^2 = 2 {\bf x}_1$. Similarly, $\frac{\partial}{\partial {\bf x}_2}{\bf x}^t {\bf x} = \frac{\partial}{\partial {\bf x}_2}{\bf x}_2^2 = 2 {\bf x}_2$, and so on.

So we assemble each of these into a vector $\nabla_{\bf x} z$, where entry $i$ ends up being $2{\bf x}_i$: $\nabla_{\bf x} z = [2{\bf x}_1, ..., 2{\bf x}_d]$. Hence we have $\nabla_{\bf x} z = 2{\bf x}$.
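
Before the PyTorch check below, here is a minimal numpy sketch (not part of the original notebook; the dimension and values are arbitrary) that estimates each partial derivative with a central difference and compares the assembled vector to $2{\bf x}$:

In [ ]:
import numpy as np

# Central-difference sanity check of the derivation above.
# (Arbitrary dimension and random x -- just checking that the gradient is ~2x.)
d = 5
x = np.random.random(d)

def z(v):
    return np.dot(v, v)  # z = v^t v

eps = 1e-6
grad_approx = np.zeros(d)
for i in range(d):
    e = np.zeros(d)
    e[i] = eps  # perturb coordinate i only
    grad_approx[i] = (z(x + e) - z(x - e)) / (2 * eps)  # estimate of dz/dx_i

print(grad_approx)  # should be numerically close to...
print(2 * x)        # ...the closed-form gradient 2x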

In [2]:
import numpy as np
import torch
import torch.optim as optim
import torch.nn as nn
In [3]:
D = 10  # arbitrary dimension
x = torch.tensor(np.random.random(D), requires_grad=True)  # random x in R^D
z = torch.dot(x, x)  # z = x^t x
z.backward()  # populates x.grad with dz/dx
x_grad = x.grad
print(x_grad)
tensor([0.7032, 0.8064, 1.0841, 1.2363, 1.7031, 0.4266, 0.2907, 1.7737, 1.2545,
        0.6804], dtype=torch.float64)
In [4]:
print(x_grad.shape)
torch.Size([10])
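
As one last sanity check (this cell is not in the original notebook), we can confirm that autograd's result matches the closed form $2{\bf x}$ entrywise:

In [ ]:
# The autograd gradient should equal 2x elementwise.
print(torch.allclose(x_grad, 2 * x.detach()))  # expect: True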