In the in-class exercise from 1/9, one step involved finding $\nabla_{\bf x} z$ where $z = {\bf x}^t {\bf x}$ and ${\bf x} \in \mathbb{R}^d$. Many of you broke up the dot product, writing something like $\sum_i^d \frac{d}{d {\bf x}_i} {\bf x}_i^2$. I think folks mostly had the right idea, but ended up collapsing into a single dimension by summing the $2 {\bf x}_i$ terms.

This is understandable; you may not be used to multivariate calc (or may be rusty in general) -- so don't worry! Just remember that the gradient with respect to ${\bf x}$ must have the same dimension as ${\bf x}$! Hence the warning: mind your dimensions!

Recall from lecture that when we are interested in $\nabla_{\bf x} f({\bf x})$, we are looking for a *vector* (collection) of partial derivatives, one for each ${\bf x}_i$.

Consider ${\bf x}_1$: what is $\frac{\partial z}{\partial {\bf x}_1}$? We have $\frac{\partial}{\partial {\bf x}_1} {\bf x}^t {\bf x} = \sum_i^d \frac{\partial}{\partial {\bf x}_1} {\bf x}_i^2$; the only term that won't be zeroed out here is $i=1$: $\frac{\partial}{\partial {\bf x}_1}{\bf x}_1^2 = 2 {\bf x}_1$. Similarly, $\frac{\partial}{\partial {\bf x}_2}{\bf x}^t {\bf x} = \frac{\partial}{\partial {\bf x}_2}{\bf x}_2^2 = 2 {\bf x}_2$, and so on.

So we assemble each of these into the gradient vector $\nabla_{\bf x} z$, where entry $i$ ends up being $2{\bf x}_i$: $\nabla_{\bf x} z = [2{\bf x}_1, \ldots, 2{\bf x}_d]$. Hence we have $\nabla_{\bf x} z = 2{\bf x}$.
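To convince yourself numerically, here is a quick finite-difference sketch (my addition, not part of the original exercise): approximate each partial derivative of $z = {\bf x}^t {\bf x}$ with a central difference and compare against the closed form $2{\bf x}$.

```python
import numpy as np

# Sanity check (illustrative sketch): compare finite-difference partials
# of z(x) = x . x against the hand-derived gradient 2x.
rng = np.random.default_rng(0)
d = 5
x = rng.random(d)

def z(v):
    return v @ v  # z = v^t v

eps = 1e-6
grad_fd = np.zeros(d)
for i in range(d):
    e = np.zeros(d)
    e[i] = eps
    # central difference approximation to dz/dx_i
    grad_fd[i] = (z(x + e) - z(x - e)) / (2 * eps)

print(np.allclose(grad_fd, 2 * x))  # → True
```

Note the approximate gradient is a vector with one entry per component of `x`, just as the derivation requires.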

In [2]:

```
import numpy as np
import torch
```

In [3]:

```
D = 10  # arbitrary dimension
x = torch.tensor(np.random.random(D), requires_grad=True)  # random x in R^D
z = torch.dot(x, x)  # z = x^t x
z.backward()         # populates x.grad with dz/dx
x_grad = x.grad
print(x_grad)
```
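As a sanity check (my addition), the autograd result should match the closed form $2{\bf x}$ from the derivation above, and have the same shape as ${\bf x}$:

```python
import numpy as np
import torch

# Recompute the gradient of z = x^t x and compare against the
# hand-derived answer 2x.
D = 10
x = torch.tensor(np.random.random(D), requires_grad=True)
z = torch.dot(x, x)
z.backward()

assert torch.allclose(x.grad, 2 * x.detach())  # gradient is 2x, componentwise
assert x.grad.shape == x.shape                 # mind your dimensions!
```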

In [4]:

```
print(x_grad.shape)
```
