We will consider some common operations for matrices:
# Import torch and other required modules
import torch
Since matrix multiplication is not commutative, we need to take care of the order of the factors.
# Example 1
A = torch.randn(2, 3)
B = torch.randn(3, 4)
torch.mm(A, B)
tensor([[ 0.1024, -0.6093,  0.2240, -0.1838],
        [-0.7241, -0.7749,  0.2628,  1.9635]])
However, if you change the order, you obtain an error:
# Example 2 - breaking
torch.mm(B, A)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-56583f172700> in <module>
      1 # Example 2 - breaking
----> 2 torch.mm(B, A)

RuntimeError: size mismatch, m1: [3 x 4], m2: [2 x 3] at C:\Users\builder\AppData\Local\Temp\pip-req-build-9msmi1s9\aten\src\TH/generic/THTensorMath.cpp:197
torch.mm does not broadcast. For broadcasting matrix products, see torch.matmul().
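As a quick illustration of that difference (a minimal sketch; the batch shapes are chosen just for this example):

# torch.matmul broadcasts over leading batch dimensions; torch.mm would reject these
batch_A = torch.randn(10, 2, 3)
batch_B = torch.randn(10, 3, 4)
print(torch.matmul(batch_A, batch_B).shape)  # torch.Size([10, 2, 4])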
If your keyboard has the '@' character, you can write the same product more concisely:
# Example 3
A @ B
tensor([[-0.5379, -0.1539, -0.2042, -0.5444],
        [ 0.0280, -0.8160,  0.5760,  2.8664]])
When we transpose a matrix we transform rows to columns, and columns to rows.
# Example 1
torch.t(A)
tensor([[-0.7157,  1.2350],
        [ 0.0141, -1.2724],
        [ 0.1572,  0.7450]])
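As a side note, the method form A.t() and, in recent PyTorch versions, the attribute A.T (for 2-D tensors) are equivalent shorthands:

# Equivalent ways to transpose a 2-D tensor
print(torch.equal(torch.t(A), A.t()))  # True
print(torch.equal(torch.t(A), A.T))    # True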
Again, you should be careful, because $(AB)^T \neq A^T B^T$; indeed:
# Example 2
C = torch.randn(3, 3)
D = torch.randn(3, 3)
torch.t(C @ D) - (torch.t(C) @ torch.t(D))
tensor([[ 1.5273,  2.2196,  0.6171],
        [-0.0384, -1.0361, -1.0175],
        [ 2.1029, -0.2353, -0.4913]])
The correct identity is $(AB)^T = B^T A^T$:
# Example 3
torch.t(C @ D) - (torch.t(D) @ torch.t(C))
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
In machine learning we often have to minimize various types of error functions.
The derivative of a multivariable, vector-valued function is a matrix, the so-called Jacobian matrix; for a scalar-valued function it reduces to the gradient.
If you want PyTorch to calculate derivatives with respect to a tensor, you have to request this when defining the tensor, using requires_grad=True.
In Example 1 we use the vector norm (in fact, the Frobenius norm, which coincides with the Euclidean norm for vectors).
# Example 1
# Create tensors.
A = torch.randn(3, 3, requires_grad=True)
x = torch.randn(3, 1, requires_grad=True)
b = torch.randn(3, 1, requires_grad=True)
y = torch.norm(A @ x - b, p='fro')
print("A: \n", A)
print("x: \n", x)
print("b: \n", b)
print("y: \n", y)
A:
 tensor([[-0.3062, -0.5272,  1.2869],
        [-1.8781, -1.9212,  0.1393],
        [-1.0218,  1.2361, -0.1057]], requires_grad=True)
x:
 tensor([[ 0.5530],
        [-0.6164],
        [ 0.8250]], requires_grad=True)
b:
 tensor([[-1.0424],
        [-1.3509],
        [ 0.8377]], requires_grad=True)
y:
 tensor(3.5743, grad_fn=<NormBackward0>)
Now we calculate the derivatives $\partial y/\partial A$, $\partial y/\partial x$, $\partial y/\partial b$.
To compute the derivatives, we call the .backward() method on our result y.
# Compute derivatives
y.backward()
# Display gradients
print('∂y/∂A: \n', A.grad)
print('∂y/∂x: \n', x.grad)
print('∂y/∂b: \n', b.grad)
∂y/∂A:
 tensor([[ 0.3496, -0.3897,  0.5216],
        [ 0.2493, -0.2780,  0.3720],
        [-0.3484,  0.3884, -0.5198]])
∂y/∂x:
 tensor([[-0.3966],
        [-1.9784],
        [ 0.9430]])
∂y/∂b:
 tensor([[-0.6322],
        [-0.4509],
        [ 0.6301]])
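As a sanity check, we can compare these values with the analytic formulas: writing $r = Ax - b$, the gradients of $y = \|r\|_2$ are $\partial y/\partial x = A^T r/\|r\|_2$, $\partial y/\partial b = -r/\|r\|_2$ and $\partial y/\partial A = (r/\|r\|_2)\,x^T$. A minimal sketch:

# Compare autograd's results with the closed-form gradients of the 2-norm
r = (A @ x - b).detach()
A0 = A.detach()
print(torch.allclose(x.grad, torch.t(A0) @ r / torch.norm(r)))  # True
print(torch.allclose(b.grad, -r / torch.norm(r)))               # True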
Instead of the Frobenius norm we can choose any $p$-norm with $1 \le p < \infty$. In the next example we choose $p=1$.
# Example 2
w = torch.norm(A @ x - b, p=1)
# Compute derivatives
w.backward()
# Display gradients
print('∂w/∂A: \n', A.grad)
print('∂w/∂x: \n', x.grad)
print('∂w/∂b: \n', b.grad)
∂w/∂A:
 tensor([[ 0.9026, -1.0062,  1.3467],
        [ 0.8023, -0.8944,  1.1970],
        [-0.9014,  1.0048, -1.3449]])
∂w/∂x:
 tensor([[-1.5591],
        [-5.6630],
        [ 2.4750]])
∂w/∂b:
 tensor([[-1.6322],
        [-1.4509],
        [ 1.6301]])
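A caveat about the numbers above: .backward() accumulates gradients into .grad, so these printouts still include the gradients from Example 1. For instance, $\partial w/\partial b$ for the $1$-norm alone is $-\mathrm{sign}(Ax - b)$, here $[-1, -1, 1]^T$; the printed values are that plus Example 1's $[-0.6322, -0.4509, 0.6301]^T$. To get fresh gradients, zero them before the next backward pass:

# Reset accumulated gradients before calling backward() again
for t in (A, x, b):
    if t.grad is not None:
        t.grad.zero_()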
Finally we would like the infinity norm, $p=\infty$. The call below fails, but only because the name inf is not defined (it is not a Python built-in):
# Example 3 - breaking
z = torch.norm(A @ x - b, p=inf)
# Compute derivatives
z.backward()
# Display gradients
print('∂z/∂A: \n', A.grad)
print('∂z/∂x: \n', x.grad)
print('∂z/∂b: \n', b.grad)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-125-e4f599068e51> in <module>
      1 # Example 3 - breaking
----> 2 z = torch.norm(A @ x - b, p=inf)
      3
      4 # Compute derivatives
      5 z.backward()

NameError: name 'inf' is not defined
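The straightforward fix is to define inf first; a sketch (assuming the gradients have been zeroed as above — torch.norm accepts p=inf and returns the maximum absolute entry):

# Example 3 - fixed
from math import inf             # or pass p=float('inf') directly
z = torch.norm(A @ x - b, p=inf)
z.backward()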
When we work with linear operators (matrices), it is often a must to determine their eigenvalues and eigenvectors. To do this, we use torch.eig:
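(In recent PyTorch releases torch.eig has been deprecated in favor of torch.linalg.eig, which returns the eigenvalues and eigenvectors as complex tensors directly; the examples below use the older API.)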
# Example 1
(eigvalues, eigvectors) = torch.eig(A, eigenvectors=True)
for i in range(3):
    print('eigenvalue: ', eigvalues[i])
    print('eigvector: ', eigvectors[i])
    print('\n')
eigenvalue:  tensor([1.1316, 0.0000], grad_fn=<SelectBackward>)
eigvector:  tensor([-0.9053,  0.4609,  0.0952], grad_fn=<SelectBackward>)

eigenvalue:  tensor([-0.2300,  0.0000], grad_fn=<SelectBackward>)
eigvector:  tensor([-0.1106, -0.8415, -0.8702], grad_fn=<SelectBackward>)

eigenvalue:  tensor([-0.9825,  0.0000], grad_fn=<SelectBackward>)
eigvector:  tensor([-0.4100,  0.2818, -0.4834], grad_fn=<SelectBackward>)
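One subtlety worth flagging: torch.eig returns the eigenvectors as the columns of the second tensor, so the loop above actually prints its rows. For a real eigenvalue, the i-th eigenvector is eigvectors[:, i]; a quick check (detaching, since A tracks gradients):

# Column i of eigvectors is the eigenvector belonging to eigvalues[i]: A v = λ v
v = eigvectors[:, 0].detach()
lam = eigvalues[0, 0].detach()  # real part; the imaginary part is 0 here
print(torch.allclose(A.detach() @ v, lam * v, atol=1e-5))  # True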
Since an eigenvalue can be complex, each eigenvalue is returned as a pair: the first element is its real part and the second its imaginary part.
Here is an example where the matrix has one eigenvalue, repeated three times:
# Example 2
C = torch.tensor([[0., -1., 0], [4., 4., 0], [2., 1., 2.]])
(eigvalues, eigvectors) = torch.eig(C, eigenvectors=True)
for i in range(3):
    print('eigenvalue: ', eigvalues[i])
    print('eigenvector: ', eigvectors[i])
    print('\n')
eigenvalue:  tensor([2., 0.])
eigenvector:  tensor([ 0.0000, -0.4472,  0.4082])

eigenvalue:  tensor([2.0000, 0.0000])
eigenvector:  tensor([ 0.0000,  0.8944, -0.8165])

eigenvalue:  tensor([2.0000, 0.0000])
eigenvector:  tensor([ 1.0000,  0.0000, -0.4082])
Only square matrices can have eigenvalues.
# Example 3
D = torch.tensor([[0., -1., 0], [4., 4., 0]])
(eigvalues, eigvectors) = torch.eig(D, eigenvectors=True)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-114-dfab705752e7> in <module>
      2 D = torch.tensor([[0., -1., 0], [4., 4., 0]])
      3
----> 4 (eigvalues, eigvectors) = torch.eig(D, eigenvectors=True)

RuntimeError: invalid argument 1: A should be square at C:\Users\builder\AppData\Local\Temp\pip-req-build-9msmi1s9\aten\src\TH/generic/THTensorLapack.cpp:195
Many times we have to minimize the quantity $\|AX - B\|_2$, where $A$ and $B$ are given matrices and $X$ is the one we want to compute. For this we use the torch.lstsq function; note the argument order: the right-hand side $B$ is passed first.
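As an aside, recent PyTorch releases deprecate torch.lstsq in favor of torch.linalg.lstsq, which takes the arguments in the more conventional order, roughly:

# Newer API sketch (recent PyTorch releases only):
# X = torch.linalg.lstsq(A, B).solution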
# Example 1
A = torch.randn(3, 4)
B = torch.randn(3, 1) # a vector
print("A: \n", A)
print("B: \n", B)
print("\n")
X, _ = torch.lstsq(B, A)
error = torch.norm(A @ X - B, p=2)
print("The solution of the least square problem is: \n\n", X)
print("\n")
print("The error is: ", error)
A:
 tensor([[-0.2769,  1.0674,  0.6662, -1.5984],
        [ 0.8249,  1.3140, -0.0192, -0.3846],
        [ 2.4818,  0.5684, -1.7736,  0.6815]])
B:
 tensor([[ 0.4494],
        [-0.9212],
        [-1.1869]])

The solution of the least square problem is:

 tensor([[-0.2753],
        [-0.7950],
        [-0.3150],
        [-0.8957]])

The error is:  tensor(1.8849e-07)
Now consider the previous example, but measure the error with the $p=1$ norm (the solution itself is unchanged; only the way we measure the residual differs).
# Example 2
A = torch.randn(3, 4)
B = torch.randn(3, 1) # a vector
print("A: \n", A)
print("B: \n", B)
print("\n")
X, _ = torch.lstsq(B, A)
error = torch.norm(A @ X - B, p=1)
print("The solution of the least square problem is: \n\n", X)
print("\n")
print("The error is: ", error)
A:
 tensor([[-0.0644, -0.7350,  0.9334, -0.2362],
        [-0.5515, -0.2604,  0.7796, -1.1156],
        [-1.2727,  1.4277,  1.5215,  1.3682]])
B:
 tensor([[-1.8821],
        [-0.5725],
        [-0.3240]])

The solution of the least square problem is:

 tensor([[-0.3108],
        [ 1.2100],
        [-1.1999],
        [-0.4541]])

The error is:  tensor(8.3447e-07)
The error is greater, as expected: the $1$-norm of a vector is never smaller than its $2$-norm. Both values are essentially floating-point noise, though, since with more unknowns than equations the system has an exact solution.
Note that in the next example torch.lstsq itself succeeds even though $A$ is $5 \times 4$: the returned $X$ has $\max(m, n)$ rows, of which only the first $n$ contain the solution (the remaining rows carry residual information). Multiplying $A$ by the full $5 \times 1$ result therefore produces a 'size mismatch' error:
# Example 3 - breaking
A = torch.randn(5, 4)
B = torch.randn(5, 1) # a vector
print("A: \n", A)
print("B: \n", B)
print("\n")
X, _ = torch.lstsq(B, A)
error = torch.norm(A @ X - B, p=1)
print("The solution of the least square problem is: \n\n", X)
print("\n")
print("The error is: ", error)
A:
 tensor([[-0.7085, -0.5104,  0.8651, -1.5750],
        [-0.6783, -0.8917, -0.3745,  0.8802],
        [-0.1210,  0.4588,  1.7196, -1.7847],
        [ 0.5258, -0.0043,  1.3598, -0.2012],
        [-0.3974,  0.0806,  0.2112, -0.7372]])
B:
 tensor([[ 2.1218],
        [ 2.0587],
        [ 0.4716],
        [-0.7330],
        [ 1.3271]])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-133-0c463341c127> in <module>
      8 X, _ = torch.lstsq(B, A)
      9
---> 10 error = torch.norm(A @ X - B, p=1)
     11 print("The solution of the least square problem is: \n\n", X)
     12 print("\n")

RuntimeError: size mismatch, m1: [5 x 4], m2: [5 x 1] at C:\Users\builder\AppData\Local\Temp\pip-req-build-9msmi1s9\aten\src\TH/generic/THTensorMath.cpp:197
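The fix is to slice off those extra rows before using the solution; a small sketch under the same setup:

# Only the first n = A.shape[1] rows of X contain the least-squares solution
X_sol = X[:A.shape[1]]
error = torch.norm(A @ X_sol - B, p=1)
print("The error is: ", error)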
We have considered some basic linear-algebra problems that PyTorch can solve very effectively, and we will use them many times later on.
Official documentation for torch.Tensor: https://pytorch.org/docs/stable/tensors.html
Official documentation for torch.autograd: https://pytorch.org/docs/stable/autograd.html