We will consider some common operations for matrices:
# Import torch and other required modules
import torch
Since matrix multiplication is not commutative, we need to take care of the order of the factors.
# Example 1
A = torch.randn(2, 3)
B = torch.randn(3, 4)
torch.mm(A, B)
tensor([[ 0.1024, -0.6093,  0.2240, -0.1838],
        [-0.7241, -0.7749,  0.2628,  1.9635]])
However, if you change the order, you obtain an error:
# Example 2 - breaking
torch.mm(B, A)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-56583f172700> in <module>
      1 # Example 2 - breaking
----> 2 torch.mm(B, A)

RuntimeError: size mismatch, m1: [3 x 4], m2: [2 x 3] at C:\Users\builder\AppData\Local\Temp\pip-req-build-9msmi1s9\aten\src\TH/generic/THTensorMath.cpp:197
torch.mm does not broadcast. For broadcasting matrix products, see torch.matmul().
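As a quick illustration of that difference (a minimal sketch; the batch shapes are chosen just for this example):

# torch.matmul broadcasts over leading batch dimensions; torch.mm would reject these
batch_A = torch.randn(10, 2, 3)
batch_B = torch.randn(10, 3, 4)
print(torch.matmul(batch_A, batch_B).shape)  # torch.Size([10, 2, 4])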
If your keyboard has the '@' character, you can write the same product more concisely:
# Example 3
A @ B
tensor([[-0.5379, -0.1539, -0.2042, -0.5444],
        [ 0.0280, -0.8160,  0.5760,  2.8664]])
When we transpose a matrix we transform rows to columns, and columns to rows.
# Example 1
torch.t(A)
tensor([[-0.7157,  1.2350],
        [ 0.0141, -1.2724],
        [ 0.1572,  0.7450]])
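As a side note, the method form A.t() and, in recent PyTorch versions, the attribute A.T (for 2-D tensors) are equivalent shorthands:

# Equivalent ways to transpose a 2-D tensor
print(torch.equal(torch.t(A), A.t()))  # True
print(torch.equal(torch.t(A), A.T))    # True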
Again, you should be careful, because $(AB)^T \neq A^T B^T$; indeed:
# Example 2
C = torch.randn(3, 3)
D = torch.randn(3, 3)
torch.t(C @ D) - (torch.t(C) @ torch.t(D))
tensor([[ 1.5273,  2.2196,  0.6171],
        [-0.0384, -1.0361, -1.0175],
        [ 2.1029, -0.2353, -0.4913]])
The correct identity is $(AB)^T = B^T A^T$:
# Example 3
torch.t(C @ D) - (torch.t(D) @ torch.t(C))
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
In machine learning we often have to minimize various types of error functions.
The derivative of a multivariable, vector-valued function is a matrix, the so-called Jacobian matrix; for a scalar-valued function it reduces to the gradient.
If you want PyTorch to calculate derivatives with respect to a tensor, you have to request this when defining the tensor, using requires_grad=True.
In Example 1 we use the vector norm (in fact, the Frobenius norm, which coincides with the Euclidean norm for vectors).
# Example 1
# Create tensors.
A = torch.randn(3, 3, requires_grad=True)
x = torch.randn(3, 1, requires_grad=True)
b = torch.randn(3, 1, requires_grad=True)
y = torch.norm(A @ x - b, p='fro')
print("A: \n", A)
print("x: \n", x)
print("b: \n", b)
print("y: \n", y)
A:
 tensor([[-0.3062, -0.5272,  1.2869],
        [-1.8781, -1.9212,  0.1393],
        [-1.0218,  1.2361, -0.1057]], requires_grad=True)
x:
 tensor([[ 0.5530],
        [-0.6164],
        [ 0.8250]], requires_grad=True)
b:
 tensor([[-1.0424],
        [-1.3509],
        [ 0.8377]], requires_grad=True)
y:
 tensor(3.5743, grad_fn=<NormBackward0>)
Now we calculate the derivatives $\partial y/\partial A$, $\partial y/\partial x$, $\partial y/\partial b$.
To compute the derivatives, we call the .backward() method on our result y.
# Compute derivatives
y.backward()
# Display gradients
print('∂y/∂A: \n', A.grad)
print('∂y/∂x: \n', x.grad)
print('∂y/∂b: \n', b.grad)
∂y/∂A:
 tensor([[ 0.3496, -0.3897,  0.5216],
        [ 0.2493, -0.2780,  0.3720],
        [-0.3484,  0.3884, -0.5198]])
∂y/∂x:
 tensor([[-0.3966],
        [-1.9784],
        [ 0.9430]])
∂y/∂b:
 tensor([[-0.6322],
        [-0.4509],
        [ 0.6301]])
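As a sanity check, we can compare these values with the analytic formulas: writing $r = Ax - b$, the gradients of $y = \|r\|_2$ are $\partial y/\partial x = A^T r/\|r\|_2$, $\partial y/\partial b = -r/\|r\|_2$ and $\partial y/\partial A = (r/\|r\|_2)\,x^T$. A minimal sketch:

# Compare autograd's results with the closed-form gradients of the 2-norm
r = (A @ x - b).detach()
A0 = A.detach()
print(torch.allclose(x.grad, torch.t(A0) @ r / torch.norm(r)))  # True
print(torch.allclose(b.grad, -r / torch.norm(r)))               # True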
Instead of the Frobenius norm we can choose any $p$-norm with $1 \le p < \infty$. In the next example we choose $p=1$.
# Example 2
w = torch.norm(A @ x - b, p=1)
# Compute derivatives
w.backward()
# Display gradients
print('∂w/∂A: \n', A.grad)
print('∂w/∂x: \n', x.grad)
print('∂w/∂b: \n', b.grad)
∂w/∂A:
 tensor([[ 0.9026, -1.0062,  1.3467],
        [ 0.8023, -0.8944,  1.1970],
        [-0.9014,  1.0048, -1.3449]])
∂w/∂x:
 tensor([[-1.5591],
        [-5.6630],
        [ 2.4750]])
∂w/∂b:
 tensor([[-1.6322],
        [-1.4509],
        [ 1.6301]])
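A caveat about the numbers above: .backward() accumulates gradients into .grad, so these printouts still include the gradients from Example 1. For instance, $\partial w/\partial b$ for the $1$-norm alone is $-\mathrm{sign}(Ax - b)$, here $[-1, -1, 1]^T$; the printed values are that plus Example 1's $[-0.6322, -0.4509, 0.6301]^T$. To get fresh gradients, zero them before the next backward pass:

# Reset accumulated gradients before calling backward() again
for t in (A, x, b):
    if t.grad is not None:
        t.grad.zero_()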
Finally we would like the infinity norm, $p=\infty$. The call below fails, but only because the name inf is not defined (it is not a Python built-in):
# Example 3 - breaking
z = torch.norm(A @ x - b, p=inf)
# Compute derivatives
z.backward()
# Display gradients
print('∂z/∂A: \n', A.grad)
print('∂z/∂x: \n', x.grad)
print('∂z/∂b: \n', b.grad)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-125-e4f599068e51> in <module>
      1 # Example 3 - breaking
----> 2 z = torch.norm(A @ x - b, p=inf)
      3
      4 # Compute derivatives
      5 z.backward()

NameError: name 'inf' is not defined
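The straightforward fix is to define inf first; a sketch (assuming the gradients have been zeroed as above — torch.norm accepts p=inf and returns the maximum absolute entry):

# Example 3 - fixed
from math import inf             # or pass p=float('inf') directly
z = torch.norm(A @ x - b, p=inf)
z.backward()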
When we work with linear operators (matrices), it is often a must to determine their eigenvalues and eigenvectors. To do this, we use torch.eig:
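(In recent PyTorch releases torch.eig has been deprecated in favor of torch.linalg.eig, which returns the eigenvalues and eigenvectors as complex tensors directly; the examples below use the older API.)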
# Example 1
(eigvalues, eigvectors) = torch.eig(A, eigenvectors=True)
for i in range(3):
    print('eigenvalue: ', eigvalues[i])
    print('eigvector: ', eigvectors[i])
    print('\n')
eigenvalue:  tensor([1.1316, 0.0000], grad_fn=<SelectBackward>)
eigvector:  tensor([-0.9053,  0.4609,  0.0952], grad_fn=<SelectBackward>)

eigenvalue:  tensor([-0.2300,  0.0000], grad_fn=<SelectBackward>)
eigvector:  tensor([-0.1106, -0.8415, -0.8702], grad_fn=<SelectBackward>)

eigenvalue:  tensor([-0.9825,  0.0000], grad_fn=<SelectBackward>)
eigvector:  tensor([-0.4100,  0.2818, -0.4834], grad_fn=<SelectBackward>)
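One subtlety worth flagging: torch.eig returns the eigenvectors as the columns of the second tensor, so the loop above actually prints its rows. For a real eigenvalue, the i-th eigenvector is eigvectors[:, i]; a quick check (detaching, since A tracks gradients):

# Column i of eigvectors is the eigenvector belonging to eigvalues[i]: A v = λ v
v = eigvectors[:, 0].detach()
lam = eigvalues[0, 0].detach()  # real part; the imaginary part is 0 here
print(torch.allclose(A.detach() @ v, lam * v, atol=1e-5))  # True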
Since an eigenvalue can be complex, each eigenvalue is returned as a pair: the first element is its real part and the second its imaginary part.
Here is an example where the matrix has one eigenvalue, repeated three times:
# Example 2
C = torch.tensor([[0., -1., 0], [4., 4., 0], [2., 1., 2.]])
(eigvalues, eigvectors) = torch.eig(C, eigenvectors=True)
for i in range(3):
    print('eigenvalue: ', eigvalues[i])
    print('eigenvector: ', eigvectors[i])
    print('\n')
eigenvalue:  tensor([2., 0.])
eigenvector:  tensor([ 0.0000, -0.4472,  0.4082])

eigenvalue:  tensor([2.0000, 0.0000])
eigenvector:  tensor([ 0.0000,  0.8944, -0.8165])

eigenvalue:  tensor([2.0000, 0.0000])
eigenvector:  tensor([ 1.0000,  0.0000, -0.4082])
Only square matrices can have eigenvalues.
# Example 3
D = torch.tensor([[0., -1., 0], [4., 4., 0]])
(eigvalues, eigvectors) = torch.eig(D, eigenvectors=True)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-114-dfab705752e7> in <module>
      2 D = torch.tensor([[0., -1., 0], [4., 4., 0]])
      3
----> 4 (eigvalues, eigvectors) = torch.eig(D, eigenvectors=True)

RuntimeError: invalid argument 1: A should be square at C:\Users\builder\AppData\Local\Temp\pip-req-build-9msmi1s9\aten\src\TH/generic/THTensorLapack.cpp:195
Many times we have to minimize the quantity $\|AX - B\|_2$, where $A$ and $B$ are given matrices and $X$ is the one we want to compute. For this we use the torch.lstsq function; note the argument order: the right-hand side $B$ is passed first.
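As an aside, recent PyTorch releases deprecate torch.lstsq in favor of torch.linalg.lstsq, which takes the arguments in the more conventional order, roughly:

# Newer API sketch (recent PyTorch releases only):
# X = torch.linalg.lstsq(A, B).solution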
# Example 1
A = torch.randn(3, 4)
B = torch.randn(3, 1) # a vector
print("A: \n", A)
print("B: \n", B)
print("\n")
X, _ = torch.lstsq(B, A)
error = torch.norm(A @ X - B, p=2)
print("The solution of the least square problem is: \n\n", X)
print("\n")
print("The error is: ", error)
A:
 tensor([[-0.2769,  1.0674,  0.6662, -1.5984],
        [ 0.8249,  1.3140, -0.0192, -0.3846],
        [ 2.4818,  0.5684, -1.7736,  0.6815]])
B:
 tensor([[ 0.4494],
        [-0.9212],
        [-1.1869]])

The solution of the least square problem is:

 tensor([[-0.2753],
        [-0.7950],
        [-0.3150],
        [-0.8957]])

The error is:  tensor(1.8849e-07)
Now consider the previous example, but measure the error with the $p=1$ norm (the solution itself is unchanged; only the way we measure the residual differs).
# Example 2
A = torch.randn(3, 4)
B = torch.randn(3, 1) # a vector
print("A: \n", A)
print("B: \n", B)
print("\n")
X, _ = torch.lstsq(B, A)
error = torch.norm(A @ X - B, p=1)
print("The solution of the least square problem is: \n\n", X)
print("\n")
print("The error is: ", error)
A:
 tensor([[-0.0644, -0.7350,  0.9334, -0.2362],
        [-0.5515, -0.2604,  0.7796, -1.1156],
        [-1.2727,  1.4277,  1.5215,  1.3682]])
B:
 tensor([[-1.8821],
        [-0.5725],
        [-0.3240]])

The solution of the least square problem is:

 tensor([[-0.3108],
        [ 1.2100],
        [-1.1999],
        [-0.4541]])

The error is:  tensor(8.3447e-07)
The error is greater, as expected: the $1$-norm of a vector is never smaller than its $2$-norm. Both values are essentially floating-point noise, though, since with more unknowns than equations the system has an exact solution.
Note that in the next example torch.lstsq itself succeeds even though $A$ is $5 \times 4$: the returned $X$ has $\max(m, n)$ rows, of which only the first $n$ contain the solution (the remaining rows carry residual information). Multiplying $A$ by the full $5 \times 1$ result therefore produces a 'size mismatch' error:
# Example 3 - breaking
A = torch.randn(5, 4)
B = torch.randn(5, 1) # a vector
print("A: \n", A)
print("B: \n", B)
print("\n")
X, _ = torch.lstsq(B, A)
error = torch.norm(A @ X - B, p=1)
print("The solution of the least square problem is: \n\n", X)
print("\n")
print("The error is: ", error)
A:
 tensor([[-0.7085, -0.5104,  0.8651, -1.5750],
        [-0.6783, -0.8917, -0.3745,  0.8802],
        [-0.1210,  0.4588,  1.7196, -1.7847],
        [ 0.5258, -0.0043,  1.3598, -0.2012],
        [-0.3974,  0.0806,  0.2112, -0.7372]])
B:
 tensor([[ 2.1218],
        [ 2.0587],
        [ 0.4716],
        [-0.7330],
        [ 1.3271]])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-133-0c463341c127> in <module>
      8 X, _ = torch.lstsq(B, A)
      9
---> 10 error = torch.norm(A @ X - B, p=1)
     11 print("The solution of the least square problem is: \n\n", X)
     12 print("\n")

RuntimeError: size mismatch, m1: [5 x 4], m2: [5 x 1] at C:\Users\builder\AppData\Local\Temp\pip-req-build-9msmi1s9\aten\src\TH/generic/THTensorMath.cpp:197
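The fix is to slice off those extra rows before using the solution; a small sketch under the same setup:

# Only the first n = A.shape[1] rows of X contain the least-squares solution
X_sol = X[:A.shape[1]]
error = torch.norm(A @ X_sol - B, p=1)
print("The error is: ", error)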
We have considered some basic linear-algebra problems that PyTorch can solve very effectively, and we will use them many times later on.
Official documentation for torch.Tensor: https://pytorch.org/docs/stable/tensors.html
Official documentation for torch.autograd: https://pytorch.org/docs/stable/autograd.html