Marcos Duarte
Laboratory of Biomechanics and Motor Control (http://demotu.org/)
Federal University of ABC, Brazil
A matrix is a square or rectangular array of numbers or symbols (termed elements), arranged in rows and columns. For instance:
$$ \mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{bmatrix} $$$$ \mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} $$The matrix $\mathbf{A}$ above has two rows and three columns, it is a 2x3 matrix.
In Numpy:
# Import the necessary libraries
import numpy as np
from IPython.display import display
np.set_printoptions(precision=4) # number of digits of precision for floating point
A = np.array([[1, 2, 3], [4, 5, 6]])
A
array([[1, 2, 3], [4, 5, 6]])
To get information about the number of elements and the structure of the matrix (in fact, a Numpy array), we can use:
print('A:\n', A)
print('len(A) = ', len(A))
print('np.size(A) = ', np.size(A))
print('np.shape(A) = ', np.shape(A))
print('np.ndim(A) = ', np.ndim(A))
A: [[1 2 3] [4 5 6]] len(A) = 2 np.size(A) = 6 np.shape(A) = (2, 3) np.ndim(A) = 2
We could also have accessed this information with the correspondent methods:
print('A.size = ', A.size)
print('A.shape = ', A.shape)
print('A.ndim = ', A.ndim)
A.size = 6 A.shape = (2, 3) A.ndim = 2
We used the array function in Numpy to represent a matrix. A Numpy array is in fact different than a matrix, if we want to use explicit matrices in Numpy, we have to use the function mat
:
B = np.mat([[1, 2, 3], [4, 5, 6]])
B
matrix([[1, 2, 3], [4, 5, 6]])
Both array and matrix types work in Numpy, but you should choose only one type and not mix them; the array is preferred because it is the standard vector/matrix/tensor type of Numpy. So, let's use the array type for the rest of this text.
The sum of two m-by-n matrices $\mathbf{A}$ and $\mathbf{B}$ is another m-by-n matrix:
In Numpy:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])
print('A:\n', A)
print('B:\n', B)
print('A + B:\n', A+B);
A: [[1 2 3] [4 5 6]] B: [[ 7 8 9] [10 11 12]] A + B: [[ 8 10 12] [14 16 18]]
The multiplication of the m-by-n matrix $\mathbf{A}$ by the n-by-p matrix $\mathbf{B}$ is a m-by-p matrix:
$$ \mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} \;\;\; \text{and} \;\;\; \mathbf{B} = \begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3} \\ b_{2,1} & b_{2,2} & b_{2,3} \end{bmatrix} $$$$ \mathbf{A} \mathbf{B} = \begin{bmatrix} a_{1,1}b_{1,1} + a_{1,2}b_{2,1} & a_{1,1}b_{1,2} + a_{1,2}b_{2,2} & a_{1,1}b_{1,3} + a_{1,2}b_{2,3} \\ a_{2,1}b_{1,1} + a_{2,2}b_{2,1} & a_{2,1}b_{1,2} + a_{2,2}b_{2,2} & a_{2,1}b_{1,3} + a_{2,2}b_{2,3} \end{bmatrix} $$In Numpy:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [8, 9, 10]])
print('A:\n', A)
print('B:\n', B)
print('A x B:\n', np.dot(A, B));
A: [[1 2] [3 4]] B: [[ 5 6 7] [ 8 9 10]] A x B: [[21 24 27] [47 54 61]]
Note that because the array type is not truly a matrix type, we used the dot product to calculate matrix multiplication.
We can use the matrix type to show the equivalent:
A = np.mat(A)
B = np.mat(B)
print('A:\n', A)
print('B:\n', B)
print('A x B:\n', A*B);
A: [[1 2] [3 4]] B: [[ 5 6 7] [ 8 9 10]] A x B: [[21 24 27] [47 54 61]]
Same result as before.
The order in multiplication matters, $\mathbf{AB} \neq \mathbf{BA}$:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print('A:\n', A)
print('B:\n', B)
print('A x B:\n', np.dot(A, B))
print('B x A:\n', np.dot(B, A));
A: [[1 2] [3 4]] B: [[5 6] [7 8]] A x B: [[19 22] [43 50]] B x A: [[23 34] [31 46]]
The addition or multiplication of a scalar (a single number) to a matrix is performed over all the elements of the matrix:
A = np.array([[1, 2], [3, 4]])
c = 10
print('A:\n', A)
print('c:\n', c)
print('c + A:\n', c+A)
print('cA:\n', c*A);
A: [[1 2] [3 4]] c: 10 c + A: [[11 12] [13 14]] cA: [[10 20] [30 40]]
The transpose of the matrix $\mathbf{A}$ is the matrix $\mathbf{A^T}$ turning all the rows of matrix $\mathbf{A}$ into columns (or columns into rows):
$$ \mathbf{A} = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \;\;\;\;\;\;\iff\;\;\;\;\;\; \mathbf{A^T} = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix} $$In NumPy, the transpose operator can be used as a method or function:
A = np.array([[1, 2], [3, 4]])
print('A:\n', A)
print('A.T:\n', A.T)
print('np.transpose(A):\n', np.transpose(A));
A: [[1 2] [3 4]] A.T: [[1 3] [2 4]] np.transpose(A): [[1 3] [2 4]]
The determinant is a number associated with a square matrix.
The determinant of the following matrix:
$$ \left[ \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array} \right] $$is written as:
$$ \left| \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array} \right| $$And has the value:
$$ (aei + bfg + cdh) - (ceg + bdi + afh) $$One way to manually calculate the determinant of a matrix is to use the rule of Sarrus: we repeat the last columns (all columns but the first one) in the right side of the matrix and calculate the sum of the products of three diagonal north-west to south-east lines of matrix elements, minus the sum of the products of three diagonal south-west to north-east lines of elements as illustrated in the following figure:
In Numpy, the determinant is computed with the linalg.det
function:
A = np.array([[1, 2], [3, 4]])
print('A:\n', A);
A: [[1 2] [3 4]]
print('Determinant of A:\n', np.linalg.det(A))
Determinant of A: -2.0
The identity matrix $\mathbf{I}$ is a matrix with ones in the main diagonal and zeros otherwise. The 3x3 identity matrix is:
$$ \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$In Numpy, instead of manually creating this matrix we can use the function eye
:
np.eye(3) # identity 3x3 array
array([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])
The inverse of the matrix $\mathbf{A}$ is the matrix $\mathbf{A^{-1}}$ such that the product between these two matrices is the identity matrix:
$$ \mathbf{A}\cdot\mathbf{A^{-1}} = \mathbf{I} $$The calculation of the inverse of a matrix is usually not simple (the inverse of the matrix $\mathbf{A}$ is not $1/\mathbf{A}$; there is no division operation between matrices). The Numpy function linalg.inv
computes the inverse of a square matrix:
numpy.linalg.inv(a)
Compute the (multiplicative) inverse of a matrix.
Given a square matrix a, return the matrix ainv satisfying dot(a, ainv) = dot(ainv, a) = eye(a.shape[0]).
A = np.array([[1, 2], [3, 4]])
print('A:\n', A)
Ainv = np.linalg.inv(A)
print('Inverse of A:\n', Ainv);
A: [[1 2] [3 4]] Inverse of A: [[-2. 1. ] [ 1.5 -0.5]]
For a non-square matrix, its inverse is not defined. However, we can calculate what it's known as the pseudo-inverse.
Consider a non-square matrix, $\mathbf{A}$. To calculate its inverse, note that the following manipulation results in the identity matrix:
The $\mathbf{A} \mathbf{A}^T$ is a square matrix and is invertible (also nonsingular) if $\mathbf{A}$ is L.I. (linearly independent rows/columns).
The matrix $\mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}$ is known as the generalized inverse or Moore–Penrose pseudoinverse of the matrix $\mathbf{A}$, a generalization of the inverse matrix.
To compute the Moore–Penrose pseudoinverse, we could calculate it by a naive approach in Python:
from numpy.linalg import inv
Ainv = A.T @ inv(A @ A.T)
But both Numpy and Scipy have functions to calculate the pseudoinverse, which might give greater numerical stability (but read Inverses and pseudoinverses. Numerical issues, speed, symmetry). Of note, numpy.linalg.pinv calculates the pseudoinverse of a matrix using its singular-value decomposition (SVD) and including all large singular values (using the LAPACK (Linear Algebra Package) routine gesdd), whereas scipy.linalg.pinv calculates a pseudoinverse of a matrix using a least-squares solver (using the LAPACK method gelsd) and scipy.linalg.pinv2 also uses SVD to find the pseudoinverse (also using the LAPACK routine gesdd).
For example:
from scipy.linalg import pinv2
A = np.array([[1, 0, 0], [0, 1, 0]])
Apinv = pinv2(A)
print('Matrix A:\n', A)
print('Pseudo-inverse of A:\n', Apinv)
print('A x Apinv:\n', A@Apinv)
Matrix A: [[1 0 0] [0 1 0]] Pseudo-inverse of A: [[ 1. 0.] [ 0. 1.] [ 0. 0.]] A x Apinv: [[ 1. 0.] [ 0. 1.]]
A square matrix is said to be orthogonal if:
As consequence:
However, keep in mind that not all matrices with determinant equals to one are orthogonal, for example, the matrix:
$$ \begin{bmatrix} 3 & 2 \\ 4 & 3 \end{bmatrix} $$Has determinant equals to one but it is not orthogonal (the columns or rows don't have norm equals to one).
A linear equation is an algebraic equation in which each term is either a constant or the product of a constant and (the first power of) a single variable (Wikipedia).
We are interested in solving a set of linear equations where two or more variables are unknown, for instance:
$$ x + 2y = 4 $$$$ 3x + 4y = 10 $$Let's see how to employ the matrix formalism to solve these equations (even that we know the solution is x=2
and y=1
).
Let's express this set of equations in matrix form:
And for the general case:
$$ \mathbf{Av} = \mathbf{c} $$Where $\mathbf{A, v, c}$ are the matrices above and we want to find the values x,y
for the matrix $\mathbf{v}$.
Because there is no division of matrices, we can use the inverse of $\mathbf{A}$ to solve for $\mathbf{v}$:
As we know how to compute the inverse of $\mathbf{A}$, the solution is:
A = np.array([[1, 2], [3, 4]])
Ainv = np.linalg.inv(A)
c = np.array([4, 10])
v = np.dot(Ainv, c)
print('v:\n', v)
v: [ 2. 1.]
What we expected.
However, the use of the inverse of a matrix to solve equations is computationally inefficient.
Instead, we should use linalg.solve
for a determined system (same number of equations and unknowns) or linalg.lstsq
otherwise:
From the help for solve
:
numpy.linalg.solve(a, b)[source]
Solve a linear matrix equation, or system of linear scalar equations.
Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.
v = np.linalg.solve(A, c)
print('Using solve:')
print('v:\n', v)
Using solve: v: [ 2. 1.]
And from the help for lstsq
:
numpy.linalg.lstsq(a, b, rcond=-1)[source]
Return the least-squares solution to a linear matrix equation.
Solves the equation a x = b by computing a vector x that minimizes the Euclidean 2-norm || b - a x ||^2. The equation may be under-, well-, or over- determined (i.e., the number of linearly independent rows of a can be less than, equal to, or greater than its number of linearly independent columns). If a is square and of full rank, then x (but for round-off error) is the “exact” solution of the equation.
v = np.linalg.lstsq(A, c)[0]
print('Using lstsq:')
print('v:\n', v)
Using lstsq: v: [ 2. 1.]
Same solutions, of course.
When a system of equations has a unique solution, the determinant of the square matrix associated to this system of equations is nonzero.
When the determinant is zero there are either no solutions or many solutions to the system of equations.
But if we have an overdetermined system:
$$ x + 2y = 4 $$$$ 3x + 4y = 10 $$$$ 5x + 6y = 15 $$(Note that the possible solution for this set of equations is not exact because the last equation should be equal to 16.)
Let's try to solve it:
A = np.array([[1, 2], [3, 4], [5, 6]])
print('A:\n', A)
c = np.array([4, 10, 15])
print('c:\n', c);
A: [[1 2] [3 4] [5 6]] c: [ 4 10 15]
Because the matix $\mathbf{A}$ is not squared, we can calculate its pseudo-inverse or use the function linalg.lstsq
:
v = np.linalg.lstsq(A, c)[0]
print('Using lstsq:')
print('v:\n', v)
Using lstsq: v: [ 1.3333 1.4167]
The functions inv
and solve
failed because the matrix $\mathbf{A}$ was not square (overdetermined system). The function lstsq
not only was able to handle an overdetermined system but was also able to find the best approximate solution.
And if the the set of equations was undetermined, lstsq
would also work. For instance, consider the system:
And in matrix form:
$$ \begin{bmatrix} 1 & 2 & 2 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 10 \\ 13 \end{bmatrix} $$A possible solution would be x=2,y=1,z=3
, but other values would also satisfy this set of equations.
Let's try to solve using lstsq
:
A = np.array([[1, 2, 2], [3, 4, 1]])
print('A:\n', A)
c = np.array([10, 13])
print('c:\n', c);
A: [[1 2 2] [3 4 1]] c: [10 13]
v = np.linalg.lstsq(A, c)[0]
print('Using lstsq:')
print('v:\n', v);
Using lstsq: v: [ 0.8 2. 2.6]
This is an approximated solution and as explained in the help of solve
, this solution, v
, is the one that minimizes the Euclidean norm $|| \mathbf{c - A v} ||^2$.