Matrix

Marcos Duarte

A matrix is a square or rectangular array of numbers or symbols (termed elements), arranged in rows and columns.

For instance:

$$ \mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{bmatrix} $$

$$ \mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} $$

The matrix $\mathbf{A}$ above has two rows and three columns; it is a 2x3 matrix.

In NumPy:

In [1]:
# Import the necessary libraries
from __future__ import division, print_function
import numpy as np
from IPython.display import display
np.set_printoptions(precision=4)  # number of digits of precision for floating point
In [2]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A
Out[2]:
array([[1, 2, 3],
       [4, 5, 6]])

To get information about the number of elements and the structure of the matrix (in fact, a NumPy array), we can use:

In [3]:
print('A = ', A)
print('len(A) = ', len(A))
print('np.size(A) = ', np.size(A))
print('np.shape(A) = ', np.shape(A))
print('np.ndim(A) = ', np.ndim(A))
A =  [[1 2 3]
 [4 5 6]]
len(A) =  2
np.size(A) =  6
np.shape(A) =  (2, 3)
np.ndim(A) =  2

We could also have accessed this information with the corresponding array attributes:

In [4]:
print('A.size = ', A.size)
print('A.shape = ', A.shape)
print('A.ndim = ', A.ndim)
A.size =  6
A.shape =  (2, 3)
A.ndim =  2

We used the array function in NumPy to represent a matrix. A NumPy array is in fact different from a matrix; if we want to use explicit matrices in NumPy, we have to use the function mat:

In [5]:
B = np.mat([[1, 2, 3], [4, 5, 6]])
B
Out[5]:
matrix([[1, 2, 3],
        [4, 5, 6]])

Both array and matrix types work in NumPy, but you should choose only one type and not mix them; the array is preferred because it is the standard vector/matrix/tensor type of NumPy. So, let's use the array type for the rest of this text.

Addition and multiplication

The sum of two m-by-n matrices $\mathbf{A}$ and $\mathbf{B}$ is another m-by-n matrix:

$$ \mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{bmatrix} \;\;\; \text{and} \;\;\; \mathbf{B} = \begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3} \\ b_{2,1} & b_{2,2} & b_{2,3} \end{bmatrix} $$

$$ \mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{1,1}+b_{1,1} & a_{1,2}+b_{1,2} & a_{1,3}+b_{1,3} \\ a_{2,1}+b_{2,1} & a_{2,2}+b_{2,2} & a_{2,3}+b_{2,3} \end{bmatrix} $$

In NumPy:

In [6]:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])
print('A ='), display(A)
print('B ='), display(B)
print('A + B ='), display(A+B);
A =
array([[1, 2, 3],
       [4, 5, 6]])
B =
array([[ 7,  8,  9],
       [10, 11, 12]])
A + B =
array([[ 8, 10, 12],
       [14, 16, 18]])

The multiplication of the m-by-n matrix $\mathbf{A}$ by the n-by-p matrix $\mathbf{B}$ is an m-by-p matrix:

$$ \mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} \;\;\; \text{and} \;\;\; \mathbf{B} = \begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3} \\ b_{2,1} & b_{2,2} & b_{2,3} \end{bmatrix} $$

$$ \mathbf{A} \mathbf{B} = \begin{bmatrix} a_{1,1}b_{1,1} + a_{1,2}b_{2,1} & a_{1,1}b_{1,2} + a_{1,2}b_{2,2} & a_{1,1}b_{1,3} + a_{1,2}b_{2,3} \\ a_{2,1}b_{1,1} + a_{2,2}b_{2,1} & a_{2,1}b_{1,2} + a_{2,2}b_{2,2} & a_{2,1}b_{1,3} + a_{2,2}b_{2,3} \end{bmatrix} $$

In NumPy:

In [7]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [8, 9, 10]])
print('A ='), display(A)
print('B ='), display(B)
print('AB ='), display(np.dot(A, B));
A =
array([[1, 2],
       [3, 4]])
B =
array([[ 5,  6,  7],
       [ 8,  9, 10]])
AB =
array([[21, 24, 27],
       [47, 54, 61]])

Note that because the array type is not truly a matrix type, we used the function np.dot to compute the matrix multiplication.
We can use the matrix type to show the equivalent:

In [8]:
A = np.mat(A)
B = np.mat(B)
print('A ='), display(A)
print('B ='), display(B)
print('AB ='), display(A*B);
A =
matrix([[1, 2],
        [3, 4]])
B =
matrix([[ 5,  6,  7],
        [ 8,  9, 10]])
AB =
matrix([[21, 24, 27],
        [47, 54, 61]])

Same result as before.
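
As an aside, note that for the array type the operator * performs elementwise multiplication, not matrix multiplication; a minimal check of our own (not one of the original cells):

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A*B)  # elementwise product:
            # [[ 5 12]
            #  [21 32]]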

The order in the multiplication matters; in general, $\mathbf{AB} \neq \mathbf{BA}$:

In [9]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print('A ='), display(A)
print('B ='), display(B)
print('AB ='), display(np.dot(A, B))
print('BA ='), display(np.dot(B, A));
A =
array([[1, 2],
       [3, 4]])
B =
array([[5, 6],
       [7, 8]])
AB =
array([[19, 22],
       [43, 50]])
BA =
array([[23, 34],
       [31, 46]])

The addition of a scalar (a single number) to a matrix or the multiplication of a matrix by a scalar is performed over all the elements of the matrix:

In [10]:
A = np.array([[1, 2], [3, 4]])
c = 10
print('A ='), display(A)
print('c =', c)
print('c + A ='), display(c+A)
print('cA ='), display(c*A);
A =
array([[1, 2],
       [3, 4]])
c = 10
c + A =
array([[11, 12],
       [13, 14]])
cA =
array([[10, 20],
       [30, 40]])

Transposition

The transpose of the matrix $\mathbf{A}$ is the matrix $\mathbf{A^T}$ obtained by turning all the rows of matrix $\mathbf{A}$ into columns (and its columns into rows):

$$ \mathbf{A} = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \;\;\;\;\;\;\iff\;\;\;\;\;\; \mathbf{A^T} = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix} $$

In NumPy, the transpose operator can be used as a method or function:

In [11]:
A = np.array([[1, 2], [3, 4]])
print('A ='), display(A)
print('A.T ='), display(A.T)
print('np.transpose(A) ='), display(np.transpose(A));
A =
array([[1, 2],
       [3, 4]])
A.T =
array([[1, 3],
       [2, 4]])
np.transpose(A) =
array([[1, 3],
       [2, 4]])

Determinant

The determinant is a number associated with a square matrix.

The determinant of the following matrix:

$$ \left[ \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array} \right] $$

is written as:

$$ \left| \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array} \right| $$

And has the value:

$$ (aei + bfg + cdh) - (ceg + bdi + afh) $$

One way to manually calculate the determinant of a 3x3 matrix is to use the rule of Sarrus: we repeat the first two columns (all columns but the last one) on the right side of the matrix and calculate the sum of the products of the three north-west to south-east diagonals of matrix elements, minus the sum of the products of the three south-west to north-east diagonals, as illustrated in the following figure:

Figure. Rule of Sarrus: the sum of the products of the solid diagonals minus the sum of the products of the dashed diagonals (image from Wikipedia).

In NumPy, the determinant is computed with the linalg.det function:

In [12]:
A = np.array([[1, 2], [3, 4]])
print('A ='), display(A);
A =
array([[1, 2],
       [3, 4]])
In [13]:
print('Determinant of A =')
print(np.linalg.det(A))
Determinant of A =
-2.0
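
Just to double-check the rule of Sarrus, we can code it directly. The sketch below is our own (the function det_sarrus is not part of NumPy) and compares the rule with linalg.det for a 3x3 array:

def det_sarrus(M):
    """Determinant of a 3x3 array by the rule of Sarrus."""
    return (M[0, 0]*M[1, 1]*M[2, 2] + M[0, 1]*M[1, 2]*M[2, 0] + M[0, 2]*M[1, 0]*M[2, 1]
          - M[0, 2]*M[1, 1]*M[2, 0] - M[0, 1]*M[1, 0]*M[2, 2] - M[0, 0]*M[1, 2]*M[2, 1])

M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
print(det_sarrus(M))     # -3
print(np.linalg.det(M))  # -3.0 (same value, up to floating-point round-off)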

Identity

The identity matrix $\mathbf{I}$ is a matrix with ones in the main diagonal and zeros otherwise. The 3x3 identity matrix is:

$$ \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

In NumPy, instead of manually creating this matrix we can use the function eye:

In [14]:
np.eye(3)  # identity 3x3 array
Out[14]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Inverse

The inverse of the matrix $\mathbf{A}$ is the matrix $\mathbf{A^{-1}}$ such that the product between these two matrices is the identity matrix:

$$ \mathbf{A}\cdot\mathbf{A^{-1}} = \mathbf{I} $$

The calculation of the inverse of a matrix is usually not simple (the inverse of the matrix $\mathbf{A}$ is not $1/\mathbf{A}$; there is no division operation between matrices). The NumPy function linalg.inv computes the inverse of a square matrix:

numpy.linalg.inv(a)
Compute the (multiplicative) inverse of a matrix.
Given a square matrix a, return the matrix ainv satisfying dot(a, ainv) = dot(ainv, a) = eye(a.shape[0]).
In [15]:
A = np.array([[1, 2], [3, 4]])
print('A ='), display(A)
Ainv = np.linalg.inv(A)
print('Inverse of A ='), display(Ainv);
A =
array([[1, 2],
       [3, 4]])
Inverse of A =
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
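
We can quickly verify this result by checking that the product of A and its inverse gives the identity matrix (a check of our own, not one of the original cells):

print(np.dot(A, Ainv))
# [[ 1.  0.]
#  [ 0.  1.]]
print(np.allclose(np.dot(A, Ainv), np.eye(2)))  # True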

Orthogonality

A square matrix is said to be orthogonal if:

  1. No row (or column) of the matrix can be written as a linear combination of the other rows (or columns).
  2. Its columns (and rows) form a basis of mutually orthogonal unit vectors (versors), i.e., an orthonormal basis.

As a consequence:

  1. Its determinant is equal to 1 or -1.
  2. Its inverse is equal to its transpose.

However, keep in mind that not all matrices with determinant equal to one are orthogonal; for example, the matrix:

$$ \begin{bmatrix} 3 & 2 \\ 4 & 3 \end{bmatrix} $$

has determinant equal to one but is not orthogonal (its columns and rows don't have norm equal to one).
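
To make this concrete, a 2D rotation matrix is a classic example of an orthogonal matrix. The sketch below is a check of our own (not one of the original cells): it verifies the two consequences above for a rotation matrix and shows that the counter-example fails them:

theta = np.pi/3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # 2D rotation matrix
print(np.linalg.det(R))                    # 1 (up to round-off): determinant is +1
print(np.allclose(np.linalg.inv(R), R.T))  # True: inverse equals transpose

M = np.array([[3, 2], [4, 3]])             # the counter-example above
print(np.linalg.det(M))                    # 1 (up to round-off), but...
print(np.allclose(np.linalg.inv(M), M.T))  # False: not orthogonal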

Linear equations

A linear equation is an algebraic equation in which each term is either a constant or the product of a constant and (the first power of) a single variable (Wikipedia).

We are interested in solving a set of linear equations where two or more variables are unknown, for instance:

$$ x + 2y = 4 $$

$$ 3x + 4y = 10 $$

Let's see how to employ the matrix formalism to solve these equations (even though we already know the solution is x=2 and y=1).
Let's express this set of equations in matrix form:

$$ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 4 \\ 10 \end{bmatrix} $$

And for the general case:

$$ \mathbf{Av} = \mathbf{c} $$

Where $\mathbf{A, v, c}$ are the matrices above and we want to find the values x, y of the vector $\mathbf{v}$.
Because there is no division of matrices, we can use the inverse of $\mathbf{A}$ to solve for $\mathbf{v}$:

$$ \mathbf{A}^{-1}\mathbf{Av} = \mathbf{A}^{-1}\mathbf{c} \implies \mathbf{v} = \mathbf{A}^{-1}\mathbf{c} $$

As we know how to compute the inverse of $\mathbf{A}$, the solution is:

In [16]:
Ainv = np.linalg.inv(A)
c = np.array([4, 10])
v = np.dot(Ainv, c)
print('v ='), display(v);
v =
array([ 2.,  1.])

What we expected.

However, the use of the inverse of a matrix to solve equations is computationally inefficient.
Instead, we should use linalg.solve for a determined system (same number of equations and unknowns) or linalg.lstsq otherwise.
From the help for solve:

numpy.linalg.solve(a, b)
Solve a linear matrix equation, or system of linear scalar equations.
Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.
In [17]:
v = np.linalg.solve(A, c)
print('Using solve:')
print('v ='), display(v);
Using solve:
v =
array([ 2.,  1.])

And from the help for lstsq:

numpy.linalg.lstsq(a, b, rcond=-1)
Return the least-squares solution to a linear matrix equation.
Solves the equation a x = b by computing a vector x that minimizes the Euclidean 2-norm || b - a x ||^2. The equation may be under-, well-, or over- determined (i.e., the number of linearly independent rows of a can be less than, equal to, or greater than its number of linearly independent columns). If a is square and of full rank, then x (but for round-off error) is the “exact” solution of the equation.
In [18]:
v = np.linalg.lstsq(A, c)[0]
print('Using lstsq:')
print('v ='), display(v);
Using lstsq:
v =
array([ 2.,  1.])

Same solutions, of course.

When a system of equations has a unique solution, the determinant of the square matrix associated to this system of equations is nonzero.
When the determinant is zero there are either no solutions or many solutions to the system of equations.
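
For example, here is a minimal demonstration of the singular case (not one of the original cells), where the second equation is just twice the first one:

S = np.array([[1, 2], [2, 4]])  # second row is 2 times the first row
print(np.linalg.det(S))         # 0.0: singular matrix
try:
    np.linalg.solve(S, np.array([4, 8]))
except np.linalg.LinAlgError as err:
    print('solve failed:', err)  # solve raises an error for a singular matrix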

But if we have an overdetermined system:

$$ x + 2y = 4 $$

$$ 3x + 4y = 10 $$

$$ 5x + 6y = 15 $$

(Note that this set of equations has no exact solution: with x=2 and y=1, the last equation gives 16, not 15.)

Let's try to solve it:

In [19]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print('A ='), display(A)
c = np.array([4, 10, 15])
print('c ='), display(c);
A =
array([[1, 2],
       [3, 4],
       [5, 6]])
c =
array([ 4, 10, 15])

Because the matrix $\mathbf{A}$ is not square, the only function that will work is linalg.lstsq:

In [20]:
v = np.linalg.lstsq(A, c)[0]
print('Using lstsq:')
print('v ='), display(v);
Using lstsq:
v =
array([ 1.3333,  1.4167])

The functions inv and solve failed because the matrix $\mathbf{A}$ was not square (overdetermined system). The function lstsq not only was able to handle an overdetermined system but also found the best approximate solution.

And if the set of equations were underdetermined, lstsq would also work. For instance, consider the system:

$$ x + 2y + 2z = 10 $$

$$ 3x + 4y + z = 13 $$

And in matrix form:

$$ \begin{bmatrix} 1 & 2 & 2 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 10 \\ 13 \end{bmatrix} $$

A possible solution would be x=2,y=1,z=3, but other values would also satisfy this set of equations.

Let's try to solve using lstsq:

In [21]:
A = np.array([[1, 2, 2], [3, 4, 1]])
print('A ='), display(A)
c = np.array([10, 13])
print('c ='), display(c);
A =
array([[1, 2, 2],
       [3, 4, 1]])
c =
array([10, 13])
In [22]:
v = np.linalg.lstsq(A, c)[0]
print('Using lstsq:')
print('v ='), display(v);
Using lstsq:
v =
array([ 0.8,  2. ,  2.6])

This is an exact solution, one of the infinitely many solutions of this underdetermined system. As explained in the help of lstsq, the returned solution minimizes the Euclidean 2-norm $|| \mathbf{c - A v} ||^2$, which here is zero; among all the exact solutions, lstsq returns the $\mathbf{v}$ with the smallest Euclidean norm.
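
We can verify both properties (a check of our own, not one of the original cells): the solution returned by lstsq satisfies the equations exactly, and its norm is smaller than that of our guessed solution x=2, y=1, z=3:

print(np.dot(A, v))                         # [ 10.  13.]: equations satisfied exactly
print(np.linalg.norm(v))                    # about 3.38
print(np.linalg.norm(np.array([2, 1, 3])))  # about 3.74: a larger norm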