**Week 1:** Matrices, vectors, matrix/vector norms, scalar products & unitary matrices

**Week 2:** TA week (Strassen, FFT, a bit of SVD)

**Week 3:** Matrix ranks, singular value decomposition, linear systems

- Concept of block matrices (Lecture 3.1)
- Matrix norms, scalar products, unitary matrices (Lecture 3.2)

Today we will talk about:

- Rank of the matrix, idea of linear dependence
- Singular value decomposition (SVD) and low-rank approximation

A matrix can be considered as a sequence of vectors, its columns:
$$
A = [a_1, \ldots, a_m],
$$
where $a_i \in \mathbb{C}^n$, $i = 1, \ldots, m$.

A matrix-by-vector product is equivalent to taking a linear combination of those columns:

$$
y = Ax \quad \Longleftrightarrow \quad y = a_1 x_1 + a_2 x_2 + \ldots + a_m x_m.
$$

This is a special case of **block matrix notation** that we have already seen.
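A quick numerical sanity check of this identity (the sizes here are arbitrary, chosen just for illustration):

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(5, 3)
x = np.random.randn(3)

# Matrix-by-vector product...
y = A.dot(x)

# ...equals the linear combination of the columns of A with coefficients x_i
y_cols = sum(x[i] * A[:, i] for i in range(3))

print(np.allclose(y, y_cols))  # True
```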

The vectors are called **linearly dependent** if there exist coefficients $x_i$, not all zero, such that

$$\sum_i a_i x_i = 0,$$
or in the matrix form,
$$
Ax = 0, \quad \Vert x \Vert \ne 0.
$$
In this case, we say that the matrix $A$ has a non-trivial **nullspace**.
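A minimal example (the matrix is hand-picked so that the third column is the sum of the first two):

```python
import numpy as np

# The third column equals the sum of the first two,
# so the columns are linearly dependent
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 3., 5.]])

# x = (1, 1, -1) is a non-zero vector in the nullspace: A x = 0
x = np.array([1., 1., -1.])
print(A.dot(x))                   # [0. 0. 0.]
print(np.linalg.matrix_rank(A))   # 2
```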

A **linear space** is defined as all possible vectors of the form
$$
\sum_{i=1}^m a_i x_i,
$$
where $x_i$ are some coefficients and $a_i$, $i = 1, \ldots, m$, are given vectors. In the matrix form, the linear space is the collection of vectors of the form

$$A x.$$
This set is also called the **range** of the matrix.

What is a matrix rank?

A matrix rank is defined as the maximal number of linearly independent columns in a matrix, or the **dimension of its column space**.

You can also use linear combinations of rows to define the rank.

**Theorem**

The dimension of the column space of the matrix is equal to the dimension of the row space of the matrix.
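We can illustrate the theorem numerically; a sketch with a random rank-$2$ matrix built as a product of two factors (with probability one such a product has exactly rank $2$):

```python
import numpy as np

np.random.seed(0)
# A random rank-2 matrix: product of 6 x 2 and 2 x 4 factors
A = np.random.randn(6, 2).dot(np.random.randn(2, 4))

# Column rank and row rank coincide
print(np.linalg.matrix_rank(A))    # 2
print(np.linalg.matrix_rank(A.T))  # 2
```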

Is every matrix of low rank? Of course not: a random matrix is a full-rank matrix (with probability one).

Why do we care?

**Johnson–Lindenstrauss lemma**

Given $0 < \epsilon < 1$ and a set of $m$ points in $\mathbb{R}^N$, if $n > \frac{8 \log m}{\epsilon^2}$, then there exists a linear map $f: \mathbb{R}^N \rightarrow \mathbb{R}^n$ such that for all points $u, v$ in the set

$$(1 - \epsilon) \Vert u - v \Vert^2 \leq \Vert f(u) - f(v) \Vert^2 \leq (1 + \epsilon) \Vert u - v \Vert^2.$$

A very useful representation for computation of the matrix rank is the **skeleton decomposition**, which can be written in the matrix form
$$
A = U \widehat{A}^{-1} V^{\top},
$$
where $U$ are some $r$ columns of $A$, $V^{\top}$ are some rows of $A$, $\widehat{A}$ is the submatrix on the intersection, which should be **nonsingular**.

**Remark.**

The inverse of a matrix $P$ is the matrix $Q = P^{-1}$ such that

$$P Q = Q P = I.$$

If the matrix is square and has full rank (i.e. rank equal to $n$), then the inverse exists.

Let $U$ be the $r$ columns of $A$ that contain the submatrix $\widehat{A}$. Then they are linearly independent. Take any other column $u$ of $A$. Then it is a linear combination of the columns of $U$, i.e.

$u = U x$.

These are $n$ equations. We take $r$ of those corresponding to the rows that contain $\widehat{A}$ and get the equation

$$\widehat{u} = \widehat{A} x,$$ therefore

$$x = \widehat{A}^{-1} \widehat{u},$$ and that gives us the result.
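A numerical sketch of the skeleton decomposition (sizes are arbitrary; for a random low-rank matrix, the first $r$ rows and columns give a nonsingular intersection submatrix with probability one):

```python
import numpy as np

np.random.seed(0)
n, m, r = 8, 6, 3
# A random rank-r matrix
A = np.random.randn(n, r).dot(np.random.randn(r, m))

U = A[:, :r]        # r columns of A
V = A[:r, :]        # r rows of A (V^T in the notation above)
Ahat = A[:r, :r]    # r x r submatrix on the intersection

# Skeleton decomposition: A = U Ahat^{-1} V
A_rec = U.dot(np.linalg.solve(Ahat, V))
print(np.linalg.norm(A - A_rec) / np.linalg.norm(A))  # close to machine precision
```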

Any rank-$r$ matrix can be written in the form $$A = U \widehat{A}^{-1} V^{\top},$$ where $U$ is $n \times r$, $V$ is $m \times r$ and $\widehat{A}$ is $r \times r$, or $$ A = U' V'^{\top}. $$ So, every rank-$r$ matrix can be written as a product of a "tall" matrix $U'$ by a long matrix $V'$.

In the index form, it is

$$
a_{ij} = \sum_{\alpha=1}^r u_{i \alpha} v_{j \alpha}.
$$
For rank 1 we have
$$
a_{ij} = u_i v_j,
$$
i.e. it is a separation of indices and rank-$r$ is a sum of rank-$1$ matrices!
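A quick check that the factorized form is indeed a sum of rank-$1$ (outer-product) terms; the sizes below are arbitrary:

```python
import numpy as np

np.random.seed(0)
n, m, r = 5, 4, 3
U = np.random.randn(n, r)
V = np.random.randn(m, r)

# A = U V^T ...
A = U.dot(V.T)

# ...is the sum of r rank-1 matrices u_alpha v_alpha^T
A_sum = sum(np.outer(U[:, k], V[:, k]) for k in range(r))
print(np.allclose(A, A_sum))  # True
```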

It is interesting to note that for the rank-$r$ matrix $$A = U V^{\top}$$ only $U$ and $V$ need to be stored, which gives us $(n+m) r$ parameters instead of $nm$, so it can be used for compression. We can also compute the matrix-by-vector product much faster.

In [4]:

```
import numpy as np
n = 1000
r = 10
u = np.random.randn(n, r)
v = np.random.randn(n, r)
a = u.dot(v.T)  # rank-r matrix stored as a full n x n array
x = np.random.randn(n)
%timeit a.dot(x)           # O(n^2) full matrix-by-vector product
%timeit u.dot(v.T.dot(x))  # O(nr) product using the factors
```

We can also try to compute the matrix rank using the built-in `np.linalg.matrix_rank` function

In [26]:

```
# Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-6 * np.random.randn(n, n)
print('Rank of the matrix:', np.linalg.matrix_rank(b, tol=1e-8))
```

For any rank-$r$ matrix $A$ with $r < \min(m, n)$ and any $\varepsilon > 0$ there is a matrix $B$ such that the rank of $B$ is equal to $\min(m, n)$ and
$$
\Vert A - B \Vert = \varepsilon.
$$
So, does this mean that the matrix rank has no meaning numerically? (Arbitrarily small perturbations lead to full rank!)

The solution is to compute the best **rank-r** approximation to a matrix.

To compute low-rank approximation, we need to compute **singular value decomposition**.

Any matrix $A$ can be written as a product of three matrices:

$$
A = U \Sigma V^*,
$$
where $U$ is an $n \times K$ matrix with orthonormal columns, $V$ is an $m \times K$ matrix with orthonormal columns, $K = \min(m, n)$, and $\Sigma$ is a diagonal matrix with non-negative elements $\sigma_1 \geq \ldots \geq \sigma_K$ on the diagonal.

Computation of the best rank-$r$ approximation is equivalent to setting $\sigma_{r+1} = 0, \ldots, \sigma_K = 0$. The error is then determined by the first omitted singular value:

$$
\min_{\mathrm{rank}(A_r) \leq r} \Vert A - A_r \Vert_2 = \sigma_{r+1},
$$
that is why it is important to look at the decay of the singular values.

$\Vert A - A_r \Vert = \Vert U \Sigma V^* - A_r \Vert = \Vert \Sigma - U^* A_r V \Vert$,

where we used that for the spectral norm (and for the Frobenius norm as well)

$$\Vert Q X \Vert = \Vert X Q \Vert = \Vert X \Vert$$

for any unitary $Q$ and any matrix $X$. This is called **unitary invariance** of the spectral norm (and of the Frobenius norm as well).

What is left is to note that the best rank-$r$ approximation of a diagonal matrix with sorted diagonal is obtained by keeping its $r$ largest diagonal elements and zeroing out the rest.
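The relation above can be verified numerically; a sketch (the matrix size and rank are arbitrary choices) comparing the spectral-norm error of the truncated SVD with the first omitted singular value:

```python
import numpy as np

np.random.seed(0)
a = np.random.randn(50, 50)
u, s, vt = np.linalg.svd(a)

# Truncate the SVD to rank r
r = 10
a_r = u[:, :r].dot(np.diag(s[:r])).dot(vt[:r, :])

# The spectral-norm error of the best rank-r approximation
# equals the first omitted singular value sigma_{r+1}
err = np.linalg.norm(a - a_r, 2)
print(err, s[r])  # the two numbers coincide
```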

Computing the SVD is tricky; we will talk about the algorithms hidden in this computation later. But we already can do this in Python!

In [1]:

```
# Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-5 * np.random.randn(n, n)
print('Rank of the matrix:', np.linalg.matrix_rank(b))
```

In [6]:

```
u, s, v = np.linalg.svd(b)
print(s[1] / s[0])
r = 1
u1 = u[:, :r]
s1 = s[:r]
v1 = v[:r, :]
a1 = u1.dot(np.diag(s1).dot(v1))
print(np.linalg.norm(b - a1, 2) / s[0])
```

We can use SVD to compute approximations of **function-related** matrices, i.e. the matrices of the form
$$a_{ij} = f(x_i, y_j),$$
where $f$ is a certain function, and $x_i, \quad i = 1, \ldots, n$ and $y_j, \quad j = 1, \ldots, m$ are some **one-dimensional grids**.

In [5]:

```
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
n = 1000
a = [[1.0 / (i + j + 1) for i in range(n)] for j in range(n)]  # Hilbert matrix
a = np.array(a)
u, s, v = np.linalg.svd(a)
fig, ax = plt.subplots(1, 1)
ax.semilogy(s[:30])
# We have a very good low-rank approximation of it!
```

Out[5]: *(semilogy plot of the first 30 singular values of the Hilbert matrix, showing rapid decay)*

Now we can do something more interesting, like function approximation

In [15]:

```
import numpy as np
import matplotlib.pyplot as plt
n = 128
t = np.linspace(0, 5, n)
x, y = np.meshgrid(t, t)
f = 1.0 / (x + y + 0.5)
u, s, v = np.linalg.svd(f, full_matrices=False)
r = 10
u = u[:, :r]
s = s[:r]
v = v[:r, :]  # Mind the transpose here!
fappr = u.dot(np.diag(s).dot(v))
er = np.linalg.norm(fappr - f, 'fro') / np.linalg.norm(f, 'fro')
print(er)
plt.semilogy(s / s[0])
```

Out[15]: *(semilogy plot of the normalized singular values of the function matrix)*

And do some 3D plotting

In [16]:

```
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(121, projection='3d')
ax.plot_surface(x, y, f)
ax.set_title('Original function')
ax = fig.add_subplot(122, projection='3d')
ax.plot_surface(x, y, fappr - f)
ax.set_title('Approximation error with rank=%d, err=%3.1e' % (r, er))
fig.subplots_adjust()
fig.tight_layout()
```

Consider a linear factor model,

$$y = Ax,$$
where $y$ is a vector of length $n$, and $x$ is a vector of length $r$.

The data is organized as samples: we observe vectors

$$y_1, \ldots, y_T,$$
then the factor model can be written as

$$
Y = AX,
$$
where $Y$ is $n \times T$, $A$ is $n \times r$ and $X$ is $r \times T$. This is exactly a rank-$r$ model: it tells us that the vectors lie in a small subspace!

We also can use SVD to recover this subspace (but not the independent components). Principal component analysis can be done by SVD!
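A minimal synthetic sketch of this idea (all sizes and the noise level are arbitrary choices): generate data from a rank-$r$ factor model and observe that only $r$ singular values of the centered data are significant.

```python
import numpy as np

np.random.seed(0)
n, r, T = 20, 3, 500
# Synthetic factor model Y = A X, plus a small noise
A = np.random.randn(n, r)
X = np.random.randn(r, T)
Y = A.dot(X) + 1e-6 * np.random.randn(n, T)

# Center the data and compute the SVD; the left singular vectors
# span the r-dimensional subspace the samples lie in
Yc = Y - Y.mean(axis=1, keepdims=True)
u, s, vt = np.linalg.svd(Yc, full_matrices=False)
print(s[:r + 2] / s[0])  # only the first r values are significant
```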

**Summary of today's lecture**

- Matrix rank definition
- Skeleton approximation and dyadic representation of a rank-$r$ matrix
- Singular value decomposition

**Next lecture**

- Linear systems
- Inverse matrix
- Condition number
- Linear least squares
- Pseudoinverse
