**Week 1:** Matrices, vectors, matrix/vector norms, scalar products & unitary matrices

**Week 2:** TA week (Strassen, FFT, a bit of SVD)

**Week 3:** Matrix ranks, singular value decomposition, linear systems

- Concept of block matrices (Lecture 3.1)
- Matrix norms, scalar products, unitary matrices (Lecture 3.2)

Today we will talk about:

- Rank of the matrix, idea of linear dependence
- Singular value decomposition (SVD) and low-rank approximation

A matrix can be considered as a sequence of vectors, its columns:
$$
A = [a_1, \ldots, a_m],
$$
where $a_i \in \mathbb{C}^n$, $i = 1, \ldots, m$.

A matrix-by-vector product is equivalent to taking a linear combination of those columns:

$$
y = Ax \quad \Longleftrightarrow \quad y = a_1 x_1 + a_2 x_2 + \ldots + a_m x_m.
$$

This is a special case of **block matrix notation** that we have already seen.
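A quick numerical sanity check of this identity (the sizes here are arbitrary, chosen just for illustration):

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(5, 3)
x = np.random.randn(3)

# Matrix-by-vector product...
y = A.dot(x)

# ...equals the linear combination of the columns of A with coefficients x_i
y_cols = sum(x[i] * A[:, i] for i in range(3))

print(np.allclose(y, y_cols))  # True
```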

The vectors are called **linearly dependent** if there exist coefficients $x_i$, not all zero, such that

$$\sum_i a_i x_i = 0,$$
or in the matrix form,
$$
Ax = 0, \quad \Vert x \Vert \ne 0.
$$
In this case, we say that the matrix $A$ has a non-trivial **nullspace**.
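A minimal example (the matrix is hand-picked so that the third column is the sum of the first two):

```python
import numpy as np

# The third column equals the sum of the first two,
# so the columns are linearly dependent
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 3., 5.]])

# x = (1, 1, -1) is a non-zero vector in the nullspace: A x = 0
x = np.array([1., 1., -1.])
print(A.dot(x))                   # [0. 0. 0.]
print(np.linalg.matrix_rank(A))   # 2
```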

A **linear space** is defined as all possible vectors of the form
$$
\sum_{i=1}^m a_i x_i,
$$
where $x_i$ are some coefficients and $a_i$, $i = 1, \ldots, m$, are given vectors. In the matrix form, the linear space is the collection of vectors of the form

$$A x.$$
This set is also called the **range** of the matrix.

What is a matrix rank?

A matrix rank is defined as the maximal number of linearly independent columns in a matrix, or the **dimension of its column space**.

You can also use linear combinations of rows to define the rank.

**Theorem**

The dimension of the column space of the matrix is equal to the dimension of the row space of the matrix.
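We can illustrate the theorem numerically; a sketch with a random rank-$2$ matrix built as a product of two factors (with probability one such a product has exactly rank $2$):

```python
import numpy as np

np.random.seed(0)
# A random rank-2 matrix: product of 6 x 2 and 2 x 4 factors
A = np.random.randn(6, 2).dot(np.random.randn(2, 4))

# Column rank and row rank coincide
print(np.linalg.matrix_rank(A))    # 2
print(np.linalg.matrix_rank(A.T))  # 2
```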

Is every matrix of low rank? Of course not: a random matrix is a full-rank matrix (with probability one).

Why do we care?

**Johnson–Lindenstrauss lemma**

Given $0 < \epsilon < 1$ and a set of $m$ points in $\mathbb{R}^N$, if $n > \frac{8 \log m}{\epsilon^2}$, then there exists a linear map $f: \mathbb{R}^N \rightarrow \mathbb{R}^n$ such that for all points $u, v$ in the set

$$(1 - \epsilon) \Vert u - v \Vert^2 \leq \Vert f(u) - f(v) \Vert^2 \leq (1 + \epsilon) \Vert u - v \Vert^2.$$

A very useful representation for computation of the matrix rank is the **skeleton decomposition**, which can be written in the matrix form
$$
A = U \widehat{A}^{-1} V^{\top},
$$
where $U$ are some $r$ columns of $A$, $V^{\top}$ are some rows of $A$, $\widehat{A}$ is the submatrix on the intersection, which should be **nonsingular**.

**Remark.**

The inverse of a matrix $P$ is the matrix $Q = P^{-1}$ such that

$$P Q = Q P = I.$$

If the matrix is square and has full rank (i.e. rank equal to $n$), then the inverse exists.

Let $U$ be the $r$ columns of $A$ that contain the submatrix $\widehat{A}$. Then they are linearly independent. Take any other column $u$ of $A$. Then it is a linear combination of the columns of $U$, i.e.

$u = U x$.

These are $n$ equations. We take $r$ of those corresponding to the rows that contain $\widehat{A}$ and get the equation

$$\widehat{u} = \widehat{A} x,$$ therefore

$$x = \widehat{A}^{-1} \widehat{u},$$ and that gives us the result.
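A numerical sketch of the skeleton decomposition (sizes are arbitrary; for a random low-rank matrix, the first $r$ rows and columns give a nonsingular intersection submatrix with probability one):

```python
import numpy as np

np.random.seed(0)
n, m, r = 8, 6, 3
# A random rank-r matrix
A = np.random.randn(n, r).dot(np.random.randn(r, m))

U = A[:, :r]        # r columns of A
V = A[:r, :]        # r rows of A (V^T in the notation above)
Ahat = A[:r, :r]    # r x r submatrix on the intersection

# Skeleton decomposition: A = U Ahat^{-1} V
A_rec = U.dot(np.linalg.solve(Ahat, V))
print(np.linalg.norm(A - A_rec) / np.linalg.norm(A))  # close to machine precision
```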

Any rank-$r$ matrix can be written in the form $$A = U \widehat{A}^{-1} V^{\top},$$ where $U$ is $n \times r$, $V$ is $m \times r$ and $\widehat{A}$ is $r \times r$, or $$ A = U' V'^{\top}. $$ So, every rank-$r$ matrix can be written as a product of a "tall" matrix $U'$ by a long matrix $V'$.

In the index form, it is

$$
a_{ij} = \sum_{\alpha=1}^r u_{i \alpha} v_{j \alpha}.
$$
For rank 1 we have
$$
a_{ij} = u_i v_j,
$$
i.e. it is a separation of indices and rank-$r$ is a sum of rank-$1$ matrices!
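A quick check that the factorized form is indeed a sum of rank-$1$ (outer-product) terms; the sizes below are arbitrary:

```python
import numpy as np

np.random.seed(0)
n, m, r = 5, 4, 3
U = np.random.randn(n, r)
V = np.random.randn(m, r)

# A = U V^T ...
A = U.dot(V.T)

# ...is the sum of r rank-1 matrices u_alpha v_alpha^T
A_sum = sum(np.outer(U[:, k], V[:, k]) for k in range(r))
print(np.allclose(A, A_sum))  # True
```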

It is interesting to note that for the rank-$r$ matrix $$A = U V^{\top}$$ only $U$ and $V$ need to be stored, which gives us $(n+m) r$ parameters instead of $nm$, so it can be used for compression. We can also compute the matrix-by-vector product much faster.

In [4]:

```
import numpy as np
n = 1000
r = 10
u = np.random.randn(n, r)
v = np.random.randn(n, r)
a = u.dot(v.T)  # rank-r matrix stored as a full n x n array
x = np.random.randn(n)
%timeit a.dot(x)           # O(n^2) full matrix-by-vector product
%timeit u.dot(v.T.dot(x))  # O(nr) product using the factors
```

We can also try to compute the matrix rank using the built-in `np.linalg.matrix_rank` function

In [26]:

```
# Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-6 * np.random.randn(n, n)
print('Rank of the matrix:', np.linalg.matrix_rank(b, tol=1e-8))
```

For any rank-$r$ matrix $A$ with $r < \min(m, n)$ and any $\varepsilon > 0$ there is a matrix $B$ such that the rank of $B$ is equal to $\min(m, n)$ and
$$
\Vert A - B \Vert = \varepsilon.
$$
So, does this mean that the matrix rank has no meaning numerically? (Arbitrarily small perturbations lead to full rank!)

The solution is to compute the best **rank-r** approximation to a matrix.

To compute low-rank approximation, we need to compute **singular value decomposition**.

Any matrix $A$ can be written as a product of three matrices:

$$
A = U \Sigma V^*,
$$
where $U$ is an $n \times K$ matrix with orthonormal columns, $V$ is an $m \times K$ matrix with orthonormal columns, $K = \min(m, n)$, and $\Sigma$ is a diagonal matrix with non-negative elements $\sigma_1 \geq \ldots \geq \sigma_K$ on the diagonal.

Computation of the best rank-$r$ approximation is equivalent to setting $\sigma_{r+1} = 0, \ldots, \sigma_K = 0$. The error is then determined by the first omitted singular value:

$$
\min_{\mathrm{rank}(A_r) \leq r} \Vert A - A_r \Vert_2 = \sigma_{r+1},
$$
that is why it is important to look at the decay of the singular values.

$\Vert A - A_r \Vert = \Vert U \Sigma V^* - A_r \Vert = \Vert \Sigma - U^* A_r V \Vert$,

where we used that for the spectral norm (and for the Frobenius norm as well)

$$\Vert Q X \Vert = \Vert X Q \Vert = \Vert X \Vert$$

for any unitary $Q$ and any matrix $X$. This is called **unitary invariance** of the spectral norm (and of the Frobenius norm as well).

What is left is to note that the best rank-$r$ approximation of a diagonal matrix with sorted diagonal is obtained by keeping its $r$ largest diagonal elements and zeroing out the rest.
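The relation above can be verified numerically; a sketch (the matrix size and rank are arbitrary choices) comparing the spectral-norm error of the truncated SVD with the first omitted singular value:

```python
import numpy as np

np.random.seed(0)
a = np.random.randn(50, 50)
u, s, vt = np.linalg.svd(a)

# Truncate the SVD to rank r
r = 10
a_r = u[:, :r].dot(np.diag(s[:r])).dot(vt[:r, :])

# The spectral-norm error of the best rank-r approximation
# equals the first omitted singular value sigma_{r+1}
err = np.linalg.norm(a - a_r, 2)
print(err, s[r])  # the two numbers coincide
```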

Computing the SVD is tricky; we will talk about the algorithms hidden in this computation later. But we already can do this in Python!

In [1]:

```
# Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-5 * np.random.randn(n, n)
print('Rank of the matrix:', np.linalg.matrix_rank(b))
```

In [6]:

```
u, s, v = np.linalg.svd(b)
print(s[1] / s[0])
r = 1
u1 = u[:, :r]
s1 = s[:r]
v1 = v[:r, :]
a1 = u1.dot(np.diag(s1).dot(v1))
print(np.linalg.norm(b - a1, 2) / s[0])
```

We can use SVD to compute approximations of **function-related** matrices, i.e. the matrices of the form
$$a_{ij} = f(x_i, y_j),$$
where $f$ is a certain function, and $x_i, \quad i = 1, \ldots, n$ and $y_j, \quad j = 1, \ldots, m$ are some **one-dimensional grids**.

In [5]:

```
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
n = 1000
a = [[1.0 / (i + j + 1) for i in range(n)] for j in range(n)]  # Hilbert matrix
a = np.array(a)
u, s, v = np.linalg.svd(a)
fig, ax = plt.subplots(1, 1)
ax.semilogy(s[:30])
# We have a very good low-rank approximation of it!
```

Out[5]: *(semilogy plot of the first 30 singular values of the Hilbert matrix, showing rapid decay)*

Now we can do something more interesting, like function approximation

In [15]:

```
import numpy as np
import matplotlib.pyplot as plt
n = 128
t = np.linspace(0, 5, n)
x, y = np.meshgrid(t, t)
f = 1.0 / (x + y + 0.5)
u, s, v = np.linalg.svd(f, full_matrices=False)
r = 10
u = u[:, :r]
s = s[:r]
v = v[:r, :]  # Mind the transpose here!
fappr = u.dot(np.diag(s).dot(v))
er = np.linalg.norm(fappr - f, 'fro') / np.linalg.norm(f, 'fro')
print(er)
plt.semilogy(s / s[0])
```

Out[15]: *(semilogy plot of the normalized singular values of the function matrix)*

And do some 3D plotting

In [16]:

```
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(121, projection='3d')
ax.plot_surface(x, y, f)
ax.set_title('Original function')
ax = fig.add_subplot(122, projection='3d')
ax.plot_surface(x, y, fappr - f)
ax.set_title('Approximation error with rank=%d, err=%3.1e' % (r, er))
fig.subplots_adjust()
fig.tight_layout()
```

Consider a linear factor model,

$$y = Ax,$$
where $y$ is a vector of length $n$, and $x$ is a vector of length $r$.

The data is organized as samples: we observe vectors

$$y_1, \ldots, y_T,$$
then the factor model can be written as

$$
Y = AX,
$$
where $Y$ is $n \times T$, $A$ is $n \times r$ and $X$ is $r \times T$. This is exactly a rank-$r$ model: it tells us that the vectors lie in a small subspace!

We also can use SVD to recover this subspace (but not the independent components). Principal component analysis can be done by SVD!
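A minimal synthetic sketch of this idea (all sizes and the noise level are arbitrary choices): generate data from a rank-$r$ factor model and observe that only $r$ singular values of the centered data are significant.

```python
import numpy as np

np.random.seed(0)
n, r, T = 20, 3, 500
# Synthetic factor model Y = A X, plus a small noise
A = np.random.randn(n, r)
X = np.random.randn(r, T)
Y = A.dot(X) + 1e-6 * np.random.randn(n, T)

# Center the data and compute the SVD; the left singular vectors
# span the r-dimensional subspace the samples lie in
Yc = Y - Y.mean(axis=1, keepdims=True)
u, s, vt = np.linalg.svd(Yc, full_matrices=False)
print(s[:r + 2] / s[0])  # only the first r values are significant
```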

**Summary of today's lecture**

- Matrix rank definition
- Skeleton approximation and dyadic representation of a rank-$r$ matrix
- Singular value decomposition

**Next lecture**

- Linear systems
- Inverse matrix
- Condition number
- Linear least squares
- Pseudoinverse
