**Week 1:** Matrices, vectors, matrix/vector norms, scalar products & unitary matrices

**Week 2:** TAs-week (Strassen, FFT, a bit of SVD)

**Week 3:** Matrix ranks, singular value decomposition, linear systems, eigenvalues

**Week 4:** Matrix decompositions: QR, LU, SVD + test + structured matrices start

- LU decomposition and Gaussian elimination, Cholesky factorization
- QR decomposition and Gram-Schmidt algorithm
- Schur decomposition and QR-algorithm (basic)

Today we will talk about:

- Algorithms for the symmetric eigenvalue problems (QR-algorithm, Divide-and-Conquer, bisection)
- SVD and its applicationss

If $A = A^*$, then $A$ is a **normal matrix**, thus

By Householder transformations we can reduce the matrix $A$ to the **triadiagonal form** (in the general case, **upper Hessenberg form**)

The iterations of the QR-algorithm have the following form:

$$A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k.$$If $A_0 = A$ is tridiagonal symmetric matrix , this form is preserved by the QR-algorithm.

Let us see..

In [86]:

```
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
#Generate a random tridiagonal matrix
n = 20
d = np.random.randn(n)
sub_diag = np.random.randn(n-1)
mat = np.diag(d) + np.diag(sub_diag, -1) + np.diag(sub_diag, 1)
mat1 = np.abs(mat)
mat1 = mat1/np.max(mat1.flatten())
plt.spy(mat)
q, r = np.linalg.qr(mat)
plt.figure()
b = r.dot(q)
plt.imshow(np.abs(b))
#plt.figure()
#plt.imshow(np.abs(r.dot(q)))
q[0, :]
```

Out[86]:

In the tridiagonal form, you do not have to compute the **Q** matrix: you only have to compute the **triadiagonal part**

In the "eigenvalues only" mode the computation takes $\mathcal{O}(n)$ for one step.

This is called **implicit QR-step**

All the implicit QR-algorithms are based on the following theorem:

Let $$Q^* A Q = H$$ be a irreducible upper Hessenberg matrix.

Then, the first column of the matrix $Q$ defines all of its other columns.

The convergence of the QR-algorithm is a very **delicate issue** (see Tyrtyshnikov, Brief introduction to numerical analysis for details).

Summary, if

$$A = X \Lambda X^{-1},$$and $$\Lambda = \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix},$$

and there is a **gap** between the eigenvalues of $\Lambda_1$ and $\Lambda_2$, then the $21$ block in the QR-iteration goes to zero with $q = \frac{\lambda_m}{\lambda_{m+1}}$.
}
So we need to increase the gap! The answer is given by the **shifted QR-algorithm**.

There are several ways to select the shifts.

We have **converged part** and not **converged part**: select $s_k = a_k$ (last non-converged diagonal element).

The **Wilkinson shift** is to select the eigenvalue of a $2 \times 2$ matrix. It is more robust, but there are examples of failure.

To answer this question, we have to go back to the power method.

Remember the power method for the computation of the eigenvalues.

$$x_{k+1} := A x_k.$$It converges to the largest eigenvalue in modulus.

The convergence can be arbitrary slow.

However, if we shift the matrix as

$$ A := A - \lambda I,$$the corresponding eigenvalue becomes small (but we need large).

To make a small eigenvalue large, we need to **invert the matrix**, and that gives us **inverse iteration**

And the **Rayleigh quotient iteration** is given by the selection of the **adaptive shift**:

In the symmetric case $A = A^*$ the convergence is **cubic**.

Now let us talk about singular values and eigenvalues.

SVD: $$A = U \Lambda V^*$$

exists for any matrix.

It can be also viewed as a reduction of a given matrix to the diagonal form by means of

two-sided unitary transformations:

$$\Lambda = U^* A V.$$By Householder we can reduce to the **bidiagonal form**

Implicit QR-algorithm (with shifts) gives the way of computing the eigenvalues (and Schur form).

The problem of the computation of the SVD can be reduced to the **symmetric eigenvalue problem** in two ways:

- Reduce to the bidiagonal form
- Work with the tridiagonal matrix $$T = B^* B$$
- Work with the extended matrix $$T = \begin{bmatrix} 0 & B \\ B^{\top} & 0 \end{bmatrix}$$

The case 1. is ok if you **do not form T directly**!

Thus, the problem of computing singular values can be reduced to the problem of the computation of the eigenvalues of symmetric tridiagonal matrix.

- QR-algorithm: the "gold standard" of the eigenvalue computations
- RQI-iteration: Rayleigh quotient iteration is implicitly performed at each step of the QR-algorithm
- Divide-and-conquer: the fastest (?) nowdays
- Bisection method
- Jacobi method

Suppose we have a tridiagonal matrix, and we split it into two blocks:

$$T = \begin{bmatrix} T'_1 & B \\ B^{\top} & T'_2 \end{bmatrix}$$We can write the matrix $T$ as

$$T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} + b_m v v^{\top}$$Suppose we have decomposed $T_1$ and $T_2$ already:

$$T_1 = Q_1 \Lambda_1 Q^*_1, \quad T_2 = Q_2 \Lambda_2 Q^*_2$$Then,

$$\begin{bmatrix} Q^*_1 & 0 \\ 0 & Q^*_2 \end{bmatrix} T\begin{bmatrix} Q^*_1 & 0 \\ 0 & Q^*_2 \end{bmatrix} = D + \rho u u^{*},$$I.e. we have reduced the problem to the problem of the computation of the eigenvalues of

Diagonal plus low-rank matrix

Is it easy to compute the eigenvalues of the matrix

$$D + \rho u u^* $$The characteristic polynomial has the form

$$\det(D + \rho uu^* - \lambda I) = \det(D - \lambda I)\det(I + \rho (D - \lambda I)^{-1} uu^*) = 0.$$Then (prove!!)

$$\det(I + v u^*) = ?$$$$\det(I + \rho (D - \lambda I)^{-1} uu^*) = 1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{d_i - \lambda} = 0$$How to find the roots?

In [87]:

```
import numpy as np
lm = [1, 2, 3, 4]
M = len(lm)
D = np.array(lm)
a = np.min(lm)
b = np.max(lm)
t = np.linspace(-1, 6, 1000)
u = 0.5 * np.ones(M)
rho = 1
def fun(lam):
return 1 + rho * np.sum(u**2/(D - lam))
res = [fun(lam) for lam in t]
plt.plot(res, 'k')
plt.ylim([-6, 6])
plt.tight_layout()
plt.title('Plot of the function')
```

Out[87]:

The function has only one root at $[d_i, d_{i+1}]$

We have proved, by the way, the **Cauchy interlacing theorem** (what happens to the eigenvalues under rank-$1$ perturbation)

A Newton method will fail (draw a picture with a tangent line).

However, Newton method is just approximation of a function $f(\lambda)$ by a linear function.

Much better approximation is the **hyperbola**:

To fit the coefficients, we have to evaluate $f(\lambda)$ and $f'(\lambda)$ in the particular point.

After that, the approximation can be recovered from solving **quadratic equation**

First, stability: This method was abandoned for a long time due to instability of the computation of the eignenvectors.

The exact expression for the eigenvectors is just

$$(D - \alpha_i I)^{-1}u,$$where $\alpha_i$ is the computed root.

The solution is by using a strange Lowner theorem:

If $\alpha_i$ and $d_i$ satisfy the **interlacing theorem**

Then there exists a vector $\widehat{u}$ such that $\alpha_i$ are exact eigenvalues of the matrix

$$\widehat{D} = D + \widehat{u} \widehat{u}^*.$$So, you first compute the eigenvalues, then compute $\widehat{u}$ and only then the eigenvectors.

In the computations we have to evaluate the sums of the form

$$f(\lambda) = 1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{(d_i - \lambda)},$$and have to do it at least for $n$ points.

The complexity is then $\mathcal{O}(n^2)$, as for the QR-algorithm.

Can we make it $\mathcal{O}(n \log n)$?

The answer is yes, but we have to replace the computations by the approximate ones

by the help of **Fast Multipole Method**.

I will do a short explanation on the whiteboard.

Absolutely different approach is based on the **bisection**.

Given a matrix $A$ its inertia is defined as a triple $(\nu, \zeta, \pi)$,

the number of negative, zero and positive eigenvalues.

If $X$ is non-singular, then

$$Inertia(A) = Inertia(X^* A X)$$Given $z$ we can do the Gaussian elimination:

$$A - zI = L D L^*,$$and inertia of the diagonal matrix is trivial to compute.

Thus, if we want to find all the eigenvalues in the interval $a$, $b$

Using inertia, we can easily count the number of eigenvalues in an interval.

Recall what a Jacobi (Givens rotations) are:

In a plane they correspong to a $2 \times 2$ orthogonal matrix of the form

$$\begin{pmatrix} \cos \phi & \sin \phi \\ -\sin \phi & \cos \phi \end{pmatrix},$$and in the $n$-dimensional case we select two variables $i$ and $j$ and rotate.

The idea of the Jacobi method is to minimize

$$\mathrm{off}( U^* A U), $$by applying succesive Jacobi rotations to "kill" off-diagonal elements.

When the "pivot" is chosen, it is easy to eliminate it.

The main question is then what is the order of **sweeps** we have to make (i.e. in which order to eliminate).

If we always eliminate the largest off-diagonal elements the method has quadratic convergence.

- Many algorithms for the computation of the SEV solution (QR, Divide-and-conquer, bisection, Jacobi)

Now we go to the applications of the SVD.

SVD is extremely important in computational science and engineering.

It has many names: Principal component analysis, Proper Orthogonal Decomposition, Empirical Orthogonal Functions

We will consider:

- Latent semantic analysis
- Collaborative filtering
- Data compression

One of the most famous application is Latent semantic indexing, see, for example,

Deerwester, Scott C., et al. "Indexing by latent semantic analysis." (1990)

The problem setup: we have a set of text documents $D_1, \ldots, D_N.$ We want to solve the search problem: i.e., we have a query as a set of words, and we want to find the best documents.

Our data is processed as follows: for each document we create a list of words contained in the document, and count the frequencies of each word. This is called the **bag of words** model (i.e., the document is unordered set of words).

This is how the term-document matrix $A$ is obtained. Its row size is **the size of the dictionary**.

Its column size is **the number of documents**. An element $A_{ij}$ the the frequency of occurence of the $i$-th word in the $j$-document.

To do the search, we can just multiply a term-documnt matrix by a **search vector**, i.e. a list of words.

**The problem**: The document will be returned only if there is an exact word match. However, we might search for "Samuel Clemens", and hope to get the results for "Mark Twain" as well. But there might be no exact match!

How the SVD can help?

Compute **low-rank approximation** $A_r$ of the term-document matrix $A$.

$$A \approx A_r,$$
and **we do not care about the approximation error** (i.e., we do not require it to be small).

The matrix $A_r$ can be then used to do queries.

We project the documents to **low-dimensional subspace**, given a query $q$ the projection is

$$q_r = \Sigma^{-1}_r U^{\top}_r q$$
Now we can compute the similarity between $d_r$ and other projected documents

$$\widehat{d}_i = \Sigma^{-1}_r U^{\top}_r d_i,$$
and compute the **cosine** of the angles between the query and the projected document.

Now we can test a demo database

In [88]:

```
import numpy as np
import pandas as pd
import re #Regular expressions
rows = ['human', 'interface', 'computer', 'user', 'system', 'response', 'time', 'EPS', 'survey', 'trees', 'graph', 'minors']
nterms = len(rows)
docs = []
docs += ['Human machine interface for Lab ABC computer applications']
docs += ['A survey of user opinions of computer system response time']
docs += ['The EPS user interfaces management system']
docs += ['System and human system engineering testing of EPS']
docs += ['Relation of user-perceived response time on user management']
docs += ['The generation of random, binary, unordered trees']
docs += ['The intersection graph of paths in trees']
docs += ['Graph minors IV: Width of trees and well-quasi-ordering']
docs += ['Graph minors: A survey']
ndocs = len(docs)
term_doc = np.zeros((nterms, ndocs))
for i in xrange(nterms):
for j in xrange(ndocs):
if re.search(rows[i], docs[j], re.IGNORECASE):
term_doc[i, j] = 1
#Use pandas to plot
pd.DataFrame(data=term_doc,index=rows)
```

Out[88]:

Now we can compare the results between ordinary matvec and low-rank matvec.

In [89]:

```
query = 'Human computer interaction'
qv = np.zeros((nterms))
for i in xrange(nterms):
if re.search(rows[i], query, re.IGNORECASE):
qv[i] = 1
res1 = qv.dot(term_doc) #Non-compressed search result
u, s, v = np.linalg.svd(term_doc)
r = 2
u = u[:, :r]
s = s[:r]
v = v[:r, :] #Numpy transposes
appr1 = u.dot(np.diag(s)).dot(v)
res2 = qv.dot(appr1)
res_all = np.vstack((res1, res2)).T #To make two columns
print 'There query is:', query, ',the scores are:'
pd.DataFrame(res_all, index=docs, columns=['No SVD', 'SVD'])
```

Out[89]:

Another important (and similar) application comes from **recommender systems**.

Suppose you have a **user-product matrix**: each user puts a rating for a particular product.

The matrix is then **the number of users times the number of products**. The goal is to recommend additional products to be bought for a particular user.

The scheme is the same: we compute the SVD, and the recommendation for each user is just a column of the approximated matrix.

In [90]:

```
#Read the data
data_read = np.loadtxt('task2_transact_eval.txt',dtype=np.int32, skiprows=1,delimiter='|')
```

In [91]:

```
columns = ['SessionNo', 'ItemNo', 'TransType']
ds = pd.DataFrame(data=data_read,columns=columns)
```

In [92]:

```
from itertools import product
ds_buy = ds[ds.TransType==2][['SessionNo', 'ItemNo']]
users, inv_users = np.unique(data_read[:, 0], return_inverse=True)
products, inv_products = np.unique(data_read[:, 1], return_inverse=True)
print 'Unique sessions:', len(users), 'Unique products', len(products)
#Scipy spars
```

In [94]:

```
import scipy.sparse
import scipy.sparse.linalg
plt.xkcd()
nnz = len(inv_users)
spmat = scipy.sparse.coo_matrix((np.ones(nnz), (inv_users, inv_products)))
r = 2#Approximation rank
u, s, v = scipy.sparse.linalg.svds(spmat, r)
n = spmat.shape[0]
m = spmat.shape[1]
q = np.zeros(n)
user_id = 1
q[user_id] = 1.0
qrec = q.T.dot(u).dot(np.diag(s)).dot(v)
qrec = qrec / np.max(abs(qrec))
plt.plot(qrec)
qb = spmat.T.dot(q)
plt.plot(qb)
plt.xlabel('Product ID')
plt.ylabel('Recommendation')
```

Out[94]:

Dense matrices typically require $N^2$ elements to be stored. For $N \sim 10^4 - 10^5$ the memory requirements. A low rank approximation can reduces this number of $\mathcal{O}(Nr)$

In [95]:

```
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0/(i + j + 0.5) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.semilogy(s/s[0])
plt.title('Singular values decay for a Hilbert matrix')
```

Out[95]:

We can try to test on a more realistic matrix, since solving linear systems with Hilbert matrix has little practical sense. Instead, we solve a linear system with a matrix
$$A_{ij} = \frac{1}{i - j + \frac{1}{2}},$$
which corresponds to an integral equation
$$
\int \frac{q(y)dy}{x - y } = f(x).
$$
In real life, the equation

$$
\int_{\Omega} \frac{q(y)dy}{\Vert x - y\Vert } = f(x),
$$
is solved, where $\Omega$ is a surface in 3D. This is used, for example, in modelling integral circuits.
Let us see what happens with the singular values.

In [99]:

```
import numpy as np
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0/(i - j + 0.5) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.plot(s)
plt.title('Singular values decay for a Cauchy-Hilbert matrix')
plt.tight_layout()
s[5] - np.pi
```

Out[99]:

What to do?

The idea is to break the matrix intro blocks

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} $$and the blocks $A_{12}$ and $A_{21}$ will be of low-rank! Let us try that..

In [100]:

```
import numpy as np
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0/(i - j + 0.5) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
a12 = a[:n/2,n/2:]
a21 = a[n/2:,:n/2]
s12 = np.linalg.svd(a12)[1]
s21 = np.linalg.svd(a21)[1]
plt.semilogy(s12)
plt.semilogy(s21)#Actually, they are the same
```

Out[100]:

Surprisingly many matrices in PDEs can be well approximated by block-low-rank (other names: hierarchical, mosaic-skeleton) matrices.

They have linear storage, but algorithms are not very **simple**

- Algorithms for symmetric eigenvalue problems
- Three applications of the SVD

- Yes, test
- We start
**sparse and/or structured**NLA.

In [81]:

```
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()
```

Out[81]: