**Week 1:** Matrices, vectors, matrix/vector norms, scalar products & unitary matrices

**Week 2:** TAs-week (Strassen, FFT, a bit of SVD)

**Week 3:** Matrix ranks, singular value decomposition, linear systems, eigenvalues

**Week 4:** Matrix decompositions: QR, LU, SVD + test + structured matrices start

**Week 5:** Iterative methods, preconditioners, matrix functions

- Krylov subspaces
- Iterative methods: Conjugate Gradient (CG), Generalized Minimal Residual Method (GMRES), BiCGStab
- Definition of preconditioners
- Jacobi, Gauss-Seidel as preconditioners

Today we will continue to talk about iterative methods.

We will consider:

- Concept of Incomplete LU/Incomplete Cholesky
- A brief note about algebraic multigrid (no details, because we need PDEs for the general multigrid concept)
- Iterative methods for other NLA problems (SVD, eigenvalue problems)
- Other than sparse: Toeplitz matrices, FFT, circulant matrices.

Just to recall: we are solving

$$Ax = f,$$where $A$ is an $n \times n$ **sparse matrix** (i.e. the number of non-zero elements is smaller than the $N^2)$,

and the **matrix-by-vector product** can be computed in $\mathcal{O}(N)$ operations.

We solve $$A P^{-1} y = f,$$

and $P$ is such that $A \approx P$ and $AP^{-1}$ has **better spectra** for the iterative method.

General idea is to select a class $\mathcal{P}$ of matrices with which we can **easily solve** linear systems and then

find a matrix $P \in \mathcal{P}$ that minimizes

$$P = \arg \min_{P \in \mathcal{P}}\Vert A - P \Vert $$- Diagonal matrices give
**Jacobi preconditioners** - Triangular matrices give
**Gauss-Seidel**preconditioner

Any other ideas for the class of the matrices?

Let us remember the basic method for solving linear systems:

Decompose the matrix $A$ in the form

$$A = P_1 L U P^{\top}_2, $$where $P_1$ and $P_2$ are certain **permutation** matrices (which do the pivoting).

The most natural idea is to use **sparse** $L$ and $U$.

It is not possible without **fill-in** growth for example for matrices, coming from 2D/3D Partial Differential equations (PDEs).

What to do?

Suppose you want to eliminate a variable $x_1$,

and the equations have the form

$$5 x_1 + x_4 + x_{10} = 1, \quad 3 x_1 + x_4 + x_8 = 0, \ldots,$$and in all other equations $x_1$ is not present.

After the elimination, only $x_{10}$ will enter additionally to the second equation (new fill-in).

$$x_4 + x_8 + 3(1 - x_4 - x_{10})/5 = 0$$In the Incomplete $LU$ case (actually, ILU(0)) we just throw away the **new fill-in**.

We run the usual LU-decomposition cycle, but avoid inserting non-zeros **other** than the initial non-zero pattern.

```
L = np.zeros((n, n))
U = np.zeros((n, n))
for k in xrange(n): #Eliminate one row
L[k, k] = 1
for i in xrange(k+1, n):
L[i, k] = a[i, k] / a[k, k]
for j in xrange(k+1, n):
a[i, j] = a[i, j] - L[i, k] * a[k, j] #New fill-ins appear here
for j in xrange(k, n):
U[k, j] = a[k, j]
```

Youseef Saad (who is the author of GMRES) also had a seminal paper on the **Incomplete LU** decomposition

Saad, Yousef (1996), Iterative methods for sparse linear systems

And he proposed **ILU(k)** method.

The idea of ILU(k) is very instructive and is based on the connection between sparse matrices and graphs.

Suppose you have an $n \times n$ matrix $A$ and a corresponding adjacency graph.

Then we eliminate one variable (vertex) and get a smaller system of size $(n-1) \times (n-1)$.

What is the graph of this system, when a **new edge appear**?

The new edge can appear only between vertices that had common neighbour: it means, that they are second-order neigbours. This is also a sparsity pattern of the matrix $$A^2.$$

The **ILU(k)** idea is to leave only the elements in $L$ and $U$ that are $k$-order neighbours.

The ILU(2) is very efficient, but for some reason completely abandoned (i.e. there is no implementation in MATLAB and scipy).

There is an original Sparsekit software by Saad, which works quite well.

A much more popular approach is based on the so-called **thresholded LU**.

You do the standard Gaussian elimination with fill-ins, but either:

Throw away elements that are smaller than threshold, and/or control the amount of non-zeros you are allowed to store.

The smaller is the threshold, the better is the preconditioner, but more memory it takes.

In the SPD case, instead of incomplete LU you should use Incomplete Cholesky, which is twice faster and consumes twice less memory.

Both ILUT and Ichol are implemented in scipy and you can try (nothing quite fancy, but it works).

There are more efficient (but much less popular due to the absense of open-source implementations) **second-order** LU factorization proposed by I. Kaporin

The idea is to approximate the matrix in the form

$$A \approx U_2 U^{\top}_2 + U^{\top}_2 R_2 + R^{\top}_2 U_2,$$which is just the expansion of the $UU^{\top}$ with respect to the perturbation of $U$.

where $U_1$ and $U_2$ are low-triangular and sparse, whereare $R_2$ is small with respect to the drop tolerance parameter.

Besides the ILU/IC approaches, there is a very efficient approach, called **algebraic multigrid (AMG)**.

It was first proposed by Ruge and Stuben in 1987.

It is probably the only **black-box** method that is asymptotically optimal for 2D/3D problems

(at least, for elliptic problems).

The idea comes from the **geometric multigrid**.

The basic idea of the multigrid is to consider instead of one problem problems on **coarser meshes**.

Then we solve for a system on a coarse mesh, and interpolate it to the finer mesh.

Geometric multigrid uses the information about the meshes, operators, discretization..

AMG method tries to guess it from the **matrix**.

To know more, enroll in the FastPDE course this Spring :)

The ILU has three flavours: ILU(k), ILUT, ILU2. They are different, but based on the approximation of the matrix in a simpler form to use as a preconditioners.

Now we will briefly discuss iterative methods for other NLA problems.

Up to now, we only talked about the iterative methods for **linear systems**.

There are other important large-scale problems:

- (Partial) Eigenvalue problem: $Ax_k = \lambda_k x_k.$
- (Partial) SVD problem: $A v_k = \sigma_k u_k, \quad A^* u_k = \sigma_k v_k$.

**It is extremely difficult to compute all eigenvalues/singular values of the matrix**

(singular vectors / eigenvectors are not sparse, so we need to store them all!).

But it is possible to solve **partial eigenvalue problems**.

The SVD follows from the symmetric eigenvalue problem.

Recall the power method:

$$x_{k+1} = A x_k.$$Can we do better, than the power method (which converges to the largest eigenvalue?)

We can use the full Krylov subspace, and construct the orthogonal basis in the subspace

$$x \in K_n(A, f)$$to approximate the eigenvalues.

This gives rise to the **Arnoldi method** (for Hermitian matrices it reduces to the **Lanczos method**)

The Arnold method has a simple structure (see the pseudocode), which just modified Gram-Schmidt applied to the Krylov subspace

```
q[k] = A * q[k-1],
for j in range(k-1):
h[j, k-1] = (q[j], q[k])
q[k] -= h[j, k-1] * q[j]
h[k, k-1] = || q[k] ||
q[k] = q[k]/h[k, k-1]
```

The Arnoldi method generates an orthogonal basis of the form

$$A Q_n = \widetilde{H}_n Q_{n+1},$$and $H_n = Q^*_n A Q_n$ is **upper Hessenberg matrix**.

The idea of using Arnoldi method as an **eigenvalue algorithm** is that we compute the eigenvalues of $A$

by looking at the eigenvalues of the **orthogonal projection** $H_n$ (called **Ritz values**).

Since $H_n$ is small we can use standard software to compute its eigenvalues.

Typically, Ritz values converge to **extreme** (large & small) eigenvalues of $A$.

In the Hermitian case, $A = A^*$ the matrix

$$H = Q^* A Q$$becomes tridiagonals, and it gives short three-term recurrence relations for the vectors:

the famous Lanczos method.

We only need to store last vectors and the tridiagonal matrix.

The Lanczos vectors may loose orthogonality during the process due to floating-point errors, thus all practical implementations of it use **restarts**.

A very good introduction to the topic is given in the book of **Golub and Van-Loan (Matrix Computations)**.

The convergence of eigenvalues is still governed by the condition number, and the number of iterations can be large.

Thus, instead of Krylov subspaces, we can use **Rational Krylov subspaces**, which require auxiliary linear solvers.

The simplest Rational Krylov method is the **inverse iteration**:

It converges to the smallest eigenvector (as the power method with the inverse matrix).

Each step requires one linear solve.

We can factorize $A$ once, and then apply inverse!

The RQI-iteration has the form

$$x_{k+1} = (A - \lambda_k I)^{-1} x_k, \quad \lambda_k = ( A x_k, x_k)/(x_k, x_k).$$For SPD matrices it has **cubic convergence**, but each time we have to construct preconditioners for the different matrices.

If the factorization of $(A - \lambda I)$ is not feasible, **Jacobi-Davidson** method is typically the method of choice.

Jacobi not only presented the way to solve the eigenvalue problem by Jacobi rotations, but also proposed an iterative procedure. Let $u_j$ be the current approximation, and $t$ the correction:

$$A(u_j + t) = \lambda (u_j + t),$$and we look for the correction $t \perp u_j$ (new orthogonal vector).

Then, the parallel part has the form

$$u_j u^*_j A (u_j + t) = \lambda u_j,$$which simplifies to

$$\theta_j + u_j u^* _j A t = \lambda.$$The orthogonal component is

$$( I - u_j u^*_j) A (u_j + t) = (I - u_j u^*_j) \lambda (u_j + t),$$which is equivalent to

$$ (I - u_j u^*_j) (A - \lambda I) t = (I - u_j u^*_j) (- A u_j + \lambda u_j) = - (I - u_j u^*_j) A u_j = - (A - \theta_j I) u_j = -r_j. $$$r_j$ is the residue.

Since $(I - u_j u^*j) t = t$, we can rewrite this equation in the form

Now we replace $\lambda$ by $\theta_j$, and get

$$ (I - u_j u^*_j) (A - \theta_j I) (I - u_j u^*_j) t = -r_j. $$Since $r_j \perp u_j$ this equation is consistent, if $(A - \theta_j I)$ is non-singular.

If this equation is solved exactly, we will get Rayleigh Quotient iteration!

If we want many eigenvectors, we just compute **partial Schur decomposition:**

and then want to update $Q_k$ by one vector added to $Q_k$. We just use instead of $A$ the matrix $(I - Q_k Q^*_k) A (I - Q_k Q^*_k)$.

The correction equation can be solved only roughly, and JD method is often the fastest.

- Iterative methods (Arnoldi, Lanczos, Jacobi-Davidson) for eigenvalue problems
- There is a software for using them
- There is a lot of technical issues hidden (restarts, spurious eigenvalues, stability)

- Up to now, we discussed preconditioning only for
**sparse matrices** - But iterative methods work well for any matrices that have fast black-box matrix-by-vector product
- Important class of matrices are
**Toeplitz matrices**(and**Hankel matrices**)

We will briefly discuss them, with more to follow on Thursday.

A matrix is called **Toeplitz** if its elements are defined as

A Toeplitz matrix is completely defined by its first column and first row (i.e., $2n-1$ parameters).

It can also be multiplied by vector fast.

It has a lot of application in **signal processing**

In order to get fast matrix-by-vector product, consider a special class of **Toeplitz matrices:**

**Circulant matrices**.

A matrix is called **circulant**, if

**Theorem:**

Any circulant matrix can be represented in the form

$$C = F \Lambda F^*,$$where $F$ is the **Fourier matrix** with the elements

and matrix $\Lambda$ is the diagonal matrix and

$$\lambda = F^* c, $$where $c$ is the first column of the circulant matrix.

The matrix-by-vector product $F x$ can be computed in $\mathcal{O}(n \log n)$ operations using Fast Fourier Transform (FFT).

```
and then
$$
C \begin{bmatrix} q\\ 0 \end{bmatrix} = \begin{bmatrix} Tq \\0 \end{bmatrix}.
$$
```

How to solve linear systems with Toeplitz matrices?

Since we have fast matrix-by-vector product, methods we can use iterative methods.

We can also use direct solvers, but what is **the structure of the inverse** of such matrix?

Discuss it next time!

- ILU
- Iterative eigensolvers
- Toeplitz matrices, FFT, circulant

- More on Toeplitz matrices (direct and iterative methods, block case).
- Back to matrices: matrix functions

In [1]:

```
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()
```

Out[1]: