Week 1: Matrices, vectors, matrix/vector norms, scalar products & unitary matrices
Week 2: TAs-week (Strassen, FFT, a bit of SVD)
Week 3: Matrix ranks, singular value decomposition, linear systems, eigenvalues
Week 4: Matrix decompositions: QR, LU, SVD + test + structured matrices start
Week 5: Iterative methods, preconditioners, matrix functions
Today we will continue to talk about iterative methods.
We will consider: incomplete LU preconditioners, iterative methods for eigenvalue problems, and structured (Toeplitz) matrices.
Just to recall: we are solving
$$Ax = f,$$where $A$ is an $n \times n$ sparse matrix (i.e. the number of non-zero elements is much smaller than $n^2$),
and the matrix-by-vector product can be computed in $\mathcal{O}(n)$ operations.
We solve $$A P^{-1} y = f, \quad x = P^{-1} y,$$
where $P$ is such that $P \approx A$ and $A P^{-1}$ has a better spectrum for the iterative method.
General idea is to select a class $\mathcal{P}$ of matrices with which we can easily solve linear systems and then
find the matrix $P \in \mathcal{P}$ that is closest to $A$:
$$P = \arg \min_{P \in \mathcal{P}}\Vert A - P \Vert. $$Any other ideas for the class of matrices?
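For example, if $\mathcal{P}$ is the class of diagonal matrices and the norm is the Frobenius norm, the minimizer is simply $\mathrm{diag}(A)$, the Jacobi preconditioner. A minimal sketch (the matrix here is a toy example):

```python
import numpy as np

# Toy example: over the class of diagonal matrices, the Frobenius-norm
# minimizer of ||A - P|| is simply diag(A) -- the Jacobi preconditioner.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
P = np.diag(np.diag(A))           # P = arg min ||A - P||_F over diagonal P
print(np.linalg.cond(A), np.linalg.cond(np.linalg.inv(P) @ A))
```

Systems with a diagonal $P$ are solved in $\mathcal{O}(n)$ operations, which is what makes it a valid class $\mathcal{P}$.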
Let us remember the basic method for solving linear systems:
Decompose the matrix $A$ in the form
$$A = P_1 L U P^{\top}_2, $$where $P_1$ and $P_2$ are certain permutation matrices (which do the pivoting).
The most natural idea is to use sparse $L$ and $U$.
In general this is impossible without fill-in growth, for example for matrices coming from 2D/3D partial differential equations (PDEs).
What to do?
Suppose you want to eliminate a variable $x_1$,
and the equations have the form
$$5 x_1 + x_4 + x_{10} = 1, \quad 3 x_1 + x_4 + x_8 = 0, \ldots,$$and in all other equations $x_1$ is not present.
After the elimination, only $x_{10}$ will additionally enter the second equation (new fill-in):
$$x_4 + x_8 + 3(1 - x_4 - x_{10})/5 = 0$$In the Incomplete $LU$ case (actually, ILU(0)) we just throw away the new fill-in.
We run the usual LU-decomposition cycle, but avoid inserting non-zeros other than the initial non-zero pattern.
import numpy as np

L = np.zeros((n, n))
U = np.zeros((n, n))
for k in range(n):  # eliminate one row
    L[k, k] = 1
    for i in range(k + 1, n):
        L[i, k] = a[i, k] / a[k, k]
        for j in range(k + 1, n):
            a[i, j] = a[i, j] - L[i, k] * a[k, j]  # new fill-ins appear here
    for j in range(k, n):
        U[k, j] = a[k, j]
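A runnable ILU(0) sketch of this idea (illustrative only, not a production implementation): the same elimination loop, but updates are applied only inside the original non-zero pattern.

```python
import numpy as np

def ilu0(a):
    """ILU(0) sketch: the usual LU elimination loop, but new fill-ins
    outside the original non-zero pattern of the matrix are dropped."""
    a = a.copy()
    n = a.shape[0]
    pattern = (a != 0)                   # initial non-zero pattern
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        for i in range(k + 1, n):
            if not pattern[i, k]:
                continue
            L[i, k] = a[i, k] / a[k, k]
            for j in range(k + 1, n):
                if pattern[i, j]:        # keep updates inside the pattern only
                    a[i, j] -= L[i, k] * a[k, j]
        U[k, k:] = a[k, k:]
    return L, U

# A tridiagonal matrix has no fill-in, so ILU(0) is the exact LU here.
A = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
L, U = ilu0(A)
print(np.allclose(L @ U, A))             # → True
```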
Yousef Saad (who is the author of GMRES) also had a seminal work on the incomplete LU decomposition:
Saad, Yousef (1996), Iterative methods for sparse linear systems
He proposed the ILU(k) method.
The idea of ILU(k) is very instructive and is based on the connection between sparse matrices and graphs.
Suppose you have an $n \times n$ matrix $A$ and a corresponding adjacency graph.
Then we eliminate one variable (vertex) and get a smaller system of size $(n-1) \times (n-1)$.
What is the graph of this system, and when does a new edge appear?
A new edge can appear only between vertices that had a common neighbour: that means they are second-order neighbours. This is also the sparsity pattern of the matrix $$A^2.$$
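This can be checked numerically. For the 1D Laplacian (a sketch with scipy), the pattern of $A$ has bandwidth one (first-order neighbours), while the pattern of $A^2$ has bandwidth two (second-order neighbours):

```python
import numpy as np
import scipy.sparse as sp

# 1D Laplacian: vertex i is connected to its immediate neighbours i-1, i+1.
n = 6
A = sp.diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)],
             [0, 1, -1], format='csr')
A2 = (A @ A).toarray()

print((np.abs(A.toarray()) > 0).astype(int))   # bandwidth 1
print((np.abs(A2) > 0).astype(int))            # bandwidth 2: 2nd-order neighbours
```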
The ILU(k) idea is to leave only the elements in $L$ and $U$ that are $k$-order neighbours.
ILU(2) is very efficient, but for some reason it has been almost completely abandoned (e.g., there is no implementation in MATLAB or SciPy).
There is the original SPARSKIT software by Saad, which works quite well.
A much more popular approach is based on the so-called thresholded LU.
You do the standard Gaussian elimination with fill-ins, but either:
Throw away elements that are smaller than a threshold, and/or control the number of non-zeros you are allowed to store.
The smaller the threshold, the better the preconditioner, but the more memory it takes.
In the SPD case, instead of incomplete LU you should use incomplete Cholesky, which is twice as fast and uses half the memory.
Both thresholded ILU and incomplete Cholesky have standard implementations that you can try (nothing fancy, but they work).
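A thresholded incomplete LU is available, e.g., as scipy.sparse.linalg.spilu (via SuperLU); a minimal sketch of using it as a preconditioner for GMRES:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy problem: 1D Poisson matrix
n = 100
A = sp.diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)],
             [0, 1, -1], format='csc')
f = np.ones(n)

# drop_tol and fill_factor trade preconditioner quality against memory
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
P = spla.LinearOperator((n, n), matvec=ilu.solve)

x, info = spla.gmres(A, f, M=P)
print(info, np.linalg.norm(A @ x - f))
```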
There is a more efficient (but much less popular, due to the absence of open-source implementations) second-order incomplete factorization proposed by I. Kaporin.
The idea is to approximate the matrix in the form
$$A \approx U^{\top}_2 U_2 + U^{\top}_2 R_2 + R^{\top}_2 U_2,$$which is just the expansion of $U^{\top} U$ with respect to a perturbation of $U$,
where $U_2$ is triangular and sparse, and $R_2$ is small with respect to the drop tolerance parameter.
Besides the ILU/IC approaches, there is a very efficient approach, called algebraic multigrid (AMG).
It was first proposed by Ruge and Stüben in 1987.
It is probably the only black-box method that is asymptotically optimal for 2D/3D problems
(at least, for elliptic problems).
The idea comes from the geometric multigrid.
The basic idea of multigrid is to consider, instead of one problem, a hierarchy of problems on coarser meshes.
Then we solve the system on a coarse mesh and interpolate the solution back to the finer mesh.
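A minimal two-grid sketch for the 1D Poisson problem (weighted Jacobi smoothing and linear interpolation; all parameter choices here are illustrative):

```python
import numpy as np

def two_grid(A, f, x, n_smooth=3, omega=2.0 / 3.0):
    """One two-grid cycle: pre-smooth, coarse-grid correction, post-smooth."""
    D = np.diag(A)
    for _ in range(n_smooth):                  # pre-smoothing (weighted Jacobi)
        x = x + omega * (f - A @ x) / D
    n = A.shape[0]
    nc = (n - 1) // 2
    P = np.zeros((n, nc))                      # linear interpolation: coarse -> fine
    for j in range(nc):
        P[2 * j, j] = P[2 * j + 2, j] = 0.5
        P[2 * j + 1, j] = 1.0
    R = 0.5 * P.T                              # restriction: fine -> coarse
    Ac = R @ A @ P                             # Galerkin coarse-grid operator
    x = x + P @ np.linalg.solve(Ac, R @ (f - A @ x))   # coarse-grid correction
    for _ in range(n_smooth):                  # post-smoothing
        x = x + omega * (f - A @ x) / D
    return x

n = 31
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
x = np.zeros(n)
for _ in range(15):
    x = two_grid(A, f, x)
print(np.linalg.norm(f - A @ x))               # tiny: the cycle converges fast
```

In a full multigrid method the coarse solve is itself replaced by a recursive cycle; AMG constructs the interpolation and the coarse operator from the matrix entries instead of the mesh.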
Geometric multigrid uses information about the meshes, operators, and discretization.
The AMG method tries to infer this information from the matrix alone.
To know more, enroll in the FastPDE course this Spring :)
The ILU has three flavours: ILU(k), ILUT, ILU2. They are different, but all are based on approximating the matrix by a simpler form to use as a preconditioner.
Now we will briefly discuss iterative methods for other NLA problems.
Up to now, we only talked about the iterative methods for linear systems.
There are other important large-scale problems, for example eigenvalue and singular value problems.
It is extremely difficult to compute all eigenvalues/singular values of a large matrix
(the eigenvectors / singular vectors are not sparse, so we would need to store them all!).
But it is possible to solve partial eigenvalue problems.
The SVD follows from the symmetric eigenvalue problem.
Recall the power method:
$$x_{k+1} = \frac{A x_k}{\Vert A x_k \Vert}.$$Can we do better than the power method (which converges only to the largest-in-magnitude eigenvalue)?
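A minimal power-method sketch on a random SPD toy matrix (the Rayleigh quotient gives the eigenvalue estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B.T @ B                           # SPD toy matrix

x = rng.standard_normal(50)
for _ in range(1000):
    x = A @ x
    x /= np.linalg.norm(x)            # normalize to avoid overflow
lam = x @ A @ x                       # Rayleigh quotient estimate

print(lam, np.max(np.linalg.eigvalsh(A)))
```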
We can use the full Krylov subspace, and construct the orthogonal basis in the subspace
$$x \in K_n(A, f)$$to approximate the eigenvalues.
This gives rise to the Arnoldi method (for Hermitian matrices it reduces to the Lanczos method).
The Arnoldi method has a simple structure (see the pseudocode below): it is just modified Gram-Schmidt applied to the Krylov subspace.
q[k] = A * q[k-1]
for j in range(k):
    h[j, k-1] = (q[j], q[k])
    q[k] -= h[j, k-1] * q[j]
h[k, k-1] = || q[k] ||
q[k] = q[k] / h[k, k-1]
The Arnoldi method generates an orthogonal basis of the form
$$A Q_n = Q_{n+1} \widetilde{H}_n,$$where $\widetilde{H}_n$ is an $(n+1) \times n$ matrix, and $H_n = Q^*_n A Q_n$ is an upper Hessenberg matrix.
The idea of using Arnoldi method as an eigenvalue algorithm is that we compute the eigenvalues of $A$
by looking at the eigenvalues of the orthogonal projection $H_n$ (called Ritz values).
Since $H_n$ is small we can use standard software to compute its eigenvalues.
Typically, Ritz values converge to extreme (large & small) eigenvalues of $A$.
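A runnable Arnoldi sketch matching the pseudocode above; the matrix is diagonal, so the exact eigenvalues $1, \dots, 200$ are known and we can check that the extreme Ritz values are already accurate after a few dozen steps:

```python
import numpy as np

def arnoldi(A, f, m):
    """Build an orthonormal basis Q of the Krylov subspace K_m(A, f)
    and the projected Hessenberg matrix H = Q* A Q (modified Gram-Schmidt)."""
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = f / np.linalg.norm(f)
    for k in range(m):
        w = A @ Q[:, k]
        for j in range(k + 1):
            H[j, k] = Q[:, j] @ w
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
    return Q[:, :m], H[:m, :m]

rng = np.random.default_rng(0)
A = np.diag(np.arange(1.0, 201.0))    # known eigenvalues: 1, 2, ..., 200
Q, H = arnoldi(A, rng.standard_normal(200), 30)
ritz = np.sort(np.linalg.eigvals(H).real)
print(ritz[0], ritz[-1])              # extreme Ritz values approach 1 and 200
```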
In the Hermitian case $A = A^*$, the matrix
$$H = Q^* A Q$$becomes tridiagonal, and this gives short three-term recurrence relations for the vectors:
the famous Lanczos method.
We only need to store the last few vectors and the tridiagonal matrix.
The Lanczos vectors may lose orthogonality during the process due to floating-point errors, thus practical implementations use restarts or reorthogonalization.
A very good introduction to the topic is given in the book of Golub and Van Loan (Matrix Computations).
The convergence of eigenvalues is still governed by the condition number, and the number of iterations can be large.
Thus, instead of Krylov subspaces, we can use Rational Krylov subspaces, which require auxiliary linear solvers.
The simplest Rational Krylov method is the inverse iteration:
$$x_{k+1} = \frac{A^{-1} x_k}{\Vert A^{-1} x_k \Vert}.$$It converges to the eigenvector of the smallest-in-magnitude eigenvalue (it is the power method applied to the inverse matrix).
Each step requires one linear solve.
We can factorize $A$ once, and then just apply the triangular factors at each step!
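A sketch with scipy's splu: factorize once, then every inverse-iteration step costs only triangular solves (toy 1D Laplacian with a known smallest eigenvalue):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
A = sp.diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)],
             [0, 1, -1], format='csc')

lu = spla.splu(A)                     # factorize A once
x = np.ones(n)
for _ in range(200):
    x = lu.solve(x)                   # each step: only triangular solves
    x /= np.linalg.norm(x)
lam = x @ (A @ x)                     # converges to the smallest eigenvalue

print(lam, 2 - 2 * np.cos(np.pi / (n + 1)))   # exact: 2 - 2 cos(pi/(n+1))
```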
The RQI-iteration has the form
$$x_{k+1} = (A - \lambda_k I)^{-1} x_k, \quad \lambda_k = ( A x_k, x_k)/(x_k, x_k).$$For SPD matrices it has cubic convergence, but at each step we have to factorize (or precondition) a different shifted matrix.
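A dense RQI sketch on a random SPD toy matrix; note that the shifted matrix (and hence the factorization) is different at every step:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((100, 100))
A = B.T @ B                                  # SPD toy matrix

x = rng.standard_normal(100)
x /= np.linalg.norm(x)
for _ in range(20):
    lam = x @ A @ x                          # Rayleigh quotient
    if np.linalg.norm(A @ x - lam * x) < 1e-10:
        break                                # stop before the shift makes the
                                             # matrix numerically singular
    # a *different* shifted matrix (and factorization) at every step
    x = np.linalg.solve(A - lam * np.eye(100), x)
    x /= np.linalg.norm(x)

print(np.min(np.abs(np.linalg.eigvalsh(A) - lam)))
```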
If the factorization of $(A - \lambda I)$ is not feasible, Jacobi-Davidson method is typically the method of choice.
Jacobi not only presented the way to solve the eigenvalue problem by Jacobi rotations, but also proposed an iterative procedure. Let $u_j$ be the current approximation, and $t$ the correction:
$$A(u_j + t) = \lambda (u_j + t),$$and we look for the correction $t \perp u_j$ (new orthogonal vector).
Then, the parallel part has the form
$$u_j u^*_j A (u_j + t) = \lambda u_j,$$which simplifies to
$$\theta_j + u^*_j A t = \lambda, \quad \text{where } \theta_j = u^*_j A u_j.$$The orthogonal component is
$$( I - u_j u^*_j) A (u_j + t) = (I - u_j u^*_j) \lambda (u_j + t),$$which is equivalent to
$$ (I - u_j u^*_j) (A - \lambda I) t = (I - u_j u^*_j) (- A u_j + \lambda u_j) = - (I - u_j u^*_j) A u_j = - (A - \theta_j I) u_j = -r_j, $$where $r_j$ is the residual.
Since $(I - u_j u^*_j) t = t$, we can rewrite this equation, replacing $\lambda$ by $\theta_j$, in the form
$$ (I - u_j u^*_j) (A - \theta_j I) (I - u_j u^*_j) t = -r_j. $$Since $r_j \perp u_j$ this equation is consistent, if $(A - \theta_j I)$ is non-singular.
If this equation is solved exactly, we will get Rayleigh Quotient iteration!
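A dense sanity check of the correction equation (here the singular projected system is solved with a minimum-norm least-squares solve; in real Jacobi-Davidson it is solved only approximately by an iterative method):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((50, 50))
A = B + B.T                                # symmetric toy matrix

u = rng.standard_normal(50)
u /= np.linalg.norm(u)
theta = u @ A @ u                          # Rayleigh quotient
r = A @ u - theta * u                      # residual; note r is orthogonal to u

Pu = np.eye(50) - np.outer(u, u)           # projector onto the complement of u
M = Pu @ (A - theta * np.eye(50)) @ Pu     # projected (singular) operator
t = np.linalg.lstsq(M, -r, rcond=None)[0]  # min-norm solution lies in u-perp

print(abs(u @ t))                          # the correction is orthogonal to u
print(np.linalg.norm(M @ t + r))           # the correction equation holds
```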
If we want several eigenvectors, we compute a partial Schur decomposition:
$$A Q_k = Q_k T_k, $$and then update $Q_k$ by adding one more vector. We simply use, instead of $A$, the deflated matrix $(I - Q_k Q^*_k) A (I - Q_k Q^*_k)$.
The correction equation need only be solved roughly, and the JD method is often the fastest.
Finally, let us briefly discuss structured (Toeplitz) matrices, with more to follow on Thursday.
A matrix is called Toeplitz if its elements are defined as
$$a_{ij} = t_{i - j}.$$A Toeplitz matrix is completely defined by its first column and first row (i.e., $2n-1$ parameters).
It can also be multiplied by a vector quickly.
It has many applications in signal processing.
In order to get fast matrix-by-vector product, consider a special class of Toeplitz matrices:
Circulant matrices.
A matrix is called circulant, if
$$a_{ij} = c_{\mathrm{mod}(i - j, n)}.$$Theorem:
Any circulant matrix can be represented in the form
$$C = F \Lambda F^*,$$where $F$ is the Fourier matrix with the elements
$$F_{kl} = \frac{1}{\sqrt{n}} w^{kl}, \quad k, l = 0, \ldots, n-1, \quad w = e^{\frac{2 \pi i}{n}},$$and matrix $\Lambda$ is the diagonal matrix and
$$\lambda = \sqrt{n}\, F^* c, $$where $c$ is the first column of the circulant matrix.
The matrix-by-vector product $F x$ can be computed in $\mathcal{O}(n \log n)$ operations using Fast Fourier Transform (FFT).
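This diagonalization gives an $\mathcal{O}(n \log n)$ circulant matrix-by-vector product via the FFT; a sketch compared against the dense product:

```python
import numpy as np
from scipy.linalg import circulant

c = np.array([1.0, 2.0, 3.0, 4.0])      # first column of the circulant
x = np.array([1.0, -1.0, 2.0, 0.5])

# Circulant matvec = circular convolution, computed via FFT in O(n log n)
y_fast = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
y_dense = circulant(c) @ x              # O(n^2) dense product for comparison
print(np.allclose(y_fast, y_dense))     # → True
```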
By embedding a Toeplitz matrix $T$ into a circulant matrix $C$ of twice the size, we get
$$
C \begin{bmatrix} q\\ 0 \end{bmatrix} = \begin{bmatrix} Tq \\ * \end{bmatrix},
$$
so a Toeplitz matrix can also be multiplied by a vector in $\mathcal{O}(n \log n)$ operations.
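A sketch of the embedding: the $n \times n$ Toeplitz matrix is embedded into a $2n \times 2n$ circulant whose first column is the Toeplitz first column, a zero, and the reversed tail of the first row; the bottom half of the product is discarded:

```python
import numpy as np
from scipy.linalg import toeplitz

col = np.array([4.0, 1.0, 0.5, 0.25])   # first column of T
row = np.array([4.0, 2.0, 1.0, 0.1])    # first row of T (row[0] == col[0])
x = np.array([1.0, -2.0, 0.0, 3.0])
n = len(col)

# First column of the 2n x 2n circulant embedding:
# [col, 0, reversed tail of row]; then C @ [x; 0] = [T x; junk].
c = np.concatenate([col, [0.0], row[1:][::-1]])
y = np.fft.ifft(np.fft.fft(c) *
                np.fft.fft(np.concatenate([x, np.zeros(n)]))).real[:n]

print(np.allclose(y, toeplitz(col, row) @ x))   # → True
```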
How to solve linear systems with Toeplitz matrices?
Since we have a fast matrix-by-vector product, we can use iterative methods.
We can also use direct solvers, but what is the structure of the inverse of such a matrix?
We will discuss it next time!