#!/usr/bin/env python
# coding: utf-8

# # Lecture 13: Iterative methods

# ## Syllabus
# **Week 1:** Matrices, vectors, matrix/vector norms, scalar products & unitary matrices
# **Week 2:** TAs-week (Strassen, FFT, a bit of SVD)
# **Week 3:** Matrix ranks, singular value decomposition, linear systems, eigenvalues
# **Week 4:** Matrix decompositions: QR, LU, SVD + test + structured matrices start
# **Week 5:** Iterative methods, preconditioners, matrix functions

# ## Recap of the previous lecture
# - Krylov subspaces
# - Iterative methods: Conjugate Gradient (CG), Generalized Minimal Residual Method (GMRES), BiCGStab
# - Definition of preconditioners
# - Jacobi, Gauss-Seidel as preconditioners

# ## Today's lecture
#
# Today we will continue to talk about iterative methods.
#
# We will consider:
#
# - The concept of incomplete LU / incomplete Cholesky factorizations
# - A brief note about algebraic multigrid (no details, because we need PDEs for the general multigrid concept)
# - Iterative methods for other NLA problems (SVD, eigenvalue problems)
# - Other structured matrices besides sparse ones: Toeplitz matrices, FFT, circulant matrices

# ## Problem setup (recall)
#
# Just to recall: we are solving
#
# $$Ax = f,$$
#
# where $A$ is an $n \times n$ **sparse matrix** (i.e. the number of non-zero elements is much smaller than $n^2$),
#
# and the **matrix-by-vector product** can be computed in $\mathcal{O}(n)$ operations.
#
# Instead, we solve the preconditioned system
#
# $$A P^{-1} y = f, \quad x = P^{-1} y,$$
#
# where $P$ is such that $P \approx A$, linear systems with $P$ are easy to solve, and $A P^{-1}$ has a **better spectrum** for the iterative method.

# ## Simple method
# The general idea is to select a class $\mathcal{P}$ of matrices with which we can **easily solve** linear systems, and then
#
# find the matrix $P \in \mathcal{P}$ closest to $A$:
#
# $$P = \arg \min_{P \in \mathcal{P}} \Vert A - P \Vert. $$
#
# 1. Diagonal matrices give the **Jacobi preconditioner**
# 2. Triangular matrices give the **Gauss-Seidel** preconditioner
#
# Any other ideas for the class of matrices?

# ## Remember the Gaussian elimination
# Let us recall the basic method for solving linear systems:
#
# Decompose the matrix $A$ in the form
#
# $$A = P_1 L U P^{\top}_2, $$
#
# where $P_1$ and $P_2$ are certain **permutation** matrices (which do the pivoting).
#
# The most natural idea is to use **sparse** $L$ and $U$.
#
# This is not possible without **fill-in** growth, for example for matrices coming from 2D/3D partial differential equations (PDEs).
#
# What to do?

# ## Incomplete LU
#
# Suppose you want to eliminate the variable $x_1$,
#
# and the equations have the form
#
# $$5 x_1 + x_4 + x_{10} = 1, \quad 3 x_1 + x_4 + x_8 = 0, \ldots,$$
#
# and $x_1$ is not present in any other equation.
#
# After the elimination, only $x_{10}$ will additionally enter the second equation (new fill-in):
#
# $$x_4 + x_8 + 3(1 - x_4 - x_{10})/5 = 0.$$
#
# In the incomplete $LU$ case (actually, ILU(0)) we simply throw away the **new fill-in**.
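# To make the fill-in issue (and the benefit of throwing the fill-in away) concrete, here is a minimal sketch with a 2D Laplacian: the exact sparse LU factors are much denser than $A$, while SciPy's thresholded incomplete LU (`spilu`, close in spirit to the ILUT variant discussed below) keeps the factors sparse and already works well as a preconditioner for GMRES. The grid size and drop tolerance are arbitrary illustration values.
#
# ```
# import numpy as np
# import scipy.sparse as sp
# import scipy.sparse.linalg as spla
#
# # 2D Laplacian on a 64 x 64 grid -- a typical sparse "PDE" matrix (illustration values)
# n = 64
# d1 = sp.diags([-1., 2., -1.], [-1, 0, 1], shape=(n, n))
# A = (sp.kron(sp.eye(n), d1) + sp.kron(d1, sp.eye(n))).tocsc()
# b = np.ones(A.shape[0])
#
# # Exact sparse LU: note how much denser the factors are (fill-in growth)
# lu = spla.splu(A)
# print("nnz(A) =", A.nnz, " nnz(L) + nnz(U) =", lu.L.nnz + lu.U.nnz)
#
# # Thresholded incomplete LU used as a preconditioner
# ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
# M = spla.LinearOperator(A.shape, matvec=ilu.solve)
# print("nnz of incomplete factors =", ilu.L.nnz + ilu.U.nnz)
#
# # Count GMRES iterations without and with the ILU preconditioner
# counts = []
# for prec in [None, M]:
#     res = []
#     spla.gmres(A, b, M=prec, restart=50, callback=lambda r: res.append(r))
#     counts.append(len(res))
# print("GMRES iterations (no preconditioner / ILU preconditioner):", counts)
# ```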
# ## Incomplete-LU: formal definition
#
# We run the usual LU-decomposition cycle, but avoid inserting non-zeros **other** than those in the initial non-zero pattern.
#
# ```
# import numpy as np
#
# def ilu0_dense(a):
#     """ILU(0) of a dense array a: updates are applied only inside the initial non-zero pattern."""
#     a = a.copy()
#     n = a.shape[0]
#     pattern = (a != 0)                 # the initial non-zero pattern is kept fixed
#     L = np.eye(n)
#     U = np.zeros((n, n))
#     for k in range(n):                 # eliminate one row
#         for i in range(k + 1, n):
#             if not pattern[i, k]:
#                 continue
#             L[i, k] = a[i, k] / a[k, k]
#             for j in range(k + 1, n):
#                 if pattern[i, j]:      # new fill-ins would appear here; we simply skip them
#                     a[i, j] -= L[i, k] * a[k, j]
#         U[k, k:] = a[k, k:]
#     return L, U
# ```

# ## ILU(k)
#
# Yousef Saad (who is also the author of GMRES) wrote a seminal book covering the **incomplete LU** decomposition:
#
# Saad, Yousef (1996), Iterative methods for sparse linear systems.
#
# There the **ILU(k)** method is described in detail.

# ## ILU(k), continued
#
# The idea of ILU(k) is very instructive and is based on the connection between sparse matrices and graphs.
#
# Suppose you have an $n \times n$ matrix $A$ and a corresponding adjacency graph.
#
# Then we eliminate one variable (vertex) and get a smaller system of size $(n-1) \times (n-1)$.
#
# What is the graph of this system, i.e. when does a **new edge appear**?

# ## LU & graphs
#
# A new edge can appear only between vertices that had a common neighbour: this means that they are second-order neighbours. This is also the sparsity pattern of the matrix $$A^2.$$
#
# The **ILU(k)** idea is to keep in $L$ and $U$ only the elements that correspond to at most $k$-th order neighbours.
#
# ILU(2) is very efficient, but for some reason it has been almost completely abandoned (e.g. there is no implementation in MATLAB or SciPy).
#
# There is the original SPARSKIT software by Saad, which works quite well.

# ## Thresholded ILU (ILUT)
# A much more popular approach is based on the so-called **thresholded LU**.
#
# You do the standard Gaussian elimination with fill-ins, but either:
#
# throw away elements that are smaller than a threshold, and/or control the amount of non-zeros you are allowed to store.
#
# The smaller the threshold, the better the preconditioner, but the more memory it takes.

# ## Symmetric positive definite case
#
# In the SPD case, instead of an incomplete LU you should use the incomplete Cholesky factorization, which is roughly twice as fast and uses half the memory.
#
# ILUT is available in SciPy (and incomplete Cholesky, e.g., in MATLAB), so you can try it (nothing fancy, but it works).

# ## Second-order LU preconditioners
#
# There is a more efficient (but much less popular, due to the absence of open-source implementations) **second-order** LU factorization [proposed by I. Kaporin](http://www.researchgate.net/profile/I_Kaporin/publication/242940993_High_quality_preconditioning_of_a_general_symmetric_positive_definite_matrix_based_on_its_UTU__UTR__RTU-decomposition/links/53f72ad90cf2888a74976f54.pdf).
#
# The idea is to approximate the matrix in the form
#
# $$A \approx U^{\top}_2 U_2 + U^{\top}_2 R_2 + R^{\top}_2 U_2,$$
#
# which is just the expansion of $U^{\top} U$ with respect to a perturbation of $U$,
#
# where $U_2$ is upper triangular and sparse, whereas $R_2$ is small with respect to the drop tolerance parameter.

# ## What else?
# Besides the ILU/IC approaches, there is a very efficient approach called **algebraic multigrid (AMG)**.
#
# It was first proposed by Ruge and Stuben in 1987.
#
# It is probably the only **black-box** method that is asymptotically optimal for 2D/3D problems
#
# (at least, for elliptic problems).
#
# The idea comes from the **geometric multigrid**.

# ## Multigrid: just the basic idea
# The basic idea of multigrid is to consider, instead of one problem, a hierarchy of problems on **coarser meshes**.
#
# Then we solve the system on a coarse mesh and interpolate the solution to the finer mesh.
#
# Geometric multigrid uses the information about the meshes, operators and discretization;
#
# the AMG method tries to guess it from the **matrix** alone.
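# For a quick taste of AMG as a black-box solver, here is a minimal sketch using the PyAMG package (this assumes PyAMG is installed; it provides, among others, the classical Ruge-Stuben construction). The grid size and tolerance are arbitrary illustration values.
#
# ```
# import numpy as np
# import pyamg                                           # assumes the PyAMG package is available
#
# A = pyamg.gallery.poisson((200, 200), format='csr')   # 2D Laplacian with 40 000 unknowns
# b = np.random.randn(A.shape[0])
#
# ml = pyamg.ruge_stuben_solver(A)                       # build the classical (Ruge-Stuben) AMG hierarchy
# residuals = []
# x = ml.solve(b, tol=1e-8, residuals=residuals)
#
# print("AMG cycles:", len(residuals))
# print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
# ```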
# To know more, enroll in the FastPDE course this Spring :)

# ## Summary of this part
#
# ILU comes in several flavours: ILU(k), ILUT, ILU2. They are different, but all are based on approximating the matrix by a simpler one that can be used as a preconditioner.
#
# Now we will briefly discuss iterative methods for other NLA problems.

# ## Iterative methods for other NLA problems
#
# Up to now, we only talked about iterative methods for **linear systems**.
#
# There are other important large-scale problems:
#
# 1. (Partial) eigenvalue problem: $Ax_k = \lambda_k x_k.$
# 2. (Partial) SVD problem: $A v_k = \sigma_k u_k, \quad A^* u_k = \sigma_k v_k$.
#
# **It is extremely difficult to compute all eigenvalues/singular values of the matrix**
#
# (singular vectors / eigenvectors are not sparse, so we would need to store them all!).
#
# But it is possible to solve **partial eigenvalue problems**.
#
# The SVD follows from the symmetric eigenvalue problem.

# ## Can we do better than the power method?
#
# Recall the power method:
#
# $$x_{k+1} = \frac{A x_k}{\Vert A x_k \Vert}.$$
#
# Can we do better than the power method (which converges to the eigenvector of the largest-in-magnitude eigenvalue)?

# ## Yes, we can do better
#
# We can use the full Krylov subspace and construct an orthogonal basis of the subspace
#
# $$x \in K_n(A, f)$$
#
# to approximate the eigenvalues.
#
# This gives rise to the **Arnoldi method** (for Hermitian matrices it reduces to the **Lanczos method**).

# ## Arnoldi method
#
# The Arnoldi method has a simple structure (see the pseudocode for the $k$-th step below), which is just modified Gram-Schmidt applied to the Krylov subspace:
#
# ```
# q[k] = A @ q[k-1]
# for j in range(k):
#     h[j, k-1] = (q[j], q[k])
#     q[k] -= h[j, k-1] * q[j]
# h[k, k-1] = norm(q[k])
# q[k] = q[k] / h[k, k-1]
# ```

# ## Arnoldi method, continued
#
# The Arnoldi method generates an orthogonal basis via the relation
#
# $$A Q_n = Q_{n+1} \widetilde{H}_n,$$
#
# where $H_n = Q^*_n A Q_n$ is an **upper Hessenberg matrix**.
#
# The idea of using the Arnoldi method as an **eigenvalue algorithm** is that we approximate the eigenvalues of $A$
#
# by the eigenvalues of the **orthogonal projection** $H_n$ (called **Ritz values**).
#
# Since $H_n$ is small, we can use standard software to compute its eigenvalues.
#
# Typically, Ritz values converge to the **extreme** (largest & smallest) eigenvalues of $A$.

# ## Symmetric case: Lanczos method
#
# In the Hermitian case, $A = A^*$, the matrix
#
# $$H = Q^* A Q$$
#
# becomes tridiagonal, which gives a short three-term recurrence relation for the vectors:
#
# the famous Lanczos method.
#
# We only need to store the last few vectors and the tridiagonal matrix.

# ## Practical issues and stability
#
# The Lanczos vectors may lose orthogonality during the process due to floating-point errors, thus all practical implementations use **restarts** (and/or reorthogonalization).
#
# A very good introduction to the topic is given in the book by **Golub and Van Loan (Matrix Computations)**.

# ## Preconditioning of eigenvalue problems
#
# The convergence of the eigenvalues is still governed by the condition number, so the number of iterations can be large.
#
# Thus, instead of Krylov subspaces, we can use **rational Krylov subspaces**, which require auxiliary linear solvers.

# ## Simplest rational Krylov method: inverse iteration
#
# The simplest rational Krylov method is **inverse iteration**:
#
# $$x_{k+1} = \frac{A^{-1} x_k}{\Vert A^{-1} x_k \Vert}.$$
#
# It converges to the eigenvector of the smallest-in-magnitude eigenvalue (it is the power method applied to the inverse matrix).
#
# Each step requires one linear solve.
#
# We can factorize $A$ once, and then only apply the factors at every step!
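# A minimal sketch of this idea (the matrix here is a 1D Laplacian, chosen only for illustration; its smallest eigenvalue is known analytically, which makes the result easy to check):
#
# ```
# import numpy as np
# import scipy.sparse as sp
# import scipy.sparse.linalg as spla
#
# n = 200
# A = sp.diags([-1., 2., -1.], [-1, 0, 1], shape=(n, n), format='csc')   # 1D Laplacian (illustration)
#
# lu = spla.splu(A)                  # factorize once ...
# x = np.random.randn(n)
# for _ in range(50):
#     x = lu.solve(x)                # ... and reuse the factors at every step
#     x /= np.linalg.norm(x)
#
# lam = x @ (A @ x)                  # Rayleigh quotient: approximation to the smallest eigenvalue
# print(lam, 4 * np.sin(np.pi / (2 * (n + 1)))**2)   # exact smallest eigenvalue of this Laplacian
# ```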
# ## Rayleigh quotient iteration
#
# The Rayleigh quotient iteration (RQI) has the form
#
# $$x_{k+1} = (A - \lambda_k I)^{-1} x_k, \quad \lambda_k = \frac{(A x_k, x_k)}{(x_k, x_k)}.$$
#
# For SPD matrices it has **cubic convergence**, but at each step we have to factorize (or construct a preconditioner for) a different shifted matrix.
#
# If the factorization of $(A - \lambda I)$ is not feasible, the **Jacobi-Davidson** method is typically the method of choice.

# ## Jacobi correction
#
# Jacobi not only presented the way to solve the eigenvalue problem by Jacobi rotations, but also proposed an iterative procedure. Let $u_j$ be the current (normalized) approximation with Rayleigh quotient $\theta_j = u^*_j A u_j$, and $t$ the correction:
#
# $$A(u_j + t) = \lambda (u_j + t),$$
#
# and we look for a correction $t \perp u_j$ (a new orthogonal direction).
#
# Then, the component parallel to $u_j$ has the form
#
# $$u_j u^*_j A (u_j + t) = \lambda u_j,$$
#
# which simplifies to
#
# $$\theta_j + u^*_j A t = \lambda.$$
#
# The orthogonal component is
#
# $$( I - u_j u^*_j) A (u_j + t) = (I - u_j u^*_j) \lambda (u_j + t),$$
#
# which is equivalent to
#
# $$
# (I - u_j u^*_j) (A - \lambda I) t = (I - u_j u^*_j) (- A u_j + \lambda u_j) = - (I - u_j u^*_j) A u_j = - (A - \theta_j I) u_j = -r_j,
# $$
#
# where $r_j$ is the residual.
#
# Since $(I - u_j u^*_j) t = t$, we can rewrite this equation in the form

# ## Jacobi-Davidson correction
#
# $$ (I - u_j u^*_j) (A - \lambda I) (I - u_j u^*_j) t = -r_j.$$
#
# Now we replace $\lambda$ by $\theta_j$ and get
#
# $$
# (I - u_j u^*_j) (A - \theta_j I) (I - u_j u^*_j) t = -r_j.
# $$
#
# Since $r_j \perp u_j$, this equation is consistent if $(A - \theta_j I)$ is non-singular.
#
# If this equation is solved exactly, we get the Rayleigh quotient iteration!
#
# If we want many eigenvectors, we compute a **partial Schur decomposition:**
#
# $$A Q_k = Q_k T_k, $$
#
# and then want to update $Q_k$ by adding one more vector. We then simply use the matrix $(I - Q_k Q^*_k) A (I - Q_k Q^*_k)$ instead of $A$.

# ## Jacobi-Davidson: summary
#
# The correction equation only needs to be solved approximately, and the JD method is often the fastest.

# ## Software
#
# - [ARPACK](http://www.caam.rice.edu/software/ARPACK/) is the most widely used package (it powers the scipy sparse eigensolver)
# - [PRIMME](https://github.com/primme/primme) is the best in my experience (it employs dynamic switching between different methods, including those we have not talked about, like LOBPCG)
# - [PROPACK](http://sun.stanford.edu/~rmunk/PROPACK/) works well for the SVD

# ## Summary for this part
#
# - Iterative methods (Arnoldi, Lanczos, Jacobi-Davidson) for eigenvalue problems
# - There is software for using them
# - There are a lot of technical issues hidden (restarts, spurious eigenvalues, stability)

# ## Other structured matrices
#
# - Up to now, we discussed preconditioning only for **sparse matrices**
# - But iterative methods work well for any matrix that has a fast black-box matrix-by-vector product
# - An important class of such matrices are **Toeplitz matrices** (and **Hankel matrices**)
#
# We will briefly discuss them here, with more to follow on Thursday.

# ## Toeplitz matrices: definition
#
# A matrix is called **Toeplitz** if its elements are defined as
#
# $$a_{ij} = t_{i - j}.$$
#
# A Toeplitz matrix is completely defined by its first column and first row (i.e., by $2n-1$ parameters).
#
# It can also be multiplied by a vector fast (see the sketch below; the derivation via circulant matrices follows).
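# As a preview, here is a minimal sketch of the fast product (the circulant embedding and the FFT used here are derived on the next slides; the helper name `toeplitz_matvec` and the random test data are just for illustration):
#
# ```
# import numpy as np
# from scipy.linalg import toeplitz
#
# def toeplitz_matvec(c, r, x):
#     """O(n log n) product of the Toeplitz matrix with first column c and first row r by x,
#     via embedding into a 2n x 2n circulant matrix and the FFT."""
#     n = len(x)
#     col = np.concatenate([c, [0.], r[:0:-1]])              # first column of the circulant embedding
#     y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
#     return y[:n].real                                      # the top block of the product gives T @ x
#
# n = 1000
# c = np.random.randn(n)                                     # t_0, t_1, ..., t_{n-1}
# r = np.concatenate([[c[0]], np.random.randn(n - 1)])       # t_0, t_{-1}, ..., t_{-(n-1)}
# x = np.random.randn(n)
#
# T = toeplitz(c, r)                                         # dense Toeplitz matrix, only for checking
# print(np.linalg.norm(T @ x - toeplitz_matvec(c, r, x)))    # should be ~1e-12
# ```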
# Toeplitz matrices have many applications in **signal processing**.

# ## Fast Toeplitz matrix-by-vector product
#
# In order to get a fast matrix-by-vector product, consider a special class of **Toeplitz matrices:**
#
# **circulant matrices**.
#
# A matrix is called **circulant** if
#
# $$a_{ij} = c_{\mathrm{mod}(i - j, n)}.$$

# ## Spectral theorem for circulant matrices
#
# **Theorem:**
#
# Any circulant matrix can be represented in the form
#
# $$C = F \Lambda F^*,$$
#
# where $F$ is the **Fourier matrix** with the elements
#
# $$F_{kl} = \frac{1}{\sqrt{n}} w^{kl}, \quad k, l = 0, \ldots, n-1, \quad w = e^{\frac{2 \pi i}{n}},$$
#
# and $\Lambda = \mathrm{diag}(\lambda)$ is the diagonal matrix with
#
# $$\lambda = \sqrt{n}\, F^* c, $$
#
# where $c$ is the first column of the circulant matrix (i.e., $\lambda$ is the discrete Fourier transform of $c$).

# ## FFT
#
# The matrix-by-vector product $F x$ can be computed in $\mathcal{O}(n \log n)$ operations using the Fast Fourier Transform (FFT).

# ## Fast Toeplitz-by-vector product by circulant embedding
#
# A Toeplitz matrix $T$ can be embedded into a $2n \times 2n$ circulant matrix
#
# $$ C = \begin{bmatrix} T & T_2 \\
# T_1 & T \end{bmatrix}, $$
#
# and then
#
# $$
# C \begin{bmatrix} q \\ 0 \end{bmatrix} = \begin{bmatrix} Tq \\ T_1 q \end{bmatrix},
# $$
#
# so the first $n$ components of the circulant product (computed via the FFT) give $Tq$.

# ## Linear system solution with Toeplitz matrices
#
# How to solve linear systems with Toeplitz matrices?
#
# Since we have a fast matrix-by-vector product, we can use iterative methods.
#
# We can also use direct solvers, but what is **the structure of the inverse** of such a matrix?
#
# We will discuss it next time!

# ## Take home message
# - ILU
# - Iterative eigensolvers
# - Toeplitz matrices, FFT, circulant matrices

# ## Next lecture
# - More on Toeplitz matrices (direct and iterative methods, block case).
# - Back to matrices: matrix functions
#
# Questions?

# In[1]:

from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()