Today we will talk about:
Recall that we are trying to avoid O(n4) complexity. The idea is to make a matrix have a simpler structure so that each step of QR algorithm becomes cheaper.
In case of a general matrix we can use the Hessenberg form.
The matrix A is said to be in the Hessenberg form, if
aij=0,if i≥j+2.By applying Householder reflections we can reduce any matrix to the Hessenberg form
U∗AU=HThe only difference with Schur decomposition is that we have to map the first column to the vector with two non-zeros, and the first element is not changed.
The computational cost of such reduction is O(n3) operations.
In a Hessenberg form, computation of one iteration of the QR algorithm costs O(n2) operations (e.g. using Givens rotations, how?), and the Hessenberg form is preserved by the QR iteration (check why).
In the symmetric case, we have A=A∗, then H=H∗ and the upper Hessenberg form becomes tridiagonal matrix.
From now on we will talk about the case of symmetric tridiagonal form.
Any symmetric (Hermitian) matrix can be reduced to the tridiagonal form by Householder reflections.
Key point is that tridiagonal form is preserved by the QR algorithm, and the cost of one step can be reduced to O(n)!
The iterations of the QR algorithm have the following form:
Ak=QkRk,Ak+1=RkQk.If A0=A is tridiagonal symmetric matrix , this form is preserved by the QR algorithm.
Let us see..
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
#Generate a random tridiagonal matrix
n = 20
d = np.random.randn(n)
sub_diag = np.random.randn(n-1)
mat = np.diag(d) + np.diag(sub_diag, -1) + np.diag(sub_diag, 1)
mat1 = np.abs(mat)
mat1 = mat1/np.max(mat1.flatten())
plt.spy(mat)
q, r = np.linalg.qr(mat)
plt.figure()
b = r.dot(q)
b[abs(b) <= 1e-12] = 0
plt.spy(b)
#plt.figure()
#plt.imshow(np.abs(r.dot(q)))
b[0, :]
array([ 2.43863127, -0.51747331, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ])
In the tridiagonal form, you do not have to compute the Q matrix: you only have to compute the triadiagonal part that appears after the iterations
Ak=QkRk,Ak+1=RkQk,in the case when Ak=A∗k and is also tridiagonal.
Such matrix is defined by O(n) parameters; computation of the QR is more complicated, but it is possible to compute Ak+1 directly without computing Qk.
This is called implicit QR-step.
All the implicit QR algorithms are based on the following theorem:
Let Q∗AQ=H
Then, the first column of the matrix Q defines all of its other columns. It can be found from the equation
AQ=QH.The convergence of the QR-algorithm is a very delicate issue (see Tyrtyshnikov, Brief introduction to numerical analysis for details).
Summary. If we have a decomposition of the form
A=XΛX−1,A=[A11A12A21A22]and Λ=[Λ100Λ2],λ(Λ1)={λ1,…,λm}, λ(Λ2)={λm+1,…,λr},
and there is a gap between the eigenvalues of Λ1 and Λ2 (|λ1|≥⋯≥|λm|>|λm+1|≥⋯≥|λr|>0), then the A(k)21 block of Ak in the QR-iteration goes to zero with
‖A(k)21‖≤Cqk,q=|λm+1λm|,where m is the size of Λ1.
So we need to increase the gap! It can be done by the QR algorithm with shifts.
The convergence rate for a shifted version is then
|λm+1−skλm−sk|,where λm is the m-th largest eigenvalue of the matrix in modulus. If the shift is close to the eigenvalue, then the convergence speed is better.
There are different stratagies to choose shifts.
Shifts is a general stratagy to improve convergence of iterative methods of finding eigenvalues. In next slides we will illustrate how to choose shift on a simpler algorithm.
Remember the power method for the computation of the eigenvalues.
xk+1:=Axk,xk+1:=xk+1‖xk+1‖.It converges to the eigenvector corresponding to the largest eigenvalue in modulus.
The convergence can be very slow.
Let us try to use shifting strategy. If we shift the matrix as
A:=A−λkI.and the corresponding eigenvalue becomes small (but we need large). That is not what we wanted.
To make a small eigenvalue large, we need to invert the matrix, and that gives us inverse iteration
xk+1=(A−λI)−1xk,where λ is the shift which is approximation to the eigenvalue that we want. As it was for the power method, the convegence is linear.
To accelerate convergence one can use the Rayleigh quotient iteration (inverse iteration with adaptive shifts) which is given by the selection of the adaptive shift:
xk+1=(A−λkI)−1xk,In the symmetric case A=A∗ the convergence is locally cubic and locally quadratic otherwise.
Now let us talk about singular values and eigenvalues.
SVD: A=UΣV∗
exists for any matrix.
It can be also viewed as a reduction of a given matrix to the diagonal form by means of
two-sided unitary transformations:
Σ=U∗AV.By two-sided Householder transformation we can reduce any matrix to the bidiagonal form B.
Implicit QR-algorithm (with shifts) gives the way of computing the eigenvalues (and Schur form). But we cannot apply QR algorithm directly to the bidiagonal matrix, as it is not diagonalizable in general case.
However, the problem of the computation of the SVD can be reduced to the symmetric eigenvalue problem in two ways:
The case 1 is OK if you do not form T directly!
Thus, the problem of computing singular values can be reduced to the problem of the computation of the eigenvalues of symmetric tridiagonal matrix.
Done:
Next slides:
Suppose we have a tridiagonal matrix, and we split it into two blocks:
T=[T′1BB⊤T′2]We can write the matrix T as
T=[T100T2]+bmvv∗where vv∗ is rank 1 matrix, v=(0,…,0,1,1,0,…,0)T.
Suppose we have decomposed T1 and T2 already:
T1=Q1Λ1Q∗1,T2=Q2Λ2Q∗2Then (check),
[Q∗100Q∗2]T[Q∗100Q∗2]=D+ρuu∗,D=[Λ100Λ2]I.e. we have reduced the problem to the problem of the computation of the eigenvalues of
diagonal plus low-rank matrix
It is tricky to compute the eigenvalues of the matrix
D+ρuu∗The characteristic polynomial has the form
det(D+ρuu∗−λI)=det(D−λI)det(I+ρ(D−λI)−1uu∗)=0.Then (prove!!)
det(I+ρ(D−λI)−1uu∗)=1+ρn∑i=1|ui|2di−λ=0Hint: find det(I+wu∗) using the fact that det(C)=∏ni=1λi(C) and trace(C)=∑ni=1λi.
import numpy as np
lm = [1, 2, 3, 4]
M = len(lm)
D = np.array(lm)
a = np.min(lm)
b = np.max(lm)
t = np.linspace(-1, 6, 1000)
u = 0.5 * np.ones(M)
rho = 1
def fun(lam):
return 1 + rho * np.sum(u**2/(D - lam))
res = [fun(lam) for lam in t]
plt.plot(res, 'k')
plt.ylim([-6, 6])
plt.tight_layout()
plt.title('Plot of the function')
Text(0.5,1,'Plot of the function')
The function has only one root at [di,di+1]
We have proved, by the way, the Cauchy interlacing theorem (what happens to the eigenvalues under rank-1 perturbation)
A Newton method will fail (draw a picture with a tangent line).
Note that Newton method is just approximation of a function f(λ) by a linear function.
Much better approximation is the hyperbola:
f(λ)≈c0+c1di−λ+c2di+1−λ.To fit the coefficients, we have to evaluate f(λ) and f′(λ) in the particular point.
After that, the approximation can be recovered from solving quadratic equation
First, stability: This method was abandoned for a long time due to instability of the computation of the eigenvectors.
In the recursion, we need to compute the eigenvectors of the D+ρuu∗ matrix.
The exact expression for the eigenvectors is just (let us check!)
(D−αiI)−1u,where αi is the computed root.
The solution came is to use a strange Lowner theorem:
If αi and di satisfy the interlacing theorem
dn<αn<…<di+1<αi+1…Then there exists a vector ˆu such that αi are exact eigenvalues of the matrix
ˆD=D+ˆuˆu∗.So, you first compute the eigenvalues, then compute ˆu and only then the eigenvectors.
In the computations of divide and conquer we have to evaluate the sums of the form
f(λ)=1+ρn∑i=1|ui|2(di−λ),and have to do it at least for n points.
The complexity is then O(n2), as for the QR algorithm.
Can we make it O(nlogn)?
The answer is yes, but we have to replace the computations by the approximate ones
by the help of Fast Multipole Method.
I will do a short explanation on the whiteboard.
Absolutely different approach is based on the bisection.
Given a matrix A its inertia is defined as a triple (ν,ζ,π),
where ν is the number of negative, ζ - zero and π - positive eigenvalues.
If X is non-singular, then
Inertia(A)=Inertia(X∗AX)Given z we can do the Gaussian elimination:
A−zI=LDL∗,and inertia of the diagonal matrix is trivial to compute.
Thus, if we want to find all the eigenvalues in the interval a, b
Using inertia, we can easily count the number of eigenvalues in an interval.
Illustration: if Inertia(A)=(5,0,2) and after shift Inertia(A−zI)=(4,0,3), z∈[a,b] then we know that λ(A)∈[a,z].
Recall what a Jacobi (Givens rotations) are:
In a plane they correspong to a 2×2 orthogonal matrix of the form
(cosϕsinϕ−sinϕcosϕ),and in the n-dimensional case we select two variables i and j and rotate.
The idea of the Jacobi method is to minimize sum of squares of off-diagonal elements:
Γ(A)=off(U∗AU),off2(X)=∑i≠j|Xij|2=‖X‖2F−n∑i=1x2ii.by applying succesive Jacobi rotations U to zero off-diagonal elements.
When the "pivot" is chosen, it is easy to eliminate it.
The main question is then what is the order of sweeps we have to make (i.e. in which order to eliminate).
If we always eliminate the largest off-diagonal elements the method has quadratic convergence.
In practice, a cyclic order (i.e., (1,2),(1,3),…,(2,3),…) is used.
To show convergence, we firstly show that off(B)<off(A),
In this case we use the unitary invariance of Frobenius norm and denote by p and q the indices that is changed after rotation:
Γ2(A)=off2(B)=‖B‖2F−n∑i=1b2ii=‖A‖2F−∑i≠p,qb2ii−(b2pp+b2qq)=‖A‖2F−∑i≠p,qa2ii−(a2pp+2a2pq+a2qq)=‖A‖2F−n∑i=1a2ii−2a2pq=off2(A)−2a2pq<off2(A)
We show that the ''size'' of off-diagonal elements decreases after Jacobi rotation.
If we always select the largest off-diagonal element apq=γ to eliminate (pivot), then we have
|aij|≤γ,thus
off(A)2≤2Nγ2,where 2N=n(n−1) is the number of off-diagonal elements. Or rewrite this inequality in the form
2γ2≥off2(A)N.Now we use relations Γ2(A)=off2(A)−2γ2≤off2(A)−off2(A)N and get Γ(A)≤√(1−1N)off(A).
Aften K steps we have the factor
(1−1N)K2≈e−12,i.e. linear convergence. However, the convergence is locally quadratic (given without proof here).
Jacobi method was the first numerical method for the eigenvalues, proposed in 1846.
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()