We've discussed how to compute a factorization
$$A = QR,$$where $Q$ is orthogonal and $R$ is upper-triangular. For our presentation of the QR algorithm, we assume that $A= A^T$, i.e. $A$ is symmetric. The QR algorithm applies more generally, but we focus only on this case.
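As a quick sanity check of the two defining properties (here using numpy's built-in `qr` rather than a hand-written routine; the small symmetric matrix is an illustrative choice, not from the text):

```python
import numpy as np

# A small symmetric matrix (an illustrative choice).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# numpy's built-in QR: Q has orthonormal columns, R is upper-triangular.
Q, R = np.linalg.qr(A)

assert np.allclose(Q @ R, A)             # A = QR
assert np.allclose(Q.T @ Q, np.eye(2))   # Q is orthogonal
assert np.allclose(R, np.triu(R))        # R is upper-triangular
```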
Because $A$ is symmetric, we know that, by the Spectral Theorem, $A = U D U^T$ where $D$ is a diagonal matrix (containing the eigenvalues) and $U$ is an orthogonal matrix (containing the eigenvectors). We continue with the philosophy of taking powers of the matrix, to assist us in finding the eigenvalues:
$$A^k = U D^k U^T.$$We will consider the case where $A$ is $2 \times 2$ first. We assume $|\lambda_1| > |\lambda_2|$. If we look at the first column of $A^k$, we are computing the vector
$$r_1^{(k)}=A^k e_1, \quad e_1 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T.$$This is very similar to the Power method with starting vector $x^{(0)} = e_1$. Indeed, let $u_1$ and $u_2$ be the two columns of $U$, so that
$$ r_1^{(k)}=A^k e_1 = U D^k U^T e_1 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda^k_1 & 0 \\ 0 & \lambda^k_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_1, $$$$ = \lambda_1^k (u_1^T e_1) u_1 + \lambda_2^k (u_2^Te_1) u_2 = \lambda_1^k (u_1^T e_1) u_1 \left(1 + O \left( \left| \frac{\lambda_2}{\lambda_1}\right|^k \right) \right), \quad k \to \infty.$$So, it follows that $A^ke_1$ aligns with the dominant eigenvector $u_1$ as $k \to \infty$. A nice way to mathematically capture this fact is:
$$\lim_{k\to \infty} \frac{ |u_1^Tr_1^{(k)}|}{|u_2^Tr_1^{(k)}|} = \infty,$$i.e., the projection onto the first eigenvector is asymptotically larger than the projection onto the second. Typically, the same is true of $A^k e_2$, the second column of $A^k$:
$$ A^k e_2 = U D^k U^T e_2 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda^k_1 & 0 \\ 0 & \lambda^k_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_2, $$$$ = \lambda_1^k (u_1^T e_2) u_1 + \lambda_2^k (u_2^Te_2) u_2.$$For notational convenience, define $\beta_{ij} = u_i^T e_j$, all of which we assume are non-zero. To try to isolate the second eigenvector, we can try Gram-Schmidt:
$$r_2^{(k)} = A^ke_2 - \left( \frac{(A^ke_1)^T A^k e_2}{(A^k e_1)^T A^k e_1} \right) A^k e_1.$$This creates a vector $r_2^{(k)}$ that is orthogonal to the first column of $A^k$. So, we can hope that $r_2^{(k)}$ will begin to look like the second eigenvector of $A$:
$$r_2^{(k)} = \lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2 - \frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)}(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)$$We compute the products that occur in the fraction using the rule: If $u$ and $v$ are orthonormal, then $(\alpha u + \beta v)^T(\gamma u + \delta v) = \alpha \gamma + \beta \delta$ for scalars $\alpha,\beta,\gamma,\delta$. We find
$$ (\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2) = \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21},$$$$ (\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2) = \lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k}\beta_{21}^2.$$The ratio is then
$$\frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)} = \frac{ \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21}} {\lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k} \beta_{21}^2} = \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}, ~~ \delta = \lambda_2/\lambda_1.$$To see if this vector $r_2^{(k)}$ points in the direction of $u_1$ or $u_2$ we take the inner products, using orthogonality:
$$ {u_1^Tr_2^{(k)}} = {\lambda_1^k} \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} {\lambda_1^k} \beta_{11} = \lambda_2^k \delta^{-k} \left( \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}\,\beta_{11}\right).$$It can be shown that the quantity in parentheses is $O(|\delta|^{2k})$ as $k \to \infty$. In other words:
$$ {u_1^Tr_2^{(k)}} = \lambda_2^k O(|\delta|^k).$$We now compare this to
$${u_2^Tr_2^{(k)}} = \lambda_2^k \left( \beta_{22} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}\, \beta_{21} \right).$$This time the quantity in parentheses has no reason to vanish as $k \to \infty$. We see that (typically)
$$\lim_{k \to \infty} \frac{|u_2^Tr_2^{(k)}|}{|u_1^Tr_2^{(k)}|} = \infty.$$This is a long-winded way of saying that if you apply Gram-Schmidt to the columns of $A^k$, for a symmetric matrix $A$, then the orthogonal matrix that results will converge to the matrix $U$ of eigenvectors of $A$. This means that:
Theorem 1. Suppose $A$ is a symmetric, non-singular matrix with eigenvalues that have distinct magnitudes
$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|.$$If $A^k = QR = Q^{(k)} R^{(k)}$ is the QR factorization of $A^k$, then as $k \to \infty$,
$$(Q^{(k)})^T A Q^{(k)} = (Q^{(k)})^T U D U^TQ^{(k)} \to D.$$Computational issue: We should NOT compute $A^k$ for $k$ large! This will typically result in overflow and underflow. We need another way to compute $Q^{(k)}$. Consider the sequence:
$A = Q_1 R_1, \quad \text{ the QR factorization of $A$},$
$A_1 = R_1 Q_1, \quad \text{reverse the order of multiplication},$
$A_1 = Q_2 R_2, \quad \text{ the QR factorization of $A_1$},$
$A_2 = R_2 Q_2, \quad \text{reverse the order of multiplication},$
$A_2 = Q_3 R_3, \quad \text{ the QR factorization of $A_2$},$
$A_3 = R_3 Q_3, \quad \text{reverse the order of multiplication},$
$\vdots$
For any $k \geq 1$, $A_k$ has the same eigenvalues as $A$, $Q_1 Q_{2} \cdots Q_k = Q^{(k)} \to U$ and $A_k \to D$ as $k \to \infty$.
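A minimal numerical sketch of this claim, using numpy's `qr` in place of a hand-written factorization; the test matrix is constructed here for illustration, with eigenvalues $4 > 2 > 1$ so that the magnitudes are distinct:

```python
import numpy as np

# Build a symmetric test matrix with known eigenvalues 4 > 2 > 1
# (the construction is ours, chosen so the magnitudes are distinct).
U, _ = np.linalg.qr(np.array([[1.0, 2.0, 0.0],
                              [0.0, 1.0, 3.0],
                              [1.0, 0.0, 1.0]]))
A = U @ np.diag([4.0, 2.0, 1.0]) @ U.T

Ak = A.copy()
for _ in range(100):
    Q, R = np.linalg.qr(Ak)   # A_{k-1} = Q_k R_k
    Ak = R @ Q                # A_k = R_k Q_k, similar to A_{k-1}

# The diagonal of A_k approaches the eigenvalues; the rest tends to zero.
```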
We first show that the eigenvalues never change. $A = Q_1 R_1 = Q_1 R_1 Q_1 Q_1^T = Q_1 A_1 Q_1^T$. So, $A$ is similar to $A_1$. In general, it follows that $A_{k-1} = Q_k A_k Q_{k}^T$ and therefore $A_k$ is similar to $A$ for every $k$, by induction:
$$A_k = Q^T_k A_{k-1} Q_{k} = (Q_1 Q_2 \cdots Q_k)^T A Q_1 Q_{2} \cdots Q_k. $$Next, we show that $Q_1 Q_{2} \cdots Q_k = Q^{(k)}$ for every $k$. For $k = 1$, $A = Q_1R_1$ and $A^1= Q^{(1)} R^{(1)}$, and by the uniqueness of the QR factorization $Q_1 = Q^{(1)}$. Now, assume $Q_{1}Q_{2} \cdots Q_{k-1} = Q^{(k-1)}$, and we show the statement holds for $k$ (induction). So, we have
$$A^{k-1} = Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)} = Q^{(k-1)}R^{(k-1)},$$$$ A^k = A Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)},$$$$ A Q_{1} Q_{2} \cdots Q_{k-1} = Q_1 Q_2 \cdots Q_{k-1} A_{k-1}, $$$$ A^k = Q_1 Q_2 \cdots Q_{k-1} A_{k-1} R^{(k-1)},$$$$A_{k-1} = Q_k R_k,$$$$ A^k = Q_1 Q_2 \cdots Q_{k-1} Q_k R_k R^{(k-1)}.$$By the uniqueness of the QR factorization ($R_k R^{(k-1)}$ is upper-triangular with positive diagonal entries), we find $Q_1 Q_2 \cdots Q_{k-1} Q_k = Q^{(k)}$. The convergence then follows from Theorem 1.
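The identity $Q_1 Q_2 \cdots Q_k = Q^{(k)}$ can be checked numerically for small $k$. One caveat: numpy's `qr` does not enforce a positive diagonal in $R$, so the two orthogonal factors can differ by a sign flip in each column; the matrix $B$ below is an arbitrary symmetric example:

```python
import numpy as np

B = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # an arbitrary symmetric example
k = 4

# Accumulate Q_1 Q_2 ... Q_k from the iteration.
Ak, Qprod = B.copy(), np.eye(2)
for _ in range(k):
    Q, R = np.linalg.qr(Ak)
    Qprod = Qprod @ Q
    Ak = R @ Q

# QR factorization of B^k.
Qk, _ = np.linalg.qr(np.linalg.matrix_power(B, k))

# Compare up to a sign in each column (numpy's sign convention for R).
signs = np.sign(np.sum(Qprod * Qk, axis=0))
assert np.allclose(Qprod, Qk * signs, atol=1e-10)
```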
We need a convergence criterion for the QR algorithm. From the Gershgorin Circle Theorem, it follows that if the sum of the absolute values of the off-diagonal entries of $A_k$ in each row is less than $\epsilon > 0$, then every diagonal entry of $A_k$ is within $\epsilon$ of an eigenvalue of $A$.
for $k = 1,2,\ldots,N_{\max}$
Set $[Q,R] = \text{QR}(A)$ % Do this with MGS
Set $A = RQ$
If $\|A - \mathrm{diag}(A)\|_\infty < \mathrm{TOL}$, output $\mathrm{diag}(A)$ and stop
endfor
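The pseudocode above can be sketched in Python as follows; numpy's `qr` stands in for MGS, and the function name `qr_alg` and the test matrix are our own choices, not from the text:

```python
import numpy as np

def qr_alg(A, max_iter, tol):
    """Sketch of the QR algorithm: iterate A <- RQ until the
    off-diagonal part is below tol in the infinity norm."""
    A = np.array(A, dtype=float)
    for k in range(1, max_iter + 1):
        Q, R = np.linalg.qr(A)    # A_{k-1} = Q_k R_k
        A = R @ Q                 # A_k = R_k Q_k
        off = A - np.diag(np.diag(A))
        if np.linalg.norm(off, np.inf) < tol:
            break
    return np.diag(A), k

# A symmetric positive-definite test matrix (our own choice).
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])
A = B @ B.T
d, iters = qr_alg(A, 400, 1e-10)
# d approximates the eigenvalues of A; by Gershgorin, once the loop
# exits early each entry is within tol of a true eigenvalue.
```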
A = randn(5);
A = A + A'; % Convergence is slow on some matrices
D = QRalg(A,400,10e-10)'
flipud(eigs(A))'
QR completed in 197 iterations
D =
    5.1809   -4.6179    3.3188    1.8265   -0.4698
ans =
    5.1809   -4.6179    3.3188    1.8265   -0.4698
A = randn(5);
A = A*A'; % Convergence is much faster on others
D = QRalg(A,400,10e-10)'
flipud(eigs(A))'
QR completed in 39 iterations
D =
   29.9783   16.1580    8.2269    1.2845    0.1065
ans =
   29.9783   16.1580    8.2269    1.2845    0.1065
Some final remarks on eigenvalue algorithms:
This is not the QR algorithm that is used in practice. Typically, a symmetric matrix is first reduced to tridiagonal form using Householder transformations. Then a QR algorithm that is mathematically equivalent to the one we discussed, but exploits the tridiagonal structure, is applied. This gives a much more efficient algorithm.
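The Householder reduction mentioned here can be sketched as follows; this is a textbook-style version (full matrix products rather than the optimized in-place updates used in practice), and the helper name `tridiagonalize` is ours:

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form by Householder
    reflections; each step is a similarity transform, so the
    eigenvalues are unchanged."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for j in range(n - 2):
        x = T[j + 1:, j].copy()
        nx = np.linalg.norm(x)
        if nx == 0.0:
            continue                      # column already reduced
        v = x
        v[0] += np.copysign(nx, x[0]) if x[0] != 0 else nx
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[j + 1:, j + 1:] -= 2.0 * np.outer(v, v)
        T = H @ T @ H                     # similarity transform
    return T

A = np.array([[4.0, 1.0, 2.0, 0.0],
              [1.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 1.0]])
T = tridiagonalize(A)
# T is tridiagonal with the same eigenvalues as A.
```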
To see how the Power method can fail, consider the matrix $A = \mathrm{diag}(-2,2)$, whose eigenvalues $\pm 2$ have equal magnitude.
We want to compute the eigenvalues with the symmetric Power method. Choose $x = x^{(0)} = [x_1, x_2]^T$. It follows that
$$\lambda^{(k)} = \frac{x^T A^{2k-1} x}{x^T A^{2k-2} x}.$$
And then
$$x^T A^{2k-1} x = (-2)^{2k-1} x_1^2 + 2^{2k-1} x_2^2 = 2^{2k-1}(x_2^2 - x_1^2),$$
$$x^T A^{2k-2} x = (-2)^{2k-2} x_1^2 + 2^{2k-2} x_2^2 = 2^{2k-2}(x_2^2 + x_1^2).$$
So, we find
$$\lambda^{(k)} = 2\frac{x_2^2 - x_1^2}{x_2^2 + x_1^2} \neq \pm 2,$$
for most choices of $x_1$ and $x_2$. This is the worst case for a numerical algorithm: it converges to the wrong answer! This can be fixed. If one considers $A + I = \mathrm{diag}(-1,3)$, the Power method will converge, $\lambda^{(k)} \to 3$ as $k \to \infty$. And because we added the identity matrix, we subtract one: $\lambda^{(k)} - 1 \to 2$, a true eigenvalue of $A$.
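The failure and the shift fix can be checked directly; the starting vector $[1, 2]^T$ is an arbitrary choice with $x_1, x_2 \neq 0$:

```python
import numpy as np

# The failing example: eigenvalues -2 and 2 have equal magnitude.
A = np.diag([-2.0, 2.0])
x = np.array([1.0, 2.0])   # arbitrary start with x1, x2 nonzero

def estimate(A, x, k):
    """Symmetric power method eigenvalue estimate lambda^(k)."""
    num = x @ np.linalg.matrix_power(A, 2 * k - 1) @ x
    den = x @ np.linalg.matrix_power(A, 2 * k - 2) @ x
    return num / den

# Converges to 2*(x2^2 - x1^2)/(x2^2 + x1^2) = 6/5, not an eigenvalue.
lam = estimate(A, x, 20)

# The shift fix: iterate with A + I = diag(-1, 3), then subtract 1.
lam_shift = estimate(A + np.eye(2), x, 20) - 1.0
```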