Given an $m \times m$ nonsingular matrix $A$, if no row swaps are needed, we can compute the LU factorization $A = LU$ by Gaussian elimination.
Now, suppose that we "augment" the matrix $A$ by an $m \times m$ identity matrix $I$, forming the matrix $\begin{pmatrix} A & I \end{pmatrix}$. If we do Gaussian elimination on this matrix (again without row swaps), we will get something like:
$$ \begin{pmatrix} A & I \end{pmatrix} \to \begin{pmatrix} U & C \end{pmatrix} $$where the first $m$ columns are the same $U$ matrix as before, and the last $m$ columns (from the elimination steps acting on $I$) are some matrix $C$.
Give an explicit formula for $C$ in terms of $L$ and/or $U$. (Hint: remember the derivation of the Gauss–Jordan method to compute $A^{-1}$. This is not quite the same, but the ideas are similar.)
Check your answer by trying it for a random $4\times 4$ matrix in Julia using the code below: enter your formula in the ???? box, execute the two code cells, and inspect the results.
A = rand(4,4) # a random 4×4 matrix
L, U = lu(A, Val{false}) # LU factors of A, without row swaps
# show the L and U factors (with bold labels):
display("text/html", "<b>L = </b>")
display(L)
display("text/html", "<b>U = </b>")
display(U)
M = [A I] # augmented 4×8 matrix
Lₘ, Uₘ = lu(M, Val{false}) # LU factors of M, without row swaps
# show the first four columns of Uₘ, which should match U:
display("text/html", "<b>Should also equal U: </b>")
display(Uₘ[:,1:4])
# the last four columns of C are our C matrix:
display("text/html", "<b>final output is C: </b>")
C = Uₘ[:,5:8]
4×4 Array{Float64,2}: 1.0 0.0 0.0 0.0 0.735015 1.0 0.0 0.0 1.70646 3.89619 1.0 0.0 0.899307 -1.24215 -0.218737 1.0
4×4 Array{Float64,2}: 0.452841 0.436244 0.390122 0.355589 0.0 -0.18783 0.416922 0.555298 0.0 0.0 -2.09388 -1.82241 0.0 0.0 0.0 0.836759
4×4 Array{Float64,2}: 0.452841 0.436244 0.390122 0.355589 0.0 -0.18783 0.416922 0.555298 0.0 0.0 -2.09388 -1.82241 0.0 0.0 0.0 0.836759
4×4 Array{Float64,2}: 1.0 0.0 0.0 0.0 -0.735015 1.0 0.0 0.0 1.1573 -3.89619 1.0 0.0 -1.55916 0.389909 0.218737 1.0
From class, elimination corresponds to multiplying the matrix on the left by some elimination matrices
Remember that each of Gaussian elimination, i.e. some row operation, is given by multiplication on the left by some elimination matrix. Therefore, if we do some sequence of elimination steps to transform $\begin{pmatrix} A & I\end{pmatrix}$ into $\begin{pmatrix} U & C\end{pmatrix}$, with the first step corresponding to left-multiplication by $E_1$, the second corresponding to left-multiplication by $E_2$, etc., for a total of $k$ steps, then the entire Gaussian elimination process is given by multiplication on the left by the product matrix $E = E_k \cdots E_1$. That is to say, we have $$\begin{pmatrix} U & C\end{pmatrix} = X\begin{pmatrix} A & I\end{pmatrix} = \begin{pmatrix} EA & EI\end{pmatrix}.$$ From this we can read off two equations, by equating the left and right blocks on the far left and right sides:
But, from class, $E$ (if there are no row swaps) is a lower-triangular matrix that is simply $L^{-1}$, since $EA=U \implies A = E^{-1} U = LU$, and in fact we defined $L$ precisely as the inverse of the elimination steps.
So $C = E = L^{-1}$.
Let's check this numerically
inv(L)
4×4 Array{Float64,2}: 1.0 0.0 0.0 0.0 -0.735015 1.0 0.0 0.0 1.1573 -3.89619 1.0 0.0 -1.55916 0.389909 0.218737 1.0
This is exactly the same as the $C$ matrix printed above, so our deduction was correct!
(From Strang, section 2.4, problem 30.)
With $i^2 = -1$, the product of $(A+iB)$ and $(x+iy)$ (for real matrices A and B and real vectors x and y) is $Ax + iBx + iAy - By$. Instead, write the same operation in terms of a real matrix-vector product of twice the size, acting on vectors of the real parts on top of the imaginary parts:
$$ \begin{pmatrix} A & -B \\ ? & ? \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} Ax - By \\ ? \end{pmatrix} = \begin{pmatrix} \mbox{real part} \\ \mbox{imaginary part} \end{pmatrix} $$Fill in the question marks.
Check your answer in Julia on random 3×3 matrices using the following code (note: in Julia, $i$ is represented by im
):
A = rand(3,3)
B = rand(3,3)
x = rand(3)
y = rand(3)
(A + im*B) * (x + im*y)
3-element Array{Complex{Float64},1}: -0.661795+1.78111im -0.606395+2.23004im 0.991824+1.31834im
From the problem statement we have $(A + iB)(x + iy) = Ax + iBx + iAy - By = (Ax - By) + i(Bx + Ay),$ where in this last expression we've grouped the real and imaginary parts separately. So the far right question mark needs to be $Bx + Ay$. By the definition of matrix multiplication (which works for blocks!), this forces the two question marks on the left as well: $$ \begin{pmatrix} A & -B \\ B & A \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} Ax - By \\ Bx + Ay \end{pmatrix}$$
Writing this out in Julia, we have
# fill in the ?'s. The result should be the real and imaginary parts of the vector
# printed from the previous output, stacked on top of one another:
[A -B
B A] * [x
y]
6-element Array{Float64,1}: -0.661795 -0.606395 0.991824 1.78111 2.23004 1.31834
which matches the real and imaginary parts of the output printed above.
(From Strang, section 2.5, problem 5.)
Find an upper-triangular $U$ (not diagonal) with $U^2 = I$ and $U = U^{-1}$.
Here is a "brute force" way of solving this problem, by just setting up a set of equations for the entries of $U$ and solving them:
Let's look for a $2 \times 2$ matrix $U$ with the specified properties. Give the entries of $U$ names: $$U = \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{21}\end{pmatrix}.$$ Take a look at the square of $U$: $$U^2 = \begin{pmatrix} u_{11}^2 & u_{11}u_{12} + u_{12}u_{22} \\ 0 & u_{22}^2\end{pmatrix} = \begin{pmatrix} u_{11}^2 & (u_{11} + u_{22})u_{12} \\ 0 & u_{22}^2\end{pmatrix}.$$ We need that this matrix above is $I$ (of course, $U^2 = I$ and $U = U^{-1}$ are equivalent, so we can just focus on the first one). We need $u_{11}^2 = u_{22}^2 = 1$, which means that each of $u_{11}$ and $u_{22}$ is either 1 or -1. We also need that $(u_{11} + u_{22})u_{12} = 0$, i.e. that either $u_{11} + u_{22} = 0$ or $u_{12} = 0$ (or both). But $U$ wasn't allowed to be diagonal, so we can't have $u_{12} = 0$, so we must have instead that $u_{11} + u_{22} = 0$, i.e that $u_{22} = -u_{11}$.
So, we see that, as long as $u_{22} \neq 0$, the matrix $$\boxed{\begin{pmatrix} 1 & u_{22} \\ 0 & -1 \end{pmatrix}}$$ and its negative are upper-triangular, equal to their own inverse, and not diagonal, and that these are the only such $2 \times 2$ matrices.
To get the solution more elegantly, let's think about what upper triangular matrices do. $Ux$ only sends information "upwards" in $x$, i.e. it adds multiples of later rows to earlier rows (exactly the opposite of what lower triangular matrices do in elimination). What we need is a matrix that sends an entry "up" in the first step, but if we do the step twice then it cancels the previous step.
Let's start with a really simple upper-triangular matrix, as "close to $I$" as possible, making it $4 \times 4$ to be interesting. $$ \begin{pmatrix} 1 & & & 1 \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} $$ which is just the identity matrix plus a single 1 above the diagonal. If we multiply this by a vector $x$, it adds the first and last rows, and keeps the other rows the same: $$ \begin{pmatrix} 1 & & & 1 \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 + x_4 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} $$ If we multiply by the matrix over and over, it will keep adding "more" of $x_4$ to the first row, and we'll never get back to the original vector. Instead, if we multiply by the matrix a second time, what we want do is subtract instead so that we cancel out the previous step. How can we do that? We just need to flip the sign of $x_4$. Let's instead use $$ \boxed{U = \begin{pmatrix} 1 & & & 1 \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{pmatrix}} $$ which gives $$ U \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1+x_4 \\ x_2 \\ x_3 \\ -x_4 \end{pmatrix} \implies U^2 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1+x_4-x_4 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} $$ and hence $U^2 = I$.
Of course, there are many other possible upper-triangular matrices that follow this general idea.
(From Strang, section 2.5, problem 12.)
If the product $C = AB$ is invertible (for square A and B), then $A$ itself is invertible. Show this by finding a formula for $A^{-1}$ in terms of $C^{-1}$ and $B$.
Check your answer for random 3×3 matrices in Julia using the code below.
A = rand(3,3)
B = rand(3,3)
C = A * B
inv(A) # output A⁻¹
3×3 Array{Float64,2}: -0.73291 -4.53468 2.48916 0.323729 4.80398 -1.17527 0.845977 -3.77757 0.692128
Let's suppose for a moment for wishful thinking that $A$ was really invertible. What would the inverse have to be? The information we have is that $C$ is invertible, so let's try to use that. That $C$ is invertible means there is some matrix $C$ such that $CC^{-1} = C^{-1}C = I$. We could multiply our equation $C = AB$ on one side or the other by $C^{-1}$ to try to get closer to finding an inverse for $A$. If we multiply on the left, we get $$I = C^{-1}C = C^{-1}AB,$$ which isn't terribly helpful. If we multiply by $C^{-1}$ on the right instead, we get the equation $$I = CC^{-1} = ABC^{-1},$$ which is much more helpful! Putting parentheses above, we have that $I = A(BC^{-1})$, which tells us exactly that $A$ is in fact invertible and that $A^{-1} = BC^{-1}$.
Let's check this in Julia:
# our expression in terms of inv(C) and B that gives the same result as inv(A):
B*inv(C)
3×3 Array{Float64,2}: -0.73291 -4.53468 2.48916 0.323729 4.80398 -1.17527 0.845977 -3.77757 0.692128
This agrees with inv(A)
above, hooray!
Remark: In our derivation, we did not need to assume that B is invertible too! However, we can of course prove that too by the same process, and obtain $B^{-1} = C^{-1} A$ — but just because it is true doesn't mean that are allowed to assume it in order to prove that A is invertible (that would be almost circular reasoning). Let's check that formula for $B^{-1}$ too, for fun:
inv(B)
3×3 Array{Float64,2}: 5.05245 -9.92964 2.2534 -8.06185 -0.211051 9.01923 0.830514 10.6695 -7.89531
inv(C) * A
3×3 Array{Float64,2}: 5.05245 -9.92964 2.2534 -8.06185 -0.211051 9.01923 0.830514 10.6695 -7.89531
(From Strang, section 2.5, problem 31.)
This matrix $A$ has a remarkably simple inverse. Find $A^{-1}$ by Gauss–Jordan elimination on $\begin{pmatrix} A & I \end{pmatrix}$.
$$ A = \begin{pmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$(You can check your answer by calling inv
in Julia, but you should still explicitly show how you get the inverse by hand-calculation using the Gauss–Jordan method.)
Notice that adding any of the bottom three rows to the row above it zeros out all of the entries in that row above the diagonal. So, if we add the second row to the first (so the bottom two haven't changed), the third to the second, and then the fourth to the third, we'll have done the Gauss-Jordan elimination.
We dutifully begin. We start with the augmented matrix:
$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & -1 & 1 & -1 & 1 & 0 & 0 & 0\\ 0 & 1 & -1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$We then add the second row to the first, obtaining:
$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & -1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$We then add the third row to the second, obtaining:
$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$We then finish by adding the fourth row to the third, obtaining:
$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$So, the block on the right of the augmented matrix is the inverse of $A$:
$$A^{-1} = \begin{pmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{pmatrix}$$Suppose that you are given the $PA = LU$ factorization of an invertible $m \times m$ matrix $A$, and we want to solve the block-matrix equation:
$$ \begin{pmatrix} A & B \\ 0 & A \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix} $$where $B$ is an $m \times m$ matrix, "0" denotes an $m \times m$ block of zeros, and x,y,b,c are m-component vectors.
Write the solution vectors x,y in terms of P,L,U,B,b,c (or the inverses of these matrices).
Explain how your answer can be computed in $\sim m^2$ operations (i.e. roughly proportional to $m^2$) if you do things in the right order and the right way.
The system we're given is block upper-triangular, so just like with usual upper-triangular systems let's solve it by backsubstitution.
The bottom block-row gives the equation $Ay = c$. As $PA = LU$, we have $A = P^{-1}LU$, so the equation reads $P^{-1}LUy = c$, and as $P, L, U$ are invertible we can multiply on the left by $P$, and then by $L^{-1}$, and then by $U^{-1}$ to obtain $$\boxed{y = U^{-1}L^{-1}Pc}.$$ (Equivalently, $Ay = c \implies PAy = Pc = LUy \implies y = U^{-1} L^{-1} c$.)
Now we back-substitute. The first equation reads $Ax + By = b$ so we have $x = A^{-1}(b - By).$ As $y = U^{-1}L^{-1}Pc$ and $A^{-1} = U^{-1}L^{-1}P$, plugging in we get: $$\boxed{x = U^{-1}L^{-1}P(b - By)}.$$
Now let's see how, if we're handed the matrices $P, L, U$ and the vectors $b, c$, we can compute $x$ and $y$ by the formulas above in $\sim m^2$ operations. Here are a few things to notice:
So, as long as we (a) interpret left multiplication of $L^{-1}$ or $U^{-1}$ on a previously-known $m \times 1$ vector to mean solving an equation using back/forward substitution (taking $\sim m^2$ operations) and (b) only apply matrices to vectors and never to other matrices, i.e. write $x$ and $y$ as: $$\boxed{y = U^{-1}(L^{-1}(Pc)) \\ x = U^{-1}(L^{-1}(P(b-By))},$$ we can solve for $x, y$ in $\sim m^2$ operations.