Notebook

18.06 pset 2¶

Problem 1 (10 points)¶

Given an $m \times m$ nonsingular matrix $A$, if no row swaps are needed, we can compute the LU factorization $A = LU$ by Gaussian elimination.

Now, suppose that we "augment" the matrix $A$ by an $m \times m$ identity matrix $I$, forming the matrix $\begin{pmatrix} A & I \end{pmatrix}$. If we do Gaussian elimination on this matrix (again without row swaps), we will get something like:

$$ \begin{pmatrix} A & I \end{pmatrix} \to \begin{pmatrix} U & C \end{pmatrix} $$

where the first $m$ columns are the same $U$ matrix as before, and the last $m$ columns (from the elimination steps acting on $I$) are some matrix $C$.

Give an explicit formula for $C$ in terms of $L$ and/or $U$. (Hint: remember the derivation of the Gauss–Jordan method to compute $A^{-1}$. This is not quite the same, but the ideas are similar.)
Check your answer by trying it for a random $4\times 4$ matrix in Julia using the code below: enter your formula in the ???? box, execute the two code cells, and inspect the results.

In [1]:

A = rand(4,4) # a random 4×4 matrix
L, U = lu(A, Val{false}) # LU factors of A, without row swaps

# show the L and U factors (with bold labels):
display("text/html", "<b>L = </b>")
display(L)
display("text/html", "<b>U = </b>")
display(U)

M = [A I] # augmented 4×8 matrix
Lₘ, Uₘ = lu(M, Val{false}) # LU factors of M, without row swaps

# show the first four columns of Uₘ, which should match U:
display("text/html", "<b>Should also equal U: </b>")
display(Uₘ[:,1:4])

# the last four columns of C are our C matrix:
display("text/html", "<b>final output is C: </b>")
C = Uₘ[:,5:8]

L =

4×4 Array{Float64,2}:
 1.0        0.0       0.0       0.0
 0.735015   1.0       0.0       0.0
 1.70646    3.89619   1.0       0.0
 0.899307  -1.24215  -0.218737  1.0

U =

4×4 Array{Float64,2}:
 0.452841   0.436244   0.390122   0.355589
 0.0       -0.18783    0.416922   0.555298
 0.0        0.0       -2.09388   -1.82241 
 0.0        0.0        0.0        0.836759

Should also equal U:

4×4 Array{Float64,2}:
 0.452841   0.436244   0.390122   0.355589
 0.0       -0.18783    0.416922   0.555298
 0.0        0.0       -2.09388   -1.82241 
 0.0        0.0        0.0        0.836759

final output is C:

Out[1]:

4×4 Array{Float64,2}:
  1.0        0.0       0.0       0.0
 -0.735015   1.0       0.0       0.0
  1.1573    -3.89619   1.0       0.0
 -1.55916    0.389909  0.218737  1.0

Solution¶

From class, elimination corresponds to multiplying the matrix on the left by some elimination matrices

Remember that each of Gaussian elimination, i.e. some row operation, is given by multiplication on the left by some elimination matrix. Therefore, if we do some sequence of elimination steps to transform $\begin{pmatrix} A & I\end{pmatrix}$ into $\begin{pmatrix} U & C\end{pmatrix}$, with the first step corresponding to left-multiplication by $E_1$, the second corresponding to left-multiplication by $E_2$, etc., for a total of $k$ steps, then the entire Gaussian elimination process is given by multiplication on the left by the product matrix $E = E_k \cdots E_1$. That is to say, we have $$\begin{pmatrix} U & C\end{pmatrix} = X\begin{pmatrix} A & I\end{pmatrix} = \begin{pmatrix} EA & EI\end{pmatrix}.$$ From this we can read off two equations, by equating the left and right blocks on the far left and right sides:

$U = EA$
$C = EI = E$.

But, from class, $E$ (if there are no row swaps) is a lower-triangular matrix that is simply $L^{-1}$, since $EA=U \implies A = E^{-1} U = LU$, and in fact we defined $L$ precisely as the inverse of the elimination steps.

So $C = E = L^{-1}$.

Let's check this numerically

In [2]:

inv(L)

Out[2]:

4×4 Array{Float64,2}:
  1.0        0.0       0.0       0.0
 -0.735015   1.0       0.0       0.0
  1.1573    -3.89619   1.0       0.0
 -1.55916    0.389909  0.218737  1.0

This is exactly the same as the $C$ matrix printed above, so our deduction was correct!

Problem 2 (10 points)¶

(From Strang, section 2.4, problem 30.)

With $i^2 = -1$, the product of $(A+iB)$ and $(x+iy)$ (for real matrices A and B and real vectors x and y) is $Ax + iBx + iAy - By$. Instead, write the same operation in terms of a real matrix-vector product of twice the size, acting on vectors of the real parts on top of the imaginary parts:

$$ \begin{pmatrix} A & -B \\ ? & ? \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} Ax - By \\ ? \end{pmatrix} = \begin{pmatrix} \mbox{real part} \\ \mbox{imaginary part} \end{pmatrix} $$

Fill in the question marks.
Check your answer in Julia on random 3×3 matrices using the following code (note: in Julia, $i$ is represented by im):

In [3]:

A = rand(3,3)
B = rand(3,3)
x = rand(3)
y = rand(3)
(A + im*B) * (x + im*y)

Out[3]:

3-element Array{Complex{Float64},1}:
 -0.661795+1.78111im
 -0.606395+2.23004im
  0.991824+1.31834im

Solution¶

From the problem statement we have $(A + iB)(x + iy) = Ax + iBx + iAy - By = (Ax - By) + i(Bx + Ay),$ where in this last expression we've grouped the real and imaginary parts separately. So the far right question mark needs to be $Bx + Ay$. By the definition of matrix multiplication (which works for blocks!), this forces the two question marks on the left as well: $$ \begin{pmatrix} A & -B \\ B & A \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} Ax - By \\ Bx + Ay \end{pmatrix}$$

Writing this out in Julia, we have

In [4]:

# fill in the ?'s.  The result should be the real and imaginary parts of the vector
# printed from the previous output, stacked on top of one another:
[A -B
 B  A] * [x
          y]

Out[4]:

6-element Array{Float64,1}:
 -0.661795
 -0.606395
  0.991824
  1.78111 
  2.23004 
  1.31834

which matches the real and imaginary parts of the output printed above.

Problem 3 (10 points)¶

(From Strang, section 2.5, problem 5.)

Find an upper-triangular $U$ (not diagonal) with $U^2 = I$ and $U = U^{-1}$.

Solution 1: Brute force¶

Here is a "brute force" way of solving this problem, by just setting up a set of equations for the entries of $U$ and solving them:

Let's look for a $2 \times 2$ matrix $U$ with the specified properties. Give the entries of $U$ names: $$U = \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{21}\end{pmatrix}.$$ Take a look at the square of $U$: $$U^2 = \begin{pmatrix} u_{11}^2 & u_{11}u_{12} + u_{12}u_{22} \\ 0 & u_{22}^2\end{pmatrix} = \begin{pmatrix} u_{11}^2 & (u_{11} + u_{22})u_{12} \\ 0 & u_{22}^2\end{pmatrix}.$$ We need that this matrix above is $I$ (of course, $U^2 = I$ and $U = U^{-1}$ are equivalent, so we can just focus on the first one). We need $u_{11}^2 = u_{22}^2 = 1$, which means that each of $u_{11}$ and $u_{22}$ is either 1 or -1. We also need that $(u_{11} + u_{22})u_{12} = 0$, i.e. that either $u_{11} + u_{22} = 0$ or $u_{12} = 0$ (or both). But $U$ wasn't allowed to be diagonal, so we can't have $u_{12} = 0$, so we must have instead that $u_{11} + u_{22} = 0$, i.e that $u_{22} = -u_{11}$.

So, we see that, as long as $u_{22} \neq 0$, the matrix $$\boxed{\begin{pmatrix} 1 & u_{22} \\ 0 & -1 \end{pmatrix}}$$ and its negative are upper-triangular, equal to their own inverse, and not diagonal, and that these are the only such $2 \times 2$ matrices.

Solution 2¶

To get the solution more elegantly, let's think about what upper triangular matrices do. $Ux$ only sends information "upwards" in $x$, i.e. it adds multiples of later rows to earlier rows (exactly the opposite of what lower triangular matrices do in elimination). What we need is a matrix that sends an entry "up" in the first step, but if we do the step twice then it cancels the previous step.

Let's start with a really simple upper-triangular matrix, as "close to $I$" as possible, making it $4 \times 4$ to be interesting. $$ \begin{pmatrix} 1 & & & 1 \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} $$ which is just the identity matrix plus a single 1 above the diagonal. If we multiply this by a vector $x$, it adds the first and last rows, and keeps the other rows the same: $$ \begin{pmatrix} 1 & & & 1 \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 + x_4 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} $$ If we multiply by the matrix over and over, it will keep adding "more" of $x_4$ to the first row, and we'll never get back to the original vector. Instead, if we multiply by the matrix a second time, what we want do is subtract instead so that we cancel out the previous step. How can we do that? We just need to flip the sign of $x_4$. Let's instead use $$ \boxed{U = \begin{pmatrix} 1 & & & 1 \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{pmatrix}} $$ which gives $$ U \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1+x_4 \\ x_2 \\ x_3 \\ -x_4 \end{pmatrix} \implies U^2 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1+x_4-x_4 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} $$ and hence $U^2 = I$.

Of course, there are many other possible upper-triangular matrices that follow this general idea.

Problem 4 (10 points)¶

(From Strang, section 2.5, problem 12.)

If the product $C = AB$ is invertible (for square A and B), then $A$ itself is invertible. Show this by finding a formula for $A^{-1}$ in terms of $C^{-1}$ and $B$.
Check your answer for random 3×3 matrices in Julia using the code below.

In [5]:

A = rand(3,3)
B = rand(3,3)
C = A * B

inv(A) # output A⁻¹

Out[5]:

3×3 Array{Float64,2}:
 -0.73291   -4.53468   2.48916 
  0.323729   4.80398  -1.17527 
  0.845977  -3.77757   0.692128

Solution¶

Let's suppose for a moment for wishful thinking that $A$ was really invertible. What would the inverse have to be? The information we have is that $C$ is invertible, so let's try to use that. That $C$ is invertible means there is some matrix $C$ such that $CC^{-1} = C^{-1}C = I$. We could multiply our equation $C = AB$ on one side or the other by $C^{-1}$ to try to get closer to finding an inverse for $A$. If we multiply on the left, we get $$I = C^{-1}C = C^{-1}AB,$$ which isn't terribly helpful. If we multiply by $C^{-1}$ on the right instead, we get the equation $$I = CC^{-1} = ABC^{-1},$$ which is much more helpful! Putting parentheses above, we have that $I = A(BC^{-1})$, which tells us exactly that $A$ is in fact invertible and that $A^{-1} = BC^{-1}$.

Let's check this in Julia:

In [6]:

# our expression in terms of inv(C) and B that gives the same result as inv(A):
B*inv(C)

Out[6]:

3×3 Array{Float64,2}:
 -0.73291   -4.53468   2.48916 
  0.323729   4.80398  -1.17527 
  0.845977  -3.77757   0.692128

This agrees with inv(A) above, hooray!

Remark: In our derivation, we did not need to assume that B is invertible too! However, we can of course prove that too by the same process, and obtain $B^{-1} = C^{-1} A$ — but just because it is true doesn't mean that are allowed to assume it in order to prove that A is invertible (that would be almost circular reasoning). Let's check that formula for $B^{-1}$ too, for fun:

In [7]:

inv(B)

Out[7]:

3×3 Array{Float64,2}:
  5.05245   -9.92964    2.2534 
 -8.06185   -0.211051   9.01923
  0.830514  10.6695    -7.89531

In [8]:

inv(C) * A

Out[8]:

3×3 Array{Float64,2}:
  5.05245   -9.92964    2.2534 
 -8.06185   -0.211051   9.01923
  0.830514  10.6695    -7.89531

Problem 5 (10 points)¶

(From Strang, section 2.5, problem 31.)

This matrix $A$ has a remarkably simple inverse. Find $A^{-1}$ by Gauss–Jordan elimination on $\begin{pmatrix} A & I \end{pmatrix}$.

$$ A = \begin{pmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

(You can check your answer by calling inv in Julia, but you should still explicitly show how you get the inverse by hand-calculation using the Gauss–Jordan method.)

Solution¶

Notice that adding any of the bottom three rows to the row above it zeros out all of the entries in that row above the diagonal. So, if we add the second row to the first (so the bottom two haven't changed), the third to the second, and then the fourth to the third, we'll have done the Gauss-Jordan elimination.

We dutifully begin. We start with the augmented matrix:

$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & -1 & 1 & -1 & 1 & 0 & 0 & 0\\ 0 & 1 & -1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$

We then add the second row to the first, obtaining:

$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & -1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$

We then add the third row to the second, obtaining:

$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$

We then finish by adding the fourth row to the third, obtaining:

$$\begin{pmatrix} A & I \end{pmatrix} = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right), $$

So, the block on the right of the augmented matrix is the inverse of $A$:

$$A^{-1} = \begin{pmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{pmatrix}$$

Remark: A matrix like $A^{-1}$ above is called a bidiagonal matrix, and is even nicer to work with than a triangular matrix! (You can multiply by a bidiagonal matrix or its inverse in $\sim m$ operations.) It is also in the form of something called a a Jordan block that we will see near the end of 18.06.

Problem 6 (10 points)¶

Suppose that you are given the $PA = LU$ factorization of an invertible $m \times m$ matrix $A$, and we want to solve the block-matrix equation:

$$ \begin{pmatrix} A & B \\ 0 & A \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix} $$

where $B$ is an $m \times m$ matrix, "0" denotes an $m \times m$ block of zeros, and x,y,b,c are m-component vectors.

Write the solution vectors x,y in terms of P,L,U,B,b,c (or the inverses of these matrices).
Explain how your answer can be computed in $\sim m^2$ operations (i.e. roughly proportional to $m^2$) if you do things in the right order and the right way.
- Important: Remember from class that even if you write an expression like $M^{-1} v$ for some matrix $M$ and vector $v$, it doesn't mean that you have to compute it by calculating $M^{-1}$ first ($\sim m^3$ operations!) and then multiplying by $v$. Instead, you should solve $Mu = v$ for $u$ by the best method possible.
- Important: If you have an expression $MNv$ for matrices $M$ and $N$ and a vector $v$, there is a big difference between computing it as $(MN)v$ and $M(Nv)$. Why?

Solution¶

The system we're given is block upper-triangular, so just like with usual upper-triangular systems let's solve it by backsubstitution.

The bottom block-row gives the equation $Ay = c$. As $PA = LU$, we have $A = P^{-1}LU$, so the equation reads $P^{-1}LUy = c$, and as $P, L, U$ are invertible we can multiply on the left by $P$, and then by $L^{-1}$, and then by $U^{-1}$ to obtain $$\boxed{y = U^{-1}L^{-1}Pc}.$$ (Equivalently, $Ay = c \implies PAy = Pc = LUy \implies y = U^{-1} L^{-1} c$.)

Now we back-substitute. The first equation reads $Ax + By = b$ so we have $x = A^{-1}(b - By).$ As $y = U^{-1}L^{-1}Pc$ and $A^{-1} = U^{-1}L^{-1}P$, plugging in we get: $$\boxed{x = U^{-1}L^{-1}P(b - By)}.$$

Now let's see how, if we're handed the matrices $P, L, U$ and the vectors $b, c$, we can compute $x$ and $y$ by the formulas above in $\sim m^2$ operations. Here are a few things to notice:

If $v$ is any $m \times 1$ vector and $M$ is any $m \times m$ matrix, then the usual row-column computation of $Mv$ takes $\sim m^2$ operations (actually exactly $m(2m - 1)$, similarly to a problem from last week).
If $M$ is upper- or lower-triangular then computing the vector $M^{-1}v$, i.e. solving for $z$ in the equation $Mz = v$, takes $\sim m^2$ operations if we do the solving by back/forward-substitution!
If $M, N$ are $m \times m$ matrices and $v$ is a $m \times 1$ column vector, then the product $MNv$ can be bracketed two ways: $(MN)v = MNv = M(Nv).$ If we bracket the first way, i.e. $(MN)v$, then computing by multiplying rows and columns takes $m^2(2m - 1)$ operations to compute $MN$ and then another $m(2m - 1)$ operations to compute the product $(MN)v$ (once we have $v$), for a total of $2m^3 + m^2 - m \sim m^3$ operations. But, if we bracket it as $M(Nv)$, then it takes $m(2m - 1)$ operations to compute $Nv$ first and then another $m(2m - 1)$ to compute $M(Nv)$, for a total of $2m(2m - 1) \sim m^2$ operations!
Remark: Permuting a vector, by explicitly computing $Pc$ takes $\sim m^2$ operations if we actually construct the whole matrix $P$ and multiply by it. In practice, however, as discussed in class the computer really just tells us the desired ordering (gives a list of row numbers in the desired order), and copying a vector into a new order is just $\sim m$ operations.

So, as long as we (a) interpret left multiplication of $L^{-1}$ or $U^{-1}$ on a previously-known $m \times 1$ vector to mean solving an equation using back/forward substitution (taking $\sim m^2$ operations) and (b) only apply matrices to vectors and never to other matrices, i.e. write $x$ and $y$ as: $$\boxed{y = U^{-1}(L^{-1}(Pc)) \\ x = U^{-1}(L^{-1}(P(b-By))},$$ we can solve for $x, y$ in $\sim m^2$ operations.

In [ ]: