In linear algebra, it is critically important to think about the shape (size) of matrices and vectors, and whether operations make sense. You can multiply $AB$ if $A$ is $m \times n$ ($m$ rows and $n$ columns) and $B$ is $n \times p$ — the "middle" dimensions need to match up. You can also add matrices of equal sizes, or multiply them by scalars. Multiplying an $m \times n$ matrix $A$ by an $n$-component vector $x$ to form $Ax$ can be thought of as a special case of this rule if you think of $x$ as an $n \times 1$ "matrix". On exams, it is common for people to panic and start writing down nonsense, and an easy way to catch this is to make sure that the operations you are writing down have the correct shapes.
Suppose $A$ is a $5\times 3$ matrix, $B$ is a $3 \times 4$ matrix, $x$ is a $3 \times 1$ matrix (a 3-component column vector), $r$ is a $1 \times 5$ matrix (a 5-component "row vector"), and $s$ is a $1 \times 3$ matrix (a 3-component row vector). Give the shape ($m \times n$) of the result of each of the following operations, or say that the operation makes no sense:
As a check, you should also try out your answers in Julia with some random matrices and vectors of these sizes:
# matrices/vectors of the sizes given above, containing small random integers:
A = rand(-9:9, 5,3)
B = rand(-9:9, 3,4)
x = rand(-9:9, 3,1)
r = rand(-9:9, 1,5)
s = rand(-9:9, 1,3);
# a sample operation: replace this with A*B etcetera to check your answers above.
(x*s)^3
3×3 Array{Int64,2}:
  1296   648   1296
 -1296  -648  -1296
 -6480 -3240  -6480
Here are the results of these operations in Julia for some random matrices of the specified sizes:
A*B
5×4 Array{Int64,2}:
 -82  -20  -140    5
  81    3   125    3
 -27   36   -27   12
  36   54    98  -25
 -25    1   -11  -56
B*A
DimensionMismatch("matrix A has dimensions (3,4), matrix B has dimensions (5,3)")
Stacktrace:
 [1] _generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:492
 [2] generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:483
 [3] *(::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:146
 [4] include_string(::String, ::String) at ./loading.jl:515
x*s
3×3 Array{Int64,2}:
   4    2    4
  -4   -2   -4
 -20  -10  -20
s*x
1×1 Array{Int64,2}:
 -18
r*s
DimensionMismatch("matrix A has dimensions (1,5), matrix B has dimensions (1,3)")
Stacktrace:
 [1] _generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:492
 [2] generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:483
 [3] *(::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:146
 [4] include_string(::String, ::String) at ./loading.jl:515
A+B
DimensionMismatch("dimensions must match")
Stacktrace:
 [1] promote_shape(::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at ./indices.jl:79
 [2] +(::Array{Int64,2}, ::Array{Int64,2}) at ./arraymath.jl:37
 [3] include_string(::String, ::String) at ./loading.jl:515
2A + x*r
DimensionMismatch("dimensions must match")
Stacktrace:
 [1] promote_shape(::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at ./indices.jl:79
 [2] +(::Array{Int64,2}, ::Array{Int64,2}) at ./arraymath.jl:37
 [3] include_string(::String, ::String) at ./loading.jl:515
r*A*x
1×1 Array{Int64,2}:
 -141
x*A*r
DimensionMismatch("matrix A has dimensions (3,1), matrix B has dimensions (5,3)")
Stacktrace:
 [1] _generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:492
 [2] generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:483
 [3] *(::Array{Int64,2}, ::Array{Int64,2}, ::Array{Int64,2}) at ./operators.jl:424
 [4] include_string(::String, ::String) at ./loading.jl:515
A^3
DimensionMismatch("matrix A has dimensions (5,3), matrix B has dimensions (5,3)")
Stacktrace:
 [1] _generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:492
 [2] generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:483
 [3] power_by_squaring(::Array{Int64,2}, ::Int64) at ./intfuncs.jl:182
 [4] ^ at ./linalg/dense.jl:332 [inlined]
 [5] literal_pow(::Base.#^, ::Array{Int64,2}, ::Type{Val{3}}) at ./intfuncs.jl:205
 [6] include_string(::String, ::String) at ./loading.jl:515
The following code multiplies two random upper-triangular matrices (matrices whose entries are zero below the diagonal).
L₁ = Matrix(UpperTriangular(rand(-9:9, 5,5)))
5×5 Array{Int64,2}:
 -3  -3   2  -9   0
  0  -5  -6   6   3
  0   0  -3   0  -4
  0   0   0   7  -4
  0   0   0   0   2
L₂ = Matrix(UpperTriangular(rand(-9:9, 5,5)))
5×5 Array{Int64,2}:
 -6  -3  -2  -5  3
  0   9  -5  -7  0
  0   0  -8   5  2
  0   0   0  -5  9
  0   0   0   0  0
L₁ * L₂
5×5 Array{Int64,2}:
 18  -18   5   91  -86
  0  -45  73  -25   42
  0    0  24  -15   -6
  0    0   0  -35   63
  0    0   0    0    0
We observe that the result is again upper-triangular. In fact, the product of any two square upper-triangular matrices of the same size is again upper-triangular. This is true because the nonzero entries "match up" with zero entries when multiplying rows and columns to produce entries below the diagonal. More precisely:
Consider any upper-triangular $n \times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$ (this notation means that the row-$i$, column-$j$ entry of $A$ is $a_{ij}$, and similarly for $B$). That they are upper-triangular means precisely that $a_{ij} = b_{ij} = 0$ whenever $1 \leq j < i \leq n$. Now consider their product $AB = (c_{ij})$. To show that $AB$ is upper-triangular we just need to show that $c_{ij} = 0$ whenever $1 \leq j < i \leq n$. So, suppose $i, j$ are such that $1 \leq j < i \leq n$. By definition of matrix multiplication, $c_{ij} = \sum_{k = 1}^n a_{ik}b_{kj}$. For any $k$ with $1 \leq k \leq n$, we must have either $k < i$ or $k > j$ (if $k \geq i$ then $k > j$, since $j < i$). But in the first case $a_{ik} = 0$ because $A$ is upper-triangular, and in the second case $b_{kj} = 0$ because $B$ is upper-triangular, so in either case $a_{ik}b_{kj} = 0$. So the whole sum $c_{ij} = \sum_{k = 1}^n a_{ik}b_{kj} = 0$. Therefore $c_{ij} = 0$ whenever $1 \leq j < i \leq n$, so $AB = (c_{ij})$ is upper-triangular.
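As a sanity check (not a proof!), we can verify the claim numerically for many random upper-triangular matrices; `istriu`, from Julia's `LinearAlgebra` standard library in current versions, tests whether a matrix is upper-triangular:

```julia
using LinearAlgebra  # for UpperTriangular and istriu in current Julia versions

for trial in 1:100
    U1 = Matrix(UpperTriangular(rand(-9:9, 5, 5)))
    U2 = Matrix(UpperTriangular(rand(-9:9, 5, 5)))
    @assert istriu(U1 * U2)  # the product is upper-triangular every time
end
```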
A clear but informal argument, e.g. with a diagram showing a row of $L_1$ times a column of $L_2$ and explaining why the result is zero below the diagonal, is also acceptable.
(From Strang, section 2.2, problem 11.)
A system of linear equations $Ax=b$ cannot have exactly two solutions. An easy way to see why: if two vectors $x$ and $y \ne x$ are two solutions (i.e. $Ax=b$ and $Ay=b$), what is another solution? (Hint: $x+y$ is almost right.)
$Ax=b$ and $Ay=b$, so $A(x+y)=Ax+Ay=2b$. [The key property that $A(x+y)=Ax+Ay$ is called linearity, and is what makes matrix multiplication a part of linear algebra.] But we want $b$ on the right-hand side, so we can just divide both sides by 2: $A((x+y)/2) = b$, so $(x+y)/2$ is a solution. (Since $x \ne y$, this is a new solution, halfway between $x$ and $y$.)
In fact, there are infinitely many solutions: anything on the line connecting $x$ and $y$. Let $z = tx + (1-t)y$ for any number $t$. Then $z$ lies on the line connecting $x$ and $y$, and in fact as $t$ varies over all real numbers the vector $z$ traverses this entire line (check this on paper with your favorite vectors $x$ and $y$ in the plane!). Then $z$ is another solution, again thanks to linearity: $$Az = A(tx + (1-t)y) = tAx + (1 - t)Ay = tb + (1 - t)b = b.$$
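To make this concrete, here is a small Julia check with a made-up example (the matrix $A$ and the solutions $x$ and $y$ below are ours, chosen so that $Ax = Ay = b$): every point on the line through $x$ and $y$ also solves the system.

```julia
A = [1 2 3; 4 5 6]       # example matrix (made up for illustration)
x = [1, 0, 0]
y = [2, -2, 1]           # y = x + n, where n = [1,-2,1] satisfies A*n = 0
b = A * x                # so A*x == A*y == b

for t in (-1.5, 0.0, 0.3, 2.0)
    z = t*x + (1 - t)*y  # a point on the line through x and y
    @assert A * z ≈ b    # still a solution, by linearity
end
```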
(From Strang, section 2.2, problem 14.) Consider Gaussian elimination on the following system of equations:
$$ 2x + 5y + z = 0 \\ 4x + dy + z = 2 \\ y - z = 3 $$(Write your solution in matrix form.)
In matrix form, the system of equations is: $$\begin{pmatrix} 2 & 5 & 1 \\ 4 & d & 1 \\ 0 & 1 & -1\end{pmatrix}\begin{pmatrix} x \\ y \\ z\end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 3\end{pmatrix}.$$ Now consider performing Gaussian elimination on the associated augmented $3 \times 4$ matrix. Subtracting twice the first row from the second to eliminate the 2-1 entry gives: $$\left(\begin{array}{ccc|c} 2 & 5 & 1 & 0\\ 0 & d - 10 & -1 & 2\\ 0 & 1 & -1 & 3 \end{array}\right). $$
We need to do a row exchange (of the second and third rows) if the 2-2 entry is 0, i.e. $d - 10 = 0$, i.e. $d = 10$.
The system will be singular exactly when the second and third rows of the $3 \times 3$ matrix above (ignoring the constants on the right) are scalar multiples of one another; as the 2-3 and 3-3 entries both equal $-1$, this happens exactly when $d - 10 = 1$, i.e. when $d = 11$.
In the first case, when $d = 10$, we have the augmented matrix: $$\left(\begin{array}{ccc|c} 2 & 5 & 1 & 0\\ 0 & 0 & -1 & 2\\ 0 & 1 & -1 & 3 \end{array}\right). $$ Exchanging the second and third rows gives: $$\left(\begin{array}{ccc|c} 2 & 5 & 1 & 0\\ 0 & 1 & -1 & 3 \\ 0 & 0 & -1 & 2 \end{array}\right), $$ a nonsingular triangular system corresponding to the system of linear equations $$ 2x + 5y + z = 0 \\ y - z = 3 \\ -z = 2. $$
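Back-substitution in this triangular system gives $z = -2$, then $y = 3 + z = 1$, then $2x = -5y - z = -3$, so $x = -3/2$. We can check this in Julia, where the backslash operator solves the system by elimination and back-substitution:

```julia
A10 = [2 5 1; 4 10 1; 0 1 -1]  # the matrix with d = 10
b = [0, 2, 3]
A10 \ b                        # returns [-1.5, 1.0, -2.0]
```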
Give an exact count of the number of scalar additions and multiplications required to multiply $AB$ where $A$ and $B$ are two $m \times m$ matrices, and compare to the count for $A + B$. If you compute $AB + A + B$, which operation(s) will take most of the time for large $m$? (You can assume additions and multiplications take the same amount of time, which is approximately true.)
Determining performance on real computers is often quite a bit more complicated than just counting arithmetic operations, but it is still interesting to do some real timing. The following code benchmarks $1000 \times 1000$ and $2000 \times 2000$ matrix multiplications (repeating each timing measurement 3 times to reduce noise — just look at the smallest time). How does the ratio of the two times compare to your prediction about the operation counts?
m = 1000
A = rand(m,m)
B = rand(m,m)
BLAS.set_num_threads(1) # only use a single processor — multiprocessing makes things much weirder
@time A*B; @time A*B; @time A*B;
0.183616 seconds (228 allocations: 7.645 MiB)
0.167275 seconds (6 allocations: 7.630 MiB)
0.166910 seconds (6 allocations: 7.630 MiB)
m = 2000
A = rand(m,m)
B = rand(m,m)
BLAS.set_num_threads(1) # only use a single processor — multiprocessing makes things much weirder
@time A*B; @time A*B; @time A*B;
1.290393 seconds (6 allocations: 30.518 MiB, 0.14% gc time)
1.271722 seconds (6 allocations: 30.518 MiB, 0.29% gc time)
1.271718 seconds (6 allocations: 30.518 MiB, 1.17% gc time)
Computing the product $AB$ amounts to computing each of the $m^2$ entries of $AB$ independently. Each entry of the product is obtained as a product of a $1 \times m$ row vector in $A$ with a $m \times 1$ column vector in $B$. The product of such vectors requires $m$ scalar multiplications and $m - 1$ scalar additions (to add the $m$ products together). So it requires $2m - 1$ operations to compute each entry of $AB$, for a total of $m^2(2m - 1) = 2m^3 - m^2$ operations in total. Notice that, for large m, this is dominated by the first term, so the number of operations grows proportional to $m^3$. (This is sometimes denoted "$O(m^3)$" or, more precisely, "$\Theta(m^3)$" complexity in computer science.)
To compute $A + B$, we need to compute each of the $m^2$ entries, and each entry requires just the addition of two scalars. So this requires only $m^2$ operations.
So for large $m$, computing $AB + A + B$ is dominated by determining the product $AB$.
For $m = 1000$ computing $AB$ involves $1000^2(2*1000 - 1) = 1999000000$ operations, and for $m = 2000$ computing $AB$ involves $2000^2(2*2000 - 1) = 15996000000$ operations. The ratio is $15996000000/1999000000 = 8.002001\ldots \approx 8$. More simply, since the number of operations goes roughly proportional to $m^3$ for large $m$, doubling $m$ gives a factor of $2^3 = 8$. This agrees pretty well with the ratio of computation times seen above (1.27s/0.167s ≈ 7.6).
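The predicted operation counts and their ratio can be computed directly:

```julia
mulops(m) = m^2 * (2m - 1)   # exact operation count for an m×m matrix multiply
mulops(1000)                 # 1999000000
mulops(2000)                 # 15996000000
mulops(2000) / mulops(1000)  # ≈ 8.002, close to the observed timing ratio
```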
Remark: The exact timings that you get will vary, depending on what computer you used to run the benchmark. The ratio should be close to 8, but won't be exactly 8 because computers are complicated — you can't predict performance just by counting operations, as explained in more detail in this 2017 notebook from 18.S096.
Remark: All of the different "perspectives" on matrix multiplication that were presented in class and in the book require exactly the same $2m^3-m^2$ operations, just executed in different orders. In theory, it is actually possible to multiply two matrices with an operation count that grows more slowly than $m^3$. The famous "Strassen" algorithm has a count roughly proportional to $m^{2.8074}$, and the Coppersmith–Winograd algorithm needs only about $m^{2.376}$ operations (times a constant). In practice, however, these algorithms are virtually never used — they don't become advantageous until $m$ is so huge that arbitrary matrices are too big to multiply anyway. Essentially all practical matrix libraries use an $m^3$ algorithm for generic matrices.
Consider the matrix:
$$ A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 4 & 0 \\ 2 & 4 & 5 \end{pmatrix} $$What matrix $E$ puts $A$ into upper-triangular form $EA = U$? (Remember: this is the product of the matrices from the individual elimination steps.)
What matrix $F$ can you multiply by $A$ to put $A$ into lower triangular form $FA = L$? (Hint: do Gaussian elimination "upside down", i.e. from the bottom up.)
Finding $E$:
Recall that a single elimination step is just left multiplication by an "elementary matrix". For example, to eliminate the 3,1-entry by subtracting twice the first row from the third row is given by left multiplication by the elementary matrix $$ E_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1\end{pmatrix},$$ and the multiplication gives $$E_1A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1\end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 0 & 4 & 0 \\ 2 & 4 & 5\end{pmatrix} = \begin{pmatrix}1 & 1 & 1 \\ 0 & 4 & 0 \\ 0 & 2 & 3\end{pmatrix}.$$ To achieve upper-triangular form, we can then just subtract half the second row from the third row. This is achieved by left multiplication by the matrix $$E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -0.5 & 1\end{pmatrix},$$ giving $$(E_2E_1)A = E_2(E_1A) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -0.5 & 1\end{pmatrix}\begin{pmatrix}1 & 1 & 1 \\ 0 & 4 & 0 \\ 0 & 2 & 3\end{pmatrix} = \begin{pmatrix}1 & 1 & 1 \\ 0 & 4 & 0 \\ 0 & 0 & 3\end{pmatrix} = U.$$ So if $E = E_2E_1$ then we have $EA = E_2E_1A$ is upper-triangular, as needed. Explicitly, $E$ is given by: $$E = E_2E_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -0.5 & 1\end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1\end{pmatrix} = \boxed{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & -0.5 & 1\end{pmatrix}}.$$
Finding $F$:
Now we want to eliminate the entries above the diagonal, starting in the last column. We can get rid of the 1,3-entry by subtracting 1/5 of the third row from the first. This is achieved by left multiplication by the elementary matrix $$F_1 = \begin{pmatrix} 1 & 0 & -1/5 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix},$$ giving $$F_1A = \begin{pmatrix} 1 & 0 & -1/5 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 0 & 4 & 0 \\ 2 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 3/5 & 1/5 & 0 \\ 0 & 4 & 0 \\ 2 & 4 & 5 \end{pmatrix}.$$ To get rid of the 1,2-entry, we can now subtract 1/20 of the second row from the first. This is achieved by left multiplication by the elementary matrix $$F_2 = \begin{pmatrix} 1 & -1/20 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix},$$ giving $$(F_2F_1)A = F_2(F_1A) = \begin{pmatrix} 1 & -1/20 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} 3/5 & 1/5 & 0 \\ 0 & 4 & 0 \\ 2 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 3/5 & 0 & 0 \\ 0 & 4 & 0 \\ 2 & 4 & 5 \end{pmatrix} = L.$$ So the matrix $$F = F_2F_1 = \begin{pmatrix} 1 & -1/20 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} 1 & 0 & -1/5 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix} = \boxed{\begin{pmatrix} 1 & -1/20 & -1/5 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}}$$ does the job.
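It is easy to check both boxed answers in Julia. Using Julia's exact `Rational` entries (written with `//`) avoids any floating-point roundoff in the comparison:

```julia
A = [1 1 1; 0 4 0; 2 4 5]
E = [1 0 0; 0 1 0; -2 -1//2 1]    # rational entries keep the arithmetic exact
E * A   # = [1 1 1; 0 4 0; 0 0 3], upper-triangular
F = [1 -1//20 -1//5; 0 1 0; 0 0 1]
F * A   # = [3//5 0 0; 0 4 0; 2 4 5], lower-triangular
```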
Remark 1: These matrices $E$ and $F$ are not uniquely determined by the problem statement; in fact, there are infinitely many acceptable answers. By problem 2, we know that the product of upper-triangular matrices is upper-triangular. So if we have some (let's say nonsingular) matrix $E$ such that $EA = U$ with $U$ upper-triangular and any nonsingular upper-triangular matrix $U'$, then $U'E$ is nonsingular and $(U'E)A = U'(EA) = U'U$ is again upper-triangular, so $U'E$ is another perfectly good matrix "$E$" as asked for. So the matrix $E$ requested (and the corresponding upper-triangular matrix $U$) is not unique, and there are as many good answers as there are nonsingular upper-triangular matrices. A similar statement for the matrix $F$ (including problem 2 for lower-triangular matrices) holds true as well, of course.
Remark 2: Note that no row swaps are needed in this case. They are not forbidden, however, and if you do a row swap you will find that your elimination matrix is no longer triangular. To regain the convenience of triangular factorizations in the presence of row swaps, we will introduce the PA=LU factorization in class.
Remark 3: As noted in class, some matrices require row swaps in order to obtain an upper-triangular form. For example, consider: $$\begin{pmatrix} 0 & 1 \\ 2 & 3\end{pmatrix}$$ You cannot add any multiple of the first row to the second to obtain an upper-triangular matrix, and this matrix cannot be written in the form $LU$ with $L$ lower-triangular and $U$ upper-triangular.
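We can see the forced row swap with Julia's built-in LU factorization (in current Julia versions, `lu` lives in the `LinearAlgebra` standard library and performs partial pivoting):

```julia
using LinearAlgebra

fac = lu([0 1; 2 3])  # integer input is converted to floating point internally
fac.p                 # the row permutation: [2, 1], i.e. the rows were swapped
```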