Lecture 5: Matrix multiplication part 2: Strassen algorithm

Matrix-by-matrix product (a reminder)

Naive algorithm: $2n^3 =$ $\mathcal{O}(n^3)$ operations.

However, $2n^2$ of input data, $n^2$ of output. Maybe there exists $\mathcal{O}(n^2)$ algorithm (not proven or disproven).

  • Strassen gives $\mathcal{O}(n^{2.807\dots})$ - sometimes used in practice

  • World record $\mathcal{O}(n^{2.37\dots})$ - big constant, not practical

Strassen algorithm

Let $A$ and $B$ be two $2\times 2$ matrices. Naive multiplication $C = AB$ $$ \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{21} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{21} + a_{22}b_{22} \end{bmatrix} $$ contains $8$ multiplications and $4$ additions.

In the work Gaussian elimination is not optimal (1969) Strassen found that one can calculate $C$ using 18 additions and only 7 multiplications: $$ \begin{split} c_{11} &= f_1 + f_4 - f_5 + f_7, \\ c_{12} &= f_3 + f_5, \\ c_{21} &= f_2 + f_4, \\ c_{22} &= f_1 - f_2 + f_3 + f_6, \end{split} $$ where $$ \begin{split} f_1 &= (a_{11} + a_{22}) (b_{11} + b_{22}), \\ f_2 &= (a_{21} + a_{22}) b_{11}, \\ f_3 &= a_{11} (b_{12} - b_{22}), \\ f_4 &= a_{22} (b_{21} - b_{11}), \\ f_5 &= (a_{11} + a_{12}) b_{22}, \\ f_6 &= (a_{21} - a_{11}) (b_{11} + b_{12}), \\ f_7 &= (a_{12} - a_{22}) (b_{21} + b_{22}). \end{split} $$

Fortunately, these formulas hold even if $a_{ij}$ and $b_{ij}$, $i,j=1,2$ are block matrices.

Thus, Strassen algorithm looks as follows. First of all we split matrices $A$ and $B$ of sizes $n\times n$, $n=2^d$ into 4 blocks of size $\frac{n}{2}\times \frac{n}{2}$. Then we calculate multiplications in the described formulas recursively. This leads us again to the divide and conquer idea.

Complexity of the Strassen algorithm

Number of multiplications

Calculation of number of multiplications is a trivial task. Let us denote by $M(n)$ number of multiplications used to multiply 2 matrices of sizes $n\times n$ using the divide and conquer concept. Then for naive algorithm we have $$ M_\text{naive}(n) = 8 M_\text{naive}\left(\frac{n}{2} \right) = 8^2 M_\text{naive}\left(\frac{n}{4} \right) = \dots = 8^{d-1} M(1) = 8^{d} = 8^{\log_2 n} = n^{\log_2 8} = n^3 $$ So, even when using divide and coquer idea we can not be better than $n^3$.

Lets calculate number of multiplications for the Strassen algorithm: $$ M_\text{strassen}(n) = 7 M_\text{strassen}\left(\frac{n}{2} \right) = 7^2 M_\text{strassen}\left(\frac{n}{4} \right) = \dots = 7^{d-1} M(1) = 7^{d} = 7^{\log_2 n} = n^{\log_2 7} $$

Number of addtitions

There is no point to estimate number of addtitions $A(n)$ for naive algorithm, as we already got $n^3$ multiplications. For the Strassen algorithm we have: $$ A_\text{strassen}(n) = 7 A_\text{strassen}\left( \frac{n}{2} \right) + 18 \left( \frac{n}{2} \right)^2 $$ since on the first level we have to add $\frac{n}{2}\times \frac{n}{2}$ matrices 18 times and then go deeper for each of the 7 multiplications. Thus, $$ \begin{split} A_\text{strassen}(n) =& 7 A_\text{strassen}\left( \frac{n}{2} \right) + 18 \left( \frac{n}{2} \right)^2 = 7 \left(7 A_\text{strassen}\left( \frac{n}{4} \right) + 18 \left( \frac{n}{4} \right)^2 \right) + 18 \left( \frac{n}{2} \right)^2 = 7^2 A_\text{strassen}\left( \frac{n}{4} \right) + 7\cdot 18 \left( \frac{n}{4} \right)^2 + 18 \left( \frac{n}{2} \right)^2 = \\ =& \dots = 18 \sum_{k=1}^d 7^{k-1} \left( \frac{n}{2^k} \right)^2 = \frac{18}{4} n^2 \sum_{k=1}^d \left(\frac{7}{4} \right)^{k-1} = \frac{18}{4} n^2 \frac{\left(\frac{7}{4} \right)^d - 1}{\frac{7}{4} - 1} = 6 n^2 \left( \left(\frac{7}{4} \right)^d - 1\right) \leqslant 6 n^2 \left(\frac{7}{4} \right)^d = 6 n^{\log_2 7} \end{split} $$ (since $4^d = n^2$ and $7^d = n^{\log_2 7}$).

Asymptotic behavior of $A(n)$ could be also found from the master theorem.

Total complexity

Total complexity is $M_\text{strassen}(n) + A_\text{strassen}(n)=$ $7 n^{\log_2 7}$. Strassen algorithm becomes faster when $$ \begin{split} 2n^3 &> 7 n^{\log_2 7}, \\ n &> 667, \end{split} $$ so it is not a good idea to get to the bottom level of recursion.

Strassen algorithm and tensor rank (advanced topic)

It is not clear how Strassen found these formulas. However, now we can see that they are not that artificial: there is a general approach based on the so-called tensor decomposition technique. Here by tensor we imply nothing, but a multidimensional array - generalization of the matrix concept to many dimensions.

Let us numerate elements in the $2\times 2$ matrices as follows $$ \begin{bmatrix} c_{1} & c_{3} \\ c_{2} & c_{4} \end{bmatrix} = \begin{bmatrix} a_{1} & a_{3} \\ a_{2} & a_{4} \end{bmatrix} \begin{bmatrix} b_{1} & b_{3} \\ b_{2} & b_{4} \end{bmatrix}= \begin{bmatrix} a_{1}b_{1} + a_{3}b_{2} & a_{1}b_{3} + a_{3}b_{4} \\ a_{2}b_{1} + a_{4}b_{2} & a_{2}b_{3} + a_{4}b_{4} \end{bmatrix} $$

This can be written as $$ c_k = \sum_{i=1}^4 \sum_{j=1}^4 x_{ijk} a_i b_j, \quad k=1,2,3,4 $$ where $x_{ijk}$ is a 3-dimensional array, that consists of zeros and ones: $$ \begin{split} x_{\ :,\ :,\ 1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix} \quad x_{\ :,\ :,\ 2} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \end{pmatrix} \\ x_{\ :,\ :,\ 3} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix} \quad x_{\ :,\ :,\ 4} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{pmatrix} \end{split} $$

Trilinear decomposition

To get Strassen algorithm we should do the following trick - decompose $x_{ijk}$ in the following way $$ x_{ijk} = \sum_{\alpha=1}^r u_{i\alpha} v_{j\alpha} w_{k\alpha}. $$ This decomposition is called trilinear tensor decomposition and has a meaning of separation of variables: we have a sum of $r$ (called rank) summands with separated $i$, $j$ and $k$.

Strassen via trilinear

Now we have $$ c_k = \sum_{\alpha=1}^r w_{k\alpha} \left(\sum_{i=1}^4 u_{i\alpha} a_i \right) \left( \sum_{j=1}^4 v_{j\alpha} b_j\right), \quad k=1,2,3,4. $$ Multiplications by $u_{i\alpha}$ or $v_{j\alpha}$ or $w_{k\alpha}$ do not require recursion since $u, v$ and $w$ are known precomputed matrices. Therefore, we have only $r$ multiplications of $\left(\sum_{i=1}^4 u_{i\alpha} a_i \right)$ and $\left( \sum_{j=1}^4 v_{j\alpha} b_j\right)$ where both factors depend on the input data.

As you might guess one can check that array $x_{ijk}$ has rank $r=7$, which leads us to $7$ multiplications and to the Strassen algorithm!

In [ ]: