One can always decompose a matrix $\mathsf{A}$ with the singular value decomposition (SVD)
\begin{gather} \mathsf{A} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^{T}\\ \mathsf{U}^T \mathsf{U} = \mathsf{U} \mathsf{U}^T = 1\\ \mathsf{V}^T \mathsf{V} = \mathsf{V} \mathsf{V}^T = 1 \end{gather}where $\mathsf{U}$ and $\mathsf{V}$ are orthogonal matrices and the $w_j$ are the singular values that are assembled into a diagonal matrix $\mathsf{W}$.
$$ \mathsf{W} = \text{diag}(w_j) $$The inverse (if it exists) can be directly calculated from the SVD:
$$ \mathsf{A}^{-1} = \mathsf{V} \text{diag}(1/w_j) \mathsf{U}^T $$

import numpy as np
Solve the linear system of equations
$$ \mathsf{A}\mathbf{x} = \mathbf{b} $$Using the standard linear solver in numpy:
A = np.array([
[1, 2, 3],
[3, 2, 1],
[-1, -2, -6],
])
b = np.array([0, 1, -1])
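For example, a minimal sketch with np.linalg.solve (the name x is our choice):

# solve A x = b with numpy's standard linear solver
x = np.linalg.solve(A, b)
x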
Using the inverse from SVD:
$$ \mathbf{x} = \mathsf{A}^{-1} \mathbf{b} $$
First check that the SVD really factors $\mathsf{A} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^{T}$:
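For example, with np.linalg.svd (numpy returns the factors as U, w, VT with $\mathsf{A} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^T$; the variable names are our choice):

# singular value decomposition A = U diag(w) VT
U, w, VT = np.linalg.svd(A)
# reassemble the factors and compare to the original matrix
np.allclose(A, U @ np.diag(w) @ VT)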
Now calculate the matrix inverse $\mathsf{A}^{-1} = \mathsf{V} \text{diag}(1/w_j) \mathsf{U}^T$:
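A sketch (A_inv is our name for the reconstructed inverse):

# A^{-1} = V diag(1/w) U^T; note that numpy's svd returns VT = V^T
A_inv = VT.T @ np.diag(1/w) @ U.T
A_inv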
Check that this is the same as what we get from numpy.linalg.inv():
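For example:

# compare the SVD-based inverse with numpy's direct inverse
np.allclose(A_inv, np.linalg.inv(A))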
Now, finally solve (and check against numpy.linalg.solve()):
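For example, reusing A_inv from above:

# x = A^{-1} b from the SVD inverse, checked against the standard solver
x = A_inv @ b
print(x)
np.allclose(x, np.linalg.solve(A, b))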
If the matrix $\mathsf{A}$ is singular, i.e., its rank (the number of linearly independent rows or columns) is smaller than its dimension, then the linear system of equations does not have a unique solution:
For example, the following matrix has the same row twice:
C = np.array([
[ 0.87119148, 0.9330127, -0.9330127],
[ 1.1160254, 0.04736717, -0.04736717],
[ 1.1160254, 0.04736717, -0.04736717],
])
b1 = np.array([ 2.3674474, -0.24813392, -0.24813392])
b2 = np.array([0, 1, 1])
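Trying the standard solver on the singular matrix $\mathsf{C}$ fails outright; a sketch (np.linalg.solve raises LinAlgError for an exactly singular matrix):

# the standard solver cannot handle the exactly singular matrix C
try:
    np.linalg.solve(C, b1)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)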
NOTE: failure is not always that obvious: numerically, a matrix can be almost singular. Try solving the linear system of equations
$$ \mathsf{D}\mathbf{x} = \mathbf{b}_1 $$with matrix $\mathsf{D}$ below:
D = C.copy()
D[2, :] = C[0] - 3*C[1]
D
Solve:
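For example:

# D is singular in exact arithmetic, but round-off makes it only *nearly*
# singular, so the solver returns an answer instead of raising an error
x = np.linalg.solve(D, b1)
x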
Note that some of the values are huge, suspiciously close to the inverse of machine precision ($\sim 10^{16}$): a sign of a nearly singular matrix.
Now back to the example with $\mathsf{C}$:
If a matrix is singular or nearly singular then one can still apply SVD.
One can then compute the pseudo-inverse
\begin{align} \mathsf{A}^{-1} &= \mathsf{V} \text{diag}(\alpha_j) \mathsf{U}^T \\ \alpha_j &= \begin{cases} \frac{1}{w_j}, &\quad\text{if}\ w_j \neq 0\\ 0, &\quad\text{if}\ w_j = 0 \end{cases} \end{align}i.e., any singular value $w_j = 0$ is "augmented" by setting
$$ \frac{1}{w_j} \rightarrow 0 \quad\text{if}\quad w_j = 0 $$in $\text{diag}(1/w_j)$.
Perform the SVD for the singular matrix $\mathsf{C}$:
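A sketch (again using numpy's convention $\mathsf{C} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^T$):

# SVD of the singular matrix C
U, w, VT = np.linalg.svd(C)
w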
Note the third value $w_2 \approx 0$: sign of a singular matrix.
Test that the SVD really decomposes $\mathsf{C} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^{T}$:
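For example:

# reassemble C from its SVD factors
np.allclose(C, U @ np.diag(w) @ VT)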
Find the singular values that are effectively zero (let's say, $|w_i| < 10^{-12}$):
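One possibility is a boolean mask (the name singular is our choice):

# mark singular values that are numerically zero
singular = np.abs(w) < 1e-12
singular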
Calculate the pseudo-inverse from the SVD
\begin{align} \mathsf{A}^{-1} &= \mathsf{V} \text{diag}(\alpha_j) \mathsf{U}^T \\ \alpha_j &= \begin{cases} \frac{1}{w_j}, &\quad\text{if}\ w_j \neq 0\\ 0, &\quad\text{if}\ w_j = 0 \end{cases} \end{align}Augment:
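A sketch of the augmentation and the resulting pseudo-inverse (the names inv_w and C_inv are ours):

# invert only the non-zero singular values, leave the rest at 0
inv_w = np.zeros_like(w)
inv_w[~singular] = 1/w[~singular]
# pseudo-inverse C^{-1} = V diag(alpha_j) U^T
C_inv = VT.T @ np.diag(inv_w) @ U.T
C_inv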
Now solve the linear problem with SVD:
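For example:

# particular solution of C x = b1 from the pseudo-inverse
x1 = C_inv @ b1
print(x1)
# check that it actually solves the original equation
np.allclose(C @ x1, b1)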
Thus, using the pseudo-inverse $\mathsf{C}^{-1}$ we can obtain solutions to the equation
$$ \mathsf{C} \mathbf{x}_1 = \mathbf{b}_1 $$However, $\mathbf{x}_1$ is not the only solution: there's a whole line of solutions that are formed by the special solution and a combination of the basis vectors in the null space of the matrix:
The (right) kernel or null space contains all vectors $\mathbf{x}^0$ for which
$$ \mathsf{C} \mathbf{x}^0 = 0 $$(The dimension of the null space corresponds to the number of zero singular values.) You can find a basis that spans the null space. Any linear combination of null space basis vectors is also in the null space, i.e., the matrix maps it to zero.
Specifically, if $\mathbf{x}_1$ is a special solution and $\lambda_1 \mathbf{x}^0_1 + \lambda_2 \mathbf{x}^0_2 + \dots$ is a vector in the null space then
$$ \mathbf{x} = \mathbf{x}_1 + ( \lambda_1 \mathbf{x}^0_1 + \lambda_2 \mathbf{x}^0_2 + \dots ) $$is also a solution because
$$ \mathsf{C} \mathbf{x} = \mathsf{C} \mathbf{x_1} + \mathsf{C} ( \lambda_1 \mathbf{x}^0_1 + \lambda_2 \mathbf{x}^0_2 + \dots ) = \mathsf{C} \mathbf{x_1} + 0 = \mathbf{b}_1 + 0 = \mathbf{b}_1 $$The $\lambda_i$ are arbitrary real numbers and hence there is an infinite number of solutions.
In SVD:

The columns of $\mathsf{U}$ (U.T[i] or U[:, i]) corresponding to non-zero $w_i$, i.e. $\{i : w_i \neq 0\}$, form a basis for the range of the matrix $\mathsf{A}$.

The columns of $\mathsf{V}$ (V.T[i] or V[:, i]) corresponding to zero $w_i$, i.e. $\{i : w_i = 0\}$, form a basis for the null space of the matrix $\mathsf{A}$.

Note that x1 can be written as a linear combination of VT[0] and VT[1], the right singular vectors with non-zero singular values:
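For example, projecting x1 onto the two right singular vectors with non-zero $w_i$ (a sketch):

# expansion coefficients of x1 in the basis {VT[0], VT[1]}
coeffs = np.array([VT[0] @ x1, VT[1] @ x1])
np.allclose(x1, coeffs[0]*VT[0] + coeffs[1]*VT[1])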
Thus, all solutions are
x1 + lambda * VT[2]
The solution vector $x_2$ is in the null space:
(For more details see the solution notebook.)
$M$ equations for $N$ unknowns with $M < N$:
Same approach as above under Solving ill-conditioned coupled linear equations.
$M$ equations for $N$ unknowns with $M > N$:
No exact solution exists in general (the system is overdetermined), but SVD can provide the best solution in the least-squares sense, $$ \mathbf{x} = \mathsf{V}\, \text{diag}(1/w_j)\, \mathsf{U}^{T}\, \mathbf{b}, $$ where $1/w_j$ is again replaced by $0$ for any $w_j = 0$.
It provides the $\mathbf{x}$ that minimizes the residual
$$ \mathbf{r} := |\mathsf{A}\mathbf{x} - \mathbf{b}|. $$This is the linear least-squares fitting problem: Given data points $(x_i, y_i)$, fit to a linear model $y(x)$, which can be any linear combination of functions of $x$.
For example:
$$ y(x) = a_1 + a_2 x + a_3 x^2 + \dots + a_M x^{M-1} $$or in general $$ y(x) = \sum_{k=1}^M a_k X_k(x) $$
The goal is to determine the coefficients $a_k$.
Define the merit function $$ \chi^2 = \sum_{i=1}^N \left[ \frac{y_i - \sum_{k=1}^M a_k X_k(x_i)}{\sigma_i}\right]^2 $$ (sum of squared deviations, weighted with standard deviations $\sigma_i$ on the $y_i$).
Best parameters $a_k$ are the ones that minimize $\chi^2$.
Design matrix $\mathsf{A}$ ($N \times M$, $N \geq M$), vector of measurements $\mathbf{b}$ ($N$-dim) and parameter vector $\mathbf{a}$ ($M$-dim):
\begin{align} A_{ij} &= \frac{X_j(x_i)}{\sigma_i}\\ b_i &= \frac{y_i}{\sigma_i}\\ \mathbf{a} &= (a_1, a_2, \dots, a_M) \end{align}Minimum occurs when the derivative vanishes: $$ 0 = \frac{\partial\chi^2}{\partial a_k} = \sum_{i=1}^N {\sigma_i}^{-2} \left[ y_i - \sum_{j=1}^M a_j X_j(x_i) \right] X_k(x_i), \quad 1 \leq k \leq M $$ ($M$ coupled equations) \begin{align} \sum_{j=1}^{M} \alpha_{kj} a_j &= \beta_k\\ \mathsf{\alpha}\mathbf{a} &= \mathsf{\beta} \end{align} with the $M \times M$ matrix \begin{align} \alpha_{kj} &= \sum_{i=1}^N \frac{X_j(x_i) X_k(x_i)}{\sigma_i^2}\\ \mathsf{\alpha} &= \mathsf{A}^T \mathsf{A} \end{align} and the vector of length $M$ \begin{align} \beta_{k} &= \sum_{i=1}^N \frac{y_i X_k(x_i)}{\sigma_i^2}\\ \mathsf{\beta} &= \mathsf{A}^T \mathbf{b} \end{align}
The inverse of $\mathsf{\alpha}$ is related to the uncertainties in the parameters: $$ \mathsf{C} := \mathsf{\alpha}^{-1} $$ in particular $$ \sigma^2(a_i) = C_{ii} $$ (and the off-diagonal $C_{ij}$ are the covariances).
We need to solve the system of $M$ coupled equations \begin{align} \sum_{j=1}^{M} \alpha_{kj} a_j &= \beta_k\\ \mathsf{\alpha}\mathbf{a} &= \mathsf{\beta} \end{align}
SVD finds the $\mathbf{a}$ that minimizes $$ \chi^2 = |\mathsf{A}\mathbf{a} - \mathbf{b}|^2 $$
(proof in Numerical Recipes Ch 2.)
The errors are $$ \sigma^2(a_j) = \sum_{i=1}^{M} \left(\frac{V_{ji}}{w_i}\right)^2 $$ (see Numerical Recipes Ch. 15)
Synthetic data
$$ y(x) = 3\sin x - 2\sin 3x + \sin 4x $$with noise $r$ added (uniform in range $-5 < r < 5$).
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
import numpy as np
def signal(x, noise=0):
    # synthetic signal y(x) = 3 sin x - 2 sin 3x + sin 4x with uniform noise
    r = np.random.uniform(-noise, noise, len(x))
    return 3*np.sin(x) - 2*np.sin(3*x) + np.sin(4*x) + r
X = np.linspace(-10, 10, 500)
Y = signal(X, noise=5)
plt.plot(X, Y, 'r-', X, signal(X, noise=0), 'k--')
def fitfunc(x, a):
    # linear model: sum of a_k X_k(x) for the 8 basis functions below
    return a[0]*np.cos(x) + a[1]*np.sin(x) + \
           a[2]*np.cos(2*x) + a[3]*np.sin(2*x) + \
           a[4]*np.cos(3*x) + a[5]*np.sin(3*x) + \
           a[6]*np.cos(4*x) + a[7]*np.sin(4*x)

def basisfuncs(x):
    # evaluate the basis functions X_k(x) = cos(kx), sin(kx) for k = 1..4
    return np.array([np.cos(x), np.sin(x),
                     np.cos(2*x), np.sin(2*x),
                     np.cos(3*x), np.sin(3*x),
                     np.cos(4*x), np.sin(4*x)])
M = 8                     # number of basis functions / parameters a_k
sigma = 1.                # measurement uncertainty (same for all points)
alpha = np.zeros((M, M))  # normal-equations matrix alpha = A^T A
beta = np.zeros(M)        # right-hand side beta = A^T b

# build alpha_kj = sum_i X_k(x_i) X_j(x_i) / sigma_i^2
for x in X:
    Xk = basisfuncs(x)
    for k in range(M):
        for j in range(M):
            alpha[k, j] += Xk[k]*Xk[j]/sigma**2

# build beta_k = sum_i y_i X_k(x_i) / sigma_i^2
for x, y in zip(X, Y):
    beta += y * basisfuncs(x)/sigma**2
Now use SVD!
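One possibility is to decompose the $M \times M$ normal-equations matrix alpha (a sketch; which matrix to decompose is our assumption here):

# SVD of the normal-equations matrix alpha
U, w, VT = np.linalg.svd(alpha)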
In this case, the singular values do not immediately show if any basis functions are superfluous (this would be the case for values close to 0).
w
... nevertheless, remember to routinely mask any singular values that are zero or close to zero:
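A sketch of the masking and of solving for the coefficients a_values that are used in the plot below (the cutoff $10^{-12}$ follows the choice above):

# invert only the sufficiently large singular values
w_inv = np.zeros_like(w)
big = np.abs(w) > 1e-12
w_inv[big] = 1/w[big]
# a = V diag(1/w) U^T beta, i.e., the pseudo-inverse applied to beta
a_values = VT.T @ np.diag(w_inv) @ U.T @ beta
print(a_values)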
Compare the fitted values to the original parameters $a_j = 0, +3, 0, 0, 0, -2, 0, +1$.
plt.plot(X, fitfunc(X, a_values), 'b-', label="fit")
plt.plot(X, signal(X, noise=0), 'k--', label="signal")
plt.legend(loc="best", fontsize="small")