One can always decompose a matrix $\mathsf{A}$ with the singular value decomposition (SVD)
\begin{gather} \mathsf{A} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^{T}\\ \mathsf{U}^T \mathsf{U} = \mathsf{U} \mathsf{U}^T = 1\\ \mathsf{V}^T \mathsf{V} = \mathsf{V} \mathsf{V}^T = 1 \end{gather}where $\mathsf{U}$ and $\mathsf{V}$ are orthogonal matrices and the $w_j$ are the singular values that are assembled into a diagonal matrix $\mathsf{W}$.
$$ \mathsf{W} = \text{diag}(w_j) $$The inverse (if it exists) can be directly calculated from the SVD:
$$ \mathsf{A}^{-1} = \mathsf{V} \text{diag}(1/w_j) \mathsf{U}^T $$

import numpy as np
Solve the linear system of equations
$$ \mathsf{A}\mathbf{x} = \mathbf{b} $$Using the standard linear solver in numpy:
A = np.array([
[1, 2, 3],
[3, 2, 1],
[-1, -2, -6],
])
b = np.array([0, 1, -1])
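For example, a minimal sketch with np.linalg.solve (the name x is our choice):

# solve A x = b with numpy's standard linear solver
x = np.linalg.solve(A, b)
x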
Using the inverse from SVD:
$$ \mathbf{x} = \mathsf{A}^{-1} \mathbf{b} $$
First check that the SVD really factors $\mathsf{A} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^{T}$:
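For example, with np.linalg.svd (numpy returns the factors as U, w, VT with $\mathsf{A} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^T$; the variable names are our choice):

# singular value decomposition A = U diag(w) VT
U, w, VT = np.linalg.svd(A)
# reassemble the factors and compare to the original matrix
np.allclose(A, U @ np.diag(w) @ VT)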
Now calculate the matrix inverse $\mathsf{A}^{-1} = \mathsf{V} \text{diag}(1/w_j) \mathsf{U}^T$:
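A sketch (A_inv is our name for the reconstructed inverse):

# A^{-1} = V diag(1/w) U^T; note that numpy's svd returns VT = V^T
A_inv = VT.T @ np.diag(1/w) @ U.T
A_inv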
Check that this is the same as what we get from numpy.linalg.inv():
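For example:

# compare the SVD-based inverse with numpy's direct inverse
np.allclose(A_inv, np.linalg.inv(A))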
Now, finally solve (and check against numpy.linalg.solve()):
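For example, reusing A_inv from above:

# x = A^{-1} b from the SVD inverse, checked against the standard solver
x = A_inv @ b
print(x)
np.allclose(x, np.linalg.solve(A, b))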
If the matrix $\mathsf{A}$ is singular, i.e., its rank (the number of linearly independent rows or columns) is smaller than its dimension, then the linear system of equations does not have a unique solution:
For example, the following matrix has the same row twice:
C = np.array([
[ 0.87119148, 0.9330127, -0.9330127],
[ 1.1160254, 0.04736717, -0.04736717],
[ 1.1160254, 0.04736717, -0.04736717],
])
b1 = np.array([ 2.3674474, -0.24813392, -0.24813392])
b2 = np.array([0, 1, 1])
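Trying the standard solver on the singular matrix $\mathsf{C}$ fails outright; a sketch (np.linalg.solve raises LinAlgError for an exactly singular matrix):

# the standard solver cannot handle the exactly singular matrix C
try:
    np.linalg.solve(C, b1)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)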
NOTE: failure is not always that obvious: numerically, a matrix can be almost singular. Try solving the linear system of equations
$$ \mathsf{D}\mathbf{x} = \mathbf{b}_1 $$with matrix $\mathsf{D}$ below:
D = C.copy()
D[2, :] = C[0] - 3*C[1]
D
Solve:
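For example:

# D is singular in exact arithmetic, but round-off makes it only *nearly*
# singular, so the solver returns an answer instead of raising an error
x = np.linalg.solve(D, b1)
x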
Note that some of the values are huge, suspiciously close to the inverse of machine precision ($\sim 10^{16}$): a sign of a nearly singular matrix.
Now back to the example with $\mathsf{C}$:
If a matrix is singular or nearly singular then one can still apply SVD.
One can then compute the pseudo-inverse
\begin{align} \mathsf{A}^{-1} &= \mathsf{V} \text{diag}(\alpha_j) \mathsf{U}^T \\ \alpha_j &= \begin{cases} \frac{1}{w_j}, &\quad\text{if}\ w_j \neq 0\\ 0, &\quad\text{if}\ w_j = 0 \end{cases} \end{align}i.e., any singular value $w_j = 0$ is "augmented" by setting
$$ \frac{1}{w_j} \rightarrow 0 \quad\text{if}\quad w_j = 0 $$in $\text{diag}(1/w_j)$.
Perform the SVD for the singular matrix $\mathsf{C}$:
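A sketch (again using numpy's convention $\mathsf{C} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^T$):

# SVD of the singular matrix C
U, w, VT = np.linalg.svd(C)
w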
Note the third value $w_2 \approx 0$: sign of a singular matrix.
Test that the SVD really decomposes $\mathsf{C} = \mathsf{U}\,\text{diag}(w_j)\,\mathsf{V}^{T}$:
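For example:

# reassemble C from its SVD factors
np.allclose(C, U @ np.diag(w) @ VT)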
Find the singular values that are effectively zero (let's say, $|w_i| < 10^{-12}$):
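One possibility is a boolean mask (the name singular is our choice):

# mark singular values that are numerically zero
singular = np.abs(w) < 1e-12
singular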
Calculate the pseudo-inverse from the SVD
\begin{align} \mathsf{A}^{-1} &= \mathsf{V} \text{diag}(\alpha_j) \mathsf{U}^T \\ \alpha_j &= \begin{cases} \frac{1}{w_j}, &\quad\text{if}\ w_j \neq 0\\ 0, &\quad\text{if}\ w_j = 0 \end{cases} \end{align}Augment:
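A sketch of the augmentation and the resulting pseudo-inverse (the names inv_w and C_inv are ours):

# invert only the non-zero singular values, leave the rest at 0
inv_w = np.zeros_like(w)
inv_w[~singular] = 1/w[~singular]
# pseudo-inverse C^{-1} = V diag(alpha_j) U^T
C_inv = VT.T @ np.diag(inv_w) @ U.T
C_inv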
Now solve the linear problem with SVD:
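For example:

# particular solution of C x = b1 from the pseudo-inverse
x1 = C_inv @ b1
print(x1)
# check that it actually solves the original equation
np.allclose(C @ x1, b1)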
Thus, using the pseudo-inverse $\mathsf{C}^{-1}$ we can obtain solutions to the equation
$$ \mathsf{C} \mathbf{x}_1 = \mathbf{b}_1 $$However, $\mathbf{x}_1$ is not the only solution: there's a whole line of solutions that are formed by the special solution and a combination of the basis vectors in the null space of the matrix:
The (right) kernel or null space contains all vectors $\mathbf{x}^0$ for which
$$ \mathsf{C} \mathbf{x}^0 = 0 $$(The dimension of the null space corresponds to the number of zero singular values.) You can find a basis that spans the null space. Any linear combination of null space basis vectors is also in the null space, i.e., the matrix maps it to zero.
Specifically, if $\mathbf{x}_1$ is a special solution and $\lambda_1 \mathbf{x}^0_1 + \lambda_2 \mathbf{x}^0_2 + \dots$ is a vector in the null space then
$$ \mathbf{x} = \mathbf{x}_1 + ( \lambda_1 \mathbf{x}^0_1 + \lambda_2 \mathbf{x}^0_2 + \dots ) $$is also a solution because
$$ \mathsf{C} \mathbf{x} = \mathsf{C} \mathbf{x_1} + \mathsf{C} ( \lambda_1 \mathbf{x}^0_1 + \lambda_2 \mathbf{x}^0_2 + \dots ) = \mathsf{C} \mathbf{x_1} + 0 = \mathbf{b}_1 + 0 = \mathbf{b}_1 $$The $\lambda_i$ are arbitrary real numbers and hence there is an infinite number of solutions.
In SVD:

The columns of $\mathsf{U}$ (U.T[i] or U[:, i]) corresponding to non-zero $w_i$, i.e. $\{i : w_i \neq 0\}$, form a basis for the range of the matrix $\mathsf{A}$.

The columns of $\mathsf{V}$ (V.T[i] or V[:, i]) corresponding to zero $w_i$, i.e. $\{i : w_i = 0\}$, form a basis for the null space of the matrix $\mathsf{A}$.

Note that x1 can be written as a linear combination of VT[0] and VT[1], the right singular vectors with non-zero singular values:
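For example, projecting x1 onto the two right singular vectors with non-zero $w_i$ (a sketch):

# expansion coefficients of x1 in the basis {VT[0], VT[1]}
coeffs = np.array([VT[0] @ x1, VT[1] @ x1])
np.allclose(x1, coeffs[0]*VT[0] + coeffs[1]*VT[1])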
Thus, all solutions are
x1 + lambda * VT[2]
The solution vector $x_2$ is in the null space:
(For more details see the solution notebook.)
$M$ equations for $N$ unknowns with $M < N$:
Same approach as above under Solving ill-conditioned coupled linear equations.
$M$ equations for $N$ unknowns with $M > N$:
No exact solution exists in general (the system is overdetermined), but SVD can provide the best solution in the least-squares sense, $$ \mathbf{x} = \mathsf{V}\, \text{diag}(1/w_j)\, \mathsf{U}^{T}\, \mathbf{b}, $$ where $1/w_j$ is again replaced by $0$ for any $w_j = 0$.
It provides the $\mathbf{x}$ that minimizes the residual
$$ \mathbf{r} := |\mathsf{A}\mathbf{x} - \mathbf{b}|. $$This is the linear least-squares fitting problem: Given data points $(x_i, y_i)$, fit to a linear model $y(x)$, which can be any linear combination of functions of $x$.
For example:
$$ y(x) = a_1 + a_2 x + a_3 x^2 + \dots + a_M x^{M-1} $$or in general $$ y(x) = \sum_{k=1}^M a_k X_k(x) $$
The goal is to determine the coefficients $a_k$.
Define the merit function $$ \chi^2 = \sum_{i=1}^N \left[ \frac{y_i - \sum_{k=1}^M a_k X_k(x_i)}{\sigma_i}\right]^2 $$ (sum of squared deviations, weighted with standard deviations $\sigma_i$ on the $y_i$).
Best parameters $a_k$ are the ones that minimize $\chi^2$.
Design matrix $\mathsf{A}$ ($N \times M$, $N \geq M$), vector of measurements $\mathbf{b}$ ($N$-dim) and parameter vector $\mathbf{a}$ ($M$-dim):
\begin{align} A_{ij} &= \frac{X_j(x_i)}{\sigma_i}\\ b_i &= \frac{y_i}{\sigma_i}\\ \mathbf{a} &= (a_1, a_2, \dots, a_M) \end{align}Minimum occurs when the derivative vanishes: $$ 0 = \frac{\partial\chi^2}{\partial a_k} = \sum_{i=1}^N {\sigma_i}^{-2} \left[ y_i - \sum_{j=1}^M a_j X_j(x_i) \right] X_k(x_i), \quad 1 \leq k \leq M $$ ($M$ coupled equations) \begin{align} \sum_{j=1}^{M} \alpha_{kj} a_j &= \beta_k\\ \mathsf{\alpha}\mathbf{a} &= \mathsf{\beta} \end{align} with the $M \times M$ matrix \begin{align} \alpha_{kj} &= \sum_{i=1}^N \frac{X_j(x_i) X_k(x_i)}{\sigma_i^2}\\ \mathsf{\alpha} &= \mathsf{A}^T \mathsf{A} \end{align} and the vector of length $M$ \begin{align} \beta_{k} &= \sum_{i=1}^N \frac{y_i X_k(x_i)}{\sigma_i^2}\\ \mathsf{\beta} &= \mathsf{A}^T \mathbf{b} \end{align}
The inverse of $\mathsf{\alpha}$ is related to the uncertainties in the parameters: $$ \mathsf{C} := \mathsf{\alpha}^{-1} $$ in particular $$ \sigma^2(a_i) = C_{ii} $$ (and the off-diagonal $C_{ij}$ are the covariances).
We need to solve the system of $M$ coupled equations \begin{align} \sum_{j=1}^{M} \alpha_{kj} a_j &= \beta_k\\ \mathsf{\alpha}\mathbf{a} &= \mathsf{\beta} \end{align}
SVD finds the $\mathbf{a}$ that minimizes $$ \chi^2 = |\mathsf{A}\mathbf{a} - \mathbf{b}|^2 $$
(proof in Numerical Recipes Ch 2.)
The errors are $$ \sigma^2(a_j) = \sum_{i=1}^{M} \left(\frac{V_{ji}}{w_i}\right)^2 $$ (see Numerical Recipes Ch. 15)
Synthetic data
$$ y(x) = 3\sin x - 2\sin 3x + \sin 4x $$with noise $r$ added (uniform in range $-5 < r < 5$).
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
import numpy as np
def signal(x, noise=0):
    # synthetic signal y(x) = 3 sin x - 2 sin 3x + sin 4x with uniform noise
    r = np.random.uniform(-noise, noise, len(x))
    return 3*np.sin(x) - 2*np.sin(3*x) + np.sin(4*x) + r
X = np.linspace(-10, 10, 500)
Y = signal(X, noise=5)
plt.plot(X, Y, 'r-', X, signal(X, noise=0), 'k--')
def fitfunc(x, a):
    # linear model: sum of a_k X_k(x) for the 8 basis functions below
    return a[0]*np.cos(x) + a[1]*np.sin(x) + \
           a[2]*np.cos(2*x) + a[3]*np.sin(2*x) + \
           a[4]*np.cos(3*x) + a[5]*np.sin(3*x) + \
           a[6]*np.cos(4*x) + a[7]*np.sin(4*x)

def basisfuncs(x):
    # evaluate the basis functions X_k(x) = cos(kx), sin(kx) for k = 1..4
    return np.array([np.cos(x), np.sin(x),
                     np.cos(2*x), np.sin(2*x),
                     np.cos(3*x), np.sin(3*x),
                     np.cos(4*x), np.sin(4*x)])
M = 8                     # number of basis functions / parameters a_k
sigma = 1.                # measurement uncertainty (same for all points)
alpha = np.zeros((M, M))  # normal-equations matrix alpha = A^T A
beta = np.zeros(M)        # right-hand side beta = A^T b

# build alpha_kj = sum_i X_k(x_i) X_j(x_i) / sigma_i^2
for x in X:
    Xk = basisfuncs(x)
    for k in range(M):
        for j in range(M):
            alpha[k, j] += Xk[k]*Xk[j]/sigma**2

# build beta_k = sum_i y_i X_k(x_i) / sigma_i^2
for x, y in zip(X, Y):
    beta += y * basisfuncs(x)/sigma**2
Now use SVD!
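One possibility is to decompose the $M \times M$ normal-equations matrix alpha (a sketch; which matrix to decompose is our assumption here):

# SVD of the normal-equations matrix alpha
U, w, VT = np.linalg.svd(alpha)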
In this case, the singular values do not immediately show if any basis functions are superfluous (this would be the case for values close to 0).
w
... nevertheless, remember to routinely mask any singular values that are zero or close to zero:
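A sketch of the masking and of solving for the coefficients a_values that are used in the plot below (the cutoff $10^{-12}$ follows the choice above):

# invert only the sufficiently large singular values
w_inv = np.zeros_like(w)
big = np.abs(w) > 1e-12
w_inv[big] = 1/w[big]
# a = V diag(1/w) U^T beta, i.e., the pseudo-inverse applied to beta
a_values = VT.T @ np.diag(w_inv) @ U.T @ beta
print(a_values)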
Compare the fitted values to the original parameters $a_j = 0, +3, 0, 0, 0, -2, 0, +1$.
plt.plot(X, fitfunc(X, a_values), 'b-', label="fit")
plt.plot(X, signal(X, noise=0), 'k--', label="signal")
plt.legend(loc="best", fontsize="small")