Learning Objectives: using external libraries, matrix manipulation, SVD.
Singular value decomposition (SVD) is a method for finding the pseudo-inverse of non-square and singular matrices. It provides a straightforward way of performing least-squares parameter fitting when the model is linear in its parameters. A useful introduction can be found in Numerical Recipes (W. H. Press et al.).
Theorem: The $N \times M$ matrix $\mathsf{X}$ can be decomposed as $$\mathsf{X} = \mathsf{UsV},$$ where $\mathsf{U}$ is $N \times M$ and column-orthogonal, $\mathsf{s}$ is $M \times M$ and diagonal with non-negative entries (the singular values), and $\mathsf{V}$ is $M \times M$ and row-orthogonal. The pseudo-inverse of $\mathsf{X}$ is then $$\mathsf{X^{-1}} = \mathsf{V^Ts^{-1}U^T},$$ with the caveat that the diagonal entries of $\mathsf{s^{-1}}$ are set to 0 wherever the corresponding singular value in $\mathsf{s}$ is 0.
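Numpy's np.linalg.svd implements exactly this factorisation when called with full_matrices=False: its third return value is the row-orthogonal $\mathsf{V}$, and the singular values come back as a vector rather than a matrix. A minimal standalone sketch verifying the theorem on an arbitrary example matrix:
import numpy as np
Z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) #an arbitrary 3x2 example matrix
U, s, V = np.linalg.svd(Z, full_matrices=False)
#s is a length-M vector of singular values; np.diag(s) rebuilds the diagonal matrix
print(np.allclose(Z, np.dot(U, np.dot(np.diag(s), V)))) #True: Z = UsV
print(np.allclose(np.dot(np.transpose(U), U), np.eye(2))) #True: U is column-orthogonal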
import numpy as np
#create a matrix, X
X = np.array([[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0],[10.0,11.0,12.0]])
def pseudo_inverse(X):
    #perform SVD, storing the arrays as U, s, V
    #full_matrices=False gives the above definition of SVD
    U, s, V = np.linalg.svd(X, full_matrices=False)
    print("Singular values = " + str(s))
    #transposes of the orthogonal matrices
    U_T = np.transpose(U)
    V_T = np.transpose(V)
    #inverse of s, zeroing the reciprocal of any negligible singular value
    inv_s = []
    for singular_value in s:
        if singular_value < 1.0e-12: #some small cutoff
            inv_s.append(0.0)
        else:
            inv_s.append(1.0/singular_value)
    inv_s = np.diag(inv_s) #convert from list to diagonal matrix
    #pseudo-inverse of X...
    inv_X = np.dot(V_T, np.dot(inv_s, U_T))
    return inv_X
inv_X = pseudo_inverse(X)
Singular values = [ 2.54624074e+01 1.29066168e+00 2.35731836e-15]
The smallest singular value is effectively zero: it is of the order of the computer's rounding error. In order to find the pseudo-inverse of $\mathsf{X}$, this singular value's reciprocal must be set to zero. The function pseudo_inverse includes this step.
print(inv_X)
[[-0.48333333 -0.24444444 -0.00555556  0.23333333]
 [-0.03333333 -0.01111111  0.01111111  0.03333333]
 [ 0.41666667  0.22222222  0.02777778 -0.16666667]]
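As a cross-check, numpy's built-in np.linalg.pinv computes the Moore-Penrose pseudo-inverse by the same SVD-with-cutoff approach, so the two results should agree:
#cross-check against numpy's built-in pseudo-inverse
print(np.allclose(inv_X, np.linalg.pinv(X))) #expect True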
The pseudo-inverse of the matrix $\mathsf{X}$ has now been calculated for an overdetermined system. The least-squares solution (see below) to $$\mathsf{y} = \mathsf{X}.\mathsf{a}$$ is simply $$\mathsf{a} = \mathsf{X^{-1}}.\mathsf{y},$$ where $\mathsf{X^{-1}}$ is the pseudo-inverse of $\mathsf{X}$. This vector $\mathsf{a}$ is the solution in the sense that $$(\mathsf{y}-\mathsf{X}.\mathsf{a})^2$$ is minimised. This property makes singular value decomposition useful for least-squares problems.
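As an illustration (using an arbitrary, made-up vector $\mathsf{y}$, not data from this section), the pseudo-inverse solution agrees with numpy's dedicated least-squares routine:
#illustrative example with an arbitrary right-hand side
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
a_demo = np.dot(inv_X, y_demo)
#np.linalg.lstsq minimises (y - X.a)^2 directly; both give the minimum-norm solution
a_lstsq = np.linalg.lstsq(X, y_demo, rcond=None)[0]
print(np.allclose(a_demo, a_lstsq)) #expect True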
Suppose that $\mathsf{a}$ are the parameters of the model $\mathsf{y} = \mathsf{X}.\mathsf{a}$, $\mathsf{y}$ are measurements of the dependent variable and $\mathsf{X}$ is a matrix of functions of the independent variables. In the case of equal Gaussian errors on the $y_i$, we then seek $\mathsf{a}$ which minimises $$\chi^2 = (\mathsf{y}-\mathsf{X}.\mathsf{a})^2.$$ Clearly, this can be done using singular value decomposition. In the case that the errors, $\sigma_i$, on the measurements are different, the matrix $\mathsf{X}$ must be replaced by $\mathsf{A}$, the design matrix, in order to weight the measurements: $$A_{ij} = \frac{X_{ij}}{\sigma_{i}},$$ and the vector $\mathsf{y}$ is replaced by $\mathsf{b}$: $$b_{i} = \frac{y_{i}}{\sigma_{i}}.$$ The pseudo-inverse of $\mathsf{A}$ can then be used to find the parameters $\mathsf{a}$: $$\mathsf{a} = \mathsf{A^{-1}}.\mathsf{b}.$$
#now include some errors
#choose some measurements, y, of the dependent variable, and their errors, sigma
#the system is overdetermined, so this does not necessarily give an exact set of parameters!
y = [14.3, 31.8, 49.1, 68.5]
sigma = [0.4, 0.3, 0.6, 0.2]
b = [y[i]/sigma[i] for i in range(len(y))]
#create the design matrix
A = np.zeros(X.shape)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        A[i][j] = X[i][j]/sigma[i]
#pseudo_inverse of design matrix
inv_A = pseudo_inverse(A)
#parameters which minimise chi-squared
a = np.dot(inv_A, b)
print "Parameters = " + str(a)
chi = np.linalg.norm(b - np.dot(A, a))
print "Chi-statistic = " + str(chi)
Singular values = [ 1.02921777e+02 3.74863557e+00 3.62710615e-15]
Parameters = [ 1.14826608 2.01942306 2.89058005]
Chi-statistic = 2.36148830442
At low temperatures, the heat capacity of a metal can be modelled as $$C = a_0 T + a_1 T^3,$$ where $T$ is temperature and the parameters $a_0$ and $a_1$ reflect the contributions of conduction electrons and lattice vibrations, respectively.
Given the temperature (K) and heat capacity (JK$^{-1}$kg$^{-1}$) data below, find the model's parameters.
Plot the model and data in a form that gives a straight line: dividing through by $T$ gives $C/T = a_0 + a_1 T^2$, which is linear in $T^2$. (answers: $a_0$ = 2.043 JK$^{-2}$kg$^{-1}$, $a_1$ = 5.789e-3 JK$^{-4}$kg$^{-1}$)
T = [0.1, 2.1, 4.1, 6.1, 8.1, 10.1, 12.1, 14.1, 16.1, 18.1, 20.1]
C = [0.368, 4.565, 10.119, 13.472, 20.375, 26.356, 32.908, 45.0275, 56.946, 70.921, 89.089]
C_errors = [0.163, 0.389, 0.504, 0.593, 0.669, 0.735, 0.795, 0.850, 0.902, 0.950, 0.996]
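A possible solution sketch, reusing pseudo_inverse from above and assuming matplotlib for the plot (variable names such as A_hc and b_hc are illustrative, not prescribed by the exercise): the design matrix has columns $T_i/\sigma_i$ and $T_i^3/\sigma_i$, and the weighted measurement vector is $C_i/\sigma_i$. Plotting $C/T$ against $T^2$ then gives the straight-line form.
#a possible solution sketch; A_hc and b_hc are illustrative names
import matplotlib.pyplot as plt
T = np.array(T)
C = np.array(C)
C_errors = np.array(C_errors)
#design matrix: basis functions T and T^3, each row weighted by its error
A_hc = np.column_stack((T/C_errors, T**3/C_errors))
b_hc = C/C_errors
a0, a1 = np.dot(pseudo_inverse(A_hc), b_hc)
print("a0 = " + str(a0) + ", a1 = " + str(a1)) #expect roughly 2.043 and 5.789e-3
#straight-line form: C/T = a0 + a1*T^2
plt.errorbar(T**2, C/T, yerr=C_errors/T, fmt='o', label='data')
plt.plot(T**2, a0 + a1*T**2, label='model')
plt.xlabel('$T^2$ (K$^2$)')
plt.ylabel('$C/T$ (J K$^{-2}$ kg$^{-1}$)')
plt.legend()
plt.show()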