Learning Objectives: using external libraries, matrix manipulation, SVD.
Singular value decomposition (SVD) is a method for finding the pseudo-inverse of non-square and singular matrices. It provides a straightforward way of performing least-squares parameter fitting when the model is linear in its parameters. A useful introduction can be found in Numerical Recipes (W. H. Press et al.).
Theorem: The $N \times M$ matrix $\mathsf{X}$ can be decomposed as $$\mathsf{X} = \mathsf{UsV},$$ where $\mathsf{U}$ is $N \times M$ and column-orthogonal, $\mathsf{s}$ is $M \times M$ and diagonal with non-negative entries (the singular values), and $\mathsf{V}$ is $M \times M$ and row-orthogonal. The pseudo-inverse of $\mathsf{X}$ is then $$\mathsf{X^{-1}} = \mathsf{V^Ts^{-1}U^T},$$ with the caveat that the diagonal entries of $\mathsf{s^{-1}}$ are set to 0 wherever the corresponding singular value in $\mathsf{s}$ is 0.
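Numpy's np.linalg.svd implements exactly this factorisation when called with full_matrices=False: its third return value is the row-orthogonal $\mathsf{V}$, and the singular values come back as a vector rather than a matrix. A minimal standalone sketch verifying the theorem on an arbitrary example matrix:
import numpy as np
Z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) #an arbitrary 3x2 example matrix
U, s, V = np.linalg.svd(Z, full_matrices=False)
#s is a length-M vector of singular values; np.diag(s) rebuilds the diagonal matrix
print(np.allclose(Z, np.dot(U, np.dot(np.diag(s), V)))) #True: Z = UsV
print(np.allclose(np.dot(np.transpose(U), U), np.eye(2))) #True: U is column-orthogonal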
import numpy as np
#create a matrix, X
X = np.array([[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0],[10.0,11.0,12.0]])
def pseudo_inverse(X):
    #perform SVD, storing the arrays as U, s, V
    #full_matrices=False gives the above definition of SVD
    U, s, V = np.linalg.svd(X, full_matrices=False)
    print("Singular values = " + str(s))
    #transposes of the orthogonal matrices
    U_T = np.transpose(U)
    V_T = np.transpose(V)
    #inverse of s, zeroing the reciprocal of any negligible singular value
    inv_s = []
    for singular_value in s:
        if singular_value < 1.0e-12: #some small cutoff
            inv_s.append(0.0)
        else:
            inv_s.append(1.0/singular_value)
    inv_s = np.diag(inv_s) #convert from list to diagonal matrix
    #pseudo-inverse of X...
    inv_X = np.dot(V_T, np.dot(inv_s, U_T))
    return inv_X
inv_X = pseudo_inverse(X)
Singular values = [ 2.54624074e+01 1.29066168e+00 2.35731836e-15]
The smallest singular value is effectively zero: it is of the order of the computer's rounding error. In order to find the pseudo-inverse of $\mathsf{X}$, this singular value's reciprocal must be set to zero. The function pseudo_inverse includes this step.
print(inv_X)
[[-0.48333333 -0.24444444 -0.00555556  0.23333333]
 [-0.03333333 -0.01111111  0.01111111  0.03333333]
 [ 0.41666667  0.22222222  0.02777778 -0.16666667]]
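As a cross-check, numpy's built-in np.linalg.pinv computes the Moore-Penrose pseudo-inverse by the same SVD-with-cutoff approach, so the two results should agree:
#cross-check against numpy's built-in pseudo-inverse
print(np.allclose(inv_X, np.linalg.pinv(X))) #expect True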
The pseudo-inverse of the matrix $\mathsf{X}$ has now been calculated for an overdetermined system. The least-squares solution (see below) to $$\mathsf{y} = \mathsf{X}.\mathsf{a}$$ is simply $$\mathsf{a} = \mathsf{X^{-1}}.\mathsf{y},$$ where $\mathsf{X^{-1}}$ is the pseudo-inverse of $\mathsf{X}$. This vector $\mathsf{a}$ is the solution in the sense that $$(\mathsf{y}-\mathsf{X}.\mathsf{a})^2$$ is minimised. This property makes singular value decomposition useful for least-squares problems.
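As an illustration (using an arbitrary, made-up vector $\mathsf{y}$, not data from this section), the pseudo-inverse solution agrees with numpy's dedicated least-squares routine:
#illustrative example with an arbitrary right-hand side
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
a_demo = np.dot(inv_X, y_demo)
#np.linalg.lstsq minimises (y - X.a)^2 directly; both give the minimum-norm solution
a_lstsq = np.linalg.lstsq(X, y_demo, rcond=None)[0]
print(np.allclose(a_demo, a_lstsq)) #expect True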
Suppose that $\mathsf{a}$ are the parameters of the model $\mathsf{y} = \mathsf{X}.\mathsf{a}$, $\mathsf{y}$ are measurements of the dependent variable and $\mathsf{X}$ is a matrix of functions of the independent variables. In the case of equal Gaussian errors on the $y_i$, we then seek $\mathsf{a}$ which minimises $$\chi^2 = (\mathsf{y}-\mathsf{X}.\mathsf{a})^2.$$ Clearly, this can be done using singular value decomposition. In the case that the errors, $\sigma_i$, on the measurements are different, the matrix $\mathsf{X}$ must be replaced by $\mathsf{A}$, the design matrix, in order to weight the measurements: $$A_{ij} = \frac{X_{ij}}{\sigma_{i}},$$ and the vector $\mathsf{y}$ is replaced by $\mathsf{b}$: $$b_{i} = \frac{y_{i}}{\sigma_{i}}.$$ The pseudo-inverse of $\mathsf{A}$ can then be used to find the parameters $\mathsf{a}$: $$\mathsf{a} = \mathsf{A^{-1}}.\mathsf{b}.$$
#now include some errors
#choose some measurements, y, of the dependent variable, and their errors, sigma
#the system is overdetermined, so this does not necessarily give an exact set of parameters!
y = [14.3, 31.8, 49.1, 68.5]
sigma = [0.4, 0.3, 0.6, 0.2]
b = [y[i]/sigma[i] for i in range(len(y))]
#create the design matrix
A = np.zeros(X.shape)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        A[i][j] = X[i][j]/sigma[i]
#pseudo_inverse of design matrix
inv_A = pseudo_inverse(A)
#parameters which minimise chi-squared
a = np.dot(inv_A, b)
print "Parameters = " + str(a)
chi = np.linalg.norm(b - np.dot(A, a))
print "Chi-statistic = " + str(chi)
Singular values = [ 1.02921777e+02 3.74863557e+00 3.62710615e-15]
Parameters = [ 1.14826608 2.01942306 2.89058005]
Chi-statistic = 2.36148830442
At low temperatures, the heat capacity of a metal can be modelled as $$C = a_0 T + a_1 T^3,$$ where $T$ is temperature and the parameters $a_0$ and $a_1$ reflect the contributions of conduction electrons and lattice vibrations, respectively.
Given the temperature (K) and heat capacity (JK$^{-1}$kg$^{-1}$) data below, find the model's parameters.
Plot the model and data in a form that gives a straight line: dividing through by $T$ gives $C/T = a_0 + a_1 T^2$, which is linear in $T^2$. (answers: $a_0$ = 2.043 JK$^{-2}$kg$^{-1}$, $a_1$ = 5.789e-3 JK$^{-4}$kg$^{-1}$)
T = [0.1, 2.1, 4.1, 6.1, 8.1, 10.1, 12.1, 14.1, 16.1, 18.1, 20.1]
C = [0.368, 4.565, 10.119, 13.472, 20.375, 26.356, 32.908, 45.0275, 56.946, 70.921, 89.089]
C_errors = [0.163, 0.389, 0.504, 0.593, 0.669, 0.735, 0.795, 0.850, 0.902, 0.950, 0.996]
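A possible solution sketch, reusing pseudo_inverse from above and assuming matplotlib for the plot (variable names such as A_hc and b_hc are illustrative, not prescribed by the exercise): the design matrix has columns $T_i/\sigma_i$ and $T_i^3/\sigma_i$, and the weighted measurement vector is $C_i/\sigma_i$. Plotting $C/T$ against $T^2$ then gives the straight-line form.
#a possible solution sketch; A_hc and b_hc are illustrative names
import matplotlib.pyplot as plt
T = np.array(T)
C = np.array(C)
C_errors = np.array(C_errors)
#design matrix: basis functions T and T^3, each row weighted by its error
A_hc = np.column_stack((T/C_errors, T**3/C_errors))
b_hc = C/C_errors
a0, a1 = np.dot(pseudo_inverse(A_hc), b_hc)
print("a0 = " + str(a0) + ", a1 = " + str(a1)) #expect roughly 2.043 and 5.789e-3
#straight-line form: C/T = a0 + a1*T^2
plt.errorbar(T**2, C/T, yerr=C_errors/T, fmt='o', label='data')
plt.plot(T**2, a0 + a1*T**2, label='model')
plt.xlabel('$T^2$ (K$^2$)')
plt.ylabel('$C/T$ (J K$^{-2}$ kg$^{-1}$)')
plt.legend()
plt.show()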