Author: David Wood, d.wood0403@gmail.com; John Hunt, john.hunt92@gmail.com
The factorization is named after André-Louis Cholesky, a French military officer involved in geodetic engineering. Cholesky lived from 1875 to 1918 and was killed in World War I (WWI). He attended the renowned École Polytechnique in France, where some of the first discoveries about radioactivity occurred. Cholesky surveyed Crete and North Africa before WWI, and it was for this surveying work that he created and used the Cholesky decomposition.
The Cholesky decomposition is the factorization of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. It is useful for efficient numerical solutions, e.g. Monte Carlo simulations and systems with multiple correlated variables (a matrix that is not Hermitian and positive definite cannot be factored by this method).
The Cholesky decomposition is also closely related to the solution of least-squares problems, since the normal equations that characterize the least-squares solution have a symmetric positive-definite coefficient matrix. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations. It can also be used to compute and work with correlation matrices in experimental statistics. In the case of a scalar (n = 1), the Cholesky factor R is just the positive square root of A. However, R should not be confused with the square roots of A.
The Cholesky factorization can therefore be thought of, with the caution noted above, as finding a generalized square root of the matrix A. Using this type of factorization, we factor A into a single matrix L multiplied by its own transpose, $A = LL^T$ or, equivalently, $A = R^TR$ with $R = L^T$.
A matrix A has a Cholesky factorization if there exists a lower triangular matrix L with positive diagonal elements such that $A = LL^T$, and every positive definite matrix has one. In summary, the Cholesky factorization is a specific form of LU decomposition in which $U = L^T$; this fact makes computing L much easier.
Theorem: Every positive definite matrix A has a Cholesky factorization, which may be constructed explicitly.
Proof: We use induction. For the trivial 1 × 1 positive definite matrix $A = [a_{11}]$, we have $a_{11} > 0$, so $L = [l_{11}]$ with $l_{11} = \sqrt{a_{11}}$. Assuming the result holds for every (n−1) × (n−1) positive definite matrix, we extend it to an n × n matrix A as follows.
Since A is positive definite, it is symmetric and can be represented as
$$A = \begin{bmatrix} a_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix}$$
Since A is positive definite, $a_{11} > 0$, and thus we can easily determine that $l_{11} = \sqrt{a_{11}}$ and $L_{21} = \frac{1}{l_{11}} A_{21}$.
We can now show that
$$K = A_{22} - L_{21}L_{21}^T$$
is a positive definite matrix as well. Take any $(n-1) \times 1$ column vector $X \neq 0$ and let
$$V = -\frac{1}{a_{11}} A_{21}^T X$$
Then
$$X^T K X = \begin{bmatrix} V & X^T \end{bmatrix} A \begin{bmatrix} V \\ X \end{bmatrix} > 0$$
because A is positive definite. Since
$$A_{22} - L_{21}L_{21}^T$$
is a positive definite $(n-1) \times (n-1)$ matrix, by the induction hypothesis there is a lower triangular matrix $L_{22}$ with positive diagonal elements such that
$$A_{22} - L_{21}L_{21}^T = L_{22}L_{22}^T$$
Therefore
$$A = \begin{bmatrix} a_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} l_{11} & L_{21}^T \\ 0 & L_{22}^T \end{bmatrix}$$

Complete Algebra

Using a 3 × 3 matrix, we will show the Cholesky factorization in its entirety, in a form that can be coded directly. As an example, consider factoring the following symmetric matrix:
$$A = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{bmatrix} = LL^T = \begin{bmatrix} l_{11}^2 & l_{11}l_{21} & l_{11}l_{31} \\ l_{11}l_{21} & l_{21}^2 + l_{22}^2 & l_{31}l_{21} + l_{32}l_{22} \\ l_{11}l_{31} & l_{31}l_{21} + l_{32}l_{22} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{bmatrix}$$
We can easily see that the diagonal elements have a pattern. In general:
$$l_{ii} = \sqrt{a_{ii} - \sum_{j=1}^{i-1} l_{ij}^2}$$
For off-diagonal elements ($l_{ij}$, $i > j$) there is also a pattern:
$$l_{21} = \frac{1}{l_{11}} a_{21}$$
which in general terms is:
$$l_{ij} = \frac{1}{l_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik}l_{jk}\right), \quad i > j$$

Use in linear algebra

To solve linear systems of equations, the Cholesky factorization can be used when computing $A^{-1}$ is difficult or computationally expensive, as is the case for large systems. To solve equations of the form $Ax = b$, we only have to perform one intermediate step before solving for x. This comes from the fact that once A is factored into $LL^T$ we can redefine the equation with a temporary vector y:
$$Ax = L(L^Tx) = Ly = b$$
Once A is factored into $LL^T$, we use forward substitution to solve for the temporary vector y in
$$Ly = \begin{bmatrix} l_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = b$$
Then, having found y, the next step is to use backward substitution to solve for x in
$$L^Tx = \begin{bmatrix} l_{11} & L_{21}^T \\ 0 & L_{22}^T \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = y$$
Both forward substitution and backward substitution can be performed by directly solving each equation in the system sequentially.
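As a sketch of how these two substitution steps can be coded in pure Python (the function names forward_substitution and backward_substitution are our own, not from any library; L is stored as a list of rows, such as the factor produced by the cholesky function defined later on this page):

def forward_substitution(L, b):
    # Solve L y = b where L is lower triangular: work top to bottom.
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward_substitution(U, y):
    # Solve U x = y where U is upper triangular: work bottom to top.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

Given the factor L of A, solving $Ax = b$ is then y = forward_substitution(L, b) followed by x = backward_substitution(U, y), where U is the transpose of L.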
Additionally, the Cholesky factorization can also be written as $A = U^HU$, where $U = L^H$. Here U is an upper triangular matrix instead of a lower triangular matrix. This is useful if a solver for upper triangular systems has already been implemented: the orientation of the factor can be changed by taking the Hermitian (conjugate) transpose of the L matrix.
Example:
$$A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 13 & 17 & 8 \\ 4 & 17 & 29 & 16 \\ 1 & 8 & 16 & 30 \end{bmatrix} = LL^T, \qquad L^H = U = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 3 & 2 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$
$$A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 13 & 17 & 8 \\ 4 & 17 & 29 & 16 \\ 1 & 8 & 16 & 30 \end{bmatrix} = U^HU = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 3 & 2 & 0 \\ 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 3 & 2 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$
By inspection it can be seen that $U^HU = LL^H$.
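This identity is also easy to check numerically; for instance, a minimal sketch with NumPy (whose np.linalg.cholesky returns the lower factor):

import numpy as np

A = np.array([[1, 2, 4, 1],
              [2, 13, 17, 8],
              [4, 17, 29, 16],
              [1, 8, 16, 30]])

L = np.linalg.cholesky(A)  # lower-triangular factor
U = L.T                    # for a real matrix, L^H is just the transpose

print(np.allclose(L @ L.T, A))  # True: A = L L^T
print(np.allclose(U.T @ U, A))  # True: A = U^H U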
Putting the formulas above into code, the factorization can be implemented in pure Python:

from math import sqrt

def cholesky(A):
    # Returns the lower-triangular Cholesky factor L of A, so that A = L L^T.
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i, (Ai, Li) in enumerate(zip(A, L)):
        for j, Lj in enumerate(L[:i+1]):
            s = sum(Li[k] * Lj[k] for k in range(j))
            # Diagonal entries use the square-root formula; off-diagonal
            # entries are scaled by the diagonal of the earlier row.
            Li[j] = sqrt(Ai[i] - s) if (i == j) else \
                (1.0 / Lj[j] * (Ai[j] - s))
    return L

A = [[6, 3, 4, 8], [3, 6, 5, 1], [4, 5, 10, 7], [8, 1, 7, 25]]
L = cholesky(A)
print("A:")
print(A, '\n\n')
print("L:")
print(L, '\n\n')
Output:

A:
[[6, 3, 4, 8], [3, 6, 5, 1], [4, 5, 10, 7], [8, 1, 7, 25]]

L:
[[2.449489742783178, 0.0, 0.0, 0.0],
 [1.2247448713915892, 2.1213203435596424, 0.0, 0.0],
 [1.6329931618554523, 1.414213562373095, 2.309401076758503, 0.0],
 [3.2659863237109046, -1.4142135623730956, 1.5877132402714704, 3.1324910215354165]]
In its entirety, explicitly coding the Cholesky factorization can be an arduous task, and a pure-Python implementation becomes slow as A grows large. Thankfully, the SciPy library has a built-in Cholesky decomposition function, scipy.linalg.cholesky, that performs the calculations with optimized LAPACK routines at a fraction of the computational cost. Below is the same example using SciPy's built-in function.
import numpy as np
import scipy.linalg  # SciPy Linear Algebra Library

A = np.array([[6, 3, 4, 8], [3, 6, 5, 1], [4, 5, 10, 7], [8, 1, 7, 25]])
L = scipy.linalg.cholesky(A, lower=True)   # lower-triangular factor
U = scipy.linalg.cholesky(A, lower=False)  # upper-triangular factor
print("A:")
print(A, '\n\n')
print("L:")
print(L, '\n\n')
print("U:")
print(U, '\n\n')
Output:

A:
[[ 6  3  4  8]
 [ 3  6  5  1]
 [ 4  5 10  7]
 [ 8  1  7 25]]

L:
[[ 2.44948974  0.          0.          0.        ]
 [ 1.22474487  2.12132034  0.          0.        ]
 [ 1.63299316  1.41421356  2.30940108  0.        ]
 [ 3.26598632 -1.41421356  1.58771324  3.13249102]]

U:
[[ 2.44948974  1.22474487  1.63299316  3.26598632]
 [ 0.          2.12132034  1.41421356 -1.41421356]
 [ 0.          0.          2.30940108  1.58771324]
 [ 0.          0.          0.          3.13249102]]
As you can see, both implementations of the Cholesky factorization give the same answer. However, in the example where we created the cholesky function ourselves, we only obtain the lower triangular matrix; the transposed factor can be computed with a little additional code, as sketched below.
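For instance, one possible way to obtain the upper factor from the list-of-lists L returned by the pure-Python cholesky function above:

# Transpose the lower factor row-by-row to obtain U = L^T.
U = [list(row) for row in zip(*L)]
print("U:")
print(U)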
Below is an equivalent implementation in MATLAB.

%% Cholesky Factorization
% -- User information (matrix and right-hand-side vector) -- %
B = [1 2 4 1; 2 13 17 8; 4 17 29 16; 1 8 16 30];
b = [2;2;2;2];
% B = [6 15 55; 15 55 225; 55 225 979];
[row, col] = size(B);
d = eig(B);   % eigenvalues of B
% -- Check that B is symmetric and positive definite -- %
if all(d > 0) && isequal(B, B')
    L = zeros(row, col);
    for i = 1:row
        for j = 1:col
            if j > i
                L(i,j) = 0;   % entries above the diagonal stay zero
            elseif j == i
                % Diagonal entry: l_ii = sqrt(a_ii - sum_{k<i} l_ik^2)
                num = 0;
                for k = 1:i-1
                    num = num + L(i,k)^2;
                end
                L(i,j) = sqrt(B(i,j) - num);
            else
                % Off-diagonal entry: l_ij = (a_ij - sum_{k<j} l_ik*l_jk)/l_jj
                num = 0;
                for k = 1:j-1
                    num = num + L(j,k)*L(i,k);
                end
                L(i,j) = (B(i,j) - num)/L(j,j);
            end
        end
    end
else
    disp('Cannot use Cholesky factorization: the matrix is not symmetric or not positive definite')
end
% -- Solve -- %
y = L\b;    % forward substitution
x = L'\y    % backward substitution
ANSWER
L = [1,0,0,0;2,3,0,0;4,3,2,0;1,2,3,4]
x = [5.97222222222222; 1.15972222222222; -1.68750000000000; 0.458333333333333]
In addition, MATLAB has a built-in Cholesky function, chol; below is the same example using it.
A = [1 2 4 1; 2 13 17 8; 4 17 29 16; 1 8 16 30];
b = [2;2;2;2];
% (or you can read in the data from a file and continue)
L = chol(A, 'lower');   % note: chol(A) alone returns the upper factor R with R'*R = A
y = L\b;                % forward substitution
x = L'\y                % backward substitution
The answer is the same as the example above.
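For comparison, the same system can also be solved in Python; here is a sketch using SciPy's cho_factor and cho_solve helpers, which package the factorization and the two substitution steps:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

B = np.array([[1, 2, 4, 1],
              [2, 13, 17, 8],
              [4, 17, 29, 16],
              [1, 8, 16, 30]], dtype=float)
b = np.array([2.0, 2.0, 2.0, 2.0])

c, low = cho_factor(B)      # Cholesky factorization of B
x = cho_solve((c, low), b)  # forward + backward substitution in one call
print(x)                    # approximately [5.9722, 1.1597, -1.6875, 0.4583]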
Two main uses of the Cholesky decomposition in engineering are solving least-squares problems, because the Cholesky factorization requires less computational effort than QR or LU, and random sampling from a multivariate Normal distribution. It can be used to decompose a covariance matrix because covariance matrices are symmetric and positive (semi-)definite, which is the prerequisite for using the Cholesky decomposition.
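As an illustration of the least-squares use, here is a minimal sketch (the data are made up for the example): form the normal equations and solve them with Cholesky-based helpers, since $M^TM$ is symmetric positive definite whenever the design matrix M has full column rank.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical data: fit a line y = c0 + c1*t in the least-squares sense.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
M = np.column_stack([np.ones_like(t), t])  # design matrix

# Normal equations: (M^T M) c = M^T y, solved via Cholesky factorization.
c = cho_solve(cho_factor(M.T @ M), M.T @ y)
print(c)  # least-squares coefficients [c0, c1]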
Here we will solve for x in a system of 3 equations.
Example 1: A system is defined by the equations below:
$$\begin{bmatrix} 4 & 2 & 14 \\ 2 & 17 & -5 \\ 14 & -5 & 83 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 14 \\ -101 \\ 155 \end{bmatrix}$$
Using Cholesky decomposition, we find
$$L = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 4 & 0 \\ 7 & -3 & 5 \end{bmatrix}$$
First, solving $Ly = b$ with forward substitution gives
$$y = \begin{bmatrix} 7 \\ -27 \\ 5 \end{bmatrix}$$
then solving $L^Tx = y$ with backward substitution finally gives a result of
$$x = \begin{bmatrix} 3 \\ -6 \\ 1 \end{bmatrix}$$

Example 2: We can also use the factorization to generate random samples from a multivariate Normal distribution, starting from independent standard Normal samples $X \sim N(0, I)$, via the affine transformation
$$Y = QX + \mu$$
Here Y are the random samples to be generated, X are the samples from the known univariate Normal distribution, and Q is defined as follows:
$$Q = \Phi \lambda^{1/2}$$
where $\lambda$ is a diagonal matrix containing the eigenvalues of the target covariance matrix $\Sigma$, and $\Phi$ contains the associated eigenvectors of $\Sigma$, arranged to match their eigenvalues in $\lambda$.
Note that Y has the distribution
$$Y \sim N(\mu, QQ')$$
We use the Cholesky decomposition because we want $\Sigma = QQ'$: taking $Q = L$, the Cholesky factor of $\Sigma$, satisfies this and is less computationally expensive than computing the eigendecomposition. Therefore, to simulate (or sample) $Y \sim N(\mu, \Sigma)$ we only need samples $X \sim N(0, I)$.
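A minimal sketch of this sampling scheme in Python (the covariance matrix Sigma and mean mu below are made-up illustrative values):

import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 2.0],   # hypothetical target covariance
                  [2.0, 3.0]])
mu = np.array([1.0, -1.0])      # hypothetical target mean

L = np.linalg.cholesky(Sigma)        # Sigma = L L^T
X = rng.standard_normal((10000, 2))  # rows are samples X ~ N(0, I)
Y = X @ L.T + mu                     # rows are samples Y ~ N(mu, Sigma)

print(np.cov(Y, rowvar=False))  # sample covariance, approximately Sigma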
Use the Cholesky decomposition to transform 10 normally distributed, independent (and thus uncorrelated) random variables with mean 0 and variance 1, i.e. N(0,1), into correlated variables, using MATLAB or Python libraries, with the following covariance matrix (please copy and paste this into MATLAB or Python):
$$K_{xx} = \begin{bmatrix}
1 & 2 & 3 & 5 & -1 & 1 & -2 & 5 & 8 & 1 \\
2 & 13 & 18 & 37 & 4 & 8 & 5 & 37 & 22 & 5 \\
3 & 18 & 50 & 46 & 20 & 41 & 26 & 56 & 37 & 17 \\
5 & 37 & 46 & 111 & 18 & 27 & 23 & 121 & 75 & 16 \\
-1 & 4 & 20 & 18 & 55 & 61 & 50 & 63 & 50 & 40 \\
1 & 8 & 41 & 27 & 61 & 91 & 88 & 90 & 96 & 60 \\
-2 & 5 & 26 & 23 & 50 & 88 & 143 & 104 & 116 & 86 \\
5 & 37 & 56 & 121 & 63 & 90 & 104 & 223 & 199 & 110 \\
8 & 22 & 37 & 75 & 50 & 96 & 116 & 199 & 285 & 175 \\
1 & 5 & 17 & 16 & 40 & 60 & 86 & 110 & 175 & 230
\end{bmatrix}$$
and the corresponding standard deviations
$$SD = \begin{bmatrix} 1 \\ 3.60555127546399 \\ 7.07106781186548 \\ 10.5356537528527 \\ 7.41619848709566 \\ 9.53939201416946 \\ 11.9582607431014 \\ 14.9331845230681 \\ 16.8819430161341 \\ 15.1657508881031 \end{bmatrix}$$
Please turn in the following:
A plot of the uncorrelated variables as well as the correlated variables (using subplot in MATLAB or pairplot from the seaborn library in Python will be helpful).
An explanation of why your plots show that applying the Cholesky decomposition correlates the random variables.
MATLAB CODE
% Solution
mu = 0; sigma = 1; sz = 10;
x_uncor = normrnd(mu, sigma, [1000, sz]);   % 1000 samples (rows) of the 10 variables
% Uncorrelated Plots
figure(1); count = 1;
for i = 1:10
for j = 1:10
subplot(10,10,count)
scatter(x_uncor(:,i),x_uncor(:,j))
count = count + 1;
end
end
L = chol(Kxx);          % upper-triangular factor R with R'*R = Kxx
x_cor = x_uncor * L;    % transform: rows now have covariance R'*R = Kxx
% Correlated Plots
figure(2); count = 1;
for i = 1:10
for j = 1:10
subplot(10,10,count)
scatter(x_cor(:,i),x_cor(:,j))
count = count + 1;
end
end
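For completeness, here is a sketch of a Python counterpart to the MATLAB solution above (the sample size of 1000 and the variable names are our choices; seaborn's pairplot on a pandas DataFrame of the columns can stand in for the subplot grid):

import numpy as np

rng = np.random.default_rng()

# The 10x10 covariance matrix Kxx given in the problem statement.
Kxx = np.array([
    [ 1,  2,  3,   5, -1,  1,  -2,   5,   8,   1],
    [ 2, 13, 18,  37,  4,  8,   5,  37,  22,   5],
    [ 3, 18, 50,  46, 20, 41,  26,  56,  37,  17],
    [ 5, 37, 46, 111, 18, 27,  23, 121,  75,  16],
    [-1,  4, 20,  18, 55, 61,  50,  63,  50,  40],
    [ 1,  8, 41,  27, 61, 91,  88,  90,  96,  60],
    [-2,  5, 26,  23, 50, 88, 143, 104, 116,  86],
    [ 5, 37, 56, 121, 63, 90, 104, 223, 199, 110],
    [ 8, 22, 37,  75, 50, 96, 116, 199, 285, 175],
    [ 1,  5, 17,  16, 40, 60,  86, 110, 175, 230],
], dtype=float)

x_uncor = rng.standard_normal((1000, 10))  # 1000 samples of 10 iid N(0,1)
L = np.linalg.cholesky(Kxx)                # Kxx = L L^T
x_cor = x_uncor @ L.T                      # rows now have covariance Kxx

print(np.cov(x_cor, rowvar=False).round(1))  # approximately Kxx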