Author: Mat Haskell - mhaskell9@gmail.com
Eigenvalues and eigenvectors are very cool and show up all over the place. They exist generally for linear operators, but this notebook will just look at matrices. The basic idea is that certain vectors exist where, after the matrix operates on them, the direction of the vector has not changed. The magnitude can change, but not the direction. These special vectors are eigenvectors of the operator. This makes sense when thinking about a rotation matrix. If you try to rotate a vector that points purely in the x-direction about the x-axis, it does not change. This means that the axis of rotation must be an eigenvector of the rotation matrix, since the rotation cannot change the direction of a vector pointing along that axis.
Another way to think about eigenvalues is that there are special cases where matrix multiplication turns into scalar multiplication. This seems strange, since the output of matrix multiplication is not really intuitive just by looking at the equation. Matrices are linear operators that can be used to accomplish various tasks. However, when multiplying a matrix by one of its eigenvectors, that vector will just be scaled by the value of the corresponding eigenvalue. The eigenvector can grow, shrink, or flip direction (i.e. eigenvalues can be negative). Eigenvalues can even be zero, in which case the eigenvector multiplied by the matrix will be scaled down to the zero vector.
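As a quick numerical illustration of that last point (an eigenvector of a zero eigenvalue gets mapped to the zero vector), here is a minimal NumPy sketch; the singular 2x2 matrix below is my own arbitrary example, not one used later in the notebook:
import numpy as np

# an arbitrary singular matrix: the second column is twice the first,
# so one eigenvalue must be zero
M = np.array([[1., 2.],
              [2., 4.]])
vals, vecs = np.linalg.eig(M)
print(vals)                     # one eigenvalue is (numerically) zero

# the eigenvector paired with the zero eigenvalue gets scaled down to the zero vector
i = np.argmin(np.abs(vals))
print(M @ vecs[:, i])           # approximately [0, 0]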
The physical meaning of eigenvalues and eigenvectors is also cool. For special symmetric matrices, like inertia tensors, stress tensors, and strain tensors, the eigenvectors are the principal axes of the matrix. So if you were to rotate the original matrix axes to line up with the eigenvectors (or principal axes), all of the cross terms would go away, leaving only numbers along the main diagonal. Also, the eigenvalues are the principal stresses/strains of a stress/strain tensor. Another physical meaning comes with differential and difference equations. Eigenvalues are the poles of the system and indicate stability: for differential equations the real parts of the eigenvalues must be negative, and for difference equations their magnitudes must be less than 1. The last example I will include here is with Google. I don't know their algorithm, but I do know that Google's search ranking algorithm forms a giant matrix of info regarding links between websites, and they use eigenvalues and eigenvectors to determine which websites are most closely related to your search. So eigenvalues are the magic that Google uses to make sure you don't have to look past the first few links to find what you are looking for!
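To make the principal-axes claim concrete, here is a minimal sketch with a made-up symmetric matrix standing in for a stress or inertia tensor (the numbers are arbitrary and chosen only for illustration). Rotating the matrix into the frame of its eigenvectors removes the cross terms and leaves the eigenvalues on the diagonal:
import numpy as np

# an arbitrary symmetric matrix standing in for a stress/inertia tensor
T = np.array([[5., 1., 0.],
              [1., 4., 2.],
              [0., 2., 3.]])

# eigh is the symmetric/Hermitian eigensolver; the columns of Q are the principal axes
vals, Q = np.linalg.eigh(T)

# expressing T in the principal-axis frame: Q^T T Q is diagonal,
# with the eigenvalues (principal values) along the main diagonal
T_principal = Q.T @ T @ Q
print(np.round(T_principal, 10))
print(vals)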
Eigenvalues only exist for square matrices: $A \in \mathbb{C}^{n \times n}$.
Let $x \in \mathbb{C}^{n \times 1}$ be an eigenvector of $A$ and $\lambda \in \mathbb{C}$ be the corresponding eigenvalue of $A$; then the idea that matrix multiplication is turned into scalar multiplication yields the equation: $Ax = \lambda x$
Here are the formal definitions for eigen pairs:
Def: $(\lambda, x)$ is a right eigen pair if $Ax = \lambda x$.
Def: $(\lambda, x)$ is a left eigen pair if $x^H A = \lambda x^H$.
Most of the time we deal with right eigen pairs. So here is how to actually solve for the eigenvalues. Before any computational work, we need to rearrange the equation $Ax = \lambda x$ into a useful form: $Ax - \lambda x = 0 \implies (A - \lambda I)x = 0$
Note: some textbooks write it as $(\lambda I - A)x = 0$, but the two forms are equivalent.
We can see that $x \in \mathcal{N}(A - \lambda I)$. Since there is a non-trivial null space, we know that $(A - \lambda I)$ is not full rank, which implies that $\det(A - \lambda I) = 0$. We will use this fact to solve for the eigenvalues of $A$.
Here are the computational steps for finding the eigenvalues of A:
1. Form the characteristic polynomial $\chi_A(\lambda) = \det(A - \lambda I)$.
2. Set $\chi_A(\lambda) = 0$ and find its roots. The roots are the eigenvalues of $A$.
Note: In practice, you will most likely never write an algorithm that finds eigenvalues. There are existing software tools for this (e.g. the eig() function in Matlab or np.linalg.eig() in Python). I would strongly recommend using those tools and not going through any of this process by hand! If you really wanted to find eigenvalues on paper, you could try polynomial division to get the roots. This can get tricky pretty fast as the number of roots increases (there is a lot of guessing and checking). Also, writing your own algorithm to find eigenvalues is more difficult than it might appear, and you would need to do more research on your own to learn how. This is because numerical root finding algorithms typically only find real roots. Writing an algorithm to find complex roots is tricky and will not be discussed here. In fact, I believe that complex root finding tools actually use eigenvalue solvers to get the roots. Anyway, the Gershgorin Circle Theorem and Cauchy-Riemann Equations would be a good place to start your research.
Note: we can write $\chi_A$ with all of the eigenvalues factored out (which would be the result of polynomial division):

$\chi_A(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n) = \prod_{i=1}^{n} (\lambda - \lambda_i)$

Def: The algebraic multiplicity, $m_i$, of $\lambda_i$ is the number of times it is repeated. So $\chi_A$ can be written like this:

$\chi_A(\lambda) = (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_p)^{m_p}, \quad p \le n$
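If you want to see the connection between the characteristic polynomial and the eigenvalues numerically, here is a small sanity-check sketch (the 2x2 matrix is an arbitrary example of mine). It uses np.poly, which returns the coefficients of the characteristic polynomial of a matrix, and np.roots; this is only a demonstration, not a recommended way to compute eigenvalues:
import numpy as np

# an arbitrary 2x2 matrix, used only for this sanity check
A = np.array([[6., 2.],
              [2., 3.]])

# np.poly returns the coefficients of the characteristic polynomial det(lambda*I - A)
coeffs = np.poly(A)
print(coeffs)                           # [ 1. -9. 14.]  ->  lambda^2 - 9*lambda + 14

# the roots of the characteristic polynomial are the eigenvalues
# (np.roots itself forms a companion matrix and hands it to an eigenvalue solver)
print(np.sort(np.roots(coeffs)))        # [2. 7.]
print(np.sort(np.linalg.eigvals(A)))    # [2. 7.]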
After finding the roots/eigenvalues, we can solve for the eigenvectors by plugging each eigenvalue into the equation $(A - \lambda_i I)x = 0$. You are just solving for the null space of $(A - \lambda_i I)$.
Note: any linear combination of vectors from $\mathcal{N}(A - \lambda I)$ is also an eigenvector. Because any scalar multiple of an eigenvector is also an eigenvector, most software libraries will return eigenvectors scaled to a norm of 1.
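Here is a minimal sketch of that scaling and normalization behavior, reusing the 2x2 matrix from the worked example just below:
import numpy as np

A = np.array([[4., -2.],
              [5., -7.]])

vals, vecs = np.linalg.eig(A)
v = vecs[:, 0]                     # numpy returns eigenvectors scaled to unit norm
print(np.linalg.norm(v))           # 1.0

# any nonzero scalar multiple of an eigenvector is still an eigenvector
for c in [2.0, -7.5, 0.001]:
    x = c * v
    print(np.allclose(A @ x, vals[0] * x))   # True for every c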
Find the eigenvalues and eigenvectors of $A$ by following the steps above.

$A = \begin{pmatrix} 4 & -2 \\ 5 & -7 \end{pmatrix}$

The characteristic polynomial is $\chi_A(\lambda) = \det(A - \lambda I) = (4 - \lambda)(-7 - \lambda) - (-2)(5) = \lambda^2 + 3\lambda - 18 = (\lambda - 3)(\lambda + 6)$, so the roots are $\lambda = 3$ and $\lambda = -6$.
The order of assigning $\lambda_1$ and $\lambda_2$ doesn't really matter. I will use these: $\lambda_1 = 3$, $\lambda_2 = -6$.
Neither of these is repeated, so the algebraic multiplicity of each is just 1: $m_1 = m_2 = 1$.
Let's continue with the example and find the eigenvectors of A.
For $\lambda_1 = 3$: notice that both rows of $(A - \lambda_1 I)$, when multiplied by $x$, give the same equation: $x_1 - 2x_2 = 0 \Rightarrow x_1 = 2x_2$. Choosing $x_2 = 1$ gives the eigenvector $v_1 = (2, 1)^T$.
For $\lambda_2 = -6$: again both rows of $(A - \lambda_2 I)$ give the same equation: $5x_1 - x_2 = 0 \Rightarrow x_2 = 5x_1$. Choosing $x_1 = 1$ gives the eigenvector $v_2 = (1, 5)^T$.
The eigenvalue decomposition of $A$ is: $A = S \Lambda S^{-1}$, where $S = (v_1\ v_2\ \cdots\ v_n)$ has the eigenvectors as its columns and

$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$
Note: it does not matter in which order the eigenvalues are placed to form $\Lambda$. However, for $A = S \Lambda S^{-1}$ to be true, the column position of $\lambda_i$ in $\Lambda$ must match the column position of its eigenvector $v_i$ in $S$.
Def: The geometric multiplicity, $q_i$, of $\lambda_i$ is the number of linearly independent eigenvectors that can be formed from $\lambda_i$. Also, $1 \le q_i \le m_i$. This means at least 1 eigenvector can be formed from every eigenvalue, but at most $m_i$. For example: if an eigenvalue is repeated 3 times, there will be at least 1 and at most 3 corresponding linearly independent eigenvectors.
The geometric multiplicity of an eigenvalue is a very important concept for eigenvalue decomposition. In order to perform the eigenvalue decomposition as described above, where $A = S \Lambda S^{-1}$, the geometric multiplicity must equal the algebraic multiplicity for each eigenvalue, i.e.

$m_i = q_i \quad \forall \lambda_i$

This means that however many times an eigenvalue is repeated, there must be that same number of linearly independent eigenvectors derived from that eigenvalue. Otherwise, there aren't enough eigenvectors to fill the columns of $S$.
When $m_i \ne q_i$, there is a process to find what are called generalized eigenvectors to fill in the missing columns of $S$. When this happens, $\Lambda$ is replaced with $J$, which is not strictly diagonal. This is called the Jordan form and is a topic for another day.
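Here is a minimal sketch of a defective case (this 2x2 Jordan-block matrix is my own illustration, not one of the notebook's worked examples). The eigenvalue 2 has algebraic multiplicity $m = 2$, but only one linearly independent eigenvector exists ($q = 1$), so the eigenvector matrix returned by eig() is singular and cannot serve as an invertible $S$:
import numpy as np

# a 2x2 Jordan block: eigenvalue 2 with algebraic multiplicity 2
J = np.array([[2., 1.],
              [0., 2.]])

vals, vecs = np.linalg.eig(J)
print(vals)                            # [2. 2.] -> the eigenvalue 2 has m = 2

# the two returned eigenvector columns are (numerically) parallel,
# so the geometric multiplicity is only 1 and S would not be invertible
print(np.linalg.matrix_rank(vecs))     # 1 -> q = 1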
import numpy as np
class Test:
    # tiny helper that prints a numbered pass/fail message for each check below
    def __init__(self):
        self.i = 1

    def runTest(self,check,statement):
        print('Test %d: ' % self.i)
        self.i += 1
        if check:
            print(statement + ' \n')
        else:
            print('error \n')

def printEigen(vals,vecs):
    # print three eigenvalues and their corresponding eigenvector columns
    print('Eigenvalues are {:.2f}, {:.2f}, and {:.2f}. \n'.format(vals[0],vals[1],vals[2]))
    print('The eigenvector corresponding to {:.2f} is:'.format(vals[0]))
    print(vecs[:,0][:,None])
    print('The eigenvector corresponding to {:.2f} is:'.format(vals[1]))
    print(vecs[:,1][:,None])
    print('The eigenvector corresponding to {:.2f} is:'.format(vals[2]))
    print(vecs[:,2][:,None])
    print('') # add a blank line at the end
# this is the matrix we used above
A = np.array([[4,-2],
[5,-7]])
# these are the eigenvalues we found
l1 = 3
l2 = -6
# these are the eigenvectors we found
v1 = np.array([[2],
[1]])
v2 = np.array([[1],
[5]])
# check that Ax-lx=0
# note that numerical precision might limit this to being very close to true
tests = Test()
test1 = np.array_equal(A@v1, l1*v1)
tests.runTest(test1,'v1 and l1 are a right eigen pair!')
test2 = np.array_equal(A@v2, l2*v2)
tests.runTest(test2,'v2 and l2 are a right eigen pair!')
# eigenvalue decomposition
L = np.diag([l1, l2])
S = np.concatenate((v1,v2), axis=1)
S_inv = np.linalg.inv(S)
# test that it works
test3 = np.array_equal(A, S@L@S_inv)
tests.runTest(test3,'A = S*L*S_inv!')
# now let's test that the order doesn't matter
L = np.diag([l2, l1])
S = np.concatenate((v2,v1), axis=1)
S_inv = np.linalg.inv(S)
test4 = np.array_equal(A, S@L@S_inv)
tests.runTest(test4,'It still works when the order is switched!')
# test that trace equals sum of eigenvalues
test5 = np.trace(A)==l1+l2
tests.runTest(test5,'The trace equals sum of eigenvalues!')
# test that determinant equals product of eigenvalues
test6 = np.linalg.det(A)==l1*l2
tests.runTest(test6,'The determinant equals product of eigenvalues!')
# this runs into numerical precision errors, try again testing if error is small
det_error = np.linalg.det(A)-l1*l2
test7 = np.abs(det_error) < 1e-10
tests.runTest(test7,'The determinant equals product of eigenvalues!')
Test 1: v1 and l1 are a right eigen pair!
Test 2: v2 and l2 are a right eigen pair!
Test 3: A = S*L*S_inv!
Test 4: It still works when the order is switched!
Test 5: The trace equals sum of eigenvalues!
Test 6: error
Test 7: The determinant equals product of eigenvalues!
A = np.array([[2,0,0],
[0,1,-2],
[0,1,3]])
vals,vecs = np.linalg.eig(A)
printEigen(vals,vecs)
# check if Ax-lx=0
# due to numerical precision error, I will check if norm(Ax-lx) < 1e-10
tests = Test()
test1 = np.linalg.norm(A@vecs[:,0] - vals[0]*vecs[:,0]) < 1e-10
tests.runTest(test1,'A*x1 = l1*x1!')
test2 = np.linalg.norm(A@vecs[:,1] - vals[1]*vecs[:,1]) < 1e-10
tests.runTest(test2,'A*x2 = l2*x2!')
test3 = np.linalg.norm(A@vecs[:,2] - vals[2]*vecs[:,2]) < 1e-10
tests.runTest(test3,'A*x3 = l3*x3!')
# Eigenvalue decomposition
S = vecs
Lambda = np.diag([vals[0],vals[1],vals[2]])
S_inv = np.linalg.inv(S)
# test that A = S*Lambda*S_inv
test4 = np.linalg.norm(A - S@Lambda@S_inv) < 1e-10
tests.runTest(test4,'Eigenvalue decomposition worked!')
# test if eigenvalues came in complex conjugate pairs
test5 = vals[0].conj()==vals[1]
tests.runTest(test5,'Eigenvalues came in complex conjugate pairs!')
# test if eigenvectors came in complex conjugate pairs
test6 = np.array_equal(vecs[:,0].conj(), vecs[:,1])
tests.runTest(test6,'Eigenvectors came in complex conjugate pairs!')
Eigenvalues are 2.00+1.00j, 2.00-1.00j, and 2.00+0.00j.

The eigenvector corresponding to 2.00+1.00j is:
[[ 0.        +0.j        ]
 [ 0.81649658+0.j        ]
 [-0.40824829-0.40824829j]]
The eigenvector corresponding to 2.00-1.00j is:
[[ 0.        -0.j        ]
 [ 0.81649658-0.j        ]
 [-0.40824829+0.40824829j]]
The eigenvector corresponding to 2.00+0.00j is:
[[1.+0.j]
 [0.+0.j]
 [0.+0.j]]

Test 1: A*x1 = l1*x1!
Test 2: A*x2 = l2*x2!
Test 3: A*x3 = l3*x3!
Test 4: Eigenvalue decomposition worked!
Test 5: Eigenvalues came in complex conjugate pairs!
Test 6: Eigenvectors came in complex conjugate pairs!
B = np.array([[1,3,4],
[1,3,4],
[1,3,4]])
vals,vecs = np.linalg.eig(B)
# test that rank goes down by 1 for each 0 eigenvalue
num_0_vals = 0
for x in vals:
    # exact comparison with 0 happens to work for this matrix,
    # but a tolerance (e.g. np.isclose) is safer in general
    if x==0:
        num_0_vals += 1
m,n = B.shape
max_rank = min(m,n)
expected_rank = max_rank - num_0_vals
tests = Test()
test1 = expected_rank==np.linalg.matrix_rank(B)
tests.runTest(test1,'The rank went down by 1 for each 0 eigenvalue!')
# show that repeated eigenvalues don't always produce lin. ind. eigenvectors
test2 = np.array_equal(vecs[:,0], -vecs[:,2]) and vals[0]==vals[2]
tests.runTest(test2,'Repeated eigenvalues did not produce linearly independent eigenvectors')
Test 1: The rank went down by 1 for each 0 eigenvalue!
Test 2: Repeated eigenvalues did not produce linearly independent eigenvectors
It turns out that the eigenvalues of a 3-dimensional rotation matrix will always be $1$, $e^{j\theta}$, and $e^{-j\theta}$. Also, the eigenvector associated with $\lambda = 1$ is the axis of rotation. You could also find these values through other means, like quaternions or the axis-angle parameterization. But it is cool that the eigenvalues and eigenvectors have real physical meaning for rotation matrices!
Note: $e^{j\theta} = \cos(\theta) + j\sin(\theta)$, so you can find the magnitude of $\theta$ with $\arccos(\mathrm{real}(\lambda))$. There are limitations of acos() and asin() that should be considered as well. I will use positive angles below 90 degrees to avoid issues here.
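As a hedged aside on those acos()/asin() limitations: since the eigenvalue $e^{j\theta}$ is a complex number, np.angle() recovers $\theta$ along with its sign directly from the eigenvalue, which sidesteps the arccos range issue. A tiny sketch:
import numpy as np

theta = np.pi / 4
lam = np.exp(1j * theta)          # a stand-in for the complex eigenvalue e^(j*theta)

print(np.arccos(np.real(lam)))    # 0.785..., but arccos only returns values in [0, pi]
print(np.angle(lam))              # 0.785..., and it would also recover a negative theta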
def rotx(angle):
    # rotation matrix about the x-axis
    R = np.array([[1,0,0],
                  [0,np.cos(angle),-np.sin(angle)],
                  [0,np.sin(angle),np.cos(angle)]])
    return R

def roty(angle):
    # rotation matrix about the y-axis
    R = np.array([[np.cos(angle),0,np.sin(angle)],
                  [0,1,0],
                  [-np.sin(angle),0,np.cos(angle)]])
    return R

def rotz(angle):
    # rotation matrix about the z-axis
    R = np.array([[np.cos(angle),-np.sin(angle),0],
                  [np.sin(angle),np.cos(angle),0],
                  [0,0,1]])
    return R
tests = Test()
# test simple rotation about x-axis
theta_x = np.pi/4
Rx = rotx(theta_x)
x_axis = np.array([[1],[0],[0]])
vals,vecs = np.linalg.eig(Rx)
# check to see if axis of rotation from eigenvector is the x-axis
printEigen(vals,vecs)
index = np.where(vals==1)[0]
axis_of_rotation = vecs[:,index[0]][:,None]
test1 = np.array_equal(x_axis,axis_of_rotation)
tests.runTest(test1,'The eigenvector associated with lambda=1 is the axis of rotation!')
# check that the amount of rotation from the eigenvalue is correct
# (pick the eigenvalue with positive imaginary part and allow for floating point error)
angle = np.arccos(np.real(vals[np.argmax(np.imag(vals))]))
test2 = np.isclose(angle, theta_x)
tests.runTest(test2,'The eigenvalue gave the correct amount of rotation!')
# Now test a case where there is rotation about all 3 axes
theta_y = np.pi/6
theta_z = np.pi/5
Ry = roty(theta_y)
Rz = rotz(theta_z)
R = Rx@Ry@Rz
vals,vecs = np.linalg.eig(R)
# get the axis-angle parameterization
expected_angle = np.arccos(0.5*(np.trace(R)-1))
expected_axis = np.array([[R[1,2]-R[2,1]],
[R[2,0]-R[0,2]],
[R[0,1]-R[1,0]]])/(2*np.sin(expected_angle))
# check to see if the axis of rotation from the eigenvector is correct
printEigen(vals,vecs)
# the eigenvector paired with the eigenvalue closest to 1 is the axis of rotation
index = np.argmin(np.abs(vals - 1))
eigen_axis = np.real(vecs[:,index])[:,None]
# eigenvectors are only defined up to scale/sign, so accept either +axis or -axis
test3 = np.allclose(expected_axis, eigen_axis) or np.allclose(expected_axis, -eigen_axis)
tests.runTest(test3,'The eigenvector associated with lambda=1 is the axis of rotation!')
# check that the amount of rotation from the eigenvalue is correct
# (pick the eigenvalue with positive imaginary part and allow for floating point error)
eigen_angle = np.arccos(np.real(vals[np.argmax(np.imag(vals))]))
test4 = np.isclose(expected_angle, eigen_angle)
tests.runTest(test4,'The eigenvalue gave the correct amount of rotation!')
by Curtis Johnson
Power iteration is a useful algorithm used for optimal control, path planning, and notably early versions of Google's PageRank algorithm.
The useful recursive relationship used in this algorithm is given by $b_{k+1} = \dfrac{A b_k}{\| A b_k \|}$.
$b_k$ will converge to a multiple of the eigenvector corresponding to the eigenvalue of largest magnitude.
Let $A$ be a diagonalizable $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_n$ such that $\lambda_1$ is the dominant eigenvalue (i.e. $|\lambda_1| > |\lambda_i|\ \forall i > 1$). The corresponding eigenvectors are $x_1, x_2, \cdots, x_n$.
Also let the initial vector $b_0 \in \mathrm{span}(x_1, \cdots, x_n)$ be written as $b_0 = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$, where $c_1 \ne 0$.
$\implies A^k b_0 = c_1 A^k x_1 + c_2 A^k x_2 + \cdots + c_n A^k x_n = c_1 \lambda_1^k x_1 + c_2 \lambda_2^k x_2 + \cdots + c_n \lambda_n^k x_n = c_1 \lambda_1^k \left[ x_1 + \frac{c_2}{c_1}\left(\frac{\lambda_2}{\lambda_1}\right)^k x_2 + \cdots + \frac{c_n}{c_1}\left(\frac{\lambda_n}{\lambda_1}\right)^k x_n \right] \longrightarrow c_1 \lambda_1^k x_1 = \alpha x_1 \quad \text{as } k \to \infty$

import numpy as np
import matplotlib.pyplot as plt
A = np.array([[1,3],[7,2]])
b = np.array([-.75,0.5]) #starting choice of this vector is totally arbitrary
eigvals, eigvecs = np.linalg.eig(A)
print("Eigenvalues\n", eigvals)
print("Eigenvectors\n", eigvecs)
largest_eigval_ind = np.argmax(np.abs(eigvals))
largest_eigvec = eigvecs[:,largest_eigval_ind]
smallest_eigval_ind = np.argmin(np.abs(eigvals))
smallest_eigvec = eigvecs[:,smallest_eigval_ind]
plt.figure()
plt.xlim(-1,1)
plt.ylim(-1,1)
for i in range(0,10):
    plt.clf()
    plt.plot([0,smallest_eigvec[0]], [0, smallest_eigvec[1]], linewidth=4)
    plt.plot([0,largest_eigvec[0]], [0, largest_eigvec[1]], linewidth=4)
    plt.plot([0,b[0]], [0, b[1]])
    plt.legend(['Smallest Eigenvector','Largest Eigenvector', 'Approximation'])
    plt.xlim(-1,1)
    plt.ylim(-1,1)
    # one power iteration step: multiply by A, then renormalize
    b = A@b
    b = b/np.linalg.norm(b)
    plt.pause(1.0)
Eigenvalues
 [-3.10977223  6.10977223]
Eigenvectors
 [[-0.58959436 -0.50629921]
 [ 0.80769951 -0.86235788]]
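To complement the plotting demo above, here is a minimal non-plotting sketch of power iteration that also estimates the dominant eigenvalue with the Rayleigh quotient. The function name, iteration count, and random starting vector are my own choices, not part of the original notebook:
import numpy as np

def power_iteration(A, num_iters=100):
    # hypothetical helper, not part of the original notebook
    b = np.random.rand(A.shape[0])          # arbitrary nonzero starting vector
    for _ in range(num_iters):
        b = A @ b
        b = b / np.linalg.norm(b)           # normalize so b doesn't blow up or vanish
    # the Rayleigh quotient gives the eigenvalue estimate for the converged direction
    lam = b @ A @ b / (b @ b)
    return lam, b

A = np.array([[1., 3.],
              [7., 2.]])
lam, b = power_iteration(A)
print(lam)                                   # approx 6.1098, the dominant eigenvalue
print(b)                                     # approx +/- the dominant eigenvector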
by Curtis Johnson
Let the diagram below represent the entire internet, where each circle represents a webpage. Outgoing arrows represent links to external websites, and incoming arrows represent other websites that link to that webpage.
Each website's relative importance can be written as a function of how many websites link to it (e.g. if website 1 is linked to by website 2 and website 3, the relative importance of website 1 (x1) can be written as x1=x2+x3).
You want to rank all of these websites by their relative importance in order to place ads where they are most likely to be seen (i.e. place ads in the most important websites).
a) Find the probability matrix $P$ (also known as a left stochastic matrix) that describes the probability of moving from each webpage to any other webpage. NOTE: Because this matrix contains probabilities, make sure that each column sums to 1. Row sums do not need to be 1.
b) Show that $\lambda = 1$ is an eigenvalue of any $4 \times 4$ left stochastic matrix $A$ of the same form as $P$ (i.e. each column sums to 1).
c) Find the probability vector $x = [x_1, x_2, x_3, x_4]^T$ that ranks each of the webpages in the internet by their relative importance. HINT: This is an eigenvector problem.
Cool facts: This algorithm was first applied like this by Larry Page, one of the founders of Google. It formed the foundation for the PageRank system that set Google apart from its competitors. As of writing, Google's market value is about $1,000,000,000,000.
This problem can also be solved iteratively using the Power Iteration method explained above. This helps when the $P$ matrix is prohibitively large (as it is for the actual internet). Another helpful fact is that the $P$ matrix is sparse, because webpages on the internet are sparsely linked to each other. There are specialized algorithms that deal with sparse matrices very efficiently, which helps with computation time.
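Since the diagram itself is not reproduced here, the sketch below uses a hypothetical 4-page link structure (the matrix P_example is an assumption of mine, not the matrix from the missing figure). It shows the two solution routes: pulling the eigenvector for $\lambda = 1$ out of np.linalg.eig, and running the power iteration from the previous section:
import numpy as np

# hypothetical left stochastic matrix for a made-up 4-page internet
# (column j holds the probabilities of leaving page j for each other page;
# every column sums to 1 -- this is NOT the matrix from the missing diagram)
P_example = np.array([[0. , 0.5, 0.5, 0. ],
                      [1/3, 0. , 0. , 0.5],
                      [1/3, 0. , 0. , 0.5],
                      [1/3, 0.5, 0.5, 0. ]])

# route 1: lambda = 1 is an eigenvalue of any left stochastic matrix;
# its eigenvector, rescaled to sum to 1, ranks the pages
vals, vecs = np.linalg.eig(P_example)
i = np.argmin(np.abs(vals - 1))
x = np.real(vecs[:, i])
x = x / x.sum()
print(np.round(x, 4))

# route 2: power iteration, as in the previous section, converges to the same ranking
b = np.ones(4) / 4
for _ in range(200):
    b = P_example @ b
    b = b / np.linalg.norm(b, 1)             # keep b a probability vector
print(np.round(b, 4))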