Another Example of Principal Component Analysis

In [1]:
import numpy as np
In [2]:
X = np.array([[10,20,10],
              [2,5,2],
              [8,17,7],
              [9,20,10],
              [12,22,11]])
In [4]:
print(X)
[[10 20 10]
 [ 2  5  2]
 [ 8 17  7]
 [ 9 20 10]
 [12 22 11]]

Let's first try to derive the principal components directly, without using scikit-learn's decomposition module.

In [39]:
X = np.mat(X)
meanVals = X.mean(axis=0)
A = X - meanVals           # A is the zero-mean (centered) version of X
C = np.cov(A, rowvar=0)    # C is the covariance matrix of X
print(C)
[[14.2  25.3  13.5 ]
 [25.3  46.7  24.75]
 [13.5  24.75 13.5 ]]
In [40]:
# Note that C = (1/(N-1)) A.T*A
N = np.shape(X)[0]
print(np.dot(A.T,A)/(N-1))
[[14.2  25.3  13.5 ]
 [25.3  46.7  24.75]
 [13.5  24.75 13.5 ]]
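
As a quick sanity check (this cell is not part of the original notebook), np.allclose can confirm that np.cov and the explicit formula give the same matrix:

In [ ]:
# Confirm the two covariance computations agree (expect True)
print(np.allclose(C, np.dot(A.T, A)/(N-1)))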

Now we can obtain eigenvalues and eigenvectors of the covariance matrix:

In [41]:
eigen_values, eigen_vectors = np.linalg.eig(C)
print("Eigen Values:", eigen_values, "\n")
print("Eigen Vectors:\n", eigen_vectors)
Eigen Values: [73.718  0.384  0.298] 

Eigen Vectors:
 [[ 0.434  0.9   -0.044]
 [ 0.795 -0.406 -0.451]
 [ 0.424 -0.161  0.891]]
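
Each eigenvalue measures the variance captured along its eigenvector, so dividing by their sum shows how dominant the first component is. The cell below is an added sketch using the eigen_values computed above:

In [ ]:
# Fraction of the total variance carried by each principal component
print(eigen_values / eigen_values.sum())    # roughly [0.991 0.005 0.004] for this data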

Now we can transform the full data into the new feature space defined by the eigenvectors:

In [42]:
newFeatures = eigen_vectors.T
XTrans = np.dot(newFeatures, A.T)
print(XTrans.T)
[[  4.173   0.      0.259]
 [-14.615   0.172   0.252]
 [ -0.352  -0.1    -0.973]
 [  3.739  -0.9     0.303]
 [  7.055   0.828   0.159]]

Typically, however, we want a lower-dimensional space. We can sort the eigenvectors in decreasing order of their eigenvalues and keep the top k. In the example below, we keep only the first principal component; it already corresponds to the largest eigenvalue, so no sorting is necessary (a more general top-k selection is sketched after the next cell):

In [43]:
reducedFeatures = eigen_vectors[:,0].T    # first column: the eigenvector with the largest eigenvalue
reducedXTrans = np.dot(reducedFeatures, A.T)
print(reducedXTrans.T)
[[  4.173]
 [-14.615]
 [ -0.352]
 [  3.739]
 [  7.055]]
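
Here the eigenvalues happened to come out already in decreasing order, but in general they do not (the second example below is a case in point). The following cell is an added sketch of a generic top-k selection using np.argsort; the variable k is illustrative, not part of the original notebook:

In [ ]:
# General top-k projection: sort eigenvalues in decreasing order,
# keep the k matching eigenvectors, then project the centered data.
k = 1                                        # illustrative choice
order = np.argsort(eigen_values)[::-1]       # eigenvalue indices, largest first
topVectors = eigen_vectors[:, order[:k]]     # columns are the top-k eigenvectors
print(np.dot(np.asarray(A), topVectors))     # projected data, shape (N, k)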

We can also use scikit-learn's decomposition module to do the same thing:

In [44]:
from sklearn import decomposition
In [52]:
pca = decomposition.PCA(svd_solver='randomized')
XTrans = pca.fit_transform(X)
In [55]:
np.set_printoptions(precision=3, suppress=True)

print(XTrans)
[[-4.173 -0.    -0.259]
 [14.615 -0.172 -0.252]
 [ 0.352  0.1    0.973]
 [-3.739  0.9   -0.303]
 [-7.055 -0.828 -0.159]]
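
Note that scikit-learn's projection is the negation of the one we computed by hand. Eigenvectors are only defined up to sign, so both results describe the same principal subspace. The added check below should print True, since each column differs only by a sign flip:

In [ ]:
# Compare the manual projection with scikit-learn's output up to sign
manual = np.dot(eigen_vectors.T, A.T).T      # same projection as in the cells above
print(np.allclose(np.abs(np.asarray(manual)), np.abs(XTrans)))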

The remaining part of this notebook is another example of using PCA for dimensionality reduction. This is the example used in the lecture slides.

In [56]:
M = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7],
              [2.0, 1.6],
              [1.0, 1.1],
              [1.5, 1.6],
              [1.1, 0.9]])
In [58]:
meanM = M.mean(axis=0)
MC = M - meanM                 # MC is the zero-mean (centered) version of M
CovM = np.cov(MC, rowvar=0)    # CovM is the covariance matrix of M
print("Zero Mean Matrix:\n", MC,"\n")
print("Covariance Matrix:\n", CovM,"\n")
Zero Mean Matrix:
 [[ 0.69  0.49]
 [-1.31 -1.21]
 [ 0.39  0.99]
 [ 0.09  0.29]
 [ 1.29  1.09]
 [ 0.49  0.79]
 [ 0.19 -0.31]
 [-0.81 -0.81]
 [-0.31 -0.31]
 [-0.71 -1.01]] 

Covariance Matrix:
 [[0.617 0.615]
 [0.615 0.717]] 

In [59]:
eigVals, eigVecs = np.linalg.eig(CovM)
print("Eigenvalues:\n", eigVals,"\n")
print("Eigenvectors:\n", eigVecs,"\n")
Eigenvalues:
 [0.049 1.284] 

Eigenvectors:
 [[-0.735 -0.678]
 [ 0.678 -0.735]] 

In [60]:
newFeatures = eigVecs[:,1].T    # second column: the eigenvector with the larger eigenvalue (1.284)
print(newFeatures)
[-0.678 -0.735]
In [61]:
MTrans = np.dot(newFeatures, MC.T)
print(np.mat(MTrans).T)
[[-0.828]
 [ 1.778]
 [-0.992]
 [-0.274]
 [-1.676]
 [-0.913]
 [ 0.099]
 [ 1.145]
 [ 0.438]
 [ 1.224]]
In [63]:
# Instead, we can use scikit-learn's decomposition.PCA and specify the number of components

pca2 = decomposition.PCA(n_components=1)
MTrans2 = pca2.fit_transform(M)
print(MTrans2)
[[-0.828]
 [ 1.778]
 [-0.992]
 [-0.274]
 [-1.676]
 [-0.913]
 [ 0.099]
 [ 1.145]
 [ 0.438]
 [ 1.224]]
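
scikit-learn also reports how much of the total variance the retained component explains through the fitted object's explained_variance_ratio_ attribute. The cell below is an added sketch:

In [ ]:
# Fraction of the total variance explained by the single retained component
print(pca2.explained_variance_ratio_)    # roughly [0.963] for this data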