In [1]:

# %load ../../preconfig.py
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
plt.rcParams['axes.grid'] = False

#import numpy as np
#import pandas as pd

#import sklearn

#import itertools

import logging
logger = logging.getLogger()

def show_image(filename, figsize=None, res_dir=True):
    if figsize:
        plt.figure(figsize=figsize)

    if res_dir:
        filename = './res/{}'.format(filename)

    plt.imshow(plt.imread(filename))

11 Dimensionality Reduction¶

11.1 Eigenvalues and Eigenvectors of Symmetric Matrices¶

11.1.1 Definitions¶

$M e = \lambda e$, where $\lambda$ is an eigenvalue and $e$ is the corresponding eigenvector.

to make the eigenvector unique, we require that:

every eigenvector is a unit vector.
the first nonzero component of an eigenvector is positive.

11.1.2 Computing Eigenvalues and Eigenvectors¶

To find principal eigenvector, we use power iteration method:
start with any unit vector $v$, and compute $M^i v$ iteratively until it converges. When $M$ is a stochastic matrix, the limiting vector is the principal eigenvector, and its corresponding eigenvalue is 1.

Another method $O(n^3)$: algebra solution.

11.1.3 Finding Eigenpairs by Power Iteration¶

idea:
Start by computing the pricipal eigenvector, then modify the matrix to, in effect, remove the principal eigenvector.

To find the pricipal eigenpair

power iteration:
Start with any nonzero vector $x_0$ and then iterate: $$x_{k+1} := \frac{M x_k}{\| M x_k\|}$$

$x$ is the pricipal eigenvector when it reaches convergence, and $\lambda_1 = x^T M x$.

Attention: Since power iteration will introduce small errors, inaccuracies accumulate when we try to compute all eigenpairs.
To find the second eigenpair

we create $M^* = M - \lambda_1 x x^T$, then use power iteration again.

proof: the second eigenpair of $M^*$ is also that of $M$.
略

11.1.4 The Matrix of Eigenvectors¶

the eigenvectors of a symmetric matrix are orthonormal

11.1.5 Exercises for Section 11.1¶

11.2 Pricipal-Component Analysis¶

idea:
treat the set of tuples as a matrix $M$ and find the eigenvectors for $M M^T$ or $M^T M$.

In [3]:

plt.imshow(plt.imread('./res/fig11_2.png'))

Out[3]:

<matplotlib.image.AxesImage at 0x1173e1240>

11.2.2 Using Eigenvectors for Dimensionality Reduction¶

Since $M^T M e = \lambda e = e \lambda$,
Let $E$ be the matrix whose columns are the eigenvectors, ordered as largest eigenvalue first. Define the matrix $L$ to have the eigenvalues of $M^T M$ along the diagonal, largest first, and 0's in all other entries.

$M^T M E = E L$

Let $E_k$ be the first $k$ columns of $E$. Then $M E_k$ is a $k$-dimensional representation of $M$.

11.2.3 The Matrix of Distances¶

\begin{align} M^T M e &= \lambda e \\ M M^T (M e) &= M \lambda e = \lambda (M e) \end{align}

the eigenvalues of $M M^T$ are the eigenvalues of $M^T M$ plus additional 0's, and their eigenvectors are shared.

11.2.4 Exercises for Section 11.2¶

11.3 Singular-Value Decomposition¶

11.3.1 Definition of SVD¶

Let $M$ be an $m \times n$ matrix, and let the rank of $M$ be $r$.

$$M = U \Sigma V^T$$

$U$ is an $m \times r$ column-orthnormal matrix (each of its columns is a unit vector and the dot product of any two columns is 0).
$V$ is an $n \times r$ column-orthnormal maxtrix.
$\Sigma$ is a diagonal matrix. The elements of $\Sigma$ are called the singular values of $M$.

In [4]:

show_image('fig11_5.png')

In [6]:

show_image('fig11_7.png', figsize=(8, 10))

11.3.2 Interpretation of SVD¶

viewing the $r$ columns of $U$, $\Sigma$, and $V$ as representing concepts that are hidden in the original matrix $M$.

In Fig 11.7:

concepts: "science fiction" and "romance".
$U$ connects peopel to concepts.
$V$ connects movies to concepts.
$\Sigma$ give the strength of each of the concepts.

11.3.3 Dimensionality Reduction Using SVD¶

In [7]:

show_image('fig11_9.png', figsize=(8, 10))

If we set the $s$ smallest singular values to 0, then we can also eliminate the corresponding $s$ rows of $U$ and $V$.

In [8]:

show_image('fig11_10.png', figsize=(8, 10))

11.3.4 Why Zeroing Low Singular Values Works¶

Let $M = P Q R$, $m_{i j} = \sum_k \sum_l p_{ik} q_{kl} r_{lj}$.

Then \begin{align} \| M \|^2 &= \sum_i \sum_j (m_{ij})^2 \\ &= \sum_i \sum_j (\sum_k \sum_l p_{ik} q_{kl} r_{lj})^2 \\ &= \sum_i \sum_j \left ( \sum_k \sum_l \sum_n \sum_m p_{ik} q_{kl} r_{lj} p_{in} q_{nm} r_{mj} \right) \\ &\text{as $Q$ is diagonal matrix, $q_{kl}$ and $q_{nm}$ will be 0 unless $k = l$ and $n = m$.} \\ &= \sum_i \sum_j \sum_k \sum_n p_{ik} q_{kk} r_{kj} p_{in} q_{nn} r_{nj} \\ &= \sum_j \sum_k \sum_n \color{blue}{\sum_i p_{ik} p_{in}} q_{kk} r_{kj} q_{nn} r_{nj} \\ &\text{as } P = U, \sum_i p_{ik} p_{in} = 1 \text{ if } k = n \text{ and 0 otherwise} \\ &= \sum_j \sum_k q_{kk} r_{kj} q_{kk} r_{kj} \\ &= \sum_k (q_{kk})^2 \end{align}

How many singular values should we retain?

A useful rule of thumb is to retain enough singular values to make up 90% of the energy in $\Sigma$.

11.3.5 Querying Using Concepts¶

Let $q$ is the vector of user Quincy

what movies he would like? "concept space": $q V$, select the one whose score is highest.
find users similar to Quincy? measure the similarity of users by their cosine distance in concept space.

11.3.6 Computing the SVD of a Matrix¶

The SVD of a matrix $M$ is strongly connected to the eigenvalues of the symmetric matrices $M^T M$and $M M^T$.

$M^T = (U \Sigma V^T)^T = V \Sigma^T U^T = V \Sigma U^T$

\begin{align} M^T M &= V \Sigma U^T U \Sigma V^T \\ &= V \Sigma^2 V^T \\ M^T M V &= V \Sigma^2 V^T V \\ & = V \Sigma^2 \end{align}

similar, $M M^T U = U \Sigma^2$.

11.3.7 Exercises for Section 11.3¶

11.4 CUR Decomposition¶

SVD: even if $M$ is sparse, $U$ and $V$ will be dense.

CUR: if $M$ is sparse, $C$ and $R$ will be sparse.

11.4.2 Choosing Rows and Columns Properly¶

Let $f = \sum_{i,j} m_{ij}^2$.

find $C$
- pick $r$ rows, and each row is picked with $p_i = \frac{\sum_j m_{ij}^2}{f}$.
- normailize: each row is divided by $\sqrt{r p_i}$.
find $R$ selected in the analogous way.
Counstructing $U$.
- find $M$, that is the intersection of the chosen columns of $C$ and $R$.
- compute the SVD of $W$: $W = X \Sigma Y^T$.
- compute $\Sigma^+$, the Moore-Penrose pseudoinverse of the diagonal matrix $\Sigma$: replace $\sigma$ by $1/\sigma$ if $\sigma \neq 0$.
- $U = Y (\Sigma^+)^2 X^T$.

In [4]:

show_image('fig11_13.png')

Eliminating Duplicate Rows and Columns¶

it is possible that a single row or column is selected more than once, how to deal with it?

let it go.
or, merge same rows and/or columns $W$ will not be a sqaure matrix, then we need transpose the result to get $\Sigma^+$.

In [3]:

show_image('ex11_17.png')

In [5]:

#Exercise

In [ ]: