## 40 pts¶

### Ranks and skeleton decomposition¶

• What is the rank of the matrix $\begin{pmatrix}1 & 1 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & 1\end{pmatrix}$? Why? Find its skeleton decomposition.

• Find the rank of the matrix $a_{ij} = (i+j)^2$, $i,j=1,\dots,n$. Find its skeleton decomposition. Note: use the interpretation of the rank as the minimal number of summands with separated variables $i$ and $j$.
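As a sanity check for the second problem: the identity $(i+j)^2 = i^2\cdot 1 + 2i\cdot j + 1\cdot j^2$ writes $a_{ij}$ as three summands with separated variables, so the rank is at most 3. A minimal numpy sketch (taking the first three rows and columns for the skeleton assumes that $3\times 3$ submatrix is nonsingular, which holds here):

```python
import numpy as np

n = 6
i = np.arange(1, n + 1)
# a_ij = (i+j)^2 = i^2 * 1 + 2i * j + 1 * j^2: three rank-1 terms
A = (i[:, None] + i[None, :]) ** 2

print(np.linalg.matrix_rank(A))  # 3 for any n >= 3

# skeleton (CUR) decomposition built from 3 rows and 3 columns:
# A = C * A_hat^{-1} * R
idx = [0, 1, 2]
C = A[:, idx]
R = A[idx, :]
A_hat = A[np.ix_(idx, idx)]
print(np.allclose(C @ np.linalg.solve(A_hat, R), A))  # True
```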

### SVD¶

Interesting remark: matrix norms that have the form $$\|A\|^{\text{Schatten}}_p = \left(\sigma_1^p(A) + \dots + \sigma_r^p(A)\right)^{1/p}$$ are called Schatten norms, where $\sigma_i(A)$ are the singular values of $A$. Important cases are $p=2$ - the Frobenius norm, $p=\infty$ - the second (spectral) norm and $p=1$ - the nuclear norm.

• Prove that $\|A\|_2 = \sigma_1(A)$ and $\|A\|_F = \sqrt{\sigma_1^2(A) + \dots + \sigma_r^2(A)}$. Note: unitary invariance of the second and Frobenius norms might be helpful.

• Find the distance between $A$ and its truncated SVD in the second and in the Frobenius norms.

• Suppose $A^* = A$. Find the exact relation between the eigenvalues and singular values of $A$.
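A quick numerical sanity check of these identities before proving them (a sketch on a random matrix, not a proof; the last two lines also illustrate the truncation distances asked about above):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# spectral ("second") norm = largest singular value
print(np.isclose(np.linalg.norm(A, 2), s[0]))                       # True
# Frobenius norm = l2 norm of the singular values
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # True

# distance to the rank-r truncated SVD (Eckart-Young)
r = 2
Ar = (U[:, :r] * s[:r]) @ Vt[:r]
print(np.isclose(np.linalg.norm(A - Ar, 2), s[r]))                  # sigma_{r+1}
print(np.isclose(np.linalg.norm(A - Ar, 'fro'), np.sqrt(np.sum(s[r:]**2))))
```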

### Eigenvalues¶

• Is matrix $\begin{pmatrix}1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1\end{pmatrix}$ diagonalizable? Why?

• Prove that $\lambda(A)\in \mathbb{R}$ if $A$ is Hermitian, $\lambda(A)$ is purely imaginary if $A$ is skew-Hermitian, and $|\lambda(A)|=1$ if $A$ is unitary.
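These three facts can be checked numerically before proving them; a sketch where the Hermitian, skew-Hermitian and unitary matrices are all built from one random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

H = (B + B.conj().T) / 2          # Hermitian part of B
S = (B - B.conj().T) / 2          # skew-Hermitian part of B
Q, _ = np.linalg.qr(B)            # Q is unitary

print(np.allclose(np.linalg.eigvals(H).imag, 0))     # real eigenvalues
print(np.allclose(np.linalg.eigvals(S).real, 0))     # purely imaginary
print(np.allclose(np.abs(np.linalg.eigvals(Q)), 1))  # |lambda| = 1
```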

### Linear systems¶

• Is it possible to solve a 1 000 000 $\times$ 1 000 000 dense linear system on your laptop within 1 month via LU decomposition, provided that the LU decomposition is not given?

• What is the complexity of solving a system with a matrix given by its LU or by its QR decomposition? Explain your answer.
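A rough back-of-envelope for the first question: the LU factorization costs about $\frac{2}{3}n^3$ flops, so $n = 10^6$ gives $\approx 7\cdot 10^{17}$ flops; at an optimistic $10^{11}$ flops/s for a laptop that is on the order of months, and the dense matrix alone occupies 8 TB in double precision. Once the LU is available, however, a solve is only two triangular substitutions, which is $O(n^2)$. A sketch using scipy (assumed available):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
n = 300
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

lu, piv = lu_factor(A)        # O(n^3): done once per matrix
x = lu_solve((lu, piv), b)    # O(n^2): one forward + one backward substitution
print(np.allclose(A @ x, b))  # True
```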

In [ ]:



# Problem 2 (QR decomposition)¶

## 20 pts¶

• Implement the QR factorization using Householder reflections and the modified Gram-Schmidt algorithm. Make sure that they work on a random matrix. To measure orthogonality use $\|G - I\|$ as in Problem Set 1.

• Compare the Gram-Schmidt (from Problem Set 1), modified Gram-Schmidt and Householder algorithms on $5\times5$, $10\times 10$ and $20\times 20$ Hilbert matrices.
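One possible shape of these routines (a compact sketch for square, generic matrices; no pivoting or rank-deficiency handling):

```python
import numpy as np

def householder_qr(A):
    """QR of a square matrix via Householder reflections."""
    R = A.astype(float).copy()
    n = R.shape[0]
    Q = np.eye(n)
    for k in range(n - 1):
        x = R[k:, k]
        v = x.copy()
        # sign choice avoids cancellation when forming the reflector
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        v /= np.linalg.norm(v)
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

def mgs_qr(A):
    """QR via modified Gram-Schmidt."""
    V = A.astype(float).copy()
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        # orthogonalize the REMAINING columns against q_k immediately
        R[k, k+1:] = Q[:, k] @ V[:, k+1:]
        V[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
    return Q, R

A = np.random.rand(10, 10)
for Q, R in (householder_qr(A), mgs_qr(A)):
    print(np.allclose(Q @ R, A), np.linalg.norm(Q.T @ Q - np.eye(10)))
```

The second printed number is exactly the orthogonality measure $\|Q^TQ - I\|$ from Problem Set 1; on ill-conditioned Hilbert matrices it separates the three algorithms clearly.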

In [ ]:



# Problem 3 (PageRank)¶

## 40 pts¶

### Connected graph¶

• First of all, create the PageRank matrix $A$ that corresponds to the following graph: Make sure that your matrix is stochastic. What is the largest eigenvalue of this matrix?
• Implement the power method and plot the relative errors ${|\lambda_{k+1} - 1|}$ (since you know that the exact $\lambda$ is $1$) and $\frac{\|x_{k+1} - x_{k}\|}{\|x_{k+1}\|}$ (since you have no information about $x$) as a function of $k$, where $k$ is the iteration number. DO NOT FORGET TO USE LOGARITHMIC SCALE!
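A sketch of such a power iteration with the two error histories (the assignment's graph is not reproduced here, so a 3-node complete graph serves as a stand-in):

```python
import numpy as np

def power_method(A, n_iter=100, seed=0):
    """Power iteration; returns estimates and both error histories."""
    rng = np.random.default_rng(seed)
    x = rng.random(A.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    lam_err, x_err = [], []
    for _ in range(n_iter):
        y = A @ x
        lam = x @ y                      # Rayleigh quotient (x has unit norm)
        y /= np.linalg.norm(y)
        lam_err.append(abs(lam - 1.0))   # exact largest eigenvalue is 1
        x_err.append(np.linalg.norm(y - x) / np.linalg.norm(y))
        x = y
    return lam, x, lam_err, x_err

# stand-in: 3-node complete graph -> column-stochastic matrix
A = np.full((3, 3), 0.5) - 0.5 * np.eye(3)
lam, x, lam_err, x_err = power_method(A)
# plot with plt.semilogy(lam_err) and plt.semilogy(x_err)
```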

### Disconnected graph: damping factor importance¶

• Create PageRank matrix $A$ that corresponds to the following graph:

• Run the power method and plot the relative errors ${|\lambda_{k+1} - 1|}$ and $\frac{\|x_{k+1} - x_{k}\|}{\|x_{k+1}\|}$ as a function of $k$. Why do you observe the absence of convergence?
Hint: think about the multiplicity of $\lambda=1$.

In order to avoid this problem Larry Page and Sergey Brin proposed the following regularization technique: $$A_d = dA + \frac{1-d}{N} \begin{pmatrix} 1 & \dots & 1 \\ \vdots & & \vdots \\ 1 & \dots & 1 \end{pmatrix},$$ where $A$ is of size $N\times N$ and $d$ is a parameter called the damping factor (typically $d=0.85$). Now the largest eigenvalue of $A_d$ has multiplicity 1.

Recall that computing the eigenvector of the PageRank matrix that corresponds to the largest eigenvalue has the following interpretation: consider a person who starts at a random node of the graph (i.e. opens a random web page); at each step they follow one of the outgoing edges uniformly at random (i.e. open one of the links). So this person randomly walks through the graph, and the eigenvector we are looking for is exactly their stationary distribution: for each node it gives the probability of visiting that particular node. Therefore, if the person starts in a part of the graph that is not connected to the rest, they will never get to the other part. In the regularized model the person at each step follows one of the outgoing links with probability $d$ OR jumps to a random node of the whole graph with probability $1-d$.

• Now run the power method with $A_d$ and plot errors as a function of iteration for different $d$.
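A sketch of the regularization on a toy disconnected graph (two 2-cycles), showing that damping makes the largest eigenvalue simple:

```python
import numpy as np

def damp(A, d=0.85):
    """A_d = d*A + (1-d)/N * (matrix of ones)."""
    N = A.shape[0]
    return d * A + (1.0 - d) / N * np.ones((N, N))

# two disconnected 2-cycles: lambda = 1 has multiplicity 2,
# so the plain power method cannot converge on A itself
A = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
mags = np.sort(np.abs(np.linalg.eigvals(damp(A))))[::-1]
print(mags)  # after damping: a single eigenvalue 1, the rest at 0.85
```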

### Simple English Wiki¶

Let us now find the most significant articles of the Simple English Wikipedia according to the PageRank model. We provide you with the adjacency matrix of the Simple English Wikipedia articles (file simple_wiki_matrix.mat; the matrix can be accessed by the key 'W') and a dictionary that maps an article id to its name on Wikipedia (file simple_wiki_dict.pickle).

• Normalize each column of the adjacency matrix to get a matrix from the PageRank model. Check that the obtained matrix is stochastic.

• Plot the relative errors ${|\lambda_{k+1} - 1|}$ and $\frac{\|x_{k+1} - x_{k}\|}{\|x_{k+1}\|}$ as a function of $k$ for different $d$. Note that your matrix contains a lot of zeros and is stored in a sparse format. Hence np.dot(A, x) will not work. However, sparse matrices have the method .dot(), so A.dot(x) works fine.

• Print names of top-5 articles when $d=0.85$.
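One practical detail: $A_d$ itself is dense, but $A_d x$ can be applied without ever forming it, since the rank-1 term contributes the same scalar $\frac{1-d}{N}\sum_i x_i$ to every entry. A sketch on a toy sparse matrix:

```python
import numpy as np
import scipy.sparse as sp

def damped_matvec(A, x, d=0.85):
    """y = A_d @ x without forming the dense rank-1 term:
    A_d @ x = d*(A @ x) + (1-d)/N * sum(x) * ones."""
    N = A.shape[0]
    # the scalar broadcasts, adding the same value to every entry
    return d * A.dot(x) + (1.0 - d) / N * x.sum()

# toy column-stochastic sparse matrix (3-node complete graph)
A = sp.csr_matrix(np.array([[0., 0.5, 0.5],
                            [0.5, 0., 0.5],
                            [0.5, 0.5, 0.]]))
x = np.ones(3) / 3
y = damped_matvec(A, x)
print(np.isclose(y.sum(), 1.0))  # A_d is stochastic: the sum is preserved
```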

In [ ]:



# Problem 4 (Eigenfaces)¶

## 50 pts¶

The aim of this task is to build a face classifier. There are 40 persons in the database. Every person is represented by 10 photos with slightly different facial expressions.

• Create training sample:

Import the first 9 images for each face ($9\times 40$ images). Represent these pictures as a matrix $F$ with $9\times 40$ columns, where each column is a reshaped 2D picture. Note: use np.reshape to reshape each image into a column.

• Calculate and plot the mean face. Subtract it from each column of the matrix $F$.

• Calculate the SVD of the shifted matrix $F$ and truncate the obtained representation: $F_r = U_r S_r V_r^T$.

Here $U_r$ is a matrix with $r$ columns - a basis of the space of faces. $W_r = S_r V_r^T$ is the matrix of coefficients in the basis $U_r$. Note that now every image is represented by a small number of coefficients in the basis $U_r$.

• Plot vectors in $U_r$ using subplots. Make sure that you get face-like images. Now you know what eigenfaces are =)

• Import testing set which is the rest of photos. Find their coefficients in the basis $U_r$.

• Compare the obtained vectors of coefficients to vectors in $W_r$ using cosine similarity and classify testing faces. As an output give indices of faces that were misclassified when $r=5$.
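The whole pipeline can be sketched end-to-end on stand-in random data (shapes only; substitute the real face database for `faces`):

```python
import numpy as np

# stand-in data: 40 "persons" x 10 "photos" of 32x32 pixels
rng = np.random.default_rng(3)
faces = rng.random((40, 10, 32 * 32))

train = faces[:, :9].reshape(40 * 9, -1).T      # each column is an image
test = faces[:, 9:].reshape(40, -1).T           # held-out 10th photos

mean_face = train.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(train - mean_face, full_matrices=False)

r = 5
Ur = U[:, :r]                                   # eigenfaces
W = Ur.T @ (train - mean_face)                  # training coefficients W_r
W_test = Ur.T @ (test - mean_face)              # test coefficients, same basis

# cosine similarity: normalize columns, then take all inner products
sim = (W_test / np.linalg.norm(W_test, axis=0)).T @ (W / np.linalg.norm(W, axis=0))
pred = sim.argmax(axis=1) // 9                  # training column index -> person id
```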

In [ ]:



# Problem 5 (Building recommender systems with SVD)¶

## 50 pts¶

### Pseudocode for measuring recommender system quality¶

```
initialize total_score with 0
for each user in test_data:
    rating_history <-- all but the last N movies rated by the user
    evaluation_set <-- the last N movies rated by the user

    initialize user_preferences_vector with 1s, using rating_history as the indices of non-zero values
    top_N_recommendations <-- folding-in applied to user_preferences_vector
    correct_predictions <-- number of elements in the intersection of top_N_recommendations and evaluation_set
    total_score <-- total_score + correct_predictions

return total_score
```

You may use a different implementation if you wish; the only two criteria are:

• it should not be slower than the reference implementation
• it should produce correct results
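The pseudocode above can be turned into a small Python harness; here `recommend` is a hypothetical stand-in for your folding-in recommender, and the toy "recommender" below just returns a fixed popular list:

```python
import numpy as np

def evaluate(recommend, test_histories, n_movies, N=3):
    """Hold-out evaluation following the pseudocode above."""
    total_score = 0
    for history in test_histories:          # history: movie ids in rating order
        seen, held_out = history[:-N], history[-N:]
        pref = np.zeros(n_movies)           # user_preferences_vector
        pref[seen] = 1.0
        recs = recommend(pref, N)           # top-N via folding-in (stubbed here)
        total_score += len(set(recs) & set(held_out))
    return total_score

histories = [[0, 3, 5, 7, 9], [1, 3, 5, 8, 9]]
popular = [9, 8, 7]                          # trivial stand-in recommendations
score = evaluate(lambda pref, N: popular, histories, n_movies=20)
print(score)  # intersections: {7, 9} and {8, 9} -> 2 + 2 = 4
```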
In [ ]: