#!/usr/bin/env python
# coding: utf-8

# # [NTDS'17] assignment 4: graph signal processing
# [ntds'17]: https://github.com/mdeff/ntds_2017
#
# [Michaël Defferrard](http://deff.ch), [EPFL LTS2](http://lts2.epfl.ch)
#
# In this assignment we'll use the [PyGSP](https://github.com/epfl-lts2/pygsp) to do some Graph Signal Processing (GSP). This fourth assignment is shorter than the last one in order to let you focus on your projects.

# In[1]:

get_ipython().run_line_magic('matplotlib', 'inline')

# In[2]:

import numpy as np
from scipy import sparse, spatial
import pandas as pd
import matplotlib.pyplot as plt

from pygsp import graphs, filters, plotting

# In[3]:

plt.rcParams['figure.figsize'] = (17, 5)
plotting.BACKEND = 'matplotlib'

# If you get an import error about the PyGSP, install it with `pip install pygsp` or `conda install -c conda-forge pygsp`.

# ## 1 Data
#
# For this assignment, we'll again use the [Free Music Archive dataset](https://github.com/mdeff/fma), as in the third assignment. This time, I'm giving you all the data. Let's load the musical genres.

# In[4]:

# Ground truth genres.
tracks = pd.read_csv('../data/fma_tracks.csv', index_col=0, squeeze=True)

# Complete missing genres.
tracks[:10] = [21, 21, 27, 12, 31, 89, 36, 25, 21, 12]

# Convert to top-level genres.
tracks = tracks.apply(lambda gid: 21 if gid in [21, 83, 100, 539, 542, 811] else 12)

assert set(tracks.unique()) == {12, 21}

# ## 2 Graph
#
# Given a data matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^\intercal \in \mathbb{R}^{N \times d}$, where each of the $N = 2000$ rows is a sample $\mathbf{x} \in \mathbb{R}^d$ for which we have $d$ features, we constructed in the last assignment a similarity graph defined as
# $$\mathbf{W}[i,j] = \exp(-d^2(\mathbf{x}_i, \mathbf{x}_j) / \sigma^2).$$
#
# We construct below a PyGSP graph object with the edge weights from the last assignment.

# In[5]:

weights = sparse.load_npz('../data/fma_graph.npz')
G = graphs.Graph(weights, gtype='FMA genres')
del weights

# With the PyGSP, compute the normalized graph Laplacian, defined as
# $$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{W} \mathbf{D}^{-1/2},$$
# where $\mathbf{D} \in \mathbb{R}^{N \times N}$ is the diagonal degree matrix and $\mathbf{I}$ is the identity matrix.
#
# Hints:
# * Look at the documentation [here](https://pygsp.readthedocs.io/en/latest/reference/graphs.html).

# In[6]:

G.compute_laplacian('normalized')

# ## 3 Fourier basis
#
# The PyGSP can compute the graph Fourier basis.

# In[7]:

G.compute_fourier_basis(recompute=True)

# The vector of eigenvalues is then available at `G.e`, and the Fourier basis, i.e., the eigenvectors, at `G.U`.

# In[8]:

plt.plot(G.e);

# From the above plot, can you infer if the Fourier basis was computed from a combinatorial or normalized Laplacian?
#
# **Your answer here.** If the largest eigenvalue is larger than 2, then we know for sure it was computed from a combinatorial Laplacian. If it's smaller than 2, then it is probably a normalized Laplacian, but we cannot be sure. A numerical check is sketched below.
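# To make this concrete, here is a minimal sanity check. It is not part of the original assignment, and it assumes the graph has no isolated node, so that $\mathbf{D}^{-1/2}$ is well defined: we rebuild $\mathbf{L}$ from its definition and verify that the spectrum lies in $[0, 2]$, as it must for a normalized Laplacian.

# In[ ]:

# Sanity check (a sketch): rebuild the normalized Laplacian from its
# definition and compare it to the one computed by the PyGSP.
W = sparse.csr_matrix(G.W)
degrees = np.asarray(W.sum(axis=1)).squeeze()
d_inv_sqrt = sparse.diags(1 / np.sqrt(degrees))  # Assumes no zero degree.
L = sparse.identity(G.N) - d_inv_sqrt.dot(W).dot(d_inv_sqrt)
np.testing.assert_allclose(L.toarray(), G.L.toarray(), atol=1e-10)

# The spectrum of a normalized Laplacian is contained in [0, 2].
assert G.e.min() >= -1e-10 and G.e.max() <= 2 + 1e-10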
# ## 4 Layout
#
# To visualize our graph, we need to give each node a coordinate in 2 or 3 dimensions. Embed the graph in 2D with Laplacian eigenmaps and visualize it.
#
# Hints:
# * Use `G.set_coordinates()` and `G.plot()`.

# In[9]:

G.set_coordinates(G.U[:, 1:3])
G.plot(vertex_size=50)

# We can also visualize signals, like the genres.

# In[10]:

G.plot_signal(tracks, vertex_size=20)

# Or the eigenvectors / Fourier modes.
#
# Hint:
# * If you did embed the graph correctly, you should see how the second and third eigenvectors are aligned with the x and y axes.

# In[11]:

G.plot_signal(G.U[:, 5], vertex_size=50)

# ## 5 Filter localization
#
# We define below the low-pass filter
# $$\hat{g}(x) = \exp \left( \frac{-\tau x}{\lambda_{\text{max}}} \right).$$
# Its frequency response is depicted below.
#
# The vertical bars in the plot represent the eigenvalues of the graph. While the filter is continuous, it is only evaluated at those eigenvalues when filtering a graph signal.

# In[12]:

f = filters.Heat(G, tau=10)
f.plot()

# To observe what our kernel looks like in the node domain, we would like to localize it on node $i$. Given the Fourier basis $\mathbf{U}$, the filter kernel $\hat{g}(\lambda)$, and the diagonal matrix of eigenvalues $\mathbf{\Lambda}$, what is the expression of the localized kernel $T_i g \in \mathbb{R}^N$?
#
# **Your answer here.** The signal is given by
# $$T_i g = \mathbf{U} \hat{g}(\mathbf{\Lambda}) \mathbf{U}^\intercal \delta_i.$$
#
# Now localize it at node $i = 300$ and observe the result.
#
# Hints:
# * You can evaluate the filter with `f.evaluate()`.
# * You can filter a signal with `f.filter()`.

# In[13]:

NODE = 300

s = G.U @ np.diag(f.evaluate(G.e).squeeze()) @ G.U.T[:, NODE]

# Alternative:
# s = np.zeros(G.N)
# s[NODE] = 1
# s = G.U @ np.diag(f.evaluate(G.e).squeeze()) @ G.U.T @ s

# Alternative:
# s = np.zeros(G.N)
# s[NODE] = 1
# s = f.filter(s)

# Alternative:
# s = f.localize(NODE)

G.plot_signal(s, vertex_size=50, highlight=NODE)

# Intuitively, should the plotted graph signal be smooth?
#
# **Your answer here.** Yes, because the energy of the signal is concentrated in the low frequencies.
#
# Confirm your intuition by looking at the signal `s` in the Fourier domain. Compare with the graph Fourier transform (GFT) of a delta signal
# $$\delta_i[j] =
# \begin{cases}
# 1 & \text{if } j = i, \\
# 0 & \text{otherwise.}
# \end{cases}$$
#
# Hints:
# * You can compute the GFT with `G.gft()`.

# In[14]:

s0 = np.zeros(G.N)
s0[NODE] = 1

plt.plot(G.e, G.gft(s0));
plt.plot(G.e, G.gft(s));
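# The GFT of the delta has energy spread across the whole spectrum, while that of the localized kernel is concentrated in the low frequencies. To make the comparison quantitative, here is a small sketch (not part of the original assignment; the cutoff at half of $\lambda_{\text{max}}$ is an arbitrary illustrative choice) that measures the fraction of each signal's energy carried by the low frequencies.

# In[ ]:

# Fraction of a signal's energy below a frequency cutoff (a sketch;
# the cutoff at half the spectrum is an arbitrary choice).
def low_frequency_energy(signal, cutoff=G.e.max() / 2):
    signal_hat = G.gft(signal)
    return np.sum(signal_hat[G.e <= cutoff]**2) / np.sum(signal_hat**2)

print('delta:            {:.2%}'.format(low_frequency_energy(s0)))
print('localized kernel: {:.2%}'.format(low_frequency_energy(s)))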
# ## 6 Transductive learning
#
# In this problem, we'll consider having the labels for some percentage of our $N = 2000$ tracks, but missing the rest. Those of you who have done some [Machine Learning (ML)](https://en.wikipedia.org/wiki/Machine_learning) will surely recognize here a [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) problem and will say: yeah, let's train a classifier on the training data, and predict the missing labels! While they would not be wrong in doing so, they would not use the unlabeled data at all for training. The setup where test cases are known a priori is called [transductive learning](https://en.wikipedia.org/wiki/Transduction_%28machine_learning%29). So let's exploit the structure of the data domain with a similarity graph!
#
# Define $\mathbf{y} \in \mathbb{R}^N$ such that
# $$\mathbf{y}[i] =
# \begin{cases}
# 1 &\text{if the genre of track } i \text{ is Rock}, \\
# -1 &\text{if the genre of track } i \text{ is Hip-Hop}, \\
# 0 &\text{if the genre of track } i \text{ is unknown}, \\
# \end{cases}$$
#
# and the diagonal masking matrix $\mathbf{M} \in \mathbb{R}^{N \times N}$, which indicates the observations, such that
#
# $$\mathbf{M}[i,i] =
# \begin{cases}
# 1 &\text{if the genre of track } i \text{ is known}, \\
# 0 &\text{if the genre of track } i \text{ is unknown}. \\
# \end{cases}$$

# In[15]:

# Ground truth.
x = np.ones(G.N)
x[tracks == 21] = -1

def prepare_observations(p):
    """Prepare observations, where p is the percentage of values to keep."""
    rs = np.random.RandomState(42)
    M = np.diag(rs.uniform(size=G.N) < p)
    return M.dot(x)

# Play with the percentage of observed values.
y = prepare_observations(p=0.1)
plt.hist(y);

# The problem we then want to solve is
# $$\mathbf{x}^* = \operatorname*{arg\,min}_{\mathbf{x} \in \mathbb{R}^N} \|\mathbf{y} - \mathbf{Mx}\|_2^2 + \alpha \mathbf{x}^\intercal \mathbf{L} \mathbf{x},$$
# where $\alpha$ is a hyper-parameter which controls the trade-off between the data fidelity term and the smoothness prior.
#
# What is the solution of this problem?
#
# **Your answer here.**
# The optimal solution (by setting the gradient to zero) is given by:
# $$\begin{align}
# 2 \mathbf{M}^\intercal (\mathbf{Mx}^\ast - \mathbf{y}) + \alpha (\mathbf{L x}^\ast + \mathbf{L}^\intercal \mathbf{x}^\ast) &= 0 \\
# \mathbf{Mx}^\ast - \mathbf{My} + \alpha \mathbf{L x}^\ast &= 0 \\
# \mathbf{x}^\ast &= (\mathbf{M} + \alpha \mathbf{L})^{-1} \mathbf{y},
# \end{align}$$
# where the second line uses the facts that $\mathbf{M}$ is diagonal with 0/1 entries, so $\mathbf{M}^\intercal \mathbf{M} = \mathbf{M}$, and that $\mathbf{L}$ is symmetric. Note that $\mathbf{My} = \mathbf{y}$ by definition of $\mathbf{y}$ and $\mathbf{M}$.
#
# Implement it below.
#
# Hints:
# * The solution of a linear system of equations can be found with `np.linalg.solve()`.

# In[16]:

def solve(y, alpha):
    """
    Solve the optimization problem.

    Parameters:
        y: the observations
        alpha: the balance between fidelity and smoothness prior

    Returns:
        x_pred: the predicted class
        x_star: the solution of the optimization problem
    """
    # Diagonal masking matrix: 1 where the genre is observed.
    M = np.diag(y != 0)
    # Reconstruct.
    x_star = np.linalg.solve(M + alpha * G.L, y)
    # Binarize.
    x_pred = np.ones(G.N)
    x_pred[x_star < 0] = -1
    return x_pred, x_star

x_pred, x_star = solve(y, alpha=1)

# Be sure that the prediction is binary.
np.testing.assert_equal(abs(x_pred), 1)

# Error rate.
err = np.count_nonzero(x - x_pred)
print('{} errors ({:.2%})'.format(err, err/G.N))

# Let's visualize the various graph signals:
# 1. The ground truth `x`.
# 1. The observations `y`.
# 1. The smooth solution `x_star`.
# 1. The binary prediction `x_pred`.

# In[17]:

G.plot_signal(x, vertex_size=20)
G.plot_signal(y, vertex_size=20)
G.plot_signal(x_star, vertex_size=20)
G.plot_signal(x_pred, vertex_size=20)

# Compute the prediction accuracy as a function of $p \approx \frac{n}{N}$, where $n$ is the number of observed labels. (A joint sweep over $p$ and $\alpha$ is sketched at the end of the notebook.)

# In[18]:

# Error rate for each p.
alpha = 0.1
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    y = prepare_observations(p)
    x_pred, _ = solve(y, alpha)
    err = np.count_nonzero(x - x_pred)
    print('{}: {:6.2%}'.format(p, err/G.N))

# ## 7 Conclusion
#
# In this assignment, you hopefully got some intuition about the graph Fourier transform (GFT), and an idea of how we can leverage graphs to exploit the geometry of the data. Moreover, we saw that the [PyGSP](https://github.com/epfl-lts2/pygsp) can considerably ease Graph Signal Processing (GSP)!
#
# If you feel confident about predicting genres on the [FMA](https://github.com/mdeff/fma), consider participating in our [genre recognition challenge](https://www.crowdai.org/challenges/www-2018-challenge-learning-to-recognize-musical-genre) to label 35,000 tracks! I will surely have a semester or master project to offer to those who do well. :)
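# As a final experiment going beyond the assignment, we can sweep the fraction of observed labels and the regularization strength jointly. This is only a sketch: both grids of values are arbitrary illustrative choices, and the reported numbers are error rates (lower is better).

# In[ ]:

# Joint sweep over p and alpha (a sketch). Each value is an error rate.
for alpha in [0.01, 0.1, 1, 10]:
    errors = []
    for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
        y = prepare_observations(p)
        x_pred, _ = solve(y, alpha)
        errors.append(np.count_nonzero(x - x_pred) / G.N)
    print('alpha={:5g}: '.format(alpha) + '  '.join('{:6.2%}'.format(e) for e in errors))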