Notebook

NTDS'18 milestone 4: graph signal processing¶

Rodrigo Pena, EPFL LTS2

Students¶

Team: <your team number>
Students: <the name of all students in the team>
Dataset: <the dataset you used to complete the milestone>

Rules¶

Milestones have to be completed by teams. No collaboration between teams is allowed.
Textual answers shall be short. Typically one to two sentences.
Code has to be clean.
You cannot import any other library than we imported.
When submitting, the notebook is executed and the results are stored. I.e., if you open the notebook again it should show numerical results and plots. We won't be able to execute your notebooks.
The notebook is re-executed from a blank state before submission. That is to be sure it is reproducible. You can click "Kernel" then "Restart & Run All" in Jupyter.

Objective¶

The goal of this milestone is to do some Graph Signal Processing (GSP) on the data of your project.

A note about plotting¶

There are several questions in this milestone that ask you to plot a signal on your network. There are several ways from which you could approach it. In all cases, compute the position of the nodes a single time at the beginning, as this is likely to be a costly operation. Using a single layout for all the graph plots will also make it easier to compare the plots. Indeed, the only thing changing between plots is the signal displayed. You can represent the features/labels lying on the graph via node colors. To do so, make sure to have a consistent color map throughout and remember to display a colorbar and scale in all plots, so that we can tell what numbers the colors represent.

An option is to use the Laplacian eigenmaps that you have seen in the previous milestone to embed your graph on the plane. For example:
```
from matplotlib import pyplot as plt
plt.scatter(eigenvectors[:, 1], eigenvectors[:, 2], c=signal, alpha=0.5)
plt.colorbar()
```

Another option is to use the plotting capabilities of NetworkX. See the documentation of its drawing methods. For example:

import networkx as nx
graph = nx.from_scipy_sparse_matrix(adjacency)
coords = nx.spring_layout(graph)  # Force-directed layout.
coords = eigenvectors[:, 1:3]  # Laplacian eigenmaps.
nx.draw_networkx_nodes(graph, coords, node_size=60, node_color=signal)
nx.draw_networkx_edges(graph, coords, alpha=0.3)

Another option is to use the plotting capabilities of the PyGSP, a Python package for Graph Signal Processing. Note that your are forbidden to use the PyGSP for anything else than plotting. See the documentation of its plotting utilities. For example:
```
import pygsp as pg
graph = pg.graphs.Graph(adjacency)
graph.set_coordinates('spring')  # Force-directed layout.
graph.set_coordinates(eigenvectors[:, 1:3])  # Laplacian eigenmaps.
graph.plot_signal(signal)
```
Yet another option is to save your graph on disk, use Gephi externally, to visualize the graph, save the graph with the Gephi coordinates and finally load the nodes coordinates back into the notebook.

We encourage you to try all the above methods before making your choice. Then be consistent and use only one throughout the milestone. NetworkX and PyGSP should already be installed in your environement. If that's not the case, install with conda install networkx pygsp (after activating the ntds_2018 environment).

0 - Load your network¶

In [ ]:

%matplotlib inline

If you get a No module named 'pyunlocbox' error when running the below cell, install the pyunlocbox with conda install pyunlocbox (after activating the ntds_2018 environment).

In [ ]:

import numpy as np
from scipy import sparse
import scipy.sparse.linalg
from matplotlib import pyplot as plt
from pyunlocbox import functions, solvers

For this milestone, all we will need is a set of features/labels for each of the nodes on the network, as well as the Laplacian, $L,$ and Gradient, $\nabla_G,$ matrices that you have computed for your network while working on milestone 3.

Import those objects in the cell below (or recompute the Laplacian and Gradient from your stored adjacency matrix, if you wish).

Note: If your features/labels are not floating-point numbers, please convert them. For example, if your data has labels "cat" and "dog" for nodes that represent cats or dogs, respectively, you may assign the number 1.0 for the label "cat" and the number -1.0 for the label "dog".

In [ ]:

laplacian = # Your code here.
gradient = # Your code here.
labels = # Your code here.
n_nodes = # Your code here.

1 - Graph Fourier Transform¶

In this section we will observe how your feature/label vector looks like in the "Graph Fourier" domain.

Question 1¶

Compute the Fourier basis vectors and the Laplacian eigenvalues. Make sure to order those from smaller to larger, $\lambda_0 \leq \lambda_1 \leq \dots \leq \lambda_{N-1},$ and use the same ordering for the Fourier basis vectors.

In [ ]:

e = # Ordered Laplacian eigenvalues.
U = # Ordered graph Fourier basis.

Plot the first 3 and the last Fourier basis vectors as signals on your graph. Clearly indicate which plot belongs to which basis vector.

In [ ]:

# Your code here.

Question 2¶

What can you observe in terms of local variations when comparing the basis vectors corresponding to the smallest eigenvalues to those corresponding to the largest eigenvalue? How would this justify the interpretation of the eigenvalues as "graph frequencies"?

Your answer here.

Question 3¶

Implement a function that returns the Graph Fourier Transform (GFT) of a given vector $x \in \mathbb{R}^{N},$ with respect to your graph, and a function that computes the corresponding inverse GFT (iGFT).

In [ ]:

def GFT(x):
    return # Your code here.

def iGFT(x):
    return # Your code here.

Question 4¶

Plot your feature/label vector as a signal on your graph

In [ ]:

# Your code here.

Plot the absolute values of the GFT of your feature/label signal as a function of the graph eigenvalues. Make sure to add a marker indicating the position of each graph eigenvalue, and remember to properly name the axes.

In [ ]:

# Your code here.

Question 5¶

Discuss the behavior of the GFT that you plotted in the last question via comparing the plot of your label signal and those of the Fourier basis of Question 1. Would you consider your labels a "low-pass" or "high-pass" signal, or yet something else entirely?

Your answer here.

2 - Filtering on graphs¶

In this section we will check how filtered Dirac impulses diffuse on your graph.

Question 6¶

Implement the following three filter kernels and the graph filtering operation.

The heat kernel is supposed to take in a vector of eigenvalues e and a parameter t and output a vector of evaluations of the heat kernel at those eigenvalues (see the course slides for help).
The inverse filter kernel is supposed to take in a vector of eigenvalues e and a parameter t and implement spectrally the filter defined in the node domain by $f_{out} = (I + t L)^{-1} f_{in},$ where $f_{in}, f_{out} \in \mathbb{R}^{N}$ are, repectively, the input and output signals to the filter.
The rectangle kernel takes in a vector of eigenvalues e and parameters l_min and l_max and returns 1.0 at coordinates satisfying $(e[l] \geq l_{min}) \wedge (e[l] \leq l_{max}),$ and 0.0 otherwise.
The graph filtering operation takes a graph signal $x \in \mathbb{R}^{N}$, a spectral graph kernel and a set of keyworded variables, and returns the corresponding filtered signal.
- Hint: Remember that you have implemented the GFT and iGFT operations in Question 3.
- The **kwargs is a placeholder to collect supplementary pairs of keyword-values that are not known by the implementation before execution time. The kwargs variable is a dictionary whose keyes and values are the parameter names and values. This is useful to allow both graph_filter(x, heat_kernel, tau=1.0) and graph_filter(x, rectangle_kernel, lambda_min=0.0, lambda_max=1.0) to be valid calls from the same implementation. One can then defer the keyword-value assignment to the kernel call: foo = kernel(bar, **kwargs).

In [ ]:

def heat_kernel(e, t):
    return # Your code here.

def inverse_kernel(e, t):
    return # Your code here.

def rectangle_kernel(e, l_min, l_max):
    return # Your code here.

def graph_filter(x, kernel, **kwargs):
    return # Your code here.

Question 7¶

Plot all three filter kernels in the spectral domain. Remember to properly name the axes and title the plots. Choose filter parameters that best approximate the behavior of the GFT of your feature/label signal (as seen in Question 4).

In [ ]:

# Your code here.

Question 8¶

Consider two Dirac impulses arbitrarily placed on your graph. Plot their filtered versions by the three filter kernels implemented in Question 6.

In [ ]:

# Your code here.

Comment on the "diffusion" of the Diracs induced by the filters. What does it say about the "communication" of information across your network? Relate that to the network connectivity measures that you analyzed during the previous milestones.

Your answer here.

3 - De-noising¶

In this section we will add some centered Gaussian noise to your feature/label signal and attempt to recover it.

Question 9¶

In the cell below, set the noise variance $\sigma^2$ by making sure that the signal-to-noise ratio $SNR = \frac{\operatorname{Var}(\text{labels})}{\sigma^2}$ is about $1.5$.

Note: Actually, you might want to play with the noise variance here and set it to different values and see how the denoising filters behave.

In [ ]:

noise_variance = # Your code here.
noisy_measurements = labels + noise_variance * np.random.randn(n_nodes)

Question 10¶

In the denoising setting, a common graph signal processing assumption is that the signal $z$ that we want to recover is "smooth", in the sense that $\|\nabla_G z\|_2 = \sqrt{z^{\top} L z}$ is small, while remaining "close" to the measurements that we start with. This leads to denoising by solving the following optimization problem:

$$ z^\star = \text{arg} \, \underset{z \in \mathbb{R}^{N}}{\min} \, \|z - y\|_2^2 + \gamma z^{\top} L z, $$

where $y \in \mathbb{R}^{N}$ is the vector of noisy measurements.

Derive the close form solution to this problem giving $z^\star$ as a function of $y$, $\gamma$ and $L$. Does this solution correspond to any graph filtering operation that you know?

Your answer here.

Question 11¶

Now, denoise the noisy measurements by passing them through the filters that you implemented in Question 6. Choose the filter parameters based on the behavior of the GFT of your original label signal (this is the prior knowledge that you input to the problem).

In [ ]:

z_heat_denoised = # Your code here.
z_inv_denoised = # Your code here.
z_rect_denoised = # Your code here.

Plot, on your graph, the original label signal, the noisy measurements, and the three denoised version obtained above. Report on each plot the value of the corresponding relative error $$ \text{rel-err} = \frac{\|\text{labels} - z \|_2}{\|\text{labels}\|_2}, $$ where $z$ is the plotted signal.

In [ ]:

# Your code here.

Finally, overlay on the same plot the GFT of all five signals above.

In [ ]:

# Your code here.

Question 12¶

Comment on which denoised version seems to best match the original label signal. What is the underlying assumption behind the three filtering approaches? Do you think it holds for your label signal? Why?

Your answer here.

4 - Transductive learning¶

It is often the case in large networks that we can only afford to query properties/labels on a small subset of nodes. Nonetheless, if the underlying labels signal is "regular" enough, we might still be able to recover a good approximation of it by solving an offline variational problem, with constraints on the values of the measured nodes.

In this section, we will be interested in solving such transductive learning problems by minimizing a (semi-) p-norm of the graph gradient applied to the signal of interest:

$$ \text{arg} \, \underset{z|_S = y}{\min} \|\nabla_G z\|_p^p, $$

where $S$ is the set of measured nodes.

In English, we can say that we are looking for solutions with small "aggregated local variations", as measured by $\|\nabla_G z\|_p^p = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \sqrt{W_{ij}} |z[i] - z[j]| \right)^p,$ while satisfying the measurement constraints $z[i] = y[i]$ for $i \in S.$

We will work with two cases, according to the choices $p=1$ or $p=2.$ For $p=1,$ the problem is known as "interpolation by graph total-variation minimization," whereas for $p=2$ it is sometimes called "interpolation by Tikhonov regularization".

In order to solve these variational problems with the black-box solver provided to you, you will use the pyunlocbox. This toolbox implements iterative solvers based on so-called "proximal-splitting" methods.

Question 13¶

Throughout this section, we will consider only a binarized version of your label signal. If your variable labels currently has values other than $\{-1, 1\},$ threshold them so that those are the only values taken in this vector. This can be done for example by choosing a number $t \in \mathbb{R}$ and then setting $\text{labels_bin}[i] = 1$ if $\text{labels}[i] \geq t$ and $\text{labels_bin}[i] = 0$ otherwise.

In [ ]:

labels_bin = # Your code here.

Now, subsample this binarized label signal by $70\%$ by choosing, uniformly at random, $30\%$ of the nodes whose labels we will keep.

You will do this by computing a "measurement mask" vector w with 1.0's at the measured coordinates, and $0.0$'s otherwise.

In [ ]:

mn_ratio = 0.3
m = int(mn_ratio * n_nodes)  # Number of measurements.

w = # Your code here.

Plot the subsampled signal on the graph. Hint: you might want to set to numpy.nan the values of the un-measured nodes for a cleaner plot.

In [ ]:

# Your code here.

Interlude¶

For the solution of the variational problems you can use the following function as a "black-box".

You will just need to provide a gradient matrix (which you should already have from Section 0), and an orthogonal projection operator P onto the span of the measured coordinates (made precise in the next question).

In [ ]:

def graph_pnorm_interpolation(gradient, P, x0=None, p=1., **kwargs):
    r"""
    Solve an interpolation problem via gradient p-norm minimization.

    A signal :math:`x` is estimated from its measurements :math:`y = A(x)` by solving
    :math:`\text{arg}\underset{z \in \mathbb{R}^n}{\min}
    \| \nabla_G z \|_p^p \text{ subject to } Az = y` 
    via a primal-dual, forward-backward-forward algorithm.

    Parameters
    ----------
    gradient : array_like
        A matrix representing the graph gradient operator
    P : callable
        Orthogonal projection operator mapping points in :math:`z \in \mathbb{R}^n` 
        onto the set satisfying :math:`A P(z) = A z`.
    x0 : array_like, optional
        Initial point of the iteration. Must be of dimension n.
        (Default is `numpy.random.randn(n)`)
    p : {1., 2.}
    kwargs :
        Additional solver parameters, such as maximum number of iterations
        (maxit), relative tolerance on the objective (rtol), and verbosity
        level (verbosity). See :func:`pyunlocbox.solvers.solve` for the full
        list of options.

    Returns
    -------
    x : array_like
        The solution to the optimization problem.

    """
    
    grad = lambda z: gradient.dot(z)
    div = lambda z: gradient.transpose().dot(z)

    # Indicator function of the set satisfying :math:`y = A(z)`
    f = functions.func()
    f._eval = lambda z: 0
    f._prox = lambda z, gamma: P(z)

    # :math:`\ell_1` norm of the dual variable :math:`d = \nabla_G z`
    g = functions.func()
    g._eval = lambda z: np.sum(np.abs(grad(z)))
    g._prox = lambda d, gamma: functions._soft_threshold(d, gamma)

    # :math:`\ell_2` norm of the gradient (for the smooth case)
    h = functions.norm_l2(A=grad, At=div)

    stepsize = (0.9 / (1. + scipy.sparse.linalg.norm(gradient, ord='fro'))) ** p

    solver = solvers.mlfbf(L=grad, Lt=div, step=stepsize)

    if p == 1.:
        problem = solvers.solve([f, g, functions.dummy()], x0=x0, solver=solver, **kwargs)
        return problem['sol']
    if p == 2.:
        problem = solvers.solve([f, functions.dummy(), h], x0=x0, solver=solver, **kwargs)
        return problem['sol']
    else:
        return x0

Question 14¶

During the iterations of the algorithm used for solving the variational problem, we have to make sure that the labels at the measured nodes stay the same. We will do this by means of an operator P which, given a vector $a \in \mathbb{R}^{N},$ returns another vector $b \in \mathbb{R}^{N}$ satisfying $b[i] = \text{labels_bin}[i]$ for every node $i$ in the set $S$ of known labels, and $b[i] = a[i]$ otherwise. Write in the cell below the function for this orthogonal projection operator P.

Hint: remember you have already computed the mask w.

In [ ]:

def P(a):
    # Your code here.
    return b

Question 15¶

Solve the variational problems for $p = 1$ and $p = 2$. Record the solution for the $1-$norm minimization under sol_1norm_min and the one for $2-$norm minimization under sol_2norm_min.

Compute also binarized versions of these solutions by thresholding the values with respect to $0$, that is, non-negative values become 1.0, while negative values become -1.0. Store those binarized versions under sol_1norm_bin and sol_2norm_bin, respectively.

In [ ]:

sol_1norm_min = # Your code here.

sol_2norm_min = # Your code here.

threshold = 0

sol_1norm_bin = # Your code here.

sol_2norm_bin = # Your code here.

Plot, on your graph, the original labels_bin signal, as well as the solutions to the variational problems (both binarized and otherwise). Indicate on each plot the value of the relative error $\text{rel-err} = \frac{\|\text{labels_bin} - z\|_2}{\|\text{labels_bin}\|_2}$, where $z$ is the signal in the corresponding plot.

Question 16¶

Now that you have got a feeling for the sort of solutions that the transductive learning problems studied can give, we will see what is the effect of the number of measurements on the accuracy of both $p-$norm minimization problems.

Towards this goal, you will write a phase_transition() function. This function will basically go over all the procedures that you have implemented in this section, but for varying numbers of measurements and thresholding values. It will also compute the relative error, $\text{rel-err},$ of the solutions and average them over a number of trials.

The output of the phase_transition() function has to be a matrix with len(mn_ratios) columns and len(thresholds) rows. Each pixel $(i,j)$ in the output matrix has to contain the average, over n_trials trials, of the relative error $\text{rel-err}$ in the binarized (with threshold thresholds[i]) solution given by graph_pnorm_interpolation() from observing an mn_ratios[j] fraction of nodes. The randomness comes from a different choice of mask w at each trial, hence the averaging.

The interest of this phase transition matrix is to assess what level of recovery error one could expect for a certain fraction of measurements and a certain threshold level.

In [ ]:

def phase_transition(mn_ratios, thresholds, n_trials, labels_bin, p):

    # Create sample mask.
    
    # Solve p-norm interpolation.
    
    # Aggregate.
    
    return pt_matrix

Question 17¶

Pick 5 "m/n" ratios in $(0, 1)$ and 5 threshold levels in $(-1, 1)$ and run the phase_transition() function with n_trials = 20, for both $p = 1$ and $p = 2$.

In [ ]:

mn_ratios = # Your code here.

thresholds = # Your code here.

pt_matrix_1norm = # Your code here.

pt_matrix_2norm = # Your code here.

Plot both phase transition matrices as images with a colorbar. Make sure to properly name the axes and title the images.

In [ ]:

# Your code here.

Question 18¶

Do the phase transition plots above provide any justification for choosing one $p-$norm interpolation over the other? Why?

Your answer here.