Clément Vignac, EPFL LTS4 and Guillermo Ortiz Jiménez, EPFL LTS4.
<your team number>
<your name
> (for the indivudual submission) or <the name of all students in the team>
(for the team submission)Grading:
Submission:
In this assignment you will experiment with the main concepts of spectral graph theory, as well as familizarize yourself with the main data science techniques for network data.
The assignment is made of three parts:
import numpy as np
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
from pygsp.graphs import TwoMoons
G = TwoMoons(moontype='synthesized', N=2000)
X = G.coords
Y = G.labels.astype(int)
plt.scatter(X[:, 0], X[:, 1], c=Y)
plt.show()
Build a similarity graph using the euclidean distance between data points.
Note: Use an RBF kernel to set the edge weights $w_{ij}=\exp(-||x_i- x_j||_2^2 / ~ 2 \sigma^2)$ of your adjacency and threshold the ones with the smallest magnitude.
def epsilon_similarity_graph(X: np.ndarray, sigma=1, epsilon=0):
""" X (n x d): coordinates of the n data points in R^d.
sigma (float): width of the kernel
epsilon (float): threshold
Return:
adjacency (n x n ndarray): adjacency matrix of the graph.
"""
# Your code here
return adjacency
adjacency = epsilon_similarity_graph(X, sigma=None, epsilon=None)
plt.spy(adjacency)
plt.show()
How do you choose sigma
?
Your answer here
How do you choose the threshold epsilon
?
Your answer here
Build the combinatorial and normalized graph laplacians for this dataset.
def compute_laplacian(adjacency: np.ndarray, normalize: bool):
""" Return:
L (n x n ndarray): combinatorial or symmetric normalized Laplacian.
"""
# Your code here
laplacian_comb = compute_laplacian(adjacency, normalize=False)
laplacian_norm = compute_laplacian(adjacency, normalize=True)
For both Laplacian matrices, compute the eigendecomposition $L = U^\top \Lambda U$, where the columns $u_k \in \mathbb{R}^N$ of $U = [u_1, \dots, u_N] \in \mathbb{R}^{N \times N}$ are the eigenvectors and the diagonal elements $\lambda_k = \Lambda_{kk}$ are the corresponding eigenvalues. Make sure that the eigenvalues are ordered, i.e., $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_N$.
Justify your choice of a solver for the eigendecomposition.
Your answer here
def spectral_decomposition(laplacian: np.ndarray):
""" Return:
lamb (np.array): eigenvalues of the Laplacian
U (np.ndarray): corresponding eigenvectors.
"""
# Your code here
lamb_comb, U_comb = spectral_decomposition(laplacian_comb)
lamb_norm, U_norm = spectral_decomposition(laplacian_norm)
We plot the sorted eigenvalues as a function of their index:
plt.figure(figsize=(12,5))
plt.subplot(121)
plt.plot(lamb_comb)
plt.xlabel('Index')
plt.ylabel('Eigenvalue')
plt.title('Eigenvalues $L_{comb}$')
plt.subplot(122)
plt.plot(lamb_norm)
plt.xlabel('Index')
plt.ylabel('Eigenvalue')
plt.title('Eigenvalues $L_{norm}$')
plt.show()
What is the lowest eigenvalue $\lambda_0$ and the corresponding eigenvector $u_0$? Answer for both Laplacian matrices.
Your answer here
When filtering a signal or computing polynomials, which Laplacian provides the best numerical stability? Justify your answer.
Your answer here
The eigendecomposition provides an easy way to compute the number of connected components in the graph. Fill the following function:
def compute_number_connected_components(lamb: np.array, threshold: float):
""" lamb: array of eigenvalues of a Laplacian
Return:
n_components (int): number of connected components.
"""
# Your code here
Tune the parameters $\epsilon$ and $\sigma$ of the similarity graph so that the graph is connected. Otherwise, clustering would be too simple!
print(compute_number_connected_components(lamb_norm, threshold=None))
from sklearn.cluster import KMeans
# Your code here
y_pred = # Vector with cluster assignments
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.show()
K-means cannot find a good solution to this problem. Why?
Your answer here
As opposed to naive K-means, spectral clustering doesn't operate on the input space but on the eigenspace of the graph that represents the data. Implement spectral clustering. You can use this tutorial.
class SpectralClustering():
def __init__(self, n_classes: int, normalize: bool):
self.n_classes = n_classes
self.normalize = normalize
self.laplacian = None
self.e = None
self.U = None
self.clustering_method = # Your code here
def fit_predict(self, adjacency):
""" Your code should be correct both for the combinatorial
and the symmetric normalized spectral clustering.
Return:
y_pred (np.ndarray): cluster assignments.
"""
# Your code here
return y_pred
print("Connected components:", compute_number_connected_components(lamb_norm, threshold=1e-12))
spectral_clustering = SpectralClustering(n_classes=2, normalize=True)
y_pred = spectral_clustering.fit_predict(adjacency)
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.show()
Can you think of another 2D dataset in which k-means would badly perform, but spectral clustering would not?
Construct it!
For this question you can import any dataset of your choice, for example from sklearn.datasets
or pygsp.graphs
, but you can also get creative and define something of your own. First, create and plot the dataset.
# Your code here
Run K-means:
# Your code here
Create the similarity graph, and run spectral clustering with both the combinatorial and normalized Laplacian matrices:
# Your code here
Comment your results here
Most datasets are very high-dimensional, which means it can be very hard to understand their geometry. Fortunately, there exists multiple techniques that can help us to reduce the dimensionality of the data, and allow us to visualize it.
In this part of the assignment we will use MNIST to compare these techniques. Indeed, without dimensionality reduction it would be very difficult to answer questions like: are the different digits clustered together in different areas of space?
But first, let's load our dataset:
from utils import load_mnist
X_mnist, y_mnist = load_mnist()
classes = np.unique(y_mnist)
Most dimensionality reduction algorithms are constructed such that some property of the dataset remains invariant in the lower dimensional representation. Before implementing laplacian eigenmaps, can you say what property of the data does this algorithm preserve?
Your answer here
Implement a function that uses Laplacian eigenmaps to do dimensionality reduction.
def laplacian_eigenmaps(X:np.ndarray, dim: int, sigma: float, epsilon: float, normalize: bool):
""" Return:
coords (n x dim array): new coordinates for the data points."""
# Your code here
Use this function to visualize MNIST in 2D. Feel free to play with the different parameters.
dim = 2
# Your code here
Visualize MNIST in 3D:
dim = 3
# Your code here
We provide the visualization of MNIST with other methods:
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap
# This cell can take a few minutes to run
run_this_cell = False
if run_this_cell:
# In 2d
embeddings = [PCA(n_components=2, copy=True, whiten=True, tol=1e-5),
Isomap(n_components=2, n_neighbors=5),
TSNE(n_components=2)]
for embedding in embeddings:
X_embedded = embedding.fit_transform(X_mnist)
fig = plt.figure()
for i in classes:
mask = y_mnist == i
plt.scatter(X_embedded[mask, 0], X_embedded[mask, 1], label=i)
plt.legend()
plt.title('Embedding method: '+ type(embedding).__name__)
plt.show()
# In 3d
embeddings = [PCA(n_components=3, copy=True, whiten=True, tol=1e-5),
Isomap(n_components=3, n_neighbors=5),
TSNE(n_components=3)]
for embedding in embeddings:
X_embedded = embedding.fit_transform(X_mnist)
fig = plt.figure()
ax = Axes3D(fig)
for i in classes:
mask = y_mnist == i
ax.scatter(X_embedded[mask, 0], X_embedded[mask, 1], X_embedded[mask, 2], label=i)
ax.legend()
ax.title.set_text('Embedding method: '+ type(embedding).__name__)
plt.show()
In a few words, what are the principles guiding the design of each method? Compare their results.
Your answer here
In this part of the assignment we are going to familiarize ourselves with the main concepts in Graph Signal Processing and regularization on graphs in general. From now on, you can only use the following libraries as well as the functions that you implemented in the previous parts.
import pandas as pd
import numpy as np
from pygsp.graphs import Bunny
In this exercise we will use a nearest-neighbor graph constructed from the Stanford Bunny point cloud included in the PyGSP library.
G = Bunny()
adjacency = np.asarray(G.W.todense())
n_nodes = adjacency.shape[0]
We will use the following function to plot our signals on this graph.
def plot_bunny(x=None, title='', vlim=[-0.03, 0.03]):
fig = plt.gcf()
ax = plt.gca()
if not isinstance(ax, Axes3D):
ax = plt.subplot(111, projection='3d')
if x is not None:
x = np.squeeze(x)
p = ax.scatter(G.coords[:,0], G.coords[:,1], G.coords[:,2], c=x, marker='o',
s=5, cmap='RdBu_r', vmin=vlim[0], vmax=vlim[1])
ax.view_init(elev=-90, azim=90)
ax.dist = 7
ax.set_axis_off()
ax.set_title(title)
if x is not None:
fig.colorbar(p)
plt.subplot(111, projection='3d')
plot_bunny()
Let us start by constructing the normalized graph laplacians from the adjacency matrix and find its spectral decomposition.
laplacian = compute_laplacian(adjacency, normalize=True)
lam, U = spectral_decomposition(laplacian)
Plot the eigenvalues.
plt.figure(figsize=(6, 5))
plt.plot(lam)
plt.title('Eigenvalues $L_{norm}$')
plt.show()
To make things more clear we will plot some of its eigenvectors (0, 1, 3, 10, 100) as signals on the bunny graph.
plt.figure(figsize=(18, 9))
plt.subplot(231, projection='3d')
plot_bunny(x=U[:,0], title='Eigenvector #0')
plt.subplot(232, projection='3d')
plot_bunny(x=U[:,1], title='Eigenvector #1')
plt.subplot(233, projection='3d')
plot_bunny(x=U[:,2], title='Eigenvector #2')
plt.subplot(234, projection='3d')
plot_bunny(x=U[:,3], title='Eigenvector #3')
plt.subplot(235, projection='3d')
plot_bunny(x=U[:,10], title='Eigenvector #10')
plt.subplot(236, projection='3d')
plot_bunny(x=U[:,100], title='Eigenvector #100')
What can you say in terms of the variation (smoothness) of these signals? How can the smoothness of a signal be measured?
Your answer here
Create a function to compute the Graph Fourier Transform (GFT) of a graph signal and its inverse.
Note: You can assume that you have internal access to the eigendecomposition (U
and lam
) of the laplacian.
def GFT(signal: np.ndarray):
# Your code here
def iGFT(fourier_coefficients: np.ndarray):
# Your code here
Now, let's create a graph signal:
x = G.coords[:, 0] + G.coords[:, 1] + 3 * G.coords[:, 2]
x /= np.linalg.norm(x)
noise = np.random.randn(n_nodes)
noise /= np.linalg.norm(noise)
x_noisy = x + 0.3*noise
plot_bunny(x_noisy, vlim=[min(x_noisy), max(x_noisy)])
and plot its graph spectrum:
plt.figure(figsize=(10, 6))
plt.plot(lam, np.abs(GFT(x_noisy)), 'r.')
plt.plot(lam, np.abs(GFT(x)), 'g-')
plt.xlabel('$\lambda$')
plt.ylabel('GFT')
plt.legend(['$x_{noisy}$', '$x$'])
We will try to extract the signal from the noise using graph filters. Let us start by creating three ideal graph filters.
ideal_lp = np.ones((n_nodes,))
ideal_bp = np.ones((n_nodes,))
ideal_hp = np.ones((n_nodes,))
ideal_lp[lam >= 0.1] = 0 # Low-pass filter with cut-off at lambda=0.1
ideal_bp[lam < 0.1] = 0 # Band-pass filter with cut-offs at lambda=0.1 and lambda=0.5
ideal_bp[lam > 0.5] = 0
ideal_hp[lam <= 1] = 0 # High-pass filter with cut-off at lambda=1
Additionally, create the ideal graph filter that implements the solution of Tikhonov regularization.
alpha = 0.99 / np.max(lam)
ideal_tk = # Your code here
Let's plot the spectral responses:
plt.plot(lam, ideal_lp, '-', label='LP')
plt.plot(lam, ideal_bp, '-', label='BP')
plt.plot(lam, ideal_hp, '-', label='HP')
plt.plot(lam, ideal_tk, '-', label='Tikhonov')
plt.xlabel('$\lambda$')
plt.ylabel('Spectral response')
plt.legend(loc='lower right')
Create a function to filter a signal given an ideal graph filter
def ideal_graph_filter(x: np.ndarray, spectral_response: np.ndarray):
"""Return a filtered signal."""
# Your code here
Let us visualize the results:
x_lp = ideal_graph_filter(x_noisy,ideal_lp)
x_bp = ideal_graph_filter(x_noisy,ideal_bp)
x_hp = ideal_graph_filter(x_noisy,ideal_hp)
x_tk = ideal_graph_filter(x_noisy,ideal_tk)
plt.figure(figsize=(18, 9))
plt.subplot(231, projection='3d')
plot_bunny(x=x, title='signal (true)', vlim=[min(x), max(x)])
plt.subplot(232, projection='3d')
plot_bunny(x=x_noisy, title='signal (noisy)', vlim=[min(x), max(x)])
plt.subplot(233, projection='3d')
plot_bunny(x=x_lp, title='Low-pass', vlim=[min(x_lp), max(x_lp)])
plt.subplot(234, projection='3d')
plot_bunny(x=x_bp, title='Band-pass', vlim=[min(x_bp), max(x_bp)])
plt.subplot(235, projection='3d')
plot_bunny(x=x_hp, title='High-pass', vlim=[min(x_hp), max(x_hp)])
plt.subplot(236, projection='3d')
plot_bunny(x=x_tk, title='Tikhonov denoised signal', vlim=[min(x_tk), max(x_tk)])
How would you link to the observations you made before about the spectral decomposition of the laplacian? Also, judging from the results, what type of model prior do you think Tikhonov regularization enforces?
Your answer here
We have seen how we can use the GFT to define different filters that enhance or reduce certain frequency bands. However, to do so, we require an explicit eigendecomposition of the graph laplacian, which has a cost $O(n^3)$. For very large graphs this is very intense computationally. We will now see how we can obtain similar results by filtering the signals directly without resorting to an eigendecomposition.
The key idea is to use a polynomial of the graph laplacian to define a graph filter, i.e., $g(L)x=\sum_{k=1}^K \alpha_k L^k x$, and use the fact that the powers of a diagonalizable matrix can be written in terms of powers of its eigenvalues. This is $$ L^k=(U\Lambda U^T)^k=U\Lambda^k U^T = U\begin{bmatrix} (\lambda_0)^k &\dots & 0\\ \vdots & \ddots & \vdots\\ 0 & \dots & (\lambda_N)^k \end{bmatrix} U^T. $$
This means that a polynomial of the graph laplacian acts independently on each eigenvalue of the graph, and has a frequency spectrum of $$g(\lambda)=\sum_{k=1}^K \alpha_k \lambda^k.$$ Hence, $$g(L)x=\sum_{k=1}^K \alpha_k L^k x=\sum_{k=1}^K \alpha_k U\Lambda^k U^T x=U \left(\sum_{k=1}^K \alpha_k\Lambda^k \right)U^T x=\operatorname{iGFT}\left(g(\Lambda)\operatorname{GFT}(x)\right).$$
With these ingredients, we have reduced the design of graph filters in the vertex domain to a regression task that approximates a given spectral response by a polynomial. There are multiple ways to do this, but in this assignment we will implement a very simple strategy based on least-squares regression.
Implement a function to find the coefficients of a polynomial that approximates a given ideal filter.
Hint: np.vander
and np.linalg.lstsq
.
def fit_polynomial(lam: np.ndarray, order: int, spectral_response: np.ndarray):
""" Return an array of polynomial coefficients of length 'order'."""
# Your code here
Implement a function to compute the frequency response of that filter.
def polynomial_graph_filter_response(coeff: np.array, lam: np.ndarray):
""" Return an array of the same shape as lam.
response[i] is the spectral response at frequency lam[i]. """
# Your code here
Let us fit the Tikhonov ideal filter with several polynomials of different order.
plt.plot(lam, ideal_tk)
orders = [1, 2, 3, 5, 10, 20]
for order in orders:
coeff_tk = fit_polynomial(lam, order, ideal_tk)
plt.plot(lam, polynomial_graph_filter_response(coeff_tk, lam))
plt.xlabel('$\lambda$')
plt.ylabel('Spectral response')
plt.legend(orders)
So far, we have only defined a way to compute the coefficients of our laplacian polynomial. Let us now compute our graph filter.
def polynomial_graph_filter(coeff: np.array, laplacian: np.ndarray):
""" Return the laplacian polynomial with coefficients 'coeff'. """
# Your code here
Based on the previous plot, choose a filter order that achieves (in your opinion) a good tradeoff in terms of computational complexity and response accuracy.
order = # Your code here
coeff_tk = fit_polynomial(lam, order, ideal_tk)
g_tk = polynomial_graph_filter(coeff_tk, laplacian)
As you have seen in class, polynomial graph filters are only one of the ways in which you can approximate ideal graph filters. In this sense, ARMA filters are a natural way to implement Tikhonov denoising on graphs. Let us recall the general solution of the Tikhonov regularized denoising problem
$$y=(I+\alpha L)^{-1}x. $$With a little bit of algebra manipulation we can rewrite this expression as $$ y = -\alpha L y + x, $$ from which we can derive the iterative algorithm $$ y_k = -\alpha L y_{k-1} + x\qquad k=1,2,\dots $$ which is guaranteed to converge as long as $\alpha \lambda_{max} < 1$.
Implement the ARMA version of Tikhonov regularization.
def arma_tikhonov(x: np.ndarray, laplacian: np.ndarray, alpha: float, max_iter=50):
""" Return an array of the same shape as x."""
# Your code here
Filter the previous noisy graph signal with the polynomial and ARMA approximations of the ideal Tikhonov filter.
x_tk_polynomial = # Your code here
x_tk_arma = arma_tikhonov(x_noisy, laplacian, alpha)
Let us compare with the previous version.
plt.figure(figsize=(18, 4))
plt.subplot(131, projection='3d')
plot_bunny(x_tk, title='Ideal filter', vlim=[min(x_tk), max(x_tk)])
plt.subplot(132, projection='3d')
plot_bunny(x_tk_polynomial, title='Polynomial filter', vlim=[min(x_tk), max(x_tk)])
plt.subplot(133, projection='3d')
plot_bunny(x_tk_arma, title='ARMA filter', vlim=[min(x_tk), max(x_tk)])
So far, we have only played with toy examples. Let us see the use of these tools in practice! In particular, let us see how we can use some graph filters to construct features to feed a classifier. For this part of the assignment we will import some extra packages.
import time
import networkx as nx
from sklearn.linear_model import LogisticRegression
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.function as fn
from dgl import DGLGraph
from dgl.data.citation_graph import load_cora
np.random.seed(0)
torch.manual_seed(1)
We will use the CORA dataset and the citation graph that we created in Assignment 1. However, to simplify the next tasks we will directly use the preprocessed version of this dataset contained within the Deep Graph Library (DGL).
In this assignment, we will interpret CORA's features as multidimensional graph signals living on the citation graph. Our task is to design a classifier that uses these features and the geometry of the graph can identify the type of paper each node represents.
The goal of this exercise is to do semi-supervised learning on graphs.
We assume that we know to which scientific field a small subset of the papers belongs (the ones contained in train_mask
).
The goal is to predict to which field the other papers belong, using both the citation graph and the bag-of-word representation of each paper.
cora = load_cora()
features = torch.FloatTensor(cora.features) # Feature vector for each paper
labels = torch.LongTensor(cora.labels) # The field to which each paper belongs
train_mask = torch.BoolTensor(cora.train_mask) # Mask of nodes selected for training
val_mask = torch.BoolTensor(cora.val_mask) # Mask of nodes selected for validation
test_mask = torch.BoolTensor(cora.test_mask) # Mask of nodes selected for testing
in_feats = features.shape[1]
n_classes = cora.num_labels
n_edges = cora.graph.number_of_edges()
graph = cora.graph
adjacency = np.asarray(nx.to_numpy_matrix(graph))
n_nodes = adjacency.shape[0]
For this exercise we will use the normalized laplacian.
laplacian = compute_laplacian(adjacency, normalize=True)
lam, U = spectral_decomposition(laplacian)
lam_max = np.max(lam)
The simplest classification method consists in ignoring the citation graph and trying to classify the papers using only the features.
In this case, the problem is viewed as a standard classification task.
To train our classifier we will select a few nodes in our graph for training and fit a logistic regression classifier on them.
To avoid overfitting to the test set when we do hyperparameter tuning, we will also select a validation set.
And finally, we will test our classifier on the rest of the nodes.
Hint: use sklearn.linear_model.LogisticRegression
.
train_features = features[train_mask]
train_labels = labels[train_mask]
val_features = features[val_mask]
val_labels = labels[val_mask]
test_features = features[test_mask]
test_labels = labels[test_mask]
# Fit a logistic regression model
# Your code here
train_acc = # Your code here
val_acc = # Your code here
test_acc = # Your code here
print('Train accuracy {:.4f} | Validation accuracy {:.4f} | Test accuracy {:.4f}'.format(train_acc, val_acc, test_acc))
That's not a bad start! Now, let's try to improve a bit the results by taking into account the graph structure using tools from GSP. For this purpose, we will design a handcrafted filter that will be used to denoise the signal, before feeding it to a logistic regression.
However, before we start, what hypothesis can you make on the spectral properties of the denoised signal?
Your answer here
Based on this prior, design an ideal filter response that you believe could enhance important features of the graph.
Note: you just need to design one graph filter that we will apply to all features. Don't design a different filter for each feature.
Note: finding the right filter can be very challenging, don't worry if you can't find it. Just make sure you experiment with a few configurations and parameters.
ideal_filter = # Store your spectral response here
Choose a filter order to approximate your filter using laplacian polynomials.
order = # Your code here
coeff = fit_polynomial(lam, order, ideal_filter)
graph_filter = polynomial_graph_filter(coeff, laplacian)
Let's plot the frequency response of your spectral template and its polynomial approximation.
plt.plot(lam, ideal_filter)
plt.plot(lam, polynomial_graph_filter_response(coeff, lam))
plt.legend(['Ideal', 'Polynomial'])
plt.xlabel('$\lambda$')
plt.ylabel('Spectral response')
Now, let's create the new features.
filtered_features = graph_filter @ features.numpy()
train_features = filtered_features[train_mask,:]
train_labels = labels[train_mask]
val_features = filtered_features[val_mask,:]
val_labels = labels[val_mask]
test_features = filtered_features[test_mask,:]
test_labels = labels[test_mask]
Train another logistic regression classifier on the new features. Remember to play with the regularization parameters to achieve a well performing model.
# Your code here
Evaluate your model.
train_acc = # Your code here
val_acc = # Your code here
test_acc = # Your code here
print('Train accuracy {:.4f} | Validation accuracy {:.4f} | Test accuracy {:.4f}'.format(train_acc, val_acc, test_acc))
By now, you will probably have seen that it is challenging to find the right combination of spectral response, filter parameters and regularization method. And in most cases, this is a painstaking job. Wouldn't it be great to automate these tasks?
Fortunately, this is possible if we use the right tools! Specifically, we will see that Graph Convolutional Networks are a great framework to automatize the feature extraction method.
In this exercise, we will follow the same classification pipeline as above, but instead of hand-crafting our filter we will let PyTorch
find the coefficients for us using gradient descent.
In this section, most of the code is already written. Try to understand it and to play with some parameters. It may be useful if you want to solve some learning task in your project.
We start by constructing a LaplacianPolynomial
model in DGL
. It computes the function: $f(X) = \sum_{i=1}^{k} \alpha_i L^i X \theta$ where the trainable parameters are the coefficients $\alpha_i$ and the matrix $\theta$. This function can be interpreted as a filtering of $X$ by $\sum_{i=1}^{k} \alpha_i L^i$ followed by a linear layer.
class LaplacianPolynomial(nn.Module):
def __init__(self,
in_feats: int,
out_feats: int,
k: int,
dropout_prob: float,
norm=True):
super().__init__()
self._in_feats = in_feats
self._out_feats = out_feats
self._k = k
self._norm = norm
# Contains the weights learned by the Laplacian polynomial
self.pol_weights = nn.Parameter(torch.Tensor(self._k + 1))
# Contains the weights learned by the logistic regression (without bias)
self.logr_weights = nn.Parameter(torch.Tensor(in_feats, out_feats))
self.dropout = nn.Dropout(p=dropout_prob)
self.reset_parameters()
def reset_parameters(self):
"""Reinitialize learnable parameters."""
torch.manual_seed(0)
torch.nn.init.xavier_uniform_(self.logr_weights, gain=0.01)
torch.nn.init.normal_(self.pol_weights, mean=0.0, std=1e-3)
def forward(self, graph, feat):
r"""Compute graph convolution.
Notes
-----
* Input shape: :math:`(N, *, \text{in_feats})` where * means any number of additional
dimensions, :math:`N` is the number of nodes.
* Output shape: :math:`(N, *, \text{out_feats})` where all but the last dimension are
the same shape as the input.
Parameters
----------
graph (DGLGraph) : The graph.
feat (torch.Tensor): The input feature
Returns
-------
(torch.Tensor) The output feature
"""
feat = self.dropout(feat)
graph = graph.local_var()
# D^(-1/2)
norm = torch.pow(graph.in_degrees().float().clamp(min=1), -0.5)
shp = norm.shape + (1,) * (feat.dim() - 1)
norm = torch.reshape(norm, shp)
# mult W first to reduce the feature size for aggregation.
feat = torch.matmul(feat, self.logr_weights)
result = self.pol_weights[0] * feat.clone()
for i in range(1, self._k + 1):
old_feat = feat.clone()
if self._norm:
feat = feat * norm
graph.ndata['h'] = feat
# Feat is not modified in place
graph.update_all(fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'))
if self._norm:
graph.ndata['h'] = graph.ndata['h'] * norm
feat = old_feat - graph.ndata['h']
result += self.pol_weights[i] * feat
return result
def extra_repr(self):
"""Set the extra representation of the module,
which will come into effect when printing the model.
"""
summary = 'in={_in_feats}, out={_out_feats}'
summary += ', normalization={_norm}'
return summary.format(**self.__dict__)
Once we have are model ready we just need to create a function that performs one step of our training loop, and another one that evaluates our model.
def train(model, g, features, labels, loss_fcn, train_mask, optimizer):
model.train() # Activate dropout
logits = model(g, features)
loss = loss_fcn(logits[train_mask], labels[train_mask])
optimizer.zero_grad()
loss.backward()
optimizer.step()
return loss
def evaluate(model, g, features, labels, mask):
model.eval() # Deactivate dropout
with torch.no_grad():
logits = model(g, features)[mask] # only compute the evaluation set
labels = labels[mask]
_, indices = torch.max(logits, dim=1)
correct = torch.sum(indices == labels)
return correct.item() * 1.0 / len(labels)
Choose the training parameters.
pol_order = 3
lr = 0.2
weight_decay = 5e-6
n_epochs = 1000
p_dropout = 0.8
And train the classifier end to end.
graph = DGLGraph(cora.graph)
model = LaplacianPolynomial(in_feats, n_classes, pol_order, p_dropout)
loss_fcn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=lr,
weight_decay=weight_decay)
dur = []
for epoch in range(n_epochs):
if epoch >= 3:
t0 = time.time()
loss = train(model, graph, features, labels, loss_fcn, train_mask, optimizer)
if epoch >= 3:
dur.append(time.time() - t0)
acc = evaluate(model, graph, features, labels, val_mask)
print("Epoch {:05d} | Time(s) {:.4f} | Train Loss {:.4f} | Val Accuracy {:.4f}". format(
epoch, np.mean(dur), loss.item(), acc))
print()
acc = evaluate(model, graph, features, labels, test_mask)
print("Test Accuracy {:.4f}".format(acc))
Trained this way our GCN based on polynomials of the laplacian is a black box. Fortunately, however, the only difference between this shallow model and our previous classifier is the way we chose the filter coefficients.
Let's see what the network learned. Print the coefficients of the learned filter.
coeff_gcn = # Your code here
print(coeff_gcn)
To interpret the model we can plot the frequency response of the learned filter.
plt.semilogy(lam, np.abs(polynomial_graph_filter_response(coeff_gcn, lam)))
plt.xlabel('$\lambda$')
plt.ylabel('Spectral response (db)')
As we said, the whole classification pipeline of the previous exercise is identical to the one we tried before: Graph filtering + Logistic regression. The only difference lies in the way we chose the filter coefficients. First we were choosing them manually, and now, we let PyTorch
find them for us. However, if everything is correct we should be able to use this filter to construct new hand-crafted features and train a logistic regression model that achieves good accuracy on the training set. Let's do that!
Use the learned coefficients to train a new feature extractor:
graph_gcn_filter = # Your code here
Let's extract the new features by filtering the data:
features_gcn = graph_gcn_filter @ features.numpy()
train_features_gcn = features_gcn[train_mask,:]
train_labels = labels[train_mask]
val_features_gcn = features_gcn[val_mask,:]
val_labels = labels[val_mask]
test_features_gcn = features_gcn[test_mask,:]
test_labels = labels[test_mask]
Train a logistic regression on these features:
# Your code here
Finally, let's evaluate this model:
train_acc = # Your code here
val_acc = # Your code here
test_acc = # Your code here
print('Train accuracy {:.4f} | Validation accuracy {:.4f} | Test accuracy {:.4f}'.format(train_acc, val_acc, test_acc))
The performance of this model may not be exactly the same as the one obtained with Pytorch. What are the differences in the training procedure that can explain this gap?
Your answer here