It turns out that NetworkX is rather weak in terms of community detection. And since installing additional software on our department computers is forbidden, we will use only what we have. The only thing we have is k-clique percolation.
import numpy as np
import sklearn.metrics as metrics
import scipy.io
from scipy.spatial.distance import pdist, squareform
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
plt.xkcd()
import networkx as nx
%matplotlib inline
Let's load our beloved Zachary Karate Club and run 3-clique percolation. Find the 3-clique communities and assign a cluster label to each node. Draw the graph with respect to the community assignment, and try to highlight the overlapping nodes.
G = nx.karate_club_graph()
cliqComm = nx.k_clique_communities(G, 3)
cliqComm=list(cliqComm)
cliqComm
[frozenset({0, 1, 2, 3, 7, 8, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 26, 27, 28, 29, 30, 31, 32, 33}), frozenset({0, 4, 5, 6, 10, 16}), frozenset({24, 25, 31})]
colorList=np.ones([G.order(),])
for c in range(len(cliqComm)):
    for i in cliqComm[c]:
        colorList[i] += c + 1
print(colorList)
[ 4. 2. 2. 2. 3. 3. 3. 2. 2. 1. 3. 1. 2. 2. 2. 2. 3. 2. 2. 2. 2. 2. 2. 2. 4. 4. 2. 2. 2. 2. 2. 5. 2. 2.]
nx.draw(G,
cmap = plt.get_cmap('Reds'),
node_color=colorList,
node_size=500,
alpha=0.8,
with_labels=True)
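The overlapping ("intersected") nodes can also be extracted directly, by counting in how many communities each node appears. A minimal sketch, reusing the same `k_clique_communities` call:

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.karate_club_graph()
communities = list(k_clique_communities(G, 3))

# count, for every node, the number of 3-clique communities it belongs to
membership = Counter(node for comm in communities for node in comm)

# nodes appearing in more than one community are the overlapping ones
overlapping = sorted(n for n, cnt in membership.items() if cnt > 1)
print(overlapping)  # → [0, 31], matching the frozensets printed above
```

This is why the `colorList` trick above works: summing `c + 1` over memberships gives overlapping nodes a colour no single community has.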
The following similarity measures are often used to evaluate the performance of a community detection (clustering) algorithm in situations where the dataset has a ground-truth partition (generators, real datasets).
Let's denote by $\hat{\pi}$ the outcome of the community detection algorithm and by $\pi^*$ the true partition. The ratio of correctly clustered observations is calculated as
$$ Acc(\hat{\pi}, \pi^*) = \frac{\text{# of correctly clustered obs}}{N} \text{,}$$
where $N$ is the total number of observations, and an observation is identified as "correctly clustered" if at least half of its co-clustered observations in the produced partition $\hat{\pi}$ belong to the same cluster in the partition $\pi^*$.
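This definition translates into a few lines of NumPy; `clustering_accuracy` below is a hypothetical helper written for this note, not a library function:

```python
import numpy as np

def clustering_accuracy(pred, true):
    """Share of 'correctly clustered' observations per the Acc definition above."""
    pred, true = np.asarray(pred), np.asarray(true)
    correct = 0
    for i in range(len(pred)):
        mates = np.where(pred == pred[i])[0]
        mates = mates[mates != i]            # co-clustered observations of i in pred
        # correctly clustered: at least half of the mates share i's true cluster
        if len(mates) == 0 or np.mean(true[mates] == true[i]) >= 0.5:
            correct += 1
    return correct / float(len(pred))

print(clustering_accuracy([0, 0, 0, 1, 1, 1], [1, 1, 2, 2, 2, 2]))  # → 0.8333...
```

In the toy example only the third observation fails the "at least half" test, giving 5/6.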
The Adjusted Rand Index is a chance-corrected version of the Rand index:
$$\text{ARI}(\hat{\pi},\pi^*) = \frac{\text{Rand}(\hat{\pi},\pi^*) - \text{Expected}}{\text{Max} - \text{Expected}}$$
As an alternative there is also a measure called Normalized Mutual Information (NMI). I call it an alternative because NMI and ARI usually give quite similar results.
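Both measures are available in scikit-learn. A quick sketch on toy labels, showing that they are invariant to a permutation of cluster labels:

```python
from sklearn import metrics

true = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
noisy = [0, 0, 1, 1, 1, 1]     # one observation moved to the other cluster

print(metrics.adjusted_rand_score(true, perfect))           # 1.0: label-permutation invariant
print(metrics.normalized_mutual_info_score(true, perfect))  # 1.0 as well
print(metrics.adjusted_rand_score(true, noisy))             # strictly between 0 and 1
```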
Yes, we have seen it before.
The Newman randomized version of the graph is obtained by rewiring the edges under the constraint that the expected degree of each vertex matches its degree in the original graph. The expected number of edges between vertices $v_i$ and $v_j$ in such a graph is $k_i k_j/2m$. Thus the modularity can be written as
\begin{equation} Q = \frac{1}{2m} \sum\limits_{ij}\left(a_{ij} - \frac{k_i k_j}{2m}\right)\delta(\mathcal{C}_i,\mathcal{C}_j), \end{equation}where $\delta(\cdot,\cdot)$ is the Kronecker delta and $\mathcal{C}_i$ is the cluster label of vertex $v_i$. The more the number of edges within a community exceeds the expected number of connections, the better the community is defined. So large values of modularity indicate good communities; however, this is not always true. Moreover, the maximum modularity of a graph generally grows as the size of the graph and/or the number of separated clusters increases.
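The formula maps directly onto NumPy. A sketch on a toy graph of two triangles joined by a single edge (the `modularity` helper is written for this note, not taken from a library):

```python
import numpy as np

def modularity(A, labels):
    """Q = (1/2m) * sum_ij (a_ij - k_i k_j / 2m) * delta(C_i, C_j), per the formula above."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                            # vertex degrees k_i
    two_m = A.sum()                              # 2m: each undirected edge counted twice
    delta = labels[:, None] == labels[None, :]   # Kronecker delta on cluster labels
    return float(((A - np.outer(k, k) / two_m) * delta).sum() / two_m)

# two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(modularity(A, [0, 0, 0, 1, 1, 1]))  # → 5/14 ≈ 0.357
```

Putting the whole graph into one community gives exactly $Q = 0$, since the observed and expected edge counts then coincide by construction.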
data = scipy.io.loadmat('network3.mat')
A = data['A'].astype('float')
plt.spy(A)
comm = data['Comm']
G = nx.Graph(A)
pos = nx.spring_layout(G, scale=5, iterations=100)
nx.draw(G,
pos,
node_color=comm,
alpha = 0.6,
with_labels = False)
D = pdist(A, metric = 'jaccard')
hc = hierarchy.average(D)
Z = hierarchy.dendrogram(hc)
comm2 = hierarchy.fcluster(hc, 4, criterion='maxclust')
comm = comm.flatten()
metrics.adjusted_rand_score(comm, comm2)
0.11638609695134744
nx.draw(G,
pos,
node_color=comm2,
alpha = 0.6,
with_labels = False)
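The call to `fcluster` hard-codes 4 clusters, and the resulting ARI is low; since `Comm` gives ground truth, one can sweep `maxclust` and compare ARI scores. A sketch on a synthetic two-block adjacency matrix, used as a stand-in because `network3.mat` is not reproduced here:

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist
from sklearn import metrics

rng = np.random.RandomState(0)

# synthetic two-block graph: dense inside blocks, sparse between them
n = 30
truth = np.repeat([0, 1], n // 2)
P = np.where(truth[:, None] == truth[None, :], 0.6, 0.05)
A = (rng.rand(n, n) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # symmetric adjacency, no self-loops

# same pipeline as above: Jaccard distances between rows, average linkage
hc = hierarchy.average(pdist(A.astype(bool), metric='jaccard'))

scores = {}
for k in range(2, 6):
    pred = hierarchy.fcluster(hc, k, criterion='maxclust')
    scores[k] = metrics.adjusted_rand_score(truth, pred)
print(scores)
```

On such a well-separated planted partition the correct `maxclust` should score a much higher ARI than the others.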