What is this?¶

This notebook is an almost one to one copy from Marks Needham's Neo4j Graph Algorithms Jupyter Notebooks.

The purpose of this notebook is to demonstrate how the original Jupyter Python notebooks culd look like with the Cypher kernel.

Contents¶

Path finding

All Pairs Shortest Path
Single Source Shortest Path

Centrality

Degree Centrality
Closeness Centrality
Betweenness Centrality

Community Detection

Strongly Connected Components
Weighted Union Find
Unweighted Union Find
Label Propagation
Louvain
Triangle Counting

The Setup¶

If you do not have a Neo4j database running with APOC installed, then you could use the following:

In [ ]:

%%bash
rm -rf plugins
mkdir plugins
cd plugins
echo "Downloading APOC plugin"
wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.3.0.1/apoc-3.3.0.1-all.jar
echo "Downloading algorithms plugin"
wget https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases/download/3.3.2.0/graph-algorithms-algo-3.3.2.0.jar
cd ..

In [2]:

%%bash
docker run \
    -d --name myneo4j \
    --publish=7474:7474 \
    --publish=7687:7687 \
    --volume=$(pwd)/plugins:/plugins \
    --env NEO4J_AUTH=neo4j/class \
    --env=NEO4J_dbms_security_procedures_unrestricted=apoc.\\\*,algo.\\\* \
    neo4j

Out[2]:

dd8ea088ca50b68f80129a1805e351e1ce9404ad055bc0a58993e08ac62d9f80

All Pairs Shortest Path¶

All Pairs Shortest Path (APSP) calculates the shortest (weighted) path between all pairs of nodes. This algorithm has optimisations that make it quicker than calling the SSSP algorithm for every pair of nodes in the graph.

In [11]:

CREATE (a:Loc{name:'A'}), (b:Loc{name:'B'}), (c:Loc{name:'C'}), 
       (d:Loc{name:'D'}), (e:Loc{name:'E'}), (f:Loc{name:'F'}),
       (a)-[:ROAD {cost:50}]->(b),
       (a)-[:ROAD {cost:50}]->(c),
       (a)-[:ROAD {cost:100}]->(d),
       (a)-[:RAIL {cost:50}]->(d),
       (b)-[:ROAD {cost:40}]->(d),
       (c)-[:ROAD {cost:40}]->(d),
       (c)-[:ROAD {cost:80}]->(e),
       (d)-[:ROAD {cost:30}]->(e),
       (d)-[:ROAD {cost:80}]->(f),
       (e)-[:ROAD {cost:40}]->(f),
       (e)-[:RAIL {cost:20}]->(f);

Out[11]:

In [12]:

MATCH (n) 
RETURN labels(n) AS labels, COUNT(*) AS count
ORDER BY count DESC

Out[12]:

+-----------------+
| labels  | count |
+-----------------+
| ["Loc"] | 6     |
+-----------------+

1 row available after 1 ms, consumed after another 0 ms

In [13]:

MATCH (n:Loc)-[r]->(m:Loc)
RETURN n,r,m;

Out[13]:

+----------------------------------------------------------------------------------------------------+
| n                            | r                                    | m                            |
+----------------------------------------------------------------------------------------------------+
| (:Loc {name: "A", _id_: 46}) | [:ROAD {_id_: 52, cost: 50}[46>48]]  | (:Loc {name: "C", _id_: 48}) |
| (:Loc {name: "A", _id_: 46}) | [:ROAD {_id_: 53, cost: 100}[46>49]] | (:Loc {name: "D", _id_: 49}) |
| (:Loc {name: "A", _id_: 46}) | [:RAIL {_id_: 54, cost: 50}[46>49]]  | (:Loc {name: "D", _id_: 49}) |
| (:Loc {name: "A", _id_: 46}) | [:ROAD {_id_: 51, cost: 50}[46>47]]  | (:Loc {name: "B", _id_: 47}) |
| (:Loc {name: "B", _id_: 47}) | [:ROAD {_id_: 55, cost: 40}[47>49]]  | (:Loc {name: "D", _id_: 49}) |
| (:Loc {name: "C", _id_: 48}) | [:ROAD {_id_: 56, cost: 40}[48>49]]  | (:Loc {name: "D", _id_: 49}) |
| (:Loc {name: "C", _id_: 48}) | [:ROAD {_id_: 57, cost: 80}[48>50]]  | (:Loc {name: "E", _id_: 50}) |
| (:Loc {name: "D", _id_: 49}) | [:ROAD {_id_: 58, cost: 30}[49>50]]  | (:Loc {name: "E", _id_: 50}) |
| (:Loc {name: "D", _id_: 49}) | [:ROAD {_id_: 59, cost: 80}[49>51]]  | (:Loc {name: "F", _id_: 51}) |
| (:Loc {name: "E", _id_: 50}) | [:ROAD {_id_: 60, cost: 40}[50>51]]  | (:Loc {name: "F", _id_: 51}) |
| (:Loc {name: "E", _id_: 50}) | [:RAIL {_id_: 61, cost: 20}[50>51]]  | (:Loc {name: "F", _id_: 51}) |
+----------------------------------------------------------------------------------------------------+

11 rows available after 26 ms, consumed after another 3 ms

In [14]:

CALL algo.allShortestPaths.stream('cost',{nodeQuery:'Loc',defaultValue:1.0})
YIELD sourceNodeId, targetNodeId, distance
WITH sourceNodeId, targetNodeId, distance 
WHERE algo.isFinite(distance) = true

MATCH (source:Loc) WHERE id(source) = sourceNodeId
MATCH (target:Loc) WHERE id(target) = targetNodeId
WITH source, target, distance WHERE source <> target

RETURN source.name AS source, target.name AS target, distance
ORDER BY distance DESC
LIMIT 10

Out[14]:

+----------------------------+
| source | target | distance |
+----------------------------+
| "A"    | "F"    | 100.0    |
| "B"    | "F"    | 90.0     |
| "C"    | "F"    | 90.0     |
| "A"    | "E"    | 80.0     |
| "B"    | "E"    | 70.0     |
| "C"    | "E"    | 70.0     |
| "A"    | "B"    | 50.0     |
| "A"    | "C"    | 50.0     |
| "D"    | "F"    | 50.0     |
| "A"    | "D"    | 50.0     |
+----------------------------+

10 rows available after 14 ms, consumed after another 0 ms

In [15]:

%%python
df

Out[15]:

  source target distance
0    "A"    "F"    100.0
1    "B"    "F"     90.0
2    "C"    "F"     90.0
3    "A"    "E"     80.0
4    "B"    "E"     70.0
5    "C"    "E"     70.0
6    "A"    "B"     50.0
7    "A"    "C"     50.0
8    "D"    "F"     50.0
9    "A"    "D"     50.0

Single Source Shortest Path¶

The Single Source Shortest Path (SSSP) algorithm calculates the shortest (weighted) path between a pair of nodes. Dijkstra's algorithm is the most well known one in this category. SSSP is a real time graph algorithm - it can be used as part of the normal user flow in a web or mobile application.

In [16]:

MATCH (start:Loc{name:'A'}), (end:Loc{name:'F'})
CALL algo.shortestPath.stream(start, end, 'cost') 
YIELD nodeId, cost
MATCH (other:Loc) WHERE id(other) = nodeId
RETURN other.name AS name, cost

Out[16]:

+--------------+
| name | cost  |
+--------------+
| "A"  | 0.0   |
| "D"  | 50.0  |
| "E"  | 80.0  |
| "F"  | 100.0 |
+--------------+

4 rows available after 1 ms, consumed after another 4 ms

Degree Centrality¶

Degree Centrality is the simplest of all the centrality algorithms. It measures the number of incoming and outgoing relationships from a node. The algorithm can help us find popular nodes in a graph.

In [17]:

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})
MERGE (nAlice)-[:FOLLOWS]->(nDoug)
MERGE (nAlice)-[:FOLLOWS]->(nBridget)
MERGE (nAlice)-[:FOLLOWS]->(nCharles)
MERGE (nMark)-[:FOLLOWS]->(nDoug)
MERGE (nMark)-[:FOLLOWS]->(nMichael)
MERGE (nBridget)-[:FOLLOWS]->(nDoug)
MERGE (nCharles)-[:FOLLOWS]->(nDoug)
MERGE (nMichael)-[:FOLLOWS]->(nDoug)

Out[17]:

In [18]:

MATCH (n:User)-[f:FOLLOWS]->(m:User)
RETURN n, f, m

Out[18]:

+------------------------------------------------------------------------------------------------------+
| n                                 | f                            | m                                 |
+------------------------------------------------------------------------------------------------------+
| (:User {_id_: 52, id: "Alice"})   | [:FOLLOWS {_id_: 62}[52>55]] | (:User {_id_: 55, id: "Doug"})    |
| (:User {_id_: 52, id: "Alice"})   | [:FOLLOWS {_id_: 63}[52>53]] | (:User {_id_: 53, id: "Bridget"}) |
| (:User {_id_: 52, id: "Alice"})   | [:FOLLOWS {_id_: 64}[52>54]] | (:User {_id_: 54, id: "Charles"}) |
| (:User {_id_: 53, id: "Bridget"}) | [:FOLLOWS {_id_: 67}[53>55]] | (:User {_id_: 55, id: "Doug"})    |
| (:User {_id_: 54, id: "Charles"}) | [:FOLLOWS {_id_: 68}[54>55]] | (:User {_id_: 55, id: "Doug"})    |
| (:User {_id_: 56, id: "Mark"})    | [:FOLLOWS {_id_: 66}[56>57]] | (:User {_id_: 57, id: "Michael"}) |
| (:User {_id_: 56, id: "Mark"})    | [:FOLLOWS {_id_: 65}[56>55]] | (:User {_id_: 55, id: "Doug"})    |
| (:User {_id_: 57, id: "Michael"}) | [:FOLLOWS {_id_: 69}[57>55]] | (:User {_id_: 55, id: "Doug"})    |
+------------------------------------------------------------------------------------------------------+

8 rows available after 24 ms, consumed after another 4 ms

Finally we can run the algorithm by executing the following query:

In [19]:

MATCH (u:User)
RETURN u.id AS name,
       size((u)-[:FOLLOWS]->()) AS follows,
       size((u)<-[:FOLLOWS]-()) AS followers

Out[19]:

+---------------------------------+
| name      | follows | followers |
+---------------------------------+
| "Alice"   | 3       | 0         |
| "Bridget" | 1       | 1         |
| "Charles" | 1       | 1         |
| "Doug"    | 0       | 5         |
| "Mark"    | 2       | 0         |
| "Michael" | 1       | 1         |
+---------------------------------+

6 rows available after 1 ms, consumed after another 1 ms

We can see that Doug is the most popular user in our imaginary Twitter graph with 5 followers - all other users follow him but he doesn't follow anybody back. In the real Twitter network celebrities have very high follower counts but tend to follow very few back people. We could therefore consider Doug a celebrity!

We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result.

In [20]:

MATCH (u:User)
set u.followers = size((u)<-[:FOLLOWS]-())

Out[20]:

In [21]:

MATCH (n:User)-[f:FOLLOWS]->(m:User)
RETURN n, f, m

Out[21]:

+----------------------------------------------------------------------------------------------------------------------------------+
| n                                               | f                            | m                                               |
+----------------------------------------------------------------------------------------------------------------------------------+
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:FOLLOWS {_id_: 62}[52>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:FOLLOWS {_id_: 63}[52>53]] | (:User {_id_: 53, followers: 1, id: "Bridget"}) |
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:FOLLOWS {_id_: 64}[52>54]] | (:User {_id_: 54, followers: 1, id: "Charles"}) |
| (:User {_id_: 53, followers: 1, id: "Bridget"}) | [:FOLLOWS {_id_: 67}[53>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
| (:User {_id_: 54, followers: 1, id: "Charles"}) | [:FOLLOWS {_id_: 68}[54>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
| (:User {_id_: 56, followers: 0, id: "Mark"})    | [:FOLLOWS {_id_: 66}[56>57]] | (:User {_id_: 57, followers: 1, id: "Michael"}) |
| (:User {_id_: 56, followers: 0, id: "Mark"})    | [:FOLLOWS {_id_: 65}[56>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
| (:User {_id_: 57, followers: 1, id: "Michael"}) | [:FOLLOWS {_id_: 69}[57>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
+----------------------------------------------------------------------------------------------------------------------------------+

8 rows available after 1 ms, consumed after another 3 ms

Closeness Centrality¶

Closeness Centrality is a way of detecting nodes that are able to spread information very efficiently through a graph.

The Closeness Centrality of a node measures its average distance to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.

In [22]:

MERGE (a:Node{id:"A"})
MERGE (b:Node{id:"B"})
MERGE (c:Node{id:"C"})
MERGE (d:Node{id:"D"})
MERGE (e:Node{id:"E"})

MERGE (a)-[:LINK]->(b)
MERGE (b)-[:LINK]->(a)
MERGE (b)-[:LINK]->(c)
MERGE (c)-[:LINK]->(b)
MERGE (c)-[:LINK]->(d)
MERGE (d)-[:LINK]->(c)
MERGE (d)-[:LINK]->(e)
MERGE (e)-[:LINK]->(d)

Out[22]:

In [23]:

MATCH (n:Node)-[f:LINK]->(m:Node)
RETURN n, f, m

Out[23]:

+---------------------------------------------------------------------------------------+
| n                           | f                         | m                           |
+---------------------------------------------------------------------------------------+
| (:Node {_id_: 58, id: "A"}) | [:LINK {_id_: 70}[58>59]] | (:Node {_id_: 59, id: "B"}) |
| (:Node {_id_: 59, id: "B"}) | [:LINK {_id_: 72}[59>60]] | (:Node {_id_: 60, id: "C"}) |
| (:Node {_id_: 59, id: "B"}) | [:LINK {_id_: 71}[59>58]] | (:Node {_id_: 58, id: "A"}) |
| (:Node {_id_: 60, id: "C"}) | [:LINK {_id_: 74}[60>61]] | (:Node {_id_: 61, id: "D"}) |
| (:Node {_id_: 60, id: "C"}) | [:LINK {_id_: 73}[60>59]] | (:Node {_id_: 59, id: "B"}) |
| (:Node {_id_: 61, id: "D"}) | [:LINK {_id_: 75}[61>60]] | (:Node {_id_: 60, id: "C"}) |
| (:Node {_id_: 61, id: "D"}) | [:LINK {_id_: 76}[61>62]] | (:Node {_id_: 62, id: "E"}) |
| (:Node {_id_: 62, id: "E"}) | [:LINK {_id_: 77}[62>61]] | (:Node {_id_: 61, id: "D"}) |
+---------------------------------------------------------------------------------------+

8 rows available after 17 ms, consumed after another 2 ms

The actual algorithm for computing the node centrality is computed with the following query:

In [24]:

CALL algo.closeness.stream('Node', 'LINK')
YIELD nodeId, centrality

MATCH (n:Node) WHERE id(n) = nodeId

RETURN n.id AS node, centrality
ORDER BY centrality DESC
limit 20;

Out[24]:

+---------------------------+
| node | centrality         |
+---------------------------+
| "C"  | 0.6666666666666666 |
| "B"  | 0.5714285714285714 |
| "D"  | 0.5714285714285714 |
| "A"  | 0.4                |
| "E"  | 0.4                |
+---------------------------+

5 rows available after 48 ms, consumed after another 0 ms

Betweenness Centrality¶

Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information in a graph.

It is often used to find nodes that serve as a bridge from one part of a graph to another. In the above example Alice is the main connection in the graph. If Alice is removed all connections in the graph would be cut off. This makes Alice "important" because she ensures that no nodes are isolated.

In [29]:

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:MANAGE]->(nBridget)
MERGE (nAlice)-[:MANAGE]->(nCharles)
MERGE (nAlice)-[:MANAGE]->(nDoug)
MERGE (nMark)-[:MANAGE]->(nAlice)
MERGE (nCharles)-[:MANAGE]->(nMichael)

Out[29]:

In [30]:

MATCH (n:User)-[f:MANAGE]->(m:User)
RETURN n, f, m

Out[30]:

+----------------------------------------------------------------------------------------------------------------------------------+
| n                                               | f                            | m                                               |
+----------------------------------------------------------------------------------------------------------------------------------+
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:MANAGE {_id_: 79}[52>54]]  | (:User {_id_: 54, followers: 1, id: "Charles"}) |
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:MANAGE {_id_: 78}[52>53]]  | (:User {_id_: 53, followers: 1, id: "Bridget"}) |
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:MANAGE {_id_: 100}[52>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
| (:User {_id_: 54, followers: 1, id: "Charles"}) | [:MANAGE {_id_: 102}[54>57]] | (:User {_id_: 57, followers: 1, id: "Michael"}) |
| (:User {_id_: 56, followers: 0, id: "Mark"})    | [:MANAGE {_id_: 101}[56>52]] | (:User {_id_: 52, followers: 0, id: "Alice"})   |
+----------------------------------------------------------------------------------------------------------------------------------+

5 rows available after 18 ms, consumed after another 0 ms

In [31]:

CALL algo.betweenness.stream('User','MANAGE',{direction:'out'}) 
YIELD nodeId, centrality

MATCH (user:User) WHERE id(user) = nodeId

RETURN user.id AS user,centrality
ORDER BY centrality DESC
LIMIT 20;

Out[31]:

+------------------------+
| user      | centrality |
+------------------------+
| "Alice"   | 4.0        |
| "Charles" | 2.0        |
| "Bridget" | 0.0        |
| "Doug"    | 0.0        |
| "Mark"    | 0.0        |
| "Michael" | 0.0        |
+------------------------+

6 rows available after 7 ms, consumed after another 0 ms

Strongly Connected Components¶

The Strongly Connected Components (SCC) algorithm finds sets of connected nodes in a directed graph where each node is reachable in both directions from any other node in the same set. It is often used early in a graph analysis process to help us get an idea of how our graph is structured.

In [32]:

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:FOLLOW]->(nBridget)
MERGE (nAlice)-[:FOLLOW]->(nCharles)
MERGE (nMark)-[:FOLLOW]->(nDoug)
MERGE (nMark)-[:FOLLOW]->(nMichael)
MERGE (nBridget)-[:FOLLOW]->(nMichael)
MERGE (nDoug)-[:FOLLOW]->(nMark)
MERGE (nMichael)-[:FOLLOW]->(nAlice)
MERGE (nAlice)-[:FOLLOW]->(nMichael)
MERGE (nBridget)-[:FOLLOW]->(nAlice)
MERGE (nMichael)-[:FOLLOW]->(nBridget)

Out[32]:

In [33]:

MATCH (n:User)-[f:FOLLOW]->(m:User)
RETURN n, f, m

Out[33]:

+----------------------------------------------------------------------------------------------------------------------------------+
| n                                               | f                            | m                                               |
+----------------------------------------------------------------------------------------------------------------------------------+
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:FOLLOW {_id_: 104}[52>54]] | (:User {_id_: 54, followers: 1, id: "Charles"}) |
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:FOLLOW {_id_: 110}[52>57]] | (:User {_id_: 57, followers: 1, id: "Michael"}) |
| (:User {_id_: 52, followers: 0, id: "Alice"})   | [:FOLLOW {_id_: 103}[52>53]] | (:User {_id_: 53, followers: 1, id: "Bridget"}) |
| (:User {_id_: 53, followers: 1, id: "Bridget"}) | [:FOLLOW {_id_: 107}[53>57]] | (:User {_id_: 57, followers: 1, id: "Michael"}) |
| (:User {_id_: 53, followers: 1, id: "Bridget"}) | [:FOLLOW {_id_: 111}[53>52]] | (:User {_id_: 52, followers: 0, id: "Alice"})   |
| (:User {_id_: 55, followers: 5, id: "Doug"})    | [:FOLLOW {_id_: 108}[55>56]] | (:User {_id_: 56, followers: 0, id: "Mark"})    |
| (:User {_id_: 56, followers: 0, id: "Mark"})    | [:FOLLOW {_id_: 105}[56>55]] | (:User {_id_: 55, followers: 5, id: "Doug"})    |
| (:User {_id_: 56, followers: 0, id: "Mark"})    | [:FOLLOW {_id_: 106}[56>57]] | (:User {_id_: 57, followers: 1, id: "Michael"}) |
| (:User {_id_: 57, followers: 1, id: "Michael"}) | [:FOLLOW {_id_: 109}[57>52]] | (:User {_id_: 52, followers: 0, id: "Alice"})   |
| (:User {_id_: 57, followers: 1, id: "Michael"}) | [:FOLLOW {_id_: 112}[57>53]] | (:User {_id_: 53, followers: 1, id: "Bridget"}) |
+----------------------------------------------------------------------------------------------------------------------------------+

10 rows available after 15 ms, consumed after another 2 ms

In [34]:

CALL algo.scc.stream('User','FOLLOW')
YIELD nodeId, partition
MATCH (u:User) WHERE id(u) = nodeId
RETURN u.id AS name, partition

Out[34]:

+-----------------------+
| name      | partition |
+-----------------------+
| "Alice"   | 0         |
| "Bridget" | 0         |
| "Charles" | 2         |
| "Doug"    | 3         |
| "Mark"    | 3         |
| "Michael" | 0         |
+-----------------------+

6 rows available after 24 ms, consumed after another 9 ms

In [35]:

%%python
df

Out[35]:

        name partition
0    "Alice"         0
1  "Bridget"         0
2  "Charles"         2
3     "Doug"         3
4     "Mark"         3
5  "Michael"         0

We have 3 strongly connected components in our sample graph. The first and biggest component has members Alice, Bridget, and Michael, while the second component has Doug and Mark. Charles ends up in his own component becuase there isn't an outgoing relationship from that node to any of the others.

We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result.

In [36]:

CALL algo.scc('User','FOLLOW', {write:true,partitionProperty:'partition'})
YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize;

Out[36]:

+-------------------------------------------------------------------------------+
| loadMillis | computeMillis | writeMillis | setCount | maxSetSize | minSetSize |
+-------------------------------------------------------------------------------+
| 0          | 0             | 19          | 3        | 3          | 1          |
+-------------------------------------------------------------------------------+

1 row available after 65 ms, consumed after another 5 ms

Weighted Connected Components¶

The Connected Components or UnionFind algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set. It differs from the SCC algorithm because it only needs a path to exist between pairs of nodes in one direction, whereas SCC needs a path to exist in both directions. As with SCC, UnionFind is often used early in an analysis to understand a graph's structure.

In [37]:

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:FRIEND {weight:0.5}]->(nBridget)
MERGE (nAlice)-[:FRIEND {weight:4}]->(nCharles)
MERGE (nMark)-[:FRIEND {weight:1}]->(nDoug)
MERGE (nMark)-[:FRIEND {weight:2}]->(nMichael);

Out[37]:

In [38]:

MATCH (n:User)-[f:FRIEND]->(m:User)
RETURN n, f, m

Out[38]:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                           | f                                         | m                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:User {_id_: 52, followers: 0, partition: 0, id: "Alice"}) | [:FRIEND {weight: 0.5, _id_: 113}[52>53]] | (:User {_id_: 53, followers: 1, partition: 0, id: "Bridget"}) |
| (:User {_id_: 52, followers: 0, partition: 0, id: "Alice"}) | [:FRIEND {weight: 4, _id_: 114}[52>54]]   | (:User {_id_: 54, followers: 1, partition: 2, id: "Charles"}) |
| (:User {_id_: 56, followers: 0, partition: 3, id: "Mark"})  | [:FRIEND {weight: 1, _id_: 115}[56>55]]   | (:User {_id_: 55, followers: 5, partition: 3, id: "Doug"})    |
| (:User {_id_: 56, followers: 0, partition: 3, id: "Mark"})  | [:FRIEND {weight: 2, _id_: 116}[56>57]]   | (:User {_id_: 57, followers: 1, partition: 0, id: "Michael"}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

4 rows available after 14 ms, consumed after another 1 ms

In [39]:

CALL algo.unionFind.stream('User', 'FRIEND', {weightProperty:'weight', defaultValue:0.0, threshold:1.0, concurrency: 1})
YIELD nodeId,setId

MATCH (u:User) WHERE id(u) = nodeId

RETURN u.id AS user, setId

Out[39]:

+-------------------+
| user      | setId |
+-------------------+
| "Alice"   | 0     |
| "Bridget" | 1     |
| "Charles" | 0     |
| "Doug"    | 4     |
| "Mark"    | 4     |
| "Michael" | 4     |
+-------------------+

6 rows available after 21 ms, consumed after another 10 ms

In this case we can see that because the weight of the relationship betwen Bridget and Alice is only 0.5, the relationship is ignored by the algorithm and Bridget ends up in her own component.

We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result.

In [40]:

CALL algo.unionFind('User', 'FRIEND', {write:true, partitionProperty:"partition",weightProperty:'weight', defaultValue:0.0, threshold:1.0, concurrency: 1})
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis;

Out[40]:

+-------------------------------------------------------------+
| nodes | setCount | loadMillis | computeMillis | writeMillis |
+-------------------------------------------------------------+
| 6     | 3        | 7          | 0             | 3           |
+-------------------------------------------------------------+

1 row available after 24 ms, consumed after another 0 ms

Unweighted Connected Components¶

In [41]:

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:FRIEND {weight:0.5}]->(nBridget)
MERGE (nAlice)-[:FRIEND {weight:4}]->(nCharles)
MERGE (nMark)-[:FRIEND {weight:1}]->(nDoug)
MERGE (nMark)-[:FRIEND {weight:2}]->(nMichael);

Out[41]:

In [43]:

MATCH (n:User)-[f:FRIEND]->(m:User)
RETURN n, f, m

Out[43]:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                           | f                                         | m                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:User {_id_: 52, followers: 0, partition: 0, id: "Alice"}) | [:FRIEND {weight: 0.5, _id_: 113}[52>53]] | (:User {_id_: 53, followers: 1, partition: 1, id: "Bridget"}) |
| (:User {_id_: 52, followers: 0, partition: 0, id: "Alice"}) | [:FRIEND {weight: 4, _id_: 114}[52>54]]   | (:User {_id_: 54, followers: 1, partition: 0, id: "Charles"}) |
| (:User {_id_: 56, followers: 0, partition: 4, id: "Mark"})  | [:FRIEND {weight: 1, _id_: 115}[56>55]]   | (:User {_id_: 55, followers: 5, partition: 4, id: "Doug"})    |
| (:User {_id_: 56, followers: 0, partition: 4, id: "Mark"})  | [:FRIEND {weight: 2, _id_: 116}[56>57]]   | (:User {_id_: 57, followers: 1, partition: 4, id: "Michael"}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

4 rows available after 2 ms, consumed after another 3 ms

In [42]:

CALL algo.unionFind.stream('User', 'FRIEND', {}) 
YIELD nodeId,setId

MATCH (u:User) WHERE id(u) = nodeId

RETURN u.id AS user, setId

Out[42]:

+-------------------+
| user      | setId |
+-------------------+
| "Alice"   | 0     |
| "Bridget" | 0     |
| "Charles" | 0     |
| "Doug"    | 4     |
| "Mark"    | 4     |
| "Michael" | 4     |
+-------------------+

6 rows available after 19 ms, consumed after another 5 ms

We have two distinct group of users that have no link between them. The first group contains Alice, Charles, and Bridget, while the second group contains Michael, Doug, and Mark. We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result.

In [44]:

CALL algo.unionFind('User', 'FRIEND', {write:true, partitionProperty:"partition"}) 
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis;

Out[44]:

+-------------------------------------------------------------+
| nodes | setCount | loadMillis | computeMillis | writeMillis |
+-------------------------------------------------------------+
| 6     | 2        | 1          | 0             | 0           |
+-------------------------------------------------------------+

1 row available after 10 ms, consumed after another 0 ms

Label Propagation¶

Label Propagation (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide and doesn't require a pre-defined objective function or prior information about the communities.

One interesting feature of LPA is that we can give nodes preliminary labels to narrow down the range of solutions generated. This means that it can be used as semi-supervised way of finding communities where we hand-pick some initial communities.

In [45]:

MERGE (nAlice:User {id:'Alice'}) SET nAlice.predefined_label=52
MERGE (nBridget:User {id:'Bridget'}) SET nBridget.predefined_label=21
MERGE (nCharles:User {id:'Charles'}) SET nCharles.predefined_label=43
MERGE (nDoug:User {id:'Doug'}) SET nDoug.predefined_label=21
MERGE (nMark:User {id:'Mark'}) SET nMark.predefined_label=19
MERGE (nMichael:User {id:'Michael'}) SET nMichael.predefined_label=52

MERGE (nAlice)-[:FOLLOW]->(nBridget)
MERGE (nAlice)-[:FOLLOW]->(nCharles)
MERGE (nMark)-[:FOLLOW]->(nDoug)
MERGE (nBridget)-[:FOLLOW]->(nMichael)
MERGE (nDoug)-[:FOLLOW]->(nMark)
MERGE (nMichael)-[:FOLLOW]->(nAlice)
MERGE (nAlice)-[:FOLLOW]->(nMichael)
MERGE (nBridget)-[:FOLLOW]->(nAlice)
MERGE (nMichael)-[:FOLLOW]->(nBridget)
MERGE (nCharles)-[:FOLLOW]->(nDoug)

Out[45]:

In [46]:

MATCH (n:User)-[f:FOLLOW]->(m:User)
RETURN n, f, m

Out[46]:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                   | f                            | m                                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: "Alice"})   | [:FOLLOW {_id_: 104}[52>54]] | (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 0, id: "Charles"}) |
| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: "Alice"})   | [:FOLLOW {_id_: 110}[52>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael"}) |
| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: "Alice"})   | [:FOLLOW {_id_: 103}[52>53]] | (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: "Bridget"}) |
| (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: "Bridget"}) | [:FOLLOW {_id_: 107}[53>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael"}) |
| (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: "Bridget"}) | [:FOLLOW {_id_: 111}[53>52]] | (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: "Alice"})   |
| (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 0, id: "Charles"}) | [:FOLLOW {_id_: 83}[54>55]]  | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: "Doug"})    |
| (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: "Doug"})    | [:FOLLOW {_id_: 108}[55>56]] | (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: "Mark"})    |
| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: "Mark"})    | [:FOLLOW {_id_: 105}[56>55]] | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: "Doug"})    |
| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: "Mark"})    | [:FOLLOW {_id_: 106}[56>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael"}) |
| (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael"}) | [:FOLLOW {_id_: 109}[57>52]] | (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: "Alice"})   |
| (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael"}) | [:FOLLOW {_id_: 112}[57>53]] | (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: "Bridget"}) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

11 rows available after 2 ms, consumed after another 3 ms

In [48]:

CALL algo.labelPropagation.stream("User", "FOLLOW", {direction: "OUTGOING", iterations: 10})

Out[48]:

In [50]:

CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING',
  {iterations:10,partitionProperty:'partition', write:true})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;

Out[50]:

+-------------------------------------------------------------------------------------------+
| nodes | iterations | loadMillis | computeMillis | writeMillis | write | partitionProperty |
+-------------------------------------------------------------------------------------------+
| 6     | 2          | 1          | 3             | 1           | TRUE  | "partition"       |
+-------------------------------------------------------------------------------------------+

1 row available after 18 ms, consumed after another 2 ms

Louvain¶

The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are compared to how connected they would be in a random network.

The Louvain algorithm is one of the fastest modularity based algorithms and works well with large graphs. It also reveals a hierarchy of communities at different scales, which can be useful for understanding the global functioning of a network.

In [51]:

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug)

Out[51]:

In [54]:

MATCH (n:User)-[f:FRIEND]->(m:User)
RETURN n, f, m

Out[54]:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                 | f                                         | m                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 4, id: "Alice", community: 0})   | [:FRIEND {_id_: 122}[52>57]]              | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael", community: 0}) |
| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 4, id: "Alice", community: 0})   | [:FRIEND {weight: 0.5, _id_: 113}[52>53]] | (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 4, id: "Bridget", community: 0}) |
| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 4, id: "Alice", community: 0})   | [:FRIEND {weight: 4, _id_: 114}[52>54]]   | (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 4, id: "Charles", community: 2}) |
| (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 4, id: "Bridget", community: 0}) | [:FRIEND {_id_: 120}[53>57]]              | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael", community: 0}) |
| (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 4, id: "Charles", community: 2}) | [:FRIEND {_id_: 121}[54>56]]              | (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: "Mark", community: 2})    |
| (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 4, id: "Charles", community: 2}) | [:FRIEND {_id_: 123}[54>55]]              | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: "Doug", community: 2})    |
| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: "Mark", community: 2})    | [:FRIEND {weight: 1, _id_: 115}[56>55]]   | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: "Doug", community: 2})    |
| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: "Mark", community: 2})    | [:FRIEND {weight: 2, _id_: 116}[56>57]]   | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: "Michael", community: 0}) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

8 rows available after 1 ms, consumed after another 1 ms

In [52]:

CALL algo.louvain.stream('User', 'FRIEND', {})
YIELD nodeId, community

MATCH (user:User) WHERE id(user) = nodeId

RETURN user.id AS user, community
ORDER BY community;

Out[52]:

+-----------------------+
| user      | community |
+-----------------------+
| "Alice"   | 0         |
| "Bridget" | 0         |
| "Michael" | 0         |
| "Charles" | 2         |
| "Doug"    | 2         |
| "Mark"    | 2         |
+-----------------------+

6 rows available after 59 ms, consumed after another 0 ms

Our algorithm found two communities with 3 members each. Mark, Doug, and Charles are all friends with each other, as are Bridget, Alice, and Michael. Charles is the only one who has friends in both communities but he has more in community 4 so he fits better in that one.

We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result.

In [53]:

CALL algo.louvain('User', 'FRIEND',
  {write:true, writeProperty:'community'})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis

Out[53]:

+--------------------------------------------------------------------------------+
| nodes | communityCount | iterations | loadMillis | computeMillis | writeMillis |
+--------------------------------------------------------------------------------+
| 6     | 2              | 1          | 2          | 12            | 4           |
+--------------------------------------------------------------------------------+

1 row available after 26 ms, consumed after another 1 ms

Triangle Counting¶

Triangle counting is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes where each node has a relationship to all other nodes.

In [55]:

CREATE (alice:Person{id:"Alice"}),
       (michael:Person{id:"Michael"}),
       (karin:Person{id:"Karin"}),
       (chris:Person{id:"Chris"}),
       (will:Person{id:"Will"}),
       (mark:Person{id:"Mark"})
CREATE (michael)-[:KNOWS]->(karin),
       (michael)-[:KNOWS]->(chris),
       (will)-[:KNOWS]->(michael),
       (mark)-[:KNOWS]->(michael),
       (mark)-[:KNOWS]->(will),
       (alice)-[:KNOWS]->(michael),
       (will)-[:KNOWS]->(chris),
       (chris)-[:KNOWS]->(karin);

Out[55]:

In [57]:

MATCH (n:Person)-[f:KNOWS]->(m:Person)
RETURN n, f, m

Out[57]:

+-------------------------------------------------------------------------------------------------------------+
| n                                    | f                             | m                                    |
+-------------------------------------------------------------------------------------------------------------+
| (:Person {_id_: 100, id: "Alice"})   | [:KNOWS {_id_: 145}[100>101]] | (:Person {_id_: 101, id: "Michael"}) |
| (:Person {_id_: 101, id: "Michael"}) | [:KNOWS {_id_: 140}[101>102]] | (:Person {_id_: 102, id: "Karin"})   |
| (:Person {_id_: 101, id: "Michael"}) | [:KNOWS {_id_: 141}[101>103]] | (:Person {_id_: 103, id: "Chris"})   |
| (:Person {_id_: 103, id: "Chris"})   | [:KNOWS {_id_: 147}[103>102]] | (:Person {_id_: 102, id: "Karin"})   |
| (:Person {_id_: 104, id: "Will"})    | [:KNOWS {_id_: 146}[104>103]] | (:Person {_id_: 103, id: "Chris"})   |
| (:Person {_id_: 104, id: "Will"})    | [:KNOWS {_id_: 142}[104>101]] | (:Person {_id_: 101, id: "Michael"}) |
| (:Person {_id_: 105, id: "Mark"})    | [:KNOWS {_id_: 144}[105>104]] | (:Person {_id_: 104, id: "Will"})    |
| (:Person {_id_: 105, id: "Mark"})    | [:KNOWS {_id_: 143}[105>101]] | (:Person {_id_: 101, id: "Michael"}) |
+-------------------------------------------------------------------------------------------------------------+

8 rows available after 12 ms, consumed after another 1 ms

In [58]:

CALL algo.triangle.stream('Person','KNOWS') 
yield nodeA,nodeB,nodeC

MATCH (a:Person) WHERE id(a) = nodeA
MATCH (b:Person) WHERE id(b) = nodeB
MATCH (c:Person) WHERE id(c) = nodeC

RETURN a.id AS nodeA, b.id AS nodeB, c.id AS nodeC

Out[58]:

+-------------------------------+
| nodeA     | nodeB   | nodeC   |
+-------------------------------+
| "Michael" | "Chris" | "Will"  |
| "Michael" | "Karin" | "Chris" |
| "Michael" | "Will"  | "Mark"  |
+-------------------------------+

3 rows available after 31 ms, consumed after another 6 ms

Cleaning up...¶

To delete everything in the database run:

In [10]:

MATCH (n) DETACH DELETE n

Out[10]:

To throw away the container, which we created above run:

In [ ]:

%%bash
docker stop myneo4j
docker rm myneo4j