
Introducing py2neo

py2neo is one of the most popular Python drivers for Neo4j. For simplicity, this example assumes that authentication is turned off.

You can turn authentication off by uncommenting this line in your neo4j.conf file:

dbms.security.auth_enabled=false

Now we'll import py2neo and write a simple query to find all the groups that have 'Python' in the name:

In [1]:

from py2neo import Graph
graph = Graph()

In [2]:

query = """
MATCH (group:Group)-[:HAS_TOPIC]->(topic)
WHERE group.name CONTAINS "Python" 
RETURN group.name, COLLECT(topic.name) AS topics
"""

result = graph.cypher.execute(query)

for row in result:
    print(row) 

 group.name               | topics                                                                                                                                                                                                                     
--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Python for Quant Finance | ['Data Mining', 'Computer programming', 'Data Analytics', 'Machine Learning', 'Predictive Analytics', 'Data Visualization', 'Big Data', 'Cloud Computing', 'Trading', 'Finance', 'Python', 'New Technology', 'Open Source']

 group.name                       | topics                                                                                                                                                                                                                            
----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Python and Django Coding Session | ['Front-end Development', 'HTML', 'Computer programming', 'Website Design', 'Programming Languages', 'Open Source', 'Software Development', 'Web Technology', 'Django', 'Web Development', 'Web Design', 'MySQL', 'Python', 'CSS']

 group.name                   | topics                                                                                                                                                                                              
------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 London Python Project Nights | ['New Technology', 'Technology', 'Python', 'Software Development', 'Open Source', 'Open source python', 'Computer programming', 'Projects', 'Python Web Development', 'Getting started with Python']

You should see a few groups, each with a list of its topics.
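
If you're new to Cypher, the grouping in this query may not be obvious: because `group.name` is returned next to the aggregate `COLLECT(topic.name)`, the rows are grouped by group name. A rough pure-Python equivalent, using made-up (group, topic) pairs rather than the real dataset, might look like this:

```python
from collections import defaultdict

# Hypothetical (group name, topic name) pairs standing in for the
# (group)-[:HAS_TOPIC]->(topic) relationships stored in Neo4j.
has_topic = [
    ("Python for Quant Finance", "Finance"),
    ("Python for Quant Finance", "Python"),
    ("London Java Community", "Java"),
    ("London Python Project Nights", "Python"),
]

def topics_by_group(pairs, needle):
    """Mimic WHERE group.name CONTAINS needle ... COLLECT(topic.name)."""
    collected = defaultdict(list)
    for group_name, topic_name in pairs:
        if needle in group_name:  # CONTAINS is a case-sensitive substring match
            collected[group_name].append(topic_name)
    return dict(collected)

for name, topics in topics_by_group(has_topic, "Python").items():
    print(name, topics)
```

The helper name `topics_by_group` and the sample pairs are assumptions made for this illustration only.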

Calculating topic similarity

Now that we've got the hang of executing Neo4j queries from Python, let's calculate topic similarity based on the groups that topics have in common so that we can use it in our queries.

We'll first import the igraph library:

In [7]:

from igraph import Graph as IGraph

Next we'll write a query which finds all pairs of topics and counts the groups they have in common. We'll use that count as our 'weight' in the similarity calculation.

In [5]:

query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
ORDER BY weight DESC
LIMIT 20
"""

graph.cypher.execute(query)

Out[5]:

    | topic.name            | other.name | weight
----+-----------------------+------------+--------
  1 | Open Source           | Python     |     13
  2 | Big Data              | Python     |     12
  3 | Computer programming  | Python     |     10
  4 | Software Development  | Python     |     10
  5 | Data Science          | Python     |      9
  6 | Web Development       | Python     |      8
  7 | Data Analytics        | Python     |      7
  8 | Machine Learning      | Python     |      7
  9 | Data Visualization    | Python     |      6
 10 | Data Mining           | Python     |      6
 11 | JavaScript            | Python     |      6
 12 | Hadoop                | Python     |      6
 13 | Cloud Computing       | Python     |      4
 14 | Ruby                  | Python     |      4
 15 | Predictive Analytics  | Python     |      4
 16 | Mobile Development    | Python     |      4
 17 | iOS Development       | Python     |      4
 18 | Programming Languages | Python     |      3
 19 | Apache Spark          | Python     |      3
 20 | nodeJS                | Python     |      3
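
The `WHERE ID(topic) < ID(other)` clause keeps each pair of topics once instead of twice (A, B and B, A). To make the counting concrete, here's a small pure-Python sketch of the same idea. The group memberships are invented for illustration, and sorting the topic names plays the role of the ID comparison:

```python
from collections import Counter
from itertools import combinations

# Hypothetical groups and their topics (not taken from the real dataset).
group_topics = {
    "group-a": ["Python", "Open Source", "Big Data"],
    "group-b": ["Python", "Open Source"],
    "group-c": ["Big Data", "Python"],
}

def common_group_counts(groups):
    """Count, for every unordered pair of topics, the groups that have both."""
    weights = Counter()
    for topics in groups.values():
        # sorted() fixes one order per pair, so (A, B) and (B, A) are not
        # counted separately; set() guards against duplicate topics.
        for pair in combinations(sorted(set(topics)), 2):
            weights[pair] += 1
    return weights

# most_common() lists the heaviest pairs first, like ORDER BY weight DESC.
for (topic, other), weight in common_group_counts(group_topics).most_common():
    print(topic, other, weight)
```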

Now let's run the query again, this time without the ORDER BY and LIMIT, and load the results into an igraph graph:

In [8]:

query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
"""

ig = IGraph.TupleList(graph.cypher.execute(query), weights=True)
ig

Out[8]:

<igraph.Graph at 0x10a5c5e58>
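
`TupleList` treats the first two items of each row as the endpoints of an edge and, with `weights=True`, stores the third as an edge attribute called `weight`. The sketch below mimics that bookkeeping in plain Python so you can see the shape of the data. It approximates the behaviour and is not igraph's actual implementation; `tuples_to_graph` is a hypothetical helper name:

```python
# Rows shaped like the query results. The first two come from the output
# shown earlier; the third is invented for illustration.
rows = [
    ("Open Source", "Python", 13),
    ("Big Data", "Python", 12),
    ("Big Data", "Open Source", 5),
]

def tuples_to_graph(rows):
    """Return (vertex names, edges as index pairs, edge weights)."""
    index_of = {}  # vertex name -> integer id, in order of first appearance
    edges, weights = [], []
    for source, target, weight in rows:
        for name in (source, target):
            index_of.setdefault(name, len(index_of))
        edges.append((index_of[source], index_of[target]))
        weights.append(weight)
    return list(index_of), edges, weights

names, edges, weights = tuples_to_graph(rows)
print(names)
print(edges)
print(weights)
```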

We're now ready to run a community detection algorithm over the graph to see what clusters/communities we have:

In [9]:

# Walktrap returns a dendrogram; as_clustering() cuts it into flat clusters.
dendrogram = ig.community_walktrap(weights="weight")
clusters = dendrogram.as_clustering()
len(clusters)

Out[9]:
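
Walktrap is based on the observation that short random walks tend to stay inside a densely connected community, so vertices in the same community "see" the rest of the graph in similar ways. As a rough illustration of that idea (an unweighted toy, not igraph's implementation, and ignoring the hierarchical merging step), the sketch below compares where short walks from two vertices end up. The graph and all function names are assumptions made for this example:

```python
import math

# A toy undirected graph: two triangles (0-1-2 and 3-4-5) joined by the
# single edge 2-3. Hypothetical data, not the topic graph.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def walk_distribution(adjacency, start, steps):
    """Probability of being at each vertex after `steps` random-walk steps."""
    probs = {v: 0.0 for v in adjacency}
    probs[start] = 1.0
    for _ in range(steps):
        following = {v: 0.0 for v in adjacency}
        for v, p in probs.items():
            for neighbour in adjacency[v]:
                following[neighbour] += p / len(adjacency[v])
        probs = following
    return probs

def walk_distance(adjacency, i, j, steps=2):
    """Degree-normalised distance between the walks started at i and j."""
    pi = walk_distribution(adjacency, i, steps)
    pj = walk_distribution(adjacency, j, steps)
    return math.sqrt(
        sum((pi[k] - pj[k]) ** 2 / len(adjacency[k]) for k in adjacency)
    )

adjacency = build_adjacency(toy_edges)
print(walk_distance(adjacency, 0, 1))  # two vertices in the same triangle
print(walk_distance(adjacency, 0, 5))  # vertices in opposite triangles
```

Vertices in the same triangle come out closer than vertices in opposite triangles, which is the signal the algorithm uses when deciding which communities to merge.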

Let's have a quick look at what we've got:

In [11]:

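# Build one dictionary per vertex, using the topic name as both id and label.
nodes = [{"id": vertex["name"], "label": vertex["name"]} for vertex in ig.vs]

# Attach the cluster that each topic was assigned to.
for node in nodes:
    idx = ig.vs.find(name=node["id"]).index
    node["group"] = clusters.membership[idx]

nodes[:10]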

Out[11]:

[{'group': 0, 'id': 'Computer programming', 'label': 'Computer programming'},
 {'group': 0, 'id': 'Geeks & Nerds', 'label': 'Geeks & Nerds'},
 {'group': 1, 'id': 'Data Science', 'label': 'Data Science'},
 {'group': 2, 'id': 'Sci-Fi/Fantasy', 'label': 'Sci-Fi/Fantasy'},
 {'group': 3, 'id': 'Cloud Computing', 'label': 'Cloud Computing'},
 {'group': 4, 'id': 'Social CRM', 'label': 'Social CRM'},
 {'group': 5, 'id': 'Hack', 'label': 'Hack'},
 {'group': 0,
  'id': 'Go programming language',
  'label': 'Go programming language'},
 {'group': 0, 'id': 'Front-end Development', 'label': 'Front-end Development'},
 {'group': 0, 'id': 'Finding a New Job', 'label': 'Finding a New Job'}]
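
A list of node dictionaries is convenient for writing back to Neo4j, but it's hard to judge the clusters from it. One way to eyeball the result is to invert the mapping so that each cluster lists its topics. The sketch below assumes node dictionaries shaped like the output above; `topics_by_cluster` is a hypothetical helper name:

```python
from collections import defaultdict

# A few node dictionaries copied from the output above.
sample_nodes = [
    {"group": 0, "id": "Computer programming", "label": "Computer programming"},
    {"group": 0, "id": "Geeks & Nerds", "label": "Geeks & Nerds"},
    {"group": 1, "id": "Data Science", "label": "Data Science"},
    {"group": 3, "id": "Cloud Computing", "label": "Cloud Computing"},
]

def topics_by_cluster(nodes):
    """Map each cluster id to the sorted list of topic names in it."""
    clusters = defaultdict(list)
    for node in nodes:
        clusters[node["group"]].append(node["id"])
    return {cluster: sorted(topics) for cluster, topics in clusters.items()}

for cluster, topics in sorted(topics_by_cluster(sample_nodes).items()):
    print(cluster, topics)
```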

Finally, we'll write a Cypher query that takes the results of our community detection algorithm and writes them back into Neo4j:

In [12]:

query = """
UNWIND {params} AS p 
MATCH (t:Topic {name: p.id}) 
MERGE (cluster:Cluster {name: p.group})
MERGE (t)-[:IN_CLUSTER]->(cluster)
"""

graph.cypher.execute(query, params=nodes)

Out[12]:
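
The combination of `UNWIND` and `MERGE` is what makes this write-back safe to re-run: `UNWIND` turns the list of dictionaries into one row per topic, and `MERGE` only creates a cluster node or relationship if it doesn't already exist. A plain-Python sketch of that behaviour, with sets standing in for the database and invented sample data, might look like this:

```python
# Sets standing in for the database. All names here are sample data.
topics = {"Data Science", "Cloud Computing", "Computer programming"}
clusters = set()    # Cluster nodes, identified by their name property
in_cluster = set()  # (topic name, cluster name) relationships

def write_clusters(params):
    """Simulate the UNWIND / MATCH / MERGE / MERGE query."""
    for p in params:                           # UNWIND {params} AS p
        if p["id"] not in topics:              # MATCH finds nothing: row dropped
            continue
        clusters.add(p["group"])               # MERGE (cluster:Cluster {...})
        in_cluster.add((p["id"], p["group"]))  # MERGE (t)-[:IN_CLUSTER]->(cluster)

params = [
    {"id": "Data Science", "group": 1},
    {"id": "Computer programming", "group": 0},
    {"id": "Unknown topic", "group": 0},
]
write_clusters(params)
write_clusters(params)  # a second run adds nothing new
print(sorted(clusters), sorted(in_cluster))
```

Rows whose topic can't be matched are silently skipped, so a misspelt topic name would not raise an error in the real query either.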