GraphScope provides a set of graph analysis interfaces compatible with NetworkX.
In this article, we show how to use GraphScope to perform graph analysis in the same way as with NetworkX.
A graph analysis workflow in NetworkX usually starts with building a graph.
In the following example, we first create an empty graph and then add data to it through the NetworkX interface.
# Install graphscope package if you are NOT in the Playground
!pip3 install graphscope
import networkx
# Initialize an empty graph
G = networkx.Graph()
# Add edges (1, 2) and (1, 3) by `add_edges_from` interface
G.add_edges_from([(1, 2), (1, 3)])
# Add vertex 4 by `add_node` interface
G.add_node(4)
Then we can query the graph information.
# Query the number of vertices by `number_of_nodes` interface.
G.number_of_nodes()
# Similarly, query the number of edges by `number_of_edges` interface.
G.number_of_edges()
# Query the degree of each vertex by `degree` interface.
sorted(d for n, d in G.degree())
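As a cross-check on the queries above, the same counts can be derived by hand from the edge list. The following is a minimal pure-Python sketch, not NetworkX or GraphScope code; the names `edges`, `nodes`, and `degree` are illustrative only.

```python
from collections import Counter

# Same data as the example: edges (1, 2) and (1, 3), plus isolated vertex 4.
edges = [(1, 2), (1, 3)]
nodes = {1, 2, 3, 4}

# Each undirected edge adds one to the degree of both of its endpoints.
degree = Counter({n: 0 for n in nodes})
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

sorted_degrees = sorted(degree.values())
print(len(nodes), len(edges), sorted_degrees)  # 4 2 [0, 1, 1, 2]

# Handshake lemma: the degrees sum to twice the number of edges.
assert sum(degree.values()) == 2 * len(edges)
```

The isolated vertex 4 accounts for the degree of 0, and vertex 1, which touches both edges, accounts for the degree of 2.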
Finally, we call NetworkX's built-in algorithms to analyze the graph G.
# Run 'connected components' algorithm
list(networkx.connected_components(G))
# Run 'clustering' algorithm
networkx.clustering(G)
Graph Building
To use the NetworkX interface of GraphScope, we just need to replace `import networkx as nx` with `import graphscope.nx as nx`.
Here we use the `nx.Graph()` interface to create an empty undirected graph.
import graphscope
graphscope.set_option(show_log=True)
import graphscope.nx as nx
# Initialize an empty graph
G = nx.Graph()
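Because the switch is a single import line, a script can in principle choose its backend at run time. The sketch below is one possible pattern, not something GraphScope prescribes; `first_importable` is a hypothetical helper, and the candidate list is an assumption about what is installed.

```python
import importlib


def first_importable(names):
    """Return (name, module) for the first importable module, or (None, None)."""
    for name in names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; try the next candidate.
            continue
    return None, None


# Prefer graphscope.nx, fall back to plain networkx if it is unavailable.
backend_name, nx = first_importable(["graphscope.nx", "networkx"])
print("graph backend:", backend_name)
```

Code written against `nx` afterwards stays the same for either backend, as long as it only uses interfaces that both libraries provide.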
Add edges and vertices
Just as in NetworkX, you can add vertices with `add_node` or `add_nodes_from`, and add edges with `add_edge` or `add_edges_from`.
# Add one vertex by `add_node` interface
G.add_node(1)
# Or add a batch of vertices from an iterable
G.add_nodes_from([2, 3])
# You can also add attributes while adding vertices
G.add_nodes_from([(4, {"color": "red"}), (5, {"color": "green"})])
# Similarly, add one edge by `add_edge` interface
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)
# Or add a batch of edges from an iterable
G.add_edges_from([(1, 2), (1, 3)])
# Add attributes while adding edges
G.add_edges_from([(1, 2), (2, 3, {'weight': 3.1415})])
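The cell above adds the edges (1, 2) and (2, 3) more than once. In a NetworkX `Graph`, adding an existing edge again does not create a duplicate; it only updates the edge attributes. The sketch below imitates that behaviour with a hypothetical `TinyGraph` class built on plain dictionaries. It is an illustration of the semantics only, not how GraphScope stores graphs, and it ignores self-loops.

```python
class TinyGraph:
    """Hypothetical stand-in for an undirected simple graph (illustration only)."""

    def __init__(self):
        self.node_attr = {}  # node -> attribute dict
        self.adj = {}        # node -> {neighbor -> shared edge attribute dict}

    def add_node(self, n, **attr):
        self.node_attr.setdefault(n, {}).update(attr)
        self.adj.setdefault(n, {})

    def add_edge(self, u, v, **attr):
        # Endpoints are created on demand if they do not exist yet.
        self.add_node(u)
        self.add_node(v)
        data = self.adj[u].get(v, {})
        data.update(attr)
        self.adj[u][v] = data
        self.adj[v][u] = data

    def number_of_nodes(self):
        return len(self.node_attr)

    def number_of_edges(self):
        # Every edge is stored under both endpoints, so halve the total.
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2


G = TinyGraph()
for n in (1, 2, 3):
    G.add_node(n)
G.add_node(4, color="red")
G.add_node(5, color="green")
G.add_edge(1, 2)
G.add_edge(2, 3)
G.add_edge(1, 2)                 # repeated edge: no duplicate is stored
G.add_edge(1, 3)
G.add_edge(2, 3, weight=3.1415)  # existing edge: its attributes are updated

print(G.number_of_nodes(), G.number_of_edges())  # 5 3
print(G.adj[2][3])                               # {'weight': 3.1415}
```

Under these semantics the graph built in the cell above has 5 vertices and 3 distinct edges, however many times an edge is added.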
Query Graph
Just as in NetworkX, you can query the number of vertices and edges with the `number_of_nodes` and `number_of_edges` interfaces, or query the neighbors of a vertex with the `adj` interface.
# Query the number of vertices by `number_of_nodes` interface.
G.number_of_nodes()
# Similarly, query the number of edges by `number_of_edges` interface.
G.number_of_edges()
# list the vertices in graph `G`
list(G.nodes)
# list the edges in graph `G`
list(G.edges)
# query the neighbors of vertex 1
list(G.adj[1])
# query the degree of vertex 1
G.degree(1)
Delete
Just as in NetworkX, you can remove vertices with the `remove_node` or `remove_nodes_from` interface, and remove edges with the `remove_edge` or `remove_edges_from` interface.
# remove one vertex by `remove_node` interface
G.remove_node(5)
list(G.nodes)
# remove a batch of vertices by `remove_nodes_from` interface
G.remove_nodes_from([4, 5])
list(G.nodes)
# remove one edge by `remove_edge` interface
G.remove_edge(1, 2)
list(G.edges)
# remove a batch of edges by `remove_edges_from` interface
G.remove_edges_from([(1, 3), (2, 3)])
list(G.edges)
# query the number of vertices after removal
G.number_of_nodes()
# query the number of edges after removal
G.number_of_edges()
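The removal cell above passes vertex 5 to `remove_nodes_from` after it has already been removed. In NetworkX this works because `remove_nodes_from` silently skips missing vertices, whereas `remove_node` raises an error for a vertex that is not in the graph. The pure-Python sketch below mimics both behaviours on an adjacency dictionary. The function names mirror the interfaces for readability, `KeyError` stands in for the library's own exception type, and the final call that removes vertex 1 is an extra step added here to show that incident edges disappear with the vertex.

```python
def remove_node(adj, n):
    """Remove n and its incident edges; raise KeyError if n is absent."""
    neighbors = adj.pop(n)  # a missing vertex raises KeyError here
    for nbr in neighbors:
        if nbr != n:  # a self-loop was already dropped by the pop above
            adj[nbr].discard(n)


def remove_nodes_from(adj, nodes):
    """Remove every listed vertex that exists; skip the others silently."""
    for n in nodes:
        if n in adj:
            remove_node(adj, n)


# Vertices 1..5 with edges (1, 2), (1, 3), (2, 3), as built earlier.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set(), 5: set()}

remove_node(adj, 5)
remove_nodes_from(adj, [4, 5])  # 5 is already gone: no error
remove_node(adj, 1)             # also drops edges (1, 2) and (1, 3)

print(adj)  # {2: {3}, 3: {2}}
```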
Graph Analysis
The graph analysis module of GraphScope is also compatible with NetworkX.
In the following examples, we use `connected_components` to find the connected components of the graph, `clustering` to get the clustering coefficient of each vertex, and `all_pairs_shortest_path` to compute the shortest paths between all pairs of vertices.
# Building graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3)])
G.add_node(4)
# Run connected_components
list(nx.connected_components(G))
# Run clustering
nx.clustering(G)
# Run all_pairs_shortest_path
sp = dict(nx.all_pairs_shortest_path(G))
sp[3]
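To make clear what the three algorithms above compute, here is a small pure-Python sketch of each, applied to the same graph. These are textbook versions under simplifying assumptions (unweighted, undirected, no self-loops), not the implementations used by NetworkX or GraphScope; all function names are illustrative.

```python
from collections import deque
from itertools import combinations


def connected_components(adj):
    """Yield each connected component as a set of vertices (BFS)."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        yield component


def clustering(adj):
    """Local clustering coefficient: linked neighbor pairs / possible pairs."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        result[node] = 2 * links / (k * (k - 1))
    return result


def shortest_paths_from(adj, source):
    """One shortest path (fewest edges) from source to each reachable vertex."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in paths:
                paths[nbr] = paths[node] + [nbr]
                queue.append(nbr)
    return paths


# Edges (1, 2) and (1, 3), plus the isolated vertex 4.
adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}

components = list(connected_components(adj))
coefficients = clustering(adj)
paths_from_3 = shortest_paths_from(adj, 3)

print(components)    # [{1, 2, 3}, {4}]
print(coefficients)  # every coefficient is zero: the graph has no triangle
print(paths_from_3)  # {3: [3], 1: [3, 1], 2: [3, 1, 2]}
```

Vertex 4 forms its own component and is unreachable from vertex 3, so it does not appear among the paths that start at 3.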
Graph Display
As in NetworkX, you can draw a graph with the `draw` interface, which relies on the drawing functions of Matplotlib.
Install `matplotlib` first if you are not in the Playground environment.
!pip3 install matplotlib
Use GraphScope to draw a simple graph.
# Create a star graph with one center vertex and 5 outer vertices (6 in total)
G = nx.star_graph(5)
# Draw
nx.draw(G, with_labels=True, font_weight='bold')
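In NetworkX, `star_graph(n)` is documented as one center vertex connected to `n` outer vertices, labeled 0 through `n`. The sketch below rebuilds that edge list by hand to show the resulting size, assuming `graphscope.nx` follows the same convention; `star_edges` is a hypothetical helper, not a library function.

```python
def star_edges(n):
    """Edges of a star with center 0 and outer vertices 1..n."""
    return [(0, leaf) for leaf in range(1, n + 1)]


edges = star_edges(5)
vertices = {v for edge in edges for v in edge}
print(len(vertices), len(edges))  # 6 5
```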
Let's run a simple experiment to see how much GraphScope improves algorithm performance compared with NetworkX.
We run the clustering algorithm on a Twitter dataset.
Download the dataset if you are not in the Playground environment.
!wget https://raw.githubusercontent.com/GraphScope/gstest/master/twitter.e -P /tmp
Load the dataset in both GraphScope and NetworkX.
import os
import graphscope.nx as gs_nx
import networkx as nx
# loading graph in NetworkX
g1 = nx.read_edgelist(
    os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=nx.Graph
)
type(g1)
# Loading graph in GraphScope
g2 = gs_nx.read_edgelist(
    os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=gs_nx.Graph
)
type(g2)
Run the algorithm in both GraphScope and NetworkX and display the elapsed time.
%%time
# GraphScope
ret_gs = gs_nx.clustering(g2)
%%time
# NetworkX
ret_nx = nx.clustering(g1)
# Result comparison
ret_gs == ret_nx
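Two practical notes on the cells above. `%%time` is an IPython cell magic, so it is not available in a plain Python script. In addition, `==` on two dictionaries of floats requires every value to match exactly. The sketch below shows a plain-Python timing helper and a tolerance-based comparison. Both `timed` and `dicts_close` are hypothetical helpers, the tolerances are arbitrary choices, and the toy dictionaries are made-up values rather than real clustering results. Whether the two libraries return bit-identical floats is not established here.

```python
import math
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def dicts_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """True if both dicts have the same keys and pairwise close float values."""
    if a.keys() != b.keys():
        return False
    return all(
        math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=abs_tol) for k in a
    )


# Toy stand-ins for two result dictionaries (made-up values).
left = {1: 0.1 + 0.2, 2: 0.0}
right = {1: 0.3, 2: 0.0}
print(left == right, dicts_close(left, right))  # False True

# Timing any callable; sum() is used here only as a placeholder workload.
total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.4f}s")
```

In the experiment above, the same pattern would read `ret_gs, t_gs = timed(gs_nx.clustering, g2)`, followed by `dicts_close(ret_gs, ret_nx)`.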