Graph Analytical Algorithms

Graph analytics is widely used in real world. Many algorithms, like community detection, paths and connectivity, centrality are proven to be very useful in various businesses. GraphScope ships with a set of built-in algorithms, enables users easily analysis their graph data.

This tutorials demostrate how to use built-in algorithms to process analytics tasks.

Preparation

We start by creating a session and loading a property graph into GraphScope.

Here we take a peer-to-peer dataset derived from Gnutella peer-to-peer network, August 31 2002, with generated data on vertices and edges. The graph data files are located in /home/jovyan/datasets/property.

In [ ]:
import graphscope
from graphscope.framework.loader import Loader

k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {
          "path": "/testingdata",  # Path in host
          "type": "Directory"
        },
        "mounts": {
          "mountPath": "/home/jovyan/datasets",
            "readOnly": True
        }
    }
}

graphscope.set_option(show_log=True)  # enable logging
graphscope.set_option(initializing_interactive_engine=False)
sess = graphscope.session(k8s_volumes=k8s_volumes)

graph = sess.g(directed=False)
graph = graph.add_vertices("/home/jovyan/datasets/property/p2p-31_property_v_0", label="person")
graph = graph.add_edges("/home/jovyan/datasets/property/p2p-31_property_e_0", label="knows")

Let's take a look at the schema of the graph.

In [ ]:
print(graph.schema)

As shown above, we loaded a property graph has vertices labeled "person", with 2 properties ( namely "weight" and "id"), and edges labeled "knows" with 3 properties ( namely, "src_label_id", "dst_label_id" and "dist").

Project to Simple Graph

Most graph analytical algorithms are defined on simple graph, which has only one kind of vertices and edges, edges and vertices have at most one property as its attribute.

GraphScope provides a function project to convert a property graph to a simple graph, by selecting one kind of label for vertices/edges, and each with at most one of their properties.

In [ ]:
simple_graph = graph.project(vertices={"person": []}, edges={"knows": ["dist"]})

Run Algorithms

In the following sections, we will run several algorithms over the graph and inspect the result.

Single Source Shortest Path

Algorithm sssp takes two arguments, a graph, and the src for query as source.

In the example, we are quering the shortest path from source node id=6, over the projected subgraph in the previous step.

Behind the scenes, the algorithm will codegen a compatible version for the loaded graph, and compile to a executable binary. During the process, some validation will be conduct. e.g., in this case, the sssp algorithm requires the graph has a int or double value on edges. It may take a bit longer to building the library. However, this step only take once for the same algorithm on a typed graph.

In [ ]:
from graphscope import sssp
sssp_context = sssp(simple_graph, src=6)

After the computation, the results are distributed on the vineyard instances on the cluster. The returned object is a Context, which has several methods to retrieve or persist the results.

Please refer to Context to get more details.

In this case, the results represent the shortest distance from the source node. We use this to fetch the results and display with its vertex id.

In [ ]:
sssp_context.to_dataframe(selector={'id': 'v.id', 'dist': 'r'}, vertex_range={'begin': 1, 'end': 10}).sort_values(by='id')

Alternatively, we can save the results to local file system.

In [ ]:
sssp_context.output_to_client('./sssp_result.csv', selector={'id': 'v.id', 'dist': 'r'})

You may want to take a look at the file.

In [ ]:
!head ./sssp_result.csv

PageRank

PageRank may be the most famous and commonly used graph algorithm. Let's see how to run PageRank in GraphScope just in 2 lines.

In [ ]:
from graphscope import pagerank
pr_context = pagerank(simple_graph, delta=0.85, max_round=10)
In [ ]:
pr_context.to_dataframe(selector={'id': 'v.id', 'rank': 'r'}, vertex_range={'begin': 1, 'end': 10}).sort_values(by='id')

Save the results to local.

In [ ]:
pr_context.output_to_client('./pagerank_result.csv', selector={'id': 'v.id', 'rank': 'r'})

Weakly Connected Components

In graph theory, a component, sometimes called a connected component, of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.

Algorithm weakly connected components (WCC) determines the weakly connected component each vertex belongs to.

In [ ]:
from graphscope import wcc
wcc_context = wcc(simple_graph)
wcc_context.to_dataframe(selector={'id': 'v.id', 'cc': 'r'}, vertex_range={'begin': 1, 'end': 10}).sort_values(by='id')

For more analytical algorithms, please check Builtin algorithms and try something new.

Finally, close session to release resources.

In [ ]:
sess.close()