Graph Manipulations with NetworkX Compatible APIs

GraphScope supports graph manipulations with NetworkX-compatible APIs. This tutorial goes through these APIs, following the organization of the NetworkX tutorial.

First, we need to launch a session and import necessary packages as introduced in the previous tutorial.

In [ ]:
import os
import graphscope

k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {
          "path": "/testingdata",  # Path in host
          "type": "Directory"
        },
        "mounts": {
          "mountPath": "/home/jovyan/datasets",  # Path in pods
          "readOnly": True
        }
    }
}

graphscope.set_option(show_log=True)  # enable logging
graphscope.set_option(initializing_interactive_engine=False)
sess = graphscope.session(k8s_volumes=k8s_volumes)  # create a session

# to make our lives easier, we set this session as default.
sess.as_default()

Then we import the graphscope networkx module.

In [ ]:
import graphscope.nx as nx

Creating a graph

There are several ways to create a graph in graphscope.nx, including:

  • create an empty graph.
  • create from an edgelist file.
  • create from a graphscope.g object.

To create an empty graph, simply instantiate a Graph object.

In [ ]:
G = nx.Graph()

Or we can create a graph from an edgelist file:

In [ ]:
G2 = nx.read_edgelist("/home/jovyan/datasets/dynamic/p2p-31_dynamic.edgelist", nodetype=int, data=True, create_using=nx.Graph)

In addition, we can create a graph in the GraphScope way, which will be introduced in the next tutorial.

In [ ]:
from graphscope.framework.loader import Loader
graph = graphscope.g(directed=False)
graph = graph.add_vertices(Loader("/home/jovyan/datasets/ldbc_sample/person_0_0.csv", delimiter="|"), "person")
graph = graph.add_edges(Loader("/home/jovyan/datasets/ldbc_sample/person_knows_person_0_0.csv", delimiter="|"),
                        "knows",
                        src_label="person",
                        dst_label="person")
G3 = nx.Graph(graph)

Nodes

The graph G can be grown in several ways. In graphscope.nx, nodes can be hashable Python objects, including int, str, float, tuple, and bool objects. To get started, though, we’ll look at simple manipulations, starting from an empty graph. You can add one node at a time,

In [ ]:
G.add_node(1)

or add nodes from any iterable container, such as a list

In [ ]:
G.add_nodes_from([2, 3])

You can also add nodes along with node attributes if your container yields 2-tuples of the form (node, node_attribute_dict). Node attributes are discussed further below.

In [ ]:
G.add_nodes_from([
    (4, {"color": "red"}),
    (5, {"color": "green"}),
])

Nodes from one graph can be incorporated into another:

In [ ]:
H = nx.path_graph(10)
G.add_nodes_from(H)

G now contains the nodes of H as nodes of G.

In [ ]:
list(G.nodes)
In [ ]:
list(G.nodes.data())  # shows the node attributes

Edges

G can also be grown by adding one edge at a time,

In [ ]:
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)  # unpack edge tuple
In [ ]:
list(G.edges)

by adding a list of edges,

In [ ]:
G.add_edges_from([(1, 2), (1, 3)])
In [ ]:
list(G.edges)

or by adding any ebunch of edges. An ebunch is any iterable container of edge-tuples. An edge-tuple can be a 2-tuple of nodes or a 3-tuple with 2 nodes followed by an edge attribute dictionary, e.g., (2, 3, {'weight': 3.1415}). Edge attributes are discussed further below.

In [ ]:
G.add_edges_from([(2, 3, {'weight': 3.1415})])
In [ ]:
list(G.edges.data())  # shows the edge attributes
In [ ]:
G.add_edges_from(H.edges)
In [ ]:
list(G.edges)

or one can add new nodes and edges at once with .update(edges, nodes)

In [ ]:
G.update(edges=[(10, 11), (11, 12)], nodes=[10, 11, 12])
In [ ]:
list(G.nodes)
In [ ]:
list(G.edges)

There are no complaints when adding existing nodes or edges. For example, after removing all nodes and edges,

In [ ]:
G.clear()

we add new nodes/edges and graphscope.nx quietly ignores any that are already present.

In [ ]:
G.add_edges_from([(1, 2), (1, 3)])
G.add_node(1)
G.add_edge(1, 2)
G.add_node("spam")        # adds node "spam"
G.add_nodes_from("spam")  # adds 4 nodes: 's', 'p', 'a', 'm'
G.add_edge(3, 'm')

At this stage the graph G consists of 8 nodes and 3 edges, as can be seen by:

In [ ]:
G.number_of_nodes()
In [ ]:
G.number_of_edges()
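
The totals above can be checked by hand. The sketch below mimics the same sequence of calls with plain Python sets. It is only an illustrative model (the names `nodes`, `edges`, and `add_edge` are hypothetical, and no graphscope.nx call is involved), but it shows why repeated additions have no effect and why the string "spam" contributes four single-character nodes when it is treated as an iterable.

```python
# Plain-Python model of the cell above: a set of nodes and a set of undirected edges.
nodes = set()
edges = set()

def add_edge(u, v):
    """Add an undirected edge; endpoints are added as nodes if missing."""
    nodes.update((u, v))
    edges.add(frozenset((u, v)))  # frozenset ignores endpoint order

for u, v in [(1, 2), (1, 3)]:
    add_edge(u, v)
nodes.add(1)            # already present, no effect
add_edge(1, 2)          # already present, no effect
nodes.add("spam")       # one node: the whole string
nodes.update("spam")    # four nodes: 's', 'p', 'a', 'm'
add_edge(3, "m")

print(len(nodes), len(edges))  # prints: 8 3
```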

Examining elements of a graph

We can examine the nodes and edges. Four basic graph properties facilitate reporting: G.nodes, G.edges, G.adj and G.degree. These are set-like views of the nodes, edges, neighbors (adjacencies), and degrees of nodes in a graph. They offer a continually updated read-only view into the graph structure. They are also dict-like in that you can look up node and edge data attributes via the views and iterate with data attributes using methods .items(), .data('span'). If you want a specific container type instead of a view, you can specify one. Here we use lists, though sets, dicts, tuples and other containers may be better in other contexts.

In [ ]:
list(G.nodes)
In [ ]:
list(G.edges)
In [ ]:
list(G.adj[1])  # or list(G.neighbors(1))
In [ ]:
G.degree[1]  # the number of edges incident to 1
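
The difference between a continually updated, read-only view and a list snapshot can be illustrated with the Python standard library alone. The sketch below is an analogy, not graphscope.nx code: `adjacency` is a hypothetical stand-in for a graph's storage, and `types.MappingProxyType` plays the role of the view.

```python
from types import MappingProxyType

# Hypothetical stand-in for graph storage: node -> {neighbor: edge attributes}
adjacency = {1: {2: {}}, 2: {1: {}}}

view = MappingProxyType(adjacency)  # live, read-only window onto `adjacency`
snapshot = list(view)               # independent list of the nodes at this moment

adjacency[3] = {}                   # the underlying structure changes

view_sees_new_node = 3 in view          # the view tracks the change
snapshot_sees_new_node = 3 in snapshot  # the list does not

try:
    view[4] = {}                    # writing through the view is rejected
    view_is_writable = True
except TypeError:
    view_is_writable = False
```

Converting a view to a list, as in the cells above, is therefore the right choice when a fixed record of the current state is wanted.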

One can specify to report the edges and degree from a subset of all nodes using an nbunch. An nbunch is any of: None (meaning all nodes), a node, or an iterable container of nodes that is not itself a node in the graph.

In [ ]:
G.edges([2, 'm'])
In [ ]:
G.degree([2, 3])

Removing elements from a graph

One can remove nodes and edges from the graph in a similar fashion to adding. Use methods Graph.remove_node(), Graph.remove_nodes_from(), Graph.remove_edge() and Graph.remove_edges_from(), e.g.

In [ ]:
G.remove_node(2)
G.remove_nodes_from("spam")
list(G.nodes)
In [ ]:
list(G.edges)
In [ ]:
G.remove_edge(1, 3)
G.remove_edges_from([(1, 2), (2, 3)])
list(G.edges)

Using the graph constructors

Graph objects do not have to be built up incrementally - data specifying graph structure can be passed directly to the constructors of the various graph classes. When creating a graph structure by instantiating one of the graph classes you can specify data in several formats.

In [ ]:
G.add_edge(1, 2)
H = nx.DiGraph(G)   # create a DiGraph using the connections from G
list(H.edges())
In [ ]:
edgelist = [(0, 1), (1, 2), (2, 3)]
H = nx.Graph(edgelist)
list(H.edges)

Accessing edges and neighbors

In addition to the views Graph.edges and Graph.adj, access to edges and neighbors is possible using subscript notation.

In [ ]:
G = nx.Graph([(1, 2, {"color": "yellow"})])
In [ ]:
G[1]  # same as G.adj[1]
In [ ]:
G[1][2]
In [ ]:
G.edges[1, 2]

You can get/set the attributes of an edge using subscript notation if the edge already exists.

In [ ]:
G.add_edge(1, 3)
G[1][3]['color'] = "blue"
G.edges[1, 3]
In [ ]:
G.edges[1, 2]['color'] = "red"
G.edges[1, 2]

Fast examination of all (node, adjacency) pairs is achieved using G.adjacency(), or G.adj.items(). Note that for undirected graphs, adjacency iteration sees each edge twice.

In [ ]:
FG = nx.Graph()
FG.add_weighted_edges_from([(1, 2, 0.125), (1, 3, 0.75), (2, 4, 1.2), (3, 4, 0.375)])
for n, nbrs in FG.adj.items():
    for nbr, eattr in nbrs.items():
        wt = eattr['weight']
        if wt < 0.5:
            print(f"({n}, {nbr}, {wt:.3})")

Convenient access to all edges is achieved with the edges property.

In [ ]:
for (u, v, wt) in FG.edges.data('weight'):
    if wt < 0.5:
        print(f"({u}, {v}, {wt:.3})")
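
To see why adjacency iteration reports each undirected edge twice while edge iteration reports it once, the same data can be modeled with a plain dict of dicts. This is a minimal sketch under the assumption that each undirected edge is recorded under both endpoints. It is an illustrative model only and says nothing about how graphscope.nx stores graphs internally; the names `adj`, `by_adjacency`, and `by_edge` are hypothetical.

```python
# Plain dict-of-dicts model of FG: each undirected edge is stored under both endpoints.
weighted_edges = [(1, 2, 0.125), (1, 3, 0.75), (2, 4, 1.2), (3, 4, 0.375)]
adj = {}
for u, v, w in weighted_edges:
    adj.setdefault(u, {})[v] = {"weight": w}
    adj.setdefault(v, {})[u] = {"weight": w}

# Adjacency-style iteration: every light edge shows up once per endpoint.
by_adjacency = [(n, nbr) for n, nbrs in adj.items()
                for nbr, eattr in nbrs.items() if eattr["weight"] < 0.5]

# Edge-style iteration: keep one orientation per edge.
seen = set()
by_edge = []
for n, nbr in by_adjacency:
    key = frozenset((n, nbr))
    if key not in seen:
        seen.add(key)
        by_edge.append((n, nbr))

print(by_adjacency)  # [(1, 2), (2, 1), (3, 4), (4, 3)]
print(by_edge)       # [(1, 2), (3, 4)]
```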

Adding attributes to graphs, nodes, and edges

Attributes such as weights, labels, colors, can be attached to graphs, nodes, or edges.

Each graph, node, and edge can hold key/value attribute pairs. By default these are empty, but attributes can be added or changed using add_edge, add_node or direct manipulation of the attribute dictionaries named G.graph, G.nodes, and G.edges for a graph G.

Graph attributes

Assign graph attributes when creating a new graph

In [ ]:
G = nx.Graph(day="Friday")
G.graph

Or you can modify attributes later

In [ ]:
G.graph['day'] = "Monday"
G.graph

Node attributes

Add node attributes using add_node(), add_nodes_from(), or G.nodes

In [ ]:
G.add_node(1, time='5pm')
G.add_nodes_from([3], time='2pm')
G.nodes[1]
In [ ]:
G.nodes[1]['room'] = 714
G.nodes.data()

Note that adding a node to G.nodes does not add it to the graph; use G.add_node() to add new nodes. The same applies to edges.

Edge Attributes

Add/change edge attributes using add_edge(), add_edges_from(), or subscript notation.

In [ ]:
G.add_edge(1, 2, weight=4.7)
G.add_edges_from([(3, 4), (4, 5)], color='red')
G.add_edges_from([(1, 2, {'color': 'blue'}), (2, 3, {'weight': 8})])
G[1][2]['weight'] = 4.7
G.edges[3, 4]['weight'] = 4.2
In [ ]:
G.edges.data()

The special attribute weight should be numeric as it is used by algorithms requiring weighted edges.

Induce deepcopy subgraph and edge_subgraph

graphscope.nx supports inducing a deep-copy subgraph from a given node set or edge set.

In [ ]:
G = nx.path_graph(10)
# induce a subgraph by nodes
H = G.subgraph([0, 1, 2])
list(H.nodes)
In [ ]:
list(H.edges)
In [ ]:
# induce an edge subgraph by edges
K = G.edge_subgraph([(1, 2), (3, 4)])
list(K.nodes)
In [ ]:
list(K.edges)

Note: unlike the subgraph/edge_subgraph APIs in NetworkX, which return a view, graphscope.nx returns a deep copy of the subgraph/edge_subgraph.
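
The practical consequence of getting a deep copy rather than a view is that later changes to the original are not reflected in the extracted subgraph. The following is a plain-Python analogy using `copy.deepcopy`, not graphscope.nx code; `node_attrs`, `view_like`, and `deep_like` are hypothetical names chosen for illustration.

```python
import copy

# Hypothetical node-attribute storage for a small graph.
node_attrs = {0: {"color": "red"}, 1: {"color": "green"}, 2: {"color": "blue"}}
keep = [0, 1]

view_like = {n: node_attrs[n] for n in keep}                 # shares the attribute dicts
deep_like = {n: copy.deepcopy(node_attrs[n]) for n in keep}  # independent copies

node_attrs[0]["color"] = "black"  # modify the original after extracting the subgraph

print(view_like[0]["color"])  # black: a view follows the original
print(deep_like[0]["color"])  # red: a deep copy does not
```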

Making copies

One can use to_directed to return a directed representation of the graph.

In [ ]:
DG = G.to_directed()  # here would return a "deepcopy" directed representation of G.
list(DG.edges)
In [ ]:
# or with
DGv = G.to_directed(as_view=True)  # return a view.
list(DGv.edges)
In [ ]:
# or with
DG = nx.DiGraph(G)   # return a "deepcopy" of directed representation of G.
list(DG.edges)

Or get a copy of the graph.

In [ ]:
H = G.copy()  # return a view of copy
list(H.edges)
In [ ]:
# or with
H = G.copy(as_view=False)  # return a "deepcopy" copy
list(H.edges)
In [ ]:
# or with
H = nx.Graph(G)  # return a "deepcopy" copy
list(H.edges)

Note: graphscope.nx does not support shallow copies of the graph.

Directed graphs

The DiGraph class provides additional methods and properties specific to directed edges, e.g., DiGraph.out_edges, DiGraph.in_degree, DiGraph.predecessors(), DiGraph.successors() etc. To allow algorithms to work with both classes easily, the directed version of neighbors() is equivalent to successors(), while degree reports the sum of in_degree and out_degree, even though that may feel inconsistent at times.

In [ ]:
DG = nx.DiGraph()
DG.add_weighted_edges_from([(1, 2, 0.5), (3, 1, 0.75)])
In [ ]:
DG.out_degree(1, weight='weight')
In [ ]:
DG.degree(1, weight='weight')
In [ ]:
list(DG.successors(1))
In [ ]:
list(DG.neighbors(1))
In [ ]:
list(DG.predecessors(1))
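
The relationships described above (neighbors() matching successors(), and degree being the sum of in_degree and out_degree) can be worked out by hand for node 1. The following plain-Python sketch recomputes them directly from the edge list. It does not call graphscope.nx, and the variable names are illustrative only.

```python
# Directed, weighted edges from the cell above: (source, target, weight)
weighted_edges = [(1, 2, 0.5), (3, 1, 0.75)]
node = 1

successors = [v for u, v, _ in weighted_edges if u == node]
predecessors = [u for u, v, _ in weighted_edges if v == node]
out_degree = sum(w for u, _, w in weighted_edges if u == node)
in_degree = sum(w for _, v, w in weighted_edges if v == node)
degree = in_degree + out_degree

print(successors, predecessors)       # [2] [3]
print(out_degree, in_degree, degree)  # 0.5 0.75 1.25
```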

Some algorithms work only for directed graphs and others are not well defined for directed graphs. Indeed, the tendency to lump directed and undirected graphs together is dangerous. If you want to treat a directed graph as undirected for some measurement, you should probably convert it using Graph.to_undirected().

In [ ]:
H = DG.to_undirected()  # return a "deepcopy" of undirected representation of DG.
list(H.edges)
In [ ]:
# or with
H = nx.Graph(DG)  # create an undirected graph H from the directed graph DG
list(H.edges)

Directed graphs also support reversing edges using DiGraph.reverse().

In [ ]:
K = DG.reverse()  # return a "deepcopy" of reversed copy.
list(K.edges)
In [ ]:
# or with
K = DG.reverse(copy=False)  # return a view of reversed copy.
list(K.edges)

Analyzing graphs

The structure of G can be analyzed using various graph-theoretic functions such as:

In [ ]:
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3)])
G.add_node(4)
sorted(d for n, d in G.degree())
In [ ]:
nx.builtin.clustering(G)

In graphscope.nx, we support some builtin algorithms for analyzing graphs; see the builtin algorithm documentation for details on the supported graph algorithms.

Transform to graphscope.graph

Just as a graphscope.nx Graph can be created from a graphscope.graph, a graphscope.nx graph can also be transformed into a graphscope.graph, e.g.

In [ ]:
nodes = [(0, {"foo": 0}), (1, {"foo": 1}), (2, {"foo": 2})]
edges = [(0, 1, {"weight": 0}), (0, 2, {"weight": 1}), (1, 2, {"weight": 2})]
G = nx.Graph()
G.update(edges, nodes)
g = graphscope.g(G)

Finally, don't forget to close the session.

In [ ]:
sess.close()