Graphviz networks plotted with Plotly

U.S. Open binary tree via Networkx, Pygraphviz and Plotly

igraph and networkx provide layouts based on different layout algorithms, and the networks generated with these Python libraries are plotted via cairo and matplotlib, respectively. Neither igraph nor networkx is well suited to laying out and plotting large networks.

Graphviz provides layouts appropriate for large networks. Moreover, graphviz and pygraphviz, a Python interface to graphviz, offer graph layouts that are not implemented in igraph or networkx. A typical example is the so-called Price network, created through preferential attachment and plotted with sfdp, the graphviz layout designed for large networks.
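To make "preferential attachment" concrete, here is a minimal pure-Python sketch of a Price-style growth process. The helper `price_network` and its parameters `m` (links per new node) and `a` (offset added to the in-degree, assumed positive) are hypothetical names chosen for illustration; this is not the generator used to produce the original Price network figure.

```python
import random

def price_network(n, m=1, a=1.0, seed=None):
    """Hypothetical helper: grow a directed network of n nodes in which each
    new node links to up to m distinct earlier nodes, chosen with probability
    proportional to (in-degree + a), in the spirit of Price's model (a > 0)."""
    rng = random.Random(seed)
    in_degree = [0] * n
    edges = []
    for new in range(1, n):
        k = min(m, new)  # node `new` can only link to the `new` earlier nodes
        weights = [in_degree[v] + a for v in range(new)]
        targets = set()
        while len(targets) < k:  # redraw until k distinct targets are found
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in sorted(targets):
            edges.append((new, t))
            in_degree[t] += 1  # well-linked nodes become more attractive
    return edges

edges = price_network(300, m=2, seed=7)
print(len(edges))  # 1 + 2 * 298 = 597
```

With pygraphviz installed, such an edge list could be handed to graphviz with `H = pgv.AGraph(directed=True)`, `H.add_edges_from(edges)` and `H.layout(prog='sfdp')`.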

With Plotly's Python library we can process network data with one or more of the libraries mentioned above, apply any of the layouts they provide, and get an interactive plot.

In this Jupyter notebook we illustrate the complementary use of networkx, pygraphviz (with graphviz as the backend), and Plotly to generate the radial tree representing the progress of the women players at the U.S. Open 2016.

The US Open is a tennis tournament played by $N=128=2^7$ women players. It is a balanced knockout tournament consisting of 7 rounds. A few days before the start, the organizers release a draw, which specifies the pairs that play in the first round. Then, successively, the $(2k+1)^{th}$ and $(2k+2)^{th}$ winners of a round $r$, $1\leq r\leq 6$, play each other in the $(r+1)^{th}$ round, $k=0, \ldots, 2^{6-r}-1$. After the $6^{th}$ round only two players remain, and the winner of their final match is the winner of the Grand Slam. WTA visualizes the whole process as a balanced binary tree whose root is the tournament's winner and whose nodes are the players in all rounds. The parent of a pair $(2k+1, 2k+2)$ playing in a round $r$ is the winner of the corresponding match.
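When the players are indexed in Breadth First Search order starting from the champion at index 0, the parent/child relation above becomes simple index arithmetic: the match won by node $k$ was played by nodes $2k+1$ and $2k+2$, and the parent of node $i\geq 1$ is $\lfloor (i-1)/2 \rfloor$. The small helpers below (`children`, `parent`, `level` are hypothetical names) check this against the finalists' paths that appear later in the notebook.

```python
def children(k):
    """BFS indices of the two players whose match was won by node k."""
    return 2 * k + 1, 2 * k + 2

def parent(i):
    """BFS index of the winner of the match played by node i (i >= 1)."""
    return (i - 1) // 2

def level(i):
    """Tree level of node i: 0 for the champion, 1 for the finalists, and so on."""
    return (i + 1).bit_length() - 1

# The finalists' paths used later in this notebook
kerber_path = [0, 2, 6, 14, 30, 62, 126]
pliskova_path = [1, 4, 10, 21, 43, 87]

for path in (kerber_path, pliskova_path):
    # consecutive nodes on a path are (winner, player) = (parent, child) pairs
    assert all(parent(child) == p for p, child in zip(path, path[1:]))

print(children(62), level(126))  # (125, 126) 6
```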

To define and plot the corresponding binary tree we proceed as follows:

  • set up an Excel file from the binary tree posted on the WTA site. The players are indexed by traversing the tree in Breadth First Search order.
  • the networkx.balanced_tree function creates the binary tree G; balanced_tree(2, 6) has $2^7-1=127$ nodes, and its 64 leaves (indices 63 to 126) are the players of the second round.
  • the nodes and edges of the graph G define a pygraphviz graph H, which is given the radial layout twopi.
  • the node positions assigned by the pygraphviz layout, together with the edges, define the attributes of the Plotly objects that represent the binary tree.
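Graphviz's twopi places a root node at the center and the other nodes on concentric circles according to their distance from the root. For readers who want to experiment without graphviz, here is a minimal numpy sketch of that idea for the complete binary tree used here. The function `radial_tree_layout` is hypothetical and much simpler than graphviz's algorithm; unlike twopi's output, it is centered at the origin.

```python
import numpy as np

def radial_tree_layout(depth, level_gap=1.0):
    """Hypothetical twopi-like layout for the complete binary tree with BFS
    numbering (the node set of nx.balanced_tree(2, depth)).
    Returns an array of shape (2**(depth + 1) - 1, 2) with (x, y) positions."""
    n = 2 ** (depth + 1) - 1
    first_leaf = 2 ** depth - 1
    angle = np.zeros(n)
    # Spread the leaves evenly over the full circle
    n_leaves = n - first_leaf
    angle[first_leaf:] = 2 * np.pi * np.arange(n_leaves) / n_leaves
    # Place each internal node at the mean angle of its two children
    for k in range(first_leaf - 1, -1, -1):
        angle[k] = (angle[2 * k + 1] + angle[2 * k + 2]) / 2
    # Nodes on tree level d lie on the circle of radius d * level_gap
    level = np.array([(k + 1).bit_length() - 1 for k in range(n)])
    radius = level_gap * level
    return np.column_stack((radius * np.cos(angle), radius * np.sin(angle)))

pos = radial_tree_layout(6)
print(pos.shape)  # (127, 2)
```

Its output has the same shape as the array built later from the pygraphviz layout, so it could stand in for it when trying out the plotting code.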
In [1]:
import networkx as nx
import pygraphviz as pgv
import pandas as pd
import numpy as np
from ast import literal_eval
In [2]:
import platform
print(f'Python version: {platform.python_version()}')
print(f'pygraphviz version: {pgv.__version__}')  #pygraphviz version 1.5 for python 3
Python version: 3.7.1
pygraphviz version: 1.5

Read the Excel file:

In [3]:
df = pd.read_excel("Data/US-Open-2016.xls")
df.loc[:6, :]#print tree levels 0, 1, 2 
0 Kerber
1 Pliskova
2 Kerber
3 S Williams
4 Pliskova
5 Wozniacki
6 Kerber
In [8]:
N = len(df)
In [9]:
labels = list(df['name'])

Define the tree $G$ as a networkx graph:

In [10]:
G = nx.balanced_tree(2, 6)
V = G.nodes()
E = G.edges()

The pygraphviz tree H and its layout are defined below:

In [11]:
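H = pgv.AGraph(strict=True, directed=False)
H.add_nodes_from(V)  # same nodes as the networkx tree G
H.add_edges_from(E)  # same edges as G
H.layout(prog='twopi')  # radial layout; stores the coordinates in each node's 'pos' attribute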

After the layout, the node coordinates of the pygraphviz tree $H$ are stored as strings of the form "x,y" in the node attribute pos.
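A node position is read back from pygraphviz as a string such as "x,y", which `ast.literal_eval` turns into a tuple of floats. The strings below are hypothetical values used only to show the parsing; a split-based parse gives the same array without evaluating the string as a Python literal.

```python
from ast import literal_eval
import numpy as np

# Hypothetical 'pos' attribute values in the "x,y" form (coordinates in points)
pos_strings = ["175.5,175.5", "121.25,210.75", "229.75,140.25"]

# literal_eval reads "x,y" as the tuple (x, y)
pos = np.array([literal_eval(s) for s in pos_strings])

# An equivalent parse based on str.split
pos_split = np.array([[float(t) for t in s.split(',')] for s in pos_strings])
print(pos.shape)  # (3, 2)
```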

We now process the network data defined above to get the Plotly plot of the binary tree:

In [12]:
import plotly.plotly as py  # Plotly 3.x; in Plotly >= 4 this module moved to chart_studio.plotly
import plotly.graph_objs as go

The following function returns the node and edge coordinates that define the Plotly traces of a graph with edge list E and node coordinates pos:

In [13]:
def plotly_graph(E, pos):
    # E is the list of tuples representing the graph edges
    # pos is the list (or array) of node coordinates
    N = len(pos)
    Xn = [pos[k][0] for k in range(N)]  # x-coordinates of nodes
    Yn = [pos[k][1] for k in range(N)]  # y-coordinates of nodes

    Xe = []
    Ye = []
    for e in E:
        Xe += [pos[e[0]][0], pos[e[1]][0], None]  # x-coordinates of the end nodes of edge e; None ends the segment
        Ye += [pos[e[0]][1], pos[e[1]][1], None]  # y-coordinates of the end nodes of edge e
    return Xn, Yn, Xe, Ye
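The None entries matter because a Plotly 'lines' trace breaks the line at None values instead of connecting across them, so a single trace can draw every edge as a separate segment. The standalone sketch below (the helper `edge_coordinates` is a hypothetical name) repeats the edge part of plotly_graph on a three-node tree to show the resulting lists.

```python
def edge_coordinates(E, pos):
    """Flatten edges into x and y lists separated by None, as plotly_graph does."""
    Xe, Ye = [], []
    for u, v in E:
        Xe += [pos[u][0], pos[v][0], None]
        Ye += [pos[u][1], pos[v][1], None]
    return Xe, Ye

# A root at (0, 0) and its two children: edges (0, 1) and (0, 2)
Xe, Ye = edge_coordinates([(0, 1), (0, 2)], [(0, 0), (-1, -1), (1, -1)])
print(Xe)  # [0, -1, None, 0, 1, None]
print(Ye)  # [0, -1, None, 0, -1, None]
```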

Get node positions in the tree H:

In [14]:
pos = np.array([literal_eval(H.get_node(k).attr['pos']) for k in range(N)])
# Rotate node positions by pi/2 counter-clockwise: (x, y) -> (-y, x)
pos[:, [0, 1]] = pos[:, [1, 0]]
pos[:, 0] = -pos[:, 0]
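Swapping the two columns and then negating the new x-coordinate is exactly the rotation by $\pi/2$ counter-clockwise, $(x, y)\mapsto(-y, x)$. This small numpy check, on two arbitrary points, compares it with the product by the rotation matrix.

```python
import numpy as np

pos = np.array([[3.0, 1.0], [0.5, -2.0]])

# Rotation by +90 degrees as a matrix product: (x, y) -> (-y, x)
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = pos @ R.T

# The same rotation done in place: swap the columns, then negate x
swapped = pos.copy()
swapped[:, [0, 1]] = swapped[:, [1, 0]]
swapped[:, 0] = -swapped[:, 0]

print(np.allclose(rotated, swapped))  # True
```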

Define the Plotly traces that represent the binary tree and the finalists' paths to the final:

In [15]:
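Xn, Yn, Xe, Ye = plotly_graph(E, pos)

edges = go.Scatter(x=Xe,
                   y=Ye,
                   mode='lines',
                   line=dict(color='rgb(160,160,160)', width=0.75),
                   hoverinfo='none')
nodes = go.Scatter(x=Xn,
                   y=Yn,
                   mode='markers',
                   marker=dict(line=dict(color='rgb(100,100,100)', width=0.5)),
                   text=labels,  # player names shown on hover
                   hoverinfo='text')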
In [16]:
Kerber_path = [0, 2, 6, 14, 30, 62, 126]
Pliskova_path = [1, 4, 10, 21, 43, 87]
colorKP = ['#CC0000']*len(Kerber_path) + ['rgb(65, 64, 123)']*len(Pliskova_path)# set color for both paths
In [17]:
XK = [pos[k][0] for k in Kerber_path]
YK = [pos[k][1] for k in Kerber_path]
XP = [pos[k][0] for k in Pliskova_path]
YP = [pos[k][1] for k in Pliskova_path]

finalists = go.Scatter(x=XK+XP,
                       y=YK+YP,
                       mode='markers',
                       marker=dict(color=colorKP,
                                   line=dict(color='rgb(100,100,100)', width=0.5)),
                       text=['Kerber']*len(Kerber_path) + ['Pliskova']*len(Pliskova_path),
                       hoverinfo='text')

We label each second-round player (a leaf of the tree) with their name, aligned radially with respect to the corresponding node position. The function set_annotation returns a Plotly annotation that places a name at a given position, with the text drawn at a given angle:

In [18]:
def set_annotation(x, y, anno_text, textangle, fontsize=11, color='rgb(10,10,10)'):
    return dict(x=x,
                y=y,
                text=anno_text,
                textangle=textangle,  # angle with the horizontal line through (x, y), in degrees;
                                      # + = clockwise, - = counter-clockwise
                font=dict(size=fontsize, color=color),
                showarrow=False)  # plain text label, no arrow pointing at (x, y)

Define Plotly plot layout:

In [19]:
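axis = dict(showline=False, zeroline=False, showgrid=False, showticklabels=False)  # hide both axes

layout = go.Layout(
             title="U.S. Open 2016<br>Radial binary tree associated with women's singles players",
             width=650,
             height=650,
             showlegend=False,
             xaxis=axis,
             yaxis=axis,
             hovermode='closest')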

The node positions returned by the pygraphviz radial layout do not lie on a circle centered at the origin. That is why we compute the center and radius of the circle on which the leaves lie:

In [20]:
center = np.array([(np.min(pos[63:, 0])+np.max(pos[63:, 0]))/2, (np.min(pos[63:, 1])+np.max(pos[63:, 1]))/2])
radius = np.linalg.norm(pos[63,:]-center)
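The center computed above is the midpoint of the leaves' bounding box. That coincides with the circle center when the leaves reach the extreme left, right, top and bottom of the circle, as evenly spread leaves do. A synthetic check with 64 evenly spaced points on a circle of radius 5 centered at (10, -3), values chosen only for illustration:

```python
import numpy as np

# Synthetic leaf positions: 64 evenly spaced points on a shifted circle
t = 2 * np.pi * np.arange(64) / 64
leaves = np.column_stack((10 + 5 * np.cos(t), -3 + 5 * np.sin(t)))

# Midpoint of the bounding box, as in the cell above
center = (leaves.min(axis=0) + leaves.max(axis=0)) / 2
radius = np.linalg.norm(leaves[0] - center)
print(np.round(center, 6), round(radius, 6))
```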

Compute the text angle:

In [21]:
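angles = []  # text angles of the leaf labels 63, ..., 126
for k in range(63, 127):
    v = pos[k, :] - center
    angle = np.degrees(np.arctan2(v[1], v[0]))  # counter-clockwise angle of v, in (-180, 180]
    if angle > 90:  # left half of the circle: turn by 180 so the name is not upside down
        angle -= 180
    elif angle < -90:
        angle += 180
    angles.append(-angle)  # Plotly's textangle is measured clockwise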
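The angle rule can be isolated in a small function (`radial_text_angle` is a hypothetical name). It takes the direction of a label from the circle center, folds the angle into $[-90^\circ, 90^\circ]$ so names on the left half are not drawn upside down, and flips the sign, assuming, as stated in set_annotation, that Plotly measures textangle clockwise.

```python
import math

def radial_text_angle(vx, vy):
    """Clockwise textangle (degrees) aligning a label with the direction (vx, vy)."""
    angle = math.degrees(math.atan2(vy, vx))  # counter-clockwise, in (-180, 180]
    if angle > 90:        # left half: turn by 180 to keep the text upright
        angle -= 180
    elif angle < -90:
        angle += 180
    return -angle

print(radial_text_angle(1, 1), radial_text_angle(-1, 1))  # about -45 and 45
```

Labels on opposite ends of a diameter thus get the same angle, and both read left to right along the radial line.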
In [22]:
pos_text = center+1.2*(pos[63:, :]-center)# text position
annotations = []
#define annotations for non-finalist players
for k in range(63, 87):
    annotations += [set_annotation(pos_text[k-63][0], pos_text[k-63][1], labels[k],  angles[k-63])]
for k in range(88, 126):
    annotations += [set_annotation(pos_text[k-63][0], pos_text[k-63][1], labels[k],  angles[k-63])]    

#insert colored annotations for the finalists, Pliskova and Kerber
annotations += [set_annotation(pos_text[87-63][0], pos_text[87-63][1],
                               labels[87], angles[87-63], color='rgb(65, 64, 123)'),
                set_annotation(pos_text[126-63][0], pos_text[126-63][1],
                               labels[126], angles[126-63], color='#CC0000')]
annotations += [set_annotation(center[0]-0.15, center[1]+45, 
                                        '<b>Winner<br>A. Kerber</b>', 
                                        0, fontsize=12, color='#CC0000')]

Append the annotation that displays the data source:

In [23]:
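data_anno_text = "Data source: "+\
                 "<a href=''> [1] </a>,"+\
                 " Excel file: "+\
                 "<a href=''> [2] </a>"
# source note at the bottom-left corner of the plotting area (position chosen here)
annotations += [dict(text=data_anno_text,
                     showarrow=False,
                     xref='paper', yref='paper',
                     x=0, y=0,
                     xanchor='left', yanchor='bottom',
                     font=dict(size=12))]
layout.annotations = annotations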
In [24]:
data = [edges, nodes, finalists]
fig = go.FigureWidget(data=data, layout=layout)
In [ ]:
#py.sign_in('empet', 'my_api_key')
py.plot(fig, filename='US-Open-16')
In [25]:
from IPython.display import IFrame
IFrame('',  width=650, height=650)