The Mapper graph associated with a dataset, generated by Kepler Mapper and plotted with Plotly¶

In this notebook we illustrate how a topological graph generated by Kepler Mapper can be plotted directly in the notebook with Plotly.

Plotly for Python (plotly.py) is a high-level, declarative, interactive, browser-based graphing library built on top of plotly.js, which in turn is built on d3.js and stack.gl. Plotly plots can be generated and displayed directly in a Jupyter Notebook, or hosted online in the Plotly cloud. plotly.py is MIT licensed.

The default method provided by Kepler-Mapper to plot the topological network associated with a dataset does not allow much customization:

The graph nodes can be colored only with the jet colormap.
Once the network is generated, the plot cannot be updated.
The resulting graph, saved in an HTML file, does not fit well into a Jupyter notebook.
Only 2D graphs can be generated, whereas Plotly can display 3D graphs, too.

That is why I slightly modified the files kmapper.py and visuals.py. The initial settings are kept, and there is a new option to generate the topological graph as an instance of the igraph.Graph class, with the Kamada-Kawai layout, 'kk', or the Fruchterman-Reingold layout, 'fr', and to plot it via Plotly.

I also created the file plotlyviz.py, which extracts data from json_dict to build the interactive Plotly plot of the corresponding graph.
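As a rough illustration of what such an extraction involves, the sketch below turns a node-link dictionary and a list of 2D node positions into two Plotly-style scatter traces, one for the edges and one for the nodes. The dictionary keys (`nodes`, `links`, `name`, `color`, `source`, `target`), the helper name `graph_to_traces` and the toy data are assumptions made for this example; they are not taken from plotlyviz.py.

```python
def graph_to_traces(graph, positions, colorscale="Viridis"):
    """Build Plotly-style edge and node traces from a node-link dict.

    graph: hypothetical dict with 'nodes' (list of dicts with 'name' and
           'color') and 'links' (list of dicts with integer 'source' and
           'target' indices into the node list).
    positions: list of (x, y) pairs, one per node.
    """
    edge_x, edge_y = [], []
    for link in graph["links"]:
        x0, y0 = positions[link["source"]]
        x1, y1 = positions[link["target"]]
        # None breaks the line, so every edge is drawn as its own segment.
        edge_x += [x0, x1, None]
        edge_y += [y0, y1, None]

    edge_trace = dict(type="scatter", mode="lines", x=edge_x, y=edge_y,
                      line=dict(width=0.5, color="rgb(180, 180, 180)"),
                      hoverinfo="none")
    node_trace = dict(type="scatter", mode="markers",
                      x=[p[0] for p in positions],
                      y=[p[1] for p in positions],
                      text=[node["name"] for node in graph["nodes"]],
                      hoverinfo="text",
                      marker=dict(color=[node["color"] for node in graph["nodes"]],
                                  colorscale=colorscale, showscale=True))
    return [edge_trace, node_trace]


# Toy graph with three nodes and two edges (invented data).
toy_graph = {
    "nodes": [{"name": "a", "color": 0.1},
              {"name": "b", "color": 0.5},
              {"name": "c", "color": 0.9}],
    "links": [{"source": 0, "target": 1}, {"source": 1, "target": 2}],
}
toy_positions = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
traces = graph_to_traces(toy_graph, toy_positions)
```

With this ordering the edges are drawn first and the nodes on top of them, so the node trace sits at index 1 of the list.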

This version is experimental and the commits in my local repo are not yet pushed to the forked repo.

For this version, python-igraph and Plotly must be added to the requirements for installing kmapper. Both can be installed with pip. Installation instructions for igraph are available at http://igraph.org/python/.

Plotly can be installed with: pip install plotly

Such a version is, in my opinion, better suited to students and researchers who are used to studying and analyzing a dataset in a Jupyter notebook or in JupyterLab, the next-generation user interface for Project Jupyter.

Here I give as examples the graphs for the digits dataset and the Fashion MNIST dataset.

Digits dataset¶

In [2]:

import numpy as np
import sklearn.cluster  # needed for sklearn.cluster.DBSCAN below
from sklearn import datasets
import umap
import pkmapper as km
from pkmapper import plotlyviz as pl

In [3]:

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

In [4]:

data, labels = datasets.load_digits(return_X_y=True)

In [5]:

mapper = km.KeplerMapper(verbose=0)

In [6]:

projected_data = mapper.fit_transform(data, projection=umap.UMAP(n_neighbors=15,
                                                                 min_dist=0.9,
                                                                 n_components=2,
                                                                 metric='euclidean',
                                                                 random_state=7654321))

In [7]:

# Create the simplicial complex  
scomplex = mapper.map(projected_data,
                      clusterer=sklearn.cluster.DBSCAN(eps=0.3, min_samples=15),
                      coverer=km.Cover(35, 0.9))



# Color the nodes by the first projected coordinate, shifted to start at 0
color_function = projected_data[:, 0] - projected_data[:, 0].min()
# When path_html is None, i.e. we choose to generate a Plotly plot,
# the method visualize returns the json graph (as kmgraph), as well as color and meta
kmgraph, color, meta = mapper.visualize(scomplex, custom_tooltips=labels,
                                        color_function=color_function, path_html=None)
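To make the role of color_function more concrete, here is a minimal sketch of how a per-point value can be turned into a per-node color. It assumes that a node's color is the mean of the values of its member points, rescaled to [0, 1]. The averaging rule, the helper name `node_colors` and the toy data are assumptions for illustration, and the modified visualize method may compute the colors differently.

```python
def node_colors(nodes, values):
    """Map each node to the min-max rescaled mean value of its members.

    nodes: dict mapping a node id to a list of member point indices.
    values: list of per-point values (the color function).
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant values
    return {node_id: (sum(values[i] for i in members) / len(members) - lo) / span
            for node_id, members in nodes.items()}


# Invented data: four points grouped into two nodes.
toy_values = [0.0, 1.0, 2.0, 4.0]
toy_nodes = {"cube0_cluster0": [0, 1], "cube1_cluster0": [2, 3]}
toy_colors = node_colors(toy_nodes, toy_values)
```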

To imitate the original colors from kmapper we define the Plotly colorscale pl_jet (the name 'jet' can be passed directly, too, without an explicit definition). The Python community, however, considers the jet colormap to have many drawbacks, and starting with matplotlib 2.0 the default colormap jet was replaced by viridis. The same cell also defines pl_brewer, a red-yellow-green diverging colorscale that is used further down.

In [8]:

pl_jet=[[0.0, 'rgb(0, 0, 127)'],
 [0.03, 'rgb(0, 0, 163)'],
 [0.07, 'rgb(0, 0, 204)'],
 [0.1, 'rgb(0, 0, 241)'],
 [0.13, 'rgb(0, 8, 255)'],
 [0.17, 'rgb(0, 40, 255)'],
 [0.2, 'rgb(0, 76, 255)'],
 [0.23, 'rgb(0, 108, 255)'],
 [0.27, 'rgb(0, 144, 255)'],
 [0.3, 'rgb(0, 176, 255)'],
 [0.33, 'rgb(0, 212, 255)'],
 [0.37, 'rgb(12, 244, 234)'],
 [0.4, 'rgb(41, 255, 205)'],
 [0.43, 'rgb(66, 255, 179)'],
 [0.47, 'rgb(95, 255, 150)'],
 [0.5, 'rgb(124, 255, 121)'],
 [0.53, 'rgb(150, 255, 95)'],
 [0.57, 'rgb(179, 255, 66)'],
 [0.6, 'rgb(205, 255, 41)'],
 [0.63, 'rgb(234, 255, 12)'],
 [0.67, 'rgb(255, 229, 0)'],
 [0.7, 'rgb(255, 196, 0)'],
 [0.73, 'rgb(255, 166, 0)'],
 [0.77, 'rgb(255, 133, 0)'],
 [0.8, 'rgb(255, 103, 0)'],
 [0.83, 'rgb(255, 70, 0)'],
 [0.87, 'rgb(255, 40, 0)'],
 [0.9, 'rgb(241, 7, 0)'],
 [0.93, 'rgb(204, 0, 0)'],
 [0.97, 'rgb(163, 0, 0)'],
 [1.0, 'rgb(127, 0, 0)']]


pl_brewer=[[0.0, '#a50026'],
           [0.1, '#d73027'],
           [0.2, '#f46d43'],
           [0.3, '#fdae61'],
           [0.4, '#fee08b'],
           [0.5, '#ffffbf'],
           [0.6, '#d9ef8b'],
           [0.7, '#a6d96a'],
           [0.8, '#66bd63'],
           [0.9, '#1a9850'],
           [1.0, '#006837']]
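Both definitions follow the Plotly colorscale convention: a list of [fraction, color] pairs whose fractions increase from 0 to 1. The stops above are evenly spaced and rounded to two decimals, so a list of this kind can be generated from a plain list of colors. The helper below is a minimal sketch of that idea; the name `make_colorscale` and the three sample colors are chosen for this example only.

```python
def make_colorscale(rgb_colors):
    """Return a Plotly colorscale with evenly spaced stops.

    rgb_colors: list of (r, g, b) tuples with integer components in 0..255.
    """
    n = len(rgb_colors)
    if n < 2:
        raise ValueError("a colorscale needs at least two colors")
    return [[round(k / (n - 1), 2), "rgb({}, {}, {})".format(*color)]
            for k, color in enumerate(rgb_colors)]


# Three sample colors: the first, middle and last entries of a jet-like scale.
toy_scale = make_colorscale([(0, 0, 127), (124, 255, 121), (127, 0, 0)])
```

The result can be passed wherever a colorscale is expected, in the same way as pl_jet or pl_brewer.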

In [9]:

# Plot the graph with the Kamada-Kawai layout.
# colorscale could also be 'jet'; in that case the definition of pl_jet above is not needed.
plotly_graph_data = pl.plotly_graph(kmgraph, tooltips=labels, graph_layout='kk', colorscale=pl_jet,
                                    factor_size=3, edge_linewidth=0.5)
layout=pl.plot_layout(title='Mapper graph of digits dataset',  width=800, height=800,
                      annotation_text=meta,  
                      bgcolor='rgba(0,0,0, 0.95)')

fig=dict(data=plotly_graph_data, layout=layout)
iplot(fig)

Let us update our plot, replacing pl_jet with pl_brewer:

In [10]:

fig['data'][1]['marker'].update(colorscale=pl_brewer)
iplot(fig)

Change the plot background from black to white (or grey), and the colormap for nodes to Viridis:

In [11]:

fig['data'][1]['marker'].update(colorscale='Viridis')
fig['layout'].update(plot_bgcolor='rgb(255,255,255)')
iplot(fig)
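These updates work because the figure is an ordinary nested dictionary: changing a key and calling iplot again redraws the plot with the new setting. The sketch below mimics this on a stand-in figure. Its contents are invented for illustration, and it only assumes, as in the cells above, that the second trace (index 1) holds the nodes.

```python
# Stand-in figure: trace 0 plays the role of the edges, trace 1 of the nodes.
toy_fig = dict(
    data=[dict(type="scatter", mode="lines"),
          dict(type="scatter", mode="markers",
               marker=dict(size=10, colorscale="Jet"))],
    layout=dict(plot_bgcolor="rgba(0,0,0, 0.95)"),
)

# dict.update changes the nested dictionaries in place.
toy_fig["data"][1]["marker"].update(colorscale="Viridis")
toy_fig["layout"].update(plot_bgcolor="rgb(255,255,255)")
```

Keys that are not mentioned in the update, such as the marker size here, keep their previous values.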

Fashion MNIST¶

Read the Fashion-MNIST test dataset, downloaded from Kaggle. It consists of 10000 grayscale images of size 28x28 and their associated labels.

In [12]:

import pandas as pd

In [13]:

df = pd.read_csv("fashion-mnist_test.csv")
X = df.iloc[:, 1:].values
y = df.iloc[:, 0].values  # first column holds the labels, as a 1D array
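The slicing above relies on the layout of the CSV file: the first column holds the label and the remaining columns hold the pixel values of the flattened image. The sketch below reproduces this split on a tiny in-memory stand-in, using only the standard library instead of pandas. The column names and the 2x2 "images" are invented for the example.

```python
import csv
import io

# Toy stand-in for the real file: a label column followed by 4 "pixels".
toy_csv = io.StringIO(
    "label,pixel1,pixel2,pixel3,pixel4\n"
    "9,0,255,128,0\n"
    "2,10,20,30,40\n"
)

rows = list(csv.reader(toy_csv))[1:]                  # skip the header row
toy_y = [int(row[0]) for row in rows]                 # first column: labels
toy_X = [[int(v) for v in row[1:]] for row in rows]   # other columns: pixels

# Restore the first flattened row to a square image (side is 28 for the real data).
side = 2
first_image = [toy_X[0][r * side:(r + 1) * side] for r in range(side)]
```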

Define the dict {label: fashion}, where fashion is one of ten fashion items, such as clothes, shoes, and bags:

In [14]:

fashion_dict={0: 'T-shirt/top',
              1: 'Trouser',
              2: 'Pullover',
              3: 'Dress',
              4: 'Coat',
              5: 'Sandal',
              6: 'Shirt',
              7: 'Sneaker',
              8: 'Bag',
              9: 'Ankle boot'}

In [15]:

mapper = km.KeplerMapper(verbose=0)


projected_data = mapper.fit_transform(X, projection=umap.UMAP(n_neighbors=5,
                                                              n_components=2,
                                                              min_dist=0.1,
                                                              random_state=123
                                                            )) 

In [16]:

scomplex = mapper.map(projected_data,
                      clusterer=sklearn.cluster.DBSCAN(eps=0.15, min_samples=6),#0.1 15
                      coverer=km.Cover(23, 0.15))#20
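The cover determines how the projected data is split into overlapping patches before clustering. Points that fall into the overlap of two patches can end up in two clusters, and such shared points are what produce the edges of the Mapper graph. The one-dimensional sketch below illustrates the idea with a simplified rule (regular intervals, each extended to the right by a fraction of its width). It is not the exact construction used by km.Cover, and the function names and toy points are made up for this example.

```python
def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals intervals that overlap their neighbors.

    Each interval starts on a regular grid of step (hi - lo) / n_intervals
    and is extended to the right by overlap * step (clipped at hi).
    """
    step = (hi - lo) / n_intervals
    return [(lo + k * step, min(hi, lo + (k + 1 + overlap) * step))
            for k in range(n_intervals)]


def members(points, interval):
    """Return the indices of the points that fall inside a closed interval."""
    a, b = interval
    return [i for i, p in enumerate(points) if a <= p <= b]


# Invented 1D data covered by two intervals with 15% overlap.
toy_points = [0.0, 1.0, 2.1, 3.9, 4.0]
toy_cover = interval_cover(0.0, 4.0, n_intervals=2, overlap=0.15)
toy_members = [members(toy_points, iv) for iv in toy_cover]
```

Here the point 2.1 lies in both intervals, so clusters built from the two patches would share it and be joined by an edge.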

In [17]:

color_function=projected_data[:,0]-projected_data[:,0].min()
kmgraph, color, meta=mapper.visualize(scomplex,  color_function=color_function, path_html=None)
#Comment out the line above and uncomment the next one to get the original Kepler-Mapper graph
#html=mapper.visualize(scomplex,  color_function=color_function, path_html='fashion-mnist.html')

In [23]:

plotly_graph_data=pl.plotly_graph(kmgraph, graph_layout='fr', colorscale=pl_brewer, 
                                  reversescale=True, factor_size=2, edge_linewidth=0.5)
title = ('Topological network representing the Fashion MNIST dataset,<br>'
         'via Kepler-Mapper, with UMAP as a filter function')
layout=pl.plot_layout(title=title,  width=800, height=800,
                      annotation_text=meta,  
                      bgcolor='rgba(0,0,0, 1)')

fig_network=dict(data=plotly_graph_data, layout=layout)
iplot(fig_network)

Let us update the node tooltips with the class names and counts of the members of the associated cluster. First, get the current tooltips:

In [24]:

tooltips=plotly_graph_data[1]['text']

Define custom tooltips that show how many items of each fashion type form a cluster (node):

In [25]:

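for j, node in enumerate(kmgraph['nodes']):
    # labels of the data points that belong to this node (cluster)
    member_label_ids = y[scomplex['nodes'][node['name']]]
    member_labels = [fashion_dict[label_id] for label_id in member_label_ids]
    # distinct fashion items in the node and how many times each occurs
    f_type, f_number = np.unique(member_labels, return_counts=True)
    for m in range(len(f_number)):
        tooltips[j] += '<br>' + str(f_type[m]) + ': ' + str(f_number[m])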

plotly_graph_data[1].update(text=tooltips)
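The counting step can be checked on a single node without running the whole pipeline. The sketch below uses collections.Counter from the standard library in place of np.unique. The labels, the member indices and the initial tooltip text are invented for the example.

```python
from collections import Counter

toy_names = {0: "T-shirt/top", 5: "Sandal", 7: "Sneaker"}  # subset of the label names
toy_y = [7, 5, 7, 0, 7, 5]      # hypothetical labels of six data points
toy_members = [0, 1, 2, 4]      # indices of the points that form one node

counts = Counter(toy_names[toy_y[i]] for i in toy_members)

# Sort by name so the lines appear in a stable order, as np.unique does.
toy_tooltip = "members: {}".format(len(toy_members)) + "".join(
    "<br>{}: {}".format(name, n) for name, n in sorted(counts.items()))
```

Plotly renders each `<br>` as a line break, so every fashion item appears on its own line in the hover box.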

In [26]:

#plotly_graph_data[1]['marker']['colorbar']=dict(thickness=20, tickmode='array', ticktext=[0.0, 0.17, 0.33, 0.5, 0.67, 0.83, 1.0],
#tickvals=[0, 5, 10, 15, 20, 25, 30], title='proj-data<br>x-coord')
fign=dict(data=plotly_graph_data, layout=layout)

iplot(fign)

The graph of the breast cancer dataset, uploaded to the Plotly cloud, is available at https://plot.ly/~empet/14820.embed.

To be continued with meta_data insertion...

In [27]:

from IPython.display import HTML

def css_styling():
    # Load the notebook's custom stylesheet and apply it to the page
    with open("./custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()
