Clustergrammer DataFrame Visualization Example¶

This example shows how to visualize a Pandas DataFrame using the Clustergrammer Jupyter Widget.

In [1]:

import numpy as np
import pandas as pd
from clustergrammer_widget import *

Generate a dataframe of random data with row and column labels

In [2]:

# generate random matrix
num_rows = 500
num_cols = 10
np.random.seed(seed=100)
mat = np.random.rand(num_rows, num_cols)

# make row and col labels
rows = range(num_rows)
cols = range(num_cols)
rows = [str(i) for i in rows]
cols = [str(i) for i in cols]

# make dataframe 
df = pd.DataFrame(data=mat, columns=cols, index=rows)

Initialize the network object, load the dataframe, hierarchically cluster the rows and columns using default parameters, and finally visualize using clustergrammer_widget.

In [3]:

# initialize network object
net = Network(clustergrammer_widget)
# load dataframe
net.load_df(df)
# cluster using default parameters
net.cluster()
# make the visualization
net.widget()

The network object can also be used to normalize and filter your data. Here we will Z-score normalize the columns and make an updated visualization.

In [4]:

net.normalize(axis='col', norm_type='zscore', keep_orig=True)
net.make_clust()
net.widget()

make_clust method will be deprecated in next version, please use cluster method.

Here we will filter rows based on their sum. We will keep the top 20 rows based on their absolute value sum.

In [5]:

net.filter_N_top('row', 20, 'sum')
net.make_clust()
net.widget()

make_clust method will be deprecated in next version, please use cluster method.

In [ ]: