Visualizing Iris Dataset using Clustergrammer

In [1]:
from sklearn import datasets
import pandas as pd
from clustergrammer_widget import *
from copy import deepcopy
net = Network()
In [2]:
# import iris data 
iris = datasets.load_iris()

Make DataFrame from Iris

First build the row and column names, then construct the DataFrame.

In [3]:
unique_cat_names = iris.target_names.tolist()
cols = []
# Label each flower with a (name, category) tuple so the category
# travels with the column name.
for i, inst_col in enumerate(iris.target):
    inst_name = 'flowers: flower-' + str(i)
    inst_cat = 'flower-type: ' + unique_cat_names[inst_col]
    cols.append((inst_name, inst_cat))
In [4]:
# Prefix each measured dimension with 'feature: ' to form the row names.
rows = ['feature: ' + name for name in iris.feature_names]

Make a DataFrame with flower samples as columns and dimensions (e.g. petal width) as rows. Column categories (e.g. setosa) are attached to the column names using tuples.

In [6]:
mat = iris.data.transpose()
df = pd.DataFrame(data=mat, columns=cols, index = rows)
df.shape
Out[6]:
(4, 150)
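
As a quick sanity check on the tuple-based column labels, the categories can be tallied directly from the tuples. The snippet below is a minimal sketch: the short `cols` list is a hypothetical stand-in for the full list built above, and `count_categories` is an illustrative helper, not part of Clustergrammer.

```python
from collections import Counter

# Hypothetical stand-in for the `cols` list built above:
# each entry is a (name, category) tuple.
cols = [
    ('flowers: flower-0', 'flower-type: setosa'),
    ('flowers: flower-1', 'flower-type: setosa'),
    ('flowers: flower-2', 'flower-type: versicolor'),
    ('flowers: flower-3', 'flower-type: virginica'),
]

def count_categories(col_tuples):
    """Tally columns per category, dropping the 'flower-type: ' prefix."""
    return Counter(cat.split(': ', 1)[1] for _, cat in col_tuples)

counts = count_categories(cols)
print(counts)
```

Run against the real `cols` list, the same helper should report 50 samples for each of the three species.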

Visualize Iris DataFrame using Clustergrammer

Below we Z-score normalize the rows (dimensions) to make them more easily comparable.
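
To make the effect of row-wise Z-scoring concrete, here is a minimal pandas sketch on a toy matrix. It does not use Clustergrammer; `zscore_rows` is a hypothetical helper, and using the population standard deviation (`ddof=0`) is an assumption, since the notebook does not say which convention `net.normalize` follows.

```python
import pandas as pd

# Toy matrix: two features on very different scales, three samples.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    index=['feature: a', 'feature: b'],
    columns=['flower-0', 'flower-1', 'flower-2'],
)

def zscore_rows(frame, ddof=0):
    """Subtract each row's mean and divide by its standard deviation.

    ddof=0 (population standard deviation) is an assumption made for
    this sketch.
    """
    means = frame.mean(axis=1)
    stds = frame.std(axis=1, ddof=ddof)
    return frame.sub(means, axis=0).div(stds, axis=0)

z = zscore_rows(toy)
print(z)
```

After normalization both rows have mean 0 and unit standard deviation, so the two features become directly comparable even though their raw values differ by a factor of ten.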

In [8]:
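# Start from a fresh Network object.
net = deepcopy(Network())
# Wrap the DataFrame in a dict under the 'mat' key and load it.
tmp_df = {}
tmp_df['mat'] = df
net.df_to_dat(tmp_df)
# Z-score each row (dimension).
net.normalize(axis='row', norm_type='zscore', keep_orig=True)
# Cluster the matrix and render the interactive widget.
net.make_clust()
clustergrammer_widget(network=net.widget())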

After Z-score normalization, the flowers still largely cluster according to their categories. We also get a smoother breakdown of the flowers into hierarchical clusters (toggle the column dendrogram level using the triangle/circle slider).