In this notebook, I will work with the smaller datasets since they are quicker to load and process. Once the dendrogram-category export and related steps are set up, I will switch to the original datasets.
import pandas as pd
from clustergrammer_widget import *
net = Network()
net.load_file('../cytof_data/ds_plasma.txt')
net.clip(-10,10)
df_plasma = net.export_df()
ds_data_plasma = net.downsample(ds_type='kmeans', axis='row', num_samples=100)
df_ds_plasma = net.export_df()
net.load_file('../cytof_data/ds_pma.txt')
net.clip(-10,10)
df_pma = net.export_df()
ds_data_pma = net.downsample(ds_type='kmeans', axis='row', num_samples=100)
df_ds_pma = net.export_df()
df_ds_plasma.shape
(100, 28)
df_ds_pma.shape
(100, 28)
I'm generating row categories based on the clusters given by the dendrogram.
net.load_df(df_ds_plasma)
net.set_cat_color('row', 1, 'Majority-Majority-Treatment: Plasma', 'blue')
net.set_cat_color('row', 1, 'Majority-Majority-Treatment: PMA', 'red')
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.make_clust()
net.dendro_cats('row', dendro_level=5)
net.make_clust()
df_plasma_cat = net.export_df()
clustergrammer_widget(network=net.widget())
net.load_df(df_pma)
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.make_clust()
net.dendro_cats('row', dendro_level=5)
net.make_clust()
# net.dendro_cats('col', dendro_level=7)
# net.make_clust()
df_pma_cat = net.export_df()
clustergrammer_widget(network=net.widget())
ds_data_plasma.shape
(1000,)
ds_data_pma.shape
(1000,)
# new_plasma_rows = []
# for inst_row in plasma_rows:
# inst_row = list(inst_row)
# inst_name = inst_row[0]
# inst_row[0] = inst_name.split(': ')[0] + ': ' + 'plasma-' + inst_name.split(': ')[1]
# inst_row = tuple(inst_row)
# new_plasma_rows.append(inst_row)
# df_plasma_cat.index = new_plasma_rows
# df_pma_cat.index = new_pma_rows
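The commented-out renaming above can be written as a small helper. This is only a sketch: `prefix_row_names` is a hypothetical name, and it assumes each row is a tuple whose first element is the row name in `Title: name` form, followed by category strings.

```python
def prefix_row_names(rows, prefix):
    """Return a copy of rows with prefix inserted into each row name.

    Assumes each row is a tuple whose first element is the row name,
    e.g. ('Cell: c1', 'Category: x'). Categories are left untouched.
    """
    new_rows = []
    for row in rows:
        row = list(row)
        # Split 'Title: name' once, so a name containing ': ' stays intact
        title, sep, name = row[0].partition(': ')
        if sep:
            row[0] = title + ': ' + prefix + '-' + name
        else:
            # No title present: prefix the whole name
            row[0] = prefix + '-' + row[0]
        new_rows.append(tuple(row))
    return new_rows


# Hypothetical usage, mirroring the commented-out cell:
# df_plasma_cat.index = prefix_row_names(df_plasma_cat.index.tolist(), 'plasma')
# df_pma_cat.index = prefix_row_names(df_pma_cat.index.tolist(), 'pma')
```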
df_merge_cat = pd.concat([df_plasma_cat, df_pma_cat])
net.load_df(df_merge_cat)
net.make_clust()
clustergrammer_widget(network=net.widget())
This is getting closer to what I want, but the dendrogram categories are not assigned correctly. I will have to rename them manually, e.g. Dendro-cat-1 -> Natural Killer cells. This way I will be able to see whether cells with the same categorization cluster together with or without PMA treatment.
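A minimal sketch of the manual renaming, assuming the rows are tuples of the form `(name, 'Title: value', ...)`. The helper name `rename_cat_values`, the category title `Group`, and the exact label format are assumptions for illustration; the real dendrogram-category labels may look different.

```python
def rename_cat_values(rows, mapping):
    """Rename category values in row tuples using a manual mapping.

    Each row is assumed to be (name, 'Title: value', ...). Values that
    are missing from mapping are kept as they are.
    """
    new_rows = []
    for row in rows:
        name, cats = row[0], row[1:]
        new_cats = []
        for cat in cats:
            title, sep, value = cat.partition(': ')
            new_cats.append(title + sep + mapping.get(value, value))
        new_rows.append((name,) + tuple(new_cats))
    return new_rows


# Hypothetical mapping, filled in by hand after inspecting each cluster
cat_names = {'Dendro-cat-1': 'Natural Killer cells'}

# Hypothetical usage on the exported DataFrame:
# df_plasma_cat.index = rename_cat_values(df_plasma_cat.index.tolist(), cat_names)
```

Applying the same mapping to the Plasma and PMA rows keeps the category names comparable across the two treatments.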
I will also have to transfer the categories determined by hierarchical clustering of the downsampled data to the non-downsampled data. Here, I needed to manually make the row names unique, but I will not need to do this when I work with the original non-downsampled data.
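One way the transfer could work is sketched below. It assumes that the array returned by the downsampling step (e.g. `ds_data_plasma`, with one entry per original row) holds the index of the k-means cluster each original row was assigned to, and that a cluster-to-category lookup has already been built from the downsampled rows. `transfer_cats`, `cluster_to_cat`, and the `Cell type` title are hypothetical names.

```python
def transfer_cats(orig_rows, cluster_labels, cluster_to_cat, cat_title='Cell type'):
    """Append to each original row the category of its downsampled cluster.

    orig_rows      : row names (strings or tuples) of the full data
    cluster_labels : one cluster index per original row
    cluster_to_cat : dict mapping cluster index -> category value
    """
    if len(orig_rows) != len(cluster_labels):
        raise ValueError('need exactly one cluster label per original row')
    new_rows = []
    for row, label in zip(orig_rows, cluster_labels):
        # Normalize plain string names to one-element tuples
        row = row if isinstance(row, tuple) else (row,)
        cat = cluster_to_cat.get(int(label), 'unassigned')
        new_rows.append(row + (cat_title + ': ' + cat,))
    return new_rows


# Hypothetical usage:
# df_plasma.index = transfer_cats(df_plasma.index.tolist(), ds_data_plasma, cluster_to_cat)
```

Rows whose cluster is missing from the lookup are labeled `unassigned` rather than dropped, so the row count of the full data is preserved.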
exp_df = net.export_df()
exp_rows = exp_df.index.tolist()
exp_cols = exp_df.columns.tolist()
len(list(set(exp_rows)))
2000
len(list(set(exp_cols)))
18