The original data was provided as two tab-separated matrices:
Plasma.txt (original name: 160202_CGI002_Plasma_Plasma_singlets.fcs_raw_events.txt)
PMA.txt (original name: 160202_CGI002_PMA_PMA_singlets.fcs_raw_events.txt)
These files have individual cell measurements as rows and dimensions (e.g. antibodies) as columns. I kept only the dimensions of interest (the surface marker and phospho marker antibody columns) and renamed the files. I then semi-automatically identified 'roughly-defined' cell types using hierarchical clustering and the surface markers associated with each cell type. The cell-type-annotated files are Plasma_CT.txt and PMA_CT.txt.
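As a rough illustration of the column-selection step, the sketch below reads a small tab-separated matrix with pandas and keeps only surface and phospho marker columns. The in-memory table, the non-marker column names (`Time`, `Event_length`), and all values are made up for the example; this is not the code that produced the files above.

```python
import io
import pandas as pd

# Tiny stand-in for a raw-events file: rows are cells, columns are dimensions.
# Column names and values are hypothetical and only serve the illustration.
raw_tsv = (
    "Time\tEvent_length\tCD14\tCD4\tpCREB\tpp38\n"
    "10.5\t22\t3.1\t0.2\t1.4\t0.9\n"
    "11.0\t19\t0.1\t4.5\t0.3\t0.2\n"
    "12.2\t25\t2.8\t0.0\t2.1\t1.7\n"
)
df_raw = pd.read_csv(io.StringIO(raw_tsv), sep="\t")

# Dimensions of interest, grouped by marker type.
surface_markers = ["CD14", "CD4"]
phospho_markers = ["pCREB", "pp38"]

# Keep only the marker columns and drop everything else.
df_kept = df_raw[surface_markers + phospho_markers]
print(df_kept.shape)
```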
import pandas as pd
import numpy as np
from clustergrammer_widget import *
net = Network(clustergrammer_widget)
# load Plasma treated data with defined cell types
net.load_file('../cytof_data/Plasma_UCT.txt')
# subsample the data so that both treatments have the same number of cells
net.random_sample(axis='row',num_samples=110000, random_state=99)
df_plasma = net.export_df()
print(df_plasma.shape)
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
print(net.dat['mat'].shape)
# clip z-scores since we do not care about extreme outliers
net.clip(-10,10)
net.write_matrix_to_tsv('../cytof_data/ds_plasma.txt')
(110000, 28)
/Users/nickfernandez/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py:1382: RuntimeWarning: init_size=300 should be larger than k=1000. Setting it to 3*k init_size=init_size)
(1000, 28)
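To make the normalization and clipping steps above more concrete, here is a minimal pandas sketch of column-wise z-scoring followed by clipping to [-10, 10]. It assumes the z-score is the usual (value - column mean) / column standard deviation; the data are synthetic, and the exact formula used inside Clustergrammer (for example the degrees of freedom of the standard deviation) may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic measurements: 1000 cells by 3 markers, plus one extreme outlier.
df = pd.DataFrame(
    rng.normal(loc=5.0, scale=2.0, size=(1000, 3)),
    columns=["CD14", "pCREB", "pp38"],
)
df.iloc[0, 0] = 500.0

# Z-score each column: subtract the column mean, divide by the column std.
df_z = (df - df.mean(axis=0)) / df.std(axis=0)

# Clip so that extreme outliers do not dominate the color scale.
df_clipped = df_z.clip(lower=-10, upper=10)

print(df_z.iloc[0, 0] > 10)
print(df_clipped.max().max())
```

Clipping leaves ordinary values untouched and only caps the few measurements that lie more than 10 standard deviations from the column mean.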
net.set_cat_color('row', 1, 'Majority-Treatment: Plasma', 'blue')
net.set_cat_color('row', 1, 'Majority-Treatment: PMA', 'red')
# colors for the majority cell-type categories
net.set_cat_color('row', 2, 'Majority-Category: CD14hi monocytes', 'yellow')
net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells', 'blue')
net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD16hi', 'red')
net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD16hi_CD57hi', 'orange')
net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD56hi', '#FF6347')
net.set_cat_color('col', 1, 'Marker-type: phospho marker', 'red')
net.set_cat_color('col', 1, 'Marker-type: surface marker', 'blue')
net.cluster(views=[])
net.widget()
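The category names above start with "Majority-", which suggests that each downsampled row stands for a cluster of cells and carries the most common label among those cells. The sketch below illustrates that idea on synthetic data. It uses a plain NumPy k-means as a stand-in (the warning in the output indicates that the library relies on scikit-learn's k-means), and the majority-vote labeling is an assumption about how the categories are derived, not a copy of the library's implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for cells: two well-separated populations, two markers.
values = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(5.0, 0.5, size=(200, 2)),
])
treatment = np.array(
    ["Plasma"] * 150 + ["PMA"] * 50 + ["PMA"] * 180 + ["Plasma"] * 20
)

def simple_kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; returns a cluster label for each row."""
    # Deterministic start: k evenly spaced rows serve as initial centers.
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (empty clusters stay put).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

labels = simple_kmeans(values, k=2)

df = pd.DataFrame(values, columns=["CD14", "pCREB"])
df["cluster"] = labels
df["treatment"] = treatment

# Each cluster becomes one downsampled row holding the mean marker values ...
centers = df.groupby("cluster")[["CD14", "pCREB"]].mean()
# ... labeled with the most common treatment among its cells.
majority = df.groupby("cluster")["treatment"].agg(
    lambda s: s.value_counts().idxmax()
)

print(centers.shape)
print(majority.tolist())
```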
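# load PMA treated data with defined cell types
net.load_file('../cytof_data/PMA_UCT.txt')
# subsample to the same number of cells as the Plasma data
net.random_sample(axis='row',num_samples=110000, random_state=99)
df_pma = net.export_df()
net.load_df(df_pma)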
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.dat['mat'].shape
net.clip(-10,10)
net.write_matrix_to_tsv('../cytof_data/ds_pma.txt')
net.cluster(views=[])
net.widget()
# merge the subsampled Plasma and PMA cells into a single matrix
df_merge = pd.concat([df_plasma, df_pma])
print(df_merge.shape)
net.load_df(df_merge)
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10,10)
net.dat['mat'].shape
net.cluster(views=[])
net.widget()
(220000, 28)
/Users/nickfernandez/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py:1382: RuntimeWarning: init_size=300 should be larger than k=2000. Setting it to 3*k init_size=init_size)
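The equal-size subsampling and merge can also be expressed in plain pandas. The sketch below uses two synthetic DataFrames of different sizes with hypothetical row names; the fixed `random_state` only makes the draw reproducible and is not meant to reproduce the rows selected by `net.random_sample`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
markers = ["CD14", "CD4", "pCREB"]

# Synthetic stand-ins for the two treatments, with different cell counts.
df_a = pd.DataFrame(
    rng.normal(size=(500, 3)), columns=markers,
    index=["Plasma-cell-%d" % i for i in range(500)],
)
df_b = pd.DataFrame(
    rng.normal(size=(300, 3)), columns=markers,
    index=["PMA-cell-%d" % i for i in range(300)],
)

# Draw the same number of cells from each treatment so neither dominates.
n_cells = min(len(df_a), len(df_b))
sub_a = df_a.sample(n=n_cells, random_state=99)
sub_b = df_b.sample(n=n_cells, random_state=99)

# Stack the two subsamples row-wise; the columns must match.
df_both = pd.concat([sub_a, sub_b])
print(df_both.shape)
```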
df_merge = pd.concat([df_plasma, df_pma])
net.load_df(df_merge)
# keep only the surface marker columns
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10,10)
print(net.dat['mat'].shape)
net.cluster(views=[])
net.widget()
(2000, 18)
df_merge = pd.concat([df_plasma, df_pma])
net.load_df(df_merge)
# keep only the phospho marker columns
net.filter_cat('col', 1, 'Marker-type: phospho marker')
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=2000)
net.clip(-10,10)
print(net.dat['mat'].shape)
net.cluster(views=[])
net.widget()
(2000, 10)
PMA- and Plasma-treated cells separate more clearly on phospho markers than on surface markers. This makes sense, since PMA treatment is expected to influence phosphorylation levels.
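One way to put a number on this kind of separation is a per-marker standardized mean difference between the two treatments. The sketch below computes it on synthetic data in which only the phospho columns are shifted; the marker subsets, the size of the shift, and the choice of metric are assumptions made for illustration and say nothing about the actual measurements.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_cells = 1000
surface = ["CD14", "CD4"]
phospho = ["pCREB", "pp38"]

# Synthetic cells: both treatments start from the same distribution.
plasma = pd.DataFrame(rng.normal(size=(n_cells, 4)), columns=surface + phospho)
pma = pd.DataFrame(rng.normal(size=(n_cells, 4)), columns=surface + phospho)
# Hypothetical treatment effect: shift only the phospho markers.
pma[phospho] = pma[phospho] + 2.0

def effect_size(a, b):
    """Absolute standardized mean difference per column."""
    pooled_sd = np.sqrt((a.var() + b.var()) / 2.0)
    return ((b.mean() - a.mean()) / pooled_sd).abs()

d = effect_size(plasma, pma)
print(d[surface].mean() < d[phospho].mean())
```

A larger average effect size for one marker group means the treatments are easier to tell apart along those dimensions, which is what a clearer split in the clustered heatmap reflects.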
We see a cluster of monocytes and granulocytes with high levels of the phosphorylation markers pCREB, pMAPKAP2, pERK1/2, and pp38. We export this cluster by selecting it with the interactive dendrogram and then calling the widget DataFrame export method, widget_df:
# export the cluster currently selected in the widget as a DataFrame
df_CD14hi = net.widget_df()
net.load_df(df_CD14hi)
net.cluster(views=[])
net.widget()