The original data was given as two tab-separated matrices:
Plasma.txt (original name: 160202_CGI002_Plasma_Plasma_singlets.fcs_raw_events.txt)
PMA.txt (original name: 160202_CGI002_PMA_PMA_singlets.fcs_raw_events.txt)
These files had individual cell measurements as rows and dimensions (e.g. antibodies) as columns. I kept only the dimensions of interest (the surface marker and phospho marker antibody columns) and renamed these files Plasma_clean.txt and PMA_clean.txt.
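The cleaning step itself is not shown in this notebook. A minimal sketch of how it might look with pandas, assuming the raw file has a header row. The column names `Time`, `Event_length`, and `pERK` and the `keep_cols` list are hypothetical placeholders (only CD14 is named in this notebook), and a small in-memory table stands in for the real file:

```python
import io
import pandas as pd

# Toy stand-in for a raw tab-separated export (hypothetical columns and values)
raw_text = (
    "Time\tCD14\tpERK\tEvent_length\n"
    "1.0\t0.5\t2.0\t10\n"
    "2.0\t0.7\t1.5\t12\n"
)

# Hypothetical list of the surface and phospho marker columns to keep
keep_cols = ['CD14', 'pERK']

df_raw = pd.read_csv(io.StringIO(raw_text), sep='\t')
df_clean = df_raw[keep_cols]

# Write the cleaned matrix back out as tab-separated text
# (a buffer is used here; a file path would work the same way)
out = io.StringIO()
df_clean.to_csv(out, sep='\t')
print(df_clean.shape)
```

With a real file, `io.StringIO(raw_text)` would be replaced by the path to the raw matrix.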
import pandas as pd
import numpy as np
from clustergrammer_widget import *
net = Network()
net.load_file('../cytof_data/Plasma_CT.txt')
df_plasma = net.export_df()
net.load_df(df_plasma)
df_plasma.shape
(141859, 28)
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.dat['mat'].shape
/Users/nickfernandez/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py:1382: RuntimeWarning: init_size=300 should be larger than k=1000. Setting it to 3*k init_size=init_size)
(1000, 28)
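The `kmeans` downsampling above reduces 141859 cells to 1000 rows. The warning suggests that scikit-learn's mini-batch k-means is used internally. The sketch below only illustrates the general idea (summarizing many rows by k cluster centroids) with a plain NumPy Lloyd's iteration on toy data. The function `kmeans_downsample` and its parameters are hypothetical, and this is not the library's implementation:

```python
import numpy as np

def kmeans_downsample(mat, num_samples, n_iter=10, seed=0):
    """Summarize the rows of `mat` by `num_samples` k-means centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen distinct rows
    start = rng.choice(mat.shape[0], size=num_samples, replace=False)
    centers = mat[start].astype(float)

    def assign(current_centers):
        # Squared Euclidean distance from every row to every centroid
        d2 = ((mat[:, None, :] - current_centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for k in range(num_samples):
            members = mat[labels == k]
            if len(members):  # keep the old centroid if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers, assign(centers)

toy_rng = np.random.default_rng(1)
toy = toy_rng.normal(size=(500, 4))  # 500 toy "cells", 4 toy "markers"
centers, labels = kmeans_downsample(toy, num_samples=20)
print(centers.shape)  # (20, 4)
```

The all-pairs distance array makes this suitable only for small demonstrations; for data the size of these files a mini-batch method is the more practical choice.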
# clip z-scores since we do not care about extreme outliers
net.clip(-10,10)
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())
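The column normalization and clipping steps above can be illustrated with plain pandas. This is a sketch under assumptions, not the library's code: `zscore_columns` is a hypothetical helper, the `other` column is made up, and the use of the population standard deviation (`ddof=0`) is an assumption that may differ from what `net.normalize` does:

```python
import numpy as np
import pandas as pd

def zscore_columns(df):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    return (df - df.mean(axis=0)) / df.std(axis=0, ddof=0)

# Toy data: 200 "cells", one marker with a single extreme outlier
values = np.zeros(200)
values[0] = 1e6
toy_df = pd.DataFrame({'CD14': values, 'other': np.arange(200, dtype=float)})

z = zscore_columns(toy_df)
# Same bounds as net.clip(-10, 10): values beyond the bounds are set to the bounds
z_clipped = z.clip(lower=-10, upper=10)

print(z['CD14'].max() > 10)      # the outlier exceeds the clipping bound
print(z_clipped['CD14'].max())   # after clipping it sits at the bound
```

Clipping keeps a handful of extreme cells from dominating the heatmap's color scale while leaving the bulk of the z-scores unchanged.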
net.load_file('../cytof_data/PMA_CT.txt')
df_pma = net.export_df()
net.load_df(df_pma)
df_pma.shape
(110705, 28)
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.dat['mat'].shape
net.clip(-10,10)
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())
df_plasma.shape
(141859, 28)
df_pma.shape
(110705, 28)
df_merge = pd.concat([df_plasma, df_pma])
df_merge.shape
(252564, 28)
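The row counts add up as expected (141859 + 110705 = 252564). After concatenation, the rows need some label if the two treatments are to be told apart. One possible way to do this is the `keys` argument of `pd.concat`. Whether the rows in these files were labeled this way is not shown here, and the toy values and the `CD3` column below are hypothetical:

```python
import pandas as pd

# Toy stand-ins for df_plasma and df_pma (hypothetical values)
df_a = pd.DataFrame({'CD14': [0.1, 0.2], 'CD3': [1.0, 1.1]})
df_b = pd.DataFrame({'CD14': [0.9, 0.8, 0.7], 'CD3': [0.3, 0.2, 0.1]})

# keys adds an outer index level recording which frame each row came from
merged = pd.concat(
    [df_a, df_b],
    keys=['Plasma', 'PMA'],
    names=['treatment', 'cell'],
)

print(merged.shape)  # (5, 2)
print(merged.groupby(level='treatment').size())
```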
net.load_df(df_merge)
net.normalize(axis='col', norm_type='zscore', keep_orig=False)
net.downsample(ds_type='kmeans', axis='row', num_samples=1000)
net.clip(-10,10)
net.dat['mat'].shape
(1000, 28)
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())
The above heatmap mixes surface markers and phosphorylation markers. We can see that PMA-treated cells have higher levels of phosphorylation of
and higher levels of the CD14 surface marker.
We also want to see how our cells cluster based on phosphorylation markers only and on surface markers only.
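A minimal sketch of how the columns might be split into the two marker groups before clustering each one separately. The marker lists and toy values are hypothetical placeholders (only CD14 is named in this notebook); the real lists would come from the 28 columns of the cleaned files:

```python
import pandas as pd

# Hypothetical marker lists; replace with the actual column names
surface_markers = ['CD3', 'CD14']
phospho_markers = ['pERK', 'pSTAT3']

# Toy stand-in for df_merge
toy_merge = pd.DataFrame({
    'CD3': [1.0, 0.2, 0.4],
    'CD14': [0.1, 0.9, 0.8],
    'pERK': [0.0, 2.1, 1.9],
    'pSTAT3': [0.3, 1.2, 1.4],
})

# One DataFrame per marker group, with the same rows (cells) in each
df_surface = toy_merge[surface_markers]
df_phospho = toy_merge[phospho_markers]

print(df_surface.shape, df_phospho.shape)
```

Each subset could then go through the same steps used above (`net.load_df`, normalize, downsample, clip, `make_clust`).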