MAGIC for single cell RNA-seq

This notebook details the usage of MAGIC for single cell RNA-seq data.

Loading data

MAGIC can read single cell RNA-seq data from a csv or sparse mtx file. The csv file contains cells in the rows and genes in the columns. First step is to import the package. The following code snippet imports the magic package along with other plotting related imports

In [14]:

import magic

# Plotting and miscellaneous imports
import os
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

%matplotlib inline

The data can be loaded using the magic.mg.SCData.from_csv or magic.mg.SCData.from_mtx functions.

In [6]:

# Load single-cell RNA-seq data
scdata = magic.mg.SCData.from_csv(os.path.expanduser('~/Documents/Sophomore/Lab/wishbone/data/sdata_nn_TGFb_day_8_10.csv'),
                                  data_type='sc-seq', normalize=False)

This will create an object of the type magic.mg.SCData which is the base class for the analysis.

A summary of the scdata object is shown below

In [3]:

scdata

Out[3]:

SCData: 7523 cells x 28909 genes

cluster_assignments=None
data_type=True
diffusion_eigenvalues=None
diffusion_eigenvectors=None
diffusion_map_correlations=None
library_sizes=None
magic=None
metadata=True
normalized=True
pca=None
tsne=None

This shows that the data matrix contains 7523 cells and 28909 genes along with the different properties of the magic.mg.SCData class.

The scdata object can also be initialized using a pandas DataFrame. An example is shown below


scdata = magic.mg.SCData(df, 'sc-seq')

Data preprocessing

Filter and normalize data

The first step in data processing for MAGIC is to determine the molecule per cell and molecule per gene cutoffs with which to filter the data

In [4]:

fig, ax = scdata.plot_molecules_per_cell_and_gene()

3.23095955575
4.40377233181

From these histograms, choose the appropriate cutoffs to filter the data. In this case, the data has already been filtered.

In [ ]:

# Minimum molecules/cell value
CELL_MIN = 0

# Maximum molecules/cell values
CELL_MAX = 1000000

# Minimum number of nonzero cells/gene 
# (None if no filtering desired)
GENE_NONZERO = None

# Minimum number of molecules/gene
# (None if no filtering desired)
GENE_MOLECULES = None

In [ ]:

scdata.filter_scseq_data(filter_cell_min=CELL_MIN, filter_cell_max=CELL_MAX, 
                         filter_gene_nonzero=GENE_NONZERO, filter_gene_mols=GENE_MOLECULES)

Next, we will normalize the data by dividing each cell by its molecule count and multiplying the counts of cells by the median of the molecule counts.

In [7]:

scdata = scdata.normalize_scseq_data()

Saving SCData object

The SCData object can be saved to a pickle file and loaded using the save and load functions.


scdata.save('scdata.p')
scdata = magic.mg.SCdata.load('scdata.p')

Running MAGIC

Before running MAGIC, a PCA plot showing the percent of variance explained by the top PCA components can be used to choose the number of PCA components to include when running MAGIC.

In [4]:

fig, ax = scdata.plot_pca_variance_explained(n_components=40, random=True)

When running MAGIC, a number of parameters can be specified.

In [8]:

# MAGIC
scdata.run_magic(n_pca_components=20, random_pca=True, t=6, k=30, 
                 ka=10, epsilon=1, rescale_percent=99)

doing PCA
Computing distances
Autotuning distances
Computing kernel
MAGIC: L_t = L^t
MAGIC: data_new = L_t * data

Visualizations

Gene-gene scatter plots

2D scatter plot before MAGIC:

In [5]:

fig, ax = scdata.scatter_gene_expression(['VIM', 'CDH1'], color='ZEB1')
ax.set_xlabel('Vimentin (VIM)')
ax.set_ylabel('E-cadherin (CDH1)')

Out[5]:

<matplotlib.text.Text at 0x1c878af60>

2D scatter plot after MAGIC:

In [6]:

fig, ax = scdata.magic.scatter_gene_expression(['MAGIC VIM', 'MAGIC CDH1'], color ='MAGIC ZEB1')
ax.set_xlabel('MAGIC Vimentin (VIM)')
ax.set_ylabel('MAGIC E-cadherin (CDH1)')

Out[6]:

<matplotlib.text.Text at 0x1c87b5a90>

3D scatter plot before MAGIC:

In [7]:

fig, ax = scdata.scatter_gene_expression(['VIM', 'CDH1', 'FN1'], color='ZEB1')
ax.set_xlabel('Vimentin (VIM)')
ax.set_ylabel('E-cadherin (CDH1)')
ax.set_zlabel('Fibronectin (FN1)')

Out[7]:

<matplotlib.text.Text at 0x10f4f8898>

3D scatter plot after MAGIC:

In [8]:

fig, ax = scdata.magic.scatter_gene_expression(['MAGIC VIM', 'MAGIC CDH1', 'MAGIC FN1'], color='MAGIC ZEB1')
ax.set_xlabel('MAGIC Vimentin (VIM)')
ax.set_ylabel('MAGIC E-cadherin (CDH1)')
ax.set_zlabel('MAGIC Fibronectin (FN1)')
ax.set_zlim(35, 150)

Out[8]:

(35, 150)

PCA scatter plots

In [9]:

scdata.run_pca()

In [10]:

scdata.magic.run_pca()

PC2 vs PC3 colored by CDH1, VIM, FN1 and ZEB1 (before MAGIC):

In [17]:

gs = gridspec.GridSpec(2,2)
fig = plt.figure(figsize=[15, 12])
genes = ['CDH1', 'VIM', 'EZH2', 'ZEB1']
for i in range(len(genes)):
    ax = fig.add_subplot(gs[i//2, i%2])
    scdata.scatter_gene_expression(genes=['PC2', 'PC3'], color=genes[i], fig=fig, ax=ax)

PC2 vs PC3 colored by CDH1, VIM, FN1, and ZEB1 (after MAGIC):

In [18]:

gs = gridspec.GridSpec(2,2)
fig = plt.figure(figsize=[15, 12])
genes = ['MAGIC CDH1', 'MAGIC VIM', 'MAGIC EZH2', 'MAGIC ZEB1']
for i in range(len(genes)):
    ax = fig.add_subplot(gs[i//2, i%2])
    scdata.magic.scatter_gene_expression(genes=['PC2', 'PC3'], color=genes[i], fig=fig, ax=ax)

tSNE scatter plots

In [5]:

scdata.run_tsne()

tSNE maps colored by CDH1, VIM, FN1, and ZEB1 (before MAGIC):

In [6]:

fig, ax = scdata.plot_gene_expression(genes=['CDH1', 'VIM', 'FN1', 'ZEB1'])

tSNE maps colored by CDH1, FN1, VIM, and ZEB1 (after MAGIC):

In [7]:

colors = {'MAGIC CDH1': scdata.magic.data['MAGIC CDH1'],
          'MAGIC VIM': scdata.magic.data['MAGIC VIM'],
          'MAGIC FN1': scdata.magic.data['MAGIC FN1'],
          'MAGIC ZEB1': scdata.magic.data['MAGIC ZEB1']}
fix, ax = scdata.plot_gene_expression(genes=colors)

MAGIC for single cell RNA-seq

Table of contents

Loading data

Data preprocessing

Filter and normalize data

Saving SCData object

Running MAGIC

Visualizations

Gene-gene scatter plots

2D scatter plot before MAGIC:

2D scatter plot after MAGIC:

3D scatter plot before MAGIC:

3D scatter plot after MAGIC:

PCA scatter plots

PC2 vs PC3 colored by CDH1, VIM, FN1 and ZEB1 (before MAGIC):

PC2 vs PC3 colored by CDH1, VIM, FN1, and ZEB1 (after MAGIC):

tSNE scatter plots

tSNE maps colored by CDH1, VIM, FN1, and ZEB1 (before MAGIC):

tSNE maps colored by CDH1, FN1, VIM, and ZEB1 (after MAGIC):