Hematopoiesis: trace myeloid and erythroid differentiation for data of Paul et al., Cell (2015).
Note: For a more recent and richer analysis of this dataset take a look at https://github.com/theislab/graph_abstraction/tree/master/paul15.
import numpy as np
import matplotlib.pyplot as pl
import scanpy as sc # on older Scanpy releases, use: import scanpy.api as sc
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
results_file = './write/paul15.h5ad'
sc.settings.set_figure_params(dpi=80) # low dpi (dots per inch) yields small inline figures
Perform a simple Diffusion Pseudotime analysis on raw data, as in Haghverdi et al. (2016). No preprocessing, only logarithmize the raw counts.
Note: The dataset is available via sc.datasets.paul15().
adata = sc.datasets.paul15()
sc.pp.log1p(adata) # logarithmize data
sc.pp.neighbors(adata, n_neighbors=20, use_rep='X', method='gauss')
sc.tl.diffmap(adata)
sc.tl.dpt(adata, n_branchings=1, n_dcs=10)
Diffusion Pseudotime (DPT) analysis detects the branch of granulocyte/macrophage progenitors (GMP) and the branch of megakaryocyte/erythrocyte progenitors (MEP). There are two further small subgroups (segments 0 and 2).
sc.pl.diffmap(adata, color=['dpt_pseudotime', 'dpt_groups', 'paul15_clusters'])
With this, we reproduced the analysis of Haghverdi et al. (2016, Suppl. Note 4 and Suppl. Figure N4).
adata.write(results_file)
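To make the steps above more concrete, here is a minimal NumPy sketch of the idea behind diffusion pseudotime: build a Gaussian kernel, normalize it, take its leading eigenvectors, and measure distances from a root cell. This is not Scanpy's implementation. The function name `toy_diffusion_pseudotime`, the single global kernel width `sigma`, the dense kernel (instead of a k-nearest-neighbor graph), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def toy_diffusion_pseudotime(X, root=0, sigma=0.2, n_dcs=3):
    """Simplified diffusion pseudotime on a small dense matrix (illustrative only)."""
    # Pairwise squared Euclidean distances between cells (rows of X).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel with one global width (assumption; Scanpy adapts widths per cell).
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalization T = D^{-1/2} K D^{-1/2}; same spectrum as the random walk.
    d = K.sum(axis=1)
    T = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(T)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Skip the first (stationary) eigenvector and keep n_dcs diffusion components.
    lam = evals[1:n_dcs + 1]
    psi = evecs[:, 1:n_dcs + 1]
    # Weights lam / (1 - lam) accumulate random walks over all time scales.
    coords = psi * (lam / (1.0 - lam))
    dist = np.linalg.norm(coords - coords[root], axis=1)
    return dist / dist.max()

# Toy "trajectory": 50 cells placed along a straight line in 2D (hypothetical data).
t = np.linspace(0.0, 1.0, 50)
X_toy = np.column_stack([t, 0.5 * t])
pt = toy_diffusion_pseudotime(X_toy, root=0)
```

In this toy setting the root cell gets pseudotime 0 and cells further along the line get larger values. On real data, `sc.tl.dpt` additionally detects branchings, which this sketch does not attempt.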
Computing connectivities using UMAP gives us quantitatively different results for the pseudotime.
sc.pp.neighbors(adata, n_neighbors=20, use_rep='X', method='umap')
sc.tl.diffmap(adata)
sc.tl.dpt(adata, n_branchings=1)
sc.pl.diffmap(adata, color=['dpt_pseudotime', 'dpt_groups', 'paul15_clusters'])
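One way to put a number on how much two pseudotime orderings differ is a rank correlation. The sketch below uses plain NumPy; the helper `rank_correlation` and the two short arrays are hypothetical stand-ins, not values from this dataset. To apply it here, one would presumably need to copy `adata.obs['dpt_pseudotime']` after the first DPT run, since the second run appears to write to the same column.

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman-style rank correlation: Pearson correlation of the ranks.

    Ties are not averaged, so this is only exact for arrays without ties.
    """
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Hypothetical pseudotime values for five cells under two neighbor methods.
pt_gauss = np.array([0.0, 0.1, 0.35, 0.6, 1.0])
pt_umap = np.array([0.0, 0.3, 0.2, 0.8, 1.0])
rho = rank_correlation(pt_gauss, pt_umap)
```

A value near 1 means both methods order the cells almost identically even if the pseudotime values themselves differ; lower values indicate that the ordering itself changes.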