Merging all available mouse datasets for the BBKNN paper
import numpy as np
import scipy as sp
import scanpy.api as sc
import pandas as pd
import matplotlib.pyplot as plt
from bbknn import bbknn
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=100)
sc.logging.print_version_and_date()
Running Scanpy 1.2.2 on 2018-07-20 15:48.
The objects can be downloaded from ftp://ngs.sanger.ac.uk/production/teichmann/BBKNN/MouseAtlas.zip
adata = sc.read("MouseAtlas.total")
Standard KNN graph. The embedding shows a strong batch effect: cells separate by atlas of origin.
sc.pl.umap(adata,color='Dataset')
sc.pl.umap(adata,color='Organ groups')
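One rough way to put a number on the batch effect seen in such plots is to ask how often a cell's nearest neighbours come from its own batch. The sketch below is an illustration on toy data, not part of the original analysis; `same_batch_fraction` and the toy arrays are hypothetical names, and it uses brute-force distances that would not scale to the full atlas.

```python
import numpy as np

def same_batch_fraction(coords, batches, k=5):
    """Mean fraction of each cell's k nearest neighbours that share its batch.

    Hypothetical helper for illustration only (brute-force Euclidean distances).
    """
    coords = np.asarray(coords, dtype=float)
    batches = np.asarray(batches)
    # Pairwise Euclidean distances between all cells.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # A cell must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return float((batches[nearest] == batches[:, None]).mean())

rng = np.random.default_rng(0)
toy_batches = np.array(["atlas_a"] * 20 + ["atlas_b"] * 20)

# Two batches far apart: every neighbour comes from the same batch.
separated = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(100, 1, (20, 5))])
# Two batches drawn from the same distribution: neighbours are mixed.
mixed = rng.normal(0, 1, (40, 5))

separated_score = same_batch_fraction(separated, toy_batches)
mixed_score = same_batch_fraction(mixed, toy_batches)
```

A value near 1 means neighbourhoods are dominated by a single batch; lower values indicate better mixing.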
Run BBKNN
bbknn(adata,batch_key='sample',trim=10000,approx=False,use_faiss=False)
computing batch balanced neighbors finished (0:06:22.40) --> added to `.uns['neighbors']` 'distances', weighted adjacency matrix 'connectivities', weighted adjacency matrix
sc.tl.umap(adata)
computing UMAP finished (0:06:30.40) --> added 'X_umap', UMAP coordinates (adata.obsm)
sc.pl.umap(adata,color='Dataset')
sc.pl.umap(adata,color='Organ groups')
... storing 'Organ groups' as categorical
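The idea behind the `bbknn` call above is that each cell picks its nearest neighbours within every batch separately, so no single batch can dominate a neighbourhood. The following is a minimal NumPy sketch of that idea on toy data, not the library's implementation: `batch_balanced_neighbors` is a hypothetical name, and trimming and the conversion to connectivities are left out.

```python
import numpy as np

def batch_balanced_neighbors(pcs, batches, k_within_batch=3):
    """For every cell, return the indices of its k nearest cells in each batch.

    Hypothetical illustration only. A cell is its own nearest neighbour
    within its own batch (distance 0), so it appears in its own row.
    """
    pcs = np.asarray(pcs, dtype=float)
    batches = np.asarray(batches)
    blocks = []
    for batch in np.unique(batches):
        members = np.flatnonzero(batches == batch)
        if members.size < k_within_batch:
            raise ValueError(f"batch {batch!r} has fewer than {k_within_batch} cells")
        # Euclidean distances from every cell to the cells of this batch.
        diff = pcs[:, None, :] - pcs[None, members, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        # Keep the k closest cells of this batch, mapped back to global indices.
        nearest = np.argsort(dist, axis=1)[:, :k_within_batch]
        blocks.append(members[nearest])
    return np.hstack(blocks)

rng = np.random.default_rng(0)
# Toy data: two batches with a systematic offset between them.
toy_pcs = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (30, 5))])
toy_batches = np.array(["a"] * 20 + ["b"] * 30)
toy_neighbors = batch_balanced_neighbors(toy_pcs, toy_batches, k_within_batch=3)
```

Every row of `toy_neighbors` holds three neighbours from batch `a` followed by three from batch `b`, regardless of how far apart the batches sit.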
This dataset contains cells subsampled from the object above to create more numerically balanced populations.
bdata = sc.read("MouseAtlas.subset")
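How the subset was drawn is not stated here. As one possible approach, assuming the goal is to cap the size of the largest populations, the sketch below keeps at most a fixed number of cells per group. `subsample_per_group`, `max_per_group`, and the toy labels are hypothetical and not taken from the original analysis.

```python
import numpy as np

def subsample_per_group(labels, max_per_group, seed=0):
    """Return sorted indices that keep at most max_per_group cells per label.

    Hypothetical helper: groups at or below the cap are kept whole,
    larger groups are sampled without replacement.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    keep = []
    for group in np.unique(labels):
        members = np.flatnonzero(labels == group)
        if members.size > max_per_group:
            members = rng.choice(members, size=max_per_group, replace=False)
        keep.append(members)
    return np.sort(np.concatenate(keep))

# Toy labels: one abundant and one rare population.
toy_labels = np.array(["big"] * 100 + ["small"] * 5)
kept = subsample_per_group(toy_labels, max_per_group=10)
```

The returned indices could then be used to slice an AnnData object, for example `adata[kept].copy()`, with the labels taken from whichever `obs` column defines the populations.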
Let's see how the subset looks before we apply BBKNN.
sc.pp.neighbors(bdata)
computing neighbors using 'X_pca' with n_pcs = 50 finished (0:01:53.03) --> added to `.uns['neighbors']` 'distances', weighted adjacency matrix 'connectivities', weighted adjacency matrix
sc.tl.umap(bdata)
computing UMAP finished (0:02:25.33) --> added 'X_umap', UMAP coordinates (adata.obsm)
sc.pl.umap(bdata,color='Dataset')
bbknn(bdata,batch_key='sample',trim=10000,approx=False,use_faiss=False)
computing batch balanced neighbors finished (0:02:02.51) --> added to `.uns['neighbors']` 'distances', weighted adjacency matrix 'connectivities', weighted adjacency matrix
sc.tl.umap(bdata)
computing UMAP finished (0:02:42.40) --> added 'X_umap', UMAP coordinates (adata.obsm)
sc.pl.umap(bdata, color='Dataset')
sc.pl.umap(bdata, color='Organ groups')
sc.pl.umap(bdata, color='Cell types')
... storing 'Cell types' as categorical
These are the marker genes we used to annotate cell types.
for gene in ["Nanog","Pou5f1","Hox_genes","Cd34","Ptprc","Cd3g","Cd19","Cd68","S100a8","Hba-a1"]:
    sc.pl.umap(bdata,color=gene,color_map='OrRd')
for gene in ["Cdh5","Col1a1","Pdgfra","Pdgfrb","Myl9","Syt11","Epcam","Cdh1","Krt8"]:
    sc.pl.umap(bdata,color=gene,color_map='OrRd')