import scgen
import scanpy as sc
Using TensorFlow backend.
train = sc.read("./tests/data/pancreas.h5ad",
                backup_url="https://www.dropbox.com/s/qj1jlm9w10wmt0u/pancreas.h5ad?dl=1")
/Users/mohammad.lotfollahi/opt/miniconda3/lib/python3.7/site-packages/anndata/compat/__init__.py:161: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances']. This is where adjacency matrices should go now.
  FutureWarning,
/Users/mohammad.lotfollahi/opt/miniconda3/lib/python3.7/site-packages/anndata/compat/__init__.py:161: FutureWarning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities']. This is where adjacency matrices should go now.
  FutureWarning,
The batch_removal procedure needs two observation labels, "batch" and "cell_type". The data already has a "batch" column in .obs but no "cell_type", so we add one by copying the existing "celltype" column.
train.obs["cell_type"] = train.obs["celltype"].tolist()
sc.pp.neighbors(train)
sc.tl.umap(train)
sc.pl.umap(train, color=["batch", "cell_type"], wspace=.5, frameon=False)
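Before building the network, it can help to confirm that both label columns are actually present. The snippet below is a minimal sketch that uses a small pandas DataFrame as a stand-in for train.obs; the helper name missing_obs_keys and the toy values are hypothetical and not part of scgen or scanpy.

```python
import pandas as pd


def missing_obs_keys(obs, required=("batch", "cell_type")):
    """Return the required column names that are absent from an obs table."""
    return [key for key in required if key not in obs.columns]


# Toy stand-in for train.obs (hypothetical values, not the pancreas data).
obs = pd.DataFrame({
    "batch": ["b1", "b1", "b2"],
    "celltype": ["alpha", "beta", "alpha"],
})

# Before the copy, "cell_type" is reported as missing.
print(missing_obs_keys(obs))

# Same copy step as in the tutorial, applied to the toy table.
obs["cell_type"] = obs["celltype"].tolist()

# After the copy, nothing is missing.
print(missing_obs_keys(obs))
```

With a real AnnData object, the same helper could be called as missing_obs_keys(train.obs), since .obs is a pandas DataFrame.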
network = scgen.VAEArithKeras(x_dimension=train.shape[1], model_path="./models/batch")
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 2448)              0
_________________________________________________________________
encoder (Model)              (None, 100)               2765000
_________________________________________________________________
decoder (Model)              (None, 2448)              2687248
=================================================================
Total params: 5,452,248
Trainable params: 5,445,848
Non-trainable params: 6,400
_________________________________________________________________
We train the model for 100 epochs.
network.train(train_data=train, n_epochs=100)
Now we pass the uncorrected train adata and the network object to the batch_removal function, which returns an adata containing the batch-corrected data.
corrected_adata = scgen.batch_removal(network, train, batch_key="batch", cell_label_key="cell_type")
sc.pp.neighbors(corrected_adata)
sc.tl.umap(corrected_adata)
WARNING: You’re trying to run this on 2448 dimensions of `.X`, if you really want this, set `use_rep='X'`. Falling back to preprocessing with `sc.pp.pca` and default params.
sc.pl.umap(corrected_adata, color=["batch", "cell_type"], wspace=.5, frameon=False)
... storing 'celltype' as categorical
... storing 'sample' as categorical
... storing 'batch' as categorical
... storing 'louvain' as categorical
... storing 'cell_type' as categorical
... storing 'concat_batch' as categorical
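The internals of scgen.batch_removal are not shown in this tutorial. To give some intuition for why it needs both a batch label and a cell type label, here is a minimal NumPy sketch of one possible latent-space correction: within each cell type, every batch is shifted so that its mean matches the mean of the largest batch. This is an illustration under that assumption, not scgen's actual implementation; the function name shift_batches_to_reference and the toy data are hypothetical, and the array stands in for an encoder output that a real pipeline would still have to decode back to gene space.

```python
import numpy as np


def shift_batches_to_reference(latent, batches, cell_types):
    """Within each cell type, move every batch's mean onto the largest batch's mean."""
    latent = np.asarray(latent, dtype=float)
    batches = np.asarray(batches)
    cell_types = np.asarray(cell_types)
    corrected = latent.copy()
    for cell_type in np.unique(cell_types):
        in_type = cell_types == cell_type
        names, counts = np.unique(batches[in_type], return_counts=True)
        if len(names) < 2:
            continue  # only one batch contains this cell type; nothing to align
        reference = names[np.argmax(counts)]
        ref_mean = latent[in_type & (batches == reference)].mean(axis=0)
        for name in names:
            if name == reference:
                continue
            mask = in_type & (batches == name)
            # Shift this batch by the difference between the two batch means.
            corrected[mask] += ref_mean - latent[mask].mean(axis=0)
    return corrected


# Toy latent space: one cell type, two batches with an artificial offset.
rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(5.0, 1.0, (20, 2))])
batches = ["a"] * 30 + ["b"] * 20
cell_types = ["alpha"] * 50

corrected = shift_batches_to_reference(latent, batches, cell_types)
# After the shift, batch "b" has the same mean as the reference batch "a".
print(np.allclose(corrected[30:].mean(axis=0), corrected[:30].mean(axis=0)))
```

Because the shift is computed separately for each cell type, differences between cell types are left in place while differences between batches of the same cell type are removed, which matches what the UMAP plots above are meant to show.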
Note that the .raw of the original adata is saved to corrected_adata.raw, so you can use it for further analysis.
corrected_adata.raw
<anndata._core.raw.Raw at 0x7f9708ac3e10>
sc.pl.umap(corrected_adata, color=["INS", "cell_type"], wspace=.5, frameon=False, use_raw=True)