Created by Maren Büttner on 22.08.2020.
In this notebook, we build a reference intestine atlas and share the model weights.
import scarches as sca
import scanpy as sc
sc.settings.set_figure_params(dpi=100, frameon=False, facecolor='white')
Using TensorFlow backend.
adata = sc.read("./data/adata_annotated_counts.h5ad")
adata
AnnData object with n_obs × n_vars = 70459 × 16878
    obs: 'batch', 'doublet_score', 'log_counts', 'louvain', 'mt_frac', 'n_counts', 'n_genes', 'ribo_frac', 'sample', 'cell_type', 'cell_type_refined'
    var: 'gene_ids', 'is_ambient-CD1', 'is_ambient-CD2', 'is_ambient-CD3', 'is_ambient-Anika_Control_1', 'is_ambient-Anika_Control_2', 'is_ambient-Anika_Control_3_FVR', 'is_ambient-Anika_Control_4_FVR', 'is_ambient-Anika_FVR', 'is_ambient-Anika_Mutant_1', 'is_ambient-Anika_Mutant_2_new', 'is_ambient-Anika_Mutant_3_FVR', 'is_ambient-Anika_Mutant_4_FVR', 'is_ambient-Anika_FVR_enriched', 'is_ambient-Anika_wholecrypt', 'n_cells'
condition_key is the name of the column in adata.obs that stores the batch ID.
condition_key = "batch"
We normalize the data and select 1000 highly variable genes using sca.data.normalize_hvg. This function returns the normalized data in adata.X and the raw count values in adata.raw.X. It also stores the normalization factors in adata.obs["size_factors"].
adata = sca.data.normalize_hvg(adata,batch_key=condition_key,n_top_genes=1000)
Using 290 HVGs from full intersect set
Using 153 HVGs from n_batch-1 set
Using 145 HVGs from n_batch-2 set
Using 81 HVGs from n_batch-3 set
Using 60 HVGs from n_batch-4 set
Using 71 HVGs from n_batch-5 set
Using 57 HVGs from n_batch-6 set
Using 64 HVGs from n_batch-7 set
Using 77 HVGs from n_batch-8 set
Using 2 HVGs from n_batch-9 set
Using 1000 HVGs
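The log above suggests a tiered selection: genes that are highly variable in every batch are taken first, then genes that are highly variable in all but one batch, and so on until 1000 genes are collected. The following is a minimal sketch of that idea, not the scArches implementation. The function name select_hvgs_across_batches, the toy gene lists, and the alphabetical tie-breaking inside a tier are assumptions made for illustration.

```python
from collections import Counter


def select_hvgs_across_batches(hvgs_per_batch, n_top_genes):
    """Pick genes that are highly variable in as many batches as possible.

    hvgs_per_batch maps a batch name to the genes flagged as highly
    variable in that batch. Ties inside a tier are broken alphabetically,
    which is an assumption of this sketch.
    """
    n_batches = len(hvgs_per_batch)
    # Count in how many batches each gene was flagged as highly variable.
    counts = Counter(
        gene for genes in hvgs_per_batch.values() for gene in set(genes)
    )
    selected = []
    # Walk the tiers: all batches, then n_batch-1, n_batch-2, ...
    for tier in range(n_batches, 0, -1):
        room = n_top_genes - len(selected)
        if room <= 0:
            break
        tier_genes = sorted(g for g, c in counts.items() if c == tier)
        selected.extend(tier_genes[:room])
    return selected


# Toy example with three batches and a budget of four genes.
toy = {
    "b1": ["Lgr5", "Olfm4", "Muc2", "Chga"],
    "b2": ["Lgr5", "Olfm4", "Muc2", "Lyz1"],
    "b3": ["Lgr5", "Olfm4", "Defa5", "Chga"],
}
hvgs = select_hvgs_across_batches(toy, n_top_genes=4)
```

In the toy data, Lgr5 and Olfm4 are shared by all three batches and are taken first; the remaining two slots are filled from the genes shared by two batches.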
adata
AnnData object with n_obs × n_vars = 70459 × 1000
    obs: 'batch', 'doublet_score', 'log_counts', 'louvain', 'mt_frac', 'n_counts', 'n_genes', 'ribo_frac', 'sample', 'cell_type', 'cell_type_refined', 'size_factors'
    var: 'gene_ids', 'is_ambient-CD1', 'is_ambient-CD2', 'is_ambient-CD3', 'is_ambient-Anika_Control_1', 'is_ambient-Anika_Control_2', 'is_ambient-Anika_Control_3_FVR', 'is_ambient-Anika_Control_4_FVR', 'is_ambient-Anika_FVR', 'is_ambient-Anika_Mutant_1', 'is_ambient-Anika_Mutant_2_new', 'is_ambient-Anika_Mutant_3_FVR', 'is_ambient-Anika_Mutant_4_FVR', 'is_ambient-Anika_FVR_enriched', 'is_ambient-Anika_wholecrypt', 'n_cells', 'mean', 'std'
    uns: 'log1p'
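To make the role of the new size_factors column concrete, here is a small sketch of size-factor normalization followed by a log1p transform. It assumes one common convention (size factor = a cell's total count divided by the mean total over all cells); the exact formula used inside sca.data.normalize_hvg may differ, and normalize_counts is a hypothetical helper, not part of scArches.

```python
import math


def normalize_counts(counts):
    """Normalize raw counts per cell and log-transform them.

    counts is a list of cells, each a list of raw gene counts.
    Every cell is assumed to have a non-zero total count.
    """
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    # Size factor: sequencing depth of a cell relative to the average cell.
    size_factors = [total / mean_total for total in totals]
    normalized = [
        [math.log1p(x / sf) for x in cell]
        for cell, sf in zip(counts, size_factors)
    ]
    return normalized, size_factors


# Cells 0 and 1 have the same composition but a 10x difference in depth.
raw = [[10, 0, 30], [1, 0, 3], [20, 20, 20]]
normalized, size_factors = normalize_counts(raw)
```

After normalization the first two cells have identical profiles, which is the point of dividing by the size factor: differences in sequencing depth no longer dominate the expression values.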
Some parameters are worth mentioning here:
loss_fn: the reconstruction loss, one of mse, sse, nb, or zinb. Please NOTE that if you use the nb or zinb loss function, we suggest setting the beta hyperparameter to zero, which is the best configuration for scArches to train on your task.
NOTE: nb (negative binomial) and zinb (zero-inflated negative binomial) require raw count data in adata.raw.X and normalized, log-transformed values in adata.X. The data must also contain adata.obs["size_factors"], the normalization factors used to normalize the count values of each cell. We suggest using the scarches.data.normalize_hvg() function or scanpy's scanpy.pp.normalize_total(adata,...,key_added="size_factors").
HINT: we recommend the nb loss. If the results are not satisfying, consider using sse.
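To illustrate why the nb loss needs raw counts and size factors, here is a sketch of a negative binomial negative log-likelihood in the standard mean/inverse-dispersion parameterization. It is not the scArches loss code. Scaling the predicted mean by the cell's size factor is an assumption about how such losses are commonly set up, and nb_log_prob and nb_nll are hypothetical names.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution
    with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def nb_nll(raw_counts, predicted_means, size_factor, theta):
    """Negative log-likelihood of one cell's raw counts.

    The predicted means are scaled by the cell's size factor so that they
    are comparable to raw (unnormalized) counts. This is an assumption.
    """
    return -sum(
        nb_log_prob(x, size_factor * mean, theta)
        for x, mean in zip(raw_counts, predicted_means)
    )


cell_counts = [0, 3, 12]
# Means close to the observed counts give a lower loss than mismatched means.
nll_good = nb_nll(cell_counts, [0.5, 3.0, 11.0], size_factor=1.0, theta=2.0)
nll_bad = nb_nll(cell_counts, [11.0, 0.5, 3.0], size_factor=1.0, theta=2.0)
```

Because the likelihood is defined over integer counts, feeding it log-transformed values would not be meaningful, which is why the raw counts are kept in adata.raw.X.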
network = sca.models.scArches(task_name='intestine_atlas',
x_dimension=adata.shape[1],
z_dimension=10,
architecture=[128, 128],
gene_names=adata.var_names.tolist(),
conditions=adata.obs[condition_key].unique().tolist(),
alpha=0.001,
loss_fn='nb',
model_path="./models/scArches/",
)
scArchesNB's network has been successfully constructed!
scArchesNB's network has been successfully compiled!
You can train scArches with the train function, which takes the following parameters:
condition_key: the name of the column in adata.obs that contains the condition for each sample.
retrain: if False and a pretrained scArches model exists in model_path, the weights are restored. Otherwise scArches is trained and validated on adata.
network.train(adata,
condition_key=condition_key,
n_epochs=30,
batch_size=128,
save=True,
retrain=True)
|████████████████████| 100.0% - loss: 2.7433 - kl_loss: 13.9198 - recon_loss: 2.7294 - val_loss: 0.7818 - val_kl_loss: 3.8901 - val_recon_loss: 0.7779
scArchesNB has been successfully saved in ./models/scArches/intestine_atlas.
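The logged values are consistent with a total loss of the form recon_loss + alpha * kl_loss, using the alpha=0.001 passed to the model constructor. The short check below reproduces the reported totals from the logged terms. That the terms are combined exactly this way is an inference from the numbers, not something the log states, and total_loss is a hypothetical helper.

```python
ALPHA = 0.001  # value of alpha passed to the model constructor


def total_loss(recon_loss, kl_loss, alpha=ALPHA):
    # Assumed combination: reconstruction term plus alpha-weighted KL term.
    return recon_loss + alpha * kl_loss


# Terms taken from the final line of the training log.
train_total = total_loss(recon_loss=2.7294, kl_loss=13.9198)
val_total = total_loss(recon_loss=0.7779, kl_loss=3.8901)
print(round(train_total, 4), round(val_total, 4))  # prints 2.7433 0.7818
```

With such a small alpha the KL term contributes little to the total, so the reported loss tracks the reconstruction loss closely.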
The latent space representation of the reference data can be obtained with the get_latent function, which takes the following parameters:
condition_key: the name of the column in adata.obs that contains the study for each sample.
latent_adata = network.get_latent(adata, condition_key)
sc.pp.neighbors(latent_adata)
sc.tl.umap(latent_adata)
sc.pl.umap(latent_adata, color=[condition_key, "cell_type"],
frameon=False, wspace=0.6)
You can get a TOKEN by signing up on the Zenodo website and creating an app in the settings. When creating a new TOKEN, make sure to select the scopes deposit:actions and deposit:write.
NOTE: Zenodo shows the created TOKEN only once, so store it carefully. If you lose your TOKEN, you have to create a new one.
ACCESS_TOKEN = ""
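Since the TOKEN grants write access to your depositions, you may prefer not to paste it into a notebook that will be shared. One option, sketched here, is to read it from an environment variable. The variable name ZENODO_ACCESS_TOKEN and the helper load_zenodo_token are assumptions for illustration, not part of scArches or Zenodo.

```python
import os


def load_zenodo_token(env_var="ZENODO_ACCESS_TOKEN"):
    """Read the Zenodo token from an environment variable.

    Raises a RuntimeError with a clear message if the variable is unset
    or empty, instead of silently sending an empty token.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Zenodo token."
        )
    return token
```

With the variable set in the shell that starts the notebook, ACCESS_TOKEN = load_zenodo_token() can replace the hard-coded string.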
You can use the wrapper functions in the zenodo module of the scArches package to interact with your depositions and uploaded files in Zenodo. In Zenodo, a deposition is a cloud space for a publication, poster, etc., which can contain multiple files.
To create a deposition in Zenodo, you can call our create_deposition function. Each entry of its creators parameter is a dictionary of the following form:
{
"name": "LASTNAME, FIRSTNAME", (Has to be in this format)
"affiliation": "AFFILIATION", (Optional)
"orcid": "ORCID" (Optional, has to be a valid ORCID)
}
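The format constraints above can be checked before calling Zenodo. The sketch below validates the name format and, if present, the ORCID check digit (ISO 7064 MOD 11-2, as used by ORCID iDs). The helper names validate_creator and orcid_checksum_ok are hypothetical, and Zenodo's own server-side validation may be stricter or differ in detail.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid):
    """Check the layout and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def validate_creator(creator):
    """Return a list of problems found in one creators entry (empty if none)."""
    problems = []
    # "LASTNAME, FIRSTNAME": non-empty text on both sides of a single comma.
    parts = [part.strip() for part in creator.get("name", "").split(",")]
    if len(parts) != 2 or not all(parts):
        problems.append("name must look like 'LASTNAME, FIRSTNAME'")
    orcid = creator.get("orcid")  # optional field
    if orcid is not None and not orcid_checksum_ok(orcid):
        problems.append("orcid is not a valid ORCID iD")
    return problems


creator = {"name": "Büttner, Maren", "affiliation": "helmholtz center"}
problems = validate_creator(creator)
```

An empty list means the entry passes these local checks; affiliation is optional and is therefore not validated.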
deposition_id = sca.zenodo.create_deposition(ACCESS_TOKEN,
upload_type="other",
title='scArches-intestine',
description='pre-trained scArches on intestine',
creators=[
{"name": "Büttner, Maren", "affiliation": "helmholtz center"},
],
)
New Deposition has been successfully created!
After creating a deposition, you can upload your pre-trained scArches model using the upload_model function in the zenodo module. This function accepts the model, the deposition_id, and the access_token as parameters.
The function returns the generated download_link, which you can use yourself and provide to others.
download_link = sca.zenodo.upload_model(network,
deposition_id=deposition_id,
access_token=ACCESS_TOKEN)
Model has been successfully uploaded
sca.zenodo.publish_deposition(deposition_id, ACCESS_TOKEN)
Deposition with id = 3995049 has been successfully published!
download_link
'https://zenodo.org/record/3995049/files/scNet-intestine_atlas.zip?download=1'
Please fill in this form to enter the model in our database.