Preprocess, train, and upload a small intestine reference atlas

Created by Maren Büttner on 22.08.2020.

In this notebook, we build an intestine reference atlas and share the model weights.

In [1]:

import scarches as sca
import scanpy as sc
sc.settings.set_figure_params(dpi=100, frameon=False, facecolor='white')

Using TensorFlow backend.

Load raw adata

In [2]:

adata = sc.read("./data/adata_annotated_counts.h5ad")

In [3]:

adata

Out[3]:

AnnData object with n_obs × n_vars = 70459 × 16878
    obs: 'batch', 'doublet_score', 'log_counts', 'louvain', 'mt_frac', 'n_counts', 'n_genes', 'ribo_frac', 'sample', 'cell_type', 'cell_type_refined'
    var: 'gene_ids', 'is_ambient-CD1', 'is_ambient-CD2', 'is_ambient-CD3', 'is_ambient-Anika_Control_1', 'is_ambient-Anika_Control_2', 'is_ambient-Anika_Control_3_FVR', 'is_ambient-Anika_Control_4_FVR', 'is_ambient-Anika_FVR', 'is_ambient-Anika_Mutant_1', 'is_ambient-Anika_Mutant_2_new', 'is_ambient-Anika_Mutant_3_FVR', 'is_ambient-Anika_Mutant_4_FVR', 'is_ambient-Anika_FVR_enriched', 'is_ambient-Anika_wholecrypt', 'n_cells'

condition_key is the name of the column in adata.obs that stores the batch ID of each cell.

In [4]:

condition_key = "batch"

We normalize the data and select 1000 highly variable genes using sca.data.normalize_hvg. This function returns normalized data in adata.X and raw count values in adata.raw.X. It also stores the normalization factors in adata.obs["size_factors"].

In [5]:

adata = sca.data.normalize_hvg(adata,batch_key=condition_key,n_top_genes=1000)

Using 290 HVGs from full intersect set
Using 153 HVGs from n_batch-1 set
Using 145 HVGs from n_batch-2 set
Using 81 HVGs from n_batch-3 set
Using 60 HVGs from n_batch-4 set
Using 71 HVGs from n_batch-5 set
Using 57 HVGs from n_batch-6 set
Using 64 HVGs from n_batch-7 set
Using 77 HVGs from n_batch-8 set
Using 2 HVGs from n_batch-9 set
Using 1000 HVGs
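
The log above suggests that genes are taken first from the set flagged as highly variable in every batch, then from those flagged in all but one batch, and so on until 1000 genes are collected. The following dependency-free sketch illustrates that idea on toy data. The function name select_hvgs_across_batches, the toy gene lists, and the alphabetical tie-break inside a tier are assumptions made for illustration; this is not the code that sca.data.normalize_hvg runs.

```python
from collections import Counter


def select_hvgs_across_batches(hvgs_per_batch, n_top_genes):
    """Pick genes that are highly variable in as many batches as possible."""
    n_batches = len(hvgs_per_batch)
    # Count in how many batches each gene was flagged as highly variable.
    votes = Counter(g for genes in hvgs_per_batch.values() for g in set(genes))

    selected = []
    # Walk from "HVG in every batch" down to "HVG in a single batch".
    for n_hits in range(n_batches, 0, -1):
        room = n_top_genes - len(selected)
        if room <= 0:
            break
        # Hypothetical tie-break: alphabetical order within a tier.
        tier = sorted(g for g, v in votes.items() if v == n_hits)
        selected.extend(tier[:room])
    return selected


# Toy per-batch HVG lists (illustrative only).
toy = {
    "batch_A": ["Lgr5", "Olfm4", "Muc2", "Chga"],
    "batch_B": ["Lgr5", "Olfm4", "Muc2", "Lyz1"],
    "batch_C": ["Lgr5", "Olfm4", "Dclk1", "Lyz1"],
}
print(select_hvgs_across_batches(toy, n_top_genes=4))
```

With these toy lists, the two genes shared by all three batches are chosen first and the remaining slots are filled from the genes shared by two batches.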

In [6]:

adata

Out[6]:

AnnData object with n_obs × n_vars = 70459 × 1000
    obs: 'batch', 'doublet_score', 'log_counts', 'louvain', 'mt_frac', 'n_counts', 'n_genes', 'ribo_frac', 'sample', 'cell_type', 'cell_type_refined', 'size_factors'
    var: 'gene_ids', 'is_ambient-CD1', 'is_ambient-CD2', 'is_ambient-CD3', 'is_ambient-Anika_Control_1', 'is_ambient-Anika_Control_2', 'is_ambient-Anika_Control_3_FVR', 'is_ambient-Anika_Control_4_FVR', 'is_ambient-Anika_FVR', 'is_ambient-Anika_Mutant_1', 'is_ambient-Anika_Mutant_2_new', 'is_ambient-Anika_Mutant_3_FVR', 'is_ambient-Anika_Mutant_4_FVR', 'is_ambient-Anika_FVR_enriched', 'is_ambient-Anika_wholecrypt', 'n_cells', 'mean', 'std'
    uns: 'log1p'
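
The size_factors column that now appears in adata.obs holds one normalization factor per cell. As a minimal sketch of one common convention, the example below assumes a cell's size factor is its total count divided by the median total count across cells, followed by a log(1 + x) transform. Whether normalize_hvg uses exactly this definition is an assumption, and plain Python lists stand in for the AnnData object.

```python
import math
from statistics import median


def size_factors(counts):
    """Per-cell size factor: total counts divided by the median total."""
    totals = [sum(cell) for cell in counts]
    med = median(totals)
    return [t / med for t in totals]


def log_normalize(counts):
    """Divide each cell by its size factor, then apply log(1 + x)."""
    factors = size_factors(counts)
    return [[math.log1p(x / f) for x in cell] for cell, f in zip(counts, factors)]


# Three toy cells (rows) with three genes (columns).
raw = [
    [4, 0, 6],     # 10 counts in total
    [10, 5, 5],    # 20 counts in total
    [20, 10, 10],  # 40 counts in total
]
print(size_factors(raw))
print(log_normalize(raw))
```

The second and third toy cells have the same composition at different sequencing depths, so they become identical after normalization.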

Create scArches network from scratch

Some parameters are worth mentioning here:

task_name: name of the task (i.e. the dataset) on which you are going to train scArches.
x_dimension: number of dimensions in expression space.
z_dimension: number of dimensions in the latent space of scArches.
conditions: list of unique conditions in your data (batches, datasets, or domains); see above to get an idea.
gene_names: list of gene names used as scArches' input.
model_path: path to save the trained scArches model and its configuration files.
alpha: KL divergence coefficient for the VAE. A bigger alpha (0.1 <= alpha <= 1) gives better mixing; a small alpha (alpha <= 0.001) gives good mixing while keeping cell types distinct.
loss_fn: loss function to be used in scArches. Can be one of mse, sse, nb, or zinb. Please NOTE that if you use the nb or zinb loss function, we suggest setting the beta hyperparameter to zero.

Note: nb (negative binomial) and zinb (zero-inflated nb) require raw count data in adata.raw.X and normalized, log-transformed values in adata.X. The data must also contain adata.obs["size_factors"], the normalization factors used to normalize each cell's count values. We suggest using the scarches.data.normalize_hvg() function (see here) or scanpy's scanpy.pp.normalize_total(adata,...,key_added="size_factors") (see here).

HINT: we recommend using the nb loss; if the results are not satisfying, consider using sse instead.
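
To make the nb option more concrete, the sketch below evaluates a negative binomial log-likelihood for observed counts given a predicted mean mu and an inverse dispersion theta. This mean/dispersion parameterization is a common choice and is assumed here; the loss that scArches actually uses may differ in details, including how size factors scale the mean. The names nb_log_likelihood and nb_loss and all numeric values are illustrative.

```python
import math


def nb_log_likelihood(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution
    with mean mu (> 0) and inverse dispersion theta (> 0)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def nb_loss(counts, means, theta):
    """Mean negative log-likelihood over paired observed counts and predicted means."""
    total = sum(nb_log_likelihood(x, m, theta) for x, m in zip(counts, means))
    return -total / len(counts)


# Assumed use of a size factor: scale the predicted mean of a cell by it.
size_factor = 2.0
predicted_mean = 1.5 * size_factor
print(nb_loss([0, 3, 7], [predicted_mean] * 3, theta=2.0))
```

Because the likelihood is defined on integer counts, this kind of loss is evaluated against the raw values rather than the log-transformed ones, which is why the raw counts must be kept alongside the normalized data.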

In [7]:

network = sca.models.scArches(task_name='intestine_atlas',
                              x_dimension=adata.shape[1], 
                              z_dimension=10,
                              architecture=[128, 128],
                              gene_names=adata.var_names.tolist(),
                              conditions=adata.obs[condition_key].unique().tolist(),
                              alpha=0.001,
                              loss_fn='nb',
                              model_path="./models/scArches/",
                              )

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:186: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

scArchesNB's network has been successfully constructed!
WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/scArches-0.1.2-py3.7.egg/scarches/models/_losses.py:94: The name tf.lgamma is deprecated. Please use tf.math.lgamma instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/scArches-0.1.2-py3.7.egg/scarches/models/_losses.py:95: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/scArches-0.1.2-py3.7.egg/scarches/models/_utils.py:88: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/scArches-0.1.2-py3.7.egg/scarches/models/_utils.py:88: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
scArchesNB's network has been successfully compiled!

Train

You can train scArches with the train function, which takes the following parameters:

adata: Annotated dataset used for training and evaluating scArches.
condition_key: name of the column in obs matrix in adata which contains the conditions for each sample.
n_epochs: number of epochs used to train scArches.
batch_size: number of samples in each mini-batch used to optimize scArches.
save: whether to save scArches' model and configs after training phase or not.
retrain: if False and scArches' pretrained model exists in model_path, will restore scArches' weights. Otherwise will train and validate scArches on adata.

In [8]:

network.train(adata,
              condition_key=condition_key,
              n_epochs=30,
              batch_size=128, 
              save=True, 
              retrain=True)

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /Users/maren.buettner/anaconda3/envs/scarches/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

 |████████████████████| 100.0%  - loss: 2.7433 - kl_loss: 13.9198 - recon_loss: 2.7294 - val_loss: 0.7818 - val_kl_loss: 3.8901 - val_recon_loss: 0.7779

scArchesNB has been successfully saved in ./models/scArches/intestine_atlas.
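
The final progress line is consistent with the total loss being the reconstruction loss plus alpha times the KL term, with alpha=0.001 as set when the network was created. The sketch below simply checks that arithmetic against the logged values; the decomposition is inferred from the numbers shown, not taken from the scArches source.

```python
ALPHA = 0.001  # value passed as alpha when the network was constructed


def total_loss(recon_loss, kl_loss, alpha=ALPHA):
    """Reconstruction loss plus the alpha-weighted KL term."""
    return recon_loss + alpha * kl_loss


# Values copied from the training progress line.
train_total = total_loss(recon_loss=2.7294, kl_loss=13.9198)
val_total = total_loss(recon_loss=0.7779, kl_loss=3.8901)
print(round(train_total, 4), round(val_total, 4))
```

Rounded to four decimals, both results match the logged loss (2.7433) and val_loss (0.7818), which shows how little the KL term contributes at this alpha.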

UMAP visualization of latent space

The latent space representation of the reference data can be obtained using the get_latent function. This function has the following parameters:

adata: Annotated dataset to be transformed to latent space
condition_key: Name of the column in obs matrix in adata which contains the study for each sample.

In [9]:

latent_adata = network.get_latent(adata, condition_key)

In [10]:

sc.pp.neighbors(latent_adata)
sc.tl.umap(latent_adata)
sc.pl.umap(latent_adata, color=[condition_key, "cell_type"], 
           frameon=False, wspace=0.6)

To upload the model you need a Zenodo access TOKEN. You can get one by signing up on the Zenodo website and creating a token in the settings. Just follow these steps to create a new TOKEN:

Sign in/Register in Zenodo
Go to Applications page.
Click on new_token in Personal access tokens panel.
Give it access for deposit:actions and deposit:write.

NOTE: Zenodo shows the created TOKEN only once, so be careful to store it safely. If you lose your TOKEN, you have to create a new one.

In [11]:

ACCESS_TOKEN = ""
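
Because the token grants write access to your Zenodo account, you may prefer not to paste it into the notebook. One option, sketched below, is to read it from an environment variable. The variable name ZENODO_ACCESS_TOKEN and the helper load_zenodo_token are hypothetical and not part of scArches.

```python
import os


def load_zenodo_token(env_var="ZENODO_ACCESS_TOKEN"):
    """Return the token stored in the given environment variable.

    The variable name is a hypothetical convention chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Zenodo token."
        )
    return token
```

With the variable set in your shell, ACCESS_TOKEN = load_zenodo_token() can replace the hard-coded assignment.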

1. Create a deposition in your Zenodo account

You can use the wrapper functions in the zenodo module of the scArches package to interact with your depositions and uploaded files in Zenodo. In Zenodo, a deposition is a cloud space for a publication, poster, etc., which can contain multiple files.

To create a deposition in Zenodo, call our create_deposition function with the following parameters:

access_token: Your access token
upload_type: Type of the deposition; has to be one of the types defined here.
title: Title of the deposition.
description: Description of the deposition.
creators: List of creators of this deposition. Each item in the list has to be in the following form:

{
    "name": "LASTNAME, FIRSTNAME", (Has to be in this format)
    "affiliation": "AFFILIATION", (Optional)
    "orcid": "ORCID" (Optional, has to be a valid ORCID)
}
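
The creator format above can be assembled and checked before calling create_deposition. The sketch below builds one entry and validates an optional ORCID by its layout and check digit (ISO 7064 MOD 11-2, the scheme ORCID identifiers use). The helpers make_creator and orcid_checksum_ok are hypothetical and not part of scArches, and Zenodo may apply further validation of its own.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid):
    """Check the layout and the final check character of an ORCID."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def make_creator(last_name, first_name, affiliation=None, orcid=None):
    """Build one creator entry in the "LASTNAME, FIRSTNAME" format."""
    creator = {"name": f"{last_name}, {first_name}"}
    if affiliation:
        creator["affiliation"] = affiliation
    if orcid is not None:
        if not orcid_checksum_ok(orcid):
            raise ValueError(f"Not a valid ORCID: {orcid}")
        creator["orcid"] = orcid
    return creator


creators = [make_creator("Büttner", "Maren", affiliation="helmholtz center")]
print(creators)
```

The resulting list has the same shape as the creators argument used in the next cell.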

In [12]:

deposition_id = sca.zenodo.create_deposition(ACCESS_TOKEN, 
                                             upload_type="other", 
                                             title='scArches-intestine',
                                             description='pre-trained scArches on intestine',                                            
                                             creators=[
                                                 {"name": "Büttner, Maren", "affiliation": "helmholtz center"},
                                             ],
                                             )

New Deposition has been successfully created!

2. Upload your model to your deposition

After creating a deposition, you can upload your pre-trained scArches model using the upload_model function in the zenodo module. This function accepts the following parameters:

model: Instance of scArches' class which is trained on your task
deposition_id: ID of the deposition you want to upload the model in.
access_token: Your TOKEN.

The function returns the generated download_link, which you can use yourself and provide to others.

In [13]:

download_link = sca.zenodo.upload_model(network, 
                                        deposition_id=deposition_id, 
                                        access_token=ACCESS_TOKEN)

Model has been successfully uploaded

3. Publish the created deposition

In [14]:

sca.zenodo.publish_deposition(deposition_id, ACCESS_TOKEN)

Deposition with id = 3995049 has been successfully published!

In [15]:

download_link

Out[15]:

'https://zenodo.org/record/3995049/files/scNet-intestine_atlas.zip?download=1'
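
Anyone who receives this link can fetch the archive with standard tools. The sketch below, using only the Python standard library, derives the archive name from the link and defines a helper that downloads and unpacks it. The helper names and the target directory are hypothetical, and how the unpacked files are loaded back into scArches is not covered here.

```python
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def archive_name(download_link):
    """File name component of a download link, without the query string."""
    return PurePosixPath(urlparse(download_link).path).name


def fetch_and_unpack(download_link, target_dir="./models/downloaded/"):
    """Download the zip archive behind download_link and extract it.

    target_dir is a hypothetical location chosen for this sketch.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive_path = target / archive_name(download_link)
    urllib.request.urlretrieve(download_link, str(archive_path))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


link = "https://zenodo.org/record/3995049/files/scNet-intestine_atlas.zip?download=1"
print(archive_name(link))
```

Calling fetch_and_unpack(link) requires network access, so it is defined here but not executed.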

4. Enter model link in our database

Please fill in this form to enter the model in our database.