import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import scanpy as sc
import pandas as pd
import numpy as np
import trvae
sc.set_figure_params(dpi=200)
condition_key = "condition"
cell_type_key = "cell_label"
adata = sc.read("/home/mohsen/data/haber/haber_count.h5ad")
adata
AnnData object with n_obs × n_vars = 9842 × 15215 obs: 'batch', 'barcode', 'condition', 'cell_label'
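We could keep more genes (up to 7000, as in scGen), but to train the network quickly we will keep only the top 1000 highly variable genes. This can be done with the normalize_hvg function in the tl module of the trVAE package. The function accepts the following arguments:
- adata: AnnData object with raw counts in its .X attribute.
- target_sum: total count each cell is scaled to after normalization.
- size_factors: whether to normalize adata and put the total counts per cell in the "size_factors" column of adata.obs (True is recommended).
- scale_input: whether to scale the input (False is recommended).
- logtrans_input: whether to log-transform adata after normalization (True is recommended).
- n_top_genes: number of highly variable genes to keep after adata normalization.
adata = trvae.tl.normalize_hvg(adata,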
target_sum=1e4,
size_factors=True,
scale_input=False,
logtrans_input=True,
n_top_genes=1000)
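Conceptually, the target_sum, size_factors and logtrans_input options describe library-size normalization followed by a log(1 + x) transform. The sketch below assumes that this is what normalize_hvg does internally (in the spirit of scanpy's normalize_total followed by log1p). It is a plain-Python illustration on a tiny hypothetical count matrix, not trVAE's implementation.

```python
import math

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    Returns the transformed matrix and the per-cell total counts, which play
    the role of the "size_factors" column described above (assumption).
    """
    size_factors = [sum(row) for row in counts]
    normalized = []
    for row, total in zip(counts, size_factors):
        # Guard against empty cells to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized, size_factors

# Hypothetical toy data: 2 cells x 3 genes of raw counts.
toy_counts = [
    [10, 0, 90],
    [5, 5, 0],
]
X, sf = normalize_log1p(toy_counts)
print(sf)  # [100, 10]
```

After the transform, expm1 of every row sums back to target_sum, which is why two cells with very different sequencing depths become comparable.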
adata
AnnData object with n_obs × n_vars = 9842 × 1000 obs: 'batch', 'barcode', 'condition', 'cell_label', 'size_factors' var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm' uns: 'log1p'
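The means, dispersions and dispersions_norm columns added to adata.var come from a dispersion-based ranking of highly variable genes. As a simplified sketch (assumption: it ignores the mean-binned normalization that produces dispersions_norm in scanpy), genes can be ranked by their variance-to-mean ratio and the top n_top_genes kept. The gene names and values below are hypothetical.

```python
from statistics import mean, pvariance

def top_dispersed_genes(matrix, gene_names, n_top_genes):
    """Rank genes (columns) by dispersion = variance / mean and keep the top n."""
    dispersions = {}
    for j, name in enumerate(gene_names):
        column = [row[j] for row in matrix]
        m = mean(column)
        # Genes with zero mean carry no signal; give them the lowest rank.
        dispersions[name] = pvariance(column, m) / m if m > 0 else float("-inf")
    ranked = sorted(gene_names, key=lambda g: dispersions[g], reverse=True)
    return ranked[:n_top_genes], dispersions

genes = ["A", "B", "C"]
cells = [
    [1.0, 0.0, 2.0],
    [1.0, 4.0, 2.0],
    [1.0, 8.0, 3.0],
]
selected, disp = top_dispersed_genes(cells, genes, n_top_genes=2)
print(selected)  # ['B', 'C']
```

Gene A is constant across cells, so its dispersion is zero and it is dropped first.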
adata.X.min(), adata.X.max()
(0.0, 8.656907)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
WARNING: You’re trying to run this on 1000 dimensions of `.X`, if you really want this, set `use_rep='X'`. Falling back to preprocessing with `sc.pp.pca` and default params.
sc.pl.umap(adata, color=[condition_key, cell_type_key],
wspace=0.6,
frameon=False)
conditions = adata.obs[condition_key].unique().tolist()
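trVAE is a conditional autoencoder, so each cell's condition label has to become a numeric vector that is fed to the network alongside its expression profile. The snippet below is a hypothetical illustration of one-hot encoding labels against a list like conditions and appending the result to an expression vector; how trVAE encodes conditions internally is an assumption here and is not shown in this tutorial.

```python
def one_hot_conditions(labels, conditions):
    """Map each condition label to a one-hot vector ordered like `conditions`."""
    index = {name: i for i, name in enumerate(conditions)}
    encoded = []
    for label in labels:
        vector = [0] * len(conditions)
        vector[index[label]] = 1  # raises KeyError for an unknown condition
        encoded.append(vector)
    return encoded

def conditional_input(expression_row, condition_vector):
    """Concatenate gene expression and condition encoding into one input vector."""
    return list(expression_row) + list(condition_vector)

# Hypothetical condition names, for illustration only.
example_conditions = ["Control", "Hpoly.Day3", "Salmonella"]
print(one_hot_conditions(["Salmonella", "Control"], example_conditions))
# [[0, 0, 1], [1, 0, 0]]
```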
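Some of the network parameters:
- x_dimension: number of input features (genes), here adata.shape[1].
- architecture: sizes of the hidden layers.
- z_dimension: dimension of the latent space.
- gene_names: list of gene names (adata.var_names.tolist()).
- conditions: list of condition labels present in the data.
- model_path: directory where the model is saved.
- alpha: coefficient of the KL divergence term in the loss.
- beta: coefficient of the MMD term in the loss.
- eta: coefficient of the reconstruction term in the loss.
- loss_fn: reconstruction loss function (mse or sse).
- output_activation: activation function of the output layer (relu, leaky_relu, linear, ...).
network = trvae.models.trVAE(x_dimension=adata.shape[1],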
architecture=[128, 32],
z_dimension=10,
gene_names=adata.var_names.tolist(),
conditions=conditions,
model_path='./models/trVAE/haber/',
alpha=0.0001,
beta=50,
eta=100,
loss_fn='mse',
output_activation='relu')
trVAE' network has been successfully constructed! trVAE'snetwork has been successfully compiled!
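The coefficients alpha, beta and eta weight the terms of the training objective. The minimal sketch below assumes the objective has the form eta * reconstruction + alpha * KL + beta * MMD; the exact form inside trVAE may differ. The KL term is the closed-form divergence of a diagonal Gaussian from N(0, I), and the MMD term compares two groups of representations (for example cells from two conditions) with a Gaussian kernel. In trVAE the MMD penalty is applied to an internal layer of the network; here it is computed on plain lists of vectors for illustration.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error for one cell."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def kl_diag_gaussian(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def gaussian_kernel(a, b, sigma=1.0):
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two samples of vectors."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def total_loss(x, x_hat, mu, log_var, group_a, group_b, alpha=0.0001, beta=50, eta=100):
    # Assumed weighting; the defaults mirror the alpha/beta/eta values used above.
    return (eta * mse(x, x_hat)
            + alpha * kl_diag_gaussian(mu, log_var)
            + beta * mmd2(group_a, group_b))

# Perfect reconstruction, standard-normal posterior, identical groups: zero loss.
print(total_loss([1.0, 2.0], [1.0, 2.0], [0.0], [0.0], [[0.0, 0.0]], [[0.0, 0.0]]))  # 0.0
```

A large beta (50 here) pushes the network to make representations from different conditions indistinguishable, while the tiny alpha keeps the KL regularization weak.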
You can train trVAE with the train function, for example with the following parameters:
network.train(adata,
condition_key,
train_size=0.8,
n_epochs=300,
batch_size=1024,
early_stop_limit=15,
lr_reducer=10,
verbose=5,
save=True,
)
|████████████████████| 100.0% - loss: 5.9923 - mmd_loss: 0.9889 - reconstruction_loss: 5.0034 - val_loss: 5.6957 - val_mmd_loss: 0.9684 - val_reconstruction_loss: 4.727380 trVAE has been successfully saved in ./models/trVAE/.
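The early_stop_limit and lr_reducer arguments control when training stops and when the learning rate is lowered. Assuming they behave like Keras-style patience callbacks (stop, or reduce the learning rate, after that many epochs without improvement in validation loss), the logic can be sketched as below. The reduction factor, the starting learning rate and the sequence of validation losses are hypothetical.

```python
def simulate_training(val_losses, early_stop_limit=15, lr_reducer=10, lr=1e-3, factor=0.5):
    """Replay per-epoch validation losses with patience-based early stopping
    and learning-rate reduction. Returns (epochs_run, final_lr)."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            # Lower the learning rate every `lr_reducer` consecutive stagnant epochs.
            if epochs_since_best % lr_reducer == 0:
                lr *= factor
            # Stop once `early_stop_limit` epochs pass without improvement.
            if epochs_since_best >= early_stop_limit:
                return epoch, lr
    return len(val_losses), lr

# Validation loss improves for two epochs, then plateaus.
print(simulate_training([5.0, 4.0] + [4.5] * 20))  # (17, 0.0005)
```

With early_stop_limit=15 and lr_reducer=10, a plateau first halves the learning rate after 10 stagnant epochs and, if that does not help, ends training 5 epochs later.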
The latent space representation of the data can be computed with the get_latent function, which takes the AnnData object and the condition key:
latent_adata = network.get_latent(adata, condition_key)
latent_adata
AnnData object with n_obs × n_vars = 9842 × 10 obs: 'batch', 'barcode', 'condition', 'cell_label', 'size_factors'
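The returned object has one z_dimension-dimensional vector per cell (10 here, matching z_dimension=10). In a VAE the encoder outputs a mean and a log-variance for each latent dimension, and samples are drawn with the reparameterization trick; a deterministic embedding for visualization commonly uses the mean alone. Whether get_latent returns the mean or a sample is an assumption not confirmed by this tutorial. A minimal sketch:

```python
import math
import random

def reparameterize(mu, log_var, rng=None):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1); with rng=None return mu."""
    if rng is None:
        return list(mu)  # deterministic embedding: the posterior mean
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

# Hypothetical encoder outputs for one cell in a 2-dimensional latent space.
mu = [0.5, -1.0]
log_var = [0.0, math.log(4.0)]  # variances 1 and 4
print(reparameterize(mu, log_var))  # [0.5, -1.0]
sample = reparameterize(mu, log_var, rng=random.Random(0))
```

Using the mean makes the UMAP below reproducible across runs, whereas sampled latent vectors would jitter each time.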
sc.pp.neighbors(latent_adata)
sc.tl.umap(latent_adata)
sc.pl.umap(latent_adata, color=[condition_key, cell_type_key], wspace=0.5, frameon=False)