This notebook demonstrates how to train a Monet model on an scRNA-Seq dataset. Monet models are encapsulated by MonetModel objects and are described in the Monet paper (Wagner, 2020). After training the model on one PBMC dataset, we will see how it can serve as the basis for t-SNE analyses of arbitrary PBMC datasets. More generally, Monet models are useful for analyses that integrate data from multiple scRNA-Seq datasets from the same tissue type, which will be demonstrated in the following tutorials.
# change notebook width and font
from IPython.display import HTML, display
display(HTML("""<style>
/* source: http://stackoverflow.com/a/24207353 */
.container { width:95% !important; }
div.prompt, div.CodeMirror pre, div.output_area pre { font-family:'Hack', monospace; font-size: 10.5pt; }
</style>"""))
from monet import util
_LOGGER = util.configure_logger()
# the following is to allow embedding of plotly figures
from plotly.offline import init_notebook_mode
import plotly.graph_objs as go
init_notebook_mode(connected=True)
Here, we train the model. The first step consists of performing molecular cross-validation (MCV; Batson et al., 2019) to infer the dimensionality of the data. Monet performs a grid search using 5-fold MCV, which is somewhat time-consuming. As the log output below shows, this step took approximately 20 minutes for this dataset of 10,681 cells. The second step is a nearest-neighbor aggregation step, which is quite fast (the log reports 22.2 s for the entire second phase).
After training is complete, we save the trained model to disk using the save_pickle() method.
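To make the idea behind the MCV step more concrete, the following is a minimal sketch of how UMI counts can be split into a training and a validation matrix by binomial sampling of the individual transcripts. It is not Monet's implementation: the function name `mcv_split`, the `val_frac` parameter, and the simulated count matrix are assumptions made purely for illustration.

```python
import numpy as np


def mcv_split(counts, val_frac=0.1, seed=0):
    """Split a UMI count matrix into training and validation counts.

    Hypothetical helper (not part of Monet): each transcript is assigned
    to the validation set independently with probability `val_frac`.
    """
    rng = np.random.default_rng(seed)
    # Binomial thinning: for each entry, draw how many of its
    # transcripts end up in the validation matrix.
    val = rng.binomial(counts, val_frac)
    # The remaining transcripts form the training matrix.
    train = counts - val
    return train, val


# Simulated counts (cells x genes), for illustration only.
demo_rng = np.random.default_rng(42)
demo_counts = demo_rng.poisson(2.0, size=(100, 50))

demo_train, demo_val = mcv_split(demo_counts, val_frac=0.1)
print(demo_train.sum(), demo_val.sum())
```

In an MCV-style procedure, a model with a given number of PCs would be fit to the training matrix and scored on the validation matrix, and the number of PCs with the best validation score would be selected.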
import gc
from monet import ExpMatrix
from monet import MonetModel
expression_file = 'data/v3_human_pbmc_10k_expression.npz'
monet_model_file = 'output/v3_human_pbmc_10k_monet_model.pickle'
matrix = ExpMatrix.load_npz(expression_file)
# initialize and train the model
monet_model = MonetModel()
monet_model.fit(matrix)
# save the model to disk
monet_model.save_pickle(monet_model_file)
# free up memory
del matrix; gc.collect()
[2020-06-17 10:35:53] (monet.core.exp_matrix) INFO: Loaded expression matrix with 10681 cells and 16319 genes -- .npz format, 36.7 MB (hash: f9d7fac20f4de6184ff55388c267699a). [2020-06-17 10:35:53] (monet.latent.monet_model) INFO: Beginning of Phase I (Estimate dimensionality)... [2020-06-17 10:35:53] (monet.latent.monet_model) INFO: Using molecular cross-validation to determine the number of PCs... [2020-06-17 10:35:53] (monet.latent.monet_model) INFO: Testing coarse grid of num_component values... [2020-06-17 10:35:53] (monet.latent.monet_model) INFO: Testing grid of 10 num_component values... [2020-06-17 10:35:53] (monet.latent.monet_model) INFO: Now processing split 1/5... [2020-06-17 10:35:53] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:36:00] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:36:07] (monet.latent.pca_model) INFO: The PCA took 2.3 s. [2020-06-17 10:36:08] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:36:14] (monet.latent.util) INFO: Testing value 1/10 (10 PCs)... [2020-06-17 10:36:25] (monet.latent.util) INFO: Testing value 2/10 (20 PCs)... [2020-06-17 10:36:35] (monet.latent.util) INFO: Testing value 3/10 (30 PCs)... [2020-06-17 10:36:46] (monet.latent.util) INFO: Testing value 4/10 (40 PCs)... [2020-06-17 10:36:56] (monet.latent.util) INFO: Testing value 5/10 (50 PCs)... [2020-06-17 10:37:07] (monet.latent.util) INFO: Testing value 6/10 (60 PCs)... [2020-06-17 10:37:18] (monet.latent.util) INFO: Testing value 7/10 (70 PCs)... [2020-06-17 10:37:28] (monet.latent.util) INFO: Testing value 8/10 (80 PCs)... [2020-06-17 10:37:39] (monet.latent.util) INFO: Testing value 9/10 (90 PCs)... [2020-06-17 10:37:50] (monet.latent.util) INFO: Testing value 10/10 (100 PCs)... [2020-06-17 10:38:00] (monet.latent.monet_model) INFO: Now processing split 2/5... 
[2020-06-17 10:38:00] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:38:08] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:38:16] (monet.latent.pca_model) INFO: The PCA took 2.3 s. [2020-06-17 10:38:16] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:38:23] (monet.latent.util) INFO: Testing value 1/10 (10 PCs)... [2020-06-17 10:38:33] (monet.latent.util) INFO: Testing value 2/10 (20 PCs)... [2020-06-17 10:38:44] (monet.latent.util) INFO: Testing value 3/10 (30 PCs)... [2020-06-17 10:38:54] (monet.latent.util) INFO: Testing value 4/10 (40 PCs)... [2020-06-17 10:39:05] (monet.latent.util) INFO: Testing value 5/10 (50 PCs)... [2020-06-17 10:39:15] (monet.latent.util) INFO: Testing value 6/10 (60 PCs)... [2020-06-17 10:39:26] (monet.latent.util) INFO: Testing value 7/10 (70 PCs)... [2020-06-17 10:39:37] (monet.latent.util) INFO: Testing value 8/10 (80 PCs)... [2020-06-17 10:39:48] (monet.latent.util) INFO: Testing value 9/10 (90 PCs)... [2020-06-17 10:39:58] (monet.latent.util) INFO: Testing value 10/10 (100 PCs)... [2020-06-17 10:40:09] (monet.latent.monet_model) INFO: Now processing split 3/5... [2020-06-17 10:40:09] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:40:16] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:40:24] (monet.latent.pca_model) INFO: The PCA took 2.4 s. [2020-06-17 10:40:24] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:40:31] (monet.latent.util) INFO: Testing value 1/10 (10 PCs)... [2020-06-17 10:40:41] (monet.latent.util) INFO: Testing value 2/10 (20 PCs)... [2020-06-17 10:40:52] (monet.latent.util) INFO: Testing value 3/10 (30 PCs)... [2020-06-17 10:41:02] (monet.latent.util) INFO: Testing value 4/10 (40 PCs)... 
[2020-06-17 10:41:13] (monet.latent.util) INFO: Testing value 5/10 (50 PCs)... [2020-06-17 10:41:24] (monet.latent.util) INFO: Testing value 6/10 (60 PCs)... [2020-06-17 10:41:35] (monet.latent.util) INFO: Testing value 7/10 (70 PCs)... [2020-06-17 10:41:46] (monet.latent.util) INFO: Testing value 8/10 (80 PCs)... [2020-06-17 10:41:57] (monet.latent.util) INFO: Testing value 9/10 (90 PCs)... [2020-06-17 10:42:07] (monet.latent.util) INFO: Testing value 10/10 (100 PCs)... [2020-06-17 10:42:18] (monet.latent.monet_model) INFO: Now processing split 4/5... [2020-06-17 10:42:18] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:42:26] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:42:34] (monet.latent.pca_model) INFO: The PCA took 2.5 s. [2020-06-17 10:42:34] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:42:41] (monet.latent.util) INFO: Testing value 1/10 (10 PCs)... [2020-06-17 10:42:52] (monet.latent.util) INFO: Testing value 2/10 (20 PCs)... [2020-06-17 10:43:02] (monet.latent.util) INFO: Testing value 3/10 (30 PCs)... [2020-06-17 10:43:13] (monet.latent.util) INFO: Testing value 4/10 (40 PCs)... [2020-06-17 10:43:24] (monet.latent.util) INFO: Testing value 5/10 (50 PCs)... [2020-06-17 10:43:35] (monet.latent.util) INFO: Testing value 6/10 (60 PCs)... [2020-06-17 10:43:45] (monet.latent.util) INFO: Testing value 7/10 (70 PCs)... [2020-06-17 10:43:56] (monet.latent.util) INFO: Testing value 8/10 (80 PCs)... [2020-06-17 10:44:07] (monet.latent.util) INFO: Testing value 9/10 (90 PCs)... [2020-06-17 10:44:17] (monet.latent.util) INFO: Testing value 10/10 (100 PCs)... [2020-06-17 10:44:28] (monet.latent.monet_model) INFO: Now processing split 5/5... [2020-06-17 10:44:28] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. 
[2020-06-17 10:44:36] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:44:44] (monet.latent.pca_model) INFO: The PCA took 2.8 s. [2020-06-17 10:44:44] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:44:51] (monet.latent.util) INFO: Testing value 1/10 (10 PCs)... [2020-06-17 10:45:01] (monet.latent.util) INFO: Testing value 2/10 (20 PCs)... [2020-06-17 10:45:12] (monet.latent.util) INFO: Testing value 3/10 (30 PCs)... [2020-06-17 10:45:22] (monet.latent.util) INFO: Testing value 4/10 (40 PCs)... [2020-06-17 10:45:33] (monet.latent.util) INFO: Testing value 5/10 (50 PCs)... [2020-06-17 10:45:43] (monet.latent.util) INFO: Testing value 6/10 (60 PCs)... [2020-06-17 10:45:54] (monet.latent.util) INFO: Testing value 7/10 (70 PCs)... [2020-06-17 10:46:04] (monet.latent.util) INFO: Testing value 8/10 (80 PCs)... [2020-06-17 10:46:15] (monet.latent.util) INFO: Testing value 9/10 (90 PCs)... [2020-06-17 10:46:26] (monet.latent.util) INFO: Testing value 10/10 (100 PCs)... [2020-06-17 10:46:37] (monet.latent.monet_model) INFO: Coarse grid search yielded optimum of 30 PCs... [2020-06-17 10:46:37] (monet.latent.monet_model) INFO: Testing fine grid of num_component values... [2020-06-17 10:46:37] (monet.latent.monet_model) INFO: Testing grid of 6 num_component values... [2020-06-17 10:46:37] (monet.latent.monet_model) INFO: Now processing split 1/5... [2020-06-17 10:46:37] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:46:44] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:46:52] (monet.latent.pca_model) INFO: The PCA took 2.3 s. [2020-06-17 10:46:53] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:46:59] (monet.latent.util) INFO: Testing value 1/6 (23 PCs)... [2020-06-17 10:47:10] (monet.latent.util) INFO: Testing value 2/6 (26 PCs)... 
[2020-06-17 10:47:21] (monet.latent.util) INFO: Testing value 3/6 (29 PCs)... [2020-06-17 10:47:32] (monet.latent.util) INFO: Testing value 4/6 (31 PCs)... [2020-06-17 10:47:42] (monet.latent.util) INFO: Testing value 5/6 (34 PCs)... [2020-06-17 10:47:53] (monet.latent.util) INFO: Testing value 6/6 (37 PCs)... [2020-06-17 10:48:03] (monet.latent.monet_model) INFO: Now processing split 2/5... [2020-06-17 10:48:03] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:48:11] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:48:19] (monet.latent.pca_model) INFO: The PCA took 2.3 s. [2020-06-17 10:48:19] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:48:25] (monet.latent.util) INFO: Testing value 1/6 (23 PCs)... [2020-06-17 10:48:35] (monet.latent.util) INFO: Testing value 2/6 (26 PCs)... [2020-06-17 10:48:46] (monet.latent.util) INFO: Testing value 3/6 (29 PCs)... [2020-06-17 10:48:56] (monet.latent.util) INFO: Testing value 4/6 (31 PCs)... [2020-06-17 10:49:07] (monet.latent.util) INFO: Testing value 5/6 (34 PCs)... [2020-06-17 10:49:17] (monet.latent.util) INFO: Testing value 6/6 (37 PCs)... [2020-06-17 10:49:28] (monet.latent.monet_model) INFO: Now processing split 3/5... [2020-06-17 10:49:28] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:49:36] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:49:43] (monet.latent.pca_model) INFO: The PCA took 2.3 s. [2020-06-17 10:49:44] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:49:50] (monet.latent.util) INFO: Testing value 1/6 (23 PCs)... [2020-06-17 10:50:00] (monet.latent.util) INFO: Testing value 2/6 (26 PCs)... [2020-06-17 10:50:11] (monet.latent.util) INFO: Testing value 3/6 (29 PCs)... 
[2020-06-17 10:50:21] (monet.latent.util) INFO: Testing value 4/6 (31 PCs)... [2020-06-17 10:50:32] (monet.latent.util) INFO: Testing value 5/6 (34 PCs)... [2020-06-17 10:50:42] (monet.latent.util) INFO: Testing value 6/6 (37 PCs)... [2020-06-17 10:50:53] (monet.latent.monet_model) INFO: Now processing split 4/5... [2020-06-17 10:50:53] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:51:01] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:51:08] (monet.latent.pca_model) INFO: The PCA took 2.2 s. [2020-06-17 10:51:09] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:51:15] (monet.latent.util) INFO: Testing value 1/6 (23 PCs)... [2020-06-17 10:51:26] (monet.latent.util) INFO: Testing value 2/6 (26 PCs)... [2020-06-17 10:51:36] (monet.latent.util) INFO: Testing value 3/6 (29 PCs)... [2020-06-17 10:51:47] (monet.latent.util) INFO: Testing value 4/6 (31 PCs)... [2020-06-17 10:51:58] (monet.latent.util) INFO: Testing value 5/6 (34 PCs)... [2020-06-17 10:52:08] (monet.latent.util) INFO: Testing value 6/6 (37 PCs)... [2020-06-17 10:52:19] (monet.latent.monet_model) INFO: Now processing split 5/5... [2020-06-17 10:52:19] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:52:26] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:52:34] (monet.latent.pca_model) INFO: The PCA took 2.2 s. [2020-06-17 10:52:34] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:52:40] (monet.latent.util) INFO: Testing value 1/6 (23 PCs)... [2020-06-17 10:52:51] (monet.latent.util) INFO: Testing value 2/6 (26 PCs)... [2020-06-17 10:53:01] (monet.latent.util) INFO: Testing value 3/6 (29 PCs)... [2020-06-17 10:53:12] (monet.latent.util) INFO: Testing value 4/6 (31 PCs)... 
[2020-06-17 10:53:22] (monet.latent.util) INFO: Testing value 5/6 (34 PCs)... [2020-06-17 10:53:33] (monet.latent.util) INFO: Testing value 6/6 (37 PCs)... [2020-06-17 10:53:44] (monet.latent.monet_model) INFO: After fine grid search, optimal number of PCs is 30... [2020-06-17 10:53:44] (monet.latent.monet_model) INFO: Testing final grid of num_component values... [2020-06-17 10:53:44] (monet.latent.monet_model) INFO: Testing grid of 0 num_component values... [2020-06-17 10:53:44] (monet.latent.monet_model) INFO: Now processing split 1/5... [2020-06-17 10:53:44] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:53:51] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:53:59] (monet.latent.pca_model) INFO: The PCA took 2.2 s. [2020-06-17 10:53:59] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:54:06] (monet.latent.monet_model) INFO: Now processing split 2/5... [2020-06-17 10:54:06] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:54:13] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:54:21] (monet.latent.pca_model) INFO: The PCA took 2.2 s. [2020-06-17 10:54:21] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:54:28] (monet.latent.monet_model) INFO: Now processing split 3/5... [2020-06-17 10:54:28] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:54:35] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:54:43] (monet.latent.pca_model) INFO: The PCA took 2.3 s. [2020-06-17 10:54:43] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. 
[2020-06-17 10:54:50] (monet.latent.monet_model) INFO: Now processing split 4/5... [2020-06-17 10:54:50] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:54:57] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:55:05] (monet.latent.pca_model) INFO: The PCA took 2.2 s. [2020-06-17 10:55:05] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:55:12] (monet.latent.monet_model) INFO: Now processing split 5/5... [2020-06-17 10:55:12] (monet.latent.util) INFO: Data will be split into datasets containing 90.4% and 10.0% of transcripts, respectively. [2020-06-17 10:55:20] (monet.latent.util) INFO: Done splitting data! [2020-06-17 10:55:27] (monet.latent.pca_model) INFO: The PCA took 2.4 s. [2020-06-17 10:55:28] (monet.latent.pca_model) INFO: The fraction of variance explained by the 100 selected PCs is 35.8 %. [2020-06-17 10:55:34] (monet.latent.monet_model) INFO: After final grid search, optimal number of PCs is 30. [2020-06-17 10:55:34] (monet.latent.monet_model) INFO: Phase I (Estimating dimensionality) took 1181.2 s. [2020-06-17 10:55:34] (monet.latent.monet_model) INFO: Beginning of Phase II (Latent space inference)... [2020-06-17 10:55:34] (monet.latent.monet_model) INFO: Learning the latent space... [2020-06-17 10:55:41] (monet.latent.pca_model) INFO: The PCA took 1.4 s. [2020-06-17 10:55:42] (monet.latent.pca_model) INFO: The fraction of variance explained by the 30 selected PCs is 33.4 %. [2020-06-17 10:55:42] (monet.latent.monet_model) INFO: The median transcript count is 5783.0. [2020-06-17 10:55:42] (monet.latent.monet_model) INFO: Will use num_neighbors=35 for aggregation(value was determined automatically based on a target transcript count of 200000). [2020-06-17 10:55:42] (monet.latent.monet_model) INFO: Now performing aggregation step 1/1... 
[2020-06-17 10:55:42] (monet.latent.util) INFO: Calculating the pairwise distances took 0.7 s. [2020-06-17 10:55:48] (monet.latent.util) INFO: Sorting the pairwise distance matrix took 6.0 s. [2020-06-17 10:55:51] (monet.latent.util) INFO: Aggregating the expression values took 2.5 s. [2020-06-17 10:55:56] (monet.latent.pca_model) INFO: The PCA took 1.3 s. [2020-06-17 10:55:56] (monet.latent.pca_model) INFO: The fraction of variance explained by the 30 selected PCs is 90.8 %. [2020-06-17 10:55:56] (monet.latent.monet_model) INFO: Learned a 30-dimensional latent space in 22.2 s. [2020-06-17 10:55:56] (monet.latent.monet_model) INFO: Phase II (Latent space inference) took 22.2 s. [2020-06-17 10:55:56] (monet.latent.monet_model) INFO: Fitting the Monet model took 1203.6 s (20.1 min). [2020-06-17 10:55:56] (monet.latent.monet_model) INFO: Saved Monet model to pickle file "output/v3_human_pbmc_10k_monet_model.pickle".
0
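The log above reports that num_neighbors=35 was chosen automatically from a target transcript count of 200000 and a median transcript count of 5783.0, followed by distance calculation, sorting, and aggregation steps. The sketch below shows one way such an aggregation could look. It is an assumption-laden illustration rather than Monet's code: the function names, the use of `math.ceil`, Euclidean distances, and the choice to count each cell as its own nearest neighbor are all hypothetical.

```python
import math

import numpy as np


def choose_num_neighbors(median_count, target_count=200_000):
    """Hypothetical rule: smallest k with k * median_count >= target_count."""
    return math.ceil(target_count / median_count)


def knn_aggregate(counts, scores, k):
    """Sum the counts of each cell's k nearest neighbors in latent space.

    counts: (cells x genes) array of UMI counts
    scores: (cells x dims) array of latent-space coordinates
    """
    # Squared Euclidean distances between all pairs of cells.
    sq = (scores ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * scores @ scores.T
    # Guarantee that each cell is ranked first among its own neighbors.
    np.fill_diagonal(d2, -np.inf)
    # Indices of the k closest cells (including the cell itself).
    idx = np.argsort(d2, axis=1)[:, :k]
    # Aggregate the expression values over each neighborhood.
    return counts[idx].sum(axis=1)


# 200000 / 5783 is about 34.6, so this rule gives 35.
print(choose_num_neighbors(5783.0))
```

Aggregating over neighbors increases the transcript count per profile, which is consistent with the much larger fraction of variance explained by the PCs after aggregation in the log (90.8 % versus 33.4 %).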
We can take a look at the MCV results using the plot_mcv_results() method. First, we load the Monet model using the MonetModel.load_pickle() method.
from monet import MonetModel
monet_model_file = 'data/v3_human_pbmc_10k_monet_model.pickle'
#monet_model_file = 'output/v3_human_pbmc_10k_monet_model.pickle'
monet_model = MonetModel.load_pickle(monet_model_file)
fig = monet_model.plot_mcv_results()
fig.layout.title = 'MCV result'
fig.show()
[2020-06-17 10:57:54] (monet.latent.monet_model) INFO: Loaded Monet model from pickle file "data/v3_human_pbmc_10k_monet_model.pickle". 30
The idea behind Monet models is that they represent latent spaces for a particular tissue, and can form the basis for analyses of arbitrary other scRNA-Seq datasets*. First, we will perform a t-SNE using the same dataset that the model was trained on. Then, we'll perform another t-SNE with a PBMC dataset generated using an earlier version of the 10x Genomics Chromium chemistry (v2).
*Limited to datasets containing UMI counts. The theoretical framework underlying Monet does not extend to scRNA-Seq technologies that do not incorporate UMIs.
import gc
from monet import ExpMatrix
from monet import MonetModel
from monet import visualize
expression_file = 'data/v3_human_pbmc_10k_expression.npz'
monet_model_file = 'data/v3_human_pbmc_10k_monet_model.pickle'
#monet_model_file = 'output/v3_human_pbmc_10k_monet_model.pickle'
matrix = ExpMatrix.load_npz(expression_file)
monet_model = MonetModel.load_pickle(monet_model_file)
fig, tsne_scores = visualize.tsne_plot(
    matrix, monet_model,
    title='Training data (PBMC v3)')
fig.show()
# free up memory
del matrix; gc.collect()
[2020-06-17 10:59:12] (monet.core.exp_matrix) INFO: Loaded expression matrix with 10681 cells and 16319 genes -- .npz format, 36.7 MB (hash: f9d7fac20f4de6184ff55388c267699a). [2020-06-17 10:59:12] (monet.latent.monet_model) INFO: Loaded Monet model from pickle file "data/v3_human_pbmc_10k_monet_model.pickle". [2020-06-17 10:59:12] (root) INFO: Using Monet model to project data onto a 30-dimensional latent space... [2020-06-17 10:59:14] (monet.latent.pca_model) INFO: Expression profiles will be scaled 1.00x (on average). [2020-06-17 10:59:18] (monet.latent.pca_model) INFO: Projection onto 30 PCs retained 32.1 % of the total variance in the scaled and FT-transformed data. [2020-06-17 10:59:18] (root) INFO: Performing t-SNE... [2020-06-17 10:59:40] (root) INFO: t-SNE took 21.8 s.
18569
import gc
from monet import ExpMatrix
from monet import MonetModel
from monet import visualize
expression_file = 'data/v2_human_pbmc_8k_expression.npz'
monet_model_file = 'data/v3_human_pbmc_10k_monet_model.pickle'
#monet_model_file = 'output/v3_human_pbmc_10k_monet_model.pickle'
matrix = ExpMatrix.load_npz(expression_file)
monet_model = MonetModel.load_pickle(monet_model_file)
fig, tsne_scores = visualize.tsne_plot(
    matrix, monet_model,
    title='New data (PBMC v2)')
fig.show()
# free up memory
del matrix; gc.collect()
[2020-06-17 10:59:48] (monet.core.exp_matrix) INFO: Loaded expression matrix with 8381 cells and 15510 genes -- .npz format, 19.9 MB (hash: c299645ab748c9dbe4030fc4cace369b). [2020-06-17 10:59:48] (monet.latent.monet_model) INFO: Loaded Monet model from pickle file "data/v3_human_pbmc_10k_monet_model.pickle". [2020-06-17 10:59:48] (root) INFO: Using Monet model to project data onto a 30-dimensional latent space... [2020-06-17 10:59:48] (monet.latent.pca_model) WARNING: No expression data for 1153 / 15510 genes (7.4 %) in the PCA model. [2020-06-17 10:59:49] (monet.latent.pca_model) INFO: Expression profiles will be scaled 1.57x (on average). [2020-06-17 10:59:52] (monet.latent.pca_model) INFO: Projection onto 30 PCs retained 20.8 % of the total variance in the scaled and FT-transformed data. [2020-06-17 10:59:52] (root) INFO: Performing t-SNE... [2020-06-17 11:00:16] (root) INFO: t-SNE took 23.9 s.
18471
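The log output of the two t-SNE cells mentions several steps that happen when a dataset is projected onto the latent space of a trained model: genes without expression data are reported, expression profiles are scaled, the data are "FT-transformed" (presumably the Freeman-Tukey transform, sqrt(x) + sqrt(x + 1)), and the result is projected onto 30 PCs. The following sketch illustrates this kind of projection under stated assumptions. The function `project_onto_model`, its parameters, and the per-cell scaling to a fixed target count are hypothetical and are not taken from the Monet source code.

```python
import numpy as np


def project_onto_model(counts, genes, model_genes, components, mean,
                       target_count):
    """Project new UMI counts onto a previously learned PC space.

    counts:       (cells x genes) array for the new dataset
    genes:        gene names of the new dataset (columns of `counts`)
    model_genes:  gene names the model was trained on
    components:   (num_pcs x model genes) PC loadings
    mean:         (model genes,) mean of the transformed training data
    target_count: transcript count each profile is scaled to (assumed)
    """
    # Align the new data with the model's genes; genes that are missing
    # from the new dataset keep an expression of zero.
    col = {g: i for i, g in enumerate(genes)}
    aligned = np.zeros((counts.shape[0], len(model_genes)))
    for j, g in enumerate(model_genes):
        if g in col:
            aligned[:, j] = counts[:, col[g]]

    # Scale every profile to the same total transcript count.
    totals = aligned.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    scaled = aligned * (target_count / totals)

    # Freeman-Tukey transform.
    ft = np.sqrt(scaled) + np.sqrt(scaled + 1.0)

    # Center with the training mean and project onto the PCs.
    return (ft - mean) @ components.T


# Tiny illustrative example with made-up gene names.
demo_scores = project_onto_model(
    counts=np.array([[1, 3, 5]]),
    genes=['B', 'A', 'D'],
    model_genes=['A', 'B', 'C'],
    components=np.eye(3)[:2],
    mean=np.zeros(3),
    target_count=8,
)
print(demo_scores.shape)
```

The resulting latent-space coordinates would then serve as the input to t-SNE, so that datasets projected with the same model share one coordinate system.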