This notebook demonstrates how to use Monet to generate simple t-SNE visualizations. A more advanced approach is to first train a MonetModel, but we'll save that for later tutorials.
Aside from the standard t-SNE plot, Monet also supports exaggerated t-SNE plots, which are obtained by tweaking one rarely mentioned t-SNE parameter (exaggeration parameter $\alpha$) and stopping the algorithm immediately after the early exaggeration phase. This results in more condensed plots that resemble UMAP results. This approach was first described by Kobak and Berens (Nat Commun., 2019). It can be useful when you are trying to visualize a lot of cells (~20,000 or more), because t-SNE plots can look very crowded in that case.
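To make the role of the exaggeration parameter more concrete, here is a minimal numpy sketch of the standard t-SNE gradient with the attractive term scaled by $\alpha$. It is an illustration only, not Monet's implementation: the function names `student_t_similarities` and `tsne_gradient` are hypothetical, and the toy example reuses the embedding's own similarities as the high-dimensional similarities so that all forces cancel at $\alpha = 1$.

```python
import numpy as np

def student_t_similarities(Y):
    """Return pairwise differences, kernel weights w_ij, and normalized similarities q_ij."""
    diff = Y[:, None, :] - Y[None, :, :]           # (n, n, d) pairwise differences
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))   # Student-t kernel weights
    np.fill_diagonal(W, 0.0)                       # a point has no similarity to itself
    Q = W / W.sum()                                # low-dimensional similarities
    return diff, W, Q

def tsne_gradient(P, Y, alpha=1.0):
    """t-SNE gradient w.r.t. Y; alpha multiplies only the attractive (P) term."""
    diff, W, Q = student_t_similarities(Y)
    coeff = (alpha * P - Q) * W
    return 4.0 * np.einsum('ij,ijd->id', coeff, diff)

# Toy example (assumption: P is set equal to Q, so forces balance when alpha = 1).
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))
_, _, Q = student_t_similarities(Y)
grad_plain = tsne_gradient(Q, Y, alpha=1.0)
grad_exag = tsne_gradient(Q, Y, alpha=4.0)
print(np.abs(grad_plain).max(), np.abs(grad_exag).max())
```

With $\alpha > 1$ the attractive forces outweigh the repulsive ones, so the gradient pulls neighboring points closer together, which is consistent with the more condensed plots described above.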
Support for UMAP is coming soon!
Note: We're increasing the default width of the Jupyter notebook to be able to display larger figures.
# change notebook width and font
from IPython.display import HTML, display
display(HTML("""<style>
/* source: http://stackoverflow.com/a/24207353 */
.container { width:95% !important; }
div.prompt, div.CodeMirror pre, div.output_area pre { font-family:'Hack', monospace; font-size: 10.5pt; }
</style>"""))
from monet import util
_LOGGER = util.configure_logger()
# the following is to allow embedding of plotly figures
from plotly.offline import init_notebook_mode
import plotly.graph_objs as go
init_notebook_mode(connected=True)
t-SNE can be performed using the visualize.tsne_plot() function. This returns both the figure and a pandas DataFrame containing the t-SNE scores (coordinates). As you can see from the log output, t-SNE took approx. 39 seconds for this dataset containing 10,681 cells.
import gc
from monet import ExpMatrix
from monet import visualize
expression_file = 'data/v3_human_pbmc_10k_expression.npz'
matrix = ExpMatrix.load_npz(expression_file)
fig, tsne_scores = visualize.tsne_plot(matrix, title='PBMC data')
# by default, tsne_plot() performs PCA with 30 principal components
# this can be changed, e.g. to 50, using tsne_plot(..., num_components=50)
fig.show()
# free up memory
del matrix; gc.collect()
[2020-06-16 09:28:48] (monet.core.exp_matrix) INFO: Loaded expression matrix with 10681 cells and 16319 genes -- .npz format, 36.7 MB (hash: f9d7fac20f4de6184ff55388c267699a).
[2020-06-16 09:28:48] (root) INFO: No Monet model provided, performing PCA to determine first 30principal components...
[2020-06-16 09:28:48] (monet.latent.pca_model) INFO: Converted matrix to float32 data type.
[2020-06-16 09:28:55] (monet.latent.pca_model) INFO: The PCA took 1.5 s.
[2020-06-16 09:28:55] (monet.latent.pca_model) INFO: The fraction of variance explained by the 30 selected PCs is 33.4 %.
[2020-06-16 09:28:55] (root) INFO: Performing t-SNE...
[2020-06-16 09:29:34] (root) INFO: t-SNE took 38.9 s.
3541
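The log above reports the fraction of variance explained by the 30 selected principal components. As a rough illustration of what that number means, the following numpy sketch computes the same kind of quantity for a small matrix. It assumes rows are cells and columns are genes, and it ignores any normalization or transformation that Monet may apply before PCA; the function name `pca_explained_fraction` is hypothetical.

```python
import numpy as np

def pca_explained_fraction(X, num_components=30):
    """Fraction of total variance captured by the first num_components PCs."""
    Xc = X - X.mean(axis=0)                     # center each column (gene)
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, in descending order
    variances = s ** 2                          # proportional to per-PC variance
    return variances[:num_components].sum() / variances.sum()

# Toy data: 20 "cells" x 5 "genes" of random noise (not real expression data).
toy = np.random.default_rng(0).normal(size=(20, 5))
print(pca_explained_fraction(toy, num_components=2))
```

Increasing the number of components can only increase this fraction, at the cost of passing a higher-dimensional input to t-SNE.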
This is what the tsne_scores DataFrame looks like:
tsne_scores.iloc[:5]
Cells | t-SNE dim. 1 | t-SNE dim. 2 |
---|---|---|
GCACGTGGTCCACTCT-1 | -6.325842 | -33.456768 |
GCGAGAACATCCGTTC-1 | -12.233220 | -68.325409 |
GTTTGGACAACGAGGT-1 | -30.231630 | -10.729974 |
AGCGTATTCTAGCATG-1 | -44.112564 | -10.516320 |
GGTGATTTCGAGATGG-1 | 39.232872 | -32.560066 |
Often, we would like to use t-SNE to also show clustering results (we will look at how to perform clustering with Monet in the next tutorial) and color cells by their cluster assignments. Here, we will load clustering results from a tab-delimited text file, and then overlay the cell type assignments onto the previously generated t-SNE result.
First, we load the clustering result (cell labels). This is just a tab-delimited text file that we load into a pandas Series.
from monet import util
cell_label_file = 'data/v3_pbmc_10k_clustering_annotated.tsv'
cell_labels = util.load_cell_labels(cell_label_file)
[2020-06-16 09:29:34] (monet.util.files) INFO: Loaded labels for 10681 cells from tab-delimited plain-text file.
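If you want to read a comparable file without Monet, a plain pandas call is enough. The sketch below assumes a two-column, tab-delimited file without a header row (cell barcode, then label); the actual layout of the file used here may differ, and `read_labels` is a hypothetical helper, not part of Monet.

```python
import io
import pandas as pd

def read_labels(source):
    """Read a two-column tab-delimited file (barcode, label) into a pandas Series."""
    df = pd.read_csv(
        source, sep='\t', header=None,
        names=['Cells', 'Labels'], index_col=0)
    return df['Labels']

# Toy input standing in for a file on disk (assumption: no header row).
toy_file = io.StringIO(
    "GCACGTGGTCCACTCT-1\tOther\n"
    "GCGAGAACATCCGTTC-1\tNaive B cells\n")
toy_labels = read_labels(toy_file)
print(toy_labels)
```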
This is what the cell_labels data looks like:
df = cell_labels.to_frame()
df.columns = ['Labels']
df.iloc[:5]
Cells | Labels |
---|---|
GCACGTGGTCCACTCT-1 | Other |
GCGAGAACATCCGTTC-1 | Naive B cells |
GTTTGGACAACGAGGT-1 | Monocytes |
AGCGTATTCTAGCATG-1 | Monocytes |
GGTGATTTCGAGATGG-1 | CD8+/CD161- Memory T cells |
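Overlaying labels on a t-SNE result relies on both objects being indexed by the same cell barcodes. Before plotting, it can be worth aligning the two explicitly. The following is a small pandas sketch under that assumption; `align_labels` is a hypothetical helper, and whether Monet performs a similar step internally is not something this notebook shows.

```python
import pandas as pd

def align_labels(scores, labels, fill_value='Other'):
    """Reorder labels to match the rows of scores; unlabeled cells get fill_value."""
    aligned = labels.reindex(scores.index)
    return aligned.fillna(fill_value)

# Toy data: three cells with coordinates, only two of which have a label.
toy_scores = pd.DataFrame(
    {'t-SNE dim. 1': [1.0, 2.0, 3.0], 't-SNE dim. 2': [4.0, 5.0, 6.0]},
    index=['cell1', 'cell2', 'cell3'])
toy_cell_labels = pd.Series({'cell3': 'NK cells', 'cell1': 'Monocytes'})
aligned_labels = align_labels(toy_scores, toy_cell_labels)
print(aligned_labels)
```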
Now, we're ready to plot. We will specify the order in which the clusters are plotted and appear in the legend. We will also make sure that cells labeled as "Other" appear in light gray and don't distract from the rest of the figure.
You can always change the colors used for each cluster by adding more entries to the cluster_colors dictionary. Otherwise, Monet will just use the default color order from plotly.
cluster_order = [
'Naive T cells',
'CD4+ Memory T cells',
'CD8+/CD161- Memory T cells',
'CD8+/CD161+ Memory T cells',
'NK cells',
'Naive B cells',
'Memory B cells',
'Monocytes',
'mDCs',
'pDCs',
'Other',
]
cluster_colors = {
'Other': 'lightgray',
}
fig = visualize.plot_cells(
tsne_scores,
cell_labels=cell_labels,
cluster_order=cluster_order,
cluster_colors=cluster_colors,
width=1200)
fig.show()
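To illustrate how a cluster order and a partial color dictionary could translate into a plot, here is a sketch that builds one set of scatter-trace parameters per cluster. It is not Monet's implementation: `cluster_trace_specs` is a hypothetical helper, and the fallback palette is passed in explicitly rather than taken from plotly.

```python
from itertools import cycle
import pandas as pd

def cluster_trace_specs(scores, labels, cluster_order, cluster_colors,
                        default_colors, marker_size=4):
    """Build one dict of scatter-trace parameters per cluster, in the given order."""
    palette = cycle(default_colors)          # fallback colors, reused if exhausted
    aligned = labels.reindex(scores.index)   # make sure labels follow the score rows
    specs = []
    for cluster in cluster_order:
        color = cluster_colors.get(cluster)
        if color is None:
            color = next(palette)            # no explicit color: take the next default
        selected = scores[(aligned == cluster).to_numpy()]
        specs.append(dict(
            x=selected.iloc[:, 0].tolist(),
            y=selected.iloc[:, 1].tolist(),
            mode='markers',
            name=cluster,
            marker=dict(color=color, size=marker_size),
        ))
    return specs

# Toy data (not the PBMC dataset).
demo_scores = pd.DataFrame(
    {'t-SNE dim. 1': [1.0, 2.0, 3.0], 't-SNE dim. 2': [4.0, 5.0, 6.0]},
    index=['c1', 'c2', 'c3'])
demo_labels = pd.Series(['Monocytes', 'Other', 'NK cells'], index=['c1', 'c2', 'c3'])
demo_specs = cluster_trace_specs(
    demo_scores, demo_labels,
    cluster_order=['NK cells', 'Monocytes', 'Other'],
    cluster_colors={'Other': 'lightgray'},
    default_colors=['red', 'blue'])
```

Each dictionary could then be turned into a plotly trace with `go.Scatter(**spec)`. Because traces are created in cluster order, that order also determines the legend order and which points are drawn on top.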
To generate an exaggerated t-SNE plot, simply pass exaggerated_tsne=True to the tsne_plot() function. The fact that you are using a modified version of t-SNE is indicated by the asterisk (*) in the axis labels. Since the plot is a lot more condensed, it's a good idea to also pass marker_size=2.5 to reduce the size of the cells in the plot from its default value (4).
Note that the tsne_plot() function accepts the same parameters as the plot_cells() function used above to plot an existing t-SNE result.
import gc
from monet import ExpMatrix
from monet import visualize
expression_file = 'data/v3_human_pbmc_10k_expression.npz'
matrix = ExpMatrix.load_npz(expression_file)
fig, tsne_scores = visualize.tsne_plot(
matrix,
exaggerated_tsne=True, marker_size=2.5,
cell_labels=cell_labels,
cluster_order=cluster_order,
cluster_colors=cluster_colors,
width=1200,
title='PBMC data')
fig.show()
# free up memory
del matrix; gc.collect()
[2020-06-16 09:29:38] (monet.core.exp_matrix) INFO: Loaded expression matrix with 10681 cells and 16319 genes -- .npz format, 36.7 MB (hash: f9d7fac20f4de6184ff55388c267699a).
[2020-06-16 09:29:38] (root) INFO: No Monet model provided, performing PCA to determine first 30principal components...
[2020-06-16 09:29:38] (monet.latent.pca_model) INFO: Converted matrix to float32 data type.
[2020-06-16 09:29:46] (monet.latent.pca_model) INFO: The PCA took 1.8 s.
[2020-06-16 09:29:46] (monet.latent.pca_model) INFO: The fraction of variance explained by the 30 selected PCs is 33.4 %.
[2020-06-16 09:29:46] (root) INFO: Performing exaggerated t-SNE...
[2020-06-16 09:29:57] (root) INFO: t-SNE took 11.5 s.
27891