#!/usr/bin/env python
# coding: utf-8

# ## Overview
#
# This notebook demonstrates how to use Monet to generate simple t-SNE visualizations. A more advanced way of doing this is by first training a `MonetModel`, but we'll save that for later tutorials.
#
# Aside from the standard t-SNE plot, Monet also supports *exaggerated t-SNE* plots, which are obtained by tweaking one rarely mentioned t-SNE parameter (the exaggeration parameter $\alpha$) and stopping the algorithm immediately after the early exaggeration phase. This results in more condensed plots that resemble UMAP results. This approach was first described by [Kobak and Berens (Nat. Commun., 2019)](https://www.nature.com/articles/s41467-019-13056-x). It can be useful when you are trying to visualize a lot of cells (~20,000 or more), because t-SNE plots can look very crowded in that case.
#
# Support for UMAP is coming soon!

# ### Setting up the notebook
#
# Note: We're increasing the default width of the Jupyter notebook to be able to display larger figures.

# In[1]:


# change notebook width and font
# (HTML and display are imported from IPython.display; importing them
# from IPython.core.display is deprecated)
from IPython.display import HTML, display
display(HTML(""""""))

from monet import util
_LOGGER = util.configure_logger()

# the following is to allow embedding of plotly figures
from plotly.offline import init_notebook_mode
import plotly.graph_objs as go
init_notebook_mode(connected=True)


# ## Performing t-SNE on PBMC data
#
# t-SNE can be performed using the `visualize.tsne_plot()` function. This returns both the figure and a pandas `DataFrame` containing the t-SNE scores (coordinates). As you can see from the results, t-SNE took approx. 24 seconds for this dataset containing 10,681 cells.
# In[2]:


import gc

from monet import ExpMatrix
from monet import visualize

expression_file = 'data/v3_human_pbmc_10k_expression.npz'
matrix = ExpMatrix.load_npz(expression_file)

fig, tsne_scores = visualize.tsne_plot(matrix, title='PBMC data')
# by default, tsne_plot() performs PCA with 30 principal components
# this can be changed, e.g. to 50, using tsne_plot(..., num_components=50)

fig.show()

# free up memory
del matrix
gc.collect()


# This is what the `tsne_scores` DataFrame looks like:

# In[3]:


tsne_scores.iloc[:5]


# ## Overlaying clustering results onto the t-SNE
#
# Often, we would also like to use t-SNE to show clustering results and color cells by their cluster assignments (we will look at how to perform clustering with Monet in the next tutorial). Here, we will load clustering results from a tab-delimited text file, and then overlay the cell type assignments onto the previously generated t-SNE result.

# First, we load the clustering result (cell labels). This is just a tab-delimited text file that we load into a pandas `Series`.

# In[4]:


from monet import util

cell_label_file = 'data/v3_pbmc_10k_clustering_annotated.tsv'
cell_labels = util.load_cell_labels(cell_label_file)


# This is what the `cell_labels` data looks like:

# In[5]:


df = cell_labels.to_frame()
df.columns = ['Labels']
df.iloc[:5]


# Now, we're ready to plot. We will specify the order in which the clusters are plotted and appear in the legend. We will also make sure that cells labeled as "Other" appear in light gray and don't distract from the rest of the figure.
#
# You can always change the colors used for each cluster by adding more entries to the `cluster_colors` dictionary. Otherwise, Monet will just use the default color order from plotly.
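#
# As a rough illustration of the "explicit colors first, plotly defaults otherwise" idea described above, the next cell sketches the fallback in plain Python. `resolve_cluster_colors()` and `PLOTLY_DEFAULT_COLORS` are hypothetical names written for this tutorial and are not part of Monet; Monet's actual color assignment may differ in its details. The palette listed is plotly's default qualitative color sequence.

# In[ ]:


# plotly's default qualitative color sequence (ten colors, then it repeats)
PLOTLY_DEFAULT_COLORS = [
    '#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A',
    '#19D3F3', '#FF6692', '#B6E880', '#FF97FF', '#FECB52',
]


def resolve_cluster_colors(order, overrides=None, palette=PLOTLY_DEFAULT_COLORS):
    """Hypothetical helper (not Monet API): map each cluster to a color.

    Clusters listed in `overrides` keep their explicit color. All other
    clusters receive the palette color at their position in `order`,
    wrapping around when there are more clusters than palette entries.
    """
    overrides = overrides or {}
    resolved = {}
    for position, cluster in enumerate(order):
        default_color = palette[position % len(palette)]
        resolved[cluster] = overrides.get(cluster, default_color)
    return resolved


# Only "Other" is overridden; the remaining clusters fall back to the palette.
demo_colors = resolve_cluster_colors(
    ['Naive T cells', 'NK cells', 'Other'],
    overrides={'Other': 'lightgray'})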
# In[6]:


cluster_order = [
    'Naive T cells',
    'CD4+ Memory T cells',
    'CD8+/CD161- Memory T cells',
    'CD8+/CD161+ Memory T cells',
    'NK cells',
    'Naive B cells',
    'Memory B cells',
    'Monocytes',
    'mDCs',
    'pDCs',
    'Other',
]

cluster_colors = {
    'Other': 'lightgray',
}

fig = visualize.plot_cells(
    tsne_scores, cell_labels=cell_labels,
    cluster_order=cluster_order, cluster_colors=cluster_colors,
    width=1200)

fig.show()


# ## Performing *exaggerated* t-SNE on PBMC data
#
# Simply pass `exaggerated_tsne=True` to the `tsne_plot()` function. The fact that you are using a modified version of t-SNE is indicated by the asterisk (\*) in the axis labels. Since the plot is a lot more condensed, it's a good idea to also pass `marker_size=2.5` to reduce the size of the cells in the plot from the default value (4).
#
# Note that the `tsne_plot()` function accepts the same parameters as the `plot_cells()` function used above to plot an existing t-SNE result.

# In[7]:


import gc

from monet import ExpMatrix
from monet import visualize

expression_file = 'data/v3_human_pbmc_10k_expression.npz'
matrix = ExpMatrix.load_npz(expression_file)

fig, tsne_scores = visualize.tsne_plot(
    matrix, exaggerated_tsne=True, marker_size=2.5,
    cell_labels=cell_labels,
    cluster_order=cluster_order, cluster_colors=cluster_colors,
    width=1200, title='PBMC data')

fig.show()

# free up memory
del matrix
gc.collect()
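

# As a rough guide to what the exaggeration parameter $\alpha$ does, here is the usual t-SNE gradient with the exaggeration factor written out. The symbols $p_{ij}$, $q_{ij}$, $w_{ij}$ and $y_i$ are not used elsewhere in this notebook; they follow the standard t-SNE formulation and are introduced here only for illustration. How Monet implements this internally, and which value of $\alpha$ it uses, is not shown in this tutorial.
#
# $$\frac{\partial \mathcal{L}}{\partial y_i} = 4 \sum_{j \neq i} \left( \alpha \, p_{ij} - q_{ij} \right) w_{ij} \, (y_i - y_j), \qquad w_{ij} = \frac{1}{1 + \lVert y_i - y_j \rVert^2}, \qquad q_{ij} = \frac{w_{ij}}{\sum_{k \neq l} w_{kl}}$$
#
# Here, $y_i$ is the 2D position of cell $i$, $p_{ij}$ is the similarity of cells $i$ and $j$ in the input space, and $q_{ij}$ is their similarity in the plot. The $\alpha \, p_{ij}$ term pulls similar cells together, and the $q_{ij}$ term pushes all cells apart. With $\alpha = 1$, this is the standard t-SNE gradient. With $\alpha > 1$, the attractive forces are amplified relative to the repulsive ones, which is consistent with the more condensed clusters in the exaggerated plot above.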