This tutorial introduces you to CellRank's **high-level API** for computing initial & terminal states and fate probabilities. Once we have the fate probabilities, we show how to use them to plot a directed PAGA graph, compute putative lineage drivers, and visualize smooth gene expression trends. If you want more control over how initial & terminal states and fate probabilities are computed, check out CellRank's **low-level API**, composed of kernels and estimators. It isn't any more complicated than using scikit-learn, so please do check out the Kernels and estimators tutorial.

In this tutorial, we will use **RNA velocity and transcriptomic similarity** to estimate cell-cell transition probabilities. Using kernels and estimators, you can apply CellRank even without RNA velocity information; see our CellRank beyond RNA velocity tutorial. CellRank generalizes beyond RNA velocity and is a widely applicable framework for modeling single-cell data based on Markov chains.

The first part of this tutorial is very similar to scVelo's tutorial on pancreatic endocrinogenesis. The data we use here comes from Bastidas-Ponce et al. (2018). For more info on scVelo, see the documentation or read the article.

This tutorial notebook can be downloaded using the following link.

The easiest way to get started is to download Miniconda3 along with the environment file found here. To create the environment, run `conda env create -f environment.yml`.

In [1]:

```
import scvelo as scv
import scanpy as sc
import cellrank as cr
import numpy as np
scv.settings.verbosity = 3
scv.settings.set_figure_params('scvelo')
cr.settings.verbosity = 2
```

In [2]:

```
import warnings
warnings.simplefilter("ignore", category=UserWarning)
warnings.simplefilter("ignore", category=FutureWarning)
warnings.simplefilter("ignore", category=DeprecationWarning)
```

First, we need to get the data. The following commands will download the `adata` object and save it under `datasets/endocrinogenesis_day15.5.h5ad`. We'll also show the fraction of spliced/unspliced reads, which we need to estimate RNA velocity.

In [3]:

```
adata = cr.datasets.pancreas()
scv.pl.proportions(adata)
adata
```

Filter out genes which don't have enough spliced/unspliced counts, normalize and log-transform the data, and restrict to the top highly variable genes. Further, compute principal components and moments for velocity estimation. These are standard scanpy/scvelo functions; for more information about them, see the scVelo API.

In [4]:

```
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)
```
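As a rough illustration of what normalizing and log-transforming does to a count matrix, here is a minimal pure-Python sketch. It is not scVelo's implementation: the toy `counts` matrix, the helper name `normalize_log1p`, and the choice to scale every cell to the median total count are assumptions made for illustration only.

```python
import math
from statistics import median


def normalize_log1p(counts):
    """Scale each cell (row) to the median total count, then apply log(1 + x)."""
    totals = [sum(row) for row in counts]
    target = median(totals)  # assumed normalization target
    out = []
    for row, total in zip(counts, totals):
        scale = target / total if total > 0 else 0.0
        out.append([math.log1p(value * scale) for value in row])
    return out


# Toy matrix: 3 cells (rows) x 3 genes (columns); the values are hypothetical.
counts = [[4, 0, 6], [1, 1, 0], [20, 10, 10]]
normalized = normalize_log1p(counts)

# Undo the log to check that every cell now has the same total count.
totals_after = [sum(math.expm1(v) for v in row) for row in normalized]
```

After this step, differences between cells reflect relative expression rather than sequencing depth, which is what the downstream PCA and neighbor graph rely on.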

We will use the dynamical model from scVelo to estimate the velocities. Please make sure to have at least version 0.2.3 of scVelo installed to make use of **parallelisation** in `scv.tl.recover_dynamics`. On my laptop, using 8 cores, the cell below takes about 1:30 min to execute.

In [5]:

```
scv.tl.recover_dynamics(adata, n_jobs=8)
```

Once we have the parameters, we can use these to compute the velocities and the velocity graph. The velocity graph is a weighted graph that specifies how likely a cell is to transition into each of its neighbors, given its velocity vector and the relative positions of the cells.

In [6]:

```
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
```
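To build intuition for how velocity vectors and relative positions turn into transition likelihoods, here is a minimal pure-Python sketch. It assumes a cosine-similarity score between a cell's velocity and the displacement to each neighbor, followed by a softmax. The helper names, the `scale` parameter, and the toy coordinates are hypothetical, and this is a simplification rather than the code scVelo or CellRank actually runs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def transition_probs(x_i, v_i, neighbors, scale=4.0):
    """Softmax over cosine similarities between v_i and the displacements x_j - x_i."""
    sims = [cosine(v_i, [b - a for a, b in zip(x_i, x_j)]) for x_j in neighbors]
    weights = [math.exp(scale * s) for s in sims]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: one cell at the origin whose velocity points along the x-axis.
x_i = [0.0, 0.0]
v_i = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # ahead, sideways, behind
probs = transition_probs(x_i, v_i, neighbors)
```

The neighbor lying in the direction of the velocity receives the highest probability, and the one lying behind the cell receives the lowest.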

In [7]:

```
scv.pl.velocity_embedding_stream(adata, basis='umap', legend_fontsize=12, title='', smooth=.8, min_mass=4)
```

CellRank offers various ways to infuse directionality into single-cell data. Here, the directional information comes from RNA velocity, and we use this information to compute initial & terminal states as well as fate probabilities for the dynamical process of pancreatic development.

Terminal states can be computed by running the following command:

In [8]:

```
cr.tl.terminal_states(adata, cluster_key='clusters', weight_connectivities=0.2)
```
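To see what a terminal state means in Markov chain terms, consider the following toy example. The 3-state transition matrix and the state labels are hypothetical, and state 2 is made perfectly absorbing for clarity. In real data, terminal states are only approximately absorbing, so treat this as an intuition aid rather than a description of what CellRank computes.

```python
def step(dist, T):
    """Propagate a probability distribution one step through the chain (dist @ T)."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


# Hypothetical chain: 0 = progenitor, 1 = intermediate, 2 = terminal.
# Each row sums to 1; state 2 never transitions out of itself.
T = [
    [0.8, 0.2, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.0, 1.0],
]

dist = [1.0, 0.0, 0.0]  # start with all probability mass in the progenitor state
for _ in range(200):
    dist = step(dist, T)
```

After many steps, almost all probability mass has accumulated in state 2: once the chain enters it, it does not leave again.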

The most important parameters of `cr.tl.terminal_states` are:

- `estimator`: this determines what happens behind the scenes to compute the terminal states. Options are `cr.tl.estimators.CFLARE` ("Clustering and Filtering of Left and Right Eigenvectors") or `cr.tl.estimators.GPCCA` (Generalized Perron Cluster Cluster Analysis). The latter is the default. It computes terminal states by coarse-graining the velocity-derived Markov chain into a set of macrostates that represent the slow time-scale dynamics of the process, i.e. it finds the states that you are unlikely to leave again once you have entered them.
- `cluster_key`: takes a key from `adata.obs` to retrieve pre-computed cluster labels, e.g. 'clusters' or 'louvain'. These labels are then mapped onto the set of terminal states, to associate a name and a color with each state.
- `n_states`: number of expected terminal states. This parameter is optional - if it's not provided, this number is estimated using the so-called 'eigengap heuristic' on the spectrum of the transition matrix.
- `method`: this is only relevant for the estimator `GPCCA`. It determines the way in which we compute and sort the real Schur decomposition. The default, `krylov`, is an iterative procedure that works with sparse matrices, which allows the method to scale to very large cell numbers. It relies on the libraries SLEPc and PETSc, which you will have to install separately; see our installation instructions. If your dataset is small (<5k cells) and you don't want to install these at the moment, use `method='brandts'`. The results will be the same; the difference is that `brandts` works with dense matrices and won't scale to very large cell numbers.
- `weight_connectivities`: weight given to cell-cell similarities to account for noise in velocity vectors.
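Two of these parameters reduce to simple arithmetic, sketched below in pure Python. The sketch assumes that `weight_connectivities` forms a convex combination of a velocity-based and a similarity-based transition matrix, and that the eigengap heuristic picks the number of states at the largest drop between consecutive eigenvalues. The helper names, toy matrices, and eigenvalues are hypothetical, and real implementations also have to handle complex eigenvalues.

```python
def combine_kernels(T_velocity, T_connectivity, weight_connectivities=0.2):
    """Convex combination: (1 - w) * T_velocity + w * T_connectivity."""
    w = weight_connectivities
    return [
        [(1 - w) * a + w * b for a, b in zip(row_v, row_c)]
        for row_v, row_c in zip(T_velocity, T_connectivity)
    ]


def eigengap_n_states(eigenvalues):
    """Number of leading eigenvalues that sit before the largest gap."""
    vals = sorted(eigenvalues, reverse=True)
    gaps = [vals[k] - vals[k + 1] for k in range(len(vals) - 1)]
    return gaps.index(max(gaps)) + 1


# Toy 2-cell transition matrices (each row sums to 1); values are hypothetical.
T_velocity = [[0.9, 0.1], [0.0, 1.0]]
T_connectivity = [[0.5, 0.5], [0.5, 0.5]]
T = combine_kernels(T_velocity, T_connectivity, weight_connectivities=0.2)

# Hypothetical real-valued spectrum: three eigenvalues near 1, then a clear drop.
eigenvalues = [1.0, 0.98, 0.95, 0.60, 0.55, 0.50]
n_states = eigengap_n_states(eigenvalues)
```

Because both inputs are row-stochastic, the combined matrix is row-stochastic as well, so a larger weight smooths out noisy velocity directions without breaking the Markov chain.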

When running the above command, CellRank adds a key `terminal_states` to `adata.obs`, and the result can be plotted as:

In [9]:

```
cr.pl.terminal_states(adata)
```