Single cell data analysis using Scanpy

Notebook version: v0.0.5
Created by: Imperial BRC Genomics Facility
Maintained by: Imperial BRC Genomics Facility
Docker image: imperialgenomicsfacility/scanpy-notebook-image:release-v0.0.4
Github repository: imperial-genomics-facility/scanpy-notebook-image
Created on: 2021-May-04 12:32
Contact us: Imperial BRC Genomics Facility
License: Apache License 2.0

Introduction

This notebook runs single cell data analysis for a single sample using the Scanpy package. Most of the code and documentation used in this notebook have been copied from the following sources:

Tools required

Loading required libraries

We need to load all the required libraries into the environment before running any of the analysis steps. We also print version information for the major packages used in the analysis.

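%matplotlib inline
# Standard library
import os
from copy import deepcopy
# Numerical and data-frame packages
import numpy as np
import pandas as pd
# Single cell analysis
import scanpy as sc
# Plotting libraries (static and interactive)
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from IPython.display import HTML
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# Verbosity 0: Scanpy only reports errors
sc.settings.verbosity = 0
# Print version information for Scanpy and its key dependencies
sc.logging.print_header()
# Global plot styling
sns.set_theme(context='notebook', style='darkgrid', palette='colorblind')
plt.rcParams['figure.dpi'] = 150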
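Scanpy's sc.logging.print_header() reports versions only after the packages have been imported. If you want to check which packages are installed without importing them, a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later) is shown below. The helper name package_versions and the package list are illustrative assumptions, not part of the original notebook.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            # Reads the installed distribution's metadata; nothing is imported
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as published on PyPI (these can differ from import names)
for pkg, ver in package_versions(
    ["scanpy", "anndata", "numpy", "pandas", "matplotlib", "seaborn", "plotly"]
).items():
    print(f"{pkg:>10}: {ver or 'not installed'}")
```

Querying metadata instead of importing keeps the check cheap and avoids import-time side effects, at the cost of relying on distribution names rather than module names.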

We set the output file path to /tmp/scanpy_output.h5ad.

results_file = '/tmp/scanpy_output.h5ad'

The following step is only required to download the test data (the PBMC 3k filtered gene-barcode matrices) from the 10x Genomics website.

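%%bash
# Start clean: remove the local ./cache directory (Scanpy's default cache location) and any previous download
rm -rf cache
rm -rf /tmp/data
mkdir -p /tmp/data
# Download the PBMC 3k filtered gene-barcode matrices from 10x Genomics
wget -q -O /tmp/data/pbmc3k_filtered_gene_bc_matrices.tar.gz \
  http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
# Unpack the archive inside /tmp/data
cd /tmp/data
tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz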
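If the notebook kernel has no shell with wget available, the same download-and-unpack step can be done in Python. The sketch below uses only the standard library (urllib.request and tarfile) and reuses the URL and /tmp/data directory from the bash cell. The helper name download_and_extract is a hypothetical illustration, not something defined by the original notebook.

```python
import os
import tarfile
import urllib.request

# URL taken from the bash cell above
PBMC3K_URL = ("http://cf.10xgenomics.com/samples/cell-exp/1.1.0/"
              "pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz")


def download_and_extract(url, dest_dir):
    """Download a .tar.gz archive into dest_dir, unpack it there and return the archive path."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    # Fetch the archive to a local file (file:// URLs are copied as well)
    urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject absolute paths,
            # ".." members and other unsafe entries
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return archive_path


# Usage (performs a network download):
# download_and_extract(PBMC3K_URL, "/tmp/data")
```

Unlike the bash cell, this sketch does not delete /tmp/data or the cache directory first; it simply overwrites the archive and any extracted files with the same names.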

Table of contents

Introduction

Tools required

Loading required libraries

Reading data from Cellranger output

Data processing and visualization

Checking highly variable genes

Doublet detection using Scrublet

Plot doublet score histograms for observed transcriptomes and simulated doublets

Quality control

Computing metrics for cell QC

Plotting predicted doublets

Plotting MT gene fractions

Count depth distribution

Gene count distribution

Counting cells per gene

Plotting count depth vs MT fraction

Checking thresholds and filtering data

Normalization

Highly variable genes

Regressing out technical effects

Principal component analysis

Neighborhood graph

Clustering the neighborhood graph

Embed the neighborhood graph using UMAP

Plotting 3D UMAP

Plotting 2D UMAP

Embed doublets using UMAP

Embed the neighborhood graph using tSNE

Embed the neighborhood graph using Diffusion Map and Graph

Embed the neighborhood graph using PHATE

Embed the neighborhood graph using PhenoGraph

Finding marker genes

Stacked violin plot of ranked genes

Dot plot of ranked genes

Matrix plot of ranked genes

Heatmap plot of ranked genes

Tracks plot of ranked genes

Cell annotation

Cell cycle scoring

Trajectory analysis

Partition-based graph abstraction

Explore cells in UCSC Cell Browser

References

Acknowledgement