ISB-CGC Community Notebooks

Check out more notebooks at our Community Notebooks Repository!

Title: How to use Kallisto to quantify genes in 10X scRNA-seq

Author: David L Gibbs

Created: 2019-08-07

Purpose: Demonstrate how to use 10X fastq files to produce a gene quantification matrix

Notes:

In this notebook, we're going to use the 10X Genomics fastq files that we generated earlier to quantify gene expression per cell using Kallisto and Bustools.

This notebook is assumed to be running INSIDE THE CLOUD! By starting up a Jupyter notebook there, you are already authenticated, can read and write to cloud storage (buckets) for free, and data transfers are super fast. To start up a notebook, log into your Google Cloud Console and use the main 'hamburger' menu to find 'AI Platform' near the bottom. Select Notebooks and you'll have an interface to start either an R or Python notebook.

Resources:

In [ ]:
cd /home/jupyter/

Software install

In [ ]:
!git clone https://github.com/pachterlab/kallisto.git
In [ ]:
cd kallisto/
In [ ]:
ls -lha
In [ ]:
!sudo apt --yes install autoconf cmake
In [ ]:
!mkdir build
In [ ]:
cd build
In [ ]:
!sudo cmake ..

!sudo make

!sudo make install
In [ ]:
!kallisto
In [ ]:
cd ../..
In [ ]:
!git clone https://github.com/BUStools/bustools.git
In [ ]:
cd bustools/
In [ ]:
# we need the devel version due to a bug that stopped compilation ...
!git checkout devel
In [ ]:
!git status
In [ ]:
!mkdir build
In [ ]:
cd build
In [ ]:
!sudo cmake ..

!sudo make

!sudo make install
In [ ]:
cd ../..
In [ ]:
!bustools
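
Before moving on, it can help to confirm that both installs actually put their binaries on the PATH. The cell below is a minimal sketch using only the Python standard library; the helper name missing_tools is made up for this example and is not part of kallisto or bustools.

```python
import shutil


def missing_tools(tools=("kallisto", "bustools")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("kallisto and bustools are both on the PATH")
```

If either tool is reported missing, rerunning the matching `make install` cell is the first thing to try.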

Reference Gathering

In [ ]:
# Reference files go in /home/jupyter, where the later cells look for them
cd /home/jupyter
In [ ]:
!wget ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
In [ ]:
!wget ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz
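
FTP downloads occasionally get cut short, so a quick look at the first few FASTA headers is a cheap sanity check. This is a sketch only: the function name fasta_headers is an assumption for this example, and it assumes the file sits in the current working directory.

```python
import gzip
import os


def fasta_headers(path, limit=None):
    """Yield header lines (without the leading '>') from a gzipped FASTA file."""
    seen = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].rstrip("\n")
                seen += 1
                if limit is not None and seen >= limit:
                    return


# Only runs if the download from the cell above is present.
cdna = "Homo_sapiens.GRCh38.cdna.all.fa.gz"
if os.path.exists(cdna):
    for header in fasta_headers(cdna, limit=3):
        print(header)
```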

Barcode whitelist

In [ ]:
# Version 3 chemistry
!wget https://github.com/BUStools/getting_started/releases/download/species_mixing/10xv3_whitelist.txt
In [ ]:
# Version 2 chemistry
!wget https://github.com/bustools/getting_started/releases/download/getting_started/10xv2_whitelist.txt

Gene map utility

In [ ]:
!wget https://raw.githubusercontent.com/BUStools/BUS_notebooks_python/master/utils/transcript2gene.py
In [ ]:
!gunzip Homo_sapiens.GRCh38.96.gtf.gz
In [ ]:
!python transcript2gene.py --use_version < Homo_sapiens.GRCh38.96.gtf > transcripts_to_genes.txt
In [ ]:
!head transcripts_to_genes.txt
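
To make the mapping step less of a black box, here is a rough sketch of how one GTF 'transcript' line can be turned into a transcript-to-gene row. This is not the code in transcript2gene.py; the function name, the regular expression, and the three-column output (versioned transcript ID, versioned gene ID, gene name) are assumptions made for illustration.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf_line):
    """Return (transcript_id.version, gene_id.version, gene_name) for a
    'transcript' feature line, or None for any other line."""
    if gtf_line.startswith("#"):
        return None
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "transcript":
        return None
    attrs = dict(ATTRIBUTE.findall(fields[8]))
    transcript = attrs["transcript_id"] + "." + attrs["transcript_version"]
    gene = attrs["gene_id"] + "." + attrs["gene_version"]
    # Fall back to the gene ID when the GTF carries no gene_name.
    return transcript, gene, attrs.get("gene_name", gene)


# Illustrative line in Ensembl GTF layout (nine tab-separated columns).
example = "\t".join([
    "1", "havana", "transcript", "11869", "14409", ".", "+", ".",
    'gene_id "ENSG00000223972"; gene_version "5"; '
    'transcript_id "ENST00000456328"; transcript_version "2"; '
    'gene_name "DDX11L1";',
])
print("\t".join(transcript_to_gene(example)))
```

The versioned IDs matter because the Ensembl cDNA FASTA names its sequences with the version suffix, which is presumably why the cell above passes --use_version.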

Data

In [ ]:
mkdir data
In [ ]:
!gsutil -m cp gs://your-bucket/bamtofastq_S1_*  data
In [ ]:
mkdir output
In [ ]:
cd /home/jupyter
In [ ]:
ls -lha data

Indexing

In [ ]:
!kallisto index -i Homo_sapiens.GRCh38.cdna.all.idx -k 31 Homo_sapiens.GRCh38.cdna.all.fa.gz

Kallisto

In [ ]:
!kallisto bus -i Homo_sapiens.GRCh38.cdna.all.idx -o output -x 10xv3 -t 8  \
data/bamtofastq_S1_L005_R1_001.fastq.gz data/bamtofastq_S1_L005_R2_001.fastq.gz \
data/bamtofastq_S1_L005_R1_002.fastq.gz data/bamtofastq_S1_L005_R2_002.fastq.gz \
data/bamtofastq_S1_L005_R1_003.fastq.gz data/bamtofastq_S1_L005_R2_003.fastq.gz \
data/bamtofastq_S1_L005_R1_004.fastq.gz data/bamtofastq_S1_L005_R2_004.fastq.gz \
data/bamtofastq_S1_L005_R1_005.fastq.gz data/bamtofastq_S1_L005_R2_005.fastq.gz \
data/bamtofastq_S1_L005_R1_006.fastq.gz data/bamtofastq_S1_L005_R2_006.fastq.gz \
data/bamtofastq_S1_L005_R1_007.fastq.gz data/bamtofastq_S1_L005_R2_007.fastq.gz 
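
The command above lists the fastq files as R1/R2 pairs, one pair per chunk, and typing that by hand gets error-prone with more files. The sketch below builds the same ordering from a directory listing. The helper name paired_fastqs and the filename pattern it expects (..._R1_001.fastq.gz and ..._R2_001.fastq.gz) are assumptions based on the filenames used in this notebook.

```python
import glob
import re

FASTQ_NAME = re.compile(r"^(.*)_(R[12])_(\d+)\.fastq\.gz$")


def paired_fastqs(filenames):
    """Return filenames ordered R1, R2, R1, R2, ... with one pair per chunk."""
    pairs = {}
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:
            prefix, read, chunk = match.groups()
            pairs.setdefault((prefix, chunk), {})[read] = name
    ordered = []
    for key in sorted(pairs):
        if set(pairs[key]) != {"R1", "R2"}:
            raise ValueError("missing mate file for " + "_".join(key))
        ordered += [pairs[key]["R1"], pairs[key]["R2"]]
    return ordered


# Prints the arguments in the order the kallisto bus cell lists them.
files = paired_fastqs(glob.glob("data/bamtofastq_S1_L005_*.fastq.gz"))
print(" ".join(files))
```

The printed string can be pasted after the options of the kallisto bus command, which keeps each R1 file next to its R2 mate.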

Bustools

In [ ]:
cd /home/jupyter/output/
In [ ]:
!mkdir genecount tmp eqcount
In [ ]:
!bustools correct -w ../10xv3_whitelist.txt -o output.correct.bus output.bus
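
The idea behind whitelist correction can be shown in a few lines: a barcode that is not on the whitelist is replaced by a whitelisted barcode one mismatch away, provided that neighbour is unique. The cell below is a toy sketch of that idea under those assumptions, not the bustools implementation, and the function name correct_barcode is made up for this example.

```python
def correct_barcode(barcode, whitelist):
    """Return the whitelisted barcode for `barcode`, or None when it cannot
    be corrected unambiguously with a single substitution."""
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, original in enumerate(barcode):
        for base in "ACGT":
            if base != original:
                neighbour = barcode[:i] + base + barcode[i + 1:]
                if neighbour in whitelist:
                    candidates.add(neighbour)
    # Keep the correction only if exactly one whitelisted neighbour exists.
    return candidates.pop() if len(candidates) == 1 else None


# Tiny made-up whitelist; real 10X barcodes are longer.
toy_whitelist = {"AAAA", "CCCC"}
print(correct_barcode("AAAT", toy_whitelist))
print(correct_barcode("AACC", toy_whitelist))
```

The first call finds a single neighbour on the whitelist, while the second barcode is two mismatches from both entries and is left uncorrected.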
In [ ]:
!bustools sort -t 8 -o output.correct.sort.bus output.correct.bus
In [ ]:
!bustools text -o output.correct.sort.txt output.correct.sort.bus
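
The text dump is handy for a quick look at what the BUS file holds. The sketch below tallies records per barcode, assuming each line carries four whitespace-separated columns (barcode, UMI, equivalence class, count); that layout and the helper name records_per_barcode are assumptions for this example.

```python
import os
from collections import Counter


def records_per_barcode(lines):
    """Count BUS text records per barcode (first column of each line)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[fields[0]] += 1
    return counts


# Only runs if the text file from the cell above is present.
bus_text = "output.correct.sort.txt"
if os.path.exists(bus_text):
    with open(bus_text) as handle:
        counts = records_per_barcode(handle)
    for barcode, n in counts.most_common(5):
        print(barcode, n)
```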
In [ ]:
!bustools count -o eqcount/output -g ../transcripts_to_genes.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus
In [ ]:
!bustools count -o genecount/output -g ../transcripts_to_genes.txt -e matrix.ec -t transcripts.txt --genecounts output.correct.sort.bus
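
Once the gene counts are written, a quick per-barcode total shows whether the matrix looks sensible. This sketch reads the MatrixMarket file with the standard library only. It assumes the count step wrote genecount/output.mtx and genecount/output.barcodes.txt, and that matrix rows correspond to barcodes; check both against your own output before relying on it.

```python
import os


def row_totals(mtx_path):
    """Sum the entries of each row in a MatrixMarket coordinate file."""
    with open(mtx_path) as handle:
        # Skip the banner and any comment lines, which start with '%'.
        line = handle.readline()
        while line.startswith("%"):
            line = handle.readline()
        n_rows, _n_cols, _n_entries = (int(x) for x in line.split())
        totals = [0.0] * n_rows
        for line in handle:
            if not line.strip():
                continue
            row, _col, value = line.split()
            totals[int(row) - 1] += float(value)  # indices are 1-based
    return totals


# Only runs if the count output from the cell above is present.
if os.path.exists("genecount/output.mtx"):
    with open("genecount/output.barcodes.txt") as handle:
        barcodes = [line.strip() for line in handle]
    totals = row_totals("genecount/output.mtx")
    for total, barcode in sorted(zip(totals, barcodes), reverse=True)[:5]:
        print(barcode, total)
```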
In [ ]:
!gzip output.bus
!gzip output.correct.bus

Copying out results

In [ ]:
cd /home/jupyter
In [ ]:
!gsutil -m cp -r output gs://my-output-bucket/my-results