In this tutorial we illustrate how to analyze fungal ITS amplicon data with open-reference OTU picking against the QIIME/UNITE reference OTUs (alpha version 12_11; a link to the latest version of the reference OTUs is on the QIIME resources page). We use this approach to compare the composition of 9 soil communities.
First we initialize our environment by obtaining the necessary files.
tutorial_url = "https://s3.amazonaws.com/s3-qiime_tutorial_files/its-soils-tutorial.tgz"
reference_url = "https://github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz"
!wget $tutorial_url
!wget $reference_url
--2012-12-17 08:55:29--  https://s3.amazonaws.com/s3-qiime_tutorial_files/its-soils-tutorial.tgz
Resolving s3.amazonaws.com... 72.21.195.160
Connecting to s3.amazonaws.com|72.21.195.160|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 543458 (531K) [application/x-compressed]
Saving to: ‘its-soils-tutorial.tgz’
100%[======================================>] 543,458      464KB/s   in 1.1s
2012-12-17 08:55:31 (464 KB/s) - ‘its-soils-tutorial.tgz’ saved [543458/543458]
--2012-12-17 08:55:31--  https://github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz
Resolving github.com... 207.97.227.239
Connecting to github.com|207.97.227.239|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://cloud.github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz [following]
--2012-12-17 08:55:33--  http://cloud.github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz
Resolving cloud.github.com... 216.137.39.211, 216.137.39.177, 216.137.39.147, ...
Connecting to cloud.github.com|216.137.39.211|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24441994 (23M) [application/gzip]
Saving to: ‘its_12_11_otus.tar.gz’
100%[======================================>] 24,441,994   417KB/s   in 45s
2012-12-17 08:56:18 (532 KB/s) - ‘its_12_11_otus.tar.gz’ saved [24441994/24441994]
Now extract both archives, then decompress the gzipped reference sequences and taxonomy files they contain.
!tar -xzf its-soils-tutorial.tgz
!tar -xzf its_12_11_otus.tar.gz
!gunzip ./its_12_11_otus/rep_set/97_otus.fasta.gz
!gunzip ./its_12_11_otus/taxonomy/97_otu_taxonomy.txt.gz
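If you are working outside IPython, where the `!` shell escape is unavailable, the extraction steps can be approximated with Python's standard library. This is a minimal sketch: `extract_tgz` and `gunzip_in_place` are hypothetical helper names, and the sketch assumes the archives come from a trusted source, since `tarfile` extraction without a filter trusts the member paths stored in the archive.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir="."):
    """Extract a .tgz / .tar.gz archive, roughly like `tar -xzf`.

    Assumption: the archive is trusted; member paths are not sanitized.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def gunzip_in_place(gz_path):
    """Decompress foo.gz to foo and delete foo.gz, roughly like `gunzip`."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are not loaded into memory
    gz_path.unlink()
    return out_path
```

For example, `extract_tgz("its_12_11_otus.tar.gz")` followed by `gunzip_in_place("its_12_11_otus/rep_set/97_otus.fasta.gz")` mirrors the corresponding cells above.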
You can then view the files in each of these directories by passing the directory name to the FileLinks function.
from IPython.display import FileLinks
FileLinks('its-soils-tutorial')
The params.txt file contains many of the details of this analysis. You can review them by clicking the link or by printing the file with cat.
!cat its-soils-tutorial/params.txt
assign_taxonomy:rdp_max_memory 8000
pick_otus:enable_rev_strand_match True
pick_otus:max_accepts 1
pick_otus:max_rejects 8
pick_otus:stepwords 8
pick_otus:word_length 8
assign_taxonomy:id_to_taxonomy_fp its_12_11_otus/taxonomy/97_otu_taxonomy.txt
assign_taxonomy:reference_seqs_fp its_12_11_otus/rep_set/97_otus.fasta
beta_diversity:metrics bray_curtis
The parameters that distinguish ITS analysis from analysis of other amplicons are the two assign_taxonomy parameters, which point to the reference collection we just downloaded, and the beta_diversity parameter, which specifies that communities are compared with the non-phylogenetic Bray-Curtis metric. We use Bray-Curtis here because we don't currently have a phylogenetic tree relating these OTUs.
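To see which settings apply to which script, the parameters file can be grouped by script name. This is a minimal sketch assuming each line has the form `script:parameter value`, as in the output above; `parse_qiime_params` is a hypothetical helper, not part of QIIME, and skipping `#` lines is an assumption of this sketch.

```python
from collections import defaultdict


def parse_qiime_params(lines):
    """Group 'script:parameter value' lines into {script: {parameter: value}}.

    Assumed format: key and value separated by whitespace; values kept as strings.
    """
    params = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        script, _, param = key.partition(":")
        params[script][param] = value
    return dict(params)
```

With `open("its-soils-tutorial/params.txt")` passed in as `lines`, `params["assign_taxonomy"]` would hold the two reference-collection paths and `params["beta_diversity"]["metrics"]` would be `"bray_curtis"`.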
Set up paths for the analysis steps.
input_seqs_fp = "its-soils-tutorial/seqs.fna"
reference_seqs_fp = "its_12_11_otus/rep_set/97_otus.fasta"
output_dir = "ucrss_fast/"
parameters_fp = "its-soils-tutorial/params.txt"
mapping_fp = "its-soils-tutorial/map.txt"
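We're now ready to run the pick_open_reference_otus.py workflow. You can find a description of this process here. Because we have no phylogenetic tree for these OTUs, we pass --suppress_align_and_tree to skip the sequence alignment and tree-building steps.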
!pick_open_reference_otus.py -i $input_seqs_fp -r $reference_seqs_fp -o $output_dir -p $parameters_fp --suppress_align_and_tree
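The `!` cell above is IPython shell syntax. In a plain Python script, one way to issue the same call is to build the argument list and pass it to `subprocess.run`. The helper names below are hypothetical; the flags are exactly those used in the cell above, and the sketch assumes `pick_open_reference_otus.py` is on your `PATH`.

```python
import subprocess


def build_open_ref_command(input_seqs_fp, reference_seqs_fp, output_dir,
                           parameters_fp, suppress_align_and_tree=True):
    """Assemble the same pick_open_reference_otus.py call as the cell above."""
    cmd = ["pick_open_reference_otus.py",
           "-i", input_seqs_fp,
           "-r", reference_seqs_fp,
           "-o", output_dir,
           "-p", parameters_fp]
    if suppress_align_and_tree:
        # skip alignment and tree building; no tree is used downstream here
        cmd.append("--suppress_align_and_tree")
    return cmd


def run_command(cmd):
    """Run a command list; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting issues if any path contains spaces.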
After that completes (it will take a few minutes), we'll have an OTU table with taxonomy assignments. You can review all of the files the workflow creates by passing the path of the output directory to the FileLinks function.
FileLinks("ucrss_fast/")
You can then pass the OTU table to print_biom_table_summary.py to view a summary of the information it contains.
otu_table_fp = "ucrss_fast/otu_table_mc2_w_tax.biom"
!print_biom_table_summary.py -i $otu_table_fp
Num samples: 9
Num otus: 1092
Num observations (sequences): 15927.0
Table density (fraction of non-zero values): 0.1452
Seqs/sample summary:
 Min: 287.0
 Max: 3787.0
 Median: 1257.0
 Mean: 1769.66666667
 Std. dev.: 1174.29875056
 Median Absolute Deviation: 859.0
 Default even sampling depth in core_qiime_analyses.py (just a suggestion): 287.0
Sample Metadata Categories: None provided
Observation Metadata Categories: taxonomy
Seqs/sample detail:
 LQ2: 287.0
 SV1: 655.0
 CL4: 875.0
 AR3: 1196.0
 MD4: 1257.0
 PE5: 2116.0
 KP4: 2193.0
 PE6: 3561.0
 SF1: 3787.0
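The summary statistics above follow directly from the per-sample sequence counts. This sketch recomputes them from the "Seqs/sample detail" values with Python's `statistics` module; `summarize_depths` is a hypothetical helper. The standard deviation in the output matches the population form (dividing by n rather than n - 1), which is what `pstdev` computes.

```python
import statistics

# Sequences per sample, copied from the "Seqs/sample detail" output above
seqs_per_sample = {
    "LQ2": 287, "SV1": 655, "CL4": 875, "AR3": 1196, "MD4": 1257,
    "PE5": 2116, "KP4": 2193, "PE6": 3561, "SF1": 3787,
}


def summarize_depths(depths):
    """Recompute the Seqs/sample summary from per-sample counts."""
    counts = list(depths.values())
    median = statistics.median(counts)
    return {
        "total": sum(counts),
        "min": min(counts),
        "max": max(counts),
        "median": median,
        "mean": statistics.mean(counts),
        # population standard deviation (divide by n, not n - 1)
        "std_dev": statistics.pstdev(counts),
        # median absolute deviation from the median
        "mad": statistics.median(abs(c - median) for c in counts),
    }
```

The minimum (287, from LQ2) is also the suggested even sampling depth: it is the largest depth at which no sample has too few sequences to be subsampled.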
Next we'll run a few representative analyses on the OTU table. First we'll compute beta diversity and generate PCoA plots, and then we'll generate taxonomic summaries of the samples. Note that for the beta diversity analysis we choose an even sampling depth of 287 based on the results of print_biom_table_summary.py: it is the smallest number of sequences in any sample (LQ2), so every sample is retained after subsampling.
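Even sampling means randomly drawing the same number of sequences from every sample, so that differences in sequencing depth do not masquerade as differences in composition. The sketch below illustrates the idea of subsampling without replacement; it is not QIIME's implementation, and `rarefy` and the OTU ids in the example are hypothetical. Returning `None` for samples below the depth mirrors the usual practice of excluding them.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Randomly subsample `depth` sequences from one sample, without replacement.

    otu_counts: {otu_id: sequence count}. Returns a Counter of the subsampled
    counts, or None if the sample has fewer than `depth` sequences.
    """
    total = sum(otu_counts.values())
    if total < depth:
        return None  # too few sequences to subsample to this depth
    # one entry per sequence, labelled with the OTU it belongs to
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return Counter(rng.sample(pool, depth))
```

For instance, `rarefy({"OTU_a": 200, "OTU_b": 87, "OTU_c": 13}, 287, seed=1)` returns counts summing to 287, none exceeding the original counts.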
bdiv_output_dir = "ucrss_fast/bdiv_even287/"
!beta_diversity_through_plots.py -i $otu_table_fp -o $bdiv_output_dir -e 287 -p $parameters_fp -m $mapping_fp
FileLinks("ucrss_fast/bdiv_even287/")
The primary files of interest here are the 2D and 3D PCoA plots. You can view them by clicking the corresponding links.
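The PCoA plots are built from a matrix of pairwise Bray-Curtis dissimilarities, the metric selected in params.txt. The standard definition for two samples with OTU counts x and y is the sum of |x_i - y_i| divided by the sum of (x_i + y_i): 0 for identical profiles and 1 for samples sharing no OTUs. This is a minimal sketch of that formula; QIIME's own implementation may handle edge cases differently, and the helper names are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length OTU count vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    denom = sum(a + b for a, b in zip(x, y))
    if denom == 0:
        raise ValueError("undefined for two all-zero samples")
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


def pairwise_bray_curtis(samples):
    """Full distance matrix as nested dicts: {sample: {sample: distance}}."""
    names = sorted(samples)
    return {a: {b: bray_curtis(samples[a], samples[b]) for b in names}
            for a in names}
```

After even sampling to 287 sequences, every sample has the same total, so the denominator is always 2 × 287 = 574 and the dissimilarity depends only on how the counts are distributed across OTUs.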
taxa_output_dir = 'ucrss_fast/taxa_plots/'
!summarize_taxa_through_plots.py -i $otu_table_fp -o $taxa_output_dir
FileLinks('ucrss_fast/taxa_plots/')
Here the most interesting file is bar_charts.html. Click it to open it in a new browser window or tab.
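The bar charts show, for each sample, the relative abundance of taxa after OTUs sharing the same taxonomy down to a given rank are merged. The sketch below illustrates that collapsing step for a single sample. It assumes semicolon-separated taxonomy strings such as `k__Fungi;p__Ascomycota`; `summarize_taxa` is a hypothetical helper, and the example OTU ids and counts are invented for illustration.

```python
from collections import defaultdict


def summarize_taxa(otu_counts, otu_taxonomy, level):
    """Relative abundance of taxa, collapsed to the first `level` ranks.

    otu_counts: {otu_id: count} for one sample.
    otu_taxonomy: {otu_id: "rank1;rank2;..."} (assumed format).
    OTUs missing from otu_taxonomy are grouped as "Unassigned".
    """
    totals = defaultdict(float)
    for otu, count in otu_counts.items():
        ranks = [r.strip() for r in otu_taxonomy.get(otu, "Unassigned").split(";")]
        totals[";".join(ranks[:level])] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: n / grand_total for taxon, n in totals.items()}
```

For example, with counts `{"o1": 30, "o2": 10, "o3": 60}` where o1 and o2 are Ascomycota and o3 is Basidiomycota, collapsing at level 2 gives 0.4 for `k__Fungi;p__Ascomycota` and 0.6 for `k__Fungi;p__Basidiomycota`.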