Fungal ITS QIIME analysis tutorial

In this tutorial we illustrate steps for analyzing fungal ITS amplicon data using the QIIME/UNITE reference OTUs (alpha version 12_11) to compare the composition of 9 soil communities using open-reference OTU picking. More recent ITS reference databases based on UNITE are available on the QIIME resources page. The steps in this tutorial can be generalized to work with other marker genes, such as 18S.

We recommend working through the Illumina Overview Tutorial before this one, as it provides more detailed annotation of the steps in a QIIME analysis. This tutorial is intended to highlight the changes that are necessary to work with a database other than QIIME's default reference database. For ITS, we won't build a phylogenetic tree, so we use nonphylogenetic diversity metrics. Instructions are included for how to build a phylogenetic tree if you're sequencing a non-16S, phylogenetically informative marker gene (e.g., 18S).

First, we obtain the tutorial data and reference database:

In [ ]:
!(wget || curl -O
!(wget ||  curl -O

Now unzip these files.

In [ ]:
!tar -xzf its-soils-tutorial.tgz
!tar -xzf its_12_11_otus.tgz
!gunzip ./its_12_11_otus/rep_set/97_otus.fasta.gz
!gunzip ./its_12_11_otus/taxonomy/97_otu_taxonomy.txt.gz
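
If tar or gunzip is not available on your system, the same extraction can be sketched with Python's standard library. This is an illustrative alternative rather than part of the original workflow, and the helper names extract_tgz and gunzip_file are hypothetical.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive into dest_dir (like `tar -xzf`)."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)


def gunzip_file(gz_path):
    """Decompress gz_path next to itself and delete the .gz file (like `gunzip`)."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    gz_path.unlink()
    return out_path


# Usage mirroring the shell commands above (run after downloading the files):
# extract_tgz("its-soils-tutorial.tgz")
# extract_tgz("its_12_11_otus.tgz")
# gunzip_file("its_12_11_otus/rep_set/97_otus.fasta.gz")
# gunzip_file("its_12_11_otus/taxonomy/97_otu_taxonomy.txt.gz")
```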

You can then view the files in each of these directories by passing the directory name to the FileLinks function.

In [ ]:
from IPython.display import FileLink, FileLinks

The params.txt file modifies some of the default parameters of this analysis. You can review those by clicking the link or by catting the file.

In [ ]:
!cat its-soils-tutorial/params.txt

The parameters that differentiate ITS analysis from analysis of other amplicons are the two assign_taxonomy parameters, which point to the reference collection that we just downloaded.

We're now ready to run the workflow. Discussion of these methods can be found in Rideout et al. (2014).

Note that we pass -r to specify a non-default reference database. We're also passing --suppress_align_and_tree because we know that trees generated from ITS sequences are generally not phylogenetically informative.

In [ ]:
! -i its-soils-tutorial/seqs.fna -r its_12_11_otus/rep_set/97_otus.fasta -o otus/ -p its-soils-tutorial/params.txt --suppress_align_and_tree

Note: If you would like to build a phylogenetic tree (e.g., if you're using a phylogenetically informative marker gene such as 18S instead of ITS), you should remove the --suppress_align_and_tree parameter from the above command and add the following lines to the parameters file:

align_seqs:template_fp <path to reference alignment>
filter_alignment:suppress_lane_mask_filter True
filter_alignment:entropy_threshold 0.10
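
Each entry in the parameters file has the form script_name:option followed by whitespace and a value. To make that layout concrete, here is a minimal sketch of reading such lines into a nested dictionary. It is not QIIME's own parser, and the function name parse_qiime_params is hypothetical; values are kept as plain strings.

```python
from collections import defaultdict


def parse_qiime_params(lines):
    """Parse 'script:option value' lines into {script: {option: value}}."""
    params = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)  # key, then the rest of the line as the value
        key = fields[0]
        value = fields[1].strip() if len(fields) > 1 else None
        script, sep, option = key.partition(":")
        if not sep or not option:
            raise ValueError("expected 'script:option value', got: %r" % line)
        params[script][option] = value
    return dict(params)


# The three lines shown above, used verbatim as example input.
example_lines = [
    "align_seqs:template_fp <path to reference alignment>",
    "filter_alignment:suppress_lane_mask_filter True",
    "filter_alignment:entropy_threshold 0.10",
]

for script, options in parse_qiime_params(example_lines).items():
    print(script, options)
```

Grouping by script name shows which step of the workflow each setting is passed to: one option for align_seqs and two for filter_alignment.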

After that completes (it will take a few minutes) we'll have the OTU table with taxonomy. You can review all of the files that are created by passing the path to the index.html file in the output directory to the FileLink function.

In [ ]:

You can then pass the OTU table to biom summarize-table to view a summary of the information in the OTU table.

In [ ]:
!biom summarize-table -i otus/otu_table_mc2_w_tax.biom

Next, we run several core diversity analyses, including alpha/beta diversity and taxonomic summarization. We will use an even sampling depth of 353 based on the results of biom summarize-table above. Since we did not build a phylogenetic tree, we'll pass the --nonphylogenetic_diversity flag, which computes Bray-Curtis distances instead of UniFrac distances and uses only nonphylogenetic alpha diversity metrics.

In [ ]:
! -i otus/otu_table_mc2_w_tax.biom -o cdout/ -m its-soils-tutorial/map.txt -e 353 --nonphylogenetic_diversity

You may see a warning issued above; this is safe to ignore.

Note: If you built a phylogenetic tree, you should pass the path to that tree via -t and not pass --nonphylogenetic_diversity.
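
To illustrate what an even sampling depth and Bray-Curtis distance mean, the sketch below subsamples toy samples to a fixed depth without replacement and then computes the Bray-Curtis dissimilarity between two of them. This is a simplified pure-Python illustration, not QIIME's implementation; the function names, OTU names, counts, and the toy depth of 10 are all invented for the example.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample {otu: count} to exactly `depth` sequences without replacement.

    Returns None when the sample has fewer than `depth` sequences, standing in
    for a sample that would be excluded at that depth.
    """
    if sum(counts.values()) < depth:
        return None
    # One list entry per observed sequence, labeled by its OTU.
    pool = [otu for otu, n in sorted(counts.items()) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return numerator / denominator if denominator else 0.0


# Hypothetical toy samples (not taken from the tutorial data).
soil_a = {"otu1": 6, "otu2": 4}
soil_b = {"otu1": 2, "otu3": 8}
soil_c = {"otu1": 5}

depth = 10
even_a, even_b = rarefy(soil_a, depth), rarefy(soil_b, depth)
print(bray_curtis(even_a, even_b))  # (4 + 4 + 8) / 20 = 0.8
print(rarefy(soil_c, depth))        # None: too few sequences for this depth
```

Because Bray-Curtis uses only the per-OTU counts, it needs no tree, which is why it is a suitable replacement for UniFrac when the marker gene is not phylogenetically informative.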

You can view the output of this analysis using FileLink.

In [ ]:

Precomputed results

In case you're having trouble running the steps above, for example because of a broken QIIME installation, all of the output generated above has been precomputed. You can access this by running the cell below.

In [ ]: