In this assignment, you'll be using QIIME 2 version 2019.1 to analyze two data sets. First, you'll read and complete the QIIME 2 Moving Pictures of the Human Microbiome tutorial, which contains a few thousand sequences from four different human body sites collected over a few days. This will familiarize you with running QIIME 2 through the Jupyter Notebook, and interpreting QIIME 2 output. You'll then adapt the commands from that tutorial to analyze a real-world data set, where human-associated microbial communities were shown to have forensic potential, potentially allowing investigators to determine who touched an object based on the "microbial fingerprint" they leave behind.

Before starting this assignment, you should read Fierer et al (2010), where this forensic study was initially published. This will help you understand the study. Then, as you work through the assignment, any time you use a QIIME 2 command you should either look up that command in the QIIME 2 plugin index, or call it with its --help parameter to understand what it does and how to use it.

IMPORTANT: This assignment includes some steps that will take some time to run (possibly up to 20 minutes). You should avoid starting this assignment close to the deadline as the server may be slow at that time if a lot of people are running their analyses at the last minute. You are responsible for starting this in a timely manner! If your assignment is late because it did not complete in time, that is your fault, not ours!

Tips for completing this assignment

Running bash shell commands in the Jupyter Notebook with a Python Kernel

The QIIME 2 Moving Pictures of the Human Microbiome illustrates how to use QIIME 2 through the command line interface, or in other words using bash shell commands. To run the commands in the Jupyter Notebook, you can preface each command with an !.

For example, to run the bash shell command ls through the Jupyter Notebook, you would do the following:

In [ ]:
!ls

Running bash shell commands in the Jupyter Notebook with a Bash Kernel

To run the bash shell command ls through the Jupyter Notebook in a bash kernel, you would simply use the bash commands as normal.

In [ ]:
ls

Note you will be working within a file system and it is important that you understand how to navigate through that file system.

For example, you need to be familiar with the bash shell commands ls, cd, and pwd. Use the cells below to be sure you understand how to navigate around the file system and ensure you are in the right place to run your code.

pwd shows you your current working directory.

In [ ]:
pwd

ls shows you what files and directories are in your current working directory.

In [ ]:
ls

cd allows you to change to a directory. cd .. allows you to change out of the directory that you're currently in to the directory that contains the directory that you're currently in. In the cells below use ls to find a directory. (If you don't have one, run mkdir test-dir to create a new directory called test-dir.) cd into that directory, run ls to view the files and directories it contains (if any), and then return to the original directory using cd ... Finally, run pwd to show that you are back where you started.

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

QIIME 2 Artifact API

If you're comfortable with Python 3 programming, you may want to use the QIIME 2 Aritfact API instead of the command line interface for running QIIME 2 in the Jupyter Notebook. This is a bit underdocumented at the moment however, so if you're not comfortable with Python 3 you should stick with running the bash shell commands for this assignment.

Downloading files from the Jupyer server using sftp

You can use an sftp client to connect to the Jupyter server. We recommend using Cyberduck, which is free and works well on macOS, Windows, and Linux. However, if you have an sftp client that you're already comfortable with, it should work fine for this.

Interacting with QIIME 2 results

QIIME 2 results are always in .qza or .qzv files, which you can learn about by reading QIIME 2 core concepts. You can view .qza and .qzv files at view.qiime2.org. To do this, you'll:

  1. download a .qza or .qzv file using either the scp command or an sftp client;
  2. go to view.qiime2.org in your web browser;
  3. and then drag and drop the .qza or .qzv file from your computer to the view.qiime2.org.

Part 1: Running the Moving Pictures of the Human Microbiome tutorial

Start reading the QIIME 2 Moving Pictures of the Human Microbiome, which will introduce you to using QIIME 2, and will teach you about the commands that you'll need to complete parts 2 and 3 of this assignment. Begin by running the commands in the Sample metadata section of the tutorial in this Jupyer Notebook.

When downloading files, you should use either the wget or curl options.

The first few commands will look like the following.

First, you'll download the sample metadata using the wget command. Then, view the resulting file by downloading it from the Jupyter server.

Before downloading, make sure you understand where you are on the file system because that is where files will be downloaded to. Use the pwd command to find your current working directory.

In [ ]:
pwd
In [ ]:
wget -O "sample-metadata.tsv" "https://data.qiime2.org/2019.1/tutorials/moving-pictures/sample_metadata.tsv"

Next, create a directory and download the two sequence data files to that directory.

In [ ]:
mkdir emp-single-end-sequences
wget -O "emp-single-end-sequences/barcodes.fastq.gz" "https://data.qiime2.org/2019.1/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz"
wget -O "emp-single-end-sequences/sequences.fastq.gz" "https://data.qiime2.org/2019.1/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz"

Now, we'll load the QIIME 2 program so we can run the QIIME 2 commands.

Note: If you are disconnected from the notebook, you will need to rerun this command and change to the directory that you're working in to continue your analyses.

In [ ]:
module load qiime2/2019.1

You can now run your first QIIME 2 command, which will import the sequence data that we just downloaded into a QIIME 2 artifact.

In [ ]:
qiime tools import \
  --type EMPSingleEndSequences \
  --input-path emp-single-end-sequences \
  --output-path emp-single-end-sequences.qza

Try downloading the resulting .qza file, and then uploading it to view.qiime2.org. You'll primarily use view.qiime2.org to view QIIME 2 visualizations (i.e., .qzv files), but take a minute to look at the Provenance tab for the emp-single-end-sequences.qza artifact on view.qiime2.org. This provides information on all of the QIIME 2 commands that were run to get to this artifact, as well as some additional information. For example, you can see how long each command took to run. How long did the import command take to run?

Work through the rest of the Moving Pictures of the Human Microbiome tutorial in this notebook. As you go, answer the questions that are presented in the tutorial. (You don't need to turn those answers in, but spending time on them now will make parts 2 and 3 of this assignment simple.)

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

Part 2: Microbiome-based forensics

Next, you'll adapt the commands above to perform another analysis and answer some questions. The commands will be largely the same as above, but adapted to the following data files.

Note: you will not need to run the bioenv or ancom steps of the Moving Pictures Tutorial to answer the questions for parts 2 or 3 of this assignment.

You should begin by creating a new directory for your new data set. It is reccommended that you name this directory forensic to set it apart from your moving pictures data.

In [ ]:
mkdir forensic

To work with files that you create in the forensic directory, you'll generally want to be in that directory. If you get error messages indicating things like "no such file or directory", or "file not found", you are most likely in the wrong directory.

In [ ]:
cd forensic

Use these commands to get the appropriate files for your analysis.

In [ ]:
wget -O "forensic-rep-seqs.qza" https://github.com/gregcaporaso/2017.04-bio450-qiime2-assignment/blob/master/rep-seqs.qza?raw=true
In [ ]:
wget -O "forensic-table.qza" https://github.com/gregcaporaso/2017.04-bio450-qiime2-assignment/blob/master/table.qza?raw=true
In [ ]:
wget -O "forensic-sample-metadata.tsv" https://raw.githubusercontent.com/gregcaporaso/2017.04-bio450-qiime2-assignment/master/sample-metadata.tsv
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

You will need a different trained feature classifier to assign taxonomy to the forensic data. Use the command below to download that classifier.

Note you must change the classifer .qza file that is being provided to the qiime feature-classifier classify command to be the classifier that you are downloading here.

In [ ]:
wget -O "gg-13-8-99-full-length-nb-classifier.qza" "https://data.qiime2.org/2019.1/common/gg-13-8-99-nb-classifier.qza"
In [ ]:
 
In [ ]:

Part 2: Questions

Answer all of the questions below in the notebook, and turn in a copy of your notebook along with the qzv files that support each answer. In each answer, mention which qzv file(s) that you've turned in support your answer. All of these questions refer to the forensic data set, not the moving pictures data set!

Question 1: What was the minimum number of sequences per sample? What was the maximum number of sequences per sample? What even sampling depth did you choose when running qiime diversity core-metrics, and why?

Answer Question 1 in this cell.

Question 2: How long did qiime alignment mafft take to run (to the microsecond)? Review any of the produced artifacts' or visualizations' Provenace to find that information. How long did qiime phylogeny fasttree take to run (to the microsecond)?

Answer Question 2 in this cell.

Question 3: The focus of the Fierer 2010 paper was to show that it is possible to match an individual to the objects they touch based on the microbial communities that the individual leaves behind. Based on your analysis results, match the individuals to the keyboard they touched, and explain how you came to this answer. There is one right answer to this question, and you should support your answer with references to more than one .qzv file.

Answer Question 3 in this cell.

To turn in:

  • All .qzv files referenced in your answers to Part 2: Questions.
  • This Jupyter Notebook containing containing the full list of commands that you ran to generate the above data, and answers to the questions in Part 2, noting any problems that you ran into along the way.

Part 3: Interpreting and reporting microbiome results

This section will focus on working with the results generated by the commands that you ran above.

For this section, you will write a 2.5 to 3 page paper describing your analysis. Your paper should not be any shorter than 2.5 pages nor any longer than 3 pages. It must have 1.5 line spacing, 1.25" margins, and be written in 12 point Times New Roman font. Figures and tables are included in the page count, though the total space taken by these should be a maximum of one page.

Write this as if you’re submitting to a journal, so your paper should contain:

  • A title and list of authors (just you, in this case). This should be in no more than 14 point font, and follow the margin/spacing requirements listed above.
  • An Introduction section describing the goals of the study (you can refer to Fierer et al (2010));
  • A Methods section containing a brief description of your bioinformatics methods (e.g., what version of QIIME, what commands were used), and how the data was generated (e.g., sequencing platform);
  • And a Results and Conclusions section describing the results of your analysis.

You should address several specific questions in your Results and Conclusions:

  • Which keyboard belonged to each of the human subjects? Back this up with figures and statistics that support your answer. Describe what each of your figures tells you.
  • Which bacterial phyla appear to be most useful for matching each individual to their keyboard? Refer to Taxonomic barplots to address this.
  • Which features were most different in abundance across the subjects? Refer to the Taxonomic barplots to address this. Present a table containing the most differentially abundant features and their taxonomy.
  • Which subject had the most features (i.e., the highest microbial community richness) in their skin microbiome, on average? Was the number of features across subjects significantly different?