In this assignment, you'll use QIIME 2 version 2019.1 to analyze two data sets. First, you'll read and complete the QIIME 2 Moving Pictures of the Human Microbiome tutorial, whose data set contains a few thousand sequences from four different human body sites collected over a few days. This will familiarize you with running QIIME 2 through the Jupyter Notebook and interpreting QIIME 2 output. You'll then adapt the commands from that tutorial to analyze a real-world data set from a study showing that human-associated microbial communities have forensic potential: investigators may be able to determine who touched an object based on the "microbial fingerprint" that person leaves behind.
Before starting this assignment, you should read Fierer et al. (2010), where this forensic study was initially published. This will help you understand the study. Then, as you work through the assignment, any time you use a QIIME 2 command you should either look up that command in the QIIME 2 plugin index or call it with its --help parameter to understand what it does and how to use it.
IMPORTANT: This assignment includes some steps that will take some time to run (possibly up to 20 minutes). You should avoid starting this assignment close to the deadline as the server may be slow at that time if a lot of people are running their analyses at the last minute. You are responsible for starting this in a timely manner! If your assignment is late because it did not complete in time, that is your fault, not ours!
The QIIME 2 Moving Pictures of the Human Microbiome tutorial illustrates how to use QIIME 2 through the command line interface, in other words using bash shell commands. To run the commands in the Jupyter Notebook, you can preface each command with an exclamation point (!).
For example, to run the bash shell command ls through the Jupyter Notebook, you would do the following:
!ls
To run the bash shell command ls through the Jupyter Notebook in a bash kernel, you would simply use the bash commands as normal.
For example, you need to be familiar with the bash shell commands ls, cd, and pwd. Use the cells below to be sure you understand how to navigate around the file system and ensure you are in the right place to run your code.
pwd shows you your current working directory.
ls shows you what files and directories are in your current working directory.
cd allows you to change to a directory.
cd .. allows you to move out of the directory that you're currently in and into the directory that contains it.
In the cells below, use ls to find a directory. (If you don't have one, run mkdir test-dir to create a new directory called test-dir.) Then cd into that directory, run ls to view the files and directories it contains (if any), and then return to the original directory using cd .. (with two dots). Finally, run pwd to show that you are back where you started.
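The navigation exercise above can be sketched as one sequence of commands. This is only an illustration, assuming you start in a directory where you are allowed to create test-dir; start_dir is a hypothetical variable name used here to remember the starting point.

```shell
# Remember the starting directory (start_dir is an illustrative name).
start_dir="$(pwd)"
echo "Starting in: $start_dir"

# Create test-dir if it does not already exist (-p avoids an error if it does).
mkdir -p test-dir

# Move into the directory and list its contents (empty if it was just created).
cd test-dir
ls

# Return to the containing directory and confirm the location.
cd ..
pwd
```

Note that in a Python kernel each command prefaced with ! runs in its own temporary shell, so !cd does not change the notebook's working directory for later cells; the %cd magic does.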
If you're comfortable with Python 3 programming, you may want to use the QIIME 2 Artifact API instead of the command line interface for running QIIME 2 in the Jupyter Notebook. This is a bit underdocumented at the moment, however, so if you're not comfortable with Python 3 you should stick with running the bash shell commands for this assignment.
You can use an sftp client to connect to the Jupyter server. We recommend using Cyberduck, which is free and works well on macOS, Windows, and Linux. However, if you have an sftp client that you're already comfortable with, it should work fine for this.
Start reading the QIIME 2 Moving Pictures of the Human Microbiome tutorial, which will introduce you to using QIIME 2 and teach you about the commands that you'll need to complete parts 2 and 3 of this assignment. Begin by running the commands in the Sample metadata section of the tutorial in this Jupyter Notebook.
When downloading files, you should use either the
The first few commands will look like the following.
First, you'll download the sample metadata using the wget command. Then, view the resulting file by downloading it from the Jupyter server.
Before downloading, make sure you understand where you are on the file system, because that is where files will be downloaded to. Use the pwd command to find your current working directory.
wget -O "sample-metadata.tsv" "https://data.qiime2.org/2019.1/tutorials/moving-pictures/sample_metadata.tsv"
Next, create a directory and download the two sequence data files to that directory.
mkdir emp-single-end-sequences
wget -O "emp-single-end-sequences/barcodes.fastq.gz" "https://data.qiime2.org/2019.1/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz"
wget -O "emp-single-end-sequences/sequences.fastq.gz" "https://data.qiime2.org/2019.1/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz"
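Before moving on, it can help to confirm that each download produced a non-empty file in the location you expect. The following is a minimal sketch and not part of the tutorial; check_download is a hypothetical helper name, and the file paths are the ones used in the wget commands above.

```shell
# check_download is a hypothetical helper: it succeeds only if the file
# exists and is non-empty (the -s test).
check_download() {
  if [ -s "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

for f in \
  sample-metadata.tsv \
  emp-single-end-sequences/barcodes.fastq.gz \
  emp-single-end-sequences/sequences.fastq.gz
do
  # Keep going so that every file is reported, even if one is missing.
  check_download "$f" || true
done
```

If a file is reported as missing, run pwd to check that you are in the directory where the download commands were run.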
Now, we'll load the QIIME 2 program so we can run the QIIME 2 commands.
Note: If you are disconnected from the notebook, you will need to rerun this command and change to the directory that you're working in to continue your analyses.
module load qiime2/2019.1
You can now run your first QIIME 2 command, which will import the sequence data that we just downloaded into a QIIME 2 artifact.
qiime tools import \
  --type EMPSingleEndSequences \
  --input-path emp-single-end-sequences \
  --output-path emp-single-end-sequences.qza
Try downloading the resulting .qza file, and then uploading it to view.qiime2.org. You'll primarily use view.qiime2.org to view QIIME 2 visualizations (i.e., .qzv files), but take a minute to look at the Provenance tab for the emp-single-end-sequences.qza artifact on view.qiime2.org. This provides information on all of the QIIME 2 commands that were run to get to this artifact, as well as some additional information. For example, you can see how long each command took to run. How long did the import command take to run?
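The same provenance information can also be inspected from the shell, because a .qza file is a zip archive. The sketch below assumes that the archive contains a provenance/action/action.yaml member with a runtime entry recording the start, end, and duration of the command, and that unzip is installed on the server; show_runtime is a hypothetical helper name. If the layout differs in your version, the Provenance tab on view.qiime2.org remains the reference.

```shell
# show_runtime is a hypothetical helper that prints the runtime entry
# recorded in an artifact's provenance.
show_runtime() {
  artifact="$1"
  if [ ! -f "$artifact" ]; then
    echo "No such artifact: $artifact" >&2
    return 1
  fi
  # -p streams the matching archive member to stdout. The leading * stands in
  # for the top-level directory inside the archive (assumed to be named after
  # the artifact's UUID).
  unzip -p "$artifact" '*/provenance/action/action.yaml' | grep -A 3 'runtime:'
}

show_runtime emp-single-end-sequences.qza || echo "Run the import step first, then try again."
```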
Work through the rest of the Moving Pictures of the Human Microbiome tutorial in this notebook. As you go, answer the questions that are presented in the tutorial. (You don't need to turn those answers in, but spending time on them now will make parts 2 and 3 of this assignment simple.)
Next, you'll adapt the commands above to perform another analysis and answer some questions. The commands will be largely the same as above, but adapted to the following data files.
Note: you will not need to run the ancom steps of the Moving Pictures tutorial to answer the questions for parts 2 or 3 of this assignment.
You should begin by creating a new directory for your new data set. It is recommended that you name this directory forensic to set it apart from your moving pictures data.
To work with files that you create in the forensic directory, you'll generally want to be in that directory. If you get error messages such as "no such file or directory" or "file not found", you are most likely in the wrong directory.
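As a minimal sketch of that setup, assuming you start from the directory that should contain the new forensic directory, the commands below create it, move into it, and confirm the location before any analysis commands are run.

```shell
# Create the directory if needed (-p avoids an error if it already exists)
# and move into it.
mkdir -p forensic
cd forensic

# Confirm the location before running commands that read or write files.
case "$(pwd)" in
  */forensic) echo "OK: working in $(pwd)" ;;
  *)          echo "Not in the forensic directory: $(pwd)" >&2 ;;
esac
```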
Use these commands to get the appropriate files for your analysis.
wget -O "forensic-rep-seqs.qza" "https://github.com/gregcaporaso/2017.04-bio450-qiime2-assignment/blob/master/rep-seqs.qza?raw=true"
wget -O "forensic-table.qza" "https://github.com/gregcaporaso/2017.04-bio450-qiime2-assignment/blob/master/table.qza?raw=true"
wget -O "forensic-sample-metadata.tsv" "https://raw.githubusercontent.com/gregcaporaso/2017.04-bio450-qiime2-assignment/master/sample-metadata.tsv"
You will need a different trained feature classifier to assign taxonomy to the forensic data. Use the command below to download that classifier.
Note that you must change the classifier .qza file that is provided to the qiime feature-classifier classify-sklearn command to be the classifier that you are downloading here.
wget -O "gg-13-8-99-full-length-nb-classifier.qza" "https://data.qiime2.org/2019.1/common/gg-13-8-99-nb-classifier.qza"
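As a hedged sketch of the change described above, the taxonomy-assignment step might look like the following for the forensic data. The input file names are taken from the wget commands above; forensic-taxonomy.qza is a hypothetical output name, and the guard around the command is an addition for illustration that skips the step when QIIME 2 is not loaded or the inputs are missing.

```shell
# Input names come from the download commands above; the output name
# forensic-taxonomy.qza is an assumption made for this example.
classifier="gg-13-8-99-full-length-nb-classifier.qza"
reads="forensic-rep-seqs.qza"
taxonomy="forensic-taxonomy.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$classifier" ] && [ -f "$reads" ]; then
  # Assign taxonomy to the representative sequences with the downloaded classifier.
  qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$reads" \
    --o-classification "$taxonomy"
else
  echo "Load qiime2 and cd into the directory holding $classifier and $reads first." >&2
fi
```

Run the command with --help to check its parameters before relying on this sketch.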
Answer all of the questions below in the notebook, and turn in a copy of your notebook along with the .qzv files that support each answer. In each answer, mention which of the .qzv file(s) that you've turned in support that answer. All of these questions refer to the forensic data set, not the moving pictures data set!
Question 1: What was the minimum number of sequences per sample? What was the maximum number of sequences per sample? What even sampling depth did you choose when running qiime diversity core-metrics, and why?
Answer Question 1 in this cell.
Question 2: How long did qiime alignment mafft take to run (to the microsecond)? Review the Provenance of any of the produced artifacts or visualizations to find that information. How long did qiime phylogeny fasttree take to run (to the microsecond)?
Answer Question 2 in this cell.
Question 3: The focus of the Fierer 2010 paper was to show that it is possible to match an individual to the objects they touch based on the microbial communities that the individual leaves behind. Based on your analysis results, match the individuals to the keyboard they touched, and explain how you came to this answer. There is one right answer to this question, and you should support your answer with references to more than one
Answer Question 3 in this cell.
To turn in:
.qzv files referenced in your answers to Part 2: Questions.
This section will focus on working with the results generated by the commands that you ran above.
For this section, you will write a 2.5 to 3 page paper describing your analysis. Your paper should not be any shorter than 2.5 pages nor any longer than 3 pages. It must have 1.5 line spacing, 1.25" margins, and be written in 12 point Times New Roman font. Figures and tables are included in the page count, though the total space taken by these should be a maximum of one page.
Write this as if you’re submitting to a journal, so your paper should contain:
You should address several specific questions in your Results and Conclusions: