This notebook is for the analysis of 15 16S PCR libraries that were produced from DNA extracted from meals, collected by blending a plate of food in the blender. The meals were prepared to be typical of three diet types (Average American, USDA-recommended, and Vegan)</p></p> Before launching this ipython notebook, I typed the macqiime command to configure the shell. I'm using macqiime 1.8.0 http://www.wernerlab.org/software/macqiime/macqiime-installation

I copied the stuff below from a QIIME/iPython notebook tutorial: http://nbviewer.ipython.org/github/qiime/qiime/blob/1.8.0/examples/ipynb/illumina_overview_tutorial.ipynb</p></p>I'm not sure what all of it does...

In [1]:
from os import chdir, mkdir
from os.path import join
#the following are only available in the current development branch of IPython
from IPython.display import FileLinks, FileLink

Substitute the file paths below, and then there is no need to change any of the code in the rest of the notebook.</p></p> The version of macqiime that I'm using does not install the greengenes 99% cutoff OTUs and taxonomy, so did that manually as per the instructions on the MacQIIME Installation site. I just substituted the gg_13_8_otus folder that has all of the otu cutoffs for the one included in macqiime/greengenes/

In [2]:
project_name = "MicrobesWeEat"
sequence_file = "./MicrobesWeEat.fasta"
non_chimeric_sequence_file = "./non_chimeric_sequences.fasta"
mapping_file = "./MicrobesWeEat.txt"
otu_base = "/macqiime/greengenes/gg_13_8_otus/"
reference_seqs_99 = join(otu_base,"/macqiime/greengenes/gg_13_8_otus/rep_set/99_otus.fasta")
reference_tree_99 = join(otu_base,"/macqiime/greengenes/gg_13_8_otus/trees/99_otus.tree")
reference_tax_99 = join(otu_base,"/macqiime/greengenes/gg_13_8_otus/taxonomy/99_otu_taxonomy.txt")
reference_seqs_97 = join(otu_base,"/macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta")
reference_tree_97 = join(otu_base,"/macqiime/greengenes/gg_13_8_otus/trees/97_otus.tree")
reference_tax_97 = join(otu_base,"/macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt")

Make sure mapping file is good to go. This also provides a quick check for macqiime.

In [3]:
!validate_mapping_file.py -m $mapping_file
No errors or warnings were found in mapping file.
In [43]:
#!ls RawSequenceData/
#Listed are the raw sequence files.

Demultiplex and join.

Gotta unzip them to work with them.

In [44]:
#!gunzip RawSequenceData/*.gz
#!ls RawSequenceData/

After consulting with a QIIME developer via the forum, I know that there is an issue with the trailing *:N:0 in all of the fastq files. A fix before proceeding is below, but this should not be necessary in future releases.

In [28]:
!sed 's/ 1:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R1_001.fastq > HMSB_AZ_35_NoIndex_L001_R1_001_fixed.fastq
!sed 's/ 4:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R4_001.fastq > HMSB_AZ_35_NoIndex_L001_R4_001_fixed.fastq
!sed 's/ 1:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R1_002.fastq > HMSB_AZ_35_NoIndex_L001_R1_002_fixed.fastq
!sed 's/ 4:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R4_002.fastq > HMSB_AZ_35_NoIndex_L001_R4_002_fixed.fastq
!sed 's/ 1:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R1_003.fastq > HMSB_AZ_35_NoIndex_L001_R1_003_fixed.fastq
!sed 's/ 4:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R4_003.fastq > HMSB_AZ_35_NoIndex_L001_R4_003_fixed.fastq
!sed 's/ 1:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R1_004.fastq > HMSB_AZ_35_NoIndex_L001_R1_004_fixed.fastq
!sed 's/ 4:N:0://g' RawSequenceData/HMSB_AZ_35_NoIndex_L001_R4_004.fastq > HMSB_AZ_35_NoIndex_L001_R4_004_fixed.fastq

Create a new barcode file in which the forward and reverse barcodes are concatenated into a single barcode.

In [25]:
!extract_barcodes.py -r RawSequenceData/HMSB_AZ_35_NoIndex_L001_R2_001.fastq -f RawSequenceData/HMSB_AZ_35_NoIndex_L001_R3_001.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o Lane1.barcodes
!extract_barcodes.py -r RawSequenceData/HMSB_AZ_35_NoIndex_L001_R2_002.fastq -f RawSequenceData/HMSB_AZ_35_NoIndex_L001_R3_002.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o Lane2.barcodes
!extract_barcodes.py -r RawSequenceData/HMSB_AZ_35_NoIndex_L001_R2_003.fastq -f RawSequenceData/HMSB_AZ_35_NoIndex_L001_R3_003.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o Lane3.barcodes
!extract_barcodes.py -r RawSequenceData/HMSB_AZ_35_NoIndex_L001_R2_004.fastq -f RawSequenceData/HMSB_AZ_35_NoIndex_L001_R3_004.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o Lane4.barcodes

Again remove the trailing *:N:0 in all of the barcode fastq files.

In [26]:
!sed 's/ 3:N:0://g' Lane1.barcodes/barcodes.fastq > Lane1.barcodes.fastq
!sed 's/ 3:N:0://g' Lane2.barcodes/barcodes.fastq > Lane2.barcodes.fastq
!sed 's/ 3:N:0://g' Lane3.barcodes/barcodes.fastq > Lane3.barcodes.fastq
!sed 's/ 3:N:0://g' Lane4.barcodes/barcodes.fastq > Lane4.barcodes.fastq

Now, "join" the paired ends. This will create sequences that have the forward barcode at the beginning and the reverse barcode at the end.

In [33]:
!join_paired_ends.py -r HMSB_AZ_35_NoIndex_L001_R1_001_fixed.fastq -f HMSB_AZ_35_NoIndex_L001_R4_001_fixed.fastq -b Lane1.barcodes.fastq -o Lane1_joined
!join_paired_ends.py -r HMSB_AZ_35_NoIndex_L001_R1_002_fixed.fastq -f HMSB_AZ_35_NoIndex_L001_R4_002_fixed.fastq -b Lane2.barcodes.fastq -o Lane2_joined
!join_paired_ends.py -r HMSB_AZ_35_NoIndex_L001_R1_003_fixed.fastq -f HMSB_AZ_35_NoIndex_L001_R4_003_fixed.fastq -b Lane3.barcodes.fastq -o Lane3_joined
!join_paired_ends.py -r HMSB_AZ_35_NoIndex_L001_R1_004_fixed.fastq -f HMSB_AZ_35_NoIndex_L001_R4_004_fixed.fastq -b Lane4.barcodes.fastq -o Lane4_joined

Now, time to demultiplex!

In [111]:
!split_libraries_fastq.py -q 5 -r 9 -p 0.5 -i Lane1_joined/fastqjoin.join.fastq -o Demultiplexed_Lane1 -m MicrobesWeEat.txt -b Lane1_joined/fastqjoin.join_barcodes.fastq --retain_unassigned_reads --barcode_type 16 --max_barcode_errors 2
!split_libraries_fastq.py -q 5 -r 9 -p 0.5 -i Lane2_joined/fastqjoin.join.fastq -o Demultiplexed_Lane2 -m MicrobesWeEat.txt -b Lane2_joined/fastqjoin.join_barcodes.fastq --retain_unassigned_reads --barcode_type 16 --max_barcode_errors 2
!split_libraries_fastq.py -q 5 -r 9 -p 0.5 -i Lane3_joined/fastqjoin.join.fastq -o Demultiplexed_Lane3 -m MicrobesWeEat.txt -b Lane3_joined/fastqjoin.join_barcodes.fastq --retain_unassigned_reads --barcode_type 16 --max_barcode_errors 2
!split_libraries_fastq.py -q 5 -r 9 -p 0.5 -i Lane4_joined/fastqjoin.join.fastq -o Demultiplexed_Lane4 -m MicrobesWeEat.txt -b Lane4_joined/fastqjoin.join_barcodes.fastq --retain_unassigned_reads --barcode_type 16 --max_barcode_errors 2

I want to know how many reads per library I have.

In [42]:
#!cat Demultiplexed_Lane1/split_library_log.txt
#!cat Demultiplexed_Lane2/split_library_log.txt
#!cat Demultiplexed_Lane3/split_library_log.txt
#!cat Demultiplexed_Lane4/split_library_log.txt

At this point, I want to concatenate all of the lanes, but there will be multiple sequences with the same ID (e.g., Unassigned_0) and I'm assuming that will cause problems. So, I have 2 choices: concatenate before running split_libraries_fastq.py OR figure out if QIIME will take more than one input fasta file. I searched for a bit to find out if the second option will work, but I ran out of patience with the search before I found an answer. So, I'm going concatenate the output from join_paired_ends.py and run split_libraries_fastq.py again.

In [118]:
!cat Lane*_joined/fastqjoin.join.fastq > All_Lanes_joined.fastq
!cat Lane*_joined/fastqjoin.join_barcodes.fastq > All_barcodes_joined.fastq

The parameters that I'm using for split_libraries.py were optimized through trial and error. Either pretty much all sequences passed the quality filter or none of them did, so tuning the parameters was pretty straightforward.

In [119]:
!split_libraries_fastq.py -q 5 -r 9 -p 0.5 -i All_Lanes_joined.fastq -o Demultiplexed_All -m MicrobesWeEat.txt -b All_barcodes_joined.fastq --retain_unassigned_reads --barcode_type 16 --max_barcode_errors 2
In [79]:
!cat Demultiplexed_All/split_library_log.txt
cat: Demultiplexed_All/split_library_log.txt: No such file or directory
In [45]:
#!ls Demultiplexed_All/

Pick closed reference OTUs for PICRUSt

While there are good reasons NOT to use closed-reference otu-picking in general, because one of the things I'd like to do is to use PICRUSt to predict a metagenome, I will start by using the 99% cutoff greengenes reference files for closed-reference picking

In [6]:
!pick_closed_reference_otus.py -p 99parameters.txt -o greengenes_99_otus -i Demultiplexed_All/seqs.fna -r $reference_seqs_99 -t $reference_tax_99 -a -O 2 -f
In [9]:
!pick_closed_reference_otus.py -p 97parameters.txt -o greengenes_97_otus -i Demultiplexed_All/seqs.fna -r $reference_seqs_97 -t $reference_tax_97 -a -O 2 -f
In [25]:
#!cat greengenes_99_otus/uclust_ref_picked_otus/seqs_otus.log
In [64]:
#!biom summarize-table -i greengenes_99_otus/otu_table.biom -o 99_otu_table_summary.txt
#!cat 99_otu_table_summary.txt
In [1]:
#!biom summarize-table -i greengenes_97_otus/otu_table.biom -o 97_otu_table_summary.txt
#!cat 97_otu_table_summary.txt
cat: 97_otu_table_summary.txt: No such file or directory
In [6]:
#!cat greengenes_97_otus/uclust_ref_picked_otus/seqs_otus.log

Now for chimera-checking (I suppose this could be done earlier, actually.) Next time, I would do it just fter split_libraries_fastq.py). The input must be unaligned fasta sequences in the same orientation as the reference database. That's why I'm using the output from split_libraries_fastq.py, as recommended.

In [21]:
#!identify_chimeric_seqs.py -i Demultiplexed_All/seqs.fna -m usearch61 -o 97_usearch61_chimera_detection/ -r $reference_seqs_97
#!identify_chimeric_seqs.py -i Demultiplexed_All/seqs.fna -m usearch61 -o 99_usearch61_chimera_detection/ -r $reference_seqs_99

Remove the chloroplast and mitochondrial sequences. I'm also removing the "Unassigned" sequences. There are 795 of them in 39 OTUs. I manually blasted the representative sequences from each OTU, and they were all chloroplasts. I am also removing all of the remaining singletons.

In [4]:
!filter_taxa_from_otu_table.py -i greengenes_97_otus/otu_table.biom -o closed_ref_97_otu_table_no_euks.biom -n c__Chloroplast,f__mitochondria
!filter_taxa_from_otu_table.py -i closed_ref_97_otu_table_no_euks.biom -o closed_ref_97_otu_table_no_euks_no_unassigned.biom -n Unassigned
!filter_otus_from_otu_table.py -i closed_ref_97_otu_table_no_euks_no_unassigned.biom -o closed_ref_97_otu_table_filtered.biom  -n 2
In [37]:
!filter_taxa_from_otu_table.py -i greengenes_99_otus/otu_table.biom -o closed_ref_99_otu_table_no_euks.biom -n c__Chloroplast,f__mitochondria
!filter_taxa_from_otu_table.py -i closed_ref_99_otu_table_no_euks.biom -o closed_ref_99_otu_table_no_euks_no_unassigned.biom -n Unassigned
!filter_otus_from_otu_table.py -i closed_ref_99_otu_table_no_euks_no_unassigned.biom -o closed_ref_99_otu_table_filtered.biom  -n 2
In [4]:
#!biom summarize-table -i closed_ref_99_otu_table_filtered.biom -o closed_ref_99_otu_table_filtered.summary
#!cat closed_ref_99_otu_table_filtered.summary

This is a reminder to look at the biom summary before proceeding. In my case, I still have all of the PhiX and other unassigned reads in my data. So, I created a file called samples_to_keep.txt with a list of my actual sample IDs in it, and I will use this list to remove the Unassigned reads.

In [5]:
!filter_samples_from_otu_table.py -i closed_ref_97_otu_table_filtered.biom -o closed_ref_97_otu_table_final.biom --sample_id_fp samples_to_keep.txt
In [68]:
!filter_samples_from_otu_table.py -i closed_ref_99_otu_table_filtered.biom -o closed_ref_99_otu_table_final.biom --sample_id_fp samples_to_keep.txt

Run PICRUSt

Normalize otu table by 16S copy number

In [6]:
!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i closed_ref_97_otu_table_final.biom -o normalized_closed_ref_97_otu_table.biom
In [69]:
!/Applications/picrust-1.0.0/scripts/normalize_by_copy_number.py -i closed_ref_99_otu_table_final.biom -o normalized_closed_ref_99_otu_table.biom

Predict the metagenome, note that I am using the -a flag to calculate NSTI.

In [7]:
!/Applications/picrust-1.0.0/scripts/predict_metagenomes.py -a normalized_closed_ref_97_NSTI.tab -i normalized_closed_ref_97_otu_table.biom -o metagenome_prediction_from_normalized_closed_ref_97_otu_table.biom
In [70]:
!/Applications/picrust-1.0.0/scripts/predict_metagenomes.py -a normalized_closed_ref_99_NSTI.tab -i normalized_closed_ref_99_otu_table.biom -o metagenome_prediction_from_normalized_closed_ref_99_otu_table.biom
In [ ]:
#!more normalized_closed_ref_97_NSTI.tab
In [72]:
#!more normalized_closed_ref_99_NSTI.tab

Collapse the thousands of predicted functions into higher categories (e.g KOs into KEGG Pathways) and generate STAMP profiles

In [8]:
!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -f -i metagenome_prediction_from_normalized_closed_ref_97_otu_table.biom -c KEGG_Pathways -l 3 -o 97_MWE_predicted_metagenomes.L3.txt
!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -f -i metagenome_prediction_from_normalized_closed_ref_97_otu_table.biom -c KEGG_Pathways -l 2 -o 97_MWE_predicted_metagenomes.L2.txt
In [73]:
!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -f -i metagenome_prediction_from_normalized_closed_ref_99_otu_table.biom -c KEGG_Pathways -l 3 -o 99_MWE_predicted_metagenomes.L3.txt
!/Applications/picrust-1.0.0/scripts/categorize_by_function.py -f -i metagenome_prediction_from_normalized_closed_ref_99_otu_table.biom -c KEGG_Pathways -l 2 -o 99_MWE_predicted_metagenomes.L2.txt
In [9]:
!sed '1d' 97_MWE_predicted_metagenomes.L3.txt | rev | cut -f 2- | rev > 97_MWE_predicted_metagenome.L3.spf
!sed '1d' 97_MWE_predicted_metagenomes.L2.txt | rev | cut -f 2- | rev > 97_MWE_predicted_metagenome.L2.spf
In [74]:
!sed '1d' 99_MWE_predicted_metagenomes.L3.txt | rev | cut -f 2- | rev > 99_MWE_predicted_metagenome.L3.spf
!sed '1d' 99_MWE_predicted_metagenomes.L2.txt | rev | cut -f 2- | rev > 99_MWE_predicted_metagenome.L2.spf

The most significant difference between Diet Types is "Other Glycan Degradation" (K00511) at L3 (see figure).</p></p> There are no significant differences at L2.

Identify which OTUs are contributing to K00511

In [13]:
!/Applications/picrust-1.0.0/scripts/metagenome_contributions.py -l K00511 -i normalized_closed_ref_97_otu_table.biom -o 97_metagenome_contribution_K00511 -g 13_5
Filtering the genome table to include only user-specified functions: ['K00511']
In [14]:
!/Applications/picrust-1.0.0/scripts/metagenome_contributions.py -l K00511 -i normalized_closed_ref_99_otu_table.biom -o 99_metagenome_contribution_K00511 -g 13_5
Filtering the genome table to include only user-specified functions: ['K00511']
In [15]:
!cat 97_metagenome_contribution_K00511
Gene	Sample	OTU	GeneCountPerGenome	OTUAbundanceInSample	CountContributedByOTU	ContributionPercentOfSample	ContributionPercentOfAllSamples
K00511	USDAdinner	4366579	1.0	1.0	1.0	1.0	1.0
In [16]:
!cat 99_metagenome_contribution_K00511
Gene	Sample	OTU	GeneCountPerGenome	OTUAbundanceInSample	CountContributedByOTU	ContributionPercentOfSample	ContributionPercentOfAllSamples
K00511	USDAdinner	106978	1.0	1.0	1.0	1.0	1.0

So, OTU 106978, which is Corallococcus exiguus, is abundant in the USDA dinner, and is the contributor to the "Other Glycan Degradation" in this metagenome

Pick OTUs

The QIIME developers prefer the open reference OTU-picking approach, and I have no reason not to go with the default 97% cutoff, so that's what I'll do here. The pick_open_reference_otus.py script will cluster all of the sequences, assign taxonomy to the OTUs (when possible, a greengenes ID will be assigned,) choose a representative sequence from each OTU (rep_set), and align and build a phylogenetic tree from the representative sequences.

This time I'll remember to get rid of the Unassigned reads before I pick the OTUs!

In [81]:
!filter_fasta.py -f MicrobesWeEat.fasta -o MicrobesWeEat_NoUnassigned.fasta --sample_id_fp samples_to_keep.txt
In [96]:
!pick_open_reference_otus.py -p 97parameters.txt -r $reference_seqs_97 -i MicrobesWeEat_NoUnassigned.fasta -o 97_open_reference_otus -f

Sanity check for the otu table.

In [8]:
#!biom summarize-table -i 97_open_reference_otus/otu_table_mc2_w_tax_no_pynast_failures.biom -o otu_table_summary_before_cleanup.txt
#!cat otu_table_summary_before_cleanup.txt

Filter out singletons, mitochondria, and chloroplasts.

In [88]:
!filter_taxa_from_otu_table.py -i 97_open_reference_otus/otu_table_mc2_w_tax_no_pynast_failures.biom -o open_ref_97_otu_table_no_euks.biom -n c__Chloroplast,f__mitochondria
!filter_taxa_from_otu_table.py -i open_ref_97_otu_table_no_euks.biom -o open_ref_97_otu_table_no_euks_no_unassigned.biom -n Unassigned
!filter_otus_from_otu_table.py -i open_ref_97_otu_table_no_euks_no_unassigned.biom -o open_ref_97_otu_table_no_euks_no_unassigned_no_singletons.biom  -n 2
In [173]:
#sanity check
#!biom summarize-table -i open_ref_97_otu_table_no_euks_no_unassigned_no_singletons.biom -o open_ref_97_otu_table_no_euks_no_unassigned_no_singletons.summary
#!cat open_ref_97_otu_table_no_euks_no_unassigned_no_singletons.summary
In [104]:
#!identify_chimeric_seqs.py -m ChimeraSlayer -i rep_set_aligned.fasta -a core_set_aligned.fasta.imputed -o chimeric_seqs_open_97.txt

I couldn't get ChimeraSlayer to run. QIIME support thinks that the problem might be that I do not have enough memory because he was able to run it on my file. He was nice enough to send me the output, so I can proceed.

Now that I have my list of chimeric OTUs, I need to remove them from the OTU table, remove them from my aligned representative sequences, build a new phylogenetic tree from the new rep set alignment (I think, but all of this may not be necessary.)

In [170]:
#remove chimeras from aligned rep_set 
!filter_fasta.py -f 97_open_reference_otus/pynast_aligned_seqs/rep_set_aligned.fasta -o non_chimeric_rep_set_aligned.fasta -s chimeric_seqs_open_97.txt -n
In [108]:
!make_phylogeny.py -i non_chimeric_rep_set_aligned.fasta
In [172]:
!filter_otus_from_otu_table.py -i open_ref_97_otu_table_no_euks_no_unassigned_no_singletons.biom -o open_ref_97_otu_table_no_euks_no_unassigned_no_singletons_no_chimeras.biom -e chimeric_seqs_open_97.txt
In [175]:
#sanity check
#!biom summarize-table -i open_ref_97_otu_table_no_euks_no_unassigned_no_singletons_no_chimeras.biom -o open_ref_97_otu_table_no_euks_no_unassigned_no_singletons_no_chimeras.summary
#!cat open_ref_97_otu_table_no_euks_no_unassigned_no_singletons_no_chimeras.summary

The following steps won't work without the metadata added to the biom file, so I'll do that first. I had to dig the observation metadata mapping file out of here: uclust_assigned_taxonomy/rep_set_tax_assignments.txt AND I had to add a header row to make it look like the example files in http://biom-format.org/documentation/adding_metadata.html

In [26]:
!rm 97_open_otu_table_wmd.biom
!biom add-metadata -i open_ref_97_otu_table_no_euks_no_unassigned_no_singletons_no_chimeras.biom -o 97_open_otu_table_wmd.biom --sample-metadata-fp MicrobesWeEat.txt --observation-metadata-fp rep_set_tax_assignments_header.txt 

Core Diversity Analyses

Running core_diversity_analyses.py with libraries subsampled to 893 sequences and categorized by DietType. Phylogeny-based analyses will use the new tree with the chimeric sequences removed.

In [35]:
#sanity check
#!biom summarize-table -i 97_open_otu_table_wmd.biom -o 97_open_otu_table_wmd.summary
#!cat 97_open_otu_table_wmd.summary
In [37]:
#unique counts instead of totals
#!biom summarize-table -i 97_open_otu_table_wmd.biom --qualitative -o 97_open_otu_table_wmd.qualitative.summary
#!cat 97_open_otu_table_wmd.qualitative.summary
In [38]:
#unique counts instead of totals, but with rarefied OTU table
!biom summarize-table -i core_diversity_analyses_open_ref_97/table_even771.biom --qualitative -o 97_open_otu_table_771.qualitative.summary
!cat 97_open_otu_table_771.qualitative.summary
Num samples: 15
Num observations: 729
Table md5 (unzipped): 0f5f8fd0fc772b57d5af6c39cacea84b

Observations/sample summary:
 Min: 13
 Max: 203
 Median: 116.000
 Mean: 111.067
 Std. dev.: 60.793
 Sample Metadata Categories: Iron_bin; Carotene_beta; phinchID; Fermented; Sodium; Carotene_beta_bin; Protein_bin; Dairy; Forward; Sodium_bin; Sugars_bin; Fiber; Description; BarcodeSequence; PC1_bin; Fruit; Energy_bin; Vitamin_A_bin; Vitamin_A; Fatty_acids; Carbohydrate_bin; Fiber_bin; Cholesterol_bin; DietType; Reverse; Calcium_bin; Cooked; LinkerPrimerSequence; Iron; Cholesterol; HFCS; Meal; Total_lipid_bin; Total_lipid; Calcium; Vitamin_C; Energy; Sugars; Carbohydrate; PC1; Protein; Fatty_Acids_bin; Vitamin_C_bin
 Observation Metadata Categories: taxonomy; confidence

Observations/sample detail:
 USDAsnack1: 13
 USDAsnack2: 26
 AMERICANsnack: 41
 AMERICANlunch: 42
 VEGANsnack1: 58
 VEGANlunch: 94
 VEGANbreakfast: 106
 USDAlunch: 116
 AMERICANbreakfast: 123
 USDAdinner: 154
 VEGANdinner: 167
 USDAbreakfast: 168
 AMERICANdinner: 170
 VEGANsnack3: 185
 VEGANsnack2: 203
In [181]:
#!rm -r DietType_core_diversity_analyses_open_ref_97
#!rm -r core_diversity_analyses_open_ref_97
In [18]:
!core_diversity_analyses.py -p 97parameters.txt -i 97_open_otu_table_wmd.biom -o DietType_core_diversity_analyses_open_ref_97 -m MicrobesWeEat.txt -e 771 -c DietType -t non_chimeric_rep_set_aligned.tre
In [17]:
!core_diversity_analyses.py -p 97parameters.txt -i 97_open_otu_table_wmd.biom -o core_diversity_analyses_open_ref_97 -m MicrobesWeEat.txt -e 771 -t non_chimeric_rep_set_aligned.tre

The first time I ran this, it did not work. The issue is that my taxonomy metadata in the biom file is included as strings rather than as lists. I'm not sure why this is the case, but I tracked down the fix, which is to add this line to my parameters file:</p></p>summarize_taxa:md_as_string True

In [19]:
!nmds.py -i core_diversity_analyses_open_ref_97/bdiv_even771/weighted_unifrac_dm.txt -o NMDS_output
In [20]:
!cat NMDS_output
samples	NMDS1	NMDS2	NMDS3
AMERICANsnack	-1.17027261939	0.160180891398	0.117604120355
VEGANsnack1	0.728025005467	1.73753280116	-0.0163669886395
AMERICANbreakfast	0.933966186973	-0.193126733697	-0.971073635433
VEGANsnack2	0.626862241432	-0.247210344627	0.486249440075
AMERICANlunch	-1.10856720972	0.055161785597	-0.261550124104
VEGANsnack3	0.694201915455	-0.0584857699766	0.392490895575
VEGANdinner	0.791638204242	-0.318713695544	0.16089612955
VEGANlunch	-0.549011849737	-0.571419325572	0.499809089166
USDAdinner	0.692697963831	-0.347314151775	-0.222069120523
AMERICANdinner	0.737339108169	-0.211959856119	0.021761404915
USDAlunch	-0.337601140052	0.372012620336	0.557853344909
VEGANbreakfast	-0.0863078410135	-0.177500463084	-0.459990243303
USDAbreakfast	0.549193082536	-0.253528600159	0.133269140252
USDAsnack2	-1.32177734003	-0.0617229557549	-0.222565931975
USDAsnack1	-1.18038570816	0.116093797813	-0.216317520821

stress	0.0260933547181	0	0
% variation explained	0	0	0	

Statistical Analyses

want to look at colelctor's curves without rarefaction

In [21]:
!alpha_rarefaction.py -i 97_open_otu_table_wmd.biom -m $mapping_file -o 97_alpha_uneven -t 97_open_reference_otus/rep_set.tre -f

test for the significance of dietary pattern on the overall microbial community composition

In [24]:
!rm -r permanova_DietTypes
!compare_categories.py --method permanova -m $mapping_file -c DietType -i core_diversity_analyses_open_ref_97/bdiv_even771/weighted_unifrac_dm.txt -o permanova_DietTypes

test for significant differences in taxonomic richness across dietary patterns

In [25]:
!compare_alpha_diversity.py -i core_diversity_analyses_open_ref_97/arare_max771/alpha_div_collated/PD_whole_tree.txt -m $mapping_file -c DietType -o compare_alpha_div_DietType_PD

test for the significant variation in frequency of individual OTUs across dietary patterns

In [28]:
!biom add-metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L3.biom -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L2_wmd.biom --sample-metadata-fp MicrobesWeEat.txt #--observation-metadata-fp rep_set_tax_assignments_header.txt 
!biom add-metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L3.biom -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L3_wmd.biom --sample-metadata-fp MicrobesWeEat.txt #--observation-metadata-fp rep_set_tax_assignments_header.txt
!biom add-metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L4.biom -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L4_wmd.biom --sample-metadata-fp MicrobesWeEat.txt #--observation-metadata-fp rep_set_tax_assignments_header.txt 
!biom add-metadata -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L5.biom -o core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L5_wmd.biom --sample-metadata-fp MicrobesWeEat.txt #--observation-metadata-fp rep_set_tax_assignments_header.txt 
In [29]:
!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L2_wmd.biom -m $mapping_file -c DietType -o group_significance_L2
!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L3_wmd.biom -m $mapping_file -c DietType -o group_significance_L3
!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L4_wmd.biom -m $mapping_file -c DietType -o group_significance_L4
!group_significance.py -i core_diversity_analyses_open_ref_97/taxa_plots/table_mc771_sorted_L5_wmd.biom -m $mapping_file -c DietType -o group_significance_L5
No metadata in biom table.
No metadata in biom table.
No metadata in biom table.
No metadata in biom table.
In [30]:
!make_otu_heatmap.py -i 97_open_otu_table_wmd.biom
In [39]:
!make_otu_heatmap.py -i core_diversity_analyses_open_ref_97/table_even771.biom
In [34]:
!per_library_stats.py -i 97_open_otu_table_wmd.biom-m $mapping_file -o 97_open_otu_table_wmd.perlibstats
/bin/sh: per_library_stats.py: command not found

Export for Use with Phinch

In [4]:
!biom add-metadata -i otu_tables/open_ref_97_otu_table_no_euks_no_singletons.biom -o otu_tables/open_ref_97_otu_table_no_euks_no_singletons_with_metadata.biom -m MicrobesWeEat.txt