A RAD-seq library of 95 samples was prepared by Floragenex with the PstI restriction enzyme, followed by sonication and size selection. Stats reported by Floragenex include: AverageFragmentSize=386bp, Concentration=2.51ng/uL, Concentration=10nM. The library was sequenced on two lanes of Illumina HiSeq 3000, yielding 378,809,976 reads in lane 1 and 375,813,513 reads in lane 2, for a total of ~755M reads.
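As a quick check on the read totals reported above, the two lane counts can be summed directly. This is only an arithmetic sketch; the variable names are illustrative and not part of the ipyrad workflow.

```python
# Read counts per lane, as reported in the text above
lane1_reads = 378809976
lane2_reads = 375813513

# Total reads across both lanes
total_reads = lane1_reads + lane2_reads
print("total reads: {:,}".format(total_reads))
print("approx. {:.0f}M reads".format(total_reads / 1e6))
```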
This is a Jupyter notebook, a tool used to create an executable document that fully reproduces our analyses. This notebook contains all of the code to assemble the Ficus RAD-seq data set with ipyrad. We begin by demultiplexing the raw data. The demultiplexed data (will be) archived and available online. If you downloaded the demultiplexed data, you can skip to the section The Demultiplexed Data and begin by loading those data. The data were assembled under a range of parameter settings, which you can see in the Within-sample assembly section. Several Samples were filtered from the data set due to low coverage. The data were then clustered across Samples, and final output files were created, in the Across-sample assembly section.
The following conda commands will locally install all of the software required for this notebook.
## conda install ipyrad -c ipyrad
## conda install toytree -c eaton-lab
import ipyrad as ip
import toyplot
## print software versions
print('ipyrad v.{}'.format(ip.__version__))
print('toyplot v.{}'.format(toyplot.__version__))
ipyrad v.0.6.20
toyplot v.0.14.4
I started an ipcluster instance on a 40-core workstation with the ipcluster command shown below. The cluster_info() command shows that ipyrad is able to find all 40 cores on the cluster.
##
## ipcluster start --n=40
##
#print ip.cluster_info()
The data came to us as two large 20GB files. The barcodes file was provided by Floragenex; it maps sample names to barcodes, which are 10bp in length and contained inline in the sequences. The barcodes are printed a little further below. I ran the program FastQC on the raw data files as a quality check; the results are available here lane1-fastqc and here lane2-fastqc. Overall, quality scores were very high and there was little (but some) adapter contamination, which we will filter out in the ipyrad analysis.
## The reference genome link
reference = """\
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/\
002/002/945/GCA_002002945.1_F.carica_assembly01/\
GCA_002002945.1_F.carica_assembly01_genomic.fna.gz\
"""
## Download the reference genome of F. carica
# ! wget $reference
## decompress it
# ! gunzip ./GCA_002002945.1_F.carica_assembly01_genomic.fna.gz
## Locations of the raw data and barcodes file
lane1data = "~/Documents/RADSEQ_DATA/Ficus/Ficus-1_S1_L001_R1_001.fastq.gz"
lane2data = "~/Documents/RADSEQ_DATA/Ficus/Ficus-2_S2_L002_R1_001.fastq.gz"
barcodes = "~/Documents/RADSEQ_DATA/barcodes/Ficus_Jander_2016_95barcodes.txt"
We set the location to the data and barcodes info for each object, and set the max barcode mismatch parameter to zero (strict), allowing no mismatches. You can see the full barcode information at this link.
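To make the zero-mismatch setting concrete, here is a minimal sketch of strict inline-barcode matching. It is not ipyrad's demultiplexing code: the function `assign_sample`, the two 10bp barcodes, and the reads are all hypothetical, chosen only to show that a single mismatch leaves a read unassigned.

```python
def assign_sample(read, barcode_to_sample, barcode_len=10):
    """Return the sample whose barcode exactly matches the read's
    first `barcode_len` bases, or None if there is no exact match."""
    return barcode_to_sample.get(read[:barcode_len])

# Hypothetical barcodes (10bp, inline at the start of each read)
toy_barcodes = {"AAACGGTCAT": "sample_A", "CCGTTAGGCA": "sample_B"}

toy_reads = [
    "AAACGGTCATTGCAGGATTACA",  # exact match to sample_A
    "AAACGGTCTTTGCAGGATTACA",  # one mismatch in the barcode: unassigned
    "CCGTTAGGCATGCAGCCATTGA",  # exact match to sample_B
]

assigned = [assign_sample(r, toy_barcodes) for r in toy_reads]
print(assigned)
```

Allowing mismatches would recover more reads, at the risk of assigning a read to the wrong Sample when barcodes are similar.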
## create an object to demultiplex each lane
demux1 = ip.Assembly("lane1")
demux2 = ip.Assembly("lane2")
## set path to data, bcodes, and max_mismatch params
demux1.set_params("project_dir", "./ficus_demux_reads")
demux1.set_params("raw_fastq_path", lane1data)
demux1.set_params("barcodes_path", barcodes)
demux1.set_params("max_barcode_mismatch", 0)
## set path to data, bcodes, and max_mismatch params
demux2.set_params("project_dir", "./ficus_demux_reads")
demux2.set_params("raw_fastq_path", lane2data)
demux2.set_params("barcodes_path", barcodes)
demux2.set_params("max_barcode_mismatch", 0)
New Assembly: lane1
New Assembly: lane2
demux1.run("1")
demux2.run("1")
Assembly: lane1
[####################] 100% chunking large files | 0:21:55 | s1 |
[####################] 100% sorting reads | 0:43:43 | s1 |
[####################] 100% writing/compressing | 0:14:25 | s1 |
Assembly: lane2
[####################] 100% chunking large files | 0:27:44 | s1 |
[####################] 100% sorting reads | 0:45:12 | s1 |
[####################] 100% writing/compressing | 0:15:53 | s1 |
Now we have two directories with demultiplexed data, each with one gzipped fastq file per Sample, containing all of the reads from that lane of sequencing that matched that Sample's barcode. These are the data that (will be uploaded) to Genbank SRA when we publish. We load the sorted fastq data at this step to follow the same procedure one would use when starting from the demultiplexed data.
lib1_fastqs = "./ficus_demux_reads/lane1_fastqs/*.gz"
lib2_fastqs = "./ficus_demux_reads/lane2_fastqs/*.gz"
lib1 = ip.Assembly("lib1", quiet=True)
lib1.set_params("sorted_fastq_path", lib1_fastqs)
lib1.run("1")
lib2 = ip.Assembly("lib2", quiet=True)
lib2.set_params("sorted_fastq_path", lib2_fastqs)
lib2.run("1")
Assembly: lib1
[####################] 100% loading reads | 0:01:47 | s1 |
Assembly: lib2
[####################] 100% loading reads | 0:00:34 | s1 |
We will join these two demultiplexed libraries into a single analysis that has the set of parameters we will use to assemble the data set. To do this we use the merge() command in ipyrad. On this merged Assembly we then set several non-default parameters.
## named corresponding to some params we are changing
data = ip.merge("merged", [demux1, demux2])
## set several non-default parameters
data.set_params("project_dir", "analysis-ipyrad")
data.set_params("filter_adapters", 3)
data.set_params("phred_Qscore_offset", 43)
data.set_params("max_Hs_consens", (5, 5))
data.set_params("max_shared_Hs_locus", 4)
data.set_params("filter_min_trim_len", 60)
data.set_params("trim_loci", (0, 8, 0, 0))
data.set_params("output_formats", list("lksapnv"))
## print parameters for posterity's sake
data.get_params()
0   assembly_name               merged
1   project_dir                 ./analysis-ipyrad
2   raw_fastq_path              Merged: lane1, lane2
3   barcodes_path               Merged: lane1, lane2
4   sorted_fastq_path           Merged: lane1, lane2
5   assembly_method             denovo
6   reference_sequence
7   datatype                    rad
8   restriction_overhang        ('TGCAG', '')
9   max_low_qual_bases          5
10  phred_Qscore_offset         43
11  mindepth_statistical        6
12  mindepth_majrule            6
13  maxdepth                    10000
14  clust_threshold             0.85
15  max_barcode_mismatch        0
16  filter_adapters             3
17  filter_min_trim_len         60
18  max_alleles_consens         2
19  max_Ns_consens              (5, 5)
20  max_Hs_consens              (5, 5)
21  min_samples_locus           4
22  max_SNPs_locus              (20, 20)
23  max_Indels_locus            (8, 8)
24  max_shared_Hs_locus         4
25  trim_reads                  (0, 0, 0, 0)
26  trim_loci                   (0, 8, 0, 0)
27  output_formats              ('l', 'k', 's', 'a', 'p', 'n', 'v')
28  pop_assign_file
Create branch ficus and drop the control Sample
First we will drop the control sequence included for quality checking by Floragenex (FGXCONTROL). To do this we create a new branch using the subsamples argument to include all Samples except FGXCONTROL.
## drop the Floragenex control sample if it is in the data
snames = [i for i in data.samples if i != "FGXCONTROL"]
## working branch
data = data.branch("ficus", subsamples=snames)
print("summary of raw read coverage")
print(data.stats.reads_raw.describe().astype(int))
summary of raw read coverage
count          95
mean      5149596
std       7716026
min         14703
25%        289541
50%       1890328
75%       7402440
max      51339646
Name: reads_raw, dtype: int64
From looking closely at the data, it appears there are some poor-quality reads with adapter contamination, and also some conspicuous long strings of poly repeats, which are probably due to the library being put on the sequencer at the wrong concentration (the facility failed to do a qPCR quantification). Setting the filter_adapters parameter in ipyrad to strict (2) uses 'cutadapt' to filter the reads; we set it to 3 above. By default ipyrad would look just for the Illumina universal adapter, but I'm also adding a few additional poly-{A,C,G,T} sequences to be trimmed. These appeared to be somewhat common in the raw data, followed by nonsense.
## run step 2
data.run("2", force=True)
Assembly: ficus
[####################] 100% concatenating inputs | 0:02:43 | s2 |
[####################] 100% processing reads | 0:51:52 | s2 |
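The poly-repeat problem described above can be illustrated with a small regular-expression sketch. This is not how ipyrad or cutadapt trims reads; the function `trim_at_homopolymer`, the `min_run` threshold of 10, and the example read are assumptions made purely for illustration.

```python
import re

def trim_at_homopolymer(seq, min_run=10):
    """Trim `seq` at the start of the first run of at least `min_run`
    identical bases; return `seq` unchanged if no such run exists."""
    # One base followed by (min_run - 1) or more copies of itself
    pattern = r"([ACGT])\1{%d,}" % (min_run - 1)
    match = re.search(pattern, seq)
    return seq[:match.start()] if match else seq

# Hypothetical read: real sequence, then a poly-A run followed by nonsense
toy_read = "TGCAGGATTACAGGCT" + "A" * 12 + "GTCCGT"
trimmed = trim_at_homopolymer(toy_read)
print(trimmed)
```

A read trimmed this way may fall below the minimum length set by filter_min_trim_len (60 in this notebook), in which case it would be discarded.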
Steps 2-5 of ipyrad function to filter and cluster reads, and to call consensus haplotypes within samples. We'll look more closely at the stats for each step after it's finished.
## create new branches for assembly method
ficus_d = data.branch("ficus_d")
ficus_r = data.branch("ficus_r")
## set reference info
reference = "GCA_002002945.1_F.carica_assembly01_genomic.fna"
ficus_r.set_params("reference_sequence", reference)
ficus_r.set_params("assembly_method", "reference")
## map reads to reference genome
ficus_r.run("3", force=True)
## cluster reads denovo
ficus_d.run("3")
Assembly: ficus_d
[####################] 100% dereplicating | 0:04:31 | s3 |
[####################] 100% clustering | 0:13:54 | s3 |
[####################] 100% building clusters | 0:00:46 | s3 |
[####################] 100% chunking | 0:00:11 | s3 |
[####################] 100% aligning | 0:23:58 | s3 |
[####################] 100% concatenating | 0:00:14 | s3 |
ficus_d.run("4")
Assembly: ficus_d [####################] 100% inferring [H, E] | 0:14:01 | s4 |
Now that the reads are filtered and clustered within each Sample, we want to try several different parameter settings for downstream analyses. One major difference will be in the minimum depth of sequencing we require to make a confident base call. We will leave one Assembly with the default setting of 6, which is somewhat conservative. We will also create a 'lowdepth' Assembly that allows base calls at much lower depth, by setting mindepth_majrule to 1.
ficus_dhi = ficus_d.branch("ficus_dhi")
ficus_dlo = ficus_d.branch("ficus_dlo")
ficus_dlo.set_params("mindepth_majrule", 1)
ficus_dlo.run("5")
ficus_dhi.run("5")
Assembly: ficus_dlo
[####################] 100% calculating depths | 0:00:23 | s5 |
[####################] 100% chunking clusters | 0:00:21 | s5 |
[####################] 100% consens calling | 0:15:17 | s5 |
Assembly: ficus_dhi
[####################] 100% calculating depths | 0:00:23 | s5 |
[####################] 100% chunking clusters | 0:00:21 | s5 |
[####################] 100% consens calling | 0:14:24 | s5 |
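To illustrate what a minimum-depth threshold does to base calls, here is a deliberately simplified sketch. It is not ipyrad's consensus-calling code, which is more involved; the function `majrule_call` and the example site are hypothetical. The thresholds 6 and 1 correspond to the two mindepth_majrule settings used for ficus_dhi and ficus_dlo.

```python
from collections import Counter

def majrule_call(bases, mindepth):
    """Return the most common base at a site if the read depth is at
    least `mindepth`; otherwise return 'N' (no call)."""
    if len(bases) < mindepth:
        return "N"
    return Counter(bases).most_common(1)[0][0]

# Hypothetical site covered by three reads
toy_site = ["A", "A", "G"]

call_hi = majrule_call(toy_site, mindepth=6)  # depth 3 < 6: no call
call_lo = majrule_call(toy_site, mindepth=1)  # depth 3 >= 1: majority base
print(call_hi, call_lo)
```

The lower threshold recovers a call at this site, but with only three reads the call is less certain, which is the trade-off the two Assemblies are meant to explore.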
Compare hidepth and lodepth assemblies. The difference is not actually that great. Regardless, the samples with very few reads are going to recover very few clusters.
import numpy as np
import toyplot
## stack columns of consens stats
zero = np.zeros(ficus_dhi.stats.shape[0])
upper = ficus_dhi.stats.reads_consens
lower = -1*ficus_dlo.stats.reads_consens
boundaries = np.column_stack((lower, zero, upper))
## plot barplots
canvas = toyplot.Canvas(width=700, height=300)
axes = canvas.cartesian()
axes.bars(boundaries, baseline=None)
axes.y.ticks.show = True
axes.y.ticks.labels.angle = -90
## run step 6 on full and subsampled data sets
ficus_dlo.run("6")
ficus_dhi.run("6")
Assembly: ficus_dlo
[####################] 100% concat/shuffle input | 0:00:51 | s6 |
[####################] 100% clustering across | 3:26:51 | s6 |
[####################] 100% building clusters | 0:00:40 | s6 |
[####################] 100% aligning clusters | 0:03:15 | s6 |
[####################] 100% database indels | 0:01:00 | s6 |
[####################] 100% indexing clusters | 0:07:29 | s6 |
[####################] 100% building database | 0:51:26 | s6 |
Assembly: ficus_dhi
[####################] 100% concat/shuffle input | 0:00:45 | s6 |
[####################] 100% clustering across | 2:05:50 | s6 |
[####################] 100% building clusters | 0:00:37 | s6 |
[####################] 100% aligning clusters | 0:02:56 | s6 |
[####################] 100% database indels | 0:00:50 | s6 |
[####################] 100% indexing clusters | 0:06:03 | s6 |
[####################] 100% building database | 0:41:25 | s6 |
We have several Samples that recovered very little data, probably as a result of having low quality DNA extractions. Figs are hard. We'll assemble one data set that includes all of these samples, but since they are likely to have little information we'll also assemble most of our data sets without these low data samples.
ficus_dhi = ip.load_json("analysis-ipyrad/ficus_dhi.json", 1)
ficus_dlo = ip.load_json("analysis-ipyrad/ficus_dlo.json", 1)
## get list of samples with >1000 consens reads
skeep = ficus_dlo.stats.index[ficus_dlo.stats.reads_consens > 1000].tolist()
## print who was excluded
print("excluded samples:\t\tconsens reads")
for name, dat in ficus_dlo.samples.items():
    if name not in skeep:
        print(" {:<22}\t{}".format(name, int(dat.stats.reads_consens)))
excluded samples:        consens reads
 C02_citrifolia          44
 C09_costaricana         189
 C52_citrifolia          269
 C34_triangle            374
## get list of samples with >5000 consens reads
fkeep = ficus_dhi.stats.index[ficus_dhi.stats.reads_consens > 5000].tolist()
## print who was excluded
print("excluded samples:\t\tconsens reads")
for name, dat in ficus_dhi.samples.items():
    if name not in fkeep:
        print(" {:<22}\t{}".format(name, int(dat.stats.reads_consens)))
excluded samples:        consens reads
 A07_obtusifolia         2209
 A62_turbinata           2331
 C02_citrifolia          25
 B103_obtusifolia        3851
 A26_popenoei            2303
 A63_turbinata           1009
 A75_colubrinae          1987
 A29_popenoei            955
 A28_popenoei            2304
 C33_triangle            971
 A38_nymphaeifolia       4648
 C09_costaricana         116
 A06_obtusifolia         4188
 C01_bullenei            2885
 C52_citrifolia          160
 C54_citrifolia          1024
 C32_triangleXtrigonata  1714
 A27_popenoei            1670
 C34_triangle            212
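The two exclusion lists printed above can be compared with plain Python sets. This is only a bookkeeping sketch using the sample names from the printed output; the variable names are illustrative, and the count of 95 Samples is taken from the raw read summary earlier in the notebook.

```python
# Samples excluded at the >1000 threshold (ficus_dlo output above)
excluded_lo = {"C02_citrifolia", "C09_costaricana", "C52_citrifolia",
               "C34_triangle"}

# Samples excluded at the >5000 threshold (ficus_dhi output above)
excluded_hi = {
    "A07_obtusifolia", "A62_turbinata", "C02_citrifolia", "B103_obtusifolia",
    "A26_popenoei", "A63_turbinata", "A75_colubrinae", "A29_popenoei",
    "A28_popenoei", "C33_triangle", "A38_nymphaeifolia", "C09_costaricana",
    "A06_obtusifolia", "C01_bullenei", "C52_citrifolia", "C54_citrifolia",
    "C32_triangleXtrigonata", "A27_popenoei", "C34_triangle",
}

n_samples = 95  # Samples in the ficus branch

# Samples dropped only by the stricter threshold
only_hi = sorted(excluded_hi - excluded_lo)
print("kept at >1000: {}".format(n_samples - len(excluded_lo)))
print("kept at >5000: {}".format(n_samples - len(excluded_hi)))
print("dropped only at >5000: {}".format(len(only_hi)))
```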
We also assemble the final data sets under different thresholds for missing data, set with the min_samples_locus parameter.
## final min4, min10, and min20
for assembly in [ficus_dlo, ficus_dhi]:
    for minsamp in [4, 10, 20]:
        ## new branch names
        fullname = assembly.name + "_f{}".format(minsamp)
        subname = assembly.name + "_s{}".format(minsamp)
        ## make new branches with subsampling
        fulldata = assembly.branch(fullname, subsamples=fkeep)
        subdata = assembly.branch(subname, subsamples=skeep)
        ## set minsamp value
        fulldata.set_params("min_samples_locus", minsamp)
        subdata.set_params("min_samples_locus", minsamp)
        ## run step7
        fulldata.run("7")
        subdata.run("7")
Assembly: ficus_dlo_f4 [####################] 100% filtering loci | 0:00:37 | s7 | [####################] 100% building loci/stats | 0:00:34 | s7 | [####################] 100% building alleles | 0:00:45 | s7 | [####################] 100% building vcf file | 0:01:41 | s7 | [####################] 100% writing vcf file | 0:00:01 | s7 | [####################] 100% building arrays | 0:00:30 | s7 | [####################] 100% writing outfiles | 0:01:32 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dlo_f4_outfiles Assembly: ficus_dlo_s4 [####################] 100% filtering loci | 0:00:44 | s7 | [####################] 100% building loci/stats | 0:00:34 | s7 | [####################] 100% building alleles | 0:00:42 | s7 | [####################] 100% building vcf file | 0:01:25 | s7 | [####################] 100% writing vcf file | 0:00:01 | s7 | [####################] 100% building arrays | 0:00:37 | s7 | [####################] 100% writing outfiles | 0:01:17 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dlo_s4_outfiles Assembly: ficus_dlo_f10 [####################] 100% filtering loci | 0:00:39 | s7 | [####################] 100% building loci/stats | 0:00:33 | s7 | [####################] 100% building alleles | 0:00:45 | s7 | [####################] 100% building vcf file | 0:01:16 | s7 | [####################] 100% writing vcf file | 0:00:01 | s7 | [####################] 100% building arrays | 0:00:28 | s7 | [####################] 100% writing outfiles | 0:00:58 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dlo_f10_outfiles Assembly: ficus_dlo_s10 [####################] 100% filtering loci | 0:00:43 | s7 | [####################] 100% building loci/stats | 0:00:34 | s7 | [####################] 100% building alleles | 0:00:43 | s7 | [####################] 100% building vcf file | 0:01:11 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:39 | 
s7 | [####################] 100% writing outfiles | 0:00:42 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dlo_s10_outfiles Assembly: ficus_dlo_f20 [####################] 100% filtering loci | 0:00:36 | s7 | [####################] 100% building loci/stats | 0:00:34 | s7 | [####################] 100% building alleles | 0:00:45 | s7 | [####################] 100% building vcf file | 0:01:08 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:28 | s7 | [####################] 100% writing outfiles | 0:00:35 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dlo_f20_outfiles Assembly: ficus_dlo_s20 [####################] 100% filtering loci | 0:00:44 | s7 | [####################] 100% building loci/stats | 0:00:33 | s7 | [####################] 100% building alleles | 0:00:43 | s7 | [####################] 100% building vcf file | 0:01:02 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:39 | s7 | [####################] 100% writing outfiles | 0:00:29 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dlo_s20_outfiles Assembly: ficus_dhi_f4 [####################] 100% filtering loci | 0:00:30 | s7 | [####################] 100% building loci/stats | 0:00:28 | s7 | [####################] 100% building alleles | 0:00:40 | s7 | [####################] 100% building vcf file | 0:01:24 | s7 | [####################] 100% writing vcf file | 0:00:01 | s7 | [####################] 100% building arrays | 0:00:23 | s7 | [####################] 100% writing outfiles | 0:01:26 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dhi_f4_outfiles Assembly: ficus_dhi_s4 [####################] 100% filtering loci | 0:00:37 | s7 | [####################] 100% building loci/stats | 0:00:28 | s7 | [####################] 100% building alleles | 0:00:38 | s7 | [####################] 100% 
building vcf file | 0:01:15 | s7 | [####################] 100% writing vcf file | 0:00:01 | s7 | [####################] 100% building arrays | 0:00:31 | s7 | [####################] 100% writing outfiles | 0:01:07 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dhi_s4_outfiles Assembly: ficus_dhi_f10 [####################] 100% filtering loci | 0:00:30 | s7 | [####################] 100% building loci/stats | 0:00:28 | s7 | [####################] 100% building alleles | 0:00:40 | s7 | [####################] 100% building vcf file | 0:01:06 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:23 | s7 | [####################] 100% writing outfiles | 0:00:53 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dhi_f10_outfiles Assembly: ficus_dhi_s10 [####################] 100% filtering loci | 0:00:37 | s7 | [####################] 100% building loci/stats | 0:00:28 | s7 | [####################] 100% building alleles | 0:00:37 | s7 | [####################] 100% building vcf file | 0:01:01 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:30 | s7 | [####################] 100% writing outfiles | 0:00:40 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dhi_s10_outfiles Assembly: ficus_dhi_f20 [####################] 100% filtering loci | 0:00:30 | s7 | [####################] 100% building loci/stats | 0:00:28 | s7 | [####################] 100% building alleles | 0:00:40 | s7 | [####################] 100% building vcf file | 0:00:57 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:23 | s7 | [####################] 100% writing outfiles | 0:00:35 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dhi_f20_outfiles Assembly: ficus_dhi_s20 [####################] 100% filtering loci | 0:00:36 | s7 | 
[####################] 100% building loci/stats | 0:00:27 | s7 | [####################] 100% building alleles | 0:00:37 | s7 | [####################] 100% building vcf file | 0:00:55 | s7 | [####################] 100% writing vcf file | 0:00:00 | s7 | [####################] 100% building arrays | 0:00:30 | s7 | [####################] 100% writing outfiles | 0:00:27 | s7 | Outfiles written to: ~/Documents/Ficus/analysis-ipyrad/ficus_dhi_s20_outfiles
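The nested loop above produces twelve final Assemblies. As a reading aid, this small sketch rebuilds just the branch names with the same string formatting, without calling ipyrad; the list `branch_names` is an illustrative name.

```python
# Rebuild the branch names generated by the step 7 loop
branch_names = []
for assembly_name in ["ficus_dlo", "ficus_dhi"]:
    for minsamp in [4, 10, 20]:
        branch_names.append("{}_f{}".format(assembly_name, minsamp))
        branch_names.append("{}_s{}".format(assembly_name, minsamp))

for name in branch_names:
    print(name)
```

Each name encodes the depth setting (dlo or dhi), the sample set (f or s), and the min_samples_locus value, and matches the order of the "Assembly:" lines in the output above.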
! head -n 180 ./analysis-ipyrad/ficus_dhi_s4_outfiles/ficus_dhi_s4_stats.txt
## The number of loci caught by each filter. ## ipyrad API location: [assembly].stats_dfs.s7_filters total_filters applied_order retained_loci total_prefiltered_loci 250234 0 250234 filtered_by_rm_duplicates 6458 6458 243776 filtered_by_max_indels 4887 4887 238889 filtered_by_max_snps 9954 4454 234435 filtered_by_max_shared_het 10985 9069 225366 filtered_by_min_sample 132699 131945 93421 filtered_by_max_alleles 17038 7317 86104 total_filtered_loci 86104 0 86104 ## The number of loci recovered for each Sample. ## ipyrad API location: [assembly].stats_dfs.s7_samples sample_coverage A01_paraensis 4709 A02_paraensis 9027 A04_paraensis 33358 A05_paraensis 11921 A16_citrifolia 2809 A18_citrifolia 2860 A19_citrifolia 7264 A33_nymphaeifolia 3585 A34_nymphaeifolia 249 A41_nymphaeifolia 24073 A42_nymphaeifolia 32284 A48_trigonata 6820 A49_trigonata 7227 A55_triangle 7578 A59_dugandii 28231 A60_dugandii 2289 A61_turbinata 5593 A65_pertusa 34404 A67_bullenei 15660 A69_bullenei 2343 A70_bullenei 11085 A71_bullenei 2005 A72_bullenei 3708 A77_colubrinae 27773 A82_perforata 22655 A83_perforata 22382 A84_perforata 34127 A85_perforata 20699 A87_costaricana 3125 A94_maxima 27514 A95_insipida 15990 A96_glabrata 31630 A97_glabrata 23116 B102_obtusifolia 33193 B118_maxima 2942 B119_maxima 26742 B120_maxima 29695 B123_maxima 22648 B126_insipida 35904 B127_insipida 27385 B128_insipida 35579 B130_glabrataXmaxima 35495 B131_glabrataXmaxima 36156 B133_glabrata 32823 B134_glabrata 35118 C04_colubrinae 32020 C11_costaricana 32709 C12_dugandii 9210 C14_dugandii 33631 C15_insipida 35105 C17_maxima 29734 C18_maxima 36576 C19_nymphaeifolia 32354 C21_obtusifolia 3433 C22_obtusifolia 34673 C24_obtusifolia 32523 C25_popenoei 32528 C26_popenoei 30708 C27_popenoei 18398 C28_pertusa 31449 C30_triangle 15571 C31_triangle 15198 C36_trigonata 35036 C37_trigonata 34513 C39_trigonata 33493 C41_trigonata 13542 C43_trigonata 34741 C45_yoponensis 29750 C46_yoponensis 35213 C47_yoponensis 35740 C48_tonduzii 
32862 C49_dugandii 29991 C50_insipida 36203 C51_perforata 31695 C53_citrifolia 4318 C5_colubrinae 35010 ## The number of loci for which N taxa have data. ## ipyrad API location: [assembly].stats_dfs.s7_loci locus_coverage sum_coverage 1 0 0 2 0 0 3 0 0 4 11887 11887 5 7632 19519 6 5250 24769 7 3751 28520 8 2800 31320 9 2237 33557 10 1919 35476 11 1754 37230 12 1560 38790 13 1552 40342 14 1500 41842 15 1574 43416 16 1799 45215 17 2033 47248 18 2320 49568 19 2631 52199 20 2425 54624 21 1888 56512 22 1250 57762 23 912 58674 24 845 59519 25 854 60373 26 841 61214 27 887 62101 28 916 63017 29 843 63860 30 827 64687 31 788 65475 32 705 66180 33 681 66861 34 665 67526 35 640 68166 36 541 68707 37 533 69240 38 589 69829 39 585 70414 40 635 71049 41 690 71739 42 753 72492 43 775 73267 44 935 74202 45 1023 75225 46 1091 76316 47 1086 77402 48 1154 78556 49 1119 79675 50 1088 80763 51 1127 81890 52 911 82801 53 840 83641 54 698 84339 55 566 84905 56 411 85316 57 311 85627 58 217 85844 59 124 85968 60 66 86034 61 41 86075 62 15 86090 63 12 86102 64 1 86103 65 1 86104 66 0 86104 67 0 86104 68 0 86104 69 0 86104 70 0 86104 71 0 86104 72 0 86104 73 0 86104 74 0 86104 75 0 86104 76 0 86104
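The filter table in the stats file above can be read as a running subtraction: each filter's applied_order count is removed from the loci retained so far. The sketch below reproduces the retained_loci column for ficus_dhi_s4 from the numbers printed above; the variable names are illustrative.

```python
# total_prefiltered_loci for ficus_dhi_s4
prefiltered = 250234

# (filter, applied_order) pairs, in the order the filters are applied
applied = [
    ("rm_duplicates", 6458),
    ("max_indels", 4887),
    ("max_snps", 4454),
    ("max_shared_het", 9069),
    ("min_sample", 131945),
    ("max_alleles", 7317),
]

retained = prefiltered
retained_by_filter = []
for name, removed in applied:
    retained -= removed
    retained_by_filter.append(retained)
    print("{:<16} {:>7} removed, {:>7} retained".format(name, removed, retained))
```

The min_sample filter removes by far the most loci here, which is why the min_samples_locus setting has such a large effect on the size of the final data set.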
! head -n 200 ./analysis-ipyrad/ficus_dlo_f4_outfiles/ficus_dlo_f4_stats.txt
## The number of loci caught by each filter. ## ipyrad API location: [assembly].stats_dfs.s7_filters total_filters applied_order retained_loci total_prefiltered_loci 301811 0 301811 filtered_by_rm_duplicates 12031 12031 289780 filtered_by_max_indels 5165 5165 284615 filtered_by_max_snps 14799 5918 278697 filtered_by_max_shared_het 10960 8745 269952 filtered_by_min_sample 170692 168518 101434 filtered_by_max_alleles 17052 6960 94474 total_filtered_loci 94474 0 94474 ## The number of loci recovered for each Sample. ## ipyrad API location: [assembly].stats_dfs.s7_samples sample_coverage A01_paraensis 6133 A02_paraensis 11534 A04_paraensis 35023 A05_paraensis 14597 A06_obtusifolia 605 A07_obtusifolia 455 A16_citrifolia 3664 A18_citrifolia 3804 A19_citrifolia 9194 A26_popenoei 542 A27_popenoei 391 A28_popenoei 495 A29_popenoei 195 A33_nymphaeifolia 4610 A34_nymphaeifolia 375 A38_nymphaeifolia 818 A41_nymphaeifolia 27315 A42_nymphaeifolia 33879 A48_trigonata 8600 A49_trigonata 8952 A55_triangle 9528 A59_dugandii 30943 A60_dugandii 3045 A61_turbinata 7101 A62_turbinata 313 A63_turbinata 144 A65_pertusa 35730 A67_bullenei 18658 A69_bullenei 3098 A70_bullenei 13800 A71_bullenei 2754 A72_bullenei 4768 A75_colubrinae 827 A77_colubrinae 30393 A82_perforata 25654 A83_perforata 25691 A84_perforata 35083 A85_perforata 23587 A87_costaricana 4064 A94_maxima 30387 A95_insipida 19263 A96_glabrata 34228 A97_glabrata 26578 B102_obtusifolia 34419 B103_obtusifolia 1485 B118_maxima 3935 B119_maxima 30186 B120_maxima 32218 B123_maxima 26420 B126_insipida 36861 B127_insipida 30207 B128_insipida 36712 B130_glabrataXmaxima 37486 B131_glabrataXmaxima 37064 B133_glabrata 34513 B134_glabrata 36393 C01_bullenei 460 C02_citrifolia 11 C04_colubrinae 34558 C09_costaricana 26 C11_costaricana 33770 C12_dugandii 11363 C14_dugandii 34604 C15_insipida 35995 C17_maxima 30679 C18_maxima 37328 C19_nymphaeifolia 33252 C21_obtusifolia 4365 C22_obtusifolia 35408 C24_obtusifolia 34171 C25_popenoei 33331 
C26_popenoei 32015 C27_popenoei 21479 C28_pertusa 32119 C30_triangle 17738 C31_triangle 18362 C32_triangleXtrigonata 215 C33_triangle 286 C34_triangle 54 C36_trigonata 35796 C37_trigonata 35531 C39_trigonata 34858 C41_trigonata 16630 C43_trigonata 35561 C45_yoponensis 30696 C46_yoponensis 35982 C47_yoponensis 36405 C48_tonduzii 33723 C49_dugandii 31698 C50_insipida 37261 C51_perforata 33538 C52_citrifolia 64 C53_citrifolia 5357 C54_citrifolia 235 C5_colubrinae 36321 ## The number of loci for which N taxa have data. ## ipyrad API location: [assembly].stats_dfs.s7_loci locus_coverage sum_coverage 1 0 0 2 0 0 3 0 0 4 14735 14735 5 9272 24007 6 6292 30299 7 4315 34614 8 3163 37777 9 2482 40259 10 2065 42324 11 1793 44117 12 1600 45717 13 1485 47202 14 1513 48715 15 1452 50167 16 1593 51760 17 1807 53567 18 2047 55614 19 2440 58054 20 2723 60777 21 2464 63241 22 1599 64840 23 950 65790 24 853 66643 25 836 67479 26 798 68277 27 752 69029 28 841 69870 29 869 70739 30 850 71589 31 823 72412 32 765 73177 33 716 73893 34 699 74592 35 635 75227 36 617 75844 37 550 76394 38 529 76923 39 522 77445 40 530 77975 41 561 78536 42 592 79128 43 664 79792 44 745 80537 45 814 81351 46 888 82239 47 956 83195 48 1035 84230 49 1068 85298 50 1134 86432 51 1117 87549 52 1140 88689 53 1096 89785 54 979 90764 55 908 91672 56 774 92446 57 593 93039 58 480 93519 59 371 93890 60 242 94132 61 157 94289 62 87 94376 63 50 94426 64 34 94460 65 7 94467 66 5 94472 67 1 94473 68 1 94474 69 0 94474 70 0 94474 71 0 94474 72 0 94474 73 0 94474 74 0 94474 75 0 94474 76 0 94474 77 0 94474 78 0 94474 79 0 94474
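In the last table of each stats file, sum_coverage is the running total of locus_coverage. A short sketch using the first four non-zero rows of the ficus_dlo_f4 table above (loci shared by exactly 4, 5, 6, and 7 Samples) shows the relationship; the variable names are illustrative.

```python
# locus_coverage for N = 4, 5, 6, 7 Samples in ficus_dlo_f4
locus_coverage = [14735, 9272, 6292, 4315]

# Running total, which should match the sum_coverage column
sum_coverage = []
running = 0
for count in locus_coverage:
    running += count
    sum_coverage.append(running)

print(sum_coverage)
```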