The double digest RADseq (ddRAD) library preparation method was developed and described by Peterson et al. (2012).
A ddRAD library is prepared by cutting genomic DNA with two different restriction enzymes and selecting the intervening fragments that fall within a certain size window. Each fragment carries the overhang from the respective cut site on either side. One end is ligated to a barcode plus the Illumina adapter, and the other end is ligated to only the reverse Illumina adapter. Depending on the size-selection window, the far-end adapter and cut site may or may not appear in the sequenced portion of the fragment (the read).
For this tutorial I simulated ddRAD data on a 12-taxon species tree, shown below. You can download the data using the script below and assemble them by following along with the instructions.
%%bash
## download the data
wget -q http://www.dereneaton.com/downloads/simddrads.zip
## unzip the data
unzip simddrads.zip
Archive:  simddrads.zip
  inflating: simddrads.barcodes
  inflating: simddrads_R1_.fastq.gz
These data were simulated with a size selection window between 50-250 bp, meaning that many of the reads will contain the far-end adapter sequence within the 100 bp sequenced read. I included this because it is a fairly common occurrence in empirical ddRAD data sets. I will show how to filter out these adapters in step 2 of the pyrad analysis.
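Before assembling anything, you can get a rough sense of how often this happens by searching reads for the second cut-site overhang followed by the start of the adapter. The two toy reads below are illustrative (not taken from the simulated file), and the adapter prefix AGATCGG is the common Illumina read-through sequence; check which adapter your kit uses.

```shell
## toy reads: the second comes from a short fragment, so the far-end cut
## site (AATT) runs directly into the adapter inside the sequenced read
reads='TGCAGACTCAATTGACAGGATCTTGAGCGGGTGCGTCAGTAACC
TGCAGACTCGACAGGATCTTGAGCGGAATTAGATCGGAAGAGCA'

## count reads containing cut-site overhang + adapter start
printf '%s\n' "$reads" | grep -c 'AATTAGATCGG'
```

On the real data you would run the same `grep -c` on the decompressed fastq to estimate the fraction of adapter-contaminated reads.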
We first create an empty params.txt input file for the pyRAD analysis. The following command will create a template which we will fill in with all relevant parameter settings for the analysis.
%%bash
pyrad -n
new params.txt file created
Take a look at the default options. Each line designates a parameter; a "##" symbol marks the start of a comment describing the parameter, and the affected step(s) for each parameter are shown in parentheses. The first 14 parameters are required; numbers 15-37 are optional.
%%bash
cat params.txt
==** parameter inputs for pyRAD version 3.0.4  **======================== affected step ==
./                        ## 1. Working directory                                 (all)
./*.fastq.gz              ## 2. Loc. of non-demultiplexed files (if not line 16)  (s1)
./*.barcodes              ## 3. Loc. of barcode file (if not line 16)             (s1)
vsearch                   ## 4. command (or path) to call vsearch (or usearch)    (s3,s6)
muscle                    ## 5. command (or path) to call muscle                  (s3,s7)
TGCAG                     ## 6. Restriction overhang (e.g., C|TGCAG -> TGCAG)     (s1,s2)
2                         ## 7. N processors (parallel)                           (all)
6                         ## 8. Mindepth: min coverage for a cluster              (s4,s5)
4                         ## 9. NQual: max # sites with qual < 20 (or see line 20)(s2)
.88                       ## 10. Wclust: clustering threshold as a decimal        (s3,s6)
rad                       ## 11. Datatype: rad,gbs,ddrad,pairgbs,pairddrad,merged (all)
4                         ## 12. MinCov: min samples in a final locus             (s7)
3                         ## 13. MaxSH: max inds with shared hetero site          (s7)
c88d6m4p3                 ## 14. Prefix name for final output (no spaces)         (s7)
==== optional params below this line ===================================  affected step ==
                       ## 15.opt.: select subset (prefix* only selector)            (s2-s7)
                       ## 16.opt.: add-on (outgroup) taxa (list or prefix*)         (s6,s7)
                       ## 17.opt.: exclude taxa (list or prefix*)                   (s7)
                       ## 18.opt.: loc. of de-multiplexed data                      (s2)
                       ## 19.opt.: maxM: N mismatches in barcodes (def= 1)          (s1)
                       ## 20.opt.: phred Qscore offset (def= 33)                    (s2)
                       ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict   (s2)
                       ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5)
                       ## 23.opt.: maxN: max Ns in a cons seq (def=5)               (s5)
                       ## 24.opt.: maxH: max heterozyg. sites in cons seq (def=5)   (s5)
                       ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5)
                       ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100)         (s7)
                       ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7)
                       ## 28.opt.: random number seed (def. 112233)                 (s3,s6,s7)
                       ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7)
                       ## 30.opt.: output formats: p,n,a,s,v,u,t,m,k,g,* (see docs) (s7)
                       ## 31.opt.: maj. base call at depth>x<mindepth (def.x=mindepth) (s5)
                       ## 32.opt.: keep trimmed reads (def=0). Enter min length.    (s2)
                       ## 33.opt.: max stack size (int), def= max(500,mean+2*SD)    (s3)
                       ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3)
                       ## 35.opt.: use hierarchical clustering (def.=0, 1=yes)      (s6)
                       ## 36.opt.: repeat masking (def.=1='dust' method, 0=no)      (s3,s6)
                       ## 37.opt.: vsearch max threads per job (def.=6; see docs)   (s3,s6)
==== optional: list group/clade assignments below this line (see docs) ==================
I will use the script below to substitute new values, but you can simply use any text editor to make the changes. For this analysis I made the following changes from the defaults:
6. set the two restriction enzymes used to generate the ddRAD data
10. lowered clustering threshold to .85
11. set datatype to ddrad
14. changed the output name prefix
19. mismatches for demultiplexing set to 0, meaning barcodes must match exactly.
24. raised maxH (note: lower values are more stringent for filtering paralogs).
30. added additional output formats (e.g., nexus,SNPs,STRUCTURE)
%%bash
sed -i '/## 6. /c\TGCAG,AATT ## 6. cutsites... ' ./params.txt
sed -i '/## 10. /c\.85 ## 10. lowered clust thresh' ./params.txt
sed -i '/## 11. /c\ddrad ## 11. datatype... ' ./params.txt
sed -i '/## 14. /c\c85d6m4p3 ## 14. prefix name ' ./params.txt
sed -i '/## 19./c\0 ## 19. errbarcode... ' ./params.txt
sed -i '/## 24./c\10 ## 24. maxH... ' ./params.txt
sed -i '/## 30./c\* ## 30. outformats... ' ./params.txt
Let's take a look at the edited params.txt file
%%bash
cat params.txt
==** parameter inputs for pyRAD version 3.0.4  **======================== affected step ==
./                        ## 1. Working directory                                 (all)
./*.fastq.gz              ## 2. Loc. of non-demultiplexed files (if not line 16)  (s1)
./*.barcodes              ## 3. Loc. of barcode file (if not line 16)             (s1)
vsearch                   ## 4. command (or path) to call vsearch (or usearch)    (s3,s6)
muscle                    ## 5. command (or path) to call muscle                  (s3,s7)
TGCAG,AATT                ## 6. cutsites...
2                         ## 7. N processors (parallel)                           (all)
6                         ## 8. Mindepth: min coverage for a cluster              (s4,s5)
4                         ## 9. NQual: max # sites with qual < 20 (or see line 20)(s2)
.85                       ## 10. lowered clust thresh
ddrad                     ## 11. datatype...
4                         ## 12. MinCov: min samples in a final locus             (s7)
3                         ## 13. MaxSH: max inds with shared hetero site          (s7)
c85d6m4p3                 ## 14. prefix name
==== optional params below this line ===================================  affected step ==
                       ## 15.opt.: select subset (prefix* only selector)            (s2-s7)
                       ## 16.opt.: add-on (outgroup) taxa (list or prefix*)         (s6,s7)
                       ## 17.opt.: exclude taxa (list or prefix*)                   (s7)
                       ## 18.opt.: loc. of de-multiplexed data                      (s2)
0                      ## 19. errbarcode...
                       ## 20.opt.: phred Qscore offset (def= 33)                    (s2)
                       ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict   (s2)
                       ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5)
                       ## 23.opt.: maxN: max Ns in a cons seq (def=5)               (s5)
10                     ## 24. maxH...
                       ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5)
                       ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100)         (s7)
                       ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7)
                       ## 28.opt.: random number seed (def. 112233)                 (s3,s6,s7)
                       ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7)
*                      ## 30. outformats...
                       ## 31.opt.: maj. base call at depth>x<mindepth (def.x=mindepth) (s5)
                       ## 32.opt.: keep trimmed reads (def=0). Enter min length.    (s2)
                       ## 33.opt.: max stack size (int), def= max(500,mean+2*SD)    (s3)
                       ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3)
                       ## 35.opt.: use hierarchical clustering (def.=0, 1=yes)      (s6)
                       ## 36.opt.: repeat masking (def.=1='dust' method, 0=no)      (s3,s6)
                       ## 37.opt.: vsearch max threads per job (def.=6; see docs)   (s3,s6)
==== optional: list group/clade assignments below this line (see docs) ==================
Your data will likely come to you non-demultiplexed (i.e., not sorted by the individual the reads belong to). You can demultiplex the reads according to their barcodes using pyRAD or separate software. Here pyRAD uses information from the barcodes file to demultiplex the reads.
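Conceptually, demultiplexing is just a lookup of each read's leading bases against the barcode list. The sketch below uses exact matching (the behavior we get with maxM set to 0 on line 19) and only two of the twelve barcodes, purely for illustration:

```shell
## toy demultiplexer: report the sample whose barcode starts the read,
## or "nomatch" if none does (only two barcodes shown for brevity)
barcode_of() {
  case "$1" in
    CATCAT*) echo 1A0 ;;
    GGGAAT*) echo 1B0 ;;
    *)       echo nomatch ;;
  esac
}

barcode_of 'CATCATTGCAGGCGAAGG'   # read starting with 1A0's barcode
barcode_of 'TTTTTTTGCAGGCGAAGG'   # unknown barcode
```

pyRAD does the same thing across the whole fastq, and additionally tolerates up to maxM mismatches when that parameter is greater than 0.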
%%bash
cat simddrads.barcodes
1A0     CATCAT
1B0     GGGAAT
1C0     AAGGTG
1D0     GTTTTA
2E0     GTTAGT
2F0     TATAGT
2G0     TGTGAA
2H0     TGTGGT
3I0     TATAAG
3J0     GGAAAG
3K0     GTGTTT
3L0     GGAGGT
%%bash
pyrad -p params.txt -s 1
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 1: sorting reads by barcode
 .
Now we can look at the stats output for this below:
%%bash
cat stats/s1.sorting.txt
file                   Nreads    cut_found   bar_matched
simddrads_.fastq.gz    240000    240000      240000

sample    true_bar   obs_bars   N_obs
1C0       AAGGTG     AAGGTG     20000
1A0       CATCAT     CATCAT     20000
3J0       GGAAAG     GGAAAG     20000
3L0       GGAGGT     GGAGGT     20000
1B0       GGGAAT     GGGAAT     20000
3K0       GTGTTT     GTGTTT     20000
2E0       GTTAGT     GTTAGT     20000
1D0       GTTTTA     GTTTTA     20000
3I0       TATAAG     TATAAG     20000
2F0       TATAGT     TATAGT     20000
2G0       TGTGAA     TGTGAA     20000
2H0       TGTGGT     TGTGGT     20000
nomatch   _          0
The demultiplexed reads are written to a new file for each individual, placed in a directory called fastq/ created within your working directory.
%%bash
ls fastq/
1A0_R1.fq.gz 1B0_R1.fq.gz 1C0_R1.fq.gz 1D0_R1.fq.gz 2E0_R1.fq.gz 2F0_R1.fq.gz 2G0_R1.fq.gz 2H0_R1.fq.gz 3I0_R1.fq.gz 3J0_R1.fq.gz 3K0_R1.fq.gz 3L0_R1.fq.gz
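A FASTQ file stores four lines per read, so reads = lines / 4. This makes an easy sanity check that each sample received its expected number of reads. The demo file below is built on the spot so the snippet is self-contained; on the real data you would run the same count on fastq/1A0_R1.fq.gz and expect 20000.

```shell
## build a tiny two-read gzipped fastq to count
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' | gzip > demo.fq.gz

## reads = total lines / 4
echo $(( $(gunzip -c demo.fq.gz | wc -l) / 4 ))
```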
An individual file will look like the example below:
%%bash
## show the first 12 lines, cut to 80 characters to print clearly here
## (gunzip -c decompresses to stdout; less behaves like cat in a pipe)
gunzip -c fastq/1A0_R1.fq.gz | head -n 12 | cut -c 1-80
@lane1_fakedata0_R1_0 1:N:0:
TGCAGGCGAAGGTATATTGAGCGTCTTAAGCCCCCGTGGTACTTTCTGCTCTTAGCCCATTATCTTGTCTACGAATTCAT
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_fakedata0_R1_1 1:N:0:
TGCAGGCGAAGGTATATTGAGCGTCTTAAGCCCCCGTGGTACTTTCTGCTCTTAGCCCATTATCTTGTCTACGAATTCAT
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_fakedata0_R1_2 1:N:0:
TGCAGGCGAAGGTATATTGAGCGTCTTAAGCCCCCGTGGTACTTTCTGCTCTTAGCCCATTATCTTGTCTACGAATTCAT
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
If your data were previously demultiplexed you need the following things before step 2:
your sorted file names should be formatted similarly to those above, but with your sample names substituted for 1A0, 1A1, etc.
the files can be gzipped (.gz) or not (.fq or .fastq).
the barcodes should already be removed (i.e., not present on the left side of reads).
the restriction site should not be removed, but if it is, enter a '@' symbol before the location of your sorted data.
enter the location of your sorted data on line 18 of the params file.
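Editing line 18 works the same way as the earlier sed substitutions. The snippet below writes a one-line stand-in params file (demo_params.txt) so it runs on its own; the path ./mysorted/ is hypothetical. On the real params.txt you would drop the first line and point sed at ./params.txt instead. (GNU sed syntax; on macOS/BSD sed use `-i ''`.)

```shell
## stand-in for the real params.txt line 18
echo '                ## 18.opt.: loc. of de-multiplexed data (s2)' > demo_params.txt

## point line 18 at already-sorted data (prepend '@' to the path if the
## restriction site was also removed from your reads)
sed -i '/## 18./c\./mysorted/*.fq.gz ## 18. loc. of sorted data' demo_params.txt

cat demo_params.txt
```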
Next we apply the quality filtering.
Here we set the quality filter (param 21) to 1, meaning that it will check for the presence of Illumina adapters and trim them off. We also set the minimum length of kept reads after trimming (param 32) to 35.
In addition to filtering for adapters, this step also filters based on quality scores. Low quality sites are converted to Ns, and any locus with more than X number of Ns is discarded, where X is the number set in param 9.
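The N-masking logic can be sketched in a few lines of awk. The phred+33 offset and Q20 cutoff below match the defaults on params lines 20 and 9; the toy sequence and quality string are made up for illustration:

```shell
## mask bases whose quality score is below Q20 with N
## (phred+33 encoding: score = ASCII code - 33; 'I' = Q40, '#' = Q2)
seq='ACGTACGT'; qual='IIII##II'

awk -v s="$seq" -v q="$qual" 'BEGIN {
  for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c   # ASCII lookup table
  for (i = 1; i <= length(s); i++)
    out = out ((ord[substr(q, i, 1)] - 33 >= 20) ? substr(s, i, 1) : "N")
  print out
}'
```

A read whose masked copy ends up with more Ns than the NQual limit (param 9) would then be discarded.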
%%bash
sed -i '/## 21./c\1 ## 21. filter setting ' ./params.txt
sed -i '/## 32./c\35 ## 32. min trim length' ./params.txt
%%bash
pyrad -p params.txt -s 2
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 2: editing raw reads
............
Statistics for the number of reads that passed filtering can be found in the stats/ directory. A few reads were filtered out erroneously because the common cutter overhang (AATT) happened to sit adjacent to sequence resembling the start of the adapter. In this case 3 loci (~60 read copies) per sample were excluded. About 200 loci (20%) had the adapter detected but were still retained after trimming.
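Those per-locus figures follow from the per-read counts with a little arithmetic (assuming ~20 read copies per locus, as in these simulated data):

```shell
## reads trimmed for adapters (from sample 1A0's stats) / depth per locus
## ~= number of loci carrying the adapter
echo $(( 4136 / 20 ))
```

which matches the "about 200 loci" quoted above; the same division on the 60 reads lost per sample gives the 3 excluded loci.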
%%bash
cat stats/s2.rawedit.txt
sample   Nreads   passed   passed.w.trim   passed.total
1A0      20000    15804    4136            19940
1B0      20000    15802    4139            19941
1C0      20000    15820    4120            19940
1D0      20000    15798    4142            19940
2E0      20000    15799    4140            19939
2F0      20000    15804    4137            19941
2G0      20000    15800    4140            19940
2H0      20000    15810    4131            19941
3I0      20000    15799    4141            19940
3J0      20000    15801    4139            19940
3K0      20000    15780    4160            19940
3L0      20000    15821    4119            19940

    Nreads = total number of reads for a sample
    passed = retained reads that passed quality filtering at full length
    passed.w.trim = retained reads that were trimmed due to detection of adapters
    passed.total = total kept reads of sufficient length
    note: you can set the option in params file to include trimmed reads of xx length.
The filtered data files are converted to fasta format and written to a directory called edits/. I show this below:
%%bash
ls edits/
1A0.edit 1B0.edit 1C0.edit 1D0.edit 2E0.edit 2F0.edit 2G0.edit 2H0.edit 3I0.edit 3J0.edit 3K0.edit 3L0.edit
The first few edited reads from one sample look like this:
%%bash
head -n 8 edits/1A0.edit
>1A0_0_r1
TGCAGACTCAATTGACAGGATCTTGAGCGGGTGCGTCAGTAACCGAAGACCCCTGAGGCTACTTGAGGGCTGATCAACATAGCCAATTTAAGTG
>1A0_1_r1
TGCAGACTCAATTGACAGGATCTTGAGCGGGTGCGTCAGTAACCGAAGACGCCTGAGGCTACTTGAGGGCTGATCAACATAGCCAATTTAAGTG
>1A0_2_r1
TGCAGACTCAATTGACAGGATCTTGAGCGGGTGCGTCAGTAACCGAAGACCCCTGAGGCTACTTGAGGGCTGATCAACATAGCCAATTTAAGTG
>1A0_3_r1
TGCAGACTCAATTGACAGGATCTTGAGCGGGTGCGTCAGTAACCGAAGACCCCTGAGGCTACTTGAGGGCTGATCAACATAGCCAATTTAAGTG
Steps 3-7 proceed the same as in the original 'rad' datatype, with the only difference being that in step 7 the second cut site is trimmed from the end of loci when it is present.
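That end-trimming can be mimicked with an anchored sed substitution. The locus sequence below is a made-up example; anchoring the pattern with `$` ensures only a trailing cut-site remnant is removed, never an internal AATT:

```shell
## strip a trailing second-cut-site remnant (AATT) from a locus;
## the internal AATT near the start is untouched because of the $ anchor
echo 'TGCAGACTCAATTGACAGGATCTTGAGCGGAATT' | sed 's/AATT$//'
```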
%%bash
pyrad -p params.txt -s 3
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
de-replicating files for clustering...
step 3: within-sample clustering of 12 samples at '.85' similarity.
Running 2 parallel jobs with up to 6 threads per job.
If needed, adjust to avoid CPU and MEM limits
sample 3L0 finished, 997 loci
sample 1C0 finished, 998 loci
sample 2H0 finished, 999 loci
sample 1B0 finished, 999 loci
sample 2F0 finished, 998 loci
sample 1A0 finished, 1000 loci
sample 3J0 finished, 997 loci
sample 2G0 finished, 997 loci
sample 1D0 finished, 997 loci
sample 3I0 finished, 997 loci
sample 2E0 finished, 999 loci
sample 3K0 finished, 997 loci
%%bash
head -n 23 stats/s3.clusters.txt
taxa   total   dpt.me   dpt.sd   d>5.tot   d>5.me   d>5.sd   badpairs
1A0    1000    19.94    1.024    997       19.996   0.077    0
1B0    999     19.961   0.85     997       19.999   0.032    0
1C0    998     19.98    0.602    997       19.999   0.032    0
1D0    997     20.0     0.0      997       20.0     0.0      0
2E0    999     19.959   0.851    997       19.997   0.055    0
2F0    998     19.981   0.601    997       20.0     0.0      0
2G0    997     20.0     0.0      997       20.0     0.0      0
2H0    999     19.961   0.85     997       19.999   0.032    0
3I0    997     20.0     0.0      997       20.0     0.0      0
3J0    997     20.0     0.0      997       20.0     0.0      0
3K0    997     20.0     0.0      997       20.0     0.0      0
3L0    997     20.0     0.0      997       20.0     0.0      0

## total = total number of clusters, including singletons
## dpt.me = mean depth of clusters
## dpt.sd = standard deviation of cluster depth
## >N.tot = number of clusters with depth greater than N
## >N.me = mean depth of clusters with depth greater than N
## >N.sd = standard deviation of cluster depth for clusters with depth greater than N
## badpairs = mismatched 1st & 2nd reads (only for paired ddRAD data)
We next make consensus base calls for each cluster within each individual. First we estimate the error rate and heterozygosity within each sample:
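At its simplest, a consensus call at one site is a majority vote across the reads in a cluster, which the pipeline below sketches. The real step 5 is more sophisticated: it uses a binomial model with the estimated error rate E and heterozygosity H to distinguish true heterozygous sites from sequencing errors, which this toy vote does not implement.

```shell
## majority base across six reads at one site (toy column of bases):
## count each base, sort by count, report the most frequent
printf '%s\n' A A A C A A | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
```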
%%bash
pyrad -p params.txt -s 4
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 4: estimating error rate and heterozygosity
............
%%bash
pyrad -p params.txt -s 5
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 5: creating consensus seqs for 12 samples, using H=0.00143 E=0.00050
............
%%bash
cat stats/s5.consens.txt
taxon   nloci   f1loci   f2loci   nsites   npoly   poly
1C0     998     997      997      79620    121     0.0015197
2F0     998     997      996      79510    126     0.0015847
1A0     1000    997      997      79592    112     0.0014072
2E0     999     997      997      79594    117     0.00147
1B0     999     997      997      79594    115     0.0014448
3K0     997     997      997      79571    124     0.0015584
2G0     997     997      997      79592    123     0.0015454
3L0     997     997      997      79611    134     0.0016832
2H0     999     997      997      79601    126     0.0015829
3I0     997     997      997      79594    93      0.0011684
1D0     997     997      997      79592    134     0.0016836
3J0     997     997      997      79591    127     0.0015957

## nloci = number of loci
## f1loci = number of loci with >N depth coverage
## f2loci = number of loci with >N depth and passed paralog filter
## nsites = number of sites across f loci
## npoly = number of polymorphic sites in nsites
## poly = frequency of polymorphic sites
This step clusters consensus sequences across samples. It can take a long time for very large data sets. If run normally it will print its progress to the screen so you can judge how long it might take. This example will take less than a minute.
%%bash
pyrad -p params.txt -s 6
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 6: clustering across 12 samples at '.85' similarity

vsearch v1.0.7_linux_x86_64, 7.5GB RAM, 4 cores
https://github.com/torognes/vsearch

Reading file /home/deren/Dropbox/Public/PyRAD_TUTORIALS/tutorial_pairddRAD/clust.85/cat.haplos_ 100%
1074692 nt in 11963 seqs, min 49, max 97, avg 90
Indexing sequences 100%
Counting unique k-mers 100%
Clustering 100%
Writing clusters 100%
Clusters: 997 Size min 11, max 12, avg 12.0
Singletons: 0, 0.0% of seqs, 0.0% of clusters
finished clustering
Step 7 aligns loci, computes statistics, and writes output files in various formats.
%%bash
pyrad -p params.txt -s 7
------------------------------------------------------------
 pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------

ingroup 1A0,1B0,1C0,1D0,2E0,2F0,2G0,2H0,3I0,3J0,3K0,3L0
addon
exclude
...
final stats written to:
 /home/deren/Dropbox/Public/PyRAD_TUTORIALS/tutorial_pairddRAD/stats/c85d6m4p3.stats
output files being written to:
 /home/deren/Dropbox/Public/PyRAD_TUTORIALS/tutorial_pairddRAD/outfiles/ directory

writing nexus file
writing phylip file
  + writing full SNPs file
  + writing unlinked SNPs file
  + writing STRUCTURE file
  + writing geno file
  ** must enter group/clade assignments for treemix output
writing vcf file
writing alleles file
  ** must enter group/clade assignments for migrate-n output
The stats for this run are below. In this case 996 loci were shared across all 12 samples (remember that a few loci were removed during the adapter filtering in step 2).
%%bash
cat stats/c85d6m4p3.stats
997        ## loci with > minsp containing data
997        ## loci with > minsp containing data & paralogs removed
997        ## loci with > minsp containing data & paralogs removed & final filtering

## number of loci recovered in final data set for each taxon.
taxon   nloci
1A0     997
1B0     997
1C0     997
1D0     997
2E0     997
2F0     996
2G0     997
2H0     997
3I0     997
3J0     997
3K0     997
3L0     997

## nloci = number of loci with data for exactly ntaxa
## ntotal = number of loci for which at least ntaxa have data
ntaxa   nloci   saved   ntotal
1       -
2       -       -
3       -       -
4       0       *       997
5       0       *       997
6       0       *       997
7       0       *       997
8       0       *       997
9       0       *       997
10      0       *       997
11      1       *       997
12      996     *       996

## nvar = number of loci containing n variable sites (pis+autapomorphies).
## sumvar = sum of variable sites (SNPs).
## pis = number of loci containing n parsimony informative sites.
## sumpis = sum of parsimony informative sites.
        nvar    sumvar  PIS     sumPIS
0       21      0       295     0
1       78      78      332     332
2       168     414     222     776
3       187     975     104     1088
4       204     1791    30      1208
5       160     2591    12      1268
6       94      3155    2       1280
7       38      3421    0       1280
8       33      3685    0       1280
9       6       3739    0       1280
10      6       3799    0       1280
11      2       3821    0       1280
total var= 3821
total pis= 1280
sampled unlinked SNPs= 976
sampled unlinked bi-allelic SNPs= 928
%%bash
ls outfiles/
c85d6m4p3.alleles c85d6m4p3.excluded_loci c85d6m4p3.geno c85d6m4p3.loci c85d6m4p3.nex c85d6m4p3.phy c85d6m4p3.snps c85d6m4p3.str c85d6m4p3.unlinked_snps c85d6m4p3.vcf
%%bash
ls stats/
c85d6m4p3.stats Pi_E_estimate.txt s1.sorting.txt s2.rawedit.txt s3.clusters.txt s5.consens.txt