This tutorial is meant as a walkthrough for a single-end RADseq analyses. If you have not yet read the full tutorial, you should start there for a broader description of how pyRAD works. If you are new to RADseq analyses, this tutorial will provide a simple overview of how to execute pyRAD, what the data files look like, and how to check that your analysis is working, and the expected output formats.
Each cell in this tutorial begins with the header (%%bash) indicating that the code should be executed in a command line shell, for example by copying and pasting the text into your terminal (but excluding the %%bash header).
Begin by executing the command below. This will download an example simulated RADseq data set and unarchive it into your current directory.
%%bash
wget -q dereneaton.com/downloads/simRADs.zip
unzip simRADs.zip
Archive: simRADs.zip inflating: simRADs.barcodes inflating: simRADs_R1.fastq.gz
%%bash
## I have pyRAD in my $PATH so that I can call it by simply typing pyRAD.
## If you haven't done this then you will need to type the full path to
## the pyRAD script to execute it.
## call pyRAD with the (-n) option
pyRAD -n
new params.txt file created
The params file lists on each line one parameter followed by a ## mark, after which any comments can be left. In the comments section there is a description of the parameter and in parentheses the step of the analysis affected by the parameter. Lines 1-12 are required, the remaining lines are optional. The params.txt file is further described in the general tutorial.
%%bash
cat params.txt
==** parameter inputs for pyRAD version 3.0a **========================== affected step == ./ ## 1. Working directory (all) ./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 16) (s1) ./*.barcodes ## 3. Loc. of barcode file (if not line 16) (s1) vsearch ## 4. command (or path) to call vsearch (or usearch) (s3,s6) muscle ## 5. command (or path) to call muscle (s3,s7) TGCAG ## 6. Restriction overhang (e.g., C|TGCAG -> TGCAG) (s1,s2) 2 ## 7. N processors (parallel) (all) 6 ## 8. Mindepth: min coverage for a cluster (s4,s5) 4 ## 9. NQual: max # sites with qual < 20 (line 18) (s2) .88 ## 10. Wclust: clustering threshold as a decimal (s3,s6) rad ## 11. Datatype: rad,gbs,ddrad,pairgbs,pairddrad,merge (all) 4 ## 12. MinCov: min samples in a final locus (s7) 3 ## 13. MaxSH: max inds with shared hetero site (s7) c88d6m4p3 ## 14. Prefix name for final output (no spaces) (s7) ==== optional params below this line =================================== affected step == ## 15.opt.: select subset (prefix* only selector) (s2-s7) ## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7) ## 17.opt.: exclude taxa (list or prefix*) (s7) ## 18.opt.: loc. of de-multiplexed data (s2) ## 19.opt.: maxM: N mismatches in barcodes (def= 1) (s1) ## 20.opt.: phred Qscore offset (def= 33) (s2) ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict (s2) ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5) ## 23.opt.: maxN: max Ns in a cons seq (def=5) (s5) ## 24.opt.: maxH: max heterozyg. sites in cons seq (def=5) (s5) ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5) ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100) (s7) ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7) ## 28.opt.: random number seed (def. 112233) (s3,s6,s7) ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7) ## 30.opt.: output formats: p,n,a,s,v,u,t,m,k,g,* (see docs) (s7) ## 31.opt.: call maj. consens if depth < stat. limit (def=0) (s5) ## 32.opt.: keep trimmed reads (def=0). Enter min length. (s2) ## 33.opt.: max stack size (int), def= max(500,mean+2*SD) (s3) ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3) ## 35.opt.: use hierarchical clustering (def.=0, 1=yes) (s6) ## 36.opt.: repeat masking (def.=1='dust' method, 0=no) (s3,s6) ## 37.opt.: vsearch threads per job (def.=6; see docs) (s3,s6) ==== optional: list group/clade assignments below this line (see docs) ==================
%%bash
sed -i '/## 7. /c\2 ## 7. N processors... ' params.txt
sed -i '/## 10. /c\.85 ## 10. lowered clust thresh... ' params.txt
sed -i '/## 14. /c\c85m4p3 ## 14. outprefix... ' params.txt
sed -i '/## 24./c\8 ## 24. maxH raised ... ' params.txt
sed -i '/## 30./c\* ## 30. all output formats... ' params.txt
%%bash
cat params.txt
==** parameter inputs for pyRAD version 3.0a **========================== affected step == ./ ## 1. Working directory (all) ./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 16) (s1) ./*.barcodes ## 3. Loc. of barcode file (if not line 16) (s1) vsearch ## 4. command (or path) to call vsearch (or usearch) (s3,s6) muscle ## 5. command (or path) to call muscle (s3,s7) TGCAG ## 6. Restriction overhang (e.g., C|TGCAG -> TGCAG) (s1,s2) 2 ## 7. N processors... 6 ## 8. Mindepth: min coverage for a cluster (s4,s5) 4 ## 9. NQual: max # sites with qual < 20 (line 18) (s2) .85 ## 10. lowered clust thresh... rad ## 11. Datatype: rad,gbs,ddrad,pairgbs,pairddrad,merge (all) 4 ## 12. MinCov: min samples in a final locus (s7) 3 ## 13. MaxSH: max inds with shared hetero site (s7) c85m4p3 ## 14. outprefix... ==== optional params below this line =================================== affected step == ## 15.opt.: select subset (prefix* only selector) (s2-s7) ## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7) ## 17.opt.: exclude taxa (list or prefix*) (s7) ## 18.opt.: loc. of de-multiplexed data (s2) ## 19.opt.: maxM: N mismatches in barcodes (def= 1) (s1) ## 20.opt.: phred Qscore offset (def= 33) (s2) ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict (s2) ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5) ## 23.opt.: maxN: max Ns in a cons seq (def=5) (s5) 8 ## 24. maxH raised ... ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5) ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100) (s7) ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7) ## 28.opt.: random number seed (def. 112233) (s3,s6,s7) ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7) * ## 30. all output formats... ## 31.opt.: call maj. consens if depth < stat. limit (def=0) (s5) ## 32.opt.: keep trimmed reads (def=0). Enter min length. (s2) ## 33.opt.: max stack size (int), def= max(500,mean+2*SD) (s3) ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3) ## 35.opt.: use hierarchical clustering (def.=0, 1=yes) (s6) ## 36.opt.: repeat masking (def.=1='dust' method, 0=no) (s3,s6) ## 37.opt.: vsearch threads per job (def.=6; see docs) (s3,s6) ==== optional: list group/clade assignments below this line (see docs) ==================
Let's take a look at what the raw data look like.
Your input data will be in fastQ format, usually ending in .fq or .fastq. Your data could be split among multiple files, or all within a single file (de-multiplexing goes much faster if they happen to be split into multiple files). The file/s may be compressed with gzip so that they have a .gz ending, but they do not need to be. The location of these files should be entered on line 2 of the params file. Below are the first three reads in the example file.
%%bash
less simRADs_R1.fastq.gz | head -n 12 | cut -c 1-90
@lane1_fakedata0_R1_0 1:N:0: TTTTAATGCAGTGAGTGGCCATGCAATATATATTTACGGGCGCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGCTTAG + BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB @lane1_fakedata0_R1_1 1:N:0: TTTTAATGCAGTGAGTGGCCATGCAATATATATTTACGGGCGCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGCTTAG + BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB @lane1_fakedata0_R1_2 1:N:0: TTTTAATGCAGTGAGTGGCCATGCAATATATATTTACGGGCGCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGCTTAG + BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Each read takes four lines. The first is the name of the read (its location on the plate). The second line contains the sequence data. The third line is a spacer. And the fourth line the quality scores for the base calls. In this case arbitrarily high since the data were simulated.
These are 100 bp single-end reads prepared as RADseq. The first six bases form the barcode and the next five bases (TGCAG) the restriction site overhang. All following bases make up the sequence data.
This step uses information in the barcodes file to sort data into a separate file for each sample. Below is the barcodes file, with sample names and their barcodes each on a separate line with a tab between them.
%%bash
cat simRADs.barcodes
1A0 CATCAT 1B0 TTTTAA 1C0 AGGGGA 1D0 TAAGGT 2E0 TTTATA 2F0 GAGTAT 2G0 ATAGAG 2H0 ATGAGG 3I0 GGGTTT 3J0 TTAAAA 3K0 GGATTG 3L0 AAGAAG
Step 1 writes the de-multiplexed data to a new file for each sample in a new directory created within the working directory called fastq/.
%%bash
pyRAD -p params.txt -s 1
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 1: sorting reads by barcode .
You can see that this created a new file for each sample in the directory 'fastq/'
%%bash
ls fastq/
1A0_R1.fq.gz 1B0_R1.fq.gz 1C0_R1.fq.gz 1D0_R1.fq.gz 2E0_R1.fq.gz 2F0_R1.fq.gz 2G0_R1.fq.gz 2H0_R1.fq.gz 3I0_R1.fq.gz 3J0_R1.fq.gz 3K0_R1.fq.gz 3L0_R1.fq.gz
A new directory called stats will also have been created. Each step of the pyRAD analysis will create a new stats output file in this directory. The stats output for step 1 is below:
%%bash
cat stats/s1.sorting.txt
file Nreads cut_found bar_matched simRADs_R1.fastq.gz 480000 480000 480000 sample true_bar obs_bars N_obs 3L0 AAGAAG AAGAAG 40000 1C0 AGGGGA AGGGGA 40000 2G0 ATAGAG ATAGAG 40000 2H0 ATGAGG ATGAGG 40000 1A0 CATCAT CATCAT 40000 2F0 GAGTAT GAGTAT 40000 3K0 GGATTG GGATTG 40000 3I0 GGGTTT GGGTTT 40000 1D0 TAAGGT TAAGGT 40000 3J0 TTAAAA TTAAAA 40000 2E0 TTTATA TTTATA 40000 1B0 TTTTAA TTTTAA 40000 nomatch _ 0
This step filters reads based on quality scores, and can be used to detect Illumina adapters in your reads, which is sometimes a problem with homebrew type library preparations. Here the filter is set to the default value of 0, meaning it filters only based on quality scores of base calls. The filtered files are written to a new directory called edits/.
%%bash
pyRAD -p params.txt -s 2
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 2: editing raw reads ............
%%bash
ls edits/
1A0.edit 1B0.edit 1C0.edit 1D0.edit 2E0.edit 2F0.edit 2G0.edit 2H0.edit 3I0.edit 3J0.edit 3K0.edit 3L0.edit
The filtered data are written in fasta format (quality scores removed) into a new directory called edits/. Below I show a preview of the file which you can view most easily using the less
command (I use head
here to make it fit in the text window better).
%%bash
head -n 10 edits/1A0.edit | cut -c 1-80
>1A0_0_r1 TGCAGTGAGTGGCCATGCAATATATATTTACGGGCTCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGC >1A0_1_r1 TGCAGTGAGTGGCCATGCAATATATATTTACGGGCTCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGC >1A0_2_r1 TGCAGTGAGTGGCCATGCAATATATATTTACGGGCTCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGC >1A0_3_r1 TGCAGTGAGTGGCCATGCAATATATATTTACGGGCTCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGC >1A0_4_r1 TGCAGTGAGTGGCCATGCAATATATATTTACGGGCTCATAGAGACCCTCAAGACTGCCAACCGGGTGAATCACTATTTGC
Step 3 de-replicates and then clusters reads within each sample by the set clustering threshold and writes the clusters to new files in a directory called clust.xx
%%bash
pyRAD -p params.txt -s 3
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ de-replicating files for clustering... step 3: within-sample clustering of 12 samples at '.85' similarity. Running 2 parallel jobs with up to6 threads per job. If needed, adjust to avoid CPU and MEM limits sample 1C0 finished, 2000 loci sample 3J0 finished, 2000 loci sample 3K0 finished, 2000 loci sample 1B0 finished, 2000 loci sample 3L0 finished, 2000 loci sample 2F0 finished, 2000 loci sample 3I0 finished, 2000 loci sample 2H0 finished, 2000 loci sample 2G0 finished, 2000 loci sample 2E0 finished, 2000 loci sample 1A0 finished, 2000 loci sample 1D0 finished, 2000 loci
Once again, I recommend you use the unix command 'less' to look at the clustS files. These contain each cluster separated by "//". For the first few clusters below you can see that there is one or two alleles in the cluster and one or a few reads that contained a (simulated) sequencing error.
%%bash
less clust.85/1A0.clustS.gz | head -n 26 | cut -c 1-80
>1A0_2540_r1;size=17; TGCAGTGTAACGTTGTATCCATCGAGTCGATCATAGCCTAAAATAAGTAACACTAATCAGGCGCGCTGGTTGGGGGATCA >1A0_2541_r1;size=1;+ TGCAGTGTAACGTTGTATCCAACGAGTCGATCATAGCCTAAAATAAGTAACACTAATCAGGCGCGCTGGTTGGGGGATCA >1A0_2549_r1;size=1;+ TGCAGTGTAACGTTGTATCCATCGAGTCGATCATAGCCTAAAATAAGTAACGCTAATCAGGCGCGCTGGTTGGGGGATCA >1A0_2551_r1;size=1;+ TGCAGTGTAACGTTGTATCCATCGAGTCGATCATAGCCTAAAATAAGTAACACTAATCAGGCGCGTTGGTTGGGGGATCA // // >1A0_2140_r1;size=19; TGCAGCTCCGTCACTGCTCAGCGAACCTACTATCTAGTCGGAAAAGGTTCCGGCCCTTATGCTAAGTGCAAGCTGCCAGT >1A0_2155_r1;size=1;+ TGCAGCTCCCTCACTGCTCAGCGAACCTACTATCTAGTCGGAAAAGGTTCCGGCCCTTATGCTAAGTGCAAGCTGCCAGT // // >1A0_8280_r1;size=10; TGCAGCGTATATGATCAGAACCGGGTGAGTGGGTACCGCGAACCGAAAGGCATCGAAAGTTTAGCGCAGCACTAATCTCA >1A0_8290_r1;size=8;+ TGCAGCGTATATGATCAGAACCGGGTGAGTGGGTACCGCGAACCGAAAGGCACCGAAAGTTTAGCGCAGCACTAATCTCA >1A0_8297_r1;size=1;+ TGCAGCGTATATGATCAGAACCGGGTGAGTGGGAACCGCGAACCGAAAGGCACCGAAAGTTTAGCGCAGCACTAATCTCA >1A0_8292_r1;size=1;+ TGCAGCCTATATGATCAGAACCGGGTGAGTGGGTACCGCGAACCGAAAGGCACCGAAAGTTTAGCGCAGCACTAATCTCA // //
The stats output tells you how many clusters were found, and their mean depth of coverage. It also tells you how many pass your minimum depth setting. You can use this information to decide if you wish to increase or decrease the mindepth before it is applied for making consensus base calls in steps 4 & 5.
%%bash
head -n 40 stats/s3.clusters.txt
taxa total dpt.me dpt.sd d>5.tot d>5.me d>5.sd badpairs 1A0 2000 20.0 0.0 2000 20.0 0.0 0 1B0 2000 20.0 0.0 2000 20.0 0.0 0 1C0 2000 20.0 0.0 2000 20.0 0.0 0 1D0 2000 20.0 0.0 2000 20.0 0.0 0 2E0 2000 20.0 0.0 2000 20.0 0.0 0 2F0 2000 20.0 0.0 2000 20.0 0.0 0 2G0 2000 20.0 0.0 2000 20.0 0.0 0 2H0 2000 20.0 0.0 2000 20.0 0.0 0 3I0 2000 20.0 0.0 2000 20.0 0.0 0 3J0 2000 20.0 0.0 2000 20.0 0.0 0 3K0 2000 20.0 0.0 2000 20.0 0.0 0 3L0 2000 20.0 0.0 2000 20.0 0.0 0 ## total = total number of clusters, including singletons ## dpt.me = mean depth of clusters ## dpt.sd = standard deviation of cluster depth ## >N.tot = number of clusters with depth greater than N ## >N.me = mean depth of clusters with depth greater than N ## >N.sd = standard deviation of cluster depth for clusters with depth greater than N ## badpairs = mismatched 1st & 2nd reads (only for paired ddRAD data) HISTOGRAMS sample: 1A0 bins depth_histogram cnts : 0------------50-------------100% 0 0 5 0 10 0 15 0 20 ******************************* 2000 25 0 30 0 35 0 40 0 50 0 100 0
%%bash
pyRAD -p params.txt -s 4
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 4: estimating error rate and heterozygosity ............
%%bash
less stats/Pi_E_estimate.txt
taxa H E 3K0 0.00135982 0.00048078 1C0 0.00134858 0.00048372 1D0 0.00135375 0.00048822 3I0 0.00129751 0.00048694 2H0 0.00133223 0.00049211 2F0 0.00135365 0.0004995 2E0 0.00126915 0.00051556 1B0 0.00149924 0.00049663 1A0 0.00136043 0.00051028 3J0 0.00144422 0.0005089 2G0 0.00138185 0.00051206 3L0 0.00143349 0.00051991
%%bash
pyRAD -p params.txt -s 5
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 5: creating consensus seqs for 12 samples, using H=0.00137 E=0.00050 ............
%%bash
less stats/s5.consens.txt
taxon nloci f1loci f2loci nsites npoly poly 2G0 2000 2000 2000 178001 246 0.001382 3L0 2000 2000 2000 178003 255 0.0014326 3J0 2000 2000 2000 178003 257 0.0014438 1A0 2000 2000 2000 178002 242 0.0013595 2E0 2000 2000 2000 178002 226 0.0012696 1B0 2000 2000 2000 178005 267 0.0015 2F0 2000 2000 2000 178002 241 0.0013539 2H0 2000 2000 2000 178001 237 0.0013315 3I0 2000 2000 2000 178003 231 0.0012977 1D0 2000 2000 2000 178002 241 0.0013539 1C0 2000 2000 2000 178002 240 0.0013483 3K0 2000 2000 2000 178001 242 0.0013595 ## nloci = number of loci ## f1loci = number of loci with >N depth coverage ## f2loci = number of loci with >N depth and passed paralog filter ## nsites = number of sites across f loci ## npoly = number of polymorphic sites in nsites ## poly = frequency of polymorphic sites
Step 6 clusters consensus sequences across samples. It will print its progress to the screen. This uses 6 threads by default. If you enter 0 for param 37 it will use all available processors.
%%bash
pyRAD -p params.txt -s 6
vsearch v1.0.3_linux_x86_64, 7.5GB RAM, 4 cores https://github.com/torognes/vsearch finished clustering
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 6: clustering across 12 samples at '.85' similarity Reading file /home/deren/Dropbox/Public/PYRAD_TUTORIALS/tutorial_RAD/clust.85/cat.haplos_ 100% 2256027 nt in 24000 seqs, min 94, max 96, avg 94 Indexing sequences 100% Counting unique k-mers 100% Clustering 100% Writing clusters 100% Clusters: 2000 Size min 12, max 12, avg 12.0 Singletons: 0, 0.0% of seqs, 0.0% of clusters
The final step is to output data only for the loci that you want to have included in your data set. This filters once again for potential paralogs or highly repetitive regions, and includes options to minimize the amount of missing data in the output.
%%bash
pyRAD -p params.txt -s 7
ingroup 1A0,1B0,1C0,1D0,2E0,2F0,2G0,2H0,3I0,3J0,3K0,3L0 addon exclude Warning: data set c85m4p3.loci already exists Skipping re-alignment. Creating extra data formats from the existing file To create a new .loci file and alignment move/delete c85m4p3.loci or change the outname prefix in the params file writing nexus file writing phylip file writing unlinked SNPs file + writing full SNPs file + writing STRUCTURE file + writing geno file ** must enter group/clade assignments for treemix output writing vcf file writing alleles file ** must enter group/clade assignments for migrate-n output
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------
%%bash
less stats/c85m4p3.stats
2000 ## loci with > minsp containing data 2000 ## loci with > minsp containing data & paralogs removed 2000 ## loci with > minsp containing data & paralogs removed & final filtering ## number of loci recovered in final data set for each taxon. taxon nloci 1A0 2000 1B0 2000 1C0 2000 1D0 2000 2E0 2000 2F0 2000 2G0 2000 2H0 2000 3I0 2000 3J0 2000 3K0 2000 3L0 2000 ## nloci = number of loci with data for exactly ntaxa ## ntotal = number of loci for which at least ntaxa have data ntaxa nloci saved ntotal 1 - 2 - - 3 - - 4 0 * 2000 5 0 * 2000 6 0 * 2000 7 0 * 2000 8 0 * 2000 9 0 * 2000 10 0 * 2000 11 0 * 2000 12 2000 * 2000 ## var = number of loci containing n variable sites. ## pis = number of loci containing n parsimony informative var sites. n var PIS 0 41 551 1 1083 699 2 945 475 3 637 187 4 367 69 5 182 13 6 59 3 7 18 2 8 12 1 9 1 0 total var= 7848 total pis= 2591 sampled unlinked SNPs= 1959 sampled bi-allelic SNPs= 1911 sampled unlinked SNPs= 1959 sampled bi-allelic SNPs= 1911
We created 11 output files from our analysis. The standard two (.loci and .excluded_loci), as well as the 9 additional ones listed in the params file. These are all shown below.
%%bash
ls outfiles/
c85m4p3.alleles c85m4p3.excluded_loci c85m4p3.geno c85m4p3.loci c85m4p3.nex c85m4p3.phy c85m4p3.snps c85m4p3.str c85m4p3.unlinked_snps c85m4p3.vcf
The ".loci" file contains each locus listed in a fasta-like format that also shows which sites are variable below each locus. Autapomorphies are listed as '-' and shared SNPs as '*'. This is a custom format that is human readable and also used as input to perform D-statistic tests in pyRAD. This is the easiest way to visualize your results. I recommend viewing the file with the command less
. Below I use a head and cut to make it easy to view in this window.
%%bash
head -n 39 outfiles/c85m4p3.loci | cut -c 1-75
>1A0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGCGATCCTAGTTATCATAAG >1B0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >1C0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >1D0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >2E0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >2F0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >2G0 TCTCTCTCGCGATCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >2H0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG >3I0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAG >3J0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAG >3K0 TCTCTCTCGCGGTCGATGATCTTGCGAGGGAAGTAGCAGGCCAAATCGAGATACTAGTTATCATAAG >3L0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAG // - - * * - * >1A0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >1B0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >1C0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >1D0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >2E0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >2F0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >2G0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >2H0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >3I0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >3J0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAASTAATGTAAAGGCAC >3K0 GTTCTGGACAASACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAC >3L0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAM // - - - >1A0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >1B0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >1C0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >1D0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >2E0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >2F0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >2G0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >2H0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >3I0 TCCGATAGCCAGGACTCGAGGTCGACTACCGGCGTGATGTCGGGTTCACCCCCCGAGCATCGGTGCG >3J0 TCCGATAGCCAGGACTCGAGGTCGACTACCGGCGTGATGTCGGGTTCACCCCCCGGGCATCGGTGCG >3K0 TCCGATAGCCAGGACTCGAGGTCGACTACCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG >3L0 TCCGATAGCCAGGACTCGAGGTCGACTACCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCG // * * * -
%%bash
head -n 50 outfiles/c85m4p3.phy | cut -c 1-85
12 178083 1A0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGCGATCCTAGTTATCATAAGAATCTTATTGAT 1B0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 1C0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 1D0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 2E0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 2F0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 2G0 TCTCTCTCGCGATCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 2H0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT 3I0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTGAT 3J0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTGAT 3K0 TCTCTCTCGCGGTCGATGATCTTGCGAGGGAAGTAGCAGGCCAAATCGAGATACTAGTTATCATAAGAATCTTATTGAK 3L0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTGAT
%%bash
head -n 50 outfiles/c85m4p3.nex | cut -c 1-85
#NEXUS BEGIN DATA; DIMENSIONS NTAX=12 NCHAR=178083; FORMAT DATATYPE=DNA MISSING=N GAP=- INTERLEAVE=YES; MATRIX 1B0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 2G0 TCTCTCTCGCGATCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 2F0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 1A0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGCGATCCTAGTTATCATAAGAATCTTATTG 2H0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 2E0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 3I0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTG 1C0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 3L0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 1D0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG 3J0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTG 3K0 TCTCTCTCGCGGTCGATGATCTTGCGAGGGAAGTAGCAGGCCAAATCGAGATACTAGTTATCATAAGAATCTTATTG 1B0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 2G0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 2F0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 1A0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 2H0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 2E0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 3I0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 1C0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 3L0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAMGTGTCCAAGGGGATTGAATAC 1D0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 3J0 GACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAASTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 3K0 SACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGGGGATTGAATAC 1B0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 2G0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 2F0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 1A0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 2H0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTMCCTA 2E0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 3I0 CGACTACCGGCGTGATGTCGGGTTCACCCCCCGAGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 1C0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 3L0 CGACTACCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 1D0 CGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 3J0 CGACTACCGGCGTGATGTCGGGTTCACCCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 3K0 CGACTACCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATATGGATACGCCGAGAGTTGCTACCTA 1B0 GGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGTGCCCGTCAGTGCAGGCGGGCTTAGCTTATTTGAAT 2G0 GGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGTGCCCGTCAGTGCAGGCGGGCTTAGCTTATTTGAAT 2F0 GGCTAGTTGCGCTATGAGGAGTATACAAGACTTATCTACAGTGCCCGTCAGTGCAGKCGGGCTTAGCTTATTTGAAT 1A0 GGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGTGCCCGTCAGTGCAGGCGGGCTTAGCTTATTTGAAT 2H0 GGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGTGCCCGTCAGTGCAGGCGGGCTTAGCTTATTTGAAT 2E0 GGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGTGCCCGTCAGTGCAGKCGGGCTTAGCTTATTTGAAT
%%bash
head -n 50 outfiles/c85m4p3.alleles| cut -c 1-85
>1A0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGCGATCCTAGTTATCATAAGAATCTTATTG >1A0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGCGATCCTAGTTATCATAAGAATCTTATTG >1B0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >1B0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >1C0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >1C0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >1D0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >1D0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2E0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2E0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2F0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2F0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2G0_0 TCTCTCTCGCGATCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2G0_1 TCTCTCTCGCGATCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2H0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >2H0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >3I0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTG >3I0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTG >3J0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTG >3J0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGGAGTAGCAGGCCAAACCGAGATACTAGTTATCATAAGAATCTTATTG >3K0_0 TCTCTCTCGCGGTCGATGATCTTGCGAGGGAAGTAGCAGGCCAAATCGAGATACTAGTTATCATAAGAATCTTATTG >3K0_1 TCTCTCTCGCGGTCGATGATCTTGCGAGGGAAGTAGCAGGCCAAATCGAGATACTAGTTATCATAAGAATCTTATTG >3L0_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG >3L0_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTATTG // - - * * - * >1A0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1A0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1B0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1B0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1C0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1C0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1D0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >1D0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2E0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2E0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2F0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2F0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2G0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2G0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2H0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >2H0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3I0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3I0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3J0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAAGTAATGTAAAGGCACGTGTCCAAGG >3J0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3K0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3K0_1 GTTCTGGACAACACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3L0_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAAGG >3L0_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCAAGTGTCCAAGG // - - -
%%bash
head -n 50 outfiles/c85m4p3.str | cut -c 1-20
1A0 1 3 2 1 1A0 1 3 2 1 1B0 1 3 2 1 1B0 1 3 2 1 1C0 1 3 2 1 1C0 1 3 2 1 1D0 1 3 2 1 1D0 1 3 2 1 2E0 1 3 2 1 2E0 1 3 2 1 2F0 1 3 2 3 2F0 1 3 2 3 2G0 1 3 2 1 2G0 1 3 2 1 2H0 1 3 2 1 2H0 1 3 2 1 3I0 3 3 0 1 3I0 3 3 0 1 3J0 3 3 2 1 3J0 3 3 2 1 3K0 1 3 2 1 3K0 1 3 2 1 3L0 1 3 2 1 3L0 1 0 2 1
%%bash
head -n 40 outfiles/c85m4p3.geno
222222220022 222222222221 222222220222 222220222222 222222220000 222222221222 222222222220 221222222222 222222022222 222222220000 222200000000 222222220022 222222212222 222122222222 222200000000 212222222222 222022222222 222222212222 222222122222 222200000000 222222022222 222222200000 222222220000 222200000000 220222222222 222222221222 222222202222 222200022222 222200000000 222222212222 222222202222 222222212222 221222222222 221222222222 212222222222 222222202222 222200020000 222221222222 220222222222 222222022222
%%bash
head -n 50 outfiles/c85m4p3.snps | cut -c 1-85
## 12 taxa, 2000 loci, 7888 snps 1A0 GAATCCT GCC TTAG AATTAT GTGYTGA GC TAYC CT AGTACC CCC AATC GAAC TCGT GCA TTC CG 1B0 GAATACT GCC TTAG AATTAT GTGYTGA GC TATC CT AGTTCC CCC AATC GAAC TCGW GCA TTC CS 1C0 GAATACT GCC TTAG AATTAT GTGTTGA GC TATC CY AGTACC CCC WATC GAAC TCGA GCA TTC CG 1D0 GAATACT GCC TTAG AATTAT GTGCTGA GC TATT CT ATTACC CCC AATC GAAC GCGA GCM TTC CG 2E0 GAATACT GCC TTAG AATTAT KTGCTGA GC TATC CT AGTACC CTC AATG GAAC TCGA GCA ATC GG 2F0 GAATACT GCC TTAG AATCAT KTGCYGA GC TATC CT AGTACC CTC AATG GAAC TCGA SCA ATC GG 2G0 AAATACT GCC TTAG AATTAT GTGCTGA GC TATC CT AGGACC CTC AATG GARC TCSA GCA ATC GG 2H0 GAATACT GCC TTAG MATTAT GTGCTGA GA TATC CT TGTAAC GTC AATG GAAC TYGA GCA ATG GG 3I0 GAGCAAT GCC AACA AAATAT GGGCTGC SC TATC CT AGTACC CCG ATGG AAAC TCGA GSA ATC CG 3J0 GAGCAAT GSC AACG ARATAT GGGCTGA GC TATC CT AGTACC CCG ATGG AAAC TCGA GCA ATC CG 3K0 GCATAAK SCC AAAG AAATAT GGGCTSA GC TWTC TT AGTACC CCG ATGG GGAC TCGA GCA AGC CG 3L0 GAATACT GCM AAAG AAATCY GGSCTGA GC CATC CT AGTACT CCG ATGG GAAT TCGA GCA ATC CG
%%bash
head -n 50 outfiles/c85m4p3.unlinked_snps | cut -c 1-85
12 1959 1A0 TCGTTGTTTCCGCATGCAAATCTCTGGCTGATATCTCKGAATGGTCGCAAACCTTGTTAGATTCRTCCATTACCCATCC 1B0 TCGTTGTTTCCGCATSCAAATCTCTGGCTGATATYTCGGAAWGGTCGGAMATMTTGTTAGATTCGTCCATTACCCATCC 1C0 TCGTTGTYTCCGCATGCAAATCTCAGGCTGATMKCTCGGGATGGTCGGAAACCTTATTAGWTTCAYCCATTACCAATCC 1D0 TCGTTGTTTCCGCMTGGAAATCTCTGGCTGATATCTCGGAATGGTCGGAAACCATGTTAGATTCGTCAATTACCCCGCC 2E0 TCGTTGTTTCGGCAAGCAACTCTTTGGACGATATCTAGGAATGGTCGGCAACCTAGTTAGACTCGTCCATTCYYCATTC 2F0 TCGCTGTTTCGGCAAGCAACTCTTTGGACGATATCTAGKAATGGTCGGCAWCCTAGTTAGACTCGTCCATTCTCCATCC 2G0 TCGTTGTTGCGGCAAGCAMCGCTTTGGACGATATCTAGGACTGSTCCGCAACCTAGTTAGACTCGTCCAYTCCCCATCC 2H0 TCGTTGTTTCGGYAAGCWACTTTTTGACCSTWATCGCGGAATGGTCGGCAACCTTGTTAGACTCGTCCATTACCCATCC 3I0 CCATGSTTTGGACAAGCAACTTCTTKGCCGATATCTAGGAATRGAAGGAAACCTTGATAGACTCGTTCCTCACCCATCC 3J0 CCGTGGTTTGGACAAGCAACTTCTTGGCCGATATCTAGGAATGGTAGGAAACCTTGAKGKACTCGTTCCTCACCCATCC 3K0 TCGTGGTTTGGGCAAGCAACTTCTTGGCCGATATCTAGGAATGGTAGGAAACCTTGATAGACWSGTTCCTCACCCATCA 3L0 TMGTGGCTTGGGCAAGCAACTTCTTGGCCGATATCTAGGAATGGTCGGAAACCTTGTTAGACTCGTTCCTCACCCATCC
You may also produce some more complicated formatting options that involve pooling individuals into groups or populations. This can be done for the "treemix" and "migrate" outputs, which are formatted for input into the programs TreeMix and migrate-n, respectively. Grouping individuals into populations is done with the final lines of the params file as shown below, and similar to the assignment of individuals into clades for hierarchical clustering (see full tutorial).
Each line designates a group, and has three arguments that are separated by space or tab. The first is the group name, the second is the minimum number of individuals that must have data in that group for a locus to be included in the output, and the third is a list of the members of that group. Lists of taxa can include comma-separated names and wildcard selectors, like below. Example:
%%bash
## append group designations to the params file
echo "pop1 4 1A0,1B0,1C0,1D0 " >> params.txt
echo "pop2 4 2E0,2F0,2G0,2H0 " >> params.txt
echo "pop3 4 3* " >> params.txt
## view params file
cat params.txt
==** parameter inputs for pyRAD version 3.0a **========================== affected step == ./ ## 1. Working directory (all) ./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 16) (s1) ./*.barcodes ## 3. Loc. of barcode file (if not line 16) (s1) vsearch ## 4. command (or path) to call vsearch (or usearch) (s3,s6) muscle ## 5. command (or path) to call muscle (s3,s7) TGCAG ## 6. Restriction overhang (e.g., C|TGCAG -> TGCAG) (s1,s2) 2 ## 7. N processors... 6 ## 8. Mindepth: min coverage for a cluster (s4,s5) 4 ## 9. NQual: max # sites with qual < 20 (line 18) (s2) .85 ## 10. lowered clust thresh... rad ## 11. Datatype: rad,gbs,ddrad,pairgbs,pairddrad,merge (all) 4 ## 12. MinCov: min samples in a final locus (s7) 3 ## 13. MaxSH: max inds with shared hetero site (s7) c85m4p3 ## 14. outprefix... ==== optional params below this line =================================== affected step == ## 15.opt.: select subset (prefix* only selector) (s2-s7) ## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7) ## 17.opt.: exclude taxa (list or prefix*) (s7) ## 18.opt.: loc. of de-multiplexed data (s2) ## 19.opt.: maxM: N mismatches in barcodes (def= 1) (s1) ## 20.opt.: phred Qscore offset (def= 33) (s2) ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict (s2) ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5) ## 23.opt.: maxN: max Ns in a cons seq (def=5) (s5) 8 ## 24. maxH raised ... ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5) ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100) (s7) ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7) ## 28.opt.: random number seed (def. 112233) (s3,s6,s7) ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7) * ## 30. all output formats... ## 31.opt.: call maj. consens if depth < stat. limit (def=0) (s5) ## 32.opt.: keep trimmed reads (def=0). Enter min length. (s2) ## 33.opt.: max stack size (int), def= max(500,mean+2*SD) (s3) ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3) ## 35.opt.: use hierarchical clustering (def.=0, 1=yes) (s6) ## 36.opt.: repeat masking (def.=1='dust' method, 0=no) (s3,s6) ## 37.opt.: vsearch threads per job (def.=6; see docs) (s3,s6) ==== optional: list group/clade assignments below this line (see docs) ================== pop1 4 1A0,1B0,1C0,1D0 pop2 4 2E0,2F0,2G0,2H0 pop3 4 3*
Now if we run pyRAD with the 'm' (migrate) or 't' (treemix) output options, it will create their output files.
%%bash
pyRAD -p params.txt -s 7
ingroup 1A0,1B0,1C0,1D0,2E0,2F0,2G0,2H0,3I0,3J0,3K0,3L0 addon exclude Warning: data set c85m4p3.loci already exists Skipping re-alignment. Creating extra data formats from the existing file To create a new .loci file and alignment move/delete c85m4p3.loci or change the outname prefix in the params file writing nexus file writing phylip file writing unlinked SNPs file + writing full SNPs file + writing STRUCTURE file + writing geno file + writing treemix file data set reduced for group coverage minimums pop1 ['1A0', '1B0', '1C0', '1D0'] minimum= 4 pop2 ['2E0', '2F0', '2G0', '2H0'] minimum= 4 pop3 ['3J0', '3I0', '3K0', '3L0'] minimum= 4 writing vcf file writing alleles file writing migrate-n file data set reduced for group coverage minimums pop1 ['1A0', '1B0', '1C0', '1D0'] minimum= 4 pop2 ['2E0', '2F0', '2G0', '2H0'] minimum= 4 pop3 ['3J0', '3I0', '3K0', '3L0'] minimum= 4
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ Cluster input file: using /home/deren/Dropbox/Public/PYRAD_TUTORIALS/tutorial_RAD/clust.85/cat.clust_.gz
%%bash
less outfiles/c85m4p3.treemix.gz | head -n 30
pop1 pop2 pop3 8,0 8,0 4,4 8,0 8,0 7,1 8,0 8,0 6,2 8,0 6,2 8,0 8,0 8,0 0,8 8,0 8,0 7,1 8,0 8,0 6,2 7,1 8,0 8,0 8,0 6,2 8,0 8,0 8,0 0,8 0,8 8,0 8,0 8,0 8,0 4,4 8,0 7,1 8,0 7,1 8,0 8,0 0,8 8,0 8,0 7,1 8,0 8,0 6,2 8,0 8,0 8,0 7,1 8,0 8,0 7,1 8,0 0,8 8,0 8,0 8,0 6,2 8,0 8,0 6,2 0,8 8,0 8,0 0,8 0,8 8,0 8,0 6,2 8,0 8,0 8,0 8,0 7,1 8,0 6,2 8,0 8,0 2,6 8,0 0,8 8,0 8,0
%%bash
head -n 40 outfiles/c85m4p3.migrate | cut -c 1-85
3 2000 ( npops nloci for data set c85m4p3.loci ) 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 89 8 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 pop1_0 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGCGATCCTAGTTATCATAAGAATCTTAT pop1_1 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTAT pop1_2 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTAT pop1_3 TCTCTCTCGCGGTCGATGATATTGCGAGGGAAGTAGCAGGCCAAATCGAGATCCTAGTTATCATAAGAATCTTAT pop1_0 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAA pop1_1 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAA pop1_2 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAA pop1_3 GTTCTGGACAAGACAATAGCTCTCCTCCTATGTATGGCTGCCTGTCACTTAACTAATGTAAAGGCACGTGTCCAA pop1_0 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATAT pop1_1 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATAT pop1_2 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATAT pop1_3 TCCGATAGCCAGGTCTCGAGGTCGACTTCCGGCGTGATGTCGGGTTCAACCCCCGGGCATCGGTGCGAAGGATAT pop1_0 TTGCTACCTACACACTGAAACGAATATTGCATGGGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGT pop1_1 TTGCTACCTACACACTGAAACGAATATTGCATGGGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGT pop1_2 TTGCTACCTACACACTGAAACGAATATTGCATGGGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGT pop1_3 TTGCTACCTACACACTGAAACGAATATTGCATGGGCTAGTTGTGCTATGAGGAGTATACAAGACTTATCTACAGT pop1_0 GCGGGCTTAGCTTATTTGAATACAAATGAATGCACGGTATTGAAYCCCGGCGGGATAAATATTAACAGAATAGCG pop1_1 GCGGGCTTAGCTTATTTGAATACAAATGAATGCACGGTATTGAAYCCCGGCGGGATAAATATTAACAGAATAGCG pop1_2 GCGGGCTTAGCTTATTTGAATACAAATGAATGCACGGTATTGAATCCCGGCGGGATAAATATTAACAGAATAGCG pop1_3 GCGGGCTTAGCTTATTTGAATACAAATGAATGCACGGTATTGAACCCCGGCGGGATAAATATTAACAGAATAGCG pop1_0 GTAAAACCCCCGATGCTAACAGCTATTTTAAGCGTGCATTGAGGACGCACCGGACATGTGGTATGTTTCTTTATT pop1_1 GTAAAACCCCCGATGCTAACAGCTATTTTAAGCGTGCATTGAGGACGCACCGGACATGTGGTATGTTTCTTTATT pop1_2 GTAAAACCCCCGATGCTAACAGCTATTTTAAGCGTGCATTGAGGACGCACCGGACATGTGGTATGTTTCTTTATT pop1_3 GTAAAACCCCCGATGCTAACAGCTATTTTAAGCGTGCATTGAGGACGCACCGGACATGTGGTATGTTTCTTTATT pop1_0 TTTGTGTTAACCGCCCTTTGCTTTGATATTGCCCGCCAAGCGTCTATTGGCAATYCAGAAGGCTATCAAACGTCT pop1_1 TTTGTGTTAACCGCCCTTTGCTTTGATATTGCCCGCCAAGCGTCTATTGGCAATTCAGAAGGCTATCAAACGTCT pop1_2 TTTGTGTTAACCGCCCTTTGCTTTGATATTGCCCGCCAAGCGTCTATTGGCAATTCAGAAGGCTATCAAACGTCT pop1_3 TTTGTGTTAACCGCCCTTTGCTTTGATATTGCCCGCCAAGCGTCTATTGGCAATTCAGAAGGTTATCAAACGTCT pop1_0 CGATTCAATCGATTAGATCTTGGAGTTAAATCACAACACCTGATGTCCTTCAAATAATAGATACGGCCCTATCGC pop1_1 CGATTCAATCGATTAGATCTTGGAGTTAAATCACAACACCTGATGTCCTTCAAATAATAGATACGGCCCTATCGC pop1_2 CGATTCAATCGATTAGATCTTGGAGTTAAATCACAACACCTGATGTCCTTCAAATAATAGATACGGCCCTATCGC pop1_3 CGATTCAATCGATTAGATCTTGGAGTTAAATCACAACACCTGATGTCCTTCAAATAATAGATACGGCCCTATCGC pop1_0 GGGACCAAGCTAAAGGTACCGACTCGAGCTGTGGCGTCACTGACAAGTGGCGCAAATATGGTTAGAATGGCGTCG pop1_1 GGGACCAAGCTAAAGGTACCGACTCGAGCTGTGGCGTCACTGACTAGTGGCGCAAATATGGTTAGAATGGCGTCG pop1_2 GGGACCAAGCTAAAGGTACCGACTCGAGCTGTGGCGTCACTGACAAGTGGCGCAAATATGGTTAGAATGGCGTCG pop1_3 GGGACCAAGCTAAATGTACCGACTCGAGCTGTGGCGTCACTGACAAGTGGCGCAAATATGGTTAGAATGGCGTCG pop1_0 GGCGCGGTTTGTTTTTCGCAAGGCACCATGTGCCGAATTAATCCTAGTTCTGTGCTAGAACGGTGGATGCCCATT