Deren Eaton
4/27/15
I still need to make quite a few tweaks and do further testing before I release this (including making it faster and much less memory intensive), but I wanted to run it by you at this stage to see whether the output formatting is similar to what you have in mind.
To demonstrate the new ".catg" output format, which includes base counts for all reads, I simulated data using the simrrls program (described further here). This is the program I use to generate data for testing pyrad. It is also useful in this case, since you might be interested in examining data generated with different known error rates or sequencing coverage.
The program simulates loci on an input topology under the coalescent, with output formatted into one of several "reduced representation library" (rrl) methods, such as RADseq, and with a range of changeable parameter settings.
## pyrad (v.3.1)
## git clone xxxxxxxxxxxxxxxx
## simrrls (v.1.5)
## git clone yyyyyyyyyyyyyyyy
%%bash
simrrls
Here is how you can use this script
Usage: python /home/deren/local/bin/simrrls
 -I <float> probability mutation is an indel (default 0.0)
 -D <bool> allow locus dropout (def=0)
 -L <int> Number of loci to simulate (def=1000)
 -l <int> locus length (def=100)
 -u <float> per-site mutation rate (default: 1e-9)
 -N <int> effective population size (default: 1e5)
 -i <int> number of sampled inds per tip taxon (default 1)
 -s <str> insert size (size selection) (min,max) def=300,800
 -e <int> sequencing error rate (def 0.0005)
 -d <str> seq. depth (def=10,0). N(mean,std) for 2 diploid copies
 -t <str> ultrametric tree w br lens (file,Newick,or use default)
 -f <str> datatype (def=rad, others=ddrad,gbs,pairddrad,pairgbs)
 -o <str> output file prefix name (def='out')
 -s1 <int> random seed 1 (default 123456)
 -s2 <int> random seed 2 (default 987654)
 -c1 <int> restriction cut site 1 (CTGCAG) ->TGCAG
 -c2 <int> restriction cut site 2 (CAATTG) ->AATTG
 -v <int> verbose: 1=screen, 2=log, 3=both
Most of the parameters were left at their defaults. I first simulated 500 loci, each 100 bp in length, with a low sequencing error rate (0.0005) and with equally high coverage for both diploid alleles at each locus (mean=10 copies each, sd=0). All data sets are simulated on the default 12-taxon tree provided in simrrls, though one could supply their own topology instead.
%%bash
## directories for the three data sets
mkdir -p dataset1/
mkdir -p dataset2/
mkdir -p dataset3/
%%bash
simrrls -L 500 -l 100 -N 2e5 -e 0.0005 -d 10,0 -o dataset1/dataset1
simulating rad data
THETA= 0.004
creating new barcode map
Second, I simulated data with a 20X higher error rate (0.01).
%%bash
simrrls -L 500 -l 100 -N 2e5 -e 0.01 -d 10,0 -o dataset2/dataset2
simulating rad data THETA= 0.004 creating new barcode map
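To put the two error rates in perspective, here is a small back-of-the-envelope calculation. It is my own sketch, not code from simrrls, and it assumes errors are independent across reads. With 10 reads per allele copy (20 reads per site for a diploid sample), the expected number of erroneous base calls at a site is depth × error rate, and the chance of seeing at least one is 1 − (1 − e)^depth.

```python
# Back-of-the-envelope sketch (not simrrls code): sequencing errors per site,
# assuming independent errors at a fixed per-base rate.

def expected_errors(depth, error_rate):
    """Expected number of erroneous base calls among `depth` reads at one site."""
    return depth * error_rate

def prob_any_error(depth, error_rate):
    """Probability that at least one of `depth` reads carries an error."""
    return 1.0 - (1.0 - error_rate) ** depth

depth = 20  # -d 10,0 gives 10 reads for each of the 2 allele copies
for name, e in [("dataset1", 0.0005), ("dataset2", 0.01)]:
    print(name, expected_errors(depth, e), round(prob_any_error(depth, e), 3))
```

Under these assumptions, roughly one site in a hundred per sample shows an error in dataset1, versus close to one in five in dataset2.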
Third, I simulated a low-coverage data set in which it is difficult to distinguish sequencing errors from heterozygotes: coverage per allele is drawn from a normal distribution, N(6,2), and the error rate is 0.005.
%%bash
simrrls -L 500 -l 100 -N 2e5 -e 0.005 -d 6,2 -o dataset3/dataset3
simulating rad data
THETA= 0.004
creating new barcode map
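As an illustration of what "-d 6,2" implies for total depth, the sketch below draws a depth for each of the two allele copies from N(6,2) and sums them. Rounding to an integer and flooring at zero are my assumptions for the sketch; simrrls may discretize the draws differently. The function name and the seed are made up for illustration.

```python
import random

def sample_allele_depth(mean, sd, rng):
    """Draw one per-allele read depth from N(mean, sd), rounded and floored at 0."""
    return max(0, int(round(rng.gauss(mean, sd))))

rng = random.Random(123456)  # arbitrary seed so the sketch is repeatable

# Total depth for a diploid sample = sum of the draws for its two allele copies.
site_depths = [
    sample_allele_depth(6, 2, rng) + sample_allele_depth(6, 2, rng)
    for _ in range(1000)
]
print(min(site_depths), max(site_depths), sum(site_depths) / len(site_depths))
```

The mean total depth comes out near 12, but individual draws can be much lower, which is where a true heterozygote starts to look like a homozygote with a few errors.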
Create a default params file
%%bash
pyrad -n
new params.txt file created
The output format 'c' designates the .catg format that we are interested in.
%%bash
sed -i '/## 1. /c\dataset1 ## 1. working directory ' params.txt
sed -i '/## 2. /c\dataset1/*.gz ## 2. data loc ' params.txt
sed -i '/## 3. /c\dataset1/*.barcodes ## 3. barcode loc ' params.txt
sed -i '/## 7. /c\4 ## 7. N processors ' params.txt
sed -i '/## 10./c\.85 ## 10. clust threshold ' params.txt
sed -i '/## 14./c\dataset1 ## 14. output name ' params.txt
sed -i '/## 30./c\c ## 30. output formats ' params.txt
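For anyone who prefers not to use sed, the same whole-line replacement can be sketched in Python. With GNU sed, `/pattern/c\text` replaces every line matching the pattern with the given text. The helper below does the same with a literal substring match, so "## 1. " cannot accidentally match the line for parameter 10. The function name and the three sample lines are made up for illustration; they are not the real contents of params.txt.

```python
def set_param(lines, number, value, description):
    """Return a copy of `lines` with the line for parameter `number` replaced."""
    tag = f"## {number}. "  # literal tag, including the trailing space
    new_line = f"{value}    {tag}{description}"
    return [new_line if tag in line else line for line in lines]

# Made-up stand-in for a few params.txt lines; the real file has more entries.
params = [
    "./        ## 1. working directory",
    "./*.gz    ## 2. data loc",
    ".88       ## 10. clust threshold",
]
params = set_param(params, 1, "dataset1", "working directory")
params = set_param(params, 10, ".85", "clust threshold")
for line in params:
    print(line)
```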
%%bash
#cat params.txt | cut -c 1-80
%%bash
pyrad -p params.txt
vsearch v1.1.3_linux_x86_64, 7.5GB RAM, 4 cores
https://github.com/torognes/vsearch
------------------------------------------------------------
pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 1: sorting reads by barcode
.
step 2: editing raw reads
............
de-replicating files for clustering...
step 3: within-sample clustering of 12 samples at .85 similarity.
Running 4 parallel jobs with up to 6 threads per job.
If needed, adjust to avoid CPU and MEM limits
sample 2H0 finished, 500 loci
sample 3I0 finished, 500 loci
sample 1C0 finished, 500 loci
sample 3L0 finished, 500 loci
sample 1D0 finished, 500 loci
sample 2F0 finished, 500 loci
sample 3K0 finished, 500 loci
sample 1B0 finished, 500 loci
sample 2G0 finished, 500 loci
sample 1A0 finished, 500 loci
sample 2E0 finished, 500 loci
sample 3J0 finished, 500 loci
step 4: estimating error rate and heterozygosity
............
step 5: creating consensus seqs for 12 samples, using H=0.00407 E=0.00050
............
step 6: clustering across 12 samples at .85 similarity
Reading file dataset1/clust.85/cat.haplos_ 100%
562966 nt in 5989 seqs, min 94, max 94, avg 94
Indexing sequences 100%
Masking 100%
Counting unique k-mers 100%
Clustering 100%
Writing clusters 100%
Clusters: 500 Size min 11, max 12, avg 12.0
Singletons: 0, 0.0% of seqs, 0.0% of clusters
finished clustering
Using across-sample cluster input file: dataset1/clust.85/cat.clust_.gz
ingroup 1A0, 1B0, 1C0, 1D0 2E0, 2F0, 2G0, 2H0 3I0, 3J0, 3K0, 3L0
addon
exclude
....
final stats written to: dataset1/stats/dataset1.stats
output files being written to: dataset1/outfiles/ directory
writing cat file
... loading full sequence data
... matching sequence data files
%%bash
sed -i '/## 1. /c\dataset2 ## 1. working directory ' params.txt
sed -i '/## 2. /c\dataset2/*.gz ## 2. data loc ' params.txt
sed -i '/## 3. /c\dataset2/*.barcodes ## 3. barcode loc ' params.txt
sed -i '/## 14./c\dataset2 ## 14. output name ' params.txt
%%bash
pyrad -p params.txt
vsearch v1.1.3_linux_x86_64, 7.5GB RAM, 4 cores
https://github.com/torognes/vsearch
------------------------------------------------------------
pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 1: sorting reads by barcode
.
step 2: editing raw reads
............
de-replicating files for clustering...
step 3: within-sample clustering of 12 samples at .85 similarity.
Running 4 parallel jobs with up to 6 threads per job.
If needed, adjust to avoid CPU and MEM limits
sample 2H0 finished, 500 loci
sample 1C0 finished, 500 loci
sample 3L0 finished, 500 loci
sample 3I0 finished, 500 loci
sample 2F0 finished, 500 loci
sample 1D0 finished, 500 loci
sample 3K0 finished, 500 loci
sample 1B0 finished, 500 loci
sample 1A0 finished, 500 loci
sample 2E0 finished, 500 loci
sample 2G0 finished, 500 loci
sample 3J0 finished, 500 loci
step 4: estimating error rate and heterozygosity
............
step 5: creating consensus seqs for 12 samples, using H=0.00421 E=0.00996
............
step 6: clustering across 12 samples at .85 similarity
Reading file dataset2/clust.85/cat.haplos_ 100%
563154 nt in 5991 seqs, min 94, max 94, avg 94
Indexing sequences 100%
Masking 100%
Counting unique k-mers 100%
Clustering 100%
Writing clusters 100%
Clusters: 500 Size min 11, max 12, avg 12.0
Singletons: 0, 0.0% of seqs, 0.0% of clusters
finished clustering
Using across-sample cluster input file: dataset2/clust.85/cat.clust_.gz
ingroup 1A0, 1B0, 1C0, 1D0 2E0, 2F0, 2G0, 2H0 3I0, 3J0, 3K0, 3L0
addon
exclude
....
final stats written to: dataset2/stats/dataset2.stats
output files being written to: dataset2/outfiles/ directory
writing cat file
... loading full sequence data
... matching sequence data files
%%bash
sed -i '/## 1. /c\dataset3 ## 1. working directory ' params.txt
sed -i '/## 2. /c\dataset3/*.gz ## 2. data loc ' params.txt
sed -i '/## 3. /c\dataset3/*.barcodes ## 3. barcode loc ' params.txt
sed -i '/## 14./c\dataset3 ## 14. output name ' params.txt
%%bash
pyrad -p params.txt
vsearch v1.1.3_linux_x86_64, 7.5GB RAM, 4 cores
https://github.com/torognes/vsearch
------------------------------------------------------------
pyRAD : RADseq for phylogenetics & introgression analyses
------------------------------------------------------------
step 1: sorting reads by barcode
.
step 2: editing raw reads
............
de-replicating files for clustering...
step 3: within-sample clustering of 12 samples at .85 similarity.
Running 4 parallel jobs with up to 6 threads per job.
If needed, adjust to avoid CPU and MEM limits
sample 3L0 finished, 500 loci
sample 1C0 finished, 500 loci
sample 2F0 finished, 500 loci
sample 2G0 finished, 500 loci
sample 1A0 finished, 500 loci
sample 3J0 finished, 500 loci
sample 3I0 finished, 500 loci
sample 3K0 finished, 500 loci
sample 1B0 finished, 500 loci
sample 2H0 finished, 500 loci
sample 1D0 finished, 500 loci
sample 2E0 finished, 500 loci
step 4: estimating error rate and heterozygosity
............
step 5: creating consensus seqs for 12 samples, using H=0.00374 E=0.00501
............
step 6: clustering across 12 samples at .85 similarity
Reading file dataset3/clust.85/cat.haplos_ 100%
557202 nt in 5928 seqs, min 92, max 94, avg 94
Indexing sequences 100%
Masking 100%
Counting unique k-mers 100%
Clustering 100%
Writing clusters 100%
Clusters: 500 Size min 9, max 12, avg 11.9
Singletons: 0, 0.0% of seqs, 0.0% of clusters
finished clustering
Using across-sample cluster input file: dataset3/clust.85/cat.clust_.gz
ingroup 1A0, 1B0, 1C0, 1D0 2E0, 2F0, 2G0, 2H0 3I0, 3J0, 3K0, 3L0
addon
exclude
....
final stats written to: dataset3/stats/dataset3.stats
output files being written to: dataset3/outfiles/ directory
writing cat file
... loading full sequence data
... matching sequence data files
Dataset1 has few errors, easily distinguished from heterozygotes.
%%bash
head -50 dataset1/outfiles/dataset1.catg | cut -c 1-48
12 44500
1A0 1B0 1C0 1D0 2E0 2F0 2G0 2H0 3I0
YCCCCCCCCCCC 10,0,10,0 20,0,0,0 20,0,0,0 20,0,0,
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
GGGGRRGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,1,19,0
AAACGGGGGGGG 0,20,0,0 0,20,0,0 0,20,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTTTTTTKTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTW 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,1,19 0,0,0,20 0,0,0,20
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
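To make the layout concrete, here is a minimal sketch of how one row of this output could be read in Python. Each row is one site: a string of consensus calls (one letter per sample) followed by one comma-separated count field per sample. The count order C,A,T,G is my reading of the rows shown here (all-C sites show 20,0,0,0, all-A sites 0,20,0,0, and so on), and the function name is made up for illustration. Fields cut off by `cut -c 1-48` are skipped.

```python
BASES = "CATG"  # count order inferred from the rows shown here

def parse_catg_row(row):
    """Split one site row into (consensus calls, list of {base: count} dicts)."""
    fields = row.split()
    calls, counts = fields[0], []
    for field in fields[1:]:
        parts = field.split(",")
        # Skip fields that were truncated by `cut` (wrong length or empty piece).
        if len(parts) != len(BASES) or not all(p.isdigit() for p in parts):
            continue
        counts.append(dict(zip(BASES, map(int, parts))))
    return calls, counts

calls, counts = parse_catg_row("YCCCCCCCCCCC 10,0,10,0 20,0,0,0 20,0,0,0 20,0,0,")
print(calls[0], counts[0])  # first sample: consensus call and its base counts
```

In this row the first sample is called Y (C/T), and its counts are split evenly between C and T, as expected for a heterozygote at equal coverage.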
Dataset 2 has many more errors, but still easily distinguished from heterozygotes.
%%bash
head -50 dataset2/outfiles/dataset2.catg | cut -c 1-48
12 44500
1A0 1B0 1C0 1D0 2E0 2F0 2G0 2H0 3I0
CCCCCCCCCCCC 20,0,0,0 19,0,1,0 20,0,0,0 20,0,0,0
GGGGGGGGCCGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 1,19,0,0
TTTTTGTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,1,19,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
TTTTTTTTTTTT 0,0,19,1 0,0,20,0 0,0,20,0 0,0,20,0
AAAAAAAAAAAR 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
AAAAAAAAGGGA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 2,0,18,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
AAAAAAAAAAAA 0,19,0,1 0,20,0,0 0,20,0,0 0,20,0,0
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 19,0,0,1 20,0,0,0 20,0,0,0 20,0,0,0
GGGGGGGGGGGG 0,0,1,19 0,0,2,18 0,0,0,20 0,0,0,20
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 1,0,0,19
AAAAAAAAAAAA 0,19,1,0 0,20,0,0 0,20,0,0 0,20,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,1,19,0 0,0,19,1
TTTTTTTTTTTT 1,0,19,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGGGGGGGGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
TTTTTTTTTTTT 0,1,19,0 0,0,20,0 0,0,20,0 0,0,20,0
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
AAAAAAAAAAAA 0,20,0,0 0,19,1,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,19,1 0,0,20,0 0,0,20,0 0,0,20,0
AAAAAAAAAAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,19,1,0
GGGGGGGGGKGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,1,19
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 2,0,18,0 0,1,19,0 0,0,20,0
AAAAAAAAAAAA 0,19,0,1 0,20,0,0 0,20,0,0 0,20,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
CCCCCCCCCCCC 20,0,0,0 18,1,1,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 19,1,0,0 19,1,0,0 20,0,0,0 20,0,0,0
AAAAAAAAAAAA 0,19,0,1 0,20,0,0 1,19,0,0 0,20,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
CCCCCCCCCCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTTTTTTTTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTTKTTTTTTT 1,0,19,0 0,0,20,0 0,1,19,0 0,1,19,0
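One way to see why counts such as 19,0,1,0 or 2,0,18,0 still read as errors rather than heterozygotes at this depth is a simple binomial comparison. The sketch below is a simplified two-base model written for illustration. It is not necessarily the calculation pyrad performs: it ignores the heterozygosity prior estimated in step 4 and any reads of a third base. It assumes an error turns a base into each of the three other bases with probability e/3.

```python
import math

def log_lik_hom(major, minor, e):
    """Log-likelihood of the reads if the sample is homozygous for the major base."""
    return major * math.log(1.0 - e) + minor * math.log(e / 3.0)

def log_lik_het(major, minor, e):
    """Log-likelihood if the sample is heterozygous for the two observed bases."""
    # A read shows a given allele's base if it came from that allele without
    # error, or from the other allele with an error to exactly that base.
    p = 0.5 * (1.0 - e) + 0.5 * (e / 3.0)
    return (major + minor) * math.log(p)

e = 0.01  # error rate simulated for dataset2
for major, minor in [(19, 1), (18, 2), (10, 10)]:
    diff = log_lik_hom(major, minor, e) - log_lik_het(major, minor, e)
    print(major, minor, round(diff, 2))  # positive favors homozygote plus errors
```

The binomial coefficient is the same under both hypotheses, so it cancels in the difference and is left out.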
In dataset3, heterozygotes could be more difficult to distinguish from sequencing errors.
%%bash
head -200 dataset3/outfiles/dataset3.catg | cut -c 1-48
12 44500
1A0 1B0 1C0 1D0 2E0 2F0 2G0 2H0 3I0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
GGGGGGGGGGGG 0,1,0,12 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
CCCCCCCYCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
ANAAAAAAAAAA 0,13,0,0 0,8,0,2 0,12,0,0 0,16,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,1,12,0 0,0,10,0 0,0,12,0 0,0,16,0
CCCCCCCCCCCC 13,0,0,0 9,0,0,1 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,12,1 1,0,9,0 0,0,12,0 0,0,16,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
AAAAAAAAAAAA 0,13,0,0 0,9,1,0 0,12,0,0 0,16,0,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 12,0,0,1 10,0,0,0 12,0,0,0 16,0,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
GGGGGGGGCCCC 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
AAAAAAAAAAAA 0,12,0,1 0,10,0,0 0,11,0,1 0,16,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 1,15,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
CCTCCCCCCCCC 13,0,0,0 10,0,0,0 0,0,12,0 16,0,0,0
AAAAAAAAAAAA 0,12,1,0 0,10,0,0 0,12,0,0 0,16,0,0
AAAAAAAAAAAA 1,12,0,0 0,10,0,0 0,12,0,0 0,16,0,0
TTTTTTTTTTTT 0,1,12,0 0,0,10,0 0,0,12,0 1,0,15,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,15,1,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTCCCCCCCCC 0,0,13,0 0,0,10,0 0,0,12,0 16,0,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
AAAAAAAAAAAA 0,13,0,0 0,10,0,0 0,12,0,0 0,16,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
TTTTTTTTTTTT 0,0,13,0 0,0,10,0 0,0,12,0 0,0,16,0
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
CCCCCCCCCCCC 13,0,0,0 10,0,0,0 12,0,0,0 16,0,0,0
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
GGGGGGGGGGGG 0,0,0,13 0,0,0,10 0,0,0,12 0,0,0,16
TTTTTTTTTTTT 1,0,10,0 0,0,12,0 0,0,14,0 0,0,11,0
TTTTTTTTTWTA 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,13,1,0 0,11,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
GGGGGGGGGGGG 0,0,0,11 1,0,0,11 0,0,0,14 0,0,0,11
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
CCCCCCCCCCTC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 13,1,0,0 10,1,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 13,0,0,1 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 11,1,0,0 14,0,0,0 11,0,0,0
GGKGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,6,8 0,0,0,11
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TYTTTTTTTTTG 0,0,11,0 8,0,4,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 10,1,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 10,0,1,0 12,0,0,0 14,0,0,0 11,0,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TWTTTTTTTTTT 0,0,11,0 0,4,8,0 0,0,14,0 0,0,11,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 1,0,0,13 0,0,0,11
CCCCCCCCCCCC 11,0,0,0 11,1,0,0 14,0,0,0 11,0,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 1,10,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
AAAMAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 3,8,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
TTTAAAAATTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,11,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
AAAAAAAAAAAA 0,11,0,0 0,11,1,0 0,14,0,0 0,11,0,0
MCCCCCCCCCCC 5,6,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCGGGGCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
TTTTTTTTTTTT 0,0,11,0 0,0,12,0 0,0,14,0 0,0,11,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,11 0,0,0,12 0,0,0,14 0,0,0,11
SSGGGGGGGGGG 5,0,0,6 4,0,0,8 0,0,0,14 0,0,0,11 0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
AAAAAAAAAAAA 0,11,0,0 0,12,0,0 0,14,0,0 0,11,0,0
CCCCCCCCCCCC 11,0,0,0 12,0,0,0 14,0,0,0 11,0,0,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
AAAAAAAAAAAA 0,12,0,0 0,8,0,0 0,17,0,0 0,14,0,0
CCSCCCCCCCCC 12,0,0,0 8,0,0,0 8,1,0,8 14,0,0,0 1
GGGGGGGGSGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
AAAAAAAAAAAA 0,12,0,0 0,8,0,0 0,16,1,0 0,14,0,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,14,0
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,14,0
AAWAAAAAAAAA 0,12,0,0 0,8,0,0 0,8,9,0 0,14,0,0 0
AAAAAAAAAAAA 0,12,0,0 0,8,0,0 0,17,0,0 0,14,0,0
AAAAAAAAAAAA 0,12,0,0 0,8,0,0 0,17,0,0 0,14,0,0
CCCCCCCCCCCC 12,0,0,0 8,0,0,0 17,0,0,0 14,0,0,0
AAAAAAAAAAAA 0,12,0,0 0,8,0,0 0,17,0,0 0,14,0,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,1,0,16 0,0,0,14
AAAAAAAAAAAA 0,12,0,0 0,8,0,0 0,16,1,0 0,14,0,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,1,0,16 0,0,0,14
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,14,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,14,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
CCCCCCCCGGGC 12,0,0,0 8,0,0,0 17,0,0,0 14,0,0,0
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,14,0
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,13,1
CCCCCCCCCCCC 12,0,0,0 8,0,0,0 16,0,1,0 14,0,0,0
TTTTTTTTTTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,1,13,0
CCCCCCCCCCCC 12,0,0,0 8,0,0,0 17,0,0,0 14,0,0,0
GGGGGGGGGGGG 0,0,0,12 0,0,0,8 0,0,0,17 0,0,0,14
TTTTTTAATTTT 0,0,12,0 0,0,8,0 0,0,17,0 0,0,14,0
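To help scan rows like these, the sketch below summarizes a single count field: it reports the standard IUPAC code for the two most common bases and the fraction of reads carrying the less common of the two. It assumes the C,A,T,G count order implied by the rows above, and the function name is made up for illustration. It makes no genotype call of its own; it only shows how lopsided the counts are.

```python
BASES = "CATG"  # assumed count order, as implied by the rows shown above

# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def two_base_summary(count_field):
    """Return (code for the two most common bases, minor-read fraction)."""
    counts = [int(v) for v in count_field.split(",")]
    depth = sum(counts)
    if depth == 0:
        return "N", 0.0  # no reads at all
    ranked = sorted(zip(counts, BASES), reverse=True)
    (n1, b1), (n2, b2) = ranked[0], ranked[1]
    if n2 == 0:
        return b1, 0.0  # only one base observed
    return IUPAC[frozenset((b1, b2))], n2 / depth

for field in ["0,0,6,8", "3,8,0,0", "0,8,0,2", "0,1,12,0"]:
    print(field, two_base_summary(field))
```

Fields such as 0,0,6,8 and 3,8,0,0 were called as heterozygotes (K and M) in the output above, the 0,8,0,2 field was left as N, and a single stray read as in 0,1,12,0 was treated as an error.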
This format seems to meet the needs you described, but let me know if it does not and I can change it accordingly. I'm also working on putting this information into the VCF format, though that will probably be even more bloated than this format. The .catg format is nice in that the data matrix is easily human-readable.