Live oaks introgression manuscript

Notebook 3: Introgression analyses

D. Eaton, A. Hipp, A. Gonzalez-Rodriguez & J. Cavender-Bares

contact: [email protected]

This notebook shows the creation of input files for performing D-statistic tests, runs the tests in pyRAD, and creates tables of results.

In [2]:
%%bash
mkdir -p analysis_dtests
In [25]:
import numpy as np
import scipy.stats
import itertools
import gzip
from collections import OrderedDict, Counter

D-statistic Introgression analyses

Four taxon D-statistic tests were performed on the ".loci" output of the largest data set (virentes_c85d6m4p5.loci). Input files for D-statistic tests designate which samples to iterate over for each test. Below I wrote a Python function to create these input files, and then I perform the tests by designating input files to pyRAD using the -d tag.

In [8]:
## create D-statistic template input file
! ~/pyrad_v.2.13/pyRAD -D  >  analysis_dtests/input.template
In [9]:
! cat analysis_dtests/input.template
200                          ## N bootstrap replicates
test.loci                    ## loc/path to input .loci file
dstats/test1_res             ## output file path/name (no suffix)
4                            ## which test: 4,part,foil,foilalt
2                            ## N cores (execute jobs [lines below] in parallel
0                            ## output ABBA/BABA loci to files (0=no,1,2=verbose)
0                            ## output bootstrap Ds to files (0=no,1=yes)
-----------------------------------------------------------
In [10]:
%%bash
## substitute in relevant parameters to template file
sed -i "/## N boot/c\200                           ## N bootstrap reps "   analysis_dtests/input.template
sed -i "/## loc/c\analysis_pyrad/outfiles/virentes_c85d6m4p5.loci   ## loc/path "   analysis_dtests/input.template
sed -i "/## output file/c\analysis_dtests/out      ## output file" analysis_dtests/input.template
sed -i "/## N cores/c\20                           ## N cores (parallel) "  analysis_dtests/input.template
sed -i "/## output ABBA/c\0                        ## output ABBA/BABA " analysis_dtests/input.template
sed -i "/## output bootstrap/c\0                   ## output bootstraps " analysis_dtests/input.template
In [10]:
## Python function for editing input files
## Takes a list of sample names for each taxon
## and writes iterations to file, uses template file

def makeD4(P1,P2,P3,O,num,name,template):
    stringout = ""
    tests = []
    i = 1
    for p1 in P1: 
        for p2 in P2:
            for p3 in P3:
                if p1 != p2:
                    if set([p1,p2,p3]) not in tests:
                        stringout += "[%s]\t[%s]\t[%s]\t[%s]\t## test %i.%i\n" % (p1,p2,p3,",".join(O),num,i)
                        i += 1
                        tests.append(set([p1,p2,p3]))
    ! head -n 8 $template > "analysis_dtests/input."$num"."$name".txt"
    ! echo "$stringout" >> "analysis_dtests/input."$num"."$name".txt"
    ostring = "/## output file/c\\analysis_dtests/out."+str(num)+"."+str(name)+"      ## output file "
    ! sed -i "$ostring" "analysis_dtests/input."$num"."$name".txt"
In [11]:
## create sample to taxon lists
geminata = ["FLAB109","FLWO6","FLSF54","FLCK18"]
minima = ["FLSF47","FLMO62","FLSA185","FLCK216"]
virginiana = ["LALC2","FLSF33","FLBA140"]   ## excluded SCCU3 for too little data
oleoides = ["MXSA3017","BZBB1","HNDA09","CRL0001","CRL0030"]
sagraena = ["CUSV6","CUVN10","CUCA4"]
brandegei = ["BJVL19","BJSL25","BJSB3"]
fusiformis_t = ["TXMD3","TXGR3"]
fusiformis_m = ["MXED8","MXGT4"]
whiteoaks = ["AR","EN","DO","DU"]

Create input files for D-statistic tests and perform tests

In [12]:
## write the input file for test 1, testing introgression between minima and geminata
makeD4(geminata,geminata,minima,whiteoaks,1,"GGM","analysis_dtests/input.template")
In [13]:
## This is an example of what an input test file looks like
## it iterates over all 'iterations' (sample combinations) in a 'test'
! cat analysis_dtests/input.1.GGM.txt
200                           ## N bootstrap reps 
analysis_pyrad/outfiles/virentes_c85d6m4p5.loci   ## loc/path 
analysis_dtests/out.1.GGM      ## output file 
4                            ## which test: 4,part,foil,foilalt
20                           ## N cores (parallel) 
0                        ## output ABBA/BABA 
0                   ## output bootstraps 
-----------------------------------------------------------
[FLAB109]	[FLWO6]	[FLSF47]	[AR,EN,DO,DU]	## test 1.1
[FLAB109]	[FLWO6]	[FLMO62]	[AR,EN,DO,DU]	## test 1.2
[FLAB109]	[FLWO6]	[FLSA185]	[AR,EN,DO,DU]	## test 1.3
[FLAB109]	[FLWO6]	[FLCK216]	[AR,EN,DO,DU]	## test 1.4
[FLAB109]	[FLSF54]	[FLSF47]	[AR,EN,DO,DU]	## test 1.5
[FLAB109]	[FLSF54]	[FLMO62]	[AR,EN,DO,DU]	## test 1.6
[FLAB109]	[FLSF54]	[FLSA185]	[AR,EN,DO,DU]	## test 1.7
[FLAB109]	[FLSF54]	[FLCK216]	[AR,EN,DO,DU]	## test 1.8
[FLAB109]	[FLCK18]	[FLSF47]	[AR,EN,DO,DU]	## test 1.9
[FLAB109]	[FLCK18]	[FLMO62]	[AR,EN,DO,DU]	## test 1.10
[FLAB109]	[FLCK18]	[FLSA185]	[AR,EN,DO,DU]	## test 1.11
[FLAB109]	[FLCK18]	[FLCK216]	[AR,EN,DO,DU]	## test 1.12
[FLWO6]	[FLSF54]	[FLSF47]	[AR,EN,DO,DU]	## test 1.13
[FLWO6]	[FLSF54]	[FLMO62]	[AR,EN,DO,DU]	## test 1.14
[FLWO6]	[FLSF54]	[FLSA185]	[AR,EN,DO,DU]	## test 1.15
[FLWO6]	[FLSF54]	[FLCK216]	[AR,EN,DO,DU]	## test 1.16
[FLWO6]	[FLCK18]	[FLSF47]	[AR,EN,DO,DU]	## test 1.17
[FLWO6]	[FLCK18]	[FLMO62]	[AR,EN,DO,DU]	## test 1.18
[FLWO6]	[FLCK18]	[FLSA185]	[AR,EN,DO,DU]	## test 1.19
[FLWO6]	[FLCK18]	[FLCK216]	[AR,EN,DO,DU]	## test 1.20
[FLSF54]	[FLCK18]	[FLSF47]	[AR,EN,DO,DU]	## test 1.21
[FLSF54]	[FLCK18]	[FLMO62]	[AR,EN,DO,DU]	## test 1.22
[FLSF54]	[FLCK18]	[FLSA185]	[AR,EN,DO,DU]	## test 1.23
[FLSF54]	[FLCK18]	[FLCK216]	[AR,EN,DO,DU]	## test 1.24

In [19]:
## create remaining input files for D4 tests
makeD4(geminata,geminata,minima,whiteoaks,1,"GGM","analysis_dtests/input.template")
makeD4(minima,minima,geminata,whiteoaks,2,"MMG","analysis_dtests/input.template")
makeD4(geminata,geminata,virginiana,whiteoaks,3,"GGV","analysis_dtests/input.template")
makeD4(minima,minima,virginiana,whiteoaks,4,"MMV","analysis_dtests/input.template")
makeD4(minima,geminata,virginiana,whiteoaks,5,"MGV","analysis_dtests/input.template")
makeD4(virginiana,virginiana,minima,whiteoaks,6,"VVM","analysis_dtests/input.template")
makeD4(virginiana,virginiana,geminata,whiteoaks,7,"VVG","analysis_dtests/input.template")
makeD4(virginiana,geminata,minima,whiteoaks,8,"VGM","analysis_dtests/input.template")
makeD4(oleoides,sagraena,minima+geminata+virginiana,whiteoaks,9,"OSmgv","analysis_dtests/input.template")
makeD4(sagraena,minima+geminata+virginiana,oleoides,whiteoaks,10,"SmgvO","analysis_dtests/input.template")
makeD4(minima+geminata+virginiana,oleoides,sagraena,whiteoaks,11,"mgvOS","analysis_dtests/input.template")
makeD4(oleoides, oleoides,fusiformis_t+fusiformis_m,whiteoaks,12,"OOF","analysis_dtests/input.template")
makeD4(oleoides, oleoides,brandegei,whiteoaks,13,"OOB","analysis_dtests/input.template")
makeD4(brandegei, fusiformis_t+fusiformis_m, oleoides,whiteoaks,14,"BFO","analysis_dtests/input.template")
makeD4(brandegei, fusiformis_t+fusiformis_m, sagraena,whiteoaks,15,"BFS","analysis_dtests/input.template")
makeD4(brandegei, fusiformis_t+fusiformis_m, minima+geminata+virginiana,whiteoaks,16,"BFmgv","analysis_dtests/input.template")
makeD4(brandegei, brandegei, fusiformis_t+fusiformis_m,whiteoaks,17,"BBF","analysis_dtests/input.template")
makeD4(sagraena, sagraena, minima+geminata+virginiana,whiteoaks,18,"SSmgv","analysis_dtests/input.template")
makeD4(minima, virginiana,sagraena,whiteoaks,19,"MVS","analysis_dtests/input.template")
makeD4(minima, geminata,sagraena,whiteoaks,20,"MGS","analysis_dtests/input.template")
makeD4(virginiana, geminata,sagraena,whiteoaks,21,"VGS","analysis_dtests/input.template")
makeD4(minima+geminata,virginiana,fusiformis_t,whiteoaks,22,"mgVFt","analysis_dtests/input.template")
makeD4(oleoides,oleoides,fusiformis_m,whiteoaks,23,"OOFm","analysis_dtests/input.template")
makeD4(sagraena, sagraena, oleoides ,whiteoaks,24,"SSO","analysis_dtests/input.template")
makeD4(oleoides, oleoides, sagraena ,whiteoaks,25,"OOS","analysis_dtests/input.template")
In [ ]:
## run tests 1-26
for test in range(1,26):
    t = glob.glob("analysis_dtests/input."+str(test)+".*")[0]
    stderr = ! ~/Dropbox/pyrad-git/pyRAD -d $t

Partitioned D-statistics

The partitioned test statistics can be used to identify false positive D4 statistics that arise due to shared ancestry. I investigate selected tests that yielded significant D4 statistics.

In [21]:
## Python function for creating input files
## from a list of samples in each taxon
## and writes iterations to outfile, based on template file
def makeD5(P1,P2,P3a,P3b,O,num,name,template):
    stringout = ""
    tests = []
    i = 1
    for p1 in P1: 
        for p2 in P2:
            for p3a in P3a:
                for p3b in P3b:
                    if p1 != p2:
                        if set([p1,p2,p3a,p3b]) not in tests:
                            stringout += "[%s]\t[%s]\t[%s]\t[%s]\t[%s]\t## test %i.%i\n" % (p1,p2,p3a,p3b,",".join(O),num,i)
                            i += 1
                            tests.append(set([p1,p2,p3a,p3b]))
    ! head -n 8 $template > "analysis_dtests/input."$num"."$name".txt"
    ! echo "$stringout" >> "analysis_dtests/input."$num"."$name".txt"
    ostring = "/## output file/c\\analysis_dtests/out."+str(num)+"."+str(name)+"      ## output file "
    ! sed -i "$ostring" "analysis_dtests/input."$num"."$name".txt"
In [19]:
## re-define sample to taxon lists to select only 
## the most variable individuals from D4 tests to reduce redundancy
geminata = ["FLCK18"]
minima = ["FLSA185"]
virg_l = ["LALC2"]
virg_f = ["FLBA140"]  
oleo_m = ["MXSA3017"]
oleo_h = ["HNDA09"]
sagraena = ["CUVN10"]
brandegei = ["BJVL19"]
fusiformis_t = ["TXMD3"]
fusiformis_m = ["MXED8"]
whiteoaks = ["AR","EN","DO","DU"]

I set the test type to 'part' for partitioned and create input files.

In [16]:
! sed  -i "/## which/c\part                       ## which test " analysis_dtests/input.template

## gene flow between clades (Fig.2)
makeD5(brandegei, fusiformis_t, oleo_m+oleo_h, virg_f+virg_l, whiteoaks, 26, "BFtOV", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_m, oleo_m+oleo_h, virg_f+virg_l, whiteoaks, 27, "BFmOV", "analysis_dtests/input.template")
makeD5(oleo_m+oleo_h, virg_f+virg_l, fusiformis_t, brandegei, whiteoaks, 28, "OVFtB", "analysis_dtests/input.template")
makeD5(oleo_m+oleo_h, virg_f+virg_l, fusiformis_m, brandegei, whiteoaks, 29, "OVFmB", "analysis_dtests/input.template")

makeD5(brandegei, fusiformis_t, sagraena, virg_f+virg_l, whiteoaks, 30, "BFtSV", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_m, sagraena, virg_f+virg_l, whiteoaks, 31, "BFmSV", "analysis_dtests/input.template")
makeD5(oleo_m+oleo_h, sagraena, fusiformis_t, brandegei, whiteoaks, 32, "SVFtB", "analysis_dtests/input.template")
makeD5(oleo_m+oleo_h, sagraena, fusiformis_m, brandegei, whiteoaks, 33, "SVFmB", "analysis_dtests/input.template")

makeD5(brandegei, fusiformis_t, oleo_m+oleo_h, sagraena, whiteoaks, 34, "BFtOS", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_m, oleo_m+oleo_h, sagraena, whiteoaks, 35, "BFmOS", "analysis_dtests/input.template")
makeD5(oleo_m+oleo_h, sagraena, fusiformis_t, brandegei, whiteoaks, 36, "OSFtB", "analysis_dtests/input.template")
makeD5(oleo_m+oleo_h, sagraena, fusiformis_m, brandegei, whiteoaks, 37, "OSFmB", "analysis_dtests/input.template")

## testing cuba hypothesis 1, contrasts oleoides
makeD5(oleo_m, oleo_h, sagraena, minima, whiteoaks, 38, "OOSM", "analysis_dtests/input.template")
makeD5(minima, sagraena, oleo_h, oleo_m, whiteoaks, 39, "MSO0", "analysis_dtests/input.template")
makeD5(oleo_m, oleo_h, sagraena, geminata, whiteoaks, 40, "OOSG", "analysis_dtests/input.template")
makeD5(geminata, sagraena, oleo_h, oleo_m, whiteoaks, 41, "GSO0", "analysis_dtests/input.template")
makeD5(oleo_m, oleo_h, sagraena, virg_l+virg_f, whiteoaks, 42, "OOSV", "analysis_dtests/input.template")
makeD5(virg_l+virg_f, sagraena, oleo_h, oleo_m, whiteoaks, 43, "VSO0", "analysis_dtests/input.template")

## testing cuba hypothesis 2, contrasts Florida oaks
makeD5(minima, virg_l+virg_f, sagraena, oleo_h, whiteoaks, 44, "MVS0", "analysis_dtests/input.template")
makeD5(oleo_h, sagraena, virg_l+virg_f, minima, whiteoaks, 45, "OSVM", "analysis_dtests/input.template")
makeD5(geminata, virg_l+virg_f, sagraena, oleo_h, whiteoaks, 46, "GVSO", "analysis_dtests/input.template")
makeD5(oleo_h, sagraena, virg_l+virg_f, geminata, whiteoaks, 47, "OSVG", "analysis_dtests/input.template")
makeD5(virg_l, virg_f, sagraena, oleo_h, whiteoaks, 48, "VVSO", "analysis_dtests/input.template")
makeD5(oleo_h, sagraena, virg_f, virg_l, whiteoaks, 49, "OSVV", "analysis_dtests/input.template")
makeD5(geminata, minima, sagraena, oleo_h, whiteoaks, 50, "GMSO", "analysis_dtests/input.template")
makeD5(oleo_h, sagraena, geminata, minima, whiteoaks, 51, "OSMG", "analysis_dtests/input.template")

## testing cuba hypothesis 3, contrasts oleoides, Florida, no Sagraeana
makeD5(minima, geminata, oleo_h, oleo_m, whiteoaks, 52, "MGOO", "analysis_dtests/input.template")
makeD5(minima, virg_l+virg_f, oleo_h, oleo_m, whiteoaks, 53, "MVOO", "analysis_dtests/input.template")
makeD5(geminata, virg_l+virg_f, oleo_h, oleo_m, whiteoaks, 54, "GVOO", "analysis_dtests/input.template")
makeD5(virg_l, virg_f, oleo_h, oleo_m, whiteoaks, 55, "VVOO", "analysis_dtests/input.template")
makeD5(oleo_m, oleo_h, geminata, minima, whiteoaks, 56, "OOGM", "analysis_dtests/input.template")
makeD5(oleo_m, oleo_h, virg_l+virg_f, minima, whiteoaks, 57, "OOVM", "analysis_dtests/input.template")
makeD5(oleo_m, oleo_h, virg_l+virg_f, geminata, whiteoaks, 58, "OOVG", "analysis_dtests/input.template")
makeD5(oleo_m, oleo_h, geminata, minima, whiteoaks, 59, "OOVV", "analysis_dtests/input.template")
In [ ]:
## run tests 26-50
for test in range(26,60):
    t = glob.glob("analysis_dtests/input."+str(test)+".*")[0]
    stderr = ! ~/Dropbox/pyrad-git/pyRAD -d $t

Analysis of D-statistic results

Python functions used to calculate significance of D-statistic tests with Holm-bonferroni correction, and to test for correlation with geographic distances. Sampling locations are in Supplementary Table Sxx.

In [20]:
## strips junk from Dtest results files to make names match in G dict'
def nname(x):
    x = x.replace("]","").replace("[","").strip()
    return x
In [21]:
## returns number of significant tests at some alpha level of significance
## after sequential bonferoni correction.
from itertools import groupby
import scipy.stats

def correct(Zs,alpha):
    """ performs Holm-bonferroni correction on 
        a list of Z-scores """
    PS = []
    Zs = list(Zs)
    pone = map(scipy.stats.distributions.norm.sf, Zs)
    pvals = map(lambda x: x*2, pone)  ## two-sided test
    pvals_idxs = zip(pvals, xrange(len(pvals)))
    pvals_idxs.sort()
    lp = len(pvals)
    for pval, idxs in groupby(pvals_idxs, lambda x: x[0]):
        idxs = list(idxs)
        for p, i in idxs:
            if p < alpha/lp:
                PS.append(i)
        lp -= len(idxs)
    return [len(PS),len(Zs),min(Zs),max(Zs)]
In [22]:
## reads in Z scores from the pyRAD output and return summary
def getZs(infile,Ntest,exclude=[]):
    """ returns the Z-scores from a 
        PyRAD Dtest output file """
    lines = open(infile,'r').readlines()
    header, data = lines[0:1],lines[2:]
    dat1 = [i.strip().split("\t") for i in data]
    for name in exclude:
        dat1 = [line for line in dat1 if name not in "".join(line)]
    D = np.array([i for i in dat1])
    Za = []
    Zb = []
    Zab = []
    taxorder = []
    L = []
    
    if Ntest == 5:
        for line in dat1:
            p1,p2,p31,p32,o = line[0:5]
            Zab.append(line[8])
            Za.append(line[9])
            Zb.append(line[10])
            L.append(line[17])
            taxorder.append([p1,p2,p31,p32,o])
        RR = [[map(float,Zab),map(float,Za),map(float,Zb)],min(map(int,L)),max(map(int,L)),taxorder]
    
    if Ntest == 4:
        taxorder = zip([D[0],D[1],D[2]])
        Z,L = D[:,6],D[:,9]
        RR = [map(float,Z), min(map(int,L)), max(map(int,L)),taxorder]
        
    return RR
In [23]:
## for printing results of bonferoni test
def retHB(infile, Ntest, alpha, exclude=[], header=False, ptax=False):
    """ combines the previous two functions into one, 
        and returns the output in pretty format """
    if Ntest == 4:
        if header == True:
            print "\t".join(["range_Z", "N tests/Sig", "range_N_loci"]) 
        Zs,minL,maxL,taxorder = getZs(infile, Ntest, exclude)
        s,t,minz,maxz = correct(Zs, alpha)
        return [minz,maxz,s,t,minL,maxL]
        
    elif Ntest == 5:
        if header == True:
            print "\t".join(["range_Z12", "Sig",
                             "range_Z1", "Sig", 
                             "range_Z2", "Sig", 
                             "range_N_loci"]) 
        ZS,minL,maxL,taxorder = getZs(infile, Ntest, exclude)
        s12,t12,minz12,maxz12 = correct(ZS[0], alpha)
        s1,t1,minz1,maxz1 = correct(ZS[1], alpha)
        s2,t2,minz2,maxz2 = correct(ZS[2], alpha)
        
        return [minz12,maxz12,s12,t12,
                minz1,maxz1,s1,t1,
                minz2,maxz2,s2,t2,
                minL,maxL]

Summarized results

In [26]:
import glob
infiles = glob.glob("analysis_dtests/*.D4.txt")
infiles.sort(key=lambda x: int(x.split('.')[1]))
D = {}
for ff in infiles:
    rr = retHB(ff,4,0.01,[],0)
    ss = "\t".join(map(str,[ff.split("/")[-1]]))
    D[ff.split("/")[-1][4:-7]] = tuple(rr)
    print "%s  \t(%.1f - %.1f)\t%s/%s\t(%s - %s)" % (tuple([ff.split("/")[-1]])+tuple(rr))
out.1.GGM.D4.txt  	(0.0 - 2.3)	0/23	(8603 - 15297)
out.2.MMG.D4.txt  	(1.3 - 6.8)	12/23	(8008 - 12652)
out.3.GGV.D4.txt  	(0.2 - 2.4)	0/17	(10283 - 20388)
out.4.MMV.D4.txt  	(0.2 - 4.7)	7/17	(8015 - 13022)
out.5.MGV.D4.txt  	(0.1 - 7.9)	28/47	(8402 - 16087)
out.6.VVM.D4.txt  	(0.0 - 1.6)	0/11	(9873 - 16589)
out.7.VVG.D4.txt  	(0.1 - 2.5)	0/11	(11854 - 20361)
out.8.VGM.D4.txt  	(0.0 - 3.9)	1/47	(8402 - 16087)
out.9.OSmgv.D4.txt  	(3.1 - 16.2)	164/164	(6216 - 17821)
out.10.SmgvO.D4.txt  	(14.7 - 36.4)	164/164	(6216 - 17821)
out.11.mgvOS.D4.txt  	(6.8 - 25.8)	164/164	(6216 - 17821)
out.12.OOF.D4.txt  	(0.0 - 1.6)	0/39	(10541 - 16074)
out.13.OOB.D4.txt  	(0.1 - 2.4)	0/29	(11698 - 17326)
out.14.BFO.D4.txt  	(0.0 - 8.1)	29/59	(10665 - 18638)
out.15.BFS.D4.txt  	(0.9 - 8.1)	30/35	(7506 - 17427)
out.16.BFmgv.D4.txt  	(1.3 - 17.9)	119/131	(8819 - 19929)
out.17.BBF.D4.txt  	(0.2 - 2.6)	0/11	(12587 - 20569)
out.18.SSmgv.D4.txt  	(0.0 - 4.1)	2/32	(6295 - 14053)
out.19.MVS.D4.txt  	(1.1 - 7.1)	17/35	(6191 - 15894)
out.20.GVS.D4.txt  	(0.0 - 6.9)	18/47	(6235 - 14695)
out.21.VGS.D4.txt  	(0.0 - 2.9)	0/35	(7166 - 18320)
out.22.mgVFt.D4.txt  	(3.6 - 10.6)	47/47	(8240 - 16372)
out.23.OOVFm.D4.txt  	(0.0 - 1.7)	0/19	(12470 - 16074)
out.24.SSO.D4.txt  	(0.0 - 4.1)	2/14	(7195 - 13110)
out.25.OOS.D4.txt  	(0.1 - 7.3)	10/29	(7856 - 15871)
out.99.OSV.D4.txt  	(5.1 - 12.7)	44/44	(7146 - 17821)

Introgression from the outgroups?

This test uses the more distant red oaks as an outgroup to test introgression into Q. minima relative to Q. virginiana or Q. geminata from a more distant white oak. This result was suggested by the Treemix results. However, we find little evidence of introgression from more distant white oaks, and because they are so divergent I suspect that if there was gene flow it would be very apparent.

In [28]:
import pandas as pd
In [29]:
print pd.read_table("analysis_dtests/supplemental/extras2.D4.txt")
    P1         P2         P3    O            D  std(D)     Z    BABA    ABBA  \
0   [FLSA185]  [LALC2]    [DO]  [NI,HE]  0.101   0.051  1.99  209.25  256.25   
1   [FLSA185]  [FLBA140]  [DO]  [NI,HE]  0.040   0.053  0.75  224.00  242.50   
2   [FLSA185]  [FLSF54]   [DO]  [NI,HE]  0.050   0.052  0.96  208.75  230.75   
3   [FLSA185]  [FLAB109]  [DO]  [NI,HE]  0.014   0.057  0.25  221.50  228.00   
4   [FLSF47]   [LALC2]    [DO]  [NI,HE] -0.006   0.044  0.13  272.88  269.88   
5   [FLSF47]   [FLBA140]  [DO]  [NI,HE]  0.032   0.041  0.79  270.00  288.00   
6   [FLSF47]   [FLSF54]   [DO]  [NI,HE] -0.019   0.046  0.42  268.88  258.62   
7   [FLSF47]   [FLAB109]  [DO]  [NI,HE]  0.024   0.054  0.45  257.88  270.62   
8   [FLSA185]  [LALC2]    [AR]  [NI,HE]  0.045   0.054  0.83  283.25  309.75   
9   [FLSA185]  [FLBA140]  [AR]  [NI,HE] -0.044   0.054  0.81  307.50  281.75   
10  [FLSA185]  [FLSF54]   [AR]  [NI,HE] -0.022   0.046  0.48  285.50  273.25   
11  [FLSA185]  [FLAB109]  [AR]  [NI,HE] -0.097   0.053  1.82  311.62  256.62   
12  [FLSF47]   [LALC2]    [AR]  [NI,HE] -0.003   0.048  0.07  337.38  335.12   
13  [FLSF47]   [FLBA140]  [AR]  [NI,HE]  0.061   0.041  1.50  327.62  370.38   
14  [FLSF47]   [FLSF54]   [AR]  [NI,HE] -0.037   0.043  0.86  320.12  297.12   
15  [FLSF47]   [FLAB109]  [AR]  [NI,HE]  0.019   0.043  0.45  304.75  316.75   

    nloci  nboot  pdisc  notes  
0   10493    200   0.05    NaN  
1    9979    200   0.05    NaN  
2    9618    200   0.06    NaN  
3   10117    200   0.05    NaN  
4   13372    200   0.05    NaN  
5   12578    200   0.05    NaN  
6   12141    200   0.05    NaN  
7   12541    200   0.05    NaN  
8   11237    200   0.06    NaN  
9   10856    200   0.06    NaN  
10  10532    200   0.06    NaN  
11  10957    200   0.06    NaN  
12  14318    200   0.05    NaN  
13  13619    200   0.06    NaN  
14  13183    200   0.06    NaN  
15  13460    200   0.05    NaN  

[16 rows x 13 columns]

Test Dfoil statistics for analyses in Fig.2

These tests were performed after the other analyses, using pyRAD v3.0.1. This analysis was experimental (Dfoil method not yet published, my dfoil code not yet fully tested) and thus not included in the manuscript.

change template to dfoil method

This option excludes information from terminal SNP patterns (e.g., BAAAA or AAABA), since these can lead to biases when tip samples have different population sizes, especially when taking measurements including heterozygous sites. This is implemented as the Dfoil_alt method, as described in Pease \& Hahn (2014) doi:10.1101/004689

In [17]:
! sed -i s/part\ /foilalt/g analysis_dtests/input.template
In [30]:
! cat analysis_dtests/input.template
200                           ## N bootstrap reps 
/home/deren/Documents/Oaks/Virentes/analysis_pyrad/outfiles/virentes_c85d6m4p5.loci   ## loc/path 
analysis_dtests/out      ## output file
foilalt                           ## which test: 4,part,foil,foilalt
20                           ## N cores (parallel) 
0                        ## output ABBA/BABA 
0                   ## output bootstraps 
-----------------------------------------------------------
In [31]:
makeD5(brandegei, fusiformis_t, oleo_m+oleo_h, virg_f+virg_l, whiteoaks, 100, "BFtOV", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_m, oleo_m+oleo_h, virg_f+virg_l, whiteoaks, 101, "BFmOV", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_t, sagraena, virg_f+virg_l, whiteoaks, 102, "BFtSV", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_m, sagraena, virg_f+virg_l, whiteoaks, 103, "BFmSV", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_t, oleo_m+oleo_h, sagraena, whiteoaks, 104, "BFtOS", "analysis_dtests/input.template")
makeD5(brandegei, fusiformis_m, oleo_m+oleo_h, sagraena, whiteoaks, 105, "BFmOS", "analysis_dtests/input.template")
In [37]:
import glob
for test in range(100,106):
    t = glob.glob("analysis_dtests/input."+str(test)+".*")[0]
    stderr = ! ~/Dropbox/pyrad-github/pyRAD -d $t