Notebook

Supplementary Materials¶

Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: A case study in Carex (Cyperaceae)¶

Escudero et al. (2014)¶

Contact: ahipp@mortonarb.org¶

This is an IPython Notebook, best viewed through a web browser at the following permanent link: nbviewer. Each cell in the notebook will process Python code by default, but can also execute other languages when directed. Here we use primarily bash scripts (cells headed with %%bash or lines starting with !) but also a few R scripts (cells headed with %%R). By executing all cells in this notebook our entire analysis should be easily reproducible.

Executing the code will require the following software (we use the version listed):

pyRAD (v.2.13)
RAxML (v.8.0.16)
mrBayes (v.3.2.2)
RADami (v.1.0-3)

Organize notebook¶

In [ ]:

%%bash
mkdir -p analysis_pyrad/
mkdir -p analysis_dtests/
mkdir -p analysis_raxml/
mkdir -p analysis_mrbayes/

pyRAD GBS data assembly¶

First we create a template params file for pyRAD (v.2.13) using the (-n) argument, then we substitute in the relevant parameters for our analysis using the bash command sed.

In [23]:

%%bash
## create template file
~/Dropbox/pyrad-git/pyRAD -n
mv params.txt analysis_pyrad/params.txt

## substitute in parameters
sed -i '/## 1. /c\analysis_pyrad/_c90d5     ## 1. working directory ' analysis_pyrad/params.txt
sed -i '/## 2. /c\/home/deren/Documents/RADSEQ_DATA/Carex/*.fastq             ## 2. data file         ' analysis_pyrad/params.txt
sed -i '/## 3. /c\/home/deren/Documents/RADSEQ_DATA/barcodes/Carex1.barcodes  ## 3. barcodes file     ' analysis_pyrad/params.txt
sed -i '/## 7. /c\10                    ## 7. use multiple cores ' analysis_pyrad/params.txt
sed -i '/## 8. /c\5                     ## 8. set mindepth   ' analysis_pyrad/params.txt
sed -i '/## 9. /c\3                     ## 9. max lowQual   ' analysis_pyrad/params.txt
sed -i '/## 10. /c\.90                  ## 10. decreased clust threshold ' analysis_pyrad/params.txt
sed -i '/## 11. /c\gbs                  ## 11. changed data type ' analysis_pyrad/params.txt
sed -i '/## 14. /c\c90d5m4p3            ## 14. output name '       analysis_pyrad/params.txt
sed -i '/## 21./c\2                     ## 21. Filter type '       analysis_pyrad/params.txt
sed -i '/## 23./c\3                     ## 23. maxH consens '      analysis_pyrad/params.txt
sed -i '/## 24./c\3                     ## 24. maxN consens '      analysis_pyrad/params.txt
sed -i '/## 29./c\2,2                   ## 29. trim overhang edges ' analysis_pyrad/params.txt
sed -i '/## 30./c\n                     ## 30. add output formats '  analysis_pyrad/params.txt
sed -i '/## 34./c\99999                 ## 34. no max stack size '   analysis_pyrad/params.txt

	new params.txt file created

De-multiplex the data set¶

Calling step 1 of pyRAD de-muliplexes the data set, here allowing for one base difference in the barcodes. This creates a fastQ file for only the samples that are indicated in our barcodes file. For this study this includes 9 sampled individuals.

In [ ]:

%%bash
## create sub working directory
mkdir -p analysis_pyrad/_c90d5/

## demultiplex data files
~/Dropbox/pyrad-git/pyRAD -p analysis_pyrad/params.txt -s 1   2&> /dev/null

Assemble 90% data set at mindepth 5¶

In [18]:

%%bash
## assemble data sets
~/Dropbox/pyrad-git/pyRAD -p analysis_pyrad/params.txt -s 234567   2&> /dev/null

## print summary of results
cat analysis_pyrad/_c90d5/stats/c90d5m4p3.stats


5024        ## loci with > minsp containing data
4958        ## loci with > minsp containing data & paralogs removed
4958        ## loci with > minsp containing data & paralogs removed & final filtering

## number of loci recovered in final data set for each taxon.
taxon	nloci
2816   	4326
2830D  	3918
2893G1 	4075
albolut	4261
longii 	3712
stramin	4200
suberec	3885
vexans 	1749
waponah	3682


## nloci = number of loci with data for exactly ntaxa
## ntotal = number of loci for which at least ntaxa have data
ntaxa	nloci	saved	ntotal
1	-
2	-		-
3	-		-
4	767	*	4958
5	664	*	4191
6	613	*	3527
7	687	*	2914
8	1110	*	2227
9	1117	*	1117


## var = number of loci containing n variable sites.
## pis = number of loci containing n parsimony informative var sites.
n	var	PIS
0	3164	4299
1	1763	523
2	484	95
3	137	31
4	53	9
5	8	1
6	4	0
7	1	0
8	2	0
9	1	0
total var= 3450
total pis= 847

Assemble 85% data set at mindepth 5¶

In [ ]:

%%bash
## create sub working directory
mkdir -p analysis_pyrad/_c85d5/

## set parameters in params file
sed -i '/## 1. /c\analysis_pyrad/_c85d5     ## 1. working directory ' analysis_pyrad/params.txt
sed -i '/## 10./c\.85                       ## 10. decreased clust threshold ' analysis_pyrad/params.txt
sed -i '/## 14./c\c85d5m4p3                 ## 14. output name '   analysis_pyrad/params.txt
sed -i '/## 18./c\analysis_pyrad/_c90d5/fastq/*.gz    ## 18. use de-multiplexed data '   analysis_pyrad/params.txt

In [1]:

%%bash
~/Dropbox/pyrad-git/pyRAD -p analysis_pyrad/params.txt -s 234567  2&> /dev/null
cat analysis_pyrad/_c85d5/stats/c85d5m4p3.stats


4799        ## loci with > minsp containing data
4739        ## loci with > minsp containing data & paralogs removed
4739        ## loci with > minsp containing data & paralogs removed & final filtering

## number of loci recovered in final data set for each taxon.
taxon	nloci
2816   	4047
2830D  	3670
2893G1 	3840
albolut	4019
longii 	3495
stramin	3993
suberec	3672
vexans 	1692
waponah	3485


## nloci = number of loci with data for exactly ntaxa
## ntotal = number of loci for which at least ntaxa have data
ntaxa	nloci	saved	ntotal
1	-
2	-		-
3	-		-
4	766	*	4739
5	667	*	3973
6	600	*	3306
7	698	*	2706
8	1044	*	2008
9	964	*	964


## var = number of loci containing n variable sites.
## pis = number of loci containing n parsimony informative var sites.
n	var	PIS
0	3010	4118
1	1653	491
2	465	81
3	135	30
4	55	9
5	22	5
6	13	3
7	6	2
8	1	0
total var= 3446
total pis= 836

Assemble 85% data set at mindepth 2¶

In [ ]:

%%bash
## create sub working directory
mkdir -p analysis_pyrad/_c85d2/

## copy re-usable files from _c85d5/
cp -r analysis_pyrad/_c85d5/stats/ analysis_pyrad/_c85d2/
cp -r analysis_pyrad/_c85d5/clust.85/ analysis_pyrad/_c85d2/

## remove consens files b/c they will be remade at lower mindepth
rm analysis_pyrad/_c85d2/clust.85/*consens*

## set new parameters in params file
sed -i '/## 1. /c\analysis_pyrad/_c85d2       ## 1. working directory ' analysis_pyrad/params.txt
sed -i '/## 8. /c\2                           ## 8. set mindepth ' analysis_pyrad/params.txt
sed -i '/## 14./c\c85d2m4p3                   ## 14. output name ' analysis_pyrad/params.txt
sed -i '/## 31./c\1                           ## 31. allow lowdepth base calls ' analysis_pyrad/params.txt

In [2]:

%%bash
~/Dropbox/pyrad-git/pyRAD -p analysis_pyrad/params.txt -s 567 2&> /dev/null
cat analysis_pyrad/_c85d2/stats/c85d2m4p3.stats


7274        ## loci with > minsp containing data
7201        ## loci with > minsp containing data & paralogs removed
7201        ## loci with > minsp containing data & paralogs removed & final filtering

## number of loci recovered in final data set for each taxon.
taxon	nloci
2816   	5832
2830D  	5568
2893G1 	5642
albolut	5907
longii 	5120
stramin	5713
suberec	5443
vexans 	3386
waponah	5746


## nloci = number of loci with data for exactly ntaxa
## ntotal = number of loci for which at least ntaxa have data
ntaxa	nloci	saved	ntotal
1	-
2	-		-
3	-		-
4	1263	*	7201
5	921	*	5938
6	896	*	5017
7	1125	*	4121
8	1515	*	2996
9	1481	*	1481


## var = number of loci containing n variable sites.
## pis = number of loci containing n parsimony informative var sites.
n	var	PIS
0	3807	6076
1	2793	797
2	1011	203
3	433	91
4	151	19
5	76	10
6	27	3
7	15	1
8	6	1
9	4	0
10	2	0
11	0	0
12	0	0
13	0	0
14	1	0
total var= 7483
total pis= 1635

RAxML Analyses¶

In [86]:

%%bash
raxmlHPC-PTHREADS-AVX -f a \
                      -p 12345 \
                      -s analysis_pyrad/_c90d5/outfiles/c90d5m4p3.phy \
                      -x 12345 -N 100 \
                      -m GTRGAMMA \
                      -n c90d5m4p3 \
                      -o 'stramin' \
                      -w /home/deren/Documents/Marcial/Carex/analysis_raxml/ \
                      -T 5   2&> /dev/null

In [87]:

%%bash
raxmlHPC-PTHREADS-AVX -f a \
                      -p 12345 \
                      -s analysis_pyrad/_c85d5/outfiles/c85d5m4p3.phy \
                      -x 12345 -N 100 \
                      -m GTRGAMMA \
                      -n c85d5m4p3 \
                      -o 'stramin' \
                      -w /home/deren/Documents/Marcial/Carex/analysis_raxml/ \
                      -T 5   2&> /dev/null

In [88]:

%%bash
raxmlHPC-PTHREADS-AVX -f a \
                      -p 12345 \
                      -s analysis_pyrad/_c85d2/outfiles/c85d2m4p3.phy \
                      -x 12345 -N 100 \
                      -m GTRGAMMA \
                      -n c85d2m4p3 \
                      -o 'stramin' \
                      -w /home/deren/Documents/Marcial/Carex/analysis_raxml/ \
                      -T 5   2&> /dev/null

mrBayes Analyses¶

We ran two independent mrBayes (v.3.2.2) analyses for each data set, each running for 10M generations sampling trees every 1000 gens and using four Metropolis-coupled MCMC chains per run.

In [7]:

## create a nexus string with parameters for mrbayes run
nexinfo = """
\nbegin mrbayes;
lset nst=6 rates=invgamma;
set autoclose=yes;
mcmc ngen=10000000 printfreq=1000 samplefreq=1000 nchains=4 savebrlens=yes;
sump;
sumt;
end; 
"""

## append mrbayes nexus info to the end of nexus sequence files
! echo '$nexinfo' >> analysis_pyrad/_c90d5/outfiles/c90d5m4p3.nex
! echo '$nexinfo' >> analysis_pyrad/_c85d5/outfiles/c85d5m4p3.nex
! echo '$nexinfo' >> analysis_pyrad/_c85d2/outfiles/c85d2m4p3.nex

In [ ]:

## call mrbayes on each nexus file data set
## Do two runs for each data set
quiet = ! mb analysis_pyrad/_c90d5/outfiles/c90d5m4p3.nex
quiet = ! mb analysis_pyrad/_c85d5/outfiles/c85d5m4p3.nex
quiet = ! mb analysis_pyrad/_c85d2/outfiles/c85d2m4p3.nex
#quiet = ! mpirun -np 6 analysis_pyrad/_c85d2/outfiles/c85d2m4p3.nex

In [2]:

## clean up, move files to mrbayes directory
! mv /home/deren/Documents/Marcial/Carex/analysis_pyrad/_c90d5/outfiles/c90d5m4p3.nex.* analysis_mrbayes/
! mv /home/deren/Documents/Marcial/Carex/analysis_pyrad/_c85d5/outfiles/c85d5m4p3.nex.* analysis_mrbayes/
! mv /home/deren/Documents/Marcial/Carex/analysis_pyrad/_c85d2/outfiles/c85d2m4p3.nex.* analysis_mrbayes/

D-statistic tests for introgression¶

Create four-taxon D-statistic input file¶

In [154]:

%%bash
## create Dtest template file
~/Dropbox/pyrad-git/pyRAD -D > analysis_dtests/input.D4

In [155]:

%%bash
## substitute in new parameters
sed -i '/## N bootstrap/c\200                         ## N bootstrap ' analysis_dtests/input.D4
sed -i '/## loc/c\analysis_pyrad/_c85d2/outfiles/c85d2m4p3.loci     ## loc/path ' analysis_dtests/input.D4
sed -i '/## output file /c\analysis_dtests/test1     ## output file ' analysis_dtests/input.D4
sed -i '/## N cores/c\4                           ## N cores ' analysis_dtests/input.D4

Append iterations over test samples to file by iterating over sample names with the following Python script:

In [156]:

CS = ['2816','2830D','2893G1']
W = ['waponah']
CSU = ['suberec'] 
CL = ['longii']
CV = ['vexans']
CA = ['albolut']
S = ['stramin']

def makeD4(P1,P2,P3,O,poly,infile):
    ! head -n 8 $infile > temp
    ! mv temp $infile
    tests = []
    if not poly:
        for o in O:
            for p3 in P3:
                for p2 in P2:
                    for p1 in P1:
                        if p1 != p2:
                            if set((p1,p2,p3,o)) not in tests:
                                tests.append(set((p1,p2,p3,o)))
                                test = '%s\t%s\t%s\t%s\t##' % (p1,p2,p3,o)
                                ## appends tests to input file
                                ! echo '$test' >> analysis_dtests/input.D4
    else:
        for p3 in P3:
            for p2 in P2:
                for p1 in P1:
                    if p1 != p2:
                        if set((p1,p2,p3)) not in tests:
                            tests.append(set((p1,p2,p3)))
                            test = '[%s]\t[%s]\t[%s]\t[%s]\t##' % (p1,p2,p3,",".join(O))
                            ## appends tests to input file
                            ! echo '$test' >> analysis_dtests/input.D4

In [157]:

# make and print input file               
makeD4(CS,CS+W,CSU+CL+CV+CA,S,0,"analysis_dtests/input.D4")
! cat analysis_dtests/input.D4

200                         ## N bootstrap 
analysis_pyrad/_c85d2/outfiles/c85d2m4p3.loci     ## loc/path 
analysis_dtests/test1     ## output file 
4                            ## which test: 4,part,foil,foilalt
4                           ## N cores 
0                            ## output ABBA/BABA loci to files (0=no,1,2=verbose)
0                            ## output bootstrap Ds to files (0=no,1=yes)
-----------------------------------------------------------
2830D	2816	suberec	stramin	##
2893G1	2816	suberec	stramin	##
2893G1	2830D	suberec	stramin	##
2816	waponah	suberec	stramin	##
2830D	waponah	suberec	stramin	##
2893G1	waponah	suberec	stramin	##
2830D	2816	longii	stramin	##
2893G1	2816	longii	stramin	##
2893G1	2830D	longii	stramin	##
2816	waponah	longii	stramin	##
2830D	waponah	longii	stramin	##
2893G1	waponah	longii	stramin	##
2830D	2816	vexans	stramin	##
2893G1	2816	vexans	stramin	##
2893G1	2830D	vexans	stramin	##
2816	waponah	vexans	stramin	##
2830D	waponah	vexans	stramin	##
2893G1	waponah	vexans	stramin	##
2830D	2816	albolut	stramin	##
2893G1	2816	albolut	stramin	##
2893G1	2830D	albolut	stramin	##
2816	waponah	albolut	stramin	##
2830D	waponah	albolut	stramin	##
2893G1	waponah	albolut	stramin	##

Perform D-statistic tests¶

Test 1: Introgression with one C. scoparia to the exclusion of the others.¶

In [158]:

quiet = ! ~/Dropbox/pyrad-git/pyRAD -d analysis_dtests/input.D4

Test 1H: Now we re-run it with including heterozygous sites¶

In [171]:

## change output file name
! sed -i '/## output file /c\analysis_dtests/test1.H   ## output file ' analysis_dtests/input.D4
## make new input file
makeD4(CS,CS,CSU+CL+CV+CA,S,1,"analysis_dtests/input.D4")
## run the test
quiet = ! ~/Dropbox/pyrad-git/pyRAD -d analysis_dtests/input.D4
! cat analysis_dtests/test1.D4.txt

P1    	P2     	P3     	O      	D	std(D)	Z	BABA	ABBA	nloci	nboot	pdisc	notes
2830D 	2816   	suberec	stramin	0.391	0.319	1.23	7.00	16.00	3466	200	0.00	
2893G1	2816   	suberec	stramin	0.300	0.366	0.82	7.00	13.00	3560	200	0.00	
2893G1	2830D  	suberec	stramin	-0.333	0.348	0.96	10.00	5.00	3434	200	0.00	
2816  	waponah	suberec	stramin	-0.342	0.153	2.24	51.00	25.00	3442	200	0.02	
2830D 	waponah	suberec	stramin	-0.233	0.171	1.37	37.00	23.00	3304	200	0.01	
2893G1	waponah	suberec	stramin	-0.309	0.153	2.02	36.00	19.00	3401	200	0.01	
2830D 	2816   	longii 	stramin	0.120	0.322	0.37	11.00	14.00	3337	200	0.00	
2893G1	2816   	longii 	stramin	0.100	0.404	0.25	9.00	11.00	3451	200	0.00	
2893G1	2830D  	longii 	stramin	-0.125	0.319	0.39	9.00	7.00	3332	200	0.00	
2816  	waponah	longii 	stramin	-0.172	0.180	0.96	34.00	24.00	3312	200	0.01	
2830D 	waponah	longii 	stramin	-0.067	0.153	0.43	24.00	21.00	3203	200	0.01	
2893G1	waponah	longii 	stramin	-0.095	0.167	0.57	23.00	19.00	3306	200	0.01	
2830D 	2816   	vexans 	stramin	0.167	0.336	0.50	5.00	7.00	2048	200	0.00	
2893G1	2816   	vexans 	stramin	-0.143	0.454	0.31	4.00	3.00	2072	200	0.00	
2893G1	2830D  	vexans 	stramin	-0.647	0.249	2.60	14.00	3.00	2036	200	0.01	
2816  	waponah	vexans 	stramin	0.290	0.213	1.36	11.00	20.00	2096	200	0.01	
2830D 	waponah	vexans 	stramin	0.243	0.201	1.21	14.00	23.00	2046	200	0.01	
2893G1	waponah	vexans 	stramin	0.071	0.238	0.30	13.00	15.00	2047	200	0.01	
2830D 	2816   	albolut	stramin	0.520	0.248	2.09	6.00	19.00	3597	200	0.00	
2893G1	2816   	albolut	stramin	0.538	0.254	2.12	6.00	20.00	3700	200	0.00	
2893G1	2830D  	albolut	stramin	-0.263	0.234	1.13	12.00	7.00	3552	200	0.01	
2816  	waponah	albolut	stramin	-0.281	0.159	1.77	41.00	23.00	3606	200	0.01	
2830D 	waponah	albolut	stramin	-0.119	0.134	0.89	33.00	26.00	3436	200	0.01	
2893G1	waponah	albolut	stramin	-0.067	0.171	0.39	24.00	21.00	3539	200	0.01

Test 2: Introgression between C. waponahkikensis and C. scoparia¶

In [175]:

## change output file name
! sed -i '/## output file /c\analysis_dtests/test2     ## output file ' analysis_dtests/input.D4
## make new input file
makeD4(CS,CS,W,S+CL+CV+CA,0,"analysis_dtests/input.D4")
## run the test
quiet = ! ~/Dropbox/pyrad-git/pyRAD -d analysis_dtests/input.D4
! cat analysis_dtests/test2.D4.txt

P1    	P2   	P3     	O      	D	std(D)	Z	BABA	ABBA	nloci	nboot	pdisc	notes
2830D 	2816 	waponah	stramin	0.048	0.285	0.17	10.00	11.00	3538	200	0.00	
2893G1	2816 	waponah	stramin	-0.143	0.276	0.52	12.00	9.00	3626	200	0.00	
2893G1	2830D	waponah	stramin	-0.143	0.310	0.46	8.00	6.00	3501	200	0.00	
2830D 	2816 	waponah	longii 	-0.222	0.224	0.99	22.00	14.00	3267	200	0.01	
2893G1	2816 	waponah	longii 	-0.333	0.258	1.29	20.00	10.00	3356	200	0.01	
2893G1	2830D	waponah	longii 	0.200	0.258	0.78	6.00	9.00	3248	200	0.00	
2830D 	2816 	waponah	vexans 	0.111	0.263	0.42	8.00	10.00	2116	200	0.01	
2893G1	2816 	waponah	vexans 	-0.200	0.330	0.61	9.00	6.00	2098	200	0.00	
2893G1	2830D	waponah	vexans 	-0.429	0.347	1.24	5.00	2.00	2062	200	0.00	
2830D 	2816 	waponah	albolut	-0.212	0.232	0.92	20.00	13.00	3529	200	0.01	
2893G1	2816 	waponah	albolut	-0.600	0.207	2.89	20.00	5.00	3593	200	0.00	
2893G1	2830D	waponah	albolut	-0.040	0.248	0.16	13.00	12.00	3458	200	0.01

Test 2H: Re-run including heterozygous sites¶

This uses the SNP frequency weighted across multiple outgroups as the outgroup allele.

In [164]:

## change output file name
! sed -i '/## output file /c\analysis_dtests/test2.H     ## output file ' analysis_dtests/input.D4
## make new input file
makeD4(CS,CS,W,S+CL+CV+CA,1,"analysis_dtests/input.D4")
## run the test
quiet = ! ~/Dropbox/pyrad-git/pyRAD -d analysis_dtests/input.D4

Test 3: introgression between non-scoparia clade¶

In [174]:

## change output file name
! sed -i '/## output file /c\analysis_dtests/test3     ## output file ' analysis_dtests/input.D4
## make new input file
makeD4(CL,CV,CA+CSU,S+CS,0,"analysis_dtests/input.D4")
## run the test
quiet = ! ~/Dropbox/pyrad-git/pyRAD -d analysis_dtests/input.D4
! cat analysis_dtests/test3.D4.txt

P1    	P2    	P3     	O      	D	std(D)	Z	BABA	ABBA	nloci	nboot	pdisc	notes
longii	vexans	albolut	stramin	-0.379	0.183	2.07	20.00	9.00	2016	200	0.01	
longii	vexans	suberec	stramin	-0.333	0.230	1.45	16.00	8.00	1972	200	0.01	
longii	vexans	albolut	2816   	-0.286	0.207	1.38	18.00	10.00	2005	200	0.01	
longii	vexans	suberec	2816   	-0.185	0.231	0.80	16.00	11.00	1957	200	0.01	
longii	vexans	albolut	2830D  	-0.360	0.209	1.72	17.00	8.00	1968	200	0.01	
longii	vexans	suberec	2830D  	-0.238	0.259	0.92	13.00	8.00	1921	200	0.01	
longii	vexans	albolut	2893G1 	-0.083	0.223	0.37	13.00	11.00	1987	200	0.01	
longii	vexans	suberec	2893G1 	-0.143	0.279	0.51	12.00	9.00	1937	200	0.01

Test 3H: Re-run including heterozygous sites¶

In [170]:

## change output file name
! sed -i '/## output file /c\analysis_dtests/test3.H     ## output file ' analysis_dtests/input.D4
## make new input file
makeD4(CL,CV,CA+CSU,S+CS,1,"analysis_dtests/input.D4")
## run the test
quiet = ! ~/Dropbox/pyrad-git/pyRAD -d analysis_dtests/input.D4

R analyses --¶

Make tree plots in the R package 'ape'

In [80]:

## If executing this code in the IPython notebook 
## you will need to have the python library python-Rpy2 installed
## otherwise you can execute the code below in a separate R shell
%load_ext rmagic

In [81]:

%%R
library(ape)
library(RADami)

plotting trees¶

In [141]:

%%R
tre <- read.tree("analysis_raxml/RAxML_bestTree.c90d5m4p3")
boots <- read.tree("analysis_raxml/RAxML_bipartitions.c90d5m4p3")
plot(ladderize(tre))
nodelabels(boots$node.label, bg='grey')

In [139]:

%%R
tre <- read.tree("analysis_raxml/RAxML_bestTree.c85d5m4p3")
boots <- read.tree("analysis_raxml/RAxML_bipartitions.c85d5m4p3")
tre <- ladderize(rotate(tre,11))
plot(tre)
nodelabels(boots$node.label, bg='grey')

In [140]:

%%R
tre <- read.tree("analysis_raxml/RAxML_bestTree.c85d2m4p3")
boots <- read.tree("analysis_raxml/RAxML_bipartitions.c85d2m4p3")
plot(ladderize(tre))
nodelabels(boots$node.label, bg='grey')

partitioned RAD analysis¶

Examine the number of loci supporting each tree using the R package 'RADami' (v.1.0-3)

In [ ]:

%%R

# Partitioned RAD analysis of the Carex scoparia trees
# 2014-06-06

# First, prune to the ingroup taxa of relevance to your question.
# Then, generate a treeset using NNI:

scop.tre.c85d2 <- read.tree('../R1.pyRAD/analysis_raxml/RAxML_bestTree.c85d2m4p3')
scop.tre.c85d5 <- read.tree('../R1.pyRAD/analysis_raxml/RAxML_bestTree.c85d5m4p3')
scop.tre.c90d5 <- read.tree('../R1.pyRAD/analysis_raxml/RAxML_bestTree.c90d5m4p3')

scop.tre.c85d2$tip.label <- paste('>', scop.tre.c85d2$tip.label, sep = '')
scop.tre.c85d5$tip.label <- paste('>', scop.tre.c85d5$tip.label, sep = '')
scop.tre.c90d5$tip.label <- paste('>', scop.tre.c90d5$tip.label, sep = '')

scop.tre.c85d2 <- drop.tip(scop.tre.c85d2, '>stramin')
scop.tre.c85d5 <- drop.tip(scop.tre.c85d5, '>stramin')
scop.tre.c90d5 <- drop.tip(scop.tre.c90d5, '>stramin')

scop.nni.c85d2 <- genTrees(scop.tre.c85d2, maxmoves=3, N = 200, filebase = 'scop.nni.c85d2', perms = c(length(nni(scop.tre.c85d2)), 150, 150))
scop.nni.c85d5 <- genTrees(scop.tre.c85d5, maxmoves=3, N = 200, filebase = 'scop.nni.c85d5', perms = c(length(nni(scop.tre.c85d5)), 150, 150))
scop.nni.c90d5 <- genTrees(scop.tre.c90d5, maxmoves=3, N = 200, filebase = 'scop.nni.c90d5', perms = c(length(nni(scop.tre.c90d5)), 150, 150))

In [ ]:

%%R

# then, generate datasets for each of your treesets:

scop.rad.c85d2 <- read.pyRAD('../R1.pyRAD/analysis_pyrad/_c85d2/outfiles/c85d2m4p3.loci')
scop.rad.c85d5 <- read.pyRAD('../R1.pyRAD/analysis_pyrad/_c85d5/outfiles/c85d5m4p3.loci')
scop.rad.c90d5 <- read.pyRAD('../R1.pyRAD/analysis_pyrad/_c90d5/outfiles/c90d5m4p3.loci')

gen.RAD.loci.datasets(scop.rad.c85d2, scop.nni.c85d2, fileBase = 'c85d2', cores = 10, taxa = scop.nni.c85d2[[1]]$tip.label)
gen.RAD.loci.datasets(scop.rad.c85d5, scop.nni.c85d5, fileBase = 'c85d5', cores = 10, taxa = scop.nni.c85d2[[1]]$tip.label)
gen.RAD.loci.datasets(scop.rad.c90d5, scop.nni.c90d5, fileBase = 'c90d5', cores = 10, taxa = scop.nni.c85d2[[1]]$tip.label)

In [ ]:

%%R

# Each resulting dataset will have a shell script associated with it in the folder numbered '.0', and this should be executed 
# outside of R to get the RAxML output needed for the next steps of analysis. You can close or open R as you see fit, but if
# you close it, save the workspace (it will make your life easier in the next steps, and I have assumed in the next lines that
# you are working in the still-open workspace, or loaded the saved workspace and re-attached RADami). Read in the likelihood files,
# ranking the trees by likelihood:

scop.matched.c85d5 <- match.lnL.to.trees('c85d5.0')
scop.matched.c85d2 <- match.lnL.to.trees('c85d2.0')
scop.matched.c90d5 <- match.lnL.to.trees('c90d5.0')

# Then, plot them:

plot(scop.matched.c85d2, fileprefix='c85d2', panels = c('bestMat', 'worstMat', 'bestMat'), ci = rep(0.95, 3), regression = c(F, F, T), 
      lnL.break = c(-1E10, -1E10, -374000), pch = 3)
plot(scop.matched.c90d5, fileprefix='c90d5', panels = c('bestMat', 'worstMat', 'bestMat'), ci = rep(0.95, 3), regression = c(F, F, T), 
      lnL.break = c(-1E10, -1E10, -253350), pch = 3)
plot(scop.matched.c85d5, fileprefix='c85d5', panels = c('bestMat', 'worstMat'), ci = rep(0.95, 2), regression = c(T, F), 
      lnL.break = c(-1E10, -1E10), pch = 3)

# and look at the outliers:

pdf('c90d5.trees.pdf', width = 8.5, height = 11)
layout(matrix(1:6, 3, 2))
for(i in c(1, which(colSums(scop.ranks.c90d5$bestMat) > 59))) {
  plot(scop.nni.c90d5[[i]])
  title(main = paste('c90d5, tree',i))
  }
dev.off()