This is an IPython notebook. Most of the code is composed of bash scripts, indicated by %%bash at the top of the cell, otherwise it is IPython code. This notebook includes code to download, assemble and analyze a published RADseq data set.
### Notebook 4
### Data set 4 (Orestias)
### Authors: Takahashi & Moreno (2015)
### Data Location: DDBJ DRA DRA003595
Sequence data for this study are archived on the NCBI sequence read archive (SRA). Below I read in SraRunTable.txt for this project which contains all of the information we need to download the data.
%%bash
## make a new directory for this analysis
mkdir -p empirical_4/fastq/
import os
def wget_download_ddbj(SRR, outdir):
""" Python function to get sra data from ncbi and write to
outdir with a new name using bash call wget """
## create a call string
call = "wget -q -r -nH --cut-dirs=9 -P "+outdir+" "+\
"ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/sra/ByExp/"+\
"sra/DRX/DRX033/DRX033{:03d}".format(SRR)
## run wget call
! $call
Here we pass the SRR number and the sample name to the wget_download
function so that the files are saved. In this case we do not have the sample names, just their SRR IDs.
for ID in range(6,70):
wget_download_ddbj(ID, "empirical_4/fastq/")
%%bash
## convert sra files to fastq using fastq-dump tool
## output as gzipped into the fastq directory
fastq-dump --gzip -O empirical_4/fastq/ empirical_4/fastq/*.sra
## remove .sra files
rm empirical_4/fastq/*.sra
Read 3850280 spots for empirical_4/fastq/DRR036765.sra Written 3850280 spots for empirical_4/fastq/DRR036765.sra Read 5135835 spots for empirical_4/fastq/DRR036766.sra Written 5135835 spots for empirical_4/fastq/DRR036766.sra Read 3812927 spots for empirical_4/fastq/DRR036767.sra Written 3812927 spots for empirical_4/fastq/DRR036767.sra Read 4279789 spots for empirical_4/fastq/DRR036768.sra Written 4279789 spots for empirical_4/fastq/DRR036768.sra Read 4238415 spots for empirical_4/fastq/DRR036769.sra Written 4238415 spots for empirical_4/fastq/DRR036769.sra Read 4897736 spots for empirical_4/fastq/DRR036770.sra Written 4897736 spots for empirical_4/fastq/DRR036770.sra Read 4110217 spots for empirical_4/fastq/DRR036771.sra Written 4110217 spots for empirical_4/fastq/DRR036771.sra Read 4381798 spots for empirical_4/fastq/DRR036772.sra Written 4381798 spots for empirical_4/fastq/DRR036772.sra Read 3692455 spots for empirical_4/fastq/DRR036773.sra Written 3692455 spots for empirical_4/fastq/DRR036773.sra Read 4146148 spots for empirical_4/fastq/DRR036774.sra Written 4146148 spots for empirical_4/fastq/DRR036774.sra Read 5296764 spots for empirical_4/fastq/DRR036775.sra Written 5296764 spots for empirical_4/fastq/DRR036775.sra Read 3916368 spots for empirical_4/fastq/DRR036776.sra Written 3916368 spots for empirical_4/fastq/DRR036776.sra Read 5076822 spots for empirical_4/fastq/DRR036777.sra Written 5076822 spots for empirical_4/fastq/DRR036777.sra Read 3986174 spots for empirical_4/fastq/DRR036778.sra Written 3986174 spots for empirical_4/fastq/DRR036778.sra Read 3652015 spots for empirical_4/fastq/DRR036779.sra Written 3652015 spots for empirical_4/fastq/DRR036779.sra Read 4887011 spots for empirical_4/fastq/DRR036780.sra Written 4887011 spots for empirical_4/fastq/DRR036780.sra Read 4027193 spots for empirical_4/fastq/DRR036781.sra Written 4027193 spots for empirical_4/fastq/DRR036781.sra Read 5385172 spots for empirical_4/fastq/DRR036782.sra Written 5385172 spots for empirical_4/fastq/DRR036782.sra Read 5128952 spots for empirical_4/fastq/DRR036783.sra Written 5128952 spots for empirical_4/fastq/DRR036783.sra Read 4593914 spots for empirical_4/fastq/DRR036784.sra Written 4593914 spots for empirical_4/fastq/DRR036784.sra Read 2334669 spots for empirical_4/fastq/DRR036785.sra Written 2334669 spots for empirical_4/fastq/DRR036785.sra Read 4626890 spots for empirical_4/fastq/DRR036786.sra Written 4626890 spots for empirical_4/fastq/DRR036786.sra Read 3888993 spots for empirical_4/fastq/DRR036787.sra Written 3888993 spots for empirical_4/fastq/DRR036787.sra Read 4512205 spots for empirical_4/fastq/DRR036788.sra Written 4512205 spots for empirical_4/fastq/DRR036788.sra Read 4174220 spots for empirical_4/fastq/DRR036789.sra Written 4174220 spots for empirical_4/fastq/DRR036789.sra Read 767653 spots for empirical_4/fastq/DRR036790.sra Written 767653 spots for empirical_4/fastq/DRR036790.sra Read 4019423 spots for empirical_4/fastq/DRR036791.sra Written 4019423 spots for empirical_4/fastq/DRR036791.sra Read 3286802 spots for empirical_4/fastq/DRR036792.sra Written 3286802 spots for empirical_4/fastq/DRR036792.sra Read 4601860 spots for empirical_4/fastq/DRR036793.sra Written 4601860 spots for empirical_4/fastq/DRR036793.sra Read 5436565 spots for empirical_4/fastq/DRR036794.sra Written 5436565 spots for empirical_4/fastq/DRR036794.sra Read 4528702 spots for empirical_4/fastq/DRR036795.sra Written 4528702 spots for empirical_4/fastq/DRR036795.sra Read 4328453 spots for empirical_4/fastq/DRR036796.sra Written 4328453 spots for empirical_4/fastq/DRR036796.sra Read 4845789 spots for empirical_4/fastq/DRR036797.sra Written 4845789 spots for empirical_4/fastq/DRR036797.sra Read 4231660 spots for empirical_4/fastq/DRR036798.sra Written 4231660 spots for empirical_4/fastq/DRR036798.sra Read 4610513 spots for empirical_4/fastq/DRR036799.sra Written 4610513 spots for empirical_4/fastq/DRR036799.sra Read 5079930 spots for empirical_4/fastq/DRR036800.sra Written 5079930 spots for empirical_4/fastq/DRR036800.sra Read 4435752 spots for empirical_4/fastq/DRR036801.sra Written 4435752 spots for empirical_4/fastq/DRR036801.sra Read 3428835 spots for empirical_4/fastq/DRR036802.sra Written 3428835 spots for empirical_4/fastq/DRR036802.sra Read 1880356 spots for empirical_4/fastq/DRR036803.sra Written 1880356 spots for empirical_4/fastq/DRR036803.sra Read 5300107 spots for empirical_4/fastq/DRR036804.sra Written 5300107 spots for empirical_4/fastq/DRR036804.sra Read 2912768 spots for empirical_4/fastq/DRR036805.sra Written 2912768 spots for empirical_4/fastq/DRR036805.sra Read 4221013 spots for empirical_4/fastq/DRR036806.sra Written 4221013 spots for empirical_4/fastq/DRR036806.sra Read 4787167 spots for empirical_4/fastq/DRR036807.sra Written 4787167 spots for empirical_4/fastq/DRR036807.sra Read 4955937 spots for empirical_4/fastq/DRR036808.sra Written 4955937 spots for empirical_4/fastq/DRR036808.sra Read 4351758 spots for empirical_4/fastq/DRR036809.sra Written 4351758 spots for empirical_4/fastq/DRR036809.sra Read 2199538 spots for empirical_4/fastq/DRR036810.sra Written 2199538 spots for empirical_4/fastq/DRR036810.sra Read 4750069 spots for empirical_4/fastq/DRR036811.sra Written 4750069 spots for empirical_4/fastq/DRR036811.sra Read 4544226 spots for empirical_4/fastq/DRR036812.sra Written 4544226 spots for empirical_4/fastq/DRR036812.sra Read 4825518 spots for empirical_4/fastq/DRR036813.sra Written 4825518 spots for empirical_4/fastq/DRR036813.sra Read 5517492 spots for empirical_4/fastq/DRR036814.sra Written 5517492 spots for empirical_4/fastq/DRR036814.sra Read 3299739 spots for empirical_4/fastq/DRR036815.sra Written 3299739 spots for empirical_4/fastq/DRR036815.sra Read 4464593 spots for empirical_4/fastq/DRR036816.sra Written 4464593 spots for empirical_4/fastq/DRR036816.sra Read 4873706 spots for empirical_4/fastq/DRR036817.sra Written 4873706 spots for empirical_4/fastq/DRR036817.sra Read 3321824 spots for empirical_4/fastq/DRR036818.sra Written 3321824 spots for empirical_4/fastq/DRR036818.sra Read 3625781 spots for empirical_4/fastq/DRR036819.sra Written 3625781 spots for empirical_4/fastq/DRR036819.sra Read 3779872 spots for empirical_4/fastq/DRR036820.sra Written 3779872 spots for empirical_4/fastq/DRR036820.sra Read 5404611 spots for empirical_4/fastq/DRR036821.sra Written 5404611 spots for empirical_4/fastq/DRR036821.sra Read 5276329 spots for empirical_4/fastq/DRR036822.sra Written 5276329 spots for empirical_4/fastq/DRR036822.sra Read 3720213 spots for empirical_4/fastq/DRR036823.sra Written 3720213 spots for empirical_4/fastq/DRR036823.sra Read 4306471 spots for empirical_4/fastq/DRR036824.sra Written 4306471 spots for empirical_4/fastq/DRR036824.sra Read 4862198 spots for empirical_4/fastq/DRR036825.sra Written 4862198 spots for empirical_4/fastq/DRR036825.sra Read 4332795 spots for empirical_4/fastq/DRR036826.sra Written 4332795 spots for empirical_4/fastq/DRR036826.sra Read 4589896 spots for empirical_4/fastq/DRR036827.sra Written 4589896 spots for empirical_4/fastq/DRR036827.sra Read 4980042 spots for empirical_4/fastq/DRR036828.sra Written 4980042 spots for empirical_4/fastq/DRR036828.sra Read 272718918 spots total Written 272718918 spots total
%%bash
pyrad --version
pyRAD 3.0.63
%%bash
## delete existing params file if it exits
rm params.txt
## create a new default params file
pyrad -n
new params.txt file created
%%bash
## substitute new parameters into file
sed -i '/## 1. /c\empirical_4/ ## 1. working directory ' params.txt
sed -i '/## 6. /c\TGCAGG ## 6. cutters ' params.txt
sed -i '/## 7. /c\30 ## 7. N processors ' params.txt
sed -i '/## 9. /c\6 ## 9. NQual ' params.txt
sed -i '/## 10./c\.85 ## 10. clust threshold ' params.txt
sed -i '/## 12./c\4 ## 12. MinCov ' params.txt
sed -i '/## 13./c\10 ## 13. maxSH ' params.txt
sed -i '/## 14./c\empirical_4_m4 ## 14. output name ' params.txt
sed -i '/## 18./c\empirical_4/fastq/*.gz ## 18. data location ' params.txt
sed -i '/## 29./c\2,2 ## 29. trim overhang ' params.txt
sed -i '/## 30./c\p,n,s ## 30. output formats ' params.txt
cat params.txt
==** parameter inputs for pyRAD version 3.0.63 **======================== affected step == empirical_4/ ## 1. working directory ./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 18) (s1) ./*.barcodes ## 3. Loc. of barcode file (if not line 18) (s1) vsearch ## 4. command (or path) to call vsearch (or usearch) (s3,s6) muscle ## 5. command (or path) to call muscle (s3,s7) TGCAGG ## 6. cutters 30 ## 7. N processors 6 ## 8. Mindepth: min coverage for a cluster (s4,s5) 6 ## 9. NQual .85 ## 10. clust threshold rad ## 11. Datatype: rad,gbs,pairgbs,pairddrad,(others:see docs)(all) 4 ## 12. MinCov 10 ## 13. maxSH empirical_4_m4 ## 14. output name ==== optional params below this line =================================== affected step == ## 15.opt.: select subset (prefix* only selector) (s2-s7) ## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7) ## 17.opt.: exclude taxa (list or prefix*) (s7) empirical_4/fastq/*.gz ## 18. data location ## 19.opt.: maxM: N mismatches in barcodes (def= 1) (s1) ## 20.opt.: phred Qscore offset (def= 33) (s2) ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict (s2) ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5) ## 23.opt.: maxN: max Ns in a cons seq (def=5) (s5) ## 24.opt.: maxH: max heterozyg. sites in cons seq (def=5) (s5) ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5) ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100) (s7) ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7) ## 28.opt.: random number seed (def. 112233) (s3,s6,s7) 2,2 ## 29. trim overhang p,n,s ## 30. output formats ## 31.opt.: maj. base call at depth>x<mindepth (def.x=mindepth) (s5) ## 32.opt.: keep trimmed reads (def=0). Enter min length. (s2) ## 33.opt.: max stack size (int), def= max(500,mean+2*SD) (s3) ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3) ## 35.opt.: use hierarchical clustering (def.=0, 1=yes) (s6) ## 36.opt.: repeat masking (def.=1='dust' method, 0=no) (s3,s6) ## 37.opt.: vsearch max threads per job (def.=6; see docs) (s3,s6) ==== optional: list group/clade assignments below this line (see docs) ==================
pyrad -p params.txt -s 234567 >> log.txt 2>&1
%%bash
sed -i '/## 12./c\2 ## 12. MinCov ' params.txt
sed -i '/## 14./c\empirical_4_m2 ## 14. output name ' params.txt
%%bash
pyrad -p params.txt -s 7 >> log.txt 2>&1
We are interested in the relationship between the amount of input (raw) data between any two samples, the average coverage they recover when clustered together, and the phylogenetic distances separating samples.
The average number of raw reads per sample is 1.36M.
## import data frame
import pandas as pd
## read in the data
s4dat = pd.read_table("empirical_4/stats/s2.rawedit.txt", header=0, nrows=65)
## print summary stats
print s4dat["passed.total"].describe()
## find which sample has the most raw data
maxraw = s4dat["passed.total"].max()
print "\nmost raw data in sample:"
print s4dat['sample '][s4dat['passed.total']==maxraw]
count 64.000000 mean 3959338.812500 std 811579.026685 min 731712.000000 25% 3635737.000000 50% 4054918.000000 75% 4540098.750000 max 5074496.000000 Name: passed.total, dtype: float64 most raw data in sample: 10 DRR036775 Name: sample , dtype: object
pyrad v.3.0.63 outputs depth information for each sample which I read in here and plot. First let's ask which sample has the highest depth of coverage.
## read in the s3 results
s4dat = pd.read_table("empirical_4/stats/s3.clusters.txt", header=0, nrows=65)
## print summary stats
print "summary of means\n=================="
print s4dat['dpt.me'].describe()
## print summary stats
print "\nsummary of std\n=================="
print s4dat['dpt.sd'].describe()
## print summary stats
print "\nsummary of proportion lowdepth\n=================="
print pd.Series(1-s4dat['d>5.tot']/s4dat["total"]).describe()
## find which sample has the greatest depth of retained loci
max_hiprop = (s4dat["d>5.tot"]/s4dat["total"]).max()
print "\nhighest coverage in sample:"
print s4dat['taxa'][s4dat['d>5.tot']/s4dat["total"]==max_hiprop]
summary of means ================== count 64.000000 mean 106.000750 std 17.392284 min 40.235000 25% 98.422750 50% 107.650000 75% 118.500750 max 131.455000 Name: dpt.me, dtype: float64 summary of std ================== count 64.000000 mean 289.564375 std 140.245059 min 95.184000 25% 205.293000 50% 235.668500 75% 325.144000 max 757.646000 Name: dpt.sd, dtype: float64 summary of proportion lowdepth ================== count 64.000000 mean 0.053360 std 0.010158 min 0.038625 25% 0.046790 50% 0.051447 75% 0.058441 max 0.104888 dtype: float64 highest coverage in sample: 10 DRR036775 Name: taxa, dtype: object
Green shows the loci that were discarded and orange the loci that were retained. The majority of data were discarded for being too low of coverage.
import toyplot
import toyplot.svg
import numpy as np
## read in the depth information for this sample
with open("empirical_4/clust.85/DRR036775.depths", 'rb') as indat:
depths = np.array(indat.read().strip().split(","), dtype=int)
## make a barplot in Toyplot
canvas = toyplot.Canvas(width=350, height=300)
axes = canvas.axes(xlabel="Depth of coverage (N reads)",
ylabel="N loci",
label="dataset4/sample=DRR036775")
## select the loci with depth > 5 (kept)
keeps = depths[depths>5]
## plot kept and discarded loci
edat = np.histogram(depths, range(30)) # density=True)
kdat = np.histogram(keeps, range(30)) #, density=True)
axes.bars(edat)
axes.bars(kdat)
#toyplot.svg.render(canvas, "empirical_4_depthplot.svg")
<toyplot.mark.BarMagnitudes at 0x7ff380951310>
cat empirical_4/stats/empirical_4_m4.stats
46277 ## loci with > minsp containing data 42877 ## loci with > minsp containing data & paralogs removed 42877 ## loci with > minsp containing data & paralogs removed & final filtering ## number of loci recovered in final data set for each taxon. taxon nloci DRR036765 27950 DRR036766 30223 DRR036767 29273 DRR036768 28909 DRR036769 28879 DRR036770 29524 DRR036771 28375 DRR036772 28756 DRR036773 27906 DRR036774 28927 DRR036775 31633 DRR036776 27658 DRR036777 30722 DRR036778 28349 DRR036779 27420 DRR036780 29359 DRR036781 28123 DRR036782 29928 DRR036783 29606 DRR036784 28864 DRR036785 25495 DRR036786 28829 DRR036787 29177 DRR036788 28432 DRR036789 28598 DRR036790 13855 DRR036791 28826 DRR036792 27106 DRR036793 30117 DRR036794 30466 DRR036795 29279 DRR036796 28378 DRR036797 29160 DRR036798 28070 DRR036799 29059 DRR036800 29404 DRR036801 28522 DRR036802 27407 DRR036803 22502 DRR036804 29971 DRR036805 26894 DRR036806 28787 DRR036807 30322 DRR036808 29587 DRR036809 28472 DRR036810 24144 DRR036811 29457 DRR036812 29349 DRR036813 28553 DRR036814 29899 DRR036815 27081 DRR036816 27908 DRR036817 30888 DRR036818 27955 DRR036819 28459 DRR036820 28671 DRR036821 30179 DRR036822 30425 DRR036823 28337 DRR036824 28810 DRR036825 29466 DRR036826 28469 DRR036827 28876 DRR036828 29248 ## nloci = number of loci with data for exactly ntaxa ## ntotal = number of loci for which at least ntaxa have data ntaxa nloci saved ntotal 1 - 2 - - 3 - - 4 562 * 42877 5 496 * 42315 6 485 * 41819 7 517 * 41334 8 627 * 40817 9 910 * 40190 10 1361 * 39280 11 1620 * 37919 12 1031 * 36299 13 343 * 35268 14 184 * 34925 15 160 * 34741 16 171 * 34581 17 212 * 34410 18 192 * 34198 19 192 * 34006 20 200 * 33814 21 187 * 33614 22 215 * 33427 23 199 * 33212 24 221 * 33013 25 248 * 32792 26 243 * 32544 27 225 * 32301 28 276 * 32076 29 233 * 31800 30 227 * 31567 31 245 * 31340 32 295 * 31095 33 273 * 30800 34 293 * 30527 35 314 * 30234 36 254 * 29920 37 322 * 29666 38 349 * 29344 39 352 * 28995 40 373 * 28643 41 397 * 28270 42 416 * 27873 43 446 * 27457 44 521 * 27011 45 558 * 26490 46 723 * 25932 47 828 * 25209 48 875 * 24381 49 1021 * 23506 50 1062 * 22485 51 1072 * 21423 52 957 * 20351 53 892 * 19394 54 1047 * 18502 55 1189 * 17455 56 1476 * 16266 57 1890 * 14790 58 2200 * 12900 59 2433 * 10700 60 2672 * 8267 61 2464 * 5595 62 1846 * 3131 63 1019 * 1285 64 266 * 266 ## nvar = number of loci containing n variable sites (pis+autapomorphies). ## sumvar = sum of variable sites (SNPs). ## pis = number of loci containing n parsimony informative sites. ## sumpis = sum of parsimony informative sites. nvar sumvar PIS sumPIS 0 4922 0 12420 0 1 5533 5533 8723 8723 2 5742 17017 7273 23269 3 5663 34006 5269 39076 4 5184 54742 3641 53640 5 4499 77237 2239 64835 6 3409 97691 1263 72413 7 2451 114848 733 77544 8 1691 128376 401 80752 9 1177 138969 279 83263 10 739 146359 199 85253 11 537 152266 138 86771 12 358 156562 88 87827 13 263 159981 70 88737 14 169 162347 46 89381 15 135 164372 28 89801 16 84 165716 24 90185 17 86 167178 7 90304 18 47 168024 14 90556 19 50 168974 3 90613 20 34 169654 2 90653 21 23 170137 6 90779 22 18 170533 8 90955 23 19 170970 1 90978 24 13 171282 1 91002 25 8 171482 0 91002 26 5 171612 0 91002 27 7 171801 0 91002 28 2 171857 0 91002 29 1 171886 1 91031 30 2 171946 0 91031 31 1 171977 0 91031 32 3 172073 0 91031 33 0 172073 0 91031 34 1 172107 0 91031 35 0 172107 0 91031 36 0 172107 0 91031 37 0 172107 0 91031 38 0 172107 0 91031 39 0 172107 0 91031 40 0 172107 0 91031 41 0 172107 0 91031 42 0 172107 0 91031 43 1 172150 0 91031 total var= 172150 total pis= 91031 sampled unlinked SNPs= 37955 sampled unlinked bi-allelic SNPs= -30031
%%bash
head -n 20 empirical_4/stats/empirical_4_m2.stats
48546 ## loci with > minsp containing data 45146 ## loci with > minsp containing data & paralogs removed 45146 ## loci with > minsp containing data & paralogs removed & final filtering ## number of loci recovered in final data set for each taxon. taxon nloci DRR036765 28070 DRR036766 30376 DRR036767 29411 DRR036768 29051 DRR036769 29022 DRR036770 29587 DRR036771 28435 DRR036772 28812 DRR036773 27965 DRR036774 28996 DRR036775 31778 DRR036776 27701
%%bash
## raxml argumement w/ ...
raxmlHPC-PTHREADS-AVX -f a -m GTRGAMMA -N 100 -x 12345 -p 12345 -T 20 \
-w /home/deren/Documents/RADmissing/empirical_4/ \
-n empirical_4_m4 -s empirical_4/outfiles/empirical_4_m4.phy
%%bash
## raxml argumement w/ ...
raxmlHPC-PTHREADS-AVX -f a -m GTRGAMMA -N 100 -x 12345 -p 12345 -T 20 \
-w /home/deren/Documents/RADmissing/empirical_4/ \
-n empirical_4_m2 -s empirical_4/outfiles/empirical_4_m2.phy
%%bash
head -n 40 empirical_4/RAxML_info.empirical_4
This is RAxML version 8.0.16 released by Alexandros Stamatakis on March 21 2014. With greatly appreciated code contributions by: Andre Aberer (HITS) Simon Berger (HITS) Alexey Kozlov (HITS) Nick Pattengale (Sandia) Wayne Pfeiffer (SDSC) Akifumi S. Tanabe (NRIFS) David Dao (KIT) Charlie Taylor (UF) Alignment has 339249 distinct alignment patterns Proportion of gaps and completely undetermined characters in this alignment: 33.82% RAxML rapid bootstrapping and subsequent ML search Using 1 distinct models/data partitions with joint branch length optimization Executing 100 rapid bootstrap inferences and thereafter a thorough ML search All free model parameters will be estimated by RAxML GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units Partition: 0 Alignment Patterns: 339249 Name: No Name Provided DataType: DNA Substitution Matrix: GTR
ape
¶The backbone of the ingroup taxa has very low support across the radiation. The same result was found in the original paper (see Fig. 4 and Supplemental Fig S1 of Takahashi et al). Below we plot the full tree and a zoomed in tree of just the ingroup taxa.
%load_ext rpy2.ipython
%%R -w 1000 -h 800
library(ape)
tre <- read.tree("empirical_4/RAxML_bipartitions.empirical_4")
ltre <- ladderize(tre)
outgroups = c("DRR036791", "DRR036765", "DRR036790", "DRR036767", "DRR036769",
"DRR036766", "DRR036777", "DRR036793", "DRR036778", "DRR036778",
"DRR036792", "DRR036768", "DRR036775")
rtre <- root(ltre, outgroups)
ingrouptre <- drop.tip(ltre, outgroups)
par(mfrow=c(1,2))
plot(ltre, edge.width=2)
nodelabels(ltre$node.label, cex=1)
plot(ingrouptre, edge.width=2)
nodelabels(ingrouptre$node.label, cex=1)
%%R
mean(cophenetic.phylo(ltre))
[1] 0.009676341
%%R -h 700
utre <- ladderize(chronopl(rtre, 0.5, resolve.root=TRUE))
plot(utre, edge.width=2)