As part of the ipyrad.analysis toolkit we've created convenience functions for easily running common RAxML commands. This can be useful when you want to run all of your analyses in a clean, streamlined way in a Jupyter notebook to create a completely reproducible study.
There are many ways to install raxml, the simplest of which is to use conda. This will install several raxml binaries into your conda path. If you want to call a different raxml binary, you can do so by changing the 'binary' parameter.
## conda install ipyrad -c ipyrad
## conda install toytree -c eaton-lab
## conda install raxml -c bioconda
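If you are not sure which raxml binaries ended up on your path, a quick check like the one below can help you decide what to pass as 'binary'. This is only a sketch: the helper name find_raxml_binary is hypothetical (it is not part of ipyrad), and the candidate list is an assumption apart from raxmlHPC-PTHREADS-SSE3, which is the binary that appears in this notebook.

```python
import shutil

# Candidate binary names (assumed); only raxmlHPC-PTHREADS-SSE3
# is taken from the command shown in this notebook.
CANDIDATES = [
    "raxmlHPC-PTHREADS-SSE3",
    "raxmlHPC-PTHREADS",
    "raxmlHPC",
]


def find_raxml_binary(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


# Prints a path if one of the candidates is installed, otherwise None
print(find_raxml_binary())
```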
Create a raxml object which has a bunch of default parameters associated with it. The only required argument to initialize the object is a phylip formatted sequence file. In this example I provide a name and working directory as well.
import ipyrad.analysis as ipa
import toyplot
import toytree
rax = ipa.raxml(
data="./analysis-ipyrad/aligntest_outfiles/aligntest.phy",
name="aligntest",
workdir="analysis-raxml",
);
You can also modify many of the other command line arguments to raxml by changing values in the params dictionary of your raxml object. These values could also have been set when you initialized the object.
## set some other params
rax.params.N = 10
rax.params.T = 2
rax.params.o = None
#rax.params.o = ["32082_przewalskii", "33588_przewalskii"]
It is good practice to always print the command string so that you know exactly what was called for your analysis and it is documented.
print(rax.command)
raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -N 10 -x 12345 -p 54321 -n aligntest -w /home/deren/Documents/ipyrad/tests/analysis-raxml -s /home/deren/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy
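To see how a set of params maps onto a command string like the one above, here is a minimal sketch that joins flag/value pairs in order. It is not how ipyrad builds the command internally; the function build_raxml_command is hypothetical and the shortened paths are placeholders. It assumes that params set to None are simply left out, and that several outgroup names are passed to -o as a comma-separated list.

```python
def build_raxml_command(binary, params):
    """Assemble a command string from an ordered mapping of flag -> value."""
    parts = [binary]
    for flag, value in params.items():
        if value is None:
            continue  # unset options are left out of the command
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # e.g. several outgroup names for -o
        parts.extend(["-" + flag, str(value)])
    return " ".join(parts)


# Same flags as in the printed command; the paths are shortened placeholders
params = {
    "f": "a", "T": 2, "m": "GTRGAMMA", "N": 10, "x": 12345, "p": 54321,
    "n": "aligntest", "w": "analysis-raxml", "s": "aligntest.phy", "o": None,
}
cmd = build_raxml_command("raxmlHPC-PTHREADS-SSE3", params)
print(cmd)

# Setting an outgroup appends a -o argument
params["o"] = ["32082_przewalskii", "33588_przewalskii"]
cmd_outgroup = build_raxml_command("raxmlHPC-PTHREADS-SSE3", params)
print(cmd_outgroup)
```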
This will start the job running. We haven't made a progress bar yet but we will add one soon.
rax.run(force=True)
job aligntest finished successfully
One of the reasons it is so convenient to run your raxml jobs this way is that the results files are easily accessible from your raxml objects.
rax.trees
bestTree                   ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bestTree.aligntest
bipartitions               ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitions.aligntest
bipartitionsBranchLabels   ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitionsBranchLabels.aligntest
bootstrap                  ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bootstrap.aligntest
info                       ~/Documents/ipyrad/tests/analysis-raxml/RAxML_info.aligntest
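The file names above follow the pattern RAxML_<type>.<name>, so if you ever need to find the results of a run outside of the raxml object, a small helper along these lines could do it. This is a sketch only; collect_raxml_outputs is a hypothetical name and not part of ipyrad.

```python
import glob
import os


def collect_raxml_outputs(workdir, name):
    """Map each result type (e.g. 'bestTree') to its file path for one run."""
    results = {}
    pattern = os.path.join(workdir, "RAxML_*." + name)
    for path in sorted(glob.glob(pattern)):
        base = os.path.basename(path)
        # strip the prefix and the ".<name>" suffix:
        # "RAxML_bestTree.aligntest" -> "bestTree"
        key = base[len("RAxML_"):-(len(name) + 1)]
        results[key] = path
    return results
```

Calling collect_raxml_outputs("analysis-raxml", "aligntest") after the run above should return a dictionary with the same keys that rax.trees lists.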
Here we use toytree to plot the bootstrap results.
tre = toytree.tree(rax.trees.bipartitions)
tre.root(wildcard="3")
tre.draw(
height=300,
width=300,
node_labels=tre.get_node_values("support"),
);
Using the ipyparallel library you can submit raxml jobs to run in parallel on a cluster in a load-balanced fashion. You can then tell the notebook to wait until all jobs are finished before progressing in the notebook to draw trees, etc.
In a separate terminal start an ipcluster instance and tell it how many engines to start.
##
## ipcluster start --n=20
##
Create a Client connected to the cluster
import ipyparallel as ipp
ipyclient = ipp.Client()
Create several raxml objects for different data sets
rax1 = ipa.raxml(
data="~/Documents/ipyrad/tests/analysis-ipyrad/pedic_outfiles/pedic.phy",
name="rax1", T=4, N=100)
rax2 = ipa.raxml(
data="~/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy",
name="rax2", T=4, N=100)
Submit jobs to run on the cluster queue.
rax1.run(ipyclient=ipyclient, force=True)
rax2.run(ipyclient=ipyclient, force=True)
job rax1 submitted to cluster
job rax2 submitted to cluster
Wait for jobs to finish
## you can query each job while it's running
rax1.async.ready()
True
## or just block until all jobs on ipyclient are finished
ipyclient.wait()
True
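If you want to try the submit, query, then wait pattern without starting a cluster, the sketch below mimics it with Python's standard concurrent.futures module. Note that this is a stand-in and not ipyparallel: fake_job is a hypothetical placeholder that only sleeps, and the thread pool plays the role of the engines.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_job(name, seconds):
    """Stand-in for a long-running raxml call; just sleeps and reports."""
    time.sleep(seconds)
    return "job {} finished successfully".format(name)


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit both jobs; each call returns immediately with a future
    futures = {
        "rax1": pool.submit(fake_job, "rax1", 0.2),
        "rax2": pool.submit(fake_job, "rax2", 0.1),
    }
    # query a single job without blocking (True or False)
    print(futures["rax1"].done())
    # or block until every submitted job has finished
    wait(list(futures.values()))
    messages = [futures[name].result() for name in sorted(futures)]

print(messages)
```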
Here we will draw a slightly more complex tree figure that combines two trees onto a single canvas.
## load trees and add to axes
tre1 = toytree.tree(rax1.trees.bipartitions)
tre1.root(wildcard="prz")
tre1.draw(width=300);
tre2 = toytree.tree(rax2.trees.bipartitions)
tre2.root(wildcard="3")
tre2.draw(width=300);