Cookbook: Parallelized STRUCTURE analyses on unlinked SNPs

As part of the ipyrad.analysis toolkit we've created convenience functions for easily distributing STRUCTURE analysis jobs on an HPC cluster, and for doing so in a programmatic and reproducible way. Importantly, our workflow allows you to easily sample different distributions of unlinked SNPs among replicate analyses, with the final inferred population structure summarized from a distribution of replicates. We also provide some simple interactive plotting functions to make barplots and slightly fancier figure, like below.

A note on Jupyter/IPython

This is a Jupyter notebook, a reproducible and executable document. The code in this notebook is Python (2.7), and should be executed either in a jupyter-notebook, like this one, or in an IPython terminal. Execute each cell in order to reproduce our entire analysis. We make use of the ipyparallel Python library to distribute STRUCTURE jobs across processers in parallel. If that is confusing, see our tutorial on using ipcluster with jupyter. The example data set used in this analysis is from the empirical example ipyrad tutorial.

Required software

You can easily install the required software for this notebook locally using conda by running the commented code below in a terminal. If you are working on an HPC cluster you do not need administrator privileges to install the software in this way, since it is only installed locally.

In [1]:
## conda install ipyrad -c ipyrad
## conda install structure -c ipyrad
## conda install clumpp -c ipyrad
## conda install toytree -c eaton-lab

Import Python libraries

In [2]:
import ipyrad.analysis as ipa      ## ipyrad analysis toolkit
import ipyparallel as ipp          ## parallel processing
import toyplot                     ## plotting library

Parallel cluster setup

Start an ipcluster instance in a separate terminal. An easy way to do this in a jupyter-notebook running on an HPC cluster is to go to your Jupyter dashboard, and click [new], and then [terminal], and run 'ipcluster start' in that terminal. This will start a local cluster on the compute node you are connected to. See our [ipyparallel tutorial] (coming soon) for further details.

In [3]:
##
## ipcluster start --n=40
##
In [4]:
## get parallel client
ipyclient = ipp.Client()
print "Connected to {} cores".format(len(ipyclient))
Connected to 4 cores

Enter input and output file locations

In [5]:
## the structure formatted file
strfile = "./analysis-ipyrad/pedic-full_outfiles/pedic-full.str"

## an optional mapfile, to sample unlinked SNPs
mapfile = "./analysis-ipyrad/pedic-full_outfiles/pedic-full.snps.map"

## the directory where outfiles should be written
workdir = "./analysis-structure/"

Create a Structure Class object

Structure is kind of an old fashioned program that requires creating quite a few input files to run, which makes it not very convenient to use in a programmatic and reproducible way. To work around this we've created a convenience wrapper object to make it easy to submit Structure jobs and to summarize their results.

In [6]:
## create a Structure object
struct = ipa.structure(name="structure-test",
                       data=strfile, 
                       mapfile=mapfile,
                       workdir=workdir)

Set parameter options for this object

Our Structure object will be used to submit jobs to the cluster. It has associated with it a name, a set of input files, and a large number of parameter settings. You can modify the parameters by setting them like below. You can also use tab-completion to see all of the available options, or print them like below. See the full structure docs here for further details on the function of each parameter. In support of reproducibility, it is good practice to print both the mainparams and extraparams so it is clear which options you used.

In [7]:
## set mainparams for object
struct.mainparams.burnin = 10000
struct.mainparams.numreps = 100000

## see all mainparams
print struct.mainparams

## see or set extraparams
print struct.extraparams
burnin             10000               
extracols          0                   
label              1                   
locdata            0                   
mapdistances       0                   
markernames        0                   
markovphase        0                   
missing            -9                  
notambiguous       -999                
numreps            100000              
onerowperind       0                   
phased             0                   
phaseinfo          0                   
phenotype          0                   
ploidy             2                   
popdata            0                   
popflag            0                   
recessivealleles   0                   

admburnin           500                 
alpha               1.0                 
alphamax            10.0                
alphapriora         1.0                 
alphapriorb         2.0                 
alphapropsd         0.025               
ancestdist          0                   
ancestpint          0.9                 
computeprob         1                   
echodata            0                   
fpriormean          0.01                
fpriorsd            0.05                
freqscorr           1                   
gensback            2                   
inferalpha          1                   
inferlambda         0                   
intermedsave        0                   
lambda_             1.0                 
linkage             0                   
locispop            0                   
locprior            0                   
locpriorinit        1.0                 
log10rmax           1.0                 
log10rmin           -4.0                
log10rpropsd        0.1                 
log10rstart         -2.0                
maxlocprior         20.0                
metrofreq           10                  
migrprior           0.01                
noadmix             0                   
numboxes            1000                
onefst              0                   
pfrompopflagonly    0                   
popalphas           0                   
popspecificlambda   0                   
printlambda         1                   
printlikes          0                   
printnet            1                   
printqhat           0                   
printqsum           1                   
randomize           0                   
reporthitrate       0                   
seed                12345               
sitebysite          0                   
startatpopinfo      0                   
unifprioralpha      1                   
updatefreq          10000               
usepopinfo          0                   

Submit jobs to run on the cluster

The function run() distributes jobs to run on the cluster and load-balances the parallel workload. It takes a number of arguments. The first, kpop, is the number of populations. The second, nreps, is the number of replicated runs to perform. Each rep has a different random seed, and if you entered a mapfile for your Structure object then it will subsample unlinked snps independently in each replicate. The seed argument can be used to make the replicate analyses reproducible. The extraparams.seed parameter will be generated from this for each replicate. And finally, provide it the ipyclient object that we created above. The structure object will store an asynchronous results object for each job that is submitted so that we can query whether the jobs are finished yet or not. Using a simple for-loop we'll submit 20 replicate jobs to run at four different values of K.

In [8]:
## a range of K-values to test
tests = [3, 4, 5, 6]
In [9]:
## submit batches of 20 replicate jobs for each value of K 
for kpop in tests:
    struct.run(
        kpop=kpop, 
        nreps=20, 
        seed=12345,
        ipyclient=ipyclient,
        )
submitted 20 structure jobs [structure-test-K-3]
submitted 20 structure jobs [structure-test-K-4]
submitted 20 structure jobs [structure-test-K-5]
submitted 20 structure jobs [structure-test-K-6]

Track progress until finished

You can check for finished results by using the get_clumpp_table() function, which tries to summarize the finished results files. If no results are ready it will simply print a warning message telling you to wait. If you want the notebook to block/wait until all jobs are finished then execute the wait() function of the ipyclient object, like below.

In [11]:
## see submitted jobs (we query first 10 here)
struct.asyncs[:10]
Out[11]:
[<AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>,
 <AsyncResult: _call_structure>]
In [12]:
## query a specific job result by index
if struct.asyncs[0].ready():
    print struct.asyncs[0].result()
In [ ]:
## block/wait until all jobs finished
ipyclient.wait() 

Summarize replicates with CLUMPP

We ran 20 replicates per K-value hypothesis. We now need to concatenate and purmute those results so they can be summarized. For this we use the software clumpp. The default arguments to clumpp are generally good, but you can modify them the same as structure params, by accessing the .clumppparams attribute of your structure object. See the clumpp documentation for more details. If you have a large number of samples (>50) you may wish to use the largeKgreedy algorithm (m=3) for faster runtimes. Below we run clumpp for each value of K that we ran structure on. You only need to tell the get_clumpp_table() function the value of K and it will find all of the result files given the Structure object's name and workdir.

In [21]:
## set some clumpp params
struct.clumppparams.m = 3               ## use largegreedy algorithm
struct.clumppparams.greedy_option = 2   ## test nrepeat possible orders
struct.clumppparams.repeats = 10000     ## number of repeats
struct.clumppparams
Out[21]:
c                         13                  
datatype                  0                   
every_permfile            0                   
greedy_option             2                   
indfile                   0                   
kpop                      5                   
m                         3                   
miscfile                  0                   
order_by_run              1                   
outfile                   0                   
override_warnings         0                   
permfile                  0                   
permutationsfile          0                   
permuted_datafile         0                   
popfile                   0                   
print_every_perm          0                   
print_permuted_data       0                   
print_random_inputorder   0                   
r                         16                  
random_inputorderfile     0                   
repeats                   10000               
s                         2                   
w                         1                   
In [33]:
## run clumpp for each value of K
tables = {}
for kpop in tests:
    tables[kpop] = struct.get_clumpp_table(kpop)
mean scores across 20 replicates.
mean scores across 20 replicates.
mean scores across 20 replicates.
mean scores across 20 replicates.

Sort the table order how you like it

This can be useful if, for example, you want to order the names to be in the same order as tips on your phylogeny.

In [34]:
## custom sorting order
myorder = [
    "32082_przewalskii", 
    "33588_przewalskii",
    "41478_cyathophylloides", 
    "41954_cyathophylloides", 
    "29154_superba",
    "30686_cyathophylla", 
    "33413_thamno", 
    "30556_thamno", 
    "35236_rex", 
    "40578_rex", 
    "35855_rex",
    "39618_rex", 
    "38362_rex",
]

print "custom ordering"
print tables[4].ix[myorder]
custom ordering
                          0      1          2      3
32082_przewalskii       1.0  0.000  0.000e+00  0.000
33588_przewalskii       1.0  0.000  0.000e+00  0.000
41478_cyathophylloides  0.0  0.005  9.948e-01  0.000
41954_cyathophylloides  0.0  0.005  9.948e-01  0.000
29154_superba           0.0  0.019  6.731e-01  0.308
30686_cyathophylla      0.0  0.020  6.820e-01  0.298
33413_thamno            0.0  0.845  0.000e+00  0.155
30556_thamno            0.0  0.892  7.000e-04  0.107
35236_rex               0.0  0.908  1.000e-04  0.092
40578_rex               0.0  0.989  2.000e-04  0.010
35855_rex               0.0  0.990  0.000e+00  0.010
39618_rex               0.0  1.000  0.000e+00  0.000
38362_rex               0.0  1.000  0.000e+00  0.000

A function for adding an interactive hover to our plots

The function automatically parses the table above for you. It can reorder the individuals based on their membership in each group, or based on an input list of ordered names. It returns the table of data as well as a list with information for making interactive hover boxes, which you can see below by hovering over the plots.

In [35]:
def hover(table):
    hover = []
    for row in range(table.shape[0]):
        stack = []
        for col in range(table.shape[1]):
            label = "Name: {}\nGroup: {}\nProp: {}"\
                .format(table.index[row], 
                        table.columns[col],
                        table.ix[row, col])
            stack.append(label)
        hover.append(stack)
    return list(hover)

Visualize population structure in barplots

Hover over the plot to see sample names and info in the hover box.

In [36]:
for kpop in tests:
    ## parse outfile to a table and re-order it
    table = tables[kpop]
    table = table.ix[myorder]
    
    ## plot barplot w/ hover
    canvas, axes, mark = toyplot.bars(
                            table, 
                            title=hover(table),
                            width=400, 
                            height=200, 
                            yshow=False,                            
                            style={"stroke": toyplot.color.near_black},
                            )
Name: 32082_przewalskii Group: 0 Prop: 0.0Name: 33588_przewalskii Group: 0 Prop: 0.0Name: 41478_cyathophylloides Group: 0 Prop: 0.0211Name: 41954_cyathophylloides Group: 0 Prop: 0.0211Name: 29154_superba Group: 0 Prop: 0.0553Name: 30686_cyathophylla Group: 0 Prop: 0.0556Name: 33413_thamno Group: 0 Prop: 0.9956Name: 30556_thamno Group: 0 Prop: 0.9974Name: 35236_rex Group: 0 Prop: 0.9979Name: 40578_rex Group: 0 Prop: 0.9986Name: 35855_rex Group: 0 Prop: 0.9988Name: 39618_rex Group: 0 Prop: 1.0Name: 38362_rex Group: 0 Prop: 1.0Name: 32082_przewalskii Group: 1 Prop: 0.0Name: 33588_przewalskii Group: 1 Prop: 0.0Name: 41478_cyathophylloides Group: 1 Prop: 0.9789Name: 41954_cyathophylloides Group: 1 Prop: 0.9789Name: 29154_superba Group: 1 Prop: 0.9448Name: 30686_cyathophylla Group: 1 Prop: 0.9444Name: 33413_thamno Group: 1 Prop: 0.0044Name: 30556_thamno Group: 1 Prop: 0.0026Name: 35236_rex Group: 1 Prop: 0.0021Name: 40578_rex Group: 1 Prop: 0.0015Name: 35855_rex Group: 1 Prop: 0.0013Name: 39618_rex Group: 1 Prop: 0.0Name: 38362_rex Group: 1 Prop: 0.0Name: 32082_przewalskii Group: 2 Prop: 1.0Name: 33588_przewalskii Group: 2 Prop: 1.0Name: 41478_cyathophylloides Group: 2 Prop: 0.0Name: 41954_cyathophylloides Group: 2 Prop: 0.0Name: 29154_superba Group: 2 Prop: 0.0Name: 30686_cyathophylla Group: 2 Prop: 0.0Name: 33413_thamno Group: 2 Prop: 0.0Name: 30556_thamno Group: 2 Prop: 0.0Name: 35236_rex Group: 2 Prop: 0.0Name: 40578_rex Group: 2 Prop: 0.0Name: 35855_rex Group: 2 Prop: 0.0Name: 39618_rex Group: 2 Prop: 0.0Name: 38362_rex Group: 2 Prop: 0.004812
Name: 32082_przewalskii Group: 0 Prop: 1.0Name: 33588_przewalskii Group: 0 Prop: 1.0Name: 41478_cyathophylloides Group: 0 Prop: 0.0Name: 41954_cyathophylloides Group: 0 Prop: 0.0Name: 29154_superba Group: 0 Prop: 0.0Name: 30686_cyathophylla Group: 0 Prop: 0.0Name: 33413_thamno Group: 0 Prop: 0.0Name: 30556_thamno Group: 0 Prop: 0.0Name: 35236_rex Group: 0 Prop: 0.0Name: 40578_rex Group: 0 Prop: 0.0Name: 35855_rex Group: 0 Prop: 0.0Name: 39618_rex Group: 0 Prop: 0.0Name: 38362_rex Group: 0 Prop: 0.0Name: 32082_przewalskii Group: 1 Prop: 0.0Name: 33588_przewalskii Group: 1 Prop: 0.0Name: 41478_cyathophylloides Group: 1 Prop: 0.0052Name: 41954_cyathophylloides Group: 1 Prop: 0.0052Name: 29154_superba Group: 1 Prop: 0.0186Name: 30686_cyathophylla Group: 1 Prop: 0.0197Name: 33413_thamno Group: 1 Prop: 0.8454Name: 30556_thamno Group: 1 Prop: 0.8922Name: 35236_rex Group: 1 Prop: 0.9081Name: 40578_rex Group: 1 Prop: 0.9894Name: 35855_rex Group: 1 Prop: 0.9901Name: 39618_rex Group: 1 Prop: 1.0Name: 38362_rex Group: 1 Prop: 1.0Name: 32082_przewalskii Group: 2 Prop: 0.0Name: 33588_przewalskii Group: 2 Prop: 0.0Name: 41478_cyathophylloides Group: 2 Prop: 0.9948Name: 41954_cyathophylloides Group: 2 Prop: 0.9948Name: 29154_superba Group: 2 Prop: 0.6731Name: 30686_cyathophylla Group: 2 Prop: 0.682Name: 33413_thamno Group: 2 Prop: 0.0Name: 30556_thamno Group: 2 Prop: 0.0007Name: 35236_rex Group: 2 Prop: 0.0001Name: 40578_rex Group: 2 Prop: 0.0002Name: 35855_rex Group: 2 Prop: 0.0Name: 39618_rex Group: 2 Prop: 0.0Name: 38362_rex Group: 2 Prop: 0.0Name: 32082_przewalskii Group: 3 Prop: 0.0Name: 33588_przewalskii Group: 3 Prop: 0.0Name: 41478_cyathophylloides Group: 3 Prop: 0.0Name: 41954_cyathophylloides Group: 3 Prop: 0.0Name: 29154_superba Group: 3 Prop: 0.3084Name: 30686_cyathophylla Group: 3 Prop: 0.2983Name: 33413_thamno Group: 3 Prop: 0.1546Name: 30556_thamno Group: 3 Prop: 0.1071Name: 35236_rex Group: 3 Prop: 0.0919Name: 40578_rex Group: 3 Prop: 0.0104Name: 35855_rex Group: 3 Prop: 0.0099Name: 39618_rex Group: 3 Prop: 0.0Name: 38362_rex Group: 3 Prop: 0.004812
Name: 32082_przewalskii Group: 0 Prop: 0.0Name: 33588_przewalskii Group: 0 Prop: 0.0Name: 41478_cyathophylloides Group: 0 Prop: 0.0Name: 41954_cyathophylloides Group: 0 Prop: 0.0Name: 29154_superba Group: 0 Prop: 0.0263Name: 30686_cyathophylla Group: 0 Prop: 0.0346Name: 33413_thamno Group: 0 Prop: 0.7078Name: 30556_thamno Group: 0 Prop: 0.7924Name: 35236_rex Group: 0 Prop: 0.8263Name: 40578_rex Group: 0 Prop: 0.9469Name: 35855_rex Group: 0 Prop: 0.9486Name: 39618_rex Group: 0 Prop: 1.0Name: 38362_rex Group: 0 Prop: 1.0Name: 32082_przewalskii Group: 1 Prop: 0.0Name: 33588_przewalskii Group: 1 Prop: 0.0Name: 41478_cyathophylloides Group: 1 Prop: 0.0Name: 41954_cyathophylloides Group: 1 Prop: 0.0Name: 29154_superba Group: 1 Prop: 0.4769Name: 30686_cyathophylla Group: 1 Prop: 0.4415Name: 33413_thamno Group: 1 Prop: 0.0075Name: 30556_thamno Group: 1 Prop: 0.0153Name: 35236_rex Group: 1 Prop: 0.0379Name: 40578_rex Group: 1 Prop: 0.0329Name: 35855_rex Group: 1 Prop: 0.0305Name: 39618_rex Group: 1 Prop: 0.0Name: 38362_rex Group: 1 Prop: 0.0Name: 32082_przewalskii Group: 2 Prop: 1.0Name: 33588_przewalskii Group: 2 Prop: 1.0Name: 41478_cyathophylloides Group: 2 Prop: 0.0Name: 41954_cyathophylloides Group: 2 Prop: 0.0Name: 29154_superba Group: 2 Prop: 0.0Name: 30686_cyathophylla Group: 2 Prop: 0.0Name: 33413_thamno Group: 2 Prop: 0.0Name: 30556_thamno Group: 2 Prop: 0.0Name: 35236_rex Group: 2 Prop: 0.0Name: 40578_rex Group: 2 Prop: 0.0Name: 35855_rex Group: 2 Prop: 0.0Name: 39618_rex Group: 2 Prop: 0.0Name: 38362_rex Group: 2 Prop: 0.0Name: 32082_przewalskii Group: 3 Prop: 0.0Name: 33588_przewalskii Group: 3 Prop: 0.0Name: 41478_cyathophylloides Group: 3 Prop: 1.0Name: 41954_cyathophylloides Group: 3 Prop: 1.0Name: 29154_superba Group: 3 Prop: 0.496Name: 30686_cyathophylla Group: 3 Prop: 0.5239Name: 33413_thamno Group: 3 Prop: 0.0Name: 30556_thamno Group: 3 Prop: 0.0Name: 35236_rex Group: 3 Prop: 0.0Name: 40578_rex Group: 3 Prop: 0.0Name: 35855_rex Group: 3 Prop: 0.0Name: 39618_rex Group: 3 Prop: 0.0Name: 38362_rex Group: 3 Prop: 0.0Name: 32082_przewalskii Group: 4 Prop: 0.0Name: 33588_przewalskii Group: 4 Prop: 0.0Name: 41478_cyathophylloides Group: 4 Prop: 0.0Name: 41954_cyathophylloides Group: 4 Prop: 0.0Name: 29154_superba Group: 4 Prop: 0.0008Name: 30686_cyathophylla Group: 4 Prop: 0.0Name: 33413_thamno Group: 4 Prop: 0.2847Name: 30556_thamno Group: 4 Prop: 0.1923Name: 35236_rex Group: 4 Prop: 0.1358Name: 40578_rex Group: 4 Prop: 0.0203Name: 35855_rex Group: 4 Prop: 0.0209Name: 39618_rex Group: 4 Prop: 0.0Name: 38362_rex Group: 4 Prop: 0.004812
Name: 32082_przewalskii Group: 0 Prop: 0.0Name: 33588_przewalskii Group: 0 Prop: 0.0Name: 41478_cyathophylloides Group: 0 Prop: 0.0Name: 41954_cyathophylloides Group: 0 Prop: 0.0Name: 29154_superba Group: 0 Prop: 0.4604Name: 30686_cyathophylla Group: 0 Prop: 0.4249Name: 33413_thamno Group: 0 Prop: 0.0015Name: 30556_thamno Group: 0 Prop: 0.0039Name: 35236_rex Group: 0 Prop: 0.0289Name: 40578_rex Group: 0 Prop: 0.0024Name: 35855_rex Group: 0 Prop: 0.003Name: 39618_rex Group: 0 Prop: 0.0001Name: 38362_rex Group: 0 Prop: 0.0Name: 32082_przewalskii Group: 1 Prop: 0.0Name: 33588_przewalskii Group: 1 Prop: 0.0Name: 41478_cyathophylloides Group: 1 Prop: 0.0Name: 41954_cyathophylloides Group: 1 Prop: 0.0Name: 29154_superba Group: 1 Prop: 0.0225Name: 30686_cyathophylla Group: 1 Prop: 0.0313Name: 33413_thamno Group: 1 Prop: 0.647Name: 30556_thamno Group: 1 Prop: 0.7293Name: 35236_rex Group: 1 Prop: 0.7636Name: 40578_rex Group: 1 Prop: 0.81Name: 35855_rex Group: 1 Prop: 0.8138Name: 39618_rex Group: 1 Prop: 0.9999Name: 38362_rex Group: 1 Prop: 1.0Name: 32082_przewalskii Group: 2 Prop: 0.0Name: 33588_przewalskii Group: 2 Prop: 0.0Name: 41478_cyathophylloides Group: 2 Prop: 0.0Name: 41954_cyathophylloides Group: 2 Prop: 0.0Name: 29154_superba Group: 2 Prop: 0.0Name: 30686_cyathophylla Group: 2 Prop: 0.0Name: 33413_thamno Group: 2 Prop: 0.3483Name: 30556_thamno Group: 2 Prop: 0.1861Name: 35236_rex Group: 2 Prop: 0.0632Name: 40578_rex Group: 2 Prop: 0.022Name: 35855_rex Group: 2 Prop: 0.0224Name: 39618_rex Group: 2 Prop: 0.0Name: 38362_rex Group: 2 Prop: 0.0Name: 32082_przewalskii Group: 3 Prop: 1.0Name: 33588_przewalskii Group: 3 Prop: 1.0Name: 41478_cyathophylloides Group: 3 Prop: 0.0Name: 41954_cyathophylloides Group: 3 Prop: 0.0Name: 29154_superba Group: 3 Prop: 0.0Name: 30686_cyathophylla Group: 3 Prop: 0.0Name: 33413_thamno Group: 3 Prop: 0.0Name: 30556_thamno Group: 3 Prop: 0.0Name: 35236_rex Group: 3 Prop: 0.0Name: 40578_rex Group: 3 Prop: 0.0Name: 35855_rex Group: 3 Prop: 0.0Name: 39618_rex Group: 3 Prop: 0.0Name: 38362_rex Group: 3 Prop: 0.0Name: 32082_przewalskii Group: 4 Prop: 0.0Name: 33588_przewalskii Group: 4 Prop: 0.0Name: 41478_cyathophylloides Group: 4 Prop: 0.0Name: 41954_cyathophylloides Group: 4 Prop: 0.0Name: 29154_superba Group: 4 Prop: 0.0009Name: 30686_cyathophylla Group: 4 Prop: 0.001Name: 33413_thamno Group: 4 Prop: 0.0031Name: 30556_thamno Group: 4 Prop: 0.0808Name: 35236_rex Group: 4 Prop: 0.1443Name: 40578_rex Group: 4 Prop: 0.1656Name: 35855_rex Group: 4 Prop: 0.1609Name: 39618_rex Group: 4 Prop: 0.0Name: 38362_rex Group: 4 Prop: 0.0Name: 32082_przewalskii Group: 5 Prop: 0.0Name: 33588_przewalskii Group: 5 Prop: 0.0Name: 41478_cyathophylloides Group: 5 Prop: 1.0Name: 41954_cyathophylloides Group: 5 Prop: 1.0Name: 29154_superba Group: 5 Prop: 0.5162Name: 30686_cyathophylla Group: 5 Prop: 0.5428Name: 33413_thamno Group: 5 Prop: 0.0Name: 30556_thamno Group: 5 Prop: 0.0Name: 35236_rex Group: 5 Prop: 0.0Name: 40578_rex Group: 5 Prop: 0.0Name: 35855_rex Group: 5 Prop: 0.0Name: 39618_rex Group: 5 Prop: 0.0Name: 38362_rex Group: 5 Prop: 0.004812

Make a slightly fancier plot and save to file

In [37]:
## save plots for your favorite value of K
table = struct.get_clumpp_table(kpop=3)
table = table.ix[myorder]
mean scores across 20 replicates.
In [38]:
## further styling of plot with css 
style = {"stroke":toyplot.color.near_black, 
         "stroke-width": 2}

## build barplot
canvas = toyplot.Canvas(width=600, height=250)
axes = canvas.cartesian(bounds=("5%", "95%", "5%", "45%"))
axes.bars(table, title=hover(table), style=style)

## add names to x-axis
ticklabels = [i for i in table.index.tolist()]
axes.x.ticks.locator = toyplot.locator.Explicit(labels=ticklabels)
axes.x.ticks.labels.angle = -60
axes.x.ticks.show = True
axes.x.ticks.labels.offset = 10
axes.x.ticks.labels.style = {"font-size": "12px"}
axes.x.spine.style = style
axes.y.show = False
    
## options: uncomment to save plots. Only html retains hover.
import toyplot.svg
import toyplot.pdf
import toyplot.html
toyplot.svg.render(canvas, "struct.svg")
toyplot.pdf.render(canvas, "struct.pdf")
toyplot.html.render(canvas, "struct.html")

## show in notebook
canvas
Out[38]:
Name: 32082_przewalskii Group: 0 Prop: 0.0Name: 33588_przewalskii Group: 0 Prop: 0.0Name: 41478_cyathophylloides Group: 0 Prop: 0.0211Name: 41954_cyathophylloides Group: 0 Prop: 0.0211Name: 29154_superba Group: 0 Prop: 0.0553Name: 30686_cyathophylla Group: 0 Prop: 0.0556Name: 33413_thamno Group: 0 Prop: 0.9956Name: 30556_thamno Group: 0 Prop: 0.9974Name: 35236_rex Group: 0 Prop: 0.9979Name: 40578_rex Group: 0 Prop: 0.9986Name: 35855_rex Group: 0 Prop: 0.9988Name: 39618_rex Group: 0 Prop: 1.0Name: 38362_rex Group: 0 Prop: 1.0Name: 32082_przewalskii Group: 1 Prop: 0.0Name: 33588_przewalskii Group: 1 Prop: 0.0Name: 41478_cyathophylloides Group: 1 Prop: 0.9789Name: 41954_cyathophylloides Group: 1 Prop: 0.9789Name: 29154_superba Group: 1 Prop: 0.9448Name: 30686_cyathophylla Group: 1 Prop: 0.9444Name: 33413_thamno Group: 1 Prop: 0.0044Name: 30556_thamno Group: 1 Prop: 0.0026Name: 35236_rex Group: 1 Prop: 0.0021Name: 40578_rex Group: 1 Prop: 0.0015Name: 35855_rex Group: 1 Prop: 0.0013Name: 39618_rex Group: 1 Prop: 0.0Name: 38362_rex Group: 1 Prop: 0.0Name: 32082_przewalskii Group: 2 Prop: 1.0Name: 33588_przewalskii Group: 2 Prop: 1.0Name: 41478_cyathophylloides Group: 2 Prop: 0.0Name: 41954_cyathophylloides Group: 2 Prop: 0.0Name: 29154_superba Group: 2 Prop: 0.0Name: 30686_cyathophylla Group: 2 Prop: 0.0Name: 33413_thamno Group: 2 Prop: 0.0Name: 30556_thamno Group: 2 Prop: 0.0Name: 35236_rex Group: 2 Prop: 0.0Name: 40578_rex Group: 2 Prop: 0.0Name: 35855_rex Group: 2 Prop: 0.0Name: 39618_rex Group: 2 Prop: 0.0Name: 38362_rex Group: 2 Prop: 0.032082_przewalskii33588_przewalskii41478_cyathophylloides41954_cyathophylloides29154_superba30686_cyathophylla33413_thamno30556_thamno35236_rex40578_rex35855_rex39618_rex38362_rex

Calculating the best K

I haven't gotten around to writing the code for this yet (contributors are welcome!). For now, I like using the site http://taylor0.biology.ucla.edu/structureHarvester/. It's great. Super easy. Zip up all the files in our structure directory, submit them to the site, and you're done.

In [165]:
%%bash -s "$STRUCTDIR"

## creates zip dir of all files ending with _f
zip $1/structure-files.zip $1/*_f
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-3-rep-0_f (deflated 82%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-3-rep-1_f (deflated 82%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-3-rep-2_f (deflated 81%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-3-rep-3_f (deflated 82%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-3-rep-4_f (deflated 82%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-4-rep-0_f (deflated 80%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-4-rep-1_f (deflated 80%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-4-rep-2_f (deflated 80%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-4-rep-3_f (deflated 81%)
updating: home/deren/Documents/ipyrad/tests/analysis_structure/K-4-rep-4_f (deflated 80%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-5-rep-0_f (deflated 80%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-5-rep-1_f (deflated 80%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-5-rep-2_f (deflated 79%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-5-rep-3_f (deflated 80%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-5-rep-4_f (deflated 80%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-6-rep-0_f (deflated 79%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-6-rep-1_f (deflated 79%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-6-rep-2_f (deflated 79%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-6-rep-3_f (deflated 79%)
  adding: home/deren/Documents/ipyrad/tests/analysis_structure/K-6-rep-4_f (deflated 79%)

Copying this notebook to your computer/cluster

You can easily copy this notebook and then just replace my file names with your filenames to run your analysis. Just click on the [Download Notebook] link at the top of this page. Then run jupyter-notebook from a terminal and open this notebook from the dashboard.