Example notebook: Structure with pop assignments

This notebook shows how to use the ipyrad.analysis toolkit to generate structure input files that use population information.

Required software

In [1]:

# conda install ipyrad -c ipyrad
# conda install structure clumpp -c ipyrad
# conda install toytree -c eaton-lab

In [2]:

import ipyrad.analysis as ipa
import toyplot

Create a structure analysis object

If you include a 'mapfile', the locus information is used to subsample a single SNP from each locus, so that the resulting data file meets STRUCTURE's assumption that SNPs are "unlinked". If you create multiple replicate files using different random seeds, a different random draw of SNPs is made for each replicate.
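The one-SNP-per-locus idea can be sketched in plain Python. This is a minimal illustration, not ipyrad's implementation: we assume the mapfile has already been parsed into a list of `(locus_id, snp_column)` pairs, and the helper name `sample_unlinked_snps` and that input shape are hypothetical. Fixing the seed makes a replicate reproducible; changing it draws a new SNP from every locus that carries more than one.

```python
import random
from collections import defaultdict


def sample_unlinked_snps(snp_map, seed):
    """Pick one SNP column per locus (hypothetical helper).

    snp_map: list of (locus_id, snp_column) pairs, one per SNP,
    e.g. as parsed from a SNP map file (input format assumed here).
    Returns a sorted list of the chosen SNP columns.
    """
    # group SNP columns by the locus they came from
    by_locus = defaultdict(list)
    for locus_id, column in snp_map:
        by_locus[locus_id].append(column)

    # a seeded RNG makes each replicate reproducible
    rng = random.Random(seed)
    chosen = [rng.choice(columns) for _, columns in sorted(by_locus.items())]
    return sorted(chosen)


# three loci carrying 3, 1 and 2 SNPs respectively
snp_map = [(1, 0), (1, 1), (1, 2), (2, 3), (3, 4), (3, 5)]
rep1 = sample_unlinked_snps(snp_map, seed=123)
rep2 = sample_unlinked_snps(snp_map, seed=456)
print(rep1, rep2)
```

Loci with a single SNP (locus 2 above) contribute the same column to every replicate; only multi-SNP loci vary between seeds.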

In [3]:

s = ipa.structure(
        name="test", 
        workdir="analysis-structure",
        data="analysis-ipyrad/ped_min10_outfiles/ped_min10.str",
        mapfile="analysis-ipyrad/ped_min10_outfiles/ped_min10.snps.map",
    )

Set params for the structure analysis

These values are used to generate the "mainparams" and "extraparams" files for structure.
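STRUCTURE reads its settings from plain-text parameter files in which each option is a `#define NAME value` line. The sketch below shows roughly how a dictionary of settings maps onto that format. `render_params` is a hypothetical helper, not part of ipyrad, and stripping the trailing underscore from `lambda_` reflects our assumption that the underscore exists only because `lambda` is a Python keyword.

```python
def render_params(params):
    """Render a dict of STRUCTURE settings as '#define' lines (sketch)."""
    lines = []
    for name, value in params.items():
        # assumed: 'lambda_' is spelled with an underscore only because
        # 'lambda' is a Python keyword, so drop it when writing the file
        key = name.rstrip("_").upper()
        lines.append("#define {} {}".format(key, value))
    return "\n".join(lines) + "\n"


mainparams = {"burnin": 1000, "numreps": 5000, "popdata": 1, "popflag": 1}
extraparams = {"usepopinfo": 1, "lambda_": 1.0}
print(render_params(mainparams))
print(render_params(extraparams))
```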

In [4]:

## set run parameters (you probably want to run >10X this long)
s.mainparams.burnin = 1000
s.mainparams.numreps = 5000

## tell structure to expect popdata & popflag
s.mainparams.popdata = 1
s.mainparams.popflag = 1

## print all mainparams
s.mainparams

Out[4]:

burnin             1000                
extracols          0                   
label              1                   
locdata            0                   
mapdistances       0                   
markernames        0                   
markovphase        0                   
missing            -9                  
notambiguous       -999                
numreps            5000                
onerowperind       0                   
phased             0                   
phaseinfo          0                   
phenotype          0                   
ploidy             2                   
popdata            1                   
popflag            1                   
recessivealleles   0

In [5]:

## tell structure to use popinfo
s.extraparams.usepopinfo = 1

## print all other extraparams
s.extraparams

Out[5]:

admburnin           500                 
alpha               1.0                 
alphamax            10.0                
alphapriora         1.0                 
alphapriorb         2.0                 
alphapropsd         0.025               
ancestdist          0                   
ancestpint          0.9                 
computeprob         1                   
echodata            0                   
fpriormean          0.01                
fpriorsd            0.05                
freqscorr           1                   
gensback            2                   
inferalpha          1                   
inferlambda         0                   
intermedsave        0                   
lambda_             1.0                 
linkage             0                   
locispop            0                   
locprior            0                   
locpriorinit        1.0                 
log10rmax           1.0                 
log10rmin           -4.0                
log10rpropsd        0.1                 
log10rstart         -2.0                
maxlocprior         20.0                
metrofreq           10                  
migrprior           0.01                
noadmix             0                   
numboxes            1000                
onefst              0                   
pfrompopflagonly    0                   
popalphas           0                   
popspecificlambda   0                   
printlambda         1                   
printlikes          0                   
printnet            1                   
printqhat           0                   
printqsum           1                   
randomize           0                   
reporthitrate       0                   
seed                12345               
sitebysite          0                   
startatpopinfo      0                   
unifprioralpha      1                   
updatefreq          10000               
usepopinfo          1

In [6]:

s.header

Out[6]:

	labels	popdata	popflag	locdata	phenotype
0	29154_superba
1	30556_thamno
2	30686_cyathophylla
3	32082_przewalskii
4	33413_thamno
5	33588_przewalskii
6	35236_rex
7	35855_rex
8	38362_rex
9	39618_rex
10	40578_rex
11	41478_cyathophylloides
12	41954_cyathophylloides

Fill in the header by setting the popdata and popflag lists

popdata is the a priori assignment of each individual to a population. Assignments should be non-zero integers (e.g., 1, 2, 3); zero is reserved to mean that there is no a priori assignment. popflag indicates whether to use an individual's population assignment in the analysis (1) or to leave it to be inferred (0). In the example below, seven samples have fixed population assignments (popflag=1) and six will have their assignments inferred (popflag=0), so the popdata values are used only for those seven individuals.
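Because popdata and popflag are positional lists, a length mismatch or an out-of-range label is easy to miss. A small check before writing files can catch both. `check_pop_lists` is a hypothetical helper (not an ipyrad function), and it assumes that, when USEPOPINFO=1, flagged individuals must carry a label between 1 and K; confirm that requirement against the STRUCTURE manual for your setup.

```python
def check_pop_lists(labels, popdata, popflag, kpop):
    """Sanity-check popdata/popflag before writing files (hypothetical helper).

    Assumes flagged individuals (popflag=1) need a population label
    in 1..kpop when USEPOPINFO=1. Returns (n_fixed, n_inferred).
    """
    if not (len(labels) == len(popdata) == len(popflag)):
        raise ValueError("popdata and popflag need one entry per sample")
    for name, pop, flag in zip(labels, popdata, popflag):
        if flag not in (0, 1):
            raise ValueError("{}: popflag must be 0 or 1".format(name))
        if flag == 1 and not 1 <= pop <= kpop:
            raise ValueError("{}: popdata must be in 1..{}".format(name, kpop))
    fixed = sum(popflag)
    return fixed, len(popflag) - fixed


labels = [
    "29154_superba", "30556_thamno", "30686_cyathophylla",
    "32082_przewalskii", "33413_thamno", "33588_przewalskii",
    "35236_rex", "35855_rex", "38362_rex", "39618_rex", "40578_rex",
    "41478_cyathophylloides", "41954_cyathophylloides",
]
popdata = [1, 3, 1, 2, 3, 2, 3, 3, 3, 3, 3, 1, 1]
popflag = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(check_pop_lists(labels, popdata, popflag, kpop=3))  # (7, 6)
```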

In [7]:

## assign popdata and popflag by entering a list of values
s.popdata = [1, 3, 1, 2, 3, 2, 3, 3, 3, 3, 3, 1, 1]
s.popflag = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

## print the header information
s.header

Out[7]:

	labels	popdata	popflag
0	29154_superba	1	1
1	30556_thamno	3	1
2	30686_cyathophylla	1	0
3	32082_przewalskii	2	0
4	33413_thamno	3	0
5	33588_przewalskii	2	0
6	35236_rex	3	0
7	35855_rex	3	0
8	38362_rex	3	1
9	39618_rex	3	1
10	40578_rex	3	1
11	41478_cyathophylloides	1	1
12	41954_cyathophylloides	1	1

Write the input files and run STRUCTURE

This writes the .str file (subsampling SNPs if you included a mapfile) with the header information included, along with mainparams and extraparams files containing the parameter settings entered above.

In [8]:

## write the three input files; you can then call structure yourself
s.write_structure_files(kpop=3)

Out[8]:

('/home/deren/Documents/ipyrad/tests/analysis-structure/tmp-test-3-1.mainparams.txt',
 '/home/deren/Documents/ipyrad/tests/analysis-structure/tmp-test-3-1.extraparams.txt',
 '/home/deren/Documents/ipyrad/tests/analysis-structure/tmp-test-3-1.strfile.txt')
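To call structure yourself on these files, you can assemble a command line from them. This sketch assumes STRUCTURE's command-line overrides `-m` (mainparams), `-e` (extraparams), `-K` (number of populations), `-i` (input), `-o` (output) and `-D` (random seed); `structure_command` and the output name are hypothetical. If your mainparams file does not set the number of individuals and loci, STRUCTURE also accepts `-N` and `-L`, which this sketch omits.

```python
import shlex


def structure_command(prefix, kpop, seed, outfile):
    """Build a STRUCTURE command for files from write_structure_files (sketch).

    `prefix` is the shared path prefix of the three files, e.g.
    'analysis-structure/tmp-test-3-1'. The flags used here are assumed
    from STRUCTURE's command-line options; check your version's manual.
    """
    return [
        "structure",
        "-m", prefix + ".mainparams.txt",
        "-e", prefix + ".extraparams.txt",
        "-K", str(kpop),
        "-i", prefix + ".strfile.txt",
        "-o", outfile,
        "-D", str(seed),
    ]


cmd = structure_command(
    "analysis-structure/tmp-test-3-1", kpop=3, seed=12345,
    outfile="analysis-structure/test-K-3-rep-1",  # hypothetical output name
)
print(shlex.join(cmd))
# to actually run it (requires the structure binary on your PATH):
# import subprocess; subprocess.run(cmd, check=True)
```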

Or, submit jobs directly to the cluster

If you start an ipcluster instance (see other tutorials), you can submit structure jobs directly to the cluster and easily collect the results, as below.

In [9]:

## connect to a running ipcluster instance
import ipyparallel as ipp
ipyclient = ipp.Client()
print("connected to {} cores".format(len(ipyclient)))

connected to 40 cores

In [10]:

## submit job replicates to ipyclient
s.run(kpop=3, nreps=3, ipyclient=ipyclient)

submitted 3 structure jobs [test-K-3]

In [11]:

## wait for jobs to finish
ipyclient.wait()

Out[11]:

True

Collect results and plot

In [12]:

## get table of summarized results
table = s.get_clumpp_table(3)

mean scores across 3 replicates.
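Replicate STRUCTURE runs number their clusters arbitrarily, so cluster 1 in one run may be cluster 2 in another ("label switching"); CLUMPP aligns the replicates before averaging them. The sketch below is a simplified stand-in, not CLUMPP's actual algorithm (which optimizes its own similarity statistic): it permutes each replicate's columns to best match the first replicate by sum of squared differences, then averages. The brute-force search over all K! permutations is only practical for small K.

```python
from itertools import permutations

import numpy as np


def align_and_average(qmats):
    """Average replicate Q-matrices after undoing label switching (sketch).

    Each matrix is samples x K membership proportions. Every replicate's
    columns are permuted to minimize the squared difference from the
    first replicate, then all aligned replicates are averaged.
    """
    reference = np.asarray(qmats[0], dtype=float)
    k = reference.shape[1]
    aligned = [reference]
    for q in qmats[1:]:
        q = np.asarray(q, dtype=float)
        best = min(
            permutations(range(k)),
            key=lambda p: np.sum((q[:, list(p)] - reference) ** 2),
        )
        aligned.append(q[:, list(best)])
    return np.mean(aligned, axis=0)


rep1 = np.array([[0.9, 0.1], [0.2, 0.8]])
rep2 = rep1[:, ::-1]  # the same result with cluster labels swapped
print(align_and_average([rep1, rep2]))
```

Without the alignment step, averaging `rep1` and `rep2` directly would give 0.5 for every entry and hide the real structure.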

In [13]:

## reorder table by membership in groups
table.sort_values(by=[0, 1, 2], inplace=True)

In [14]:

## build barplot
canvas = toyplot.Canvas(width=500, height=250)
axes = canvas.cartesian(bounds=("10%", "90%", "10%", "45%"))
axes.bars(table)

## add labels to x-axis
ticklabels = table.index.tolist()
axes.x.ticks.locator = toyplot.locator.Explicit(labels=ticklabels)
axes.x.ticks.labels.angle = -60
axes.x.ticks.show = True
axes.x.ticks.labels.offset = 10
axes.x.ticks.labels.style = {"font-size": "12px"}