In this tutorial we show how to use iSet in Pyhton. The tutorial requires sample data at http://www.ebi.ac.uk/~casale/data.zip, which can be obtained running the following commands
wget http://www.ebi.ac.uk/~casale/data.zip
unzip data.zip
As we show in this tutorial, iSet can be applied for interaction analysis in two data designs:
iSet is an extension of mtSet that allows to test for polygenic interactions with environment or other contexts. iSet can be applied in designs where all individuals have been phenotyped in each context (complete design) as well as for populations that have been stratified by a context variable (stratified design). We will here see first an application of iSet for analysis of complete designs, and then an application for the analysis of stratified design in the next section.
# activiate inline plotting
%matplotlib inline
import scipy as sp
import scipy.linalg
import limix
from limix.iSet.iset import fit_iSet
import pandas as pd
# base name for bed, bim and fam
bfile = 'data/chrom22_subsample20_maf0.10'
from limix.mtSet.core import plink_reader
# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)
# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)
The method splitGeno
allows to define the regions that will then considered for the analysis with iSet.
Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).
Argument | Default | Datatype | Explanation |
---|---|---|---|
method | 'slidingWindow' | str | Uses a sliding window approach to define regions (a region-based approach will be availabe soon) |
size | 5E+04 (50kb) | float | Window size. Pace is set at half the size of the window |
minSnps | 1 | int | Windows with number of SNPs lower that this threshold are not considered |
maxSnps | sp.inf | int | Windows with number of SNPs higher that this threshold are not considered |
cache | False | bool | If true, it activates the caching |
out_dir | './cache' | str | outdir of the cache file |
fname | None | str | Name of the file |
rewrite | False | bool | If true and the cache file already exists, the cache file is overwritten |
split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows
1380 windows
# import phenotype and sample relatedness
pheno_file = './data/pheno.phe'
sample_relatedness_file = './data/chrom22.cov'
sample_relatedness_file = './data/chrom22.cov'
Y = sp.loadtxt(pheno_file)[:,:2]
R = sp.loadtxt(sample_relatedness_file)
# load fixed effect covariates (10 PCs + intercept term)
sp.loadtxt()
# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))
n_wnds = 10 # only 10 windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
df = pd.DataFrame()
df0 = pd.DataFrame()
import time
for wnd_i in range(n_wnds):
t0 = time.time()
wnd_pos = split.wnd_pos[wnd_i]
nSnps = split.nSnps[wnd_i]
idx_wnd_start = split.idx_wnd_start[wnd_i]
print '.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps)
Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
xr = sp.dot(sp.rand(Xr.shape[0]), Xr)
idxs_u = sp.sort(sp.unique(xr, return_index=True)[1])
Xr = Xr[:,idxs_u]
Xr-= Xr.mean(0)
Xr/= Xr.std(0)
Xr/= sp.sqrt(Xr.shape[1])
_df, _df0 = fit_iSet(Y, F=F, Xr=Xr, n_nulls=10)
df = df.append(_df)
df0 = df0.append(_df0)
print 'Elapsed:', time.time()-t0
.. window 0 - (22, 16025000-16075000) - 21 snps
/Users/casale/anaconda/lib/python2.7/site-packages/limix-0.8.0.dev0-py2.7-macosx-10.6-x86_64.egg/limix/mtSet/core/plink_reader.py:127: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future SNPs = SP.zeros(((SP.ceil(0.25*N)*4),nSNPs),order=order) /Users/casale/anaconda/lib/python2.7/site-packages/limix-0.8.0.dev0-py2.7-macosx-10.6-x86_64.egg/limix/mtSet/core/plink_reader.py:142: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future bytes = SP.array(bytearray(f.read(nbyte))).reshape((SP.ceil(0.25*N),Sblock),order='F')
Elapsed: 11.3809649944 .. window 1 - (22, 16050000-16100000) - 23 snps Elapsed: 12.6584079266 .. window 2 - (22, 16125000-16175000) - 7 snps Elapsed: 16.8145720959 .. window 3 - (22, 16225000-16275000) - 9 snps Elapsed: 104.873666048 .. window 4 - (22, 16250000-16300000) - 16 snps Elapsed: 20.669064045 .. window 5 - (22, 16275000-16325000) - 12 snps Elapsed: 14.1111199856 .. window 6 - (22, 16325000-16375000) - 5 snps Elapsed: 12.9142489433 .. window 7 - (22, 16350000-16400000) - 5 snps Elapsed: 13.7915680408 .. window 8 - (22, 16475000-16525000) - 8 snps Elapsed: 16.0733628273 .. window 9 - (22, 16500000-16550000) - 7 snps Elapsed: 19.4930050373
The dataframe df contains log likelihood ratio scores for the different models, variance component attributable to persistent rescaling-GxC and heterogeneity-GxC and information about convergence
df
Heterogeneity-GxC var | Persistent Var | Rescaling-GxC Var | iSet LLR | iSet-het LLR | mtSet LLR | |
---|---|---|---|---|---|---|
0 | 0.005392 | 0.010507 | 7.101712e-03 | 5.746295e-01 | 2.224021e-01 | 1.655404e+00 |
0 | 0.007778 | 0.010084 | 5.555618e-03 | 5.572166e-01 | 3.393949e-01 | 1.605843e+00 |
0 | 0.000050 | 0.000050 | 4.840134e-10 | 7.581852e-03 | -8.268391e-07 | -8.269062e-07 |
0 | 0.000050 | 0.006019 | 1.255343e-02 | 2.000499e+00 | 4.871453e-09 | 2.296836e+00 |
0 | 0.000050 | 0.004385 | 1.233389e-02 | 1.820602e+00 | -8.327561e-11 | 1.820689e+00 |
0 | 0.000050 | 0.000050 | 3.900443e-12 | 6.427395e-07 | -2.638672e-10 | -3.454943e-10 |
0 | 0.000050 | 0.000935 | 2.033645e-04 | 1.718735e-02 | 1.829665e-08 | 1.718657e-02 |
0 | 0.000050 | 0.000050 | 2.224766e-13 | 7.225904e-03 | 1.202191e-07 | -2.722800e-10 |
0 | 0.000827 | 0.003191 | 8.814909e-03 | 8.597470e-01 | 1.172438e-02 | 8.970728e-01 |
0 | 0.000050 | 0.001833 | 8.049454e-03 | 5.902992e-01 | 1.579309e-08 | 5.902869e-01 |
The dataframe df0 contains log likelihood ratios when data are from the null. These are necessary to iSet to calculate P values.
df0
iSet LLR0 | iSet-het LLR0 | mtSet LLR0 | |
---|---|---|---|
0 | 1.555267e-02 | -1.114967e-08 | 6.763972e-01 |
1 | 5.138475e-03 | 1.666820e-08 | 9.962754e-02 |
2 | 1.322256e+00 | 3.478742e-07 | 3.593482e-01 |
3 | 5.303700e-03 | 3.838863e-09 | -1.591616e-12 |
4 | 2.649577e-02 | 1.236790e-07 | 4.307782e-01 |
5 | 2.371773e+00 | 1.979566e-08 | -2.283286e-09 |
6 | 1.140588e-01 | 8.015625e-07 | -3.249224e-07 |
7 | 7.300552e-05 | -1.030742e-09 | 5.007758e-03 |
8 | 1.931133e-01 | -8.095640e-10 | -3.333867e-10 |
9 | 1.024240e-03 | 2.546921e-08 | 6.409358e-01 |
0 | 1.538946e+00 | 6.648315e-08 | 4.238569e-03 |
1 | 2.684850e-01 | 1.207556e-07 | 3.899477e-01 |
2 | 1.982851e-01 | 2.740084e-07 | 6.468065e-01 |
3 | 5.448258e-02 | 2.304423e-07 | 2.953999e-01 |
4 | 1.040587e-01 | 3.410205e-07 | 2.598717e-02 |
5 | 5.039921e-01 | 4.222368e-07 | 9.883369e-02 |
6 | 1.169442e-01 | -1.964736e-09 | 4.058230e-01 |
7 | 4.981047e+00 | -1.250402e-08 | 1.874179e-01 |
8 | 1.376885e-01 | 5.323929e-02 | 5.414452e-01 |
9 | 1.713601e-02 | -9.514019e-08 | -2.625995e-09 |
0 | 7.129975e-01 | -7.318970e-07 | -8.095083e-08 |
1 | 1.773877e-02 | 8.319699e-08 | 3.107571e-01 |
2 | 5.700531e-07 | -1.459642e-08 | -1.425803e-09 |
3 | 5.083909e-03 | 4.496542e-09 | 1.231549e-01 |
4 | 2.108239e-01 | 2.066857e-01 | -2.949758e-08 |
5 | 2.454567e+00 | 8.748543e-09 | -1.677336e-09 |
6 | 8.920570e-02 | -2.443301e-09 | 1.382382e-01 |
7 | 3.982626e-01 | 1.240664e-09 | 3.098874e-01 |
8 | 6.941323e-01 | 3.221816e-07 | 6.301227e-02 |
9 | 5.363783e-03 | 1.982642e-09 | 1.342524e+00 |
... | ... | ... | ... |
0 | 6.824015e-03 | -2.870593e-12 | 1.024917e-01 |
1 | 1.227777e-03 | 1.075570e-06 | -8.662937e-11 |
2 | 2.025293e-02 | 3.847382e-07 | 5.108097e-01 |
3 | 8.013020e-01 | 8.046220e-08 | 3.585721e+00 |
4 | 7.454272e-03 | 5.885283e-10 | -2.145043e-09 |
5 | 2.169429e-01 | 5.314251e-08 | 1.388684e-01 |
6 | 6.550467e-03 | -4.160765e-08 | 6.302979e-01 |
7 | 1.268165e-01 | -1.860755e-04 | 1.117286e+00 |
8 | 8.665353e-01 | 5.419619e-08 | -1.076216e-09 |
9 | 1.980706e-03 | 8.179478e-08 | 7.312406e-01 |
0 | 1.366847e-03 | 1.222673e-07 | -1.336459e-06 |
1 | 4.231934e-02 | 6.916301e-07 | -1.545544e-08 |
2 | 6.360202e-01 | 6.561397e-07 | 2.556833e-01 |
3 | 3.579892e-01 | 1.243274e-06 | -8.588472e-10 |
4 | 1.162855e-01 | -4.903882e-10 | 1.918087e-01 |
5 | 2.841780e-01 | -2.251682e-08 | 3.612482e-02 |
6 | 4.007141e-01 | 6.620144e-07 | 6.667211e-01 |
7 | 1.253339e+00 | 9.091066e-04 | -2.502361e-09 |
8 | 3.956992e-03 | 8.407589e-07 | -5.046115e-08 |
9 | 2.841871e-03 | 6.264202e-09 | -5.674963e-09 |
0 | 8.987440e-02 | -1.561793e-07 | -5.309630e-09 |
1 | 1.802102e-06 | 1.770561e-07 | 5.294796e-02 |
2 | 1.271400e-02 | 2.876845e-09 | 2.589745e+00 |
3 | 1.451155e-02 | -2.205797e-07 | 6.947324e-01 |
4 | 1.653160e+00 | 7.874888e-09 | 2.198925e+00 |
5 | 2.299080e-01 | 9.061552e-08 | 9.677878e-01 |
6 | 2.180528e-03 | 3.503377e-07 | 2.677990e-01 |
7 | 6.993240e-03 | -9.168471e-08 | 3.555292e-02 |
8 | 3.878006e-01 | -3.444711e-11 | 6.384556e-03 |
9 | 7.838907e-03 | 1.011358e-08 | -8.930101e-11 |
100 rows × 3 columns
import pylab as pl
tot_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values + df['Heterogeneity-GxC var'].values
nohet_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values
pl.fill_between(split.wnd_pos[:n_wnds,1], 0, df['Persistent Var'].values, color='gray')
pl.fill_between(split.wnd_pos[:n_wnds,1], df['Persistent Var'].values, nohet_var, color='Orange')
pl.fill_between(split.wnd_pos[:n_wnds,1], nohet_var, tot_var, color='Gold')
<matplotlib.collections.PolyCollection at 0x115777490>
Empirical P values are obtained from a relatively small number of genome-wide permutations by pooling across all conisdered steps.
from limix.mtSet.core.iset_utils import calc_emp_pv_eff
#calculate P values for the three tests
for test in ['mtSet', 'iSet', 'iSet-het']:
df[test+' pv'] = calc_emp_pv_eff(df[test+' LLR'].values, df0[test+' LLR0'].values)
#makes a manhattan plot
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(df['mtSet pv'].values), 'o', color='DarkGreen')
pl.plot(wnd_start, -sp.log10(df['iSet pv'].values), 'o', color='DarkRed')
pl.plot(wnd_start, -sp.log10(df['iSet-het pv'].values), 'o', color='Gold')
[<matplotlib.lines.Line2D at 0x1163e2450>]
iSet is an extension of mtSet that allows to test for polygenic interactions with environment or other contexts. iSet can be applied in designs where all individuals have been phenotyped in each context (complete design) as well as for populations that have been stratified by a context variable (stratified design). We will here see application of iSet for analysis of stratified designs.
# activiate inline plotting
%matplotlib inline
from download_examples import get_1000G_mtSet
import scipy as sp
import scipy.linalg
import limix
from limix.iSet.iset import fit_iSet
import pandas as pd
# loading 1000G genotypes for mtSet demo
get_1000G_mtSet()
# base name for bed, bim and fam
bfile = './data/1000g/chrom22_subsample20_maf0.10'
from limix.mtSet.core import plink_reader
# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)
# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)
The method splitGeno
allows to define the regions that will then considered for the analysis with iSet.
Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).
Argument | Default | Datatype | Explanation |
---|---|---|---|
method | 'slidingWindow' | str | Uses a sliding window approach to define regions (a region-based approach will be availabe soon) |
size | 5E+04 (50kb) | float | Window size. Pace is set at half the size of the window |
minSnps | 1 | int | Windows with number of SNPs lower that this threshold are not considered |
maxSnps | sp.inf | int | Windows with number of SNPs higher that this threshold are not considered |
cache | False | bool | If true, it activates the caching |
out_dir | './cache' | str | outdir of the cache file |
fname | None | str | Name of the file |
rewrite | False | bool | If true and the cache file already exists, the cache file is overwritten |
split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows
1380 windows
# import phenotype and sample relatedness
pheno_file = './data/1000g/pheno.phe'
sample_relatedness_file = './data/1000g/chrom22.cov'
Y = sp.loadtxt(pheno_file)[:,:1]
R = sp.loadtxt(sample_relatedness_file)
print(Y.shape)
(274, 1)
# let's suppose the first half of the individuals are phenotyped in context A and
# the second half on context B
Ie = sp.arange(R.shape[0])<0.5*R.shape[0]
# corrects for population structure using the first 10 PCs of the relatedness matrix
S_R, U_R = scipy.linalg.eigh(R+1e-4*sp.eye(R.shape[0])) # these are needed for the full mtSet model
F = sp.concatenate([U_R[:,-10:], sp.ones([U_R.shape[0], 1])], 1)
# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))
n_wnds = 10 # only 10 windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
df = pd.DataFrame()
df0 = pd.DataFrame()
import time
for wnd_i in range(n_wnds):
t0 = time.time()
wnd_pos = split.wnd_pos[wnd_i]
nSnps = split.nSnps[wnd_i]
idx_wnd_start = split.idx_wnd_start[wnd_i]
print '.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps)
Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
xr = sp.dot(sp.rand(Xr.shape[0]), Xr)
idxs_u = sp.sort(sp.unique(xr, return_index=True)[1])
Xr = Xr[:,idxs_u]
Xr-= Xr.mean(0)
Xr/= Xr.std(0)
Xr/= sp.sqrt(Xr.shape[1])
_df, _df0 = fit_iSet(Y, F=F, Xr=Xr, Ie=Ie, n_nulls=10)
df = df.append(_df)
df0 = df0.append(_df0)
print 'Elapsed:', time.time()-t0
.. window 0 - (22, 16025000-16075000) - 21 snps Elapsed: 5.46386289597 .. window 1 - (22, 16050000-16100000) - 23 snps Elapsed: 5.34179997444 .. window 2 - (22, 16125000-16175000) - 7 snps Elapsed: 4.30499100685 .. window 3 - (22, 16225000-16275000) - 9 snps Elapsed: 5.6210091114 .. window 4 - (22, 16250000-16300000) - 16 snps Elapsed: 5.8319299221 .. window 5 - (22, 16275000-16325000) - 12 snps Elapsed: 5.72252416611 .. window 6 - (22, 16325000-16375000) - 5 snps Elapsed: 5.94124102592 .. window 7 - (22, 16350000-16400000) - 5 snps Elapsed: 4.80616307259 .. window 8 - (22, 16475000-16525000) - 8 snps Elapsed: 4.15918207169 .. window 9 - (22, 16500000-16550000) - 7 snps Elapsed: 4.45016479492
The dataframe df contains log likelihood ratio scores for the different models, variance component attributable to persistent rescaling-GxC and heterogeneity-GxC and information about convergence.
df
Heterogeneity-GxC var | Persistent Var | Rescaling-GxC Var | iSet LLR | iSet-het LLR | mtSet LLR | |
---|---|---|---|---|---|---|
0 | -0.025807 | 0.059489 | 1.620445e-02 | 2.536120e-01 | 1.431980e-01 | 1.253018e+00 |
0 | -0.017351 | 0.051698 | 8.162345e-03 | 1.151319e-01 | 9.235339e-02 | 1.046717e+00 |
0 | -0.000100 | 0.000100 | 4.437414e-11 | 1.728208e-07 | -4.211657e-09 | -4.237307e-09 |
0 | -0.017640 | 0.002922 | 3.255356e-02 | 5.018203e-01 | 1.145868e-08 | 5.018202e-01 |
0 | -0.003076 | 0.001255 | 4.877466e-03 | 2.303760e-02 | -1.327507e-09 | 2.301929e-02 |
0 | -0.000100 | 0.000100 | 7.126069e-14 | 7.600009e-07 | 2.849561e-10 | -2.643219e-12 |
0 | -0.013473 | 0.013947 | 1.353159e-02 | 3.469951e-01 | -2.905769e-04 | 3.541518e-01 |
0 | -0.000100 | 0.000100 | 1.501852e-07 | 4.185099e-07 | -6.807419e-07 | -6.841037e-07 |
0 | -0.001059 | 0.050339 | 7.098318e-04 | 1.430663e+00 | 5.423559e-01 | 2.143354e+00 |
0 | -0.014258 | 0.025472 | 1.755192e-02 | 9.496528e-01 | 5.507864e-02 | 9.827788e-01 |
The dataframe df0 contains log likelihood ratios when data are from the null. These are necessary to iSet to calculate P values.
df0
iSet LLR0 | iSet-het LLR0 | mtSet LLR0 | |
---|---|---|---|
0 | 8.701465e-01 | 4.699373e-09 | 7.641269e-01 |
1 | 1.326539e+00 | 1.775959e-08 | 7.268617e-01 |
2 | 1.682668e-01 | 4.048806e-08 | 4.109243e-01 |
3 | 6.993162e-01 | 3.979626e-02 | 6.094640e-02 |
4 | 4.454416e-01 | 9.676313e-09 | 1.360218e+00 |
5 | 3.325469e-01 | 2.781690e-08 | -2.262937e-10 |
6 | 3.882787e-03 | 3.794824e-08 | -1.680718e-09 |
7 | 1.536725e+00 | 4.458499e-08 | -1.468123e-09 |
8 | 4.738676e-02 | 1.543978e-08 | -2.570744e-11 |
9 | 1.615154e+00 | 4.746059e-08 | 1.436369e+00 |
0 | 2.896440e+00 | -3.513286e-07 | 7.101769e-01 |
1 | 5.469898e-01 | 1.827495e-08 | -2.736243e-09 |
2 | 6.700332e-02 | 1.055858e-01 | -1.757883e-11 |
3 | 9.861550e-02 | 3.519875e-08 | -1.529388e-07 |
4 | 6.305888e-01 | 3.242017e-08 | 4.032068e-01 |
5 | 1.184665e+00 | -7.830135e-08 | 9.479611e-01 |
6 | 7.165629e-01 | 1.139197e-02 | -9.854688e-09 |
7 | 5.446231e-02 | 2.438270e-09 | 1.109642e-01 |
8 | 3.084690e-01 | 5.544223e-08 | -3.404310e-09 |
9 | 8.050786e-02 | 1.705288e-09 | 1.399125e-02 |
0 | 6.840853e-04 | 5.868465e-01 | 3.604761e-01 |
1 | 9.845385e-08 | 1.434927e-09 | 1.186774e-02 |
2 | 1.190925e-03 | -2.374596e-07 | -5.115908e-13 |
3 | 2.072626e+00 | 1.207957e-08 | 1.874106e-03 |
4 | 2.848547e-01 | -2.602064e-09 | 6.681083e-01 |
5 | 5.402823e-01 | -2.528708e-09 | -6.086225e-10 |
6 | 7.867916e-01 | 3.331607e-09 | 8.866508e-02 |
7 | 3.144970e-02 | 4.968342e-09 | -6.268286e-08 |
8 | 9.240636e-06 | 6.960661e-09 | -3.770140e-11 |
9 | 1.408660e-06 | 8.478040e-09 | 5.024917e-01 |
... | ... | ... | ... |
0 | 3.618376e-01 | 1.625580e-10 | -9.757457e-10 |
1 | 1.296286e-02 | -9.717652e-09 | -3.811223e-09 |
2 | 2.637526e-01 | -8.319745e-10 | 1.760752e-01 |
3 | 5.198513e-01 | 3.027425e-08 | 8.526513e-14 |
4 | 2.249842e-01 | -1.295462e-09 | -2.532204e-09 |
5 | 3.049868e-06 | -1.456584e-09 | 7.744523e-01 |
6 | 1.695221e-01 | 8.543087e-08 | 2.239574e-03 |
7 | 7.297701e-01 | 1.763445e-08 | 1.270094e+00 |
8 | 2.980555e-01 | 7.361109e-09 | 1.076726e-02 |
9 | 6.107584e-01 | 1.563791e-09 | 5.718380e-01 |
0 | 3.746989e-01 | 8.190568e-10 | -2.868344e-08 |
1 | 2.557176e-04 | -9.237351e-08 | -3.477284e-07 |
2 | 9.510620e-01 | -7.084111e-11 | 8.870894e-02 |
3 | 7.999111e-02 | 1.019976e-08 | 4.138307e-01 |
4 | 3.914442e-02 | 1.043787e-08 | -2.800391e-10 |
5 | 1.115584e+00 | 1.265378e+00 | 2.756960e-04 |
6 | 6.104677e-02 | 5.224194e-09 | 1.050632e-01 |
7 | 1.526689e+00 | 6.251213e-10 | 5.871766e-02 |
8 | 4.908732e-05 | -6.544099e-11 | 1.440395e-01 |
9 | 1.342923e-01 | 6.910514e-08 | 1.195091e+00 |
0 | 2.524510e-01 | 1.705915e+00 | 7.643556e-03 |
1 | 2.279359e-01 | 2.201907e-08 | 1.626311e+00 |
2 | 2.878298e-01 | 1.038200e+00 | 1.202775e-01 |
3 | 1.121573e-06 | 3.969859e-09 | 4.472819e-01 |
4 | 1.029227e-01 | -6.920686e-10 | -2.700062e-13 |
5 | 1.121706e-05 | 1.542335e-07 | 6.712342e-01 |
6 | 1.083405e-01 | 3.995524e-10 | 6.778384e-02 |
7 | 7.616543e-01 | 1.344246e-08 | 7.651055e-01 |
8 | 2.134231e-02 | 9.555620e-09 | 3.038097e-01 |
9 | 3.509122e-02 | 3.415337e-09 | 1.191820e+00 |
100 rows × 3 columns
import pylab as pl
tot_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values + df['Heterogeneity-GxC var'].values
nohet_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values
pl.fill_between(split.wnd_pos[:n_wnds,1], 0, df['Persistent Var'].values, color='gray')
pl.fill_between(split.wnd_pos[:n_wnds,1], df['Persistent Var'].values, nohet_var, color='Orange')
pl.fill_between(split.wnd_pos[:n_wnds,1], nohet_var, tot_var, color='Gold')
<matplotlib.collections.PolyCollection at 0x11aea0ad0>
Empirical P values are obtained from a relatively small number of genome-wide permutations by pooling across all considered steps.
from limix.mtSet.core.iset_utils import calc_emp_pv_eff
#calculate P values for the three tests
for test in ['mtSet', 'iSet', 'iSet-het']:
df[test+' pv'] = calc_emp_pv_eff(df[test+' LLR'].values, df0[test+' LLR0'].values)
#makes a manhattan plot
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(df['mtSet pv'].values), 'o', color='DarkGreen')
pl.plot(wnd_start, -sp.log10(df['iSet pv'].values), 'o', color='DarkRed')
pl.plot(wnd_start, -sp.log10(df['iSet-het pv'].values), 'o', color='Gold')
[<matplotlib.lines.Line2D at 0x115bbbb90>]