Using iSet in Python¶

In this tutorial we show how to use iSet in Pyhton. The tutorial requires sample data at http://www.ebi.ac.uk/~casale/data.zip, which can be obtained running the following commands

wget http://www.ebi.ac.uk/~casale/data.zip
unzip data.zip

As we show in this tutorial, iSet can be applied for interaction analysis in two data designs:

complete design, where all individuals have been phenotyped in each context
stratified design, where each individual has been phenotyped in only one of the two contexts

Complete design¶

iSet is an extension of mtSet that allows to test for polygenic interactions with environment or other contexts. iSet can be applied in designs where all individuals have been phenotyped in each context (complete design) as well as for populations that have been stratified by a context variable (stratified design). We will here see first an application of iSet for analysis of complete designs, and then an application for the analysis of stratified design in the next section.

Setting up¶

In [4]:

# activiate inline plotting
%matplotlib inline

import scipy as sp
import scipy.linalg
import limix
from limix.iSet.iset import fit_iSet
import pandas as pd

In [1]:

# base name for bed, bim and fam
bfile = 'data/chrom22_subsample20_maf0.10'

Split genotypes into regions¶

In [4]:

from limix.mtSet.core import plink_reader

In [5]:

# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)

In [6]:

# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)

The method splitGeno allows to define the regions that will then considered for the analysis with iSet. Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).

Argument	Default	Datatype	Explanation
method	'slidingWindow'	str	Uses a sliding window approach to define regions (a region-based approach will be availabe soon)
size	5E+04 (50kb)	float	Window size. Pace is set at half the size of the window
minSnps	1	int	Windows with number of SNPs lower that this threshold are not considered
maxSnps	sp.inf	int	Windows with number of SNPs higher that this threshold are not considered
cache	False	bool	If true, it activates the caching
out_dir	'./cache'	str	outdir of the cache file
fname	None	str	Name of the file
rewrite	False	bool	If true and the cache file already exists, the cache file is overwritten

In [7]:

split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows

1380 windows

Apply¶

In [8]:

# import phenotype and sample relatedness
pheno_file = './data/pheno.phe'
sample_relatedness_file = './data/chrom22.cov'
sample_relatedness_file = './data/chrom22.cov'
Y = sp.loadtxt(pheno_file)[:,:2]
R = sp.loadtxt(sample_relatedness_file)

In [9]:

# load fixed effect covariates (10 PCs + intercept term)
sp.loadtxt()

In [10]:

# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))

In [11]:

n_wnds = 10 # only 10 windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
df = pd.DataFrame()
df0 = pd.DataFrame()
import time

for wnd_i in range(n_wnds):
    t0 = time.time()
    wnd_pos = split.wnd_pos[wnd_i]
    nSnps = split.nSnps[wnd_i]
    idx_wnd_start = split.idx_wnd_start[wnd_i]
    print '.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps)

    Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
    xr = sp.dot(sp.rand(Xr.shape[0]), Xr)
    idxs_u = sp.sort(sp.unique(xr, return_index=True)[1])
    Xr = Xr[:,idxs_u]
    Xr-= Xr.mean(0)
    Xr/= Xr.std(0)
    Xr/= sp.sqrt(Xr.shape[1])

    _df, _df0 = fit_iSet(Y, F=F, Xr=Xr, n_nulls=10)
    df  = df.append(_df)
    df0 = df0.append(_df0)
    print 'Elapsed:', time.time()-t0

.. window 0 - (22, 16025000-16075000) - 21 snps

/Users/casale/anaconda/lib/python2.7/site-packages/limix-0.8.0.dev0-py2.7-macosx-10.6-x86_64.egg/limix/mtSet/core/plink_reader.py:127: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  SNPs = SP.zeros(((SP.ceil(0.25*N)*4),nSNPs),order=order)
/Users/casale/anaconda/lib/python2.7/site-packages/limix-0.8.0.dev0-py2.7-macosx-10.6-x86_64.egg/limix/mtSet/core/plink_reader.py:142: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bytes = SP.array(bytearray(f.read(nbyte))).reshape((SP.ceil(0.25*N),Sblock),order='F')

Elapsed: 11.3809649944
.. window 1 - (22, 16050000-16100000) - 23 snps
Elapsed: 12.6584079266
.. window 2 - (22, 16125000-16175000) - 7 snps
Elapsed: 16.8145720959
.. window 3 - (22, 16225000-16275000) - 9 snps
Elapsed: 104.873666048
.. window 4 - (22, 16250000-16300000) - 16 snps
Elapsed: 20.669064045
.. window 5 - (22, 16275000-16325000) - 12 snps
Elapsed: 14.1111199856
.. window 6 - (22, 16325000-16375000) - 5 snps
Elapsed: 12.9142489433
.. window 7 - (22, 16350000-16400000) - 5 snps
Elapsed: 13.7915680408
.. window 8 - (22, 16475000-16525000) - 8 snps
Elapsed: 16.0733628273
.. window 9 - (22, 16500000-16550000) - 7 snps
Elapsed: 19.4930050373

The dataframe df contains log likelihood ratio scores for the different models, variance component attributable to persistent rescaling-GxC and heterogeneity-GxC and information about convergence

In [12]:

df

Out[12]:

Heterogeneity-GxC var	Persistent Var	Rescaling-GxC Var	iSet LLR	iSet-het LLR	mtSet LLR
0.005392	0.010507	7.101712e-03	5.746295e-01	2.224021e-01	1.655404e+00
0.007778	0.010084	5.555618e-03	5.572166e-01	3.393949e-01	1.605843e+00
0.000050	0.000050	4.840134e-10	7.581852e-03	-8.268391e-07	-8.269062e-07
0.000050	0.006019	1.255343e-02	2.000499e+00	4.871453e-09	2.296836e+00
0.000050	0.004385	1.233389e-02	1.820602e+00	-8.327561e-11	1.820689e+00
0.000050	0.000050	3.900443e-12	6.427395e-07	-2.638672e-10	-3.454943e-10
0.000050	0.000935	2.033645e-04	1.718735e-02	1.829665e-08	1.718657e-02
0.000050	0.000050	2.224766e-13	7.225904e-03	1.202191e-07	-2.722800e-10
0.000827	0.003191	8.814909e-03	8.597470e-01	1.172438e-02	8.970728e-01
0.000050	0.001833	8.049454e-03	5.902992e-01	1.579309e-08	5.902869e-01

The dataframe df0 contains log likelihood ratios when data are from the null. These are necessary to iSet to calculate P values.

In [13]:

df0

Out[13]:

	iSet LLR0	iSet-het LLR0	mtSet LLR0
0	1.555267e-02	-1.114967e-08	6.763972e-01
1	5.138475e-03	1.666820e-08	9.962754e-02
2	1.322256e+00	3.478742e-07	3.593482e-01
3	5.303700e-03	3.838863e-09	-1.591616e-12
4	2.649577e-02	1.236790e-07	4.307782e-01
5	2.371773e+00	1.979566e-08	-2.283286e-09
6	1.140588e-01	8.015625e-07	-3.249224e-07
7	7.300552e-05	-1.030742e-09	5.007758e-03
8	1.931133e-01	-8.095640e-10	-3.333867e-10
9	1.024240e-03	2.546921e-08	6.409358e-01
0	1.538946e+00	6.648315e-08	4.238569e-03
1	2.684850e-01	1.207556e-07	3.899477e-01
2	1.982851e-01	2.740084e-07	6.468065e-01
3	5.448258e-02	2.304423e-07	2.953999e-01
4	1.040587e-01	3.410205e-07	2.598717e-02
5	5.039921e-01	4.222368e-07	9.883369e-02
6	1.169442e-01	-1.964736e-09	4.058230e-01
7	4.981047e+00	-1.250402e-08	1.874179e-01
8	1.376885e-01	5.323929e-02	5.414452e-01
9	1.713601e-02	-9.514019e-08	-2.625995e-09
0	7.129975e-01	-7.318970e-07	-8.095083e-08
1	1.773877e-02	8.319699e-08	3.107571e-01
2	5.700531e-07	-1.459642e-08	-1.425803e-09
3	5.083909e-03	4.496542e-09	1.231549e-01
4	2.108239e-01	2.066857e-01	-2.949758e-08
5	2.454567e+00	8.748543e-09	-1.677336e-09
6	8.920570e-02	-2.443301e-09	1.382382e-01
7	3.982626e-01	1.240664e-09	3.098874e-01
8	6.941323e-01	3.221816e-07	6.301227e-02
9	5.363783e-03	1.982642e-09	1.342524e+00
...	...	...	...
0	6.824015e-03	-2.870593e-12	1.024917e-01
1	1.227777e-03	1.075570e-06	-8.662937e-11
2	2.025293e-02	3.847382e-07	5.108097e-01
3	8.013020e-01	8.046220e-08	3.585721e+00
4	7.454272e-03	5.885283e-10	-2.145043e-09
5	2.169429e-01	5.314251e-08	1.388684e-01
6	6.550467e-03	-4.160765e-08	6.302979e-01
7	1.268165e-01	-1.860755e-04	1.117286e+00
8	8.665353e-01	5.419619e-08	-1.076216e-09
9	1.980706e-03	8.179478e-08	7.312406e-01
0	1.366847e-03	1.222673e-07	-1.336459e-06
1	4.231934e-02	6.916301e-07	-1.545544e-08
2	6.360202e-01	6.561397e-07	2.556833e-01
3	3.579892e-01	1.243274e-06	-8.588472e-10
4	1.162855e-01	-4.903882e-10	1.918087e-01
5	2.841780e-01	-2.251682e-08	3.612482e-02
6	4.007141e-01	6.620144e-07	6.667211e-01
7	1.253339e+00	9.091066e-04	-2.502361e-09
8	3.956992e-03	8.407589e-07	-5.046115e-08
9	2.841871e-03	6.264202e-09	-5.674963e-09
0	8.987440e-02	-1.561793e-07	-5.309630e-09
1	1.802102e-06	1.770561e-07	5.294796e-02
2	1.271400e-02	2.876845e-09	2.589745e+00
3	1.451155e-02	-2.205797e-07	6.947324e-01
4	1.653160e+00	7.874888e-09	2.198925e+00
5	2.299080e-01	9.061552e-08	9.677878e-01
6	2.180528e-03	3.503377e-07	2.677990e-01
7	6.993240e-03	-9.168471e-08	3.555292e-02
8	3.878006e-01	-3.444711e-11	6.384556e-03
9	7.838907e-03	1.011358e-08	-8.930101e-11

100 rows × 3 columns

Variance Component Plot¶

In [14]:

import pylab as pl
tot_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values + df['Heterogeneity-GxC var'].values
nohet_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values
pl.fill_between(split.wnd_pos[:n_wnds,1], 0, df['Persistent Var'].values, color='gray')
pl.fill_between(split.wnd_pos[:n_wnds,1], df['Persistent Var'].values, nohet_var, color='Orange')
pl.fill_between(split.wnd_pos[:n_wnds,1], nohet_var, tot_var, color='Gold')

Out[14]:

<matplotlib.collections.PolyCollection at 0x115777490>

P-values¶

Empirical P values are obtained from a relatively small number of genome-wide permutations by pooling across all conisdered steps.

In [15]:

from limix.mtSet.core.iset_utils import calc_emp_pv_eff
#calculate P values for the three tests
for test in ['mtSet', 'iSet', 'iSet-het']:
    df[test+' pv'] = calc_emp_pv_eff(df[test+' LLR'].values, df0[test+' LLR0'].values)

In [16]:

#makes a manhattan plot
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(df['mtSet pv'].values), 'o', color='DarkGreen')
pl.plot(wnd_start, -sp.log10(df['iSet pv'].values), 'o', color='DarkRed')
pl.plot(wnd_start, -sp.log10(df['iSet-het pv'].values), 'o', color='Gold')

Out[16]:

[<matplotlib.lines.Line2D at 0x1163e2450>]

Stratified design¶

iSet is an extension of mtSet that allows to test for polygenic interactions with environment or other contexts. iSet can be applied in designs where all individuals have been phenotyped in each context (complete design) as well as for populations that have been stratified by a context variable (stratified design). We will here see application of iSet for analysis of stratified designs.

Setting up¶

In [18]:

# activiate inline plotting
%matplotlib inline

from download_examples import get_1000G_mtSet
import scipy as sp
import scipy.linalg
import limix
from limix.iSet.iset import fit_iSet
import pandas as pd

In [19]:

# loading 1000G genotypes for mtSet demo
get_1000G_mtSet()

In [20]:

# base name for bed, bim and fam
bfile = './data/1000g/chrom22_subsample20_maf0.10'

Split genotypes into regions¶

In [21]:

from limix.mtSet.core import plink_reader

In [22]:

# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)

In [23]:

# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)

The method splitGeno allows to define the regions that will then considered for the analysis with iSet. Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).

Argument	Default	Datatype	Explanation
method	'slidingWindow'	str	Uses a sliding window approach to define regions (a region-based approach will be availabe soon)
size	5E+04 (50kb)	float	Window size. Pace is set at half the size of the window
minSnps	1	int	Windows with number of SNPs lower that this threshold are not considered
maxSnps	sp.inf	int	Windows with number of SNPs higher that this threshold are not considered
cache	False	bool	If true, it activates the caching
out_dir	'./cache'	str	outdir of the cache file
fname	None	str	Name of the file
rewrite	False	bool	If true and the cache file already exists, the cache file is overwritten

In [24]:

split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows

1380 windows

Apply¶

In [25]:

# import phenotype and sample relatedness
pheno_file = './data/1000g/pheno.phe'
sample_relatedness_file = './data/1000g/chrom22.cov'
Y = sp.loadtxt(pheno_file)[:,:1]
R = sp.loadtxt(sample_relatedness_file)
print(Y.shape)

(274, 1)

In [26]:

# let's suppose the first half of the individuals are phenotyped in context A and
# the second half on context B
Ie = sp.arange(R.shape[0])<0.5*R.shape[0]

In [27]:

# corrects for population structure using the first 10 PCs of the relatedness matrix
S_R, U_R = scipy.linalg.eigh(R+1e-4*sp.eye(R.shape[0])) # these are needed for the full mtSet model
F = sp.concatenate([U_R[:,-10:], sp.ones([U_R.shape[0], 1])], 1)

In [28]:

# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))

In [29]:

n_wnds = 10 # only 10 windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
df = pd.DataFrame()
df0 = pd.DataFrame()
import time

for wnd_i in range(n_wnds):
    t0 = time.time()
    wnd_pos = split.wnd_pos[wnd_i]
    nSnps = split.nSnps[wnd_i]
    idx_wnd_start = split.idx_wnd_start[wnd_i]
    print '.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps)

    Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
    xr = sp.dot(sp.rand(Xr.shape[0]), Xr)
    idxs_u = sp.sort(sp.unique(xr, return_index=True)[1])
    Xr = Xr[:,idxs_u]
    Xr-= Xr.mean(0)
    Xr/= Xr.std(0)
    Xr/= sp.sqrt(Xr.shape[1])

    _df, _df0 = fit_iSet(Y, F=F, Xr=Xr, Ie=Ie, n_nulls=10)
    df  = df.append(_df)
    df0 = df0.append(_df0)
    print 'Elapsed:', time.time()-t0

.. window 0 - (22, 16025000-16075000) - 21 snps
Elapsed: 5.46386289597
.. window 1 - (22, 16050000-16100000) - 23 snps
Elapsed: 5.34179997444
.. window 2 - (22, 16125000-16175000) - 7 snps
Elapsed: 4.30499100685
.. window 3 - (22, 16225000-16275000) - 9 snps
Elapsed: 5.6210091114
.. window 4 - (22, 16250000-16300000) - 16 snps
Elapsed: 5.8319299221
.. window 5 - (22, 16275000-16325000) - 12 snps
Elapsed: 5.72252416611
.. window 6 - (22, 16325000-16375000) - 5 snps
Elapsed: 5.94124102592
.. window 7 - (22, 16350000-16400000) - 5 snps
Elapsed: 4.80616307259
.. window 8 - (22, 16475000-16525000) - 8 snps
Elapsed: 4.15918207169
.. window 9 - (22, 16500000-16550000) - 7 snps
Elapsed: 4.45016479492

The dataframe df contains log likelihood ratio scores for the different models, variance component attributable to persistent rescaling-GxC and heterogeneity-GxC and information about convergence.

In [30]:

df

Out[30]:

Heterogeneity-GxC var	Persistent Var	Rescaling-GxC Var	iSet LLR	iSet-het LLR	mtSet LLR
-0.025807	0.059489	1.620445e-02	2.536120e-01	1.431980e-01	1.253018e+00
-0.017351	0.051698	8.162345e-03	1.151319e-01	9.235339e-02	1.046717e+00
-0.000100	0.000100	4.437414e-11	1.728208e-07	-4.211657e-09	-4.237307e-09
-0.017640	0.002922	3.255356e-02	5.018203e-01	1.145868e-08	5.018202e-01
-0.003076	0.001255	4.877466e-03	2.303760e-02	-1.327507e-09	2.301929e-02
-0.000100	0.000100	7.126069e-14	7.600009e-07	2.849561e-10	-2.643219e-12
-0.013473	0.013947	1.353159e-02	3.469951e-01	-2.905769e-04	3.541518e-01
-0.000100	0.000100	1.501852e-07	4.185099e-07	-6.807419e-07	-6.841037e-07
-0.001059	0.050339	7.098318e-04	1.430663e+00	5.423559e-01	2.143354e+00
-0.014258	0.025472	1.755192e-02	9.496528e-01	5.507864e-02	9.827788e-01

The dataframe df0 contains log likelihood ratios when data are from the null. These are necessary to iSet to calculate P values.

In [31]:

df0

Out[31]:

	iSet LLR0	iSet-het LLR0	mtSet LLR0
0	8.701465e-01	4.699373e-09	7.641269e-01
1	1.326539e+00	1.775959e-08	7.268617e-01
2	1.682668e-01	4.048806e-08	4.109243e-01
3	6.993162e-01	3.979626e-02	6.094640e-02
4	4.454416e-01	9.676313e-09	1.360218e+00
5	3.325469e-01	2.781690e-08	-2.262937e-10
6	3.882787e-03	3.794824e-08	-1.680718e-09
7	1.536725e+00	4.458499e-08	-1.468123e-09
8	4.738676e-02	1.543978e-08	-2.570744e-11
9	1.615154e+00	4.746059e-08	1.436369e+00
0	2.896440e+00	-3.513286e-07	7.101769e-01
1	5.469898e-01	1.827495e-08	-2.736243e-09
2	6.700332e-02	1.055858e-01	-1.757883e-11
3	9.861550e-02	3.519875e-08	-1.529388e-07
4	6.305888e-01	3.242017e-08	4.032068e-01
5	1.184665e+00	-7.830135e-08	9.479611e-01
6	7.165629e-01	1.139197e-02	-9.854688e-09
7	5.446231e-02	2.438270e-09	1.109642e-01
8	3.084690e-01	5.544223e-08	-3.404310e-09
9	8.050786e-02	1.705288e-09	1.399125e-02
0	6.840853e-04	5.868465e-01	3.604761e-01
1	9.845385e-08	1.434927e-09	1.186774e-02
2	1.190925e-03	-2.374596e-07	-5.115908e-13
3	2.072626e+00	1.207957e-08	1.874106e-03
4	2.848547e-01	-2.602064e-09	6.681083e-01
5	5.402823e-01	-2.528708e-09	-6.086225e-10
6	7.867916e-01	3.331607e-09	8.866508e-02
7	3.144970e-02	4.968342e-09	-6.268286e-08
8	9.240636e-06	6.960661e-09	-3.770140e-11
9	1.408660e-06	8.478040e-09	5.024917e-01
...	...	...	...
0	3.618376e-01	1.625580e-10	-9.757457e-10
1	1.296286e-02	-9.717652e-09	-3.811223e-09
2	2.637526e-01	-8.319745e-10	1.760752e-01
3	5.198513e-01	3.027425e-08	8.526513e-14
4	2.249842e-01	-1.295462e-09	-2.532204e-09
5	3.049868e-06	-1.456584e-09	7.744523e-01
6	1.695221e-01	8.543087e-08	2.239574e-03
7	7.297701e-01	1.763445e-08	1.270094e+00
8	2.980555e-01	7.361109e-09	1.076726e-02
9	6.107584e-01	1.563791e-09	5.718380e-01
0	3.746989e-01	8.190568e-10	-2.868344e-08
1	2.557176e-04	-9.237351e-08	-3.477284e-07
2	9.510620e-01	-7.084111e-11	8.870894e-02
3	7.999111e-02	1.019976e-08	4.138307e-01
4	3.914442e-02	1.043787e-08	-2.800391e-10
5	1.115584e+00	1.265378e+00	2.756960e-04
6	6.104677e-02	5.224194e-09	1.050632e-01
7	1.526689e+00	6.251213e-10	5.871766e-02
8	4.908732e-05	-6.544099e-11	1.440395e-01
9	1.342923e-01	6.910514e-08	1.195091e+00
0	2.524510e-01	1.705915e+00	7.643556e-03
1	2.279359e-01	2.201907e-08	1.626311e+00
2	2.878298e-01	1.038200e+00	1.202775e-01
3	1.121573e-06	3.969859e-09	4.472819e-01
4	1.029227e-01	-6.920686e-10	-2.700062e-13
5	1.121706e-05	1.542335e-07	6.712342e-01
6	1.083405e-01	3.995524e-10	6.778384e-02
7	7.616543e-01	1.344246e-08	7.651055e-01
8	2.134231e-02	9.555620e-09	3.038097e-01
9	3.509122e-02	3.415337e-09	1.191820e+00

100 rows × 3 columns

Variance Component Plot¶

In [32]:

import pylab as pl
tot_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values + df['Heterogeneity-GxC var'].values
nohet_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values
pl.fill_between(split.wnd_pos[:n_wnds,1], 0, df['Persistent Var'].values, color='gray')
pl.fill_between(split.wnd_pos[:n_wnds,1], df['Persistent Var'].values, nohet_var, color='Orange')
pl.fill_between(split.wnd_pos[:n_wnds,1], nohet_var, tot_var, color='Gold')

Out[32]:

<matplotlib.collections.PolyCollection at 0x11aea0ad0>

P-values¶

Empirical P values are obtained from a relatively small number of genome-wide permutations by pooling across all considered steps.

In [33]:

from limix.mtSet.core.iset_utils import calc_emp_pv_eff
#calculate P values for the three tests
for test in ['mtSet', 'iSet', 'iSet-het']:
    df[test+' pv'] = calc_emp_pv_eff(df[test+' LLR'].values, df0[test+' LLR0'].values)

In [34]:

#makes a manhattan plot
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(df['mtSet pv'].values), 'o', color='DarkGreen')
pl.plot(wnd_start, -sp.log10(df['iSet pv'].values), 'o', color='DarkRed')
pl.plot(wnd_start, -sp.log10(df['iSet-het pv'].values), 'o', color='Gold')

Out[34]:

[<matplotlib.lines.Line2D at 0x115bbbb90>]

Developers¶

Francesco Paolo Casale (casale@ebi.ac.uk)
Danilo Horta (horta@ebi.ac.uk)
Barbara Rakitsch (rakitsch@ebi.ac.uk)
Oliver Stegle (stegle@ebi.ac.uk)