*Key ideas:* False discovery rates, simulation study, multiple testing

This notebook is a small simulation study to assess the sampling properties of different approaches to handling multiple testing, using the routines provided in the statsmodels `stats.multitest`

module. All assessments are done for the case of a one-sample Z-test for the expected value of a population, where the data are an independent and identically distributed sample from the population.

We start by importing a few modules:

In [1]:

```
import numpy as np
from statsmodels.stats.multitest import multipletests
from scipy.stats.distributions import norm
import pandas as pd
```

Next we define a class that holds several parameters that control how the test data are generated. Each parameter is given a default value.

In [2]:

```
class parameters:
# Number of simulation replications
nrep = 1000
# Sample size
n = 40
# Effect size (mean under the alternative hypothesis).
effect_size = 1
# Cluster sie
clust_size = 5
# Intraclass correlation for clusters
icc = 0.
# The threshold for calling a positive result
threshold = 0.1
# The multitest method to evaluate
method = "fdr_by"
```

Next we define a function called "simulation", where all the simulation takes place. This function takes an instance of the "parameters" class as an argument. The results of this function are:

- Expected number of calls
- FDR (False Discovery Rate) -- this is what the FDR methods are supposed to control

In [3]:

```
def simulate(p):
efdr = []
for i in range(p.nrep):
data = np.random.normal(size=(200,p.n))
# Introduce some positive dependence
data = np.kron(data, np.ones((p.clust_size,1)))
data = np.sqrt(p.icc)*data + np.sqrt(1-p.icc)*np.random.normal(size=data.shape)
# 20 tests will follow the alternative hypothesis. Place these
# tests at random positions so they are distributed among the clusters
ii = np.random.permutation(data.shape[0])
i1 = ii[:20]
i0 = ii[20:]
data[i1,:] += p.effect_size
# ix=1 is true alternative, ix=0 is true null
ix = np.zeros(data.shape[0])
ix[i1] = 1
# Carry out the one-sample Z-tests
zscores = np.sqrt(p.n) * data.mean(1) / data.std(1)
pvalues = 2*norm.cdf(-np.abs(zscores))
# Get the adjusted test results
apv = multipletests(pvalues, method=p.method)[1]
# Number of calls, empirical FDR
efdr.append([np.sum(apv < p.threshold), np.mean(ix[apv < p.threshold] == 0)])
return np.asarray(efdr)
```

First we run the simulation with independent tests (all population structure parameters are set at their default values). The results show that the approach is slightly conservative (the observed FDR is lower than the nominal FDR, which defaults to 0.1).

In [4]:

```
efdr = simulate(parameters())
rslt = pd.Series(efdr.mean(0), index=["Mean #calls", "Observed FDR"])
print rslt
```

Now we introduce some positive dependence among the tests. The FDR approaches are derived for data-generating models with no dependence between tests, but the methods often perform well even when fairly strong dependence is present.

In [5]:

```
p = parameters()
p.icc = 0.9
p.effect_size = 1
efdr = simulate(p)
rslt = pd.Series(efdr.mean(0), index=["Mean #calls", "Observed FDR"])
print rslt
```