False Discovery Rates

Key ideas: False discovery rates, simulation study, multiple testing

This notebook is a small simulation study to assess the sampling properties of different approaches to handling multiple testing, using the routines provided in the statsmodels stats.multitest module. All assessments are done for the case of a one-sample Z-test for the expected value of a population, where the data are an independent and identically distributed sample from the population.

We start by importing a few modules:

In [1]:
import numpy as np
from statsmodels.stats.multitest import multipletests
from scipy.stats.distributions import norm
import pandas as pd

Next we define a class that holds several parameters that control how the test data are generated. Each parameter is given a default value.

In [2]:
class parameters:

    # Number of simulation replications
    nrep = 1000

    # Sample size
    n = 40

    # Effect size (mean under the alternative hypothesis).
    effect_size = 1

    # Cluster sie
    clust_size = 5
    # Intraclass correlation for clusters
    icc = 0.

    # The threshold for calling a positive result
    threshold = 0.1

    # The multitest method to evaluate
    method = "fdr_by"

Next we define a function called "simulation", where all the simulation takes place. This function takes an instance of the "parameters" class as an argument. The results of this function are:

  • Expected number of calls
  • FDR (False Discovery Rate) -- this is what the FDR methods are supposed to control
In [3]:
def simulate(p):

    efdr = []
    for i in range(p.nrep):
        data = np.random.normal(size=(200,p.n))

        # Introduce some positive dependence
        data = np.kron(data, np.ones((p.clust_size,1)))
        data = np.sqrt(p.icc)*data + np.sqrt(1-p.icc)*np.random.normal(size=data.shape)
        # 20 tests will follow the alternative hypothesis.  Place these
        # tests at random positions so they are distributed among the clusters
        ii = np.random.permutation(data.shape[0])
        i1 = ii[:20]
        i0 = ii[20:]
        data[i1,:] += p.effect_size

        # ix=1 is true alternative, ix=0 is true null
        ix = np.zeros(data.shape[0])
        ix[i1] = 1

        # Carry out the one-sample Z-tests
        zscores = np.sqrt(p.n) * data.mean(1) / data.std(1)
        pvalues = 2*norm.cdf(-np.abs(zscores))
        # Get the adjusted test results
        apv = multipletests(pvalues, method=p.method)[1]
        # Number of calls, empirical FDR
        efdr.append([np.sum(apv < p.threshold), np.mean(ix[apv < p.threshold] == 0)])
    return np.asarray(efdr)

First we run the simulation with independent tests (all population structure parameters are set at their default values). The results show that the approach is slightly conservative (the observed FDR is lower than the nominal FDR, which defaults to 0.1).

In [4]:
efdr = simulate(parameters())
rslt = pd.Series(efdr.mean(0), index=["Mean #calls", "Observed FDR"])
print rslt
Mean #calls     20.884000
Observed FDR     0.045815
dtype: float64

Now we introduce some positive dependence among the tests. The FDR approaches are derived for data-generating models with no dependence between tests, but the methods often perform well even when fairly strong dependence is present.

In [5]:
p = parameters()
p.icc = 0.9
p.effect_size = 1
efdr = simulate(p)
rslt = pd.Series(efdr.mean(0), index=["Mean #calls", "Observed FDR"])
print rslt
Mean #calls     20.861000
Observed FDR     0.041134
dtype: float64