Using the sample information from the phase I trial, we can calculate power for a phase II trial.
We wish to identify sample sizes for high dose and standard dose (in a 2:1 ratio) that will be able to detect a positive difference in the probability $p$ of a $\ge 4$ fold titer rise to H3N2 influenza. That is, we wish to be able to detect:
$$\delta = p^{(HD)} - p^{(SD)} > 0$$with 80% power. This analysis will be conducted using the proper Bayesian approach (Spiegelhalter et al. 2004). The advantage of this approach for power analysis over a frequentist power analysis is that it does not assume that the underlying probabilities are known. Failure to do so will tend to underestimate variance and result in underpowered experiments.
A sample leading to a successful trial (one in which a difference is detected) under a particular sample size $n$ is:
$$E_n = \left\{\mathbf{x}; Pr( \delta > 0 | \mathbf{x} ) \ge 0.95\right\}$$so the probability of a successful trial is:
$$\psi(n) = P_{X^{(n)}}(X^{(n)} \in E_n)$$which is the expected Bayesian power.
In our case, the power is:
$$\psi(n) = \Phi\left(\frac{\pi_{hd} - \pi_{sd}}{\sqrt{\pi_{hd}(1-\pi_{hd})/n_{hd} - \pi_{sd}(1-\pi_{sd})/n_{sd}}}\right)$$where the $\pi_i$ are the posterior probabilities, parameterized by a beta distribution with a binomial likelihood for the data.
We will use the sample data from the phase I trial (GiaQuinta et al. 2014) to construct prior distributions for $p^{(HD)}$ and $p^{(SD)}$.
%matplotlib inline
import numpy as np
import seaborn as sns
import pymc3 as pm
import theano.tensor as tt
plt = sns.plt
# Standard normal from complimentary error function
Φ = lambda x: 0.5*tt.erfc(-x/tt.sqrt(2))
def power_analysis(n):
# Allocate samples 2:1
n_hd = int(2*n/3)
n_sd = n - n_hd
with pm.Model() as model:
# probabilities of >= 4-fold-rise taken from Beta priors constructed from phase I data
π_hd = pm.Beta('π_hd', 12, 10)
π_sd = pm.Beta('π_sd', 2, 13)
# Draw binomial samples from probabilities
x_hd = pm.Binomial('x_hd', n_hd, π_hd)
x_sd = pm.Binomial('x_sd', n_sd, π_sd)
# Empirical probabilities
π_hd_post = x_hd/n_hd
π_sd_post = x_sd/n_sd
# Convert difference in probabilities to standard normal
d = pm.Deterministic('d', (π_hd_post - π_sd_post)
/ tt.sqrt(π_hd_post*(1-π_hd_post)/n_hd + π_sd_post*(1-π_sd_post)/n_sd))
# P-value
p = 1 - Φ(d)
# Successful detection
power = pm.Deterministic('power', p<0.05)
return model
Use model to estimate power for a range of sample sizes.
n=60:
iterations = 30000
keep = 20000
mod60 = power_analysis(60)
with mod60:
tr = pm.sample(iterations, step=pm.Metropolis(), init=None)
pm.df_summary(tr[-keep:], varnames=['power'])[['mean', 'mc_error']]
100%|██████████| 30000/30000 [00:07<00:00, 3802.60it/s]
mean | mc_error | |
---|---|---|
power | 0.8896 | 0.006621 |
n=48:
iterations = 30000
keep = 20000
mod48 = power_analysis(48)
with mod48:
tr = pm.sample(iterations, step=pm.Metropolis(), init=None)
pm.df_summary(tr[-keep:], varnames=['power'])[['mean', 'mc_error']]
100%|██████████| 30000/30000 [00:08<00:00, 3689.98it/s]
mean | mc_error | |
---|---|---|
power | 0.8685 | 0.0075 |
n=42:
mod42 = power_analysis(42)
with mod42:
tr = pm.sample(iterations, step=pm.Metropolis(), init=None)
pm.df_summary(tr[-keep:], varnames=['power'])[['mean', 'mc_error']]
100%|██████████| 30000/30000 [00:07<00:00, 3967.43it/s]
mean | mc_error | |
---|---|---|
power | 0.836 | 0.008768 |
n=36:
mod36 = power_analysis(36)
with mod36:
tr = pm.sample(iterations, step=pm.Metropolis(), init=None)
pm.df_summary(tr[-keep:], varnames=['power'])[['mean', 'mc_error']]
100%|██████████| 30000/30000 [00:07<00:00, 3774.80it/s]
mean | mc_error | |
---|---|---|
power | 0.8212 | 0.008525 |
n=27:
mod27 = power_analysis(27)
with mod27:
tr = pm.sample(iterations, step=pm.Metropolis(), init=None)
pm.df_summary(tr[-keep:], varnames=['power'])[['mean', 'mc_error']]
100%|██████████| 30000/30000 [00:08<00:00, 3413.36it/s]
mean | mc_error | |
---|---|---|
power | 0.74735 | 0.009091 |
The power analysis suggests that a sample size in the low- to mid-30s would be sufficient to detect a difference between high and standard dose with 80% power in a similar population. A sample size over 60 would be required for 90% power.