Sequence of data $X_1, X_2, \ldots$.
Two hypotheses, $H_0$ and $H_1$, each of which completely specifies the joint distribution of the data.
Assume that the joint distributions under $H_0$ and $H_1$ are absolutely continuous with respect to each other, relative to some underlying measure.
Let $f_{0m}$ be the likelihood of $H_0$ for data $(X_j)_{j=1}^m$ and let $f_{1m}$ be the likelihood of $H_1$ for data $(X_j)_{j=1}^m$.
The likelihood ratio of $H_1$ to $H_0$ is $f_{1m}/f_{0m}$; this is (loosely speaking) the probability of observing $X_1, \ldots, X_m$ if $H_1$ is true, divided by the probability of observing $X_1, \ldots, X_m$ if $H_0$ is true.
The probability of observing the data actually observed will tend to be higher for whichever hypothesis is in fact true, so this likelihood ratio will tend to be greater than $1$ if $H_1$ is true, and will tend to be less than $1$ if $H_0$ is true. The more observations we make, the more probable it is that the resulting likelihood ratio will be small if $H_0$ is true. Wald (1945) showed that if $H_0$ is true, then the probability is at most $\alpha$ that the likelihood ratio is ever greater than $1/\alpha$, no matter how many observations are made. More generally, we have:
For any $\alpha \in (0, 1)$ and $\beta \in [0, 1)$, the following sequential algorithm tests the hypothesis $H_0$ at level no larger than $\alpha$ and with power at least $1-\beta$ against the alternative $H_1$.
Set $m=0$.
Don't need to know the distribution of the likelihood ratio $\mbox{LR}=\frac{f_{1m}}{f_{0m}}$ under the null hypothesis to find the critical values for the test.
For iid data, the likelihood ratio is a product of terms. On a log scale, it's a sum. Each "trial" produces another term in the sum, positive or negative. But—unlike the classical Gambler's Ruin problem in which the game is fair—the terms are not equal in magnitude: the steps are not all the same size.
Suppose $X_1, X_2, \ldots,$ are iid $\mbox{Bernoulli}(p)$ random variables and let $p_1 > p_0$.
Set $\mbox{LR} \leftarrow 1$ and $j \leftarrow 0$.
What's $\mbox{LR}$ at stage $m$? Let $T_m \equiv \sum_{j=1}^m X_j$. $$ \frac{p_{1m}}{p_{0m}} \equiv \frac{p_1^{T_m}(1-p_1)^{m-T_m}}{p_0^{T_m}(1-p_0)^{m-T_m}}. $$ This is the ratio of binomial probability when $p = p_1$ to binomial probability when $p = p_0$ (the binomial coefficients in the numerator and denominator cancel).
Wald's SPRT for $p$ in iid Bernoulli trials
Conclude $p > p_0$ if $$ \frac{p_{1m}}{p_{0m}} \ge \frac{1-\beta}{\alpha}. $$ Conclude $p \le p_0$ if $$ \frac{p_{1m}}{p_{0m}} \le \frac{\beta}{1-\alpha}. $$ Otherwise, draw again.
The SPRT approximately minimizes the expected sample size when $p \le p_0$ or $p > p_1$. For values in $(p_1, p_0)$, it can have larger sample sizes than fixed-sample-size tests.
Let's watch the SPRT in action for iid Bernoulli trials.
# This is the first cell with code: set up the Python environment
%matplotlib inline
from __future__ import print_function, division
import matplotlib.pyplot as plt
import math
import numpy as np
import scipy as sp
import scipy.stats
from scipy.stats import binom
import pandas as pd
# For interactive widgets
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML
def plotBinomialSPRT(n, p, p0, p1, alpha, beta):
'''
Plots the progress of the SPRT for n iid Bernoulli trials with probabiity p
of success, for testing the hypothesis that p=p0 against the hypothesis p=p1
with significance level alpha and power 1-beta
'''
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True)
trials = sp.stats.binom.rvs(1, p, size=n+1) # leave room to start at 1
terms = np.ones(n+1)
sfac = p1/p0
ffac = (1.0-p1)/(1.0-p0)
terms[trials == 1.0] = sfac
terms[trials == 0.0] = ffac
terms[0] = 1.0
logterms = np.log(terms)
#
ax[0].plot(range(n+1),np.cumprod(terms), color='b')
ax[0].axhline(y=(1-beta)/alpha, xmin=0, xmax=n, color='g', label=r'$(1-\beta)/\alpha$')
ax[0].axhline(y=beta/(1-alpha), xmin=0, xmax=n, color='r', label=r'$\beta/(1-\alpha)$')
ax[0].set_title('Simulation of Wald\'s SPRT')
ax[0].set_ylabel('LR')
ax[0].legend(loc='best')
#
ax[1].plot(range(n+1),np.cumsum(logterms), color='b', linestyle='--')
ax[1].axhline(y=math.log((1-beta)/alpha), xmin=0, xmax=n, color='g', label=r'$log((1-\beta)/\alpha)$')
ax[1].axhline(y=math.log(beta/(1-alpha)), xmin=0, xmax=n, color='r', label=r'$log(\beta/(1-\alpha))$')
ax[1].set_ylabel('log(LR)')
ax[1].set_xlabel('trials')
ax[1].legend(loc='best')
plt.show()
interact(plotBinomialSPRT, n=widgets.IntSlider(min=5, max=300, step=5, value=100),\
p=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.45),\
p0=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.5),\
p1=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.6),\
alpha=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.05),\
beta=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.05)
)
<function __main__.plotBinomialSPRT>
For $p_0 < p_1$,
For $p_1 < p_0$, the directions are reversed.
The inequalities hold when $\beta = 0$ also. Then the likelihood ratio has chance at most $\alpha$ of ever being greater than $1/\alpha$, if in fact $p > p_0$.
Hence, $1/\mbox{LR}$ is the $P$-value of the hypothesis $p > p_0$. This can be used to construct one-sided confidence bounds for $p$ and other parameters. The next chapter does exactly that, to find a lower confidence bound for the mean of a nonnegative population.
As an illustration of SPRT in for obervations that are not iid, consider the following problem.
There is a population of $B$ items. Item $b$ has "weight" $N_b$ and "value" $a_b \in \{0, 1\}$. Let $N = \sum_{b=1}^B N_b$. (You might think of items as manufacturing lots, where $a_b = 1$ means the lot is acceptable and $a_b=0$ means the lot is defective.)
We want to test the hypothesis $H_0$ that $\frac{1}{N}\sum_b N_b a_b = 1/2$ against the alternative hypothesis $H_1$ that $\frac{1}{N}\sum_b N_b a_b = \gamma $, for some fixed $\gamma > 1/2$.
We will draw items sequentially, without replacement, such that the chance that item $b$ is selected in draw $i$, assuming it has not been selected already, is $N_b/\sum_{j \notin {\mathcal B_{i-1}}} N_j$, where ${\mathcal B_{i-1}}$ is the sample up to and including the $i-1$st draw, and ${\mathcal B_0} \equiv \emptyset$. That is, the chance of selecting an item is in proportion to its weight among the items that have not yet been selected.
Let $\mathbb B_i$ denote the item selected at random in such a manner in the $i$th draw.
The chance that the first draw ${\mathbb B_1}$ gives an item with value 1, i.e., $\Pr \{a_{\mathbb B_1} = 1\}$, is $\frac{1}{N}\sum_b N_b a_b$. Under $H_0$, this chance is $p_{01} = 1/2$; under $H_1$, this chance is $p_{11} = \gamma$.
Given the values of $\{a_{\mathbb B_k}\}_{k=1}^i$, the conditional probability that the $i$th draw gives an item with value 1 is
$$ \Pr \{a_{\mathbb B_i} = 1 | {\mathcal B_{i-1}} \} = \frac{ \sum_{b \notin {\mathcal B_{i-1}}} N_b a_b}{\sum_{b \notin {\mathcal B_{i-1}}} N_b}. $$Under $H_0$, this chance is
$$ p_{0i} = \frac{N/2 - \sum_{b \in {\mathcal B_{i-1}}} N_b a_b}{N - \sum_{b \in {\mathcal B_{i-1}}} N_b}. $$Under $H_1$, this chance is
$$ p_{1i} = \frac{N \gamma - \sum_{b \in {\mathcal B_{i-1}}} N_b a_b}{N - \sum_{b \in {\mathcal B_{i-1}}} N_b}. $$Let $X_k$ be the indicator of the event that the $k$th draw gives an item with value $1$, i.e., the indicator of the event $a_{\mathbb B_k} = 1$. The likelihood ratio for a given sequence $\{X_k\}_{k=1}^i$ is
$$ \mbox{LR} = \frac{\prod_{k=1}^i p_{1k}^{X_k}(1-p_{1k})^{1-X_k}} {\prod_{k=1}^i p_{0k}^{X_k}(1-p_{0k})^{1-X_k}}. $$This can be simplified: $p_{0k}$ and $p_{1k}$ have the same denominator, $N - \sum_{b \in {\mathcal B_{i-1}}} N_b$, and their numerators share a term. Define $N(k) \equiv \sum_{j = 1}^{k-1}N_b$ and $A(k) \equiv \sum_{j = 1}^{k-1}N_b a_b$. Then
$$ \mbox{LR} = \prod_{k=1}^i \left ( \frac{N\gamma - A(k)}{N/2 - A(k)} \right )^{X_k} \left ( \frac{N-N\gamma - (N(k) - A(k))}{N-N/2 - (N(k)-A(k))} \right )^{1-X_k} $$$$ = \prod_{k=1}^i \left ( \frac{N\gamma - A(k)}{N/2 - A(k)} \right )^{X_k} \left ( \frac{N(1-\gamma) - N(k) + A(k)}{N/2 - N(k) + A(k)} \right )^{1-X_k}, $$where the products are defined to be infinite if the denominator vanishes anywhere.
If $H_0$ is true, the chance that $\mbox{LR}$ is ever greater than $1/\alpha$ is at most $\alpha$.