Answer all questions and submit them either as a Jupyter notebook, LaTeX document, or Markdown document. Each question is worth 25 points.
This homework is due Friday, March 18, 2022.
import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Set seed
np.random.seed(42)
The goal of this problem is to investigate the role of the proposal distribution in a Metropolis-Hastings algorithm designed to simulate from the posterior distribution of the mixture parameter $\delta$.
In part (1), you are asked to simulate data from a distribution with $\delta$ known. For parts (2)–(4), assume $\delta$ is unknown with prior $\delta \sim Unif( 0,1)$. For parts (2)–(4), provide an appropriate plot and a table summarizing the output of the algorithm. To facilitate comparisons, use the same number of iterations, random seed, starting values, and burn-in period for all implementations of the algorithm.
Carlin (1992) considers a Bayesian approach to meta-analysis, and includes the following examples of 22 trials of beta-blockers to prevent mortality after myocardial infarction. These data are given below.
In one possible random effects model we assume the true baseline mean (on a log-odds scale) $m_i$ in a trial $i$ is drawn from some population distribution. Let $r^C_i$ denote number of events in the control group in trial $i$, and $r^T_i$ denote events under active treatment in trial $i$. Our model is:
$$\begin{aligned} r^C_i &\sim \text{Binomial}\left(p^C_i, n^C_i\right) \\ r^T_i &\sim \text{Binomial}\left(p^T_i, n^T_i\right) \\ \text{logit}\left(p^C_i\right) &= \mu_i \\ \text{logit}\left(p^T_i\right) &= \mu_i + \delta \\ \mu_i &\sim \text{Normal}(m, s). \end{aligned}$$In this case, we want to make inferences about the population effect $m$, and the predictive distribution for the effect $\delta_{\text{new}}$ in a new trial.
This particular model uses a random effect for the population mean, and a fixed effect for the treatment effect. There are 3 other models you could fit to represent all possible combinations of fixed or random effects for these two parameters.
Build all 4 models to estimate the treatment effect in PyMC and
Which model would you select and why?
r_t_obs = np.array([3, 7, 5, 102, 28, 4, 98, 60, 25, 138, 64, 45, 9, 57, 25, 33, 28, 8, 6, 32, 27, 22])
n_t_obs = np.array([38, 114, 69, 1533, 355, 59, 945, 632, 278,1916, 873, 263, 291, 858, 154, 207, 251, 151, 174, 209, 391, 680])
r_c_obs = np.array([3, 14, 11, 127, 27, 6, 152, 48, 37, 188, 52, 47, 16, 45, 31, 38, 12, 6, 3, 40, 43, 39])
n_c_obs = np.array([39, 116, 93, 1520, 365, 52, 939, 471, 282, 1921, 583, 266, 293, 883, 147, 213, 122, 154, 134, 218, 364, 674])
N = len(n_c_obs)