This is a modularized version of the rat tumor analysis, factored into reusable functions.
Grab packages
library(ggplot2)
library(plyr)        # loaded before tidyverse so dplyr verbs are not masked
library(tidyverse)
library(rstan)
library(doParallel)
registerDoParallel()                        # parallel backend for plyr's .parallel
rstan_options(auto_write=TRUE)              # cache compiled Stan models
options(mc.cores=parallel::detectCores())   # run Stan chains in parallel
Set options
options(repr.matrix.max.rows = 20)
Load the data:
rat.data = read_csv("rats.csv")
summary(rat.data)
rat.data
In this data set, for each experiment $j$, we have two observations: $c_j$, the number of infected rats, and $n_j$, the total number of rats.
We model the data as having an infection rate $\theta_j$ for each experiment. We want to make inferences about the distribution of $\theta_j$.
In more detail, we use the following hierarchical model: each count is binomial given its experiment's rate, and the rates share a common Beta distribution:
$$c_j \sim \mathrm{Binomial}(n_j, \theta_j), \qquad \theta_j \sim \mathrm{Beta}(\alpha, \beta)$$
For notational simplicity, we collect the hyperparameters as $\phi = (\alpha, \beta)$.
Therefore, we have the goal of computing $P(\phi,\mathbf{\theta}|\mathbf{c})$ given $\mathbf{c}$ and our model / likelihood function $P(\mathbf{c},\mathbf{\theta}|\phi) = \prod_j P(c_j,\theta_j|\phi) = \prod_j P(c_j|\theta_j)P(\theta_j|\phi)$.
Integrating each $\theta_j$ out of the joint density gives the marginal posterior of the hyperparameters, as per Eq. 5.8 in Gelman et al.:
$$P(\alpha, \beta|\mathbf{c}) \propto P(\alpha, \beta) \prod_j \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha + c_j)\Gamma(\beta+n_j-c_j)}{\Gamma(\alpha+\beta+n_j)}$$
We will compute this on the log scale.
So we define the log likelihood function for this model. It will take as input $\alpha$ and $\beta$ vectors (which should be the same length), and return the log likelihood of each $(\alpha, \beta)$ value given the $\mathbf{c}$ and $\mathbf{n}$ vectors:
model.loglike = function(alpha, beta, cs, ns) {
    # log of Gamma(a+b) / (Gamma(a) Gamma(b)), shared by every experiment
    base = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    # per-experiment terms, summed over experiments for each (alpha, beta) pair
    data = mapply(a=alpha, b=beta, FUN=function(a, b) {
        sum(lgamma(a + cs) +
            lgamma(b + ns - cs) -
            lgamma(a + b + ns))
    })
    # the shared factor appears once per experiment; use length(cs) rather than
    # nrow(rat.data) so the function does not depend on a global variable
    length(cs) * base + data
}
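As a quick sanity check (the hyperparameter values here are illustrative, not fitted), we can evaluate the log likelihood at a couple of points; note that it is vectorized over $\alpha$ and $\beta$:
model.loglike(c(1, 2), c(1, 14), rat.data$Infected, rat.data$Total)  # pairs (1, 1) and (2, 14)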
We're also going to use the diffuse prior from Equation 5.9:
$$P(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}$$
We will not worry about the constant, since we care about relative densities.
model.prior = function(alpha, beta) {
    (alpha + beta) ^ (-5 / 2)
}
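A quick property check (with arbitrary illustrative values): the prior depends on $\alpha$ and $\beta$ only through their sum, so any two pairs with the same sum get the same density:
model.prior(1, 3) == model.prior(2, 2)  # TRUE: both are 4^(-5/2)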
We want to be able to compute and reason about the probability density at various $\alpha$ and $\beta$ values. However, raw $\alpha \times \beta$ space is not very convenient, so we will plot instead over $x = \mathrm{log}(\frac{\alpha}{\beta})$ and $y = \mathrm{log}(\alpha + \beta)$, yielding:
$$\begin{align*} x & = \mathrm{log}(\frac{\alpha}{\beta}) \\ y & = \mathrm{log}(\alpha + \beta) \\ \alpha & = e^x \beta \\ \beta & = \frac{e^y}{e^x + 1} \end{align*}$$
par.beta = function(x, y) {
exp(y) / (exp(x) + 1)
}
par.alpha = function(x, y) {
exp(x) * par.beta(x, y)
}
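As a round-trip check (the values 2 and 14 are arbitrary), mapping $(\alpha, \beta)$ to $(x, y)$ and back should recover the original values:
x0 = log(2 / 14)
y0 = log(2 + 14)
par.alpha(x0, y0)  # should be 2
par.beta(x0, y0)   # should be 14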
Given ranges (really, sequences) of $x$ and $y$ values, we want to compute the probability density. We call this xydensity because it is the density at $(x, y)$. It returns the density at each pair $(x, y)$, taking the Cartesian product of the x and y vectors. The counts and totals parameters ($\mathbf{c}$ and $\mathbf{n}$) are also vectors, but the value for each $(x, y)$ point depends on the entire $\mathbf{c}$ and $\mathbf{n}$ vectors.
xydensity = function(x, y, counts, totals, prior=model.prior, loglike=model.loglike) {
    # Cartesian product of the x and y sequences
    expand.grid(x=x, y=y) %>%
        # map back to the natural parameters
        mutate(alpha = par.alpha(x, y),
               beta = par.beta(x, y)) %>%
        mutate(logPrior = log(prior(alpha, beta)),
               logLike = loglike(alpha, beta, counts, totals),
               rawLogPost = logPrior + logLike) %>%
        # the Jacobian of the (alpha, beta) -> (x, y) change of variables is alpha * beta
        mutate(logJacobian = log(alpha) + log(beta),
               logPost = rawLogPost + logJacobian)
}
Let's now try it:
dens.points = xydensity(x=sample(seq(-2.3, -1.3, 0.005), 100),
y=sample(seq(1, 5, 0.005), 100),
counts=rat.data$Infected, totals=rat.data$Total)
dens.points
dim(dens.points)
summary(dens.points %>% select(rawLogPost, logLike, logPost))
ggplot(dens.points) +
aes(x=x, y=y, z=logPost) +
geom_contour(binwidth=0.5) +
xlab("log(a / b)") +
ylab("log(a + b)")
With the model in place, we want to use the computed point densities to estimate posterior expectations.
expvals = function(dens) {
    # shift log posteriors by their max before exponentiating (log-sum-exp trick)
    normPost = dens$logPost - max(dens$logPost)
    # posterior expectations as weighted averages over the grid points
    alpha = sum(dens$alpha * exp(normPost)) / sum(exp(normPost))
    beta = sum(dens$beta * exp(normPost)) / sum(exp(normPost))
    x = log(alpha / beta)
    y = log(alpha + beta)
    # infection rate implied by the expected hyperparameters
    mean = alpha / (alpha + beta)
    data.frame(alpha=alpha, beta=beta, x=x, y=y, mean=mean)
}
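Subtracting the maximum log posterior before exponentiating is the standard log-sum-exp stabilization; a tiny illustration (with made-up values) of why it matters:
lp = c(-1000, -1001)   # exp(lp) would underflow to 0, losing all information
w = exp(lp - max(lp))  # stable: c(1, exp(-1))
w / sum(w)             # normalized weights, approximately c(0.73, 0.27)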
expvals(dens.points)
We want to find the maximum value of the log posterior, so that we can use it later for normalization in the integral. The optim function seems perfect for this!
model.map = optim(c(1, 1), function(pars) {
    alpha = pars[1]
    beta = pars[2]
    # unnormalized log posterior of (alpha, beta)
    log(model.prior(alpha, beta)) + model.loglike(alpha, beta, rat.data$Infected, rat.data$Total)
}, control=list(fnscale=-1))  # fnscale = -1 tells optim to maximize
model.map
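For reference, we can pull out the MAP hyperparameters and the population infection rate they imply (a small illustrative snippet using optim's standard return fields):
model.map$par                          # MAP estimate of (alpha, beta)
model.map$par[1] / sum(model.map$par)  # implied mean rate alpha / (alpha + beta)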
First, we will write a function to compute the probability density of a particular $\theta$, as an integral $P(\theta|\mathbf{c}) = \iint P(\theta|\alpha,\beta)P(\alpha,\beta|\mathbf{c}) \, \mathrm{d}\alpha \, \mathrm{d}\beta$.
This function is vectorized over theta (it will return a vector of densities, one corresponding to each value of theta). The logScale parameter is a normalizing constant used to make the log likelihoods exponentiable within the limits of floating-point accuracy; a value close to the maximum log posterior (such as the MAP value found with optim above) is a good choice.
model.theta.post = function(theta, logScale, counts, totals, prior=model.prior, loglike=model.loglike,
                            parallel=!identical(Sys.info()[["sysname"]], "Windows")) {
    aaply(theta, 1, function(thv) {
        # outer integral over beta
        integrate(function(beta) {
            # integrate() passes a vector of abscissae; evaluate each element
            aaply(beta, 1, function(bv) {
                # inner integral over alpha
                integrate(function(alpha) {
                    ll = loglike(alpha, bv, counts, totals)
                    # rescale before exponentiating to stay within floating point range
                    like = exp(ll - logScale)
                    pth = dbeta(thv, alpha, bv)
                    pth * prior(alpha, bv) * like
                }, 0, Inf, rel.tol = 0.001)$value
            })
        }, 0, Inf, rel.tol = 0.001)$value
    }, .parallel=parallel)
}
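Note that integrate evaluates its integrand on a whole vector of abscissae at once and expects a vector back, which is why each inner integrand is wrapped in aaply. A minimal standalone illustration (the shape values 2 and 14 are arbitrary):
integrate(function(th) dbeta(th, 2, 14), 0, 1)$value  # a density integrates to 1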
What's our maximum log posterior? We'll use that as a starting point for scaling.
model.map$value
Let's compute; we do a second normalization to account for the fact that we are computing on a grid with spacing 0.005, to make the area under the curve sum to 1. This computation takes a little while.
theta.post = data.frame(Theta=seq(0.01, 0.99, 0.005)) %>%
mutate(RawPostDens = model.theta.post(Theta, model.map$value + 1,
rat.data$Infected, rat.data$Total)) %>%
mutate(PostDens = RawPostDens / sum(RawPostDens * 0.005))
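As a sanity check on the normalization, the discrete approximation of the area under the curve should now be 1:
sum(theta.post$PostDens * 0.005)  # should be 1 (exactly, by construction)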
ggplot(theta.post) +
aes(x=Theta, y=PostDens) +
geom_line()
Now we are going to use Stan to conduct an MCMC simulation of the posterior distribution.
writeLines(readLines("ratmodel.stan"))
rat_fit = stan(file="ratmodel.stan", data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
iter=2000, chains=4, control=list(adapt_delta=0.99))
print(rat_fit)
pairs(rat_fit, pars=c("alpha", "beta", "lp__"))
rat_sim = rstan::extract(rat_fit, permuted=TRUE)
n_sims = length(rat_sim$lp__)
n_sims
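With the simulations in hand, we can check the posterior means of the hyperparameters against the grid-based estimates from expvals above (an illustrative comparison):
mean(rat_sim$alpha)  # compare with expvals(dens.points)$alpha
mean(rat_sim$beta)   # compare with expvals(dens.points)$beta
mean(rat_sim$alpha / (rat_sim$alpha + rat_sim$beta))  # posterior mean infection rate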
theta_sims = tibble(alpha=rat_sim$alpha, beta=rat_sim$beta) %>%
    mutate(Theta=rbeta(n(), alpha, beta))
summary(theta_sims)
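Each simulated Theta is a draw of the infection rate for a hypothetical new experiment, so quantiles of these draws give a posterior interval for a new experiment's rate (an illustrative summary):
quantile(theta_sims$Theta, c(0.025, 0.5, 0.975))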
theta_dens = density(theta_sims$Theta)
ggplot(tibble(x=theta_dens$x, y=theta_dens$y)) +
aes(x, y) +
geom_line()
writeLines(readLines("ratxmodel.stan"))
ratx_fit = stan(file="ratxmodel.stan",
data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
iter=2000, chains=4, control=list(adapt_delta=0.99))
print(ratx_fit)
ratx_sim = rstan::extract(ratx_fit, permuted=TRUE)
n_sims = length(ratx_sim$lp__)
n_sims
thetax_sims = tibble(alpha=ratx_sim$alpha, beta=ratx_sim$beta) %>%
    mutate(Theta=rbeta(n(), alpha, beta))
summary(thetax_sims)
thetax_dens = density(thetax_sims$Theta)
ggplot(tibble(x=thetax_dens$x, y=thetax_dens$y)) +
aes(x, y) +
geom_line()
posteriors = bind_rows(Integrated=select(theta.post, Theta, Density=PostDens),
                       MCMC=tibble(Theta=theta_dens$x, Density=theta_dens$y),
                       MCMC_RP=tibble(Theta=thetax_dens$x, Density=thetax_dens$y),
                       .id="Method")
ggplot(posteriors) +
aes(x=Theta, y=Density, color=Method) +
geom_line()