Modular version of rat tumors code

This is a modularized version of the rat tumor analysis, factored into reusable functions.

Setup

Grab packages

In [1]:
library(ggplot2)
library(plyr)
library(tidyverse)
library(rstan)
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
arrange():   dplyr, plyr
compact():   purrr, plyr
count():     dplyr, plyr
failwith():  dplyr, plyr
filter():    dplyr, stats
id():        dplyr, plyr
lag():       dplyr, stats
mutate():    dplyr, plyr
rename():    dplyr, plyr
summarise(): dplyr, plyr
summarize(): dplyr, plyr
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

Attaching package: ‘rstan’

The following object is masked from ‘package:tidyr’:

    extract

In [2]:
library(doParallel)
registerDoParallel()
Loading required package: foreach

Attaching package: ‘foreach’

The following objects are masked from ‘package:purrr’:

    accumulate, when

Loading required package: iterators
Loading required package: parallel
In [3]:
rstan_options(auto_write=TRUE)
options(mc.cores=parallel::detectCores())

Set options

In [4]:
options(repr.matrix.max.rows = 20)

Load the data:

In [5]:
rat.data = read_csv("rats.csv")
summary(rat.data)
Parsed with column specification:
cols(
  Infected = col_integer(),
  Total = col_integer()
)
    Infected          Total      
 Min.   : 0.000   Min.   :10.00  
 1st Qu.: 1.000   1st Qu.:19.00  
 Median : 3.000   Median :20.00  
 Mean   : 3.761   Mean   :24.49  
 3rd Qu.: 5.000   3rd Qu.:22.50  
 Max.   :16.000   Max.   :52.00  
In [6]:
rat.data
Infected  Total
       0     20
       0     20
       0     20
       0     20
       0     20
       0     20
       0     20
       0     19
       0     19
       0     19
       ⋮      ⋮
       5     19
       6     22
       6     20
       6     20
       6     20
      16     52
      15     46
      15     47
       9     24
       4     14

The Model

In this data set, for each experiment $j$, we have two observations: $c_j$, the number of infected rats, and $n_j$, the number of rats.

We model the data as having an infection rate $\theta_j$ for each experiment. We want to make inferences about the distribution of $\theta_j$.

In more detail, we use the following model:

  • $c_j \sim \mathrm{Bin}(n_j,\theta_j)$
  • $\theta_j \sim \mathrm{Beta}(\alpha, \beta)$

For notational simplicity, we collect the hyperparameters into $\phi = (\alpha, \beta)$.

Our goal, then, is to compute $P(\phi,\mathbf{\theta}|\mathbf{c})$ given $\mathbf{c}$ and our model / likelihood function $P(\mathbf{c},\mathbf{\theta}|\phi) = \prod_j P(c_j,\theta_j|\phi) = \prod_j P(c_j|\theta_j)P(\theta_j|\phi)$.

The final derivation, as per Eq. 5.8 in Gelman et al., is

$$P(\alpha, \beta|\mathbf{c}) \propto P(\alpha, \beta) \prod_j \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha + c_j)\Gamma(\beta+n_j-c_j)}{\Gamma(\alpha+\beta+n_j)}$$

We will compute this on the log scale, since the raw densities underflow double-precision floats.
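Concretely, taking logs of the expression above (with $J$ the number of experiments) gives the quantity the code below evaluates:

$$\log P(\alpha,\beta|\mathbf{c}) = \log P(\alpha,\beta) + J\bigl[\log\Gamma(\alpha+\beta) - \log\Gamma(\alpha) - \log\Gamma(\beta)\bigr] + \sum_j \bigl[\log\Gamma(\alpha+c_j) + \log\Gamma(\beta+n_j-c_j) - \log\Gamma(\alpha+\beta+n_j)\bigr] + \mathrm{const.}$$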

So we define the log likelihood function for this model. It will take as input $\alpha$ and $\beta$ vectors (which should be the same length), and return the log likelihood of each $(\alpha, \beta)$ value given the $\mathbf{c}$ and $\mathbf{n}$ vectors:

In [7]:
model.loglike = function(alpha, beta, cs, ns) {
    # log of the Gamma(a+b) / (Gamma(a) Gamma(b)) factor shared by every experiment
    base = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    # per-(alpha, beta) sum of the data-dependent terms over all experiments
    data = mapply(a=alpha, b=beta, FUN=function(a, b) {
        sum(lgamma(a + cs) + 
                lgamma(b + ns - cs) - 
                lgamma(a + b + ns))
    })
    # the shared factor appears once per experiment
    length(cs) * base + data
}
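As a quick smoke test of the vectorization (exact values depend on the data, so treat any output as illustrative):

# one log likelihood per (alpha, beta) pair
model.loglike(alpha=c(1, 2), beta=c(10, 20),
              cs=rat.data$Infected, ns=rat.data$Total)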

We're also going to use the diffuse prior from Equation 5.9:

$$P(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}$$

We will not worry about the constant, since we care about relative densities.

In [8]:
model.prior = function(alpha, beta) {
    (alpha + beta) ^ (-5 / 2)
}

Densities in Transformed Space

We want to be able to compute and reason about the probability density at various $\alpha$ and $\beta$ values. However, raw $\alpha \times \beta$ space is not very convenient, so we will plot instead over $x = \mathrm{log}(\frac{\alpha}{\beta})$ and $y = \mathrm{log}(\alpha + \beta)$, yielding:

$$\begin{align*} x & = \mathrm{log}(\frac{\alpha}{\beta}) \\ y & = \mathrm{log}(\alpha + \beta) \\ \alpha & = e^x \beta \\ \beta & = \frac{e^y}{e^x + 1} \end{align*}$$

In [9]:
# invert the (x, y) reparameterization back to (alpha, beta)
par.beta = function(x, y) {
    exp(y) / (exp(x) + 1)
}
par.alpha = function(x, y) {
    exp(x) * par.beta(x, y)
}
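A quick round-trip check of these inverses (the point $(-1.8, 2.8)$ is arbitrary):

# map an (x, y) point to (alpha, beta) and back; this should print (-1.8, 2.8)
a = par.alpha(-1.8, 2.8)
b = par.beta(-1.8, 2.8)
c(x = log(a / b), y = log(a + b))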

Given ranges (really, sequences) of $x$ and $y$ values, we want to compute the probability density. We call this xydensity because it is the density at $(x, y)$. It returns the density at each pair $(x, y)$, taking the Cartesian product of the x and y vectors. The counts and totals parameters ($\mathbf{c}$ and $\mathbf{n}$) are also vectors, but the value at each $(x,y)$ point depends on the entire $\mathbf{c}$ and $\mathbf{n}$ vectors.

In [10]:
xydensity = function(x, y, counts, totals, prior=model.prior, loglike=model.loglike) {
    expand.grid(x=x, y=y) %>%
        mutate(alpha = par.alpha(x, y),
               beta = par.beta(x, y)) %>%
        mutate(logPrior = log(prior(alpha, beta)),
               logLike = loglike(alpha, beta, counts, totals),
               rawLogPost = logPrior + logLike) %>%
        # change of variables to (x, y): the Jacobian is alpha * beta
        mutate(logJacobian = log(alpha) + log(beta),
               logPost = rawLogPost + logJacobian)
}
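The logJacobian term handles the change of variables: differentiating $\alpha = \frac{e^{x+y}}{e^x+1}$ and $\beta = \frac{e^y}{e^x+1}$ gives

$$\left|\frac{\partial(\alpha, \beta)}{\partial(x, y)}\right| = \alpha\beta$$

so a density over $(\alpha, \beta)$ becomes a density over $(x, y)$ by adding $\log\alpha + \log\beta$ on the log scale.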

Let's now try it:

In [11]:
dens.points = xydensity(x=sample(seq(-2.3, -1.3, 0.005), 100),
                        y=sample(seq(1, 5, 0.005), 100),
                        counts=rat.data$Infected, totals=rat.data$Total)
dens.points
     x      y     alpha     beta logPrior   logLike rawLogPost logJacobian   logPost
-1.405  4.280 14.233092 58.00735 -10.7000 -747.8008  -758.5008    6.716139 -751.7847
-1.905  4.280  9.358288 62.88215 -10.7000 -737.5867  -748.2867    6.377525 -741.9092
-1.395  4.280 14.347727 57.89271 -10.7000 -748.4699  -759.1699    6.722183 -752.4477
-2.255  4.280  6.856994 65.38345 -10.7000 -751.5870  -762.2870    6.105538 -756.1814
-1.300  4.280 15.471375 56.76906 -10.7000 -755.8795  -766.5795    6.777983 -759.8015
-1.995  4.280  8.649268 63.59117 -10.7000 -739.8336  -750.5336    6.309949 -744.2236
-1.335  4.280 15.050104 57.19034 -10.7000 -752.9243  -763.6243    6.757770 -756.8666
-1.380  4.280 14.520979 57.71946 -10.7000 -749.5123  -760.2123    6.731189 -753.4812
-1.780  4.280 10.424522 61.81592 -10.7000 -736.2991  -746.9991    6.468322 -740.5308
-1.600  4.280 12.135066 60.10537 -10.7000 -738.6894  -749.3894    6.592199 -742.7972
     ⋮      ⋮         ⋮        ⋮        ⋮         ⋮          ⋮           ⋮         ⋮
-1.910  3.645  4.937745 33.34503  -9.1125 -733.6095  -742.7220    5.103817 -737.6182
-1.370  3.645  7.756850 30.52592  -9.1125 -745.6215  -754.7340    5.467153 -749.2668
-2.150  3.645  3.994089 34.28868  -9.1125 -739.7899  -748.9024    4.919631 -743.9828
-1.390  3.645  7.633882 30.64889  -9.1125 -744.3755  -753.4880    5.455193 -748.0328
-1.970  3.645  4.685386 33.59739  -9.1125 -734.6180  -743.7305    5.058897 -738.6716
-1.670  3.645  6.064917 32.21786  -9.1125 -733.8212  -742.9337    5.275042 -737.6586
-2.230  3.645  3.716822 34.56595  -9.1125 -743.0010  -752.1135    4.855738 -747.2578
-2.020  3.645  4.483640 33.79913  -9.1125 -735.7450  -744.8575    5.020870 -739.8366
-1.630  3.645  6.271881 32.01089  -9.1125 -734.5936  -743.7061    5.302152 -738.4040
-1.875  3.645  5.090240 33.19253  -9.1125 -733.2038  -742.3163    5.129650 -737.1867
In [12]:
dim(dens.points)
10000  9
In [13]:
summary(dens.points %>% select(rawLogPost, logLike, logPost))
   rawLogPost        logLike          logPost      
 Min.   :-776.0   Min.   :-763.5   Min.   :-768.5  
 1st Qu.:-751.3   1st Qu.:-743.9   1st Qu.:-747.3  
 Median :-746.0   Median :-739.2   Median :-742.8  
 Mean   :-747.4   Mean   :-740.2   Mean   :-743.8  
 3rd Qu.:-742.0   3rd Qu.:-735.6   3rd Qu.:-739.1  
 Max.   :-737.0   Max.   :-730.3   Max.   :-733.8  
In [14]:
ggplot(dens.points) +
    aes(x=x, y=y, z=logPost) +
    geom_contour(binwidth=0.5) +
    xlab("log(a / b)") +
    ylab("log(a + b)")

Inference for Parameter Values

With the model in place, we want to use the computed point densities to do some estimation of the posterior distributions.

In [15]:
expvals = function(dens) {
    # shift so the largest log posterior is 0, keeping exp() within
    # floating-point range; the shift cancels in the weighted averages
    normPost = dens$logPost - max(dens$logPost)
    # posterior expectations as weighted averages over the grid points
    alpha = sum(dens$alpha * exp(normPost)) / sum(exp(normPost))
    beta = sum(dens$beta * exp(normPost)) / sum(exp(normPost))
    x = log(alpha / beta)
    y = log(alpha + beta)
    mean = alpha / (alpha + beta)
    data.frame(alpha=alpha, beta=beta, x=x, y=y, mean=mean)
}
In [16]:
expvals(dens.points)
   alpha     beta         x        y     mean
2.406424 14.37326 -1.787228 2.820169 0.143413

MAP estimate

We want to find the maximum value of the log posterior, so that we can use it later for normalization in the integral. The optim function seems perfect for this!

In [17]:
model.map = optim(c(1, 1), function(pars) {
    alpha = pars[1]
    beta = pars[2]
    log(model.prior(alpha, beta)) + model.loglike(alpha, beta, rat.data$Infected, rat.data$Total)
}, control=list(fnscale=-1))  # fnscale = -1 makes optim maximize instead of minimize
model.map
$par
[1]  1.82629 10.84766

$value
[1] -737.0054

$counts
function gradient 
      71       NA 

$convergence
[1] 0

$message
NULL
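For comparison with the contour plot above, we can express the MAP point in the transformed coordinates (a small illustrative check, not part of the original analysis):

# express the MAP (alpha, beta) in the (x, y) plotting coordinates
c(x = log(model.map$par[1] / model.map$par[2]),
  y = log(sum(model.map$par)))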

Integral Inference for the Posterior

First, we will write a function to compute the probability density of a particular $\theta$, as an integral $P(\theta|\mathbf{c}) = \iint P(\theta|\alpha,\beta)P(\alpha,\beta|\mathbf{c}) \,\mathrm{d}\alpha\,\mathrm{d}\beta$.

This function is vectorized over theta (it will return a vector of densities, one corresponding to each value of theta). The logScale parameter is a normalizing constant used to make the log likelihoods exponentiable within the limits of floating-point accuracy; a value near the maximum log posterior (such as the MAP value computed above) is a good choice.

In [18]:
model.theta.post = function(theta, logScale, counts, totals, prior=model.prior, loglike=model.loglike,
                            parallel=!identical(Sys.info()[["sysname"]], "Windows")) {
    aaply(theta, 1, function(thv) {
        # outer integral over beta
        integrate(function(beta) {
            aaply(beta, 1, function(bv) {
                # inner integral over alpha
                integrate(function(alpha) {
                    ll = loglike(alpha, bv, counts, totals)
                    like = exp(ll - logScale)  # rescale so exp() stays in range
                    pth = dbeta(thv, alpha, bv)
                    pth * prior(alpha, bv) * like
                }, 0, Inf, rel.tol = 0.001)$value
            })
        }, 0, Inf, rel.tol = 0.001)$value
    }, .parallel=parallel)
}
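To evaluate the posterior density at a single point (illustrative only; every theta value costs a double numerical integral, so this is slow):

# posterior density of theta = 0.15, using the MAP value for scaling
model.theta.post(0.15, model.map$value + 1, rat.data$Infected, rat.data$Total)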

What's our maximum log posterior? We'll use that as a starting point for scaling.

In [19]:
model.map$value
-737.005356663874

Let's compute; we apply a second normalization to account for the fact that we are evaluating on a grid with spacing 0.005, so that the area under the curve is approximately 1. This computation takes a little while.

In [20]:
theta.post = data.frame(Theta=seq(0.01, 0.99, 0.005)) %>%
    mutate(RawPostDens = model.theta.post(Theta, model.map$value + 1,
                                          rat.data$Infected, rat.data$Total)) %>%
    mutate(PostDens = RawPostDens / sum(RawPostDens * 0.005))
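As a sanity check, the Riemann sum that approximates the area under the normalized density should come out at essentially 1:

# discretized total probability mass; should be very close to 1
sum(theta.post$PostDens * 0.005)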
In [21]:
ggplot(theta.post) +
    aes(x=Theta, y=PostDens) +
    geom_line()

Monte Carlo Simulation of the Posterior

Now we are going to use Stan to run an MCMC simulation of the posterior distribution. Note that this first model places no explicit priors on $\alpha$ and $\beta$; Stan therefore gives them implicit flat (improper) priors on $(0, \infty)$, and we will see shortly that this causes sampling problems.

In [18]:
writeLines(readLines("ratmodel.stan"))
data {
    int<lower=0> J;
    int<lower=0> n[J];
    int<lower=0> y[J];
}
parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    real<lower=0,upper=1> theta[J];
}
model {
    theta ~ beta(alpha, beta);
    y ~ binomial(n, theta);
}
In [20]:
rat_fit = stan(file="ratmodel.stan", data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
               iter=2000, chains=4, control=list(adapt_delta=0.99))
Warning message:
“There were 76 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
Warning message:
“There were 30 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded”
Warning message:
“There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low”
Warning message:
“Examine the pairs() plot to diagnose sampling problems”
In [21]:
print(rat_fit)
Inference for Stan model: ratmodel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean  se_mean       sd    2.5%     25%     50%     75%    97.5%
alpha      2098.59  2383.60  4957.45    1.65    2.61    3.67   27.73 17130.45
beta      10124.90 11645.87 23651.40    9.75   15.84   22.16  173.69 81715.78
theta[1]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[2]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[3]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[4]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[5]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[6]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[7]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[8]      0.10     0.03     0.05    0.02    0.06    0.09    0.15     0.18
theta[9]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[10]     0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[11]     0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[12]     0.10     0.03     0.05    0.02    0.05    0.09    0.15     0.18
theta[13]     0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18
theta[14]     0.10     0.03     0.05    0.02    0.06    0.09    0.15     0.18
theta[15]     0.12     0.02     0.05    0.03    0.08    0.11    0.16     0.19
theta[16]     0.12     0.02     0.05    0.03    0.07    0.11    0.16     0.20
theta[17]     0.12     0.02     0.05    0.03    0.07    0.12    0.16     0.20
theta[18]     0.11     0.02     0.05    0.03    0.07    0.11    0.16     0.20
theta[19]     0.12     0.02     0.05    0.03    0.08    0.12    0.16     0.20
theta[20]     0.12     0.02     0.05    0.03    0.08    0.12    0.16     0.20
theta[21]     0.12     0.02     0.05    0.03    0.08    0.12    0.16     0.20
theta[22]     0.12     0.02     0.05    0.03    0.08    0.12    0.16     0.21
theta[23]     0.13     0.01     0.04    0.05    0.10    0.14    0.17     0.22
theta[24]     0.12     0.02     0.05    0.04    0.09    0.12    0.16     0.20
theta[25]     0.13     0.02     0.05    0.04    0.09    0.13    0.16     0.21
theta[26]     0.13     0.02     0.05    0.04    0.09    0.13    0.17     0.21
theta[27]     0.13     0.01     0.05    0.04    0.10    0.14    0.17     0.23
theta[28]     0.13     0.01     0.05    0.04    0.09    0.14    0.17     0.23
theta[29]     0.13     0.01     0.05    0.04    0.09    0.14    0.17     0.23
theta[30]     0.13     0.00     0.05    0.04    0.10    0.14    0.17     0.23
theta[31]     0.13     0.01     0.05    0.05    0.10    0.14    0.17     0.22
theta[32]     0.13     0.01     0.05    0.05    0.10    0.14    0.17     0.22
theta[33]     0.14     0.00     0.05    0.04    0.10    0.14    0.17     0.25
theta[34]     0.13     0.02     0.04    0.05    0.10    0.13    0.16     0.19
theta[35]     0.14     0.00     0.05    0.05    0.10    0.14    0.17     0.24
theta[36]     0.13     0.02     0.04    0.06    0.10    0.13    0.16     0.20
theta[37]     0.14     0.00     0.05    0.05    0.11    0.15    0.17     0.24
theta[38]     0.15     0.00     0.04    0.08    0.12    0.15    0.17     0.23
theta[39]     0.15     0.00     0.04    0.08    0.12    0.15    0.17     0.23
theta[40]     0.15     0.00     0.05    0.06    0.12    0.15    0.17     0.26
theta[41]     0.15     0.00     0.05    0.06    0.12    0.15    0.17     0.25
theta[42]     0.15     0.00     0.05    0.05    0.11    0.15    0.18     0.27
theta[43]     0.17     0.00     0.04    0.10    0.15    0.17    0.19     0.27
theta[44]     0.18     0.00     0.04    0.11    0.15    0.17    0.20     0.27
theta[45]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.28
theta[46]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.29
theta[47]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.28
theta[48]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.29
theta[49]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.29
theta[50]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.29
theta[51]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.29
theta[52]     0.18     0.00     0.04    0.11    0.16    0.17    0.21     0.28
theta[53]     0.17     0.00     0.05    0.08    0.14    0.17    0.20     0.29
theta[54]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.30
theta[55]     0.17     0.00     0.05    0.08    0.14    0.17    0.19     0.30
theta[56]     0.18     0.00     0.05    0.10    0.15    0.17    0.20     0.30
theta[57]     0.20     0.01     0.05    0.12    0.17    0.18    0.23     0.30
theta[58]     0.20     0.02     0.05    0.13    0.17    0.19    0.23     0.31
theta[59]     0.19     0.00     0.06    0.09    0.15    0.17    0.21     0.32
theta[60]     0.19     0.00     0.05    0.10    0.15    0.17    0.22     0.32
theta[61]     0.19     0.00     0.06    0.11    0.16    0.18    0.22     0.33
theta[62]     0.19     0.00     0.06    0.10    0.16    0.18    0.22     0.33
theta[63]     0.20     0.00     0.06    0.10    0.16    0.18    0.23     0.34
theta[64]     0.20     0.00     0.06    0.11    0.16    0.19    0.24     0.35
theta[65]     0.20     0.00     0.06    0.11    0.16    0.18    0.24     0.36
theta[66]     0.21     0.02     0.06    0.11    0.16    0.18    0.24     0.36
theta[67]     0.23     0.03     0.06    0.15    0.18    0.23    0.28     0.35
theta[68]     0.24     0.03     0.06    0.15    0.18    0.24    0.29     0.37
theta[69]     0.24     0.03     0.06    0.15    0.18    0.24    0.28     0.37
theta[70]     0.24     0.03     0.07    0.14    0.17    0.23    0.29     0.40
theta[71]     0.19     0.00     0.06    0.09    0.15    0.17    0.22     0.34
lp__       -713.48    71.18   104.29 -790.11 -776.16 -767.13 -726.07  -487.68
          n_eff Rhat
alpha         4 4.40
beta          4 4.25
theta[1]      4 1.48
theta[2]      3 1.53
theta[3]      3 1.52
theta[4]      3 1.49
theta[5]      3 1.52
theta[6]      3 1.53
theta[7]      3 1.53
theta[8]      3 1.50
theta[9]      3 1.49
theta[10]     3 1.49
theta[11]     4 1.46
theta[12]     4 1.44
theta[13]     4 1.43
theta[14]     4 1.46
theta[15]     5 1.26
theta[16]     5 1.25
theta[17]     5 1.25
theta[18]     5 1.26
theta[19]     5 1.24
theta[20]     5 1.24
theta[21]     6 1.21
theta[22]     6 1.21
theta[23]    11 1.11
theta[24]     6 1.21
theta[25]     7 1.18
theta[26]     7 1.16
theta[27]    10 1.11
theta[28]    12 1.09
theta[29]    11 1.10
theta[30]  4000 1.10
theta[31]    11 1.11
theta[32]    11 1.11
theta[33]  4000 1.06
theta[34]     6 1.22
theta[35]  4000 1.08
theta[36]     7 1.19
theta[37]  4000 1.06
theta[38]  4000 1.05
theta[39]  4000 1.03
theta[40]  4000 1.02
theta[41]  4000 1.02
theta[42]  4000 1.02
theta[43]  4000 1.01
theta[44]  4000 1.03
theta[45]  4000 1.00
theta[46]  4000 1.00
theta[47]  4000 1.00
theta[48]  4000 1.01
theta[49]  4000 1.00
theta[50]  4000 1.00
theta[51]  4000 1.00
theta[52]  4000 1.04
theta[53]  4000 1.01
theta[54]  4000 1.01
theta[55]  4000 1.01
theta[56]  4000 1.02
theta[57]    11 1.10
theta[58]     9 1.13
theta[59]  4000 1.03
theta[60]  4000 1.03
theta[61]  4000 1.06
theta[62]  4000 1.04
theta[63]  4000 1.06
theta[64]  4000 1.09
theta[65]  4000 1.08
theta[66]    13 1.09
theta[67]     4 1.38
theta[68]     4 1.42
theta[69]     4 1.38
theta[70]     5 1.27
theta[71]  4000 1.03
lp__          2 6.70

Samples were drawn using NUTS(diag_e) at Fri Jan 19 08:24:26 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
In [22]:
pairs(rat_fit, pars=c("alpha", "beta", "lp__"))
In [23]:
rat_sim = rstan::extract(rat_fit, permuted=TRUE)
In [24]:
n_sims = length(rat_sim$lp__)
n_sims
4000
In [25]:
theta_sims = data_frame(alpha=rat_sim$alpha, beta=rat_sim$beta) %>%
    mutate(Theta=rbeta(n(), alpha, beta))
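These draws give us the posterior predictive distribution of the infection rate $\theta$ for a new experiment; for instance, a 95% interval and median (values vary slightly run to run):

# 95% posterior interval and median for a new experiment's theta
quantile(theta_sims$Theta, c(0.025, 0.5, 0.975))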
In [26]:
summary(theta_sims)
     alpha                beta              Theta         
 Min.   :    0.996   Min.   :    6.13   Min.   :0.002141  
 1st Qu.:    2.611   1st Qu.:   15.84   1st Qu.:0.102461  
 Median :    3.673   Median :   22.16   Median :0.152477  
 Mean   : 2098.586   Mean   :10124.90   Mean   :0.148635  
 3rd Qu.:   27.729   3rd Qu.:  173.69   3rd Qu.:0.176309  
 Max.   :17826.122   Max.   :86848.38   Max.   :0.509650  
In [27]:
theta_dens = density(theta_sims$Theta)
In [28]:
ggplot(data_frame(x=theta_dens$x, y=theta_dens$y)) +
    aes(x, y) +
    geom_line()

Use priors & reparameterization

The fit above did not converge: $\hat{R}$ for $\alpha$ and $\beta$ exceeds 4, their effective sample sizes are in the single digits, and their posterior means are implausibly large. To fix this, we reparameterize in terms of the prior mean $\phi = \alpha / (\alpha + \beta)$ and the prior "sample size" $\lambda = \alpha + \beta$, and put proper priors on both.

In [29]:
writeLines(readLines("ratxmodel.stan"))
data {
    int<lower=0> J;
    int<lower=0> n[J];
    int<lower=0> y[J];
}
parameters {
    real<lower=0,upper=1> phi;
    real<lower=0.1> lambda;
    real<lower=0,upper=1> theta[J];
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    alpha = lambda * phi;
    beta = lambda * (1 - phi);
}
model {
    phi ~ beta(1,1);
    lambda ~ pareto(0.1, 1.5);
    theta ~ beta(alpha, beta);
    y ~ binomial(n, theta);
}
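Note that the Pareto hyperprior on $\lambda$ decays with the same power law as the diffuse prior in Eq. 5.9: a Pareto($\lambda_0$, $a$) density is $p(\lambda) = a \lambda_0^a \lambda^{-(a+1)}$, so with $a = 1.5$,

$$p(\lambda) \propto \lambda^{-5/2}, \qquad \lambda > 0.1$$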
In [51]:
ratx_fit = stan(file="ratxmodel.stan",
                data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
                iter=2000, chains=4, control=list(adapt_delta=0.99))
hash mismatch so recompiling; make sure Stan code ends with a blank line
In [52]:
print(ratx_fit)
Inference for Stan model: ratxmodel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff
phi          0.14    0.00 0.01    0.12    0.14    0.14    0.15    0.17  3290
lambda      14.94    0.19 5.09    7.87   11.35   14.07   17.55   27.21   691
theta[1]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000
theta[2]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000
theta[3]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000
theta[4]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000
theta[5]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.17  4000
theta[6]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000
theta[7]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000
theta[8]     0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.16  4000
theta[9]     0.06    0.00 0.04    0.00    0.03    0.05    0.09    0.17  4000
theta[10]    0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.17  4000
theta[11]    0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.16  4000
theta[12]    0.06    0.00 0.04    0.01    0.03    0.06    0.09    0.16  4000
theta[13]    0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.17  4000
theta[14]    0.07    0.00 0.05    0.01    0.03    0.06    0.09    0.17  4000
theta[15]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  4000
theta[16]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  4000
theta[17]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  4000
theta[18]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.20  4000
theta[19]    0.09    0.00 0.05    0.02    0.06    0.09    0.12    0.21  4000
theta[20]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.22  4000
theta[21]    0.09    0.00 0.05    0.02    0.06    0.09    0.12    0.21  4000
theta[22]    0.09    0.00 0.05    0.02    0.06    0.09    0.12    0.21  4000
theta[23]    0.12    0.00 0.05    0.04    0.09    0.12    0.15    0.24  4000
theta[24]    0.10    0.00 0.05    0.03    0.07    0.10    0.13    0.21  4000
theta[25]    0.11    0.00 0.05    0.03    0.07    0.10    0.14    0.22  4000
theta[26]    0.11    0.00 0.05    0.03    0.07    0.10    0.14    0.22  4000
theta[27]    0.12    0.00 0.05    0.04    0.08    0.11    0.15    0.25  4000
theta[28]    0.12    0.00 0.05    0.03    0.08    0.11    0.15    0.24  4000
theta[29]    0.12    0.00 0.05    0.04    0.08    0.11    0.15    0.24  4000
theta[30]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.24  4000
theta[31]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.25  4000
theta[32]    0.12    0.00 0.05    0.03    0.08    0.11    0.15    0.25  4000
theta[33]    0.13    0.00 0.07    0.03    0.08    0.12    0.16    0.28  4000
theta[34]    0.11    0.00 0.04    0.05    0.08    0.11    0.14    0.20  4000
theta[35]    0.12    0.00 0.06    0.04    0.08    0.11    0.15    0.25  4000
theta[36]    0.12    0.00 0.04    0.05    0.09    0.11    0.14    0.21  4000
theta[37]    0.13    0.00 0.06    0.04    0.09    0.12    0.16    0.27  4000
theta[38]    0.14    0.00 0.04    0.07    0.11    0.14    0.17    0.24  4000
theta[39]    0.15    0.00 0.04    0.07    0.11    0.14    0.18    0.24  4000
theta[40]    0.15    0.00 0.06    0.05    0.10    0.14    0.18    0.28  4000
theta[41]    0.15    0.00 0.06    0.05    0.10    0.14    0.18    0.28  4000
theta[42]    0.15    0.00 0.07    0.05    0.10    0.14    0.19    0.30  4000
theta[43]    0.18    0.00 0.05    0.10    0.14    0.17    0.21    0.28  4000
theta[44]    0.19    0.00 0.05    0.10    0.15    0.18    0.22    0.30  4000
theta[45]    0.18    0.00 0.06    0.07    0.13    0.17    0.22    0.32  4000
theta[46]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  4000
theta[47]    0.18    0.00 0.06    0.07    0.13    0.17    0.22    0.32  4000
theta[48]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  4000
theta[49]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  4000
theta[50]    0.18    0.00 0.06    0.07    0.13    0.17    0.22    0.32  4000
theta[51]    0.18    0.00 0.06    0.07    0.13    0.17    0.22    0.32  4000
theta[52]    0.19    0.00 0.05    0.11    0.16    0.19    0.22    0.30  4000
theta[53]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  4000
theta[54]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  4000
theta[55]    0.18    0.00 0.07    0.07    0.14    0.18    0.22    0.32  4000
theta[56]    0.19    0.00 0.06    0.09    0.15    0.19    0.23    0.34  4000
theta[57]    0.22    0.00 0.05    0.12    0.18    0.21    0.25    0.33  4000
theta[58]    0.22    0.00 0.05    0.13    0.19    0.22    0.25    0.33  4000
theta[59]    0.21    0.00 0.07    0.09    0.16    0.20    0.25    0.36  4000
theta[60]    0.21    0.00 0.07    0.09    0.16    0.20    0.25    0.36  4000
theta[61]    0.22    0.00 0.07    0.10    0.17    0.21    0.26    0.36  4000
theta[62]    0.21    0.00 0.07    0.09    0.16    0.21    0.26    0.37  4000
theta[63]    0.22    0.00 0.07    0.11    0.17    0.22    0.27    0.37  4000
theta[64]    0.23    0.00 0.07    0.10    0.18    0.23    0.28    0.40  4000
theta[65]    0.23    0.00 0.07    0.11    0.18    0.23    0.28    0.39  4000
theta[66]    0.24    0.00 0.07    0.11    0.18    0.23    0.28    0.39  4000
theta[67]    0.27    0.00 0.05    0.17    0.23    0.27    0.31    0.38  4000
theta[68]    0.28    0.00 0.06    0.18    0.24    0.28    0.32    0.40  4000
theta[69]    0.28    0.00 0.06    0.18    0.24    0.28    0.32    0.40  4000
theta[70]    0.29    0.00 0.08    0.16    0.23    0.28    0.34    0.45  4000
theta[71]    0.21    0.00 0.08    0.09    0.16    0.21    0.26    0.38  4000
alpha        2.15    0.03 0.73    1.14    1.64    2.01    2.52    3.93   647
beta        12.79    0.17 4.39    6.68    9.70   12.01   15.04   23.39   707
lp__      -789.34    0.35 8.79 -806.87 -795.18 -789.23 -783.18 -772.64   616
          Rhat
phi       1.00
lambda    1.01
theta[1]  1.00
theta[2]  1.00
theta[3]  1.00
theta[4]  1.00
theta[5]  1.00
theta[6]  1.00
theta[7]  1.00
theta[8]  1.00
theta[9]  1.00
theta[10] 1.00
theta[11] 1.00
theta[12] 1.00
theta[13] 1.00
theta[14] 1.00
theta[15] 1.00
theta[16] 1.00
theta[17] 1.00
theta[18] 1.00
theta[19] 1.00
theta[20] 1.00
theta[21] 1.00
theta[22] 1.00
theta[23] 1.00
theta[24] 1.00
theta[25] 1.00
theta[26] 1.00
theta[27] 1.00
theta[28] 1.00
theta[29] 1.00
theta[30] 1.00
theta[31] 1.00
theta[32] 1.00
theta[33] 1.00
theta[34] 1.00
theta[35] 1.00
theta[36] 1.00
theta[37] 1.00
theta[38] 1.00
theta[39] 1.00
theta[40] 1.00
theta[41] 1.00
theta[42] 1.00
theta[43] 1.00
theta[44] 1.00
theta[45] 1.00
theta[46] 1.00
theta[47] 1.00
theta[48] 1.00
theta[49] 1.00
theta[50] 1.00
theta[51] 1.00
theta[52] 1.00
theta[53] 1.00
theta[54] 1.00
theta[55] 1.00
theta[56] 1.00
theta[57] 1.00
theta[58] 1.00
theta[59] 1.00
theta[60] 1.00
theta[61] 1.00
theta[62] 1.00
theta[63] 1.00
theta[64] 1.00
theta[65] 1.00
theta[66] 1.00
theta[67] 1.00
theta[68] 1.00
theta[69] 1.00
theta[70] 1.00
theta[71] 1.00
alpha     1.01
beta      1.01
lp__      1.01

Samples were drawn using NUTS(diag_e) at Fri Jan 19 08:44:25 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
In [36]:
ratx_sim = rstan::extract(ratx_fit, permuted=TRUE)
In [37]:
n_sims = length(ratx_sim$lp__)
n_sims
4000
In [38]:
thetax_sims = data_frame(alpha=ratx_sim$alpha, beta=ratx_sim$beta) %>%
    mutate(Theta=rbeta(n(), alpha, beta))
In [39]:
summary(thetax_sims)
     alpha             beta            Theta          
 Min.   :0.7488   Min.   : 4.179   Min.   :0.0002697  
 1st Qu.:1.6366   1st Qu.: 9.655   1st Qu.:0.0725437  
 Median :2.0138   Median :11.967   Median :0.1277092  
 Mean   :2.1246   Mean   :12.589   Mean   :0.1441277  
 3rd Qu.:2.4784   3rd Qu.:14.725   3rd Qu.:0.1963121  
 Max.   :6.5699   Max.   :38.993   Max.   :0.7375433  
In [40]:
thetax_dens = density(thetax_sims$Theta)
In [41]:
ggplot(data_frame(x=thetax_dens$x, y=thetax_dens$y)) +
    aes(x, y) +
    geom_line()

Unified Plots

In [42]:
posteriors = bind_rows(Integrated=select(theta.post, Theta, Density=PostDens),
                       MCMC=data_frame(Theta=theta_dens$x, Density=theta_dens$y),
                       MCMC_RP=data_frame(Theta=thetax_dens$x, Density=thetax_dens$y),
                       .id="Method")
Error in select(theta.post, Theta, Density = PostDens): object 'theta.post' not found

(The kernel was restarted partway through this notebook, so theta.post from the integration section is no longer defined in this session; re-running that cell first makes this cell succeed.)
In [43]:
ggplot(posteriors) +
    aes(x=Theta, y=Density, color=Method) +
    geom_line()
Error in ggplot(posteriors): object 'posteriors' not found

(Same cause: once the previous cell runs, this overlays the integrated and simulated posterior densities of $\theta$ for comparison.)