Modular version of rat tumors code

This is a modularized version of the rat tumor analysis, factored into reusable functions.

Setup

Grab packages

In [1]:
library(ggplot2)
library(plyr)
library(tidyverse)
library(rstan)
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
arrange():   dplyr, plyr
compact():   purrr, plyr
count():     dplyr, plyr
failwith():  dplyr, plyr
filter():    dplyr, stats
id():        dplyr, plyr
lag():       dplyr, stats
mutate():    dplyr, plyr
rename():    dplyr, plyr
summarise(): dplyr, plyr
summarize(): dplyr, plyr
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

Attaching package: ‘rstan’

The following object is masked from ‘package:tidyr’:

    extract

In [2]:
library(doParallel)
registerDoParallel()
Loading required package: foreach

Attaching package: ‘foreach’

The following objects are masked from ‘package:purrr’:

    accumulate, when

Loading required package: iterators
Loading required package: parallel
In [3]:
rstan_options(auto_write=TRUE)
options(mc.cores=parallel::detectCores())

Set options

In [4]:
options(repr.matrix.max.rows = 20)

Load the data:

In [5]:
rat.data = read_csv("rats.csv")
summary(rat.data)
Parsed with column specification:
cols(
  Infected = col_integer(),
  Total = col_integer()
)
    Infected          Total      
 Min.   : 0.000   Min.   :10.00  
 1st Qu.: 1.000   1st Qu.:19.00  
 Median : 3.000   Median :20.00  
 Mean   : 3.761   Mean   :24.49  
 3rd Qu.: 5.000   3rd Qu.:22.50  
 Max.   :16.000   Max.   :52.00  
In [6]:
rat.data
InfectedTotal
0 20
0 20
0 20
0 20
0 20
0 20
0 20
0 19
0 19
0 19
519
622
620
620
620
1652
1546
1547
924
414

The Model

In this data set, for each experiment $j$, we have two observations: $c_j$, the number of infected rats, and $n_j$, the number of rats.

We model the data as having an infection rate $\theta_j$ for each experiment. We want to make inferences about the distribution of $\theta_j$.

In more detail, we use the following model:

  • $c_j \sim \mathrm{Bin}(n_j,\theta_j)$
  • $\theta_j \sim \mathrm{Beta}(\alpha, \beta)$

For the purposes of notational simplicity theorem, we parameterize our model parameters as $\phi = (\alpha, \beta)$.

Therefore, we have the goal of computing $P(\phi,\mathbf{\theta}|\mathbf{c})$ given $\mathbf{c}$ and our model / likelihood function $P(\mathbf{c},\mathbf{\theta}|\phi) = \prod_j P(c_j,\theta_j|\phi_j) = \prod_j P(c_j|\theta_j)P(\theta_j|\phi_j)$.

The final derivation, as per Eq. 5.8 in Gelman et al., is

$$P(\alpha, \beta|\mathrm{y}) \propto P(\alpha, \beta) \prod_j \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha + c_j)\Gamma(\beta+n_j-c_j)}{\Gamma(\alpha+\beta+n_j)}$$

We will compute this log-transformed.

So we define the log likelihood function for this model. It will take as input $\alpha$ and $\beta$ vectors (which should be the same length), and return the log likelihood of each $(\alpha, \beta)$ value given the $\mathbf{c}$ and $\mathbf{n}$ vectors:

In [7]:
model.loglike = function(alpha, beta, cs, ns) {
    base = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    data = mapply(a=alpha, b=beta, FUN=function(a, b) {
        sum(lgamma(a + cs) + 
                lgamma(b + ns - cs) - 
                lgamma(a + b + ns))
    })
    nrow(rat.data) * base + data
}

We're also going to use the diffuse prior from Equation 5.9:

$$P(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}$$

We will not worry about the constant, since we care about relative densities.

In [8]:
model.prior = prior = function(alpha, beta) {
    (alpha + beta) ^ (-5 / 2)
}

Densities in Transformed Space

We want to be able to compute and reason about the probability density at various $\alpha$ and $\beta$ values. However, raw $\alpha \times \beta$ space is not very convenient, so we will plot instead over $x = \mathrm{log}(\frac{\alpha}{\beta})$ and $y = \mathrm{log}(\alpha + \beta)$, yielding:

$$\begin{align*} x & = \mathrm{log}(\frac{\alpha}{\beta}) \\ y & = \mathrm{log}(\alpha + \beta) \\ \alpha & = e^x \beta \\ \beta & = \frac{e^y}{e^x + 1} \end{align*}$$

In [9]:
par.beta = function(x, y) {
    exp(y) / (exp(x) + 1)
}
par.alpha = function(x, y) {
    exp(x) * par.beta(x, y)
}

Given ranges (really, sequences) of $x$ and $y$ values, we want to compute the probability density. We call this xydensity because it is the density at $(x, y)$. It returns the density at each pair $(x, y)$, taking the Cartesian product of the x and y vectors. The counts and totals parameters ($\mathbf{c}$ and $\mathbf{n}$) are also vectors, but the value for each $(x,y)$ poitn depends on the entire $\mathbf{c}$ and $\mathbf{n}$ vectors.

In [10]:
xydensity = function(x, y, counts, totals, prior=model.prior, loglike=model.loglike) {
    expand.grid(x=x, y=y) %>%
        mutate(alpha = par.alpha(x, y),
               beta = par.beta(x, y)) %>%
        mutate(logPrior = log(prior(alpha, beta)),
               logLike = loglike(alpha, beta, counts, totals),
               rawLogPost = logPrior + logLike) %>%
        mutate(logJacobian = log(alpha) + log(beta),
               logPost = rawLogPost + logJacobian)
}

Let's now try it:

In [11]:
dens.points = xydensity(x=sample(seq(-2.3, -1.3, 0.005), 100),
                        y=sample(seq(1, 5, 0.005), 100),
                        counts=rat.data$Infected, totals=rat.data$Total)
dens.points
xyalphabetalogPriorlogLikerawLogPostlogJacobianlogPost
-2.200 4.79 12.00012 108.30125-11.975 -754.1022-766.07727.169833 -758.9073
-1.485 4.79 22.21646 98.08491-11.975 -745.7459-757.72097.686667 -750.0342
-2.195 4.79 12.05424 108.24712-11.975 -753.7877-765.76277.173834 -758.5889
-1.505 4.79 21.85647 98.44490-11.975 -744.7450-756.72007.673994 -749.0461
-1.795 4.79 17.13823 103.16314-11.975 -739.1231-751.09817.477623 -743.6205
-1.825 4.79 16.70204 103.59933-11.975 -739.4081-751.38317.456062 -743.9270
-1.610 4.79 20.04084 100.26053-11.975 -740.8506-752.82567.605544 -745.2201
-1.515 4.79 21.67818 98.62319-11.975 -744.2764-756.25147.667613 -748.5838
-1.500 4.79 21.94604 98.35533-11.975 -744.9873-756.96237.677173 -749.2851
-1.830 4.79 16.63025 103.67112-11.975 -739.4700-751.44507.452447 -743.9925
-1.850 2.55 1.740138 11.06697 -6.375 -730.8109-737.18592.957929 -734.2279
-2.045 2.55 1.467164 11.33994 -6.375 -733.1860-739.56102.811662 -736.7494
-1.535 2.55 2.270230 10.53687 -6.375 -733.3097-739.68473.174762 -736.5099
-1.640 2.55 2.080707 10.72640 -6.375 -731.4909-737.86593.105415 -734.7605
-1.910 2.55 1.651871 11.15523 -6.375 -731.2609-737.63592.913817 -734.7221
-1.455 2.55 2.423530 10.38357 -6.375 -735.4355-741.81053.225450 -738.5850
-2.270 2.55 1.199234 11.60787 -6.375 -738.8034-745.17842.633367 -742.5451
-2.015 2.55 1.506589 11.30051 -6.375 -732.6554-739.03042.834697 -736.1957
-1.905 2.55 1.659079 11.14803 -6.375 -731.2134-737.58842.917525 -734.6709
-1.760 2.55 1.879959 10.92714 -6.375 -730.6517-737.02673.022500 -734.0042
In [12]:
dim(dens.points)
  1. 10000
  2. 9
In [13]:
summary(dens.points %>% select(rawLogPost, logLike, logPost))
   rawLogPost        logLike          logPost      
 Min.   :-775.6   Min.   :-763.1   Min.   :-768.1  
 1st Qu.:-751.9   1st Qu.:-744.0   1st Qu.:-747.5  
 Median :-746.0   Median :-739.0   Median :-742.7  
 Mean   :-747.5   Mean   :-740.1   Mean   :-743.7  
 3rd Qu.:-741.6   3rd Qu.:-734.6   3rd Qu.:-738.2  
 Max.   :-737.0   Max.   :-730.3   Max.   :-733.8  
In [14]:
ggplot(dens.points) +
    aes(x=x, y=y, z=logPost) +
    geom_contour(binwidth=0.5) +
    xlab("log(a / b)") +
    ylab("log(a + b)")

Inference for Parameter Values

With the model in place, we want to use the computed point densities to do some esimation of the posterior distributions.

In [15]:
expvals = function(dens) {
    normPost = dens$logPost - max(dens$logPost)
    alpha = sum(dens$alpha * exp(normPost)) / sum(exp(normPost))
    beta = sum(dens$beta * exp(normPost)) / sum(exp(normPost))
    x = log(alpha / beta)
    y = log(alpha + beta)
    mean = alpha / (alpha + beta)
    data.frame(alpha=alpha, beta=beta, x=x, y=y, mean=mean)
}
In [16]:
expvals(dens.points)
alphabetaxymean
2.386292 14.15951 -1.7806462.806132 0.1442234

MAP estimate

We want to find the maximum value of the log posterior, so that we can use it later for normalization in the integral. The optim function seems perfect for this!

In [17]:
model.map = optim(c(1, 1), function(pars) {
    alpha = pars[1]
    beta = pars[2]
    log(model.prior(alpha, beta)) + model.loglike(alpha, beta, rat.data$Infected, rat.data$Total)
}, control=list(fnscale=-1))
model.map
$par
[1]  1.82629 10.84766

$value
[1] -737.0054

$counts
function gradient 
      71       NA 

$convergence
[1] 0

$message
NULL

Integral Inference for the Posterior

First, we will write a function to compute the probability density of a particular $\theta$, as an integral $P(\theta|\mathbf{y}) = \iint P(\theta|\alpha,\beta)P(\alpha,\beta|\mathbf{y}) \, \mathrm{d}\alpha \mathrm{d}\beta$.

This function is over theta (it will return a vector of densities, one corresponding to each value of theta). The norm parameter is a normalizing constant used to make the log likelihoods exponentiable within the limits of floating point accuracy; a value a little less than the maximum log likelihood computed by the xydensity function is a good choice.

In [18]:
model.theta.post = function(theta, logScale, counts, totals, prior=model.prior, loglike=model.loglike,
                            parallel=!identical(Sys.info()[["sysname"]], "Windows")) {
    aaply(theta, 1, function(thv) {
        integrate(function(beta) {
            aaply(beta, 1, function(bv) {
                integrate(function(alpha) {
                    ll = loglike(alpha, bv, counts, totals)
                    like = exp(ll - logScale)
                    pth = dbeta(thv, alpha, bv)
                    pth * prior(alpha, bv) * like
                }, 0, Inf, rel.tol = 0.001)$value
                })
            }, 0, Inf, rel.tol = 0.001)$value
    }, .parallel=parallel)
}

What's our maximum density? We'll use that as a starting point for scaling.

In [19]:
model.map$value
-737.005356663874

Let's compute; we do a second normalization to account for the fact that we are computing on intervals of 0.001, to try to make the area under the curve sum to 1. This computation takes a little while.

In [20]:
theta.post = data.frame(Theta=seq(0.01, 0.99, 0.005)) %>%
    mutate(RawPostDens = model.theta.post(Theta, model.map$value + 1,
                                          rat.data$Infected, rat.data$Total)) %>%
    mutate(PostDens = RawPostDens / sum(RawPostDens * 0.005))
In [21]:
ggplot(theta.post) +
    aes(x=Theta, y=PostDens) +
geom_line()

Monte Carlo Simulation of the Posterior

Now we are going to use STAN to conduct an MCMC simulation of the posterior distribution.

In [23]:
writeLines(readLines("ratmodel.stan"))
data {
    int<lower=0> J;
    int<lower=0> n[J];
    int<lower=0> y[J];
}
parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    real<lower=0,upper=1> theta[J];
}
model {
    theta ~ beta(alpha, beta);
    y ~ binomial(n, theta);
}
In [24]:
rat_fit = stan(file="ratmodel.stan", data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
               iter=1000, chains=4, control=list(adapt_delta=0.99))
hash mismatch so recompiling; make sure Stan code ends with a blank line
In file included from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/rstan/include/rstan/rstaninc.hpp:3:0,
                 from file13e2b6020c641.cpp:440:
/home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/rstan/include/rstan/stan_fit.hpp: In function 'int rstan::{anonymous}::command(rstan::stan_args&, Model&, Rcpp::List&, const std::vector<long unsigned int>&, const std::vector<std::__cxx11::basic_string<char> >&, RNG_t&)':
/home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/rstan/include/rstan/stan_fit.hpp:438:8: warning: 'template<class> class std::auto_ptr' is deprecated [-Wdeprecated-declarations]
   std::auto_ptr<stan::io::var_context> init_context_ptr;
        ^~~~~~~~
In file included from /home/MICHAELEKSTRAND/miniconda2/envs/rats/x86_64-conda_cos6-linux-gnu/include/c++/7.2.0/bits/locale_conv.h:41:0,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/x86_64-conda_cos6-linux-gnu/include/c++/7.2.0/locale:43,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:43,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast/detail/converter_lexical.hpp:54,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast/try_lexical_convert.hpp:42,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast.hpp:32,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/math/tools/convert_from_string.hpp:15,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/math/constants/constants.hpp:13,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/prim/scal/fun/constants.hpp:4,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/rev/core/operator_unary_plus.hpp:7,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/rev/core.hpp:34,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/rev/mat.hpp:4,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math.hpp:4,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/src/stan/model/model_header.hpp:4,
                 from file13e2b6020c641.cpp:8:
/home/MICHAELEKSTRAND/miniconda2/envs/rats/x86_64-conda_cos6-linux-gnu/include/c++/7.2.0/bits/unique_ptr.h:51:28: note: declared here
   template<typename> class auto_ptr;
                            ^~~~~~~~
In file included from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/rstan/include/rstan/rstaninc.hpp:3:0,
                 from file13e2b6020c641.cpp:440:
/home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/rstan/include/rstan/stan_fit.hpp:547:10: warning: 'template<class> class std::auto_ptr' is deprecated [-Wdeprecated-declarations]
     std::auto_ptr<rstan_sample_writer> sample_writer_ptr;
          ^~~~~~~~
In file included from /home/MICHAELEKSTRAND/miniconda2/envs/rats/x86_64-conda_cos6-linux-gnu/include/c++/7.2.0/bits/locale_conv.h:41:0,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/x86_64-conda_cos6-linux-gnu/include/c++/7.2.0/locale:43,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:43,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast/detail/converter_lexical.hpp:54,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast/try_lexical_convert.hpp:42,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/lexical_cast.hpp:32,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/math/tools/convert_from_string.hpp:15,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/BH/include/boost/math/constants/constants.hpp:13,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/prim/scal/fun/constants.hpp:4,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/rev/core/operator_unary_plus.hpp:7,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/rev/core.hpp:34,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math/rev/mat.hpp:4,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/stan/math.hpp:4,
                 from /home/MICHAELEKSTRAND/miniconda2/envs/rats/lib/R/library/StanHeaders/include/src/stan/model/model_header.hpp:4,
                 from file13e2b6020c641.cpp:8:
/home/MICHAELEKSTRAND/miniconda2/envs/rats/x86_64-conda_cos6-linux-gnu/include/c++/7.2.0/bits/unique_ptr.h:51:28: note: declared here
   template<typename> class auto_ptr;
                            ^~~~~~~~
In [25]:
print(rat_fit)
Inference for Stan model: ratmodel.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

             mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff
alpha        3.49    0.14  1.68    1.57    2.36    3.07    4.10    7.48   140
beta        20.86    0.86 10.11    9.34   14.30   18.70   24.29   45.64   138
theta[1]     0.07    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[2]     0.08    0.00  0.04    0.01    0.05    0.07    0.10    0.17  2000
theta[3]     0.07    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[4]     0.07    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[5]     0.08    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[6]     0.08    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[7]     0.08    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[8]     0.08    0.00  0.04    0.01    0.04    0.07    0.10    0.17  2000
theta[9]     0.08    0.00  0.04    0.02    0.04    0.07    0.10    0.17  2000
theta[10]    0.08    0.00  0.04    0.01    0.05    0.07    0.10    0.17  2000
theta[11]    0.08    0.00  0.04    0.01    0.04    0.07    0.10    0.18  2000
theta[12]    0.08    0.00  0.04    0.01    0.05    0.07    0.10    0.18  2000
theta[13]    0.08    0.00  0.05    0.01    0.04    0.07    0.10    0.18  2000
theta[14]    0.08    0.00  0.04    0.01    0.05    0.07    0.11    0.18  2000
theta[15]    0.10    0.00  0.04    0.03    0.07    0.09    0.13    0.20  2000
theta[16]    0.10    0.00  0.05    0.02    0.06    0.09    0.13    0.21  2000
theta[17]    0.10    0.00  0.04    0.03    0.07    0.09    0.12    0.19  2000
theta[18]    0.10    0.00  0.05    0.03    0.07    0.09    0.13    0.20  2000
theta[19]    0.10    0.00  0.05    0.02    0.07    0.09    0.13    0.22  2000
theta[20]    0.10    0.00  0.05    0.03    0.07    0.10    0.13    0.21  2000
theta[21]    0.10    0.00  0.05    0.03    0.07    0.10    0.13    0.21  2000
theta[22]    0.10    0.00  0.05    0.03    0.07    0.10    0.13    0.22  2000
theta[23]    0.13    0.00  0.05    0.05    0.09    0.12    0.15    0.22  2000
theta[24]    0.11    0.00  0.04    0.04    0.08    0.10    0.14    0.21  2000
theta[25]    0.11    0.00  0.05    0.04    0.08    0.11    0.14    0.22  2000
theta[26]    0.11    0.00  0.05    0.04    0.08    0.11    0.14    0.22  2000
theta[27]    0.12    0.00  0.05    0.04    0.09    0.12    0.15    0.23  2000
theta[28]    0.12    0.00  0.05    0.04    0.09    0.12    0.15    0.23  2000
theta[29]    0.12    0.00  0.05    0.04    0.09    0.12    0.15    0.24  2000
theta[30]    0.12    0.00  0.05    0.04    0.09    0.12    0.16    0.25  2000
theta[31]    0.12    0.00  0.05    0.04    0.08    0.11    0.15    0.23  2000
theta[32]    0.12    0.00  0.05    0.04    0.08    0.11    0.15    0.23  2000
theta[33]    0.13    0.00  0.06    0.04    0.09    0.12    0.17    0.26  2000
theta[34]    0.11    0.00  0.04    0.05    0.09    0.11    0.14    0.20  2000
theta[35]    0.13    0.00  0.05    0.04    0.09    0.12    0.16    0.24  2000
theta[36]    0.12    0.00  0.04    0.06    0.09    0.12    0.15    0.20  2000
theta[37]    0.13    0.00  0.05    0.04    0.09    0.12    0.16    0.26  2000
theta[38]    0.14    0.00  0.04    0.07    0.11    0.14    0.17    0.23  2000
theta[39]    0.15    0.00  0.04    0.07    0.12    0.15    0.17    0.24  2000
theta[40]    0.15    0.00  0.05    0.06    0.11    0.14    0.18    0.27  2000
theta[41]    0.15    0.00  0.05    0.06    0.11    0.14    0.18    0.27  2000
theta[42]    0.15    0.00  0.06    0.05    0.10    0.14    0.18    0.28  2000
theta[43]    0.17    0.00  0.04    0.10    0.14    0.17    0.20    0.26  2000
theta[44]    0.18    0.00  0.05    0.11    0.15    0.18    0.21    0.28  2000
theta[45]    0.17    0.00  0.06    0.07    0.13    0.16    0.21    0.30  2000
theta[46]    0.17    0.00  0.06    0.08    0.13    0.17    0.21    0.30  2000
theta[47]    0.17    0.00  0.06    0.08    0.13    0.16    0.21    0.31  2000
theta[48]    0.17    0.00  0.06    0.08    0.13    0.17    0.21    0.30  2000
theta[49]    0.17    0.00  0.06    0.07    0.13    0.16    0.20    0.30  2000
theta[50]    0.17    0.00  0.06    0.07    0.13    0.17    0.21    0.30  2000
theta[51]    0.17    0.00  0.06    0.07    0.13    0.16    0.21    0.30  2000
theta[52]    0.19    0.00  0.05    0.11    0.16    0.19    0.22    0.28  2000
theta[53]    0.17    0.00  0.06    0.07    0.13    0.17    0.21    0.31  2000
theta[54]    0.17    0.00  0.06    0.07    0.13    0.17    0.21    0.31  2000
theta[55]    0.17    0.00  0.06    0.08    0.13    0.17    0.21    0.31  2000
theta[56]    0.18    0.00  0.06    0.09    0.14    0.18    0.22    0.31  2000
theta[57]    0.21    0.00  0.05    0.12    0.17    0.21    0.24    0.33  2000
theta[58]    0.21    0.00  0.05    0.13    0.18    0.21    0.25    0.32  2000
theta[59]    0.19    0.00  0.06    0.09    0.15    0.19    0.23    0.34  2000
theta[60]    0.19    0.00  0.06    0.09    0.15    0.19    0.23    0.33  2000
theta[61]    0.20    0.00  0.06    0.10    0.16    0.20    0.24    0.33  2000
theta[62]    0.20    0.00  0.06    0.09    0.15    0.19    0.24    0.33  2000
theta[63]    0.21    0.00  0.06    0.10    0.16    0.20    0.25    0.34  2000
theta[64]    0.22    0.00  0.07    0.11    0.17    0.21    0.26    0.36  2000
theta[65]    0.22    0.00  0.06    0.11    0.17    0.21    0.26    0.36  2000
theta[66]    0.22    0.00  0.07    0.11    0.17    0.21    0.26    0.36  2000
theta[67]    0.26    0.00  0.05    0.17    0.22    0.26    0.29    0.37  2000
theta[68]    0.27    0.00  0.06    0.17    0.23    0.26    0.30    0.38  2000
theta[69]    0.26    0.00  0.05    0.17    0.22    0.26    0.29    0.37  2000
theta[70]    0.26    0.00  0.07    0.15    0.21    0.25    0.30    0.41  2000
theta[71]    0.20    0.00  0.07    0.09    0.15    0.19    0.24    0.36  2000
lp__      -771.12    0.93 11.30 -793.30 -778.77 -771.21 -763.63 -748.74   149
          Rhat
alpha     1.04
beta      1.04
theta[1]  1.00
theta[2]  1.01
theta[3]  1.01
theta[4]  1.01
theta[5]  1.01
theta[6]  1.01
theta[7]  1.01
theta[8]  1.00
theta[9]  1.01
theta[10] 1.01
theta[11] 1.01
theta[12] 1.01
theta[13] 1.00
theta[14] 1.00
theta[15] 1.00
theta[16] 1.00
theta[17] 1.00
theta[18] 1.00
theta[19] 1.00
theta[20] 1.00
theta[21] 1.00
theta[22] 1.00
theta[23] 1.00
theta[24] 1.00
theta[25] 1.00
theta[26] 1.00
theta[27] 1.00
theta[28] 1.00
theta[29] 1.00
theta[30] 1.00
theta[31] 1.00
theta[32] 1.00
theta[33] 1.00
theta[34] 1.00
theta[35] 1.00
theta[36] 1.00
theta[37] 1.00
theta[38] 1.00
theta[39] 1.00
theta[40] 1.00
theta[41] 1.00
theta[42] 1.00
theta[43] 1.00
theta[44] 1.00
theta[45] 1.00
theta[46] 1.00
theta[47] 1.00
theta[48] 1.00
theta[49] 1.00
theta[50] 1.00
theta[51] 1.00
theta[52] 1.00
theta[53] 1.00
theta[54] 1.00
theta[55] 1.00
theta[56] 1.00
theta[57] 1.00
theta[58] 1.00
theta[59] 1.00
theta[60] 1.00
theta[61] 1.00
theta[62] 1.00
theta[63] 1.00
theta[64] 1.00
theta[65] 1.00
theta[66] 1.00
theta[67] 1.01
theta[68] 1.00
theta[69] 1.00
theta[70] 1.00
theta[71] 1.00
lp__      1.04

Samples were drawn using NUTS(diag_e) at Tue Dec  5 14:54:00 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
In [26]:
pairs(rat_fit, pars=c("alpha", "beta", "lp__"))
In [27]:
rat_sim = rstan::extract(rat_fit, permuted=TRUE)
In [28]:
n_sims = length(rat_sim$lp__)
n_sims
2000
In [29]:
theta_sims = data_frame(alpha=rat_sim$alpha, beta=rat_sim$beta) %>%
    mutate(Theta=rbeta(n(), alpha, beta))
In [30]:
summary(theta_sims)
     alpha             beta             Theta         
 Min.   : 1.041   Min.   :  6.696   Min.   :0.003298  
 1st Qu.: 2.364   1st Qu.: 14.301   1st Qu.:0.086690  
 Median : 3.071   Median : 18.696   Median :0.129438  
 Mean   : 3.487   Mean   : 20.864   Mean   :0.141307  
 3rd Qu.: 4.105   3rd Qu.: 24.285   3rd Qu.:0.183722  
 Max.   :18.909   Max.   :112.569   Max.   :0.495186  
In [31]:
theta_dens = density(theta_sims$Theta)
In [32]:
ggplot(data_frame(x=theta_dens$x, y=theta_dens$y)) +
    aes(x, y) +
    geom_line()

Use priors & reparameterization

In [33]:
writeLines(readLines("ratxmodel.stan"))
data {
    int<lower=0> J;
    int<lower=0> n[J];
    int<lower=0> y[J];
}
parameters {
    real<lower=0,upper=1> phi;
    real<lower=0.1> lambda;
    real<lower=0,upper=1> theta[J];
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    alpha = lambda * phi;
    beta = lambda * 1 - phi;
}
model {
    phi ~ beta(1,1);
    lambda ~ pareto(0.1, 1.5);
    theta ~ beta(alpha, beta);
    y ~ binomial(n, theta);
}
In [34]:
ratx_fit = stan(file="ratxmodel.stan",
                data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
                iter=1000, chains=4, control=list(adapt_delta=0.99))
In [35]:
print(ratx_fit)
Inference for Stan model: ratxmodel.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

             mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff
phi          0.17    0.00 0.02    0.13    0.16    0.17    0.18    0.21  1244
lambda      12.43    0.23 4.31    6.26    9.42   11.71   14.56   22.93   352
theta[1]     0.06    0.00 0.04    0.00    0.03    0.05    0.08    0.17  2000
theta[2]     0.06    0.00 0.04    0.00    0.03    0.05    0.08    0.15  2000
theta[3]     0.06    0.00 0.04    0.00    0.03    0.05    0.08    0.16  2000
theta[4]     0.06    0.00 0.04    0.00    0.03    0.05    0.08    0.17  2000
theta[5]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  2000
theta[6]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  2000
theta[7]     0.06    0.00 0.04    0.00    0.03    0.05    0.08    0.16  2000
theta[8]     0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.17  2000
theta[9]     0.06    0.00 0.04    0.00    0.03    0.05    0.08    0.16  2000
theta[10]    0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.16  2000
theta[11]    0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  2000
theta[12]    0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.17  2000
theta[13]    0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.17  2000
theta[14]    0.06    0.00 0.04    0.01    0.03    0.05    0.09    0.17  2000
theta[15]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  2000
theta[16]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.19  2000
theta[17]    0.09    0.00 0.05    0.01    0.05    0.08    0.11    0.20  2000
theta[18]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  2000
theta[19]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  2000
theta[20]    0.09    0.00 0.05    0.02    0.05    0.08    0.12    0.21  2000
theta[21]    0.09    0.00 0.05    0.02    0.06    0.09    0.12    0.21  2000
theta[22]    0.09    0.00 0.05    0.02    0.06    0.09    0.12    0.22  2000
theta[23]    0.12    0.00 0.05    0.04    0.08    0.11    0.15    0.24  2000
theta[24]    0.10    0.00 0.05    0.03    0.07    0.10    0.13    0.21  2000
theta[25]    0.11    0.00 0.05    0.03    0.07    0.10    0.14    0.22  2000
theta[26]    0.11    0.00 0.05    0.03    0.07    0.10    0.14    0.22  2000
theta[27]    0.12    0.00 0.06    0.03    0.07    0.11    0.15    0.25  2000
theta[28]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.25  2000
theta[29]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.24  2000
theta[30]    0.12    0.00 0.05    0.03    0.08    0.11    0.15    0.24  2000
theta[31]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.25  2000
theta[32]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.25  2000
theta[33]    0.13    0.00 0.07    0.02    0.08    0.11    0.16    0.29  2000
theta[34]    0.11    0.00 0.04    0.05    0.08    0.11    0.14    0.20  2000
theta[35]    0.12    0.00 0.06    0.03    0.08    0.11    0.15    0.25  2000
theta[36]    0.12    0.00 0.04    0.05    0.09    0.11    0.14    0.21  2000
theta[37]    0.13    0.00 0.06    0.04    0.09    0.12    0.16    0.27  2000
theta[38]    0.14    0.00 0.04    0.07    0.11    0.14    0.17    0.24  2000
theta[39]    0.15    0.00 0.04    0.07    0.12    0.14    0.17    0.24  2000
theta[40]    0.15    0.00 0.06    0.05    0.11    0.14    0.19    0.28  2000
theta[41]    0.15    0.00 0.06    0.05    0.10    0.14    0.19    0.28  2000
theta[42]    0.15    0.00 0.07    0.05    0.10    0.14    0.19    0.31  2000
theta[43]    0.18    0.00 0.05    0.10    0.15    0.17    0.21    0.28  2000
theta[44]    0.19    0.00 0.05    0.10    0.15    0.18    0.22    0.30  2000
theta[45]    0.18    0.00 0.07    0.06    0.13    0.17    0.22    0.33  2000
theta[46]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.33  2000
theta[47]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.33  2000
theta[48]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  2000
theta[49]    0.18    0.00 0.06    0.07    0.13    0.17    0.22    0.32  2000
theta[50]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  2000
theta[51]    0.18    0.00 0.07    0.07    0.13    0.17    0.22    0.32  2000
theta[52]    0.20    0.00 0.05    0.11    0.16    0.19    0.23    0.31  2000
theta[53]    0.19    0.00 0.07    0.08    0.14    0.18    0.23    0.33  2000
theta[54]    0.18    0.00 0.07    0.07    0.13    0.18    0.22    0.33  2000
theta[55]    0.18    0.00 0.07    0.07    0.14    0.18    0.22    0.33  2000
theta[56]    0.20    0.00 0.07    0.08    0.15    0.19    0.24    0.34  2000
theta[57]    0.22    0.00 0.05    0.13    0.18    0.21    0.25    0.33  2000
theta[58]    0.22    0.00 0.05    0.12    0.19    0.22    0.26    0.33  2000
theta[59]    0.21    0.00 0.07    0.09    0.15    0.20    0.25    0.36  2000
theta[60]    0.21    0.00 0.07    0.09    0.16    0.20    0.25    0.35  2000
theta[61]    0.22    0.00 0.07    0.10    0.17    0.21    0.26    0.37  2000
theta[62]    0.21    0.00 0.07    0.10    0.16    0.21    0.26    0.37  2000
theta[63]    0.22    0.00 0.07    0.10    0.17    0.22    0.27    0.38  2000
theta[64]    0.24    0.00 0.07    0.12    0.19    0.23    0.28    0.40  2000
theta[65]    0.24    0.00 0.07    0.11    0.18    0.23    0.28    0.39  2000
theta[66]    0.24    0.00 0.08    0.11    0.18    0.23    0.28    0.41  2000
theta[67]    0.27    0.00 0.05    0.17    0.24    0.27    0.31    0.39  2000
theta[68]    0.28    0.00 0.06    0.18    0.24    0.28    0.32    0.40  2000
theta[69]    0.28    0.00 0.06    0.18    0.24    0.28    0.32    0.40  2000
theta[70]    0.29    0.00 0.07    0.16    0.24    0.29    0.34    0.46  2000
theta[71]    0.22    0.00 0.08    0.09    0.16    0.21    0.26    0.38  2000
alpha        2.08    0.04 0.74    1.05    1.57    1.95    2.45    3.83   293
beta        12.27    0.23 4.31    6.10    9.25   11.55   14.39   22.78   353
lp__      -791.80    0.62 9.25 -810.62 -797.45 -791.54 -785.83 -773.42   221
          Rhat
phi       1.00
lambda    1.01
theta[1]  1.00
theta[2]  1.00
theta[3]  1.00
theta[4]  1.00
theta[5]  1.00
theta[6]  1.00
theta[7]  1.00
theta[8]  1.00
theta[9]  1.00
theta[10] 1.00
theta[11] 1.00
theta[12] 1.00
theta[13] 1.00
theta[14] 1.00
theta[15] 1.00
theta[16] 1.00
theta[17] 1.00
theta[18] 1.00
theta[19] 1.00
theta[20] 1.00
theta[21] 1.00
theta[22] 1.00
theta[23] 1.00
theta[24] 1.00
theta[25] 1.00
theta[26] 1.00
theta[27] 1.00
theta[28] 1.00
theta[29] 1.00
theta[30] 1.00
theta[31] 1.00
theta[32] 1.00
theta[33] 1.00
theta[34] 1.00
theta[35] 1.00
theta[36] 1.00
theta[37] 1.00
theta[38] 1.00
theta[39] 1.00
theta[40] 1.00
theta[41] 1.00
theta[42] 1.00
theta[43] 1.00
theta[44] 1.00
theta[45] 1.00
theta[46] 1.00
theta[47] 1.00
theta[48] 1.00
theta[49] 1.00
theta[50] 1.00
theta[51] 1.00
theta[52] 1.00
theta[53] 1.00
theta[54] 1.00
theta[55] 1.00
theta[56] 1.00
theta[57] 1.00
theta[58] 1.00
theta[59] 1.00
theta[60] 1.00
theta[61] 1.00
theta[62] 1.00
theta[63] 1.00
theta[64] 1.00
theta[65] 1.00
theta[66] 1.00
theta[67] 1.00
theta[68] 1.00
theta[69] 1.00
theta[70] 1.00
theta[71] 1.00
alpha     1.01
beta      1.01
lp__      1.01

Samples were drawn using NUTS(diag_e) at Tue Dec  5 14:54:36 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
In [36]:
ratx_sim = rstan::extract(ratx_fit, permuted=TRUE)
In [37]:
n_sims = length(ratx_sim$lp__)
n_sims
2000
In [38]:
thetax_sims = data_frame(alpha=ratx_sim$alpha, beta=ratx_sim$beta) %>%
    mutate(Theta=rbeta(n(), alpha, beta))
In [39]:
summary(thetax_sims)
     alpha             beta            Theta          
 Min.   :0.7255   Min.   : 3.949   Min.   :0.0003899  
 1st Qu.:1.5695   1st Qu.: 9.248   1st Qu.:0.0740209  
 Median :1.9482   Median :11.553   Median :0.1294079  
 Mean   :2.0797   Mean   :12.265   Mean   :0.1465249  
 3rd Qu.:2.4497   3rd Qu.:14.395   3rd Qu.:0.1996516  
 Max.   :6.8578   Max.   :35.257   Max.   :0.5698463  
In [40]:
thetax_dens = density(thetax_sims$Theta)
In [41]:
ggplot(data_frame(x=thetax_dens$x, y=thetax_dens$y)) +
    aes(x, y) +
    geom_line()

Unified Plots

In [42]:
posteriors = bind_rows(Integrated=select(theta.post, Theta, Density=PostDens),
                       MCMC=data_frame(Theta=theta_dens$x, Density=theta_dens$y),
                       MCMC_RP=data_frame(Theta=thetax_dens$x, Density=thetax_dens$y),
                       .id="Method")
In [43]:
ggplot(posteriors) +
    aes(x=Theta, y=Density, color=Method) +
    geom_line()