This is a modularized version of the rat tumor analysis, factored into reusable functions.
Grab packages
library(ggplot2)
library(plyr)
library(tidyverse)
library(rstan)
Loading tidyverse: tibble, tidyr, readr, purrr, dplyr
Conflicts with tidy packages: arrange(), compact(), count(), failwith(), id(), mutate(), rename(), summarise(), summarize() (dplyr/plyr); filter(), lag() (dplyr/stats)
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Attaching package: ‘rstan’
The following object is masked from ‘package:tidyr’: extract
library(doParallel)
registerDoParallel()
Loading required package: foreach
Attaching package: ‘foreach’
The following objects are masked from ‘package:purrr’: accumulate, when
Loading required package: iterators
Loading required package: parallel
rstan_options(auto_write=TRUE)
options(mc.cores=parallel::detectCores())
Set options
options(repr.matrix.max.rows = 20)
Load the data:
rat.data = read_csv("rats.csv")
summary(rat.data)
Parsed with column specification:
cols(
  Infected = col_integer(),
  Total = col_integer()
)
    Infected          Total      
 Min.   : 0.000   Min.   :10.00  
 1st Qu.: 1.000   1st Qu.:19.00  
 Median : 3.000   Median :20.00  
 Mean   : 3.761   Mean   :24.49  
 3rd Qu.: 5.000   3rd Qu.:22.50  
 Max.   :16.000   Max.   :52.00  
rat.data
Infected | Total |
---|---|
0 | 20 |
0 | 20 |
0 | 20 |
0 | 20 |
0 | 20 |
0 | 20 |
0 | 20 |
0 | 19 |
0 | 19 |
0 | 19 |
⋮ | ⋮ |
5 | 19 |
6 | 22 |
6 | 20 |
6 | 20 |
6 | 20 |
16 | 52 |
15 | 46 |
15 | 47 |
9 | 24 |
4 | 14 |
In this data set, for each experiment $j$, we have two observations: $c_j$, the number of infected rats, and $n_j$, the number of rats.
We model the data as having an infection rate $\theta_j$ for each experiment. We want to make inferences about the distribution of $\theta_j$.
In more detail, we use the following model:
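Concretely (this is the same hierarchical model that the Stan program later in this notebook encodes):

$$c_j \sim \mathrm{Binomial}(n_j, \theta_j), \qquad \theta_j \sim \mathrm{Beta}(\alpha, \beta)$$

with a hyperprior on $(\alpha, \beta)$ that we will choose below.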
For notational simplicity, we collect the hyperparameters into $\phi = (\alpha, \beta)$.
Our goal, then, is to compute $P(\phi,\mathbf{\theta}|\mathbf{c})$ given $\mathbf{c}$ and our model / likelihood function $P(\mathbf{c},\mathbf{\theta}|\phi) = \prod_j P(c_j,\theta_j|\phi) = \prod_j P(c_j|\theta_j)P(\theta_j|\phi)$.
The final derivation, as per Eq. 5.8 in Gelman et al., is
$$P(\alpha, \beta|\mathbf{c}) \propto P(\alpha, \beta) \prod_j \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha + c_j)\Gamma(\beta+n_j-c_j)}{\Gamma(\alpha+\beta+n_j)}$$

We will compute this log-transformed.
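Expanding the log of the right-hand side gives the quantity we actually evaluate:

$$\mathrm{log}\, P(\alpha, \beta|\mathbf{c}) = \mathrm{log}\, P(\alpha, \beta) + \sum_j \left[ \mathrm{log}\,\Gamma(\alpha+\beta) - \mathrm{log}\,\Gamma(\alpha) - \mathrm{log}\,\Gamma(\beta) + \mathrm{log}\,\Gamma(\alpha + c_j) + \mathrm{log}\,\Gamma(\beta+n_j-c_j) - \mathrm{log}\,\Gamma(\alpha+\beta+n_j) \right] + \mathrm{const.}$$

The first three log-gamma terms do not depend on $j$, which is why the function below computes them once and multiplies by the number of experiments.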
So we define the log likelihood function for this model. It will take as input $\alpha$ and $\beta$ vectors (which should be the same length), and return the log likelihood of each $(\alpha, \beta)$ value given the $\mathbf{c}$ and $\mathbf{n}$ vectors:
model.loglike = function(alpha, beta, cs, ns) {
    # Terms that do not depend on the data for experiment j:
    base = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    # Sum the data-dependent terms over experiments, for each (alpha, beta) pair:
    data = mapply(a=alpha, b=beta, FUN=function(a, b) {
        sum(lgamma(a + cs) +
            lgamma(b + ns - cs) -
            lgamma(a + b + ns))
    })
    # 'base' appears once per experiment:
    length(cs) * base + data
}
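As a quick sanity check (a sketch, not part of the original analysis): each experiment's term is a ratio of Beta functions, so for a single $(\alpha, \beta)$ pair the result should agree with a direct lbeta computation. Here alpha=2, beta=10 are arbitrary test values:

# Equivalent per-experiment form: lbeta(a + c_j, b + n_j - c_j) - lbeta(a, b)
check = sum(lbeta(2 + rat.data$Infected, 10 + rat.data$Total - rat.data$Infected) -
            lbeta(2, 10))
all.equal(check, model.loglike(2, 10, rat.data$Infected, rat.data$Total))  # expect TRUE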
We're also going to use the diffuse prior from Equation 5.9:
$$P(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}$$

We will not worry about the constant, since we care about relative densities.
model.prior = function(alpha, beta) {
    (alpha + beta) ^ (-5 / 2)
}
We want to be able to compute and reason about the probability density at various $\alpha$ and $\beta$ values. However, raw $\alpha \times \beta$ space is not very convenient, so we will plot instead over $x = \mathrm{log}(\frac{\alpha}{\beta})$ and $y = \mathrm{log}(\alpha + \beta)$, yielding:
$$\begin{align*} x & = \mathrm{log}\left(\frac{\alpha}{\beta}\right) \\ y & = \mathrm{log}(\alpha + \beta) \\ \alpha & = e^x \beta \\ \beta & = \frac{e^y}{e^x + 1} \end{align*}$$

par.beta = function(x, y) {
exp(y) / (exp(x) + 1)
}
par.alpha = function(x, y) {
exp(x) * par.beta(x, y)
}
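A quick round-trip check (illustrative, not in the original notebook) confirms that the two parameterizations invert each other:

# Pick arbitrary test values, map to (x, y), and map back:
a0 = 2; b0 = 14
x0 = log(a0 / b0); y0 = log(a0 + b0)
c(par.alpha(x0, y0), par.beta(x0, y0))  # should print 2 14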
Given ranges (really, sequences) of $x$ and $y$ values, we want to compute the probability density. We call this function xydensity because it is the density at $(x, y)$. It returns the density at each pair $(x, y)$, taking the Cartesian product of the x and y vectors. The counts and totals parameters ($\mathbf{c}$ and $\mathbf{n}$) are also vectors, but the value at each $(x, y)$ point depends on the entire $\mathbf{c}$ and $\mathbf{n}$ vectors.
xydensity = function(x, y, counts, totals, prior=model.prior, loglike=model.loglike) {
    expand.grid(x=x, y=y) %>%
        mutate(alpha = par.alpha(x, y),
               beta = par.beta(x, y)) %>%
        mutate(logPrior = log(prior(alpha, beta)),
               logLike = loglike(alpha, beta, counts, totals),
               rawLogPost = logPrior + logLike) %>%
        # Change-of-variables correction: |d(alpha,beta)/d(x,y)| = alpha * beta
        mutate(logJacobian = log(alpha) + log(beta),
               logPost = rawLogPost + logJacobian)
}
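The logJacobian column deserves a note: it is the change-of-variables correction for expressing the density over $(x, y)$ rather than $(\alpha, \beta)$. Differentiating the inverse transformation gives

$$\frac{\partial(\alpha,\beta)}{\partial(x,y)} = \begin{pmatrix} \frac{\alpha\beta}{\alpha+\beta} & \alpha \\ -\frac{\alpha\beta}{\alpha+\beta} & \beta \end{pmatrix}, \qquad \left|\det\right| = \frac{\alpha\beta}{\alpha+\beta}\,\beta + \frac{\alpha\beta}{\alpha+\beta}\,\alpha = \alpha\beta,$$

so the log correction is $\mathrm{log}\,\alpha + \mathrm{log}\,\beta$, as in the code.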
Let's now try it:
dens.points = xydensity(x=sample(seq(-2.3, -1.3, 0.005), 100),
y=sample(seq(1, 5, 0.005), 100),
counts=rat.data$Infected, totals=rat.data$Total)
dens.points
x | y | alpha | beta | logPrior | logLike | rawLogPost | logJacobian | logPost |
---|---|---|---|---|---|---|---|---|
-1.405 | 4.28 | 14.233092 | 58.00735 | -10.7 | -747.8008 | -758.5008 | 6.716139 | -751.7847 |
-1.905 | 4.28 | 9.358288 | 62.88215 | -10.7 | -737.5867 | -748.2867 | 6.377525 | -741.9092 |
-1.395 | 4.28 | 14.347727 | 57.89271 | -10.7 | -748.4699 | -759.1699 | 6.722183 | -752.4477 |
-2.255 | 4.28 | 6.856994 | 65.38345 | -10.7 | -751.5870 | -762.2870 | 6.105538 | -756.1814 |
-1.300 | 4.28 | 15.471375 | 56.76906 | -10.7 | -755.8795 | -766.5795 | 6.777983 | -759.8015 |
-1.995 | 4.28 | 8.649268 | 63.59117 | -10.7 | -739.8336 | -750.5336 | 6.309949 | -744.2236 |
-1.335 | 4.28 | 15.050104 | 57.19034 | -10.7 | -752.9243 | -763.6243 | 6.757770 | -756.8666 |
-1.380 | 4.28 | 14.520979 | 57.71946 | -10.7 | -749.5123 | -760.2123 | 6.731189 | -753.4812 |
-1.780 | 4.28 | 10.424522 | 61.81592 | -10.7 | -736.2991 | -746.9991 | 6.468322 | -740.5308 |
-1.600 | 4.28 | 12.135066 | 60.10537 | -10.7 | -738.6894 | -749.3894 | 6.592199 | -742.7972 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
-1.910 | 3.645 | 4.937745 | 33.34503 | -9.1125 | -733.6095 | -742.7220 | 5.103817 | -737.6182 |
-1.370 | 3.645 | 7.756850 | 30.52592 | -9.1125 | -745.6215 | -754.7340 | 5.467153 | -749.2668 |
-2.150 | 3.645 | 3.994089 | 34.28868 | -9.1125 | -739.7899 | -748.9024 | 4.919631 | -743.9828 |
-1.390 | 3.645 | 7.633882 | 30.64889 | -9.1125 | -744.3755 | -753.4880 | 5.455193 | -748.0328 |
-1.970 | 3.645 | 4.685386 | 33.59739 | -9.1125 | -734.6180 | -743.7305 | 5.058897 | -738.6716 |
-1.670 | 3.645 | 6.064917 | 32.21786 | -9.1125 | -733.8212 | -742.9337 | 5.275042 | -737.6586 |
-2.230 | 3.645 | 3.716822 | 34.56595 | -9.1125 | -743.0010 | -752.1135 | 4.855738 | -747.2578 |
-2.020 | 3.645 | 4.483640 | 33.79913 | -9.1125 | -735.7450 | -744.8575 | 5.020870 | -739.8366 |
-1.630 | 3.645 | 6.271881 | 32.01089 | -9.1125 | -734.5936 | -743.7061 | 5.302152 | -738.4040 |
-1.875 | 3.645 | 5.090240 | 33.19253 | -9.1125 | -733.2038 | -742.3163 | 5.129650 | -737.1867 |
dim(dens.points)
summary(dens.points %>% select(rawLogPost, logLike, logPost))
   rawLogPost        logLike          logPost      
 Min.   :-776.0   Min.   :-763.5   Min.   :-768.5  
 1st Qu.:-751.3   1st Qu.:-743.9   1st Qu.:-747.3  
 Median :-746.0   Median :-739.2   Median :-742.8  
 Mean   :-747.4   Mean   :-740.2   Mean   :-743.8  
 3rd Qu.:-742.0   3rd Qu.:-735.6   3rd Qu.:-739.1  
 Max.   :-737.0   Max.   :-730.3   Max.   :-733.8  
ggplot(dens.points) +
aes(x=x, y=y, z=logPost) +
geom_contour(binwidth=0.5) +
xlab("log(a / b)") +
ylab("log(a + b)")
With the model in place, we want to use the computed point densities to do some estimation of the posterior distributions.
expvals = function(dens) {
    # Shift log posteriors so the max is 0 before exponentiating (avoids underflow),
    # then compute self-normalized weighted means over the grid:
    normPost = dens$logPost - max(dens$logPost)
    alpha = sum(dens$alpha * exp(normPost)) / sum(exp(normPost))
    beta = sum(dens$beta * exp(normPost)) / sum(exp(normPost))
    x = log(alpha / beta)
    y = log(alpha + beta)
    # Mean infection rate implied by the posterior-mean alpha and beta:
    mean = alpha / (alpha + beta)
    data.frame(alpha=alpha, beta=beta, x=x, y=y, mean=mean)
}
expvals(dens.points)
alpha | beta | x | y | mean |
---|---|---|---|---|
2.406424 | 14.37326 | -1.787228 | 2.820169 | 0.143413 |
We want to find the maximum value of the log posterior, so that we can use it later for normalization in the integral. The optim function seems perfect for this!
model.map = optim(c(1, 1), function(pars) {
    alpha = pars[1]
    beta = pars[2]
    log(model.prior(alpha, beta)) + model.loglike(alpha, beta, rat.data$Infected, rat.data$Total)
}, control=list(fnscale=-1))  # fnscale=-1 makes optim maximize instead of minimize
model.map
$par
[1]  1.82629 10.84766

$value
[1] -737.0054

$counts
function gradient 
      71       NA 

$convergence
[1] 0

$message
NULL
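For reference (a quick illustrative conversion, not in the original), we can see where the MAP lands in the plotting coordinates used for the contour plot above:

# Convert the MAP estimate into the (x, y) coordinates:
log(model.map$par[1] / model.map$par[2])   # x = log(a/b), roughly -1.78
log(model.map$par[1] + model.map$par[2])   # y = log(a+b), roughly 2.54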
First, we will write a function to compute the probability density of a particular $\theta$, as an integral $P(\theta|\mathbf{c}) = \iint P(\theta|\alpha,\beta)P(\alpha,\beta|\mathbf{c}) \, \mathrm{d}\alpha\, \mathrm{d}\beta$.
This function is over theta (it will return a vector of densities, one corresponding to each value of theta). The logScale parameter is a normalizing constant used to make the log likelihoods exponentiable within the limits of floating-point accuracy; a value close to the maximum log likelihood computed by the xydensity function is a good choice.
model.theta.post = function(theta, logScale, counts, totals, prior=model.prior, loglike=model.loglike,
                            parallel=!identical(Sys.info()[["sysname"]], "Windows")) {
    aaply(theta, 1, function(thv) {
        # Outer integral over beta...
        integrate(function(beta) {
            aaply(beta, 1, function(bv) {
                # ...inner integral over alpha
                integrate(function(alpha) {
                    ll = loglike(alpha, bv, counts, totals)
                    # Shift by logScale before exponentiating, to avoid underflow
                    like = exp(ll - logScale)
                    pth = dbeta(thv, alpha, bv)
                    pth * prior(alpha, bv) * like
                }, 0, Inf, rel.tol = 0.001)$value
            })
        }, 0, Inf, rel.tol = 0.001)$value
    }, .parallel=parallel)
}
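To see why the logScale shift matters (a minimal illustration, not in the original): our log posteriors sit around $-740$, and exponentiating numbers that negative underflows double-precision floats, while values shifted to sit near zero are perfectly representable:

exp(-776)        # underflows to exactly 0 (anything below about -745 does)
exp(-776 + 738)  # about 3e-17: finite and usable after shifting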
What's our maximum density? We'll use that as a starting point for scaling.
model.map$value
Let's compute; we do a second normalization to account for the fact that we are computing on a grid with spacing 0.005, to make the area under the curve sum to 1. This computation takes a little while.
theta.post = data.frame(Theta=seq(0.01, 0.99, 0.005)) %>%
mutate(RawPostDens = model.theta.post(Theta, model.map$value + 1,
rat.data$Infected, rat.data$Total)) %>%
mutate(PostDens = RawPostDens / sum(RawPostDens * 0.005))
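As an illustrative cross-check (not in the original), the posterior mean of $\theta$ under this integrated density should land near the mean we estimated from the grid with expvals:

sum(theta.post$Theta * theta.post$PostDens * 0.005)  # expect roughly 0.14-0.15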
ggplot(theta.post) +
aes(x=Theta, y=PostDens) +
geom_line()
Now we are going to use Stan to run an MCMC simulation of the posterior distribution.
writeLines(readLines("ratmodel.stan"))
data {
  int<lower=0> J;
  int<lower=0> n[J];
  int<lower=0> y[J];
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0,upper=1> theta[J];
}
model {
  theta ~ beta(alpha, beta);
  y ~ binomial(n, theta);
}
rat_fit = stan(file="ratmodel.stan", data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
iter=2000, chains=4, control=list(adapt_delta=0.99))
Warning message:
“There were 76 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
Warning message:
“There were 30 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded”
Warning message:
“There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low”
Warning message:
“Examine the pairs() plot to diagnose sampling problems”
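These warnings are worth heeding before trusting the fit. One standard rstan diagnostic (a sketch, not in the original notebook) counts post-warmup divergences per chain:

# Each element of get_sampler_params() is a matrix of per-iteration diagnostics:
sapply(get_sampler_params(rat_fit, inc_warmup = FALSE),
       function(ch) sum(ch[, "divergent__"]))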
print(rat_fit)
Inference for Stan model: ratmodel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean  se_mean       sd    2.5%     25%     50%     75%    97.5% n_eff Rhat
alpha      2098.59  2383.60  4957.45    1.65    2.61    3.67   27.73 17130.45     4 4.40
beta      10124.90 11645.87 23651.40    9.75   15.84   22.16  173.69 81715.78     4 4.25
theta[1]      0.10     0.03     0.05    0.01    0.05    0.09    0.15     0.18     4 1.48
⋮
theta[70]     0.24     0.03     0.07    0.14    0.17    0.23    0.29     0.40     5 1.27
theta[71]     0.19     0.00     0.06    0.09    0.15    0.17    0.22     0.34  4000 1.03
lp__       -713.48    71.18   104.29 -790.11 -776.16 -767.13 -726.07  -487.68     2 6.70

Samples were drawn using NUTS(diag_e) at Fri Jan 19 08:24:26 2018.
For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the
potential scale reduction factor on split chains (at convergence, Rhat=1).
pairs(rat_fit, pars=c("alpha", "beta", "lp__"))
rat_sim = rstan::extract(rat_fit, permuted=TRUE)
n_sims = length(rat_sim$lp__)
n_sims
theta_sims = data_frame(alpha=rat_sim$alpha, beta=rat_sim$beta) %>%
mutate(Theta=rbeta(n(), alpha, beta))
summary(theta_sims)
     alpha                beta               Theta         
 Min.   :    0.996   Min.   :    6.13   Min.   :0.002141  
 1st Qu.:    2.611   1st Qu.:   15.84   1st Qu.:0.102461  
 Median :    3.673   Median :   22.16   Median :0.152477  
 Mean   : 2098.586   Mean   :10124.90   Mean   :0.148635  
 3rd Qu.:   27.729   3rd Qu.:  173.69   3rd Qu.:0.176309  
 Max.   :17826.122   Max.   :86848.38   Max.   :0.509650  
theta_dens = density(theta_sims$Theta)
ggplot(data_frame(x=theta_dens$x, y=theta_dens$y)) +
aes(x, y) +
geom_line()
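The trouble above (an $\widehat{R}$ of 4.4 for $\alpha$, plus divergent transitions) suggests the sampler struggles with the long ridge in $(\alpha, \beta)$ space, so we refit with a common reparameterization: sample the prior mean $\phi$ and "prior sample size" $\lambda$ instead, where

$$\phi = \frac{\alpha}{\alpha+\beta}, \qquad \lambda = \alpha + \beta, \qquad \text{so} \quad \alpha = \lambda\phi, \quad \beta = \lambda(1-\phi),$$

exactly as the transformed parameters block below computes.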
writeLines(readLines("ratxmodel.stan"))
data {
  int<lower=0> J;
  int<lower=0> n[J];
  int<lower=0> y[J];
}
parameters {
  real<lower=0,upper=1> phi;
  real<lower=0.1> lambda;
  real<lower=0,upper=1> theta[J];
}
transformed parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  alpha = lambda * phi;
  beta = lambda * (1 - phi);
}
model {
  phi ~ beta(1,1);
  lambda ~ pareto(0.1, 1.5);
  theta ~ beta(alpha, beta);
  y ~ binomial(n, theta);
}
ratx_fit = stan(file="ratxmodel.stan",
data=list(J=nrow(rat.data), y=rat.data$Infected, n=rat.data$Total),
iter=2000, chains=4, control=list(adapt_delta=0.99))
hash mismatch so recompiling; make sure Stan code ends with a blank line
print(ratx_fit)
Inference for Stan model: ratxmodel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
phi          0.14    0.00 0.01    0.12    0.14    0.14    0.15    0.17  3290 1.00
lambda      14.94    0.19 5.09    7.87   11.35   14.07   17.55   27.21   691 1.01
theta[1]     0.06    0.00 0.04    0.01    0.03    0.05    0.08    0.16  4000 1.00
⋮
theta[70]    0.29    0.00 0.08    0.16    0.23    0.28    0.34    0.45  4000 1.00
theta[71]    0.21    0.00 0.08    0.09    0.16    0.21    0.26    0.38  4000 1.00
alpha        2.15    0.03 0.73    1.14    1.64    2.01    2.52    3.93   647 1.01
beta        12.79    0.17 4.39    6.68    9.70   12.01   15.04   23.39   707 1.01
lp__      -789.34    0.35 8.79 -806.87 -795.18 -789.23 -783.18 -772.64   616 1.01

Samples were drawn using NUTS(diag_e) at Fri Jan 19 08:44:25 2018.
For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the
potential scale reduction factor on split chains (at convergence, Rhat=1).
ratx_sim = rstan::extract(ratx_fit, permuted=TRUE)
n_sims = length(ratx_sim$lp__)
n_sims
thetax_sims = data_frame(alpha=ratx_sim$alpha, beta=ratx_sim$beta) %>%
mutate(Theta=rbeta(n(), alpha, beta))
summary(thetax_sims)
     alpha            beta            Theta           
 Min.   :0.7488   Min.   : 4.179   Min.   :0.0002697  
 1st Qu.:1.6366   1st Qu.: 9.655   1st Qu.:0.0725437  
 Median :2.0138   Median :11.967   Median :0.1277092  
 Mean   :2.1246   Mean   :12.589   Mean   :0.1441277  
 3rd Qu.:2.4784   3rd Qu.:14.725   3rd Qu.:0.1963121  
 Max.   :6.5699   Max.   :38.993   Max.   :0.7375433  
thetax_dens = density(thetax_sims$Theta)
ggplot(data_frame(x=thetax_dens$x, y=thetax_dens$y)) +
aes(x, y) +
geom_line()
posteriors = bind_rows(Integrated=select(theta.post, Theta, Density=PostDens),
MCMC=data_frame(Theta=theta_dens$x, Density=theta_dens$y),
MCMC_RP=data_frame(Theta=thetax_dens$x, Density=thetax_dens$y),
.id="Method")
ggplot(posteriors) +
aes(x=Theta, y=Density, color=Method) +
geom_line()