# For starters, clear all variables and graphic devices and load necessary packages:
rm(list = ls())
graphics.off()
library(repr)
options(repr.plot.width=5, repr.plot.height=4) # Change default plot size; not necessary if you are using Rstudio
We will work with several practical examples here. These assume that you have at least a conceptual understanding of what Linear vs Non-linear models are, how they are fitted to data, and how the fits can be assessed statistically. If not, you may want to see the Linear Models (video here) and NLLS lectures first.
You will need the minpack.lm R package, which you can install using the standard method (Linux users, launch R in sudo mode first):
> install.packages("minpack.lm")
Why minpack.lm? The standard NLLS function in R, called nls, uses the less robust Gauss-Newton algorithm. Therefore, nls will often fail to fit your model to the data if the starting values you provide for the parameters are too far from the optimal ones, especially if the "parameter space" is weirdly shaped, i.e., the model has a mathematical form that makes it hard to find parameter combinations that minimize the residual sum of squares (RSS). If this does not make sense, don't worry about it; just use nlsLM from the minpack.lm package instead of nls! If you are really curious, try substituting nls for nlsLM in the examples below and compare the results.
Now load the minpack.lm
package:
require("minpack.lm") # for Levenberg-Marquardt nlls fitting
Loading required package: minpack.lm
Our first set of examples will focus on traits.
A trait is any measurable feature of an individual organism. This includes physical traits (e.g., morphology, body mass, wing length), performance traits (e.g., respiration rate, body velocity, fecundity), and behavioral traits (e.g., feeding preference, foraging strategy, mate choice). All natural populations show variation in traits across individuals. A trait is functional when it directly (e.g., mortality rate) or indirectly (e.g., somatic development or growth rate) determines individual fitness. Therefore, variation in (functional) traits can generate variation in the rate of increase and persistence of populations. When measured in the context of life cycles, without considering interactions with other organisms (e.g., predators or prey of the vector), functional traits are typically called life history traits (such as mortality rate and fecundity). Other traits determine interactions both within the vector population (e.g., intra-specific interference or mating frequency) and between vectors and other species, including the species which may act as resources (prey, for example). Thus both life history and interaction traits determine vector population fitness and therefore abundance, which ultimately influences disease transmission rate.
Let's start with a common and reasonably simple example from biology: allometric scaling. Allometric relationships between linear measurements such as body length, wing span, and thorax width are a good way to obtain estimates of body weights of individual vectors such as mosquitoes, ticks, and flies. We will look at the allometric scaling of body weight vs. total body length in dragonflies and damselflies (yes, these are not known vectors!).
Allometric relationships take the form:
$$y = ax^b$$
where $x$ and $y$ are morphological measures (body length and body weight, respectively, in our current example), the constant $a$ is the value of $y$ at body length $x = 1$ unit, and $b$ is the scaling "exponent". This is also called a power law, because $y$ relates to $x$ through a simple power.
First create a function object for the power law model:
powMod <- function(x, a, b) {
return(a * x^b)
}
Now get the data (first click on link and use "Save as" or Ctrl+S
to download it as a csv). Then, save it in your data
directory. After that, import it into your R workspace:
MyData <- read.csv("../data/GenomeSize.csv")
head(MyData)
Suborder | Family | Species | GenomeSize | GenomeSE | GenomeN | BodyWeight | TotalLength | HeadLength | ThoraxLength | AdbdomenLength | ForewingLength | HindwingLength | ForewingArea | HindwingArea | MorphologyN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Anisoptera | Aeshnidae | Aeshna canadensis | 2.20 | NA | 1 | 0.159 | 67.58 | 6.83 | 11.81 | 48.94 | 45.47 | 45.40 | 369.57 | 483.61 | 2 |
Anisoptera | Aeshnidae | Aeshna constricta | 1.76 | 0.06 | 4 | 0.228 | 71.97 | 6.84 | 10.72 | 54.41 | 46.00 | 45.48 | 411.15 | 517.38 | 3 |
Anisoptera | Aeshnidae | Aeshna eremita | 1.85 | NA | 1 | 0.312 | 78.80 | 6.27 | 16.19 | 56.33 | 51.24 | 49.47 | 460.72 | 574.33 | 1 |
Anisoptera | Aeshnidae | Aeshna tuberculifera | 1.78 | 0.10 | 2 | 0.218 | 72.44 | 6.62 | 12.53 | 53.29 | 49.84 | 48.82 | 468.74 | 591.42 | 2 |
Anisoptera | Aeshnidae | Aeshna umbrosa | 2.00 | NA | 1 | 0.207 | 73.05 | 4.92 | 11.11 | 57.03 | 46.51 | 45.97 | 382.48 | 481.44 | 1 |
Anisoptera | Aeshnidae | Aeshna verticalis | 1.59 | NA | 1 | 0.220 | 66.25 | 6.48 | 11.64 | 48.13 | 45.91 | 44.91 | 400.40 | 486.97 | 1 |
Anisoptera are dragonflies, and Zygoptera are Damselflies. The variables of interest are BodyWeight
and TotalLength
. Let's use the dragonflies data subset.
So subset the data accordingly and remove NAs:
Data2Fit <- subset(MyData,Suborder == "Anisoptera")
Data2Fit <- Data2Fit[!is.na(Data2Fit$TotalLength),] # remove NA's
Plot it:
plot(Data2Fit$TotalLength, Data2Fit$BodyWeight)
Or, using ggplot
:
library("ggplot2")
ggplot(Data2Fit, aes(x = TotalLength, y = BodyWeight)) + geom_point()
Remember, when you write this analysis into a stand-alone R script, you should put all commands for loading packages (library()
, require()
) at the start of the script.
Now fit the model to the data using NLLS:
PowFit <- nlsLM(BodyWeight ~ powMod(TotalLength, a, b), data = Data2Fit, start = list(a = .1, b = .1))
Before proceeding further, have a look at what nlsLM's arguments are:
?nlsLM
Note that NLLS fitting requires "starting values" for the parameters (two in this case: a
and b
).
Having obtained the fit, we can use summary()
just like we would for a lm()
fit object.
summary(PowFit)
Formula: BodyWeight ~ powMod(TotalLength, a, b) Parameters: Estimate Std. Error t value Pr(>|t|) a 3.941e-06 2.234e-06 1.764 0.083 . b 2.585e+00 1.348e-01 19.174 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.02807 on 58 degrees of freedom Number of iterations to convergence: 39 Achieved convergence tolerance: 1.49e-08
Most of the output is analogous to the output of an lm() fit. However, further statistical inference here cannot be done using Analysis of Variance (ANOVA), because the model is not a Linear Model. Try anova(PowFit), and see what happens. The Number of iterations to convergence and Achieved convergence tolerance stem from the fact that NLLS fitting is an iterative numerical optimization procedure rather than an exact calculation; revisit the lecture for an explanation of this.
Now let's visualize the fit. For this, first we need to generate a vector of body lengths (the x-axis variable) for plotting:
Lengths <- seq(min(Data2Fit$TotalLength),max(Data2Fit$TotalLength),len=200)
Next, calculate the predicted line. For this, we will need to extract the coefficient from the model fit object using the coef()
command.
coef(PowFit)["a"]
coef(PowFit)["b"]
So, we can do the following:
Predic2PlotPow <- powMod(Lengths,coef(PowFit)["a"],coef(PowFit)["b"])
Now plot the data and the fitted model line:
plot(Data2Fit$TotalLength, Data2Fit$BodyWeight)
lines(Lengths, Predic2PlotPow, col = 'blue', lwd = 2.5)
We can calculate the confidence intervals on the estimated parameters as we would in the OLS fitting used for Linear Models:
confint(PowFit)
Waiting for profiling to be done...
2.5% | 97.5% | |
---|---|---|
a | 1.171935e-06 | 1.205273e-05 |
b | 2.318292e+00 | 2.872287e+00 |
As you likely have learnt before, a coefficient's CI should not include zero for it to be statistically significant (different from zero).
(a) Make the same plot as above, fitted line and all, in ggplot, and add (display) the equation you estimated to your new (ggplot) plot. The equation is: $\text{Weight} = 3.94 \times 10^{-6} \times \text{Length}^{2.59}$
(b) Try playing with the starting values, and see if you can "break" the model fitting -- that is, change the starting values till the NLLS fitting does not converge on a solution.
(c) Repeat the model fitting (including a-b above) using the Zygoptera data subset.
(d) There is an alternative (and, in fact, more commonly used) approach for fitting the allometric model to data: using Ordinary Least Squares on bi-logarithmically transformed data. That is, if you take the log of both sides of the allometric equation, you get
$$\log(y) = \log(a) + b\log(x)$$
This is a straight-line equation of the form $c = d + bz$, where $c = \log(y)$, $d = \log(a)$, $z = \log(x)$, and $b$ is now the slope parameter. So you can use Ordinary Least Squares and the linear models framework (with lm()) in R to estimate the parameters of the allometric equation (see the sketch after this list for a starting point).
In this exercise, try comparing the NLLS vs OLS methods to see how much difference you get in the parameter estimates between them. For example, see the methods used in this paper by Cohen et al 2012.
(e) The allometry between body weight and length is not the end of the story. You have a number of other linear morphological measurements (HeadLength, ThoraxLength, AdbdomenLength, ForewingLength, HindwingLength, ForewingArea, and HindwingArea) that can also be investigated. In this exercise, try two lines of investigation (again, repeated separately for dragonflies and damselflies):
(i) How does each of these measures allometrically scale with body length (obtain estimates of the scaling constant and exponent)? (Hint: you may want to use the pairs() command in R to get an overview of all the pairs of potential scaling relationships.)
(ii) Do any of the linear morphological measurements other than body length better predict Body weight? That is, does body weight scale more tightly with a linear morphological measurement other than total body length? You would use model selection here, which we will learn next. But for now, you can just look at and compare the R2 values of the models.
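As a starting point for exercise (d) above, here is a minimal sketch of the log-log OLS fit (the object name LogLogFit is ours, not part of the original analysis); back-transforming the intercept gives an estimate of a, and the slope estimates b:
LogLogFit <- lm(log(BodyWeight) ~ log(TotalLength), data = Data2Fit) # OLS on bi-logarithmically transformed data
exp(coef(LogLogFit)["(Intercept)"]) # estimate of a on the original scale
coef(LogLogFit)["log(TotalLength)"] # estimate of the scaling exponent b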
How do we know that there isn't a better or alternative model that adequately explains the pattern in your dataset?
This is an important consideration in all data analyses (and, more generally, the scientific method!), so you should aim to compare your NLLS model with one or more alternatives for a more extensive and reliable investigation of the problem.
Let's use model comparison to investigate whether the relationship between body weight and length we found above is indeed allometric. For this, we need an alternative model that can be fitted to the same data. Let's try a quadratic curve, which is of the form:
$$y = a + bx + cx^2$$
This can also capture curvature in data, and is an alternative model to the allometric equation. Note that this model is linear in its parameters (it is a linear model), which you can fit to the data using your familiar lm() function:
QuaFit <- lm(BodyWeight ~ poly(TotalLength,2), data = Data2Fit)
And like before, we obtain the predicted values (but this time using the predict.lm
function):
Predic2PlotQua <- predict.lm(QuaFit, data.frame(TotalLength = Lengths))
Now let's plot the two fitted models together:
plot(Data2Fit$TotalLength, Data2Fit$BodyWeight)
lines(Lengths, Predic2PlotPow, col = 'blue', lwd = 2.5)
lines(Lengths, Predic2PlotQua, col = 'red', lwd = 2.5)
Very similar fits, except that the quadratic model seems to deviate a bit from the data at the lower end of the data range. Let's now do a proper, formal model comparison to check which model better fits the data.
First calculate the R2 values of the two fitted models:
RSS_Pow <- sum(residuals(PowFit)^2) # Residual sum of squares
TSS_Pow <- sum((Data2Fit$BodyWeight - mean(Data2Fit$BodyWeight))^2) # Total sum of squares
RSq_Pow <- 1 - (RSS_Pow/TSS_Pow) # R-squared value
RSS_Qua <- sum(residuals(QuaFit)^2) # Residual sum of squares
TSS_Qua <- sum((Data2Fit$BodyWeight - mean(Data2Fit$BodyWeight))^2) # Total sum of squares
RSq_Qua <- 1 - (RSS_Qua/TSS_Qua) # R-squared value
RSq_Pow
RSq_Qua
Not very useful. In general, R2 is a good measure of model fit, but it cannot be used for model selection -- especially not here, given the tiny difference in the R2 values.
Instead, as explained in the lecture, we can use the Akaike Information Criterion (AIC):
n <- nrow(Data2Fit) #set sample size
kPow <- length(coef(PowFit)) # get number of parameters in power law model
kQua <- length(coef(QuaFit)) # get number of parameters in quadratic model
AIC_Pow <- n * log((2 * pi) / n) + n + 2 + n * log(RSS_Pow) + 2 * kPow
AIC_Qua <- n * log((2 * pi) / n) + n + 2 + n * log(RSS_Qua) + 2 * kQua
AIC_Pow - AIC_Qua
Of course, as you might have suspected, we can do this using an in-built function in R!
AIC(PowFit) - AIC(QuaFit)
So which model wins? As we discussed in the NLLS lecture, a rule of thumb is that an AIC difference (typically denoted ΔAIC) > 2 is an acceptable cutoff for calling a winner. So the power law (allometric) model is the better fit here. Read the Johnson & Omland paper for more on model selection in Ecology and Evolution.
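As a small illustration (not part of the original analysis), the rule of thumb can be wrapped in a couple of lines of R:
DeltaAIC <- AIC(QuaFit) - AIC(PowFit) # positive values favour the power-law model
if (abs(DeltaAIC) > 2) {
    print(ifelse(DeltaAIC > 0, "Power-law model favoured", "Quadratic model favoured"))
} else {
    print("Delta AIC < 2: no clear winner")
}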
(a) Calculate the Bayesian Information Criterion (BIC), also known as the Schwarz Criterion (see your lecture notes and the Johnson & Omland paper), and use ΔBIC to select the better-fitting model.
(b) Fit a straight line to the same data and compare with the allometric and quadratic models.
(c) Repeat the model comparison (including (a)-(b) above) using the Damselflies (Zygoptera) data subset -- does the allometric model still win?
(d) Repeat exercises (e)(i) and (ii) from the above set, but with model comparison (e.g., again using a quadratic as an alternative model) to establish that the relationships are indeed allometric.
(e) Repeat exercise (e)(ii) from the above set, but with model comparison to establish which linear measurement is the best predictor of body weight.
alb<-read.csv(file="../data/albatross_grow.csv")
alb<-subset(x=alb, !is.na(alb$wt))
plot(alb$age, alb$wt, xlab="age (days)", ylab="weight (g)", xlim=c(0, 100))
Let's fit multiple models to this dataset.
The Von Bertalanffy model is commonly used for modelling the growth of an individual. Its formulation is:
$$W(t) = \rho \left( L_\infty (1 - e^{-Kt}) + L_0 e^{-Kt} \right)^3$$
If we pull out $L_\infty$ and define $c = L_0/L_\infty$ and $W_\infty = \rho L_\infty^3$, this equation becomes:
$$W(t) = W_\infty \left( 1 - e^{-Kt} + c\,e^{-Kt} \right)^3$$
Here $W_\infty$ is interpreted as the mean asymptotic weight, and $c$ as the ratio between the initial and final lengths. This second equation is the one we will fit.
We will compare this model against the classical Logistic growth equation, and a straight line. First, as we did before, let's define the R functions for the two models:
logistic1<-function(t, r, K, N0){
N0*K*exp(r*t)/(K+N0*(exp(r*t)-1))
}
vonbert.w<-function(t, Winf, c, K){
Winf*(1 - exp(-K*t) + c*exp(-K*t))^3
}
For the straight line, we simply use R's lm() function, as that is a linear least squares problem. Using NLLS would give (approximately) the same answer, of course. Now fit all 3 models using least squares.
We will scale the data before fitting to improve the stability of the estimates:
scale<-4000
alb.lin<-lm(wt/scale~age, data=alb)
alb.log<-nlsLM(wt/scale~logistic1(age, r, K, N0), start=list(K=1, r=0.1, N0=0.1), data=alb)
alb.vb<-nlsLM(wt/scale~vonbert.w(age, Winf, c, K), start=list(Winf=0.75, c=0.01, K=0.01), data=alb)
Next let's calculate predictions for each of the models across a range of ages.
ages<-seq(0, 100, length=1000)
pred.lin<-predict(alb.lin, newdata = list(age=ages))*scale
pred.log<-predict(alb.log, newdata = list(age=ages))*scale
pred.vb<-predict(alb.vb, newdata = list(age=ages))*scale
And finally plot the data with the fits:
plot(alb$age, alb$wt, xlab="age (days)", ylab="weight (g)", xlim=c(0,100))
lines(ages, pred.lin, col=2, lwd=2)
lines(ages, pred.log, col=3, lwd=2)
lines(ages, pred.vb, col=4, lwd=2)
legend("topleft", legend = c("linear", "logistic", "Von Bert"), lwd=2, lty=1, col=2:4)
Next examine the residuals between the 3 models:
par(mfrow=c(3,1), bty="n")
plot(alb$age, resid(alb.lin), main="LM resids", xlim=c(0,100))
plot(alb$age, resid(alb.log), main="Logistic resids", xlim=c(0,100))
plot(alb$age, resid(alb.vb), main="VB resids", xlim=c(0,100))
The residuals for all 3 models still exhibit some patterns. In particular, the data seem to decline near the end of the observation period, but none of these models can capture that behavior.
Finally, let's compare the 3 models using a simpler approach than the AIC/BIC one that we used above by calculating adjusted Sums of Squared Errors (SSE's):
n<-length(alb$wt)
list(lin=signif(sum(resid(alb.lin)^2)/(n-2*2), 3),
log= signif(sum(resid(alb.log)^2)/(n-2*3), 3),
vb= signif(sum(resid(alb.vb)^2)/(n-2*3), 3))
The logistic model has the lowest adjusted SSE, so it's the best by this measure. It is also, visually, a better fit.
(a) Use AIC/BIC to perform model selection on the Albatross data as we did for the trait allometry example.
(b) Write this example as a self-sufficient R script, using ggplot instead of base plotting.
Now let's actually look at a disease vector example! These data measure the response of Aedes aegypti fecundity to temperature.
First load and visualize the data:
aedes<-read.csv(file="../data/aedes_fecund.csv")
plot(aedes$T, aedes$EFD, xlab="temperature (C)", ylab="Eggs/day")
Let's define some models first:
quad1 <- function(T, T0, Tm, c){
c*(T-T0)*(T-Tm)*as.numeric(T<Tm)*as.numeric(T>T0)
}
briere <- function(T, T0, Tm, c){
c*T*(T-T0)*(abs(Tm-T)^(1/2))*as.numeric(T<Tm)*as.numeric(T>T0)
}
Instead of using the inbuilt quadratic function in R, we define our own to make it easier to choose starting values, and so that we can force the function to equal zero below the minimum and above the maximum temperature thresholds (more on this below). The Briere function is a commonly used model for the temperature dependence of insect traits. As in the case of the albatross growth data, we will also compare these two with a straight line (again, it's a linear model, so we can just use lm() without needing to define a function for it).
Now fit all 3 models using least squares. Although it's not as necessary here (as the data don't have as large values as the albatross example), we will again scale the data first:
scale <- 20
aed.lin <- lm(EFD/scale~T, data=aedes)
aed.quad <- nlsLM(EFD/scale~quad1(T, T0, Tm, c), start=list(T0=10, Tm=40, c=0.01), data=aedes)
aed.br <- nlsLM(EFD/scale~briere(T, T0, Tm, c), start=list(T0=10, Tm=40, c=0.1), data=aedes)
(a) Complete the Aedes data analysis by fitting the models, calculating predictions, and then comparing the models. Write a single, self-standing script for it. Which model fits best? By what measure?
(b) In this script, use ggplot instead of base plotting.
Fluctuations in the abundance (density) of a disease's vector play a crucial role in its transmission dynamics. This is especially true if vector population densities or their traits change at the same or shorter timescales than the rate of disease transmission. Indeed, most vectors are small ectotherms with short generation times and greater sensitivity to environmental conditions than their (invariably larger, longer-lived, and often endothermic) hosts. So understanding how vector populations vary over time, space, and with respect to environmental variables such as temperature and precipitation is key. We will look at fitting models to the growth of a single population here. Time series analyses are covered in a separate session.
A population grows exponentially while its abundance is low and resources are not limiting (the Malthusian principle). This growth then slows and eventually stops as resources become limiting. There may also be a time lag before the population growth really takes off at the start. We will focus on microbial (specifically, bacterial) growth rates. Bacterial growth in batch culture follows a distinct set of phases: lag phase, exponential phase, and stationary phase. During the lag phase a suite of transcriptional machinery is activated, including genes involved in nutrient uptake and metabolic changes, as bacteria prepare for growth. During the exponential growth phase, bacteria divide at a constant rate, the population doubling with each generation. When the carrying capacity of the media is reached, growth slows and the number of cells in the culture stabilises, beginning the stationary phase.
Traditionally, microbial growth rates were measured by plotting cell numbers or culture density against time on a semi-log graph and fitting a straight line through the exponential growth phase – the slope of the line gives the maximum growth rate (rmax). Models have since been developed which we can use to describe the whole sigmoidal bacterial growth curve.
Let's first generate some "data" on the number of bacterial cells as a function of time that we can play with:
time <- c(0, 2, 4, 6, 8, 10, 12, 16, 20, 24) # timepoints, in hours
log_cells <- c(3.62, 3.62, 3.63, 4.14, 5.23, 6.27, 7.57, 8.38, 8.70, 8.69) # logged cell counts - more on this below
set.seed(1234) # set seed to ensure you always get the same random sequence of fluctuations
data <- data.frame(time, log_cells + rnorm(length(time),sd=.1)) # add some random error
names(data) <- c("t", "LogN")
head(data)
t | LogN |
---|---|
0 | 3.499293 |
2 | 3.647743 |
4 | 3.738444 |
6 | 3.905430 |
8 | 5.272912 |
10 | 6.320606 |
We have added a vector of normally distributed errors to emulate random "sampling errors". Note also that the assumption of normality of errors underlies the statistical analyses of ordinary NLLS fits, just as it underlies Ordinary Least Squares (your standard linear modelling). In this case, we are talking about log-normality because we are using logged cell counts. Why log them? Because NLLS often converges better if you linearize the data (and, correspondingly, the model; see how the models are specified below).
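As a quick aside, here is a sketch of the traditional approach described earlier: fit a straight line through the roughly exponential-phase portion of the logged counts and take the slope as an estimate of rmax (the time window chosen below is an eyeball judgement, not part of the original analysis):
exp_phase <- subset(data, t >= 6 & t <= 16) # approximately the exponential phase, chosen by eye
rmax_trad <- coef(lm(LogN ~ t, data = exp_phase))["t"] # slope = maximum growth rate (in log10 units per hour)
rmax_trad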
Plot the data:
ggplot(data, aes(x = t, y = LogN)) + geom_point()
We will fit three growth models, all of which are known to fit such population growth data, especially in microbes. These are a modified Gompertz model (Zwietering et. al., 1990), the Baranyi model (Baranyi, 1993) and the Buchanan model (or three-phase logistic model; Buchanan, 1997). Given a set of cell numbers (N) and times (t), each growth model can be described in terms of:
N0: Initial cell culture (Population) density (number of cells per unit volume)
Nmax: Maximum culture density (aka "carrying capacity")
rmax: Maximum growth rate
tlag: Duration of the lag phase before the population starts growing exponentially
First let's specify the model functions:
baranyi_model <- function(t, r_max, N_max, N_0, t_lag){ # Baranyi model (Baranyi 1993)
return(N_max + log10((-1+exp(r_max*t_lag) + exp(r_max*t))/(exp(r_max*t) - 1 + exp(r_max*t_lag) * 10^(N_max-N_0))))
}
buchanan_model <- function(t, r_max, N_max, N_0, t_lag){ # Buchanan model - three phase logistic (Buchanan 1997)
return(N_0 + (t >= t_lag) * (t <= (t_lag + (N_max - N_0) * log(10)/r_max)) * r_max * (t - t_lag)/log(10) +
(t >= t_lag) * (t > (t_lag + (N_max - N_0) * log(10)/r_max)) * (N_max - N_0))
}
gompertz_model <- function(t, r_max, N_max, N_0, t_lag){ # Modified gompertz growth model (Zwietering 1990)
return(N_0 + (N_max - N_0) * exp(-exp(r_max * exp(1) * (t_lag - t)/((N_max - N_0) * log(10)) + 1)))
}
It is important to note that we have written the functions in log scale (base 10 here; it could also be base 2 or natural log) because we want to do the fitting in log scale (both model and data linearized). The interpretation of each of the estimated/fitted parameters does not change if we take a log of the model's equation.
Now let's generate some starting values for the NLLS fitting. We did not pay much attention to what starting values we used in the above example on fitting an allometric model because the power-law model is easy to fit using NLLS, and starting far from the optimal parameters does not matter too much. Here, we derive the starting values by using the actual data:
N_0_start <- min(data$LogN) # lowest observed (log) abundance
N_max_start <- max(data$LogN) # highest observed (log) abundance
t_lag_start <- data$t[which.max(diff(diff(data$LogN)))] # time at which growth accelerates most sharply (maximum second difference), a rough end of the lag phase
r_max_start <- max(diff(data$LogN))/mean(diff(data$t)) # steepest rise in log abundance between consecutive time points, per unit (mean) sampling interval
Now fit the models:
fit_baranyi <- nlsLM(LogN ~ baranyi_model(t = t, r_max, N_max, N_0, t_lag), data,
list(t_lag=t_lag_start, r_max=r_max_start, N_0 = N_0_start, N_max = N_max_start))
fit_buchanan <- nlsLM(LogN ~ buchanan_model(t = t, r_max, N_max, N_0, t_lag), data,
list(t_lag=t_lag_start, r_max=r_max_start, N_0 = N_0_start, N_max = N_max_start))
fit_gompertz <- nlsLM(LogN ~ gompertz_model(t = t, r_max, N_max, N_0, t_lag), data,
list(t_lag=t_lag_start, r_max=r_max_start, N_0 = N_0_start, N_max = N_max_start))
You might get a warning that one or more of the models generated some NaNs during the fitting procedure for the given data. You can ignore the warning in this case. But not always: sometimes these NaNs mean that the equation is wrongly written, or that it generates NaNs across the whole range of the x-values, in which case the model is inappropriate for these data.
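As a quick sanity check (a sketch, not part of the original analysis), you can evaluate one of the fitted models across the observed time points and confirm that it returns finite values everywhere:
pred_check <- baranyi_model(t = data$t, r_max = coef(fit_baranyi)["r_max"], N_max = coef(fit_baranyi)["N_max"], N_0 = coef(fit_baranyi)["N_0"], t_lag = coef(fit_baranyi)["t_lag"])
any(!is.finite(pred_check)) # FALSE means no NaNs/Infs over the data range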
Get the model summaries:
summary(fit_baranyi)
summary(fit_buchanan)
summary(fit_gompertz)
Formula: LogN ~ baranyi_model(t = t, r_max, N_max, N_0, t_lag) Parameters: Estimate Std. Error t value Pr(>|t|) t_lag 5.52751 0.34869 15.85 4.00e-06 *** r_max 1.41823 0.09356 15.16 5.20e-06 *** N_0 3.59209 0.08646 41.55 1.30e-08 *** N_max 8.52886 0.08352 102.12 5.94e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.143 on 6 degrees of freedom Number of iterations to convergence: 10 Achieved convergence tolerance: 1.49e-08
Formula: LogN ~ buchanan_model(t = t, r_max, N_max, N_0, t_lag) Parameters: Estimate Std. Error t value Pr(>|t|) t_lag 5.42029 0.24932 21.74 6.19e-07 *** r_max 1.36647 0.06886 19.84 1.06e-06 *** N_0 3.62849 0.07721 46.99 6.22e-09 *** N_max 8.52330 0.07721 110.39 3.73e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.1337 on 6 degrees of freedom Number of iterations to convergence: 9 Achieved convergence tolerance: 1.49e-08
Formula: LogN ~ gompertz_model(t = t, r_max, N_max, N_0, t_lag) Parameters: Estimate Std. Error t value Pr(>|t|) t_lag 5.76455 0.26923 21.41 6.77e-07 *** r_max 1.57526 0.09930 15.86 3.98e-06 *** N_0 3.60201 0.07068 50.96 3.83e-09 *** N_max 8.65943 0.08526 101.56 6.14e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.1191 on 6 degrees of freedom Number of iterations to convergence: 8 Achieved convergence tolerance: 1.49e-08
And see how the fits look:
timepoints <- seq(0, 24, 0.1)
baranyi_points <- baranyi_model(t = timepoints, r_max = coef(fit_baranyi)["r_max"], N_max = coef(fit_baranyi)["N_max"], N_0 = coef(fit_baranyi)["N_0"], t_lag = coef(fit_baranyi)["t_lag"])
buchanan_points <- buchanan_model(t = timepoints, r_max = coef(fit_buchanan)["r_max"], N_max = coef(fit_buchanan)["N_max"], N_0 = coef(fit_buchanan)["N_0"], t_lag = coef(fit_buchanan)["t_lag"])
gompertz_points <- gompertz_model(t = timepoints, r_max = coef(fit_gompertz)["r_max"], N_max = coef(fit_gompertz)["N_max"], N_0 = coef(fit_gompertz)["N_0"], t_lag = coef(fit_gompertz)["t_lag"])
df1 <- data.frame(timepoints, baranyi_points)
df1$model <- "Baranyi"
names(df1) <- c("t", "LogN", "model")
df2 <- data.frame(timepoints, buchanan_points)
df2$model <- "Buchanan"
names(df2) <- c("t", "LogN", "model")
df3 <- data.frame(timepoints, gompertz_points)
df3$model <- "Gompertz"
names(df3) <- c("t", "LogN", "model")
model_frame <- rbind(df1, df2, df3)
ggplot(data, aes(x = t, y = LogN)) +
geom_point(size = 3) +
geom_line(data = model_frame, aes(x = t, y = LogN, col = model), size = 1)
(a) Calculate the confidence intervals on the parameters of each of the three fitted models, and use model selection (using AIC and/or BIC) as you did before to see if you can determine the best-fitting model among the three.
(b) Alternatively, for a different random sequence of fluctuations, one or more of the models may fail to fit (a singular gradient matrix error). Try repeating the above fitting with a different random seed (change the integer given to the set.seed() function), or increase the sampling error by increasing the standard deviation, and see if it happens. If/when the NLLS optimization does fail to converge (i.e., the RSS minimum was not found), you can try to fix it by changing the starting values.
(c) Repeat the model comparison exercise 1000 times (you will have to write a loop), and determine whether one model generally wins more often than the others. Note that each run will generate a slightly different dataset, because we are adding a vector of random errors every time the "data" are generated. This may result in failure of the NLLS fitting to converge, in which case you will need to use the try() or tryCatch() functions (see the sketch after this list).
(d) Repeat (b), but increase the error by increasing the standard deviation of the normal error distribution, and see if there are differences in the robustness of the models to sampling/experimental errors. You may also want to try changing the distribution of the errors to some non-normal distribution and see what happens.
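For exercise (c), here is a minimal sketch (assuming the model functions and starting values defined above) of how tryCatch() can stop a single failed fit from aborting the whole loop:
fit_g <- tryCatch(
    nlsLM(LogN ~ gompertz_model(t = t, r_max, N_max, N_0, t_lag), data,
          list(t_lag = t_lag_start, r_max = r_max_start, N_0 = N_0_start, N_max = N_max_start)),
    error = function(e) NULL) # return NULL instead of stopping with an error
if (!is.null(fit_g)) AIC(fit_g) # only score fits that actually converged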
Above, we learned how to fit a mathematical model/equation to data using the least squares method (linear or nonlinear). That is, we choose the parameters of the model being fitted (e.g., a straight line) to minimize the sum of the squares of the residuals/errors around the fitted model. An alternative to minimizing the sum of squared errors is to find the parameter values that maximize the likelihood of the parameters, given the data and the model. Please see the lectures for the theoretical background to the following R examples.
We will first implement the (negative log) likelihood for simple linear regression (SLR) in R. Recall that SLR assumes every observation in the dataset was generated by the model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
That is, this is a model for the conditional distribution of $Y$ given $X$. The pdf of the normal distribution is
$$f(x) = \frac{1}{\sqrt{2\sigma^2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
In the SLR model, the conditional distribution has this form. That is, for any single observation $y_i$,
$$f(y_i \mid \beta_0, \beta_1, x_i) = \frac{1}{\sqrt{2\sigma^2\pi}} \exp\left( -\frac{(y_i - (\beta_0 + \beta_1 x_i))^2}{2\sigma^2} \right)$$
Interpreting this as a function of the parameters $\theta = \{\beta_0, \beta_1, \sigma\}$ gives us the likelihood of the $i$th data point.
As we did for the simple binomial distribution (see lecture), we can use this to estimate the parameters of the model.
First, we need to build an R function that returns the negative log-likelihood for simple linear regression (we work with the negative log-likelihood because optimizers such as optim() minimize functions, and minimizing the negative log-likelihood maximizes the likelihood):
nll.slr <- function(par, dat, ...){
args <- list(...)
b0 <- par[1]
b1 <- par[2]
X <- dat$X
Y <- dat$Y
if(!is.na(args$sigma)){
sigma <- args$sigma
} else
sigma <- par[3]
mu <- b0+b1*X
return(-sum(dnorm(Y, mean=mu, sd=sigma, log=TRUE)))
}
Note that we do something a bit different here (the "...
" bit). We do it this way because we want to be able to use R's optim()
function later.
The dnorm()
function calculates the logged (the log=TRUE
argument) probability of observing Y given mu, sigma and that X.
The negative sign on sum() is there because the optim() function in R will minimize the negative log-likelihood, which is a sum: recall that the log-likelihood of the parameters θ given data x equals the sum of the logged probability densities of observing the data x given parameters θ. We want to maximize this (log-)likelihood using optim().
Let's generate some simulated data, assuming that: β0= b0
, β1= b1
, and σ= sigma
. For this, we will generate random deviations to simulate sampling or measurement error around an otherwise perfect line of data values:
set.seed(123)
n <- 30
b0 <- 10
b1 <- 3
sigma <- 2
X <- rnorm(n, mean=3, sd=7)
Y <- b0 + b1*X + rnorm(n, mean=0, sd=sigma)
dat <- data.frame(X=X, Y=Y) # convert to a data frame
In the first line, we use set.seed() to ensure that we can reproduce the results. The seed number you choose is the starting point used in the generation of a sequence of random numbers. Now plot the "data":
plot(X, Y)
For now, let's assume that we know what β1 is. Let's build a likelihood profile for the simulated data:
N <- 50
b0s <- seq(5, 15, length=N)
mynll <- rep(NA, length=50)
for(i in 1:N){
mynll[i] <- nll.slr(par=c(b0s[i],b1), dat=dat, sigma=sigma)
}
That is, we calculate the negative log-likelihood for fixed b1, across a range (5 - 15) of b0.
Now plot the profile:
plot(b0s, mynll, type="l")
abline(v=b0, col=2)
abline(v=b0s[which.min(mynll)], col=3)
The true value for b0 (10) is the red line, while the value that minimizes the negative log-likelihood (i.e., maximizes the likelihood) is the green line. These are not exactly the same because maximum likelihood provides an estimate of the true value from data containing measurement errors (which we ourselves generated in this synthetic dataset).
If we wanted to estimate both β0 and β1 (two parameters), we need to deal with a two-dimensional maximum likelihood surface. The simplest approach is to do a grid search to find this likelihood surface.
N0 <- 100
N1 <- 101
b0s <- seq(7,12, length=N0)
b1s <- seq(1,5, length=N1)
mynll<-matrix(NA, nrow=N0, ncol=N1)
for(i in 1:N0){
for(j in 1:N1) mynll[i,j]<-nll.slr(par=c(b0s[i],b1s[j]), dat=dat, sigma=sigma)
}
ww <- which(mynll==min(mynll), arr.ind=TRUE)
b0.est <- b0s[ww[1]]
b1.est <- b1s[ww[2]]
rbind(c(b0, b1), c(b0.est, b1.est))
filled.contour(x = b0s, y = b1s, z= mynll, col=heat.colors(21),
plot.axes = {axis(1); axis(2); points(b0,b1, pch=21);
points(b0.est, b1.est, pch=8, cex=1.5); xlab="b0"; ylab="b1"})
10.00000 | 3.00 |
10.48485 | 2.96 |
There is a lot going on here. Make sure you ask one of us if some of the code does not make sense!
Again, note that the true parameter combination (asterisk) and the one that minimizes the negative log-likelihood, i.e., the maximum likelihood estimate (circle), are slightly different.
We can also look at the conditional surfaces (i.e., we look at the slice around whatever the best estimate is for the other parameter):
par(mfrow=c(1,2), bty="n")
plot(b0s, mynll[,ww[2]], type="l", xlab="b0", ylab="NLL")
plot(b1s, mynll[ww[1],], type="l", xlab="b1", ylab="NLL")
There are many alternative methods to grid searches. Since we are seeking to minimize an arbitrary function (the negative log-likelihood), we typically use a numerical optimization (descent) method. There are lots of options implemented in the optim function in R. We won't go into the details of these methods due to time constraints; most commonly, one would use:
optim()
We can now do the fitting. This involves optimization (finding the parameter values that achieve the maximum of the likelihood surface above). For this, we will use R's versatile optim() function.
The first argument for optim() is the function that you want to minimize, and the second is a vector of starting values for your parameters (as always, do a ?optim). After the main arguments, you can add whatever else you need to evaluate your function (e.g., sigma). The additional argument sigma can be "fed" to nll.slr because we used the ... convention when defining it.
fit <- optim(nll.slr, par=c(2, 1), method="L-BFGS-B", ## this is an n-D method
             lower=-Inf, upper=Inf, dat=dat, sigma=sigma)
fit
Easy as pie (once you have the recipe)! We can also fit sigma at the same time if we want:
fit <- optim(nll.slr, par=c(2, 1, 5), method="L-BFGS-B", ## this is an n-D method
             lower=c(-Inf, -Inf, 0.1), upper=Inf, dat=dat, sigma=NA)
fit$par
The starting values (b0 = 2, b1 = 1, sigma = 5) need to be supplied just as we would do for NLLS. Also note that, much like NLLS, we have bounded the parameters. The exact starting values are not too important in this case (try changing them and see what happens).
Now visualize the fit:
plot(X, Y)
abline(a=fit$par[1], b=fit$par[2], col=2, lwd=2)
The joint distribution of the MLEs is asymptotically normal. Given this, if you are minimizing the negative log-likelihood (NLL), then the covariance matrix of the estimates is (asymptotically) the inverse of the Hessian matrix. The Hessian matrix evaluates the second derivatives of the NLL (numerically here), which gives us information about the curvature of the likelihood. Thus we can use the Hessian to estimate confidence intervals:
fit <- optim(nll.slr, par=c(2, 1), method="L-BFGS-B", hessian=TRUE, lower=-Inf, upper=Inf, dat=dat, sigma=sigma)
fisher_info<-solve(fit$hessian)
est_sigma<-sqrt(diag(fisher_info))
upper<-fit$par+1.96*est_sigma
lower<-fit$par-1.96*est_sigma
interval<-data.frame(value=fit$par, upper=upper, lower=lower)
interval
value | upper | lower |
---|---|---|
10.458935 | 11.228565 | 9.689305 |
2.961704 | 3.067705 | 2.855704 |
We can, of course, simply fit the model with least squares using the lm() function:
lmfit<-lm(Y~X)
summary(lmfit)$coeff
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
(Intercept) | 10.458936 | 0.32957007 | 31.73509 | 1.699822e-23 |
X | 2.961704 | 0.04539126 | 65.24834 | 3.874555e-32 |
The estimates we get using optim() are almost identical to the estimates that we obtain here, and the standard errors on the intercept and slope are very similar to those we calculated from the Hessian (the est_sigma values computed above).
You can use AIC or BIC as you did in NLLS using the likelihood you have calculated.
You can also use the Likelihood Ratio Test (LRT).
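For example, here is a sketch of calculating AIC and BIC directly from the optim() output (assuming fit is the three-parameter fit from above, so that fit$value is the minimized negative log-likelihood and fit$par contains b0, b1, and sigma):
k <- length(fit$par) # number of estimated parameters
nll <- fit$value # minimized negative log-likelihood
AIC_mle <- 2 * k + 2 * nll # AIC = 2k - 2 log(L)
BIC_mle <- log(nrow(dat)) * k + 2 * nll # BIC = k log(n) - 2 log(L)
c(AIC = AIC_mle, BIC = BIC_mle)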
Try MLE fitting for the allometric trait data example above. You will use the same data and functions that you used to practice fitting curves using non-linear least squares methods. You have two options here. The easier one is to convert the power law model to a straight-line model by taking logs (explained in the Allometry Exercises above). Specifically,
(a) Using the nll.slr
function as an example, write a function that calculates the negative log likelihood as a function of the parameters describing your trait and any additional parameters you need for an appropriate noise distribution (e.g., σ if you have normally distributed errors).
(b) For at least one of your parameters, plot a likelihood profile given your data, with the other parameters fixed.
(c) Use the optim
function to find the MLE of the same parameter and indicate this on your likelihood profile.
(d) Obtain a confidence interval for your estimate.
A more challenging option is to fit the allometry data directly to the power law equation. You would need to assume a log-normal distribution for the errors instead of normal, in this case.
Recall from the lectures that for Bayesian model fitting/inference, we need to:
Assess MCMC convergence: MCMC is a family of algorithms for sampling probability distributions so that they can be adequately characterized (in the Bayesian context, the posterior distribution). The MCMC procedure reaches convergence once we have sufficient random draws from the posterior distribution. To assess convergence, we look at trace plots. The goal is to get "fuzzy caterpillar"-looking curves.
Summarize MCMC draws: Summarize and visualize outcome of the random draws using histograms for all draws for each parameter, and calculate expectation, variance, credibility interval, etc.
Prior Sensitivity: Assess prior sensitivity by changing the prior values and checking whether this affects the results. If it does, the results are too sensitive to that prior, which is not good!
Make inferences: We use the values from item (2) to make inferences and answer the research question.
Because likelihoods form the basis for Bayesian model fitting, we will first do an exercise to understand their calculation.
The Binomial distribution is used to model the number of "successes" in a set of trials (e.g., the number of heads when you flip a coin N times). The pmf is
$$\binom{N}{x} p^x (1-p)^{N-x}$$
Let's use the Binomial distribution to practice two methods of estimating parameters for a probability distribution: method of moments and maximum likelihood.
First take 50 draws from a binomial (using rbinom) for each $p \in \{0.1, 0.5, 0.8\}$ with $N = 20$. For this, let's set the seed so that we can reproduce this exact sequence of sampling (why?):
set.seed(54321)
## 50 draws with each p
pp<-c(0.1, 0.5, 0.8)
N<-20
reps<-50
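One possible way to take the draws (a sketch; how you organize them is up to you):
draws <- sapply(pp, function(p) rbinom(reps, size = N, prob = p)) # a 50 x 3 matrix, one column per value of p
colnames(draws) <- paste0("p=", pp)
head(draws)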
Now plot the histograms of these draws together with the density functions.
## histograms + density here
x<-seq(0, 50, by=1)
par(mfrow=c(1,3), bty="n")
# Write more code here
Q1: Do the histograms look like the distributions for all 3 values of p? If not, what do you think is going on?
You'll notice that for p=0.1 the histogram and densities don't look quite the same -- the hist()
function is lumping together the zeros and ones which makes it look off. This is typical for distributions that are truncated.
To obtain a method of moments estimator, we equate the theoretical moments (which will be a function of the parameters of the distribution) with the corresponding sample moments, and solve for the parameters in order to obtain an estimate. For the binomial distribution, there is only one parameter, p.
Q2: Given the analytic expected value, above, and assuming that the sample mean is m (the mean number of observed heads across replicates), what is the MoM estimator for p?
Now calculate the MoM estimator for each of your 3 sets of simulated data sets to get the estimates for each of your values of p.
## MOM estimators for 3 simulated sets
Q3: How good are your estimates for p? Do you get something close to the true value?
For 1 of your values of p, take 20 draws from the binomial with N=20 and calculate the MoM. Repeat this 100 times (hint: the replicate()
and lapply
functions may be useful.) Make a histogram of your estimates, and add a line/point to the plot to indicate the real value of p that you used to simulate the data.
## MoM estimates, histogram
Q4: Is the MoM successfully estimating p? Does your histogram for p look more or less normal? If so, what theorem might explain why this would be the case?
Imagine that you flip a coin N times, and then repeat the experiment n times. Thus, you have data $x = x_1, x_2, \ldots, x_n$, which are the number of times you observed a head in each trial. $p$ is the probability of obtaining a head.
Q5: Write down the likelihood and log-likelihood for the data. Take the derivative of the negative log-likelihood, set it equal to zero, and find the MLE, $\hat{p}$.
Simulate some data with p = 0.25, N = 10, and 10 replicates. Calculate the negative log-likelihood of your simulated data across a range of p (from 0 to 1), and plot it. You may do this using the built-in functions in R (specifically dbinom) or write your own function. This is called a "likelihood profile". Plot your likelihood profile with a line indicating the true value of p. Add lines indicating the MLE $\hat{p}$ and the MoM estimator of p to your likelihood profile.
pp<-.25
N<-10
reps<-10
## Make one set of data
## the likelihood is always exactly zero
## at p=0,1, so I skip those values
ps<-seq(0.01, 0.99, by=0.01)
## Likelihood
## MLE/MoM estimators
## now plot the negative log likelihood profile
Q6: How does your MLE compare to the true parameter value? How could you estimate the MLE from the likelihood profile if you didn't have a way to calculate it directly? If you choose a different random seed, do you get the same answer?
We will use this simple example to go through the steps of assessing a Bayesian model and we'll see that MCMC can allow us to approximate the posterior distribution.
Grogan and Wirth (1981) provide data on the wing length (in millimeters) of nine members of a species of midge (small, two-winged flies).
From these measurements we wish to make inference about the population mean μ.
WL.data <- read.csv("../data/MidgeWingLength.csv")
Y <- WL.data$WingLength
n <- length(Y)
hist(Y,breaks=10,xlab="Wing Length (mm)")
We might expect that these midge data could be draws from a Normal distribution N(μ,σ2). Recall that the MLEs for μ and σ2 here are simply the sample mean and sample variance respectively:
m<-sum(Y)/n
s2<-sum((Y-m)^2)/(n-1)
round(c(m, s2), 3)
x<-seq(1.4,2.2, length=50)
hist(Y,breaks=10,xlab="Wing Length (mm)", xlim=c(1.4, 2.2), freq=FALSE)
lines(x, dnorm(x, mean=m, sd=sqrt(s2)), col=2)
NOTE: I've plotted the estimate of the population distribution here, but this is not the *predictive distribution* (which would be a Student T because we're estimating both the mean and variance...).
The non-Bayesian version here has the advantage of being quick and familiar. However, from our point of view it has two weaknesses:
Because we have so few data points, estimates of the accuracy of our predictions aren't reliable: 9 points is only barely enough to estimate a mean, so we don't trust any of the variance calculations.
We can't easily incorporate things that we might already know about midges into our analysis.
Let's see how we can do a similar analysis using a Bayesian approach, first analytically, and then with JAGS.
We need to define the likelihood and the priors for our Bayesian analysis. Given the analysis that we've just done, let's assume that our data come from a normal distribution with unknown mean $\mu$, but that we know the variance is $\sigma^2 = 0.025$. That is:
$$Y_i \overset{iid}{\sim} N(\mu, \sigma^2 = 0.025)$$
Studies from other populations suggest that wing lengths are usually around 1.9 mm, so we set $\mu_0 = 1.9$.
We also know that lengths must be positive ($\mu > 0$).
We can approximate this restriction with a normal prior distribution for $\mu$ as follows. Since most of the normal density lies within two standard deviations of the mean, we choose the prior standard deviation $\sigma_0$ so that
$$\mu_0 - 2\sigma_0 > 0 \;\Rightarrow\; \sigma_0 < 1.9/2 = 0.95$$
We will use a prior variance of $\sigma_0^2 = 0.8$ (i.e., $\sigma_0 \approx 0.89$) here, which satisfies this constraint and matches the value used in the code below. Thus our prior for $\mu$ is $\mu \sim N(1.9, 0.8)$.
Together, then, our full model is:
$$Y_i \overset{iid}{\sim} N(\mu, 0.025), \qquad \mu \sim N(1.9, 0.8)$$
For this very simple case, it is easy to write down the posterior distribution (up to some constant). First, note that the likelihood of the data can be written as
$$\mathcal{L} \propto \prod_{i=1}^{n} \frac{1}{\sigma} \exp\left( -\frac{1}{2\sigma^2}(Y_i - \mu)^2 \right) = \frac{1}{\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right) \propto \exp\left( -\frac{n}{2\sigma^2} (\bar{Y} - \mu)^2 \right)$$
Multiplying the prior through, we get the following for the posterior:
$$P(\mu \mid Y) \propto \exp\left( -\frac{n}{2\sigma^2} (\bar{Y} - \mu)^2 \right) \exp\left( -\frac{1}{2\sigma_0^2} (\mu - \mu_0)^2 \right)$$
You can rearrange, complete the square, etc., to get a new expression of the form
$$P(\mu \mid Y) \propto \exp\left( -\frac{1}{2\sigma_p^2} (\mu_p - \mu)^2 \right)$$
where
$$\mu_p = \frac{n\sigma_0^2}{\sigma^2 + n\sigma_0^2} \bar{Y} + \frac{\sigma^2}{\sigma^2 + n\sigma_0^2} \mu_0, \qquad \sigma_p^2 = \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right)^{-1}$$
Instead of writing this last expression in terms of the variances, we could instead use precisions (inverse variances), which gives a simpler form:
$$\tau_p = n\tau + \tau_0$$
Just as in our earlier example, our estimate of the mean is a weighted average of the data and the prior, with the variance being determined by the data and prior variances.
So let's write a little function to calculate $\mu_p$ and $\tau_p$, and then plug in our numbers:
tau.post<-function(tau, tau0, n){n*tau + tau0}
mu.post<-function(Ybar, mu0, sig20, sig2, n){
weight<-sig2+n*sig20
return(n*sig20*Ybar/weight + sig2*mu0/weight)
}
Let's plot 3 things together -- the data histogram, the prior, and the posterior:
mu0 <- 1.9
s20 <- 0.8
s2<- 0.025 ## "true" variance
mp<-mu.post(Ybar=m, mu0=mu0, sig20=s20, sig2=s2, n=n)
tp<-tau.post(tau=1/s2, tau0=1/s20, n=n)
Let's plot the result:
x<-seq(1.3,2.3, length=1000)
hist(Y,breaks=10,xlab="Wing Length (mm)", xlim=c(1.3, 2.3),
freq=FALSE, ylim=c(0,8))
lines(x, dnorm(x, mean=mu0, sd=sqrt(s20)), col=2, lty=2, lwd=2) ## prior
lines(x, dnorm(x, mean=mp, sd=sqrt(1/tp)), col=4, lwd=2) ## posterior
legend("topleft", legend=c("prior", "posterior"), col=c(2,4), lty=c(2,1), lwd=2)
Change the values of the mean and the variance that you choose for the prior ("hyperparameters"). What does this do to the posterior distribution? E.g., what happens if the variance you choose is small and $\mu_0 = 2.5$ or so? Is this what you expect?
Let's show that we can get the same thing from JAGS that we were able to get from the analytic results. You'll need to make sure you have installed JAGS (which must be done outside of R) and then the libraries rjags
and coda
.
# Load libraries
require(rjags) # does the fitting
require(coda) # makes diagnostic plots
##require(mcmcplots) # another option for diagnostic plots
First we must encode our choices for our data model and priors to pass them to the fitting routines in JAGS. This involves setting up a model that includes the likelihood for each data point and a prior for every parameter we want to estimate. Here is an example of how we would do this for the simple model we fit for the midge data (note that JAGS uses the precision, rather than the variance or sd, for the normal distribution):
model1 <- "model{
## Likelihood
for(i in 1:n){
Y[i] ~ dnorm(mu,tau)
}
## Prior for mu
mu ~ dnorm(mu0,tau0)
} ## close model
"
Now create the JAGS model:
model <- jags.model(textConnection(model1),
n.chains = 1, ## usually do more
data = list(Y=Y,n=n, ## data
mu0=mu0, tau0=1/s20, ## hyperparams
tau = 1/s2 ## known precision
),
inits=list(mu=3) ## setting a starting value
)
Now we'll run the MCMC and, see how the output looks for a short chain with no burnin:
samp <- coda.samples(model,
variable.names=c("mu"),
n.iter=1000, progress.bar="none")
plot(samp)
MCMC is a rejection-based sampling algorithm that often needs time to converge or "burn in" -- that is, we may need to let it move around for a while until it is taking draws from the correct distribution. Unlike in optimization problems, this does not mean that the algorithm heads toward a single value. Instead, we're looking for a pattern where the draws appear unrelated and random. To assess convergence we look at trace plots; the goal is to get traces that look like "fuzzy caterpillars".
Sometimes at the beginning of a run, if we start far from the area near the posterior mean of the parameter, we will instead get something that looks like a trending time series. If this is the case we have to drop the samples that were taken during the burn-in phase. Here's an example of how to do that:
update(model, 10000, progress.bar="none") # Burnin for 10000 samples
samp <- coda.samples(model,
variable.names=c("mu"),
n.iter=20000, progress.bar="none")
plot(samp)
This is a very fuzzy caterpillar!
We can also use the summary function to examine the samples generated:
summary(samp)
Let's compare these draws to what we got with our analytic solution:
x<-seq(1.3,2.3, length=1000)
hist(samp[[1]], xlab="mu", xlim=c(1.3, 2.3),
freq=FALSE, ylim=c(0,8), main ="posterior samples")
lines(x, dnorm(x, mean=mu0, sd=sqrt(s20)), col=2, lty=2, lwd=2) ## prior
lines(x, dnorm(x, mean=mp, sd=sqrt(1/tp)), col=4, lwd=2) ## posterior
legend("topleft", legend=c("prior", "analytic posterior"), col=c(2,4), lty=c(2,1), lwd=2)
It worked!
As with the analytic approach, it's always a good idea, when you run your analyses, to check how sensitive your result is to the priors you choose. Unless you are purposefully choosing an informative prior, we usually want the prior and posterior to look different.
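For the analytic version, here is a quick sketch of such a check using the mu.post() and tau.post() helpers defined earlier (the alternative hyperparameter values below are arbitrary, for illustration only):
mp_alt <- mu.post(Ybar = m, mu0 = 1.5, sig20 = 0.05, sig2 = s2, n = n) # a tighter prior centred lower than before
tp_alt <- tau.post(tau = 1/s2, tau0 = 1/0.05, n = n)
c(original_posterior_mean = mp, alternative_posterior_mean = mp_alt) # how much does the posterior mean shift?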
One advantage of the numerical approach is that we can choose almost anything we want for the priors on multiple parameters, without worrying about whether they are conjugate, or whether we want to include additional information. For example, let's say that we now want to force the mean to be positive (and also the data, perhaps), and concurrently estimate the variance. Here is a possible model:
model2 <- "model{
# Likelihood
for(i in 1:n){
Y[i] ~ dnorm(mu,tau) T(0,) ## truncates at 0
}
# Prior for mu
mu ~ dnorm(mu0,tau0)
# Prior for the precision
tau ~ dgamma(a, b)
# Compute the variance
s2 <- 1/tau
}"
## hyperparams for tau
a <- 0.01
b <- 0.01
m2 <- jags.model(textConnection(model2),
n.chains = 1,
data = list(Y=Y, n=n,
mu0=mu0, tau0=1/s20, ## mu hyperparams
a=a, b=b ## tau hyperparams
),
inits=list(mu=3, tau=10) ## starting vals
)
samp <- coda.samples(m2,
variable.names=c("mu","s2"),
n.iter=1000, progress.bar="none")
plot(samp)
summary(samp)
Now we plot each with their priors:
par(mfrow=c(1,2), bty="n")
hist(samp[[1]][,1], xlab="samples of mu", main="mu")
lines(x, dnorm(x, mean=mu0, sd=sqrt(s20)),
col=2, lty=2, lwd=2) ## prior
x2<-seq(0, 200, length=1000)
hist(1/samp[[1]][,2], xlab="samples of tau", main="tau")
lines(x2, dgamma(x2, shape = a, rate = b),
col=2, lty=2, lwd=2) ## prior
We also want to look at the joint distribution of μ and σ2:
plot(as.numeric(samp[[1]][,1]), samp[[1]][,2], xlab="mu", ylab="s2")
Redo the previous analysis placing a gamma prior on μ as well. Set the prior so that the mean and variance are the same as in the normal example from above (use moment matching). Do you get something similar?
Now let's do some Bayesian model fitting to Aedes thermal performance data. Let's try out the R2jags package for this.
require(R2jags) # fitting
require(coda) # diagnostic plots
set.seed(1234)
Loading required package: R2jags Attaching package: ‘R2jags’ The following object is masked from ‘package:coda’: traceplot
Load the data:
Aaeg.data <- read.csv("../data/AeaegyptiTraitData.csv")
These data are traits from Aedes aegypti mosquitoes measured across temperature in lab experiments. The traits for which we have thermal performance data are listed in the trait.name column of the dataset.
Note that some of the traits come in multiple forms (e.g., μ and 1/μ, PDR and EIP).
Have a look at the data:
head(Aaeg.data)
trait.name | T | trait | ref | trait2 | trait2.name |
---|---|---|---|---|---|
pEA | 22 | 0.90812 | Westbrook_Thesis_2010 | NA | NA |
pEA | 27 | 0.93590 | Westbrook_Thesis_2010 | NA | NA |
pEA | 32 | 0.81944 | Westbrook_Thesis_2010 | NA | NA |
MDR | 22 | 0.09174 | Westbrook_Thesis_2010 | NA | NA |
MDR | 27 | 0.13587 | Westbrook_Thesis_2010 | NA | NA |
MDR | 32 | 0.15823 | Westbrook_Thesis_2010 | NA | NA |
mu.data <- subset(Aaeg.data, trait.name == "mu")
lf.data <- subset(Aaeg.data, trait.name == "1/mu")
par(mfrow=c(1,2), bty="l")
plot(trait ~ T, data = mu.data, ylab="mu")
plot(trait ~ T, data = lf.data, ylab="1/mu")
Note that the μ data is u-shaped and the lifespan data is unimodal (hump-shaped).
Since thermal biology theory is based on unimodal thermal responses, we want to fit the trait as lifespan instead of μ. Thus, we'll need to convert the μ data to lifespan by taking the inverse. The combined data should have a nice unimodal shape that we can fit a function to:
mu.data.inv <- mu.data # make a copy of the mu data
mu.data.inv$trait <- 1/mu.data$trait # take the inverse of the trait values to convert mu to lifespan
lf.data.comb <- rbind(mu.data.inv, lf.data) # combine both lifespan data sets together
plot(trait ~ T, data = lf.data.comb, ylab="1/mu")
Most thermal response curves can be reasonably fit using one of two thermal responses. Traits that respond unimodally but symmetrically to temperature can be fit with a quadratic function:
$$B = q (T - T_0)(T - T_m)$$
Traits that respond unimodally but asymmetrically can be fitted with a Briere function:
$$B = q T (T - T_0) \sqrt{T_m - T}$$
In both models, $T_0$ is the lower thermal limit, $T_m$ is the upper thermal limit (i.e., where the trait value goes to zero at either end), and $q$ scales the elevation of the curve (and so also the value at the optimum temperature).
Unlike the previous Bayesian example, here we will provide JAGS with the model written out as a .txt file. This can be in your working directory, or elsewhere (but then provide the full path to it, ideally a relative path). You can either write the text to the file yourself directly, or create it using the sink() function from your R script (see the sketch below):
Note that the model file quad.txt
has two mandatory sections (the priors and the likelihood) and one optional section (derived measures calculated from your fitted parameters).
In the example below for a quadratic function, most of the priors are specified via uniform distributions (the two arguments specify the lower and upper bounds, respectively). Note that, unlike in R and most other programs, JAGS parameterizes the normal distribution with the inverse of the variance (the precision), denoted $\tau (= 1/\sigma^2)$.
The likelihood can be interpreted as follows: the observed data are normally distributed, with the mean at a given temperature following the quadratic equation.
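Since the contents of quad.txt are not reproduced here, below is a sketch (our illustration, not the original file) of what such a model file could look like, written out with sink() as mentioned above. The prior ranges are arbitrary illustrative choices, and the mean is written as q(T - T0)(Tm - T), truncated to zero outside the thermal limits, so that it is positive between T0 and Tm for q > 0; the parameter names match those used in the fitting code below (cf.q, cf.T0, cf.Tm, cf.sigma, z.trait.mu.pred):
sink("quad.txt")
cat("
model{
  ## Priors (illustrative uniform ranges)
  cf.q ~ dunif(0, 1)
  cf.T0 ~ dunif(0, 20)
  cf.Tm ~ dunif(28, 45)
  cf.sigma ~ dunif(0, 1000)
  cf.tau <- 1 / (cf.sigma * cf.sigma) # JAGS uses the precision, not the variance

  ## Likelihood
  for(i in 1:N.obs){
    trait.mu[i] <- cf.q * (temp[i] - cf.T0) * (cf.Tm - temp[i]) * (cf.Tm > temp[i]) * (cf.T0 < temp[i])
    trait[i] ~ dnorm(trait.mu[i], cf.tau)
  }

  ## Derived measures: predicted thermal response over the temperature gradient
  for(i in 1:N.Temp.xs){
    z.trait.mu.pred[i] <- cf.q * (Temp.xs[i] - cf.T0) * (cf.Tm - Temp.xs[i]) * (cf.Tm > Temp.xs[i]) * (cf.T0 < Temp.xs[i])
  }
}
")
sink()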
Now, prepare the data for jags:
# Parameters to Estimate
parameters <- c("cf.q", "cf.T0", "cf.Tm","cf.sigma", "z.trait.mu.pred")
# Initial values for the parameters
inits<-function(){list(
cf.q = 0.01,
cf.Tm = 35,
cf.T0 = 5,
cf.sigma = rlnorm(1))}
# MCMC Settings: number of posterior dist elements = [(ni - nb) / nt ] * nc
ni <- 25000 # number of iterations in each chain
nb <- 5000 # number of 'burn in' iterations to discard
nt <- 8 # thinning rate - jags saves every nt iterations in each chain
nc <- 3 # number of chains
# Temperature sequence for derived quantity calculations
Temp.xs <- seq(0, 45, 0.2)
N.Temp.xs <-length(Temp.xs)
### Fitting the trait thermal response; Pull out data columns as vectors
data <- lf.data.comb # this lets us reuse the same generic code: we only change this first line
trait <- data$trait
N.obs <- length(trait)
temp <- data$T
# Bundle all data in a list for JAGS
jag.data<-list(trait = trait, N.obs = N.obs, temp = temp, Temp.xs = Temp.xs, N.Temp.xs = N.Temp.xs)
Now run the fitting using jags:
lf.fit <- jags(data=jag.data, inits=inits, parameters.to.save=parameters,
model.file="quad.txt", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni, DIC=T, working.directory=getwd())
Change into "mcmc" type samples for visualization with the coda
package:
lf.fit.mcmc <- as.mcmc(lf.fit)
View the parameters (only the first 5 lines, or it will also show you all of your derived quantities):
lf.fit$BUGSoutput$summary[1:5,]
 | mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | Rhat | n.eff |
---|---|---|---|---|---|---|---|---|---|
cf.T0 | 6.4943265 | 2.09300384 | 1.60602110 | 5.24712819 | 6.7642097 | 7.960755 | 9.8890203 | 1.011800 | 450 |
cf.Tm | 39.2489169 | 1.66643065 | 36.56998429 | 38.04932891 | 39.0173139 | 40.200410 | 43.1551656 | 1.004244 | 580 |
cf.q | 0.1099374 | 0.02623226 | 0.06461983 | 0.09123192 | 0.1085517 | 0.126104 | 0.1657431 | 1.005886 | 440 |
cf.sigma | 6.7100226 | 0.96797653 | 5.13196346 | 6.03242714 | 6.6036237 | 7.279794 | 8.9213322 | 1.001077 | 7200 |
deviance | 197.7198535 | 3.09086985 | 193.77713702 | 195.42279657 | 197.0587314 | 199.288979 | 205.4720801 | 1.001325 | 3800 |
Plot the chains:
plot(lf.fit.mcmc[,c(1,3,4)])
plot(trait ~ T, xlim = c(0, 45), ylim = c(0,42), data = lf.data.comb, ylab = "Lifespan for Ae. aegypti", xlab = "Temperature")
lines(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "2.5%"] ~ Temp.xs, lty = 2)
lines(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "97.5%"] ~ Temp.xs, lty = 2)
lines(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "mean"] ~ Temp.xs)
You can use the which.max() function to find the optimal temperature for adult lifespan:
Temp.xs[which.max(as.vector(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "mean"]))]
You can also pull out the lifespan values for each iteration of the MCMC chain over the temperature gradient to calculate R0:
lf.grad <- lf.fit$BUGSoutput$sims.list$z.trait.mu.pred
dim(lf.grad) # A matrix with 7500 iterations of the MCMC chains at 226 temperatures
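For example, here is a hedged sketch of how you could summarize these posterior predictions over the temperature gradient yourself (the object names lf.grad.mean and lf.grad.CI are just illustrative):
lf.grad.mean <- apply(lf.grad, 2, mean) # posterior mean lifespan at each temperature
lf.grad.CI <- apply(lf.grad, 2, quantile, probs = c(0.025, 0.975)) # 95% quantile-based interval
Temp.xs[which.max(lf.grad.mean)] # optimal temperature, recovered from the raw samples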
We will now perform a Bayesian analysis of population growth data.
require(R2jags) # does the fitting
require(coda) # makes diagnostic plots
library(IDPmisc) # makes nice colored pairs plots to look at joint posteriors
Loading required package: R2jags
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.2.0
Loaded modules: basemod,bugs
Attaching package: ‘R2jags’
The following object is masked from ‘package:coda’:
    traceplot
Loading required package: grid
Loading required package: lattice
These data are observations of the amphibian fungal pathogen Batrachochytrium dendrobatidis being grown in liquid culture at multiple different temperatures. The experiment is conducted in 96-well plates with a fixed initial inoculation of fungal spores in each well, and each plate is kept in a constant-temperature incubator. Each day, 8 wells per plate are observed and the optical density (OD) is measured. We will focus on a single temperature trial across multiple plates, with OD as the response.
We will fit a logistic model to these growth data.
Let's have a look at the data first:
dat <- read.csv("../data/lb_ab_temps.csv")
head(dat)
X | EXP | TEMP | DAY | ISOLATE | PLATE | WELL | OD | NC_AVG | OD_SUB |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | 5 | 1 | LB_AB | P.16 | 1 | 0.120 | 0.1195 | 0.0005 |
2 | 2 | 5 | 1 | LB_AB | P.16 | 2 | 0.120 | 0.1195 | 0.0005 |
3 | 2 | 5 | 1 | LB_AB | P.16 | 3 | 0.122 | 0.1195 | 0.0025 |
4 | 2 | 5 | 1 | LB_AB | P.16 | 4 | 0.123 | 0.1195 | 0.0035 |
5 | 2 | 5 | 1 | LB_AB | P.16 | 5 | 0.125 | 0.1195 | 0.0055 |
6 | 2 | 5 | 1 | LB_AB | P.16 | 6 | 0.125 | 0.1195 | 0.0055 |
We are only interested in a subset of these data, so we will subset out only those from experiment 2, and a temperature of 12°C.
d2<-dat[which(dat$EXP==2),2:8]
d2<-d2[which(d2$TEMP==12),]
summary(d2)
      EXP         TEMP         DAY         ISOLATE        PLATE
 Min.   :2   Min.   :12   Min.   : 1.00   LB_AB:730   P.11   :160
 1st Qu.:2   1st Qu.:12   1st Qu.: 9.00               P.12   :150
 Median :2   Median :12   Median :19.00               P.13   :150
 Mean   :2   Mean   :12   Mean   :19.04               P.14   :140
 3rd Qu.:2   3rd Qu.:12   3rd Qu.:29.00               P.15   :130
 Max.   :2   Max.   :12   Max.   :39.00               P.1    :  0
                                                      (Other):  0
      WELL             OD
 Min.   :1.000   Min.   :0.0930
 1st Qu.:2.000   1st Qu.:0.1630
 Median :4.000   Median :0.2580
 Mean   :4.315   Mean   :0.2437
 3rd Qu.:6.000   3rd Qu.:0.3170
 Max.   :8.000   Max.   :0.4600
Now plot it:
Temps<-seq(0,max(d2$DAY)-1, by=0.05) # a fine grid of time points (in days) for the x-axis; despite the name, this is time, not temperature
mycol<-1
my.ylim<-c(0, 0.5)
my.title<-"LB-AB isolate, T=12C"
plot(d2$DAY-1, d2$OD, xlim=c(0,max(Temps)), ylim=my.ylim,
pch=(mycol+20),
xlab="time (days)", ylab="",
main=my.title,
col=mycol+1, cex=1.5)
Although logistic growth is often written as a differential equation, here we will work with the analytic solution of the model:
$$\mu(t) = \frac{K Y_0}{Y_0 + (K - Y_0) e^{-rt}}$$

This gives the mean function that we want to fit. We will assume log-normal noise around this mean, since the optical density is bounded below by 0 and the variance increases over time (as the optical density increases).
JAGS needs the model written as a .txt or .bug file inside the working directory. You can either make the text file directly, or create it using the sink() function in your R script, as follows.
Note that the model file has two mandatory sections (the priors and the likelihood) and one optional section (derived quantities calculated from your fitted parameters).
In the example below we build the model with a log-normal likelihood for the logistic growth function. The priors are a combination of uniform and exponential distributions. As with the normal distribution, jags parameterizes the log-normal in terms of the precision $\tau$ ($= 1/\sigma^2$); however, it can be easier to specify the prior on sigma directly and compute $\tau$ from it. In this example we will generate the posterior samples of derived quantities outside of JAGS (so you can see what this step actually does).
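A sketch of what jags-logistic.bug could look like, written via sink(). The uniform bounds mirror the prior list we build for plotting further below; the exponential rates are assumptions and may differ from the course's actual model file.
sink("jags-logistic.bug")
cat("
model{
  ## Priors (hyperparameter values are assumptions)
  Y0 ~ dunif(0.09, 0.15)     # initial optical density
  K ~ dunif(0.01, 0.6)       # carrying capacity
  r ~ dexp(1000)             # growth rate; assumed rate, the real file may differ
  sigma ~ dexp(0.1)          # prior specified on sigma directly...
  tau <- 1 / (sigma * sigma) # ...and converted to the precision jags needs

  ## Likelihood: log-normal noise around the logistic mean
  for(i in 1:N){
    mu[i] <- K * Y0 / (Y0 + (K - Y0) * exp(-r * t[i]))
    Y[i] ~ dlnorm(log(mu[i]), tau)
  }
}
")
sink()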
Now for some additional settings/specifications for jags:
# Parameters to Estimate
parameters <- c('Y0', 'K', 'r', 'sigma')
# Initial values for the parameters
inits<-function(){list(
Y0 = 0.1,
K = 0.4,
r = 0.1,
sigma = rlnorm(1))}
# MCMC Settings: number of posterior dist elements = [(ni - nb) / nt ] * nc
ni <- 6000 # number of iterations in each chain
nb <- 1000 # number of 'burn in' iterations to discard
nt <- 1 # thinning rate - jags saves every nt iterations in each chain
nc <- 5 # number of chains
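With these settings we will end up with [(6000 − 1000) / 1] × 5 = 25000 posterior samples per parameter.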
Now we can run jags:
# Pull out data columns as vectors
data <- d2 # this lets us reuse the same generic code: we only change this first line
Y <- data$OD
N <- length(Y)
t <- data$DAY
# Bundle all data in a list for JAGS
jag.data<-list(Y = Y, N = N, t = t)
# Run JAGS
OD.12C <- jags(data=jag.data, inits=inits, parameters.to.save=parameters,
model.file="jags-logistic.bug", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni, DIC=T, working.directory=getwd())
module glm loaded
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 730
   Unobserved stochastic nodes: 4
   Total graph size: 1598
Initializing model
Change into "mcmc" type samples for visualization with coda
:
OD.12C.mcmc<-as.mcmc(OD.12C)
As you did in the Traits Bayesian fitting example, there are a number of model diagnostics that we need to check. First we want to look at the chains and confirm that they look like "fuzzy caterpillars" -- no linear/non-linear patterns across the chains, low autocorrelation, etc.
First view the fitted parameters:
OD.12C$BUGSoutput$summary
 | mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | Rhat | n.eff |
---|---|---|---|---|---|---|---|---|---|
K | 3.659490e-01 | 0.0055513205 | 3.552925e-01 | 3.622170e-01 | 3.658641e-01 | 3.696170e-01 | 3.772298e-01 | 1.001300 | 7500 |
Y0 | 9.051409e-02 | 0.0004844611 | 9.001452e-02 | 9.015819e-02 | 9.037434e-02 | 9.072057e-02 | 9.179751e-02 | 1.001085 | 16000 |
deviance | -2.823094e+03 | 6.0822395330 | -2.832126e+03 | -2.827586e+03 | -2.824031e+03 | -2.819654e+03 | -2.808733e+03 | 1.001097 | 15000 |
r | 1.051490e-01 | 0.0023217883 | 1.005654e-01 | 1.035754e-01 | 1.051368e-01 | 1.067121e-01 | 1.097114e-01 | 1.001249 | 8600 |
sigma | 1.563601e-01 | 0.0041764147 | 1.484153e-01 | 1.535103e-01 | 1.562650e-01 | 1.591003e-01 | 1.647488e-01 | 1.001038 | 22000 |
Plot the chains using the coda package:
plot(OD.12C.mcmc[,c(1,2,4)])
We can examine the ACF of the chains as well, similarly to a time series:
s1<-as.data.frame(OD.12C.mcmc[[1]])
par(mfrow=c(2,2))
for(i in 2:5) acf(s1[,i], lag.max=20)
There is still a bit of autocorrelation, but it isn't too bad. The chain for σ is mixing best. We could reduce the autocorrelation even further by thinning the chain (i.e., change the nt parameter to 5 or 10).
The last important diagnostic is to compare the prior and posterior distributions. Various packages in R have bespoke functions to do this. Here we use the functions provided in the mcmc_utils.R file from the website.
source("../code/mcmc_utils.R")
We can also stack the samples from all the chains into a single data frame, which is a convenient format for visualizing:
samps<-NULL
for(i in 1:nc){
samps<-rbind(samps, as.data.frame(OD.12C.mcmc[[i]]))
}
samps<-samps[,c(5,2,3,4)]
We can also build a list to hold all the information about the priors for each parameter:
priors<-list()
priors$names<-c("Y0", "K", "r","sigma")
priors$fun<-c("uniform", "uniform", "exp","exp")
priors$hyper<-matrix(NA, ncol=4, nrow=3) # one column of hyperparameters per parameter, in the order given in priors$names
priors$hyper[,1]<-c(0.09, 0.15, NA)
priors$hyper[,2]<-c(0.01, 0.6, NA)
priors$hyper[,3]<-c(1000, NA, NA)
priors$hyper[,4]<-c(0.1, NA, NA)
Now we can plot the histograms of the posterior samples together with the prior distributions:
plot.hists(samps, my.par=c(2,2), n.hists=4, priors=priors, mai=c(0.5, 0.5, 0.25, 0.2))
The prior distribution here is very different from the posterior. These data are highly informative about the parameters of interest and are very unlikely to be influenced much by the prior distribution (although you can always change the priors to check this). However, notice that Y0 (the initial condition) is truncated by the prior. This is a fairly strong prior, because we know something about the initial optical density that is typical for this experimental set-up, given the density of inoculum used and a properly calibrated system.
It's often useful to also look at the joint distribution of all of your parameters together. Of course, if you have a high-dimensional posterior, rendering a 2-D representation can be difficult. Instead, the standard approach is to examine the pairwise posterior distributions, for instance as follows (using the s1 data frame we created above):
ipairs(s1[,2:5], ztransf = function(x){x[x<1] <- 1; log2(x)})
As you can see, estimates of r and K are highly correlated -- not surprising given the interplay between them in the logistic growth function. This correlation is an important feature of the system, and we use the full posterior distribution that includes this correlation when we want to build the corresponding posterior distribution of the behavior of the logistic function.
The final step is to check how well we are fitting the data. To do this we usually examine the posterior distribution of the mean function of our system (here, the logistic solution) and compare it to the data. For each of our posterior samples (or a thinned subset), we plug the parameters of the $i$th sample, $\theta_i$, into our function of interest and evaluate the function at a desired set of $x$ values. For instance, for logistic growth we evaluate

$$\mu_i(t) = \frac{K_i Y_{0,i}}{Y_{0,i} + (K_i - Y_{0,i}) e^{-r_i t}}$$

across a grid of time points:
my.logistic<-function(t, Y0, K, r){
return(K*Y0/(Y0+(K-Y0)*exp(-r*t)))
}
ts<-seq(0, 40, length=100)
ss<-seq(1, dim(samps)[1], by=10) # a thinned set of row indices (every 10th posterior sample)
my.curves<-matrix(NA, nrow=length(ss), ncol=length(ts))
for(i in 1:length(ss)){
my.curves[i,]<-my.logistic(t=ts, Y0=samps$Y0[ss[i]], K=samps$K[ss[i]], r=samps$r[ss[i]]) # index with ss[i] so we actually use the thinned samples
}
We can now plot all of these curves:
plot(ts, my.curves[1,], col=1, type="l", ylim=c(0.09, 0.36),
ylab="predicted OD", xlab="time (days)")
for(i in 2:length(ss)) lines(ts, my.curves[i,], col=i)
Then we can summarize this posterior using the apply function to find the mean and (for simplicity) the quantile-based 95% CI:
m.log<-apply(my.curves, 2, mean)
l.log<-apply(my.curves, 2, quantile, probs=0.025)
u.log<-apply(my.curves, 2, quantile, probs=0.975)
For comparison, here is how to find the 95% HPD interval across time, using the HPDinterval function from the coda package:
hpd.log<-NULL
for(i in 1:length(ts)){
hpd.log<-cbind(hpd.log, as.numeric(HPDinterval(mcmc(my.curves[,i]))))
}
And plot these together with the data (in this case the HPD and quantile based intervals are indistinguishable):
my.ylim<-c(0.09, 0.45)
my.title<-"LB-AB isolate, T=12C"
plot(d2$DAY-1, d2$OD, xlim=c(0,max(Temps)), ylim=my.ylim,
pch=(mycol+20),
xlab="time (days)", ylab="",
main=my.title,
col="grey", cex=1.5)
lines(ts, m.log, col=1, lwd=2)
lines(ts, l.log, col=2, lwd=2, lty=2)
lines(ts, u.log, col=2, lwd=2, lty=2)
lines(ts, hpd.log[1,], col=3, lwd=2, lty=3)
lines(ts, hpd.log[2,], col=3, lwd=2, lty=3)
Note that this only shows the uncertainty in the mean function -- under the assumed model, the log-normal noise then scatters the observations around this mean. The fit is attributing the majority of the observed noise to process error rather than to parameter uncertainty.
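If you wanted to visualize the full predictive uncertainty instead, one option (a sketch, not from the original material) is to add log-normal observation noise, using each sample's sigma, on top of each posterior mean curve and summarize those simulated observations; the resulting band should be noticeably wider than the interval for the mean shown above. The object pred.curves below is just an illustrative name, and the code assumes ts, ss, samps and my.curves from above are still in the workspace.
# Sketch: approximate posterior predictive interval
pred.curves <- matrix(NA, nrow = length(ss), ncol = length(ts))
for(i in 1:length(ss)){
  pred.curves[i,] <- rlnorm(length(ts), meanlog = log(my.curves[i,]),
                            sdlog = samps$sigma[ss[i]]) # simulate observations around each mean curve
}
lines(ts, apply(pred.curves, 2, quantile, probs = 0.025), col = 4, lwd = 2, lty = 4)
lines(ts, apply(pred.curves, 2, quantile, probs = 0.975), col = 4, lwd = 2, lty = 4)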