# For starters, clear all variables and graphic devices and load necessary packages:
rm(list = ls())
graphics.off()
library(repr)
options(repr.plot.width=5, repr.plot.height=4) # Change default plot size; not necessary if you are using Rstudio
We will work with several practical examples here. These assume that you have at least a conceptual understanding of what Linear vs Non-linear models are, how they are fitted to data, and how the fits can be assessed statistically. If not, you may want to see the Linear Models (video here) and NLLS lectures first.
You will need the minpack.lm R package, which you can install using the standard method (Linux users, launch R in sudo mode first):
> install.packages("minpack.lm")
Why minpack.lm? The standard NLLS function in R, called nls, uses the less robust Gauss-Newton algorithm. Therefore, nls will often fail to fit your model to the data if the starting values you provide for the parameters are too far from the optimal ones, especially if the "parameter space" is weirdly shaped, i.e., the model has a mathematical form that makes it hard to find parameter combinations that minimize the residual sum of squares (RSS). If this does not make sense, don't worry about it; just use nlsLM from the minpack.lm package instead of nls! If you are really curious, try substituting nls for nlsLM in the examples below and compare the results.
Now load the minpack.lm
package:
require("minpack.lm") # for Levenberg-Marquardt nlls fitting
Loading required package: minpack.lm
Our first set of examples will focus on traits.
A trait is any measurable feature of an individual organism. This includes physical traits (e.g., morphology, body mass, wing length), performance traits (e.g., respiration rate, body velocity, fecundity), and behavioral traits (e.g., feeding preference, foraging strategy, mate choice). All natural populations show variation in traits across individuals. A trait is functional when it directly (e.g., mortality rate) or indirectly (e.g., somatic development or growth rate) determines individual fitness. Therefore, variation in (functional) traits can generate variation in the rate of increase and persistence of populations. When measured in the context of life cycles, without considering interactions with other organisms (e.g., predators or prey of the vector), functional traits are typically called life history traits (such as mortality rate and fecundity). Other traits determine interactions both within the vector population (e.g., intra-specific interference or mating frequency) and between vectors and other species, including the species which may act as resources (prey, for example). Thus both life history and interaction traits determine vector population fitness and therefore abundance, which ultimately influences disease transmission rate.
Let's start with a common and reasonably simple example from biology: allometric scaling. Allometric relationships between linear measurements such as body length, wing span, and thorax width are a good way to obtain estimates of body weights of individual vectors such as mosquitoes, ticks, and flies. We will look at the allometric scaling of body weight vs. total body length in dragonflies and damselflies (yes, these are not known vectors!).
Allometric relationships take the form:
$$y = ax^b$$
where $x$ and $y$ are morphological measures (body length and body weight, respectively, in our current example), the constant $a$ is the value of $y$ at body length $x = 1$ unit, and $b$ is the scaling "exponent". This is also called a power law, because $y$ relates to $x$ through a simple power.
First create a function object for the power law model:
powMod <- function(x, a, b) {
return(a * x^b)
}
Now get the data (first click on link and use "Save as" or Ctrl+S
to download it as a csv). Then, save it in your data
directory. After that, import it into your R workspace:
MyData <- read.csv("../data/GenomeSize.csv")
head(MyData)
Suborder | Family | Species | GenomeSize | GenomeSE | GenomeN | BodyWeight | TotalLength | HeadLength | ThoraxLength | AdbdomenLength | ForewingLength | HindwingLength | ForewingArea | HindwingArea | MorphologyN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Anisoptera | Aeshnidae | Aeshna canadensis | 2.20 | NA | 1 | 0.159 | 67.58 | 6.83 | 11.81 | 48.94 | 45.47 | 45.40 | 369.57 | 483.61 | 2 |
Anisoptera | Aeshnidae | Aeshna constricta | 1.76 | 0.06 | 4 | 0.228 | 71.97 | 6.84 | 10.72 | 54.41 | 46.00 | 45.48 | 411.15 | 517.38 | 3 |
Anisoptera | Aeshnidae | Aeshna eremita | 1.85 | NA | 1 | 0.312 | 78.80 | 6.27 | 16.19 | 56.33 | 51.24 | 49.47 | 460.72 | 574.33 | 1 |
Anisoptera | Aeshnidae | Aeshna tuberculifera | 1.78 | 0.10 | 2 | 0.218 | 72.44 | 6.62 | 12.53 | 53.29 | 49.84 | 48.82 | 468.74 | 591.42 | 2 |
Anisoptera | Aeshnidae | Aeshna umbrosa | 2.00 | NA | 1 | 0.207 | 73.05 | 4.92 | 11.11 | 57.03 | 46.51 | 45.97 | 382.48 | 481.44 | 1 |
Anisoptera | Aeshnidae | Aeshna verticalis | 1.59 | NA | 1 | 0.220 | 66.25 | 6.48 | 11.64 | 48.13 | 45.91 | 44.91 | 400.40 | 486.97 | 1 |
Anisoptera are dragonflies, and Zygoptera are Damselflies. The variables of interest are BodyWeight
and TotalLength
. Let's use the dragonflies data subset.
So subset the data accordingly and remove NAs:
Data2Fit <- subset(MyData,Suborder == "Anisoptera")
Data2Fit <- Data2Fit[!is.na(Data2Fit$TotalLength),] # remove NA's
Plot it:
plot(Data2Fit$TotalLength, Data2Fit$BodyWeight)
Or, using ggplot
:
library("ggplot2")
ggplot(Data2Fit, aes(x = TotalLength, y = BodyWeight)) + geom_point()
Remember, when you write this analysis into a stand-alone R script, you should put all commands for loading packages (library()
, require()
) at the start of the script.
Now fit the model to the data using NLLS:
PowFit <- nlsLM(BodyWeight ~ powMod(TotalLength, a, b), data = Data2Fit, start = list(a = .1, b = .1))
Before proceeding further, have a look at what nlsLM's arguments are:
?nlsLM
Note that NLLS fitting requires "starting values" for the parameters (two in this case: a
and b
).
Having obtained the fit, we can use summary()
just like we would for a lm()
fit object.
summary(PowFit)
Formula: BodyWeight ~ powMod(TotalLength, a, b) Parameters: Estimate Std. Error t value Pr(>|t|) a 3.941e-06 2.234e-06 1.764 0.083 . b 2.585e+00 1.348e-01 19.174 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.02807 on 58 degrees of freedom Number of iterations to convergence: 39 Achieved convergence tolerance: 1.49e-08
Most of the output is analogous to the output of an lm() fit. However, further statistical inference here cannot be done using Analysis of Variance (ANOVA), because the model is not a Linear Model. Try anova(PowFit), and see what happens. The Number of iterations to convergence and Achieved convergence tolerance stem from the fact that NLLS fitting is an iterative numerical optimization procedure rather than an exact calculation; revisit the lecture for an explanation of this.
Now let's visualize the fit. For this, first we need to generate a vector of body lengths (the x-axis variable) for plotting:
Lengths <- seq(min(Data2Fit$TotalLength),max(Data2Fit$TotalLength),len=200)
Next, calculate the predicted line. For this, we will need to extract the coefficient from the model fit object using the coef()
command.
coef(PowFit)["a"]
coef(PowFit)["b"]
So, we can do the following:
Predic2PlotPow <- powMod(Lengths,coef(PowFit)["a"],coef(PowFit)["b"])
Now plot the data and the fitted model line:
plot(Data2Fit$TotalLength, Data2Fit$BodyWeight)
lines(Lengths, Predic2PlotPow, col = 'blue', lwd = 2.5)
We can calculate the confidence intervals on the estimated parameters as we would in the OLS fitting used for Linear Models:
confint(PowFit)
Waiting for profiling to be done...
2.5% | 97.5% | |
---|---|---|
a | 1.171935e-06 | 1.205273e-05 |
b | 2.318292e+00 | 2.872287e+00 |
As you likely have learnt before, a coefficient's CI should not include zero for it to be statistically significant (different from zero).
(a) Make the same plot as above, fitted line and all, in ggplot, and add (display) the equation you estimated to your new (ggplot) plot. The equation is: $\text{Weight} = 3.94 \times 10^{-6} \times \text{Length}^{2.59}$
(b) Try playing with the starting values, and see if you can "break" the model fitting -- that is, change the starting values till the NLLS fitting does not converge on a solution.
(c) Repeat the model fitting (including a-b above) using the Zygoptera data subset.
(d) There is an alternative (and, in fact, more commonly used) approach for fitting the allometric model to data: using Ordinary Least Squares on bi-logarithmically transformed data. That is, if you take the log of both sides of the allometric equation, you get
$$\log(y) = \log(a) + b\log(x)$$
This is a straight-line equation of the form $c = d + bz$, where $c = \log(y)$, $d = \log(a)$, $z = \log(x)$, and $b$ is now the slope parameter. So you can use Ordinary Least Squares and the linear models framework (with lm()) in R to estimate the parameters of the allometric equation (see the sketch after this list for a starting point).
In this exercise, try comparing the NLLS vs OLS methods to see how much difference you get in the parameter estimates between them. For example, see the methods used in this paper by Cohen et al 2012.
(e) The allometry between body weight and length is not the end of the story. You have a number of other linear morphological measurements (HeadLength, ThoraxLength, AdbdomenLength, ForewingLength, HindwingLength, ForewingArea, and HindwingArea) that can also be investigated. In this exercise, try two lines of investigation (again, repeated separately for dragonflies and damselflies):
(i) How does each of these measures allometrically scale with body length (obtain estimates of the scaling constant and exponent)? (Hint: you may want to use the pairs() command in R to get an overview of all the pairs of potential scaling relationships.)
(ii) Do any of the linear morphological measurements other than body length better predict Body weight? That is, does body weight scale more tightly with a linear morphological measurement other than total body length? You would use model selection here, which we will learn next. But for now, you can just look at and compare the R2 values of the models.
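As a starting point for exercise (d) above, here is a minimal sketch of the log-log OLS fit (the object name LogLogFit is ours, not part of the original analysis); back-transforming the intercept gives an estimate of a, and the slope estimates b:
LogLogFit <- lm(log(BodyWeight) ~ log(TotalLength), data = Data2Fit) # OLS on bi-logarithmically transformed data
exp(coef(LogLogFit)["(Intercept)"]) # estimate of a on the original scale
coef(LogLogFit)["log(TotalLength)"] # estimate of the scaling exponent b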
How do we know that there isn't a better or alternative model that adequately explains the pattern in your dataset?
This is an important consideration in all data analyses (and, more generally, the scientific method!), so you should aim to compare your NLLS model with one or more alternatives for a more extensive and reliable investigation of the problem.
Let's use model comparison to investigate whether the relationship between body weight and length we found above is indeed allometric. For this, we need an alternative model that can be fitted to the same data. Let's try a quadratic curve, which is of the form:
$$y = a + bx + cx^2$$
This can also capture curvature in data, and is an alternative model to the allometric equation. Note that this model is linear in its parameters (it is a linear model), which you can fit to the data using your familiar lm() function:
QuaFit <- lm(BodyWeight ~ poly(TotalLength,2), data = Data2Fit)
And like before, we obtain the predicted values (but this time using the predict.lm
function):
Predic2PlotQua <- predict.lm(QuaFit, data.frame(TotalLength = Lengths))
Now let's plot the two fitted models together:
plot(Data2Fit$TotalLength, Data2Fit$BodyWeight)
lines(Lengths, Predic2PlotPow, col = 'blue', lwd = 2.5)
lines(Lengths, Predic2PlotQua, col = 'red', lwd = 2.5)
Very similar fits, except that the quadratic model seems to deviate a bit from the data at the lower end of the data range. Let's now do a proper, formal model comparison to check which model better fits the data.
First calculate the R2 values of the two fitted models:
RSS_Pow <- sum(residuals(PowFit)^2) # Residual sum of squares
TSS_Pow <- sum((Data2Fit$BodyWeight - mean(Data2Fit$BodyWeight))^2) # Total sum of squares
RSq_Pow <- 1 - (RSS_Pow/TSS_Pow) # R-squared value
RSS_Qua <- sum(residuals(QuaFit)^2) # Residual sum of squares
TSS_Qua <- sum((Data2Fit$BodyWeight - mean(Data2Fit$BodyWeight))^2) # Total sum of squares
RSq_Qua <- 1 - (RSS_Qua/TSS_Qua) # R-squared value
RSq_Pow
RSq_Qua
Not very useful. In general, R2 is a good measure of model fit, but it cannot be used for model selection -- especially not here, given the tiny difference in the R2 values.
Instead, as explained in the lecture, we can use the Akaike Information Criterion (AIC):
n <- nrow(Data2Fit) #set sample size
kPow <- length(coef(PowFit)) # get number of parameters in power law model
kQua <- length(coef(QuaFit)) # get number of parameters in quadratic model
AIC_Pow <- n * log((2 * pi) / n) + n + 2 + n * log(RSS_Pow) + 2 * kPow
AIC_Qua <- n * log((2 * pi) / n) + n + 2 + n * log(RSS_Qua) + 2 * kQua
AIC_Pow - AIC_Qua
Of course, as you might have suspected, we can do this using an in-built function in R!
AIC(PowFit) - AIC(QuaFit)
So which model wins? As we discussed in the NLLS lecture, a rule of thumb is that an AIC difference (typically denoted ΔAIC) > 2 is an acceptable cutoff for calling a winner. So the power law (allometric) model is the better fit here. Read the Johnson & Omland paper for more on model selection in Ecology and Evolution.
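As a small illustration (not part of the original analysis), the rule of thumb can be wrapped in a couple of lines of R:
DeltaAIC <- AIC(QuaFit) - AIC(PowFit) # positive values favour the power-law model
if (abs(DeltaAIC) > 2) {
    print(ifelse(DeltaAIC > 0, "Power-law model favoured", "Quadratic model favoured"))
} else {
    print("Delta AIC < 2: no clear winner")
}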
(a) Calculate the Bayesian Information Criterion (BIC), also known as the Schwarz Criterion (see your lecture notes and the Johnson & Omland paper), and use ΔBIC to select the better-fitting model.
(b) Fit a straight line to the same data and compare with the allometric and quadratic models.
(c) Repeat the model comparison (including (a)-(b) above) using the Damselflies (Zygoptera) data subset -- does the allometric model still win?
(d) Repeat exercises (e)(i) and (ii) from the above set, but with model comparison (e.g., again using a quadratic as an alternative model) to establish that the relationships are indeed allometric.
(e) Repeat exercise (e)(ii) from the above set, but with model comparison to establish which linear measurement is the best predictor of body weight.
alb<-read.csv(file="../data/albatross_grow.csv")
alb<-subset(x=alb, !is.na(alb$wt))
plot(alb$age, alb$wt, xlab="age (days)", ylab="weight (g)", xlim=c(0, 100))
Let's fit multiple models to this dataset.
The Von Bertalanffy model is commonly used for modelling the growth of an individual. Its formulation is:
$$W(t) = \rho \left( L_\infty (1 - e^{-Kt}) + L_0 e^{-Kt} \right)^3$$
If we pull out $L_\infty$ and define $c = L_0/L_\infty$ and $W_\infty = \rho L_\infty^3$, this equation becomes:
$$W(t) = W_\infty \left( 1 - e^{-Kt} + c\,e^{-Kt} \right)^3$$
Here $W_\infty$ is interpreted as the mean asymptotic weight, and $c$ as the ratio between the initial and final lengths. This second equation is the one we will fit.
We will compare this model against the classical Logistic growth equation, and a straight line. First, as we did before, let's define the R functions for the two models:
logistic1<-function(t, r, K, N0){
N0*K*exp(r*t)/(K+N0*(exp(r*t)-1))
}
vonbert.w<-function(t, Winf, c, K){
Winf*(1 - exp(-K*t) + c*exp(-K*t))^3
}
For the straight line, we simply use R's lm() function, as that is a linear least squares problem. Using NLLS would give (approximately) the same answer, of course. Now fit all 3 models using least squares.
We will scale the data before fitting to improve the stability of the estimates:
scale<-4000
alb.lin<-lm(wt/scale~age, data=alb)
alb.log<-nlsLM(wt/scale~logistic1(age, r, K, N0), start=list(K=1, r=0.1, N0=0.1), data=alb)
alb.vb<-nlsLM(wt/scale~vonbert.w(age, Winf, c, K), start=list(Winf=0.75, c=0.01, K=0.01), data=alb)
Next let's calculate predictions for each of the models across a range of ages.
ages<-seq(0, 100, length=1000)
pred.lin<-predict(alb.lin, newdata = list(age=ages))*scale
pred.log<-predict(alb.log, newdata = list(age=ages))*scale
pred.vb<-predict(alb.vb, newdata = list(age=ages))*scale
And finally plot the data with the fits:
plot(alb$age, alb$wt, xlab="age (days)", ylab="weight (g)", xlim=c(0,100))
lines(ages, pred.lin, col=2, lwd=2)
lines(ages, pred.log, col=3, lwd=2)
lines(ages, pred.vb, col=4, lwd=2)
legend("topleft", legend = c("linear", "logistic", "Von Bert"), lwd=2, lty=1, col=2:4)
Next examine the residuals between the 3 models:
par(mfrow=c(3,1), bty="n")
plot(alb$age, resid(alb.lin), main="LM resids", xlim=c(0,100))
plot(alb$age, resid(alb.log), main="Logistic resids", xlim=c(0,100))
plot(alb$age, resid(alb.vb), main="VB resids", xlim=c(0,100))
The residuals for all 3 models still exhibit some patterns. In particular, the data seem to decline near the end of the observation period, but none of these models can capture that behavior.
Finally, let's compare the 3 models using a simpler approach than the AIC/BIC one that we used above by calculating adjusted Sums of Squared Errors (SSE's):
n<-length(alb$wt)
list(lin=signif(sum(resid(alb.lin)^2)/(n-2*2), 3),
log= signif(sum(resid(alb.log)^2)/(n-2*3), 3),
vb= signif(sum(resid(alb.vb)^2)/(n-2*3), 3))
The logistic model has the lowest adjusted SSE, so it's the best by this measure. It is also, visually, a better fit.
(a) Use AIC/BIC to perform model selection on the Albatross data as we did for the trait allometry example.
(b) Write this example as a self-sufficient R script, using ggplot instead of base plotting.
Now let's actually look at a disease vector example! These data measure the response of Aedes aegypti fecundity to temperature.
First load and visualize the data:
aedes<-read.csv(file="../data/aedes_fecund.csv")
plot(aedes$T, aedes$EFD, xlab="temperature (C)", ylab="Eggs/day")
Let's define some models first:
quad1 <- function(T, T0, Tm, c){
c*(T-T0)*(T-Tm)*as.numeric(T<Tm)*as.numeric(T>T0)
}
briere <- function(T, T0, Tm, c){
c*T*(T-T0)*(abs(Tm-T)^(1/2))*as.numeric(T<Tm)*as.numeric(T>T0)
}
Instead of using the inbuilt quadratic function in R, we define our own to make it easier to choose starting values, and so that we can force the function to equal zero below the minimum and above the maximum temperature thresholds (more on this below). The Briere function is a commonly used model for the temperature dependence of insect traits. As in the case of the albatross growth data, we will also compare these two with a straight line (again, it's a linear model, so we can just use lm() without needing to define a function for it).
Now fit all 3 models using least squares. Although it's not as necessary here (as the data don't have as large values as the albatross example), we will again scale the data first:
scale <- 20
aed.lin <- lm(EFD/scale~T, data=aedes)
aed.quad <- nlsLM(EFD/scale~quad1(T, T0, Tm, c), start=list(T0=10, Tm=40, c=0.01), data=aedes)
aed.br <- nlsLM(EFD/scale~briere(T, T0, Tm, c), start=list(T0=10, Tm=40, c=0.1), data=aedes)
(a) Complete the Aedes data analysis by fitting the models, calculating predictions, and then comparing the models. Write a single, self-standing script for it. Which model fits best? By what measure?
(b) In this script, use ggplot instead of base plotting.
Fluctuations in the abundance (density) of a disease's vector play a crucial role in its transmission dynamics. This is especially true if vector population densities or their traits change at the same or shorter timescales than the rate of disease transmission. Indeed, most vectors are small ectotherms with short generation times and greater sensitivity to environmental conditions than their (invariably larger, longer-lived, and often endothermic) hosts. So understanding how vector populations vary over time, space, and with respect to environmental variables such as temperature and precipitation is key. We will look at fitting models to the growth of a single population here. Time series analyses are covered in a separate session.
A population grows exponentially while its abundance is low and resources are not limiting (the Malthusian principle). This growth then slows and eventually stops as resources become limiting. There may also be a time lag before the population growth really takes off at the start. We will focus on microbial (specifically, bacterial) growth rates. Bacterial growth in batch culture follows a distinct set of phases: lag phase, exponential phase, and stationary phase. During the lag phase a suite of transcriptional machinery is activated, including genes involved in nutrient uptake and metabolic changes, as bacteria prepare for growth. During the exponential growth phase, bacteria divide at a constant rate, the population doubling with each generation. When the carrying capacity of the media is reached, growth slows and the number of cells in the culture stabilises, beginning the stationary phase.
Traditionally, microbial growth rates were measured by plotting cell numbers or culture density against time on a semi-log graph and fitting a straight line through the exponential growth phase – the slope of the line gives the maximum growth rate (rmax). Models have since been developed which we can use to describe the whole sigmoidal bacterial growth curve.
Let's first generate some "data" on the number of bacterial cells as a function of time that we can play with:
time <- c(0, 2, 4, 6, 8, 10, 12, 16, 20, 24) # timepoints, in hours
log_cells <- c(3.62, 3.62, 3.63, 4.14, 5.23, 6.27, 7.57, 8.38, 8.70, 8.69) # logged cell counts - more on this below
set.seed(1234) # set seed to ensure you always get the same random sequence of fluctuations
data <- data.frame(time, log_cells + rnorm(length(time),sd=.1)) # add some random error
names(data) <- c("t", "LogN")
head(data)
t | LogN |
---|---|
0 | 3.499293 |
2 | 3.647743 |
4 | 3.738444 |
6 | 3.905430 |
8 | 5.272912 |
10 | 6.320606 |
We have added a vector of normally distributed errors to emulate random "sampling errors". Note also that the assumption of normality of errors underlies the statistical analyses of ordinary NLLS fits, just as it underlies Ordinary Least Squares (your standard linear modelling). In this case, we are talking about log-normality because we are using logged cell counts. Why log them? Because NLLS often converges better if you linearize the data (and, correspondingly, the model; see how the models are specified below).
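As a quick aside, here is a sketch of the traditional approach described earlier: fit a straight line through the roughly exponential-phase portion of the logged counts and take the slope as an estimate of rmax (the time window chosen below is an eyeball judgement, not part of the original analysis):
exp_phase <- subset(data, t >= 6 & t <= 16) # approximately the exponential phase, chosen by eye
rmax_trad <- coef(lm(LogN ~ t, data = exp_phase))["t"] # slope = maximum growth rate (in log10 units per hour)
rmax_trad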
Plot the data:
ggplot(data, aes(x = t, y = LogN)) + geom_point()
We will fit three growth models, all of which are known to fit such population growth data, especially in microbes. These are a modified Gompertz model (Zwietering et. al., 1990), the Baranyi model (Baranyi, 1993) and the Buchanan model (or three-phase logistic model; Buchanan, 1997). Given a set of cell numbers (N) and times (t), each growth model can be described in terms of:
N0: Initial cell culture (Population) density (number of cells per unit volume)
Nmax: Maximum culture density (aka "carrying capacity")
rmax: Maximum growth rate
tlag: Duration of the lag phase before the population starts growing exponentially
First let's specify the model functions:
baranyi_model <- function(t, r_max, N_max, N_0, t_lag){ # Baranyi model (Baranyi 1993)
return(N_max + log10((-1+exp(r_max*t_lag) + exp(r_max*t))/(exp(r_max*t) - 1 + exp(r_max*t_lag) * 10^(N_max-N_0))))
}
buchanan_model <- function(t, r_max, N_max, N_0, t_lag){ # Buchanan model - three phase logistic (Buchanan 1997)
return(N_0 + (t >= t_lag) * (t <= (t_lag + (N_max - N_0) * log(10)/r_max)) * r_max * (t - t_lag)/log(10) +
(t >= t_lag) * (t > (t_lag + (N_max - N_0) * log(10)/r_max)) * (N_max - N_0))
}
gompertz_model <- function(t, r_max, N_max, N_0, t_lag){ # Modified gompertz growth model (Zwietering 1990)
return(N_0 + (N_max - N_0) * exp(-exp(r_max * exp(1) * (t_lag - t)/((N_max - N_0) * log(10)) + 1)))
}
It is important to note that we have written the functions in log scale (base 10 here; it could also be base 2 or natural log) because we want to do the fitting in log scale (both model and data linearized). The interpretation of each of the estimated/fitted parameters does not change if we take a log of the model's equation.
Now let's generate some starting values for the NLLS fitting. We did not pay much attention to what starting values we used in the above example on fitting an allometric model because the power-law model is easy to fit using NLLS, and starting far from the optimal parameters does not matter too much. Here, we derive the starting values by using the actual data:
N_0_start <- min(data$LogN) # lowest observed (log) abundance
N_max_start <- max(data$LogN) # highest observed (log) abundance
t_lag_start <- data$t[which.max(diff(diff(data$LogN)))] # time at which growth accelerates most sharply (maximum second difference), a rough end of the lag phase
r_max_start <- max(diff(data$LogN))/mean(diff(data$t)) # steepest rise in log abundance between consecutive time points, per unit (mean) sampling interval
Now fit the models:
fit_baranyi <- nlsLM(LogN ~ baranyi_model(t = t, r_max, N_max, N_0, t_lag), data,
list(t_lag=t_lag_start, r_max=r_max_start, N_0 = N_0_start, N_max = N_max_start))
fit_buchanan <- nlsLM(LogN ~ buchanan_model(t = t, r_max, N_max, N_0, t_lag), data,
list(t_lag=t_lag_start, r_max=r_max_start, N_0 = N_0_start, N_max = N_max_start))
fit_gompertz <- nlsLM(LogN ~ gompertz_model(t = t, r_max, N_max, N_0, t_lag), data,
list(t_lag=t_lag_start, r_max=r_max_start, N_0 = N_0_start, N_max = N_max_start))
You might get a warning that one or more of the models generated some NaNs during the fitting procedure for the given data. You can ignore the warning in this case. But not always: sometimes these NaNs mean that the equation is wrongly written, or that it generates NaNs across the whole range of the x-values, in which case the model is inappropriate for these data.
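As a quick sanity check (a sketch, not part of the original analysis), you can evaluate one of the fitted models across the observed time points and confirm that it returns finite values everywhere:
pred_check <- baranyi_model(t = data$t, r_max = coef(fit_baranyi)["r_max"], N_max = coef(fit_baranyi)["N_max"], N_0 = coef(fit_baranyi)["N_0"], t_lag = coef(fit_baranyi)["t_lag"])
any(!is.finite(pred_check)) # FALSE means no NaNs/Infs over the data range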
Get the model summaries:
summary(fit_baranyi)
summary(fit_buchanan)
summary(fit_gompertz)
Formula: LogN ~ baranyi_model(t = t, r_max, N_max, N_0, t_lag) Parameters: Estimate Std. Error t value Pr(>|t|) t_lag 5.52751 0.34869 15.85 4.00e-06 *** r_max 1.41823 0.09356 15.16 5.20e-06 *** N_0 3.59209 0.08646 41.55 1.30e-08 *** N_max 8.52886 0.08352 102.12 5.94e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.143 on 6 degrees of freedom Number of iterations to convergence: 10 Achieved convergence tolerance: 1.49e-08
Formula: LogN ~ buchanan_model(t = t, r_max, N_max, N_0, t_lag) Parameters: Estimate Std. Error t value Pr(>|t|) t_lag 5.42029 0.24932 21.74 6.19e-07 *** r_max 1.36647 0.06886 19.84 1.06e-06 *** N_0 3.62849 0.07721 46.99 6.22e-09 *** N_max 8.52330 0.07721 110.39 3.73e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.1337 on 6 degrees of freedom Number of iterations to convergence: 9 Achieved convergence tolerance: 1.49e-08
Formula: LogN ~ gompertz_model(t = t, r_max, N_max, N_0, t_lag) Parameters: Estimate Std. Error t value Pr(>|t|) t_lag 5.76455 0.26923 21.41 6.77e-07 *** r_max 1.57526 0.09930 15.86 3.98e-06 *** N_0 3.60201 0.07068 50.96 3.83e-09 *** N_max 8.65943 0.08526 101.56 6.14e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.1191 on 6 degrees of freedom Number of iterations to convergence: 8 Achieved convergence tolerance: 1.49e-08
And see how the fits look:
timepoints <- seq(0, 24, 0.1)
baranyi_points <- baranyi_model(t = timepoints, r_max = coef(fit_baranyi)["r_max"], N_max = coef(fit_baranyi)["N_max"], N_0 = coef(fit_baranyi)["N_0"], t_lag = coef(fit_baranyi)["t_lag"])
buchanan_points <- buchanan_model(t = timepoints, r_max = coef(fit_buchanan)["r_max"], N_max = coef(fit_buchanan)["N_max"], N_0 = coef(fit_buchanan)["N_0"], t_lag = coef(fit_buchanan)["t_lag"])
gompertz_points <- gompertz_model(t = timepoints, r_max = coef(fit_gompertz)["r_max"], N_max = coef(fit_gompertz)["N_max"], N_0 = coef(fit_gompertz)["N_0"], t_lag = coef(fit_gompertz)["t_lag"])
df1 <- data.frame(timepoints, baranyi_points)
df1$model <- "Baranyi"
names(df1) <- c("t", "LogN", "model")
df2 <- data.frame(timepoints, buchanan_points)
df2$model <- "Buchanan"
names(df2) <- c("t", "LogN", "model")
df3 <- data.frame(timepoints, gompertz_points)
df3$model <- "Gompertz"
names(df3) <- c("t", "LogN", "model")
model_frame <- rbind(df1, df2, df3)
ggplot(data, aes(x = t, y = LogN)) +
geom_point(size = 3) +
geom_line(data = model_frame, aes(x = t, y = LogN, col = model), size = 1)
(a) Calculate the confidence intervals on the parameters of each of the three fitted models, and use model selection (using AIC and/or BIC) as you did before to see if you can determine the best-fitting model among the three.
(b) Alternatively, for a different random sequence of fluctuations, one or more of the models may fail to fit (a singular gradient matrix error). Try repeating the above fitting with a different random seed (change the integer given to the set.seed() function), or increase the sampling error by increasing the standard deviation, and see if it happens. If/when the NLLS optimization does fail to converge (i.e., the RSS minimum was not found), you can try to fix it by changing the starting values.
(c) Repeat the model comparison exercise 1000 times (you will have to write a loop), and determine whether one model generally wins more often than the others. Note that each run will generate a slightly different dataset, because we are adding a vector of random errors every time the "data" are generated. This may result in failure of the NLLS fitting to converge, in which case you will need to use the try() or tryCatch() functions (see the sketch after this list).
(d) Repeat (b), but increase the error by increasing the standard deviation of the normal error distribution, and see if there are differences in the robustness of the models to sampling/experimental errors. You may also want to try changing the distribution of the errors to some non-normal distribution and see what happens.
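For exercise (c), here is a minimal sketch (assuming the model functions and starting values defined above) of how tryCatch() can stop a single failed fit from aborting the whole loop:
fit_g <- tryCatch(
    nlsLM(LogN ~ gompertz_model(t = t, r_max, N_max, N_0, t_lag), data,
          list(t_lag = t_lag_start, r_max = r_max_start, N_0 = N_0_start, N_max = N_max_start)),
    error = function(e) NULL) # return NULL instead of stopping with an error
if (!is.null(fit_g)) AIC(fit_g) # only score fits that actually converged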
Above, we learned how to fit a mathematical model/equation to data using the least squares method (linear or nonlinear). That is, we choose the parameters of the model being fitted (e.g., a straight line) to minimize the sum of the squares of the residuals/errors around the fitted model. An alternative to minimizing the sum of squared errors is to find the parameter values that maximize the likelihood of the parameters, given the data and the model. Please see the lectures for the theoretical background to the following R examples.
We will first implement the (negative log) likelihood for simple linear regression (SLR) in R. Recall that SLR assumes every observation in the dataset was generated by the model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
That is, this is a model for the conditional distribution of $Y$ given $X$. The pdf of the normal distribution is
$$f(x) = \frac{1}{\sqrt{2\sigma^2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
In the SLR model, the conditional distribution has this form. That is, for any single observation $y_i$,
$$f(y_i \mid \beta_0, \beta_1, x_i) = \frac{1}{\sqrt{2\sigma^2\pi}} \exp\left( -\frac{(y_i - (\beta_0 + \beta_1 x_i))^2}{2\sigma^2} \right)$$
Interpreting this as a function of the parameters $\theta = \{\beta_0, \beta_1, \sigma\}$ gives us the likelihood of the $i$th data point.
As we did for the simple binomial distribution (see lecture), we can use this to estimate the parameters of the model.
First, we need to build an R function that returns the negative log-likelihood for simple linear regression (we work with the negative log-likelihood because optimizers such as optim() minimize functions, and minimizing the negative log-likelihood maximizes the likelihood):
nll.slr <- function(par, dat, ...){
args <- list(...)
b0 <- par[1]
b1 <- par[2]
X <- dat$X
Y <- dat$Y
if(!is.na(args$sigma)){
sigma <- args$sigma
} else
sigma <- par[3]
mu <- b0+b1*X
return(-sum(dnorm(Y, mean=mu, sd=sigma, log=TRUE)))
}
Note that we do something a bit different here (the "...
" bit). We do it this way because we want to be able to use R's optim()
function later.
The dnorm()
function calculates the logged (the log=TRUE
argument) probability of observing Y given mu, sigma and that X.
The negative sign on sum() is there because the optim() function in R will minimize the negative log-likelihood, which is a sum: recall that the log-likelihood of the parameters θ given data x equals the sum of the logged probability densities of observing the data x given parameters θ. We want to maximize this (log-)likelihood using optim().
Let's generate some simulated data, assuming that: β0= b0
, β1= b1
, and σ= sigma
. For this, we will generate random deviations to simulate sampling or measurement error around an otherwise perfect line of data values:
set.seed(123)
n <- 30
b0 <- 10
b1 <- 3
sigma <- 2
X <- rnorm(n, mean=3, sd=7)
Y <- b0 + b1*X + rnorm(n, mean=0, sd=sigma)
dat <- data.frame(X=X, Y=Y) # convert to a data frame
In the first line, we use set.seed() to ensure that we can reproduce the results. The seed number you choose is the starting point used in the generation of a sequence of random numbers. Now plot the "data":
plot(X, Y)
For now, let's assume that we know what β1 is. Let's build a likelihood profile for the simulated data:
N <- 50
b0s <- seq(5, 15, length=N)
mynll <- rep(NA, length=50)
for(i in 1:N){
mynll[i] <- nll.slr(par=c(b0s[i],b1), dat=dat, sigma=sigma)
}
That is, we calculate the negative log-likelihood for fixed b1, across a range (5 - 15) of b0.
Now plot the profile:
plot(b0s, mynll, type="l")
abline(v=b0, col=2)
abline(v=b0s[which.min(mynll)], col=3)
The true value for b0 (10) is the red line, while the value that minimizes the negative log-likelihood (i.e., maximizes the likelihood) is the green line. These are not exactly the same because maximum likelihood provides an estimate of the true value from data containing measurement errors (which we ourselves generated in this synthetic dataset).
If we wanted to estimate both β0 and β1 (two parameters), we need to deal with a two-dimensional maximum likelihood surface. The simplest approach is to do a grid search to find this likelihood surface.
N0 <- 100
N1 <- 101
b0s <- seq(7,12, length=N0)
b1s <- seq(1,5, length=N1)
mynll<-matrix(NA, nrow=N0, ncol=N1)
for(i in 1:N0){
for(j in 1:N1) mynll[i,j]<-nll.slr(par=c(b0s[i],b1s[j]), dat=dat, sigma=sigma)
}
ww <- which(mynll==min(mynll), arr.ind=TRUE)
b0.est <- b0s[ww[1]]
b1.est <- b1s[ww[2]]
rbind(c(b0, b1), c(b0.est, b1.est))
filled.contour(x = b0s, y = b1s, z= mynll, col=heat.colors(21),
plot.axes = {axis(1); axis(2); points(b0,b1, pch=21);
points(b0.est, b1.est, pch=8, cex=1.5); xlab="b0"; ylab="b1"})
10.00000 | 3.00 |
10.48485 | 2.96 |
There is a lot going on here. Make sure you ask one of us if some of the code does not make sense!
Again, note that the true parameter combination (asterisk) and the one that minimizes the negative log-likelihood, i.e., the maximum likelihood estimate (circle), are slightly different.
We can also look at the conditional surfaces (i.e., we look at the slice around whatever the best estimate is for the other parameter):
par(mfrow=c(1,2), bty="n")
plot(b0s, mynll[,ww[2]], type="l", xlab="b0", ylab="NLL")
plot(b1s, mynll[ww[1],], type="l", xlab="b1", ylab="NLL")
There are many alternative methods to grid searches. Since we are seeking to minimize an arbitrary function (the negative log-likelihood), we typically use a numerical optimization (descent) method. There are lots of options implemented in the optim function in R. We won't go into the details of these methods due to time constraints; most commonly, one would use:
optim()
We can now do the fitting. This involves optimization (finding the parameter values that achieve the maximum of the likelihood surface above). For this, we will use R's versatile optim() function.
The first argument for optim() is the function that you want to minimize, and the second is a vector of starting values for your parameters (as always, do a ?optim). After the main arguments, you can add whatever else you need to evaluate your function (e.g., sigma). The additional argument sigma can be "fed" to nll.slr because we used the ... convention when defining it.
fit <- optim(nll.slr, par=c(2, 1), method="L-BFGS-B", ## this is an n-D method
             lower=-Inf, upper=Inf, dat=dat, sigma=sigma)
fit
Easy as pie (once you have the recipe)! We can also fit sigma at the same time if we want:
fit <- optim(nll.slr, par=c(2, 1, 5), method="L-BFGS-B", ## this is an n-D method
             lower=c(-Inf, -Inf, 0.1), upper=Inf, dat=dat, sigma=NA)
fit$par
The starting values (b0 = 2, b1 = 1, sigma = 5) need to be supplied just as we would do for NLLS. Also note that, much like NLLS, we have bounded the parameters. The exact starting values are not too important in this case (try changing them and see what happens).
Now visualize the fit:
plot(X, Y)
abline(a=fit$par[1], b=fit$par[2], col=2, lwd=2)
The joint distribution of the MLEs is asymptotically normal. Given this, if you are minimizing the negative log-likelihood (NLL), then the covariance matrix of the estimates is (asymptotically) the inverse of the Hessian matrix. The Hessian matrix evaluates the second derivatives of the NLL (numerically here), which gives us information about the curvature of the likelihood. Thus we can use the Hessian to estimate confidence intervals:
fit <- optim(nll.slr, par=c(2, 1), method="L-BFGS-B", hessian=TRUE, lower=-Inf, upper=Inf, dat=dat, sigma=sigma)
fisher_info<-solve(fit$hessian)
est_sigma<-sqrt(diag(fisher_info))
upper<-fit$par+1.96*est_sigma
lower<-fit$par-1.96*est_sigma
interval<-data.frame(value=fit$par, upper=upper, lower=lower)
interval
value | upper | lower |
---|---|---|
10.458935 | 11.228565 | 9.689305 |
2.961704 | 3.067705 | 2.855704 |
We can, of course, simply fit the model with least squares using the lm() function:
lmfit<-lm(Y~X)
summary(lmfit)$coeff
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
(Intercept) | 10.458936 | 0.32957007 | 31.73509 | 1.699822e-23 |
X | 2.961704 | 0.04539126 | 65.24834 | 3.874555e-32 |
The estimates we get using optim() are almost identical to the estimates that we obtain here, and the standard errors on the intercept and slope are very similar to those we calculated from the Hessian (the est_sigma values computed above).
You can use AIC or BIC as you did in NLLS using the likelihood you have calculated.
You can also use the Likelihood Ratio Test (LRT).
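For example, here is a sketch of calculating AIC and BIC directly from the optim() output (assuming fit is the three-parameter fit from above, so that fit$value is the minimized negative log-likelihood and fit$par contains b0, b1, and sigma):
k <- length(fit$par) # number of estimated parameters
nll <- fit$value # minimized negative log-likelihood
AIC_mle <- 2 * k + 2 * nll # AIC = 2k - 2 log(L)
BIC_mle <- log(nrow(dat)) * k + 2 * nll # BIC = k log(n) - 2 log(L)
c(AIC = AIC_mle, BIC = BIC_mle)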
Try MLE fitting for the allometric trait data example above. You will use the same data and functions that you used to practice fitting curves using non-linear least squares methods. You have two options here. The easier one is to convert the power law model to a straight-line model by taking logs (explained in the Allometry Exercises above). Specifically,
(a) Using the nll.slr
function as an example, write a function that calculates the negative log likelihood as a function of the parameters describing your trait and any additional parameters you need for an appropriate noise distribution (e.g., σ if you have normally distributed errors).
(b) For at least one of your parameters, plot a likelihood profile given your data, with the other parameters fixed.
(c) Use the optim
function to find the MLE of the same parameter and indicate this on your likelihood profile.
(d) Obtain a confidence interval for your estimate.
A more challenging option is to fit the allometry data directly to the power law equation. You would need to assume a log-normal distribution for the errors instead of normal, in this case.
Recall from the lectures that for Bayesian model fitting/inference, we need to:
Assess MCMC convergence: MCMC is a family of algorithms for sampling probability distributions so that they can be adequately characterized (in the Bayesian context, the posterior distribution). The MCMC procedure reaches convergence once we have sufficient random draws from the posterior distribution. To assess convergence, we look at trace plots. The goal is to get "fuzzy caterpillar"-looking curves.
Summarize MCMC draws: Summarize and visualize outcome of the random draws using histograms for all draws for each parameter, and calculate expectation, variance, credibility interval, etc.
Prior Sensitivity: Assess prior sensitivity by changing the prior values and checking whether this affects the results. If it does, the results are too sensitive to that prior, which is not good!
Make inferences: We use the values from item (2) to make inferences and answer the research question.
Because likelihoods form the basis for Bayesian model fitting, we will first do an exercise to understand their calculation.
The Binomial distribution is used to model the number of "successes" in a set of trials (e.g., the number of heads when you flip a coin N times). The pmf is
$$\binom{N}{x} p^x (1-p)^{N-x}$$
Let's use the Binomial distribution to practice two methods of estimating parameters for a probability distribution: method of moments and maximum likelihood.
First take 50 draws from a binomial (using rbinom) for each $p \in \{0.1, 0.5, 0.8\}$ with $N = 20$. For this, let's set the seed so that we can reproduce this exact sequence of sampling (why?):
set.seed(54321)
## 50 draws with each p
pp<-c(0.1, 0.5, 0.8)
N<-20
reps<-50
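One possible way to take the draws (a sketch; how you organize them is up to you):
draws <- sapply(pp, function(p) rbinom(reps, size = N, prob = p)) # a 50 x 3 matrix, one column per value of p
colnames(draws) <- paste0("p=", pp)
head(draws)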
Now plot the histograms of these draws together with the density functions.
## histograms + density here
x<-seq(0, 50, by=1)
par(mfrow=c(1,3), bty="n")
# Write more code here
Q1: Do the histograms look like the distributions for all 3 values of p? If not, what do you think is going on?
You'll notice that for p=0.1 the histogram and densities don't look quite the same -- the hist()
function is lumping together the zeros and ones which makes it look off. This is typical for distributions that are truncated.
To obtain a method of moments estimator, we equate the theoretical moments (which will be a function of the parameters of the distribution) with the corresponding sample moments, and solve for the parameters in order to obtain an estimate. For the binomial distribution, there is only one parameter, p.
Q2: Given the analytic expected value, above, and assuming that the sample mean is m (the mean number of observed heads across replicates), what is the MoM estimator for p?
Now calculate the MoM estimator for each of your 3 sets of simulated data sets to get the estimates for each of your values of p.
## MOM estimators for 3 simulated sets
Q3: How good are your estimates for p? Do you get something close to the true value?
For 1 of your values of p, take 20 draws from the binomial with N=20 and calculate the MoM. Repeat this 100 times (hint: the replicate()
and lapply
functions may be useful.) Make a histogram of your estimates, and add a line/point to the plot to indicate the real value of p that you used to simulate the data.
## MoM estimates, histogram
Q4: Is the MoM successfully estimating p? Does your histogram for p look more or less normal? If so, what theorem might explain why this would be the case?
Imagine that you flip a coin N times, and then repeat the experiment n times. Thus, you have data $x = x_1, x_2, \ldots, x_n$, which are the number of times you observed a head in each trial. $p$ is the probability of obtaining a head.
Q5: Write down the likelihood and log-likelihood for the data. Take the derivative of the negative log-likelihood, set it equal to zero, and find the MLE, $\hat{p}$.
Simulate some data with p = 0.25, N = 10, and 10 replicates. Calculate the negative log-likelihood of your simulated data across a range of p (from 0 to 1), and plot it. You may do this using the built-in functions in R (specifically dbinom) or write your own function. This is called a "likelihood profile". Plot your likelihood profile with a line indicating the true value of p. Add lines indicating the MLE $\hat{p}$ and the MoM estimator of p to your likelihood profile.
pp<-.25
N<-10
reps<-10
## Make one set of data
## the likelihood is always exactly zero
## at p=0,1, so I skip those values
ps<-seq(0.01, 0.99, by=0.01)
## Likelihood
## MLE/MoM estimators
## now plot the negative log likelihood profile
Q6: How does your MLE compare to the true parameter value? How could you estimate the MLE from the likelihood profile if you didn't have a way to calculate it directly? If you choose a different random seed, do you get the same answer?
We will use this simple example to go through the steps of assessing a Bayesian model and we'll see that MCMC can allow us to approximate the posterior distribution.
Grogan and Wirth (1981) provide data on the wing length (in millimeters) of nine members of a species of midge (small, two-winged flies).
From these measurements we wish to make inference about the population mean μ.
WL.data <- read.csv("../data/MidgeWingLength.csv")
Y <- WL.data$WingLength
n <- length(Y)
hist(Y,breaks=10,xlab="Wing Length (mm)")
We might expect that these midge data could be draws from a Normal distribution N(μ,σ2). Recall that the MLEs for μ and σ2 here are simply the sample mean and sample variance respectively:
m<-sum(Y)/n
s2<-sum((Y-m)^2)/(n-1)
round(c(m, s2), 3)
x<-seq(1.4,2.2, length=50)
hist(Y,breaks=10,xlab="Wing Length (mm)", xlim=c(1.4, 2.2), freq=FALSE)
lines(x, dnorm(x, mean=m, sd=sqrt(s2)), col=2)
NOTE: I've plotted the estimate of the population distribution here, but this is not the *predictive distribution* (which would be a Student T because we're estimating both the mean and variance...).
The non-Bayesian version here has the advantage of being quick and familiar. However, from our point of view it has two weaknesses:
Because we have so few data points, estimates of the accuracy of our predictions aren't reliable: 9 points is only barely enough to estimate a mean, so we don't trust any of the variance calculations.
We can't easily incorporate things that we might already know about midges into our analysis.
Let's see how we can do a similar analysis using a Bayesian approach, first analytically, and then with JAGS.
We need to define the likelihood and the priors for our Bayesian analysis. Given the analysis that we've just done, let's assume that our data come from a normal distribution with unknown mean $\mu$, but that we know the variance is $\sigma^2 = 0.025$. That is:
$$Y_i \overset{iid}{\sim} N(\mu, \sigma^2 = 0.025)$$
Studies from other populations suggest that wing lengths are usually around 1.9 mm, so we set $\mu_0 = 1.9$.
We also know that lengths must be positive ($\mu > 0$).
We can approximate this restriction with a normal prior distribution for $\mu$ as follows. Since most of the normal density lies within two standard deviations of the mean, we choose the prior standard deviation $\sigma_0$ so that
$$\mu_0 - 2\sigma_0 > 0 \;\Rightarrow\; \sigma_0 < 1.9/2 = 0.95$$
We will use a prior variance of $\sigma_0^2 = 0.8$ (i.e., $\sigma_0 \approx 0.89$) here, which satisfies this constraint and matches the value used in the code below. Thus our prior for $\mu$ is $\mu \sim N(1.9, 0.8)$.
Together, then, our full model is:
$$Y_i \overset{iid}{\sim} N(\mu, 0.025), \qquad \mu \sim N(1.9, 0.8)$$
For this very simple case, it is easy to write down the posterior distribution (up to some constant). First, note that the likelihood of the data can be written as
$$\mathcal{L} \propto \prod_{i=1}^{n} \frac{1}{\sigma} \exp\left( -\frac{1}{2\sigma^2}(Y_i - \mu)^2 \right) = \frac{1}{\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right) \propto \exp\left( -\frac{n}{2\sigma^2} (\bar{Y} - \mu)^2 \right)$$
Multiplying the prior through, we get the following for the posterior:
$$P(\mu \mid Y) \propto \exp\left( -\frac{n}{2\sigma^2} (\bar{Y} - \mu)^2 \right) \exp\left( -\frac{1}{2\sigma_0^2} (\mu - \mu_0)^2 \right)$$
You can rearrange, complete the square, etc., to get a new expression of the form
$$P(\mu \mid Y) \propto \exp\left( -\frac{1}{2\sigma_p^2} (\mu_p - \mu)^2 \right)$$
where
$$\mu_p = \frac{n\sigma_0^2}{\sigma^2 + n\sigma_0^2} \bar{Y} + \frac{\sigma^2}{\sigma^2 + n\sigma_0^2} \mu_0, \qquad \sigma_p^2 = \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right)^{-1}$$
Instead of writing this last expression in terms of the variances, we could instead use precisions (inverse variances), which gives a simpler form:
$$\tau_p = n\tau + \tau_0$$
Just as in our earlier example, our estimate of the mean is a weighted average of the data and the prior, with the variance being determined by the data and prior variances.
So let's write a little function to calculate $\mu_p$ and $\tau_p$, and then plug in our numbers:
tau.post<-function(tau, tau0, n){n*tau + tau0}
mu.post<-function(Ybar, mu0, sig20, sig2, n){
weight<-sig2+n*sig20
return(n*sig20*Ybar/weight + sig2*mu0/weight)
}
Let's plot 3 things together -- the data histogram, the prior, and the posterior:
mu0 <- 1.9
s20 <- 0.8
s2<- 0.025 ## "true" variance
mp<-mu.post(Ybar=m, mu0=mu0, sig20=s20, sig2=s2, n=n)
tp<-tau.post(tau=1/s2, tau0=1/s20, n=n)
Let's plot the result:
x<-seq(1.3,2.3, length=1000)
hist(Y,breaks=10,xlab="Wing Length (mm)", xlim=c(1.3, 2.3),
freq=FALSE, ylim=c(0,8))
lines(x, dnorm(x, mean=mu0, sd=sqrt(s20)), col=2, lty=2, lwd=2) ## prior
lines(x, dnorm(x, mean=mp, sd=sqrt(1/tp)), col=4, lwd=2) ## posterior
legend("topleft", legend=c("prior", "posterior"), col=c(2,4), lty=c(2,1), lwd=2)
Change the values of the mean and the variance that you choose for the prior ("hyperparameters"). What does this do to the posterior distribution? E.g., what happens if the variance you choose is small and $\mu_0 = 2.5$ or so? Is this what you expect?
Let's show that we can get the same thing from JAGS that we were able to get from the analytic results. You'll need to make sure you have installed JAGS (which must be done outside of R) and then the libraries rjags
and coda
.
# Load libraries
require(rjags) # does the fitting
require(coda) # makes diagnostic plots
##require(mcmcplots) # another option for diagnostic plots
First we must encode our choices for our data model and priors to pass them to the fitting routines in JAGS. This involves setting up a model that includes the likelihood for each data point and a prior for every parameter we want to estimate. Here is an example of how we would do this for the simple model we fit for the midge data (note that JAGS uses the precision, rather than the variance or sd, for the normal distribution):
model1 <- "model{
## Likelihood
for(i in 1:n){
Y[i] ~ dnorm(mu,tau)
}
## Prior for mu
mu ~ dnorm(mu0,tau0)
} ## close model
"
Now create the JAGS model:
model <- jags.model(textConnection(model1),
n.chains = 1, ## usually do more
data = list(Y=Y,n=n, ## data
mu0=mu0, tau0=1/s20, ## hyperparams
tau = 1/s2 ## known precision
),
inits=list(mu=3) ## setting a starting value
)
Now we'll run the MCMC and, see how the output looks for a short chain with no burnin:
samp <- coda.samples(model,
variable.names=c("mu"),
n.iter=1000, progress.bar="none")
plot(samp)
MCMC is a rejection-based sampling algorithm that often needs time to converge or "burn in" -- that is, we may need to let it move around for a while until it is taking draws from the correct distribution. Unlike in optimization problems, this does not mean that the algorithm heads toward a single value. Instead, we're looking for a pattern where the draws appear unrelated and random. To assess convergence we look at trace plots; the goal is to get traces that look like "fuzzy caterpillars".
Sometimes at the beginning of a run, if we start far from the area near the posterior mean of the parameter, we will instead get something that looks like a trending time series. If this is the case we have to drop the samples that were taken during the burn-in phase. Here's an example of how to do that:
update(model, 10000, progress.bar="none") # Burnin for 10000 samples
samp <- coda.samples(model,
variable.names=c("mu"),
n.iter=20000, progress.bar="none")
plot(samp)
This is a very fuzzy caterpillar!
We can also use the summary function to examine the samples generated:
summary(samp)
Let's compare these draws to what we got with our analytic solution:
x<-seq(1.3,2.3, length=1000)
hist(samp[[1]], xlab="mu", xlim=c(1.3, 2.3),
freq=FALSE, ylim=c(0,8), main ="posterior samples")
lines(x, dnorm(x, mean=mu0, sd=sqrt(s20)), col=2, lty=2, lwd=2) ## prior
lines(x, dnorm(x, mean=mp, sd=sqrt(1/tp)), col=4, lwd=2) ## posterior
legend("topleft", legend=c("prior", "analytic posterior"), col=c(2,4), lty=c(2,1), lwd=2)
It worked!
As with the analytic approach, it's always a good idea, when you run your analyses, to check how sensitive your result is to the priors you choose. Unless you are purposefully choosing an informative prior, we usually want the prior and posterior to look different.
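For the analytic version, here is a quick sketch of such a check using the mu.post() and tau.post() helpers defined earlier (the alternative hyperparameter values below are arbitrary, for illustration only):
mp_alt <- mu.post(Ybar = m, mu0 = 1.5, sig20 = 0.05, sig2 = s2, n = n) # a tighter prior centred lower than before
tp_alt <- tau.post(tau = 1/s2, tau0 = 1/0.05, n = n)
c(original_posterior_mean = mp, alternative_posterior_mean = mp_alt) # how much does the posterior mean shift?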
One advantage of the numerical approach is that we can choose almost anything we want for the priors on multiple parameters, without worrying about whether they are conjugate, or whether we want to include additional information. For example, let's say that we now want to force the mean to be positive (and also the data, perhaps), and concurrently estimate the variance. Here is a possible model:
model2 <- "model{
# Likelihood
for(i in 1:n){
Y[i] ~ dnorm(mu,tau) T(0,) ## truncates at 0
}
# Prior for mu
mu ~ dnorm(mu0,tau0)
# Prior for the precision
tau ~ dgamma(a, b)
# Compute the variance
s2 <- 1/tau
}"
## hyperparams for tau
a <- 0.01
b <- 0.01
m2 <- jags.model(textConnection(model2),
n.chains = 1,
data = list(Y=Y, n=n,
mu0=mu0, tau0=1/s20, ## mu hyperparams
a=a, b=b ## tau hyperparams
),
inits=list(mu=3, tau=10) ## starting vals
)
samp <- coda.samples(m2,
variable.names=c("mu","s2"),
n.iter=1000, progress.bar="none")
plot(samp)
summary(samp)
Now we plot each with their priors:
par(mfrow=c(1,2), bty="n")
hist(samp[[1]][,1], xlab="samples of mu", main="mu")
lines(x, dnorm(x, mean=mu0, sd=sqrt(s20)),
col=2, lty=2, lwd=2) ## prior
x2<-seq(0, 200, length=1000)
hist(1/samp[[1]][,2], xlab="samples of tau", main="tau")
lines(x2, dgamma(x2, shape = a, rate = b),
col=2, lty=2, lwd=2) ## prior
We also want to look at the joint distribution of μ and σ2:
plot(as.numeric(samp[[1]][,1]), samp[[1]][,2], xlab="mu", ylab="s2")
Redo the previous analysis placing a gamma prior on μ as well. Set the prior so that the mean and variance are the same as in the normal example from above (use moment matching). Do you get something similar?
Now let's do some Bayesian model fitting to Aedes thermal performance data. Let's try out the R2jags package for this.
require(R2jags) # fitting
require(coda) # diagnostic plots
set.seed(1234)
Loading required package: R2jags Attaching package: ‘R2jags’ The following object is masked from ‘package:coda’: traceplot
Load the data:
Aaeg.data <- read.csv("../data/AeaegyptiTraitData.csv")
These data are traits from Aedes aegypti mosquitoes measured across temperature in lab experiments. The traits for which we have thermal performance data are listed in the trait.name column of the dataset.
Note that some of the traits come in multiple forms (e.g., μ and 1/μ, PDR and EIP).
Have a look at the data:
head(Aaeg.data)
trait.name | T | trait | ref | trait2 | trait2.name |
---|---|---|---|---|---|
pEA | 22 | 0.90812 | Westbrook_Thesis_2010 | NA | NA |
pEA | 27 | 0.93590 | Westbrook_Thesis_2010 | NA | NA |
pEA | 32 | 0.81944 | Westbrook_Thesis_2010 | NA | NA |
MDR | 22 | 0.09174 | Westbrook_Thesis_2010 | NA | NA |
MDR | 27 | 0.13587 | Westbrook_Thesis_2010 | NA | NA |
MDR | 32 | 0.15823 | Westbrook_Thesis_2010 | NA | NA |
mu.data <- subset(Aaeg.data, trait.name == "mu")
lf.data <- subset(Aaeg.data, trait.name == "1/mu")
par(mfrow=c(1,2), bty="l")
plot(trait ~ T, data = mu.data, ylab="mu")
plot(trait ~ T, data = lf.data, ylab="1/mu")
Note that the μ data is u-shaped and the lifespan data is unimodal (hump-shaped).
Since thermal biology theory is based on unimodal thermal responses, we want to fit the trait as lifespan instead of μ. Thus, we'll need to convert the μ data to lifespan by taking the inverse. The combined data should have a nice unimodal shape that we can fit a function to:
mu.data.inv <- mu.data # make a copy of the mu data
mu.data.inv$trait <- 1/mu.data$trait # take the inverse of the trait values to convert mu to lifespan
lf.data.comb <- rbind(mu.data.inv, lf.data) # combine both lifespan data sets together
plot(trait ~ T, data = lf.data.comb, ylab="1/mu")
Most thermal response curves can be reasonably fit using one of two thermal responses. Traits that respond unimodally but symmetrically to temperature can be fit with a quadratic function:
$$B = q (T - T_0)(T - T_m)$$
Traits that respond unimodally but asymmetrically can be fitted with a Briere function:
$$B = q T (T - T_0) \sqrt{T_m - T}$$
In both models, $T_0$ is the lower thermal limit, $T_m$ is the upper thermal limit (i.e., where the trait value goes to zero at either end), and $q$ scales the elevation of the curve (and so also the value at the optimum temperature).
Unlike the previous Bayesian example, here we will provide JAGS with the model written out as a .txt file. This can be in your working directory, or elsewhere (but then provide the full path to it, ideally a relative path). You can either write the text to the file yourself directly, or create it using the sink() function from your R script (see the sketch below):
Note that the model file quad.txt
has two mandatory sections (the priors and the likelihood) and one optional section (derived measures calculated from your fitted parameters).
In the example below for a quadratic function, most of the priors are specified via uniform distributions (the two arguments specify the lower and upper bounds, respectively). Note that, unlike in R and most other programs, JAGS parameterizes the normal distribution with the inverse of the variance (the precision), denoted $\tau (= 1/\sigma^2)$.
The likelihood can be interpreted as follows: the observed data are normally distributed, with the mean at a given temperature following the quadratic equation.
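Since the contents of quad.txt are not reproduced here, below is a sketch (our illustration, not the original file) of what such a model file could look like, written out with sink() as mentioned above. The prior ranges are arbitrary illustrative choices, and the mean is written as q(T - T0)(Tm - T), truncated to zero outside the thermal limits, so that it is positive between T0 and Tm for q > 0; the parameter names match those used in the fitting code below (cf.q, cf.T0, cf.Tm, cf.sigma, z.trait.mu.pred):
sink("quad.txt")
cat("
model{
  ## Priors (illustrative uniform ranges)
  cf.q ~ dunif(0, 1)
  cf.T0 ~ dunif(0, 20)
  cf.Tm ~ dunif(28, 45)
  cf.sigma ~ dunif(0, 1000)
  cf.tau <- 1 / (cf.sigma * cf.sigma) # JAGS uses the precision, not the variance

  ## Likelihood
  for(i in 1:N.obs){
    trait.mu[i] <- cf.q * (temp[i] - cf.T0) * (cf.Tm - temp[i]) * (cf.Tm > temp[i]) * (cf.T0 < temp[i])
    trait[i] ~ dnorm(trait.mu[i], cf.tau)
  }

  ## Derived measures: predicted thermal response over the temperature gradient
  for(i in 1:N.Temp.xs){
    z.trait.mu.pred[i] <- cf.q * (Temp.xs[i] - cf.T0) * (cf.Tm - Temp.xs[i]) * (cf.Tm > Temp.xs[i]) * (cf.T0 < Temp.xs[i])
  }
}
")
sink()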
Now, prepare the data for jags:
# Parameters to Estimate
parameters <- c("cf.q", "cf.T0", "cf.Tm","cf.sigma", "z.trait.mu.pred")
# Initial values for the parameters
inits<-function(){list(
cf.q = 0.01,
cf.Tm = 35,
cf.T0 = 5,
cf.sigma = rlnorm(1))}
# MCMC Settings: number of posterior dist elements = [(ni - nb) / nt ] * nc
ni <- 25000 # number of iterations in each chain
nb <- 5000 # number of 'burn in' iterations to discard
nt <- 8 # thinning rate - jags saves every nt iterations in each chain
nc <- 3 # number of chains
# Temperature sequence for derived quantity calculations
Temp.xs <- seq(0, 45, 0.2)
N.Temp.xs <-length(Temp.xs)
### Fitting the trait thermal response; Pull out data columns as vectors
data <- lf.data.comb # this lets us reuse the same generic code: we only change this first line
trait <- data$trait
N.obs <- length(trait)
temp <- data$T
# Bundle all data in a list for JAGS
jag.data<-list(trait = trait, N.obs = N.obs, temp = temp, Temp.xs = Temp.xs, N.Temp.xs = N.Temp.xs)
Now run the fitting using jags:
lf.fit <- jags(data=jag.data, inits=inits, parameters.to.save=parameters,
model.file="quad.txt", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni, DIC=T, working.directory=getwd())
Change into "mcmc" type samples for visualization with the coda
package:
lf.fit.mcmc <- as.mcmc(lf.fit)
View the parameters (only the first 5 lines, or it will also show you all of your derived quantities):
lf.fit$BUGSoutput$summary[1:5,]
 | mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | Rhat | n.eff |
---|---|---|---|---|---|---|---|---|---|
cf.T0 | 6.4943265 | 2.09300384 | 1.60602110 | 5.24712819 | 6.7642097 | 7.960755 | 9.8890203 | 1.011800 | 450 |
cf.Tm | 39.2489169 | 1.66643065 | 36.56998429 | 38.04932891 | 39.0173139 | 40.200410 | 43.1551656 | 1.004244 | 580 |
cf.q | 0.1099374 | 0.02623226 | 0.06461983 | 0.09123192 | 0.1085517 | 0.126104 | 0.1657431 | 1.005886 | 440 |
cf.sigma | 6.7100226 | 0.96797653 | 5.13196346 | 6.03242714 | 6.6036237 | 7.279794 | 8.9213322 | 1.001077 | 7200 |
deviance | 197.7198535 | 3.09086985 | 193.77713702 | 195.42279657 | 197.0587314 | 199.288979 | 205.4720801 | 1.001325 | 3800 |
Plot the chains:
plot(lf.fit.mcmc[,c(1,3,4)])
plot(trait ~ T, xlim = c(0, 45), ylim = c(0,42), data = lf.data.comb, ylab = "Lifespan for Ae. aegypti", xlab = "Temperature")
lines(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "2.5%"] ~ Temp.xs, lty = 2)
lines(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "97.5%"] ~ Temp.xs, lty = 2)
lines(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "mean"] ~ Temp.xs)
You can use the which.max() function to find the optimal temperature for adult lifespan:
Temp.xs[which.max(as.vector(lf.fit$BUGSoutput$summary[6:(6 + N.Temp.xs - 1), "mean"]))]
You can also pull out the lifespan values for each iteration of the MCMC chain over the temperature gradient to calculate R0:
lf.grad <- lf.fit$BUGSoutput$sims.list$z.trait.mu.pred
dim(lf.grad) # A matrix with 7500 iterations of the MCMC chains at 226 temperatures
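For example, here is a hedged sketch of how you could summarize these posterior predictions over the temperature gradient yourself (the object names lf.grad.mean and lf.grad.CI are just illustrative):
lf.grad.mean <- apply(lf.grad, 2, mean) # posterior mean lifespan at each temperature
lf.grad.CI <- apply(lf.grad, 2, quantile, probs = c(0.025, 0.975)) # 95% quantile-based interval
Temp.xs[which.max(lf.grad.mean)] # optimal temperature, recovered from the raw samples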
We will now perform a Bayesian analysis of population growth data.
require(R2jags) # does the fitting
require(coda) # makes diagnostic plots
library(IDPmisc) # makes nice colored pairs plots to look at joint posteriors
Loading required package: R2jags
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.2.0
Loaded modules: basemod,bugs
Attaching package: ‘R2jags’
The following object is masked from ‘package:coda’:
    traceplot
Loading required package: grid
Loading required package: lattice
These data are observations of the amphibian fungal pathogen Batrachochytrium dendrobatidis being grown in liquid culture at multiple different temperatures. The experiment is conducted in 96-well plates with a fixed initial inoculation of fungal spores in each well, and each plate is kept in a constant-temperature incubator. Each day, 8 wells per plate are observed and the optical density (OD) is measured. We will focus on a single temperature trial across multiple plates, with OD as the response.
We will fit a logistic model to these growth data.
Let's have a look at the data first:
dat <- read.csv("../data/lb_ab_temps.csv")
head(dat)
X | EXP | TEMP | DAY | ISOLATE | PLATE | WELL | OD | NC_AVG | OD_SUB |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | 5 | 1 | LB_AB | P.16 | 1 | 0.120 | 0.1195 | 0.0005 |
2 | 2 | 5 | 1 | LB_AB | P.16 | 2 | 0.120 | 0.1195 | 0.0005 |
3 | 2 | 5 | 1 | LB_AB | P.16 | 3 | 0.122 | 0.1195 | 0.0025 |
4 | 2 | 5 | 1 | LB_AB | P.16 | 4 | 0.123 | 0.1195 | 0.0035 |
5 | 2 | 5 | 1 | LB_AB | P.16 | 5 | 0.125 | 0.1195 | 0.0055 |
6 | 2 | 5 | 1 | LB_AB | P.16 | 6 | 0.125 | 0.1195 | 0.0055 |
We are only interested in a subset of these data, so we will subset out only those from experiment 2, and a temperature of 12°C.
d2<-dat[which(dat$EXP==2),2:8]
d2<-d2[which(d2$TEMP==12),]
summary(d2)
      EXP         TEMP         DAY         ISOLATE        PLATE
 Min.   :2   Min.   :12   Min.   : 1.00   LB_AB:730   P.11   :160
 1st Qu.:2   1st Qu.:12   1st Qu.: 9.00               P.12   :150
 Median :2   Median :12   Median :19.00               P.13   :150
 Mean   :2   Mean   :12   Mean   :19.04               P.14   :140
 3rd Qu.:2   3rd Qu.:12   3rd Qu.:29.00               P.15   :130
 Max.   :2   Max.   :12   Max.   :39.00               P.1    :  0
                                                      (Other):  0
      WELL             OD
 Min.   :1.000   Min.   :0.0930
 1st Qu.:2.000   1st Qu.:0.1630
 Median :4.000   Median :0.2580
 Mean   :4.315   Mean   :0.2437
 3rd Qu.:6.000   3rd Qu.:0.3170
 Max.   :8.000   Max.   :0.4600
Now plot it:
Temps<-seq(0,max(d2$DAY)-1, by=0.05) # a fine grid of time points (in days) for the x-axis; despite the name, this is time, not temperature
mycol<-1
my.ylim<-c(0, 0.5)
my.title<-"LB-AB isolate, T=12C"
plot(d2$DAY-1, d2$OD, xlim=c(0,max(Temps)), ylim=my.ylim,
pch=(mycol+20),
xlab="time (days)", ylab="",
main=my.title,
col=mycol+1, cex=1.5)
Although logistic growth is often written as a differential equation, here we will work with the analytic solution of the model:
$$\mu(t) = \frac{K Y_0}{Y_0 + (K - Y_0) e^{-rt}}$$

This gives the mean function that we want to fit. We will assume log-normal noise around this mean, since the optical density is bounded below by 0 and the variance increases over time (as the optical density increases).
JAGS needs the model written as a .txt or .bug file inside the working directory. You can either make the text file directly, or create it using the sink() function in your R script, as follows.
Note that the model file has two mandatory sections (the priors and the likelihood) and one optional section (derived quantities calculated from your fitted parameters).
In the example below we build the model with a log-normal likelihood for the logistic growth function. The priors are a combination of uniform and exponential distributions. As with the normal distribution, jags parameterizes the log-normal in terms of the precision $\tau$ ($= 1/\sigma^2$); however, it can be easier to specify the prior on sigma directly and compute $\tau$ from it. In this example we will generate the posterior samples of derived quantities outside of JAGS (so you can see what this step actually does).
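A sketch of what jags-logistic.bug could look like, written via sink(). The uniform bounds mirror the prior list we build for plotting further below; the exponential rates are assumptions and may differ from the course's actual model file.
sink("jags-logistic.bug")
cat("
model{
  ## Priors (hyperparameter values are assumptions)
  Y0 ~ dunif(0.09, 0.15)     # initial optical density
  K ~ dunif(0.01, 0.6)       # carrying capacity
  r ~ dexp(1000)             # growth rate; assumed rate, the real file may differ
  sigma ~ dexp(0.1)          # prior specified on sigma directly...
  tau <- 1 / (sigma * sigma) # ...and converted to the precision jags needs

  ## Likelihood: log-normal noise around the logistic mean
  for(i in 1:N){
    mu[i] <- K * Y0 / (Y0 + (K - Y0) * exp(-r * t[i]))
    Y[i] ~ dlnorm(log(mu[i]), tau)
  }
}
")
sink()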
Now for some additional settings/specifications for jags:
# Parameters to Estimate
parameters <- c('Y0', 'K', 'r', 'sigma')
# Initial values for the parameters
inits<-function(){list(
Y0 = 0.1,
K = 0.4,
r = 0.1,
sigma = rlnorm(1))}
# MCMC Settings: number of posterior dist elements = [(ni - nb) / nt ] * nc
ni <- 6000 # number of iterations in each chain
nb <- 1000 # number of 'burn in' iterations to discard
nt <- 1 # thinning rate - jags saves every nt iterations in each chain
nc <- 5 # number of chains
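With these settings we will end up with [(6000 − 1000) / 1] × 5 = 25000 posterior samples per parameter.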
Now we can run jags:
# Pull out data columns as vectors
data <- d2 # this lets us reuse the same generic code: we only change this first line
Y <- data$OD
N <- length(Y)
t <- data$DAY
# Bundle all data in a list for JAGS
jag.data<-list(Y = Y, N = N, t = t)
# Run JAGS
OD.12C <- jags(data=jag.data, inits=inits, parameters.to.save=parameters,
model.file="jags-logistic.bug", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni, DIC=T, working.directory=getwd())
module glm loaded
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 730
   Unobserved stochastic nodes: 4
   Total graph size: 1598
Initializing model
Change into "mcmc" type samples for visualization with coda
:
OD.12C.mcmc<-as.mcmc(OD.12C)
As you did in the Traits Bayesian fitting example, there are a number of model diagnostics that we need to check. First we want to look at the chains and confirm that they look like "fuzzy caterpillars" -- no linear/non-linear patterns across the chains, low autocorrelation, etc.
First view the fitted parameters:
OD.12C$BUGSoutput$summary
 | mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | Rhat | n.eff |
---|---|---|---|---|---|---|---|---|---|
K | 3.659490e-01 | 0.0055513205 | 3.552925e-01 | 3.622170e-01 | 3.658641e-01 | 3.696170e-01 | 3.772298e-01 | 1.001300 | 7500 |
Y0 | 9.051409e-02 | 0.0004844611 | 9.001452e-02 | 9.015819e-02 | 9.037434e-02 | 9.072057e-02 | 9.179751e-02 | 1.001085 | 16000 |
deviance | -2.823094e+03 | 6.0822395330 | -2.832126e+03 | -2.827586e+03 | -2.824031e+03 | -2.819654e+03 | -2.808733e+03 | 1.001097 | 15000 |
r | 1.051490e-01 | 0.0023217883 | 1.005654e-01 | 1.035754e-01 | 1.051368e-01 | 1.067121e-01 | 1.097114e-01 | 1.001249 | 8600 |
sigma | 1.563601e-01 | 0.0041764147 | 1.484153e-01 | 1.535103e-01 | 1.562650e-01 | 1.591003e-01 | 1.647488e-01 | 1.001038 | 22000 |
Plot the chains using the coda package:
plot(OD.12C.mcmc[,c(1,2,4)])
We can examine the ACF of the chains as well, similarly to a time series:
s1<-as.data.frame(OD.12C.mcmc[[1]])
par(mfrow=c(2,2))
for(i in 2:5) acf(s1[,i], lag.max=20)
There is still a bit of autocorrelation, but it isn't too bad. The chain for σ is mixing best. We could reduce the autocorrelation even further by thinning the chain (i.e., change the nt parameter to 5 or 10).
The last important diagnostic is to compare the prior and posterior distributions. Various packages in R have bespoke functions to do this. Here we use the functions provided in the mcmc_utils.R file from the website.
source("../code/mcmc_utils.R")
We can also stack the samples from all the chains into a single data frame, which is a convenient format for visualizing:
samps<-NULL
for(i in 1:nc){
samps<-rbind(samps, as.data.frame(OD.12C.mcmc[[i]]))
}
samps<-samps[,c(5,2,3,4)]
We can also build a list to hold all the information about the priors for each parameter:
priors<-list()
priors$names<-c("Y0", "K", "r","sigma")
priors$fun<-c("uniform", "uniform", "exp","exp")
priors$hyper<-matrix(NA, ncol=4, nrow=3) # one column of hyperparameters per parameter, in the order given in priors$names
priors$hyper[,1]<-c(0.09, 0.15, NA)
priors$hyper[,2]<-c(0.01, 0.6, NA)
priors$hyper[,3]<-c(1000, NA, NA)
priors$hyper[,4]<-c(0.1, NA, NA)
Now we can plot the histograms of the posterior samples together with the prior distributions:
plot.hists(samps, my.par=c(2,2), n.hists=4, priors=priors, mai=c(0.5, 0.5, 0.25, 0.2))
The prior distribution here is very different from the posterior. These data are highly informative about the parameters of interest and are very unlikely to be influenced much by the prior distribution (although you can always change the priors to check this). However, notice that Y0 (the initial condition) is truncated by the prior. This is a fairly strong prior, because we know something about the initial optical density that is typical for this experimental set-up, given the density of inoculum used and a properly calibrated system.
It's often useful to also look at the joint distribution of all of your parameters together. Of course, if you have a high-dimensional posterior, rendering a 2-D representation can be difficult. Instead, the standard approach is to examine the pairwise posterior distributions, for instance as follows (using the s1 data frame we created above):
ipairs(s1[,2:5], ztransf = function(x){x[x<1] <- 1; log2(x)})
As you can see, estimates of r and K are highly correlated -- not surprising given the interplay between them in the logistic growth function. This correlation is an important feature of the system, and we use the full posterior distribution that includes this correlation when we want to build the corresponding posterior distribution of the behavior of the logistic function.
The final step is to check how well we are fitting the data. To do this we usually examine the posterior distribution of the mean function of our system (here, the logistic solution) and compare it to the data. For each of our posterior samples (or a thinned subset), we plug the parameters of the $i$th sample, $\theta_i$, into our function of interest and evaluate the function at a desired set of $x$ values. For instance, for logistic growth we evaluate

$$\mu_i(t) = \frac{K_i Y_{0,i}}{Y_{0,i} + (K_i - Y_{0,i}) e^{-r_i t}}$$

across a grid of time points:
my.logistic<-function(t, Y0, K, r){
return(K*Y0/(Y0+(K-Y0)*exp(-r*t)))
}
ts<-seq(0, 40, length=100)
ss<-seq(1, dim(samps)[1], by=10) # a thinned set of row indices (every 10th posterior sample)
my.curves<-matrix(NA, nrow=length(ss), ncol=length(ts))
for(i in 1:length(ss)){
my.curves[i,]<-my.logistic(t=ts, Y0=samps$Y0[ss[i]], K=samps$K[ss[i]], r=samps$r[ss[i]]) # index with ss[i] so we actually use the thinned samples
}
We can now plot all of these curves:
plot(ts, my.curves[1,], col=1, type="l", ylim=c(0.09, 0.36),
ylab="predicted OD", xlab="time (days)")
for(i in 2:length(ss)) lines(ts, my.curves[i,], col=i)
Then we can summarize this posterior using the apply function to find the mean and (for simplicity) the quantile-based 95% CI:
m.log<-apply(my.curves, 2, mean)
l.log<-apply(my.curves, 2, quantile, probs=0.025)
u.log<-apply(my.curves, 2, quantile, probs=0.975)
For comparison, here is how to find the 95% HPD interval across time, using the HPDinterval function from the coda package:
hpd.log<-NULL
for(i in 1:length(ts)){
hpd.log<-cbind(hpd.log, as.numeric(HPDinterval(mcmc(my.curves[,i]))))
}
And plot these together with the data (in this case the HPD and quantile based intervals are indistinguishable):
my.ylim<-c(0.09, 0.45)
my.title<-"LB-AB isolate, T=12C"
plot(d2$DAY-1, d2$OD, xlim=c(0,max(Temps)), ylim=my.ylim,
pch=(mycol+20),
xlab="time (days)", ylab="",
main=my.title,
col="grey", cex=1.5)
lines(ts, m.log, col=1, lwd=2)
lines(ts, l.log, col=2, lwd=2, lty=2)
lines(ts, u.log, col=2, lwd=2, lty=2)
lines(ts, hpd.log[1,], col=3, lwd=2, lty=3)
lines(ts, hpd.log[2,], col=3, lwd=2, lty=3)
Note that this only shows the uncertainty in the mean function -- under the assumed model, the log-normal noise then scatters the observations around this mean. The fit is attributing the majority of the observed noise to process error rather than to parameter uncertainty.
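If you wanted to visualize the full predictive uncertainty instead, one option (a sketch, not from the original material) is to add log-normal observation noise, using each sample's sigma, on top of each posterior mean curve and summarize those simulated observations; the resulting band should be noticeably wider than the interval for the mean shown above. The object pred.curves below is just an illustrative name, and the code assumes ts, ss, samps and my.curves from above are still in the workspace.
# Sketch: approximate posterior predictive interval
pred.curves <- matrix(NA, nrow = length(ss), ncol = length(ts))
for(i in 1:length(ss)){
  pred.curves[i,] <- rlnorm(length(ts), meanlog = log(my.curves[i,]),
                            sdlog = samps$sigma[ss[i]]) # simulate observations around each mean curve
}
lines(ts, apply(pred.curves, 2, quantile, probs = 0.025), col = 4, lwd = 2, lty = 4)
lines(ts, apply(pred.curves, 2, quantile, probs = 0.975), col = 4, lwd = 2, lty = 4)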