# ANOVA models¶

Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix.

In [1]:
%%R
url = 'http://stats191.stanford.edu/data/salary.table'
salary.table$E = factor(salary.table$E)
salary.table$M = factor(salary.table$M)
salary.lm = lm(S ~ X + E + M, salary.table)

  (Intercept) X E2 E3 M1
1           1 1  0  0  1
2           1 1  0  1  0
3           1 1  0  1  1
4           1 1  1  0  0
5           1 1  0  1  0
6           1 2  1  0  1

• Often, especially in experimental settings, we record only categorical variables.

• Such models are often referred to ANOVA (Analysis of Variance) models.

• These are generalizations of our favorite example, the two sample $t$-test.

## Example: recovery time¶

• Suppose we want to understand the relationship between recovery time after surgery based on an patient's prior fitness.

• We group patients into three fitness levels: below average, average, above average.

• If you are in better shape before surgery, does it take less time to recover?

In [2]:
%%R
url = 'http://stats191.stanford.edu/data/rehab.csv'
rehab.table$Fitness <- factor(rehab.table$Fitness)

  Fitness Time
1       1   29
2       1   42
3       1   38
4       1   40
5       1   43
6       1   40

In [3]:
%%R -h 800 -w 800
attach(rehab.table)
boxplot(Time ~ Fitness, col=c('red','green','blue'))


## One-way ANOVA¶

• First generalization of two sample $t$-test: more than two groups.

• Observations are broken up into $r$ groups with $n_i, 1 \leq i \leq r$ observations per group.

• Model: $$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2).$$

• Constraint: $\sum_{i=1}^r \alpha_i = 0$. This constraint is needed for “identifiability”. This is “equivalent” to only adding $r-1$ columns to the design matrix for this qualitative variable.

• This is not the same parameterization we get when only adding $r-1$ 0-1 columns, but it gives the same model.

• The estimates of $\alpha$ can be obtained from the estimates of $\beta$ using R's default parameters.

• For a more detailed exploration into R's creation of design matrices, try reading the following tutorial on design matrices.

## Fitting the model¶

• Model is easy to fit: $$\widehat{Y}_{ij} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} = \overline{Y}_{i\cdot}.$$ If observation is in $i$-th group: predicted mean is just the sample mean of observations in $i$-th group.

• Simplest question: is there any group (main) effect? $$H_0:\alpha_1 = \dots = \alpha_r= 0?$$

• Test is based on $F$-test with full model vs. reduced model. Reduced model just has an intercept.

• Other questions: is the effect the same in groups 1 and 2? $$H_0:\alpha_1=\alpha_2?$$

In [4]:
%%R
rehab.lm <- lm(Time ~ Fitness)
summary(rehab.lm)

Call:
lm(formula = Time ~ Fitness)

Residuals:
Min     1Q Median     3Q    Max
-9.0   -3.0   -0.5    3.0    8.0

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   38.000      1.574  24.149  < 2e-16 ***
Fitness2      -6.000      2.111  -2.842  0.00976 **
Fitness3     -14.000      2.404  -5.824 8.81e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.451 on 21 degrees of freedom
Multiple R-squared:  0.6176,	Adjusted R-squared:  0.5812
F-statistic: 16.96 on 2 and 21 DF,  p-value: 4.129e-05


In [5]:
%%R
print(predict(rehab.lm, list(Fitness=factor(c(1,2,3)))))
c(mean(Time[Fitness == 1]), mean(Time[Fitness == 2]), mean(Time[Fitness == 3]))

 1  2  3
38 32 24
[1] 38 32 24


Recall that the rows of the Coefficients table above do not correspond to the $\alpha$ parameter. For one thing, we would see three $\alpha$'s and their sum would have to be equal to 0.

Also, the design matrix is the indicator coding we saw last time.

In [6]:
%%R

  (Intercept) Fitness2 Fitness3
1           1        0        0
2           1        0        0
3           1        0        0
4           1        0        0
5           1        0        0
6           1        0        0

• There are ways to get different design matrices by using the contrasts argument. This is a bit above our pay grade at the moment.

• Upon inspection of the design matrix above, we see that the (Intercept) coefficient corresponds to the mean in Fitness==1, while Fitness==2 coefficient corresponds to the difference between the groups Fitness==2 and Fitness==1.

## ANOVA table¶

Much of the information in an ANOVA model is contained in the ANOVA table.

In [8]:
make_table(anova_oneway)
apply_theme('basic')

Out[8]:
 Source SS df $\mathbb{E}(MS)$ Treatment $SSTR=\sum_{i=1}^r n_i \left(\overline{Y}_{i\cdot} - \overline{Y}_{\cdot\cdot}\right)^2$ r-1 $\sigma^2 + \frac{\sum_{i=1}^r n_i \alpha_i^2}{r-1}$ Error $SSE=\sum_{i=1}^r \sum_{j=1}^{n_i}(Y_{ij} - \overline{Y}_{i\cdot})^2$ $\sum_{i=1}^r (n_i - 1)$ $\sigma^2$
In [9]:
%%R
anova(rehab.lm)

Analysis of Variance Table

Response: Time
Df Sum Sq Mean Sq F value    Pr(>F)
Fitness    2    672  336.00  16.962 4.129e-05 ***
Residuals 21    416   19.81
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• Note that $MSTR$ measures “variability” of the “cell” means. If there is a group effect we expect this to be large relative to $MSE$.

• We see that under $H_0:\alpha_1=\dots=\alpha_r=0$, the expected value of $MSTR$ and $MSE$ is $\sigma^2$. This tells us how to test $H_0$ using ratio of mean squares, i.e. an $F$ test.

## Testing for any main effect¶

• Rows in the ANOVA table are, in general, independent.

• Therefore, under $H_0$ $$F = \frac{MSTR}{MSE} = \frac{\frac{SSTR}{df_{TR}}}{\frac{SSE}{df_{E}}} \sim F_{df_{TR}, df_E}$$ the degrees of freedom come from the $df$ column in previous table.

• Reject $H_0$ at level $\alpha$ if $F > F_{1-\alpha, df_{TR}, df_{E}}.$

In [10]:
%%R
F = 336.00 / 19.81
pval = 1 - pf(F, 2, 21)
print(data.frame(F,pval))

         F         pval
1 16.96113 4.129945e-05


## Inference for linear combinations¶

• Suppose we want to infer'' something about $$\sum_{i=1}^r a_i \mu_i$$ where $\mu_i = \mu+\alpha_i$ is the mean in the $i$-th group. For example: $$H_0:\mu_1-\mu_2=0 \qquad \text{(same as H_0:\alpha_1-\alpha_2=0)}?$$
• For example:

Is there a difference between below average and average groups in terms of rehab time?

• We need to know $$\text{Var}\left(\sum_{i=1}^r a_i \overline{Y}_{i\cdot} \right) = \sigma^2 \sum_{i=1}^r \frac{a_i^2}{n_i}.$$

• After this, the usual confidence intervals and $t$-tests apply.

In [11]:
%%R

  (Intercept) Fitness2 Fitness3
1           1        0        0
2           1        0        0
3           1        0        0
4           1        0        0
5           1        0        0
6           1        0        0


This means that the coefficient Fitness2 is the estimated difference between the two groups.

In [12]:
%%R
detach(rehab.table)


## Two-way ANOVA¶

Often, we will have more than one variable we are changing.

### Example¶

After kidney failure, we suppose that the time of stay in hospital depends on weight gain between treatments and duration of treatment.

We will model the log number of days as a function of the other two factors.

In [14]:
make_table(desc)
apply_theme('basic')


Out[14]:
 Variable Description Days Duration of hospital stay Weight How much weight is gained? (three levels) Duration How long under treatment for kidney problems? (two levels)
In [15]:
%%R
url = 'http://statweb.stanford.edu/~jtaylo/stats191/data/kidney.table'
kidney.table$D = factor(kidney.table$Duration)
kidney.table$W = factor(kidney.table$Weight)
kidney.table$logDays = log(kidney.table$Days + 1)
attach(kidney.table)

  Days Duration Weight ID D W   logDays
1    0        1      1  1 1 1 0.0000000
2    2        1      1  2 1 1 1.0986123
3    1        1      1  3 1 1 0.6931472
4    3        1      1  4 1 1 1.3862944
5    0        1      1  5 1 1 0.0000000
6    2        1      1  6 1 1 1.0986123


### Two-way ANOVA model¶

• Second generalization of $t$-test: more than one grouping variable.

• Two-way ANOVA model:

• $r$ groups in first factor
• $m$ groups in second factor
• $n_{ij}$ in each combination of factor variables.
• Model: $$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \varepsilon_{ijk} , \qquad \varepsilon_{ijk} \sim N(0, \sigma^2).$$

• In kidney example, $r=3$ (weight gain), $m=2$ (duration of treatment), $n_{ij}=10$ for all $(i,j)$.

### Questions of interest¶

Two-way ANOVA: main questions of interest

• Are there main effects for the grouping variables? $$H_0:\alpha_1 = \dots = \alpha_r = 0, \qquad H_0: \beta_1 = \dots = \beta_m = 0.$$

• Are there interaction effects: $$H_0:(\alpha\beta)_{ij} = 0, 1 \leq i \leq r, 1 \leq j \leq m.$$

### Interactions between factors¶

We've already seen these interactions in the IT salary example.

• An additive model says that the effects of the two factors occur additively -- such a model has no interactions.

• An interaction is present whenever the additive model does not hold.

### Interaction plot¶

In [16]:
%%R -h 800 -w 800
interaction.plot(W, D, logDays, type='b', col=c('red',
'blue'), lwd=2, pch=c(23,24))


When these broken lines are not parallel, there is evidence of an interaction. The one thing missing from this plot are errorbars. The above broken lines are clearly not parallel but there is measurement error. If the error bars were large then we might consider there to be no interaction, otherwise we might.

### Parameterization¶

• Many constraints are needed, again for identifiability. Let’s not worry too much about the details

• Constraints:

• $\sum_{i=1}^r \alpha_i = 0$

• $\sum_{j=1}^m \beta_j = 0$

• $\sum_{j=1}^m (\alpha\beta)_{ij} = 0, 1 \leq i \leq r$

• $\sum_{i=1}^r (\alpha\beta)_{ij} = 0, 1 \leq j \leq m.$

• We should convince ourselves that we know have exactly $r*m$ free parameters.

### Fitting the model¶

• Easy to fit when $n_{ij}=n$ (balanced) $$\widehat{Y}_{ijk}= \overline{Y}_{ij\cdot} = \frac{1}{n}\sum_{k=1}^{n} Y_{ijk}.$$

• Inference for combinations $$\text{Var} \left(\sum_{i=1}^r \sum_{j=1}^m a_{ij} \overline{Y}_{ij\cdot}\right) = \frac{ \sigma^2}{n} \cdot \sum_{i=1}^r \sum_{j=1}^m{a_{ij}^2}.$$

• Usual $t$-tests, confidence intervals.

In [17]:
%%R
kidney.lm = lm(logDays ~ D*W)
summary(kidney.lm)

Call:
lm(formula = logDays ~ D * W)

Residuals:
Min       1Q   Median       3Q      Max
-1.33772 -0.51121  0.06302  0.62926  1.17950

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0212     0.2317   4.407 5.01e-05 ***
D2           -0.1042     0.3277  -0.318   0.7517
W2            0.8439     0.3277   2.575   0.0128 *
W3            1.5271     0.3277   4.661 2.10e-05 ***
D2:W2        -0.4231     0.4634  -0.913   0.3653
D2:W3        -0.4491     0.4634  -0.969   0.3368
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7327 on 54 degrees of freedom
Multiple R-squared:  0.4076,	Adjusted R-squared:  0.3528
F-statistic: 7.431 on 5 and 54 DF,  p-value: 2.301e-05



## Example¶

• Suppose we are interested in comparing the mean in $(D=1,W=3)$ and $(D=2,W=2)$ groups. The difference is $$E(\bar{Y}_{13\cdot}-\bar{Y}_{22\cdot})$$

• By independence, its variance is $$\text{Var}(\bar{Y}_{13\cdot}) + \text{Var}(\bar{Y}_{22\cdot}) = \frac{2 \sigma^2}{n}.$$

In [18]:
%%R
estimates = predict(kidney.lm, list(D=factor(c(1,2)), W=factor(c(3,2))))
print(estimates)
sigma.hat = 0.7327 # from table above
n = 10 # ten observations per group
fit = estimates[1] - estimates[2]
upper = fit + qt(0.975, 54) * sqrt(2 * sigma.hat^2 / n)
lower = fit - qt(0.975 ,54) * sqrt(2 * sigma.hat^2 / n)
data.frame(fit,lower,upper)

       1        2
2.548271 1.337719
fit     lower    upper
1 1.210551 0.5536058 1.867497

In [19]:
%%R

  (Intercept) D2 W2 W3 D2:W2 D2:W3
1           1  0  0  0     0     0
2           1  0  0  0     0     0
3           1  0  0  0     0     0
4           1  0  0  0     0     0
5           1  0  0  0     0     0
6           1  0  0  0     0     0


### Finding predicted values¶

The most direct way to compute predicted values is using the predict function

In [20]:
%%R
predict(kidney.lm, list(D=factor(1),W=factor(1)), interval='confidence')

       fit       lwr      upr
1 1.021156 0.5566306 1.485681


### ANOVA table¶

In the balanced case, everything can again be summarized from the ANOVA table

In [22]:

make_table(anova_twoway)
apply_theme('basic')

Out[22]:
 Source SS df $\mathbb{E}(MS)$ A $SSA=nm\sum_{i=1}^r \left(\overline{Y}_{i\cdot\cdot} - \overline{Y}_{\cdot\cdot\cdot}\right)^2$ r-1 $\sigma^2 + nm\frac{\sum_{i=1}^r \alpha_i^2}{r-1}$ B $SSB=nr\sum_{j=1}^m \left(\overline{Y}_{\cdot j\cdot} - \overline{Y}_{\cdot\cdot\cdot}\right)^2$ m-1 $\sigma^2 + nr\frac{\sum_{j=1}^m \beta_j^2}{m-1}$ A:B $SSAB = n\sum_{i=1}^r \sum_{j=1}^m \left(\overline{Y}_{ij\cdot} - \overline{Y}_{i\cdot\cdot} - \overline{Y}_{\cdot j\cdot} + \overline{Y}_{\cdot\cdot\cdot}\right)^2$ (m-1)(r-1) $\sigma^2 + n\frac{\sum_{i=1}^r\sum_{j=1}^m (\alpha\beta)_{ij}^2}{(r-1)(m-1)}$ Error $SSE = \sum_{i=1}^r \sum_{j=1}^m \sum_{k=1}^{n}(Y_{ijk} - \overline{Y}_{ij\cdot})^2$ (n-1)mr $\sigma^2$

### Tests using the ANOVA table¶

• Rows of the ANOVA table can be used to test various of the hypotheses we started out with.

• For instance, we see that under $H_0:(\alpha\beta)_{ij}=0, \forall i,j$ the expected value of $SSAB$ and $SSE$ is $\sigma^2$ – use these for an $F$-test testing for an interaction.

• Under $H_0$ $$F = \frac{MSAB}{MSE} = \frac{\frac{SSAB}{(m-1)(r-1)}}{\frac{SSE}{(n-1)mr}} \sim F_{(m-1)(r-1), (n-1)mr}$$

In [23]:
%%R
anova(kidney.lm)

Analysis of Variance Table

Response: logDays
Df  Sum Sq Mean Sq F value    Pr(>F)
D          1  2.3397  2.3397  4.3583   0.04156 *
W          2 16.9713  8.4856 15.8067 3.945e-06 ***
D:W        2  0.6357  0.3178  0.5920   0.55675
Residuals 54 28.9892  0.5368
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


We can also test for interactions using our usual approach

In [24]:
%%R
anova(lm(logDays ~ D + W, kidney.table), kidney.lm)

Analysis of Variance Table

Model 1: logDays ~ D + W
Model 2: logDays ~ D * W
Res.Df    RSS Df Sum of Sq     F Pr(>F)
1     56 29.625
2     54 28.989  2   0.63566 0.592 0.5567


### Some caveats about R formulae¶

While we see that it is straightforward to form the interactions test using our usual anova function approach, we generally cannot test for main effects by this approach.

In [25]:
%%R
lm_no_main_Weight = lm(logDays ~ D + W:D)
anova(lm_no_main_Weight, kidney.lm)

Analysis of Variance Table

Model 1: logDays ~ D + W:D
Model 2: logDays ~ D * W
Res.Df    RSS Df  Sum of Sq F Pr(>F)
1     54 28.989
2     54 28.989  0 3.5527e-15


In fact, these models are identical in terms of their planes or their fitted values. What has happened is that R has formed a different design matrix using its rules for formula objects.

In [26]:
%%R
lm1 = lm(logDays ~ D + W:D)
lm2 = lm(logDays ~ D + W:D + W)
anova(lm1, lm2)

Analysis of Variance Table

Model 1: logDays ~ D + W:D
Model 2: logDays ~ D + W:D + W
Res.Df    RSS Df  Sum of Sq F Pr(>F)
1     54 28.989
2     54 28.989  0 3.5527e-15


## ANOVA tables in general¶

So far, we have used anova to compare two models. In this section, we produced tables for just 1 model. This also works for any regression model, though we have to be a little careful about interpretation.

Let's revisit the job aptitude test data from last section.

In [27]:
%%R
url = 'http://stats191.stanford.edu/data/jobtest.table'
jobtest.table$ETHN <- factor(jobtest.table$ETHN)
jobtest.lm = lm(JPERF ~ TEST * ETHN, jobtest.table)
summary(jobtest.lm)

Call:
lm(formula = JPERF ~ TEST * ETHN, data = jobtest.table)

Residuals:
Min      1Q  Median      3Q     Max
-2.0734 -1.0594 -0.2548  1.2830  2.1980

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.0103     1.0501   1.914   0.0736 .
TEST          1.3134     0.6704   1.959   0.0677 .
ETHN1        -1.9132     1.5403  -1.242   0.2321
TEST:ETHN1    1.9975     0.9544   2.093   0.0527 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.407 on 16 degrees of freedom
Multiple R-squared:  0.6643,	Adjusted R-squared:  0.6013
F-statistic: 10.55 on 3 and 16 DF,  p-value: 0.0004511



Now, let's look at the anova output. We'll see the results don't match.

In [28]:
%%R
anova(jobtest.lm)

Analysis of Variance Table

Response: JPERF
Df Sum Sq Mean Sq F value    Pr(>F)
TEST       1 48.723  48.723 24.6266 0.0001412 ***
ETHN       1  5.247   5.247  2.6519 0.1229524
TEST:ETHN  1  8.666   8.666  4.3802 0.0526501 .
Residuals 16 31.655   1.978
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


The difference is how the Sum Sq columns is created. In the anova output, terms in the response are added sequentially.

We can see this by comparing these two models directly. The F statistic doesn't agree because the MSE above is computed in the fullest model, but the Sum of Sq is correct.

In [29]:
%%R
anova(lm(JPERF ~ TEST, jobtest.table), lm(JPERF ~ TEST + ETHN, jobtest.table))

Analysis of Variance Table

Model 1: JPERF ~ TEST
Model 2: JPERF ~ TEST + ETHN
Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     18 45.568
2     17 40.322  1    5.2468 2.2121 0.1552


Similarly, the first Sum Sq in anova can be found by:

In [30]:
%%R
anova(lm(JPERF ~ 1, jobtest.table), lm(JPERF ~ TEST, jobtest.table))

Analysis of Variance Table

Model 1: JPERF ~ 1
Model 2: JPERF ~ TEST
Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     19 94.291
2     18 45.568  1    48.723 19.246 0.0003555 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


There are ways to produce an ANOVA table whose $p$-values agree with summary. This is done by an ANOVA table that uses Type-III sum of squares.

In [31]:
%%R
library(car)
Anova(jobtest.lm, type=3)

Anova Table (Type III tests)

Response: JPERF
Sum Sq Df F value  Pr(>F)
(Intercept)  7.251  1  3.6647 0.07363 .
TEST         7.594  1  3.8385 0.06775 .
ETHN         3.052  1  1.5427 0.23211
TEST:ETHN    8.666  1  4.3802 0.05265 .
Residuals   31.655 16
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In [32]:
%%R
summary(jobtest.lm)

Call:
lm(formula = JPERF ~ TEST * ETHN, data = jobtest.table)

Residuals:
Min      1Q  Median      3Q     Max
-2.0734 -1.0594 -0.2548  1.2830  2.1980

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.0103     1.0501   1.914   0.0736 .
TEST          1.3134     0.6704   1.959   0.0677 .
ETHN1        -1.9132     1.5403  -1.242   0.2321
TEST:ETHN1    1.9975     0.9544   2.093   0.0527 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.407 on 16 degrees of freedom
Multiple R-squared:  0.6643,	Adjusted R-squared:  0.6013
F-statistic: 10.55 on 3 and 16 DF,  p-value: 0.0004511



# Fixed and random effects¶

• In kidney & rehab examples, the categorical variables are well-defined categories: below average fitness, long duration, etc.

• In some designs, the categorical variable is “subject”.

• Simplest example: repeated measures, where more than one (identical) measurement is taken on the same individual.

• In this case, the “group” effect $\alpha_i$ is best thought of as random because we only sample a subset of the entire population.

### When to use random effects?¶

• A “group” effect is random if we can think of the levels we observe in that group to be samples from a larger population.

• Example: if collecting data from different medical centers, “center” might be thought of as random.

• Example: if surveying students on different campuses, “campus” may be a random effect.

### Example: sodium content in beer¶

• How much sodium is there in North American beer? How much does this vary by brand?

• Observations: for 6 brands of beer, we recorded the sodium content of 8 12 ounce bottles.

• Questions of interest: what is the “grand mean” sodium content? How much variability is there from brand to brand?

• “Individuals” in this case are brands, repeated measures are the 8 bottles.

In [33]:
%%R
url = 'http://stats191.stanford.edu/data/sodium.table'
sodium.table$brand = factor(sodium.table$brand)
sodium.lm = lm(sodium ~ brand, sodium.table)
anova(sodium.lm)

Analysis of Variance Table

Response: sodium
Df Sum Sq Mean Sq F value    Pr(>F)
brand      5 854.53 170.906  238.71 < 2.2e-16 ***
Residuals 42  30.07   0.716
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


### One-way random effects model¶

• Assuming that cell-sizes are the same, i.e. equal observations for each “subject” (brand of beer).

• Observations $$Y_{ij} \sim \mu+ \alpha_i + \varepsilon_{ij}, 1 \leq i \leq r, 1 \leq j \leq n$$

• $\varepsilon_{ij} \sim N(0, \sigma^2_{\epsilon}), 1 \leq i \leq r, 1 \leq j \leq n$

• $\alpha_i \sim N(0, \sigma^2_{\alpha}), 1 \leq i \leq r.$

• Parameters:

• $\mu$ is the population mean;

• $\sigma^2_{\epsilon}$ is the measurement variance (i.e. how variable are the readings from the machine that reads the sodium content?);

• $\sigma^2_{\alpha}$ is the population variance (i.e. how variable is the sodium content of beer across brands).

### Modelling the variance¶

• In random effects model, the observations are no longer independent (even if $\varepsilon$'s are independent $${\rm Cov}(Y_{ij}, Y_{i'j'}) = \left(\sigma^2_{\alpha} + \sigma^2_{\epsilon} \delta_{j,j'} \right) \delta_{i,i'}.$$

• In more complicated models, this makes maximum likelihood estimation'' more complicated: least squares is no longer the best solution.

• It's no longer a plane!

• This model has a very simple model for the mean, it just has a slightly more complex model for the variance.

• Shortly we'll see other more complex models of the variance:

• Weighted Least Squares
• Correlated Errors

### Fitting the model¶

The MLE (Maximum Likelihood Estimator) is found by minimizing \begin{aligned} -2 \log \ell (\mu, \sigma^2_{\epsilon}, \sigma^2_{\alpha}|Y) &= \sum_{i=1}^r \biggl[ (Y_i - \mu)^T (\sigma^2_{\epsilon} I_{n_i \times n_i} + \sigma^2_{\alpha} 11^T)^{-1} (Y_i - \mu) \\ & \qquad + \log \left( \det(\sigma^2_{\epsilon} I_{n_i \times n_i} + \sigma^2_{\alpha} 11^T) \right) \biggr]. \end{aligned}

THe function $\ell(\mu, \sigma^2_{\epsilon}, \sigma^2_{\alpha})$ is called the likelihood function.

### Fitting the model in balanced design¶

Only one parameter in the mean function $\mu.$

• When cell sizes are the same (balanced), $$\widehat{\mu} = \overline{Y}_{\cdot \cdot} = \frac{1}{nr} \sum_{i,j} Y_{ij}.$$ Unbalanced models: use numerical optimizer.

• This also changes estimates of $\sigma^2_{\epsilon}$ -- see ANOVA table. We might guess that $df=nr-1$ and $$\widehat{\sigma}^2 = \frac{1}{nr-1} \sum_{i,j} (Y_{ij} - \overline{Y}_{\cdot\cdot})^2.$$ This is not correct.

In [34]:
%%R
library(nlme)
sodium.lme = lme(fixed=sodium~1,random=~1|brand, data=sodium.table)
summary(sodium.lme)

Linear mixed-effects model fit by REML
Data: sodium.table
AIC      BIC    logLik
154.923 160.4735 -74.46152

Random effects:
Formula: ~1 | brand
(Intercept)  Residual
StdDev:    4.612346 0.8461397

Fixed effects: sodium ~ 1
Value Std.Error DF  t-value p-value
(Intercept) 17.62917  1.886939 42 9.342733       0

Standardized Within-Group Residuals:
Min          Q1         Med          Q3         Max
-1.90551291 -0.68337933  0.08232268  0.79246858  1.64968961

Number of Observations: 48
Number of Groups: 6


For reasons I'm not sure of, the degrees of freedom don't agree with our ANOVA, though we do find the correct SE for our estimate of $\mu$:

In [35]:
%%R
MSTR = anova(sodium.lm)$Mean[1] sqrt(MSTR/48)  [1] 1.886939  The intervals formed by lme use the 42 degrees of freedom, but are otherwise the same: In [36]: %%R intervals(sodium.lme)  Approximate 95% confidence intervals Fixed effects: lower est. upper (Intercept) 13.82117 17.62917 21.43716 attr(,"label") [1] "Fixed effects:" Random Effects: Level: brand lower est. upper sd((Intercept)) 2.475221 4.612346 8.594683 Within-group standard error: lower est. upper 0.6832445 0.8461397 1.0478715  In [37]: %%R center = mean(sodium.table$sodium)
lwr = center - sqrt(MSTR / 48) * qt(0.975,42)
upr = center + sqrt(MSTR / 48) * qt(0.975,42)
data.frame(lwr, center, upr)

       lwr   center      upr
1 13.82117 17.62917 21.43716


Using our degrees of freedom as 7 yields slightly wider intervals

In [38]:
%%R