LMM with multiple crossed of nested random effects

Often the experimental design or the data suggests a linear mixed model whose random effects are associated with multiple grouping factors.

Crossed random effects

An inclusion of multiple random effects terms which correspond to multiple grouping factors is often refered to as crossed random effect. A good coverage of linear mixed models with crossed random effects can be found in Chapter 2 of Douglas Bates' lme4 book.

A simple example of nested random effects from Bates' book is the following model. The $i$th observation of diameter in the $j$th sample from the $k$th plate is modeled as:

$$diameter_{ijk} = Intercept + SampleIntercept_{j} + PlateIntercept_{k} + RandomError_{ij},$$

where Intercept is the overall average, and SampleIntercept as well as PlateIntercept are random intercept terms, due to the sample and plate that a particular observation comes from.

In mixed_models we would fit such a model with:

LMM.from_formula(formula: "diameter ~ 1 + (1 | Sample) + (1 | Plate)", 
                 data: penicillin)

Example with simulated data

As an example we fit a linear mixed model, which can be written as

$$y = \beta_{0} + \beta_{1} \cdot x + b_{0} + b_{1} \cdot x + c_{0} + c_{1} \cdot x + \epsilon,$$

where $y$ is the response and $x$ is a predictor variable; $\beta_{0}$ and $\beta_{1}$ are the fixed intercept and slope coefficients; $b_{0}$ and $b_{1}$ are random intercept and slope coefficients due to factor $g$; $c_{0}$ and $c_{1}$ are random intercept and slope coefficients due to factor $h$.

The simulated data set is loaded, and its first five rows are displayed with:

In [1]:
require 'mixed_models'
# we pass `headers: true` to `#from_csv`, because
# mixed_models expects that all variable names in the data frame are ruby Symbols
df = Daru::DataFrame.from_csv "../spec/data/crossed_effects_data.csv", headers: true
df.head 5
Out[1]:
Daru::DataFrame:47127301961640 rows: 5 cols: 4
ghxy
0111.717420402467890.202546206520008
1230.2237449022394360.840573625427331
231-1.11598926418025-0.998332155138107
312-0.15562952641427-0.0145985318440115
412-0.1089194150635930.722443338784882

Then we fit a linear mixed model in mixed_models, and display the estimated correlation structure of the random effects:

In [2]:
mod = LMM.from_formula(formula: "y ~ x + (x|g) + (x|h)", data: df, reml: false)
mod.ran_ef_summary
Out[2]:
Daru::DataFrame:47127296310540 rows: 4 cols: 4
gg_xhh_x
g0.75397857189834870.999999998031062
g_x0.9999999980310620.7490861098771483
h0.56386204476714330.9999999971404151
h_x0.99999999714041510.38533198068098046

We can see that the crossed random effects corresponding to the grouping factors $g$ and $h$ form uncorrelated blocks in the correlation matrix. That is, crossed random effects are assumed to be independent by the model.

Of course, we can use all of the model attributes, diagnostics and inference methods described in other mixed_models tutorials for this model as well.

For example, we can test for the significance of the fixed slope effect, using the bootstrap approach with the following line of code:

In [3]:
p_value = mod.fix_ef_p(variable: :x, method: :bootstrap, nsim: 1000)
Out[3]:
0.0989010989010989

Or we can use the likelihood ratio test instead:

In [4]:
alternative_p_value = mod.fix_ef_p(variable: :x, method: :lrt)
Out[4]:
0.04871184664935746

Since the p-value obtained by LRT is barely below a significance level of 5%, and since the bootstrap method is typically more accurate than LRT and produces a rather high p-value here, we conclude that the data does not show enough evidence for statistical significance of predictor $x$ as a fixed effect.

Nested random effects

The grouping factors of random effects terms can be nested in each other. We refer to such random effects structures as nested random effects (even though strictly speaking not the random effects but the corresponding grouping factors are nested). As for crossed random effects, a good reference for linear mixed models with nested random effects is Chapter 2 of Douglas Bates' lme4 book.

For example, consider an experiment where we measure the bone volume of each digit in each foot of a number of mice (i.e. digit is nested within foot, which is nested within mouse). The $i$th observation of volume in the $m$th digit of the $k$th foot of the $j$th mouse can be modeled as:

$$volume_{ijkm} = Intercept + MouseIntercept_{j} + FootIntercept_{kj} + RandomError_{ijkm},$$

i.e. the random effect foot only appears as nested within mouse (i.e. the intercept due to foot 1 in mouse 1 is different than the intercept due to foot 1 in mouse 2).

In mixed_models we could fit such a model with:

LMM.from_formula(formula: "volume ~ 1 + (1 | mouse) + (1 | mouse:foot)",
                 data: bone_data)

Remark: In the R package lme4, instead of the formula "volume ~ 1 + (1|mouse) + (1|mouse:foot)" a shorter equivalent formula "volume ~ 1 + (1|mouse/foot)" can be used to fit the model. However, the formula parser in mixed_models currently does not support the shortcut notation /.

Example with simulated data

As an example we fit a linear mixed model with nested random effects to the following data.

In [5]:
df = Daru::DataFrame.from_csv("../spec/data/nested_effects_with_slope_data.csv", headers: true)
df.head 5
Out[5]:
Daru::DataFrame:47127301459180 rows: 5 cols: 4
abxy
0a3b10.3884253101947315.10364866735101
1a3b20.4462230005516126.23307061450375
2a3b11.5499365711830212.2050404173393
3a3b11.5278661459971512.0067595454774
4a3b20.7601121215127088.20054527384668

We consider the following model:

  • We take y to be the response and x its predictor.
  • We consider the factor b to be nested within the factor a.
  • We assume that the intercept varies due to variable a; that is, a different (random) intercept term for each level of a.
  • Moreover, we assume that the intercept varies due to the factor b which is nested in a; that is, different (random) intercept for each combination of levels of a and b.

We fit this model in mixed_models, and display the estimated random effects correlation structure.

In [6]:
mod = LMM.from_formula(formula: "y ~ x + (1|a) + (1|a:b)", data: df, reml: false)
mod.ran_ef_summary
Out[6]:
Daru::DataFrame:47127301931880 rows: 2 cols: 2
aa_and_b
a1.3410830040769561
a_and_b0.9769750031499026

We see that the standard deviations of the effect of a and of the nested effect of b and a are of comparable magnitude.

We can use all methods available in LMM to look at various parameter estimates or to do statistical inference.

For example, we can test the nested random effect for significance, in order to decide whether we should drop that term from the model to reduce model complexity. We can use the Chi squared based likelihood ratio test as follows.

In [7]:
p_val = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b], method: :lrt)
Out[7]:
0.0050606262424956515

Where the nested grouping factor is supplied as an Array [:a, :b].

The p-value is small, suggesting that we probably should keep the term (1|a:b) in the model formula. To be more sure we can perform a bootstrap based hypothesis test as follows.

In [8]:
p_val_boot = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b], 
                           method: :bootstrap, nsim: 1000)
Out[8]:
0.000999000999000999

The bootstrap p-value also support the above conclusion.