LMM hypotheses tests and confidence intervals

Often statistical models are used in order to determine which of the predictor variables have a significant relationship with the response variable. LMM has a number of methods to aid with this kind of statistical inference.

Below we will fit a linear mixed model using the Ruby gem mixed_models, and demostrate various types of hypotheses tests and confidence intervals available for objects of class LMM.


  • Data and linear mixed model

  • Likelihood ratio test

    • Chi squared p-value
    • Bootstrap p-value
  • Fixed effects hypotheses tests

    • Likelihood ratio test
    • Bootstrap test
    • Variances and covariances
    • Wald Z test
  • Fixed effects confidence intervals

    • Bootstrap confidence intervals
    • Wald Z confidence intervals
    • All types of intervals at once
  • Random effects hypotheses tests

    • Likelihood ratio test
    • Bootstrap test

Data and linear mixed model

We use the same data and model formulation as in a previous example, where we have looked at various parameter estimates, and have shown that the model fit is good.

The data set, which is simulated, contains two numeric variables Age and Aggression, and two categorical variables Location and Species. These data are available for 100 (human and alien) individuals.

We model the Aggression level of an individual as a linear function of the Age (Aggression decreases with Age), with a different constant added for each Species (i.e. each species has a different base level of aggression). Moreover, we assume that there is a random fluctuation in Aggression due to the Location that an individual is at. Additionally, there is a random fluctuation in how Age affects Aggression at each different Location.

Thus, the Aggression level of an individual of Species $spcs$ who is at the Location $lctn$ can be expressed as: $$Aggression = \beta_{0} + \gamma_{spcs} + Age \cdot \beta_{1} + b_{lctn,0} + Age \cdot b_{lctn,1} + \epsilon,$$ where $\epsilon$ is a random residual, and the random vector $(b_{lctn,0}, b_{lctn,1})^T$ follows a multivariate normal distribution (the same distribution but different realizations of the random vector for each Location). That is, we have a linear mixed model with fixed effects $\beta_{0}, \beta_{1}, \gamma_{Dalek}, \gamma_{Ood}, \dots$, and random effects $b_{Asylum,0}, b_{Asylum,1}, b_{Earth,0},\dots$.

We fit the model with the convenient method LMM#from_formula, which mimics the behaviour of the function lmer from the R package lme4.

In [1]:
require 'mixed_models'

alien_species = Daru::DataFrame.from_csv '../examples/data/alien_species.csv'
# mixed_models expects that all variable names in the data frame are ruby Symbols:
alien_species.vectors = Daru::Index.new(alien_species.vectors.map { |v| v.to_sym })

model_fit = LMM.from_formula(formula: "Aggression ~ Age + Species + (Age | Location)", 
                             data: alien_species)

puts "Fixed effects terms estimates and some diagnostics:"
puts model_fit.fix_ef_summary.inspect(20)
puts "Random effects correlation structure:"
puts model_fit.ran_ef_summary.inspect(12)
Fixed effects terms estimates and some diagnostics:

#<Daru::DataFrame:47402186756480 @name = 1973d309-2726-4c76-9a4a-571bb5b14396 @size = 5>
                                     coef                   sd              z_score        WaldZ_p_value 
           intercept   1016.2867207023459    60.19727495769054   16.882603430415077                  0.0 
                 Age -0.06531615342788907   0.0898848636725299  -0.7266646547504374  0.46743141066211646 
   Species_lvl_Human  -499.69369529020855   0.2682523406941929   -1862.774781375937                  0.0 
     Species_lvl_Ood   -899.5693213535765  0.28144708140043684  -3196.2289922406003                  0.0 
Species_lvl_WeepingA  -199.58895804200702  0.27578357795259995   -723.7158917283725                  0.0 

Random effects correlation structure:

#<Daru::DataFrame:47402186580300 @name = e1572836-6bdf-435f-8e5c-4230b839e0e2 @size = 2>
                 Location Location_Age 
    Location 104.26376362 -0.059863903 
Location_Age -0.059863903 0.1556707755 

Likelihood ratio test

Given two nested models, the LMM.likelihood_ratio_test tests whether the restricted simpler model is adequate. In this context 'nested' means that all predictors used in the restricted model must also be predictors in the full model (i.e. one model is a reduced version of the other more complex model). This method works only if both models were fit using the deviance (as opposed to REML criterion) as the objective function for the minimization (i.e. fit with reml: false). LMM.likelihood_ratio_test returns the p-value of the test.

Two methods are available to compute the p-value: approximation by a Chi squared distribution as delineated in section 2.4.1 in Pinheiro & Bates "Mixed Effects Models in S and S-PLUS" (2000), and a method based on bootstrapping as delineated in section 4.2.3 in Davison & Hinkley "Bootstrap Methods and their Application" (1997).

For example we can test the model formulation as above against a simpler model, which assumes that Age is neither a fixed effect nor a random effect. We can compute a likelihood ratio using the method LMM.likelihood_ratio.

In [2]:
complex_model = LMM.from_formula(formula: "Aggression ~ Age + Species + (Age | Location)", 
                                 data: alien_species, reml: false)
simple_model = LMM.from_formula(formula: "Aggression ~ Species + (1 | Location)", 
                                data: alien_species, reml: false)

LMM.likelihood_ratio(simple_model, complex_model)

Chi squared p-value

We perform the likelihood ratio test using the method LMM.likelihood_ratio_test with method: :chi2 to use the Chi squared approximation for the p-values.

In [3]:
chi2_p_value = LMM.likelihood_ratio_test(simple_model, complex_model, method: :chi2)

The p-value is tiny, which implies that Age is a significant predictor of Aggression, and that the more complex model should be prefered.

Bootstrap p-value

However, often one may not be sure whether the assumptions required for the validity of the Chi squared test are satisfied. In that case, one can compute a p-value with the bootstrap method by specifying method: :bootstrap, which by default uses all available CPUs in parallel.

In [4]:
bootstrap_p_value = LMM.likelihood_ratio_test(simple_model, complex_model, method: :bootstrap, nsim: 1000)

Even though the p-value is not as extreme as with the Chi squared test, it still shows significance of the variable Age.

Fixed effects hypotheses tests

Significance tests for the fixed effects can be performed with LMM#fix_ef_p (or its alias LMM#fix_ef_test). For a given fixed effects coefficient estimate the tested null hypothesis is that the true value of the coefficient is zero (i.e. no linear relationship to the response).

That is, for the above model formulation we carry out hypotheses tests for each fixed effects terms $\beta_{i}$ or $\gamma_{species}$, testing the null $H_{0} : \beta_{i} = 0$ against the alternative $H_{a} : \beta_{i} \neq 0$, or respectively the null $H_{0} : \gamma_{species} = 0$ against the alternative $H_{a} : \gamma_{species} \neq 0$.

LMM currently offers three methods of hypotheses testing for fixed effects: Wald Z test, likelihood ratio test, and a bootstrap test. For a good discussion of the validity of different methods see this entry from the wiki of the r-sig-mixed-models mailing list. Moreover, due to the equivalence of hypotheses tests and confidence intervals, an additional hypothesis testing tool are bootstrap confidence intervals which are described in a different section below.

Likelihood ratio test for the fixed effects

The likelihood ratio test for fixed effects is actually merely a convenience method. It is a convenient interface to LMM.likelihood_ratio_test with method: :chi2 described above. For example, we can test whether the fixed effects term Age is a significant predictor of Aggression in complex_model as follows.

In [5]:
lrt_p_value = complex_model.fix_ef_p(variable: :Age, method: :lrt)

We see that Age does not seem be significant as a fixed effects term, even though it is in general a significant predictor as we have seen before.

Bootstrap test for the fixed effects

Like the likelihood ratio test, the bootstrap test is merely a convenient shortcut to LMM.likelihood_ratio_test with method: :bootstrap. We can test the significance of the predictor variable Age in complex_model with:

In [6]:
bootstrap_p_value = complex_model.fix_ef_p(variable: :Age, method: :bootstrap, nsim: 1000)

This result confirms the conclusion based of method: :lrt.

Variances and covariances of the fixed effects coefficient estimates

The covariance matrix of the fixed effects estimates is returned by LMM#fix_ef_cov_mat, and the standard deviations of the fixed effects coefficients are returned by LMM#fix_ef_sd.

In [7]:
{:intercept=>60.19727495769054, :Age=>0.0898848636725299, :Species_lvl_Human=>0.2682523406941929, :Species_lvl_Ood=>0.28144708140043684, :Species_lvl_WeepingAngel=>0.27578357795259995}

Wald Z tests for the fixed effects

Based on the values returned by LMM#fix_ef_sd, the Wald Z test statistics for all fixed effects coefficients are computed by the method LMM#fix_ef_z.

In [8]:
{:intercept=>16.882603430415077, :Age=>-0.7266646547504374, :Species_lvl_Human=>-1862.774781375937, :Species_lvl_Ood=>-3196.2289922406003, :Species_lvl_WeepingAngel=>-723.7158917283725}

Based on the above Wald Z test statistics, hypotheses tests for each fixed effects term can be carried out. The corresponding (approximate) p-values are obtained with LMM#fix_ef_p as follows.

In [9]:
model_fit.fix_ef_p(method: :wald)
{:intercept=>0.0, :Age=>0.46743141066211646, :Species_lvl_Human=>0.0, :Species_lvl_Ood=>0.0, :Species_lvl_WeepingAngel=>0.0}

We see that Aggression of each Species is significantly different from the base level (which is the species Dalek in this model), while statistically we don't have enough evidence to conclude that Age of an individual is a good predictor of his/her/its aggression level (which agrees with the conclusion obtained above with method: :lrt and method: :bootstrap).

Confidence intervals for the fixed effects

Confidence intervals for the fixed effects terms can be computed with the method LMM#fix_ef_conf_int. The following types of confidence intervals are available:

  • Wald Z intervals
  • Basic bootstrap intervals
  • Bootstrap normal intervals
  • Bootstrap percentile intervals
  • Studentized bootstrap intervals

Bootstrap confidence intervals for the fixed effects

A detailed description of the bootstrap methods available in LMM#fix_ef_conf_int is given in this blog post.

For example we can compute the studentized bootstrap confidence intervals (also called bootstrap-t) with 98% coverage as follows.

In [10]:
bootstrap_t_intervals = model_fit.fix_ef_conf_int(level: 0.98, method: :bootstrap, 
                                                  boottype: :studentized, nsim: 1000)
{:intercept=>[874.4165005038727, 1139.1584607682103], :Age=>[-0.2706718136184743, 0.1493216868271593], :Species_lvl_Human=>[-500.28872315642866, -499.09663075641373], :Species_lvl_Ood=>[-900.2572251446181, -898.9341708622952], :Species_lvl_WeepingAngel=>[-200.24313298457852, -198.95081041642916]}

We see that Age is the only fixed effects predictor whose confidence interval contains zero, which implies that it probably has little linear relationship as a fixed effect to the response variable Aggression.

Wald Z confidence intervals for the fixed effects

We can use the Wald Z statistic to compute confidence intervals as well. For example 90% confidence intervals for each fixed effects coefficient estimate can be computed as follows.

In [11]:
conf_int = model_fit.fix_ef_conf_int(level: 0.9, method: :wald)
{:intercept=>[917.2710134723027, 1115.302427932389], :Age=>[-0.21316359921454495, 0.0825312923587668], :Species_lvl_Human=>[-500.1349311310106, -499.2524594494065], :Species_lvl_Ood=>[-900.0322606117453, -899.1063820954076], :Species_lvl_WeepingAngel=>[-200.04258166587707, -199.13533441813698]}

For greated visual clarity we can put the coefficient estimates and the confidence intervals into a Daru::DataFrame:

In [12]:
df = Daru::DataFrame.rows(conf_int.values, order: [:lower90, :upper90], index: model_fit.fix_ef_names)
df[:coef] = model_fit.fix_ef.values
Daru::DataFrame:47402176301980 rows: 5 cols: 3

All types of confidence intervals at once

With method: :all, LMM#fix_ef_conf_int returns a Daru::DataFrame containing the confidence intervals obtained by each of the available methods. The data frame can be printed in form of a nice looking table for inspection.

For example for the alien species data we obtain all types of 95% confidence intervals with:

In [13]:
ci = model_fit.fix_ef_conf_int(method: :all, nsim: 1000)
Daru::DataFrame:47402185048160 rows: 5 cols: 5
wald_z[898.3022305977157, 1134.2712108069761][-0.2414872478168185, 0.11085494096104034][-500.2194602132623, -499.1679303671548][-900.1209474930289, -899.017695214124][-200.12948391874875, -199.0484321652653]
boot_basic[901.8226468130745, 1142.3725792755527][-0.23613380807522952, 0.12082106157921091][-500.23480713878746, -499.1808531803127][-900.1538925933955, -899.0066844647126][-200.18820176543932, -199.02291203098875]
boot_norm[900.8638839567735, 1134.5288460466913][-0.2473842439597385, 0.11510977545973858][-500.2315699405288, -499.1525612096204][-900.1465247092458, -898.9995195976951][-200.1708564794685, -199.01484568123584]
boot_t[901.8226468130744, 1142.3725792755527][-0.23613380807522952, 0.12082106157921088][-500.2348071387875, -499.1808531803127][-900.1538925933955, -899.0066844647126][-200.18820176543932, -199.02291203098875]
boot_perc[890.2008621291391, 1130.7507945916173][-0.25145336843498906, 0.10550150121945137][-500.2065374001044, -499.15258344162964][-900.1319582424403, -898.9847501137574][-200.1550040530253, -198.98971431857473]

Since here we are dealing with data that was simulated according to the assumptions of the linear mixed model, all parameters end up approximately meeting the normality assumptions, and therefore all confidence interval methods turn out to be pretty much equivalent. Often when analyzing less ideal data, this will not be the case. Then it might be necessary to compare different types of confidence intervals in order to draw the right conclusions.

Random effects hypotheses tests

We can test individual random effects for siginificance with the method LMM#ran_ef_p (or its alias LMM#ran_ef_test). It offers two methods, :lrt and :bootstrap. Both are in fact merely convenient interfaces to LMM.likelihood_ratio_test described above.

Likelihood ratio test for individual random effects

The likelihood ratio test for random effects can only be performed if the model was fit with option reml: false.

For example we can test the random intercept term (due to Location) of complex_model as follows.

In [14]:
complex_model.ran_ef_p(variable: :intercept, grouping: :Location, method: :lrt)

This suggests that the variability in Aggression is significantly influenced by the effect of Location.

Bootstrap test for individual random effects

This test can also be performed only if the model was fit withreml: false.

We can test whether there is a significant random variation of the effect of Age due to Location using LMM#ran_ef_p with method: :bootstrap as follows.

In [15]:
complex_model.ran_ef_p(variable: :Age, grouping: :Location, method: :bootstrap, nsim: 1000)

We see that in fact the change of Age due to Location is a significant random effect.