A lightning introduction to linear mixed models for designed experiments: Part 1

The VSNi Team

11 October 2022

Welcome to Part 1 of our four-part blog series aimed at providing you with a brief introduction to linear mixed models. In this first instalment, we’ll explain the basic form of the linear mixed model and discuss the distinction between fixed and random terms.

Part 1: The linear mixed model

Linear mixed models, also known as multi-level models and linear mixed effects models, are widely used in statistics to model dependent data structures, such as hierarchical, longitudinal or spatial data. They are an extension of simple linear models that allows for both fixed and random effects as predictor variables.

The basic form of a linear mixed model has two components: the fixed model and the random model. The choice of which terms (i.e., explanatory variables) to include in the fixed model, and which to include in the random model, typically depends on the aim of the analysis. In general, fixed terms represent the effect of specific conditions applied or chosen for the experiment, i.e., the experimental treatments. Random terms represent factors whose observed levels are a sample from some wider population, where it is the variability of that population that is of interest. The structural (or randomized) components of an experimental design, such as blocks and plots, can usually be argued to fall into this category. Thus, from the perspective of designed experiments, terms representing experimental treatments are usually assigned as fixed, and terms associated with the randomization structure of the design are usually assigned as random.

As an example, let’s consider a trial in which the yield of 130 lines of wheat was studied. The trial design consisted of six replicates (factor Rep), each containing 13 sub-blocks (factor Subblock) with 10 plots. This gives a nested blocking structure of sub-blocks within replicates, which we’ll denote by Rep+Rep.Subblock. In a standard analysis of this trial, the 130 lines of wheat (factor Genotype) would be considered as a set of fixed effects. Thus, the two components of the linear mixed model can be written as:

Model 1

Fixed model: Genotype

Random model: Rep+Rep.Subblock                             
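
To make the blocking structure concrete, here is a minimal Python sketch that lays out the plots and checks the counts implied by the design. The random allocation is an assumption for illustration only: it shuffles the 130 lines within each replicate and deals them into 13 sub-blocks of 10, whereas a real trial would use a purpose-built incomplete block design. The names `build_layout`, `rep`, `subblock`, `plot` and `genotype` are hypothetical.

```python
import random

N_REPS, N_SUBBLOCKS, N_PLOTS, N_GENOTYPES = 6, 13, 10, 130

def build_layout(seed=1):
    """Return one record per plot: replicate, sub-block, plot and genotype.

    Assumption: genotypes are shuffled within each replicate and dealt into
    sub-blocks in order. This is a stand-in for a proper design generator.
    """
    rng = random.Random(seed)
    layout = []
    for rep in range(1, N_REPS + 1):
        genotypes = list(range(1, N_GENOTYPES + 1))
        rng.shuffle(genotypes)  # each line appears exactly once per replicate
        for sub in range(1, N_SUBBLOCKS + 1):
            for plot in range(1, N_PLOTS + 1):
                g = genotypes[(sub - 1) * N_PLOTS + (plot - 1)]
                layout.append(
                    {"rep": rep, "subblock": sub, "plot": plot, "genotype": g}
                )
    return layout

layout = build_layout()
n_plots = len(layout)
# Levels of Rep.Subblock: distinct (replicate, sub-block) combinations.
n_subblocks = len({(p["rep"], p["subblock"]) for p in layout})
print(n_plots, n_subblocks)
```

The two counts printed are the number of plots (6 × 13 × 10 = 780) and the number of levels of Rep.Subblock (6 × 13 = 78).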

Occasionally, other arguments are used to assign terms as random rather than fixed. For example, as we’ll explore in next week’s blog, predicted random effects can be more precise than the predicted fixed effects. Thus, if precision is the most important criterion for a prediction, then it may be preferable to assign terms as random. This argument is often used in plant breeding trials, where genotypes may be assigned as random in order to increase precision and avoid selection bias. In this case, the two components of the model can be written as:

Model 2

Fixed model: (constant term only)

Random model: Genotype+Rep+Rep.Subblock

In the simplest form of the random model it is assumed that:

  • the effects associated with each random term are a set of independent samples from a Normal distribution, 
  • the effects within each random term have a common variance, known as the variance component for that term, and 
  • the effects from different random terms are independent.

However, more complex random models are possible that relax the assumptions of independence and common variance, leading to a wide range of covariance models.

Assuming a simple random model, Model 1 (genotypes fixed, experimental design factors random) can be written in terms of the individual observations as:

$$y_{ijk} = \mu + G_{g(ijk)} + R_i + B_{ij} + e_{ijk} \qquad \text{(Model 1)}$$

where

  • there are 780 observations, labelled by the replicate ($i = 1, \ldots, 6$), sub-block within replicate ($j = 1, \ldots, 13$) and plot within sub-block ($k = 1, \ldots, 10$)
  • $y_{ijk}$ is the observed response on the $k$th plot within the $j$th sub-block within the $i$th replicate
  • $\mu$ is a constant (or intercept) term
  • $G_g$ is the effect of the $g$th genotype
  • $g(ijk)$ indicates the genotype randomly allocated to the $ijk$th plot
  • $R_i$ is the random effect associated with the $i$th replicate, with variance component $\sigma_R^2$
  • $B_{ij}$ is the random effect associated with the $j$th sub-block in the $i$th replicate, with variance component $\sigma_B^2$
  • $e_{ijk}$ is the random deviation for the $k$th plot within the $j$th sub-block within the $i$th replicate, with residual variance $\sigma^2$.
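
Under the simple random model, the assumptions of independence and common variance can be written out explicitly, along with the covariances they imply between plots. This is a sketch that follows from those assumptions rather than a separate result:

$$R_i \sim N(0, \sigma_R^2), \qquad B_{ij} \sim N(0, \sigma_B^2), \qquad e_{ijk} \sim N(0, \sigma^2), \qquad \text{all mutually independent}$$

$$\operatorname{Var}(y_{ijk}) = \sigma_R^2 + \sigma_B^2 + \sigma^2$$

$$\operatorname{Cov}(y_{ijk}, y_{i'j'k'}) =
\begin{cases}
\sigma_R^2 + \sigma_B^2 & \text{same sub-block, different plots } (i = i',\ j = j',\ k \neq k') \\
\sigma_R^2 & \text{same replicate, different sub-blocks } (i = i',\ j \neq j') \\
0 & \text{different replicates } (i \neq i')
\end{cases}$$

So plots that share a sub-block are more strongly correlated than plots that only share a replicate, which is how the random model captures the dependence induced by the blocking structure.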

There are two fixed terms here: the constant ($\mu$) and the set of genotype effects ($G_g$, $g = 1, \ldots, 130$). Note that $\mu$ is the predicted mean for the reference genotype $r$ (typically the first level of the factor), $G_r$ is constrained to equal zero, and $G_n$ represents the effect of genotype $n$ as a deviation from the reference genotype $r$.
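
The reference-level constraint can be illustrated with a small sketch. The three genotype means below are made-up numbers, and `to_reference_coding` is a hypothetical helper, not part of any package; it only shows how a set of genotype means maps to a constant and a set of deviations when the first level is taken as the reference.

```python
def to_reference_coding(means, reference=None):
    """Split genotype means into a constant and deviations from a reference.

    means: dict mapping genotype label -> predicted mean.
    reference: label of the reference genotype (default: the first key).
    """
    if reference is None:
        reference = next(iter(means))
    mu = means[reference]  # constant = mean of the reference genotype
    # Each effect is a deviation from the reference, so the reference gets 0.
    effects = {g: m - mu for g, m in means.items()}
    return mu, effects

# Hypothetical means for three genotypes (illustrative values only).
means = {"G1": 4.0, "G2": 4.5, "G3": 3.5}
mu, effects = to_reference_coding(means)
print(mu, effects)
```

Adding the constant back to any effect recovers that genotype's mean, so the parameterization loses no information; it just fixes which genotype the others are compared against.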

There are two random terms: the set of 6 replicate effects ($R_i$) and the set of 78 replicate-by-sub-block effects ($B_{ij}$), plus the residual term (the deviations $e_{ijk}$).
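
One way to get a feel for how the fixed and random terms combine is to simulate data from Model 1. The sketch below is illustrative only: the constant, the genotype effects and the three standard deviations are invented values, the allocation of genotypes to plots is a simple shuffle within each replicate, and `simulate_model1` is a hypothetical name.

```python
import random

def simulate_model1(mu=5.0, sd_rep=0.5, sd_block=0.3, sd_resid=0.4, seed=42):
    """Simulate y_ijk = mu + G_g(ijk) + R_i + B_ij + e_ijk for the wheat design.

    All parameter values are assumptions chosen for illustration.
    Returns a list of (rep, subblock, plot, genotype, y) tuples.
    """
    rng = random.Random(seed)
    n_reps, n_sub, n_plots, n_geno = 6, 13, 10, 130
    # Fixed genotype effects: arbitrary values, reference genotype 1 set to 0.
    G = {g: 0.01 * (g - 1) for g in range(1, n_geno + 1)}
    data = []
    for i in range(1, n_reps + 1):
        R_i = rng.gauss(0.0, sd_rep)  # one random effect per replicate
        order = list(range(1, n_geno + 1))
        rng.shuffle(order)  # simplified random allocation within replicate
        for j in range(1, n_sub + 1):
            B_ij = rng.gauss(0.0, sd_block)  # one effect per sub-block
            for k in range(1, n_plots + 1):
                g = order[(j - 1) * n_plots + (k - 1)]
                e_ijk = rng.gauss(0.0, sd_resid)  # plot-level deviation
                data.append((i, j, k, g, mu + G[g] + R_i + B_ij + e_ijk))
    return data

data = simulate_model1()
print(len(data))
```

Every plot in the same replicate shares one draw of the replicate effect, and every plot in the same sub-block shares one draw of the sub-block effect, which is what makes observations within a block more alike than observations from different blocks.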

The estimated parameters of the linear mixed model are the set of fixed effects and the variance parameters. The random effects have a slightly different status, which we’ll discuss in Part 2 of this series. Variance parameters are estimated by REML (REsidual Maximum Likelihood, also called REstricted Maximum Likelihood). The fixed effects are estimated by the method of generalized least squares, conditional on the estimated values of the variance components. 
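
In matrix form, the estimation step for Model 1 can be sketched as follows. The notation is an assumption introduced here for illustration: $y$ is the vector of 780 observations, $X$ is the design matrix for the fixed terms with coefficient vector $\beta$, $Z_R$ and $Z_B$ are the indicator matrices for replicates and sub-blocks, and $\hat{V}$ is $V$ evaluated at the REML estimates of the variance parameters.

$$V = \sigma_R^2 Z_R Z_R^\top + \sigma_B^2 Z_B Z_B^\top + \sigma^2 I_{780}$$

$$\hat{\beta} = \left(X^\top \hat{V}^{-1} X\right)^{-1} X^\top \hat{V}^{-1} y$$

If all the variance components other than the residual were zero, $V$ would be proportional to the identity matrix and this would reduce to ordinary least squares.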

In the next blog of this series, we’ll dive more deeply into the random model. We’ll discuss how to interpret the random model, how to compare the fit of different random models and what BLUPs are.