The VSNi Team, 11 October 2022
Welcome to Part 1 of our four-part blog series aimed at providing you with a brief introduction to linear mixed models. In this first instalment, we’ll explain the basic form of the linear mixed model and discuss the distinction between fixed and random terms.
Linear mixed models, also known as multi-level models and linear mixed effects models, are widely used in statistics to model dependent data structures, such as hierarchical, longitudinal or spatial data. They are an extension of simple linear models that allows for both fixed and random effects as predictor variables.
The basic form of a linear mixed model is comprised of two components: the fixed and random models. The choice of which terms (i.e., explanatory variables) to include in the fixed model, and which to include in the random model, typically depends on the aim of the analysis. However, in general, fixed terms often represent the effect of specific conditions applied or chosen for the experiment, i.e., the experimental treatments. Random terms often represent terms where the conditions observed comprise a sample from some wider population, and it is the variability of the population that is of interest. The structural (or randomized) components of an experimental design, such as blocks and plots, can usually be argued to fall into this category. Thus, from the perspective of designed experiments, terms representing experimental treatments are usually assigned as fixed, and terms associated with the randomization structure of the design are usually assigned as random.
As an example, let’s consider a trial in which the yields of 130 lines of wheat were studied. The trial design consisted of six replicates (factor Rep), each containing 13 sub-blocks (factor Subblock) of 10 plots. This gives a nested blocking structure of sub-blocks within replicates, which we’ll denote by Rep+Rep.Subblock. In a standard analysis of this trial, the 130 lines of wheat (factor Genotype) would be considered as a set of fixed effects. Thus, the two components of the linear mixed model can be written as:
Fixed model: Genotype
Random model: Rep+Rep.Subblock
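The blog itself doesn't include code, but the trial layout above is easy to simulate. The following sketch (in Python, our choice of language, not the authors') builds the 780-plot data frame for this design; the overall mean, the genotype effects and the variance magnitudes are invented purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

n_geno, n_rep, n_subblock, n_plot = 130, 6, 13, 10  # design from the text

# One row per plot: 6 reps x 13 sub-blocks x 10 plots = 780 observations
df = pd.DataFrame(
    [(rep, sb, plot)
     for rep in range(n_rep)
     for sb in range(n_subblock)
     for plot in range(n_plot)],
    columns=["Rep", "Subblock", "Plot"],
)

# Each genotype appears once per replicate (13 x 10 = 130 plots per rep)
df["Genotype"] = np.concatenate(
    [rng.permutation(n_geno) for _ in range(n_rep)]
)

# Simulate yields: fixed genotype effects plus random Rep and
# Rep.Subblock effects (all magnitudes are invented for illustration)
geno_eff = rng.normal(0.0, 1.0, n_geno)
rep_eff = rng.normal(0.0, 0.5, n_rep)
subblock_eff = rng.normal(0.0, 0.3, (n_rep, n_subblock))

df["Yield"] = (
    10.0                                    # invented overall mean
    + geno_eff[df["Genotype"]]
    + rep_eff[df["Rep"]]
    + subblock_eff[df["Rep"], df["Subblock"]]
    + rng.normal(0.0, 0.4, len(df))         # plot-level residual
)

print(df.shape)  # (780, 5)
```

Each random term is drawn here as independent normal deviates with zero mean and its own variance, which matches the "simplest form of the random model" discussed below.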
Occasionally, other arguments are used to assign terms as random rather than fixed. For example, as we’ll explore in next week’s blog, predicted random effects can be more precise than the predicted fixed effects. Thus, if precision is the most important criterion for a prediction, then it may be preferable to assign terms as random. This argument is often used in plant breeding trials, where genotypes may be assigned as random in order to increase precision and avoid selection bias. In this case, the two components of the model can be written as:
Fixed model: constant (μ) only
Random model: Genotype + Rep+Rep.Subblock
In the simplest form of the random model it is assumed that the effects of each random term are independent and identically distributed, following a normal distribution with zero mean and a common variance. That is, for a random term u with effects u_1, …, u_b:

u_i ~ N(0, σ²_u), independently for i = 1, …, b
However, more complex random models are possible that relax the assumptions of independence and common variance, leading to a wide range of covariance models.
Assuming a simple random model, Model 1 (genotypes fixed, experimental design factors random) can be written in terms of the individual observations as:

y_ijn = μ + g_n + r_i + s_ij + e_ijn

where y_ijn is the yield of the plot in sub-block j of replicate i that contains genotype n.

There are two fixed terms here: the constant (μ) and the set of genotype effects (g_n, n = 1, …, 130). Note, μ represents the predicted mean for the reference genotype r (typically, this is the first level of the factor), whose effect g_r is constrained to equal zero, and g_n represents the effect of genotype n as a deviation from the reference genotype r.

There are two random terms: the set of 6 replicate effects (r_i, i = 1, …, 6) and the set of 78 replicate by sub-block effects (s_ij, j = 1, …, 13), plus the residual term e_ijn (the plot-level deviations).
The estimated parameters of the linear mixed model are the set of fixed effects and the variance parameters. The random effects have a slightly different status, which we’ll discuss in Part 2 of this series. Variance parameters are estimated by REML (REsidual Maximum Likelihood, also called REstricted Maximum Likelihood). The fixed effects are estimated by the method of generalized least squares, conditional on the estimated values of the variance components.
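To make the estimation step concrete, here is a minimal, self-contained sketch using statsmodels' MixedLM (our choice of library; the blog does not prescribe any software). The dataset is a small invented one, with 12 genotypes in 4 complete replicates rather than the full trial, and only a random replicate intercept, to keep the fit fast; MixedLM uses REML by default, and `fit(reml=False)` would give ordinary maximum likelihood instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Small invented dataset: 12 genotypes in 4 complete replicates
n_geno, n_rep = 12, 4
df = pd.DataFrame(
    [(g, r) for r in range(n_rep) for g in range(n_geno)],
    columns=["Genotype", "Rep"],
)
geno_eff = rng.normal(0.0, 1.0, n_geno)   # invented effect sizes
rep_eff = rng.normal(0.0, 0.5, n_rep)
df["Yield"] = (10.0 + geno_eff[df["Genotype"]] + rep_eff[df["Rep"]]
               + rng.normal(0.0, 0.4, len(df)))

# Genotype fixed; Rep as a random intercept. Variance parameters are
# estimated by REML (the default), and the fixed effects by generalized
# least squares given those estimates.
model = smf.mixedlm("Yield ~ C(Genotype)", df, groups=df["Rep"])
fit = model.fit()

print(len(fit.fe_params))            # 12: intercept + 11 genotype deviations
print(float(fit.cov_re.iloc[0, 0]))  # estimated Rep variance component
```

Note how the fitted fixed effects follow the parameterization described above: an intercept for the reference genotype plus deviations for the remaining genotypes.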
In the next blog of this series, we’ll dive more deeply into the random model. We’ll discuss how to interpret the random model, how to compare the fit of different random models and what BLUPs are.