The VSNi Team

11 October 2022

Welcome to Part 1 of our four-part blog series aimed at providing you with a brief introduction to linear mixed models. In this first instalment, we’ll explain the basic form of the linear mixed model and discuss the distinction between fixed and random terms.

Linear mixed models, also known as multi-level models and linear mixed effects models, are widely used in statistics to model dependent data structures, such as hierarchical, longitudinal or spatial data. They are an extension of simple linear models that allows for both fixed and random effects as predictor variables.

The basic form of a linear mixed model consists of two components: the **fixed** and **random** models. The choice of which terms (i.e., explanatory variables) to include in the fixed model, and which to include in the random model, typically depends on the aim of the analysis. In general, fixed terms represent the effects of specific conditions applied or chosen for the experiment, i.e., the experimental treatments. Random terms represent terms where the conditions observed are a sample from some wider population, and it is the variability of that population that is of interest. The structural (or randomized) components of an experimental design, such as blocks and plots, can usually be argued to fall into this category. Thus, from the perspective of designed experiments, terms representing experimental treatments are usually assigned as fixed, and terms associated with the randomization structure of the design are usually assigned as random.

As an example, let’s consider a trial in which the yield of 130 lines of wheat was studied. The trial design consisted of six replicates (factor *Rep*), each containing 13 sub-blocks (factor *Subblock*) with 10 plots, so that each line appears once in every replicate. This gives a nested blocking structure of sub-blocks within replicates, which we’ll denote by *Rep+Rep.Subblock*. In a standard analysis of this trial, the 130 lines of wheat (factor *Genotype*) would be considered as a set of fixed effects. Thus, the two components of the linear mixed model can be written as:

**Model 1**

Fixed model: *Genotype*

Random model: *Rep+Rep.Subblock*
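To make the blocking structure concrete, here is a minimal Python sketch that lays out the plots of a trial of this size. It is an illustration only: the function name `build_layout`, the record keys and the completely random allocation of genotypes within each replicate are assumptions made for this example, not the actual design of the trial.

```python
import random
from collections import Counter

N_REPS, N_SUBBLOCKS, N_PLOTS = 6, 13, 10
N_GENOTYPES = N_SUBBLOCKS * N_PLOTS  # 130 lines, each grown once per replicate


def build_layout(seed=1):
    """Return one record per plot with its Rep, Subblock, Plot and Genotype."""
    rng = random.Random(seed)
    layout = []
    for rep in range(1, N_REPS + 1):
        # Assumption: genotypes are allocated completely at random within a replicate.
        genotypes = list(range(1, N_GENOTYPES + 1))
        rng.shuffle(genotypes)
        for sub in range(1, N_SUBBLOCKS + 1):
            for plot in range(1, N_PLOTS + 1):
                geno = genotypes[(sub - 1) * N_PLOTS + (plot - 1)]
                layout.append(
                    {"Rep": rep, "Subblock": sub, "Plot": plot, "Genotype": geno}
                )
    return layout


layout = build_layout()
n_obs = len(layout)
# Sub-blocks are nested within replicates, so a sub-block is identified by the pair (Rep, Subblock).
n_rep_subblock = len({(p["Rep"], p["Subblock"]) for p in layout})
plots_per_genotype = Counter(p["Genotype"] for p in layout)
print(n_obs, n_rep_subblock, set(plots_per_genotype.values()))
```

With these settings the layout has 780 plots and 78 distinct sub-blocks, and every genotype occupies six plots. Identifying a sub-block by the pair (Rep, Subblock) is what the nested term *Rep.Subblock* expresses.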

Occasionally, other arguments are used to assign terms as random rather than fixed. For example, as we’ll explore in next week’s blog, predicted random effects can be more precise than the predicted fixed effects. Thus, if precision is the most important criterion for a prediction, then it may be preferable to assign terms as random. This argument is often used in plant breeding trials, where genotypes may be assigned as random in order to increase precision and avoid selection bias. In this case, the two components of the model can be written as:

**Model 2**

Fixed model: (constant term only)

Random model: *Genotype+Rep+Rep.Subblock*

In the simplest form of the random model it is assumed that:

- the effects associated with each random term are a set of independent samples from a Normal distribution,
- the effects within each random term have a common variance, known as the **variance component** for that term, and
- the effects from different random terms are independent.

However, more complex random models are possible that relax the assumptions of independence and common variance, leading to a wide range of covariance models.

Assuming a simple random model, Model 1 (genotypes fixed, experimental design factors random) can be written in terms of the individual observations as:

$$y_{ijk} = \mu + \tau_{n[ijk]} + u_i + v_{ij} + e_{ijk} \qquad \text{(Model 1)}$$

where

- there are 780 observations, labelled by the replicate ($i = 1, \dots, 6$), sub-block within replicate ($j = 1, \dots, 13$) and plot within sub-block ($k = 1, \dots, 10$)
- $y_{ijk}$ is the observed response on the $k$th plot within the $j$th sub-block within the $i$th replicate
- $\mu$ is a constant (or intercept) term
- $\tau_g$ is the effect of the $g$th genotype
- $n[ijk]$ indicates the genotype randomly allocated to the $k$th plot within the $j$th sub-block within the $i$th replicate
- $u_i$ is the random effect associated with the $i$th replicate, with variance component $\sigma_u^2$
- $v_{ij}$ is the random effect associated with the $j$th sub-block in the $i$th replicate, with variance component $\sigma_v^2$
- $e_{ijk}$ is the random deviation for the $k$th plot within the $j$th sub-block within the $i$th replicate, with residual variance $\sigma^2$.
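As a rough illustration of how Model 1 generates data, the sketch below simulates one response per plot by adding a constant (μ), a genotype effect (τ), a replicate effect (u), a sub-block effect (v) and a residual (e). The symbol names, the function `simulate_model1` and every numerical value (mean, standard deviations, seed) are assumptions chosen for this example; none of them are estimates from the wheat trial.

```python
import random

N_REPS, N_SUBBLOCKS, N_PLOTS = 6, 13, 10
N_GENOTYPES = 130


def simulate_model1(mu=5.0, sd_genotype=0.5, sd_rep=0.4,
                    sd_subblock=0.3, sd_resid=0.6, seed=42):
    """Simulate y = mu + tau[genotype] + u[rep] + v[rep, subblock] + e for every plot."""
    rng = random.Random(seed)
    # Fixed genotype effects: arbitrary illustrative constants, with the
    # reference genotype (level 1) constrained to zero.
    tau = {g: (0.0 if g == 1 else rng.gauss(0.0, sd_genotype))
           for g in range(1, N_GENOTYPES + 1)}
    records = []
    for i in range(1, N_REPS + 1):
        u_i = rng.gauss(0.0, sd_rep)  # replicate effect, common variance
        # Assumption: genotypes are allocated at random within each replicate.
        genotypes = list(range(1, N_GENOTYPES + 1))
        rng.shuffle(genotypes)
        for j in range(1, N_SUBBLOCKS + 1):
            v_ij = rng.gauss(0.0, sd_subblock)  # sub-block within replicate effect
            for k in range(1, N_PLOTS + 1):
                g = genotypes[(j - 1) * N_PLOTS + (k - 1)]
                e_ijk = rng.gauss(0.0, sd_resid)  # plot-level residual
                y = mu + tau[g] + u_i + v_ij + e_ijk
                records.append((i, j, k, g, y))
    return tau, records


tau, records = simulate_model1()
print(len(records), records[0])
```

Each random term is drawn independently from a Normal distribution with its own common variance, which mirrors the simple random model described earlier. Plots in the same sub-block share both `u_i` and `v_ij`, which is where the dependence between observations comes from.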

There are two fixed terms here: the constant ($\mu$) and the set of genotype effects ($\tau_g$, $g = 1, \dots, 130$). Note that $\mu$ estimates the predicted mean for the reference genotype, $r$ (typically, this is the first level of the factor), $\tau_r$ is constrained to equal zero, and $\tau_n$ represents the effect of genotype $n$ as a deviation from the reference genotype, $r$.

There are two random terms: the set of 6 replicate effects ($u_i$) and the set of 78 replicate-by-sub-block effects ($v_{ij}$), plus the residual term (the deviations $e_{ijk}$).
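It may help to see the dependence structure that this simple random model implies. Writing $\sigma_u^2$, $\sigma_v^2$ and $\sigma^2$ for the replicate, sub-block and residual variances (notation assumed here for illustration), and using the independence assumptions listed above, the variances and covariances of the observations work out as:

$$
\begin{aligned}
\operatorname{var}(y_{ijk}) &= \sigma_u^2 + \sigma_v^2 + \sigma^2 \\
\operatorname{cov}(y_{ijk},\, y_{ijk'}) &= \sigma_u^2 + \sigma_v^2 && (k \neq k',\ \text{same sub-block}) \\
\operatorname{cov}(y_{ijk},\, y_{ij'k'}) &= \sigma_u^2 && (j \neq j',\ \text{same replicate}) \\
\operatorname{cov}(y_{ijk},\, y_{i'j'k'}) &= 0 && (i \neq i',\ \text{different replicates})
\end{aligned}
$$

So two plots in the same sub-block are more strongly correlated than two plots that only share a replicate, and plots in different replicates are uncorrelated.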

The estimated parameters of the linear mixed model are the set of fixed effects and the variance parameters. The random effects have a slightly different status, which we’ll discuss in Part 2 of this series. Variance parameters are estimated by REML (**RE**sidual **M**aximum **L**ikelihood, also called **RE**stricted **M**aximum **L**ikelihood). The fixed effects are estimated by the method of generalized least squares, conditional on the estimated values of the variance components.
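For readers who prefer matrix notation, the estimation step can be sketched as follows. The symbols are the conventional ones for mixed models and are assumptions here, since the post itself does not use matrix notation: $X$ and $Z$ are the design matrices for the fixed and random terms, $\beta$ holds the fixed effects, $u$ the random effects, and $G$ and $R$ are the covariance matrices of the random effects and residuals.

$$
\begin{aligned}
y &= X\beta + Zu + e, \qquad \operatorname{var}(u) = G, \quad \operatorname{var}(e) = R, \\
V &= \operatorname{var}(y) = ZGZ^{\top} + R, \\
\hat{\beta} &= \left(X^{\top}\hat{V}^{-1}X\right)^{-1} X^{\top}\hat{V}^{-1}y .
\end{aligned}
$$

Here $\hat{V}$ is $V$ evaluated at the REML estimates of the variance parameters, which is what "conditional on the estimated values of the variance components" means in practice. The inverse exists when $X$ has full column rank, as it does under the reference-level constraint described above.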

In the next blog of this series, we’ll dive more deeply into the random model. We’ll discuss how to interpret the random model, how to compare the fit of different random models and what BLUPs are.
