Understanding BLUEs, BLUPs, and breeding values in linear mixed models

Dr. Salvador A. Gezan

02 November 2021

The fitting of a linear mixed model (LMM) can be divided into two steps. The first is the estimation of the variance components. In ASReml and Genstat, this is done by restricted (or residual) maximum likelihood (REML), using the average information (AI) algorithm together with sparse matrix operations. Once these variance components are estimated, we move on to the second step, where they are used to obtain estimates of the fixed effects and predictions of the random effects. This is where we need to differentiate between BLUEs and BLUPs. But what are these?

The distinction between BLUEs and BLUPs

Best Linear Unbiased Estimates (BLUEs) are the solutions (or estimates) associated with the fixed effects, and Best Linear Unbiased Predictions (BLUPs) are the solutions (but identified as predictions) associated with the random effects of a model. But what does Best Linear Unbiased mean?

Best: among all possible linear unbiased estimators, these solutions have minimum variance.
Linear: the solutions are formed from a linear combination of the observations.
Unbiased: the expectations of these solutions are equal to their true values (for BLUPs, to the expectation of the random effects).

Henderson's MME and the animal model

The use of BLUPs to predict random effects was first described by C. R. Henderson. He developed the mixed model equations (MME, see equation [2]) for LMMs in order to calculate BLUPs of breeding values (or any random effect) and BLUEs of fixed effects. 

Let’s have a look at how solutions are obtained from a linear mixed model. For this, we will consider the model written in matrix notation:
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \qquad [1]$$

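where $\mathbf{y}$, $\mathbf{b}$, $\mathbf{u}$, and $\mathbf{e}$ are vectors of observations, fixed effects, random effects, and random residuals, respectively; and $\mathbf{X}$ and $\mathbf{Z}$ are design matrices connecting the observations to the fixed and random effects, respectively.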

The above model is the basis of most genetic analyses through what is known as the Animal Model, which, once fitted, provides predictions of the breeding values for each of the ‘animals’ (individuals) considered in the model.

Considering the Animal Model in the matrix notation above, $\mathbf{b}$ corresponds to a series of nuisance fixed effects, for example contemporary group, age or replicate. However, what is more interesting to us is $\mathbf{u}$, which corresponds to the breeding values. As this factor is random, we make distributional assumptions about it, namely that $\mathbf{u} \sim MVN(\mathbf{0}, \sigma^2_a\mathbf{A})$, where $\mathbf{A}$ is the numerator relationship matrix that describes the additive genetic relationships between individuals, and $\sigma^2_a$ is the additive variance.

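As the error or residual term, $\mathbf{e}$, is also a random effect, we have the distributional assumption $\mathbf{e} \sim MVN(\mathbf{0}, \sigma^2_e\mathbf{I})$, with $\mathbf{I}$ an identity matrix and $\sigma^2_e$ the residual variance (or mean square error). The breeding values and the residuals are assumed to be independent of each other.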
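
The numerator relationship matrix $\mathbf{A}$ in these assumptions can be built from a pedigree with Henderson's tabular method. The sketch below is a minimal pure-Python version, assuming animals are numbered so that parents always precede their offspring and that an unknown parent is coded as None; the function name numerator_relationship and the five-animal pedigree are hypothetical, for illustration only.

```python
# Minimal sketch of the tabular method for the numerator relationship matrix A.
# Assumption: animals are coded 0..n-1 with parents listed before offspring,
# and an unknown parent is coded as None (hypothetical coding, for illustration).

def numerator_relationship(sires, dams):
    """Return A (list of lists) for a pedigree given as parallel sire/dam lists."""
    n = len(sires)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sires[i], dams[i]
        # Relationship of i with each earlier animal j: half the sum of
        # j's relationships with i's parents (unknown parents contribute 0).
        for j in range(i):
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, where the inbreeding coefficient F_i is half the
        # relationship between the parents (0 if either parent is unknown).
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + inbreeding
    return A


# Hypothetical pedigree: 0 and 1 are founders; 2 = 0 x 1; 3 = 0 x unknown; 4 = 2 x 3
A = numerator_relationship(sires=[None, None, 0, 0, 2], dams=[None, None, 1, None, 3])
for row in A:
    print(row)
```

In this made-up pedigree, animals 2 and 3 are half sibs (both sired by animal 0), so their relationship is 0.25 and their offspring, animal 4, is inbred with F = 0.125, giving a diagonal element of 1.125.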

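As you can see, the central problem in predicting breeding values from observed phenotypic data is separating the genetic and environmental effects. The Animal Model achieves this separation by splitting each observation into nuisance fixed effects, an additive genetic effect (in $\mathbf{u}$) and a residual (in $\mathbf{e}$).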

Solving the system of equations: Henderson's MME

So, in order to get our solutions (i.e., BLUEs and BLUPs), we need to solve the following system of equations, known as Henderson’s MME:

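$$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \lambda\mathbf{A}^{-1} \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix} \qquad [2]$$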

All the elements have been defined previously, except for $\lambda = \sigma^2_e / \sigma^2_a$, which is formed from the variance components estimated in the first step of fitting an LMM. This system has one equation per fixed-effect level and one per animal, so it can involve hundreds or even thousands of equations, requiring intensive computation to obtain our solutions.
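
To make [2] concrete, here is a minimal pure-Python sketch that assembles the coefficient matrix and right-hand side for known variance components and solves the system. It assumes $\mathbf{X}$ has full column rank and uses a small Gauss-Jordan inverse rather than the sparse matrix methods used by ASReml or Genstat; the helper names (transpose, matmul, inverse, solve_mme) and the two-sib data are hypothetical.

```python
# Sketch: solve Henderson's MME [2] for known variance components.
# Pure Python and explicit inversion: suitable for toy problems only.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    # Augment M with the identity matrix: [M | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        # Bring the row with the largest pivot in column c into position c
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        # Eliminate column c from every other row
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    # The right half now holds M^-1
    return [row[n:] for row in aug]

def solve_mme(X, Z, y, A, var_a, var_e):
    """Return (b_hat, u_hat) from the MME [2], with lambda = var_e / var_a."""
    lam = var_e / var_a
    Xt, Zt = transpose(X), transpose(Z)
    Ainv = inverse(A)
    ZtZ = matmul(Zt, Z)
    q = len(Ainv)
    # Lower-right block: Z'Z + lambda * A^-1
    ZtZ_lam = [[ZtZ[i][j] + lam * Ainv[i][j] for j in range(q)] for i in range(q)]
    # Coefficient matrix [[X'X, X'Z], [Z'X, Z'Z + lambda A^-1]], built row by row
    top = [r1 + r2 for r1, r2 in zip(matmul(Xt, X), matmul(Xt, Z))]
    bottom = [r1 + r2 for r1, r2 in zip(matmul(Zt, X), ZtZ_lam)]
    ycol = [[v] for v in y]
    rhs = matmul(Xt, ycol) + matmul(Zt, ycol)  # stacked [X'y; Z'y]
    sol = [row[0] for row in matmul(inverse(top + bottom), rhs)]
    p = len(X[0])
    return sol[:p], sol[p:]


# Hypothetical data: two full sibs (relationship 0.5), one record each,
# with an overall mean as the only fixed effect
X = [[1.0], [1.0]]
Z = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.5], [0.5, 1.0]]
b_hat, u_hat = solve_mme(X, Z, [12.0, 6.0], A, var_a=1.0, var_e=1.0)
print(b_hat, u_hat)
```

With $h^2 = 0.5$ and records 12 and 6, the solutions come out at approximately $\hat{b} = 9$ and $\hat{\mathbf{u}} = (1, -1)$. Had the sibs been unrelated ($\mathbf{A} = \mathbf{I}$), the predictions would be $h^2(y_i - \bar{y}) = (1.5, -1.5)$; the relationship shrinks them because each sib's record also carries information about the other's breeding value.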

A different way to see the above equations is to express the formulae for BLUEs as:

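$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} \qquad [3]$$

and for the BLUPs (breeding values) as:

$$\hat{\mathbf{u}} = \mathbf{G}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}) \qquad [4]$$

where $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$ is the variance-covariance matrix of the observations, with $\mathbf{G} = \sigma^2_a\mathbf{A}$ and $\mathbf{R} = \sigma^2_e\mathbf{I}$.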
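
Expressions [3] and [4] can also be evaluated directly, which is a useful check that they give the same answers as the MME. The sketch below assumes known variance components and builds $\mathbf{V} = \sigma^2_a\mathbf{Z}\mathbf{A}\mathbf{Z}' + \sigma^2_e\mathbf{I}$ explicitly. Because it has to invert the $n \times n$ matrix $\mathbf{V}$ (one row per record), it only suits tiny examples, which is one reason software solves [2] instead, since that system needs $\mathbf{A}^{-1}$ but not $\mathbf{V}^{-1}$. The helper names and the data are hypothetical.

```python
# Sketch: BLUEs [3] and BLUPs [4] computed directly from V, for known
# variance components. Pure Python with explicit inversion: toy sizes only.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def blue_blup_from_v(X, Z, y, A, var_a, var_e):
    """Return (b_hat, u_hat) from expressions [3] and [4]."""
    n = len(y)
    Zt = transpose(Z)
    ZAZt = matmul(matmul(Z, A), Zt)
    # V = var_a * Z A Z' + var_e * I
    V = [[var_a * ZAZt[i][j] + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    Vinv = inverse(V)
    XtVinv = matmul(transpose(X), Vinv)
    ycol = [[v] for v in y]
    # [3]: b_hat = (X' V^-1 X)^-1 X' V^-1 y
    b_hat = matmul(inverse(matmul(XtVinv, X)), matmul(XtVinv, ycol))
    # y - X b_hat: phenotypes corrected for the nuisance fixed effects
    Xb = matmul(X, b_hat)
    corrected = [[ycol[i][0] - Xb[i][0]] for i in range(n)]
    # [4]: u_hat = G Z' V^-1 (y - X b_hat), with G = var_a * A
    G = [[var_a * v for v in row] for row in A]
    u_hat = matmul(matmul(matmul(G, Zt), Vinv), corrected)
    return [r[0] for r in b_hat], [r[0] for r in u_hat]


# Hypothetical data: two full sibs, one record each, overall mean as fixed effect
b_hat, u_hat = blue_blup_from_v([[1.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]],
                                [12.0, 6.0], [[1.0, 0.5], [0.5, 1.0]], 1.0, 1.0)
print(b_hat, u_hat)
```

For these made-up records (12 and 6, relationship 0.5, $\sigma^2_a = \sigma^2_e = 1$), this gives approximately $\hat{b} = 9$ and $\hat{\mathbf{u}} = (1, -1)$, the same solutions obtained by solving [2] for those data.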

Equivalence to Weighted Least Squares (WLS)

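Interestingly, expression [3] is equivalent to Weighted Least Squares (WLS) estimation as described for linear regression or linear models; strictly, because $\mathbf{V}$ usually has non-zero off-diagonal elements, it is Generalized Least Squares (GLS), of which WLS is the diagonal special case. Here, the weights are defined by $\mathbf{V}^{-1}$. This is relevant because the elements of $\mathbf{V}^{-1}$ (particularly the diagonal) are associated with the amount of information, or weight, that each record carries (in this case, the phenotypic response of an individual).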
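
A minimal illustration of the WLS view, assuming a diagonal $\mathbf{V}$ (independent records with known but unequal variances) and an overall mean as the only fixed effect: [3] then reduces to a weighted average with weights $1/\mathrm{Var}(y_i)$. The function name wls_mean and the records, which are hypothetical plot means, are for illustration only.

```python
# Sketch: expression [3] with X a column of ones and a diagonal V.
# Assumption: records are independent, so V^-1 is diagonal with entries 1 / Var(y_i).

def wls_mean(y, variances):
    """BLUE of a common mean when V is diagonal."""
    weights = [1.0 / v for v in variances]  # diagonal of V^-1
    # (X' V^-1 X)^-1 X' V^-1 y collapses to sum(w * y) / sum(w)
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical records: each y_i is a mean over n_i plots, so Var(y_i) = var_e / n_i
y = [10.0, 20.0]
n_plots = [4, 1]
var_e = 4.0
print(wls_mean(y, [var_e / n for n in n_plots]))  # 12.0 (the unweighted mean is 15.0)
```

The record averaged over four plots has the smaller variance, and therefore the larger weight, so it pulls the estimate toward 10.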

Breeding values and the breeder's equation

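Also, the expression for the breeding values is extremely relevant. Note that [4] can be written as:

$$\hat{\mathbf{u}} = \sigma^2_a\mathbf{A}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})$$

where $\sigma^2_a\mathbf{V}^{-1}$ resembles the calculation of heritability, with the genetic component in the numerator ($\sigma^2_a$) and the phenotypic variance ($\mathbf{V}$) in the denominator as an inverse. Also, $(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})$ represents the phenotypic response ‘corrected’ for nuisance effects. Therefore, the above expression resembles the breeder’s equation:

$$\hat{u}_i = h^2(y_i - \hat{\mu})$$

where $h^2 = \sigma^2_a / (\sigma^2_a + \sigma^2_e)$ is the heritability, $\hat{\mu}$ is the estimated mean, and the index $i$ identifies the individual.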
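
The link to the breeder's equation can be checked in the simplest case: unrelated animals ($\mathbf{A} = \mathbf{I}$), one record each ($\mathbf{Z} = \mathbf{I}$) and an overall mean as the only fixed effect ($\mathbf{X} = \mathbf{1}$). Then $\mathbf{V} = (\sigma^2_a + \sigma^2_e)\mathbf{I}$, [3] gives the plain average, and [4] reduces to $h^2(y_i - \bar{y})$. The sketch below assumes known variance components; the function name and the data are hypothetical.

```python
# Sketch: [3] and [4] for unrelated animals with one record each (A = I, Z = I, X = 1).
# Assumption: variance components are known; V = (var_a + var_e) * I is diagonal.

def blup_unrelated(y, var_a, var_e):
    """BLUPs of breeding values in this special case."""
    h2 = var_a / (var_a + var_e)  # heritability: additive / phenotypic variance
    mu_hat = sum(y) / len(y)      # [3] with equal weights is the plain average
    # [4]: u_hat_i = var_a * (1 / (var_a + var_e)) * (y_i - mu_hat) = h2 * (y_i - mu_hat)
    return [h2 * (yi - mu_hat) for yi in y]


print(blup_unrelated([10.0, 12.0, 14.0, 8.0], var_a=2.0, var_e=6.0))
# [-0.25, 0.25, 0.75, -0.75]
```

Here $h^2 = 0.25$ and the mean is 11, so each prediction is a quarter of the individual's deviation from the mean. Once relatives and unequal weights enter through $\mathbf{A}$ and $\mathbf{V}$, the predictions depart from this simple form.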

Fitting an LMM with an Animal Model therefore generalizes that well-known expression: here, the matrices connecting the pieces of data allow us to use all the available information, including the relationships between individuals, and to give each of these pieces its own weight. This is how Henderson’s MME provide us with the best linear unbiased predictions of our breeding values!

About the author

Dr. Salvador Gezan is a statistician/quantitative geneticist with more than 20 years’ experience in breeding, statistical analysis and genetic improvement consulting. He currently works as a Statistical Consultant at VSN International, UK. Dr. Gezan started his career at Rothamsted Research as a biometrician, where he worked with Genstat and ASReml statistical software. Over the last 15 years he has taught ASReml workshops for companies and university researchers around the world. 

Dr. Gezan has worked on agronomy, aquaculture, forestry, entomology, medical, biological modelling, and with many commercial breeding programs, applying traditional and molecular statistical tools. His research has led to more than 100 peer reviewed publications, and he is one of the co-authors of the textbook Statistical Methods in Biology: Design and Analysis of Experiments and Regression.