Unraveling breeding values: How one value depends on others

Dr. Salvador A. Gezan

19 April 2022

Demystifying breeding values: Understanding the basics

All breeding programs focus on selecting individuals with outstanding breeding values (BVs). But the concept of a BV is sometimes hard to grasp. In its most formal definition, the BV of an individual is twice the deviation of the mean of its progeny from the population mean, when that individual is mated at random to the population; the factor of two arises because a parent passes on only half of its alleles to each offspring. This deviation is associated with a specific trait, and it can be positive or negative.

Estimating breeding values using linear mixed models (LMMs)

Breeding values are estimated from the phenotypic data of the target individual and/or its relatives using linear mixed model (LMM) methodology. Here, genotypes are treated as a random effect, and after fitting the model we obtain their BLUPs (Best Linear Unbiased Predictions). BLUPs are predictions of random effects in general, and BVs are one particular case.

The role of the numerator relationship matrix (A matrix)

A key element in estimating these BVs is the numerator relationship matrix (often known as the A matrix), which contains the additive genetic relationships between individuals. For example, full-sib individuals from unrelated parents have a coefficient of 0.5, indicating that they share, on average, 50% of their alleles. We will illustrate the role of this matrix in the estimation of BVs. To do so, we will start with the widely used Animal Model (or individual model) in generic matrix notation, assuming we have a total of n individuals. Once fitted, this model provides an estimate of the BV for each of the n animals/individuals considered. The model is:

$$y = X\beta + Za + e$$

where,

y is the vector with the response variable of dimension n×1,

X is an incidence matrix of dimension n×p,

β is a vector of fixed effects of dimension p×1,

Z is an incidence matrix of dimension n×n,

a is a vector of breeding values of dimension n×1, with $a \sim MVN(0, G)$, and

e is a vector with residual/error terms of dimension n×1, with $e \sim MVN(0, R)$.

In the model description above, we have included the distributional assumptions of the random effects. For example, the vector a follows a multivariate Normal distribution with a vector of expected values of zero and a variance-covariance matrix $G = \sigma_a^2 A$, where $\sigma_a^2$ is the additive genetic variance. Similarly, for the residual e we have a vector of expected values equal to zero and a variance-covariance matrix $R = \sigma_e^2 I$, where $\sigma_e^2$ is the residual variance and I is an identity matrix of dimension n×n.
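To make the A matrix concrete, here is a minimal Python sketch of the classical tabular method for building it from a pedigree. The function name `relationship_matrix`, the list-based pedigree encoding, and the four-animal pedigree are illustrative assumptions, not something taken from a specific package.

```python
import numpy as np

def relationship_matrix(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    sires[i] and dams[i] hold the parent indices of individual i, or None
    when a parent is unknown. Parents must be listed before their offspring.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        both_known = s is not None and d is not None
        A[i, i] = 1.0 + (0.5 * A[s, d] if both_known else 0.0)
        for j in range(i):
            # Off-diagonal: average relationship of j with the two parents of i.
            a_js = A[j, s] if s is not None else 0.0
            a_jd = A[j, d] if d is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
    return A

# Hypothetical pedigree: 0 and 1 are unrelated founders, 2 and 3 are their full-sib offspring.
sires = [None, None, 0, 0]
dams = [None, None, 1, 1]
A = relationship_matrix(sires, dams)
print(A)
```

In this toy pedigree the entry for the two full-sibs (row 2, column 3) comes out as 0.5, and the entry for the two founders as 0, in line with the coefficients described earlier.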

The formula for the estimation of the BVs (in this case the a effects) in matrix notation is:

$$\hat{a} = G Z' V^{-1} (y - X\hat{\beta})$$

with

$$V = Z G Z' + R$$

Here, $\hat{\beta}$ is the estimate of the fixed effects, and V is the variance-covariance matrix of the phenotypic observations, i.e., V = V(y) is of dimension n×n. You can see that it depends on the assumptions associated with the matrices G and R.
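As an illustration, the calculation can be sketched in a few lines of Python. This is a toy version that assumes the variance components are known, uses the generalized least squares estimate for the fixed effects, and inverts V directly (fine for four animals, not for real datasets). The function name `blup_breeding_values`, the pedigree, the phenotypes and the variance components are all made up for the example.

```python
import numpy as np

def blup_breeding_values(y, X, Z, G, R):
    """Return (beta_hat, a_hat) for y = X beta + Z a + e with known G and R."""
    V = Z @ G @ Z.T + R
    Vinv = np.linalg.inv(V)  # acceptable for a toy example only
    # Generalized least squares estimate of the fixed effects.
    beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    # BLUP of the breeding values: G Z' V^{-1} (y - X beta_hat).
    a_hat = G @ Z.T @ Vinv @ (y - X @ beta_hat)
    return beta_hat, a_hat

# Hypothetical family: unrelated parents 0 and 1, full-sib offspring 2 and 3.
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
sigma2_a, sigma2_e = 4.0, 6.0            # assumed variance components
y = np.array([10.0, 12.0, 15.0, 11.0])   # made-up phenotypes
n = len(y)
X = np.ones((n, 1))                      # overall mean as the only fixed effect
Z = np.eye(n)                            # one record per individual

beta_hat, a_hat = blup_breeding_values(y, X, Z, sigma2_a * A, sigma2_e * np.eye(n))
print("mu_hat:", beta_hat)
print("a_hat :", a_hat)
```

In practice, LMM software solves the equivalent mixed model equations instead of forming and inverting V, but the predictions are the same.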

To make this a little easier to understand we will make some simplifications. First, we will consider that all individuals defined in A have phenotypic data (that is why the vectors y and a have the same dimensions). Hence, we have that $Z = I$. Second, we will consider that there is a single fixed effect in our model: the overall mean; hence, $\beta = \mu$ and therefore $X = \mathbf{1}$, a vector of n ones. And finally, we will replace G and R by $\sigma_a^2 A$ and $\sigma_e^2 I$, respectively. Thus,

$$y = \mathbf{1}\mu + a + e$$

$$\hat{a} = \sigma_a^2 A V^{-1} (y - \mathbf{1}\hat{\mu})$$

with

$$V = \sigma_a^2 A + \sigma_e^2 I$$

For interpretation, let’s first focus on the matrix V, which corresponds to the total phenotypic variance-covariance under our model. The inverse of this matrix, as used in our estimation of a, assigns a weight to each observation. If we, for simplicity, assume that all individuals are unrelated (and non-inbred), then this weight matrix is diagonal, with elements equal to the inverse of the total phenotypic variance. However, we normally have genetic relationships between individuals. The inverse of V includes this genetic relationship information, but it also accounts for the over- and under-representation of some individuals given the presence (or absence) of relatives in the data. These weights will be larger when there is more information on an animal and smaller when there is less.
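As a worked special case, suppose all individuals are unrelated and non-inbred, so that A = I, and write $\sigma_a^2$ and $\sigma_e^2$ for the additive and residual variances. Under these assumptions the estimate of the overall mean is the simple average $\bar{y}$, and the weights collapse to a single number, the narrow-sense heritability $h^2$:

$$V = (\sigma_a^2 + \sigma_e^2) I, \qquad V^{-1} = \frac{1}{\sigma_a^2 + \sigma_e^2} I, \qquad \hat{a}_i = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2} (y_i - \bar{y}) = h^2 (y_i - \bar{y})$$

For instance, with the illustrative values $\sigma_a^2 = 4$ and $\sigma_e^2 = 6$ we get $h^2 = 0.4$, so an individual whose phenotype is 5 units above the mean would receive an estimated BV of 2. Here each individual's BV depends only on its own record; it is the relationships in A that bring in information from other individuals.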

We can also write the expression to estimate a as:

$$\hat{a} = H S$$

where $H = \sigma_a^2 A V^{-1}$ and $S = y - \mathbf{1}\hat{\mu}$.

In the above definition, S is a vector of dimension n×1 containing the phenotypic response for each individual ‘corrected’ by the fixed effects. In our simplified case, this correction is only $\hat{\mu}$, but in most cases it will require more complex corrections as described by $X\hat{\beta}$.

The significance of matrix H: Covariances between phenotypic observations and breeding values

Now, much more interesting is the matrix H of dimension n×n. This matrix is built from the covariances between the breeding values and the phenotypic observations, namely cov(a, y) = $\sigma_a^2 A$, post-multiplied by $V^{-1}$. It describes how strongly a given phenotypic observation contributes to the prediction of another (or the same) individual’s breeding value. Let’s write the expression for the breeding value of individual i, $\hat{a}_i$. This corresponds to the i-th row of the matrix H multiplied by the vector S. Thus,

$$\hat{a}_i = \sum_{j=1}^{n} h_{ij} (y_j - \hat{\mu})$$

In the above expression, we are using the notation $h_{ij}$. This corresponds to the value (or weight) of the H matrix located in row i and column j.

The above formula has several interesting elements. First, it corresponds to a linear combination of the deviations of each phenotypic response (y corrected by fixed effects). Second, these deviations are weighted according to the coefficients $h_{ij}$ to provide us with an estimate of $\hat{a}_i$.
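The row-by-row reading of this formula can be checked numerically. The sketch below assumes the simplified model (overall mean only, one record per individual, known variance components) and uses a made-up family of four with made-up phenotypes; it builds the weight matrix and the corrected phenotypes, then computes one breeding value as an explicit weighted sum.

```python
import numpy as np

# Hypothetical family: unrelated parents 0 and 1, full-sib offspring 2 and 3.
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
sigma2_a, sigma2_e = 4.0, 6.0            # assumed variance components
y = np.array([10.0, 12.0, 15.0, 11.0])   # made-up phenotypes
n = len(y)
ones = np.ones(n)

V = sigma2_a * A + sigma2_e * np.eye(n)
Vinv = np.linalg.inv(V)
# Generalized least squares estimate of the overall mean.
mu_hat = (ones @ Vinv @ y) / (ones @ Vinv @ ones)

H = sigma2_a * A @ Vinv   # weight matrix
S = y - mu_hat            # phenotypes corrected by the fixed effect
a_hat = H @ S             # all breeding values at once

# Breeding value of individual i as row i of H times S, written out term by term.
i = 2
a_hat_i = sum(H[i, j] * S[j] for j in range(n))
for j in range(n):
    print(f"h[{i},{j}] = {H[i, j]: .4f}   S[{j}] = {S[j]: .4f}")
print("weighted sum:", a_hat_i, " matrix form:", a_hat[i])
```

The two printed values agree up to floating-point rounding, which is simply the matrix product written one row at a time.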

These weights depend on two aspects: the matrices A and V. From the matrix A we have the relationship coefficients between every other individual and individual i, and these are always non-zero for all relatives of individual i. This is extremely relevant, as it implies that all relatives of individual i with phenotypic data will contribute to the estimation of the BV for this individual. The higher this coefficient, the stronger the contribution. This is biologically reasonable, as all relatives have phenotypic responses representing a portion of the alleles (or DNA) also found in individual i.

The other aspect of these weights is that they are calculated using elements of the $V^{-1}$ matrix. Here, the more or better information that comes from a relative (say, more replications), the smaller the variance associated with that piece of information, and therefore the larger its weight. This does not apply in our simplified model, but if we deal, for example, with heterogeneous errors, this will imply differential weights by group.

Interestingly, the largest weight will typically be $h_{ii}$, the weight of the individual’s own phenotypic response on its own BV. This is reasonable as we are observing directly the response of this genotype. In other models (for example parental models) this might not be the case and only relatives contribute to the estimation of $\hat{a}_i$.
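To see these weights in a small example, the sketch below prints one row of the weight matrix $\sigma_a^2 A V^{-1}$ for a hypothetical pedigree: two unrelated parents, their two full-sib offspring, and a fifth individual with no pedigree link to that family. The pedigree and the variance components are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical pedigree: unrelated parents 0 and 1, their full-sib offspring
# 2 and 3, and individual 4 with no pedigree link to that family.
A = np.array([[1.0, 0.0, 0.5, 0.5, 0.0],
              [0.0, 1.0, 0.5, 0.5, 0.0],
              [0.5, 0.5, 1.0, 0.5, 0.0],
              [0.5, 0.5, 0.5, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 1.0]])
sigma2_a, sigma2_e = 4.0, 6.0   # assumed variance components
n = A.shape[0]

V = sigma2_a * A + sigma2_e * np.eye(n)
H = sigma2_a * A @ np.linalg.inv(V)   # weight matrix

# Weights used to predict the breeding value of individual 2.
i = 2
labels = ["parent 0", "parent 1", "itself", "full-sib 3", "unlinked 4"]
for label, a_ij, h_ij in zip(labels, A[i], H[i]):
    print(f"{label:>11}: a_ij = {a_ij:.2f}, h_ij = {h_ij:.4f}")
```

In this toy setting the individual's own record gets the largest weight and the unlinked individual gets a weight of zero. The parents and the full-sib all have a relationship coefficient of 0.5 with individual 2, yet they end up with different weights, because the weights also depend on $V^{-1}$ and not only on A.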

Exploring multi-trait analyses: Estimating breeding values for different traits

Extensions of the above Animal Model also use equivalent expressions for estimating BVs. For example, if we think of multi-trait analyses, we will want to estimate the breeding value for individual i on trait k. In our multi-trait Animal Model we include phenotypic measurements on all traits for the same individual, and all of these will contribute to the estimation of $\hat{a}_{ik}$. The corresponding weights will depend on the precision of the trait (e.g., heritabilities), but also on the strength of the additive correlation between traits, and this will be reflected in a more complex G matrix used to fit our LMM.
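A two-trait version can be sketched with Kronecker products. The example below assumes that every individual is measured for both traits, that records are stacked by trait (all individuals for trait 1, then all for trait 2), and that the 2×2 additive and residual (co)variance matrices, here called `G0` and `R0`, are known. These names, the pedigree and all numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical family: unrelated parents 0 and 1, full-sib offspring 2 and 3.
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
n = A.shape[0]

G0 = np.array([[4.0, 2.0], [2.0, 3.0]])   # assumed additive (co)variances between traits
R0 = np.array([[6.0, 1.0], [1.0, 5.0]])   # assumed residual (co)variances between traits

# Made-up phenotypes, stacked by trait: trait 1 for all individuals, then trait 2.
y = np.concatenate([[10.0, 12.0, 15.0, 11.0], [3.1, 2.7, 3.6, 3.0]])

G = np.kron(G0, A)                        # the "more complex" G matrix
R = np.kron(R0, np.eye(n))
X = np.kron(np.eye(2), np.ones((n, 1)))   # one overall mean per trait
V = G + R                                 # Z is an identity matrix here
Vinv = np.linalg.inv(V)

beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
H = G @ Vinv
a_hat = (H @ (y - X @ beta_hat)).reshape(2, n)   # row k holds trait k, column i individual i

# Weights of the trait-2 records on the trait-1 breeding values.
cross = H[:n, n:]
print("trait means:", beta_hat)
print("breeding values:\n", a_hat)
print("cross-trait weights:\n", cross)
```

With these values the cross-trait block of weights is not zero, so records on one trait do feed into the breeding values for the other trait, as described above.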

Concluding insights: The complex interplay of breeding values

The expression $\hat{a}_i = \sum_j h_{ij} (y_j - \hat{\mu})$ is very important because it tells us that the BV of a given individual is a linear combination of its own and all its relatives’ phenotypic information (and therefore BVs). Each contribution depends not only on how related the individuals are but also on the quality of that piece of phenotypic information. This is done simultaneously for all individuals in our dataset, using a complex system of equations solved within the LMM framework. So, as we have seen, ‘a breeding value depends on other breeding values’!

About the author

Dr. Salvador Gezan is a statistician/quantitative geneticist with more than 20 years’ experience in breeding, statistical analysis and genetic improvement consulting. He currently works as a Statistical Consultant at VSN International, UK. Dr. Gezan started his career at Rothamsted Research as a biometrician, where he worked with Genstat and ASReml statistical software. Over the last 15 years he has taught ASReml workshops for companies and university researchers around the world. 

Dr. Gezan has worked on agronomy, aquaculture, forestry, entomology, medical, biological modelling, and with many commercial breeding programs, applying traditional and molecular statistical tools. His research has led to more than 100 peer reviewed publications, and he is one of the co-authors of the textbook Statistical Methods in Biology: Design and Analysis of Experiments and Regression.