The elegance of the Breeder’s Equation

Dr. Salvador A. Gezan

03 May 2022

Understanding the Breeder's Equation

The Breeder’s Equation is a well-known expression in quantitative genetics that is widely used to understand how genetic gain can be achieved. Here, we will focus on a few statistical aspects of this equation and the role they play in helping us to maximize genetic gain. An expression of this equation is:

$$\Delta G = h^2 \times S / L$$

where,

$\Delta G$ is the genetic gain per period of time (often in units/year),

$h^2$ is the narrow- or broad-sense heritability,

$S$ is the selection differential (in units),

$L$ is the period or cycle (often in years).
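As a minimal sketch of the arithmetic, the equation can be written as a small Python function that multiplies heritability by the selection differential and divides by the cycle length. The function name `genetic_gain`, its argument names and the numbers used below are hypothetical and chosen only for illustration.

```python
def genetic_gain(h2: float, s: float, cycle_years: float) -> float:
    """Genetic gain per year: heritability x selection differential / cycle length."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    if cycle_years <= 0:
        raise ValueError("cycle length must be positive")
    return h2 * s / cycle_years


# Hypothetical values: heritability 0.25, selection differential 8 units, 4-year cycle.
gain = genetic_gain(h2=0.25, s=8.0, cycle_years=4.0)
print(f"Genetic gain: {gain:.2f} units/year")  # 0.25 * 8 / 4 = 0.50
```

Halving the cycle length or doubling the heritability has the same effect on the result, which is why each component is discussed separately in the sections that follow.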

Let’s start with $\Delta G$. This represents the genetic gain on an annual basis, and its units will depend on the target trait. Often we use yield/year, but it can also correspond to an index built from a composite of traits with different weights. In our discussion, we will assume that we have estimated breeding values (EBVs) for a group of individuals that will be parents to constitute the next generation (breeding population) or will be put to commercial use (production population). Hence, the focus is on additive effects. Alternatively, we could use total genetic values (TGVs). These are common in plant breeding and correspond to effects that quantify the total genetic worth of an individual (often a clone); they therefore contain both additive and non-additive effects.

The role of heritability in genetic gain

The most important aspect of the above equation is the heritability ($h^2$). If it is exactly one, then we have the perfect situation of ‘what we see is what we get’. That is, phenotypic selection of the top individuals will yield the best genotypes. However, this is never the case, and heritabilities often range from as low as 0.1 up to 0.8. For simplicity, we will consider the following simple heritability expressions:

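Narrow-sense: $h^2 = \sigma^2_a / (\sigma^2_a + \sigma^2_e)$

Broad-sense: $H^2 = \sigma^2_g / (\sigma^2_g + \sigma^2_e)$

where $\sigma^2_a$, $\sigma^2_g$ and $\sigma^2_e$ are the additive, total genetic and residual variances, respectively.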

Statistical approaches to reduce residual variance

Here, anything aiming to reduce the residual variance $\sigma^2_e$ will increase heritability and, in turn, produce greater genetic gains. Statistically, there are multiple options to consider to reduce $\sigma^2_e$. For example, the use of a sound and optimized experimental design will help to eliminate bias and/or have better control of background noise. Also, on the linear mixed model analysis side, we can use, for example, spatial analyses on field trials, or inclusion of relevant nuisance random or fixed effects, to better model the data.

In practical terms, there are many other options to reduce this background noise. One is to take better and more careful phenotypic measurements. This implies clear, consistent and well-defined protocols and, in some cases, the use of more expensive measurement procedures and well-trained people, such as sophisticated laboratory techniques to measure protein or sugar content. In field trials, good selection and preparation of experimental sites is also critical to ensure more homogeneous conditions for plants.

All of the above elements affect EBVs and TGVs equally, but for the former the inclusion of pedigree is important to improve the accuracy of these random effects. This pedigree should be of top quality, as any mistakes or inconsistencies (e.g., unrecorded pollen contamination) will result in increased background noise.

The impact of replication on heritability

Additionally, the heritability from the Breeder’s Equation can take many definitions. An important one is associated with the use of the genotype’s mean based on several phenotyped replications (as done with clones), or multiple measurements over time (as done on trees and dairy cows). To illustrate this, consider the following expressions:

Narrow-sense: $\bar{h}^2 = \sigma^2_a / (\sigma^2_a + \sigma^2_e / r)$

Broad-sense: $\bar{H}^2 = \sigma^2_g / (\sigma^2_g + \sigma^2_e / r)$

where $\bar{H}^2$ and $\bar{h}^2$ are the mean broad- and narrow-sense heritabilities, respectively, and $r$ is the replication (or number of repeated measurements). Note that replication directly reduces the magnitude of the background noise; therefore, any effort to increase replication will increase heritability (unless the genetic variance, $\sigma^2_a$ or $\sigma^2_g$, is zero). This implies that, in plant breeding for example, having an adequate number of replications will increase genetic gain. Defining the optimal number of replications, or of measurements over time on an organism, is another statistical aspect to plan carefully; it is associated with good experimental design practices and can be supported by simulations.
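To illustrate how replication acts on the residual term, here is a small Python sketch that divides the residual variance by the number of replicates before computing heritability. The function name `mean_heritability` and the variance values (genetic variance 2, residual variance 6) are assumptions made for this example only.

```python
def mean_heritability(var_genetic: float, var_residual: float, reps: int = 1) -> float:
    """Heritability of a genotype mean based on `reps` replicates or repeated measures.

    Pass the additive variance for the narrow-sense version, or the total
    genetic variance for the broad-sense version.
    """
    if reps < 1:
        raise ValueError("reps must be at least 1")
    if var_genetic < 0 or var_residual < 0:
        raise ValueError("variances cannot be negative")
    total = var_genetic + var_residual / reps
    if total == 0:
        raise ValueError("total variance is zero; heritability is undefined")
    return var_genetic / total


# Hypothetical variance components: genetic = 2, residual = 6.
for reps in (1, 2, 3, 4):
    print(reps, round(mean_heritability(2.0, 6.0, reps), 3))
```

With these assumed values, the heritability rises from 0.25 with a single replicate to about 0.57 with four, and each additional replicate adds less than the previous one. This is one reason why the optimal number of replications is worth planning rather than simply maximizing.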

Selection differential: Maximizing genetic gains

The selection differential, $S$, has obvious implications. Its expression is $S = \bar{x}_s - \bar{x}_0$, where $\bar{x}_s$ is the mean of the selected individuals and $\bar{x}_0$ is the overall mean of the population under selection. Hence, the further away the selected individuals are from the overall mean, the greater the potential for large genetic gains. This seems a simple strategy that can occur in at least the two ways described below.

One way is to select very few (say < 5) genotypes. But this is difficult for most breeding programs, as there is a restriction on a minimum number of future progenitors to generate the next generation; and they also need to present a wide range of genetic diversity to ensure the long-term success of the program. 

Alternatively, the other strategy is to play a numbers game, where we have a very large pool (thousands or millions) of genotypes to select from. Large numbers are required in order to increase the chance of finding an outstanding combination of alleles. Note that this, in order to be successful, still requires a non-zero heritability. This strategy has several side effects, such as high operational costs, large experimental study size, and in general complex logistics. Nevertheless, in practical terms we should always aim at having as many individuals as possible to select from.
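The numbers game can be illustrated with a short simulation. The sketch below computes the selection differential as the mean of the selected individuals minus the overall mean, and then applies it to simulated pools of increasing size while keeping the number of selected individuals fixed. The normal distribution, its mean and standard deviation, the pool sizes and the number selected are all assumptions for illustration, not values from any real program.

```python
import random
import statistics


def selection_differential(values, n_selected):
    """Mean of the top `n_selected` values minus the mean of all values."""
    if not 0 < n_selected <= len(values):
        raise ValueError("n_selected must be between 1 and the pool size")
    top = sorted(values, reverse=True)[:n_selected]
    return statistics.fmean(top) - statistics.fmean(values)


# Simulated phenotypes (assumed normal, mean 100, SD 10); seed fixed for repeatability.
rng = random.Random(2022)
for pool_size in (100, 1_000, 10_000):
    pool = [rng.gauss(100.0, 10.0) for _ in range(pool_size)]
    s = selection_differential(pool, n_selected=20)
    print(f"pool = {pool_size:>6}, selected = 20, S = {s:.1f}")
```

Keeping 20 selected individuals preserves the same number of future parents in every scenario, so any increase in the selection differential comes only from the larger pool. The simulation works on phenotypes, so the realized genetic gain still depends on the heritability.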

The role of generation interval 

The last component is $L$, which is the period, cycle or generation interval. Depending on the organism, it might or might not be possible to modify this component, as some breeding programs are limited by the reproduction cycle and/or the time required to collect the phenotypic data. For example, evaluating sires for milk production requires phenotyping their female offspring. Similarly, with trees, a wait of several years is required before crosses can be performed and seeds collected on mature trees. But, in some cases, it might be possible to induce early maturation with hormones, thus reducing this cycle for all or some of the individuals. Alternatively, selecting genotypes early (with or without phenotypic data) speeds up this cycle, and it has been a successful strategy in perennial plants. However, this may imply a loss in the accuracy of the EBVs (or TGVs) and/or a reduction in heritability.

Genomic selection: Enhancing the Breeder's Equation

The emergence of genomic selection (GS) has also added interesting alternatives to the components of the Breeder’s Equation. For example, under the numbers game mentioned above, it is possible to genotype thousands of individuals for which we obtain genomic predictions based on a previously trained model. This will clearly increase the pool of individuals for selection, but at an important economic cost associated with genotyping. Additionally, genomic predictions can be implemented on a pool of individuals that are still immature, and this set can be used as soon as possible as parents, even before having phenotypic data associated with these genotypes. 

Genomic prediction models are not perfect, and they add an extra complication to the genetic gain calculations. In order to understand this, we will use the expression of indirect genetic gain below:

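$$\Delta G_x = h_x \times h_y \times r_{a(x,y)} \times S_y / L$$

where,

$\Delta G_x$ is the genetic gain per period of time for the target (unmeasured) trait,

$h^2_x$ is the heritability of the target trait (with $h_x$ its square root),

$h^2_y$ is the heritability of the indirect trait (with $h_y$ its square root),

$r_{a(x,y)}$ is the additive correlation between the target and indirect trait,

$S_y$ is the selection differential based on the indirect trait,

$L$ is the period or cycle.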

This expression is modified by considering that GS generates an indirect predicted genetic value for each individual based on molecular data, and that this prediction is not perfectly accurate (with accuracy defined as the correlation between the true and the predicted EBVs). As we are dealing with the same trait, the two heritabilities are equal, $h_x = h_y$, and we write:

$$\Delta G_{GS} = h^2 \times r_{GS} \times S / L$$

where $\Delta G_{GS}$ refers to the genetic gain based on the genomic prediction model available, and $r_{GS}$ is the accuracy of this model (as defined before).

As the accuracy $r_{GS}$ is less than one, we always have a loss of genetic gain for our trait, and its size depends on how far from one the accuracy is. Most common accuracy values range from 0.2 to 0.4; hence, there could be a considerable potential loss of genetic gain. Nevertheless, as indicated earlier, this can be overcome by genotyping a greater number of individuals, or by reducing the cycle given the availability of predictions at early stages, among many options not described here.
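The trade-off between lower accuracy and a shorter cycle can be sketched numerically. The Python example below assumes, following the expressions in this section, that conventional gain is heritability times selection differential divided by cycle length, and that genomic gain is the same quantity multiplied by the model accuracy. The function names and all input values (heritability 0.4, selection differential 10 units, an 8-year conventional cycle, a 2-year genomic cycle, accuracy 0.3) are hypothetical.

```python
def gain_conventional(h2: float, s: float, cycle_years: float) -> float:
    """Gain per year under phenotypic selection: h2 * S / L."""
    if cycle_years <= 0:
        raise ValueError("cycle length must be positive")
    return h2 * s / cycle_years


def gain_genomic(h2: float, accuracy: float, s: float, cycle_years: float) -> float:
    """Gain per year under genomic selection: h2 * accuracy * S / L."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie between 0 and 1")
    return accuracy * gain_conventional(h2, s, cycle_years)


# Hypothetical scenario: same selection differential, shorter cycle under GS.
h2, s = 0.4, 10.0
conventional = gain_conventional(h2, s, cycle_years=8.0)
genomic = gain_genomic(h2, accuracy=0.3, s=s, cycle_years=2.0)
print(f"Conventional: {conventional:.2f} units/year")
print(f"Genomic:      {genomic:.2f} units/year")
```

Under these assumed values, and holding the selection differential equal, the genomic option gives more gain per year whenever its cycle is shorter than the accuracy times the conventional cycle (here 0.3 x 8 = 2.4 years). A larger genotyped pool would shift this balance further by increasing the selection differential.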

Increasing genomic model accuracy for optimal genetic gain

Statistically, there are many possible ways to increase the accuracy of genomic models. Some include larger training populations, more SNP markers on the genotyping panel, closer relatedness between training and evaluation populations, and better or more appropriate genomic techniques for fitting models (such as GBLUP or Bayes B), to name a few. It is not our objective to focus on these or other aspects, but it is important to mention that there are many alternatives to increase the accuracy of our models, and they will have a relevant effect on increasing genetic gains.

Balancing genetic gain and cost considerations

The last important aspect to mention about the Breeder’s Equation is that exploiting its benefits depends on costs and careful economic evaluations. Most options to increase genetic gain are associated with greater costs. For example, in some countries it is cheaper to improve the quality of the phenotyping than to increase the number of genotypes to evaluate. All of these elements, and others, need to be considered specifically for the reality of each commercial breeding program, and they are often complemented with detailed simulation studies. But, as shown here, the Breeder’s Equation presents an easy way to enumerate the elements that are worth assessing so we can make the most of the resources and statistical tools available.

About the author

Dr. Salvador Gezan is a statistician/quantitative geneticist with more than 20 years’ experience in breeding, statistical analysis and genetic improvement consulting. He currently works as a Statistical Consultant at VSN International, UK. Dr. Gezan started his career at Rothamsted Research as a biometrician, where he worked with Genstat and ASReml statistical software. Over the last 15 years he has taught ASReml workshops for companies and university researchers around the world. 

Dr. Gezan has worked on agronomy, aquaculture, forestry, entomology, medical, biological modelling, and with many commercial breeding programs, applying traditional and molecular statistical tools. His research has led to more than 100 peer reviewed publications, and he is one of the co-authors of the textbook Statistical Methods in Biology: Design and Analysis of Experiments and Regression.