Inbreeding coefficients in the genomic era: expected vs realized autozygosity

Dr. Miguel Gallach

21 September 2022

Inbreeding is one of the most important concepts in genetics. It is so important that even people unfamiliar with the subject have a reasonable idea of what inbreeding is about. An intuitive definition of inbreeding would be ‘something connected to the idea of mating closely related organisms’. Certainly, inbreeding refers to mating parents who share one or more ancestors, or reciprocally, inbreeding refers to descending from related parents.

Geneticists use an objective and measurable definition when talking about inbreeding: the inbreeding coefficient. The inbreeding coefficient, symbolized as F, is the probability that the two alleles an individual carries at a locus are identical by descent (i.e., they are biochemical copies of a single allele in an ancestor). Individuals homozygous for alleles that are identical by descent are called identical homozygotes or autozygous. For instance, for a polymorphic nucleotide site segregating A and G alleles in a population (i.e., a genetic marker), AA and GG individuals would be identified as homozygous and AG as heterozygous at this particular site; an AA or GG individual is autozygous only if both copies trace back to the same ancestral allele.

Understanding the F coefficient and its implications

The F coefficient can be calculated for an individual, two individuals or a population. For instance, the F coefficient of the offspring of first cousin parents is 0.0625. This means that, on average, 6.25% of the genetic material in offspring of first cousins will be identical by descent. However, there is a high degree of stochastic variance about this average. This is because meiosis is a random process and the proportion of DNA inherited from grandparents (or more distant ancestors) varies among grandchildren. For offspring of first cousins, the standard deviation of the F coefficient is 0.0243. In practice, this means that about 95% of the offspring of first cousins will have a realized (actual) inbreeding coefficient between 0.0139 and 0.1111 (F ± 2 × SD). It is therefore perfectly possible for offspring of second cousins to be more autozygous (e.g., a realized F = 0.03125) than some offspring of first cousins.
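The numbers above can be reproduced with a short sketch. It assumes Wright's path formula with non-inbred common ancestors, which the text does not name explicitly. The function name `path_inbreeding` is hypothetical, and the standard deviation is simply the value quoted above, not something the code derives.

```python
def path_inbreeding(paths):
    """Expected F from the path formula, assuming non-inbred common ancestors.

    paths: list of (n1, n2) tuples, one per common ancestor, where n1 and n2
    are the generations from each parent back to that ancestor.
    """
    return sum(0.5 ** (n1 + n2 + 1) for n1, n2 in paths)


# First cousins share two grandparents, each 2 generations above both parents.
f_first_cousins = path_inbreeding([(2, 2), (2, 2)])

# Second cousins share two great-grandparents, 3 generations above both parents.
f_second_cousins = path_inbreeding([(3, 3), (3, 3)])

# Standard deviation quoted in the text for offspring of first cousins.
sd = 0.0243
low = f_first_cousins - 2 * sd
high = f_first_cousins + 2 * sd

print(f_first_cousins, f_second_cousins)   # 0.0625 0.015625
print(round(low, 4), round(high, 4))       # 0.0139 0.1111
```

Under these assumptions the expected value for offspring of second cousins (0.015625) falls inside the first-cousin interval, which illustrates why pedigree expectations alone cannot rank individuals by realized autozygosity.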

The stochastic variance increases with each meiosis and therefore the difference between expected and realized autozygosity increases with each new generation. This phenomenon is inevitable and may have a significant impact on breeding programs, the management of endangered or local livestock populations, conservation biology and human genetic studies. This is because breeding values, effective population size, population relationships, demographic history and the identification of recessive disease variants all depend on the accurate estimation of the inbreeding coefficient.

Estimating realized autozygosity with Runs of Homozygosity (ROH)

Runs of Homozygosity (ROH) are long DNA stretches of homozygous markers. Since the first population genomic studies characterizing ROH, and thanks to the inexpensive genotyping by means of SNP DNA Chips, ROH have been applied in demography studies, inbreeding depression, conservation biology, human health, and evolution. Quite possibly, the most significant application in breeding and conservation is to compute realized autozygosity. Since ROH are (most likely) a consequence of autozygosity, we can estimate a ROH-derived inbreeding coefficient, $F_{ROH}$, as $F_{ROH} = \sum L_{ROH} / L_{autosome}$, where $L_{ROH}$ is the length of each identified ROH and $L_{autosome}$ is the length of the autosomal genome (normally, the genome portion covered by the SNPs on the chip).
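As a minimal sketch of this ratio, the snippet below sums the lengths of a set of ROH and divides by the autosomal length. The function name `f_roh`, the segment coordinates and the autosome length are all hypothetical values chosen for illustration; in practice the segments would come from a ROH-calling tool.

```python
def f_roh(roh_segments, autosome_length_bp):
    """ROH-based inbreeding coefficient: total ROH length / autosomal length.

    roh_segments: iterable of (start_bp, end_bp) tuples, assumed non-overlapping.
    autosome_length_bp: autosomal length covered by the markers, in base pairs.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    total_roh_bp = sum(end - start for start, end in roh_segments)
    return total_roh_bp / autosome_length_bp


# Hypothetical ROH calls for one individual (illustrative numbers only).
segments = [
    (1_000_000, 6_000_000),     # 5 Mb
    (20_000_000, 30_000_000),   # 10 Mb
    (50_000_000, 52_500_000),   # 2.5 Mb
]
autosome_bp = 2_500_000_000     # assumed marker-covered autosomal length

f_value = f_roh(segments, autosome_bp)
print(f_value)  # 17.5 Mb of ROH over 2,500 Mb, i.e. about 0.007
```

Using the marker-covered length as the denominator, as the text suggests, keeps the estimate from being diluted by regions the chip cannot observe.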

Advantages of $F_{ROH}$ over $F_{PED}$

Although genotyping errors may have some influence when computing $F_{ROH}$, errors are also very common in pedigree information, and there are studies strongly suggesting that $F_{ROH}$ is a better estimator of individual autozygosity than $F_{PED}$ (i.e., the F coefficient derived from pedigree information). In other words, $F_{ROH}$ is a good estimator of inbreeding, and an even better one than $F_{PED}$ if the genomic data is good enough. As you can imagine, using genomic markers has gained a lot of attention in animal and plant breeding in recent years.

Applications of $F_{ROH}$ in population genetics and breeding

$F_{ROH}$ has other interesting applications in population genetics and breeding. For instance, determining the shared ancestry of a reference population is difficult when there is no genealogy information, or when it is complex or incomplete. Since the length of ROH correlates with the number of recombination events, we can date the common ancestor back by number of generations. Hence, with a rule of thumb of 1 cM/Mb for livestock species and assuming constant population size in the past, you would expect ROH spanning 16 Mb, 10 Mb and 5 Mb to come from a common ancestor three, five and ten generations back, respectively. Other applications are the estimation of the effective population size from the change in $F_{ROH}$ per generation ($N_{e_{ROH}} = 1 / (2 \Delta F_{ROH})$) and the detection of genomic regions that underwent artificial selection (ROH hotspots) or regions that contain critical genes (ROH cold spots). For a review of general applications of ROH, you can refer to and for specific applications in small endangered populations, see and the citations therein.
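The two rules of thumb in this section can be sketched numerically. The relation between ROH length and generations is assumed here to be the common approximation of 100 / (2g) cM for an ancestor g generations back. The article does not state this formula, but it is consistent with the 16, 10 and 5 Mb figures quoted. All function names and the example value of the change in F per generation are hypothetical.

```python
def expected_roh_length_mb(generations, cm_per_mb=1.0):
    """Expected ROH length (Mb) for a common ancestor `generations` back.

    Assumes an expected length of 100 / (2 * g) cM and a uniform
    recombination rate of `cm_per_mb` (1 cM/Mb is the livestock rule of thumb).
    """
    return 100.0 / (2 * generations) / cm_per_mb


def generations_from_roh(length_mb, cm_per_mb=1.0):
    """Invert the same approximation: generations back for a given ROH length."""
    return 100.0 / (2 * length_mb * cm_per_mb)


def ne_from_delta_f(delta_f):
    """Effective population size from the per-generation change in F_ROH."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)


for g in (3, 5, 10):
    # Roughly 16.7, 10.0 and 5.0 Mb under the stated assumptions.
    print(g, round(expected_roh_length_mb(g), 1))

# Hypothetical increase in F_ROH of 0.01 per generation.
print(ne_from_delta_f(0.01))  # about 50
```

Both estimates inherit the assumptions named in the text (uniform recombination rate, constant population size), so they are better read as orders of magnitude than as precise dates or census-like counts.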

Leveraging genomic sciences for analytical advancements

Since the advent of next generation sequencing and derived genomic technologies, the impact of genomic sciences in animal and plant breeding (agrogenomics), medicine (personalized medicine, pharmacogenomics), biofertilizers (metagenomics), wildlife management (ecological genomics), etc., is indisputable. This is just one simple example of how genomics can provide a ‘possible solution to an old problem’. The proper integration of genomic data into your standard analytical toolkit will certainly help you reach your goals at a faster pace and leverage your profits.

About the author

Dr. Miguel Gallach is a geneticist with an M.Sc. in molecular and evolutionary genetics and Ph.D. in biology from the University of Valencia, Spain. Dr. Gallach specializes in the application of genomics and has over 15 years’ experience in academia (research, teaching, and mentoring). He is the former associate editor of BMC Evolutionary Biology and a former consultant for the IAEA/UN in Vienna, Austria. Currently (2022) he works as the CEO and Chief Scientific Officer of GC Genomics.