What are A, G and H matrices and when do we use them?

The VSNi Team

27 July 2021

My colleague, Amanda Avelar de Oliveira, gave a presentation on ASRgenomics in which she talked a lot about A, G and H matrices. I was unfamiliar with these terms so afterwards I tried Googling and got absolutely nowhere! Lots of lengthy papers and highly detailed explanations, but no short and simple definitions. So, I had a chat with Amanda to try and get a straight answer. Here’s what I learned.

Jane
During your ASRgenomics [1] talk I was Googling "What is the difference between genomic and pedigree matrices". I gave up.

I imagine the explanation is something like this:

  • A matrix = we have been recording, on paper or computer, which individuals were bred together and which progeny resulted.
  • G matrix = we can look at the genomes of the individuals and see which ones are related.

But this is just my guess!

Amanda
Your definition is right, but in a very informal way. Let’s consider this passage from the book Genetic Data Analysis for Plant and Animal Breeding by Fikret Isik, James Holland and Christian Maltecca:

Traditional genetic evaluations combined the phenotypic data and resemblance coefficients between relatives to predict the genetic merit of individuals. The resemblance coefficients, derived from pedigrees, are based on probabilities that alleles are identical by descent (IBD). The resulting matrix of pairwise pedigree relationships is referred to as the A matrix because the elements in the matrix are pedigree-based estimates of additive genetic relationships.
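
To make the pedigree-based definition concrete, here is a minimal sketch of the classic "tabular method" for building an A matrix. It is an illustration only, not code from the book or from ASRgenomics: the function name, the toy pedigree and the convention of marking unknown parents with None are all assumptions made for this example, and it assumes that parents are listed before their offspring.

```python
def pedigree_relationship_matrix(pedigree):
    """Build an additive relationship matrix (A) with the tabular method.

    `pedigree` holds one (sire, dam) pair of row indices per individual,
    ordered so that parents appear before their offspring.
    None marks an unknown parent (assumed unrelated and non-inbred).
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Diagonal: 1 plus the inbreeding coefficient, which is half the
        # relationship between the two parents.
        if sire is not None and dam is not None:
            A[i][i] = 1.0 + 0.5 * A[sire][dam]
        else:
            A[i][i] = 1.0
        # Off-diagonal: average of the relationships between individual j
        # and each known parent of individual i.
        for j in range(i):
            rel = 0.0
            if sire is not None:
                rel += 0.5 * A[j][sire]
            if dam is not None:
                rel += 0.5 * A[j][dam]
            A[i][j] = A[j][i] = rel
    return A


# Hypothetical pedigree: 0 and 1 are unrelated founders, 2 and 3 are their
# offspring (full sibs), and 4 is the offspring of the full sibs 2 and 3.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = pedigree_relationship_matrix(pedigree)
for row in A:
    print(row)
```

In this toy pedigree the full sibs get a relationship of 0.5, and the offspring of the full-sib mating gets a diagonal value of 1.25, which reflects an inbreeding coefficient of 0.25.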

More recently, genetic markers distributed throughout the entire genome have been used to measure genetic similarities more precisely than by using pedigree information (VanRaden, 2008). Genetic markers estimate the proportion of chromosome segments shared by individuals based on the identical by state (IBS) matching of marker alleles. The matrix of pairwise realized genomic relationships estimated from marker information is referred to as the G matrix.
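
As a rough illustration of how marker data can be turned into a G matrix, the sketch below follows the widely used first method described by VanRaden (2008): centre the marker counts by twice the allele frequency, then scale the cross-products by 2Σp(1-p). The passage above does not spell out a formula, so treat this as one common choice rather than the only one. The function name and the toy genotypes are invented, markers are assumed to be coded 0, 1 or 2 with no missing values, and allele frequencies are estimated from the sample itself.

```python
def genomic_relationship_matrix(markers):
    """Compute a G matrix from marker counts coded 0/1/2.

    `markers` is a list of rows, one per individual, one column per marker.
    Assumes no missing genotypes and at least one polymorphic marker.
    """
    n = len(markers)
    m = len(markers[0])
    # Frequency of the counted allele at each marker, estimated from the sample.
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    # Centre every genotype by its expected value, 2p.
    Z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    # Scaling factor that puts G on the same scale as the A matrix.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [
        [sum(zi * zk for zi, zk in zip(Z[i], Z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes: 3 individuals by 4 markers.
# Individuals 0 and 1 carry identical genotypes, so G[0][1] equals G[0][0].
markers = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],
    [2, 1, 0, 1],
]
G = genomic_relationship_matrix(markers)
for row in G:
    print([round(value, 3) for value in row])
```

Real data sets need extra care (missing genotypes, rare alleles, quality filters), which this sketch deliberately leaves out.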

DNA segment in two or more individuals is

Jane
Ok, so if we can get genomic relationship information, why would we even bother with the A matrix any more (unless we do not have access to DNA data)? Why does it make sense to combine the A and G matrices to create an H matrix? Why not just chuck out the A matrix?

Amanda 
Basically, to get genotypic information we need to genotype the individuals in our population, which is really expensive! Pedigree information can be easier and cheaper to obtain. In animal breeding, for example, breeders have very good pedigree information - they basically know the whole ancestry of an individual for generations! In plants, it is less common to have a good pedigree, especially for annual crops.

Jane
So, the pedigree information is still valuable, particularly if you only have partial genomic information.

Amanda
Absolutely. And when you trust your pedigree, you can also use it to identify possible mistakes in your genomic data.

Jane
Ooer, how can genomic data be mistaken? Perhaps the person collecting the data mixed up the samples, for example?

Amanda
Yes. Or even errors in the lab, contamination by pollen or semen, and so on. There are many possible sources of error.
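
One simple way to picture this cross-check is to compare the pedigree-based and genomic relationships pair by pair and flag the pairs that disagree strongly. The sketch below is only an illustration of that idea: the function name, the sample IDs, the numbers and the 0.25 threshold are all invented, and it assumes A and G cover the same individuals in the same order.

```python
def flag_discrepancies(A, G, ids, threshold=0.25):
    """List pairs whose genomic and pedigree relationships differ by more
    than `threshold` (an arbitrary cut-off chosen for this example)."""
    flagged = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if abs(G[i][j] - A[i][j]) > threshold:
                flagged.append((ids[i], ids[j], A[i][j], G[i][j]))
    return flagged


# Hypothetical data: the pedigree says P1 and P2 are full sibs and P3 is
# unrelated, but the markers suggest P2 and P3 are the close relatives.
ids = ["P1", "P2", "P3"]
A = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
G = [
    [1.02, 0.03, 0.01],
    [0.03, 0.98, 0.47],
    [0.01, 0.47, 1.00],
]
for first, second, a_value, g_value in flag_discrepancies(A, G, ids):
    print(f"{first}-{second}: pedigree {a_value}, genomic {g_value}")
```

A flagged pair does not say which source is wrong. It only tells you where to look, for example for a swapped sample or a recording error.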

Jane
So, the final H matrix, combining the A and G matrices is useful...how?

Amanda
To ensure good-quality genomic data, individuals need to be genotyped at high coverage. This is a very expensive process, and generally people cannot afford to do much of it. The H matrix is useful because it combines the information from the pedigree with genomic marker information, enabling you to use all the information on genetic similarity you have. For example, if you only have money to genotype 100 individuals, but you have pedigree (and phenotype) information on 1000 individuals, you can run your analysis on all 1000 individuals by using an H matrix to combine the genomic and pedigree information. We call it the H matrix because it’s a hybrid of the pedigree-based A matrix and the genomic-based G matrix.
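
For readers who like to see the mechanics, here is a small sketch of one widely used way to build such a hybrid matrix, taken from the single-step genomic evaluation literature rather than from this conversation. Writing 1 for non-genotyped and 2 for genotyped individuals, the genotyped block of H is G itself, the cross block is A12 A22⁻¹ G, and the non-genotyped block is A11 + A12 A22⁻¹ (G − A22) A22⁻¹ A21. The function names and toy numbers are invented, the helper routines are plain Python written for this example, and the sketch skips the scaling and blending of G that is usually done in practice.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def hybrid_relationship_matrix(A, G, genotyped):
    """Combine A and G into H.

    `genotyped` lists the row indices of A that G covers, in G's row order.
    """
    n = len(A)
    geno = list(genotyped)
    non = [i for i in range(n) if i not in geno]

    def block(rows, cols):
        return [[A[r][c] for c in cols] for r in rows]

    A11, A12, A22 = block(non, non), block(non, geno), block(geno, geno)
    B = matmul(A12, inverse(A22))            # A12 * inverse(A22)
    D = [[g - a for g, a in zip(g_row, a_row)] for g_row, a_row in zip(G, A22)]
    B_t = [list(col) for col in zip(*B)]     # inverse(A22) * A21, as A22 is symmetric
    correction = matmul(matmul(B, D), B_t)   # A12 A22^-1 (G - A22) A22^-1 A21
    cross = matmul(B, G)                     # A12 A22^-1 G

    H = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(non):
        for b, j in enumerate(non):
            H[i][j] = A11[a][b] + correction[a][b]
        for b, j in enumerate(geno):
            H[i][j] = H[j][i] = cross[a][b]
    for a, i in enumerate(geno):
        for b, j in enumerate(geno):
            H[i][j] = G[a][b]
    return H


# Hypothetical example: 0 and 1 are genotyped parents, 2 is their
# non-genotyped offspring. The pedigree says the parents are unrelated,
# but the markers suggest a relationship of 0.2.
A = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
G = [[1.0, 0.2], [0.2, 1.0]]
H = hybrid_relationship_matrix(A, G, genotyped=[0, 1])
for row in H:
    print([round(value, 3) for value in row])
```

In this toy case the genomic information about the parents flows through to the non-genotyped offspring: its relationship with each parent rises above the pedigree value of 0.5, and its diagonal rises above 1 because the parents now look related. If G happened to equal the pedigree block for the genotyped individuals, H would simply reproduce A.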

Jane
Great! That makes sense. Thank you so much.

Amanda
No problem.

A matrix: contains pedigree-based estimates of additive genetic relationships

G matrix: contains realized genomic relationships estimated from marker information

H matrix: contains combined estimates of the genetic relationships from the pedigree and genomic (i.e., marker) information

[1]  ASRgenomics is an R package that provides a series of molecular and genetic routines to assist in analytical pipelines for Genomic Selection and/or Genome-Wide Association Studies (GWAS). It can be used both before and after genomic analysis by ASReml-R (or another R library). The main routines included are used for:

  • Preparing and exploring the phenotypic and genetic data.
  • Generating the genomic-based matrix (G) (and its inverse).
  • Tuning up the genomic matrix and preparing it for downstream analyses.
  • Generating and exploring the hybrid genomic matrix (H).

About the authors

Amanda Avelar de Oliveira is an Agronomist with M.Sc. and Ph.D. degrees in Genetics and Plant Breeding from the University of São Paulo (ESALQ/USP). She has experience in quantitative genetics, genomic prediction, field trial analysis and genotyping pipelines. Currently, she works as a consultant at VSN International, UK.

“I believe in the power of knowledge sharing and multidisciplinary efforts to increase genetic gains in plant breeding while ensuring sustainability in agriculture”.

Jane Cohen is a technical author with a bachelor's degree in information technology and a graduate diploma in technical communication. She is a self-confessed science nerd with a keen interest in biology and statistics. When she's not honing her DIY skills, she can generally be found either with a book in hand or perambulating in the countryside.