A genotype without a phenotype does not go very far!

Unlocking the genomic puzzle: the power of phenotype-genotype synergy

Dr. Salvador A. Gezan

22 June 2021

I recently read a very interesting opinion piece published in The Guardian. The author discusses the impact of the Human Genome Project (HGP) 20 years after the first draft of the human genome was published. Of course, this was a great accomplishment, and today whole genome sequencing can be done in less than one week and for a fraction of the original cost. Full genomes are now available for many more animal and plant species. These constitute great scientific and technological advancements, and one cannot help wondering what will be possible 20 years from now!

Going back to that article, the author makes a critical point about the HGP, which I quote below:

“The HGP has huge potential benefits for medicine and our understanding of human diversity and origins. But a blizzard of misleading rhetoric surrounded the project, contributing to the widespread and sometimes dangerous misunderstandings about genes that now bedevils the genomic age.”

Philip Ball

Misleading rhetoric: unraveling the hype

This project has been surrounded by a lot of media attention and, as with many scientific communications, one of the things that concerns me is the future benefits that were expected from this sequencing project. I am not going into detail about all the promises made (see the original article for more details), but what is alarming is that this genome was sold as a ‘book of instructions’ and ‘nature’s complete genetic blueprint for building a human being’.

“Misleading rhetoric has fuelled the belief that our genetic code is an ‘instruction book’ – but it’s far more interesting than that…”

Philip Ball

It is true that genomic information is critical for understanding things such as genetic diversity, propensity to disease and deleterious mutations. Moreover, genomic information is nowadays used to make genomic predictions, for example polygenic risk scores for humans and genomic breeding values for plants and animals. But the big fallacy is that a genotype is a ‘blueprint’. Here I disagree, as a genotype without a phenotype does not go very far!

Genotype alone falls short: the significance of phenotypic data

All achievements in genomics, current and future, come from a close connection between phenotypic data and genotypic data. For example, a genome-wide association study (GWAS) aimed at finding markers for the early detection of cancer relies on having thousands of individuals (or samples) identified at the different stages of the disease. Likewise, for vegetables, identifying SNP markers for increased supermarket shelf-life requires phenotypic data recorded as the produce ages.
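To make this dependence concrete, here is a minimal sketch of a single-marker association scan on simulated data. It is not a real GWAS pipeline: the function names, the number of individuals and markers, the causal marker position and its effect size are all hypothetical, and genotypes are drawn uniformly as 0, 1 or 2 copies of an allele purely for simplicity. The point is only that every marker is scored against the phenotype vector, so without phenotypes there is nothing to scan.

```python
import random


def simulate(n_individuals, n_markers, causal_index, effect, seed=1):
    """Simulate SNP genotypes (0/1/2) and a phenotype driven by one marker.

    All settings are hypothetical; uniform genotype sampling is a
    simplification, not a population-genetic model.
    """
    rng = random.Random(seed)
    genotypes = [
        [rng.choice((0, 1, 2)) for _ in range(n_markers)]
        for _ in range(n_individuals)
    ]
    # Phenotype = effect of the causal marker + random (non-genetic) noise.
    phenotypes = [effect * g[causal_index] + rng.gauss(0, 1) for g in genotypes]
    return genotypes, phenotypes


def squared_correlation(x, y):
    """R-squared of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def scan(genotypes, phenotypes):
    """Score each marker by how much phenotypic variation it explains."""
    n_markers = len(genotypes[0])
    return [
        squared_correlation([g[j] for g in genotypes], phenotypes)
        for j in range(n_markers)
    ]


genotypes, phenotypes = simulate(
    n_individuals=500, n_markers=20, causal_index=7, effect=1.0
)
scores = scan(genotypes, phenotypes)
best = max(range(len(scores)), key=scores.__getitem__)
print("Marker with the strongest association:", best)
```

In practice this kind of analysis would be done with dedicated software and models that account for population structure and relatedness; the sketch only illustrates the role the phenotype plays.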

In my view, the main fallacy in many over-promising genomic projects, including the HGP, is the belief that genomics is all you need, which reflects a lack of understanding of how critical phenotypic data is. I have even encountered, among breeding managers, the statement that ‘phenotypic data and field testing are no longer required if we have genomic data’! Genes alone are thousands of small pieces of information, and there are many complex aspects to consider, such as genes interacting with the environment and other high-order interactions at the gene level (such as dominance and epistasis). These can only be identified and understood with the use of complex computational tools paired with data on each genotype.

“Life is not a readout of genes – it’s a far more interesting, subtle and contingent process than that.”

Philip Ball

Genomic data will stay with us for a long time. It has become, and will continue to become, cheaper to obtain, and at some point it will be treated as a commodity. But good phenotypic records that evaluate hundreds, thousands or even millions of unique individuals are expensive and slow to obtain.

Phenotyping matters: enhancing precision and heritability

Two points are worth highlighting here. Firstly, increasing the precision of phenotypic data reduces the non-genetic (error) variance and therefore increases heritability; larger heritability values translate into better models, and so a higher chance of finding the true causal marker of a disease. Secondly, many breeding programs hold large quantities of historical data. For this data it is often easier today to invest in genotyping than in phenotyping, especially if DNA samples have been stored (as with semen or milt in animal breeding) or DNA is directly available from field trials (as with forest breeding programs). In these cases, the investment in phenotyping has already been made!
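The first point can be illustrated with a small worked example. The sketch below uses the standard expression for the heritability of a genotype mean, genetic variance divided by the sum of genetic variance and error variance over the number of replicates. The function name and all variance values are hypothetical, chosen only to show the direction of the effect.

```python
def heritability(var_genetic, var_error, n_reps=1):
    """Heritability of a genotype mean: Vg / (Vg + Ve / r).

    var_genetic: genetic variance (Vg)
    var_error:   non-genetic (error) variance of a single measurement (Ve)
    n_reps:      number of replicated measurements per genotype (r)
    """
    if var_genetic < 0 or var_error < 0 or n_reps < 1:
        raise ValueError("variances must be non-negative and n_reps >= 1")
    total = var_genetic + var_error / n_reps
    if total == 0:
        raise ValueError("total phenotypic variance is zero")
    return var_genetic / total


# Hypothetical variance components: the genetic variance stays fixed
# while more precise phenotyping shrinks the error variance.
VAR_GENETIC = 2.0
for var_error in (8.0, 4.0, 2.0):
    h2 = heritability(VAR_GENETIC, var_error)
    print(f"error variance {var_error:.1f} -> heritability {h2:.2f}")

# Replication has the same effect as a more precise single measurement.
for n_reps in (1, 2, 4):
    h2 = heritability(VAR_GENETIC, 8.0, n_reps)
    print(f"{n_reps} replicate(s) -> heritability {h2:.2f}")
```

Under these assumed values, halving the error variance or doubling the replication both raise heritability without any change in the underlying genetics, which is why careful experimental design pays off before any marker is analysed.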

Interestingly, the statistical tools that focus on phenotypic data are not as ‘sexy’ as the genotyping tools. Here we talk about boring aspects such as replication, blocking and randomization, and then regression analysis, linear (mixed) models, logistic regression, etc. But all these tools are well known and understood, and there is no excuse to ignore our statistical heritage.

Bridging the gap: statistical tools for genotype-phenotype analysis

Statistical tools such as ASReml-R or Genstat are critical for understanding the relationship between genotype and phenotype. With a project such as the HGP, only some doors were opened 20 years ago. As we collect more and more information, there will be many statistical (and computational) challenges, and we will need to develop new techniques to make all of those promises from the HGP possible. These must always be closely connected to good phenotypic data; otherwise, this will be a waste of our time!

About the author

Dr. Salvador Gezan is a statistician/quantitative geneticist with more than 20 years’ experience in breeding, statistical analysis and genetic improvement consulting. He currently works as a Statistical Consultant at VSN International, UK. Dr. Gezan started his career at Rothamsted Research as a biometrician, where he worked with Genstat and ASReml statistical software. Over the last 15 years he has taught ASReml workshops for companies and university researchers around the world. 

Dr. Gezan has worked in agronomy, aquaculture, forestry, entomology, medicine and biological modelling, and with many commercial breeding programs, applying traditional and molecular statistical tools. His research has led to more than 100 peer-reviewed publications, and he is one of the co-authors of the textbook Statistical Methods in Biology: Design and Analysis of Experiments and Regression.