From data to insights: improving plant breeding outcomes with JHI’s cutting-edge data visualisation tools

Iain Milne

20 September 2023

The James Hutton Institute (https://www.hutton.ac.uk/) is a research institute based in Scotland that focuses on a range of topics related to environmental science, agriculture, and food security. It may come as little surprise that the JHI informatics team working in molecular and quantitative genetics has made data visualisation a strong component of its software portfolio: the team is based in the same Scottish parish where William Playfair, a pioneer of data visualisation, was raised. In a career that encompassed working for James Watt, spying, and taking part in the storming of the Bastille, Playfair invented several types of diagram: in 1786 the line, area and bar charts of economic data, and in 1801 the pie chart and circle graph, used to show part-whole relations. His approach rested on the observation that most people find it easier to grasp the major features of data visually than from a table of figures.

JHI’s visualisation tools

JHI’s visualisation software is designed to help users explore and analyse data sets in a visual format. It includes a range of tools for visualising data such as single-nucleotide polymorphisms (SNPs) and pedigrees, with intuitive interfaces that make it easy to manipulate and interact with the data. One of its key features is flexibility: users can visualise data from a wide range of sources and customise visualisations to suit their specific needs. Interactive tools enable users to explore and manipulate data in real time. These include tools for filtering and selecting data, as well as for zooming in and out of visualisations to explore different levels of detail.

The JHI approach: guiding you every step of the way

The JHI Dundee team’s approach has been to develop a suite of visualisation tools that complement and enhance the processes involved at every stage, from data capture through analysis to conveying the results to a final audience. In an era of big data, visualisation is becoming increasingly important as a complement to robust experimental design and accurate statistical analysis. Its true value becomes apparent when we observe its impact throughout the various stages of the data capture and analysis lifecycle.

JHI visualisation tools in action

Plant breeding has undergone a significant transformation with the emergence of affordable high-throughput technologies for sequencing and genotyping. However, as these technologies continue to advance rapidly, a critical challenge lies in ensuring the generation of accurate data through effective analytic approaches for base calling and SNP calling. Visualisation of sequencing reads and genotyping data plays an important role in quality control. Visual inspection of representative data samples is of great value in identifying artefacts and systematic errors in both sequencing and SNP calling. This in turn brings computational challenges in efficient computer graphics and rapid access to large data files. The Tablet application enables rapid access to the binary index files resulting from sequence assembly and read mapping. Tablet allows fast scanning for read mapping and SNP calling anomalies, which plays a key role in the selection of analysis software and the optimisation of analysis parameters.

Flapjack: empowering plant breeders with efficient data analysis and visualisation

Flapjack was originally developed for “graphical genotyping” in the era when genotyping was based on a limited number of microsatellite markers. It can now cope with tens of thousands of SNPs across thousands of lines. This scalability is achieved through the optimisation of graphics and rapid file access. Flapjack has emerged as a versatile tool for a range of plant breeding applications at every stage in the data collection cycle. For instance:

  • Efficient ordering of SNP genotype data by microtiter plate to rapidly expose unusual patterns of SNP calling or high levels of missing data associated with a particular plate, either of which would compromise subsequent analysis
  • Sorting lines by SNP haplotypes at a local or global level 
  • Colouring SNP genotype data by similarity to a reference line 
  • Ordering SNP genotype data by trait scores.

Figure 1. Cereal SNP data visualised in Flapjack to highlight similarity (green) or differences (red) to a reference haplotype.
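
As a rough illustration of the kind of calculation that sits behind two of the views listed above (missing data per plate, and similarity to a reference line), here is a minimal Python sketch. It is not Flapjack's code or file format: the function names, the "-" symbol for a missing call, and the dictionary layout are all assumptions made for this example.

```python
from collections import defaultdict

MISSING = "-"  # assumed symbol for a missing SNP call (hypothetical)


def missing_rate_by_plate(genotypes, plate_of):
    """Return {plate: fraction of missing calls} over all lines on each plate.

    genotypes: {line name: list of SNP calls}
    plate_of:  {line name: microtiter plate identifier}
    """
    missing = defaultdict(int)
    total = defaultdict(int)
    for line, calls in genotypes.items():
        plate = plate_of[line]
        total[plate] += len(calls)
        missing[plate] += sum(1 for call in calls if call == MISSING)
    return {plate: missing[plate] / total[plate] for plate in total}


def similarity_to_reference(calls, reference):
    """Fraction of markers that match the reference line.

    Markers that are missing in either line are ignored.
    """
    compared = [(c, r) for c, r in zip(calls, reference)
                if MISSING not in (c, r)]
    if not compared:
        return 0.0
    return sum(c == r for c, r in compared) / len(compared)


def sort_by_similarity(genotypes, reference_line):
    """Order line names from most to least similar to the reference line."""
    reference = genotypes[reference_line]
    return sorted(
        genotypes,
        key=lambda line: similarity_to_reference(genotypes[line], reference),
        reverse=True,
    )
```

A plate whose missing rate stands well apart from the others is the sort of pattern the plate ordering is meant to expose, and the similarity score is one simple way to drive a green/red colouring like that in Figure 1.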

Helium: revealing pedigree connections and genetic insights

The SNP or trait data can also be viewed in a pedigree context using Helium, which enables the display of large and complex pedigrees. Colour palettes can be chosen to follow traits or haplotypes from the pre-breeding stage to the development of elite lines. Helium also highlights individuals that make multiple contributions to complex pedigrees, aiding in the identification of pedigree errors. This visualisation tool, along with others in the H3 portfolio, has been engineered to conform to BrAPI standards, allowing seamless integration into analysis pipelines.

Figure 2. Visualisation of a complex pedigree in Helium, coloured to show the ancestry of a SNP haplotype in a key barley variety and its transmission to a wide variety of progeny. 
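
To make the idea of "individuals that make multiple contributions" concrete, the following minimal Python sketch counts how many distinct lines of descent connect each ancestor to a chosen variety. It is an illustration only and does not reflect how Helium is implemented: the function names and the dictionary representation of a pedigree (child mapped to its parents) are assumptions for this example.

```python
def ancestor_path_counts(pedigree, target):
    """Count the distinct lines of descent from each ancestor to `target`.

    pedigree: {individual: tuple of parent names}; founders may be omitted.
    Raises ValueError if an individual appears in its own ancestry, which
    would indicate a pedigree recording error.
    """
    counts = {}

    def walk(individual, path):
        for parent in pedigree.get(individual, ()):
            if parent in path:
                raise ValueError(f"{parent} appears in its own ancestry")
            counts[parent] = counts.get(parent, 0) + 1
            walk(parent, path | {parent})

    walk(target, frozenset({target}))
    return counts


def repeated_contributors(pedigree, target):
    """Ancestors that reach `target` through more than one line of descent."""
    counts = ancestor_path_counts(pedigree, target)
    return {name: n for name, n in counts.items() if n > 1}
```

For a pedigree in which two parents of an elite line share a founder, that founder would be reported with a count of two. The walk revisits shared ancestors once per path, so it is suited to small examples rather than very large pedigrees.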

JHI’s cutting-edge data visualisation tools have been developed to further their commitment to sustainable resource management. Their software empowers users to explore and analyse data, streamlining all processes from data capture to analysis, making insights more accessible. Their tools, including Tablet, Flapjack, and Helium, not only optimise analysis but also contribute to H3's goal of fostering thriving communities and environmental security by connecting science, land and people. 

About the author

Iain Milne

Iain's speciality is developing high-level, intuitive, professional software solutions and (integrated) services with a strong visualisation component to enhance and accelerate sequencing, genotyping and analysis pipelines. Iain combines his knowledge of user requirements with flexible, scalable back-end components, with a focus on infrastructures suited to big data.