Prof. Stephen Senn, 11 January 2022
The importance of deciding whether it is necessary to use nitrogen in manures needs no further comment. It was to settle definitely questions like this that John Bennet Lawes began his experiments at Rothamsted in Hertfordshire on the manuring of crops.
T B Wood (1913), The Story of a Loaf of Bread, p4.
Three great heads of statistics at Rothamsted made important contributions to the design and analysis of experiments. Ronald Aylmer Fisher (1890-1962) was at Rothamsted Experimental Station from 1919 to 1933, initially as the sole statistician and then as the head of statistics. When he left to become Galton Professor of Eugenics at University College London he was succeeded as head of statistics by Frank Yates (1902-1994), who had only arrived to work at Rothamsted two years earlier. Yates was to remain the head of statistics for 35 years. When he retired in 1968, his successor was John Nelder (1924-2010), who had previously worked briefly at Rothamsted but who was, at the time of his appointment, head of statistics at The National Vegetable Research Station in Wellesbourne. John Nelder remained head until his retirement in 1984.
All three made considerable contributions to many fields of statistics and Fisher also to genetics and evolutionary biology. Yates worked on sampling theory and computing, Nelder on computing and modelling and Fisher on just about everything. A common interest of all three, however, was the design and analysis of experiments. Together they created what I like to think of as The Rothamsted School of design and analysis of experiments. Of course, they were not the only statisticians who did this. Many others who worked at Rothamsted made important contributions, as did others elsewhere. Nevertheless, the work of these three was crucial and the theory they created has been extremely influential, although not, as I shall explain in due course, as influential as it deserves to be.
An important context for the development of this theory was that agriculture was the field (a word that always causes one to pause, given the subject) of application. Agricultural scientists were ingenious and ambitious in constructing complex experiments. Typically a field would be subdivided into plots, to which treatments would be applied, but it could be that other treatments were applied at a lower subplot level. Due to spatial autocorrelation in fertility, variation between plots would generally be higher than variation between subplots. Thus, care had to be taken in judging whether the effects of the treatments applied differed from each other by more than could be expected by chance. Discovering exactly how this should be done is something that took half a century, and all three heads made important contributions.
In 1910 Thomas Barlow Wood (1869-1929), an agricultural scientist, and Frederick John Marrian Stratton (1881-1960), an astronomer, collaborated to write a paper that described how the accuracy of results from an agricultural experiment could be estimated. They showed how a technique that astronomers had long been using to assess the reliability of a mean of a number of differing observations could be applied to agricultural yields also. (They took an example of the percentage of dry matter in 160 roots of a variety of Golden Globe mangold.) They also showed how two treatments could be compared.
The theory of errors was well established amongst astronomers. George Biddell Airy (1801-1892) had written a monograph on the subject in 1862 that had become a standard work of reference. From a modern perspective, it is slightly surprising that Wood and Stratton felt it necessary to explain a common technique amongst astronomers to agronomists but as they put it:
It might seem at first that no two branches of study could be more widely separated than Agriculture and Astronomy. A moment's consideration, however, will show that they have one point in common: both are at the mercy of the weather. (p425)
Furthermore, only two years earlier, Student (William Sealy Gosset, 1876-1937), whose work involved regular contact with problems of agricultural experiments, had published his later-to-become-famous paper The Probable Error of a Mean. This was in many ways in advance of that of Wood and Stratton. Presumably, they were unaware of Student’s work but we now know that Student himself had been anticipated in 1876 by Jakob Lüroth (1844-1910), a German mathematician who was originally an astronomer, so the story of astronomers and agronomists advancing the theory of errors by stumbling past each other has some history.
An interesting connection (it has perhaps a causal explanation and cannot be marked down definitively as a coincidence) is that both Wood and Stratton had connections to Caius College Cambridge, as did Fisher.
I am going to pick up the story with the second of these statisticians. Frank Yates studied mathematics at Cambridge, graduating in 1924, and after a brief period teaching at Malvern College worked from 1927 to 1931 as a surveyor in what is now Ghana. This either honed or provided an outlet for a talent for efficient computation and developing effective algorithms. Surveying required a lot of calculation using least squares and as David Finney put it:
Gaussian least squares was not a topic then taught to undergraduate mathematicians; the need for regularly using this technique undoubtedly developed in him the concern for efficient, well-organised, and accurate computation that characterised his later career. (p2)
Interestingly, Yates never saw the need for matrix algebra and generations of statisticians working at Rothamsted subsequently had to hide their interest in matrices from the head of statistics!
On his arrival at Rothamsted, Yates started collaborating with Fisher, developing, in particular, the work on the design and analysis of experiments; he achieved much rapidly. A good example is given by his Royal Statistical Society (RSS) read paper of 1935, ‘Complex Experiments’. This presents a dazzling array of ideas with much of what has become standard theory to support them, but is also grounded in application. Many of the ideas come directly from Fisher, some indirectly, but there are also many felicitous and ingenious touches that are clearly due to Yates. In it he covers complex treatment structures, in particular for factorial designs, but also how to deal with different sources of variation in the experimental material, including their influence on efficient estimation and appropriate error estimation, for example for incomplete block designs, a topic he was to develop more fully the following year.
As was usual for a read paper, a number of commentaries were also published. Neyman pointed out that interactive effects in factorial experiments would be estimated with low precision. Yates changed his definition in the published version of the paper from the version read to the RSS and in reply to Neyman added the remark:
Since the meeting, I have altered my definition of an interaction by the inclusion of a factor 1/2, for reasons stated in the text. (p247)
This had the effect of reducing the standard error. However, this response was not quite fair. I once discussed this with Michael Healy, a statistician who also worked at Rothamsted, and he agreed with me that however useful this modification might be algorithmically, it was not an answer to Neyman’s criticism.
In his published comment on Yates’s read paper, Fisher drew attention to two aspects of any experiment (in Genstat we now call these the block structure and the treatment structure). As an example of the first kind, he gave a field with plots arranged in five rows and five columns, with each of the 25 plots subdivided into two, giving 50 units and thus 49 degrees of freedom in total. As an example of the second kind, he considered studying two factors, one with five levels and one with two, with each combination studied with five replications, making 5 x 2 x 5 = 50 applications and again 49 degrees of freedom. He then stated:
The choice of the experimental design might be regarded as the choice of which items in the first analysis were to correspond to any chosen items in the second, and this could be represented by a two-way analysis of the 49 elements.
In other words, it was the way that the treatment structure mapped onto the block structure that guided the way that the experiment was to be analysed and, of course, the anticipated analysis would guide the way the experiment should be designed.
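Fisher's two ways of accounting for the 49 degrees of freedom can be checked with a little arithmetic. The following Python sketch is an illustration only and is not Genstat output: the term labels (rows, columns, halves, A, B) are names chosen here, and the split within each structure assumes the usual rules for crossed and nested factors.

```python
# Block structure: 5 rows crossed with 5 columns, each plot split into 2 halves.
rows, cols, halves = 5, 5, 2
block_df = {
    "rows": rows - 1,
    "columns": cols - 1,
    "rows.columns": (rows - 1) * (cols - 1),
    "halves within plots": rows * cols * (halves - 1),
}

# Treatment structure: factor A (5 levels) crossed with factor B (2 levels),
# each of the 10 combinations replicated 5 times.
a_levels, b_levels, reps = 5, 2, 5
treatment_df = {
    "A": a_levels - 1,
    "B": b_levels - 1,
    "A.B": (a_levels - 1) * (b_levels - 1),
    "replication within combinations": a_levels * b_levels * (reps - 1),
}

total_units = rows * cols * halves  # 50 experimental units

for title, table in (("Block structure", block_df),
                     ("Treatment structure", treatment_df)):
    print(title)
    for term, df in table.items():
        print(f"  {term:<32}{df:>3}")
    print(f"  {'total':<32}{sum(table.values()):>3}")
```

Both tables sum to 49, one fewer than the 50 units. Choosing a design amounts to deciding which rows of the second table are estimated within which rows of the first.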
An example of a modern application of Fisher’s insight is shown in the following image, which gives the Genstat code I used to carry out analyses of variance for three possible treatment models, defined by TREATMENTSTRUCTURE commands, on a cross-over design for which the basic experimental units were defined by the BLOCKSTRUCTURE command.
Use of the ANOVA command without mentioning an outcome variable gives me a so-called dummy analysis, showing how the degrees of freedom should be apportioned but not, of course, giving me a full analysis since no outcome data are used. The example is described in a blog of mine: https://www.linkedin.com/pulse/designed-inferences-stephen-senn/.
Well before Yates’s arrival at Rothamsted, Fisher had realised that these distinctions between block and treatment structure were crucial and that in particular careful attention had to be paid to the former when calculating errors. He had, however, learned by making mistakes. Two years after Fisher’s death, in reviewing his contributions to experimental design, in commenting on an early example dating from 1923 of Fisher analysing a complex experiment, Yates, having first criticised the design, wrote:
To obtain a reasonable estimate of error for these interactions, however, the fact that the varietal plots were split for the potash treatments should have been taken into account. This was not done in the original analysis, a single pooled estimate being used...
The need for the partition of error into whole-plot and sub-plot components was recognised by 1925. Part of the data of the above experiment was re-analysed in Statistical Methods for Research Workers in the now conventional form. (pp311-312)
Fisher had taught himself fast. 
In fact, by the appearance of his classic text Statistical Methods for Research Workers, Fisher had developed analysis of variance (indeed, the term variance is due to him), the principles of blocking and replication, and his most controversial innovation, randomisation. An important point about this is still regularly misunderstood. As Fisher put it:
In a well-planned experiment, certain restrictions may be imposed upon the random arrangement of the plots in such a way that the experimental error may still be accurately estimated, while the greater part of the influence of heterogeneity may be eliminated. (p232)
Thus, randomisation was not an alternative to balancing known influences but an adjunct to it.
As Yates put it in summing up what Fisher had achieved:
Apart from factorial design, therefore, all the principles of sound experimental design and analysis were established by 1925. (p312)
One day John Nelder was analysing a complex experiment. He was doing so in the tradition of Fisher and Yates. This is what he subsequently had to say about it:
During my first employment at Rothamsted, I was given the job of analyzing some relatively complex structured experiments on trace elements. There were crossed and nested classifications with confounding and all the rest of it, and I could produce analyses of variance for these designs. I then began to wonder how I knew what the proper analyses were and I thought that there must be some general principles that would allow one to deduce the form of the analysis from the structure of the design. The idea went underground for about 10 years. I finally resurrected it and constructed the theory of generally balanced designs, which took in virtually all the work of Fisher and Yates and Finney and put them into a single framework so that any design could be described in terms of two formulas. The first was for the block structure, which was the structure of the experimental units before you inserted the treatments. The second was the treatment structure—the treatments that were put on these units. The specification was completed by the data matrix showing which treatments went on to which unit. (p125)
I have quoted this at length because it leaves me little else to say. John was able to unify the developments of Fisher and Yates and others (David Finney is mentioned) so that a wide range of experimental designs could be analysed using a single general approach. The results were published in two papers in the Proceedings of the Royal Society in 1965, one of which did, indeed, cover block structure and the other treatment structure.
No. Not at all. What Nelder established was that a general algorithm could be used and that hence a computer package could be written to implement it. After his arrival as head of statistics at Rothamsted, he was able to direct the development of Genstat, the software that was designed to implement his theory. However, many others worked on this, particularly notable being the contributions of Roger Payne, who continues to develop it to this day. An irony is that whereas one of John Nelder’s other seminal contributions to statistics, Generalised Linear Models, has been taken up by every major statistical package, Genstat is, as far as I am aware, the only one to have implemented the Rothamsted School approach to analysing designed experiments. Thus, when Genstat users proceed to analyse such an experiment by first declaring a BLOCKSTRUCTURE and then a TREATMENTSTRUCTURE before requesting an ANOVA, they are using software that is still ahead of its time but based on a theory with a century of tradition.
Professor Stephen Senn has worked as a statistician but also as an academic in various positions in Switzerland, Scotland, England and Luxembourg. From 2011 to 2018 he was head of the Competence Center for Methodology and Statistics at the Luxembourg Institute of Health. He is the author of Cross-over Trials in Clinical Research (1993, 2002), Statistical Issues in Drug Development (1997, 2007, 2021), and Dicing with Death (2003). In 2009 he was awarded the Bradford Hill Medal of the Royal Statistical Society. In 2017 he gave the Fisher Memorial Lecture. He is an honorary life member of PSI and ISCB.
Stephen Senn: Blogs and Web Papers http://www.senns.uk/Blogs.html
1. Wood TB, Stratton F. The interpretation of experimental results. The Journal of Agricultural Science. 1910;3(4):417-440.
2. Airy GB. On the Algebraical and Numerical Theory of Errors of Observations and the Combination of Observations. MacMillan and Co; 1862.
3. Student. The probable error of a mean. Biometrika. 1908;6:1-25.
4. Pfanzagl J, Sheynin O. Studies in the history of probability and statistics. 44. A forerunner of the t-distribution. Biometrika. Dec 1996;83(4):891-898.
5. Dyke G. Obituary: Frank Yates. Journal of the Royal Statistical Society Series A (Statistics in Society). 1995;158(2):333-338.
6. Finney DJ. Remember a pioneer: Frank Yates (1902‐1994). Teaching Statistics. 1998;20(1):2-5.
7. Yates F. Complex Experiments (with discussion). Supplement to the Journal of the Royal Statistical Society. 1935;2(2):181-247.
8. Yates F. Incomplete randomized blocks. Annals of Eugenics. Sep 1936;7:121-140.
9. Yates F. Sir Ronald Fisher and the design of experiments. Biometrics. 1964;20(2):307-321.
10. Fisher RA. Statistical Methods for Research Workers. Oliver and Boyd; 1925.
11. Fisher RA. Statistical Methods for Research Workers. In: Bennett JH, ed. Statistical Methods, Experimental Design and Scientific Inference. Oxford University; 1925.
12. Senn SJ. A conversation with John Nelder. Research Paper. Statistical Science. 2003;18(1):118-131.
13. Nelder JA. The analysis of randomised experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:147-162.
14. Nelder JA. The analysis of randomised experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:163-178.
15. Senn S. John Ashworth Nelder. 8 October 1924—7 August 2010. The Royal Society Publishing; 2019.
The VSNi Team, 27 April 2021
Evolution of statistical computing
It is widely acknowledged that the most fundamental developments in statistics in the past 60+ years have been driven by information technology (IT). We should not underestimate the importance of pen and paper as a form of IT, but it is since people started using computers for statistical analysis that the role statistics plays in our research, as well as in normal life, has really changed.
In this blog we will give a brief historical overview, presenting some of the main general statistics software packages developed from 1957 onwards. Statistical software developed for special purposes will be ignored. We also ignore the most widely used ‘software for statistics’, as identified by Brian Ripley (2002) in his famous quote: “Let’s not kid ourselves: the most widely used piece of software for statistics is Excel.” Our focus is some of the packages developed by statisticians for statisticians, which are still evolving to incorporate the latest developments in statistics.
Pioneer statisticians like Ronald Fisher started out doing their statistics on pieces of paper and later upgraded to using calculating machines. Fisher bought the first Millionaire calculating machine when he was heading Rothamsted Research’s statistics department in the early 1920s. It cost about £200 at that time, which is equivalent in purchasing power to about £9,141 in 2020. This mechanical calculator could only calculate direct products, but it was very helpful for the statisticians at that time, as Fisher mentioned: "Most of my statistics has been learned on the machine." The calculator was heavily used by Fisher’s successor Frank Yates (Head of Department 1933-1968) and contributed to much of Yates’ research, such as designs with confounding between treatment interactions and blocks, or split plots, or quasi-factorials.
Rothamsted Annual Report for 1952: "The analytical work has again involved a very considerable computing effort."
From the early 1950s we entered the computer age. The computer at this time looked little like its modern counterpart, whether it was an Elliott 401 from the UK or an IBM 700/7000 series in the US. Although the first documented statistical package, BMDP, was developed starting in 1957 for IBM mainframes at the UCLA Health Computing Facility, on the other side of the Atlantic Ocean statisticians at Rothamsted Research had already begun programming on an Elliott 401 in 1954.
When we teach statistics in schools or universities, students very often complain about the difficulties of programming. Looking back at programming in the 1950s will give modern students an appreciation of how easy programming today actually is!
An Elliott 401 served one user at a time and required all input on paper tape (forget your keyboard and intelligent IDE editor). It sent its output to an electric typewriter. All programming had to be in machine code, with the instructions and data held on a rotating disk with a 32-bit word length: 5 "words" of fast-access store, 7 intermediate access tracks of 128 words, and 16 further tracks selectable one at a time (= 2949 words – 128 for system).
Computer paper tape
The programs written at this time covered fitting constants to main effects and interactions in multi-way tables (1957), regression and multiple regression (1956), fitting many standard curves as well as multivariate analysis for latent roots and vectors (1955).
Although the emergence of statistical programs for research sounds very promising, routine statistical analyses were also performed and these still represented a big challenge, at least computationally. For example, in 1963, the last year with the Elliott 401 and Elliott 402 computers, Rothamsted Research statisticians analysed 14,357 data variables, which took 4,731 hours, roughly 20 minutes per variable. It is hard to imagine the energy consumption as well as the amount of paper tape used for programming. Probably the paper tape (all glued together) would be long enough to circle the equator.
The above collection of programs was mainly used for agricultural research at Rothamsted and was not given an umbrella name until John Nelder became Head of the Statistics Department in 1968. The development of Genstat (General Statistics) started from that year and the programming was done in FORTRAN, initially on an IBM machine. In that same year, at North Carolina State University, computational statisticians were developing SAS (Statistical Analysis System), also for analysing agricultural data to improve crop yields. At around the same time, social scientists at the University of Chicago started to develop SPSS (Statistical Package for the Social Sciences). Although the three packages (Genstat, SAS and SPSS) were developed for different purposes and their functions diverged somewhat later, the basic functions covered similar statistical methodologies.
The first version of SPSS was released in 1968. In 1970, the first version of Genstat was released with the functions of ANOVA, regression, principal components and principal coordinate analysis, single-linkage cluster analysis and general calculations on vectors, matrices and tables. The first version of SAS, SAS 71, was named after the year of its release. The early versions of all three software packages were written in FORTRAN and designed for mainframe computers.
From the 1980s, with the breakthrough of personal computers, a second generation of statistical software began to emerge. An MS-DOS version of Genstat (Genstat 4.03) with an interactive command line interface was released in 1980.
Genstat 4.03 for MSDOS
Around 1985, SAS and SPSS also released a version for personal computers. In the 1980s more players entered this market: STATA was developed from 1985 and JMP was developed from 1989. JMP was, from the very beginning, for Macintosh computers. As a consequence, JMP had a strong focus on visualization as well as graphics from its inception.
The development of the third generation of statistical computing systems had started before the emergence of software like Genstat 4.03e or SAS 6.01. This development was led by John Chambers and his group at Bell Laboratories from the 1970s. The outcome of their work is the S language, which was developed into a general purpose language with implementations for classical as well as modern statistical inference. The S language was freely available, and its audience was mainly sophisticated academic users. After the acquisition of the S language by the Insightful Corporation and its rebranding as S-PLUS, this leading third generation statistical software package was widely used in both theoretical and practical statistics in the 1990s, especially before the release of a stable beta version of the free and open-source software R in the year 2000. R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently widely used by statisticians in academia and industry, together with statistical software developers, data miners and data analysts.
Software like Genstat, SAS, SPSS and many other packages had to deal with the challenge from R. Each of these long-standing software packages developed an R interface or even R interpreters in anticipation of changing user behaviour and the ever-increasing adoption of the R computing environment. For example, SAS and SPSS have some R plug-ins to talk to each other. VSNi’s ASReml-R software was developed for ASReml users who want to run mixed model analysis within the R environment, and at the present time there are more ASReml-R users than ASReml standalone users. Users who need reliable and robust mixed effects model fitting adopted ASReml-R as an alternative to other mixed model R packages due to its superior performance and simplified syntax. For Genstat users, msanova was also developed as an R package to provide traditional ANOVA users with an R interface to run their analysis.
We have no clear idea about what will represent the fourth generation of statistical software. R, as open-source software and a platform for prototyping and teaching, has the potential to help this change in statistical innovation. An example is the R Shiny package, where web applications can be easily developed to provide statistical computing as online services. But all open-source and commercial software has to face the same challenges of providing fast, reliable and robust statistical analyses that allow for reproducibility of research and, most importantly, use sound and correct statistical inference and theory, something that Ronald Fisher would have expected from his computing machine!
Dr. Vanessa Cave, 10 May 2022
The essential role of statistical thinking in animal ethics: dealing with reduction
Having spent over 15 years working as an applied statistician in the biosciences, I’ve come across my fair share of animal studies. And one of my greatest bugbears is that the full value is rarely extracted from the experimental data collected. This could be because the best statistical approaches haven’t been employed to analyse the data, the findings are selectively or incorrectly reported, other research programmes that could benefit from the data don’t have access to them, or the data aren’t re-analysed following the advent of new statistical methods or tools that have the potential to draw greater insights from them.
An enormous number of scientific research studies involve animals, and with this come many ethical issues and concerns. To help ensure high standards of animal welfare in scientific research, many governments, universities, R&D companies, and individual scientists have adopted the principles of the 3Rs: Replacement, Reduction and Refinement. Indeed, in many countries the tenets of the 3Rs are enshrined in legislation and regulations around the use of animals in scientific research.
|Replacement|Use methods or technologies that replace or avoid the use of animals.|
|Reduction|Limit the number of animals used.|
|Refinement|Refine methods in order to minimise or eliminate negative animal welfare impacts.|
In this blog, I’ll focus on the second principle, Reduction, and argue that statistical expertise is absolutely crucial for achieving reduction.
The aim of reduction is to minimise the number of animals used in scientific research whilst balancing against any additional adverse animal welfare impacts and without compromising the scientific value of the research. This principle demands that before carrying out an experiment (or survey) involving animals, the researchers must consider and implement approaches that both minimise their current animal use and minimise future animal use.
Both these considerations involve statistical thinking. Let’s begin by exploring the important role statistics plays in minimising current animal use.
Reduction requires that any experiment (or survey) carried out must use as few animals as possible. However, with too few animals the study will lack the statistical power to draw meaningful conclusions, ultimately wasting animals. But how do we determine how many animals are needed for a sufficiently powered experiment? The necessary starting point is to establish clearly defined, specific research questions. These can then be formulated into appropriate statistical hypotheses, for which an experiment (or survey) can be designed.
Statistical expertise in experimental design plays a pivotal role in ensuring enough of the right type of data are collected to answer the research questions as objectively and as efficiently as possible. For example, sophisticated experimental designs involving blocking can be used to reduce random variation, making the experiment more efficient (i.e., increasing the statistical power with fewer animals) as well as guarding against bias. Once a suitable experimental design has been decided upon, a power analysis can be used to calculate the required number of animals (i.e., determine the sample size). Indeed, a power analysis is typically needed to obtain animal ethics approval, a formal process in which the benefits of the proposed research are weighed up against the likely harm to the animals.
Researchers also need to investigate whether pre-existing sources of information or data could be integrated into their study, for example by means of a meta-analysis, enabling them to reduce the number of animals required. At the extreme end, data relevant to the research questions may already be available, eliminating the need for an experiment altogether!
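To make the link between variation, power and animal numbers concrete, here is a minimal sketch of a sample size calculation for comparing two treatment means. It assumes a two-sided test and uses the normal approximation (an exact t-based calculation would give slightly larger numbers); the function name and all the input values are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison
    of means, using the normal approximation.

    delta: smallest difference between means worth detecting
    sigma: standard deviation between animals treated alike
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: the same target difference, with and without a
# blocking scheme that is assumed to cut the residual SD from 1.0 to 0.7.
unblocked = animals_per_group(delta=1.0, sigma=1.0)
blocked = animals_per_group(delta=1.0, sigma=0.7)
print(unblocked, blocked)  # 16 8
```

Under these assumed inputs, a 30% reduction in the residual standard deviation halves the number of animals needed per group, which is why blocking matters so much for reduction.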
An obvious mechanism for minimising future animal use is to ensure we do it right the first time, avoiding the need for additional experiments. This is easier said than done; there are many statistical and practical considerations at work here. The following paragraphs cover four important steps in experimental research in which statistical expertise plays a major role: data acquisition, data management, data analysis and inference.
Above, I alluded to the validity of the experimental design. If the design is flawed, the data collected will be compromised, if not essentially worthless. Two common mistakes to avoid are pseudo-replication and the lack of (or poor) randomisation. Replication and randomisation are two of the basic principles of good experimental design. Mistaking pseudo-replication (either at the design or analysis stage) for genuine replication will lead to invalid statistical inferences. Randomisation is necessary to ensure the statistical inference is valid and to guard against bias.
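As a minimal sketch of what proper randomisation can look like in a blocked design, the following allocates each treatment once, in random order, within every block. The blocks (litters), animal identifiers, treatment names and the function itself are all hypothetical; a real study should follow its own documented randomisation procedure.

```python
import random


def randomise_within_blocks(blocks, treatments, seed=None):
    """Randomised complete block allocation: every treatment appears exactly
    once in each block, in an independently shuffled order."""
    rng = random.Random(seed)  # a recorded seed keeps the allocation reproducible
    allocation = {}
    for block, animals in blocks.items():
        if len(animals) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} animals")
        order = list(treatments)
        rng.shuffle(order)  # fresh random order for each block
        for animal, treatment in zip(animals, order):
            allocation[animal] = (block, treatment)
    return allocation


# Hypothetical example: three litters of three animals, three treatments.
blocks = {
    "litter1": ["a1", "a2", "a3"],
    "litter2": ["b1", "b2", "b3"],
    "litter3": ["c1", "c2", "c3"],
}
treatments = ["control", "low dose", "high dose"]
allocation = randomise_within_blocks(blocks, treatments, seed=2022)
for animal, (block, treatment) in allocation.items():
    print(animal, block, treatment)
```

Here the animal is the experimental unit, so the genuine replication is the three litters per treatment. Repeated measurements taken on the same animal would not add replicates, and treating them as if they did is pseudo-replication.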
Another extremely important consideration when designing an experiment, and setting the sample size, is the risk and impact of missing data due, for example, to animal drop-out or equipment failure. Missing data result in a loss of statistical power, complicate the statistical analysis, and have the potential to cause substantial bias (and potentially invalidate any conclusions). Careful planning and management of an experiment will help minimise the amount of missing data. In addition, safeguards, controls or contingencies could be built into the experimental design to help mitigate the impact of missing data. If missing data do result, appropriate statistical methods to account for them must be applied. Failure to do so could invalidate the entire study.
It is also important that the right data are collected to answer the research questions of interest. That is, the right response and explanatory variables measured at the appropriate scale and frequency. There are many statistics-related questions the researchers must answer, including: what population do they want to make inference about? how generalisable do they need their findings to be? what controllable and uncontrollable variables are there? Answers to these questions affect not only the enrolment of animals into the study, but also the conditions they are subjected to and the data that should be collected.
It is essential that the data from the experiment (including meta-data) are appropriately managed and stored to protect their integrity and ensure their usability. If the data get messed up (e.g., if different variables measured on the same animal cannot be linked), are undecipherable (e.g., if the attributes of the variables are unknown) or are incomplete (e.g., if the observations aren’t linked to the structural variables associated with the experimental design), they are likely worthless. Statisticians can offer invaluable expertise in good data management practices, helping to ensure the data are accurately recorded, the downstream results from analysing the data are reproducible and the data themselves are reusable at a later date, possibly by a different group of researchers.
Unsurprisingly, it is also vitally important that the data are analysed correctly, using the methods that draw the most value from them. As expected, statistical expertise plays a huge role here! The results and inference are meaningful only if appropriate statistical methods are used. Moreover, often there is a choice of valid statistical approaches; however, some approaches will be more powerful or more precise than others.
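One simple planning adjustment for the loss of power is to enrol enough animals that the required number is still expected to complete the study. The sketch below assumes a drop-out rate that is known in advance and unrelated to treatment; the function name and the numbers are hypothetical, and the adjustment does nothing about the bias that missing data can cause.

```python
from math import ceil


def animals_to_enrol(required_completers, expected_dropout):
    """Inflate a sample size so that, on average, enough animals complete.

    expected_dropout is the assumed proportion of animals lost (0 <= p < 1).
    """
    if not 0 <= expected_dropout < 1:
        raise ValueError("expected_dropout must be in [0, 1)")
    return ceil(required_completers / (1 - expected_dropout))


# Hypothetical example: 16 completers needed per group, 10% expected drop-out.
enrol = animals_to_enrol(16, 0.10)
print(enrol)  # 18
```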
Having analysed the data, it is important that the inference (or conclusions) drawn are sound. Again, statistical thinking is crucial here. For example, in my experience, one all too common mistake in animal studies is to accept the null hypothesis and erroneously claim that a non-significant result means there is no difference (say, between treatment means).
The other important mechanism for minimising future animal use is to share the knowledge and information gleaned. The most basic step here is to ensure that all the results are correctly and non-selectively reported. Reporting all aspects of the trial, including the experimental design and statistical analysis, accurately and completely is crucial for the wider interpretation of the findings, reproducibility and repeatability of the research, and for scientific scrutiny. In addition, all results, including null results, are valuable and should be shared.
Sharing the data (or resources, e.g., animal tissues) also contributes to reduction. The data may be able to be re-used for a different purpose, integrated with other sources of data to provide new insights, or re-analysed in the future using a more advanced statistical technique, or for a different hypothesis.
Another avenue that should be explored is whether additional data or information can be obtained from the experiment, without incurring any further adverse animal welfare impacts, that could benefit other researchers and/or future studies, for example by helping to address a different research question now or in the future. At the outset of the study, researchers must consider whether their proposed study could be combined with another one, whether the research animals could be shared with another experiment (e.g., animals euthanized for one experiment may provide suitable tissue for use in another), what additional data could be collected that may be (or are!) of future use, etc.
Statistical thinking clearly plays a fundamental role in reducing the number of animals used in scientific research, and in ensuring the most value is drawn from the resulting data. I strongly believe that, to achieve reduction, statistical expertise must be fully utilised throughout the duration of the project, from design through to analysis and dissemination of results, in all research projects involving animals. In my experience, most researchers strive for very high standards of animal ethics, and absolutely do not want to cause unnecessary harm to animals. Unfortunately, the role statistical expertise plays here is not always appreciated or taken advantage of. So next time you’re thinking of undertaking research involving animals, ensure you have expert statistical input!
Dr. Vanessa Cave is an applied statistician interested in the application of statistics to the biosciences, in particular agriculture and ecology, and is a developer of the Genstat statistical software package. She has over 15 years of experience collaborating with scientists, using statistics to solve real-world problems. Vanessa provides expertise on experiment and survey design, data collection and management, statistical analysis, and the interpretation of statistical findings. Her interests include statistical consultancy, mixed models, multivariate methods, statistical ecology, statistical graphics and data visualisation, and the statistical challenges related to digital agriculture.
Vanessa is currently President of the Australasian Region of the International Biometric Society, past-President of the New Zealand Statistical Association, an Associate Editor for the Agronomy Journal, on the Editorial Board of The New Zealand Veterinary Journal and an honorary academic at the University of Auckland. She has a PhD in statistics from the University of St Andrews.
Kanchana Punyawaew and Dr. Vanessa Cave 01 March 2021
Mixed models for repeated measures and longitudinal data
The term "repeated measures" refers to experimental designs or observational studies in which each experimental unit (or subject) is measured repeatedly over time or space. "Longitudinal data" is a special case of repeated measures in which variables are measured over time (often for a comparatively long period of time) and duration itself is typically a variable of interest.
In terms of data analysis, it doesn’t really matter what type of data you have, as you can analyze both using mixed models. Remember, the key feature of both types of data is that the response variable is measured more than once on each experimental unit, and these repeated measurements are likely to be correlated.
To illustrate the use of mixed model approaches for analyzing repeated measures, we’ll examine a data set from Landau and Everitt’s 2004 book, “A Handbook of Statistical Analyses using SPSS”. Here, a double-blind, placebo-controlled clinical trial was conducted to determine whether an estrogen treatment reduces post-natal depression. Sixty-three subjects were randomly assigned to one of two treatment groups: placebo (27 subjects) and estrogen treatment (36 subjects). Depression scores were measured on each subject at baseline, i.e. before randomization (predep), and at six two-monthly visits after randomization (postdep at visits 1-6). However, not all the women in the trial had their depression score recorded at every scheduled visit.
In this example, the data were measured at fixed, equally spaced, time points. (Visit is time as a factor and nVisit is time as a continuous variable.) There is one between-subject factor (Group, i.e. the treatment group, either placebo or estrogen treatment), one within-subject factor (Visit or nVisit) and a covariate (predep).
Using the following plots, we can explore the data. In the first plot below, the depression scores for each subject are plotted against time, including the baseline, separately for each treatment group.
In the second plot, the mean depression score for each treatment group is plotted over time. From these plots, we can see that there is variation among subjects within each treatment group, that depression scores generally decrease with time, and that, on average, the depression score at each visit is lower with the estrogen treatment than with the placebo.
The simplest approach for analyzing repeated measures data is to use a random effects model with subject fitted as random. It assumes a constant correlation between all observations on the same subject. The analysis objectives can either be to measure the average treatment effect over time or to assess treatment effects at each time point and to test whether treatment interacts with time.
In this example, the treatment (Group), time (Visit), treatment by time interaction (Group:Visit) and baseline (predep) effects can all be fitted as fixed. The subject effects are fitted as random, allowing for constant correlation between depression scores taken on the same subject over time.
The code and output from fitting this model in ASReml-R 4 follow:
The output from summary() shows that the estimates of the subject and residual variances from the model are 15.10 and 11.53, respectively, giving a total variance of 15.10 + 11.53 = 26.63. The Wald tests (from the wald.asreml() table) for predep, Group and Visit are significant (probability level (Pr) ≤ 0.01). There is no evidence of an interaction between treatment group and time (Group:Visit), as the probability level is greater than 0.05 (Pr = 0.8636).
In practice, the correlation between observations on the same subject is often not constant. It is common to expect measurements made closer together in time to be more highly correlated than those made further apart. Mixed models can accommodate many different covariance patterns. Ideally, we select the pattern that best reflects the true covariance structure of the data. A typical strategy is to start with a simple pattern, such as compound symmetry or first-order autoregressive, and test whether a more complex pattern leads to a significant improvement in the likelihood.
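Comparing two nested covariance patterns in this way amounts to a likelihood ratio test. The snippet below is a minimal, library-free sketch in Python rather than ASReml-R output: the function names are our own, and the two REML log-likelihoods are made-up placeholders, not results from the depression data. It assumes both models have the same fixed effects (needed when comparing REML likelihoods) and that the simpler pattern is nested within the more complex one, as a homogeneous first-order autoregressive structure is within its heterogeneous counterpart.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite sum
    # of x^(r - 1/2) / (1 * 3 * ... * (2r - 1)).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
    """Return the LRT statistic and its chi-square p-value."""
    statistic = 2 * (loglik_complex - loglik_simple)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical REML log-likelihoods (placeholders, NOT from the trial data).
loglik_ar1 = -720.4   # homogeneous AR(1): 1 correlation + 1 variance
loglik_ar1h = -714.1  # heterogeneous AR(1): 1 correlation + 6 variances
stat, p_value = likelihood_ratio_test(loglik_ar1, loglik_ar1h, extra_params=5)
print(f"LRT statistic = {stat:.2f} on 5 df, p = {p_value:.4f}")
```

The degrees of freedom equal the number of extra variance parameters in the more complex pattern (here five, because six visit-specific variances replace a single common variance). A small p-value would favour the more complex pattern.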
Note: using a covariance model with a simple correlation structure (i.e. uniform) will provide the same results as fitting a random effects model with random subject.
In ASReml-R 4 we use the corv() function on time (i.e. Visit) to specify uniform correlation between depression scores taken on the same subject over time.
Here, the estimate of the correlation among times (Visit) is 0.57 and the estimate of the residual variance is 26.63 (identical to the total variance of the random effects model, asr1).
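The agreement between the two parameterisations can be checked by hand. The sketch below, in plain Python rather than ASReml-R, builds the covariance matrix for one subject's six visits in both ways from the rounded estimates quoted above. The function names are ours, and complete data at all six visits are assumed for simplicity.

```python
def random_intercept_cov(var_subject, var_residual, n_times):
    """Covariance implied by a random subject effect plus independent residuals."""
    return [[var_subject + (var_residual if i == j else 0.0)
             for j in range(n_times)] for i in range(n_times)]


def uniform_cov(total_var, rho, n_times):
    """Covariance under a uniform (compound symmetry) correlation structure."""
    return [[total_var * (1.0 if i == j else rho)
             for j in range(n_times)] for i in range(n_times)]


# Rounded estimates reported for the random effects model (asr1).
var_subject, var_residual = 15.10, 11.53
total_var = var_subject + var_residual  # total variance
rho = var_subject / total_var           # implied within-subject correlation

cov_a = random_intercept_cov(var_subject, var_residual, 6)
cov_b = uniform_cov(total_var, rho, 6)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(cov_a, cov_b)
               for a, b in zip(row_a, row_b))

print(f"total variance = {total_var:.2f}, correlation = {rho:.2f}")
print(f"largest difference between the two matrices = {max_diff:.1e}")
```

The first line printed shows a total variance of 26.63 and a correlation of 0.57, matching the estimates from the uniform correlation model. The two matrices agree up to floating-point rounding.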
Specifying a heterogeneous first-order autoregressive covariance structure is easily done in ASReml-R 4 by changing the variance-covariance function in the residual term from corv() to ar1h().
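To make the structure concrete: a heterogeneous first-order autoregressive model gives each visit its own variance and lets the correlation between two visits decay as a power of the number of visits separating them. The sketch below builds such a matrix in plain Python. The function name and all parameter values are illustrative assumptions, not estimates from the fitted model.

```python
import math


def ar1h_cov(variances, rho):
    """Heterogeneous AR(1): cov(i, j) = sd_i * sd_j * rho ** |i - j|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    sds = [math.sqrt(v) for v in variances]
    n = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


# Illustrative values only (hypothetical): one variance per visit.
visit_variances = [30.0, 28.0, 27.0, 25.0, 24.0, 22.0]
cov = ar1h_cov(visit_variances, rho=0.6)

# The implied correlations depend only on the lag between visits.
corr_lag1 = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
corr_lag3 = cov[0][3] / math.sqrt(cov[0][0] * cov[3][3])
print(f"lag-1 correlation = {corr_lag1:.3f}, lag-3 correlation = {corr_lag3:.3f}")
```

With a correlation parameter of 0.6, visits three apart have correlation 0.6 cubed, i.e. 0.216, whereas a uniform structure would hold the correlation constant at every lag.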
When the relationship of a measurement with time is of interest, a random coefficients model is often appropriate. In a random coefficients model, time is considered a continuous variable, and the subject and subject by time interaction (Subject:nVisit) are fitted as random effects. This allows the slopes and intercepts to vary randomly between subjects, so that a separate regression line is fitted for each subject. Importantly, the slopes and intercepts are allowed to be correlated.
The str() function within the asreml() call is used to fit a random coefficients model:
The summary table contains the variance parameters for Subject (the intercepts, 23.24) and Subject:nVisit (the slopes, 0.89), the estimated correlation between the slopes and intercepts (-0.57) and the estimated residual variance (8.38).
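These estimates imply that the variance of the depression score changes with time. As a rough check, the sketch below (plain Python; the function name is ours) computes the implied variance at each visit. It assumes that nVisit is coded 1 to 6 and uses the rounded estimates quoted above; the covariance between intercepts and slopes is recovered from the reported correlation.

```python
import math


def implied_variance(t, var_int, var_slope, corr, var_resid):
    """Var(y at time t) = var_int + 2*t*cov + t^2*var_slope + var_resid."""
    cov_int_slope = corr * math.sqrt(var_int * var_slope)
    return var_int + 2 * t * cov_int_slope + t ** 2 * var_slope + var_resid


# Rounded estimates from the summary table of the random coefficients model.
var_int, var_slope, corr, var_resid = 23.24, 0.89, -0.57, 8.38

# Assumption: nVisit takes the values 1, 2, ..., 6.
variances = {t: implied_variance(t, var_int, var_slope, corr, var_resid)
             for t in range(1, 7)}
for t, v in variances.items():
    print(f"nVisit = {t}: implied variance = {v:.2f}")
```

Unlike the uniform correlation model, which implies the same variance at every visit, the random coefficients model lets both the variance and the correlation between visits depend on time.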
West, B. T., Welch, K. B. and Galecki, A. T. (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC, Taylor & Francis Group, LLC.
Brown, H. and Prescott, R. (2015). Applied Mixed Models in Medicine. Third Edition. John Wiley & Sons Ltd, England.
Landau, S. and Everitt, B. S. (2004). A Handbook of Statistical Analyses using SPSS. Chapman & Hall/CRC Press LLC.