A linear mixed model for a completely randomized design

A linear mixed model for a completely randomized design

The VSNi Team

16 February 2022
image_blog

Introduction to completely randomized designs (CRD)

In a completely randomized design (CRD) treatments are randomly allocated to the experimental units. The randomization ensures that each treatment is equally likely to be assigned to any given experimental unit. For a balanced design, the replication (i.e., the number of experimental units) is the same for each treatment. However, a CRD is not required to be balanced to be analysed.

Importance of blocking in completely randomized designs

When there are known differences between the experimental units, you can improve precision and avoid bias by blocking. To find out more, check out the blog Using blocking to improve precision and avoid bias.

Randomization helps us guard against bias. In the figure below, five treatments have been randomly assigned to 50 experimental units using a balanced design (i.e., 10 experimental units per treatment). The orange and green outline colours represent some inherent, but unknown, differences between the experimental units. For example, in a plant trial, a difference in plant vigour, or in a medical trial, a physiological difference between patients. Randomization reduces bias by evening out the differences between experimental units.

alt text

Let’s find out how to analyse data from a CRD study.

Example: A plant physiology experiment

Our example data are from an experiment in plant physiology, published by Sokal and Rohlf (1995). The lengths of pea sections (the dependent, or response, variable) grown in a tissue culture were recorded. The purpose of the experiment was to test the effects of various sugar media (the independent, or explanatory, variable) on mean pea section length. A balanced CRD was used with 10 replicates per treatment level.

The data are given in the table below:

alt text

Note: Prior to analysis we’ll need to arrange the data in long-format, with Treatment a factor with 5 levels (the 4 sugar treatments plus a control) and Length a variate.

Dot histograms of the data from each treatment indicate that:

➣ Mean pea section length is greatest for the Control treatment.

➣ Mean pea section length for the Sucrose treatment is greater than for the 3 other sugar medias.

➣ The Control data may be more variable than data from the sugar media treatments.

alt text

We’ll consider two different models for this CRD data set:

Model 1: Analyzing CRD with constant variance

This simple model comprises a fixed term for the treatments and a single residual (or error) variance for all treatment groups.

Model 2: Handling unequal variance in CRD

This more complex model comprises a fixed term for the treatments and a separate residual variance for each treatment group.

ASReml-R4 code for fitting the models

# Model 1

model1<-asreml(fixed=Length~Treatment, 

              residual=~units,

               data=peadata)

# Model 2

model2<-asreml(fixed=Length~Treatment, 

              residual=~dsum(~id(units)|Treatment),

              data=peadata)

Results from fitting Model 1 in ASReml-R4

alt text

Genstat code for fitting the models

"Model 1"

VCOMPONENTS [FIXED=Treatment] 

REML [PRINT=model,components,means,waldTests; PSE=alldifferences] Length

 

"Model 2"

VCOMPONENTS [FIXED=Treatment; EXPERIMENTS=Treatment] 

REML [PRINT=model,components,means,waldTests; PSE=alldifferences] Length

Results from fitting Model 1 in Genstat

alt text

alt text
The estimated variance component (i.e., the residual variance) and its standard error are 5.46 ± 1.15.
alt text
The Wald test provides strong statistical evidence of a Treatment effect (P < 0.001),that is, the mean pea section length differs between the treatments.
alt text
The estimated means are 70.1, 59.3, 58.2, 58.0, and 64.1 in ocular units for Control, 2% Glucose, 2% Fructose, 1% Glucose + 1% Fructose, and 2% Sucrose, respectively. The standard error of the difference between treatment means is 1.045
alt text
For our inference to be valid, it is very important that we check the residuals for evidence of departures from the residual assumptions of normality and constant variance.

For normality:

➣ The histogram should have a reasonably symmetric bell-shape.

➣ The normal plot should form approximately a straight line.

For constant variance:

➣ The spread of the residuals in the scatterplots against fitted values and units should be roughly equal over the range of the data.

Here, there is some evidence of non-constant variance, in particular the residual variance appears to be greater for the Control treatment. Model 2 will allow for this unequal residual variance. Let’s look at the results from fitting Model 2:

Results from fitting Model 2 in ASReml-R4

alt text

Results from fitting Model 2 in Genstat

alt text

Indicated by the yellow flag are the estimated residual variances for each treatment. We can see that the estimated variance for the Control treatment group is much higher than for the other treatments (albeit with a large standard error). Consequently, the standard error of the mean for the Control group is higher than the other treatments, as is its standard error of the differences. A formal comparison between Model 1 and 2 can be performed by using a likelihood ratio test, but this is the topic of another blog!

Reference

Sokal, R. R. and Rohlf, F. J. (1995). Biometry: the principles and practice of statistics in biological research. 3rd Edition, W.H. Freeman, New York, USA.