The VSNi Team
16 February 2022In a completely randomized design (CRD) treatments are randomly allocated to the experimental units. The randomization ensures that each treatment is equally likely to be assigned to any given experimental unit. For a balanced design, the replication (i.e., the number of experimental units) is the same for each treatment. However, a CRD is not required to be balanced to be analysed.
Importance of blocking in completely randomized designsWhen there are known differences between the experimental units, you can improve precision and avoid bias by blocking. To find out more, check out the blog Using blocking to improve precision and avoid bias. |
Randomization helps us guard against bias. In the figure below, five treatments have been randomly assigned to 50 experimental units using a balanced design (i.e., 10 experimental units per treatment). The orange and green outline colours represent some inherent, but unknown, differences between the experimental units. For example, in a plant trial, a difference in plant vigour, or in a medical trial, a physiological difference between patients. Randomization reduces bias by evening out the differences between experimental units.
Let’s find out how to analyse data from a CRD study.
Our example data are from an experiment in plant physiology, published by Sokal and Rohlf (1995). The lengths of pea sections (the dependent, or response, variable) grown in a tissue culture were recorded. The purpose of the experiment was to test the effects of various sugar media (the independent, or explanatory, variable) on mean pea section length. A balanced CRD was used with 10 replicates per treatment level.
The data are given in the table below:
Note: Prior to analysis we’ll need to arrange the data in long-format, with Treatment a factor with 5 levels (the 4 sugar treatments plus a control) and Length a variate.
Dot histograms of the data from each treatment indicate that:
➣ Mean pea section length is greatest for the Control treatment.
➣ Mean pea section length for the Sucrose treatment is greater than for the 3 other sugar medias.
➣ The Control data may be more variable than data from the sugar media treatments.
We’ll consider two different models for this CRD data set:
This simple model comprises a fixed term for the treatments and a single residual (or error) variance for all treatment groups.
This more complex model comprises a fixed term for the treatments and a separate residual variance for each treatment group.
# Model 1 model1<-asreml(fixed=Length~Treatment, residual=~units, data=peadata) # Model 2 model2<-asreml(fixed=Length~Treatment, residual=~dsum(~id(units)|Treatment), data=peadata) |
"Model 1" VCOMPONENTS [FIXED=Treatment] REML [PRINT=model,components,means,waldTests; PSE=alldifferences] Length
"Model 2" VCOMPONENTS [FIXED=Treatment; EXPERIMENTS=Treatment] REML [PRINT=model,components,means,waldTests; PSE=alldifferences] Length |
The estimated variance component (i.e., the residual variance) and its standard error are 5.46 ± 1.15. | |
The Wald test provides strong statistical evidence of a Treatment effect (P < 0.001),that is, the mean pea section length differs between the treatments. | |
The estimated means are 70.1, 59.3, 58.2, 58.0, and 64.1 in ocular units for Control, 2% Glucose, 2% Fructose, 1% Glucose + 1% Fructose, and 2% Sucrose, respectively. The standard error of the difference between treatment means is 1.045 | |
For our inference to be valid, it is very important that we check the residuals for evidence of departures from the residual assumptions of normality and constant variance. |
For normality:
➣ The histogram should have a reasonably symmetric bell-shape.
➣ The normal plot should form approximately a straight line.
For constant variance:
➣ The spread of the residuals in the scatterplots against fitted values and units should be roughly equal over the range of the data.
Here, there is some evidence of non-constant variance, in particular the residual variance appears to be greater for the Control treatment. Model 2 will allow for this unequal residual variance. Let’s look at the results from fitting Model 2:
Indicated by the yellow flag are the estimated residual variances for each treatment. We can see that the estimated variance for the Control treatment group is much higher than for the other treatments (albeit with a large standard error). Consequently, the standard error of the mean for the Control group is higher than the other treatments, as is its standard error of the differences. A formal comparison between Model 1 and 2 can be performed by using a likelihood ratio test, but this is the topic of another blog!
Sokal, R. R. and Rohlf, F. J. (1995). Biometry: the principles and practice of statistics in biological research. 3rd Edition, W.H. Freeman, New York, USA.
Related Reads