Dr. Salvador A. Gezan

06 April 2022

You probably have heard about the basic principles of experimental design: **replication**, **randomization** and **blocking**. In this blog, we will illustrate these principles using a specific example in order to give you a more intuitive understanding of their importance.

We will consider a typical agricultural study (but this also applies to many other fields) in which we want to compare two treatments: A and B. Suppose, for example, that these treatments correspond to two commercial plant varieties. Let’s assume that we have a total of 50 plants available for each of the two varieties, and that there is a plot of land on which to establish our experiment.

Let’s start with the principle of **replication**. Suppose you set up your experiment using two large plots of 50 plants each. These (schematically) will look like this:

**A B**

Each letter represents a plot. In this case, each treatment has a single replicate (plot) formed by 50 plants. This is the undesired case of pseudo-replication: the 50 plants within a plot all share the same plot conditions, so they are not independent replicates of the treatment. A single replicate per treatment is not sufficient for any statistical analysis, even if it seems that you have 50 plants in each treatment.

Also, in terms of layout for the above experiment, intuitively you might be concerned that the left side of the field has better conditions than the right side; hence, this will make treatment A look better than it really is. So, there is a need to have some additional representation (and a better distribution) of the treatment in the field to avoid this issue. This is the role of replication.

Now, let’s move to an improved experimental design with 5 replications (plots) per treatment. This time, having the same resources, we will have a total of 10 plots (5 for each treatment), but each is smaller with only 10 plants. An arrangement of this experiment might look like this:

**A B B A B A A A B B**

In this new experimental design, we have replication, and this feels safer, with more copies of the same treatment to support our conclusions. But there is another key statistical principle at work in the above case: **randomization**. The above layout was obtained by allocating the treatments randomly to the plots. There are many potential arrangements (252 distinct ways to place 5 plots of A among 10 positions); the above is just one of them, and it looks good. As you can see, there is clearly some mixing of the treatments A and B across the plots. The above design is known as a completely randomized design (CRD); it has no restrictions and is very easy to generate.
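As a minimal sketch of how such a layout could be generated, the following Python snippet shuffles five copies of each treatment across ten plots and counts the distinct arrangements. The function name `randomize_crd` and the seed value are illustrative assumptions, not part of any particular design software.

```python
import random
from math import comb

def randomize_crd(treatments=("A", "B"), reps=5, seed=None):
    """Return one completely randomized layout with `reps` plots per treatment."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = [t for t in treatments for _ in range(reps)]
    rng.shuffle(layout)        # every arrangement is equally likely
    return layout

# Distinct ways to place 5 plots of A among 10 positions (B fills the rest)
n_arrangements = comb(10, 5)

layout = randomize_crd(seed=2022)
print(" ".join(layout))
print(n_arrangements)  # 252
```

Because the shuffle is unrestricted, nothing prevents an unbalanced-looking layout from being drawn.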

Replication is critical to give us additional observations of the same treatment and to allow us to calculate an estimate of the underlying variability (copies of the same treatment are not completely identical). On the other hand, randomization is needed to ensure that our statistical inference is valid; we use it to avoid our personal bias and to let chance have a say. Note that every potential random arrangement has the same probability of being selected, and this is key for all calculations, particularly the assumption regarding the probability distributions of the data.

However, we can have a more extreme case: imagine that you obtained the following randomization for the same experiment:

**B B B B A B A A A A**

Now, that looks a bit odd, and it no longer feels correct! Most of the plots for treatment B are on the left, and most of the plots for A are on the right. This is a valid randomization, one of the 252 possible arrangements, but here we have some concerns. For example, what if, as before, conditions are better on the left side of the field? This would produce a bias in favor of treatment B, making it look better than it really is.
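To put a rough number on that concern, the sketch below computes the average plot position of each treatment in the layout above and the difference in mean yield that a trend alone would create. The linear trend and its slope of -1 yield unit per plot are purely hypothetical assumptions chosen for illustration.

```python
def mean_position(layout, treatment):
    """Average plot position (1 = leftmost) of the plots given `treatment`."""
    positions = [i + 1 for i, t in enumerate(layout) if t == treatment]
    return sum(positions) / len(positions)

def gradient_bias(layout, slope=-1.0):
    """Difference in mean yield (A minus B) produced by the trend alone.

    Assumes a hypothetical linear trend: yield changes by `slope` units
    for each plot moved to the right.
    """
    return slope * (mean_position(layout, "A") - mean_position(layout, "B"))

extreme = "B B B B A B A A A A".split()

print(mean_position(extreme, "A"))       # 7.8
print(mean_position(extreme, "B"))       # 3.2
print(round(gradient_bias(extreme), 2))  # -4.6
```

Under this assumed trend, treatment A would appear about 4.6 units worse than B even if the two varieties were identical.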

For any randomization that is like the above arrangement, where we are ‘concerned’ or we feel ‘uncomfortable’ with it, we tend to say that: **“if you do not like it, then this design is not for you!”** Your concerns might be valid, and therefore, a CRD is not the design for you and there is a need to do something about it.

This is where the principle of **blocking** (also known as local control) comes into play. Here, in order to adjust for some potential (known or unknown) bias, we add some structure to the experimental layout. This is where using a randomized complete block design (RCBD) seems more intuitive. In our example we will set up an RCBD with 5 blocks, each with 2 plots. Therefore, we will partition the field as:

**1 1 | 2 2 | 3 3 | 4 4 | 5 5**

The numbers correspond to blocks, and then what we do is randomize the treatments within the blocks. Note that this implies a **restriction** - that is, if plot 1 in block 3 is treatment A, then the other plot in block 3 can only be treatment B. This restriction is important to control for that bias we are concerned about.

Now, we are ready to have a randomization for our RCBD, such as this:

**B A | A B | A B | B A | B A**
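A randomization of this kind could be sketched as follows, shuffling the treatments independently inside each block. The function name `randomize_rcbd` and the seed are illustrative assumptions.

```python
import random

def randomize_rcbd(treatments=("A", "B"), n_blocks=5, seed=None):
    """Return a list of blocks; each block holds every treatment once, in random order."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # randomization is restricted to within the block
        blocks.append(block)
    return blocks

blocks = randomize_rcbd(seed=2022)
print(" | ".join(" ".join(block) for block in blocks))

# With 2 treatments and 5 blocks there are only 2**5 = 32 possible layouts,
# and none of them can place all plots of one treatment on one side.
```

The restriction is visible in the code: the shuffle never moves a treatment out of its block, so every block always contains one A and one B.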

Intuitively, the above arrangement feels *‘good’*: well balanced and safe in the sense of potential bias. This good feeling arises because internally we are concerned about a trend from left to right. If we did not have this concern, then we would not need this blocking or control. However, it does seem like good insurance for our study.

The above is the main characteristic of experimental designs with some form of blocking (such as RCBD, but also row-column and alpha-lattice, amongst others). They aim at creating a better distribution of the treatments across the field trial. And for the same reason they are strongly favored, as we can never be sure about potential underlying trends or biases (think for example of fertility pockets). Hence, blocking has the important benefit of avoiding bias, which can improve precision by reducing the unexplained variation (and in turn increase our statistical power!).

One last important aspect of blocking is that it is never free! This insurance has a cost, and it is paid in degrees of freedom (df), which affects our statistical tests. In our example, the denominator df for the CRD is 8 (n - t = 10 - 2), but for the RCBD it is only 4 (n - t - b + 1 = 10 - 2 - 5 + 1), where n, t and b refer to the number of plots, treatments and blocks, respectively. In this small experiment, this is an important difference and it will affect our statistical inference; in larger experiments, this difference still exists but it will be less costly. Note that this loss of df can be outweighed by the reduction in unexplained variation.
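The two formulas are simple enough to check directly. The helper names below are illustrative, and the larger experiment (10 treatments in 5 blocks) is a hypothetical example added only to show how the relative cost shrinks.

```python
def error_df_crd(n, t):
    """Denominator (error) df for a CRD: plots minus treatments."""
    return n - t

def error_df_rcbd(n, t, b):
    """Denominator (error) df for an RCBD: n - t - b + 1."""
    return n - t - b + 1

# The example from the text: 10 plots, 2 treatments, 5 blocks
print(error_df_crd(10, 2))       # 8
print(error_df_rcbd(10, 2, 5))   # 4

# Hypothetical larger trial: 10 treatments in 5 blocks (50 plots)
print(error_df_crd(50, 10))      # 40
print(error_df_rcbd(50, 10, 5))  # 36
```

In both cases blocking costs b - 1 = 4 df, but that is half of the error df in the small trial and only a tenth in the larger one.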

All three design principles (replication, randomization and blocking) should always be kept in mind. And if in the future you feel *‘uncomfortable’* with a randomization for one of your experiments, **then that randomization is not for you!**

Dr. Salvador Gezan is a statistician/quantitative geneticist with more than 20 years’ experience in breeding, statistical analysis and genetic improvement consulting. He currently works as a Statistical Consultant at VSN International, UK. Dr. Gezan started his career at Rothamsted Research as a biometrician, where he worked with Genstat and ASReml statistical software. Over the last 15 years he has taught ASReml workshops for companies and university researchers around the world.

Dr. Gezan has worked on agronomy, aquaculture, forestry, entomology, medical, biological modelling, and with many commercial breeding programs, applying traditional and molecular statistical tools. His research has led to more than 100 peer reviewed publications, and he is one of the co-authors of the textbook *Statistical Methods in Biology: Design and Analysis of Experiments and Regression.*
