Dr. Vanessa Cave

03 May 2023

In an earlier blog (*ANOVA, LM, LMM, GLM, GLMM, HGLM? Which statistical method should I use?*) a simple diagram was presented with the aim of helping you decide which statistical model is appropriate for your data. In this follow-up blog, we'll delve a little deeper and explore the relationships between the models: linear model (LM), generalized linear model (GLM), generalized linear mixed model (GLMM) and hierarchical generalized linear model (HGLM).

LMs can be used to model Normal data with a single source of random variation

The linear model is:

$$\mathbf{y} = \boldsymbol{\mu} + \mathbf{e}$$

where:

- $\mathbf{y}$ is the vector containing the observed response values $y_i$, assumed to be Normally distributed with mean $\mu_i$ and variance $\sigma^2$
- $\boldsymbol{\mu}$ is the vector of mean responses predicted by the model (i.e., $\mu_i$ is the expected value of observation $y_i$)
- $\mathbf{e}$ is the vector of residuals (i.e., the random error), assumed to have a Normal distribution with mean 0 and variance $\sigma^2$

and

- the mean, $\mu_i$, is modelled by a linear combination of explanatory variables, i.e.,

$$\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi}$$

where $\beta_1$, $\beta_2$, …, $\beta_p$ are the regression coefficients (i.e., parameters) associated with the explanatory variables $x_1$, $x_2$, …, $x_p$, respectively, and $\beta_0$ is the intercept. In matrix form, this mean model can be written more succinctly as:

$$\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$$

where $\mathbf{X}$ is the model matrix for the explanatory variables, and $\boldsymbol{\beta}$ is a vector containing their regression coefficients.

## Simple example of a linear model

Modelling diastolic blood pressure using age as a predictor (i.e., explanatory variable).
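To make the blood-pressure example concrete, here is a minimal numerical sketch. The data and coefficients are invented for illustration (they are not from the original post); the fit follows the $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$ matrix form described above, estimated by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: age (years) and diastolic blood pressure (mmHg).
# True (simulated) model: mu = 60 + 0.5 * age, residual sd = 5
age = rng.uniform(20, 70, size=50)
bp = 60 + 0.5 * age + rng.normal(0, 5, size=50)

# Model matrix X with an intercept column, so that mu = X @ beta
X = np.column_stack([np.ones_like(age), age])

# Least-squares estimates of the regression coefficients beta
beta_hat, *_ = np.linalg.lstsq(X, bp, rcond=None)
print(f"intercept ≈ {beta_hat[0]:.1f}, slope ≈ {beta_hat[1]:.2f}")
```

With 50 simulated observations the estimates land close to the true intercept (60) and slope (0.5), illustrating how the single source of random variation (the residuals) drives the uncertainty in the fit.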

GLMs extend linear models to accommodate data from non-Normal distributions

In a generalized linear model, the expected value of $\mathbf{y}$ is still $\boldsymbol{\mu}$ **but…**

1. $\mathbf{y}$ can now come from any distribution in the exponential family. In addition to the Normal distribution, this includes (amongst others) the binomial, Poisson, gamma, inverse-Normal, multinomial, negative-binomial, geometric, exponential and Bernoulli distributions.

and, importantly,

2. the underlying linear model now defines a *linear predictor*, $\boldsymbol{\eta}$, i.e.,

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$$

which is related to the mean response, $\boldsymbol{\mu}$, via a *link function*, $g(\cdot)$:

$$g(\boldsymbol{\mu}) = \boldsymbol{\eta}$$

Notice that the link function defines the transformation required to make the model linear.

Due to its special properties, often the *canonical* link function for the distribution of $\mathbf{y}$ is used. However, sometimes there are good reasons to use a different link. For example, for binomial data, the canonical link function is the logit; however, for scientific reasons, the probit or complementary log-log link might be more appropriate. The canonical link functions are:

| Distribution | Canonical link function |
|---|---|
| Normal | identity |
| Poisson | log |
| Binomial | logit |
| Gamma | reciprocal (inverse) |
| Inverse-Normal | inverse squared |

## Simple example of a generalized linear model

Modelling the number of students awake at the end of a lecture (i.e., binomial data) using the duration of the lecture (in minutes) as a predictor, and a logit link function. The predicted proportion of students awake at the end of the lecture on the scale of the linear predictor (i.e., the logit scale) is:

$$\text{logit}(p) = 3 - 0.07 \times \text{duration}$$

Plotted on the original scale (i.e., as proportions), the predicted proportion awake declines as the lecture lengthens. *(Figure not reproduced.)*
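As a quick check of the link-function arithmetic, this short sketch evaluates the fitted equation from the example (intercept 3, slope −0.07 per minute, as given in the post) and back-transforms from the logit scale to a proportion using the inverse logit:

```python
import numpy as np

def predicted_proportion_awake(duration):
    """Proportion of students predicted to be awake after `duration` minutes."""
    eta = 3.0 - 0.07 * duration          # linear predictor (logit scale)
    return 1.0 / (1.0 + np.exp(-eta))    # inverse logit: back to a proportion

for minutes in (0, 30, 60, 90):
    print(minutes, round(float(predicted_proportion_awake(minutes)), 3))
```

Note that the back-transformation guarantees predictions stay between 0 and 1, which a plain linear model on the proportion scale would not.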

GLMMs extend generalized linear models to allow for more than one source of random variation (i.e., random effects)

Once again, the expected value of $\mathbf{y}$ is $\boldsymbol{\mu}$ and, as for generalized linear models, the underlying linear model defines the linear predictor, $\boldsymbol{\eta}$, **but…** the linear predictor is extended to include one or more *random terms*. That is, the linear predictor is:

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}$$

where $\mathbf{Z}$ is the model matrix for the random term, and $\mathbf{u}$ corresponds to its vector of random effects. By allowing for random terms, data with additional sources of random variation, such as block effects, can be modelled.

In a generalized linear mixed model, the random effects $\mathbf{u}$ corresponding to the random term are assumed to come from a Normal distribution with mean 0 and variance $\sigma^2_u$.

## Simple example of a generalized linear mixed model

Modelling the number of nematodes in a plot, after treatment with one of four different fumigants, from a trial with a randomized complete block design.

- Response variable: Nematodes
- Assumed distribution of Nematodes: Poisson
- Link function, g(): log
- Explanatory (fixed) terms: Fumigant (factor)
- Random terms: Block (factor)
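To make the GLMM structure concrete, here is a small simulation sketch of this design. All numbers (fumigant effects, block variance) are invented for illustration: fixed fumigant effects plus Normal block effects form the linear predictor $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}$, and a log link maps it to the Poisson mean:

```python
import numpy as np

rng = np.random.default_rng(1)

n_blocks, n_fumigants = 5, 4
beta = np.array([2.0, -0.3, 0.1, -0.6])      # hypothetical fumigant effects (log scale)
sigma_u = 0.4                                 # hypothetical block standard deviation
u = rng.normal(0.0, sigma_u, size=n_blocks)   # random block effects ~ N(0, sigma_u^2)

# One plot per fumigant in each block (randomized complete block design)
block = np.repeat(np.arange(n_blocks), n_fumigants)
fumigant = np.tile(np.arange(n_fumigants), n_blocks)

eta = beta[fumigant] + u[block]   # linear predictor: eta = X beta + Z u
mu = np.exp(eta)                  # log link: g(mu) = log(mu), so mu = exp(eta)
counts = rng.poisson(mu)          # observed nematode counts

print(counts.reshape(n_blocks, n_fumigants))
```

The indexing arrays `block` and `fumigant` play the role of the model matrices $\mathbf{Z}$ and $\mathbf{X}$: each plot's linear predictor combines its fumigant's fixed effect with its block's random effect.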

HGLMs extend generalized linear mixed models to allow the random effects to follow a non-Normal distribution

Just as for the three earlier modelling frameworks, the expected value of $\mathbf{y}$ is $\boldsymbol{\mu}$. And, as for generalized linear mixed models, the linear predictor, $\boldsymbol{\eta}$, can include random terms **but…** these additional random terms aren't constrained to follow a Normal distribution nor to have an identity link. That is, the linear predictor is:

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{v}$$

where the random terms now have their own link function:

$$\mathbf{v} = f(\mathbf{u})$$

and the vectors of random effects $\mathbf{u}$ can follow a non-Normal distribution (e.g., beta, gamma, inverse gamma).

As it's algorithmically and intuitively appealing, often the *conjugate* distribution to the distribution of the response variable, $\mathbf{y}$, is used for the random effects, e.g.:

| Response distribution | Conjugate random-effect distribution |
|---|---|
| Binomial | beta |
| Poisson | gamma |
| Gamma | inverse gamma |
| Normal | Normal |

## Simple example of a hierarchical generalized linear model

As above, modelling the number of nematodes in a plot, after treatment with one of four different fumigants, from a trial with a randomized complete block design.
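Following the conjugacy idea above, a Poisson response pairs naturally with gamma-distributed random effects. This hypothetical sketch (all numbers invented) modifies the earlier GLMM simulation so the block effects are gamma on the original scale and enter the linear predictor through their own log link, $\mathbf{v} = \log(\mathbf{u})$:

```python
import numpy as np

rng = np.random.default_rng(7)

n_blocks, n_fumigants = 5, 4
beta = np.array([2.0, -0.3, 0.1, -0.6])   # hypothetical fumigant effects (log scale)

# Conjugate (gamma) block effects with mean 1 (shape * scale = 4 * 0.25),
# entering the linear predictor through a log link: v = log(u)
u = rng.gamma(shape=4.0, scale=0.25, size=n_blocks)
v = np.log(u)

block = np.repeat(np.arange(n_blocks), n_fumigants)
fumigant = np.tile(np.arange(n_fumigants), n_blocks)

eta = beta[fumigant] + v[block]   # linear predictor: eta = X beta + Z v
mu = np.exp(eta)                  # equivalently: mu = exp(X beta) * u[block]
counts = rng.poisson(mu)
print(counts.reshape(n_blocks, n_fumigants))
```

With the log link, the gamma block effect acts multiplicatively on the Poisson mean, which is exactly why the gamma distribution is the convenient conjugate choice here.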

**Want to fit an LM, GLM, GLMM or HGLM?** Genstat offers comprehensive and user-friendly menus for fitting these models and outputting results.

Dr. Vanessa Cave is an applied statistician interested in the application of statistics to the biosciences, in particular agriculture and ecology, and is a developer of the Genstat statistical software package. She has over 15 years of experience collaborating with scientists, using statistics to solve real-world problems. Vanessa provides expertise on experiment and survey design, data collection and management, statistical analysis, and the interpretation of statistical findings. Her interests include statistical consultancy, mixed models, multivariate methods, statistical ecology, statistical graphics and data visualisation, and the statistical challenges related to digital agriculture.

Vanessa is a past President of both the Australasian Region of the International Biometric Society and the New Zealand Statistical Association, a member of the Editorial Board of The New Zealand Veterinary Journal, and an honorary academic at the University of Auckland. She has a PhD in statistics from the University of St Andrews.
