Dr. Vanessa Cave and Dr. Roger Payne11 October 2023
Generalized linear models (GLMs) are a powerful and popular tool in statistics for the analysis of non-Normal data, such as counts and proportions. They are an important data analysis method in a wide range of research fields, including agriculture, biology, medical research, quality control and insurance. However, a major limitation of GLMs is that they cater for only one source of random variation: the residual error. In practice, many data sets involve more than one source of random variability. For example, when data are collected from a designed experiment with blocking variables, or when repeated measurements are taken on the same experimental unit (or subject).
In an earlier blog we learnt that generalized linear mixed models (GLMMs) and hierarchical generalized linear models (HGLMs) extend GLMs to accommodate additional sources of random variation. We also learnt that, unlike GLMMs, in an HGLM these additional random terms aren’t constrained to follow a Normal distribution nor to have an identity link function. For example, if the basic GLM is a log-linear model (Poisson distribution and log link), a more appropriate assumption for the additional random terms might be the gamma distribution (the conjugate distribution to the Poisson) and the log link function, as opposed to the Normal distribution and the identity link function used by the GLMM.
GLMMs can be considered a special case of HGLMs in which we do assume that the additional random terms are Normal and have an identity link. Indeed, some statistical practitioners routinely fit GLMMs using the HGLM framework. In this blog we discuss why.
A perilous problem with GLMMs is that the variance components of the random terms tend to be underestimated by the REML-based algorithm. That is, the estimates of the variance components are generally smaller than their true values. Consequently, the standard errors of the fixed effects may be too small, and the Wald and F tests may be too large (making their p-values too small). Therefore, we need to be cautious when interpreting the Wald and F tests.
This bias is especially problematical for binary data, but it also can be a serious issue for binomial data with small binomial totals, Poisson data with small counts, and for small data sets.
Permutation tests can help overcome this problem by providing a more reliable way of assessing the fixed effects in a GLMM. However, an alternative approach is to fit the GLMM as an HGLM, using enhanced Laplace approximations to help reduce the bias in the estimates of the variance components. (You can learn more about the theory underpinning GLMMs and HGLMs in the Genstat e-learning course Generalized linear models with random effects (GLMMs & HGLMs)).
Let’s illustrate with an example.
Our example data set is from a study of mating success of salamander. Three experiments were conducted, each mating 20 male and 20 female salamanders of 2 types (R = rough butt or W = whiteside), paired 6 times with their own and the other type. The response variable is binary, recording whether or not the mating was successful. Of interest is whether the male and female parent type influences a successful mating.
The variables in the data set are:
Season (factor) - season the experiment was conducted (Fall or Summer)
Experiment (factor) - experiment 1, 2, or 3
TypeM (factor) - type of the male parent (R or W)
TypeF (factor) - type of the female parent (R or W)
Cross (factor) – combination of TypeF by TypeM (RR, RW, WR or WW)
Male (factor) - male parent id (60 levels)
Female (factor) - female parent id (60 levels)
Mate (variate) - binary response variable where 1 = successful mating, and 0 = unsuccessful mating
Note: Binary data is binomial data in which the binomial totals are 1.
Using Genstat, we’ll fit a binomial GLMM with a logit link function in which:
The GLMM menu for fitting this model is shown below:
The estimated variance components and their standard errors are:
However, when we compare these estimates to those obtained using:
it appears that the GLMM estimates maybe downward biased!
Monte Carlo EM
Binomial-logit GLMM fitted as an HGLM
Now, let’s fit our GLMM as an HGLM. The HGLM menu is shown below:
In the Options menu, we can set the order of the Laplace approximations to be used in the estimation of the mean model and in the estimation of the dispersion model.
Setting the order for the mean and dispersion models to 1 is recommended for binomial and Poisson GLMMs, where the use of order 0 (the default) can lead to serious downwards bias in the estimates of the variance components.
(Note: it is best not to go beyond order 1 as the algorithm can take a long time to run and there may be convergence problems.)
Using Laplace order 0 approximations
Let’s first fit our HGLM leaving the order of the Laplace approximations at the default of 0. This results in estimates of the variance components very similar to those obtained using the GLMM menu. The estimates of the fixed effects (not shown) are also very similar. As GLMM and HGLM use very different model fitting algorithms, it’s not unexpected to see small differences.
HGLM (order 0)
Monte Carlo EM
Using Laplace order 1 approximations
To address the likely downwards bias in the estimates of the variance components, let’s enhance the Laplace approximations used for both the mean and dispersion models by setting their order to 1.
The resulting estimates of the variance components are now in line with those obtained from the Gibbs sample and Monte Carlo EM algorithm.
HGLM (order 1)
Monte Carlo EM
Moreover, the estimates of the fixed effects are very similar:
HGLM (order 1)
Monte Carlo EM
|Type F W|
|Type M W|
So, should we always fit GLMMs using HGLMs with enhanced Laplace approximations?
Probably not. Arguably most people are more familiar with GLMMs than HGLMs, and the output from a GLMM analysis tends to be better understood. Moreover, the REML-based algorithm used to fit GLMMs allows for negative variance components and correlation models. However, should you be analysing a data set prone to the bias problem of GLMMs (i.e., binary data, binomial data with small binomial totals, Poisson data with small counts, or a small data set) then using a HGLM with enhanced Laplace approximations to fit your GLMM is a good idea.
 Genstat example data set Salamander.gsh
Dr. Roger Payne
Dr Roger Payne leads the development of Genstat at VSNi. He has a degree in mathematics and a PhD in mathematical statistics from University of Cambridge, and is a Chartered Statistician of the Royal Statistical Society.
Prior to joining VSNi he was a member of the Rothamsted Statistics Department, where he worked on Genstat from 1974, taking over its leadership in 1985. He has also had several extended research visits to other organisations, including a one year secondment to CSIRO Division of Mathematical Statistics in Adelaide during 1978-9. His work has involved a mixture of statistical computing, statistical consulting and research, enabling him to develop a talent for development and then practical application of new and relevant statistical methodology. He has a broad knowledge of statistics, with particular expertise in design and analysis of experiments, generalized linear models and linear mixed models.
He has served on the committees of many statistical societies, including the Royal Statistical Society Council, General Applications Section (Chairman) and Statistical Computing Section (Chairman), the British Classification Society and the International Association for Statistical Computing (Vice President). His other professional activities have included a visiting professorship at Liverpool John Moores University, and a special lectureship at University of Nottingham.
Dr. Vanessa Cave
Dr. Vanessa Cave is an applied statistician interested in the application of statistics to the biosciences, in particular agriculture and ecology, and is a developer of the Genstat statistical software package. She has over 15 years of experience collaborating with scientists, using statistics to solve real-world problems. Vanessa provides expertise on experiment and survey design, data collection and management, statistical analysis, and the interpretation of statistical findings. Her interests include statistical consultancy, mixed models, multivariate methods, statistical ecology, statistical graphics and data visualisation, and the statistical challenges related to digital agriculture.
Vanessa is a past President of both the Australasian Region of the International Biometric Society and the New Zealand Statistical Association, on the Editorial Board of The New Zealand Veterinary Journal and an honorary academic at the University of Auckland. She has a PhD in statistics from the University of St Andrews.
Dr. Ruth Butler15 November 2023
Experimental Design: What are Balance, Efficiency and Independence?
Dr. Salvador A. Gezan08 November 2023
Statistical inference is the process of drawing some conclusions about a population based on the sample data at hand.
Salvador A. Gezan01 November 2023
An important aim when fitting linear mixed models (LMM) is the use of random effect estimates. In some analyses, such as genetic evaluations, the main objective of the analysis is to obtain these estimates.