PhD - Applications of Multilevel Modelling

Exploring the assumption of no correlation of explanatory variables with random effects. 

Project aims and activities

The overarching aim of my PhD project is to improve understanding of the implications of using random effects models with social data where the random effects are correlated with explanatory variables.

In more detail

Random effects models are used in the social sciences to handle data where cases are clustered, either because of the way the data have been sampled or because we believe conceptually that cases are grouped, so that important differences between these groups might shape the processes affecting them.

For example, if we are interested in the relationship between parental income and school attainment, and the data we have contains groups of pupils within schools, we might suppose that there will be differences in the outcomes of different schools for various reasons we can't measure (or just haven't). Perhaps one school has a sporty ethos, and another is very focused on diversity and tolerance, while a third has suffered a lot of disruption because of a building which is poorly maintained.

These differences could mean that the children within any one of these schools have outcomes that differ from the overall average in a similar way. Some sort of multilevel model is needed to cope with that structure in the data.

Imagine a model which tries to describe how exam scores vary with parental income. One common approach would be to use a random intercepts model, which would allow a different 'baseline' exam score for each school, and estimate the effect of parental income on exam scores relative to that baseline.

This improves our analysis in two ways:

However, these models rely on an oft-violated underlying assumption of no correlation of explanatory variables with random effects (henceforth the 'NCRX assumption'). If this assumption is not met, the resulting estimates can be inaccurate.
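As a sketch, the random intercepts model from the exam-score example and the NCRX assumption can be written in symbols. The notation is illustrative and is an assumption rather than something fixed by the project: y_ij is the exam score of pupil i in school j, x_ij is that pupil's parental income, u_j is the school-level random intercept, and e_ij is the pupil-level residual. The normal distributions are the usual textbook choice, assumed here for concreteness.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2), \\
\operatorname{Cov}(x_{ij}, u_j) &= 0 \quad \text{(the NCRX assumption)}.
\end{align}
```

In words, the last line says that a school's unmeasured baseline is unrelated to the parental incomes of its pupils.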

To continue the example above, we can easily imagine that parental income might tend to be higher in schools where exam scores are above average, perhaps because wealthier parents have the means to move to an area where their child can attend an apparently high-achieving school. 

If this confounding factor confuses the model, we could draw the wrong conclusions about how family wealth relates to school attainment. The graphs above show a simulated (fake!) but plausible data pattern where exam scores do increase with income, but higher incomes are clustered in schools with higher exam results, and within those clusters the relationship between income and attainment is weaker. Looking at the same data in a multilevel way changes the story.
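A data pattern of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation behind the graphs: the number of schools, the slopes and the noise levels are all made-up values, and the comparison uses simple least-squares slopes (one pooled across all pupils, one computed after subtracting each school's means) rather than a fitted random intercepts model.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# All values below are hypothetical, chosen only for illustration.
N_SCHOOLS = 20
PUPILS_PER_SCHOOL = 50
WITHIN_SLOPE = 0.2  # assumed income effect on scores inside any one school
SCHOOL_LINK = 1.0   # assumed link between a school's mean income and its baseline

rows = []  # each row is (school, income, score)
for school in range(N_SCHOOLS):
    school_income = random.gauss(50, 10)  # average parental income in this school
    # The school effect depends on school income: this is the NCRX violation.
    school_effect = SCHOOL_LINK * (school_income - 50) + random.gauss(0, 2)
    for _ in range(PUPILS_PER_SCHOOL):
        income = school_income + random.gauss(0, 5)
        score = 50 + WITHIN_SLOPE * (income - 50) + school_effect + random.gauss(0, 5)
        rows.append((school, income, score))


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Pooled slope: ignores schools entirely.
pooled_slope = ols_slope([r[1] for r in rows], [r[2] for r in rows])

# Within-school slope: subtract each school's mean income and mean score first.
school_means = {}
for school in range(N_SCHOOLS):
    incomes = [r[1] for r in rows if r[0] == school]
    scores = [r[2] for r in rows if r[0] == school]
    school_means[school] = (statistics.fmean(incomes), statistics.fmean(scores))

within_x = [r[1] - school_means[r[0]][0] for r in rows]
within_y = [r[2] - school_means[r[0]][1] for r in rows]
within_slope = ols_slope(within_x, within_y)

print(f"pooled slope: {pooled_slope:.2f}")
print(f"within-school slope: {within_slope:.2f}")
```

Under these assumptions the pooled slope mixes the within-school relationship with the differences between schools, so it comes out steeper than the slope seen inside any one school.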

Such a violation does not always change the results, and there are corrections we can apply to address it, but we need to understand more (on an applied level) about when and how these models fail. My PhD project will address that need through three strands of activity:



Kate O'Hara, University of Stirling, @Kate_OHara_


Paul Lambert, University of Stirling

Kevin Ralston, University of Edinburgh


'Three-minute Thesis' slide, as presented at University of Stirling Festival of Research, May 2023. Image credits and references for this slide are listed here.