Michael Freeman of U of Washington has a nicely done quick and dirty animation explaining random effects and nested data.
There are many different approaches with the usual slew of advantages/disadvantages (no clear "best" approach), but here is the suggestion Mixed Effects Models and Extensions in Ecology with R, which is one of my favorite textbooks for the amount of technical detail and extra attention to how to handle the dirty, messy reality of data that most other references avoid (only giving you the ideal, well-behaved cases). Check out the other great works and resources offered by the first author statistician Alain Zuur.
From Ch. 5 Mixed Effects Modelling for Nested Data:
Set the "beyond optimal" model or the "full" model -- all the fixed terms and their interactions that are of interest to you and that are practical to include.
Using the beyond optimal model, find the optimal random term using REML, not ML which is biased for nested random structures. REML-estimated likelihood ratio test, AIC or BIC to compare and select (caveat: AIC and BIC do not avoid boundary problems). The book illustrates why using a null model (no fixed effect factors) to determine the random effects structure is a bad idea by way of step-by-step example.
note that stats::anova() in R will refit the models with ML before giving you the likelihood ratio test results -- set refit = F
Once optimal random structure is found in (2), determine the optimal fixed effects structure. To compare models with nested fixed effects (but same random structure), need ML-estimated t-/F-statistic and not REML-estimated.
Why ML: REML estimates the random effects by taking linear combinations of the data to remove the fixed effects -- which essentially makes the likelihoods of the two models not directly comparable (Faraway 2006).
Estimate the final model using REML.
Also from Zuur:
I. intro
• overdispersion
= data variance > mean
=> Poisson regression models do not work (underlying assumption: variance = mean)
=> use negative binomial GLMs and GAMs
• zero-inflated
= data sets with lots of zeros (more than expected in Poisson or negative binomial distribution)
• models handling zero inflation also handle overdispersion
° zero-inflated Poisson GLMs and GAMs
° zero-inflated negative binomial GLMs and GAMs
• model = fixed term + random term
° fixed term = response variable described as a function of explanatory variables
° random term = ...
* only random term => linear regression or additive modeling
+ allows for heterogeneity => generalized least squares (weighted linear regression)
+ allows for nested data => mixed effects modelin
App.A
1. Outliers in the response and explanatory variables.
• boxplots and Cleveland dotplots (sometimes dotplots suggest no outliers but boxplots do)
• R: dotchart(Loyn$ABUND, main = "ABUND", group = Loyn$fGRAZE)
• 2-3 observations are outliers for all variables => exclude
• too many observations outliers => transformation
2. Collinearity of the explanatory variables
• remove redundant variables
• 3 tools:
a. Pairwise scatterplots
• R: AED::pairs(Z, lower.panel = panel.smooth2, upper.panel = panel.cor, diag.panel = panel.hist)
• the value of 0.6 (and –0.6) is not large enough to worry us.
b. correlation coefficients
c. variance inflation factors (VIF)
• R: corvif(Z[, c(-1,-7)])
• VIF < 3 => no collinearity between variables
3. Relationships between the response variable and the explanatory variables.
° pairplots of response variable against explanatory variables
° various R functions
• pairs
• lattice::coplot
• lattice::xyplot
• design:: plot. design (design and interaction plots)
° interactions -- how to deal
• different views:
1. Start with a model with no interactions. Apply the model, model selection, and model validation. If the validation shows that there are patterns in the residuals, investigate why. Adding interactions may be an option to improve the model.
2. Use biological knowledge to decide which, if any, interactions are sensible to add.
3. Apply good data exploration to see which interactions may be important.
4. Identify the prime explanatory variable(s) of interest. In the bird example, this would be GRAZE and AREA as we can control them using management decisions. Include interactions between these variables and any of the other variables.
5. Only include the main terms and two-way interaction terms.
6. Only include higher interactions terms if you have good reasons (i.e. biological justification) to do so.
7. Include all interactions by default.
• if interactions are included => must include the corresponding main terms
• first assess numerically:
> summary(M1)
# don't use individual p-values to assess the significance of a factor
# should not drop levels of nominal variable (include all levels otherwise drop entire variable)
> drop1(M1, test="F")
# repeatedly compares full with a nested model, dropping one variable each time
# in linear regression, returns p-values = that of t-statistic from "summary"
# gives a p-value for each variable
# !!! p-values depend on order of variables
> anova(M1)
# residual mean square = of full model
# p-value should be similar to drop1 if same nested models compared
> anova(M1,M2)
# compare nested models
# advantage over anova(M1) is can control which terms get dropped
° model selection
• explanatory variable not significant from above?
* 3 main approaches:
1. Drop individual explanatory variables one by one based on hypothesis testing procedures.
° drop least significant term depending on t and p-values from "summary" or "anova" commands
2. Drop individual explanatory variables one by one (and each time refit the model) and use a model selection criteria like the AIC or BIC to decide on the optimal model.
° automatic forward and backward model selection tools in R
> step(M1)
3. Specify a priori chosen models, and compare these models with each other.
4. Model validation (statisticians tend to use graphs
• Plot (standardised) residuals against fitted values to assess homogeneity.
> M3 <-lm(ABUND˜L.AREA+fGRAZE,data= Loyn)
> op <-par(mfrow= c(2, 2))
> plot(M3)
• Make a histogram or QQ-plot of the residuals to verify normality.
> E <-rstandard(M3)
> hist(E)
> qqnorm(E)
• Plot the residuals against each explanatory variable that was used in the model. If you see a pattern, you are violating the independence assumption.
> plot(y = E, x = Loyn$L.AREA, xlab = "AREA", ylab = "Residuals")
> abline(0,0)
> plot(E ˜ Loyn$fGRAZE, xlab = "GRAZE", ylab = "Residuals")
> abline(0, 0)
> par(op)
• Plot the residuals against each explanatory variable not used in the model. If you see a pattern, include the omitted explanatory variable and refit the model. If the residuals patterns disappear, include the term, even if it is not significant.
° solutions:
* Extend the model with interactions terms.
* Extend the model with a non-linear length effect (e.g. use length and length to the power of two as explanatory variables).
* Add more explanatory variables.
* Transform the data to linearise the relationships
* if all this fails to remove patterns, move away from linear models and consider e.g. smoothing
• Assess the model for influential observations. A useful tool is the Cook distance function.
5. Model interpretation
• scatterplot
> plot(Loyn$L.AREA,Loyn$ABUND)
• predictions made by model with fitted line
> P1<-predict(M3,newdata=D1)
> I1 <-order(L.AREA) > SGRAZE <-GRAZE[I1]
# sort data from small to large otherwise get spaghetti plot
> SL.AREA <-sort(L.AREA)
> lines(D1$SL.AREA,P1,lty=1)
6. Additive modeling -- generalised additive model (GAM): what type of model is required?
> library(mgcv)
> AM1 <-gam(ABUND˜s(L.AREA)+s(L.DIST)+ s(L.LDIST) + s(YR.ISOL) + s(ALT) + fGRAZE, data = Loyn)
> anova(AM1)
• Q is what is optimal model => start with all the explanatory variables because some variables may have nonlinear effect causing them to be n.s. in linear regression model
• Q is if variables A and B really linear => GAM with smoothing functions of only A and B
• The anova command does not apply a sequential F-test as it did for the linear regression model. Instead, it gives the Wald test (approximate!)
• model selection options
° hypothesis testing approach
* easiest
* just drop the least significant term from the model, refit the model, and repeat this process until all terms are significant.
° AIC
> AIC(AM1)
• can be time-consuming because no "step" function
• you have to drop each term in turn, write down the AIC, and choose the variable to drop from the model, and repeat this process a couple of times.
° cross-validation
• method to estimate optimal amount of smoothing, whereonedegreeoffreedomproduces a straight line and 10 degrees of freedom is a highly non-linear curve
• "gam" function can produce smoothers with 0 degrees of freedom, which basically removes the need to refit the model without the terms.
* only works with thin plate regression splines and cubic regression spline
> AM2 <-gam(ABUND˜s(L.AREA,bs= "cs") + s(L.DIST, bs = "cs") + s(L.LDIST,bs = "cs") + s(YR.ISOL, bs = "cs") + s(ALT, bs = "cs") + fGRAZE, data = Loyn)
> anova(AM2)
* bs = "cs" tells R to use the cubic regression spline with shrinkage
• model validation process should follow nearly the same steps as in linear regression
° but there is no function that plots residuals against fitted values
> E.AM3 <-resid(AM3)
> Fit.AM3 <-fitted(AM3)
> plot(x = Fit.AM3, y = E.AM3, xlab = "Fitted values", ylab = "Residuals")
• final Q: was GAM really necessary?
° if same explanatory variables, compare to linear regression:
> M3 <-lm(ABUND˜L.AREA+fGRAZE,data= Loyn)
> AM3 <-gam(ABUND˜s(L.AREA,bs= "cs") + fGRAZE, data = Loyn)
> anova(M3, AM3, test ="F")
# null hypothesis that both models same
• gam::gam vs. mgcv::gam
° gam::gam easy to understand and explain.
° mgcv::gam can do cross validation and allows for generalised additive mixed modelling (GAMM) including spatial and temporal correlations as well as nested data and various heterogeneity patterns.
* Cross-validation is a process that automatically determines the optimal amount of smoothing
II. Limitations of linear regression models
II.1 Data exploration
• identify outliers and violations of homogeneity
° Cleveland dotplots, grouped by var
>dotchart(Nereis$concentration,groups=factor(Nereis$nutrient),ylab="Nutrient",xlab="Concentration",main="Clevelanddotplot",pch=Nereis$nutrient)
• relationships between variables
° pairplots
# each panel is scatterplot of 2 var
# in general no sense to add nominal var
> pairs(Nereis)
> coplot(LNAFD ∼ LNLENGTH | fMONTH, data = Clams)
° boxplots (condition on nominal var)
> boxplot(concentration ∼ factor(nutrient), varwidth = TRUE, xlab = "nutrient", main = "Boxplot of concentration conditional on\ nutrient", ylab = "concentration", data = Nereis)
* Points outside box may (or may not) be outliers. One should not label them as outliers purely on the basis of a boxplot!
° lattice/Trellis graph
> library(lattice)
> xyplot(X15N ∼ Age | factor(Tooth), type = "l", xlab = "Estimated age", col = 1, ylab = expression(paste(deltaˆ{15},"N")), strip = function(bg = 'white', ...) strip.default(bg = 'white', ...), data = TeethNitrogen)
II.2 Linear regression model
• ‘linear’ in linear regression basically means linear in the parameters
• assumptions
* normality of the data, and therefore of the residuals at each X value
• can get away with a bit of non-normality
• check at each X value: make a histogram of all observations at that particular X value. Very often, we don’t have multiple observations (sub-samples) at each X value. In that case, the best we can do is to pool all residuals and make a histogram of the pooled residuals
° normality of the pooled residuals is reassuring, but it does not imply normality of the population data. We also discuss how not to check for normality as the underlying concept of normality is grossly misunderstood by many researchers.
° misleading to base normality judgement purely on a histogram of all the Y data because raw data Y includes effect of explanatory var unless have replicated observations for each X value
=> apply model and inspect residuals
* homogeneity/homoscedasticity (spread of Gaussian curve same for all values
• better to assess homogeneity purely based on a graphical inspection of the residuals
• minor violation okay but major violation serious
* fixed X (explanatory variables are deterministic)
* independence
• the most serious problem as it invalidates important tests such as the F-test and the t-test
* correct model specification
II.3 Model validation
(i) residuals versus fitted values to verify homogeneity,
(ii) a QQ-plot or histogram of the residuals for normality, and
(iii) residuals versus each explanatory variable to check independence
Cook distance > 1 is the threshold value upon one should take further action (the Cook distance tells you how influential an observation is on the estimated parameters)
III. Additive models
• like linear model equation but β × Xi replaced by the smoothing curve f(Xi).
• smoothers
° LOESS (local regression smoother), LOWESS (local weighted linear regression)
> library(gam)
> M1 <-gam(Sources16∼ lo(Depth16, span = 0.5))
> plot(M1, se = TRUE)
# with fitted and observed values:
> M2 <-predict(M2,se= TRUE)
> plot(Depth16, Sources16, type = "p")
> I1 <-order(Depth16)
> lines(Depth16[I1], M2$fit[I1], lty = 1)
> lines(Depth16[I1], M2$fit[I1] + 2 * M2$se[I1], lty = 2)
> lines(Depth16[I1], M2$fit[I1] -2 * M2$se[I1], lty = 2)
> par(op)
• R function loess
° actually does LOWESS
° newer than function lowess with better support for missing values and has different default settings.
° As both packages use the same function gam, it may be wise to restart R; alternatively, type detach("package:gam")
IV. Dealing with heterogeneity
• when residuals vs. fixed values is not an evenly distributed cloud:
> plot(M1, which = c(1), col = 1, add.smooth = FALSE, caption = "")
IV.1.2 Fixed variance structure
• fitted using the generalised least squares (GLS) method
• assumes that var(εi) = σ2 × culpritFixedVar_i
• allows for heterogeneity as culpritFixedVar changes
> library(nlme)
> M.lm <-gls(Testisweight∼ DML * fMONTH, data=Squid)
> vf1Fixed <-varFixed(∼DML)
> M.gls1 <-gls(Testisweight∼ DML * fMONTH, weights = vf1Fixed, data = Squid)
> anova(M.lm, M.gls1)
# models not nested so no models are not nested; so no log-likelihood ratio test statistic is given => use the AIC
# can also just use AIC(M.lm, M.gls1)
IV.1.3 VarIdent Variance Structure
• formulate the variance structure with different spread per stratum: εij ∼ N(0,σ2j) j = 1,...,12
# using diff variances per month
> vf2 <-varIdent(form= ∼ 1 | fMONTH)
> M.gls2 <-gls(Testisweight∼ DML*fMONTH, data =Squid, weights = vf2)
> anova(M.lm, M.gls1, M.gls2)
# or
AIC(M.lm, M.gls1, M.gls2)
# plot residuals to see which factors influencing spread i.e. a variance covariance
> E <-resid(M.lm)
> coplot(E ∼ DML | fMONTH, data = Squid)
IV.1.4 ‘power of the covariate’ variance structure: varPower
• don't use of variance covariate has values =0
> vf3<-varPower(form=∼DML)
> M.gls3<-gls(Testisweight∼DML*fMONTH,weights=vf3,data=Squid)
# model an increase in spread for larger DML values, but only in certain months
> vf4<-varPower(form=∼DML|fMONTH)
> M.gls4<-gls(Testisweight∼DML*fMONTH,data=Squid,weights=vf4)
# fixed option in varPower allows one to test if variance of residuals constant for certain levels
IV.1.5 the exponential variance structure: varExp
• better choice when variance covariate can = 0
• models the variance of the residuals as σ2 multiplied by an exponential function of the variance covariate DML and an unknown parameter δ
> vf5 <-varExp(form=∼ DML)
> M.gls5 <-gls(Testisweight∼ DML * fMONTH, weights = vf5, data = Squid)
IV.1.5 constant plus power of the variance covariate function: varConstPower
• this variance structure works better than the varExp if the variance covariate has values close to zero
> vf6 <-varConstPower(form=∼ DML)
> M.gls6 <-gls(Testisweight∼ DML * fMONTH, weights = vf6, data = Squid)
IV.1.6 combination of variance structures: varComb function.
• allow for both an increase in residual spread for larger DML values as well as a different spread per month
# combination of varIdent and varExp.
> vf8 <-varComb(varIdent(form=∼ 1 | fMONTH) , varExp(form =∼ DML) )
> M.gls8 <-gls(Testisweight∼ DML * fMONTH, weights = vf8, data = Squid)
> anova(M.lm, M.gls1, M.gls2, M.gls3, M.gls4, M.gls5, M.gls6, M.gls7, M.gls8)
• If the variance covariate has large values (e.g. larger than 100), numerical instabilities may occur; exp(100) is rather large! => rescale the variance covariate before using it in any of the variance structures.
° and just use unscaled for fixed terms
• which variance structure to choose?
° If the variance covariate is a nominal variable => use varIdent
° if variance covariate is continuous => varFixed,varPower,varExp, or varConstPower
* varFixed is rather limited, as it assumes that the variance of the residuals is linearly related to a variance covariate
* if the variance covariate takes the value of zero => don't use varPower
* a matter of trial and error, and the best choice is judged through using tools like the AIC
IV.1.9 graphical model validation
• two types of residuals:
(i) residuals calculated as observed minus fitted values (also called ordinary residuals) and
• not useful for graphical model validation because heterogeneity now allowed
> E1 <-resid(M.gls4)
> coplot(E1 ∼ DML | fMONTH, ylab = "Ordinary residuals", data = Squid)
(ii) normalised residuals
= the observed minus the fitted values and then dividing by the square root of the variance
• use for model validation
> E2 <-resid(M.gls4,type= "normalized")
> coplot(E2 ∼ DML | fMONTH, data = Squid, ylab = "Normalised residuals")
https://www.theanalysisfactor.com/mixed-models-predictor-both-fixed-random/
Nested random term measures an interaction of a factor in the term it is nested in and is a variance estimate of the random slope
Terminology check: random effects vs. random factors
https://www.theanalysisfactor.com/random-factors-random-effects-difference/
https://dynamicecology.wordpress.com/2015/11/04/is-it-a-fixed-or-random-effect/
Most of the time, comparing AIC and using the likelihood ratio test (LRT) is okay until you have boundary conditions
Self, S. G. & Liang, K. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions J. Amer. Statist. Assoc., 1987, 82, 605-610.
Use AIC to compare models with same fixed effects structure but varying random effects structure? This is what Zuur et al. (2009) recommends, but as Ben bolker says -- with caution because AIC also suffers from the same problems as the LRT (both are asymptotic tests). Why is this an issue: unclear what the degrees of freedom should be 
https://stats.stackexchange.com/questions/117497/comparing-between-random-effects-structures-in-a-linear-mixed-effects-model