Using R package MuMIn
MuMIn is a fairly flexible R package for conducting model selection and model averaging with a variety of linear models, including ordinary linear regression and generalized mixed models. If you don’t know what the latter are, don’t worry; this tutorial will still prove useful. I will assume that you are somewhat familiar with R and know things like: # (the hash symbol) is for adding comments to R code, how to use an R script, how to specify the working directory, how to install and load R packages, and how to read and write data files. If not, you may want to visit FW599, An Introduction to Data Management and R for Fisheries and Wildlife Applications, for some pointers. Be sure to have MuMIn and the R package lme4 installed before proceeding with the tutorial. You also may want to download the two example data files, which are in comma-separated format: Example data.csv and Westslope.csv. The script that contains all of the code used below is here, and a pdf duplicate of this website is available here.
To demonstrate many of the useful functions in MuMIn, let’s use the Example data.
# download the file directly from the website
dater<-read.csv("https://drive.google.com/file/d/10Ncj8ooqQ6v8kkQ5u0f_WrHC1FEmpIOX/view?usp=sharing")
# or read in downloaded datafile
#dater<-read.csv("Example data.csv")
# what's in the data
head(dater)
The data include 3 response variables (count, density, and presence) and 5 explanatory variables: elev (elevation), slope (gradient), area, distance (to nearest population), and pct.cover (% cover). A good practice is to use summary to make sure you don’t have any missing data.
# make sure all numeric values and no missing
summary(dater)
Load the package using either the “require” or “library” function.
# load the MUMIn package
require(MuMIn)
One very important thing you should do next is change the global option for how R functions handle missing data. With this change, a function will fail rather than silently drop rows when data are missing. This is required if you use the "dredge" function for exploratory data analysis.
# change na. action
options(na.action = "na.fail")
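Why this matters: under the default "na.omit", candidate models could silently be fit to different subsets of rows when some variables have missing values, making their AICc values incomparable. A quick base-R illustration with made-up toy data:

```r
# toy data frame (hypothetical, for illustration only)
d <- data.frame(y = c(1, 2, NA, 4), x = 1:4)

options(na.action = "na.fail")
fit <- try(lm(y ~ x, data = d), silent = TRUE)
inherits(fit, "try-error")      # TRUE: the fit fails instead of dropping the NA row

options(na.action = "na.omit")  # the usual default
coef(lm(y ~ x, data = d))       # silently fits to the 3 complete rows
```

Remember to set na.action back to "na.fail" before continuing with the tutorial.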
Ok we’re ready to go. Let’s fit four candidate models that explain variation in animal density. Ideally, these models would represent hypotheses. Given the nature of the response, we’ll use ordinary linear regression with the “lm” function.
#First, fit 4 candidate linear models to explain variation in density
mod1<-lm(density~distance+elev, data = dater)
mod2<-lm(density~slope+pct.cover, data = dater)
mod3<-lm(density~slope+distance, data = dater)
mod4<-lm(density~slope+distance+elev, data = dater)
We can now use the mod.sel or model.sel function (they are the same thing) to conduct model selection. The default model selection criterion is Akaike’s information criterion (AIC) with small-sample bias adjustment, AICc. Here we’ll create an object “out.put” that contains all of the model selection information.
# use the mod.sel function to conduct model selection
# and put output into object out.put
out.put<-mod.sel(mod1,mod2,mod3,mod4)
# what's it look like, hmm AIC with small sample bias adjustment AICc
# delta AICc, and the model weights
out.put
>
Model selection table
(Int) dst elv pct.cvr slp df logLik AICc delta
mod1 -0.013280 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00
mod4 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01
mod3 -0.012920 5.207e-05 0.08634 4 860.130 -1712.1 7.55
mod2 0.003614 2.777e-05 0.05374 4 819.492 -1630.8 88.83
weight
mod1 0.720
mod4 0.264
mod3 0.016
mod2 0.000
Models ranked by AICc(x)
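If you want to see where these numbers come from, AICc and the Akaike weights are easy to reproduce by hand. The sketch below uses the logLik and delta values from the table above; n = 255 is an assumption, inferred from the residual degrees of freedom of the global Poisson fit later in this tutorial.

```r
# AICc = -2*logLik + 2*K + 2*K*(K + 1)/(n - K - 1)
logL <- 863.9067   # logLik of mod1 from the table
K    <- 4          # number of parameters (the df column)
n    <- 255        # sample size (assumed; see note above)
aicc <- -2 * logL + 2 * K + (2 * K * (K + 1)) / (n - K - 1)
aicc   # -1719.653, matching mod1

# Akaike weights: w_i = exp(-delta_i/2) / sum(exp(-delta_j/2))
delta <- c(0.000000, 2.006937, 7.554120, 88.830152)
w <- exp(-delta / 2) / sum(exp(-delta / 2))
round(w, 3)   # 0.720 0.264 0.016 0.000, matching the weight column
```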
The results are shown above (gray background) with the models sorted from best (top) to worst (bottom). Looks like mod1, containing an intercept (Int), distance (dst), and elevation (elv), is best with a weight of 0.720. It is 0.720/0.264 = 2.72 times more likely to be the best explanation (hypothesis) for the variation in density. Note that the function does not use the whole name for model parameters, but creates abbreviations. Quite often we need to express model selection uncertainty by specifying a confidence set of models using some rule. Here we can use the subset function to select the models that meet a criterion. Note that the weights are re-normalized for the selected models. That is, they are adjusted so that they add to one. I hate this feature -- it's not very helpful.
# create a confidence set of models using the subset function
# select models with delta AICc less than 5
# IMPORTANT: Weights have been renormalized!!
subset(out.put, delta <5)
>
Model selection table
(Int) dst elv slp df logLik AICc delta weight
mod1 -0.01328 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00 0.732
mod4 -0.01308 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01 0.268
Models ranked by AICc(x)
# select models using Royall's 1/8 rule for strength of evidence
# IMPORTANT: Weights have been renormalized!!
subset(out.put, 1/8 < weight/max(out.put$weight))
>
Model selection table
(Int) dst elv slp df logLik AICc delta weight
mod1 -0.01328 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00 0.732
mod4 -0.01308 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01 0.268
Models ranked by AICc(x)
Not much different from delta < 5 above. Let’s try another criterion, based on the cumulative sum of the model weights.
# select models 95% cumulative weight criteria
# IMPORTANT: Weights have been renormalized!!
subset(out.put, cumsum(out.put$weight) <= .95)
>
Model selection table
(Int) dst elv df logLik AICc delta weight
mod1 -0.01328 5.195e-05 1.832e-05 4 863.907 -1719.7 0 1
Models ranked by AICc(x)
In most circumstances, you would like to include model selection results in a table in a report, publication, or thesis. Here, we need to coerce the output from the mod.sel function into a data frame. Elements 6 through 10 of that data frame contain what we want. How do I know that? I first created the data frame and used the “str” function to see what elements it contained.
# coerce the object out.put into a data frame
# elements 6-10 in out.put have what we want
sel.table<-as.data.frame(out.put)[6:10]
sel.table
>
df logLik AICc delta weight
mod1 4 863.9067 -1719.653 0.000000 7.196856e-01
mod4 5 863.9437 -1717.646 2.006937 2.638408e-01
mod3 4 860.1296 -1712.099 7.554120 1.647353e-02
mod2 4 819.4916 -1630.823 88.830152 3.697604e-20
This is a bit messy and not ready for any report. Let’s clean this up a bit -- first by rounding.
# a little clean-up, lets round things a bit
sel.table[,2:3]<- round(sel.table[,2:3],2)
sel.table[,4:5]<- round(sel.table[,4:5],3)
# that’s better
sel.table
>
df logLik AICc delta weight
mod1 4 863.91 -1719.65 0.000 0.720
mod4 5 863.94 -1717.65 2.007 0.264
mod3 4 860.13 -1712.10 7.554 0.016
mod2 4 819.49 -1630.82 88.830 0.000
# how about renaming a few columns to fit proper conventions
# number of parameters (df) should be K
names(sel.table)[1] = "K"
## let's be sure to put the model names in a column
sel.table$Model<-rownames(sel.table)
# replace model names with their formulas; a little tricky, so be careful
for(i in 1:nrow(sel.table)) sel.table$Model[i]<- as.character(formula(paste(sel.table$Model[i])))[3]
# let's see what is in there
sel.table
>
df logLik AICc delta weight Model
mod1 4 863.91 -1719.65 0.000 0.720 distance + elev
mod4 5 863.94 -1717.65 2.007 0.264 slope + distance + elev
mod3 4 860.13 -1712.10 7.554 0.016 slope + distance
mod2 4 819.49 -1630.82 88.830 0.000 slope + pct.cover
#little reordering of columns
sel.table<-sel.table[,c(6,1,2,3,4,5)]
sel.table
>
Model K logLik AICc delta weight
mod1 distance + elev 4 863.91 -1719.65 0.000 0.720
mod4 slope + distance + elev 5 863.94 -1717.65 2.007 0.264
mod3 slope + distance 4 860.13 -1712.10 7.554 0.016
mod2 slope + pct.cover 4 819.49 -1630.82 88.830 0.000
Now that’s more like it. Models are ordered from best to worst fitting. You can usually drop AICc if the log-likelihood (logLik) is in the table, but we’ll leave it here. This is about ready for a report, so let’s write it to a comma-separated file.
# write to a file, here a comma separated values format
# make sure your working directory is properly specified
write.csv(sel.table,"My model selection table.csv", row.names = F)
The default model selection criterion for all MuMIn functions is AICc. If you prefer another criterion, you can specify it using the “rank” option in mod.sel. Below is code for selecting models using the Bayesian or Schwarz’s information criterion (BIC), consistent AIC with Fisher’s information matrix (CAICF), and quasi-AIC (QAIC; more on this below).
# model selection criteria BIC
mod.sel(mod1,mod2,mod3,mod4, rank = BIC)
>
Model selection table
(Int) dst elv pct.cvr slp df logLik BIC delta
mod1 -0.013280 5.195e-05 1.832e-05 4 863.907 -1705.6 0.00
mod4 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1700.2 5.47
mod3 -0.012920 5.207e-05 0.08634 4 860.130 -1698.1 7.55
mod2 0.003614 2.777e-05 0.05374 4 819.492 -1616.8 88.83
weight
mod1 0.919
mod4 0.060
mod3 0.021
mod2 0.000
Models ranked by BIC(x)
# consistent AIC with Fisher's information matrix
mod.sel(mod1,mod2,mod3,mod4, rank = CAICF)
>
Model selection table
(Int) dst elv pct.cvr slp df logLik CAICF delta
mod3 -0.012920 5.207e-05 0.08634 4 860.130 -1642.9 0.00
mod1 -0.013280 5.195e-05 1.832e-05 4 863.907 -1633.1 9.80
mod4 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1619.2 23.66
mod2 0.003614 2.777e-05 0.05374 4 819.492 -1565.5 77.37
weight
mod3 0.993
mod1 0.007
mod4 0.000
mod2 0.000
Models ranked by CAICF(x)
There also are MuMIn functions for calculating individual model selection criteria, such as AIC, AICc, BIC, and Mallows' Cp -- the last an ad hoc model selection criterion that is not recommended.
# Mallows Cp
Cp(mod4)
[1] 0.01757519
#AIC
AIC(mod1,mod2)
df AIC
mod1 4 -1719.813
mod2 4 -1630.983
Note that above the df is actually the number of model parameters, usually defined as K.
# CAICF
CAICF(mod1, mod2)
CAICF
mod1 -1633.082
mod2 -1565.503
The relative importance of individual parameters can also be examined using the model weights. Here, the Akaike weights for each model that contains the parameter of interest are summed. These have been defined as importance weights and you can obtain them from a mod.sel object using the “importance” function.
# Importance weights for individual predictor variables
# calculated using the importance function
importance(out.put)
>
distance elev slope pct.cover
Importance: 1 0.98 0.28 <0.01
N containing models: 3 2 3 1
Looking at the output above, there is plenty of evidence for distance and elev (weights close to one), but much less for pct.cover. The number of candidate models in which a parameter occurs can have a big effect on its importance weight. For example, the intercept is included in all models, so its importance weight is always 1 (hence it is never shown). In the output above, pct.cover is in only one model, so interpret its weight with caution.
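You can verify these importance weights by hand: each is just the sum of the Akaike weights of the models containing that term (weights taken from the model selection table above).

```r
# model weights from the AICc model selection table above
w <- c(mod1 = 0.7196856, mod4 = 0.2638408, mod3 = 0.01647353, mod2 = 3.7e-20)

sum(w[c("mod1", "mod3", "mod4")])   # distance:  1.00
sum(w[c("mod1", "mod4")])           # elev:      0.98
sum(w[c("mod2", "mod3", "mod4")])   # slope:     0.28
sum(w["mod2"])                      # pct.cover: <0.01
```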
Model averaging is a means to incorporate model selection uncertainty. Here, the parameter estimates for each candidate model are weighted using the corresponding model weights and summed. There are two methods of model averaging, defined by Burnham and Anderson. The first is

theta.bar(j) = sum over the models i containing xj of [ w(i) * theta.hat(j,i) ] / sum over those models of w(i),

where parameter estimates are averaged only over the models in which predictor xj occurs (the weights are renormalized over that subset). The second is

theta.tilde(j) = sum over all models i of [ w(i) * theta.hat(j,i) ], with theta.hat(j,i) = 0 when xj is not in model i,

where parameter estimates are averaged over all models, not just those in which predictor xj occurs. The MuMIn function model.avg conducts both types of model averaging and reports the first type as “subset” and the second type as “full.”
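The two estimators are easy to check by hand for slope, using the coefficients and weights from the model selection table above (slope is absent from mod1, so its coefficient there is zero):

```r
# model weights and slope coefficients from the model selection table above
w   <- c(mod1 = 0.7196856, mod4 = 0.2638408, mod3 = 0.01647353, mod2 = 3.7e-20)
slp <- c(mod1 = 0,         mod4 = -0.01096,  mod3 = 0.08634,    mod2 = 0.05374)
in.model <- c(FALSE, TRUE, TRUE, TRUE)   # which models contain slope

# "subset": renormalize weights over the models containing slope
sum(w[in.model] * slp[in.model]) / sum(w[in.model])   # -0.00524
# "full": average over all models, with 0 where slope is absent
sum(w * slp)                                          # -0.00147
```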
# Model average using all candidate models, always use revised.var = TRUE
MA.ests<-model.avg(out.put, revised.var = TRUE)
>
Call:
model.avg.model.selection(object = out.put, revised.var = TRUE)
Component models:
‘12’ ‘124’ ‘14’ ‘34’
Coefficients:
(Intercept) distance elev slope pct.cover
subset -0.01322053 5.191484e-05 1.877402e-05 -0.005241959 2.777260e-05
full -0.01322053 5.191484e-05 1.846475e-05 -0.001469396 1.026921e-24
Above are the two types of model-averaged coefficients.
Here are your estimates and unconditional standard errors; you want the Adjusted SE and the upper and lower confidence limits (CI).
MA.ests$avg.model
>
Estimate Std. Error Adjusted SE Lower CI Upper CI
(Intercept) -1.322053e-02 2.196457e-03 2.207049e-03 -1.754627e-02 -8.894795e-03
distance 5.191484e-05 5.221077e-06 5.246295e-06 4.163229e-05 6.219739e-05
elev 1.877402e-05 4.910574e-06 4.933771e-06 9.104006e-06 2.844403e-05
slope -5.241959e-03 4.582284e-02 4.598957e-02 -9.537986e-02 8.489594e-02
pct.cover 2.777260e-05 2.716872e-05 2.729983e-05 -2.573408e-05 8.127928e-05
# here are the beta-tilde-bar (full, or shrinkage) MA estimates
MA.ests$coef.shrinkage
>
(Intercept) distance elev slope pct.cover
-1.322053e-02 5.191484e-05 1.846475e-05 -1.469396e-03 1.026921e-24
# you can also obtain importance weights for individual params
MA.ests$importance
>
distance elev slope pct.cover
Importance: 1 0.98 0.28 <0.01
N containing models: 3 2 3 1
We can use other options to select models and create a composite model from only those models in our confidence set. For example:
#create model averaged estimates for parameters in confidence set of
# models only using subset command
MA.ests<-model.avg(out.put, subset= delta < 5, revised.var = TRUE)
MA.ests$avg.model
>
Estimate Std. Error Adjusted SE Lower CI Upper CI
(Intercept) -1.322554e-02 2.194158e-03 2.204742e-03 -1.754676e-02 -8.904326e-03
distance 5.191232e-05 5.219487e-06 5.244699e-06 4.163290e-05 6.219174e-05
elev 1.877402e-05 4.910574e-06 4.933771e-06 9.104006e-06 2.844403e-05
slope -1.096031e-02 4.060037e-02 4.079708e-02 -9.092112e-02 6.900050e-02
# lets clean up a bit and write the table to a file
MA.est.table<-round(MA.ests$avg.model[,c(1,3:5)],6)
MA.est.table
>
Estimate Adjusted SE Lower CI Upper CI
(Intercept) -0.013226 0.002205 -0.017547 -0.008904
distance 0.000052 0.000005 0.000042 0.000062
elev 0.000019 0.000005 0.000009 0.000028
slope -0.010960 0.040797 -0.090921 0.069001
## write to a CSV file
write.csv(MA.est.table, "My model averaged estimates.csv")
For normal linear models, you can use the model-averaged parameter estimates to predict the response (here, density) for various values of the independent variables (a.k.a. predictors). This is equivalent to predicting the response with each candidate model and averaging the predictions using the corresponding model weights. Note that this shortcut will not work for non-normal models, such as logistic or Poisson regression. We can use the output from MuMIn functions, along with some base R functions, to conduct model averaging of predicted values. For example, below is the code for calculating model-averaged predictions.
# extract the component models using the get.models function
# (keep all models here so the dimensions match Weights(out.put) below)
pred.parms<-get.models(out.put, subset = TRUE)
# predict values using each model, here were just using the
# the example dataset, you could use a new dataset
model.preds = sapply(pred.parms, predict, newdata = dater)
>
mod1 mod4 mod3 mod2
1 1.572387e-02 1.561711e-02 1.647889e-02 0.009264373
2 8.710879e-03 8.812682e-03 8.035629e-03 0.007606412
3 4.596332e-03 4.510560e-03 3.989456e-03 0.011867896
4 1.125713e-02 1.133978e-02 8.993757e-03 0.011941525
(rest of output not shown)
Above are the predictions made with each candidate model (output cut off a bit). Now we need to weight each by its model weight and sum (just like the model-averaged coefficients above). The easiest way to do this is matrix multiplication.
# weight the prediction from each model by its AIC weight
# the "Weights" function extracts the weights
# we also are using matrix multiplication %*%
mod.ave.preds<-model.preds %*% Weights(out.put)
mod.ave.preds
[,1]
1 1.570814e-02
2 8.726615e-03
3 4.563705e-03
4 1.124165e-02
5 1.518054e-02
6 7.068765e-03
(rest of output not shown)
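The matrix product above is just a weighted sum of each row of predictions. You can check the first model-averaged prediction by hand using the values printed above:

```r
# predictions for observation 1 (one per model, in ranked order) and the weights
preds.1 <- c(mod1 = 1.572387e-02, mod4 = 1.561711e-02,
             mod3 = 1.647889e-02, mod2 = 9.264373e-03)
w <- c(mod1 = 0.7196856, mod4 = 0.2638408, mod3 = 0.01647353, mod2 = 3.7e-20)

sum(preds.1 * w)   # 0.01570814, matching row 1 of mod.ave.preds
```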
A more interesting application of model averaging for plotting is to create a dataset where everything but a single predictor (elevation, below) is set at its mean value.
# elevation ranges from the observed minimum to maximum
elev=c(min(dater$elev):max(dater$elev))
# create plotdata data frame with mean values
plotdata<-as.data.frame(lapply(lapply(dater[5:8],mean),rep,length(elev)))
plotdata<-cbind(elev,plotdata)
# now predict density for the plot data with each model
model.preds = sapply(pred.parms, predict, newdata = plotdata)
# weight the prediction from each model by its AIC weight
# and sum (matrix multiplication)
mod.ave4plot<-model.preds %*% Weights(out.put)
# plot the model averaged predicted densities vs elevation
plot(mod.ave4plot~ elev, type = 'l', xlab="Elevation (m)", ylab="Model averaged predicted density")
Another useful MuMIn function is "dredge." However, you should only use it for exploratory purposes. Data dredging is strongly discouraged and can result in spurious (irrelevant or, worse, wrong) results and inference. So read the message below, and users beware.
## FOR EXPLORATORY PURPOSES ONLY!!! NEVER EVER DO THIS FOR A REAL
## STUDY
# fit model with all parameters
all.parms<-lm(density~slope+distance+elev+ pct.cover, data = dater)
# the dredge function fits all combinations
# of the variables in the all.parms model fit above
results<-dredge(all.parms)
results
>
Model selection table
(Int) dst elv pct.cvr slp df logLik AICc delta
4 -0.013280 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00
8 -0.014560 5.177e-05 1.860e-05 2.362e-05 5 864.448 -1718.7 1.00
12 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01
16 -0.014360 5.162e-05 2.044e-05 2.377e-05 -0.01187 6 864.491 -1716.6 3.01
(rest not shown)
# grab best supported models
subset(results, delta <5)
>
Model selection table
(Int) dst elv pct.cvr slp df logLik AICc delta weight
4 -0.01328 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00 0.455
8 -0.01456 5.177e-05 1.860e-05 2.362e-05 5 864.448 -1718.7 1.00 0.276
12 -0.01308 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01 0.167
16 -0.01436 5.162e-05 2.044e-05 2.377e-05 -0.01187 6 864.491 -1716.6 3.01 0.101
Models ranked by AICc(x)
#grab best model
subset(results, delta == 0)
>
Global model call: lm(formula = density ~ slope + distance + elev + pct.cover, data = dater)
---
Model selection table
(Int) dst elv df logLik AICc delta weight
4 -0.01328 5.195e-05 1.832e-05 4 863.907 -1719.7 0 1
Models ranked by AICc(x)
# calculate variable importance weights
importance(results)
>
distance elev pct.cover slope
Importance: 1.00 0.98 0.38 0.28
N containing models: 8 8 8 8
Notice above that every parameter occurs in the same number of models, so the importance weights are directly comparable.
# use another model selection criteria
results<-dredge(all.parms, rank = BIC)
results
>
Global model call: lm(formula = density ~ slope + distance + elev + pct.cover, data = dater)
---
Model selection table
(Int) dst elv pct.cvr slp df logLik BIC delta
4 -0.013280 5.195e-05 1.832e-05 4 863.907 -1705.6 0.00
8 -0.014560 5.177e-05 1.860e-05 2.362e-05 5 864.448 -1701.2 4.46
12 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1700.2 5.47
10 -0.012920 5.207e-05 0.08634 4 860.130 -1698.1 7.55
(rest not shown)
# only allow a maximum of 3 and minimum of 1 parameters in each model
results<-dredge(all.parms,m.max =3, m.min = 1)
results
>
Model selection table
(Int) dst elv pct.cvr slp df logLik AICc delta
4 -0.013280 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00
8 -0.014560 5.177e-05 1.860e-05 2.362e-05 5 864.448 -1718.7 1.00
12 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01
10 -0.012920 5.207e-05 0.08634 4 860.130 -1712.1 7.55
(rest not shown)
# fit all models but do not include models with both slope and elevation
results<-dredge(all.parms, subset= !(slope && elev))
results
>
Model selection table
(Int) dst elv pct.cvr slp df logLik AICc delta weight
4 -0.013280 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00 0.609
8 -0.014560 5.177e-05 1.860e-05 2.362e-05 5 864.448 -1718.7 1.00 0.370
10 -0.012920 5.207e-05 0.08634 4 860.130 -1712.1 7.55 0.014
(rest not shown)
# include elevation in all models
results<-dredge(all.parms,fixed =c("elev"))
results
>
Model selection table
(Int) dst elv pct.cvr slp df logLik AICc delta weight
2 -0.013280 5.195e-05 1.832e-05 4 863.907 -1719.7 0.00 0.455
4 -0.014560 5.177e-05 1.860e-05 2.362e-05 5 864.448 -1718.7 1.00 0.276
6 -0.013080 5.181e-05 2.002e-05 -0.01096 5 863.944 -1717.6 2.01 0.167
(rest not shown)
# objects created with dredge can also be used to
# create model averaged parameters
MA.ests<-model.avg(results, subset= delta < 2, revised.var = TRUE)
MA.ests$avg.model
>
Estimate Std. Error Adjusted SE Lower CI Upper CI
(Intercept) -1.376386e-02 2.373991e-03 2.384677e-03 -1.843774e-02 -9.089980e-03
distance 5.188373e-05 5.210960e-06 5.236138e-06 4.162108e-05 6.214637e-05
elev 1.842395e-05 3.598801e-06 3.616170e-06 1.133638e-05 2.551151e-05
pct.cover 2.362281e-05 2.286638e-05 2.297717e-05 -2.141161e-05 6.865722e-05
All of the functions above can be used with objects created by fitting linear models with the glm function, and everything above applies to those glm objects. However, there is one important difference that you should know about. GLMs such as Poisson and logistic regression often fail to meet statistical assumptions due to extra variance. This is defined as overdispersion (not an issue for normal linear regression!) and requires the use of quasi-AIC for model selection. Here is an example of overdispersion in count data fit using Poisson regression and the glm function.
# fit global Poisson regression model
global.mod<-glm(count~area+distance+elev+ slope, data = dater, family = poisson)
summary(global.mod)
>
Call:
glm(formula = count ~ area + distance + elev + slope, family = poisson,
data = dater)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.6618 -2.0085 -0.9933 0.9147 3.6812
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.972e+00 1.113e-01 -26.709 < 2e-16 ***
area 2.654e-03 5.533e-05 47.972 < 2e-16 ***
distance 5.317e-03 1.552e-04 34.248 < 2e-16 ***
elev 2.095e-03 2.765e-04 7.576 3.57e-14 ***
slope -4.664e+00 1.589e+00 -2.936 0.00333 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4460.39 on 254 degrees of freedom
Residual deviance: 898.82 on 250 degrees of freedom
AIC: 1603.1
# calculate c-hat to evaluate model assumptions; chat > 1 indicates overdispersion
chat<-sum(residuals(global.mod,"pearson")^2)/global.mod$df.residual
chat
[1] 2.898076
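The same calculation in a self-contained form, using simulated data (a toy example; because these counts are generated from a true Poisson model, c-hat should come out near 1):

```r
# simulate counts from a true Poisson model and estimate c-hat
set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.0 * x))
fit <- glm(y ~ x, family = poisson)

# c-hat: sum of squared Pearson residuals over the residual df
chat <- sum(residuals(fit, "pearson")^2) / fit$df.residual
chat   # close to 1, i.e., no evidence of overdispersion here
```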
To account for the overdispersion, we use quasi-Poisson regression by specifying “quasipoisson” in the glm function.
# fit global quasipoisson regression model to global model
global.mod<-glm(count~area+distance+elev+ slope, data = dater, family = quasipoisson)
summary(global.mod)
>
Call:
glm(formula = count ~ area + distance + elev + slope, family = quasipoisson,
data = dater)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.6618 -2.0085 -0.9933 0.9147 3.6812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.9715045 0.1898075 -15.655 < 2e-16 ***
area 0.0026545 0.0000944 28.118 < 2e-16 ***
distance 0.0053167 0.0002649 20.074 < 2e-16 ***
elev 0.0020946 0.0004717 4.440 1.35e-05 ***
slope -4.6641697 2.7104978 -1.721 0.0865 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 2.910718)
Null deviance: 4460.39 on 254 degrees of freedom
Residual deviance: 898.82 on 250 degrees of freedom
AIC: NA
We then create our candidate model set, just as we did with the lm function and the density response variable.
# Candidate set
modl2<-glm(count~area+slope, data = dater, family = quasipoisson)
modl3<-glm(count~area+distance, data = dater, family = quasipoisson)
modl4<-glm(count~area+elev, data = dater, family = quasipoisson)
# try model selection with quasi-AICc (QAICc); be sure to supply the
# chat of the global model using the rank.args argument of the model.sel function
quasi.MS<-model.sel(global.mod,modl2,modl3,modl4, rank = QAICc, rank.args = alist(chat = chat))
## lets see what we got
as.data.frame(quasi.MS)
>
(Intercept) area distance elev slope df logLik
global.mod -2.9715045 0.002654473 0.005316652 0.002094616 -4.664170 5 NA
modl2 -0.7917247 0.002441674 NA NA 3.750308 3 NA
modl3 -2.5867633 0.002665374 0.005324297 NA NA 3 NA
modl4 -0.8972105 0.002437204 NA 0.001056615 NA 3 NA
QAICc delta weight
global.mod NA NA NA
modl2 NA NA NA
modl3 NA NA NA
modl4 NA NA NA
Houston, we have a problem! Look at all of those NA's in place of QAICc, delta, and weight -- something is wrong. This is because glm does not supply the log-likelihood for quasipoisson (and quasibinomial) regression. We need to create a function that will trick glm into giving up the log-likelihood. Here it is below.
# This is needed to get the likelihood and calculate QAICc
x.quasipoisson <- function(...) {
res <- quasipoisson(...)
res$aic <- poisson(...)$aic
res
}
Now we use this family, together with the “update” function, to set things up for model selection. This should be done for each candidate model.
# update the models so you can get the log likelihood
global.mod<-update(global.mod,family = "x.quasipoisson")
modl2<-update(modl2,family = "x.quasipoisson")
modl3<-update(modl3,family = "x.quasipoisson")
modl4<-update(modl4,family = "x.quasipoisson")
We are now good to go for model selection using QAICc.
# now conduct model selection
quasi.MS<-model.sel(global.mod,modl2,modl3,modl4, rank = QAICc, rank.args = alist(chat = chat))
as.data.frame(quasi.MS)
>
(Intercept) area distance elev slope df logLik
global.mod -2.9715045 0.002654473 0.005316652 0.002094616 -4.664170 5 -796.5328
modl3 -2.5867633 0.002665374 0.005324297 NA NA 3 -856.1141
modl4 -0.8972105 0.002437204 NA 0.001056615 NA 3 -1325.0153
modl2 -0.7917247 0.002441674 NA NA 3.750308 3 -1345.5074
QAICc delta weight
global.mod 562.0363 0.00000 1.000000e+00
modl3 598.9754 36.93913 9.522895e-09
modl4 922.5702 360.53391 5.141100e-79
modl2 936.7121 374.67577 4.367061e-82
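As a check on what QAICc is doing, the top-model value can be reproduced by hand. Note that K is the number of model parameters plus one for the estimated c-hat; n = 255 observations is an assumption, taken from the residual df of the global model fit.

```r
# QAICc = -2*logLik/chat + 2*K + 2*K*(K + 1)/(n - K - 1)
logL <- -796.5328   # logLik of global.mod from the table
chat <- 2.898076    # estimated earlier from the global model
K    <- 5 + 1       # 5 estimated parameters + 1 for chat
n    <- 255         # sample size (assumed; see note above)
-2 * logL / chat + 2 * K + (2 * K * (K + 1)) / (n - K - 1)   # ~562.04, matching global.mod
```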
We can now use any of the other MuMIn functions, just like above. For example, we could use subset to select the best model.
## get the best model
subset(quasi.MS, delta == 0)
>
Model selection table
(Intrc) area dstnc elev slope df logLik QAICc delta
global.mod -2.972 0.002654 0.005317 0.002095 -4.664 5 -796.533 562 0
weight
global.mod 1
Models ranked by QAICc(x, chat = chat)
# yes, dredge works but only on updated model
dredge(global.mod, rank = "QAICc", chat = chat)
>
Global model call: glm(formula = count ~ area + distance + elev + slope, family = "x.quasipoisson", data = dater)
---
Model selection table
(Intrc) area dstnc elev slope df logLik QAICc delta weight
16 -2.9720 0.002654 0.005317 0.002095 -4.664 5 -796.533 562.0 0.00 0.612
8 -3.0120 0.002636 0.005336 0.001364 4 -800.893 562.9 0.91 0.388
(rest not shown)
MuMIn functions also can be used for model selection with generalized linear mixed models. Here we’ll use the lme4 package and conduct model selection with a hierarchical logistic regression model. Be sure to have lme4 installed.
#load lme4 package
require(lme4)
Let's load the example dataset.
## read data directly from website
trout<-read.csv("https://drive.google.com/file/d/1c9woE1O6FVZwPcdtRwwCMEu94I-fo6Du/view?usp=sharing")
## or read downloaded file
# trout<-read.csv("Westslope.csv")
#lets see what is in the file
head(trout)
The data contain presence and absence records for Westslope cutthroat trout from streams in watersheds of the Interior Columbia River Basin. The file contains the following data:
PRESENCE- species presence (1) or absence (0)
WSHD - Watershed ID
SOIL_PROD - percent of watershed with productive soils
GRADIENT- gradient (%) of the stream
WIDTH- Mean width of the stream in meters
As with all model selection exercises, you should first fit the global model and evaluate model assumptions, such as the distribution of the residuals, independence, etc. Below we fit a global model (model1) and four candidate models to the Westslope data with the glmer function. This is logistic regression, so “family = binomial.”
## fit logistic regression with random effect output to model1
model1 <-glmer(PRESENCE ~ SOIL_PROD + GRADIENT + WIDTH + (1|WSHD),data = trout, family = binomial)
summary(model1)
>
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: PRESENCE ~ SOIL_PROD + GRADIENT + WIDTH + (1 | WSHD)
Data: trout
AIC BIC logLik deviance df.resid
423.0 448.1 -206.5 413.0 1115
Scaled residuals:
Min 1Q Median 3Q Max
-3.2280 -0.1067 -0.0278 -0.0053 13.9827
Random effects:
Groups Name Variance Std.Dev.
WSHD (Intercept) 19.41 4.406
Number of obs: 1120, groups: WSHD, 56
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.40275 1.83118 -3.497 0.000471 ***
SOIL_PROD 0.07282 0.03178 2.291 0.021964 *
GRADIENT 0.32147 0.04009 8.019 1.06e-15 ***
WIDTH -0.60649 0.07601 -7.979 1.47e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) SOIL_P GRADIE
SOIL_PROD -0.808
GRADIENT -0.362 0.156
WIDTH 0.019 -0.140 -0.485
# fit remaining candidate models
model2 <-glmer(PRESENCE ~ GRADIENT + WIDTH + (1|WSHD),data = trout, family = binomial)
model3 <-glmer(PRESENCE ~ SOIL_PROD + (1|WSHD),data = trout, family = binomial)
model4 <-glmer(PRESENCE ~ SOIL_PROD + WIDTH + (1 + SOIL_PROD|WSHD),data = trout, family = binomial)
model5 <-glmer(PRESENCE ~ SOIL_PROD + WIDTH + (1 |WSHD),data = trout, family = binomial)
# conduct model selection using BIC
my.models<-model.sel(model1,model2,model3,model4,model5,rank=BIC)
my.models
>
Model selection table
(Int) GRA SOI_PRO WID random df logLik BIC delta weight
model2 -3.410 0.3186 -0.6040 W 4 -209.270 446.6 0.00 0.681
model1 -6.403 0.3215 0.07282 -0.6065 W 5 -206.520 448.1 1.52 0.319
model4 -6.667 0.12800 -0.4401 1+S_P|W 6 -257.123 556.4 109.75 0.000
model5 -1.910 0.04778 -0.4397 W 4 -264.977 558.0 111.41 0.000
model3 -5.233 0.04074 W 3 -309.658 640.4 193.75 0.000
Models ranked by NULL
Random terms:
W = ‘1 | WSHD’
1+S_P|W = ‘1 + SOIL_PROD | WSHD’
The best approximating model is model2, which contains gradient, width, and a randomly varying intercept. We also can use any other MuMIn function, e.g.:
# calculate importance weights
importance(my.models)
>
GRADIENT WIDTH SOIL_PROD
Importance: 1.00 1.00 0.32
N containing models: 2 4 4
## yes dredge works here too, yikes!
dredge(model1, rank = BIC)
>
Model selection table
(Int) GRA SOI_PRO WID df logLik BIC delta weight
6 -3.41000 0.3186 -0.6040 4 -209.270 446.6 0.00 0.681
8 -6.40300 0.3215 0.07282 -0.6065 5 -206.520 448.1 1.52 0.319
2 -6.80900 0.2375 3 -262.364 545.8 99.17 0.000
4 -9.05000 0.2390 0.05337 4 -259.790 547.7 101.04 0.000
5 0.07927 -0.4396 3 -267.178 555.4 108.79 0.000
7 -1.91000 0.04778 -0.4397 4 -264.977 558.0 111.41 0.000
1 -3.54100 2 -311.858 637.8 191.13 0.000
3 -5.23300 0.04074 3 -309.658 640.4 193.75 0.000
Models ranked by BIC(x)
Random terms (all models):
‘1 | WSHD’
One word of warning: do not estimate model-averaged parameters for mixed models! You can, however, model average the predictions of a GLMM.
Useful references
Anderson, D. R., K. P. Burnham and W. L. Thompson. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64:912-923.
Anderson, D. R., W. A. Link, D. H. Johnson, and K. P. Burnham. 2001. Suggestions for presenting the results of data analyses. Journal of Wildlife Management 65:373-378.
Anderson, D. R., and K. P. Burnham. 2002. Avoiding pitfalls when using information-theoretic methods. Journal of Wildlife Management 66:910-916.
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. 2nd edition. Springer-Verlag, New York.
Examples of papers using AICc and importance weights
Rieman, B. E., J. T. Peterson, and D. L. Myers. 2006. Have brook trout displaced bull trout in streams of central Idaho? An empirical analysis of distributions along elevation and thermal gradients. Canadian Journal of Fisheries and Aquatic Sciences 63:63-78.
McCargo, J.W. and J.T. Peterson. 2010. An evaluation of the influence of seasonal base flow and geomorphic stream characteristics on Coastal Plain stream fish assemblages. Transactions of the American Fisheries Society 139: 29-48.