As we have already seen, it is common in CMR analysis that more than one model is capable of explaining the data, even if we confine ourselves to operating within a particular model structure (like CJS). A few features are typical of competing CMR models:
There is generally a "most complex" or "global" model that includes variation in parameters over groups, time occasions, etc. E.g., the Phi(g*t) p(g*t) model for CJS
Models can be formed from this global model that have different numbers of parameters (complexity), with some having parameters that are constant over groups, times, or other dimensions, or that express variation over time or individuals via covariate relationships.
All the models (global and sub-models) provide estimates or predictions of the "real parameter" values of the global model, and
These estimates can, and typically do, differ in value among the alternative models.
Given the above, there are some interrelated questions that need to be asked:
What alternative models should I consider?
Which model(s) should I use for inference?
How do I get estimates and confidence intervals for the real parameters and predictions of interest, given that these are different among the models?
Selection of models
Thought should be given to the selection of a candidate set of models. Obviously, one of the candidate models needs to be the "global model", but which others? For some model structures this would be a fairly small list, but for others, taking "all possible submodels" could produce a very long list indeed. For the CJS model structure we immediately get 16 models, including the global model, if we have groups and time involved, and that is not counting design matrix models, nor the various permutations of parameters that are constant over some occasions and time varying over others (e.g., first 2 years constant and last 8 time varying; next 2 constant and remaining 8 time varying; first and last constant; etc.).
Some immediate questions to ask about a candidate model are:
First, does the global model from which it was developed fit the data? See the previous section for a discussion of model fit and overdispersion correction.
Second, is the model plausible biologically? E.g., many of the permutation examples above are nonsensical and should not be included;
or, is the model at least something that could arise on statistical sampling grounds (e.g., we probably don't expect Phi to really be constant over time or groups, but the data may not support a more complex model).
Finally, there are often a priori hypotheses of interest, and where possible these should be expressed in terms of model parameters and constraints. Some examples (a sketch of how these translate into RMark model formulas follows this list):
Survival is greater for males than females: Phi(sex) and Phi(.) models should both be in the set
Survival changes as a linear function of temperature: include Phi(.), Phi(t), and Phi(covariate) models in the set (we will consider covariate models in the next lab).
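To make this concrete, the alternatives above translate into RMark model formulas roughly as follows. This is a sketch only: the object names on the left are illustrative, and each list would later be passed to mark() through its model.parameters argument.
Phi.dot  <- list(formula=~1)     # Phi(.):   survival constant over sex and time
Phi.sex  <- list(formula=~sex)   # Phi(sex): survival differs between the sexes
Phi.time <- list(formula=~time)  # Phi(t):   survival varies freely among occasions
Phi.Time <- list(formula=~Time)  # Phi(T):   linear trend over time; an environmental
                                 # covariate such as temperature is specified in a
                                 # similar way (covariate models: next lab)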
Model comparison via likelihood ratio
Because all of our models are based on maximum likelihood (ML), ML can be used to conduct traditional tests known as likelihood ratio tests (LRT). In an LRT, one model (H0) is obtained as a special case of the other, more general model (Ha) by constraining parameters. For example, we could compare H0: Phi(.) vs. Ha: Phi(t) by setting the Phi parameters equal over time, finding the ML estimates and likelihood value under each model, and computing the likelihood ratio test statistic from the two likelihoods (a by-hand sketch of this computation appears after the list below). MARK provides all the ingredients to do LRT, but we will usually not use this approach for several reasons, some philosophical, others practical:
LRT (and null hypothesis testing in general) is more appropriate to an experimental approach, where we are controlling a factor, than to an observational paradigm
LRT involves pairwise comparisons among models, which becomes tedious if we have a large number of candidates
LRT requires that H0 is a subset of Ha, but many models we construct are not nested subsets (e.g., Phi(t)p(.) is not nested in Phi(.)p(t), although both are nested in Phi(t)p(t))
By focusing on hypothesis testing, LRT discards information about the "losing" models. By contrast multi-model inference (MMI) retains that information.
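Although we will rely on AIC below, for reference here is a sketch of how an LRT could be computed by hand from the output of two nested models. The deviance values and parameter counts are placeholders, not results from the Dipper data; the key point is that the test statistic is the difference in -2 log-likelihoods (equivalently, in deviances), compared to a chi-square distribution.
# placeholder deviances and parameter counts for two nested models
dev.H0 <- 670.8   # reduced model, e.g., Phi(.)p(.)
dev.Ha <- 660.2   # more general model, e.g., Phi(t)p(.)
K.H0 <- 2
K.Ha <- 7
# LRT statistic and p-value
lrt <- dev.H0 - dev.Ha            # difference in -2 ln(L) = difference in deviances
df  <- K.Ha - K.H0                # difference in number of parameters
p.value <- 1 - pchisq(lrt, df)    # reject H0 for small p-values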
Model comparison via AIC; Model averaged estimation and prediction
Akaike's Information Criterion (AIC) is also based on likelihood, but avoids the above drawbacks of LRT. AIC is computed from the deviance (a standard part of the output from MARK/RMark models, described earlier as a measure of model fit to the data) plus a penalty of 2 times the number of model parameters. By definition, our global model will have the lowest deviance and all other models will have higher deviance values. Further adjustments are made for small sample sizes (AICc) or overdispersion correction (QAICc), as illustrated below. The AIC (with adjustments) values are then transformed to produce normalized weights that are between 0 and 1 for each model and sum to 1 across the candidate models.
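For reference, the standard formulas (as in Burnham and Anderson) are, with K the number of estimated parameters, n the effective sample size, c-hat the overdispersion estimate, and Delta_i the difference between model i's criterion value and the smallest value in the candidate set:
\[
\begin{aligned}
\mathrm{AIC} &= -2\ln L + 2K\\
\mathrm{AIC}_c &= \mathrm{AIC} + \frac{2K(K+1)}{n-K-1}\\
\mathrm{QAIC}_c &= \frac{-2\ln L}{\hat{c}} + 2K + \frac{2K(K+1)}{n-K-1}\\
w_i &= \frac{\exp(-\Delta_i/2)}{\sum_j \exp(-\Delta_j/2)}
\end{aligned}
\]
MARK and RMark report the deviance rather than -2 ln L, but the two differ only by a constant (the saturated-model term) that is the same for every model fit to the same data, so AIC differences and weights come out identical either way.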
AIC values and model weights can be used in a variety of ways and these are not mutually exclusive:
To rank the models from highest to lowest weight for comparison of relative model support
To select a top model or subset of models based on a weight cutoff or "confidence set" of models
To average estimates over the multiple models, with appropriate model weighting
AIC values and model weights can be fairly easily computed from standard model output for any ML model. Alternatively, both RMark and MARK have built-in procedures that automate these calculations. Users of RMark are free to write their own variants of these procedures as user-defined functions in R (a sketch follows below), which may come in handy for the few cases where the built-in functions are unavailable or fail.
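As an illustration of such a user-defined variant (a sketch only, not the RMark or MARK internals), AICc values, delta AICc, and normalized weights can be computed from a small table of -2 log-likelihoods (or deviances) and parameter counts:
# Sketch: AICc, delta AICc, and normalized weights for a model set,
# given -2ln(L) (or deviance), parameter count K, and effective sample size n.
aic.table <- function(neg2lnL, K, n) {
  aicc  <- neg2lnL + 2*K + 2*K*(K+1)/(n - K - 1)
  delta <- aicc - min(aicc)
  w     <- exp(-delta/2)
  w     <- w/sum(w)                      # normalize so the weights sum to 1
  data.frame(neg2lnL, K, AICc=aicc, DeltaAICc=delta, weight=w)
}
# hypothetical example with three models (values are placeholders)
aic.table(neg2lnL=c(660.2, 665.4, 670.8), K=c(7, 4, 2), n=294)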
RMark implementation
Given a set of MARK models that have been run in the current RMark session, the function collect.models() assembles all the models into a list object containing all the individual model results, computes AIC and delta AIC for each model, and computes normalized model weights for the model set. By default, the collect.models() function computes the AICc small sample adjustment, but does not adjust for overdispersion, which we will do below.
We can illustrate this function with the Dipper example. The run.dipper() function specifies our set of 16 models (we can add more if we want), collects them with the collect.models() function, and returns this as a 'results' object to the main program. The last line of run.dipper() is
>return(collect.models())
which when called from the main program produces a results object:
>dipper.results=run.dipper()
>#display all models
>dipper.results
This produces a table of AICc, delta AICc, and AICc weights. We can adjust the results for overdispersion with the adjust.chat() function; applying this with chat=1.31 (based on the previous bootstrapping results), we get a revised QAICc table of results:
>#adjust AIC for overdispersion
>adjusted.results<-adjust.chat(chat=1.31,dipper.results)
>adjusted.results
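For readers who want to see what such a wrapper can look like, here is a skeletal version of a run.dipper()-style function. It is a sketch, not the lab's actual script: only two of the 16 candidate models are written out, and the data-processing arguments (begin.time, groups) are assumptions based on the standard RMark CJS workflow for the dipper data.
library(RMark)
data(dipper)
run.dipper <- function() {
  # process the data and build the design data
  dipper.proc <- process.data(dipper, model="CJS", groups="sex", begin.time=1980)
  dipper.ddl  <- make.design.data(dipper.proc)
  # parameter specifications (formulas as sketched earlier)
  Phi.dot <- list(formula=~1)
  Phi.sex <- list(formula=~sex)
  p.dot   <- list(formula=~1)
  # fit two of the candidate models; the others follow the same pattern
  phi.dot_p.dot <- mark(dipper.proc, dipper.ddl,
                        model.parameters=list(Phi=Phi.dot, p=p.dot))
  phi.sex_p.dot <- mark(dipper.proc, dipper.ddl,
                        model.parameters=list(Phi=Phi.sex, p=p.dot))
  # collect all mark models created in this function and return the results object
  return(collect.models())
}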
We can go to the results object and get estimates of the real parameters under specific models, which of course will differ depending on the model:
>#get particular model real estimates under 2 models
>summary(adjusted.results$phi.sex.T_p.sex.T) #global model
>summary(adjusted.results$phi.sex_p.sex) #Phi(sex)p(sex) model
Alternatively (and usually preferably, if we are mainly interested in estimation or prediction), we use the model.average() function to obtain model-averaged estimates. This provides:
Estimates of real model parameters computed as a weighted average over the model set, with the weights based on QAICc values
Estimates of the unconditional variance of the estimates, which includes two components:
Variance of estimates under each model about the weighted average (model uncertainty)
Variance of the estimates conditional on each model (sampling uncertainty)
The unconditional variances and confidence intervals will generally be larger (wider) than those that ignore model uncertainty, and therefore provide a more "honest" assessment of what our data says about our parameters.
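In one commonly used form (following Burnham and Anderson; not necessarily the exact expression coded in RMark), the model-averaged estimate of a parameter theta and its unconditional variance are
\[
\hat{\bar{\theta}} = \sum_i w_i\,\hat{\theta}_i,
\qquad
\widehat{\mathrm{var}}(\hat{\bar{\theta}}) = \sum_i w_i\left[\widehat{\mathrm{var}}(\hat{\theta}_i \mid g_i) + (\hat{\theta}_i - \hat{\bar{\theta}})^2\right],
\]
where the w_i are the (Q)AICc weights and g_i denotes model i; the first term inside the brackets is the sampling-uncertainty component and the second is the model-uncertainty component listed above.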
For the Dipper example we can implement model averaging, focusing on the survival (Phi) parameters:
>##model averaging
>mod.avg<-model.average(adjusted.results,parameter="Phi",vcv=T)$estimates
The model.average() function returns all the real parameter values for the "all different" PIM design. Since we are making CJS assumptions of no cohort effect, we'll use the first (1980) cohort that contains all the relevant time and group effects:
>mod.avg<-subset(mod.avg,cohort==1980,select=c("estimate","se","lcl","ucl","group","time","Time"))
>print(mod.avg,row.names=F)
 estimate         se       lcl       ucl  group time Time
0.5585209 0.09292556 0.3767232 0.7258769 Female 1980    0
0.5574686 0.09044244 0.3804465 0.7210031 Female 1981    1
0.5576159 0.08994401 0.3815097 0.7203361 Female 1982    2
0.5585135 0.08982069 0.3825289 0.7209322 Female 1983    3
0.5584119 0.08967033 0.3827221 0.7206019 Female 1984    4
0.5582632 0.09144314 0.3792687 0.7232991 Female 1985    5
0.5630435 0.09344739 0.3796755 0.7306598   Male 1980    0
0.5619916 0.09103527 0.3832924 0.7259348   Male 1981    1
0.5621392 0.09053167 0.3843699 0.7252666   Male 1982    2
0.5630361 0.09036415 0.3854709 0.7257921   Male 1983    3
0.5629340 0.09021913 0.3856551 0.7254720   Male 1984    4
0.5627873 0.09198960 0.3821913 0.7281431   Male 1985    5
If we wish, we can easily send this data frame to a plotting routine, e.g.,
fem<-subset(mod.avg, group=="Female")
mal<-subset(mod.avg,group=="Male")
with(mal,plot(Time+1980,estimate,type="l",col="black",xlab="Year",ylab="Phi",main="Male=black, Female=red",ylim=c(0.4,0.7)))
matlines(mal$Time+1980,mal$lcl,lty=2,col="black")
matlines(mal$Time+1980,mal$ucl,lty=2,col="black")
matlines(fem$Time+1980,fem$estimate,lty=1,col="red")
matlines(fem$Time+1980,fem$lcl,lty=2,col="red")
matlines(fem$Time+1980,fem$ucl,lty=2,col="red")
This visual representation makes it clear that there is little data support for either temporal or group differences in Phi, consistent with QAICc results.
Note the huge increase in the confidence interval for the last time period. This is due to the issue of parameter identifiability, something we will return to later. Briefly, in the CJS model the last Phi and p parameters (Phi_k-1 and p_k) are not separately identifiable: only their product is estimable from the likelihood. We would therefore ordinarily concentrate inference on the remaining parameters (Phi_1 to Phi_k-2 and p_2 to p_k-1).
R script to implement the above calculations.
MARK implementation
We again illustrate AIC calculations and model averaging using the Dipper example. A zip archive contains the same 16 models run with the Dipper data. Unzip this and open the project in MARK. The results browser contains summary statistics for all 16 models, including AICc, with the models sorted by descending AICc weight. The deviance values are of course identical to the RMark values, with the global model Phi(g*t) p(g*t) having the lowest deviance (by definition, the best fit to the data). AICc values are also identical except for the cases where MARK counts fewer parameters than RMark. Note that the global model has the highest AICc value (lowest model weight); apparently we do not need a model of this complexity!
The AIC values can be adjusted for overdispersion using the Adjustments/c-hat tool.
Manually enter the new c-hat value (e.g., 1.31 from the bootstrap simulations).
Once the c-hat adjustments are made, the AICc values are converted to QAICc, new weights computed, and if necessary the models are re-ordered to reflect the new weights.
Model averaging is enabled from the Output/Model Averaging pull down menu. Select 'Real' to obtain estimates of real parameter values (Phi and p). "Derived" parameters are functions of real parameter values that are available under some of the model structures.
For CJS models tabs appear containing lists of the real parameters under the "all different" PIM structure for each group. Select the top row (first cohort) for the males (check box)
and the top row (first cohort) for the females (check box). Also specify "Estimates in Excel spreadsheet".
A text file contains the model-specific and weighted output, and the spreadsheet contains the model-averaged estimates, unconditional SEs, and CIs for the real parameter values.
Next: Review Exercises