Overview
First, please review the MARK book chapter on Goodness of Fit.
Also review these notes by Jay Rotella from Montana State, especially the section on "deviance and the saturated model".
All of the goodness of fit methods for count data are based on various notions of "discrepancy" between what the model predicts and what the data say. The trick is to determine whether the discrepancy is large enough to conclude that the data and the model disagree "significantly". By "significantly" here I do not necessarily mean "at some arbitrary p value", but rather more subjectively "enough to matter in terms of bias, precision, confidence interval coverage, etc." Thus, almost by definition, measures of model fit are going to be relative. Further, while the goal is usually going to be to find a general model that fits "well enough", we will later see that this model often (usually) is a jumping-off point for considering other models (derived from this general model) to address specific questions, or to reduce model complexity in the interest of parsimony (addressed further under multi-model inference).
MARK and alternatives to MARK
RMark currently has little in the way of built-in goodness of fit testing, beyond computation of deviance/df ratios and other statistics directly computable from the log-likelihood. By contrast, MARK does have some built-in goodness of fit procedures, including access to program RELEASE and bootstrap goodness of fit testing (both discussed further below). In addition, program U-CARE by Rémi Choquet et al. and the mra package in R by Trent McDonald perform goodness of fit testing for certain types of CMR models. I am aware of no procedure that does omnibus testing for all the models available within MARK, however. This lack of a general procedure is something I'll return to below.
I will illustrate a series of approaches by way of the Dipper example, an "ideal example" in that quite a number of GOF approaches work for the CJS model structure, but one that can also be used to illustrate some more "non-standard" approaches.
We'll start by running the Phi(g*t) p(g*t) model in RMark:
> library(RMark)
> data(dipper)
>
> #formulas for general ("global") CJS model (sex*time)
> Phi.sex.T=list(formula=~sex*time)
> p.sex.T=list(formula=~sex*time)
> Phi.t=list(formula=~time)
> p.t=list(formula=~time)
>
> dipper.processed=process.data(dipper,model="CJS",begin.time=1980,groups="sex")
> dipper.ddl=make.design.data(dipper.processed)
> global<-mark(dipper.processed,dipper.ddl,model.parameters=list(Phi=Phi.sex.T,p=p.sex.T))
> global
Deviance / df
A statistic that is available automatically in both RMark and MARK is the deviance.
> collect.models()
model npar AICc DeltaAICc weight Deviance
1 Phi(~sex * time)p(~sex * time) 24 704.9436 0 1 71.47397
The deviance is defined as the difference between the -2*log-likelihood of the fitted model (in this case, Phi(g*t)p(g*t)) and that of the saturated model, a model with n parameters that fits the n observations perfectly (Neter et al. 1996. Applied linear statistical models. 4th Edition. Irwin.). For a CJS model, a "saturated model" would include both cohort and time effects, and would essentially have as many parameters as there are potential capture histories (a lot!). You can see the details of how this is computed by displaying the full MARK output file:
> global
The saturated model has -2log-likelihood = 582.47711; the estimated model has -2log-likelihood = 653.95108; the difference between these is the deviance = 71.47397. The deviance divided by its degrees of freedom is an estimate of c-hat, the variance inflation factor. We can read c-hat from the output or calculate it from the model object as
> #deviance / df
> c.hat<-global$results$deviance/global$results$deviance.df
> c.hat
[1] 3.761788
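As a quick check, the deviance itself is just the difference between the two -2 log-likelihoods reported in the MARK output:
> #deviance = -2lnL(fitted model) - (-2lnL(saturated model))
> 653.95108 - 582.47711
[1] 71.47397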
So, evidently we do not have perfect model fit (c.hat >1) - but how bad is it?
We will return to this question below, but first a detour into MARK-land. As we have seen before, we can export our data into MARK format by
> #export RMark data to MARK format
> export.chdata(dipper.processed, filename="dipper",replace=T)
and then read this into MARK and run the Phi(g*t) p(g*t) model. It is also possible to directly export models already run in RMark and import them into MARK. We have created the equivalent MARK model and saved the MARK project here. Unzip this file and open the MARK project. You will see in the results browser the single model Phi(g*t)p(g*t) with the identical deviance value (the AIC values differ because of different parameter counting). If you right-click and select "output in Notepad" you will see the saved log-likelihood, deviance, and c-hat statistics that RMark produced.
The full output for this model is here.
Program RELEASE
Now we are going to conduct a couple of tests that RMark does not do (at least, as built-in functions). The first of these, RELEASE, was specifically designed for CJS studies involving releases of multiple groups (like treatments and controls) but is readily applicable to more generic CJS observational studies that do not involve age structure. The program exists as a stand-alone program, but is more conveniently run from MARK under the Tests pull-down menu:
Selecting this option produces an output file, a portion of which is reproduced below; the file is here in its entirety. The output file is quite detailed (more than is needed for most applications) and for GOF testing really boils down to the summary of 2 tests, Test 2 and Test 3, which test the standard CJS assumptions about lack of trap dependency on previous capture and homogeneity of re-encounter probabilities. We start by investigating the sum of these 2 tests for each of the groups.
The summed tests indicate no issues for either group (P>0.38). If there had been an indication of test rejection (say P<0.05) then we would need to look further (which group, which test, etc.) for the cause. Since there is no indication of lack of fit based on these tests, we will move on.
Deviance residual plots
Earlier we saw how we can get an overall deviance statistic for the model. MARK also provides for computation of this statistic for every observed encounter history, and plots these to see if there are trends or outliers. This feature is obtained by going to the "Graph Deviance Residuals" icon of the results browser.
A plot of the residuals over the capture histories is reproduced below. Individual history results can be displayed by clicking on the graph points.
These are standardized residuals, so if we think in terms of standard normal deviates we would expect the residuals to vary between about -2 and +2 under a fitting model. Based on this criterion, most of the residual values are within the expected range, with only a few encounter histories counting as "outliers".
Bootstrap methods
We will move on to some more complex methods based on simulation. Here we use a particular class of simulation known as parametric bootstrapping. In this approach, we first fit a model (generally, the "global" or other higher-complexity model) to our observed data, estimate parameters using maximum likelihood, and compute one of the usual discrepancy statistics such as deviance/df. We then simulate new data under the candidate model, taking the estimated parameter values as fixed, refit the model to each simulated data set, and compute the same discrepancy statistic. After repeating this process a large number of times, we compare the discrepancy estimated from our sample data to the distribution of simulated discrepancies. Based on where our observed value falls in this distribution we conclude that the observed value is consistent with (fits) or inconsistent with (does not fit) the null distribution. If we are using deviance as our discrepancy measure we can calculate the mean of the simulated deviances. The ratio of the observed deviance to the mean simulated deviance can also be used as an alternative measure of overdispersion (c-hat).
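To make the logic concrete, here is a minimal sketch of how such a parametric bootstrap could be coded in R with RMark. This is not the MARK tool or the sims() functions used later; it assumes a simple constant-phi, constant-p CJS model, and the function names (sim.cjs, boot.dev) and parameter values are purely illustrative.

library(RMark)

# simulate CJS capture histories under assumed constant phi (survival) and
# p (recapture), with n.marked new releases on each occasion except the last
sim.cjs <- function(phi, p, n.occ, n.marked) {
  ch <- character(0)
  for (rel in 1:(n.occ - 1)) {
    for (i in 1:n.marked) {
      h <- rep(0, n.occ)
      h[rel] <- 1                              # marked and released at occasion 'rel'
      alive <- TRUE
      for (t in (rel + 1):n.occ) {
        alive <- alive && (runif(1) < phi)     # survive the interval?
        if (alive && runif(1) < p) h[t] <- 1   # recaptured if alive
      }
      ch <- c(ch, paste(h, collapse = ""))
    }
  }
  data.frame(ch = ch, stringsAsFactors = FALSE)
}

# one bootstrap replicate: simulate at the fitted values, refit, keep the deviance
boot.dev <- function(phi.hat, p.hat, n.occ, n.marked) {
  simdat <- sim.cjs(phi.hat, p.hat, n.occ, n.marked)
  fit <- mark(simdat, model = "CJS", output = FALSE, delete = TRUE)
  fit$results$deviance
}

# e.g., 100 replicates at illustrative parameter values; the distribution of
# 'devs' (its mean or quantiles) is then compared to the observed deviance
devs <- replicate(100, boot.dev(phi.hat = 0.6, p.hat = 0.9, n.occ = 7, n.marked = 20))
quantile(devs, c(0.025, 0.975))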
MARK implementation
MARK contains 2 built-in bootstrap simulation programs that work for CJS and a few other model structures.
Median c-hat tool
The first of these purportedly gives a more accurate estimate of c-hat and is based on a logistic regression approach.
Occasionally, the simulation seems to hang up and produce results that trigger MARK errors. If you get a message like the one below, simply click on "OK" to keep going.
The output for the median simulation is here, with key results indicated below.
The above results suggest a need for a slight overdispersion correction (c-hat =1.06). This value can be used in the results browser to adjust AIC and other likelihood measures for overdispersion under a process known as quasi-likelihood. A similar procedure is enabled in RMark as described below.
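For reference, the quasi-likelihood adjustment replaces AICc with QAICc. A small R helper showing the standard formula (here K is the number of parameters and n the effective sample size; the function name qaicc is just for illustration):

# QAICc = (-2 lnL)/c.hat + 2K + 2K(K+1)/(n - K - 1)
qaicc <- function(neg2lnl, K, n, c.hat) {
  neg2lnl / c.hat + 2 * K + 2 * K * (K + 1) / (n - K - 1)
}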
Bootstrap simulation tool
An older bootstrap simulation tool is also available for CJS and a few other structures (e.g., the Seber parameterization of dead recovery models). It is implemented (here for the Dipper example) from the Tests pull-down menu
Users can select optional c-hat, df and deviance output, or just deviance. Since the model parameters and data structure (and therefore df) should not change among simulated data sets, deviance is faster and can be divided by df later.
Users are asked to specify an output file (named by default). If a file with the same name already exists, the current simulation results will be appended to the old ones, so be sure to rename your file if you are doing a new bootstrap (as opposed to resuming a previous, e.g., aborted, one).
We must specify the number of simulations and a 'seed' or starting value for the random number generator. The defaults are 100 simulations (which is low-- you probably want more) and 0 (which takes the computer clock time to get a starting value).
We can monitor progress (and decide to take a walk, go get a beer, or take a nap)...
When we come back / wake up, the simulation hopefully will be done (unless you went crazy and specified 1000K simulations), and we can view the results (which can also be opened outside MARK, since they are dBase files):
MARK has a nice built-in tool that will compute the mean and CI for the simulated deviance.
Recall that the observed deviance was 71.47, which is clearly higher than expected under the assumed model (upper limit of 55.66 from the simulations). The ratio of the observed deviance to the mean simulated deviance is 71.47/53.81 = 1.33, which is a bit higher than suggested by the median c-hat approach (1.06). As suggested in Chapter 5 of the MARK book, a 'conservative' (statistically, not politically) approach would be to take the higher variance inflation value. In addition, I have found the median c-hat approach a bit less stable than the ordinary bootstrap, and since I frankly understand the results of the ordinary bootstrap a bit better I tend to favor it over median c-hat.
Implementation in RMark via user defined functions
As noted earlier, RMark does not (at least currently) have built-in capacity to perform bootstrap simulations. However, it is fairly easy to write code in R to simulate data based on parameter estimates, and I have written a series of user-defined functions that do so for CJS data, send the data to RMark within a bootstrap loop, keep track of the deviances from the simulated data, and perform a comparison to the observed deviance. Script to implement this for the Dipper example (along with the previous deviance/df computation of c-hat) is here. The results I just ran indicate a broader CI (37-84) and an adjusted c-hat of ~1.31 (100 simulations), very close to the result from the MARK bootstrap simulation above.
> sim.out<-sims(Phi.fem,Phi.mal,p.fem,p.mal,marked,reps=100)
> sim.out
$deviance.mean
[1] 54.22543

$deviance.025
   2.5% 
34.9605 

$deviance.975
   97.5% 
74.59993 

> #compare data deviance and simulation
> data.deviance<-global$results$deviance
> data.deviance
[1] 71.47397
> sim.ci<-c(sim.out$deviance.025,sim.out$deviance.975)
> cat("data deviance = ",data.deviance, "simulation mean = ", sim.out$deviance.mean, "simulation 95%CI = ", sim.ci, "\n")
data deviance =  71.47397 simulation mean =  54.22543 simulation 95%CI =  34.9605 74.59993 
>
> #modified c.hat
> c.hat<-data.deviance/sim.out$deviance.mean
> cat("modified c.hat=",c.hat, "\n")
modified c.hat= 1.31809
RMark allows one to adjust the likelihoods of a list of MARK models using the function adjust.chat. For example, to adjust by c-hat = 1.31 we would use chat=1.31 in the function (default = 1, no adjustment). I added a couple of models to the model set, including a Phi(.)p(.) model and a Phi(t)p(t) model, and created a results object collecting these models:
> time<-mark(dipper.processed,dipper.ddl,model.parameters=list(Phi=Phi.t,p=p.t))
> null<-mark(dipper.processed,dipper.ddl)
> results<-collect.models()
> results.adj<-adjust.chat(chat=c.hat,results)
In this example, although the delta QAIC values differ (delta QAIC ~13 for the top 2 models vs. delta AIC ~11), there is little practical impact on SEs, CIs, and other measures, since the top model has such high weighting. In general, though, the overdispersion adjustment can result in different model rankings and wider CIs on both model-specific and model-averaged estimates.
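One way to see why the impact on precision is modest here: when the variance inflation factor is applied to the estimates, sampling variances are multiplied by c-hat, so standard errors scale by the square root of c-hat. For c-hat = 1.31 that is only about a 14% increase in the SEs:
> #SE inflation implied by c-hat = 1.31
> sqrt(1.31)
[1] 1.144552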
The script to do the calculations in R (deviance/ df and bootstrap) is here.
Finally, the above illustration of R user-defined functions is important, because in fact we can do bootstrapping for any model for which we can simulate the data and run RMark, which is virtually all of them (see example simulation code here for several other common data structures, including some non-CMR examples). MARK, on the other hand, does bootstrapping for only a few models.
Summary of recommended approach
I hesitate to offer set rules for anything, including goodness of fit, but here are a few guidelines:
Start with a sufficiently general model capable of capturing all major, anticipated sources of variation in the data, contingent on some general study design parameters and goals. E.g., for an open CJS analysis over 5 years for 2 groups but no age structure, the general model will be Phi(g*t) p(g*t). If age structure is involved it will be Phi(age*g*t) p(g*t) if recaptures occur only in the "adult" age class, otherwise Phi(age*g*t) p(age*g*t).
If available, use a frequency-based method such as RELEASE to test for general lack of fit in the general model
Also examine the ratio of deviance to deviance df. If this is very large (>3) then there may be issues with fit, but note (see the MARK book discussion on this) that these deviance measures tend to overstate lack of fit, especially when data are sparse
If available, conduct bootstrap GOF tests in MARK, or use RMark to simulate data under the assumed model and compare the simulated and observed discrepancy measures
If the general model exhibits apparent extra-binomial variation (data deviance/df > 1 but < 10, and "unusual" in comparison to the simulated ratio), apply a variance inflation correction (c-hat) to likelihood-based measures of variance, confidence intervals, and AIC values.
If model fit is poor, beyond what can reasonably be corrected by c-hat inflation:
Report this fact when discussing results and caution readers that inference may be thereby compromised.
Take measures to collect future data in a manner that will allow proper identification and modeling of relevant sources of variation. Usually this means "more data, better design".
Next: Multi-model inference