Implementation in MARK

There are 2 basic types of input formats. One we have already seen is the "LDLD" format (previously used in known fates analysis), and we will see how this works when we use RMark.  MARK also provides more efficient, triangular matrix input format for recovery data that summarize the essential information, which is the number of animals tag, released, and recovered in each recovery period (and by subtraction not recovered).

We illustrate the above basic model structure, with the added wrinkle of stratification by groups, using a data set for adult male and female American black ducks banded during 1989-98 (10 years) and recovered over the same 10 years. The data are encoded in the summary band recovery format for MARK, which is 1) a matrix of band recoveries, followed by 2) a vector of the numbers banded and released in each year. Read these data into program MARK, being sure to use the Brownie parameterization  (down the model list a ways). Note that in this example there are 10 encounter occasions (and 2 groups, male first and then female). In general, the number of 'encounter occasions' corresponds for a band recovery problem is the number of recovery periods. In MARK the data matrix must contain the same number of rows (capture occasions) as columns (recovery occasions). When k > l this is accomplished by simply adding 1 or more rows of zeros in the recovery matrix, and augmenting the banding vector by zeroes.

I read these data in and ran all 16 of the standard PIM models; the analyses are stored in the MARK project files (dbf and fpt) which you can open directly in MARK (save these to a directory and then open them). Open and examine the PIM for the "global" S(g*t)f(g*t) model. The first PIM is for survival rates of males, and will look as follows:

That is, the first 9 parameters are the survival rates for adult males for the 1st, 2nd, ... 9th interval (i.e., there are 10 years of banding, so 9 intervals), S1, S2, S3, ....., S9. These parameters are time-specific but not cohort (year of release) specific, so for instance an animal released in year 3 has exactly the same survival rate between year 5 and 6 (parameter index 5) as one release in year 1, since these indices are the same throughout the column (representing survival year) for all rows (representing release years). Likewise the female survival rates are indexed by

In the same manner, the recover rates for males are indexed by parameters 19-28, and for females of 29-28 (one for each year of banding for each sex).

This is the general or 'global' model describing time- and group (sex)- specific survival and recovery. Reduced parameter models can be formed either automatically (with the run|predefined models feature) or using the PIMs. For example, survival can be made time- but not sex- specific by setting the first 2 PIMS equal to

Other models, including ones not available in the menu, can be created by manipulating the PIMS. For example,

specifies that annual survival rates are constant during the 1st 5 years and last 4 years, but hypothesized as different between 2 periods. Many additional models can be constructed from the PIMs, but some models will require use of a design matrix. Return to the "global" model S(g*t)f(g*t); retrieve this model, and obtain the 'full' design matrix. Consider the portion of this matrix related just to survival in the 2 groups (parameters 1-18). You should see something like this:

where the first column is an intercept (on the logit scale), the second a "group effect" (1=males, 0=females, the next columns 8 columns are time (year) effects, and the last 8 columns are group*year interaction. As with the PIM version of this model, it allows for all combinations of survival rates in group*year, i.e. 18 survival, 9 for males and 9 for females. However, it may be possible to describe both sex- and time-specific survival variation in a biologically meaningful but simpler way. In particular, whereas there may be intrinsic differences in male and female survival (e.g., females have lower survival because of increased risk to predation during nesting), there are other factors that vary from year to year, such as weather conditions, that should affect males and females equally, one the average difference between male and female survival is accounted for. Under such a model there would continue to be group and time effects, but no group * time interaction. This model is formed by simply eliminating the Sg1*t1 - Sg1*t8 columns in the previous matrix, producing

This is a model in which (on the logit scale) group effects are additive to time effects; a plot of logit-transformed survival rates over time would appear parallel for the 2 groups (males and females), with one group consistently above or below the other one.

If there is evidence (e.g, based on AIC) that this model performs well compared to the more complex model, it probably should be used to describe the time and sex variation in survival for 3 reasons: 1) it makes biological sense, a priori; 2) it is a simpler model, with fewer parameters, and 3) often it is easier to understand biological effects in the absence of interaction. The same logic may hold true for recovery rates: if there are sex-specific harvest regulations then a priori one expects the harvest rates for one sex to be higher or lower than for the other sex. Therefore construction of additive models for the recovery rate groups of parameters may be sensible, as well.

Model averaging

Although one model S(g) f(t) is the clear leader in terms of AIC, you can still do model averaging to take into account model selection uncertainty. That is done by selecting from the toolbar 'output/model averaging/real' and selecting the real parameters (S and f) you want to average.

Goodness of fit and QAIC

Before proceeding further with model building and testing, it is important to check the "global model" to see if it adequately fits the data. MARK produces a "deviance" statistic which can be used as a measure of model fit. Under the null hypothesis that the model fits the data, this statistic follows a chi-square distribution with degrees of freedom (df) supplied by the MARK output, and one may perform a test of this null hypothesis by comparison to a chi-square table. However, a somewhat more useful approach is based on the recognition that if the model fits the data, in theory the ratio

c-hat= deviance / df,

which is also produced by MARK, should equal 1. In practice this ratio often exceeds 1, even for models in which all sources of variation (e.g., time, group, age) are captured by the model. There are 2 possible reasons for this. First, an even more complex model is needed to capture variation in the data, i.e, there is heterogeneity in survival, recovery, or both that is not captured by the model. This could happen when one is using a one-age model inappropriately, when juvenile animals are part of the released sample. Stratifying the data by age and using a 2-age model may solve this problem. However, even when all identifiable forms of variability have been modeled it is possible that the basic multinomial model still does not capture properly reflect variability in the data, due to what is known as "overdispersion" or "extra-biniomial variability". In either case, likelihood values, AIC, and parameter variances should all be adjusted accordingly. Assuming that c-hat fairly represents this "extra-biniomial variability", the preceding statistics can all be adjusted with MARK, using the 'adjustments' tool in the results browser. This adjustment produces what is known as a "quasilikelihood" or likelihood adjusted by the variance inflation factor, and from this "quasi-AIC" or QAIC; estimated variances and confidence intervals are also adjusted. In the black duck example MARK produces a c-hat of 1.26, which should be used as an adjustment. Computation of c-hat based on the model deviance is sometimes not an accurate reflection of "extra-biniomial variability", and alternatives based on bootstrap sampling exist for the band recovery (Seber parameterization, see below) and some other models in MARK. I performed 1000 bootstrap simulations under the S(g*t) f(g*t) model and came up with an average c-hat of 1.18. Since the simulations generated data where S(g*t) f(g*t) was the "true" model, a fairer computation of c-hat based on the data would 1.26/1.18 = 1.07, which is the value I recommend as an adjustment.

Seber parameterization

For comparison, I also analyzed these data under the Seber parameterization (dbf and fpt files).  The global model is similar (slightly differences in point estimates and likelihood values) but notice that model ranking is quite different.  My interpretation is that some of the variation that properly belongs in the recovery process (f) is being thrown into S; it is common that for game birds like ducks annual survival rate varies less than harvest rates.  I think the Brownie parameterization is much more sensible for these types of data, where we are trying to cleanly separate the recovery (harvest-driven) process from overall survival (which includes harvest as well as other factors).

Covariates

Because recovery models are special case of CJS, they can be generalized to include multiple groups, age structure (next time), and covariates.    Time or individual covariates can be included in the design matrix; time covariates are illustrated below.

One-age example with a time covariate

As with known fates data, both time-specific and individual covariates can be incorporated into band recovery analysis. As an example, we have data from adult female mallards during 1966-78 in Prairie Canada (see page 22 of Chapter 16 of Williams et al. 2002). A negative density-dependent relationship was hypothesized between survival rates and breeding densities (ducks / pond) of ducks; the latter statistic for each of the years 1966-77 (the last year over which survival could be estimated) was:

D={0.978, 2.245, 1.268, 1.090. 1.336, 1.307, 2.260, 0.665, 0.890, 1.318, 2.116, 0.871}.

Read the band recovery data for these mallards (13 occasions, 1 group, no individual covariates). Construct and run the global S(g*t) f(g*t). Then retrieve the design matrix for this model and obtain the "full" design matrix.

Formulate the covariate model by creating a new design matrix::

Construct an additional model in which survival is constant over time (either by eliminating column 2 above, or using PIMs); finally, combine the various survival models with assumptions about recovery (f(t), f(.)) and find a "best" model.

Individual covariates would be handled in a similar fashion as with known fates data, requiring use of the encounter history LDLD format to enable inputting of individual covariates along with the encounter histories.

Implementation in RMark