We will start with the case where all releases cohorts are of known ages. The typical (and most straightforward to interpret) situation arises when all captures are in the natal (birth year, hatching year, etc.) class. In this case there is no ambiguity about the ages of recaptured animals, since age is known by definition from the time elapsed between release and recapture (if any).
Release cohort of known ages (e.g., all natal)
RMark formulation
We illustrate a basic cohort/age model by a data example where all animals are marked and released when they are age 0 (natal year). In the example, animals are captured and recaptured on 10 occasions. Obtain the data here and save it to your local directory. It is already formatted in RMark format and is accessed by
> load("cohort.example")
We can view the first several lines of the file by the head() function:
>head(cohort.example)
ch
1 1000000000
2 1000000000
3 1000000000
4 1100000000
5 1010000000
6 1000000000
We process the data as usual for CJS modeling; we will also make the design data at this step, since this data contains some essential information for proper age modeling
> cohort.processed=process.data(cohort.example,model="CJS")
> cohort.ddl=make.design.data(cohort.processed)
At this point, let's take a quick look at the design data (cohort.ddl), focusing on the survival component
> cohort.ddl$Phi
Notice that this involves 45 parameter indices. We will focus on the cohort, age, and time columns. Note that these are all factors variables, so that the levels are treated as groups rather than numerically. By contrast, Cohort, Age, and Time are treated as numerical values (e.g., ages 0 to 8). These distinctions, while adding some complexity, give us a lot of flexibility in modeling age and time-dependent survival.
Now look at how "age" and "time" vary together for each cohort. For the first cohort, released at time 1, animal are age 0 at time 1, age 1 at time 2, etc. For the second cohort (release at time 2) they are age 0 at time 2, age 1 and time 3, and so on for the rest of the occasions and cohorts. At various times we will need to refer back to this structure, in order to create specific models.
Model building proceeds in a similar fashion to time-specific CJS models, with the added dimension of age. We will build several models for comparison, focusing on age and time variation in Phi and leaving p constant (p.dot) for now. First we need some parameter formulas we can refer to in our model statements:
> #formulas for survival and capture
> Phi.age=list(formula=~age)
> Phi.dot=list(formula=~1)
> Phi.t=list(formula=~time)
> Phi.age.t=list(formula=~age+time)
> Phi.age.T=list(formula=~age*time)
> p.dot=list(formula=~1)
>
Our first model describes Phi are varying solely with respect to age (0 -8)
> #phi.age
> phi.age<-mark(cohort.processed,cohort.ddl,model.parameters=list(Phi=Phi.age,p=p.dot))
We'll skip over the estimates and other results for the time being, but take a look at the PIM matrix for Phi
> PIMS(phi.age,"Phi")
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7
1 2 3 4 5 6
1 2 3 4 5
1 2 3 4
1 2 3
1 2
1
If you contrast this with the time-specific CJS PIM (earlier lab, and also below) you will notice that instead of the parameter indices varying over time (so, the same in every column) they vary over age (time elapsed since initial release time, with "1" standing for age 0 survival, "2" for age 1, etc.
We can create a model that sets survival equal (over age and time) and examine its PIMS, which reveal a single parameter index (1) for Phi.
> phi.dot<-mark(cohort.processed,cohort.ddl,model.parameters=list(Phi=Phi.dot,p=p.dot))
> PIMS(phi.dot,"Phi")
Specifying time variation creates the usual, time-specific CJS model formulation, which you can confirm by looking at the PIM
phi.t<-mark(cohort.processed,cohort.ddl,model.parameters=list(Phi=Phi.t,p=p.dot))
> PIMS(phi.t,"Phi")
Things get more interesting (and challenging) when we try to model age and time simultaneously. First consider the additive age + time model.
> phi.age_plus_t<-mark(cohort.processed,cohort.ddl,model.parameters=list(Phi=Phi.age.t,p=p.dot))
Examine the PIM for the Phi parameter and notice that we have 45 indices-- so we are back to an "all different" PIM.
> PIMS(phi.age_plus_t,"Phi")
However, the design matrix shows that the actual model is simpler
> dim(phi.age_plus_t$design.matrix)
[1] 46 18
We have 46 rows ( 45 for Phi and 1 for p) but only 18 columns; we have fewer rows because there are no age X time interactions.
Our last model (well not quite, but for the moment) tries to keep these interactions between age and time.
>phi.age_times_t<-mark(cohort.processed,cohort.ddl,model.parameters=list(Phi=Phi.age.T,p=p.dot))
It has the same PIM dimensions as the previous model but the design matrix now has 54 instead of 18 columns.
> PIMS(phi.age_times_t,"Phi")
> dim(phi.age_times_t$design.matrix)
[1] 46 54
In fact, there are more beta parameters (columns) than real parameters (row), which seems strange, and it is. If you take a look at the parameter estimates (beta, real) you will see a lot of extreme values, indicating that not all the parameters in this model are statistically identifiable. And this model assume p is constant. If we tried a "age x time" model for both Phi and p we would see that a large number of parameters cannot be identified.
Even the age-specific (first model we ran) model has some issues. If you examine the survival rates in late ages, you will see increasing SE and wider CIs. This is a function of the fact that there is little data to support these later estimates: the maximum age at recapture of any animal from cohort 1 is 8, from 2 is 7, etc., so there are fewer and fewer animals available for recapture in older ages. Previous researchers have gotten around this to some extent by specifying a "maximum age", beyond which all animals still alive are assumed to survive at the same rate.
I illustrate that here by setting the maximum age to 4 and re-doing the age-specific model. This requires a visit to the design data to reset ages >4 to 4. A bit of R code does the trick, keeping in mind that max.age (like age) needs to be treated like a factor, not a number. Notice that for the numerical comparison I used Age not age!
> #max age 4
> cohort.ddl$Phi$max.age<-as.factor((cohort.ddl$Phi$Age<4)*cohort.ddl$Phi$Age+(cohort.ddl$Phi$Age>3)*4)
>
Now we can run another model (using the modified design data),
>
> Phi.max.age=list(formula=~max.age)
> phi.age.max4<-mark(cohort.processed,cohort.ddl,model.parameters=list(Phi=Phi.max.age,p=p.dot))
The PIMS structure shows that we now have 5 instead of 9 age indices (ages 0 - 4)
> PIMS(phi.age.max4,"Phi")
Finally, we can collect the models in a results object, and do some comparisons.
> results<-collect.models()
> results
When we do this we see that the max.age model is clearly favored (AIC weight >0.85). We can do some nice plots of model estimates versus age with a little R code:
> n.occasions<-cohort.processed$nocc
> #max age model summaries
> Phi.hat<-phi.age.max4$results$real$estimate[1:5]
> lcl<-phi.age.max4$results$real$lcl[1:5]
> ucl<-phi.age.max4$results$real$ucl[1:5]
> summary<-data.frame(Phi.hat=Phi.hat,lcl=lcl,ucl=ucl,age=1:5-1)
with(summary,plot(age,Phi.hat,type="l",xlab="Age",ylab="Phi"))
> with(summary,matlines(age,lcl,lty=2))
> with(summary,matlines(age,ucl,lty=2))
Finally, we will export our data to MARK format for use with the MARK GUI
> #produce input for MARK
> export.MARK(cohort.processed, "cohort",replace = TRUE, chat = 1, title = "Cohort Example", ind.covariates = "all")
R script for all of the above computations.
Mark GUI
We will look at this same example but this time constructing models via the MARK interface. Input data are here in MARK format. We read these data into MARK in the Live Recaptures (CJS) model type as usual. We will skip this step and go right to the MARK project I built, which has the data loaded and the above models (previously run in RMark) already constructed and run (here in a zip file; download, open and extract the files to your working directory and open in MARK with the yellow "open results database" icon).
We can build a few of the same models (e.g., Phi(t)p(.) and Phi(.)p(.)) by using one of the built-in PIM models, but mostly we are going to have to "teach" MARK how age transition works by pulling up and modifying the PIMS for the default model, either using the PIM matrices or the PIM chart tool. However, MARK makes this relatively easy via a built in tool. Pull up the default (time-specific) PIM matrix for Phi and right click. Select the "age" option and "maximum age =9" (note that MARK numbers the ages 1 to 9 not 0 to 8 like RMark did!).
The resulting PIM will look something like this
Run (or save) this model, naming it Phi(age)p(.) (I'm assuming that you already made p constant, if not you just build Phi(age)p(t). No worries). We can build second age model but with maximum age =5 ( corresponding to max age =4 for RMark) in a similar fashion; it's Phi PIM will be
We can proceed and build all the other (and more) models we built in RMark. In particular, we need to build Phi(age*t)p(.) by changing the PIM matrix for Phi to "all different" (right click again). This results in a Phi PIM of
Constructing the Phi(age+t)p(.) model requires starting with the above PIM and then building a design matrix. Retrieve the Phi(age*t)p(.) model, go to "Design/Reduced" and specify 19 columns (parameters). Fill in the first (intercept) column with 45 1's (one for each PIM parameter in the DM row) . The next column is for age effect, and must be filled in with 1 when the cohort transitions to the age indicated by the column. This is made easier (but not easy) by using a series of 'partial identity' matrices. The same thing must be done for the time parameter (columns 10-18) to indicate the respective survival intervals for each cohort. I have done this for the first cohort :
All of this must be done for each cohort, taking care to 'stagger' the initiation of each diagonal (which will shorten for the later cohorts). I find this to be tedious and error prone. An alternative is to build the DM in Excel and import into MARK; even better is to build the DM in RMark, and either export to MARK or just conduct the analyses in RMark.
One more caution: if you are trying to build a Phi(.) model from a previous model, start with a Phi(t) not a Phi(age) model and setting it to constant. MARK seems to get confused about parameter numbering otherwise, although the parameter estimates are correct.
Aside from the Phi(age+t) model (which I did not build in MARK), the other models are identical to what we did in RMark, beyond the fact previously noted that the parameter count MARK provides sometimes is lower than that from RMark (which is based on the number of columns of the DM). This in turn will result in discrepancies between AIC values and (sometimes) model ranking from the 2 approaches. Beyond this, MARK provides the usual model comparison, model averaging, and other statistics. Limited graphing capability also exists in MARK, but again graphing is much more flexible in the R/ RMark environment.
Release cohort of unknown/ mixed ages
When we do not know (i.e., cannot identify) the ages of animal in each release class and we attempt to use a cohort-based approach, we have the problem identified at the opening of this session: we cannot know the ages when the animals are subsequently recaptured, unless they happened to have been of known age (e.g., natal) when released. This results in the capture histories typically being a mixture of animals of different ages, with no way in the model to properly assign age structure (other than "could have been" type hypothetical exercise, as in " this animal that I caught in year 6 looks like an adult, so it could have been born in 1 of the study.. or year 2, etc. ".
There are couple of ways around this, at least in part. One is to confine analysis to animals of some minimum age ("adults") and use time-specific CJS, on assumption that further age-specific variation in survival will be less important. One could use a cohort-based approach for those known to be released as natal animals, and potentially combine these results with the time-specific adult results. The challenge here is that of course the animals released as natal year will become "adults" and so should presumably be modeled the same way as those released as adults. The proper and preferred approach in fact recognizes this overlap and, given the right design data structure (always the key), allows for both time and age-specific modeling of all the parameters. That is the topic of the next section.