Basic approach to simulating and modeling CJS data in BUGS/JAGS
I believe (and smart people like Marc Kéry agree) that, while "real" data are always a good thing to have, simulations can be even better, for a couple of reasons. First, unlike real data, you know the true parameter values and assumptions that generated the data, because you picked them! Second, and important for Bayesian analysis, once you've written the code to generate the simulated data, you are most of the way to writing the code for the Bayesian model, because the assumptions and structure of the two go hand in hand. To take a simple case, to generate 10 binomial samples with n = 100 and p = 0.4 we would use R code like this:
x<-rbinom(10,100,0.4)
The BUGS code to "estimate" p from these data would be a little different, but very similar looking. That is, the likelihood for x would be specified as
for (i in 1:10)
{
x[i]~dbin(p,n[i])
}
where x and n are vectors containing the data (n could vary among your samples). Since this is a Bayesian analysis we also need to put a prior on p, for example
p~dunif(0,1)
That's pretty much it. Essentially the inferential model is the simulation model worked backwards: the simulation model starts by assuming p and a model structure, then generates data; the inference model starts with data and an assumed model structure and gives you inference on p.
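Just to make the round trip concrete, here is a minimal sketch of simulating the data and recovering p with rjags (the file name, chain count, and iteration settings here are my choices, not part of the posted code):
library(rjags)
# simulate the data: 10 binomial samples with n = 100 and p = 0.4
n <- rep(100, 10)
x <- rbinom(10, n, 0.4)
# the same model as above, written to a file for JAGS
writeLines("
model {
for (i in 1:10)
{
x[i] ~ dbin(p, n[i])
}
p ~ dunif(0, 1)
}", "binom.txt")
# fit and summarize the posterior for p (it should be close to 0.4)
jm <- jags.model("binom.txt", data = list(x = x, n = n), n.chains = 3)
update(jm, 1000)
summary(coda.samples(jm, "p", n.iter = 5000))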
CJS data involve a generalization of the above steps to include the 2 processes that determine what the data are, under a particular set of assumptions. If we start with the simplest CJS model possible, we have 2 parameters: Phi, the probability of survival between samples, and p, the probability of recapture at a given sample, and we assume both are homogeneous over samples and among animals. The data are then generated by a series of Bernoulli trials ("coin flips", or binomials with n=1 and success probability Phi), given the important initial condition that the only animals we can ever see in our sample are those that are 1) initially marked and released or 2) recaptured following initial release. That means that at every sample occasion we have a certain number of animals, caught for the first time, that we are going to release with marks, and these are a given (they get a "1" in that column of the capture history). So basically, in our capture history matrix every animal we ever capture has at least one "1" (the occasion at which it was first captured) and may be caught again later. So how does "caught again later" happen? It happens when an animal 1) survives to a later occasion (with probability Phi over each survival interval) and 2) is captured (with probability p).
Both the simulation code and the Bayes code are here. Let's walk through some of the logic.
To illustrate, take a single individual, and assume it is released at occasion 1. We flip a coin with probability =Phi to see if it survives to occasion 2. If it does, we flip another coin (probability = p) to see if it's captured, and if so, give the capture history a "1" in that column. If the animal is not captured, it gets a 0, but may subsequently get a 1. But if the animal dies, all subsequent columns must of course be 0. In this way we step through all the released animals and all the capture occasions and fill in a capture history matrix with 1's and 0's. (Animals that we never capture, of course, have all zeros and naturally do not appear in the model. Since in CJS we condition on only the animals released (caught at least once) we don't care about these other animals. A bit later when we talk about abundance estimation we do care about the unmarked animals, since they are part of estimating N.)
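As a rough sketch of what this looks like in R (this is my own minimal version, not the posted code, which is similar in spirit to the simulation function in Kéry and Schaub's book):
# minimal CJS simulator: constant phi and p
sim.cjs <- function(phi, p, n.occasions, marked){
  # marked[t] = number of animals newly marked and released at occasion t
  n.ind <- sum(marked)
  f <- rep(1:length(marked), marked)        # occasion of first release
  CH <- matrix(0, nrow = n.ind, ncol = n.occasions)
  for (i in 1:n.ind){
    CH[i, f[i]] <- 1                        # the initial release is a given
    if (f[i] == n.occasions) next
    alive <- 1
    for (t in (f[i] + 1):n.occasions){
      alive <- alive * rbinom(1, 1, phi)    # survival coin flip
      if (alive == 0) break                 # dead: the rest of the row stays 0
      CH[i, t] <- rbinom(1, 1, p)           # capture coin flip
    }
  }
  CH
}
CH <- sim.cjs(phi = 0.7, p = 0.4, n.occasions = 6, marked = rep(50, 5))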
The Bayesian model essentially reverses the process. Given an observed (or simulated) matrix of 1's and 0's, the model asks: what values of Phi and p could have given rise to these data? We use a nice little trick to help us here, called a latent variable. A latent variable is a variable or state of the system that we can't really observe, at least not completely, but would like to model. In our case, the latent variable z is the state of whether animal i is alive at time t (and thus can potentially be captured), with 1 = alive and 0 = dead (or permanently emigrated). The state process is then a Bernoulli trial whose success probability depends on survival and on whether the animal was alive at the previous occasion:
mu1[i,t]<-phi[i,t-1]*z[i,t-1]
z[i,t]~dbern(mu1[i,t])
This result is then used to determine the probability that a capture occurs, which of course is 0 if z[i,t] = 0:
mu2[i,t]<-p[i,t-1]*z[i,t]
Finally, the data (the matrix of observed 1's and 0's) are used with a Bernoulli likelihood to provide posterior updating of the model:
y[i,t]~dbern(mu2[i,t])
Note that p and phi are both allowed to vary over both individuals (i) and occasions (t). This is very general code that we will modify to create specific models. If we step back a bit earlier in the code we see, first, prior distributions for 2 parameters: mean.p and mean.phi. Then in the loop below we set phi[i,t] and p[i,t] to these parameter values. We could have just used constants throughout the program, but then we'd have to go back and rewrite much of the code every time we create a new model. That is prone to errors, and also not fun.
mean.p~dunif(0,1)
mean.phi~dunif(0,1)
for (i in 1:nind)
{
for (t in f[i]:(n.occasions-1))
{
phi[i,t]<-mean.phi #this allows us to expand in the future!
p[i,t]<-mean.p
}
}
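Pulling the pieces together, the whole constant-Phi, constant-p model looks roughly like this (a sketch following the structure above; note the line z[i,f[i]]<-1, which fixes the latent state to "alive" at first capture, something the snippets above take for granted):
model {
#priors
mean.phi~dunif(0,1)
mean.p~dunif(0,1)
#constraints
for (i in 1:nind)
{
for (t in f[i]:(n.occasions-1))
{
phi[i,t]<-mean.phi
p[i,t]<-mean.p
}
}
#likelihood
for (i in 1:nind)
{
z[i,f[i]]<-1 #alive at first capture, by definition
for (t in (f[i]+1):n.occasions)
{
mu1[i,t]<-phi[i,t-1]*z[i,t-1] #state (survival) process
z[i,t]~dbern(mu1[i,t])
mu2[i,t]<-p[i,t-1]*z[i,t] #observation (capture) process
y[i,t]~dbern(mu2[i,t])
}
}
}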
More Bayes tricks
There are a couple of tricks that Kéry and Schaub use that take advantage of what we know about the CJS data structure, and that also wind up speeding up computation. First, as mentioned, the CJS model is conditioned on the first release of animals; e.g., the capture history
01000
doesn't enter the likelihood until occasion 2. The function get.first is used together with apply (a built-in R function) to find the first occasion on which a 1 occurs and assign it to f for the corresponding capture history. This is then used in the code to tell the model to start at that occasion, not before. Second, there are a large number of observations where we know that the latent variable z must be 1, because the animal was observed both before and after that occasion (any history with a 0 between two 1's fits the bill). For these we do not have to sample z from the Bernoulli, because we know it is 1, so they are treated as data. This is done by the function known.states.cjs(CH), which is incorporated into the data list passed to BUGS or JAGS. Likewise, for these occasions we must assign NA initial values to the state z, since it is not unknown but observed; this is done in the function cjs.init.z. These last 2 steps are not absolutely necessary but a) avoid crashes, and b) speed things up considerably. But your biggest speed advantage is switching from BUGS to JAGS.
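For reference, these helpers look roughly like this (adapted from the versions in Kéry and Schaub's book; the code linked above may differ in detail):
# occasion of first capture for each capture history
get.first <- function(x) min(which(x != 0))
f <- apply(CH, 1, get.first)
# fill in z = 1 wherever the animal is known to be alive
# (between its first and last capture); everything else is NA
known.states.cjs <- function(ch){
  state <- ch
  for (i in 1:dim(ch)[1]){
    n1 <- min(which(ch[i,] == 1))
    n2 <- max(which(ch[i,] == 1))
    state[i, n1:n2] <- 1
    state[i, n1] <- NA        # first capture is handled by z[i,f[i]]<-1 in the model
  }
  state[state == 0] <- NA
  state
}
# initial values for z: NA wherever z is supplied as data (or undefined before
# first capture); entries after the last capture keep their 0's, which are valid starts
cjs.init.z <- function(ch, f){
  for (i in 1:dim(ch)[1]){
    if (sum(ch[i,]) == 1) next
    n2 <- max(which(ch[i,] == 1))
    ch[i, f[i]:n2] <- NA
  }
  for (i in 1:dim(ch)[1]){
    ch[i, 1:f[i]] <- NA
  }
  ch
}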
BUGS vs. JAGS
BUGS and JAGS have almost identical syntax (both in the model code and in the control statements), and for some models you can run either with no changes. Depending on the JAGS interface (I am using rjags) you might need to change a few things between BUGS and JAGS. For instance, exponentiation in BUGS is pow(base, exponent) but in JAGS is base^exponent (as in R). JAGS is much, much faster than BUGS, for reasons I don't completely understand. However, BUGS tends to be easier to 'debug' (ironic given its name). So I often build and debug a model in BUGS (OpenBUGS or R2OpenBUGS) and, once it's running, make whatever minor changes are needed to run it in JAGS (usually I use rjags, but R2jags works too). In the example here I am using rjags.
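For orientation, the two interfaces are called along these lines (the file name cjs-c-c.txt and the objects bugs.data and inits are illustrative placeholders, as are the iteration settings):
# R2OpenBUGS: one call compiles, burns in, and samples
library(R2OpenBUGS)
out.bugs <- bugs(data = bugs.data, inits = inits,
                 parameters.to.save = c("mean.phi", "mean.p"),
                 model.file = "cjs-c-c.txt",
                 n.chains = 3, n.iter = 5000, n.burnin = 1000)
# rjags: compile, burn in, then sample, as three separate steps
library(rjags)
jm <- jags.model("cjs-c-c.txt", data = bugs.data, inits = inits, n.chains = 3)
update(jm, 1000)
out.jags <- coda.samples(jm, c("mean.phi", "mean.p"), n.iter = 4000)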
Adding a random time effect
There are a number of ways to add a random time effect to the CJS model; here we focus on temporal variation in Phi. Although we could use other distributions, a normal distribution (after suitable transformation) usually works well and generalizes easily to other, more complex hierarchical models. Ignoring p (which we will keep constant in this model), we start as before by putting a uniform prior on mean.phi
mean.phi~dunif(0,1)
We then transform this to the logit scale and specify a fairly wide uniform prior on the standard deviation of the random effect, which we convert to a precision (tau).
mu<-log(mean.phi/(1-mean.phi))
sigma~dunif(0,10)
tau<-pow(sigma,-2)
sigma2<-pow(sigma,2) #optional if we want to see variance too
Then a random epsilon is added to the mean on the logit scale, where epsilon has a mean of 0 and precision tau. Note that in BUGS/JAGS it doesn't matter that epsilon is used in a statement before it is 'defined' later: the model code is declarative rather than order dependent, and everything gets updated together.
for (i in 1:nind)
{
for (t in f[i]:(n.occasions-1))
{
#putting things on a logit scale will make things much easier later (e.g. for adding covariates, other hierarchical structure)
logit(phi[i,t])<-mu+epsilon[t]
p[i,t]<-mean.p
}
}
for (t in 1:(n.occasions-1))
{
epsilon[t]~dnorm(0,tau)
}
This random, time-varying Phi, together with p (still constant in this model), gets incorporated into the CJS model, and the data are used to update all the model quantities to obtain posteriors.
Finally, we can (pretty trivially) create a model with fixed time effects for Phi and p, just by vectorizing the parameters in the prior distributions.
#priors
for (t in 1:(n.occasions-1))
{
phi[t]~dunif(0,1)
p[t]~dunif(0,1)
}
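With the parameters indexed by time only, the state and observation equations in the likelihood simply drop the individual index on phi and p (a sketch; the posted code may equivalently keep the phi[i,t] matrix and set phi[i,t]<-phi[t] in the constraint loop):
mu1[i,t]<-phi[t-1]*z[i,t-1]
z[i,t]~dbern(mu1[i,t])
mu2[i,t]<-p[t-1]*z[i,t]
y[i,t]~dbern(mu2[i,t])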
We now have 3 models:
A constant Phi and p model (m0)
A random time effect Phi and constant p model
A fixed time effect Phi and p model
The code runs all 3 models on the simulated data. For comparison, RMark is used with the (reformatted) data to get ML estimates for the first and third models. Note: when you run this code, you may notice that some of the intermediate DIC computations return "Inf" values for the penalized deviance (DIC). This is because these values are undefined at some of the data values; I have coded a workaround summary that excludes these undefined values and summarizes DIC over the range where it is in fact defined.
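To give a flavor of that workaround, here is a sketch of the idea, assuming DIC is computed with rjags's dic.samples(); the posted code may differ in detail:
# per-node deviance and penalty from rjags; drop the nodes where they are not finite
dic <- dic.samples(jm, n.iter = 2000, type = "pD")
ok <- is.finite(dic$deviance) & is.finite(dic$penalty)
dic.finite <- sum(dic$deviance[ok]) + sum(dic$penalty[ok])  # penalized deviance over defined nodes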
Adding other structure (covariate models, multi-scale hierarchical models)
As suggested, this code can be fairly easily generalized to more complex models. Some obvious ones:
Adding a time or individual covariate to model Phi or p. Basically this is just a fixed covariate effect added to either the mean or random effect model, e.g.
logit(phi[i,t])<-mu+epsilon[t]+beta*covariate[i]
would add a fixed individual covariate effect to a temporal random effect model (obviously the covariate would have to be read in as data, and a prior specified for beta; see the short sketch after this list)
Adding a random time effect for p (obvious)
Allowing individual variation (heterogeneity) in Phi or p via either continuous or finite mixture models.
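For the covariate example above, the extra piece would be along these lines (covariate[] supplied as data, ideally standardized; this vague prior is a typical choice, not taken from the posted code):
beta~dnorm(0,0.001) #vague prior on the covariate effect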
For many problems, further hierarchical structure could make sense in a model. Examples include:
Repeated measures (time effects nested in individuals)
Individual animals as repeated measures in sites which in turn are nested in landscapes, with measurable attributes at each level.
A word on efficiency vs. flexibility
The above approach allows tremendous flexibility in modeling, allowing for construction of models based on individual- and cohort-specific variation. However, this flexibility comes at a price: depending on the data dimensions there can be a very large number of computations. The Bernoulli modeling approach also does not take full advantage of the standard CJS assumptions, which ignore cohort effects. That is, under the usual one-age CJS model we treat animals that are recaptured and re-released as members of the release cohort at the release time, and do not retain information about the original release cohort in their subsequent recaptures. This allows the data to be condensed into a form known as an m-array, which resembles a band recovery matrix, and analyzed with multinomial rather than Bernoulli likelihoods, with a great reduction in computing time. The m-array approach works for any standard CJS model with time (but not cohort) effects, including random time effects, so in fact we could have used this approach for the models we just ran above. I present an abbreviated version of the code that Kéry and Schaub produce in section 7.10 of their book here; it simulates data, collapses the data into an m-array, and runs a CJS model (the constant Phi and p model). Kéry and Schaub's code has a few more, easily added features, including some Bayesian goodness-of-fit testing. The code can easily be modified to run the random effect and fixed time effects models, and also implemented in JAGS instead of BUGS for additional speed.
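To give a flavor of the m-array version, the core of the multinomial likelihood looks roughly like this (a compressed sketch of the structure in their section 7.10, with marr as the m-array; phi[t] and p[t] would be set to constants or given priors as in the models above):
for (t in 1:(n.occasions-1))
{
marr[t,1:n.occasions]~dmulti(pr[t,],rel[t])
rel[t]<-sum(marr[t,]) #number released at occasion t
}
#cell probabilities: survive to a later occasion and be recaptured there for the first time
for (t in 1:(n.occasions-1))
{
q[t]<-1-p[t]
pr[t,t]<-phi[t]*p[t] #main diagonal
for (j in (t+1):(n.occasions-1))
{
pr[t,j]<-prod(phi[t:j])*prod(q[t:(j-1)])*p[j] #above the diagonal
}
for (j in 1:(t-1))
{
pr[t,j]<-0 #below the diagonal
}
pr[t,n.occasions]<-1-sum(pr[t,1:(n.occasions-1)]) #never recaptured
}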
Next: Bayesian trickery: using data augmentation to estimate abundance via latent variables