Our ability to model parameters in capture-mark-recapture (CMR) studies depends on the structure of the data, including:
Time span (whether the population is considered closed or open)
Number and type of captures and re-encounters
Inclusion of group or covariate effects
Type and number of parameters, and how we wish to model variation in those parameters
Any other specific assumptions we wish to make about the parameters
Here I cover some elementary ideas in CMR inference, illustrated with 2 of our most important data structures:
Closed abundance CMR
Open CMR under Cormack-Jolly-Seber assumptions
Parameters, data structures, and alternative models
Closed CMR
Our closed CMR models generally will have 2 types of parameters:
Abundance (N), which is our main parameter of interest
Capture (p) and recapture (c) probabilities, related to modeling the detection (capture) process
The data will be encounter (capture) histories of individual animals that have been caught one or more times. These are typically summarized into an array with rows for the individuals and columns for the capture occasions. For example:
10001
01001
00001
denotes 3 animals that have been captured: the first on occasions 1 and 5, the second on occasions 2 and 5, and the third only on occasion 5. In general the array has n*k elements, where n = the number of different animals captured and k = the number of capture occasions.
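In R, the three histories above can be held as a matrix with one row per animal and one column per occasion. This is a minimal sketch; the object names ch_strings and ch are illustrative and do not come from any package.

```r
# Capture histories as strings, one per captured animal
ch_strings <- c("10001", "01001", "00001")

# Split each string into single characters and convert to 0/1 integers;
# the result has one row per animal and one column per occasion
ch <- do.call(rbind, lapply(strsplit(ch_strings, ""), as.integer))

n <- nrow(ch)   # number of different animals captured
k <- ncol(ch)   # number of capture occasions

dim(ch)         # n rows by k columns, so n * k elements
rowSums(ch)     # number of times each animal was caught
colSums(ch)     # number of animals caught on each occasion
```

The row and column sums are the basic summaries that most closed-population estimators are built from.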
We will use the data and one or more models containing alternative assumptions about parameter variability to get estimates of N, p, and c using maximum likelihood (and later Bayesian) approaches. If we have a single stratum, then N is a single constant, but p and c may vary over time or with respect to previous capture of an individual. There are many possible models; here are a few:
p and c different (so recapture higher or lower than initial capture) but constant over the 5 occasions
p=c and constant
p and c time varying and different
p=c but time varying
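To see how p and c enter these models, it helps to write out the probability of a single history. Under the first model in the list (p and c constant but different), an animal is missed with probability 1-p until its first capture, caught for the first time with probability p, and every later occasion uses c. The sketch below assumes that model; history_prob is a hypothetical helper and the values of p and c are illustrative.

```r
# Probability of one capture history when initial capture (p) and
# recapture (c_recap) probabilities are constant over occasions.
# history_prob is a hypothetical helper written for this illustration.
history_prob <- function(h, p, c_recap) {
  first <- which(h == 1)[1]              # occasion of first capture
  if (is.na(first)) {
    return((1 - p)^length(h))            # never caught: missed every time
  }
  after <- h[seq_along(h) > first]       # occasions following first capture
  (1 - p)^(first - 1) * p *              # missed until first capture
    prod(ifelse(after == 1, c_recap, 1 - c_recap))  # then caught or not with c
}

h <- c(1, 0, 0, 0, 1)                    # history 10001
history_prob(h, p = 0.3, c_recap = 0.2)  # p * (1 - c)^3 * c
history_prob(h, p = 0.3, c_recap = 0.3)  # same history under the p = c model
```

Setting c_recap equal to p gives the simpler p=c model, which is one way to see that the models in the list are nested versions of one another.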
CJS
Although the data look superficially the same, CJS assumptions are actually quite different from those of closed CMR, for 2 reasons:
The population is assumed open between sampling periods
Analysis focuses only on animals that have been previously marked. For that reason:
Abundance is no longer estimated (since that requires information on unmarked animals)
Focus is on estimation and modeling of survival probabilities between periods (phi) and recapture probabilities (p), so these are the 2 parameter types
As with abundance estimation, we use the data and maximum likelihood to obtain inference on phi and p, under alternative assumptions about how they vary. For example:
phi and p both constant
phi constant but p time varying
p constant but phi time varying
group or covariate effects for phi and/or p
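As an illustration of how phi and p combine, the sketch below computes the probability of a history under the simplest model in the list (phi and p both constant), conditional on the animal's first capture. After the last sighting the animal may have died or may be alive but missed, which is handled by the recursive term chi. The function name cjs_history_prob and the parameter values are assumptions made for this example only.

```r
# Probability of a capture history under CJS with constant phi and p,
# conditional on the animal's first capture (release).
# cjs_history_prob is a hypothetical helper for illustration only.
cjs_history_prob <- function(h, phi, p) {
  k <- length(h)
  seen <- which(h == 1)
  first <- min(seen)                  # release occasion
  last <- max(seen)                   # last occasion the animal was seen

  # chi[t]: probability of never being seen again after occasion t
  chi <- numeric(k)
  chi[k] <- 1
  if (k > 1) {
    for (t in (k - 1):1) {
      chi[t] <- (1 - phi) + phi * (1 - p) * chi[t + 1]
    }
  }

  prob <- 1
  if (last > first) {
    for (t in (first + 1):last) {
      # survive the interval, then be recaptured (p) or missed (1 - p)
      prob <- prob * phi * ifelse(h[t] == 1, p, 1 - p)
    }
  }
  prob * chi[last]
}

cjs_history_prob(c(1, 0, 1), phi = 0.8, p = 0.5)  # phi * (1 - p) * phi * p
cjs_history_prob(c(1, 0, 0), phi = 0.8, p = 0.5)  # died, or alive but missed
```

Note that the history 100 does not tell us whether the animal died or was simply not recaptured; the model accounts for both possibilities.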
Distributions and likelihood
Although we are going to use many methods where the computations are performed in a "black box", it is important to know some basic concepts and terminology. First, our CMR data are generally going to be modeled as having arisen from particular statistical distributions, notably the binomial or the multinomial. For example, if there are N animals in a population and we take a capture sample, the number n we obtain can be thought of as a binomial outcome with probability p from a sample of N. A statistical likelihood, on the other hand, takes the data as given (say a list of capture histories) and asks: under our assumed model, which values of N (and p, c and other parameters) make the observed data most probable? We get estimates by finding the values of N and the other parameters that maximize the likelihood (hence they are known as maximum likelihood estimates, or MLE).
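The difference between a distribution and a likelihood can be sketched with the binomial example. In the first part N and p are fixed and the count n varies; in the second part n has been observed and we search over N. One loud assumption: p is treated as known here, purely for illustration, because a single sample cannot separate N from p. The observed count n_obs = 31 is an illustrative value, not real data.

```r
# Distribution view: N and p are fixed, the count n is random
N <- 100
p <- 0.3
dbinom(25:35, size = N, prob = p)    # probability of catching 25, ..., 35 animals

# Likelihood view: n has been observed, N is unknown.
# Assumption for illustration only: p is treated as known, because a single
# sample cannot separate N from p.
n_obs <- 31
N_grid <- n_obs:200                  # N cannot be smaller than the number caught
loglik <- dbinom(n_obs, size = N_grid, prob = p, log = TRUE)

N_hat <- N_grid[which.max(loglik)]   # value of N that maximizes the likelihood
N_hat
```

A grid search is used so that the maximization is visible; in practice the software does this numerically for all parameters at once.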
Model selection and multi-model inference
By definition the MLE are the values that maximize the likelihood, but they do so by assuming a model, and therefore the estimates will differ among models. This raises the question of which estimate or combination of estimates should be used for inference. The question is not trivial: selecting an overly complex model will result in poor estimate precision, but too simple a model may result in biased estimates. We will use a statistic based on the likelihood, called Akaike's Information Criterion or AIC, to balance model fit against model complexity (parsimony). We will use AIC in 2 ways:
Sometimes to select one (or a few) models for exclusive use in inference, but
More commonly, to average over multiple models, retaining information about model uncertainty in the computation of variances and confidence intervals.
We will use AIC and a multi-model inference approach for virtually all of our CMR analyses, so please get familiar with it.
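The arithmetic behind AIC and model averaging is short enough to sketch by hand. AIC is -2 times the maximized log-likelihood plus 2 times the number of parameters, and Akaike weights rescale the AIC differences so that they sum to 1. The log-likelihoods, parameter counts, and abundance estimates below are made-up numbers for illustration only, not results from any data set.

```r
# AIC = -2 * log-likelihood + 2 * (number of parameters)
aic <- function(loglik, n_par) -2 * loglik + 2 * n_par

# Hypothetical maximized log-likelihoods, parameter counts, and estimates of N
# for three candidate models; these numbers are made up for illustration only
models <- data.frame(
  model  = c("p=c constant", "p, c constant", "p=c time-varying"),
  loglik = c(-120.4, -119.9, -117.8),
  n_par  = c(2, 3, 6),
  N_hat  = c(98, 104, 101)
)

models$AIC    <- aic(models$loglik, models$n_par)
models$delta  <- models$AIC - min(models$AIC)    # difference from best model
models$weight <- exp(-models$delta / 2) /
  sum(exp(-models$delta / 2))                    # Akaike weights, sum to 1

models[order(models$AIC), ]                      # models ranked by AIC

# Model-averaged estimate: each model's estimate weighted by its Akaike weight
N_avg <- sum(models$weight * models$N_hat)
N_avg
```

Selecting a single model corresponds to using only the top row; model averaging uses every row in proportion to its weight.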
Simulating data
Although it sounds like a tool reserved for geeks, simulation is actually a powerful means of both understanding your model and visualizing what data should look like if your model assumptions are "true". The basic idea is simple, illustrated here for a closed population with 3 sample occasions:
We suppose that data could legitimately arise from a particular set of assumptions, say N = 100 and a constant capture probability of p = 0.3 per sample occasion
We treat the population as known and, for each of the N = 100 individuals, draw a sample from the assumed distribution (in this case a Bernoulli, i.e. a binomial with n = 1 and p = 0.3). In this example the code for one animal would be x <- rbinom(3, 1, 0.3), which gives 3 replicates of an n = 1 binomial with p = 0.3
This results in a capture (x[i] = 1) or no capture (x[i] = 0) at each occasion i
We repeat this for each 'animal' in the population (N = 100 times)
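Each line of data (the x's) is used as a capture history. Animals whose history is all zeros were never caught and would not appear in a real data set, so those lines are dropped. We can then use the remaining data to estimate N and p under our model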
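The steps above can be put together in a few lines of R. This is a minimal sketch under the stated assumptions (N = 100, p = 0.3, 3 occasions); the seed value and the object names x, ch, and histories are arbitrary choices for the example.

```r
set.seed(1)                          # arbitrary seed so the run can be repeated

N <- 100                             # true abundance (known in a simulation)
k <- 3                               # number of sample occasions
p <- 0.3                             # constant capture probability

# One Bernoulli draw per animal per occasion, arranged with one row per animal
x <- matrix(rbinom(N * k, size = 1, prob = p), nrow = N, ncol = k)

# Animals never captured would be unknown to us in a real study,
# so only rows with at least one capture form the observed data
ch <- x[rowSums(x) > 0, , drop = FALSE]

n <- nrow(ch)                        # number of different animals captured
histories <- apply(ch, 1, paste, collapse = "")
table(histories)                     # frequency of each observed capture history
```

On average we would expect about N * (1 - (1 - p)^3) = 65.7 animals to be caught at least once, so n will usually be well below N. Rerunning with a different seed shows how much n and the history frequencies vary by chance alone.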