Regardless of whether we use RMark or MARK, we need to be familiar with some basic concepts for structuring a closed CMR model. As with any MARK model, there are 2 essential elements: the Parameter Index Matrix (PIM) and the Design Matrix (DM). Again, the PIM is basically a list of parameters by type, with an indexing number associated with each. The parameter types and their indices are determined by (1) the basic data type (e.g., closed vs. open CMR), and (2) the dimensions of the data (number of sampling occasions and groups).
Let's take a closed CMR problem with a single group (stratum) and 5 capture occasions. This model and data structure would have a PIM with 3 different parameter types: capture probabilities (p), recapture probabilities (c) and abundance (N). A basic PIM allowing these 3 parameter types to have distinct values from each other and over occasions would number the parameters:
p1-p5   1:5
c1-c4   6:9
N       10
That is, there are 5 distinct capture probabilities allowed over the 5 occasions, followed by 4 recapture probabilities (note that while these are labeled c1-c4, they refer to the probability of recapture of a previously marked animal on occasions 2-5). Finally, the last of these parameters is abundance (N), which, because we are assuming closure, obviously does not vary. Note that it is important that parameters have distinct indices; that is, ordinarily we will not allow parameters of different types to "share" an index. An exception will be seen below, where we allow p and c to share indices (since under some models we want these parameters to be equal).
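The indexing scheme above can be sketched as a simple list in base R (purely illustrative; RMark builds its PIMs internally, and the names here are mine):

```r
# Sketch of the PIM for a 1-group, 5-occasion closed model:
# each parameter type gets its own block of distinct index numbers
n.occ <- 5
pim <- list(
  p = 1:n.occ,                      # capture probabilities, occasions 1-5
  c = (n.occ + 1):(2 * n.occ - 1),  # recapture probabilities, occasions 2-5
  N = 2 * n.occ                     # abundance: a single index
)
# Indices are distinct across parameter types: 10 in all
stopifnot(length(unique(unlist(pim))) == 10)
```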
The DM operates on a basic model structure defined by the PIM, and creates particular models under specific assumptions. For example
parameter  p-intercept  c-intercept  N-intercept
p1         1            0            0
p2         1            0            0
p3         1            0            0
p4         1            0            0
p5         1            0            0
c1         0            1            0
c2         0            1            0
c3         0            1            0
c4         0            1            0
N          0            0            1
specifies a model in which p and c are constant over time but different from each other (and of course from N); this model has 3 parameters. By comparison the model
parameter  p-intercept  N-intercept
p1         1            0
p2         1            0
p3         1            0
p4         1            0
p5         1            0
c1         1            0
c2         1            0
c3         1            0
c4         1            0
N          0            1
indicates that capture and recapture are the same (p=c) and constant, and has 2 parameters. Note that the columns of the design matrix wind up corresponding to the number of parameters we are trying to estimate in the model.
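The two design matrices above can be written out directly in base R, and the number of (linearly independent) columns gives the parameter count. This is just bookkeeping to make the point concrete, not an RMark call:

```r
# DM for constant p and c, different from each other (3 parameters):
# rows are p1-p5, c1-c4, N; columns are the intercepts
dm1 <- cbind(p.int = c(rep(1, 5), rep(0, 4), 0),
             c.int = c(rep(0, 5), rep(1, 4), 0),
             N.int = c(rep(0, 9), 1))
# DM for p = c and constant (2 parameters): p and c share one intercept
dm2 <- cbind(p.int = c(rep(1, 9), 0),
             N.int = c(rep(0, 9), 1))
# Rank of the DM = number of estimated parameters
stopifnot(qr(dm1)$rank == 3, qr(dm2)$rank == 2)
```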
RMark approach
We will start out creating models using the scripting language in RMark, as will be seen in the examples below. RMark essentially combines the PIM and DM steps into one step, eliminating parameter indices if these are not ultimately used in the design. To illustrate, if we wind up building a model with both time and behavior effects, we would use a script similar to this to define capture probability
(~time+c)
and the model would need to keep track of the 1-5 time indices for capture and recapture. If by contrast we were only interested in a behavior effect, a model like
(~1+c)
would be used, and there would be no need to keep track of time indices (since the effects do not involve time). This will be clearer with specific examples, below. The ~1 indicates that we are replacing the time effects for occasions 1-5 with an intercept.
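In RMark these specifications are just R lists holding a formula and the share flag (the object names p.behavior and p.timebehavior are arbitrary; they only matter when passed to mark() via model.parameters):

```r
# Behavior-only model: intercept plus the c (recapture) effect, p and c shared
p.behavior     <- list(formula = ~1 + c,    share = TRUE)
# Time-plus-behavior model: occasion effects plus the c effect
p.timebehavior <- list(formula = ~time + c, share = TRUE)
# Until handed to mark(), these are ordinary base R lists
stopifnot(inherits(p.behavior$formula, "formula"))
```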
Also note that we could in theory create a model in which the number of parameters estimated equaled the original list in the PIM (in the above example 10), but in fact that will almost never be the best model, and it will usually be better to construct a series of simpler, alternative models.
Setting up an analysis in RMark
The first step in an RMark (or MARK) analysis is getting the data into the right format, which for RMark will be a special dataframe. See the previous discussion on data formatting. Here we're going to assume we have our data formatted, and use a canned example to illustrate model building and estimation. We'll then come back and show how to put data into RMark.
First we need to load the RMark library and fetch a data example. The following commands will fetch a classic example by Edwards and Eberhardt.
> library(RMark)
> data(edwards.eberhardt)
The command
> edwards.eberhardt
displays the data. You will see that the data consist of 76 rows and 18 columns of 1's and 0's. The rows are the capture histories for individual animals (76 in all), and the 1/0 indicates capture (1) or no capture (0) on each of the 18 occasions.
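To see the connection between a 1/0 matrix and the capture histories RMark works with, here is a toy 3-animal, 5-occasion example (RMark stores histories as character strings in a ch column; the object names caps and df are mine):

```r
# Toy capture-history matrix: 3 animals x 5 occasions
caps <- matrix(c(1, 0, 1, 0, 0,
                 0, 1, 1, 0, 1,
                 0, 0, 0, 1, 0), nrow = 3, byrow = TRUE)
# Collapse each row into the string form RMark expects in its ch column
ch <- apply(caps, 1, paste, collapse = "")
df <- data.frame(ch = ch, stringsAsFactors = FALSE)
# First animal was caught on occasions 1 and 3
stopifnot(df$ch[1] == "10100")
```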
The basic syntax for reading in the data, creating a model, and getting estimates is fairly simple, and looks generically like this
model_object<-mark(data,model, model.parameters)
"Data" tells us where the data is, "model" specifies the model structure (in this case, closed CMR),and model.parameters will indicate the specific model we wish to build. The command
mb<-mark(edwards.eberhardt,model="Closed")
will actually create a simple (but not the simplest) model for this structure and store the model and any results in the object "mb". By default, the basic model has a single parameter for each parameter type (p, c, and N), so no variation in capture over time. There are therefore 3 parameters. We can run this and see some results right away:
mb<-mark(edwards.eberhardt,model="Closed")
.
> mb$results$real
estimate se lcl ucl fixed note
p g1 t1 0.0868518 0.0215415 0.0528969 0.1393948
c g1 t2 0.0809816 0.0095560 0.0641189 0.1017966
N g1 93.8918520 10.8161480 81.9921320 129.4231100
> mb$results$AICc
[1] 381.5498
Note by the way that the explicit formulation of this model gives exactly the same results:
pdot.cdotshared=list(formula=~1+c,share=TRUE)
mb_e<-mark(edwards.eberhardt,model="Closed", model.parameters=list(p=pdot.cdotshared))
So, under this model, we get an estimate of N of approximately 94, of capture probability p = 0.087, and of recapture probability c = 0.081, so we might conclude that there is (slight) evidence of trap shyness. But do we really need this much complexity in the model? We can check by creating a model in which p=c, that is, the probability of catching an animal that is already marked is no different from that of catching an unmarked animal. This requires 2 lines: 1 to specify a formula for how p is modeled, and a second that incorporates that formula in a list (actually a 'list of lists'). We use the 'share=TRUE' convention because this allows p and c (technically different parameter types) to take on common values. The formula ~1 specifies an intercept-only design matrix, so constant p=c.
> pdotshared=list(formula=~1,share=TRUE)
> m0<-mark(edwards.eberhardt,model="Closed", model.parameters=list(p=pdotshared))
STOP NORMAL EXIT
....
> m0$results$real
estimate se lcl ucl fixed note
p g1 t1 0.081955 0.008816 0.0662522 0.1009771
N g1 96.258772 6.878606 86.6032790 114.7066900
> m0$results$AICc
[1] 379.6029
Note that the estimate of N is a bit higher (about 96) but also that the AIC value is lower, 379.6 vs. 381.5. As we have seen in previous labs, we use AIC to guide model selection, or preferably, we can simply average across the alternative models, and take model uncertainty into account. We'll come back to this after we run a few more models.
Let's run an additional model, this time allowing p to vary among occasions (time) but p=c otherwise.
> ptimeshared=list(formula=~time,share=TRUE)
> mt<-mark(edwards.eberhardt,model="Closed",model.parameters=list(p=ptimeshared))
> mt$results$real
estimate se lcl ucl fixed note
p g1 t1 9.465890e-02 3.073600e-02 4.921960e-02 1.743547e-01
p g1 t2 8.414130e-02 2.906480e-02 4.202310e-02 1.613627e-01
p g1 t3 9.465900e-02 3.073600e-02 4.921970e-02 1.743549e-01
p g1 t4 1.472477e-01 3.775670e-02 8.740740e-02 2.373984e-01
p g1 t5 8.414130e-02 2.906480e-02 4.202310e-02 1.613626e-01
p g1 t6 5.258780e-02 2.318180e-02 2.181250e-02 1.213952e-01
p g1 t7 1.893189e-01 4.228050e-02 1.197932e-01 2.860811e-01
p g1 t8 1.156946e-01 3.377600e-02 6.410890e-02 1.999219e-01
p g1 t9 4.207010e-02 2.079490e-02 1.572280e-02 1.077360e-01
p g1 t10 3.155280e-02 1.806120e-02 1.012560e-02 9.401660e-02
p g1 t11 1.682832e-01 4.011370e-02 1.034393e-01 2.619019e-01
p g1 t12 5.258820e-02 2.318190e-02 2.181280e-02 1.213957e-01
p g1 t13 2.103520e-02 1.478950e-02 5.230800e-03 8.071680e-02
p g1 t14 7.362350e-02 2.726840e-02 3.502820e-02 1.482131e-01
p g1 t15 9.465910e-02 3.073610e-02 4.921970e-02 1.743551e-01
p g1 t16 2.937879e-09 5.694183e-06 -1.115766e-05 1.116354e-05
p g1 t17 4.207070e-02 2.079510e-02 1.572310e-02 1.077369e-01
p g1 t18 1.051769e-01 3.230150e-02 5.658920e-02 1.872039e-01
N g1 9.507803e+01 6.613440e+00 8.585798e+01 1.129215e+02
>
> mt$results$AICc
[1] 354.5968
Now we get a lower AIC (the lowest yet) and a value of N in between the last 2 (95 vs. 94 and 96). Maybe we're getting close.
So far we have run 3 of the "classic" models (or their equivalents) discussed by Otis et al. (1978):
Otis notation   RMark notation       Parameter variation
M0              p(~1)c()N(~1)        No variation in p, p=c
Mb              p(~1)c(~1)N(~1)      Behavior variation (p != c) but constant in time
Mt              p(~time)c()N(~1)     Time variation in p, p=c
The AIC results for these models are summarized below
model npar AICc DeltaAICc weight Deviance
3 p(~time)c()N(~1) 19 354.5968 0.00000 9.999949e-01 305.2648
1 p(~1)c()N(~1) 2 379.6029 25.00616 3.715168e-06 364.8259
2 p(~1)c(~1)N(~1) 3 381.5498 26.95302 1.403539e-06 364.7640
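The DeltaAICc and weight columns can be reproduced directly from the AICc values in the table, since an Akaike weight is just exp(-Delta/2) normalized over the candidate set:

```r
# AICc values for mt, m0, and mb (copied from the table above)
aicc  <- c(mt = 354.5968, m0 = 379.6029, mb = 381.5498)
delta <- aicc - min(aicc)                        # DeltaAICc
w     <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights
# mt carries essentially all of the weight
stopifnot(w["mt"] > 0.9999)
```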
We can use the AIC weights to calculate a model-averaged estimate of N (code attached to do this):
> avg<-model.average(ee.results)
> nmarked<-nrow(edwards.eberhardt)
> N<-subset(avg,par.index==36)
> Nhat<-N$estimate+nmarked
> seN<-N$se
> print("Model averaged N and se")
[1] "Model averaged N and se"
> Nhat
[1] 95.0781
> seN
[1] 6.613456
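As a check, the model-averaged point estimate is just the weight-by-estimate sum over the three models (values copied from the outputs above):

```r
# Per-model estimates of N and their AICc weights, from the earlier output
Nhat.models <- c(mt = 95.07803, m0 = 96.258772, mb = 93.891852)
w           <- c(mt = 9.999949e-01, m0 = 3.715168e-06, mb = 1.403539e-06)
# Model-averaged N: weighted sum, dominated here by mt
Nhat.avg <- sum(w * Nhat.models)
stopifnot(abs(Nhat.avg - 95.0781) < 0.001)
```

Note that the model-averaged se reported by model.average is not simply a weighted mean of the individual ses: it also incorporates between-model variance (the unconditional se of Burnham and Anderson), which is why we average rather than just picking the top model.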