Individual covariates relate to how parameters vary among individual animals in the study, and are dealt with in RMark and MARK in a somewhat different way than time-specific covariates. As you recall, time-specific covariate are incorporated by modification of the design data (in RMark) or inputs into the design matrix (MARK) to include the covariate effects. By contrast, individual covariate require us to first add additional information to each capture history. The covariates are then referenced by labels in the design data (or design matrix), which point RMark/MARK to the covariate associated with the label, and then "drill down" to the individual capture history level to form the data likelihood (based on the observed CH and covariate). How exactly this works is a deeply guarded family secret passed down by the Whites for generations (not really... but that sounds good). You do not need to know the "secret" to make it work (but we will get a bit more transparent when we build models using Bayesian methods), but you will need to follow some basic steps as illustrated by the examples to follow.
We will use another example with simulated data (where the covariate effects is known to operate) to illustrate both RMark and MARK approaches. The data structure again involves a single group of animals captured and recaptured over 10 occasions according to a CJS design. A single covariate value (cov) is measured for each animal, and is assumed to affect survival (Phi). Because there are 450 marked animals in all, we require 450 covariate values to predict survival.
Implementation in RMark Script
To set up the analysis, the covariate values must be added to the data frame containing the capture histories. "cov" is a list containing the 450 covariates, this is easily done by adding a column cov to the datframe containing the capture histories (ch), e.g., data$cov=cov. I have already done this step for the example. The dataframe containing the capture histories and the covariate are available here in an R object and the R commands are here.
First, load in the object that contains the capture histories with covariate values appended.
> setwd(data_dir)
> require(RMark)
> load("ind.cov.example.data")
We can check out the first several lines of the data
> cap.data<-ind.cov.example.data$cap.data
> cap.data[1:10,]
ch cov
1 1010010010 0.55480451
2 1010011110 0.70528761
3 1000000000 0.06893459
4 1000000000 0.36743102
5 1100000000 0.42647732
6 1011000000 0.96820696
7 1010000000 0.61487024
8 1001000100 0.64234220
9 1000000000 0.63677239
10 1111000011 0.88687793
Processing the data and creating the design data:
> cap.data.processed=process.data(cap.data,model="CJS")
> cap.data.ddl=make.design.data(cap.data.processed)
We can formulate several models including covariate models, and collect them for model averaging:
> #formulas for general ("global") CJS model (sex*time)
> Phi.cov<-list(formula=~cov)
> p.dot=list(formula=~1)
> Phi.t=list(formula=~time)
> p.t=list(formula=~time)
> cov.est<-mark(cap.data.processed,cap.data.ddl,model.parameters=list(Phi=Phi.cov,p=p.dot))
> cov.est.pt<-mark(cap.data.processed,cap.data.ddl,model.parameters=list(Phi=Phi.cov,p=p.t))
> time.est<-mark(cap.data.processed,cap.data.ddl,model.parameters=list(Phi=Phi.t,p=p.t))
> null.est<-mark(cap.data.processed,cap.data.ddl)
> results<-collect.models()
>results
model npar AICc DeltaAICc weight Deviance
1 Phi(~cov)p(~1) 3 1909.496 0.00000 9.982739e-01 1903.4666
2 Phi(~cov)p(~time) 11 1922.216 12.72036 1.726072e-03 1899.8889
3 Phi(~1)p(~1) 2 1970.296 60.79954 6.263216e-14 356.9423
4 Phi(~time)p(~time) 18 1993.824 84.32839 0.000000e+00 347.6298
These results indicate clear support for the cov.est model (Phi~cov)p(~1). Examination of the parameter estimates for this model
> #get estimates
> cov.est$results$beta
> cov.est$results$real
shows beta values for Phi of 0.0669 (intercept) and 2.334 (slope). The real estimate of Phi of 0.77597 corresponds to a prediction of Phi at the mean observed covariate values (about 0.503 for these data), and this is the default specified by RMark and MArk. We can see this clearly by using the built in function covariate.predictions(), which allows us to apply the beta estimates to predicting Phi for any specified covariate value (observed or not). For the mean covariate value we have
> #predictions at mean covariate value
> df.mean<-data.frame(cov=mean(cap.data$cov))
> mod.pred<-covariate.predictions(cov.est,data=df.mean,indices=1)
> mod.pred
$estimates
vcv.index model.index par.index covdata estimate se lcl
1 1 1 1 0.5035801 0.7759659 0.01711751 0.7406439
ucl fixed
1 0.8077262
$vcv
[,1]
[1,] 0.0002930093
and for a different value (cov =0.8) we have
> #prediction for a different covariate values
> df.new<-data.frame(cov=0.8)
> mod.pred<-covariate.predictions(cov.est,data=df.new,indices=1)
> mod.pred
$estimates
vcv.index model.index par.index covdata estimate se lcl
1 1 1 1 0.8 0.8737113 0.01656763 0.8375191
ucl fixed
1 0.9027776
$vcv
[,1]
[1,] 0.0002744864
We can combine this feature with R data handling and graphic capabilities to generate some useful graphical output. For example, to graph the prediction and CI of Phi vs. a range 0-1 of covariates values under the cov.est model:
>
> df<-data.frame(cov=seq(0,1,0.1))
> mod.pred<-covariate.predictions(cov.est,data=df,indices=1)
> ests<-mod.pred$estimates
> plot(ests$covdata,ests$estimate,type="l",xlab="cov",ylab="Phi",main="model cov.est")
> matlines(ests$covdata,ests$lcl,lty=2)
> matlines(ests$covdata,ests$ucl,lty=2)
> #covariate at mean value
We can easily get model averaged predictions by applying the covariate.predictions() function to the result object instead of a single model:
> #model averaged
> avg.pred<-covariate.predictions(results,data=df,indices=1)
and then graphing as above. These features provide tremendous flexibility in manipulating and displaying model results, steps that tend to be more cumbersome in the MARK GUI.
As always, we can export the data into MARK format if we wish:
> #statements to produce input for MARK
> export.MARK(cap.data.processed, "IndCovExample",
+ replace = TRUE, chat = 1, title = "CJS Indiv Covariate example",
+ ind.covariates = "all")
We then use the IndCovExample.inp in the next section.
Implementation in MARK GUI
We read in the IndCovExample.inp data into the MARK session, specifying Live Recaptures, 1 group and 10 occasions and 1 covariate labeled "cov'. Again, I have already completed this step and created a project with several models; see zip folder containing the MARK project files and input data. You can either start from scratch (blue checked box in menu) or open the project files (yellow folder icon). As with time-specific covariates, usually we want to start with a fairly general PIM model and modify the design matrix to include the covariates. First I ran the Phi(t)p(t) model, then retrieved it and open up the design matrix tool. I modified the model to eliminate the time effects for Phi and replace the second column with the label "cov" (the "fill column / individual covariate tool will paste in the covariate label for you, which can copied down the appropriate number of rows). The resulting Phi(cov)p(t) model is:
Eliminating the time effects for p we have Phi(cov)p(.):
We can run these 2 models and an additional Phi(.)p(.) model (just eliminate the cov column in the above) and we get the 4 identical models to the RMark analysis. As we can see, the results are virtually identical, which should never surprise us-- this is just MARK being run with and without a GUI.
As with the time-specific example, we can do model averaging. To get the model-averaged estimates for Phi we open the model average tool and check to first row.
As with RMark, note that these model-averaged real parameters estimates for individual covariate models are by default based on prediction from the mean observed covariate value. This can only be changed by re-running the model(s) and inserting a user-specified covariate/
and then producing the model-averaged estimates based on this covariate input (but the estimated betas).
Thus, in order to obtain predictions over a range of covariate values in the MARK GUi, it is necessary to re-run the models for each value, do the averaging, and save the outputs. Obviously this is very cumbersome in contrast to the RMark approach, in which we can specify all the covariate values we want in a data frame and submit this and the model object to the covariate.predictions() function.
Special issues
The above examples cover many of the 'standard' covariate situations that arise, but inevitably there are cases that fall through the cracks:
Missing individual covariates
When including individual covariates in CMR, it sometimes occurs that covariate values are missing for 1 or more individuals. Unfortunately, there is no perfect solution to this problem in a likelihood framework, but there are several "work arounds" discussed in the MARK book chapter on individual covariates, summarized here with some opinions:
Remove all capture histories with missing covariates- This approach avoids assuming anything about the missing covariate, but results in a loss of data. I recommend this approach when there are a small number of missing covariates
Replace the missing covariate with the mean of the other values- This approach essentially assumes that the covariate arose as random outcome from the population mean. It will tend to pull the prediction back toward the average value. In my view there is little advantage to this approach, since it effectively replaces all the missing covariates with a constant regardless of the capture history.
Analyse with multistate models - This approach creates true and observable states associated with the covariate values. It requires discretization of covariate values and the estimation of additional parameters. Again, if there are just a few missing covariate values it is probably not worth the additional effort and complexity.
Not discussed here are Bayesian approaches, in which the missing covariates are treated like any other missing data (NA) , and predicted from prior distributions and posterior updating of the rest of the model parameters.
Time-varying individual covariates
Previously we have considered covariates that were either time-specific or individual-specific, that is
The covariate varies over time but is the same for all individuals each occasion, or
The covariate varies among individuals but remains the same over time.
More generally, covariates can and sometimes do vary among individuals and over time. The way this is dealt with in MARK/RMark is to create separate individual specific covariates, one for each time period, and then include these in the design matrix.
I illustrate this approach with the previous time-specific covariate example, where now the covariates are recast as "individual covariates" that in this instance have the same value among individuals but different values at each time occasion. Although this application is somewhat of a trick (to allow more general covariate prediction in RMark as we'll see) , the covariates could have varied among individuals, so the mechanics here illustrate the more general case (time- and individual-specific variation).
Implementation in RMark
First, we need to load in the example data object for the time specific problem, located here. Save this object in your working directory and load it into memory:
> setwd(data_dir)
> require(RMark)
> #object containing time-specific covariates and data
> load("example.data")
We can take a look at the first few capture histories and the time-specific covariates
> #capture histories
> #list the first several capture histories
> head(example.data$cap.data)
ch
1 1000000000
2 1110000000
3 1000000000
4 1000001011
5 1000100000
6 1001000000
> #time specific covariates
> example.data$cov$cov
[1] 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658
[8] 0.9179416 0.4315568
The next step is to convert these to individual covariates, one for each time period, that are (in this example) identical for each animal. More generally, we would read in here the time-specific covariates for all the individual animals (which would, in generally, differ between animals)
> #convert to "individual" covariates
> n.ind<-nrow(example.data$cap.data)
> cov1<-rep(example.data$cov$cov[1],n.ind)
> cov2<-rep(example.data$cov$cov[2],n.ind)
> cov3<-rep(example.data$cov$cov[3],n.ind)
> cov4<-rep(example.data$cov$cov[4],n.ind)
> cov5<-rep(example.data$cov$cov[5],n.ind)
> cov6<-rep(example.data$cov$cov[6],n.ind)
> cov7<-rep(example.data$cov$cov[7],n.ind)
> cov8<-rep(example.data$cov$cov[8],n.ind)
> cov9<-rep(example.data$cov$cov[9],n.ind)
> cap<-example.data$cap.data
Tag these on to the end of the capture history records
> cap$cov1=cov1
> cap$cov2=cov2
> cap$cov3=cov3
> cap$cov4=cov4
> cap$cov5=cov5
> cap$cov6=cov6
> cap$cov7=cov7
> cap$cov8=cov8
> cap$cov9=cov9
Displaying the first several augmented capture histories:
> head(cap)
ch cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9
1 1000000000 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658 0.9179416 0.4315568
2 1110000000 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658 0.9179416 0.4315568
3 1000000000 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658 0.9179416 0.4315568
4 1000001011 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658 0.9179416 0.4315568
5 1000100000 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658 0.9179416 0.4315568
6 1001000000 0.4779753 0.8640208 0.1563517 0.9302479 0.9729868 0.9939771 0.4513658 0.9179416 0.4315568
By the way, we are labeling the time-specific individual covariates cov1 - cov9 to correspond to the time labels for the 9 occasions, under the default that the first time occasion is time=1. If in the data processing we had used a different first.time, we would need different covariate labels. E.g., if first.time=2013 we would have cov2013, cov2014, etc. This is important because RMark needs to know what time element of the design to associate each covariate with.
Once we process the augmented capture data and create the design data, we are ready to build some models
> cap.processed=process.data(cap,model="CJS")
> cap.ddl=make.design.data(cap.processed)
We'll start with a single model that includes the time- and individual specific covariate
> Phi.cov<-list(formula=~cov)
> p.dot=list(formula=~1)
> cov.est<-mark(cap.processed,cap.ddl,model.parameters=list(Phi=Phi.cov,p=p.dot))
Note that we use "cov" to denote the individual covariate. RMark recognizes that cov is the single individual covariate that varies over time (so that the vector of cov2 values is different than cov1, etc.). That is, the model establishes a relationship between Phi and the covariate via a single coefficient, with the covariate allowed to vary over individuals and time:
logit(Phi) = beta_intercept + beta_cov*Cov(indiv, time)
In this example, the beta estimates are
> cov.est$results$beta
estimate se lcl ucl
Phi:(Intercept) 0.2749044 0.0953082 0.0881003 0.4617086
Phi:cov 2.1478457 0.1638474 1.8267047 2.4689866
p:(Intercept) -0.4019823 0.0242141 -0.4494419 -0.3545227
This is made clear by looking at the design matrix:
> cov.est$design.matrix
Phi:(Intercept) Phi:cov p:(Intercept)
Phi g1 c1 c1 a0 t1 "1" "cov1" "0"
Phi g1 c1 c1 a1 t2 "1" "cov2" "0"
Phi g1 c1 c1 a2 t3 "1" "cov3" "0"
Phi g1 c1 c1 a3 t4 "1" "cov4" "0"
Phi g1 c1 c1 a4 t5 "1" "cov5" "0"
Phi g1 c1 c1 a5 t6 "1" "cov6" "0"
Phi g1 c1 c1 a6 t7 "1" "cov7" "0"
Phi g1 c1 c1 a7 t8 "1" "cov8" "0"
Phi g1 c1 c1 a8 t9 "1" "cov9" "0"
p g1 c1 c1 a1 t2 "0" "0" "1"
We can use the estimates from this model to make predictions . First, we can see that the real parameter estimates for Phi for each occasion
> #get estimates of real parameters
> cov.est$results$real
are the same values predicted from the cov.est model taking as fixed the mean (in this case, same for all individuals) values for each time occasion> #predictions at mean covariate value
> df.mean<-data.frame(cov1=mean(cap$cov1),cov2=mean(cap$cov2),cov3=mean(cap$cov3),cov4=mean(cap$cov4),cov5=mean(cap$cov5),cov6=mean(cap$cov6),cov7=mean(cap$cov7),cov8=mean(cap$cov8),cov9=mean(cap$cov9))
> mod.pred<-covariate.predictions(cov.est,data=df.mean,indices=1:9)
> mod.pred$estimate
It is, however, more interesting to visualize the more general prediction-- for a specified range of covariate values, not necessarily observed in the study. We can do this very simply by (1) specifying a given list of values, (2) applying the same list to each time period, (3) forming the predictions. We then only need to look at the first real parameter index to display the predictions, since we've essentially folded "time" variation into "individual" variation. Here is script to do this:
> #specified covariate predictions single model
> df<-data.frame(cov1=seq(0,1,0.1),cov2=seq(0,1,0.1),cov3=seq(0,1,0.1),cov4=seq(0,1,0.1),cov5=seq(0,1,0.1),cov6=seq(0,1,0.1),cov7=seq(0,1,0.1),cov8=seq(0,1,0.1),cov9=seq(0,1,0.1))
> mod.pred<-covariate.predictions(cov.est,data=df,indices=1)
> ests<-mod.pred$estimates
> ests
Finally, we can plot the predictions against any one of the covariate columns (since they're all the same it doesn't matter) and display confidence intervals.
> plot(ests$cov1,ests$estimate,type="l",xlab="cov",ylab="Phi",main="model cov.est")
> matlines(ests$cov1,ests$lcl,lty=2)
> matlines(ests$cov1,ests$ucl,lty=2)
This approach is readily extended to model averaging. First, let's create some additional models
>#make some additional models>#time only>Phi.t=list(formula=~time)>p.t=list(formula=~time)>time.est<-mark(cap.processed,cap.ddl,model.parameters=list(Phi=Phi.t,p=p.t))>null.est<-mark(cap.processed,cap.ddl)>cov.est.pt<-mark(cap.processed,cap.ddl,model.parameters=list(Phi=Phi.cov,p=p.t))>results=collect.models()
Then we apply prediction to the the list of models and do model averaging
> #covariate predictions model averaging
> mod.pred<-covariate.predictions(results,data=df,indices=1)
> ests<-mod.pred$estimates
> plot(ests$cov1,ests$estimate,type="l",xlab="cov",ylab="Phi",main="model averaged")
> matlines(ests$cov1,ests$lcl,lty=2)
> matlines(ests$cov1,ests$ucl,lty=2)
Finally, we can export the capture and covariate data to MARK input format:
> #this exports just the inp file, not all the other junk...
> export.chdata(cap.processed, filename="IndividtimeExample",replace=T,covariate=c("cov1","cov2","cov3","cov4","cov5","cov6","cov7","cov8","cov9"))
Obviously, this approach is a bit more involved than simply including time-specific covariates in the design data. In addition, treating time-specific as individiual covariates is computationally more involved (since every individual capture history and covariate value is used separately in the likelihood). The approach is thus not justified if interest focuses solely on time-specific covariates, and at the observed time specific values (not future or hypothetical values). However, in this case, it was justified because we were specifically interested in getting both model-specific and model-averaged real estimates for a range of hypothetical covariate values. There are apparently a few other ways to do this (in RMark-- none that I know of in MARK) but the above works and is, by comparison, straightforward.
Script file to perform the above computations
Implementation in MARK GUI
I have taken the MARK formatted input file created by the export.chdata() function above and read it into the MARK GUI, specifying 10 occasions, 1 group, and 9 individual covariates. I then assigned the names cov1- cov9 to these; I could have retained the default names of Var1 - Var9, but for consistency with the RMARK results I kept the names as cov1 - cov9.
I then ran 2 of the standard PIM models that we ran previously in RMark: Phi(t) p(t) and Phi(.)p(.). Next I retreived the Phi(t) p(t) model and open up an 18-column design matrix, which I then edited to create the covariate column. By right clicking in this column, selecting 'select individual covariates', and selecting all (cov1-cov9) , the labels cov1-cov9 are pasted in the corresponding rows of column 2, as in the second figure below
Once the covariate column is created, add an intercept and time effect for p and run (or save) as model Phi(cov) p(t).
Finally, Phi(cov) p(.) is created simply by deleting the time effects for p (colums 4-11)
You can compare the design matrices for these models to the RMark versions and see that they are equivalent. After we run all 4 models we get:
We can then get real estimates of the parameters under each model, or model averaged estimates. As with the previous individual covariate example, the estimates where cov is involved are based on predictions at the mean observed covariate value. If predictions at other covariate levels are needed the models must be re-run and values must be manually inserted into the MARK menu. How tedious! I have confirmed that the model averaged mean covariate predictions match the RMark results, you are welcome to repeat this for a bunch of other values if you have nothing better to do.
Zip file containing MARK input and project files
Next: Review exercises