KNOWN FATES ANALYSIS WITH COVARIATES IN RMARK
Some of the most interesting and important applications of statistical modeling go beyond the estimation of basic parameters such as survival, and delve into describing and explaining heterogeneity in survival and other vital rates (ideally under experimental conditions, where factors have been manipulated and subjects assigned to treatment groups). Program MARK and the RMark package have tremendous capability for this type of modeling, provided the appropriate sampling or experimental design has been followed.
I illustrate this situation with an example black duck telemetry data set (Conroy et al. 1989. Winter survival of female American black ducks on the Atlantic coast. Journal of Wildlife Management 53:99-109). The data are contained in a built-in RMark dataframe, which we can access via the commands
>library(RMark)
>data(Blackduck)
Listing the first few lines of data we have...
> Blackduck
ch BirdAge Weight Wing_Len condix
1 1100000000000000 1 1.16 27.7 4.19
2 1011000000000000 0 1.16 26.4 4.39
3 1011000000000000 1 1.08 26.7 4.04
4 1010000000000000 0 1.12 26.2 4.27
5 1010000000000000 1 1.14 27.7 4.11
Each line is a capture history (here, for the first 5 animals) in "LDLD" format, in which "L" indicates whether the animal was observed at a sampling occasion (1 = observed, 0 = not observed) and "D" indicates whether the animal survived (0) or died (1) during the interval. This is followed by 4 individual covariates: age (1 = adult, 0 = subadult), body mass in kg, wing chord in cm, and a condition index = mass/wing*100. Note that in the RMark dataframe BirdAge is stored as a factor (non-numeric) variable, which can produce unpredictable results in RMark. The expression
>Blackduck$BirdAge=as.numeric(Blackduck$BirdAge)-1
converts this into a numeric covariate (0 or 1 values).
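A quick structure check (output not shown) confirms that BirdAge is now numeric:
> #verify covariate types after the conversion
> str(Blackduck)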
The data are processed and associated with the known fates model structure by
> bduck.processed=process.data(Blackduck,model="Known")
and the default design data (ddl) are built with
> bduck.ddl=make.design.data(bduck.processed)
Finally, I specify a time-specific covariate, the number of days in each week (occasion) in which the minimum temperature was < 0 C (freezing); this is then added as a column to the survival (S) design data.
> bduck.ddl$S$min=c(4,6,7,7,7,6,5,5)
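If you want to confirm the new column, the survival design data can be inspected directly (output not shown):
> #the min column should now appear alongside the built-in design data fields
> head(bduck.ddl$S)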
We can now proceed to develop several models of interest. The general syntax for doing this is fairly simple, since there is only a single parameter type (S= survival) that we are modeling. For example, the model that involves constant S can be created by the statements
>S.dot=list(formula=~1)
>mod1=mark(data=bduck.processed,ddl=bduck.ddl,model.parameters=list(S=S.dot))
A time-specific model can be built (using the built-in parameter indicator "time" in the design matrix) as
>S.time=list(formula=~time)
>mod2=mark(data=bduck.processed,ddl=bduck.ddl,model.parameters=list(S=S.time))
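Either fitted model can be examined in the usual way, for example (output not shown):
> summary(mod2)        #beta (logit-scale) and real estimates for the time model
> mod2$results$real    #just the real (weekly) survival estimates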
A series of models can be created and run using the mark.wrapper function. The code below runs the above 2 models along with several more, and produces a summary table of AIC values. The results appear to favor the model with temperature (min) as a covariate, with a coefficient of -0.61, indicating a negative relationship between the number of freezing days and survival. The code also produces an input file that can be run in the MARK GUI.
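The listing that follows is a minimal sketch of how such a model set might be assembled and run with mark.wrapper(); the exact set of models, and the model names beyond S.dot, S.time, and S.BirdAge.Weight (e.g., S.min, S.min.BirdAgexWeight), are my choices for illustration.
> #define candidate models for weekly survival and run them all in MARK
> run.bduck=function()
+ {
+   S.dot=list(formula=~1)                                  #constant
+   S.time=list(formula=~time)                              #occasion-specific
+   S.min=list(formula=~min)                                #days below freezing
+   S.BirdAge.Weight=list(formula=~BirdAge+Weight)          #age + mass
+   S.min.BirdAgexWeight=list(formula=~min+BirdAge*Weight)  #freezing days + age x mass
+   #collect the S. specifications above and run each model
+   cml=create.model.list("Known")
+   mark.wrapper(cml,data=bduck.processed,ddl=bduck.ddl)
+ }
> bduck.results=run.bduck()
> bduck.results    #prints the AIC model-selection table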
The RMark function model.average() can be used with the bduck.results object to compute model-averaged estimates of the "real parameters" (weekly survival for weeks 1-8).
> model.average(bduck.results)
Note that when individual covariates are involved (as they are in at least 2 of the models), the real parameter "estimates" are predictions at the mean covariate value, whereas a time-specific covariate takes on its occasion-specific value. E.g., for the model S(min+BirdAge*Weight) we would predict S(3) for adults by setting BirdAge=1, min=7, and using the average weight for adults. More flexible prediction is enabled by the covariate.predictions() function, as described below.
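A sketch of that particular calculation (assuming the age-by-mass model was named S.min.BirdAgexWeight, as in the wrapper sketch above; output not shown):
> #mean mass of adult birds
> adult.wt=mean(Blackduck$Weight[Blackduck$BirdAge==1])
> #predicted S for week 3 under one model; min=7 is already in the design data for index 3
> covariate.predictions(bduck.results$S.min.BirdAgexWeight,
+    data=data.frame(BirdAge=1,Weight=adult.wt),indices=c(3))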
Prediction using covariate models
Often, once we have estimated the parameters of a covariate model we will be interested in making predictions about the response at various levels of the covariate, including levels that may not have been observed. If we have a linear logit model of the form
logit(S) = intercept + slope* covariate
and we have estimates of the intercept and slope (from our analysis), we could simply plug in various levels of the covariate and produce predictions. However, this gets a bit cumbersome for a couple of reasons:
We will usually want estimates of variance and perhaps confidence intervals on our predictions, and these are computationally more difficult than just 'plug-in' computations
Most of the time we will have estimates under several models, which would then produce correspondingly different predictions given the covariate values (with some of course predicting no effect of the covariate).
Fortunately, Jeff Laake has written a very nice function covariate.predictions() that deals with these issues in a very general way.
The function reads in a set of model results (like bduck.results above), which stores all the parameter estimates under each model, AIC values, etc.
It then reads in a user-built dataframe containing the values of the covariates that we want to predict with. This can be a range of values for a single covariate, or a combination of multiple covariates.
The function then creates an object with model-averaged predictions and variance-covariance structure at each covariate value (or combination of values).
I illustrate this with the Black Duck example. Assume that we have already created the object bduck.results as above. I then selected a range of mass values over which to predict. As noted, these could be any values; for simplicity I evenly spaced them over the data range.
> ########################covariate predictions
> #pick a range of masses to predict over. Just use data range for now
> minmass=min(Blackduck$Weight)
> maxmass=max(Blackduck$Weight)
> mass.values=minmass+(0:30)*(maxmass-minmass)/30
Model-averaged predictions are obtained by a single, very simple call to the covariate.predictions() function
> #compute model-averaged predictions of S at each mass value
> Sbymass=covariate.predictions(bduck.results,data=data.frame(Weight=mass.values),indices=c(1))
>
If we want, we can then plot the prediction and confidence interval over the range of covariate values.
> #plots
> plot(Sbymass$estimates$covdata, Sbymass$estimates$estimate,
+ type="l",lwd=2,xlab="Mass(kg)",ylab="Survival",ylim=c(0.8,1))
> lines(Sbymass$estimates$covdata, Sbymass$estimates$lcl,lty=2)
> lines(Sbymass$estimates$covdata, Sbymass$estimates$ucl,lty=2)
We could also see how the prediction looks over a more extreme range, from 500 g ducks (probably dead!) to very fat 2 kg ducks.
> #a more extreme prediction
> minmass=0.5
> maxmass=2.0
> mass.values=minmass+(0:30)*(maxmass-minmass)/30
>
> #compute model-averaged predictions of S at each mass value
> Sbymass=covariate.predictions(bduck.results,data=data.frame(Weight=mass.values),indices=c(1))
> #plots
> plot(Sbymass$estimates$covdata, Sbymass$estimates$estimate,
+ type="l",lwd=2,xlab="Mass(kg)",ylab="Survival",ylim=c(0.8,1))
> lines(Sbymass$estimates$covdata, Sbymass$estimates$lcl,lty=2)
> lines(Sbymass$estimates$covdata, Sbymass$estimates$ucl,lty=2)
>
You can see for yourself that while the predictions at 500 g and 940 g (the lower end of the data range) are not hugely different, the lower confidence limit for predictions outside the data range becomes very wide (off the chart). This should not be a big surprise and simply reinforces the usual caution about "prediction outside the design".
Additional comments
Covariate prediction
In the above, note that indices=c(1) refers to the parameter indices in the "all different" parameter index matrix (the indices of the real parameters), which for this data structure are 1-8 for the 8 weekly occasions. Models that do not include a time effect make the same prediction for every index, so only the first index, c(1), is needed. In this example we are averaging over some models that do include time effects, so the model-averaged prediction above refers only to the first occasion; if predictions for all occasions are needed, those indices must be included (indices=1:8). However, since the covariate effect is the same across occasions in the models where there is an effect, there is really little point in doing this for all indices, especially if the interest is in the comparative effect of the covariate. We will discuss parameter index matrices (PIMs) much more thoroughly when we get to CJS models.
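Purely to show the mechanics, a sketch of the all-occasions call (here at the overall mean mass, an arbitrary choice; output not shown):
> #model-averaged predictions of weekly S for all 8 occasions
> Sbyweek=covariate.predictions(bduck.results,
+    data=data.frame(Weight=mean(Blackduck$Weight)),indices=1:8)
> Sbyweek$estimates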
You can compare the results of covariate.predictions() to plugging values into the covariate model directly. Take the simple case of the age + weight model. First I show code that pulls out the estimates from the model, plugs in covariate values, and produces predictions. Then I do the same thing using the covariate.predictions() function, illustrating that the input for that function can be either a single model (where we get a prediction under just that model) or a list of model results (where we get a model-averaged prediction).
> #bird age +weight model
> betas<-bduck.results$S.BirdAge.Weight$results$beta$estimate
> #covariate inputs for intercept =1 age=1 and weight =1.2
> x<-c(1,1,1.2)
> #compute prediction on the probability scale; expit is the inverse logit,
> #  e.g. expit<-function(x) 1/(1+exp(-x)) (equivalent to plogis(x)), if not already defined
> SpredA<-expit(sum(betas*x))
> SpredA
[1] 0.9289718
>
> #using the covariate prediction function & just one model
> SpredB<-covariate.predictions(bduck.results$S.BirdAge.Weight,data=data.frame(BirdAge=1,Weight=1.2),indices=c(1))
> SpredB$estimates$estimate
[1] 0.9289718
>
Derived parameters
The real survival parameters S modeled in RMark refer to the probabilities of surviving each interval of the study, in the Black Duck example each of the eight 1-week periods. Although not explicitly part of the known fates model, interest often centers on survival to the end of the study, which by definition is the product of the S(j) over the j = 1, ..., n periods, i.e., S(1)*S(2)*...*S(8) if survival is time-varying, or S^8 if it is not. This quantity is also available as a derived parameter under each model; for example
> bduck.results$S.dot$results$real$estimate^8
[1] 0.5922517
shows how this could be computed from the real parameter value and
> bduck.results$S.dot$results$derived
estimate se lcl ucl
1 0.5922518 0.07313623 0.4451435 0.7244965
provides this value directly along with its SE and confidence interval.
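For a time-varying model the same end-of-study survival can be sketched as the product of the weekly estimates (using the S.time name from the wrapper sketch above; output not shown):
> #product of the 8 weekly estimates under the time model
> prod(bduck.results$S.time$results$real$estimate)
> #or the derived parameter reported directly for that model
> bduck.results$S.time$results$derived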
Nest success data
The above data format and model structure are appropriate for the usual kind of "known fates" data encountered in radiotelemetry studies. Studies of nest success in birds typically involve a somewhat different data structure, related to the way that nest fates are determined. The data for each nest include (1) the day the nest was first found, (2) the last day that chicks were observed present in the nest, (3) the last day the nest was checked, and (4) an indicator of nest fate (0 = hatched, 1 = depredated). A frequency column is also specified; this will usually be 1. In addition, individual (nest-specific) covariates may be recorded.
I have included R script to run 2 example nest success analyses (a sketch of the first is given below). The first is a simple analysis of data on 18 Killdeer nests, which runs a series of models fitting polynomial time-effect terms (so modeling variation in daily survival rate over the season). The second is a large (565 nests) study of Mallard nest success, in which success is modeled as a function of several individual (nest-specific) covariates, and plots are produced showing the relationship between daily survival rate (DSR) and nest age, and between DSR and habitat features. See the RMark help for further documentation on these examples.
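To give a flavor of the killdeer analysis, here is a minimal sketch (not the distributed script itself); it assumes the killdeer example data shipped with RMark, an assumed 40-day nesting season for nocc, and a particular choice of polynomial time models:
> #killdeer nest survival: constant, linear, and quadratic time trends in DSR
> data(killdeer)
> killdeer.pr=process.data(killdeer,model="Nest",nocc=40)  #nocc = assumed season length in days
> killdeer.ddl=make.design.data(killdeer.pr)
> run.killdeer=function()
+ {
+   S.dot=list(formula=~1)                      #constant daily survival
+   S.TimeTrend=list(formula=~Time)             #linear trend over the season
+   S.Quadratic=list(formula=~Time+I(Time^2))   #quadratic trend
+   cml=create.model.list("Nest")
+   mark.wrapper(cml,data=killdeer.pr,ddl=killdeer.ddl)
+ }
> killdeer.results=run.killdeer()
> killdeer.results    #AIC model-selection table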
Next: Review exercises