We will use the distsamp() module of the unmarked package in R to construct alternative models for distance sampling data. The example we will use is a line tranect data set in which rabbits are counted along 4 transects ranging from 1.5 to 4 km in length; 2 transects were located in 'good' and 2 in 'poor' habitats, and perpendicular distances to detected rabbits were recording in meters. The data are located in a comma-delimited text file.
Note: distsamp() is the basic framework developed by Royle et al. 2004. Chandler et al. (2011) extended this to more general situations, for example with counts that follow a negative binomial rather than Poisson distribution. See the unmarked package documentation and help pages for more details.
The data are read in using R and formatted for input into distsamp(), which requires 'binning' the data into distance intervals. A little experimentation (e.g., with the hist() function) shows that a reasonably smooth looking data distribution is provided by binning into 5-m intervals.
> data.dir<-"C:/users/mike/Dropbox/teaching/WILD8390/spring2014/week4"
> setwd(data.dir)
> data<-read.csv("rabbit_dist.csv")
> #specify distance intervals by 5 m out to 140
> breaks<-seq(0,140,5)
> n<-length(breaks)-1
> labels<-as.numeric(1:n)
> #bin distance data by these intervals and compute frequencies for each line
> data$binned.x<-cut(data$distance,breaks=breaks,labels=labels)
>
Other formatting requires the reading of the transect lengths (same within observations in a line) into a database, and the conversion of the habitat variable from alphanumeric ("good"/ "poor") to numeric (1,0). Finally, the data are bundled together into a special dataframe for distsamp()
> class(y) <- "matrix"
> #list of transect lengths by line
> tlength<-with(data,aggregate(length~Line,FUN="mean"))$length*1000
> #open library and format data
> library(unmarked)
> hab<-c(1,1,0,0)
> hab<-data.frame(hab=hab)
> rabbit<-unmarkedFrameDS(y=y,siteCovs=hab, dist.breaks=breaks, tlength=tlength, survey="line",
+ unitsIn="m",mapInfo)
>
Once the data are formatted in the dataframe, we can begin to construct and compare models. I have constructed 4 below; you can make more on your own. The first model involves a half-normal detection function with no habitat effect. Note that the formula ~x ~x indicates modeling of the detection followed by state (density) parameter; in this case ~1 ~1 indicates an intercept (no covariates) model. Also note that in these conventional analyses, distance sampling is conducted on 1 occasion, so time is not an issue but spatial variability is. Thus, if covariates are introduced they will be specific to the samples (lines or objects detected).
> #halfnormal detection, no habitat effect on density
>
> mod1<-distsamp(~1 ~1, data=rabbit, keyfun="halfnorm",output="density",
+ unitsOut="ha", method="BFGS", se=TRUE)
>
We can produce some summary statistics for this model
> summary(mod1)
Call:
distsamp(formula = ~1 ~ 1, data = rabbit, keyfun = "halfnorm",
output = "density", unitsOut = "ha", method = "BFGS", se = TRUE)
Density (log-scale):
Estimate SE z P(>|z|)
0.179 0.117 1.53 0.127
Detection (log-scale):
Estimate SE z P(>|z|)
3.54 0.0678 52.3 0
AIC: 232.4596
and get back-transformed estimates of density and detection, using either the backTransform() or the predict() functions built into unmarked:
Backtransformed linear combination(s) of Density estimate(s)
Estimate SE LinComb (Intercept)
1.2 0.14 0.179 1
Transformation: exp
> backTransform(mod1, type="det")
Backtransformed linear combination(s) of Detection estimate(s)
Estimate SE LinComb sigma(Intercept)
34.6 2.35 3.54 1
Transformation: exp
>
We can also do some simple diagostics, for example plotting a histogram of the data against the estimated detection function
> hist(mod1, xlab="Distance (m)")# Only works when there are no det covars
>
We can continue in similar fashion with other models. For instance a model with the same detection function but adding a habitat covariate for density produces:
> #add a habitat covariate
> mod2<-distsamp(~1 ~hab, data=rabbit, keyfun="halfnorm",output="density",
+ unitsOut="ha", method="BFGS", se=TRUE)
> summary(mod2)
Call:
distsamp(formula = ~1 ~ hab, data = rabbit, keyfun = "halfnorm",
output = "density", unitsOut = "ha", method = "BFGS", se = TRUE)
Density (log-scale):
Estimate SE z P(>|z|)
(Intercept) -0.215 0.158 -1.36 1.74e-01
hab 0.884 0.192 4.61 4.06e-06
Detection (log-scale):
Estimate SE z P(>|z|)
3.54 0.0678 52.3 0
AIC: 213.2255
Number of sites: 4
optim convergence code: 0
optim iterations: 46
Bootstrap iterations: 0
Survey design: line-transect
Detection function: halfnorm
UnitsIn: m
UnitsOut: ha
>
Getting backtransformed estimates takes an additional step because now we are predicting
log(lambda) = beta_0+ beta_1*hab, having estimated beta_0 and beta_1. So we have to generate predictions when the habitat is good , which is coded by c(1,1) == (intercept=1 + habitat=1) or poor by c(1,0) (intercept=1 + habitat =0).
> lc <- linearComb(mod2, array(c(1, 1,0,1),dim=c(2,2)), type="state")
> lc
Linear combination(s) of Density estimate(s)
Estimate SE (Intercept) hab
1 -0.215 0.158 1 0
2 0.669 0.145 1 1
> backTransform(lc) # animals / ha
Backtransformed linear combination(s) of Density estimate(s)
Estimate SE LinComb (Intercept) hab
1 0.806 0.127 -0.215 1 0
2 1.952 0.283 0.669 1 1
Transformation: exp
> backTransform(mod1, type="det")
Backtransformed linear combination(s) of Detection estimate(s)
Estimate SE LinComb sigma(Intercept)
34.6 2.35 3.54 1
Transformation: exp
As before we can produce a plot of the detection function, but this won't change since the detection function hasn't changed.
We'll produce 2 more models, like the ones above but with a hazard rate function as the detection function
> ####################
> # Hazard-rate detection function.
> (mod3 <- distsamp(~ 1 ~ 1, data=rabbit, keyfun="hazard"))
Call:
distsamp(formula = ~1 ~ 1, data = rabbit, keyfun = "hazard")
Density:
Estimate SE z P(>|z|)
0.153 0.123 1.25 0.212
Detection:
Estimate SE z P(>|z|)
3.56 0.103 34.6 7.03e-262
Hazard-rate(scale):
Estimate SE z P(>|z|)
1.23 0.158 7.79 6.52e-15
AIC: 227.6153
>
> summary(mod3)
Call:
distsamp(formula = ~1 ~ 1, data = rabbit, keyfun = "hazard")
Density (log-scale):
Estimate SE z P(>|z|)
0.153 0.123 1.25 0.212
Detection (log-scale):
Estimate SE z P(>|z|)
3.56 0.103 34.6 7.03e-262
Hazard-rate(scale) (log-scale):
Estimate SE z P(>|z|)
1.23 0.158 7.79 6.52e-15
AIC: 227.6153
Number of sites: 4
optim convergence code: 0
optim iterations: 38
Bootstrap iterations: 0
Survey design: line-transect
Detection function: hazard
UnitsIn: m
UnitsOut: ha
> backTransform(mod3, type="state") # animals / ha
Backtransformed linear combination(s) of Density estimate(s)
Estimate SE LinComb (Intercept)
1.17 0.143 0.153 1
Transformation: exp
> backTransform(mod3, type="det")
Backtransformed linear combination(s) of Detection estimate(s)
Estimate SE LinComb shape(Intercept)
35 3.6 3.56 1
Transformation: exp
>
and
> # Hazard-rate detection function with habitat covariate
> mod4 <- distsamp(~ 1 ~ hab, data=rabbit, keyfun="hazard")
>
> summary(mod4)
Call:
distsamp(formula = ~1 ~ hab, data = rabbit, keyfun = "hazard")
Density (log-scale):
Estimate SE z P(>|z|)
(Intercept) -0.240 0.163 -1.48 1.39e-01
hab 0.884 0.192 4.61 4.06e-06
Detection (log-scale):
Estimate SE z P(>|z|)
3.56 0.103 34.6 7.04e-262
Hazard-rate(scale) (log-scale):
Estimate SE z P(>|z|)
1.23 0.158 7.79 6.52e-15
AIC: 208.3813
Number of sites: 4
optim convergence code: 0
optim iterations: 62
Bootstrap iterations: 0
Survey design: line-transect
Detection function: hazard
UnitsIn: m
UnitsOut: ha
>
> lc <- linearComb(mod4, array(c(1, 1,0,1),dim=c(2,2)), type="state")
> lc
Linear combination(s) of Density estimate(s)
Estimate SE (Intercept) hab
1 -0.240 0.163 1 0
2 0.644 0.150 1 1
> backTransform(lc)
Backtransformed linear combination(s) of Density estimate(s)
Estimate SE LinComb (Intercept) hab
1 0.786 0.128 -0.240 1 0
2 1.903 0.285 0.644 1 1
Transformation: exp
>
> hist(mod4, xlab="Distance (m)")# Only works when there are no det covars
>
Note that this seems to fit the data a little better, so perhaps adding a parameter (the hazard rate has an additional parameter) helped. We'll see when we look at AIC.
We can run a built-in model selection procedure and generate some summary comparisons across models. The
> #built in model selection
> fl <- fitList(hn=mod1, hn_hab=mod2, haz=mod3,haz_hab=mod4)
> fl
An object of class "unmarkedFitList"
Slot "fits":
$hn
Call:
distsamp(formula = ~1 ~ 1, data = rabbit, keyfun = "halfnorm",
output = "density", unitsOut = "ha", method = "BFGS", se = TRUE)
Density:
Estimate SE z P(>|z|)
0.179 0.117 1.53 0.127
Detection:
Estimate SE z P(>|z|)
3.54 0.0678 52.3 0
AIC: 232.4596
$hn_hab
Call:
distsamp(formula = ~1 ~ hab, data = rabbit, keyfun = "halfnorm",
output = "density", unitsOut = "ha", method = "BFGS", se = TRUE)
Density:
Estimate SE z P(>|z|)
(Intercept) -0.215 0.158 -1.36 1.74e-01
hab 0.884 0.192 4.61 4.06e-06
Detection:
Estimate SE z P(>|z|)
3.54 0.0678 52.3 0
AIC: 213.2255
$haz
Call:
distsamp(formula = ~1 ~ 1, data = rabbit, keyfun = "hazard")
Density:
Estimate SE z P(>|z|)
0.153 0.123 1.25 0.212
Detection:
Estimate SE z P(>|z|)
3.56 0.103 34.6 7.03e-262
Hazard-rate(scale):
Estimate SE z P(>|z|)
1.23 0.158 7.79 6.52e-15
AIC: 227.6153
$haz_hab
Call:
distsamp(formula = ~1 ~ hab, data = rabbit, keyfun = "hazard")
Density:
Estimate SE z P(>|z|)
(Intercept) -0.240 0.163 -1.48 1.39e-01
hab 0.884 0.192 4.61 4.06e-06
Detection:
Estimate SE z P(>|z|)
3.56 0.103 34.6 7.04e-262
Hazard-rate(scale):
Estimate SE z P(>|z|)
1.23 0.158 7.79 6.52e-15
AIC: 208.3813
>
> ms <- modSel(fl)
> ms
nPars AIC delta AICwt cumltvWt
haz_hab 4 208.38 0.00 9.2e-01 0.92
hn_hab 3 213.23 4.84 8.1e-02 1.00
haz 3 227.62 19.23 6.1e-05 1.00
hn 2 232.46 24.08 5.4e-06 1.00
>
> coef(ms) # Estimates only
lam(hab) lam(Int) p(scale) p(shape(Intercept)) p(sigma(Intercept))
haz_hab 0.8840291 -0.2403650 1.233495 3.556425 NA
hn_hab 0.8840296 -0.2151606 NA NA 3.544771
haz NA 0.1533509 1.233496 3.556425 NA
hn NA 0.1785409 NA NA 3.544772
> SE(ms) # Standard errors only
SElam(hab) SElam(Int) SEp(scale) SEp(shape(Intercept))
haz_hab 0.1918375 0.1626036 0.1582720 0.1028749
hn_hab 0.1918374 0.1581059 NA NA
haz NA 0.1229742 0.1582719 0.1028747
hn NA 0.1169631 NA NA
SEp(sigma(Intercept))
haz_hab NA
hn_hab 0.06780638
haz NA
hn 0.06780646
> (toExport <- as(ms, "data.frame")) # Everything
model formula lam(hab) SElam(hab) lam(Int) SElam(Int) p(scale)
4 haz_hab ~1 ~ hab 0.8840291 0.1918375 -0.2403650 0.1626036 1.233495
2 hn_hab ~1 ~ hab 0.8840296 0.1918374 -0.2151606 0.1581059 NA
3 haz ~1 ~ 1 NA NA 0.1533509 0.1229742 1.233496
1 hn ~1 ~ 1 NA NA 0.1785409 0.1169631 NA
SEp(scale) p(shape(Intercept)) SEp(shape(Intercept)) p(sigma(Intercept))
4 0.1582720 3.556425 0.1028749 NA
2 NA NA NA 3.544771
3 0.1582719 3.556425 0.1028747 NA
1 NA NA NA 3.544772
SEp(sigma(Intercept)) Converge CondNum negLogLike nPars n AIC delta
4 NA 0 20.78234 100.1906 4 4 208.3813 0.000000
2 0.06780638 0 20.53277 103.6127 3 4 213.2255 4.844208
3 NA 0 12.24531 110.8077 3 4 227.6153 19.234077
1 0.06780646 0 5.82114 114.2298 2 4 232.4596 24.078285
AICwt Rsq cumltvWt
4 9.184362e-01 NA 0.9184362
2 8.149718e-02 NA 0.9999334
3 6.115364e-05 NA 0.9999946
1 5.426451e-06 NA 1.0000000
>
>
>
The results suggest that the last model we fit (hazard rate + habitat covariate) did the best job of predicting the data. It also indicates a fairly strong difference in density between good and poor habitats (1.9 vs 0.78 rabbits/ ha).
Finally, after model selection, we can form model-averaged estimates of the real parameters, for example density. We use the built in predict() function in unmarked. By default the predictions are mode based on the site-level (i.e., per transect) covariate values but potentially finer scales could be involved:
> #built in model selection
> fL <- fitList(hn=mod1, hn_hab=mod2, haz=mod3,haz_hab=mod4)
> ms<-modSel(fL)
> predict(fL,type="state")
Predicted SE lower upper
1 1.9073531 0.2851971 1.4233554 2.555948
2 1.9073531 0.2851971 1.4233554 2.555948
3 0.7880019 0.1279684 0.5733806 1.082965
4 0.7880019 0.1279684 0.5733806 1.082965
Often it will be of more interest to form predictions for specific covariate values or combinations (if more than one covariate is involved). In this example there is a single habitat covariate with 2 levels (0 = poor and 1 = good). We form the predictions by creating a new data frame with these levels and using it in the predict function
> new<-data.frame(hab=c(0,1))
> #form predictions
> p.data<-predict(fL,type="state",new=new,appendData=T)
> p.data
Predicted SE lower upper hab
1 0.7880019 0.1279684 0.5733806 1.082965 0
2 1.9073531 0.2851971 1.4233554 2.555948 1
This provides us with predictions, standard errors, and confidence intervals for density in good vs poor habitats
All the above commands are contained in an R script file.