Sightability Models
Sightability experiments and models now play an important part in abundance estimation. They have mostly been applied to aerial surveys of large mammals (especially ungulates) but conceivably could be used in other situations.
The basic idea for sightability models is fairly straightforward. First, it is of course assumed that detection ("sightability") is incomplete (p<1), but is at least potentially predictable by factors that can be measured in the field. Examples of factors that might affect sightablity include:
Vegetation cover
Group size (for animals that occur in aggregations)
Aircraft speed and altitude
Observer experience
Note that some of the above (aircraft speed, observer experiene) may be under design control but other (vegetation cover, group size) are not. This can have implications for survey design, in that we may wish to control factors such as aircraft speed and elevation in order to optimize detection and reduce variability among surveys. On the other hand, factors outside our control-- like group size-- need to be taken into account in interpreting survey results. In either case, we need some empirical basis for conclusions about the role of these factors-- we need data and a model.
The package SightabilityModel in R provides a convenient framework for taking sightability into account and deriving unbiased estimates. The basic steps of sightability modeling involve:
Collection of experimental data to estimate the relationship between factors we can measure and sightability. Often this will involve the attachment of radio collars to animals, which are then incorporated into aerial surveys where the crew first surveys to detect all animals (receiver off) and then activates the receiver to determine which marked animals have been detected (w=1) or missed (w=0). The detection data are then fit with binomial regression models to determine the relationship of detection to the covariate factors( x) associated with the detections/ non-detections. This established a prediction for detection as a function of x, p(x).
During operational surveys, animals are detected (by definition only the "1"s are detected!) and along with the detections covariates ( x) are recorded. The counts associated with the covariate values, n( x i) are then "adjusted" by p(xi) on each unit i (plot, transect, etc) counted
n( xi)/p(xi)
The "adjusted" counts are then summed over the survey units. When a sample of available units is surveyed, this sum must also take into account a 'sample inflation' factor.
The above is a simplification of the Horvitz-Thompson, see the Fieberg manual and references for more details, including particulars on how variances and confidence intervals are computed.
A couple of examples (both provided with the package) will illustrate the salient points.
Simultaneous analysis of experimental calibration data and observational count data
The example is of moose (Alces alces) surveyed in northern Minnesota. Over 3 years 124 radiocollared moose were surveyed, detections and non detections recorded, along with 2 covariates: vegetative screening cover (vcov), and group size (grpsize). The data are loaded via the commands
> data(exp.m)
> data(obs.m)
> data(sampinfo.m)
> #first 5 observations of experimental data
> exp.m[1:5,]
year observed voc grpsize
37 2005 1 20 4
38 2005 1 85 2
39 2005 0 80 1
40 2005 0 75 1
41 2005 0 70 1
The operational data are 4 years where moose were counted on survey units (totals and by age/ sex); simultaneously observers recorded vegetation screening and groupsize.
> #first 5 observations of observational data
> obs.m[1:5,]
year stratum subunit total cows calves bulls unclass voc grpsize
1 2004 1 140 2 1 1 0 0 90 2
2 2004 1 140 1 0 0 1 0 90 1
3 2004 1 140 2 0 0 2 0 75 2
4 2004 1 140 3 1 0 2 0 0 3
5 2004 1 180 2 2 0 0 0 85 2
>
Finally, information about the spatial sampling design must be included, specifically the number of sampling units available (Nh) and the number surveyed (nh) for each stratum (1,2,3) in each year:
> #sampling info data
> sampinfo.m
year stratum Nh nh
1 2004 1 178 10
2 2004 2 127 14
3 2004 3 31 6
4 2005 1 238 12
5 2005 2 180 18
6 2005 3 35 6
7 2006 1 238 18
8 2006 2 180 14
9 2006 3 35 5
10 2007 1 248 16
11 2007 2 170 18
12 2007 3 35 6
These data are put together into an analysis for each year, for example 2004:
> est.2004<-Sight.Est(observed~voc,odat=subset(obs.m,year==2004),sdat=exp.m,sampinfo=subset(sampinfo.m,year==2004))
> summary(est.2004)
[1] tau.hat = 13,096 ; 95 % CI = ( 8,549 , 21,430 )
More detailed results are provied by
> est.2004
Call:
Sight.Est(form = observed ~ voc, sdat = exp.m, odat = subset(obs.m,
year == 2004), sampinfo = subset(sampinfo.m, year == 2004))
------------------- SIGHTABILITY MODEL ---------------------
Call: glm(formula = form, family = binomial(), data = sdat)
Coefficients:
(Intercept) voc
1.75993 -0.03479
Degrees of Freedom: 123 Total (i.e. Null); 122 Residual
Null Deviance: 171.6
Residual Deviance: 147.4 AIC: 151.4
----------------- Population Survey data ----------------
Stratum Sampling Information
stratum nh Nh
1 1 10 178
2 2 14 127
3 3 6 31
Number of animals seen in each stratum
1 2 3
68 248 202
-------------- POPULATION ESTIMATE ( 95 % CI) ----------------
[1] tau.hat = 13,096 ; 95 % CI = ( 8,549 , 21,430 )
------------------ SE(tau.hat) --------------------------------
Variance method: [1] "WONG"
SE
3,117
-------------- Variance Components -------------------
VarSamp VarSight VarMod
3,033,952 873,246 5,810,441
>
We can glean some useful information here about survey design.
> est.2004$est
tau.hat VarTot VarSamp VarSight VarMod
13096.31 9717637.97 3033951.73 873245.56 5810440.68
> est.2004$est[3:5]/est.2004$est[2]
VarSamp VarSight VarMod
0.31221082 0.08986192 0.59792726
The above shows that about 60% of the estimated variance for 2004 was due to the statistical model used to estimate the sightability predctions, and most of the rest (31%) to sampling variation (because nh<Nh). Both of these are under the investigator's control, and so could be reduced through improved calibration data (more marked animals, better covariates), more complete spatial sampling, or both. We fundamentally can't do much about sightability itself (VarSight): it is what it is, but we can do a better job of estimating it.
Use of estimates derived from a prior experimental calibration with observational count data
Here we have a fairly common situation in which the experimental study is done once (or maybe even borrowed from a different study) and then applied to later (or earlier) operational surveys. The example is from mountain goat surveyed in Olympic National park collected in 2004; then in 2009 a separate study of sightability was done and the estimates used to adjust the 2004 counts. The experimental and operational data are loaded from the package as before, while the sampling information is created in a dataframe:
>
> #applying previous calibration estimates to an operational survey
> #contains previous model fit
> data(g.fit)
> data(gdat)
> sampinfo<-data.frame(nh=c(6,23,11),Nh=c(6,27,65),stratum=c(1,2,3))
Because no goats were surveyed n the last stratum, only the first 2 rows of the sampinfo dataframe are used:
> goat.est<-Sight.Est(observed~GroupSize+Terrain+pct.VegCover,odat=gdat,sampinfo=sampinfo[1:2,],bet=g.fit$beta.g,varbet=g.fit$varbeta.g)
> goat.est
Call:
Sight.Est(form = observed ~ GroupSize + Terrain + pct.VegCover,
odat = gdat, sampinfo = sampinfo[1:2, ], bet = g.fit$beta.g,
varbet = g.fit$varbeta.g)
------------------- SIGHTABILITY MODEL ---------------------
$bet
[1] 1.385 0.158 -1.138 -0.012
$varbet
[,1] [,2] [,3] [,4]
[1,] 0.160801 -0.013000 -0.090000 -1.0e-03
[2,] -0.013000 0.003721 -0.001000 0.0e+00
[3,] -0.090000 -0.001000 0.161604 0.0e+00
[4,] -0.001000 0.000000 0.000000 8.1e-05
$note
[1] "User supplied regression model"
----------------- Population Survey data ----------------
Stratum Sampling Information
stratum nh Nh
1 1 6 6
2 2 23 27
Number of animals seen in each stratum
1 2
124 56
-------------- POPULATION ESTIMATE ( 95 % CI) ----------------
[1] tau.hat = 230 ; 95 % CI = ( 205 , 293 )
------------------ SE(tau.hat) --------------------------------
Variance method: [1] "WONG"
SE
20
-------------- Variance Components -------------------
VarSamp VarSight VarMod
137 168 82
> summary(goat.est)
[1] tau.hat = 230 ; 95 % CI = ( 205 , 293 )
>
Note several things:
The above examples were strictly logit-linear models with pre-selected variables. In general you will want to explore:
Multiple alternative models
Quadratic and nonlinear model
Options for alternative sightability models are contained in the package (see Feinberg); alternatively it can often be easier to fit models outside the package (e.g., in glm) and then import results included model-averaged estimates and variance-covariance matrices.
The attached code provides basic steps for computing and comparing AIC values and producing a table of AIC model weights.
Addtional special cases handled in the package include:
Correlated within-stratum detection estimates
Non-independent population estimates (trends, etc)
Built-in multi-model inference
Bootstrap computation of variances
See the Fieberg document for these details.
R code for above examples
Application of sightability models to aerial surveys of desert bighorn sheep.
Several alternative models involving various functional forms for group size prediction and pooled vs. unpooled data were compared using AIC and one selected for prediction into operational areas and years.
Conroy et al. 2014 (below for code and erratum correction)
Code and data for analyses to run alternative models and compute estimates