Post date: Apr 21, 2017 10:41:0 PM
Here is how I formated the stripe data and the command for running the null model (GARMA(0,0)), this is from stripeAnalysis.R:
library(rjags)
dat<-read.csv("Tcris_master_26.csv")
hv<-which(dat$location=="HV")
y<-tapply(X=dat$striped[hv],INDEX=dat$year[hv],sum)
n<-tapply(X=dat$striped[hv]+dat$unstriped[hv],INDEX=dat$year[hv],sum)
## add 2017 data
y<-c(y,90+280)
n<-c(n,164+90+106+280)
dics<-vector("list",7)
## fit GARMA(0,0) model
dat<-list(y=y[-c(1:2)],n=n[-c(1:2)],N=18)
mod<-jags.model(file="garma00model",data=dat,n.chains=3)
update(mod,n.iter=10000)
sam<-coda.samples(mod,n.iter=100000,thin=50,variable.names=c("c","m"))
dsam<-dic.samples(mod,n.iter=100000,thin=50,type="pD")
dics[[1]]<-dsam
Results from model comparison for stripe. I used all 17 years of data. Based on 3 chains each with 10,000 burnin, 100,000 sampling iterations and a thinning interval of 50. An additional 100,000/50 samples were generated for DIC after sampling for parameter estimation. The best model was GARMA(1,2).
For GARMA(1,2), a pseudo r2 given by 1 - (mean deviance best model)/(mean deviance null model) was 0. 67.
For leave-one-out cross-validation, the r^2 for data ~ predicted was 0.47. See,
yr<-2000:2017
pdf("stripeTimeSeriesGARMA12.pdf",width=6,height=5)
par(mar=c(5,5,0.5,0.5))
plot(yr,dat$y/dat$n,type='l',ylim=c(0.3,1),cex.lab="1.5",xlab="year",ylab="stripe frequency")
points(yr,o[2:19,3],col="blue")
segments(yr,o[2:19,1],yr,o[2:19,5],col="blue")
dev.off()
## r2 from null model comparison
L1<-96.41
L0<-295.9
r2<-1-(L1/L0)
## 0.6741805
## cross validation
cve<-matrix(NA,nrow=18,ncol=5)
for(x in 3:18){
datx<-dat
datx$y[x]<-NA
mod<-jags.model(file="garma12model",data=datx,n.chains=3)
update(mod,n.iter=10000)
samx<-coda.samples(mod,n.iter=100000,thin=50,variable.names=c("theta","c","m","phi"))
ox<-summary(samx)[[2]]
cve[x,]<-ox[x+1,]
}
pdf("stripeTimeSeriesPredict.pdf",width=6,height=5)
par(mar=c(5,5,0.5,0.5))
plot(yr,dat$y/dat$n,type='l',ylim=c(0.3,1),cex.lab="1.5",xlab="year",ylab="stripe frequency")
points(yr,cve[,3],col="blue")
segments(yr,cve[,1],yr,cve[,5],col="blue")
dev.off()
oo<-lm(dat$y[-c(1:2)]/dat$n[-c(1:2)]~cve[-c(1:2),3])
pdf("stripeObsVpredict.pdf",width=5,height=5)
plot(cve[,3],dat$y/dat$n,pch=19,xlab="predicted frequency",ylab="observed frequency",cex.lab=1.4)
abline(oo$coefficients,col="darkgray",lwd=1.5)
abline(a=0,b=1,lty=2,lwd=1.5)
dev.off()
summary(oo)
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.02844 0.19834 0.143 0.88804
#cve[-c(1:2), 3] 0.88968 0.25264 3.522 0.00339 **
#Residual standard error: 0.09637 on 14 degrees of freedom
#Multiple R-squared: 0.4697, Adjusted R-squared: 0.4318
#F-statistic: 12.4 on 1 and 14 DF, p-value: 0.003387
mean(cve[,1] < dat$y/dat$n & cve[,5] > dat$y/dat$n,na.rm=T)
## 0.75
I repeated the same process with melanistic. GARMA(2,1) was the best model (see melanicAnalysis.R and the related workspace garmaMel.rdat for details. But, the key summaries are that the GARMA(2,1) pseudo-r^2 was 0.42 and the leave-one-out cross validation r^2 was 0.15, but with a greater intercept offset,
summary(oo)
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.12454 0.07115 1.750 0.102
#cve[-c(1:2), 3] 0.47752 0.29962 1.594 0.133
#Residual standard error: 0.0885 on 14 degrees of freedom
#Multiple R-squared: 0.1536, Adjusted R-squared: 0.09311
#F-statistic: 2.54 on 1 and 14 DF, p-value: 0.1333
mean(cve[,1] < dat$y/dat$n & cve[,5] > dat$y/dat$n,na.rm=T)
## 0.5
I tried the last X years predictions for stripe (with various values for X) and had a hard time getting convergence. I am exploring rstan as an alternative.