comparison of predictability for stripe and color at HV

Post date: Apr 21, 2017 10:41:0 PM

Here is how I formated the stripe data and the command for running the null model (GARMA(0,0)), this is from stripeAnalysis.R:

library(rjags)

dat<-read.csv("Tcris_master_26.csv")

hv<-which(dat$location=="HV")

y<-tapply(X=dat$striped[hv],INDEX=dat$year[hv],sum)

n<-tapply(X=dat$striped[hv]+dat$unstriped[hv],INDEX=dat$year[hv],sum)

## add 2017 data

y<-c(y,90+280)

n<-c(n,164+90+106+280)

dics<-vector("list",7)

## fit GARMA(0,0) model

dat<-list(y=y[-c(1:2)],n=n[-c(1:2)],N=18)

mod<-jags.model(file="garma00model",data=dat,n.chains=3)

update(mod,n.iter=10000)

sam<-coda.samples(mod,n.iter=100000,thin=50,variable.names=c("c","m"))

dsam<-dic.samples(mod,n.iter=100000,thin=50,type="pD")

dics[[1]]<-dsam

Results from model comparison for stripe. I used all 17 years of data. Based on 3 chains each with 10,000 burnin, 100,000 sampling iterations and a thinning interval of 50. An additional 100,000/50 samples were generated for DIC after sampling for parameter estimation. The best model was GARMA(1,2).

For GARMA(1,2), a pseudo r2 given by 1 - (mean deviance best model)/(mean deviance null model) was 0. 67.

For leave-one-out cross-validation, the r^2 for data ~ predicted was 0.47. See,

yr<-2000:2017

pdf("stripeTimeSeriesGARMA12.pdf",width=6,height=5)

par(mar=c(5,5,0.5,0.5))

plot(yr,dat$y/dat$n,type='l',ylim=c(0.3,1),cex.lab="1.5",xlab="year",ylab="stripe frequency")

points(yr,o[2:19,3],col="blue")

segments(yr,o[2:19,1],yr,o[2:19,5],col="blue")

dev.off()

## r2 from null model comparison

L1<-96.41

L0<-295.9

r2<-1-(L1/L0)

## 0.6741805

## cross validation

cve<-matrix(NA,nrow=18,ncol=5)

for(x in 3:18){

datx<-dat

datx$y[x]<-NA

mod<-jags.model(file="garma12model",data=datx,n.chains=3)

update(mod,n.iter=10000)

samx<-coda.samples(mod,n.iter=100000,thin=50,variable.names=c("theta","c","m","phi"))

ox<-summary(samx)[[2]]

cve[x,]<-ox[x+1,]

}

pdf("stripeTimeSeriesPredict.pdf",width=6,height=5)

par(mar=c(5,5,0.5,0.5))

plot(yr,dat$y/dat$n,type='l',ylim=c(0.3,1),cex.lab="1.5",xlab="year",ylab="stripe frequency")

points(yr,cve[,3],col="blue")

segments(yr,cve[,1],yr,cve[,5],col="blue")

dev.off()

oo<-lm(dat$y[-c(1:2)]/dat$n[-c(1:2)]~cve[-c(1:2),3])

pdf("stripeObsVpredict.pdf",width=5,height=5)

plot(cve[,3],dat$y/dat$n,pch=19,xlab="predicted frequency",ylab="observed frequency",cex.lab=1.4)

abline(oo$coefficients,col="darkgray",lwd=1.5)

abline(a=0,b=1,lty=2,lwd=1.5)

dev.off()

summary(oo)

#Coefficients:

# Estimate Std. Error t value Pr(>|t|)

#(Intercept) 0.02844 0.19834 0.143 0.88804

#cve[-c(1:2), 3] 0.88968 0.25264 3.522 0.00339 **

#Residual standard error: 0.09637 on 14 degrees of freedom

#Multiple R-squared: 0.4697, Adjusted R-squared: 0.4318

#F-statistic: 12.4 on 1 and 14 DF, p-value: 0.003387

mean(cve[,1] < dat$y/dat$n & cve[,5] > dat$y/dat$n,na.rm=T)

## 0.75

I repeated the same process with melanistic. GARMA(2,1) was the best model (see melanicAnalysis.R and the related workspace garmaMel.rdat for details. But, the key summaries are that the GARMA(2,1) pseudo-r^2 was 0.42 and the leave-one-out cross validation r^2 was 0.15, but with a greater intercept offset,

summary(oo)

#Coefficients:

# Estimate Std. Error t value Pr(>|t|)

#(Intercept) 0.12454 0.07115 1.750 0.102

#cve[-c(1:2), 3] 0.47752 0.29962 1.594 0.133

#Residual standard error: 0.0885 on 14 degrees of freedom

#Multiple R-squared: 0.1536, Adjusted R-squared: 0.09311

#F-statistic: 2.54 on 1 and 14 DF, p-value: 0.1333

mean(cve[,1] < dat$y/dat$n & cve[,5] > dat$y/dat$n,na.rm=T)

## 0.5

I tried the last X years predictions for stripe (with various values for X) and had a hard time getting convergence. I am exploring rstan as an alternative.

Page updated

Google Sites

Report abuse