Post date: Nov 20, 2013 11:2:12 PM
I can also use the BSLMM in gemma to identify individual loci or genetic regions associated with performance. This could be interesting in terms of genome annotation (later), but is central to asking whether the genetic basis of variation in performance in the lab predicts or is consistent with patterns of allele frequency and host-plant use variation in nature. I wrote a perl script to obtain mean posterior effect estimates for individual SNVs across chains. The perl script is combineSparseEffects.pl. With this script I get two files for each trait (survival and adult weight) and individual experiments vs. combination (so far I only have the former). One file contains the estimate of gamma (the posterior inclusion probability) and the other contains the effect = beta * gamma + alpha (beta is the effect given gamma = 1 and alpha is the polygenic effect, this makes more sense if you think of the overall effect as a mixture of effects from one normal and one normal x zero point-mass mixture).
These plots give the pip or effect by locus for survival (pip, effect) and weight (pip, effect). With regard to survival, all pip are quite low for GLA, but there are clearly non-zero pip for SLA on Ms and Ac (the latter are higher, but also perhaps less interesting as noted previously). Effects are correlated with pip (r > 0.5, sometimes almost 0.95). On the other hand, there are more loci with non-zero pip and effects in GLA (particularly GLA on Ms) for adult weight. Again, I don't know that there is a big take home message from this.
I also calculated posterior inclusion probabilities for entire scaffolds. These are very strongly correlated with scaffold length (except for SLA on Ac) and the scaffold pip's are driven almost entirely by the whether or not the scaffold includes one of the few loci with high pip. Thus, the scaffold-based analyses do not capture anything beyond what we see in the individual locus analyses. All figures are in the directory projects/lycaeides_hostplant/melGemma/figs. The R code for these analyses (snpCommands.R) is below,
## snp and region analyses
## read posterior estimates
survgamma<-read.table("effectsSurvGamma.txt",header=TRUE)
survbeta<-read.table("effectsSurvBeta.txt",header=TRUE)
wgtgamma<-read.table("effectsWgtGamma.txt",header=TRUE)
wgtbeta<-read.table("effectsWgtBeta.txt",header=TRUE)
nam<-c("GLA-Ac","GLA-Ms","SLA-Ac","SLA-Ms")
nl<-dim(survgamma)[1]
nlw<-dim(wgtgamma)[1]
ne<-4 # nubmer experiments or sets
ns<-length(unique(survgamma$scaf))
nsw<-length(unique(wgtgamma$scaf))
## plots of gamma and effect for individual snp loci
png("gpSurvPip.png",width=12,height=12,units="in",res=320)
par(mfrow=c(4,1))
par(mar=c(4,5.5,4.5,0.5))
for(j in 1:4){
plot(1:nl,survgamma[,j+2],pch=20,xlab="locus",ylab="pip",cex.lab=1.6,cex.axis=1.1)
title(main=nam[j],cex.main=1.3)
}
dev.off()
png("gpWgtPip.png",width=12,height=12,units="in",res=320)
par(mfrow=c(4,1))
par(mar=c(4,5.5,4.5,0.5))
for(j in 1:4){
plot(1:nlw,wgtgamma[,j+2],pch=20,xlab="locus",ylab="pip",cex.lab=1.6,cex.axis=1.1)
title(main=nam[j],cex.main=1.3)
}
dev.off()
png("gpSurvEf.png",width=12,height=12,units="in",res=320)
par(mfrow=c(4,1))
par(mar=c(4,5.5,4.5,0.5))
for(j in 1:4){
plot(1:nl,survbeta[,j+2],pch=20,xlab="locus",ylab="effect",cex.lab=1.6,cex.axis=1.1)
title(main=nam[j],cex.main=1.3)
}
dev.off()
png("gpWgtEf.png",width=12,height=12,units="in",res=320)
par(mfrow=c(4,1))
par(mar=c(4,5.5,4.5,0.5))
for(j in 1:4){
plot(1:nlw,wgtbeta[,j+2],pch=20,xlab="locus",ylab="effect",cex.lab=1.6,cex.axis=1.1)
title(main=nam[j],cex.main=1.3)
}
dev.off()
## probability of one or more associated variants on scaffold
## number of snps per scaffold
xl<-tapply(X=survgamma$pos > -1,survgamma$scaf,sum)
pr<-matrix(NA,nrow=ns,ncol=ne)
for(j in 1:ne){
pr[,j]<-1-tapply(X=1-survgamma[,j+2],INDEX=survgamma$scaf,prod,na.rm=TRUE)
}
png("gpScafPipSurv.png",width=11,height=11,units="in",res=280)
par(mfrow=c(2,2))
par(mar=rep(4.5,4))
par(pty='s')
for(j in 1:4){
plot(xl,pr[,j],pch=20,xlab="number of loci",ylab="scaffold pip",cex.lab=1.5)
title(main=nam[j],cex.main=1.3)
}
dev.off()
pr<-matrix(NA,nrow=nsw,ncol=ne)
for(j in 1:ne){
pr[,j]<-1-tapply(X=1-wgtgamma[,j+2],INDEX=wgtgamma$scaf,prod,na.rm=TRUE)
}
xl<-tapply(X=wgtgamma$pos > -1,wgtgamma$scaf,sum)
png("gpScafPipWgt.png",width=11,height=11,units="in",res=280)
par(mfrow=c(2,2))
par(mar=rep(4.5,4))
par(pty='s')
for(j in 1:4){
plot(xl,pr[,j],pch=20,xlab="number of loci",ylab="scaffold pip",cex.lab=1.5)
title(main=nam[j],cex.main=1.3)
}
dev.off()