Post date: Sep 26, 2019 7:43:39 PM
These results now include the previously dropped individuals.
1. Extract summaries from the posterior (all run within the output subdirectory).
Hyperparameters:
perl calpost.pl *0.hyp.txt
Generates genarch_tchum_mel.txt, with the six traits (AC sex, AC survival, MM sex, MM survival, RG, GB) as rows, and point estimates and 95% CIs for PVE, PGE and n-gamma as columns.
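For downstream use in R, this summary can be read back in; a minimal sketch, assuming nine whitespace-delimited columns (point estimate, lower and upper 95% CI bound for each of PVE, PGE and n-gamma) in the trait order above -- the column and row names here are mine, not set by calpost.pl:
ga<-read.table("output/genarch_tchum_mel.txt",header=FALSE) ## assumed location/format
colnames(ga)<-c("pve","pve_lb","pve_ub","pge","pge_lb","pge_ub","ng","ng_lb","ng_ub")
rownames(ga)<-c("AC_sex","AC_surv","MM_sex","MM_surv","RG","GB")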
PIPs:
perl grabPips.pl o_mod_g_tchum_AC_ph1_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_AC_ph2_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_MM_ph1_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_MM_ph2_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_ph1_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_ph2_ch0.param.txt
Generates the PIP files, pip_o_mod_g_tchum_*, with one row per SNP.
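For reference, an R equivalent of what grabPips.pl presumably does -- a sketch assuming GEMMA's BSLMM *.param.txt layout (columns chr, rs, ps, n_miss, alpha, beta, gamma), where gamma is the per-SNP posterior inclusion probability; the output file name below just mirrors the pip_ prefix and is a guess:
prm<-read.table("output/o_mod_g_tchum_ph1_ch0.param.txt",header=TRUE)
write.table(prm$gamma,file="output/pip_o_mod_g_tchum_ph1_ch0.param.txt",row.names=FALSE,col.names=FALSE)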
MAVs:
perl grabMavEffects.pl o_mod_g_tchum_AC_ph1_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_AC_ph2_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_MM_ph1_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_MM_ph2_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_ph1_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_ph2_ch0.param.txt
Generates the model-averaged effect files, mav_o_mod_g_tchum_*, with one row per SNP.
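Similarly, a sketch of the per-SNP model-averaged effect under the same assumed *.param.txt layout; here it is taken as beta * gamma (the sparse effect weighted by its inclusion probability) -- whether grabMavEffects.pl also adds the polygenic alpha term is worth checking against the script:
prm<-read.table("output/o_mod_g_tchum_ph1_ch0.param.txt",header=TRUE)
mav<-prm$beta * prm$gamma ## model-averaged sparse effect (assumption: alpha not added)
write.table(mav,file="output/mav_o_mod_g_tchum_ph1_ch0.param.txt",row.names=FALSE,col.names=FALSE)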
2. Plots and analyses in R (models.R):
I have re-run the gemma and Bayesian model averaging/model selection analyses with the previously dropped individuals added back in. I have also looked at/calculated a few additional summaries (to think more about epistasis) and tweaked a few other things. Everything is more or less as it was, but here is the new summary. My plans for next steps come at the end.
a. Focusing on the indel SNPs, I grabbed all of the SNPs whose gemma PIPs for mapping color were 0.5 or greater for RG or GB (or both); that is, SNPs more likely than not to be associated with color. There were 6 such SNPs.
Here are their positions (on scaffold 128): 5359458, 5359483, 5359496, 5690741, 5708354, 5746964.
SNPs 2, 3, 5 and 6 from this list were associated with RG, and SNPs 1, 3, 4, and 6 were associated with GB (i.e., some shared, some not).
b. Then, I used Bayesian model averaging/variable selection regression to fit a model (while doing variable selection) with additive and interaction effects for these six SNPs (21 variables total) for survival on A/C and on MM. The figure (ColorSurvPips.pdf) shows PIPs for these 21 variables for A/C and MM (note that the first 6 terms are the additive ones and the rest are the interactions). Visually, my take is that there is more of a signal on A/C than on MM. This includes interactions in both cases, and the bulk of the signal on A/C comes from interactions. Quantitatively, here are the estimates based on these PIPs (each given as posterior mean with SD = s.e. in parentheses):
Total number of SNPs or SNP interactions in the model
A/C = 4.03 (1.63); MM = 2.72 (1.48)
Number of additive effects
A/C = 1.10 (0.83); MM = 0.56 (0.72)
Number of interaction effects
A/C = 2.93 (1.41); MM = 2.16 (1.30)
Note that the large s.e. values make claims of credible/significant differences here tricky, but the trends are pretty clear/compelling (just lots of uncertainty).
c. I am now also including model-averaged effect estimates from the Bayesian model averaging (ColorSurvModAvEffs.pdf). There are some details about the genotype-level interactions that are worth discussing, but I think you can mostly conclude sign epistasis when an epistatic effect involving a SNP is larger in magnitude than its additive effect (but let's chat about the details). Under this interpretation, we have some evidence of sign epistasis, though it is a bit complicated by the fact that we have so few additive effects (i.e., in many cases it is only the interaction that matters).
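To make that criterion concrete, here is a small helper (my own sketch, not part of models.R) that flags interaction terms whose model-averaged effect is larger in magnitude than the additive effect of at least one of the SNPs involved; it assumes a length-21 effect vector ordered as in the BMS fits below (6 additive terms followed by the 15 pairwise interactions in model.matrix order):
flagSignEpi<-function(eff){ ## eff = 21 model-averaged effects: 6 additive, then 15 pairwise interactions
prs<-t(combn(6,2)) ## pair order matches the ~ .^2 design matrix
data.frame(pair=paste0("V",prs[,1],".V",prs[,2]),
 interactionDominates=abs(eff[7:21]) > pmin(abs(eff[prs[,1]]),abs(eff[prs[,2]])))
}
## e.g., flagSignEpi(apply(bmavAC,1,mean)) once bmavAC has been computed in the code below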
Also, just for context/thinking about interactions and real effects in the population, here are the non-reference allele frequencies for the six SNPs: 0.126, 0.073, 0.601, 0.662, 0.089, 0.022. Note that the 6th one, which stands out in some ways, is rather rare.
d. Finally, I fit generalized linear models with (i) all predictors with PIPs > 0.5 or (ii) all predictors with PIPs > 0.1 (this differs from last time, when I used all of them). The former gives just 2 or 1 predictors for A/C and MM, respectively, whereas the latter gives 11 or 9. The pseudo-R2 values (these are GLMs; Efron's pseudo-R2, see the small helper sketched after this list) are as follows (and in all cases some terms are significant):
AC, PIP > .5, 0.08
AC, PIP > .1, 0.16
MM, PIP > .5, 0.03
MM, PIP > .1, 0.12
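For reference, a small helper (my own naming, equivalent to the inline pseudo-R2 calculation in the code at the end):
efronR2<-function(y,fitted){ ## Efron's pseudo-R2: 1 - SS(residual)/SS(total)
1-(sum((y-fitted)^2)/sum((y-mean(y))^2))
}
## e.g., efronR2(dfAC[,1],o$fitted.values)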
Next, I plan to play more with a few other ways of summarizing explanatory power and then to turn to landscapes proper.
Here is the code:
## read in genetic and survival data
g_AC<-as.matrix(read.table("g_tchum_AC.txt",header=FALSE))
g_MM<-as.matrix(read.table("g_tchum_MM.txt",header=FALSE))
snps<-as.matrix(read.table("tchumSnpTable.txt",header=FALSE))
del<-which(snps[,2]==128 & snps[,3] >= 5000000 & snps[,3] <= 6000000) ## SNPs in the scaffold 128 indel region (5-6 Mb)
g_AC_del<-t(g_AC[del,])
g_MM_del<-t(g_MM[del,])
## center the genotypes at each SNP
c_AC_del<-g_AC_del
c_MM_del<-g_MM_del
for(i in 1:length(del)){
c_AC_del[,i]<-c_AC_del[,i]-mean(c_AC_del[,i])
c_MM_del[,i]<-c_MM_del[,i]-mean(c_MM_del[,i])
}
c_AC_del2<-c_AC_del^2
c_MM_del2<-c_MM_del^2
p_AC<-read.table("pheno_AC.txt",header=FALSE)
p_MM<-read.table("pheno_MM.txt",header=FALSE)
## top pip SNPs
pf<-list.files("output/",pattern="pip_")
pips<-matrix(NA,nrow=11855,ncol=6) ## one PIP per SNP for each of the six gemma runs
for(i in 1:6){
pips[,i]<-scan(paste("output/",pf[i],sep=""))
}
pipSnpTab<-read.table("output/pipSnpTable.txt",header=FALSE)
pipDel<-which(pipSnpTab[,2]==128 & pipSnpTab[,3] >= 5000000 & pipSnpTab[,3] <= 6000000)
pipDf<-data.frame(pipSnpTab[pipDel,],pips[pipDel,5:6]) ## append PIPs for the two color traits (RG, GB)
sum(pipDf[,5] > 0.5 | pipDf[,4] > 0.5)
#[1] 6
cor.test(pipDf[,4],pipDf[,5])
#        Pearson's product-moment correlation
#
#data:  pipDf[, 4] and pipDf[, 5]
#t = 2.0706, df = 16, p-value = 0.05493
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.009121002  0.762851117
#sample estimates:
#      cor
#0.4597067
## get relevant SNP ids
topIds<-pipDf[which(pipDf[,5] > 0.5 | pipDf[,4] > 0.5),1:3]
topSnps<-rep(NA,6)
#
for(i in 1:6){
topSnps[i]<-which(snps[,1]==topIds[i,1] & snps[,2]==topIds[i,2] & snps[,3]==topIds[i,3])
}
## RG = 2, 3, 5, 6
## GB = 1, 3, 4, 6
g_AC_pip<-t(g_AC[topSnps,])
g_MM_pip<-t(g_MM[topSnps,])
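## non-reference allele frequencies at the six top SNPs (the values quoted in the
## summary above); a sketch assuming genotypes are 0-2 non-reference allele dosages,
## pooled here across the A/C and MM samples (which may not exactly match how the
## quoted values were computed)
round(colMeans(rbind(g_AC_pip,g_MM_pip))/2,3)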
c_AC_pip<-g_AC_pip
c_MM_pip<-g_MM_pip
for(i in 1:length(topSnps)){
c_AC_pip[,i]<-c_AC_pip[,i]-mean(c_AC_pip[,i])
c_MM_pip[,i]<-c_MM_pip[,i]-mean(c_MM_pip[,i])
}
library(BMS)
V<-21 ## variables: 6 additive terms + 15 pairwise interactions
## design matrix with additive and all pairwise interaction effects
form<-p_AC[,2] ~ .^2
mod<-model.matrix(form,data=as.data.frame(c_AC_pip))
mod<-mod[,-1] ## drop the intercept column
dfAC<-data.frame(p_AC[,2],mod)
bmsAC<-vector("list",5) ## 5 independent BMS chains
for(i in 1:5){
bmsAC[[i]]<-bms(dfAC,burn=10000,iter=200000,mprior="uniform",g="UIP",user.int=FALSE)
}
bpipsAC<-matrix(NA,nrow=V,ncol=5)
bmavAC<-matrix(NA,nrow=V,ncol=5)
for(i in 1:5){
oo<-coef(bmsAC[[i]],order.by.pip=FALSE)
bpipsAC[,i]<-oo[,1]
bmavAC[,i]<-oo[,2]
}
paIds<-row.names(oo) ## coefficient names (used as plot labels)
## same design matrix construction for MM
form<-p_MM[,2] ~ .^2
mod<-model.matrix(form,data=as.data.frame(c_MM_pip))
mod<-mod[,-1] ## drop the intercept column
dfMM<-data.frame(p_MM[,2],mod)
bmsMM<-vector("list",5) ## 5 independent BMS chains
for(i in 1:5){
bmsMM[[i]]<-bms(dfMM,burn=10000,iter=200000,mprior="uniform",g="UIP",user.int=FALSE)
}
bpipsMM<-matrix(NA,nrow=V,ncol=5)
bmavMM<-matrix(NA,nrow=V,ncol=5)
for(i in 1:5){
oo<-coef(bmsMM[[i]],order.by.pip=FALSE)
bpipsMM[,i]<-oo[,1]
bmavMM[,i]<-oo[,2]
}
cs<-rep(c("cornsilk3","coral3"),c(6,15)) ## bar colors: 6 additive terms, 15 interactions
pdf("ColorSurvPips.pdf",width=7,height=9)
par(mfrow=c(2,1))
par(mar=c(5.75,4.5,3,0.5))
barplot(apply(bpipsAC,1,mean),ylim=c(0,1),names.arg=paIds,col=cs,xlab="Coefficient",ylab="PIP",cex.lab=1.4,las=2,cex.names=.7)
title(main="(A) Survival on A/C",cex.main=1.5)
barplot(apply(bpipsMM,1,mean),ylim=c(0,1),names.arg=paIds,col=cs,xlab="Coefficient",ylab="PIP",cex.lab=1.4,las=2,cex.names=.7)
title(main="(B) Survival on MM",cex.main=1.5)
dev.off()
pdf("ColorSurvModAvEffs.pdf",width=7,height=9)
par(mfrow=c(2,1))
par(mar=c(5.75,4.5,3,0.5))
barplot(apply(bmavAC,1,mean),names.arg=paIds,col=cs,xlab="Coefficient",ylab="Effect",cex.lab=1.4,las=2,cex.names=.7)
title(main="(A) Survival on A/C",cex.main=1.5)
barplot(apply(bmavMM,1,mean),names.arg=paIds,col=cs,xlab="Coefficient",ylab="Effect",cex.lab=1.4,las=2,cex.names=.7)
title(main="(B) Survival on MM",cex.main=1.5)
dev.off()
## number of additive effects, number of interactions in the model
ae<-1:6
ee<-7:21
Nx<-10000
noAeAC<-rep(NA,Nx)
noEeAC<-rep(NA,Nx)
noAeMM<-rep(NA,Nx)
noEeMM<-rep(NA,Nx)
for(i in 1:Nx){
sAC<-rbinom(n=21,size=1,prob=apply(bpipsAC,1,mean))
sMM<-rbinom(n=21,size=1,prob=apply(bpipsMM,1,mean))
noAeAC[i]<-sum(sAC[ae])
noEeAC[i]<-sum(sAC[ee])
noAeMM[i]<-sum(sMM[ae])
noEeMM[i]<-sum(sMM[ee])
}
mnsd<-function(x=NA){
c(mean(x),sd(x))
}
mnsd(noAeAC)
# 1.0997000 0.8263342
mnsd(noAeMM)
# 0.5616000 0.7154416
mnsd(noEeAC)
# 2.92920 1.40783
mnsd(noEeMM)
# 2.15640 1.29534
mnsd(noAeAC+noEeAC)
# 4.028900 1.633564
mnsd(noAeMM+noEeMM)
# 2.718000 1.477191
## fits of 'top' model and models with pips > 0.5... these are the same
xx<-which(apply(bpipsAC,1,mean) > .5)
## 6 8
o<-glm(dfAC[,1] ~ as.matrix(dfAC[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -1.4391 0.2011 -7.157 8.24e-13 ***
#as.matrix(dfAC[, xx + 1])V6 2.1053 0.7271 2.896 0.00378 **
#as.matrix(dfAC[, xx + 1])V1.V3 -1.5453 0.5209 -2.967 0.00301 **
# Null deviance: 240.71 on 215 degrees of freedom
#Residual deviance: 224.95 on 213 degrees of freedom
#AIC: 230.95
## Efron's pseudo r2
mn<-mean(dfAC[,1])
1-(sum((dfAC[,1]-o$fitted.values)^2)/sum((dfAC[,1]-mn)^2))
#[1] 0.08097351
## comparing to case where all terms with pp > 0.1 are included
xx<-which(apply(bpipsAC,1,mean) > .1)
## 4 6 7 8 9 10 13 14 15 18 21
o<-glm(dfAC[,1] ~ as.matrix(dfAC[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -1.5309 0.2917 -5.249 1.53e-07 ***
#as.matrix(dfAC[, xx + 1])V4 -0.8559 0.3808 -2.247 0.0246 *
#as.matrix(dfAC[, xx + 1])V6 5.2336 3.4741 1.506 0.1320
#as.matrix(dfAC[, xx + 1])V1.V2 1.4741 1.6857 0.874 0.3819
#as.matrix(dfAC[, xx + 1])V1.V3 -2.0473 0.9999 -2.047 0.0406 *
#as.matrix(dfAC[, xx + 1])V1.V4 0.1972 1.1848 0.166 0.8678
#as.matrix(dfAC[, xx + 1])V1.V5 -1.7193 1.1768 -1.461 0.1440
#as.matrix(dfAC[, xx + 1])V2.V4 0.6583 1.0109 0.651 0.5149
#as.matrix(dfAC[, xx + 1])V2.V5 -6.0606 2.8971 -2.092 0.0364 *
#as.matrix(dfAC[, xx + 1])V2.V6 -3.3460 1.7971 -1.862 0.0626 .
#as.matrix(dfAC[, xx + 1])V3.V6 11.2613 7.2973 1.543 0.1228
#as.matrix(dfAC[, xx + 1])V5.V6 -0.2194 13.3376 -0.016 0.9869
# Null deviance: 240.71 on 215 degrees of freedom
#Residual deviance: 207.22 on 204 degrees of freedom
#AIC: 231.22
mn<-mean(dfAC[,1])
1-(sum((dfAC[,1]-o$fitted.values)^2)/sum((dfAC[,1]-mn)^2))
## [1] 0.1569219
## fits of 'top' model and models with pips > 0.5... these are the same
xx<-which(apply(bpipsMM,1,mean) > .5)
## 16
o<-glm(dfMM[,1] ~ as.matrix(dfMM[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -1.7964 0.2470 -7.274 3.48e-13 ***
#as.matrix(dfMM[, xx + 1]) -1.1578 0.4528 -2.557 0.0106 *
# Null deviance: 211.5 on 212 degrees of freedom
#Residual deviance: 205.0 on 211 degrees of freedom
#AIC: 209
mn<-mean(dfMM[,1])
1-(sum((dfMM[,1]-o$fitted.values)^2)/sum((dfMM[,1]-mn)^2))
## 0.02951571
## comparing to case where all terms with pp > 0.1 are included
xx<-which(apply(bpipsMM,1,mean) > .1)
## 3 6 9 15 16 17 18 19 20
o<-glm(dfMM[,1] ~ as.matrix(dfMM[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -3.8924 0.9791 -3.976 7.02e-05 ***
#as.matrix(dfMM[, xx + 1])V3 -9.7522 4.5110 -2.162 0.0306 *
#as.matrix(dfMM[, xx + 1])V6 -50.3035 23.2903 -2.160 0.0308 *
#as.matrix(dfMM[, xx + 1])V1.V4 -1.3630 1.0418 -1.308 0.1907
#as.matrix(dfMM[, xx + 1])V2.V6 -6.8548 9.4857 -0.723 0.4699
#as.matrix(dfMM[, xx + 1])V3.V4 -1.3313 0.6643 -2.004 0.0451 *
#as.matrix(dfMM[, xx + 1])V3.V5 -0.1454 1.2631 -0.115 0.9084
#as.matrix(dfMM[, xx + 1])V3.V6 -250.6165 114.1158 -2.196 0.0281 *
#as.matrix(dfMM[, xx + 1])V4.V5 0.8689 2.2369 0.388 0.6977
#as.matrix(dfMM[, xx + 1])V4.V6 -2.6720 2.2256 -1.201 0.2299
mn<-mean(dfMM[,1])
1-(sum((dfMM[,1]-o$fitted.values)^2)/sum((dfMM[,1]-mn)^2))
## [1] 0.1248464