Post date: Sep 26, 2019 7:43:39 PM
These results now include the previously dropped individuals.
1. Extract summaries from the posterior (all run within the output subdirectory).
Hyperparameters:
perl calpost.pl *0.hyp.txt
Generates genarch_tchum_mel.txt, with the six traits (AC sex, AC survival, MM sex, MM survival, RG, GB) as rows, and point estimates and 95% CIs for PVE, PGE and n-gamma as columns.
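For downstream use in R, this summary can be read back in; a minimal sketch, assuming nine whitespace-delimited columns (point estimate, lower and upper 95% CI bound for each of PVE, PGE and n-gamma) in the trait order above -- the column and row names here are mine, not set by calpost.pl:
ga<-read.table("output/genarch_tchum_mel.txt",header=FALSE) ## assumed location/format
colnames(ga)<-c("pve","pve_lb","pve_ub","pge","pge_lb","pge_ub","ng","ng_lb","ng_ub")
rownames(ga)<-c("AC_sex","AC_surv","MM_sex","MM_surv","RG","GB")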
PIPs:
perl grabPips.pl o_mod_g_tchum_AC_ph1_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_AC_ph2_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_MM_ph1_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_MM_ph2_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_ph1_ch0.param.txt
perl grabPips.pl o_mod_g_tchum_ph2_ch0.param.txt
Generates the PIP files, pip_o_mod_g_tchum_*, with one row per SNP.
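For reference, an R equivalent of what grabPips.pl presumably does -- a sketch assuming GEMMA's BSLMM *.param.txt layout (columns chr, rs, ps, n_miss, alpha, beta, gamma), where gamma is the per-SNP posterior inclusion probability; the output file name below just mirrors the pip_ prefix and is a guess:
prm<-read.table("output/o_mod_g_tchum_ph1_ch0.param.txt",header=TRUE)
write.table(prm$gamma,file="output/pip_o_mod_g_tchum_ph1_ch0.param.txt",row.names=FALSE,col.names=FALSE)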
MAVs:
perl grabMavEffects.pl o_mod_g_tchum_AC_ph1_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_AC_ph2_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_MM_ph1_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_MM_ph2_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_ph1_ch0.param.txt
perl grabMavEffects.pl o_mod_g_tchum_ph2_ch0.param.txt
Generates the model-averaged effect files, mav_o_mod_g_tchum_*, with one row per SNP.
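Similarly, a sketch of the per-SNP model-averaged effect under the same assumed *.param.txt layout; here it is taken as beta * gamma (the sparse effect weighted by its inclusion probability) -- whether grabMavEffects.pl also adds the polygenic alpha term is worth checking against the script:
prm<-read.table("output/o_mod_g_tchum_ph1_ch0.param.txt",header=TRUE)
mav<-prm$beta * prm$gamma ## model-averaged sparse effect (assumption: alpha not added)
write.table(mav,file="output/mav_o_mod_g_tchum_ph1_ch0.param.txt",row.names=FALSE,col.names=FALSE)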
2. Plots and analyses in R (models.R):
I have re-run the gemma and Bayesian model averaging/model selection analyses with the previously dropped individuals added back in. I have also looked at/calculated a few additional summaries (to think more about epistasis) and tweaked a few other things. Everything is more or less as it was, but here is the new summary. My plans for next steps come at the end.
a. Focusing on the indel SNPs, I grabbed all of the SNPs whose gemma PIPs for mapping color were 0.5 or greater for RG or GB (or both); that is, SNPs more likely than not to be associated with color. There were 6 such SNPs.
Here are their positions (on scaffold 128): 5359458, 5359483, 5359496, 5690741, 5708354, 5746964.
SNPs 2, 3, 5 and 6 from this list were associated with RG, and SNPs 1, 3, 4, and 6 were associated with GB (i.e., some shared, some not).
b. Then, I used Bayesian model averaging/variable selection regression to fit a model (while doing variable selection) with additive and interaction effects for these six SNPs (21 variables total) for survival on A/C and on MM. The figure (ColorSurvPips.pdf) shows PIPs for these 21 variables for A/C and MM (note that the first 6 terms are the additive ones and the rest are the interactions). Visually, my take is that there is more of a signal on A/C than on MM. This includes interactions in both cases, and the bulk of the signal on A/C comes from interactions. Quantitatively, here are the estimates based on these PIPs (each given as posterior mean with SD = s.e. in parentheses):
Total number of SNPs or SNP interactions in the model
A/C = 4.03 (1.63); MM = 2.72 (1.48)
Number of additive effects
A/C = 1.10 (0.83); MM = 0.56 (0.72)
Number of interaction effects
A/C = 2.93 (1.41); MM = 2.16 (1.30)
Note that the large s.e. values make claims of credible/significant differences here tricky, but the trends are pretty clear/compelling (just lots of uncertainty).
c. I am now also including model-averaged effect estimates from the Bayesian model averaging (ColorSurvModAvEffs.pdf). There are some details about the genotype-level interactions that are worth discussing, but I think you can mostly conclude sign epistasis when an epistatic effect involving a SNP is larger in magnitude than its additive effect (but let's chat about the details). Under this interpretation, we have some evidence of sign epistasis, though it is a bit complicated by the fact that we have so few additive effects (i.e., in many cases it is only the interaction that matters).
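To make that criterion concrete, here is a small helper (my own sketch, not part of models.R) that flags interaction terms whose model-averaged effect is larger in magnitude than the additive effect of at least one of the SNPs involved; it assumes a length-21 effect vector ordered as in the BMS fits below (6 additive terms followed by the 15 pairwise interactions in model.matrix order):
flagSignEpi<-function(eff){ ## eff = 21 model-averaged effects: 6 additive, then 15 pairwise interactions
prs<-t(combn(6,2)) ## pair order matches the ~ .^2 design matrix
data.frame(pair=paste0("V",prs[,1],".V",prs[,2]),
 interactionDominates=abs(eff[7:21]) > pmin(abs(eff[prs[,1]]),abs(eff[prs[,2]])))
}
## e.g., flagSignEpi(apply(bmavAC,1,mean)) once bmavAC has been computed in the code below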
Also, just for context/thinking about interactions and real effects in the population, here are the non-reference allele frequencies for the six SNPs: 0.126, 0.073, 0.601, 0.662, 0.089, 0.022. Note that the 6th one, which stands out in some ways, is rather rare.
d. Finally, I fit generalized linear models with (i) all predictors with PIPs > 0.5 or (ii) all predictors with PIPs > 0.1 (this differs from last time, when I used all of them). The former gives just 2 or 1 predictors for A/C and MM, respectively, whereas the latter gives 11 or 9. The pseudo-R2 values (these are GLMs; Efron's pseudo-R2, see the small helper sketched after this list) are as follows (and in all cases some terms are significant):
AC, PIP > .5, 0.08
AC, PIP > .1, 0.16
MM, PIP > .5, 0.03
MM, PIP > .1, 0.12
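For reference, a small helper (my own naming, equivalent to the inline pseudo-R2 calculation in the code at the end):
efronR2<-function(y,fitted){ ## Efron's pseudo-R2: 1 - SS(residual)/SS(total)
1-(sum((y-fitted)^2)/sum((y-mean(y))^2))
}
## e.g., efronR2(dfAC[,1],o$fitted.values)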
Next, I plan to play more with a few other ways of summarizing explanatory power and then to turn to landscapes proper.
Here is the code:
## read in genetic and survival data
g_AC<-as.matrix(read.table("g_tchum_AC.txt",header=FALSE))
g_MM<-as.matrix(read.table("g_tchum_MM.txt",header=FALSE))
snps<-as.matrix(read.table("tchumSnpTable.txt",header=FALSE))
del<-which(snps[,2]==128 & snps[,3] >= 5000000 & snps[,3] <= 6000000) ## SNPs in the scaffold 128 indel region (5-6 Mb)
g_AC_del<-t(g_AC[del,])
g_MM_del<-t(g_MM[del,])
## center the genotypes at each SNP
c_AC_del<-g_AC_del
c_MM_del<-g_MM_del
for(i in 1:length(del)){
c_AC_del[,i]<-c_AC_del[,i]-mean(c_AC_del[,i])
c_MM_del[,i]<-c_MM_del[,i]-mean(c_MM_del[,i])
}
c_AC_del2<-c_AC_del^2
c_MM_del2<-c_MM_del^2
p_AC<-read.table("pheno_AC.txt",header=FALSE)
p_MM<-read.table("pheno_MM.txt",header=FALSE)
## top pip SNPs
pf<-list.files("output/",pattern="pip_")
pips<-matrix(NA,nrow=11855,ncol=6) ## one PIP per SNP for each of the six gemma runs
for(i in 1:6){
pips[,i]<-scan(paste("output/",pf[i],sep=""))
}
pipSnpTab<-read.table("output/pipSnpTable.txt",header=FALSE)
pipDel<-which(pipSnpTab[,2]==128 & pipSnpTab[,3] >= 5000000 & pipSnpTab[,3] <= 6000000)
pipDf<-data.frame(pipSnpTab[pipDel,],pips[pipDel,5:6]) ## append PIPs for the two color traits (RG, GB)
sum(pipDf[,5] > 0.5 | pipDf[,4] > 0.5)
#[1] 6
cor.test(pipDf[,4],pipDf[,5])
#        Pearson's product-moment correlation
#
#data:  pipDf[, 4] and pipDf[, 5]
#t = 2.0706, df = 16, p-value = 0.05493
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.009121002  0.762851117
#sample estimates:
#      cor
#0.4597067
## get relevant SNP ids
topIds<-pipDf[which(pipDf[,5] > 0.5 | pipDf[,4] > 0.5),1:3]
topSnps<-rep(NA,6)
#
for(i in 1:6){
topSnps[i]<-which(snps[,1]==topIds[i,1] & snps[,2]==topIds[i,2] & snps[,3]==topIds[i,3])
}
## RG = 2, 3, 5, 6
## GB = 1, 3, 4, 6
g_AC_pip<-t(g_AC[topSnps,])
g_MM_pip<-t(g_MM[topSnps,])
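## non-reference allele frequencies at the six top SNPs (the values quoted in the
## summary above); a sketch assuming genotypes are 0-2 non-reference allele dosages,
## pooled here across the A/C and MM samples (which may not exactly match how the
## quoted values were computed)
round(colMeans(rbind(g_AC_pip,g_MM_pip))/2,3)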
c_AC_pip<-g_AC_pip
c_MM_pip<-g_MM_pip
for(i in 1:length(topSnps)){
c_AC_pip[,i]<-c_AC_pip[,i]-mean(c_AC_pip[,i])
c_MM_pip[,i]<-c_MM_pip[,i]-mean(c_MM_pip[,i])
}
library(BMS)
V<-21 ## variables: 6 additive terms + 15 pairwise interactions
## design matrix with additive and all pairwise interaction effects
form<-p_AC[,2] ~ .^2
mod<-model.matrix(form,data=as.data.frame(c_AC_pip))
mod<-mod[,-1] ## drop the intercept column
dfAC<-data.frame(p_AC[,2],mod)
bmsAC<-vector("list",5) ## 5 independent BMS chains
for(i in 1:5){
bmsAC[[i]]<-bms(dfAC,burn=10000,iter=200000,mprior="uniform",g="UIP",user.int=FALSE)
}
bpipsAC<-matrix(NA,nrow=V,ncol=5)
bmavAC<-matrix(NA,nrow=V,ncol=5)
for(i in 1:5){
oo<-coef(bmsAC[[i]],order.by.pip=FALSE)
bpipsAC[,i]<-oo[,1]
bmavAC[,i]<-oo[,2]
}
paIds<-row.names(oo) ## coefficient names (used as plot labels)
## same design matrix construction for MM
form<-p_MM[,2] ~ .^2
mod<-model.matrix(form,data=as.data.frame(c_MM_pip))
mod<-mod[,-1] ## drop the intercept column
dfMM<-data.frame(p_MM[,2],mod)
bmsMM<-vector("list",5) ## 5 independent BMS chains
for(i in 1:5){
bmsMM[[i]]<-bms(dfMM,burn=10000,iter=200000,mprior="uniform",g="UIP",user.int=FALSE)
}
bpipsMM<-matrix(NA,nrow=V,ncol=5)
bmavMM<-matrix(NA,nrow=V,ncol=5)
for(i in 1:5){
oo<-coef(bmsMM[[i]],order.by.pip=FALSE)
bpipsMM[,i]<-oo[,1]
bmavMM[,i]<-oo[,2]
}
cs<-rep(c("cornsilk3","coral3"),c(6,15)) ## bar colors: 6 additive terms, 15 interactions
pdf("ColorSurvPips.pdf",width=7,height=9)
par(mfrow=c(2,1))
par(mar=c(5.75,4.5,3,0.5))
barplot(apply(bpipsAC,1,mean),ylim=c(0,1),names.arg=paIds,col=cs,xlab="Coefficient",ylab="PIP",cex.lab=1.4,las=2,cex.names=.7)
title(main="(A) Survival on A/C",cex.main=1.5)
barplot(apply(bpipsMM,1,mean),ylim=c(0,1),names.arg=paIds,col=cs,xlab="Coefficient",ylab="PIP",cex.lab=1.4,las=2,cex.names=.7)
title(main="(B) Survival on MM",cex.main=1.5)
dev.off()
pdf("ColorSurvModAvEffs.pdf",width=7,height=9)
par(mfrow=c(2,1))
par(mar=c(5.75,4.5,3,0.5))
barplot(apply(bmavAC,1,mean),names.arg=paIds,col=cs,xlab="Coefficient",ylab="Effect",cex.lab=1.4,las=2,cex.names=.7)
title(main="(A) Survival on A/C",cex.main=1.5)
barplot(apply(bmavMM,1,mean),names.arg=paIds,col=cs,xlab="Coefficient",ylab="Effect",cex.lab=1.4,las=2,cex.names=.7)
title(main="(B) Survival on MM",cex.main=1.5)
dev.off()
## number of additive effects, number of interactions in the model
ae<-1:6
ee<-7:21
Nx<-10000
noAeAC<-rep(NA,Nx)
noEeAC<-rep(NA,Nx)
noAeMM<-rep(NA,Nx)
noEeMM<-rep(NA,Nx)
for(i in 1:Nx){
sAC<-rbinom(n=21,size=1,prob=apply(bpipsAC,1,mean))
sMM<-rbinom(n=21,size=1,prob=apply(bpipsMM,1,mean))
noAeAC[i]<-sum(sAC[ae])
noEeAC[i]<-sum(sAC[ee])
noAeMM[i]<-sum(sMM[ae])
noEeMM[i]<-sum(sMM[ee])
}
mnsd<-function(x=NA){
c(mean(x),sd(x))
}
mnsd(noAeAC)
# 1.0997000 0.8263342
mnsd(noAeMM)
# 0.5616000 0.7154416
mnsd(noEeAC)
# 2.92920 1.40783
mnsd(noEeMM)
# 2.15640 1.29534
mnsd(noAeAC+noEeAC)
# 4.028900 1.633564
mnsd(noAeMM+noEeMM)
# 2.718000 1.477191
## fits of 'top' model and models with pips > 0.5... these are the same
xx<-which(apply(bpipsAC,1,mean) > .5)
## 6 8
o<-glm(dfAC[,1] ~ as.matrix(dfAC[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -1.4391 0.2011 -7.157 8.24e-13 ***
#as.matrix(dfAC[, xx + 1])V6 2.1053 0.7271 2.896 0.00378 **
#as.matrix(dfAC[, xx + 1])V1.V3 -1.5453 0.5209 -2.967 0.00301 **
# Null deviance: 240.71 on 215 degrees of freedom
#Residual deviance: 224.95 on 213 degrees of freedom
#AIC: 230.95
## Efron's pseudo r2
mn<-mean(dfAC[,1])
1-(sum((dfAC[,1]-o$fitted.values)^2)/sum((dfAC[,1]-mn)^2))
#[1] 0.08097351
## comparing to case where all terms with pp > 0.1 are included
xx<-which(apply(bpipsAC,1,mean) > .1)
## 4 6 7 8 9 10 13 14 15 18 21
o<-glm(dfAC[,1] ~ as.matrix(dfAC[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -1.5309 0.2917 -5.249 1.53e-07 ***
#as.matrix(dfAC[, xx + 1])V4 -0.8559 0.3808 -2.247 0.0246 *
#as.matrix(dfAC[, xx + 1])V6 5.2336 3.4741 1.506 0.1320
#as.matrix(dfAC[, xx + 1])V1.V2 1.4741 1.6857 0.874 0.3819
#as.matrix(dfAC[, xx + 1])V1.V3 -2.0473 0.9999 -2.047 0.0406 *
#as.matrix(dfAC[, xx + 1])V1.V4 0.1972 1.1848 0.166 0.8678
#as.matrix(dfAC[, xx + 1])V1.V5 -1.7193 1.1768 -1.461 0.1440
#as.matrix(dfAC[, xx + 1])V2.V4 0.6583 1.0109 0.651 0.5149
#as.matrix(dfAC[, xx + 1])V2.V5 -6.0606 2.8971 -2.092 0.0364 *
#as.matrix(dfAC[, xx + 1])V2.V6 -3.3460 1.7971 -1.862 0.0626 .
#as.matrix(dfAC[, xx + 1])V3.V6 11.2613 7.2973 1.543 0.1228
#as.matrix(dfAC[, xx + 1])V5.V6 -0.2194 13.3376 -0.016 0.9869
# Null deviance: 240.71 on 215 degrees of freedom
#Residual deviance: 207.22 on 204 degrees of freedom
#AIC: 231.22
mn<-mean(dfAC[,1])
1-(sum((dfAC[,1]-o$fitted.values)^2)/sum((dfAC[,1]-mn)^2))
## [1] 0.1569219
## fits of 'top' model and models with pips > 0.5... these are the same
xx<-which(apply(bpipsMM,1,mean) > .5)
## 16
o<-glm(dfMM[,1] ~ as.matrix(dfMM[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -1.7964 0.2470 -7.274 3.48e-13 ***
#as.matrix(dfMM[, xx + 1]) -1.1578 0.4528 -2.557 0.0106 *
# Null deviance: 211.5 on 212 degrees of freedom
#Residual deviance: 205.0 on 211 degrees of freedom
#AIC: 209
mn<-mean(dfMM[,1])
1-(sum((dfMM[,1]-o$fitted.values)^2)/sum((dfMM[,1]-mn)^2))
## 0.02951571
## comparing to case where all terms with pp > 0.1 are included
xx<-which(apply(bpipsMM,1,mean) > .1)
## 3 6 9 15 16 17 18 19 20
o<-glm(dfMM[,1] ~ as.matrix(dfMM[,xx+1]),family="binomial")
summary(o)
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -3.8924 0.9791 -3.976 7.02e-05 ***
#as.matrix(dfMM[, xx + 1])V3 -9.7522 4.5110 -2.162 0.0306 *
#as.matrix(dfMM[, xx + 1])V6 -50.3035 23.2903 -2.160 0.0308 *
#as.matrix(dfMM[, xx + 1])V1.V4 -1.3630 1.0418 -1.308 0.1907
#as.matrix(dfMM[, xx + 1])V2.V6 -6.8548 9.4857 -0.723 0.4699
#as.matrix(dfMM[, xx + 1])V3.V4 -1.3313 0.6643 -2.004 0.0451 *
#as.matrix(dfMM[, xx + 1])V3.V5 -0.1454 1.2631 -0.115 0.9084
#as.matrix(dfMM[, xx + 1])V3.V6 -250.6165 114.1158 -2.196 0.0281 *
#as.matrix(dfMM[, xx + 1])V4.V5 0.8689 2.2369 0.388 0.6977
#as.matrix(dfMM[, xx + 1])V4.V6 -2.6720 2.2256 -1.201 0.2299
mn<-mean(dfMM[,1])
1-(sum((dfMM[,1]-o$fitted.values)^2)/sum((dfMM[,1]-mn)^2))
## [1] 0.1248464