Mapping color in gemma with epistatic effects from mapit

Post date: Oct 25, 2019 11:38:12 PM

We idenfited five SNPs with signficant interactions from MAPIT (three RG only and two RG and GB).

# V1 V2 V3

#[1,] 8 128 5359496

#[2,] 8 128 5690726

#[3,] 8 128 5690741

#[4,] 8 128 5746977

#[5,] 8 128 5746990

I added these to the gentoype table as interactions for running gemma. Note these were centered but not standardized prior to calculating the interaction terms (i.e., multiplying genotypes).

Gepi<-matrix(NA,nrow=5,ncol=437)

for(i in 1:5){

a<-which(snps[,2] == eSnps[i,2] & snps[,3] == eSnps[i,3])

Gepi[i,]<-gdat[a,]

Gepi[i,]<-Gepi[i,]-mean(Gepi[i,])

}

eGmat<-matrix(NA,nrow=10,ncol=437)

k<-1

for(i in 1:4){for(j in (i+1):5){

eGmat[k,]<-Gepi[i,] * Gepi[j,]

k<-k+1

}}

Gout<-as.matrix(rbind(gdat,eGmat))

write.table(Gout,file="../Gemma_mel/g_epi_tchum.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)

I then created the mod* genotype file for gemma for these and ran gemma as,

sbatch RunGemmaEpi.sh

which runs RunGemmaEpiFork.pl

#!/usr/bin/perl

# run gemma jobs

use Parallel::ForkManager;

my $max = 20;

my $pm = Parallel::ForkManager->new($max);

my $ph = "pheno_color.txt";

foreach $g (@ARGV){

$base = $g;

$base =~ s/mod_g_//;

$base =~ s/\.txt//;

foreach $ch (0..9){

system "sleep 2\n";

foreach $trait (1..2){

$pm->start and next; ## fork

$out = "oe_$base"."_ph$trait"."_ch$ch";

system "gemma -g $g -p $ph -bslmm 1 -n $trait -k output/tchum_epi.cXX.txt -notsnp -w 200000 -s 1000000 -o $out\n";

$pm->finish;

}

$pm->wait_all_children;

Then from the output directory, I calcualted hyperparameter posteriors (RG and GB) and summarized the PIPs in R (see epiPips.R, which is below, and calpostEpi.pl):

## pips for epistatic interactions

## = last 10 terms

f1<-list.files(pattern=glob2rx("oe_epi_tchum_ph1*param.txt"))

f2<-list.files(pattern=glob2rx("oe_epi_tchum_ph2*param.txt"))

pip1<-matrix(NA,nrow=10,ncol=10)

pip2<-matrix(NA,nrow=10,ncol=10)

for(i in 1:10){

dat1<-read.table(f1[i],header=TRUE)

dat2<-read.table(f2[i],header=TRUE)

pip1[,i]<-dat1$gamma[11856:11865]

pip2[,i]<-dat2$gamma[11856:11865]

}

cs<-c("indianred2","cornflowerblue")

pdf("chumColorEpiPips.pdf",width=5,height=5)

par(mar=c(4.5,4.5,0.5,0.5))

plot(apply(pip1,1,mean),pch=19,col=cs[1],ylab="PIP",xlab="SNP pair",cex.lab=1.4)

points(apply(pip2,1,mean),pch=19,col=cs[2])

dev.off()

1. First, here are the PVEs (from genarch_tchum_epi.txt):

RG = 0.907, 0.861-0.958

GB = 0.790, 0.708-0.873

Note much of a shift on GB, but definitely something on RG.

chumColorEpiPips.pdf shows the PIPs from gemma just for the 10 interaction terms (for all pairs of the 5 SNPs tagged from MAPIT).

Note that there are hints here and there, but that two stand out, one for RG and one for GB.

For RG that is between our additive associated SNPs 3 and 4 (which also interact for fitness... recall the 1 x 3 x 4 three way interaction).

For GB that is also SNP 3, but interacting with a SNP kind of near our additive SNP 5.

I can't say for sure why the PVEs from MAPIT were less, except for the possiblity that it is all about what else is in the model (which I guess is the answer in some sense).

2. I computed genome estiamted breeding values (GEBVs) from the gemma models for color with and without the 10 interactions (pairs for five SNPs).

The GEBVs from the two analyses are highly correlated:

RG, r = 0.95, p < 0.001

GB, r = 0.98, p < 0.001

Thus, even though some of the effects shifted around (especially for RG) we are getting similar estimates of what the color value should be when considering all of the SNPs in the model (i.e., SNPs/interactions were standing in for each other).

I then made a plot of GEBVs for RG vs. GB for the two models. The scatterplot from the additive model is in gray and the model with epistasis in orange. Note that allowing for epistasis does slightly tweak where in parameter space the GEBVs fall, and in such a way that it kind of pulls the green vs. melanic clusters apart more (most evidence for the melanic group which shifts right).

See plotGebvs.R in output:

## plot GEBV distributions for only additive vs. epistatic

rg_epi<-scan("mav_oe_epi_tchum_ph1_ch0.param.txt")

gb_epi<-scan("mav_oe_epi_tchum_ph2_ch0.param.txt")

rg_add<-scan("mav_o_mod_g_tchum_ph1_ch0.param.txt")

gb_add<-scan("mav_o_mod_g_tchum_ph2_ch0.param.txt")

g_add<-as.matrix(read.table("../g_tchum.txt",header=FALSE))

g_epi<-as.matrix(read.table("../g_epi_tchum.txt",header=FALSE))

drop<-which(apply(g_add,1,sd)==0)

g_add<-g_add[-drop,]

g_epi<-g_epi[-drop,]

gebv_rg_add<-rg_add %*% g_add

gebv_gb_add<-gb_add %*% g_add

gebv_rg_epi<-rg_epi %*% g_epi

gebv_gb_epi<-gb_epi %*% g_epi

cor.test(gebv_rg_add,gebv_rg_epi)

#data: gebv_rg_add and gebv_rg_epi

#t = 61.265, df = 435, p-value < 2.2e-16

#alternative hypothesis: true correlation is not equal to 0

#95 percent confidence interval:

# 0.9359554 0.9555952

#sample estimates:

# cor

#0.9466473

cor.test(gebv_gb_add,gebv_gb_epi)

#data: gebv_gb_add and gebv_gb_epi

#t = 104.83, df = 435, p-value < 2.2e-16

#alternative hypothesis: true correlation is not equal to 0

#95 percent confidence interval:

# 0.9768446 0.9840487

#sample estimates:

# cor

#0.9807781

## after centering

c_add<-g_add

c_epi<-g_epi

for(i in 1:dim(g_add)[1]){

c_add[i,]<-c_add[i,]-mean(c_add[i,])

}

for(i in 1:dim(g_epi)[1]){

c_epi[i,]<-c_epi[i,]-mean(c_epi[i,])

}

c_gebv_rg_add<-rg_add %*% c_add

c_gebv_gb_add<-gb_add %*% c_add

c_gebv_rg_epi<-rg_epi %*% c_epi

c_gebv_gb_epi<-gb_epi %*% c_epi

xlb<-min(c(c_gebv_rg_add,c_gebv_rg_epi))

xub<-max(c(c_gebv_rg_add,c_gebv_rg_epi))

ylb<-min(c(c_gebv_gb_add,c_gebv_gb_epi))

yub<-max(c(c_gebv_gb_add,c_gebv_gb_epi))

library(scales)

pdf("rgGbAddVsEpi.pdf",width=5,height=5)

par(mar=c(4.5,4.5,0.5,0.5))

plot(c_gebv_rg_add,c_gebv_gb_add,pch=19,col=alpha("gray",0.8),xlim=c(xlb,xub),ylim=c(ylb,yub),

xlab="GEBV RG",ylab="GEBV GB",cex.lab=1.4)

points(c_gebv_rg_epi,c_gebv_gb_epi,pch=19,col=alpha("orange",0.5))

dev.off()