Post date: Jan 09, 2015 8:29:9 PM
Several weeks ago I started testing for an association between Fst (between different hosts) and the difference in model averaged effects. The input files I made are in /home/A01963476/projects/timema_wgexperiment/fstAnalyses/. We had found such correlations (though very weak) in the GBS data, and interestingly the strength of the correlation was associated with gene flow between pops. Here is the summary I sent Patrik:
Only about 10% of the SNPs are shared between the wild and experimental populations, meaning our correlations between Fst and pips or betas (raw or model-averaged) are based on about 370,000 SNPs. While plenty of SNPs, I was a bit surprised that the overlap wasn't higher. This argues strongly for calling variants in a combined analyses when making comparisons across data sets (it might be worth doing here?).
At a first pass, we effectively have no correlations between the Fsts (including for the allopatric population pair) and beta, model-averaged betas, pips, or the absolute difference in model-averaged betas. So, not much here.
But, perhaps one shouldn't expect an overall correlation, rather we should just expect a correlation for those SNPs with the greatest model-averaged effects. Now I will test for that.
Still nothing. No correlations between Fst (any population pair) and the difference in betas or model-averaged betas between treatments. Nor are their correlations for the top 1%, 0.1% or 0.01% of loci (based on betas or model-averaged betas). Nor are the same loci found disproporinately in the top 1% (or 0.1 or 0.01%) for mean Fst (or Fst for any population pair) and betas (or model-averaged betas). This one is done (with the caveat that this might be worth revisiting with the set of SNPs called together)! Here is the R code.
## testing for an association between Fst and model-averaged effects
nl<-371457
np<-14
## read in data
## lg ord scaf pos fstHV fstMR1 fstR12 fstLaPrc betaA mbA pipA betaA mbA pipA
parm<-matrix(scan("genomeFstPimass.txt",n=nl*np,sep=" "),nrow=nl,ncol=np,byrow=T)
## dif in betas
dmb<-abs(parm[,10]-parm[,13])
## no correlation between dmb and fst for any population when considering all loci
## top loci
q<-quantile(dmb,probs=c(0.99,0.999,0.9999),na.rm=T) ## top 1%, 0.1% 0.01%
## top 1%
x<-which(dmb >= q[1]) ## 3330 loci
cor.test(dmb[x],parm[x,5]) ## no cor. for any
## top 0.1%
x<-which(dmb >= q[2]) ## 333 loci
cor.test(dmb[x],parm[x,6])
#t = -2.0326, df = 331, p-value = 0.04289
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.215932459 -0.003600727
#sample estimates:
# cor
#-0.1110335
## top 0.01%
x<-which(dmb >= q[2]) ## 34 loci
cor.test(dmb[x],parm[x,5])## none
## is their an enrichment for top loci in both sets
mnfst<-apply(parm[,5:8],1,mean,na.rm=T) ## mean fst
qf<-quantile(mnfst,probs=c(0.99,0.999,0.9999),na.rm=T)
## top 1% in each
x<-which(dmb >= q[1] & mnfst >= qf[1])## 33 loci
x<-which(dmb >= q[2] & mnfst >= qf[2])## none
## what about model-avg betas
dmb<-abs(parm[,11]-parm[,14])
## no correlations
## 42 loci in top 1% in each
## permutation test
num<-numeric(1000)
for(i in 1:1000){
pmnfst<-sample(mnfst,length(mnfst),replace=F)
x<-which(dmb >= q[1] & pmnfst >= qf[1])
num[i]<-length(x)
}
mean(num >= 42)
## 0.077
## fst for individual pop pairs
#######################
qf<-quantile(parm[,5],probs=c(0.99,0.999,0.9999),na.rm=T)
x<-which(dmb >= q[1] & parm[,5] >= qf[1])
obs<-length(x)
num<-numeric(1000)
for(i in 1:1000){
pmnfst<-sample(parm[,5],length(parm[,5]),replace=F)
x<-which(dmb >= q[1] & pmnfst >= qf[1])
num[i]<-length(x)
}
mean(num >= obs)
## p = 0.892
#######################
qf<-quantile(parm[,6],probs=c(0.99,0.999,0.9999),na.rm=T)
x<-which(dmb >= q[1] & parm[,6] >= qf[1])
obs<-length(x)
num<-numeric(1000)
for(i in 1:1000){
pmnfst<-sample(parm[,6],length(parm[,6]),replace=F)
x<-which(dmb >= q[1] & pmnfst >= qf[1])
num[i]<-length(x)
}
mean(num >= obs)
## p = 0.236
#######################
qf<-quantile(parm[,7],probs=c(0.99,0.999,0.9999),na.rm=T)
x<-which(dmb >= q[1] & parm[,7] >= qf[1])
obs<-length(x)
num<-numeric(1000)
for(i in 1:1000){
pmnfst<-sample(parm[,7],length(parm[,7]),replace=F)
x<-which(dmb >= q[1] & pmnfst >= qf[1])
num[i]<-length(x)
}
mean(num >= obs)
## p = 0.394
#######################
qf<-quantile(parm[,8],probs=c(0.99,0.999,0.9999),na.rm=T)
x<-which(dmb >= q[1] & parm[,8] >= qf[1])
obs<-length(x)
num<-numeric(1000)
for(i in 1:1000){
pmnfst<-sample(parm[,8],length(parm[,8]),replace=F)
x<-which(dmb >= q[1] & pmnfst >= qf[1])
num[i]<-length(x)
}
mean(num >= obs)
## p = 0.166