Post date: Nov 01, 2015 11:47:39 PM
I ran numerous randomization tests to ask whether 20kb windows for ecotype pairs or the experiment that contained trait-associated SNPs (from gemma) had higher Fst than expected by chance. All of the files for this are in /labs/evolution/projects/timema_speciation/tcristinae/traits/. My R script for these analyses is traitFstRandomTest.R. This is pasted below, but has to be commented, uncommented and edited to run specific contrasts. Detailed results are below. In each case I give the PIP for inclusion, whether only unique SNPs (across traits) and windows (across and within traits) were counted, or multiple hits were allowed, which traits were included, and which ecotype pairs were included (all 4 where not specified), followed by the x-fold difference of trait-Fst to null expectations and the p-value. Clearly, there are more permutations of things we could try, but I think we have a fairly clear and believable story so far. Taking all of this together, I think we see that, there is never a giant jump in Fst, but that we do have a slight increase in Fst for trait associated SNPs when we include stripe, include or only count LA x PRC, and count windows multiple times if there are multiple hits. The pattern is indeed primarily driven by LG8 (see the attached figure for where the trait-associated SNPs are), and LG8 in particular has multiple windows tagged by multiple SNPs (and thus suffers more of a reduction when only counting unique hits). This all makes eminent sense to me and really supports our story.
Counts of PIP 0.05 SNPs in 20kb windows on each LG:
multiple counting: 1 2 3 4 5 6 7 8 9 10 11 12 13
11 2 19 14 3 9 8 36 2 10 3 1 4
unique only: 1 2 3 4 5 6 7 8 9 10 11 12 13
7 1 16 13 3 8 6 15 2 10 3 1 4
PIP = 0.05, unique SNPs, unique windows, all (13) traits: 0.988x, p = 0.6364
PIP = 0.05, unique SNPs, multiple windows, all (13) traits: 1.05x, p = 0.0368
PIP = 0.05, multiple SNPs, multiple windows, all (13) traits: 1.05x, p = 0.0372
PIP = 0.05, multiple windows, stripe: 1.17x, p = 0.023
PIP = 0.05, unique windows, stripe: 1.13x, p = 0.0877
PIP = 0.1, multiple windows, stripe: 1.15x, p = 0.1012
PIP = 0.1, unique windows, stripe: 1.05x, p = 0.3338
PIP = 0.05, unique SNPs, unique windows, all but stripe: 0.987x, p = 0.6433
PIP = 0.05, unique SNPs, multiple windows, all but stripe: 1.04x, p = 0.1251
PIP = 0.05, multiple SNPs, multiple windows, all but stripe: 1.04x, p = 0.1162
PIP = 0.05, windows with >1 hit (counted once), all (13) traits: 1.07x, p = 0.1694
PIP = 0.05, windows with only 1 hit, all (13) traits: 0.966x, p = 0.1694, p = 0.8166
Comparison of >1 hit and only 1 hit windows (i.e. is Fst higher for >1), all (13) traits: 1.28x, p = 0.122
PIP = 0.05, paraptric pairs, multiple SNPs, multiple windows, all (13) traits: 0.989x, p = 0.6686
PIP = 0.05, allopatric pair, multiple SNPs, multiple windows, all (13) traits: 1.12x, p = 0.0081
PIP = 0.05, paraptric pairs, unique SNPs, unique windows, all (13) traits: 0.976x, p = 0.8014
PIP = 0.05, allopatric pair, unique SNPs, unique windows, all (13) traits: 1.00x, p = 0.481
Results related to the selection experiment (these make much less sense, as stripe shows a weaker signal):
PIP = 0.05, founders, multiple windows, all (13) traits: 0.993x, p = 0.6022
PIP = 0.05, survivors, multiple windows, all (13) traits: 1.011x, p = 0.3143
PIP = 0.05, survivors-founders, multiple windows, all (13) traits: 1.018x, p = 0.2761
PIP = 0.05, survivors-founders, unique windows, all (13) traits: 1.034x, p = 0.1749
PIP = 0.05, founders, multiple windows, stripe: 0.912x, p = 0.9173
PIP = 0.05, survivors, multiple windows, stripe: 0.925x, p = 0.8756
PIP = 0.05, survivors-founders, multiple windows, stripe: 0.929x, p = 0.7869
PIP = 0.05, survivors-founders, unique windows, stripe: 0.934x, p = 0.713
PIP = 0.05, founders, multiple windows, all but stripe: 1.01x, p = 0.3322
PIP = 0.05, survivors, multiple windows, all but stripe: 1.03x, p = 0.1435
PIP = 0.05, survivors-founders, multiple windows, all but stripe: 1.035x, p = 0.1503
PIP = 0.05, survivors-founders, unique windows, all but stripe: 1.043x, p = 0.136
R script:
## trait pips
TrBL<-as.matrix(read.table("pip_2013FHA_all_BL_MAF01_sparse.txt",header=F))
TrBW<-as.matrix(read.table("pip_2013FHA_all_residBW_MAF01_sparse.txt",header=F))
TrHW<-as.matrix(read.table("pip_2013FHA_all_residHW_MAF01_sparse.txt",header=F))
TrlatGB<-as.matrix(read.table("pip_2013FHA_all_resid_latGB_MAF01_sparse.txt",header=F))
TrlatL<-as.matrix(read.table("pip_2013FHA_all_resid_latL_MAF01_sparse.txt",header=F))
TrlatRG<-as.matrix(read.table("pip_2013FHA_all_resid_latRG_MAF01_sparse.txt",header=F))
Trfpent<-as.matrix(read.table("pip_2013FHA_females_penta_MAF01_sparse.txt",header=F))
Trfhepta<-as.matrix(read.table("pip_2013FHA_females_hepta_MAF01_sparse.txt",header=F))
Trfnona<-as.matrix(read.table("pip_2013FHA_females_nona_MAF01_sparse.txt",header=F))
Trmpent<-as.matrix(read.table("pip_2013FHA_males_mpenta_MAF01_sparse.txt",header=F))
Trmhepta<-as.matrix(read.table("pip_2013FHA_males_mhepta_MAF01_sparse.txt",header=F))
Trmnona<-as.matrix(read.table("pip_2013FHA_males_mnona_MAF01_sparse.txt",header=F))
Trpatln<-as.matrix(read.table("pip_2013FHA_all_resid_percent_stripe_MAF01_sparse.txt",header=F))
Tlist<-list(TrBL,TrBW,TrHW,TrlatGB,TrlatL,TrlatRG,Trfpent,Trfhepta,Trfnona,Trmpent,Trmhepta,Trmnona,Trpatln)
## fst
n1<-42687
parname<-c("pi","Dxy","Da","Fst")
parm<-9
d1<-matrix(scan("../fst/nei20k_lgver3_p_tcristHV.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)
d2<-matrix(scan("../fst/nei20k_lgver3_p_tcristMR1.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)
d3<-matrix(scan("../fst/nei20k_lgver3_p_tcristR12.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)
d4<-matrix(scan("../fst/nei20k_lgver3_p_tcristL.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)
ex1<-matrix(scan("../fst/nei20k_AvC.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)
ex2<-matrix(scan("../fst/nei20k_AsurvCsurv.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)
#get high pip pattern SNPs
hiPip<-NULL
bnd<-0.05
hiPip<-Trpatln[which(Trpatln[,4] > bnd),]
uHiPip<-hiPip[which(!duplicated(hiPip[,3])),]
# get high pip SNPs
hiPip<-NULL
bnd<-0.05
for(i in 1:12){
x<-Tlist[[i]][which(Tlist[[i]][,4] > bnd),]
if(length(x) > 0){hiPip<-rbind(hiPip,x)}
}
uHiPip<-hiPip
#uHiPip<-hiPip[-duplicated(hiPip[,3]),]
## 20kb window for each hi pip SNPs
n<-dim(uHiPip)[1]
pipWin<-rep(NA,n)
for(i in 1:n){
x<-which(d1[,3]==uHiPip[i,2] & abs(d1[,4]-uHiPip[i,3]) < 20000)
if (length(x) == 1){
pipWin[i]<-x
}
else if (length(x) > 1){
x<-which(d1[,3]==uHiPip[i,2] & abs(d1[,4]-uHiPip[i,3]) < 10000)
if (length(x) == 1){
pipWin[i]<-x
}
else{
cat ("mult snps for ",i,"\n")
}
}
}
pipWin<-pipWin[is.na(pipWin)==FALSE]
## get only windows with 1 or >1 hit
#winCnts<-table(pipWin)
#pipWin<-as.numeric(names(winCnts[winCnts == 1]))
#p1<-as.numeric(names(winCnts[winCnts == 1]))
#p2<-as.numeric(names(winCnts[winCnts > 1]))
####
A<-as.matrix(cbind(d1[,parm],d2[,parm],d3[,parm]))
#A<-as.matrix(cbind(d1[,parm],d2[,parm],d3[,parm],d4[,parm]))
mnFst<-apply(A,1,mean,na.rm=TRUE)
mnFst<-d4[,parm]
mnFst<-ex2[,parm]
obs<-mean(mnFst[pipWin],na.rm=T)
null<-rep(NA,10000)
for(i in 1:10000){
a<-sample(1:n1,length(pipWin),replace=TRUE)
null[i]<-mean(mnFst[a],na.rm=TRUE)
}
p<-mean(null > obs,na.rm=TRUE)
## counting pips per LG
a<-table(d1[pipWin,1])
b<-table(d1[unique(pipWin),1])
pdf("traitAssocWindsByLg.pdf",width=5,height=10)
par(mfrow=c(2,1))
barplot(a,names.arg=names(a),main="total number")
barplot(b,names.arg=names(b),ylim=c(0,35),main="unique windows")
dev.off()