new trait-associated SNP analyses, ecotypes and experiments

Post date: Nov 01, 2015 11:47:39 PM

I ran numerous randomization tests to ask whether 20kb windows for ecotype pairs or the experiment that contained trait-associated SNPs (from gemma) had higher Fst than expected by chance. All of the files for this are in /labs/evolution/projects/timema_speciation/tcristinae/traits/. My R script for these analyses is traitFstRandomTest.R. This is pasted below, but has to be commented, uncommented and edited to run specific contrasts. Detailed results are below. In each case I give the PIP for inclusion, whether only unique SNPs (across traits) and windows (across and within traits) were counted, or multiple hits were allowed, which traits were included, and which ecotype pairs were included (all 4 where not specified), followed by the x-fold difference of trait-Fst to null expectations and the p-value. Clearly, there are more permutations of things we could try, but I think we have a fairly clear and believable story so far. Taking all of this together, I think we see that, there is never a giant jump in Fst, but that we do have a slight increase in Fst for trait associated SNPs when we include stripe, include or only count LA x PRC, and count windows multiple times if there are multiple hits. The pattern is indeed primarily driven by LG8 (see the attached figure for where the trait-associated SNPs are), and LG8 in particular has multiple windows tagged by multiple SNPs (and thus suffers more of a reduction when only counting unique hits). This all makes eminent sense to me and really supports our story.

Counts of PIP 0.05 SNPs in 20kb windows on each LG:

multiple counting: 1 2 3 4 5 6 7 8 9 10 11 12 13

11 2 19 14 3 9 8 36 2 10 3 1 4

unique only: 1 2 3 4 5 6 7 8 9 10 11 12 13

7 1 16 13 3 8 6 15 2 10 3 1 4

- PIP = 0.05, unique SNPs, unique windows, all (13) traits: 0.988x, p = 0.6364
- PIP = 0.05, unique SNPs, multiple windows, all (13) traits: 1.05x, p = 0.0368
- PIP = 0.05, multiple SNPs, multiple windows, all (13) traits: 1.05x, p = 0.0372
- PIP = 0.05, multiple windows, stripe: 1.17x, p = 0.023
- PIP = 0.05, unique windows, stripe: 1.13x, p = 0.0877
- PIP = 0.1, multiple windows, stripe: 1.15x, p = 0.1012
- PIP = 0.1, unique windows, stripe: 1.05x, p = 0.3338
- PIP = 0.05, unique SNPs, unique windows, all but stripe: 0.987x, p = 0.6433
- PIP = 0.05, unique SNPs, multiple windows, all but stripe: 1.04x, p = 0.1251
- PIP = 0.05, multiple SNPs, multiple windows, all but stripe: 1.04x, p = 0.1162
- PIP = 0.05, windows with >1 hit (counted once), all (13) traits: 1.07x, p = 0.1694
- PIP = 0.05, windows with only 1 hit, all (13) traits: 0.966x, p = 0.1694, p = 0.8166
- Comparison of >1 hit and only 1 hit windows (i.e. is Fst higher for >1), all (13) traits: 1.28x, p = 0.122
- PIP = 0.05, paraptric pairs, multiple SNPs, multiple windows, all (13) traits: 0.989x, p = 0.6686
- PIP = 0.05, allopatric pair, multiple SNPs, multiple windows, all (13) traits: 1.12x, p = 0.0081
- PIP = 0.05, paraptric pairs, unique SNPs, unique windows, all (13) traits: 0.976x, p = 0.8014
- PIP = 0.05, allopatric pair, unique SNPs, unique windows, all (13) traits: 1.00x, p = 0.481

Results related to the selection experiment (these make much less sense, as stripe shows a weaker signal):

- PIP = 0.05, founders, multiple windows, all (13) traits: 0.993x, p = 0.6022
- PIP = 0.05, survivors, multiple windows, all (13) traits: 1.011x, p = 0.3143
- PIP = 0.05, survivors-founders, multiple windows, all (13) traits: 1.018x, p = 0.2761
- PIP = 0.05, survivors-founders, unique windows, all (13) traits: 1.034x, p = 0.1749
- PIP = 0.05, founders, multiple windows, stripe: 0.912x, p = 0.9173
- PIP = 0.05, survivors, multiple windows, stripe: 0.925x, p = 0.8756
- PIP = 0.05, survivors-founders, multiple windows, stripe: 0.929x, p = 0.7869
- PIP = 0.05, survivors-founders, unique windows, stripe: 0.934x, p = 0.713
- PIP = 0.05, founders, multiple windows, all but stripe: 1.01x, p = 0.3322
- PIP = 0.05, survivors, multiple windows, all but stripe: 1.03x, p = 0.1435
- PIP = 0.05, survivors-founders, multiple windows, all but stripe: 1.035x, p = 0.1503
- PIP = 0.05, survivors-founders, unique windows, all but stripe: 1.043x, p = 0.136

R script:

## trait pips

TrBL<-as.matrix(read.table("pip_2013FHA_all_BL_MAF01_sparse.txt",header=F))

TrBW<-as.matrix(read.table("pip_2013FHA_all_residBW_MAF01_sparse.txt",header=F))

TrHW<-as.matrix(read.table("pip_2013FHA_all_residHW_MAF01_sparse.txt",header=F))

TrlatGB<-as.matrix(read.table("pip_2013FHA_all_resid_latGB_MAF01_sparse.txt",header=F))

TrlatL<-as.matrix(read.table("pip_2013FHA_all_resid_latL_MAF01_sparse.txt",header=F))

TrlatRG<-as.matrix(read.table("pip_2013FHA_all_resid_latRG_MAF01_sparse.txt",header=F))

Trfpent<-as.matrix(read.table("pip_2013FHA_females_penta_MAF01_sparse.txt",header=F))

Trfhepta<-as.matrix(read.table("pip_2013FHA_females_hepta_MAF01_sparse.txt",header=F))

Trfnona<-as.matrix(read.table("pip_2013FHA_females_nona_MAF01_sparse.txt",header=F))

Trmpent<-as.matrix(read.table("pip_2013FHA_males_mpenta_MAF01_sparse.txt",header=F))

Trmhepta<-as.matrix(read.table("pip_2013FHA_males_mhepta_MAF01_sparse.txt",header=F))

Trmnona<-as.matrix(read.table("pip_2013FHA_males_mnona_MAF01_sparse.txt",header=F))

Trpatln<-as.matrix(read.table("pip_2013FHA_all_resid_percent_stripe_MAF01_sparse.txt",header=F))

Tlist<-list(TrBL,TrBW,TrHW,TrlatGB,TrlatL,TrlatRG,Trfpent,Trfhepta,Trfnona,Trmpent,Trmhepta,Trmnona,Trpatln)

## fst

n1<-42687

parname<-c("pi","Dxy","Da","Fst")

parm<-9

d1<-matrix(scan("../fst/nei20k_lgver3_p_tcristHV.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)

d2<-matrix(scan("../fst/nei20k_lgver3_p_tcristMR1.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)

d3<-matrix(scan("../fst/nei20k_lgver3_p_tcristR12.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)

d4<-matrix(scan("../fst/nei20k_lgver3_p_tcristL.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)

ex1<-matrix(scan("../fst/nei20k_AvC.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)

ex2<-matrix(scan("../fst/nei20k_AsurvCsurv.txt",n=n1*parm,sep=" "),nrow=n1,ncol=parm,byrow=T)

#get high pip pattern SNPs

hiPip<-NULL

bnd<-0.05

hiPip<-Trpatln[which(Trpatln[,4] > bnd),]

uHiPip<-hiPip[which(!duplicated(hiPip[,3])),]

# get high pip SNPs

hiPip<-NULL

bnd<-0.05

for(i in 1:12){

x<-Tlist[[i]][which(Tlist[[i]][,4] > bnd),]

if(length(x) > 0){hiPip<-rbind(hiPip,x)}

}

uHiPip<-hiPip

#uHiPip<-hiPip[-duplicated(hiPip[,3]),]

## 20kb window for each hi pip SNPs

n<-dim(uHiPip)[1]

pipWin<-rep(NA,n)

for(i in 1:n){

x<-which(d1[,3]==uHiPip[i,2] & abs(d1[,4]-uHiPip[i,3]) < 20000)

if (length(x) == 1){

pipWin[i]<-x

}

else if (length(x) > 1){

x<-which(d1[,3]==uHiPip[i,2] & abs(d1[,4]-uHiPip[i,3]) < 10000)

if (length(x) == 1){

pipWin[i]<-x

}

else{

cat ("mult snps for ",i,"\n")

}

pipWin<-pipWin[is.na(pipWin)==FALSE]

## get only windows with 1 or >1 hit

#winCnts<-table(pipWin)

#pipWin<-as.numeric(names(winCnts[winCnts == 1]))

#p1<-as.numeric(names(winCnts[winCnts == 1]))

#p2<-as.numeric(names(winCnts[winCnts > 1]))

####

A<-as.matrix(cbind(d1[,parm],d2[,parm],d3[,parm]))

#A<-as.matrix(cbind(d1[,parm],d2[,parm],d3[,parm],d4[,parm]))

mnFst<-apply(A,1,mean,na.rm=TRUE)

mnFst<-d4[,parm]

mnFst<-ex2[,parm]

obs<-mean(mnFst[pipWin],na.rm=T)

null<-rep(NA,10000)

for(i in 1:10000){

a<-sample(1:n1,length(pipWin),replace=TRUE)

null[i]<-mean(mnFst[a],na.rm=TRUE)

}

p<-mean(null > obs,na.rm=TRUE)

## counting pips per LG

a<-table(d1[pipWin,1])

b<-table(d1[unique(pipWin),1])

pdf("traitAssocWindsByLg.pdf",width=5,height=10)

par(mfrow=c(2,1))

barplot(a,names.arg=names(a),main="total number")

barplot(b,names.arg=names(b),ylim=c(0,35),main="unique windows")

dev.off()