Post date: Mar 26, 2015 4:55:21 PM
First, the results below include the observed, expected (mean of null) and upper bounds (95th quantile of null), and p-value for observed > null for correspondence between the top Fst 20kb windows in the experiment and all four natural populations. This is basically the same thing Idid before but with the upper bounds added. I then took those 20kb windows in the top Fst for the experiment and at least one population pair and classified these as the selection candidates. There were 25 unique selection candidates (28 total, but scaf 14 position 178340 was present in 3 and scaf 966 position 74592 was present in 2 population pairs). I then calculated the mean pi for these 25 regions and compared this to null expectation with a randomization test. The results are also below. Here I was interested in the low pi tail so I give the 5th rather then 95th quantile of the null along with the mean and the p-value for observed < null. So, p < 0.001 and the difference between the null and observed is rather striking. So, pi is low in our selection candidates.
obs null ub enrichment p piobs pinull pilb p
99 28 8.883 14 3.152088 0 0.0002704276 0.002789536 0.002137824 0
99.5 10 2.307 5 4.334634 0 NA NA NA NA
99.9 3 0.099 1 30.303030 0 NA NA NA NA
And here is the R code (fstPiRandomTest.R):
n1<-22112
n5<-22110
parname<-c("pi","Dxy","Da","Fst")
d1<-matrix(scan("nei20k_tcrist_MR1",n=n1*7,sep=" "),nrow=n1,ncol=7,byrow=T)
d2<-matrix(scan("nei20k_tcrist_R12",n=n1*7,sep=" "),nrow=n1,ncol=7,byrow=T)
d3<-matrix(scan("nei20k_tcrist_L",n=n1*7,sep=" "),nrow=n1,ncol=7,byrow=T)
d4<-matrix(scan("nei20k_tcrist_HV",n=n1*7,sep=" "),nrow=n1,ncol=7,byrow=T)
d5<-matrix(scan("temp.txt",n=n5*7,sep=" "),nrow=n5,ncol=7,byrow=T)
parm<-7
A<-matrix(NA,nrow=n5,ncol=4)
B<-d5[,parm]
llist<-matrix(NA,nrow=n5,ncol=2)
for(i in 1:n5){
a<-which(d1[,1] == d2[i,1] & d1[,2] == d2[i,2])
if(length(a) > 0){A[i,]<-c(d1[a,parm],d2[a,parm],d3[a,parm],d4[a,parm])
llist[i,1]<-d2[i,1]
llist[i,2]<-d2[i,2]
}}
qsA<-apply(A,2,quantile,probs=c(0.99,0.995,0.999),na.rm=T)
qsB<-quantile(B,probs=c(0.99,0.995,0.999),na.rm=T)
out<-matrix(0,nrow=3,ncol=9)
rownames(out)<-c("99","99.5","99.9")
colnames(out)<-c("obs","null","ub","enrichment","p","piobs","pinull","pilb","p")
selcans<-NULL
for(i in 1:4){selcans<-c(selcans,which(A[,i] > qsA[1,i] & B > qsB[1])}
for(j in 1:3){for(i in 1:4){out[j,1]<-out[j,1]+sum(A[,i] > qsA[j,i] & B > qsB[j],na.rm=T)}}
N<-1000
null<-matrix(0,nrow=N,ncol=3)
C<-A
for(i in 1:N){
for(p in 1:4){C[,p]<-sample(A[,p],length(A[,p]),replace=FALSE)}
D<-sample(B,length(B),replace=FALSE)
for(j in 1:3){for(p in 1:4){null[i,j]<-null[i,j]+sum(C[,p] > qsA[j,p] & B > qsB[j],na.rm=T)}}
}
for(j in 1:3){
out[j,2]<-mean(null[,j],na.rm=TRUE)
out[j,3]<-quantile(null[,j],probs=0.95,na.rm=TRUE)
out[j,4]<-out[j,1]/out[j,2]
out[j,5]<-mean(null[,j] >= out[j,1])
}
## now check pi in same regions
out[2:3,6:9]<-NA
piavg<-apply(cbind(d1[,4],d2[,4],d3[,4],d4[,4]),1,mean,na.rm=T)
j<-1
out[j,6]<-mean(piavg[unique(selcans)])
l<-length(unique(selcans))
null<-rep(NA,1000)
for(i in 1:1000){
D<-sample(piavg,length(piavg),replace=FALSE)
null[i]<-mean(D[unique(selcans)],na.rm=T)
}
out[j,7]<-mean(null)
out[j,8]<-quantile(null,probs=0.05)
out[j,9]<-sum(null <= out[j,6])
write.csv(out,"piFstExperiment.csv")