Post date: Oct 04, 2013 5:10:36 PM
I finished an initial permutation analysis for snps on linkage groups to test for parallel high genetic divergence between Timema ecotypes. In brief, I
determined the empirical quantile (essentially rank) of the Fst for each locus for each pair, where 1 is the locus with the highest Fst and 0 is the locus with the lowest Fst
calculated the composite Fst quantile by summing these empirical quantiles (ranks) across the four population pairs (this composite score thus ranges from 0 to 4)
permuted the locus ids (the results are based on 1000 permutations) such that there is no correspondence between snps in different population pairs
calculated the composite Fst quantile for each snp in each permuted data set
recorded the maximum composite Fst quantile in each permuted data set, this is the maximum composite score expected across all > 4 million snps in the absence of parallelism
used the distribution of these maximum composite Fst quantiles from the premuted data sets (1000 maximums) as the null distribution for genome-wide significance for individual snps.
Based on this permutation four snps showed genome-wide significant (p < 0.05) evidence of parallel high genetic divergence.
Here are the snp numbers, linkage groups, scaffolds, and positions (in that order):
528506 11 316 565322
1010434 13 911 187771
3877150 8 765 45414
4118807 8 54 901311
And here is the R code specifically for the permutation analysis,
quant<-fstall
for(i in 1:4){
quant[order(fstall[,i]),i]<-seq(0,1,1/(dim(fstall)[1]-1))
}
qsum<-apply(quant,1,sum)
nrep<-1000
maxq<-numeric(nrep)
q99q<-numeric(nrep)
for(n in 1:nrep){
qrep<-quant
for(j in 1:4){
qrep[,j]<-qrep[sample(1:dim(qrep)[1],dim(qrep)[1],replace=FALSE),j]
}
qrepsum<-apply(qrep,1,sum)
maxq[n]<-max(qrepsum)
q99q[n]<-quantile(qrepsum,probs=0.99)
}
The related files, code, R workspace, and plots are in projects/timema_wgwild/fst.