Post date: Oct 13, 2016 11:19:44 PM
1. Extract genetic data (genotype likelihoods) for SNPs with maf > 1% and split the vcf file by treatment (source and each experimental block):
perl ../scripts/vcf2gl.pl 0.01 morefilter_filtered2x_tcrFhaVars.vcf
perl ../scripts/splitPopsOga.pl oga_maf1pct.gl
This results in 249,074 SNPs with N = 30 (OGA2010) , 60 (MR1_C), 46 (MR2_C), 30 (MR3_C), 56 (MR4_C), 9 (MR5_C), 58 (MR1_A), 28 (MR2_A), 26 (MR3_A), 101 (MR4_A), 7 (MR5_A),
2. Infer population allele frequencies. Used estpEM, which implements a EM algorithm. Here are the commands (repeated for each treatment:
estpEM -i OGA2011.gl -o p_OGA2011.gl -h 2
Additional files were made with just the EM estimates (single column):
cut -f 3 -d " " p_OGA2011.gl > pEM_OGA2011.txt
3. Genomic change for experimental treatments.
I made plots akin to what I have been making with scaffold size (no. of SNPs) on the x and either mean or 95th quantile allele frequency change for each scaffold with at least 500 SNPs (things get messy if you include scaffolds with few SNPs... basically the variation in means and quantiles explodes). Each panel shows the founders vs. one of the treatments (the panels are labeled by treatment along with the # re-captured). The plots, ogaMnChange.pdf and ogaQ95Change.pdf, clearly show that 128 (blue) and 702.1 (red) are nearly always at the very top in terms of change. This is really strong. So strong that I have tried to think of ways it could be an artifact of some sort, but this simply doesn't seem possible. This basically could stand on its own.
I next looked at what explains variation in the response for 128 and 702.1 across treatments. First, note that both the means and quantiles for 128 and 702.1 are very highly correlated (r = 0.97 for means and 0.99 for the 95th quantile). So, we can expect nearly identical results for either scaffold. Thus, for now I am focusing on 128. Here are the main results:
Genomic change on 128 is not obviously related to the re-capture size, though the point estimates of the correlations are in the right direction; means: r = -0.42, p = 0.23; 95th q: r = -0.19, p = 0.59. Some affect of recapture of course makes sense (more drift), but it is not too strong (or at least not significant).
Change in stripe frequency is not related to genomic change for scaffold 128. Means: r = -0.17, p = 0.63; 95th q: r = -0.01, p = 0.99.
But, there does appear to be a relationship between genomic change for scaffold 128 and change in the freq of melanistic bugs, at least for the 95th quantile (results for the mean change are suggestive too). Means: r = 0.54, p = 0.10; 95th q: r = 0.61, p = 0.059. Note that these p-values are pretty good given the fact that n = 10. Finally, the 95th quantile for genomic change remains (marginally) significant in a model that includes both number recapatured and genomic change as predictors of change in the frequency of melanistic bugs.
The contrasting signal for stripe and melanistic bugs is interesting. Changes in these two phenotypes are positively correlated (r = 0.53, p = 0.12), but not so strongly that they can't show semi-independent results.
Another thought, this signal is really strong. Why? I think the answer is that we are summarizing things at the right scale. By looking at large scaffolds noise, including individual genetic variants under selection, is mostly average over. But, because of the crazy LD on 702.1 and 128, direct selection on causal variants generates widespread indirect selection which shifts the mean and distribution (quantiles) for these scaffolds in a serious way. In other words, even if you had a single causal variant of similar effect on each scaffold in the genome, you would get an increased signal on 702.1 and 128 because you would have so much more indirect selection. You would totally miss this with tiny contigs or just by thinking about individual SNPs (and possibly even with windows unless the windows were quite big).
Key R code:
## read in allele freq. and id data
p0<-scan("pEM_OGA2010.txt")
p1a<-scan("pEM_MR1_A.txt")
p2a<-scan("pEM_MR2_A.txt")
p3a<-scan("pEM_MR3_A.txt")
p4a<-scan("pEM_MR4_A.txt")
p5a<-scan("pEM_MR5_A.txt")
p1c<-scan("pEM_MR1_C.txt")
p2c<-scan("pEM_MR2_C.txt")
p3c<-scan("pEM_MR3_C.txt")
p4c<-scan("pEM_MR4_C.txt")
p5c<-scan("pEM_MR5_C.txt")
p<-list(p1a,p1c,p2a,p2c,p3a,p3c,p4a,p4c,p5a,p5c)
ids<-read.table("snpids.txt",header=F)
## calculate allele frequency change
dp<-vector("list",10)
scmns<-vector("list",10)
scq95<-scmns
scn<-scmns
for(i in 1:10){
dp[[i]]<-abs(p[[i]]-p0)
scmns[[i]]<-tapply(X=dp[[i]],INDEX=ids[,2],mean)
scq95[[i]]<-tapply(X=dp[[i]],INDEX=ids[,2],quantile,probs=0.95)
scn[[i]]<-tapply(X=is.na(dp[[i]])==FALSE,INDEX=ids[,2],sum)
}
## treatment and change info.
trt<-c("1A","1C","2A","2C","3A","3C","4A","4C","5A","5C")
re<-c(57,62,28,46,25,29,99,56,7,9)
dmel<-c(0.1498245614,0.0570967742,0.0021428571,0.1547826087,-0.06,-0.0568965517,0.1486868687,0.0042857143,0.21,0.1188888889)
dstr<-c(-0.0228205128,-0.4110843373,-0.4445614035,-0.3628205128,-0.2948717949,-0.5235294118,-0.3216666667,-0.4663473054,-0.1127848101,-0.5681818182)
pstr2010<-c(0.7628205128,0.7710843373,0.8245614035,0.7628205128,0.7948717949,0.8235294118,0.7916666667,0.8263473054,0.8227848101,0.8181818182)
elev<-c(870,870,874,876,856,856,828,826,816,823)
lat<-c(30.859,30.859,30.822,30.851,30.862,30.862,30.831,30.828,30.88,30.867)
lon<-c(47.986,47.986,47.961,47.969,48.031,48.031,48.112,48.091,48.103,48.113)
mn128<-rep(NA,10)
q128<-rep(NA,10)
mn702<-rep(NA,10)
q702<-rep(NA,10)
## plots excluding small scaffolds
## drop small scaffolds, too noisy
pdf("ogaQ95Change.pdf",width=4,height=10)
par(mfrow=c(5,2))
par(mar=c(4,4,2,2))
for(i in 1:10){
a<-which(scn[[i]] > 500)
#x<-cbind(scn[[i]][a],scmns[[i]][a])
x<-cbind(scn[[i]][a],scq95[[i]][a])
plot(x[,1],x[,2],xlab="no. SNPs",ylab="q95 freq. change",cex.lab=1.1,cex.axis=1)
#plot(x[,1],x[,2],xlab="no. SNPs",ylab="mean freq. change",cex.lab=1.1,cex.axis=1)
title(main=paste(trt[i],", ",re[i]))
## scaf 128
#mn128[i]<-x[3,2]
points(x[3,1],x[3,2],col="blue",pch=19)
## scaf 702.1
#mn702[i]<-x[15,2]
points(x[15,1],x[15,2],col="red",pch=19)
}
dev.off()