Post date: Oct 11, 2016 11:7:46 PM
-- re-ran on Oct. 17th after fixing MAF filter to match OGA. This changed the number of SNPs slightly. The numbers below are correct.
1. Extract genetic data (genotype likelihoods) for SNPs with maf > 1% and split the vcf file by year (2011 vs. 2013):
perl ../scripts/vcf2gl.pl 0.01 morefilter_filtered2x_tcrFhaVars.vcf
perl ../scripts/splitPops.pl fha_maf1pct.gl
This results in 175,918 SNPs with N = 500 for 2011 (fha2011.gl) and N = 602 for 2013 (fha2013.gl).
2. Infer population allele frequencies. Used estpEM, which implements a EM algorithm. Here are the commands:
estpEM -i fha2011.gl -o p_fha2011.txt -h 2
estpEM -i fha2013.gl -o p_fha2013.txt -h 2
Additional files were made with just the EM estimates (single column):
cut -f 3 -d " " p_fha2011.txt > pEM_fha2011.txt
cut -f 3 -d " " p_fha2013.txt > pEM_fha2013.txt
3. I made a few quick plots in R to see whether their was any signal with LG8. Two things were evident (code and plots below): (1) allele frequency change on scaffold 128 of LG8 was the highest (mean and 95th q.) for any scaffold with at least 500 SNPs, and (2) one of my scripts kicked out scaffold 702 because of the decimal point... I need to fix this and run things again.
Key R code:
p11<-scan("pEM_fha2011.txt")
p13<-scan("pEM_fha2013.txt")
ids<-read.table("snpids.txt",header=F)
dp<-abs(p13-p11)
scmns<-tapply(X=dp,INDEX=ids[,2],mean)
scn<-tapply(X=is.na(dp)==FALSE,INDEX=ids[,2],sum)
## drop small scaffolds, too noisy
a<-which(scn > 500)
x<-cbind(scn[a],scmns[a])
## plot of mean change
pdf("fhaScafChange.pdf",width=5,height=5)
plot(x[,1],x[,2],xlab="no. SNPs",ylab="mean freq. change",cex.lab=1.4,cex.axis=1.2)
## scaf 128
points(x[3,1],x[3,2],col="blue",pch=19)
## scaf 702.1
points(x[13,1],x[13,2],col="red",pch=19)
dev.off()
Here are plots of the number of SNPs on a scaffold (proxy for size) versus mean or 95th quantile of allele frequency change. Scaffold 128 is shown in blue. I need to repeat this with 702 included (I fixed this and the numbers above are now correct--need to update plots still... 702 stands out almost as much as 128).