Post date: Oct 20, 2015 2:42:28 AM
All of the new results will be in /labs/evolution/projects/timema_speciation/tcristinae/.
1. Split LG and non-LG (based on genome v0.2) T. cristinae genotype likelihood files. This generates on gl file per population with LG and non-LG SNPs
perl ../scripts/splitPops.pl wildwgVarsA.gl noLg_wildwgVarsA.gl
2. Add headers with number of SNPs (9760626) and number of individuals to gl files
perl ../scripts/reHeaderGl.pl
3. Use the EM algorithm to infer allele frequencies
perl ../scripts/wrap_qsub_rc_estpEM.pl tcrist*gl
Here is one of the eight jobs
cd /labs/evolution/projects/timema_speciation/tcristinae/data/
~/bin/estpEM -i tcristHVA.gl -o p_tcristHVA.txt -e 0.0001 -m 20 -h 0
4. Pre-pend LG, order, scaffold and position based on v0.3 (latter two are the same of course) to the allele frequency estimates. These come directly from the genome, so they are correct.
perl ../scripts/addV3Lg.pl p_tcrist*
This generates the lgver3_p* files.
5. Calculate Fst for 20kb windows, where each window starts with the first SNP on a scaffold and only complete windows are included. A script called calcNeiTcr.pl creates the R scripts for this and runs them. This runs on the files in filesTcr.txt and generates nei20k_lgver3_p* files with LG (v0.3), order (v0.3), scaffold, position (mid point), number of SNPs, pi, Dxy, Da, and Fst for each window. I generated the file tcrist20kbRegionsGenomeV03.txt, which I sent to Doro, from these.
perl ../scripts/calcNeiTcr.pl
Here is the R script for one population:
nl<-9760626
nc<-4
llist<-matrix(scan("locuslist.txt",n=nl*4,sep=" "),nrow=nl,ncol=4,byrow=T)
scafs<-unique(llist[,3])
p1<-matrix(scan("lgver3_p_tcristLA.txt",n=nl*nc,what="character",sep=" "),nrow=nl,ncol=nc,byrow=T)
p2<-matrix(scan("lgver3_p_tcristPRC.txt",n=nl*nc,what="character",sep=" "),nrow=nl,ncol=nc,byrow=T)
pavg<-(as.numeric(p1[,4]) + as.numeric(p2[,4]))/2
pi1<-2 * as.numeric(p1[,4]) * (1 - as.numeric(p1[,4]))
pi2<-2 * as.numeric(p2[,4]) * (1 - as.numeric(p2[,4]))
piavg<-2 * pavg * (1 - pavg)
dxy<-as.numeric(p1[,4]) * (1 - as.numeric(p2[,4])) + (1 - as.numeric(p1[,4])) * as.numeric(p2[,4])
for(i in 1:length(scafs)){## loop through scaffolds
x<-which(llist[,3]==scafs[i]);pi1s<-pi1[x];pi2s<-pi2[x];dxys<-dxy[x];piavgs<-piavg[x]
lls<-llist[x,4]
window<-20000
n<-length(lls);bnds<-seq(lls[1],lls[n],window)
for(j in 1:(length(bnds)-1)){
a<-which(lls >= bnds[j] & lls < bnds[j+1])
if(length(a) >= 1){
pi<-sum((pi1s[a]+pi2s[a])/2) * 1/window
D<-sum(dxys[a]) * 1/window
pit<-sum(piavgs[a]) * 1/window
fst<-(pit - pi)/pit
Da<-D-pi
lg<-unique(llist[llist[,3]==scafs[i],1])
ord<-unique(llist[llist[,3]==scafs[i],2])
nei<-cbind(lg,ord,scafs[i],((bnds[j]+bnds[j+1])/2),length(a),pi,D,Da,fst)
write.table(round(nei,8),file="nei20k_lgver3_p_tcristLtxt",row.names=F,col.names=F,quote=F,append=TRUE)
}}}