Post date: Jan 22, 2014 4:3:30 PM
I calculated the size of hi-Fst HMM regions in base pairs. The distribution of hi-Fst region sizes in kbp for each population pair is shown in hiHmmSize.pdf in projects/timema_wgwild/hmm/. The distributions are quite similar (it seems that population pairs differ much more in the number of hi-Fst regions than in their length, though regions are larger on average in HV and R12 than other populations pairs). These results are somewhat different from what we get when using number of SNPs as the measure of length. A key difference between these two approaches are how long runs of N's in scaffolds are treated: these add no SNPs but do add to the length of a region (and there is no opportunity for a region to end in the run of N's, it has to wait for the next SNP). Below is a textual summary of these distributions.
HVA x HVC
Min. 1st Qu. Median Mean 3rd Qu. Max.
77 1020 4729 10460 13910 303700
MR1A x MR1C
Min. 1st Qu. Median Mean 3rd Qu. Max.
78 794 2278 6276 7662 158700
R12A x R12C
Min. 1st Qu. Median Mean 3rd Qu. Max.
50.0 948.5 4830.0 10580.0 13890.0 204600.0
LA x PRC
Min. 1st Qu. Median Mean 3rd Qu. Max.
32 644 2411 7255 8691 184600
## size of hmm regions in base pairs
hibpsize<-vector('list',4)
for(pp in 1:4){
ub<-which(fitHMsA[[pp]][1:(nl-1)]==1 & fitHMsA[[pp]][2:nl]!=1)
lb<-which(fitHMsA[[pp]][2:nl]==1 & fitHMsA[[pp]][1:(nl-1)]!=1)
nr<-length(lb)
bpsize<-numeric(nr)
for (i in 1:nr){
if(locinfo[lb[i],3] == locinfo[ub[i],3]){
bpsize[i]<-locinfo[ub[i],4] - locinfo[lb[i],4]
}
else {
if(length(unique(locinfo[lb[i]:ub[i],3]))==2){
a<-max(locinfo[locinfo[,3]==locinfo[lb[i],3],4])-locinfo[lb[i],4]
b<-locinfo[ub[i],4]-min(locinfo[locinfo[,3]==locinfo[ub[i],3],4])
bpsize[i]<-a+b
}
else{
bpsize[i]<-NA
}
}
}
hibpsize[[pp]]<-bpsize
}
pdf("hiHmmSize.pdf",width=8,height=8)
par(mfrow=c(2,2))
par(mar=c(5.5,6.5,1,1))
for(pp in 1:4){
hist(hibpsize[[pp]],breaks=seq(0,310000,10000),xlab="hmm region size (kbp)",ylab="no. of regions",cex.lab=1.7,cex.axis=1.3,xlim=c(0,310000),col="gray",main=poppair[pp],axes=FALSE)
axis(1,at=seq(0,300000,50000),labels=as.character(seq(0,300,50)),cex.axis=1.1)
axis(2,cex.axis=1.0,las=2)
}
dev.off()