Post date: Oct 12, 2014 11:29:7 PM
In some cases Fst estimates were based on few or no individuals. Thus, I wrote a script (getCoverage.pl) to determine the number of individual with at least one read at each locus for each population. The files non0coverage*.txt contain the results (one line per locus, # with data) for each population and are in /labs/evolution/projects/timema_radiation/variants/.
I am using a 3-state HMM to identify low, intermediate, and high differentiation genomic regions between each of the ten pairs of populations. My methods follow those presented in the Science paper. Briefly, this is a 3-state model with means (logit scale) and transition probabilities estimated from the data, but with the standard deviation fixed to the empirical sd. I am only including loci with data for five or more individuals in each population in a pair. I am running two replicates with different starting values for the transition probability matrix. The results will be saved to a HMM R workspace for each population. An example of the code for one pair is below. This code (one R file per population) and the output are all in /labs/evolution/projects/timema_radiation/popgen/fsts/.
library(HiddenMarkov)
## load rwsp
load("fstBCBOG_CxBCBOG_Q.rwsp")
nl<-5074942
## by locus sample size
nByL1<-read.table("../../variants/non0coverageBCBOG_C.txt")
nByL2<-read.table("../../variants/non0coverageBCBOG_Q.txt")
## logit transform fst
lfst<-log(fstBCBOG_CxBCBOG_Q[,3]/(1-fstBCBOG_CxBCBOG_Q[,3]))
lfst[is.na(lfst)==TRUE]<-log(0.0001/0.9999)
lfst[is.infinite(lfst)==TRUE]<-log(0.0001/0.9999)
## set fst to NA is sample sizes are not >= 5 each
nmin<-5
lfst<-lfst[which((nByL1[,1] >= nmin) & (nByL2[,1] >= nmin))]
kept<-which((nByL1[,1] >= nmin) & (nByL2[,1] >= nmin))
## revised function for HMM, allows constrained HMM
Mstep.normc <- function(x, cond, pm, pn){
# this function is a modified version of Mstep.norm
# here the standard deviation is fixed to the values specified in pm$sd
nms <- sort(names(pm))
n <- length(x)
m <- ncol(cond$u)
if (all(nms==c("mean", "sd"))){
mean <- as.numeric(matrix(x, nrow = 1) %*% cond$u)/apply(cond$u,
MARGIN = 2, FUN = sum)
sd <- pm$sd
return(list(mean=mean, sd=sd))
}
if (all(nms == "mean")) {
mean <- as.numeric(matrix(x, nrow = 1) %*% cond$u)/apply(cond$u,
MARGIN = 2, FUN = sum)
return(list(mean = mean))
}
if (all(nms == "sd")) {
sd <- pm$sd
return(list(sd = sd))
}
stop("Invalid specification of parameters")
}
rnormc <- rnorm
dnormc <- dnorm
pnormc <- pnorm
qnormc <- qnorm
## initial values
delta<-c(0.05,0.9,0.05)
PiA<-matrix(c(0.9,0.1,0,0.05,0.9,0.05,0,0.1,0.9),nrow=3,byrow=T)
PiB<-matrix(c(0.7,0.3,0,0.1,0.8,0.1,0,0.05,0.95),nrow=3,byrow=T)
Pi<-list(PiA,PiB)
mn<-mean(lfst,na.rm=TRUE)
s<-sd(lfst,na.rm=TRUE)
params<-vector("list",2)
fit<-vector("list",2)
for(i in 1:2){
init<-dthmm(lfst,Pi[[i]],delta,"normc",list(mean=c(mn*0.9,mn,mn*1.1),sd=rep(s,3)),discrete=FALSE)
params[[i]]<-BaumWelch(init)
fit[[i]]<-Viterbi(params[[i]])
}
save(list=ls(),file="hmmBCBOG_CxBCBOG_Q.rwsp")