Timema radiation, Fst and HMM

Post date: Oct 12, 2014 11:29:7 PM

In some cases Fst estimates were based on few or no individuals. Thus, I wrote a script (getCoverage.pl) to determine the number of individual with at least one read at each locus for each population. The files non0coverage*.txt contain the results (one line per locus, # with data) for each population and are in /labs/evolution/projects/timema_radiation/variants/.

I am using a 3-state HMM to identify low, intermediate, and high differentiation genomic regions between each of the ten pairs of populations. My methods follow those presented in the Science paper. Briefly, this is a 3-state model with means (logit scale) and transition probabilities estimated from the data, but with the standard deviation fixed to the empirical sd. I am only including loci with data for five or more individuals in each population in a pair. I am running two replicates with different starting values for the transition probability matrix. The results will be saved to a HMM R workspace for each population. An example of the code for one pair is below. This code (one R file per population) and the output are all in /labs/evolution/projects/timema_radiation/popgen/fsts/.

library(HiddenMarkov)

## load rwsp

load("fstBCBOG_CxBCBOG_Q.rwsp")

nl<-5074942

## by locus sample size

nByL1<-read.table("../../variants/non0coverageBCBOG_C.txt")

nByL2<-read.table("../../variants/non0coverageBCBOG_Q.txt")

## logit transform fst

lfst<-log(fstBCBOG_CxBCBOG_Q[,3]/(1-fstBCBOG_CxBCBOG_Q[,3]))

lfst[is.na(lfst)==TRUE]<-log(0.0001/0.9999)

lfst[is.infinite(lfst)==TRUE]<-log(0.0001/0.9999)

## set fst to NA is sample sizes are not >= 5 each

nmin<-5

lfst<-lfst[which((nByL1[,1] >= nmin) & (nByL2[,1] >= nmin))]

kept<-which((nByL1[,1] >= nmin) & (nByL2[,1] >= nmin))

## revised function for HMM, allows constrained HMM

Mstep.normc <- function(x, cond, pm, pn){

# this function is a modified version of Mstep.norm

# here the standard deviation is fixed to the values specified in pm$sd

nms <- sort(names(pm))

n <- length(x)

m <- ncol(cond$u)

if (all(nms==c("mean", "sd"))){

mean <- as.numeric(matrix(x, nrow = 1) %*% cond$u)/apply(cond$u,

MARGIN = 2, FUN = sum)

sd <- pm$sd

return(list(mean=mean, sd=sd))

}

if (all(nms == "mean")) {

mean <- as.numeric(matrix(x, nrow = 1) %*% cond$u)/apply(cond$u,

MARGIN = 2, FUN = sum)

return(list(mean = mean))

}

if (all(nms == "sd")) {

sd <- pm$sd

return(list(sd = sd))

}

stop("Invalid specification of parameters")

}

rnormc <- rnorm

dnormc <- dnorm

pnormc <- pnorm

qnormc <- qnorm

## initial values

delta<-c(0.05,0.9,0.05)

PiA<-matrix(c(0.9,0.1,0,0.05,0.9,0.05,0,0.1,0.9),nrow=3,byrow=T)

PiB<-matrix(c(0.7,0.3,0,0.1,0.8,0.1,0,0.05,0.95),nrow=3,byrow=T)

Pi<-list(PiA,PiB)

mn<-mean(lfst,na.rm=TRUE)

s<-sd(lfst,na.rm=TRUE)

params<-vector("list",2)

fit<-vector("list",2)

for(i in 1:2){

init<-dthmm(lfst,Pi[[i]],delta,"normc",list(mean=c(mn*0.9,mn,mn*1.1),sd=rep(s,3)),discrete=FALSE)

params[[i]]<-BaumWelch(init)

fit[[i]]<-Viterbi(params[[i]])

}

save(list=ls(),file="hmmBCBOG_CxBCBOG_Q.rwsp")

Page updated

Google Sites

Report abuse