Post date: Nov 11, 2013 11:39:41 PM
I wrote a C++ program to estimate genotypes and population allele frequencies from NGS data using pre-calculated genotype likelihoods. The program is called popmod and is in cvs under this name. This program samples the posterior probability distribution from the following Bayesian model (this is the unnormalized posterior),
Pr(g, p | x, e) = Pr(x | g, e) Pr(g | p) Pr(p)
Here the genotype likelihoods given by Pr(x | g, e) are pre-calculated using samtools and bcftools. The genotype prior is the HW model, and Pr(p) is given by a beta distribution with fixed hyperprior (the default is Jeffery's prior, a = b = 0.5). Thus, all steps in the MCMC algorithm are Gibb's samples from the full conditional posterior distributions. I ran the model on a small example data set and it produced reasonable output.
The output is written to a HDF5 file. I still need to write a version of estpost to parse this file.