Post date: Feb 21, 2015 9:39:27 PM
I am using bcftools to estimate pi and theta for each seed beetle population. Estimates use EM and account for uncertainty in which sites are variable (this is the same thing I did for the Lycaeides admixed populations).
First I need to generate a bcf file for each population line at each time point,
samtools mpileup -C 50 -d 50 -f cmacbbg9jan14gbs.fasta -q 10 -Q 15 -D -g -I alnL1R-F35*bam > varL1R-F35.bcf
Then I generate an initial estimate of the SFS,
bcftools view -cGP cond2 varL1R-F35.bcf > /dev/null 2> L1R-F35sites.1.afs
And then I iterate the EM algorithm (20 times) to infer the SFS,
bcftools view -cGP L1R-F35sites.N.afs L1R-F35sites.bcf > /dev/null 2> L1R-F35sites.N+1.afs
I am using a perl script (sfsEM.pl) to iterate this process over all of the lines (via wrap_qub*). The output files include the SFS (unfolded, and split across sets of bases that will need to be combined), heterozygosity (pi or Theta_pi) and Theta_w.