Post date: Sep 30, 2013 9:59:23 PM
Even with the most stringent conditions (see previous post), I found 12,287,179 single nucleotide variants (SNVs) in the whole-genome resequencing data from the 8 wild T. cristinae populations (160 individuals). I used a Perl script to create a genotype likelihood file from the VCF file (wildwgVarsA.vcf), keeping only variants with MAF >= 1%. This reduced the data set to 4,391,556 SNVs. I then estimated the genotype for each individual and SNV as the expected genotype, ghat = Sum_a g_a * Pr(g_a), where the sum runs over the possible genotypes g_a (the script I used is gl2genest.pl). The genotype estimates are in pntest_wildwgVarsA.txt. I will estimate allele frequencies in R based on the genotype estimates.
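For reference, here is a minimal Python sketch of the genotype estimate above and of the allele frequency step I plan to do next. This is not gl2genest.pl itself. It assumes the genotype likelihoods are phred-scaled (as in a VCF PL field), that genotypes are coded 0, 1, 2 (copies of the non-reference allele), and that Pr(g_a) comes from normalizing the likelihoods under a uniform prior. The function names are hypothetical.

```python
def genotype_estimate(phred_likelihoods):
    """Expected genotype, ghat = sum_a g_a * Pr(g_a), for one individual at one SNV.

    phred_likelihoods: three phred-scaled likelihoods for genotypes 0, 1, 2
    (assumed input format; a uniform prior over genotypes is also assumed).
    """
    # Convert phred-scaled values to relative likelihoods: L = 10^(-PL/10)
    rel = [10 ** (-pl / 10.0) for pl in phred_likelihoods]
    total = sum(rel)
    # Normalize so the three genotype probabilities sum to 1
    probs = [r / total for r in rel]
    # Weighted sum over genotype values 0, 1, 2
    return sum(g * p for g, p in enumerate(probs))


def allele_frequency(genotype_estimates):
    """Allele frequency at one SNV: mean genotype estimate divided by 2 (diploid)."""
    return sum(genotype_estimates) / (2.0 * len(genotype_estimates))


if __name__ == "__main__":
    # Three made-up individuals at a single SNV (illustrative values only)
    pls = [(0, 30, 60), (30, 0, 30), (60, 30, 0)]
    ghats = [genotype_estimate(pl) for pl in pls]
    print(ghats, allele_frequency(ghats))
```

If all three likelihoods are equal, the estimate is 1.0, i.e. no information beyond the uniform prior. Unlike hard genotype calls, the estimate is not restricted to 0, 1, or 2, so uncertainty from low coverage carries through to the allele frequencies.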