Post date: Nov 11, 2013 5:52:49 PM
The variant calling run on the wild-caught individuals has finished, but the run with all individuals has not. It is trivial to grab a set of loci from one vcf file and calculate the genotype likelihoods for other individuals at these sites. However, I cannot figure out a way to specify the non-reference allele, so the same alleles might not be identified in each population or set of individuals. Consequently, I think it will be necessary to include all individuals in the variant calling phase (thus I need to wait for the run with all individuals to finish).
This also means that I cannot simply use the ML allele frequency (AF1 from the vcf file) to set the prior for genotypes. Importantly, the ML allele frequency from the vcf files is not equivalent to:
[sum_i sum_k k L(g_i = k)]/2N
Rather, it is obtained by numerically maximizing the full likelihood. I plan to implement a Bayesian equivalent that works directly with pre-calculated genotype likelihoods. It would be nice to add a switch that includes or excludes individuals from the allele frequency calculation (this would be useful for the selection experiment individuals). Also, I might want to add an F-model prior (overall or population-specific) to account for correlated allele frequencies among sites.
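To make the distinction concrete, here is a minimal Python sketch for a single biallelic site. It contrasts the likelihood-weighted average from the formula above with a maximum-likelihood estimate, and then computes a grid posterior as one possible Bayesian equivalent. Several things here are my assumptions rather than anything taken from the vcf tools: genotype priors follow Hardy-Weinberg proportions, the likelihood is maximized with a simple EM loop (not necessarily the routine that produces AF1), the prior on the allele frequency is Beta(a, b), and the function names and the `include` mask are hypothetical. The F-model prior is not attempted.

```python
import numpy as np

GENOTYPES = np.arange(3)  # k = 0, 1, 2 copies of the non-reference allele


def _subset(gl, include):
    """Apply the optional include/exclude switch (boolean mask over individuals)."""
    gl = np.asarray(gl, dtype=float)
    if include is None:
        return gl
    return gl[np.asarray(include, dtype=bool)]


def naive_af(gl, include=None):
    """Likelihood-weighted average: [sum_i sum_k k L(g_i = k)] / 2N."""
    gl = _subset(gl, include)
    return (gl * GENOTYPES).sum() / (2 * gl.shape[0])


def em_af(gl, include=None, p=0.5, tol=1e-8, max_iter=1000):
    """ML allele frequency via EM, assuming Hardy-Weinberg genotype priors."""
    gl = _subset(gl, include)
    for _ in range(max_iter):
        prior = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        post = gl * prior                       # E-step: genotype posteriors
        post /= post.sum(axis=1, keepdims=True)
        p_new = (post * GENOTYPES).sum() / (2 * gl.shape[0])  # M-step
        if abs(p_new - p) < tol:
            break
        p = p_new
    return p_new


def grid_posterior(gl, include=None, a=1.0, b=1.0, n_grid=999):
    """Posterior of p on an interior grid, with an assumed Beta(a, b) prior."""
    gl = _subset(gl, include)
    p = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]          # drop 0 and 1
    hwe = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])  # 3 x grid
    loglik = np.log(gl @ hwe).sum(axis=0)                # sum over individuals
    logpost = loglik + (a - 1) * np.log(p) + (b - 1) * np.log1p(-p)
    w = np.exp(logpost - logpost.max())                  # stabilize before exp
    return p, w / w.sum()


if __name__ == "__main__":
    # Made-up likelihoods for three individuals (rows) and genotypes 0/1/2.
    gl = np.array([[0.9, 0.1, 0.0],
                   [0.1, 0.8, 0.1],
                   [0.0, 0.2, 0.8]])
    print("naive:", naive_af(gl), "rescaled:", naive_af(0.01 * gl))
    print("EM:   ", em_af(gl), "rescaled:", em_af(0.01 * gl))
    grid, weights = grid_posterior(gl, include=[True, True, False])
    print("posterior mean (first two individuals):", (grid * weights).sum())
```

The weighted average changes when the likelihoods are rescaled, whereas the EM estimate does not, which is one way to see that the two quantities differ. The `include` mask is the simplest form of the switch: excluded individuals contribute nothing to the allele frequency estimate, but their genotype likelihoods could still be combined with the resulting prior afterwards.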