Post date: Nov 12, 2013 9:42:14 PM
I used a pair of Perl scripts to grab the subset of individuals relevant to each of four experiments, (i) GLA on Ms, (ii) GLA on Ac, (iii) SLA on Ms, and (iv) SLA on Ac, and to generate genotype likelihood files. Each subset includes all experimental individuals that were sequenced from a population and treatment, as well as wild-caught individuals from that population. The scripts are makeIdList.pl and vcf2glSelExp.pl, both in projects/lycaeides_hostplant/experiment. Together they generate the infiles for the popmod program I wrote, which estimates genotypes and allele frequencies from genotype likelihoods. Importantly, the program allows a subset of individuals to be used to infer the allele frequencies (this is specified with a vector of 0's and 1's in the genotype likelihood file). Thus, I can use the allele frequencies estimated only from the wild-caught individuals as the prior on genotypes for the experimental individuals.
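To make the idea of "wild-caught allele frequencies as the prior for experimental genotypes" concrete, here is a minimal single-locus sketch in Python. It is not the popmod code: popmod uses MCMC, whereas this sketch uses a simple EM point estimate, assumes a Hardy-Weinberg genotype prior, and assumes the genotype likelihoods are on the natural (not phred) scale. The function names and the toy numbers are hypothetical.

```python
def hwe_prior(p):
    """Hardy-Weinberg prior for 0, 1, 2 copies of an allele with frequency p (assumed model)."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def genotype_posterior(lik, p):
    """Posterior genotype probabilities for one individual: likelihood x prior, normalized."""
    weights = [l * pr for l, pr in zip(lik, hwe_prior(p))]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_freq(liks, use, n_iter=50, p=0.5):
    """EM estimate of the allele frequency from individuals flagged with use[i] == 1 only."""
    subset = [lik for lik, u in zip(liks, use) if u == 1]
    for _ in range(n_iter):
        # E-step: expected allele copies per individual; M-step: divide by 2N.
        copies = 0.0
        for lik in subset:
            post = genotype_posterior(lik, p)
            copies += post[1] + 2.0 * post[2]
        p = copies / (2.0 * len(subset))
        # Keep p off the boundary so the prior never becomes degenerate.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
    return p


# Toy data (hypothetical): two wild-caught individuals, then one experimental individual.
liks = [
    [0.98, 0.01, 0.01],  # wild-caught
    [0.01, 0.01, 0.98],  # wild-caught
    [0.01, 0.01, 0.98],  # experimental
]
use = [1, 1, 0]  # 1 = contributes to the allele frequency estimate, 0 = does not

p_wild = estimate_freq(liks, use)
post_exp = genotype_posterior(liks[2], p_wild)
```

The experimental individual's likelihoods never enter `estimate_freq`, yet its genotype posterior is still computed with the wild-derived frequency as the prior, which is the role the 0/1 vector plays in the text above.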
I am using the popmod program to estimate genotypes for the experimental individuals. The four input files contain 206,047 loci. The number of experimental and wild-caught individuals per analysis or file is,
The input genotype likelihood files are data_*laTra*.gl in projects/lycaeides_hostplant/variants/. I ran two independent MCMC analyses on each of the four data files to estimate genotypes. Each chain is 4600 steps with a 100-step burn-in and a thinning interval of 3 (1500 saved samples). I used the Jeffreys beta prior (a=0.5, b=0.5) for the allele frequencies. The job numbers are 43881-43888 (8 jobs). These are all running, with results written to scratch/expaf/.
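As a quick check on the chain settings above, the number of saved samples follows from the chain length, burn-in, and thinning interval. The short Python sketch below reproduces that arithmetic; the helper name is hypothetical, and it assumes thinning starts at the first post-burn-in step.

```python
def saved_samples(n_steps, burn_in, thin):
    """Count the steps kept when every `thin`-th step after burn-in is saved."""
    # Indices of the retained steps, assuming thinning starts right after burn-in.
    kept = range(burn_in, n_steps, thin)
    return len(kept)


n_saved = saved_samples(4600, 100, 3)
print(n_saved)  # 1500
```

With two chains per data file, this gives 3000 saved samples per analysis if the chains are combined.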