Post date: Jun 23, 2015 9:59:51 PM
My plan is to use dadi (a program based on a diffusion approximation to the WF model with parameters estimated using composite likelihood) to infer divergence times and migration rates among T. cristinae populations from the genome data (for subsets of genetic regions, see below). This program is fast, but has two drawbacks. First, the diffusion approximation assumes Nm < 1/N (i.e. fewer than 1 migrant per generations), which is likely not valid for T. cristinae. Second, bootstrapping is required to estimate parameter confidence intervals and for hypothesis testing. But the latter is possible and the former might not be as bad as other possible assumptions/limitations of other approaches, so I will give it a shot.
Here is an overview of my analysis, starting with an initial test run before going into the real analysis (the files described are in /labs/evolution/projects/timema_speciation/dadi).
Generate infiles for dadi. The script mkAlleleCnts.R calculates the sample allele frequencies from genotype estimates (based solely on the genotype likelihoods) and then multiplies the frequencies by 2N and rounds to get allele counts. The *A files are the counts of 1 allele and the *B files are the counts of the other allele. A second script, mkDataIn.pl converts the acnt* files to dadi format, snp*txt. Here is an example: perl mkDadiIn.pl acnt_HVA_A.txt acnt_HVC_A.txt.
Random subset for test run. The script subScaf.R grabs a random subset (50 at present) of scaffolds. This list can be used to subset the dadi file: perl subDadi.pl ranSubScaf.txt snp_LAxPRC.txt. I used these to generate sub_snp* dadi snp files for HVAxHVC and LAxPRC with 271,847 SNPs as a test.
NOTE, I had trouble with convergence with dadi, so I abandoned this in favor of ABC.