Post date: Nov 13, 2013 5:33:41 PM
We want to know whether the top parallel differentiation SNPs exhibited large, parallel (by host plant) changes in the between-generation experiment. To answer this question, I first calculated the average difference in allele frequency change between Adenostoma and Ceanothus experimental populations. The results and R code are in projects/timema_reproduction_experiment/bgsr_results/.
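As a rough illustration of the parallel change measure, here is a minimal Python sketch. It is not the R code in bgsr_results/. It assumes the measure is, for each SNP, the mean allele frequency change across Adenostoma populations minus the mean change across Ceanothus populations. The function name, the input layout (one list of per-SNP changes per population), and the numbers are all hypothetical.

```python
from statistics import mean

def parallel_change(delta_p_adenostoma, delta_p_ceanothus):
    """Per-SNP mean allele frequency change on Adenostoma minus that on Ceanothus.

    Each argument is a list of populations; each population is a list of
    per-SNP allele frequency changes (after minus before), in the same SNP order.
    This layout is an assumption made for illustration.
    """
    n_snps = len(delta_p_adenostoma[0])
    scores = []
    for i in range(n_snps):
        mean_aden = mean(pop[i] for pop in delta_p_adenostoma)
        mean_cean = mean(pop[i] for pop in delta_p_ceanothus)
        scores.append(mean_aden - mean_cean)
    return scores

# Toy values only (two populations per host, three SNPs); not real data.
aden = [[0.10, -0.02, 0.00], [0.06, 0.01, -0.03]]
cean = [[-0.04, 0.00, 0.02], [-0.08, 0.03, 0.01]]
scores = parallel_change(aden, cean)
```

Under this definition, a large absolute score means the SNP changed in opposite directions (or by very different amounts) on the two hosts, consistently across populations.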
Next I created a directory (projects/timema_wgwild/experiment) with locus ID files for the wgrs and bgsr data, as well as the relevant parameter files (the parallel differentiation and parallel change measures). I then wrote a Perl script (combineParDivExp.pl) to identify SNPs that were in both data sets. Only 3654 SNPs were in both data sets. This severely limits our ability to ask whether the top parallel differentiation SNPs also show parallel allele frequency divergence in the experiment (at best we should expect one or two of these 3654 SNPs to be top parallel differentiation SNPs). Several factors might contribute to this lack of overlap. First, we are only considering SNPs on linkage groups, which should exclude a bit more than half of the GBS SNPs (this means we might expect about 30 thousand shared SNPs, still 10X more than we have). Second, the GBS SNPs were ascertained in a large sample of individuals from a single population, whereas the whole-genome resequencing SNPs were ascertained from smaller samples from a greater number of populations. This could be quite important, particularly for rare, spatially restricted variants.
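The overlap step can be sketched as a simple join on locus ID. This is a Python illustration of the idea, not the contents of combineParDivExp.pl. The function name, the ID strings, and the values are hypothetical, and it assumes each data set provides one locus ID per SNP with a parameter value in the same order.

```python
def combine_shared_snps(wgrs_ids, wgrs_values, bgsr_ids, bgsr_values):
    """Return (locus_id, wgrs_value, bgsr_value) for SNPs present in both data sets.

    wgrs_values would hold the parallel differentiation measure and
    bgsr_values the parallel change measure; output follows wgrs order.
    """
    bgsr_lookup = dict(zip(bgsr_ids, bgsr_values))
    shared = []
    for snp_id, diff in zip(wgrs_ids, wgrs_values):
        if snp_id in bgsr_lookup:
            shared.append((snp_id, diff, bgsr_lookup[snp_id]))
    return shared

# Hypothetical locus IDs and values, for illustration only.
wgrs_ids = ["scafA:101", "scafA:250", "scafB:17", "scafC:900"]
wgrs_values = [0.31, 0.02, 0.11, 0.45]
bgsr_ids = ["scafB:17", "scafD:5", "scafA:101"]
bgsr_values = [-0.03, 0.08, 0.14]

shared = combine_shared_snps(wgrs_ids, wgrs_values, bgsr_ids, bgsr_values)
fraction_shared = len(shared) / len(wgrs_ids)
```

Reporting the shared fraction alongside the count makes it easier to judge how many top parallel differentiation SNPs could plausibly fall in the overlap.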
Regardless, we might need to drop this analysis or figure out some other way to ask the same question that does not rely on completely overlapping SNPs.