Post date: Mar 18, 2014 10:00:09 PM
We wanted to know whether SNPs with high Fst for individual population pairs show functional enrichments similar to each other and to those of the parallel divergence SNPs, so I generated files with these loci for Victor. We wanted roughly the same number of SNPs as we had for the parallel divergence SNP enrichment analysis, and I went about this two ways. First, I grabbed the top 0.01% highest Fst SNPs (i.e. those above the 99.99th percentile) for each population pair. The set for each pair was unique (i.e. no SNP showed up in two or more population pairs). This yields 440 SNPs per population pair. These are in separate files where each row is a SNP and the columns give LG, order of scaffold on the LG, scaffold number, and nucleotide position (Victor, these are the same ids we have been using). These are the files noParHiFst[1-4].txt (HV, MR, R12, La x Prc) in projects/timema_wgwild/hmm/. Second, there is a fifth file (noParHiFstAll.txt) that has the top 0.0025% highest Fst SNPs (99.9975th percentile) for each population pair, all combined in a single file. This file has 440 SNPs in total. So, we can either match the number of SNPs per population pair (files 1-4) or across all population pairs (file 5).
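For reference, here is a minimal sketch of the selection logic in Python. It is not the script that produced the files above. The Fst values are simulated, the SNP count (200,000) is made up to keep the example fast, and the function name `top_fraction_indices` is hypothetical. The point is only the arithmetic: 0.0025% is a quarter of 0.01%, so with four population pairs the combined set comes out the same size as a single per-pair set (4 x 110 = 440 in the real data), as long as no SNP is shared between pairs.

```python
import random

def top_fraction_indices(fst, fraction):
    """Return the indices of the SNPs in the top `fraction` of Fst values."""
    n_keep = round(len(fst) * fraction)
    # Rank SNP indices by Fst, highest first, and keep the first n_keep.
    ranked = sorted(range(len(fst)), key=lambda i: fst[i], reverse=True)
    return set(ranked[:n_keep])

# Simulated stand-in data: one list of Fst values per population pair.
random.seed(1)
n_snps = 200_000  # assumed for illustration, not the real SNP count
pairs = ["HV", "MR", "R12", "LaxPrc"]
fst_by_pair = {p: [random.random() for _ in range(n_snps)] for p in pairs}

# Option 1: top 0.01% per pair, kept as separate sets (like files 1-4).
per_pair = {p: top_fraction_indices(f, 0.0001) for p, f in fst_by_pair.items()}

# Option 2: top 0.0025% per pair, pooled into one set (like file 5).
combined = set()
for f in fst_by_pair.values():
    combined |= top_fraction_indices(f, 0.000025)

# SNPs that appear in more than one per-pair set (none did in the real data).
shared = {i for i in set().union(*per_pair.values())
          if sum(i in s for s in per_pair.values()) > 1}

for p in pairs:
    print(p, len(per_pair[p]))
print("combined", len(combined), "shared across pairs", len(shared))
```

If a SNP were in the top 0.0025% for more than one pair, the pooled set would come out slightly smaller than four times the per-pair count, so it is worth checking the overlap before assuming the totals match.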