Post date: Oct 19, 2015 4:28:33 PM
I have been going in circles trying to figure out exactly what happened with the v2 to v3 comparisons for the whole-genome data. I am pretty sure something was off (although I can't quite reconcile this with some of our strong results, which imply that things were indeed lined up), but I can't work out the details without intermediate files that do not exist. With the (I think corrected) T. cristinae ids (but still using the windows defined by the 10 pairs), I am not able to recreate the results we had. But this might be because the data from the experiments were off in the same way (or because everything was correct and got mislabeled anyway). Given all of this, I am going to archive the old genome results for T. cristinae and the experiment and re-run the analyses using windows from T. cristinae. If we get the same answer as before, we can be very confident in it. If not, it was either false or highly dependent on exactly where the window boundaries were; either way, this would mean that we shouldn't trust it much. Here is my plan (at least for the initial things to redo):
1. Go back and get all SNPs from scaffolds that are on LGs in v3. Then define windows based on the four ecotype pairs and estimate Fst and pi for the four pairs, the founding experiment populations, and the survivors (using the same window definition as for T. cristinae).
2. Test for overlap between Fst in the experiment and nature; define DSRs. Test for reduced pi in the DSRs.
3. Fit the HMM for Fst in T. cristinae.
4. Re-run ABC analyses for T. cristinae.
5. Re-test for enhanced Fst for stripe (we will now have both top SNPs) and trait-associated SNPs.
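As a rough illustration of steps 1 and 2, here is a minimal Python sketch of the part that went wrong before: keying every data set to one shared window definition so that windows line up, then testing for overlap of high-Fst windows. Everything in it is an assumption for illustration, not our actual pipeline: the fixed-size, non-overlapping bp windows, the Hudson ratio-of-averages Fst estimator, the input tuple layout, and all function names. The permutation test also ignores autocorrelation of Fst along the genome, so its p-value would be anti-conservative for real data.

```python
import random
from collections import defaultdict


def window_index(pos, window_size):
    """Map a 1-based bp position to a fixed-size, non-overlapping window index."""
    return (pos - 1) // window_size


def hudson_terms(p1, p2, n1, n2):
    """Per-SNP numerator and denominator of Hudson's Fst estimator.

    p1, p2: allele frequencies in the two populations.
    n1, n2: number of sampled gene copies in each population (must be > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def window_fst(snps, window_size):
    """Window Fst as a ratio of sums over SNPs.

    snps: iterable of (lg, pos, p1, p2, n1, n2) tuples (hypothetical layout).
    Returns {(lg, window_index): Fst}, so any two data sets processed with the
    same window_size share keys and can be compared window by window.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for lg, pos, p1, p2, n1, n2 in snps:
        num, den = hudson_terms(p1, p2, n1, n2)
        key = (lg, window_index(pos, window_size))
        sums[key][0] += num
        sums[key][1] += den
    # Windows with no variable SNPs (denominator 0) are dropped.
    return {key: num / den for key, (num, den) in sums.items() if den > 0}


def top_windows(fst, quantile):
    """Return the set of window keys in the upper (1 - quantile) tail of Fst."""
    ranked = sorted(fst, key=fst.get, reverse=True)
    k = max(1, round(len(ranked) * (1 - quantile)))
    return set(ranked[:k])


def overlap_test(fst_a, fst_b, quantile=0.9, n_perm=1000, seed=1):
    """Permutation test for excess overlap of top-Fst windows in two data sets.

    Only windows present in both data sets are used. Fst values of the second
    data set are shuffled among windows to build the null distribution.
    Returns (observed_overlap, one_sided_p_value).
    """
    shared = sorted(set(fst_a) & set(fst_b))
    a = {key: fst_a[key] for key in shared}
    b = {key: fst_b[key] for key in shared}
    top_a = top_windows(a, quantile)
    observed = len(top_a & top_windows(b, quantile))

    rng = random.Random(seed)
    values = [b[key] for key in shared]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = dict(zip(shared, values))
        if len(top_a & top_windows(permuted, quantile)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

In this sketch, calling `window_fst` with the same `window_size` on the natural ecotype pairs and on the experiment (founders vs. survivors) gives dictionaries with matching `(lg, window)` keys, which can be passed straight to `overlap_test`. Because the keys carry the linkage group and window explicitly, a mismatch in window definitions shows up as missing keys rather than as silently misaligned rows.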
For now, I don't think we need to re-run any of the 10 ecotype pair stuff. We don't actually compare windows between them and T. cristinae, and thus it's fine that they differ. I expect this to take about a week, but I will pass things along as they finish. Obviously this is annoying, but in the end we will have cleaner results because we will make better use of the T. cristinae genome data.