Post date: Jun 01, 2015 10:24:58 PM
I obtained better (lower rmsd) estimates of population ancestry frequencies for the strong and weak selection simulations when using ±10 rather than ±4 neighboring SNPs for LDA (see the results and plots labeled 'n10'). I am therefore now running popanc on the demographic and selection data sets with n = 10 and n = 20 SNPs to better evaluate the choice of the number of SNPs.
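For reference, here is a minimal sketch of the rmsd comparison, assuming rmsd is taken across loci between the estimated and the simulated (true) ancestry frequencies. The function name `rmsd` and the toy values are illustrative only; they are not taken from the popanc output or from the 'n10' results.

```python
import math


def rmsd(estimated, true):
    """Root-mean-square deviation between estimated and true ancestry
    frequencies (hypothetical helper; one value per locus in each list)."""
    if len(estimated) != len(true) or not true:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimated, true)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example (made-up numbers): each estimate is off by 1.0
print(rmsd([0.0, 1.0], [1.0, 0.0]))  # → 1.0
```

Comparing the ±4 and ±10 runs then amounts to computing this value for each set of estimates against the same true frequencies; the lower value is the better fit.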
So far I have compared popanc with structure only for the demographic data sets, using ±4 SNPs for LDA. For structure, I inferred population ancestry frequencies as the mean of Pr(00) + 0.5 * (Pr(01) + Pr(10)) from the site-by-site output, using the sumStruct.pl script in the structure sub-directory. Even with ±4 SNPs, popanc does slightly better than structure, although mainly at 200 generations (i.e., where ancestry frequencies have drifted more across genetic regions). I suspect popanc will do even better with ±10 or ±20 SNPs (running now). popanc is also about 10x faster and gives credible intervals (though not ideal ones) on the ancestry frequencies.
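The structure summary above can be sketched as follows. This is not sumStruct.pl. It is a minimal Python illustration that assumes the site-by-site probabilities Pr(00), Pr(01), Pr(10), Pr(11) have already been parsed into one tuple per individual per locus, and that the "mean" is taken over individuals at each locus. Parsing of the structure output file is omitted, and all function names are hypothetical.

```python
import math


def individual_ancestry(p00, p01, p10, p11):
    """Expected fraction of an individual's two allele copies with
    ancestry 0: both copies (00) count fully, heterozygous ancestry
    (01 or 10) counts half, 11 contributes nothing."""
    if not math.isclose(p00 + p01 + p10 + p11, 1.0, abs_tol=1e-6):
        raise ValueError("ancestry probabilities should sum to 1")
    return p00 + 0.5 * (p01 + p10)


def locus_ancestry_frequency(site_probs):
    """Mean of Pr(00) + 0.5 * (Pr(01) + Pr(10)) over individuals at one locus."""
    values = [individual_ancestry(*probs) for probs in site_probs]
    return sum(values) / len(values)


def ancestry_frequencies(probs_by_locus):
    """One population ancestry frequency per locus."""
    return [locus_ancestry_frequency(site_probs) for site_probs in probs_by_locus]


# Toy locus with three individuals: all-0, heterozygous, all-1 ancestry
locus = [(1.0, 0.0, 0.0, 0.0), (0.0, 0.5, 0.5, 0.0), (0.0, 0.0, 0.0, 1.0)]
print(ancestry_frequencies([locus]))  # → [0.5]
```

The per-locus frequencies produced this way can then be scored against the true frequencies with the same rmsd measure used for popanc.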