Post date: Sep 11, 2014 10:27:36 PM
Initial PCA of Moe's data gave results that did not match Patrik's expectations for population structure. The mismatch was strongest when I used the average allele frequency prior, but it also appeared with a uniform prior on genotypes.
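Because the two priors gave different pictures, here is a minimal sketch of how a genotype prior can feed into a PCA. It assumes genotype likelihoods are held in an (individuals × sites × 3) array, and it models the allele-frequency prior as a Hardy-Weinberg prior built from a per-site frequency estimate. That is an assumption about how such a prior might be formed, not a description of the pipeline's actual prior. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not the pipeline's actual code.
GENOTYPES = np.array([0.0, 1.0, 2.0])  # copies of the alternate allele


def posterior_mean_genotypes(gl, prior="uniform"):
    """Posterior mean genotype per individual and site.

    gl: array of shape (n_ind, n_sites, 3) holding P(reads | g) for g = 0, 1, 2.
    prior: "uniform" (1/3 per genotype) or "hwe" (an assumed Hardy-Weinberg
    prior built from a per-site allele frequency estimate).
    """
    gl = np.asarray(gl, dtype=float)
    n_sites = gl.shape[1]
    if prior == "uniform":
        pri = np.full((n_sites, 3), 1.0 / 3.0)
    elif prior == "hwe":
        # Assumed frequency estimate: mean uniform-prior genotype / 2, kept off 0 and 1.
        g_unif = posterior_mean_genotypes(gl, "uniform")
        p = np.clip(g_unif.mean(axis=0) / 2.0, 1e-6, 1.0 - 1e-6)
        pri = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=1)
    else:
        raise ValueError(f"unknown prior: {prior!r}")
    post = gl * pri[np.newaxis, :, :]          # unnormalized posterior
    post /= post.sum(axis=2, keepdims=True)   # normalize over the 3 genotypes
    return post @ GENOTYPES                    # shape (n_ind, n_sites)


def pca_scores(g, k=2):
    """Top-k eigenvectors and eigenvalues of the individual-by-individual covariance."""
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                   # drop monomorphic sites
    g, p = g[:, keep], p[keep]
    x = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardize each site
    cov = x @ x.T / x.shape[1]
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]         # largest k first
    return vecs[:, order], vals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gl = rng.random((10, 200, 3))              # toy likelihoods, not real data
    for prior in ("uniform", "hwe"):
        vecs, vals = pca_scores(posterior_mean_genotypes(gl, prior))
        print(prior, np.round(vals, 3))
```

With informative likelihoods the two priors give nearly identical genotypes. At low coverage, the uniform prior pulls each posterior mean toward 1 while the frequency-based prior pulls it toward 2p, which is one way the prior choice could shift the PCA.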
I am trying two things to explore this issue, since it could point to a general problem with my new variant calling pipeline.
First, I am calling variants without performing indel realignment. The vcf files with *norealn* in their names are the output of this run (so far, analyses are running for LG 1 and LG 2, and I will add more LGs over the next few days). Note that bams.list lists the indel-realigned bam files, whereas bams2.list lists those without indel realignment.
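One simple check on whether skipping indel realignment changes the calls themselves, before rerunning PCA, is to count the sites each version reports. This is a minimal sketch assuming plain or gzipped VCFs; the function names and the file names in the usage comment are hypothetical, and matching on (CHROM, POS) ignores any differences in the called alleles.

```python
import gzip
import sys


def sites_from_lines(lines):
    """Collect (CHROM, POS) for every data line of a VCF, skipping headers."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        sites.add((chrom, int(pos)))
    return sites


def read_sites(path):
    """Read sites from a plain or gzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return sites_from_lines(fh)


def compare_sites(realn, norealn):
    """Count sites shared by both call sets and unique to each."""
    return {
        "shared": len(realn & norealn),
        "realn_only": len(realn - norealn),
        "norealn_only": len(norealn - realn),
    }


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare_sites.py LG1.vcf.gz LG1_norealn.vcf.gz
    realn_path, norealn_path = sys.argv[1], sys.argv[2]
    print(compare_sites(read_sites(realn_path), read_sites(norealn_path)))
```

A large `norealn_only` or `realn_only` count near indels would suggest realignment is changing which sites get called, not just the genotypes at shared sites.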
Second, I want to try an entropy analysis of both sets of vcf files. I haven't set this up yet.
Also note that even for the realigned data, I have only finished variant calling for the first 8 LGs, so the remaining LGs still need to be run.