Post date: Dec 06, 2019 8:57:54 PM
Zach's response to 2019/12/05's summary:
This looks promising; it seems that calling hets is relatively robust to thresholds. To be clear, this is the average proportion (across all individuals and SNPs), right? That is a nice initial summary. You note the SD is large. Is that the SD across loci or across individuals? Presumably loci, and if so that is good. Pick a few thresholds and make plots (histograms would be fine) of the proportion of hets at each locus (the proportion out of non-NA calls). In other words, we want to see the distribution across loci. Ideally it will be bimodal, with one mode that is really high and one that is much lower (and maybe broader).
Answer:
This is the mean number of heterozygotes per SNP across Pando-only individuals: I count the heterozygotes at each SNP, then average those counts across the 15543 SNPs.
The standard deviation is across loci.
Histograms of the proportion of hets at each locus can be seen here (look at page three!).
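For reference, here is a minimal sketch of the per-locus summary Zach asked for: the proportion of hets at each locus out of non-NA calls, plus the mean and SD across loci and simple histogram bin counts. This is not the script used for the plots above. The genotype coding (0/1/2 with 1 = heterozygote, `None` = missing), the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, stdev


def het_proportion_per_locus(genotypes):
    """Return the proportion of heterozygous calls at each locus.

    genotypes: list of individuals, each a list with one call per locus.
    Assumed coding (hypothetical): 0 or 2 = homozygote, 1 = heterozygote,
    None = missing (NA). A locus with no called genotypes yields None.
    """
    n_loci = len(genotypes[0])
    proportions = []
    for j in range(n_loci):
        # Keep only non-NA calls at this locus.
        calls = [ind[j] for ind in genotypes if ind[j] is not None]
        if not calls:
            proportions.append(None)
            continue
        n_het = sum(1 for c in calls if c == 1)
        proportions.append(n_het / len(calls))
    return proportions


def histogram_counts(values, n_bins=10):
    """Count values in n_bins equal-width bins on [0, 1], skipping None."""
    counts = [0] * n_bins
    for v in values:
        if v is None:
            continue
        # A value of exactly 1.0 goes into the last bin.
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Toy data: 4 individuals x 4 loci (made up, not from the Pando data set).
toy = [
    [1, 0, 1, None],
    [1, 0, 2, None],
    [1, None, 1, None],
    [1, 2, 0, None],
]

props = het_proportion_per_locus(toy)
called = [p for p in props if p is not None]

print("per-locus het proportion:", props)
print("mean across loci:", mean(called))
print("SD across loci:", stdev(called))
print("histogram counts (4 bins):", histogram_counts(props, n_bins=4))
```

With real data, a bimodal pattern would show up as two clusters of well-filled bins, one near the top of the range and one lower down.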