Ancient DNA sampling can confound tests for admixture

Post date: Apr 6, 2016 7:19:39 PM

As ancient genomes from humans and other species accumulate, we are coming to grips not only with reduced sequence quality and the risk of contamination, but also with many unexpected oddities of having data from different time points. Ancient DNA data from different historical time points is most certainly the most direct way to learn about the population history of humans and other species, but analyzing genetic data that has temporal structure requires special considerations. Recently, I was thinking that temporal structure might also affect the 3-population test, one of the most powerful analytical tools we have in our arsenal when we reconstruct human prehistory. This test for admixture was invented by David Reich and Nick Patterson in 2009, as part of a population genetic framework that uses allele frequency differences between populations to reconstruct population history (original paper). The 3-population test, denoted f3(A, B; X), can provide unambiguous evidence that a population X is related to populations A and B through admixture, and not through a simple tree model. Unambiguous, it turns out, only if we have unbiased estimates of the population allele frequencies.
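To make the statistic concrete, here is a minimal Python sketch of how f3(A, B; X) might be estimated from per-SNP allele counts. It is an illustration, not the code used for this post: the function name `f3_unbiased` and the tuple layout are hypothetical, and the correction term is the commonly used finite-sample adjustment for the sampling noise in the frequency of X, which assumes the chromosomes in X are a random sample from a single population.

```python
def f3_unbiased(sites):
    """Estimate f3(A, B; X) as the mean over SNPs of (x - a)(x - b),
    corrected for finite sampling in X.

    `sites` is an iterable of tuples
        (a_count, a_n, b_count, b_n, x_count, x_n)
    giving the derived-allele count and the number of sampled
    chromosomes in A, B and X at each SNP (hypothetical layout).
    """
    total = 0.0
    used = 0
    for a_count, a_n, b_count, b_n, x_count, x_n in sites:
        if a_n < 1 or b_n < 1 or x_n < 2:
            continue  # the correction needs at least two chromosomes in X
        a = a_count / a_n
        b = b_count / b_n
        x = x_count / x_n
        naive = (x - a) * (x - b)
        # The sample frequency of X carries binomial sampling noise that
        # inflates the naive product; subtract its expected contribution.
        correction = x * (1.0 - x) / (x_n - 1)
        total += naive - correction
        used += 1
    if used == 0:
        raise ValueError("no usable sites")
    return total / used
```

For example, `f3_unbiased([(0, 10, 10, 10, 5, 10)])` describes a single SNP where X sits exactly halfway between A and B in frequency, the pattern that pushes the statistic negative.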

To test the effect of having samples from different time points on the 3-population test, I simulated a model with temporal structure but no admixture, as illustrated in the left panel of the figure below.

When we use the 3-population test on the resulting data, we obtain a highly negative f3-statistic (Z = -6), which might have misled us into thinking that the Ancientpop population was admixed between the two present-day populations. In the right panel, we modify our simulation to have no temporal structure and observe that the f3-statistic is consistent with zero (it is not positive because the ancient population does not have any private drift in this model).
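The post does not say how the Z-score was obtained. A common choice for f-statistics is a delete-one-block jackknife, so the sketch below assumes that approach: per-SNP values of the statistic are split into contiguous blocks, the mean is recomputed with each block left out, and the spread of those leave-one-out means gives the standard error. The function name `jackknife_z` and the default of 100 blocks are assumptions for illustration only.

```python
import math

def jackknife_z(per_site_values, n_blocks=100):
    """Return (estimate, standard_error, z) for the mean of per-site
    values, using a simple delete-one-block jackknife."""
    n = len(per_site_values)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least two blocks and one site per block")
    # Contiguous blocks of near-equal size.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    block_sums = [sum(per_site_values[bounds[i]:bounds[i + 1]])
                  for i in range(n_blocks)]
    block_sizes = [bounds[i + 1] - bounds[i] for i in range(n_blocks)]
    total = sum(block_sums)
    estimate = total / n
    # Mean of the statistic with each block removed in turn.
    loo = [(total - s) / (n - m) for s, m in zip(block_sums, block_sizes)]
    loo_mean = sum(loo) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - loo_mean) ** 2 for v in loo)
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("zero standard error; Z is undefined")
    return estimate, se, estimate / se
```

Because the simulated SNPs here are independent, any partition into blocks should do; with real genomes the blocks are usually chosen to be long enough to absorb linkage between neighbouring SNPs.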

Note that the confounding effect above can also arise if the individuals in Ancientpop were from the present time, but from lineages that diverged at different time points from the ancestral population and were subsequently pooled. In fact, this is what I simulated with ms, since this is identical to temporal structure when polymorphisms are ascertained in the root (so that the mutations that arise when individuals are supposed to be dead are automatically excluded).

ms (Hudson, 2002) command lines were:

left panel: ms 32 100000 -s 1 -I 13 10 2 1 1 1 1 1 1 1 1 1 1 10 0 -ej 0.26 2 1 -ej 0.10 3 1 -ej 0.11 4 1 -ej 0.12 5 1 -ej 0.13 5 1 -ej 0.14 6 1 -ej 0.15 7 1 -ej 0.16 8 1 -ej 0.17 9 1 -ej 0.18 10 1 -ej 0.19 11 1 -ej 0.2 12 1 -ej 0.25 13 1

right panel: ms 32 100000 -s 1 -I 13 10 2 1 1 1 1 1 1 1 1 1 1 10 0 -ej 0.26 2 1 -ej 0.15 3 1 -ej 0.15 4 1 -ej 0.15 5 1 -ej 0.15 5 1 -ej 0.15 6 1 -ej 0.15 7 1 -ej 0.15 8 1 -ej 0.15 9 1 -ej 0.15 10 1 -ej 0.15 11 1 -ej 0.15 12 1 -ej 0.25 13 1

After ascertaining SNPs in the second population here, there were about 12,000 independent SNPs for each model.
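The ascertainment and pooling steps could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the script used for the post. It assumes the standard ms text output (a `//` line before each replicate, then one 0/1 string per chromosome), that samples appear in the order given by `-I` (population 1, population 2, populations 3 to 12, population 13), that populations 3 to 12 are pooled as Ancientpop, that populations 1 and 13 are the two present-day populations, and that "ascertained" means polymorphic among the two chromosomes of population 2. The names `parse_ms` and `site_counts` are hypothetical.

```python
# Sample index ranges implied by "-I 13 10 2 1 1 1 1 1 1 1 1 1 1 10"
# (assumed reading of the command line).
POP1 = range(0, 10)     # population 1, present-day
ASCERT = range(10, 12)  # population 2, used for ascertainment
ANCIENT = range(12, 22) # populations 3-12, one chromosome each, pooled
POP13 = range(22, 32)   # population 13, present-day

def parse_ms(text):
    """Yield one list of haplotype strings per ms replicate."""
    haps = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("//"):
            if haps:
                yield haps
            haps = []  # start collecting a new replicate
        elif haps is not None and line and set(line) <= {"0", "1"}:
            haps.append(line)
    if haps:
        yield haps

def site_counts(haps, site=0):
    """Return (a_count, a_n, b_count, b_n, x_count, x_n) for one SNP,
    or None if the SNP is not polymorphic in the ascertainment sample."""
    column = [int(h[site]) for h in haps]
    if len({column[i] for i in ASCERT}) < 2:
        return None  # both ascertainment chromosomes carry the same allele
    def count(indices):
        return sum(column[i] for i in indices)
    return (count(POP1), len(POP1),
            count(POP13), len(POP13),
            count(ANCIENT), len(ANCIENT))
```

Applying `site_counts` to every replicate from `parse_ms` and dropping the `None` results leaves only the ascertained SNPs, with Ancientpop treated as a single pooled sample of ten chromosomes.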