Research

We use population genetic theory and high-throughput biological sequence analysis to study recent evolutionary history in humans and other species. One of our primary research interests is the evolution of mutagenesis: we want to understand the forces that control DNA replication fidelity, the mutational breakdown of established traits, and the ultimate origin of new traits. We are also broadly interested in developing novel methods to study the impact of demography, inbreeding, and hybridization on the dynamics of natural selection, particularly in the wake of gene flow between humans and Neanderthals or other extinct hominids.

Evolution of the mutation rate and spectrum

Genetic mutations are the force that both creates life and destroys life: all evolutionary adaptations, life-ending cancers, and debilitating genetic diseases are set in motion when a cell divides and transmits an imperfect copy of its DNA to one of its daughter cells. As such, DNA replication fidelity is one of the most important traits for geneticists to study. If we understood more about which genetic variants act as “mutator alleles” by causing cells to replicate their DNA less accurately, this would help us understand both the genetic basis of cancer risk and the occurrence of germline mutations that cause disease in newborns. Unfortunately, DNA replication fidelity is more difficult and expensive to measure than classical quantitative traits like height; consequently, less is known about its genetic architecture.

Although DNA is replicated and repaired by highly conserved housekeeping pathways, the mutation rate appears to evolve surprisingly rapidly over evolutionary time. One way to see this is to compare the relative mutation rates of different 3-base-pair DNA motifs, expanding a one-dimensional "mutation rate" into a rich, multidimensional "mutation spectrum." By looking at genetic variants that segregate in human populations today, we found that Europeans and South Asians were affected by a "burst" of TCC>TTC mutations that died out at least 2,000 years ago (Harris 2015; Harris and Pritchard 2016). Due to changes in the mutation rates of particular DNA motifs, each human population and great ape species appears to have its own distinctive mutational spectrum that results from a unique set of mutational challenges and repair processes. We are working to decipher how this variation is genetically and environmentally determined and what evolutionary pressures (such as cancer, congenital disease, or life history) might be driving mutagenesis to change.
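To make the idea of a "mutation spectrum" concrete, here is a minimal Python sketch of how single-nucleotide variants can be binned by their 3-base-pair context. It assumes each variant is given as a hypothetical `(context, alt)` pair, where `context` is the reference 3-mer centered on the mutated base; the function names are illustrative and not our analysis pipeline. It follows the common convention of reverse-complementing mutations at A or G so the central base is always C or T, which gives 96 categories. A real analysis would also normalize by how often each 3-mer occurs in the genome, which this sketch omits.

```python
from collections import Counter

# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spectrum_category(context: str, alt: str) -> str:
    """Label one SNV by its 3-bp context, e.g. ("TCC", "T") -> "TCC>TTC".

    Mutations at A or G are reverse-complemented so the central base is
    always C or T, which yields 16 flanks x 6 substitutions = 96 categories.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context) - set("ACGT"):
        raise ValueError("context must be a 3-mer over A, C, G, T")
    if alt not in ("A", "C", "G", "T") or alt == context[1]:
        raise ValueError("alt must be a base that differs from the central base")
    if context[1] in "AG":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context}>{context[0]}{alt}{context[2]}"


def mutation_spectrum(snvs):
    """Return the fraction of (context, alt) SNVs in each 3-mer category."""
    counts = Counter(spectrum_category(ctx, alt) for ctx, alt in snvs)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# GGA>GAA on one strand is TCC>TTC on the other, so both land in one bin.
print(mutation_spectrum([("TCC", "T"), ("GGA", "A"), ("ACG", "T")]))
```

Comparing these per-category fractions between populations, rather than a single overall rate, is what makes a motif-specific excess such as the TCC>TTC burst visible.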

Population genetic inference of mutation spectrum history (MuSH)

Population geneticists have long understood that genetic variation is a rich source of information about recent history, including population size changes and migration. Now that we are starting to understand that mutation spectrum evolution is also a ubiquitous feature of evolutionary history, my group is working to quantitatively chart how it proceeds alongside other features of demographic history. Although we originally reported that the European TCC pulse began about 15,000 years ago, more sophisticated modeling led by Will DeWitt has shown that it likely started much earlier, perhaps 80,000 years ago (DeWitt, et al. 2021). We are actively working to better understand the role of gene flow in distributing mutational signatures among ancient and modern populations, which may be needed to reconcile the age of the TCC pulse with its particular population distribution.


Searching for mutator alleles and the causality of mutation spectrum evolution

What causes new mutational signatures to suddenly appear in the germlines of particular populations and species? The most likely explanations are environmental mutagens and novel variants in DNA replication and repair genes. In natural populations, it is difficult to know which environmental mutagens were present when ancestral genetic variants arose as new mutations, and sex and recombination make it difficult to map the effects of genetic mutators. Both sources of uncertainty are reduced or eliminated when considering mutations that accumulate in artificial colonies of model organisms, whose environments and breeding structure are controlled and recorded by researchers. Leveraging this insight, we recently identified mutator alleles that shape mutation spectrum variation within yeast (Jiang, et al. 2021) and mice (Sasani, et al. 2021). We are actively working to map more mutator alleles and to investigate the extent to which germline mutator alleles might also cause somatic mutations and thereby influence the heritability of cancer risk and aging phenotypes.
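One simple way to ask whether a locus acts as a mutator in a controlled colony is to split the strains by their allele at that locus, pool the new mutations in each group, and test whether the two pooled spectra differ more than expected by chance. The Python sketch below does this with a cosine distance and a label-shuffling permutation test. The approach, the function names, the two-allele `"A"`/`"B"` coding, and the toy six-class labels are all illustrative assumptions, not a description of the methods in the yeast or mouse studies.

```python
import math
import random
from collections import Counter


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 minus the cosine similarity of two mutation-count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        raise ValueError("each genotype group needs at least one mutation")
    return 1.0 - dot / norm


def spectrum_distance(mutations, genotypes):
    """Pool mutations by each strain's allele ("A" or "B") at one marker, then compare."""
    pooled = {"A": Counter(), "B": Counter()}
    for strain, categories in mutations.items():
        pooled[genotypes[strain]].update(categories)
    return cosine_distance(pooled["A"], pooled["B"])


def permutation_test(mutations, genotypes, n_perm=1000, seed=0):
    """Compare the observed distance with distances after shuffling genotype labels."""
    rng = random.Random(seed)
    observed = spectrum_distance(mutations, genotypes)
    strains = list(genotypes)
    labels = [genotypes[s] for s in strains]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if spectrum_distance(mutations, dict(zip(strains, labels))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: strains carrying allele "A" are enriched for C>T mutations.
mutations = {
    "s1": ["C>T"] * 9 + ["C>A"],
    "s2": ["C>T"] * 8 + ["C>A"] * 2,
    "s3": ["C>A"] * 9 + ["C>T"],
    "s4": ["C>A"] * 8 + ["C>T"] * 2,
}
genotypes = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
print(permutation_test(mutations, genotypes))
```

With only four strains the permutation p-value cannot be small; in a real colony, many strains and many markers would be scanned, and the p-values would need a multiple-testing correction.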



Inference of evolutionary history from whole genomes

To extract information about biology and population history from genetic variation data, it is essential to develop good statistical models that describe how genomes and populations evolve. Coalescent theory is an extremely powerful framework that makes it possible to infer population history from present-day genomes; others and I have used it to infer population size changes and migration events that have shaped the histories of humans and other species. Working with Rasmus Nielsen, Yun Song, and Sara Sheehan during my graduate studies, I developed composite likelihood methods that use the spatial distribution of SNPs in whole genomes to infer divergence times when populations began splitting apart, the extent of ongoing gene flow between separate populations, and past booms and busts that affected population size (Harris and Nielsen 2013, Sheehan, et al. 2013, Harris, et al. 2014).
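Why does the spatial distribution of SNPs carry information about history? If every site differed independently with the same probability, the distances between neighboring SNPs would follow a simple geometric distribution. In real genomes, recombination causes the coalescence time to vary along the sequence, so SNPs are packed densely where ancestry is deep and sparsely where it is recent, and the resulting departure from the simple null reflects demographic history. The Python sketch below only computes this summary and the naive null; it is not the composite likelihood methods cited above, and the function names and the value of `theta` are illustrative assumptions.

```python
import random


def difference_positions(hap1: str, hap2: str):
    """0-based positions where two aligned haplotypes carry different bases."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned and equally long")
    return [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]


def inter_snp_distances(positions):
    """Distances between consecutive differences along the sequence."""
    return [right - left for left, right in zip(positions, positions[1:])]


def uniform_null_positions(length, theta, seed=0):
    """Hypothetical null: each site differs independently with probability theta.

    This ignores linkage, recombination, and demography, so inter-SNP
    distances are geometric with mean 1 / theta.
    """
    rng = random.Random(seed)
    return [i for i in range(length) if rng.random() < theta]


null = inter_snp_distances(uniform_null_positions(200_000, theta=0.01))
print(sum(null) / len(null))  # close to 1 / theta = 100
```

Comparing a histogram of `inter_snp_distances` from real haplotype pairs against this null is a quick way to see the excess of very short and very long gaps that a constant-density model cannot produce.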

By mathematically describing how new mutations beget variation within populations and eventually fixed differences between species, it becomes possible to tease apart signatures of past population crashes from the footprints of natural selection or gene flow between diverged cousins like humans and Neanderthals. I have used theory and simulations to begin exploring the complex dynamics of natural selection that result from gene flow between vigorous populations like humans and struggling, inbred populations like Neanderthals (Harris and Nielsen 2016).

By developing new ways to summarize and analyze huge whole-genome databases like the one generated by the 1000 Genomes Project, I aim to figure out which mathematical models best describe our evolution and also pick out subtle ways in which all standard population genetic models fail to predict the patterns that exist in our genomes. One such pattern is the clustering of SNPs caused by multinucleotide mutations (MNMs), complex mutation events that alter multiple nearby base pairs at once (Harris and Nielsen 2014). As larger and larger databases such as TOPMed become available, I am excited to continue mining them for patterns that violate standard population genetic models and give clues about the rich variety of biological processes that create, maintain, and destroy genetic diversity.
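Because a single MNM creates all of its derived alleles on the same chromosome at the same moment, those alleles should be carried by exactly the same set of haplotypes. The Python sketch below flags nearby SNP pairs with identical carrier sets as candidate MNMs. The input format (a position plus a frozenset of haplotype IDs) and the 20 bp window are illustrative assumptions. Two independent point mutations that happen to arise on the same lineage produce the same pattern, so the flagged pairs are candidates, not confirmed MNMs.

```python
def candidate_mnms(snps, max_dist=20):
    """Pairs of nearby SNPs carried by exactly the same set of haplotypes.

    snps: iterable of (position, carriers), where carriers is a frozenset of
    haplotype IDs carrying the derived allele. max_dist is an arbitrary
    illustrative window in base pairs.
    """
    snps = sorted(snps, key=lambda snp: snp[0])
    pairs = []
    for i, (pos_i, carriers_i) in enumerate(snps):
        for pos_j, carriers_j in snps[i + 1:]:
            if pos_j - pos_i > max_dist:
                break  # sorted input: every later SNP is even farther away
            if carriers_i == carriers_j:
                pairs.append((pos_i, pos_j))
    return pairs


snps = [
    (100, frozenset({0, 3})),
    (103, frozenset({0, 3})),
    (110, frozenset({1})),
    (500, frozenset({1})),
]
print(candidate_mnms(snps))  # [(100, 103)]
```

Counting how many such pairs occur at each separation, and comparing that count with what independent mutations would produce, is one way to see the excess of closely spaced SNPs that standard models do not predict.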

TL;DR

If you're more of a visual learner, these Alex Cagan seminar sketches offer some nice snapshots of my research interests:

The Human Mutation Rate Meeting; Max Planck Institute for Evolutionary Anthropology; Leipzig, Germany 2015

Society for Molecular Biology and Evolution Annual Meeting; San Juan, Puerto Rico 2014

Society for Molecular Biology and Evolution Annual Meeting; Chicago, IL 2013