Does recombination lead to more efficient natural selection? How do mutation rates vary within genomes? Do genes expressed in the germ line have unusual recombination rates? What determines whether a pair of adjacent genes remains adjacent over evolutionary time, and are pairs with correlated expression profiles more frequently maintained? How does the composition of a sequence or a genome evolve? Do life history traits predict lineage-specific selection efficiency, and can we measure this accurately? To what extent does constraint on protein evolution determine amino acid composition? These are some of the questions my work has addressed.
Recombination as a correlate of rates of evolution
Meiotic recombination is thought to modulate the efficacy of selection, and hence substitution rates, as well as patterns of diversity within genomes. However, it is also known to covary with gene expression level, double strand break formation, nucleotide composition, replication timing and gene order conservation. Crossover rate correlates negatively with divergence after controlling for such confounding variables, consistent with more efficient selection owing to reduced Hill-Robertson interference; however, I have shown that in S. cerevisiae, sequences prone to double strand breaks may be inherently slow-evolving. In Drosophila, on the other hand, sequences that replicate late in S-phase have elevated mutation rates, and the resulting increase in diversity may partly obscure the correlation between diversity and recombination. My work also shows that different taxa can share some patterns, for instance the association between recombination and elevated rates of gene order rearrangement, yet differ in whether linkage between adjacent coexpressed genes is preserved: it appears to be preserved in S. cerevisiae, but not in D. melanogaster.
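To illustrate what "after controlling for confounding variables" means in practice, here is a minimal sketch of a first-order partial correlation, one generic way of doing this; it is not necessarily the exact procedure used in the work described. The function names and the toy per-gene values are hypothetical, constructed so that two variables covary only through a third.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z.

    Undefined if z is perfectly correlated with x or y (division by zero).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy values: x and y are correlated only because both track z.
z = [1, 2, 3, 4]   # e.g. a confounder such as expression level
x = [2, 1, 2, 5]   # e.g. crossover rate
y = [0, 5, 0, 5]   # e.g. divergence
print(pearson(x, y))                 # raw correlation: 1/3
print(partial_correlation(x, y, z))  # zero up to rounding error
```

Passing `ranks(x)`, `ranks(y)` and `ranks(z)` to `partial_correlation` gives a partial Spearman coefficient, which is less sensitive to the skewed distributions typical of rate estimates.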
The role of population size in genome evolution
In addition to its connection with substitution and rearrangement rates, recombination is a well-established driver of nucleotide composition. The machinery that repairs the double strand breaks initiating meiotic recombination favours the transmission of GC over AT alleles, a phenomenon known as GC-biased gene conversion (gBGC); as a result, highly recombining regions tend to be GC-rich. As in mammals, small-bodied bird species, which putatively have larger effective population sizes (Ne), have higher overall genomic GC content, consistent with a process that is more effective when Ne is high. As expected, we find these effects to be most pronounced, and the heterogeneity between lineages greatest, in regions with high crossover rates. This has implications for choosing sequences for phylogenetic reconstruction, as such lineage-specific compositional heterogeneity may affect the inferred topology.
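Why gBGC should be more effective when Ne is high can be made concrete with a standard population-genetic approximation: treat gBGC as selection favouring G/C alleles with coefficient b, so that its scaled strength is B = 4·Ne·b, and combine it with mutation to obtain an equilibrium GC content. The sketch below assumes this simple site model; the parameter values are hypothetical and chosen only to show the trend.

```python
import math

def equilibrium_gc(ne, b, mu_at_to_gc, mu_gc_to_at):
    """Equilibrium GC content at a site under mutation plus gBGC.

    gBGC is modelled as selection favouring G/C alleles with coefficient b,
    so the scaled bias is B = 4 * ne * b. Under Kimura's diffusion
    approximation, a new GC allele is exp(B) times more likely to fix than a
    new AT allele, so at equilibrium
        GC / AT = (mu_at_to_gc / mu_gc_to_at) * exp(B).
    With b = 0 this reduces to the mutational equilibrium.
    """
    big_b = 4.0 * ne * b
    return 1.0 / (1.0 + (mu_gc_to_at / mu_at_to_gc) * math.exp(-big_b))

# Hypothetical parameters: AT-biased mutation (GC->AT twice as frequent as
# AT->GC) and a fixed conversion bias b. In reality b would be expected to be
# larger in regions with high crossover rates.
for ne in (1e4, 1e5, 1e6):
    print(ne, round(equilibrium_gc(ne, 1e-6, 1.0, 2.0), 3))
```

Equilibrium GC content rises with Ne for a given b, and raising b (as in highly recombining regions) amplifies the differences between lineages with different Ne.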
Under the nearly neutral theory, small-bodied animals with large Ne are typically expected also to have low dN/dS, owing to more efficient selection. Surprisingly, this appears not to be the case in birds, where dN/dS correlates negatively with body mass, a relationship that is robust to controlling for non-stationary nucleotide composition and divergence time. Meanwhile, body mass correlates positively with the ratio of radical to conservative amino acid substitutions, which is in principle consistent with large-bodied species being more prone to accumulating disruptive substitutions.
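The radical/conservative distinction depends on how amino acids are grouped. As a minimal sketch, the code below assumes a charge-based partition (positive, negative, uncharged), which is only one of several published schemes and not necessarily the one used here. It only classifies and counts observed changes; a proper radical-to-conservative rate ratio would also normalise by the number of potential radical and conservative sites, analogously to dN/dS.

```python
# Hypothetical charge-based grouping; other schemes (e.g. polarity, volume)
# partition the 20 amino acids differently.
CHARGE_CLASS = {
    **{aa: "positive" for aa in "KRH"},
    **{aa: "negative" for aa in "DE"},
    **{aa: "uncharged" for aa in "ACFGILMNPQSTVWY"},
}

def classify(ref_aa, alt_aa):
    """Label a non-synonymous change as 'radical' or 'conservative'."""
    if ref_aa == alt_aa:
        raise ValueError("not a non-synonymous change")
    same = CHARGE_CLASS[ref_aa] == CHARGE_CLASS[alt_aa]
    return "conservative" if same else "radical"

def count_classes(changes):
    """Count radical and conservative changes in (ref, alt) pairs."""
    counts = {"radical": 0, "conservative": 0}
    for ref_aa, alt_aa in changes:
        counts[classify(ref_aa, alt_aa)] += 1
    return counts

print(count_classes([("K", "R"), ("D", "K"), ("L", "I")]))
```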
Estimating the efficacy of selection & predicting selective effects
The above observations indicate that caution is warranted when relying on a single metric to assess selection efficiency, and that care must be taken to control for covariates. However, they are not in themselves sufficient to benchmark a given method. For instance, ecological shifts between species are known to alter the selective landscape, which may complicate the relationship between Ne and which changes are fixed. While the notion that "conservative" amino acid changes might behave as effectively neutral is appealing, little is known about the extent to which selection distinguishes between "radical" and "conservative" changes. Conversely, although placing all non-synonymous changes in a single category may appear coarse-grained, the dynamics of dN/dS are currently better understood than those of the radical-to-conservative ratio. It is therefore necessary to assess carefully how well different classes of model explain phylogenomic data, and the extent to which selection acts on amino acid preferences. The latter can be examined with mutation-selection models, which offer insight into the selection coefficients associated with different classes of change. Knowing the predicted selective effect of a given mutation, together with improved metrics, will help to answer conclusively whether selection is more efficient in certain genomic regions or lineages.
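As a rough illustration of how mutation-selection models link selection coefficients to substitution rates, the sketch below uses the standard fixation factor S/(1 − e^(−S)), where S = 4·Ne·s is the scaled selection coefficient, and assumes each amino acid at a site has a scaled fitness F so that S for a change from i to j is F_j − F_i. The fitness values are hypothetical and serve only to contrast a mildly and a strongly deleterious change.

```python
import math

def relative_fixation(scaled_s):
    """Fixation probability of a new mutant relative to a neutral one,
    for scaled selection coefficient S = 4*Ne*s."""
    if abs(scaled_s) < 1e-8:
        return 1.0  # neutral limit as S -> 0
    return scaled_s / (1.0 - math.exp(-scaled_s))

def substitution_rate(mu, fitness, i, j):
    """Rate from amino acid i to j: mutation rate times fixation factor,
    with S_ij = F_j - F_i taken from per-site scaled fitnesses."""
    return mu * relative_fixation(fitness[j] - fitness[i])

# Hypothetical scaled fitnesses at one site: L->I mildly deleterious,
# L->D strongly deleterious.
fitness = {"L": 0.0, "I": -0.5, "D": -4.0}
for target in ("I", "D"):
    print("L ->", target, round(substitution_rate(1.0, fitness, "L", target), 3))
```

Because the fixation factor falls off steeply for negative S, the ratio of substitution rates between two classes of change carries information about how strongly selection discriminates between them, and how that discrimination scales with Ne.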