This study maps mutation rates at single-nucleotide resolution in the human genome and uncovers a hypermutation peak at transcription start sites (TSSs). The signal correlates with transcriptional activity in testes and is also observed in long noncoding regions and RNA Polymerase II pause sites, suggesting transcription–replication conflicts drive local mutagenesis. The authors further identify distinct mutation patterns at exon–intron boundaries and show that ancestral and ongoing GC-biased gene conversion (gBGC) shapes the GC-content around TSSs. Overall, the work reveals how transcription-associated hypermutation and gBGC together influence local nucleotide composition.
This study evaluates how cultural transmission of reproductive success (CTRS) affects tree‑sequence inference. The authors simulate genealogies and allele‑frequency trajectories under scenarios with varying degrees of CTRS, which is known to imbalance trees, and apply tree‑based summary statistics to assess the biases of standard software. They show that CTRS can distort inferred coalescent structure. The results highlight the need to account for non‑genetic transmission mechanisms when interpreting tree‑sequence outputs.
This study uses forward-in-time simulations (SLiM) to investigate how the frequency of meiosis versus mitosis, recombination rates, and selection coefficients shape genetic diversity. The authors show that selective sweeps strongly reduce diversity when meiosis is rare and that the recombination rate per meiosis, rather than meiotic frequency, determines the extent of linked selection. They also examine how dominance affects diversity loss, providing new insights into the interplay between recombination and selection in evolutionary genetics.
This study investigates the role of transient mutator phenotypes in mutation accumulation in yeast. By measuring single and double mutation rates, the authors show that double mutations occur up to 17 times more often than expected, originating from genetically normal cells that temporarily express a mutator state. Simulations indicate these mutator subpopulations are small and undergo short, intense bursts of mutation, with most double mutations accumulating sequentially rather than simultaneously.
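The comparison between observed and expected double mutations can be sketched as a small calculation. This sketch assumes that "expected" means the product of the two single-mutation rates, as it would be if the mutations arose independently. The function names and the rates are hypothetical and illustrative only; they are not values from the study.

```python
def expected_double_rate(rate_a: float, rate_b: float) -> float:
    """Expected rate of double mutants if the two mutations arise independently."""
    return rate_a * rate_b


def fold_excess(observed_double: float, rate_a: float, rate_b: float) -> float:
    """How many times more frequent double mutants are than the independence expectation."""
    return observed_double / expected_double_rate(rate_a, rate_b)


# Made-up per-cell-division rates, chosen only to illustrate the arithmetic.
rate_a = 1e-7           # single-mutation rate at locus A (assumption)
rate_b = 2e-7           # single-mutation rate at locus B (assumption)
observed_double = 1e-13  # observed double-mutation rate (assumption)

excess = fold_excess(observed_double, rate_a, rate_b)
print(f"Double mutants are {excess:.1f}x more frequent than expected")
```

A fold excess well above 1 is the kind of signal that a transient mutator subpopulation would produce, since both mutations then arise in the same few cells.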
In vertebrates, protein-coding genes have a GC-peak near their transcriptional start site. This pattern impacts how transcribed mRNAs are exported from the nucleus and translated into proteins. Although it is assumed that patterns of GC-content are shaped solely by adaptive evolution, we demonstrate in this paper that non-adaptive processes such as historical GC-biased gene conversion play a major role. In fact, we show that the GC-peak is not preserved by selection and is currently decreasing to reach the mutation rate equilibrium.
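As a minimal sketch of what a GC-peak near a transcriptional start site means in practice, the snippet below computes GC-content in consecutive windows around a TSS position. The function names, the window sizes and the toy sequence are assumptions made for illustration; this is not the pipeline used in the paper.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return float("nan")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_profile(seq: str, tss_index: int, window: int = 4, flank: int = 8):
    """GC-content in non-overlapping windows covering [tss - flank, tss + flank).

    Returns (offset relative to the TSS, GC-content) pairs; windows that
    would run past either end of the sequence are skipped.
    """
    profile = []
    for start in range(tss_index - flank, tss_index + flank, window):
        if start < 0 or start + window > len(seq):
            continue
        profile.append((start - tss_index, gc_content(seq[start:start + window])))
    return profile


# Toy sequence: AT-rich flanks around a GC-rich core centred on the TSS.
toy = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
profile = gc_profile(toy, tss_index=12)
for offset, gc in profile:
    print(f"{offset:+d}\t{gc:.2f}")
```

On real data the windows would be hundreds of base pairs wide and the profile would be averaged over many genes.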
One major goal of molecular evolutionary biology is to identify genomic regions under selection and/or adaptation. In this perspective, we aim to refine our definitions of selection and genetic drift, as well as additional mechanisms that constrain evolution of the genome in diverse contexts. We highlight that all of these processes need to be taken into account to correctly identify the targets of selection.
Our study explores the processes driving genomic diversity in regions of low recombination using a combination of simulations, theory and analyses of human data. We investigate how selection against several partially recessive variants affects linked neutral diversity, and show that this effect (associative overdominance) can increase diversity very strongly, in some cases up to 3-fold relative to the neutral expectation. We also characterize the conditions under which associative overdominance is strong (selection, dominance and recombination parameters). The increase in diversity is driven by the maintenance of complementary haplotypes such that the effects of recessive variants are masked in the heterozygous state, which can be considered a form of balancing selection. Finally, we perform a genome scan on 1000G human populations and identify several genomic regions possibly subject to associative overdominance.
In this study, we examine the genomic diversity of human populations and show that purifying selection at linked sites (i.e. background selection) and GC-biased gene conversion (gBGC) affect as much as 95% of the variants of our genome. The magnitude and relative importance of these processes are largely determined by variation in recombination rate and base composition. By conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by background selection or gBGC, and that avoids these biases in the reconstruction of human history.
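The two filtering criteria named above (a recombination rate above 1.5 cM/Mb and a C↔G or A↔T mutation type) can be sketched as a simple predicate. The `Snp` record, its field names and the example values are hypothetical; only the threshold and the two mutation classes come from the summary above.

```python
from dataclasses import dataclass

# Mutation types that swap C with G or A with T. These leave GC-content
# unchanged, so gBGC is not expected to favour either allele.
GBGC_NEUTRAL_PAIRS = {frozenset("CG"), frozenset("AT")}


@dataclass(frozen=True)
class Snp:
    ref: str                 # reference allele
    alt: str                 # alternative allele
    recomb_cm_per_mb: float  # local recombination rate (hypothetical field)


def is_reference_snp(snp: Snp, min_rate: float = 1.5) -> bool:
    """True if the SNP lies in a high-recombination region and is C<->G or A<->T."""
    alleles = frozenset((snp.ref.upper(), snp.alt.upper()))
    return snp.recomb_cm_per_mb > min_rate and alleles in GBGC_NEUTRAL_PAIRS


# Made-up SNPs for illustration.
snps = [
    Snp("C", "G", 2.0),  # kept: C<->G, high recombination
    Snp("A", "T", 0.4),  # dropped: recombination rate too low
    Snp("A", "G", 3.1),  # dropped: A<->G changes GC-content
    Snp("T", "A", 1.8),  # kept: A<->T, high recombination
]
kept = [s for s in snps if is_reference_snp(s)]
print(f"{len(kept)} of {len(snps)} SNPs kept")
```

Storing each allele pair as a `frozenset` makes the test independent of which allele is labelled reference, so C→G and G→C are treated alike.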
This paper has been highlighted in eLife by an Insight article from Kelley Harris, "Neutral evolution: the randomness that shapes our DNA", and by an eLife digest. It was also covered in press releases from the Swiss Institute of Bioinformatics and the University of Bern. You can read the outreach summary in English, French or German.
For more on the importance of our paper in the evolutionary genetics community's debate over neutral evolution, you can read this article, which cites our paper:
<< With the accumulating evidence for adaptation in the human genome, it seems likely that some large fraction of the genome would be subject to the effects of linked selection, he suggested. “We just don’t know how large that fraction is.” [says Andrew Kern] A recent paper in eLife by Fanny Pouyet and her computational-geneticist colleagues at the University of Bern and the Swiss Institute of Bioinformatics pins down that number. [...] [T]hey concluded that less than 5 percent of the human genome evolved by chance alone. As the editors of eLife noted in their summary of the paper, “This suggests that while most of our genetic material is formed of non-functional sequences, the vast majority of it evolves indirectly under some type of selection." >>
This study uses gene regulatory network models to examine the functional consequences of yeast GAL3 sequence variants. We link the genetic variation that exists within a population to changes in the parameter values of the regulatory GAL network. We combine this numerical approach with experimental analyses of the yeast GAL network and show that GAL3 natural variation is sufficient to convert a gradual response into a binary switch. Finally, dynamic network modeling allows us to successfully map alleles to specific locations of the parameter space and to functionally infer the consequences of DNA polymorphisms in the population. This framework can be more generally applied to the mechanistic interpretation of genetic variants.
Here, we study the variation in synonymous codon usage among genes involved in different functional categories in humans. We show that synonymous codon usage is not driven by constraints on tRNA abundance, but by large-scale variation in GC-content, caused by meiotic recombination, via the non-adaptive process of GC-biased gene conversion (gBGC). First, we observe that expression in meiotic cells varies among functional categories. Then, we demonstrate that meiotic expression is associated with a decrease in recombination within genes and, as a consequence, is linked to a reduced level of gBGC. Overall, the differences in gBGC strength explain 70% of the variance in synonymous codon usage among genes. We argue that the strong heterogeneity of synonymous codon usage induced by gBGC in mammalian genomes precludes any optimization of the tRNA pool to the demand in codon usage.
We present a codon substitution model named SENCA (site evolution of nucleotides, codons, and amino acids) that disentangles 3 levels of gene evolution. SENCA separately describes 1) the nucleotide processes that apply to all sites of a sequence, such as the mutational bias, 2) the preferences between synonymous codons, and 3) the preferences among amino acids. We study the core genome of 21 prokaryotes intraspecifically and five Enterobacteria interspecifically. We retrieve a universal mutational bias toward AT. We also argue that most synonymous substitutions are not neutral and must be taken into account to estimate the selection parameter on nonsynonymous substitutions. We propose new summary statistics to measure the relative importance of these 3 levels.
Bio++ is a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. Bio++ is designed to be both easy to use and computationally efficient by providing researchers with a set of reusable tools. This paper presents the second major release of the libraries, which notably provides built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era. Complex models of sequence evolution, such as mixture models and generic n-tuple alphabets, are also included. You can find the description of these tools here.