LINE-1s, the most abundant and only active type of transposable element in the human genome, are key to many fundamental problems in human genomics. Our understanding of the youngest LINE-1s is limited by the scarcity of high-quality genome data at a population scale. To tackle this issue, we retrieved LINE-1 sequences from long-read-based haploid genome assemblies and analyzed their sequence and evolutionary history. Our data show that haploid-resolved genomes contain twice as many intact LINE-1s as the reference genome. Additionally, retrotransposition assays reveal that the number of in vitro active LINE-1s is also doubled. The active LINE-1s in the human population are predominantly polymorphic, exhibiting variable allelic forms, and their activity varies dramatically both in vitro and in vivo.
These findings suggest that during the emergence of young LINE-1 lineages, their activity and population frequency are not always correlated. Some active LINE-1s may be "hidden" in alternative allelic forms within the human population, indicating that the activity of emerging LINE-1 lineages can persist much longer than previously thought. Polymorphic LINE-1s with consistently high activity provide valuable candidates for studying recent adaptive evolution. We are currently expanding our research to include more high-quality assemblies from the pangenome reference to generate a nearly complete catalog of all full-length LINE-1s in the global human population and to conduct a thorough analysis of their recent adaptation in the human genome.
https://doi.org/10.1038/s44318-023-00007-y (The EMBO Journal)
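As a rough illustration of what "intact" can mean when cataloging full-length LINE-1s, the sketch below assumes that an element counts as intact when both of its open reading frames (ORF1 and ORF2) start with ATG, stay in frame, and contain no premature stop codon. This criterion and the function names `is_intact_orf` and `is_intact_line1` are assumptions made for the example, not the definition or code used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_intact_orf(seq: str) -> bool:
    """Return True if seq is a complete ORF: ATG start, in frame, one terminal stop."""
    seq = seq.upper()
    # An ORF needs at least a start and a stop codon and a length divisible by 3.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Reject any premature stop codon before the terminal one.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def is_intact_line1(orf1: str, orf2: str) -> bool:
    """Hypothetical criterion: both ORF1 and ORF2 must be uninterrupted."""
    return is_intact_orf(orf1) and is_intact_orf(orf2)
```

A real catalog would first have to locate ORF1 and ORF2 within each full-length element; this sketch starts from sequences that have already been extracted.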
We are also excited to be involved in the 1000G-ONT consortium, an international collaboration focused on re-sequencing the 1000 Genomes Project with long reads using nanopore technology. This opportunity will enable us to investigate LINE-1 variation in the human population at an unprecedented level and to generate a nearly complete reference of all full-length young LINE-1s in humans. This enhanced reference will facilitate research on the physiological impact and disease risk associated with LINE-1s, providing deeper insights into their role in human health and disease.
https://doi.org/10.1101/gr.279273.124 (Genome Research)
I am interested in the evolution of retrocopied genes, particularly the retrocopies of retrotransposon restriction factors. The parental genes restrict retrotransposons, yet they can occasionally take advantage of retrotransposons to duplicate themselves. The project aims to understand the role of retrocopied restriction factors. Do they function like their parental genes and restrict transposable elements, or do they help transposable elements by inhibiting their parental genes as "dominant negatives"?
As an example of one facet of the above-mentioned question, our work demonstrated that the gene-duplicating activity of transposable elements can be co-opted by our genomes to rapidly generate new innate immune genes. Members of the APOBEC3 (A3) gene family are known to play essential roles in our immune defense, and they are rapidly evolving to keep up with the ever-changing demands of our immune system. Our work discovered at least 10 additional retrocopies of A3s in New World monkey genomes. Our data show that some of the A3 retrocopies are expressed in various New World monkey tissues and can restrict retroviruses and retrotransposons. We proposed that retrocopying-mediated duplication is a key mechanism for host-encoded A3 genes to keep up with the fast pace of retrovirus evolution. This insight established the new paradigm that the ability of immune systems to adapt recurrently and rapidly may depend upon these innate immune retrogenes.
In collaboration with Dr. Miriam Rosenberg, at the Hebrew University of Jerusalem and an affiliate of PNRI, we are investigating a remarkable phenomenon where some children experience significant regression of brain tumors due to their immune systems, even without medical treatment. These children often develop a rare and severe condition called Opsoclonus-Myoclonus-Ataxia Syndrome (OMAS), which leads to seizures and other debilitating neurological symptoms. Interestingly, children who develop OMAS tend to have a higher survival rate compared to those who only have the tumor without OMAS. We hypothesize that the immune system becomes hyperactive in fighting the tumor but fails to deactivate once the tumor is eradicated. Although the underlying mechanisms remain unclear, we have gathered evidence suggesting that transposable elements, such as LINE-1s, may play a role in triggering this immune response that also eliminates the tumor.
https://doi.org/10.1016/j.celrep.2023.112879 (Cell Reports)
Retroviruses, beyond their association with diseases, actively influence host genomes by integrating into them, a crucial step in their life cycle. A group of retroviruses known as Cervid Endogenous Retroviruses (CrERVs) is currently integrating into the genome of mule deer, providing a unique opportunity to study the early stages of retrovirus endogenization. To investigate CrERV evolution, I generated the first draft assembly of the mule deer genome and used this assembly to reconstruct and map all CrERVs. The evolutionary history of CrERVs reveals that they can recombine with infectious counterparts, enabling the recombinant CrERVs to retrotranspose and be passed through generations. Additionally, CrERV insertion sites are enriched near host genes, suggesting a potential role in gene regulation. These findings offer new insights into CrERV interactions with the host genome and their possible connection to chronic wasting disease, and they present a model for understanding the impact of endogenous retroviruses on human diseases.
https://doi.org/10.1093/molbev/msab252 (Molecular Biology and Evolution)
Since retroviral peptide signals were detected in the genomes of both Large Granular Lymphocyte (LGL) leukemia patients and their spouses, we investigated the genomes of LGL leukemia patients for a potential retroviral etiology. To address the two possible scenarios of retroviral involvement, clonal insertions and rare insertions, we designed a workflow capable of detecting both in the sequenced genomes and transcriptomes of LGL leukemia patients. One pipeline identifies retrovirus-sized insertions and can therefore detect clonal retroviral insertions. A second pipeline, inspired by metagenomics analysis, detects rare retroviruses in raw sequences that cannot be mapped to the reference genome or assembled. Although we did not find any new retroviral insertions, we found that the LGL leukemia patients carry more HERV-Ks than the general population.
https://doi.org/10.1186/s12920-019-0549-9 (BMC Medical Genomics)
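The idea of screening for retrovirus-sized insertions can be sketched as a simple length filter over insertion calls. The `Insertion` record, the function `retrovirus_sized`, and the size window of 5,000 to 12,000 bp are all hypothetical choices made for this illustration; the published pipeline may use different inputs and thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """A called insertion relative to the reference genome (hypothetical record)."""
    chrom: str
    pos: int
    length: int  # inserted sequence length in bp


# Assumed size window (bp) for a full-length provirus; illustrative values only.
MIN_LEN = 5_000
MAX_LEN = 12_000


def retrovirus_sized(insertions, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only insertions whose length falls inside the assumed provirus window."""
    return [ins for ins in insertions if min_len <= ins.length <= max_len]


calls = [
    Insertion("chr1", 100, 300),     # too short for a provirus
    Insertion("chr2", 200, 9_500),   # inside the assumed window
    Insertion("chr3", 300, 20_000),  # too long
]
candidates = retrovirus_sized(calls)
```

A size filter alone only narrows the candidate list; each retained insertion would still need its sequence compared against known retroviruses.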
I played a key role in developing the pipeline to calculate the frequency of all known polymorphic HERV-Ks in the human population. Using this tool, we discovered that the East Asian population has a lower burden of HERV-K compared to other human populations. Further investigation revealed that LGL leukemia patients have a higher HERV-K burden than the general population. This work not only enhanced our understanding of the association between HERV-K burden and LGL leukemia but also represents the first systematic effort to generate references of full-length HERV-Ks in the general human population using whole genome sequencing data.
https://doi.org/10.1371/journal.pcbi.1006564 (PLOS Computational Biology)
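The two quantities discussed above, the population frequency of each polymorphic HERV-K and the HERV-K burden of each individual, can be illustrated with a small sketch. It assumes presence/absence calls (1 or 0) per sample and locus, with a call for every locus in every sample, so the frequency computed is a carrier frequency and not an allele frequency. The function names, data layout, and sample labels are hypothetical and are not taken from the published tool.

```python
from collections import defaultdict


def locus_frequency(genotypes, populations):
    """Carrier frequency of each locus within each population.

    genotypes:   {sample: {locus: 0 or 1}}
    populations: {sample: population label}
    """
    carriers = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for sample, calls in genotypes.items():
        pop = populations[sample]
        sizes[pop] += 1
        for locus, present in calls.items():
            carriers[pop][locus] += 1 if present else 0
    return {
        pop: {locus: n / sizes[pop] for locus, n in loci.items()}
        for pop, loci in carriers.items()
    }


def burden(genotypes):
    """Number of polymorphic insertions carried by each sample."""
    return {
        sample: sum(1 for present in calls.values() if present)
        for sample, calls in genotypes.items()
    }


genotypes = {
    "s1": {"locus_a": 1, "locus_b": 0},
    "s2": {"locus_a": 1, "locus_b": 1},
    "s3": {"locus_a": 0, "locus_b": 0},
}
populations = {"s1": "P1", "s2": "P1", "s3": "P2"}
freqs = locus_frequency(genotypes, populations)
loads = burden(genotypes)
```

Comparing the distribution of `loads` between groups is one simple way to express a difference in burden; the statistical testing used in the study is not reproduced here.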
The "B1" type of non-autonomous transposable elements depends on LINE-1s for its propagation. In a group of South American rodents (the Sigmodontinae subfamily, including species like the "hispid cotton rat" and "marsh rice rat"), both LINE-1 and B1 lost their activity, leading to their "extinction". Interestingly, B1's extinction occurred before that of LINE-1 in these rodents. I discovered that historical waves of LINE-1 and B1 expansion were synchronized, and the timing of these activity waves can be traced back several million years before the extinction. This work significantly enhances our understanding of the interaction between autonomous transposable elements (LINE-1) and their dependent elements (B1).
https://doi.org/10.1186/s13100-019-0164-5 (Mobile DNA)
LINE-1 retrotransposons lost their activity in a large clade of megabats (many genera of the Pteropodidae family, commonly known as "flying foxes"). However, how this "extinction" happened was a mystery. I resurrected the extinct LINE-1 retrotransposon from the genome sequences of these megabats. Genome analyses and in vitro LINE-1 activity assays showed that megabat LINE-1 lineages expanded significantly before their extinction. The two megabat LINE-1 lineages were synchronized in their historical activity profiles, implying a strong evolutionary arms race between LINE-1s and restriction by the host genome.
https://doi.org/10.1371/journal.pgen.1004395 (PLOS Genetics)
© 2024 Lei Yang