High-throughput sequencing technologies have dramatically accelerated the pace of genetic research, leading to significant insights into disease causation and the identification of potential drug targets. The UK Biobank (UKB) serves as a rich repository, offering diverse data types beyond genomic sequences, such as imaging, proteomic data, electronic health records (EHRs), and physical activity records.
The objective is to leverage the UKB's multimodal data to improve our understanding of human health and diseases. This involves developing models to analyze these data types together, exploring genetic and phenotypic correlations, and constructing robust disease prediction models using both supervised and unsupervised machine learning techniques.
The approach involves generating image-derived phenotypes (IDPs) using advanced segmentation techniques and integrating these with other data modalities to produce comprehensive analytical models. Early findings indicate the potential of these integrated models to uncover cross-trait relationships and improve predictions of disease incidence and genotype-to-phenotype associations, thus highlighting the value of a cohesive and predictive framework in genetics research.
Recent advances in ancient DNA (aDNA) studies have enhanced our ability to trace human evolution through genomic changes over time. Although previous studies have used time-stratified aDNA for selection scans, they have primarily focused on single-locus approaches. These approaches often overlook the information that multi-locus statistics can provide about selective events.
To address this gap, we conducted a multi-locus genotype scan for natural selection using G12, a statistic designed for unphased diploid data, across 708 European samples spanning 7,000 years. We aimed to validate the effectiveness of G12 on aDNA, accounting for complexities typical of ancient DNA analyses, such as high missingness rates, DNA damage, and ascertainment bias.
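As a rough illustration of what a multi-locus genotype statistic computes, the sketch below assumes the commonly cited definition of G12: within a genomic window, the frequencies of the two most common multi-locus genotypes are pooled and squared, and the squared frequencies of all remaining genotypes are added. The function name `g12` and the toy window are hypothetical, and the sketch does not handle missing data, DNA damage, or ascertainment bias, all of which matter for aDNA.

```python
from collections import Counter


def g12(genotypes):
    """Compute G12 for one window (illustrative sketch, not the study's pipeline).

    `genotypes` holds one hashable multi-locus genotype per individual,
    e.g. a tuple of 0/1/2 alternate-allele counts across the window's SNPs.
    """
    counts = Counter(genotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("window contains no genotypes")

    # Multi-locus genotype frequencies, most common first.
    freqs = sorted((c / n for c in counts.values()), reverse=True)

    # Pool the two most frequent genotypes, then add the squared
    # frequencies of the rest (freqs[1:2] is empty if only one exists).
    top_two = sum(freqs[:2])
    return top_two ** 2 + sum(q * q for q in freqs[2:])


# Toy window: 10 individuals, 3 SNPs each (made-up values).
window = (
    [(0, 1, 2)] * 4
    + [(0, 1, 1)] * 3
    + [(2, 0, 0)] * 2
    + [(1, 1, 1)] * 1
)
print(g12(window))
```

Values near 1 indicate that one or two multi-locus genotypes dominate the window, the pattern expected after a hard or soft sweep; values near 1/n indicate that nearly every individual carries a distinct genotype.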
Our validation involved simulations and the use of control loci, such as the lactase persistence allele known from modern Europeans. G12 outperformed another recent statistic, SweepFinder2, in detecting selection signals. Our analysis revealed 14 regions of selection throughout four historical periods, with notable declines in detectable signals over time. These findings underscore the significant role of selective pressures in early European history, potentially masked by genetic drift and demographic changes like admixture.
(This work is now published in Nature Communications.)