Publication

Selected publications of Yanglab @ HKUST

We introduce STitch3D, a unified framework that integrates multiple spatial transcriptomics (ST) slices to reconstruct 3D cellular structures. By jointly modelling multiple slices and integrating them with single-cell RNA-sequencing data, STitch3D simultaneously identifies 3D spatial regions with coherent gene-expression levels and reveals 3D cell-type distributions.

We develop a statistical method for cross-population fine-mapping (XMAP) by leveraging genetic diversity and accounting for confounding bias. We show that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.

We developed SpatialScope, a unified approach to integrating scRNA-seq reference data and spatial transcriptomics (ST) data that leverages deep generative models. With innovations in model and algorithm design, SpatialScope not only enhances seq-based ST data to achieve single-cell resolution, but also accurately infers transcriptome-wide expression levels for image-based ST data.

We have created Portal, a unified framework that uses adversarial domain translation to learn harmonized representations of datasets. Compared with other state-of-the-art methods, Portal better preserves biological variation during integration, while integrating millions of cells in minutes with low memory consumption. We show that Portal is widely applicable to integrating datasets across different samples, platforms and data types.

Mendelian randomization (MR) is a valuable tool for inferring the causal relationship between an exposure and an outcome. Great efforts have been made to relax MR assumptions to account for confounding due to pleiotropy. However, causal effects are often falsely detected between exposures and outcomes, even in the absence of genetic correlation. Here, we show that sample structure is a major confounding factor that is largely ignored by existing summary-level MR methods. To detect causal effects with well-calibrated statistical inference, we propose MR-APSS, which accounts for pleiotropy and sample structure simultaneously by leveraging genome-wide information. Results from real-data analyses suggest that MR-APSS not only avoids many false-positive findings, but also improves the statistical power to detect causal effects.

We develop a statistical method, LOG-TRAM, to leverage the local genetic architecture for trans-ancestry association mapping (TRAM). Using biobank-scale datasets, we show that LOG-TRAM can greatly improve the statistical power of identifying risk variants in under-represented populations while producing well-calibrated p-values. We applied LOG-TRAM to the GWAS summary statistics of various complex traits and diseases from BioBank Japan, UK Biobank, and African populations, obtaining substantial gains in power and effective correction of confounding biases in TRAM.

We develop a cross-population analysis framework for constructing polygenic risk scores (PRS) with both individual-level (XPA) and summary-level (XPASS) GWAS data. By leveraging trans-ethnic genetic correlation, our methods can borrow information from biobank-scale European population data to improve risk prediction in non-European populations. With novel data structures and algorithm design, our methods are scalable to millions of samples and millions of genetic variants, providing substantial savings in computational time and memory usage.
