Publication
Selected publications of Yanglab @ HKUST
We introduce STitch3D, a unified framework that integrates multiple ST slices to reconstruct 3D cellular structures. By jointly modelling multiple slices and integrating them with single-cell RNA-sequencing data, STitch3D simultaneously identifies 3D spatial regions with coherent gene-expression levels and reveals 3D cell-type distributions.
We develop a statistical method for cross-population fine-mapping (XMAP) by leveraging genetic diversity and accounting for confounding bias. We show that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.
We developed a unified approach to integrating scRNA-seq reference data and spatial transcriptomics (ST) data that leverages deep generative models. With innovation in model and algorithm designs, SpatialScope not only enhances seq-based ST data to achieve single-cell resolution, but also accurately infers transcriptome-wide expression levels for image-based ST data.
We have created Portal, a unified framework of adversarial domain translation to learn harmonized representations of datasets. Compared with other state-of-the-art methods, Portal better preserves biological variation during integration while integrating millions of cells in minutes with low memory consumption. We show that Portal is widely applicable to integrating datasets across different samples, platforms and data types.
Mendelian randomization (MR) is a valuable tool for inferring the causal relationship between an exposure and an outcome. Great efforts have been made to relax MR assumptions to account for confounding due to pleiotropy. However, causal effects are often falsely detected between exposures and outcomes, even in the absence of genetic correlation. Here, we show that sample structure is a major confounding factor that is largely ignored by existing summary-level MR methods. To detect causal effects with well-calibrated statistical inference, we propose MR-APSS to account for pleiotropy and sample structure simultaneously by leveraging genome-wide information. Real data-analysis results suggest that MR-APSS not only avoids many false-positive findings, but also improves the statistical power of detecting causal effects.
We develop a statistical method, LOG-TRAM, to leverage the local genetic architecture for Trans-ancestry association mapping (TRAM). By using biobank-scale datasets, we show that LOG-TRAM can greatly improve the statistical power of identifying risk variants in under-represented populations while producing well-calibrated p values. We applied LOG-TRAM to the GWAS summary statistics of various complex traits/diseases from BioBank Japan, UK Biobank, and African populations. We obtained substantial gains in power and achieved effective correction of confounding biases in TRAM.
We develop a cross-population analysis framework for PRS construction with both individual-level (XPA) and summary-level (XPASS) GWAS data. By leveraging trans-ethnic genetic correlation, our methods can borrow information from biobank-scale European population data to improve risk prediction in non-European populations. With novel data structure and algorithm design, our methods are scalable to millions of samples and millions of genetic variants, providing substantial savings in computational time and memory usage.
Publication
2024
Xianghong Hu, Mingxuan Cai, Jiashun Xiao, Xiaomeng Wan, Zhiwei Wang, Hongyu Zhao, Can Yang. Benchmarking Mendelian Randomization methods for causal inference using genome-wide association study summary statistics. The American Journal of Human Genetics, to appear [medRxiv version][software and datasets][datasets on Zenodo]. We conducted a benchmark study evaluating 16 MR methods using real-world genetic datasets. Our study not only provides valuable insights into the performance and limitations of the compared methods but also offers practical guidance for researchers to choose appropriate MR methods for causal inference.
Yuheng Chen, Xin Xu, Xiaomeng Wan, Jiashun Xiao, Can Yang. UCS: a unified approach to cell segmentation for subcellular spatial transcriptomics. [preprint][software] We propose a unified approach to cell segmentation (UCS) for SST data obtained from diverse platforms, including 10X Xenium, NanoString CosMx, MERSCOPE, and Stereo-seq. UCS leverages deep learning techniques to achieve high accuracy in cell segmentation by integrating nuclei segmentation from staining images and transcript data.
Xin Xu, Tong Xiao, Zitong Chao, Zhenya Huang, Can Yang, Yang Wang. Can LLMs Solve longer Math Word Problems Better? [Arxiv version] We show that large language models (LLMs) perform worse when solving long math word problems (MWPs). To address this issue, we propose a new instructional prompt that mitigates the influence of long context for proprietary LLMs. We also propose a new data augmentation technique for open-source LLMs to enhance their performance in solving MWPs with longer context. This is joint work with Prof. Huang and Tong Xiao at USTC.
Xin Xu, Shizhe Diao, Can Yang, Yang Wang. Can We Verify Step by Step for Incorrect Answer Detection? [Arxiv] Chain-of-Thought (CoT) prompting enhances reasoning in large language models (LLMs). Previous studies focused on improving end-task performance and assessing reasoning chain quality. Can LLM accuracy be predicted by scrutinizing reasoning chains? We introduce R2PE, a benchmark exploring this relationship across five domains, measuring falsehood based on reasoning steps. The proposed process discernibility score (PDS) framework outperforms the answer-checking baseline significantly.
Yuyao Liu, Can Yang. Computational Methods for Alignment and Integration of Spatial Transcriptomics Data. Computational and Structural Biotechnology Journal [open access link] [software and datasets]. This mini-review paper provides an overview of the key concepts behind representative methods for slice alignment and integration of spatial transcriptomics data. We also present the testing results of these methods on diverse spatial transcriptomics datasets and evaluate their performance in significant downstream tasks. By understanding the strengths and weaknesses of each method, we aim to inspire and drive future advancements in this field.
Zhiwei Wang, Fa Zhang, Cong Zheng, Xianghong Hu, Mingxuan Cai, Can Yang. MFAI: A scalable Bayesian matrix factorization approach to leveraging auxiliary information. Journal of Computational and Graphical Statistics. [Arxiv version][software] We developed MFAI, a computationally scalable approach to leveraging auxiliary information in probabilistic matrix factorization by integrating gradient boosted trees. The parameters in MFAI can be automatically determined under the empirical Bayes framework, making it adaptive to the utilization of auxiliary information and immune to overfitting.
Collin Sakal, Tingyou Li, Juan Li, Can Yang, Xinyue Li. Association Between Sleep Efficiency Variability and Cognition Among Older Adults: Cross-Sectional Accelerometer Study. JMIR Aging 2024;7:e54353 [link]. This is collaborative research with Prof. Xinyue Li's team at CityU.
2023
Gefei Wang, Jia Zhao, Yan Yan, Yang Wang, Angela Ruohao Wu, Can Yang. Construction of a 3D whole organism spatial atlas by joint modeling of multiple slices with deep neural networks. Nature Machine Intelligence [link] [BioRxiv version][software][Video talk given by Gefei in Chinese] We developed STitch3D, a unified computational framework that integrates multiple 2D tissue slices to reconstruct 3D cellular structures from the tissue level to the whole organism level. With a novel model design based on deep neural networks, STitch3D can simultaneously identify 3D spatial regions with coherent gene expression levels and reveal 3D distributions of cell types, facilitating various downstream analyses of spatial transcriptomics data.
Mingxuan Cai, Zhiwei Wang, Jiashun Xiao, Xianghong Hu, Gang Chen, and Can Yang. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nature Communications. [link][BioRxiv version][software]. To address challenging issues in identifying causal genetic variants, we developed a statistical method for cross-population fine-mapping (XMAP) by leveraging genetic diversity and accounting for confounding bias. We have shown that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.
Xiaomeng Wan, Jiashun Xiao, Sindy Sing Ting Tam, Mingxuan Cai, Ryohichi Sugimura, Yang Wang, Xiang Wan, Zhixiang Lin, Angela Ruohao Wu, Can Yang. Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope. Nature Communications. [link][BioRxiv version][software][Video talk at Banff]. We developed a unified approach to integrating scRNA-seq reference data and spatial transcriptomics (ST) data that leverages deep generative models. With innovation in model and algorithm designs, SpatialScope not only enhances seq-based ST data to achieve single-cell resolution, but also accurately infers transcriptome-wide expression levels for image-based ST data.
Qing Shuai, Zhiyuan Yu, Zhize Zhou, Lixin Fan, Haijun Yang, Can Yang, Xiaowei Zhou. Reconstructing Close Human Interactions from Multiple Views. ACM Trans. Graph. 42, 6, Article 273. December 2023. [link] This paper addresses the challenging task of reconstructing the 3D poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras. This is joint work with Prof. Xiaowei Zhou's lab at ZJU.
Kaidong Wang, Yao Wang, Xiuwu Liao, Shaojie Tang, Can Yang, Deyu Meng. Provable Tensor Completion with Graph Information. [Arxiv] We introduce a framework to solve the dynamic graph regularized tensor completion problem.
Chen Li, Can Yang and Zhixiang Lin. stVAE deconvolves cell-type composition in cellular resolution spatial transcriptomics. Bioinformatics. 2023. [link][software] This is a collaborative work with Zhixiang Lin's lab at CUHK. stVAE is a probabilistic model with the auto-encoder structure for spatial transcriptomics deconvolution. It offers superior performance over related methods in terms of computational speed and statistical accuracy.
Xinyi Yu, Jiashun Xiao, Mingxuan Cai, Yuling Jiao, Xiang Wan, Jin Liu, and Can Yang. PALM: A powerful and adaptive latent model for prioritizing risk variants with functional annotations. Bioinformatics. [software] To prioritize risk variants in genome-wide association studies (GWASs), we developed a new method integrating gradient boosted trees and the expectation maximization algorithm, which is scalable to millions of genetic variants in GWASs.
2022
The Tabula Microcebus Consortium, Camille Ezran, Shixuan Liu, Jingsi Ming, Lisbeth A. Guethlein, Michael F.Z. Wang, Roozbeh Dehghannasiri, Julia Olivieri, Hannah K. Frank, Alexander Tarashansky, Winston Koh, Qiuyu Jing, Olga Botvinnik, Jane Antony, Stephen Chang, Angela Oliveira Pisco, Jim Karkanias, Can Yang, James E. Ferrell Jr., Scott D. Boyd, Peter Parham, Jonathan Z. Long, Bo Wang, Julia Salzman, Iwijn De Vlaminck, Angela Wu, Stephen R. Quake, Mark A. Krasnow. Mouse lemur transcriptomic atlas elucidates primate genes, physiology, disease, and evolution. [BioRxiv] Jingsi Ming from our group contributed greatly to The Tabula Microcebus Project. The HKUST team (with Angela Wu) joined The Tabula Microcebus Consortium and made a great contribution to this international project. Our FIRM method contributed to the real data analysis.
Yiming Chao, Yang Xiang, Jiashun Xiao, Shihui Zhang, Weizhong Zheng, Xiaomeng Wan, LI Zhuoxuan, Mingze Gao, Gefei Wang, Zhilin Chen, Mo Ebrahimkhani, Can Yang, Angela Ruohao Wu, Pentao Liu, Yuanhua Huang, Ryohichi Sugimura. Organoid-based single-cell spatiotemporal gene expression landscape of human embryonic development and hematopoiesis. Signal Transduction and Targeted Therapy. [BioRxiv][Published version]. 2022. This is a collaborative work with Prof. Ryohichi Sugimura's team at HKU. The manuscript and software of SpatialScope are available [manuscript][software].
Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, Tabula Microcebus Consortium, Angela Ruohao Wu, Can Yang. Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets. [Nature Computational Science][BioRxiv version 1, 2021][BioRxiv version 2, 2022][Software][Story behind the paper][Talk video given by Jia Zhao in Chinese: how the Portal method was created]. May 30, 2022.
Xianghong Hu, Jia Zhao, Zhixiang Lin, Yang Wang, Heng Peng, Hongyu Zhao, Xiang Wan, Can Yang. Mendelian Randomization for causal inference accounting for pleiotropy and sample structure using genome-wide summary statistics. Proceedings of the National Academy of Sciences [PNAS final version][BioRxiv version 1, 2021][BioRxiv version 2, 2022][MR-APSS software]. 2022.
Jiashun Xiao, Mingxuan Cai, Xinyi Yu, Xianghong Hu, Gang Chen, Xiang Wan, Can Yang. Leveraging the local genetic structure for trans-ancestry association mapping. The American Journal of Human Genetics [AJHG link][BioRxiv version][software]. 2022.
Qian Xu, Can Yang, Yu-Fang Pei. Editorial: Genetic Pleiotropy in Complex Traits and Diseases. Frontiers in Genetics.[link] 2022.
Jingsi Ming, Zhixiang Lin, Jia Zhao, Xiang Wan, Can Yang, Angela Ruohao Wu. FIRM: Flexible Integration of single-cell RNA-sequencing data for large-scale Multi-tissue cell atlas datasets. [Briefings in Bioinformatics][BioRxiv][FIRM software]. 2022.
Jiashun Xiao, Mingxuan Cai, Xianghong Hu, Gang Chen, Xiang Wan, and Can Yang. XPXP: Improving polygenic prediction by cross-population and cross-phenotype analysis. [XPXP software]. [link]. Bioinformatics, 2022.
Jingsi Ming, Jia Zhao, Can Yang. scPI: A scalable framework for probabilistic inference in single-cell RNA-sequencing data analysis. [scPI software][link]. Statistics in Biosciences, 2022. (Winner of the 2024 Statistics in Biosciences Best Paper Award)
Min Zhou, Mingwei Dai, Yuan Yao, Jin Liu, Can Yang, Heng Peng. BOLT-SSI: Fully Screening Interaction Effects for Ultra-High Dimensional Data. [Arxiv][Software][Statistica Sinica], 2022.
Hongzhao Fan, Can Yang, Yanguang Zhou. Ultralong mean free path phonons in HKUST-1 and their scatterings with water adsorbates. Physical Review B, 2022. [link]
2021
The Tabula Microcebus Consortium, Camille Ezran, Shixuan Liu, Stephen Chang, Jingsi Ming, Olga Botvinnik, Lolita Penland, Alexander Tarashansky, Antoine de Morree, Kyle J. Travaglini, Kazuteru Hasegawa, Hosu Sin, Rene Sit, Jennifer Okamoto, Rahul Sinha, Yue Zhang, Caitlin J. Karanewsky, Jozeph L. Pendleton, Maurizio Morri, Martine Perret, Fabienne Aujard, Lubert Stryer, Steven Artandi, Margaret Fuller, Irving L. Weissman, Thomas A. Rando, James E. Ferrell Jr., Bo Wang, Iwijn De Vlaminck, Can Yang, Kerriann M. Casey, Megan A. Albertelli, Angela Oliveira Pisco, Jim Karkanias, Norma Neff, Angela Wu, Stephen R. Quake, Mark A. Krasnow. Tabula Microcebus: A transcriptomic cell atlas of mouse lemur, an emerging primate model organism. [BioRxiv version 1][BioRxiv version 2][Project Website]. Jingsi Ming from our group contributed greatly to The Tabula Microcebus Project. The HKUST team (with Angela Wu) joined The Tabula Microcebus Consortium and made a great contribution to this international project.
Shixuan Liu, Camille Ezran, Michael F. Z. Wang, Zhengda Li, The Tabula Microcebus Consortium, Jonathan Z. Long, Iwijn De Vlaminck, Sheng Wang, Christin Kuo, Jacques Epelbaum, Jeremy Terrien, Mark A. Krasnow, James E. Ferrell, Jr. An organism-wide atlas of hormonal signaling based on the mouse lemur single-cell transcriptome. [BioRxiv][Project Website]. Angela Wu's group and our group (Jingsi Ming, Jia Zhao and Gefei Wang) joined The Tabula Microcebus Consortium and contributed to this international project.
Lin Hou, Qiongshi Lu, Can Yang, Hongyu Zhao. Special issue on genome-wide association study. Editorial. Quantitative Biology. 2021, 9 (2): 105-106.
Gefei Wang, Yuling Jiao, Qian Xu, Yang Wang, Can Yang. Deep Generative Learning via Schrodinger Bridge. [ICML][Arxiv][DGLSB Software]. PMLR 139:10794-10804, 2021.
Mingxuan Cai, Jiashun Xiao, Shunkang Zhang, Xiang Wan, Hongyu Zhao, Gang Chen, Can Yang. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. The American Journal of Human Genetics. [AJHG link][XPA software][XPASS software][Preprint Version][Supplementary Note]. 108, 632-655, April 2021.
Baolin Wu, Yixuan Ye, Wei Jiang, Yiliang Zhang, Leqi Xu, Yunan Wu, Lan Wang, Can Yang, Hongyu Zhao. A variance component mixture modeling approach for powerful Mendelian randomization analysis using GWAS summary data.
Boran Gao, Can Yang, Jin Liu, Xiang Zhou. Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies. PLoS Genetics. 2021. [link]
Baolin Wu, Yixuan Ye, Wei Jiang, Yiliang Zhang, Leqi Xu, Yunan Wu, Lan Wang, Can Yang, Hongyu Zhao. Incorporate horizontal pleiotropy for robust and efficient Mendelian Randomization inference using GWAS summary data.
Jian Huang, Yuling Jiao, Bangti Jin, Jin Liu, Xiliang Lu, Can Yang. A unified primal dual active set algorithm for nonconvex sparse recovery. Statistical Science, 36(2): 215-238 May 2021. [published version][Arxiv][Software].
2020
Xingjie Shi, Xiaoran Chai, Yi Yang, Qing Cheng, Yuling Jiao, Jian Huang, Can Yang, Jin Liu. A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies. Nucleic Acids Research [published version] [BioRxiv]. [Software] 2020.
Kang Kang, Xue Sun, Lizhong Wang, Xiaotian Yao, Senwei Tang, Junjie Deng, Xiaoli Wu, WeGene Research Team, Can Yang and Gang Chen. Direct-to-consumer genetic testing in China and its role in GWAS discovery and replication. Quantitative Biology. 2020
Zhongshang Yuan, Huanhuan Zhu, Ping Zeng, Sheng Yang, Shiquan Sun, Can Yang, Jin Liu, Xiang Zhou. Testing and controlling for horizontal pleiotropy with the probabilistic Mendelian randomization in transcriptome-wide association studies. [Nature Communications]. 2020.
Jiafa He, Chengwei Pan, Can Yang, Ming Zhang, Yang Wang, Yizhou Yu, Xiaowei Zhou. Learning Hybrid Representations for Automatic 3D Vessel Centerline Extraction. International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2020.
Mingxuan Cai, Lin Chen, Jin Liu, Can Yang. IGREX for quantifying the impact of genetically regulated expression on phenotypes. NAR Genomics and Bioinformatics. [Published version][BioRxiv][Software] 2020.
Qing Cheng, Yi Yang, Xingjie Shi, Kar-Fu Yeung, Can Yang, Heng Peng, Jin Liu. MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy. NAR Genomics and Bioinformatics. [Published version][BioRxiv]. 2020.
Jingsi Ming, Tao Wang, Can Yang. LPM: A latent probit model to characterize relationship among complex diseases using summary-statistics from multiple GWAS and functional annotations. Bioinformatics. [Published version][BioRxiv][Software]. 2020.
Jian Huang, Yuling Jiao, Jin Liu, Can Yang. REMI: Regression with marginal information and its application in genome-wide association studies. Statistica Sinica. 31 (2021), 1985-2004 [Arxiv][Software]
2019
Yi Yang, Xingjie Shi, Yuling Jiao, Jian Huang, Min Chen, Xiang Zhou, Lei Sun, Xinyi Lin, Can Yang, and Jin Liu. CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies. Bioinformatics. [BioRxiv]. 2019.
Xiangyu Luo, Can Yang, Yingying Wei. Detection of cell-type-specific risk-CpG sites in epigenome-wide association studies. Nature Communications, 10, 3113 . 2019. [Nature Communications][Software at Bioconductor]
Yuan Gao, Yuling Jiao, Yang Wang, Yao Wang, Can Yang, Shunkang Zhang. Deep Generative Learning via Variational Gradient Flow. International Conference on Machine Learning (ICML), PMLR 97:2093-2101, 2019. [ICML link][Software][Demo_code]
Tao Wang, Can Yang, Hongyu Zhao. Prediction analysis for microbiome sequencing data. Biometrics. [Journal Version][arXiv]
Xingjie Shi, Yuling Jiao, Yi Yang, Ching-Yu Cheng, Can Yang, Xinyi Lin, Jin Liu. VIMCO: Variational Inference for Multiple Correlated Outcomes in Genome-wide Association Studies. Bioinformatics. [link][Software]
Jia Zhao, Jingsi Ming, Xianghong Hu, Gang Chen, Jin Liu, Can Yang. Bayesian Weighted Mendelian Randomization for Causal Inference based on Summary Statistics. Bioinformatics. [link][Software][Code for Reproducibility]
Mingxuan Cai, Mingwei Dai, Jingsi Ming, Heng Peng, Jin Liu, Can Yang. BIVAS: A Scalable Bayesian Method for Bi-Level Variable Selection With Applications. Journal of Computational and Graphical Statistics. [Published version] [Arxiv] [Software][IMDB data]
2018
Can Yang, Xiang Wan, Xinyi Lin, Mengjie Chen, Xiang Zhou, Jin Liu. CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. Bioinformatics. [link][Software]
Mingwei Dai, Xiang Wan, Peng Hao, Yao Wang, Yue Liu, Jin Liu, Zongben Xu, and Can Yang. Joint analysis of Individual-level and summary-level GWAS data by leveraging pleiotropy. Bioinformatics. [link][Software]
Yi Yang, Mingwei Dai, Jian Huang, Xinyi Lin, Can Yang, Jin Liu, Min Chen. LPG: a four-groups probabilistic approach to leveraging pleiotropy in genome-wide association studies. BMC Genomics. [Link][Software]
Jingsi Ming, Mingwei Dai, Mingxuan Cai, Xiang Wan, Jin Liu, Can Yang. LSMM: A statistical approach to integrating functional annotations with genome-wide association studies. Bioinformatics. March, 2018. [Bioinformatics link][Software]
2017
C. Wu, C. Yang, H. Zhao, J. Zhu. On the convergence of the EM algorithm: from the statistical perspective. [arXiv]
Y. Hu, Q. Lu, R. Powles, X. Yao, C. Yang, F. Fang, X. Xu, H. Zhao. Leveraging Functional Annotations in Genetic Risk Prediction for Human Complex Diseases. PLoS Computational Biology. [PLoS Computational Biology link] [Biorxiv]
J. Liu, X. Wan, C. Wang, Ch. Yang, X. Zhou, C. Yang. LLR: A latent low-rank approach to colocalizing genetic risk variants in multiple GWAS. Bioinformatics. 2017. [Bioinformatics link]
M. Dai, J. Ming, M. Cai, J. Liu, C. Yang, X. Wan, and Z. Xu. IGESS: A statistical approach to integrating individual level genotype data and summary statistics in genome wide association studies. Bioinformatics. 2017. [Bioinformatics link][software]
Z. Lin, T. Wang, C. Yang, H. Zhao. On Joint estimation of Gaussian graphical models for spatial and temporal data. Biometrics, DOI: 10.1111/biom.12650. 2017. [Arxiv version][software]
2016
Z. Lin, C. Yang, Y. Zhu, J. Duchi, Y. Fu, Y. Wang, B. Jiang, M. Zamanighomi, X. Xu, M. Li, N. Sestan, H. Zhao, and W. Wong. Simultaneous dimension reduction and adjustment for confounding variation. Proceedings of the National Academy of Sciences. [PNAS version]. doi: 10.1073/pnas.1617317113, vol. 113 no. 51, 14662-14667. December 20, 2016. [software]
J. Liu*, C. Yang*, X. Shi, C. Li, J. Huang, H. Zhao and S. Ma. Analyzing Association Mapping in Pedigree-based GWAS Using A Penalized Multi-trait Mixed Model. Genetic Epidemiology. 2016. *Joint first author. [link] An Early version on [arXiv].
C. Yang, X. Wan, J. Liu, and M. Ng. Introduction to statistical methods for integrative data analysis in genome-wide association studies. Book Chapter. Big Data Analytics in Genomics, Springer. 2016. [link]
H. Liu, Y. Wang, C. Yang. Mathematical design of a novel gesture-based instruction/input device using wave detection. SIAM Journal on Imaging Sciences. 2016. [arxiv][SIAM version]
J. Liu, X. Wan, S. Ma, and C. Yang. EPS: An empirical Bayes approach to integrating pleiotropy and tissue-specific information for prioritizing risk genes. Bioinformatics. 2016.
J. Wu, Z. He, X. Liu, F. Gu, J. Zhou, C. Yang. Computing Exact Permutation p-Values for Association Rules. Information Sciences. 2016.
J. Jiang, C. Li, D. Paul, C. Yang, H. Zhao. On high dimensional misspecified mixed model analysis in genome-wide association studies. Annals of Statistics 2016, Vol. 44, No. 5, 2127-2160. [arXiv]
C. Yang, C. Li, D. Chung, M. Chen, J. Gelernter and H. Zhao. Introduction to statistical methods in genome-wide association studies. Book Chapter. Genome-Wide Association Studies From Polymorphism to Personalized Medicine, edited by Appasani K, Cambridge University Press. Jan. 2016. [Book link]
M. Chen, C. Yang, C. Li, H. Zhao. eQTL mapping in Genome-Wide Association Studies: From Polymorphism to Personalized Medicine. Book Chapter. Genome-Wide Association Studies From Polymorphism to Personalized Medicine, edited by Appasani K, Cambridge University Press. Jan. 2016. [Book link]
W. Cao, Y. Wang, J. Sun, D. Meng, C. Yang, A. Cichocki, Z. Xu. A Novel Tensor Robust PCA Approach for Background Subtraction from Compressive Measurements. [Arxiv]. IEEE Transactions on Image Processing. 2016.
2015
C. Yang, C. Li, Q. Wang, D. Chung, H. Zhao. Implications of pleiotropy: Challenges and opportunities for mining Big Data in Biomedicine. Frontiers in Genetics. 2015. [full text website][pdf]
R. Polimanti, C. Yang, H. Zhao, J. Gelernter. Dissecting ancestry genomic background in substance dependence genome-wide association studies. Pharmacogenomics. 2015. [link]
J. Liu, F. Wang, H. Zhang, X. Gao, and C. Yang. A penalized regression approach for integrative analysis in genome-wide association studies. Journal of Biometrics and Biostatistics. 2015.
Q. Wang*, C. Yang*, J. Gelernter, H. Zhao. Pervasive pleiotropy between psychiatric disorders and immune disorders revealed by integrative analysis of multiple GWAS. *Joint first author [BioRxiv]. Human Genetics. [pdf]. 2015. [Yale News]
C. Li, C. Yang, G. Hather, R. Liu and H. Zhao. Efficient drug-pathway association analysis via integrative penalized matrix decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015.
X. Zhou, C. Yang, H. Zhao and W. Yu. Low-Rank Modeling and Its Applications in Image Analysis. ACM Computing Surveys. Vol. 47, No. 2, Article 36, January 2015. [The Matlab code to produce the results presented in this paper]
W. Cao, Y. Wang, C. Yang, X. Chang, Z. Han, Z. Xu. Folded-concave penalization approaches to tensor completion. Neurocomputing. 152: 261–273, 2015. [pdf]
B. Teng, C. Yang, J. Liu, Z. Cai, X. Wan. Exploring the genetic patterns of complex diseases via the integrative genome-wide approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015. [Arxiv version]
C. Li, C. Yang, H. Zhao. Data integration and drug discovery: pathway-based approaches. Book Chapter. Integrating Omics Data, edited by G. Tseng, D. Ghosh, J. Zhou. Cambridge University Press. Sept. 2015. [Book link][pdf]
2014
D. Chung*, C. Yang*, C. Li, J. Gelernter and H. Zhao. GPA: A statistical approach to prioritizing GWAS results by integrating pleiotropy information and annotation data. PLoS Genetics, 2014. *Joint first authors. [Supporting information about GPA: Software and real data sets].
H. Zhang, F. Wang, C. Yang, H. Xu, Z. Wang, H. Zhao, J. Gelernter. Identification of methylation quantitative trait loci (mQTLs) influencing DNA methylation in the promoter regions of alcohol dependence risk genes. Human Genetics. 2014.
C. Yang, C. Li, M. Chen, X. Chen, L. Hou, and H. Zhao. A penalized linear mixed model for genomic prediction using pedigree structures. The Proceedings of Genetic Analysis Workshop 18. 2014.
C. Li, C. Yang, M. Chen, X. Chen, L. Hou, and H. Zhao. Adjustment of familial relatedness in association test for rare variants. The Proceedings of Genetic Analysis Workshop 18. 2014.
M. Chen, C. Yang, C. Li, L. Hou, X. Chen, and H. Zhao. Admixture mapping analysis in the context of GWAS with GAW18 data. The Proceedings of Genetic Analysis Workshop 18. 2014.
C. Yang, C. Li, H. Kranzler, L. Farrer, H. Zhao and J. Gelernter. Exploring the genetic architecture of alcohol dependence in African-Americans via analysis of a genomewide set of common variants. Human Genetics. 2014.
C. Li, C. Yang and H. Zhao. Improving genetic risk prediction by leveraging pleiotropy. Human Genetics. 2014.
An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Research, 2014.
2013
A. Smith, C. Yang, K. Jensen, R. Koesterer, R. Sherva, H. Zhao, H. Kranzler, and J. Gelernter. Convergent genomic, epigenomic, and transcriptomic evidence identify a functional locus that regulates EXOC7 in human brain and associates with alcohol dependence. Submitted. 2013.
C. Yang, L. Wan, S. Zhang and H. Zhao. Accounting for Non-Genetic Factors by Low-Rank Representation and Sparse Regression for eQTL Mapping. Bioinformatics. 2013. [The yeast data set used in this paper]
X. Zhou, C. Yang, X. Wan, H. Zhao and W. Yu. Multi-sample aCGH Data Analysis via Total Variation and Spectral Regularization. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 10 (1): 230-235. [software]
X. Wan, C. Yang, Q. Yang, H. Zhao and W. Yu. HapBoost: A fast approach to boosting haplotype association analyses in genome-wide association studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 10 (1):207-212, 2013.
J. Ferguson, J. Cho, C. Yang, H. Zhao. Empirical Bayes correction for the Winner’s Curse in Genetic Association Studies. Genetic Epidemiology. 37(1):60-68, 2013.
X. Wan, C. Yang, Q. Yang, H. Zhao and W. Yu. The complete compositional epistasis detection in genome-wide association studies. BMC Genetics, Feb., 2013.
T. Hang, H. Gong, C. Yang and Z. He. ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics. Computational Biology and Chemistry. 43: 46–54, 2013.
X. Zhou, C. Yang and W. Yu. Moving objects segmentation by detecting contiguous outlier in low-rank representation. IEEE Trans. on Pattern Analysis and Machine Intelligence. 35(3): 597-610, 2013. [software]
P. Xie, H. Kranzler, C. Yang, H. Zhao, L. Farrer, J. Gelernter. Genome-wide association study identifies new susceptibility loci for posttraumatic stress disorder. Biological Psychiatry. 74(9): 656-663, 2013.
2012
X. Zhou, C. Yang and W. Yu. Automatic Mitral Leaflet Tracking in Echocardiography by Outlier Detection in the Low-rank Representation. The 25th Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, USA, June 16-21. [software]. 2012.
C. Yang, Z. He, C. Yang and W. Yu. Peptide re-ranking with protein-peptide correspondence and precursor peak intensity information. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 9(4):1212-1219, 2012.
X. Wan*, C. Yang* and W. Yu. Comments on ‘An empirical comparison of several recent epistatic interaction detection methods’. Bioinformatics. 28(1):145-146, 2012. *Joint first author.
2011
C. Yang and X. Zhou. Blockwise coordinate descent for stable principal component pursuit (done in Feb. 2011, posted on August 4, 2011).
C. Yang, X. Zhou, X. Wan, Q. Yang, H. Xue and W. Yu. Identifying disease-associated SNP clusters via contiguous outlier detection, Bioinformatics, 27(18):2578-2585, 2011.
C. Yang, X. Wan, Q. Yang, H. Xue, N. Tang and W. Yu. A hidden two-locus disease association pattern in genome-wide association studies, BMC Bioinformatics, 12:156, 2011.
L.S. Yung, C. Yang, X. Wan, and W. Yu. GBOOST: A GPU-based tool for detecting gene-gene interactions in genome-wide case control studies, Bioinformatics, 27(9):1309-1310, 2011.
Z. He, C. Yang and W. Yu. A partial set covering model for protein mixture identification using mass spectrometry data, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 8(2), 368-380, 2011.
Z. He, C. Yang, G. Guo, N. Li and W. Yu. Motif-All: Discovering all phosphorylation motifs, BMC Bioinformatics, 12(S1):S3, 2011.
C. Yang, X. Wan, Z. He, Q. Yang, and W. Yu. The choice of null distributions for detecting gene-gene interactions in genome-wide association studies, BMC Bioinformatics, 12(S1):S26, 2011.
C. Yang. SNP data analysis in genome-wide association studies. Ph.D. Thesis, Hong Kong University of Science and Technology, 2011. (Winner of the 2012 Hong Kong Young Scientist Award in Engineering Science)
2010
X. Wan*, C. Yang*, Q. Yang, H. Xue, X. Fan, N. Tang and W. Yu. BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies. The American Journal of Human Genetics, 87(3):325-340, 2010. *Joint first authors. [software][talk] (The most popular GWAS data analysis toolkit PLINK has incorporated BOOST since its 1.90 Version)
X. Wan*, C. Yang*, Q. Yang, H. Xue, N. Tang and W. Yu. Detecting two-locus associations allowing for interactions in genome-wide association studies. Bioinformatics, 26(20):2517-2525, 2010. *Joint first authors.
C. Yang, C. Yang and W. Yu. A regularized regression method for peptide quantification. Journal of Proteome Research, 9(5):2705-2712, 2010.
Z. He, C. Yang, C. Yang and W. Yu. Optimization-based peptide mass fingerprinting for protein mixture identification, Journal of Computational Biology, 17(3):221-235, 2010.
C. Yang, X. Wan, Q. Yang, H. Xue and W. Yu. Identifying main effects and epistatic interactions from large-scale SNP Data via adaptive group Lasso, BMC Bioinformatics, 11(S1):S18, 2010.
X. Wan, C. Yang, Q. Yang, H. Xue, N. L. S. Tang and W. Yu. Predictive rule inference for epistatic interaction detection in genome-wide association studies, Bioinformatics, 26(1):30-37, 2010.
2009
C. Yang, Z. He, X. Wan, Q. Yang, H. Xue and W. Yu. SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies. Bioinformatics, 25(4):504-511, 2009.
X. Wan, C. Yang, Q. Yang, H. Xue, N. L. S. Tang and W. Yu. MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study, BMC Bioinformatics, 10(1):13, 2009.
2008
Z. He, C. Yang and W. Yu. Peak bagging for peptide mass fingerprinting, Bioinformatics 24(10): 1293-1299, 2008.
C. Yang, J. Meng, S. Zhu and M. Dai. Model free data mining, Data Mining and Knowledge Discovery Technologies, Chapter X, Book chapter, 2008.
2007
W. Kong, S. Zhu and C. Yang. GPC algorithm and queuing-selecting for networked level Control, the IEEE International Conference on Control and Automation Guangzhou, China, 2007.
2006
C. Yang, J. Meng and S. Zhu. Cluster-based input selection for transparent fuzzy modeling, International Journal of Data Warehousing and Mining, 2(3), 57-75, 2006.
C. Yang, S. Zhu, W. Kong and L. Lu. Application of generalized predictive control in networked control system, Journal of Zhejiang University SCIENCE, 7(2), 225-233, 2006.
J. Meng and C. Yang. The research of multi-variables hierarchical fuzzy decouple control strategy based on human knowledge, the IEEE 6th World Congress on Intelligent Control and Automation, Dalian, China, 2006.
2005
C. Yang, S. Zhu, J. Meng and L. Lu. Transparent fuzzy modeling based on minimum cluster volume, the IEEE Fifth International Conference on Control and Automation, Budapest, Hungary, 2005.
C. Yang and J. Meng. Optimal fuzzy modeling based on minimum cluster volume, the First International Conference on Advanced Data Mining and Applications, Wuhan, China, 2005.