Research

Research theme I: Machine learning (including deep learning) has been successful in many fields with very large samples, but has yet to be extended to fields with moderate or small samples. Medical genomics is a typical field with high-dimensional data but limited labelled samples. By utilizing larger unlabelled samples, we conduct representation learning, which learns sensible representations of genomic data and paves the way for downstream analysis of a focal disease with small samples. This research enables powerful statistical learning in fields with small samples, in particular biological and medical applications.

Small sample (Picture from Math with Bad Drawings)

Research theme II: Association mining and causal inference are critical techniques in statistics. In biology, many applications involve complex structures with multi-scale big data, including DNA, RNA, protein, and epigenetic marks. We develop novel statistical models and their scalable implementations to discover associations and causal factors in multi-scale data. This research allows the prediction of important biological or medical properties such as the risk of disease and response to treatments.

Research theme III: Statistical inference based on noisy and biased data is challenging, yet is frequently required in practice. In particular, the emerging single-cell sequencing technology provides an unprecedented opportunity to analyze biological phenomena at single-cell resolution, but still suffers from significant noise and experimental bias due to immature experimental instruments. We develop novel algorithms to mine sensible knowledge despite noise and bias in the data. Our statistical models will bridge the gap between the capabilities of state-of-the-art sequencing instruments and ambitious biological applications.

Single-cell RNA-Seq data (Picture from Panoli's article at towardsdatascience.com)

Selected Works: (My trainees are underlined; * = joint first authors; # = corresponding author(s))

Statistical method development

Wang D, Zhang Q#. (2025) “Decoding Omics via Representation Learning”, Nature Computational Science. In press.
Bica I, Trang R, Hu R, Su W, Zhai Z, Zhang Q (2025). “Learning Image Derived PDE-Phenotypes from fMRI Data”. Brain Informatics. In press.
He J, Li Q, Zhang Q# (2024) “rvTWAS: identifying gene-trait association using sequences by utilizing transcriptome-directed feature selection”. Genetics. 2024 Feb 07:iyad204. doi: 10.1093/genetics/iyad204. (Software)
Wang D*, Perera D*, He J*, Cao C, Kossinna P, Li Q, Zhang W, Guo X, Alexander P, Wu J, Zhang Q#. (2023) “cLD: Rare-variant linkage disequilibrium between genomic regions identifies novel genomic interactions”. PLoS Genetics. 2023 Dec 18;19(12):e1011074. doi: 10.1371/journal.pgen.1011074. (Software)
Li Q, Yu Y, Kossinna P, Lun T, Liao W#, Zhang Q#. (2023) “XA4C: eXplainable representation learning via Autoencoders revealing Critical genes”. PLoS Computational Biology. 2023 Oct 2;19(10):e1011476. doi: 10.1371/journal.pcbi.1011476. PMID: 37782668 (Software)
Kossinna P, Cai W, Shemanko C, Lu X, Zhang Q#. (2022) “Stabilized COre gene and Pathway Election uncovers pan-cancer shared pathways and a cancer specific driver”. Science Advances. 2022 Dec 21;8(51):eabo2846. doi: 10.1126/sciadv.abo2846. PMID: 36542714 (Software)
Cao C, Kossinna P, Kwok D, Li Q, He J, Su L, Guo X, Zhang Q#, Long Q#. (2022) “Disentangling genetic feature selection and aggregation in transcriptome-wide association studies”. Genetics (Cover Feature). 2022 Feb 4;220(2):iyab216. doi: 10.1093/genetics/iyab216. PMID: 34849857. (Software)
Zhang Q, Tyler-Smith C, Long Q (2015). “An extended Tajima’s D neutrality test incorporating SNP calling and imputation uncertainties”. Statistics and Its Interface. 2015, vol.8(4), 447-456.
Zhang Q, Long Q, Ott J (2014). “AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects”. PLoS Computational Biology, Jun 5; 10(6). (Software)

Data analysis

Wang L, Guo Q, Acharya S, Zheng X, Huynh V, Whitmore B, Yimit A, Malhotra M, Chatterji S, Rosin N, Labit E, Chipak C, Gorzo K, Haidey J, Elliott DA, Ram T, Zhang Q, Kuipers H, Gordon G, Biernaskie J, Guo J. (2024) “Primary cilia signaling in astrocytes mediates development and regional-specific functional specification.” Nature Neuroscience. 2024 Aug 5.
Guo X, Ping J, Yang Y, Shu X-O, Wen W, Chen Z, Tao R, Jia G, He, Cai Q, Zhang Q, Giles G, Pearlman R, Rennert G, Vodicka P, Phipps A, Gruber S, Casey G, Peters U, Long J, Zheng W. (2024) “Large-scale alternative polyadenylation (APA)-wide association studies to identify putative susceptibility genes in human common cancers”. Cancer Research 2024 Aug 15;84(16):2707-2719.
Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, Zhang Q, Vilhjálmsson BJ, Korte A, Nizhynska V, Voronin V, Korte P, Sedman L, Mandáková T, Lysak MA, Seren U, Hellmann I, Nordborg M (2013). “Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden”. Nature Genetics, 45(8): 884-90.
Zhang, Q as one of the listed participants of the International HapMap 3 Consortium. (2010) “Integrating common and rare genetic variation in diverse human populations”. Nature 467(7311): 52-8.
Zhang, Q as one of the listed participants of the International HapMap Consortium. (2007) “A second generation human haplotype map of over 3.1 million SNPs”. Nature 449(7164): 851-61.
Sun T, Gao Y, Tan W, Ma S, Shi Y, Yao J, Guo Y, Yang M, Zhang X, Zhang Q, Zeng C & Lin D. (2007) “A six-nucleotide insertion-deletion polymorphism in the CASP8 promoter is associated with susceptibility to multiple cancers”. Nature Genetics 39: 605-613
Zhang, Q as one of the listed participants of the International HapMap Consortium. (2005) “A Haplotype Map of the Human Genome”. Nature 437: 1299-1320
Zhang, Q as one of the listed participants of the International HapMap Consortium. (2003) “The International HapMap project”. Nature 426: 789-796

Full Publications: Please refer to my Google Scholar site for the latest list and citation reports.

Tools: Please refer to my GitHub for the software developed in my research group.

Acknowledgement: Our research is supported by national, provincial, and institutional grants.
