Research

Research theme I: Machine learning (including deep learning) has been successful in many fields with very large samples, but has yet to be extended to fields with moderate or small samples. Medical genomics is a typical field with high-dimensional data but limited labelled samples. By utilizing larger unlabelled samples, we conduct representation learning, which learns sensible representations of genomic data and paves the way for downstream analysis of a focal disease with small samples. This research enables powerful statistical learning in fields with small samples, in particular biological and medical applications.

Small sample (Picture from Math with Bad Drawings)

Research theme II: Association mining and causal inference are critical techniques in statistics. In biology, many applications involve complex structures with multi-scale big data, including DNA, RNA, protein, and epigenetic marks. We develop novel statistical models and their scalable implementations to discover associations and causal factors in multi-scale data. This research allows the prediction of important biological or medical properties such as the risk of disease and response to treatments.

Research theme III: Statistical inference based on noisy and biased data is challenging, yet frequently encountered in practice. In particular, the emerging single-cell sequencing technology provides an unprecedented opportunity to analyze biological phenomena at single-cell resolution, but still suffers from significant noise and experimental bias due to immature experimental instruments. We develop novel algorithms to mine sensible knowledge despite noise and bias in the data. Our statistical models will bridge the gap between the capabilities of state-of-the-art sequencing instruments and ambitious biological applications.

Single-cell RNA-Seq data (Picture from Panoli's article at towardsdatascience.com) 

Selected Works: (My trainees are underlined; * = joint first authors; # = corresponding author(s))

Statistical method development


Data analysis

Full Publications: Please refer to my Google Scholar site for the latest list and citation reports.

Tools: Please refer to my GitHub for the software developed in my research group.

Acknowledgement: Our research is supported by national/provincial/institutional grants: