Research Interests
I develop and apply statistical inference and learning methods that enable reliable analysis of observational health data. My applied work has spanned electronic health records (EHR) data, mobile device data (both actively and passively collected), and large-scale biobank data (both genetic and clinical). My methodological work is inspired by my decade of experience working with EHR data and considers settings with extreme amounts of missing data, complex measurement error or misclassification, bias, and data heterogeneity. A full list of my publications can be found on Google Scholar.
Among the questions I currently find compelling are:
Semi-supervised model evaluation. In settings where labeled data is difficult to obtain, when and how can we make use of large amounts of unlabeled or noisily labeled data to precisely evaluate a prediction model's performance? [paper 1, paper 2, paper 3]
Post-prediction inference. When outcomes and/or covariates are generated from potentially complex prediction models, how can we develop simple and computationally efficient procedures for valid statistical inference? [paper]
Exact inference for meta-analysis. How can we perform meta-analysis when few studies/sites are available, particularly when heterogeneity across studies/sites is present? [paper, preprint]
Applications in medical informatics. What is the state of phenotyping in EHR-based research? [paper] How can we expedite and improve the accuracy of EHR-based studies with semi-supervised and weakly-supervised learning? [paper 1, paper 2, paper 3]
Research Support
I am grateful for funding from NSERC, CIHR, CANSSI, the Connaught Fund, the McLaughlin Centre, the University of Toronto Data Science Institute, and the Ontario Ministry of Health.