My methodological research centers on developing flexible, semiparametrically efficient statistical tools for complex, high-dimensional biomedical data. It is organized around three research programs.
Research Program 1: Distance and Kernel-Based Semiparametric Framework
High-dimensional measurements from microbiome sequencing, neuroimaging, and wearable devices often resist direct analysis by classical regression models designed for scalar or low-dimensional outcomes. I develop a distance-based semiparametric framework that replaces individual feature vectors with pairwise between-subject distance (or dissimilarity) matrices, enabling flexible modeling of beta-diversity and other aggregate outcomes while preserving statistical efficiency. This framework is the unifying foundation for the four areas below: theory, microbiome data, neuroimaging data, and wearable-device data.
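To make the idea of pairwise outcomes concrete, here is a minimal sketch, assuming a linear mean model E[d_ij] = beta0 + beta1 |x_i - x_j| for the Euclidean distance d_ij between subjects i and j and a scalar covariate x. The function name and the model form are illustrative assumptions, not the estimators from the publications below. Fitting least squares over all unordered pairs solves a simple U-statistic-type estimating equation.

```python
import numpy as np
from itertools import combinations


def pairwise_distance_regression(features, covariate):
    """Fit E[d_ij] = beta0 + beta1 * |x_i - x_j| over all subject pairs i < j.

    Hypothetical illustration. features: (n_subjects, n_features) array;
    covariate: (n_subjects,) scalar covariate. Returns (beta0, beta1) only.
    """
    features = np.asarray(features, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    pairs = list(combinations(range(features.shape[0]), 2))
    # Pairwise outcome: Euclidean distance between two subjects' feature vectors
    d = np.array([np.linalg.norm(features[i] - features[j]) for i, j in pairs])
    # Pairwise covariate: absolute difference of the scalar covariate
    z = np.array([abs(covariate[i] - covariate[j]) for i, j in pairs])
    design = np.column_stack([np.ones_like(z), z])
    # Least squares over all pairs solves sum_{i<j} z_ij * (d_ij - z_ij' beta) = 0,
    # where z_ij = (1, |x_i - x_j|)
    beta, *_ = np.linalg.lstsq(design, d, rcond=None)
    return beta


# Usage on simulated data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
features = np.outer(x, [0.6, 0.8]) + rng.normal(scale=0.1, size=(50, 2))
print(pairwise_distance_regression(features, x))
```

Only the point estimate is returned. Ordinary least-squares standard errors would be invalid here because pairs that share a subject are correlated; valid inference needs U-statistic (sandwich-type) variance estimation.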
The first line of work establishes the theoretical foundations of the distance-based semiparametric framework, deriving semiparametric efficiency bounds and showing that the proposed estimators are asymptotically optimal within their class. These results provide rigorous justification for applying distance-based models broadly across data types.
Selected Publications:
Liu J, Zhang X, Lin T, Chen R, Zhong Y, Chen T, Wu T, Liu C, Huang A, Nguyen TT, Lee EE, Jeste DV, Tu XM. A New Paradigm for High-dimensional Data: Distance-Based Semiparametric Feature Aggregation Framework via Between-Subject Attributes. Scandinavian Journal of Statistics. 2024; 51(2):672–696. PMID: 39101047. DOI: 10.1111/sjos.12695
Liu J, Lin T, Chen T, Zhang X, & Tu XM. On semiparametric efficiency of an emerging class of regression models for between-subject attributes. arXiv:2205.08036. https://doi.org/10.48550/arXiv.2205.08036
Microbiome sequencing data are high-dimensional, compositional, and often dominated by sparse counts, which makes standard regression approaches ill-suited. I apply and extend the distance-based framework to model beta-diversity in cross-sectional and longitudinal microbiome studies, including settings with missing data and ensemble estimation.
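Beta-diversity is typically summarized as a sample-by-sample dissimilarity matrix. The sketch below computes Bray-Curtis dissimilarity, one common choice; the source does not say which dissimilarity each study uses, and the function name and the relative-abundance option are assumptions for illustration. The resulting matrix entries d_ij are the kind of pairwise outcome the distance-based framework models.

```python
import numpy as np


def bray_curtis_matrix(counts, relative=True):
    """Bray-Curtis dissimilarity between all pairs of samples (hypothetical helper).

    counts: (n_samples, n_taxa) nonnegative read counts; every sample is
    assumed to have at least one nonzero count.
    """
    counts = np.asarray(counts, dtype=float)
    if relative:
        # Convert to proportions so sequencing depth does not drive the distances
        counts = counts / counts.sum(axis=1, keepdims=True)
    # Broadcast to (n, n, n_taxa), then sum over taxa
    abs_diff = np.abs(counts[:, None, :] - counts[None, :, :]).sum(axis=2)
    total = (counts[:, None, :] + counts[None, :, :]).sum(axis=2)
    return abs_diff / total


D = bray_curtis_matrix([[10, 0, 0], [0, 10, 0], [5, 5, 0]])
print(D)
```

SciPy offers the same dissimilarity through `scipy.spatial.distance.pdist(X, "braycurtis")`; the broadcast version above just keeps the example dependency-free.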
Selected Publications:
Liu J, Zhang X, Chen T, Wu T, Lin T, Jiang L, Lang S, Liu L, Natarajan L, Tu JX, Kosciolek T, Morton J, Nguyen TT, Schnabl B, Knight R, Feng C, Zhong Y, Tu XM. A semiparametric model for between-subject attributes: Applications to beta-diversity of microbiome data. Biometrics. 2022; 78(3):950–962. PMID: 34010477. DOI: 10.1111/biom.13487
Liu J, Xu K, Ferguson J, Kang K, et al. Distance-based semiparametric regression for between-subject outcomes in longitudinal microbiome studies with missing data. Under Revision, Statistics in Medicine (2025).
Liu J, Xu K, Wu T, Yao L, Nguyen TT, Jeste D, Zhang X. Deciphering the 'gut-brain axis' through microbiome diversity. General Psychiatry. 2023; 36(5):e101090. PMID: 37920405.
Neuroimaging data from multi-modal and multi-region acquisitions present particular challenges for statistical integration. I extend the distance-based framework to accommodate structured multi-modal, multi-region dissimilarity matrices, with applications to resting-state fMRI and structural MRI in psychosis research.
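One way to picture "structured multi-modal, multi-region dissimilarity matrices" is as a stack of subject-by-subject distance matrices indexed by (modality, region). The sketch below builds such a stack, assuming Euclidean distance on standardized features within each block; the block layout, names, and distance choice are illustrative assumptions, and how the published model combines the blocks is not shown.

```python
import numpy as np


def block_distance_stack(blocks):
    """Stack between-subject distance matrices, one per (modality, region) block.

    blocks: dict mapping (modality, region) -> (n_subjects, n_features) array,
            with subjects in the same row order in every block (assumed).
    Returns the sorted block keys and an array of shape (n_blocks, n, n).
    """
    keys = sorted(blocks)
    mats = []
    for key in keys:
        X = np.asarray(blocks[key], dtype=float)
        # Standardize each feature so modalities on different scales are comparable
        sd = X.std(axis=0)
        sd[sd == 0] = 1.0
        X = (X - X.mean(axis=0)) / sd
        diff = X[:, None, :] - X[None, :, :]
        mats.append(np.sqrt((diff ** 2).sum(axis=2)))
    return keys, np.stack(mats)


rng = np.random.default_rng(0)
blocks = {
    ("fMRI", "region_A"): rng.normal(size=(20, 30)),
    ("sMRI", "region_A"): rng.normal(size=(20, 5)),
    ("sMRI", "region_B"): rng.normal(size=(20, 8)),
}
keys, D = block_distance_stack(blocks)
print(keys, D.shape)  # D has shape (3, 20, 20)
```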
Selected Publications:
Zhang X, Vandekar S, Chen AA, Kang K, Seidlitz J, Alexander-Bloch A, & Liu J*. Multi-modal and multi-region distance model for neuroimaging data. Under Review (2026). bioRxiv
Actigraphy and other wearable-device data generate high-frequency, within-subject time series that are difficult to summarize without loss of information. I develop distance-based approaches to characterize both between- and within-subject variability in wearable data, with applications to sleep, physical activity, and rest-activity rhythms in psychiatric populations.
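As a hedged illustration of separating within- from between-subject variability with distances, the sketch below assumes each subject's data are arranged as a days-by-hourly-bins activity matrix (an assumed layout) and uses squared Euclidean distance. Within-subject dispersion is the mean squared distance between pairs of days; between-subject distance compares average daily profiles. These summaries are illustrative, not the estimators in the cited preprint.

```python
import numpy as np
from itertools import combinations


def within_subject_dispersion(days):
    """Mean squared Euclidean distance over all pairs of days for one subject.

    days: (n_days, n_bins) array, e.g. hourly activity per day; needs n_days >= 2.
    """
    days = np.asarray(days, dtype=float)
    sq = [np.sum((days[i] - days[j]) ** 2) for i, j in combinations(range(len(days)), 2)]
    return float(np.mean(sq))


def between_subject_distance(days_a, days_b):
    """Squared Euclidean distance between two subjects' average daily profiles."""
    mean_a = np.asarray(days_a, dtype=float).mean(axis=0)
    mean_b = np.asarray(days_b, dtype=float).mean(axis=0)
    return float(np.sum((mean_a - mean_b) ** 2))


rng = np.random.default_rng(0)
stable = rng.normal(loc=1.0, scale=0.1, size=(14, 24))     # 14 days x 24 hourly bins
irregular = rng.normal(loc=1.0, scale=0.8, size=(14, 24))
print(within_subject_dispersion(stable), within_subject_dispersion(irregular))
print(between_subject_distance(stable, irregular))
```

With squared Euclidean distance, the mean pairwise distance across days equals twice the sum of per-bin sample variances, so this view agrees with a classical variance decomposition; other distances (for example, ones tailored to circadian shape) would give different summaries.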
Selected Publications:
Liu J, Cai H, Zhang Z, Wang J, Lee E, & Zhang X. Between- and within-subject variability in actigraphy data: A case study on sleep patterns in schizophrenia. bioRxiv (2026). https://doi.org/10.64898/2026.02.06.704499
Research Program 2: Causal Inference & Mediation
Figure: causal mediation diagram, with an indirect path X → M → Y and a direct path X → Y.
Observational biomedical studies increasingly seek not only to identify associations but also to draw causal conclusions, such as whether microbiome composition mediates the relationship between a treatment and a mental health outcome. Classical causal inference methods, however, typically assume low-dimensional confounders or sparse effect structures, assumptions that rarely hold in multi-omics or neuroimaging data. I develop causal inference and mediation frameworks that leverage feature aggregation and machine-learning debiasing to enable valid causal conclusions from high-dimensional, non-sparse data.
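To show what machine-learning debiasing looks like in a mediation setting, here is a minimal sketch of the cross-fitted partialling-out idea from double machine learning (Chernozhukov et al., 2018), applied separately to exposure X, mediator M, and outcome Y given confounders W. It assumes linear structural equations with no exposure-mediator interaction, uses closed-form ridge as a stand-in for a generic learner, and is not the dual-orthogonality method described in the publications below. The indirect effect is the product a*b of the residual-on-residual slopes.

```python
import numpy as np


def ridge_predict(W_train, y_train, W_test, lam=1.0):
    """Closed-form ridge regression (intercept left unpenalized via centering)."""
    w_mean, y_mean = W_train.mean(axis=0), y_train.mean()
    Wc = W_train - w_mean
    coef = np.linalg.solve(Wc.T @ Wc + lam * np.eye(Wc.shape[1]), Wc.T @ (y_train - y_mean))
    return y_mean + (W_test - w_mean) @ coef


def crossfit_residuals(W, v, folds, lam=1.0):
    """Residuals v - E_hat[v | W]; each fold is predicted by a model fit on the other folds."""
    resid = np.empty(len(v))
    for k in np.unique(folds):
        test = folds == k
        resid[test] = v[test] - ridge_predict(W[~test], v[~test], W[test], lam)
    return resid


def partialled_out_mediation(X, M, Y, W, n_folds=2, lam=1.0, seed=0):
    """Indirect (a*b) and direct effects in a partially linear mediation model."""
    folds = np.random.default_rng(seed).permutation(len(Y)) % n_folds
    rx, rm, ry = (crossfit_residuals(W, v, folds, lam) for v in (X, M, Y))
    a = (rx @ rm) / (rx @ rx)  # exposure -> mediator slope
    # Outcome on residualized mediator and exposure: ry ~ b * rm + direct * rx
    b, direct = np.linalg.lstsq(np.column_stack([rm, rx]), ry, rcond=None)[0]
    return {"a": a, "b": b, "direct": direct, "indirect": a * b}


# Simulated example: true a*b = 0.5 * 0.8 = 0.4, true direct effect = 0.3
rng = np.random.default_rng(1)
n, p = 2000, 50
W = rng.normal(size=(n, p))
X = W[:, 0] + rng.normal(size=n)
M = 0.5 * X + W[:, 1] + rng.normal(size=n)
Y = 0.3 * X + 0.8 * M + W[:, 0] - W[:, 2] + rng.normal(size=n)
est = partialled_out_mediation(X, M, Y, W)
print(est)
```

Cross-fitting keeps each residual out-of-sample with respect to its nuisance model, which is what limits overfitting bias when W is high-dimensional; standard errors are not computed here.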
Selected Publications:
Liu J, Zhang X, Xu K, Mei Y, Taylor M, et al. Causal inference for high-dimensional measurements: feature aggregation and dual orthogonality. Under Review (2025).
Chen M, Nguyen TT, & Liu J*. High-dimensional Confounding in Causal Mediation: A Comparison Study of Double Machine Learning and Regularized Partial Correlation Network. Journal of Data Science. 2025. DOI: 10.6339/25-JDS1169
Wu H, Shao L, Gui T, Wu T, Huang Z, Tu S, Tu X, Liu J*, & Lin T. Why Is the Double-Robust Estimator for Causal Inference Not Doubly Robust for Variance Estimation? arXiv:2511.17907 (2025).
Yao L, et al. Distance-based causal mediation analysis for high-dimensional data. In preparation (2026).
Research Program 3: Psychometrics
Psychiatric and clinical research depends on validated measurement instruments, such as symptom scales, cognitive batteries, and behavioral assessments, whose statistical properties are often under-scrutinized. I develop and apply psychometric methods that integrate classical item response theory (IRT) with modern network-based approaches, with the goal of producing shorter, better-validated scales that are practical for clinical use and sensitive to the latent structure of psychiatric constructs.
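One IRT ingredient of short-form construction is item information: items that are most informative across the plausible trait range are natural candidates to keep. The sketch below assumes a two-parameter logistic (2PL) model for binary items, with hypothetical discrimination a and location b parameters, and ranks items by information averaged over a standard normal trait distribution. Polytomous symptom scales would usually call for a graded response model instead, and the network-psychometric part of the approach is not shown.

```python
import numpy as np


def p_2pl(theta, a, b):
    """2PL item response function: P(endorse | theta) = 1 / (1 + exp(-a (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at latent level theta: a^2 P (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)


def select_short_form(a, b, k, grid=None):
    """Rank items by expected information under a N(0, 1) trait; keep the top k."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    grid = np.linspace(-4, 4, 161) if grid is None else grid
    weights = np.exp(-0.5 * grid ** 2)
    weights /= weights.sum()
    info = item_information(grid[:, None], a[None, :], b[None, :])  # (grid points, items)
    expected = weights @ info
    return np.argsort(expected)[::-1][:k]


# Hypothetical item parameters for a 4-item pool
print(select_short_form(a=[0.5, 2.0, 1.0, 1.5], b=[0.0, 0.0, 0.5, -1.0], k=2))
```

Averaging over a normal trait distribution favors items informative near the population center; a clinical short form targeting high-severity respondents might instead weight the upper end of the trait range.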
Selected Publications:
Liu J, Womer F, Sheffield J, Armstrong K, McGonigle T, et al. Distress as a bridge to suicidality in schizophrenia spectrum disorders: A network-based intervention simulation study. (Preprint, 2026) https://doi.org/10.21203/rs.3.rs-8855938/v1
Liu J, Chen M, Wu H, Cai H, Tu S, Lee E, Zhang X. An effective short form of the 20-item UCLA Loneliness Scale version 3: item response theory and network psychometrics. General Psychiatry. 2025; 38(4):e102055. PMID: 40667491.
Liu J. Book review for Computational Aspects of Psychometric Methods with R by Patricia Martinková and Adéla Hladká, Chapman & Hall/CRC, 2023. Biometrics. 2025. https://doi.org/10.1093/biomtc/ujaf132