Statistics & Theoretical Foundations of Data Science
Inference for Large Random Matrices & Spectral Methods
Manifold Learning, Geometric Inference & Data Integration
High-Dimensional Regression
Computational Biology & Biomedical Data Science
Integrative Single-Cell Analysis
Spatial Omics
[1] Ding, X., and Ma, R. (2025+) Kernel Spectral Joint Embeddings for High-Dimensional Noisy Datasets Using Duo-Landmark Integral Operators. Journal of the American Statistical Association [arxiv] [software]
[2] Liu, Z., Ma, R., and Zhong, Y. (2025) Assessing and Improving Reliability of Neighbor Embedding Methods: A Map-Continuity Perspective. Nature Communications [paper] [arxiv] [software]
[3] Fei, X., Ma, R., and Li, H. (2025) Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data. Statistica Sinica 35, 431-456 [paper]
[4] Ma, R., Sun, E., Donoho, D., and Zou, J. (2024) Principled and Interpretable Alignability Testing and Integration of Single-Cell Data. Proceedings of the National Academy of Sciences (direct submission) [paper] [arxiv] [software]
[5] Sun, E., Ma, R., Negredo, P., Brunet, A., and Zou, J. (2024) TISSUE: Uncertainty-Calibrated Prediction of Single-Cell Spatial Transcriptomics Improves Downstream Analyses. Nature Methods [paper] [bioRxiv] [software]
[6] Cai, T. T., and Ma, R. (2024) Matrix Reordering for Noisy Disordered Matrices: Optimality and Computationally Efficient Algorithms. IEEE Transactions on Information Theory, 70(1), 509-531 [paper] [arxiv]
[7] Sun, E., Ma, R., and Zou, J. (2024) SPRITE: Improving Spatial Gene Expression Imputation with Gene and Cell Networks. Bioinformatics [paper] [bioRxiv] [software]
[8] Ma, R., Guo, Z., Cai, T. T., and Li, H. (2024) Statistical Inference of Genetic Relatedness based on High-Dimensional Logistic Regression. Statistica Sinica, 34, 1023-1043 [arxiv] [R codes]
[9] Ding, X., and Ma, R. (2023) Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach. Annals of Statistics, 51(4), 1744-1769 [paper] [arxiv] [software]
[10] Cai, T. T., Guo, Z., and Ma, R. (2023) Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes. Journal of the American Statistical Association, 118(542), 1319-1332 [paper] [software]
[11] Ma, R., Sun, E., and Zou, J. (2023) A Spectral Method for Assessing and Combining Multiple Data Visualizations. Nature Communications [paper] [software] [arxiv] (Early Career Paper Award, ASA Biometrics Section, 2023)
[12] Sun, E., Ma, R., and Zou, J. (2023) Dynamic Visualization of High-Dimensional Data. Nature Computational Science [paper] [bioRxiv] [software]
[13] Einav, T., and Ma, R. (2023) Using Interpretable Machine Learning to Extend Heterogeneous Antibody-Virus Datasets. Cell Reports Methods, 3(100540) [paper] [software] [bioRxiv]
[14] Kelly, D., Ramdas, S., Ma, R., Rawlings-Goss, R., Grant, G., Ranciaro, A., Hirbo, J., Beggs, W., Yeager, M., Chanock, S., Nyambo, T., Omar, S., Meskel, D., Belay, G., Li, H., Brown, C., and Tishkoff, S. (2023) The Genetic and Evolutionary Basis of Gene Expression Variation in East Africans. Genome Biology, 24(35) [paper] [bioRxiv]
[15] Cai, T. T., and Ma, R. (2022) Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data. Journal of Machine Learning Research, 23(301): 1-54 [arxiv] [paper]
[16] Ma, R., and Li, H. (2022) Interaction Network in Microbiome Studies. In Piegorsch, W. W., Levine, R. A., Zhang, H. H., and Lee, T. C. M. (eds.). Computational Statistics in Data Science, Chapter 13 [link]
[17] Ma, R., Cai, T. T., and Li, H. (2022) Optimal Estimation of Simultaneous Signals Using Absolute Inner Product and Applications to Integrative Genomics. Statistica Sinica, 32, 1027-1048 [arxiv] [paper] [R codes]
[18] Ma, R., and Barnett, I. (2021) The Asymptotic Distribution of Modularity in Weighted Signed Networks. Biometrika, 108(1): 1-16 [arxiv] [paper]
[19] Ma, R., Cai, T. T., and Li, H. (2021) Optimal Estimation of Bacterial Growth Rates Based on Permuted Monotone Matrix. Biometrika, 108(3): 693-708 [arxiv] [paper] [software]
[20] Cai, T. T., Li, H., and Ma, R. (2021) Optimal Structured Principal Subspace Estimation: Metric Entropy and Minimax Rates. Journal of Machine Learning Research, 22(46): 1-45 [arxiv] [paper]
[21] Ma, R., Cai, T. T., and Li, H. (2021) Optimal Permutation Recovery in Permuted Monotone Matrix Model. Journal of the American Statistical Association, 116(535), 1358-1372 [arxiv] [paper] [R codes]
[22] Ma, R., Cai, T. T., and Li, H. (2021) Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models. Journal of the American Statistical Association, 116(534), 984-998 [arxiv] [paper] [R codes]
[23] Ma, R., Hansen, M., Ranciaro, A., Thompson, S., Beggs, W., Mpoloka, S. W., Mokone, G. G., Meskel, D. W., Belay, G., Nyambo, T., Michailidis, G., Li, H., Burant, C., and Tishkoff, S. (2021) Impact of Subsistence and Genetics on Lipid Profiles in Ethnically Diverse Africans. Diabetes, 70 (Supplement_1): 191-LB [link]
[24] Zhang, L., Ma, R., Cai, T. T., and Li, H., Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression. Submitted
[25] Li, S., Alexander, J., Kendall, J., Andrews, P., Rose, E., Orjuela, H., Park, S., Podszus, C., Shanley, L., Ma, R., Rishi, A., Donoho, D., Goldberg, G., Levy, D., and Wigler, M., High-Throughput Single-Nucleus Hybrid Sequencing Reveals Genome-Transcriptome Correlations in Cancer. Submitted [bioRxiv]
[26] Fischer, J., and Ma, R., Sailing in High-Dimensional Spaces: Low-Dimensional Embeddings through Angle Preservation. Submitted [arxiv]
[27] Landa, B., Kluger, Y., and Ma, R., Entropic Optimal Transport Eigenmaps for Integration and Joint Embedding of Datasets. Submitted [arxiv]
[28] Phillip, N., Ma, R., Xu, R., Moffitt, J., and Irizarry, R., Identifying Spatially Variable Genes by Projecting to Morphologically Relevant Curves. Submitted [bioRxiv]
[29] Ma, Z., and Ma, R., Optimal Estimation of Shared Singular Subspaces across Multiple Matrices. Submitted [arxiv]
[30] Ma, R., Li, X., Hu, J., and Yu, B., Uncovering Smooth Structures in Single-Cell Data with PCS-Guided Neighbor Embeddings. Submitted [arxiv] [software]
[31] Baharav, T., Nicol, P., Irizarry, R., and Ma, R., Stacked SVD or SVD stacked? A Random Matrix Theory Perspective on Data Integration. Submitted [arxiv]
[32] Danning, R., Ke, T., Ma, R., and Lin, X., SEEK-VEC: Robust Latent Structure Discovery via Ensembled Topic Modeling
[33] Dhanyasi, N., Meirovitch, Y., Kapoor, V., et al., Developmental Connectomics of the Mouse Cerebellum
[34] Huang, A., Ma, R., and Cai, T., Enhancing Spectral Embedding through Robust and Flexible Knowledge Transfer in Electronic Health Records