My current research focuses on scalable computing and convergence theory for Bayesian nonparametric and deep learning models, with the objective of answering the following question: what is learnable from the data? I study this question in two scenarios: first under unlimited computing power, and second under practical computational constraints.
Under the first scenario, one might expect that "nearly everything" is learnable. That is far from the truth. Just as the "majority" of subsets of the real line are not Borel-measurable, the "majority" of functions on the real line are not estimable, regardless of the sample size. Virtually all statistical and machine learning models, no matter how complex and large, place assumptions on the learnable objects, either implicitly (e.g., through deep learning architectures or Bayesian priors) or explicitly (through regularizers on loss functions). Can we formulate these assumptions explicitly in mathematical language and rigorously characterize which objects can be learned by these methods?
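To make this concrete, here is one standard, illustrative way such an assumption can be stated mathematically (the Hölder class \(\mathcal{C}^{\beta}(L)\), the squared-\(L_2\) loss, and the notation are generic textbook choices, not drawn from the work listed below): restricting the regression function to a Hölder ball of smoothness \(\beta\) in dimension \(d\) yields the classical minimax rate
\[
\inf_{\hat f}\; \sup_{f \in \mathcal{C}^{\beta}(L)} \mathbb{E}_f \big\lVert \hat f - f \big\rVert_2^2 \;\asymp\; n^{-2\beta/(2\beta+d)},
\]
whereas without some such explicit restriction no estimator attains a nontrivial uniform rate, no matter the sample size \(n\).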
The first scenario paves the way for the second, where practical computational constraints give rise to methods such as approximations, pseudo-models, and variational inference. While these methods were born to solve computational issues, they end up being standalone, well-defined methods in their own right, with legitimacy extending beyond their original purpose. For example, an approximation method may enjoy optimal statistical convergence properties while failing to actually approximate the method it was designed to approximate. In some cases, the pursuit of efficient computation brings benign side effects, such as algorithmic regularization, benign overfitting, and neural collapse. With the prevalence of large models, I believe scalable computing and convergence properties can no longer be studied separately.
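As a minimal sketch of this phenomenon, consider the Vecchia-type factorization underlying the Gaussian process papers below (the ordering, the conditioning sets \(N(i)\), and their size bound \(m\) are kept generic here, rather than tied to any specific construction):
\[
p(y_1, \dots, y_n) \;=\; \prod_{i=1}^{n} p\big(y_i \mid y_1, \dots, y_{i-1}\big) \;\approx\; \prod_{i=1}^{n} p\big(y_i \mid y_{N(i)}\big), \qquad N(i) \subseteq \{1, \dots, i-1\},\ |N(i)| \le m.
\]
Conditioning on small neighbor sets cuts the cost of likelihood evaluation from cubic to roughly linear in \(n\) (for fixed \(m\)), and the right-hand side is a valid joint density in its own right; this is precisely why its probabilistic and convergence properties merit study independently of how well it approximates the original model.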
Szabo, B. T. and Y. Zhu (2025).* Vecchia Gaussian processes: probabilistic properties, minimax rates and methodological developments. [arxiv]
Zhu, Y., M. Peruzzi, C. Li and D. B. Dunson (2024). Radial neighbors for provably accurate scalable approximations of Gaussian processes. Biometrika, 111(4). [arxiv]
Li, C., S. Sun, and Y. Zhu (2024). Fixed-domain posterior contraction rates for spatial Gaussian process model with nugget. Journal of the American Statistical Association, 119(546), pp.1336-1347. [arxiv]
Zhu, Y., C. Li, and D. B. Dunson (2023). Classification trees for imbalanced data: Surface-to-volume regularization. Journal of the American Statistical Association, 118(543), pp.1707-1717. [arxiv]
Xie, P., W. Wu, Y. Zhu and E. P. Xing (2018). Orthogonality-promoting distance metric learning: Convex relaxation and theoretical analysis. International Conference on Machine Learning, pp. 5403-5412. PMLR. [pmlr]
Xie, P., H. Zhang, Y. Zhu, and E. P. Xing (2018). Nonoverlap-promoting variable selection. International Conference on Machine Learning, pp. 5413-5422. PMLR. [pmlr]
*Alphabetical author order