[1] Optimal neural network approximation of smooth compositional functions on sets with low intrinsic dimension (with T. Nagler). Preprint (2026).
[2] Latent Structure Emergence in Diffusion Models via Confidence-Based Filtering (with W. Wei, Y. Zheng, K. Chen, M. Seleznova and H. Chou). Preprint (2026).
[3] On the expressivity of deep Heaviside networks (with I. Kong, J. Chen and J. Schmidt-Hieber). Preprint (2025).
[4] Training diagonal linear networks with stochastic sharpness aware minimization (with G. Clara and J. Schmidt-Hieber). Preprint (2025).
[1] A novel statistical approach to analyze image classification (with J. Chen and J. Schmidt-Hieber). To appear in Annals of Statistics.
[2] Accelerated Mirror Descent for Non-Euclidean Star-convex Functions (with C. Lezane and W. Koolen). To appear at ALT 2026.
[3] On the VC dimension of deep group convolutional neural networks (with A. Sepliarskaia and J. Schmidt-Hieber). To appear at NeurIPS 2025.
[4] Dropout Regularization Versus l2-Penalization in the Linear Model (with G. Clara and J. Schmidt-Hieber). Journal of Machine Learning Research, to appear (2024).
[5] Convergence rates for shallow neural networks learned by gradient descent (with A. Braun, M. Kohler and H. Walk). Bernoulli, 30(1): 475-502 (2024).
[6] Statistical theory for image classification using deep neural networks with cross entropy loss (with M. Kohler). Journal of Statistical Planning and Inference, to appear (2024).
[7] Learning Green's function efficiently using low-rank (with K. Wimalawarne and T. Suzuki). ICML 2023.
[8] Estimation of a regression function on a manifold by fully connected deep neural networks (with M. Kohler and U. Reif). Journal of Statistical Planning and Inference, 222: 160-181 (2023).
[9] Estimation of a function of low local dimensionality by deep neural networks (with M. Kohler and A. Krzyzak). IEEE Transactions on Information Theory, 68(6): 4032-4042 (2022).
[10] Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function. Journal of Multivariate Analysis, 182(C) (2021).
[11] Approximating smooth functions by neural networks with sigmoid activation function. Journal of Multivariate Analysis, 182(C) (2021).
[12] On the rate of convergence of fully connected very deep neural network regression estimates (with M. Kohler). Annals of Statistics, 49(4): 2231-2249 (2021).
[13] Discussion of "Nonparametric regression using deep neural networks with ReLU activation function" (with M. Kohler). Annals of Statistics, 48(4): 1906-1910 (2020).
[14] Ein Beitrag zur statistischen Theorie des Deep Learnings [A Contribution to the Statistical Theory of Deep Learning]. Verlag Dr. Hut (2020).
[A] The Smoking Gun: Statistical theory improves neural network estimates (with M. Kohler). Oberwolfach Report (2021).
[B] The Role of Statistical Theory in Understanding Deep Learning. Oberwolfach Report (2023).