[1] Optimal neural network approximation of smooth compositional functions on sets with low intrinsic dimension (with T. Nagler). Preprint (2026).
[2] Latent Structure Emergence in Diffusion Models via Confidence-Based Filtering (with W. Wei, Y. Zheng, K. Chen, M. Seleznova and H. Chou). Preprint (2026).
[3] On the expressivity of deep Heaviside networks (with I. Kong, J. Chen and J. Schmidt-Hieber). Preprint (2025).
[4] Training diagonal linear networks with stochastic sharpness aware minimization (with G. Clara and J. Schmidt-Hieber). Preprint (2025).
[1] A novel statistical approach to analyze image classification (with J. Chen and J. Schmidt-Hieber). To appear in Annals of Statistics.
[2] Accelerated Mirror Descent for Non-Euclidean Star-convex Functions (with C. Lezane and W. Koolen). To appear at ALT 2026.
[3] On the VC dimension of deep group convolutional neural networks (with A. Sepliarskaia and J. Schmidt-Hieber). To appear at NeurIPS 2025.
[4] Dropout Regularization Versus l2-Penalization in the Linear Model (with G. Clara and J. Schmidt-Hieber). Journal of Machine Learning Research, to appear (2024).
[5] Convergence rates for shallow neural networks learned by gradient descent (with A. Braun, M. Kohler and H. Walk). Bernoulli, 30(1): 475-502 (2024).
[6] Statistical theory for image classification using deep neural networks with cross entropy loss (with M. Kohler). Journal of Statistical Planning and Inference, to appear (2024).
[7] Learning Green's function efficiently using low-rank (with K. Wimalawarne and T. Suzuki). ICML 2023.
[8] Estimation of a regression function on a manifold by fully connected deep neural networks (with M. Kohler and U. Reif). Journal of Statistical Planning and Inference, 222: 160-181 (2023).
[9] Estimation of a function of low local dimensionality by deep neural networks (with M. Kohler and A. Krzyzak). IEEE Transactions on Information Theory, 68(6): 4032-4042 (2022).
[10] Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function. Journal of Multivariate Analysis, 182(C) (2021).
[11] Approximating smooth functions by neural networks with sigmoid activation function. Journal of Multivariate Analysis, 182(C) (2021).
[12] On the rate of convergence of fully connected very deep neural network regression estimates (with M. Kohler). Annals of Statistics, 49(4): 2231-2249 (2021).
[13] Discussion of "Nonparametric regression using deep neural networks with ReLU activation function" (with M. Kohler). Annals of Statistics, 48(4): 1906-1910 (2020).
[14] Ein Beitrag zur statistischen Theorie des Deep Learnings [A Contribution to the Statistical Theory of Deep Learning]. Verlag Dr. Hut (2020).
[A] The Smoking Gun: Statistical theory improves neural network estimates (with M. Kohler). Oberwolfach Report (2021).
[B] The Role of Statistical Theory in Understanding Deep Learning. Oberwolfach Report (2023).