Deep Learning Theory for Solving PDEs
There are three main sources of error when solving partial differential equations (PDEs) using neural networks: approximation error, sampling error, and training error.
The approximation error measures how well the best network in the hypothesis space can approximate the target solution. The sampling (generalization) error arises because only finitely many samples are available, so even the best approximation in the hypothesis space cannot be recovered exactly from data. The training (optimization) error reflects the discrepancy introduced by the optimization algorithm and other limitations of the training process.
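Schematically, the three contributions combine in a standard error decomposition by the triangle inequality (the symbols below are illustrative: $u$ is the true solution, $u_A$ the best network in the hypothesis space, $u_S$ the empirical-risk minimizer over the samples, and $u_T$ the network actually returned by training):

```latex
\|u_T - u\|
\;\le\;
\underbrace{\|u_A - u\|}_{\text{approximation}}
\;+\;
\underbrace{\|u_S - u_A\|}_{\text{sampling / generalization}}
\;+\;
\underbrace{\|u_T - u_S\|}_{\text{training / optimization}}
```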
Approximation Analysis
For the approximation error, we show that deep neural networks with a wide range of popular activation functions achieve super-convergence rates, measured in Sobolev norms, compared with the optimal rates attainable by continuous approximators. However, these rates still suffer from the curse of dimensionality. We further show that this issue can be mitigated by formulating the problem in Barron spaces and Korobov spaces, in which deep neural networks exhibit dimension-independent or only weakly dimension-dependent approximation rates.
Y. Yang, H. Yang, and Y. Xiang, Nearly Optimal VC-Dimension and Pseudo-Dimension Bounds for Deep Neural Network Derivatives. Conference on Neural Information Processing Systems (NeurIPS), 2023.
Y. Yang, Y. Wu, H. Yang, and Y. Xiang, Nearly Optimal Approximation Rates for Deep Super ReLU Networks on Sobolev Spaces. (arXiv:2310.10766).
Y. Yang and Y. Lu, Optimal Deep Neural Network Approximation for Korobov Functions with respect to Sobolev Norms. Neural Networks, 2024.
Y. Yang and J. He, Deep Neural Networks with General Activations: Super-Convergence in Sobolev Norms. (arXiv:2508.05141).
Y. Lu, T. Mao, J. Xu, and Y. Yang, On the Dimension-Free Approximation of Deep Neural Networks for Symmetric Korobov Functions. (arXiv:2511.12398).
Y. Yang and Y. Xiang, Approximation of Functionals by Neural Networks without Curse of Dimensionality. J. Mach. Learn., 2022.
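As a back-of-the-envelope illustration of the curse of dimensionality, one can compare the parameter counts implied by two generic rate forms (the exponents below are textbook-style placeholders, not rates quoted from the papers above): a Sobolev-type rate N^(-s/d), which forces the parameter count to grow exponentially in the dimension d, versus a dimension-independent Barron-type rate N^(-1/2).

```python
# Parameter counts needed to reach a target error eps under two
# illustrative approximation rates (generic exponents, for intuition only).

def params_sobolev(eps: float, s: float, d: int) -> float:
    """Parameters needed at a Sobolev-type rate N^(-s/d): N ~ eps^(-d/s)."""
    return eps ** (-d / s)

def params_barron(eps: float) -> float:
    """Parameters needed at a Barron-type rate N^(-1/2): N ~ eps^(-2)."""
    return eps ** (-2)

eps, s = 0.01, 2.0
for d in (2, 10, 100):
    print(f"d={d:>3}: Sobolev ~ {params_sobolev(eps, s, d):.2e}, "
          f"Barron ~ {params_barron(eps):.2e}")
```

The Sobolev count explodes as d grows, while the Barron count stays fixed, which is the sense in which Barron-type spaces mitigate the curse of dimensionality.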
Generalization Analysis
For the generalization error, we show that although deep neural networks enjoy advantages at the approximation level, they typically require more data to learn their parameters. This trade-off implies the existence of an optimal network architecture once the loss function, the number of parameters, and the sample size are specified. Furthermore, guided by our theory, we develop an adaptive sampling strategy for solving PDEs.
Y. Yang and J. He, Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss. International Conference on Machine Learning (ICML), 2024.
J. Luo, Y. Yang, Y. Yuan, S. Xu, and W. Hao, An Imbalanced Learning-based Sampling Method for Physics-informed Neural Networks. Journal of Computational Physics, 2025.
Y. Yang, DeepONet for Solving Nonlinear Partial Differential Equations with Physics-Informed Training. Neural Networks, 2025.
Y. Yang and W. Zhu, Statistical Learning Guarantees for Group-Invariant Barron Functions. (arXiv:2509.23474).
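The basic idea behind adaptive sampling for PDE solvers can be sketched generically: draw a large pool of candidate collocation points, evaluate the current surrogate's PDE residual, and keep the points where the residual is largest. This is an illustrative residual-based sketch under assumed names (`adaptive_sample`, `residual_fn`), not the specific imbalanced-learning method of Luo et al. cited above.

```python
import numpy as np

def adaptive_sample(residual_fn, domain, n_pool=10_000, n_keep=200, rng=None):
    """Return the n_keep candidate points with the largest |residual|.

    residual_fn : maps an (n, d) array of points to an (n,) array of residuals
    domain      : sequence of (low, high) bounds, one per dimension
    """
    rng = np.random.default_rng(rng)
    lows = np.array([lo for lo, _ in domain])
    highs = np.array([hi for _, hi in domain])
    pool = rng.uniform(lows, highs, size=(n_pool, len(domain)))  # candidate pool
    res = np.abs(residual_fn(pool))
    idx = np.argsort(res)[-n_keep:]      # indices of the largest residuals
    return pool[idx]

# Toy usage: a residual concentrated near x = 0 (mimicking a sharp layer);
# the selected collocation points cluster around the layer.
pts = adaptive_sample(lambda x: np.exp(-50 * x[:, 0] ** 2),
                      domain=[(-1.0, 1.0)], n_keep=100, rng=0)
print(pts.shape, float(np.abs(pts).max()))
```

In a full solver, this selection step would alternate with training: retrain on the enriched point set, re-evaluate residuals, and resample.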
Optimization Analysis
In this section, we show that training neural networks to solve PDEs is often difficult, particularly for complex equations. We therefore embed problem-specific PDE structure and mathematical principles into the training procedure to stabilize optimization and improve accuracy. Since different classes of PDEs exhibit distinct properties, we design customized methods for each type.
Y. Yang, Q. Chen, and W. Hao, Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks. Journal of Scientific Computing, 2025.
W. Hao, X. Liu, and Y. Yang, Newton Informed Neural Operator for Computing Multiple Solutions of Nonlinear Partial Differential Equations. Conference on Neural Information Processing Systems (NeurIPS), 2024.
C. Chen, Y. Yang, Y. Xiang, and W. Hao, Learn Sharp Interface Solution by Homotopy Dynamics. International Conference on Machine Learning (ICML), 2025.
C. Chen, Y. Yang, Y. Xiang, and W. Hao, Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations. Journal of Scientific Computing, 2025.
W. Hao, R. Li, Y. Xi, T. Xu, and Y. Yang, Multiscale Neural Networks for Approximating Green’s Functions. SIAM Journal on Scientific Computing, in press, 2026.
C. Chen, Q. Zhou, Y. Yang, T. Luo, and Y. Xiang, Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers. (arXiv:2410.06308).
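The common starting point for these methods is the physics-informed loss itself: the PDE residual at interior collocation points plus penalty terms for the boundary conditions. The following is a minimal generic sketch for the 1-D Poisson problem u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0; the architecture, optimizer settings, and helper `pde_loss` are illustrative choices, not any specific method from the papers above.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def pde_loss(net, f, n=64):
    """Physics-informed loss: interior PDE residual + boundary penalties."""
    x = torch.rand(n, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = ((d2u - f(x)) ** 2).mean()                  # u'' = f in (0, 1)
    bc = net(torch.zeros(1, 1)) ** 2 + net(torch.ones(1, 1)) ** 2
    return residual + bc.mean()                            # u(0) = u(1) = 0

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
f = lambda x: -torch.ones_like(x)        # u'' = -1, exact solution x(1 - x)/2
for _ in range(200):
    opt.zero_grad()
    loss = pde_loss(net, f)
    loss.backward()
    opt.step()
print(float(loss))
```

The derivatives are taken by automatic differentiation, which is exactly the point stressed in the Chen et al. paper above; the stabilization techniques in this section (homotopy dynamics, Newton-informed operators, multiscale architectures) modify how this basic loss is constructed or optimized.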
Mathematical Modeling
(Figure: Bilayer High-Entropy Alloy)
In this part, our research focuses on establishing new models for complex systems and analyzing existing ones, leveraging both mathematical techniques and neural-network-based methods.
(Figure: Gray-Scott Model)
Y. Yang, T. Luo, and Y. Xiang, Convergence from Atomistic Model to Peierls-Nabarro Model for Dislocations in Bilayer System with Complex Lattice. Commun. Math. Sci., 20(4), 947-986, 2022.
Y. Yang, L. Zhang, and Y. Xiang, Stochastic Continuum Models for High-Entropy Alloys with Short Range Order. Multiscale Modeling & Simulation, 21(4):1323–1343, 2023.
W. Hao, C. Liu, Y. Wang, and Y. Yang, On pattern formation in the thermodynamically-consistent variational Gray-Scott model. Mathematical Biosciences, 2025.
Y. Yang, S. Lee, J. Calder, and W. Hao, Energy Approach from ε-Graph to Continuum Diffusion Model with Connectivity Functional. (arXiv:2510.25114).