Foretell of Future AI from Mathematical Foundation
AAAI 2026 Workshop
January 26, 2026 | Singapore | Grand Mercure Roxy Hotel - Brooke
Yuan Yao
In 1998, Steve Smale proposed his 18th mathematical problem for the 21st century, calling for a deeper understanding of the fundamental limitations of intelligence. Rapid advances in artificial intelligence over the past decade have lent renewed urgency to this challenge, underscoring the need for principled theoretical frameworks that characterize the capabilities and limitations of modern learning systems.
The structural differences between human intelligence and contemporary artificial intelligence models shed light on central issues in the development of trustworthy AI, including robustness, uncertainty quantification, and computational efficiency. In this talk, I examine these questions through the lens of modern statistical theory, with a particular emphasis on recent advances in false discovery rate control and their role in establishing mathematical guarantees for trustworthy artificial intelligence.
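For readers unfamiliar with the false discovery rate machinery mentioned above, the following minimal Python sketch illustrates the classical Benjamini-Hochberg step-up procedure; it is a textbook illustration only, and the function name and example data are hypothetical rather than taken from the talk.

    import numpy as np

    def benjamini_hochberg(p_values, alpha=0.1):
        """Reject hypotheses at target false discovery rate alpha (BH step-up rule)."""
        p = np.asarray(p_values, dtype=float)
        m = p.size
        order = np.argsort(p)                           # sort p-values ascending
        thresholds = alpha * np.arange(1, m + 1) / m    # BH comparison levels
        below = p[order] <= thresholds
        k = (np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True                        # reject the k smallest p-values
        return reject

    # Example: five p-values, target FDR 10%
    print(benjamini_hochberg([0.001, 0.008, 0.04, 0.2, 0.9], alpha=0.1))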
Biography: YAO, Yuan is currently Professor of Mathematics at the Hong Kong University of Science and Technology. Dr. Yao received his PhD in Mathematics from UC Berkeley under the supervision of Prof. Steve Smale and worked at Stanford University and Peking University before joining HKUST in 2016. His main research interests lie in the mathematics of data science and machine learning, with applications in computational biology and information technology.
Jianyu Hu
In this talk, we propose a new physics-informed kernel approach for solving general nonhomogeneous PDEs. Specifically, physical priors (typically encoded through PDE operator information) are incorporated into a kernel ridge regression formulation, and a regularization-based approach is employed to construct the operator learner. The method admits a closed-form solution that is independent of the parameters of the associated PDE. From the perspective of regularization theory, the resulting estimator induces a well-defined operator that links the input and output function spaces. Consequently, it effectively shifts from a PDE solver to an operator-based solver, achieving outcomes comparable to those obtained with neural operator learning. In contrast to standard supervised learning frameworks, the proposed approach enables systematic extrapolation beyond the regimes represented in the observations without any training. Finally, a full error analysis is conducted, providing convergence rates under adaptive regularization parameters for the operator-based solver.
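The kernel ridge regression backbone of such an approach admits a closed-form coefficient vector. The sketch below shows this closed form on synthetic one-dimensional data with a Gaussian kernel; it is a generic illustration under assumed names and data, and does not include the physics-informed (PDE-operator) priors that are the contribution of the talk.

    import numpy as np

    def gaussian_kernel(X, Z, lengthscale=0.2):
        """Gaussian (RBF) kernel matrix between point sets X and Z."""
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * lengthscale ** 2))

    # Synthetic 1-D regression data (a stand-in for PDE observations)
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(50, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(50)

    lam = 1e-3                                                # regularization parameter
    K = gaussian_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)      # closed-form coefficients

    # Predict at new inputs without any iterative training
    X_test = np.linspace(0, 1, 5).reshape(-1, 1)
    print(gaussian_kernel(X_test, X) @ alpha)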
Biography: Jianyu Hu is currently a Research Fellow at Nanyang Technological University. He received his Ph.D. from the School of Mathematics at Huazhong University of Science and Technology. His main research interests include structure-preserving machine learning methods and metastable transitions in stochastic dynamical systems. He has published multiple papers in journals such as the Journal of Nonlinear Science, Mathematics of Computation, and Physica D.
Kelin Xia
Artificial intelligence (AI)-based molecular sciences have begun to gain momentum due to great advances in experimental data, computational power, and learning models. However, a major remaining issue for all these AI-based learning models is efficient molecular representation and featurization. Here we propose advanced mathematics-based molecular representations and featurizations. Molecular structures and their interactions are represented by high-order topological and algebraic models (including the Rips complex, Alpha complex, Neighborhood complex, Dowker complex, Hom-complex, Tor-algebra, Rhombille tiling, etc.). Mathematical invariants (from persistent homology, Ricci curvature, persistent spectra, analytic torsion, algebraic varieties, etc.) are used as molecular descriptors for learning models. Further, we develop geometric and topological deep learning models that systematically incorporate molecular high-order, multiscale, and periodic information, and we apply them to molecular data from chemistry, biology, and materials science.
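As a concrete illustration of topology-based featurization, the sketch below computes persistent homology of a toy three-dimensional point cloud and extracts simple total-persistence descriptors. It assumes the GUDHI library is available and uses made-up coordinates; the representations and invariants in the talk go well beyond this example.

    import numpy as np
    import gudhi  # assumed available; any persistent-homology package could be substituted

    # Toy "molecule": a few 3-D atomic coordinates (illustrative, not real chemistry)
    atoms = np.array([[0.0, 0.0, 0.0],
                      [1.5, 0.0, 0.0],
                      [0.0, 1.5, 0.0],
                      [1.5, 1.5, 0.0],
                      [0.75, 0.75, 1.2]])

    # Vietoris-Rips filtration on the point cloud and its persistence diagram
    rips = gudhi.RipsComplex(points=atoms, max_edge_length=4.0)
    simplex_tree = rips.create_simplex_tree(max_dimension=2)
    diagram = simplex_tree.persistence()
    print(f"{len(diagram)} persistence pairs")

    # Simple topological descriptors: total persistence per homology dimension
    for dim in (0, 1):
        intervals = np.asarray(simplex_tree.persistence_intervals_in_dimension(dim))
        if intervals.size == 0:
            print(f"H{dim} total persistence: 0.000")
            continue
        finite = intervals[np.isfinite(intervals[:, 1])]
        total = float(np.sum(finite[:, 1] - finite[:, 0]))
        print(f"H{dim} total persistence: {total:.3f}")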
Biography: Dr Kelin Xia’s research focuses on mathematical AI for the molecular sciences. He has published about 100 papers in journals and conferences, including SIAM Review, Science Advances, npj Computational Materials, ACS Nano, Nature Machine Intelligence, TPAMI, ICML, and KDD. He has served as an associate editor for “Computational Physiology and Medicine – Frontiers” and the “Computational and Structural Biotechnology Journal”, on the editorial boards of “Theory in Biosciences”, “Scientific Reports”, and “Journal of Physics: Complexity”, and on the editorial advisory boards of the “Journal of Chemical Information and Modeling” and “Patterns”. He is included in the Stanford and Elsevier World’s Top 2% Scientists list (2024).
Tan Minh Nguyen
The parameter space of a neural network is often used as a proxy for the space of functions it represents, yet this correspondence is typically non-injective: distinct parameter configurations may realize the same function due to underlying symmetries. While such functional equivalence has been well studied in classical architectures, its role in modern models remains far less understood. In this talk, I will present our recent progress on the symmetry structure of modern neural architectures such as Transformers and Mixture-of-Experts. I will then discuss applications of these findings to equivariant metanetwork design and linear mode connectivity.
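The non-injectivity of the parameter-to-function map can already be seen in a classical two-layer MLP, where permuting the hidden neurons changes the parameters but not the function. The small numerical check below illustrates this classical case; the symmetry groups of Transformers and Mixture-of-Experts discussed in the talk are richer, and the dimensions here are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1) + b2
    W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
    W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)

    def mlp(x, W1, b1, W2, b2):
        return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

    # Permute the hidden neurons: distinct parameters, identical function
    perm = rng.permutation(8)
    W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

    x = rng.standard_normal(4)
    print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True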
Biography: Tan Minh Nguyen is currently an Assistant Professor of Mathematics (Presidential Young Professor) at the National University of Singapore (NUS). Before joining NUS, he was a postdoctoral scholar in the Department of Mathematics at the University of California, Los Angeles, working with Dr. Stanley J. Osher. He obtained his Ph.D. in Machine Learning from Rice University, where he was advised by Dr. Richard G. Baraniuk. He gave an invited talk at the Deep Learning Theory Workshop at NeurIPS 2018 and organized the 1st Workshop on Integration of Deep Neural Models and Differential Equations at ICLR 2020. He also completed two long internships with Amazon AI and NVIDIA Research, during which he worked with Dr. Anima Anandkumar. He is the recipient of the prestigious Computing Innovation Postdoctoral Fellowship (CIFellows) from the Computing Research Association (CRA), the NSF Graduate Research Fellowship, and the IGERT Neuroengineering Traineeship. He received his M.S. and B.S. in Electrical and Computer Engineering from Rice in May 2018 and May 2014, respectively.
Rong Tang
Distribution regression seeks to estimate the conditional distribution of a multivariate response given a continuous covariate. This approach offers a more complete characterization of dependence than traditional regression methods. Classical nonparametric techniques often assume that the conditional distribution has a well-defined density, an assumption that fails in many real-world settings. These include cases where data contain discrete elements or lie on complex low-dimensional structures within high-dimensional spaces. In this work, we establish minimax convergence rates for distribution regression under nonparametric assumptions, focusing on scenarios where both covariates and responses lie on low-dimensional manifolds. We derive lower bounds that capture the inherent difficulty of the problem and propose a new hybrid estimator that combines adversarial learning with simultaneous least squares to attain matching upper bounds. Our results reveal how the smoothness of the conditional distribution and the geometry of the underlying manifolds together determine the estimation accuracy.
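For orientation, the minimax framework referenced above can be written schematically as follows (the notation here is chosen for illustration and is not taken from the paper):

    \inf_{\widehat{P}} \; \sup_{P \in \mathcal{P}} \; \mathbb{E}_{\{(X_i, Y_i)\}_{i=1}^{n} \sim P} \Big[ d\big( \widehat{P}_{Y \mid X}, \, P_{Y \mid X} \big) \Big],

where \mathcal{P} is a nonparametric class of joint distributions whose covariates and responses are supported on low-dimensional manifolds, \widehat{P} ranges over estimators built from n observations, and d is a discrepancy between conditional distributions.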
Biography: Rong Tang is an assistant professor in the Department of Mathematics at the Hong Kong University of Science and Technology (HKUST). Her research interests include machine learning theory, Bayesian inference, MCMC sampling, and nonparametric statistics.
We propose Poincaré Gradient Descent (PGD), a first-order optimization method on the Poincaré ball. PGD replaces the exponential map with a projection-based retraction that is first-order equivalent while preserving geodesic structure, thereby reducing per-iteration computational cost. We establish convergence guarantees for smooth geodesically convex and strongly convex objectives, showing that PGD matches the iteration complexity of Riemannian Gradient Descent (RGD). A Möbius isometric initialization and step-size equivalence analysis further ensure stable and efficient updates. Numerical experiments confirm that PGD achieves the same theoretical convergence rate as RGD but runs significantly faster in practice, offering a simple and effective framework for scalable optimization in hyperbolic space.
Authors: Chengyang Liu, Ouyang Shangke, Michael Ng, David Gu
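A minimal sketch of the kind of projection-based update described in the abstract above is given here, assuming the standard Poincaré-ball scaling of the gradient and a simple norm-clipping projection back into the open unit ball; the exact retraction, Möbius isometric initialization, and step-size rules of PGD are those in the paper, not this toy version.

    import numpy as np

    def riemannian_grad(x, euclid_grad):
        """Riemannian gradient on the Poincare ball (metric factor 2 / (1 - |x|^2))."""
        return ((1.0 - np.dot(x, x)) ** 2 / 4.0) * euclid_grad

    def project_to_ball(x, eps=1e-5):
        """Projection-based retraction: pull a point back inside the open unit ball."""
        norm = np.linalg.norm(x)
        return x if norm < 1.0 - eps else x * (1.0 - eps) / norm

    def pgd(grad_f, x0, step=0.1, iters=200):
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = project_to_ball(x - step * riemannian_grad(x, grad_f(x)))
        return x

    # Toy objective: squared Euclidean distance to a target inside the ball
    target = np.array([0.3, -0.2])
    print(pgd(lambda x: 2.0 * (x - target), x0=np.array([0.0, 0.0])))  # approaches target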
We study the fundamental expressivity limits of transformer models. We formalize the notion of accessible sequences—those that a transformer can produce for some prompt—and characterize how accessibility depends on prompt length and model precision. By partitioning the embedding space via the decoder readout into next-token argmax regions and extending transformers to a mean-field map on probability measures, we derive theoretical upper bounds on the number and length of accessible output sequences. We prove that (i) the maximal length of accessible sequences grows linearly with the prompt length, and (ii) beyond a critical threshold, the proportion of reachable sequences decays exponentially with sequence length. These bounds hold even with unbounded context and computation time, linking the expressivity limits of transformers to the geometry of their embedding space and the finiteness of their representational precision. Experiments using a “cramming” procedure confirm both the linear scaling and the post-threshold exponential decay.
Authors: Maxime Meyer, Mario Michelessa, Caroline Chaux, Vincent Y. F. Tan
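The notion of argmax readout regions in the preceding abstract can be illustrated with a toy decoder: a linear readout matrix partitions the hidden-state space into cells, one per vocabulary item, and only tokens whose cell is reachable can ever be produced greedily. The snippet below (random weights and hypothetical dimensions) sketches this partition; it is not the paper's cramming procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, hidden_dim = 6, 2
    W = rng.standard_normal((vocab_size, hidden_dim))   # decoder readout matrix

    def next_token(h):
        """Greedy readout: the argmax partitions hidden space into vocab_size cells."""
        return int(np.argmax(W @ h))

    # Sample hidden states and record which argmax regions they fall into:
    # only tokens whose region is hit can be produced by greedy decoding.
    hits = {next_token(rng.standard_normal(hidden_dim)) for _ in range(10_000)}
    print(f"{len(hits)} of {vocab_size} argmax regions are non-empty under sampling")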
Bayesian neural networks (BNNs) require scalable sampling algorithms to approximate posterior distributions over parameters. Existing stochastic gradient Markov chain Monte Carlo (SGMCMC) methods are highly sensitive to the choice of stepsize, and adaptive variants such as pSGLD typically fail to sample the correct invariant measure without the addition of a costly divergence-correction term. In this work, we build on the recently proposed 'SamAdams' framework for timestep adaptation (Leimkuhler, Lohmann, and Whalley 2025), introducing an adaptive scheme, SA-SGLD, which employs time rescaling to modulate the stepsize according to a monitored quantity (typically the local gradient norm). SA-SGLD can automatically shrink stepsizes in regions of high curvature and expand them in flatter regions, improving both stability and mixing without introducing bias. We show that our method achieves more accurate posterior sampling than SGLD on high-curvature 2D toy examples and in image classification with BNNs using sharp priors.
Authors: Rajit Rajpal, Benedict Leimkuhler, Yuanhao Jiang
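To convey the flavor of stepsize modulation by a monitored gradient norm, the sketch below runs SGLD on a one-dimensional toy posterior with a simple heuristic stepsize rule. It is only illustrative: unlike this heuristic (which, like pSGLD, would bias the invariant measure), the actual SA-SGLD scheme uses the SamAdams time-rescaling construction precisely so that the adaptation does not introduce bias.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_U(theta):
        """Gradient of a toy negative log-posterior, U(theta) = theta**4 / 4."""
        return theta ** 3

    def adaptive_sgld(theta0, eps_max=0.05, n_steps=5000):
        """SGLD with a stepsize shrunk where the monitored gradient norm is large."""
        theta, samples = theta0, []
        for _ in range(n_steps):
            g = grad_U(theta)
            eps = eps_max / (1.0 + np.abs(g))            # monitored quantity: |grad U|
            theta = theta - eps * g + np.sqrt(2.0 * eps) * rng.standard_normal()
            samples.append(theta)
        return np.array(samples)

    samples = adaptive_sgld(theta0=0.0)
    print(f"posterior mean ~ {samples.mean():.3f}, std ~ {samples.std():.3f}")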
Spiking neural networks (SNNs) have been proposed as an (energy-)efficient alternative to conventional artificial neural networks. However, the aspired benefits have not yet been realized in practice. To gain a better understanding of why this gap persists, we theoretically study both discrete-time and continuous-time models of leaky integrate-and-fire neurons.
In the discrete-time model, which is a widely used framework due to its amenability to conventional deep learning software and hardware approaches, we analyze the impact of explicit recurrent connections on the network size required to approximate continuously differentiable functions. We contrast this view by investigating the computational efficiency of digital systems that simulate spike-based computations in the continuous-time model. It turns out that even in well-behaved settings, the computational complexity of this task may grow super-polynomially in the prescribed accuracy.
In this way, we highlight by example the intricacies of realizing, on digital systems, two potential strengths of spike-based computation in the biological context, namely recurrent connections and computational efficiency.
Authors: Adalbert Fono, Holger Boche, Gitta Kutyniok
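A minimal discrete-time leaky integrate-and-fire layer of the kind analyzed in the first part of the abstract can be written in a few lines. The sketch below uses illustrative parameters and omits explicit recurrent connections; it simply shows the leak, integrate, threshold, and reset steps.

    import numpy as np

    def lif_layer(inputs, weights, beta=0.9, threshold=1.0):
        """Discrete-time leaky integrate-and-fire layer.

        inputs:  (T, d_in) spike trains over T time steps
        weights: (d_out, d_in) synaptic weights
        Returns the (T, d_out) output spike trains.
        """
        T = inputs.shape[0]
        d_out = weights.shape[0]
        v = np.zeros(d_out)                     # membrane potentials
        spikes = np.zeros((T, d_out))
        for t in range(T):
            v = beta * v + weights @ inputs[t]  # leaky integration of weighted input
            fired = v >= threshold
            spikes[t] = fired.astype(float)     # emit spikes where threshold is crossed
            v = np.where(fired, 0.0, v)         # reset neurons that fired
        return spikes

    rng = np.random.default_rng(0)
    x = (rng.random((20, 5)) < 0.3).astype(float)   # random input spike trains
    W = rng.standard_normal((3, 5))
    print(lif_layer(x, W).sum(axis=0))              # spike counts per output neuron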
NVIDIA AI Technology Center
Nanyang Technological University
Hong Kong University of Science and Technology
University of Edinburgh