Learning high-dimensional functions (e.g., solving high-dimensional partial differential equations (PDEs) and discovering governing PDEs) is fundamental in scientific fields such as diffusion, fluid dynamics, quantum mechanics, and optimal control. Developing efficient and accurate solvers for this task remains an important and challenging topic. Traditional solvers (e.g., the finite element method (FEM) and finite difference methods) are usually limited to low-dimensional domains, since their computational cost grows exponentially with the dimension, a phenomenon known as the curse of dimensionality. Neural networks (NNs), as mesh-free parameterizations, are widely employed for solving regression problems and high-dimensional PDEs. Yet the highly non-convex objective function in NN optimization makes it difficult to achieve high accuracy, and the errors of NN-based solvers still grow with the dimension. Moreover, NN parameterizations may require large memory and high computational cost for high-dimensional problems. Finally, the numerical solutions provided by traditional and NN-based solvers are not interpretable: for instance, the dependence of the solution on its variables cannot be readily seen from numerical values. The key to tackling these issues is symbolic learning that discovers the low-complexity structures of a high-dimensional problem; such structures can then transform a high-dimensional task into a low-dimensional learning problem.
We propose the finite expression method (FEX) [pdf], a methodology that seeks a solution in the function space of mathematical expressions with finitely many operators. Compared with existing methods, FEX enjoys the following advantages (summarized in the figure below): (1) The expression may reproduce the true solution and achieve high, even machine, accuracy. (2) The expression requires little memory for solution storage (a single line of text) and little computation for solution evaluation. (3) The expression is interpretable, with an explicit and legible form. Moreover, from the approximation perspective, the expressions in FEX can provably avoid the curse of dimensionality, with convincing numerical performance for high-dimensional Schrödinger equations [pdf], committor functions [pdf], and high-dimensional partial integral differential equations [pdf].
We also propose a novel deep symbolic method, within the FEX framework, to discover governing equations [pdf] and [pdf] in the function space of mathematical expressions generated by binary expression trees with a fixed number of operators. Compared to other symbolic approaches, FEX enjoys the following advantages (summarized in the figure below): (1) Binary expression trees in FEX can generate a large class of mathematical expressions from a small list of operators, avoiding the need for a large dictionary of symbols or a large symbolic neural network. (2) FEX can effectively discover nonlinear and non-polynomial equations where other symbolic approaches fail. (3) The search for an appropriate mathematical expression is reformulated as a reinforcement learning problem, to which effective and memory-efficient optimization algorithms apply. FEX therefore has the capacity to solve high-dimensional and high-order problems.
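To make the idea of a "finite expression" concrete, here is a minimal sketch (not the authors' implementation) of a binary expression tree built from a small operator list: each internal node applies a unary or binary operator, and each leaf carries a trainable scalar that scales the input variable.

```python
import math

# Small operator lists; FEX trees draw their node labels from such lists.
UNARY = {"sin": math.sin, "exp": math.exp, "id": lambda x: x}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

class Node:
    def __init__(self, op, children=(), value=None):
        self.op, self.children, self.value = op, children, value

    def eval(self, x):
        # Leaf: a scaled input variable value * x (scalars are trainable in FEX).
        if self.op == "leaf":
            return self.value * x
        if self.op in UNARY:
            return UNARY[self.op](self.children[0].eval(x))
        return BINARY[self.op](self.children[0].eval(x), self.children[1].eval(x))

# Example: the tree for sin(2*x) + x, a finite expression with three operators.
tree = Node("+", (Node("sin", (Node("leaf", value=2.0),)), Node("leaf", value=1.0)))
print(tree.eval(0.0))  # sin(0) + 0 = 0.0
```

Such a tree is both the searchable object (its operator labels are chosen by the reinforcement-learning agent) and the storable solution (a one-line string like "sin(2*x) + x").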
In this paper [pdf], we present an advanced symbolic regression method that integrates symbol priors from diverse scientific domains, including physics, biology, chemistry, and engineering, into the regression process. By systematically analyzing domain-specific expressions, we derive probability distributions over symbols to guide expression generation. We propose novel tree-structured recurrent neural networks (RNNs) that leverage these symbol priors, enabling domain knowledge to steer the learning process. Additionally, we introduce a hierarchical tree structure for representing expressions, in which unary and binary operators are organized to facilitate more efficient learning. To further accelerate training, we compile characteristic expression blocks from each domain and include them in the operator dictionary, providing relevant building blocks. Experimental results demonstrate that leveraging symbol priors significantly enhances the performance of symbolic regression, yielding faster convergence and higher accuracy.
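As a hedged illustration of the symbol-prior idea (the prior values and names below are hypothetical, not taken from the paper), expression generation can sample operator symbols according to a domain-specific probability distribution, so that symbols common in that domain appear more often:

```python
import random

# Hypothetical domain prior: probabilities over operator symbols.
PHYSICS_PRIOR = {"sin": 0.3, "cos": 0.3, "exp": 0.2, "log": 0.1, "tanh": 0.1}

def sample_operator(prior, rng=random):
    """Draw one operator symbol with probability proportional to its prior weight."""
    ops, weights = zip(*prior.items())
    return rng.choices(ops, weights=weights, k=1)[0]

random.seed(0)
samples = [sample_operator(PHYSICS_PRIOR) for _ in range(1000)]
# Under this prior, trigonometric operators should dominate the samples.
print(sorted(set(samples)))
```

In the paper's setting, such priors bias the tree-structured RNN's output distribution at each node rather than a standalone sampler, but the effect is the same: domain-typical symbols are proposed more frequently during the search.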
Motivated by the remarkable success of artificial intelligence (AI) across diverse fields, the application of AI to solve scientific problems, often formulated as partial differential equations (PDEs), has garnered increasing attention. While most existing research concentrates on theoretical properties of the solutions (such as well-posedness, regularity, and continuity), alongside direct AI-driven methods for solving PDEs, the challenge of uncovering symbolic relationships within these equations remains largely unexplored. In this paper [pdf], we propose leveraging large language models (LLMs) to learn such symbolic relationships. Our results demonstrate, both theoretically and numerically, that LLMs can effectively predict the operators involved in PDE solutions by utilizing the symbolic information in the PDEs. Furthermore, we show that discovering these symbolic relationships can substantially improve both the efficiency and accuracy of symbolic machine learning for finding analytical approximations of PDE solutions, delivering a fully interpretable solution pipeline. This work opens new avenues for understanding the symbolic structure of scientific problems and advancing their solution processes.
[7] R. Bhatnagar, L. Liang, K. Patel, H. Yang*. From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs. [pdf]
[6] S. Huang, Y. Wen, T. Adusumilli, K. Choudhary, H. Yang*. Parsing the Language of Expressions: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors. Submitted to Advanced Theory and Simulations. [pdf]
[5] Z. Song, C. Wang, H. Yang. A Fast Algorithm for the Finite Expression Method in Learning Dynamics on Complex Networks. Submitted. [pdf]
[4] Z. Jiang, C. Wang, H. Yang*. Finite Expression Methods for Discovering Physical Laws from Data. Submitted. [pdf]
[3] Z. Song, M. Cameron, H. Yang*. A Finite Expression Method for Solving High-Dimensional Committor Problems. SIAM Journal on Scientific Computing, 2025. [pdf] [doi]
[2] G. Hardwick^, S. Liang^, H. Yang*. Solving High-Dimensional Partial Integral Differential Equations: The Finite Expression Method. Journal of Computational Physics, 2025. [pdf] [doi]
[1] S. Liang^, H. Yang^*. Finite Expression Method for Solving High-Dimensional Partial Differential Equations. To appear, Journal of Machine Learning Research. [pdf] [doi]