News:
2026.1: I presented at the Joint Mathematics Meetings! My talks:
Maps, Models, and Making: Integrating Projects, Inquiry, and Concept Maps in ODE classroom slides
A Category Theory Framework for Quantifying Emergent Effects: Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme slides
2025.11: Our presentation at SfN received good attention and positive feedback:
Quantifying and Engineering Emergent Dynamics in Biological and Artificial Neural Networks poster
2025.9: Submitted materials for the NeurIPS AI education track!
2025.8: Our abstract was accepted by SfN (Society for Neuroscience); see you this November in San Diego!
2025.7: Our work on emergence-promoting neural network initialization received positive feedback from NeurIPS reviewers!
2025.4: Our two abstracts were accepted by the UCSD Education Innovation Expo!
2025.1: Our paper on the categorical framework of emergence was accepted by Neural Computation!
(Neural Computation: https://direct.mit.edu/neco/article-abstract/37/8/1409/131382/A-Categorical-Framework-for-Quantifying-Emergent?redirectedFrom=fulltext )
This is our foundational work on quantifying emergence. The resulting measure is being applied to a range of models in machine learning and neurobiology, yielding interesting insights.
Emergent effects are crucial to understanding properties of complex systems that do not appear in their basic units, yet theories for measuring emergence and explaining its mechanisms have been lacking. In this paper, we establish a framework based on homological algebra that encodes emergence in the mathematical structure of cohomology, and we apply it to network models to develop a computational measure of emergence. The framework ties the emergence of a system to its network topology and local structures, paving the way to predicting and understanding the causes of emergent effects. In numerical experiments, we show that our measure of emergence correlates with the existing information-theoretic measure of emergence.
A direct application of this work is the following project:
(under review, https://openreview.net/pdf?id=7NlTuZcP99)
We propose a neural network initialization scheme with a higher emergence value, which results in better performance than Kaiming/Xavier initialization.
We introduce a novel yet straightforward neural network initialization scheme that modifies conventional methods such as Xavier and Kaiming initialization. Inspired by the concept of emergence and leveraging the emergence measures proposed by Li (2023), our method adjusts the layer-wise weight scalar multiplier to achieve higher emergence values. The enhancement is easy to implement, requiring no additional optimization steps at initialization, in contrast to GradInit. We evaluate our approach across various architectures, including MLPs and convolutional networks for image recognition and transformers for machine translation, and demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization. The simplicity, theoretical innovation, and demonstrable empirical advantages of our method make it a potent enhancement to neural network initialization practices. These results suggest a promising direction for leveraging emergence to improve neural network training methodologies. Code is available at: https://github.com/johnnyjingzeli/EmergenceInit.
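A minimal sketch of the general shape of such a scheme (not the published method: the per-layer gains below are passed in by hand as hypothetical values, whereas the paper chooses the multipliers to raise the emergence measure):

```python
import torch
import torch.nn as nn

def emergence_promoting_init_(model: nn.Module, gains: dict[int, float]) -> None:
    """Scaffolding sketch: rescale a standard Kaiming initialization layer by layer.

    `gains` maps the index of each Linear/Conv layer to a scalar multiplier.
    In the paper, the multipliers are chosen to increase the emergence measure of
    the initialized network; here they are simply supplied by the caller
    (illustrative only, not the published selection rule).
    """
    idx = 0
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            with torch.no_grad():
                m.weight.mul_(gains.get(idx, 1.0))
                if m.bias is not None:
                    m.bias.zero_()
            idx += 1

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
emergence_promoting_init_(model, gains={0: 1.0, 1: 1.2})  # hypothetical gains
```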
(AppliedMath, https://www.mdpi.com/2673-9909/5/3/93)
This is our endeavor to connect emergent phenomena with dynamical systems and stochastic processes.
We present a formal framework for modeling neural network dynamics using Category Theory, specifically through Markov categories. In this setting, neural states are represented as objects and state transitions as Markov kernels, i.e., morphisms in the category. This categorical perspective offers an algebraic alternative to traditional approaches based on stochastic differential equations, enabling a rigorous and structured approach to studying neural dynamics as a stochastic process with topological insights. By abstracting neural states as submeasurable spaces and transitions as kernels, our framework bridges biological complexity with formal mathematical structure, providing a foundation for analyzing emergent behavior. As part of this approach, we incorporate concepts from Interacting Particle Systems and employ mean-field approximations to construct Markov kernels, which are then used to simulate neural dynamics via the Ising model. Our simulations reveal a shift from unimodal to multimodal transition distributions near critical temperatures, reinforcing the connection between emergent behavior and abrupt changes in system dynamics.
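For a flavor of the simulation component, here is a minimal mean-field (Curie-Weiss) Ising sketch that uses Glauber dynamics as the Markov kernel and compares magnetization distributions above and below the critical temperature; the update rule and parameter values are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def glauber_sweep(spins, beta, rng):
    """One sweep of Glauber dynamics for the mean-field (Curie-Weiss) Ising model, J = 1."""
    n = spins.size
    for _ in range(n):
        i = rng.integers(n)
        h = (spins.sum() - spins[i]) / n                  # mean field from the other spins
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))      # Glauber flip probability
        spins[i] = 1 if rng.random() < p_up else -1
    return spins

def magnetization_samples(beta, n=100, sweeps=1500, burn_in=500, seed=0):
    """Sample magnetization under the Markov kernel defined by repeated Glauber sweeps."""
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=n)
    samples = []
    for t in range(sweeps):
        spins = glauber_sweep(spins, beta, rng)
        if t >= burn_in:
            samples.append(spins.mean())
    return np.array(samples)

# Mean-field critical point is beta_c = 1: unimodal magnetization above T_c, bimodal below.
for beta in (0.5, 1.5):
    m = magnetization_samples(beta)
    print(f"beta={beta}: mean |m| = {np.abs(m).mean():.2f}, std = {m.std():.2f}")
```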
(under review, https://arxiv.org/pdf/2409.01568)
This is joint work with a group of undergraduates I mentored! We study what happens to a network's emergent abilities as training progresses.
Emergence, where complex behaviors develop from the interactions of simpler components within a network, plays a crucial role in enhancing neural network capabilities. We introduce a quantitative framework to measure emergence during the training process and examine its impact on network performance, particularly in relation to pruning and training dynamics. We hypothesize that the degree of emergence, defined by the connectivity between active and inactive nodes, can predict the development of emergent behaviors in the network. Through experiments with feedforward and convolutional architectures on benchmark datasets, we demonstrate that higher emergence correlates with improved trainability and performance. We further explore the relationship between network complexity and the loss landscape, suggesting that higher emergence indicates a greater concentration of local minima and a more rugged loss landscape. Pruning, which reduces network complexity by removing redundant nodes and connections, is shown to enhance training efficiency and convergence speed, though it may lead to a reduction in final accuracy. These findings provide new insights into the interplay between emergence, complexity, and performance in neural networks, offering valuable implications for the design and optimization of more efficient architectures.
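A toy proxy in the spirit of this definition (illustrative only; the paper's measure is derived from the categorical framework above): score the weighted connectivity from inactive to active units of a ReLU network on a batch.

```python
import torch
import torch.nn as nn

def emergence_proxy(model: nn.Sequential, x: torch.Tensor, tau: float = 0.0) -> float:
    """Toy emergence proxy: weighted connectivity from inactive to active units.

    For each Linear+ReLU pair, a unit counts as 'active' if its mean post-ReLU
    activation on the batch exceeds tau. We sum |W_ij| over edges that connect an
    inactive input unit j to an active output unit i. (A hypothetical stand-in,
    not the measure defined in the paper.)
    """
    score, h = 0.0, x
    prev_active = torch.ones(x.shape[1], dtype=torch.bool)
    for layer in model:
        if isinstance(layer, nn.Linear):
            W = layer.weight.detach().abs()      # shape (out, in)
            pre = layer(h)
        elif isinstance(layer, nn.ReLU):
            h = layer(pre)
            active = h.mean(dim=0) > tau         # shape (out,)
            # edges from currently inactive inputs to active outputs
            score += W[active][:, ~prev_active].sum().item()
            prev_active = active
    return score

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10), nn.ReLU())
print(emergence_proxy(model, torch.randn(128, 32)))
```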
(Physical Review E 101, 013312 (2020))
We present a class of exponential integrators to compute solutions of the stochastic Schrödinger equations arising from the modeling of open quantum systems. To be able to implement the methods within the same framework as the deterministic counterpart, we express the solution using Kunita's representation. With appropriate truncations, the solution operator can be written as matrix exponentials, which can be efficiently implemented by the Krylov subspace projection. The accuracy is examined in terms of the strong convergence by comparing trajectories, and in terms of the weak convergence by comparing the density-matrix operators. We show that the local accuracy can be further improved by introducing third-order commutators in the exponential. The effectiveness of the proposed methods is tested using the example from Di Ventra et al. [J. Phys.: Condens. Matter 16, 8025 (2004)].
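As a rough illustration of the main ingredients (a single matrix exponential per step, evaluated by Krylov-type methods), here is a toy exponential-Euler step for a linear stochastic Schrödinger equation using SciPy's expm_multiply; the operators, the Itô-to-exponential correction used, and the omission of commutator corrections are simplifying assumptions, not the paper's scheme:

```python
import numpy as np
from scipy.sparse.linalg import expm_multiply

def exp_euler_step(psi, H, L, dt, dW):
    """One exponential-Euler step for the linear stochastic Schroedinger equation
    d psi = A psi dt + L psi dW, with drift A = -iH - 0.5 L^dag L (one noise channel).
    The update is written as a single matrix exponential acting on psi and evaluated
    with SciPy's Krylov-based expm_multiply; the higher-order commutator corrections
    that improve local accuracy are omitted in this toy version.
    """
    A = -1j * H - 0.5 * (L.conj().T @ L)
    # Ito-to-exponential correction (as for geometric Brownian motion): subtract 0.5*L^2*dt
    G = (A - 0.5 * (L @ L)) * dt + L * dW
    return expm_multiply(G, psi)

# toy two-level system: H = sigma_x, noise operator L = 0.3 * sigma_-
rng = np.random.default_rng(0)
H = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
L = 0.3 * np.array([[0.0, 1.0], [0.0, 0.0]], dtype=complex)
psi = np.array([1.0, 0.0], dtype=complex)
dt = 1e-3
for _ in range(1000):
    psi = exp_euler_step(psi, H, L, dt, rng.normal(scale=np.sqrt(dt)))
print(np.vdot(psi, psi).real)  # norm drifts stochastically for this linear (unnormalized) form
```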
Works in progress:
(in preparation) We develop a theoretical framework linking the structure and behavior of feed-forward neural networks to their loss landscape geometry via activation paths. By expressing network outputs and loss functions as polynomials over active paths, we derive explicit gradient and Hessian formulas revealing how path diversity controls landscape smoothness. Our main result establishes that under sub-power path growth N_path(n) = Cn^β (where n denotes network width, C > 0 is a constant, and β ∈ (0, L) is the path scaling exponent, with L denoting the number of hidden layers) with β < L/2, wider networks exhibit flatter loss surfaces, with the Hessian trace scaling as O(N_path^γ), where γ(L, β) = 2 − L/β < 0. This implies wider basins of attraction and explains the empirical success of over-parameterized models. Experiments on CIFAR-10, UCI Adult, and function approximation tasks validate the theory across network depths and widths. Our framework provides a mechanistic explanation for why over-parameterized networks generalize and offers design principles for architectures with provably trainable loss landscapes.
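A one-line check of the width dependence, using only the two formulas above:

Tr(H) = O(N_path^γ) = O((Cn^β)^(2 − L/β)) = O(n^(2β − L)),

so the Hessian trace decreases as the width n grows exactly when β < L/2.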
(in preparation) We propose a novel framework for analyzing circuit structures in large language models (LLMs) with a focus on the number of paths from input to output. By extracting circuits using methods such as ACDC and quantifying them with graph-theoretic metrics, we hypothesize that higher path density leads to flatter loss landscapes. This, in turn, may facilitate easier feature extraction, better generalization, and increased robustness. We develop a theoretical model connecting path density to loss curvature and validate our framework on transformer models like GPT-2 Small and Llama-2-7B.
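A toy version of the path-density metric on an extracted circuit graph (the node names and edges below are hypothetical; circuit extraction itself, e.g., via ACDC, is a separate step):

```python
from collections import defaultdict

def count_paths(edges, inputs, outputs):
    """Count input-to-output paths in a DAG such as an extracted circuit graph.
    edges: iterable of (src, dst) pairs; inputs/outputs: collections of node names.
    """
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    # dynamic programming over a topological order: paths[v] = #paths from any input to v
    paths = {v: (1 if v in inputs else 0) for v in nodes}
    frontier = [v for v in nodes if indeg[v] == 0]
    while frontier:
        u = frontier.pop()
        for v in succ[u]:
            paths[v] += paths[u]
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    return sum(paths[v] for v in outputs)

# hypothetical circuit: emb -> {head_a, head_b} -> mlp -> logits, plus a direct head_a -> logits edge
edges = [("emb", "head_a"), ("emb", "head_b"), ("head_a", "mlp"),
         ("head_b", "mlp"), ("mlp", "logits"), ("head_a", "logits")]
print(count_paths(edges, inputs={"emb"}, outputs={"logits"}))  # -> 3
```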
(in preparation) We study emergence through paths: the idea that qualitatively new behaviors in neural networks arise when the architecture and learned gating activate sufficiently many computation paths from input to output. We formalize a path-based notion of emergence and propose two complementary measurement axes. The first is a mechanistic resource, the live-path fraction (LPF): the proportion of input-to-output paths that are active and carry non-negligible signal for a given input and parameter setting. The second is a behavioral threshold: the probability that performance on a task abruptly exceeds a pre-specified criterion (e.g., accuracy ≥ τ) as a function of a resource (training steps, width, depth, or data). Using controlled families of skip and strided connections, we show that architectural motifs which increase the combinatorics of available paths (residual links, U-Net long skips, and stride-2 cross-block skips) systematically increase LPF and lower the scale (in steps or parameters) at which behavioral thresholds are crossed. We further connect LPF to out-of-distribution robustness: models with richer path structure sustain higher success under corruption shifts at fixed early-training budgets. Together, these results support a graph-theoretic perspective on emergence and offer practical, architecture-level levers to engineer when and how emergent abilities appear.
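A minimal sketch of an LPF computation for a fully connected ReLU network, under the simplifying assumption that a path is live when every hidden unit along it has a positive pre-activation (so the count factorizes layer by layer); this is an illustration, not the paper's estimator:

```python
import numpy as np

def live_path_fraction(weights, x):
    """Toy live-path fraction for a fully connected ReLU net.

    A path picks one unit per layer; it is 'live' for input x if every hidden unit
    on it is active. Because every layer is fully connected, the path count
    factorizes, so LPF = product over hidden layers of (active units / width).
    """
    h, frac = x, 1.0
    for W in weights[:-1]:                 # hidden layers only; output layer adds no gating
        pre = W @ h
        active = pre > 0
        frac *= active.mean()              # fraction of live choices at this layer
        h = np.maximum(pre, 0.0)
    return frac

rng = np.random.default_rng(0)
sizes = [16, 64, 64, 10]
weights = [rng.normal(scale=1 / np.sqrt(m), size=(n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
print(live_path_fraction(weights, rng.normal(size=16)))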
My research in education:
Johnny Jingze Li, “Fostering Interdisciplinary AI Education Through Project-Based Learning: A Case Study” (link)
Johnny Li, “Maps, Models, and Making: Integrating Projects, Inquiry, and Concept Maps in Differential Equations classroom” (link)
Johnny Jingze Li, “Concept Maps based Learning Path Recommendation and Navigation” (link)
Kalyan Basu, Johnny Jingze Li, Faisal AlShinaifi, Zeyad Almoaigel, “A Mathematical and Algorithmic Framework for Efficient Concept Acquisition by Learners” (in preparation)