Giovanni Ballarin1, Lukas Gonon1,2, Lyudmila Grigoryeva1,3
1 University of St. Gallen
2 Imperial College London
3 University of Warwick
Random Weights Neural Networks (RWNNs) have attracted significant attention in the literature and practical applications due to their ability to approximate complex functions with minimal and easy-to-implement training. Although prior research has established their universal approximation properties and generalization bounds (Gonon et al. (2023), Gonon (2023)), their use in statistical inference has remained largely unexamined. In this talk, we provide a principled inferential framework for shallow RWNNs. Leveraging recent advances in approximation theory (Leluc et al. (2025), Klusowski and Barron (2018)), we derive faster-than-standard Monte Carlo approximation rates in both L2 and expected L∞ norms. We then establish asymptotic consistency and normality of RWNN estimators in L2 (Chen and White (1999), Chen and Shen (1998), Shen (1997)), and uniform consistency under appropriate regularization (Chen and Christensen (2015)).
Our results highlight the interaction between randomness, regularization, and conditioning in determining the quality of the inference. In particular, we demonstrate that while RWNNs can serve as consistent and efficient estimators under suitable conditions, the poor conditioning of specific weight realizations poses a significant challenge, particularly in the L∞ setting. We conclude by discussing extensions to dynamic models and open problems around quantifying “good” realizations of random weights.
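To make the setting concrete, the following is a minimal, hypothetical sketch of a shallow RWNN estimator of the kind discussed above: hidden weights and biases are sampled once and frozen, and only a ridge-regularized linear readout is trained. All names, the toy target function, and the parameter choices are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target on [-1, 1] and a noisy training sample (illustrative only).
f = lambda x: np.sin(3 * x) + 0.5 * x
n, N = 400, 200                      # sample size, number of random features
x = rng.uniform(-1, 1, size=n)
y = f(x) + 0.1 * rng.standard_normal(n)

# Hidden weights and biases are drawn once and never trained.
A = rng.standard_normal(N)           # random input weights
b = rng.uniform(-1, 1, size=N)       # random biases
phi = lambda x: np.maximum(A * x[:, None] + b, 0.0)   # ReLU features, shape (n, N)

# Only the linear readout is fit; ridge regularization mitigates the
# poor conditioning of unfavourable random-weight realizations.
lam = 1e-3
Phi = phi(x)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

# Sup-norm error on a grid, in the spirit of the expected L-infinity analysis.
x_test = np.linspace(-1, 1, 200)
err = np.max(np.abs(phi(x_test) @ w - f(x_test)))
print(f"sup-norm error on the grid: {err:.3f}")
```

Re-running with different seeds makes the role of the random realization visible: some draws of (A, b) condition the feature matrix much better than others.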
References
X. Chen and T. M. Christensen. Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions. Journal of Econometrics, 188(2):447–465, Oct. 2015. ISSN 0304-4076. doi: 10.1016/j.jeconom.2015.03.010.
X. Chen and X. Shen. Sieve Extremum Estimates for Weakly Dependent Data. Econometrica, 66(2):289–314, Mar. 1998. ISSN 0012-9682. doi: 10.2307/2998559.
X. Chen and H. White. Improved rates and asymptotic normality for nonparametric neural network estimators. IEEE Transactions on Information Theory, 45(2):682–691, Mar. 1999. ISSN 1557-9654. doi: 10.1109/18.749011.
L. Gonon. Random Feature Neural Networks Learn Black-Scholes Type PDEs Without Curse of Dimensionality. Journal of Machine Learning Research, 24:1–51, July 2023.
L. Gonon, L. Grigoryeva, and J.-P. Ortega. Approximation bounds for random neural networks and reservoir systems. The Annals of Applied Probability, 33(1):28–69, Feb. 2023. ISSN 1050-5164, 2168-8737. doi: 10.1214/22-AAP1806.
J. M. Klusowski and A. R. Barron. Approximation by Combinations of ReLU and Squared ReLU Ridge Functions With $\ell^1$ and $\ell^0$ Controls. IEEE Transactions on Information Theory, 64(12):7649–7656, Dec. 2018. ISSN 0018-9448, 1557-9654. doi: 10.1109/TIT.2018.2874447.
R. Leluc, F. Portier, J. Segers, and A. Zhuman. Speeding up Monte Carlo integration: Control neighbors for optimal convergence. Bernoulli, 31(2):1160–1180, May 2025. ISSN 1350-7265. doi: 10.3150/24-BEJ1765.
X. Shen. On methods of sieves and penalization. The Annals of Statistics, 25(6):2555–2591, Dec. 1997. ISSN 0090-5364, 2168-8966. doi: 10.1214/aos/1030741085.
The speaker's personal webpage: https://giovanni-ballarin.netlify.app
It is well known that the coefficients of path signatures decay at least factorially fast, while the decay rate of the coefficients of the logarithmic signature is generally only geometric. It was conjectured by T. Lyons and N. Sidorova that the only tree-reduced paths of bounded variation (BV) whose logarithmic signature can have infinite radius of convergence are straight lines. This conjecture was confirmed in the same work for certain types of paths, but the general BV case remains open.
In this talk, we develop a deeper understanding of the Lyons–Sidorova conjecture. We prove that, if the logarithmic signature has infinite radius of convergence, the signature coefficients must satisfy an infinite system of rigid algebraic identities defined in terms of iterated integrals along complex exponential one-forms. These iterated integral identities impose strong geometric constraints on the underlying path and, in some special situations, confirm the conjecture.
As a non-trivial application of our integral identities, we prove a weak version of the conjecture, which asserts that if the logarithmic signature of a BV path has infinite radius of convergence over all sub-intervals of time, the underlying path must be a straight line.
Joint work with Xi Geng and Sheng Wang.
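For readers less familiar with the decay rates contrasted above, the standard factorial bound for BV paths can be written as follows (a textbook estimate, not a result of this talk):

```latex
% Factorial decay of the level-n signature coefficients of a BV path x on [0,T]:
\[
  \bigl\| S_n(x) \bigr\|
  \;\le\;
  \frac{\|x\|_{1\text{-var}}^{\,n}}{n!},
  \qquad n \ge 1,
\]
% so the signature series always has infinite radius of convergence, whereas
% the coefficients of the logarithmic signature \(\log S(x)\) in general
% decay only geometrically -- which is what makes infinite radius of
% convergence of the log-signature such a restrictive property.
```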
This talk is based on joint work with Ofelia Bonesini, Ioannis Gasteratos and Antoine Jacquier.
We introduce a canonical way of performing the joint lift of a Brownian motion W and a low-regularity adapted stochastic rough path X, extending Diehl-Oberhauser-Riedel (2015). Applying this construction to the case where X is the canonical lift of a one-dimensional fractional Brownian motion (possibly correlated with W) completes the partial rough path of Fukasawa-Takano (2024). We use this to model rough volatility with the versatile toolkit of rough differential equations (RDEs), namely by taking the price and volatility processes to be the solution to a single RDE. We argue that our framework is already interesting when W and X are independent, as correlation between the price and volatility can be introduced in the dynamics. The lead-lag scheme of Flint-Hambly-Lyons (2016) is extended to our fractional setting as an approximation theory for the rough path in the correlated case. Continuity of the solution map transforms this into a numerical scheme for RDEs. We numerically test this framework and use it to calibrate a simple new rough volatility model to market data.
TBA
Controlled ordinary differential equations driven by continuous bounded variation curves can be considered a continuous-time analogue of recurrent neural networks for the construction of expressive features of the input curves. We ask to what extent the well-known signature features of such curves can be reconstructed from controlled ordinary differential equations with (untrained) random vector fields. The answer turns out to be algebraically involved, but essentially the number of signature features that can be reconstructed from the nonlinear flow of the controlled ordinary differential equation is exponential in its hidden dimension when the vector fields are chosen to be neural with depth two. Moreover, we characterize a general linear independence condition on arbitrary vector fields under which the signature features up to some fixed order can always be reconstructed. Based on joint work with Nicola Muca Cirone (Imperial College London) and Josef Teichmann (ETH Zürich).
The speaker's personal webpage: https://sites.google.com/view/miegluckstad/bio
Lyudmila Grigoryeva1,2. Based on joint work with: Christa Cuchiero3, Lukas Gonon1,4, Hannah Lim Jing Ting5, Juan-Pablo Ortega5, Josef Teichmann6
1 University of St. Gallen
2 University of Warwick
3 University of Vienna
4 Imperial College London
5 NTU Singapore
6 ETH Zurich
This talk provides an overview of reservoir computing (RC) as a framework for representing and learning nonlinear input-output and dynamical systems (Grigoryeva and Ortega (2018a,b), Gonon and Ortega (2020), Grigoryeva et al. (2023)). We discuss the rich structural connections between RC, Volterra series representations, and kernel methods.
We begin by revisiting the classical theory of fading memory filters and their representation via Volterra series, showing how reservoir systems, in particular state-affine systems (SAS), can approximate such functionals through randomized projections with universal properties (Cuchiero et al. (2022)). We then highlight the emergence of Volterra reservoir kernels and their induced RKHS, illustrating how linear readouts in high-dimensional (or infinite-dimensional) feature spaces enable the universal approximation (or representation) of causal, time-invariant systems (Gonon et al. (2024, 2022)) and the efficient learning of dynamical systems (Grigoryeva et al. (2025)).
A key aspect of the talk is the expressivity of the state-space models underpinning RC, as well as their connections to kernel methods and to path signatures in continuous time, which are a topic of our ongoing research.
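As a minimal illustration of the RC paradigm surveyed above, the sketch below runs a random, fixed recurrent state equation (an echo state network) and trains only a linear readout by ridge regression. The toy task, dimensions, and scalings are all illustrative assumptions, not the constructions from the referenced papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# One-dimensional input signal; the toy task is one-step-ahead prediction.
T = 1000
u = np.sin(0.1 * np.arange(T + 1)) + 0.1 * rng.standard_normal(T + 1)

# Random, fixed reservoir: neither W nor w_in is ever trained.
N = 100
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state heuristic)
w_in = rng.uniform(-0.5, 0.5, size=N)

x = np.zeros(N)
states = np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])   # state equation
    states[t] = x

# Linear readout fitted by ridge regression, discarding a washout period.
washout, lam = 100, 1e-6
S, y = states[washout:], u[washout + 1 : T + 1]
W_out = np.linalg.solve(S.T @ S + lam * np.eye(N), S.T @ y)
mse = np.mean((S @ W_out - y) ** 2)
print(f"training MSE: {mse:.4f}")
```

The point of the sketch is structural: all of the approximation power sits in the random state map, and learning reduces to a linear least-squares problem in the reservoir states.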
References
C. Cuchiero, L. Gonon, L. Grigoryeva, J. P. Ortega, and J. Teichmann. Discrete-time signatures and randomness in reservoir computing. IEEE Transactions on Neural Networks and Learning Systems, 33(11):1–10, 2022. ISSN 2162-2388. doi: 10.1109/TNNLS.2021.3076777.
L. Gonon and J.-P. Ortega. Reservoir computing universality with stochastic inputs. IEEE Transactions on Neural Networks and Learning Systems, 31(1):100–112, 2020.
L. Gonon, L. Grigoryeva, and J.-P. Ortega. Reservoir kernels and Volterra series. arXiv:2212.14641, 2022.
L. Gonon, L. Grigoryeva, and J. P. Ortega. Infinite-dimensional reservoir computing. Neural Networks, 179, 2024.
L. Grigoryeva and J.-P. Ortega. Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems. Journal of Machine Learning Research, 19(24):1–40, 2018a. URL http://arxiv.org/abs/1712.00754.
L. Grigoryeva and J.-P. Ortega. Echo state networks are universal. Neural Networks, 108:495–508, 2018b.
L. Grigoryeva, A. G. Hart, and J.-P. Ortega. Learning strange attractors with reservoir systems. Nonlinearity, 36:4674–4708, 2023.
L. Grigoryeva, H. Lim Jing Ting, and J.-P. Ortega. Infinite-dimensional next-generation reservoir computing. Physical Review E, 111:035305, 2025.
We will explore a Hopf algebra approach to the construction of cubature formulae on Wiener space. This maintains the symmetric structure of the expected signature of Brownian motion when projecting into the Lie algebra, greatly simplifying the constraint problem. We demonstrate the effectiveness of this approach by constructing the first explicit degree-7 cubature formula for Wiener space in general dimension. This is based on joint work with Emilio Ferrucci (University of Oxford), Christian Litterer (University of York) and Terry Lyons (University of Oxford).
We show well-posedness for McKean–Vlasov equations with rough common noise and progressively measurable coefficients. Our results are valid under natural regularity assumptions on the coefficients, in agreement with the respective requirements of Itô and rough path theory. To achieve these goals, we work in the framework of rough stochastic differential equations recently developed by the authors. Joint work with Peter Friz and Antoine Hocquet.
In this talk, I will present statistical tests for determining whether a path is an outlier, using test statistics defined in the reproducing kernel Hilbert space associated with a signature kernel. I will discuss theoretical guarantees on the probability of type I and type II errors, and present applications to novelty detection in streamed data, with a focus on challenges arising in epitranscriptomics.
Over the last decade, there has been significant development in the study of stochastic dispersive PDEs, broadly interpreted, with random initial data and/or additive stochastic forcing, where the difficulty comes from roughness in spatial regularity. In this talk, we consider the well-posedness of stochastic nonlinear wave equations with multiplicative noise, whose Itô solutions were constructed in the 1980s, while pathwise well-posedness for such equations has been an open problem for decades. As the main challenge of this problem comes from the deficiency of temporal regularity, we overcome this issue by bridging the theory of controlled rough paths and the Fourier restriction norm method.
This talk is based on a joint work with A. Chapouto (CNRS) and T. Oh (Edinburgh).
The expected signature maps a collection of data streams to a lower-dimensional representation with a remarkable property: the resulting feature tensor can fully characterize the data-generating distribution. This "model-free" embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML) algorithms for time series and sequential data. The convergence results discussed in this talk bridge the gap between the expected signature's empirical discrete-time estimator and its theoretical continuous-time value, allowing for a more complete probabilistic interpretation of expected-signature-based ML methods. Moreover, when the data-generating process is a martingale, we suggest a simple modification of the expected signature estimator with significantly lower mean squared error and empirically demonstrate how it can be effectively applied to improve predictive performance.
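To fix ideas, here is a minimal sketch of the empirical discrete-time estimator in question: signature levels 1 and 2 of piecewise-linear sample paths, averaged over Monte Carlo draws. The helper name, truncation at level 2, and the Brownian test case are illustrative assumptions; the estimator modification from the talk is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

def signature_levels_1_2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (n_steps + 1, d).
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    dx = np.diff(path, axis=0)            # increments, shape (n, d)
    level1 = dx.sum(axis=0)
    # Iterated integral: sum over s < t of dx_s (x) dx_t, plus the
    # within-segment correction dx (x) dx / 2 for linear pieces.
    cum = np.cumsum(dx, axis=0) - dx      # running sum strictly before step t
    level2 = cum.T @ dx + 0.5 * dx.T @ dx
    return level1, level2

# Monte Carlo estimate of the expected level-2 signature of 2-d Brownian
# motion on [0, 1]; the known theoretical value is I/2.
n_paths, n_steps, d = 2000, 200, 2
paths = np.cumsum(rng.standard_normal((n_paths, n_steps, d)) / np.sqrt(n_steps), axis=1)
paths = np.concatenate([np.zeros((n_paths, 1, d)), paths], axis=1)

es2 = np.mean([signature_levels_1_2(p)[1] for p in paths], axis=0)
print(es2)
```

Comparing `es2` against the closed-form value I/2 gives a concrete instance of the discrete-time-versus-continuous-time gap that the convergence results control.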
The speaker's personal webpage: http://www.lorenzolucchese.com/
Link to the online talk: https://liverpool-ac-uk.zoom.us/j/97054257249?pwd=iTYraPhxjEfbmbx1SbbkaYsIr0lUHV.1
Reservoir Computing is a framework which uses the internal dynamics of recurrent, random, and fixed neural networks to perform complex transformations of input signals, enabling them to carry out various computations and tasks. Learning is restricted to the output layer and can be thought of as “reading out” from the dynamical states of the reservoir. With no training of the internal weights, reservoirs do not have the costly and difficult training associated with other kinds of deep neural networks. This talk addresses various points where Reservoir Computing and Computational Neuroscience may be of mutual benefit. Firstly, we present how the connectivity of brain networks can inspire sparse, efficient, and robust reservoirs, and how structural features like clustering, recurrency, and neuron cell-type are related to task performance and selectivity. Secondly, we demonstrate using linearised dynamics and generalised Hebbian learning algorithms how the Reservoir Computing framework can inspire a better understanding and modelling of potential brain mechanisms.
Training Neural Differential Equations requires backpropagating through a numerical solver. Since the number of solver steps n is typically > 10^2, the forward computation graph can often become too large to store in memory. Checkpointing alleviates this problem by only storing a subset of the graph and recomputing the remaining nodes. However, this introduces an additional runtime cost with time complexity O(n log n) for a memory cost of O(√n).
In this talk, I will discuss a class of algebraically reversible solvers that allow for recomputation of the forward graph in linear time O(n) and constant memory O(1). These reversible solvers are formed by wrapping a non-reversible solver in a 'coupling layer' and therefore inherit many desirable properties, such as improved convergence order and stability. In scientific modelling problems, we show that reversible solvers obtain a 2-3x training-time reduction over checkpointing while using 20x less memory.
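To illustrate what algebraic reversibility buys, the sketch below uses the classical two-step leapfrog (explicit midpoint) scheme, not the coupling-layer construction from the talk: the final state pair alone suffices to reconstruct every earlier state by inverting the update rule, so no forward trajectory needs to be stored. The ODE and step sizes are illustrative assumptions.

```python
import numpy as np

# Vector field of a simple linear ODE dy/dt = A y (harmonic oscillator).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
f = lambda y: A @ y

def leapfrog_forward(y0, h, n):
    """Explicit midpoint (leapfrog): y_{k+1} = y_{k-1} + 2h f(y_k)."""
    y_prev, y = y0, y0 + h * f(y0)        # bootstrap with one Euler step
    for _ in range(n):
        y_prev, y = y, y_prev + 2 * h * f(y)
    return y_prev, y                       # O(1) memory: only the final pair

def leapfrog_backward(y_prev, y, h, n):
    """Algebraic inverse: recover earlier states from the final pair alone."""
    for _ in range(n):
        y_prev, y = y - 2 * h * f(y_prev), y_prev
    return y_prev, y

n, h = 1000, 1e-3
y0 = np.array([1.0, 0.0])
pair = leapfrog_forward(y0, h, n)
y_prev_rec, y_rec = leapfrog_backward(*pair, h, n)
print("reconstruction error:", np.abs(y_prev_rec - y0).max())
```

During backpropagation, the backward pass can regenerate each intermediate state on the fly from the pair it carries, giving the O(n) time / O(1) memory trade-off mentioned above (up to floating-point round-off in the reconstruction).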
Ensemble Data Assimilation (EDA) plays a central role in operational data assimilation systems at centres such as ECMWF. However, the associated computational cost represents a significant portion of the overall computational budget. To address these challenges, at ECMWF we are developing a hybrid EDA and probabilistic, data-driven method aimed at reducing the current computational burden while maintaining physical consistency. The objective is to develop generative models capable of reproducing and sampling from the high-dimensional posterior probability distribution currently approximated by ECMWF's ensemble 4D-Variational assimilation methods. In this talk, we present some of the ongoing research in this direction, focusing on a machine learning-based particle system designed to complement and integrate with physical EDA algorithms.
Lyudmila Grigoryeva1, Hannah Lim Jing Ting2, Juan-Pablo Ortega2
1 Faculty of Mathematics and Statistics, University of St. Gallen, Switzerland
2 School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
Reservoir computing (RC) is a methodology in which a recurrent neural network with a randomly generated state equation and a functionally simple readout layer is trained to proxy the data-generating process of a time series [1], [2]. Next-generation reservoir computing (NG-RC) is an increasingly popular method that replaces the standard RC approach with nonlinear vector autoregressions in which the covariates are monomials constructed from previous inputs [3], [4]. One downside of the NG-RC approach is that its performance and complexity depend strongly on the maximum order of the monomials and the number of lags of past signals; as these hyperparameters grow, the computational effort associated with the NG-RC increases exponentially. By kernelizing NG-RC, we obtain a more computationally tractable way of carrying out this methodology whose complexity does not grow with the maximum monomial order or the number of lags. In particular, we show that NG-RC is a particular case of polynomial kernel regression. Additionally, the kernel point of view allows us to introduce a generalization of NG-RC in which all (infinitely many) past lags and monomials of arbitrarily high degree enter the polynomial kernel regression. The primary tool in carrying this out is the Volterra kernel introduced in [5]. Kernel regression with the Volterra kernel amounts to using as covariates left-infinite input sequences and all the monomials of all degrees constructed out of them. This leads to an infinite-dimensional covariate space, which can nevertheless be explicitly and efficiently implemented using the recurrence properties of the Volterra kernel. Moreover, the Volterra kernel is universal [5]. Various numerical illustrations show that these generalizations outperform the NG-RC itself.
Finally, we present an informal theorem that, under certain conditions, input/output functionals can be represented as power series with terms reminiscent of the signature transform.
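The kernelization step can be sketched as follows: NG-RC regresses on all monomials of k lagged inputs up to degree p, and the polynomial kernel (1 + <u, v>)^p is an inner product of exactly such (scaled) monomial feature maps, so the same regression can be run as kernel ridge regression without ever enumerating the monomials. The toy series, lag/degree choices, and regularization below are illustrative assumptions, not the Volterra-kernel construction of [5].

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar time series; toy task: one-step-ahead prediction from k lagged values.
T, k, p = 500, 3, 4                  # series length, number of lags, max degree
z = np.sin(0.2 * np.arange(T)) + 0.05 * rng.standard_normal(T)

# Lagged covariate vectors u_t = (z_{t-1}, ..., z_{t-k}).
U = np.column_stack([z[k - j - 1 : T - j - 1] for j in range(k)])
y = z[k:]

# Polynomial kernel ridge regression: the Gram matrix replaces the explicit
# monomial design matrix, so the cost is O(T^2) regardless of p and k.
lam = 1e-4
K = (1.0 + U @ U.T) ** p
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

y_hat = K @ alpha
mse = np.mean((y_hat - y) ** 2)
print(f"in-sample MSE: {mse:.5f}")
```

Raising p or k changes only the entries of the Gram matrix, not its size, which is the tractability point made in the abstract; replacing the polynomial kernel with the Volterra kernel then extends the construction to all past lags and all degrees.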
References
[1] H. Jaeger and H. Haas, ‘Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication’, Science, vol. 304, no. 5667, pp. 78–80, 2004, doi: 10.1126/science.1091277.
[2] W. Maass, ‘Liquid state machines: Motivation, theory, and applications’, in Computability In Context: Computation and Logic in the Real World, S. S. Barry Cooper and A. Sorbi, Eds., World Scientific, 2011, pp. 275–296.
[3] D. J. Gauthier, E. Bollt, A. Griffith, and W. A. S. Barbosa, ‘Next generation reservoir computing’, Nat. Commun., vol. 12, no. 1, p. 5564, 2021.
[4] W. A. S. Barbosa and D. J. Gauthier, ‘Learning spatiotemporal chaos using next-generation reservoir computing’, Chaos Interdiscip. J. Nonlinear Sci., vol. 32, no. 9, 2022.
[5] L. Gonon, L. Grigoryeva, and J.-P. Ortega, ‘Reservoir kernels and Volterra series’, arXiv:2212.14641, 2022.
We propose a novel tensor-to-tensor layer for deep learning models, particularly focusing on order-2 tensors to provide an image-to-image layer that can be integrated into image processing pipelines. The algorithmic core of our method leverages the mathematical concept of corner trees, originally developed for permutation counting, providing a novel framework for tensor-to-tensor transformations that can effectively handle complex data relationships and structures.
On the one hand, our method can be seen as a higher-order generalization of state space models, which are known to offer promising linear-cost alternatives for sequence-to-sequence tasks. On the other hand, it is based on a multiparameter generalization of the signature of iterated integrals, which has proven successful in summarising stream data over increments while preserving the order of events.
Our experimental results demonstrate the effectiveness of the proposed approach across multiple tasks. In image classification, our method achieves competitive accuracy while reducing the number of trainable parameters by up to 85% and multiply-add operations by up to 65% compared to ResNet architectures. For anomaly detection tasks, our approach achieves remarkable performance, including a 100% AUROC score on the leather category, while maintaining more robust performance with a significantly lower standard deviation (3.9%) compared to conventional approaches (6.3%). These results validate the efficiency and effectiveness of our tensor-to-tensor layer in practical applications. This is a joint work with Joscha Diehl (Greifswald), Rasheed Ibraheem (Edinburgh), Leonard Schmitz (TU Berlin).