Abstract
Data assimilation (DA) combines partial observations with a dynamical model to improve state estimation. Filter-based DA uses only past and present data and is the prerequisite for real-time forecasts. Smoother-based DA exploits both past and future observations. It aims to fill in missing data, provide more accurate estimations, and develop high-quality datasets. However, the standard smoothing procedure requires using all historical state estimations, which is storage-demanding, especially for high-dimensional systems. This paper develops an adaptive-lag online smoother for a large class of complex dynamical systems with strong nonlinear and non-Gaussian features, which has important applications to many real-world problems. The adaptive lag allows the DA to utilize only observations within a nearby window, significantly reducing computational storage. Online lag adjustment is essential for tackling turbulent systems, where temporal autocorrelation varies significantly over time due to intermittency, extreme events, and nonlinearity. Based on the uncertainty reduction in the estimated state, an information criterion is developed to systematically determine the adaptive lag. Notably, the mathematical structure of these systems facilitates the use of closed analytic formulae to calculate the online smoother and the adaptive lag, avoiding the empirical tuning required in ensemble-based DA methods. The adaptive online smoother is applied to three important scientific problems. First, it helps detect online causal relationships between state variables. Second, its advantage in computational storage is illustrated via Lagrangian DA, a high-dimensional nonlinear problem. Finally, the adaptive smoother advances online parameter estimation with partial observations, emphasizing the role of the observed extreme events in accelerating convergence.
BibTeX Entry
@article{
}
Abstract
The Conditional Gaussian Nonlinear System (CGNS) is a broad class of nonlinear stochastic dynamical systems. Given the trajectories for a subset of state variables, the remaining variables follow a Gaussian distribution. Despite the conditionally linear structure, the CGNS exhibits strong nonlinearity, thus capturing many non-Gaussian characteristics observed in nature through its joint and marginal distributions. Desirably, it enjoys closed analytic formulae for the time evolution of its conditional Gaussian statistics, which facilitate the study of data assimilation and other related topics. In this paper, we develop a martingale-free approach to improve the understanding of CGNSs. This methodology provides a tractable approach to proving the time evolution of the conditional statistics by deriving results through time discretization schemes, with the continuous-time regime obtained via a formal limiting process as the discretization time-step vanishes. This discretized approach further allows for developing analytic formulae for optimal posterior sampling of unobserved state variables with correlated noise. These tools are particularly valuable for studying extreme events and intermittency and apply to high-dimensional systems. Moreover, the approach improves the understanding of different sampling methods in characterizing uncertainty. The effectiveness of the framework is demonstrated through a physics-constrained, triad-interaction climate model with cubic nonlinearity and state-dependent cross-interacting noise.
Typographical Errors:
Typographical Error: At the beginning of Section 2.4 - Preliminaries and at regularity condition (1), the superscripts of the noise feedback matrices’ standard elements corresponding to the state variables should be in boldface: x and y.
Citation Error: Reference [47] (Simonoff, J. Smoothing Methods in Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 2012.), cited on page 2 (3rd paragraph of Section 1 - Introduction), was included by accident.
Typographical Error: In the integrals appearing from equation (A20) onward in the proof of Theorem 2, the upper limit should be t (not T), and the integration variable should be s instead of t, consistent with the preceding integrals. In addition, the differential is missing from the integral immediately before the application of Grönwall’s inequality (third-to-last equation). These typos do not affect the result of the theorem.
Typographical Error: In the last equation of the proof of Theorem 2, the exponent should be 8TM²C₂ instead of 2TM²C₂, consistent with the preceding equation. This typo does not alter the result of the theorem.
BibTeX Entry
@article{andreou2025martingale,
title = "{A Martingale-Free Introduction to Conditional Gaussian Nonlinear Systems}",
author = "Andreou, Marios and Chen, Nan",
journal = "Entropy",
ISSN = "1099-4300",
publisher = "MDPI",
volume = "27",
number = "1",
article-number = "2",
year = "2025",
DOI = "10.3390/e27010002",
URL = "https://www.mdpi.com/1099-4300/27/1/2"
}