Tutorials
Tutorial at ACM SIGMETRICS 2025 on Distributional Analysis of Stochastic Algorithms (jointly with Lucy Huo).
Abstract: Stochastic algorithms power modern machine learning and optimization. They appear as stochastic gradient descent for loss minimization, stochastic gradient descent-ascent for min-max optimization, TD- and Q-learning for reinforcement learning, stochastic approximation for fixed-point problems, and methods for stochastic variational inequalities. Together with their many variants, these algorithms are becoming increasingly vital in today's large-scale problems with finite, noisy data. There is thus pressing interest in obtaining fine-grained characterizations of stochastic algorithms and enhancing their sample and computational efficiency.
Traditionally, stochastic algorithms are treated as noisy versions of their deterministic counterparts, and their stochasticity is viewed as a nuisance. Prior work thus focuses on controlling the stochastic fluctuation of the iterates, both in algorithm design (e.g., the use of diminishing stepsizes) and in analysis (e.g., bounding mean squared errors). A recent line of work departs from this paradigm and instead embraces the probabilistic behavior of stochastic algorithms. By viewing the iterate sequence as a stochastic process, one can study its distribution using modern tools from Markov chain theory and stochastic analysis, yielding a fine-grained characterization of the algorithms' behavior.
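As a concrete illustration of this viewpoint, the following minimal Python sketch simulates constant-stepsize SGD on a one-dimensional quadratic with additive gradient noise; the test problem and all parameter values are our own illustrative choices, not part of the tutorial material. Each update depends only on the current iterate and fresh noise, so the iterates form a time-homogeneous Markov chain that converges to a stationary distribution around the minimizer rather than to the minimizer itself.

    import numpy as np

    rng = np.random.default_rng(0)

    # Constant-stepsize SGD on f(x) = x^2 / 2 with additive gradient noise.
    alpha = 0.1               # constant stepsize (illustrative value)
    x = 5.0                   # start far from the minimizer x* = 0
    iterates = []
    for _ in range(20_000):
        noisy_grad = x + rng.normal()   # grad f(x) = x, plus unit-variance noise
        x -= alpha * noisy_grad
        iterates.append(x)

    # The chain does not converge to x* = 0; it converges in distribution.
    tail = np.array(iterates[10_000:])              # discard the transient phase
    print(f"stationary mean ~ {tail.mean():.3f}")   # close to 0
    print(f"stationary std  ~ {tail.std():.3f}")    # close to sqrt(alpha/(2-alpha)) ~ 0.23

For this linear recursion the stationary variance can be computed in closed form as alpha/(2 - alpha) times the noise variance, so the empirical spread of the tail iterates matches the distributional prediction rather than vanishing.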
In this tutorial, we will present an overview of these new techniques and results for the distributional analysis of stochastic algorithms. Three main ingredients of this approach will be covered: (1) establishing finite-time distributional convergence of the iterates and relating their ergodicity properties to the characteristics of the problem instance, algorithm, and data; (2) characterizing the steady-state distribution of the iterates using coupling techniques and the basic adjoint relationship; (3) leveraging this precise probabilistic characterization for stepsize scheduling, variance reduction, bias refinement, and efficient statistical inference. We will focus on the constant-stepsize paradigm popular among practitioners, emphasizing the disentanglement of the deterministic and stochastic behaviors of the algorithms, as well as of their transient and long-run behaviors. We will cover background material, fundamental and state-of-the-art results, and applications to concrete optimization and reinforcement learning problems. Open issues and potential future directions will also be discussed.
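As a hedged sketch of ingredient (3), the snippet below illustrates one well-known bias-refinement device, Richardson-Romberg extrapolation: for non-quadratic objectives, the stationary mean of constant-stepsize SGD is typically offset from the optimum by a term linear in the stepsize, so combining long-run averages from two stepsizes cancels that first-order bias. The objective f(x) = exp(x) - x (minimized at x* = 0) and all parameter values are illustrative assumptions, not taken from the tutorial.

    import numpy as np

    def long_run_average(alpha, n_iters=300_000, burn_in=100_000, seed=1):
        """Average of constant-stepsize SGD iterates on f(x) = exp(x) - x."""
        rng = np.random.default_rng(seed)
        x, total = 0.0, 0.0
        for k in range(n_iters):
            x -= alpha * (np.exp(x) - 1.0 + rng.normal())  # noisy gradient step
            if k >= burn_in:
                total += x
        return total / (n_iters - burn_in)

    alpha = 0.05
    avg_2a = long_run_average(2 * alpha)   # stationary bias roughly 2 * alpha * Delta
    avg_a = long_run_average(alpha)        # stationary bias roughly     alpha * Delta
    refined = 2 * avg_a - avg_2a           # first-order bias terms cancel
    print(avg_2a, avg_a, refined)          # refined is much closer to x* = 0

Here the asymmetry of the objective pushes the stationary mean slightly below the optimum by an amount proportional to the stepsize, and the extrapolated estimate removes that leading-order offset without shrinking the stepsize.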