World Online Seminars on
For announcements and Zoom links, please subscribe here to our mailing list.
Speaker: Petter Kolm and Grégoire Macqueron (NYU Courant)
Date/Time: Wednesday, 1/14, 7:00pm CET (10:00am PST, 1:00pm EST)
Title: Reinforcement Learning for Discrete Stochastic LQR: Convergence, Robustness, and Stability Guarantees
Abstract: We present ongoing work on a robust, non-model-based value-iteration (VI) algorithm for discrete stochastic linear–quadratic regulation (LQR) that learns optimal controllers directly from input–state data, without system identification and without requiring an initial stabilizing policy. Our analysis establishes several new results, including (i) global exponential convergence of exact VI and (ii) small-disturbance input-to-state stability of inexact VI, both holding for any positive semidefinite initialization. These guarantees provide a theoretical foundation for reliable reinforcement learning in high-noise or partially misspecified systems.
We discuss some practical aspects of numerical implementation, including reset-based sampling for stabilizing data collection and the exploitation of blockwise structure and sparsity in system and cost matrices. As illustrative case studies, we apply the method to the “data center cooling” benchmark and to the Gârleanu–Pedersen dynamic portfolio allocation model, using these well-known problems to demonstrate the algorithm’s behavior and practical performance.
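For readers unfamiliar with value iteration in the LQR setting, the following is a minimal model-based sketch of the exact VI (Riccati) recursion, not the speakers' data-driven algorithm. It assumes known matrices A, B, additive zero-mean noise (which leaves the optimal gain unchanged), and a hypothetical two-state example system chosen only for illustration. It shows the recursion reaching the same fixed point from two different positive semidefinite initializations, with no initial stabilizing policy.

```python
import numpy as np


def lqr_gain(A, B, R, P):
    """Greedy feedback gain K for the value matrix P (control law u = -K x)."""
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)


def riccati_value_iteration(A, B, Q, R, P0, tol=1e-10, max_iter=10_000):
    """Exact value iteration: P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA."""
    P = P0.copy()
    for _ in range(max_iter):
        K = lqr_gain(A, B, R, P)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        P_next = 0.5 * (P_next + P_next.T)  # remove round-off asymmetry
        if np.linalg.norm(P_next - P, ord="fro") < tol:
            return P_next, lqr_gain(A, B, R, P_next)
        P = P_next
    raise RuntimeError("value iteration did not converge within max_iter")


# Hypothetical open-loop unstable but controllable system (illustration only).
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

# Two different positive semidefinite initializations.
P, K = riccati_value_iteration(A, B, Q, R, P0=np.zeros((2, 2)))
P_alt, K_alt = riccati_value_iteration(A, B, Q, R, P0=10.0 * np.eye(2))

# Spectral radius of the closed-loop matrix A - B K.
closed_loop_radius = max(abs(np.linalg.eigvals(A - B @ K)))
```

In this sketch the gain is recomputed from the current value matrix at every step, so no stabilizing controller is needed to start the iteration. The method described in the abstract instead works from input-state data without identifying A and B.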
(University of Vienna)
(University of California, Santa Barbara)
(University of Verona)
(Stanford University)