Yufei Zhang

November 22nd


Title: Convergence of Policy Gradient Methods for Finite-horizon Stochastic Linear-quadratic Control Problems

Speaker: Yufei Zhang (LSE)

Date/Time: Tuesday, 11/22, 7:45pm CET (10:45am PST, 1:45pm EST)

Abstract: Recently, policy gradient (PG) methods have achieved notable success in various sequential decision-making applications. Most of this attention, however, has focused on the discrete-time setting. Characterising the convergence rate of these algorithms for continuous-time control problems remains a challenging open problem.

This talk studies the convergence of PG methods for finite-horizon exploratory linear-quadratic control (LQC) problems. This setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. In contrast to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to obey an a priori bound and to converge globally to the optimal policy at a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm. If time allows, extensions of the algorithm to nonlinear control problems will be discussed.
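
For readers unfamiliar with the policy class described in the abstract, the following is a minimal illustrative sketch and not the algorithm presented in the talk. It evaluates, by Monte Carlo simulation, the quadratic cost of a Gaussian policy whose mean is linear in the state and whose variance is state-independent, applied at a fixed action frequency to a scalar linear stochastic system. The function name simulate_cost, the time-constant gain k, the scalar dynamics and all coefficient values (a, b, sigma, q, r) are assumptions chosen for illustration only.

```python
import math
import random


def simulate_cost(k, v, x0=1.0, T=1.0, n_steps=100, n_paths=2000,
                  a=0.0, b=1.0, sigma=0.1, q=1.0, r=1.0, seed=0):
    """Monte Carlo estimate of E[ sum (q X^2 + r u^2) dt ] over [0, T].

    Assumed (hypothetical) scalar dynamics, discretised by Euler-Maruyama:
        dX = (a X + b u) dt + sigma dW
    The action is redrawn at every time step from the Gaussian policy
        u ~ N(k * X, v),
    i.e. the mean is linear in the state and the variance v does not
    depend on the state. The gain k is held constant in time for simplicity.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(n_steps):
            # Sample the action from the Gaussian policy.
            u = rng.gauss(k * x, math.sqrt(v))
            # Accumulate the running quadratic cost.
            cost += (q * x * x + r * u * u) * dt
            # Euler-Maruyama step for the state.
            noise = rng.gauss(0.0, 1.0)
            x += (a * x + b * u) * dt + sigma * math.sqrt(dt) * noise
        total += cost
    return total / n_paths


if __name__ == "__main__":
    # Compare a stabilising feedback gain with no feedback, at two
    # exploration levels (policy variances).
    for k, v in [(0.0, 0.0), (-1.0, 0.0), (-1.0, 0.5)]:
        print(k, v, simulate_cost(k, v, n_paths=500))
```

Changing n_steps changes how often the action is resampled, which is one simple way to experiment with the "action frequency" mentioned in the abstract. A policy gradient method would update k and v using gradients of such a cost; that step is not shown here.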

This is joint work with Michael Giegrich and Christoph Reisinger.

Bio: Yufei is an assistant professor in the Department of Statistics of LSE. His research interests lie at the intersection of stochastic control and games, machine learning, and mathematical finance. Yufei received his Ph.D. degree in Mathematics from the University of Oxford in 2021.

Meeting Recording: https://ucsb.zoom.us/rec/share/VQEr7WQLRGeUhs_-7KZfpHPLwRz464-qNq1LgAS72hstVK0_7lKOQD2i8Nhy31bw.uF5Z4Uj_pd99X76t

Access Passcode: b.Dsp4?4