Learning robust control for LQR systems with multiplicative noise via policy gradient
Keywords: Optimal, robust, control, reinforcement learning, policy, gradient, optimization, nonconvex, gradient domination, Polyak-Lojasiewicz, inequality, concentration, bound
Summary
We show that the linear quadratic regulator with multiplicative noise (LQRm) objective is gradient dominated, and thus applying policy gradient results in global convergence to the globally optimum control policy with polynomial dependence on problem parameters. The learned policy accounts for inherent parametric uncertainty in system dynamics and thus improves stability robustness. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.
Read the paper on arXiv.