References

Learning linear dynamical systems via spectral filtering

E Hazan, K Singh, C Zhang

Advances in Neural Information Processing Systems 30 (NIPS 2017)

Spectral filtering for general linear dynamical systems

E Hazan, H Lee, K Singh, C Zhang, Y Zhang

Neural Information Processing Systems (NIPS), 2018

No-regret prediction in marginally stable systems

U Ghai, H Lee, K Singh, C Zhang, Y Zhang

Conference on Learning Theory, 1714-1757

Research papers in online stochastic control

1. Regret Bounds for the Adaptive Control of Linear Quadratic Systems

Y Abbasi-Yadkori, C Szepesvári

COLT 2011, 1-26

2. Model-Free Linear Quadratic Control via Reduction to Expert Prediction

Y Abbasi-Yadkori, N Lazic, C Szepesvári

The 22nd International Conference on Artificial Intelligence and Statistics

3. Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret

A Cassel, T Koren

4. Logarithmic regret for learning linear quadratic regulators efficiently

A Cassel, A Cohen, T Koren

International Conference on Machine Learning, 1328-1337

5. Online linear quadratic control

A Cohen, A Hasidim, T Koren, N Lazic, Y Mansour, K Talwar

International Conference on Machine Learning, 1029-1038

6. Learning without mixing: Towards a sharp analysis of linear system identification

M Simchowitz, H Mania, S Tu, MI Jordan, B Recht

Conference on Learning Theory, 439-473

7. Learning linear dynamical systems with semi-parametric least squares

M Simchowitz, R Boczar, B Recht

Conference on Learning Theory, 2714-2802

8. Logarithmic regret bound in partially observable linear dynamical systems

S Lale, K Azizzadenesheli, B Hassibi, A Anandkumar

9. Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

S Lale, K Azizzadenesheli, B Hassibi, A Anandkumar

10. Learning nonlinear dynamical systems from a single trajectory

D Foster, T Sarkar, A Rakhlin

Learning for Dynamics and Control, 851-861

11. Global convergence of policy gradient methods for the linear quadratic regulator

M Fazel, R Ge, S Kakade, M Mesbahi

International Conference on Machine Learning, 1467-1476

12. Online convex programming and regularization in adaptive control

M Raginsky, A Rakhlin, S Yüksel

49th IEEE Conference on Decision and Control (CDC), 1957-1962

13. Geometric Exploration for Online Control

O Plevrakis, E Hazan

Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Competitive ratio for online control

Competitive Control with Delayed Imperfect Information

C Yu, G Shi, SJ Chung, Y Yue, A Wierman

Online Optimization with Memory and Competitive Control

G Shi, Y Lin, SJ Chung, Y Yue, A Wierman

Proceedings of NeurIPS

Regret minimization for Markov Decision Processes

Online Markov decision processes

E Even-Dar, SM Kakade, Y Mansour

Mathematics of Operations Research 34 (3), 726-736

Experts in a Markov decision process

E Even-Dar, SM Kakade, Y Mansour

Advances in Neural Information Processing Systems, 401-408

Markov decision processes with arbitrary reward processes

JY Yu, S Mannor, N Shimkin

Mathematics of Operations Research 34 (3), 737-757

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Y Abbasi-Yadkori, P Bartlett, V Kanade, Y Seldin, C Szepesvári

Neural Information Processing Systems

Better rates for any adversarial deterministic MDP

O Dekel, E Hazan

International Conference on Machine Learning, 675-683

Textbooks and other relevant readings (references limited to online availability)

1. Introduction to Online Convex Optimization, by Elad Hazan

2. Underactuated Robotics, by Russ Tedrake

3. Reinforcement Learning: An Introduction, by Richard S. Sutton, Andrew G. Barto

4. Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun

5. Bandit Algorithms, by Tor Lattimore and Csaba Szepesvári

6. Control Systems & Reinforcement Learning, by Sean Meyn
