Paper Lists
Learning from Demonstration (Behavior Cloning)
S. Ross and D. Bagnell, “Efficient reductions for imitation learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, May, 2010.
S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the International Conference on Artificial Intelligence and Statistics, 2011.
Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
Sungjoon Choi, Eunwoo Kim, and Songhwai Oh, "Real-Time Navigation in Crowded Dynamic Environments Using Gaussian Process Motion Control," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), Jun. 2014. [Video]
Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667.
Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). (http://rpg.ifi.uzh.ch/dronet.html)
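The Ross & Bagnell entries above reduce imitation learning to iterative supervised learning (DAgger): roll out the current policy, label the states it visits with the expert, retrain on the aggregated data, repeat. A minimal sketch of that loop on a toy 1-D task; the expert, the 1-nearest-neighbor learner, and all constants here are illustrative stand-ins, not from any listed paper:

```python
import random

def expert(s):
    # illustrative expert: always push the state toward the origin
    return -1.0 if s > 0 else 1.0

def fit(data):
    # 1-nearest-neighbor "policy" over (state, action) pairs -- a toy
    # stand-in for the supervised learner DAgger retrains each round
    def policy(s):
        return min(data, key=lambda d: abs(d[0] - s))[1]
    return policy

rng = random.Random(0)
data = [(s, expert(s)) for s in (-2.0, -1.0, 1.0, 2.0)]  # initial demos
policy = fit(data)
for _ in range(3):                      # DAgger iterations
    s = rng.uniform(-2.0, 2.0)
    for _ in range(5):                  # roll out the CURRENT policy...
        data.append((s, expert(s)))     # ...but label states with the expert
        s += 0.5 * policy(s)
    policy = fit(data)
```

The key point is the labeling step: states come from the learner's own rollouts (avoiding the compounding-error problem of plain behavior cloning), while actions always come from the expert.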
Multi-Armed Bandits
Continuous Black-Box Optimization
Stochastic and Adversarial MAB
Contextual MAB
Dueling MAB
Combinatorial MAB
Reinforcement Learning Theory
Jaksch, Thomas, Ronald Ortner, and Peter Auer. "Near-optimal Regret Bounds for Reinforcement Learning.", JMLR 2010.
Azar, Mohammad Gheshlaghi, Ian Osband, and Rémi Munos. "Minimax regret bounds for reinforcement learning.", ICML 2017.
Zihan Zhang and Xiangyang Ji. "Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function.", NeurIPS 2020.
Chi Jin, Zhuoran Yang, Zhaoran Wang, and Michael I. Jordan. "Provably efficient reinforcement learning with linear function approximation.", COLT 2020.
Qi Cai, Zhuoran Yang, Chi Jin, and Zhaoran Wang. "Provably efficient exploration in policy optimization.", ICML 2020.
Deep Reinforcement Learning
Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2013.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., and Petersen, S. "Human-level control through deep reinforcement learning," Nature, 2015.
T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2015. [arXiv]
H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2016.
Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
Junhyuk Oh, Valliappa Chockalingam, Satinder P. Singh, Honglak Lee, "Control of Memory, Active Perception, and Action in Minecraft," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
Shixiang Gu, Timothy P. Lillicrap, Ilya Sutskever, Sergey Levine, "Continuous Deep Q-Learning with Model-based Acceleration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, "Deep Exploration via Bootstrapped DQN," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, Garrett Thomas, "Value Iteration Networks," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos, "Unifying Count-Based Exploration and Intrinsic Motivation," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel, "VIME: Variational Information Maximizing Exploration," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
Tejas D. Kulkarni*, Karthik R. Narasimhan*, Ardavan Saeedi, Joshua B. Tenenbaum, "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation." Advances in neural information processing systems (NIPS), Dec, 2016.
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel, "High-Dimensional Continuous Control Using Generalized Advantage Estimation," in Proc. of the International Conference on Learning Representations (ICLR), May, 2016.
Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, "End-to-End Training of Deep Visuomotor Policies," Journal of Machine Learning Research (JMLR), 2016.
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, 2016.
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, "Mastering the game of Go without human knowledge," Nature, 2017.
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2016. [arXiv]
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
J. Schulman, P. Abbeel, and X. Chen, "Equivalence between policy gradients and soft Q-Learning," arXiv preprint arXiv:1704.06440, 2017.
P. H. Richemond and B. Maginnis, "A short variational proof of equivalence between policy gradients and soft Q learning," arXiv preprint arXiv:1712.08650, 2017.
J. Achiam, D. Held, A. Tamar, P. Abbeel, "Constrained Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017. [arXiv]
Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, and S. Levine, "Combining model-based and model-free updates for trajectory-centric reinforcement learning," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine, "Reinforcement Learning with Deep Energy-Based Policies," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu, "FeUdal Networks for Hierarchical Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas, "Sample efficient actor-critic with experience replay," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample efficient policy gradient with an off-policy critic," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
Y. Wu, E. Mansimov, R. B. Grosse, S. Liao, and J. Ba, "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation," Advances in neural information processing systems (NIPS), Dec, 2017.
O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in neural information processing systems (NIPS), Dec, 2017.
Junhyuk Oh, Satinder Singh, Honglak Lee, "Value Prediction Network," Advances in neural information processing systems (NIPS), Dec, 2017.
Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei, "Deep Reinforcement Learning from Human Preferences," Advances in neural information processing systems (NIPS), Dec, 2017.
Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight experience replay," Advances in neural information processing systems (NIPS), Dec, 2017.
Justin Fu, John Co-Reyes, and Sergey Levine, "Ex2: Exploration with exemplar models for deep reinforcement learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine, "Path integral guided policy search," in Proc. of the International Conference on Robotics and Automation (ICRA), May, 2017.
O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Trust-PCL: An Off-Policy Trust Region Method for Continuous Control," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Sendonaris A, Dulac-Arnold G, Osband I, Agapiou J, Leibo JZ, "Deep Q-learning from Demonstrations," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2018.
Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint],
Scott Fujimoto, Herke van Hoof, and David Meger. "Addressing Function Approximation Error in Actor-Critic Methods.", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine, "Data-Efficient Hierarchical Reinforcement Learning", Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, and Chun-Yi Lee, "Diversity-Driven Exploration Strategy for Deep Reinforcement Learning", Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh, "A Lyapunov-based Approach to Safe Reinforcement Learning", Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, and Songhwai Oh, "Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning," arXiv preprint: 1902.00137, 2019.
Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov, "Exploration by Random Network Distillation", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine, "SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning." International Conference on Machine Learning. 2019.
David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, Gregory Wayne, "Experience replay for continual learning." Advances in Neural Information Processing Systems. 2019.
Fujimoto, Scott, David Meger, and Doina Precup. "Off-Policy Deep Reinforcement Learning without Exploration." International Conference on Machine Learning. 2019.
Du, Yilun, and Karthik Narasimhan. "Task-agnostic dynamics priors for deep reinforcement learning." arXiv preprint arXiv:1905.04819 (2019).
Matas, Jan, Stephen James, and Andrew J. Davison. "Sim-to-real reinforcement learning for deformable object manipulation." arXiv preprint arXiv:1806.07851 (2018).
Dosovitskiy, Alexey, and Vladlen Koltun. "Learning to act by predicting the future." arXiv preprint arXiv:1611.01779 (2016).
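A recurring theme in the entries above (Mnih et al. on DQN; van Hasselt et al. on double Q-learning) is how the bootstrapped target is formed. A toy numeric sketch contrasting the vanilla-DQN target with the double-DQN target, which decouples action selection from evaluation; all values here are made up for illustration:

```python
GAMMA = 0.99  # illustrative discount factor

def dqn_target(r, q_target_next):
    # vanilla DQN: the target network both selects and evaluates the action,
    # so an overestimated value gets propagated into the target
    return r + GAMMA * max(q_target_next)

def double_dqn_target(r, q_online_next, q_target_next):
    # double DQN: the online network selects, the target network evaluates
    a = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return r + GAMMA * q_target_next[a]

# toy case where the target network overestimates action 1
q_online = [1.0, 0.5]   # online net prefers action 0
q_target = [1.0, 2.0]   # target net has an inflated value for action 1
t_vanilla = dqn_target(0.0, q_target)                  # 0.99 * 2.0 = 1.98
t_double = double_dqn_target(0.0, q_online, q_target)  # 0.99 * 1.0 = 0.99
```

The decoupled estimator ignores the inflated entry because the online network never selects it, which is the overestimation-reduction argument of the double Q-learning papers.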
Model-based (Deep) Reinforcement Learning
V Bapst, A Sanchez-Gonzalez, C Doersch, KL Stachenfeld, P Kohli., PW Battaglia, and JB Hamrick. Structured agents for physical construction. ICML 2019.
KR Allen, KA Smith, and JB Tenenbaum. The tools challenge: rapid trial-and-error learning in physical problem solving. CogSci 2019.
K Asadi, D Misra, S Kim, and ML Littman. Combating the compounding-error problem with a multi-step model. arXiv 2019.
L Kaiser, M Babaeizadeh, P Milos, B Osinski, RH Campbell, K Czechowski, D Erhan, C Finn, P Kozakowski, S Levine, R Sepassi, G Tucker, and H Michalewski. Model-based reinforcement learning for Atari. arXiv 2019.
D Hafner, T Lillicrap, I Fischer, R Villegas, D Ha, H Lee, and J Davidson. Learning latent dynamics for planning from pixels. ICML 2019.
Y Luo, H Xu, Y Li, Y Tian, T Darrell, and T Ma. Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees. ICLR 2019.
A Nagabandi, K Konolige, S Levine, and V Kumar. Deep dynamics models for learning dexterous manipulation. arXiv 2019.
J Schrittwieser, I Antonoglou, T Hubert, K Simonyan, L Sifre, S Schmitt, A Guez, E Lockhart, D Hassabis, T Graepel, T Lillicrap, and D Silver. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv 2019.
H van Hasselt, M Hessel, and J Aslanides. When to use parametric models in reinforcement learning? NeurIPS 2019.
R Veerapaneni, JD Co-Reyes, M Chang, M Janner, C Finn, J Wu, JB Tenenbaum, and S Levine. Entity abstraction in visual model-based reinforcement learning. CoRL 2019.
T Wang, X Bao, I Clavera, J Hoang, Y Wen, E Langlois, S Zhang, G Zhang, P Abbeel, and J Ba. Benchmarking model-based reinforcement learning. arXiv 2019.
B Amos, IDJ Rodriguez, J Sacks, B Boots, JZ Kolter. Differentiable MPC for end-to-end planning and control. NeurIPS 2018.
K Chua, R Calandra, R McAllister, and S Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. NeurIPS 2018.
I Clavera, J Rothfuss, J Schulman, Y Fujita, T Asfour, and P Abbeel. Model-based reinforcement learning via meta-policy optimization. CoRL 2018.
F Ebert, C Finn, S Dasari, A Xie, A Lee, and S Levine. Visual foresight: model-based deep reinforcement learning for vision-based robotic control. arXiv 2018.
V Feinberg, A Wan, I Stoica, MI Jordan, JE Gonzalez, and S Levine. Model-based value estimation for efficient model-free reinforcement learning. ICML 2018.
A Nagabandi, G Kahn, RS Fearing, and S Levine. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. ICRA 2018.
D Ha and J Schmidhuber. World models. NeurIPS 2018.
T Anthony, Z Tian, and D Barber. Thinking fast and slow with deep learning and tree search. NIPS 2017.
C Finn and S Levine. Deep visual foresight for planning robot motion. ICRA 2017.
S Gu, T Lillicrap, I Sutskever, and S Levine. Continuous deep Q-learning with model-based acceleration. ICML 2016.
E Talvitie. Self-correcting models for model-based reinforcement learning. AAAI 2016.
M Watter, JT Springenberg, J Boedecker, M Riedmiller. Embed to control: a locally linear latent dynamics model for control from raw images. NIPS 2015.
G Williams, A Aldrich, and E Theodorou. Model predictive path integral control using covariance variable importance sampling. arXiv 2015.
M Deisenroth and CE Rasmussen. PILCO: A model-based and data-efficient approach to policy search. ICML 2011.
R Parr, L Li, G Taylor, C Painter-Wakefield, ML Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. ICML 2008.
W Li and E Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO 2004.
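Several of the papers above (e.g. Nagabandi et al., Chua et al., Williams et al.) plan by sampling action sequences through a (learned) dynamics model and executing the first action of the cheapest imagined rollout. A minimal random-shooting MPC sketch; the dynamics, cost, and constants are toy stand-ins for a learned model, not any specific method from the list:

```python
import random

def dynamics(s, a):          # toy stand-in for a learned model: s' = s + a
    return s + a

def cost(s):                 # drive the state toward zero
    return s * s

def mpc_action(s0, horizon=5, n_samples=200, seed=0):
    # random shooting: sample action sequences, roll each through the model,
    # return the FIRST action of the cheapest imagined trajectory
    rng = random.Random(seed)
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_samples):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = s0, 0.0
        for a in seq:
            s = dynamics(s, a)
            total += cost(s)
        if total < best_cost:
            best_cost, best_first = total, seq[0]
    return best_first

# from s = 3.0 the planner should pick a clearly negative first action
```

At each control step the whole procedure is re-run from the new state (receding horizon), which is what makes this MPC rather than open-loop planning; MPPI and CEM refine the same skeleton with smarter sampling distributions.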
Distributional Reinforcement Learning
Marc G. Bellemare, Will Dabney, and Rémi Munos, "A Distributional Perspective on Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos, "Distributional Reinforcement Learning with Quantile Regression", in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2018.
Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap, "Distributed Distributional Deterministic Policy Gradients," in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
Will Dabney*, Georg Ostrovski*, David Silver, and Rémi Munos, "Implicit Quantile Networks for Distributional Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
Mavrin, Borislav, Hengshuai Yao, Linglong Kong, Kaiwen Wu, and Yaoliang Yu. "Distributional reinforcement learning for efficient exploration." in Proc. of the International Conference on Machine Learning (ICML), 2019.
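The quantile-based entries above (Dabney et al.) replace a single expected return with a set of quantile estimates trained by an asymmetric Huber loss. A sketch of that loss for one scalar target sample, assuming uniformly spaced quantile levels tau_i = (i + 0.5) / n as in QR-DQN:

```python
def quantile_huber_loss(quantiles, target, kappa=1.0):
    # each quantile estimate theta_i at level tau_i is pulled toward the
    # target by a Huber penalty weighted asymmetrically by tau_i, so low
    # quantiles settle below the target distribution and high ones above it
    n = len(quantiles)
    loss = 0.0
    for i, theta in enumerate(quantiles):
        tau = (i + 0.5) / n
        u = target - theta
        huber = 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)
        loss += abs(tau - (1.0 if u < 0 else 0.0)) * huber
    return loss / n

# all quantiles exactly on target -> zero loss
perfect = quantile_huber_loss([2.0, 2.0], 2.0)
```

Note the asymmetry: for two quantiles around a target of 2.0, the correctly ordered estimates [1.0, 3.0] incur a lower loss than the swapped pair [3.0, 1.0], which is what drives the estimates toward the return distribution's quantiles rather than its mean.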
Offline Reinforcement Learning
Yifan Wu, George Tucker, Ofir Nachum, "Behavior Regularized Offline Reinforcement Learning." arXiv preprint arXiv:1911.11361, 2019.
Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans, "AlgaeDICE: Policy Gradient from Arbitrary Experience." arXiv preprint arXiv:1912.02074, 2019.
Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama, "Imitation Learning from Imperfect Demonstration.", ICML, 2019.
Siegel, Noah Y., Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, and Martin Riedmiller, "Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning.", ICLR 2020.
Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans, "GenDICE: Generalized Offline Estimation of Stationary Values.", ICLR 2020.
Shangtong Zhang, Bo Liu, Shimon Whiteson, "GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.", ICML 2020.
Rishabh Agarwal, Dale Schuurmans and Mohammad Norouzi, "An Optimistic Perspective on Offline Reinforcement Learning.", ICML 2020.
Byung-Jun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim, "Batch Reinforcement Learning with Hyperparameter Gradients.", ICML 2020.
Kei Ota, Tomoaki Oiki, Devesh K. Jha, Toshisada Mariyama, Daniel Nikovski, "Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?", ICML 2020.
Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli, "Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning.", ICML 2020.
Brahma S. Pavse, Ishan Durugkar, Josiah P. Hanna, Peter Stone, "Reducing Sampling Error in Batch Temporal Difference Learning.", ICML 2020.
Yao Liu, Adith Swaminathan, Alekh Agarwal and Emma Brunskill, "Provably Good Batch Reinforcement Learning Without Great Exploration.", NeurIPS 2020.
Aaron Sonabend-W, Junwei Lu, Leo A. Celi, Tianxi Cai, Peter Szolovits, "Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation.", NeurIPS 2020.
Li, Jiachen, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Keith Ross, Henrik Iskov Christensen, and Hao Su, "Multi-Task Batch Reinforcement Learning with Metric Learning.", NeurIPS 2020.
Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross, "BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning.", NeurIPS 2020.
Daniel Jarrett, Ioana Bica and Mihaela van der Schaar, "Strictly Batch Imitation Learning by Energy-based Distribution Matching.", NeurIPS 2020.
Ashvin Nair, Murtaza Dalal, Abhishek Gupta, Sergey Levine, "Accelerating Online Reinforcement Learning with Offline Datasets.", arXiv preprint arXiv:2006.09359, 2020.
Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine, "Conservative Q-Learning for Offline Reinforcement Learning.", arXiv preprint arXiv:2006.04779, 2020.
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Shane Gu, "Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization.", arXiv preprint arXiv:2006.03647, 2020.
Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh, "Uncertainty Weighted Offline Reinforcement Learning.", arXiv preprint arXiv:2105.08140, 2021.
Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas, "Critic Regularized Regression.", NeurIPS 2020.
Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern, "DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs.", ICLR 2021.
Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum, "OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning.", ICLR 2021.
Nuria Armengol Urpi, Sebastian Curi, Andreas Krause, "Risk-Averse Offline Reinforcement Learning.", ICLR 2021.
Ruosong Wang, Dean P. Foster, Sham M. Kakade, "What are the Statistical Limits of Offline RL with Linear Function Approximation?", ICLR 2021.
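As one concrete flavor of the conservatism running through this section, the CQL paper (Kumar et al., above) adds a regularizer that pushes down a soft maximum of Q over all actions while pushing up the Q-value of the action actually in the dataset. A toy sketch of that penalty for a single state with discrete actions; the values and names are illustrative, not the paper's implementation:

```python
import math

def cql_penalty(q_values, data_action, alpha=1.0):
    # CQL-flavored regularizer: log-sum-exp soft-maximizes Q over ALL
    # actions (pushed down by gradient descent on this term), while the
    # Q-value of the dataset action is subtracted (so it is pushed up)
    logsumexp = math.log(sum(math.exp(q) for q in q_values))
    return alpha * (logsumexp - q_values[data_action])

# an inflated Q-value on an out-of-distribution action raises the penalty
p_inflated = cql_penalty([1.0, 5.0], data_action=0)
p_flat = cql_penalty([1.0, 1.0], data_action=0)
```

Since log-sum-exp upper-bounds the maximum, the penalty grows whenever any unseen action's Q-value climbs above the dataset action's, which is exactly the extrapolation error that pure off-policy bootstrapping suffers from on offline data.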
Meta Reinforcement Learning
Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel, "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning", arXiv, 2016.
Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick, "Learning to Reinforcement Learn", arXiv, 2016.
Chelsea Finn, Pieter Abbeel, and Sergey Levine, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel, "A Simple Neural Attentive Meta-Learner", in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine, "Meta-Reinforcement Learning of Structured Exploration Strategies", Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
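MAML (Finn et al., above) optimizes an initialization so that one inner gradient step adapts well to each task, with the meta-gradient differentiating through that inner step. A minimal worked example on a family of 1-D quadratic losses (theta - b)^2; the two task parameters and both step sizes are illustrative choices, not values from the paper:

```python
# MAML on 1-D quadratic task losses L_b(theta) = (theta - b)^2:
# learn an initialization theta that adapts to any task b in ONE inner step
INNER_LR, OUTER_LR = 0.4, 1.0
tasks = [-1.0, 3.0]                   # two illustrative task parameters

def adapt(theta, b):
    grad = 2 * (theta - b)            # d/dtheta of (theta - b)^2
    return theta - INNER_LR * grad    # one inner gradient step

theta = 0.0
for _ in range(100):                  # outer (meta) loop
    meta_grad = 0.0
    for b in tasks:
        phi = adapt(theta, b)
        # gradient of the post-adaptation loss (phi - b)^2 w.r.t. theta,
        # differentiated THROUGH the inner step: dphi/dtheta = 1 - 2 * INNER_LR
        meta_grad += 2 * (phi - b) * (1 - 2 * INNER_LR)
    theta -= OUTER_LR * meta_grad
# theta converges to 1.0, the initialization whose one-step adaptation
# is best on average across both tasks
```

Because the tasks are quadratics, the meta-objective here works out to 0.04[(theta + 1)^2 + (theta - 3)^2], so the midpoint theta = 1 is optimal; in MAML proper the inner step is a policy-gradient update and the derivative through it is taken by automatic differentiation.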
Hierarchical Reinforcement Learning
Causal Reinforcement Learning
(Classic) Reinforcement Learning
David Blackwell. "Discounted dynamic programming," The Annals of Mathematical Statistics, 1965.
Emanuel Todorov. "Linearly-solvable Markov decision problems," Advances in neural information processing systems (NIPS), Dec, 2007.
Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
Christopher JCH Watkins, and Peter Dayan, "Q-learning," Machine learning, 1992.
R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
Sham M. Kakade, "A natural policy gradient," Advances in Neural Information Processing Systems (NIPS), Dec. 2002.
Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002.
Rasmussen, Carl Edward, and Malte Kuss, "Gaussian Processes in Reinforcement Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2003.
Jens Kober, and Jan R. Peters, "Policy search for motor primitives in robotics," Advances in neural information processing systems (NIPS), Dec, 2008.
Hado van Hasselt, "Double Q-learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2010.
Peters, Jan, Katharina Mülling, and Yasemin Altun. "Relative Entropy Policy Search," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2010.
N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun, 2012.
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
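As a baseline for everything above, tabular Q-learning (Watkins & Dayan) needs only a value table and the off-policy TD update. A self-contained sketch on a toy 5-state chain; the environment, the uniform-random behavior policy, and all constants are illustrative:

```python
import random

# tabular Q-learning on a toy 5-state chain: action 1 moves right,
# action 0 moves left (reflecting at state 0), reward 1 only on
# arriving at the terminal right end
N, ALPHA, GAMMA = 5, 0.5, 0.9
Q = [[0.0, 0.0] for _ in range(N)]

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

rng = random.Random(0)
for _ in range(500):                  # episodes
    s = 0
    for _ in range(100):
        a = rng.randrange(2)          # uniform behavior: Q-learning is off-policy
        s2, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break

# the learned greedy policy should point right in every non-terminal state
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
```

The update bootstraps off max over next-state values regardless of which action the behavior policy took, which is why a purely random behavior policy still recovers the optimal greedy policy here.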
Constrained Markov Decision Processes
Safe Exploration
Zimmer, Christoph, Mona Meister, and Duy Nguyen-Tuong. "Safe active learning for time-series modeling with Gaussian processes." Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018.
Koller, Torsten, Felix Berkenkamp, Matteo Turchetta, and Andreas Krause. "Learning-based model predictive control for safe exploration." In 2018 IEEE conference on decision and control (CDC), pp. 6059-6066. IEEE, 2018.
Dalal, Gal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, and Yuval Tassa. "Safe exploration in continuous action spaces." arXiv preprint arXiv:1801.08757 (2018).
Berkenkamp, Felix, Angela P. Schoellig, and Andreas Krause. "Safe controller optimization for quadrotors with Gaussian processes." 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016.
Sui, Yanan, Alkis Gotovos, Joel Burdick, and Andreas Krause. "Safe exploration for optimization with Gaussian processes." In International Conference on Machine Learning, pp. 997-1005. PMLR, 2015.
Berkenkamp, Felix, et al. "Safe model-based reinforcement learning with stability guarantees." Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017.