Week 5: Human-in-the-Loop RL

Preference-based learning

Akrour, R., Schoenauer, M., and Sebag, M. Preference-based policy learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2011.
Akrour, R., Schoenauer, M., and Sebag, M. APRIL: Active preference learning-based reinforcement learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2012.
Wilson, A., Fern, A., and Tadepalli, P. A Bayesian approach for policy learning from trajectory preference queries. In Advances in Neural Information Processing Systems, 2012.
Wirth, C. and Fürnkranz, J. Preference-based reinforcement learning: A preliminary survey. In ECML/PKDD Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards, 2013.
Wirth, C., Fürnkranz, J., and Neumann, G. Model-free preference-based reinforcement learning. In AAAI Conference on Artificial Intelligence, 2016.
Sadigh, D., Dragan, A. D., Sastry, S., and Seshia, S. A. Active preference-based learning of reward functions. In Robotics: Science and Systems, 2017.
Biyik, E. and Sadigh, D. Batch active preference-based learning of reward functions. In Conference on Robot Learning, 2018.
Biyik, E., Huynh, N., Kochenderfer, M. J., and Sadigh, D. Active preference-based Gaussian process regression for reward learning. In Robotics: Science and Systems, 2020.
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., and Amodei, D. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems, 2017.
Lee, K., Smith, L., and Abbeel, P. PEBBLE: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training. In International Conference on Machine Learning, 2021.

Preference-based RL for dialog and summarization

Sugiyama, H., Meguro, T., and Minami, Y. Preference learning based inverse reinforcement learning for dialog control. In Conference of the International Speech Communication Association, 2012.
Stiennon, N., Ouyang, L., Wu, J., Ziegler, D. M., Lowe, R., Voss, C., Radford, A., Amodei, D., and Christiano, P. F. Learning to summarize from human feedback. In Advances in Neural Information Processing Systems, 2020.

COACH

MacGlashan, J., Ho, M. K., Loftin, R. T., Peng, B., Roberts, D. L., Taylor, M. E., and Littman, M. L. COACH: Interactive learning from policy-dependent human feedback. In International Conference on Machine Learning, 2017.
Arumugam, D., Lee, J. K., Saskin, S., and Littman, M. L. DeepCOACH: Deep reinforcement learning from policy-dependent human feedback, 2019.

TAMER

Knox, W. B. and Stone, P. Interactively shaping agents via human reinforcement: The TAMER framework. In International Conference on Knowledge Capture (K-CAP), 2009.
Warnell, G., Waytowich, N., Lawhern, V., and Stone, P. Deep TAMER: Interactive agent shaping in high-dimensional state spaces. In AAAI Conference on Artificial Intelligence, 2018.

Learning a Goal Classifier

Singh, A., Yang, L., Hartikainen, K., Finn, C., and Levine, S. End-to-end robotic reinforcement learning without reward engineering. In Robotics: Science and Systems, 2019.
Smith, L., Dhawan, N., Zhang, M., Abbeel, P., and Levine, S. AVID: Learning multi-stage tasks via pixel-level translation of human videos. In Robotics: Science and Systems, 2020.
Xie, A., Singh, A., Levine, S., and Finn, C. Few-shot goal inference for visuomotor learning and planning. In Conference on Robot Learning, 2018.

Other

Pilarski, P. M., Dawson, M. R., Degris, T., Fahimi, F., Carey, J. P., and Sutton, R. S. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning. In International Conference on Rehabilitation Robotics, 2011.
Leike, J., Krueger, D., Everitt, T., Martic, M., Maini, V., and Legg, S. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871, 2018.