Week 7:
10/12: Cooperative IRL
(briefly) Simon Zhuang and Dylan Hadfield-Menell, Consequences of misaligned AI. NeurIPS 2020.
Dylan Hadfield-Menell, Stuart J. Russell, Pieter Abbeel, and Anca Dragan, Cooperative inverse reinforcement learning. NeurIPS 2016.
See also Dhruv Malik, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, and Anca Dragan, An efficient, generalized Bellman update for cooperative inverse reinforcement learning. ICML 2018.
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell, The off-switch game. IJCAI 2017.
10/14: Elaborations and alternatives
Presenter 1:
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, and Anca Dragan, Inverse reward design. NeurIPS 2017.
Presenter 2:
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg, Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871.