Advanced Topics in Learning and Decision Making
Instructors:
Pieter Abbeel & Stuart Russell
Lectures:
Tue/Thu 2-3:30pm, via Zoom
In-person or Virtual: This class will be offered virtually through Zoom. Note that live attendance is expected, as many of the lectures will be discussion-oriented.
About:
This is a research-oriented course covering advanced topics in learning and decision making. Unlike many other courses, it is not about teaching fully-proven-out methods. Instead, we will focus on areas where current methods still fall short and further research is needed. This is also reflected in the course logistics, which place great emphasis on live discussion, student paper presentations, and homework that has no clear-cut unique solution but instead asks you to explore new ideas.
Office hours:
Pieter: Fridays 3-4pm
Stuart: Tuesdays 3:30-5pm
(see pinned Piazza post for Zoom links)
Communication:
Primarily via our class Piazza.
Syllabus and Class Schedule
(subject to revision; weekly readings to be added):
8/31: Overview of Reward-Free Pre-Training and Exploration [slides][scribe]
9/2: Student Presentations [scribe]
Presenter 1: [slides]
Pathak, Agrawal, Efros, Darrell (2017). Curiosity-driven Exploration by Self-supervised Prediction
Sekar*, Rybkin*, Daniilidis, Abbeel, Hafner, Pathak (2020). Planning to Explore via Self-Supervised World Models
Presenter 2: [slides]
Sharma, Gu, Levine, Kumar, Hausman (2019). DADS: Dynamics-Aware Unsupervised Discovery of Skills
Warde-Farley, Van de Wiele, Kulkarni, Ionescu, Hansen, Mnih (2018). DISCERN: Unsupervised Control through Non-Parametric Discriminative Rewards
Week 2
Self-Play and Curriculum
Quiz 2: due 9/6
Reading List
9/7: Self-play and the Alpha(Go)(Zero) series [slides][scribe]
AlphaGo: Silver, Huang, Maddison, Guez, ..., Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489.
AlphaGo Zero: Silver, Schrittwieser, Simonyan, ..., Hassabis (2017). Mastering the Game of Go without Human Knowledge. Nature, 550, 354–359.
AlphaZero: Silver, Hubert, Schrittwieser, ..., Hassabis (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144. See also preprint with methods and tables.
9/9: Student Presentations [scribe]
Presenter 1: [slides]
Bansal, Pachocki, Sidor, Sutskever, and Mordatch (2018). Emergent Complexity via Multi-Agent Competition. ICLR-18.
Baker, Kanitscheider, Markov, Wu, Powell, McGrew, and Mordatch (2020). Emergent tool use from multi-agent autocurricula. (See also the OpenAI Research Blog article, with lots of videos.)
Presenter 2: [slides]
Sukhbaatar, Lin, Kostrikov, Synnaeve, Szlam, Fergus (2018). Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. ICLR-18.
Dennis, Jaques, Vinitsky, Bayen, Russell, Critch, Levine (2020). Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. NeurIPS-20.
Week 3
Hierarchical RL
Quiz 3: due 9/13
Reading List
9/14: An Overview of Hierarchical RL: Options, Hierarchies of Abstract Machines, Feudal Learning [slides] [scribe]
Sutton, R.S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence 112:181-211.
Ron Parr and Stuart Russell, Reinforcement Learning with Hierarchies of Machines. In Advances in Neural Information Processing Systems 10, MIT Press, 1998.
Dayan and Hinton (1993). Feudal Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu (2017). FeUdal Networks for Hierarchical Reinforcement Learning
9/16: Student Presentations [scribe]
Presenter 1: [slides]
Jacob Andreas, Dan Klein, and Sergey Levine (2017). Modular Multitask Reinforcement Learning with Policy Sketches
Frans, Ho, Chen, Abbeel, Schulman (2017). Meta-Learning Shared Hierarchies
Andrew Levy, George Konidaris, Robert Platt, Kate Saenko (2017). Learning Multi-Level Hierarchies with Hindsight
Week 4
Meta-level Decision Making
Quiz 4: due 09/20
Reading List
9/21: Basic concepts of meta-level control and bounded optimality [slides][scribe]
Stuart Russell and Devika Subramanian (1995). Provably bounded-optimal agents. Journal of Artificial Intelligence Research, 2, 575-609.
Nicholas Hay, Stuart Russell, Solomon Eyal Shimony, and David Tolpin, Selecting Computations: Theory and Applications. In Proc. UAI-12.
9/23: Applications of meta-reasoning and bounded optimality [scribe]
Presenter 1: [slides]
Christopher Lin, Andrey Kolobov, Ece Kamar, Eric Horvitz (2015). Metareasoning for planning under uncertainty. In Proc. IJCAI-15.
Justin Svegliato, Prakhar Sharma, and Shlomo Zilberstein (2020). A Model-Free Approach to Meta-Level Control of Anytime Algorithms. In ICRA-20.
Presenter 2: [slides]
Richard Lewis, Andrew Howes, and Satinder Singh (2014). Computational rationality: Linking mechanism and behavior through bounded utility maximization. Topics in Cognitive Science, 6, 279–311.
Samuel J. Gershman, Eric J. Horvitz, and Joshua B. Tenenbaum (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349, 273-8.
9/28: Preference-based Learning [slides] [scribe]
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., and Amodei, D. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems, 2017.
Kimin Lee, Laura Smith, Pieter Abbeel. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training, ICML 2021
9/30: Student Presentations [scribe]
Presenter 1: [slides]
Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, Peter Stone. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces, AAAI 2018
Dilip Arumugam, Jun Ki Lee, Sophie Saskin, Michael L. Littman. DeepCOACH: Deep Reinforcement Learning from Policy-Dependent Human Feedback. 2019
Presenter 2: [slides]
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano. Learning to summarize from human feedback, 2020
Jeff Wu, Long Ouyang, Daniel Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano. Recursively Summarizing Books with Human Feedback, 2021
[HW1 due] Fri 10/1 5pm [topics from Weeks 1-3]
10/5: An Overview of Inverse RL [slides] [scribe]
Andrew Ng and Stuart Russell, Algorithms for inverse reinforcement learning. ICML, 2000.
Pieter Abbeel and Andrew Ng, Apprenticeship learning via inverse reinforcement learning. ICML, 2004.
Deepak Ramachandran and Eyal Amir: Bayesian Inverse Reinforcement Learning. IJCAI, 2007.
Brian Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind Dey: Maximum entropy inverse reinforcement learning. AAAI, 2008.
10/7: Deep Learning and Inverse RL [scribe]
Presenter 1: [slides]
Chelsea Finn, Paul Christiano, Pieter Abbeel, and Sergey Levine, A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. NeurIPS Workshop on Adversarial Training, 2016.
Presenter 2: [slides]
Jonathan Ho and Stefano Ermon, Generative adversarial imitation learning. NeurIPS, 2016.
(See also Yunzhu Li, Jiaming Song, and Stefano Ermon, InfoGAIL: Interpretable imitation learning from visual demonstrations. NeurIPS, 2017.)
10/12: Cooperative IRL [slides] [scribe]
(briefly) Simon Zhuang and Dylan Hadfield-Menell, Consequences of misaligned AI. NeurIPS 2020.
Dylan Hadfield-Menell, Stuart J. Russell, Pieter Abbeel, Anca Dragan, Cooperative inverse reinforcement learning. NeurIPS 2016.
See also Dhruv Malik, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, and Anca Dragan, An efficient, generalized Bellman update for cooperative inverse reinforcement learning. ICML 2018.
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell, The off-switch game. IJCAI 2017.
10/14: Elaborations and alternatives [scribe]
Presenter 1: [slides]
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, and Anca Dragan, Inverse reward design. NeurIPS 2017.
Presenter 2: [slides]
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg, Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871.
[Project Proposal Due] Fri 10/15 5pm
10/19: Meta Reinforcement Learning and Few-Shot Imitation [slides] [scribe]
Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel, RL2: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016
Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick, Learning to reinforcement learn, 2016
Chelsea Finn, Pieter Abbeel, Sergey Levine, Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
10/21: Multi-Task Reinforcement Learning [scribe]
Presenter 1: [slides]
Peter Dayan, Improving Generalisation for Temporal Difference Learning: The Successor Representation, 1993
André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver, Successor Features for Transfer in Reinforcement Learning, NeurIPS 2017
Presenter 2: [slides]
Dmitry Kalashnikov, Jacob Varley, Yevgen Chebotar, Benjamin Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, Karol Hausman, MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale, 2021
Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt, Multi-task Deep Reinforcement Learning with PopArt, AAAI 2019
10/26: Distributed multi-body and multi-agent RL [slides][scribe]
Stuart Russell and Andrew Zimdars, Q-Decomposition for Reinforcement Learning Agents. ICML 2003.
Bhaskara Marthi, Stuart Russell, David Latham, and Carlos Guestrin, Concurrent hierarchical reinforcement learning. IJCAI 2005.
Michael Littman, Markov games as a framework for multi-agent reinforcement learning. ICML 1994.
Michael Littman, Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 1, 2001.
Junling Hu and Michael Wellman, Nash Q-Learning for General-Sum Stochastic Games. JMLR 4, 1039-1069, 2003.
Yoav Shoham, Rob Powers, and Trond Grenager, If multi-agent learning is the answer, what is the question? AIJ 171(7), 365-377, 2007.
10/28: Applications of multi-agent RL [scribe]
Presenter 1: [slides]
Max Jaderberg et al., Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364 (6443), 859-865, 2019.
Presenter 2: [slides]
Jakob Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling, Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning. ICML 2019.
Adam Lerer, Hengyuan Hu, Jakob Foerster, and Noam Brown, Improving Policies via Search in Cooperative Partially Observable Games. AAAI 2020.
[HW2 due] Fri 10/29 at 5pm [topics from Weeks 4-7]
11/2: Guest Lecturer: Karthik Narasimhan (Princeton) [slides][scribe]
Shunyu Yao, Rohan Rao, Matthew Hausknecht, Karthik Narasimhan (2020). Keep CALM and Explore: Language Models for Action Generation in Text-based Games
S.R.K. Branavan, David Silver, Regina Barzilay (2014). Learning to Win by Reading Manuals in a Monte-Carlo Framework
11/4: Guest Lecturer: Jacob Andreas (MIT) [slides] [scribe]
Yoav Artzi, Luke Zettlemoyer (2013). Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions
Jacob Andreas, Dan Klein, Sergey Levine (2017). Learning with Latent Language.
11/9: Model-based vs Model-free RL [slides] [scribe]
Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel, Model-Ensemble Trust-Region Policy Optimization, ICLR 2018
Ignasi Clavera*, Jonas Rothfuss*, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel, Model-Based Reinforcement Learning via Meta-Policy Optimization, CoRL 2018
Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine, When to Trust Your Model: Model-Based Policy Optimization, NeurIPS 2019
11/11: no lecture (university holiday)
11/16: Guest Lecturer: Sham Kakade (Harvard) [slides][scribe]
Chapters 5 and 9 of Alekh Agarwal, Nan Jiang, Sham M. Kakade, and Wen Sun, Reinforcement Learning: Theory and Algorithms. Manuscript in preparation.
11/18: Model-based vs model-free RL (theory) [scribe]
Presenter 1: [slides]
Stephen Tu and Benjamin Recht, The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint. COLT 2019.
Presenter 2: []
Kefan Dong, Yuping Luo, and Tengyu Ma, On the expressivity of neural networks for deep reinforcement learning. ICML 2020.
11/23: Guest Lecturer: Oriol Vinyals (Google Deepmind) [scribe]
Vinyals et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature 2019.
Lili Chen*, Kevin Lu*, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, Mordatch, Decision Transformer: Reinforcement Learning via Sequence Modeling, NeurIPS 2021.
Sergey Levine, Aviral Kumar, George Tucker, Justin Fu, Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020.
11/25: no lecture (university holiday)
11/30: Guest Lecturer: Micah Carroll (Berkeley) and Dylan Hadfield-Menell (MIT) [slides][scribe]
Charles Evans and Atoosa Kasirzadeh, User Tampering in Reinforcement Learning Recommender Systems. 4th FAccTRec Workshop, RecSys 2021.
[Anonymous, under review], Estimating and penalizing induced preference shifts in recommender systems. 2021.
12/2: Prospects for RL and AI [slides][scribe]
David Silver, Satinder Singh, Doina Precup, Richard Sutton, Reward is Enough, AIJ 299, 2021.
François Chollet, On the measure of intelligence, arXiv:1911.01547, 2019.
Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, Song-Chun Zhu (2020). Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Human-like Common Sense