Options (single-level subpolicies; see the sketch after these references):
Sutton, R.S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence 112:181-211.
Bacon, Harb, Precup (2016). The Option-Critic Architecture.
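Sutton, Precup & Singh (1999) formalize an option as a triple of an initiation set, an intra-option policy, and a termination condition, executed call-and-return inside a semi-MDP. The sketch below is a minimal illustration of that triple; the gym-style env.step interface and the Option/run_option names are assumptions for illustration, not from the paper.

    import random
    from dataclasses import dataclass
    from typing import Callable, Set

    @dataclass
    class Option:
        initiation_set: Set[int]             # I: states where the option may be invoked
        policy: Callable[[int], int]         # pi: intra-option policy, state -> primitive action
        termination: Callable[[int], float]  # beta: probability of terminating in a given state

    def run_option(env, state, option, gamma=0.99):
        """Execute one option call-and-return; returns (next_state, discounted_return, duration)."""
        ret, discount, steps = 0.0, 1.0, 0
        while True:
            action = option.policy(state)
            state, reward, done, _ = env.step(action)   # assumed gym-style API
            ret += discount * reward
            discount *= gamma
            steps += 1
            if done or random.random() < option.termination(state):
                return state, ret, steps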
HAMs (hierarchies of abstract machines: any level of nesting of structured policy priors; options can be seen as a degenerate one-level case):
Ron Parr and Stuart Russell (1998). Reinforcement Learning with Hierarchies of Machines. In Advances in Neural Information Processing Systems 10, MIT Press.
David Andre and Stuart Russell (2002). State Abstraction for Programmable Reinforcement Learning Agents. In Proc. AAAI-02, Edmonton, Alberta: AAAI Press.
Bhaskara Marthi, Stuart Russell, David Latham, and Carlos Guestrin (2005). Concurrent Hierarchical Reinforcement Learning. In Proc. IJCAI-05, Edinburgh, Scotland.
Tom Dietterich (2000). Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. JAIR, 13.
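The MAXQ reference above decomposes the value of a parent task into the value of the invoked child plus a completion term: Q(i, s, a) = V(a, s) + C(i, s, a). Below is a small recursive sketch of that decomposition; the dictionary-based tables and the task-graph encoding are illustrative assumptions, not Dietterich's implementation.

    def maxq_V(task, state, children, V_primitive, C):
        """V(task, state): value of completing `task` from `state` under the MAXQ decomposition."""
        if task not in children:                      # primitive action: stored one-step expected reward
            return V_primitive[(task, state)]
        return max(maxq_Q(task, state, child, children, V_primitive, C)
                   for child in children[task])

    def maxq_Q(parent, state, child, children, V_primitive, C):
        """Q(parent, state, child) = V(child, state) + C(parent, state, child)."""
        return (maxq_V(child, state, children, V_primitive, C)
                + C[(parent, state, child)])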
Reward-/subgoal-based sub-tasking (see the HIRO-style sketch after these references):
Dayan and Hinton (1993). Feudal Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu (2017). FeUdal Networks for Hierarchical Reinforcement Learning
Andrew Levy, George Konidaris, Robert Platt, Kate Saenko (2017). Learning Multi-Level Hierarchies with Hindsight
Nachum, Gu, Lee, Levine (2018). HIRO: Data-Efficient Hierarchical Reinforcement Learning
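HIRO (the last reference above) trains a two-level hierarchy in which the high-level policy emits a goal every c steps and the low-level policy is rewarded for moving the state toward that goal. The intrinsic reward below, -||s_t + g_t - s_{t+1}||, is the one used in the paper; the policy callables, rollout structure, and hyperparameters are illustrative placeholders.

    import numpy as np

    def intrinsic_reward(state, goal, next_state):
        # HIRO's low-level reward: how close the achieved state change is to the (relative) goal.
        return -np.linalg.norm(state + goal - next_state)

    def hierarchical_rollout(env, high_policy, low_policy, c=10, horizon=1000):
        """Collect one episode with a goal-conditioned worker; returns the transition buffer."""
        state, buffer = env.reset(), []
        goal = None
        for t in range(horizon):
            if t % c == 0:
                goal = high_policy(state)                       # manager emits a new subgoal every c steps
            action = low_policy(state, goal)
            next_state, ext_reward, done, _ = env.step(action)  # assumed gym-style API, numpy states
            buffer.append((state, goal, action,
                           intrinsic_reward(state, goal, next_state), ext_reward, next_state))
            goal = state + goal - next_state                    # HIRO's goal transition between manager steps
            state = next_state
            if done:
                break
        return buffer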
Shared-skill inductive bias (similar to options, but the inductive bias is in the sharing/re-use of the subpolicies/options across tasks; see the sketch after these references):
Jacob Andreas, Dan Klein, and Sergey Levine (2017). Modular Multitask Reinforcement Learning with Policy Sketches. In Proc. ICML-17.
Frans, Ho, Chen, Abbeel, Schulman (2017). Meta-Learning Shared Hierarchies
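Both references above share a pool of subpolicies across tasks while a per-task controller chooses among them (as in Meta-Learning Shared Hierarchies). The sketch below only illustrates that sharing structure; the class and parameter names are assumptions for illustration.

    class SharedSkillAgent:
        """Per-task master policies select among a pool of subpolicies shared across tasks."""
        def __init__(self, shared_subpolicies, make_master, num_tasks):
            self.subpolicies = shared_subpolicies                     # shared/re-used across all tasks
            self.masters = [make_master() for _ in range(num_tasks)]  # one task-specific master each

        def act(self, task_id, state):
            skill = self.masters[task_id](state)      # master picks which shared subpolicy to run
            return self.subpolicies[skill](state)     # the chosen subpolicy outputs the primitive action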
Data-driven skill priors (see the objective sketch after these references):
Singh*, Liu*, Zhou, Yu, Rhinehart, Levine (2020). Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
Pertsch, Lee, Lim (2020). Accelerating Reinforcement Learning with Learned Skill Priors
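The skill-prior line of work above (e.g., Pertsch, Lee & Lim) replaces the usual entropy bonus with a KL penalty that keeps the policy close to a prior over skills learned from offline data. A minimal sketch of that regularized objective follows; the Gaussian distributions, alpha, and the function name are illustrative assumptions, not the papers' implementations.

    import torch
    import torch.distributions as D

    def skill_prior_loss(returns, policy_dist, prior_dist, alpha=0.1):
        """Negated objective for minimization: task return minus alpha * KL(pi(z|s) || p_prior(z|s))."""
        kl = D.kl_divergence(policy_dist, prior_dist).sum(-1)   # per-sample KL over skill dimensions
        return -(returns - alpha * kl).mean()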
Composite, concurrent skills:
Classic:
Chris Watkins (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge (Ch. 9 onwards)
A critical investigation of the current state of affairs in HRL:
Nachum, Tang, Lu, Gu, Lee, Levine (2019). Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?