Options (single-level subpolicies; see the sketch after these references):
Sutton, R.S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence 112:181-211.
Bacon, Harb, Precup (2016). The Option-Critic Architecture.
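Sutton, Precup & Singh (1999) formalize an option as a triple of an initiation set, an intra-option policy, and a termination condition, executed call-and-return inside a semi-MDP. The sketch below is a minimal illustration of that triple; the gym-style env.step interface and the Option/run_option names are assumptions for illustration, not from the paper.

    import random
    from dataclasses import dataclass
    from typing import Callable, Set

    @dataclass
    class Option:
        initiation_set: Set[int]             # I: states where the option may be invoked
        policy: Callable[[int], int]         # pi: intra-option policy, state -> primitive action
        termination: Callable[[int], float]  # beta: probability of terminating in a given state

    def run_option(env, state, option, gamma=0.99):
        """Execute one option call-and-return; returns (next_state, discounted_return, duration)."""
        ret, discount, steps = 0.0, 1.0, 0
        while True:
            action = option.policy(state)
            state, reward, done, _ = env.step(action)   # assumed gym-style API
            ret += discount * reward
            discount *= gamma
            steps += 1
            if done or random.random() < option.termination(state):
                return state, ret, steps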
HAMs (hierarchies of abstract machines: any level of nesting of structured policy priors; options can be seen as a degenerate one-level case):
Ron Parr and Stuart Russell (1998). Reinforcement Learning with Hierarchies of Machines. In Advances in Neural Information Processing Systems 10, MIT Press.
David Andre and Stuart Russell (2002). State Abstraction for Programmable Reinforcement Learning Agents. In Proc. AAAI-02, Edmonton, Alberta: AAAI Press.
Bhaskara Marthi, Stuart Russell, David Latham, and Carlos Guestrin (2005). Concurrent Hierarchical Reinforcement Learning. In Proc. IJCAI-05, Edinburgh, Scotland.
Tom Dietterich (2000). Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. JAIR, 13.
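The MAXQ reference above decomposes the value of a parent task into the value of the invoked child plus a completion term: Q(i, s, a) = V(a, s) + C(i, s, a). Below is a small recursive sketch of that decomposition; the dictionary-based tables and the task-graph encoding are illustrative assumptions, not Dietterich's implementation.

    def maxq_V(task, state, children, V_primitive, C):
        """V(task, state): value of completing `task` from `state` under the MAXQ decomposition."""
        if task not in children:                      # primitive action: stored one-step expected reward
            return V_primitive[(task, state)]
        return max(maxq_Q(task, state, child, children, V_primitive, C)
                   for child in children[task])

    def maxq_Q(parent, state, child, children, V_primitive, C):
        """Q(parent, state, child) = V(child, state) + C(parent, state, child)."""
        return (maxq_V(child, state, children, V_primitive, C)
                + C[(parent, state, child)])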
Reward-/subgoal-based sub-tasking (see the HIRO-style sketch after these references):
Dayan and Hinton (1993). Feudal Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu (2017). FeUdal Networks for Hierarchical Reinforcement Learning
Andrew Levy, George Konidaris, Robert Platt, Kate Saenko (2017). Learning Multi-Level Hierarchies with Hindsight
Nachum, Gu, Lee, Levine (2018). HIRO: Data-Efficient Hierarchical Reinforcement Learning
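HIRO (the last reference above) trains a two-level hierarchy in which the high-level policy emits a goal every c steps and the low-level policy is rewarded for moving the state toward that goal. The intrinsic reward below, -||s_t + g_t - s_{t+1}||, is the one used in the paper; the policy callables, rollout structure, and hyperparameters are illustrative placeholders.

    import numpy as np

    def intrinsic_reward(state, goal, next_state):
        # HIRO's low-level reward: how close the achieved state change is to the (relative) goal.
        return -np.linalg.norm(state + goal - next_state)

    def hierarchical_rollout(env, high_policy, low_policy, c=10, horizon=1000):
        """Collect one episode with a goal-conditioned worker; returns the transition buffer."""
        state, buffer = env.reset(), []
        goal = None
        for t in range(horizon):
            if t % c == 0:
                goal = high_policy(state)                       # manager emits a new subgoal every c steps
            action = low_policy(state, goal)
            next_state, ext_reward, done, _ = env.step(action)  # assumed gym-style API, numpy states
            buffer.append((state, goal, action,
                           intrinsic_reward(state, goal, next_state), ext_reward, next_state))
            goal = state + goal - next_state                    # HIRO's goal transition between manager steps
            state = next_state
            if done:
                break
        return buffer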
Shared-skill inductive bias (similar to options, but the inductive bias is in the sharing/re-use of the subpolicies/options across tasks; see the sketch after these references):
Jacob Andreas, Dan Klein, and Sergey Levine (2017). Modular Multitask Reinforcement Learning with Policy Sketches. In Proc. ICML-17.
Frans, Ho, Chen, Abbeel, Schulman (2017). Meta-Learning Shared Hierarchies
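Both references above share a pool of subpolicies across tasks while a per-task controller chooses among them (as in Meta-Learning Shared Hierarchies). The sketch below only illustrates that sharing structure; the class and parameter names are assumptions for illustration.

    class SharedSkillAgent:
        """Per-task master policies select among a pool of subpolicies shared across tasks."""
        def __init__(self, shared_subpolicies, make_master, num_tasks):
            self.subpolicies = shared_subpolicies                     # shared/re-used across all tasks
            self.masters = [make_master() for _ in range(num_tasks)]  # one task-specific master each

        def act(self, task_id, state):
            skill = self.masters[task_id](state)      # master picks which shared subpolicy to run
            return self.subpolicies[skill](state)     # the chosen subpolicy outputs the primitive action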
Data-driven skill priors (see the objective sketch after these references):
Singh*, Liu*, Zhou, Yu, Rhinehart, Levine (2020). Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
Pertsch, Lee, Lim (2020). Accelerating Reinforcement Learning with Learned Skill Priors
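The skill-prior line of work above (e.g., Pertsch, Lee & Lim) replaces the usual entropy bonus with a KL penalty that keeps the policy close to a prior over skills learned from offline data. A minimal sketch of that regularized objective follows; the Gaussian distributions, alpha, and the function name are illustrative assumptions, not the papers' implementations.

    import torch
    import torch.distributions as D

    def skill_prior_loss(returns, policy_dist, prior_dist, alpha=0.1):
        """Negated objective for minimization: task return minus alpha * KL(pi(z|s) || p_prior(z|s))."""
        kl = D.kl_divergence(policy_dist, prior_dist).sum(-1)   # per-sample KL over skill dimensions
        return -(returns - alpha * kl).mean()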
Composite, concurrent skills:
Classic:
Chris Watkins (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge (Ch. 9 onwards)
A critical investigation of the current state of affairs in HRL:
Nachum, Tang, Lu, Gu, Lee, Levine (2019). Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?