Temporal abstraction is key to learning complex tasks, as it allows an agent to represent knowledge and develop strategies over a wide range of temporal scales. A major challenge in developing algorithms with this property is autonomously discovering the scope of their behavioral abstractions, or options. In practice, learning options end-to-end brings major challenges, such as multiple options adopting similar behavior and a shrinking subset of relevant options. In this paper, we tackle these challenges by drawing attention to diversity in options. We first encourage options to act diversely by introducing an intrinsically motivated reward that complements the task reward. We then propose a novel termination objective that decouples option termination from the maximum-expected-returns objective and instead targets critical states in the agent's trajectory, with the intention of generating diverse trajectories. We show that our approach not only improves the performance and robustness of the algorithm but also produces useful, stable, and interpretable options in several discrete and continuous control tasks.
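As a rough sketch of the reward shaping described above (the notation here, including the mixing coefficient $\beta$ and the form of the diversity term, is our illustration rather than the paper's exact formulation), the per-step signal optimized by each option can be written as the task reward augmented by an intrinsic diversity bonus:
\[
r_t^{\text{total}} \;=\; r_t^{\text{task}} \;+\; \beta \, r_t^{\text{div}}, \qquad \beta > 0,
\]
where $r_t^{\text{div}}$ is larger when the active option's behavior is distinguishable from that of the other options (for instance, a divergence between option policies), so that maximizing $r_t^{\text{total}}$ trades off task progress against behavioral diversity.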
[Figure captions: "Options terminate in states where balance is crucial." "Options terminating in the vertical hallway; options are represented by different colors (dark blue and light blue) on the trajectory." "Always manages to run upright without flipping over." "First flips over and slides on the back."]