Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

Kenneth Marino, Abhinav Gupta, Rob Fergus, and Arthur Szlam

Abstract

In this work we introduce a simple, robust approach to hierarchically training an agent on sparse-reward tasks. The agent is split into a low-level and a high-level policy. The low-level policies access only the internal, proprioceptive dimensions of the state observation and are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Each low-level policy is also induced to be periodic through the use of a "phase function." The high-level policy is trained with a sparse, task-dependent reward and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the Mujoco Ant and Humanoid agents, and we show improvement over recent hierarchical methods.
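The sketch below illustrates (in Python) the two ideas summarized above: a phase-conditioned low-level policy that sees only the proprioceptive state and is rewarded for changing the external coordinates, and a high-level policy that acts by selecting which low-level policy to run. This is not the authors' implementation; the observation split, the Gym-style environment interface, and all names such as phase_embedding and low_level_reward are illustrative assumptions.

```python
import numpy as np

# Assumed observation layout: the first PROPRIO_DIM entries are internal
# joint angles/velocities (proprioceptive); the remaining entries are
# external coordinates such as the agent's (x, y) position.
PROPRIO_DIM = 27          # illustrative value, not the actual Ant dimension
PHASE_PERIOD = 10         # low-level behaviour repeats every PHASE_PERIOD steps

def phase_embedding(t, period=PHASE_PERIOD):
    """Cyclic 'phase function' input appended to the proprioceptive state,
    inducing periodic (gait-like) low-level behaviour."""
    phase = (t % period) / period
    return np.array([np.sin(2 * np.pi * phase), np.cos(2 * np.pi * phase)])

def low_level_observation(obs, t):
    """Low-level policies only see the internal state plus the phase signal."""
    return np.concatenate([obs[:PROPRIO_DIM], phase_embedding(t)])

def low_level_reward(obs, next_obs, direction):
    """Task-agnostic reward: encourage change in the non-proprioceptive
    dimensions, here projected onto a preferred direction so that each
    member of the ensemble learns to move the agent differently."""
    delta = next_obs[PROPRIO_DIM:PROPRIO_DIM + 2] - obs[PROPRIO_DIM:PROPRIO_DIM + 2]
    return float(np.dot(delta, direction))

def run_high_level_episode(env, high_level_policy, low_level_policies,
                           steps_per_option=PHASE_PERIOD):
    """The high-level policy receives the sparse task reward and acts by
    choosing which pretrained low-level policy to execute for a fixed
    number of steps (assumes a Gym-style env interface)."""
    obs, total_reward, done, t = env.reset(), 0.0, False, 0
    while not done:
        k = high_level_policy.select(obs)      # index of a low-level policy
        for _ in range(steps_per_option):
            action = low_level_policies[k].act(low_level_observation(obs, t))
            obs, reward, done, _ = env.step(action)
            total_reward += reward             # sparse, task-dependent reward
            t += 1
            if done:
                break
    return total_reward
```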

Results

Low-level policies

Ant low-level policies:
ant_lowlevel_1.mp4
ant_lowlevel_4.mp4
ant_lowlevel_2.mp4
ant_lowlevel_5.mp4

Humanoid low-level policies:
humanoid_lowlevel_1.mp4
humanoid_lowlevel_3.mp4
humanoid_lowlevel_2_moonwalk.mp4
humanoid_lowlevel_4.mp4

Mazes

Ant, cross maze:
ant_cross_4.mp4
ant_cross_2.mp4
ant_cross_3.mp4
ant_cross_1.mp4

Ant, skull maze:
ant_skull_4.mp4
ant_skull_2.mp4
ant_skull_3.mp4
ant_skull_1.mp4

Humanoid, cross maze:
humanoid_cross_1.mp4
humanoid_cross_3.mp4
humanoid_cross_2.mp4
humanoid_cross_4.mp4

Citation

@inproceedings{marino2019ep3,
  title={Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies},
  author={Marino, Kenneth and Gupta, Abhinav and Fergus, Rob and Szlam, Arthur},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2019}
}