Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
Kenneth Marino, Abhinav Gupta, Rob Fergus, and Arthur Szlam
Abstract
In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse-reward tasks. The agent is split into a low-level and a high-level policy. The low-level policy only accesses internal, proprioceptive dimensions of the state observation. The low-level policies are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, each is induced to be periodic through the use of a ``phase function.'' The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.
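As a rough sketch of the setup described above (all names, shapes, and hyperparameters here are illustrative assumptions, not the paper's actual implementation), the phase input, the low-level reward that encourages changing the external (non-proprioceptive) dimensions, and the high-level selection step might look like:

```python
import numpy as np

def phase(t, period=8, k=4):
    """One-hot 'phase function' input: cycles through k phase bins every
    `period` timesteps, inducing periodic low-level behavior.
    (period and k are illustrative choices, not the paper's values.)"""
    p = np.zeros(k)
    p[(t // (period // k)) % k] = 1.0
    return p

def low_level_reward(state, next_state, ext_dims):
    """Reward a low-level policy for changing the external
    (non-proprioceptive) state dimensions, e.g. the agent's position."""
    delta = next_state[ext_dims] - state[ext_dims]
    return float(np.linalg.norm(delta))

def high_level_step(high_policy, low_policies, obs, t, propri_dims):
    """The high-level policy picks which low-level policy to run; the
    chosen low-level policy sees only the proprioceptive dimensions of
    the observation plus the periodic phase input."""
    idx = high_policy(obs)  # discrete choice over the ensemble
    low_obs = np.concatenate([obs[propri_dims], phase(t)])
    action = low_policies[idx](low_obs)
    return idx, action
```

In this sketch the high-level policy acts at a coarser timescale, committing to one low-level policy for a window of steps, while each low-level policy is rewarded only for displacing the external state, independent of the downstream task.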
Paper: ICLR Camera Ready
Results
Low-level policies
Mazes
Citation
@inproceedings{marino2019ep3,
title={Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies},
author={Marino, Kenneth and Gupta, Abhinav and Fergus, Rob and Szlam, Arthur},
booktitle={International Conference on Learning Representations (ICLR)},
year={2019}
}