Hierarchical reinforcement learning offers a promising framework for learning policies that span long time horizons, but designing the abstraction between the high-level and low-level policies is challenging. Language is a compositional, human-interpretable, and flexible representation, making it suitable for encoding a wide range of behaviors.
We propose to use language as the abstraction between the high-level and low-level policies, and demonstrate that the resulting policy can successfully solve long-horizon tasks with sparse rewards and can generalize well by exploiting the compositionality of language, even in challenging high-dimensional observation and action spaces. First, we demonstrate the benefit of our method in a low-dimensional observation space through ablations and comparisons against different HRL methods, and then scale our method to a challenging pixel observation space where the baselines fail to make progress.
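To make the high-level/low-level interface concrete, below is a minimal sketch of the control loop described above: the high-level policy periodically emits a language instruction, and the low-level policy conditions every primitive action on that instruction. All names here (`HighLevelPolicy`, `LowLevelPolicy`, `DummyBlockEnv`, `rollout`) are illustrative placeholders, not the actual API released with this repo, and the policies are random stand-ins for learned models.

```python
# Sketch of a two-level policy with language as the interface.
# Names and the instruction vocabulary are hypothetical placeholders.
import random

INSTRUCTIONS = [
    "red object left of blue object",
    "green object in front of red object",
]


class HighLevelPolicy:
    """Chooses a language instruction given the current observation."""

    def act(self, observation):
        return random.choice(INSTRUCTIONS)  # stand-in for a learned policy


class LowLevelPolicy:
    """Produces a primitive action conditioned on observation and instruction."""

    def act(self, observation, instruction):
        return [random.uniform(-1.0, 1.0) for _ in range(2)]  # stand-in 2-D action


class DummyBlockEnv:
    """Toy stand-in environment with a 4-D observation and a 2-D action."""

    def reset(self):
        return [0.0] * 4

    def step(self, action):
        obs = [random.random() for _ in range(4)]
        return obs, 0.0, False, {}


def rollout(env, high, low, horizon=100, high_level_period=10):
    """Run one episode: the high level re-issues an instruction every
    `high_level_period` steps; the low level acts at every step."""
    obs = env.reset()
    instruction = None
    for t in range(horizon):
        if t % high_level_period == 0:
            instruction = high.act(obs)
        action = low.act(obs, instruction)
        obs, reward, done, info = env.step(action)
        if done:
            break


if __name__ == "__main__":
    rollout(DummyBlockEnv(), HighLevelPolicy(), LowLevelPolicy())
```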
More environment documentation and the code for the algorithms are coming soon!
In this task, the high-level policy needs to make the following statements simultaneously true:
In this task, the high-level policy needs to order the objects as follows:
In this task, the high-level policy needs to sort the objects such that: