We present a method for temporally-extended planning over high-dimensional state spaces by learning a state representation amenable to optimization and a goal-conditioned policy to abstract time.
Optimizing subgoals directly over raw observations is ill-posed, as the set of valid subgoals lies on a low-dimensional manifold of the raw observation space. To address this, we train a β-VAE whose latent space captures the space of valid states. This latent space provides a state abstraction over which we can plan subgoals. We train the β-VAE on a dataset of states collected randomly from the environment.
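The β-VAE objective can be sketched as a reconstruction term plus a β-weighted KL divergence between the encoder's Gaussian posterior and a standard normal prior. The snippet below is a minimal numpy illustration of that objective, not the paper's implementation; the function name, squared-error reconstruction term, and β value are assumptions for clarity.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=20.0):
    """Hypothetical beta-VAE objective: reconstruction error plus a
    beta-weighted KL between q(z|x) = N(mu, diag(exp(log_var))) and N(0, I)."""
    # Squared-error reconstruction term (a stand-in for the decoder likelihood).
    recon = np.sum((x - x_recon) ** 2)
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + beta * kl
```

Setting β > 1 trades reconstruction fidelity for a latent space that stays closer to the prior, which later helps keep optimized subgoals on the manifold of valid states.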
We also need a goal-conditioned policy that measures reachability between pairs of states. We employ temporal difference models (TDMs) from Pong et al. (2018). Given a starting state s and goal g, a TDM measures how close the agent can get to g from s within a short time horizon. For long-horizon tasks, TDMs provide temporal abstraction, allowing us to chain multiple short-horizon tasks into a single long-horizon task.
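To make the reachability idea concrete, here is a toy stand-in for a TDM value function on a 2D pointmass that moves at bounded speed: the score is the (negated) distance to the goal that remains after τ steps. This toy function and its `vmax` parameter are illustrative assumptions; the actual TDM is a learned Q-function, not a hand-written formula.

```python
import numpy as np

def toy_tdm_value(s, g, tau, vmax=1.0):
    """Toy stand-in for a learned TDM value V(s, g, tau): the negated
    distance to goal g that remains after moving from state s at top
    speed vmax for tau steps. Zero means g is reachable within tau."""
    remaining = np.linalg.norm(np.asarray(g, float) - np.asarray(s, float)) - tau * vmax
    return -max(0.0, remaining)
```

A value of 0 indicates the goal is reachable within the horizon; increasingly negative values indicate the pair of states is farther than τ steps apart, which is exactly the signal the planner uses to score consecutive subgoals.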
Given the current state s and goal g, our planner optimizes subgoals over the latent space of the VAE. Each consecutive pair of subgoals is given a feasibility score: a measure of how close the agent can get to the next subgoal starting from the previous one. We maximize the overall feasibility of the plan, with an additional penalty that constrains the latent subgoals to stay within the prior distribution of the latent space.
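The planning objective described above can be sketched as: sum the TDM feasibility of every consecutive pair of latent waypoints, then subtract a penalty on subgoals that stray from the VAE's N(0, I) prior. The function below is a minimal sketch of that score under assumed interfaces; `tdm_value`, `prior_weight`, and the squared-norm penalty are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def plan_score(subgoals, z_start, z_goal, tdm_value, tau, prior_weight=1.0):
    """Score a candidate plan of latent subgoals. tdm_value(z1, z2, tau)
    is an assumed learned reachability score (higher = more reachable).
    The prior penalty is the squared norm of each subgoal, i.e. the
    negative log-density of N(0, I) up to a constant."""
    waypoints = [z_start] + list(subgoals) + [z_goal]
    # Feasibility of each consecutive hop, chained over the whole plan.
    feasibility = sum(tdm_value(a, b, tau)
                      for a, b in zip(waypoints[:-1], waypoints[1:]))
    # Penalize subgoals that drift away from the latent prior N(0, I).
    prior_penalty = sum(np.sum(np.asarray(z) ** 2) for z in subgoals)
    return feasibility - prior_weight * prior_penalty
```

A planner would maximize this score over the subgoal variables, e.g. with the cross-entropy method or gradient ascent; the prior term keeps optimized subgoals decodable into valid-looking states.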
The pointmass must plan a globally-optimal path to reach the goal, from inside the u-wall to the other side.
The robot must move around the puck to the desired puck location, and then move the end effector to its desired location.
The ant must plan a globally-optimal path to reach the goal, from one side of the wall to the other.
@inproceedings{nasiriany2019planning,
title={Planning with Goal-Conditioned Policies},
author={Soroush Nasiriany and Vitchyr Pong and Steven Lin and Sergey Levine},
booktitle={Advances in Neural Information Processing Systems},
year={2019}
}