We propose hierarchical visual foresight (HVF), a novel technique for generating subgoals for long-horizon planning in visuomotor tasks.
HVF combines video prediction and generative models to produce subgoal images conditioned on the goal image.
The subgoal images are directly optimized to produce easy-to-plan subsegments, and as a result HVF identifies semantically meaningful substeps in long-horizon tasks.
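The subgoal optimization described above can be illustrated with a toy sketch. It is not the authors' implementation. It rests on several stated assumptions: states are 2D points rather than images, a hypothetical `decode` function stands in for the generative model that maps latents to subgoals, a hypothetical `segment_cost` stands in for the planning cost a video-prediction planner would report for one segment, the search over latents uses the cross-entropy method (CEM), and a subgoal sequence is scored by its worst (maximum) segment cost.

```python
import math
import random


def segment_cost(a, b):
    # Hypothetical stand-in for the planner's cost of reaching b from a
    # (here: squared Euclidean distance between two states).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode(z):
    # Hypothetical stand-in for the generative model: latent -> subgoal state.
    return tuple(z)


def subgoal_cost(start, goal, latents):
    # Assumption: a subgoal sequence is only as good as its hardest segment.
    points = [start] + [decode(z) for z in latents] + [goal]
    return max(segment_cost(points[i], points[i + 1]) for i in range(len(points) - 1))


def cem_subgoals(start, goal, num_subgoals=1, dim=2, iters=30,
                 samples=200, elites=20, seed=0):
    """Search subgoal latents with the cross-entropy method (a toy sketch)."""
    rng = random.Random(seed)
    mean = [0.0] * (num_subgoals * dim)
    std = [2.0] * (num_subgoals * dim)
    for _ in range(iters):
        population = []
        for _ in range(samples):
            # Sample a flat vector holding all subgoal latents, then split it.
            flat = [rng.gauss(m, s) for m, s in zip(mean, std)]
            latents = [flat[k * dim:(k + 1) * dim] for k in range(num_subgoals)]
            population.append((subgoal_cost(start, goal, latents), flat))
        # Keep the lowest-cost samples and refit the sampling distribution to them.
        population.sort(key=lambda p: p[0])
        elite = [flat for _, flat in population[:elites]]
        columns = list(zip(*elite))
        mean = [sum(col) / elites for col in columns]
        std = [max(1e-3, math.sqrt(sum((x - m) ** 2 for x in col) / elites))
               for col, m in zip(columns, mean)]
    return [decode(mean[k * dim:(k + 1) * dim]) for k in range(num_subgoals)]


if __name__ == "__main__":
    print(cem_subgoals((0.0, 0.0), (4.0, 0.0)))
```

With start (0, 0) and goal (4, 0), the search should settle near the midpoint (2, 0). Scoring by the worst segment favors subgoals that balance difficulty across segments, so no single segment is left hard to plan.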
We find that planning with subgoals yields nearly a 200% improvement on long-horizon desk manipulation tasks.
Simulated Desk Manipulation
We demonstrate our method on long-horizon manipulation of objects and a sliding door on a desk.
HVF identifies semantically meaningful images as subtasks, such as grasping the door handle, or pushing the block and repositioning the arm.
Using subgoals significantly improves performance on all 4 tasks.
Additional Results
Maze Navigation
We also evaluate our method on visual maze navigation.
The agent (green block) must navigate through narrow gaps in walls to reach the goal specified by a goal image.
The longer the task, the more HVF outperforms standard visual foresight and "Time Agnostic Prediction" (Jayaraman et al.).
BAIR Robot Dataset
We show qualitative performance of HVF on real cluttered images from the BAIR robot dataset (Ebert et al.).
HVF produces semantically meaningful subgoal images, for example reaching toward objects when the goal state involves manipulating them.