Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation

Suraj Nair, Chelsea Finn

Stanford University | Google Brain

Hierarchical Visual Foresight (HVF)

We propose hierarchical visual foresight (HVF), a novel technique for generating subgoals for long-horizon planning in visuomotor tasks.
HVF combines video prediction with a generative model to produce subgoal images conditioned on the goal image.
The subgoal images are directly optimized so that each resulting subsegment is easy to plan, and as a result HVF identifies semantically meaningful substeps in long-horizon tasks.
Planning with these subgoals yields nearly a 200% improvement on long-horizon desk manipulation tasks.
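
The subgoal optimization described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this page: 2-D vectors stand in for images, squared distance (the hypothetical `segment_cost`) stands in for the cost of planning a segment with a video prediction model, a Gaussian sampler stands in for the goal-conditioned generative model, and the choice to minimize the worst segment cost with a cross-entropy-method loop is one plausible reading of "easy to plan subsegments".

```python
import numpy as np

def segment_cost(a, b):
    # Hypothetical stand-in for the cost of planning from image a to image b
    # with a video prediction model; here just squared Euclidean distance.
    return float(np.sum((a - b) ** 2))

def worst_segment_cost(start, subgoals, goal):
    # Cost of the hardest segment along start -> subgoals -> goal.
    waypoints = [start, *subgoals, goal]
    return max(segment_cost(a, b) for a, b in zip(waypoints[:-1], waypoints[1:]))

def optimize_subgoals(start, goal, num_subgoals=1, iters=20, pop=200, elites=20, seed=0):
    # Cross-entropy method (an assumed optimizer): sample candidate subgoals,
    # keep the lowest-cost ones, and refit the sampling distribution to them.
    rng = np.random.default_rng(seed)
    dim = start.shape[0]
    mean = np.zeros((num_subgoals, dim))
    std = np.ones((num_subgoals, dim))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, num_subgoals, dim))
        costs = np.array([worst_segment_cost(start, s, goal) for s in samples])
        elite = samples[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

if __name__ == "__main__":
    start, goal = np.array([0.0, 0.0]), np.array([4.0, 0.0])
    subgoals = optimize_subgoals(start, goal)
    print("subgoal:", subgoals[0])
    print("direct cost:", segment_cost(start, goal))
    print("worst segment cost with subgoal:", worst_segment_cost(start, subgoals, goal))
```

In this toy setting the optimizer is pulled toward a subgoal roughly midway between start and goal, because that placement makes the harder of the two segments as easy as possible. In a real system the samples would be images and each cost evaluation would itself require running a planner.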

Simulated Desk Manipulation

We demonstrate our method on long-horizon manipulation of objects and a sliding door on a desk.
HVF identifies semantically meaningful images as subtasks, such as grasping the door handle, or pushing the block and repositioning the arm.
Using subgoals enables a significant performance improvement on all four tasks.

[Video: HVF_SEP19.mp4]

Additional Results

Maze Navigation

We also evaluate our method on visual maze navigation.
The agent (green block) must navigate through narrow gaps in walls to reach the goal specified by a goal image.
The longer the task, the more HVF outperforms standard visual foresight and "Time-Agnostic Prediction" (Jayaraman et al.).

BAIR Robot Dataset

We show qualitative performance of HVF on real, cluttered images from the BAIR robot dataset (Ebert et al.).
HVF produces semantically meaningful subgoal images, for example reaching toward objects when the goal state involves manipulating them.
