Hierarchical Visual Foresight (HVF)
- We propose hierarchical visual foresight (HVF), a technique for generating subgoals for long-horizon planning of visuomotor tasks.
- HVF combines video prediction and generative models to produce subgoal images conditioned on the goal image.
- The subgoal images are directly optimized to yield easy-to-plan subsegments, and as a result HVF identifies semantically meaningful substeps in long-horizon tasks (see the sketch below).
- We find that planning with these subgoals yields nearly a 200% improvement on long-horizon desk manipulation tasks.
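As a concrete illustration of the subgoal optimization, here is a minimal sketch that searches the latent space of a generative model with the cross-entropy method (CEM) for subgoal images that keep every subsegment easy to plan. The names `generator` and `segment_cost` are hypothetical placeholders (a learned image decoder and a video-prediction-based planning cost, respectively), and scoring a candidate by its hardest subsegment is one plausible objective, not necessarily the paper's exact formulation.

```python
# Sketch of subgoal optimization via the cross-entropy method (CEM).
# Assumptions (not from the poster): `generator(z)` decodes a latent vector to a
# subgoal image, and `segment_cost(img_a, img_b)` returns the planning cost of
# reaching img_b from img_a (e.g. from a video prediction model / visual MPC).
import numpy as np

def optimize_subgoals(start_img, goal_img, generator, segment_cost,
                      n_subgoals=2, latent_dim=8, iters=5,
                      samples=64, elites=8, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((n_subgoals, latent_dim))
    std = np.ones((n_subgoals, latent_dim))

    for _ in range(iters):
        # Sample candidate latent codes for each subgoal.
        z = rng.normal(mean, std, size=(samples, n_subgoals, latent_dim))
        costs = np.empty(samples)
        for i in range(samples):
            # Decode latents into candidate subgoal images and chain them
            # between the start and goal images.
            imgs = ([start_img]
                    + [generator(z[i, k]) for k in range(n_subgoals)]
                    + [goal_img])
            # Score a candidate by its hardest subsegment, so that every
            # segment between consecutive images stays easy to plan.
            costs[i] = max(segment_cost(a, b) for a, b in zip(imgs[:-1], imgs[1:]))
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = z[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6

    # Return the subgoal images decoded from the final mean latents.
    return [generator(mean[k]) for k in range(n_subgoals)]
```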
Simulated Desk Manipulation
- We demonstrate our method on long-horizon manipulation of objects and a sliding door on a desk.
- HVF identifies semantically meaningful images as subtasks, such as grasping the door handle or pushing the block and repositioning the arm.
- Using subgoals enables a significant performance improvement on all four tasks (see the planning sketch below).
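For concreteness, planning with the generated subgoals can be read as chaining short-horizon plans: reach each subgoal image in turn, then the final goal. The sketch below assumes a hypothetical `plan_to_image` routine standing in for a visual-MPC-style planner; it is an illustration of the idea, not the paper's exact implementation.

```python
def execute_with_subgoals(env, current_img, subgoal_imgs, goal_img, plan_to_image):
    # Reduce one long-horizon task to a sequence of short-horizon ones:
    # plan to each subgoal image in turn, then to the final goal image.
    for target in list(subgoal_imgs) + [goal_img]:
        current_img = plan_to_image(env, current_img, target)
    return current_img
```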

Additional Results
Maze Navigation
- We also evaluate our method on visual maze navigation.
- The agent (green block) must navigate through narrow gaps in walls to reach a goal specified by a goal image.
- The longer the task, the more HVF outperforms standard visual foresight and "Time Agnostic Prediction" (Jayaraman et al.).
BAIR Robot Dataset
- We show qualitative performance of HVF on real cluttered images from the BAIR robot dataset (Ebert et al.).
- HVF produces semantically meaningful subgoal images, for example reaching toward objects when the goal state involves manipulating them.