Time-Agnostic Prediction:
Predicting Predictable Video Frames
Dinesh Jayaraman, Frederik Ebert, Alexei Efros, Sergey Levine
Project abstract
Prediction is arguably one of the most basic functions of an intelligent system. In general, the problem of predicting events in the future or between two waypoints is exceedingly difficult. However, most phenomena naturally pass through relatively predictable bottlenecks---while we cannot predict the precise trajectory of a robot arm between being at rest and holding an object up, we can be certain that it must have picked the object up. To exploit this, we decouple visual prediction from a rigid notion of time. While conventional approaches predict frames at regularly spaced temporal intervals, our time-agnostic predictors (TAP) are not tied to specific times so that they may instead discover predictable "bottleneck" frames no matter when they occur. We evaluate our approach for future and intermediate frame prediction across three robotic manipulation tasks. Our predictions are not only of higher visual quality, but also correspond to coherent semantic subgoals in temporally extended tasks.
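The core of the time-agnostic objective can be sketched as a minimum-over-time reconstruction loss: the predictor's output is scored against every frame in the target clip, and only the best match counts. This is a simplified illustration of the idea described above, not the paper's full training objective; the function name and array shapes are illustrative.

```python
import numpy as np

def time_agnostic_loss(pred, targets):
    """Minimum-over-time loss for a single predicted frame.

    Instead of penalizing the prediction against the frame at one fixed
    time offset, compare it against all T target frames and keep the
    smallest error. The predictor is thereby free to latch onto whichever
    frame it can predict best -- e.g. a predictable "bottleneck" frame.

    pred:    (H, W, C) predicted frame
    targets: (T, H, W, C) ground-truth frames of the clip
    Returns (min_error, t_star): the best per-frame L2 error and the
    index of the matched target frame.
    """
    errors = np.mean((targets - pred) ** 2, axis=(1, 2, 3))  # per-frame L2
    t_star = int(np.argmin(errors))
    return errors[t_star], t_star
```

For example, if the predicted frame exactly matches one frame somewhere in the clip, the loss is zero regardless of when that frame occurs, which is precisely the decoupling from a rigid notion of time.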
Download paper here
BibTeX
To cite this paper, please use the following BibTeX entry:
@inproceedings{jayaraman2019tap,
  title={Time-Agnostic Prediction: Predicting Predictable Video Frames},
  author={Jayaraman, Dinesh and Ebert, Frederik and Efros, Alexei A and Levine, Sergey},
  booktitle={ICLR},
  year={2019}
}
Data videos
Grasping task
Pick-and-place task
Pushing task
BAIR pushing task
Qualitative examples
In the paper, we provided selected example predictions in each setting, comparing our method against baselines to highlight the differences between methods. Here, we provide several randomly chosen examples corresponding to the qualitative example figures in the paper.
In each case, as in the paper, each row corresponds to a separate example, and the columns (labeled at the top of each figure) show the predictions produced by the different methods.
(Coming soon: GIFs with input frames and predictions, for easier visualization)
Forward prediction results on grasping

Intermediate prediction results on grasping

Intermediate prediction results on pick-and-place

Intermediate prediction results on two-object pushing

VAE stochastic intermediate prediction results on pick-and-place

Intermediate prediction results on BAIR pushing
