arXiv paper: http://arxiv.org/abs/1605.07157
Supplementary Appendix: https://goo.gl/G0ZIr4
Dataset coming soon.
Robot pushing evaluation
Below are example video predictions from various models in our evaluation on the robot interaction dataset. The ground truth video shows all ten time steps, whereas all other videos show only the eight generated time steps (conditioned on only the first two ground truth images and all robot actions).
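For concreteness, the sketch below shows how such an action-conditioned rollout can be driven: the model is warmed up on the two ground-truth context frames, and beyond that point each prediction is fed back in as the next input. This is a minimal illustration, not the paper's code; `model` and its `(frame, action, state) -> (prediction, state)` interface are hypothetical.

```python
def rollout(model, context_frames, actions, horizon=8):
    """Feed ground-truth context, then recursively feed back predictions.

    model: hypothetical callable (frame, action, state) -> (prediction, state).
    context_frames: ground-truth conditioning frames, each of shape (H, W, 3).
    actions: robot actions for every time step, shape (T, action_dim).
    """
    state = None
    # Warm up the recurrent state on the ground-truth context frames.
    for t, frame in enumerate(context_frames):
        pred, state = model(frame, actions[t], state)
    preds = [pred]  # first generated frame
    # Beyond the context, the model sees only its own previous prediction.
    for t in range(len(context_frames), len(context_frames) + horizon - 1):
        pred, state = model(preds[-1], actions[t], state)
        preds.append(pred)
    return preds  # `horizon` predicted frames
```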
Sample video predictions - seen objects
ground truth // CDNA // ConvLSTM, with skip // FF multiscale [14] // FC LSTM [17]
Sample video predictions - novel objects
ground truth // CDNA // ConvLSTM, with skip // FF multiscale [14] // FC LSTM [17]
Note how the ConvLSTM model predicts motion less accurately than the CDNA model and degrades the background (e.g., the left edge of the table).
Changing the action
CDNA, novel objects
0x action // 0.5x action // 1x action // 1.5x action
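These rows were produced by scaling the commanded action vectors at prediction time; the model never saw scaled actions during training, so this probes whether its predictions respond plausibly to counterfactual commands. A minimal sketch, reusing the hypothetical `rollout` above and assuming `actions` is a NumPy array:

```python
# Same context frames, but actions scaled by a constant factor.
for scale in (0.0, 0.5, 1.0, 1.5):
    preds = rollout(model, context_frames, scale * actions, horizon=8)
```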
Visualized masks
CDNA, seen objects, masks 0 (background), 2, and 8
CDNA, novel objects, masks 0 (background), 2, and 8
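For context, in the CDNA model these masks are softmax-normalized compositing weights: the predicted frame is a per-pixel convex combination of several transformed versions of the previous frame plus a static background channel (mask 0). Below is a minimal sketch of that compositing step, with hypothetical `transformed` and `mask_logits` arrays standing in for the network's outputs:

```python
import numpy as np

def composite(transformed, mask_logits):
    """Composite candidate images into one predicted frame.

    transformed: (K, H, W, 3) candidate images; index 0 is the static
        background, indices 1..K-1 are transformed versions of the last frame.
    mask_logits: (K, H, W) unnormalized mask activations from the network.
    """
    # Softmax over the K channels so the masks sum to 1 at every pixel.
    m = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
    masks = m / m.sum(axis=0, keepdims=True)
    # Each pixel of the prediction is a convex combination of the candidates.
    return (masks[..., None] * transformed).sum(axis=0)
```

Because the masks must sum to one at every pixel, they tend to specialize, which is consistent with the visualizations above: mask 0 latches onto the static background and the others onto individual moving objects.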
Human3.6M evaluation
Below are example video predictions from various models in our evaluation on the Human3.6M dataset, with a held-out human subject. The ground truth video shows ten ground truth time steps, whereas all other videos show the ten generated time steps (conditioned on only the first ten ground truth images, which are not shown).
Sample video predictions
ground truth // DNA // FF multiscale [14] // FC LSTM [17]
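Human3.6M has no robot actions, so these predictions are action-free. One way to reuse the hypothetical `rollout` sketch from above is to pass zero action vectors (the actual Human3.6M variant may simply omit the action input entirely; this is an assumption):

```python
import numpy as np

# Hypothetical shapes: ten conditioning frames, ten predicted frames,
# and zero vectors in place of the (absent) actions.
action_dim = 5  # placeholder dimension; unused when actions are all zero
zero_actions = np.zeros((20, action_dim))
preds = rollout(model, context_frames, zero_actions, horizon=10)
```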