Learning Visual Predictive Models of Physics for Playing Billiards


The ability to plan and execute goal-specific actions in varied, unexpected settings is a central requirement of intelligent agents. In this paper, we explore how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination"). Our models directly process raw visual input, and use a novel object-centric prediction formulation based on visual glimpses centered on objects (fixations) to enforce translational invariance of the learned physical laws. The agent gathers training data through random interaction with a collection of different environments, and the resulting model can then be used to plan goal-directed actions in novel environments that the agent has not seen before. We demonstrate that our agent can accurately plan actions for playing a simulated billiards game, which requires pushing a ball into a target position or into collision with another ball.
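The object-centric formulation above rests on a simple operation: cropping a window from the frame centered on each object, so the predictor always sees the world in the object's own coordinate frame. A minimal sketch of such a glimpse extraction, assuming a single-channel frame and illustrative names not taken from the paper's code:

```python
import numpy as np

def extract_glimpse(frame, center, size):
    """Crop a square window of side `size` centered on an object.

    Centering the crop on the object makes the predictor's input
    translation-invariant: the same physical laws apply no matter
    where on the table the ball happens to be.  Assumes a 2-D
    (single-channel) frame; names here are illustrative.
    """
    half = size // 2
    cy, cx = center
    # Pad with zeros so glimpses near the border keep the requested size.
    padded = np.pad(frame, ((half, half), (half, half)), mode="constant")
    # After padding, pixel (cy, cx) sits at (cy + half, cx + half),
    # so this slice places the object at the glimpse center.
    return padded[cy:cy + size, cx:cx + size]
```

For example, a ball at pixel (3, 7) of a 10x10 frame ends up at the center of a 5x5 glimpse, regardless of its absolute position.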

Visual Imaginations

The following videos contrast the imagined trajectories from our model with the ground-truth trajectories produced by the physics simulator.
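The planning-by-imagination idea behind these rollouts can be sketched as a simple sampling loop: propose candidate pushes, imagine each trajectory with the learned dynamics model, and keep the push whose imagined outcome lands closest to the goal. This is a hedged sketch under assumed names (`model`, `plan_action`, the force parameterization), not the paper's actual implementation:

```python
import numpy as np

def plan_action(model, state, goal, n_samples=100, horizon=20, rng=None):
    """Choose a push by 'visual imagination': sample candidate forces,
    roll each forward with the learned predictor `model(state, action)`,
    and return the one whose imagined final position is nearest the goal.
    All names and the 2-D force parameterization are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_action, best_cost = None, np.inf
    for _ in range(n_samples):
        # Sample a random push: direction and magnitude of the force.
        angle = rng.uniform(0.0, 2.0 * np.pi)
        magnitude = rng.uniform(0.1, 1.0)
        push = magnitude * np.array([np.cos(angle), np.sin(angle)])
        # Imagine: roll the model forward; the force acts only on step one.
        s, a = np.asarray(state, dtype=float), push
        for _ in range(horizon):
            s = model(s, a)
            a = np.zeros(2)
        cost = np.linalg.norm(s - np.asarray(goal, dtype=float))
        if cost < best_cost:
            best_action, best_cost = push, cost
    return best_action
```

With more samples (or a better dynamics model), the imagined best push converges toward the one that actually achieves the goal in the simulator.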

1-ball worlds


2-ball worlds


3-ball worlds


Generalization - Testing on wall environments much larger than those seen during training.
The LSTM memory retains the ball's velocity even when there is nothing to anchor against in the visual glimpse (shown in light gray).

Large wall environment
