Structured World Belief


Gautam Singh¹, Skand¹, Junghyun Kim¹, Hyunseok Kim², Sungjin Ahn¹

¹Rutgers University, ²ETRI




Published in ICML 2021


INTRODUCTION

  • We propose a self-supervised scene representation learning model that takes a sequence of observations and returns object-centric belief state representations.

  • Our belief state contains K particles and their particle weights.

  • Each particle is further composed of N object vectors or object files.
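The structure above can be sketched as a simple container (a minimal NumPy sketch; the variable names and dimensionalities are illustrative, not the paper's exact implementation):

```python
import numpy as np

K = 8   # number of particles in the belief (illustrative)
N = 4   # number of object files (slots) per particle (illustrative)
D = 16  # dimensionality of each object vector (illustrative)

# Belief state: K particles, each holding N object files, plus particle weights.
object_files = np.zeros((K, N, D))       # per-particle object vectors
log_weights = np.full(K, -np.log(K))     # uniform initial particle weights

# Weights are kept normalized so they define a distribution over particles.
weights = np.exp(log_weights)
assert np.isclose(weights.sum(), 1.0)
```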

BELIEF TRACKING

We take image sequences in which objects can disappear for long periods of time before re-appearing, provide these sequences to our model, and perform belief tracking. The underlying object dynamics are stochastic: at intersections, a trajectory can split into two branches, each taken with probability 0.5. Furthermore, the color of the objects changes periodically.

We note that our model can:

  • Maintain plausible object positions and object-segments for objects that have become invisible in the observations.

  • Consistently re-identify an object that re-appears after a long period of invisibility, even when objects share the same appearance.
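The tracking behavior above can be illustrated with a toy bootstrap particle filter over a single object's position (a 1-D stand-in for the model's learned dynamics and likelihood, not the paper's actual inference procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(positions):
    """Stochastic toy dynamics: each trajectory takes one of two
    branches with probability 0.5 (stand-in for intersections)."""
    branch = rng.integers(0, 2, size=positions.shape)
    return positions + np.where(branch == 0, -1.0, +1.0)

def update_belief(positions, log_w, observation, visible, noise=0.5):
    """One bootstrap-filter step: propagate, reweight (only when the
    object is visible), then resample particles by their weights."""
    positions = propagate(positions)
    if visible:
        # Gaussian likelihood of the observed position under each particle.
        log_w = log_w - 0.5 * ((positions - observation) / noise) ** 2
    log_w = log_w - np.logaddexp.reduce(log_w)           # normalize
    idx = rng.choice(len(positions), size=len(positions),
                     p=np.exp(log_w))                    # resample
    return positions[idx], np.full(len(positions), -np.log(len(positions)))

# While the object is invisible, particles spread over plausible positions;
# a later observation collapses the belief back onto the true trajectory.
positions = np.zeros(64)
log_w = np.full(64, -np.log(64.0))
for t in range(5):
    positions, log_w = update_belief(positions, log_w,
                                     observation=None, visible=False)
spread_hidden = positions.std()
positions, log_w = update_belief(positions, log_w,
                                 observation=positions.mean(), visible=True)
```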

Figure 1. Demonstration of belief tracking in our model on videos with objects of different colors. Top-left: We show the observations being provided to the model. Bottom-left: We show the position particles maintained by our model. The color of the particles denotes the object-file ID. Bottom-right: We show the segments of particles for each object file. We show a mean image over all the particles for each file. The red outline on the cells denotes when the object is visible as inferred by our model.

Figure 2. Demonstration of belief tracking in our model on videos with objects of the same color. Top-left: We show the observations being provided to the model. Bottom-left: We show the position particles maintained by our model. The color of the particles denotes the object-file ID. Bottom-right: We show the segments of particles for each object file. We show a mean image over all the particles for each file. The red outline on the cells denotes when the object is visible as inferred by our model.

APPLICATION OF BELIEF TRACKING IN REINFORCEMENT LEARNING

Figure 3. Demonstration of A2C gameplay in the 3D Foodchase Game. The agent is the red cube and the food is the blue object. A positive reward is received when the agent touches the food, in which case the food respawns at a random position in the arena. If the agent touches either of the other two objects, i.e., the enemies, a negative reward is received. Crucially, all objects can become invisible for long periods of time. The motion of the enemies and the food is stochastic: at intersections, they can go straight, turn into the left lane, or turn into the right lane, each with equal probability.

Top (left to right): We show the observations being provided to the model, the action sampled by the policy, the action probabilities, current reward and current value estimate. Bottom-left: We show the position particles maintained by our model. The color of the particles denotes the object-file ID. Bottom-right: We show the segments of particles for each object file. We show a mean image over all the particles for each file. The red outline on the cells denotes when the object is visible as inferred by our model.
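One way to feed the structured belief into an actor-critic agent is to pool the object files across particles into a single policy input (a hypothetical linear sketch; the pooling choice and the `W_pi`/`W_v` heads are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, D, num_actions = 8, 4, 16, 4  # illustrative sizes

def pool_belief(object_files, weights):
    """Summarize the belief for the policy: sum over object files,
    then average over particles weighted by the particle weights."""
    per_particle = object_files.sum(axis=1)               # (K, D)
    return (weights[:, None] * per_particle).sum(axis=0)  # (D,)

# Hypothetical linear actor and critic heads over the pooled belief.
W_pi = rng.normal(size=(D, num_actions)) * 0.1
W_v = rng.normal(size=(D, 1)) * 0.1

def act(object_files, weights):
    h = pool_belief(object_files, weights)
    logits = h @ W_pi
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                     # softmax action probabilities
    action = rng.choice(num_actions, p=probs)
    value = (h @ W_v).item()                 # critic's value estimate
    return action, probs, value

object_files = rng.normal(size=(K, N, D))
weights = np.full(K, 1.0 / K)
action, probs, value = act(object_files, weights)
```

Because the belief covers plausible positions of invisible objects, the pooled input lets the policy act on objects it cannot currently see.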

FUTURE GENERATION OF STRUCTURED BELIEF

Figure 4. Generation samples using our model in the 3D Foodchase environment. We collected 200K frames from the 3D Foodchase environment using a random-action policy and trained SWB (Structured World Belief) on the collected data. We provide 10 time-steps of conditioning frames and then let the model generate the future. Here, we visualize 5 generation samples where actions are drawn from a random policy. The second column visualizes the position particles over the 10 conditioning time-steps. The third column shows the generated images. The fourth and fifth columns visualize the foreground and background generations.
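The conditioning-then-generation procedure can be sketched as a rollout loop (a toy stand-in where `transition` replaces the learned latent dynamics and a decoder would render each belief into a frame; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(belief, action):
    """Hypothetical one-step latent dynamics: every particle evolves
    stochastically given the action (stand-in for the learned prior)."""
    return belief + 0.1 * action + 0.05 * rng.normal(size=belief.shape)

def generate(belief, policy, horizon):
    """Roll the belief forward with no observations: sample actions
    from the policy and propagate each particle through the dynamics."""
    frames = []
    for _ in range(horizon):
        action = policy()
        belief = transition(belief, action)
        frames.append(belief.copy())  # a decoder would render these
    return frames

K, N, D = 5, 4, 16                   # 5 generation samples, as in Fig. 4
belief = rng.normal(size=(K, N, D))  # belief after 10 conditioning steps
frames = generate(belief, policy=lambda: rng.integers(0, 4), horizon=20)
```

Because each particle is rolled forward independently, the samples diverge over time, reflecting the stochastic branching of the environment.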