Sequential Neural Processes

Gautam Singh* Jaesik Yoon* Youngsung Son Sungjin Ahn

Rutgers University SAP ETRI


When we see a new environment, we quickly form a world model in which we can imagine multiple possible futures and what a future scene might look like from any vantage point. As humans, we are aware of how things might be moving or changing over time - even when we are not looking.

This ability can be seen as quickly learning a random function from a few data-points and knowing how this function evolves with time. We call this model the Sequential Neural Process (SNP). A time-evolving random function has some interesting connections and use cases:

Neural Dynamic Scene Inference - a 4D model that infers the scene and its dynamics and generates plausible future scenes from queried camera viewpoints.

Meta-Transfer Learning - Consider, for example, a game-playing agent which, after clearing the current stage, levels up to the next stage, where more and faster enemies are placed than in the previous stage. With SNP, the agent can not only learn the policy for the new stage efficiently from a few observations, but also learn and transfer the general trend from the past: that there will be more and faster enemies in future stages.

3D Environments

For each environment, we show at most 5 context images in the first 5 time-steps of the roll-out. For the remaining time-steps, the model is only told which actions are applied to each object, and we let the model sample the future. At each time-step, we query the model with nine camera viewpoints (shown as C1 through C9) around the 3D arena and let the model generate the images as they would be seen from those viewpoints.

Each GIF shows predictions from t=5 up to t=29 for a given random environment. In each illustration, the left pane shows the environment map and the ground-truth images as they would be seen from the query viewpoints. The right pane shows the model's corresponding generations.

Color Cube Environment

The environment consists of a walled arena with each wall colored differently. Inside it is a cube with differently colored faces, which moves according to the action provided to it (shown on the top bar). The actions are Left, Right, Up and Down movements, and Anti-Clockwise and Clockwise rotations.

We see that SNP correctly performs the object transitions even beyond the training time-horizon of t=10.

More examples here.

Multi-Object Environment

The environment consists of a walled arena with each wall colored differently. There are three objects in the arena: a sphere, a cylinder and a cube. Each object moves according to the action provided to it (shown on the top bar). An action 3-tuple such as LDR means that the sphere moves to the left, the cylinder downwards and the cube to the right.
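As a minimal sketch of how such an action 3-tuple could be decoded, the snippet below maps each letter to a displacement for the corresponding object. The object order (sphere, cylinder, cube) follows the text; the `MOVES` table, the unit step size, the axis orientation and the function name `decode_actions` are assumptions made for illustration only.

```python
# Hypothetical letter-to-displacement table; step size and axis orientation
# are assumptions, not taken from the original environment.
MOVES = {"L": (-1, 0), "R": (1, 0), "U": (0, 1), "D": (0, -1)}

# Object order as stated in the text: sphere, cylinder, cube.
OBJECTS = ("sphere", "cylinder", "cube")


def decode_actions(action_tuple):
    """Map a 3-letter action string such as "LDR" to per-object (dx, dy) moves."""
    if len(action_tuple) != len(OBJECTS):
        raise ValueError("expected one action letter per object")
    return {obj: MOVES[letter] for obj, letter in zip(OBJECTS, action_tuple)}
```

For example, `decode_actions("LDR")` assigns a leftward move to the sphere, a downward move to the cylinder and a rightward move to the cube.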

We see that SNP correctly performs the object transitions even beyond the training time-horizon of t=10.

More examples here.

2D Environment

Moving Color Shapes Environment

This environment consists of a 128x128 white canvas with two objects moving and bouncing off the walls. Each object is assigned a random shape and color. To test stochasticity in the transitions, an object's color may randomly change once per episode according to a fixed rule. For example, red may change to magenta or blue may change to cyan. When two objects overlap, one covers the other based on a fixed priority ordering. Given a 2D viewpoint, the agent observes a 64x64 cropped patch of the canvas around it.
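The bouncing motion and the cropped observation can be sketched roughly as follows. The canvas size (128) and patch size (64) come from the description above; the reflection rule, the clamping of the crop window at the canvas border, and the function names `bounce_step` and `crop_patch` are assumptions for illustration, not the original environment code.

```python
import numpy as np

CANVAS = 128  # canvas side length, from the text
PATCH = 64    # observed patch side length, from the text


def bounce_step(pos, vel, radius=0.0):
    """Advance an object by one time-step, reflecting it off the canvas walls.

    The mirror-reflection rule used here is an assumption.
    """
    pos = np.asarray(pos, dtype=float) + np.asarray(vel, dtype=float)
    vel = np.asarray(vel, dtype=float).copy()
    lo, hi = radius, CANVAS - radius
    for axis in range(2):
        if pos[axis] < lo:
            pos[axis] = 2 * lo - pos[axis]  # mirror back inside
            vel[axis] = -vel[axis]
        elif pos[axis] > hi:
            pos[axis] = 2 * hi - pos[axis]
            vel[axis] = -vel[axis]
    return pos, vel


def crop_patch(canvas, viewpoint):
    """Return the PATCH x PATCH crop around `viewpoint` (x, y).

    The crop window is clamped so it stays inside the canvas (an assumption).
    """
    cx, cy = viewpoint
    x0 = int(np.clip(cx - PATCH // 2, 0, CANVAS - PATCH))
    y0 = int(np.clip(cy - PATCH // 2, 0, CANVAS - PATCH))
    return canvas[y0:y0 + PATCH, x0:x0 + PATCH]
```

Clamping the window is only one way to handle viewpoints near the border; padding the canvas with white would be an equally plausible choice.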

Nine viewpoints are shown in the image below for each time-step. These viewpoints are evenly spaced on the canvas. The ground-truth section shows the true target images for those viewpoints. The context section shows whether any context is provided at that time-step; it appears gray if no context exists. The two middle sections show two samples of generations from the SNP model. The right-most section shows the corresponding generations from GQN.

Prediction Setting

This is an example of the prediction setting, i.e. showing context in the early time-steps and then generating the future unassisted.

We note the following.

1. SNP samples predict plausible futures based on the modelled transition dynamics.

2. SNP models the stochasticity of the transitions, as shown by the two samples drawn above. Since a cyan shape may randomly change color to blue based on the color change rule, we can observe that one sample keeps the cyan color while the other changes it to blue.

3. Training was performed with a time-horizon of 20. SNP generalizes beyond this, as shown by the roll-out up to t=29. In contrast, GQN cannot generalize beyond the training time-horizon of t=20 since it does not model time explicitly.

Tracking and Belief Update

This is an example of the tracking setting. In this demonstration, we first show contexts in the early time-steps and then allow the predictions to diverge from the ground truth. We then intermittently show short bursts of context to demonstrate the belief update. On seeing such context, the predictions become re-aligned with the true object positions, velocities and colors.

More examples here.

1D Environment

Data set from Gaussian Process with dynamics

We generate the data set from a Gaussian Process with a squared-exponential kernel and a small likelihood noise. Within each episode, the likelihood noise is kept fixed, while the hyper-parameters of the kernel change at each time-step according to fixed dynamics. To introduce transition stochasticity, small Gaussian noise is added. Each episode consists of 50 time-steps. At test time, a single point is shown at each of 45 randomly chosen time-steps and nothing is shown in the remaining 5 time-steps.

In the animation below, the black dotted line is the ground truth, the big black dot is the point given at the current time-step, and the big blue dots are points given in the past. The blue line shows the predictions and the sky-blue area shows their uncertainty. Previously given points gradually become more transparent.

We can see that SNP predicts the targets with small uncertainty by capturing the recent points and extrapolating the trend of the past points.

This suggests that SNP can transfer past knowledge and update it effectively with new knowledge.

More examples here.

More Examples

3D Color Cube Environment

3D Multi-Object Environment

2D Moving Color Shapes Environment


Tracking and Belief Update

1D Gaussian Process with dynamics experiments