Abstract

For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions of multiple agents. We perform both standard forecasting and conditional forecasting with respect to the AV's goals. Conditional forecasting reasons about how all agents will likely respond to specific decisions of a controlled agent. We train our model on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios than the existing state of the art. Beyond its general ability to perform conditional forecasting queries, we show that our model's predictions of all agents improve when conditioned on knowledge of the AV's intentions, further illustrating its capability to model agent interactions.

GIF Highlight

Using our method to forecast on the nuScenes dataset.

Video (Best viewed in HD)

Forecasting with Estimating Social-forecast Probabilities (ESP). Forecasts were generated from the LIDAR, which arrives at 12 Hz, and are displayed on both the LIDAR and the images from the ego-car at 5 Hz.

PREdiction Conditioned on Goals (PRECOG) Visualizations

ESP forecasts (unconditional). Car 1 is predicted to travel ahead of Car 2.

PRECOG (conditioned on Blue's final position). Car 1 is predicted to move forward and stop at an earlier position, and Car 2 is predicted to start moving.

ESP forecasts (unconditional). Car 1 is predicted to begin turning, and Car 2 will move to take its place.

PRECOG (conditioned on Blue's final position). Both Car 1 and Car 2 are predicted to stop.

ESP forecasts (unconditional). Car 1 and Car 2 are predicted to travel slowly.

PRECOG (conditioned on Blue's final position). Car 1 and Car 2 are predicted to travel more quickly.

ESP forecasts (unconditional). Car 1 is predicted to closely follow Car 2.

PRECOG (conditioned on Blue's final position). Car 1 is predicted to give more space to Car 2, and Car 2 is predicted to sometimes use this space to travel forward instead of turning.

Estimating Social-Forecast Probabilities (ESP) Visualizations

Architecture

Architecture of our Estimating Social-forecast Probabilities (ESP) model. A convolutional neural network processes scene information to inform a factorized flow-based latent generative model over each agent's future motion. See the paper and appendix for details.
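To make the idea of a factorized flow-based latent model concrete, here is a minimal NumPy sketch. It is not the ESP implementation: it assumes each agent's next position is a mean plus a scaled standard-normal latent, and the function `toy_mu_sigma` is a hypothetical stand-in for the learned network (it ignores scene features such as LIDAR entirely). The point is only that such a step-wise map is invertible, so trajectories can be both sampled and scored with an exact log-likelihood.

```python
import numpy as np

def toy_mu_sigma(history):
    """Hypothetical stand-in for a learned network (assumption, not ESP itself).

    history: array of shape (t, A, 2) with t >= 2 past positions of A agents.
    Returns the mean and scale, each of shape (A, 2), for the next step.
    """
    prev, last = history[-2], history[-1]
    centroid = last.mean(axis=0, keepdims=True)
    # Constant-velocity extrapolation plus a toy coupling between agents.
    mu = last + (last - prev) + 0.05 * (last - centroid)
    sigma = np.full_like(last, 0.1)
    return mu, sigma

def forward(z, past):
    """Map latents z of shape (T, A, 2) to future positions of shape (T, A, 2)."""
    history = list(past)
    for z_t in z:
        mu, sigma = toy_mu_sigma(np.stack(history))
        history.append(mu + sigma * z_t)  # one latent per agent per step
    return np.stack(history[len(past):])

def inverse_and_log_prob(x, past):
    """Recover the latents that generate x and the log-density of x."""
    history = list(past)
    latents, log_prob = [], 0.0
    for x_t in x:
        mu, sigma = toy_mu_sigma(np.stack(history))
        z_t = (x_t - mu) / sigma
        # Standard-normal log-density minus log|det Jacobian| (diagonal scale).
        log_prob += np.sum(-0.5 * z_t**2 - 0.5 * np.log(2 * np.pi) - np.log(sigma))
        latents.append(z_t)
        history.append(x_t)
    return np.stack(latents), log_prob

rng = np.random.default_rng(0)
past = rng.normal(size=(2, 3, 2))   # 2 past steps, 3 agents, 2-D positions
z = rng.normal(size=(4, 3, 2))      # 4 future steps
x = forward(z, past)
z_rec, log_prob = inverse_and_log_prob(x, past)
```

Because each agent has its own latent at every step while the mean depends on all agents' earlier positions, the latents are factorized across agents but the resulting trajectories still interact.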

Graphical Models

Left: ESP models an evolution of stochastic interactions between all agents. Right: PRECOG is performed by planning the latent decisions for one agent, which influences the evolution of the stochastic predictions.
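The conditioning idea on the right can be sketched as follows. This is a toy illustration under stated assumptions, not the PRECOG planner: `rollout` is a hypothetical hand-written dynamics model standing in for the learned one, the cost is simply the controlled agent's final distance to a goal position, and random search is swapped in for whatever optimizer the paper uses. The controlled agent's latents are chosen deliberately, while the other agents' latents remain sampled.

```python
import numpy as np

def rollout(z, past):
    """Toy stand-in for a learned model: latents (T, A, 2) -> futures (T, A, 2)."""
    prev, last = past[-2], past[-1]
    out = []
    for z_t in z:
        centroid = last.mean(axis=0, keepdims=True)
        # Constant velocity, a toy coupling between agents, and scaled noise.
        nxt = last + (last - prev) + 0.05 * (last - centroid) + 0.1 * z_t
        out.append(nxt)
        prev, last = last, nxt
    return np.stack(out)

def plan_controlled_latents(past, goal, controlled=0, horizon=5,
                            n_candidates=256, n_samples=16, seed=0):
    """Random-search the controlled agent's latents to approach `goal`.

    The other agents' latents are sampled once and shared across candidates,
    so each candidate is scored by its average cost over the same futures.
    """
    rng = np.random.default_rng(seed)
    n_agents = past.shape[1]
    others = rng.normal(size=(n_samples, horizon, n_agents, 2))
    candidates = rng.normal(size=(n_candidates, horizon, 2))
    candidates[0] = 0.0  # always include the zero-latent plan
    best_cost, best_z = np.inf, None
    for cand in candidates:
        cost = 0.0
        for z in others:
            z = z.copy()
            z[:, controlled] = cand          # plan the controlled agent's latents
            x = rollout(z, past)
            cost += np.linalg.norm(x[-1, controlled] - goal)
        cost /= n_samples
        if cost < best_cost:
            best_cost, best_z = cost, cand
    return best_z, best_cost

past = np.array([[[0.0, 0.0], [2.0, 0.0]],
                 [[0.5, 0.0], [2.5, 0.0]]])  # 2 past steps, 2 agents
best_z, best_cost = plan_controlled_latents(past, goal=np.array([3.0, 1.0]))
```

In this toy model, changing only the controlled agent's latents also changes where the other agent ends up, because every agent's next step depends on the positions of all agents.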

PREdiction Conditioned on Goals in Visual Multi-agent Settings