Active Visual Planning (AVP) is a new method for prediction and planning under uncertainty that works by modeling the future detection of currently unseen agents.
Code: [public github redacted - anonymized google drive link provided for review]
Abstract: Visual occlusions necessitate that agents reason not only about the intentions of observed agents, but also the potential presence, positions, and intentions of unobserved agents. In this paper, we present a new method for prediction and planning in complex, partially observed multi-agent environments involving uncertainty in existence, intentions, and positions. Our approach, Active Visual Planning (AVP), uses high-dimensional observations to learn a flow-based generative model of multi-agent joint trajectories, including unobserved agents that may exist and be revealed in the near future. Our predictive model is implemented using deep neural networks that map raw observations to future trajectories and is learned entirely offline using a dataset of recorded observations. Once learned, our predictive model can be used for contingency planning over the potential existence, intentions, and positions of unobserved agents, allowing an agent to take actions that reduce its uncertainty over occluded agents that may prove dangerous in the future if not properly accounted for in the present. We demonstrate the effectiveness of AVP on a set of autonomous driving environments inspired by real-world scenarios that require reasoning about the existence of other unobserved agents for safe and efficient driving. In these environments, AVP achieves optimal performance, while methods that do not reason about potential unobserved agents exhibit either overconfident or underconfident behavior.
Navigating a blind intersection requires prediction and planning under perception-induced uncertainty: an expert driver (red) should not only predict that a vehicle (blue) may emerge from the occluded area, but should also understand that inching forward into the intersection will reduce uncertainty regarding oncoming traffic by improving visibility, thus enabling the driver to safely execute a turn. AVP enables prediction and planning with respect to unobserved agents, including actively planning to reduce future uncertainty due to perception.
Method overview
A predictive model (consisting of a convolutional context encoder, a recurrent trajectory encoder, and a recurrent trajectory decoder) is trained on a dataset of human driving. Importantly (and unlike prior works), this dataset can be partially observed: cars can come in and out of view and be assigned partially labelled trajectories. Additionally, the model can predict other actors coming in and out of view by predicting their future detection.
At test time, the predictive model is fed a snapshot of the world state (observed past trajectories and a current visual context, such as a LiDAR point cloud or RGB image). The model can be used to predict a distribution over future partially observed joint trajectories. The model can also be used to generate contingent plans by optimizing a predicted future trajectory against a loss function, for example one that reaches a goal destination while avoiding collisions.
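To make the prediction interface concrete, here is a minimal sketch of sampling joint futures from a flow-style predictive model. All names (`decode`, `context`, the shapes) are hypothetical stand-ins for the paper's networks; the "decoder" below is a toy affine map so the example runs end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

# horizon, number of agents (including possibly-unseen ones), xy dims
T, A, D = 10, 3, 2

def decode(z, context):
    """Toy stand-in for the recurrent trajectory decoder: maps latent
    noise z of shape (T, A, D) plus a context encoding to a joint
    trajectory rollout."""
    # Cumulative sum turns per-step displacements into positions.
    return np.cumsum(0.1 * z + context, axis=0)

context = np.zeros(D)  # placeholder for the CNN context encoding
samples = [decode(rng.standard_normal((T, A, D)), context)
           for _ in range(5)]

# Each sample is one hypothesis over the joint future, including agents
# that may only be detected partway through the horizon.
print(samples[0].shape)
```

Because the model is a normalizing flow, sampling reduces to drawing latent noise and pushing it through the decoder; this is what makes gradient-based planning over latents possible later on.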
Graphical model:
The AVP predictive model can be represented with a graphical model. AVP is a co-influence model that allows for co-leader planning: optimizing for z_r plans a robot policy that both influences and is influenced by the stochastic behavior of the human. Importantly, AVP also models the influence of both the human's position x_h and the robot's position x_r on the human's detection d_h.
Modelling this relationship enables planning a future robot position that reduces the robot's uncertainty of the human agent's existence. This graphical model is implemented via the trajectory decoder in the system figure above, and is used for planning at test-time with the gradient-based policy planner.
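One plausible per-timestep factorization consistent with this description (symbols z_r, x_r, x_h, d_h follow the text; the exact conditioning sets are an assumption, not taken verbatim from the paper) is:

```latex
p\left(x_r^{1:T}, x_h^{1:T}, d_h^{1:T} \mid z_r, z_h\right)
= \prod_{t=1}^{T}
\underbrace{p\!\left(x_r^{t} \mid x_r^{1:t-1}, x_h^{1:t-1}, z_r\right)}_{\text{robot dynamics}}
\;
\underbrace{p\!\left(x_h^{t} \mid x_r^{1:t-1}, x_h^{1:t-1}, z_h\right)}_{\text{human dynamics}}
\;
\underbrace{p\!\left(d_h^{t} \mid x_r^{t}, x_h^{t}\right)}_{\text{detection}}
```

The key term is the last factor: because the human's detection depends on the robot's own position, optimizing z_r can deliberately move the robot to positions that make future detection more likely, i.e., actively reduce perception uncertainty.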
Using AVP for planning:
AVP can be used for planning: by optimizing the robot's latent variables z_r while sampling the latents of non-controllable agents, AVP can generate trajectory-level plans contingent on the existence and stochastic behavior of other agents.
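The latent-optimization idea above can be sketched as follows. This is a hedged, dependency-free toy (the decoder, cost, and finite-difference gradient are stand-ins; the paper's planner would differentiate through the learned flow with autodiff), but the structure is the same: optimize z_r against an expected cost over sampled human latents z_h.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 8
goal = np.array([5.0, 0.0])

def rollout(z_r, z_h):
    # Toy decoder: robot displacements come from z_r, with mild coupling
    # to the (possibly existing) human's latent behavior z_h.
    return np.cumsum(z_r + 0.05 * z_h, axis=0)  # (T, 2) positions

def cost(z_r, z_h_samples):
    # Expected goal-reaching cost, contingent on sampled human behaviors.
    return np.mean([np.sum((rollout(z_r, z_h)[-1] - goal) ** 2)
                    for z_h in z_h_samples])

z_r = np.zeros((T, 2))
z_h_samples = [rng.standard_normal((T, 2)) for _ in range(4)]

lr, eps = 0.05, 1e-4
for _ in range(200):
    # Finite-difference gradient keeps the sketch self-contained.
    base = cost(z_r, z_h_samples)
    g = np.zeros_like(z_r)
    for i in np.ndindex(z_r.shape):
        z_p = z_r.copy()
        z_p[i] += eps
        g[i] = (cost(z_p, z_h_samples) - base) / eps
    z_r -= lr * g

# Average final position across the contingencies approaches the goal.
final = np.mean([rollout(z_r, z_h)[-1] for z_h in z_h_samples], axis=0)
```

Because the human latents are sampled rather than optimized, the resulting plan hedges against their stochastic behavior instead of assuming a single most-likely future.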
Using AVP for closed-loop control:
AVP can easily be applied to control through a combination of replanning and following a trajectory-level plan with a P-controller.
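A minimal closed-loop sketch of this replan-and-track scheme, assuming trivial integrator dynamics and a stand-in planner (the planner, gain, and replanning period are all hypothetical choices, not values from the paper):

```python
import numpy as np

def plan(state, horizon=5):
    # Stand-in for the AVP planner: waypoints straight toward a fixed
    # goal. In the real system this would be the latent-space optimizer.
    goal = np.array([10.0, 0.0])
    fracs = np.linspace(0, 1, horizon + 1)[1:, None]
    return state + fracs * (goal - state)

kp, replan_every = 0.5, 5
state = np.zeros(2)
traj, step = plan(state), 0

for t in range(60):
    if step == replan_every:          # replan from the latest state
        traj, step = plan(state), 0
    target = traj[min(step, len(traj) - 1)]
    u = kp * (target - state)         # P-controller on position error
    state = state + u                 # trivial integrator dynamics
    step += 1
```

Replanning lets the controller incorporate newly detected agents at each cycle, while the P-controller handles low-level tracking between replans.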
Environments
We present a set of environments in CARLA that are designed to evaluate the ability of agents to predict and plan with respect to unobserved agents:
overtake, where the ego vehicle attempts to overtake a truck with limited visibility of oncoming traffic
blind summit, where the ego vehicle ascends a steep hill with limited visibility of traffic beyond the hill apex
intersection, where the ego vehicle attempts to cross an intersection with potential occluded traffic on the left and/or right side
Decision matrices
During data collection, each scenario has a fixed number of outcomes that occur with certain frequencies. Each scenario mode (i.e., outcome) corresponds to a realistic joint trajectory that could occur between human drivers. The decision matrices specify the total number of modes in the expert dataset (used to train the learned generative trajectory model). The columns specify the behavioral policy (an underconfident, overconfident, or optimal driver), and the rows specify the existence of another actor in the scene.
Overtake scenario outcomes
Note: LiDAR point cloud (blue dots) and "detection line" (not used by the model) are added for viewing clarity
Blind summit scenario outcomes
Intersection scenario outcomes