Abstract
Modeling multi-agent interactions is important for forecasting other agents' behaviors and trajectories. At a given time, to forecast a reasonable future trajectory, each agent needs to attend only to the interactions with a small group of the most relevant agents, rather than to all other agents. However, existing attention models ignore an important prior, namely that human attention in driving does not change rapidly, and may therefore produce attention that fluctuates across time steps. In this paper, we formulate an attention model for multi-agent interactions based on a total variation smoothness prior and propose a trajectory prediction architecture that leverages the knowledge of these attended interactions. We demonstrate that the total variation attention prior, together with the new loss terms, leads to smoother attention and more sample-efficient learning of multi-agent prediction, and we show its accuracy by comparing its results to state-of-the-art prediction approaches on both synthetic scenarios and naturalistic driving data.
Architecture
The architecture of the proposed method. We show the prediction process for one node (node 1) in a scene with three agents. The solid green edges indicate the interactions between different agents, with the states of the connected nodes as input. The dashed green arrows show the RNN for the solid green edges. The solid blue edges indicate the interaction between consecutive time steps, with the states of consecutive steps as input. The dashed blue arrows show the RNN for the solid blue edges. The solid red arrows indicate the attention module. The solid black arrows show the input and output of the RNN that predicts the location of node 1.
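As a rough sketch, the attention module above can be read as scoring each agent-agent edge state against the ego node's temporal state and aggregating the edge states with the resulting weights. The query/key projections (`w_q`, `w_k`) and the scaled dot-product scoring below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(node_state, edge_states, w_q, w_k):
    """Score each spatial edge against the ego node's temporal state and
    aggregate the edge hidden states with the resulting attention weights."""
    q = w_q @ node_state                  # query from the ego node's state
    keys = edge_states @ w_k.T            # one key per neighboring edge
    scores = keys @ q / np.sqrt(len(q))   # scaled dot-product scores
    alpha = softmax(scores)               # attention over neighbors, sums to 1
    context = alpha @ edge_states         # attention-weighted edge summary
    return alpha, context
```

The `context` vector would then be fed, together with the node's own state, into the prediction RNN (black arrows in the figure).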
Smoothness Attention Prior
We posit that human attention does not change frequently over time. Our hypothesis is based on two factors. First, the cognitive science literature suggests that although human attention can shift rapidly when it moves freely without specific instructions, deliberate movement of attention is significantly slower because of an internal limit on the speed of volitional commands. This suggests that human social attention does not change frequently in driving, as it falls into the category of deliberate movement. Second, in driving scenarios, the most relevant vehicles to pay attention to are often the ones that can immediately affect the ego agent or the ones that are affected by the ego agent. This group of agents often does not change rapidly, since the reward function typically consists of terms related to the distance to the goal and the proximity to other agents, which change mostly continuously over time steps. Hence, we impose a smoothness constraint on the attention, defined as a vectorial total variation penalty:
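As a minimal sketch, a vectorial total variation penalty over the per-step attention vectors can be computed as the sum of Euclidean norms of consecutive differences; the exact weighting and norm used in the paper's loss may differ:

```python
import numpy as np

def tv_smoothness_penalty(attention):
    """Vectorial total-variation penalty on attention.

    attention: array of shape (T, N) holding the attention weights over
    N agents at each of T time steps. The penalty sums the Euclidean
    norm of the change in the attention vector between consecutive
    steps, so it is small when attention shifts slowly and large when
    attention fluctuates across time steps.
    """
    diffs = np.diff(attention, axis=0)          # (T-1, N) step-to-step changes
    return np.linalg.norm(diffs, axis=1).sum()  # sum of per-step L2 norms
```

Constant attention incurs zero penalty, while an abrupt switch from one agent to another incurs a penalty proportional to the size of the jump.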
Results on Synthetic Datasets
Double Merge Scenario: Two vehicles in two lanes (green and brown) want to change into each other's lane, which may cause a collision. We add 20 other vehicles (white) that do not influence these two vehicles to make the scene more complex; the two vehicles must attend to each other rather than to the other 20 vehicles.
For the ground-truth policy for each vehicle, the main vehicle whose initial location is behind the other main vehicle waits until the other finishes its lane change, and then starts its own lane change. The other vehicles drive straight with a constant speed on the outer two lanes.
We aim to show that the regularization reduces sample complexity and improves performance, especially for rare events where data is limited. To this end, we treat the case in which the green vehicle starts behind the brown vehicle as the major case, with a significant amount of data, and the other case as the minor case, with only limited data.
We collect 50 trajectories from the major case for training. For the minor case, we vary the size of the dataset by collecting 10%, 30%, 60% and 80% of the number of major case trajectories.
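For concreteness, the minor-case dataset sizes implied by these percentages are:

```python
major = 50                              # major-case training trajectories
minor_fracs = [0.10, 0.30, 0.60, 0.80]  # minor-case fractions of the major case
minor_counts = [round(major * f) for f in minor_fracs]
print(minor_counts)  # [5, 15, 30, 40]
```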
Halting Car Scenario: Two main vehicles are initialized in the same lane, with one vehicle (the leader) in front of the other (the follower). We create two scenarios: `Go', where the leader drives at a constant speed, and `Stop', where the leader suddenly stops and then accelerates back. The follower aims to follow the leader safely, and thus needs to react to the leader's potential stopping behavior. We also add 20 other cars to increase complexity.
For the ground-truth policies, in the `Stop' case we let the leader slow down to a stop and then accelerate again. The follower slows down or accelerates depending on its distance to the leader. In the `Go' case, both the leader and the follower drive at a constant speed. In both cases, the other vehicles drive straight at a constant speed in the outer lanes.
We simulate rare events by using either setting (`Stop'/`Go') as the major case and the other as the minor case. We collect 50 trajectories for the major case for training, and 30% of that number for the minor case.
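The follower's ground-truth behavior described above can be sketched as a simple distance-keeping controller; the gains and desired gap below are illustrative assumptions, not the values used to generate the data:

```python
def follower_accel(gap, rel_speed, desired_gap=10.0, k_gap=0.5, k_speed=1.0):
    """Proportional controller for the follower vehicle (illustrative).

    gap: distance to the leader (m); rel_speed: leader speed minus
    follower speed (m/s). The follower accelerates when the gap exceeds
    the desired gap and brakes when it is closing in, which mirrors the
    `Stop'/`Go' behavior: it holds speed behind a constant-speed leader
    and slows down when the leader halts.
    """
    return k_gap * (gap - desired_gap) + k_speed * rel_speed
```

At the desired gap with matched speeds the commanded acceleration is zero; a short gap with a closing speed yields braking.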
Results on INTERACTION Datasets
Sampled Trajectories
Sampled trajectories in the Intersection, Merging, and Roundabout scenarios, comparing our method (Ours) with Social Attention; a zoomed-in view is shown for each method in each scenario.