Accepted at CoRL 2022 as an oral presentation (acceptance rate 6.5%)
Best Paper Awards Finalist
paper and reviews: https://openreview.net/forum?id=DE8rdNuGj_7
CoRL Oral talk: https://youtu.be/56LzTZfwY2Q?t=2859
Uncertainty in human behaviors poses a significant challenge to autonomous driving in crowded urban environments. Partially observable Markov decision processes (POMDPs) offer a principled framework for planning under uncertainty, often leveraging Monte Carlo sampling to achieve online performance for complex tasks. However, sampling also raises safety concerns by potentially missing critical events. To address this, we propose a new algorithm, LEarning Attention over Driving bEhavioRs (LEADER), that learns to attend to critical human behaviors during planning. LEADER learns a neural network generator that provides attention over human behaviors in real-time situations. It integrates the attention into a belief-space planner, using importance sampling to bias reasoning towards critical events. To train the algorithm, we let the attention generator and the planner form a min-max game. By solving this min-max game, LEADER learns to perform risk-aware planning without human labeling.
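To make the importance-sampling step concrete, the Python sketch below illustrates the idea under stated assumptions: it is not the released LEADER code, and `natural_probs`, `attention_probs`, and `simulate_return` are hypothetical stand-ins for the belief b over an exo-agent's intentions, the learned attention q, and the planner's rollout return under one intention. Sampling intentions from q and reweighting each return by b/q keeps the value estimate unbiased while concentrating samples on critical behaviors.

```python
import numpy as np

def importance_sampled_value(natural_probs, attention_probs, simulate_return,
                             num_samples=1000, seed=0):
    """Estimate E_b[return] by drawing intentions from the attention q and
    correcting each sampled return with the importance weight b / q, so rare
    but critical intentions are visited often without biasing the estimate."""
    rng = np.random.default_rng(seed)
    # Sample intention indices from the importance distribution q.
    intentions = rng.choice(len(attention_probs), size=num_samples, p=attention_probs)
    # Importance-sampling correction b(phi) / q(phi) for each sampled intention.
    weights = natural_probs[intentions] / attention_probs[intentions]
    returns = np.array([simulate_return(i) for i in intentions])
    return float(np.mean(weights * returns))

# Toy usage with three intentions; the rare third one (e.g. a sudden cut-in)
# receives high attention, so the planner rarely misses its large cost.
b = np.array([0.6, 0.3, 0.1])   # natural-occurrence probabilities
q = np.array([0.2, 0.2, 0.6])   # learned importance probabilities
value = importance_sampled_value(b, q, lambda i: [-1.0, -2.0, -50.0][i])
```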
Overview of LEADER. (a) LEADER contains a learning component (red) and a planning component (blue). The learning component includes: an attention generator that generates attention q over human behaviors, based on the current belief b and observation z from the environment; and a critic that approximates the planner's value estimate based on b, z, and the generated attention q. The planning component performs risk-aware planning using the learned attention q. It decides an action a to be executed in the environment and collects experience data. (b) Attention is defined as an importance distribution over human behavioral intentions. The upper box shows the probabilities of different intentions of the highlighted exo-agent in green, yellow, and red, as well as how the attention generator maps the natural-occurrence probabilities to importance probabilities, highlighting the most adversarial intention (red). (c) We train LEADER using three simulated real-life urban environments: Meskel Square in Addis Ababa, Ethiopia, Magic Roundabout in Swindon, UK, and Highway in Singapore.
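The attention generator in (a) can be pictured, roughly, as a small network that maps belief and observation features for one exo-agent to an importance distribution q over its behavioral intentions. The sketch below is an illustrative assumption (the input encoding and layer sizes are not taken from the paper), not the actual architecture.

```python
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    """Illustrative attention generator: maps belief and observation features
    to a softmax distribution (importance probabilities q) over K intentions."""

    def __init__(self, belief_dim, obs_dim, num_intentions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(belief_dim + obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_intentions),
        )

    def forward(self, belief_feat, obs_feat):
        # Concatenate belief and observation features, then normalize logits
        # into importance probabilities over intentions.
        logits = self.net(torch.cat([belief_feat, obs_feat], dim=-1))
        return torch.softmax(logits, dim=-1)
```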
Visualization of the learned attention in: (a) Highway, (b) Meskel Square, (c) Magic Roundabout. We highlight one exo-agent in blue in each scene. The learned attention over its intentions is color-coded green, yellow, red, and purple, from low attention to high attention. For other exo-agents, we only show the most-attended intention with dotted lines.