Arun Kumar Singh
Jatan Shrestha
Nicola Albarella
Autonomous driving has a natural bi-level structure. The goal of the upper behavioral layer is to provide appropriate lane-change, acceleration, and braking decisions that optimize a given driving task. However, this layer can only indirectly influence driving efficiency through the lower-level trajectory planner, which takes the behavioral inputs and produces motion commands. Existing sampling-based approaches do not fully exploit the strong coupling between the behavioral and planning layers. On the other hand, end-to-end Reinforcement Learning (RL) can learn a behavioral layer while incorporating feedback from the lower-level planner. However, purely data-driven approaches often fail on safety metrics in unseen environments. This paper presents a novel alternative: a parameterized bi-level optimization that jointly computes the optimal behavioral decisions and the resulting downstream trajectory. Our approach runs in real time using a custom GPU-accelerated batch optimizer and a warm-start strategy learned with a Conditional Variational Autoencoder (CVAE). Extensive simulations show that our approach outperforms state-of-the-art Model Predictive Control (MPC) and RL approaches on collision rate while being competitive in driving efficiency.
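Schematically, this coupling can be written as a generic bi-level program; the notation below (p for behavioral inputs, xi for the trajectory, c and f for the upper- and lower-level costs) is illustrative rather than our exact formulation:

```latex
% Generic bi-level structure (illustrative notation):
% p: behavioral inputs, \xi: trajectory,
% c: upper-level driving-task cost, f: lower-level planner cost.
\begin{aligned}
\min_{p} \quad & c\bigl(\xi^{*}(p)\bigr) \\
\text{s.t.} \quad & \xi^{*}(p) \in \arg\min_{\xi} \; f(\xi, p) \\
& \qquad \text{s.t.} \;\; g(\xi, p) \leq 0 .
\end{aligned}
```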
Existing works draw behavioral inputs p from a distribution and solve a simple Quadratic Programming (QP) trajectory planner for each of these inputs. However, there is no mechanism to adapt the sampling of behavioral inputs based on how the lower-level planner performs on the driving task. We address this issue by adding a gradient-estimation block and a projection operator that aids constraint satisfaction. We present a novel approach that estimates the direction in which the behavioral inputs need to be perturbed to improve the optimality of the lower-level trajectory with respect to the driving task.
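Below is a minimal, runnable sketch of this idea. The `solve_qp` and `driving_cost` functions are toy stand-ins (not our actual GPU batch solver or task cost); the gradient with respect to p is estimated by zeroth-order Gaussian smoothing, followed by a box projection:

```python
import numpy as np

rng = np.random.default_rng(0)

def solve_qp(p, v0=10.0, horizon=20):
    """Toy stand-in for the lower-level QP trajectory planner: returns a
    velocity profile that smoothly tracks the behavioral set-point p[0]
    (desired speed). The real planner solves a constrained QP over full
    trajectories."""
    alphas = np.linspace(0.0, 1.0, horizon)
    return (1.0 - alphas) * v0 + alphas * p[0]

def driving_cost(traj, v_ref=15.0):
    """Toy upper-level cost: deviation from the task reference speed
    (lane-preference and safety terms omitted in this sketch)."""
    return float(np.mean((traj - v_ref) ** 2))

def estimate_gradient(p, sigma=0.1, num_samples=32):
    """Zeroth-order estimate of d(driving cost)/dp: perturb p with
    Gaussian noise, solve the QP for each perturbation, and correlate
    the cost change with the perturbation direction."""
    base = driving_cost(solve_qp(p))
    grad = np.zeros_like(p)
    for _ in range(num_samples):
        eps = rng.normal(scale=sigma, size=p.shape)
        grad += (driving_cost(solve_qp(p + eps)) - base) * eps / sigma**2
    return grad / num_samples

def project(p, p_min=0.0, p_max=30.0):
    """Projection operator: here a simple box projection onto the valid
    range of behavioral inputs (e.g., admissible speed set-points)."""
    return np.clip(p, p_min, p_max)

# One projected perturbation step on the behavioral input.
p = np.array([12.0])
p = project(p - 0.05 * estimate_gradient(p))
```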
Our bi-level optimizer ensures safe driving in dense and potentially rash traffic scenarios.
In this work, we propose a bi-level optimizer that combines QP-based trajectory planning at the lower level with gradient-free optimization at the upper level. Moreover, we train a Conditional Variational Autoencoder (CVAE), embedded with a differentiable optimization layer, to warm-start our bi-level optimizer. Our approach outperforms state-of-the-art MPC and RL baselines on safety metrics in dense traffic.
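The gradient-free outer loop can be realized, for example, as a CEM-style sampler over behavioral inputs. The sketch below is a simplification of the actual GPU-batched optimizer and reuses the toy `solve_qp` and `driving_cost` from the previous snippet:

```python
import numpy as np

def bilevel_gradient_free(init_samples, solve_qp, driving_cost,
                          iters=5, elite_frac=0.2, seed=1):
    """CEM-style outer loop over behavioral inputs p. Each iteration
    solves the lower-level QP for every sample (batched on GPU in the
    real system), scores the resulting trajectories with the driving
    cost, and refits a Gaussian over the elite set. `init_samples`
    would come from the CVAE warm-start."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(init_samples, dtype=float)
    n_elite = max(1, int(elite_frac * len(samples)))
    for _ in range(iters):
        costs = np.array([driving_cost(solve_qp(p)) for p in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        mu, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        samples = rng.normal(mu, std, size=samples.shape)
    costs = np.array([driving_cost(solve_qp(p)) for p in samples])
    p_best = samples[int(np.argmin(costs))]
    return p_best, solve_qp(p_best)
```

For instance, `bilevel_gradient_free(np.random.default_rng(2).normal(15.0, 5.0, size=(64, 1)), solve_qp, driving_cost)` runs the loop on 64 sampled speed set-points.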
At the core of our Behavioral Cloning framework lies a combination of feedforward and differentiable optimization layers, conditioned on the observations o, that produces behavioral inputs p leading to an optimal trajectory as close as possible to expert trajectory demonstrations. The CVAE is trained to approximate the complex underlying distribution of optimal trajectories, which subsumes the behavioral inputs p, from expert demonstrations. Samples drawn from this learned distribution initialize our bi-level optimizer.
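A minimal PyTorch sketch of such a CVAE is given below. The layer sizes are arbitrary, and the differentiable optimization layer is replaced by a plain linear head for brevity; it stands in for, e.g., a projection or QP layer in the full model:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE: encoder q(z | tau, o), decoder p(tau | z, o).
    Dimensions are illustrative assumptions, not the exact architecture."""

    def __init__(self, obs_dim=32, traj_dim=60, latent_dim=8, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + traj_dim, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, latent_dim)
        self.logvar_head = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim),  # stand-in for the diff. opt. layer
        )

    def forward(self, tau, o):
        # Encode the expert trajectory tau conditioned on observation o.
        h = self.encoder(torch.cat([tau, o], dim=-1))
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick, then decode conditioned on o.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(torch.cat([z, o], dim=-1))
        return recon, mu, logvar

def cvae_loss(recon, tau, mu, logvar, beta=1.0):
    """ELBO: reconstruction against expert trajectories + KL regularizer."""
    rec = torch.mean((recon - tau) ** 2)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl
```

At deployment, only the decoder is used: latent samples z are drawn from the prior, decoded conditioned on the current observation, and the resulting trajectories (and the behavioral inputs they subsume) warm-start the bi-level optimizer.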
We evaluate MPC-Bi-Level alongside MPC baselines (Vanilla, Grid, Random, and Batch variants) on a benchmarking suite of highway-driving scenarios. We also compare our approach against the RL baselines DQN and PPO on highway-driving scenarios with varying traffic densities.
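A skeleton of such an evaluation, assuming the highway-env benchmark, is shown below. The `vehicles_density` config key and the `info["crashed"]` flag follow highway-env's documented interface, though names and defaults may differ across library versions:

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-v0 task)

def collision_rate(policy, episodes=50, density=1.5, seed=0):
    """Roll out `policy` on highway-v0 and report the fraction of
    episodes that end in a crash."""
    env = gym.make("highway-v0")
    env.unwrapped.configure({"vehicles_density": density})
    crashed = 0
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        done = False
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            done = terminated or truncated
        crashed += int(info.get("crashed", False))
    env.close()
    return crashed / episodes

# Stand-in policy: action 1 is IDLE in the default discrete meta-action
# space; a trained DQN/PPO or the MPC-Bi-Level pipeline would be
# plugged in here instead.
print(collision_rate(lambda obs: 1, episodes=5))
```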