Learning to Navigate Intersections with Unsupervised Driver Trait Inference

Abstract

Navigation through uncontrolled intersections is one of the key challenges for autonomous vehicles. Identifying the subtle differences in hidden traits of other drivers can bring significant benefits when navigating in such environments. We propose an unsupervised method for inferring driver traits such as driving styles from observed vehicle trajectories. We use a variational autoencoder with recurrent neural networks to learn a latent representation of traits without any ground truth trait labels. Then, we use this trait representation to learn a policy for an autonomous vehicle to navigate through a T-intersection with deep reinforcement learning. Our pipeline enables the autonomous vehicle to adjust its actions when dealing with drivers of different traits to ensure safety and efficiency. Our method demonstrates promising performance and outperforms state-of-the-art baselines in the T-intersection scenario.

Trait Representation Learning

We assume that every driver is either conservative or aggressive. A conservative driver yields to the ego car if the ego car cuts in front of it, while an aggressive driver ignores the ego car.
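The two traits can be illustrated with a minimal rule-based sketch. The `Driver` class, the `target_speed` function, and the choice to model yielding as braking to a stop are hypothetical simplifications for illustration; the simulator's actual driver model is not described here.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    # Hypothetical trait label: "conservative" or "aggressive".
    trait: str
    # Illustrative cruising speed in m/s.
    desired_speed: float


def target_speed(driver: Driver, ego_cut_in: bool) -> float:
    """Speed the driver aims for at the next step (simplified, hypothetical rule).

    A conservative driver yields (here: brakes to a stop) when the ego car cuts
    in front of it; an aggressive driver keeps its desired speed and ignores the ego car.
    """
    if ego_cut_in and driver.trait == "conservative":
        return 0.0
    return driver.desired_speed


# Usage: the same cut-in produces different responses depending on the hidden trait.
print(target_speed(Driver("conservative", 10.0), ego_cut_in=True))  # 0.0
print(target_speed(Driver("aggressive", 10.0), ego_cut_in=True))    # 10.0
```

Because the trait is never observed directly, the ego car has to infer it from how the other car's trajectory evolves, which motivates the learned representation below.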

To encode the latent traits of drivers, we first collect a dataset of driver trajectories from the simulator, without trait labels. Then, we use a variational autoencoder with recurrent neural networks (VAE+RNN) to learn a 2D latent representation of these trajectories.

VAE+RNN network
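The VAE training objective can be sketched as below, assuming the standard setup of a diagonal Gaussian posterior over the 2D latent code and a standard normal prior. The RNN encoder that would output `mu` and `log_var` from a trajectory, and the RNN decoder that would produce `reconstruction_error`, are omitted; the function names and the `beta` weight are hypothetical, not taken from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_error, mu, log_var, beta=1.0):
    """Negative ELBO: reconstruction term plus a (hypothetically weighted) KL regularizer."""
    return reconstruction_error + beta * kl_to_standard_normal(mu, log_var)


# Usage with a 2D latent code, as in the trait representation above.
mu, log_var = [0.3, -0.1], [-1.0, -0.5]
z = reparameterize(mu, log_var)  # a 2D sample used as the trait code
loss = vae_loss(reconstruction_error=1.2, mu=mu, log_var=log_var)
```

The KL term pulls each trajectory's code toward the prior, while the reconstruction term forces codes to keep whatever information, such as whether a driver yields, is needed to reproduce the trajectory.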

During training, the VAE+RNN gradually learns to separate the two driver traits into distinct clusters in the latent space, even though no trait labels are used.

Visualization of the training progress

Navigation through the T-intersection

Using the trait representation as input, the ego car learns a policy to navigate through the uncontrolled T-intersection with model-free reinforcement learning. The policy network is a gated recurrent unit (GRU) with an attention module, which assigns attention weights to other cars.

GRU+Attention policy network
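The attention step can be illustrated with a minimal dot-product attention sketch. The text does not specify the exact attention form, so the scaled dot-product scoring, the `query` vector standing in for the GRU state, and the per-car feature layout are assumptions; the GRU itself is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, car_features):
    """Scaled dot-product attention over the other cars (illustrative assumption).

    query: feature vector for the ego car (e.g., derived from the GRU hidden state).
    car_features: one feature vector per other car (e.g., position, velocity, trait code).
    Returns (weights, context): weights sum to 1, context is the weighted sum of features.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * f for q, f in zip(query, feat)) for feat in car_features]
    weights = softmax(scores)
    dim = len(car_features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, car_features)) for i in range(dim)]
    return weights, context


# Usage: the car whose features align with the query receives the larger weight.
weights, context = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

The context vector summarizes all other cars with a fixed size, so the policy can handle a varying number of cars at the intersection.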

We show three experimental settings with different proportions of conservative and aggressive cars. P(conservative) denotes the probability that each driver is conservative.
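One way to read P(conservative) is as an independent Bernoulli draw per driver, sketched below. The function name `sample_traits` and the `seed` parameter are hypothetical; how the simulator actually assigns traits is not specified here.

```python
import random


def sample_traits(num_drivers, p_conservative, seed=None):
    """Assign each driver a trait independently: conservative with probability p_conservative."""
    rng = random.Random(seed)  # local generator so runs are reproducible with a seed
    return [
        "conservative" if rng.random() < p_conservative else "aggressive"
        for _ in range(num_drivers)
    ]


# Usage: the three settings shown below.
for p in (0.25, 0.4, 0.6):
    traits = sample_traits(1000, p, seed=0)
    print(p, traits.count("conservative") / len(traits))
```

Lower values of P(conservative) mean the ego car must typically wait longer before a yielding car arrives.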

After training, the ego car learns to wait until a conservative car appears, cut in front of the conservative cars while crossing both lanes, and complete the right turn.

P(conservative) = 0.25

P(conservative) = 0.4

P(conservative) = 0.6

Demo Video