Structural Attention-Based Recurrent Variational Autoencoder for Highway Vehicle Anomaly Detection

Abstract

In autonomous driving, detection of abnormal driving behaviors is essential to ensure the safety of vehicle controllers. Prior works in vehicle anomaly detection have shown that modeling interactions between agents improves detection accuracy, but certain abnormal behaviors where structured road information is paramount are poorly identified, such as wrong-way and off-road driving. We propose a novel unsupervised framework for highway anomaly detection named Structural Attention-based Recurrent VAE (SABeR-VAE), which explicitly uses the structure of the environment to aid anomaly identification. Specifically, we use a vehicle self-attention module to learn the relations among vehicles on a road, and a separate lane-vehicle attention module to model the importance of permissible lanes to aid in trajectory prediction. Conditioned on the attention modules' outputs, a recurrent encoder-decoder architecture with a stochastic Koopman operator-propagated latent space predicts the next states of vehicles. Our model is trained end-to-end to minimize prediction loss on normal vehicle behaviors, and is deployed to detect anomalies in (ab)normal scenarios. By combining the heterogeneous vehicle and lane information, SABeR-VAE and its deterministic variant, SABeR-AE, improve abnormal AUPR by 18% and 25% respectively on the simulated MAAD highway dataset. Furthermore, we show that the learned Koopman operator in SABeR-VAE enforces interpretable structure in the variational latent space. The results of our method indeed show that modeling environmental factors is essential to detecting a diverse set of anomalies in deployment.

Method and Architecture

Given trajectories of vehicles on a road, and a discretized map representing the legal maneuvers available to each vehicle at every timestep, SABeR-VAE outputs an anomaly score for each timestep. The higher the score, the more confident the model is that an out-of-distribution behavior is occurring at that time. The model is trained without supervision to reconstruct trajectories from normal scenarios, so abnormal trajectories are reconstructed poorly at test time and produce higher anomaly scores in deployment.
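The link between prediction error and anomaly score can be illustrated with a minimal NumPy sketch. The array layout, the use of squared Euclidean error, and the choice to take the maximum over vehicles are assumptions made for illustration; the exact scoring rule is not specified here.

```python
import numpy as np

def anomaly_scores(predicted, ground_truth):
    """Per-timestep anomaly score from squared position error.

    predicted, ground_truth: arrays of shape (T, N, 2) holding (x, y)
    positions of N vehicles over T timesteps (hypothetical layout).
    Returns an array of shape (T,): the largest squared error among the
    vehicles at each timestep (an assumed aggregation).
    """
    sq_err = np.sum((predicted - ground_truth) ** 2, axis=-1)  # (T, N)
    return sq_err.max(axis=-1)                                  # (T,)
```

Under this sketch, a timestep where every vehicle is predicted well scores close to zero, while a single badly predicted vehicle is enough to raise the score for that timestep.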

SABeR-VAE consists of five primary modules:

  • Vehicle-Vehicle Self-Attention

  • Lane-Vehicle Attention

  • GRU Encoder

  • Stochastic Koopman Propagation

  • MLP Decoder

First, the GRU encoder embeds a window of trajectories into a variational latent space, conditioned on the output of the vehicle-vehicle self-attention module. Then, the Koopman operator propagates the encoded distribution forward in time, conditioned on the lane-vehicle attention output. Finally, a simple multi-layer perceptron decodes the propagated distribution into a predicted position for each vehicle. The error between the predicted and ground-truth positions is used to calculate an anomaly score for each timestep in a scenario.
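The data flow described above can be sketched in NumPy with random, untrained weights. This is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the weight names, the way the lane context is injected into the latent state, and the sample-then-propagate treatment of the stochastic Koopman step are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical embedding / hidden / latent size

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention: each query row gets a weighted sum of values."""
    weights = softmax(queries @ keys.T / np.sqrt(keys.shape[-1]))
    return weights @ values

def gru_step(x, h, W, U, b):
    """One GRU update; W, U, b stack the update, reset and candidate parameters."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(x @ W[0] + h @ U[0] + b[0])                 # update gate
    r = sig(x @ W[1] + h @ U[1] + b[1])                 # reset gate
    h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1.0 - z) * h + z * h_cand

# Random, untrained parameters (illustrative only).
W_in = rng.normal(size=(2, D)) * 0.1    # embeds (x, y) vehicle positions
W_lane = rng.normal(size=(2, D)) * 0.1  # embeds (x, y) lane points
W = rng.normal(size=(3, D, D)) * 0.1
U = rng.normal(size=(3, D, D)) * 0.1
b = np.zeros((3, D))
W_mu = rng.normal(size=(D, D)) * 0.1
W_logvar = rng.normal(size=(D, D)) * 0.1
W_cond = rng.normal(size=(D, D)) * 0.1  # injects lane context (assumed mechanism)
K = rng.normal(size=(D, D)) * 0.1       # stand-in for the learned Koopman operator
W1 = rng.normal(size=(D, D)) * 0.1
W2 = rng.normal(size=(D, 2)) * 0.1

def predict_next(window, lanes):
    """window: (T, N, 2) past positions; lanes: (L, 2) lane points. Returns (N, 2)."""
    T, N, _ = window.shape
    h = np.zeros((N, D))
    for t in range(T):
        emb = window[t] @ W_in                # (N, D)
        ctx = attention(emb, emb, emb)        # vehicle-vehicle self-attention
        h = gru_step(ctx, h, W, U, b)         # recurrent encoder
    mu, logvar = h @ W_mu, h @ W_logvar       # parameters of the latent Gaussian
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)  # reparameterized sample
    lane_emb = lanes @ W_lane
    lane_ctx = attention(window[-1] @ W_in, lane_emb, lane_emb)  # lane-vehicle attention
    z_next = (z + lane_ctx @ W_cond) @ K.T    # linear propagation in latent space
    return np.tanh(z_next @ W1) @ W2          # MLP decoder -> next (x, y)

# Toy usage: 5 past timesteps, 3 vehicles, 4 lane points.
window = rng.normal(size=(5, 3, 2))
lanes = rng.normal(size=(4, 2))
next_pos = predict_next(window, lanes)
```

The point of the sketch is the ordering of the stages: attention over vehicles feeds the recurrent encoder, the latent sample is advanced by a single linear map, and only then is it decoded back to positions.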

Anomaly Detection of Multi-Agent Trajectories in the MAAD Dataset

Below, we showcase one normal scenario (an overtake) and two abnormal scenarios (a vehicle going off-road and a vehicle driving in the wrong direction). The anomaly score graph produced by SABeR-VAE is shown beneath each scenario's trajectories. The model outputs high anomaly scores for abnormal situations that it did not see in training, such as going off-road or driving in the wrong direction.

Normal Overtake

Abnormal Off-Road

Abnormal Wrong Direction

Propagated Latent Space Interpretability

We also found that our architecture learns interpretable latent features that encode the expected (normal) behaviors of vehicles on the road. Below, we visualize the latent trajectory over a window for a normal sequence and for an abnormal wrong-direction scenario. As soon as a vehicle begins to drive abnormally, its prediction loss spikes and its latent encoding jumps to a different cluster that represents another class of behaviors. This interpretability helps explain why the model flags an out-of-distribution behavior and supports root cause analysis of those detections.

Normal Driving

The pink vehicle's trajectory stays in the top-most cluster of latent behaviors for the entire window sequence because it keeps driving in the correct (lawful) direction, as expected of vehicles in the bottom lanes of the road.

Abnormal Wrong Direction

The pink vehicle is initially encoded into the top-most cluster. At time t = 4, the pink car crosses the divider and begins driving in the wrong direction. Consequently, its latent trajectory jumps to the middle cluster, which represents the expected behaviors of vehicles in the top lanes of the road.
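One way to read such a jump off the latent trajectory is sketched below. It assumes the cluster centers are already known (for example, estimated from encodings of normal data) and uses made-up two-dimensional toy values; the function names and the nearest-centroid rule are illustrative assumptions, not part of the described method.

```python
import numpy as np

def cluster_ids(latents, centroids):
    """Index of the nearest centroid for each latent point.

    latents: (T, d) latent encodings of one vehicle over T timesteps.
    centroids: (C, d) cluster centers, assumed to be known already.
    """
    dists = np.linalg.norm(latents[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def first_jump(ids):
    """First timestep whose cluster differs from the previous one, or None."""
    changes = np.flatnonzero(ids[1:] != ids[:-1])
    return int(changes[0]) + 1 if changes.size else None

# Toy data: two clusters, with the encoding moving to the second at index 4.
centroids = np.array([[0.0, 1.0], [0.0, 0.0]])
latents = np.array([[0.1, 0.9], [0.0, 1.1], [0.0, 0.95], [0.1, 1.0],
                    [0.0, 0.1], [0.1, 0.0]])
ids = cluster_ids(latents, centroids)
jump_at = first_jump(ids)
```

A check like this turns the visual observation into something that can be logged alongside the anomaly score, which may help when tracing a detection back to the behavior that caused it.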

Recommended Citation