PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho,
Cathy Wu, and Katherine Driggs-Campbell

Human-Centered Autonomy Lab

University of Illinois Urbana-Champaign and Massachusetts Institute of Technology


Presented at ITSC 2023

[Paper] [Supplementary Material] [Code]

Abstract

Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving, so that they can provide action advice to humans that reduces traffic congestion in dense scenarios. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver-trait-conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first use a variational autoencoder to infer, in an unsupervised manner, the driver’s intrinsic traits regarding how they follow instructions. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with a 4% to 22% improvement in average speed over baselines.


Overview

PeRP augments general advised instructions to provide personalized recommendations to drivers of varying driving styles (e.g., an aggressive driver (a) or a conservative driver (b)) to mitigate congestion (c).

PeRP

PeRP appends a residual action, a_PeRP, to the PCP action, a_PCP, while conditioned on the driver trait, z, to produce an advised action, a_advised. The driver considers the advised action and takes an action, a_driver, in the environment.
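The composition described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes actions are scalars, that the residual is combined with the PCP action by addition, and that the residual policy sees an observation and the trait z. The names `advised_action`, `ResidualPolicy`, and `toy_residual_policy` are hypothetical, and the toy policy merely stands in for the learned one.

```python
from typing import Callable, Sequence

# Hypothetical signature: maps (observation, driver trait z) to a residual action a_PeRP.
ResidualPolicy = Callable[[Sequence[float], Sequence[float]], float]


def advised_action(
    a_pcp: float,
    observation: Sequence[float],
    z: Sequence[float],
    residual_policy: ResidualPolicy,
) -> float:
    """Return a_advised = a_PCP + a_PeRP, with a_PeRP conditioned on the trait z.

    Additive composition is an assumption made for this sketch.
    """
    a_perp = residual_policy(observation, z)
    return a_pcp + a_perp


def toy_residual_policy(observation: Sequence[float], z: Sequence[float]) -> float:
    """Placeholder for the learned policy: scales the first trait dimension."""
    return -0.5 * z[0]


a_advised = advised_action(
    a_pcp=10.0, observation=[0.0], z=[2.0], residual_policy=toy_residual_policy
)
print(a_advised)
```

Keeping the PCP action as a separate input means the residual policy only has to learn a driver-specific correction, while the general advice is left unchanged when the residual is zero.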

Driver Trait Inference

The figure below shows the Driver Trait Inference VAE Model. The encoder network encodes the input trajectory x as z_μ and z_σ, which are then reparameterized as the latent vector z. The decoder network uses z to reconstruct the input trajectory as x̂.
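The sampling step between the encoder and decoder can be illustrated with the standard VAE reparameterization trick. The sketch below is an assumption about how z is obtained from z_μ and z_σ, not a description of the authors' code: it treats z_σ as a standard deviation (some implementations predict a log-variance instead), and the function name `reparameterize` is hypothetical.

```python
import random
from typing import List, Sequence


def reparameterize(
    z_mu: Sequence[float], z_sigma: Sequence[float], rng: random.Random
) -> List[float]:
    """Sample z = z_mu + z_sigma * eps element-wise, with eps ~ N(0, 1).

    Assumes z_sigma holds standard deviations (an assumption of this sketch).
    """
    if len(z_mu) != len(z_sigma):
        raise ValueError("z_mu and z_sigma must have the same length")
    return [mu + sigma * rng.gauss(0.0, 1.0) for mu, sigma in zip(z_mu, z_sigma)]


rng = random.Random(0)  # seeded for repeatability

# A stochastic sample around the encoded mean.
z = reparameterize([0.5, -1.0], [0.1, 0.2], rng)

# With zero standard deviation, the sample collapses to the mean.
z_deterministic = reparameterize([0.5, -1.0], [0.0, 0.0], rng)
print(z, z_deterministic)
```

Writing the sample as a deterministic function of z_μ, z_σ, and independent noise is what lets gradients flow through the encoder outputs during training.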

Results

The figure below provides a visualization of the latent space of the driver trait inference VAE on the validation dataset. We show the latent points for the different trait means, trait_μ, in different colors.

The following figure shows the difference between episodes that use advice from PeRP and PCP respectively, where the red trajectories represent the ego vehicle that is advised by the policy. We see that PeRP successfully mitigates congestion when compared to PCP.

Further velocity analysis, detailed in the paper and supplementary material, shows that PeRP also performs better on average in all test cases. The figure below corroborates this.

Citation

@inproceedings{hasan2023perp,
    title={{PeRP}: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems},
    author={Hasan, Aamir and Chakraborty, Neeloy and Chen, Haonan and Cho, Jung-Hoon and Wu, Cathy and Driggs-Campbell, Katherine},
    booktitle={IEEE International Conference on Intelligent Transportation Systems (ITSC)},
    year={2023}
}