Learning energy-efficient driving behaviors by imitating experts
Authors: Abdul Rahman Kreidieh, Zhe Fu, and Alexandre M. Bayen
The rise of vehicle automation has generated significant interest in the potential role of future automated vehicles (AVs). In particular, in highly dense traffic settings, AVs are expected to serve as congestion dampeners, mitigating the presence of instabilities that arise from various sources. However, in many applications, such maneuvers rely heavily on non-local sensing or coordination by interacting AVs, thereby rendering their adaptation to real-world settings a particularly difficult challenge. To address this challenge, this paper examines the role of imitation learning in bridging the gap between such control strategies and realistic limitations in communication and sensing. Treating the controllers above as "experts", we demonstrate that imitation learning strategies can succeed in deriving policies that improve the performance of a given network over a wide variety of traffic conditions using only local observations.
All results presented here are reproducible from: https://github.com/AboudyKreidieh/il-traffic
We explore the problem of dissipating congestion in multi-lane highways. The specific network utilized, seen below, is a simulated reconstruction of the I-210 network in Los Angeles, CA, USA. Within this network, congestion is produced via an imbalance between the inflow and outflow conditions brought about by a mock congested downstream edge. This imbalance produces pockets of dense traffic near the rightmost edge, which, coupled with the string-unstable behavior of human drivers, give rise to stop-and-go waves characterized by periodic, sudden, and at times sharp oscillations in driving speeds. These waves can be seen in the video below, and are represented as the red diagonal streaks within the time-space diagram.
This behavior is also consistent across the range of explored boundary conditions, as expressed in the figures below. These oscillations produce large desired accelerations by the individual vehicles, which reduce the energy efficiency and safety of a given network. We provide a few metrics below as a means of comparison with controlled settings.
Time-space diagrams for the middle lane for different traffic conditions
Average acceleration for different traffic conditions
Energy efficiency for different traffic conditions
To mitigate the formation of congestion within this network, we take inspiration from recent work on dissipating stop-and-go traffic in ring road settings. The findings from this work suggest that a subset of vehicles, driving at the equilibrium speed of a given network, can dissipate the formation and propagation of waves within that network. When adapting this controller to the more general multi-lane highway setting, we find that assigning an equilibrium speed value matching the downstream flow of traffic produces significant reductions in stop-and-go behaviors while maintaining a high throughput. Under this policy, the vehicles act similarly to a ramp metering system, constraining the flow of vehicles to match the largest possible outflow and preventing a buildup of dense traffic in the process.
This behavior can be seen in the video and figure below, with the automated vehicles depicted in red. As you can see, the vehicles are now more uniformly distributed throughout the network, highlighting the absence of stop-and-go instabilities.
This behavior, once again, is consistent across a number of boundary conditions, producing an approximately 60% reduction in accelerations and a 15% improvement in energy efficiency. This approach, however, relies on estimates of downstream traffic that are often unavailable, which makes it difficult to adapt to real-world settings.
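The dependence on downstream information can be made concrete with a minimal sketch of a controller that tracks a target (equilibrium) speed. This is not the controller used in the paper: the function name, the proportional gain, the time-gap safety override, and the acceleration bounds are all assumptions chosen for illustration. The point it shows is that `target_speed` must be supplied from outside the vehicle, for example from an estimate of downstream traffic speed.

```python
def expert_acceleration(speed, gap, lead_speed, target_speed,
                        gain=0.5, min_time_gap=1.0,
                        max_accel=1.5, max_decel=-3.0):
    """Return an acceleration (m/s^2) that tracks a target speed.

    Hypothetical sketch: `target_speed` stands in for the equilibrium
    speed derived from downstream traffic, which is non-local information.
    All gains and bounds are illustrative assumptions.
    """
    # Proportional tracking of the externally supplied target speed.
    accel = gain * (target_speed - speed)

    # Assumed safety override: when the time gap to the leader is too
    # small, never accelerate harder than needed to match the leader.
    if gap < min_time_gap * speed:
        accel = min(accel, gain * (lead_speed - speed))

    # Clip to assumed physical acceleration bounds.
    return max(max_decel, min(max_accel, accel))
```

For instance, a vehicle travelling at 10 m/s with a 50 m gap and a target speed of 12 m/s would accelerate gently toward the target, while the same vehicle with only a 5 m gap behind a slower leader would brake instead.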
Time-space diagrams for the middle lane for different traffic conditions
Average acceleration for different traffic conditions
Energy efficiency for different traffic conditions
To obviate the need for downstream knowledge when deploying the above controller, we look to techniques within the field of imitation learning. Through these techniques, we aim to map the original behaviors of the expert in Section 2 above to a neural network policy whose states consist solely of information that is perceptible to individual vehicles. The video below demonstrates the results from this imitation learning procedure. As we can see, the imitated policy largely succeeds in dissipating the formation and propagation of waves.
This behavior is consistent across all conditions and, importantly, matches the performance of the expert policy in terms of variability in accelerations and energy efficiency. This suggests that the imitation learning procedure succeeds in deriving a policy that drives as though it is knowledgeable of the downstream state of traffic.
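The core idea of imitating an expert from local observations can be sketched as supervised regression from observations to expert actions (often called behavioral cloning). The sketch below is a simplification under stated assumptions: it fits a linear policy with stochastic gradient descent as a stand-in for the neural network policy described above, and the function names, learning rate, and observation layout are hypothetical rather than taken from the project's code.

```python
import random


def fit_linear_policy(observations, expert_actions,
                      lr=0.05, epochs=200, seed=0):
    """Fit action = bias + w . obs to expert actions by SGD on squared error.

    A linear model is used here only as a simple stand-in for a neural
    network policy; hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(observations[0])
    bias = 0.0
    indices = list(range(len(observations)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for i in indices:
            obs = observations[i]
            prediction = bias + sum(w * x for w, x in zip(weights, obs))
            error = prediction - expert_actions[i]
            # Gradient step on 0.5 * error**2 for this single sample.
            weights = [w - lr * error * x for w, x in zip(weights, obs)]
            bias -= lr * error
    return weights, bias


def policy_action(weights, bias, obs):
    """Evaluate the fitted policy on one local observation."""
    return bias + sum(w * x for w, x in zip(weights, obs))
```

In use, `observations` would hold only locally measurable quantities (for example, normalized ego speed, leader speed, and gap), while `expert_actions` would hold the accelerations issued by the expert in the same situations.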
Time-space diagrams for the middle lane for different traffic conditions
Average acceleration for different traffic conditions
Energy efficiency for different traffic conditions
Standard imitation learning procedures alone do not succeed in producing a policy that meaningfully matches the original expert. In this section, we describe a number of techniques that were deployed to allow for meaningful successes by the algorithm, and provide simulations and plots of the learned behaviors when they are ablated.
The first limiting factor concerns features of the expert model that render it difficult to imitate. In particular, when the imitated policy errs by forming large gaps, the expert policy in its original functional form does not present a mechanism to recover from them. This ability to recover from errors is a key component of ensuring safety and robustness in any imitation learning procedure. To mitigate this issue, we introduce a simple improvement to the expert that informs the imitated policy how to reduce large gaps. When this augmentation is not introduced, however, the learned policy, as seen in the video below, learns to form large gaps with leader vehicles whenever possible, resulting in the formation of more pronounced waves in adjacent lanes.
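One way such a gap-recovery augmentation could look is sketched below. The exact form of the improvement is not given here, so everything in this block is an assumption: the function name, the time-gap threshold, and the gains are hypothetical. The sketch tracks a target speed as before, but when the gap to the leader exceeds a threshold it switches to a catch-up term so that the expert labels demonstrate how to close large gaps.

```python
def augmented_expert_acceleration(speed, gap, lead_speed, target_speed,
                                  gain=0.5, max_time_gap=4.0, gap_gain=0.1,
                                  max_accel=1.5, max_decel=-3.0):
    """Target-speed tracking with an assumed large-gap recovery term.

    Hypothetical sketch: thresholds and gains are illustrative and are
    not taken from the paper or its code.
    """
    # Nominal behavior: track the target (equilibrium) speed.
    accel = gain * (target_speed - speed)

    # Distance by which the gap exceeds the allowed time gap; a speed
    # floor of 1 m/s keeps the threshold positive near standstill.
    excess_gap = gap - max_time_gap * max(speed, 1.0)

    if excess_gap > 0.0:
        # Recovery behavior: match the leader and close the excess gap.
        catch_up = gain * (lead_speed - speed) + gap_gain * excess_gap
        accel = max(accel, catch_up)

    # Clip to assumed physical acceleration bounds.
    return max(max_decel, min(max_accel, accel))
```

With these illustrative values, a vehicle at 10 m/s with a 30 m gap keeps its nominal behavior, while the same vehicle with a 50 m gap receives a positive acceleration that shrinks the gap.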
The second limitation accounts for difficulties associated with estimating non-local state information from local state observations. As seen in the figure below, when using only current time-step information, the imitated policy, while partially dissipating waves, does not succeed in matching the performance of the expert policy, and in fact performs 15% worse in terms of accelerations and subsequent oscillations in speed. To account for this, we introduce temporal knowledge to the state of the imitated policy in the form of observations from previous time steps, or frames. This allows the policy in Section 3 to match the performance of the expert.
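Adding previous frames to the policy's state can be sketched with a small fixed-length buffer. The class name, the zero-padding at the start of an episode, and the oldest-to-newest ordering are assumptions made for illustration, not details from the project's code.

```python
from collections import deque


class FrameStack:
    """Concatenate the most recent `num_frames` local observations.

    Hypothetical helper: missing history is zero-padded, and frames are
    ordered from oldest to newest.
    """

    def __init__(self, obs_size, num_frames):
        self.obs_size = obs_size
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)
        self.reset()

    def reset(self):
        """Clear the history, e.g. at the start of a new rollout."""
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append([0.0] * self.obs_size)

    def push(self, obs):
        """Add the newest observation and return the stacked state."""
        if len(obs) != self.obs_size:
            raise ValueError("unexpected observation size")
        # The deque drops the oldest frame automatically once full.
        self.frames.append(list(obs))
        return [x for frame in self.frames for x in frame]
```

The stacked vector returned by `push` would then be used as the policy input in place of the single-frame observation, giving the policy access to recent trends in speed and gap.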