Dissipating stop-and-go waves in closed and open networks via deep reinforcement learning
Authors: Abdul Rahman Kreidieh, Cathy Wu, and Alexandre M Bayen
We demonstrate the ability of model-free reinforcement learning (RL) techniques to generate traffic control strategies for connected and autonomous vehicles (CAVs) in a variety of network geometries. The method achieves near-complete wave dissipation in a straight open road network with only 10% CAV penetration, while penetration rates as low as 2.5% still substantially reduce the frequency and magnitude of the waves that form. Moreover, a study of controllers trained in closed network scenarios with otherwise similar densities and perturbing behaviors confirms that closed network policies generalize to open network tasks, and highlights the potential role of transfer learning in fine-tuning the parameters of these policies.
All results presented here are reproducible from: https://github.com/flow-project/flow
Note: In all videos the following colors apply:
Red: autonomous vehicles
Blue: observed human-driven vehicles
White: unobserved human-driven vehicles
We simulate the effect of autonomous vehicles on a straight highway network for a range of CAV penetration rates. In the absence of autonomous vehicles, the network exhibits properties of convective instability, with perturbations propagating upstream from the merge point before exiting the network. As the CAV penetration rate increases, the waves are increasingly dissipated, with virtually no waves propagating from the merge at 10% autonomy. Moreover, in terms of mobility, we observe a 13% increase in throughput as the share of autonomous vehicles in the network increases from 0% to 10%, with vehicles on average moving at nearly twice their previous speeds.
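For concreteness, the snippet below sketches how such a penetration-rate sweep might be configured through Flow's vehicle and inflow parameters. This is a minimal sketch under stated assumptions, not the exact experiment configuration: the total inflow rate, the merge rate, and the edge names ("highway", "merge") are placeholders.

```python
# Minimal sketch (assumed rates and edge names, not the exact experiment
# configuration) of splitting a highway inflow between human-driven and
# RL-controlled vehicles for a given CAV penetration rate.
from flow.core.params import VehicleParams, InFlows
from flow.controllers import IDMController, RLController

PENETRATION = 0.10   # fraction of inflowing vehicles that are CAVs
TOTAL_RATE = 2000    # mainline inflow in vehicles per hour (assumed value)
MERGE_RATE = 200     # perturbing merge inflow in vehicles per hour (assumed)

vehicles = VehicleParams()
vehicles.add("human", acceleration_controller=(IDMController, {}), num_vehicles=0)
vehicles.add("rl", acceleration_controller=(RLController, {}), num_vehicles=0)

inflow = InFlows()
inflow.add(veh_type="human", edge="highway",
           vehs_per_hour=TOTAL_RATE * (1 - PENETRATION))
inflow.add(veh_type="rl", edge="highway",
           vehs_per_hour=TOTAL_RATE * PENETRATION)
inflow.add(veh_type="human", edge="merge", vehs_per_hour=MERGE_RATE)
```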
We attempt to learn a policy on a closed ring network that generalizes to open network settings. The ring has a circumference of 1400 m and a total of 50 vehicles, approximately matching the densities in the straight highway simulations. To reconstruct the effect of the merge, the vehicles closest to an arbitrary fixed point are periodically perturbed at a frequency matching the merge inflow rate. Finally, to account for variability in the number of autonomous vehicles, the RL agent only perceives and controls vehicles within a controllable region whose length equals that of the highway; in all other regions of space, the AVs act as human-driven vehicles.
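The snippet below is a minimal, self-contained sketch of these two mechanisms: selecting the vehicle nearest a fixed point for the periodic perturbation, and testing whether a CAV lies inside the controllable region. Only the 1400 m circumference comes from the setup above; the perturbation period and the extent of the controllable region are assumed values for illustration.

```python
# Illustrative sketch of the ring-road perturbation and controllable region.
# The perturbation period and region bounds are assumed values.

RING_LENGTH = 1400.0           # ring circumference (m)
PERTURB_POINT = 0.0            # arbitrary fixed position on the ring (m)
PERTURB_PERIOD = 30.0          # seconds between perturbations; chosen to match
                               # the merge inflow rate (assumed value)
CONTROL_REGION = (0.0, 700.0)  # region in which CAVs are RL-controlled (assumed)

def ring_distance(x, y, length=RING_LENGTH):
    """Shortest distance between two positions on the ring."""
    d = abs(x - y) % length
    return min(d, length - d)

def vehicle_to_perturb(positions):
    """Index of the vehicle closest to the fixed perturbation point."""
    return min(range(len(positions)),
               key=lambda i: ring_distance(positions[i], PERTURB_POINT))

def is_controllable(position):
    """CAVs outside the controllable region revert to human-driver behavior."""
    lo, hi = CONTROL_REGION
    return lo <= position % RING_LENGTH < hi
```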
During the RL training process, autonomous vehicles are first trained on the ring road using the same actions, observations, and rewards described above. Then, after a predefined number of iterations, the network is replaced with the previously described straight highway network, and training continues. The policy learned on the ring road can be seen below.
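The sketch below illustrates this two-stage schedule in a simplified, self-contained form: a parameter vector is optimized on a first task for a fixed number of iterations and then carried over as the initialization for a second task. The toy objectives and random-search optimizer are stand-ins for the ring-road and highway environments and the policy-gradient training used in practice.

```python
# Simplified stand-in for the two-stage (warm start) training schedule: the
# quadratic objectives below play the role of the ring-road and highway
# environments, and random search plays the role of policy-gradient RL.
import numpy as np

def ring_objective(theta):      # stand-in for expected return on the ring
    return -np.sum((theta - 1.0) ** 2)

def highway_objective(theta):   # stand-in for expected return on the highway
    return -np.sum((theta - 1.2) ** 2)

def train(objective, theta, n_iters, step=0.1, seed=0):
    """Hill-climbing random search: keep a perturbed candidate if it improves."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        candidate = theta + step * rng.normal(size=theta.shape)
        if objective(candidate) > objective(theta):
            theta = candidate
    return theta

theta = np.zeros(4)
theta = train(ring_objective, theta, n_iters=200)     # stage 1: closed ring
theta = train(highway_objective, theta, n_iters=200)  # stage 2: open highway,
                                                      # warm-started from stage 1
```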
Comparing this transfer learning approach against training the RL agent solely on the straight highway, we find that the policy learned on the ring road initially outperforms human-driven dynamics in the straight highway network, thereby acting as a "warm start" for the RL training process, which then continues to optimize the controller parameters for the straight highway. This suggests that the MDP structures of the closed and open networks are sufficiently similar for control strategies developed in one to carry over to the other. The training performance of the two strategies is depicted in the figure below.