Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control
Authors: Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, and Alexandre Bayen
Flow is a new computational framework built to address a key need created by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles in the presence of complex, nonlinear traffic dynamics. Leveraging recent advances in deep reinforcement learning (RL), Flow enables the use of RL methods such as policy gradient for traffic control and enables benchmarking the performance of classical (including hand-designed) controllers against learned policies (control laws). Flow integrates the traffic microsimulator SUMO with the deep reinforcement learning library rllab, and makes it easy to design traffic tasks with different network configurations and vehicle dynamics. We use Flow to develop reliable controllers for complex problems, such as controlling mixed-autonomy traffic (involving both autonomous and human-driven vehicles) on a ring road. We first show that state-of-the-art hand-designed controllers excel when in-distribution but fail to generalize; we then show that even simple neural network policies can solve the stabilization task across density settings and generalize to out-of-distribution settings.
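The policy-gradient idea mentioned above can be sketched in a few lines. This is a minimal, self-contained REINFORCE example on a toy two-action problem, not Flow's actual training code (Flow delegates training to rllab); the learning rate, reward values, and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # logits of a softmax policy over 2 actions
true_reward = np.array([0.0, 1.0])   # action 1 is better

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = true_reward[a] + 0.1 * rng.standard_normal()  # noisy reward signal
    grad_logp = -p
    grad_logp[a] += 1.0              # gradient of log softmax at sampled action
    theta += 0.1 * r * grad_logp     # score-function (REINFORCE) update

print(softmax(theta)[1])             # probability of the better action grows
```

The same score-function estimator, scaled up with neural network policies and a simulator in the loop, is what policy-gradient methods apply to traffic control.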
Flow is open-source and available at: https://github.com/cathywu/flow
Stabilizing a single-lane ring
Here we show a simulation of the famous experiment by Sugiyama et al., in which 22 vehicles on a 230 m ring road develop instabilities known as "stop-and-go waves". This reproduced result is followed by four different controllers (two learned, two explicit controllers from the literature), each of which replaces one of the vehicles with a controlled vehicle. Some of the videos are shown at 260 m (the length at which the explicit controllers are calibrated), while others show rollouts at varying densities (different ring lengths). The GRU policy performs best overall, and it stabilizes ring sizes even outside the training regime.
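The instability itself can be reproduced with a simple car-following model. Below is a sketch using the Bando optimal-velocity model as a stand-in for SUMO's dynamics, with the classic dimensionless parameters (22 vehicles at headway 2 on a ring of length 44) rather than values calibrated to the 230 m experiment: a tiny perturbation to one vehicle grows into a stop-and-go wave.

```python
import numpy as np

N, L = 22, 44.0                # 22 vehicles on a ring; uniform headway h* = 2
a = 1.0                        # driver sensitivity; unstable since a < 2 V'(h*)

def V(h):                      # optimal velocity as a function of headway
    return np.tanh(h - 2.0) + np.tanh(2.0)

x = 2.0 * np.arange(N)         # uniformly spaced positions
x[0] += 0.1                    # small perturbation seeds the wave
v = np.full(N, V(2.0))         # start at the uniform-flow equilibrium speed

dt = 0.1
for _ in range(5000):          # forward-Euler integration to t = 500
    h = (np.roll(x, -1) - x) % L          # headway to the vehicle ahead
    v += dt * a * (V(h) - v)              # relax toward the optimal velocity
    x = (x + dt * v) % L

print(v.std())  # large velocity spread: a stop-and-go wave has formed
```

Uniform flow is linearly unstable here because the sensitivity `a` is below the threshold `2 V'(h*)`, so the perturbation is amplified as it propagates around the ring, just as in the physical experiment.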
Flow can be used to train vehicles to solve other tasks and improve network performance. Below, a sequence of adjacent autonomous vehicles learns to platoon together to improve the average velocity of the human-driven vehicles.
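A reward that encodes this objective can be sketched directly. The function below is a hypothetical illustration, not Flow's actual reward API: it scores the mean speed of the human-driven vehicles and lightly penalizes harsh autonomous-vehicle accelerations, a common shaping term; the names and the penalty weight `c` are assumptions.

```python
import numpy as np

def platoon_reward(human_speeds, av_accels, c=0.1):
    """Mean human-driven speed minus an L2 penalty on AV accelerations."""
    return np.mean(human_speeds) - c * np.sum(np.square(av_accels))

r = platoon_reward(np.array([4.0, 5.0, 6.0]), np.array([0.5, -0.5]))
print(r)  # 5.0 - 0.1 * 0.5 = 4.95
```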
Autonomous vehicles can be trained to handle various networks as well. Below, the loop is augmented with an intersection, creating a "figure 8" network. In this network, autonomous vehicles learn to handle the added intersection by bunching all vehicles together, and, when all vehicles are autonomous, even by weaving through the intersection.