Chuyang Xiao*, Dawei Wang*, Xinzheng Tang, Jia Pan, Yuexin Ma
This paper presents a mixed traffic control policy designed to optimize traffic efficiency across diverse road topologies, addressing the congestion prevalent in urban environments. A model-free reinforcement learning (RL) approach is developed to manage large-scale traffic flow, using data collected by autonomous vehicles to influence human-driven vehicles. A real-world mixed traffic control benchmark is also released, which includes 444 scenarios from 20 countries, representing a wide geographic distribution and covering a variety of scenarios and road topologies. This benchmark serves as a foundation for future research, providing a realistic simulation environment for the development of effective policies. Comprehensive experiments demonstrate the effectiveness and adaptability of the proposed method, which achieves better performance than existing traffic control methods in both intersection and roundabout scenarios. To the best of our knowledge, this is the first project to introduce a mixed traffic control benchmark built from complex real-world scenarios.
Benchmark
Here we introduce a real-world mixed traffic control benchmark dataset comprising 444 dynamic scenarios across 111 distinct road topologies from 20 countries worldwide. The dataset captures diverse intersection and roundabout configurations, systematically categorized by road geometry to ensure balanced representation. Each scenario features high-fidelity traffic flows with demand levels ranging from 400 to 5000 vehicles/hour, simulating realistic density variations and complex interactions.
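To give a feel for the stated demand range, the short sketch below converts a demand level in vehicles/hour into a mean time gap between successive vehicles. This is plain arithmetic and not part of the benchmark tooling; the function name `mean_headway_seconds` is hypothetical, and the sketch assumes the demand figure is an aggregate over the whole scenario and that arrivals are averaged uniformly over the hour.

```python
def mean_headway_seconds(demand_veh_per_hour: float) -> float:
    """Mean time gap (s) between successive vehicles for a given demand.

    Hypothetical helper for illustration only: assumes demand is an
    aggregate vehicles/hour figure averaged uniformly over the hour.
    """
    if demand_veh_per_hour <= 0:
        raise ValueError("demand must be positive")
    return 3600.0 / demand_veh_per_hour


if __name__ == "__main__":
    # The two ends of the demand range quoted for the benchmark.
    for demand in (400, 5000):
        print(demand, "veh/h ->", mean_headway_seconds(demand), "s between vehicles")
```

At the low end of the range a vehicle enters the scenario roughly every 9 s on average; at the high end, roughly every 0.72 s.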
Method
LEFT: The green car within the yellow square represents the ego robot vehicle (RV), which independently collects traffic information via its local perception system and autonomously decides its acceleration.
RIGHT: The policy observation. The vehicles surrounding the ego RV may be either robot vehicles (RVs: Green) or human-driven vehicles (HVs: White); all of them are included in the ego RV's observation.
Our method accounts for vehicles positioned both in front of the ego RV, represented by the light green area, and behind it, represented by the light red area. For each observed vehicle, its velocity and position relative to the ego RV are encoded into the observation.
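The observation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Vehicle` class, the `build_observation` function, the one-dimensional lane coordinate, the sensing range, the per-side vehicle limits, and the zero padding are all assumptions made for illustration. The only elements taken from the description are that vehicles in front of and behind the ego RV are observed, that RVs and HVs are treated alike, and that each observed vehicle contributes its relative position and relative velocity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vehicle:
    position: float         # longitudinal position along the lane (m); assumed 1-D
    velocity: float         # speed (m/s)
    is_robot: bool = False  # RV or HV; both kinds are observed the same way


def build_observation(
    ego: Vehicle,
    others: List[Vehicle],
    max_front: int = 2,           # assumed cap on observed vehicles ahead
    max_behind: int = 2,          # assumed cap on observed vehicles behind
    sensing_range: float = 50.0,  # assumed local perception range (m)
) -> List[float]:
    """Encode nearby vehicles as (relative position, relative velocity) pairs."""
    # Vehicles ahead of the ego, nearest first.
    front = sorted(
        (v for v in others if 0 < v.position - ego.position <= sensing_range),
        key=lambda v: v.position - ego.position,
    )
    # Vehicles behind the ego, nearest first.
    behind = sorted(
        (v for v in others if 0 < ego.position - v.position <= sensing_range),
        key=lambda v: ego.position - v.position,
    )

    obs: List[float] = []
    for group, limit in ((front, max_front), (behind, max_behind)):
        kept = group[:limit]
        for v in kept:
            obs.extend([v.position - ego.position, v.velocity - ego.velocity])
        # Pad empty slots with zeros so the vector has a fixed length.
        obs.extend([0.0, 0.0] * (limit - len(kept)))
    return obs


if __name__ == "__main__":
    ego = Vehicle(position=100.0, velocity=10.0, is_robot=True)
    traffic = [
        Vehicle(110.0, 8.0),
        Vehicle(130.0, 12.0, is_robot=True),
        Vehicle(95.0, 9.0),
        Vehicle(300.0, 5.0),  # outside the assumed sensing range, ignored
    ]
    print(build_observation(ego, traffic))
```

A fixed-length vector is used here because a policy network typically expects a constant input size; how the actual method handles varying numbers of neighbours is not specified on this page.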
Results
Overall results, measured in average wait time (s) and throughput rate (10⁻³), for our method and four baseline methods on the whole test set, the intersection subset, and the roundabout subset. Where a method was inapplicable to a test set, the corresponding result is left blank.
We evaluated our proposed method and four baseline methods under two levels of traffic demand and two types of road topology, using the throughput rate (10⁻³) as the evaluation metric.
Citation:
@article{xiao2025optimizing,
title={Optimizing Efficiency of Mixed Traffic through Reinforcement Learning: A Topology-Independent Approach and Benchmark},
author={Xiao, Chuyang and Wang, Dawei and Tang, Xinzheng and Pan, Jia and Ma, Yuexin},
journal={arXiv preprint arXiv:2501.16728},
year={2025}
}