OTVIC: A Dataset with Online Transmission for Vehicle-to-Infrastructure Cooperative 3D Object Detection
IROS 2024 Oral
Vehicle-to-infrastructure cooperative 3D object detection (VIC3D) is a task that leverages both vehicle-mounted and roadside sensors to jointly perceive the surrounding environment. However, given the high speed of vehicles, the real-time requirements, and the limited communication bandwidth, roadside devices in our real-world scenarios transmit perception results rather than raw sensor data or feature maps. Moreover, the transmission delay is dynamic, as it is affected by various environmental factors. To meet the needs of practical applications, we present OTVIC, the first multi-modality, multi-view dataset with online transmission from real scenes for vehicle-to-infrastructure cooperative 3D object detection. The ego-vehicle receives infrastructure perception results in real time; the data were collected on a section of highway in Chengdu, China. We also propose LfFormer, a novel end-to-end multi-modality late fusion framework with transformer for the VIC3D task, as a baseline on OTVIC. Experiments demonstrate the effectiveness and robustness of our fusion framework.
In the OTVIC dataset, all files are named by their timestamp.
In the annotation folder:
There are three subfolders, train, val, and test, which provide the label files xxx.json for training, validation, and testing, respectively.
In the data folder:
can_bus: The xxx.json file provides the movement and localization data of the ego-vehicle.
image: There are four subfolders, namely Forward, Backward, Left, and Right, which provide the original image files xxx.png.
pointcloud: The xxx.pcd file provides the original point cloud data.
road: The xxx.json file provides the infrastructure perception data received online by the ego-vehicle.
Please refer to the paper for more details.
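To make the layout concrete, below is a minimal Python sketch that gathers all files sharing one timestamp into a single sample. It is not part of the official toolkit; the dataset root path, the helper name load_sample, and the exact nesting of the data folder are assumptions based only on the folder description above (the JSON schemas are documented in the paper).

```python
import json
from pathlib import Path

OTVIC_ROOT = Path("/path/to/OTVIC")            # hypothetical dataset root
CAMERA_VIEWS = ["Forward", "Backward", "Left", "Right"]

def load_sample(timestamp: str, split: str = "train") -> dict:
    """Collect all files that share one timestamp into a single dict."""
    data_dir = OTVIC_ROOT / "data"

    # 3D object labels for this frame (annotation/<split>/<timestamp>.json)
    with open(OTVIC_ROOT / "annotation" / split / f"{timestamp}.json") as f:
        labels = json.load(f)

    # Ego-vehicle movement and localization data
    with open(data_dir / "can_bus" / f"{timestamp}.json") as f:
        can_bus = json.load(f)

    # Infrastructure perception results received online by the ego-vehicle
    with open(data_dir / "road" / f"{timestamp}.json") as f:
        road = json.load(f)

    # Image and point cloud files: return paths only; decode them with the
    # reader of your choice (e.g. OpenCV for .png, open3d for .pcd).
    images = {view: data_dir / "image" / view / f"{timestamp}.png"
              for view in CAMERA_VIEWS}
    pointcloud = data_dir / "pointcloud" / f"{timestamp}.pcd"

    return {"labels": labels, "can_bus": can_bus, "road": road,
            "images": images, "pointcloud": pointcloud}
```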