AD4RL: Autonomous Driving Benchmarks for Offline Reinforcement Learning with Value-based Dataset

Dongsu Lee, Chanin Eom, and Minhae Kwon
Brain and Machine Intelligence Lab., Soongsil University

Paper | Code | Dataset

Abstract

Offline reinforcement learning has emerged as a promising technology, enhancing the practicality of reinforcement learning through the use of large pre-collected datasets. Despite these practical benefits, most algorithm development in offline reinforcement learning still relies on game tasks with synthetic datasets. To address this limitation, this paper provides autonomous driving datasets and benchmarks for offline reinforcement learning research. We provide 19 datasets, including a real-world human driver dataset, along with seven state-of-the-art offline reinforcement learning algorithms evaluated in three realistic driving scenarios. We also provide a unified decision-making process model that operates effectively across different scenarios, serving as a reference framework for algorithm design. Our research lays the groundwork for further collaboration in the community to explore practical aspects of existing reinforcement learning methods.

What advantages does AD4RL offer?

Offline Reinforcement Learning:
The offline paradigm can enhance the practicality and real-world applicability of reinforcement learning. Despite its achievements in virtual domains, online RL poses several practical challenges. First, trial-and-error learning in mission-critical systems can lead to financial loss and safety hazards, such as car accidents. Second, policy training based on simulation suffers from an inherent gap between simulator and real-world dynamics, which limits performance. Finally, learning through active data collection, i.e., online interaction between the agent and the environment, prevents exploiting vast previously collected datasets. Addressing these challenges is critical to realizing the full potential of reinforcement learning algorithms.
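To make the offline setting concrete, below is a minimal sketch of offline policy learning via behavior cloning: every gradient update samples from a fixed, pre-collected dataset, and the agent never interacts with the environment. The array shapes, field names, and hyperparameters are illustrative assumptions, not AD4RL's actual API.

```python
# Minimal sketch of offline policy learning (behavior cloning).
# Assumptions: transitions stored as NumPy arrays; field names,
# dimensions, and hyperparameters are illustrative, not AD4RL's API.
import numpy as np
import torch
import torch.nn as nn

obs_dim, act_dim = 25, 2          # hypothetical state/action sizes
dataset = {                       # stand-in for a pre-collected dataset
    "observations": np.random.randn(10_000, obs_dim).astype(np.float32),
    "actions": np.random.randn(10_000, act_dim).astype(np.float32),
}

policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                       nn.Linear(256, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(1_000):
    # Sample a mini-batch from the *fixed* dataset -- the defining
    # property of offline RL: no interaction with the environment.
    idx = np.random.randint(len(dataset["observations"]), size=256)
    obs = torch.from_numpy(dataset["observations"][idx])
    act = torch.from_numpy(dataset["actions"][idx])
    loss = ((policy(obs) - act) ** 2).mean()  # imitate logged actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```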

Trajectory-based Autonomous Driving Dataset:
The predominant approach in this field relies on image-based observations. It offers the advantage of an end-to-end pipeline, in which neural networks handle the entire process from perception to decision-making. However, this end-to-end methodology suffers from limited interpretability, making it difficult to pinpoint the source of failures.
In contrast, the value-based (trajectory-based) approach allows researchers to focus directly on improving the policy's decision-making capabilities without having to solve image processing. It rests on the assumption that the input data (e.g., HD maps, images, physical status, and sensor data) has already been processed and integrated into a compact state representation.
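For illustration, a single transition in a trajectory-based dataset reduces to low-dimensional physical quantities rather than raw pixels. The field names and dimensions below are hypothetical; the actual AD4RL schema may differ, so consult the dataset repository.

```python
# Hypothetical schema of one trajectory-based transition; the actual
# AD4RL field names and dimensions may differ -- see the dataset repo.
import numpy as np

transition = {
    # ego + surrounding vehicles' physical status (positions,
    # velocities, headings), already extracted from raw sensor data
    "observation": np.zeros(25, dtype=np.float32),
    "action": np.zeros(2, dtype=np.float32),    # e.g., steering, accel
    "reward": 0.0,
    "next_observation": np.zeros(25, dtype=np.float32),
    "done": False,
}
```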

Unified Partially Observable Markov Decision Process Model:
Existing autonomous driving research often relies on decision-making models that are overfitted to a specific driving environment, so driving policies learned under such models may fail in other driving environments. In this work, we propose a unified decision-making model that is applicable across a variety of road environments.
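One reading of "unified" is that all scenarios share a single observation layout, action interface, and reward signature, so one policy can be trained and evaluated across them without per-scenario redesign. The sketch below illustrates that idea; the class names and dimensions are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a unified POMDP interface shared by all
# scenarios; names and dimensions are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class UnifiedPOMDPSpec:
    obs_dim: int = 25    # same partial-observation layout everywhere
    act_dim: int = 2     # same control interface everywhere

class DrivingScenario:
    """Base class: cut-in, lane-reduction, and highway scenarios all
    expose the same spec, so one policy works across all of them."""
    spec = UnifiedPOMDPSpec()

    def reset(self) -> np.ndarray:
        return np.zeros(self.spec.obs_dim, dtype=np.float32)

    def step(self, action: np.ndarray):
        assert action.shape == (self.spec.act_dim,)
        obs = np.zeros(self.spec.obs_dim, dtype=np.float32)
        reward, done = 0.0, False
        return obs, reward, done
```

Because every scenario exposes the same spec, a policy trained on one dataset can be evaluated on another without architectural changes.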

Summary Video

Cut-in Scenario | Lane Reduction Scenario | Highway Scenario