Please check our new site at: https://ehsan-ami.github.io/rlftsim/
Abstract. Supervised open-loop training is widely used for traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions prevalent in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim atop a pre-trained simulator, design a reward that balances fidelity and controllability, and perform extensive experiments on the Waymo Open Motion Dataset. Our results show improved realism and state-of-the-art performance. Compared with heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples, owing to its low-variance, dense reward signal. We also demonstrate the effectiveness of our approach for enhancing controllability in traffic simulation via goal-conditioning.
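To make the idea of a dense reward that trades off fidelity against controllability concrete, here is a minimal illustrative sketch. It is not the reward used by RLFTSim: the function names (`step_reward`, `rollout_return`), the Euclidean-distance terms, and the `goal_weight` parameter are all assumptions introduced purely for illustration.

```python
import math

def step_reward(sim_xy, log_xy, goal_xy=None, goal_weight=0.5):
    """Hypothetical per-step reward (an assumption, not the paper's definition).

    The fidelity term penalizes distance to the logged position; the optional
    controllability term penalizes distance to a goal point.
    """
    fidelity = -math.dist(sim_xy, log_xy)
    if goal_xy is None:
        # No goal conditioning: the reward is the fidelity term alone.
        return fidelity
    control = -math.dist(sim_xy, goal_xy)
    # Convex combination of the two terms, weighted by goal_weight.
    return (1.0 - goal_weight) * fidelity + goal_weight * control

def rollout_return(sim_traj, log_traj, goal_xy=None, goal_weight=0.5):
    """Sum the per-step rewards over a rollout, giving a dense signal."""
    return sum(
        step_reward(s, l, goal_xy, goal_weight)
        for s, l in zip(sim_traj, log_traj)
    )
```

Because every time step contributes a term, the signal is dense rather than a single end-of-episode score; the weighting between the two terms is a design choice that would need tuning in practice.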
📽️ Qualitative visualization of the simulation model before and after RLFTSim post-training.
Pre-train
Post-train
Figure 1, left (Collision & Off-road): The pre-trained model generates unrealistic off-road behavior and a collision with cross-traffic, while the post-trained model (RLFTSim) produces realistic lane-following behavior that respects traffic rules.
Pre-train
Post-train
Figure S1 (Collision 1): In the simulation for the pre-trained model, the vehicle entering the circle fails to yield to the pedestrian and collides with it. In the post-trained model, the vehicle yields to the pedestrian.
Pre-train
Post-train
Figure S2 (Collision 2): For the pre-trained model, there is a rear-end collision between two vehicles at the bottom of the scene. However, the post-trained model avoids this accident.
Pre-train
Post-train
Figure S3 (Collision 3): For the pre-trained model, the parked vehicle attempts to enter the road, which leads to a collision with the passing vehicle. The passing vehicle tries to slow down but cannot avoid the collision. For the post-trained model, the parked vehicle waits for the road to clear and then enters it.
Pre-train
Post-train
Figure S4 (Off-road 1): For the pre-trained model, the cyclist does not respect the drivable area and goes off-road. For the post-trained model (RLFTSim), the cyclist stays within the drivable area.
🎯 Qualitative visualization for Goal-Conditioned Fine-Tuning (GCFT)
b1: Pre-train
Successful U-Turn
b4: Pre-train
Failed left turn
b2: Post-train (cat, hard)
Successful U-Turn
b5: Post-train (cat, hard)
Successful left turn
b3: Post-train (ind, hard)
Successful U-Turn
b6: Post-train (ind, hard)
Successful left turn
Figure 1 (right): GCFT Visualization. The goal point is shown as a magenta circle. The simulation is shown for various representations of the goal point (Section 3.2). The top row shows a U-turn goal, and the bottom row shows a left-turn goal.