Dream to Drive with Predictive Individual World Model
Yinfeng Gao, Qichao Zhang, Da-Wei Ding, and Dongbin Zhao
Codes: [https://github.com/gaoyinfeng/PIWM]
Simulator: [https://github.com/gaoyinfeng/I-SIM]
Generating reactive driving behaviors in complex urban environments remains challenging because the intentions of surrounding road users are unknown. Model-based reinforcement learning (MBRL) offers great potential to learn a reactive policy by constructing a world model that provides informative latent states and supports imagination training. However, a critical limitation of existing work lies in its scene-level, reconstruction-based representation learning, which may overlook key interactive vehicles and struggles to model the interactions among vehicles and their long-term intentions. Therefore, this paper presents a novel MBRL method with a predictive individual world model (PIWM) for autonomous driving. PIWM describes the driving environment from an individual-level perspective and captures vehicles' interactive relations and intentions via a trajectory prediction task. Meanwhile, a behavior policy is learned jointly with PIWM: it is trained in PIWM's imagination and navigates urban driving scenes effectively by leveraging intention-aware latent states. The proposed method is trained and evaluated in simulation environments built upon challenging real-world interactive scenarios. Experimental results show that it outperforms popular model-free and state-of-the-art model-based reinforcement learning methods in terms of safety and efficiency.
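As a rough illustration of the imagination training mentioned above (not the paper's implementation; all modules, dimensions, and the simplified objective are assumptions), a toy latent dynamics model can roll states forward under the current policy and update the actor from imagined rewards:

```python
# Minimal sketch of "imagination training": the actor is optimized on rollouts
# generated entirely inside a learned latent model, not on the real environment.
import torch
import torch.nn as nn

latent_dim, action_dim, horizon = 32, 2, 15
dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, latent_dim), nn.Tanh())
reward_head = nn.Linear(latent_dim, 1)
actor = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, action_dim), nn.Tanh())
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

state = torch.randn(16, latent_dim)          # stands in for latent states inferred by the world model
imagined_rewards = []
for _ in range(horizon):                     # roll out inside the learned model
    action = actor(state)
    state = dynamics(torch.cat([state, action], dim=-1))
    imagined_rewards.append(reward_head(state))

actor_loss = -torch.stack(imagined_rewards).mean()  # maximize imagined return
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```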
Consider a complex driving scenario with an ego vehicle and several social vehicles. Dreamer learns a scene-level world model whose representation learning reconstructs the current observation, so the behavior model operates on a single mingled state. In contrast, our method builds the world model in an individual-level framework, where each vehicle in the scene is classified and modeled separately by branched networks and owns a unique state. We further improve the individual-level world model by explicitly modeling the relations between vehicles and by replacing reconstruction with trajectory prediction, which captures the potential intentions or motion trends of vehicles.
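The following sketch (a hypothetical PyTorch module, not the released code) illustrates the individual-level design described above: branched encoders give each vehicle its own latent state, a shared attention module captures relations between vehicles, and the decoder predicts future trajectories instead of reconstructing observations. All module choices and dimensions are assumptions.

```python
import torch
import torch.nn as nn


class IndividualWorldModelSketch(nn.Module):
    def __init__(self, obs_dim=16, latent_dim=64, horizon=10):
        super().__init__()
        # Branched encoders: one for the ego vehicle, one shared by social vehicles.
        self.ego_encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.social_encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # Shared interaction module over per-vehicle latent states.
        self.interaction = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        # Prediction head: future (x, y) waypoints per vehicle, replacing reconstruction.
        self.traj_head = nn.Linear(latent_dim, horizon * 2)
        self.horizon = horizon

    def forward(self, ego_obs, social_obs):
        # ego_obs: (B, obs_dim); social_obs: (B, N, obs_dim)
        ego_z = self.ego_encoder(ego_obs).unsqueeze(1)        # (B, 1, D)
        social_z = self.social_encoder(social_obs)            # (B, N, D)
        states = torch.cat([ego_z, social_z], dim=1)          # (B, 1+N, D), one state per vehicle
        mixed, _ = self.interaction(states, states, states)   # relation-aware states
        traj = self.traj_head(mixed)                          # (B, 1+N, horizon*2)
        return mixed, traj.view(*traj.shape[:2], self.horizon, 2)


# Usage on dummy data (4 scenes, 7 social vehicles each):
model = IndividualWorldModelSketch()
states, pred_traj = model(torch.randn(4, 16), torch.randn(4, 7, 16))
print(states.shape, pred_traj.shape)  # (4, 8, 64) and (4, 8, 10, 2)
```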
Modules drawn with solid lines are branch-specific, while modules drawn with dashed lines are shared between branches. Note that the gradients of the actor and critic are stopped from flowing backward through the latent states, so representation learning happens purely in the world-model learning phase. All network modules are multi-layer perceptrons (MLPs), since no image observations are considered.
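A minimal sketch, assuming PyTorch, of the stop-gradient behavior described above: the actor and critic receive detached latent states, so their losses cannot update the representation. The module shapes below are illustrative only.

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 64, 2
actor = nn.Sequential(nn.Linear(latent_dim, 128), nn.ELU(), nn.Linear(128, action_dim))
critic = nn.Sequential(nn.Linear(latent_dim, 128), nn.ELU(), nn.Linear(128, 1))

latent = torch.randn(8, latent_dim, requires_grad=True)  # stands in for world-model latent states

action = actor(latent.detach())   # gradient stops here: the actor loss never reaches the world model
value = critic(latent.detach())   # same for the critic
print(action.shape, value.shape)
```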
Video version of the typical scenarios in Fig. 9, in which the ego vehicle is shown in red, the VDI and VPI are shown in orange and yellow, respectively, and other vehicles are shown in blue. The logged ego trajectories are indicated by a light grey vehicle.