Deep Imitative Models for Flexible Inference, Planning, and Control

Abstract:

Imitation learning is an appealing approach for autonomous control: in many tasks, demonstrations of preferred behavior can be readily obtained from human experts, removing the need for costly and potentially dangerous online data collection in the real world. However, imitation learning produces behavioral policies with limited flexibility to accommodate new goals at test-time. In contrast, model-based reinforcement learning (MBRL) can plan to arbitrary goals using a predictive dynamics model learned from data, yet suffers from two shortcomings. First, the dynamics alone cannot be used to choose desired or safe outcomes -- it estimates only what is possible, not what is preferred. Second, MBRL typically requires additional online data collection to ensure the model is accurate in situations that are actually encountered when attempting to achieve test-time goals. Collecting this data with a partially-trained model can be dangerous and time-consuming. In this paper, we propose "imitative models" to combine the benefits of imitation learning and MBRL. Imitative models are probabilistic predictive models able to plan interpretable expert-like trajectories to achieve arbitrary goals. Inference with them resembles trajectory optimization in model-based reinforcement learning, and learning them resembles imitation learning. We find this method substantially outperforms six direct imitation learning approaches (five of them prior work) and an MBRL approach in a dynamic simulated autonomous driving task, and can be learned efficiently from a fixed set of expert demonstrations without additional online data collection. We also show our model can flexibly incorporate user-supplied costs at test-time, can plan to sequences of goals, and can even perform well with imprecise goals, including goals on the wrong side of the road.
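
The abstract's plan criterion can be stated compactly: at test time, choose the trajectory that maximizes log q(s_{1:T} | phi) + log p(G | s_{1:T}), where q is the imitation prior learned from expert demonstrations and p(G | ·) is a goal likelihood. Below is a minimal sketch of this inference-as-trajectory-optimization, assuming PyTorch; the hand-made log_q, the goal location, horizon, and optimizer settings are illustrative stand-ins, not the paper's trained model.

```python
import torch

T = 20                                   # planning horizon (timesteps)
goal = torch.tensor([10.0, 2.0])         # test-time goal (illustrative)

def log_q(traj):
    # Stand-in imitation prior log q(s_{1:T} | phi): reward smooth, steady
    # forward motion. In the paper this is a deep probabilistic model trained
    # on expert demonstrations, not a hand-made score.
    vel = traj[1:] - traj[:-1]
    acc = vel[1:] - vel[:-1]
    return -(acc ** 2).sum() - 0.1 * ((vel - torch.tensor([0.5, 0.0])) ** 2).sum()

def log_goal(traj, sigma=0.1):
    # Gaussian final-state goal likelihood log p(G | s_{1:T}), up to a constant.
    return -((traj[-1] - goal) ** 2).sum() / (2 * sigma ** 2)

# Inference resembles trajectory optimization in MBRL: gradient ascent on
#   log q(s_{1:T} | phi) + log p(G | s_{1:T}).
traj = torch.zeros(T, 2, requires_grad=True)
opt = torch.optim.Adam([traj], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    (-(log_q(traj) + log_goal(traj))).backward()
    opt.step()

print("planned final state:", traj[-1].detach())
```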

Navigated Driving With Imitative Models In CARLA

The model produces expert-like plans given candidate waypoints.

Navigating With Waypoints On The Wrong Side Of The Road

The model produces expert-like plans that both follow the candidate waypoints and stay on the correct side of the road.

Navigating With Decoy Waypoints

The model produces expert-like plans given candidate waypoints, and is rarely distracted by non-expert goals; a sketch of one mechanism follows.
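
One way the planner can weigh several candidate waypoints at once while remaining hard to distract is a mixture goal likelihood: decoys receive goal-likelihood mass, but plans toward them score poorly under the imitation prior. The mixture form and the candidate coordinates below are illustrative assumptions, continuing the sketch above.

```python
import math
import torch

# Candidate waypoints: the first is a plausible expert goal; the second is a
# decoy that would require non-expert behavior to reach (illustrative values).
candidates = torch.tensor([[10.0, 2.0], [9.0, -8.0]])

def log_goal_mixture(traj, sigma=0.5):
    # log p(G | s_{1:T}) as an equal-weight Gaussian mixture over candidates,
    # up to a constant: log sum_k (1/K) exp(-|s_T - g_k|^2 / (2 sigma^2)).
    sq = ((traj[-1] - candidates) ** 2).sum(dim=1)
    return torch.logsumexp(-sq / (2 * sigma ** 2), dim=0) - math.log(len(candidates))
```

Swapping this in for log_goal in the sketch above lets the same gradient-ascent planner choose among waypoints; the imitation prior dominates when a candidate is a decoy.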

Navigating With Potholes

Setting: imitative planning to goals at least 25 m away, spaced 25 m apart, with pothole costs (Town02). Our model was trained on Town01, which contains no examples of potholes. At test time, the model controls to goals in an unseen environment without crashing, while incorporating a new cost that penalizes randomized simulated potholes.
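
A hedged sketch of how such a user-supplied cost can be folded into the plan criterion at test time without retraining; the pothole locations, radius, and hinge penalty below are illustrative assumptions, not the paper's exact cost.

```python
import torch

pothole_centers = torch.tensor([[4.0, 0.5], [7.0, 1.5]])  # randomized per episode

def pothole_cost(traj, radius=0.5, weight=50.0):
    # Hinge penalty on any plan state that comes within `radius` of a pothole.
    d = torch.cdist(traj, pothole_centers)      # (T, num_potholes) distances
    return weight * torch.relu(radius - d).sum()

# The plan criterion from the first sketch simply gains a term; the learned
# model and its training data are untouched:
#   maximize  log q(traj | phi) + log p(G | traj) - pothole_cost(traj)
example = torch.stack([torch.linspace(0.0, 10.0, 20),
                       torch.linspace(0.0, 2.0, 20)], dim=1)
print("pothole penalty of a straight-line plan:", pothole_cost(example).item())
```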