OIL: Observational Imitation Learning

Paper

OIL_Supplement_720p.mp4

Abstract:

Recent work has explored the problem of autonomous navigation by imitating a teacher and learning an end-to-end policy, which directly predicts controls from raw images. However, these approaches tend to be sensitive to mistakes by the teacher and do not scale well to other environments or vehicles. To this end, we propose Observational Imitation Learning (OIL), a novel imitation learning variant that supports online training and automatic selection of optimal behavior by observing multiple imperfect teachers. We apply our proposed methodology to the challenging problems of autonomous driving and UAV racing. For both tasks, we utilize the Sim4CV simulator, which enables the generation of large amounts of synthetic training data and also allows for online learning and evaluation. We train a perception network to predict waypoints from raw image data and use OIL to train another network to predict controls from these waypoints. Extensive experiments demonstrate that our trained network outperforms its teachers, conventional imitation learning (IL) and reinforcement learning (RL) baselines, and even humans in simulation.
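The modular pipeline described above (perception network predicts waypoints, a second network predicts controls from them) can be sketched as follows. This is an illustrative stand-in only: the function names, waypoint count, and control mapping are assumptions for the sketch, not the paper's exact architecture.

```python
import numpy as np

def perception(image):
    # Stand-in for the perception CNN: reduce an RGB frame to 5 (x, y)
    # waypoints. A real model would be trained on labeled simulator data.
    pooled = image.mean(axis=(0, 1))           # crude global feature (3,)
    return np.linspace(0, 1, 5)[:, None] * pooled[:2]   # shape (5, 2)

def control(waypoints):
    # Stand-in for the control network (the one trained with OIL):
    # steer toward the mean waypoint and modulate throttle by offset.
    target = waypoints.mean(axis=0)
    steering = float(np.clip(target[0], -1.0, 1.0))
    throttle = float(np.clip(1.0 - abs(target[1]), 0.0, 1.0))
    return steering, throttle

image = np.random.rand(64, 64, 3)              # dummy camera frame
steering, throttle = control(perception(image))
```

Keeping perception and control as separate modules is what lets the same control network transfer across vehicles: only the waypoint interface is shared.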

Contributions:

(1) We propose Observational Imitation Learning (OIL) as a new approach for training a stationary deterministic policy that overcomes shortcomings of conventional imitation learning by incorporating reinforcement learning ideas. It learns from an ensemble of imperfect teachers, but only updates the policy with the best maneuvers of each teacher, eventually outperforming all of them.


(2) We introduce a flexible network architecture that adapts well to different control scenarios and complex navigation tasks (e.g. autonomous driving and UAV racing), trained with OIL in a self-supervised manner without any human demonstrations.


(3) To the best of our knowledge, this paper is the first to apply imitation learning with multiple teachers while remaining robust to teachers that exhibit bad behavior.
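The selective update in contribution (1) can be sketched in a few lines: every teacher proposes an action for the current state, an observer cost scores the proposals, and the policy is regressed only toward the best-scoring teacher. The cost function, the linear policy, and all names here are assumptions made for the sketch, not the paper's exact formulation.

```python
import numpy as np

def observer_cost(state, action):
    # Placeholder cost: penalize deviation from a state-derived reference
    # action (an assumption standing in for the paper's behavior evaluation).
    reference = np.tanh(state)
    return float(np.sum((action - reference) ** 2))

def oil_step(policy_weights, state, teacher_actions, lr=0.1):
    """One OIL-style update for a linear policy a = W s (illustrative only)."""
    # Score each teacher's proposal; keep only the best one.
    costs = [observer_cost(state, a) for a in teacher_actions]
    best_action = teacher_actions[int(np.argmin(costs))]
    # Imitate the best teacher via a squared-error gradient step.
    pred = policy_weights @ state
    grad = np.outer(pred - best_action, state)
    return policy_weights - lr * grad

state = np.array([0.5, -0.2])
teachers = [np.array([0.4, -0.1]),    # good teacher
            np.array([1.5, 1.5])]     # bad teacher, never selected
W = np.zeros((2, 2))
for _ in range(200):
    W = oil_step(W, state, teachers)
```

Because the bad teacher's proposals never win the observer-cost comparison, they never enter the loss, which is how the learned policy can end up better than the average teacher.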

Please cite our paper if you find it helpful:

@misc{li2018oil,
  title={OIL: Observational Imitation Learning},
  author={Guohao Li and Matthias Müller and Vincent Casser and Neil Smith and Dominik L. Michels and Bernard Ghanem},
  year={2018},
  eprint={1803.01129},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Results of Autonomous Driving

Comparison to Teachers

OIL_na345_300steps_car_track1.mp4

OIL-track1

PIDn_track1.mp4

Teacher1-track1

PIDa_track1.mp4

Teacher2-track1

l3_track1.mp4

Teacher3-track1

l4_track1.mp4

Teacher4-track1

l5_track1.mp4

Teacher5-track1

Comparison to Humans

OIL_na345_300steps_car_track2.mp4

OIL-track2

Humam_track2.mp4

Novice-track2

Frost_track2.mp4

Intermediate-track2

Matthias_track2.mp4

Expert-track2

Comparison to Learned Baselines

OIL_na345_300steps_car_track3.mp4

OIL-track3

BC_na345_track3.mp4

Behavior Cloning-track3

DAGGER_na345_track3.mp4

DAGGER-track3

DDPG_car_fixed_pitch_track3.mp4

DDPG-track3

Ablation Study

OIL_na345_60steps_car_track4.mp4

OIL-60steps-track4

OIL_na345_180steps_car_track4.mp4

OIL-180steps-track4

OIL_na345_300steps_car_track4.mp4

OIL-300steps-track4

OIL_na345_600steps_car_track4.mp4

OIL-600steps-track4

OIL_na345_300steps_car_track4.mp4

OIL-5teachers(1-5)-track4

OIL_n34_car_track4.mp4

OIL-3teachers(1,3,4)-track4

Results of UAV Racing

OIL_na345_hard.mp4

OIL-track4

BC_na345_hard.mp4

Behavior Cloning-track4

DAGGER_na345_hard.mp4

DAGGER-track4

DDPG_hard.mp4

DDPG-track4