Learning to Drive (L2D) as a Low-Cost Benchmark for Real-World Reinforcement Learning

Ari Viitala*, Rinu Boney*, Yi Zhao, Alexander Ilin, Juho Kannala

(Video: icra_final_hq.mp4)

Motivation

  • It is important to ground the progress of RL in real-world tasks.

  • We introduce Learning to Drive (L2D), a simple and easily reproducible benchmark for real-world RL.

  • We test imitation learning as well as state-of-the-art model-free and model-based RL algorithms on the proposed L2D benchmark, showing that existing RL algorithms can learn to drive from scratch in less than five minutes of interaction.

  • We open-source our training pipeline, baseline implementations and all experimental details (including our training trajectories) so that it is easy to reproduce our results and apply any RL algorithm on our benchmark.

Experimental Setup

In the L2D benchmark, an RL agent has to learn to drive a Donkey car around three miniature tracks, given only monocular image observations and the speed of the car.
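For concreteness, the agent-environment interaction can be viewed as a standard Gym-style loop. The sketch below is purely illustrative: the class and method names (DonkeyCarEnv, reset, step) and the image resolution are assumptions rather than the exact API of our released code; it only captures that the observation is a camera frame plus the current speed, and that the action is a steering (and optionally throttle) command.

    import numpy as np

    class DonkeyCarEnv:
        """Hypothetical Gym-style interface to the Donkey car (names are illustrative)."""

        def _observe(self):
            # Observation: a monocular camera frame plus the current speed of the car.
            image = np.zeros((120, 160, 3), dtype=np.uint8)  # placeholder RGB frame
            speed = 0.0                                       # placeholder speed estimate
            return {"image": image, "speed": speed}

        def reset(self):
            # Called after a disengagement, once the car is placed back on the track.
            return self._observe()

        def step(self, action):
            # action = [steering], or [steering, throttle] when the agent also controls speed.
            obs = self._observe()
            reward = 0.0   # disengagement-based reward (sparse and noisy)
            done = False   # True when the car drives off the track
            return obs, reward, done, {}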

The agent has to learn to drive from disengagements, which occur when it drives off the track. The three tracks, our training trajectories, and the disengagement rewards:

Task 1: Learning to Steer

To evaluate data efficiency, the RL agent is tasked with learning to steer the car while it moves at a fixed speed; the task is considered solved when the agent can consistently complete three laps around the track.

The state-of-the-art model-based RL algorithm, Dreamer, learns from the sparse and noisy rewards in a sample-efficient and robust manner, achieving the maximum return of 1000 within merely five minutes of driving on all three tracks. The state-of-the-art model-free algorithm, SAC+VAE, is also able to learn the task, but requires almost twice as many disengagements as Dreamer.
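To make the numbers concrete: a maximum return of 1000 arises naturally if, for example, the agent receives +1 for every step it stays on the track and the episode ends on a disengagement or after 1000 steps. The rollout sketch below illustrates this scheme; the episode cap and per-step reward are assumptions for illustration and may differ in detail from our released implementation.

    MAX_EPISODE_STEPS = 1000  # assumed episode cap, giving a maximum return of 1000

    def run_episode(env, policy):
        """Roll out one episode with a disengagement-terminated, per-step reward."""
        obs = env.reset()
        episode_return = 0.0
        for _ in range(MAX_EPISODE_STEPS):
            action = policy(obs)
            obs, reward, done, _ = env.step(action)
            episode_return += reward   # e.g. +1 per step while the car stays on the track
            if done:                   # disengagement: the car left the track
                break
        return episode_return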

Task 2: Generalization

To evaluate generalization, the RL agents have to demonstrate consistent performance across different tracks. We evaluate this by training the agent to solve Task 1 on one of the tracks and measuring its performance on the other tracks.

Environmental conditions such as ambient lighting differed between training and testing, and we observe more variance in the performance of our agents. Dreamer generalizes better than SAC+VAE. While agents trained on Track 1 are able to generalize to all other tracks, agents trained on Track 3 are unable to generalize to the other tracks, since Track 3 consists only of sharp right-angle bends. Agents trained on Track 2 perform better on Track 1, which is easier to drive on.
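The evaluation protocol itself is simple: train on one track, freeze the policy, and roll it out on the remaining tracks. The sketch below assumes the hypothetical DonkeyCarEnv interface and run_episode helper from above, as well as illustrative make_env and train_agent callables.

    TRACKS = ("track1", "track2", "track3")

    def evaluate_generalization(make_env, train_agent, tracks=TRACKS):
        """Train a Task 1 agent on each track and evaluate the frozen policy on the others."""
        results = {}
        for train_track in tracks:
            policy = train_agent(make_env(train_track))  # learn to steer on one track
            for test_track in tracks:
                if test_track == train_track:
                    continue
                # Roll out the frozen policy on an unseen track (no further learning).
                results[(train_track, test_track)] = run_episode(make_env(test_track), policy)
        return results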

Task 3: High-Speed Control

To evaluate control performance, the RL agent is tasked with controlling both the speed and the steering of the car, and performance is measured in terms of the average lap time. We compare the average lap times over five laps (from a flying start) of the best-performing RL agents with those of a well-tested imitation learning agent and a human operator.

Reinforcement learning agents are able to outperform imitation learning and even the human operator, learning from just sparse and noisy disengagement signals. Although SAC+VAE requires more samples to learn, it drives the fastest, outperforming Dreamer, which also drives faster than the human operator.
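For reference, the average lap time from a flying start can be computed by timing successive crossings of the start/finish line, discarding the initial approach, and averaging the next five complete laps. The function below is a small sketch of this bookkeeping; the crossing times are assumed to come from the track-side measurement.

    def average_lap_time(crossing_times, num_laps=5):
        """Average lap time (seconds) over `num_laps` laps from a flying start.

        crossing_times: wall-clock times of successive start/finish-line crossings;
        the first crossing marks the beginning of the timed (flying) laps.
        """
        lap_times = [t1 - t0 for t0, t1 in zip(crossing_times, crossing_times[1:])]
        return sum(lap_times[:num_laps]) / num_laps

    # Example: six crossings give five complete laps with a 10.0 s average.
    print(average_lap_time([0.0, 10.2, 20.0, 30.0, 39.9, 50.0]))  # -> 10.0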

Steps to Reproduce Our Experiments

  1. Buy a Donkey car starter kit or buy the necessary parts to build one (see https://docs.donkeycar.com/guide/build_hardware/ for all details).

  2. Set up track(s) for training.

  3. Follow the instructions in the Donkey car user guide to assemble the car, install the supporting software, and train an imitation learning baseline.

  4. Buy a power bank and an Intel RealSense T265, mount them to the car, and connect them to the Raspberry Pi.

  5. Install our supporting software and run the training script to exactly reproduce the results of our baseline agents (a rough sketch of the overall training loop is shown after this list).
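For orientation, the training script follows the standard online RL loop: drive, collect the disengagement signal, update the agent, and continue. The sketch below is illustrative only; the agent methods (act, observe, update) are hypothetical placeholders for whatever algorithm (e.g. SAC+VAE or Dreamer) is plugged in, and our released training script takes care of the remaining practical details.

    def train(env, agent, total_steps=10_000):
        """Generic online RL loop: drive, observe disengagements, update, repeat."""
        obs = env.reset()
        for _ in range(total_steps):
            action = agent.act(obs)                              # steering (and throttle)
            next_obs, reward, done, _ = env.step(action)
            agent.observe(obs, action, reward, next_obs, done)   # store the transition
            agent.update()                                       # e.g. SAC or Dreamer update
            obs = env.reset() if done else next_obs              # reset after a disengagement
        return agent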