D4RL

Datasets for Deep Data-Driven Reinforcement Learning

Anonymous Authors

A collection of benchmarks and datasets for offline reinforcement learning


Abstract

The offline reinforcement learning (RL) problem, also known as batch RL, refers to the setting where a policy must be learned from a static dataset, without additional online data collection. This setting is compelling because it could allow RL methods to take advantage of large, pre-collected datasets, much as the rise of large datasets has fueled results in supervised learning in recent years. However, existing online RL benchmarks are not tailored to the offline setting, making progress in offline RL difficult to measure. In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL. Examples of such properties include: datasets generated via hand-designed controllers and human demonstrators, multi-objective datasets in which an agent can perform different tasks in the same environment, and datasets consisting of mixtures of policies. To facilitate research, we release our benchmark tasks and datasets along with a comprehensive evaluation of existing algorithms, an evaluation protocol, and an open-source codebase. We hope that our benchmark will focus research effort on methods that drive improvements not just on simulated tasks, but ultimately on the kinds of real-world problems where offline RL will have the largest impact.

Why D4RL?

To scalably collect large datasets for RL, data must be obtainable cheaply and efficiently. Possible strategies include passively logging agents in interactive environments (e.g., autonomous driving, recommender systems), or leveraging existing repositories of recorded behavior (e.g., medical records). Data from such sources is likely to exhibit unstructured characteristics, such as:

  • Data generated via human demonstrations or hard-coded controllers.

  • Data containing heterogeneous mixtures of different policies.

  • Data from agents completing a variety of goals within the same environment.

This is in stark contrast to data generated by executing random agents or RL-trained agents on a single task.

The goal of D4RL is to introduce tasks with these more realistic characteristics, packaged in a form that is amenable to simulation and easy experimentation.

Tasks

Maze2D and AntMaze

The maze environments are designed to test the ability of agents to recombine existing data in novel ways. For example, if the dataset contains one trajectory traveling from point 1 to point 2 and another from point 2 to point 3, an agent can stitch them together to form a shortest path from point 1 to point 3. Two robots are available: a simple ball (Maze2D) and the "Ant" quadruped from the Gym benchmark (AntMaze).
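For illustration, both robots can be instantiated through the standard Gym interface (a minimal sketch; the specific environment IDs used below are representative examples and may vary across D4RL versions):

    import gym
    import d4rl  # importing d4rl registers its environments with Gym

    # The same navigation problem is available with two robot morphologies.
    env_ball = gym.make('maze2d-umaze-v0')   # simple ball robot
    env_ant = gym.make('antmaze-umaze-v0')   # "Ant" quadruped robot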

Adroit

The Adroit domain includes motion-captured human demonstration data on a realistic, high-DoF robotic hand. It features a variety of challenging tasks from the original paper: twirling a pen, opening a door, using a hammer, and relocating an object.

Gym

Several OpenAI Gym benchmark tasks are included, with data collected by a variety of pre-trained RL agents. The environments include Hopper, HalfCheetah, and Walker2d.
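Each Gym dataset is identified by both the environment and the quality of the policy that generated the data. A minimal sketch of the naming scheme (the exact set of available IDs is an assumption here; consult the released dataset tables):

    import gym
    import d4rl  # importing d4rl registers its environments with Gym

    # Assumed ID pattern: <environment>-<data-collection policy>-v0
    gym.make('hopper-random-v0')   # data from a randomly initialized policy
    gym.make('hopper-medium-v0')   # data from a partially trained policy
    gym.make('hopper-expert-v0')   # data from a fully trained policy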

Flow

Flow is a simulation framework for reducing traffic congestion using autonomous vehicles. We include two tasks: controlling autonomous vehicles in a ring road and in a highway merge configuration.

FrankaKitchen

The FrankaKitchen domain is based on the Adept environments. It offers a challenging manipulation problem in an unstructured setting with many possible tasks to perform.

CARLA

CARLA is a high-fidelity autonomous driving simulator. We adapt the CARLA simulator for an offline navigation task using visual observations.

Getting Started

Using D4RL is simple!

After installation, each task and dataset can be loaded through the standard Gym interface. The example below loads the maze2d-umaze-v0 task (a minimal sketch based on the released codebase; check the repository for the exact API of your installed version):
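    import gym
    import d4rl  # importing d4rl registers its environments with Gym

    # Create the environment; D4RL tasks follow the standard Gym interface.
    env = gym.make('maze2d-umaze-v0')
    env.reset()
    env.step(env.action_space.sample())

    # Each task is paired with a static dataset, returned as a dictionary of
    # NumPy arrays with keys such as 'observations', 'actions', 'rewards',
    # and 'terminals'.
    dataset = env.get_dataset()
    print(dataset['observations'].shape)

    # d4rl.qlearning_dataset() reorganizes the data into aligned
    # (s, a, r, s') transitions by adding a 'next_observations' entry,
    # which is convenient for Q-learning-style algorithms.
    dataset = d4rl.qlearning_dataset(env)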