Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Abstract

Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large, diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain, analogous to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

Two dexterous manipulation tasks on the TriFinger platform

Push task: Footage of the expert policy used during data collection.


Lift task: Footage of the expert policy (trained and deployed with low-pass filtering on the actions) used during data collection.

Training, data collection and benchmarking pipeline

Datasets

The datasets are available via the `trifinger_rl_datasets` package, which also provides a gym environment of the simulated TriFinger platform. As in D4RL, the datasets are downloaded automatically when requested via the `get_dataset` method:

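```python
import gymnasium as gym

# Imported so that the TriFinger environments are available to gym.make().
import trifinger_rl_datasets

# Environment for the push task with the expert dataset collected in simulation.
env = gym.make("trifinger-cube-push-sim-expert-v0")

# Downloads the dataset on first use and returns it.
dataset = env.get_dataset()
```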

All datasets are also available as versions with camera images. See the documentation for further details on how to use the 34 datasets.
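
Once a dataset is loaded, a common first step is to split the flat arrays into episodes and inspect the return of each one. The following is a minimal sketch, not part of the package: it assumes the returned dictionary follows the D4RL convention of flat `rewards` and `timeouts` entries, with `timeouts` marking the last step of each episode. These key names are an assumption here and should be checked against the documentation. The helper names `split_episodes` and `episode_returns` are hypothetical, and a tiny synthetic dictionary stands in for the real dataset so the snippet runs without downloading anything.

```python
from typing import Dict, List, Sequence


def split_episodes(timeouts: Sequence[bool]) -> List[range]:
    """Return one index range per episode, given per-step end-of-episode flags."""
    episodes = []
    start = 0
    for i, done in enumerate(timeouts):
        if done:
            # Step i is the last step of the current episode.
            episodes.append(range(start, i + 1))
            start = i + 1
    # Keep a trailing, unterminated episode if the data ends mid-episode.
    if start < len(timeouts):
        episodes.append(range(start, len(timeouts)))
    return episodes


def episode_returns(dataset: Dict[str, Sequence]) -> List[float]:
    """Sum the rewards within each episode of a flat, D4RL-style dataset."""
    rewards = dataset["rewards"]
    return [
        float(sum(rewards[i] for i in episode))
        for episode in split_episodes(dataset["timeouts"])
    ]


# Synthetic stand-in for the dictionary returned by env.get_dataset()
# (assumed keys: "rewards" and "timeouts").
toy_dataset = {
    "rewards": [0.25, 0.5, 0.25, 1.0, 0.5],
    "timeouts": [False, False, True, False, True],
}
returns = episode_returns(toy_dataset)
print(returns)
```

The helpers only index and iterate, so they should work unchanged on NumPy arrays as well as on plain lists, provided the real dataset exposes the assumed keys.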

Get access to a cluster of real TriFinger robots

A cluster of TriFinger platforms is hosted at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, and is available for remote use. To request access for a research project, email the contact person listed on the TriFinger website and describe the type and scope of the experiments you want to run. You will then receive an account that lets you submit jobs to the TriFinger cluster as you would to a compute cluster.

ICLR 2023 Poster