RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback 

RLHF-Blender provides a simple way to set up and run scalable experiments for learning from diverse human feedback. To support different experiment designs, users can select from different combinations of feedback types in the user interface, feedback processing, querying strategies, and reward models.

Yannick Metz1, David Lindner2, Raphaël Baur2, Daniel Keim1, Mennatallah El-Assady2

1 University of Konstanz 2 ETH Zurich

Presented at the Interactive Learning from Implicit Feedback Workshop at ICML 2023

RLHF-Blender allows researchers to configure experimental setups for RLHF experiments based on several modular components:

Workflow & Architecture of RLHF-Blender

A user interacts with a user interface to view behavior and provide feedback. Samples of the behavior are retrieved by a sampler component and can be supplied either as online or offline data. Human feedback is translated into a common format, aligned with the respective samples, and passed to a reward model for training. Samples are logged and stored for post-hoc analysis. The reward model can either be used offline for subsequent analysis or in an online mode to train an RL model for data generation.
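To make this loop concrete, the following is a minimal, self-contained sketch of the sampler → feedback → reward-model cycle. All names (Segment, FeedbackEvent, LinearRewardModel, sample_segments) are hypothetical illustrations, not RLHF-Blender's actual API, and the simulated ratings stand in for real user input from the interface.

```python
# Minimal sketch of the feedback-to-reward-model loop (illustrative names only).
from dataclasses import dataclass
import numpy as np

@dataclass
class Segment:
    """A behavior sample shown to the user (one episode slice)."""
    features: np.ndarray  # shape (T, d): per-step observation features

@dataclass
class FeedbackEvent:
    """Human feedback in a common format, aligned with the sampled segment."""
    segment: Segment
    rating: float  # normalized evaluative rating in [0, 1]

def sample_segments(rng, n, t=10, d=4):
    """Stand-in for the sampler component (online rollouts or offline data)."""
    return [Segment(features=rng.normal(size=(t, d))) for _ in range(n)]

class LinearRewardModel:
    """Toy reward model: per-step reward is a linear function of the features."""
    def __init__(self, d):
        self.w = np.zeros(d)

    def segment_return(self, seg):
        return float((seg.features @ self.w).sum())

    def update(self, feedback, lr=0.05):
        """One squared-error step: fit predicted returns to the user ratings."""
        for fb in feedback:
            error = self.segment_return(fb.segment) - fb.rating
            self.w -= lr * error * fb.segment.features.sum(axis=0)

rng = np.random.default_rng(seed=0)
model = LinearRewardModel(d=4)
for _ in range(5):  # a few interaction rounds
    segments = sample_segments(rng, n=8)
    # Ratings would normally come from the user interface; here they are
    # simulated from a hidden ground-truth preference for illustration.
    feedback = [FeedbackEvent(s, rating=float(s.features[:, 0].mean() > 0)) for s in segments]
    model.update(feedback)
print("learned reward weights:", model.w)
# The learned reward model can now be analyzed offline or used to train an RL agent.
```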

Abstract

To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types. However, systematic study of learning from diverse types of feedback is held back by limited standardized tooling available to researchers.

To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender.

The Application

User interface supporting multiple types of feedback

To collect multi-type feedback, RLHF-Blender implements five types of human feedback: ratings, rankings, comparisons, demonstrations, and descriptions.
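As an illustration of how such heterogeneous feedback might be brought into a common, aligned format for reward learning, here is a hypothetical sketch. The enum, field names, and the Bradley-Terry-style pair conversion are assumptions for illustration and do not reflect RLHF-Blender's internal representation.

```python
# Hypothetical common feedback format; all names are illustrative only.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class FeedbackType(Enum):
    RATING = auto()         # evaluative feedback on a single segment
    COMPARISON = auto()     # preference between two segments
    RANKING = auto()        # ordering over several segments
    DEMONSTRATION = auto()  # user-provided trajectory
    DESCRIPTION = auto()    # natural-language / feature-level description

@dataclass
class Feedback:
    type: FeedbackType
    segment_ids: list               # identifiers of the segments this feedback refers to
    value: Optional[float] = None   # e.g. a rating in [0, 1]
    text: Optional[str] = None      # e.g. a free-text description

def to_preference_pairs(fb):
    """Translate comparative feedback into (preferred, dispreferred) pairs,
    which a Bradley-Terry-style reward model can consume directly."""
    if fb.type is FeedbackType.COMPARISON:
        winner, loser = fb.segment_ids
        return [(winner, loser)]
    if fb.type is FeedbackType.RANKING:
        ids = fb.segment_ids  # assumed to be ordered from best to worst
        return [(ids[i], ids[j]) for i in range(len(ids)) for j in range(i + 1, len(ids))]
    return []  # ratings, demonstrations, and descriptions are handled by other losses

# Example: a ranking over three segments yields three preference pairs.
ranking = Feedback(FeedbackType.RANKING, segment_ids=["s2", "s0", "s1"])
print(to_preference_pairs(ranking))  # [('s2', 's0'), ('s2', 's1'), ('s0', 's1')]
```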

Episodes or segments can be visualised at the required level of detail.

Configurable interface to enable flexible experimentation

To support a wide range of possible experiments, the system can be configured, including its user interface, sampling, and reward learning components.
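A hypothetical experiment configuration could look like the following; the keys and values are illustrative assumptions, not RLHF-Blender's actual configuration schema.

```python
# Illustrative experiment configuration (not RLHF-Blender's actual schema).
experiment_config = {
    "environment": "BabyAI-GoToObj-v0",      # example task
    "ui": {
        "feedback_types": ["rating", "ranking", "comparison", "demonstration", "description"],
        "episode_detail": "full",             # how much of each episode to show
    },
    "sampling": {
        "source": "offline",                  # offline dataset vs. online rollouts
        "query_strategy": "uncertainty",      # how segments are selected for feedback
        "segments_per_query": 2,
    },
    "reward_model": {
        "architecture": "mlp",
        "training_mode": "online",            # update during the study or post hoc
    },
    "logging": {"store_feedback": True, "store_segments": True},
}
```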

Investigating dependencies and biases for different tasks, feedback types and users may improve learning from human feedback in the future.

Experiment Setups

Replication: Preference-Based Learning for Agent Training

Evaluating The Effectiveness of Multi-Type Feedback

Continuous Calibration of Feedback Irrationality

Exploring The Impact of Explainability on Human Feedback