Interactive Reinforcement Learning from Natural Language Feedback

Imene Tarakli, Samuele Vinanzi, Alessandro Di Nuovo

Sheffield Hallam University

i.tarakli@shu.ac.uk

Accepted at IROS 2024

Abstract

Large Language Models (LLMs) are playing an increasingly pivotal role in robotics development. This paper presents Evaluative Corrective Guidance LAnguage as reInfoRcement (ECLAIR), a novel approach that uses LLMs to interpret and integrate diverse natural language feedback into robotic learning. ECLAIR provides a unified framework that converts different types of human advice into actionable insights within Reinforcement Learning, enabling effective robot teaching. Experimental results with real-world participants show that ECLAIR significantly enhances robot learning, steering the robot’s policy closer to optimal outcomes from the start of learning, while notably reducing the need for extensive human intervention. Moreover, our findings highlight that ECLAIR effectively combines different types of advice and is robust to prompt modifications. Finally, we show that ECLAIR supports instruction across multiple languages, broadening its applicability and facilitating more inclusive human-robot teaching.

ECLAIR Framework

ECLAIR is a Reinforcement Learning (RL) framework that integrates different types of natural language feedback to interactively shape robots’ behaviours. The model consists of two phases:

Advice interpretation: we use LLMs to translate the spoken feedback into distinct values, specifically evaluative feedback, corrective feedback, and guidance for the next action.
Advice shaping: the different types of feedback are integrated into the RL algorithm to update and refine the robot's policy.
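
The two phases can be illustrated with a minimal sketch. It is not the ECLAIR implementation: the page does not specify the reply format of the LLM or the shaping rules, so the JSON schema, the `Advice` fields, the `ShapedQLearner` class, and the way each feedback type affects tabular Q-learning are all assumptions chosen for illustration.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Advice:
    """Interpreted feedback (hypothetical structure, not from the paper)."""
    evaluative: float = 0.0           # e.g. +1 for praise, -1 for criticism
    corrective: Optional[str] = None  # action the robot should have taken
    guidance: Optional[str] = None    # action the robot should take next


def parse_advice(llm_reply: str, actions) -> Advice:
    """Phase 1: parse an assumed JSON reply from the LLM into an Advice."""
    data = json.loads(llm_reply)

    def valid(action):
        # Discard anything that is not a known robot action.
        return action if action in actions else None

    return Advice(
        evaluative=float(data.get("evaluative") or 0.0),
        corrective=valid(data.get("corrective")),
        guidance=valid(data.get("guidance")),
    )


class ShapedQLearner:
    """Phase 2: tabular Q-learning with assumed shaping rules."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, bonus=1.0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.bonus = alpha, gamma, bonus
        self.q = defaultdict(float)  # maps (state, action) to a value

    def update(self, s, a, r, s_next, advice: Advice) -> None:
        # Assumption: evaluative feedback is added to the task reward.
        shaped = r + advice.evaluative
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        target = shaped + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        # Assumption: a correction raises the value of the suggested action.
        if advice.corrective is not None and advice.corrective != a:
            self.q[(s, advice.corrective)] += self.bonus

    def act(self, s, advice: Optional[Advice] = None) -> str:
        # Assumption: guidance overrides the greedy choice for one step.
        if advice is not None and advice.guidance is not None:
            return advice.guidance
        return max(self.actions, key=lambda b: self.q[(s, b)])
```

In this sketch, a reply such as `{"evaluative": -1, "corrective": "grasp", "guidance": "right"}` lowers the value of the action just taken, raises the value of `grasp` in the same state, and makes the agent pick `right` on the next step. Other integration schemes (for example, policy shaping instead of reward shaping) would fit the same interface.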

Performance Evaluation

We evaluate the efficacy of ECLAIR in enhancing the robot’s learning process from human advice and compare it against the TAMER baseline. We recruited 12 participants to teach the robot in real time with both methods.

[Videos: Teaching with ECLAIR; Teaching with TAMER]

ECLAIR demonstrates rapid learning, closely mirroring expert performance from the initial episodes. In contrast, TAMER shows a gradual and slower improvement, succeeding in picking and placing the cube only towards the end of the training sessions.

Thanks to its quick convergence, ECLAIR was over 70% more successful than TAMER across the teaching trials (p < 0.0001).

Feedback dynamics

Initially, ECLAIR receives more input from participants than TAMER. However, this trend reverses from the third episode onwards: ECLAIR feedback converges to zero, while TAMER feedback remains consistently high until the policy converges, after which it diminishes only slightly. The availability of multiple feedback channels in ECLAIR seemingly reduces the cognitive load on trainers, allowing them to offer varied and richer feedback at the initial stage of training, which enables quicker learning with less overall teaching effort.

Moreover, with ECLAIR, participants predominantly used evaluative feedback when instructing the robot, with guidance closely following, which indicates a tendency to combine these two feedback types when teaching. Corrective feedback was used notably less often.

LLM Robustness as an Advice Interpreter

We assess ECLAIR's adaptability to different prompting strategies and languages. For this, we generate a dataset of 20 instances of human advice with ground-truth labels.
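
As a rough illustration of this kind of evaluation, the sketch below computes label accuracy over a small labelled dataset. The example utterances, the label names, and the keyword-based `toy_interpret` function are hypothetical stand-ins: the actual dataset and the LLM interpreter are not reproduced here.

```python
from typing import Callable, Sequence, Tuple


def label_accuracy(
    dataset: Sequence[Tuple[str, str]],
    interpret: Callable[[str], str],
) -> float:
    """Fraction of utterances whose predicted label matches the ground truth."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(1 for utterance, label in dataset if interpret(utterance) == label)
    return correct / len(dataset)


def toy_interpret(utterance: str) -> str:
    """Keyword-based stand-in for the LLM interpreter (illustration only)."""
    text = utterance.lower()
    if "should have" in text:
        return "corrective"
    if "next" in text or "now" in text:
        return "guidance"
    return "evaluative"


# Hypothetical examples; the real dataset contains 20 labelled instances.
examples = [
    ("Good job!", "evaluative"),
    ("You should have grasped the cube", "corrective"),
    ("Now move to the right", "guidance"),
    ("Go left", "guidance"),
]

accuracy = label_accuracy(examples, toy_interpret)
```

Swapping `toy_interpret` for a function that queries an LLM with a given prompt or language would allow the same metric to be compared across prompting strategies and translations.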

Few-shot vs. Zero-shot Prompting

We examine ECLAIR's performance with both few-shot and zero-shot prompts. Its performance remains stable regardless of the prompting method, suggesting that the LLM's pre-training is sufficient for this task and that explicit examples are not required.

Prompt Rewording

Performance differs only minimally between prompt wordings, which suggests that ECLAIR is robust to changes in prompt wording.

Multilingual Interactions

We tested ECLAIR on datasets translated into 4 additional languages. Label accuracy is consistent across these languages, with no significant difference, so ECLAIR can be used in different languages.
