Text2Interaction: Establishing Safe and Preferable
Human-Robot Interaction

Jakob Thumm¹, Christopher Agia², Marco Pavone², Matthias Althoff¹

¹ Technical University of Munich, ² Stanford University

To appear at the Conference on Robot Learning (CoRL) 2024

Abstract

Adjusting robot behavior to human preferences can require intensive human feedback, preventing quick adaptation to new users and changing circumstances. Moreover, current approaches typically treat user preferences as a reward, which requires a manual balance between task success and user satisfaction. To integrate new user preferences in a zero-shot manner, our proposed Text2Interaction framework invokes large language models to generate a task plan, motion preferences as Python code, and parameters of a safe controller. By maximizing the combined probability of task completion and user satisfaction instead of a weighted sum of rewards, we can reliably find plans that fulfill both requirements. We find that 83% of users working with Text2Interaction agree that it integrates their preferences into the robot’s plan, and 94% prefer Text2Interaction over the baseline. Our ablation study shows that Text2Interaction aligns better with unseen preferences than other baselines while maintaining a high success rate.

Human Preferences in Robotics

Figure: four example robot behaviors (Behavior 1–4).

➡️ Different users have varying preferences. This naturally applies to robotics as well!

➡️ Previous works require many human inputs (>10) to learn the preferred behavior of a user.

➡️ Our goal is to incorporate human preference from a single user instruction instead!

The Text2Interaction Framework

Skills: To solve requested tasks, we chain multiple parameterized skills (primitives) together.

Models: For each primitive, we learn a policy and a corresponding Q-function using offline reinforcement learning.

Objective: We maximize the combined probability of user satisfaction and task success instead of a weighted sum of rewards.

Preferences: We identify three main types of human preferences when working with robots, mirroring the three outputs our framework generates: preferences about the task plan (what the robot does), about the robot's motion (how it moves), and about its safety behavior (how it acts around the human).

Formulation: We show that we can extend our objective and approximate each probability term, yielding a factorized planning objective (sketched below).
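As a rough illustration of this factorization (our own notation, not necessarily the paper's exact formulation; Q^{\pi_k} denotes the learned critic of the k-th skill and \xi_k the corresponding preference function), the objective over a sequence of skill parameters a_{1:N} can be sketched as

$$ a_{1:N}^{*} = \arg\max_{a_{1:N}} \; p\bigl(\text{success} \wedge \text{satisfied} \mid s_1, a_{1:N}\bigr) \;\approx\; \arg\max_{a_{1:N}} \; \prod_{k=1}^{N} Q^{\pi_k}(s_k, a_k) \cdot \prod_{k=1}^{N} \xi_k(s_k, a_k), $$

where each Q^{\pi_k}(s_k, a_k) is read as the probability that skill k succeeds from state s_k with parameters a_k, each \xi_k scores how well the resulting motion matches the stated preference, and the factorization assumes approximate conditional independence across skills.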

Solution: To fulfill the user instructions, we let a large language model (LLM) generate (1) a task plan, (2) motion preference functions as Python code, and (3) the parameters of a safe controller.

Furthermore, we approximate the geometric feasibility (task success) with the learned Q-functions of the primitives.
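As a minimal sketch of how these pieces could fit together at planning time (the function and variable names below are our own illustration, not the released code), candidate skill parameters can be scored by multiplying the per-skill Q-values with the values of the generated preference functions:

import numpy as np

def score_plan(skills, params, states, q_functions, preference_fns):
    """Score one candidate plan as the product of per-skill success estimates
    (Q-values) and per-skill preference values, following the objective above."""
    score = 1.0
    for skill, a, s in zip(skills, params, states):
        p_success = float(np.clip(q_functions[skill](s, a), 0.0, 1.0))  # learned critic, read as a success probability
        p_prefer = float(np.clip(preference_fns[skill](s, a), 0.0, 1.0))  # LLM-generated preference function
        score *= p_success * p_prefer
    return score

A planner can then, for example, pick the highest-scoring parameter sequence among several sampled candidates.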

Results

Real-world user study

We invited 18 participants to test Text2Interaction live on a real robot. Each participant executed the four experiments shown in the supplementary video below. Users perceived Text2Interaction as significantly more intelligent (p ≤ 0.005), cooperative (p ≤ 0.01), comfortable (p ≤ 0.025), and trustworthy (p ≤ 0.05) than the baseline.

Baseline (no preference)

Text2Interaction (with preference)

Ablation study

In our ablation study, we evaluate how well Text2Interaction handles new user instructions and how it compares against other baselines. For this, we manually define 15 object arrangement tasks. To solve each task, we randomly add three of these 15 tasks as in-context examples to the Text2Interaction prompt, drawing a new random selection each time. We repeat this three times per task, resulting in 45 test trials, and execute each trial 100 times from random initial states. We compare Text2Interaction against an oracle (true preference functions), a baseline that only optimizes for task success, and a baseline that treats success and human preference as additive rewards.

Our results show that Text2Interaction produces suitable preference functions for most tasks despite the small number of randomly chosen in-context examples. Furthermore, Text2Interaction achieves high preference values with only minor reductions in success rate. Our probabilistic formulation outperforms the additive reward formulation commonly found in the literature.
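To make this comparison concrete (our shorthand, not the paper's exact notation), one plausible reading of the additive baseline scores a plan with a weighted sum that requires a hand-tuned trade-off weight \lambda, whereas our probabilistic formulation multiplies the terms:

$$ J_{\text{add}}(a_{1:N}) = \sum_{k=1}^{N} Q^{\pi_k}(s_k, a_k) + \lambda \sum_{k=1}^{N} \xi_k(s_k, a_k), \qquad J_{\text{prob}}(a_{1:N}) = \prod_{k=1}^{N} Q^{\pi_k}(s_k, a_k)\,\xi_k(s_k, a_k). $$

In the product form, a plan that is very unlikely to succeed or very unlikely to satisfy the user scores near zero regardless of the other term, so no manual balancing is needed.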

Example task: "Place the three blocks in a straight line":

Oracle
(ground truth preference)

Text2Interaction
(generated preference function)

Baseline 1
(only optimizes for task success)

Baseline 2
(additive reward formulation)
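For the example task above, an LLM-generated preference function might look roughly like the following sketch (illustrative only, not the actual generated code): it returns a value close to 1 when the three block positions are nearly collinear.

import numpy as np

def preference_line_arrangement(block_positions: np.ndarray) -> float:
    """Return a value in [0, 1] that is high when three blocks lie on a straight line.

    block_positions: array of shape (3, 2) with the planar (x, y) positions of the blocks in meters.
    """
    a, b, c = block_positions
    # Twice the area of the triangle spanned by the blocks; zero iff they are collinear.
    cross = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    # Map the deviation from collinearity to (0, 1]; the scale 0.01 m^2 is an arbitrary choice.
    return float(np.exp(-cross / 0.01))

Such a function can be plugged directly into the plan-scoring objective above as one of the \xi_k terms.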

Citation

If you found this work interesting, please consider citing:

@article{thumm2024text2interaction,
  title   = {Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction},
  author  = {Thumm, Jakob and Agia, Christopher and Pavone, Marco and Althoff, Matthias},
  journal = {arXiv preprint arXiv:2408.06105},
  year    = {2024}
}

Acknowledgements