Text2Interaction: Establishing Safe and Preferable
Human-Robot Interaction
Jakob Thumm 1, Christopher Agia 2, Marco Pavone 2, Matthias Althoff 1
1 Technical University of Munich, 2 Stanford University
To appear at Conference on Robot Learning (CoRL) 2024
Abstract
Adjusting robot behavior to human preferences can require intensive human feedback, preventing quick adaptation to new users and changing circumstances. Moreover, current approaches typically treat user preferences as a reward, which requires a manual balance between task success and user satisfaction. To integrate new user preferences in a zero-shot manner, our proposed Text2Interaction framework invokes large language models to generate a task plan, motion preferences as Python code, and parameters of a safe controller. By maximizing the combined probability of task completion and user satisfaction instead of a weighted sum of rewards, we can reliably find plans that fulfill both requirements. We find that 83% of users working with Text2Interaction agree that it integrates their preferences into the robot’s plan, and 94% prefer Text2Interaction over the baseline. Our ablation study shows that Text2Interaction aligns better with unseen preferences than other baselines while maintaining a high success rate.
Human Preferences in Robotics
[Figure: four example behaviors (Behavior 1–4) illustrating that different users prefer different robot behaviors]
➡️ Different users have varying preferences. This naturally applies to robotics as well!
➡️ Previous works require many human inputs (>10) to learn a user's preferred behavior.
➡️ Our goal is to incorporate human preference from a single user instruction instead!
The Text2Interaction Framework
Skills: To solve requested tasks, we chain multiple skills together. Each skill consists of:
A primitive, e.g., pick, place, handover, push, ...
An action that parametrizes the primitive, e.g., how to pick, where to place, ...
A controller that executes the primitive. We use our provably safe controller for human-robot interaction.
A vector of controller parameters, e.g., max. velocity, stiffness, damping, ...
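As a minimal sketch, the skill structure above could be represented as follows (the class, field names, and values are illustrative assumptions, not the authors' actual API):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Illustrative sketch of a skill as described above; names are hypothetical.
@dataclass
class Skill:
    primitive: str                   # e.g., "pick", "place", "handover", "push"
    action: Sequence[float]          # parametrizes the primitive, e.g., a grasp pose
    controller: Callable             # executes the primitive (e.g., a safe controller)
    controller_params: dict          # e.g., {"max_velocity": 0.3, "stiffness": 200.0}

# A task plan is then a chain of skills:
task_plan = [
    Skill("pick", [0.1, 0.2, 0.05], controller=lambda: None,
          controller_params={"max_velocity": 0.3}),
    Skill("handover", [0.4, 0.0, 0.3], controller=lambda: None,
          controller_params={"stiffness": 150.0}),
]
```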
Models: For each primitive, we learn
a policy,
a Q-function, and
a transition distribution.
Objective: The objective of our work is to maximize the combined probability of user satisfaction and task success.
Preferences: We identify three main human preferences when working together with robots:
Task-level preferences: What should the robot do?
Motion-level preferences: Which path should the robot choose?
Control-level preferences: How fast, soft, or precise should the robot be?
Formulation: We show that the objective can be factored into a product of probability terms, each of which we approximate individually (the full derivation is given in the paper).
Solution: To fulfill the user instructions, we let a large language model (LLM) generate:
A task plan (sequence of primitives)
A set of controller parameters (choose from predefined)
Motion preferences as executable Python functions
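As an illustration, a generated motion preference for the block-alignment task could look like the following sketch (the function name, signature, and scoring are our assumptions, not the framework's actual interface):

```python
import math

# Hypothetical example of an LLM-generated motion preference:
# prefer 2-D block placements that lie on a straight line.
def preference_straight_line(points):
    """Return a score in [0, 1]; 1 means the points are perfectly collinear."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 scatter matrix of the centered points.
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]]: for collinear points,
    # all spread concentrates in the largest eigenvalue.
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    root = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + root, tr / 2 - root
    return lam1 / (lam1 + lam2 + 1e-9)

collinear = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2)]
scattered = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.0)]
```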
Furthermore, we approximate the geometric feasibility (task success) with the learned Q-functions of the primitives.
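In spirit, plan selection then scores each candidate by the product of its Q-value (a proxy for the probability of task success) and its preference score, rather than a weighted sum. A minimal sketch, with all names and numbers hypothetical:

```python
# Hypothetical candidate plans; q_success stands in for the learned
# per-primitive Q-functions that approximate geometric feasibility.
candidates = [
    {"name": "plan_a", "q_success": 0.99, "preference": 0.2},
    {"name": "plan_b", "q_success": 0.70, "preference": 0.8},
]

def probabilistic_score(c):
    # Product of probabilities: success AND satisfaction must both be high.
    return c["q_success"] * c["preference"]

def additive_score(c, w=0.2):
    # Common baseline: a weighted sum, which requires manually tuning w.
    return (1 - w) * c["q_success"] + w * c["preference"]

best = max(candidates, key=probabilistic_score)
```

With these numbers, the product picks `plan_b` (high preference, still feasible), while the additive baseline with a small weight on preference picks `plan_a`, illustrating why the weighted sum needs manual balancing.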
Results
Real-world user study
We invited 18 participants to test Text2Interaction live on a real robot. All participants executed the four experiments shown in the supplementary video below. Users perceived Text2Interaction as significantly more intelligent (p ≤ 0.005), cooperative (p ≤ 0.01), comfortable (p ≤ 0.025), and trustworthy (p ≤ 0.05) than the baseline.
Baseline (no preference)
Text2Interaction (with preference)
Ablation study
In our ablation study, we evaluate how well Text2Interaction performs on new user instructions and how it compares against other baselines. For this, we manually define 15 object arrangement tasks. To solve each task, we randomly add three out of these 15 tasks as in-context examples to the prompt of Text2Interaction, with a new random selection for each task. We repeat this three times per task, leading to 45 test trials. Each trial is then executed 100 times from random initial states. We compare Text2Interaction to the oracle (true preference functions), a baseline only optimizing for task success, and a baseline treating the success and human preference as additive rewards.
Our results show that Text2Interaction produces suitable preference functions for most tasks despite the low number of random in-context examples. Furthermore, we find that Text2Interaction achieves high preference values with only minor reductions in success rate. Our probabilistic formulation outperforms the additive reward formulation commonly found in the literature.
Example task: "Place the three blocks in a straight line."
Oracle (ground-truth preference)
Text2Interaction (generated preference function)
Baseline 1 (optimizes only for task success)
Baseline 2 (additive reward formulation)
Citation
If you found this work interesting, please consider citing:
@article{thumm2024text2interaction,
title = {Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction},
author = {Thumm, Jakob and Agia, Christopher and Pavone, Marco and Althoff, Matthias},
journal = {arXiv preprint arXiv:2408.06105},
year = {2024}
}