Towards Uncertainty Unification: A Case Study for Preference Learning
Accepted to RSS 2025
Learning human preferences is essential for human-robot interaction, as it enables robots to adapt their behaviors to align with human expectations and goals. However, the inherent uncertainties in both human behavior and robotic systems make preference learning a challenging task. While probabilistic robotics algorithms offer uncertainty quantification, the integration of human preference uncertainty remains underexplored. To bridge this gap, we introduce uncertainty unification and propose a novel framework, uncertainty-unified preference learning (UUPL), which enhances Gaussian Process (GP)-based preference learning by unifying human and robot uncertainties. Specifically, UUPL includes a human preference uncertainty model that improves GP posterior mean estimation, and an uncertainty-weighted Gaussian Mixture Model (GMM) that enhances GP predictive variance accuracy. Additionally, we design a user-specific calibration process to align uncertainty representations across users, ensuring consistency and reliability in model performance. Comprehensive experiments and user studies demonstrate that UUPL achieves state-of-the-art performance in both prediction accuracy and user ratings. An ablation study further validates the effectiveness of the human uncertainty model and the uncertainty-weighted GMM in UUPL.
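For context, the sketch below shows the standard GP-based preference learning baseline that UUPL extends: a latent utility function with a GP prior and a probit likelihood over pairwise comparisons, fit by MAP estimation. This is a minimal illustration in the spirit of classic GP preference learning (e.g., Chu and Ghahramani, 2005), not the paper's implementation; the kernel, noise scale, and optimizer are assumptions.

# A minimal, hypothetical sketch of GP-based preference learning: a latent
# utility f with a GP prior and a probit likelihood over pairwise comparisons.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rbf_kernel(X, Y, lengthscale=0.5):
    """Squared-exponential kernel between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def fit_map_utilities(X, prefs, noise=1.0):
    """MAP estimate of the latent utilities f at the queried points X.

    prefs is a list of (i, j) pairs meaning "point i is preferred to j";
    each pair contributes a probit term Phi((f_i - f_j) / (sqrt(2)*noise)).
    """
    K = rbf_kernel(X, X) + 1e-4 * np.eye(len(X))  # jitter for stability

    def neg_log_posterior(f):
        prior = 0.5 * f @ np.linalg.solve(K, f)   # GP prior term
        lik = sum(norm.logcdf((f[i] - f[j]) / (np.sqrt(2) * noise))
                  for i, j in prefs)
        return prior - lik

    res = minimize(neg_log_posterior, np.zeros(len(X)))
    return res.x

# Toy usage: five 1D candidates; the user prefers values near 0.5.
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
prefs = [(2, 0), (2, 4), (1, 0), (3, 4)]
print(fit_map_utilities(X, prefs))  # estimated utility peaks near index 2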
Imagine a robot using preference learning to infer the ideal trajectory for passing a cup of coffee over a table to a human user, Alice. In one sample pair, trajectory $x^{(1)}$ risks spilling coffee on the keyboard, while trajectory $x^{(2)}$ risks spilling it on the headphones. Both the keyboard and the headphones are valuable to Alice, so she responds, with hesitation, that she weakly prefers $x^{(2)}$, reflecting her uncertainty in the decision. An uncertainty-averse model ignores this nuance and may learn a suboptimal, undesirable trajectory (e.g., one that still passes above the headphones). In contrast, an uncertainty-unified model incorporates Alice's expressed uncertainty into its uncertainty-aware framework, enabling it to learn an ideal trajectory that aligns with her true preferences.
Imagine a robot inferring a user's preferred room temperature. For each query, we collect the user's preference with the associated uncertainty level. To begin, a calibration process (blue box) interprets the user's definitions of "confident" and "uncertain", ensuring these subjective assessments are accurately quantified with uncertainty factors $u$. Then, we construct the human preference uncertainty model as a probit model using the Gaussian CDF, with the calibrated $u$ as the standard deviation (left part of purple box). This model improves the GP mean estimation accuracy (right part of purple box). Additionally, we introduce a weighted GMM (left part of red box) to adaptively scale the GP predictive variance (right part of red box) based on the human uncertainty level, enhancing its interpretability. Through this approach, UUPL effectively integrates human uncertainty into both the GP mean and variance, achieving comprehensive uncertainty unification, and thus provides a more accurate, interpretable, and user-aligned learning result (rightmost picture).
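To make these two components concrete, the sketch below illustrates how a calibrated uncertainty factor $u$ can enter a probit likelihood as a standard-deviation scale, and how an uncertainty-weighted GMM can rescale the GP predictive variance. This is a hypothetical simplification consistent with the description above, not the paper's exact formulation: the function names, weighting scheme, and bandwidth are all illustrative assumptions.

# A minimal sketch of the two UUPL ingredients described above, under loud
# assumptions: the calibrated per-query uncertainty factor u scales the
# probit standard deviation, and the weighted GMM is a simple kernel
# construction over past queries. See the paper for the exact model.
import numpy as np
from scipy.stats import norm

def uncertain_preference_loglik(f_i, f_j, u, base_noise=1.0):
    """Probit preference log-likelihood with a calibrated uncertainty factor u.

    A hesitant answer (large u) flattens the Gaussian CDF, so it shifts the
    GP posterior mean less than a confident answer (small u) does.
    """
    z = (f_i - f_j) / (np.sqrt(2) * base_noise * u)
    return norm.logcdf(z)

def gmm_variance_scale(x, query_points, u, bandwidth=0.2):
    """Uncertainty-weighted GMM factor for rescaling the GP predictive variance.

    Each past query contributes a Gaussian bump weighted by its uncertainty
    factor, so the scaled variance stays high near queries answered with
    hesitation and shrinks near confidently answered ones.
    """
    dists = np.linalg.norm(query_points - x, axis=1)
    bumps = np.exp(-0.5 * (dists / bandwidth) ** 2)
    weights = u / u.sum()              # hesitant queries weigh more
    return 1.0 + weights @ bumps       # multiplies the GP variance

# Toy usage: inflate variance near a query the user answered hesitantly.
queries = np.array([[0.2], [0.5], [0.8]])
u = np.array([0.5, 2.0, 1.0])          # calibrated uncertainty factors
print(gmm_variance_scale(np.array([0.5]), queries, u))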
Three simulation experiments and their results. (a) The 1D thermal comfort function used in Simulation 1, with results shown in (d). (b) The 2D tabletop importance function for Simulation 2, with results plotted in (e). (c) Simulation 3, where the left panel depicts two possible trajectories for the red user car based on the blue agent car's action, and the right panel plots the four trajectory-related feature functions. Results are presented in (f).
An example of the tabletop importance task. The red box shows the user interface, and the green box illustrates the tabletop setup, with the robot trying to move from the blue star to the red star.
Illustration of the apple pick-and-place task. The dashed pink line is one possible trajectory. The robot starts at the blue star, passes through the yellow star, and reaches the red star.
@article{peng2025towards,
  title={Towards Uncertainty Unification: A Case Study for Preference Learning},
  author={Peng, Shaoting and Chen, Haonan and Driggs-Campbell, Katherine},
  journal={arXiv preprint arXiv:2503.19317},
  year={2025}
}