Personalization in Human-Robot Interaction through Preference-based Action Representation Learning
Ruiqi Wang, Dezhong Zhao, Dayoon Suh, Ziqin Yuan,
Guohua Chen, and Byung-Cheol Min
To be presented at ICRA 2025
Abstract
Preference-based reinforcement learning (PbRL) has shown significant promise for personalization in human-robot interaction (HRI) by explicitly integrating human preferences into the robot learning process. However, existing practices often require training a personalized robot policy from scratch, resulting in inefficient use of human feedback. In this paper, we propose preference-based action representation learning (PbARL), an efficient fine-tuning method that decouples common task structure from preference by leveraging pre-trained robot policies. Instead of directly fine-tuning the pre-trained policy with human preference, PbARL uses it as a reference for an action representation learning task that maximizes the mutual information between the pre-trained source domain and the target user preference-aligned domain. This approach allows the robot to personalize its behaviors while preserving original task performance and eliminates the need for extensive prior information from the source domain, thereby enhancing efficiency and practicality in real-world HRI scenarios. Empirical results on the Assistive Gym benchmark and a real-world user study (N=8) demonstrate the benefits of our method compared to state-of-the-art approaches.
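For reference, the mutual information being maximized is the standard information-theoretic quantity; for two random variables X and Y (here, loosely, behaviors in the pre-trained source domain and in the preference-aligned target domain), it is defined as

$$ I(X; Y) = \mathbb{E}_{p(x,y)}\!\left[ \log \frac{p(x,y)}{p(x)\,p(y)} \right], $$

so maximizing it encourages the personalized action space to retain as much of the source policy's behavioral information as possible. The exact objective used in PbARL is described in the paper; this is only the generic definition.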
Comparison with Previous Preference-based Methods for Personalization
Comparison of our method with previous preference-based approaches for personalized adaptation. Unlike the common PbRL regime, which trains personalized policies from scratch, our method shifts toward fine-tuning to leverage human feedback more efficiently. Instead of using the preference-aligned reward model to directly adjust the pre-trained policy via RL, we employ it in an action representation learning task to train a mutual information encoder, preserving the pre-trained task performance while enhancing personalization.
Framework Overview
Overview of PbARL. We train PbARL using transition tuples (current state, action distribution, and next state) collected by rolling out a pre-trained robot policy in the environment. The objective is to learn a harmonized latent action space within the mutual information state encoder, implemented as a conditional VAE, by jointly optimizing three losses: a reconstruction loss; a preference loss that reflects the consistency between the original action ranking and the list re-ranked by scores from the preference-aligned reward model; and a Kullback–Leibler (KL) loss that regularizes the latent space of the VAE. To enhance controllability and scalability of the learned latent action space, we also train a latent transition model as an auxiliary task, optimized via a dynamics loss. A minimal sketch of how these losses might be combined appears below.
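The following is a minimal PyTorch-style sketch of the four losses, under stated assumptions: the network sizes, tensor shapes, listwise ranking surrogate, and names (CVAEActionEncoder, pbarl_losses, reward_model, transition_model) are illustrative, not the authors' implementation, and reward_model / transition_model are assumed to be callables provided elsewhere.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAEActionEncoder(nn.Module):
    """Conditional VAE over actions, conditioned on the current state."""
    def __init__(self, state_dim, action_dim, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim))           # outputs mean and log-variance
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim))

    def forward(self, state, action):
        mu, logvar = self.encoder(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon = self.decoder(torch.cat([state, z], dim=-1))
        return recon, mu, logvar, z


def pbarl_losses(model, transition_model, reward_model, state, actions, next_state):
    """Reconstruction + preference-ranking + KL + dynamics losses.

    state:      (B, S)    current states
    actions:    (B, K, A) candidate actions from the pre-trained policy,
                sorted by the policy's own preference (best first)
    next_state: (B, S)    observed next states
    """
    B, K, A = actions.shape
    s = state.unsqueeze(1).expand(B, K, state.shape[-1]).reshape(B * K, -1)
    a = actions.reshape(B * K, A)
    recon, mu, logvar, z = model(s, a)

    # 1) Reconstruction loss: keep latent actions faithful to the source policy.
    loss_rec = F.mse_loss(recon, a)

    # 2) Preference loss: score decoded actions with the preference-aligned
    #    reward model and push their ranking toward the original ordering
    #    (a listwise softmax consistency term is used here as a simple surrogate).
    scores = reward_model(s, recon).reshape(B, K)
    target = torch.arange(K, 0, -1, device=scores.device, dtype=scores.dtype).expand(B, K)
    loss_pref = F.kl_div(F.log_softmax(scores, dim=-1),
                         F.softmax(target, dim=-1), reduction="batchmean")

    # 3) KL loss: regularize the latent action space toward a unit Gaussian.
    loss_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # 4) Dynamics loss (auxiliary latent transition model): predict next state.
    pred_next = transition_model(torch.cat([s, z], dim=-1))
    ns = next_state.unsqueeze(1).expand(B, K, next_state.shape[-1]).reshape(B * K, -1)
    loss_dyn = F.mse_loss(pred_next, ns)

    return loss_rec + loss_pref + loss_kl + loss_dyn
```

In practice the individual terms would typically be weighted, and the preference loss could use any pairwise or listwise ranking surrogate; the sketch only illustrates how the four signals from the figure fit together.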
Experiments on Benchmark
We first evaluated the proposed PbARL on the Assistive Gym benchmark, focusing on three distinct assistive HRI tasks: Feeding, Drinking, and Itch Scratching.
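As a rough sketch of how transition tuples can be collected in Assistive Gym, the snippet below rolls out a policy in the three tasks; the environment IDs depend on the installed assistive-gym version and robot embodiment and are assumptions here, and a random action stands in for the pre-trained robot policy.

```python
import gym
import assistive_gym  # registers the assistive environments with gym

transitions = []
# Assumed environment IDs; substitute the task/robot variants actually used.
for env_id in ["FeedingSawyer-v1", "DrinkingSawyer-v1", "ScratchItchSawyer-v1"]:
    env = gym.make(env_id)
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()   # placeholder for a pre-trained policy
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, next_obs))
        obs = next_obs
    env.close()
```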
Demos of User Study
To evaluate the personalization capabilities of our method in realistic HRI scenarios, we conducted a real-world user study on the Feeding task.
Interaction Demos with Pre-trained Robot Policy
Demos of Personalization Performance Comparison