Location: RTH 109
June 25, 9am-5pm
The rapid evolution of generative modeling has led to tremendous advances in recent years, enabling breakthroughs across a wide range of domains, including image and video synthesis, natural language processing, and robotics. Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models, and foundation models are transforming how agents perceive, learn, and interact with the world. Inspired by this progress, this workshop explores the potential of applying generative modeling techniques to enhance human-robot interaction (HRI). We aim to gather the robot learning and Human-Robot Interaction communities to discuss cutting-edge generative modeling techniques, the modeling of human behaviors and robot motions, and opportunities to use them to achieve more intuitive human-robot interactions for robot teaching, data collection, language-based interaction, and collaborative execution.
Why are generative models important for research in HRI? HRI will benefit greatly from powerful large models that bring open-world knowledge and generalization to classic HRI interaction workflows. Just as ChatGPT has become popular among non-technical users, it is only a matter of time before large models with vision and language capabilities play a key role in generating and mediating interaction between humans and robots, both in daily life settings (home robots learning household tasks from examples) and in industrial deployments (co-bots in manufacturing). Generative models are also key to the creation of simulation environments (3D assets, scenes, tasks, language commands, and language-based task generation), and simulation environments are in turn useful for collecting human demonstrations, generating data, and training policies. It is important for HRI researchers to foster collaborations that investigate how multi-agent interactions and human-like behaviors will play a role in these systems, whether in simulated or real settings.
Why is HRI important to research in generative models? Conversely, HRI is pivotal for advancing research in generative models. Human interaction and feedback are essential for producing high-quality data for learning and for value-aligned training. For example, reinforcement learning from human feedback (RLHF) has delivered significant advances in model performance, enabling ChatGPT to surpass models trained on static language datasets. Generative models applied to robotics are fundamentally tied to human interaction. In data collection pipelines, we need to provide users with tools, methods, and interfaces to provide and curate high-quality data for learning algorithms. For model improvement, we need human feedback in the loop with policy learning and fine-tuning iterations during deployment. These are core interaction problems long studied in HRI that are now ripe to be used in the loop with generative AI in both training and inference, bringing knowledge from interactions and human-centered modeling into robot learning.
Link to last year's workshop: https://sites.google.com/view/gai-hri-2024
Invited Speakers:
Roozbeh Mottaghi (Meta, UW) - 9:10 am
Yonatan Bisk (CMU) - 9:40 am
Dorsa Sadigh (Stanford) - 10:10 am
Nadia Figueroa (UPenn) - 11:30 am
Tapomayukh Bhattacharjee (Cornell) - 1:30 pm
Brenna Argall (Northwestern University) - 2:00 pm
Vincent Vanhoucke (Waymo) - 4:10 pm
Location: RTH 109. Map: https://roboticsconference.org/program/workshops/
Morning Session. Chair: Claudia D'Arpino
9:00 Workshop Intro - Minyoung Hwang
9:10 Roozbeh Mottaghi -
Visual Embodied Planning
Abstract: Current literature and demos in robotics often focus on tabletop or single-room scenarios. However, to develop useful household robots, we must plan for long-horizon scenarios in partially observable environments. A key challenge in such scenarios is the presence of perception and actuation noise, as well as dynamic environments that change over time, which makes planning even more difficult. In this talk, I will present PARTNR, a benchmark designed for planning and reasoning in human-robot collaborative tasks. I will provide a comprehensive analysis of how LLMs behave as planners across various scenarios, highlighting their strengths and limitations. Additionally, I will demonstrate how we can distill knowledge from these large models into smaller, faster models using synthetic collaborative planning data, enabling more efficient planning. The second part of the talk will focus on an embodied learnable memory - a VLM that performs memory operations to capture the state of the environment over time. This model offers several advantages, including open-vocabulary support, error correction, handling dynamic environments, and a unified architecture, unlike recent methods that rely on multiple distinct models. Furthermore, I will introduce a new reinforcement learning method that significantly improves planning performance compared to strong baselines, leveraging the proposed memory.
9:40 Yonatan Bisk -
Simulating People with Language Models -- Good Idea? Bad Idea?
10:10 Dorsa Sadigh -
Steerable and Interactive Robots in the Era of Large Models
10:40 Coffee Break & Poster Session
Poster Stand Numbers: 94-103 (your number = paper ID + 93)
11:30 Nadia Figueroa -
All Generative Models Are Wrong, But Some Are Useful (When Combined with Structured Methods)
12:00 Panel Discussion 1: Are LLMs, VLMs, VLAs robust enough for simulation and interaction?
Yonatan Bisk, Nadia Figueroa, Roozbeh Mottaghi, moderated by Claudia D'Arpino
12:30 Lunch
Afternoon Session. Chair: Harold Soh
1:30 Tapomayukh Bhattacharjee -
Stakeholder-Informed Physical Assistance: Leveraging Generative AI towards Real-World Caregiving Robots
Abstract: How can generative AI help robots deliver meaningful physical assistance in caregiving tasks? In care settings, robots must operate in unstructured, dynamic environments -- physically interacting with people and objects to support tasks such as feeding, meal preparation, transferring, and bed-bathing. These scenarios demand not only robust perception and control across a variety of scenarios, but also systems that can reason about user intent, personalize their behavior, and communicate effectively with users. In this talk, I will present our ongoing work on building stakeholder-informed caregiving robots that leverage generative AI tools to better understand users while adapting and explaining the robots' own actions. By combining generative models with real-world deployment constraints, we aim to create physically assistive robots that are more capable -- grounded in physical intelligence and responsive to the unique needs and contexts of their users.
2:00 Brenna Argall -
Individualized Models of Human Control in Assistive Robotics
Abstract: As need increases, access decreases. It is a paradox that as human motor impairments become more severe, and increasing needs pair with decreasing motor ability, the very machines created to provide assistance become less and less accessible to operate with independence. My lab addresses this paradox by incorporating robotics intelligence into machines that enable mobility and manipulation: leveraging robotics autonomy to advance human autonomy. In this talk, I will overview research in my lab that models the human control signals issued to operate assistive robots, aiming to mine as much information as possible from the limited control signals issued by persons with severe motor impairments.
2:30 Student Spotlight Talk (Paper IDs 7, 8, 9)
3:00 RSS Selected Paper Talk - Demonstrating Arena 5.0 - Linh Kästner
Abstract: Building upon the foundations laid by our previous work, this paper introduces Arena 5.0, the fifth iteration of our framework for robotics social navigation development and benchmarking. Arena 5.0 provides three main contributions: 1) The complete integration of NVIDIA Isaac Gym, enabling photorealistic simulations and more efficient training. It seamlessly incorporates Isaac Gym into the Arena platform, allowing the use of existing modules such as randomized environment generation, evaluation tools, ROS2 support, and the integration of planners, robot models, and APIs within Isaac Gym. 2) A comprehensive benchmark of state-of-the-art social navigation strategies, evaluated on a diverse set of generated and customized worlds and scenarios of varying difficulty levels. These benchmarks provide a detailed assessment of navigation planners using a wide range of social navigation metrics. 3) An extensive set of modules for specified and highly customizable scenario generation and task planning, facilitating improved and customizable generation of social navigation scenarios, such as emergency and rescue situations. The platform's performance was evaluated by generating the aforementioned benchmark and through a comprehensive user study, demonstrating significant improvements in usability and efficiency compared to previous versions. Arena 5.0 is open source and available at https://github.com/Arena-Rosnav
3:30 Coffee Break
4:10 Panel Discussion 2: Foundation models for shared autonomy and assistive robots: limitations and opportunities
Vincent Vanhoucke, Brenna Argall, Tapomayukh Bhattacharjee, moderated by Harold Soh
5:00 Best Paper Award Announcement & Wrap Up
Motion and Behavior Modelling and Generation
Generative modeling of human-like behaviors in imitation learning
Generative modeling of valid robot motion plans and TAMP
Generative modeling of human-robot interactions
Imitation learning and learning from demonstrations (motion and tasks)
Imitation of multi-agent collaborative tasks
Diffusion Models for motion and behavior generation
Generation of scenes, tasks, and interactive behaviors in simulation
Human Interaction for Goal Specification on Generative Models
Teleoperation and shared autonomy
User goal specification for interactively commanding a robot
Goal abstractions using language
Interfaces for robot teaching
Inference-time policy adaptation using human feedback
Large Language Models (LLMs) and Vision Language Models (VLMs) in HRI
LLMs and VLMs for embodied AI
Generative models (LLMs/VLMs) for offline evaluation
Generative models with multi-modality (vision, language, audio, tactile)
Generative models of speech for HRI (dialogue, empathy, engagement)
LLMs as planners for behavior generation
LLMs and VLMs as reward functions or success detectors
AI-HRI Safety and Alignment
Risks and biases of using generative models for data generation and interaction
Safely deploying generative models for HRI
Out-of-distribution settings in HRI
Uncertainty and Misalignment Detection
Spotlight talks: Spotlight papers will be presented as 7-minute talks (in person or as pre-recorded videos) during the 2:30 pm session.
Poster session: All accepted papers will also be presented as posters during the poster session. Poster Stand Numbers: 94-103 (your number = paper ID + 93).
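For example, paper #1 presents at stand 94 and paper #10 at stand 103.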
Accepted Papers:
#1: Improving Human-Robot Interaction via a Population of Synthetic Human-like Teams
Siddharth Srikanth, Varun Bhatt, Michael Lewis, Katia P. Sycara, Aaquib Tabrez, Stefanos Nikolaidis
#2: MDG: Multi-Agent Behavior Modeling in Traffic Scenarios through Masked Denoising Generation
Zhiyu Huang, Zewei Zhou, Tianhui Cai, Jiaqi Ma
#3: X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Prithwish Dan, Kushal Kedia, Angela Chao, Maximus Adrian Pace, Edward Duan, Wei-Chiu Ma, Sanjiban Choudhury
#4: MotIF: Motion Instruction Fine-tuning
Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk
#5: Mixed Initiative Dialog for Human-Robot Collaborative Mobile Manipulation
Albert Yu, Chengshu Li, Luca Macesanu, Arnav Balaji, Ruchira Ray, Ray Mooney, Roberto Martín-Martín
#6: DEXOP: Hardware for Collecting Contact-Rich and Dexterous Robotic Manipulation Data In-The-Wild
Hao-Shu Fang, Arthur Hu, Branden Romero, Edward H Adelson, Pulkit Agrawal
#7: Structured Imitation Learning of Interactive Policies through Inverse Games (Spotlight)
Max Muchen Sun, Todd Murphey
#8: Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration (Spotlight)
Benjamin Li, Shuyang Shi, Lucia Romero, Huao Li, Yaqi Xie, Woojun Kim, Stefanos Nikolaidis, Michael Lewis, Katia P. Sycara, Simon Stepputtis
#9: Pack It Your Way: Diffusion-Based Bin Packing with Custom Human Constraints (Spotlight)
Anurag Maurya, Shivam Vats, Ravi Prakash
#10: Controllable Video Action Model
Sriram Yenamandra, Shuang Li, Sean Kirmani, Shuran Song, Dorsa Sadigh
Organizers' affiliations: MIT, Disney Research Zurich, NUS, NUS, MIT, MIT, NVIDIA
Authors are invited to submit short papers (3-4 pages excluding references) covering topics on generative modeling applied to human-robot interaction. We invite contributions describing ongoing research, results that build on previously presented work, systems, datasets and benchmarks, and papers with demos (that could be displayed easily next to a poster).
Submission link: https://openreview.net/group?id=roboticsfoundation.org/RSS/2025/Workshop/GenAI-HRI
Submission
Submissions should use the official RSS LaTeX template. Reviews will be single-blind. Accepted papers will be presented as posters during the workshop, and selected works will have an opportunity for a spotlight talk. Accepted papers will be available online on the workshop website (non-archival). A best paper award will be sponsored by NVIDIA.
Important Dates
Submission deadline: June 2nd, 2025 (Anywhere on Earth [AoE])
Notifications: June 15th, 2025 (AoE) (updated from June 12th, 2025)
Camera-ready deadline: June 23rd, 2025 (AoE) (updated from June 18th, 2025) - Submit the revision in OpenReview + a 1-minute video submission.
Workshop date: June 25th, 2025 (Full day)