Robot Evaluation for the Real World
Workshop at Robotics: Science and Systems (RSS) 2025
June 25, 2025
Los Angeles, California
RTH (Ronald Tutor Hall) 211 for Talks/Debate/Panel
Epstein Plaza for Poster Session
Zoom information
https://cmu.zoom.us/j/94467599166?pwd=YEvC4w1KbiWtK7jBWb0XYSIwRuFp9o.1
Meeting ID: 944 6759 9166
Passcode: 517466
Robotics has seen rapid advancements, with emerging methods and systems achieving impressive results on public benchmarks. Yet, despite these strides, real-world deployment continues to expose persistent challenges in generalization, robustness, safety, and reliability. This workshop aims to critically examine how we evaluate robotic systems and whether current benchmarking practices effectively reflect the complexities of real-world operation across domains like factories, construction sites, and homes.
To ensure the reliability and accountability of robotic systems while maintaining scientific momentum, we propose a fresh look at evaluation practices. Are existing benchmarks sufficient, or should they be redesigned to better capture real-world behavior? Can we provide theoretical guarantees of safety and performance, or must we rely on comprehensive empirical assessments? How do we balance thoroughness in evaluation with accessibility and replicability, especially across diverse research environments?
This workshop brings together experts from robotics, machine learning, human-robot interaction, cognitive science, and related fields to discuss these pressing questions. We organize the challenges into three key themes:
Evaluations and Progress: How can evaluations drive meaningful advancements without creating barriers that stifle innovation?
Accessibility and Relevance: Should benchmarks reflect the full complexity of deployment, or be simplified to foster broader participation?
Alignment Across Stakeholders: How do we design evaluations that meet the needs of academia, industry, and policy, without compromising on rigor or utility?
By exploring both the technical and societal implications of evaluation, we aim to develop frameworks that support not only scientific discovery but also safe and impactful real-world deployment. Through collaborative discussion, we hope to lay the groundwork for rigorous, transparent, and flexible evaluation methodologies that guide the next generation of robotics research.
Andrea Bajcsy is an Assistant Professor in the Robotics Institute at Carnegie Mellon University, where she leads the Interactive and Trustworthy Robotics Lab (Intent Lab). She works broadly at the intersection of robotics, machine learning, control theory, and human-AI interaction. Prior to joining CMU, Andrea received her Ph.D. in Electrical Engineering & Computer Science from the University of California, Berkeley in 2022. She is the recipient of the NSF CAREER Award (2025), Google Research Scholar Award (2024), Rising Stars in EECS Award (2021), Honorable Mention for the T-RO Best Paper Award (2020), and NSF Graduate Research Fellowship (2016). She also worked at NVIDIA Research for Autonomous Driving.
Anirudha Majumdar is an Associate Professor at Princeton’s Mechanical and Aerospace Engineering department and a Research Scientist at Google DeepMind. His research focuses on control algorithms for high-performance, safety-critical robotics. He earned his Ph.D. at MIT.
Elena Messina is Principal at Prospicience LLC, where she provides consulting for strategic planning and on development, assessment, and adoption of advanced robotics and AI technologies. Previously, she founded and led major research programs and projects at the National Institute of Standards and Technology that focused on advancing the capabilities of robots through the definition of performance requirements, metrics, test methods, tools, and testbeds.
Prof. Juha Röning is the head of the Biomimetics and Intelligent Systems Group (BISG) research unit and a professor at the Faculty of Information Technology and Electrical Engineering at the University of Oulu. He has more than 30 years of experience in mobile robotics, holds three patents, and has published more than 400 papers in the areas of computer vision, robotics, intelligent signal analysis, and software security. He currently serves on the Board of Directors of euRobotics aisbl (Vice-President Research) and Adra. He was the academic coordinator for the DIMECC CyberTrust programme and the project coordinator for the H2020 HYFLIERS and CS-AWARE projects. He is currently the project coordinator of the CS-AWARE-NEXT project (Horizon Europe).
Lindsay Sanneman is an Assistant Professor at the School of Computing and Augmented Intelligence at ASU. Her research focuses on model evaluation and alignment from human factors and human-robot interaction perspectives. She received her Ph.D. from MIT.
Vincent Vanhoucke is a Distinguished Engineer at Waymo, focusing on AI and machine learning for robotics. He was a founding member of Google Brain and led Google’s robotics research team. His work includes large-scale deep learning systems for speech and vision, such as the 'Inception' architectures. He holds a Ph.D. from Stanford and is an IEEE Fellow.
Yukie Nagai is a Project Professor at the International Research Center for Neurointelligence at the University of Tokyo. Her research spans computational neuroscience and cognitive developmental robotics. She received her Ph.D. from Osaka University.
Ted Xiao is a research scientist at Google DeepMind, where he works on making robots smarter. His research focuses on robot learning, internet-scale foundation models, and reinforcement learning.
Dr. Henrik I. Christensen is the Qualcomm Chancellor’s Chair of Robot Systems and a Distinguished Professor of Computer Science in the Department of Computer Science and Engineering at UC San Diego. He is the director of the Contextual Robotics Institute, the Cognitive Robotics Laboratory, and the Autonomous Vehicle Laboratory.
Cherie Ho
Carnegie Mellon University
Masha Itkina
Toyota Research Institute
Rebecca Martin
Carnegie Mellon University
Ransalu Senanayake
Arizona State University
Hadas Kress-Gazit
Cornell University
Kenny Kimble
National Institute of Standards and Technology (NIST)
Rares Ambrus
Toyota Research Institute
Siqi Zhou
Technical University of Munich
Jean Oh
Carnegie Mellon University
Jonathan Francis
Bosch Center for AI; Carnegie Mellon University
Haruki Nishimura
Toyota Research Institute
Somrita Banerjee
Apple
Allen Ren
Princeton University