Making Sense of Data in Robotics:
Composition, Curation, and Interpretability at Scale
Location: COEX Convention & Exhibition Center, Room E5
Livestream: We plan to livestream the event (tentatively on YouTube) and will post a link here on the day of the workshop.
Everyone wants a working robot that reliably completes goals, interacts with users, and remains safe. Across robot learning approaches – whether scaling imitation learning, building predictive world models, or adapting foundation models for embodied reasoning – the choice of training data is emerging as a critical driver of performance. Yet, while data underpins almost all successful robot learning systems, we rarely stop to question which specific aspects of a dataset led to that success. Current hypotheses about what constitutes “good” data for robot learning tend to be heuristic and at times contradictory: arguments favor either diversity or uniformity, regard multi-modality as beneficial or detrimental, and treat errors in demonstrations either as harmful or as useful when recoverable. This workshop brings together diverse perspectives from the robot learning community and broader ML fields to advance a deeper, more principled understanding of what makes robot learning data “good.”
Despite increasing investment in large-scale robotics data – through teleoperation or fleet deployment – collection has outpaced our scientific understanding of what constitutes effective data for robot learning. The way forward is unclear: significant progress has emerged from small, carefully crafted datasets, world modeling, parallel simulation, and representations from off-domain data sources in vision and language, all of which make different decisions about what data to use or collect. Moreover, as our robot systems increase in capability, these choices become more nuanced – for instance, which data characteristics matter when pushing policy performance from 90% to 99%? The data-design space for robot learning is massive, but seldom discussed in depth.
In sum, this workshop seeks to bring together academic researchers and industry practitioners to develop a science of data for robot learning, structured around the following themes:
🧩 Theme 1: Data Composition – What data should we use in robotics?
What properties and modalities of data (e.g., demonstrations, failures, interventions, tactile, language annotations, non-robotics data, human intent, preference, or uncertainty) provide the most value for training general-purpose robot learning models?
How do the desiderata for dataset composition vary with different robot learning objectives (e.g., imitation learning vs. world modeling)?
Can we meaningfully define and measure important properties of robotics datasets, such as coverage, diversity, and quality? (See the toy sketch after this list.)
What can we learn from dataset design in other ML domains, like vision and language? Can we formalize taxonomies for robotics dataset composition to promote a similar degree of reusability and comparison?
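To make the measurement question above concrete, here is a minimal Python sketch of two candidate proxies – mean pairwise embedding distance for diversity and nearest-neighbor distance for coverage. It assumes episodes have already been embedded as fixed-length feature vectors by some encoder; the function names and the metrics themselves are illustrative stand-ins, not established standards (defining such measures rigorously is exactly the open question).

```python
import numpy as np

def diversity(embeddings: np.ndarray) -> float:
    """Toy diversity proxy: mean pairwise distance between episode embeddings."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]   # (n, n, d)
    dists = np.linalg.norm(diffs, axis=-1)                    # (n, n)
    n = len(embeddings)
    return float(dists.sum() / (n * (n - 1)))                 # mean over off-diagonal pairs

def coverage(embeddings: np.ndarray, query: np.ndarray) -> float:
    """Toy coverage proxy: distance from a query state to its nearest
    dataset neighbor (lower = better covered)."""
    return float(np.linalg.norm(embeddings - query, axis=-1).min())

# Example with random stand-in features; a real pipeline would embed
# trajectories with a pretrained visual or state encoder.
episodes = np.random.randn(100, 32)
print(diversity(episodes), coverage(episodes, np.zeros(32)))
```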
🧹 Theme 2: Data Curation – What data should we keep, drop, or collect next?
How can we evaluate the quality of robotics data? What makes a “good” example?
What are principled ways to select, filter, or weight data for different robot learning tasks?
Do robotics datasets contain harmful biases or spurious correlations? If so, how can we mitigate their effect?
Can we define common benchmarks for data curation in robot learning?
How do methods for active data collection (e.g., curriculum learning, data selection, adaptive sampling) scale to physical robots?
💡 Theme 3: Data Interpretability – How can we understand and analyze the role of data?
What tools exist (or are needed) to interpret how individual data points, demonstrations, or data subsets affect the behavior and generalization of robotics models?
Can interpretability guide what new data to collect for a deployed robot system?
How can interpretability inform design and tradeoffs in dataset composition, for example, using a few high-quality examples vs. large-scale, weakly-labeled data?
Dr. Marco Pavone is an Associate Professor of Aeronautics and Astronautics at Stanford University, where he directs the Autonomous Systems Laboratory. He also leads autonomous vehicle research at NVIDIA. Before joining Stanford, he was a Research Technologist within the Robotics Section at the NASA Jet Propulsion Laboratory. He received a Ph.D. degree in Aeronautics and Astronautics from the Massachusetts Institute of Technology in 2010.
Mayee Chen is a PhD candidate in Computer Science at Stanford University, advised by Professor Christopher Ré. Her research focuses on advancing the fundamentals of artificial intelligence through data-centric approaches, particularly in training data curation, where she has developed techniques for data mixing, curriculum learning, and weak supervision. Her work has been recognized with a best student paper runner-up award at UAI 2022, a best paper award at an AAAI 2022 workshop, and spotlights at ICLR and NeurIPS 2023. Mayee is currently a research intern at the Allen Institute for AI (AI2), driving the data mixing efforts for OLMo 3, their next open-source large language model. She has also interned at Microsoft Research and obtained her summa cum laude B.S.E. in Operations Research and Financial Engineering from Princeton University.
Joseph Lim is an Associate Professor in the Kim Jaechul School of Artificial Intelligence at Korea Advanced Institute of Science and Technology (KAIST). Previously, he was an assistant professor at the University of Southern California (USC). Before that, he completed his PhD at the Massachusetts Institute of Technology under the guidance of Professor Antonio Torralba, followed by a half-year postdoc under Professor William Freeman and a year-long postdoc under Professor Fei-Fei Li at Stanford University. He received his bachelor's degree at the University of California, Berkeley, where he worked in the Computer Vision lab under the guidance of Professor Jitendra Malik. He has also spent time at Microsoft Research, Adobe Creative Technologies Lab, and Google.
Professor Ken Goldberg is President of the Robot Learning Foundation and Chair of the Berkeley AI Research (BAIR) Lab Steering Committee. He is co-founder of Ambi Robotics and Jacobi Robotics and William S. Floyd Distinguished Chair of Engineering at UC Berkeley, where he leads research in robotics and automation including grasping, manipulation, and learning for applications in industry, homes, agriculture, and robot-assisted surgery.
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University and a co-founder of Physical Intelligence (Pi). Her research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has pioneered end-to-end deep learning methods for vision-based robotic manipulation, meta-learning algorithms for few-shot learning, and approaches for scaling robot learning to broad datasets. Her research has been recognized by awards such as the Presidential Early Career Award for Scientists and Engineers, the Sloan Fellowship, and the ACM Doctoral Dissertation Award. Prior to joining Stanford, she received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley.
Abstract: The great recent advances in robot learning have come from scaling up behavior cloning from expert human demonstrations, performed in tandem with a few key algorithmic design choices. In this talk, we focus on one seemingly small but ubiquitous such choice: the practice of action-chunking, or predicting and executing short sequences of actions in open loop. We demonstrate mathematically that behavior cloning in continuous action spaces, as is the case in robot learning, can require exponentially more data than one might naively think. We then show that, when combined with choices like end-effector control, action-chunking directly mitigates this exponential blow-up in data needs. Along the way, we will see how notions of dynamical stability and control theory can drastically affect the difficulty of robot behavior cloning. We conclude by gesturing to forthcoming work that exposes the roles and limitations of generative models and deep learning optimizers in imitating human experts.
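To make the action-chunking pattern concrete, here is a minimal Python sketch of chunked open-loop execution; the `policy` and `env` interfaces are hypothetical stand-ins, not code from the talk.

```python
H = 8  # chunk horizon: number of actions executed open loop per policy query

def rollout(env, policy, max_steps=200):
    """Chunked control loop: one closed-loop policy query every H steps,
    with the predicted actions executed open loop in between."""
    obs = env.reset()
    for _ in range(max_steps // H):
        chunk = policy(obs)               # assumed to return H future actions
        for action in chunk:              # no re-observation inside the chunk
            obs, done = env.step(action)  # assumed (observation, done) interface
            if done:
                return obs
    return obs
```

Larger H means fewer policy queries and less frequent feedback, smaller H the reverse; the talk's subject is how this seemingly minor knob interacts with the data requirements of behavior cloning.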
Max Simchowitz is an assistant professor at the Machine Learning Department at Carnegie Mellon University with a courtesy appointment in the Robotics Institute. His work studies theoretical foundations and new methodologies for machine learning problems with an interactive, sequential, or dynamical component, currently focusing on reinforcement learning and applications to robotics. His past work has ranged broadly across control, theoretical reinforcement learning, optimization and algorithmic fairness. He received his PhD from University of California, Berkeley in 2021 under Ben Recht and Michael I. Jordan, and completed his postdoctoral research under Russ Tedrake in the Robot Locomotion Group at MIT. His work has been recognized with an ICML 2018 Best Paper Award, ICML 2022 Outstanding Paper Award, and RSS 2023 and ICRA 2024 Best Paper Finalist designations.
Abstract: Large-scale robot learning, such as with Large Behavior Models (LBMs), has increasingly become the norm in the robot learning literature since the success of ChatGPT. Nevertheless, many questions remain surrounding their development in the context of real-world, embodied systems. One key difference between the robotics and LLM domains is the comparative scarcity of robot data. What data we need and how we make use of the data available to us are pressing research questions for the community. My talk will focus specifically on the path to making policies more robust and adaptive through data. First, I will discuss methods to autonomously detect failure data, which can be used downstream to guide data collection and enable policy recovery. Then I will discuss how to adapt available training data through data curation to best suit a deployment environment. Along both directions, I will overview our work as part of the Trustworthy Learning under Uncertainty (TLU) team at TRI.
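As one generic (and deliberately simple) instance of deployment-conditioned data curation – not TRI's method – the sketch below reweights training examples by their feature-space similarity to observations from the target environment. All names, and the softmax weighting scheme itself, are assumptions for illustration.

```python
import numpy as np

def curation_weights(train_feats, deploy_feats, temperature=0.1):
    """Softmax weights over training examples, scored by each example's best
    cosine similarity to any observation from the deployment environment."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    deploy = deploy_feats / np.linalg.norm(deploy_feats, axis=1, keepdims=True)
    scores = (train @ deploy.T).max(axis=1)                   # (N_train,) best match
    weights = np.exp((scores - scores.max()) / temperature)   # numerically stable
    return weights / weights.sum()

# Usage: sample (or reweight losses on) training episodes by these weights.
train = np.random.randn(1000, 64)   # stand-in features for logged training episodes
deploy = np.random.randn(50, 64)    # stand-in features from the target environment
p = curation_weights(train, deploy)
resampled = np.random.choice(len(train), size=256, p=p)
```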
Bio: Masha Itkina is a Research Lead and Manager in the Large Behavior Model (LBM) division at the Toyota Research Institute (TRI). At TRI, she co-leads the Trustworthy Learning under Uncertainty (TLU) effort in the context of robotic manipulation. Her research focuses on policy evaluation, failure detection and mitigation, and active learning. Previously, she completed her PhD at the Stanford Intelligent Systems Lab (SISL) on uncertainty-aware perception for self-driving cars. Her work has been published in top-tier robotics and machine learning conferences, including RSS, CoRL, ICRA, and NeurIPS, and she has been invited to speak about TLU's work at workshops at these conferences.
Christopher Agia – Stanford University
Joey Hejna – Stanford University
Rohan Sinha – Stanford University
Huihan Liu – UT Austin
Helen Wang – U Washington
Yuejiang Liu – Stanford University
Jack Collins – Collaborative Robotics
Mahi Shafiullah – NYU Courant
Jeannette Bohg – Stanford University
Kimin Lee – KAIST
Dorsa Sadigh – Stanford University
For any questions about the workshop, please email corldataws@gmail.com.