Data Generation for Robotics
Workshop @ RSS 2024
Description
Data-driven learning has emerged as a promising paradigm to teach robots to perform tasks of interest. However, while there has been considerable progress in developing specific data-driven learning methods and systems, there has been a lack of discussion around the data itself. This workshop seeks to place a data-centric focus on robotics and guide questions around data collection, generation, curation, and modeling.
Discussion Topics
The role of humans in data collection and data generation
At which level of the data collection, generation, and curation process should humans be involved and in what capacity?
What kinds of opportunities and challenges are there for lessening the burden on humans?
Scalable Data Collection in Sim vs. Real
How should data for robotics be collected, and are there ways that are more inherently scalable and cost-feasible?
Are there different considerations for simulation vs. real-world data collection?
Automating Data Generation and Curation
How should LLMs, generative models, simulation, and privileged experts (TAMP, motion planners, etc.) play a role in data generation and data curation? Some examples:
Automated Task and Curriculum Design: Using LLMs to propose tasks, write reward functions for an RL agent, and script plans to be executed by a motion planner
Foundation Models for Content Creation: How can we use large-scale generative models to create data / assets / new content for learning?
Are there ways to automatically synthesize additional data points, or augment existing data points, that are conducive to downstream learning, and that can minimize the burden of human data collection?
Data Quality Control
How important is the quality of data for training effective robot policies?
How should large datasets be curated to control for quality?
Data-centric Algorithm Design
How should modeling paradigms for downstream learning algorithms adapt to these methods for data acquisition?
Workshop Structure
The workshop will consist of:
8 speaker talks
1 spotlight session for papers that are awarded oral presentations
1 poster session for all papers (which doubles as a coffee break)
2 debates
Debate 1: Data generation in simulation vs. real world. Where should data collection take place, and what kinds of data should be collected? What are some opportunities and challenges for data collection in simulation and in the real world?
Debate 2: The role of humans in data collection, generation, and curation. Given the advent of powerful generative AI models, to what extent should humans be involved at different parts of the robotics data collection stack, and in what capacity?
Virtual attendees will be able to stream all workshop proceedings (except the poster session) and participate by submitting questions during the speaker talks and the debates. The workshop organizers will monitor the virtual Q&A so that in-person and virtual attendees can participate equally.
Paper Submission Guidelines
We invite several types of submissions, including full-length papers and position papers, that cover the discussion topics above, as well as related areas.
Paper Format: Submitted papers should follow the RSS 2024 paper format (see this link) and should be anonymized for review. There is no page limit, but we suggest a length of around 4-9 pages (excluding references and supplementary material). We welcome both unpublished original contributions and recently published papers (e.g. published in 2024). Accepted papers will be presented as posters. A few papers will also be selected for spotlight presentations.
Submission Link: https://openreview.net/group?id=roboticsfoundation.org/RSS/2024/Workshop/DGR
Important Dates
Paper Deadline: May 20th, 2024
Camera-Ready Deadline: July 2nd, 2024
Workshop Date: July 15th, 2024