* indicates equal contribution and ✉️ indicates corresponding authors.
One Sentence:
Data collection involves two operators: one for tele-operation and one for adversarial attacks. The resulting data improves the model's generalization and robustness!
TL;DR.
The ADC framework is a human-in-the-loop approach (with two operators rather than one) that compresses diverse task variations, failure-recovery behaviors, and environmental perturbations into minimal yet highly informative demonstrations. By dynamically perturbing object states, environments, and commands in real time, ADC enables models trained with as little as 20% of the demonstration volume to outperform those built on full traditional datasets.
Built upon the strong GO-1, ADC could further reduce both the cost of collecting training data and the training resources needed!
Abstract: The pursuit of data efficiency, where quality outweighs quantity, has emerged as a cornerstone in robotic manipulation, especially given the high costs associated with real-world data collection. We propose that maximizing the informational density of individual demonstrations can dramatically reduce reliance on large-scale datasets while improving task performance. To this end, we introduce Adversarial Data Collection (ADC), a Human-in-the-Loop (HiL) framework that redefines robotic data acquisition through real-time, bidirectional human-environment interactions. Unlike conventional pipelines that passively record static demonstrations, ADC adopts a collaborative perturbation paradigm: during a single episode, an adversarial operator dynamically alters object states, environmental conditions, and linguistic commands, while the tele-operator adaptively adjusts actions to overcome these evolving challenges. This process compresses diverse failure-recovery behaviors, compositional task variations, and environmental perturbations into minimal demonstrations. Our experiments demonstrate that ADC-trained models achieve superior compositional generalization to unseen task instructions, enhanced robustness to perceptual perturbations, and emergent error recovery capabilities. Strikingly, models trained with merely 20% of the demonstration volume collected through ADC significantly outperform traditional approaches using full datasets. These advances bridge the gap between data-centric learning paradigms and practical robotic deployment, demonstrating that strategic data acquisition, not merely post-hoc processing, is critical for scalable, real-world robot learning.
Additionally, we are curating a large-scale ADC-Robotics dataset comprising real-world manipulation tasks with adversarial perturbations. This benchmark will be open-sourced to facilitate advancements in robotic imitation learning.
Traditional Approach: A tele-operator executes tasks via fixed linguistic instructions in static visual environments.
Adversarial Data Collection (ADC) Framework: Employs a Two-Humans-in-the-Loop approach in which a secondary operator dynamically perturbs the tele-operator's execution while a task is under way.
ADC Loop: The adversarial operator introduces visual (backgrounds, object positions/poses) and linguistic (task goals) perturbations, shifting environmental context and target objects within a single episode.
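As a rough illustration of the ADC loop described above, the sketch below simulates how a single episode might interleave visual and linguistic perturbations. It is a toy model under stated assumptions, not the authors' collection tooling: the perturbation names, the task goals, the `perturb_prob` parameter, and the `collect_adc_episode` function are all hypothetical, and the tele-operator's actions are not modeled.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical perturbation types and task goals, chosen for illustration only.
VISUAL_PERTURBATIONS = ["change_background", "move_object", "rotate_object"]
TASK_GOALS = ["pick up the orange", "pick up the apple", "pick up the pear"]


@dataclass
class Step:
    t: int                        # time index within the episode
    instruction: str              # language command active at this step
    perturbation: Optional[str]   # perturbation applied at this step, if any


@dataclass
class Episode:
    steps: List[Step] = field(default_factory=list)


def collect_adc_episode(num_steps=50, perturb_prob=0.1, seed=0):
    """Simulate one episode in which an adversarial operator may intervene.

    At each step the (simulated) adversarial operator intervenes with
    probability perturb_prob, either visually or by switching the task goal.
    """
    rng = random.Random(seed)
    instruction = TASK_GOALS[0]
    episode = Episode()
    for t in range(num_steps):
        perturbation = None
        if rng.random() < perturb_prob:
            if rng.random() < 0.5:
                # Visual perturbation: background or object position/pose.
                perturbation = rng.choice(VISUAL_PERTURBATIONS)
            else:
                # Linguistic perturbation: switch to a different task goal.
                others = [g for g in TASK_GOALS if g != instruction]
                instruction = rng.choice(others)
                perturbation = "change_instruction"
        # In a real system, the tele-operator's observation and action
        # would be recorded here alongside the active instruction.
        episode.steps.append(Step(t, instruction, perturbation))
    return episode
```

Calling `collect_adc_episode(perturb_prob=0.0)` in this toy model produces an episode with one fixed instruction and no perturbations, which corresponds to the traditional setup.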
Even with only 20% ADC data, the model demonstrates significantly greater robustness and positional generalization in both static and dynamic environments compared to the model trained with 100% traditionally collected data.
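To make the 20% data budget concrete, here is a minimal sketch of drawing a reproducible subset of episodes for training. The `subsample_episodes` helper and its parameters are assumptions made for illustration; the page does not state how the 20% subset was actually selected.

```python
import random


def subsample_episodes(episode_ids, fraction=0.2, seed=0):
    """Return a reproducible random subset of episode ids.

    Hypothetical helper: uniform random selection is an assumption,
    not a description of the authors' procedure.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(episode_ids)
    if not ids:
        return []
    rng = random.Random(seed)
    # Keep at least one episode so the subset is never empty.
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))
```

Fixing the seed keeps the subset identical across runs, so comparisons between a 20% ADC model and a 100% traditional model use the same episodes each time.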
We further evaluate the model trained with ADC under extreme conditions that simulate failure of the robot's camera hardware. These experiments show that the model trained with ADC is more robust to sensor failure.
We hypothesize that this robustness comes from two properties of the ADC data collection process: superior attention concentration and more complete object coverage.
Models trained with ADC focus more precisely on functional cameras, demonstrating superior attention concentration compared to models trained with traditionally collected data.
In the traditional data collection process, the target object (orange) is observed from similar viewpoints, resulting in limited visual diversity. In contrast, ADC introduces dynamic perturbations, allowing the orange to be observed from a wider range of viewpoints.
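The camera-failure evaluation mentioned above could be approximated by blanking the image from one or more cameras before it reaches the policy. The sketch below shows one way to do that, assuming an observation is a dictionary mapping camera names to images stored as nested lists of pixel values. The `mask_cameras` function, the camera names, and the choice of filling with zeros are all assumptions; the page does not specify how failures were simulated.

```python
def mask_cameras(observation, failed_cameras, fill_value=0):
    """Return a copy of the observation with failed camera views blanked.

    observation: dict mapping camera name -> image (list of pixel rows).
    failed_cameras: names of cameras to treat as failed (hypothetical names).
    """
    unknown = set(failed_cameras) - set(observation)
    if unknown:
        raise KeyError(f"unknown cameras: {sorted(unknown)}")
    masked = {}
    for name, image in observation.items():
        if name in failed_cameras:
            # Replace every pixel with a constant to mimic a dead sensor.
            masked[name] = [[fill_value for _ in row] for row in image]
        else:
            # Copy rows so the caller's observation is never mutated.
            masked[name] = [list(row) for row in image]
    return masked


# Usage: blank the (hypothetical) head camera, keep the wrist camera intact.
obs = {"head": [[1, 2], [3, 4]], "wrist": [[5, 6], [7, 8]]}
degraded = mask_cameras(obs, {"head"})
```

Running the policy on `degraded` instead of `obs` would then show how much it relies on the blanked view.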