Learning coordinated behaviors for multi-robot systems from only a few demonstrations is difficult because temporal task dependencies and spatial trajectory generation are tightly coupled, which increases the hypothesis space and often yields unstable generalization in data-scarce regimes. We present DDACE, a structured few-shot learning framework that introduces a structural inductive bias by explicitly decoupling temporal coordination from spatial trajectory synthesis. Demonstrations are first processed via spectral clustering to extract coordination structure and form interaction graphs. A Temporal Graph Network predicts action dependencies and sequences, while Gaussian Process models generate progress-parameterized geometric trajectories that adapt to new start/goal configurations. This factorized design reduces hypothesis coupling and improves data efficiency for few-shot multi-robot coordination. Extensive simulation studies and real-robot experiments show that DDACE produces stable coordinated executions from a small number of demonstrations and improves trajectory consistency compared to end-to-end imitation baselines under limited data.
Challenges in Current Learning from Demonstration (LfD) for Multi-Robot Systems (MRS):
High Data Dependence: Most existing LfD approaches require large-scale demonstrations, making them impractical for real-world, few-shot learning scenarios.
Neglect of Temporal Structure: LfD in MRS often overlooks the temporal structure of tasks, focusing only on goal states rather than the action sequences that reach them.
Limited Generalization: Many methods are task-specific and fail to generalize across diverse multi-robot tasks.
Domain-Specific Constraints: While some fields (e.g., robot soccer) address sequential behavior, they rely on handcrafted reward functions, limiting adaptability.
Overview of the proposed DDACE framework. In the training phase, demonstration data is preprocessed, graph structures are extracted via spectral clustering, and the Temporal Graph Networks (TGN) and Gaussian Processes (GP) models are trained independently. During the execution phase, the trained models predict coordinated action sequences and generate spatial trajectories for new scenarios, enabling adaptive and efficient multi-robot task execution.
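The graph-extraction step above can be illustrated with a minimal pure-numpy spectral clustering sketch. It builds an affinity matrix from mean inter-robot distances in the demonstrations (an illustrative affinity choice, not necessarily the paper's exact one) and splits robots into two coordination groups via the sign of the Fiedler vector:

```python
import numpy as np

def coordination_clusters(trajs, sigma=1.0):
    """Two-way spectral split of robots into coordination groups.

    trajs: array of shape (n_robots, n_steps, 2) -- demonstrated 2D paths.
    Affinity is a Gaussian of the mean inter-robot distance over the
    demonstration (an illustrative choice for this sketch).
    """
    n = trajs.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.linalg.norm(trajs[i] - trajs[j], axis=1).mean()
                W[i, j] = np.exp(-d ** 2 / (2 * sigma ** 2))
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Fiedler vector: eigenvector of the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    # The sign of the Fiedler vector yields a 2-way partition.
    return (fiedler > 0).astype(int)

# Usage: two robots moving close together, two others far away.
trajs = np.stack([
    np.zeros((5, 2)),
    np.zeros((5, 2)) + 0.1,
    np.zeros((5, 2)) + 3.0,
    np.zeros((5, 2)) + 3.1,
])
labels = coordination_clusters(trajs)
```

A multi-way partition would use k eigenvectors plus k-means; the two-way sign split keeps the sketch self-contained.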
Task Information
Task 1: 3 heterogeneous robots performing a 5-step collaborative transport task
Task 2: 11 robots executing a 10-step coordinated sequence to evaluate scalability
Task 3: Sports-inspired scenario involving 3 heterogeneous robots in a 4-step action sequence
Task 4: 4 robots executing complex curved and spiral paths to test spatial generalization, also deployed in a real-world robotic setting
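The progress-parameterized GP trajectory generation used across these tasks can be sketched as follows. In this minimal illustration, a GP (RBF kernel, posterior mean only) models the demo's residual from the straight start-goal line over a progress parameter s ∈ [0, 1], and that learned shape is replayed on a new start-goal line; the residual-replay adaptation rule is an assumption of this sketch, not necessarily the paper's exact scheme:

```python
import numpy as np

def gp_posterior_mean(s_train, y_train, s_query, ell=0.2, noise=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))
    K = k(s_train, s_train) + noise * np.eye(len(s_train))
    return k(s_query, s_train) @ np.linalg.solve(K, y_train)

def retarget(demo, new_start, new_goal, n_out=50):
    """Adapt a demonstrated 2D path (n, 2) to a new start/goal pair."""
    n = len(demo)
    s = np.linspace(0.0, 1.0, n)                    # progress parameter
    line = demo[0] + s[:, None] * (demo[-1] - demo[0])
    resid = demo - line                             # geometric "shape"
    s_q = np.linspace(0.0, 1.0, n_out)
    shape = np.stack(
        [gp_posterior_mean(s, resid[:, d], s_q) for d in range(2)], axis=1)
    p0, p1 = np.asarray(new_start, float), np.asarray(new_goal, float)
    return p0 + s_q[:, None] * (p1 - p0) + shape

# Usage: replay a sine-arc demonstration between new endpoints.
t = np.linspace(0.0, 1.0, 20)
demo = np.stack([t, np.sin(np.pi * t)], axis=1)
traj = retarget(demo, [2.0, 2.0], [4.0, 2.0])
```

Because the residual vanishes at s = 0 and s = 1, the adapted trajectory meets the new start and goal exactly (up to GP jitter), which is the property Task 4's curved and spiral paths stress.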
Real World Setup
a) Hamster Mobile Robot Platform [1]
b) Real World Experimental Setup [1]
Our real-world experiments were conducted on a tabletop testbed (1.3 m × 3 m × 1.3 m, 50 kg) (b). The setup, adapted from [1], uses Hamster mobile robots (a): compact differential-drive platforms (35 mm × 30 mm × 40 mm, 30 g each) actuated by DC motors and controlled via Bluetooth. Each robot is equipped with infrared (IR) proximity sensors for collision avoidance and for maintaining relative distances.
To enable global tracking, an overhead USB camera was mounted on a PVC pipe frame above the testbed. The camera streams continuous top-down views of the environment to a desktop computer (Intel Core i7 CPU, 4 GB RAM, Ubuntu 16.04). Using the OpenCV library [2], the system processes these images to detect ArUco markers affixed to the robots; the markers provide each robot's position and orientation for precise visual tracking.
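Once a marker's four corners are detected, the planar pose recovery reduces to simple geometry. A minimal numpy sketch, assuming corners in pixel coordinates ordered top-left, top-right, bottom-right, bottom-left (the order OpenCV's ArUco detector uses) and a calibrated top-down view where camera intrinsics can be ignored:

```python
import numpy as np

def marker_pose_2d(corners):
    """2D position and heading of a robot from its ArUco marker corners.

    corners: (4, 2) pixel coordinates ordered TL, TR, BR, BL.
    Returns (center_xy, yaw_radians) in image coordinates -- a planar
    approximation suited to an overhead camera.
    """
    c = np.asarray(corners, dtype=float)
    center = c.mean(axis=0)
    # Heading: direction of the marker's top edge (TL -> TR).
    dx, dy = c[1] - c[0]
    yaw = np.arctan2(dy, dx)
    return center, yaw

# Usage: an axis-aligned marker, then the same marker rotated 90 degrees.
center, yaw = marker_pose_2d([[0, 0], [2, 0], [2, 2], [0, 2]])
_, yaw_rot = marker_pose_2d([[2, 0], [2, 2], [0, 2], [0, 0]])
```

Full 3D pose estimation would instead use OpenCV's perspective-n-point solver with the camera calibration; the 2D shortcut suffices for a fixed top-down view.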
Demonstrations and Executions
Quantitative performance of DDACE against baseline models across all experimental tasks
Measurements:
OSR (Overall Success Rate): Entire task completion
SSR (Sequence Success Rate): Step-by-step sequence accuracy
GCR (Goal Condition Recall): Fraction of per-robot goal conditions successfully reached
FD (Fréchet Distance): Trajectory similarity to demonstration
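The FD metric can be computed for discrete trajectories with the classic Eiter-Mannila dynamic program; a minimal numpy sketch:

```python
import numpy as np

def frechet_distance(P, Q):
    """Discrete Frechet distance between polylines P and Q (n, d) / (m, d).

    Dynamic program over the coupling table: the minimum over all
    monotone traversals of the maximum pointwise distance.
    """
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           d[i, j])
    return ca[-1, -1]

# Usage: identical curves score 0; parallel lines score their offset.
t = np.linspace(0.0, 1.0, 10)
P = np.stack([t, np.zeros(10)], axis=1)
Q = np.stack([t, np.ones(10)], axis=1)
```

Lower FD means the executed trajectory stays geometrically closer to the demonstration.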
Baselines for Comparison:
GNN: End-to-end model without structure refinement
LLM (GPT-5.2): Language-based inference from demonstrations
Ablation (No Spectral Clustering): Fully connected graph input to TGN
Effect of Demonstration Quantity on Model Training
This figure illustrates the impact of the number of demonstrations on the training loss across four different tasks. As the number of demonstrations increases, the model converges faster and achieves lower final loss, especially in tasks with complex temporal or spatial structures.
Case Study: Effect of Robot Scale and Action Space Complexity
The left graph shows how increasing the scale of the robot team impacts training convergence: larger teams introduce more inter-agent dependencies, slowing down learning. The right graph examines the effect of expanding action space complexity, showing that more diverse actions increase reasoning difficulty but the model still achieves stable performance with sufficient demonstrations.
REFERENCES
[1] A. Lee, W. Jo, S. S. Kannan, and B.-C. Min, "Investigating the effect of deictic movements of a multi-robot," International Journal of Human–Computer Interaction, vol. 37, no. 3, pp. 197–210, 2021.
[2] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 2008.