¹Purdue University, ²Indiana University Bloomington
†Equal contribution
In this research, we propose a novel few-shot learning framework for multi-robot systems that integrates both spatial and temporal elements: Few-Shot Demonstration-Driven Task Coordination and Trajectory Execution (DDACE). Our approach leverages temporal graph networks for learning task-agnostic temporal sequencing and Gaussian Processes for spatial trajectory modeling, ensuring modularity and generalization across tasks. By decoupling the temporal and spatial aspects, DDACE requires only a small number of demonstrations, significantly reducing data requirements compared to traditional learning from demonstration approaches. To validate the proposed framework, we conducted extensive experiments in task environments designed to assess key aspects of multi-robot coordination, including multi-sequence execution, multi-action dynamics, complex trajectory generation, and heterogeneous configurations. The experimental results demonstrate that our approach successfully achieves task execution under few-shot learning conditions and generalizes effectively across dynamic and diverse settings. This work underscores the potential of modular architectures in enhancing the practicality and scalability of multi-robot systems in real-world applications.
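To make the spatial component concrete, the following is a minimal sketch of GP-based trajectory modeling in the spirit described above, assuming scikit-learn's GaussianProcessRegressor; the kernel choice, phase parameterization, and variable names are illustrative assumptions rather than the exact configuration used in DDACE.

```python
# Minimal sketch: modeling a demonstrated 2D trajectory with Gaussian Processes.
# Kernel, phase parameterization, and names are illustrative assumptions,
# not the exact configuration used in DDACE.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Demonstrated waypoints: normalized time (phase) -> (x, y) position.
t_demo = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
xy_demo = np.column_stack([np.cos(np.pi * t_demo).ravel(),
                           np.sin(np.pi * t_demo).ravel()])  # toy arc

# One GP per spatial dimension, sharing a smooth RBF kernel plus noise.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-4)
gp_x = GaussianProcessRegressor(kernel=kernel).fit(t_demo, xy_demo[:, 0])
gp_y = GaussianProcessRegressor(kernel=kernel).fit(t_demo, xy_demo[:, 1])

# Execution: query a denser phase grid to generate a smooth trajectory,
# with the predictive std as a rough confidence measure along the path.
t_query = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
x_pred, x_std = gp_x.predict(t_query, return_std=True)
y_pred, y_std = gp_y.predict(t_query, return_std=True)
trajectory = np.column_stack([x_pred, y_pred])
```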
Challenges in Current Learning from Demonstration (LfD) for Multi-Robot Systems (MRS):
High Data Dependence: Most existing LfD approaches require large-scale demonstrations, making them impractical for real-world, few-shot learning scenarios.
Temporal Oversight: LfD in MRS often ignores the temporal structure of tasks, focusing only on goal states rather than action sequences.
Limited Generalization: Many methods are task-specific and fail to generalize across diverse multi-robot tasks.
Domain-Specific Constraints: While some fields (e.g., robot soccer) address sequential behavior, they rely on handcrafted reward functions, limiting adaptability.
Overview of the proposed DDACE framework. In the training phase, demonstration data are preprocessed, graph structures are extracted via spectral clustering, and the Temporal Graph Network (TGN) and Gaussian Process (GP) models are trained independently. During the execution phase, the trained models predict coordinated action sequences and generate spatial trajectories for new scenarios, enabling adaptive and efficient multi-robot task execution.
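As an illustration of the graph-extraction step, the sketch below groups robots by spatial proximity with scikit-learn's SpectralClustering and keeps only intra-cluster edges; the affinity construction, cluster count, and function names are assumptions for illustration, not the exact preprocessing used in DDACE.

```python
# Minimal sketch: extracting a sparse interaction graph from demonstration
# data via spectral clustering. Affinity construction and cluster count are
# illustrative assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering

def build_interaction_graph(positions, n_groups=2):
    """positions: (n_robots, 2) array of robot positions at one timestep."""
    # Pairwise distances -> RBF affinity (closer robots interact more strongly).
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    affinity = np.exp(-dists**2 / (2 * dists.std() ** 2 + 1e-8))

    # Spectral clustering on the precomputed affinity matrix.
    labels = SpectralClustering(n_clusters=n_groups,
                                affinity="precomputed",
                                assign_labels="kmeans").fit_predict(affinity)

    # Keep only intra-cluster edges, yielding a sparse graph for the TGN
    # rather than a fully connected one.
    edges = [(i, j) for i in range(len(positions))
             for j in range(len(positions))
             if i != j and labels[i] == labels[j]]
    return labels, edges

labels, edges = build_interaction_graph(np.random.rand(6, 2), n_groups=2)
```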
Task Information
Task 1: 3 heterogeneous robots performing a 5-step collaborative transport task
Task 2: 11 robots executing a 10-step coordinated sequence to evaluate scalability
Task 3: Sports-inspired scenario involving 3 heterogeneous robots in a 4-step action sequence
Task 4: 4 robots executing complex curved and spiral paths to test spatial generalization, also deployed in a real-world robotic setting
Real World Setup
a) Hamster Mobile Robot Platform [1]
b) Real World Experimental Setup [1]
Our real-world experiments were conducted on a tabletop testbed (1.3 m × 3 m × 1.3 m, 50 kg) (b). The setup, adapted from [1], consists of Hamster mobile robots (a), a compact differential-drive platform (35 mm × 30 mm × 40 mm, 30 g) actuated by DC motors and controlled via Bluetooth. Each robot is equipped with infrared (IR) proximity sensors that enable collision avoidance and maintenance of relative distances.
To enable global tracking, an overhead USB camera was mounted on a PVC pipe frame above the testbed. The camera streams continuous top-down views of the environment to a desktop computer (Intel Core i7 CPU, 4 GB RAM, Ubuntu 16.04). Using the OpenCV library [2], the system processes these images to detect ArUco markers affixed to the robots, from which each robot's position and orientation are extracted for precise visual tracking.
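A minimal sketch of this tracking loop is shown below, assuming OpenCV's classic ArUco API; the camera index, marker dictionary, and pixel-to-world mapping are illustrative assumptions, and OpenCV releases from 4.7 onward expose cv2.aruco.ArucoDetector instead of the functional calls used here.

```python
# Minimal sketch of the overhead ArUco tracking loop described above, using
# OpenCV [2]. Camera index, marker dictionary, and the pixel-to-world mapping
# are illustrative assumptions.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # overhead USB camera
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

    poses = {}
    if ids is not None:
        for marker_corners, marker_id in zip(corners, ids.flatten()):
            pts = marker_corners.reshape(4, 2)
            center = pts.mean(axis=0)        # robot position in pixels
            dx, dy = pts[1] - pts[0]         # top edge of the marker
            poses[int(marker_id)] = (center, np.arctan2(dy, dx))
            # In practice, pixel coordinates would be mapped to tabletop
            # coordinates via a one-time homography calibration (not shown).

    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```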
Demonstrations and Executions
Quantitative performance of DDACE against baseline models across all experimental tasks
Measurements:
OSR (Overall Success Rate): Rate of completing the entire task
SSR (Sequence Success Rate): Step-by-step accuracy of the executed action sequence
GCR (Goal Condition Recall): Goal-reaching success rate for each robot
FD (Fréchet Distance): Similarity of executed trajectories to the demonstration (lower is better)
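For reference, a minimal sketch of the discrete Fréchet distance between an executed trajectory and its demonstration is given below; the dynamic-programming formulation is standard, and the toy trajectories are illustrative.

```python
# Minimal sketch: discrete Fréchet distance between an executed trajectory
# and its demonstration (the FD metric). Toy trajectories are illustrative.
import numpy as np

def discrete_frechet(P, Q):
    """P, Q: (n, 2) and (m, 2) arrays of trajectory points."""
    n, m = len(P), len(Q)
    ca = np.zeros((n, m))

    def d(i, j):
        return np.linalg.norm(P[i] - Q[j])

    ca[0, 0] = d(0, 0)
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d(i, 0))
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d(0, j))
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           d(i, j))
    return ca[n - 1, m - 1]

# Example: a copy of the demonstration shifted by 0.05 gives FD ~ 0.05.
demo = np.column_stack([np.linspace(0, 1, 50), np.zeros(50)])
executed = demo + np.array([0.0, 0.05])
print(discrete_frechet(demo, executed))
```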
Baselines for Comparison:
GNN: End-to-end model without structure refinement
LLM (GPT-4o): Language-based inference from demonstrations
Ablation (No Spectral Clustering): Fully connected graph input to TGN
Effect of Demonstration Quantity on Model Training
This figure illustrates the impact of the number of demonstrations on the training loss across four different tasks. As the number of demonstrations increases, the model converges faster and achieves lower final loss, especially in tasks with complex temporal or spatial structures.
Case Study: Effect of Robot Scale and Action Space Complexity
The left graph shows how increasing the scale of the robot team affects training convergence: larger teams introduce more inter-agent dependencies, which slows learning. The right graph examines how expanding the action space affects convergence, showing that more diverse actions increase reasoning difficulty, although the model still reaches stable performance given sufficient demonstrations.
REFERENCES
[1] A. Lee, W. Jo, S. S. Kannan, and B.-C. Min, “Investigating the effect of deictic movements of a multi-robot,” International Journal of Human–Computer Interaction, vol. 37, no. 3, pp. 197–210, 2021.
[2] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media, Inc., 2008.