ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models
Vishnunandan L. N. Venkatesh & Byung-Cheol Min
SMART Lab, Purdue University
Abstract: Incorporating language comprehension into robotic operations unlocks significant advancements in robotics, but also presents distinct challenges, particularly in executing spatially oriented tasks such as pattern formation. This paper introduces ZeroCAP, a novel system that integrates large language models with multi-robot systems for zero-shot, context-aware pattern formation. Grounded in the principles of language-conditioned robotics, ZeroCAP leverages the interpretative power of language models to translate natural language instructions into actionable robot configurations. The approach combines vision-language models, cutting-edge segmentation techniques, and shape descriptors, enabling complex, context-driven pattern formations in multi-robot coordination. Through extensive experiments, we demonstrate the system's proficiency in executing complex, context-aware pattern formations across a spectrum of tasks, from surrounding and caging objects to infilling regions. These results not only validate the system's capability to interpret and implement intricate context-driven tasks, but also underscore its adaptability and effectiveness across varied environments and scenarios.
An overview of the ZeroCAP system, tracing the workflow from the initial natural language instruction and input image of the environment to the final deployment of robots. The processing stages include context identification with a Vision Language Model (VLM), object segmentation, shape description, and Large Language Model (LLM) coordination for precise robot placement in the environment.
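For readers who want a concrete picture of this workflow, below is a minimal, self-contained Python sketch of the staged dataflow. The VLM, segmentation, and LLM stages are stubbed with canned outputs so the script runs offline; every function name, the toy boundary polygon, and the even-spacing "surround" behavior are illustrative assumptions, not the authors' implementation.

import math

def identify_context(instruction, image):
    """Stand-in for the VLM stage: name the object the instruction targets."""
    return "table"  # canned answer so the sketch runs offline

def segment_object(image, object_name):
    """Stand-in for the segmentation stage: return the object's boundary
    as a closed polygon in image coordinates."""
    return [(100.0, 100.0), (300.0, 100.0), (300.0, 200.0), (100.0, 200.0)]

def describe_shape(polygon):
    """Shape descriptor: vertices plus edges (index pairs), the compact
    geometric summary handed to the LLM instead of raw pixels."""
    n = len(polygon)
    return {"vertices": polygon, "edges": [(i, (i + 1) % n) for i in range(n)]}

def plan_formation(descriptor, num_robots):
    """Stand-in for the LLM stage: a toy 'surround' behavior that spaces
    robots evenly along the described boundary."""
    verts = descriptor["vertices"]
    pts = verts + [verts[0]]                       # close the loop
    seg = [math.dist(pts[i], pts[i + 1]) for i in range(len(verts))]
    step = sum(seg) / num_robots                   # equal arc-length spacing
    goals = []
    for k in range(num_robots):
        d, i = k * step, 0
        while d > seg[i]:                          # walk to the right edge
            d -= seg[i]
            i += 1
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        t = d / seg[i]                             # interpolate along edge
        goals.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return goals

if __name__ == "__main__":
    instruction, image = "surround the table", None    # image omitted here
    target = identify_context(instruction, image)
    descriptor = describe_shape(segment_object(image, target))
    print(plan_formation(descriptor, num_robots=6))

Running the sketch prints six evenly spaced goal positions around the toy rectangle, mirroring the "surrounding and caging" tasks described in the abstract.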
Key Advantages of ZeroCAP
Zero-Shot Learning: ZeroCAP enables multi-robot systems to perform complex, context-aware pattern formation without requiring extensive pre-training or predefined rules, making it highly flexible and adaptable.
Natural Language Integration: The system seamlessly translates natural language instructions into precise robotic actions, making it user-friendly and allowing for intuitive control over multi-robot tasks.
Improved Spatial Reasoning: By decoupling spatial reasoning from VLMs and using edge and vertex representations, ZeroCAP significantly enhances the accuracy of robot deployments, overcoming the limitations of existing vision-language models (see the sketch below).
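As a concrete illustration of this decoupling, the sketch below reduces a binary segmentation mask to an edge/vertex polygon using standard OpenCV calls (cv2.findContours, cv2.approxPolyDP) and serializes it into a text prompt for an LLM. The toy mask and the prompt wording are assumptions for illustration; only the OpenCV API usage is standard, and this is not the authors' exact pipeline.

import cv2
import numpy as np

# Toy mask: a filled rectangle standing in for a segmented object.
mask = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(mask, (80, 60), (240, 180), 255, thickness=-1)

# Largest external contour of the mask (standard OpenCV 4.x signature).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contour = max(contours, key=cv2.contourArea)

# Polygonal simplification: keep only the salient boundary vertices.
eps = 0.01 * cv2.arcLength(contour, True)
poly = cv2.approxPolyDP(contour, eps, True).reshape(-1, 2)

vertices = [(int(x), int(y)) for x, y in poly]
edges = [(i, (i + 1) % len(vertices)) for i in range(len(vertices))]

# Serialize the edge/vertex descriptor into text the LLM can reason over,
# rather than asking a VLM to place robots directly on raw pixels.
prompt = (
    f"Object boundary vertices (pixels): {vertices}\n"
    f"Boundary edges (vertex index pairs): {edges}\n"
    "Place 4 robots evenly around this boundary; return one (x, y) per robot."
)
print(prompt)

The design point is that the LLM receives a compact, symbolic description of the geometry instead of raw pixels, sidestepping the weak spatial grounding of current vision-language models.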
Submitted to IEEE International Conference on Robotics and Automation (ICRA 2025)
Latest Version (September 23, 2024): https://arxiv.org/abs/2404.02318
BibTeX
@article{venkatesh2024zerocap,
  title={ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models},
  author={Venkatesh, Vishnunandan LN and Min, Byung-Cheol},
  journal={arXiv preprint arXiv:2404.02318},
  year={2024}
}
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1846221.
We would also like to thank Dr. Tamzidul Mina at Sandia National Laboratories for his valuable feedback, which significantly enhanced the clarity and quality of this work.