Using Foundation Models to Perform a Drone Show

Introducing CLIPSwarm [1], an algorithm that generates swarm drone formations from natural-language input: describe the display you want in words, and CLIPSwarm turns that description into a drone show. The pipeline first enriches your words into a text prompt, then runs an iterative search, moving from broad exploration to focused exploitation, that refines candidate formations until they match the description. Each candidate is rendered as an image by tracing an alpha-shape contour around the drone positions and filling it with an automatically selected color, and a CLIP-based similarity score between that rendering and the prompt drives the search. Finally, CLIPSwarm computes control actions that navigate the drones smoothly and safely into the target formation. Explore our supplementary video for a look at the resulting shows; the two sketches below illustrate the scoring and rendering steps.
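
To make the scoring step concrete, here is a minimal sketch of how a rendered formation image can be compared against the text prompt using OpenAI's clip package. The function name clip_similarity, the ViT-B/32 model choice, and the idea of scoring a saved image file are illustrative assumptions, not the paper's actual code.

```python
# pip install torch git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # model choice is an assumption

def clip_similarity(image_path: str, prompt: str) -> float:
    """Cosine similarity between a rendered formation image and a text prompt."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()

# Example: score a rendered formation against the show description.
# score = clip_similarity("formation.png", "a drone show of a butterfly")
```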

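The rendering step can be sketched the same way: turn candidate drone positions into a filled silhouette image via an alpha-shape contour. This sketch uses the alphashape library; render_formation is a hypothetical helper, and the fixed fill color is a simplification, since the paper selects the color to best match the prompt. Feeding its output into clip_similarity above and keeping perturbations that raise the score gives a simple version of the exploration-to-exploitation loop.

```python
# pip install alphashape shapely matplotlib numpy
import alphashape
import matplotlib.pyplot as plt
import numpy as np

def render_formation(positions, alpha=0.0, color="orange", out_path="formation.png"):
    """Render 2D drone positions as a filled alpha-shape silhouette and save it."""
    # alpha=0 yields the convex hull; larger values trace tighter, concave contours.
    shape = alphashape.alphashape([tuple(p) for p in positions], alpha)
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.set_facecolor("black")
    if shape.geom_type == "Polygon":
        parts = [shape]
    elif shape.geom_type == "MultiPolygon":
        parts = list(shape.geoms)
    else:  # degenerate geometry from too few or collinear points: draw nothing
        parts = []
    for poly in parts:
        xs, ys = poly.exterior.xy
        ax.fill(xs, ys, color=color)
    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig(out_path, dpi=128, facecolor="black")
    plt.close(fig)

# Example: render 20 random candidate positions to formation.png.
render_formation(np.random.rand(20, 2))
```
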
References

[1] Pueyo, P., Montijano, E., & Schwager, M. (2024). CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models. arXiv:2403.13467. https://arxiv.org/html/2403.13467v1