CoViS-Net

A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications

CoViS-Net is a decentralized, real-time, multi-robot visual spatial model that learns spatial priors from data to provide relative pose estimates and bird's-eye-view representations. We demonstrate its effectiveness in real-world multi-robot formation control tasks.

corl_suppl_main.mp4

Summary

We demonstrate our model on a multi-agent pose control task. In our indoor laboratory, we render predicted poses and uncertainty estimates on top of one leader and two follower robots.

Indoor Experiments

We run our model on a variety of indoor scenes, with a remote-controlled leader and up to four followers configured to stay at a fixed relative pose.
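
To make "stay at a fixed relative pose" concrete, here is a minimal sketch of one way a follower could turn a relative pose estimate into a velocity command. It is an illustration only, not the controller used in these experiments. It assumes planar `(x, y, yaw)` poses, a holonomic base, and a plain proportional law. The names `follower_command`, `wrap_angle`, `gain_lin`, and `gain_ang` are hypothetical.

```python
import math


def wrap_angle(theta):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def follower_command(rel_leader_pose, desired_rel_pose, gain_lin=0.5, gain_ang=1.0):
    """Proportional velocity command expressed in the follower's own frame.

    rel_leader_pose: (x, y, yaw) of the leader as seen from the follower,
        for example a relative pose estimate (input format is an assumption).
    desired_rel_pose: (x, y, yaw) the leader should have in the follower frame.
    Returns (vx, vy, wz) for a holonomic base (an assumption of this sketch).
    """
    # Position error: where the leader is minus where it should be.
    ex = rel_leader_pose[0] - desired_rel_pose[0]
    ey = rel_leader_pose[1] - desired_rel_pose[1]
    # Heading error, wrapped so the follower turns the short way round.
    eyaw = wrap_angle(rel_leader_pose[2] - desired_rel_pose[2])
    # Moving toward the error shrinks it, so the command has the same sign.
    return gain_lin * ex, gain_lin * ey, gain_ang * eyaw


# The leader is 2 m ahead but should be 1 m ahead: drive forward.
print(follower_command((2.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

A practical controller would also need velocity limits, and could reduce its gains when the predicted pose uncertainty is high. Both are left out here for brevity.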

corl_suppl_indoor.mp4

Outdoor Experiments

While our model was trained exclusively on indoor data, we show that it generalizes to outdoor experiments.

corl_suppl_outdoor.mp4

Heterogeneous Multi-Robot System

We show the flexibility of our model by deploying it in a heterogeneous multi-robot experiment. Our platform-agnostic model generalizes across multiple robot platforms.

corl_heterogeneous.MP4

We show quantitative evaluations of the BEV representation prediction on scenes of our real-world dataset. The top row shows the image for each node, the middle row shows the ground-truth poses in the coordinate frame of each node, and the bottom row shows the pose predictions and pose uncertainty, with the BEV representation prediction in the background.

sn-corridor_01.mp4

Corridor A

s-street_01.mp4

Corridor B

sn05_01.mp4

Office A

sn05_02.mp4

Office B

intellab_terrace_00.mp4

Outdoor

intellab_01.mp4

Study A

intellab_02.mp4

Study B

sn-balcony_01.mp4

Sunny