Di-NeRF: Distributed NeRF for Multi-Robot Collaborative Learning with Unknown Relative Poses

Mahboubeh Asadi*, Kourosh Zareinia*, Sajad Saeedi*

* Toronto Metropolitan University

Code (coming soon)

Supplementary Document: Supplementary_document_Di_NeRF.pdf

Abstract 

Collaborative mapping of unknown environments can be done faster and more robustly than with a single robot. However, a collaborative approach requires a distributed paradigm to be scalable and to deal with communication issues. This work presents a fully distributed algorithm that enables a group of robots to collectively optimize the parameters of a Neural Radiance Field (NeRF). Each robot trains its own NeRF with access only to its own visual data and communicates the trained NeRF parameters over a mesh network. Additionally, the relative poses of all robots are jointly optimized alongside the model parameters, enabling mapping with unknown relative camera poses. We show that multi-robot systems can benefit from differentiable, robust 3D reconstruction optimized from multiple NeRFs. Experiments on real-world and synthetic data demonstrate the efficiency of the proposed algorithm.

Di-NeRF will be open-source (code coming soon).

Experiments

In this work, three datasets are used to evaluate Di-NeRF: a synthetic dataset, Tanks and Temples, and the Waymo dataset.

Di-NeRF vs Centralized NeRF on Synthetic Dataset

For the synthetic sequences, two robots are involved, and each dataset is split into two segments. For instance, in the Chair sequence, robot R#1 exclusively observes the front of the chair, while robot R#2 observes only the back (with no overlap in the robots' trajectories). A similar split is applied to the other synthetic sequences. After dividing the data, COLMAP is run separately on each segment to estimate the camera poses, so each robot's poses are expressed in its own local coordinate system; the relative pose between these coordinate systems is then estimated by running Di-NeRF.
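To make the unknown-relative-pose step concrete, the following is a minimal PyTorch sketch of one common way to parameterize a learnable rigid transform that maps a robot's local COLMAP poses into a shared world frame. The class and variable names (RelativePose, c2w_local) and the axis-angle parameterization are illustrative assumptions, not Di-NeRF's actual implementation.

import torch

def hat(w: torch.Tensor) -> torch.Tensor:
    """Skew-symmetric matrix of a 3-vector (differentiable)."""
    zero = torch.zeros((), dtype=w.dtype, device=w.device)
    return torch.stack([
        torch.stack([zero, -w[2],  w[1]]),
        torch.stack([w[2],  zero, -w[0]]),
        torch.stack([-w[1], w[0],  zero]),
    ])

def exp_so3(w: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: axis-angle 3-vector -> 3x3 rotation matrix."""
    theta = torch.sqrt((w * w).sum() + 1e-12)  # eps keeps grads finite at 0
    K = hat(w / theta)
    I = torch.eye(3, dtype=w.dtype, device=w.device)
    return I + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)

class RelativePose(torch.nn.Module):
    """Learnable rigid transform from a robot's local COLMAP frame to the
    shared frame (e.g., robot R#1's origin)."""
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(3))  # rotation (axis-angle)
        self.t = torch.nn.Parameter(torch.zeros(3))  # translation

    def forward(self, c2w_local: torch.Tensor) -> torch.Tensor:
        """Apply the transform to a 4x4 camera-to-world pose."""
        R = exp_so3(self.w)
        top = torch.cat([R, self.t.view(3, 1)], dim=1)            # 3x4
        bottom = torch.tensor([[0.0, 0.0, 0.0, 1.0]],
                              dtype=R.dtype, device=R.device)
        return torch.cat([top, bottom], dim=0) @ c2w_local

Applying the transform before ray generation lets the photometric loss backpropagate into the pose parameters, so the relative pose and the NeRF weights can be optimized jointly.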

Raw data provided for robot R#1

Raw data provided for robot R#2

Independent optimization for robot R#1

Independent optimization for robot R#2

In the following cases, the local origins for robot R#1 and robot R#2 are different. 

Di-NeRF without pose optimization for robot R#1

Di-NeRF without pose optimization for robot R#2

Di-NeRF for robot R#1

Di-NeRF for robot R#2

Relative pose optimization for two robots: red poses belong to robot R#1, and blue poses belong to robot R#2, which are optimized to be transformed into robot R#1's local origin. There is a 20 percent overlap in the trajectories.

Di-NeRF allows robots to cooperatively optimize local copies of a neural network model without explicitly sharing visual data. In this figure, two robots use Di-NeRF to cooperatively optimize a unified NeRF. Each robot only sees part of the chair, and robots do not know their relative poses. The robots communicate over a wireless network (gray dashed lines) to cooperatively optimize the final network and relative poses.
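As a rough illustration of this weight-sharing idea, below is a minimal sketch of one generic decentralized round, assuming a simple neighbor-averaging consensus rule: each robot takes a gradient step on its private rays, then mixes its NeRF weights with those of its neighbors. The render function, the graph structure, and the plain averaging are placeholders for illustration, not the exact update derived in the paper.

import copy
import torch

def local_step(model, optimizer, rays, target_rgb, render):
    """One gradient step on the robot's private data; images never leave
    the robot, only the resulting weights are later communicated."""
    optimizer.zero_grad()
    pred = render(model, rays)  # placeholder for NeRF volume rendering
    loss = torch.nn.functional.mse_loss(pred, target_rgb)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def consensus_step(models, graph):
    """Replace each robot's weights with the mean of its own and its
    neighbors' weights, mimicking communication over the mesh network."""
    snapshots = [copy.deepcopy(m.state_dict()) for m in models]
    for i, model in enumerate(models):
        group = [i] + list(graph[i])  # self plus neighbors
        averaged = {k: sum(snapshots[j][k] for j in group) / len(group)
                    for k in snapshots[i]}
        model.load_state_dict(averaged)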

Di-NeRF can reconstruct the entire scene via the collaboration of two robots, where each robot only sees a part of the scene. Individual training results in poor reconstruction quality for some areas of the scene, whereas Di-NeRF maintains good quality throughout. In each column, the left image is for robot R#1 and the right one is for robot R#2.

Tanks and Temples - Barn Dataset

Row (a) shows the input images from the Tanks and Temples dataset (Barn) provided to each robot. Rows (b, c, d) show images rendered by Di-NeRF; in these rows, the robots communicate to collaboratively reconstruct the complete scene. Notably, the views in rows (b), (c), and (d) are deliberately chosen: each is visible only in the raw data of one specific robot, yet after training, every robot can render it.

Di-NeRF for different numbers of robots. All robots are fully connected, and the relative poses and NeRFs are trained jointly. On the left side of the image, the allocation of frames and poses from the dataset to the different robots is shown in different colors. None of the robots share any images, yet in the end, all robots can render the whole scene.
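For completeness, here is a toy, self-contained driver in the spirit of the averaging sketch above, with small linear layers standing in for per-robot NeRFs and random tensors for each robot's private data. The fully connected communication graph used in this experiment corresponds to the graph construction below; everything here is illustrative, not Di-NeRF's training loop.

import copy
import torch

N = 4  # number of robots
models = [torch.nn.Linear(8, 3) for _ in range(N)]               # stand-in NeRFs
opts = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
graph = {i: [j for j in range(N) if j != i] for i in range(N)}   # fully connected

for _ in range(10):                       # communication rounds
    for i in range(N):                    # local step on private data
        x, y = torch.randn(32, 8), torch.randn(32, 3)
        opts[i].zero_grad()
        torch.nn.functional.mse_loss(models[i](x), y).backward()
        opts[i].step()
    snaps = [copy.deepcopy(m.state_dict()) for m in models]
    for i, m in enumerate(models):        # consensus over the full graph
        group = [i] + graph[i]
        m.load_state_dict({k: sum(snaps[j][k] for j in group) / len(group)
                           for k in snaps[i]})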

The San Francisco Mission Bay Waymo Dataset - Unbounded Scenes

For this experiment, a segment of the data covering approximately 286 meters was selected, comprising 233 images from one camera (one of the eight cameras that together provide a complete surround view from the roof of the car). The rendering results for this setup are shown for both centralized and Di-NeRF training. The average PSNR and SSIM values are 25.10 and 0.814 over six robots, versus 25.21 and 0.825 for the centralized setup; the per-robot values are presented in Table 4 of the paper.
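For reference, the reported PSNR can be computed from the mean squared error between rendered and ground-truth images (assuming pixel values in [0, 1]); SSIM typically comes from a library such as scikit-image. This is the standard metric definition, not Di-NeRF-specific code.

import torch

def psnr(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE); with images in [0, 1], MAX = 1."""
    mse = torch.mean((pred - gt) ** 2)
    return (-10.0 * torch.log10(mse)).item()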

The selected segment of the Mission Bay dataset trajectory is divided among six robots and used to evaluate Di-NeRF.

San Francisco Mission Bay Dataset - rendered with centralized NeRF 

San Francisco Mission Bay Dataset - rendered with Di-NeRF - robot R#1

San Francisco Mission Bay Dataset - rendered with Di-NeRF - robot R#2

San Francisco Mission Bay Dataset - rendered with Di-NeRF - robot R#3

San Francisco Mission Bay Dataset - rendered with Di-NeRF - robot R#4

San Francisco Mission Bay Dataset - rendered with Di-NeRF - robot R#5

San Francisco Mission Bay Dataset - rendered with Di-NeRF - robot R#6

Contact

If you have any questions, feel free to reach out to us at:

{mahboubeh.asadi, kourosh.zareinia, s.saeedi}@torontomu.ca

Citation

@misc{asadi2024dinerf,
      title={Di-NeRF: Distributed NeRF for Collaborative Learning with Unknown Relative Poses},
      author={Mahboubeh Asadi and Kourosh Zareinia and Sajad Saeedi},
      year={2024},
      eprint={2402.01485},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}