Xinyi Yang*, Yuxiang Yang*, Chao Yu+, Jiayu Chen, Jingchen Yu, Haibing Ren, Huazhong Yang and Yu Wang+
*Equal Contribution, +Corresponding Authors
Tsinghua University, Meituan Inc.
We investigate the multi-agent cooperative exploration problem, which requires multiple agents to explore an unseen environment via sensory signals in a limited time. A popular approach to exploration tasks is to combine active mapping with planning. Metric maps capture the details of the spatial representation, but they are memory-intensive and may vary significantly between scenarios, resulting in inferior generalization. Topological maps are a promising alternative as they consist only of nodes and edges with abstract but essential information and are less influenced by scene structure. However, most existing topology-based exploration methods rely on classical planners, which are time-consuming and sub-optimal due to their handcrafted design. Deep reinforcement learning (DRL) has shown great potential for learning (near-)optimal policies through fast end-to-end inference. In this paper, we propose Multi-Agent Neural Topological Mapping (MANTM) to improve the exploration efficiency and generalization of multi-agent exploration tasks. MANTM mainly comprises a Topological Mapper and a novel RL-based Hierarchical Topological Planner (HTP). The Topological Mapper employs a visual encoder and distance-based heuristics to construct a graph containing main nodes and their corresponding ghost nodes. The HTP leverages graph neural networks to capture correlations between agents and graph nodes in a coarse-to-fine manner for effective global goal selection. Extensive experiments conducted in a physically realistic simulator, Habitat, demonstrate that MANTM reduces the number of exploration steps by at least 26.40% over planning-based baselines and by at least 7.63% over RL-based competitors in unseen scenarios.
Multi-Agent Neural Topological Mapping (MANTM) consists of three components:
Topological Mapper
Hierarchical Topological Planner
Local Planner & Local Policy
We follow the paradigm of centralized training and decentralized execution (CTDE) with partial observations of N agents, where agent k receives a local observation at timestep t. The overview of MANTM is depicted in Fig. 1. Each agent shares the same Topological Mapper, Hierarchical Topological Planner, Local Planner, and Local Policy while making decisions independently. The Topological Mapper of each agent utilizes a visual encoder and distance-based heuristics to construct a topological map from RGB-D images and poses estimated from sensory signals. For better cooperation, the Topological Mapper transforms all individual maps into the same coordinate system and merges them. The Hierarchical Topological Planner (HTP) receives graphs from the merged topological map that respectively contain current agent information, main nodes, and ghost nodes, along with the corresponding historical graphs containing agent trajectories, previously selected main nodes, and previously selected ghost nodes. The HTP applies GNNs to these graphs to hierarchically capture spatio-temporal and inter-agent information, and then selects a ghost node as a global goal at each global step. Finally, the Local Planner plans a reachable path to the predicted global goal via the Fast Marching Method (FMM), and the Local Policy generates environmental actions that follow the planned path. We remark that the Local Planner and the Local Policy do not involve multi-agent interactions, so they are directly adopted from ANS.
The Definition of Fast Marching Method (FMM)
The definition of the FMM distance [1] is as follows: the fast marching method is a numerical method for solving boundary value problems of the Eikonal equation:
|∇u(x)|=1/f(x) for x∈Ω;
u(x)=0 for x∈∂Ω.
Typically, such a problem describes the evolution of a closed surface as a function of time u with speed f in the normal direction at a point x on the propagating surface. The speed function is specified, and the time at which the contour crosses a point x is obtained by solving the equation. Alternatively, u(x) can be thought of as the minimum time needed to reach ∂Ω starting from the point x. Since travel time at unit speed equals distance, we use FMM to compute the shortest traversable distance between two nodes in this work.
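To make the definition concrete, below is a minimal sketch of the travel-time computation on a 2D occupancy grid. It uses a Dijkstra-style wavefront expansion (a discrete relative of fast marching on a 4-connected grid with constant speed), not the full Eikonal solver; the function name and interface are illustrative, not from the paper.

```python
import heapq
import numpy as np

def grid_travel_time(free, sources, speed=1.0):
    """Approximate the Eikonal solution u(x) on a 2D grid via wavefront expansion.

    free:    boolean array, True where movement is allowed
    sources: list of (row, col) cells forming the boundary where u = 0
    speed:   constant propagation speed f(x); cost per step is 1/f
    """
    dist = np.full(free.shape, np.inf)
    heap = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < free.shape[0] and 0 <= nc < free.shape[1] and free[nr, nc]:
                nd = d + 1.0 / speed
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist
```

Obstacle cells are never visited and keep an infinite travel time, which is exactly the behavior needed when reading off the shortest movable distance between two nodes.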
Figure 1: Overview of Multi-Agent Neural Topological Mapping (MANTM).
We introduce a Topological Mapper to provide a merged topological map, as shown in Fig. 1. Inspired by [2, 3], we consider two types of nodes in the topological map: main nodes and ghost nodes. Main nodes are located where agents have already explored. Ghost nodes are located in the unexplored area and adjacent to the corresponding main node.
In map construction, the Topological Mapper relies on a visual encoder and distance-based heuristics. First, we adopt an encoder to predict visual embeddings of panoramic RGB-D images. The topological map constructs main nodes based on the cosine similarity between visual embeddings. Specifically, when the cosine similarity between the visual embedding at the current location and those of all existing main nodes is below a threshold (i.e., 0.75 in our work), a new main node is constructed at that location and connected to the main node at which the agent was most recently localized. However, images of some spatially irrelevant locations may be similar (e.g., images of corners and corridors both contain a large portion of wall), resulting in incorrect connections between main nodes. To deal with this problem, we delete connections between main nodes whose FMM distance is excessively large at each global step. Once a main node is constructed, m new ghost nodes are placed uniformly around this main node at a distance λ. Ghost nodes are removed if they are located in the explored areas identified by the current agent's predicted metric map via SLAM; we remark that the metric map is used only for identifying explored regions, not for global planning. Besides, a ghost node is converted to a main node if an agent passes through it. Therefore, the number of remaining ghost nodes ranges from 0 to m×N_main, where N_main is the number of main nodes. In our work, we choose λ = 3 meters and m = 12. Moreover, we delete spurious ghost nodes and their connected edges based on the FMM distance between two nodes. More specifically, if the FMM distance between two ghost nodes associated with different main nodes is below a threshold (i.e., 1 meter in our task), we randomly delete one of the ghost nodes, along with its connected edges.
Likewise, if the FMM distance between a ghost node and a main node other than its own is below a threshold (i.e., 3 meters in our task), we remove this ghost node, along with its connected edges.
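The main-node creation rule above can be sketched as follows. This is a simplified illustration, assuming unit-length embedding comparisons and Euclidean ghost placement; the function and field names (`maybe_add_main_node`, `"ghosts"`, etc.) are hypothetical, and the FMM-based pruning steps are omitted.

```python
import numpy as np

SIM_THRESHOLD = 0.75  # cosine similarity below this creates a new main node

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def maybe_add_main_node(main_nodes, embedding, position, m=12, lam=3.0):
    """Create a main node, plus m ghost nodes placed uniformly at radius lam,
    when the current view is dissimilar from every stored main node.
    Returns the new node dict, or None if an existing node matches."""
    for node in main_nodes:
        if cosine_sim(node["embedding"], embedding) >= SIM_THRESHOLD:
            return None  # a similar view exists; localize to that node instead
    angles = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    ghosts = [(position[0] + lam * np.cos(t), position[1] + lam * np.sin(t))
              for t in angles]
    node = {"embedding": embedding, "position": position, "ghosts": ghosts}
    main_nodes.append(node)
    return node
```

In the full mapper, ghost nodes in already-explored regions would then be filtered out against the SLAM-predicted metric map, and spurious ghost nodes would be pruned by FMM distance as described above.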
For better cooperation, we transform all individual topological maps into the same coordinate system and merge them based on the estimated poses of the agents at each global step. During merging, if the distance between two main nodes from different maps is below a threshold (i.e., 3 meters in our work), we randomly remove one of the nodes and redirect its connected edges to the remaining node. We remark that the merged maps are only needed for global decision-making, so we update them only at each global decision-making step.
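The merge step can be sketched as below. For simplicity this version keeps the lower-numbered node deterministically rather than choosing at random, and it merges any close pair within one combined node set rather than tracking which agent's map each node came from; the interface is illustrative, not from the paper.

```python
import itertools
import math

def merge_main_nodes(nodes, edges, threshold=3.0):
    """Merge main nodes that lie within `threshold` meters of each other;
    edges of a removed node are redirected to the node that is kept.

    nodes: dict node_id -> (x, y) position in the shared coordinate system
    edges: set of frozenset({id_a, id_b}) connections
    """
    kept = dict(nodes)
    redirect = {}
    for a, b in itertools.combinations(sorted(nodes), 2):
        if a in redirect or b in redirect:
            continue  # each node is merged at most once
        ax, ay = nodes[a]
        bx, by = nodes[b]
        if math.hypot(ax - bx, ay - by) < threshold:
            redirect[b] = a  # keep a, drop b
            kept.pop(b, None)
    new_edges = set()
    for e in edges:
        u, v = tuple(e)
        u, v = redirect.get(u, u), redirect.get(v, v)
        if u != v:  # drop self-loops created by the merge
            new_edges.add(frozenset({u, v}))
    return kept, new_edges
```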
The Hierarchical Topological Planner, shown in Fig. 2, consists of four modules: a Memory Fusion, a Main Node Selector, a Ghost Node Selector, and an Action Generator.
Memory Fusion: Exploration is a memory-based task in which historical information significantly impacts the current state. Therefore, this module incorporates each graph with its historical counterpart to mitigate the risk of memory loss. Concretely, we first use a weight-shared multi-layer perceptron (MLP) to encode all node features. Then, the Memory Fusion merges each graph with its corresponding historical graph and yields an updated graph, G̃, via self-attention and cross-attention mechanisms. The output dimensions of the last cross-attention layer match those of the original node features, ensuring that the shape of G̃ remains consistent with that of G.
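The fusion pattern can be illustrated with plain scaled dot-product attention. This sketch omits the learned projection matrices, multi-head structure, and the MLP encoder; shapes and names are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: (n_q, d), (n_k, d), (n_k, d) -> (n_q, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def memory_fusion(current, history):
    """Fuse current node features with their historical counterparts.
    Self-attention over the current graph, then cross-attention that queries
    the history; the output keeps the shape of `current`."""
    fused = attention(current, current, current)   # self-attention
    fused = attention(fused, history, history)     # cross-attention into memory
    return fused
```

Because the cross-attention output has the same dimensionality as the queries, the fused graph G̃ retains the shape of the input graph G, as stated above.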
Main Node Selector: The Main Node Selector is introduced for high-level goal selection and yields matching scores between the agents and the main nodes. It predicts the next preferred region to explore, since a main node with a higher matching score implies a higher probability of selecting its corresponding ghost nodes as the global goal. More concretely, this module receives G̃_a and G̃_m and then leverages an Individual Encoder and a Relation Encoder to infer the matching scores, S_{m,re}. The Individual Encoder perceives the states of the agents and the spatial information of the main nodes, while the Relation Encoder captures the correlation between the agents and the main nodes.
Ghost Node Selector: For low-level goal selection, the Ghost Node Selector provides a probability distribution over ghost nodes and predicts the next location to explore based on the preferred region. Similar to the Main Node Selector, it first receives G̃_a and G̃_g and produces ghost node scores, S_{g,re}, via an Individual Encoder and a Relation Encoder. The Individual Encoder captures inter-agent interactions and the spatial information of ghost nodes, while the Relation Encoder infers the correlations between agents and ghost nodes. Additionally, we update S_{g,re} by multiplying it with S_{m,re} to aggregate information from main and ghost nodes for more appropriate goal selection.
Action Generator: The Action Generator selects a global goal from all ghost nodes at each global step. We regard the updated scores, Ŝ_{g,re}, as the probability distribution over the ghost nodes. The action space of the HTP is A = {a | a ∈ ghost nodes}, where a is a discrete variable sampled from a categorical distribution, indicating the index of the selected ghost node.
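The coarse-to-fine selection described above (multiply ghost scores by their main node's score, normalize, then sample) can be sketched as follows. The mapping from ghost nodes to main nodes and the softmax normalization are illustrative assumptions; the paper does not specify the exact normalization.

```python
import numpy as np

def select_global_goal(main_scores, ghost_scores, ghost_to_main, rng=None):
    """Combine main- and ghost-node scores and sample a ghost node as the goal.

    main_scores:   (N_main,) matching scores from the Main Node Selector
    ghost_scores:  (N_ghost,) scores from the Ghost Node Selector
    ghost_to_main: (N_ghost,) index of the main node each ghost node belongs to
    """
    rng = np.random.default_rng() if rng is None else rng
    combined = ghost_scores * main_scores[ghost_to_main]  # aggregate both levels
    probs = np.exp(combined - combined.max())
    probs /= probs.sum()                                   # categorical distribution
    goal = int(rng.choice(len(probs), p=probs))            # sampled ghost-node index
    return goal, probs
```

Ghost nodes attached to a highly scored main node inherit its weight, so the sampled goal is biased toward the preferred region while still allowing exploration of alternatives.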
Figure 2: Workflow of the Hierarchical Topological Planner (HTP), including a Memory Fusion, a Main Node Selector, a Ghost Node Selector, and an Action Generator.
We test MANTM and several baselines in selected scenes from the HM3D and Gibson datasets using the Habitat simulator. The goal of the task is to coordinate a team of agents to explore unknown areas as fast as possible within a limited horizon. We report three metrics: the number of Steps to reach 90% coverage, the Coverage Ratio, and the Mutual Overlap. We compare against three planning-based baselines, CoScan, Topological Frontier (TF), and Voronoi, as well as three RL-based baselines, ANS-Merge, NeuralCoMapping (NCM), and MAANS. The results for 1, 2, and 5 agents are presented below. We remark that a single agent has no Mutual Overlap, since the overlapped area is defined as the area that two agents have both explored.
MANTM and the baselines operate under the same assumptions in our task. Each baseline replaces only the Topological Mapper and the HTP with its alternatives and keeps the rest identical to MANTM, except for TF, which substitutes only the HTP.
Algorithm 1
CoScan: CoScan is a frontier-based method. First, it extracts all frontiers from a merged obstacle map at each timestep. Then the K-means algorithm is applied to find representatives of these frontiers, which are considered candidates for global goals. Finally, each candidate is assigned to an agent via an MTSP solver, where the cost function is the shortest collision-free distance between two locations. Pseudocode of CoScan can be found in Algorithm 1.
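A minimal sketch of the CoScan pipeline, assuming frontiers are already extracted as 2D points: plain Lloyd's K-means for clustering, and a greedy nearest-goal assignment with Euclidean distance standing in for the MTSP solver with collision-free distances. All names and simplifications are ours, not from the CoScan paper.

```python
import numpy as np

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's K-means over frontier cells; returns cluster centers."""
    rng = np.random.default_rng(0) if rng is None else rng
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def assign_goals(agents, goals):
    """Greedy stand-in for the MTSP assignment: each agent takes the nearest
    unclaimed goal (Euclidean instead of collision-free distance)."""
    remaining = list(range(len(goals)))
    assignment = {}
    for i, pos in enumerate(agents):
        j = min(remaining, key=lambda g: np.linalg.norm(goals[g] - pos))
        assignment[i] = j
        remaining.remove(j)
    return assignment
```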
Algorithm 2
Topological Frontier (TF): The Topological Frontier is a graph-based method. It adopts a centralized policy that selects the N nodes with minimal costs for the N agents at each global step. If an agent arrives at its assigned node within a global step, it randomly chooses an unexplored edge to continue exploring. Pseudocode of Topological Frontier can be found in Algorithm 2.
Algorithm 3
Voronoi: Voronoi can be regarded as a graph-based method. First, it partitions the whole map into several sub-regions. Each agent owns the sub-region whose center is closest to it. The agent then uses a utility-based algorithm to select a global goal within its own sub-region. Since agents are forbidden from choosing global goals in other agents' sub-regions, Voronoi alleviates overlapping exploration among agents. The pseudocode of Voronoi is provided in Algorithm 3.
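The partition step reduces to a nearest-agent assignment over map cells, which can be sketched in a few lines; the utility-based goal selection within each sub-region is omitted, and the function name is illustrative.

```python
import numpy as np

def voronoi_partition(cells, agents):
    """Assign each free cell to the agent whose position is closest, i.e.,
    a Voronoi partition of the map. Returns an owner index per cell."""
    # pairwise distances: (n_cells, n_agents)
    d = np.linalg.norm(cells[:, None, :] - agents[None, :, :], axis=-1)
    return np.argmin(d, axis=1)
```

Restricting each agent's goal candidates to the cells it owns is what discourages overlapping exploration in this baseline.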
ANS-Merge: Active Neural SLAM (ANS) uses convolutional neural networks (CNNs) to extract features from egocentric local metric maps and global metric maps, and then predicts global goals for path planning in a single-agent setting. We extend ANS to multi-agent exploration by feeding merged maps to the global planner.
NeuralCoMapping (NCM): NeuralCoMapping casts the mapping task as a bipartite graph matching problem, where nodes represent agents and frontiers. It uses a multiplex graph neural network (mGNN) to infer the neural distance between frontiers and agents. An affinity matrix is then constructed from the predicted neural distances for graph matching, and a differentiable linear assignment layer selects the nodes with minimal neural distance; the mGNN is optimized via reinforcement learning.
MAANS: Multi-Agent Active Neural SLAM (MAANS) is a variant of ANS for multi-agent cooperative exploration. It takes in metric maps and uses the team-wise transformer to encode the relationships between agents and spatial representations, thus promoting cooperation. Note that MAANS trains an individual policy for each map and then obtains a final policy via policy distillation. For a fair comparison, we train MAANS directly with different maps without using policy distillation.
The results in Tab. 1 show that MANTM outperforms all baselines with N=1, 2, 5 agents on unseen scenes from the Gibson dataset. Specifically, in the N=1 setting, MANTM reduces Steps by at least 4.93% compared to RL-based competitors and 8.72% compared to planning-based competitors. In the N=2 setting, MANTM requires at least 6.96% and 12.99% fewer Steps than RL-based and planning-based competitors, respectively. In the N=5 setting, MANTM requires at least 5.10% and 7.98% fewer Steps than RL-based and planning-based competitors, respectively.
Table 1: Performance of MANTM, planning-based baselines, and RL-based baselines with N = 1, 2, 5 agents on the Gibson dataset. Note that the horizon of middle and large maps with N = 2, 5 is 300 steps and 600 steps, respectively. The horizon of maps for N = 1 is twice that for N = 2, 5.
We also report the domain generalization performance in Tab. 2, where all models trained on Gibson are evaluated on the HM3D domain with N=1, 2, 5 agents. MANTM achieves the best performance with the lowest Steps and the highest Coverage Ratio. In super large scenes, MANTM reduces Steps by at least 27.98% compared to planning-based baselines and 9.19% compared to RL-based competitors with N=1, by at least 12.44% and 7.21% with N=2, and by at least 17.36% and 8.77% with N=5, respectively.
Table 2: Performance of MANTM, planning-based baselines and RL-based baselines with N = 1, 2, 5 agents on the HM3D dataset. Note that the horizon of middle, large, and super large maps with N = 2, 5 is 450 steps, 720 steps, and 1800 steps, respectively. The horizon of maps for N = 1 is twice that for N = 2, 5.
Voronoi & MAANS vs. MANTM on the large map Pettigrew:

Method    Mutual Overlap    Steps     Coverage Ratio
Voronoi   0.48              476.91    0.87
MAANS     0.50              433.66    0.94
MANTM     0.35              362.52    0.95

Voronoi & MAANS vs. MANTM on the large map 7dmR22gwQpH:

Method    Mutual Overlap    Steps     Coverage Ratio
Voronoi   0.43              686.43    0.93
MAANS     0.57              720.00    0.89
MANTM     0.40              619.92    0.95

Voronoi & MAANS vs. MANTM on the super large map NBg5UqG3di3:

Method    Mutual Overlap    Steps     Coverage Ratio
Voronoi   0.33              1608.07   0.78
MAANS     0.38              1309.13   0.90
MANTM     0.27              1277.05   0.94
[1] Sethian, James A. "A fast marching level set method for monotonically advancing fronts." Proceedings of the National Academy of Sciences 93.4 (1996): 1591-1595.
[2] Chaplot, Devendra Singh, et al. "Neural topological SLAM for visual navigation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[3] Hahn, Meera, et al. "No RL, no simulation: Learning to navigate without navigating." Advances in Neural Information Processing Systems 34 (2021): 26661-26673.