A figure demonstrating the ROS nodes involved in a single planning step of the proposed algorithm. The flowchart shows the relationships between the ROS nodes in the implementation on the Spot robot. Green blocks are the nodes responsible for scan segmentation and filtering, blue blocks are the map processing steps, and red blocks are code adapted from the m-explore ROS package, which generates the frontiers used for sampling in the Explore node. Not shown are the steps involved in object localization, which are discussed below.
We tested the method both on Ubuntu 18.04 with ROS Melodic (simulation) and on Ubuntu 20.04 with ROS Noetic (hardware).
The model is trained with PyTorch on expert data collected using the map-based pixel-wise scan classifier. The data used to train our model can be found at this link and the code for training the model can be found at this link. Additional training data can be generated with the open-source code for the map-based classifier, which can be found at this link. We modify that open-source code to publish the segmented scan to a new topic; the scan is published after classification during each pass through the CorrespondenceCallback method. Once a model is trained, it is exported to ONNX format for fast online inference. We connect ONNX to ROS via a simple package at this link, which subscribes to raw scan and pose data and outputs a pixel-wise labeled scan. The non-map points are extracted and published as a PoseArray, after which several filtering steps are applied. In practice the large scan size (n = 897) produces many non-map points, so we reduce them in two steps: first, a pooling operation removes duplicates within a radius of 0.1m; second, points inside visually observed regions of the map are removed to avoid repeated visits to previously inspected areas. These steps are shown in the green boxes in the figure above, and a sketch of the filtering stage is given below.
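As a rough illustration of the filtering stage, the sketch below pools non-map points within a 0.1m radius and drops any point that falls in an already observed cell of the global visual search map. The node name, topic names, and message flow are assumptions for the example, not the exact implementation.

```python
#!/usr/bin/env python
# Hypothetical sketch of the non-map point filtering step: radius-based
# pooling (0.1 m) followed by removal of points that fall inside already
# observed cells of the global visual search map. Topic names are assumed.
import rospy
import numpy as np
from geometry_msgs.msg import Pose, PoseArray
from nav_msgs.msg import OccupancyGrid

POOL_RADIUS = 0.1  # metres; points closer than this are treated as duplicates


def radius_pool(points, radius=POOL_RADIUS):
    """Greedily keep one representative point per `radius` neighbourhood."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) > radius for q in kept):
            kept.append(p)
    return kept


class NonMapPointFilter(object):
    def __init__(self):
        self.visual_map = None  # latest global visual search map
        rospy.Subscriber("visual_search_map", OccupancyGrid, self.map_cb)
        rospy.Subscriber("non_map_points", PoseArray, self.points_cb)
        self.pub = rospy.Publisher("filtered_non_map_points", PoseArray, queue_size=1)

    def map_cb(self, msg):
        self.visual_map = msg

    def in_observed_cell(self, p):
        """True if point p lies in a cell already observed by the visual sensor."""
        if self.visual_map is None:
            return False
        info = self.visual_map.info
        col = int((p[0] - info.origin.position.x) / info.resolution)
        row = int((p[1] - info.origin.position.y) / info.resolution)
        if 0 <= col < info.width and 0 <= row < info.height:
            return self.visual_map.data[row * info.width + col] != -1  # -1 = unknown
        return False

    def points_cb(self, msg):
        pts = [np.array([p.position.x, p.position.y]) for p in msg.poses]
        pooled = radius_pool(pts)                         # step 1: pooling
        remaining = [p for p in pooled
                     if not self.in_observed_cell(p)]     # step 2: drop observed
        out = PoseArray(header=msg.header)
        for p in remaining:
            pose = Pose()
            pose.position.x, pose.position.y = p
            pose.orientation.w = 1.0
            out.poses.append(pose)
        self.pub.publish(out)


if __name__ == "__main__":
    rospy.init_node("non_map_point_filter")
    NonMapPointFilter()
    rospy.spin()
```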
Map management uses the nav_msgs/OccupancyGrid type; the relevant nodes are shown in blue in the figure. A node built on the Costmap2D ROS package generates a local LiDAR costmap from raw scans. A separate node uses the robot pose and the occupancy information from the local LiDAR costmap to generate a triangular field of view representing the visual sensor: every cell is assigned the unknown value except those within 45 degrees of the robot's heading. In this work, the LiDAR sensor is given a 10mx10m square costmap of resolution 0.05m and the visual sensor a 5mx5m square costmap of resolution 0.05m. These local maps, together with the pose information, are consumed by the search map managers, which update the corresponding global search maps, each also a nav_msgs/OccupancyGrid.
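The following sketch illustrates how the triangular visual field of view could be derived from the local LiDAR costmap by assigning the unknown value to every cell outside a 45-degree cone around the robot's heading. The function signature and variable names are illustrative only, not the actual node's API.

```python
# Sketch: mask a local costmap down to the triangular visual FoV.
# Cells outside 45 degrees of the robot heading are set to unknown (-1).
import numpy as np


def visual_fov_mask(local_costmap, robot_x, robot_y, robot_yaw,
                    origin_x, origin_y, resolution, half_angle=np.radians(45)):
    """Return a copy of `local_costmap` (2-D int array) with cells outside the
    triangular field of view replaced by the unknown value (-1)."""
    h, w = local_costmap.shape
    fov = np.full_like(local_costmap, -1)
    for row in range(h):
        for col in range(w):
            # centre of the cell in the map frame
            cx = origin_x + (col + 0.5) * resolution
            cy = origin_y + (row + 0.5) * resolution
            bearing = np.arctan2(cy - robot_y, cx - robot_x) - robot_yaw
            bearing = np.arctan2(np.sin(bearing), np.cos(bearing))  # wrap to [-pi, pi]
            if abs(bearing) <= half_angle:
                fov[row, col] = local_costmap[row, col]
    return fov
```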
The exploration nodes shown in red in the figure above are adapted from the explore_lite ROS package. The main alteration is splitting the original exploration node into two parts. The first generates frontiers and frontier centroids over each of the two global search maps. The second evaluates candidate viewpoints at each frontier centroid. Evaluation is done by overlaying the same 5mx5m local visual costmap, centered on the centroid, at four perpendicular orientations and counting the number of unknown cells in the global visual search map that would be discovered from that viewpoint. The list of filtered non-map points is also considered for each viewpoint, so that viewpoints covering these features of interest receive a higher score.
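A simplified version of this viewpoint scoring is sketched below: the 5mx5m window is evaluated at four perpendicular orientations, counting unknown cells of the global visual search map that fall inside the triangular visual FoV and weighting nearby filtered non-map points. The weight w_points, the helper names, and the use of the 45-degree wedge for scoring are assumptions for the example, not values taken from the implementation.

```python
# Sketch of frontier-centroid viewpoint scoring over the global visual search map.
import numpy as np

FOOTPRINT = 5.0    # metres, matching the local visual costmap
RESOLUTION = 0.05  # metres per cell
UNKNOWN = -1       # OccupancyGrid convention for unobserved cells


def in_visual_fov(centroid, yaw, x, y,
                  footprint=FOOTPRINT, half_angle=np.radians(45)):
    """True if (x, y) lies inside the footprint window around the centroid and
    within 45 degrees of the candidate heading (the triangular visual FoV)."""
    dx, dy = x - centroid[0], y - centroid[1]
    if abs(dx) > footprint / 2.0 or abs(dy) > footprint / 2.0:
        return False
    bearing = np.arctan2(dy, dx) - yaw
    bearing = np.arctan2(np.sin(bearing), np.cos(bearing))  # wrap to [-pi, pi]
    return abs(bearing) <= half_angle


def score_viewpoint(centroid, yaw, grid, origin, non_map_points, w_points=5.0):
    """grid: 2-D array of the global visual search map, origin: its (x, y) corner.
    Score = unknown cells revealed + weighted non-map points covered."""
    half = FOOTPRINT / 2.0
    rows, cols = grid.shape
    revealed = 0
    for x in np.arange(centroid[0] - half, centroid[0] + half, RESOLUTION):
        for y in np.arange(centroid[1] - half, centroid[1] + half, RESOLUTION):
            col = int((x - origin[0]) / RESOLUTION)
            row = int((y - origin[1]) / RESOLUTION)
            if (0 <= row < rows and 0 <= col < cols
                    and grid[row, col] == UNKNOWN
                    and in_visual_fov(centroid, yaw, x, y)):
                revealed += 1
    features = sum(1 for p in non_map_points
                   if in_visual_fov(centroid, yaw, p[0], p[1]))
    return revealed + w_points * features


def best_viewpoint(centroid, grid, origin, non_map_points):
    """Evaluate the four perpendicular orientations and keep the best one."""
    orientations = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]
    return max(((yaw, score_viewpoint(centroid, yaw, grid, origin, non_map_points))
                for yaw in orientations), key=lambda t: t[1])
```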
Object localization is performed with a series of simple steps. First, the raw PointCloud from the LiDAR or depth camera is transformed into the RGB optical frame. Next, the points (x,y,z) are projected into pixel coordinates (u,v,z) using the camera intrinsics. Points falling within a bounding box produced by YOLOv5_ros are extracted, and points outside the visual FoV range are filtered out. Finally, the centroid of the remaining depth points is computed in the robot's local frame and transformed back into the global frame. The output is published as a PoseArray recording the location of each target object discovered.
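A minimal sketch of the projection and centroid computation is given below, assuming the points have already been transformed into the RGB optical frame. The intrinsics (fx, fy, cx, cy), the bounding-box format, and the range threshold are placeholders, not the exact values used on the robot.

```python
# Sketch: project optical-frame points into pixels, keep those inside a YOLO
# bounding box and within range, and return the 3-D centroid of the object.
import numpy as np


def localize_object(points_optical, bbox, fx, fy, cx, cy, max_range=5.0):
    """points_optical: (N, 3) array of (x, y, z) in the RGB optical frame.
    bbox: (u_min, v_min, u_max, v_max) from the detector.
    Returns the centroid of the in-box points, or None if none survive."""
    x, y, z = points_optical[:, 0], points_optical[:, 1], points_optical[:, 2]
    valid = z > 0  # only points in front of the camera project meaningfully
    pts = points_optical[valid]
    # pinhole projection: (x, y, z) -> (u, v)
    u = fx * x[valid] / z[valid] + cx
    v = fy * y[valid] / z[valid] + cy
    u_min, v_min, u_max, v_max = bbox
    in_box = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    in_range = pts[:, 2] <= max_range  # drop points beyond the visual FoV range
    selected = pts[in_box & in_range]
    if selected.shape[0] == 0:
        return None
    return selected.mean(axis=0)  # object centroid in the optical frame
```

The resulting centroid would then be transformed into the global frame (for example via tf2) and appended to the PoseArray of detections.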