In our experiments, we investigate how real-world conditions affect robot autonomy in heterogeneous environments. We used the EnvoDat dataset for four baseline applications: mapping, localisation, object detection, and classification. We designed three experiments that address the following research questions (RQs):
RQ1 - Is the performance of SOTA SLAM algorithms significantly degraded by environment-specific conditions, e.g., dynamic entities, varying illumination, opaque surfaces, and partial visibility?
RQ2 - To what extent does feature density or sparsity affect robotic autonomy and perception in heterogeneous environments?
RQ3 - How do the heterogeneity of real-world environments and the variability of object and terrain appearances, lighting conditions, non-standard objects, etc., observed in the majority of the EnvoDat scenes, affect object detector models trained on common household, urban, or controlled-environment datasets?
We addressed RQ1 by benchmarking five SLAM algorithms on EnvoDat - two visual SLAM systems (RTAB-Map and ORB-SLAM3), two graph-based LiDAR SLAM systems (HDL Graph SLAM and GLIM), and one filter-based LiDAR SLAM system (FAST-LIO2). We evaluated their performance using the following metrics (a computation sketch follows the list):
Absolute Trajectory Error (ATE): measures the global consistency of the entire trajectory.
Relative Pose Error (RPE): measures the local consistency between poses over fixed-length segments of the trajectory.
Scale Drift (SD): measures the deviation of the algorithm's estimated scale of the environment from the ground-truth scale over time.
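For reference, these metrics are commonly computed following the standard TUM RGB-D benchmark formulation; the exact alignment procedure and segment length used for EnvoDat are not restated here, so the following is a standard sketch rather than the paper's precise definitions. With estimated poses $P_i \in \mathrm{SE}(3)$, ground-truth poses $Q_i$, a rigid alignment transform $S$, and segment length $\Delta$:

\[
\mathrm{ATE}_i = \left\lVert \operatorname{trans}\!\left( Q_i^{-1}\, S\, P_i \right) \right\rVert,
\qquad
\mathrm{RPE}_i = \left\lVert \operatorname{trans}\!\left( \left( Q_i^{-1} Q_{i+\Delta} \right)^{-1} \left( P_i^{-1} P_{i+\Delta} \right) \right) \right\rVert,
\qquad
\mathrm{SD}_i = \frac{\left\lVert \operatorname{trans}\!\left( P_i^{-1} P_{i+\Delta} \right) \right\rVert}{\left\lVert \operatorname{trans}\!\left( Q_i^{-1} Q_{i+\Delta} \right) \right\rVert},
\]

where $\operatorname{trans}(\cdot)$ extracts the translational component, and per-sequence results are typically summarised as the RMSE over all $i$.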
We addressed RQ2 by evaluating the spatial distribution of feature points and correlating it with the per-point ATE and RPE. Figures (a)-(d) below show example correlations between the feature point distributions (clustered, sparse, and evenly distributed) and the per-point trajectory errors.
Correlation between feature point distributions (clustered, sparse, and evenly distributed) and per-point trajectory errors (ppATE and ppRPE). The top section shows feature density in the reconstructed map, with robot trajectories coloured by distance (blue: start, red: end). The bottom section shows the correlation between feature density and trajectory errors, with colour intensity representing error frequency. The figures show a complex, non-linear relationship between the errors and the feature densities, contrary to the common assumption that more features improve perception and SLAM accuracy.
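As an illustration of this analysis step, the following is a minimal sketch of how local feature density could be correlated with per-point trajectory error. All names (traj_xyz, feature_points, pp_error) and the neighbourhood radius are hypothetical, and a rank correlation is chosen because the relationship observed above is non-linear; this is not the paper's exact pipeline.

import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import spearmanr

def local_feature_density(traj_xyz, feature_points, radius=1.0):
    # Count map feature points within `radius` metres of each trajectory pose.
    tree = cKDTree(feature_points)
    return np.array([len(tree.query_ball_point(p, radius)) for p in traj_xyz])

def density_error_correlation(traj_xyz, feature_points, pp_error, radius=1.0):
    # Spearman rank correlation tolerates monotone non-linear relationships,
    # unlike Pearson's linear assumption.
    density = local_feature_density(traj_xyz, feature_points, radius)
    rho, p_value = spearmanr(density, pp_error)
    return rho, p_value

For example, density_error_correlation(traj_xyz, map_points, pp_ate, radius=2.0) returns the rank correlation and its p-value for a 2 m neighbourhood around each pose.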
We addressed RQ3 using the EnvoDat dataset by evaluating three object detector models: YOLOv8, Fast R-CNN, and Detectron2. We fine-tuned these pre-trained models on the annotated RGB images drawn from all the scenes in the EnvoDat dataset. Their performances are summarised below.
Qualitative training and validation results for the YOLOv8 model: class label distributions, labels correlogram, precision-recall curve, training results, and example training/validation batches with ground-truth labels and predictions.
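For concreteness, a fine-tuning run of this kind could look like the minimal sketch below, using the Ultralytics YOLOv8 API. The dataset config name (envodat.yaml), the model variant, and the hyperparameters are illustrative assumptions rather than the settings used in the paper, and the other two detectors would follow their own training pipelines.

from ultralytics import YOLO

# Start from COCO-pretrained weights; 'yolov8n.pt' is an illustrative choice.
model = YOLO("yolov8n.pt")

# 'envodat.yaml' is a hypothetical dataset config listing image paths,
# annotation locations, and class names for the EnvoDat scenes.
model.train(data="envodat.yaml", epochs=100, imgsz=640)

# Evaluate on the validation split; this reports precision, recall, and mAP,
# the quantities visualised in the figures above.
metrics = model.val()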