RL-PGO: Reinforcement Learning-based Planar Pose-Graph Optimization
Nikolaos Kourtzanidis and Sajad Saeedi
Toronto Metropolitan University
In this work, we present, to the best of our knowledge, the first Deep Reinforcement Learning (DRL) based approach to 2D pose-graph optimization (PGO). We demonstrate that the pose-graph optimization problem can be modelled as a partially observable Markov Decision Process. The proposed agent outperforms the state-of-the-art solver g2o on challenging instances where traditional nonlinear least-squares techniques may fail or converge to unsatisfactory solutions. Experimental results indicate that iterative solvers bootstrapped with the proposed approach produce significantly higher-quality estimates. We believe that reinforcement learning-based PGO is a promising avenue to further accelerate research towards globally optimal algorithms. Thus, our work paves the way to new optimization strategies in the 2D/3D pose estimation domain.
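For reference, the objective F(x) reported in the tables below is the standard planar pose-graph least-squares cost. A sketch of the usual formulation follows (notation assumed, not quoted from the paper):

F(\mathbf{x}) = \sum_{(i,j) \in \mathcal{E}} \mathbf{e}_{ij}(\mathbf{x}_i, \mathbf{x}_j, \mathbf{z}_{ij})^{\top} \, \Omega_{ij} \, \mathbf{e}_{ij}(\mathbf{x}_i, \mathbf{x}_j, \mathbf{z}_{ij})

where x_i denotes the i-th SE(2) pose, z_ij the relative-pose measurement between poses i and j, e_ij the corresponding residual, and Ω_ij the measurement information matrix.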
RL-PGO is open-source: [Github Link]
FIGURE 1: (a)-(b) Cumulative reward plots for Training Environments 1-5.
FIGURE 2: Comparison on the standard real-world and synthetic graphs. From left to right: standalone RL best estimate from 10 evaluations (red), LM100 estimate (blue), and LM30 estimate bootstrapped with the RL result as the initial guess (magenta). Standalone RL outperforms LM100 on M3500A. On all datasets except MIT, RL+LM30 produced the highest-quality estimates (lowest objective function value). Interestingly, the proposed standalone RL agent achieved solutions of adequate structure despite never having seen any of these test graphs during training.
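To make the RL+LM30 setting concrete, the sketch below shows one plausible way to bootstrap GTSAM's Levenberg-Marquardt solver with the RL agent's estimate as the initial guess. This is an illustrative sketch, not the paper's released code; the function name, file name, and noise values are assumptions.

import gtsam
import numpy as np

def bootstrap_lm(graph: gtsam.NonlinearFactorGraph,
                 rl_initial: gtsam.Values,
                 max_iterations: int = 30) -> gtsam.Values:
    """Refine the RL agent's pose estimate with a capped number of LM iterations."""
    params = gtsam.LevenbergMarquardtParams()
    params.setMaxIterations(max_iterations)       # e.g. 30 for the RL+LM30 setting
    optimizer = gtsam.LevenbergMarquardtOptimizer(graph, rl_initial, params)
    refined = optimizer.optimize()
    print("objective F(x) after refinement:", graph.error(refined))
    return refined

# Hypothetical usage: load a planar pose graph, anchor the first pose with a
# tight prior, and refine an RL-produced initial guess `rl_initial`
# (assumed to be a gtsam.Values holding Pose2 variables).
graph, _ = gtsam.readG2o("M3500a.g2o", False)     # False -> 2D graph; file name assumed
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-6, 1e-6, 1e-8]))
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(), prior_noise))
# refined = bootstrap_lm(graph, rl_initial, max_iterations=30)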
FIGURE 3: Analysis of the influence of the measurement uncertainty ratio for the instance σ_t = 0.05 m. The RL+GN50 estimate (magenta) shown in (c) required only 8 iterations to converge and successfully recovered the global-minimum solution. As the uncertainty ratio increases, the standalone RL estimate (black) is much more accurate than GN100 (blue).
FIGURE 4: Analysis of the influence of inter-nodal distance spacing for the instance d = 10 m. The RL+GN10 estimate (magenta) shown in (c) required only 8 iterations to converge to a visually meaningful solution. As the sum of total measurement distances increases, the proposed RL agent (black) provides a much more accurate estimate than GN100 (blue).
TABLE I: g2o's Levenberg-Marquardt (LM) solver set to run until convergence on the standard benchmark datasets.
TABLE II: GTSAM's Levenberg-Marquardt (LM) solver set to run until convergence on the standard benchmark datasets. Standalone RL attains lower objective cost values F(x) on all datasets except M3500 and Intel when compared head-to-head with LM run until convergence.
TABLE III: Comparison of performance for non-decaying action spaces with varying action range magnitudes (rad). The objective function values F(x) and required times are averaged over 10 runs. Training runs with action range magnitudes of 0.45 rad and greater did not achieve asymptotic convergence.
TABLE IV: Comparison of performance for decaying action spaces with varying action range magnitudes (rad). A decaying action space begins each episode at the initial action range magnitude, which is then halved at every cycle (see the sketch below). The objective function values F(x) and required times are averaged over 10 runs. None of the training runs achieved asymptotic convergence.
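As an illustration of the decay rule described above, here is a minimal sketch (variable and function names assumed) of an action bound that starts at the initial range magnitude and is halved at every cycle; the agent's orientation correction would then be clipped to this symmetric bound.

def action_range(initial_range: float, cycle: int) -> float:
    """Symmetric orientation-action bound (rad) for a 0-indexed cycle within an episode."""
    return initial_range / (2 ** cycle)

# Example: initial_range = 0.30 rad gives bounds 0.30, 0.15, 0.075, ... rad
# over successive cycles; actions are clipped to [-bound, +bound].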
TABLE V: Computational time required for various numbers of cycles per episode. Here, m denotes the number of measurements (factors).
TABLE VI: Performance comparison of the network architecture with and without the recurrent branch. The recorded F(x) and optimization-time values are averaged over 10 evaluation trials.
If you have any questions, feel free to reach out to us at:
{nkourtza, s.saeedi}@torontomu.ca
@article{Kourtzanidis2023LCSS,
author={Kourtzanidis, Nikolaos and Saeedi, Sajad},
journal={IEEE Control Systems Letters},
title={{RL-PGO}: Reinforcement Learning-Based Planar Pose-Graph Optimization},
year={2023},
volume={7},
number={},
pages={3777-3782},
doi={10.1109/LCSYS.2023.3340619}}