To navigate reliably in the real world, a robot must be able to recognize where it is. Estimating the pose of a mobile agent is termed localization, and it requires an internal representation of the environment (the map).
A standard approach is to combine incoming sensor data with the map to decide whether the robot is in a familiar or a novel place. If the sensor is a camera, the problem is termed Visual Place Recognition (VPR), where the map is typically a topological map and places correspond to images.
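The familiar-versus-novel decision described above can be pictured as a nearest-neighbor search over image descriptors. The following is only a minimal sketch under stated assumptions: each image is already summarized by a fixed-length descriptor vector, similarity is measured with cosine similarity, and the function name `recognize_place` and the threshold value are hypothetical choices made for illustration, not part of our system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognize_place(query, map_descriptors, threshold=0.8):
    """Return (index, score) of the best map image if the place is familiar,
    or (None, score) if no map image is similar enough (novel place).

    `threshold` is an illustrative value, not a tuned parameter.
    """
    scores = [cosine_similarity(query, d) for d in map_descriptors]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]


# Toy map with three places, each represented by one image descriptor.
map_descriptors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

familiar = recognize_place([0.9, 0.1, 0.0], map_descriptors)
novel = recognize_place([1.0, 1.0, 1.0], map_descriptors)
print(familiar, novel)
```

Matching each frame independently like this ignores the temporal order of the query video, which is the gap that sequence-based methods aim to close.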
To perform convincingly, a practical VPR algorithm must be robust against appearance changes in the operating environment. These can occur due to higher-frequency environmental variability such as weather, time of day, and pedestrian density, as well as longer-term changes such as seasons and vegetation growth. A realistic VPR system must also contend with “less cyclical” changes, such as construction and roadworks, updating of signage, facades and billboards, as well as abrupt changes to traffic rules that affect traffic flow. Such appearance changes invariably occur in real life.
Below are image pairs captured at the same places but under different environmental conditions.
Image pairs captured at the same place under different environmental conditions. Courtesy: Freiburg dataset, Bonn dataset, and Google Streetview
To accommodate long-term evolution in appearance, it is vital to continuously accumulate data and update the VPR algorithm. Under continuous dataset growth, the key to consistently accurate VPR is to "assimilate" new data quickly. This requires the VPR algorithm to be scalable. Specifically:
The computational cost of testing should not increase noticeably as the map grows.
Memory usage for training and inference must grow slowly with the map size.
Updating or retraining the VPR algorithm on new data must also be highly efficient.
We are developing a scalable VPR system that can be efficiently retrained and compressed, such that the recognition of new queries can exploit all available data (including recent changes) without suffering from noticeable growth in computational cost and memory consumption. Underpinning our approach is a novel temporal image matching technique based on Hidden Markov Models (HMMs).
The figure below gives an overview of our idea of using an HMM for VPR.
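To make the idea of HMM-based temporal matching more concrete, here is a minimal sketch of standard forward filtering over map places. It is a textbook HMM filter, not our exact formulation: the hidden state is assumed to be the index of a map place, the transition matrix is a hypothetical motion model (stay or advance one place), and the per-frame likelihoods are made-up numbers standing in for image similarity scores.

```python
def predict(belief, transition):
    """Propagate the belief through the motion model (prediction step)."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def correct(predicted, likelihood):
    """Weight the prediction by the observation likelihood and renormalize."""
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnormalized]


def forward_filter(prior, transition, likelihoods):
    """Return the belief over places after processing all query frames."""
    belief = list(prior)
    for likelihood in likelihoods:
        belief = correct(predict(belief, transition), likelihood)
    return belief


# Four places along a route; the vehicle either stays or moves to the next place.
transition = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
prior = [0.25, 0.25, 0.25, 0.25]

# Illustrative likelihoods for three query frames, favoring places 0, 1, then 2.
likelihoods = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

belief = forward_filter(prior, transition, likelihoods)
best_place = max(range(len(belief)), key=belief.__getitem__)
print(best_place, belief)
```

Because the belief carries evidence from earlier frames, a single ambiguous image is less likely to flip the estimate than it would be under frame-by-frame matching.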
The belief provides the information needed to compress the topological map. The figure below compares updating the map with and without map compression. Concretely, once a query video has been localized, it is added to the map, and the map compression scheme is then applied to prevent the adjacency matrix of the topological map from expanding.
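One simple way to picture the effect of compression is a merge rule: a confidently localized query frame is folded into the map node it matched, so it adds no new row or column to the adjacency structure. The sketch below is an assumption made for illustration and is not necessarily the compression scheme used in our papers; the function `update_map`, the `min_confidence` threshold, and the data layout are all hypothetical.

```python
def update_map(nodes, edges, query_frames, min_confidence=0.6):
    """Add a localized query sequence to a topological map, in place.

    nodes: list where nodes[i] is the list of image ids stored at place i.
    edges: set of (i, j) pairs meaning place j follows place i.
    query_frames: list of (image_id, matched_node, confidence) tuples, where
        matched_node is None when the frame could not be localized.
    Confident frames are merged into their matched node (no growth);
    all other frames create a new node.
    """
    previous = None
    for image_id, matched_node, confidence in query_frames:
        if matched_node is not None and confidence >= min_confidence:
            nodes[matched_node].append(image_id)
            current = matched_node
        else:
            nodes.append([image_id])
            current = len(nodes) - 1
        if previous is not None and previous != current:
            edges.add((previous, current))
        previous = current
    return nodes, edges


# Toy map with three places visited in order.
nodes = [["a0"], ["a1"], ["a2"]]
edges = {(0, 1), (1, 2)}

# Two frames match existing places; the third was taken somewhere new.
query = [("q0", 0, 0.9), ("q1", 1, 0.8), ("q2", None, 0.0)]
update_map(nodes, edges, query)
print(len(nodes), sorted(edges))
```

In this toy run only the unmatched frame enlarges the graph, whereas appending every query frame as its own node would have added three.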
We further investigate an HMM with two-tiered memory management, which exploits temporal look-ahead to transfer promising candidate images between passive storage and active memory when needed. The inference process takes into account both the promising images and coarse representations of the topological map.
We show that this allows constant time and space inference provided that the coverage area does not change. The coarse representations can also be updated incrementally to absorb new data. Experimental results on large-scale datasets show excellent scalability of our approach.
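The two-tiered memory idea can be sketched as a small cache that is refreshed from the current belief. This is a simplified illustration under stated assumptions, not the HM4 implementation: passive storage is simulated by a dictionary, look-ahead is a fixed number of hops along the topological map from the most likely places, and the names `look_ahead`, `refresh_active_memory`, `top_k`, and `hops` are hypothetical.

```python
def look_ahead(adjacency, seeds, hops):
    """Return the places reachable from `seeds` within `hops` edges (seeds included)."""
    frontier = set(seeds)
    reached = set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - reached
        reached |= frontier
    return reached


def refresh_active_memory(active, passive, adjacency, belief, top_k=1, hops=1):
    """Keep only promising places in active memory, loading missing ones.

    active, passive: dicts mapping place id to its stored data.
    belief: dict mapping place id to its current probability.
    The active set is modified in place and returned.
    """
    seeds = sorted(belief, key=belief.get, reverse=True)[:top_k]
    wanted = look_ahead(adjacency, seeds, hops)
    for node in list(active):
        if node not in wanted:
            del active[node]  # evict places that are no longer promising
    for node in wanted:
        if node not in active:
            active[node] = passive[node]  # load from passive storage
    return active


# Five places along a route; passive storage holds data for all of them.
adjacency = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
passive = {i: "descriptor-%d" % i for i in range(5)}
active = {0: passive[0], 1: passive[1]}

# The belief has moved on to place 2, so places 2 and 3 become promising.
belief = {0: 0.05, 1: 0.10, 2: 0.70, 3: 0.10, 4: 0.05}
refresh_active_memory(active, passive, adjacency, belief)
print(sorted(active))
```

In this sketch the size of the active set depends on `top_k`, `hops`, and the local connectivity of the map, not on the total number of stored places.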
Visual localization using HMM and Monte-Carlo localization: https://github.com/dadung/Visual-Localization-Filtering
G2D - software that helps researchers collect synthetic datasets from the computer game Grand Theft Auto V: https://sites.google.com/view/g2d-software/home
A.-D. Doan, Y. Latif, T.-J. Chin, and I. D. Reid. "HM4: Hidden Markov Model with Memory Management for Visual Place Recognition", IEEE Robotics and Automation Letters (RA-L 2020) [arXiv]
Y. Latif, A.-D. Doan, T.-J. Chin, and I. D. Reid. "SPRINT: Subgraph Place Recognition for INtelligent Transportation", International Conference on Robotics and Automation (ICRA 2020)
A.-D. Doan, Y. Latif, T.-J. Chin, S. F. Ch'ng, and I. D. Reid. "Visual Localization Under Appearance Change: Filtering Approaches", Neural Computing and Applications (NCAA 2020), Special Issue on Best of DICTA 2019 [arXiv]
A.-D. Doan, Y. Latif, T.-J. Chin, and I. D. Reid. "Scalable Place Recognition Under Appearance Change for Autonomous Driving", International Conference on Computer Vision (ICCV 2019) [arXiv]
A.-D. Doan, Y. Latif, T.-T. Do, S. F. Ch'ng, T.-J. Chin, and I. D. Reid. "Visual Localization Under Appearance Change: A Filtering Approach", International Conference on Digital Image Computing: Techniques and Applications (DICTA 2019) (APRS/IAPR Best paper award)