The objective of this PhD is to investigate the general problem of visually mapping complex 3D environments that evolve dynamically over time. This requires explicitly accounting for the variability of viewing conditions and content within a 3D geometric and photometric model of the environment. A central objective is to investigate how to represent this model efficiently and accurately. The aim is to build models that remain usable over long periods of time for navigation tasks (localization and path following), whether for autonomous systems or for people who require guidance. This research problem, already active in the vision community, has only recently begun to be investigated by the robotics community through the new perspective of incremental SLAM (Simultaneous Localization and Mapping).
Now that the problem of modelling large-scale environments is well understood, the next major technical challenge is maintaining a valid environment representation over a long period of time. This problem is referred to as Life-Long Learning. The topic continues the research carried out in the ANR project COMANOID at CNRS-I3S UNSA.
The existing approach relies on a sensor-based representation that combines color and depth maps within a topological graph of key-frames. This representation synthesizes the information collected in a local area of space by an embedded acquisition system. The global environment is represented by a collection of omni-directional visual key-frames that provide the necessary coverage of an operational area, each sphere being precisely geo-referenced globally. A pose graph linking these spheres together in six degrees of freedom also defines the domain reachable by an autonomous agent, which is potentially exploitable for real-time navigation tasks (a minimal sketch of such a representation follows the list below). As part of this research, it is proposed to develop an approach to a life-long learning map-based representation by considering the following issues:
How to optimally model a local dynamic environment containing non-rigid moving objects.
How to compactly represent the information contained in this representation so as to maintain real-time interaction, learning, robustness, accuracy and stability over time (e.g., via a saliency map).
How to exploit this representation in a navigation task in real time.
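As a concrete illustration, a minimal sketch of such a key-frame pose graph is given below. It assumes Python with NumPy; the class and field names (Keyframe, PoseGraph, add_edge, neighbours) are illustrative choices, not the project's actual data structures.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Keyframe:
    """One omni-directional visual sphere acquired in a local area."""
    kf_id: int
    color: np.ndarray   # spherical color panorama, H x W x 3
    depth: np.ndarray   # co-registered depth map, H x W
    pose: np.ndarray    # 4x4 SE(3) pose, geo-referenced globally

@dataclass
class PoseGraph:
    """Topological graph of key-frames; edges carry 6-DoF relative poses."""
    keyframes: dict = field(default_factory=dict)   # kf_id -> Keyframe
    edges: dict = field(default_factory=dict)       # (i, j) -> 4x4 transform

    def add_edge(self, i, j, T_ij):
        # T_ij is the relative transform from key-frame i to key-frame j
        self.edges[(i, j)] = T_ij

    def neighbours(self, i):
        # key-frames reachable in one hop; the connected component of the
        # graph is the domain exploitable for navigation in real time
        return [b for (a, b) in self.edges if a == i] + \
               [a for (a, b) in self.edges if b == i]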
The work has been implemented, tested and validated on robotic platforms available at I3S Sophia-Antipolis and at partner laboratories.
Fig. 1 Point-to-hyperplane ICP approach. Top: the CPU-based algorithm runs in real time using an ASUS Xtion sensor (x1). Center: the point-to-hyperplane error function; the normal is estimated on the extended measurements (color + depth). Bottom: point cloud and some reference key-frames in the sequence.
This thesis presents a detailed account of novel techniques for pose estimation using both color and depth information from RGB-D sensors. Since pose estimation simultaneously requires an environment map, 3D scene reconstruction is also considered. Localization and mapping have been studied extensively by the robotics and computer vision communities, and they are widely employed in mobile robotics and autonomous systems for tasks such as tracking, dense 3D mapping and robust localization.
The central challenge of pose estimation lies in relating sensor measurements to the position and orientation state. When several sensors providing different information about the same data points are available, the challenge becomes how best to fuse information acquired at different times. To address these problems, a novel registration method named Point-to-hyperplane Iterative Closest Point (ICP) is introduced, analyzed, compared and applied to pose estimation and key-frame mapping. The proposed method jointly minimizes different metric errors as a single n-dimensional measurement vector, without requiring a scaling factor to tune their relative importance during the minimization.
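To make this concrete, consider the common 4-D formulation where each 3-D point is augmented with its intensity, p = (x, y, z, I). A minimal NumPy sketch of the resulting residual is given below; the function names and the choice of SVD-based normal estimation are assumptions for illustration, not necessarily the exact implementation of the thesis.

import numpy as np

def hyperplane_normal(neighbours):
    """Fit a hyperplane to k augmented 4-D points (x, y, z, intensity).

    As in classic PCA normal estimation, but carried out in the
    augmented measurement space: the normal is the direction of
    smallest variance of the local neighbourhood.
    """
    centred = neighbours - neighbours.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[-1]                                # unit 4-D normal

def p2h_residual(T, p_src, p_ref, n_ref):
    """Point-to-hyperplane error for one matched pair of 4-D points.

    Only the geometric part is warped by the 4x4 pose T; the intensity
    channel is assumed invariant to the motion. Geometric and
    photometric errors are therefore minimized jointly through a single
    4-D normal, with no hand-tuned scale factor between the two.
    """
    q = T[:3, :3] @ p_src[:3] + T[:3, 3]         # transform x, y, z only
    p_warped = np.append(q, p_src[3])            # keep intensity I
    return float(n_ref @ (p_warped - p_ref))     # signed hyperplane distance

In a classical weighted formulation the same pair would contribute something like e = e_geometric + lambda * e_photometric, with lambda tuned by hand; here the scale factor disappears because both channels live in the same metric space.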
Within the Point-to-hyperplane framework, two main axes have been investigated. Firstly, the proposed method is employed for visual odometry and 3D mapping. Experiments show that it accurately estimates the pose locally while enlarging the domain of convergence and speeding up the alignment; the invariance is proven mathematically, and results are provided in both simulated and real environments. Secondly, a method is proposed for global localization, enabling place recognition and detection: the Point-to-hyperplane error is embedded within a Branch-and-bound architecture to estimate the pose globally. Since Branch-and-bound strategies obtain rough alignments regardless of the initial position between frames, Point-to-hyperplane ICP can then be used for refinement, and it is demonstrated that the bounds are better constrained when more dimensions are considered (a simplified sketch of the Branch-and-bound search follows below). This approach proves useful for recovering from mistracking and for obtaining globally consistent 3D maps. In the last part of the thesis, both visual SLAM and 3D mapping results are provided to demonstrate the proposed approaches and their performance.
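The following toy sketch illustrates the Branch-and-bound principle on a translation-only search (the thesis addresses the full pose; this simplification, the cube parameterization and the SciPy cKDTree nearest-neighbour oracle are assumptions for illustration). Each cube of candidate translations is scored by a lower bound that can never exceed the true optimum inside the cube, so pruning a cube whose bound exceeds the best cost found so far is safe.

import heapq
import itertools
import numpy as np
from scipy.spatial import cKDTree

def bnb_translation(src, ref, half=4.0, eps=1e-2):
    """Toy branch-and-bound over a cube of candidate translations.

    src, ref: Nx3 and Mx3 point sets. A cube centred at c with
    half-width h is lower-bounded by cost(c) - N * h * sqrt(3), since
    no translation inside it can move any point further than the
    cube half-diagonal away from the centre solution.
    """
    kd = cKDTree(ref)

    def cost(t):
        return float(np.sum(kd.query(src + t)[0]))  # sum of NN distances

    tick = itertools.count()                 # heap tie-breaker
    best_t, best = np.zeros(3), cost(np.zeros(3))
    heap = [(0.0, next(tick), np.zeros(3), half)]
    while heap:
        lb, _, centre, h = heapq.heappop(heap)
        if lb >= best:                       # nothing better can remain
            break
        c_cost = cost(centre)
        if c_cost < best:
            best, best_t = c_cost, centre
        if h < eps:                          # cube small enough: stop splitting
            continue
        for corner in itertools.product((-h / 2, h / 2), repeat=3):
            child = centre + np.array(corner)
            child_lb = max(cost(child) - len(src) * (h / 2) * np.sqrt(3), 0.0)
            heapq.heappush(heap, (child_lb, next(tick), child, h / 2))
    return best_t, best

In the thesis the bounds are computed on the n-dimensional point-to-hyperplane error rather than on a purely geometric one, which, as stated above, constrains them more tightly; the rough alignment returned by the search is then refined with Point-to-hyperplane ICP.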