Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video using Multi-spectral Consistency Loss
Abstract
Most deep-learning-based depth and ego-motion networks have been designed for visible cameras. However, they are difficult to use under low-light conditions such as night scenes, tunnels, and disaster scenarios. A thermal camera, on the other hand, can robustly capture temperature images regardless of the lighting conditions. In this paper, we propose an unsupervised learning method for thermal-image-based depth and ego-motion estimation. The proposed method exploits a multi-spectral consistency loss that consists of temperature and photometric consistency losses. The temperature consistency loss provides a proper self-supervisory signal by preserving temporal consistency between adjacent thermal images. The new photometric consistency loss complements the temperature consistency loss by utilizing a depth map and pose estimated from a heterogeneous coordinate system, without an additional depth network. Networks trained with the proposed method robustly estimate depth and pose from monocular thermal video under low-light and even zero-light conditions. To the best of our knowledge, this is the first work to simultaneously estimate both depth and ego-motion from monocular thermal video in an unsupervised manner.
Methods Overview
Contribution
We propose an unsupervised learning method that exploits temperature and photometric consistency losses for single-view depth and multi-view pose estimation from a monocular thermal video.
We propose an efficient thermal image representation strategy, named clipping-and-colorization, that can provide a sufficient self-supervisory signal for the temperature consistency loss while preserving the temporal consistency.
We propose a new photometric consistency loss that can synthesize a visible image with a depth map and pose estimated from a heterogeneous coordinate system to supply complementary self-supervisory signals.
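The clipping-and-colorization idea in the second contribution can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every frame is clipped to the same fixed temperature range (so that a given temperature maps to the same color in adjacent frames) and then mapped to RGB. The function names, the range values, and the simple blue-green-red ramp are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Frame = List[List[float]]  # raw temperature values, one list per image row
RGB = Tuple[float, float, float]


def clip_and_normalize(temps: Frame, t_min: float, t_max: float) -> Frame:
    """Clip raw temperatures to [t_min, t_max] and rescale them to [0, 1].

    Assumption: the same (t_min, t_max) is reused for every frame, so the
    mapping from temperature to intensity does not change over time.
    """
    if t_max <= t_min:
        raise ValueError("t_max must be greater than t_min")
    span = t_max - t_min
    return [[(min(max(t, t_min), t_max) - t_min) / span for t in row]
            for row in temps]


def colorize(value: float) -> RGB:
    """Map a normalized value in [0, 1] to RGB with a blue-green-red ramp.

    This ramp is an illustrative stand-in for whatever colormap is used.
    """
    if value < 0.5:
        s = value / 0.5
        return (0.0, s, 1.0 - s)   # blue -> green
    s = (value - 0.5) / 0.5
    return (s, 1.0 - s, 0.0)       # green -> red


def clip_and_colorize(temps: Frame, t_min: float, t_max: float) -> List[List[RGB]]:
    """Apply fixed-range clipping, then colorize each pixel."""
    return [[colorize(v) for v in row]
            for row in clip_and_normalize(temps, t_min, t_max)]


if __name__ == "__main__":
    # Hypothetical 1x3 frame in degrees Celsius, clipped to [10, 30].
    frame = [[0.0, 20.0, 50.0]]
    print(clip_and_colorize(frame, t_min=10.0, t_max=30.0))
```

Per-frame min-max normalization would instead rescale each frame by its own extremes, so the same temperature could map to different colors in adjacent frames. Under the assumptions above, a fixed clipping range avoids that.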
Depth Estimation Results on ViViD dataset
The proposed method robustly produces reliable and accurate depth and pose estimates under low-light and even zero-light conditions.
Bian et al. : Visible image input + Photometric Losses.
Ours(T) : Thermal image input + Temperature Consistency Losses.
Ours(MS) : Thermal image input + Multi-spectral Consistency Losses.
Publication
"Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video using Multi-spectral Consistency Loss" [PDF]
Ukcheol Shin, Kyunghyun Lee, Seokju Lee, and In So Kweon
IEEE Robotics and Automation Letters (RA-L) 2021 and ICRA 2022
BibTeX
@article{shin2021self,
title={Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video using Multi-spectral Consistency Loss},
author={Shin, Ukcheol and Lee, Kyunghyun and Lee, Seokju and Kweon, In So},
journal={IEEE Robotics and Automation Letters},
year={2021},
publisher={IEEE}
}