Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video using Multi-spectral Consistency Loss

Ukcheol Shin (KAIST), Kyunghyun Lee (KAIST), Seokju Lee (KAIST), In So Kweon (KAIST)

Abstract

Most deep-learning-based depth and ego-motion networks have been designed for visible-light cameras. However, it is challenging to use them under low-light conditions such as night scenes, tunnels, and disaster scenarios. In contrast, a thermal camera can robustly capture temperature images regardless of the lighting conditions. In this paper, we propose an unsupervised learning method for thermal-image-based depth and ego-motion estimation. The proposed method exploits a multi-spectral consistency loss that consists of temperature and photometric consistency terms. The temperature loss provides a proper self-supervisory signal by preserving temporal consistency between adjacent thermal images. The new photometric consistency loss complements the temperature loss by utilizing a depth map and pose estimated from a heterogeneous coordinate system, without an additional depth network. Networks trained with the proposed method robustly estimate depth and pose from monocular thermal video under low-light and even zero-light conditions. To the best of our knowledge, this is the first work to simultaneously estimate both depth and ego-motion from monocular thermal video in an unsupervised manner.

Methods Overview

Contribution

  • We propose an unsupervised learning method that exploits the temperature and photometric consistency losses for single-view depth and multi-view pose estimation from monocular thermal video (a sketch of the warping-based consistency loss follows this list).

  • We propose an efficient thermal image representation strategy, named clipping-and-colorization, which provides a sufficient self-supervisory signal for the temperature consistency loss while preserving temporal consistency (see the preprocessing sketch below).

  • We propose a new photometric consistency loss that synthesizes a visible image with a depth map and pose estimated from a heterogeneous coordinate system, supplying complementary self-supervisory signals (see the pose-transfer sketch below).
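
For illustration, the clipping-and-colorization step can be sketched as below. This is a minimal NumPy/OpenCV sketch under our own assumptions: the 14-bit raw radiometric input, the fixed clip bounds, and the jet colormap are illustrative choices, not the paper's exact settings.

import cv2
import numpy as np

def clip_and_colorize(raw_thermal, t_min=21000, t_max=24000):
    # Clip the raw radiometric values to a FIXED range shared by the whole
    # sequence (per-frame min-max normalization would break the temporal
    # consistency that the temperature loss relies on).
    t = np.clip(raw_thermal.astype(np.float32), t_min, t_max)
    t = (t - t_min) / (t_max - t_min)          # normalize to [0, 1]
    # Colorization spreads the narrow thermal range over three channels,
    # strengthening the image gradients available as a training signal.
    return cv2.applyColorMap((t * 255).astype(np.uint8), cv2.COLORMAP_JET)

Because the clip bounds are fixed across the sequence, the same temperature always maps to the same color in adjacent frames, which is what makes the warping-based loss below meaningful.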
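
The temperature consistency loss follows the standard self-supervised view-synthesis recipe: a source thermal frame is warped into the target view with the predicted depth and relative pose, then compared against the target frame. Below is a minimal PyTorch sketch; the function name, the plain L1 penalty, and the tensor conventions are our assumptions (the paper's full objective may include additional regularization terms).

import torch
import torch.nn.functional as F

def temperature_consistency_loss(tgt, src, depth, pose, K):
    # tgt, src: (B, C, H, W) clipped-and-colorized thermal frames
    # depth:    (B, 1, H, W) predicted depth of the target frame
    # pose:     (B, 4, 4)    predicted relative pose T_src<-tgt
    # K:        (B, 3, 3)    thermal camera intrinsics
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=depth.device),
                            torch.arange(W, device=depth.device),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()   # (3, H, W)
    pix = pix.view(3, -1).unsqueeze(0).expand(B, -1, -1)          # (B, 3, HW)

    # Back-project target pixels to 3-D and move them to the source frame
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=depth.device)], 1)
    cam = (pose @ cam)[:, :3]

    # Project into the source image; normalize to [-1, 1] for grid_sample
    uv = K @ cam
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)
    gx = 2.0 * uv[:, 0] / (W - 1) - 1.0
    gy = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], -1).view(B, H, W, 2)

    warped = F.grid_sample(src, grid, padding_mode="border",
                           align_corners=True)
    return (warped - tgt).abs().mean()                            # L1 penalty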
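
The cross-spectral photometric loss reuses the depth and pose predicted in the thermal coordinate system to synthesize a visible image, which requires expressing the thermal-frame motion in the RGB camera frame via the calibrated thermal-to-RGB extrinsics. A minimal sketch of that change of coordinates, assuming an extrinsic matrix T_rgb_thermal (our notation, not the paper's):

import torch

def pose_thermal_to_rgb(pose_thermal, T_rgb_thermal):
    # pose_thermal:  (B, 4, 4) relative motion estimated in the thermal frame
    # T_rgb_thermal: (4, 4)    calibrated extrinsics mapping thermal -> RGB
    # Standard change of coordinates: the same rigid motion expressed in
    # the RGB camera's coordinate system.
    return T_rgb_thermal @ pose_thermal @ torch.inverse(T_rgb_thermal)

The transferred pose, together with the reprojected depth map, can then drive the same view-synthesis machinery as above on the visible images, so no second depth network is needed.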

Depth Estimation Results on ViViD dataset

The proposed method produces reliable and accurate depth and pose estimates under low-light and even zero-light conditions.


  • Bian et al.: visible image input + photometric loss.

  • Ours (T): thermal image input + temperature consistency loss.

  • Ours (MS): thermal image input + multi-spectral consistency loss.


Publication

"Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video using Multi-spectral Consistency Loss" [PDF]

Ukcheol Shin, Kyunghyun Lee, Seokju Lee, and In So Kweon

IEEE Robotics and Automation Letters (RA-L) 2021 and ICRA 2022

BibTeX

@article{shin2021self,
  title={Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video using Multi-spectral Consistency Loss},
  author={Shin, Ukcheol and Lee, Kyunghyun and Lee, Seokju and Kweon, In So},
  journal={IEEE Robotics and Automation Letters},
  year={2021},
  publisher={IEEE}
}