RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement
Raman Jha Adithya Lenka Mani Ramanagopal Aswin Sankaranarayanan Kaushik Mitra
IIT Madras IIT Madras CMU CMU IIT Madras
Abstract
In nighttime conditions, high noise levels and bright illumination sources degrade image quality, making low-light image enhancement challenging. Thermal images provide complementary information, offering richer textures and structural details. We propose RT-X Net, a cross-attention network that fuses RGB and thermal images for nighttime image enhancement. We leverage self-attention networks for feature extraction and a cross-attention mechanism for fusion to effectively integrate information from both modalities. To support research in this domain, we introduce the Visible-Thermal Image Enhancement Evaluation (V-TIEE) dataset, comprising 50 co-located visible and thermal images captured under diverse nighttime conditions. Extensive evaluations on the publicly available LLVIP dataset and our V-TIEE dataset demonstrate that RT-X Net outperforms state-of-the-art methods in low-light image enhancement. The code and the V-TIEE dataset are available at this https URL.
The overview of our method:
(a) In RT-X Net, the input RGB and thermal images are fed to the illumination estimator, which produces image features. These features are passed through self-attention and then fused by the cross-attention network to produce the enhanced image.
(b) The workflow of the self-attention network.
(c) The architecture of the cross-attention network that fuses the features of the RGB and thermal images.
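The self-attention and cross-attention steps in the overview can be illustrated with a minimal NumPy sketch. This is not the released RT-X Net code. The token count, feature width, random projection weights, the choice of RGB features as queries with thermal features as keys and values, and the residual addition are all assumptions made for illustration. The illumination estimator is omitted, and random arrays stand in for its output features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query_feats, context_feats, w_q, w_k, w_v):
    # Scaled dot-product attention: queries come from query_feats,
    # keys and values come from context_feats.
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes and random stand-ins for the per-modality features.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
rgb_feats = rng.standard_normal((n_tokens, dim))
thermal_feats = rng.standard_normal((n_tokens, dim))

# One shared set of projections, purely to keep the sketch short (assumption).
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

# Self-attention: each modality attends to its own tokens.
rgb_self, _ = attention(rgb_feats, rgb_feats, w_q, w_k, w_v)
thermal_self, _ = attention(thermal_feats, thermal_feats, w_q, w_k, w_v)

# Cross-attention: RGB tokens query the thermal tokens (assumed direction).
cross_out, cross_weights = attention(rgb_self, thermal_self, w_q, w_k, w_v)

# Residual fusion of the RGB features with the thermal context (assumption).
fused = rgb_self + cross_out
```

Here `fused` has one row per RGB token, and each row of `cross_weights` is a probability distribution over thermal tokens. Self-attention is the special case in which the query and context features are the same array.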
Qualitative results on the synthetic LLVIP dataset and real-world V-TIEE dataset. Columns denote different scenes. The first two rows show the input visible and thermal images. The next five rows are the outputs from RT-X Net and state-of-the-art visible image enhancement algorithms. The last row shows the reference well-exposed image.
Real-world V-TIEE Dataset: Co-located Visible-Thermal Image Pairs for HDR and Low-light Vision Research
Citations:
If you find our code or datasets useful in your research, please cite the following:
@misc{jha2025rtxnetrgbthermalcross,
title={RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement},
author={Raman Jha and Adithya Lenka and Mani Ramanagopal and Aswin Sankaranarayanan and Kaushik Mitra},
year={2025},
eprint={2505.24705},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.24705},
}