Multi-Scale Attention-Guided Non-Local Network for HDR Image Reconstruction
High-dynamic-range (HDR) image reconstruction methods fuse multiple low-dynamic-range (LDR) images captured at different exposure values into a single HDR image. Recent CNN-based methods mostly perform local attention- or alignment-based fusion of the LDR inputs to create HDR content. Relying on a single attention mechanism or on alignment alone often fails to compensate for ghosting artifacts, which arise in the synthesized HDR image when objects or the camera move across the LDR inputs. In this study, we propose a multi-scale attention-guided non-local network, called MSANLnet, for efficient HDR image reconstruction. To mitigate ghosting artifacts, the proposed MSANLnet performs implicit alignment of the LDR image features with multi-scale spatial attention modules and then reconstructs pixel intensity values using long-range dependencies through non-local means-based fusion. These modules adaptively select information that is not corrupted by object motion or unfavorable lighting conditions for pixel fusion. Quantitative evaluations against several current state-of-the-art methods show that the proposed approach achieves higher performance than the existing methods. Moreover, comparative visual results demonstrate the effectiveness of the proposed method in restoring saturated regions of the original inputs and in mitigating ghosting artifacts caused by large object motion. Ablation studies validate the architectural choices and the individual modules for efficient HDR reconstruction.
Paper link: https://www.mdpi.com/1424-8220/22/18/7044
Figure 1. Visual comparison of HDR image restoration results obtained using the proposed method and state-of-the-art methods. From left, Wu [28], AHDR [26], AD [27], NHDRR [30], DAHDR [39], Proposed MSANLnet, and Ground Truth.
We propose MSANLnet, a multi-scale attention-guided non-local network for HDR image reconstruction that selects important information from the LDR features using multi-scale spatial attention and adaptively fuses the contextual features to obtain HDR images (a sketch of the attention module follows this list).
We show that multi-scale spatial attention, combined with non-local means-based fusion, effectively alleviates ghosting artifacts and produces visually pleasing HDR images.
Our proposed method outperforms the existing methods in both qualitative and quantitative measures, validating the efficacy of the attention modules, non-local means-based fusion, and architectural choices.
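To make the attention mechanism concrete, below is a minimal PyTorch-style sketch of a multi-scale spatial attention module. It is an illustration under assumptions rather than the authors' exact implementation: the class name, channel sizes, the use of dilated convolutions to emulate multiple scales, and the conditioning of non-reference features on the reference features are all assumptions made for the example.

```python
# Hypothetical sketch of a multi-scale spatial attention module.
# Names and layer sizes are assumptions, not the authors' exact design.
import torch
import torch.nn as nn


class MultiScaleSpatialAttention(nn.Module):
    """Predicts a per-pixel attention map for a non-reference LDR feature map,
    conditioned on the reference feature map, using convolutions with several
    receptive-field scales (emulated here with different dilation rates)."""

    def __init__(self, channels: int = 64, scales=(1, 2, 4)):
        super().__init__()
        # One branch per scale; each sees the concatenated (non-ref, ref) features.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=d, dilation=d)
            for d in scales
        ])
        self.fuse = nn.Conv2d(channels * len(scales), channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, feat_nonref: torch.Tensor, feat_ref: torch.Tensor) -> torch.Tensor:
        x = torch.cat([feat_nonref, feat_ref], dim=1)
        multi = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        attn = torch.sigmoid(self.fuse(multi))   # attention map in (0, 1)
        return feat_nonref * attn                # suppress misaligned or saturated regions


if __name__ == "__main__":
    # Usage with dummy feature maps.
    f_nonref = torch.randn(1, 64, 128, 128)
    f_ref = torch.randn(1, 64, 128, 128)
    out = MultiScaleSpatialAttention(64)(f_nonref, f_ref)
    print(out.shape)  # torch.Size([1, 64, 128, 128])
```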
Figure 2. Overview of the proposed MSANLnet for HDR image reconstruction.
As shown in Figure 2, the proposed MSANLnet consists of three main parts: first, a feature extraction stage that encodes each LDR input, captured at a different exposure, into a feature map; second, multi-scale spatial attention modules that implicitly align the LDR features and adaptively select information that is not corrupted by object motion or unfavorable lighting conditions; finally, a non-local means-based fusion and reconstruction stage that exploits long-range dependencies across the attended features to produce the final HDR image.
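For illustration, the following is a minimal sketch of a non-local fusion block that aggregates long-range dependencies over the attended features. The layer names, the channel reduction ratio, and the residual layout are assumptions for this sketch and do not necessarily match the authors' implementation.

```python
# Minimal sketch of a non-local (self-attention style) fusion block that
# relates every spatial position to every other position in the feature map.
# Channel sizes and the embedding layout are assumptions for illustration.
import torch
import torch.nn as nn


class NonLocalFusion(nn.Module):
    def __init__(self, channels: int = 64, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)  # query embedding
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)    # key embedding
        self.g = nn.Conv2d(channels, inter, kernel_size=1)      # value embedding
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.phi(x).flatten(2)                      # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)        # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)             # affinities between all position pairs
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection keeps local detail
```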
Figure 8. Visual comparison of HDR images restored from the dataset provided in [28]. From left, Wu [28], AHDR [26], NHDRR [30], AD [27], DAHDR [39], Proposed MSANLnet, and Ground Truth.
Figure 7. Qualitative comparison of HDR images produced by the proposed method (MSANLnet) and state-of-the-art models. From left, Wu [28], AHDR [26], AD [27], NHDRR [30], DAHDR [39], Proposed MSANLnet, and Ground Truth.