Image inpainting

Global and Local Attention-Based Free-Form Image Inpainting

Deep-learning-based image inpainting methods have shown significant promise for both rectangular and irregular holes. However, inpainting irregular holes presents numerous challenges owing to uncertainty in their shapes and locations. When relying solely on convolutional neural network (CNN) or adversarial supervision, plausible inpainting results cannot be guaranteed, because irregular holes require attention-based guidance to retrieve information for content generation. In this paper, we propose two new attention mechanisms, namely a mask-pruning-based global attention (MPGA) module and a global and local attention (GLA) module, to obtain global dependency information and local similarity information among features for refined results. The proposed method is compared against state-of-the-art methods, and the experimental results show that it outperforms existing methods in both quantitative and qualitative measures.

Paper link: https://www.mdpi.com/1424-8220/20/11/3204

Figure 1. Example of free-form image inpainting. From left: corrupted image with a free-form mask, inpainted image produced by our proposed method, and the corresponding ground truth. Owing to the combination of global and local attention, our model generates visually plausible inpainting results.

Figure 2. Overall architecture of the proposed model. The proposed model has two stages, namely a coarse network and a refinement network. Both networks have two branches, namely a regular branch and an attention branch. The coarse network calculates the global dependencies among the features and prunes out mask features using the proposed mask-pruning-based global attention (MPGA) module, generating a coarse inpainted image. The refinement network takes the coarse output as input, calculates both global and local similarities using the proposed global and local attention (GLA) module, and produces a refined inpainting result.
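The core idea behind mask pruning is that hole positions carry no reliable content, so they should contribute as keys to no one, while still receiving aggregated information from valid positions. The following is a minimal NumPy sketch of that idea; it is not the authors' implementation, and the function name, shapes, and the plain dot-product similarity are illustrative assumptions.

```python
import numpy as np

def mask_pruned_global_attention(features, mask):
    """Hypothetical sketch of mask-pruning-based global attention.

    features: (N, C) array of N spatial feature vectors (flattened H*W).
    mask:     (N,) boolean array, True where a position lies inside the hole.

    Every position (including hole positions) attends over the valid
    positions only; attention scores toward hole keys are pruned out
    before the softmax.
    """
    # Global similarity: dot product between every pair of feature vectors.
    scores = features @ features.T                 # (N, N)
    # Prune hole columns: masked positions must not serve as keys.
    scores[:, mask] = -np.inf
    # Numerically stable softmax over the remaining valid key positions.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Aggregate valid features into every position, filling hole positions.
    return weights @ features                      # (N, C)
```

With all but one position masked, every output row collapses to the single valid feature vector, which makes the pruning behavior easy to verify by hand.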

Figure 3. Visual results on the ImageNet dataset [22]. From left: input image, inpainted image, and ground truth.

Figure 4. Visual results on the CelebA-HQ dataset [23]. From left: input image, inpainted image, and ground truth.