Our Architecture
Fig. 5 The architecture of our proposed method. We generate a raindrop mask with a pretrained Attentive GAN model, then feed the masked rain image into an inpainting model that fills in the missing regions with derained scenery.
Our Idea + Foundation on Prior Work
To recover the underlying scene from a raindrop-degraded image, we approach the problem in two stages. The first stage removes raindrops by detecting their locations and generating masks. This allows our model to first identify the important regions of the image before focusing on them to discover and reconstruct the scene patterns that the raindrops distort. The second stage inpaints the masked-out regions with generative models. Image inpainting takes an image with missing components and generates the missing information from contextual components and features; we use inpainting models to produce clean images from inputs with raindrops. There is no prior work on removing raindrops by inpainting except [10]; however, [10] is based solely on physical modeling of raindrops, while our method is learning based.
For the mask generation task, we tried several methods, described in the "Raindrop Mask Generation" section. We decided to base our method on previously successful raindrop-removal methods that use attention to obtain an attentive map before applying a Generative Adversarial Network (GAN) [1]. In our implementation, we use the attentive portions to create a mask of the raindrops in the image.
For image inpainting, we base our method on EdgeConnect, which performs Canny edge detection to extract a filtered edge image that serves as an additional feature input to the GAN [5]. Unlike EdgeConnect, which aims to inpaint the original input images (rain images in our case), we propose an architecture that lets the model learn to complete the scene without raindrops. The details of the EdgeConnect architecture and our modifications are described in the "Inpainting" section.
Fig. 6 Sample results from EdgeConnect. Masked-out images are passed into the network, where a Canny edge detector detects the non-masked edges (black). The remaining edges are generated (blue), and the edges are passed as an additional feature alongside the masked-out images into the GAN to generate the final reconstructed image. [5]
Intuition behind Our Idea
Attention is used to identify the portions of the image that are most important for our model to focus on. In the case of rain, we are more concerned with reconstructing the regions of the image that the rain distorts than with reproducing portions the rain does not cover at all. Reconstructing these distorted regions also involves learning relevant information from nearby regions, a task for which attention is especially useful.
GANs are used to identify patterns and regularities and to generate new examples based on them, which is very useful for reconstructing the image after removing the raindrops. The raindrops create similar distortions in the regions to be masked out, and determining how the removed regions can be filled in and reconstructed from the rest of the image is a task that GANs are well suited to.
Dataset
Fig. 7 The dataset we use is the same one used by our baseline [1]. The images are outdoor scenes with various focal features and general scenery. The inputs to our network are the scenes with rain (top), while the ground truths are the same scenes without rain (bottom).
Raindrop Mask Generation
We divide our raindrop-removal process into two parts: first we remove the raindrops, then we inpaint the removed areas. In this section, we focus on the methodology used to mask out raindrops; by "mask out," we mean removing sections of the image.
Edge Detection and Opening/Closing
Our first method for masking out raindrops performs edge detection with a simple kernel, followed by an opening operation and then a closing operation. We then find the region of the photo with the highest concentration of "small" regions (which we propose are raindrops) and identify the objects in that region as raindrops.
We hypothesize that edge detection will mostly identify foreground objects as well as raindrops. Because these images consist of raindrops on the lens in front of a relatively blurry scene, the raindrops are the main in-focus objects. As a result, we believe that applying a simple edge detection kernel will mostly pick up edges surrounding raindrops, along with some foreground objects.
After edge detection, we apply an opening operation. This operation removes noise by discarding small, isolated edges and keeping dense edge clusters that likely belong to a single object. The intuition is that the resulting raindrop masks are cleaned up to include only the main droplets.
Finally, we apply a closing operation to clean up patchy raindrop masks so that they contain no small holes or black points, yielding droplets that are fully masked out. We expect raindrops to appear as white circles clustered in the same region, identify regions with high concentrations of these "circles" as raindrop regions, and then use the droplets in those regions as our final mask.
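Below is a minimal OpenCV sketch of this pipeline, assuming a grayscale input. The edge kernel, structuring-element sizes, thresholds, and droplet-area cutoff are illustrative choices rather than our tuned values, and the step of locating the densest droplet region is simplified to keeping all droplet-sized blobs.

```python
import cv2
import numpy as np

def morphological_raindrop_mask(gray: np.ndarray) -> np.ndarray:
    """Sketch of the edge detection + opening/closing mask described above.
    All kernel sizes and thresholds are illustrative, not tuned values."""
    # Simple edge detection with a Laplacian-style kernel.
    kernel = np.array([[0, 1, 0],
                       [1, -4, 1],
                       [0, 1, 0]], dtype=np.float32)
    edges = cv2.filter2D(gray.astype(np.float32), -1, kernel)
    edges = (np.abs(edges) > 20).astype(np.uint8) * 255

    # Opening removes small, isolated edge fragments (noise).
    se_open = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    opened = cv2.morphologyEx(edges, cv2.MORPH_OPEN, se_open)

    # Closing fills small holes so each droplet becomes a solid blob.
    se_close = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, se_close)

    # Keep only "small" connected components, i.e. droplet-sized blobs.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(closed)
    mask = np.zeros_like(closed)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < 500:   # droplet-sized area threshold
            mask[labels == i] = 255
    return mask
```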
Strong Edge Detection
Our next method involves noise reduction with a Gaussian filter, gradient intensity computation, non-maximum suppression, and thresholding. Before applying this composition of functions, we first apply a Sobel filter (a horizontal and vertical edge detector), which may seem surprising given the round shape of a raindrop. We chose to do this to take advantage of the horizontal and vertical edge components of a raindrop; our previous method does not focus on these axes at all, so we were curious what effect this would have on isolating raindrops. Although a raindrop may appear fully circular, it has clear vertical and horizontal components that we believe can be detected easily.
The remaining functions are often bundled together in computer-vision tasks as a "Canny" operation, whose purpose is to identify "strong" edges. We hypothesize that mostly raindrops will have these "strong" edges, since the backgrounds of these images are relatively blurry while the raindrops are sharp.
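The OpenCV sketch below strings these steps together: a Sobel prefilter to emphasize the horizontal and vertical droplet edges, followed by Gaussian smoothing and the bundled Canny operation. The Gaussian kernel size and Canny thresholds are illustrative assumptions, not the exact values from our experiments.

```python
import cv2
import numpy as np

def strong_edge_raindrop_mask(gray: np.ndarray) -> np.ndarray:
    """Sketch of the Sobel + Canny-style 'strong edge' mask; thresholds are
    illustrative rather than the values used in our experiments."""
    # Emphasize horizontal and vertical edge components of the droplets.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad = cv2.convertScaleAbs(cv2.magnitude(gx, gy))

    # Gaussian smoothing + Canny bundles noise reduction, gradient intensity,
    # non-maximum suppression, and hysteresis thresholding.
    blurred = cv2.GaussianBlur(grad, (5, 5), sigmaX=1.5)
    strong_edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

    # Close the strong edges into solid droplet blobs for the final mask.
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    return cv2.morphologyEx(strong_edges, cv2.MORPH_CLOSE, se)
```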
Generator
Our final method involves the more complex approach of using a generator trained to output the rain masks. This generator is the first module of a multi-module model that is trained to remove raindrops using ground-truth images. The model comes from our baseline attentive GAN, which we have previously discussed in detail [1]. Overall, this method uses a specific module of our baseline model and extracts the raindrop regions identified by the final attention map of its LSTM.
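How the attention map is pulled out depends on the specific Attentive GAN implementation, but conceptually the conversion to a binary mask is a simple threshold. The sketch below assumes the final attention map has shape [B, 1, H, W] with values roughly in [0, 1]; the threshold value and the `attentive_gan.attention_module` handle in the comment are hypothetical.

```python
import torch

def attention_to_mask(attention_map: torch.Tensor, threshold: float = 0.25) -> torch.Tensor:
    """Threshold the final attention map of the attentive-recurrent module
    (shape [B, 1, H, W], values roughly in [0, 1]) into a binary raindrop mask.
    The 0.25 threshold is an illustrative choice, not a tuned value."""
    return (attention_map > threshold).float()

# Hypothetical usage with a pretrained attentive GAN handle:
#   attn_maps = attentive_gan.attention_module(rain_image)  # one map per time step
#   mask = attention_to_mask(attn_maps[-1])                 # use the final map
```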
Prior Work
Little prior work has focused on the isolated task of identifying a raindrop mask. Rather, previous work has addressed the whole process of raindrop removal; those models have incorporated generators and attention within a single pipeline.
Inpainting
Fig. 8 EdgeConnect original model structure [5].
Fig. 9 Our inpainting model structure
EdgeConnect inpainting architecture
The EdgeConnect model [5] takes the binary mask and the complete image as input and learns to complete the missing mask region. As shown in Fig. 8, it is composed of two GAN models: an edge GAN and an inpainting GAN. Both generators are encoder-decoder residual convolutional networks, where the encoder downsamples the image features and the decoder upsamples them back to the original image size. The residual blocks use dilated convolutions, and all layers inside the generators apply spectral normalization and instance normalization. Both discriminators use the PatchGAN [9] architecture, with convolutions followed by spectral normalization and LeakyReLU activations.

In detail, the edge model takes the masked grayscale image and the binary mask as input and predicts an edge map that hallucinates edges in the masked area; the ground-truth edges are drawn with the Canny edge detector. Its loss is composed of the adversarial loss (discriminator loss + generator loss) and an additional feature-matching loss, which encourages the generator to produce images whose discriminator features are similar to those of the real image. The inpainting model then completes the missing area based on the given edge map.
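To make the building blocks concrete, the PyTorch sketch below shows one residual block in the style described (dilated convolutions with spectral and instance normalization) and one PatchGAN-style discriminator layer. Channel counts, dilation rates, and kernel sizes are illustrative; refer to the EdgeConnect paper and code for the exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class DilatedResidualBlock(nn.Module):
    """Sketch of a generator residual block in the style described above:
    dilated convolutions with spectral and instance normalization."""
    def __init__(self, channels: int = 256, dilation: int = 2):
        super().__init__()
        self.block = nn.Sequential(
            spectral_norm(nn.Conv2d(channels, channels, kernel_size=3,
                                    padding=dilation, dilation=dilation)),
            nn.InstanceNorm2d(channels, track_running_stats=False),
            nn.ReLU(inplace=True),
            spectral_norm(nn.Conv2d(channels, channels, kernel_size=3, padding=1)),
            nn.InstanceNorm2d(channels, track_running_stats=False),
        )

    def forward(self, x):
        # Residual connection around the dilated convolution stack.
        return x + self.block(x)

def patchgan_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    """One PatchGAN discriminator layer: strided conv + spectral norm + LeakyReLU."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)),
        nn.LeakyReLU(0.2, inplace=True),
    )
```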
Our inpainting architecture
Our model is mainly based on the EdgeConnect model. However, EdgeConnect aims to complete the missing mask region with content consistent with the original input image, whereas we want our model to take the rain image as input and learn to complete the raindrop mask area with derained scenery. As shown in Fig. 9, instead of taking the original rain image as the real image in the discriminator, we take the ground-truth derained image as the real image, in the hope that both models learn to predict edges and image patches that resemble the derained image.
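The PyTorch sketch below illustrates this key change: the discriminator's "real" example is the ground-truth derained image rather than the rainy input. The loss shown is a standard non-saturating GAN objective; EdgeConnect's feature-matching term and any reconstruction losses are omitted for brevity.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adversarial_losses(discriminator: nn.Module,
                       completed: torch.Tensor,
                       clean_gt: torch.Tensor):
    """Sketch of the adversarial terms with the ground-truth derained image
    as the discriminator's real sample (feature-matching loss omitted)."""
    # Discriminator: real = clean ground truth, fake = inpainted output.
    real_logits = discriminator(clean_gt)
    fake_logits = discriminator(completed.detach())
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))

    # Generator: fool the discriminator into calling the completion "real",
    # i.e. indistinguishable from derained scenery.
    g_logits = discriminator(completed)
    g_loss = bce(g_logits, torch.ones_like(g_logits))
    return d_loss, g_loss
```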
Training Details
We first train the edge model until it converges (around 20 epochs), then train the inpainting model until it converges (around 30 epochs) using the previously trained edge model. Finally, we fine-tune both models by removing the discriminator in the edge model and training the two models together in an end-to-end manner. The four trained models (edge generator and discriminator, inpainting generator and discriminator) are saved in the Google Drive link.
Due to limited computing resources and time, we resize our input images to a fixed size of 256 × 256. Both models use Adam optimizers; the generator learning rate starts at 0.0001 and is later lowered to 0.00001, while the discriminator learning rate is 0.00001.
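As a sketch of this configuration, the snippet below builds the Adam optimizers for one generator/discriminator pair (the same setup applies to both the edge and inpainting GANs); the learning-rate drop is applied manually later in training.

```python
import torch

def build_optimizers(generator, discriminator,
                     gen_lr: float = 1e-4, disc_lr: float = 1e-5):
    # Generator starts at 1e-4 and is dropped to 1e-5 later in training;
    # the discriminator stays at 1e-5 throughout.
    g_opt = torch.optim.Adam(generator.parameters(), lr=gen_lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=disc_lr)
    return g_opt, d_opt

def lower_generator_lr(g_opt, new_lr: float = 1e-5):
    # Manual learning-rate drop once the generator starts to converge.
    for group in g_opt.param_groups:
        group["lr"] = new_lr
```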