Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising

Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang

1. Abstract:

Most classical denoising methods restore a clean image by aggregating pixels from the noisy input. Instead of relying on hand-crafted aggregation schemes, we propose to explicitly learn this process with deep neural networks. Specifically, we propose a spatial pixel aggregation network (PAN) and learn the pixel sampling and averaging strategies for image denoising. The proposed model naturally adapts to image structures and can effectively improve the denoised results. Furthermore, we develop a spatio-temporal pixel aggregation network (ST-PAN) for video denoising to more effectively sample pixels across the spatio-temporal space. Our method is able to handle the misalignment issues caused by large motion in dynamic scenes.
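To make the idea of learned pixel aggregation concrete, below is a minimal PyTorch sketch of a spatial aggregation layer: a small network predicts, for every output pixel, a set of sampling offsets and softmax-normalized weights, then gathers the sampled pixels and averages them. The module name, layer sizes, and the number of samples are illustrative assumptions and do not reproduce the exact PAN/ST-PAN architecture from the paper.

```python
# Minimal sketch of learned spatial pixel aggregation (assumed toy architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySpatialAggregation(nn.Module):
    def __init__(self, n_samples=9):
        super().__init__()
        self.n_samples = n_samples
        # Predict, for every output pixel, K sampling offsets (x, y) and K weights.
        self.predictor = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * n_samples, 3, padding=1),
        )

    def forward(self, noisy):                           # noisy: (B, 1, H, W)
        B, _, H, W = noisy.shape
        K = self.n_samples
        pred = self.predictor(noisy)                    # (B, 3K, H, W)
        offsets = pred[:, :2 * K].view(B, K, 2, H, W)   # per-sample (x, y) offsets in pixels
        weights = F.softmax(pred[:, 2 * K:].view(B, K, H, W), dim=1)

        # Base sampling grid in normalized [-1, 1] coordinates, (x, y) order.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=noisy.device),
            torch.linspace(-1, 1, W, device=noisy.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)            # (H, W, 2)
        scale = torch.tensor([(W - 1) / 2, (H - 1) / 2], device=noisy.device)

        out = torch.zeros_like(noisy)
        for k in range(K):
            off = offsets[:, k].permute(0, 2, 3, 1) / scale   # pixels -> normalized units
            grid = base.unsqueeze(0) + off                    # (B, H, W, 2)
            sampled = F.grid_sample(noisy, grid, mode="bilinear",
                                    padding_mode="border", align_corners=True)
            out = out + weights[:, k:k + 1] * sampled         # weighted average of samples
        return out
```

Because the weights are softmax-normalized, the output is a convex combination of sampled input pixels, mirroring classical weighted-averaging denoisers; a spatio-temporal variant in the spirit of ST-PAN would additionally predict a temporal offset and sample from neighboring frames.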

2. Overview of the algorithm:

3. Example results of video denoising:

4. Example results of color input:

The proposed ST-PAN produces clearer results with fewer artifacts.

5. Visualization of the denoising process:

Example 1: please find the detailed explanation in our paper.

Example 2

6. More examples of single image denoising:

Example 1

Example 2

Example 3

Example 4

7. More examples of video denoising:

Example 1: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 2: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 3: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 4: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 5: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 6: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 7: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 8: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 9: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.

Example 10: From top to bottom: (a) whole input image and our output, (b) ground truth, (c) input patches from the reference frame, (d) results of NLM, (e) results of VBM4D, (f) results of KPN, and (g) results of the proposed method.