Image/Video Denoising and Sharpening by Self Similarity and Sparsity
digital image sensor (microlenses, CFA, ADC);
CFA (color filter array): Bayer pattern, demosaicing;
CCD sensor: correlated double sampling (CDS) to remove reset noise;
stores the electron charge and shifts it out of the photo sensor, where it is converted to a voltage signal.
Fig. 1, CCD Sensor and CCD Camera
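The effect of correlated double sampling can be illustrated with a minimal sketch. It assumes an idealized pixel in which the reset (kT/C) offset is frozen after reset and appears identically in both samples; the function names and the noise level are hypothetical, not taken from any sensor datasheet.

```python
import random


def read_pixel_cds(signal_electrons, rng, reset_noise_sigma=30.0):
    """Read one pixel with correlated double sampling (idealized model).

    The reset offset is random per readout but identical in both samples
    of the same readout, so taking the difference cancels it.
    """
    reset_offset = rng.gauss(0.0, reset_noise_sigma)  # unknown, frozen after reset
    reset_sample = reset_offset                       # sample 1: reset level only
    signal_sample = reset_offset + signal_electrons   # sample 2: reset level + charge
    return signal_sample - reset_sample


def read_pixel_single(signal_electrons, rng, reset_noise_sigma=30.0):
    """Single-sample readout for comparison: the reset offset stays in the output."""
    return signal_electrons + rng.gauss(0.0, reset_noise_sigma)
```

With `rng = random.Random(0)`, repeated calls to `read_pixel_cds(1000.0, rng)` stay at the signal level, while `read_pixel_single` scatters around it by the reset noise.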
photon and dark shot noise;
amplifier noise;
readout (reset, kT/C thermal and 1/f flicker) noise;
quantization noise.
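The noise sources listed above can be combined into a toy pixel model. This is only a sketch under stated assumptions: shot noise is Poisson, amplifier and readout noise are lumped into one Gaussian term in electrons, and the ADC is an ideal rounding quantizer. All parameter values (`dark_electrons`, `read_sigma`, `gain`, `bits`) are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson sample with Knuth's multiplication method (for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_pixel(photo_electrons, rng, dark_electrons=2.0,
                   read_sigma=3.0, gain=0.25, bits=8):
    """Return the digital code of one pixel under the toy noise model."""
    # Photon and dark shot noise: Poisson in the number of collected electrons.
    electrons = poisson(photo_electrons + dark_electrons, rng)
    # Amplifier and readout noise: lumped into one additive Gaussian term.
    analog = gain * (electrons + rng.gauss(0.0, read_sigma))
    # Quantization noise: rounding to the nearest ADC code, then clipping.
    code = round(analog)
    return max(0, min(2 ** bits - 1, code))
```

Because the shot noise is Poisson, its variance equals its mean, so the noise level depends on the signal. This is the reason signal-dependent noise models are used for raw sensor data.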
10 times more fixed-pattern noise than CCD;
10 to 100 times faster than CCD;
all the logic and control circuits can be built on the same silicon die (small in size);
power consumption is 1/2 to 1/4 less than CCD;
converts the electrons generated by the photodiode into a voltage signal.
Fig. 2, CMOS Sensor and CMOS Camera
digital (flickering, salt-and-pepper, band/block/ringing, mosquito, bleeding);
analog (film grain, Gaussian/Poisson);
Fig. 3, Gaussian noise, Artifacts
based on various assumptions about the internal structure of the content;
smooth the content while preserving its details;
blocking-based [Bosco'03, Amer'02];
smoothing-based [Immerkær'96, Rank'99];
blocking+filtering-based [Shin'05, Foi'08];
wavelet-based [Stefano04];
piecewise linear model [Russo06];
segmentation-based piecewise image model [Liu06];
Kurtosis-based DCT bandpass filtering [Zoran & Weiss'09].
Fig. 4, Noise Properties [Liu06]
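As one concrete instance of the smoothing-based estimators cited above, the following sketch follows the fast estimator usually attributed to [Immerkær'96]: the image is filtered with a difference of two Laplacian masks, which suppresses smooth image structure, and the mean absolute response is scaled to a standard deviation. It assumes additive zero-mean Gaussian noise and a grayscale image stored as a list of rows; `make_noisy_flat_image` is a hypothetical helper added only for experimentation.

```python
import math
import random

# Difference of two 3x3 Laplacian masks; rows and columns sum to zero,
# so constant regions and linear ramps give no response.
NOISE_MASK = ((1, -2, 1),
              (-2, 4, -2),
              (1, -2, 1))


def estimate_noise_sigma(image):
    """Estimate the noise standard deviation of a grayscale image (at least 3x3)."""
    height, width = len(image), len(image[0])
    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            response = sum(NOISE_MASK[j][i] * image[y + j - 1][x + i - 1]
                           for j in range(3) for i in range(3))
            total += abs(response)
    # The mask has L2 norm 6; sqrt(pi/2) converts mean |.| to sigma for Gaussians.
    return math.sqrt(math.pi / 2.0) * total / (6.0 * (width - 2) * (height - 2))


def make_noisy_flat_image(size, sigma, seed=0, level=100.0):
    """Hypothetical helper: a flat image plus Gaussian noise of known sigma."""
    rng = random.Random(seed)
    return [[level + rng.gauss(0.0, sigma) for _ in range(size)] for _ in range(size)]
```

On textured images this estimator tends to overestimate, because fine texture also passes the mask. That is one motivation for the blocking-based and segmentation-based variants in the list.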
mean/median/Gaussian filtering
bilateral filtering [Tomasi98]
mean shift [Comaniciu&Meer02]
anisotropic diffusion [Perona&Malik90, Black98]
wavelet-based shrinkage in transform domain (thresholding/coring)
adaptive sigma filtering
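Wavelet shrinkage from the list above can be sketched in a few lines. The example assumes a single level of the orthonormal Haar transform on a 1-D signal of even length and soft thresholding of the detail coefficients; real denoisers use several levels, 2-D transforms, and a threshold tied to the noise level, none of which is shown here.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform (length must be even)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def soft_threshold(value, threshold):
    """Shrink towards zero; coefficients with |value| <= threshold become 0."""
    magnitude = max(abs(value) - threshold, 0.0)
    return math.copysign(magnitude, value)


def haar_shrink(signal, threshold):
    """Denoise by shrinking only the detail (high-pass) coefficients."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)
```

Small detail coefficients are assumed to be mostly noise and are removed, while large ones (edges) survive with a slightly reduced magnitude.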
NLM [Buades05]: Non-Local Means (self similarity for averaging);
K-SVD [Elad&Aharon06]: Optimize the sparse coding in a trained dictionary;
BM3D [Dabov06]: Block Matching 3-D (self similarity in image domain and sparsity in transform domain);
K-LLD [Chatterjee&Milanfar09]: Clustering and Learning Local Dictionaries with kernel regression;
LSSC [Mairal08]: combine grouping and dictionary learning to recursively find the optimized ones;
CSR [Dong11]: combine global sparsity (dictionary learning) and local sparsity (structural clustering).
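The self-similarity idea behind NLM can be shown on a 1-D signal. This is a minimal sketch, not the implementation of [Buades05]: patches are compared with a plain mean squared difference, borders are handled by clamping indices, and the parameters `patch_radius`, `search_radius`, and `h` are illustrative values.

```python
import math


def nlm_denoise_1d(signal, patch_radius=1, search_radius=5, h=10.0):
    """Non-local means on a 1-D signal.

    Each sample becomes a weighted average of the samples in its search
    window; the weight depends on how similar the surrounding patches are,
    not on how close the samples are.
    """
    n = len(signal)

    def patch(center):
        return [signal[min(max(center + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    output = []
    for i in range(n):
        reference = patch(i)
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            candidate = patch(j)
            dist2 = sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)
            weight = math.exp(-dist2 / (h * h))
            weight_sum += weight
            value_sum += weight * signal[j]
        output.append(value_sum / weight_sum)  # weight_sum >= 1 because j == i is included
    return output
```

BM3D and the dictionary-based methods in the list keep the same grouping step but replace the plain weighted average with shrinkage in a transform domain or a learned dictionary.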
Fig. 5, Gray Image Denoising [Dabov2007a]
Fig. 6, Color Image Denoising [Dabov2007b]
Fig. 7, Video Denoising [Dabov2007c]
SSIM (structural similarity) learned from training data (original-filtered pair) [Zhou'04];
Saliency map (importance of patches) [Itti'97];
JND (just noticeable distortion from visual masking) [Zhang'08];
Artifact index (blockiness, ringing, or color bleeding metric) [Lu'03].
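For reference, the commonly published SSIM formula can be evaluated on a single window as below. This sketch assumes the usual stabilizing constants (0.01 and 0.03 times the dynamic range) and computes one global score over two equally sized pixel lists; it does not cover the training-based use mentioned in the list, and practical implementations average the score over local windows.

```python
def ssim_global(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized lists of pixel values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs score 1, and the score drops when the structure (covariance) of the filtered patch no longer follows the original.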
Three step search (TSS) [Koga'81];
New three step search (NTSS) [Li'94];
Four step search (4SS) [Po'96];
Diamond search (DS) [Zhu'00];
3DRS [de Haan'93];
MVFAST [Ma & Hosur'99];
PMVFAST [Tourapis, Au & Liou'01];
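The first entry in the list, three step search, can be sketched as follows. The code assumes frames stored as lists of rows, a sum-of-absolute-differences (SAD) cost, and an initial step of 4 (search range of about ±7 pixels); the function names are illustrative. Like all fast searches it can stop in a local minimum of the cost surface, which is the weakness the later methods in the list try to reduce.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the current block at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def three_step_search(cur, ref, bx, by, size=8, first_step=4):
    """Return ((dx, dy), cost) found by a three step search around zero motion."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size)
    step = first_step
    while step >= 1:
        center = best
        # Test the 8 neighbours of the current best at distance `step`.
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue
                dx, dy = center[0] + ox, center[1] + oy
                inside = (0 <= bx + dx and bx + dx + size <= width and
                          0 <= by + dy and by + dy + size <= height)
                if not inside:
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
        step //= 2  # 4 -> 2 -> 1
    return best, best_cost
```

With the defaults, at most 1 + 3 × 8 = 25 candidates are evaluated, compared with 225 for an exhaustive ±7 search.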
Fig. 8, Predictive Search in Video BM3D [Dabov2007c]
Slide the reference block with a step of 3 to 5 pixels in the horizontal and vertical directions;
Cap the size of a patch group at an upper bound of 16 or 32;
Search for matches in a local neighborhood of 15x15 or 33x33;
Integral image in fast block matching (not usable for predictive search);
Predictive search (especially efficient for video data);
Use a separable 2D + 1D transform in place of a full 3D transform;
Simplify transform coefficients, e.g. a sparse integral transform;
Precompute 2D spectra before grouping (may not be suitable for video data).
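The integral-image trick from the list above can be sketched as follows. The assumption here is a sum-of-squared-differences (SSD) block distance: for one fixed displacement, the per-pixel squared differences are accumulated into a summed-area table once, after which the SSD of any block at that displacement costs four lookups. Function names are illustrative.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column.

    sat[y][x] holds the sum of image[0:y][0:x].
    """
    height, width = len(image), len(image[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def block_sum(sat, x, y, size):
    """Sum over the size x size block with top-left corner (x, y), in O(1)."""
    return (sat[y + size][x + size] - sat[y][x + size]
            - sat[y + size][x] + sat[y][x])


def squared_difference_table(image, dx, dy):
    """Integral image of (I(x, y) - I(x + dx, y + dy))^2; pixels without a partner count as 0."""
    height, width = len(image), len(image[0])
    diff2 = [[(image[y][x] - image[y + dy][x + dx]) ** 2
              if 0 <= y + dy < height and 0 <= x + dx < width else 0
              for x in range(width)] for y in range(height)]
    return integral_image(diff2)
```

The table is tied to one displacement, so it pays off when every displacement in the window is visited for every block. A predictive search visits only a few data-dependent displacements per block, which is why the two techniques do not combine.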
Fig. 9, Video Denoising Results (Wall)
Fig. 10, Video Denoising Results (Leopard)
BM4D [Maggioni12]: motion trajectory-based volume grouping;
Shape-adaptive by PCA [Dabov'09];
Locally adaptive regression kernel (LARK) [Takeda, Farsiu, Milanfar'07].
Fig. 11, BM4D: volume extraction by motion trajectory [Maggioni12]
Fig. 12, BM3D-Shape Adaptive PCA [Dabov'09]
Fig. 13, Locally Adaptive Regression Kernel [Takeda, Farsiu, Milanfar'07]
Pipeline: splatting-blurring-slicing based on importance sampling;
Bilateral grid [Chen, Siggraph’07]: a new data structure that enables fast edge-aware image processing;
x and y correspond to pixel position;
z corresponds to pixel intensity;
Euclidean distance accounts for edges;
Grid can be coarsely sampled;
Applying the bilateral grid takes three steps: grid creation, processing on the bilateral grid, and 2D map extraction by slicing.
Fig. 14, Bilateral Grid
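The three steps (creation, processing, slicing) can be sketched for a 1-D signal, where the grid has one position axis and one intensity axis. This is a simplified illustration, not the method of the bilateral grid paper: samples are splatted to the nearest cell, the blur is a fixed [1, 2, 1] / 4 kernel, and slicing reads the nearest cell instead of interpolating. The sampling rates `sigma_s` and `sigma_r` are hypothetical defaults.

```python
def blur_121(grid):
    """Blur a 2-D grid with the [1, 2, 1] / 4 kernel along both axes (zero outside)."""
    rows, cols = len(grid), len(grid[0])

    def at(g, r, c):
        return g[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0

    first = [[(at(grid, r - 1, c) + 2 * at(grid, r, c) + at(grid, r + 1, c)) / 4
              for c in range(cols)] for r in range(rows)]
    return [[(at(first, r, c - 1) + 2 * at(first, r, c) + at(first, r, c + 1)) / 4
             for c in range(cols)] for r in range(rows)]


def bilateral_grid_filter_1d(signal, sigma_s=4.0, sigma_r=32.0):
    """Edge-aware smoothing of a 1-D signal with a coarse (position, intensity) grid."""
    n = len(signal)
    lo = min(signal)
    grid_x = int((n - 1) / sigma_s) + 3            # one padding cell on each side
    grid_z = int((max(signal) - lo) / sigma_r) + 3

    def cell(i):
        return (int(round(i / sigma_s)) + 1,
                int(round((signal[i] - lo) / sigma_r)) + 1)

    # 1. Grid creation: accumulate (value, weight) pairs per cell.
    data = [[0.0] * grid_z for _ in range(grid_x)]
    weight = [[0.0] * grid_z for _ in range(grid_x)]
    for i in range(n):
        cx, cz = cell(i)
        data[cx][cz] += signal[i]
        weight[cx][cz] += 1.0

    # 2. Processing: blur values and weights with the same kernel.
    data, weight = blur_121(data), blur_121(weight)

    # 3. Slicing: read the grid back at each sample's own (position, intensity).
    output = []
    for i in range(n):
        cx, cz = cell(i)
        output.append(data[cx][cz] / weight[cx][cz])
    return output
```

Samples on opposite sides of a strong edge land in distant intensity cells, so the blur does not mix them; this is how distance in the grid accounts for edges.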
Gaussian-kd-tree [Adams, Siggraph’09]: a Monte-Carlo kd-tree sampling algorithm that efficiently computes nonlinear filters:
First, assign each value to be filtered a position in some vector space;
Then, replace every value with a weighted linear combination of all values, with weights determined by a Gaussian function of distance between the positions;
Examples: Gaussian smoothing, bilateral filter, non-local means and BM3D;
Building the Gaussian KD tree: sparsely represents the high-dimensional space, in which the data occupy only a 2D manifold;
Querying the Gaussian KD tree: uses stratified weighted importance sampling to implement the embedding, blurring, and sampling of the space; its complexity is independent of the filter size and linear in the dimensionality;
Fig. 15, Gaussian KD tree compared with Bilateral grid: scattering pixel values onto nearby samples (splatting), gathering at each sample from nearby samples (blurring), and then gathering at each pixel from nearby samples to construct the output (slicing).
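The two-step formulation above (assign positions, then take Gaussian-weighted combinations) can be written directly as a brute-force reference. This sketch is quadratic in the number of values and exists only to show what the kd-tree and the grid approximate; the helper `bilateral_positions` and the choice of dividing coordinates by `sigma_s` and `sigma_r` are assumptions made for illustration.

```python
import math


def gaussian_filter_bruteforce(positions, values, sigma=1.0):
    """Replace each value by a Gaussian-weighted average of all values.

    positions: one tuple per value, all of the same length.
    """
    output = []
    for p in positions:
        weight_sum = 0.0
        value_sum = 0.0
        for q, v in zip(positions, values):
            dist2 = sum((a - b) ** 2 for a, b in zip(p, q))
            w = math.exp(-dist2 / (2.0 * sigma * sigma))
            weight_sum += w
            value_sum += w * v
        output.append(value_sum / weight_sum)
    return output


def bilateral_positions(signal, sigma_s, sigma_r):
    """Positions that turn the generic filter into a 1-D bilateral filter."""
    return [(i / sigma_s, v / sigma_r) for i, v in enumerate(signal)]
```

Using positions of the form `(i / sigma_s,)` gives plain Gaussian smoothing, adding the intensity coordinate gives the bilateral filter, and using a whole patch as the position gives non-local means.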
Permutohedral Lattice [Adams, Eurographics’10]:
It tessellates the high-dimensional space with uniform simplices (permutohedra): lattice points have integer coordinates with a consistent remainder modulo d + 1, where d is the dimensionality;
Makes splatting and slicing fast (simplex computation);
Less sparse than Gaussian kd-tree;
Fig. 16, Permutohedral Lattice
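The lattice-point condition can be checked with a short function. The sketch assumes the usual construction in which a d-dimensional lattice is embedded in the hyperplane of (d + 1)-dimensional vectors that sum to zero, and the modulus for the consistent remainder is d + 1; the function name is illustrative.

```python
def is_lattice_point(coords):
    """True if coords (length d + 1) is a point of the permutohedral lattice.

    Condition checked: integer coordinates that sum to zero and that all
    leave the same remainder modulo d + 1.
    """
    modulus = len(coords)  # d + 1
    if any(c != int(c) for c in coords):
        return False
    if sum(coords) != 0:
        return False
    remainders = {int(c) % modulus for c in coords}
    return len(remainders) == 1
```

For d = 2, `(1, 1, -2)` qualifies because every coordinate has remainder 1 modulo 3, while `(1, 0, -1)` sums to zero but mixes remainders and is therefore not a lattice point.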
Fig. 17, Comparison of Permutohedral Lattice with Gaussian KD-tree and Bilateral Grid
A single chip or embedded system (SoC, either ASIC or FPGA)
Issues to be considered:
interface
memory
interconnect
bandwidth
algorithm O(.)
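A back-of-the-envelope estimate shows why memory and bandwidth appear in the list. The sketch assumes uncompressed 8-bit YUV 4:2:0 frames (1.5 bytes per pixel) kept in external memory and counts only whole-frame reads and writes; the function names and the example numbers are hypothetical and ignore caching, burst efficiency, and search-window reuse.

```python
def frame_bytes(width, height, bytes_per_pixel=1.5):
    """Size of one frame; 1.5 bytes per pixel corresponds to 8-bit YUV 4:2:0."""
    return int(width * height * bytes_per_pixel)


def denoiser_bandwidth(width, height, fps, frames_read=1, frames_written=1,
                       bytes_per_pixel=1.5):
    """Sustained external-memory traffic in bytes per second."""
    per_frame = frame_bytes(width, height, bytes_per_pixel)
    return per_frame * fps * (frames_read + frames_written)
```

For example, a temporal denoiser that reads 5 frames and writes 1 frame per output frame at 1920x1080 and 30 frames per second would be evaluated as `denoiser_bandwidth(1920, 1080, 30, frames_read=5, frames_written=1)`.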
Fig. 18, SoC of Samsung: Video Processing in Exynos
Top: best quality, the most complex;
Middle: medium quality, reduced complexity;
Bottom: reasonable quality, the simplest.