Estimation and Denoising
Nonlinear Approximation Based Image Recovery Using Iterated Denoising
Examples are grayscale to show that the work does not take advantage of color (with color, life is easier).
Missing regions in images are estimated from surrounding regions.
There are no image-specific operations; the same method works on video, audio, time series, etc. The technique knows nothing about images, periodic regions, edges, textures, etc. No conditional statistics, conditional probability distributions, etc., are estimated.
All prior knowledge about the estimation problem is condensed into the basis (weights in neural-net language).
Designed to minimize mean-squared error, but the fidelity criterion can be changed.
Each processing block generates a better approximation of the unknown pixels, as depicted through the frames of the video examples. W_2 involves convolutional computations followed by an averaging (linear) projection/reduction.
Can be considered a convolutional neural network with pre-designed weights.
Through Tweedie's formula and recently popular diffusion results, this work can be seen as forward looking early work in this area albeit with a poor-person's CNN with pre-designed weights (DCT-based cycle spinning).
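The flavor of the method can be conveyed in a minimal numpy sketch (assuming SciPy's `dctn`/`idctn`; the initial fill, the threshold schedule, and the full-shift cycle spinning are illustrative simplifications, not the papers' exact algorithm): hard-threshold the DCT coefficients of every shifted patch, average the overcomplete reconstructions, re-impose the known pixels, and repeat with a decaying threshold.

```python
import numpy as np
from scipy.fft import dctn, idctn

def iterated_denoising_inpaint(img, known, n_iters=30, t0=60.0, b=8):
    """Recover missing pixels by iterating: hard-threshold the DCT of
    every shifted b x b patch (cycle spinning), average the overcomplete
    reconstructions, then re-impose the known pixels.  The threshold
    decays so that progressively finer detail is admitted."""
    x = img.copy()
    x[~known] = img[known].mean()                  # crude initial fill
    for it in range(n_iters):
        thr = t0 * (1.0 - it / n_iters)            # decreasing threshold
        acc = np.zeros_like(x)
        cnt = np.zeros_like(x)
        for dy in range(b):                        # cycle spinning over all shifts
            for dx in range(b):
                for y in range(dy, x.shape[0] - b + 1, b):
                    for xx in range(dx, x.shape[1] - b + 1, b):
                        c = dctn(x[y:y + b, xx:xx + b], norm='ortho')
                        c[np.abs(c) < thr] = 0.0   # hard thresholding
                        acc[y:y + b, xx:xx + b] += idctn(c, norm='ortho')
                        cnt[y:y + b, xx:xx + b] += 1.0
        x = np.where(cnt > 0, acc / np.maximum(cnt, 1.0), x)
        x[known] = img[known]                      # data consistency
    return x
```

Note how nothing in the loop is image-specific: the basis (here a DCT) carries all the prior knowledge.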
Why pre-designed weights?
Because experts have already designed weights having very good properties (including speed of evaluation) for images.
In signal processing, tools are very important. One often asks: can I design a general tool to handle all sorts of different images/video/signals? For example, it is desirable to have one set of weights for as large a class of images as possible. A lot of heavy math/machinery goes into those designs, and some very talented individuals work hard to enable them. Why not take advantage of their work?
Can optimize the weights using a training set when desired.
Today the deep-net community is busy generating new results and applications. There will come a time when we will start thinking about reducing weights and layers, and asking questions about the possibility of having a single set of weights/networks for the largest class of problems. Those questions will lead us back to approximation theory and, many of us feel, to concepts in sparsity.
IEEE Signal Processing Society Best Paper Award, 2007.
The earliest prototype is circa 2001. The work uses sparsity language: substitute "convolutional" for "translation-invariant", ...
Source code to play with is available on the software page.
Image with 8x8 missing blocks (5.94dB) and reconstructed image (28.61dB).
Sparsity Induced Prediction
Imagine a very distorted version of a picture (noise, clutter, structured interference, ...). Can the distorted picture be used to meaningfully predict the original? Yes!
Sparsity induced prediction is a magical eraser: It erases irrelevant distortions with negligible side-information.
Focus changes, noise, lighting changes, missing regions, blended irrelevant images, ...? No problem.
Useful in an image/video compression context.
Useful when estimating optical flow or registration parameters between related but otherwise heavily corrupted images/video.
Suppose we predict y using x, obtaining the prediction \hat{y}:
A very interesting use of sparsity. State-of-the-art predictions.
Images and video are sparse with respect to various linear decompositions. All one has to do is predict the significant decomposition coefficients. The reason we can see two images in a blend is that different images have significance at different coefficients. That's what the work exploits.
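This intuition can be demonstrated with a toy numpy sketch (a hypothetical, much-simplified stand-in for the actual SIP algorithm, which works with overcomplete decompositions): a smooth "original" and a high-frequency interference pattern occupy different DCT coefficients, so keeping the blend's coefficients only where the original is significant erases the interference. In practice the significance mask is the small side information sent to the predictor.

```python
import numpy as np
from scipy.fft import dctn, idctn

def sip_predict(x, significant):
    """Keep the distorted picture's transform coefficients only where the
    original is significant (the mask is the negligible side information);
    zero the rest and invert."""
    c = dctn(x, norm='ortho')
    c[~significant] = 0.0
    return idctn(c, norm='ortho')

n = 16
y = 100.0 * np.outer(np.linspace(0.0, 1.0, n), np.ones(n))  # smooth 'original'
ii, jj = np.indices((n, n))
x = y + 20.0 * (-1.0) ** (ii + jj)            # blend with hi-freq interference
mask = np.abs(dctn(y, norm='ortho')) > 1e-6   # y's significance map (side info)
y_hat = sip_predict(x, mask)                  # interference is erased
```

The ramp's energy sits in one column of coefficients while the checkerboard's sits elsewhere, so the prediction recovers the original essentially exactly.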
Please see video/paper for an intuitive explanation of how it works.
Data flow is similar to that of a convnet autoencoder.
Source code to play with is available on the software page.
Denoising with Overcomplete Representations
Removing noise from signals/images/video is a long-standing use case. Many of us are also interested in denoising because it is a good way to gauge one's understanding of signals. Suppose you know really well what an image is. You get a noisy image and then you enforce your notion of what an image is. There is your denoising algorithm.
The better the image model, the better the denoiser. As models are primarily mathematical, denoising gets huge interest from communities outside engineering, for example, from applied math and similar communities. Denoising is in effect figuring out the essence of a signal/image/video and removing the rest.
An atomic decomposition represents the essence of a signal in terms of a vector of coefficients. The coefficients multiply a matrix of basis vectors to construct the essence. It is very similar to a generic autoencoder network. A translation-invariant or convolutional decomposition is hence similar to a convnet-based autoencoder.
A lot of the essence of images is contained in edges and other abrupt transitions. Hence, how well a technique models images can be gauged by how well it denoises around edges. In the images shown, the third column implements a simple layer of coefficient processing over the second column that is equivalent to a nonlinear weighted reconstruction. Observe how the regions around edges are much better denoised.
Teapot: Third column implements the nonlinear weighted reconstructions and performs the best.
Cameraman: Third column implements the nonlinear weighted reconstructions and performs the best.
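A numpy sketch of the flavor of weighted overcomplete denoising (assuming SciPy's `dctn`/`idctn`; the reciprocal-count weight is a crude illustrative surrogate, not the paper's inverse-estimated-error weights): hard-threshold every overlapping patch and fuse the overcomplete estimates, trusting sparser patches more.

```python
import numpy as np
from scipy.fft import dctn, idctn

def weighted_overcomplete_denoise(noisy, thr, b=8):
    """Hard-threshold the DCT of every overlapping b x b patch and fuse
    the overcomplete estimates.  Each patch is weighted by the
    reciprocal of its retained-coefficient count (sparser patches are
    trusted more) -- a simplified stand-in for inverse-estimated-error
    weights."""
    H, W = noisy.shape
    num = np.zeros_like(noisy)
    den = np.zeros_like(noisy)
    for dy in range(b):
        for dx in range(b):
            for y in range(dy, H - b + 1, b):
                for x in range(dx, W - b + 1, b):
                    c = dctn(noisy[y:y + b, x:x + b], norm='ortho')
                    keep = np.abs(c) >= thr
                    keep[0, 0] = True              # always keep DC
                    w = 1.0 / keep.sum()           # sparser -> larger weight
                    c[~keep] = 0.0
                    num[y:y + b, x:x + b] += w * idctn(c, norm='ortho')
                    den[y:y + b, x:x + b] += w
    return np.where(den > 0, num / np.maximum(den, 1e-12), noisy)
```

The weighting is the "simple layer of coefficient processing" idea: patches that straddle edges retain many coefficients and get down-weighted relative to cleaner neighbors.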
Source code to play with is available on the software page.
The product-grade implementation uses fixed-point calculations, skipped/decimated convolutional data processing, and other fast-computation tricks.
Super-resolution and Denoising with Overcomplete Warped Transforms
One can set up image/video super-resolution as a missing-data recovery problem. Then, similar to the iterated denoising algorithm above, one can obtain reconstructions through a sequence of denoise, data consistency, denoise, ...
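A minimal sketch of the denoise / data-consistency alternation for 2x super-resolution (assuming SciPy's `dctn`/`idctn` and an illustrative block-mean downsampling model; the real system uses warped transforms and adaptive thresholds):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_shift_denoise(x, thr, b=8, shifts=((0, 0), (4, 4))):
    """Average hard-thresholded DCT reconstructions over a few patch shifts."""
    acc = np.zeros_like(x)
    cnt = np.zeros_like(x)
    for dy, dx in shifts:
        for y in range(dy, x.shape[0] - b + 1, b):
            for xx in range(dx, x.shape[1] - b + 1, b):
                c = dctn(x[y:y + b, xx:xx + b], norm='ortho')
                c[np.abs(c) < thr] = 0.0
                acc[y:y + b, xx:xx + b] += idctn(c, norm='ortho')
                cnt[y:y + b, xx:xx + b] += 1.0
    return np.where(cnt > 0, acc / np.maximum(cnt, 1.0), x)

def super_resolve(lowres, scale=2, n_iters=15, thr0=40.0):
    """Treat the high-res image as (mostly) missing data: start from a
    pixel-replicated upsample, then alternate denoising with a
    data-consistency projection that forces every scale x scale block
    to average back to the observed low-res pixel."""
    x = np.kron(lowres, np.ones((scale, scale)))
    for it in range(n_iters):
        x = dct_shift_denoise(x, thr0 * (1.0 - it / n_iters))
        for i in range(lowres.shape[0]):           # data-consistency projection
            for j in range(lowres.shape[1]):
                blk = x[i * scale:(i + 1) * scale, j * scale:(j + 1) * scale]
                blk += lowres[i, j] - blk.mean()   # in-place view update
    return x
```

The observed low-res pixels play the role of the known pixels in the inpainting setting; the denoiser supplies the model of what the missing high-res detail should look like.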
On natural images/video theoretical results indicate that the modeling transforms used in denoising should be directional. But directional transforms are sophisticated and computationally complex. Can one generate sophisticated results using simpler tools?
We have designed warped-support transforms for this purpose. They are block/patch transforms but with the patch support warped to generate directional representations:
Poor-person's 4x4 directional transforms. A separable transform kernel (DCT) is used for each. The figure shows how the rows of samples, say the row (a, b, c, d), are tilted to correspond to different directions in 2D. Except for sample orderings, complexity remains separable and (for the case of the DCT) fast.
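The warping can be sketched in a few lines of numpy (hypothetical function names; offsets chosen for a 45-degree direction): gather each row of the patch at a per-row horizontal offset, then apply the ordinary separable DCT. On a 45-degree step edge, the warped support needs far fewer significant coefficients than the axis-aligned one.

```python
import numpy as np
from scipy.fft import dctn

def warped_dct(img, y, x, offsets, b=4):
    """Gather a b x b patch whose row r starts at column x + offsets[r]
    (a patch support tilted to follow a direction), then apply the usual
    separable DCT.  Only the sample ordering changes; the transform
    kernel itself stays separable and fast."""
    patch = np.empty((b, b))
    for r in range(b):
        patch[r] = img[y + r, x + offsets[r]: x + offsets[r] + b]
    return dctn(patch, norm='ortho')

# Toy 45-degree step edge: img[i, j] = 100 where j >= i, else 0.
img = np.where(np.add.outer(-np.arange(16), np.arange(16)) >= 0, 100.0, 0.0)
c_axis = dctn(img[8:12, 6:10], norm='ortho')       # axis-aligned support
c_warp = warped_dct(img, 8, 6, offsets=[0, 1, 2, 3])  # support tilted 45 degrees
```

Inside the warped support the edge becomes vertical, so the separable DCT compacts the patch into a handful of coefficients, whereas the diagonal edge in the axis-aligned patch spreads energy across most of them.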
Our real-time system for sophisticated super-resolution on compressed video uses warped transforms and other computational tricks to enable high quality visuals even on older smartphones.
The nice thing about overcomplete/redundant representations is that one does not have to worry about creating seams along similar but different warped-supports as long as one handles the redundancy appropriately.
Casual observation: In machine learning we now have a generation of deep-net techniques that have little concern for computational complexity. In fact "real-time" means "can be done real-time on a $3000 GPU" :) Needless to say all that will change once we are asked to render similar results on 4K video at 60 fps running on mobile :)
Patents:
O. G. Guleryuz, “Image Recovery Using Thresholding and Direct Linear Solvers,” issued, August 2007. Assigned to Seiko-Epson Corporation. Patent no: 7,260,269.
O. G. Guleryuz, “Iterated Denoising for Image Recovery,” issued, October 2006. Assigned to Seiko-Epson Corporation. Patent no: 7,120,308.
O. G. Guleryuz and G. Hua, “Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression,” issued, November 2011. Assigned to NTT DoCoMo, Inc. Patent no: 8,059,902.
O. G. Guleryuz, “Weighted Overcomplete De-noising,” issued, April 2008. Assigned to Seiko-Epson Corporation. Patent no: 7,352,909.
S. Kanumuri, O. G. Guleryuz, M. R. Civanlar, and A. Fujibayashi, “Methods for Fast and Memory Efficient Implementation of Transforms,” issued, September 2014. Assigned to NTT DoCoMo, Inc. Patent no: 8,837,579.
S. Kanumuri, O. G. Guleryuz, M. R. Civanlar, C. S. Boon, and A. Fujibayashi, “Noise and/or Flicker Reduction in Video Sequences Using Spatial and Temporal Processing,” issued, May 2014. Assigned to NTT DoCoMo, Inc. Patent no: 8,731,062.
S. Kanumuri, O. G. Guleryuz, and M. R. Civanlar, “Image/Video Quality Enhancement and Super-Resolution Using Sparse Transformations,” issued, June 2014. Assigned to NTT DoCoMo, Inc. Patent no: 8,743,963.
Papers:
O. G. Guleryuz, “Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part I - Theory,” IEEE Transactions on Image Processing, vol. 15, No. 3, pp. 539-554, March, 2006, {pdf}.
O. G. Guleryuz, “Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part II - Adaptive Algorithms,” IEEE Transactions on Image Processing, vol. 15, No. 3, pp. 555-571, March, 2006, {pdf}.
O. G. Guleryuz, “Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2003), Barcelona, Spain, Sept. 2003, {pdf}.
O. G. Guleryuz, “Iterated Denoising for Image Recovery,” Proc. Data Compression Conference, IEEE DCC-02, pp. 3-12, April 2002, {pdf}.
G. Hua and O. G. Guleryuz, “Spatial Sparsity-Induced Prediction (SIP) for Images and Video: A Simple Way to Reject Structured Interference,” IEEE Transactions on Image Processing, vol. 20, No. 4, pp. 889-909, April 2011, {pdf}.
O. Harmanci, G. Hua, and O. G. Guleryuz, “Predictive compression and denoising with overcomplete decompositions: a simple way to reject structured noise,” Proc. SPIE Conf. on Wavelets XII, in Image and Signal Processing, San Diego, Aug. 2007 (invited paper), {pdf}.
G. Hua and O. G. Guleryuz, “Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression,” IEEE Data Compression Conference, IEEE DCC-07, March 2007, {pdf}.
O. G. Guleryuz, “Weighted Averaging for Denoising with Overcomplete Dictionaries,” IEEE Transactions on Image Processing, vol. 16, No. 12, pp. 3020-3034, December 2007, {pdf}.
S. Kanumuri, O. G. Guleryuz, M. R. Civanlar, A. Fujibayashi, and C. S. Boon, “Temporal Flicker Reduction and Denoising in Video using Sparse Directional Transforms,” Proc. SPIE Conf. on Applications of Digital Image Processing XXXI, in Image and Signal Processing, San Diego, Aug. 2008 (invited paper), {pdf}.
S. Kanumuri, O. G. Guleryuz, and M. R. Civanlar, “Fast Super-Resolution Reconstructions of Mobile Video Using Warped Transforms and Adaptive Thresholding,” Proc. SPIE Conf. on Applications of Digital Image Processing XXX, in Image and Signal Processing, San Diego, Aug. 2007 (invited paper), {pdf}.