Learnt block, lapped, or multi-resolution transforms.
Principled, nonlinear-approximation-based designs: best k-term approximations of signals. Theoretically reduces to a KLT on a Gaussian process (the KLT is optimal for a Gaussian signal; please see the paper). If your signal is not guaranteed to be Gaussian, you can move to SOTs without worrying about any lost optimality relative to the Gaussian case.
Decisively outperforms designs like DCTs, KLTs (PCA-based transforms), etc., on general, non-Gaussian signals.
Useful as a data analysis/modeling/denoising/compression tool.
Constrained to be orthonormal by design. This facilitates easy quantization and easy R-D optimization, avoiding the integer search problems that plague other designs (please see the paper).
Significant improvements over adaptive use of trigonometric transforms (EMT, etc., in JEM). Significant improvements over basic trigonometric transforms (DCT/DST) in HEVC and AVC.
Example directional block SOT basis functions. The blocks shown in the image are classified and transformed with SOTs of different classes.
A typical application of SOTs involves clustering data into N classes (using the SOT classification algorithm; please see the paper) and designing a SOT for each class. The encoder picks the best class, sends the class information, and then sends the associated SOT's coefficients (see the sketch below).
SOTs learnt on natural images end up having directional structure consistent with theoretical models of natural images.
Block (8x8) SOTs at different orientations.
Source code to play with is available on the software page.
For fast approximations, check out LGTs and RCTs.
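Below is a minimal Python/NumPy sketch of the two ideas above: designing an orthonormal transform for best k-term approximation of a class of blocks, and encoding a block by picking the best class and sending its index plus that class's SOT coefficients. The function names (design_sot, classify_and_encode) and the particular alternating update are illustrative assumptions, not the paper's reference implementation; please see the paper for the actual design and classification algorithms.

```python
# Illustrative sketch only; names and the alternating update are assumptions.
import numpy as np

def keep_k_largest(coeffs, k):
    """Hard-threshold: keep the k largest-magnitude coefficients per column."""
    out = np.zeros_like(coeffs)
    idx = np.argsort(-np.abs(coeffs), axis=0)[:k, :]
    cols = np.arange(coeffs.shape[1])
    out[idx, cols] = coeffs[idx, cols]
    return out

def design_sot(X, k, n_iters=50):
    """Alternate between k-term coefficients and an orthonormal transform update
    (orthogonal Procrustes via SVD). X holds vectorized training blocks as columns."""
    n = X.shape[0]
    G = np.linalg.qr(np.random.randn(n, n))[0]   # random orthonormal start
    for _ in range(n_iters):
        C = keep_k_largest(G.T @ X, k)           # best k-term coefficients for fixed G
        U, _, Vt = np.linalg.svd(X @ C.T)        # orthonormal G minimizing ||X - G C||_F
        G = U @ Vt
    return G

def classify_and_encode(x, sots, k):
    """Pick the class whose SOT gives the smallest k-term approximation error,
    then return the class index and that SOT's sparse coefficients."""
    best = None
    for cls, G in enumerate(sots):
        c = keep_k_largest((G.T @ x)[:, None], k)[:, 0]
        err = np.sum((x - G @ c) ** 2)
        if best is None or err < best[0]:
            best = (err, cls, c)
    _, cls, c = best
    return cls, c                                # class info + coefficients to be coded
```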
A tool that combines the advantages of transform coding and DPCM while avoiding their disadvantages.
Noncausal encoding of prediction errors at the encoder that enables causal, DPCM-like decoding at the decoder.
Implemented to extend HEVC INTRA/INTER modes (2-3% gains independently in each scenario, higher when combined).
Video decoder complexity increase is marginal. Video encoder complexity increase is acceptable.
Consider the video decoder during INTRA prediction. It predicts a block, transform-decodes the residuals, and adds the residuals to the prediction. Let's momentarily stop after the decoder adds the first residual value to the prediction of the first sample. The decoder now knows what the first sample in the block is (up to quantization error). Why does it not update the prediction of the second sample in order to obtain a better predictor?
If the decoder did the above, there would be encoder/decoder mismatch and error propagation. But a smart encoder can be designed to accommodate such a prediction-updating decoder for overall compression gains (see the sketch below).
Properties of video data such as motion dependencies, spatial edges, etc., require sophisticated spatiotemporal transforms that exploit statistical dependencies over motion trajectories and edges. These transforms are very hard to design.
By marrying DPCM with transform coding, NCE bypasses these issues and allows spatiotemporal transforms to be built from simple transform and predictor tools.
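Here is a minimal 1-D sketch of the prediction-updating decoder discussed above: after each residual is added, the reconstructed sample is used to refine the prediction of the next sample, while the NCE encoder is designed (noncausally) so that this causal, DPCM-like decoding incurs no mismatch. The function name and the simple first-order update rule are assumptions for illustration; the actual encoder/decoder are described in the papers.

```python
# Illustrative sketch only; the update rule and names are assumptions.
import numpy as np

def updating_decoder(initial_prediction, residuals, rho=0.9):
    """Decode samples one by one, refining each prediction with the error
    observed at the previously reconstructed sample (first-order update)."""
    recon = np.zeros_like(initial_prediction, dtype=float)
    prev_error = 0.0
    for i in range(len(residuals)):
        pred_i = initial_prediction[i] + rho * prev_error   # refined prediction
        recon[i] = pred_i + residuals[i]                     # add transform-decoded residual
        prev_error = recon[i] - initial_prediction[i]        # reuse for the next sample
    return recon
```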
As modern video applications demand ever higher quality and bandwidth, classical temporal video models are becoming more and more inadequate.
In many types of temporal transitions, calculating motion/optical flow and sending a displaced frame difference does not even form a competitive compression path (modern encoders will simply prefer spatial/INTRA prediction instead).
On these transitions, a much more competitive path can be formed by using prediction filtering, where a matching block/region in the anchor frame is adaptively filtered before being used for prediction. The prediction filter is signaled to the decoder so that the decoder can put together the same prediction.
During a focus change the adaptive prediction filter needs to blur/sharpen; during a lighting change, enhance/subdue lighting; during a cross-fade, intelligently recombine the scenes; and so on. This of course has to happen at a spatially local level, since within the same frame one can have very different transitions. So can one easily design/optimize a ton of filters quickly and send them to the decoder efficiently? (Mind you, for best results the filters need to be jointly optimized with the motion/optical-flow search, so one really has to watch out for encoder complexity.)
Earlier work on Sparsity Induced Prediction showcased high-quality prediction results over sophisticated inter-picture transitions that translated to significant gains in compression (if you haven't seen the SIP results you may want to check them out; some are quite interesting). This work can be seen as distilling those results into high-performance, low-complexity prediction filters that accomplish high-quality compression.
One way to send filters is to send tap values. Trying this, one quickly sees that too much information is generated. Out of convenience one can try restricting the filters to be symmetric/low-pass, etc., but should the filters really be symmetric/low-pass? How does one quantize and compress the space of filters in a way that makes sense for the compression of the filtered signal?
CPF construction does a reduced-rank projection on the space of possible prediction filters in a way that makes sense for the compression of the prediction error and the filter itself.
CPFs are learnt from data and can be considered factorizations of all possible prediction filters (the columns of F).
CPF designs are composed of a generalizable set of base filters (the columns of G) that are adaptively fine-tuned (using the adaptive filter coefficients in the columns of C). The filters are encoded using C, assuming both the encoder and the decoder have access to G.
G is learnt from a large training set of video sequences.
C is highly adaptive and learnt for each prediction block/unit (a PU in HEVC).
Displaced frame difference vs. filter rate of prediction filtering. Base filter spectral responses are shown on the right.
A big attraction of CPFs is that they can be designed using simple scalar operations (rather than the matrix inversions, etc., needed by Wiener filters). So, in short: simple filters that seamlessly figure out the needed filtering and blur, sharpen, change lighting, recombine scenes, and so on, all in a fashion that makes the overall compression work (see the sketch below).
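Below is a minimal Python/NumPy sketch of the factorization described above: every prediction filter is approximated as f = G c, where G holds the learned base filters shared by encoder and decoder, and c holds the few adaptive per-block coefficients that are signaled in the bitstream. The function names are assumptions, and for brevity the per-block coefficients are fit here with a generic least-squares solve; the actual CPF design, as noted above, uses simpler scalar operations (please see the paper).

```python
# Illustrative sketch only; names and the least-squares fit are assumptions.
import numpy as np

def fit_cpf_coefficients(G, anchor_patches, target_block):
    """Fit the per-block coefficients c so that the filtered anchor region
    (anchor_patches @ G @ c) best predicts the target block.
    anchor_patches: one row of filter-support pixels per predicted pixel."""
    A = anchor_patches @ G                 # each base filter's prediction, per pixel
    c, *_ = np.linalg.lstsq(A, target_block.ravel(), rcond=None)
    return c                               # quantize and signal c to the decoder

def apply_cpf(G, c, anchor_patches):
    """Decoder side: rebuild the filter from the shared base filters and the
    signaled coefficients, then form the prediction."""
    f = G @ c                              # condensed prediction filter
    return anchor_patches @ f
```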
To be continued.
Patents:
O. Harmanci, O. G. Sezer, and O. G. Guleryuz, “Image and Video Compression Using Sparse Orthonormal Transforms,” issued, May 2013. Assigned to NTT DoCoMo, Inc. Patent no: 8,437,564.
O. G. Guleryuz, A. Said, and S. Yea, “Method for Encoding and Decoding a Media Signal and Apparatus Using the Same,” filed, October 2014. Assigned to LG Electronics, Inc.
O. G. Guleryuz, S. Li, and S. Yea, “Method and Apparatus for Encoding, Decoding a Video Signal Using a Condensed Prediction Filter,” filed, May 2015. Assigned to LG Electronics, Inc.
O. G. Guleryuz, “Highly Adaptive Inter-Picture Relating Filters for Image and Video Processing and Compression,” filed, January 2015. Assigned to LG Electronics, Inc.
O. G. Guleryuz, “A Nonlinear, In-the-loop, Denoising Filter for Quantization Noise Removal for Hybrid Video Compression,” issued, July 2012. Assigned to NTT DoCoMo, Inc. Patent no: 8,218,634.
Papers:
O. Sezer, O. G. Guleryuz, and Y. Altunbasak, “Approximation and Compression with Sparse Orthonormal Transforms,” IEEE Transactions on Image Processing, vol. 24, No. 8, pp. 2328-2343, August 2015, {pdf}.
O. G. Sezer, O. Harmanci, and O. G. Guleryuz, “Sparse Orthonormal Transforms for Image Compression,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2008), San Diego, CA, Oct. 2008, {pdf}.
J. Ehmann, O. G. Guleryuz, and S. Yea, “Transform-Coded Pel-Recursive Video Compression,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2016), Phoenix, AZ, Sept. 2016, {pdf}.
O. G. Guleryuz, A. Said, and S. Yea, “Non-causal Encoding of Predictively Coded Samples,” Proc. IEEE Int’l Conf. on Image Proc. (ICIP2014), Paris, France, Oct. 2014, {pdf}.
S. Li, O. G. Guleryuz, and S. Yea, “Reduced-Rank Condensed Filter Dictionaries for Inter-Picture Prediction,” Proc. IEEE Int’l Conf. on Acoustics, Speech and Signal Proc. (ICASSP2015), Brisbane, Australia, April 2015, {pdf}.