Compression Theory
DPCM Compression of Gaussian Auto-Regressive Sequences
Suppose we are DPCM encoding a Gaussian AR sequence of the form x_n = a_1 x_{n-1} + ... + a_p x_{n-p} + w_n, where w_n is a zero-mean white Gaussian innovations process.
DPCM tells us to form the best (in the l_2 sense) prediction of each sample at the encoder, then quantize and send the prediction error to the decoder. The decoder forms the same prediction and reconstructs the sample once the encoded prediction error is received.
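To make the encode/decode loop concrete, here is a minimal closed-loop DPCM sketch for an AR(1) source; the predictor coefficient, the uniform scalar quantizer, and all names are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def dpcm(x, a=0.9, step=0.5):
    """Toy closed-loop DPCM: predict each sample from the previous
    reconstruction, quantize the prediction error with a uniform
    quantizer, and mirror the decoder's reconstruction."""
    x_hat = np.zeros_like(x)  # decoder-side reconstructions
    pred = 0.0                # prediction of the current sample
    for n in range(len(x)):
        e = x[n] - pred                  # prediction error
        e_q = step * np.round(e / step)  # quantized error (what gets sent)
        x_hat[n] = pred + e_q            # decoder reconstruction
        pred = a * x_hat[n]              # AR(1) prediction from x_hat
    return x_hat

# AR(1) source: x[n] = 0.9 * x[n-1] + w[n], with w white Gaussian
rng = np.random.default_rng(0)
w = rng.normal(size=10_000)
x = np.zeros_like(w)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + w[n]
print("reconstruction MSE:", np.mean((x - dpcm(x)) ** 2))
```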
"It is clear that one should form the best possible prediction to make DPCM as efficient as possible." Is it really?
In this work we show that the above assertion is false and leads to significantly suboptimal DPCM systems at low bit-rates. The paper designs optimal systems that significantly outperform standard DPCM, including variants of standard DPCM that try to better model quantization errors.
In compression we are strongly conditioned to think decorrelate, reduce prediction error, and so on, to the extent that we sometimes forget to look at the constraints of the problem we are trying to solve. As this work shows, not only is the above assertion false, it is actually quite the wrong thing to do at lower bit-rates. (I believe compression experts working with audio and video, where DPCM is extensively utilized, will find the relevant papers below interesting, as this topic is still not well understood in the community.)
If you are familiar with DPCM and basic compression concepts, the following figure from the paper pretty much summarizes the main idea.
On the left is the spectrum of the white innovations process that DPCM encodes (see paper for assumptions). On the right is the squared magnitude of the reconstruction filter employed at the decoder for a first-order autoregressive process.
The tradeoff: The rate of coding a Gaussian AR process with DPCM is a monotonically increasing function of the variance of the innovations process. Spectral components at omega_1 and omega_2 therefore contribute equally to rate, since they contribute the same amount to the total variance. On the decoder side, however, the reconstruction filter makes the contribution of spectral components at omega_1 far more important. This creates a rate-distortion advantage in sacrificing components at omega_2 for the sake of components at omega_1 at certain bit-rates. The paper derives the optimal trade-off for general AR processes, with significant rate-distortion improvements over standard DPCM.
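The asymmetry is easy to check numerically. Below is a minimal sketch, assuming an AR(1) coefficient of 0.9 and two arbitrarily chosen frequencies: equal innovations variance at either frequency costs the same rate, yet the reconstruction filter 1/(1 - a e^{-j omega}) amplifies the low-frequency component far more.

```python
import numpy as np

a = 0.9  # assumed AR(1) coefficient, for illustration only

def recon_gain_sq(omega):
    """Squared magnitude of the AR(1) reconstruction (synthesis)
    filter 1 / (1 - a e^{-j omega}) at frequency omega."""
    return 1.0 / (1.0 - 2.0 * a * np.cos(omega) + a * a)

omega_1, omega_2 = 0.2, 2.8  # arbitrary low and high frequencies
# Equal innovations variance at omega_1 and omega_2 costs the same rate,
# but contributes very differently to the reconstruction error:
print(recon_gain_sq(omega_1))  # large: low frequencies are amplified
print(recon_gain_sq(omega_2))  # small: high frequencies are attenuated
```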
Most of the improvements pointed to by this work can be accomplished by incorporating a prefilter at the encoder, which establishes the above trade-off. Hence one can take a standard DPCM system, modify only the encoder (i.e., keep the standard decoder fixed) and still get most of the improvements of the fully modified system (see paper for details).
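As a rough sketch of that architecture (the filter taps below are placeholders; the optimal prefilter derived in the paper is not reproduced here):

```python
import numpy as np
from scipy.signal import lfilter

# Placeholder prefilter taps; the paper derives the optimal prefilter,
# which this sketch does not attempt to reproduce.
prefilter_taps = np.array([0.25, 0.5, 0.25])

def modified_encoder(x, dpcm_encode):
    """Prefilter the source, then run the unmodified DPCM encoder;
    the standard decoder is left untouched."""
    x_filtered = lfilter(prefilter_taps, [1.0], x)
    return dpcm_encode(x_filtered)
```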
It is important to note that the rate-distortion tradeoff this work exposes and exploits is not technical minutiae. It makes a significant difference in practical systems.
DPCM is extensively used in video compression. We started incorporating this work into video compression via filtering along motion trajectories. To this day the idea remains underutilized.
This work came about when we wanted to improve a hybrid video codec (which uses DPCM) by taking the noise in the video frames into account. The project took off when I realized that the simple filtering modifications I had made gave a rate-distortion advantage whether there was noise or not!
Compression using Wavelet-Based Nonlinear Approximation
Imagine having a one-dimensional signal that is piecewise constant between jumps. The information in such a signal is encoded in a discrete set of positions and jump values; the information is strongly positional.
The techniques we use in audio/image/video compression approximate or encode signals using waveforms: we fit wiggly-looking wavelets, DCT basis functions, and other waveforms to signals. Such waveforms are great for compactly representing areas of smooth and predictable fluctuations.
Can one encode positional information using waveforms? Of course, but doing so typically requires too many waveforms. One's immediate instinct is to treat positional information by encoding positions, and areas of smooth fluctuations by encoding waveforms.
In the '90s the compression and applied-math communities started rigorously viewing audio/image/video signals as containing jumps as well as areas of smooth and predictable fluctuations. The mathematical formulations were constructed with an eye toward efficient means of encoding such signals.
One mainstream approach tried to separate signals with segmentation-like techniques and used both position and waveform encoders. The other approach tried to design waveforms that were equally efficient at dealing with jumps and with regions of smooth/predictable fluctuations.
Our work is the first to rigorously calculate the rate-distortion performance of wavelets over piecewise-smooth signals using a nonlinear-approximation-based encoder. In essence we show that (i) using nonlinear approximation, encoding piecewise-smooth signals with certain types of wavelets is as efficient as encoding uniformly smooth signals, and (ii) the encodings are asymptotically optimal in both cases (see paper for definitions, results, and optimality statements).
One can hence do efficient transform coding of these complex signal types (without requiring segmentation/separation) provided one uses the right linear transform and a nonlinear-approximation-based encoding. For example, using compactly supported wavelets with polynomial-annihilation properties one can encode piecewise-smooth signals with or without jumps equally well and accomplish asymptotically optimal performance (see the discussion in the paper for why the asymptotics involved are not trivial and are actually relevant to practice). This is a surprising result, especially in light of the seemingly "obvious" need to encode the positional and waveform parts of the data separately.
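As a minimal sketch of the nonlinear approximation step (the wavelet, decomposition level, term count, and test signal below are illustrative assumptions, not the paper's construction):

```python
import numpy as np
import pywt  # PyWavelets

# n-term nonlinear approximation: keep only the N largest-magnitude
# wavelet coefficients (their positions adapt to the signal), zero the
# rest, and reconstruct.
t = np.linspace(0, 1, 1024)
signal = np.sin(4 * np.pi * t) + (t > 0.5)  # smooth part plus a jump

coeffs = pywt.wavedec(signal, "db4", level=6)  # db4: 4 vanishing moments
flat, slices = pywt.coeffs_to_array(coeffs)

N = 64  # number of coefficients to keep
threshold = np.sort(np.abs(flat))[-N]
flat[np.abs(flat) < threshold] = 0.0           # nonlinear: keep largest N

approx = pywt.waverec(
    pywt.array_to_coeffs(flat, slices, output_format="wavedec"), "db4"
)
print("MSE with", N, "terms:", np.mean((signal - approx[:len(signal)]) ** 2))
```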
The abstract of the manuscript is fairly descriptive and I think the introduction and conclusion convey the intent of the paper well.
Papers:
O. G. Guleryuz and M. T. Orchard, “On the DPCM Compression of Gaussian Autoregressive Sequences,” IEEE Transactions on Information Theory, vol. 47, no. 3, pp. 945-956, March 2001, {pdf}.
O. G. Guleryuz and M. T. Orchard, “Rate Distortion Based Temporal Filtering for Video Compression,” Proc. Data Compression Conference, IEEE DCC-96, pp. 122-131, April 1996, {pdf}.
A. Cohen, I. Daubechies, O. G. Guleryuz, and M. T. Orchard, “On the Importance of Combining Wavelet-Based Nonlinear Approximation with Coding Strategies,” IEEE Transactions on Information Theory, vol. 48, no. 7, pp. 1895-1921, July 2002, {pdf}.