PPP Predictor Compression Protocol
The LZ77 compression algorithm of Ziv and Lempel is a dictionary-based algorithm which uses a history array or sliding-window of consecutive already-seen symbols. However, the input symbols need not be contiguous in the history array: they can be entered into the array using a context-driven hash function. The same hash function is thus able to "predict" the symbols using the context hash or position in the array. If the input symbol matches the symbol in the hashed window position, a boolean TRUE is transmitted. Otherwise, a boolean FALSE is output, and the mispredicted symbol is also emitted. The window position is then updated with the mispredicted symbol. The history array thus becomes a "prediction or guess table."
The above algorithm, due to Kasman E. Thomas (1993), is employed in the Point-to-Point Protocol (PPP), a data link layer and communications protocol with data compression. According to the Data Compression FAQ of the internet newsgroup comp.compression, it was first described by T. Raita and J. Teuhola (1987) in the paper "Predictive text compression by hashing", ACM Conference on Information Retrieval.
Now widely-known as LZP (LZ Prediction) algorithm, it is based on the idea that "the current context is an excellent predictor of the next symbol" [Fenwick 1997] [Bloom 1996]. It has since then been developed with varying context coding functions along with different symbol ranking schemes. Bloom, for example, has many advanced versions of LZP. (It is interesting to note that the same context-based prediction method was already anticipated by C. E. Shannon in his 1951 paper "Prediction and Entropy of Printed English," Bell System Technical Journal, Vol. 30, January 1951, pp. 50-64.)
References:
1.] P. Fenwick, "Symbol Ranking Text Compression with Shannon Recodings," Journal of Universal Computer Science, Vol. 3, No. 2, Feb. 1997, pp.70-85.
2.] C. Bloom, "LZP: a new data compression algorithm", 1996.
Source Code:
PRAQ.ZIP - includes a PPP/LZP algorithm implementation which improves compression by counting the number of prediction bits (i.e., correctly-predicted symbols) and encoding the resulting sum as a variable-length integer. The input bytes are symbol-ranked according to recency of occurrence (MTF), and encoded using the same variable-length coding function. See also LZPN2, PRAQ4, PRAQ51, PRAQ52, PRAQ6, and LZPGT.