Dynamic Prefix-Free Codes

Dynamic Prefix-Free Codes


Compression with the static Huffman algorithm is very good and effective which served programmers for years. However, it has the reputation for being slow to execute especially for very large files, mainly because it is a two-pass compression algorithm. A better approach is to make it dynamic or adaptive, changing in real-time its model of the source and acting dynamically to create various codes for the symbols. This is called Dynamic Huffman Coding. Interestingly, this adaptive algorithm yields even better compression which is highly desirable apart from its “acceptable” speed.

Dynamic Huffman coding belongs to the class of one-pass algorithms, in which the data source is read and concurrently encoded as the algorithm proceeds. No second pass is necessary as the whole source must have been encoded in just one reading of the symbols.

Because the algorithm continuously adapts the model of the source (i.e., the Huffman tree model), it is of great importance that the operations performed by the algorithm be fast as compared to other techniques. Of course, the bit encoding of the data source must be smaller than the output of the static Huffman method. And the main goal is how to quickly re-build or update the Huffman tree each time a symbol is received by the encoder.

Algorithm variants of prefix coding are “adaptive” methods that update the tree at each step of the encoding process. Most techniques can update the tree each time a symbol is seen, in particular Algorithm FGK [Knuth 1985] and Algorithm Vitter [1987]. Other dynamic Huffman schemes adapt the tree only after a large “block” of symbols is processed (e.g., performing static coding on each block). This is done to avoid constant updating of the tree, speeding up compression.

Algorithm FGK and Algorithm Vitter both employ a new symbol (which is not part of the alphabet) and emits its symbol code to “signal” the decoder that a “previously-unseen” symbol follows. The new symbol’s leaf node is called the 0-node and is always the “sibling” of the least-frequent symbol, thus adding a single bit for the code of the least-frequent symbol. The 0-node and its sibling is always placed at the very bottom level of the tree.

Gagie [2004] introduced a “dynamic” version of Shannon coding, incorporating the same 0-node idea but minimizes the number of node operations at the cost of slightly longer codewords.

Pigeon and Bengio [1998] describe a “set-based” dynamic Huffman coding method in which symbols of the same frequency are placed in just one leaf node. The code for the leaf node thus acts as a “prefix” set of output bits. Encoding the actual symbol itself is minimized since if it is the only symbol in its set, the prefix code distinguishes it perfectly. Experiments show that their technique is good for binary files. Working on 16-bit symbols or words, it outperforms algorithms FGK and Vitter, producing fewer bits.  Treating the message as composed of 16-bit symbols  greatly improves compression since the Huffman tree generated by the encoding is clearly an order-1 model.

A customary improvement to dynamic tree algorithms is to delete the node corresponding to “the character not in the alphabet” (i.e., the 0-node) after all the characters of the alphabet have occurred [Gagie 2004]. Theoretically, this method provides a slight improvement in code compaction since the former sibling of the 0-node is assigned a code which is 1 bit shorter than its previous code.

Furthermore, actual tests reveal that more compression can be achieved for Algorithm FGK and Algorithm Vitter when the code-tree is “pre-processed” (i.e., initializing the Huffman tree for the first k symbols) before actual coding begins.



Next >>


References:

 1.)   D. E. Knuth, “Dynamic Huffman Coding,” Journal of Algorithms 6 (1985), pp. 163-180.

2.]   J. S. Vitter, “Design and Analysis of Dynamic Huffman Codes,” Journal of the Association for Computing Machinery, Vol. 34, No.4, October 1987, pp. 825-845.

3.]  S. Pigeon and Y. Bengio, “Memory-Efficient Adaptive Huffman Coding,” Dr. Dobb’s Journal (DDJ), October 1998, pp. 131-135.

4.]   T. Gagie, “Dynamic Shannon Coding,” In Proceedings of the 12th European Symposium on Algorithms, 2004, pp. 359-370.