The Data Compression Guide


 

Data Transformation Methods

 

    The adaptive Huffman as well as the Lempel-Ziv dictionary-based algorithms presented earlier are all dynamic or online algorithms, which means that they quickly output a compressed file out of the supplied input file, requiring no previous knowledge about the input source. But what if you can first “transform” the data source instead of compressing it immediately—and only then employ a second-stage algorithm to actually compress it?

    The subject of data transformation methods  is one of the most interesting themes in the field of data compression today due to its promise of better compression ratios. The main purpose of a data transformation method is to “re-structure” the data such that the transformed message is now more compressible to a specific second-stage algorithm or any existing general-purpose algorithms.

    In this section, we explore some of the best transformation methods employed in data compression that ultimately yield smaller compressed files. Because these methods just transform the source to a different form, the transformed file will then have to be compressed by a second-stage algorithm, making the compression process somewhat slower in performance. However, this slight change in run-time performance is acceptable since the transform will truly skew the data source to allow more effective compression.

    First, we discover a technique which transforms multiple runs of distinct bytes into a run of a single byte, creating only one kind of “run” in the data source. If the input source is taken as a stream of binary symbols, then only a single bit (perhaps '0') may denote the runs of binary digits. Second,  we study a related method which is able to extend its function and prepares the data for an even more effective compression—by always moving the received symbol as the first symbol in the table of codes. And finally, we examine an excellent data transformation method which uses a sorting technique to rearrange the input data such that it becomes highly-amenable for better compression. The method—a reversible sorting technique—was invented in 1983 but was only published in 1994. It is now the basic engine behind a free compression program which outperforms even ZIP file compression.

 

Next >>