PNG
A brief look at lossless compression
Source: https://youtu.be/EFUYNoFRHQI
PNG High Level
Portable Networks Graphics (PNG) is an lossless image algorithm. This means that it does lose any data in the process of encoding and decoding. On the left, you can see a (simplified) high-level breakdown of the algorithm. The raw RGB image is first filtered, then run through the LZSS data compression algorithm and finally represented using Huffman codes. When decoding, all information is retreived.
Filtering
Source: https://www.w3.org/TR/png/#9Filters
Key for the 5 filtering algorithms used by PNG on the right
Source: https://www.w3.org/TR/png/#9Filters
PNG's filtering stage is really fascinating. Rather than using one consistent filter, the algorithm actually has 5 available filters to choose from (described above). It uses a heuristic (minimum sum of absolute differences) to choose the "best" filtering technique for each row. By storing which filters where used, the smart decoder at the end will be still be successful.
LZSS Algorithm
Source: https://www.ijesit.com/Volume%204/Issue%203/IJESIT201503_06.pdf
After filtering, the image goes through the Lempel–Ziv–Storer–Szymanski (LZSS) algorithm. This utilizes a sliding window composed of two sections: the search buffer and the look-ahead buffer. As the window moves through the data, the algorithm looks at the values in the look-ahead buffer. If it has seen similar values before (in the search buffer), it replaces it with a "backreference" that takes up less space. The backreference is composed of an offset from the previously seen instance of the data and the length of the match.
Huffman Codes
Finally, the last step in PNG encoding is the Huffman coding. The key idea in Huffman codes is that certain pixel values will occur more often than others in the image (and that some values may not appear at all). Therefore, the main goal of the code is to represent the more common pixel values with lower numbers of bits.
Steps:
Record the frequencies of each pixel value in the data
Sort by frequency in increasing order
Start building a Huffman Tree by taking the two least frequent pixel values and connecting them to a node with their (frequency) sum
Repeat the process for all values until complete
The Tree now gives a way to represent these values. Every time you go left to the tree, add a 0. Every time you go right, add a 1. Note how the more frequently used pixels are higher up in the tree and thus have a smaller bit representation.
Encode the data with the bit representations giving by the Huffman Tree