JPEG
A smaller way to store files since 1992
So what is JPEG? Simply put, JPEG is the most widely used image compression algorithm you will find. It is a favorite because it can compress large files into much smaller ones while the result remains nearly indistinguishable from the original to the human eye.
How do colors work?
Images are normally represented by 8-bit pixel values of red, green, and blue. In other words, each pixel in an image has an associated red, green, and blue value, each ranging from 0 to 255 (2^8 = 256 possible levels). JPEG works better with a different color system, however.
Above is the first step in the JPEG compression process, in which an RGB image is converted into Y (luminance), Cb (chroma blue), and Cr (chroma red) components. This step builds on the main theme of JPEG: using our understanding of human vision to shrink image data without affecting how the image appears to people. This will come up again later, but essentially, human eyes are sensitive to differences in brightness and not nearly as sensitive to differences in color.
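The conversion can be sketched for a single pixel as follows. This is a minimal illustration, assuming the full-range constants commonly used with JPEG/JFIF files; the function name `rgb_to_ycbcr` is made up for this example and is not taken from any particular library.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF-style constants)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round each component and clamp it to the 8-bit range 0..255.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(255, 255, 255))  # white: full brightness, neutral chroma
print(rgb_to_ycbcr(0, 0, 0))        # black: zero brightness, neutral chroma
```

Note that gray pixels all land on the same neutral chroma value (128, 128) and differ only in Y, which is what lets the later steps treat brightness and color separately.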
Reduce Color Contrast
Left to right: no subsampling, 10x10 subsampling, 12x12 subsampling
Subsampling is when the pixels within an area are all set to the same chroma value. For instance, a 2x2 chroma subsampling could set all 4 pixels in that grid to the value of the top-left pixel. Human eyes are sensitive to luminance, but not nearly as sensitive to color changes. As the 10x10 (middle) example demonstrates, chroma subsampling can be performed quite aggressively before the human eye starts to notice.
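A rough sketch of the top-left variant described here, assuming a chroma channel stored as a list of rows. The helper name `subsample_top_left` is hypothetical.

```python
def subsample_top_left(channel, block=2):
    """Set every pixel in each block x block tile to the tile's top-left value."""
    height, width = len(channel), len(channel[0])
    return [
        [channel[(row // block) * block][(col // block) * block] for col in range(width)]
        for row in range(height)
    ]


cb = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
for row in subsample_top_left(cb):
    print(row)
# [10, 10, 30, 30]
# [10, 10, 30, 30]
# [90, 90, 110, 110]
# [90, 90, 110, 110]
```

Since every tile now holds a single value, only one chroma sample per tile actually needs to be stored.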
Downsampling is a similar process, but instead of taking just one pixel's value, it averages all the values in the area, like mixing paints together and spreading the result over the grid.
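The averaging variant might look like the sketch below. It assumes the same list-of-rows layout for a chroma channel and rounds each mean to the nearest integer; `average_tiles` is a name invented for this example.

```python
def average_tiles(channel, block=2):
    """Replace every block x block tile with the rounded mean of its values."""
    height, width = len(channel), len(channel[0])
    out = [row[:] for row in channel]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Tiles at the right/bottom edge may be smaller than block x block.
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            total = sum(channel[r][c] for r in rows for c in cols)
            mean = round(total / (len(rows) * len(cols)))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


cb = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
for row in average_tiles(cb):
    print(row)
# [35, 35, 55, 55]
# [35, 35, 55, 55]
# [115, 115, 135, 135]
# [115, 115, 135, 135]
```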
A familiar algorithm, but also different
Unlike the FFT, which produces complex values, the discrete cosine transform (DCT) only deals with real numbers.
Above is an example of the DCT coefficients of an 8x8 block from the image. Similar to an FFT, the function measures the "frequency distribution" of the input signal. For an image, both the input and the resulting coefficients are matrices. In a more physical sense, the function reads how drastically the image changes from one pixel to the next.
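For readers who want to see the arithmetic, here is a direct (slow, unoptimized) sketch of a 2D type-II DCT on one 8x8 block, using the orthonormal scaling. It is only an illustration: real encoders use fast factorizations, and JPEG also subtracts 128 from each sample before the transform, which this sketch leaves out. The function name `dct_2d` is an assumption for this example.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Type-II 2D DCT of an N x N block given as a list of rows."""
    def scale(k):
        # Orthonormal scaling: the zero-frequency term gets a smaller factor.
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A perfectly flat block has no pixel-to-pixel change, so all of its
# energy ends up in the top-left (zero-frequency, "DC") coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0
```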
Above is the same block after applying the quantization matrix and rounding. As the isolated lines indicate, the coefficients corresponding to high frequencies are suppressed, leaving only low-frequency values. This plays into the theme of JPEG: human eyes are less sensitive to drastic changes over short distances (high frequencies).
In essence, the amount of information within the matrix has been drastically reduced. This is the lossy part of a JPEG, and it is why you may notice some images become fuzzy around edges after changing file types. The information that defines those borders has been eliminated.
Applying the DCT and quantization to every block in the image begins to show the effects of compression. That is the process shown above, where there is a noticeable decrease in the resolution of the image. It should be noted that after compression, the number of non-zero DCT coefficients was only about 5.7% of the original count, indicating that roughly that fraction of the data is required to produce the second image once the high-frequency values are eliminated. Ratio: 0.0568
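Quantization itself is just an element-wise divide and round. The sketch below uses the example luminance table printed in Annex K of the JPEG standard; whether the figures on this page used that exact table is an assumption, and the names `Q_LUMA`, `quantize`, and `dequantize` are made up for illustration.

```python
# Example luminance quantization table (JPEG standard, Annex K).
# Larger divisors sit toward the bottom right, where the high frequencies live.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coeffs, table=Q_LUMA):
    """Divide each DCT coefficient by its table entry and round to an integer."""
    return [
        [round(c / q) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, table)
    ]


def dequantize(levels, table=Q_LUMA):
    """Decoder side: multiply back. The rounding loss is not recovered."""
    return [
        [level * q for level, q in zip(l_row, q_row)]
        for l_row, q_row in zip(levels, table)
    ]


# Hypothetical coefficients: a large DC term, one low and one high frequency.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0
coeffs[0][1] = -35.0
coeffs[7][7] = 30.0

levels = quantize(coeffs)
print(levels[0][0], levels[0][1], levels[7][7])  # 50 -3 0
```

The high-frequency value of 30 is divided by 99 and rounds to zero, while the low-frequency values survive in coarser form; multiplying back with `dequantize` gives -33 rather than the original -35, which is where the loss comes from.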
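A ratio like the one quoted can be computed by counting non-zero coefficients before and after quantization. The sketch below shows one plausible way to do it; how the figure on this page was actually produced is not stated, and the function name and sample values are invented for illustration.

```python
def nonzero_ratio(before, after):
    """Non-zero count after quantization divided by the non-zero count before."""
    nonzero_before = sum(1 for value in before if value != 0)
    nonzero_after = sum(1 for value in after if value != 0)
    return nonzero_after / nonzero_before


# Hypothetical flattened coefficients for a tiny example.
before = [800, -35, 12, 7, -3, 2, 1, 1]
after = [50, -3, 1, 0, 0, 0, 0, 0]
print(nonzero_ratio(before, after))  # 0.375
```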
How the manipulated image is transmitted
A more in-depth explanation is on the PNG page, but as a brief rundown: Huffman encoding looks at the distribution of values present and assigns each value a code based on how often it occurs, so that the most frequent values get the shortest codes.
The zig-zag illustration shows the order in which the program reads the values of each block during compression, converting a 2D block into a one-dimensional sequence. Run-length encoding then saves space by counting how many times a value repeats in a row and sending that count along with the value instead of the entire run. For instance, a sequence like aaaaabbccccc is converted into 5a2b5c.
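As a rough illustration of the idea, the sketch below builds a Huffman code table with Python's standard `heapq` and `collections.Counter`. It shows the general technique only, not the table format that JPEG files actually store, and `huffman_codes` is a hypothetical helper name.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return a {symbol: bit string} table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "aaaaaaaabbbc"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(codes)
print(len(encoded), "bits instead of", 8 * len(message))
```

Here "a" occurs most often and ends up with a 1-bit code, while "b" and "c" get 2 bits each.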
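Both steps can be sketched in a few lines. The run-length encoder mirrors the simplified count-plus-value scheme from the example above; actual JPEG entropy coding is more involved (it encodes runs of zeros between non-zero coefficients), so treat this as an illustration only. The names `zigzag_order`, `zigzag_scan`, and `run_length_encode` are assumptions.

```python
from itertools import groupby


def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col: one anti-diagonal at a time
        cells = [(r, s - r) for r in range(max(0, s - n + 1), min(s, n - 1) + 1)]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def zigzag_scan(block):
    """Flatten a square block into a 1D list following the zig-zag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_encode(values):
    """Collapse each run of equal values into '<count><value>'."""
    return "".join(f"{len(list(group))}{value}" for value, group in groupby(values))


# The numbers in this small block are placed so the scan visits them in order.
block = [
    [1, 2, 6],
    [3, 5, 7],
    [4, 8, 9],
]
print(zigzag_scan(block))                 # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(run_length_encode("aaaaabbccccc"))  # 5a2b5c
```

The zig-zag path pays off after quantization: it visits low frequencies first, so the zeros left in the high-frequency corner tend to end up in one long run at the end of the sequence.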