Digital images, videos, and articles are everywhere in our daily lives, and the speed at which they’re uploaded to social media or downloaded to your mobile phone continues to increase. Improved devices and network infrastructure are partly responsible for this speed, but software tools such as data compression algorithms, which reduce the size of data files, play a vital role as well.
Compression is the process of representing data with fewer bits. Imagine being in a situation where your mobile phone is running out of storage space. At this point, if increasing storage capacity is not an option, you could free up space by deleting old photos. But instead of saying goodbye to some of your beautiful photos, you could also compress each photo file, reducing file size by removing unnecessary and repeating data.
There are many different compression algorithms, for many different file types. Some algorithms are lossless: the original file can be restored exactly from the compressed one. Others are lossy: the output of the compressed file can be slightly different from the output of the corresponding uncompressed file. However, a file compressed this way still functions as it should, and humans normally cannot tell the difference between the outputs.
Many types of files can be compressed — these are some of the most common:
Text also compresses very well, because it often contains repeating sequences of characters; instead of each character being stored as a separate byte, words or phrases can be stored together. You will see how this works later on in this section.
Many different files or directories can be compressed and stored together within a ZIP file.
A compressed photo requires fewer bits than its uncompressed counterpart, so it is transmitted faster, and your hardware can process it more quickly; ultimately, the photo loads faster in your browser.
Audio and video files can be compressed by up to 90%, so you can stream them all over the world within seconds.
Compressed images, videos, and audio files on mobile devices are transferred to cloud servers faster, which saves you time when you back up your devices.
Some apps and web browsers actively compress images, videos, and music files before uploading or downloading them, thereby directly reducing the amount of data that needs to be transmitted. With reduced data transmission comes a smaller bill for your home Wi-Fi or mobile phone!
The video streaming service Netflix uses AI to analyse every shot in a video file and compress it without any loss of image quality that is visible to the human eye.
The AI system, called the ‘Dynamic Optimiser’, improves the quality of video when users have a poor internet connection. To develop this system, Netflix asked users to rate hundreds of thousands of shots. The AI algorithm was then trained with this survey data so it could learn to distinguish between high- and low-quality images.
This Netflix algorithm is a smart and somewhat advanced use of compression. Now we’ll put the microscope on compression and look into how it occurs at a binary level.
Every compression algorithm aims to reduce data file size by removing unnecessary parts or finding and efficiently encoding patterns.
As mentioned earlier, text compresses easily because it often has lots of repeating patterns. Imagine a text file containing the following text:
I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! I do not like that Sam-I-am! Do you like green eggs and ham? I do not like them, Sam-I-am. I do not like green eggs and ham.
As you’ve learned on this course, 8-bit ASCII encoding stores each character, symbol, or space in a single byte. Therefore, the text above would be stored in a file with 174 bytes. But by compressing the text, we can reduce the size of this file.
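The byte count above can be checked with a few lines of Python. This is only a small sketch: the variable names are chosen for illustration, and it assumes the text is stored exactly as shown, with single spaces and no line break at the end.

```python
# The sample text, split over three lines only to keep the code readable.
text = (
    "I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! "
    "I do not like that Sam-I-am! Do you like green eggs and ham? "
    "I do not like them, Sam-I-am. I do not like green eggs and ham."
)

# 8-bit ASCII stores every character, symbol, or space in one byte.
encoded = text.encode("ascii")
size_bytes = len(encoded)
size_bits = size_bytes * 8

print(size_bytes, "bytes =", size_bits, "bits")  # 174 bytes = 1392 bits
```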
The uncompressed text file used 1 byte to store each character. But as you can see, the text contains repeating characters, words, and phrases:
‘am’ appears 13 times
‘I do not like’ appears 3 times
‘green eggs and h’ appears 2 times
Our new compression system requires a new set of binary values. For each repeating character or phrase, we create a new binary equivalent, so a ‘data dictionary’ stores the words and phrases along with new 1-byte binary values:
Even though the dictionary takes up some space, it allows long repeated phrases to be stored in 1 byte each, reducing the storage space needed for the entire text.
Established compression systems are even more effective at compressing files. For example, with Huffman encoding, a common compression technique that gives shorter binary codes to the characters that appear most often, the storage space for the ‘I am Sam’ text could be reduced from 174 bytes to 92 bytes.
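Below is a minimal sketch of Huffman encoding in Python, included only to illustrate the idea. It assumes a non-empty text and counts only the encoded bits, not the code table that would also have to be stored, so the size it reports will not necessarily match the figure quoted above. The function names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count

TEXT = (
    "I am Sam, Sam I am. That Sam-I-am! That Sam-I-am! "
    "I do not like that Sam-I-am! Do you like green eggs and ham? "
    "I do not like them, Sam-I-am. I do not like green eggs and ham."
)

def build_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    tiebreak = count()  # unique number so the heap never compares nodes
    heap = [(freq, next(tiebreak), ch) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a text with one distinct character
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent nodes into one tree node.
    while len(heap) > 1:
        freq1, _, left = heapq.heappop(heap)
        freq2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (freq1 + freq2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a single character
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    lookup = {code: ch for ch, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # no code is a prefix of another code
            decoded.append(lookup[current])
            current = ""
    return "".join(decoded)

codes = build_codes(TEXT)
bits = encode(TEXT, codes)
payload_bytes = (len(bits) + 7) // 8  # round up to whole bytes

print("Uncompressed:", len(TEXT), "bytes")
print("Huffman payload:", payload_bytes, "bytes")
```

Frequent characters such as the space end up with short codes, while rare ones such as ‘?’ get longer codes, which is where the saving comes from.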
In the next step we’re going to look at another compression system: run-length encoding (RLE).