Pixel Compression
04
Data Compression
In this lesson we continue the exploration of bits and binary numbers. In this case we learn how to use bits, 1s and 0s, to represent images.
The image representation technique demonstrated in the video below is known as run-length encoding (RLE) and it is an image compression technique. Image compression is a type of data compression which can reduce the size (number of bits) of transmitted or stored data.
The size of data (the number of bits required to store it) affects the time it takes to send that data across the Internet. So, people use data compression algorithms to reduce the size of images, sounds, movies and some other kinds of data.
The amount of size reduction depends on two things:
the amount of redundancy in the original data
the compression algorithm applied
There are two broad categories of data compression algorithms: lossless and lossy, depending on whether information is lost.
Lossless compression works by removing redundant data. These algorithms can usually reduce the number of bits required to store or transmit the data while guaranteeing that the original data can be perfectly reconstructed.
Run-Length Encoding
Run-length encoding is an example of lossless compression. Consider the 158 pixels in the top row of the BJC logo (at right). The first 60 pixels are white. Then come five pixels of yellowish orange (the top slice of the "b"). And the rest of that row is white.
Instead of storing all 158 pixels individually, we could compress them with run-length encoding and just store six values (three numbers and three colors).
Those six values (60, FFFFFF, 5, E5A84A, 93, FFFFFF) can be reconstructed into that whole first row of the image (158 pixels). So, fewer bits does not necessarily mean less information.
Lossy Compression
Lossy Compression works by removing details that people aren't likely to notice. The most commonly used lossy compression algorithm for pictures is called JPEG (or JPG, both pronounced "jay peg" for "Joint Photographic Experts Group," the committee that invented it). JPEG works by preserving most of the brightness information for each pixel (since human eyes are sensitive to that) and performing a kind of averaging process to the color information (because human eyes aren't as good at distinguishing color, especially colors close to white).
To the right are an original, uncompressed picture of pebbles in a pond and a highly compressed JPEG of the same image. Can you tell which is which?
(Note: Google Sites might compress the two images with their own format, so use the links to view the original bmp and jpeg image)
You probably can tell which is which, especially if you looked for sharp edges or very shiny spots. But the compressed file uses 1/30th of the space used by the original, and you could still tell that it's a picture of rocks. So, for many purposes the compressed version would be good enough. Lossy algorithms usually let you control the degree of precision, and generally, people select less extreme compression settings, so the compressed file looks much more like the original than this example.
The most popular file type you probably use for portable music files, MP3 format, is a lossy compression format. It tends to emphasize high frequencies, so people accustomed to MP3 music find uncompressed versions of the same music boomy (bassy).
Which is best?
Both types of data compression exist because each is useful in certain circumstances:
A lossless compression algorithm is one in which no data are lost; the original data can be completely recovered.
An example of lossless compression is RLE.
Lossless compression is a good choice when there are very sharp transitions between colors (such as in logos) or when it's essential to be able to recreate original data precisely (such as the code for a program or the text of a book).
A lossy compression algorithm is one in which some data are lost; the original data cannot be completely restored.
An example of lossy compression is JPEG.
Lossy data compression algorithms can usually reduce the number of bits stored or transmitted more than lossless compression algorithms.
Lossy compression is a good choice when the data does not require precision (such as images, sound, or movies, which people may not even notice have been compressed) and when reducing number of bits stored or transmitted is most important.
Fewer bits does not necessarily mean less information.
The amount of size reduction from compression depends on both the amount of redundancy in the original data representation and the compression algorithm applied.
The amount of compression can vary depending on how many bits are used to represent each pixel in the image.
The amount of compression also depends on the number of different colors used in the actual image. For our black and white spaceship there were only 2 colors, so there were relatively few color changes and therefore lots of long runs. If it were a colored spaceship, there would be many color changes and therefore fewer long runs. So we would get much less compression.
Still Curious?
How do Snapchat filters work?
If you or someone you know uses the social media app Snapchat, they have probably used one of those cool filters. But, how exactly do those filters work? Watch the video to learn more about the algorithm and pixel data behind Snapchat filters.
How does JPEG encoding work?
JPEG is an example of a lossy compression algorithm. JPEG, which uses the file extension .jpg or .jpeg, is the most common format used today to represent images.The JPEG algorithm was created by the Joint Photographic Experts Group (JPEG), hence its name. The fact that JPEG is a lossy technique means that some of the information present in the image is lost during compression and cannot be recovered. Here is a video lecture on the JPEG compression algorithm (Teacher Tube version). The compression algorithm involves some math, but the video describes just enough of the math so that you can see how JPEG works. The video is a summary of an excellent, more detailed presentation by Randell Heyman -- you should really check out the Heyman video if you are interested in more of the mathematical details.
How are audio files digitized?
What about audio files? How are they digitized and converted to bits? Watch the following video for a summary of how audio files are converted from analog to digital format. Analog refers to data with values that change continuously, or smoothly, over time, like sound or music files. Analog data is converted to a digital forms, 0s and 1s in binary, using a sampling technique, which means measuring values of the analog signal at regular intervals (usually in time or space) called samples. The samples are measured to figure out the exact bits required to store each sample. The use of digital data to approximate real-world analog data is a great example of abstraction!
Analog data can be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals called samples.
The samples are measured to figure out the exact bits required to store each sample. The number of samples measured per second is the sampling rate, the higher the rate the better the quality.