Pixel Compression

04

Data Compression

In this lesson we continue the exploration of bits and binary numbers. Here we learn how to use bits (1s and 0s) to represent images.

The image representation technique demonstrated in the video below is known as run-length encoding (RLE), and it is an image compression technique. Image compression is a type of data compression, which reduces the size (number of bits) of transmitted or stored data.

The size of data (the number of bits required to store it) affects the time it takes to send that data across the Internet. So, people use data compression algorithms to reduce the size of images, sounds, movies and some other kinds of data.

The amount of size reduction depends on two things: the amount of redundancy in the original data and the compression algorithm applied.

There are two broad categories of data compression algorithms: lossless and lossy, depending on whether information is lost.

Lossless compression works by removing redundant data. These algorithms can usually reduce the number of bits required to store or transmit the data while guaranteeing that the original data can be perfectly reconstructed.
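The "perfectly reconstructed" guarantee can be checked directly. The sketch below uses Python's standard zlib module as one example of a lossless algorithm; the repetitive sample data is made up for illustration and is not from the lesson.

```python
import zlib

# Made-up, highly redundant data: the same short pattern repeated many times.
original = b"white " * 200

compressed = zlib.compress(original)    # fewer bytes to store or send
restored = zlib.decompress(compressed)  # rebuild the data from the compressed form

print(len(original), "bytes before,", len(compressed), "bytes after")
print("perfectly reconstructed:", restored == original)
```

Because the data is so repetitive, the compressed form is much smaller, yet decompressing it gives back exactly the original bytes.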

Image Compression

It's All Bits - Video

It's All Bits - Slides

D04-Interpreting Binary Sequences

Run-Length Encoding

Run-length encoding is an example of lossless compression. Consider the 158 pixels in the top row of the BJC logo (at right). The first 60 pixels are white. Then come five pixels of yellowish orange (the top slice of the "b"). And the rest of that row is white.

Instead of storing all 158 pixels individually, we could compress them with run-length encoding and just store six values (three numbers and three colors). 

Those six values (60, FFFFFF, 5, E5A84A, 93, FFFFFF) can be reconstructed into that whole first row of the image (158 pixels). So, fewer bits does not necessarily mean less information.
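The row described above can be run through a small run-length encoder to see the six values appear. This is a minimal sketch, assuming each pixel is written as a hexadecimal color string; the function names rle_encode and rle_decode are chosen here for illustration.

```python
def rle_encode(pixels):
    """Collapse each run of identical neighboring pixels into a (count, color) pair."""
    runs = []
    for color in pixels:
        if runs and runs[-1][1] == color:
            runs[-1][0] += 1          # same color as the previous pixel: extend the run
        else:
            runs.append([1, color])   # different color: start a new run
    return [(count, color) for count, color in runs]


def rle_decode(runs):
    """Expand (count, color) pairs back into the full list of pixels."""
    pixels = []
    for count, color in runs:
        pixels.extend([color] * count)
    return pixels


# Top row of the logo: 60 white, 5 yellowish orange, 93 white (158 pixels).
row = ["FFFFFF"] * 60 + ["E5A84A"] * 5 + ["FFFFFF"] * 93

runs = rle_encode(row)
print(runs)                      # [(60, 'FFFFFF'), (5, 'E5A84A'), (93, 'FFFFFF')]
print(rle_decode(runs) == row)   # True
```

Run-length encoding only helps when the data contains long runs. A row in which every pixel differs from its neighbor would produce one pair per pixel and end up larger than the original.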

Lossy Compression

Lossy compression works by removing details that people aren't likely to notice. The most commonly used lossy compression algorithm for pictures is called JPEG (or JPG; both are pronounced "jay peg" and stand for "Joint Photographic Experts Group," the committee that created it). JPEG works by preserving most of the brightness information for each pixel (since human eyes are sensitive to that) and applying a kind of averaging process to the color information (because human eyes aren't as good at distinguishing color, especially colors close to white).
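The idea of keeping brightness but averaging color can be sketched in a few lines. This is a simplified illustration, not the full JPEG algorithm (which has further steps): it separates each pixel into one brightness value and two color values using the conversion formulas from the JPEG file format (JFIF), then lets each pair of neighboring pixels share one averaged color. The pairing scheme, the function names and the sample pixels are assumptions made for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """Split an RGB pixel into brightness (y) and two color values (cb, cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def clamp(value):
    """Round to a whole number and keep it in the 0..255 range."""
    return max(0, min(255, round(value)))


def ycbcr_to_rgb(y, cb, cr):
    """Convert brightness and color values back into an RGB pixel."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


def average_color_in_pairs(pixels):
    """Keep each pixel's own brightness, but share one averaged color per pair."""
    result = []
    for i in range(0, len(pixels), 2):
        pair = [rgb_to_ycbcr(*p) for p in pixels[i:i + 2]]
        cb = sum(p[1] for p in pair) / len(pair)
        cr = sum(p[2] for p in pair) / len(pair)
        result.extend(ycbcr_to_rgb(p[0], cb, cr) for p in pair)
    return result


# Made-up sample row: two similar oranges, then two near-white pixels.
row = [(229, 168, 74), (220, 160, 90), (255, 255, 255), (250, 250, 250)]
approx = average_color_in_pairs(row)
for before, after in zip(row, approx):
    print(before, "->", after)
```

Each pair now needs four stored values (two brightness values plus one shared color) instead of six, and the original colors cannot be recovered exactly, which is what makes the approach lossy.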

To the right are an original, uncompressed picture of pebbles in a pond and a highly compressed JPEG of the same image. Can you tell which is which?

(Note: Google Sites might compress the two images with its own format, so use the links to view the original BMP and JPEG images.)

You can probably tell which is which, especially if you look for sharp edges or very shiny spots. But the compressed file uses 1/30 of the space used by the original, and you can still tell that it's a picture of rocks. So, for many purposes the compressed version would be good enough. Lossy algorithms usually let you control the degree of precision, and people generally select less extreme compression settings, so the compressed file looks much more like the original than it does in this example.

MP3, probably the file type you use most often for portable music, is a lossy compression format. It tends to emphasize high frequencies, so people accustomed to MP3 music can find uncompressed versions of the same music boomy (bassy).

Which is best?

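Both types of data compression exist because each is useful in certain circumstances:

A lossless compression algorithm is one in which no data are lost; the original data can be completely recovered. It is typically the better choice when quality or the ability to reconstruct the original exactly matters most.

A lossy compression algorithm is one in which some data are lost; the original data cannot be completely restored. It is typically the better choice when minimizing file size or transmission time matters most.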


Still Curious?

How do Snapchat filters work?

How does JPEG encoding work?

How are audio files digitized?

Analog data can be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals. The measured values are called samples.

Each sample's measured value is then converted into the bits used to store it. The number of samples measured per second is the sampling rate; the higher the rate, the better the quality.
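Sampling can be tried out on a simple mathematical "sound". The sketch below is an illustration under stated assumptions: the analog signal is a made-up 5 Hz sine wave with values between -1 and 1, and the function name sample_signal, the rates and the bit depths are all chosen for this example.

```python
import math


def sample_signal(signal, duration_s, sampling_rate_hz, bits_per_sample):
    """Measure signal(t) at regular intervals and store each value as a whole number."""
    levels = 2 ** bits_per_sample              # how many distinct values the bits allow
    samples = []
    for n in range(int(duration_s * sampling_rate_hz)):
        t = n / sampling_rate_hz               # time of the n-th measurement
        value = signal(t)                      # analog value between -1 and 1
        samples.append(round((value + 1) / 2 * (levels - 1)))
    return samples


def tone(t):
    """Made-up analog signal: a 5 Hz sine wave."""
    return math.sin(2 * math.pi * 5 * t)


coarse = sample_signal(tone, 1.0, 20, 3)     # 20 samples per second, 3 bits each
fine = sample_signal(tone, 1.0, 200, 8)      # 200 samples per second, 8 bits each

print(len(coarse), "samples x 3 bits =", len(coarse) * 3, "bits")
print(len(fine), "samples x 8 bits =", len(fine) * 8, "bits")
```

The finer version follows the wave much more closely, but it also takes many more bits to store, which is one reason audio is so often compressed.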