Image Compression

Behind the scenes when you're sending snaps

What is image compression?

Computers represent images as a series of bits that make up the pixels on the screen. The number of bits required to describe an image varies with its quality and size, but in any case, transferring an image means deciding how to send those bits from one location to another. This process, called compression, is what we will be exploring, in its two forms: lossy, wherein information is lost during the process, and lossless, where all information is retained during transfer.

Our Data: Essentially whatever we can get our hands on involving images. We started primarily with PNGs, since they are lossless and therefore closest to the "original" picture, giving us a clean starting point for both lossless and lossy compression.

In our project, we explored different methods of image compression. 


Navigate different parts of our project using the pages bar at the top of the page.

The Methods We Used

DSP Tools

Machine Learning

In class, we learned about the applications of digital signal processing and its connection with machine learning, which involves processing large amounts of data (signals) to generate a desired outcome. In our case, the autoencoder used a form of unsupervised learning to develop its ability to break down and recreate an image.
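The encode/compress/decode idea behind the autoencoder can be sketched with a single linear encoder and decoder pair trained by gradient descent. This is only an illustration with made-up stand-in data, not our actual network, which was a deeper nonlinear model:

```python
import numpy as np

# Minimal linear-autoencoder sketch (assumed toy data): learn to squeeze
# 16-number "patches" into 4-number codes and reconstruct them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                 # stand-in training patches

W_enc = rng.normal(scale=0.1, size=(16, 4))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(4, 16))    # decoder weights

def loss(X, W_enc, W_dec):
    # mean squared reconstruction error
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

loss_before = loss(X, W_enc, W_dec)
lr = 0.05
for _ in range(500):
    code = X @ W_enc                # compress: 16 numbers -> 4
    err = code @ W_dec - X          # reconstruction error
    # gradient descent on the reconstruction error
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
loss_after = loss(X, W_enc, W_dec)

print(code.shape)                   # each patch now stored as 4 numbers
print(loss_before, loss_after)     # reconstruction improves with training
```

The "unsupervised" part is visible here: the training target is the input itself, so no labels are needed.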

Image Processing 

Towards the end of the semester, we learned in lecture about the applications of digital signal processing in image processing (the 2D DFT). Particularly relevant was the demonstration of edge detection on a 2D matrix, where we could see the horizontal and vertical edges after applying the FFT. Similarly, our project used a frequency transform on an RGB matrix to move it into another basis.
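The edge-detection demonstration can be reproduced in a few lines with numpy's 2D FFT: suppress the low frequencies (smooth regions) and what remains after the inverse transform is concentrated at the edges. The test image and cutoff radius below are made up for illustration:

```python
import numpy as np

# Frequency-domain edge detection sketch: high-pass filter via the 2D FFT.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                 # white square on black background

F = np.fft.fftshift(np.fft.fft2(img))   # move zero frequency to the center
r = 4                                   # low-frequency cutoff (assumed)
cy, cx = 32, 32
F[cy - r:cy + r, cx - r:cx + r] = 0     # zero out the low frequencies
edges = np.abs(np.fft.ifft2(np.fft.ifftshift(F)))

print(edges[16, 32], edges[32, 32])     # border response vs. interior response
```

The response on the square's border is much stronger than in its smooth interior, which is exactly the horizontal/vertical edge effect shown in lecture.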

DCTs

A familiar function, the DCT is a close relative of the FFTs we learned about extensively in class, and its 2D version is applied similarly. We learned briefly in class how these frequency-basis transformations are applied to a 2D image, and this is precisely what we implemented in the JPEG and Frankenstein algorithms. Like the FFT, the DCT detects the prevalence of frequencies in the sample through a change of basis, a concept we learned early in the course.
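A small numpy-only sketch shows the 2D DCT applied to an 8x8 block, the block size JPEG uses. The example block (a smooth ramp) is made up; the point is energy compaction, which is what makes discarding coefficients after quantization viable:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2 / n)
    C[0] /= np.sqrt(2)
    return C

# Toy 8x8 block: a smooth horizontal ramp (assumed example data)
block = np.tile(np.linspace(0, 1, 8), (8, 1))
C = dct_matrix(8)
coeffs = C @ block @ C.T      # 2D DCT = transform rows, then columns

# Energy compaction: a handful of coefficients carry nearly all the energy
total = np.sum(coeffs ** 2)
top4 = np.sum(np.sort(coeffs.ravel() ** 2)[-4:])
print(top4 / total)           # close to 1 for smooth blocks
```

On natural images most blocks are smooth, so after the change of basis most coefficients are near zero and compress very well.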

Other Tools

One of the steps involved converting an RGB representation to a YCrCb representation, as well as a transformation function that changed the chroma distribution of an image.
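The color-space change can be sketched as a fixed linear transform per pixel; the coefficients below are the standard ITU-R BT.601 full-range ones commonly used in JPEG pipelines:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    # BT.601 full-range RGB -> YCbCr: luma plus two chroma channels
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b            # luma (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return np.stack([y, cb, cr], axis=-1)

pixel = np.array([[[255.0, 0.0, 0.0]]])   # a single pure-red pixel
ycc = rgb_to_ycbcr(pixel)
print(ycc)
```

Separating brightness from color matters because the eye is less sensitive to chroma, so the chroma channels can be stored more coarsely with little visible loss.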

Other DSP tools we used from outside the class material were Huffman encoding and run-length encoding, which are explored more in depth in their respective pages in PNG and JPEG.
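Run-length encoding is the simpler of the two and fits in a few lines; JPEG applies the same idea to the long runs of zero DCT coefficients left after quantization. The coefficient list below is a made-up example:

```python
# Run-length encoding sketch: replace repeated symbols with (symbol, count).
def rle_encode(data):
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((data[i], j - i))   # (symbol, run length)
        i = j
    return out

def rle_decode(pairs):
    return [sym for sym, n in pairs for _ in range(n)]

coeffs = [5, 0, 0, 0, 0, 0, 3, 0, 0, 1]
print(rle_encode(coeffs))   # [(5, 1), (0, 5), (3, 1), (0, 2), (1, 1)]
assert rle_decode(rle_encode(coeffs)) == coeffs   # lossless round trip
```

Huffman coding then assigns shorter bit patterns to the more common pairs, which is where the remaining statistical redundancy is squeezed out.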

We also studied further encoding algorithms, such as LZ77 and the steps that compose the Deflate algorithm, and gained surface-level experience with encoding schemes used by the other formats we tested with our metrics, for example block prediction in WebP.
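Python's standard-library zlib module wraps the same Deflate algorithm (LZ77 matching followed by Huffman coding) used inside PNG, so a quick round trip demonstrates the idea on repetitive data:

```python
import zlib

# Deflate round trip on highly repetitive stand-in data
raw = b"ababababab" * 100           # LZ77 thrives on repeated substrings
packed = zlib.compress(raw, level=9)
print(len(raw), len(packed))        # compressed size is far smaller
assert zlib.decompress(packed) == raw   # lossless: bytes come back exactly
```

Real image rows are less repetitive than this, which is why PNG first applies per-row prediction filters to expose redundancy before Deflate runs.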

We also explored machine learning to a level well beyond what we covered in class. In particular, we looked at HiFiC, a GAN-based approach to learned image compression and reconstruction, but were unfortunately unable to go as far as we would have liked with it due to resource limitations.