To quantify how well each algorithm works, we defined a set of objective metrics for evaluating and comparing compression algorithms.
Compression ratio is the ratio of the compressed file size to the original file size. A lower ratio is better.
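As a minimal sketch (the function name `compression_ratio` is our own, not taken from our evaluation scripts), this can be computed directly from the two file sizes:

```python
import os

def compression_ratio(original_path, compressed_path):
    """Ratio of compressed file size to original file size (lower is better)."""
    return os.path.getsize(compressed_path) / os.path.getsize(original_path)
```

For example, a 10 MB PPM file compressed to 300 KB gives a ratio of about 0.03.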
Mean squared error (MSE) measures similarity between images by averaging the squared differences in pixel intensity between corresponding pixels of the original and compressed images. A lower MSE is better; an MSE of 0 means the images are identical.
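A simple NumPy sketch of this metric (assuming both images are arrays of the same shape):

```python
import numpy as np

def mse(original, compressed):
    """Mean squared error between two same-shape images (0 means identical)."""
    original = original.astype(np.float64)    # avoid uint8 overflow on subtraction
    compressed = compressed.astype(np.float64)
    return np.mean((original - compressed) ** 2)
```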
PSNR measures the ratio between the maximum possible intensity value of an image and the distorting noise introduced by compression: PSNR = 20 log10(I_Max) - 10 log10(MSE), where I_Max is the highest possible pixel value (255 for 8-bit images). Note that PSNR is built on MSE: when comparing an image to itself, MSE is 0, so PSNR is infinite (or undefined). A higher PSNR score indicates better image quality. For typical compression algorithms on 8-bit images, we should expect values between 30 and 50 dB.
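Continuing the sketch, PSNR follows directly from MSE (the `i_max` default assumes 8-bit images):

```python
import numpy as np

def psnr(original, compressed, i_max=255.0):
    """PSNR in dB: 20*log10(I_Max) - 10*log10(MSE). Infinite for identical images."""
    err = np.mean((original.astype(np.float64) - compressed.astype(np.float64)) ** 2)
    if err == 0:
        return float("inf")  # identical images: MSE is 0, PSNR is unbounded
    return 20 * np.log10(i_max) - 10 * np.log10(err)
```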
SSIM is a perception-based model that measures image quality through luminance, contrast, and structural characteristics. The structural component captures the idea that pixels close to each other are strongly correlated and carry information about the structure of the image. The SSIM score ranges from -1 to 1, where 1 means the images are identical. Although it is difficult to attach meaning to SSIM scores from the numbers alone, the higher the score, the harder it is to distinguish the compressed image from the original.
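As a rough sketch only: standard SSIM averages the score over small sliding windows, but the core luminance/contrast/structure comparison can be shown on a single global window (the function name `ssim_global` and the single-window simplification are ours; the constants C1 and C2 are the usual stabilizers for 8-bit images):

```python
import numpy as np

def ssim_global(x, y, l_max=255.0):
    """Single-window SSIM over the whole image. Standard SSIM computes this
    over small sliding windows and averages; this global variant is a sketch."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * l_max) ** 2, (0.03 * l_max) ** 2  # stabilizing constants
    mu_x, mu_y = x.mean(), y.mean()                    # luminance terms
    var_x, var_y = x.var(), y.var()                    # contrast terms
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()          # structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

An identical pair scores 1; a structurally scrambled image with the same brightness and contrast scores well below 1, which is exactly the behavior MSE cannot capture.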
Variations of SSIM exist, such as Multi-Scale SSIM (MS-SSIM), which may perform better; we have not examined these variants in detail.
Runtime is the time it takes the compression (and decompression) algorithm to run. We do not consider this metric, as the algorithms run quickly enough that the computation time is unnoticeable in common use cases; it also varies with how different programs implement them.
To streamline the evaluation process, we created a set of MATLAB and Python functions that collect this data for us.
To measure the performance of the different algorithms, we use images from the following dataset: http://imagecompression.info/. This is a collection of 14 high-resolution, high-precision images gathered specifically for image compression evaluation, containing a diverse set of both real photographs and computer-generated images. We use this dataset to evaluate the above metrics for our algorithms of interest.
We chose to analyze the algorithms separately because 1) it is difficult to choose compression settings so that each image's compression ratio comes out similar across algorithms, since different algorithms expose different settings, and 2) algorithms with larger compression ratios (i.e., less compression) have an unfair advantage when comparing image quality. The "uncompressed" files we compare each algorithm against are in PPM format, which is relevant for the compression ratios.
Overall Summary
Average Compression Ratio: 0.03567492279335994
JPEG is the standard in lossy compression. It reduces our uncompressed files to less than 7% of their original size; on average, we reduced file size to ~3% of the original while keeping image quality high. Note, however, that the leaves image has a quite low SSIM score while also having one of the lowest compression ratios.
Overall Summary
Average Compression Ratio: 0.02860881921862743
WebP is an increasingly popular alternative to both JPEG and PNG, developed by Google, since it has both lossy and lossless forms. It is especially widely used on the web.
Overall Summary
Average Compression Ratio: 0.42365446850624366
JPEG2000 is an iteration of the original JPEG compression format, made with the intention of superseding it. Instead of the DCT (Discrete Cosine Transform), it uses a wavelet transform. We see from the results that, at least for the settings we used, JPEG2000 scored much better on the similarity metrics at the cost of a higher compression ratio compared to JPEG.
For the lossless algorithms, we mainly looked at the compression ratio. We looked at three formats: PNG, BMP, and JPEG2000; unlike in the lossy section, we use the lossless implementation of JPEG2000 here. The average compression ratio over our dataset is shown for each format in the graph below. Note that BMP's ratio of ~1.0 indicates that it is not actually compressed at all; it is included as a comparison against another uncompressed file format. Of the three we compared, JPEG2000 had the best average compression ratio, but PNG and JPEG2000 perform very closely. Looking at the compression ratios for specific images, we can see that they both struggle and perform well on the same images. We can directly compare the algorithms here because lossless reconstruction is perfect, so only the compression ratio differs.
PNG Average Compression Ratio: 0.456236928357626
BMP Average Compression Ratio: 1.0000279516135946
JPEG2000 Average Compression Ratio: 0.42365446850624366