Source: https://medium.com/@birla.deepak26/autoencoders-76bb49ae6a8f
An autoencoder is a neural network that attempts to reconstruct its input. It can be split into two independent models: an encoder and a decoder. The encoder compresses the input image into a smaller representation, and the decoder attempts to recover the image from that representation. We explore the potential of the autoencoder as a form of image compression. The goal is for the model to learn a generalizable form of encoding and decoding.
Source: https://cs.stanford.edu/~acoates/stl10/
The STL10 dataset is an easily available dataset provided by Stanford. We are able to download it directly from torchvision (pytorch). The dataset contains labeled images from the following 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck. It also contains a large number of unlabeled images, making it great for unsupervised learning. For our purposes, we use the 100,000 unlabeled images available from the dataset.
We use this dataset because it is easily available and contains thousands of images. The dataset is commonly used and is very trusted in the community.
We crop the images to 64x64. Due to resource limitations, we only train on 10,000 samples.
Source: https://www.compthree.com/blog/autoencoder/
This is a simple autoencoder architecture. It is basically just a few convolution and deconvolution blocks with ReLU activations in between. A sigmoid layer at the end of the decoder limits the output to between 0 and 1 so it can be displayed as an image. The structure can be seen in the summaries following this section and in our Google Colab document linked at the bottom.
Encoder:

Layer (type)          Output Shape         Param #
Conv2d-1              [-1, 12, 32, 32]         588
ReLU-2                [-1, 12, 32, 32]           0
Conv2d-3              [-1, 24, 16, 16]       4,632
ReLU-4                [-1, 24, 16, 16]           0
Conv2d-5              [-1, 48, 8, 8]        18,480
ReLU-6                [-1, 48, 8, 8]             0
Conv2d-7              [-1, 24, 4, 4]        18,456
ReLU-8                [-1, 24, 4, 4]             0

Decoder:

Layer (type)          Output Shape         Param #
ConvTranspose2d-1     [-1, 48, 8, 8]        18,480
ReLU-2                [-1, 48, 8, 8]             0
ConvTranspose2d-3     [-1, 24, 16, 16]      18,456
ReLU-4                [-1, 24, 16, 16]           0
ConvTranspose2d-5     [-1, 12, 32, 32]       4,620
ReLU-6                [-1, 12, 32, 32]           0
ConvTranspose2d-7     [-1, 3, 64, 64]          579
Sigmoid-8             [-1, 3, 64, 64]            0
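The shapes and parameter counts in the summaries are consistent with 4x4 kernels, stride 2, and padding 1 (for example, the first layer: 3 * 12 * 4 * 4 + 12 = 588 parameters, and 64x64 halved to 32x32). A PyTorch sketch of the architecture under that assumption:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Each conv halves the spatial resolution (kernel 4, stride 2, pad 1).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 12, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(12, 24, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(24, 48, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Conv2d(48, 24, 4, stride=2, padding=1),  # 8x8 -> 4x4
            nn.ReLU(),
        )
        # Each transposed conv doubles the resolution back up.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(24, 48, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(48, 24, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(24, 12, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(12, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),  # constrain outputs to [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```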
Adam optimizer
MSE loss
500 epochs
We trained our model on Google Colab; training took about 46 minutes.
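A minimal sketch of the training loop under these settings (Adam, MSE loss against the input, 500 epochs). The function and argument names here are ours, not from the original code, and the learning rate is an assumed default:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=500, lr=1e-3, device="cpu"):
    """Train an autoencoder to reconstruct its own input with MSE loss."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        for images, _ in loader:            # STL10 yields (image, label) pairs
            images = images.to(device)
            recon = model(images)
            loss = criterion(recon, images)  # target is the input itself
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```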
Output at 1 Epoch of Training
Output at 50 Epochs of training
Output at 500 Epochs of training
Our autoencoder does okay, but it is far less flexible than typical image compression algorithms and noticeably worse at decoding. We can only input 64x64 images into our model, a constraint that makes it particularly difficult to compare this approach directly with our other algorithms. Our reconstructed images are much lower quality than the originals and introduce many artifacts: they look much blurrier and grainier, and sometimes lose important details from the original image.
At the bottleneck, our image is represented as a 24x4x4 tensor (384 elements), down from the 3x64x64 input image (12,288 elements).
Assuming we save the bottleneck tensor as doubles (8 bytes each), our representation would be 3,072 bytes (384 * 8). Our raw images, at 1 byte per 8-bit channel value, are 12,288 bytes. Using these numbers, we get a compression ratio of 0.25 (the compressed representation is 25% the size of the raw image).
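Spelled out, the compression-ratio arithmetic is:

```python
bottleneck_elems = 24 * 4 * 4             # 384 values at the bottleneck
raw_elems = 3 * 64 * 64                   # 12,288 values in the input image
compressed_bytes = bottleneck_elems * 8   # stored as 8-byte doubles
raw_bytes = raw_elems * 1                 # 1 byte per 8-bit channel value
ratio = compressed_bytes / raw_bytes      # 3072 / 12288 = 0.25
```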
We could resolve the image size limitation by breaking images into blocks and passing each block through the model, but this would probably require finding a new dataset and retraining the model on larger images. We could also achieve a somewhat higher compression ratio with some kind of entropy coding, such as Huffman or arithmetic coding. Unfortunately, the autoencoder does not seem worth exploring further: the image quality is degraded too much, and the compression ratio is poor compared to standard compression algorithms. So, we have decided to end our exploration of autoencoders here. However, during our research into machine learning approaches to image compression, we learned about HIFIC, which uses GANs to extend the autoencoder and achieves much better compression results. See that page for more info.
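As a rough illustration of the entropy-coding idea (not something we implemented), one could quantize the bottleneck to 8 bits per value and then run it through DEFLATE, which includes Huffman coding, via Python's zlib. The random tensor here is a stand-in for a real bottleneck:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
bottleneck = rng.standard_normal((24, 4, 4))  # stand-in for a real bottleneck

# Quantize to 8 bits per value: 3,072 bytes of doubles -> 384 bytes (lossy).
lo, hi = bottleneck.min(), bottleneck.max()
q = np.round((bottleneck - lo) / (hi - lo) * 255).astype(np.uint8)

# Entropy-code the quantized bytes (DEFLATE = LZ77 + Huffman coding).
compressed = zlib.compress(q.tobytes(), level=9)
```

On a real bottleneck with a skewed value distribution, the entropy coder would shrink the quantized bytes further; uniform random data like this stand-in compresses poorly.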