Research

Autoencoders

A short description of autoencoders

Autoencoders are (often deep) neural networks that project data to and from a latent space of smaller dimensionality than the input data domain.

https://sites.google.com/site/alasdairnewson/research/autoencoders/autoencoder_illustration_2.png
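This encode/decode structure can be sketched as a minimal PyTorch model; the layer sizes and latent dimension below are illustrative assumptions, not the architectures used in the papers that follow:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Toy autoencoder: flattened 28x28 images -> 8-dim latent space.
    Sizes are hypothetical, chosen only to illustrate the structure."""

    def __init__(self, input_dim=784, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),   # projection into the latent space
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),    # reconstruction back in data space
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.randn(16, 784)                      # a batch of flattened images
model = Autoencoder()
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)       # standard reconstruction loss
```

Training then minimises the reconstruction loss, forcing the small latent code z to retain the information needed to rebuild the input.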

The main idea behind this is to use the compact and powerful latent space to understand, analyse and manipulate the input data at a much higher level than is possible in the data space. In the following projects, we study these architectures in detail in simple cases (images of shapes), and we propose new architectures and loss functions to create latent spaces with useful properties. More precisely, we are interested in the following questions:

  • How can we build autoencoders whose latent spaces have useful structural properties (independence of components, organisation of attributes in the latent space)?
  • Can we describe the precise mechanisms which allow autoencoders to encode and decode simple images?
  • How can we use autoencoder- or GAN-type networks to carry out image editing?

Our papers in this research area

PCAAE: Principal Component Analysis Autoencoder for organising the latent space of generative networks

Chi-Hieu Pham, Saïd Ladjal, Alasdair Newson

arXiv:2006.07827

Paper

Abstract

Autoencoders and generative models produce some of the most spectacular deep learning results to date. However, understanding and controlling the latent space of these models presents a considerable challenge. Drawing inspiration from principal component analysis and autoencoders, we propose the Principal Component Analysis Autoencoder (PCAAE). This is a novel autoencoder whose latent space verifies two properties. Firstly, the dimensions are organised in decreasing importance with respect to the data at hand. Secondly, the components of the latent space are statistically independent. We achieve this by progressively increasing the size of the latent space during training, and with a covariance loss applied to the latent codes. The resulting autoencoder produces a latent space which separates the intrinsic attributes of the data into different components, in a completely unsupervised manner. We also describe an extension of our approach to the case of powerful, pre-trained GANs. We show results both on synthetic examples of shapes and on a state-of-the-art GAN. For example, we are able to separate the color shade of hair and skin, the pose of faces, and the gender in the CelebA dataset, without accessing any labels. We compare the PCAAE with other state-of-the-art approaches, in particular with respect to the ability to disentangle attributes in the latent space. We hope that this approach will contribute to a better understanding of the intrinsic latent spaces of powerful deep generative models.
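The covariance loss mentioned in the abstract can be sketched as follows. This is an illustrative implementation of the general idea (penalising off-diagonal entries of the empirical covariance of a batch of latent codes); the exact formulation in the PCAAE paper may differ:

```python
import torch

def covariance_loss(z):
    """Penalise statistical dependence between latent components.

    z: batch of latent codes, shape (batch_size, latent_dim).
    Illustrative sketch; the precise loss in the paper may differ.
    """
    z_centred = z - z.mean(dim=0, keepdim=True)
    # Empirical covariance matrix of the latent codes.
    cov = z_centred.T @ z_centred / (z.shape[0] - 1)
    # Keep only the off-diagonal entries (cross-covariances).
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

# Independent Gaussian codes give a near-zero loss; identical
# (fully correlated) components give a large one.
z_indep = torch.randn(512, 4)
z_corr = torch.randn(512, 1).repeat(1, 4)
```

Minimising this term alongside the reconstruction loss pushes the latent components towards decorrelation, which is how independence of attributes can emerge without labels.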


Processing Simple Geometric Attributes with Autoencoders

Alasdair Newson, Andrés Almansa, Yann Gousseau, Saïd Ladjal

Journal of Mathematical Imaging and Vision, 2020

Paper

Abstract

Image synthesis is a core problem in modern deep learning, and many recent architectures such as autoencoders and Generative Adversarial Networks produce spectacular results on highly complex data, such as images of faces or landscapes. While these results open up a wide range of new, advanced synthesis applications, there is also a severe lack of theoretical understanding of how these networks work. This results in a wide range of practical problems, such as difficulties in training, the tendency to sample images with little or no variability, and generalisation problems. In this paper, we propose to analyse the ability of the simplest generative network, the autoencoder, to encode and decode two simple geometric attributes: size and position. We believe that, in order to understand more complicated tasks, it is necessary to first understand how these networks process simple attributes. For the first attribute, size, we analyse the case of images of centred disks with variable radii. We explain how the autoencoder projects these images to and from a latent space of the smallest possible dimension, a scalar. In particular, we describe both the encoding process and a closed-form solution to the decoding training problem in a network without biases, and show that during training, the network indeed finds this solution. We then investigate the best regularisation approaches which yield networks that generalise well. For the second attribute, position, we look at the encoding and decoding of Dirac delta functions, also known as "one-hot" vectors. We describe a hand-crafted filter that achieves encoding perfectly, and show that the network naturally finds this filter during training. We also show experimentally that the decoding can be achieved if the dataset is sampled in an appropriate manner.
We hope that the insights given here will provide better understanding of the precise mechanisms used by generative networks, and will ultimately contribute to producing more robust and generalisable networks.
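The centred-disk data of the first experiment is easy to reproduce; the sketch below generates such images (the image size and radius sampling are assumptions, not taken from the paper). As a sanity check on the claim that a single scalar suffices as latent code, note that a simple scalar statistic, the disk's pixel area, already increases monotonically with the radius:

```python
import numpy as np

def centred_disk(radius, size=32):
    """Binary image of a disk centred in a size x size grid.
    Illustrative data; image size and sampling are assumptions."""
    c = (size - 1) / 2.0
    y, x = np.ogrid[:size, :size]
    return ((x - c) ** 2 + (y - c) ** 2 <= radius ** 2).astype(np.float32)

# The pixel area grows strictly with the radius, so one scalar is
# enough to distinguish these images -- the minimal latent dimension
# analysed in the paper.
areas = [centred_disk(r).sum() for r in range(1, 15)]
```

An autoencoder with a one-dimensional latent space trained on such images must discover some monotone encoding of this kind; the paper characterises the encoding and decoding mechanisms precisely.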