Image editing is the process of modifying the visual, and often the semantic, content of an image: for example, adding or removing glasses from an image of a face. It is mostly used for artistic purposes (advertising, film post-production). These algorithms often rely on generative models or autoencoders. Here are some of the works I have carried out on this subject.
Here is a link to the work of Gwilherm Lesné, my former PhD student at Télécom Paris (Saïd Ladjal, Yann Gousseau, Alasdair Newson), on image editing using GANs and diffusion models:
Here is a list of publications linked to the work of Xu Yao, my former PhD student, in collaboration between InterDigital (Pierre Hellier) and Télécom Paris (Yann Gousseau, Alasdair Newson), on image editing with GANs:
Feature-style Encoder for Style-Based GAN Inversion, X. Yao, A. Newson, Y. Gousseau, P. Hellier, ECCV 2022, Paper
A Latent Transformer for Disentangled Face Editing in Images and Videos, X. Yao, A. Newson, Y. Gousseau, P. Hellier, ICCV, 2021, Paper, Code
Learning Non-Linear Disentangled Editing For StyleGAN, X. Yao, A. Newson, Y. Gousseau, P. Hellier, ICIP, 2021, Paper
High Resolution Face Age Editing, X. Yao, G. Puy, A. Newson, Y. Gousseau, P. Hellier, ICPR, 2020, Paper, Code
Autoencoders are (often deep) neural networks which project to and from a latent space which is of smaller dimensionality than the input data domain.
The main idea behind this is to use the compact and powerful latent space to understand, analyse and manipulate the input data at a much higher level of abstraction than in the data space. In the following projects, we study these architectures in detail in simple cases (images of shapes), and we propose new architectures and loss functions to create latent spaces with useful properties. More precisely, we are interested in the following questions:
How can we build autoencoders whose latent spaces have useful structural properties (independence of components, organisation of attributes in the latent space)?
Can we describe the precise mechanisms which allow autoencoders to encode and decode simple images?
How can we use autoencoder- or GAN-type networks to carry out image editing?
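To fix ideas, the projection to and from a lower-dimensional latent space can be sketched with a toy linear autoencoder (the weights below are random for illustration; a real autoencoder learns them by gradient descent and typically adds non-linearities, and the dimensions 64 and 2 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 64-dimensional "images" compressed to a 2-D latent space.
data_dim, latent_dim = 64, 2

# Linear encoder and decoder weights (random here; learned in practice).
W_enc = rng.normal(size=(latent_dim, data_dim)) / np.sqrt(data_dim)
W_dec = rng.normal(size=(data_dim, latent_dim)) / np.sqrt(latent_dim)

def encode(x):
    # Project the input down to the compact latent space.
    return W_enc @ x

def decode(z):
    # Map a latent code back to the data domain.
    return W_dec @ z

x = rng.normal(size=data_dim)     # a fake input "image"
z = encode(x)                     # latent code, much smaller than x
x_hat = decode(z)                 # reconstruction in the data domain
```

Training then amounts to minimising a reconstruction loss such as ||x - x_hat||^2 over a dataset; the questions above concern what structure the resulting latent space has, and how to impose more.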
PCAAE: Principal Component Analysis Autoencoder for organising the latent space of generative networks
This is the work of Chi-Hieu Pham (postdoc), supervised by Saïd Ladjal and Alasdair Newson. The goal is to create an autoencoder which mimics the behaviour of a Principal Component Analysis (PCA), and which can then be used for image editing. Here is an example of such editing:
The paper can be found here:
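One ingredient of the PCA-like behaviour is decorrelating the latent components. A minimal sketch of such a decorrelation penalty is given below; this is my own toy illustration of the general idea (a loss on the off-diagonal of the latent covariance), not the authors' implementation:

```python
import numpy as np

def covariance_penalty(z):
    """Sum of squared off-diagonal entries of the latent covariance.

    z is a (batch, latent_dim) array; the penalty is zero when the
    latent components are empirically decorrelated.
    """
    z = z - z.mean(axis=0, keepdims=True)
    cov = (z.T @ z) / (len(z) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum())

rng = np.random.default_rng(0)
z_indep = rng.normal(size=(1000, 3))          # decorrelated latent codes
z_dep = np.column_stack([z_indep[:, 0],
                         z_indep[:, 0] + 0.1 * z_indep[:, 1],
                         z_indep[:, 2]])      # strongly correlated codes

print(covariance_penalty(z_indep) < covariance_penalty(z_dep))  # True
```

Adding a term like this to the reconstruction loss encourages latent components that vary independently, which is what makes component-wise editing meaningful.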
Processing Simple Geometric Attributes with Autoencoders
Alasdair Newson, Andrés Almansa, Yann Gousseau, Saïd Ladjal
Journal of Mathematical Imaging and Vision, 2020
Abstract
Image synthesis is a core problem in modern deep learning, and many recent architectures such as autoencoders and Generative Adversarial Networks produce spectacular results on highly complex data, such as images of faces or landscapes. While these results open up a wide range of new, advanced synthesis applications, there is also a severe lack of theoretical understanding of how these networks work. This results in a wide range of practical problems, such as difficulties in training, the tendency to sample images with little or no variability, and generalisation problems. In this paper, we propose to analyse the ability of the simplest generative network, the autoencoder, to encode and decode two simple geometric attributes: size and position. We believe that, in order to understand more complicated tasks, it is necessary to first understand how these networks process simple attributes. For the first property, we analyse the case of images of centred disks with variable radii. We explain how the autoencoder projects these images to and from a latent space of smallest possible dimension, a scalar. In particular, we describe both the encoding process and a closed-form solution to the decoding training problem in a network without biases, and show that during training, the network indeed finds this solution. We then investigate the best regularisation approaches which yield networks that generalise well. For the second property, position, we look at the encoding and decoding of Dirac delta functions, also known as “one-hot” vectors. We describe a hand-crafted filter that achieves encoding perfectly, and show that the network naturally finds this filter during training. We also show experimentally that the decoding can be achieved if the dataset is sampled in an appropriate manner.
We hope that the insights given here will provide better understanding of the precise mechanisms used by generative networks, and will ultimately contribute to producing more robust and generalisable networks.
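To illustrate the flavour of the position-encoding result, here is a toy construction of my own (not the exact filter described in the paper): the position of a one-hot vector can be read off perfectly by a single inner product with a linear ramp, since the inner product of a one-hot vector at index i with a ramp is simply the ramp's value at i.

```python
import numpy as np

n = 32
ramp = np.arange(n, dtype=float)   # hand-crafted "filter": 0, 1, ..., n-1

def encode_position(x):
    # Inner product with the ramp extracts the index of the non-zero entry.
    return float(x @ ramp)

# Check that every one-hot vector is encoded to its own position.
for i in [0, 7, 31]:
    one_hot = np.zeros(n)
    one_hot[i] = 1.0
    assert encode_position(one_hot) == i
```

The interesting question studied in the paper is not whether such a filter exists, but whether gradient descent actually finds one, and whether the decoder can invert it on appropriately sampled data.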