Realistic Speech-Driven Animation with GANs

Konstantinos Vougioukas¹, Stavros Petridis¹,², Maja Pantic¹,²

¹Imperial College London

²Samsung AI Centre Cambridge

Speech-Driven Animation

We propose a temporal GAN capable of producing animated faces using only a still image of a person and an audio clip containing speech. The generated videos not only contain lip movements that are synchronized with the audio but also exhibit characteristic facial expressions such as blinks and brow raises. This extends our previous model by handling audio-visual synchronization and expression generation separately. Our improved model works on unseen "in-the-wild" faces and can capture the speaker's emotion and reflect it in the facial expressions.
