SVTS: Scalable Video-To-Speech Synthesis


Rodrigo Mira 1 Alexandros Haliassos 1 Stavros Petridis 1 Björn Schuller 1,2 Maja Pantic 1

1 Imperial College London 

2 ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg

Results Demo

(If you require the full set of test samples for comparison, please contact rs2517(at)ic.ac.uk.)

GRID - Seen Speakers

grid_seen.mp4

LRW

lrw_comparison.mp4

LRS3 - Seen Speakers

lrs3_seen.mp4

LRS3 - Unseen Speakers

lrs3_unseen.mp4

GRID - Unseen Speakers

grid_unseen.mp4

Vocoder Ablation (GRID - Seen Speakers)

vocoder_ablation.mp4

Loss Ablation (GRID - Seen Speakers)

loss_ablation.mp4