SVTS: Scalable Video-To-Speech Synthesis
Rodrigo Mira¹, Alexandros Haliassos¹, Stavros Petridis¹, Björn Schuller¹˒², Maja Pantic¹
¹ Imperial College London
² ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg
Results Demo
(If you require the full set of test samples for comparison, please contact rs2517(at)ic.ac.uk)
GRID - Seen Speakers
grid_seen.mp4
LRW
lrw_comparison.mp4
LRS3 - Seen Speakers
lrs3_seen.mp4
LRS3 - Unseen Speakers
lrs3_unseen.mp4
GRID - Unseen Speakers
grid_unseen.mp4
Vocoder Ablation (GRID - Seen Speakers)
vocoder_ablation.mp4
Loss Ablation (GRID - Seen Speakers)
loss_ablation.mp4