Linguistically Informed LRS-3 dataset

The linguistically informed LRS-3 dataset is derived from the LRS-3 dataset (Afouras et al.). To use it, you must first obtain the original LRS-3 dataset by requesting access to it. After that, contact yamank[at]iiitd[dot]ac[dot]in and devansh[dot]batra[at]midas[dot]center to get access to the linguistically informed LRS-3 dataset.

In this work, we release the following modifications of LRS-3:

  1. Random Corruption

  2. Prefix and Suffix Corruption

  3. Visemic Corruption

  4. Inter-word and Intra-word Corruption

  5. Corruption of POS Tags

The main motivations for these modifications of LRS-3 are:

  1. To compare speech generation networks on linguistically informed metrics rather than just on the basis of video quality.

  2. To judge how much language video speech reconstruction networks have learnt. For example, if a speaker omits a noun, is the network able to recognize and correct the omission, or does it produce an arbitrary sequence of frames that merely achieves the lowest mean squared error?

  3. To find out whether a speech reconstruction network has any underlying biases or bugs (for example, a bias against a particular viseme).

If you would like to collaborate on expanding the current set of linguistic metrics, contact yamank[at]iiitd[dot]ac[dot]in or devansh[dot]batra[at]midas[dot]center.

The data include excerpts of videos obtained from the TED YouTube channel. Use of this content must respect the TED terms of use and the Creative Commons BY-NC-ND 4.0 license.

Please cite the following if you make use of the dataset.


[1] A.N. Mathur, D. Batra, Y. Kumar, R.R. Shah, R. Zimmermann, C. Chen

LiFi: Towards Linguistically Informed Frame Interpolation (ICASSP-2021)

arXiv preprint arXiv:2010.16078
