Multilingual streaming TTS with neural codecs for Indian languages

LIMMITS'25

Challenge overview

About the LIMMITS'25 Challenge

As part of the challenge, 80 hours of TTS data are released in each of Indian English and Kannada, along with 40 hours in Gujarati and 80 hours in Bhojpuri. Each language has one male and one female speaker, giving an 8-speaker multilingual challenge dataset. The TTS corpora in these languages are being built as part of the SYSPIN project at SPIRE Lab, Indian Institute of Science (IISc), Bangalore, India.

We present an opportunity for researchers to contribute to the development of streaming and neural codec-based TTS systems. Recent advances in conversational AI models, and their growing popularity, create demand for real-time, multilingual, and adaptable speech generation. Applications such as Large Language Models (LLMs) require low-latency streaming TTS systems. Further, recent neural codec-based TTS systems have achieved state-of-the-art (SOTA) performance. These codecs offer compact representations of speech that enable efficient transmission and storage. Additionally, various speech attributes can be encoded in neural codecs, allowing high-quality and controllable speech synthesis.

About SYSPIN

SYnthesizing SPeech in INdian languages (SYSPIN) is an initiative to develop large open-source text-to-speech (TTS) corpora and TTS models in nine Indian languages in the domains of agriculture and finance. The nine languages considered for this project are Hindi, Bengali, Marathi, Telugu, Bhojpuri, Kannada, Magahi, Chhattisgarhi, and Maithili.

A majority of the population in the country is still unable to use the full range of technological services because of language and literacy constraints. SYSPIN helps lower these barriers to voice-based technologies and creates a potential market for tech innovators and social entrepreneurs.

The output of this project will allow local innovation in emerging markets to develop products and services serving illiterate Indians and rural poor populations in their own medium of engagement with technology. The TTS corpus will be a unique resource for developing assistive technologies for people with speech and visual disabilities. The proposed 720 hours of open-source TTS data will open up opportunities for academic and industrial research.

More about SYSPIN: https://syspin.iisc.ac.in/

Challenge Timeline

Note: The dates given apply to any time zone across the world.

Registration opens - August 2nd, 2024

Dataset shared - August 16th, 2024

Challenge submission opens - November 15th, 2024

Challenge submission closes - November 19th, 2024

Results announced - December 5th, 2024

Paper submission deadline - December 9th, 2024

NOTE: The intellectual property (IP), i.e. the code and the models shared or submitted, is not transferred to the challenge organisers; the participants remain the owners of their code and models. When the code is made publicly available, an appropriate license should be added.
