1111 Hours Hindi ASR Challenge 2022

A challenge on Automatic Speech Recognition (ASR) for Hindi is being organized as part of INTERSPEECH 2022, using spontaneous telephone speech recordings collected by the social technology enterprise Gram Vaani. The regional variations of Hindi, together with the spontaneity of the speech, natural background noise, and transcriptions of varying accuracy due to crowdsourcing, make it a unique corpus for automatic recognition of spontaneous telephone speech.

Challenge overview

Recent advances in speech technology have shown that ASR systems can perform on par with humans. Building a good ASR system requires large amounts of training data and high-end computational resources.

However, when it comes to Indian languages, not everyone has access to these resources, especially academic institutions and startups. As part of this challenge, we will be releasing telephone-quality speech data in Hindi. Everyone who participates in the challenge will then be free to use this data for research purposes.

As part of INTERSPEECH 2022, a special session is being organized on low-resource ASR development. More details can be found at this link:

https://www.nist.gov/itl/iad/mig/low-resource-asr-development-special-session-interspeech-2022

What makes this proposed session special?

Since this will be a focused challenge in which all participants build systems using the data released as part of the challenge, a special session is more appropriate than including it in the main conference.

Because of the nature and type of the data used in the challenge, this special session will encourage collaboration between speech researchers and experts in languages and linguistics.

We will be releasing a baseline recipe, which will keep the barrier to entry low and encourage submissions from research groups all over the world.

About Gram Vaani

Gram Vaani operates several voice-based participatory media platforms across the country. These platforms use IVR (Interactive Voice Response) as the primary channel for interaction: people call a unique phone number publicized by Gram Vaani, and the IVR server then disconnects the call and automatically calls the person back, making the call free for the caller. During this call, users can record a voice message that they want to share, or listen to voice messages left by other users. These voice messages span a number of domains: hyperlocal news reported by citizen journalists, questions on agriculture or health, grievances related to access to social entitlements, and also folk songs and poems. When a voice message is received, Gram Vaani's content moderators review it; if it is deemed acceptable, the message is published on the platform, where other users can hear it, add comments or replies, or contribute their own messages. These voice recordings by thousands of Gram Vaani users, together with the corresponding transcriptions, make up the ASR dataset underlying this challenge.

More about Gram Vaani: https://gramvaani.org/

To understand the platform better, you can also refer to the following papers describing the operational model, the use of the platform for social accountability and behavior change communication, and a discussion of the moderation policies.

Important Dates

  • Release of training data: 1 February

  • Release of evaluation data: 12 March

  • Last date to upload evaluation results: 15 April

  • Announcement of winner: 30 April

Note: If you are submitting a paper draft to the INTERSPEECH 2022 special session and wish to include evaluation results, our turnaround time for reporting performance is 48-72 hours.


Registrant locations around the world