Submission Guidelines
The challenge comprises four tracks, each evaluated on both read and spontaneous speech test sets. Participants are welcome to submit results for any subset of these tracks. All submissions must be made via the web interface using .tsv files that strictly follow one of the accepted formats described below.
Important:
Submissions must not mix different formats in a single file.
Each uttid (utterance ID) must match the IDs in the provided RESPIN test sets.
Ensure correct labelling (LID/DID) where applicable.
Accepted File Formats with Examples
Case 1 – ASR + Language Identification (LID)
Each line must contain: <uttid> <tab> [lid] <tab> <decoded_text>
Example:
281474977512428 [ch] बायोगैस टन से बनत बाटे
281474977512441 [mt] दौरी छिट्टा मड़ई बनावे में करल जाला
281474977514000 [mg] ई कुल कब से हो सकला
Case 2 – ASR + Language + Dialect Identification (LID + DID)
Each line must contain: <uttid> <tab> [lid_did] <tab> <decoded_text>
Example:
281474977512428 [ch_D1] बायोगैस टन से बनत बाटे
281474977512441 [mt_D2] दौरी छिट्टा मड़ई बनावे में करल जाला
281474977514000 [mg_D3] ई कुल कब से हो सकला
Case 3 – ASR Only
Each line must contain: <uttid> <tab> <decoded_text>
Example:
281474977512428 बायोगैस टन से बनत बाटे
281474977512441 दौरी छिट्टा मड़ई बनावे में करल जाला
281474977514000 ई कुल कब से हो सकला
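Since files mixing the three formats will be rejected, it can help to check a submission before uploading. The sketch below (an illustration, not an official tool) classifies each tab-separated line as Case 1, 2, or 3 and flags mixed files; the label patterns are inferred from the examples above, so extend them for the full set of language codes in your tracks.

```python
import re

# Assumed label patterns, based on the examples above ([ch], [ch_D1]).
# Adjust the alternations for the full RESPIN language/dialect code list.
LID_RE = re.compile(r"^\[[a-z]{2}\]$")           # Case 1, e.g. [ch]
LID_DID_RE = re.compile(r"^\[[a-z]{2}_D\d+\]$")  # Case 2, e.g. [ch_D1]

def classify_line(line: str) -> str:
    """Return 'lid', 'lid_did', or 'asr' for a valid line; raise otherwise."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2:
        return "asr"                    # Case 3: <uttid> <tab> <decoded_text>
    if len(fields) == 3:
        if LID_RE.match(fields[1]):
            return "lid"                # Case 1: <uttid> <tab> [lid] <tab> <text>
        if LID_DID_RE.match(fields[1]):
            return "lid_did"            # Case 2: <uttid> <tab> [lid_did] <tab> <text>
    raise ValueError(f"malformed line: {line!r}")

def validate_file(path: str) -> str:
    """Check that every line in a .tsv follows exactly one accepted format."""
    kinds = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            kinds.add(classify_line(line))
    if len(kinds) != 1:
        raise ValueError(f"formats must not be mixed; found: {sorted(kinds)}")
    return kinds.pop()
```

Note that the fields are separated by actual tab characters, even though the examples above render them as spaces.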
Qn 1) How can I access the dataset?
Ans) The dataset is hosted here: https://ee.iisc.ac.in/madasrdataset/
Qn 2) Will there be separate evaluations for all languages for the test set?
Ans) No, the test-set evaluation will be combined across languages. Participants need to build multilingual speech recognizers; there will be no separate evaluation for each language.
Qn 3) I see that in the train and dev data, there is speaker and dialect information about each utterance. Will this also be provided for test data?
Ans) No. Language, dialect, speaker, and sentence IDs will not be provided for test-set utterances.
Qn 4) Is using pretrained models (such as wav2vec) allowed in Track 1, or only in Track 3 & 4?
Ans) No. Using pretrained models involves leveraging external acoustic features, so it is allowed only in tracks 3 and 4.
Qn 5) How will the evaluation be performed?
Ans) Jiwer (https://pypi.org/project/jiwer/) will be used to compute test set level WER/CER.
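For intuition about the metric, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below re-implements that computation for a single sentence pair; it is not the official scorer, so use jiwer itself to reproduce challenge results.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count.
    A minimal illustration of what jiwer.wer computes for one pair."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution / match
    return d[len(hyp)] / len(ref)
```

CER is computed the same way at the character level.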
Qn 6) Regarding the hypothesis files, is it fine for them to contain [extra symbols] in the text? Will these symbols be ignored during evaluation, or should we preprocess the files before submission and remove any [extra symbols]?
Ans) We will not preprocess any text from the submitted hypothesis files. Extra symbols will not be ignored and will count against your WER/CER, so remove them before submission.
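If your decoder emits bracketed tags in the text field, one way to clean them is shown below. This assumes "extra symbols" means bracketed tokens such as [noise]; adapt the pattern to whatever your decoder actually produces.

```python
import re

def strip_extra_symbols(text: str) -> str:
    """Remove bracketed tokens (assumed form: [tag]) from decoded text
    and collapse any resulting runs of whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"\[[^\]]*\]", "", text)).strip()
```

Apply this only to the decoded-text field, not to the [lid] / [lid_did] label field required in Cases 1 and 2.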
Qn 7) If we utilize a multilingual (only acoustic model) trained only on the challenge data, would our submission still be considered under track 1?
Ans) Yes. You may use the challenge data to build multilingual models for tracks 1 and 2.
Qn 8) Is there any requirement regarding the paper submission?
Ans) Your submission will be treated as a regular paper and will go through the normal review process.
Note: If you need any clarification on models/datasets allowed for specific tracks, contact the organizers.