Objective metric & filtering
We perform an objective evaluation of all submissions with a set of speech recognition systems. Submissions are ranked by Character Error Rate (CER), and the top 10 proceed to subjective evaluation. The ASRs to be used are listed below. We chose open-source ASR models (each with <30% WER on its own test set) hosted on Hugging Face so that participants can use them to gauge their own performance; a usage sketch follows the model list below. More models may be added in the coming days. The objective evaluation script is shared through the challenge GitHub repository. Note that we run multiple ASRs on each utterance, keep the transcription with the lowest CER, and compute the corpus CER from the selected transcripts. This score is computed over all evaluation sentences.
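The authoritative scoring code is the script in the challenge GitHub repository; the following is only a minimal sketch of the multi-ASR selection described above, assuming corpus CER is defined as total character edit distance over total reference characters. All function names here are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences (DP, two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_hypothesis(ref: str, hyps: list[str]) -> tuple[str, int]:
    """Pick the ASR transcript with the fewest edits (hence lowest CER) vs. the reference."""
    return min(((h, edit_distance(ref, h)) for h in hyps),
               key=lambda pair: pair[1])


def corpus_cer(refs: list[str], hyps_per_utt: list[list[str]]) -> float:
    """Corpus CER: total edits of the selected transcripts over total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyps in zip(refs, hyps_per_utt):
        _, edits = best_hypothesis(ref, hyps)
        total_edits += edits
        total_chars += len(ref)
    return total_edits / total_chars


# Example: the second ASR's transcript matches the reference, so it is
# selected per utterance and the corpus CER is 0.0.
refs = ["नमस्ते दुनिया"]
hyps = [["नमस्ते दुनया", "नमस्ते दुनिया"]]
print(corpus_cer(refs, hyps))
```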
Hindi ASR
IndicWav2Vec wav2vec2 hi finetuned - https://huggingface.co/ai4bharat/indicwav2vec-hindi
Vakyansh wav2vec2 hi finetuned - https://huggingface.co/Harveenchadha/vakyansh-wav2vec2-hindi-him-4200
Telugu ASR
CSTD Telugu espnet joint attn-conformer - https://huggingface.co/viks66/CSTD_Telugu_ASR
Vakyansh wav2vec2 te finetuned - https://huggingface.co/Harveenchadha/vakyansh-wav2vec2-telugu-tem-100
Marathi ASR
Hugging Face XLSR finetuned for Hindi and Marathi - https://huggingface.co/tanmaylaud/wav2vec2-large-xlsr-hindi-marathi
Hugging Face XLSR finetuned on openslr64 Marathi - https://huggingface.co/sumedh/wav2vec2-large-xlsr-marathi
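Any of the listed checkpoints can be tried locally through the transformers ASR pipeline. Below is a minimal sketch, assuming the transformers and torch packages are installed; the Hindi IndicWav2Vec checkpoint and the audio path are placeholders you would swap for your own.

```python
from transformers import pipeline

# Load one of the listed checkpoints (weights are downloaded on first use).
asr = pipeline("automatic-speech-recognition",
               model="ai4bharat/indicwav2vec-hindi")

# "sample.wav" is a placeholder path; 16 kHz mono audio is the usual
# expectation for wav2vec2-style models.
result = asr("sample.wav")
print(result["text"])
```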