Frequently Asked Questions
General Information Regarding the Challenge
1. How do I participate in the challenge?
Ans) Enroll by registering at this link:
2. Whom should we contact with questions about the dataset, tracks, or submission?
Ans) For any queries, you can email us at
3. How do we access the challenge database?
Ans) Click here to download the data.
4. Are there any eligibility criteria for applying?
Ans) No, anyone can apply.
5. If I register as a single individual, can I change to a team later?
Ans) No
6. Can we add more team members later?
Ans) Yes, but you must register afresh with all your team members.
7. Any limit on the number of team members?
Ans) No
8. How many teams are allowed from an organization?
Ans) There is no limit, but a single person must not be part of two teams.
9. When will the track 2 and track 3 few-shot training samples be shared?
Ans) The samples will be shared by the end of September 2023.
10. When will the reference files for inference be shared?
Ans) The reference files will be shared once evaluation begins.
Regarding Challenge Tracks
1. Who is eligible to submit a 2-page paper and present it at ICASSP-2024?
Ans) The challenge organizers will invite the top 5 ranked teams on the leaderboard to submit the 2-page paper.
2. Can participants use external data to train the model?
Ans) External data is not allowed for track 1. Publicly available data may be used for tracks 2 and 3. Note that the target speakers must not be used for training in any track. These include the Hindi and Kannada speakers from the IndicTTS corpus and 3 speakers from VCTK (248, 294, 326).
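As an illustration of the speaker exclusion above, here is a minimal Python sketch that drops the three listed VCTK speakers from a list of training files. It assumes the usual VCTK file naming (for example p248_001.wav); the function names are hypothetical and not part of any challenge tooling. The IndicTTS Hindi and Kannada speakers are not handled here because their file layout depends on how the corpus was downloaded, so they would need a similar check of your own.

```python
import re

# VCTK target speakers named in the answer above; they must not be used for training.
EXCLUDED_VCTK_SPEAKERS = {"248", "294", "326"}


def is_excluded(path: str) -> bool:
    """Return True if the file belongs to an excluded VCTK speaker.

    Assumption: files follow the usual VCTK naming, e.g. 'p248/p248_001.wav'.
    """
    match = re.search(r"p(\d{3})_\d+", path)
    return bool(match) and match.group(1) in EXCLUDED_VCTK_SPEAKERS


def filter_training_files(paths):
    """Keep only the files that do not belong to an excluded speaker."""
    return [p for p in paths if not is_excluded(p)]
```

Running filter_training_files over your full file list before building the training manifest keeps the check in one place.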
3. Can we use our own vocoders in all tracks?
Ans) Yes. Any vocoder can be used.
4. Can we use pretrained TTS models in all tracks?
Ans) Pretrained TTS models may be used only in track 2 and track 3. Additionally, the pretrained TTS model must not be trained on the target speakers.
5. Should we use all 560 hours of data shared as part of the challenge?
Ans) No. The duration used per speaker is up to the participants, as long as it is greater than zero.
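One possible way to choose a per-speaker subset is sketched below in Python. It reads the answer above as "every speaker contributes some data, and the amount is your choice"; the tuple layout, the function name, and the budget parameter are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def select_per_speaker(utterances, max_seconds_per_speaker):
    """Pick utterances per speaker until a duration budget is reached.

    `utterances` is an iterable of (speaker_id, utterance_id, duration_seconds).
    The first usable utterance of each speaker is always kept, even if it
    exceeds the budget, so every speaker ends up with a duration above zero.
    """
    used = defaultdict(float)
    selected = []
    for speaker, utt_id, duration in utterances:
        if duration <= 0:
            continue  # skip empty or invalid entries
        is_first = used[speaker] == 0.0
        if is_first or used[speaker] + duration <= max_seconds_per_speaker:
            used[speaker] += duration
            selected.append((speaker, utt_id, duration))
    return selected, dict(used)
```

The second return value gives the total duration kept per speaker, which makes it easy to confirm that no speaker was left at zero.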
6. Should we train multi-lingual models or mono-lingual models for the required target speaker combination?
Ans) We expect all submissions to be multi-lingual models. The multi-lingual models can be fine-tuned separately for each target speaker in the few-shot tracks (1 and 2).
Regarding Challenge Evaluation
1. Will the synthesised audio be resampled for subjective evaluation?
Ans) Yes. All subjective evaluation will be done on 16 kHz audio. The shared samples will be resampled using SoX.
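To preview how your audio sounds at 16 kHz before submitting, a small Python wrapper around the sox command line could look like the sketch below. The exact SoX options the organizers use are not stated, so the plain "-r 16000" output-rate option here is an assumption, and the function names are hypothetical.

```python
import subprocess


def build_sox_command(src, dst, rate_hz=16000):
    """Build a SoX command line that writes `dst` at `rate_hz` samples per second."""
    # Placing -r before the output file sets the output sample rate.
    return ["sox", src, "-r", str(rate_hz), dst]


def resample(src, dst, rate_hz=16000):
    """Resample one file; requires the `sox` executable to be on PATH."""
    subprocess.run(build_sox_command(src, dst, rate_hz), check=True)
```

For example, resample("synth_48k.wav", "synth_16k.wav") would run sox synth_48k.wav -r 16000 synth_16k.wav, where both file names are placeholders.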