Challenge Rules

1) The target speakers for all tracks are as follows:

3 unseen Hindi speakers (F - IndicTTS - Indian, M - IndicTTS - Indian, F - SPIRE - Indian)
3 unseen Kannada speakers (F - IndicTTS - Indian, M - IndicTTS - Indian, M - SPIRE - Indian)
3 speakers from VCTK - 248 (F - Indian), 294 (F - English), 326 (M - Australian)

2) The target speakers must not be used for base model training. The Hindi and Kannada speakers from the IndicTTS corpus and the 3 VCTK speakers 225, 294, and 326 must not be used for training.

3) For Tracks 1 and 2, 5 minutes of audio will be provided for each target speaker for few-shot training.

4) For Track 3, no data from the target speakers may be used for training.

5) External pretrained TTS models or vocoders are allowed only for Tracks 2 and 3. For Track 3, the pretrained models must not have been trained on the target speakers.

6) Pretrained speaker embeddings of any type are allowed for all three tracks.

7) Any model architecture is allowed: 2-stage, 1-stage, classical approaches, etc.

8) Auxiliary loss objectives of any type are allowed, such as an ASR-based penalty or a speaker-similarity loss.

9) We will share 1 reference file for each target speaker for inference.
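
As a rough illustration of how rules 2 to 6 might be checked in a training pipeline, the Python sketch below encodes the per-track constraints and filters held-out speakers out of a base-model training list. It is not official challenge tooling: the names `TrackRules`, `TRACKS`, and `filter_base_training_set`, the `speaker_id` field, and the speaker IDs in the demo are all hypothetical, and the actual held-out speaker list should be taken from the rules above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackRules:
    """Per-track constraints, paraphrased from rules 3 to 6 (hypothetical structure)."""
    few_shot_minutes: int               # minutes of target-speaker audio provided for training
    external_pretrained_tts: bool       # external pretrained TTS models / vocoders allowed?
    pretrained_speaker_embeddings: bool  # pretrained speaker embeddings allowed?


# Track number -> constraints, following the rules as written.
TRACKS = {
    1: TrackRules(few_shot_minutes=5, external_pretrained_tts=False,
                  pretrained_speaker_embeddings=True),
    2: TrackRules(few_shot_minutes=5, external_pretrained_tts=True,
                  pretrained_speaker_embeddings=True),
    # Track 3: no target-speaker data for training; any external model
    # must also not have been trained on the target speakers.
    3: TrackRules(few_shot_minutes=0, external_pretrained_tts=True,
                  pretrained_speaker_embeddings=True),
}


def filter_base_training_set(utterances, held_out_speakers):
    """Return only the utterances whose speaker is not held out (rule 2).

    `utterances` is assumed to be a list of dicts with a "speaker_id" key.
    """
    held_out = set(held_out_speakers)
    return [u for u in utterances if u["speaker_id"] not in held_out]


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    demo = [
        {"speaker_id": "train_spk_01", "path": "a.wav"},
        {"speaker_id": "target_spk_01", "path": "b.wav"},
    ]
    kept = filter_base_training_set(demo, {"target_spk_01"})
    print([u["path"] for u in kept])
```

Filtering by speaker ID before any training step keeps the held-out check in one place, so the same list can be reused when verifying that an external pretrained model for Track 3 did not see the target speakers.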

NOTE: The intellectual property (IP) is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).
