General rules:
The final API submission deadline is January 27, 2023. However, each team must submit a trial version of the API (for at least one track) on or before January 20, 2023. (This lets the organizers check whether they encounter challenges in executing the shared API before the final submission day.)
If required, the challenge organizers may also ask for the code base. However, the intellectual property (IP) is not transferred to the challenge organizers; i.e., if the code is shared/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).
Track-wise list of things to submit
========================================================================================================
Track 1:
Things to share: The model API, Summary doc (file IDs)
Track 2:
Things to share: The model API, Summary doc (file IDs, breakdown of the parameters)
Track 3:
Things to share: The model API, Summary doc (file IDs, breakdown of the parameters)
========================================================================================================
Specifications for the Model API:
The API should accept 3 keys -
text, spk, lang
Input text to TTS will be passed through the “text” key.
The “text” will contain Marathi, Hindi, and Telugu characters in their respective scripts. The Unicode values of the expected text are listed on the challenge GitHub repository - https://github.com/bloodraven66/ICASSP_LIMMITS23/blob/main/evaluation/symbols.json
The target speaker id will be passed through “spk”
The speaker id mapping is as follows -
Marathi Male - mr_m
Marathi Female - mr_f
Telugu Male - te_m
Telugu Female - te_f
Hindi Male - hi_m
Hindi Female - hi_f
The language id mappings are as follows -
Marathi - mr
Telugu - te
Hindi - hi
The input from these keys will be used to infer from the trained TTS models (note that using language id is optional).
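The key schema above can be sketched as a small validation helper. This is an illustrative sketch only - the function name, error messages, and validation behavior are assumptions, not part of the official spec; the speaker and language id sets are the ones listed above.

```python
# Hypothetical request validation for the challenge API schema:
# required "text" and "spk" keys, optional "lang" key.
SPEAKER_IDS = {"mr_m", "mr_f", "te_m", "te_f", "hi_m", "hi_f"}
LANGUAGE_IDS = {"mr", "te", "hi"}

def validate_request(payload: dict) -> dict:
    """Return a normalized payload or raise ValueError on bad input."""
    text = payload.get("text")
    if not text:
        raise ValueError("'text' key is required and must be non-empty")
    spk = payload.get("spk")
    if spk not in SPEAKER_IDS:
        raise ValueError(f"unknown speaker id: {spk!r}")
    lang = payload.get("lang")  # optional, per the rules above
    if lang is not None and lang not in LANGUAGE_IDS:
        raise ValueError(f"unknown language id: {lang!r}")
    return {"text": text, "spk": spk, "lang": lang}
```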
The REST API will be called one utterance at a time, i.e., no batch input. The synthesized waveform should be returned as a .wav file with a sampling rate of 22050 Hz. For reference, check the GitHub repository for the baselines.
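Packaging the synthesized samples as a 22050 Hz mono .wav can be done with the standard library alone; the sketch below assumes 16-bit PCM and float samples in [-1, 1], which the spec does not mandate - real submissions may equally use soundfile or scipy.

```python
# Hypothetical sketch: encode float samples as 16-bit mono WAV bytes
# at the required 22050 Hz sampling rate, stdlib only.
import io
import math
import struct
import wave

def to_wav_bytes(samples, sample_rate=22050):
    """Encode float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)
    return buf.getvalue()

# Example: 0.1 s of a 440 Hz tone as placeholder "synthesized" audio.
tone = [math.sin(2 * math.pi * 440 * t / 22050) for t in range(2205)]
wav_bytes = to_wav_bytes(tone)
```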
An example for the API to be used is available at https://docs.google.com/document/d/1CBqCJ5p5AX7ryxYZNZ1YLDFx337JDF1xUil31hY8CpU/edit?usp=sharing
Specifications for the Summary Doc
A Summary Doc is also needed which describes the file IDs and model parameters so that we can replicate the results if the need arises. For Tracks 2 and 3, a layer-wise breakdown of the parameters of the final model is also mandatory. Zip all the files for all the tracks you are submitting to, and upload the archive to the final submission form.
Track 1
List of file ids - the list of file ids used for the train and dev sets. Add the ids for all languages. Example -
https://github.com/bloodraven66/ICASSP_LIMMITS23/blob/main/evaluation/track1_fileids_dev.txt
https://github.com/bloodraven66/ICASSP_LIMMITS23/blob/main/evaluation/track1_fileids_train.txt
Note that the duration per speaker from the raw audio should not exceed 5 hours.
Ensure that your experiments are reproducible - fixed seeds, logs (Weights & Biases, TensorBoard, etc.).
Track 2
You will share a file containing the summary of the effective model parameters used. If the effective number of parameters varies per input, report the average across your dev set.
Example -
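One way to produce the layer-wise breakdown is to tabulate parameter counts per named layer. The layer names and shapes below are invented for illustration; with a PyTorch model you would iterate model.named_parameters() instead of a hand-written dict.

```python
# Hypothetical layer-wise parameter breakdown for the Summary Doc.
# Layer names and shapes are illustrative, not from any real model.
from math import prod

layer_shapes = {
    "encoder.embedding": (100, 256),
    "encoder.lstm.weight": (1024, 256),
    "decoder.linear": (256, 80),
}

def parameter_breakdown(shapes):
    """Map each layer name to its parameter count, plus a 'total' row."""
    rows = {name: prod(shape) for name, shape in shapes.items()}
    rows["total"] = sum(rows.values())
    return rows

for name, count in parameter_breakdown(layer_shapes).items():
    print(f"{name}\t{count}")
```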
Track 3
You will share both the file id lists and the parameter count files.