Task 2: Language Diarization

The second task of the MERLIon CCS challenge is language diarization. During development, systems are provided with audio recordings annotated with ground-truth timestamps and language labels. During evaluation, only audio is provided. The development and evaluation datasets consist of discrete audio recordings from the Talk Together Study. Timestamp information from the Task 1 (Language Identification) evaluation dataset is unsuitable for Task 2 (Language Diarization).

There are open and closed tracks for this task, each of which restricts the training data that may be used. For more information, please see Datasets.

Participation in the closed track for Task 1 (Language Identification) is compulsory for all teams participating in the challenge, while participation in any track for Task 2 (Language Diarization) is optional. 

Scoring

The target languages to be evaluated in the challenge are English and Mandarin. Other languages that may appear in the recordings will not be evaluated. The primary evaluation metric is the total language diarization error rate, while the secondary evaluation metrics are the individual English and Mandarin language error rates. For more information on the evaluation metrics and guidelines, please refer to the Evaluation Plan.

The scoring script for generating these metrics is available on the GitHub repository.
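To give a sense of how a duration-based language error rate can be computed, here is a minimal sketch that compares reference and hypothesis language turns at a fixed frame resolution. It is illustrative only: the function names, the 10 ms frame size, and the per-language error definition are assumptions, and the authoritative metric definitions are those in the Evaluation Plan and the official scoring script on GitHub.

```python
# Illustrative sketch only; may differ from the challenge's exact scoring rules.

def to_frames(segments, total_ms, frame_ms=10):
    """Convert (start_ms, end_ms, lang) segments into a per-frame label list."""
    frames = [None] * (total_ms // frame_ms)
    for start, end, lang in segments:
        for i in range(start // frame_ms, min(end // frame_ms, len(frames))):
            frames[i] = lang
    return frames

def language_error_rate(ref, hyp, lang):
    """Fraction of reference frames of `lang` that the hypothesis mislabels."""
    ref_frames = [i for i, r in enumerate(ref) if r == lang]
    if not ref_frames:
        return 0.0
    errors = sum(1 for i in ref_frames if hyp[i] != lang)
    return errors / len(ref_frames)

# Toy example: a 3-second recording with one English-to-Mandarin turn.
total_ms = 3000
ref = to_frames([(0, 1500, "English"), (1500, 3000, "Mandarin")], total_ms)
hyp = to_frames([(0, 1200, "English"), (1200, 3000, "Mandarin")], total_ms)

for lang in ("English", "Mandarin"):
    print(lang, round(language_error_rate(ref, hyp, lang), 3))
```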

Submission

For each audio recording, there should be an RTTM file, named according to the filename of the audio recording.

Each line in the RTTM file should contain three space-delimited fields: start time, end time, and language ID, indicating the onset and offset of a language turn in milliseconds.
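As a sketch of producing one such file, the snippet below writes hypothesized language turns in the three-field format described above. The segment values are made up for illustration, and the output filename simply reuses the example filename shown in the submission structure below; refer to Appendix C of the Evaluation Plan for the authoritative format.

```python
# Hypothetical language turns for one recording: (start_ms, end_ms, language ID).
segments = [
    (0, 2350, "English"),
    (2350, 5120, "Mandarin"),
    (5120, 7480, "English"),
]

# One result file per audio recording, named after that recording.
out_name = "TTS_P12345TT_VCST_ECxxx_01_AO_12345678_v001_R004_CRR_MERLIon-CCS.txt"
with open(out_name, "w") as f:
    for start_ms, end_ms, lang in segments:
        f.write(f"{start_ms} {end_ms} {lang}\n")
```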

For submission, all RTTM files to be evaluated must be placed in a zip folder (with no spaces in the filename) with the following structure:

results.zip/
├── TTS_P12345TT_VCST_ECxxx_01_AO_12345678_v001_R004_CRR_MERLIon-CCS.txt
├── TTS_P22345TT_VCST_ECxxx_02_AO_45678910_v001_R007_CRR_MERLIon-CCS.txt
└── ...

For more information on the results format guidelines, please refer to Appendix C of the Evaluation Plan.  
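The following sketch packages result files into an archive with the structure shown above, using only Python's standard library. The local directory name "results" is an assumption for illustration; the files themselves should already follow the naming and format rules described earlier.

```python
import zipfile
from pathlib import Path

# Place all result files at the top level of results.zip (no spaces in the filename).
with zipfile.ZipFile("results.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(Path("results").glob("*.txt")):  # "results" dir is illustrative
        zf.write(path, arcname=path.name)
```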

Results submission for the challenge will be on CodaLab. For more information, please refer to Submission.


Leaderboards