Latest Updates (as of 28/5/2024):

The MERLIon CCS Challenge will remain open indefinitely to encourage model development. The development and evaluation sets are publicly available.

When using the dataset, please cite:

For a detailed description of the MERLIon CCS dataset, check out our Interspeech 2023 paper:

The winning systems for open and closed tracks presented the following papers during the INTERSPEECH 2023 special session:

We also presented an analysis of common errors where submitted systems collectively struggle when performing language identification on complex speech:

Updates (as of 14/8/2023):

🎉 The MERLIon CCS Challenge has been accepted as a special session at Interspeech 2023 🎉 We are looking forward to seeing everyone!

The following papers related to the challenge have been accepted at Interspeech 2023:

The data archive for the MERLIon CCS dataset is now available:


The inaugural MERLIon CCS Challenge focuses on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous code-switched, child-directed speech collected via Zoom.  

Due to a bias towards standard speech varieties, non-standard, accented speech remains an ongoing challenge for automatic speech processing. Although existing works have explored automatic speech recognition and language diarization in code-switching speech corpora, those tasks are still challenging for natural in-the-wild speech containing more than one language, particularly when the code-switching occurs in short language spans. Moreover, as child-directed speech contains acoustic features difficult for automatic language identification and language diarization, speech processing systems often struggle with natural speech of this kind. 

Aligning closely with Interspeech 2023’s theme, 'Inclusive Spoken Language Science and Technology – Breaking Down Barriers', we present the challenge of developing robust language identification and language diarization systems that are reliable for non-standard, accented, bilingual, child-directed speech collected via a videocall platform.

As videocalls become increasingly ubiquitous, we present a first-of-its-kind Zoom videocall dataset. The MERLIon CCS Challenge will tackle automatic language identification and language diarization in a subset of audio recordings from the Talk Together Study, in which parents narrated an onscreen wordless picturebook to their child. The main objectives of this inaugural challenge are:

Techniques developed in the challenge may benefit other related fields, allowing a greater understanding of how code-switching occurs in real-life situations.

The challenge will feature language identification (Task 1) and language diarization (Task 2). Two tracks, open and closed, are available. The tracks differ by the data used during system training.

Register here!

DID YOU KNOW?✨ With the body of a mermaid and the head of a lion, the Merlion is a national icon of Singapore. ✨Just as the Merlion is a mix of different creatures, the Singaporean code-switched child-directed speech in this challenge is a mix of different languages✨

Important Dates

All deadlines are AoE (Anywhere on Earth)!

Registrations Open: 18 Jan 2023 🎉

Registrations Close: 24 Feb 2023 🎉

Training Data Partitions Release: 25 Jan 2023 🎉

Evaluation Plan Release: 27 Jan 2023 🎉

Data Release (Development Set): 27 Jan 2023 🎉

Baseline System Release: 13 Feb 2023 🎉

Data Release (Evaluation Set): 16 Feb 2023 🎉

Leaderboard Active: 17 Feb 2023 🎉

Official Evaluation Closes (Leaderboard Freeze): 28 Feb 2023 (extended to 2 Mar 2023)

INTERSPEECH Paper Submission Closes: 1 Mar 2023

System Description Submission: 2 Mar 2023

INTERSPEECH Paper Update Submission Closes: 8 Mar 2023

Leaderboard Reopens*: 10 Mar 2023

INTERSPEECH Acceptance: 17 May 2023

*After the end of the official challenge period, the leaderboard will reopen for teams who want to continue developing their systems prior to the Interspeech session (optional).



We would like to thank the Linguistic Data Consortium for providing the Mandarin-English Codeswitching in Southeast Asia (LDC2015S04) corpus for the challenge.

Contact Us

For questions, please get in touch with Victoria at

Do join our mailing list or LinkedIn group for all challenge updates!