Latest Updates (as of 30 May 2023):

🎉 The MERLIon CCS Challenge has been accepted as a special session at Interspeech 2023 🎉 We are looking forward to seeing everyone!

The following papers related to the challenge have been accepted at Interspeech 2023:


The data archive for the MERLIon CCS dataset is under preparation:

About

The inaugural MERLIon CCS Challenge focuses on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous code-switched, child-directed speech collected via Zoom.  

Due to a bias towards standard speech varieties, non-standard, accented speech remains an ongoing challenge for automatic speech processing. Although existing work has explored automatic speech recognition and language diarization in code-switching speech corpora, these tasks remain challenging for natural, in-the-wild speech containing more than one language, particularly when code-switching occurs in short language spans. Moreover, because child-directed speech contains acoustic features that are difficult for automatic language identification and language diarization, speech processing systems often struggle with natural speech of this kind.

Aligning closely with Interspeech 2023’s theme, 'Inclusive Spoken Language Science and Technology – Breaking Down Barriers', we present the challenge of developing robust language identification and language diarization systems that are reliable for non-standard, accented, bilingual, child-directed speech collected via a videocall platform.

As videocalls become increasingly ubiquitous, we present a first-of-its-kind Zoom videocall dataset. The MERLIon CCS Challenge will tackle automatic language identification and language diarization in a subset of audio recordings from the Talk Together Study, in which parents narrated an onscreen wordless picturebook to their child. The main objectives of this inaugural challenge are:

Techniques developed in the challenge may benefit related fields by allowing a greater understanding of how code-switching occurs in real-life situations.

The challenge will feature two tasks: language identification (Task 1) and language diarization (Task 2). Two tracks, open and closed, are available. The tracks differ in the data used during system training.

Register here!

DID YOU KNOW? ✨ With the body of a fish and the head of a lion, the Merlion is a national icon of Singapore. ✨ Just as the Merlion is a mix of different creatures, the Singaporean code-switched child-directed speech in this challenge is a mix of different languages ✨

Important Dates

All deadlines are AoE (Anywhere on Earth)!

Registrations Open: 18 Jan 2023 🎉

Registrations Close: 24 Feb 2023 🎉

Training Data Partitions Release: 25 Jan 2023 🎉

Evaluation Plan Release: 27 Jan 2023 🎉

Data Release (Development Set): 27 Jan 2023 🎉

Baseline System Release: 13 Feb 2023 🎉

Data Release (Evaluation Set): 16 Feb 2023 🎉

Leaderboard Active: 17 Feb 2023 🎉

Official Evaluation Closes (Leaderboard Freeze): 28 Feb 2023 (extended to 2 Mar 2023!)

INTERSPEECH Paper Submission Closes: 1 Mar 2023

System Description Submission: 2 Mar 2023

INTERSPEECH Paper Update Submission Closes: 8 Mar 2023

Leaderboard Reopens*: 10 Mar 2023

INTERSPEECH Acceptance: 17 May 2023

*After the end of the official challenge period, the leaderboard will reopen for teams who want to continue developing their systems prior to the Interspeech session (optional).

Organizers

Acknowledgements

We would like to thank the Linguistic Data Consortium for providing the Mandarin-English Codeswitching in Southeast Asia (LDC2015S04) Corpus for the challenge.

Contact Us

For questions, please get in touch with Victoria at merlion.challenge@gmail.com.

Do join our mailing list or LinkedIn group for all challenge updates!