VoiceMOS Challenge 2024
News
2024.6.14: The VoiceMOS Challenge 2024 is officially over! Now you can freely get the datasets by registering through the CodaBench page.
2024.4.11: The VoiceMOS Challenge 2024 CodaBench page is live and the training data is released! Participants, please check your email.
2024.4.3: The challenge website and registration are now public! We look forward to your participation!
Founded in 2022, the VoiceMOS Challenge (VMC) series aims to use standardized datasets in diverse and challenging domains to understand and compare prediction techniques for human ratings of speech, specifically those collected through a mean opinion score (MOS) test. The main motivation is to foster development in automatic, reference-free, and data-driven speech quality assessment approaches, to overcome the costly and time-consuming human listening tests that are conventionally regarded as the gold standard for evaluating synthesized and processed speech such as text-to-speech synthesis (TTS), voice conversion (VC), and speech enhancement (SE).
This year's challenge consists of three tracks, described below. As in previous challenges, the primary evaluation metrics for MOS prediction will focus on correct ranking of synthesis systems in each track. As usual, there is no participation fee. This year the challenge will be held on CodaBench (https://www.codabench.org/). For those who participated in previous VMCs, CodaBench is an upgraded version of the CodaLab platform.
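As an illustration of what "correct ranking of synthesis systems" can mean in practice, the following minimal sketch averages utterance-level scores per system and then computes a Spearman rank correlation between the true and predicted system-level scores. This is an assumption about one common way to measure ranking agreement, not the official scoring script; the function names, system IDs, and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def system_means(scores):
    """Average utterance-level scores per system.

    `scores` is a list of (system_id, score) pairs (hypothetical layout).
    """
    by_system = defaultdict(list)
    for system_id, score in scores:
        by_system[system_id].append(score)
    return {s: mean(v) for s, v in by_system.items()}


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def system_level_srcc(true_scores, predicted_scores):
    """Spearman rank correlation between system-level mean scores."""
    true_sys = system_means(true_scores)
    pred_sys = system_means(predicted_scores)
    systems = sorted(true_sys)
    t = [true_sys[s] for s in systems]
    p = [pred_sys[s] for s in systems]
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(t), average_ranks(p))


# Made-up example: three systems, two rated utterances each.
true_scores = [("sysA", 4.0), ("sysA", 4.5), ("sysB", 3.0),
               ("sysB", 3.5), ("sysC", 2.0), ("sysC", 2.0)]
predicted_scores = [("sysA", 3.9), ("sysA", 4.0), ("sysB", 4.1),
                    ("sysB", 4.2), ("sysC", 2.5), ("sysC", 2.7)]

srcc = system_level_srcc(true_scores, predicted_scores)
print(f"System-level SRCC: {srcc:.2f}")
```

In this made-up example the predictor swaps the order of sysA and sysB, so the correlation falls below 1 even though the predicted scores are numerically close to the true ones. A ranking-based metric penalizes exactly this kind of error.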
Tracks
Track 1: MOS prediction for "zoomed-in" systems
The first track aims to predict MOS for a “zoomed-in” subset of the top systems in the BVCC dataset, with ratings collected through a separate listening test. We believe this track reflects the need of present-day speech synthesis researchers to compare only high-quality synthesis systems.
Track 2: MOS prediction for singing voice
The second track is based on a new dataset containing samples and their ratings from singing voice synthesis and conversion systems. This track can be regarded as an extension of the SVCC track in VMC 2023, with a larger variety of systems, listeners, and languages.
Track 3: Semi-supervised MOS prediction for noisy, clean, and enhanced speech
The third track is semi-supervised MOS prediction for noisy, clean, and enhanced speech. Participants will only be allowed to use a very small amount of MOS-labeled data provided by organizers.
Participate
You can now freely register on the CodaBench challenge page and download the datasets: https://www.codabench.org/competitions/2650/
NOTE: Registration is open until the end of the challenge (June 3)!
Please fill in the registration form: https://forms.gle/tBBeNdvHghAdjTg27. We will contact you once we confirm your registration.
Schedule
The tentative schedule for the VoiceMOS Challenge 2024 is as follows:
Wednesday, April 10: Training datasets are released on the CodaBench page.
Monday, May 27: Evaluation dataset released to participants.
Monday, June 3: Predicted scores submission deadline.
Thursday, June 13: Results announced.
Thursday, June 20: SLT 2024 paper deadline.
Rules
General rules:
Registration must be done with an institutional email address (e.g., university or company), not a personal one.
Participants are required to submit a system description after the challenge ends.
Any public dataset may be used to develop your prediction system, and the datasets used must be reported in the system description. Use of proprietary datasets, including collecting your own MOS ratings, is not permitted unless the resources are publicly available.
Specific rules for Track 3:
Participants may use only the labeled data (<speech, subjective rating> pairs) provided by the organizers. This implies that other off-the-shelf subjective speech quality estimators (including but not limited to PESQ, STOI, MOSNet, SSL-MOS, and UTMOS) cannot be used during model training.
If you have concerns about which data or tools can be used, please email us to check.