2026.4.16: Challenge website and registration are now public! We look forward to your participation!
Founded in 2022, the VoiceMOS Challenge (VMC) series aims to use standardized datasets in diverse and challenging domains to understand and compare prediction techniques for human ratings of speech, specifically those collected through a mean opinion score (MOS) test, hence the name VoiceMOS Challenge. The main motivation is to foster the development of automatic, data-driven speech assessment approaches as an alternative to the costly and time-consuming human listening tests that are conventionally regarded as the gold standard for evaluating speech.
After running VMC for three years, in 2025 we organized the AudioMOS Challenge (AMC), enlarging the scope to singing voices, music, and even general synthetic audio. Despite its success, we received feedback from the community that key problems in the evaluation of speech remain unsolved. In 2026, we therefore decided to return to VMC and put our focus back on speech.
For this year, the primary evaluation metric for MOS prediction will be the utterance-level Spearman’s rank correlation coefficient (UTT-SRCC). As usual, there is no participation fee, and the challenge will be held on CodaBench (https://www.codabench.org/). We are still deciding on the venue for participants to submit their papers; the current plan is to host a special session or a satellite workshop at ICASSP 2027.
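For a quick reference, the sketch below shows how UTT-SRCC can be computed with SciPy's spearmanr; the per-utterance MOS arrays are made-up placeholders for illustration, not challenge data.

```python
# Minimal illustration of the UTT-SRCC metric using SciPy.
# The score arrays below are placeholders, not challenge data.
import numpy as np
from scipy.stats import spearmanr

# Ground-truth MOS and system-predicted MOS, one value per utterance.
true_mos = np.array([3.2, 4.1, 2.5, 3.8, 4.6])
pred_mos = np.array([3.0, 4.3, 2.8, 3.5, 4.4])

utt_srcc, _ = spearmanr(true_mos, pred_mos)
print(f"UTT-SRCC: {utt_srcc:.3f}")
```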
The first track is on speech generated by speech enhancement systems. This track is based on the subjective listening test data from the ICASSP 2026 URGENT Challenge. The dataset contains enhanced speech samples from the six top-performing systems in the challenge, evaluated on 840 multilingual utterances across nine languages. Given an input speech sample, the system is required to predict both the Absolute Category Rating (ACR) and the Comparative Category Rating (CCR).
The second track focuses on synthetic speech from emotional TTS systems and emotional human speech. Given an input speech sample, the system is required to predict (1) MOS for speech quality, (2) MOS for emotion (degree of similarity to the target synthesized emotion label), and, optionally, (3) listeners' categorical choices of perceived emotion and their ratings of valence, arousal, and dominance.
The third track targets accented English speech generated by codec-based speech synthesis systems. This track is based on the CodecMOS-Accent dataset, which contains 4,000 samples from 24 contemporary codec resynthesis and TTS systems, featuring 32 speakers across ten distinct accents. Given an input speech sample and a reference speech sample, the system is required to predict both the speaker and accent similarity scores.
NOTE: Registration is open until the end of the challenge (August 7)!
Please fill in the registration form: https://forms.gle/L6YdkUf1PJdSSwLU7
Once we confirm your registration, we will contact you with the link to the CodaBench page and instructions on how to download the datasets. (Note that this will not happen until the release date.)
The tentative schedule for the VoiceMOS Challenge 2026 is as follows:
Friday, May 22 (or earlier): Training datasets released on the CodaBench page.
Friday, July 31: Evaluation dataset released to participants.
Friday, August 7: Predicted scores submission deadline.
Monday, August 31: Results announced.
TBA: ICASSP 2027 paper deadline.
Registration must be done using an institutional email address (e.g., a university or company address), not a personal one, unless you are participating as an individual researcher.
Participants are required to submit a system description after the challenge ends.
Any public dataset may be used to develop your prediction system, and all datasets used must be reported in the system description. The use of proprietary datasets, including MOS ratings you collect yourself, is not permitted unless those resources are made publicly available.
Track 1: URGENT-MOS
Codebase: TBA
Track 2:
UTMOS
Codebase: https://github.com/sarulab-speech/UTMOS22
An LLM-based method
Codebase: TBA
Emotion2Vec
Codebase: https://github.com/ddlBoJack/emotion2vec
Track 3: a speaker embedding-based method (a minimal sketch of this idea appears after this list)
Codebase: TBA
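As a rough illustration of the speaker embedding-based idea for Track 3, the sketch below scores speaker similarity as the cosine similarity between embeddings of the test and reference samples, rescaled to a 1-5 range. The embedding extractor, the rescaling, and all names are placeholder assumptions for illustration only, not the official baseline; an accent-similarity predictor could follow the same pattern with an accent encoder.

```python
# Minimal sketch of a speaker embedding-based similarity predictor.
# "extract_embedding" is a hypothetical placeholder for any pretrained
# speaker encoder (e.g., an ECAPA-TDNN model); the 1-5 rescaling is an
# assumed mapping, not the official scoring protocol.
import numpy as np


def extract_embedding(waveform: np.ndarray) -> np.ndarray:
    """Placeholder: replace with a real pretrained speaker encoder."""
    # A fixed random projection so the sketch runs end to end.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((waveform.shape[0], 192))
    return waveform @ proj


def similarity_score(test_wav: np.ndarray, ref_wav: np.ndarray) -> float:
    """Map cosine similarity of speaker embeddings onto a 1-5 scale."""
    e1, e2 = extract_embedding(test_wav), extract_embedding(ref_wav)
    cos = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    return 1.0 + 4.0 * (cos + 1.0) / 2.0  # [-1, 1] -> [1, 5]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    test, ref = rng.standard_normal(16000), rng.standard_normal(16000)
    print(f"Predicted speaker similarity: {similarity_score(test, ref):.2f}")
```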
Wen-Chin Huang & Tomoki Toda (Nagoya University, Japan)
Erica Cooper (National Institute of Information and Communications Technology, Japan)
Wei Wang (Shanghai Jiao Tong University, China)
Marvin Sach (Technische Universität Braunschweig, Germany)
Xiaoxue Gao (A*STAR, Singapore)