2025.9.3. The summary paper of the AudioMOS Challenge 2025 is on arXiv now! [Paper]
2025.6.25. The challenge is officially over! Please find the dataset links below. We will release the summary paper after acceptance. Stay tuned!
2025.6.16. The results have been released to the participants! Please check whether you have received them!
2025.6.6. The evaluation phase is over! We have sent an email to participants regarding the ground-truth labels for the evaluation set and the system description form. Please check whether you have received it!
2025.5.28. The evaluation phase has started! The evaluation set links can be found on the CodaBench page. The submission deadline is June 4.
2025.4.22. We have received several inquiries from participants regarding confusion about the Track 3 setting. We have updated the description. Please refer to the Track 3 description page for more details.
2025.4.17. We have made an announcement regarding the Track 2 training data. Please see the announcements section below.
2025.4.17. We are pleased to inform you that the AudioMOS Challenge 2025 has been accepted as a challenge session at ASRU 2025, which will be held on Dec. 6 - 10 in Honolulu, Hawaii, USA!
2025.4.9: We have sent out emails informing participants of the CodaBench page and providing instructions on the training data! Please contact us if you registered but did not receive the email!
2025.3.21: Challenge website and registration go public! We warmly invite you to participate in the challenge!
With the rapid progress of generative AI technologies, synthesizing not only speech but also singing voices, music, and even general audio has become a popular research field. However, just as with speech, the end users of these synthesized audio samples are humans, so the evaluation of these audio generation systems faces the same challenges as that of speech synthesis systems. To facilitate research on the automatic evaluation of audio generation systems, we decided to enlarge the scope of the challenge and rename it the AudioMOS Challenge.
As in previous challenges, the primary evaluation metrics for MOS prediction will focus on the correct ranking of audio generation systems in each track. As usual, there is no participation fee, and the challenge will be held on CodaBench (https://www.codabench.org/).
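For reference, the sketch below shows one way such a system-level ranking metric can be computed: per-utterance predictions are averaged per system and compared with the ground-truth system means using Spearman's rank correlation coefficient (SRCC). The data layout is a hypothetical illustration, not the official scoring script.

```python
# A minimal sketch of a system-level ranking metric, assuming per-utterance
# scores are first averaged per system and then compared with the ground-truth
# system means via Spearman's rank correlation (SRCC). The input format
# (utterance id -> score / system id) is a hypothetical illustration.
import numpy as np
from scipy.stats import spearmanr

def system_level_srcc(true_mos, pred_mos, system_of):
    """true_mos, pred_mos: dicts mapping utterance id -> score.
    system_of: dict mapping utterance id -> system id."""
    systems = sorted(set(system_of.values()))
    true_means = [np.mean([true_mos[u] for u in true_mos if system_of[u] == s])
                  for s in systems]
    pred_means = [np.mean([pred_mos[u] for u in pred_mos if system_of[u] == s])
                  for s in systems]
    srcc, _ = spearmanr(true_means, pred_means)  # rank correlation over systems
    return srcc
```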
Here we briefly describe each track. More details can be found on the CodaBench challenge page (the link will be provided once your registration is confirmed).
The first track aims to predict the MOS of text-to-music (TTM) systems. This track is based on the MusicEval dataset, the first dataset for synthetic music assessment. The dataset contains music clips generated by 31 prevalent and advanced TTM systems, along with ratings collected from music experts. The evaluation was conducted along two dimensions: overall musical impression and alignment with the text prompt, which respectively emphasize the quality of the generated music and its consistency with the given text prompt.
Recently, Meta released Audiobox Aesthetics, which aims to provide a unified solution for automatic quality assessment of speech, music, and sound. They proposed four new evaluation dimensions: production quality, production complexity, content enjoyment, and content usefulness. In this challenge, we will use their open-sourced AES-natural set as the training and development set. For the evaluation set, we will prepare samples from text-to-speech (TTS), text-to-audio (TTA), and text-to-music (TTM) systems. Participants are expected to assess the evaluation set samples along the four proposed dimensions, as illustrated by the sketch below.
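As a rough illustration of what a per-clip prediction for this track looks like, here is a minimal sketch assuming one scalar score per axis; the data structure and example values are our own assumptions, not the official submission format.

```python
# A minimal sketch of per-clip Track 2 predictions, assuming one scalar score
# per aesthetics axis. The dataclass and the example values are illustrative
# assumptions, not the official submission format.
from dataclasses import dataclass

@dataclass
class AestheticsScores:
    production_quality: float     # technical recording/production quality
    production_complexity: float  # complexity of the audio scene
    content_enjoyment: float      # subjective enjoyment of the content
    content_usefulness: float     # usefulness as source material

example = AestheticsScores(7.2, 3.5, 6.8, 6.1)  # hypothetical scores
```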
The third track is MOS prediction for speech at high sampling frequencies. For the training set, we provide samples at 16 kHz, 24 kHz, and 48 kHz, along with their subjective ratings obtained from listening tests that contained only samples at the same sampling frequency. For the development and evaluation sets, participants are asked to predict scores for the samples as they would be rated in a listening test containing samples at all sampling frequencies.
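A practical note on this track, as a minimal sketch under our own assumptions: since a single predictor must score waveforms at mixed rates, one simple option is to bring every input to a common working rate before feature extraction, for example 48 kHz so that no high-frequency content is discarded. This is not a challenge requirement.

```python
# A minimal sketch, assuming one way to handle Track 3's mixed sampling rates:
# resample every input to a single working rate before feature extraction.
# 48 kHz is chosen here so that no high-frequency content is discarded;
# this is our assumption, not a challenge requirement.
import torchaudio

TARGET_SR = 48000

def load_at_common_rate(path):
    wav, sr = torchaudio.load(path)
    if sr != TARGET_SR:
        wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=TARGET_SR)
    return wav
```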
The challenge is over! Thank you for your participation!
NOTE: Registration is open until the end of the challenge (June 4)!
Please fill in the registration form: https://forms.gle/am1qDtEwWVmEnh5d9. Once we confirm your registration, we will contact you with the link to the CodaBench page and instructions on how to download the datasets.
The tentative schedule for the AudioMOS Challenge 2025 is as follows:
Wednesday, April 9: Training datasets are released on the CodaBench page.
Wednesday, May 28: Evaluation dataset released to participants.
Wednesday, June 4: Predicted scores submission deadline.
Monday, June 16: Results announced.
Wednesday, June 25: ASRU 2025 challenge paper deadline.
General rules:
Registration must be done with an institutional email address (e.g., university or company), not a personal one.
Participants are required to submit a system description after the challenge ends.
Any public dataset may be used to develop your prediction system, and the datasets used must be reported in the system description. Use of proprietary datasets, including collecting your own MOS ratings, is not permitted unless the resources are publicly available.
2025.4.17:
We have received several inquiries from participants regarding issues accessing the Track 2 training set. Specifically, some of the training data originates from AudioSet, which is based on YouTube videos. Unfortunately, due to YouTube’s policies, some videos may no longer be available and therefore cannot be downloaded.
While we recognize the inconvenience this may cause, both we as the challenge organizers and Meta AI, the primary data provider for this track, are unable to redistribute the data due to these restrictions. We appreciate your understanding.
Based on this paper: https://arxiv.org/abs/2501.10811
Dataset link: https://drive.google.com/file/d/1KjrZAzmd3k3BWZ0XofwvOG-0jvsiRjCQ/view?usp=drive_link
Baseline: https://github.com/NKU-HLT/MusicEval
Training & development sets: AES-natural
Link: https://github.com/facebookresearch/audiobox-aesthetics/tree/main/audiomos2025_track2
We cannot redistribute the audio samples. Please refer to the audiobox-aesthetics repo for details.
Due to Meta policies, we cannot redistribute the labels. Please retrieve the labels from the link above.
Training & development sets: [Link] [Labels (development set)]
Baseline: fine-tuned SSL-MOS
Please note that the pre-trained model in the above link is not fine-tuned.
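For orientation, here is a minimal sketch of an SSL-MOS-style predictor, assuming the commonly used recipe of mean-pooling SSL frame features into a linear regression head. The wav2vec 2.0 checkpoint and single-layer head are assumptions, not the exact configuration of the official baseline; fine-tuning would update both the encoder and the head against the MOS labels (e.g., with an L1 loss).

```python
# A minimal sketch of an SSL-MOS-style predictor: an SSL encoder whose frame
# features are mean-pooled and passed through a linear regression head.
# The wav2vec 2.0 checkpoint and single-layer head are assumptions, not the
# exact configuration of the official baseline.
import torch.nn as nn
from transformers import Wav2Vec2Model

class SSLMOS(nn.Module):
    def __init__(self, ckpt="facebook/wav2vec2-base"):
        super().__init__()
        self.ssl = Wav2Vec2Model.from_pretrained(ckpt)
        self.head = nn.Linear(self.ssl.config.hidden_size, 1)

    def forward(self, wav):                              # wav: (batch, samples)
        feats = self.ssl(wav).last_hidden_state          # (batch, frames, hidden)
        return self.head(feats.mean(dim=1)).squeeze(-1)  # one MOS per utterance
```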
arXiv version: http://arxiv.org/abs/2509.01336
Wen-Chin Huang & Tomoki Toda (Nagoya University, Japan)
Hui Wang & Cheng Liu & Yong Qin (Nankai University, China)
Yi-Chiao Wu & Andros Tjandra & Wei-Ning Hsu (Meta AI, USA)
Erica Cooper (National Institute of Information and Communications Technology, Japan)