Interspeech 2025 Special Session
The quality and availability of datasets are crucial for advancing speech research. Reliable models depend heavily on high-quality data, making data collection and annotation pivotal. However, the challenges of collecting, curating, annotating, and sharing large datasets are numerous, especially in domains such as child speech, health-related speech, and low-resource languages. These include privacy concerns, standardization issues, and the significant cost and time involved in manual annotation. This special session therefore aims to bring together researchers to share their experiences, methodologies, and insights related to speech data collection and annotation.
Many datasets are either proprietary or subject to privacy restrictions, preventing them from being openly shared. Yet the methods and guidelines used to collect and annotate these datasets are often as valuable as the data themselves. By sharing these processes, even for datasets that cannot be made public, the community can enhance reproducibility and foster the creation of similar high-quality datasets. This exchange of knowledge can help researchers avoid common pitfalls in data collection and annotation while promoting innovation in automated annotation workflows.
The session emphasizes the need to standardize annotation practices and methodologies across the field. Discrepancies in annotation schemes, guidelines, and practices often hinder the ability to combine datasets or reuse data for training machine learning models. Establishing more uniform practices can lead to greater interoperability between datasets, ultimately improving model performance and the robustness of research outcomes. Moreover, sharing best practices and experiences can significantly benefit researchers by highlighting how to select appropriate cohorts and materials and how to implement quality assurance measures for data collection and annotation.
Another key focus is to explore ways to automate the data annotation process. Manual annotation, while essential for ensuring quality, is both costly and time-consuming. Developing automated workflows that integrate human annotators efficiently can reduce the burden while maintaining high levels of reliability and accuracy. Techniques to evaluate the quality of both human and automated annotations, such as inter-annotator reliability and consistency checks, will also be discussed, providing attendees with practical methods for improving their annotation pipelines.
Topics of interest include, but are not limited to:
Selection of cohort and prompts for data collection
Methods for evaluating the quality of collected speech data
Automating the annotation process and integrating human annotators
Guidelines for manual annotation and automatic workflows
Evaluating the reliability and consistency of human annotations
Standardizing data collected with different protocols, data formats and annotations
Negotiating privacy, confidentiality, and legal constraints
Sharing processes and lessons learned in lieu of sharing private datasets
Using synthetic data to augment datasets
Please adhere to the standard INTERSPEECH paper submission guidelines available on the official website.
When submitting your paper, select “Challenges in Speech Data Collection, Curation, and Annotation” as the subject area.
Submitted papers will go through the same review process as regular papers.
Paper Submission Portal Open: 18 December 2024
Paper Submission Deadline: 12 February 2025
Paper Update Deadline: 19 February 2025
Paper Acceptance Notification: 21 May 2025
Beena Ahmed, University of New South Wales
Mostafa Shahin, University of New South Wales (main contact)
Tünde Szalay, Macquarie University
Tan Lee, The Chinese University of Hong Kong
Mark Liberman, University of Pennsylvania
Mengyue Wu, Shanghai Jiao Tong University
Thomas Schaaf, Solventum
Ahmed Ali, Saudi Data and Artificial Intelligence Authority
Carlos Busso, Carnegie Mellon University