Jul. 31, 2025 Webpage released
The Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) is an important event that brings together researchers and industry professionals to drive innovation and share knowledge. Since its inception in 2005 as a joint effort between the Audio and Acoustic Signal Processing Technical Committee (AASP TC) and the Speech and Language Processing Technical Committee (SLTC), the workshop has focused on encouraging collaboration across different fields. HSCMA provides a platform for exchanging new ideas and practical solutions to industry challenges. The workshop centers on hands-free speech communication and microphone arrays, which are key to applications such as voice-controlled assistants, teleconferencing, and speech recognition in challenging acoustic environments. Major challenges in this area include environmental noise, reverberation, and variations in human speech.
Recently, there has been a strong focus on using multiple modalities for speech and audio processing, driven by the growing presence of wearable technologies in everyday life. Wearables naturally offer a variety of multi-modal signals, making them an ideal platform for the themes of the HSCMA workshop: topics such as multi-channel signal processing, noise robustness, and hands-free communication align well with the capabilities and uses of wearable devices. The HSCMA workshop provides a valuable opportunity to advance speech processing systems, encouraging practical strategies that connect research with real-world applications. By focusing on the intersection of these emerging technologies and established research areas, the workshop aims to improve user experiences and tackle real-world challenges in hands-free communication.
The International Workshop on Speech Processing in Everyday Environments (CHiME) series, now in its 9th edition, has been a significant event since its inception in 2011, bringing together researchers in speech enhancement, speech and speaker recognition, computational hearing, and machine learning. The workshop is dedicated to addressing the challenges of processing speech in real-world environments characterized by acoustic clutter and dynamic sound sources. This year, the CHiME-9 challenge introduces two innovative tasks that push the boundaries of speech processing technology:
The Multi-Modal Context-aware Recognition (MCoRec) task focuses on accurately transcribing speech from overlapping conversations using both audio and video inputs. Given an audio recording and video footage of participants, the system must differentiate between concurrent conversations and generate a transcription for each conversation. The objective is to answer the question: Who speaks, when, what, and with whom?
The Enhancing Conversations to address Hearing Impairment (ECHI) task focuses on enhancing conversational speech for hearing devices. It provides a unique dataset of real conversations recorded with 4-channel hearing aids in a noisy, cafeteria-like environment. The goal is to develop low-latency systems that enhance the speech of the hearing device wearer’s conversational partners while suppressing all competing sound sources.
By presenting these tasks, the CHiME workshop continues to drive innovation in the field, encouraging the development of cutting-edge solutions that enhance the robustness and accessibility of speech processing technologies in everyday environments. The workshop serves as a vital platform for exchanging ideas and fostering collaboration among researchers and industry professionals, ultimately contributing to the advancement of speech processing systems that can tackle real-world challenges. This year's tasks particularly emphasize the topics of multi-modality and wearables, aligning with the overarching theme of the workshop.
This year, HSCMA and CHiME will be held jointly as a one-day satellite workshop of IEEE ICASSP 2026. The collaboration between the HSCMA and CHiME workshops is an opportunity to leverage the strengths of both events. The two workshops have historically shared a focus on key areas such as noise-robust speech recognition, spatial audio processing, source separation, and multi-modal signal processing, among other topics. CHiME's tradition of introducing new datasets and tasks provides a structured environment for researchers to test and refine their ideas, complementing the innovative concepts often presented at HSCMA. Bringing the two workshops together will allow attendees to exchange ideas and inspire advances in the field of speech and audio processing.
Katerina Zmolikova
Meta AI
Shota Horiguchi
NTT, Inc.
Shinji Watanabe
Carnegie Mellon University
Marc Delcroix
NTT, Inc.
Paola Garcia
Johns Hopkins University
Minje Kim
University of Illinois at Urbana-Champaign
Speech and Language Processing Technical Committee (IEEE Signal Processing Society)
Audio and Acoustic Signal Processing Technical Committee (IEEE Signal Processing Society)
Data Science Initiative (IEEE Signal Processing Society)