Introduction
Audio and speech technology has recently achieved unprecedented success in real-world applications, driven primarily by self-supervised pre-training of large neural networks on massive datasets. While state-of-the-art models show remarkable performance across various tasks, their growing complexity raises fundamental questions about interpretability, reliability, and trustworthiness. Despite increasing interest from our community, we lack a deep understanding of what information is encoded in speech representations and what abstractions are learned during model training. This special session brings together researchers working on making audio and speech models more interpretable and explainable, drawing insights from machine learning, cognitive science, linguistics, speech science, and neuroscience.
Topics
Topics include, but are not limited to:
Applying analytic techniques from neuroscience to understand speech models
Probing intermediate representations to decode linguistic and paralinguistic information
Linguistically-informed analysis of speech representations at the levels of phonetics, phonology, and prosody
Developing novel interpretable architectures for speech and audio processing
Understanding latent representations using speech synthesis
Analyzing model robustness to speaker variations, accents, and acoustic conditions
Cognitively motivated analysis of speech models and their representations
Adapting visualization and interpretability techniques from other modalities to audio signals
Bias mitigation, causal inference, post-hoc explanation, and intervention analysis
Model safety and adversarial robustness
Analysis of multimodal and multilingual speech models
Extending interpretability methods to new tasks such as voice conversion, speech translation, and text-to-speech synthesis
Developing new methods and benchmarks for interpretability in the audio modality
We encourage submissions demonstrating the value of interpretability research by highlighting actionable insights and their impact on model robustness, safety, and bias mitigation.
Important Dates
Paper Submission Portal Open: 18 December 2024
Paper Submission Deadline: 12 February 2025
Paper Update Deadline: 19 February 2025
Paper Acceptance Notification: 21 May 2025
For any changes or updates to these dates, please refer to the official Interspeech 2025 website.
Paper Submission and Session Format
Paper submissions must conform to the format defined on the Interspeech 2025 website. When submitting through the Interspeech electronic paper submission system, please indicate that your paper should be included in the Special Session on Interpretability in Audio and Speech. All submissions will undergo the standard Interspeech review process.
The session will be held as a poster session or as oral presentations, depending on the number of accepted papers. We will therefore inform participants of the format once the review process has concluded.
Contact
If you have questions, contact either Aravind Krishnan 📧 or Francesco Paissan 📧.