Allison Koenecke is an Assistant Professor of Information Science at Cornell Tech. Her research on algorithmic fairness applies computational methods, such as machine learning and causal inference, to study societal inequities in domains ranging from online services to public health. Koenecke is regularly quoted as an expert on disparities in automated speech-to-text systems. She was previously a postdoctoral researcher at Microsoft Research and received her PhD from Stanford's Institute for Computational and Mathematical Engineering. She is the recipient of several NSF grants and a Cornell CIS DEIB Faculty of the Year Award, and has been honored as a Sloan Fellow in Computer Science and named to the Forbes 30 Under 30 list in Science.
Abstract: Automated speech recognition (ASR) systems are used in a variety of applications to convert spoken language to text, from scribing patient notes, to conducting hiring interviews, to writing police narrative reports. The risks of ASR underperformance are real, and they fall disproportionately on certain demographic groups of speakers, often those who are marginalized in the contexts in which ASR is used. We argue for more principled audits of ASR systems, analogous to post-marketing surveillance for medical devices. Our framework for doing so involves: (1) collecting diverse, domain-specific speech datasets representative of real patient and provider populations, (2) developing metric suites that go beyond the singular gold standard of Word Error Rate, and (3) conducting human-centered design research to align functionality with user needs. We conclude by discussing possibilities for the “who, what, when, and how” of conducting such audits in practice.
Éva Székely is an Assistant Professor in Speech Technology at KTH Royal Institute of Technology in Stockholm. She works at the intersection of speech technology and speech science, with a focus on developing conversational text-to-speech and studying the perception of synthetic voices. She leads several nationally funded and foundation-funded research projects on spontaneous and conversational speech modelling, and also pursues work on inclusive speech technologies, including gender-diverse voice design, synthetic voices for assistive communication, and methods for detecting and mitigating bias in speech foundation models. She has published extensively in leading speech technology venues, and her work includes open-source methods for prosody evaluation and bias detection. She holds an MSc in Speech and Language Technology from Utrecht University and a PhD from University College Dublin. In 2025 she was named among Hungary’s Top 15 Women in Artificial Intelligence.
Title: Do we measure what we value or value what we measure? What an edge-first approach reveals about progress in speech AI
Abstract: The commonly held view in speech AI is that listening tests capture the subjective aspects of system performance, while automated metrics are seen as objective. This combination is generally treated as providing reasonable, if imperfect, coverage. On this basis, we rank models, select winners, and demonstrate progress. As speech technology becomes increasingly benchmark-driven, however, these comparisons prove more complex in practice. In this talk, I share stories and insights from four projects developed in close collaboration with individuals with highly specific communication needs: voice reconstruction following sudden-onset speech loss, speech recognition for highly atypical speech, speaker trait recognition training for cochlear implant users, and voice design for nonbinary users of speech-generating devices. Beginning with individual cases at the edges where systems break reveals gaps that both perceptual and automated scores can overlook. My hope is to open a conversation about how we can navigate this tension, meeting the comparative expectations of our venues while moving closer to measuring what we actually care about.