Join our network to engage in seminars and discussions that advance discovery and collaboration around cutting-edge research in speech and language health signals
March 23, 2026 · 5:00 PM (Eastern Time)
Beena Ahmed, PhD
Associate Professor in Signal Processing
School of Electrical Engineering and Telecommunications
University of New South Wales, Australia
Associate Professor Beena Ahmed is the Co-Director of the Signals, Information, and Machine Intelligence Lab and the Technical Lead, Connected Health, at the Tyree Foundation Institute of Health Engineering at UNSW. She currently leads research projects on the recognition and assessment of children’s and disordered speech, and on mispronunciation detection in disordered and accented speech. She has received over $9 million in funding from local and international sources and has over 100 publications. She is also the founder of Say66, where she is translating her research into an automated speech therapy system for children with speech disorders. She has received multiple awards for her work in this area, including the 2020 Innovation Award from Speech Pathology Australia, a 2021 Women in AI Award, and a 2022 Telstra Digital Health Award.
From Clinical Speech to Language Learning: Mispronunciation Detection Across Domains
Accurate pronunciation is critical for intelligible spoken communication, yet it remains a major challenge for both individuals with speech disorders and second-language (L2) learners. In this talk, I will present my research on automatic mispronunciation detection and diagnosis (MDD) as a tool for supporting atypical and non-native speakers wanting to improve their spoken communication. I will focus on approaches that move beyond canonical, native-speaker-centric assumptions and instead account for the variability inherent in disordered speech and L2 pronunciation patterns.
I will discuss methods for detecting articulation errors using speech technologies grounded in phonetic theory and data-driven modelling, highlighting how these methods perform across different speaker populations. Attention will be paid to shared challenges across domains, such as sparse labelled data and high inter-speaker variability, as well as to domain-specific considerations that arise in clinical versus educational contexts. By comparing findings from disordered speech and non-native learner speech, this talk aims to highlight common methodological insights and opportunities for cross-fertilization between clinical speech processing and computer-assisted pronunciation training.
Meetings are held on Zoom for 60 minutes, about once a month—typically on Mondays at 8 AM PT / 11 AM ET / 4 PM London / 5 PM Berlin / 11 PM Beijing.
Join our mailing list to receive updates and the Zoom link 📙!
For questions, speaker suggestions, or paper proposals for the journal club, reach out to Jingyao Wu 📧 and Ahmed Yousef 📧
This network was co-founded in 2022 by Daniel Low, Tanya Talkar, Daryush Mehta, Satrajit Ghosh and Tom Quatieri as the Harvard-MIT Speech and Language Biomarker Interest Group.
It seeks to bring together researchers and students from around the world to share novel research, receive feedback, discuss papers, and kickstart collaborations.
Jingyao Wu, PhD, MIT
Ahmed Yousef, PhD, MGH & Harvard Medical School
Daniel Low, PhD, Child Mind Institute & Harvard University
Fabio Catania, PhD, MIT
Nick Cummins, PhD, King's College London
Hamzeh Ghasemzadeh, PhD, University of Central Florida
Rahul Brito, Harvard & MIT
Tanya Talkar, PhD, Linus Health
Daryush Mehta, PhD, MGH & Harvard Medical School
Satrajit Ghosh, PhD, MIT McGovern Institute for Brain Research
Thomas Quatieri, PhD, MIT Lincoln Laboratory
Curated materials (tools, datasets, readings) to support exploration and innovation in the field.
Audio
senselab: a Python package that simplifies building pipelines for digital biometric analysis on speech and voice.
Riverst: a multimodal avatar for interacting with users and collecting audio and video data.
Text
Quick spaCy-based text metrics: https://github.com/HLasse/TextDescriptives and https://github.com/novoic/blabla
Suicide Risk Lexicon, lexicon building with LLMs, and semantic similarity: https://github.com/danielmlow/construct-tracker
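The construct-tracker link above combines lexicons with semantic similarity. As a generic, self-contained sketch of the underlying idea — scoring how close a document embedding is to a lexicon prototype via cosine similarity — here is a toy example with made-up 3-dimensional vectors (this is not the package's actual API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a lexicon prototype and two documents.
lexicon_vec = [0.9, 0.1, 0.0]
doc_close = [0.8, 0.2, 0.1]   # semantically near the lexicon
doc_far = [0.0, 0.1, 0.9]     # semantically distant

sim_close = cosine_similarity(lexicon_vec, doc_close)
sim_far = cosine_similarity(lexicon_vec, doc_far)
print(sim_close > sim_far)  # True
```

In practice the vectors would come from a sentence-embedding model rather than being hand-written, but the ranking step is the same.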
Audio and text
Many voice and speech datasets: Alden Blatter, Hortense Gallois, Samantha Salvi Cruz, Yael Bensoussan, Bridge2AI Voice Consortium, Maria Powell, Jean-Christophe Bélisle-Pipon. (2025). “Global Voice Datasets Repository Map.” Voice Data Governance. https://map.b2ai-voice.org/.
Bridge2AI Voice Dataset https://b2ai-voice.org/the-b2ai-voice-database/
Facebook's large-scale multimodal dataset of 4,000+ hours of human interactions for AI research: https://github.com/facebookresearch/seamless_interaction
Audio
CLAC: A Speech Corpus of Healthy English Speakers
Many speech datasets: https://github.com/jim-schwoebel/allie/tree/master/datasets#speech-datasets
Many audio visual datasets: https://github.com/krantiparida/awesome-audio-visual#datasets
Text
Many text datasets: https://lit.eecs.umich.edu/downloads.html
Many text datasets: https://github.com/niderhoff/nlp-datasets
Audio
Ramanarayanan, V., Lammert, A. C., Rowe, H. P., Quatieri, T. F., & Green, J. R. (2022). Speech as a biomarker: Opportunities, interpretability, and challenges. Perspectives of the ASHA Special Interest Groups, 7(1), 276-283.
Low, D. M., Bentley, K. H., & Ghosh, S. S. (2020). Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investigative Otolaryngology.
Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., & Quatieri, T. F. (2015). A review of depression and suicide risk assessment using speech analysis. Speech Communication.
Patel, R. R., Awan, S. N., Barkmeier-Kraemer, J., Courey, M., Deliyski, D., Eadie, T., ... & Hillman, R. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887-905.
Text
Mihalcea, R., Biester, L., Boyd, R. L., Jin, Z., Perez-Rosas, V., Wilson, S., & Pennebaker, J. W. (2024). How developments in natural language processing help us in understanding human behaviour. Nature Human Behaviour, 8(10), 1877-1889.
Stade, E. C., Stirman, S. W., Ungar, L. H., Boland, C. L., Schwartz, H. A., Yaden, D. B., ... & Eichstaedt, J. C. (2024). Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. NPJ Mental Health Research, 3(1), 12.
Low, D., Mair, P., Nock, M., & Ghosh, S. Text Psychometrics: Assessing Psychological Constructs in Text Using Natural Language Processing. PsyArXiv.
Most recordings are available upon request.
26-Jan-2026 | Speech and Audio Intelligence for Health: Sensing, Reasoning, and Prediction | Ting Dang (The University of Melbourne, Australia)
15-Dec-2025 | Generating and investigating laryngeal biosignals | Andreas Kist (Friedrich-Alexander-Universität Erlangen-Nürnberg)
10-Nov-2025 | Speech as a modality for the characterization and adaptation of neurodiversity | Mark Hasegawa-Johnson (University of Illinois Urbana-Champaign)
06-Oct-2025 | From Noise to Signal: Individual Variability in Voice Fatigue Subtyping | Mark Berardi (University of Iowa)
19-May-2025 | Building your research team: Who should be in the room where it happens? | Maria Powell (Vanderbilt University Medical Center)
5-May-2025 | Clinical theory and dimensions of speech markers: Psychosis as a case study | Lena Palaniyappan (Professor of Psychiatry, McGill)
7-Apr-2025 | Toward generalizable machine learning models in speech, language, and hearing sciences: Estimating sample size and reducing overfitting | Hamzeh Ghasemzadeh (Massachusetts General Hospital – Harvard Medical School)
17-Mar-2025 | Exploring the Mechanistic Role of Cognition in the Relationship between Major Depressive Disorder and Acoustic Features of Speech | Lauren White (King’s College London)
3-Mar-2025 | Speech as a Biomarker for Disease Detection | Catarina Botelho (INESC-ID, University of Lisbon)
17-Feb-2025 | Exploring Intraspeaker Variability in Vocal Hyperfunction Through Spatiotemporal Indices of RFF | Jenny Vojtech (Boston University)
20-Jan-2025 | Clinically meaningful speech-based endpoints in clinical trials | Julie Liss (Arizona State University)
03-Dec-2024 | Revealing Confounding Biases: A Novel Benchmarking Approach for Aggregate-Level Performance Metrics in Health Assessments | Roseline Polle (Thymia)
18-Nov-2024 | The interplay between signal processing and AI to achieve enhanced and trustworthy interaction systems | Ingo Siegert (Otto-von-Guericke-University Magdeburg)
4-Nov-2024 | Remote Voice Monitoring System for Patients with Heart Failure | Fan Wu (ETH Zurich)
6-May-2024 | Building Speech-Based Affective Computing Solutions by Leveraging the Production and Perception of Human Emotions | Carlos Busso (UT Dallas)
1-Apr-2024 | Parkinson's speech | Juan Ignacio Godino-Llorente (Universidad Politécnica de Madrid)
4-Mar-2024 | Modelling individual and cross-cultural variation in the mapping of emotions to speech prosody | Pol van Rijn (Max Planck Institute for Empirical Aesthetics)
5-Feb-2024 | Speech Analysis for Intent and Session Quality Assessment in Motivational Interviews | Mohammad Soleymani (USC)
20-Nov-2023 | High speed videoendoscopy | Maryam Naghibolhosseini (Michigan State University)
16-Oct-2023 | Estimation of parameters of the phonatory system from voice | Zhaoyan Zhang (UCLA Head and Neck Surgery)
18-Sep-2023 | Democratizing speaker diarization with pyannote | Hervé Bredin (Institut de Recherche en Informatique de Toulouse) and Marvin Lavechin (Meta AI, ENS)
7-Aug-2023 | Overview of Zero-Shot Multi-speaker TTS Systems | Edresson Casanova (Coqui)
17-Jul-2023 | Considerations for Identifying Biomarkers of Spoken Language Outcomes for Neurodevelopmental Conditions | Karen Chenausky (Harvard Medical School & Massachusetts General Hospital)
15-May-2023 | The Potential of smartphones voice recordings to monitor depression severity | Nicholas Cummins (King's College London)
1-May-2023 | Reading group session: Introductory overview of self-supervised learning, transformers, and attention | Daniel Low (Harvard & MIT)
20-Mar-2023 | Accuracy of Acoustic Measures of Voice via Telepractice Videoconferencing Platforms | Hasini Weerathunge (Boston University)
6-Mar-2023 | Casual discussion on audio quality control and preprocessing | Daniel Low (Harvard & MIT)
26-Jan-2023 | Developing speech-based clinical analytics models that generalize: Why is it so hard and what can we do about it? | Visar Berisha (Arizona State University)
12-Jan-2023 | Using knockoffs for controlled predictive biomarker identification | Kostas Sechidis (Novartis)
15-Dec-2022 | Provide ideas and feedback on the protocol for a large-scale data collection effort (N=5k) on mental health and voice from the NIH Bridge2AI | Daniel Low (Harvard & MIT)
1-Dec-2022 | Inferring neuropsychiatric conditions from language: how specific are transformers and traditional ML pipelines in a multi-class setting? | Lasse Hansen (Aarhus University) & Roberta Rocca (Aarhus University)
17-Nov-2022 | Meet and greet / intros
3-Nov-2022 | Speech and Voice-based Detection of Mental and Neurological Disorders: Traditional vs Deep Representation and Explainability | Björn Schuller (Imperial College London)
20-Oct-2022 | What do machines hear? Overview of deep learning approaches for representing voice | Gasser Elbanna (EPFL & MIT)