Join our network to engage in seminars and discussions that advance discovery and collaboration around cutting-edge research in speech and language health signals.
November 10, 2025 · 11:00 AM (Eastern Time)
Dr. Mark A. Hasegawa-Johnson, Professor of Electrical and Computer Engineering
University of Illinois Urbana-Champaign
Dr. Mark Hasegawa-Johnson is the M.E. Van Valkenburg Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. His research converts facts about speech production into low-resource transfer learning algorithms that can be used to make speech technology more fair, more inclusive, and more accessible. His work has been featured in online stories by the Wall Street Journal, CNN, CNET, and The Atlantic. Dr. Hasegawa-Johnson is a Fellow of the IEEE, of the Acoustical Society of America, and of the International Speech Communication Association, and he is currently Editor-in-Chief of the IEEE Transactions on Audio, Speech, and Language Processing.
Speech as a modality for the characterization and adaptation of neurodiversity
Parents without medical expertise may seek help to integrate their children into home life, school, and society. Neuromotor conditions such as cerebral palsy (CP) and Down syndrome (DS) are typically diagnosed prenatally or at birth, but may generate challenges later; conditions such as apnea, anxiety, autism, and developmental language delay may remain undetected until their behavioral correlates have caused problems. Artificial intelligence has the potential to characterize neurodiversity early in life, and to adapt its behavior in order to help the child and her parents learn together. Wearables such as LittleBeats™ have been shown to discriminate sleep versus waking, and monologue versus dialogue infant vocalizations; with these abilities, an infant wearable has the potential to detect behavioral disorders early in life, and to help the parents find accommodations. Accurate tests of developmental language delay exist for children 3-5 years of age, and professional speech and language treatments have been shown to improve outcomes: automatic speech recognition (ASR) for young children has the potential to make these treatments available to all children who need them. Thanks to the Speech Accessibility Project, ASR error rates for adults with Parkinson's disease halved this year, and there is reason to believe that similarly large improvements for adults with CP and DS could help grant them better access to economic and social opportunities. In these and other ways, artificially intelligent speaking agents have the potential to bridge gaps in society, and improve inter-human interaction.
Meetings are held on Zoom (75 minutes) about once a month, typically on the 1st Monday of each month at 8 am PT / 11 am ET / 4 pm London / 5 pm Berlin / 11 pm Beijing.
Join our mailing list to receive updates and the Zoom link 📙!
For questions, to suggest speakers, or to propose papers for a journal club, reach out to Jingyao Wu 📧 or Ahmed Yousef 📧
This network was co-founded in 2022 by Daniel Low, Tanya Talkar, Daryush Mehta, Satrajit Ghosh, and Tom Quatieri as the Harvard-MIT Speech and Language Biomarker Interest Group.
It seeks to bring together researchers and students from around the world to share novel research, receive feedback, discuss papers, and kickstart collaborations.
Jingyao Wu, PhD, MIT
Ahmed Yousef, PhD, Massachusetts General Hospital & Harvard Medical School
Daniel Low, PhD, Child Mind Institute & Harvard University
Fabio Catania, PhD, MIT
Nick Cummins, PhD, King's College London
Hamzeh Ghasemzadeh, PhD, University of Central Florida
Rahul Brito, Harvard & MIT
Tanya Talkar, PhD, Linus Health
Daryush Mehta, PhD, Massachusetts General Hospital & Harvard Medical School
Satrajit Ghosh, PhD, MIT McGovern Institute for Brain Research
Thomas Quatieri, PhD, MIT Lincoln Laboratory
Curated materials (tools, datasets, readings) to support exploration and innovation in the field.
Tools
Audio
senselab: a Python package that simplifies building pipelines for digital biometric analysis of speech and voice (see the sketch after this list).
Riverst: a multimodal avatar that interacts with users and collects audio and video data.
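For a flavor of the kind of acoustic-feature pipeline these tools streamline, here is a minimal sketch using praat-parselmouth (a separate library; senselab's actual API differs, and the file path is a placeholder):

```python
# Minimal acoustic-feature sketch (pip install praat-parselmouth numpy).
# Illustrative only: this does not use senselab's API.
import numpy as np
import parselmouth

snd = parselmouth.Sound("speech.wav")  # placeholder path to a mono recording

# Fundamental frequency (F0) track; unvoiced frames are reported as 0 Hz.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
voiced = f0[f0 > 0]

# Intensity contour in dB.
intensity = snd.to_intensity()

print(f"duration: {snd.duration:.2f} s")
print(f"mean F0 (voiced frames): {voiced.mean():.1f} Hz")
print(f"mean intensity: {intensity.values.mean():.1f} dB")
```

Summary statistics like these (mean F0, intensity, voicing rate) are typical low-level inputs to the speech-biomarker models discussed in the readings below.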
Text
Quick spaCy-based text metrics (see the sketch after this list): https://github.com/HLasse/TextDescriptives and https://github.com/novoic/blabla
Suicide Risk Lexicon, lexicon building with LLMs, and semantic similarity: https://github.com/danielmlow/construct-tracker
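Here is a minimal sketch of pulling quick text metrics with TextDescriptives as a spaCy pipeline component (component name and extraction call follow the TextDescriptives v2 documentation; adjust to your installed version):

```python
# Quick text metrics via TextDescriptives as a spaCy component
# (pip install spacy textdescriptives; python -m spacy download en_core_web_sm).
import spacy
import textdescriptives as td

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/descriptive_stats")  # token/sentence summary stats

doc = nlp("I feel it in the water. I feel it in the earth. I smell it in the air.")

# One row per document: token counts, sentence-length stats, etc.
df = td.extract_df(doc)
print(df.T)
```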
Datasets
Audio and text
Many voice and speech datasets: Alden Blatter, Hortense Gallois, Samantha Salvi Cruz, Yael Bensoussan, Bridge2AI Voice Consortium, Maria Powell, Jean-Christophe Bélisle-Pipon. (2025). “Global Voice Datasets Repository Map.” Voice Data Governance. https://map.b2ai-voice.org/.
Bridge2AI Voice Dataset https://b2ai-voice.org/the-b2ai-voice-database/
Facebook's large-scale multimodal dataset of 4,000+ hours of human interactions for AI research: https://github.com/facebookresearch/seamless_interaction
Audio
CLAC: A Speech Corpus of Healthy English Speakers
Many speech datasets: https://github.com/jim-schwoebel/allie/tree/master/datasets#speech-datasets
Many audio visual datasets: https://github.com/krantiparida/awesome-audio-visual#datasets
Text
Many text datasets: https://lit.eecs.umich.edu/downloads.html#undefined
Many text datasets: https://github.com/niderhoff/nlp-datasets
Readings
Audio
Ramanarayanan, V., Lammert, A. C., Rowe, H. P., Quatieri, T. F., & Green, J. R. (2022). Speech as a biomarker: Opportunities, interpretability, and challenges. Perspectives of the ASHA Special Interest Groups, 7(1), 276-283.
Low, D. M., Bentley, K. H., & Ghosh, S. S. (2020). Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investigative Otolaryngology, 5(1), 96-116.
Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., & Quatieri, T. F. (2015). A review of depression and suicide risk assessment using speech analysis. Speech Communication, 71, 10-49.
Patel, R. R., Awan, S. N., Barkmeier-Kraemer, J., Courey, M., Deliyski, D., Eadie, T., ... & Hillman, R. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887-905.
Text
Mihalcea, R., Biester, L., Boyd, R. L., Jin, Z., Perez-Rosas, V., Wilson, S., & Pennebaker, J. W. (2024). How developments in natural language processing help us in understanding human behaviour. Nature Human Behaviour, 8(10), 1877-1889.
Stade, E. C., Stirman, S. W., Ungar, L. H., Boland, C. L., Schwartz, H. A., Yaden, D. B., ... & Eichstaedt, J. C. (2024). Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. NPJ Mental Health Research, 3(1), 12.
Low, D. M., Mair, P., Nock, M. K., & Ghosh, S. S. Text psychometrics: Assessing psychological constructs in text using natural language processing. PsyArXiv.
Most recordings are available upon request.
6-Oct-2025 | From Noise to Signal: Individual Variability in Voice Fatigue Subtyping | Mark Berardi (University of Iowa)
19-May-2025 | Building your research team: Who should be in the room where it happens? | Maria Powell (Vanderbilt University Medical Center)
5-May-2025 | Clinical theory and dimensions of speech markers: Psychosis as a case study | Lena Palaniyappan (Professor of Psychiatry, McGill)
7-Apr-2025 | Toward generalizable machine learning models in speech, language, and hearing sciences: Estimating sample size and reducing overfitting | Hamzeh Ghasemzadeh (Massachusetts General Hospital & Harvard Medical School)
17-Mar-2025 | Exploring the Mechanistic Role of Cognition in the Relationship between Major Depressive Disorder and Acoustic Features of Speech | Lauren White (King’s College London)
3-Mar-2025 | Speech as a Biomarker for Disease Detection | Catarina Botelho (INESC-ID, University of Lisbon)
17-Feb-2025 | Exploring Intraspeaker Variability in Vocal Hyperfunction Through Spatiotemporal Indices of RFF | Jenny Vojtech (Boston University)
20-Jan-2025 | Clinically meaningful speech-based endpoints in clinical trials | Julie Liss (Arizona State University)
3-Dec-2024 | Revealing Confounding Biases: A Novel Benchmarking Approach for Aggregate-Level Performance Metrics in Health Assessments | Roseline Polle (Thymia)
18-Nov-2024 | The interplay between signal processing and AI to achieve enhanced and trustworthy interaction systems | Ingo Siegert (Otto-von-Guericke-University Magdeburg)
4-Nov-2024 | Remote Voice Monitoring System for Patients with Heart Failure | Fan Wu (ETH Zurich)
6-May-2024 | Building Speech-Based Affective Computing Solutions by Leveraging the Production and Perception of Human Emotions | Carlos Busso (UT Dallas)
1-Apr-2024 | Parkinson's speech | Juan Ignacio Godino-Llorente (Universidad Politécnica de Madrid)
4-Mar-2024 | Modelling individual and cross-cultural variation in the mapping of emotions to speech prosody | Pol van Rijn (Max Planck Institute for Empirical Aesthetics)
5-Feb-2024 | Speech Analysis for Intent and Session Quality Assessment in Motivational Interviews | Mohammad Soleymani (USC)
20-Nov-2023 | High-speed videoendoscopy | Maryam Naghibolhosseini (Michigan State University)
16-Oct-2023 | Estimation of parameters of the phonatory system from voice | Zhaoyan Zhang (UCLA Head and Neck Surgery)
18-Sep-2023 | Democratizing speaker diarization with pyannote | Hervé Bredin (Institut de Recherche en Informatique de Toulouse) and Marvin Lavechin (Meta AI, ENS)
7-Aug-2023 | Overview of Zero-Shot Multi-speaker TTS Systems | Edresson Casanova (Coqui)
17-Jul-2023 | Considerations for Identifying Biomarkers of Spoken Language Outcomes for Neurodevelopmental Conditions | Karen Chenausky (Harvard Medical School & Massachusetts General Hospital)
15-May-2023 | The potential of smartphone voice recordings to monitor depression severity | Nicholas Cummins (King's College London)
1-May-2023 | Reading group session: Introductory overview of self-supervised learning, transformers, and attention | Daniel Low (Harvard & MIT)
20-Mar-2023 | Accuracy of Acoustic Measures of Voice via Telepractice Videoconferencing Platforms | Hasini Weerathunge (Boston University)
6-Mar-2023 | Casual discussion on audio quality control and preprocessing | Daniel Low (Harvard & MIT)
26-Jan-2023 | Developing speech-based clinical analytics models that generalize: Why is it so hard and what can we do about it? | Visar Berisha (Arizona State University)
12-Jan-2023 | Using knockoffs for controlled predictive biomarker identification | Kostas Sechidis (Novartis)
15-Dec-2022 | Provide ideas and feedback on the protocol for a large-scale data collection effort (N=5k) on mental health and voice from the NIH Bridge2AI | Daniel Low (Harvard & MIT)
1-Dec-2022 | Inferring neuropsychiatric conditions from language: how specific are transformers and traditional ML pipelines in a multi-class setting? | Lasse Hansen (Aarhus University) & Roberta Rocca (Aarhus University)
17-Nov-2022 | Meet and greet / intros
3-Nov-2022 | Speech and Voice-based Detection of Mental and Neurological Disorders: Traditional vs Deep Representation and Explainability | Björn Schuller (Imperial College London)
20-Oct-2022 | What do machines hear? Overview of deep learning approaches for representing voice | Gasser Elbanna (EPFL & MIT)