[05/09/2024]: Thu-SS-8 – Connecting Speech Science and Speech Technology for Children's Speech
• 13:30–13:35 Organizers' welcome and overview
• 13:35–14:40 Poster session
• 14:45–15:30 Panel discussion
A quick introduction to the special session by the session chairs.
[All papers will be presented as posters.]
Kindly follow the Interspeech poster presentation guidelines [Link].
#159 Bridging Child-Centered Speech Language Identification and Language Diarization via Phonetics Yujia Wang(Johns Hopkins University); Paola Garcia (Johns Hopkins University); Hexin Liu (Nanyang Technological University)
#485 Improving child speech recognition with augmented child-like speech Yuanyuan Zhang (Delft University of Technology); Zhengjun Yue (Delft University of Technology); Tanvina Patel (Multimedia Computing, Delft University of Technology); Odette Scharenborg (Multimedia Computing, Delft University of Technology)
#499 Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch Thomas Graave (Technische Universität Braunschweig); Zhengyang Li (Technische Universität Carolo-Wilhelmina Braunschweig); Timo Lohrenz (Technische Universität Braunschweig); Tim Fingscheidt (Technische Universität Braunschweig)
#540 Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis Jialu Li (UIUC); Mark A Hasegawa-Johnson (University of Illinois); Karrie Karahalios (University of Illinois at Urbana-Champaign)
#717 Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions Anfeng Xu (University of Southern California); Kevin Y Huang (University of Southern California); Tiantian Feng (University of Southern California); Shen Lue (Boston University); Helen Tager-Flusberg (Boston University); Shrikanth Narayanan (USC)
#992 Training speech-breathing coordination in computer-assisted reading Delphine Charuau (GIPSA-Lab); Andrea Briglia (GIPSA-Lab, Université Grenoble Alpes); Erika Godde (LEAD, Université de Bourgogne); Gérard Bailly (GIPSA-Lab/CNRS)
#1095 Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning Lucas Block Medin (Lalilo by Renaissance Learning); Thomas Pellegrini (IRIT); Lucile Gelin (Lalilo)
#1102 Introduction to Partial fine-tuning: A comprehensive evaluation of end-to-end children’s automatic speech recognition adaptation Thomas Rolland (INESC-ID); Alberto Abad (INESC-ID/IST)
#1114 Examining Vocal Tract Coordination in Childhood Apraxia of Speech with Acoustic-to-Articulatory Speech Inversion Feature Sets Nina R Benway (University of Maryland); Jonathan L Preston (Syracuse University); Carol Y Espy-Wilson (University of Maryland)
#1180 Reading Miscue Detection in Primary School through Automatic Speech Recognition Lingyun Gao (Radboud University Nijmegen); Cristian Tejedor-Garcia (Radboud University Nijmegen); Helmer Strik (Radboud University Nijmegen); Catia Cucchiarini (Radboud University Nijmegen)
#1353 Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models Ruchao Fan (University of California, Los Angeles); Natarajan Balaji Shankar (University of California Los Angeles); Abeer Alwan (UCLA)
#1359 Preliminary Investigation of Psychometric Properties of a Novel Multimodal Dialog Based Affect Production Task in Children and Adolescents with Autism Carly Demopoulos (UCSF); Linnea Lampinnen (UCSF); Cristian Preciado (UCSF); Hardik Kothare (Modality.AI); Vikram Ramanarayanan (University of California, San Francisco & Modality.AI)
#2125 Automatic Evaluation of a Sentence Memory Test for Preschool Children Ilja Baumann (Technische Hochschule Nürnberg Georg Simon Ohm); Nicole Unger (Technische Hochschule Nürnberg Georg Simon Ohm); Dominik Wagner (Technische Hochschule Nürnberg Georg Simon Ohm); Korbinian Riedhammer (Technische Hochschule Nürnberg Georg Simon Ohm); Tobias Bocklet (TH Nürnberg)
#2239 How Does Alignment Error Affect Automated Pronunciation Scoring in Children's Speech? Prad Kadambi (Arizona State University); Tristan J Mahr (University of Wisconsin-Madison); Lucas Annear (University of Wisconsin-Madison); Henry Nomeland (University of Wisconsin-Madison); Julie Liss (Arizona State University); Katherine Hustad (University of Wisconsin-Madison); Visar Berisha (Arizona State University)
#2481 Children’s Speech Recognition through Discrete Token Enhancement Vrunda N Sukhadia (QCRI); Shammur Chowdhury (QCRI)
Panel members:
Beena Ahmed, University of New South Wales, Associate Professor
Torbjørn Svendsen, Norwegian University of Science and Technology, Professor
Odette Scharenborg, Delft University of Technology, the Netherlands, Associate Professor
Thomas Rolland, Instituto Superior Técnico, Universidade de Lisboa (INESC-ID), Postdoc
Kay Berkling, Baden-Wuerttemberg Cooperative State University (DHBW), Professor
Nina R Benway, University of Maryland (A. James Clark School of Engineering), Postdoctoral Fellow
Torbjørn Svendsen, Professor
Norwegian University of Science and Technology (NTNU), Norway
Torbjørn Svendsen is a Professor at the Department of Electronic Systems, NTNU. He holds an MScEE and a PhD, both from NTNU.
His research interests have centered on speech signal processing since 1979. The first period focused on source coding, i.e. speech compression, which was also the subject of his doctoral thesis. Since the mid-1980s his research has mainly concerned automatic speech recognition, but has also included areas such as spoken dialogue systems and speech synthesis; speech analysis methods and lexical modelling, e.g. pronunciation modelling, have been two central topics. Observing that current approaches to speech recognition appear to be nearing a saturation point in terms of performance, a major recent activity has been investigating new paradigms for speech recognition that aim to integrate phonetic and linguistic knowledge in a statistical framework based on the detection of (language-universal) phonetic features. Lately, the challenges of reliably recognizing children's speech and transcribing conversational, accented and dialectal speech have been central to his research.
Thomas Rolland, Postdoc
Instituto Superior Técnico, Universidade de Lisboa (INESC-ID), Lisbon
Thomas Rolland is a postdoctoral researcher at INESC-ID Lisbon, where he specializes in improving automatic speech recognition systems for children. He completed his PhD in Lisbon with a dissertation titled "Towards Improving Automatic Speech Recognition for Children." His research centers on developing parameter-efficient methods that accurately capture the unique characteristics of children's speech, even with limited datasets, thus ensuring improved performance and adaptability. He also works on leveraging synthetic data for data augmentation and on strategies to reduce discrepancies between real and imperfect synthetic data, with a particular focus on synthetic children's speech.
Beena Ahmed, Associate Professor
University of New South Wales (UNSW), Sydney
Dr. Beena Ahmed is an Associate Professor in Signal Processing with the School of Electrical Engineering and Telecommunications at UNSW. She received her B.Sc. in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan, in 1993 and her Ph.D. from UNSW in 2004. She joined UNSW in 2017; prior to that, she was an Assistant Professor at Texas A&M University at Qatar.
Dr. Ahmed has been awarded international research grants for projects on long-term insomnia monitoring and remote speech therapy. She also received funding to use wearable physiological sensors to identify physiological correlates of mental stress and to adopt these correlates in biofeedback mobile games that teach users relaxation skills. Her current research interests are in applying machine learning and remote monitoring to healthcare and therapeutic applications.
Odette Scharenborg, Associate Professor
Delft University of Technology, the Netherlands
In our Delft Inclusive Multimodal Speech Lab, we work on developing inclusive speech technology, i.e., technology that can be used by everyone, irrespective of the way they speak. We focus particularly on speaker groups whose speech patterns deviate from those of typical speakers, including children, non-native speakers and dysarthric speakers. Our research focuses on quantifying the bias against these diverse speaker groups and, perhaps even more importantly, on mitigating it. We are involved in several collaborations with hospitals and other associations, focusing on the automatic analysis and recognition of the speech of typically developing children, children with developmental disorders, and children who stutter or have a cleft palate.
Kay Berkling, Professor
Baden-Wuerttemberg Cooperative State University (DHBW)
Over the last decade, Kay Berkling has focused on advancing education through digital technologies, particularly in the areas of gamification and language acquisition. As a professor of computer science at the Cooperative State University (DHBW) Karlsruhe, she has been deeply involved in projects that use game-based learning to improve orthography and language skills. Her research aims to make learning more engaging by integrating emotions into the educational process, challenging the traditional view that play and learning are opposites.
Zhengjun Yue, Delft University of Technology, the Netherlands, Assistant Professor