Interspeech 2019 Special Session


Spoken Language Processing for Children's Speech

Special Session Motivation

This special session aims to bring together researchers and practitioners from academia and industry working on the challenging task of processing spoken language produced by children. While recent years have seen dramatic advances in the performance of a wide range of speech processing technologies (such as automatic speech recognition, speaker identification, speech-to-speech machine translation, and sentiment analysis), the performance of these systems often degrades substantially when they are applied to spoken language produced by children. This is due partly to a lack of large-scale data sets containing examples of children's spoken language that can be used to train models, and partly to the fact that children's speech differs from adult speech at many levels, including the acoustic, prosodic, lexical, morphosyntactic, and pragmatic levels. We envision that this session will bring together researchers working on processing children's spoken language for a variety of downstream applications to share their experiences about which approaches work best for this challenging population. Papers describing original work in, but not limited to, the following research areas will be solicited for inclusion in the special session:

  • Intra- and inter-speaker variability in children's speech
  • Age-dependent characteristics of spoken language
  • Acoustic, language and pronunciation modeling in ASR for children
  • Analysis of pathological speech
  • Paralinguistic information extraction from children's speech
  • Spoken dialogue systems
  • Multimodal speech-based child-machine interaction
  • Speech-enabled toys and games

What makes this Special Session "Special"?

• This special session is being organized by SIG-CHILD, the ISCA special interest group focusing on multimodal child-computer interaction, and continues a series of productive events that SIG-CHILD has hosted in the area of child-computer interaction and the analysis of children's speech since 2008. These prior events include six instantiations of the Workshop on Child Computer Interaction (WOCCI) as well as an Interspeech special session, all of which resulted in proceedings published in the ISCA archive; the details of these previous events are listed below:

o WOCCI 2008: co-located with ICMI (the International Conference on Multimodal Interaction), Chania, Crete, Greece, October 23, 2008

o WOCCI 2009: co-located with ICMI, Cambridge, MA, USA, November 2-4, 2009

o WOCCI 2012: co-located with Interspeech, Portland, OR, USA, September 14, 2012

o WOCCI 2014: co-located with Interspeech, Singapore, September 19, 2014

o Interspeech 2015 special session on "Speech and Language Processing of Children's Speech", Dresden, Germany, September 7, 2015

o WOCCI 2016: co-located with Interspeech, San Francisco, CA, USA, September 6-7, 2016

o WOCCI 2017: co-located with ICMI, Glasgow, Scotland, November 13, 2017

• The acoustic and linguistic characteristics of children's speech differ widely from those of adults, and the rising need for interactive voice interfaces between children and computers opens up challenging research issues about how to develop effective models for reliable processing of children's spoken language. The aim of this special session is to provide a focused venue at Interspeech 2019 for the presentation of research progress in all aspects of spoken language processing of children's speech, including multi-modal analysis.

• State-of-the-art speech and language processing systems are key components of next-generation child-centered computer interaction. Such technological advances are increasingly important in a world where education and health pose growing challenges to the well-being of our societies; notable examples include remedial treatments for children with and without disabilities and individualized learning systems. This special session will serve as a venue for presenting recent advances in these core technologies as well as experimental systems and prototypes that apply them in user-oriented downstream applications.

The Workshop on Speech and Language Technology in Education (SLaTE) will be held as a satellite event of Interspeech 2019 on September 20-21. SLaTE typically attracts around 60-70 attendees who conduct research on educational applications of speech processing technology, many of whom specialize in children's spoken language. A special session focusing on research into children's spoken language at Interspeech 2019 will create synergies with SLaTE 2019 and give these researchers an additional venue for interacting with researchers who work on processing children's speech in other, non-education-related domains and who may not attend SLaTE 2019.

Special Session Organizers

Keelan Evanini, Educational Testing Service, Princeton, USA (Chairperson of SIG-CHILD)

Maryam Najafian, CSAIL Lab, MIT, Cambridge, USA (Co-secretary of SIG-CHILD)

Saeid Safavi, CVSSP, University of Surrey, UK (Co-secretary of SIG-CHILD)

Kay Berkling, Baden-Wuerttemberg Cooperative State University, Karlsruhe, Germany (Past chair and advisory member of SIG-CHILD)

Accepted Papers

Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network

Fei Wu (Johns Hopkins University), Leibny Paola Garcia Perera (Johns Hopkins University), Dan Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)

A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of F0 in Vowel Perception

Gary Yeung (University of California, Los Angeles), Abeer Alwan (UCLA)

Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques

Robert Gale (Oregon Health & Science University), Liu Chen (Oregon Health & Science University), Jill Dolata (Oregon Health & Science University), Meysam Asgari (CSLU)

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

Manuel Sam Ribeiro (University of Edinburgh), Aciel Eshky (University of Edinburgh), Korin Richmond (University of Edinburgh), Steve Renals (University of Edinburgh)

Automated estimation of oral reading fluency during summer camp e-book reading with MyTurnToRead

Anastassia Loukina (Educational Testing Service), Beata Beigman Klebanov (Educational Testing Service), Patrick Lange (Educational Testing Service), Yao Qian (Educational Testing Service), Binod Gyawali (Educational Testing Service), Nitin Madnani (Educational Testing Service), Abhinav Misra (Educational Testing Service), Klaus Zechner (Educational Testing Service), Zuowei Wang (Educational Testing Service), John Sabatini (Educational Testing Service)

Sustained Vowel Game: a computer therapy game for children with dysphonia

Vanessa Lopes (Universidade Nova de Lisboa), Joao Magalhaes (Universidade Nova de Lisboa), Sofia Cavaco (Universidade Nova de Lisboa)