WOCCI 2017 – Workshop on Child Computer Interaction

Satellite event of ICMI 2017, Glasgow, Scotland, November 13, 2017


Accepted Papers

Sadeen Alharbi, Anthony Simons, Shelagh Brumfitt and Phil Green. Automatic recognition of children's read speech for stuttering application

Eleni Chatzidaki, Michalis Xenos and Charikleia Machaira. A Natural User Interface Game for the Evaluation of Children with Learning Difficulties

Erika Godde, Gerard Bailly, David Escudero, Marie-Line Bosse and Estelle Gillet-Perret. Evaluation of Reading Performance of Primary School Children: Objective Measurements vs. Subjective Ratings

Angela Grimminger and Katharina J. Rohlfing. “Can you teach me?” – Children teaching new words to a robot in a book reading scenario

André Grossinho, Joao Magalhaes and Sofia Cavaco. Visual-feedback in an interactive environment for speech-language therapy

Athanasia Kolovou, Elias Iosif and Alexandros Potamianos. Lexical and affective models in early acquisition of semantics

Anastassia Loukina, Beata Beigman Klebanov, Patrick Lange, Binod Gyawali and Yao Qian. Developing speech processing technologies for shared book reading with a computer

Maxime Portaz, Adela Barbulescu, Maxime Garcia, Antoine Begault, Laurence Boissieux, Marie-Paule Cani, Rémi Ronfard and Dominique Vaufreydaz. Figurines, a multimodal framework for tangible storytelling

Saeid Safavi and Lily Meng. Comparison of two scoring method within i-vector framework for speaker recognition from children’s speech

Invited Presentation:

Professor Martin Russell

Automatic Recognition of Children’s Speech for Child-Computer Interaction


Since the first paper on the subject appeared in the mid-1990s, it has been known that automatic speech recognition (ASR) is more challenging for children’s speech than for that of adults. Due to their shorter vocal tracts, spectral structures, such as formants, occur at higher frequencies in children’s speech, and spectral resolution is poor because of their higher fundamental frequencies. A number of studies have demonstrated greater variability in a range of acoustic parameters in children’s speech, but whether this is due to poor motor control or to cognitive, phonological factors associated with language acquisition is not clear. Consequently, reported ASR error rates have typically been more than 100% greater for child speech than for adult speech on comparable tasks. This is a significant problem because ASR has a number of compelling applications with children, and for some of these applications, such as pronunciation or reading tuition, ASR is the key enabling technology and not just an alternative means of interaction. A number of these applications are particularly demanding because they involve very young children and require accurate recognition at the phone level.

In the past decade, conventional hidden Markov model (HMM) based approaches to ASR, which were the subject of incremental development since the mid-1980s, have been outperformed by deep neural network (DNN) methods based on deep learning. Most recently, these DNN-HMM hybrid systems have been applied to children’s speech. This talk will chart progress in ASR for children’s speech from the early work in the 1990s to today’s DNN-HMM based systems. As well as tracking improvements in recognition accuracy, I will try to measure progress in terms of how our understanding of variability in children’s speech has improved, and how research in speech technology might even give new insights into the nature of children’s speech. I will also compare results obtained using state-of-the-art systems in laboratories with the experiences of those trying to deploy ASR in real child-computer interaction applications. Finally, I will outline the most interesting challenges for the future and how they might be addressed.


Martin Russell is Professor of Information Engineering in the School of Electronic, Electrical and Systems Engineering. He joined the University of Birmingham in 1998 and was Head of School from 2006 until 2009. His research interests are in speech and language technology and the integration of speech with other modalities, for example gaze and gesture. He has published over 100 research papers in these areas.