WOCCI 2017 – Workshop on Child Computer Interaction

Satellite event of ICMI 2017, Glasgow, Scotland, November 13, 2017

Program Schedule

8:00 - 13:00: WOCCI registration

12:30 - 13:00: lunch (buffet lunch provided by ICMI)

13:00 - 13:15: WOCCI welcome

13:15 - 14:15: Invited presentation: Martin Russell. Automatic Recognition of Children’s Speech for Child-Computer Interaction

14:20 - 15:00: Oral Session I (Oral reading assessment)

15:00 - 16:15: poster session (jointly with ICMI coffee break)

16:15 - 17:15: Oral Session II (Health-related applications)

17:15 - 17:30: WOCCI closing

17:30 - 20:00: WOCCI dinner (Brel Bar, 37-43 Ashton Lane, Glasgow)

Invited Presentation:

Professor Martin Russell

Automatic Recognition of Children’s Speech for Child-Computer Interaction


Since the first paper on the subject appeared in the mid-1990s, it has been known that automatic speech recognition (ASR) is more challenging for children’s speech than for that of adults. Due to their shorter vocal tracts, spectral structures, such as formants, occur at higher frequencies in children’s speech, and spectral resolution is poor because of their higher fundamental frequencies. A number of studies have demonstrated greater variability in a range of acoustic parameters in children’s speech, but whether this is due to poor motor control or cognitive, phonological factors associated with language acquisition is not clear. Consequently, reported ASR error rates have typically been more than 100% greater for child speech than adult speech on comparable tasks. This is a significant problem because ASR has a number of compelling applications with children, and for some of these applications, such as pronunciation or reading tuition, ASR is the key enabling technology, and not just an alternative means of interaction. A number of these applications are particularly demanding because they involve very young children and require accurate recognition at the phone level.

In the past decade, conventional hidden Markov model (HMM) based approaches to ASR, which had been the subject of incremental development since the mid-1980s, have been outperformed by deep neural network (DNN) methods based on deep learning. Most recently these DNN-HMM hybrid systems have been applied to children’s speech. This talk will chart progress in ASR for children’s speech from the early work in the 1990s to today’s DNN-HMM based systems. As well as tracking improvements in recognition accuracy, I will try to measure progress in terms of how our understanding of variability in children’s speech has improved, and how research in speech technology might even give new insights into the nature of children’s speech. I will also compare results obtained using state-of-the-art systems in laboratories with the experiences of those trying to deploy ASR in real child-computer interaction applications. Finally, I will outline the most interesting challenges for the future and how they might be addressed.


Martin Russell is Professor of Information Engineering in the School of Electronic, Electrical and Systems Engineering at the University of Birmingham. He joined the university in 1998 and was Head of School from 2006 until 2009. His research interests are in speech and language technology and the integration of speech with other modalities, for example gaze and gesture. He has published over 100 research papers in these areas.