Computational models of early language acquisition and the role of different voices


In this project, we worked as an interdisciplinary team, combining hands-on expertise in the study of language acquisition at the Baby Research Center Nijmegen with computational modelling, to find out how babies take their first steps towards becoming native speakers.

Infants learn words from the speakers in their environment. This thesis was inspired by the question of how infants can discover words in the speech signal when they hear multiple voices. Detecting words is difficult because the speech signal is continuous and unlabelled, and it is not divided into shorter units that coincide with sound or word boundaries. The problem is further complicated by the presence of multiple speakers who all sound different. Until now, research on infants' input has concentrated on the main caregiver, usually the mother, on the assumption that the main caregiver provides most of the information babies need to learn their native language. The role of other speakers was largely unknown.

In this thesis, computational models were used to simulate the language acquisition process, specifically word learning. Using computational models allowed full control over the input and over all processes inside the simulated baby's mind. The models learned words directly from real speech, without any intermediate step that transforms the continuous signal into sequences of individual sounds or words. These models turned out to be able to learn words, and even to successfully simulate infants' behaviour in experiments. Across the chapters, both overt, measurable behaviour and the underlying abilities that infants might bring to the task of word learning were simulated.

The studies in this thesis revealed that hearing many speakers, both men and women, can help the word-learning process. Hearing variable input from multiple speakers generally led to successful word learning. Especially in more adverse conditions, such as hearing speech in the presence of background noise or encountering yet another unfamiliar speaker, the models that had learned from many speakers usually fared better. In conclusion, the variability that different voices introduce into the speech signal can be very valuable for word learning.

More Information

Promotor: Prof. Paula Fikkert

Promotor: Prof. Lou Boves

Co-Promotor: Dr. Louis ten Bosch