Rhythmic patterns and literary genres in synthesized speech

Author(s): Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo and David Guennec


Over the last twenty years, the quality of synthesized speech has improved greatly with the emergence of new TTS techniques, including corpus-based synthesis systems. Yet the rhythmic patterns obtained do not always sound natural, and they need to improve before synthesis can be used in a wide range of applications (games, educational software, etc.). In this paper, we compare the rhythmic patterns observed in natural and synthesized speech for three literary forms (rhymes, poems, and fairy tales) in order to evaluate how rhythm could be improved in synthesized speech. The study is based on a corpus of six rhymes, four poems and two extracts from fairy tales. All texts were recorded by three speakers and generated with two distinct synthesized voices. The rhythmic patterns are compared by analyzing duration in relation to prosodic structure. This approach shows that rhythmic differences between synthesized and natural speech are mostly due to the marking of prosodic structure, especially at the level of the intonational phrase (IP): accented syllables located at the end of IPs are lengthened much more in synthesized speech than in natural speech.
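
The abstract does not define how the lengthening rate is computed. As an illustration only, the sketch below assumes it is the ratio between the mean duration of accented IP-final syllables and the mean duration of unaccented syllables in the same recording. The `Syllable` record, its field names, and the `lengthening_rate` function are hypothetical and are not taken from the study; the durations in the usage note are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Syllable:
    """One annotated syllable (hypothetical annotation scheme)."""
    duration_ms: float  # measured syllable duration in milliseconds
    accented: bool      # True if the syllable carries an accent
    ip_final: bool      # True if it is the last syllable of an intonational phrase


def lengthening_rate(syllables):
    """Ratio of mean accented IP-final duration to mean unaccented duration.

    This definition is an assumption made for illustration; the study may
    use a different baseline or normalization.
    """
    final = [s.duration_ms for s in syllables if s.accented and s.ip_final]
    baseline = [s.duration_ms for s in syllables if not s.accented]
    if not final or not baseline:
        raise ValueError("need at least one IP-final accented and one unaccented syllable")
    return mean(final) / mean(baseline)
```

For example, an utterance with two unaccented syllables of 100 ms each and one accented IP-final syllable of 250 ms gives a rate of 2.5 under this definition. Computing the same ratio separately for a natural and a synthesized rendition of a text would allow the two to be compared at the IP level.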