Text-to-speech systems usually include modules that predict the prosodic parameters (typically F0 and duration) of the utterances to be generated. These modules rely on duration and F0 models to predict the prosodic values of each sound. Within this general framework, this line of research explores heuristic and parametric approaches (those based on rules and linguistic knowledge) to prosody prediction in text-to-speech systems, for both neutral and expressive speech. So far, it has resulted in:
The development of GenProso, a duration and F0 prediction module for text-to-speech conversion.
The definition of a procedure for the parametric modification of prosodic values (F0 and duration) to generate expressive synthetic speech.
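
To make the idea of parametric modification concrete, the following is a minimal sketch of how neutral prosodic predictions might be transformed with a small set of parameters. It is not GenProso's code or the procedure defined in this line of research: the names `Sound`, `ExpressiveProfile` and `apply_profile`, the three scaling factors, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Sound:
    """Hypothetical container for the prosodic values predicted for one sound."""
    label: str
    duration_ms: float
    f0_hz: float


@dataclass
class ExpressiveProfile:
    """Hypothetical parameter set; every factor is multiplicative (1.0 = no change)."""
    f0_level: float   # scales the mean F0 of the utterance
    f0_range: float   # scales F0 excursions around the mean
    duration: float   # scales every duration


def apply_profile(sounds: List[Sound], profile: ExpressiveProfile) -> List[Sound]:
    """Return a modified copy of the neutral predictions (assumed transformation)."""
    if not sounds:
        return []
    base_f0 = mean(s.f0_hz for s in sounds)
    modified = []
    for s in sounds:
        excursion = s.f0_hz - base_f0
        new_f0 = base_f0 * profile.f0_level + excursion * profile.f0_range
        modified.append(Sound(s.label, s.duration_ms * profile.duration, new_f0))
    return modified


# Illustrative values only: a raised, wider F0 contour with slightly faster speech.
neutral = [Sound("o", 90.0, 100.0), Sound("l", 60.0, 120.0), Sound("a", 110.0, 110.0)]
expressive = apply_profile(neutral, ExpressiveProfile(f0_level=1.2, f0_range=1.5, duration=0.9))
```

Separating F0 level from F0 range in this sketch lets the overall pitch and the size of pitch movements be adjusted independently; a real procedure could equally use other parameters or apply them per phrase rather than per utterance.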