Identifying visual prosody: Where do people look?

Author(s): Simone Simonetti, Jeesun Kim and Chris Davis


Talkers produce different types of spoken prosody by varying acoustic cues (e.g., F0, duration, and amplitude), and they also make complementary head and face movements (visual prosody). Perceivers can categorise auditory and visual prosodic expressions with high accuracy. Earlier eye-tracking research, in which participants were trained to recognise the visual prosody of two-word sentences, found that the upper face is more critical than the lower face for determining prosody. However, recent studies using longer sentences have shown that untrained perceivers can match lower and upper faces across modalities. Given these findings, we aimed to extend the eye-tracking research by examining the gaze patterns of untrained participants as they judged the prosody of longer utterances. Twelve participants were presented with questions, narrow-focussed statements, or broad-focussed (neutral) statements in a three-alternative forced-choice identification task while their eye gaze was recorded. Identification accuracy was high (81-97%) and did not differ among expression types. For all expression types, participants gazed at the eye regions longer and more often than at the mouth region. They gazed less at the mouth region for questions than for broad- and narrow-focussed statements. These results are consistent with earlier research indicating the importance of the upper face for determining visual prosody.