My research broadly investigates how people perceive synthetic (computer-generated) voices. So far, I have approached this topic through two questions: 1) what factors affect how memorable or distinctive a synthetic voice is, and 2) do users socially evaluate synthetic voices similarly to human ones when those voices use social-like behaviors?
In the first line of research, I use experimental methods from psychology and speech perception to test listeners' recognition and recall of linguistic input when they hear speech from a computer voice. In one study, I used the Deese-Roediger-McDermott (DRM) task to test whether listeners' semantic gist memory varied with the acoustic quality of a synthetic voice. In one condition, participants listened to a neural synthetic TTS voice read the DRM lists, while in the other condition participants heard a roboticized TTS voice read the lists. We found that the roboticness of the voice did not affect listeners' encoding of the lists into gist memory. However, in a post-hoc analysis comparing our findings to the original DRM literature, we found that participants in our study showed lower overall recognition of list items than participants who heard human voices in that literature. This suggests that listeners might attend less to speech input from synthetic voices than to speech from human ones.
Following this, we investigated how robustly participants would encode distinct naturalistic synthetic voices compared to roboticized ones. In this study, participants heard four neural and four roboticized synthetic voices in a vowel identification task, and then completed a surprise voice recognition test in which they were asked to indicate whether a voice was "new" or "old" among a set of eight neural and roboticized synthetic voices.