My research investigates how people perceive synthetic (computer-generated) voices.
So far, I've studied the perception of synthetic voices through two questions:
what factors influence the memorability and distinctiveness of a synthetic voice, and
how do users socially evaluate synthetic voices that imitate human social behavior?
Research on speech and memory suggests that the human mind is selective for human speech over other kinds of auditory input (Binder et al., 2000). I tested whether non-human speech carries a larger memorability penalty the less natural it sounds.
First, I tested whether voice naturalness affects how well listeners can identify a synthetic voice as unique. Participants were assigned to either a "naturalistic" or a "robotic" listening condition and completed a vowel categorization task with a single synthetic voice. They then took a surprise voice recognition test in which they judged whether each voice in a set of eight naturalistic and robotic voices was "new" or "old". Listeners in the robotic condition struggled to discriminate naturalistic voices from robotic ones, as illustrated here:
In short, participants in the robotic condition remembered the voice they had heard less reliably than participants in the naturalistic condition. This supports the idea that the more naturalistic a synthetic voice is, the more memorable it is to listeners.
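For readers unfamiliar with old/new recognition tests, one common way to summarize performance is the sensitivity index d' from signal detection theory, which separates a listener's ability to tell old voices from new ones from their overall tendency to answer "old". The sketch below is only an illustration of that general technique: the function name, the log-linear correction, and the example counts are all assumptions made for this example, and it is not necessarily the analysis or the data from this study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' for an old/new recognition test.

    hits / misses count responses to previously heard ("old") voices;
    false_alarms / correct_rejections count responses to "new" voices.
    """
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # count and 1 to each total so that rates of exactly 0 or 1 never reach
    # the inverse normal CDF, where they would be undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for two listeners, invented purely for illustration.
listener_a = d_prime(hits=7, misses=1, false_alarms=1, correct_rejections=7)
listener_b = d_prime(hits=4, misses=4, false_alarms=4, correct_rejections=4)
print(f"listener A d' = {listener_a:.2f}")
print(f"listener B d' = {listener_b:.2f}")
```

A d' near zero means the listener could not tell old voices from new ones; larger positive values mean better discrimination.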
In another experiment, I used the Deese-Roediger-McDermott (DRM) task from behavioral psychology to test whether listeners' semantic gist memory was better for word lists produced by naturalistic synthetic voices than for lists produced by robotic ones. I found that the roboticness of the voice did not affect listeners' encoding of semantic gist. When I compared my findings to the original DRM literature, however, my participants showed lower overall recognition of list items than listeners who heard human voices. This result supports the "human voices are specially encoded" theory proposed by Binder et al. (2000).