(top) residual between speech and the "non-paralinguistic" vocalization; (middle) residual between the "non-paralinguistic" vocalization and the EGG signal; (bottom) the synthesized "non-paralinguistic" vocalization.
This project focuses on the creation of an auditory stimulus dataset consisting of non-linguistic vocalizations that parallel the original affective speech in prosody, energy, and average vocal tract filter response but lack linguistic content (i.e., no language-based semantic meaning). Synthesis of these transformed EGG ("tEGG") signals uses electroglottograph (EGG) signals recorded synchronously with the speech signals. The stimuli are currently undergoing affect-perception validation with a participant pool of neurotypical adults. Initial results show that the variation between affect ratings for speech and for the transformed audio is approximately equal to the variation within affect ratings for speech alone, indicating that our method effectively preserves speech affect while removing phonological meaning.
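A minimal sketch of how such a transformation could look, assuming the tEGG is produced by passing the EGG through a single LPC-derived average vocal-tract filter estimated from the speech and then matching the speech's frame-wise energy contour; the exact synthesis pipeline is not described here, and the function name is hypothetical:

```python
import numpy as np
import scipy.signal as sig
import librosa

def synthesize_tegg(speech, egg, lpc_order=24, frame=1024, hop=256):
    """Hypothetical sketch: shape the EGG with the speech's average
    vocal-tract filter and copy the speech's frame-wise energy contour.
    Assumes speech and egg are synchronously recorded, equal-length arrays;
    the EGG itself carries the prosody (f0) of the original utterance."""
    # Average vocal-tract filter response: one LPC fit over the whole utterance.
    a = librosa.lpc(speech, order=lpc_order)
    egg_filtered = sig.lfilter([1.0], a, egg)

    # Frame-wise RMS matching so the tEGG follows the speech energy contour.
    rms_speech = librosa.feature.rms(y=speech, frame_length=frame, hop_length=hop)[0]
    rms_egg = librosa.feature.rms(y=egg_filtered, frame_length=frame, hop_length=hop)[0]
    gain = rms_speech / (rms_egg + 1e-8)
    gain = np.interp(np.arange(len(egg_filtered)), np.arange(len(gain)) * hop, gain)
    tegg = egg_filtered * gain
    return tegg / (np.max(np.abs(tegg)) + 1e-8)  # peak-normalize for playback
```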
In collaboration with the Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, and funded by a Wu Tsai Neurosciences Seed Grant.
Our study aimed to further the understanding of emotion in speech and music by analyzing perception of a hybrid of the two: the singing voice. Previous studies have focused on the accuracy and development of emotional perception in response to speech, music, and affective vocal bursts. This study performed a simplified replication of the RAVDESS data collection study and integrated the analyses with the results of other speech, music, and affective burst studies. We asked fourteen participants to classify spoken and sung phrases as expressing one of six emotions. Our results align with those of the RAVDESS study and also support the findings of other vocal and musical burst studies that demographic characteristics of both the speaker and the listener affect the perceptual interpretation of the vocalization.
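The kind of demographic breakdown described above could be computed directly from the response table; this is an illustrative sketch with made-up column names and rows, not the study's actual data or schema:

```python
import pandas as pd

# Hypothetical response table: one row per trial, with the intended emotion,
# the listener's choice, the modality, and demographic fields.
df = pd.DataFrame({
    "intended":        ["happy", "sad", "angry", "happy"],
    "response":        ["happy", "sad", "fearful", "sad"],
    "modality":        ["spoken", "sung", "spoken", "sung"],
    "listener_gender": ["F", "M", "F", "M"],
    "speaker_gender":  ["M", "M", "F", "F"],
})
df["correct"] = df["intended"] == df["response"]

# Overall and per-emotion recognition accuracy.
print(df["correct"].mean())
print(df.groupby("intended")["correct"].mean())

# Accuracy split by modality and speaker/listener demographics, the kind of
# breakdown used to look for demographic effects on perception.
print(df.groupby(["modality", "listener_gender", "speaker_gender"])["correct"].mean())
```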
A short follow-up machine-perception study compared low-dimensional similarity measurements between machine classification of affect from vocal audio and human classification (2018-2019).
2D Similarity plot of machine-perceived emotions based on their acoustic features.
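One way such a low-dimensional comparison could be set up, assuming the human and machine results are each summarized as 6x6 emotion confusion matrices, embedded into 2D with MDS, and then aligned with a Procrustes fit; the matrices below are random placeholders, not study data:

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial import procrustes

# Placeholder 6x6 confusion matrices (rows = intended emotion,
# columns = perceived emotion), one per rater type.
rng = np.random.default_rng(0)
human_conf = rng.dirichlet(np.ones(6) * 2, size=6)
machine_conf = rng.dirichlet(np.ones(6) * 2, size=6)

def embed_2d(conf):
    # Symmetrized confusion -> dissimilarity -> 2D MDS embedding.
    sim = (conf + conf.T) / 2
    dis = 1.0 - sim / sim.max()
    np.fill_diagonal(dis, 0.0)
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(dis)

# Procrustes alignment yields a disparity score: how closely the two
# 2D emotion layouts agree after rotation, scaling, and translation.
_, _, disparity = procrustes(embed_2d(human_conf), embed_2d(machine_conf))
print(disparity)
```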