Paralinguistic cues, the non-lexical components of speech, play a crucial role in human-human interaction. Paralinguistic tasks, particularly the detection of emotional expression from speech, have limited access to large datasets with accurate labels, which makes it difficult to train models that capture paralinguistic attributes via the supervised learning paradigm. In this work, we propose the Expressive Voice Conversion Autoencoder (EVoCA), a framework for capturing paralinguistic (e.g., emotion) attributes from large-scale (i.e., 200 hours) audio-textual data without requiring manual emotion annotations. The proposed network learns what makes speech expressive, in an unsupervised manner, through conversion between synthesized (neutral) speech and real (expressive) speech. We demonstrate that the learned EVoCA embeddings outperform Mel-spectrum-based acoustic features and other current unsupervised methods on emotion and speaking-style classification tasks. A minimal sketch of the conversion idea follows below.
-- NAACL 2021 [Accepted]
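As a rough illustration of the conversion-autoencoder idea described above, here is one way it could look in PyTorch. This is a hedged sketch, not the paper's actual architecture: the `ExpressiveEncoder`/`ConversionDecoder` names, layer sizes, and L1 reconstruction loss are all assumptions, and it assumes the neutral and expressive mel-spectrograms are frame-aligned.

```python
# Illustrative sketch only: an encoder extracts an expressive embedding from
# real speech, and a decoder converts synthesized-neutral speech toward the
# real speech conditioned on that embedding.
import torch
import torch.nn as nn

class ExpressiveEncoder(nn.Module):
    """Encodes a real (expressive) utterance into a fixed-size embedding."""
    def __init__(self, n_mels=80, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, 256, batch_first=True)
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, mel):             # mel: (batch, time, n_mels)
        _, h = self.rnn(mel)            # h: (1, batch, 256)
        return self.proj(h.squeeze(0))  # (batch, emb_dim)

class ConversionDecoder(nn.Module):
    """Converts a synthesized (neutral) utterance toward the real one,
    conditioned on the expressive embedding."""
    def __init__(self, n_mels=80, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels + emb_dim, 256, batch_first=True)
        self.out = nn.Linear(256, n_mels)

    def forward(self, neutral_mel, emb):
        # Broadcast the embedding across every frame of the neutral input.
        emb_t = emb.unsqueeze(1).expand(-1, neutral_mel.size(1), -1)
        h, _ = self.rnn(torch.cat([neutral_mel, emb_t], dim=-1))
        return self.out(h)

encoder, decoder = ExpressiveEncoder(), ConversionDecoder()
real_mel = torch.randn(4, 200, 80)     # real, expressive speech (dummy data)
neutral_mel = torch.randn(4, 200, 80)  # TTS-synthesized neutral speech
emb = encoder(real_mel)                # the learned expressive embedding
recon = decoder(neutral_mel, emb)
loss = nn.functional.l1_loss(recon, real_mel)  # reconstruction objective
```

Because the decoder already receives the neutral content, the embedding is pushed to carry only what the neutral signal lacks, i.e., the expressive residual.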
Automatic speech recognition (ASR) is a key component of automatic aphasic speech analysis. However, the current approach of using a standard, one-size-fits-all ASR model can be sub-optimal given the wide range of speech intelligibility that exists both within and across speakers. In this work, we investigate the importance of speech intelligibility for ASR modeling. We show how speech intelligibility can be estimated with a neural network, and how intelligibility variability can be addressed within our acoustic model architecture using a mixture of experts (sketched below). Our results show that this model yields significant phone recognition improvements over a traditional, one-size-fits-all model.
-- Interspeech 2020 [Paper]
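The mixture-of-experts idea above can be sketched as follows, assuming PyTorch. This is not the paper's exact model: the number of experts, the layer sizes, and gating directly on a scalar intelligibility estimate are illustrative assumptions.

```python
# Illustrative sketch: an estimated intelligibility score gates a weighted
# combination of expert subnetworks inside the acoustic model.
import torch
import torch.nn as nn

class IntelligibilityGatedMoE(nn.Module):
    def __init__(self, in_dim=40, hid_dim=256, n_experts=3):
        super().__init__()
        # One expert per intelligibility band (e.g., low / mid / high).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
            for _ in range(n_experts)
        )
        # Gate maps the scalar intelligibility estimate to expert weights.
        self.gate = nn.Sequential(nn.Linear(1, n_experts), nn.Softmax(dim=-1))

    def forward(self, feats, intelligibility):
        # feats: (batch, in_dim); intelligibility: (batch, 1), in [0, 1]
        w = self.gate(intelligibility)                     # (batch, n_experts)
        outs = torch.stack([e(feats) for e in self.experts], dim=1)
        return (w.unsqueeze(-1) * outs).sum(dim=1)         # weighted mix

moe = IntelligibilityGatedMoE()
feats = torch.randn(8, 40)  # acoustic features per frame (dummy data)
intel = torch.rand(8, 1)    # intelligibility predicted by a separate network
hidden = moe(feats, intel)  # would feed a downstream phone classifier
```

Soft gating lets low- and high-intelligibility speech be routed to different parameters while still training the whole model jointly.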
This work presents a pipeline for an automatic, end-to-end classification system that uses speech as the primary input for predicting Huntington disease. We explore transcript-based features to capture speech characteristics of interest, and we use methods such as k-nearest neighbors (with Euclidean and dynamic-time-warping distances) as well as more modern neural network approaches for classification; a toy DTW-based k-NN sketch follows below.
-- Interspeech 2018 [Paper]
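For concreteness, here is a minimal NumPy sketch of k-nearest-neighbor classification under a dynamic-time-warping distance. The feature sequences and function names are hypothetical; this is the textbook DTW/k-NN recipe, not the paper's specific pipeline.

```python
# Illustrative sketch: DTW distance between 1-D feature sequences,
# plus a majority-vote k-NN classifier built on top of it.
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three neighboring alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_predict(query, train_seqs, train_labels, k=3):
    """Label a query sequence by majority vote of its k DTW-nearest neighbors."""
    dists = [dtw_distance(query, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

DTW matters here because utterances differ in length and speaking rate, so a warped alignment compares sequences more fairly than a fixed Euclidean distance.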
This work investigates the use of mobile devices for extracting and analyzing acoustic features to detect mild traumatic brain injury (mTBI). Our results suggest a strong correlation between certain temporal and frequency-domain features and the likelihood of a concussion; a sketch of this style of feature extraction and correlation analysis appears below.
-- IEEE Journal of Biomedical and Health Informatics [Paper]
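As a loose illustration of this kind of analysis, the sketch below extracts a few simple temporal and frequency features with librosa and checks their correlation with labels. The specific feature set, the `clips`/`labels` data, and the file paths are all hypothetical, not the study's.

```python
# Illustrative sketch: extract basic temporal/frequency acoustic features
# and test their correlation with concussion labels.
import librosa
import numpy as np
from scipy.stats import pearsonr

def acoustic_features(path):
    y, sr = librosa.load(path, sr=16000)
    return {
        "duration_s": len(y) / sr,                                    # temporal
        "zero_crossing_rate": librosa.feature.zero_crossing_rate(y).mean(),
        "spectral_centroid": librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
        "rms_energy": librosa.feature.rms(y=y).mean(),
    }

# Hypothetical inputs: paths to speech clips and 0/1 concussion labels.
clips = ["a.wav", "b.wav", "c.wav", "d.wav"]
labels = np.array([0, 0, 1, 1])
feats = [acoustic_features(p) for p in clips]
for name in feats[0]:
    r, p = pearsonr([f[name] for f in feats], labels)
    print(f"{name}: r={r:.2f} (p={p:.3f})")
```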