My research aims to push the boundaries of bio-marking in human health by leveraging multi-modal signals and integrating deep learning and signal processing methodologies. My interests extend beyond this specific area to speech and language technologies, affective computing, and digital health.
Bio-marking of Health from Multi-modal Signals, Biomedical Signal Processing
Speech and Language Technologies, Machine Learning for Speech Processing
Computational Paralinguistics, Affective Computing
Speech Analysis, Recognition & Synthesis, Speech Production & Perception, Speech Prosody
Processing of Singing, Behavioral & Social Signals
Detection of Speech Deepfakes
Auditory Perception and Cognition
Structured variability in vocal tract articulation dynamics in speech
Humans produce and use speech to communicate and interact with one another, conveying their thoughts and expressing their emotions in a vast variety of ways. The production of the rich sounds of speech involves intricate movement and coordination of the vocal organs, such as the tongue and jaw, in a manner that is flexible and adaptive to the context of the interaction. Yet the details of how this flexibility is achieved are not fully understood. Speech production can also be affected by a variety of personal circumstances, including illness and disorder.
The project will create a scientific foundation for understanding how human speech varies across time, both within and across interpersonal interactions, over hours, days, weeks, months, and years, by directly observing and modeling articulation during speech. Such knowledge is fundamental both to advancing speech science and to the design of robust interactive speech technologies. A longstanding goal in speech research is to understand and address the rich and pervasive variability in speech production, within and across individuals and across interactional contexts. Our research investigates questions not approachable through speech acoustics alone: direct access to dynamic information on vocal tract articulation, complemented by advances in technology and analysis, allows us to examine the complex behavior associated with speech production variability, namely its flexibility and stability across tasks and over time.

The project will use advanced real-time magnetic resonance imaging (rtMRI) and computational modeling of human vocal tract motion during speech production to understand the structure and control of spoken language communication across timescales, both within individual experience and across interpersonal interactions, offering an unprecedented opportunity to observe how humans plan and produce speech collaboratively with one another at a level of spatiotemporal detail not previously possible. The project's innovations include imaging the vocal tracts of conversing speakers simultaneously and synchronously at two sites to understand speech production behavior during dialog; mapping how a single individual's speech production varies naturally and typically over hours, days, months, and years; and examining how individual differences in speech flexibility predict speaker stability. The research program integrates speech science and engineering through empirical work leveraging rich, quantitative, and dynamic articulatory rtMRI data, and will broadly share the resulting data, tools, and models. The project also has critical applied significance beyond speech technology: knowledge of normative articulation and its variability can inform the assessment and remediation of speech disorders by helping derive robust speech-based biomarkers for a variety of clinical conditions across the lifespan, from autism to dementia.
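To make the notion of articulatory flexibility and stability concrete, here is a minimal sketch, not the project's actual pipeline, of one way a speaker-stability index could be computed: comparing repeated productions of the same utterance via dynamic time warping over rtMRI-derived tract-variable trajectories. The data, dimensions, and function names below are hypothetical stand-ins.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two articulatory
    trajectories a (Ta x D) and b (Tb x D), Euclidean frame cost."""
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]

def articulatory_stability(repetitions: list[np.ndarray]) -> float:
    """Mean pairwise DTW distance across repeated productions of the
    same utterance; lower values suggest a more stable speaker."""
    dists = [dtw_distance(x, y)
             for k, x in enumerate(repetitions)
             for y in repetitions[k + 1:]]
    return float(np.mean(dists))

# Synthetic stand-in: 5 repetitions, ~40 frames, 8 tract variables each.
rng = np.random.default_rng(0)
reps = [rng.standard_normal((40 + rng.integers(-5, 5), 8)) for _ in range(5)]
print(f"stability index: {articulatory_stability(reps):.2f}")
```

DTW is used here only because it tolerates the natural timing differences between repetitions; any alignment-aware trajectory distance would serve the same illustrative purpose.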
Multilingualism as a factor of resilience to Alzheimer's disease and related dementias in India
By 2050, two-thirds of older individuals with dementia will live in low- and middle-income countries (LMICs). As LMICs continue to experience a reduction in mortality, it is critical to determine factors that confer protection and resilience against Alzheimer's disease and related dementias (ADRD). Some studies find that bilinguals are at reduced ADRD risk compared to monolinguals, but other studies find no evidence of a bilingual advantage. The rationale for this study is that the equivocal findings of prior research have been driven largely by methodological inconsistencies that limit our understanding of bilingualism's role in cognitive aging, including: 1) inadequate control for potential confounders, which limits our ability to infer whether bilingualism has a direct effect on cognition or whether these relationships are due to environmental and sociocultural factors; 2) limited inclusion of markers of neuropathology; and 3) little attention to within-group differences among bilinguals, such as age of second language (L2) acquisition, proficiency, frequency of language use, number of languages spoken, and diversity of language families.

India offers a unique opportunity to study the role of bilingualism in cognitive reserve and resilience, given its rich linguistic and sociocultural diversity. The overall aim of the study is to leverage the unique features of India's linguistic and sociodemographic landscape to discern whether bilingualism modifies the association between blood-based and neuroimaging biomarkers of ADRD and cognition and cognitive decline. The study will analyze available plasma-based measures of amyloid and tau pathologies, MRI, and cognitive assessments from the Longitudinal Aging Study in India–Diagnostic Assessment of Dementia (LASI-DAD), a large, population-representative study of aging and dementia in India. Specifically, the project will 1) determine whether bilingualism modifies the association between ADRD biomarkers (blood-based or neuroimaging) and cognitive outcomes, 2) evaluate whether the protective effect of bilingualism differs across diverse life-course environmental determinants of health, and 3) deconstruct language use within bilinguals in India to understand the mechanisms by which bilingualism confers cognitive reserve against biological risk of ADRD.

We hypothesize that, compared to monolingualism, bilingualism will buffer the effects of blood-based AD biomarkers (plasma amyloid and tau levels), cortical atrophy, and white matter integrity on baseline cognition and rate of cognitive decline in the domains of memory, language, and executive functioning. In addition, by deconstructing bilingualism, we hypothesize that earlier age of L2 acquisition, higher bilingual proficiency, greater daily use of multiple languages, a higher number of languages acquired, and greater distance between language families will confer cognitive reserve, independent of confounding sociocultural factors (e.g., education, socioeconomic status). This proposal will advance the ADRD field by uncovering underlying mechanisms of resilience that may be modifiable and transferable to other populations.
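For readers curious how such effect modification is commonly tested, the sketch below fits an ordinary least squares model with a biomarker-by-bilingualism interaction term using statsmodels. The column names and simulated data are hypothetical illustrations, not LASI-DAD variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for study variables; all column names are
# hypothetical illustrations, not the actual LASI-DAD codebook.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "plasma_ptau": rng.normal(0, 1, n),   # standardized biomarker level
    "bilingual": rng.integers(0, 2, n),   # 1 = bilingual, 0 = monolingual
    "age": rng.normal(70, 6, n),
    "education": rng.integers(0, 16, n),  # years of schooling
})
# Simulate a biomarker effect that is attenuated among bilinguals.
df["cognition"] = (-0.5 * df.plasma_ptau
                   + 0.3 * df.plasma_ptau * df.bilingual
                   - 0.02 * df.age + 0.05 * df.education
                   + rng.normal(0, 1, n))

# Effect modification = the biomarker x bilingualism interaction term;
# a positive interaction coefficient flattens the biomarker-cognition
# slope for bilinguals, i.e., a buffering effect.
fit = smf.ols("cognition ~ plasma_ptau * bilingual + age + education",
              data=df).fit()
print(fit.params[["plasma_ptau", "plasma_ptau:bilingual"]])
```

In the real analyses, this skeleton would be extended with longitudinal outcomes and the sociocultural confounders discussed above; the interaction-term logic stays the same.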
PRECOG: Multimodal integration of neural and biobehavioral signals for predicting preconscious responses
The physical and psychological health and well-being of every service member is foundational to national security, safety, and readiness in today’s highly dynamic and complex global environment. Current methods for assessing and tracking individual mental and behavioral well-being—primarily self-reports or behavioral interviews—are inadequate; they are often incomplete, unreliable, or delivered too late to be useful in critical situations. As a result, many military service members and veterans slip through the cracks and do not receive timely treatment, contributing to elevated rates of adverse outcomes, including suicide. Advances in neuroscience, together with technological progress in neural and biobehavioral sensing, signal processing, machine learning, and computing, now offer a path forward. These developments create unprecedented opportunities for acquiring and analyzing diverse, information-rich data that enable causal, multimodal characterization of an individual’s mental state with a level of granularity, context, and scale not previously possible. Within this landscape, DARPA’s NEAT program asks a central question: Is it possible to create objective inferences of behavioral health risk by quantifying and aggregating preconscious brain and body responses to carefully designed linguistic stimuli?
PRECOG—a project focused on multimodal integration of neural and biobehavioral signals for predicting preconscious responses—aims to address this question through a rigorous, multidimensional research effort. PRECOG seeks to objectively illuminate the interplay between neural and biobehavioral signals of preconscious processing (e.g., responses to specific intents conveyed through external stimuli), mental states (e.g., emotions), and well-being or associated risk (e.g., suicidal ideation). The approach integrates rich multimodal measurements elicited in experimental settings through stimuli grounded in neuroscientific and psycholinguistic theory. These data are analyzed and modeled using advanced signal processing and robust machine learning methods. Through this framework, mental health risk factors can be inferred objectively from multimodal signatures of preconscious responses.
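As one illustration of what integrating multimodal measurements can look like at a baseline level, the sketch below performs feature-level fusion of hypothetical neural and biobehavioral features with a cross-validated linear classifier. It is a minimal stand-in under assumed feature dimensions and labels, not the PRECOG modeling pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for per-trial feature vectors; dimensions and
# meanings are illustrative, not the program's actual sensor suite.
rng = np.random.default_rng(2)
n_trials = 200
eeg_feats = rng.standard_normal((n_trials, 32))  # e.g., band powers, ERP amplitudes
bio_feats = rng.standard_normal((n_trials, 12))  # e.g., heart rate, skin conductance
labels = rng.integers(0, 2, n_trials)            # stimulus-locked response category

# Feature-level (early) fusion: concatenate modalities, then classify.
X = np.hstack([eeg_feats, bio_feats])
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, labels, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

Early fusion is only one design point; modality-specific models combined at the decision level are an equally common alternative when sensors differ in sampling rate or reliability.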
SFARI: Multimodal, objective assessment of the ASD phenotype: Longitudinal stability and change across contexts
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition that is defined in terms of child behavior. However, objective measurement of ASD-relevant behavior remains rare, particularly across real-world contexts. This is a multi-context observational study incorporating fixed and wearable behavior sensors to digitally phenotype ASD-related behavior in real-world settings. We employ a computational, scalable approach to investigate how objective measurements of ASD-related behaviors relate to clinical indices of ASD severity, and how stable these measurements are across clinical assessment and naturalistic classroom contexts.
This collaborative Miami–UCLA–USC project harnesses advances by our interdisciplinary team and others to objectively characterize the autism behavioral phenotype using digital proxemics data obtained from radio-frequency identification technology and machine learning of behavior from audio and video recordings. The assessment of 150 three- to five-year-old children (50 with ASD, 50 with other developmental disabilities [DD], and 50 typically developing [TD]) will yield a broad range of ASD-relevant behavior across contexts. Following an objectively and clinically characterized ADOS-2, six longitudinal BOSCC evaluations are paired with classroom observations over the course of the school year to determine the stability of ASD-relevant multimodal behavior, and its sensitivity to change over time, in both a relatively controlled clinical assessment and in relatively unconstrained preschool inclusion classrooms. The project will create an objective portrait of the multimodal behaviors characterizing variability in the ASD behavioral phenotype between two real-world settings (the clinic and the inclusion classroom) over time. These efforts will contribute to the scalable, objective capture of clinically meaningful ASD-relevant behaviors, supporting the development of interventions that capitalize on sensitivity to change while yielding effects that are stable across contexts.
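To illustrate what digital proxemics from RFID position logs can involve, a minimal sketch follows: it aligns two tags on a common clock and summarizes interpersonal distance. The coordinates, tag names, and 1 m proximity threshold are hypothetical assumptions, not the project's protocol.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for RFID-derived 2-D position logs; column names
# and values are illustrative only.
rng = np.random.default_rng(3)
times = pd.date_range("2024-01-10 09:00", periods=600, freq="s")
log = pd.DataFrame({
    "time": np.tile(times, 2),
    "tag": ["child_A"] * 600 + ["peer_B"] * 600,
    "x": rng.uniform(0, 8, 1200),  # classroom coordinates in meters
    "y": rng.uniform(0, 6, 1200),
})

# Align the two tags on the shared clock and compute their distance.
wide = log.pivot(index="time", columns="tag", values=["x", "y"])
dist = np.hypot(wide[("x", "child_A")] - wide[("x", "peer_B")],
                wide[("y", "child_A")] - wide[("y", "peer_B")])

# Proxemics summaries: mean interpersonal distance and time within 1 m.
print(f"mean distance: {dist.mean():.2f} m")
print(f"time within 1 m: {(dist < 1.0).mean():.1%} of samples")
```

Summaries like time-in-proximity can then be correlated with the clinical indices described above, which is the sense in which the proxemics stream becomes an objective behavioral measurement.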
Investigating “Phonological” Developmental Speech Errors Using Real-Time Magnetic Resonance Imaging
For decades, many developmental speech patterns that deviate from those produced by typical adult speakers have been described as phonological in nature (e.g., "stopping," whereby target /s/ is produced in a way that is perceived and transcribed as /t/). Classification of these errors as phonological suggests that such patterns arise from phonemic miscategorization of speech sounds at a cognitive–linguistic level rather than from speech motor inaccuracy. However, preliminary evidence indicates that at least some of the patterns traditionally classified as "phonological" may instead stem from speech motor inaccuracy. For example, children who exhibit patterns traditionally labeled as phonological "backing" and "fronting" often produce undifferentiated lingual gestures, in which movements of the tongue tip, tongue body, tongue dorsum, and lateral margins of the tongue are not independently controlled. It has been speculated that this pattern is driven primarily by developmental constraints on the independent movement of the jaw and tongue.
Our objective is to conduct a fine-grained investigation of the articulatory movements underlying “phonological” errors using real-time magnetic resonance imaging. We hypothesize that patterns traditionally classified as “phonological” processes may be more accurately understood as systematic motor speech errors, and that these errors arise primarily from developmental constraints on the movement and coordination of the speech articulators.