Auditory-Visual Integration for Speech and Non-Speech Presentations

or
What the Computational Hearing Community Should Know About How Human Communicators Integrate Eye and Ear

Ken Grant, Walter Reed National Military Medical Center


Abstract

Our auditory system, along with our very smart human brains, is tasked with solving several real-world problems, such as forming, localizing, and attending to acoustic “objects” in complex auditory and visual scenes. The fact that humans have evolved to solve many of these challenges should not be overlooked when designing computational models of hearing and communication. For example, when the auditory scene becomes too complex, listeners have several options. They can use advanced hearing technologies to improve the speech-to-noise ratio (e.g., noise reduction and/or directional microphones); they can watch the speaker’s face (i.e., speechread) along with other facial and body gestures and integrate this information with impoverished and potentially distorted auditory information; or they can simply choose to leave the chaotic scene in favor of a less complex environment. Auditory scenes that are processed successfully by listeners with normal-hearing thresholds often pose unique and difficult problems for listeners with hearing impairment, and for these listeners and conditions, clinicians recommend facing the speaker to extract visual speech information. In fact, in many instances, we have demonstrated that the unaided auditory-visual (AV) condition can be more intelligible than an aided auditory-only (AO) condition. In this talk, I will present some of the many ways that integrating eye and ear during speech communication provides significantly more information than the AO speech signal, dramatically and seamlessly improves the speech-to-noise ratio, allows for faster processing of syllables and words, enhances the ability to attend to the desired speech target, and reduces listening effort. These data provide crucial information for developing biologically inspired computational models of AV speech recognition.

[Work supported by a grant from CDMRP, #DM130027. The views expressed in this abstract are those of the author and do not reflect the official policy of the Department of the Army/Navy/Air Force, Department of Defense, or U.S. Government.]