Do You See What I’m Saying? Optimal Visual Enhancement of Speech Recognition in Noisy Environments

Lars A. Ross, Program in Cognitive Neuroscience, Department of Psychology, City College of the City University of New York, and the Nathan Kline Institute for Psychiatric Research

Abstract
Viewing a speaker’s articulatory movements substantially improves a listener’s ability to understand spoken words, especially under noisy environmental conditions. It has been claimed that this gain is most pronounced when auditory input is weakest, an effect related to inverse effectiveness, a well-known principle of multisensory integration. In contrast, we show that this principle does not apply to audio-visual speech perception. Rather, the gain from viewing visual articulations is maximal at intermediate signal-to-noise ratios (SNRs), well above the lowest auditory SNR at which recognition of whole words is significantly different from zero. The multisensory speech system appears to be optimally tuned for SNRs between the extremes at which the system relies on either the visual (speech-reading) or the auditory modality alone, forming a window of maximal integration at intermediate SNR levels. At these intermediate levels, the extent of multisensory enhancement of speech recognition is considerable, amounting to more than a threefold performance improvement relative to an auditory-alone condition. Additional data collected from patients with schizophrenia show that their gain from viewing visual articulations during speech recognition in noise is less pronounced than in controls, while unimodal auditory speech recognition remains intact.
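For clarity, the multisensory gain at a given SNR can be expressed relative to auditory-alone performance; this notation is our own shorthand and is not taken from the original report:

$$
G(\mathrm{SNR}) \;=\; \frac{P_{AV}(\mathrm{SNR}) - P_{A}(\mathrm{SNR})}{P_{A}(\mathrm{SNR})},
$$

where $P_{AV}$ and $P_{A}$ denote word-recognition accuracy in the audio-visual and auditory-alone conditions. Under this definition, the reported threefold improvement corresponds to $P_{AV} > 3\,P_{A}$ at intermediate SNRs, whereas strict inverse effectiveness would instead predict that $G$ keeps growing as $P_{A}$ falls toward zero.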

