‘Analysis-by-synthesis’ in auditory-visual speech perception
Virginie van Wassenhove, Department of Radiology, University of California, San Francisco
Abstract
The ‘analysis-by-synthesis’ model of speech processing was proposed by Halle and Stevens in 1962. We build on this model through a series of electroencephalographic (EEG) and psychophysical experiments and argue that it provides a sensible framework for auditory-visual (AV) speech perception. First, the natural dynamics of AV speech allow visual information to be extracted before the auditory speech signal arrives. Using EEG, we show that the latencies of classic auditory evoked potentials are systematically shortened as a function of the preceding visual speech input, while their amplitudes are independently reduced. Second, perceptual detection and identification of AV speech tolerate desynchronizations larger than those observed for non-speech stimuli. The temporal window of subjective simultaneity points to two fundamental units of speech: sub-phonetic (~25 ms) and syllabic (~250 ms). Third, the perception of temporal order in AV speech is accompanied by spectro-temporal shifts of the EEG signal in the gamma and theta bands, supporting the view that these two units are extracted in parallel in AV speech. We hypothesize that the spectro-temporal complexity and representational status of speech, as compared with non-speech signals, are key to characterizing multisensory perceptual systems.
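As a quick arithmetic note on the band-to-unit correspondence (assuming representative frequencies of about 40 Hz for the gamma band and 4 Hz for the theta band, values not stated in the abstract itself), the two units match the periods of the corresponding oscillations:

$$T_{\gamma} = \frac{1}{f_{\gamma}} \approx \frac{1}{40\,\text{Hz}} = 25\,\text{ms}, \qquad T_{\theta} = \frac{1}{f_{\theta}} \approx \frac{1}{4\,\text{Hz}} = 250\,\text{ms}.$$

On this reading, gamma-band activity is naturally associated with the sub-phonetic unit and theta-band activity with the syllabic unit.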