Audio-Visual Integration


T1.1 A common mechanism processes auditory and visual motion

Alais, D., Fernández Folgueiras, U. & Leung, D.
University of Sydney                                                                                                                                                         


Neuroimaging studies suggest human visual area V5, an area specialised for motion processing, responds to movement presented in the visual, auditory or tactile domains. Here we report behavioural findings strongly implying common motion processing for auditory and visual motion. We presented brief translational motion stimuli drifting leftwards or rightwards in either the visual or auditory modality at various speeds. Using the method of single stimuli, observers made a speed discrimination on each trial, comparing the current speed against the average of all presented speeds. Data were compiled into psychometric functions and mean perceived speed was calculated. A sequential dependency analysis was used to analyse the adaptive relationship between consecutive trials. In a vision-only experiment, motion was perceived as faster after a slow preceding motion, and slower after a faster motion. This is a negative serial dependency, consistent with the classic ‘repulsive’ motion aftereffect (MAE). In an audition-only experiment, we found the same negative serial dependency, showing that auditory motion produces a repulsive MAE in a similar way to visual MAEs. A third experiment interleaved auditory and visual motion, presenting each modality in alternation to test whether sequential adaptation was modality specific. Whether analysing vision preceded by audition, or audition preceded by vision, negative (repulsive) serial dependencies were observed: a slow motion made a subsequent motion seem faster (and vice versa) despite the change of modality. This result shows that the motion adaptation was supramodal as it occurred despite the modality mismatch between adaptor and test. We conclude that a common mechanism processes motion regardless of whether the input is visual or auditory. 
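The sequential-dependency analysis described above can be illustrated with a small simulation. This is only a sketch with made-up speed values, a hypothetical repulsion strength, and synthetic responses, not the study's data or analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration only: speeds, repulsion strength and noise level
# are hypothetical, not taken from the study.
n_trials = 20000
speeds = rng.choice([4.0, 6.0, 8.0, 10.0, 12.0], size=n_trials)  # deg/s
mean_speed = speeds.mean()

# Inject a repulsive (negative) serial dependency: the current percept is
# pushed away from the previous trial's speed.
k = 0.3
perceived = speeds.copy()
perceived[1:] -= k * (speeds[:-1] - mean_speed)
perceived += rng.normal(0.0, 1.0, n_trials)  # judgement noise

# Method of single stimuli: respond "faster" when the percept exceeds the
# average of all presented speeds.
resp_faster = perceived > mean_speed

# Sequential-dependency analysis: P("faster") conditioned on the previous
# trial's speed. A repulsive aftereffect gives p_after_slow > p_after_fast.
prev_slow = speeds[:-1] < mean_speed
p_after_slow = resp_faster[1:][prev_slow].mean()
p_after_fast = resp_faster[1:][~prev_slow].mean()
```

Running the conditional-probability comparison separately on vision-only, audition-only, and interleaved trial sequences is what distinguishes modality-specific from supramodal adaptation.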


 

T1.2 Effects of horizontal and vertical discrepancy of visual-auditory stimuli on reaction time: Multisensory integration or exogenous spatial attention? A TWIN analysis

Diederich, A. & Colonius, H.
Oldenburg University                                                                                                    


A classic behavioral finding in multisensory research is that the speed of a response to a stimulus of one modality is modulated by the spatiotemporal co-occurrence of a stimulus from another modality, even when subjects are instructed to ignore that stimulus (focused attention paradigm). Here we report a manual reaction-time (RT) experiment with visual targets and auditory non-targets aligned either horizontally or vertically. Average RT to a visual target, when an acoustic non-target was presented 250 ms or less before the target, was reduced depending on the spatiotemporal configuration of the stimuli. Specifically, facilitation was larger when the stimuli were aligned in azimuth rather than in elevation, and it increased as the spatial distance between target and non-target decreased. However, this spatial effect diminished the closer in time the two stimuli were presented.

Findings like these have raised the issue of whether RT facilitation in this context should be attributed to genuine multisensory integration (MSI) or simply to an exogenous spatial attention (ESA) effect (e.g., Van der Stoep et al., 2015, APP). Here we develop a new version of the time-window-of-integration (TWIN) model presented in Diederich & Colonius (2008, Exp Brain Res). The model allows both MSI and ESA to occur in one and the same trial but, depending on the temporal alignment of target and non-target, only integration, only attention, or neither may occur. Moreover, the model yields numerical estimates of the contributions of both MSI and ESA and of their possible interaction. First quantitative and qualitative tests of the model against data from the above experiment suggest that it can account for the observed pattern of results.
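The first-stage race logic of the TWIN framework can be sketched with a short Monte Carlo simulation. All parameter values below are hypothetical, and the sketch covers only the basic race-plus-window criterion; the authors' extended model, which separates MSI from ESA contributions, is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def twin_mean_rt(soa, n=200_000, mu_v=80.0, mu_a=60.0,
                 window=200.0, delta=40.0, mu2=250.0):
    """Monte Carlo sketch of TWIN's first-stage race (hypothetical values).

    Peripheral processing times are exponential (a standard TWIN
    assumption); integration occurs when the auditory non-target wins the
    race and the target's peripheral processing ends within `window` ms.
    """
    v = soa + rng.exponential(mu_v, n)   # visual target, delayed by SOA
    a = rng.exponential(mu_a, n)         # auditory non-target at t = 0
    integrate = (a < v) & (v - a < window)
    # Second stage takes mu2 ms on average, sped up by delta on
    # integration trials; RT is measured from target onset.
    return (v - soa).mean() + mu2 - delta * integrate.mean()

rt_sync = twin_mean_rt(soa=0.0)     # simultaneous presentation
rt_lead = twin_mean_rt(soa=100.0)   # non-target leads target by 100 ms
```

With these values, a non-target that leads the target in time raises the probability that the race-and-window criterion is met, so the mean RT for the leading condition comes out lower, mirroring the facilitation described in the experiment.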


 

T1.3 Generalizing audio-visual integration: what kinds of stimuli have we been using?

Schutz, M. & Gillard, J.
McMaster University                                                                                                      


The tension between experimental control and ecological validity presents continual challenges in psychological research. In particular, prominent figures in audition have long voiced concern (Gaver, 1993; Neuhoff, 2004) about the simplistic sounds used in perceptual experiments, which pose barriers to understanding the processes involved in listening to natural sounds. My team’s work on audio-visual integration illustrates this challenge, documenting findings at odds with widely accepted theory when assessing both the duration (Schutz & Lipscomb, 2007) and the temporal order (Chuen & Schutz, 2016) of audio-visual stimuli. These surprising departures stem from clear differences between the complex structure of natural sounds and simplistic beeps, raising important questions about the degree to which theories derived from artificial tones generalize beyond the laboratory.

To contribute to our understanding of the stimuli used in auditory perception research, my team recently completed a survey of 1000 experiments drawn from hundreds of studies across the complete history of four key journals: Journal of Experimental Psychology; Attention, Perception, & Psychophysics; Journal of the Acoustical Society of America; and Hearing Research. This provides the first comprehensive overview of auditory stimuli, tabulating temporal and spectral attributes as well as duration, frequency, and loudness. Curiously, it appears researchers failed to define the temporal structure of 38% of stimuli in these prominent studies, an important aspect of sound that plays a crucial role in audio-visual integration (Grassi & Casco, 2012; Schutz & Kubovy, 2009). More importantly, over 85% of the stimuli encountered in this wide range of experiments fail to exhibit temporal variation beyond simple ramped onsets and offsets. My talk will review the key findings of this novel survey, with a particular focus on the implications for interpreting recent findings from the audio-visual integration literature.


 

T1.4 Concurrent Unimodal Learning Enhances Multisensory Responses of Symmetric Crossmodal Learning in Robotic Audio-Visual Tracking

Shaikh, D.
University of Southern Denmark                                                                                          


Crossmodal sensory cue integration is a fundamental process by which the brain combines stimulus cues from different sensory modalities into a coherent, unified representation of observed events in the world. Crossmodal integration is a developmental process involving learning, with neuroplasticity as its underlying mechanism. We present a Hebbian-like, temporal-correlation-learning-based adaptive neural circuit for crossmodal cue integration that does not require a priori information. The circuit correlates stimulus cues within each modality as well as symmetrically across modalities to independently update modality-specific neural weights on a moment-by-moment basis, in response to dynamic changes in noisy sensory stimuli. The circuit is embodied as a non-holonomic robotic agent that must orient towards a moving audio-visual target. It continuously learns the best possible weights for a weighted combination of auditory and visual spatial target directional cues, and the result is mapped directly to robot wheel velocities to elicit a multisensory orientation response. Trials in simulation demonstrate that concurrent unimodal learning improves both the overall accuracy and the precision of the multisensory responses produced by symmetric crossmodal learning.
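The idea of moment-by-moment, modality-specific weight updates driven by cue agreement can be sketched as follows. This is not the paper's neural circuit: it replaces the Hebbian-like correlation rule with a simple running-deviation scheme, and every parameter (noise levels, leak rates, gain) is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal sketch (not the paper's circuit): two noisy directional cues to
# the same moving target; each modality's weight adapts every step from
# its running deviation, and the fused bearing estimate would be mapped
# to differential wheel velocities. All values are hypothetical.
eta = 0.02                       # leak rate of the running deviation
noise_sd = np.array([0.4, 0.1])  # audition noisier than vision here
w = np.array([0.5, 0.5])         # auditory, visual weights
dev = np.ones(2)                 # running squared deviation per modality
smooth = 0.0                     # temporally smoothed bearing estimate

for t in range(5000):
    theta = np.sin(0.01 * t)                 # target bearing (rad)
    cues = theta + rng.normal(0.0, noise_sd) # noisy unimodal estimates
    est = w @ cues                           # weighted cue combination
    smooth += 0.2 * (est - smooth)           # slow reference track
    # Moment-by-moment update: a cue that tracks the smoothed estimate
    # well gains weight; a discrepant (noisier) cue loses weight.
    dev += eta * ((cues - smooth) ** 2 - dev)
    w = (1.0 / dev) / (1.0 / dev).sum()      # normalised weights
    turn_rate = 2.0 * est  # would drive the robot's wheel velocities
```

After adaptation the more reliable modality dominates the combination, which is the qualitative behaviour the abstract attributes to the learned weighting.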


 

T1.5 Differential effects of the temporal and spatial distribution of audiovisual stimuli on cross-modal spatial recalibration

Bruns, P. & Röder, B.
University of Hamburg                                                                                                    


Auditory spatial representations are constantly recalibrated by visual input. Exposure to spatially discrepant audiovisual stimuli typically results in a localization bias (ventriloquism aftereffect, VAE), whereas exposure to spatially congruent audiovisual stimuli results in an improvement in auditory localization (multisensory enhancement, ME). In previous studies of VAE and ME, audiovisual stimuli have typically been presented at a steady rate of one or two stimuli per second (i.e., 1-2 Hz) with a fixed spatial relationship between the auditory and visual stimulus. However, it is known from unisensory perceptual learning studies that presenting stimuli at a higher frequency of 10-20 Hz, which mimics long-term potentiation (LTP) protocols, often leads to improvements in performance, whereas low-frequency stimulation around 1-2 Hz leads to impairments. Therefore, in Experiment 1 we tested whether cross-modal spatial learning is similarly affected by stimulation frequency. Unisensory sound localization was tested before and after participants were exposed to either audiovisual stimuli with a constant spatial disparity of 13.5° (VAE) or congruent audiovisual stimulation (ME). Audiovisual stimuli were presented at either a low frequency of 2 Hz or a high frequency of 10 Hz. Compared to low-frequency stimulation, the VAE was reduced after high-frequency stimulation, whereas ME occurred with both stimulation protocols. In Experiment 2, we manipulated the spatial distribution of the audiovisual stimulation in the low-frequency condition. Audiovisual stimuli were presented with varying audiovisual disparities centered around 13.5° (VAE) or 0° (ME). Both VAE and ME were as strong as with the fixed spatial relationship of 13.5° (VAE) or 0° (ME) in Experiment 1.
Taken together, our results suggest that (a) VAE and ME represent partly dissociable forms of learning, (b) stimulation frequency specifically modulates adaptation to spatially misaligned audiovisual stimuli, and (c) auditory representations adjust to the overall stimulus statistics rather than to a specific audiovisual spatial relationship.


 


Event Timeslots (1)

Friday, June 15
-