9 Perceptual Effects of Cross-Modal Stimulation: Ventriloquism and the Freezing Phenomenon

JEAN VROOMEN AND BEATRICE DE GELDER

Introduction

For readers of a book on multimodal perception, it probably comes as no surprise that most events in real life consist of perceptual inputs in more than one modality and that sensory modalities may influence each other. For example, seeing a speaker provides not only auditory information, conveyed in what is said, but also visual information, conveyed through the movements of the lips, face, and body, as well as visual cues about the origin of the sound. Most handbooks on cognitive psychology pay comparatively little attention to this multimodal state of affairs, and the different senses (seeing, hearing, smell, taste, touch) are treated as distinct and separate modules with little or no interaction. It is becoming increasingly clear, however, that when the different senses receive correlated input about the same external object or event, information is often combined by the perceptual system to yield a multimodally determined percept.

An important issue is to characterize such multisensory interactions and their cross-modal effects. There are at least three different notions at stake here. One is that information is processed in a hierarchical and strictly feed-forward fashion. In this view, information from different sensory modalities converges in a multimodal representation in a feed-forward way. For example, in the fuzzy logical model of perception (Massaro, 1998), degrees of support for different alternatives from each modality, say, audition and vision, are determined and then combined to give an overall degree of support. Information is propagated in a strictly feed-forward fashion, so that higher-order multimodal representations do not affect lower-order sensory-specific representations. There is thus no cross-talk between the sensory modalities such that, say, vision affects early processing stages of audition or vice versa. Cross-modal interactions in feed-forward models take place only at or beyond multimodal stages. An alternative possibility is that multimodal representations send feedback to primary sensory levels (e.g., Driver & Spence, 2000). In this view, higher-order multimodal levels can affect sensory levels. Vision might thus affect audition, but only via multimodal representations. Alternatively, it may also be the case that cross-modal interactions take place without multimodal representations. For example, it may be that the senses access each other directly from their sensory-specific systems (e.g., Ettlinger & Wilson, 1990). Vision might then affect audition without the involvement of a multimodal representation (see, e.g., Falchier, Clavagnier, Barone, & Kennedy, 2002, for recent neuroanatomical evidence showing that there are projections from primary auditory cortex to the visual area V1).

The role of feedback in sensory processing has been debated for a long time (e.g., the interactive activation model of reading by Rumelhart and McClelland, 1982). However, as far as cross-modal effects are concerned, there is at present no clear empirical evidence that allows distinguishing among feed-forward, feedback, and direct-access models. Feed-forward models predict that early sensory processing levels should be autonomous and unaffected by higher-order processing levels, whereas feedback or direct-access models would, in principle, allow that vision affects auditory processes, or vice versa.
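To make the feed-forward combination idea concrete, here is a minimal sketch, assuming two speech alternatives and invented support values, of the multiplicative combination rule used in models of the fuzzy-logical type; it illustrates the principle only and is not Massaro's (1998) implementation.

```python
def combine_support(auditory, visual):
    """Multiply the per-alternative support from each modality and renormalize,
    in the spirit of the feed-forward combination rule of fuzzy-logical models."""
    product = {alt: auditory[alt] * visual[alt] for alt in auditory}
    total = sum(product.values())
    return {alt: p / total for alt, p in product.items()}

# Hypothetical support values for two speech alternatives (/ba/ vs. /da/):
# audition mildly favors /ba/, vision strongly favors /da/.
auditory_support = {"ba": 0.6, "da": 0.4}
visual_support = {"ba": 0.1, "da": 0.9}
print(combine_support(auditory_support, visual_support))  # roughly {'ba': 0.14, 'da': 0.86}

# Note: the combined value is computed from, but never fed back into, the
# unimodal support values; this is what makes the scheme strictly feed-forward.
```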
Although this theoretical distinction seems straightforward, empirical demonstration in favor of one or another alternative has proved difficult to obtain. One of the main problems is to find measures that are sufficiently unambiguous and that can be taken as pure indices of an auditory or visual sensory process. Among the minimal requirements for stating that a cross-modal effect has perceptual consequences at early sensory stages, the phenomenon should at least be (1) robust, (2) not explainable as a strategic effect, and (3) not occurring at response-related processing stages. If one assumes stagewise processing, with sensation coming before attention (e.g., the "late selection" view of attention), one might also want to argue that (4) cross-modal effects should be pre-attentive. If these minimal criteria are met, it becomes at least likely that cross-modal interactions occur at early perceptual processing stages, and thus that models that allow access to primary processing levels (i.e., feedback or direct-access models) better describe the phenomenon.

In our work on cross-modal perception, we investigated the extent to which such minimal criteria apply to some cases of audiovisual perception. One case concerns a situation in which vision affects the localization of a sound (the ventriloquism effect), the other a situation in which an abrupt sound affects visual processing of a rapidly presented visual stimulus (the freezing phenomenon). In Chapter 36 of this volume, we describe the case of cross-modal interactions in affect perception. We consider each of these phenomena to be based on cross-modal interactions affecting early levels of perception.

Vision affecting sound localization: The ventriloquism effect

Presenting synchronous auditory and visual information in slightly separate locations creates the illusion that the sound is coming from the direction of the visual stimulus. Although the effect is smaller, shifting of the visual percept in the direction of the sound has, at least in some studies, also been observed (Bertelson & Radeau, 1981). The auditory shift is usually measured by asking subjects to localize the sound by pointing or by fixating the eyes on the apparent location of the sound. When localization responses are compared with a control condition (e.g., a condition in which the sound is presented in isolation without a visual stimulus), a shift of a few degrees in the direction of the visual stimulus is usually observed. Such an effect, produced by audiovisual spatial conflict, is called the ventriloquism effect, because one of the most spectacular everyday examples is the illusion created by performing ventriloquists that the speech they produce without visible facial movements comes from a puppet they agitate in synchrony with the speech.

A standard explanation of the ventriloquism effect is the following: when auditory and visual stimuli occur in close temporal and spatial proximity, the perceptual system assumes that a single event has occurred. The perceptual system then tries to reduce the conflict between the location of the visual and auditory data, because there is an a priori constraint that an object or event can have only one location (e.g., Bedford, 1999). Shifting the auditory location in the direction of the visual event, rather than the other way around, would seem to be ecologically useful, because spatial resolution in the visual modality is better than in the auditory one.

However, there are also other, more trivial explanations of the ventriloquism effect. One alternative is similar to Stroop task interference: when two conflicting stimuli are presented together, such as when the word blue is written in red ink, there is competition at the level of response selection rather than at a perceptual level per se. Stroop-like response competition may also be at play in the ventriloquism effect. In that case, there would be no real attraction between sound and vision, but the ventriloquist illusion would be derived from the fact that subjects sometimes point to the visual stimulus instead of the sound by mistake. Strategic or cognitive factors may also play a role. For example, a subject may wonder why sounds and lights are presented from different locations, and then adopt a postperceptual response strategy that satisfies the experimenter's ideas of the task (Bertelson, 1999). Of course, these possibilities are not exclusive, and the experimenter should find ways to check or circumvent them.

In our research, we dealt with these and other aspects of the ventriloquist situation in the hope of showing that the apparent location of a sound is indeed shifted at a perceptual level of auditory space perception. More specifically, we asked whether a ventriloquism effect can be observed (1) when subjects are explicitly trained to ignore the visual distracter; (2) when cognitive strategies of the subject to respond in a particular way can be excluded; (3) when the visual distracter is not attended, either endogenously or exogenously; and (4) when the visual distracter is not seen consciously. We also asked (5) whether the ventriloquism effect as such is possibly a pre-attentive phenomenon.

1. A visual distracter cannot be ignored. In a typical ventriloquism situation, subjects are asked to locate a sound while ignoring a visual distracter. Typically, subjects remain unaware of how well they perform during the experiment and how well they succeed in obeying instructions. In one of our experiments, though, we asked whether it is possible to train subjects explicitly to ignore the visual distracter (Vroomen, Bertelson, & de Gelder, 1998). If despite training it is impossible to ignore the visual distracter, this speaks to the robustness of the effect. Subjects were trained to discriminate among sequences of tones that emanated either from a central location only or from alternating locations, in which case two speakers located next to a computer screen emitted the tones. With no visual input, this same/different location task was very easy, because the difference between the central and lateral locations was clearly noticeable. However, the task was much more difficult when, in synchrony with the tones, light flashes were alternated left and right on a computer screen. This condition created the strong impression that sounds from the central location now alternated between left and right, presumably because the light flashes attracted the apparent location of the sounds. We then tried to train subjects to discriminate centrally presented but ventriloquized sounds from sounds that alternated physically between the left and right. Subjects were instructed to ignore the lights as much as possible (but without closing their eyes), and they received corrective feedback after each trial. The results were that the larger the separation between the lights, the more false alarms occurred (subjects responded "alternating sound" to a centrally presented ventriloquized sound), presumably because the farther apart the lights were, the farther apart was the perceived location of the sounds. Moreover, although feedback was provided after each trial, performance did not improve in the course of the experiment. Instructions and feedback thus could not overcome the effect of the visual distracter on sound localization, which indicates that the ventriloquism effect is indeed very robust.

2. A ventriloquism effect is obtained even when cognitive strategies can be excluded. When subjects are asked to point to the location of auditory stimuli while ignoring spatially discrepant visual distracters, they may be aware of the spatial discrepancy and adjust their response accordingly. The visual bias one obtains may then reflect postperceptual decisions rather than genuine perceptual effects. However, contamination by strategies can be prevented when ventriloquism effects are studied via a staircase procedure or as an aftereffect.

Bertelson and Aschersleben (1998) were the first to apply the staircase procedure in the ventriloquism situation. The advantage of a staircase procedure is that it is not transparent, and the effects are therefore more likely to reflect genuine perceptual processes. In the staircase procedure used by Bertelson and Aschersleben, subjects had to judge the apparent origin of a stereophonically controlled sound as left or right of a median reference point. Unknown to the subjects, the location of the sound was changed as a function of their judgment, following the principle of the psychophysical staircase. After a "left" judgment, the next sound on the same staircase was moved one step to the right, and vice versa. A staircase started with sounds coming from an extreme left or an extreme right position. At that stage, correct responses were generally given on each successive trial, so that the target sounds moved progressively toward the center. Then, at some point, response reversals (responses different from the preceding one on the same staircase) began to occur. Thereafter, the subjects were no longer certain about the location of the sound. The location at which these response reversals began to occur was the dependent variable. In the study by Bertelson and Aschersleben, sounds were delivered together with a visual distracter, a light-emitting diode (LED) flash, in a central location. When the LED flash was synchronized with the sound, response reversal occurred earlier than when the light was desynchronized from the sound. Apparently, the synchronized LED flashes attracted the apparent location of the sound toward the central location of the flash, so that response reversal occurred earlier on the staircase. Similar results have now been reported by Caclin, Soto-Faraco, Kingstone, and Spence (2002), showing that a centrally located tactile stimulus attracts a peripheral sound toward the middle. It is important to note that there is no way subjects can figure out a response strategy that might lead to this result, because once response reversal begins to occur, subjects no longer know whether a sound belongs to a left or a right staircase. A conscious response strategy in this situation is thus extremely unlikely to account for the effect.
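To illustrate the logic of the procedure, the following sketch simulates a single left/right localization staircase with a simulated observer whose percept can be pulled toward a visual attractor. The step size, starting position, noise level, pull strength, and function names are assumptions made for illustration, not the parameters of Bertelson and Aschersleben (1998).

```python
import random

def run_staircase(start_deg, step_deg, attractor_deg=None, pull=0.0,
                  noise_sd=2.0, n_trials=60, seed=0):
    """Simulate one left/right localization staircase.  After a 'left' judgment
    the next sound on the staircase moves one step to the right, after a 'right'
    judgment one step to the left.  If an attractor location is given, the
    simulated observer's percept is pulled toward it by the fraction `pull`
    (a ventriloquism-like bias).  Returns the physical sound position (deg)
    at which the first response reversal occurs."""
    rng = random.Random(seed)
    loc, prev_resp = float(start_deg), None
    for _ in range(n_trials):
        perceived = loc if attractor_deg is None else (1 - pull) * loc + pull * attractor_deg
        resp = "left" if perceived + rng.gauss(0, noise_sd) < 0 else "right"
        if prev_resp is not None and resp != prev_resp:
            return loc   # location of the first reversal: the dependent variable
        prev_resp = resp
        loc += step_deg if resp == "left" else -step_deg
    return None

# Sound alone vs. sound synchronized with a central flash (attractor at 0 deg):
# with the attractor, the percept is compressed toward the center, so reversals
# tend to begin at more eccentric positions ("earlier" on the staircase).
print(run_staircase(start_deg=-20, step_deg=2))
print(run_staircase(start_deg=-20, step_deg=2, attractor_deg=0, pull=0.5))
```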
Conscious strategies are also unlikely to play a role when the effect of presenting auditory and visual stimuli at separate locations is measured as an aftereffect. The aftereffect is a shift in the apparent location of unimodally presented acoustic stimuli consequent on exposure to synchronous but spatially disparate auditory-visual stimulus pairs. Initial studies used prisms to shift the relative locations of visual and auditory stimuli (Canon, 1970; Radeau & Bertelson, 1974). Participants localized acoustic targets before and after a period of adaptation. During the adaptation phase, there was a mismatch between the spatial locations of acoustic and visual stimuli. Typically, between pre- and posttest a shift of about 1-4 degrees was found in the direction of the visual attractor. Presumably, the spatial conflict between auditory and visual data during the adaptation phase was resolved by recalibration of the perceptual system, and this alteration lasted long enough to be detected as aftereffects. It should be noted that aftereffects are measured by comparing unimodal pointing to a sound before and after an adaptation phase. Stroop-like response competition between the auditory target and the visual distracter during the test situation thus plays no role, because the test sound is presented without a visual distracter. Moreover, aftereffects are usually obtained when the spatial discrepancy between auditory and visual stimuli is so small that subjects do not even notice the separation (Frissen, Bertelson, Vroomen, & de Gelder, 2003; Radeau & Bertelson, 1974; Vroomen et al., 2003; Woods & Recanzone, Chap. 3, this volume). Aftereffects can therefore be interpreted as true perceptual recalibration effects (e.g., Radeau, 1994).

3. Attention toward the visual distracter is not needed to obtain a ventriloquism effect. A relevant question is whether attention plays a role in the ventriloquism effect. One could argue, as Treisman and Gelade (1980) have done in feature-integration theory for the visual modality, that focused attention might be the glue that combines features across modalities. Could it be, then, that when a sound and a visual distracter are attended, an integrated cross-modal event is perceived, but when they are unattended, two separate events are perceived that do not interact? If so, one might predict that a visual distracter would have a stronger effect on the apparent location of a sound when it is focused upon. We considered the possibility that ventriloquism indeed requires or is modulated by this kind of focused attention (Bertelson, Vroomen, de Gelder, & Driver, 2000). The subjects' task was to localize trains of tones while monitoring visual events on a computer screen. On experimental trials, a bright square appeared on the left or the right of the screen in exact synchrony with the tones. No square appeared on control trials.
The attentional manipulation consisted of having subjects monitor either the center of the display, in which case the attractor square was in the visual periphery, or the lateral square itself for occasional occurrences of a catch stimulus (a very small diamond that could be detected only when in the fovea). The attentional hypothesis predicts that the attraction of the apparent location of the sound by the square would be stronger with attention focused on the attractor square than with attention focused on the center. In fact, though, equal degrees of attraction were obtained in the two attention conditions. Focused attention thus did not modulate the ventriloquist effect.

However, the effect of attention might have been small and overruled by the bottom-up information from the laterally presented visual square. What would happen when the bottom-up information was more ambiguous? Would an effect of attention then appear? In a second experiment, we used bilateral squares that were flashed in synchrony with the sound so as to provide competing visual attractors. When the two squares were of equal size, auditory localization was unaffected by which side participants monitored for visual targets, but when one square was larger than the other, auditory localization was reliably attracted toward the bigger square, again regardless of where visual monitoring was required. This led to the conclusion that the ventriloquism effect largely reflects automatic sensory interactions, with little or no role for focused attention.

In discussing how attention might influence ventriloquism, though, one must distinguish several senses in which the term attention is used. One may attend to one sensory modality rather than another regardless of location (Spence & Driver, 1997a), or one may attend to one particular location rather than another regardless of modality. Furthermore, in the literature on spatial attention, two different means of allocating attention are generally distinguished. First, there is an endogenous process by which attention can be moved voluntarily. Second, there is an automatic or exogenous mechanism by which attention is reoriented automatically to stimuli in the environment with some special features. The study by Bertelson, Vroomen, et al. (2000) manipulated endogenous attention by asking subjects to focus on one or the other location. Yet it may have been the case that the visual distracter received a certain amount of exogenous attention independent of where the subject was focusing. For that reason, one might ask whether capture of exogenous attention by the visual distracter is essential to affect the perceived location of a sound.

To investigate this possibility, we tried to create a situation in which exogenous attention was captured in one direction, whereas the apparent location of a sound was ventriloquized in the other direction (Vroomen, Bertelson, & de Gelder, 2001a). Our choice was influenced by earlier data showing that attention can be captured by a visual item differing substantially in one or several attributes (e.g., color, form, orientation, shape) from a set of identical items among which it is displayed (e.g., Treisman & Gelade, 1980). The unique item is sometimes called the singleton, and its influence on attention is referred to as the singleton effect. If ventriloquism is mediated by exogenous attention, presenting a sound in synchrony with a display that contains a singleton should shift the apparent location of the sound toward the singleton. Consequently, finding a singleton that would not shift the location of a sound in its direction would provide evidence that exogenous attention can be dissociated from ventriloquism.

We used a psychophysical staircase procedure as in Bertelson and Aschersleben (1998). The occurrence of visual bias was examined by presenting a display in synchrony with the sound.
We tried to shift the apparent location of the sound in the direction opposite to the singleton by using a display that consisted of four horizontally aligned squares: two big squares on one side, and a big square and a small square (the singleton) on the other side (Fig. 9.1). The singleton was either in the far left or in the far right position. A visual bias dependent on the position of the singleton should manifest itself at the level of the locations at which reversals begin to occur on the staircases for the two visual displays. If, for instance, the apparent location of the sound were attracted toward the singleton, reversals would first occur at locations more to the left for the display with the singleton on the right than for the display with the singleton on the left.

FIGURE 9.1 An example of one of the displays used by Vroomen, Bertelson, and de Gelder (2001a). Subjects saw four squares, one (the singleton) smaller than the others. When the display was flashed on a computer screen, subjects heard a stereophonically controlled sound whose location had to be judged as left or right of the median fixation cross. The results showed that the apparent location of the sound was shifted in the direction of the two big squares, and not toward the singleton, whereas the singleton attracted visual attention. The direction in which a sound was ventriloquized was thus dissociated from where exogenous attention was captured.

The results of this experiment were very straightforward. The apparent origin of the sound was not shifted toward the singleton but in the opposite direction, toward the two big squares. Apparently, the two big squares on one side of the display were attracting the apparent origin of the sound more strongly than the small and big square at the other side. Thus, the attractor size effect that we previously obtained (Bertelson, Vroomen, de Gelder, & Driver, 2000) occurred with the present visual display as well. This result thus suggested that attraction of a sound was not mediated through exogenous attention capture. However, before that conclusion could be drawn, it was necessary to check that the visual display had the capacity to attract attention toward the singleton. We therefore ran a control experiment in which the principle was to measure the attention-attracting capacity of the small square through its effect on the discrimination of targets presented elsewhere in the display. In the singleton condition, participants were shown the previously used display with the three big squares and the small one. A target letter X or O, calling for a choice reaction, was displayed in the most peripheral big square opposite the singleton. In the control condition, the display consisted of four equally sized big squares. Discrimination performance was worse in the singleton condition than in the control condition, thus showing that attention was attracted away from the target letter and toward the singleton.
Nevertheless, one might still argue that the singleton in the sound localization task did not capture attention because subjects were paying attention to audition, not vision. In a third experiment, we therefore randomized sound localization trials with visual X/O discrimination trials so that subjects did not know in advance which task they had to perform. When subjects saw an X or an O, they pressed as fast as possible a corresponding key; otherwise, when no letter was detected, they decided whether the sound had come from the left or right of the central reference. With this mixed design, results were still exactly as before: attention was attracted toward the singleton while the sound was shifted away from the singleton. Strategic differences between an auditory and a visual task were thus unlikely to explain the result. Rather, we demonstrated a dissociation between ventriloquism and exogenous attention: the apparent location of the sound was shifted toward the two big squares (or the "center of gravity" of the visual display), while the singleton attracted exogenous attention.

The findings from the studies concerning the role of exogenous attention, together with those of the earlier one showing the independence of ventriloquism from the direction of endogenous attention (Bertelson, Vroomen, et al., 2000), thus support the conclusion that ventriloquism is not affected by the direction of attention.

4. The ventriloquism effect is still obtained when the visual distracter is not seen consciously. The conclusion that attention is not needed to obtain a ventriloquism effect is further corroborated by our work on patients with unilateral visual neglect (Bertelson, Pavani, Ladavas, Vroomen, & de Gelder, 2000). The neglect syndrome is usually interpreted as an attentional deficit and is reflected in a reduced capacity to report stimuli on the contralateral side (usually the left). Previously, it had been reported that ventriloquism could improve the attentional deficit. Soroker, Calamaro, and Myslobodsky (1995) showed that inferior identification of syllables delivered through a loudspeaker on the left (auditory neglect) could be improved when the same stimuli on the left were administered in the presence of a fictitious loudspeaker on the right. The authors attributed this improvement to a ventriloquism effect, even though their setting was very different from the usual ventriloquism situation. That is, their visual stimulus was stationary, whereas typically the onset and offset of an auditory and a visual stimulus are synchronized. The effect was therefore probably mediated by higher-order knowledge about the fact that sounds can be delivered through loudspeakers. In our research, we used the more typical ventriloquism situation (a light and a sound presented simultaneously) and asked whether a visual stimulus that remains undetected because it is presented in the neglected field nevertheless shifts the apparent location of the sound toward its location. This may occur because, although perceptual awareness is compromised in neglect, much perceptual processing can still proceed unconsciously for the affected side. Our results were clear: patients with left visual neglect consistently failed to detect a stimulus presented in their left visual field; nevertheless, their pointing to a sound was shifted in the direction of the visual stimulus. This is thus another demonstration that ventriloquism is not dependent on attention or even awareness of the visual distracter.

5. The ventriloquism effect is a pre-attentive phenomenon. The previous studies led us to conclude that cross-modal interactions take place without the need of attention, presumably at a stage concerned with the initial analysis of the spatial scene (Bertelson, 1994).
The presumption receives additional support from the findings by Driver (1996), in which the visual bias of auditory location was measured in the classic "cocktail party" situation through its effect in facilitating the focusing of attention on one of two simultaneous spoken messages. Subjects found the shadowing task easier when the apparent location of the target sound was attracted away from the distracter by a moving face. This result thus implies that focused attention operates on a representation of the external scene that has already been spatially reorganized by cross-modal interactions.

We asked whether a similar cross-modal reorganization of external space occurs when exogenous rather than focused attention is at stake (Vroomen, Bertelson, & de Gelder, 2001b). To do so, we used the orthogonal cross-modal cuing task introduced by Spence and Driver (1997b). In this task, participants are asked to judge the elevation (up vs. down, regardless of whether it is on the left or right of fixation) of peripheral targets in either audition, vision, or touch following an uninformative cue in either one of these modalities. In general, cuing effects (i.e., faster responses when the cue is on the same side as the target) have been found across all modalities, except that visual cues do not affect responses to auditory targets (Driver & Spence, 1998; but see McDonald, Teder-Sälejärvi, Heraldez, & Hillyard, 2001; Spence, 2001). This opens an intriguing possibility: What happens with an auditory cue whose veridical location is in the center but whose apparent location is ventriloquized toward a simultaneous light in the periphery? Can such a ventriloquized cue affect responses to auditory targets? The ventriloquized cue consisted of a tone presented from an invisible central speaker synchronized with a visual cue presented on the left or right. After a stimulus onset asynchrony (SOA) of 100, 300, or 500 ms, a target sound (white noise bursts) was delivered with equal probability from one of the four target speakers. Subjects made a speeded decision about whether the target had been delivered through one of the upper speakers or one of the lower speakers. Results showed that visual cues had no effect on auditory target detection (see also Spence & Driver, 1997b). More important, ventriloquized cues had no cuing effect at the 100-ms SOA, but a facilitatory effect appeared at the 300- and 500-ms SOAs. This suggests that a ventriloquized cue directed auditory exogenous attention to the perceived rather than the physical auditory location, implying that the cross-modal interaction between vision and audition reorganized the space on which auditory exogenous attention operates. Spence and Driver (2000) reported similar cuing effects (with a somewhat different time course) in their study of ventriloquized cues in the vertical dimension. They showed that a visual cue presented above or below fixation led to a vertical shift of auditory attention when it was paired with a tone presented at fixation. Attentional capture can thus be directed to the apparent location of a ventriloquized sound, suggesting that cross-modal integration precedes, or at least co-occurs with, reflexive shifts of covert attention.
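The cuing analysis itself is simple: for each SOA, compare response times when the auditory target appears on the side toward which the cue tone was ventriloquized (the side of the flash) with response times when it appears on the opposite side. The sketch below uses invented reaction times chosen only to mirror the qualitative pattern described above (no effect at 100 ms, facilitation at 300 and 500 ms); they are not the actual data.

```python
from statistics import mean

# Each trial: (soa_ms, cue_side, target_side, rt_ms).  cue_side is the side of
# the flash toward which the central cue tone was ventriloquized.  All numbers
# are invented for illustration.
trials = [
    (100, "left", "left", 412), (100, "left", "right", 409),
    (100, "right", "right", 415), (100, "right", "left", 411),
    (300, "left", "left", 388), (300, "left", "right", 404),
    (300, "right", "right", 391), (300, "right", "left", 402),
    (500, "left", "left", 385), (500, "left", "right", 401),
    (500, "right", "right", 387), (500, "right", "left", 399),
]

def cuing_effect(trials, soa):
    """Cuing effect at one SOA: mean RT on uncued-side trials minus mean RT on
    cued-side trials (positive values = facilitation at the ventriloquized side)."""
    cued = [rt for s, cue, tgt, rt in trials if s == soa and cue == tgt]
    uncued = [rt for s, cue, tgt, rt in trials if s == soa and cue != tgt]
    return mean(uncued) - mean(cued)

for soa in (100, 300, 500):
    print(soa, "ms SOA:", round(cuing_effect(trials, soa), 1), "ms")
```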
To summarize the case of ventriloquism, the apparent location of a sound can be shifted in the direction of a visual stimulus that is synchronized with the sound. It is unlikely that this robust effect can be explained solely by voluntary response strategies, as it is obtained with a psychophysical staircase procedure and can be observed as an aftereffect. The effect is even obtained when the visual distracter is not seen consciously, as in patients with hemineglect. Moreover, the ventriloquism effect does not require attention, because it is not affected by whether a visual distracter is focused on or not, and the direction of ventriloquism can be dissociated from where visual attention is captured. In fact, the ventriloquism effect may be a pre-attentive phenomenon, because auditory attention can be captured at the ventriloquized location of a sound. Taken together, these observations indicate that the ventriloquism effect is perceptually "real."

Sound affecting vision: The freezing phenomenon

Recently we described a case of sound affecting vision (Vroomen & de Gelder, 2000). The basic phenomenon can be described as follows: when subjects are shown a rapidly changing visual display, an abrupt sound may "freeze" the display with which the sound is synchronized. Perceptually, the display appears brighter or appears to be shown for a longer period of time. We described this phenomenon against the background of scene analysis. Scene analysis refers to the concept that information reaching the sense organs is parsed into objects and events. In vision, scene analysis succeeds despite partial occlusion of one object by another, the presence of shadows extending across object boundaries, and deformations of the retinal image produced by moving objects. Vision is not the only modality in which object segregation occurs. Auditory object segregation has also been demonstrated (Bregman, 1990). It occurs, for instance, when a sequence of alternating high- and low-frequency tones is played at a certain rate. When the frequency difference between the tones is small, or when the tones are played at a slow rate, listeners are able to follow the entire sequence of tones. But at bigger frequency differences or higher rates, the sequence splits into two streams, one high and one low in pitch. Although it is possible to shift attention between the two streams, it is difficult to report the order of the tones in the entire sequence. Auditory stream segregation, like apparent motion in vision, appears to follow Korte's third law (Korte, 1915): when the difference in frequency between the tones increases, stream segregation occurs at longer SOAs.
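A minimal way to experience streaming is to generate such an alternating tone sequence and vary the frequency separation and the rate. The sketch below builds the waveform with NumPy (an assumed dependency); the particular frequencies, durations, and envelope are arbitrary choices for illustration, not the stimuli of Bregman (1990) or van Noorden (1975).

```python
import numpy as np

def alternating_tone_sequence(low_hz=400, high_hz=1000, tone_ms=100,
                              gap_ms=20, n_pairs=10, sr=44100):
    """Build a low-high-low-high... tone sequence of the kind used to study
    auditory stream segregation.  With a small frequency separation or a slow
    rate the sequence tends to be heard as one coherent stream; with a large
    separation or a fast rate it tends to split into two streams."""
    def tone(freq, ms):
        t = np.arange(int(sr * ms / 1000)) / sr
        return np.sin(2 * np.pi * freq * t) * np.hanning(t.size)  # ramped sine
    silence = np.zeros(int(sr * gap_ms / 1000))
    pair = np.concatenate([tone(low_hz, tone_ms), silence,
                           tone(high_hz, tone_ms), silence])
    return np.tile(pair, n_pairs)

# A large separation (400 vs. 1000 Hz) at a fast rate favors hearing two streams;
# try low_hz=500, high_hz=550, or longer gaps, to hear a single alternating stream.
waveform = alternating_tone_sequence()
print(waveform.shape)
```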
Bregman (1990) described a number of Gestalt principles for auditory scene analysis in which he stressed the resemblance between audition and vision, since principles of perceptual organization such as similarity (in volume, timbre, spatial location), good continuation, and common fate seem to play similar roles in the two modalities. Such a correspondence between principles of visual and auditory organization raises the question of whether the perceptual system utilizes information from one sensory modality to organize the perceptual array in the other modality. In other words, is scene analysis itself a cross-modal phenomenon?

Previously, O'Leary and Rhodes (1984) showed that perceptual segmentation in one modality could influence the concomitant segmentation in another modality. They used a display of six dots, three high and three low. The dots were displayed one by one, alternating between the high and low positions and moving from left to right. At slow rates, a single dot appeared to move up and down, while at faster rates two dots were seen as moving horizontally, one above the other. A sequence that was perceived as two dots caused a concurrent auditory sequence to be perceived as two tones as well, at a rate that would otherwise yield a single perceptual object when the accompanying visual sequence was perceived as a single object. The number of objects seen thus influenced the number of objects heard. They also found the opposite influence, from audition to vision. Segmentation in one modality thus affected segmentation in the other modality. To us, though, it was not clear whether the cross-modal effect was truly perceptual, or whether it occurred because participants deliberately changed their interpretation of the sounds and dots. It is well known that there is a broad range of rates and tones at which listeners can hear, at will, one or two streams (van Noorden, 1975). O'Leary and Rhodes presented ambiguous sequences, and this raises the possibility that a cross-modal influence was found because perceivers changed their interpretation of the sounds and dots, while the perception may have been the same. For example, participants under the impression of hearing two streams instead of one may have inferred that in vision there should also be two streams instead of one. Such a conscious strategy would explain the observations of the cross-modal influence without the need for a direct perceptual link between audition and vision.

We pursued this question in a study that led us to observe the freezing phenomenon. We first tried to determine whether the freezing of the display to which an abrupt tone is synchronized is a perceptually genuine effect or not. Previously, Stein, London, Wilkinson, and Price (1996) had shown, with normal subjects, that a sound enhances the perceived visual intensity of a stimulus. The latter seemed to be a close analogue of the freezing phenomenon we wanted to create. However, Stein et al. used a somewhat indirect measure of visual intensity (a visual analogue scale; participants judged the intensity of a light by rotating a dial), and they could not find an enhancement by a sound when the visual stimulus was presented subthreshold. It was therefore unclear whether their effect was truly perceptual rather than postperceptual. In our experiments, we tried to avoid this difficulty by using a more direct estimate of visual persistence, measuring speeded performance on a detection task. Participants saw a 4 x 4 matrix of flickering dots that was created by rapidly presenting four different displays, each containing four dots in quasi-random positions (Fig. 9.2). Each display on its own was difficult to see, because it was shown only briefly and was immediately followed by a mask. One of the four displays contained a target to be detected. The target consisted of four dots that made up a diamond in the upper left, upper right, lower left, or lower right corner of the matrix. The task of the participants was to detect the position of the diamond as fast and as accurately as possible. We investigated whether the detectability of the target could be improved by the presentation of an abrupt sound together with the target. The tones were delivered through a loudspeaker under the monitor. Participants in the experimental condition heard a high tone at the target display and a low tone at the other four-dot displays (the distracters). In the control condition, participants heard only low tones. The idea was that the high tone in the sequence of tones segregated from the low tones, and that under the experimental circumstances it would increase the detectability of the target display. The results indeed showed that the target was detected more easily when presentation of the target was synchronized with presentation of the high tone: subjects were faster and more accurate when a high tone was presented at target onset.

FIGURE 9.2 A simplified representation of a stimulus sequence used in Vroomen and de Gelder (2000). Big squares represent the dots shown at time t; small squares were not actually presented to the viewers but are shown in the figure to indicate the position of the dots within the 4 x 4 matrix. The four-dot displays were shown for 97 ms each. Each display was immediately followed by a mask (the full matrix of 16 dots) for 97 ms, followed by a dark blank screen for 60 ms. The target display (in this example, the diamond in the upper left corner whose position had to be detected) was presented at t4. The sequence of the four-dot displays was repeated without interruption until a response was given. Tones (97 ms in duration) were synchronized with the onset of the four-dot displays. Results showed that when a tone was presented at t4 that segregated, target detection was enhanced, presumably because the visibility of the target display was increased. When a segregating tone was presented at t3, target detection became worse, because the visual distracter at t3 caused more interference. There was no enhancement when the tone at t4 did not segregate. The visibility of a display was thus increased when it was synchronized with an abrupt tone.
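As a schematic of the trial structure just described (and summarized in Figure 9.2), the sketch below lays out the event timeline of one stimulus cycle; the function name, the event labels, and the single-cycle framing are simplifications introduced here for illustration.

```python
def trial_timeline(target_index=3, condition="experimental",
                   display_ms=97, mask_ms=97, blank_ms=60, n_displays=4):
    """Event list for one cycle of the flickering-dot task: each four-dot
    display is followed by a full-matrix mask and a blank, and a tone is
    synchronized with each display onset.  In the experimental condition the
    display containing the target diamond gets the high tone; in the control
    condition all tones are low."""
    events, t = [], 0
    for i in range(n_displays):
        tone = "high" if condition == "experimental" and i == target_index else "low"
        role = "target" if i == target_index else "distracter"
        events.append((t, f"display {i + 1} ({role})"))
        events.append((t, f"{tone} tone ({display_ms} ms)"))
        t += display_ms
        events.append((t, "mask"))
        t += mask_ms
        events.append((t, "blank"))
        t += blank_ms
    return events   # the whole cycle repeats until the subject responds

for onset, event in trial_timeline():
    print(f"{onset:4d} ms  {event}")
```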
Did the high tone simply act as a warning signal, giving subjects information about when to expect the target? In a second experiment, we controlled for this possibility and synchronized the high tone with a distracter display that immediately preceded presentation of the target. Participants were informed that the target would be presented immediately after the high tone. Yet, although the participants were now also given a cue about when to expect the target and thus knew about the temporal relation between the high tone and the target, performance actually worsened. As reported by the participants, it is probable that the high tone contributed to higher visibility of the distracter display with which it was synchronized, thereby increasing interference.

However, the most important result was that we could show that the perceptual organization of the tone sequence determined the cross-modal enhancement. Our introspective observation was that visual detection was improved only when the high tone was segregated from the tone sequence. In our next experiment, we therefore prevented segregation of the high tone by making it part of the beginning of the well-known tune "Frère Jacques." When subjects heard repetitively a low-middle-high-low tone sequence while seeing the target on the third (high) tone, there was no enhancement of the visual display. Thus, it was the perceptual organization of the tone in the sequence that increased the visibility of the target display, rather than the high tone per se, showing that cross-modal interactions can occur at the level of scene analysis.
How to qualify the nature of cross-modal interactions?

We have argued that ventriloquism and the freezing phenomenon are two examples of intersensory interactions with consequences at perceptual processing levels (see also Vroomen & de Gelder, 2003, for audiovisual interactions between moving stimuli). They may therefore be likely candidates for showing that cross-modal interactions can affect primary sensory levels. There is now also some preliminary neurophysiological evidence showing that brain areas that are usually considered to be unimodal can be affected by input from different modalities. For example, with functional magnetic resonance imaging, Calvert, Campbell, and Brammer (2000) found that lipreading could affect primary auditory cortex. In a similar vein, Macaluso, Frith, and Driver (2000) showed that a tactile cue could enhance neural responses to a visual target in visual cortex. Giard and Peronnet (1999) also reported that tones synchronized with a visual stimulus affected event-related potentials (ERPs) in visual cortex. Pourtois, de Gelder, Vroomen, Rossion, and Crommelinck (2000) found early modulation of auditory ERPs when facial expressions of emotions were presented with auditory sentence fragments, the prosodic content of which was congruent or incongruent with the face: when the face-voice pairs were congruent, there was a bigger auditory N1 component at around 110 ms than when they were incongruent. All these findings are in accordance with the idea that there is feedback from multimodal levels to unimodal levels of perception, or with the notion that sensory modalities access each other directly.

Such cross-talk between primary sensory areas may also be related to the fact that the subjective experience of a cross-modal interaction affects the target modality. In the McGurk effect (McGurk & MacDonald, 1976), visual information provided by lipreading changes the way a sound is heard. In the ventriloquism situation, when a sound is presented with a spatially conflicting light, the location of the sound is changed. There are other examples of such qualitative changes: when a single flash of light is accompanied by multiple beeps, the light is seen as multiple flashes (Shams, Kamitani, & Shimojo, 2000), and when a fearful voice is presented with a happy face, the voice sounds happier (de Gelder & Vroomen, 2000). Multimodally determined percepts thus have the unimodal qualia of the sensory input in the target modality, and this may be due to the existence of back-projections to the primary sensory areas. Yet this is not to say that a percept resulting from cross-modal interactions is, in all its relevant aspects, equivalent to its unimodal counterpart. For example, is a ventriloquized sound in all its perceptually and neurophysiologically relevant dimensions the same as a sound played from the direction from where the ventriloquized sound was perceived? For McGurk-like stimulus combinations (i.e., hearing /ba/ and seeing /ga/), the auditory component can be dissociated from the perceived component, as the contrast effect in selective speech adaptation is driven mainly by the auditory stimulus and not by the perceived aspect of the audiovisual stimulus combination (Roberts & Summerfield, 1981; but see Bertelson, Vroomen, & de Gelder, 2003). For other cross-modal phenomena such as ventriloquism, a number of intriguing questions remain about the extent to which the illusory percept is distinguishable or not from its veridical counterpart.

REFERENCES

Bedford, F. L. (1999). Keeping perception accurate. Trends in Cognitive Sciences, 3, 4-11.

Bertelson, P. (1994). The cognitive architecture behind auditory-visual interaction in scene analysis and speech identification. Cahiers de Psychologie Cognitive, 13, 69-75.

Bertelson, P. (1999). Ventriloquism: A case of cross-modal perceptual grouping. In G. Aschersleben, T. Bachmann, & J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 347-362). Amsterdam: Elsevier.

Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of auditory location. Psychonomic Bulletin & Review, 5, 482-489.

Bertelson, P., Pavani, F., Ladavas, E., Vroomen, J., & de Gelder, B. (2000). Ventriloquism in patients with unilateral visual neglect. Neuropsychologia, 38, 1634-1642.

Bertelson, P., & Radeau, M. (1981). Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Perception & Psychophysics, 29, 578-584.

Bertelson, P., Vroomen, J., & de Gelder, B. (2003). Visual recalibration of auditory speech identification. Psychological Science.

Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics, 62, 321-332.

Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.

Caclin, A., Soto-Faraco, S., Kingstone, A., & Spence, C. (2002). Tactile "capture" of audition. Perception & Psychophysics, 64, 616-630.

Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649-657.

Canon, L. K. (1970). Intermodality inconsistency of input and directed attention as determinants of the nature of adaptation. Journal of Experimental Psychology, 84, 141-147.

de Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition and Emotion, 14, 289-311.

Driver, J. (1996). Enhancement of selective listening by illusory mislocalization of speech sounds due to lip-reading. Nature, 381, 66-68.

Driver, J., & Spence, C. (1998). Attention and the crossmodal construction of space. Trends in Cognitive Sciences, 2, 254-262.

Driver, J., & Spence, C. (2000). Multisensory perception: Beyond modularity and convergence. Current Biology, 10, 731-735.

Ettlinger, G., & Wilson, W. A. (1990). Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research, 40, 169-192.

Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. The Journal of Neuroscience, 22, 5749-5759.

Falchier, A., Renaud, L., Barone, P., & Kennedy, H. (2001). Extensive projections from the primary auditory cortex and polysensory area STP to peripheral area V1 in the macaque. Society for Neuroscience Abstracts, 27, 511.21.

Frissen, I., Bertelson, P., Vroomen, J., & de Gelder, B. (2003). The aftereffects of ventriloquism: Are they sound frequency specific? Acta Psychologica, 113, 315-327.

Giard, M. H., & Peronnet, F. (1999). Auditory-visual integration during multimodal object recognition in humans: A behavioural and electrophysiological study. Journal of Cognitive Neuroscience, 11, 473-490.

Korte, A. (1915). Kinematoskopische Untersuchungen [Kinematoscopic investigations]. Zeitschrift für Psychologie der Sinnesorgane, 72, 193-296.

Macaluso, E., Frith, C., & Driver, J. (2000). Modulation of human visual cortex by crossmodal spatial attention. Science, 289, 1206-1208.

Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.

McDonald, J. J., Teder-Sälejärvi, W. A., Heraldez, D., & Hillyard, S. A. (2001). Electrophysiological evidence for the "missing link" in crossmodal attention. Canadian Journal of Psychology, 55, 143-151.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

O'Leary, A., & Rhodes, G. (1984). Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565-569.

Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., & Crommelinck, M. (2000). The time course of intermodal binding between seeing and hearing affective information. NeuroReport, 11, 1329-1333.

Radeau, M. (1994). Auditory-visual spatial interaction and modularity. Cahiers de Psychologie Cognitive, 13, 3-51.

Radeau, M., & Bertelson, P. (1974). The after-effects of ventriloquism. The Quarterly Journal of Experimental Psychology, 26, 63-71.

Roberts, M., & Summerfield, Q. (1981). Audiovisual presentation demonstrates that selective adaptation to speech is purely auditory. Perception & Psychophysics, 30, 309-314.

Rumelhart, D., & McClelland, J. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.

Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788.

Soroker, N., Calamaro, N., & Myslobodsky, M. S. (1995). Ventriloquism reinstates responsiveness to auditory stimuli in the "ignored" space in patients with hemispatial neglect. Journal of Clinical and Experimental Neuropsychology, 17, 243-255.

Spence, C. (2001). Crossmodal attentional capture: A controversy resolved? In C. Folk & B. Gibson (Eds.), Attraction, distraction and action: Multiple perspectives on attentional capture (pp. 231-262). Amsterdam: Elsevier.

Spence, C., & Driver, J. (1997a). On measuring selective attention to an expected sensory modality. Perception & Psychophysics, 59, 389-403.

Spence, C., & Driver, J. (1997b). Audiovisual links in exogenous covert spatial attention. Perception & Psychophysics, 59, 1-22.

Spence, C., & Driver, J. (2000). Attracting attention to the illusory location of a sound: Reflexive crossmodal orienting and ventriloquism. NeuroReport, 11, 2057-2061.

Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497-506.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.

van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands.

Vroomen, J., Bertelson, P., & de Gelder, B. (1998). A visual influence in the discrimination of auditory location. In Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP'98) (pp. 131-135). Terrigal-Sydney, Australia: Causal Productions.

Vroomen, J., Bertelson, P., & de Gelder, B. (2001a). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception & Psychophysics, 63, 651-659.

Vroomen, J., Bertelson, P., & de Gelder, B. (2001b). Directing spatial attention towards the illusory location of a ventriloquized sound. Acta Psychologica, 108, 21-33.

Vroomen, J., Bertelson, P., Frissen, I., & de Gelder, B. (2003). A spatial gradient in the ventriloquism aftereffect. Unpublished manuscript.

Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: Cross-modal effects of auditory organisation on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583-1590.

Vroomen, J., & de Gelder, B. (2003). Visual motion influences the contingent auditory motion aftereffect. Psychological Science, 14, 357-361.