9 Perceptual Effects of Cross-Modal Stimulation: Ventriloquism and the Freezing Phenomenon

JEAN VROOMEN AND BEATRICE DE GELDER
Introduction
For readers of a book on multimodal perception, it probably comes as no surprise that most events in real life consist of perceptual inputs in more than one modality and that sensory modalities may influence each other. For example, seeing a speaker provides not only auditory information, conveyed in what is said, but also visual information, conveyed through the movements of the lips, face, and body, as well as visual cues about the origin of the sound. Most handbooks on cognitive psychology pay comparatively little attention to this multimodal state of affairs, and the different senses (seeing, hearing, smell, taste, touch) are treated as distinct and separate modules with little or no interaction. It is becoming increasingly clear, however, that when the different senses receive correlated input about the same external object or event, information is often combined by the perceptual system to yield a multimodally determined percept.
An important issue is to characterize such multisensory interactions and their cross-modal effects. There are at least three different notions at stake here. One is that information is processed in a hierarchical and strictly feed-forward fashion. In this view, information from different sensory modalities converges in a multimodal representation in a feed-forward way. For example, in the fuzzy logic model of perception (Massaro, 1998), degrees of support for different alternatives from each modality (say, audition and vision) are determined and then combined to give an overall degree of support. Information is propagated in a strictly feed-forward fashion so that higher-order multimodal representations do not affect lower-order sensory-specific representations. There is thus no cross-talk between the sensory modalities such that, say, vision affects early processing stages of audition or vice versa. Cross-modal interactions in feed-forward models take place only at or beyond multimodal stages. An alternative possibility is that multimodal representations send feedback to primary sensory levels (e.g., Driver & Spence, 2000). In this view, higher-order multimodal levels can affect sensory levels. Vision might thus affect audition, but only via multimodal representations. Alternatively, it may also be the case that cross-modal interactions take place without multimodal representations. For example, it may be that the senses access each other directly from their sensory-specific systems (e.g., Ettlinger & Wilson, 1990). Vision might then affect audition without the involvement of a multimodal representation (see, e.g., Falchier, Clavagnier, Barone, & Kennedy, 2002, for recent neuroanatomical evidence showing that there are projections from primary auditory cortex to the visual area V1).
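As a concrete illustration of the feed-forward combination scheme assumed by the fuzzy logic model of perception mentioned above, the following sketch multiplies and renormalizes per-alternative support values from two modalities. The function name and the numerical values are our own illustrative choices, not material from Massaro (1998).

    # Minimal sketch of a feed-forward combination rule: each modality
    # contributes an independent degree of support for every response
    # alternative; the supports are multiplied and renormalized.

    def flmp_combine(auditory_support, visual_support):
        """Combine per-alternative support values (0..1) from two modalities."""
        combined = {alt: auditory_support[alt] * visual_support[alt]
                    for alt in auditory_support}
        total = sum(combined.values())
        return {alt: value / total for alt, value in combined.items()}

    # Example: audition weakly favours /ba/, vision strongly favours /da/.
    auditory = {"ba": 0.6, "da": 0.4}
    visual = {"ba": 0.1, "da": 0.9}
    print(flmp_combine(auditory, visual))  # {'ba': 0.14..., 'da': 0.85...}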
The role of feedback in sensory processing has been debated for a long time (e.g., the interactive activation model of reading by Rumelhart and McClelland, 1982). However, as far as cross-modal effects are concerned, there is at present no clear empirical evidence that allows distinguishing among feed-forward, feedback, and direct-access models. Feed-forward models predict that early sensory processing levels should be autonomous and unaffected by higher-order processing levels, whereas feedback or direct-access models would, in principle, allow that vision affects auditory processes, or vice versa. Although this theoretical distinction seems straightforward, empirical demonstration in favor of one or another alternative has proved difficult to obtain. One of the main problems is to find measures that are sufficiently unambiguous and that can be taken as pure indices of an auditory or visual sensory process.
Among the minimal requirements for stating that a cross-modal effect has perceptual consequences at early sensory stages, the phenomenon should at least be (1) robust, (2) not explainable as a strategic effect, and (3) not occurring at response-related processing stages. If one assumes stagewise processing, with sensation coming before attention (e.g., the "late selection" view of attention), one might also want to argue that (4) cross-modal effects should be pre-attentive. If these minimal criteria are met, it becomes at least likely that cross-modal interactions occur at early perceptual processing stages, and thus that models that allow access to primary processing levels (i.e., feedback or direct-access models) better describe the phenomenon.

In our work on cross-modal perception, we investigated the extent to which such minimal criteria apply to some cases of audiovisual perception. One case concerns a situation in which vision affects the localization of a sound (the ventriloquism effect), the other a situation in which an abrupt sound affects visual processing of a rapidly presented visual stimulus (the freezing phenomenon). In Chapter 36 of this volume, we describe the case of cross-modal interactions in affect perception. Each of these phenomena we consider to be based on cross-modal interactions affecting early levels of perception.
Vision affecting sound localization: The ventriloquism effect

Presenting synchronous auditory and visual information in slightly separate locations creates the illusion that the sound is coming from the direction of the visual stimulus. Although the effect is smaller, shifting of the visual percept in the direction of the sound has, at least in some studies, also been observed (Bertelson & Radeau, 1981). The auditory shift is usually measured by asking subjects to localize the sound by pointing or by fixating the eyes on the apparent location of the sound. When localization responses are compared with a control condition (e.g., a condition in which the sound is presented in isolation without a visual stimulus), a shift of a few degrees in the direction of the visual stimulus is usually observed. Such an effect, produced by audiovisual spatial conflict, is called the ventriloquism effect, because one of the most spectacular everyday examples is the illusion created by performing ventriloquists that the speech they produce without visible facial movements comes from a puppet they agitate in synchrony with the speech.

A standard explanation of the ventriloquism effect is the following: when auditory and visual stimuli occur in close temporal and spatial proximity, the perceptual system assumes that a single event has occurred. The perceptual system then tries to reduce the conflict between the location of the visual and auditory data, because there is an a priori constraint that an object or event can have only one location (e.g., Bedford, 1999). Shifting the auditory location in the direction of the visual event rather than the other way around would seem to be ecologically useful, because spatial resolution in the visual modality is better than in the auditory one.

However, there are also other, more trivial explanations of the ventriloquism effect. One alternative is similar to Stroop task interference: when two conflicting stimuli are presented together, such as when the word blue is written in red ink, there is competition at the level of response selection rather than at a perceptual level per se. Stroop-like response competition may also be at play in the ventriloquism effect. In that case, there would be no real attraction between sound and vision, but the ventriloquist illusion would be derived from the fact that subjects sometimes point to the visual stimulus instead of the sound by mistake. Strategic or cognitive factors may also play a role. For example, a subject may wonder why sounds and lights are presented from different locations, and then adopt a postperceptual response strategy that satisfies the experimenter's ideas of the task (Bertelson, 1999). Of course, these possibilities are not exclusive, and the experimenter should find ways to check or circumvent them.

In our research, we dealt with these and other aspects of the ventriloquist situation in the hope of showing that the apparent location of a sound is indeed shifted at a perceptual level of auditory space perception. More specifically, we asked whether a ventriloquism effect can be observed (1) when subjects are explicitly trained to ignore the visual distracter; (2) when cognitive strategies of the subject to respond in a particular way can be excluded; (3) when the visual distracter is not attended, either endogenously or exogenously; and (4) when the visual distracter is not seen consciously. We also asked (5) whether the ventriloquism effect as such is possibly a pre-attentive phenomenon.

1. A visual distracter cannot be ignored. In a typical ventriloquism situation, subjects are asked to locate a sound while ignoring a visual distracter. Typically, subjects remain unaware of how well they perform during the experiment and how well they succeed in obeying instructions. In one of our experiments, though, we asked whether it is possible to train subjects explicitly to ignore the visual distracter (Vroomen, Bertelson, & de Gelder, 1998). If despite training it is impossible to ignore the visual distracter, this speaks to the robustness of the effect. Subjects were trained to discriminate among sequences of tones that emanated either from a central location only or from alternating locations, in which case two speakers located next to a computer screen emitted the tones. With no visual input, this same/different location task was very easy, because the difference between the central and lateral locations was clearly noticeable. However, the task was much more difficult when, in synchrony with the tones, light flashes were alternated left and right on a computer screen. This condition created the strong impression that sounds from the central location now alternated between left and right, presumably because the light flashes attracted the apparent location of the sounds. We then tried to train subjects to discriminate centrally presented but ventriloquized sounds from sounds that alternated physically between the left and right. Subjects were instructed to ignore the lights as much as possible (but without closing their eyes), and they received corrective feedback after each trial. The results were that the larger the separation between the lights, the more false alarms occurred (subjects responded "alternating sound" to a centrally presented ventriloquized sound), presumably because the farther apart the lights were, the farther apart was the perceived location of the sounds. Moreover, although feedback was provided after each trial, performance did not improve in the course of the experiment. Instructions and feedback thus could not overcome the effect of the visual distracter on sound localization, which indicates that the ventriloquism effect is indeed very robust.

2. A ventriloquism effect is obtained even when cognitive strategies can be excluded. When subjects are asked to point to the location of auditory stimuli while ignoring spatially discrepant visual distracters, they may be aware of the spatial discrepancy and adjust their response accordingly. The visual bias one obtains may then reflect postperceptual decisions rather than genuine perceptual effects. However, contamination by strategies can be prevented when ventriloquism effects are studied via a staircase procedure or as an aftereffect.

Bertelson and Aschersleben (1998) were the first to apply the staircase procedure in the ventriloquism situation. The advantage of a staircase procedure is that it is not transparent, and the effects are therefore more likely to reflect genuine perceptual processes. In the staircase procedure used by Bertelson and Aschersleben, subjects had to judge the apparent origin of a stereophonically controlled sound as left or right of a median reference point. Unknown to the subjects, the location of the sound was changed as a function of their judgment, following the principle of the psychophysical staircase. After a "left" judgment, the next sound on the same staircase was moved one step to the right, and vice versa. A staircase started with sounds coming from an extreme left or an extreme right position. At that stage, correct responses were generally given on each successive trial, so that the target sounds moved progressively toward the center. Then, at some point, response reversals (responses different from the preceding one on the same staircase) began to occur. Thereafter, the subjects were no longer certain about the location of the sound. The location at which these response reversals began to occur was the dependent variable. In the study by Bertelson and Aschersleben, sounds were delivered together with a visual distracter, a light-emitting diode (LED) flash, in a central location. When the LED flash was synchronized with the sound, response reversal occurred earlier than when the light was desynchronized from the sound. Apparently, the synchronized LED flashes attracted the apparent location of the sound toward its central location, so that response reversal occurred earlier on the staircase. Similar results have now been reported by Caclin, Soto-Faraco, Kingstone, and Spence (2002), showing that a centrally located tactile stimulus attracts a peripheral sound toward the middle. It is important to note that there is no way subjects can figure out a response strategy that might lead to this result, because once response reversal begins to occur, subjects no longer know whether a sound belongs to a left or a right staircase. A conscious response strategy in this situation is thus extremely unlikely to account for the effect.
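To make the logic of such a staircase concrete, the sketch below simulates a single left/right staircase in which a synchronized central flash pulls the perceived sound location toward the midline. The step size, bias magnitude, noise level, and response rule are illustrative assumptions, not the parameters used by Bertelson and Aschersleben (1998).

    import random

    def run_staircase(start_azimuth, step=1.0, visual_bias=3.0, synchronized=True,
                      noise_sd=1.0, max_trials=60):
        """Return the azimuth (deg) at which the first response reversal occurs."""
        azimuth = start_azimuth
        previous = None
        for _ in range(max_trials):
            perceived = azimuth
            if synchronized:
                # An assumed flash at 0 deg pulls the perceived location toward the midline.
                perceived -= visual_bias if azimuth > 0 else -visual_bias
            perceived += random.gauss(0.0, noise_sd)
            response = "right" if perceived > 0 else "left"
            if previous is not None and response != previous:
                return azimuth  # first reversal: the dependent variable
            previous = response
            # After a "left" judgment the sound moves one step to the right, and vice versa.
            azimuth += step if response == "left" else -step
        return azimuth

    random.seed(1)
    print(run_staircase(+15.0, synchronized=True))   # reversals begin farther from the midline
    print(run_staircase(+15.0, synchronized=False))  # reversals begin closer to the midline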
Conscious strategies are also unlikely to play a role when the effect of presenting auditory and visual stimuli at separate locations is measured as an aftereffect. The aftereffect is a shift in the apparent location of unimodally presented acoustic stimuli consequent on exposure to synchronous but spatially disparate auditory-visual stimulus pairs. Initial studies used prisms to shift the relative locations of visual and auditory stimuli (Canon, 1970; Radeau & Bertelson, 1974). Participants localized acoustic targets before and after a period of adaptation. During the adaptation phase, there was a mismatch between the spatial locations of acoustic and visual stimuli. Typically, between pre- and posttest a shift of about 1-4 degrees was found in the direction of the visual attractor. Presumably, the spatial conflict between auditory and visual data during the adaptation phase was resolved by recalibration of the perceptual system, and this alteration lasted long enough to be detected as aftereffects. It should be noted that aftereffects are measured by comparing unimodal pointing to a sound before and after an adaptation phase. Stroop-like response competition between the auditory target and visual distracter during the test situation thus plays no role, because the test sound is presented without a visual distracter. Moreover, aftereffects are usually obtained when the spatial discrepancy between auditory and visual stimuli is so small that subjects do not even notice the separation (Frissen, Bertelson, Vroomen, & de Gelder, 2003; Radeau & Bertelson, 1974; Vroomen et al., 2003; Woods & Recanzone, Chap. 3, this volume). Aftereffects can therefore be interpreted as true perceptual recalibration effects (e.g., Radeau, 1994).
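A minimal sketch of how such an aftereffect is typically quantified, assuming simple pre- and post-adaptation pointing responses; the data and parameter names below are invented for illustration.

    def aftereffect(pre_pointing, post_pointing, attractor_side=+1):
        """Mean post-minus-pre pointing shift, signed toward the visual attractor.

        attractor_side is +1 if the visual attractor was to the right of the sounds,
        -1 if it was to the left.
        """
        shifts = [post - pre for pre, post in zip(pre_pointing, post_pointing)]
        return attractor_side * sum(shifts) / len(shifts)

    pre = [-1.0, 0.5, 0.2, -0.3]   # pointing errors (deg) before adaptation
    post = [1.5, 2.0, 2.6, 1.9]    # pointing errors (deg) after adaptation
    print(aftereffect(pre, post))  # about 2.15 deg toward the attractor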
3. Attention toward the visual distracter is not needed to obtain a ventriloquism effect. A relevant question is whether attention plays a role in the ventriloquism effect. One could argue, as Treisman and Gelade (1980) have done in feature-integration theory for the visual modality, that focused attention might be the glue that combines features across modalities. Could it be, then, that when a sound and visual distracter are attended, an integrated cross-modal event is perceived, but when they are unattended, two separate events are perceived that do not interact? If so, one might predict that a visual distracter would have a stronger effect on the apparent location of a sound when it is focused upon.
We considered the possibility that ventriloquism indeed requires or is modulated by this kind of focused
attention (Bertelson, Vroomen, de Gelder, & Driver,
2000). The subjects' task was to localize trains of tones
while monitoring visual events on a computer screen.
On experimental trials, a bright square appeared on the
left or the right of the screen in exact synchrony with
the tones. No square appeared on control trials. The
attentional manipulation consisted of having subjects
monitor either the center of the display, in which case
the attractor square was in the visual periphery, or the
lateral square itself for occasional occurrences of a
catch stimulus (a very small diamond that could only be
detected when in the fovea). The attentional hypothesis
predicts that the attraction of the apparent location of
the sound by the square would be stronger with attention focused on the attractor square than with attention
focused on the center. In fact, though, equal degrees of
attraction were obtained in the two attention conditions. Focused attention thus did not modulate the
ventriloquist effect.
However, the effect of attention might have been small and overruled by the bottom-up information from the laterally presented visual square. What would happen when the bottom-up information was more ambiguous? Would an effect of attention then appear? In a second experiment, we used bilateral squares that were flashed in synchrony with the sound so as to provide competing visual attractors. When the two squares were of equal size, auditory localization was unaffected by which side participants monitored for visual targets, but when one square was larger than the other, auditory localization was reliably attracted toward the bigger square, again regardless of where visual monitoring was required. This led to the conclusion that the ventriloquism effect largely reflects automatic sensory interactions, with little or no role for focused attention.
In discussing how attention might influence ventriloquism, though, one must distinguish several senses in which the term attention is used. One may attend to one sensory modality rather than another regardless of location (Spence & Driver, 1997a), or one may attend to one particular location rather than another regardless of modality. Furthermore, in the literature on spatial attention, two different means of the allocation of attention are generally distinguished. First, there is an endogenous process by which attention can be moved voluntarily. Second, there is an automatic or exogenous mechanism by which attention is reoriented automatically to stimuli in the environment with some special features. The study by Bertelson, Vroomen, et al. (2000) manipulated endogenous attention by asking subjects to focus on one or the other location. Yet it may have been the case that the visual distracter received a certain amount of exogenous attention independent of where the subject was focusing. For that reason, one might ask whether capture of exogenous attention by the visual distracter is essential to affect the perceived location of a sound.
To investigate this possibility, we tried to create a situation in which exogenous attention was captured in one
direction, whereas the apparent location of a sound was
ventriloquized in the other direction (Vroomen,
Bertelson, & de Gelder, 2001a). Our choice was influenced by earlier data showing that attention can be captured by a visual item differing substantially in one or
several attributes (e.g., color, form, orientation, shape)
from a set of identical items among which it is displayed
(e.g., Treisman & Gelade, 1980). The unique item is
sometimes called the singleton, and its influence on
attention is referred to as the singleton effect. If ventriloquism is mediated by exogenous attention, presenting
a sound in synchrony with a display that contains a singleton should shift the apparent location of the sound
toward the singleton. Consequently, finding a singleton
that would not shift the location of a sound in its direction would provide evidence that exogenous attention
can be dissociated from ventriloquism.
We used a psychophysical staircase procedure as in Bertelson and Aschersleben (1998). The occurrence of visual bias was examined by presenting a display in synchrony with the sound. We tried to shift the apparent location of the sound in the opposite direction of the singleton by using a display that consisted of four horizontally aligned squares: two big squares on one side, and a big square and a small square (the singleton) on the other side (Fig. 9.1). The singleton was either in the far left or in the far right position. A visual bias dependent on the position of the singleton should manifest itself at the level of the locations at which reversals begin to occur on the staircases for the two visual displays. If, for instance, the apparent location of the sound were attracted toward the singleton, reversals would first occur at locations more to the left for the display with the singleton on the right than for the display with the singleton on the left.
FIGURE 9.1 An example of one of the displays used by Vroomen, Bertelson, and de Gelder (2001a). Subjects saw four squares, one (the singleton) smaller than the others. When the display was flashed on a computer screen, subjects heard a stereophonically controlled sound whose location had to be judged as left or right of the median fixation cross. The results showed that the apparent location of the sound was shifted in the direction of the two big squares, and not toward the singleton. In control trials, it was found that the singleton attracted visual attention. The direction in which a sound was ventriloquized was thus dissociated from where exogenous attention was captured.

The results of this experiment were very straightforward. The apparent origin of the sound was not shifted toward the singleton but in the opposite direction, toward the two big squares. Apparently, the two big squares on one side of the display were attracting the apparent origin of the sound more strongly than the small and big square at the other side. Thus, the attractor size effect that we previously obtained (Bertelson, Vroomen, de Gelder, & Driver, 2000) occurred with the present visual display as well. This result thus suggested that attraction of a sound was not mediated through exogenous attention capture. However, before that conclusion could be drawn, it was necessary to check that the visual display had the capacity to attract attention toward the singleton. We therefore ran a control experiment in which the principle was to measure the attention-attracting capacity of the small square through its effect on the discrimination of targets presented elsewhere in the display. In the singleton condition, participants were shown the previously used display with the three big squares and the small one. A target letter X or O, calling for a choice reaction, was displayed in the most peripheral big square opposite the singleton. In the control condition, the display consisted of four equally sized big squares. Discrimination performance was worse in the singleton condition than in the control condition, thus showing that attention was attracted away from the target letter and toward the singleton.

Nevertheless, one might still argue that a singleton in the sound localization task did not capture attention because subjects were paying attention to audition, not vision. In a third experiment, we therefore randomized sound localization trials with visual X/O discrimination trials so that subjects did not know in advance which task they had to perform. When subjects saw an X or an O, they pressed as fast as possible a corresponding key; otherwise, when no letter was detected, they decided whether the sound had come from the left or right of the central reference. With this mixed design, results were still exactly as before: attention was attracted toward the singleton while the sound was shifted away from the singleton. Strategic differences between an auditory and a visual task were thus unlikely to explain the result. Rather, we demonstrated a dissociation between ventriloquism and exogenous attention: the apparent location of the sound was shifted toward the two big squares (or the "center of gravity" of the visual display), while the singleton attracted exogenous attention. The findings from the studies concerning the role of exogenous attention, together with those of the earlier one showing the independence of ventriloquism from the direction of endogenous attention (Bertelson, Vroomen, et al., 2000), thus support the conclusion that ventriloquism is not affected by the direction of attention.
4. The ventriloquism effect is still obtained when the visual distracter is not seen consciously. The conclusion that
attention is not needed to obtain a ventriloquism effect
is further corroborated by our work on patients with
unilateral visual neglect (Bertelson, Pavani, Ladavas,
Vroomen, & de Gelder, 2000). The neglect syndrome is
usually interpreted as an attentional deficit that is reflected in a reduced capacity to report stimuli in the
contralateral side (usually the left). Previously, it had
been reported that ventriloquism could improve the attentional deficit. Soroker, Calamaro, and Myslobodsky
(1995) showed that inferior identification of syllables
delivered through a loudspeaker on the left (auditory
neglect) could be improved when the same stimuli on
the left were administered in the presence of a fictitious
loudspeaker on the right. The authors attributed this
improvement to a ventriloquism effect, even though
their setting was very different from the usual ventriloquism situation. That is, their visual stimulus was
stationary, whereas typically the onset and offset of an
auditory and visual stimulus are synchronized. The
effect was therefore probably mediated by higher-order
knowledge about the fact that sounds can be delivered
through loudspeakers.
In our research, we used the more typical ventriloquism situation (a light and a sound presented simultaneously) and asked whether a visual stimulus that remains undetected because it is presented in the
neglected field nevertheless shifts the apparent location of the sound toward its location. This may occur because although perceptual awareness is compromised in
neglect, much perceptual processing can still proceed
unconsciously for the affected side. Our results were
clear: patients with left visual neglect consistently failed to detect a stimulus presented in their left visual field; nevertheless, their pointing to a sound was shifted in the
direction of the visual stimulus. This is thus another
demonstration that ventriloquism is not dependent on
attention or even awareness of the visual distracter.
5. The ventriloquism effect is a pre-attentive phenomenon. The previous studies led us to conclude that cross-modal interactions take place without the need of attention, presumably at a stage concerned with the initial analysis of the spatial scene (Bertelson, 1994). This presumption receives additional support from the findings by Driver (1996), in which the visual bias of auditory location was measured in the classic "cocktail party" situation through its effect in facilitating the focusing of attention on one of two simultaneous spoken messages. Subjects found the shadowing task easier when the apparent location of the target sound was attracted away from the distracter by a moving face. This result thus implies that focused attention operates on a representation of the external scene that has already been spatially reorganized by cross-modal interactions.
We asked whether a similar cross-modal reorganization of external space occurs when exogenous rather
than focused attention is at stake (Vroomen, Bertelson,
& de Gelder, 2001b). To do so, we used the orthogonal
cross-modal cuing task introduced by Spence and
Driver (1997b). In this task, participants are asked to
judge the elevation (up vs. down, regardless of whether
it is on the left or right of fixation) of peripheral targets
in either audition, vision, or touch following an uninformative cue in either one of these modalities. In general, cuing effects (i.e., faster responses when the cue is on the same side as the target) have been found across all modalities, except that visual cues do not affect responses to auditory targets (Driver & Spence, 1998; but see McDonald, Teder-Salejarvi, Heraldez, & Hillyard, 2001; Spence, 2001). This then opens an intriguing possibility: What happens with an auditory cue whose veridical location is in the center but whose apparent location is ventriloquized toward a simultaneous light in the periphery? Can such a ventriloquized cue affect responses to auditory targets? The ventriloquized cue consisted of a tone presented from an invisible central speaker synchronized with a visual cue presented on the left or right. Depending on the stimulus onset asynchrony (SOA) of 100, 300, or 500 ms, a target sound (white noise bursts) was delivered with equal probabilities from one of the four target speakers. Subjects made a speeded decision about whether the target had been delivered through one of the upper speakers or one of the lower speakers. Results showed that visual cues had no effect on auditory target detection (see also Spence & Driver, 1997b). More important, ventriloquized cues had no cuing effect at 100 ms SOA, but the facilitatory effect appeared at 300 and 500 ms SOA. This suggests that a ventriloquized cue directed auditory exogenous attention to the perceived rather than the physical auditory location, implying that the cross-modal interaction between vision and audition reorganized the space on which auditory exogenous attention operates. Spence and Driver (2000) reported similar cuing effects (with a somewhat different time course) in their study of ventriloquized cues in the vertical dimension. They showed that a visual cue presented above or below fixation led to a vertical shift of auditory attention when it was paired with a tone presented at fixation. Attentional capture can thus be directed to the apparent location of a ventriloquized sound, suggesting that cross-modal integration precedes or at least co-occurs with reflexive shifts of covert attention.
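The cuing effect in this kind of orthogonal cuing task is commonly summarized as the mean reaction-time difference between uncued-side (invalid) and cued-side (valid) trials at each SOA. The sketch below shows one way to compute that summary; the trial records are invented numbers, not data from the experiments described here.

    from collections import defaultdict

    def cuing_effects(trials):
        """trials: iterable of (soa_ms, valid, rt_ms). Returns {soa: invalid - valid} in ms."""
        rts = defaultdict(lambda: {True: [], False: []})
        for soa, valid, rt in trials:
            rts[soa][valid].append(rt)
        return {soa: sum(d[False]) / len(d[False]) - sum(d[True]) / len(d[True])
                for soa, d in rts.items()}

    trials = [
        (100, True, 452), (100, False, 449), (100, True, 447), (100, False, 455),
        (300, True, 430), (300, False, 462), (300, True, 425), (300, False, 458),
        (500, True, 433), (500, False, 461), (500, True, 428), (500, False, 466),
    ]
    print(cuing_effects(trials))  # near zero at 100 ms, positive at 300 and 500 ms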
To summarize the case of ventriloquism, the apparent location of a sound can be shifted in the direction of
a visual stimulus that is synchronized with the sound. It
is unlikely that this robust effect can be explained solely
by voluntary response strategies, as it is obtained with a
psychophysical staircase procedure and can be observed as an aftereffect. The effect is even obtained
when the visual distracter is not seen consciously, as in
patients with hemineglect. Moreover, the ventriloquism
effect does not require attention because it is not
affected by whether a visual distracter is focused on or
not, and the direction of ventriloquism can be dissociated from where visual attention is captured. In fact, the
ventriloquism effect may be a pre-attentive phenomenon, because auditory attention can be captured at the
ventriloquized location of a sound. Taken together,
these observations indicate that the ventriloquism
effect is perceptually "real."
Sound affecting vision: The freezing phenomenon
Recently we described a case of sound affecting vision
(Vroomen & de Gelder, 2000). The basic phenomenon
can be described as follows: when subjects are shown a rapidly changing visual display, an abrupt sound may "freeze" the display with which the sound is synchronized. Perceptually, the display appears brighter or appears to be shown for a longer period of time. We described this phenomenon against the background of scene analysis.
Scene analysis refers to the concept that information reaching the sense organs is parsed into objects and events. In vision, scene analysis succeeds despite partial occlusion of one object by the other, the presence of shadows extending across object boundaries, and deformations of the retinal image produced by moving objects. Vision is not the only modality in which object segregation occurs. Auditory object segregation has also been demonstrated (Bregman, 1990). It occurs, for instance, when a sequence of alternating high- and low-frequency tones is played at a certain rate. When the frequency difference between the tones is small, or when the tones are played at a slow rate, listeners are able to follow the entire sequence of tones. But at bigger frequency differences or higher rates, the sequence splits into two streams, one high and one low in pitch. Although it is possible to shift attention between the two streams, it is difficult to report the order of the tones in the entire sequence. Auditory stream segregation, like apparent motion in vision, appears to follow Korte's third law (Korte, 1915). When the difference in frequency between the tones increases, stream segregation occurs at longer SOAs.
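The stimulus in such demonstrations is simply a sequence of tones alternating between a low and a high frequency at a fixed rate. The sketch below generates such a waveform; the particular frequencies, tone duration, and sampling rate are arbitrary illustration values.

    import numpy as np

    def alternating_sequence(low_hz=440.0, high_hz=880.0, tone_ms=100, n_tones=20,
                             sample_rate=44100):
        """Return one mono waveform of tones alternating low, high, low, high, ..."""
        t = np.arange(int(sample_rate * tone_ms / 1000)) / sample_rate
        low = np.sin(2 * np.pi * low_hz * t)
        high = np.sin(2 * np.pi * high_hz * t)
        return np.concatenate([low if i % 2 == 0 else high for i in range(n_tones)])

    waveform = alternating_sequence()
    print(waveform.shape)  # (88200,) for 20 tones of 100 ms at 44.1 kHz

Larger frequency separations or faster presentation rates make such a sequence more likely to split perceptually into two streams.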
Bregman (1990) described a number of Gestalt principles for auditory scene analysis in which he stressed the resemblance between audition and vision, since principles of perceptual organization such as similarity (in volume, timbre, spatial location), good continuation, and common fate seem to play similar roles in the two modalities. Such a correspondence between principles of visual and auditory organization raises the question of whether the perceptual system utilizes information from one sensory modality to organize the perceptual array in the other modality. In other words, is scene analysis itself a cross-modal phenomenon?
Previously, O'Leary and Rhodes (1984) showed that perceptual segmentation in one modality could influence the concomitant segmentation in another modality. They used a display of six dots, three high and three low. The dots were displayed one by one, alternating between the high and low positions and moving from left to right. At slow rates, a single dot appeared to move up and down, while at faster rates two dots were seen as moving horizontally, one above the other. A sequence that was perceived as two dots caused a concurrent auditory sequence to be perceived as two tones as well, at a rate that would yield a single perceptual object when the accompanying visual sequence was perceived as a single object. The number of objects seen thus influenced the number of objects heard. They also found the opposite influence from audition to vision. Segmentation in one modality thus affected segmentation in the other modality.

To us, though, it was not clear whether the cross-modal effect was truly perceptual, or whether it occurred because participants deliberately changed their interpretation of the sounds and dots. It is well known that there is a broad range of rates and tones at which listeners can hear, at will, one or two streams (van Noorden, 1975). O'Leary and Rhodes presented ambiguous sequences, and this raises the possibility that a cross-modal influence was found because perceivers changed their interpretation of the sounds and dots, but the perception may have been the same. For example, participants under the impression of hearing two streams instead of one may have inferred that in vision there should also be two streams instead of one. Such a conscious strategy would explain the observations of the cross-modal influence without the need for a direct perceptual link between audition and vision.
We pursued this question in a study that led us to observe the freezing phenomenon. We first tried to determine whether the freezing of the display to which an
abrupt tone is synchronized is a perceptually genuine
effect or not. Previously, Stein, London, Wilkinson, &
Price (1996) had shown, with normal subjects, that a
sound enhances the perceived visual intensity of a stimulus. The latter seemed to be a close analogue of the
freezing phenomenon we wanted to create. However,
Stein et al. used a somewhat indirect measure of visual
intensity (a visual analogue scale; participants judged
the intensity of a light by rotating a dial), and they could
not find an enhancement by a sound when the visual
stimulus was presented subthreshold. It was therefore
unclear whether their effect was truly perceptual rather
than postperceptual. In our experiments, we tried to
avoid this difficulty by using a more direct estimate of visual persistence by measuring speeded performance on a detection task. Participants saw a 4 × 4 matrix of flickering dots that was created by rapidly presenting four different displays, each containing four dots in quasi-random positions (Fig. 9.2). Each display on its own was
difficult to see, because it was shown only briefly and
was immediately followed by a mask. One of the four
displays contained a target to be detected. The target
consisted of four dots that made up a diamond in the
upper left, upper right, lower left, or lower right corner
of the matrix. The task of the participants was to detect
the position of the diamond as fast and as accurately as
possible. We investigated whether the detectability of
the target could be improved by the presentation of an abrupt sound together with the target. The tones were delivered through a loudspeaker under the monitor. Participants in the experimental condition heard a high tone at the target display and a low tone at the other four-dot displays (the distracters). In the control condition, participants heard only low tones. The idea was that the high tone in the sequence of tones segregated from the low tones, and that under the experimental circumstances it would increase the detectability of the target display. The results indeed showed that the target was detected more easily when presentation of the target was synchronized with presentation of the high tone. Subjects were faster and more accurate when a high tone was presented at target onset.

Did the high tone simply act as a warning signal, giving subjects information about when to expect the target? In a second experiment we controlled for this possibility and synchronized the high tone with a distracter display that immediately preceded presentation of the target. Participants were informed about the temporal relation between the high tone and the target and thus knew that the target would be presented immediately after the high tone. Yet, although the participants were now also given a cue about when to expect the target, their performance actually worsened. As reported by the participants, it is probable that the high tone contributed to higher visibility of the distracter display with which it was synchronized, thereby increasing interference. However, the most important result was that we could show that the perceptual organization of the tone sequence determined the cross-modal enhancement. Our introspective observation was that visual detection was improved only when the high tone was segregated from the tone sequence. In our next experiment we prevented segregation of the high tone by making it part of the beginning of the well-known tune "Frère Jacques." When subjects heard repetitively a low-middle-high-low tone sequence while seeing the target on the third (high) tone, there was no enhancement of the visual display. Thus, the perceptual organization of the tone in the sequence increased the visibility of the target display rather than the high tone per se, showing that cross-modal interactions can occur at the level of scene analysis.

FIGURE 9.2 A simplified representation of a stimulus sequence used in Vroomen and de Gelder (2000). Big squares represent the dots shown at time t; small squares were actually not presented to the viewers but are in the figure to show the position of the dots within the 4 × 4 matrix. The four-dot displays were shown for 97 ms each. Each display was immediately followed by a mask (the full matrix of 16 dots) for 97 ms, followed by a dark blank screen for 60 ms. The target display (in this example, the diamond in the upper left corner whose position had to be detected) was presented at t3. The sequence of the four-dot displays was repeated without interruption until a response was given. Tones (97 ms in duration) were synchronized with the onset of the four-dot displays. Results showed that when a tone was presented at t3 that segregated, target detection was enhanced, presumably because the visibility of the target display was increased. When a segregating tone was presented at t2, target detection became worse because the visual distracter at t2 caused more interference. There was no enhancement when the tone at t3 did not segregate. The visibility of a display was thus increased when synchronized with an abrupt tone.
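The sketch below lays out one cycle of the trial timeline summarized in Figure 9.2 as a simple list of timed events (display, mask, blank, and the tone synchronized with each display onset). The event-list representation and parameter names are our own illustrative choices; the durations are those given in the caption.

    DISPLAY_MS, MASK_MS, BLANK_MS = 97, 97, 60

    def trial_timeline(target_index=2, high_tone_index=2, n_displays=4):
        """Return (onset_ms, event, tone) tuples for one cycle of the sequence.

        By default the third display (t3) is treated as the target, as in the
        example of Figure 9.2, and carries the high tone.
        """
        events, t = [], 0
        for i in range(n_displays):
            tone = "high" if i == high_tone_index else "low"
            label = "target display" if i == target_index else "distracter display"
            events.append((t, label, tone))          # display plus synchronized tone
            events.append((t + DISPLAY_MS, "mask", None))
            events.append((t + DISPLAY_MS + MASK_MS, "blank", None))
            t += DISPLAY_MS + MASK_MS + BLANK_MS
        return events

    for onset, event, tone in trial_timeline():
        print(f"{onset:4d} ms  {event:18s} {tone or ''}")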
How to qualify the nature of cross-modal interactions?

We have argued that ventriloquism and the freezing phenomenon are two examples of intersensory interactions with consequences at perceptual processing levels (see also Vroomen & de Gelder, 2003, for audiovisual interactions between moving stimuli). They may therefore be likely candidates for showing that cross-modal interactions can affect primary sensory levels. There is now also some preliminary neurophysiological evidence showing that brain areas that are usually considered to be unimodal can be affected by input from different modalities. For example, with functional magnetic resonance imaging, Calvert, Campbell, and Brammer (2000) found that lipreading could affect primary auditory cortex. In a similar vein, Macaluso, Frith, and Driver (2000) showed that a tactile cue could enhance neural responses to a visual target in visual cortex. Giard and Peronnet (1999) also reported that tones synchronized with a visual stimulus affected event-related potentials (ERPs) in visual cortex. Pourtois, de Gelder, Vroomen, Rossion, and Crommelinck (2000) found early modulation of auditory
ERPs when facial expressions of emotions were presented with auditory sentence fragments, the prosodic content of which was congruent with the face. When the face-voice pairs were congruent, there was a bigger auditory N1 component at around 110 ms than when they were incongruent. All these findings are in accordance with the idea that there is feedback from multimodal levels to unimodal levels of perception or with the notion that sensory modalities access each other directly.

Such cross-talk between sensory areas may also be related to the fact that the subjective experience of cross-modal interaction affects the target modality. In the McGurk effect (McGurk & MacDonald, 1976), visual information provided by lipreading changes the way a sound is heard. In the ventriloquism situation, when a sound is presented with a spatially conflicting light, the location of the sound is changed. There are other examples of such qualitative changes. For example, when a single flash of light is accompanied by multiple beeps, the light is seen as multiple flashes (Shams, Kamitani, & Shimojo, 2000). With emotions, when a fearful voice is shown with a happy face, the voice sounds happier (de Gelder & Vroomen, 2000). Multimodally determined percepts thus have the unimodal qualia of the sensory input from the primary modality, and this may be due to the existence of backprojections to the primary sensory areas.

Yet this is not to say that a percept resulting from cross-modal interactions is, in all its relevant aspects, equivalent to its unimodal counterpart. For example, is a ventriloquized sound in all its perceptual and neurophysiological relevant dimensions the same as a sound played from the direction from where the ventriloquized sound was perceived? For McGurk-like stimulus combinations (i.e., hearing /ba/ and seeing /ga/), the auditory component can be dissociated from the perceived component, as the contrast effect in selective speech adaptation is driven mainly by the auditory stimulus and not by the perceived aspect of the audiovisual stimulus combination (Roberts & Summerfield, 1981; but see Bertelson, Vroomen, & de Gelder, 2003). For other cross-modal phenomena such as ventriloquism, a number of intriguing questions remain about the extent to which the illusory percept is distinguishable, or not, from its veridical counterpart.
REFERENCES
Bedford, F. L. (1999). Keeping perception accurate. Trends in Cognitive Sciences, 2, 4-11.
Bertelson, P. (1994). The cognitive architecture behind auditory-visual interaction in scene analysis and speech identification. Cahiers de Psychologie Cognitive, 13, 69-75.
Bertelson, P. (1999). Ventriloquism: A case of cross-modal perceptual grouping. In G. Aschersleben, T. Bachmann, & J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 347-362). Amsterdam: Elsevier.
Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of auditory location. Psychonomic Bulletin and Review, 5, 482-489.
Bertelson, P., Pavani, F., Ladavas, E., Vroomen, J., & de Gelder, B. (2000). Ventriloquism in patients with unilateral visual neglect. Neuropsychologia, 38, 1634-1642.
Bertelson, P., & Radeau, M. (1981). Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Perception & Psychophysics, 29, 578-584.
Bertelson, P., Vroomen, J., & de Gelder, B. (2003). Visual recalibration of auditory speech identification. Psychological Science.
Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics, 62, 321-332.
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Caclin, A., Soto-Faraco, S., Kingstone, A., & Spence, C. (2002). Tactile "capture" of audition. Perception & Psychophysics, 64, 616-630.
Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of cross-modal binding in the human heteromodal cortex. Current Biology, 10, 649-657.
Canon, L. K. (1970). Intermodality inconsistency of input and directed attention as determinants of the nature of adaptation. Journal of Experimental Psychology, 84, 141-147.
de Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition and Emotion, 14, 289-311.
Driver, J. (1996). Enhancement of selective listening by illusory mislocalization of speech sounds due to lip-reading. Nature, 381, 66-68.
Driver, J., & Spence, C. (1998). Attention and the cross-modal construction of space. Trends in Cognitive Sciences, 2, 254-262.
Driver, J., & Spence, C. (2000). Multisensory perception: Beyond modularity and convergence. Current Biology, 10, 731-735.
Ettlinger, G., & Wilson, W. A. (1990). Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research, 40, 169-192.
Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. The Journal of Neuroscience, 22, 5749-5759.
Falchier, A., Renaud, L., Barone, P., & Kennedy, H. (2001). Extensive projections from the primary auditory cortex and polysensory area STP to peripheral area V1 in the macaque. The Journal of Neuroscience, 27, 511-521.
Frissen, I., Bertelson, P., Vroomen, J., & de Gelder, B. (2003). The aftereffects of ventriloquism: Are they sound frequency specific? Acta Psychologica, 113, 315-327.
Giard, M. H., & Peronnet, F. (1999). Auditory-visual integration during multimodal object recognition in humans: A behavioural and electrophysiological study. Journal of Cognitive Neuroscience, 11, 473-490.
Korte, A. (1915). Kinematoscopische Untersuchungen [Kinematoscopic research]. Zeitschrift für Psychologie der Sinnesorgane, 72, 193-296.
Macaluso, E., Frith, C., & Driver, J. (2000). Modulation of human visual cortex by cross-modal spatial attention. Science, 289, 1206-1208.
Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.
McDonald, J. J., Teder-Salejarvi, W. A., Heraldez, D., & Hillyard, S. A. (2001). Electrophysiological evidence for the "missing link" in cross-modal attention. Canadian Journal of Psychology, 55, 143-151.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
O'Leary, A., & Rhodes, G. (1984). Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565-569.
Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., & Crommelinck, M. (2000). The time-course of intermodal binding between seeing and hearing affective information. NeuroReport, 11, 1329-1333.
Radeau, M. (1994). Auditory-visual spatial interaction and modularity. Cahiers de Psychologie Cognitive, 13, 3-51.
Radeau, M., & Bertelson, P. (1974). The after-effects of ventriloquism. The Quarterly Journal of Experimental Psychology, 26, 63-71.
Roberts, M., & Summerfield, Q. (1981). Audiovisual presentation demonstrates that selective attention to speech is purely auditory. Perception & Psychophysics, 30, 309-314.
Rumelhart, D., & McClelland, J. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 708.
Soroker, N., Calamaro, N., & Myslobodsky, M. S. (1995). Ventriloquism reinstates responsiveness to auditory stimuli in the "ignored" space in patients with hemispatial neglect. Journal of Clinical and Experimental Neuropsychology, 17, 243-255.
Spence, C. (2001). Cross-modal attentional capture: A controversy resolved? In C. Folk & B. Gibson (Eds.), Attention, distraction and action: Multiple perspectives on attentional capture (pp. 231-262). Amsterdam: Elsevier.
Spence, C., & Driver, J. (1997a). On measuring selective attention to a specific sensory modality. Perception & Psychophysics, 59, 389-403.
Spence, C., & Driver, J. (1997b). Audiovisual links in exogenous covert spatial attention. Perception & Psychophysics, 59, 1-22.
Spence, C., & Driver, J. (2000). Attracting attention to the illusory location of a sound: Reflexive cross-modal orienting and ventriloquism. NeuroReport, 11, 2057-2061.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497-506.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Technische Hogeschool Eindhoven, The Netherlands.
Vroomen, J., Bertelson, P., & de Gelder, B. (1998). A visual influence in the discrimination of auditory location. Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP'98) (pp. 131-135). Terrigal-Sydney, Australia: Causal Productions.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001a). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception & Psychophysics, 63, 651-659.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001b). Directing spatial attention towards the illusory location of a ventriloquized sound. Acta Psychologica, 108, 21-33.
Vroomen, J., Bertelson, P., Frissen, I., & de Gelder, B. (2003). A spatial gradient in the ventriloquism after-effect. Unpublished manuscript.
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: Cross-modal effects of auditory organisation on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583-1590.
Vroomen, J., & de Gelder, B. (2003). Visual motion influences the contingent auditory motion aftereffect. Psychological Science, 14, 357-361.