Intersensory Gestalten and Crossmodal Scene Perception

Charles Spence (University of Oxford)
Daniel Sanabria (University of Oxford)
Salvador Soto-Faraco (Universitat de Barcelona)
Abstract
The last few years have seen a dramatic growth of interest in questions related to the integration of information arising from the different senses (e.g., Calvert, Spence, & Stein, 2004). Researchers have identified a number of factors that can facilitate multisensory binding, such as spatial and temporal coincidence, common fate, and common temporal structure. These crossmodal binding principles map onto a number of Gestalt grouping principles that have been put forward previously to describe how sensory information is perceptually organized within individual sensory modalities (i.e., intramodal perceptual grouping or stream segregation). However, to date, only a small body of research has attempted to investigate the extent to which the perceptual organization of stimuli taking place in one sensory modality can influence the perceptual organization of stimuli in another sensory modality, and/or the nature of any crossmodal interactions between stimuli presented in different sensory modalities. In this chapter, we provide a historical overview of the empirical evidence relating to the crossmodal (or multisensory) aspects of scene perception (i.e., perceptual organization). Taken together, the evidence clearly highlights the necessity of considering intramodal perceptual grouping when investigating the multisensory integration (or crossmodal grouping) of sensory information.
1) Introduction
Have you ever stared at the flickering lights on a Christmas tree and wondered why they appear to flash in time with the rhythm of the music that you are listening to? Despite the fact that many people report having had this experience anecdotally, very little research has been directed at trying to explain the phenomenon empirically (though see Intriligator, 2000). In the present chapter, we put forward the view that the Christmas tree lights illusion most probably reflects the consequences of the brain's attempt to organize the perceptual scene crossmodally. In fact, we will argue that the illusion provides an everyday example (albeit of a seasonal variety) demonstrating that the perceptual organization of stimuli taking place in one sensory modality (audition in this case) may be used by the nervous system to help organize (or structure) the information that is simultaneously being processed by other sensory modalities (in this case vision).
While researchers have known about such crossmodal interactions (or intersensory Gestalten; Gilbert, 1938, 1941) for many years (see Maass, 1938; Schiller, 1935; Thomas, 1941; Urbantschitsch, 1888; Zietz & Werner, 1927, for early studies; see Ryan, 1940, and London, 1954, for early reviews), surprisingly little research has been conducted on this fascinating phenomenon. One reason for this lack of interest regarding the organizational principles governing multisensory scene perception may be the seemingly effortless manner in which we normally segregate the inputs constantly bombarding our senses in terms of their modality of origin (though see McGurk & MacDonald, 1976, for one of the best-known counter-examples, showing that changes in acoustic speech perception can be induced by changing the nature of the visual display; and Shams, Kamitani, & Shimojo, 2000, for evidence that changes in visual perception/structure can be induced by presenting the appropriate pattern of auditory stimulation). Instead, the majority of researchers have tended to focus their efforts on elucidating the grouping principles underlying our perception of unimodal visual displays (e.g., Corbin, 1942; Hochberg, 1974; Koffka, 1935; Köhler, 1930; Pomerantz, 1981; Rush, 1937; Sekuler & Bennett, 2001; Smith, 1988; Wertheimer, 1923/1938), and to a lesser extent, unimodal auditory displays (e.g., Bregman, 1990; McAdams, 1984; McAdams & Bregman, 1979; Royer & Garner, 1970; Wertheimer, 1923/1938).
Research by Gestalt psychologists on the topic of unimodal perceptual organization has revealed the existence of a number of grouping principles, such as spatial proximity, similarity, good continuation, common fate (or uniform destiny), closure, and Einstellung, that exert a profound influence on people's perception of the organization of visual displays. Similarly, the research of Bregman and his colleagues (summarized in Bregman, 1990) has also highlighted the existence of a number of grouping principles governing the segregation of auditory scenes (see also Carlyon, 2004). Interestingly, the principles of auditory scene analysis that have been identified, such as similarity (in terms of pitch, volume, timbre, or the location of sounds), good continuation, and common fate, map nicely onto those reported previously in unimodal visual studies (see also Julesz & Hirsh, 1972; Kubovy, 1981; Wertheimer, 1923/1938), thus suggesting the existence of a set of general perceptual organizational principles (cf. Aksentijevic, Elliott, & Barbe, 2001; Shepard, 1981, 1987, 1994).
The last few years have seen a very rapid growth of interest in the study of multisensory perception by cognitive neuroscientists (e.g., see the chapters in Calvert et al., 2004). What's more, recent studies of multisensory integration have also highlighted the importance of factors such as spatial coincidence, temporal synchrony, and common fate in mediating the crossmodal binding (or perceptual grouping) of sensory information (e.g., Bertelson, 1999; King & Calvert, 2001; Slutsky & Recanzone, 2001; Soto-Faraco, Kingstone, & Spence, 2004; Spence & Driver, 2004; Thomas, 1941; Welch, 1999; Welch & Warren, 1980; Zampini, Guest, Shore, & Spence, 2005).
Given this apparent correspondence between the organizational principles constraining our interpretation of unimodal visual and auditory scenes (Bregman, 1990; Kubovy, 1981; Kubovy & Van Valkenburg, 2001) or patterns (e.g., Julesz & Hirsh, 1972), and the fact that multisensory integration (or crossmodal perceptual grouping) appears to be constrained by many of the same organizational principles, one might reasonably ask to what extent scene analysis actually reflects a crossmodal phenomenon (see O'Leary & Rhodes, 1984). In other words, to what extent does the human perceptual system utilize information from one sensory modality in order to impose a certain organization on the perceptual array present in another sensory modality?
The research that has been conducted thus far on the perceptual organization of multisensory scenes can be divided into several distinct areas: A number of researchers have focused on the question of whether or not the perceptual organization of stimuli taking place within one sensory modality can influence the perceptual organization of stimuli presented in another sensory modality (e.g., Maass, 1938; O'Leary & Rhodes, 1984; Soto-Faraco, Lyons, Gazzaniga, Spence, & Kingstone, 2002; Soto-Faraco, Kingstone, & Spence, 2003; Zietz & Werner, 1927; see also Shimojo & Shams, 2001). Meanwhile, other researchers have addressed the question of whether perceptual interactions between stimuli presented to different sensory modalities (a phenomenon sometimes referred to as crossmodal perceptual grouping; e.g., Bertelson, 1999; Bertelson & de Gelder, 2004, p. 150) are modulated by the perceptual grouping taking place within a particular sensory modality (e.g., Thomas, 1941; Vroomen & de Gelder, 2000; Watanabe & Shimojo, 2001a). More recently, researchers have also started to investigate the consequences of the local versus global grouping of stimuli in one sensory modality upon crossmodal perceptual organization (Lyons, Sanabria, Vatakis, & Spence, 2006; Sanabria, Soto-Faraco, Chan, & Spence, 2004b, 2005a; Sanabria, Soto-Faraco, & Spence, 2004c).
Although the aforementioned studies have utilized a whole range of different experimental paradigms, they converge insofar as they can all be said to reflect an attempt by the researchers involved to investigate the complex interplay between intramodal and crossmodal perceptual grouping (or scene organization); or, in other words, to investigate the relationship between intra-sensory and inter-sensory Gestalten (Gilbert, 1938, 1941). In the sections that follow, we provide a chronological overview of the evidence supporting the claim that scene perception (or perceptual organization) reflects a crossmodal (or intermodal) phenomenon.
2) Early studies of crossmodal perceptual organization
In one of the earliest studies of crossmodal perceptual organization, Maass (1938) presented participants with a number of visual stimuli positioned so as to facilitate the perception of several different patterns of visual apparent motion. 1) The visual stimuli were either presented in silence or else were presented together with a 2-beat or 3-beat auditory rhythm. Maass reported that the presentation of the auditory stimuli limited the number of visual patterns that people reported seeing. In particular, participants tended to report visual patterns that corresponded to the auditory rhythms, but not those that did not. Maass's results therefore provide one of the first empirical demonstrations that the nature of the perceptual organization taking place in the auditory modality can affect the organization imposed by the perceptual system on a simultaneously-presented visual display (as highlighted by the Christmas lights illusion mentioned earlier; see also Zietz & Werner, 1927, for similar results).
More than 40 years later, O'Leary and Rhodes (1984) reported an important study in which they investigated the influence of perceptual grouping (or organization) within one sensory modality on the perceived grouping of stimuli presented in another modality. The participants in this study were repeatedly presented with a sequence of 6 visual stimuli (dots), three dots in a higher sub-group (presented from three different elevations in the upper part of the visual field), and another three dots in a lower sub-group (presented from three different elevations in the lower part of the visual field; see Figure 1A). Dots from the high and low sub-groups were presented in an alternating sequence from the same horizontal position, 2) although at different elevations (see Figure 2A). At slower rates of stimulus presentation, this sequence of visual stimuli gave rise to the perception of a single dot that appeared to move up and down sequentially between the lower and upper part of the visual display (see Figure 2B), whereas at faster rates of presentation, the higher and lower streams segregated and two dots were perceived as moving up and down concurrently (see Figure 2C), one at the top of the display (moving between the three different elevations of the upper sub-group) and the other at the bottom of the display (moving between the three different elevations of the lower sub-group; see also Bregman & Achim, 1973).
Figure 1 Schematic illustration of the sequence of visual (A) and auditory (B) stimuli used in O'Leary and Rhodes's (1984) study of crossmodal perceptual organization. T1–T6 indicates the temporal sequence (from first to last) in which the 6 events were presented in each modality. The visual and auditory stimuli were either presented individually or together (in the bimodal stimulation condition).
O'Leary and Rhodes (1984) also presented their participants with sequences of tones. In a manner analogous to the visual displays, 2 sub-groups of tones, one of higher frequency and the other of lower frequency, were presented (see Figure 1B). At lower rates of stimulus presentation, these tones were perceived as a single tone alternating in frequency between the upper and lower frequencies, whereas at higher rates of stimulus presentation, two temporally-overlapping, but perceptually distinct (i.e., segregated), auditory streams were perceived (see Bregman & Campbell, 1971; McAdams, 1984; Miller & Heise, 1950; Van Noorden, 1971, 1975).
Figure 2 (A) Sequence of stimuli presented in the upper and lower sub-groups in O'Leary and Rhodes's (1984) experiment. T1–T6 indicates the temporal sequence (from first to last) in which the 6 events were presented in each modality. (B–C) Perceptual correlates associated with different rates of stimulus presentation. At lower rates of stimulus presentation (B), a single dot (or tone) was perceived alternating sequentially between the two sub-groups of stimuli (as shown by the continuous line connecting the dots). At higher rates of stimulus presentation (C), two separate concurrent streams were perceived, one in the upper part of the visual display (or frequency range) and the other in the lower part of the display (or frequency range). At intermediate rates of stimulus presentation, O'Leary and Rhodes observed that participants' subjective reports of whether they perceived one or two streams in a given sensory modality could be influenced by whether they were perceiving one or two streams in the other modality at the same time. These results were taken by O'Leary and Rhodes to show that the nature of the perceptual organization in one sensory modality can influence how the perceptual scene may be organized (or segregated) in another sensory modality.

Initially, the thresholds (in terms of the stimulus onset asynchrony, SOA) for the perception of one versus two streams were determined for each modality individually by varying both the magnitude of the separation between the upper and lower sub-groups (either spatially for the visual stimuli, or in the frequency domain for the auditory stimuli), and the timing between successive stimuli in the sequence. Next, these thresholds were assessed under conditions of bimodal stimulus presentation. Note that in the bimodal conditions, the highest frequency sound was presented in synchrony with the highest visual stimulus, the second highest tone with the second highest visual stimulus, and so on; that is, the stimuli were presented in a synaesthetically congruent manner (see Gallace & Spence, 2007; Marks, 2004; Pratt, 1930). O'Leary and Rhodes reported that the presentation of visual displays that were perceived by participants as consisting of two moving objects (i.e., visual streams where segregation had taken place) caused them to report that the concurrent auditory displays were also perceived as two streams (i.e., as segregated) at presentation rates that otherwise yielded reports of a single perceptual stream when the accompanying visual sequence was perceived as a single stream, and vice versa.
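To make the structure of these alternating displays concrete, the following Python sketch builds an O'Leary-and-Rhodes-style sequence of six interleaved high/low events at a given SOA and applies a toy streaming rule. The particular elevations, frequencies, SOA, and the 120 ms threshold are placeholders chosen purely for illustration; they are not the parameters or the decision rule of the original study.

```python
"""Toy sketch of an O'Leary and Rhodes (1984)-style alternating sequence.

All numeric values below are illustrative placeholders, not the original
study's parameters.
"""

def build_sequence(soa_ms, high_values, low_values):
    """Interleave three 'high' and three 'low' events (T1-T6) at a fixed SOA."""
    events = []
    for i in range(3):
        events.append({"t_ms": (2 * i) * soa_ms, "group": "high", "value": high_values[i]})
        events.append({"t_ms": (2 * i + 1) * soa_ms, "group": "low", "value": low_values[i]})
    return events

def predicted_percept(soa_ms, threshold_ms=120):
    """Toy rule: short SOAs favour segregation into two concurrent streams."""
    return "two segregated streams" if soa_ms < threshold_ms else "one alternating stream"

if __name__ == "__main__":
    visual = build_sequence(soa_ms=90, high_values=[10, 12, 11], low_values=[-10, -12, -11])
    auditory = build_sequence(soa_ms=90, high_values=[800, 900, 850], low_values=[400, 350, 380])
    print(visual[:3])
    print("Predicted visual percept:", predicted_percept(90))
    print("Predicted auditory percept:", predicted_percept(90))
```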
O'Leary and Rhodes's (1984) study represents a seminal piece of empirical research on the nature of crossmodal perceptual organization. However, their findings have been criticized (e.g., by Vroomen & de Gelder, 2000) on the grounds that they remain open to a response bias (i.e., non-perceptual) interpretation. One feature that is common to many ambiguous displays, such as the auditory and visual displays used by O'Leary and Rhodes, is that their perceptual interpretation remains ambiguous over a relatively wide range of temporal intervals (e.g., Van Noorden, 1971). What's more, the perceptual interpretation of such displays can easily be flipped at will (i.e., volitionally) by observers. Hence, in O'Leary and Rhodes's study, it is possible that the perception of one versus two streams in a given sensory modality may simply have biased people to respond in the same way when asked about their perception of the (ambiguous) stimulus organization in the other modality. In other words, one might predict that simply presenting the numbers '1' or '2' to participants (no matter whether they were presented auditorily or visually) may have had exactly the same effect on people's reports concerning their interpretation of the stimulus displays. As such, O'Leary and Rhodes's findings may only tell us about an observer's voluntary control over the interpretation of ambiguous images (i.e., about the contribution of decisional factors to crossmodal scene perception), rather than necessarily revealing anything fundamental about the nature of the perceptual constraints on crossmodal information processing and scene analysis (a similar criticism can also be levelled at Maass's, 1938, study).
3) Assessing the influence of intramodal perceptual grouping on multisensory interactions
This potential criticism was addressed in an elegant series of experiments by Vroomen and de Gelder (2000) in which they investigated the influence of auditory perceptual grouping (or stream segregation) on participants' performance of a visual target identification task. In their experiments, a sequence of four visual displays, each consisting of four dots placed randomly within a 4 by 4 array of possible locations (see Figure 3), was repeatedly presented to participants. Each visual display was briefly flashed (for 97 ms) and then immediately masked until the onset of the next visual display (the mask consisted of all 16 dots in the array being displayed for 97 ms followed by a dark screen being presented for a further 60 ms). On each trial, the participants had to judge the location of the visual target, defined as a diamond-shaped array of the 4 dots in one of the four corners of the display (see the 3rd display illustrated in Figure 3). The whole sequence of displays was presented repeatedly until participants made their 4-alternative spatial discrimination response, and both the speed (in terms of the number of stimulus displays presented prior to response execution) and accuracy of participants' responses were registered.
Figure 3 Schematic illustration of a representative sequence of events in Vroomen and de Gelder's (2000) Experiment 1. A rapid sequence of 4 visual displays was presented continuously, each visual display presented concurrently with an auditory event (illustrated here by the 'musical' notation). The participant's task was to discriminate the location (4-choice spatial discrimination) of the diamond-shaped visual target (presented in the 3rd frame of this illustration). Participants were able to discriminate the location of the visual target significantly more accurately (and rapidly) when the unique auditory high tone coincided with the presentation of the visual target than in trials where only low tones were presented. (Note that the small dots in the displays were not actually seen by the participants but are just shown here to illustrate the possible positions from which the four visual stimuli could be presented in each display; note also that a visual mask was presented between each display, see text for details.)
The manipulation of interest related to the tones that were
presented in synchrony with each of the visual displays. In half of
the trials in Vroomen and de G elder’s (2000) rst experiment, the
same low tone (1000 H z) was presented in synchrony with each of
the 4 visual displays (the LLLL condition), whereas in the
remainder of the trials, a high tone (1259 H z) was presented
synchronously with the target display (the LLH L condition ; note
that the visual target was always presented as the 3rd display in
each sequence of 4 displays). The 4 sounds were presented a
number of times (between 4 and 8) prior to the onset of the visual
displays on each trial in order to facilitate the segregation of the
auditory stimuli into separate low and high frequency streams (see
Bregman, 1990) on those trials where both low and high tones
were presented.
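As a concrete illustration of this auditory manipulation, the following Python sketch generates LLLL and LLHL tone trains. The 97 ms display duration and the 1000/1259 Hz frequencies are taken from the description above; the tone duration, inter-tone gap, and sampling rate are assumptions made here purely for illustration, not Vroomen and de Gelder's actual stimulus parameters.

```python
"""Minimal sketch of LLLL versus LLHL tone sequences (after Vroomen & de Gelder, 2000).

The frequencies and 97 ms display duration come from the text; the sampling
rate and gap between tones are assumptions for illustration only.
"""
import numpy as np

SR = 44100       # sampling rate (Hz), assumed
TONE_MS = 97     # one tone per 97 ms visual display
GAP_MS = 157     # assumed silent gap: mask (97 ms) + dark screen (60 ms)

def tone(freq_hz, dur_ms):
    t = np.arange(int(SR * dur_ms / 1000)) / SR
    return np.sin(2 * np.pi * freq_hz * t)

def sequence(freqs):
    """Concatenate one tone per visual display, separated by silent gaps."""
    gap = np.zeros(int(SR * GAP_MS / 1000))
    parts = []
    for f in freqs:
        parts.append(tone(f, TONE_MS))
        parts.append(gap)
    return np.concatenate(parts)

llll = sequence([1000, 1000, 1000, 1000])  # control: no segregating high tone
llhl = sequence([1000, 1000, 1259, 1000])  # high tone synchronous with the 3rd (target) display

print(llll.shape, llhl.shape)
```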
Vroomen and de Gelder's (2000) results showed that participants were able to report the location of the visual target more accurately on trials where it was presented in time with the high tone than on trials where only low tones were presented (mean response accuracy of 66% vs. 55%, respectively). Their results also showed that participants responded more rapidly in LLHL trials (i.e., after fewer presentations of the target stimulus; mean of 2.86 target presentations) than in LLLL trials (M = 3.32 target presentations). This crossmodal auditory facilitation of visual spatial discrimination performance occurred despite the fact that the auditory stimuli were entirely irrelevant to the participants' task and despite the fact that the auditory stimuli provided absolutely no information with regard to the spatial location of the to-be-discriminated visual target (cf. Driver & Spence, 2000). The participants in Vroomen and de Gelder's study also reported subjectively that the visual display coinciding with the high tone appeared to segregate from the other visual displays in the sequence (and was also perceived to have a somewhat longer duration; that is, it appeared to 'freeze'; though see Staal & Donderi, 1983). This auditorily-induced performance enhancement was eliminated, however, if the high tone was presented in synchrony with the visual display directly preceding the target display (i.e., synchronous with the 2nd display in the sequence; Vroomen & de Gelder, 2000, Experiment 2). This latter result shows that the beneficial effect of the presentation of the high tone on performance could not simply be attributed to it acting as some kind of non-specific warning signal (or temporal marker) indicating to participants when in the sequence of stimuli they should expect the target to occur (cf. Correa, Sanabria, Spence, Tudela, & Lupiáñez, 2005; Posner, 1978).
In their final two experiments, Vroomen and de Gelder (2000) demonstrated that the beneficial effect of presenting a higher frequency tone in time with the target visual display could be attenuated (or even eliminated) simply by reducing the likelihood that the high tone would segregate from the other tones in the auditory stream. They lowered the probability of effective stream segregation by using a Low, Medium, High, Low tone sequence (LMHL; with the medium tone falling 2 semitones between the high and low tones) to reduce the frequency separation between the high tone and the other tones in the sequence. Crucially, the participants responded significantly more accurately in the LLHL condition than in the LMHL condition, no matter whether or not they were informed that the LMHL tone sequence actually corresponded to the beginning of the well-known French tune 'Frère Jacques'. Taken together, these results therefore suggest that it may simply have been the reduction in the frequency separation between the high tone and the other tones in the sequence, rather than necessarily the participants experiencing the LMHL tone sequence as a familiar tune (or melody), that elicited this effect (cf. Bregman & Dannenbring, 1973; Heise & Miller, 1951).
Vroomen and de Gelder's (2000) results therefore provide a convincing demonstration that the nature of the perceptual grouping (or stream segregation) taking place within one sensory modality (audition in this case) can affect the extent of any crossmodal interactions observed between simultaneously-presented auditory and visual stimuli. What's more, and in contrast to the results of the earlier studies (e.g., Maass, 1938; O'Leary & Rhodes, 1984; Zietz & Werner, 1927), Vroomen and de Gelder's findings cannot easily be accounted for in terms of any kind of response bias, since the segmentation/grouping of the auditory stream was orthogonal to the 4-choice visual spatial discrimination response that participants had to make (cf. Driver & Spence, 2000; Spence & Driver, 1997). Hence, the influence of auditory perceptual grouping (or stream segregation) on crossmodal audiovisual interactions can be confidently attributed to a genuine perceptual effect instead.
The year after Vroomen and de Gelder's study, Watanabe and Shimojo (2001a) provided another elegant demonstration that the nature of the intramodal perceptual grouping taking place within the auditory modality can influence the nature of audiovisual crossmodal interactions. They used a variation of the bouncing ball illusion (Metzger, 1934; Michotte, 1946/1963), in which two identical objects (typically opaque disks) are shown moving directly toward each other along a straight trajectory, coinciding, and moving away from each other, on a two-dimensional visual display. The display is ambiguous, being equally consistent with two possible perceptual interpretations: That is, when the balls coincide, they can either be seen to pass (or stream) through each other (the most common percept under normal conditions of unimodal stimulation), or else they can appear to bounce off one another (e.g., Bertenthal, Benton, & Bradbury, 1993; Ramachandran & Anstis, 1983; Sekuler & Sekuler, 1999; Watanabe & Shimojo, 1998). In 1997, Sekuler, Sekuler, and Lau showed that the presentation of a brief sound at the moment when the two objects coincide significantly increases the likelihood that participants will report the two disks as having bounced off of each other (rather than as having streamed through each other; see also Bushara, Hanakawa, Immisch, Toma, Kansaku, & Hallett, 2002; Ecker & Heller, 2005; Sanabria, Lupiáñez, & Spence, in press, 2004; Scheier, Lewkowicz, & Shimojo, 2003; Shimojo & Shams, 2001). 3)
Watanabe and Shimojo (2001a) demonstrated that this auditory modulation of the bouncing/streaming percept could be reduced if the sound presented in time with the two objects meeting on the screen (the 'simultaneous auditory event') was embedded within a stream of identical auditory 'flankers' (see Figure 4). That the effect of the flankers should be attributed to auditory grouping, rather than to some low-level effect of the presentation of the flankers on people's perception of the simultaneous auditory event (in terms of altering its perceived loudness, duration, or time of occurrence), was confirmed subsequently in a series of control studies (Watanabe & Shimojo, 2001a; Experiments 4-6). Watanabe and Shimojo's results therefore show that the grouping of the simultaneous auditory event into a stream with the auditory flankers modulated (i.e., attenuated) any crossmodal influence of the sound on the resolution of ambiguous visual motion. 4)
Figure 4 Schematic illustration of the sequence of events in Watanabe and Shimojo's (2001a; Experiment 2) study of the effect of auditory stimuli on visual motion processing. At the start of each trial, two black disks were seen moving toward each other on a two-dimensional visual display (the direction of motion is indicated by the arrows shown in the lower display). Participants had to report whether they perceived the disks as bouncing off one another or else as streaming through each other (the dominant percept under unimodal visual stimulation conditions). Previous research has shown that the presentation of an auditory event simultaneous with the coincidence of the two disks (the black musical note shown next to the middle display in the figure) increases the proportion of 'bouncing' responses (e.g., Sekuler et al., 1997). Watanabe and Shimojo demonstrated that the modulatory effect of the presentation of the simultaneous auditory event on bounce responses was significantly attenuated if a sequence of same-frequency auditory flankers (the white musical notes in the figure) were presented before and after it (presumably because the simultaneous auditory event now grouped with the auditory flankers rather than with the visual coincidence event). The proportion of bounce responses remained high, however, if the frequency of the simultaneous auditory event was made sufficiently different from that of the flankers.

Watanabe and Shimojo (2001a) also demonstrated that this auditory modulation of ambiguous visual motion perception could be revived if the auditory flankers were modified such that they no longer grouped with the simultaneous auditory event: In particular, an increase in the proportion of 'bounce' responses was once again observed if the frequency of the simultaneous auditory event was made sufficiently different from that of the flankers (i.e., when the simultaneous auditory event was presented at 900 or 2,700 Hz, while the flankers were presented at 1,800 Hz; conditions under which the simultaneous auditory event should presumably have segregated from the uniform frequency auditory flankers). Therefore, it appears that just as in Vroomen and de Gelder's (2000) study, frequency separation can be used as an effective cue to weaken intramodal auditory grouping (and so facilitate stream segregation), and thereby potentially to facilitate crossmodal grouping (or binding). The proportion of bounce responses also remained high if the simultaneous auditory event was made louder than the flankers (60 dB), but not if it was made quieter (56 dB, as compared to the flankers, which were always presented at 58 dB; cf. Gilbert, 1941). Watanabe and Shimojo's results therefore converge with those of Vroomen and de Gelder (2000) in showing that the extent of any crossmodal interaction between simultaneously-presented auditory and visual stimuli can be modulated by the intramodal perceptual grouping taking place within the auditory modality. However, they also highlight the importance of stimulus saliency in modulating crossmodal interactions over-and-above the effect of any unimodal grouping that may be taking place.
4) The crossmodal perceptual organization of apparent motion stimuli
Both Vroomen and de Gelder (2000) and Watanabe and Shimojo (2001a) focused on the effects of the intramodal grouping taking place within the auditory modality on crossmodal audiovisual interactions that were assessed by means of participants' performance on a visual task. One might reasonably ask therefore whether similar effects would also occur in the reverse direction: That is, would the nature of any intramodal grouping taking place within the visual modality also influence crossmodal interactions as indexed by auditory discrimination performance? Salvador Soto-Faraco and his colleagues have reported a number of studies addressing precisely this question using a variation of the crossmodal dynamic capture task (Soto-Faraco et al., 2002). 5)
In 2002, Soto-Faraco et al. reported a series of experiments in which they showed that the presentation of a visual apparent motion stream consisting of the sequential presentation of two light flashes, one from either side of fixation, could influence the direction in which an auditory apparent motion stream consisting of two sequentially-presented tones, one presented from either side of fixation, appeared to move (see Figure 5; see also Staal & Donderi, 1983). In a typical crossmodal dynamic capture study, participants are asked to judge the direction of the auditory apparent motion stream (either from left-to-right or vice versa) while trying to ignore an irrelevant visual apparent motion stream moving in either the same (i.e., congruent) or opposite (i.e., incongruent or conflicting) direction. The principal result to have emerged from many such studies conducted over the last 5 years is that participants perform significantly less accurately on incongruent trials than on congruent trials (see Figure 6), at least when the target and distractor streams are presented simultaneously.

Figure 5 Schematic illustration of the typical experimental set-up used in Soto-Faraco et al.'s (2002) studies of the audiovisual crossmodal dynamic capture effect. On each trial, a sound was sequentially presented from each of two loudspeakers, and each of two LEDs was also illuminated sequentially. The order of presentation (i.e., left or right first) of the stimuli in each modality was entirely unpredictable. The participants had to discriminate whether the sound (target) appeared to move from left-to-right (A) or vice versa (B), while trying to ignore the apparent movement of the visual distractors, which could be either incongruent or congruent with that of the auditory stimuli. T1–T2 indicate the temporal sequence of events in the trial.

Figure 6 Typical pattern of results from a crossmodal dynamic capture experiment showing that discrimination of the direction of an auditory apparent motion stream can be substantially impaired (i.e., response accuracy is significantly lowered) by the simultaneous presentation of a distracting visual stream moving in the opposite direction. Note that no such performance decrement was reported if the auditory and visual apparent motion streams were presented asynchronously (separated by 500 ms in this example). The magnitude of the crossmodal dynamic capture effect is measured in terms of the difference in performance between incongruent and congruent trials (i.e., the difference between the black and grey bars in the figure).
Soto-Faraco and his colleagues have argued that this crossmodal dynamic capture effect (defined as the magnitude of the difference in performance between incongruent and congruent trials) reflects the mandatory integration of visual and auditory apparent motion signals. Interestingly, however, while performance on congruent crossmodal trials typically tends to hover around ceiling (i.e., 100% correct) in the majority of crossmodal dynamic capture studies (as it does on unimodal auditory direction-of-motion discrimination trials), performance on conflicting trials tends to fall in the range of 40-60% correct. This suggests a partial, rather than a complete, capture of the direction of auditory apparent motion by the distracting visual apparent motion stream. It is important to note, though, that the effect of the presentation of the visual stimuli cannot just be attributed to an attentional distraction effect (i.e., to participants simply being unsure of the direction of the sound when visual stimuli are presented on incongruent trials), since the crossmodal dynamic capture effect is just as prevalent on trials where participants are confident of their response (about which direction the sound moved in) as on trials where they are more uncertain of their response (see Soto-Faraco et al., 2004a, Experiment 3).
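The capture effect itself is simply a difference score between congruency conditions. The following minimal Python sketch makes that definition explicit by computing it from trial-level data; the trial numbers used here are fabricated for illustration and are not taken from any of the studies discussed in this chapter.

```python
"""Illustrative computation of a crossmodal dynamic capture effect.

The effect is the difference in accuracy between congruent and incongruent
trials; the example data below are invented for illustration only.
"""

def capture_effect(trials):
    """trials: list of dicts with boolean 'congruent' and 'correct' fields."""
    def accuracy(subset):
        return sum(t["correct"] for t in subset) / len(subset)
    congruent = [t for t in trials if t["congruent"]]
    incongruent = [t for t in trials if not t["congruent"]]
    return accuracy(congruent) - accuracy(incongruent)

# Hypothetical data: near-ceiling congruent accuracy, ~50% incongruent accuracy.
example = ([{"congruent": True, "correct": True}] * 19 +
           [{"congruent": True, "correct": False}] +
           [{"congruent": False, "correct": True}] * 10 +
           [{"congruent": False, "correct": False}] * 10)

print(f"Capture effect: {capture_effect(example):.0%}")
```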
It has been shown that crossmodal dynamic capture effects are somewhat larger when the stimuli in the different sensory modalities are presented from the same (rather than from a different) set of spatial locations (Soto-Faraco et al., 2002, Experiments 1 and 2; see also Meyer, Wuerger, Röhrbein, & Zetzsche, 2005), and when the auditory and visual streams are presented at approximately the same time, rather than asynchronously (e.g., see Soto-Faraco et al., 2002, Experiment 3; see also Gilbert, 1939). Crossmodal dynamic capture has been shown to influence the perception of the direction of both continuous and apparent motion stimuli (e.g., Soto-Faraco, Spence, & Kingstone, 2004a, Experiment 6), and to occur bidirectionally between auditory and tactile stimuli (Sanabria, Soto-Faraco, & Spence, 2005b; Soto-Faraco, Spence, & Kingstone, 2004b). Visual distractors can also influence the perception of the direction of tactile apparent motion (Craig, 2006; Lyons et al., 2006; Soto-Faraco & Kingstone, 2004), though little effect of auditory or tactile distractors has typically been reported on judgments of the direction of visual apparent motion, even when the strength (or quality) of apparent motion has been matched across the auditory and visual modalities (see Soto-Faraco et al., 2004a, Experiments 1 and 2; Soto-Faraco & Kingstone, 2004; though see also Gilbert, 1939; Soto-Faraco et al., 2003). 6) A response bias interpretation of crossmodal dynamic capture has been ruled out in a number of recent studies (Sanabria, Spence, & Soto-Faraco, 2007; Soto-Faraco, Kingstone, & Spence, 2006; Soto-Faraco, Spence, & Kingstone, 2005; Vroomen & de Gelder, 2003), thus confirming the genuinely perceptual nature of at least some component of this crossmodal effect.
5) Assessing the role of intramodal grouping on the crossmodal dynamic capture effect
The majority of studies of the crossmodal dynamic capture effect have involved the presentation of the same number of stimuli in both the target (to-be-reported) and distractor (to-be-ignored) modalities (typically 2; though see Sanabria et al., 2007, for evidence that crossmodal dynamic capture effects can also be observed when 4 stimuli are presented in both modalities). Sanabria and his colleagues (Sanabria, Soto-Faraco, Chan, & Spence, 2005a; Experiment 1) recently investigated whether increasing the number of visual stimuli presented in the distractor visual apparent motion stream (from 2 to 6) would affect the magnitude of the crossmodal congruency effect when participants were required to judge the direction in which a sequence of two auditory stimuli moved (either from left-to-right or vice versa; see Figure 7).
Figure 7 Schematic illustration of the stimulus displays used in Sanabria et al.'s (2005a) study of intramodal versus crossmodal perceptual grouping. Note that the bimodal displays in the 2-lights condition (A) are identical to the middle 2 bimodal displays (T3–T4) in the 6-lights condition (B). The auditory and visual apparent motion stimuli moved in opposite directions on incongruent trials, and in the same direction on congruent trials. T1–T6 indicate the temporal sequence of events in the trial.

Given that increasing the number of stimuli increases the strength of visual apparent motion (e.g., Kolers, 1972; Sanabria et al., 2005a, Footnote 3), one might have predicted that this manipulation should actually have made the visual distractor apparent motion harder to ignore, and hence have resulted in a larger crossmodal dynamic capture effect (cf. Spence & Walton, 2005). In contrast, Sanabria et al. (2005a) argued that increasing the number of stimuli in the visual display might actually reduce the crossmodal capture effect because the increase in the strength of intramodal visual perceptual grouping (and in the number of visual stimuli relative to auditory stimuli) should make it more likely that the auditory and visual streams would segregate from one another and so be treated as separate perceptual events. Any such segregation of the stimuli presented in the two modalities would be expected to reduce multisensory integration (crossmodal grouping) and hence reduce the influence of the visual stream on participants' auditory direction-of-motion discrimination responses (cf. Vatakis, Bayliss, Zampini, & Spence, in press). Consistent with Sanabria et al.'s (2005a) account, significantly less crossmodal dynamic capture was observed in a condition where the visual distractor stream contained 6 lights (mean capture effect of 21%) than in the traditional 2-lights displays (mean capture effect of 34%). These results therefore demonstrate that the nature of the intramodal perceptual grouping taking place within the visual modality can also influence audiovisual crossmodal interactions as indexed by performance on an auditory direction-of-motion discrimination task.
In Sanabria et al.'s (2005a; Experiment 1) study, as in a number of the other studies described thus far (e.g., Vroomen & de Gelder, 2000; Watanabe & Shimojo, 2001a), the conditions for intramodal perceptual grouping were set up prior to the presentation of any bimodal stimulus displays. Thus, perceptual grouping in the distractor modality could build up in advance of the opportunity for crossmodal grouping. The question therefore arises as to whether intramodal perceptual grouping would still modulate crossmodal perceptual grouping to the same extent if the conditions for crossmodal perceptual grouping were actually initiated prior to (or at the same time as) those promoting intramodal perceptual grouping in the stimulus displays (cf. Watanabe, 2004, for a similar discussion regarding the relative timing of intramodal grouping by motion and the relative localization of stimuli within vision). Sanabria and his colleagues (Sanabria et al., 2004b; Experiment 1) addressed this issue in an experiment where they compared the crossmodal dynamic capture effects seen under conditions where intramodal visual perceptual grouping was promoted prior to the presentation of the crossmodal displays (see Figure 8A) with other conditions where the first displays that were presented to the participants were bimodal (and so presumably should have promoted crossmodal grouping instead; see Figure 8B). Significantly smaller crossmodal dynamic capture effects were observed when the conditions for intramodal visual grouping were set up prior to the presentation of the bimodal displays (see Figure 8A; mean crossmodal congruency effect of 38%) than when the first stimulus displays were bimodal (see Figure 8B; mean crossmodal congruency effect of 59%), and thus presumably promoted crossmodal (over intramodal) perceptual grouping.
Figure 8 Schematic illustration of two of the displays (both incongruent) presented in Sanabria et al.'s (2004b) study of the interaction between intramodal and crossmodal perceptual grouping. Note that the conditions for unimodal visual perceptual grouping were either initiated prior to the presentation of the audiovisual event (see Figure 8A), or else after the appearance of the first bimodal display (see Figure 8B). Note that the actual number of stimuli presented in the two modalities was kept constant across both conditions. T1–T6 indicate the temporal sequence of events in the trial.

Sanabria et al. (2004b, Experiment 2) conducted an important follow-up experiment to confirm that it was not simply any temporal warning (or alerting) effect provided by the initial presentation of the unimodal visual stimuli (see Posner, 1978; Spence & Driver, 1997; cf. Vroomen & de Gelder, 2000) that may have led to the improved crossmodal stream segregation (i.e., to the reduced crossmodal congruency effects) reported in the intramodal grouping condition (see Figure 8A). The results of this control experiment showed that the presentation of two lights prior to the bimodal displays did not have any effect on the magnitude of the crossmodal dynamic capture effect if they were presented centrally and in a different color to the visual stimuli leading to the impression of apparent motion. Crucially, while the central lights provided just as much temporal information as the peripheral lights in the previous experiment, the difference in their color and location meant that they were no longer grouped with the subsequently-presented lights giving rise to the impression of visual apparent motion. Taken together, therefore, the results of these two experiments demonstrate that intramodal perceptual grouping has a more pronounced effect on crossmodal perceptual organization when the conditions promoting intramodal grouping are given temporal precedence over the bimodal displays that can be grouped crossmodally.
6) Local versus global perceptual organization
Sanabria and his colleagues (Sanabria et al., 2004c) have been able to show that changes in the local versus global grouping of visual apparent motion displays presented in time with the bimodal displays can also modulate crossmodal perceptual organization (at least as far as these changes can be indexed by changes in the crossmodal dynamic capture effect). Previous studies of unimodal visual scene perception have shown that perceptual analysis tends to be governed by global field effects rather than by the local analysis of the individual parts of an image (e.g., Alais & Lorenceau, 2002; He & Ooi, 1999; Kramer & Yantis, 1997; Ramachandran & Anstis, 1983). Sanabria et al. (2004c) asked a similar question with regard to the perceptual organization of multisensory scenes: In particular, they investigated whether crossmodal perceptual organization is dependent on the local analysis of the parts of the display (cf. Ullman, 1979), or whether instead it depends on the global field-like aspects of the display. In their experiment, Sanabria et al. varied the perceived direction (horizontal vs. vertical) of visual apparent motion (see Figure 9). They demonstrated that when the two visual stimuli giving rise to horizontal local motion were embedded within an array of four lights, giving rise to the global perception of vertical apparent motion (see the 4-lights conditions shown in Figure 9B), the influence of any local visual motion information was reduced significantly (mean crossmodal congruency effect of 23%) as compared to the more typical 2-lights conditions (see Figure 9A), where a mean crossmodal dynamic capture effect of 36% was observed.

Figure 9 Schematic illustration of four of the different trial types presented in Sanabria et al.'s (2004c) study of the relative influence of local versus global visual apparent motion on the crossmodal dynamic capture effect. The four conditions shown here result from the crossing of the factors of congruency (incongruent versus congruent trials) and the number of visual stimuli (2 versus 4 lights). Note that there were also 4 more trial types (not shown) in which the visual stimuli were presented in the opposite order to that shown here (i.e., the lights appeared to move from the bottom to the top of the display in the 4-lights condition). The global direction of apparent motion in the displays is indicated by the horizontal and vertical arrows in the figure. The magnitude of the crossmodal dynamic capture effect was significantly greater in the 2-lights displays than in the 4-lights displays. Sanabria et al. interpreted this difference in terms of the additional two lights presented in the 4-lights displays giving rise to global visual motion in the vertical dimension.
In a subsequent study, Sanabria et al. (2005a; Experiment 2) were able to demonstrate a similar dominance of global field effects over local visual apparent motion when the two were pitted directly against each other (see Figure 10). In this experiment, the addition of the two extra light flashes arranged horizontally induced the impression of a group of two lights moving in one direction, while the central two lights of the compound appeared to move, if inspected in isolation, in the opposite direction. In fact, the global percept actually served to reverse the perceived direction of apparent motion of the 2 central lights: If the local motion of the two central lights was from left-to-right, then the global motion of the 4-lights display was from right-to-left instead (see Figure 10B). Once again, the crossmodal dynamic capture effect was significantly larger in the 2-lights condition than in the 4-lights condition (mean crossmodal congruency effect of 35% versus 15%, respectively). 7) Moreover, the results showed that the presentation of the additional 2 lights at either side of the display reversed the direction of visual apparent motion (as defined at the local level by the central 2 lights of the entire display), and that it was the global direction of apparent motion of the visual display that determined the perceptual organization taking place within the auditory modality. (Note here, once again, that as there were 4 lights and only 2 sounds, it should have been easier to segregate the visual stimuli from the two sounds than in the 2-lights displays, where the same number of stimuli were presented in each modality.)

Figure 10 Schematic illustration of four of the different trial types presented in Sanabria et al.'s (2005a; Experiment 2) study of the modulatory effect of visual perceptual grouping on the crossmodal dynamic capture effect. The four conditions shown here result from the crossing of the factors of congruency (incongruent versus congruent trials; note that congruency is defined with respect to the global direction of apparent motion in the display, as indicated by the horizontal arrows in the figure) and the number of visual stimuli (2 versus 4 lights). The magnitude of the crossmodal dynamic capture effect was significantly greater in the 2-lights displays than in the 4-lights displays. Given that congruency in this study was defined with respect to the global motion of the visual display, these results show that the crossmodal interactions are being driven by the global visual apparent motion rather than by the local motion of the middle two lights (which was actually in the reverse direction).
The results of Sanabria et al.'s (2004b, c; 2005a) recent studies therefore support the view that crossmodal perceptual organization involves a complex interplay between temporally-overlapping intramodal and crossmodal grouping processes. This conclusion might at first glance appear to stand in contrast to the claim made by a number of researchers regarding the putatively 'early' nature of intramodal perceptual grouping (e.g., Francis & Grossberg, 1996). According to such a view, one might predict that intramodal perceptual grouping should normally take precedence over crossmodal grouping. However, the fact that under the appropriate conditions both unimodal and crossmodal grouping compete to determine crossmodal perceptual organization is consistent with more recent evidence showing that certain forms of intramodal perceptual grouping (at least within the visual modality) appear to occur much later in information processing than had been traditionally thought (e.g., see Beck & Palmer, 2002; Palmer, 2002; Palmer, Brooks, & Nelson, 2003; see also Wertheimer's, 1923/1938, pp. 79-80, description of the higher-level visual grouping principle known as Einstellung, or objective set, here). What's more, recent neuroscience research has revealed that crossmodal interactions can actually take place very early in information processing (i.e., within 40 ms of stimulus onset; e.g., see Fu, Johnston, Shah, Arnold, Smiley, Hackett, Garraghty, & Schroeder, 2003; Giard & Peronnet, 1999; Murray, Molholm, Michel, Heslenfeld, Ritter, Javitt, Schroeder, & Foxe, 2004). Furthermore, direct anatomical connections have now also been shown to exist between what were formerly considered to be strictly 'unimodal' cortical processing areas, thus providing a potential neural substrate for very early crossmodal interactions (e.g., Falchier, Clavagnier, Barone, & Kennedy, 2003; Rockland & Ojima, 2003). When this information is taken together with the fact that the relative strength of the various intramodal grouping principles also varies (e.g., Kubovy, 1981), one might expect that the extent to which unimodal perceptual grouping will dominate over crossmodal perceptual grouping in any given situation may well depend (at least to some extent) on the nature of the perceptual grouping processes that are being deployed intramodally versus crossmodally. 8)
7) Crossmodal influences on temporal organization
One other area of research that is very relevant to the topic of crossmodal perceptual organization relates to the temporal structure of stimuli and stimulus sequences. Over the years, a number of studies have shown that the perceived rate of stimulation in one sensory modality can be modulated by the rate of stimulus presentation in another modality (e.g., Gebhard & Mowbray, 1959; Grignolo, Boles-Carenini, & Cerri, 1954; Knox, 1945a, b; Kravkov, 1939; London, 1954; Maier, Bevan, & Behar, 1961; Regan & Spekreijse, 1977; Shipley, 1964; Von Schiller, 1932a, b). In particular, research has shown that the rate at which a rapidly alternating visual stimulus appears to be flickering can be modulated quite dramatically by changes in the rate at which a simultaneously-presented auditory stimulus is made to flutter. For example, participants in Shipley's (1964) classic study had to judge the rate at which a sound appeared to flutter or, at other times, to judge the rate at which a light source appeared to flicker. Shipley reported that changing the physical rate of flutter of a clicking sound induced a systematic change in the apparent rate at which a flashing light was simultaneously seen to flicker. Indeed, for one of Shipley's observers, a visual stimulus that was actually presented at a flicker rate of 10 cycles per second was reported at different times to be flickering at anything between 7 and 22 cycles per second depending on the rate of flutter of the simultaneously-presented auditory stimulus. One generalization that has emerged from these studies has been that auditory flutter typically has a much greater influence on judgments of perceived visual flicker than vice versa.
However, it is perhaps worth pausing at this point to take note of the fact that many of these studies presented stimuli at repetition rates close to the flicker- or flutter-fusion threshold. The flutter-fusion threshold (sometimes known as the critical flutter frequency) is defined as the frequency at which a clicking sound appears steady, while the critical flicker frequency (or flicker-fusion threshold) is defined as the frequency at which a flashing light appears steady (i.e., it appears indistinguishable from a continuously illuminated light). As such, it could be argued that the results of many of these earlier studies, while interesting, may tell us more about crossmodal interactions in the perception of a specific stimulus attribute, rather than necessarily anything about crossmodal influences in perceptual organization per se (cf. Kubovy, 1981, p. 83, for his distinction between micro time and event time). It is therefore important to note that very similar results have also now been reported in other studies when the rate of stimulus presentation was much lower (e.g., Guttman, Gilroy, & Blake, 2005; Kitagawa & Wada, 2004; Recanzone, 2003; Wada, Kitagawa, & Noguchi, 2003; Welch et al., 1986). Under conditions where stimuli are presented at rates that are slow enough for participants to individuate the elements in the sequence, it would appear that the results become more directly relevant to issues of crossmodal perceptual organization.
Participants in one such representative study by Wada et al. (2003) had to judge the rate of change of a stream of briefly-presented visual stimuli (the rate of presentation of stimuli in the visual flash train could either increase or decrease, with 11 brief stimuli being presented within a time window of 2050 ms) while attempting to ignore a train of distractor tones whose rate of presentation was either increasing or decreasing. Wada et al. showed that the rate of change of temporal structure in the distractor modality (audition in this case) influenced participants' judgments of the rate of change of the visual stimuli, at least when the rate of change of stimulus presentation in the target modality was ambiguous. A similar pattern of results was also reported in the reverse direction; that is, judgments of the change in the rate of presentation of a stream of auditory stimuli were also influenced by the presence of an irrelevant visual distractor sequence. At first glance, results such as these would appear to support the claim that in the temporal domain, just as for the other areas that have been outlined earlier, the perceptual organization of stimuli in one sensory modality can influence the perceptual organization of the stimuli in another. However, as the authors themselves recognize, it is unclear to what extent these results should be taken as reflecting a genuine crossmodal influence on the temporal aspects of perceptual organization versus simply reflecting a response bias induced by the presence of the distractor stimuli on trials where participants were uncertain of the rate of change of stimulus presentation in the target modality (see also Noguchi & Wada, 2004).
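For concreteness, the following Python sketch generates onset times for an accelerating or decelerating train of 11 stimuli within a 2050 ms window, the values mentioned in the description above. The geometric spacing of the inter-onset intervals is an assumption introduced here for illustration; Wada et al. (2003) may well have used a different schedule.

```python
"""Sketch of accelerating vs. decelerating stimulus trains (after Wada et al., 2003).

The 11-stimulus, 2050 ms figures come from the text; the geometric spacing
of the intervals is an assumption made here for illustration.
"""
import numpy as np

def onset_times(n_stimuli=11, window_ms=2050, ratio=1.15, increasing_rate=True):
    """Return onset times whose inter-onset intervals shrink (rate increases)
    or grow (rate decreases) by a constant ratio, scaled to span the window."""
    n_gaps = n_stimuli - 1
    gaps = np.array([ratio ** i for i in range(n_gaps)], dtype=float)
    if increasing_rate:
        gaps = gaps[::-1]            # successively shorter intervals
    gaps *= window_ms / gaps.sum()   # scale so the train fills the 2050 ms window
    return np.concatenate(([0.0], np.cumsum(gaps)))

print(np.round(onset_times(increasing_rate=True)))   # accelerating train
print(np.round(onset_times(increasing_rate=False)))  # decelerating train
```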
More convincing evidence for the existence of genuinely perceptual crossmodal influences on the temporal aspects of perceptual organization comes from recent research on the two-flash illusion (e.g., Andersen, Tiippana, & Sams, 2005; Shams et al., 2000; Shams, Kamitani, & Shimojo, 2002). In a prototypical study, participants are presented with a rapid train of 1-4 flashes in the periphery, and simply have to report the number of flashes that they see. At the same time, a distractor stream of 1-4 auditory stimuli may be presented. The surprising finding to have emerged from a number of such studies is that, on one-flash trials, participants report having seen two lights whenever two or more beeps are presented auditorily. What's more, the illusion is asymmetrical, in that the auditorily induced fission of a unitary visual event has been shown to occur far less readily when people have to report the number of beeps while being presented with a sequence of flashes (see also Shipley, 1964; though see Andersen, Tiippana, & Sams, 2004). Other researchers have also shown that this fission illusion, initially reported with sounds and lights, also occurs between touch and both audition and vision (e.g., Bresciani, Ernst, Drewing, Bouyer, Maury, & Kheddar, 2005; Holmes, Sanabria, Calvert, & Spence, 2006; Hotting & Roder, 2004; Violentyev, Shimojo, & Shams, 2005). The genuinely perceptual nature of the two-flash illusion has also been demonstrated in subsequent research using signal detection theory (e.g., Violentyev et al., 2005; see also Berger, Martelli, & Pelli, 2003). The extant research on the temporal aspects of perceptual organization therefore converges with the findings reported in the earlier sections of this review in showing that the perceptual organization of the stimuli presented in one sensory modality can have an influence on the perceptual organization of the temporal structure of stimuli presented in another sensory modality. What's more, this research also appears to highlight the importance of stimulus structure in constraining the nature of such crossmodal interactions. In particular, it would appear that the modality carrying the signal that is more discontinuous (and hence possibly more salient) becomes the influential, or modulating, modality (Shimojo & Shams, 2001; cf. Shipley, 1964).
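To make the signal detection logic just mentioned concrete, the short sketch below (our own illustration, not taken from any of the cited studies, and using invented trial counts) shows how sensitivity (d') and response criterion (c) are computed separately. The key point in studies such as Violentyev et al. (2005) is that a sound-induced change in d' for discriminating one flash from two indicates a genuinely perceptual effect, whereas a change confined to the criterion would indicate a response bias of the kind discussed above in connection with Wada et al. (2003).

```python
# Illustrative sketch only: separating sensitivity (d') from response bias (c)
# in a two-flash discrimination task. All counts below are invented.
from statistics import NormalDist

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from raw counts, adding 0.5 to each cell so that
    hit or false-alarm rates of exactly 0 or 1 do not break the z-transform."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate), -0.5 * (z(hit_rate) + z(fa_rate))

# "Two flashes" responses on genuine two-flash trials (hits) versus on
# one-flash trials accompanied by two beeps (false alarms).
print(dprime_and_criterion(hits=70, misses=30,
                           false_alarms=40, correct_rejections=60))
```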
8)
Selective attention and crossmodal scene perception
Before concluding, it is important to consider briefly what role, if any, selective attention might play in constraining crossmodal perceptual organization (cf. Kahneman, 1973, chapter 5; Kahneman & Henik, 1981; see also Knox, 1945b; Pomerantz, 1981). Several studies have reported that focused attention, no matter whether directed to a particular sensory modality or to a particular spatial location, can influence the perceptual organization taking place within a given sensory modality. For example, Carlyon and colleagues (Carlyon, Plack, & Cusack, 2001; Carlyon, Plack, Fantini, & Cusack, 2003; Cusack, Deeks, Aikman, & Carlyon, 2004) have shown that auditory stream segregation is impaired if a participant's attention is directed toward the visual modality in order to perform an attention-demanding monitoring task during the presentation of the auditory stimuli. Similarly, Soto-Faraco and colleagues (Soto-Faraco et al., 2003) have shown that, under certain conditions, directing attention to a particular stream of sensory information can modulate the size of the crossmodal dynamic capture effect (see also Toro, Sinnett, & Soto-Faraco, 2005, for the effects of diverting attention to the visual modality on the perceptual learning of auditory sequences).
While most crossmodal dynamic capture studies have failed to show any influence of auditory distractors on visual direction-of-motion discrimination responses, Soto-Faraco et al. (2003) were able to demonstrate such an effect under conditions where participants simultaneously had to perform a demanding monitoring task. The participants in this study had to monitor one of two centrally-presented streams (one auditory and the other visual) of rapidly-presented stimuli in order to detect occasionally-presented target stimuli. The participants also had to respond to the direction of visual or auditory apparent motion displays (while ignoring the apparent motion stimuli presented in the other modality) on certain trials presented infrequently, at unpredictable moments, during the primary monitoring task. Under such conditions, Soto-Faraco et al. observed a significant crossmodal dynamic capture effect from the auditory apparent motion distractors on visual direction-of-motion discrimination responses (a mean crossmodal capture effect of 22%). A similar modulation of the crossmodal dynamic capture effect was also observed on auditory direction-of-motion discrimination responses (though note that there was no interaction between the modality of the central monitoring task and the target modality for the motion discrimination task). Soto-Faraco et al.'s results therefore show that focused attention can modulate crossmodal perceptual grouping, as indexed by performance in the crossmodal dynamic capture task.
Sanabria, Soto-Faraco, and Spence (in press) have recently extended this line of research to show that the spatial orienting of attention can also have a robust influence on the perceptual organization of crossmodal scenes. They showed that the magnitude of the audiovisual crossmodal dynamic capture effect could be reduced by as much as 8.5% if a participant's attention was endogenously (i.e., voluntarily) directed to the location from which the auditory and visual apparent motion stimuli were to be presented, as compared to conditions in which the participant's attention had been directed elsewhere. An even more impressive modulation of the crossmodal dynamic capture effect was seen under conditions where the participant's attention was instead directed to the peripheral location exogenously (i.e., automatically) by means of the presentation of spatially-nonpredictive peripheral visual cues (the mean reduction in the magnitude of the crossmodal dynamic capture effect under such conditions was 20%).
Sanabria et al.'s (in press) results contrast markedly with previous evidence showing that spatial attention does not appear to influence crossmodal integration in the classic ventriloquism effect, at least as assessed by determining the perceived location of a stationary auditory event in the presence of an irrelevant visual distractor (see Bertelson & de Gelder, 2004, for a recent review). One explanation for Sanabria et al.'s results has to do precisely with the interplay between within- and cross-modal perceptual organization processes being modulated by attention, something that is less likely to occur when single events are presented, as in the prototypical version of the ventriloquism illusion. The explanation for these counterintuitive findings may be that spatial attention actually helps to segregate different streams of sensory information, thereby weakening the influence of the perceptual organization in one modality on the organization of perceptual experience in the other sensory modality. Sanabria et al.'s results can therefore be seen as providing empirical support for the view that stream segregation does not always occur prior to attentional selection, at least in the case of crossmodal perceptual organization (see Bregman & Rudnicky, 1975; Kahneman, 1973; Kahneman & Henik, 1981). Instead, it would appear that focused attention, no matter whether directed spatially or to a particular sensory modality, can constrain the process of crossmodal perceptual organization.
9)
Intersensory Gestalten
Gilbert (1938, 1941) introduced the term inter-sensory Gestalten to account for (or at least to describe) crossmodal interactions of the type outlined in the present chapter. While the term itself seems more appealing than Zapparoli and Reatto's (1969, pp. 266-267) 'Gestalten of Gestalten', it is important to note that there are at least two quite distinct phenomena that could be described under such a heading. One relatively uncontroversial interpretation of the term is to use it to refer to situations in which the organizational structure of stimuli in one sensory modality can be shown to influence the perceived organization of stimuli presented in another modality (e.g., as in the studies of O'Leary & Rhodes, 1984; and Soto-Faraco et al., 2002). This interpretation is consistent with Gilbert's (1938) use of the term to describe the influence of sequentially presented auditory stimuli on the perception of visual apparent motion between two discrete light sources. It is also consistent with Gilbert's (1941) discussion of the fact that 'we must also reckon with the total field properties. This involves the superimposition of one pattern of stimulation upon a heteromodal pattern, with a resulting new complex inter-sensory Gestalt in which the properties of the original patterns are modified.' (see Gilbert, 1941, p. 401). Note here that the stress is on the modification of the original sensory (i.e., unimodal) patterns, rather than on the generation of a new intersensory pattern. According to Gilbert's interpretation, perceptual organization takes place within each modality individually, while still allowing for the fact that the perceptual organization taking place in one modality can influence the perceptual organization occurring in another sensory modality.
However, a second and more controversial interpretation of the term intersensory Gestalten would be to take it to imply the existence of some kind of multisensory organization (or structure; what Allen & Kolers, 1981, p. 1318, described as a 'common or suprasensory organizing principle') that is not present (and/or could not be perceived) by considering either of the component sensory modalities individually. Paul Schiller (1935, p. 468) seems to have been getting at something like this when he argued that 'Such configurational tendencies can come not only from the same but also from different "heterosensorial" fields. That is what happens in these experiments. A perception is produced by sensations of different modalities, which often create intersensorial patterns.' However, it is important to note that other researchers have argued that such intersensory patterns do not exist: For example, Fraisse (1963, p. 73) points out in his book 'The psychology of time' that 'A succession of sounds and lights will never allow perception of an organization which integrates the two. There will be perception of a double series, one of sounds and one of lights.'
To date, empirical evidence that can be taken to support the existence of genuine intersensory Gestalten is weak. One source of evidence would come from the emergence of an inter-sensory pattern of organization in experiments where the stimuli were presented at different rates in each modality. However, Gebhard and Mowbray (1959, p. 523) report, albeit anecdotally, failing to observe any such phenomenon in their study of the auditory driving of visual flicker (though see Guttman et al., 2005, Footnote 3, p. 234, for subjective reports from participants claiming to experience complex rhythmic Gestalts combining both auditory and visual inputs). Similarly, we are aware of no other evidence supporting the emergence of intersensory rhythms (cf. Handel & Buffardi, 1968, 1969). A second line of support for the existence of genuinely intersensory Gestalten would be provided by a convincing demonstration of the existence of crossmodal (or intermodal) apparent motion (i.e., the perception of apparent motion emerging from the sequential presentation of static stimuli in different sensory modalities and from different spatial locations at the appropriate temporal interval). However, while the perception of apparent motion within individual sensory modalities has been widely explored and documented since the seminal study of Wertheimer (1912; e.g., see Burtt, 1917a, b; Kolers, 1972; Strybel & Vatakis, 2005), far more controversy surrounds the possible existence of crossmodal (or intermodal; Allen & Kolers, 1981; Zapparoli & Reatto, 1969) apparent motion.
Intermodal apparent motion might be expected to occur when two or more stationary stimuli of different sensory modalities are briefly presented from distinct spatial locations at the appropriate interstimulus interval. Under such conditions, certain researchers have reported that people can indeed experience some weak form of apparent motion. Early research, based on subjective reports, suggested the existence of crossmodal apparent motion between all possible combinations of auditory, visual, and tactile stimuli (Galli, 1932; Zapparoli & Reatto, 1969). Zapparoli and Reatto (p. 262) describe the experience of intermodal apparent movement between auditory and visual stimuli as 'something that moves between the sound and the light or between the light and the sound, a light and sound tunnel which grows longer and shorter, or a light tunnel which grows longer and shorter while a sound passes through it.' More recently, Harrar, Winter, and Harris (2005) described the percept of apparent motion that was observed subjectively following the presentation of a visual and a tactile stimulus from different locations as 'feeling like an event at one location causing an event at another'.
However, it is important to note that the weight of evidence from more recent research that has subjected the putative effect to more robust empirical investigation has mostly failed to provide any support for the phenomenon (e.g., Allen & Kolers, 1981; Sanabria et al., 2005; though see also Harrar et al., 2005). Therefore, at present, there appears to be little convincing evidence to support the existence of genuine intersensory Gestalten, if what is meant by the term is patterns of crossmodal perceptual organization that rely for their existence on stimulation in more than one sensory modality, and which are not also present in their constituent sensory modalities.
10)
Conclusions
Taken together, the research highlighted in the present chapter demonstrates just how profoundly multisensory integration (or crossmodal perceptual grouping) can be influenced by the nature of any intramodal perceptual grouping that may be taking place at the same time. Over the last 75 years, a number of different studies, using a wide variety of different experimental paradigms, have provided a wealth of empirical evidence to show just how profoundly intramodal perceptual grouping influences the nature and magnitude of any crossmodal interactions. We have argued that a meaningful distinction can be drawn between a number of different questions that have been addressed by the researchers in this area: One question that has been tackled by several researchers relates to investigating the extent to which variations in the perceptual organization/segregation taking place within one sensory modality can affect the perceptual organization of stimuli presented within a different sensory modality (e.g., Maass, 1938; O'Leary & Rhodes, 1984; Soto-Faraco et al., 2002; note that research on this issue is perhaps most directly related to the Christmas tree lights illusion with which we started this chapter). Meanwhile, other researchers have directed their efforts instead at determining whether the magnitude of any crossmodal binding/interaction/grouping taking place between stimuli presented in different sensory modalities is affected by variations in the strength of the intramodal perceptual grouping cues that may be available in the scene/display (e.g., Vroomen & de Gelder, 2000; Watanabe & Shimojo, 2001a). The most recent research has tended to focus on assessing the extent to which crossmodal perceptual organization is dependent upon the global versus local organization of the constituent unimodal displays (e.g., Sanabria et al., 2004b, c; 2005a). Finally, several researchers have successfully demonstrated that selective attention can also affect crossmodal perceptual organization (e.g., see Carlyon et al., 2001, 2003; Sanabria et al., in press a; Soto-Faraco et al., 2003).
This growing body of empirical research helps to emphasize the importance of considering perceptual organization as a crossmodal (or multisensory) phenomenon. However, it is also important to note that the existence of such robust crossmodal organizational influences (or intersensory Gestalten) does not necessarily imply the existence of any genuinely intersensory (or suprasensory, in Allen & Kolers', 1981, terminology) forms of perceptual organization (see Allen & Kolers, 1981; Gebhard & Mowbray, 1959; Fraisse, 1963; Sanabria et al., 2005c), although this issue remains somewhat controversial (e.g., see Galli, 1932; Guttman et al., 2005; Harrar et al., 2005; Zapparoli & Reatto, 1969). Nevertheless, the principal point remains that intramodal perceptual grouping has been shown to exert robust effects on crossmodal interactions, and so this factor should be added to the list of other factors, such as common spatial location (i.e., spatial proximity; Spence & Driver, 2004), common timing (temporal proximity; Zampini et al., 2005), common fate (e.g., Bertelson, 1999; Mateeff, Hohnsbein, & Noack, 1985), the unity assumption (Vatakis & Spence, in press; Welch, 1999), and common temporal structure (e.g., Armel & Ramachandran, 2003; O'Leary & Rhodes, 1984; Thomas, 1941), that have all been shown to influence crossmodal integration.
11)
Future research
Given the paucity of studies conducted to date on the topic of crossmodal perceptual organization (or scene perception), the area offers a number of exciting and important avenues for future research. Indeed, it is striking how many of the studies that have been conducted to date have focused solely on the perceptual organization of audiovisual stimulus displays. It would therefore be interesting in future research to assess the extent to which the principles governing crossmodal perceptual organization outlined here extend to other pairings of sensory modalities, such as, for example, vision and touch, or audition and touch (though see Boernstein, 1955-1956, pp. 212-213).9) Indeed, Phillips-Silver and Trainor (2005) have recently published developmental evidence regarding the crossmodal influence of vestibular/proprioceptive bouncing movement cues on 7-month-old infants' organization of auditory sequences, specifically on their perception of auditory rhythm in ambiguous rhythm patterns (i.e., those without accented beats). While a few studies have started to investigate the visual-tactile modality pairing (e.g., Churchland et al., 1994; Holmes, Sanabria, Calvert, & Spence, 2006; Lyons et al., 2006; Sanabria et al., 2005c; Shimojo & Shams, 2001; Violentyev et al., 2005), there may be particularly good grounds for investigating any crossmodal interactions between the auditory and tactile modalities, given the greater similarity in the nature of the physical signals that are transduced by these two sensory systems (e.g., see Kitagawa & Spence, 2006; von Bekesy, 1957, 1959; Mahar, Mackenzie, & McNicol, 1994; Mowbray & Gebhard, 1957; and Bresciani et al., 2005; Hotting & Roder, 2004, for early empirical evidence from the audiotactile version of the two-flash illusion; cf. Sherrick, 1976). As such, one might expect the audiotactile pairing to provide one of the best opportunities for contriving a situation in which crossmodal grouping-by-similarity is more likely to occur than intramodal grouping-by-proximity (cf. Mahar et al., 1994; Handel & Buffardi, 1968, p. 1028), a situation that it has thus far proven impossible to achieve using only audiovisual displays (see Figure 11).
Figure 11. An example of a visual display in which the organizational principles of grouping-by-proximity and grouping-by-similarity have been put into conflict (based on Kubovy et al., 1998, Figure 1D). It remains an interesting question for future research to determine whether grouping-by-proximity could ever dominate over grouping-by-similarity in crossmodal perceptual organization. To date, the evidence suggests that grouping-by-similarity (in terms of stimulation from the same sensory modality) will always dominate over grouping-by-proximity in crossmodal scene perception. However, given the greater similarity between audition and touch than between the other sensory modalities (e.g., see von Bekesy, 1957, 1959; Mahar et al., 1994; Mowbray & Gebhard, 1957), this pairing of sensory modalities would appear to represent perhaps the best opportunity for achieving such a result experimentally.
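For readers who wish to experiment with displays of this general kind, the following minimal sketch (our own illustration, not the published stimuli of Kubovy et al., 1998) generates a simple dot lattice in which the two grouping principles pull in opposite directions: the horizontal spacing between dots is smaller than the vertical spacing, so proximity favours grouping into horizontal rows, while the alternating fill colour of successive columns favours grouping into vertical columns by similarity.

```python
# Illustrative sketch only (not the stimuli of Kubovy et al., 1998): a dot
# lattice in which grouping-by-proximity and grouping-by-similarity conflict.
import matplotlib.pyplot as plt

def conflict_lattice(n_rows=6, n_cols=8, dx=0.6, dy=1.0):
    """Plot a lattice in which dx < dy, so proximity favours horizontal rows,
    while alternating column colours favour vertical columns by similarity."""
    fig, ax = plt.subplots(figsize=(5, 4))
    for r in range(n_rows):
        for c in range(n_cols):
            fill = "black" if c % 2 == 0 else "white"
            ax.scatter(c * dx, r * dy, s=120, facecolors=fill, edgecolors="black")
    ax.set_aspect("equal")
    ax.axis("off")
    plt.show()

if __name__ == "__main__":
    conflict_lattice()
```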
A second area where further research is needed in order to better understand the rules governing crossmodal perceptual organization relates to the role of spatial factors. Given the present context, the question is really one of whether the influence of the perceptual organization of stimuli in one sensory modality on the perceptual organization of stimuli presented in another modality is spatially-modulated or not. At present, we do not have a clear answer to this question. For while some researchers have demonstrated significant effects of relative spatial position on certain crossmodal effects (e.g., Soto-Faraco et al., 2002; see also Holmes et al., 2006; Mays & Schirillo, 2005; Meyer et al., 2005), others have reported no such spatial modulation of crossmodal phenomena such as the auditory driving of visual temporal rate perception (e.g., Recanzone, 2003; Regan & Spekreijse, 1977; Welch, DuttonHurt, & Warren, 1986), or the temporal ventriloquism illusion (Vroomen & Keetels, 2006). Determining the importance of spatial factors to crossmodal perceptual organization is made all the more difficult by the fact that many of the previous studies in this area failed to report whether or not the auditory and visual stimuli were presented from the same position in their experiments (e.g., see O'Leary & Rhodes, 1984; Vroomen & de Gelder, 2000). That is, it is unclear whether the auditory stimuli in these studies were presented from the computer's internal loudspeaker (i.e., from approximately the same position as the visual stimuli) or over headphones (i.e., from different positions; though see Zampini, Shore, & Spence, 2003). It will therefore be a particularly interesting challenge for future research to try to determine the conditions under which such spatial modulation of crossmodal perceptual organization takes place (see Spence, in press). One possibility here is that spatial co-localization may have a more pronounced influence on crossmodal scene perception under conditions where some kind of spatial processing is required, either explicitly or implicitly (see Spence & McDonald, 2004; though see also Holmes et al., 2006).
A third area where additional research would be merited relates to an assessment of the importance of synaesthetic correspondences in modulating crossmodal perceptual organization effects such as those reported here (see Gallace & Spence, 2006; Marks, 2004). To take but one example, in O'Leary and Rhodes's (1984) study, the high auditory frequency stimuli were paired with the visual stimuli in the upper screen locations in the bimodal stimulation conditions, thus potentially exploiting any synaesthetic correspondence between spatial elevation and auditory frequency that may exist between the two sensory modalities (see Pratt, 1930; Roffler & Butler, 1968). As O'Leary and Rhodes themselves noted more than 20 years ago, it is an open question as to whether similar results would have been found if the high (spatial) visual stimuli had been paired with the low (frequency) tones (see also Shipley, 1964, p. 1328, for evidence that synaesthetic correspondences between auditory and visual stimuli might modulate the auditory driving of visual flicker).
Finally, having provided a range of empirical evidence to demonstrate that the perceptual grouping taking place within a given sensory modality does indeed affect crossmodal scene perception (both the perception of individual stimuli and the perceptual organization of groups of stimuli presented in another sensory modality), it remains a critical issue for future research to try to quantify more precisely the exact nature of these interactions between intramodal and crossmodal perceptual organization. For, as Hochberg (1974, p. 204) so elegantly put it when summarizing the literature on unimodal visual perceptual organization more than 30 years ago: 'The Gestalt explanation of perceptual organization must be regarded as a first stage in an evolving formulation of both problem and solution, neither a closed issue nor a successful theory.' So, having demonstrated the interaction between intramodal and crossmodal grouping principles in multisensory perceptual organization, future studies will increasingly need to develop more quantitative rules that can predict the relative strength of intramodal versus crossmodal perceptual grouping under a range of different experimental conditions (see Hochberg, 1974; Hochberg & Silverstein, 1956; Kubovy, Holcombe, & Wagemans, 1998; and Oyama, 1961, for previous attempts to assess the relative strength of grouping-by-similarity and grouping-by-proximity in unimodal visual displays). It will only be by moving forward from the merely descriptive to an account of crossmodal scene perception that is genuinely predictive that future research on intersensory Gestalten will provide a genuinely useful contribution to our understanding of the laws (rather than rules) governing the multisensory integration of sensory information.
References
Aksentijevic, A., Elliott, M. A., & Barber, P. J. 2001 Dynamics of perceptual grouping: Similarities in the organization of visual and auditory groups. Visual Cognition, 8, 349-358.
Alais, D., & Lorenceau, J. 2002 Perceptual grouping in the Ternus display: Evidence for an association field in apparent motion. Vision Research, 42, 1005-1016.
Andersen, T. S., Tiippana, K., & Sams, M. 2004 Factors influencing audiovisual fission and fusion illusions. Cognitive Brain Research, 21, 301-308.
Andersen, T. S., Tiippana, K., & Sams, M. 2005 Maximum likelihood integration of rapid flashes and beeps. Neuroscience Letters, 380, 155-160.
Armel, K. C., & Ramachandran, V. S. 2003 Projecting sensations to external objects: Evidence from skin conductance response. Proceedings of the Royal Society B, 270, 1499-1506.
Beck, D. M., & Palmer, S. E. 2002 Top-down influences on perceptual grouping. Journal of Experimental Psychology: Human Perception and Performance, 28, 1071-1084.
Berger, T. D., Martelli, M., & Pelli, D. G. 2003 Flicker flutter: Is an illusory event as good as the real thing? Journal of Vision, 3, 406-412.
Bertelson, P. 1999 Ventriloquism: A case of crossmodal perceptual grouping. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 347-362). Amsterdam: Elsevier Science BV.
Bertelson, P., & de Gelder, B. 2004 The psychology of multimodal perception. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 141-177). Oxford: Oxford University Press.
Bertenthal, B. I., Banton, T., & Bradbury, A. 1993 Directional bias in the perception of translating patterns. Perception, 22, 193-207.
Boernstein, W. S. 1955-1956 Classification of the human senses. Yale Journal of Biology and Medicine, 28, 208-215.
Bregman, A. S. 1990 Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Bregman, A. S., & Achim, A. 1973 Visual stream segregation. Perception & Psychophysics, 13, 451-454.
Bregman, A. S., & Campbell, J. 1971 Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 89, 244-249.
Bregman, A. S., & Dannenbring, G. L. 1973 The effect of continuity on auditory stream segregation. Perception & Psychophysics, 13, 308-312.
Bregman, A. S., & Rudnicky, A. I. 1975 Auditory segregation: Stream or streams? Journal of Experimental Psychology: Human Perception and Performance, 1, 263-267.
Bresciani, J. P., Ernst, M. O., Drewing, K., Bouyer, G., Maury, V., & Kheddar, A. 2005 Feeling what you hear: Auditory signals can modulate tactile tap perception. Experimental Brain Research, 162, 172-180.
Burtt, H. E. 1917a Auditory illusions of movement: A preliminary study. Journal of Experimental Psychology, 2, 63-75.
Burtt, H. E. 1917b Tactile illusions of movement. Journal of Experimental Psychology, 2, 371-385.
Bushara, K. O., Hanakawa, T., Immisch, I., Toma, K., Kansaku, K., & Hallett, M. 2002 Neural correlates of cross-modal binding. Nature Neuroscience, 6, 190-195.
Calvert, G. A., Spence, C., & Stein, B. E. (Eds.) 2004 The handbook of multisensory processes. Cambridge, MA: MIT Press.
Carlyon, R. P. 2004 How the brain separates sounds. Trends in Cognitive Sciences, 8, 465-471.
Carlyon, R. P., Plack, C. J., & Cusack, R. 2001 Cross-modal and cognitive influences on the build-up of auditory streaming. British Journal of Audiology, 35, 139-140.
Carlyon, R. P., Plack, C. J., Fantini, D. A., & Cusack, R. 2003 Cross-modal and non-sensory influences on auditory streaming. Perception, 32, 1393-1402.
Churchland, P. S., Ramachandran, V. S., & Sejnowski, T. J. 1994 A critique of pure vision. In C. Koch & J. L. Davis (Eds.), Large-scale neuronal theories of the brain (pp. 23-60). Cambridge, MA: MIT Press.
Corbin, H. H. 1942 The perception of grouping and apparent movement in visual depth. Archives of Psychology, 273, 1-50.
Correa, A., Sanabria, D., Spence, C., Tudela, P., & Lupianez, J. 2006 Selective temporal attention enhances the temporal resolution of visual perception: Evidence from a temporal order judgment task. Brain Research, 1070, 202-205.
Craig, J. C. 2006 Visual motion interferes with tactile motion perception. Perception, 35, 351-367.
Cusack, R., Deeks, J., Aikman, G., & Carlyon, R. P. 2004 Effects of location, frequency region, and time course of selective attention on auditory scene analysis. Journal of Experimental Psychology: Human Perception & Performance, 30, 643-656.
Driver, J., & Spence, C. 2000 Multisensory perception: Beyond modularity and convergence. Current Biology, 10, R731-R735.
Ecker, A. J., & Heller, L. M. 2005 Auditory-visual interactions in the perception of a ball's path. Perception, 34, 59-75.
Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. 2003 Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience, 22, 5749-5759.
Fraisse, P. 1963 The psychology of time. London: Harper & Row.
Francis, G., & Grossberg, S. 1996 Cortical dynamics of form and motion integration: Persistence, apparent motion, and illusory contours. Vision Research, 36, 149-173.
Fu, K. M. G., Johnston, T. A., Shah, A. S., Arnold, L., Smiley, J., Hackett, T. A., Garraghty, P. E., & Schroeder, C. E. 2003 Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience, 23, 7510-7515.
Gallace, A., & Spence, C. 2006 Multisensory synesthetic interactions in the speeded classification of visual size. Perception & Psychophysics, 68, 1191-1203.
Galli, P. A. 1932 Uber mittelst verschiedener Sinnesreize erweckte Wahrnehmung von Scheinbewegungen [On the perception of apparent motion elicited by different sensory stimuli]. Archiv fur die gesamte Psychologie, 85, 137-180.
Gebhard, J. W., & Mowbray, G. H. 1959 On discriminating the rate of visual flicker and auditory flutter. American Journal of Psychology, 72, 521-528.
Giard, M. H., & Peronnet, F. 1999 Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience, 11, 473-490.
Gilbert, G. M. 1938 A study in inter-sensory Gestalten. Psychological Bulletin, 35, 698.
Gilbert, G. M. 1939 Dynamic psychophysics and the phi phenomenon. Archives of Psychology, 237, 5-43.
Gilbert, G. M. 1941 Inter-sensory facilitation and inhibition. Journal of General Psychology, 24, 381-407.
Grignolo, A., Boles-Carenini, B., & Cerri, S. 1954 Researches on the influence of acoustic stimulation upon the critical fusion frequency of light stimulation. Rivista Oto-Neuro-Oftalmologia, 29, 56-73.
Guttman, S. E., Gilroy, L. A., & Blake, R. 2005 Hearing what the eyes see: Auditory encoding of visual temporal sequences. Psychological Science, 16, 228-235.
Handel, S., & Buffardi, L. 1968 Pattern perception: Integrating information presented in two modalities. Science, 162, 1026-1028.
Handel, S., & Buffardi, L. 1969 Using several modalities to perceive one temporal pattern. Quarterly Journal of Experimental Psychology, 21, 256-266.
Harrar, V., Winter, R., & Harris, L. 2005 Multimodal apparent motion. Poster presented at the 6th Annual Meeting of the International Multisensory Research Forum, Rovereto, Italy, 5-8 June.
He, Z. J., & Ooi, T. L. 1999 Perceptual organization of apparent motion in the Ternus display. Perception, 28, 877-892.
Heise, G. A., & Miller, G. A. 1951 An experimental study of auditory patterns. American Journal of Psychology, 64, 68-77.
Hochberg, J. 1974 Organization and the Gestalt tradition. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception, Vol. 1: Historical and philosophical roots of perception (pp. 179-210). New York: Academic Press.
Hochberg, J., & Hardy, D. 1960 Brightness and proximity factors in grouping. Perceptual and Motor Skills, 10, 22.
Hochberg, J., & Silverstein, A. 1956 A quantitative index of stimulus-similarity: Proximity versus differences in brightness. American Journal of Psychology, 69, 456-458.
Holmes, N. P., Sanabria, D., Calvert, G. A., & Spence, C. 2006 Crossing the hands impairs performance on a nonspatial multisensory discrimination task. Brain Research, 1077, 108-115.
Hotting, K., & Roder, B. 2004 Hearing cheats touch, but less in congenitally blind than in sighted individuals. Psychological Science, 15, 60-64.
Intriligator, J. M. 2000 Self-synchronizing animations. United States Patent 6,163,323.
Julesz, B., & Hirsh, I. J. 1972 Visual and auditory perception: An essay of comparison. In E. E. David, Jr., & P. B. Denes (Eds.), Human communication: A unified view (pp. 283-340). New York: McGraw-Hill.
Kahneman, D. 1973 Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Kahneman, D., & Henik, A. 1981 Perceptual organization and attention. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 181-211). Hillsdale, NJ: Lawrence Erlbaum Associates.
Katz, D. 1925/1989 The world of touch. Hillsdale, NJ: Erlbaum.
King, A. J., & Calvert, G. A. 2001 Multisensory integration: Perceptual grouping by eye and ear. Current Biology, 11, R322-R325.
Kitagawa, N., & Spence, C. 2006 Audiotactile multisensory interactions in information processing. Japanese Psychological Research, 48, 158-173.
Kitagawa, N., & Wada, Y. 2004 Flexible weighting of auditory and visual information in temporal perception. Proceedings of the 18th International Congress on Acoustics, III-2289-III-2292, ICA Kyoto, Japan.
Koffka, K. 1935 Principles of Gestalt psychology. New York: Harcourt, Brace, & World.
Kohler, W. 1929 Physical Gestalten. In W. D. Ellis (Ed.), A source book of Gestalt psychology (pp. 17-54). London: Routledge & Kegan Paul.
Kohler, W. 1930 Gestalt psychology. London: G. Bell & Sons.
Kolers, P. A. 1972 Aspects of motion perception. New York: Pergamon Press.
Knox, G. W. 1945a Investigations of flicker and fusion: III. The effect of auditory stimulation on the visual CFF. Journal of General Psychology, 33, 139-143.
Knox, G. W. 1945b Investigations of flicker and fusion: IV. The effect of auditory flicker on the pronouncedness of visual flicker. Journal of General Psychology, 33, 145-154.
Kramer, P., & Yantis, S. 1997 Perceptual grouping in space and time: Evidence from the Ternus display. Perception & Psychophysics, 59, 87-99.
Kravkov, S. W. 1939 Critical frequency of flicker and indirect stimuli. C. R. (Doklady) Acad. Sci. URSS, 22, 64-66.
Kubovy, M. 1981 Concurrent-pitch segregation and the theory of indispensable attributes. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 55-98). Hillsdale, NJ: Erlbaum.
Kubovy, M., Holcombe, A. O., & Wagemans, J. 1998 On the lawfulness of grouping by proximity. Cognitive Psychology, 35, 71-98.
Kubovy, M., & Van Valkenburg, D. 2001 Auditory and visual objects. Cognition, 80, 97-126.
London, I. D. 1954 Research on sensory interaction in the Soviet Union. Psychological Bulletin, 51, 531-568.
Lyons, G., Sanabria, D., Vatakis, A., & Spence, C. 2006 The modulation of crossmodal integration by unimodal perceptual grouping: A visuotactile apparent motion study. Experimental Brain Research, 174, 510-516.
Maass, H. 1938 Uber den Einfluss akustischer Rhythmen auf optische Bewegungsgestaltungen [About the influence of acoustic rhythms on visual motion]. (Sander, F., Ganzheit und Gestalt. Psychol. Untersuch. VIII.) Archiv fur die Gesamte Psychologie, 100, 424-464.
Madsen, M. C., Rollins, H. A., & Senf, G. M. 1970 Variables affecting immediate memory for bisensory stimuli: Ear-eye analogue studies of dichotic listening. Journal of Experimental Psychology, 83.
Mahar, D., Mackenzie, B., & McNicol, D. 1994 Modality-specific differences in the processing of spatially, temporally, and spatiotemporally distributed information. Perception, 23, 1369-1386.
Maier, B., Bevan, W., & Behar, I. 1961 The effect of auditory stimulation upon the critical flicker for different regions of the visible spectrum. American Journal of Psychology, 74, 67-73.
Marks, L. E. 2004 Cross-modal interactions in speeded classification. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processes (pp. 85-105). Cambridge, MA: MIT Press.
Mateeff, S., Hohnsbein, J., & Noack, T. 1985 Dynamic visual capture: Apparent auditory motion induced by a moving visual target. Perception, 14, 721-727.
Mays, A., & Schirillo, J. 2005 Lights can reverse illusory directional hearing. Neuroscience Letters, 384, 336-338.
McAdams, S. 1984 The auditory image: A metaphor for musical and psychological research on auditory organization. In W. P. Crozier & A. J. Chapman (Eds.), Cognitive processes in the perception of art (pp. 289-323). Amsterdam: North-Holland.
McAdams, S. E., & Bregman, A. S. 1979 Hearing musical streams. Computer Music Journal, 3, 26-43.
McGurk, H., & MacDonald, J. 1976 Hearing lips and seeing voices. Nature, 264, 746-748.
Metzger, W. 1934 Beobachtungen uber phanomenale Identitat [Studies of phenomenal identity]. Psychologische Forschung, 19, 1-60.
Meyer, G. F., Wuerger, S. M., Rohrbein, F., & Zetzsche, C. 2005 Low-level integration of auditory and visual motion signals requires spatial co-localisation. Experimental Brain Research, 166, 538-547.
Michotte, A. 1963 The perception of causality. London: Methuen. (Original work published in 1946)
Miller, G. A., & Heise, G. A. 1950 The trill threshold. Journal of the Acoustical Society of America, 22, 637-638.
Mowbray, G. H., & Gebhard, J. W. 1957 Sensitivity of the skin to changes in rate of intermittent mechanical stimulation. Science, 125, 1297-1298.
Murray, M. M., Molholm, S., Michel, C. M., Heslenfeld, D. J., Ritter, W., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. 2004 Grabbing your ear: Auditory-somatosensory multisensory interactions in early sensory cortices are not constrained by stimulus alignment. Cerebral Cortex, 15, 963-974.
Ogilvie, J. C. 1956a Effect of auditory flutter on the visual critical flicker frequency. Canadian Journal of Psychology, 10, 61-68.
Ogilvie, J. C. 1956b The interaction of auditory flutter and CFF: The effect of brightness. Canadian Journal of Psychology, 10, 207-210.
O'Leary, A., & Rhodes, G. 1984 Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565-569.
Oyama, T. 1961 Perceptual grouping as a function of proximity. Perceptual and Motor Skills, 13, 305-306.
Palmer, S. E. 2002 Perceptual grouping: It's later than you think. Current Directions in Psychological Science, 11, 101-106.
Palmer, S. E., Brooks, J. L., & Nelson, R. 2003 When does grouping happen? Acta Psychologica, 114, 311-330.
Phillips-Silver, J., & Trainor, L. J. 2005 Feeling the beat: Movement influences infant rhythm perception. Science, 308, 1430.
Pomerantz, J. R. 1981 Perceptual organization in information processing. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 141-180). Hillsdale, NJ: Erlbaum.
Posner, M. I. 1978 Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.
Pratt, C. C. 1930 The spatial character of high and low tones. Journal of Experimental Psychology, 13, 278-285.
Ramachandran, V. S., & Anstis, S. M. 1983 Perceptual organization of moving patterns. Nature, 304, 529-531.
Recanzone, G. H. 2003 Auditory influences on visual temporal rate perception. Journal of Neurophysiology, 89, 1078-1093.
Regan, D., & Spekreijse, H. 1977 Auditory-visual interactions and the correspondence between perceived auditory space and perceived visual space. Perception, 6, 133-138.
Rockland, K. S., & Ojima, H. 2003 Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology, 50, 19-26.
Roffler, S. K., & Butler, R. A. 1968 Factors that influence the localization of sound in the vertical plane. Journal of the Acoustical Society of America, 43, 1255-1259.
Royer, F. L., & Garner, W. R. 1970 Perceptual organization of nine-element auditory temporal patterns. Perception & Psychophysics, 7, 115-120.
Ryan, T. A. 1940 Interrelations of the sensory systems in perception. Psychological Bulletin, 37, 659-698.
Rush, G. P. 1937 Visual grouping in relation to age. Archives of Psychology, 31 (Whole No. 217), 1-95.
Sanabria, D., Correa, A., Lupianez, J., & Spence, C. 2004a Bouncing or streaming? Exploring the influence of auditory cues on the interpretation of ambiguous visual motion. Experimental Brain Research, 157, 537-541.
Sanabria, D., Lupianez, J., & Spence, C. (in press) Auditory motion affects visual motion perception in a speeded discrimination task. Experimental Brain Research.
Sanabria, D., Soto-Faraco, S., Chan, J. S., & Spence, C. 2004b When does visual perceptual grouping affect multisensory integration? Cognitive, Affective, & Behavioral Neuroscience, 4, 218-229.
Sanabria, D., Soto-Faraco, S., Chan, J. S., & Spence, C. 2005a Intramodal perceptual grouping modulates multisensory integration: Evidence from the crossmodal congruency task. Neuroscience Letters, 377, 59-64.
Sanabria, D., Soto-Faraco, S., & Spence, C. 2004c Exploring the role of visual perceptual grouping on the audiovisual integration of motion. Neuroreport, 15, 2745-2749.
Sanabria, D., Soto-Faraco, S., & Spence, C. 2005b Spatiotemporal interactions between audition and touch depend on hand posture. Experimental Brain Research, 165, 505-514.
Sanabria, D., Soto-Faraco, S., & Spence, C. 2005c Assessing the effect of visual and tactile distractors on the perception of auditory apparent motion. Experimental Brain Research, 166, 548-558.
Sanabria, D., Soto-Faraco, S., & Spence, C. (in press) Spatial attention modulates audiovisual interactions in apparent motion. Journal of Experimental Psychology: Human Perception and Performance.
Sanabria, D., Spence, C., & Soto-Faraco, S. 2007 Perceptual and decisional contributions to audiovisual interactions in the perception of apparent motion: A signal detection study. Cognition, 102, 299-310.
Scheier, C., Lewkowicz, D. J., & Shimojo, S. 2003 Sound induced perceptual reorganization of an ambiguous motion display in human infants. Developmental Science, 6, 233-241.
Schiller, P. 1935 Interrelation of different senses in perception. British Journal of Psychology, 25, 465-469.
Sekuler, A. B., & Bennett, P. J. 2001 Generalized common fate: Grouping by common luminance changes. Psychological Science, 12, 437-444.
Sekuler, R., Sekuler, A. B., & Lau, R. 1997 Sound alters visual motion perception. Nature, 385, 308.
Sekuler, A. B., & Sekuler, R. 1999 Collisions between moving visual targets: What controls alternative ways of seeing an ambiguous display? Perception, 28, 415-432.
Shams, L., Kamitani, Y., & Shimojo, S. 2000 What you see is what you hear: Sound induced visual flashing. Nature, 408, 788.
Shams, L., Kamitani, Y., & Shimojo, S. 2002 Visual illusion induced by sound. Cognitive Brain Research, 14, 147-152.
Shepard, R. N. 1981 Psychophysical complementarity. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 279-341). Hillsdale, NJ: Erlbaum.
Shepard, R. N. 1987 Evolution of a mesh between principles of the mind and regularities of the world. In J. Dupre (Ed.), The latest on the best: Essays on evolution and optimality (pp. 251-275). Cambridge, MA: MIT Press.
Shepard, R. N. 1994 Perceptual-cognitive universals as reflections of the world. Psychonomic Bulletin & Review, 1, 2-28.
Sherrick, C. E. 1976 The antagonisms of hearing and touch. In S. K. Hirsh, D. H. Eldredge, I. J. Hirsh, & S. R. Silverman (Eds.), Hearing and Davis: Essays honoring Hallowell Davis (pp. 149-158). St. Louis, MO: Washington University Press.
Shimojo, S., & Shams, L. 2001 Sensory modalities are not separate modalities: Plasticity and interactions. Current Opinion in Neurobiology, 11, 505-509.
Shipley, T. 1964 Auditory flutter-driving of visual flicker. Science, 145, 1328-1330.
Slutsky, D. A., & Recanzone, G. H. 2001 Temporal and spatial dependency of the ventriloquism effect. Neuroreport, 12, 7-10.
Smith, B. (Ed.) 1988 Foundations of Gestalt theory. Munich, Germany: Philosophia Verlag.
Soto-Faraco, S., & Kingstone, A. 2004 Multisensory integration of dynamic information. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processes (pp. 49-67). Cambridge, MA: MIT Press.
Soto-Faraco, S., Kingstone, A., & Spence, C. 2003 Multisensory contributions to the perception of motion. Neuropsychologia, 41, 1847-1862.
Soto-Faraco, S., Kingstone, A., & Spence, C. 2006 Integrating motion information across sensory modalities: The role of top-down factors. Progress in Brain Research, 155, 277-290.
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., & Kingstone, A. 2002 The ventriloquist in motion: Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research, 14, 139-146.
Soto-Faraco, S., Spence, C., & Kingstone, A. 2004a Cross-modal dynamic capture: Congruency effects in the perception of motion across sensory modalities. Journal of Experimental Psychology: Human Perception and Performance, 30, 330-345.
Soto-Faraco, S., Spence, C., & Kingstone, A. 2004b Congruency effects between auditory and tactile motion: Extending the phenomenon of crossmodal dynamic capture. Cognitive, Affective, & Behavioral Neuroscience, 4, 208-217.
Soto-Faraco, S., Spence, C., & Kingstone, A. 2005 Assessing automaticity in the audiovisual integration of motion. Acta Psychologica, 118, 71-92.
Spence, C. (in press) Audiovisual multisensory integration. Acoustical Science & Technology.
Spence, C., & Driver, J. 1997 Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics, 59, 1-22.
Spence, C., & Driver, J. (Eds.) 2004 Crossmodal space and crossmodal attention. Oxford, UK: Oxford University Press.
Spence, C., & McDonald, J. 2004 The crossmodal consequences of the exogenous spatial orienting of attention. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processing (pp. 3-25). Cambridge, MA: MIT Press.
Spence, C., & Walton, M. 2005 On the inability to ignore touch when responding to vision in the crossmodal congruency task. Acta Psychologica, 118, 47-70.
Staal, H. E., & Donderi, D. C. 1983 The effect of sound on visual apparent movement. American Journal of Psychology, 96, 95-105.
Strybel, T. Z., & Vatakis, A. 2005 A comparison of auditory and visual apparent motion presented individually and with crossmodal moving distractors. Perception, 33, 1033-1048.
Thomas, G. J. 1941 Experimental study of the influence of vision on sound localization. Journal of Experimental Psychology, 28, 163-177.
Toro, J. M., Sinnett, S., & Soto-Faraco, S. 2005 Speech segmentation by statistical learning depends on attention. Cognition, 97, B25-B34.
Ullman, S. 1979 The interpretation of visual motion. Cambridge, MA: MIT Press.
Urbantschitsch, V. 1888 Ueber den Einfluss einer Sinneserregung auf die ubrigen Sinnesempfindungen [On the influence of one sensory percept on the other sensory percepts]. Archiv fur die gesamte Physiologie, 42, 154-182.
Van Noorden, L. P. A. S. 1971 Rhythmic fission as a function of tone rate. In IPO Annual Progress Report (No. 6). Eindhoven, Netherlands: Institute for Perception Research.
Van Noorden, L. P. A. S. 1975 Temporal coherence in the perception of tone sequences. PhD thesis, Eindhoven University of Technology.
Vatakis, A., Bayliss, L., Zampini, M., & Spence, C. (in press) The influence of synchronous audiovisual distractors on audiovisual temporal order judgments. Perception & Psychophysics.
Vatakis, A., & Spence, C. (in press) Crossmodal binding: Evaluating the 'unity assumption' using audiovisual speech stimuli. Perception & Psychophysics.
Violentyev, A., Shimojo, S., & Shams, L. 2005 Touch-induced visual illusion. Neuroreport, 16, 1107-1110.
von Bekesy, G. 1957 Neural volleys and the similarity between some sensations produced by tones and by skin vibrations. Journal of the Acoustical Society of America, 29, 1059-1069.
von Bekesy, G. 1959 Similarities between hearing and skin sensations. Psychological Review, 66, 1-22.
von Frey, M. 1929 Variations in tactual impressions. In W. D. Ellis (Ed.), A source book of Gestalt psychology (pp. 193-195). London: Routledge and Kegan Paul.
von Schiller, P. 1932a Das optische Verschmelzen in seiner Abhangigkeit von heteromodaler Reizung [Optical integration and its dependence on heteromodal stimulation]. Zeitschrift fur Psychologie, 125, 249-288.
von Schiller, P. 1932b Die Rauhigkeit als intermodale Erscheinung [Roughness as an intermodal phenomenon]. Zeitschrift fur Psychologie, 127, 265-289.
Vroomen, J., & de Gelder, B. 2000 Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583-1590.
Vroomen, J., & de Gelder, B. 2003 Visual motion influences the contingent auditory motion aftereffect. Psychological Science, 14, 357-361.
Vroomen, J., & Keetels, M. 2006 The spatial constraint in intersensory pairing: No role in temporal ventriloquism. Journal of Experimental Psychology: Human Perception & Performance, 32, 1063-1071.
Wada, Y., Kitagawa, N., & Noguchi, K. 2003 Audio-visual integration in temporal perception. International Journal of Psychophysiology, 50, 117-124.
Watanabe, K. 2004 Visual grouping by motion precedes the relative localization between moving and flashed stimuli. Journal of Experimental Psychology: Human Perception & Performance, 30, 504-512.
Watanabe, K., & Shimojo, S. 1998 Attentional modulation in perception of visual motion events. Perception, 27, 1041-1054.
Watanabe, K., & Shimojo, S. 2001a When sound affects vision: Effects of auditory grouping on visual motion perception. Psychological Science, 12, 109-116.
Watanabe, K., & Shimojo, S. 2001b Postcoincidence trajectory duration affects motion event perception. Perception & Psychophysics, 63, 16-28.
Welch, R. B. 1999 Meaning, attention, and the 'unity assumption' in the intersensory bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 371-387). Amsterdam: Elsevier Science BV.
Welch, R. B., DuttonHurt, L. D., & Warren, D. H. 1986 Contributions of audition and vision to temporal rate perception. Perception & Psychophysics, 39, 294-300.
Welch, R. B., & Warren, D. H. 1980 Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638-667.
Wertheimer, M. 1912 Experimentelle Studien uber das Sehen von Bewegung [Experimental studies on the visual perception of movement]. Zeitschrift fur Psychologie, 61, 161-265. [Also in T. Shipley (Ed. and Trans.), Classics in psychology (pp. 1032-1089). New York: Philosophical Library.]
Wertheimer, M. 1938 Laws of organization in perceptual forms. In W. Ellis (Ed.), A source book of Gestalt psychology (pp. 71-88). London: Routledge & Kegan Paul. (Original published in 1923)
Zampini, M., Guest, S., Shore, D. I., & Spence, C. 2005 Audiovisual simultaneity judgments. Perception & Psychophysics, 67, 531-544.
Zampini, M., Shore, D. I., & Spence, C. 2003 Multisensory temporal order judgments: The role of hemispheric redundancy. International Journal of Psychophysiology, 50, 165-180.
Zapparoli, G. C., & Reatto, L. L. 1969 The apparent movement between visual and acoustic stimulus and the problem of intermodal relations. Acta Psychologica, 29, 256-267.
Zietz, K., & Werner, H. 1927 Werner's Studien uber Strukturgesetze, VIII: Uber die dynamische Struktur der Bewegung [Werner's studies on the laws of structure, VIII: On the dynamic structure of movement]. Zeitschrift fur Psychologie, 105, 226-249.
Footnotes
1. The illusion of apparent motion, otherwise known as the phi-phenomenon, occurs under conditions of the discrete sequential presentation of static stimuli from different locations at rates fast enough to give rise to the illusion of an object moving smoothly and continuously through the space between them (e.g., Wertheimer, 1912; Kolers, 1972; see also Strybel & Vatakis, 2005).
2. Note that Vroomen and de Gelder (2000) incorrectly describe these visual stimuli as moving sequentially from left to right across the screen. All of the visual stimuli were actually presented from the same lateral position. It is, however, uncertain whether O'Leary and Rhodes could actually have reduced the duration of the visual stimuli in 8-ms steps, as they report in their paper. This would have necessitated the use of a monitor with a screen refresh rate of 125 Hz, and it is questionable whether such performance could have been attained using the display technologies available at the time.
3. Given that physically extending the moment of collision has been shown to increase the perception of bouncing (e.g., Bertenthal et al., 1993), it could be argued that the presentation of the sound may simply have extended the perceived duration of the collision (i.e., it may have had its effect by 'freezing' the visual display, as reported in Vroomen and de Gelder's, 2000, study). However, Sanabria et al. (2004) have shown that this freezing account cannot provide the sole explanation for this particular crossmodal effect. They found that the presentation of the sound still influenced people's interpretation of the visual display even when it was presented while the disks were occluded behind an opaque barrier (hence when there was no moment of collision to 'freeze').
4. While a response bias interpretation of the auditory grouping effect on this particular crossmodal interaction remains possible (cf. Bertelson & de Gelder, 2004; Vroomen & de Gelder, 2000), an extensive body of crossmodal research has demonstrated that there is a genuinely perceptual component to this crossmodal effect (e.g., see Sanabria et al., 2004; Watanabe & Shimojo, 2001b).
5. It seems particularly apposite to use apparent motion to look at the question of intramodal versus crossmodal grouping given that the field of Gestalt psychology itself originated with Wertheimer's (1912) early studies of visual apparent motion (see Shepard, 1981, p. 311).
6. In a book chapter published in 1994, Churchland et al. (pp. 30-31) reported that if a visual occluder (such as a piece of card) is placed on a screen and a visual stimulus is flashed on and off just to the side (i.e., to the left) of it, then participants only see a single light flashing on and off (i.e., no perception of motion is experienced). However, if a sound is played to the left ear over headphones when the light is flashed, and then a sound is presented to the right ear when the light is turned off, this can give rise to the perception that the visual stimulus is actually moving to the right, behind the occluder. Churchland et al. also reported a similar, albeit somewhat weaker, subjective motion effect when the flashing of the light is accompanied by a tactile stimulus to the left hand, and by a tactile stimulus to the right hand when it is turned off (with the hands presumably placed in an uncrossed posture). These fascinating, albeit anecdotal, findings clearly warrant further empirical research.
7. Note that congruency in the 4-lights displays was defined in terms of the global (rather than local) motion in the displays.
8. In fact, while all of the research that has been highlighted so far has had as its focus the question of the extent to which changes in the strength of intramodal perceptual grouping affect the nature of any crossmodal grouping (or binding) that is observed, one could presumably also ask the reverse question: Namely, can changes in the strength (or type) of crossmodal perceptual grouping modulate the strength (or type) of grouping taking place intramodally (perhaps using a variant of the local versus global grouping displays used by Sanabria et al., 2004c; 2005a)?
9. At this point it is perhaps also worth noting how little research has been conducted on the grouping principles governing intramodal tactile perception (though see Katz, 1925/1989; and von Frey, 1929).
Author Notes
Correspondence concerning this article should be addressed to Charles Spence (E-mail: charles.spence@psy.ox.ac.uk) at the Department of Experimental Psychology, South Parks Road, Oxford, England, OX1 3UD. This research was funded by a grant from the Oxford McDonnell Centre for Cognitive Neuroscience to CS and SS-F.