Intersensory Gestalten and Crossmodal Scene Perception

Charles Spence (University of Oxford), Daniel Sanabria (University of Oxford), Salvador Soto-Faraco (Universitat de Barcelona)

Abstract

The last few years have seen a dramatic growth of interest in questions related to the integration of information arising from the different senses (e.g., Calvert, Spence, & Stein, 2004). Researchers have identified a number of factors that can facilitate multisensory binding, such as spatial and temporal coincidence, common fate, and common temporal structure. These crossmodal binding principles map onto a number of Gestalt grouping principles that have been put forward previously to describe how sensory information is perceptually organized within individual sensory modalities (i.e., intramodal perceptual grouping or stream segregation). However, to date, only a small body of research has attempted to investigate the extent to which the perceptual organization of stimuli taking place in one sensory modality can influence the perceptual organization of stimuli in another sensory modality, and/or the nature of any crossmodal interactions between stimuli presented in different sensory modalities. In this chapter, we provide a historical overview of the empirical evidence relating to the crossmodal (or multisensory) aspects of scene perception (i.e., perceptual organization). Taken together, the evidence clearly highlights the necessity of considering intramodal perceptual grouping when investigating the multisensory integration (or crossmodal grouping) of sensory information.

1) Introduction

Have you ever stared at the flickering lights on a Christmas tree and wondered why they appear to flash in time with the rhythm of the music that you are listening to? Despite the fact that many people report having had this experience anecdotally, very little research has been directed at trying to explain the phenomenon empirically (though see Intriligator, 2000). In the present chapter, we put forward the view that the Christmas tree lights illusion most probably reflects the consequences of the brain’s attempt to organize the perceptual scene crossmodally. In fact, we will argue that the illusion provides an everyday example (albeit of a seasonal variety) demonstrating that the perceptual organization of stimuli taking place in one sensory modality (audition in this case) may be used by the nervous system to help organize (or structure) the information that is simultaneously being processed by other sensory modalities (in this case vision). While researchers have known about such crossmodal interactions (or intersensory Gestalten; Gilbert, 1938, 1941) for many years (see Maass, 1938; Schiller, 1935; Thomas, 1941; Urbantschitsch, 1888; Zietz & Werner, 1927, for early studies; see Ryan, 1940, and London, 1954, for early reviews), surprisingly little research has been conducted on this fascinating phenomenon.
One reason for this lack of interest regarding the organizational principles governing multisensory scene perception may be the seemingly effortless manner in which we normally segregate the inputs constantly bombarding our senses in terms of their modality of origin (though see McGurk & MacDonald, 1976, for one of the best-known counter-examples, showing that changes in acoustic speech perception can be induced by changing the nature of the visual display; and Shams, Kamitani, & Shimojo, 2000, for evidence that changes in visual perception/structure can be induced by presenting the appropriate pattern of auditory stimulation). Instead, the majority of researchers have tended to focus their efforts on elucidating the grouping principles underlying our perception of unimodal visual displays (e.g., Corbin, 1942; Hochberg, 1974; Koffka, 1935; Köhler, 1930; Pomerantz, 1981; Rush, 1937; Sekuler & Bennett, 2001; Smith, 1988; Wertheimer, 1923/1938), and, to a lesser extent, unimodal auditory displays (e.g., Bregman, 1990; McAdams & Bregman, 1979; McAdams, 1984; Royer & Garner, 1970; Wertheimer, 1923/1938). Research by Gestalt psychologists on the topic of unimodal perceptual organization has revealed the existence of a number of grouping principles, such as spatial proximity, similarity, good continuation, common fate (or uniform destiny), closure, and Einstellung, that exert a profound influence on people’s perception of the organization of visual displays. Similarly, the research of Bregman and his colleagues (summarized in Bregman, 1990) has also highlighted the existence of a number of grouping principles governing the segregation of auditory scenes (see also Carlyon, 2004). Interestingly, the principles of auditory scene analysis that have been identified, such as similarity (in terms of pitch, volume, timbre, or the location of sounds), good continuation, and common fate, map nicely onto those reported previously in unimodal visual studies (see also Julesz & Hirsh, 1972; Kubovy, 1981; Wertheimer, 1923/1938), thus suggesting the existence of a set of general perceptual organizational principles (cf. Aksentijevic, Elliott, & Barber, 2001; Shepard, 1981, 1987, 1994).

The last few years have seen a very rapid growth of interest in the study of multisensory perception by cognitive neuroscientists (e.g., see the chapters in Calvert et al., 2004). What’s more, recent studies of multisensory integration have also highlighted the importance of factors such as spatial coincidence, temporal synchrony, and common fate in mediating the crossmodal binding (or perceptual grouping) of sensory information (e.g., Bertelson, 1999; King & Calvert, 2001; Slutsky & Recanzone, 2001; Soto-Faraco, Kingstone, & Spence, 2004; Spence & Driver, 2004; Thomas, 1941; Welch, 1999; Welch & Warren, 1980; Zampini, Guest, Shore, & Spence, 2005). Given this apparent correspondence between the organizational principles constraining our interpretation of unimodal visual and auditory scenes (Bregman, 1990; Kubovy, 1981; Kubovy & Van Valkenburg, 2001) or patterns (e.g., Julesz & Hirsh, 1972), and the fact that multisensory integration (or crossmodal perceptual grouping) appears to be constrained by many of the same organizational principles, one might reasonably ask to what extent scene analysis actually reflects a crossmodal phenomenon (see O’Leary & Rhodes, 1984).
In other words, to what extent does the human perceptual system utilize information from one sensory modality in order to impose a certain organization on the perceptual array present in another sensory modality? The research that has been conducted thus far on the perceptual organization of multisensory scenes can be divided into several distinct areas: A number of researchers have focused on the question of whether or not the perceptual organization of stimuli taking place within one sensory modality can influence the perceptual organization of stimuli presented in another sensory modality (e.g., Maass, 1938; O’Leary & Rhodes, 1984; Soto-Faraco, Lyons, Gazzaniga, Spence, & Kingstone, 2002; Soto-Faraco, Kingstone, & Spence, 2003; Zietz & Werner, 1927; see also Shimojo & Shams, 2001). Meanwhile, other researchers have addressed the question of whether perceptual interactions between stimuli presented to different sensory modalities (a phenomenon sometimes referred to as crossmodal perceptual grouping; e.g., Bertelson, 1999; Bertelson & de Gelder, 2004, p. 150) are modulated by the perceptual grouping taking place within a particular sensory modality (e.g., Thomas, 1941; Vroomen & de Gelder, 2000; Watanabe & Shimojo, 2001a). More recently, researchers have also started to investigate the consequences of the local versus global grouping of stimuli in one sensory modality upon crossmodal perceptual organization (Lyons, Sanabria, Vatakis, & Spence, 2006; Sanabria, Soto-Faraco, Chan, & Spence, 2004b, 2005a; Sanabria, Soto-Faraco, & Spence, 2004c). Although the aforementioned studies have utilized a whole range of different experimental paradigms, they converge insofar as they can all be said to reflect an attempt by the researchers involved to investigate the complex interplay between intramodal and crossmodal perceptual grouping (or scene organization); or, in other words, to investigate the relationship between intra-sensory and inter-sensory Gestalten (Gilbert, 1938, 1941). In the sections that follow, we provide a chronological overview of the evidence supporting the claim that scene perception (or perceptual organization) reflects a crossmodal (or intermodal) phenomenon.

2) Early studies of crossmodal perceptual organization

In one of the earliest studies of crossmodal perceptual organization, Maass (1938) presented participants with a number of visual stimuli positioned so as to facilitate the perception of several different patterns of visual apparent motion. The visual stimuli were either presented in silence or else were presented together with a 2-beat or 3-beat auditory rhythm. Maass reported that the presentation of the auditory stimuli limited the number of visual patterns that people reported seeing. In particular, participants tended to report visual patterns that corresponded to the auditory rhythms, but not those that did not. Maass’s results therefore provide one of the first empirical demonstrations that the nature of the perceptual organization taking place in the auditory modality can affect the organization imposed by the perceptual system on a simultaneously-presented visual display (as highlighted by the Christmas lights illusion mentioned earlier; see also Zietz & Werner, 1927, for similar results).
More than 40 years later, O’Leary and Rhodes (1984) reported an important study in which they investigated the influence of perceptual grouping (or organization) within one sensory modality on the perceived grouping of stimuli presented in another modality. The participants in this study were repeatedly presented with a sequence of 6 visual stimuli (dots), three dots in a higher sub-group (presented from three different elevations in the upper part of the visual field), and another three dots in a lower sub-group (presented from three different elevations in the lower part of the visual field; see Figure 1A). Dots from the high and low sub-groups were presented in an alternating sequence from the same horizontal position, although at different elevations (see Figure 2A). At slower rates of stimulus presentation, this sequence of visual stimuli gave rise to the perception of a single dot that appeared to move up and down sequentially between the lower and upper part of the visual display (see Figure 2B), whereas at faster rates of presentation, the higher and lower streams segregated and two dots were perceived as moving up and down concurrently (see Figure 2C), one at the top of the display (moving between the three different elevations of the upper sub-group) and the other at the bottom of the display (moving between the three different elevations of the lower sub-group; see also Bregman & Achim, 1973).

Figure 1. Schematic illustration of the sequence of visual (A) and auditory (B) stimuli used in O’Leary and Rhodes’s (1984) study of crossmodal perceptual organization. T1-T6 indicates the temporal sequence (from first to last) in which the 6 events were presented in each modality. The visual and auditory stimuli were either presented individually or together (in the bimodal stimulation condition).

O’Leary and Rhodes (1984) also presented their participants with sequences of tones. In a manner analogous to the visual displays, 2 sub-groups of tones, one of higher frequency and the other of lower frequency, were presented (see Figure 1B). At lower rates of stimulus presentation, these tones were perceived as a single tone alternating in frequency between the upper and lower frequencies, whereas at higher rates of stimulus presentation, two temporally-overlapping, but perceptually distinct (i.e., segregated), auditory streams were perceived (see Bregman & Campbell, 1971; McAdams, 1984; Miller & Heise, 1950; Van Noorden, 1971, 1975).

Figure 2. (A) Sequence of stimuli presented in the upper and lower sub-groups in O’Leary and Rhodes’s (1984) experiment. T1-T6 indicates the temporal sequence (from first to last) in which the 6 events were presented in each modality. (B-C) Perceptual correlates associated with different rates of stimulus presentation. At lower rates of stimulus presentation (B), a single dot (or tone) was perceived alternating sequentially between the two sub-groups of stimuli (as shown by the continuous line connecting the dots). At higher rates of stimulus presentation (C), two separate concurrent streams were perceived, one in the upper part of the visual display (or frequency range) and the other in the lower part of the display (or frequency range).
Initially, the thresholds (in terms of the SOA) for the perception of one versus two streams were determined for each modality individually by varying both the magnitude of the separation between the upper and lower sub-groups (either spatially for the visual stimuli, or in the frequency domain for the auditory stimuli), and the timing between successive stimuli in the sequence. Next, these thresholds were assessed under conditions of bimodal stimulus presentation. Note that in the bimodal conditions, the highest frequency sound was presented in synchrony with the highest visual stimulus, the second highest tone with the second highest visual stimulus, and so on; that is, the stimuli were presented in a synaesthetically congruent manner (see Gallace & Spence, 2007; Marks, 2004; Pratt, 1930). O’Leary and Rhodes reported that the presentation of visual displays that were perceived by participants as consisting of two moving objects (i.e., visual streams where segregation had taken place) caused them to report that the concurrent auditory displays were also perceived as two streams (i.e., as segregated) at presentation rates that yielded reports of a single perceptual stream when the accompanying visual sequence was perceived as a single stream, and vice versa. At intermediate rates of stimulus presentation, then, O’Leary and Rhodes observed that participants’ subjective reports of whether they perceived one or two streams in a given sensory modality could be influenced by whether they were perceiving one or two streams in the other modality at the same time. These results were taken by O’Leary and Rhodes to show that the nature of the perceptual organization in one sensory modality can influence how the perceptual scene may be organized (or segregated) in another sensory modality.

O’Leary and Rhodes’s (1984) study represents a seminal piece of empirical research on the nature of crossmodal perceptual organization. However, their findings have been criticized (e.g., by Vroomen & de Gelder, 2000) on the grounds that they remain open to a response bias (i.e., non-perceptual) interpretation. One feature that is common to many ambiguous displays, such as the auditory and visual displays used by O’Leary and Rhodes, is that their perceptual interpretation remains ambiguous over a relatively wide range of temporal intervals (e.g., Van Noorden, 1971). What’s more, the perceptual interpretation of such displays can easily be flipped at will (i.e., volitionally) by observers. Hence, in O’Leary and Rhodes’s study, it is possible that the perception of one versus two streams in a given sensory modality may simply have biased people to respond in the same way when asked about their perception of the (ambiguous) stimulus organization in the other modality. In other words, one might predict that simply presenting the numbers 1 or 2 to participants (no matter whether they were presented auditorily or visually) may have had exactly the same effect on people’s reports concerning their interpretation of the stimulus displays. As such, O’Leary and Rhodes’s findings may only tell us about an observer’s voluntary control over the interpretation of ambiguous images (i.e., about the contribution of decisional factors to crossmodal scene perception), rather than necessarily revealing anything fundamental about the nature of the perceptual constraints on crossmodal information processing and scene analysis (a similar criticism can also be levelled at Maass’s, 1938, study).
3) Assessing the influence of intramodal perceptual grouping on multisensory interactions

This potential criticism was addressed in an elegant series of experiments by Vroomen and de Gelder (2000), in which they investigated the influence of auditory perceptual grouping (or stream segregation) on participants’ performance of a visual target identification task. In their experiments, a sequence of four visual displays, each consisting of four dots placed randomly within a 4 by 4 array of possible locations (see Figure 3), was repeatedly presented to participants. Each visual display was briefly flashed (for 97 ms) and then immediately masked until the onset of the next visual display (the mask consisted of all 16 dots in the array being displayed for 97 ms, followed by a dark screen being presented for a further 60 ms). On each trial, the participants had to judge the location of the visual target, defined as a diamond-shaped array of the 4 dots in one of the four corners of the display (see the 3rd display illustrated in Figure 3). The whole sequence of displays was presented repeatedly until participants made their 4-alternative spatial discrimination response, and both the speed (in terms of the number of stimulus displays presented prior to response execution) and accuracy of participants’ responses were registered.

Figure 3. Schematic illustration of a representative sequence of events in Vroomen and de Gelder’s (2000) Experiment 1. A rapid sequence of 4 visual displays was presented continuously, each visual display presented concurrently with an auditory event (illustrated here by the ’musical’ notation). The participant’s task was to discriminate the location (4-choice spatial discrimination) of the diamond-shaped visual target (presented in the 3rd frame of this illustration). Participants were able to discriminate the location of the visual target significantly more accurately (and rapidly) when the unique auditory high tone coincided with the presentation of the visual target than in trials where only low tones were presented. (Note that the small dots in the displays were not actually seen by the participants but are just shown here to illustrate the possible positions from which the four visual stimuli could be presented in each display; note also that a visual mask was presented between each display, see text for details.)

The manipulation of interest related to the tones that were presented in synchrony with each of the visual displays. In half of the trials in Vroomen and de Gelder’s (2000) first experiment, the same low tone (1000 Hz) was presented in synchrony with each of the 4 visual displays (the LLLL condition), whereas in the remainder of the trials, a high tone (1259 Hz) was presented synchronously with the target display (the LLHL condition; note that the visual target was always presented as the 3rd display in each sequence of 4 displays). The 4 sounds were presented a number of times (between 4 and 8) prior to the onset of the visual displays on each trial in order to facilitate the segregation of the auditory stimuli into separate low and high frequency streams (see Bregman, 1990) on those trials where both low and high tones were presented.
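To make the structure of these trials easier to visualize, the sketch below lays out a single LLHL display cycle as a timeline. The display, mask, and dark-interval durations and the two tone frequencies are the values reported above; the function name, the data structure, and the assumption that each tone simply accompanies its display for the full display duration are our own illustrative choices rather than details taken from the original paper.

```python
# Illustrative sketch (not the authors' code) of one display cycle of the kind
# used by Vroomen and de Gelder (2000, Experiment 1).

LOW_HZ, HIGH_HZ = 1000, 1259                # low and high tone frequencies (Hz)
DISPLAY_MS, MASK_MS, DARK_MS = 97, 97, 60   # durations reported in the text

def build_cycle(tone_pattern="LLHL", target_position=3):
    """Return (onset_ms, description) tuples for one cycle of 4 masked displays.

    tone_pattern    -- 'LLLL' (control) or 'LLHL' (high tone on the target display)
    target_position -- the diamond-shaped visual target always appeared 3rd
    """
    events, t = [], 0
    for i, tone in enumerate(tone_pattern, start=1):
        freq = HIGH_HZ if tone == "H" else LOW_HZ
        label = "target" if i == target_position else "filler"
        events.append((t, f"display {i} ({label}) + {freq} Hz tone, {DISPLAY_MS} ms"))
        t += DISPLAY_MS
        events.append((t, f"mask (all 16 dots), {MASK_MS} ms"))
        t += MASK_MS
        events.append((t, f"dark screen, {DARK_MS} ms"))
        t += DARK_MS
    return events

for onset, description in build_cycle("LLHL"):
    print(f"{onset:4d} ms  {description}")
```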
Vroomen and de Gelder’s (2000) results showed that participants were able to report the location of the visual target more accurately on trials where it was presented in time with the high tone than on trials where only low tones were presented (mean response accuracy of 66% vs. 55%, respectively). Their results also showed that participants responded more rapidly on LLHL trials (i.e., after fewer presentations of the target stimulus; mean of 2.86 target presentations) than on LLLL trials (M = 3.32 target presentations). This crossmodal auditory facilitation of visual spatial discrimination performance occurred despite the fact that the auditory stimuli were entirely irrelevant to the participants’ task, and despite the fact that the auditory stimuli provided absolutely no information with regard to the spatial location of the to-be-discriminated visual target (cf. Driver & Spence, 2000). The participants in Vroomen and de Gelder’s study also reported subjectively that the visual display coinciding with the high tone appeared to segregate from the other visual displays in the sequence (and was also perceived to have a somewhat longer duration; that is, it appeared to ’freeze’; though see Staal & Donderi, 1983). This auditorily-induced performance enhancement was eliminated, however, if the high tone was presented in synchrony with the visual display directly preceding the target display (i.e., synchronous with the 2nd display in the sequence; Vroomen & de Gelder, 2000, Experiment 2). This latter result shows that the beneficial effect of the presentation of the high tone on performance could not simply be attributed to it acting as some kind of non-specific warning signal (or temporal marker) indicating to participants when in the sequence of stimuli they should expect the target to occur (cf. Correa, Sanabria, Spence, Tudela, & Lupiáñez, 2005; Posner, 1978).

In their final two experiments, Vroomen and de Gelder (2000) demonstrated that the beneficial effect of presenting a higher frequency tone in time with the target visual display could be attenuated (or even eliminated) simply by reducing the likelihood that the high tone would segregate from the other tones in the auditory stream. They lowered the probability of effective stream segregation by using a Low, Medium, High, Low tone sequence (LMHL; with the medium tone falling 2 semitones between the high and low tones) in order to reduce the frequency separation between the high tone and the other tones in the sequence. Crucially, the participants responded significantly more accurately in the LLHL condition than in the LMHL condition, no matter whether or not they were informed that the LMHL tone sequence actually corresponded to the beginning of the well-known French tune ’Frère Jacques’. Taken together, these results therefore suggest that it may simply have been the reduction in the frequency separation between the high tone and the other tones in the sequence, rather than necessarily the participants’ experiencing the LMHL tone sequence as a familiar tune (or melody), that elicited this effect (cf. Bregman & Dannenbring, 1973; Heise & Miller, 1951). Vroomen and de Gelder’s (2000) results therefore provide a convincing demonstration that the nature of the perceptual grouping (or stream segregation) taking place within one sensory modality (audition in this case) can affect the extent of any crossmodal interactions observed between simultaneously-presented auditory and visual stimuli.
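The role that frequency separation plays in this last manipulation can be made concrete with a little arithmetic. The sketch below is our own illustration (the only values taken from the original paper are the reported 1000 Hz and 1259 Hz tones): it converts the frequencies into semitone distances, showing that the high tone sits roughly 4 semitones above the low tones, and that a medium tone placed about 2 semitones away from each of them removes the large pitch jump up to the high tone.

```python
import math

def semitones(f_low, f_high):
    """Distance between two frequencies in semitones (12 semitones per octave)."""
    return 12 * math.log2(f_high / f_low)

LOW, HIGH = 1000.0, 1259.0    # Hz, as reported by Vroomen and de Gelder (2000)
MEDIUM = LOW * 2 ** (2 / 12)  # a hypothetical tone 2 semitones above the low tone (~1122 Hz)

print(f"low to high   : {semitones(LOW, HIGH):.2f} semitones")    # ~4.0
print(f"low to medium : {semitones(LOW, MEDIUM):.2f} semitones")  # 2.0
print(f"medium to high: {semitones(MEDIUM, HIGH):.2f} semitones") # ~2.0
# In the LMHL sequence, the high tone is approached by a ~2-semitone step rather
# than a 4-semitone jump, reducing its frequency separation from its neighbours
# and so, on Bregman's (1990) account, making it less likely to split off into a
# separate auditory stream that could "tag" the visual target display.
```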
What’s more, and in contrast to the results of the earlier studies (e.g., Maass, 1938; O’Leary & Rhodes, 1984; Zietz & Werner, 1927), Vroomen and de Gelder’s findings cannot easily be accounted for in terms of any kind of response bias, since the segmentation/grouping of the auditory stream was orthogonal to the 4-choice visual spatial discrimination response that participants had to make (cf. Driver & Spence, 2000; Spence & Driver, 1997). Hence, the influence of auditory perceptual grouping (or stream segregation) on crossmodal audiovisual interactions can instead be confidently attributed to a genuine perceptual effect.

The year after Vroomen and de Gelder’s study, Watanabe and Shimojo (2001a) provided another elegant demonstration that the nature of the intramodal perceptual grouping taking place within the auditory modality can influence the nature of audiovisual crossmodal interactions. They used a variation of the bouncing ball illusion (Metzger, 1934; Michotte, 1946/1963), in which two identical objects (typically opaque disks) are shown moving directly toward each other along a straight trajectory, coinciding, and then moving away from each other, on a two-dimensional visual display. The display is ambiguous, being equally consistent with two possible perceptual interpretations: That is, when the balls coincide, they can either be seen to pass (or stream) through each other (the most common percept under normal conditions of unimodal stimulation), or else they can appear to bounce off one another (e.g., Bertenthal, Benton, & Bradbury, 1993; Ramachandran & Anstis, 1983; Sekuler & Sekuler, 1999; Watanabe & Shimojo, 1998). In 1997, Sekuler, Sekuler, and Lau showed that the presentation of a brief sound at the moment when the two objects coincide significantly increases the likelihood that participants will report the two disks as having bounced off each other (rather than as having streamed through each other; see also Bushara, Hanakawa, Immisch, Toma, Kansaku, & Hallett, 2002; Ecker & Heller, 2005; Sanabria, Lupiáñez, & Spence, in press, 2004; Scheier, Lewkowicz, & Shimojo, 2003; Shimojo & Shams, 2001). Watanabe and Shimojo (2001a) demonstrated that this auditory modulation of the bouncing/streaming percept could be reduced if the sound presented in time with the two objects meeting on the screen (the ’simultaneous auditory event’) was embedded within a stream of identical auditory ’flankers’ (see Figure 4). That the effect of the flankers should be attributed to auditory grouping, rather than to some low-level effect of the presentation of the flankers on people’s perception of the simultaneous auditory event (in terms of altering its perceived loudness, duration, or time of occurrence), was confirmed subsequently in a series of control studies (Watanabe & Shimojo, 2001a; Experiments 4-6). Watanabe and Shimojo’s results therefore show that the grouping of the simultaneous auditory event into a stream with the auditory flankers modulated (i.e., attenuated) any crossmodal influence of the sound on the resolution of ambiguous visual motion.
Watanabe and Shimojo (2001a) also demonstrated that this auditory modulation of ambiguous visual motion perception could be revived if the auditory flankers were modified such that they no longer grouped with the simultaneous auditory event: In particular, an increase in the proportion of bounce responses was once again observed if the frequency of the simultaneous auditory event was made sufficiently different from that of the flankers (i.e., when the simultaneous auditory event was presented at 900 or 2,700 Hz, while the flankers were presented at 1,800 Hz; conditions under which the simultaneous auditory event should presumably have segregated from the uniform frequency auditory flankers). Therefore, it appears that, just as in Vroomen and de Gelder’s (2000) study, frequency separation can be used as an effective cue to weaken intramodal auditory grouping (and so facilitate stream segregation), and thereby potentially to facilitate crossmodal grouping (or binding). The proportion of bounce responses also remained high if the simultaneous auditory event was made louder than the flankers (60 dB), but not if it was made quieter (56 dB, as compared to the flankers, which were always presented at 58 dB; cf. Gilbert, 1941). Watanabe and Shimojo’s results therefore converge with those of Vroomen and de Gelder (2000) in showing that the extent of any crossmodal interaction between simultaneously-presented auditory and visual stimuli can be modulated by the intramodal perceptual grouping taking place within the auditory modality. However, they also highlight the importance of stimulus saliency in modulating crossmodal interactions over-and-above the effect of any unimodal grouping that may be taking place.

Figure 4. Schematic illustration of the sequence of events in Watanabe and Shimojo’s (2001a; Experiment 2) study of the effect of auditory stimuli on visual motion processing. At the start of each trial, two black disks were seen moving toward each other on a two-dimensional visual display (the direction of motion is indicated by the arrows shown in the lower display). Participants had to report whether they perceived the disks as bouncing off one another or else as streaming through each other (the dominant percept under unimodal visual stimulation conditions). Previous research has shown that the presentation of an auditory event simultaneous with the coincidence of the two disks (the black musical note shown next to the middle display in the figure) increases the proportion of ’bouncing’ responses (e.g., Sekuler et al., 1997). Watanabe and Shimojo demonstrated that the modulatory effect of the presentation of the simultaneous auditory event on bounce responses was significantly attenuated if a sequence of same-frequency auditory flankers (the white musical notes in the figure) were presented before and after it (presumably because the simultaneous auditory event now grouped with the auditory flankers rather than with the visual coincidence event). The proportion of bounce responses remained high, however, if the frequency of the simultaneous auditory event was made sufficiently different from that of the flankers.

4) The crossmodal perceptual organization of apparent motion stimuli

Both Vroomen and de Gelder (2000) and Watanabe and Shimojo (2001a) focused on the effects of the intramodal grouping taking place within the auditory modality on crossmodal audiovisual interactions that were assessed by means of participants’ performance on a visual task.
One might reasonably ask, therefore, whether similar effects would also occur in the reverse direction: That is, would the nature of any intramodal grouping taking place within the visual modality also influence crossmodal interactions as indexed by auditory discrimination performance? Salvador Soto-Faraco and his colleagues have reported a number of studies addressing precisely this question using a variation of the crossmodal dynamic capture task (Soto-Faraco et al., 2002). In 2002, Soto-Faraco et al. reported a series of experiments in which they showed that the presentation of a visual apparent motion stream, consisting of the sequential presentation of two light flashes, one from either side of fixation, could influence the direction in which an auditory apparent motion stream, consisting of two sequentially-presented tones, one presented from either side of fixation, appeared to move (see Figure 5; see also Staal & Donderi, 1983). In a typical crossmodal dynamic capture study, participants are asked to judge the direction of the auditory apparent motion stream (either from left-to-right or vice versa) while trying to ignore an irrelevant visual apparent motion stream moving in either the same (i.e., congruent) or opposite (i.e., incongruent, or conflicting) direction. The principal result to have emerged from many such studies conducted over the last 5 years is that participants perform significantly less accurately on incongruent trials than on congruent trials (see Figure 6), at least when the target and distractor streams are presented simultaneously.

Figure 5. Schematic illustration of the typical experimental set-up used in Soto-Faraco et al.’s (2002) studies of the audiovisual crossmodal dynamic capture effect. On each trial, a sound was sequentially presented from each of two loudspeakers, and each of two LEDs was also illuminated sequentially. The order of presentation (i.e., left or right first) of the stimuli in each modality was entirely unpredictable. The participants had to discriminate whether the sound (target) appeared to move from left-to-right (A) or vice versa (B), while trying to ignore the apparent movement of the visual distractors, which could be either incongruent or congruent with that of the auditory stimuli. T1-T2 indicate the temporal sequence of events in the trial.

Figure 6. Typical pattern of results from a crossmodal dynamic capture experiment, showing that discrimination of the direction of an auditory apparent motion stream can be substantially impaired (i.e., response accuracy is significantly lowered) by the simultaneous presentation of a distracting visual stream moving in the opposite direction. Note that no such performance decrement was reported if the auditory and visual apparent motion streams were presented asynchronously (separated by 500 ms in this example). The magnitude of the crossmodal dynamic capture effect is measured in terms of the difference in performance between incongruent and congruent trials (i.e., the difference between the black and grey bars in the figure).

Soto-Faraco and his colleagues have argued that this crossmodal dynamic capture effect (defined as the magnitude of the difference in performance between incongruent and congruent trials) reflects the mandatory integration of visual and auditory apparent motion signals.
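As a concrete illustration of how this measure is derived, the minimal sketch below computes a capture effect from made-up response counts; the numbers and the helper function are purely hypothetical and are not taken from any of the studies discussed here.

```python
def capture_effect(congruent_correct, congruent_trials,
                   incongruent_correct, incongruent_trials):
    """Crossmodal dynamic capture effect: congruent minus incongruent accuracy,
    expressed in percentage points."""
    congruent_acc = 100.0 * congruent_correct / congruent_trials
    incongruent_acc = 100.0 * incongruent_correct / incongruent_trials
    return congruent_acc - incongruent_acc

# Hypothetical data: near-ceiling accuracy on congruent trials and intermediate
# accuracy on incongruent (conflicting) trials, as is typical of the studies
# reviewed in this section.
effect = capture_effect(congruent_correct=95, congruent_trials=100,
                        incongruent_correct=52, incongruent_trials=100)
print(f"Crossmodal dynamic capture effect: {effect:.0f} percentage points")  # 43
```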
Interestingly, however, while performance on congruent crossmodal trials typically tends to hover around ceiling (i.e., 100% correct) in the majority of crossmodal dynamic capture studies (as it does on unimodal auditory direction-of-motion discrimination trials), performance on conflicting trials tends to fall in the range of 40-60% correct. This suggests a partial, rather than a complete, capture of the direction of auditory apparent motion by the distracting visual apparent motion stream. It is important to note, though, that the effect of the presentation of the visual stimuli cannot just be attributed to an attentional distraction effect (i.e., to participants simply being unsure of the direction of the sound when visual stimuli are presented on incongruent trials), since the crossmodal dynamic capture effect is just as prevalent on trials where participants are confident of their response (about which direction the sound moved in) as on trials where they are more uncertain of their response (see Soto-Faraco et al., 2004a, Experiment 3). It has been shown that crossmodal dynamic capture effects are somewhat larger when the stimuli in the different sensory modalities are presented from the same (rather than from a different) set of spatial locations (Soto-Faraco et al., 2002, Experiments 1 and 2; see also Meyer, Wuerger, Röhrbein, & Zetzsche, 2005), and when the auditory and visual streams are presented at approximately the same time, rather than asynchronously (e.g., see Soto-Faraco et al., 2002, Experiment 3; see also Gilbert, 1939). Crossmodal dynamic capture has been shown to influence the perception of the direction of both continuous and apparent motion stimuli (e.g., Soto-Faraco, Spence, & Kingstone, 2004a, Experiment 6), and to occur bidirectionally between auditory and tactile stimuli (Sanabria, Soto-Faraco, & Spence, 2005b; Soto-Faraco, Spence, & Kingstone, 2004b). Visual distractors can also influence the perception of the direction of tactile apparent motion (Craig, 2006; Lyons et al., 2006; Soto-Faraco & Kingstone, 2004), though little effect of auditory or tactile distractors has typically been reported on judgments of the direction of visual apparent motion, even when the strength (or quality) of apparent motion has been matched across the auditory and visual modalities (see Soto-Faraco et al., 2004a, Experiments 1 and 2; Soto-Faraco & Kingstone, 2004; though see also Gilbert, 1939; Soto-Faraco et al., 2003). A response bias interpretation of crossmodal dynamic capture has been ruled out in a number of recent studies (Sanabria, Spence, & Soto-Faraco, 2007; Soto-Faraco, Kingstone, & Spence, 2006; Soto-Faraco, Spence, & Kingstone, 2005; Vroomen & de Gelder, 2003), thus confirming the genuinely perceptual nature of at least some component of this crossmodal effect.

5) Assessing the role of intramodal grouping on the crossmodal dynamic capture effect

The majority of studies of the crossmodal dynamic capture effect have involved the presentation of the same number of stimuli in both the target (to-be-reported) and distractor (to-be-ignored) modalities (typically 2; though see Sanabria et al., 2007, for evidence that crossmodal dynamic capture effects can also be observed when 4 stimuli are presented in both modalities).
Sanabria and his colleagues (Sanabria, Soto-Faraco, Chan, & Spence, 2005a; Experiment 1) recently investigated whether increasing the number of visual stimuli presented in the distractor visual apparent motion stream (from 2 to 6) would affect the magnitude of the crossmodal congruency effect when participants were required to judge the direction in which a sequence of two auditory stimuli moved (either from left-to-right or vice versa; see Figure 7). Given that increasing the number of stimuli increases the strength of visual apparent motion (e.g., Kolers, 1972; Sanabria et al., 2005a, Footnote 3), one might have predicted that this manipulation should actually have made the visual distractor apparent motion harder to ignore, and hence have resulted in a larger crossmodal dynamic capture effect (cf. Spence & Walton, 2005). In contrast, Sanabria et al. (2005a) argued that increasing the number of stimuli in the visual display might actually reduce the crossmodal capture effect, because the increase in the strength of intramodal visual perceptual grouping (and in the number of visual stimuli relative to auditory stimuli) should make it more likely that the auditory and visual streams would segregate from one another and so be treated as separate perceptual events. Any such segregation of the stimuli presented in the two modalities would be expected to reduce multisensory integration (crossmodal grouping) and hence reduce the influence of the visual stream on participants’ auditory direction-of-motion discrimination responses (cf. Vatakis, Bayliss, Zampini, & Spence, in press). Consistent with Sanabria et al.’s (2005a) account, significantly less crossmodal dynamic capture was observed in a condition where the visual distractor stream contained 6 lights (mean capture effect of 21%) than in the traditional 2-lights displays (mean capture effect of 34%). These results therefore demonstrate that the nature of the intramodal perceptual grouping taking place within the visual modality can also influence audiovisual crossmodal interactions, as indexed by performance on an auditory direction-of-motion discrimination task.

Figure 7. Schematic illustration of the stimulus displays used in Sanabria et al.’s (2005a) study of intramodal versus crossmodal perceptual grouping. Note that the bimodal displays in the 2-lights condition (A) are identical to the middle 2 bimodal displays (T3-T4) in the 6-lights condition (B). The auditory and visual apparent motion stimuli moved in opposite directions on incongruent trials, and in the same direction on congruent trials. T1-T6 indicate the temporal sequence of events in the trial.

In Sanabria et al.’s (2005a; Experiment 1) study, as in a number of the other studies described thus far (e.g., Vroomen & de Gelder, 2000; Watanabe & Shimojo, 2001a), the conditions for intramodal perceptual grouping were set up prior to the presentation of any bimodal stimulus displays. Thus, perceptual grouping in the distractor modality could build up in advance of the opportunity for crossmodal grouping. The question therefore arises as to whether intramodal perceptual grouping would still modulate crossmodal perceptual grouping to the same extent if the conditions for crossmodal perceptual grouping were actually initiated prior to (or at the same time as) those promoting intramodal perceptual grouping in the stimulus displays (cf. Watanabe, 2004, for a similar discussion regarding the relative timing of intramodal grouping by motion and the relative localization of stimuli within vision).
Sanabria and his colleagues (Sanabria et al., 2004b; Experiment 1) addressed this issue in an experiment in which they compared the crossmodal dynamic capture effects seen under conditions where intramodal visual perceptual grouping was promoted prior to the presentation of the crossmodal displays (see Figure 8A) with other conditions where the first displays that were presented to the participants were bimodal (and so presumably should have promoted crossmodal grouping instead; see Figure 8B). Significantly smaller crossmodal dynamic capture effects were observed when the conditions for intramodal visual grouping were set up prior to the presentation of the bimodal displays (see Figure 8A; mean crossmodal congruency effect of 38%) than when the first stimulus displays were bimodal (see Figure 8B; mean crossmodal congruency effect of 59%), and thus presumably promoted crossmodal (over intramodal) perceptual grouping.

Sanabria et al. (2004b, Experiment 2) conducted an important follow-up experiment to confirm that it was not simply any temporal warning (or alerting) effect provided by the initial presentation of the unimodal visual stimuli (see Posner, 1978; Spence & Driver, 1997; cf. Vroomen & de Gelder, 2000) that may have led to the improved crossmodal stream segregation (i.e., to the reduced crossmodal congruency effects) reported in the intramodal grouping condition (see Figure 8A). The results of this control experiment showed that the presentation of two lights prior to the bimodal displays did not have any effect on the magnitude of the crossmodal dynamic capture effect if they were presented centrally and in a different color to the visual stimuli leading to the impression of apparent motion. Crucially, while the central lights provided just as much temporal information as the peripheral lights in the previous experiment, the difference in their color and location meant that they were no longer grouped with the subsequently-presented lights giving rise to the impression of visual apparent motion. Taken together, therefore, the results of these two experiments demonstrate that intramodal perceptual grouping has a more pronounced effect on crossmodal perceptual organization when the conditions promoting intramodal grouping are given temporal precedence over the bimodal displays that can be grouped crossmodally.

Figure 8. Schematic illustration of two of the displays (both incongruent) presented in Sanabria et al.’s (2004b) study of the interaction between intramodal and crossmodal perceptual grouping. Note that the conditions for unimodal visual perceptual grouping were either initiated prior to the presentation of the audiovisual event (see Figure 8A), or else after the appearance of the first bimodal display (see Figure 8B). Note also that the actual number of stimuli presented in the two modalities was kept constant across both conditions. T1-T6 indicate the temporal sequence of events in the trial.
6) Local versus global perceptual organization

Sanabria and his colleagues (Sanabria et al., 2004c) have been able to show that changes in the local versus global grouping of visual apparent motion displays presented in time with the bimodal displays can also modulate crossmodal perceptual organization (at least as far as these changes can be indexed by changes in the crossmodal dynamic capture effect). Previous studies of unimodal visual scene perception have shown that perceptual analysis tends to be governed by global field effects rather than by the local analysis of the individual parts of an image (e.g., Alais & Lorenceau, 2002; He & Ooi, 1999; Kramer & Yantis, 1997; Ramachandran & Anstis, 1983). Sanabria et al. (2004c) asked a similar question with regard to the perceptual organization of multisensory scenes: In particular, they investigated whether crossmodal perceptual organization is dependent on the local analysis of the parts of the display (cf. Ullman, 1979), or whether instead it depends on the global field-like aspects of the display. In their experiment, Sanabria et al. varied the perceived direction (horizontal vs. vertical) of visual apparent motion (see Figure 9). They demonstrated that when the two visual stimuli giving rise to horizontal local motion were embedded within an array of four lights, giving rise to the global perception of vertical apparent motion (see the 4-lights conditions shown in Figure 9B), the influence of any local visual motion information was reduced significantly (mean crossmodal congruency effect of 23%) as compared to the more typical 2-lights conditions (see Figure 9A), where a mean crossmodal dynamic capture effect of 36% was observed.

Figure 9. Schematic illustration of four of the different trial types presented in Sanabria et al.’s (2004c) study of the relative influence of local versus global visual apparent motion on the crossmodal dynamic capture effect. The four conditions shown here result from the crossing of the factors of congruency (incongruent versus congruent trials) and the number of visual stimuli (2 versus 4 lights). Note that there were also 4 more trial types (not shown) in which the visual stimuli were presented in the opposite order to that shown here (i.e., the lights appeared to move from the bottom to the top of the display in the 4-lights condition). The global direction of apparent motion in the displays is indicated by the horizontal and vertical arrows in the figure. The magnitude of the crossmodal dynamic capture effect was significantly greater in the 2-lights displays than in the 4-lights displays. Sanabria et al. interpreted this difference in terms of the additional two lights presented in the 4-lights displays giving rise to global visual motion in the vertical dimension.

In a subsequent study, Sanabria et al. (2005a; Experiment 2) were able to demonstrate a similar dominance of global field effects over local visual apparent motion when the two were pitted directly against each other (see Figure 10). In this experiment, the addition of the two extra light flashes arranged horizontally induced the impression of a group of two lights moving in one direction, while the central two lights of the compound appeared to move, if inspected in isolation, in the opposite direction.
In fact, the global percept actually served to reverse the perceived direction of apparent motion of the 2 central lights: If the local motion of the two central lights was from left-to-right, then the global motion of the 4-lights display was from right-to-left instead (see Figure 10B). Once again, the crossmodal dynamic capture effect was significantly larger in the 2-lights condition than in the 4-lights condition (mean crossmodal congruency effect of 35% versus 15%, respectively). Moreover, the results showed that the presentation of the additional 2 lights at either side of the display reversed the direction of visual apparent motion (as defined at the local level by the central 2 lights of the entire display), and that it was the global direction of apparent motion of the visual display that determined the perceptual organization taking place within the auditory modality. (Note here, once again, that as there were 4 lights and only 2 sounds, it should have been easier to segregate the visual stimuli from the sounds than in the 2-lights displays, where the same number of stimuli was presented in each modality.)

Figure 10. Schematic illustration of four of the different trial types presented in Sanabria et al.’s (2005a; Experiment 2) study of the modulatory effect of visual perceptual grouping on the crossmodal dynamic capture effect. The four conditions shown here result from the crossing of the factors of congruency (incongruent versus congruent trials; note that congruency is defined with respect to the global direction of apparent motion in the display, as indicated by the horizontal arrows in the figure) and the number of visual stimuli (2 versus 4 lights). The magnitude of the crossmodal dynamic capture effect was significantly greater in the 2-lights displays than in the 4-lights displays. Given that congruency in this study was defined with respect to the global motion of the visual display, these results show that the crossmodal interactions were being driven by the global visual apparent motion rather than by the local motion of the middle two lights (which was actually in the reverse direction).

The results of Sanabria et al.’s (2004b, c; 2005a) recent studies therefore support the view that crossmodal perceptual organization involves a complex interplay between temporally-overlapping intramodal and crossmodal grouping processes. This conclusion might at first glance appear to stand in contrast to the claim made by a number of researchers regarding the putatively ’early’ nature of intramodal perceptual grouping (e.g., Francis & Grossberg, 1996). According to such a view, one might predict that intramodal perceptual grouping should normally take precedence over crossmodal grouping. However, the fact that, under the appropriate conditions, both unimodal and crossmodal grouping compete to determine crossmodal perceptual organization is consistent with more recent evidence showing that certain forms of intramodal perceptual grouping (at least within the visual modality) appear to occur much later in information processing than had traditionally been thought (e.g., see Beck & Palmer, 2002; Palmer, 2002; Palmer, Brooks, & Nelson, 2003; see also Wertheimer’s, 1923/1938, pp. 79-80, description of the higher-level visual grouping principle known as Einstellung, or objective set, here).
What’s more, recent neuroscience research has revealed that crossmodal interactions can actually take place very early in information processing (i.e., within 40 ms of stimulus onset; e.g., see Fu, Johnston, Shah, Arnold, Smiley, Hackett, Garraghty, & Schroeder, 2003; Giard & Peronnet, 1999; Murray, Molholm, Michel, Heslenfeld, Ritter, Javitt, Schroeder, & Foxe, 2004). Furthermore, direct anatomical connections have now also been shown to exist between what were formerly considered to be strictly ’unimodal’ cortical processing areas, thus providing a potential neural substrate for very early crossmodal interactions (e.g., Falchier, Clavagnier, Barone, & Kennedy, 2003; Rockland & Ojima, 2003). When this information is taken together with the fact that the relative strength of the various intramodal grouping principles also varies (e.g., Kubovy, 1981), one might expect that the extent to which unimodal perceptual grouping will dominate over crossmodal perceptual grouping in any given situation may well depend (at least to some extent) on the nature of the perceptual grouping processes that are being deployed intramodally versus crossmodally.

7) Crossmodal influences on temporal organization

One other area of research that is very relevant to the topic of crossmodal perceptual organization relates to the temporal structure of stimuli and stimulus sequences. Over the years, a number of studies have shown that the perceived rate of stimulation in one sensory modality can be modulated by the rate of stimulus presentation in another modality (e.g., Gebhard & Mowbray, 1959; Grignolo, Boles-Carenini, & Cerri, 1954; Knox, 1945a, b; Kravkov, 1939; London, 1954; Maier, Bevan, & Behar, 1961; Regan & Spekreijse, 1977; Shipley, 1964; Von Schiller, 1932a, b). In particular, research has shown that the rate at which a rapidly alternating visual stimulus appears to be flickering can be modulated quite dramatically by changes in the rate at which a simultaneously-presented auditory stimulus is made to flutter. For example, participants in Shipley’s classic study had to judge the rate at which a sound appeared to flutter or, at other times, to judge the rate at which a light source appeared to flicker. Shipley reported that changing the physical rate of flutter of a clicking sound induced a systematic change in the apparent rate at which a flashing light was simultaneously seen to flicker. Indeed, for one of Shipley’s observers, a visual stimulus that was actually presented at a flicker rate of 10 cycles per second was reported at different times to be flickering at anything between 7 and 22 cycles per second, depending on the rate of flutter of the simultaneously-presented auditory stimulus.

One generalization that has emerged from these studies has been that auditory flutter typically has a much greater influence on judgments of perceived visual flicker than vice versa. However, it is perhaps worth pausing at this point to take note of the fact that many of these studies presented stimuli at repetition rates close to the flicker/flutter-fusion threshold. The flutter-fusion threshold (sometimes known as the critical flutter frequency) is defined as the frequency at which a clicking sound appears steady, while the critical flicker frequency (or flicker-fusion threshold) is defined as the frequency at which a flashing light appears steady (i.e., it appears indistinguishable from a continuously illuminated light).
As such, it could be argued that the results of many of these earlier studies, while interesting, may tell us more about crossmodal interactions in the perception of a specific stimulus attribute than about crossmodal influences on perceptual organization per se (cf. Kubovy, 1981, p. 83, for his distinction between ’micro time’ and ’event time’). It is therefore important to note that very similar results have also now been reported in other studies in which the rate of stimulus presentation was much lower (e.g., Guttman, Gilroy, & Blake, 2005; Kitagawa & Wada, 2004; Recanzone, 2003; Wada, Kitagawa, & Noguchi, 2003; Welch et al., 1986). Under conditions where stimuli are presented at rates that are slow enough for participants to individuate the elements in the sequence, it would appear that the results become more directly relevant to issues of crossmodal perceptual organization.

Participants in one such representative study by Wada et al. (2003) had to judge the rate of change of a stream of briefly-presented visual stimuli (the rate of presentation of stimuli in the visual flash train could either increase or decrease, with 11 brief stimuli being presented within a time window of 2,050 ms) while attempting to ignore a train of distractor tones whose rate of presentation was either increasing or decreasing. Wada et al. showed that the rate of change of temporal structure in the distractor modality (audition in this case) influenced participants’ judgments of the rate of change of the visual stimuli, at least when the rate of change of stimulus presentation in the target modality was ambiguous. A similar pattern of results was also reported in the reverse direction; that is, judgments of the change in the rate of presentation of a stream of auditory stimuli were also influenced by the presence of an irrelevant visual distractor sequence. At first glance, results such as these would appear to support the claim that, in the temporal domain, just as for the other areas that have been outlined earlier, the perceptual organization of stimuli in one sensory modality can influence the perceptual organization of the stimuli in another. However, as the authors themselves recognize, it is unclear to what extent these results should be taken as reflecting a genuine crossmodal influence on the temporal aspects of perceptual organization, rather than simply reflecting a response bias induced by the presence of the distractor stimuli on trials where participants were uncertain of the rate of change of stimulus presentation in the target modality (see also Noguchi & Wada, 2004).

More convincing evidence for the existence of genuinely perceptual crossmodal influences on the temporal aspects of perceptual organization comes from recent research on the two-flash illusion (e.g., Andersen, Tiippana, & Sams, 2005; Shams et al., 2000; Shams, Kamitani, & Shimojo, 2002). In a prototypical study, participants are presented with a rapid train of 1-4 flashes in the periphery, and have simply to report the number of flashes that they see. At the same time, a distractor stream of 1-4 auditory stimuli may be presented. The surprising finding to have emerged from a number of such studies is that, on one-flash trials, participants report having seen two lights whenever 2 or more beeps are presented auditorily.
What's more, the illusion is asymmetrical, in that the auditorily induced fission of a unitary visual event has been shown to occur far less readily when people have to report the number of beeps while being presented with a sequence of flashes (see also Shipley, 1964; though see Andersen, Tiippana, & Sams, 2004). Other researchers have shown that this fission illusion, initially reported with sounds and lights, also occurs between touch and both audition and vision (e.g., Bresciani, Ernst, Drewing, Bouyer, Maury, & Kheddar, 2005; Holmes, Sanabria, Calvert, & Spence, 2006; Hotting & Roder, 2004; Violentyev, Shimojo, & Shams, 2005). The genuinely perceptual nature of the two-flash illusion has also been demonstrated in subsequent research using signal detection theory (e.g., Violentyev et al., 2005; see also Berger, Martelli, & Pelli, 2003).

The extant research on the temporal aspects of perceptual organization therefore converges with the findings reported in the earlier sections of this review in showing that the perceptual organization of the stimuli presented in one sensory modality can influence the perceptual organization of the temporal structure of stimuli presented in another sensory modality. What's more, this research also appears to highlight the importance of stimulus structure in constraining the nature of such crossmodal interactions. In particular, it would appear that the modality carrying the signal that is more discontinuous (and hence possibly more salient) becomes the influential, or modulating, modality (Shimojo & Shams, 2001; cf. Shipley, 1964).

8) Selective attention and crossmodal scene perception

Before concluding, it is important to consider briefly what role, if any, selective attention might play in constraining crossmodal perceptual organization (cf. Kahneman, 1973, chapter 5; Kahneman & Henik, 1981; see also Knox, 1945b; Pomerantz, 1981). Several studies have reported that focused attention, no matter whether directed to a particular sensory modality or to a particular spatial location, can influence the perceptual organization taking place within a given sensory modality. For example, Carlyon and colleagues (Carlyon, Plack, & Cusack, 2001; Carlyon, Plack, Fantini, & Cusack, 2003; Cusack, Deeks, Aikman, & Carlyon, 2004) have shown that auditory stream segregation is impaired if a participant's attention is directed toward the visual modality in order to perform an attention-demanding monitoring task during the presentation of the auditory stimuli. Similarly, Soto-Faraco and colleagues (Soto-Faraco et al., 2003) have shown that, under certain conditions, directing attention to a particular stream of sensory information can modulate the size of the crossmodal dynamic capture effect (see also Toro, Sinnett, & Soto-Faraco, 2005, for the effects of diverting attention to the visual modality on the perceptual learning of auditory sequences). While most crossmodal dynamic capture studies have failed to show any influence of auditory distractors on visual direction-of-motion discrimination responses, Soto-Faraco et al. (2003) were able to demonstrate such an effect under conditions where participants simultaneously had to perform a demanding monitoring task. The participants in this study had to monitor one of two centrally-presented streams (one auditory and the other visual) of rapidly-presented stimuli in order to detect occasionally-presented target stimuli.
The participants also had to respond to the direction of visual or auditory apparent motion displays (while ignoring the apparent motion stimuli presented in the other modality) on certain trials that were presented infrequently, and at unpredictable moments, during the primary monitoring task. Under such conditions, Soto-Faraco et al. observed a significant crossmodal dynamic capture effect of the auditory apparent motion distractors on visual direction-of-motion discrimination responses (a mean crossmodal capture effect of 22%). A similar modulation of the crossmodal dynamic capture effect was also observed on auditory direction-of-motion discrimination responses (though note that there was no interaction between the modality of the central monitoring task and the target modality for the motion discrimination task). Soto-Faraco et al.'s results therefore show that focused attention can modulate crossmodal perceptual grouping, as indexed by performance on the crossmodal dynamic capture task.

Sanabria, Soto-Faraco, and Spence (in press) have recently extended this line of research to show that the spatial orienting of attention can also have a robust influence on the perceptual organization of crossmodal scenes. They showed that the magnitude of the audiovisual crossmodal dynamic capture effect could be reduced by as much as 8.5% if a participant's attention was endogenously (i.e., voluntarily) directed to the location from which the auditory and visual apparent motion stimuli were to be presented, as compared to conditions in which the participant's attention had been directed elsewhere. An even more impressive modulation of the crossmodal dynamic capture effect was seen under conditions where the participant's attention was instead directed to the peripheral location exogenously (i.e., automatically) by means of the presentation of spatially-nonpredictive peripheral visual cues (the mean reduction in the magnitude of the crossmodal dynamic capture effect under such conditions was 20%).

Sanabria et al.'s (in press) results contrast markedly with previous evidence showing that spatial attention does not appear to influence crossmodal integration in the classic ventriloquism effect, at least as assessed by determining the perceived location of a stationary auditory event in the presence of an irrelevant visual distractor (see Bertelson & de Gelder, 2004, for a recent review). One explanation for Sanabria et al.'s results has to do precisely with the interplay between within-modal and crossmodal perceptual organization processes being modulated by attention, something that is less likely to occur when single events are presented, as in the prototypical version of the ventriloquism illusion. The explanation for these counterintuitive findings may be that spatial attention actually helps to segregate the different streams of sensory information, thereby weakening the influence of the perceptual organization in one modality on the organization of perceptual experience in the other sensory modality. Sanabria et al.'s results can therefore be seen as providing empirical support for the view that stream segregation does not always occur prior to attentional selection, at least in the case of crossmodal perceptual organization (see Bregman & Rudnicky, 1975; Kahneman, 1973; Kahneman & Henik, 1981).
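The capture effects and attentional modulations reported above are typically expressed as the difference in direction-of-motion discrimination accuracy between congruent and incongruent distractor trials. The sketch below shows that arithmetic with invented accuracies; the particular values are chosen only so that the attentional modulation comes out at roughly the 20% figure mentioned above, and should not be read as anyone's actual data.

```python
# A minimal sketch (with hypothetical accuracies) of how a crossmodal dynamic
# capture effect is typically quantified: the drop in direction-of-motion
# discrimination accuracy on incongruent relative to congruent distractor trials.

def capture_effect(congruent_acc, incongruent_acc):
    """Crossmodal dynamic capture effect, expressed in percentage points of accuracy."""
    return 100 * (congruent_acc - incongruent_acc)

# Invented accuracies for visual direction-of-motion judgments with auditory distractors.
capture_unattended = capture_effect(congruent_acc=0.95, incongruent_acc=0.65)  # ~30 points
capture_attended = capture_effect(congruent_acc=0.95, incongruent_acc=0.85)    # ~10 points
print(round(capture_unattended, 1), round(capture_attended, 1),
      round(capture_unattended - capture_attended, 1))  # attentional modulation of ~20 points
```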
Rather than crossmodal grouping being completed preattentively, it would appear that focused attention, no matter whether directed spatially or to a particular sensory modality, can constrain the process of crossmodal perceptual organization.

9) Intersensory Gestalten

Gilbert (1938, 1941) introduced the term 'inter-sensory Gestalten' to account for (or at least to describe) crossmodal interactions of the type outlined in the present chapter. While the term itself seems more appealing than Zapparoli and Reatto's (1969, pp. 266-267) 'Gestalten of Gestalten', it is important to note that there are at least two quite distinct phenomena that could be described under such a heading. One relatively uncontroversial interpretation of the term is to use it to refer to situations in which the organizational structure of stimuli in one sensory modality can be shown to influence the perceived organization of stimuli presented in another modality (e.g., as in the studies of O'Leary & Rhodes, 1984, and Soto-Faraco et al., 2002). This interpretation is consistent with Gilbert's (1938) use of the term to describe the influence of sequentially presented auditory stimuli on the perception of visual apparent motion between two discrete light sources. It is also consistent with Gilbert's (1941) observation that 'we must also reckon with the total field properties. This involves the superimposition of one pattern of stimulation upon a heteromodal pattern, with a resulting new complex inter-sensory Gestalt in which the properties of the original patterns are modified' (Gilbert, 1941, p. 401). Note here that the stress is on the modification of the original sensory (i.e., unimodal) patterns, rather than on the generation of a new intersensory pattern. According to Gilbert's interpretation, then, perceptual organization takes place within each modality individually, while still allowing for the fact that the perceptual organization taking place in one modality can influence the perceptual organization occurring in another sensory modality.

However, a second and more controversial interpretation of the term intersensory Gestalten would be to take it to imply the existence of some kind of multisensory organization (or structure; what Allen & Kolers, 1981, p. 1318, described as a 'common or suprasensory organizing principle') that is not present (and/or could not be perceived) by considering either of the component sensory modalities individually. Paul Schiller (1935, p. 468) seems to have been getting at something like this when he argued that 'Such configurational tendencies can come not only from the same but also from different "heterosensorial" fields. That is what happens in these experiments. A perception is produced by sensations of different modalities, which often create intersensorial patterns.' However, it is important to note that other researchers have argued that such intersensory patterns do not exist. For example, Fraisse (1963, p. 73) points out in his book 'The psychology of time' that 'A succession of sounds and lights will never allow perception of an organization which integrates the two. There will be perception of a double series, one of sounds and one of lights.'

To date, the empirical evidence that can be taken to support the existence of genuine intersensory Gestalten is weak. One source of evidence would come from the emergence of an intersensory pattern of organization in experiments where the stimuli were presented at different rates in each modality.
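To make the logic of such a test explicit, the sketch below constructs the combined onset schedule for one arbitrary example: an auditory stream at 3 Hz presented alongside a visual stream at 2 Hz. The composite three-against-two pattern exists only across the two modalities taken together; the (so far unsupported) prediction of a strong intersensory-Gestalt view is that observers could perceive that composite rhythm as a single organized pattern. The rates and the one-second window are our own illustrative choices, not values from any of the studies discussed here.

```python
# Illustrative construction of a "different rates in each modality" design: merge
# an auditory (3 Hz) and a visual (2 Hz) isochronous stream into one labelled
# timeline. The rates are arbitrary choices for the sake of the example.

def composite_schedule(auditory_hz=3, visual_hz=2, duration_ms=1000):
    """Merge the onset times (ms) of two isochronous streams into one labelled timeline."""
    auditory = [(round(i * 1000 / auditory_hz), "A")
                for i in range(int(auditory_hz * duration_ms / 1000))]
    visual = [(round(i * 1000 / visual_hz), "V")
              for i in range(int(visual_hz * duration_ms / 1000))]
    return sorted(auditory + visual)

print(composite_schedule())
# [(0, 'A'), (0, 'V'), (333, 'A'), (500, 'V'), (667, 'A')] -- a three-against-two
# pattern that is only defined across the two modalities taken together.
```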
However, Gebhard and Mowbray (1959, p. 523) reported, albeit anecdotally, failing to observe any such phenomenon in their study of the auditory driving of visual flicker (though see Guttman et al., 2005, Footnote 3, p. 234, for subjective reports from participants claiming to experience complex rhythmic Gestalts combining both auditory and visual inputs). Similarly, we are aware of no other evidence supporting the emergence of intersensory rhythms (cf. Handel & Buffardi, 1968, 1969).

A second line of support for the existence of genuinely intersensory Gestalten would be provided by a convincing demonstration of the existence of crossmodal (or intermodal) apparent motion, that is, the perception of apparent motion emerging from the sequential presentation of static stimuli in different sensory modalities and from different spatial locations at the appropriate temporal interval. However, while the perception of apparent motion within individual sensory modalities has been widely explored and documented since the seminal study of Wertheimer (1912; e.g., see Burtt, 1917a, b; Kolers, 1972; Strybel & Vatakis, 2005), far more controversy surrounds the possible existence of crossmodal (or intermodal; Allen & Kolers, 1981; Zapparoli & Reatto, 1969) apparent motion. Intermodal apparent motion might be expected to occur when two or more stationary stimuli of different sensory modalities are briefly presented from distinct spatial locations at the appropriate interstimulus interval. Under such conditions, certain researchers have reported that people can indeed experience some weak form of apparent motion. Early research, based on subjective reports, suggested the existence of crossmodal apparent motion between all possible combinations of auditory, visual, and tactile stimuli (Galli, 1932; Zapparoli & Reatto, 1969). Zapparoli and Reatto (p. 262) described the experience of intermodal apparent movement between auditory and visual stimuli as 'something that moves between the sound and the light or between the light and the sound, a light and sound tunnel which grows longer and shorter, or a light tunnel which grows longer and shorter while a sound passes through it.' More recently, Harrar, Winter, and Harris (2005) described the percept of apparent motion that was observed subjectively following the presentation of a visual and a tactile stimulus from different locations as feeling 'like an event at one location causing an event at another'. However, it is important to note that the weight of evidence from more recent research that has subjected the putative effect to more robust empirical investigation has mostly failed to provide any support for the phenomenon (e.g., Allen & Kolers, 1981; Sanabria et al., 2005; though see also Harrar et al., 2005). Therefore, at present, there appears to be little convincing evidence to support the existence of genuine intersensory Gestalten, if what is meant by the term is patterns of crossmodal perceptual organization that rely for their existence on stimulation in more than one sensory modality, and which are not also present in their constituent sensory modalities.
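For concreteness, the sketch below lays out the minimal event schedule of the kind of putative intermodal apparent-motion trial discussed above: two brief static stimuli, in different modalities and at different locations, separated by an inter-stimulus interval. The 60 ms durations and 100 ms ISI are arbitrary illustrative values rather than parameters drawn from Galli (1932), Zapparoli and Reatto (1969), or any of the other studies cited.

```python
# A minimal, hypothetical event schedule for one intermodal apparent-motion trial:
# an auditory stimulus on the left followed, after an ISI, by a visual stimulus on
# the right. All timing values are placeholders chosen for illustration.

def crossmodal_am_trial(isi_ms=100, duration_ms=60):
    """Return (onset_ms, offset_ms, modality, location) events for one putative trial."""
    first = (0, duration_ms, "auditory", "left")
    second_onset = duration_ms + isi_ms
    second = (second_onset, second_onset + duration_ms, "visual", "right")
    return [first, second]

print(crossmodal_am_trial())
# The empirical question is whether such a sequence is ever perceived as a single
# object moving from the first location to the second.
```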
10) Conclusions

Taken together, the research highlighted in the present chapter demonstrates just how profoundly multisensory integration (or crossmodal perceptual grouping) can be influenced by the nature of any intramodal perceptual grouping that may be taking place at the same time. Over the last 75 years, a number of different studies, using a wide variety of experimental paradigms, have provided a wealth of empirical evidence to show just how profoundly intramodal perceptual grouping influences the nature and magnitude of any crossmodal interactions. We have argued that a meaningful distinction can be drawn between a number of different questions that have been addressed by researchers in this area. One question that has been tackled by several researchers concerns the extent to which variations in the perceptual organization/segregation taking place within one sensory modality can affect the perceptual organization of stimuli presented within a different sensory modality (e.g., Maass, 1938; O'Leary & Rhodes, 1984; Soto-Faraco et al., 2002; note that research on this issue is perhaps most directly related to the Christmas tree lights illusion with which we started this chapter). Meanwhile, other researchers have directed their efforts instead at determining whether the magnitude of any crossmodal binding/interaction/grouping taking place between stimuli presented in different sensory modalities is affected by variations in the strength of the intramodal perceptual grouping cues that may be available in the scene/display (e.g., Vroomen & de Gelder, 2000; Watanabe & Shimojo, 2001a). The most recent research has tended to focus on assessing the extent to which crossmodal perceptual organization depends upon the global versus local organization of the constituent unimodal displays (e.g., Sanabria et al., 2004b, c; 2005a). Finally, several researchers have successfully demonstrated that selective attention can also affect crossmodal perceptual organization (e.g., see Carlyon et al., 2001, 2003; Sanabria et al., in press; Soto-Faraco et al., 2003).

This growing body of empirical research helps to emphasize the importance of considering perceptual organization as a crossmodal (or multisensory) phenomenon. However, it is also important to note that the existence of such robust crossmodal organizational influences (or intersensory Gestalten) does not necessarily imply the existence of any genuinely intersensory (or 'suprasensory', in Allen and Kolers's, 1981, terminology) form of perceptual organization (see Allen & Kolers, 1981; Gebhard & Mowbray, 1959; Fraisse, 1963; Sanabria et al., 2005c), although this issue remains somewhat controversial (e.g., see Galli, 1932; Guttman et al., 2005; Harrar et al., 2005; Zapparoli & Reatto, 1969). Nevertheless, the principal point remains that intramodal perceptual grouping has been shown to exert robust effects on crossmodal interactions, and this factor should therefore be added to the list of other factors, such as common spatial location (i.e., spatial proximity; Spence & Driver, 2004), common timing (temporal proximity; Zampini et al., 2005), common fate (e.g., Bertelson, 1999; Mateeff, Hohnsbein, & Noack, 1985), the unity assumption (Vatakis & Spence, in press; Welch, 1999), and common temporal structure (e.g., Armel & Ramachandran, 2003; O'Leary & Rhodes, 1984; Thomas, 1941), that have all been shown to influence crossmodal integration.

11) Future research

Given the paucity of studies conducted to date on the topic of crossmodal perceptual organization (or scene perception), the area offers a number of exciting and important avenues for future research.
Indeed, it is striking how many of the studies that have been conducted to date have focused solely on the perceptual organization of audiovisual stimulus displays. It would therefore be interesting in future research to assess the extent to which the principles governing crossmodal perceptual organization outlined here extend to other pairings of sensory modalities, such as, for example, vision and touch, or audition and touch (though see Boernstein, 1955-1956, pp. 212-213; see also Footnote 9). Indeed, Phillips-Silver and Trainor (2005) have recently published developmental evidence regarding the crossmodal influence of vestibular/proprioceptive bouncing-movement cues on 7-month-old infants' organization of auditory sequences, specifically on their perception of auditory rhythm in ambiguous rhythm patterns (i.e., those without accented beats). While a few studies have started to investigate the visual-tactile modality pairing (e.g., Churchland et al., 1994; Holmes, Sanabria, Calvert, & Spence, 2006; Lyons et al., 2006; Sanabria et al., 2005c; Shimojo & Shams, 2001; Violentyev et al., 2005), there may be particularly good grounds for investigating crossmodal interactions between the auditory and tactile modalities, given the greater similarity in the nature of the physical signals that are transduced by these two sensory systems (e.g., see Kitagawa & Spence, 2006; von Bekesy, 1957, 1959; Mahar, Mackenzie, & McNicol, 1994; Mowbray & Gebhard, 1957; and Bresciani et al., 2005, and Hotting & Roder, 2004, for early empirical evidence from the audiotactile version of the two-flash illusion; cf. Sherrick, 1976). As such, one might expect the audiotactile pairing to provide one of the best opportunities for contriving a situation in which grouping-by-similarity might be more likely to occur crossmodally than intramodal grouping-by-proximity (cf. Mahar et al., 1994; Handel & Buffardi, 1968, p. 1028), a situation that it has thus far proven impossible to achieve using only audiovisual displays (see Figure 11).

Figure 11. An example of a visual display in which the organizational principles of grouping-by-proximity and grouping-by-similarity have been put into conflict (based on Kubovy et al., 1998, Figure 1D).

It remains an interesting question for future research to determine whether grouping-by-proximity could ever dominate over grouping-by-similarity in crossmodal perceptual organization. To date, the evidence suggests that grouping-by-similarity (in terms of stimulation from the same sensory modality) will always dominate over grouping-by-proximity in crossmodal scene perception. However, given the greater similarity between audition and touch than between the other sensory modalities (e.g., see von Bekesy, 1957, 1959; Mahar et al., 1994; Mowbray & Gebhard, 1957), this pairing of sensory modalities would appear to represent perhaps the best opportunity for achieving such a result experimentally.

A second area where further research is needed in order to better understand the rules governing crossmodal perceptual organization relates to the role of spatial factors. Given the present context, the question is really one of whether the influence of the perceptual organization of stimuli in one sensory modality on the perceptual organization of stimuli presented in another modality is spatially modulated or not. At present, we do not have a clear answer to this question.
While some researchers have demonstrated significant effects of relative spatial position on certain crossmodal effects (e.g., Soto-Faraco et al., 2002; see also Holmes et al., 2006; Mays & Schirillo, 2005; Meyer et al., 2005), others have reported no such spatial modulation of crossmodal phenomena such as the auditory driving of visual temporal rate perception (e.g., Recanzone, 2003; Regan & Spekreijse, 1977; Welch, DuttonHurt, & Warren, 1986) or the temporal ventriloquism illusion (Vroomen & Keetels, 2006). Determining the importance of spatial factors to crossmodal perceptual organization is made all the more difficult by the fact that many of the previous studies in this area failed to report whether or not the auditory and visual stimuli were presented from the same position in their experiments (e.g., see O'Leary & Rhodes, 1984; Vroomen & de Gelder, 2000). That is, it is unclear whether the auditory stimuli in these studies were presented from the computer's internal loudspeaker (i.e., from approximately the same position as the visual stimuli) or over headphones (i.e., from different positions; though see Zampini, Shore, & Spence, 2003). It will therefore be a particularly interesting challenge for future research to try to determine the conditions under which such spatial modulation of crossmodal perceptual organization takes place (see Spence, in press). One possibility here is that spatial co-localization may have a more pronounced influence on crossmodal scene perception under conditions where some kind of spatial processing is required, either explicitly or implicitly (see Spence & McDonald, 2004; though see also Holmes et al., 2006).

A third area where additional research would be merited relates to an assessment of the importance of synaesthetic correspondence in modulating crossmodal perceptual organization effects such as those reported here (see Gallace & Spence, 2006; Marks, 2004). To take but one example, in O'Leary and Rhodes's (1984) study, the high-frequency auditory stimuli were paired with the visual stimuli in the upper screen locations in the bimodal stimulation conditions, thus potentially exploiting any synaesthetic correspondence between auditory frequency and spatial elevation that may exist between the two sensory modalities (see Pratt, 1930; Roffler & Butler, 1968). As O'Leary and Rhodes themselves noted more than 20 years ago, it is an open question whether similar results would have been found if the high (spatial) visual stimuli had instead been paired with the low (frequency) tones (see also Shipley, 1964, p. 1328, for evidence that synaesthetic correspondences between auditory and visual stimuli might modulate the auditory driving of visual flicker).

Finally, having provided a range of empirical evidence to demonstrate that the perceptual grouping taking place within a given sensory modality does indeed affect crossmodal scene perception (both the perception of individual stimuli and the perceptual organization of groups of stimuli presented in another sensory modality), it remains a critical issue for future research to try to quantify more precisely the exact nature of these interactions between intramodal and crossmodal perceptual organization.
For, as Hochberg (1974, p. 204) so elegantly put it when summarizing the literature on unimodal visual perceptual organization more than 30 years ago: 'The Gestalt explanation of perceptual organization must be regarded as a first stage in an evolving formulation of both problem and solution, neither a closed issue nor a successful theory.' So, having demonstrated the interaction between intramodal and crossmodal grouping principles in multisensory perceptual organization, future studies will increasingly need to develop more quantitative rules that can predict the relative strength of intramodal versus crossmodal perceptual grouping under a range of different experimental conditions (see Hochberg, 1974; Hochberg & Silverstein, 1956; Kubovy, Holcombe, & Wagemans, 1998; and Oyama, 1961, for previous attempts to assess the relative strength of grouping-by-similarity and grouping-by-proximity in unimodal visual displays). It will only be by moving forward from the merely descriptive to an account of crossmodal scene perception that is genuinely predictive that future research on intersensory Gestalten will provide a genuinely useful contribution to our understanding of the laws (rather than rules) governing the multisensory integration of sensory information.

References

Aksentijevic, A., Elliott, M. A., & Barber, P. J. 2001 Dynamics of perceptual grouping: Similarities in the organization of visual and auditory groups. Visual Cognition, 8, 349-358.
Alais, D., & Lorenceau, J. 2002 Perceptual grouping in the Ternus display: Evidence for an association field in apparent motion. Vision Research, 42, 1005-1016.
Andersen, T. S., Tiippana, K., & Sams, M. 2004 Factors influencing audiovisual fission and fusion illusions. Cognitive Brain Research, 21, 301-308.
Andersen, T. S., Tiippana, K., & Sams, M. 2005 Maximum likelihood integration of rapid flashes and beeps. Neuroscience Letters, 380, 155-160.
Armel, K. C., & Ramachandran, V. S. 2003 Projecting sensations to external objects: Evidence from skin conductance response. Proceedings of the Royal Society B, 270, 1499-1506.
Beck, D. M., & Palmer, S. E. 2002 Top-down influences on perceptual grouping. Journal of Experimental Psychology: Human Perception and Performance, 28, 1071-1084.
Berger, T. D., Martelli, M., & Pelli, D. G. 2003 Flicker flutter: Is an illusory event as good as the real thing? Journal of Vision, 3, 406-412.
Bertelson, P. 1999 Ventriloquism: A case of crossmodal perceptual grouping. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 347-362). Amsterdam: Elsevier Science BV.
Bertelson, P., & de Gelder, B. 2004 The psychology of multimodal perception. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 141-177). Oxford: Oxford University Press.
Bertenthal, B. I., Banton, T., & Bradbury, A. 1993 Directional bias in the perception of translating patterns. Perception, 22, 193-207.
Boernstein, W. S. 1955-1956 Classification of the human senses. Yale Journal of Biology and Medicine, 28, 208-215.
Bregman, A. S. 1990 Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Bregman, A. S., & Achim, A. 1973 Visual stream segregation. Perception & Psychophysics, 13, 451-454.
Bregman, A. S., & Campbell, J. 1971 Primary auditory stream segregation and perception of order in rapid sequences of tones.
Journal of E xperim ental Psychology, 89, 244 249. Bregman, A . S., & D annenbring, G . L. 1973 The effect of continuity on auditory stream segregation. Perception & Psychophysics, 13, 308 312. Bregman, A . S., & R udnicky, A . I. 1975 A uditory segregation : Stream or streams ? Journal of E xperim ental Psychology : H um an Perception and Perform ance, 1, 263 267. Bresciani, J. P., E rnst, M. O ., D rewing, K., Bouyer, G ., Maury, V., & Kheddar, A . 2005 Feeling what you hear : A uditory signals can modulate tactile tap perception. E xperim ental B rain R esearch, 162, 172 180. Burtt, H . E . 1917a A uditory illusions of movement A preliminary study. Journal of E xperim ental Psychology, 2, 63 75. Burtt, H . E . 1917b Tactile illusions of movement. Journal of E xperim ental Psychology, 2, 371 385. Bushara, K. O ., H anakawa, T., Immisch, I., Toma, K., Kansaku, K., & H allett, M. (2002). Neural correlates of cross-modal binding. N ature N euroscience, 6, 190 195. Calvert, G . A ., Spence, C., & Stein, B. E . (E ds.) 2004 T he handbook of m ultisensory processes. Cambridge, MA : MIT Press. Carlyon, R . P. 2004 H ow the brain separates sounds. T rends in Cognitive Sciences, 8, 465 471. Carlyon, R . P., Plack, C. J., & Cusack, R . 2001 Cross-modal and cognitive in uences on the build-up of auditory streaming. B ritish Journal of A udiology, 35, 139 140. Carlyon, R . P., Plack, C. J., Fantini, D . A ., & Cusack, R . 2003 Cross-modal and non-sensory in uences on auditory streaming. Perception, 32, 1393 1402. Churchland, P. S., R amachandran, V. S., Sejnowski, T. J. 1994 A critique of pure vision. In C. Koch & J. L. D avis (E d.), L argescale neuronal theories of the brain (pp. 23 60). Cambridge, MA : MIT Press. Corbin, H . H . 1942 The perception of grouping and apparent movement in visual depth. A rchives of Psychology, 273, 1 50. Correa, A ., Sanabria, D ., Spence, C., Tudela, P., & Lupianez, J. 2006 Selective temporal attention enhances the temporal resolution of visual perception : E vidence from a temporal order judgment task. B rain R esearch, 1070, 202 205. Craig, J. C. 2006 Visual motion interferes with tactile motion perception. Perception, 35, 351- 367. Cusack, R ., D eeks, J., A ikman, G ., & Carlyon, R . P. 2004 E ffects of location, frequency region, and time course of selective attention on auditory scene analysis. Journal of E xperim ental Psychology : H um an Perception & Perform ance, 30, 643 656. D river, J., & Spence, C. 2000 Multisensory perception : Beyond modularity and convergence. Current B iology, 10, R 731 R 735. E cker, A . J., & H eller, L. M. 2005 A uditory-visual interactions in the perception of a ball’s path. Perception, 34, 59 75. Falchier, A ., Clavagnier, S., Barone, P., & Kennedy, H . 2003 A natomical evidence of multimodal integration in primate striate cortex. Journal of N euroscience, 22, 5749 5759. Intersensory G estalten and Crossmodal Scene Perception Fraisse, P. 1963 T he psychology of tim e. London : H arper & R ow. Francis, G ., & G rossberg, S. 1996 Cortical dynamics of form and motion integration : Persistence, apparent motion, and illusory contours. V ision R esearch, 36, 149 173. Fu, K. M. G ., Johnston, T. A ., Shah, A . S., A rnold, L., Smiley, J., H ackett, T. A ., G arraghty, P. E ., & Schroeder, C. E . 2003 A uditory cortical neurons respond to somatosensory stimulation. Journal of N euroscience, 23, 7510 7515. G allace, A ., & Spence, C. 2006 Multisensory synesthetic interactions in the speeded classi cation of visual size. 
Perception & Psychophysics, 68, 1191 1203. G alli, P. A . 1932 U ber mittelst verschiedener Sinnesreize erweckte Wahrnehmung von Scheinbewegungen [O n the perception of apparent motion elicited by different sensory stimuli]. A rchiv fur die gesam te Psychologie, 85, 137 180. G ebhard, J. W., & Mowbray, G . H . 1959 O n discriminating the rate of visual icker and auditory utter. A m erican Journal of Psychology, 72, 521 528. G iard, M. H ., & Peronnet, F. 1999 A uditory-visual integration during multimodal object recognition in humans : A behavioral and electrophysiological study. Journal of Cognitive N euroscience, 11, 473 490. G ilbert, G . M. 1938 A study in inter-sensory G estalten. Psychological B ulletin, 35, 698. G ilbert, G . M. 1939 D ynamic psychophysics and the phi phenomenon. A rchives of Psychology, 237, 5 43. G ilbert, G . M. 1941 Inter-sensory facilitation and inhibition. Journal of G eneral Psychology, 24, 381 407. G rignolo, A ., Boles-Carenini, B., & Cerri, S. (1954). R esearches on the in uence of acoustic stimulation upon the critical fusion frequency of light stimulation. R ivista O to-N euro-O ftalm ologia, 29, 56 73. G uttman, S. E ., G ilroy, L. A ., & Blake, R . 2005 H earing what the eyes see : A uditory encoding of visual temporal sequences. Psychological Science, 16, 228 235. H andel, S., & Buffardi, L. 1968 Pattern perception : Integrating information presented in two modalities. Science, 162, 1026 1028. H andel, S., & Buffardi, L. 1969 U sing several modalities to perceive one temporal pattern. Q uarterly Journal of E xperim ental Psychology, 21, 256 266. H arrar, V., Winter, R ., & H arris, L. 2005 M ultim odal apparent m oth tion. Poster presented at the 6 A nnual Meeting of the Internath tional Multisensory R esearch Forum, R overeto, Italy, 5 8 June. H e, Z . J., & O oi, T. L. 1999 Perceptual organization of apparent motion in the Ternus display. Perception, 28, 877 892. H eise, G . A ., & Miller, G . A . 1951 A n experimental study of auditory patterns. A m erican Journal of Psychology, 64, 68 77. H ochberg, J. 1974 O rganization and the G estalt tradition. In E . C. Carterette & M. P. Friedman (E ds.), H andbook of perception V ol. 1 : H istorical and philosophical roots of perception (pp. 179 210). New York : A cademic Press. H ochberg, J., & H ardy, D . 1960 Brightness and proximity factors in grouping. Perceptual and M otor Sk ills, 10, 22. H ochberg, J., & Silverstein, A . 1956 A quantitative index of stimulussimilarity : Proximity versus differences in brightness. A m erican Journal of Psychology, 69, 456 458. H olmes, N. P., Sanabria, D ., Calvert, G . A ., & Spence, C. 2006 Crossing the hands impairs performance on a nonspatial multisensory discrimination task. B rain R esearch, 1077, 108- 115. H otting, K., & R oder. B. 2004 H earing cheats touch, but less in congenitally blind than in sighted individuals. Psychological Science, 15, 60 64. Intriligator, J. M. 2000 Self-synchroniz ing anim ations. U nited States Patent 6, 163, 323. Julesz, B., & H irsh, I. J. 1972 Visual and auditory perception A n essay of comparison. In E . E . D avid, Jr., & P. B. D enes (E ds.), H um an com m unication : A unified view (pp. 283 340). New York : McG raw-H ill. Kahneman, D . 1973 A ttention and effort. E nglewood Cliffs, NJ : Prentice-H all. Kahneman, D ., & H enik, A . 1981 Perceptual organization and attention. In M. Kubovy & J. R . Pomerantz (E ds.), Perceptual organiz ation (pp. 181 211). H illsdale, NJ : Lawrence E rlbaum A ssociates. Katz, D . 
1925/1989 T he world of touch. H illsdale, NJ : E rlbaum. King, A . J., & Calvert, G . A . 2001 Multisensory integration : Intersensory G estalten and Crossmodal Scene Perception Perceptual grouping by eye and ear. Current B iology, 11, R 322 R 325. Kitagawa, N., & Spence, C. 2006 A udiotactile multisensory interactions in information processing. Japanese Psychological R esearch, 48, 158- 173. Kitagawa, N., & Wada, Y. 2004 Flexible weighting of auditory and th visual information in temporal perception. Proceedings of the 18 International Congress on A coustics, III 2289 III 2292, ICA Kyoto, Japan. Koffka, K. 1935 Principles of G estalt psychology. New York : H arcourt, Brace, & World. Kohler, W. 1929 Physical G estalten. In W. D . E llis (E d.), A source book of G estalt psychology (pp. 17 54). London : R outledge & Kegan Paul. Kohler, W. 1930 G estalt psychology. London : G . Bell & Sons. Kolers, P. A . 1972 A spects of m otion perception. New York : Pergamon Press. Knox, G . W. 1945a Investigations of icker and fusion : III. The effect of auditory stimulation on the visual CFF. Journal of G eneral Psychology, 33, 139 143. Knox, G . W. 1945b Investigations of icker and fusion : IV. The effect of auditory icker on the pronouncedness of visual icker. Journal of G eneral Psychology, 33, 145 154. Kramer, P., & Yantis, S. 1997 Perceptual grouping in space and time : E vidence from the Ternus display. Perception & Psychophysics, 59, 87 99. Kravkov, S. W. 1939 Critical frequency of icker and indirect stimuli. C. R . (D ak ) A cad. Sci. U. R . S. S., 22, 64 66. Kubovy, M. 1981 Concurrent-pitch segregation and the theory of indispensable attributes. In M. Kubovy & J. R . Pomerantz (E ds.), Perceptual organiz ation (pp. 55 98). H illsdale, NJ : E rlbaum. Kubovy, M., H olcombe, A . O ., & Wagemans, J. 1998 O n the lawfulness of grouping by proximity. Cognitive Psychology, 35, 71 98. Kubovy, M., & Van Valkenburg, D . 2001 A uditory and visual objects. Cognition, 80, 97 126. London, I. D . 1954 R esearch on sensory interaction in the Soviet U nion. Psychological B ulletin, 51, 531 568. Lyons, G ., Sanabria, D ., Vatakis, A ., & Spence, C. 2006 The modulation of crossmodal integration by unimodal perceptual grouping : A visuotactile apparent motion study. E xperim ental B rain R esearch, 174, 510 516. Maass, H . 1938 U ber den E in uss akusticher R hythmen auf optische Bewegungsgestaltungen [A bout the in uence of acoustic rhythms on visual motion]. (Sander, F. G anzheit und G estalt. Psychol. U ntersuch. VIII) A rchiv fur die G esam te Psychologie, 100, 424 464. (1923 No. 61) Madsen, M. C., R ollins, H . A ., & Senf, G . M. 1970 Variables affecting immediate memory for bisensory stimuli : E ar-eye analogue studies of dichotic listening. Journal of E xperim ental Psychology, 83 Mahar, D ., Mackenzie, B., & McNicol, D . 1994 Modality-speci c differences in the processing of spatially, temporally, and spatiotemporally distributed information. Perception, 23, 1369 1386. Maier, B., Bevan, W., & Behar, I. 1961 The effect of auditory stimulation upon the critical icker for different regions of the visible spectrum. A m erican Journal of Psychology, 74, 67 73. Marks, L. E . 2004 Cross-modal interactions in speeded classi cation. In G . A . Calvert, C. Spence, & B. E . Stein (E ds.), H andbook of m ultisensory processes (pp. 85 105). Cambridge, MA : MIT Press. Mateeff, S., H ohnsbein, J., & Noack, T. 1985 D ynamic visual capture : A pparent auditory motion induced by a moving visual target. 
Perception, 14, 721 727. Mays, A ., & Schirillo, J. 2005 Lights can reverse illusory directional hearing. N euroscience L etters, 384, 336 338. McA dams, S. 1984 The auditory image : A metaphor for musical and psychological research on auditory organization. In W. P. Crozier & A . J. Chapman (E ds.), Cognitive process in the perception of art (pp. 289 323). A msterdam : North-H olland. McA dams, S. E ., & Bregman, A . S. 1979 H earing musical streams. Com puter M usic Journal, 3, 26 43. McG urk, H ., & MacD onald, J. 1976 H earing lips and seeing voices. N ature, 264, 746 748. Metzger, W. 1934 Beobachtungen uber phanomenale Identitat [Studies of phenomenal identity]. Psychologische Forschung, 19, Intersensory G estalten and Crossmodal Scene Perception 1 60. Meyer, G . F., Wuerger, S. M., R ohrbein, F., & Z etzsche, C. 2005 Low-level integration of auditory and visual motion signals requires spatial co-localisation. E xperim ental B rain R esearch, 166, 538 547. Michotte, A . 1963 T he perception of causality. London : Methuen. (O riginal work published in 1946) Miller, G . A ., & H eise, G . A . 1950 The trill threshold. Journal of the A coustical Society of A m erica, 22, 637 638. Mowbray, G . H ., & G ebhard, J. W. 1957 Sensitivity of the skin to changes in rate of intermittent mechanical stimulation. Science, 125, 1297 1298. Murray, M. M., Molholm, S., Michel, C. M., H eslenfeld, D . J., R itter, W., Javitt, D . C., Schroeder, C. E ., & Foxe, J. J. 2004 G rabbing your ear : A uditory-somatosensory multisensory interactions in early sensory cortices are not constrained by stimulus alignment. Cerebral Cortex, 15, 963 974. O gilvie, J. C. 1956a E ffect of auditory utter on the visual critical icker frequency. Canadian Journal of Psychology, 10, 61 68. O gilvie, J. C. 1956b The interaction of auditory utter and CFF : The effect of brightness. Canadian Journal of Psychology, 10, 207 210. O ’Leary, A ., & R hodes, G . 1984 Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565 569. O yama, T. 1961 Perceptual grouping as a function of proximity. Perceptual and M otor Sk ills, 13, 305 306. Palmer, S. E . 2002 Perceptual grouping : It’s later than you think. Current D irections in Psychological Science, 11, 101 106. Palmer, S. E ., Brooks, J. L., & Nelson, R . 2003 When does grouping happen ? A cta Psychologica, 114, 311 330. Phillips-Silver, J., & Trainor, L. J. 2005 Feeling the beat : Movement in uences infant rhythm perception. Science, 308, 1430. Pomerantz, J. R . 1981 Perceptual organization in information processing. In M. Kubovy & J. R . Pomerantz (E ds.), Perceptual organiz ation (pp. 141 180). H illsdale, NJ : E rlbaum. Posner, M. I. 1978 Chronom etric explorations of m ind. H illsdale, NJ : E rlbaum. Pratt, C. C. 1930 The spatial character of high and low tones. Journal of E xperim ental Psychology, 13, 278 285. R amachandran, V. S., & A nstis, S. M. 1983 Perceptual organization of moving patterns. N ature, 304, 529 531. R ecanzone, G . H . 2003 A uditory in uences on visual temporal rate perception. Journal of N europhysiology, 89, 1078 1093. R egan, D ., & Spekreijse, H . 1977 A uditory-visual interactions and the correspondence between perceived auditory space and perceived visual space. Perception, 6, 133 138. R ockland, K. S., & O jima, H . 2003 Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology, 50, 19 26. R of er, S. K., & Butler, R . A . 
1968 Factors that in uence the localization of sound in the vertical plane. Journal of the A coustical Society of A m erica, 43, 1255 1259. R oyer, F. L., & G arner, W. R . 1970 Perceptual organization of nineelement auditory temporal patterns. Perception & Psychophysics, 7, 115 120. R yan, T. A . 1940 Interrelations of the sensory systems in perception. Psychological B ulletin, 37, 659 698. R ush, G . P. 1937 Visual grouping in relation to age. A rchives of Psychology, 31 (Whole No. 217), 1 95. Sanabria, D ., Correa, A ., Lupianez, J., & Spence, C. 2004a Bouncing or streaming ? E xploring the in uence of auditory cues on the interpretation of ambiguous visual motion. E xperim ental B rain R esearch, 157, 537 541. Sanabria, D ., Lupianez, J., & Spence, C. (in press). A uditory motion affects visual motion perception in a speeded discrimination task. E xperim ental B rain R esearch. Sanabria, D ., Soto-Faraco, S., Chan, J. S., & Spence, C. 2004b When does visual perceptual grouping affect multisensory integration ? Cognitive, A ffective, & B ehavioral N euroscience, 4, 218 229. Sanabria, D ., Soto-Faraco, S., Chan, J. S., & Spence, C. 2005a Intramodal perceptual grouping modulates multisensory integration : E vidence from the crossmodal congruency task. N euroscience L etters, 377, 59 64. Sanabria, D ., Soto-Faraco, S., & Spence, C. 2004c E xploring the role of visual perceptual grouping on the audiovisual integration of motion. N euroreport, 15, 2745 2749. Sanabria, D ., Soto-Faraco, S., & Spence, C. 2005b Spatiotemporal Intersensory G estalten and Crossmodal Scene Perception interactions between audition and touch depend on hand posture. E xperim ental B rain R esearch, 165, 505 514. Sanabria, D ., Soto-Faraco, S., & Spence, C. 2005c A ssessing the effect of visual and tactile distractors on the perception of auditory apparent motion. E xperim ental B rain R esearch, 166, 548 558. Sanabria, D ., Soto-Faraco, S., & Spence, C. (in press) Spatial attention modulates audiovisual interactions in apparent motion. Journal of E xperim ental Psychology : H um an Perception and Perform ance. Sanabria, D ., Spence, C., & Soto-Faraco, S. (2007). Perceptual and decisional contributions to audiovisual interactions in the perception of apparent motion : A signal detection study Cognition, 102, 299 310. Scheier, C., Lewkowicz, D . J., & Shimojo, S. 2003 Sound induced perceptual reorganization of an ambiguous motion display in human infants. D evelopm ental Science, 6, 233 241. Schiller, P. 1935 Interrelation of different senses in perception. B ritish Journal of Psychology, 25, 465 469. Sekuler, A . B., & Benneett, P. J. 2001 G eneralized common fate : G rouping by common luminance changes. Psychological Science, 12, 437 444. Sekuler, R ., Sekuler, A . B., & Lau, R . 1997 Sound alters visual motion perception. N ature, 385, 308. Sekuler, A . B., & Sekuler, R . 1999 Collisions between moving visual targets : What controls alternative ways of seeing an ambiguous display ? Perception, 28, 415 432. Shams, L., Kamitani, Y., & Shimojo, S. 2000 What you see is what you hear : Sound induced visual ashing. N ature, 408, 788. Shams, L., Kamitani, Y., & Shimojo, S. 2002 Visual illusion induced by sound. Cognitive B rain R esearch, 14, 147 152. Shepard, R . N. 1981 Psychophysical complementarity. In M. Kubovy & J. R . Pomerantz (E ds.), Perceptual organiz ation (pp. 279 341). H illsdale, NJ : E rlbaum. Shepard, R . N. 1987 E volution of a mesh between principles of the mind and regularities of the world. In J. 
D upre (E d.), T he latest on the best : E ssays on evolution and optim ality (pp. 251 275). Cambridge, MA : MIT Press. Shepard, R . N. 1994 Perceptual-cognitive universals as re ections of the world. Psychonom ic B ulletin & R eview, 1, 2 28. Sherrick, C. E . 1976 The antagonisms of hearing and touch. In S. K. H irsh, D . H . E ldredge, I. J., H irsh & S. R . Silverman (E ds.), H earing and D avis : E ssays honoring H allowell D avis (pp. 149 158). St. Louis, Mo., Washington U niversity Press. Shimojo, S., & Shams, L. 2001 Sensory modalities are not separate modalities : Plasticity and interactions. Current O pinion in N eurobiology, 11, 505 509. Shipley, T. 1964 A uditory utter-driving of visual icker. Science, 145, 1328 1330. Slutsky, D . A ., & R ecanzone, G . H . 2001 Temporal and spatial dependency of the ventriloquism effect. N euroreport, 12, 7 10. Smith, B. (E d.) 1988 Foundations of G estalt theory. Munich, G ermany : Philosophia Verlag. Soto-Faraco, S., & Kingstone, A . 2004 Multisensory integration of dynamic information. In G . A . Calvert, C. Spence, & B. E . Stein (E ds.), T he handbook of m ultisensory processes (pp. 49 67). Cambridge, MA : MIT Press. Soto-Faraco, S., Kingstone, A ., & Spence, C. 2003 Multisensory contributions to the perception of motion. N europsychologia, 41, 1847 1862. Soto-Faraco, S., Kingstone, A ., & Spence, C. 2006 Integrating motion information across sensory modalities : The role of topdown factors. Progress in B rain R esearch, 155, 277 290. Soto-Faraco, S., Lyons, J., G azzaniga, M., Spence, C., & Kingstone, A . 2002 The ventriloquist in motion : Illusory capture of dynamic information across sensory modalities. Cognitive B rain R esearch, 14, 139 146. Soto-Faraco, S., Spence, C., & Kingstone, A . 2004a Cross-modal dynamic capture : Congruency effects in the perception of motion across sensory modalities. Journal of E xperim ental Psychology : H um an Perception and Perform ance, 30, 330 345. Soto-Faraco, S., Spence, C., & Kingstone, A . 2004b Congruency effects between auditory and tactile motion : E xtending the phenomenon of crossmodal dynamic capture. Cognitive, A ffective, & B ehavioral N euroscience, 4, 208 217. Soto-Faraco, S., Spence, C., & Kingstone, A . 2005 A ssessing automaticity in the audiovisual integration of motion. A cta Psychologica, 118, 71 92. Intersensory G estalten and Crossmodal Scene Perception Spence, C. (in press). A udiovisual multisensory integration. A coustical Science & T echnology. Spence, C., & D river, J. 1997 A udiovisual links in exogenous covert spatial orienting. Perception & Psychophysics, 59, 1 22. Spence, C., & D river, J. (E ds.) 2004 Crossm odal space and crossm odal attention. O xford, U K : O xford U niversity Press. Spence, C., & McD onald, J. 2004 The crossmodal consequences of the exogenous spatial orienting of attention In G . A . Calvert, C. Spence, & B. E . Stein (E ds.), T he handbook of m ultisensory processing (pp. 3 25). Cambridge, MA : MIT Press. Spence, C., & Walton, M. 2005 O n the inability to ignore touch when responding to vision in the crossmodal congruency task. A cta Psychologica, 118, 47 70. Staal, H . E ., & D onderi D . C. 1983 The effect of sound on visual apparent movement. A m erican Journal of Psychology, 96, 95 105. Strybel, T. Z ., & Vatakis, A . 2005 A comparison of auditory and visual apparent motion presented individually and with crossmodal moving distractors. Perception, 33, 1033 1048. Thomas, G . J. 
1941 E xperimental study of the in uence of vision on sound localization. Journal of E xperim ental Psychology, 28, 163 177. Toro, J. M., Sinnett, S., & Soto-Faraco, S. 2005 Speech segmentation by statistical learning depends on attention. Cognition, 97, B25 B34. U llman, S. 1979 T he interpretation of visual m otion. Cambridge, MA : MIT Press. U rbantschitsch, V. 1888 U eber des E in uss einer Sinneserregung auf die ubrigen Sinnesemp ndungen. [O n the in uence of one sensory percept on the other sensory percepts]. A rchiv fur die geschichte Physiologie, 42, 154 182. Van Noorden, L. P. A . S. 1971 R hythmic ssion as a function of tone rate. In IPO A nnual Progress R eport (No. 6). E indhoven, Netherlands : Institute for Perception R esearch. Van Noorden, L. P. A . S. 1975 T em poral coherence in the perception of tone sequences. PhD , E indhoven U niversity of Technology. Vatakis, A ., Bayliss, L., Z ampini, M., & Spence, C. (in press) The influence of synchronous audiovisual distractors on audiovisual temporal order judgments. Perception & Psychophysics. Vatakis, A ., & Spence, C. (in press) Crossmodal binding : E valuating the unity assumption’ using audiovisual speech stimuli. Perception & Psychophysics. Violentyev, A ., Shimojo, S., & Shams, L. 2005 Touch-induced visual illusion. N euroreport, 16, 1107 1110. von Bekesy, G . 1957 Neural volleys and the similarity between some sensations produced by tones and by skin vibrations. Journal of the A coustical Society of A m erica, 29, 1059 1069. von Bekesy, G . 1959 Similarities between hearing and skin sensations. Psychological R eview, 66, 1 22. von Frey, M. 1929 Variations in tactual impressions. In W. D . E llis (E d.), A source book of G estalt psychology (pp. 193 195). London : R outledge and Kegan Paul. von Schiller, P. 1932a D as optische Verschmelzen in seiner A bhangigkeit von heteromodaler R eizung [O ptical integration and its dependence on heteromodal shimulation]. Z eitschrift fur Psychologie B ildung, 125, 249 288. von Schiller, P. 1932b D ie R auhigkeit als intermodale E rscheinung [R oughness as an intermodel phenomenon]. Z eitschrift fur Psychologie B ildung, 127, 265 289. Vroomen, J., & de G elder, B. 2000 Sound enhances visual perception : Cross-modal effects of auditory organization on vision. Journal of E xperim ental Psychology : H um an Perception and Perform ance, 26, 1583 1590. Vroomen, J., & de G elder, B. 2003 Visual motion in uences the contingent auditory motion aftereffect. Psychological Science, 14, 357 361. Vroomen, J., & Keetels, M. 2006 The spatial constraint in intersensory pairing : No role in temporal ventriloquism. Journal of E xperim ental Psychology : H um an Perception & Perform ance, 32, 1063 1071. Wada, Y., Kitagawa, N., & Noguchi, K. 2003 A udio-visual integration in temporal perception. International Journal of Psychophysiology, 50, 117 124. Watanabe, K. 2004 Visual grouping by motion precedes the relative localization between moving and ashed stimuli. Journal of E xperim ental Psychology : H um an Perception & Perform ance, 30, 504 512. Intersensory G estalten and Crossmodal Scene Perception Watanabe, K., & Shimojo, S. 1998 A ttentional modulation in perception of visual motion events. Perception, 27, 1041 1054. Watanabe, K., & Shimojo, S. 2001a When sound affects vision : E ffects of auditory grouping on visual motion perception. Psychological Science, 12, 109 116. Watanabe, K., & Shimojo, S. 2001b Postcoincidence trajectory duration affects motion event perception. 
Perception & Psychophysics, 63, 16-28.
Welch, R. B. 1999 Meaning, attention, and the 'unity assumption' in the intersensory bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 371-387). Amsterdam: Elsevier Science BV.
Welch, R. B., DuttonHurt, L. D., & Warren, D. H. 1986 Contributions of audition and vision to temporal rate perception. Perception & Psychophysics, 39, 294-300.
Welch, R. B., & Warren, D. H. 1980 Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 3, 638-667.
Wertheimer, M. 1912 Experimentelle Studien uber das Sehen von Bewegung [Experimental studies on the visual perception of movement]. Zeitschrift fur Psychologie, 61, 161-265. [Also in T. Shipley (Ed. and Trans.), Classics in psychology (pp. 1032-1089). New York: Philosophical Library.]
Wertheimer, M. 1938 Laws of organization in perceptual forms. In W. Ellis (Ed.), A source book of Gestalt psychology (pp. 71-88). London: Routledge & Kegan Paul. (Original work published 1923)
Zampini, M., Guest, S., Shore, D. I., & Spence, C. 2005 Audiovisual simultaneity judgments. Perception & Psychophysics, 67, 531-544.
Zampini, M., Shore, D. I., & Spence, C. 2003 Multisensory temporal order judgments: The role of hemispheric redundancy. International Journal of Psychophysiology, 50, 165-180.
Zapparoli, G. C., & Reatto, L. L. 1969 The apparent movement between visual and acoustic stimulus and the problem of intermodal relations. Acta Psychologica, 29, 256-267.
Zietz, K., & Werner, H. 1927 Werner's Studien uber Strukturgesetze, VIII: Uber die dynamische Struktur der Bewegung [Werner's studies on the laws of structure, VIII: On the dynamic structure of movement]. Zeitschrift fur Psychologie, 105, 226-249.

Footnotes

1. The illusion of apparent motion, otherwise known as the phi phenomenon, occurs under conditions of the discrete sequential presentation of static stimuli from different locations at rates fast enough to give rise to the illusion of an object moving smoothly and continuously through the space between them (e.g., Wertheimer, 1912; Kolers, 1972; see also Strybel & Vatakis, 2005).
2. Note that Vroomen and de Gelder (2000) incorrectly describe these visual stimuli as moving sequentially from left to right across the screen. All of the visual stimuli were actually presented from the same lateral position. It is, however, uncertain whether O'Leary and Rhodes could actually have reduced the duration of the visual stimuli in 8 ms steps, as they report in their paper. This would have necessitated the use of a monitor with a screen refresh rate of 125 Hz, and it is questionable whether such performance could have been attained using the display technologies available at the time.
3. Given that physically extending the moment of collision has been shown to increase the perception of bouncing (e.g., Bertenthal et al., 1993), it could be argued that the presentation of the sound may simply have extended the perceived duration of the collision (i.e., it may have had its effect by 'freezing' the visual display, as reported in Vroomen and de Gelder's, 2000, study). However, Sanabria et al. (2004) have shown that this freezing account cannot provide the sole explanation for this particular crossmodal effect.
They found that the presentation of the sound still influenced people's interpretation of the visual display even when it was presented while the disks were occluded behind an opaque barrier (hence when there was no moment of collision to 'freeze').
4. While a response bias interpretation of the auditory grouping effect on this particular crossmodal interaction remains possible (cf. Bertelson & de Gelder, 2004; Vroomen & de Gelder, 2000), an extensive body of crossmodal research has demonstrated that there is a genuinely perceptual component to this crossmodal effect (e.g., see Sanabria et al., 2004; Watanabe & Shimojo, 2001b).
5. It seems particularly apposite to use apparent motion to look at the question of intramodal versus crossmodal grouping, given that the field of Gestalt psychology itself originated with Wertheimer's (1912) early studies of visual apparent motion (see Shepard, 1981, p. 311).
6. In a book chapter published in 1994, Churchland et al. (pp. 30-31) reported that if a visual occluder (such as a piece of card) is placed on a screen, and a visual stimulus is flashed on and off just to the side (i.e., to the left) of it, then participants see only a single light flashing on and off (i.e., no perception of motion is experienced). However, if a sound is played to the left ear over headphones when the light is flashed on, and a sound is then presented to the right ear when the light is turned off, this can give rise to the perception that the visual stimulus is actually moving to the right, behind the occluder. Churchland et al. also reported a similar, albeit somewhat weaker, subjective motion effect when the flashing of the light was accompanied by a tactile stimulus to the left hand, and its offset by a tactile stimulus to the right hand (with the hands presumably placed in an uncrossed posture). These fascinating, albeit anecdotal, findings clearly warrant further empirical research.
7. Note that congruency in the 4-lights displays was defined in terms of the global (rather than local) motion in the displays.
8. In fact, while all of the research that has been highlighted so far has had as its focus the question of the extent to which changes in the strength of intramodal perceptual grouping affect the nature of any crossmodal grouping (or binding) that is observed, one could presumably also ask the reverse question: namely, can changes in the strength (or type) of crossmodal perceptual grouping modulate the strength (or type) of grouping taking place intramodally (perhaps using a variant of the local versus global grouping displays used by Sanabria et al., 2004c, 2005a)?
9. At this point it is perhaps also worth noting how little research has been conducted on the grouping principles governing intramodal tactile perception (though see Katz, 1925/1989; and von Frey, 1929).

Author Notes

Correspondence concerning this article should be addressed to Charles Spence (E-mail: charles.spence@psy.ox.ac.uk) at the Department of Experimental Psychology, South Parks Road, Oxford, England, OX1 3UD. This research was funded by a grant from the Oxford McDonnell Centre for Cognitive Neuroscience to CS and SSF.