J Neurophysiol 109: 1065–1077, 2013.
First published December 5, 2012; doi:10.1152/jn.00226.2012.
Sound-driven enhancement of vision: disentangling detection-level
from decision-level contributions
Alexis Pérez-Bellido,1 Salvador Soto-Faraco,2,3 and Joan López-Moliner1,4
1Departament de Psicologia Bàsica, Universitat de Barcelona, Barcelona, Spain; 2Departament de Tecnologies de la
Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain; 3Institució Catalana de Recerca i Estudis
Avançats (ICREA), Barcelona, Spain; and 4Institute for Brain Cognition and Behaviour (IR3C), Barcelona, Spain
Submitted 15 March 2012; accepted in final form 29 November 2012
Address for reprint requests and other correspondence: J. López-Moliner, Departament de Psicologia Bàsica, Universitat de Barcelona, Barcelona, Spain (e-mail: j.lopezmoliner@ub.edu).
Pérez-Bellido A, Soto-Faraco S, López-Moliner J. Sound-driven
enhancement of vision: disentangling detection-level from decision-level
contributions. J Neurophysiol 109: 1065–1077, 2013. First published
December 5, 2012; doi:10.1152/jn.00226.2012.—Cross-modal enhancement can be mediated both by higher-order effects due to attention
and decision making and by detection-level stimulus-driven interactions. However, the contribution of each of these sources to behavioral
improvements has not been conclusively determined and quantified
separately. Here, we apply psychophysical analysis based on Piéron
functions in order to separate stimulus-dependent changes from those
accounted for by decision-level contributions. Participants performed a
simple visual speeded detection task on Gabor patches of different
spatial frequencies and contrast values, presented with and without
accompanying sounds. On one hand, we identified an additive cross-modal improvement in mean reaction times across all types of visual
stimuli that would be well explained by interactions not strictly based
on stimulus-driven modulations (e.g., due to reduction of temporal
uncertainty and motor times). On the other hand, we singled out an
audio-visual benefit that strongly depended on stimulus features such
as frequency and contrast. This particular enhancement was selective
to low visual spatial frequency stimuli, optimized for magnocellular
sensitivity. We therefore conclude that interactions at detection stages
and at decisional processes in response selection that contribute to
audio-visual enhancement can be separated and are expressed in partly different aspects of visual processing.
sensory integration; reaction time; psychophysics; audio-visual stimuli; magnocellular pathway; race models
WATCHING A MOVIE is not only a visual experience, attending a
concert is not only acoustic, and stroking our pet is not only
tactile. Instead, perception in everyday life typically involves
input from various sensory modalities. In consequence, the
nervous system of complex animals, including humans, has
evolved to maximize information by exploiting correlations
between multimodal inputs. In recent years, a growing body of
literature indicates the existence of multisensory interactions in
a wide variety of contexts. For example, neurophysiological
evidence from animal studies has described multisensory responses of neurons in the deep layers of the superior colliculus
(SC), a subcortical structure that receives ascending inputs
from auditory, visual, and somatosensory pathways and subserves reflexive stimulus-driven behaviors such as orienting
head and eye movements (e.g., Meredith and Stein 1983,
1986). Some of these neurons are characterized by their enhanced responses to multisensory compared with unisensory
stimulation, typically when sensory inputs coming from different modalities are in approximate temporal synchrony and
spatially overlapping (Meredith and Stein 1996; Wallace et al.
1993, 1998). Complementary behavioral studies have demonstrated that multisensory stimulation leads to benefits such as
lower detection thresholds (e.g., Noesselt et al. 2010; Wuerger
et al. 2003), increased precision (e.g., Ernst and Bülthoff
2004), and quicker motor responses (e.g., Morrell 1972; Nozawa et al. 1994). One of the most often cited, yet still
controversial, cases of multisensory facilitation is the enhancement of visual perception as a function of concurrent acoustic
input. Here, we examine the basis of this acoustic facilitation of
visual processing in behavior.
Behavioral evidence for auditory enhancement of vision is
illustrated by improvements in orienting responses to audio-visual, compared with visual-only, stimuli (in cats, Stein et al.
1989); faster and more accurate saccadic reactions (Colonius
and Arndt 2001; Corneil et al. 2002; Diederich and Colonius
2004); increases in brightness judgments of a visual stimulus
accompanied by a sound (Stein et al. 1996); and improvements in
sensitivity to different aspects of visual stimuli (Bolognini et al.
2005; Frassinetti et al. 2002; Jaekl and Soto-Faraco 2010; Manjarrez et al. 2007). However, some controversy still surrounds
the underlying sources of these audio-visual enhancements.
Despite the fact that these improvements are generally attributed to changes in early sensory processes at detection stage as
a result of hard-wired integrative mechanisms (Lakatos et al.
2007; Stein 1998), many of these cross-modal enhancements
cannot be dissociated from effects at decisional stages of
processing that are more general in scope (Doyle and Snowden
2001; Lippert et al. 2007; Odgaard et al. 2003). For example,
several authors have argued that sound-induced enhancement
of vision can be attributed to a reduction in uncertainty within
the spatial and temporal domains, without necessarily implying
early interactions at sensory level. Indeed, generic mechanisms
such as the involuntary orienting of spatial attention to the
location of a sound have been shown to enhance early perceptual processing within the visual modality (e.g., McDonald et
al. 2000, 2005; Spence and Driver 1994), without necessarily
involving any early sensory interaction between sound and
vision. Likewise, in the temporal domain, an abrupt sound can
considerably reduce the temporal uncertainty about the time of
visual target presentation (e.g., Lippert et al. 2007). Thus,
according to this alternative view, cross-modal enhancement
would be explained by cross-modal cuing effects, which have
been widely demonstrated in the attention literature (e.g.,
Driver and Spence 1998). Discerning the contribution of these
cross-modal cuing effects from early cross-modal convergence
leading to enhanced detection, as a classical sensory integration account would propose, is critical when characterizing the
nature of multisensory interactions (Driver and Noesselt 2008;
McDonald et al. 2001).
The nature of auditory-induced enhancement of visual processing has previously been addressed by measuring detection
thresholds or changes in visual sensitivity but, as discussed
above, with mixed conclusions (Frassinetti et al. 2002; Jaekl
and Soto-Faraco 2010; Lippert et al. 2007; McDonald et al.
2000; Noesselt et al. 2008). Studying the temporal dynamics of
audio-visual perception has also provided interesting insights
about the nature of such interactions. For example, at a neurophysiological level, multisensory stimulation leads to shorter
neuronal latencies (Rowland et al. 2007). Rowland et al. found
that initial neural responses in individual SC neurons of anesthetized cats speed up in response to audio-visual stimuli.
Additionally, Giard and Peronnet (1999) provided physiological evidence in humans that audio-visual stimuli produced
neural activity in the visual cortex as early as 40 ms after
stimulus, above and beyond that evoked by visual-only stimuli.
It is remarkable that, considering that possible sensory-based effects may be expressed in (and lead to) shorter processing latencies, previous behavioral studies addressing the nature of auditory-induced visual enhancement (see above) have typically used off-line detection tasks that are not very sensitive to
latency shifts. Note that, as we discuss below, the many studies
using the redundant signals approach on audio-visual reaction
times (RTs) cannot tell whether the speed-up in latencies is due
to enhancement of vision specifically. Here, we decided to
apply a RT paradigm specifically focused on measuring visual
processing, with the aim of capturing any latency aspect
involved in auditory-induced visual enhancement.
When two or more signals are presented together in different
channels, simple detection responses are faster on average than
when only one signal is present—a well-known phenomenon
termed the redundant signals effect (RSE; Kinchla 1974).
Previous studies have modeled such multisensory RT advantages as a function of unisensory RTs by applying the race
model inequality test (RMI) (Colonius and Arndt 2001; Miller
1982; but see Otto and Mamassian 2012 for a criticism).
According to many authors, this paradigm allows one to infer
how these signals are used in detection. If the auditory signal
and the visual signal are processed independently at sensory
level and combined during a decision stage, the distribution of
RTs to audio-visual events cannot be significantly faster than
the probability summation model predictions based on the RTs
to unisensory targets (Raab 1962). If, on the other hand,
audio-visual RTs are faster than those predicted by the probabilistic summation (race) model, then it is inferred that these
signals are effectively integrated within a common channel
prior to decision. This latter kind of interaction has been
accurately described by linear or coactivation models (Colonius and Arndt 2001; Miller 1982; Nickerson 1973). This RMI
has been extensively applied in the multisensory domain (see
Cappe et al. 2009; Colonius and Arndt 2001; Laurienti et al.
2004; Leo et al. 2008). However, a key aspect of this approach
is that the task is not modality selective (that is, participants
respond to any signal regardless of whether it is auditory or
visual). Therefore, regardless of recent criticisms (Otto and
Mamassian 2012), the RSE model is simply not adequate when
the goal is to ascertain how, specifically, one sensory modality
is enhanced by the presence of the other, as in the present
study. For this reason, we started by using a different RT
paradigm well known in the visual literature that exploits the
fact that RTs to visual events decrease monotonically with
stimulus intensity following a power function, typically referred to as the Piéron function (Piéron 1914):
RT = k·C^(−α) + t0    (1)
The function in Eq. 1 captures different separate contributions
to response latency across variations in stimulus contrast (C).
In this model, two parameters (k and α) modulate the decay of
RTs caused by stimulus-dependent variables, while t0 modulations reflect the variation in RTs due to stimulus-independent
factors (Pins and Bonnet 1996; Plainis and Murray 2000). In
particular, k accounts for the time it takes for sensory information to reach threshold, that is, the gain rate in the accumulation of sensory evidence. In turn, α is an exponent that
characterizes a given sensory modality. Finally, in this framework t0 represents the asymptotic RT, which reflects a time
constant that includes processing latencies intrinsic to the
sensory pathway and motor time of the effector system (Pins
and Bonnet 2000). One important aspect for the present study
is that variations in response latencies caused by differences in
stimulus uncertainties, task complexity, or decision making
accumulate additively and equally across all levels of contrast or
any other stimulus variation, thus adding to the parameter t0
(Pins and Bonnet 1996). This assumption has been explicitly
tested by Carpenter (2004) with visual stimuli. In particular,
using this framework, Carpenter showed that the contributions
of contrast variations to overall RT are associated with detection
processes, whereas variations concerning higher-order factors
(such as the probability of the stimulus requiring a response or
not) are linked to decision processes leading to shifts in t0.
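For illustration, the sketch below (ours, not part of the original study's analysis code) shows how such a function can be fitted with SciPy so that the stimulus-dependent parameters (k and the exponent) and the additive constant t0 are estimated separately for visual and audio-visual conditions; the RT values are hypothetical placeholders of a plausible magnitude.

# Sketch: fitting a Pieron-type function RT = k*C**expo + t0 to mean RTs.
# A lower k in the audio-visual fit would indicate a contrast-dependent
# (detection-level) benefit; a lower t0 an additive, contrast-independent one.
import numpy as np
from scipy.optimize import curve_fit

def pieron(C, k, expo, t0):
    # RT decays as a power function of contrast C and asymptotes at t0
    return k * C**expo + t0

contrast = np.array([0.050, 0.087, 0.153, 0.268, 0.469, 0.822])   # Michelson
rt_v  = np.array([0.38, 0.33, 0.31, 0.30, 0.30, 0.29])            # visual only, s (hypothetical)
rt_av = np.array([0.30, 0.28, 0.27, 0.26, 0.26, 0.25])            # audio-visual, s (hypothetical)

for label, rt in [("V", rt_v), ("AV", rt_av)]:
    (k, expo, t0), _ = curve_fit(pieron, contrast, rt,
                                 p0=[0.005, -1.0, 0.25], maxfev=10000)
    print(f"{label}: k = {k:.4f}, exponent = {expo:.2f}, t0 = {t0:.3f} s")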
Applying this approach to the cross-modal case, we will
address how the different parameters of the Piéron function
account for changes in RT following the addition of sound to an
imperative visual stimulus. This paradigm will therefore allow us
to discern higher-level effects from interactions occurring due to
stimulus-driven processing at the detection level between vision
and sound (see Data analysis).
Given that previous studies have suggested that differences
in multisensory integration may depend on the particular visual
pathways involved (Jaekl and Harris 2009; Jaekl and Soto-Faraco 2010; Leo et al. 2008), we included a range of spatial
frequencies (SFs) putatively engaging the magnocellular (M)
and parvocellular (P) pathways to different extents. We chose
SF values of 0.3 c/°, 0.74 c/°, and 5.93 c/° given that frequencies above 4 c/° are thought to stimulate sustained channels
more specifically than frequencies below (Legge 1978; Levi et
al. 1979). Therefore the low SFs 0.3 c/° and 0.74 c/° in our
stimuli are assumed to primarily stimulate the M-pathway,
while the intermediate frequency 5.93 c/° is predicted to elicit
relatively more activation in the P-channel.
We hypothesize that sound-induced visual enhancements
that are dependent on contrast (i.e., detection level) should be
modulated by SF and will therefore lead to differences in the
gain parameter k across SFs. In particular, stimulus-driven
benefits should concentrate in the low SFs. On the other hand,
interactions produced by the sound that are not contrast dependent (e.g., decision stage; Carpenter 2004) should in principle
accumulate equally across all contrast levels, being mainly
reflected in changes of the asymptotic parameter t0. Moreover,
this sensory unspecific improvement driven by high-level processes is expected to occur not only invariantly across contrast
levels but also across SFs.
Note that, in a strict sense, it is not logically possible to
rule out an influence of sensory-level interactions on t0, so that the nonsensory interpretation of changes in t0 is simply the conservative one. Of course, this does not invalidate the
interpretation of the k parameter, which would remain representative of the rate at which evidence is accumulated at the
detection stage across levels of contrast (i.e., gain).
MATERIALS AND METHODS
Experiment 1
Methods. PARTICIPANTS. Two of the authors and three naive volunteers (24–42 yr old; 3 women, 2 men) from the University of
Barcelona participated in this experiment. All subjects had normal
hearing and normal or corrected-to-normal vision by self-report and
gave informed consent prior to participating. The study received the
approval of the University of Barcelona ethical committee.
MATERIALS. Visual stimuli consisted of vertical Gabor patches
[Gaussian window standard deviation (SD) = 0.25; mean luminance = 23.4 cd/m²] subtending a visual angle of 2°, with SF of 0.3, 0.74, or
5.93 c/°, depending on the experimental condition. They were presented at the center of a Philips 22-in. CRT monitor (Brilliance
202P4) placed at a 1-m viewing distance. These Gabor patches were
presented at varying levels of contrast in six steps ranging from 0.05
to 0.822 (Michelson contrast) on a logarithmic scale. Auditory stimuli
were 340-ms broadband white noise bursts [58 dB (A) SPL; 10-ms
on-/off-ramps] delivered from two loudspeaker cones located at either
side of the monitor. Auditory and visual stimulus timing accuracy and
precision were calibrated with a Black Box Toolkit oscilloscope
(Black Box Toolkit).
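For illustration only, the sketch below reproduces the stimulus parameters just described (six log-spaced Michelson contrasts and a vertical Gabor); the pixel resolution and the exact Gaussian-window convention are our assumptions, not specifications of the original setup.

# Sketch (illustrative): log-spaced contrasts and a vertical Gabor patch.
import numpy as np

contrasts = np.geomspace(0.05, 0.822, 6)   # ~0.05, 0.087, 0.153, 0.268, 0.469, 0.822

def gabor(size_px=256, sf_cpd=0.3, extent_deg=2.0, sigma_deg=0.25,
          contrast=0.5, mean_lum=0.5):
    """Vertical Gabor: sinusoidal carrier in a Gaussian window (luminance 0-1)."""
    x = np.linspace(-extent_deg / 2, extent_deg / 2, size_px)
    xx, yy = np.meshgrid(x, x)
    carrier = np.cos(2 * np.pi * sf_cpd * xx)               # vertical grating
    window = np.exp(-(xx**2 + yy**2) / (2 * sigma_deg**2))  # Gaussian envelope
    return mean_lum * (1 + contrast * carrier * window)

patch = gabor(sf_cpd=0.3, contrast=contrasts[0])
print(np.round(contrasts, 3), patch.shape)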
PROCEDURE. Prior to each session, participants adapted to the room
lighting conditions and to the screen luminance for 5 min and then
were presented with a training block of 20 trials. Participants were
asked to press a joystick button as soon as they detected a visual
stimulus (Gabor patch) at the center of the monitor. Each block
contained 180 trials including 20% catch trials, where no visual
stimulus was present. Catch trials encouraged participants to attend selectively to vision and discouraged responses based on the auditory
component. Half of the total trials (including half of the catch trials)
contained an auditory stimulus. All types of trials (sound vs. no sound,
with vs. without visual stimulus, and contrast level of the visual
stimulus) were randomly intermixed throughout each block while the
SF was blocked. The beginning of each trial was signaled by a central
fixation cross (which remained throughout the trial), and, after a
variable delay (randomly chosen from 500 ms to 1,500 ms), the target
stimulus was presented for 340 ms (Fig. 1). After a response deadline
of 1,500 ms, the screen blanked and a 2,000-ms delay led to the
beginning of the next trial. Subjects typically completed four blocks of
180 trials of one SF per session, with short resting breaks between
blocks. Each session was run on a different day, and the experiment
lasted six sessions (2 sessions for each SF). The order of the sessions
was randomized to prevent learning effects. Overall, each participant
performed an average of 80 (minimum of 60) valid trials per condition
(with and without sound) and level of contrast.
Data analysis. RT to visual stimuli is determined by contrast level
and SF (Breitmeyer 1975; Harwerth and Levi 1978). The decrease in
RTs to increments in contrast is well accounted for by the Piéron
function in Eq. 1. As discussed in the introduction, the slope of this
function (determined by the parameters k and α in Eq. 1) captures
variations of RT caused by variations in stimulus properties. In many
cases it is possible to express the slope with a single parameter by
assuming an α exponent of −1, which is a particular case of the general function often applied in visual psychophysics (see Eq. 2; Murray and Plainis 2003; Plainis and Murray 2000; see Fitting the α parameter below):
RT = k · (1/C) + t0    (2)
In this case, the slope k can be directly interpreted as the gain (rate)
at which the system accumulates sensory evidence about the presence
of a stimulus (k is inversely proportional to sensitivity, so that
sensitivity = 1/k). Interestingly, within this framework the parameter
k has been shown to be modulated by the different sensitivity of the
P- and the M-pathways to different levels of contrast and SFs (Murray
and Plainis 2003; Plainis and Murray 2000). Specifically, low values
of k (indicating higher sensitivity) have been reported for low SFs
compared with higher SFs (for a more detailed discussion of the
paradigm see Murray and Plainis 2003; Plainis and Murray 2000).
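A minimal sketch of this linear formulation is given below, with the group means later reported in Table 1 (visual-only, 0.74 c/° condition) used purely as example input; the ordinary least-squares fit is our assumed implementation, not necessarily the routine used for the analyses reported here.

# Sketch: estimating k (slope) and t0 (intercept) of Eq. 2 by linear regression
# of mean RT on the reciprocal of contrast (example data: Table 1, visual, 0.74 c/deg).
import numpy as np

contrast = np.array([0.050, 0.087, 0.153, 0.268, 0.469, 0.822])
rt = np.array([0.359, 0.323, 0.307, 0.300, 0.300, 0.293])   # s

k, t0 = np.polyfit(1.0 / contrast, rt, deg=1)   # RT = k*(1/C) + t0
print(f"k = {k:.4f} s, t0 = {t0:.3f} s, sensitivity 1/k = {1.0 / k:.1f}")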
In the present study, we extrapolate this framework to the multisensory domain in order to investigate how sound modulates the
processing of a visual (Gabor) stimulus. We aim to reveal potential
frequency-specific modulations by looking at RTs as a function of
stimulus contrast. That is, we contend that efficient visual processing
will lead to shallower slopes (lower k, indicating higher sensitivity). If
present, this reduction in RTs is expected to be stronger at low-contrast than at high-contrast visual stimuli, because there is more
room for improvement in sensory processing in that case.
Fig. 1. Schematic representation of a trial
sequence. After a randomly chosen interval
following the onset of the fixation cross
(500 –1,500 ms), a Gabor stimulus (visual
target) was presented on 80% of the trials
(therefore, this is the proportion of trials
requiring response). The remaining 20%
were catch trials (no visual target). Fifty
percent of the trials (across target-bearing
and catch trials) contained a sound temporally aligned with the visual event, thus
making an equal proportion of visual and
audio-visual trials. (The relative size of the
Gabor has been scaled up for clarity of
presentation; see details in text.)
In terms of cross-modal enhancement, it is predicted that if audio-visual interactions produce changes in the rate of accumulation of
sensory evidence, this will be reflected as reductions in k (i.e., the
slope)1 with respect to visual-only trials. Moreover, we hypothesize
that this slope modulation should reveal some specificity for low SFs,
in agreement with the results of previous investigations (Jaekl and
Soto-Faraco 2010; Leo et al. 2008). Other benefits due to sound that
are not strictly modulating sensitivity are also expected, but in this
case the modulation should concern the constant t0. Abrupt sounds are
indeed known to speed up motor responses independently of the
stimulus features (Nickerson 1973; Valls-Solé et al. 1995), by reducing temporal uncertainty (Lippert et al. 2007) or by altering the
decision making process (Doyle and Snowden 2001; Odgaard et al.
2003). These mechanisms are assumed to be equally effective across
different levels of contrast. Hence, we contend that decreases in RTs
caused by these higher-order decisional processes not originating in the
detection phase will thus be captured by the parameter t0, in agreement with Carpenter (2004) and Pins and Bonnet (1996). As mentioned above, the t0 term of Eq. 2 reflects the intercept of the function.
Pins and Bonnet (1996; see also Carpenter 2004) already tested how
the slope and the asymptote of the Piéron function vary under task
manipulations that involved, for example, stimulus uncertainty, to
demonstrate that high-order effects produce changes in the asymptotic
parameter t0 (while the slope remained unaltered).
1 Note that this pattern can be loosely related to the rule of inverse effectiveness, often present in the multisensory literature (Rowland et al. 2007; Stein et al. 2009). However, the conditions to put this idea to the test are not optimal, since in this case the strength of the acoustic event is not particularly weak.

Fitting the α parameter. As pointed out before, the oft-used simplification of the original Piéron function shown in Eq. 2 allows one to relate RTs to the reciprocal of contrast (C^−1) with slope k and intercept t0. This simplification involves the assumption that the parameter α in the original Piéron function is −1. We thought it
important to test this initial assumption, and did so in two ways. First,
we fitted the Piéron function in its original form to our data, leaving
α as a free parameter, and calculated its variability with a nonlinear
method (Marquardt-Levenberg algorithm). The result was in general
compatible with α = −1 (with only 4 exceptions from the 30
possible combinations; see Fig. 2). To secure further the viability of
this assumption, and considering that the nonlinear fitting of the
exponent is not fully conclusive and could depend on the algorithm
used, we also proceeded with an approach based on linear fitting.
We tested for the linearity of the data points when plotted against
the reciprocal of the contrast (therefore, this time keeping α fixed to −1). The linearity of the distributions was assessed with a likelihood ratio test (per subject and condition; P < 0.05 in all cases). Since both approaches led to converging evidence, it was safe to accept α = −1.
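The sketch below illustrates one way both checks could be run. The exact nested models used for the likelihood ratio test are not reported, so the linear-versus-quadratic comparison (assuming Gaussian residuals) is only an assumed instantiation; the Table 1 means for the visual-only 5.93 c/° condition are used as example data.

# Sketch: (i) free-exponent fit with the Levenberg-Marquardt algorithm
# (curve_fit's default method when no bounds are given) and (ii) a likelihood
# ratio test of linearity in 1/C (linear vs. quadratic, Gaussian residuals).
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

contrast = np.array([0.050, 0.087, 0.153, 0.268, 0.469, 0.822])
rt = np.array([0.377, 0.346, 0.325, 0.313, 0.302, 0.300])   # s

# (i) free exponent
popt, pcov = curve_fit(lambda C, k, expo, t0: k * C**expo + t0,
                       contrast, rt, p0=[0.005, -1.0, 0.28], maxfev=10000)
print("exponent:", round(popt[1], 2), "+/-", round(float(np.sqrt(pcov[1, 1])), 2))

# (ii) likelihood ratio: does a quadratic term in 1/C improve on the linear fit?
def gauss_loglik(residuals):
    n = residuals.size
    return -0.5 * n * (np.log(2 * np.pi * np.mean(residuals**2)) + 1)

x = 1.0 / contrast
res_lin = rt - np.polyval(np.polyfit(x, rt, 1), x)
res_quad = rt - np.polyval(np.polyfit(x, rt, 2), x)
lr = 2 * (gauss_loglik(res_quad) - gauss_loglik(res_lin))
print("LR =", round(lr, 2), " p =", round(float(chi2.sf(lr, df=1)), 3))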
Results and discussion. Mean RTs were calculated after filtering
out individual data points falling 2 SDs from the mean for each
distribution of RTs per condition and participant (<4.5% of the
original data was rejected). Simple visual inspection of the averaged
data in Fig. 3 shows how responses in audio-visual trials were
systematically faster and less variable than in visual-only trials (see
also Table 1).
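A minimal sketch of this per-condition 2-SD filter (the array below is hypothetical):

# Sketch: drop RTs more than 2 SDs away from the condition mean.
import numpy as np

def filter_2sd(rts):
    rts = np.asarray(rts, dtype=float)
    return rts[np.abs(rts - rts.mean()) <= 2 * rts.std()]

rts = [0.30, 0.29, 0.31, 0.28, 0.30, 0.32, 0.29, 0.31, 0.30, 0.95]
print(filter_2sd(rts))   # the 0.95 s response exceeds 2 SDs and is removed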
We adjusted the RT data as a linear function of the reciprocal of
contrast (Eq. 2) for each different visual frequency and modality
combination (Fig. 3) in order to estimate the corresponding slopes
(i.e., the gain parameter k) and t0. Below, we report statistically
significant results only, and unless otherwise noted, all other effects
and interactions were nonsignificant. First, we ran simple paired
Fig. 2. Individual α parameter values and their confidence intervals across visual spatial frequencies (SFs) for the visual (A) and audio-visual (B) conditions for the data in experiment 1. These values correspond to fits of the Piéron function when leaving α as a free parameter (see text for details). Each column represents 1 participant. Except for 4 of the 30 fitted values, the α parameter was not significantly different from −1.
Fig. 3. Response times (RTs) as a function of the reciprocal of contrast (1/C) for each SF (from left to right) and modality combination (plotted in different shades
of gray) in experiment 1. Small symbols represent individual means of each participant (each based on an average of 80 RTs, min = 60), whereas large symbols
represent the interparticipant average. Lines represent the linear least-squares regression fits (thin dotted and thick solid lines for individual and average data,
respectively). It can be appreciated that the slope of these fits (k parameter) is shallower in the audio-visual condition compared with the visual condition
at the low SFs. The thin solid black line in each plot represents the prediction of the probability summation model according to the LATER model.
t-tests on the k parameter across SFs within the visual modality
(threshold for significance set to P = 0.033 for multiple comparisons
adjustment; Benjamini and Hochberg 1995). This was done for
comparison with previous visual studies. As expected, maximal sensitivity (low k parameter) was seen for SF = 0.74 c/° compared with SF = 0.3 c/° [t(8) = −3.344, P = 0.01] and SF = 5.93 c/° [t(8) = 9.694, P < 0.001]. These data are thus consistent with Plainis and
Murray (2000) and in agreement with the typical contrast sensitivity
curves for SFs in the visual literature.
To test the effect of sound on visual sensitivity, we then ran an
ANOVA on the gain parameter k with modality of presentation (visual
vs. audio-visual) and visual SF (0.3, 0.74, and 5.93 c/°) as independent
variables. This analysis revealed a main effect of modality, confirming
that responses to audio-visual trials were faster than responses to
visual trials (F1,4 = 14.79, P = 0.018), and a significant interaction between modality and SF (F2,8 = 4.5925, P = 0.046). Following up
on this interaction, we ran paired t-tests between the audio-visual and
visual conditions at each level of SF. These revealed a significant
increase in sensitivity (i.e., lower k parameter) in the audio-visual
condition with respect to the visual-only condition for SF = 0.3 c/° [t(8) = −9.352, P < 0.001] and 0.74 c/° [t(8) = −5.548, P < 0.001]
but not for 5.93 c/° stimuli (Fig. 4A). This reduction in slope is
compatible with audio-visual interaction at detection stages, that is,
the system accumulates visual evidence at a faster rate when the sound
is present, thereby increasing sensitivity. Moreover, our data suggest
that this enhancement is stronger at low contrast values.
We ran another ANOVA on the intercepts (t0) extracted from the
fits to Eq. 2, using the same independent variables as before (SF and
modality). This analysis revealed a significant main effect of presentation modality caused by a shift in t0 indicating an overall RT
decrement (a speed-up of 35 ms on average) in the audio-visual
condition with respect to the visual-only condition (F1,4 = 97.24, P <
0.001). This enhancement was equivalent for all SFs tested (see Fig.
4B), as revealed by the lack of significant interaction between SF and
modality (F1,4 = 0.418, P = 0.672). The main effect of SF was also nonsignificant (F1,4 = 0.770, P = 0.494).
One possible concern with the modulation in slope arising from
these analyses is the potential impact of erroneous responses triggered
by the sound, which are only possible in the audio-visual condition.
To address this, we analyzed the proportion of false alarms (FAs,
corresponding to erroneous responses to catch trials). The percentage
of FAs in the no-sound catch trials was very close to 0% for all
participants and conditions (no significant differences). However,
participants committed an average of 17% FAs in the sound-present
catch trials (no differences between SFs). In any case, the linear
regression between parameter k and FA rate (Fig. 5) did not reveal any
significant or near-significant positive or negative correlation, suggesting that these FAs to sound were not causing variations in the
slope (k parameter) of the functions. On the other hand, the percentage
of misses (failing to respond to a target when present) was very low
overall (0.016% on average) and statistically equivalent across SFs
and presentation modality (as expected from the suprathreshold nature
Table 1. Visual and audio-visual RTs and latency of FA responses in auditory-only catch trials

                         Contrast
             0.050          0.087          0.153          0.268          0.469          0.822
SF, c/°      RT, s   SD     RT, s   SD     RT, s   SD     RT, s   SD     RT, s   SD     RT, s   SD

Visual
  0.3        0.384  0.146   0.328  0.072   0.311  0.064   0.303  0.050   0.296  0.060   0.294  0.050
  0.74       0.359  0.093   0.323  0.054   0.307  0.056   0.300  0.053   0.300  0.075   0.293  0.050
  5.93       0.377  0.059   0.346  0.052   0.325  0.062   0.313  0.052   0.302  0.058   0.300  0.046
Audio-visual
  0.3        0.304  0.097   0.275  0.055   0.265  0.057   0.260  0.054   0.255  0.043   0.254  0.041
  0.74       0.297  0.070   0.272  0.053   0.265  0.046   0.263  0.049   0.255  0.041   0.256  0.048
  5.93       0.330  0.059   0.299  0.050   0.283  0.057   0.271  0.039   0.264  0.037   0.264  0.058

FA (auditory-only catch trials)
  0.3        0.263  0.042
  0.74       0.266  0.048
  5.93       0.258  0.036

Values are averages and SDs of visual and audio-visual reaction times (RTs) and latency of false alarm (FA) responses in auditory-only catch trials. Although this is an averaged representation of the data, in the simulations we applied the LATER model considering each participant individually. SF, spatial frequency.
Fig. 4. A: interparticipant averages for audio-visual and visual sensitivities (1/k) in experiment 1. B: average audio-visual and visual values for the intercept (t0).
Results are presented separately for each SF. Each symbol represents the averaged interparticipant intercept, and error bars represent the associated SD.
of the stimuli). Please note that a signal detection approach (i.e.,
calculating parameter d′) is not applicable here, given that the paradigm attempts to address changes in visual sensitivity from differences in processing time (RT) and not from accuracy measurements
(i.e., FAs are mostly due to motor anticipations, not to incorrect
detections).
The results described above are clear in that RTs to visual events
are speeded up by sounds and that this enhancement comprises both
a gain in stimulus-driven information toward detection as well as a
more generic, across-the-board facilitation that can be attributed to
higher-order mechanisms. The former source of gain is sensitive to
stimulus parameters such as SF and aligns well with previous results
suggesting a stronger engagement of the magnocellular visual pathway in multisensory integration. The latter source of enhancement, on
the other hand, captures well the interpretations of some previous
studies showing the role of accessory sounds in reducing uncertainty
(Lippert et al. 2007) and speeding up motor pathways (Valls-Solé et
al. 1995).
Previous literature has often characterized the mechanism underlying multisensory enhancement by testing whether the data from the
cross-modal condition surpass the prediction of a probability summa-
Fig. 5. Correlation plot between visual and audio-visual individual sensitivity
(1/k) and % of false alarms in experiment 1.
tion model based on the unisensory data (e.g., Arnold et al. 2010;
Meyer et al. 2005). Despite some recent fundamental criticism depending on the paradigm involved (Otto and Mamassian 2012; Pannunzi et al. 2012), this approach has been used frequently and in a
variety of contexts to evaluate the nature of multisensory enhancement. Probability summation assumes that the speed-up in audio-visual trials arises from a statistical advantage when responses can be
based on two independent inputs that race toward reaching a threshold
(Green 1958; Quick 1974; Treisman 1998; Wuerger et al. 2003).
Given that, in the present experiment, participants were asked to
respond exclusively to the visual stimulus and not to the sounds,
probability summation would not appear adequate to model the
present task. We nevertheless consider it relevant to bridge our present
results with this common test for integration, as it has been quite
influential in the literature so far. We decided to do so in two ways:
First, we used the FA data from experiment 1 to simulate auditory RTs
in order to be able to calculate the outcome of a probability summation model (see analysis below). Second, we reproduced the stimulus
conditions of experiment 1 in a new test in which participants were
asked to respond to both modalities, thus effectively adopting the
standard paradigm of probability summation (experiment 2, reported
below).
Assessing probability summation. Judging from the FAs to sound-only trials (17% on average; Fig. 5), a percentage of the correct
responses in the audio-visual RT distributions could have been triggered by uncontrolled responses toward the auditory stimulus. We
decided to evaluate the possible impact of probability summation in
our results assuming that a percentage of the RTs to the audio-visual
trials originated from anticipations to sound. First, we obtained the
averages and SDs of the individual RTs to the visual-only stimuli and
those of FAs in catch trials with sound (Table 1; note that FA RT
distributions were filtered in the same manner as the correct RTs in
experiment 1).
To compute the prediction of the probabilistic summation model,
we applied a modified version of the LATER model. LATER assumes
that sensory signals start a race to threshold at an ergodic rate, with the
winning signal triggering the response (Carpenter and Williams 1995;
Logan et al. 1984). We adapted the model to the present experimental
design by assuming that only in a percentage of the audio-visual trials
would intrusions from FAs win the race to threshold against the visual
stimulus (this proportion was adjusted for each SF and observer,
according to the empirical data from individual auditory catch trial
conditions). Data of the probabilistic summation model were simulated by randomly selecting RTs from the visual and the FA distributions (9,000 simulations per point). Sensitivity (1/k) was computed
from the newly modeled chronometric distributions (represented in
Fig. 3 by the thin solid black line). The empirical audio-visual k
parameter was significantly lower (higher sensitivity) than the theoretical k calculated from the probability summation model for the 0.3
c/° and 0.74 c/° SFs [t(8) = −14.91, P < 0.0005 and t(8) = −3.484, P = 0.025, respectively], but there were no significant differences for
the 5.93 c/° condition. Thus, according to this analysis, while a
probability summation model based on intrusive FAs to sound may
eventually account for the RT shift in the 5.93 c/° SF, it cannot
account for the gains produced in the remaining, lower SFs.
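The simulation logic described above can be sketched as follows. Gaussian RT distributions are a simplifying assumption; the means and SDs are the Table 1 values for the 0.3 c/° condition and p_fa is the 17% average FA rate, and the authors' actual simulation code may differ.

# Sketch: probability summation via FA intrusions. On a fraction p_fa of
# simulated audio-visual trials, a response drawn from the false alarm
# (sound-triggered) RT distribution races the visual RT; the faster one wins.
import numpy as np

rng = np.random.default_rng(0)

def simulated_av_mean(mu_v, sd_v, mu_fa, sd_fa, p_fa, n=9000):
    rt_v = rng.normal(mu_v, sd_v, n)
    rt_fa = rng.normal(mu_fa, sd_fa, n)
    race = np.minimum(rt_v, rt_fa)      # FA intrusion races the visual response
    intrude = rng.random(n) < p_fa      # intrusions occur on a fraction of trials
    return np.where(intrude, race, rt_v).mean()

contrast = np.array([0.050, 0.087, 0.153, 0.268, 0.469, 0.822])
mu_v = np.array([0.384, 0.328, 0.311, 0.303, 0.296, 0.294])
sd_v = np.array([0.146, 0.072, 0.064, 0.050, 0.060, 0.050])
pred = [simulated_av_mean(m, s, mu_fa=0.263, sd_fa=0.042, p_fa=0.17)
        for m, s in zip(mu_v, sd_v)]
k_pred, t0_pred = np.polyfit(1.0 / contrast, pred, 1)   # slope predicted by the model
print(np.round(pred, 3), round(k_pred, 4))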
The analysis above helps to reassure us, in a new fashion, that
audio-visual responses in this experiment were genuinely benefited by
interactions at the stage of detection, not (only) mediated by intrusive
FAs to sound. However, there is an established literature suggesting
that errors and the consequent physiological error negativity (which reflects the manifestation of an error detection system that checks actual
behavior against an internal goal) have a marked effect on physiological processes, even when participants are not aware of their mistakes
(e.g., Nieuwenhuis et al. 2001). For this reason, an RT to FAs might
not be completely suitable for a conclusive test (or rejection) of the probability summation hypothesis. Furthermore, the size of the FA RT
distributions was based on 25 trials on average per participant and SF,
thus making for a low reliability of the corresponding mean and SDs.
We therefore decided, in experiment 2, to apply the RMI in a more
traditional form (Miller 1982; Raab 1962), in order to validate our
paradigm and the results of experiment 1 against a host of previous
literature.
Experiment 2
In this experiment we applied the RMI (Cappe et al. 2009; Colonius
and Arndt 2001; Laurienti et al. 2004; Leo et al. 2008; Raab 1962) by
directly collecting responses to unisensory visual and auditory trials
and estimating Miller’s bound (Ulrich et al. 2007) for our stimuli. As
noted above, recent studies (e.g., Otto and Mamassian 2012) have
argued that Miller’s bound might not be adequate to reveal sensory
interactions beyond purely statistical advantage. It is not within the
scope of this study to claim otherwise, but this paradigm should
suffice to back up the results from experiment 1 by comparison with
a known and widely used approach to multisensory interactions.
According to past literature, Miller’s bound is the maximal RT gain
that can be explained by purely statistical facilitation when decisions
are based on two independent sources of information. This is represented mathematically by Eq. 3, where the probability of detecting an
audio-visual signal (AV) is bounded by the sum of the individual probabilities of
detecting the visual (V) and the auditory (A) signals.
p(T < t | AV) ≤ p(T < t | V) + p(T < t | A)    (3)
If the results from experiment 1 were to be confirmed in this paradigm,
we would expect to find the largest specific RT enhancement for the
audio-visual RTs (relative to the unimodal RTs) at the lowest SF
conditions.
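For reference, a sketch of how Miller's bound (Eq. 3) can be evaluated from empirical RT samples is given below; the data are simulated placeholders standing in for one participant and condition, not the study's data.

# Sketch: race-model (Miller's) bound from empirical unisensory RT samples,
# compared against the audio-visual empirical cumulative distribution.
import numpy as np

def ecdf(sample, t):
    """Empirical cumulative distribution of `sample` evaluated at times t."""
    return np.searchsorted(np.sort(sample), t, side="right") / sample.size

def miller_bound(rt_v, rt_a, t):
    return np.minimum(1.0, ecdf(rt_v, t) + ecdf(rt_a, t))

rng = np.random.default_rng(1)
rt_v = rng.normal(0.38, 0.07, 45)    # visual-only RTs (s), placeholder
rt_a = rng.normal(0.30, 0.05, 180)   # auditory-only RTs (s), placeholder
rt_av = rng.normal(0.27, 0.05, 45)   # audio-visual RTs (s), placeholder

t = np.linspace(0.15, 0.60, 200)
violated = ecdf(rt_av, t) > miller_bound(rt_v, rt_a, t)   # where AV beats the bound
print(violated.any())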
Methods. PARTICIPANTS. Seven new volunteers naive to the purpose of the experiment (23–29 yr old; 4 women, 3 men) were recruited
among students from the University of Barcelona. All had normal
hearing and normal or corrected-to-normal vision by self-report and
gave informed consent prior to participation. The study received the
approval of the University of Barcelona ethical committee.
MATERIALS AND PROCEDURE. The experimental apparatus, timeline, and stimuli were identical to those used in experiment 1, except
that we simplified the design by using only two SFs (0.3 and 5.93 c/°,
lowest and highest in experiment 1, respectively), and two levels of
contrast (0.05 and 0.822 Michelson contrast units, lowest and highest
in experiment 1).
Participants were asked to respond as fast as possible to any
stimulus (visual, auditory, or audio-visual, indiscriminately). Each participant ran 600 trials divided into 3 blocks of 200 trials (45 min in
total) where all SFs and contrasts appeared randomly intermixed.
Prior to the experimental session, participants dark-adapted for 5
min and performed 40 warm-up trials. In total every subject
performed 45 valid trials for each visual, audio-visual, SF, and
contrast condition and 180 auditory trials (in order to have the
same number of visual, auditory, and audio-visual trials per block).
Each block also included catch trials (amounting to 10% of the
total) where no stimulus was presented and thus participants had to
refrain from responding.
Results and discussion. Just as in experiment 1, we filtered RTs
below and above 2 SDs from the mean for each condition and
participant (leading to the rejection of <6% of the data) and obtained the empirical cumulative distribution functions. Participants responded to <4% of the catch trials; therefore erroneous anticipations can be ruled out as making a major contribution to the pattern of results. A 2 × 2 × 3 (contrast × SF × modality) repeated-measures ANOVA was conducted on the RTs. This analysis revealed a significant main effect of SF (F1,6 = 9.15, P = 0.0023), contrast level (F1,6 = 208.87, P < 0.0001), and modality (F2,12 = 168.62, P < 0.0001). All the interactions were also significant (all P < 0.001 except SF × contrast, which was marginally significant, F1,6 = 0.439, P = 0.0532).
As can be seen in Fig. 6, RTs were faster for high-contrast and for
low-SF visual stimuli, in consonance with previous literature (Breitmeyer 1975; Legge 1978) and with the results of experiment 1.
Moreover, RTs were also generally faster for the auditory-only stimuli
than for the visual-only stimuli. According to our expectations,
responses to audio-visual stimuli were significantly faster than the
visual-only and auditory-only responses in all conditions, with the
exception of the low-contrast 5.93 c/° SF condition, where auditory-alone RTs were as fast as audio-visual RTs. Interestingly, the audio-visual absolute enhancement was larger at the high-contrast conditions, contradicting what the inverse effectiveness law would predict.
However, it is important to point out that considering direct RTs to
make the comparison might be misleading. For instance, if we
consider the cumulative RT distribution functions presented in Fig. 6,
bottom, it is also evident that the probabilistic overlap between visual
and auditory unisensory conditions is also larger at high-contrast
conditions. This means that, just by means of probability summation, the overall audio-visual response speed-up should be larger there, mostly for statistical reasons. Thus we calculated the relative index of
enhancement for each type of visual stimulus by obtaining the Miller’s
bound (for a specific description of the method, see Ulrich et al. 2007).
This is normally considered the maximal possible speed-up in RT
obtained by simple probabilistic advantage when two signals are
independent.
Data were binned in 10 quantiles (10% increments), and at each
quantile (see Fig. 7) we tested the alternative hypothesis that audio-visual RTs were faster than the prediction of the Miller's bound. The
shaded areas in Fig. 7 indicate the quantiles where the race model was
violated (i.e., the audio-visual RT was significantly faster). The 0.3 c/°
SF manifested the highest proportion of such violations, with significant differences in 6 quantiles in the low-contrast condition and 4 in
the high-contrast condition. The analysis of the 5.93 c/° SF did not
reflect any enhancement for the low-contrast condition and only three
significant violations in the high-contrast condition. In all cases, all
significant violations of the race model encompassed adjacent time
bins.
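A sketch of the quantile-wise test just described follows, under the assumption that it was run as a one-tailed paired t-test across participants at each decile; the arrays below are hypothetical.

# Sketch: per participant, compare the audio-visual RT at each decile against
# the RT predicted by Miller's bound at the same decile (one-tailed, paired).
import numpy as np
from scipy.stats import ttest_rel

def decile_rts(rts):
    """RTs at the 10%, 20%, ..., 100% points of the cumulative distribution."""
    return np.quantile(rts, np.linspace(0.1, 1.0, 10))

def race_violations(av_q, bound_q, alpha=0.05):
    # av_q, bound_q: (participants x 10 deciles); test whether AV is faster
    stat, p_two = ttest_rel(av_q, bound_q, axis=0)
    p_one = np.where(stat < 0, p_two / 2, 1.0)
    return p_one < alpha     # True where the race model is violated

rng = np.random.default_rng(2)
av_q = np.vstack([decile_rts(rng.normal(0.26, 0.04, 45)) for _ in range(7)])
bound_q = av_q + rng.normal(0.010, 0.005, av_q.shape)   # bound slightly slower here
print(race_violations(av_q, bound_q))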
The results of experiment 2 shed new light on the interpretation of
the results of experiment 1. In particular, the reduction in the slope at
low SFs in experiment 1, according to the analysis of the Piéron
functions, seems to correspond to the specific sensitivity enhancement
found for the low SF in experiment 2, thus satisfying a convergence criterion.
According to previous literature, changes in the slope of the Piéron function are largely determined by the low-contrast values (Murray and
Plainis 2003). The inverse effectiveness principle, often proposed in
Fig. 6. Interparticipant average RTs in experiment 2 (error bars represent confidence
intervals) plotted as a function of contrast
and modality. The 2 tested SFs are represented in different columns. Insets: plots of
cumulative Gaussians fitted to the empirical
cumulative distribution functions for the
RTs in the auditory (light gray and dashed
line), visual (medium gray and dotted line),
and audio-visual (dark gray and continuous
line) conditions.
multisensory integration, might be a logical candidate to explain our
results, given that the Miller’s bound violation in experiment 2 was
greater (for SF = 0.3 c/°) at the low-contrast condition.2

2 Note that there was also a Miller's bound violation at the SF = 5.93 c/° high-contrast condition. Although it might seem contradictory with what the rule of inverse effectiveness postulates, this result can be accommodated because high-contrast stimuli, regardless of frequency, tend to engage the magnocellular visual pathway as well (as pointed out by different authors; Mitov and Totev 2005; Thomas et al. 1999).
This conclusion is in principle independent of the potential problem that the race model may not be a good test of probability summation, as recently pointed out by Otto and Mamassian
(2012). In short, we use the race model here as a relative index to
quantify the benefits produced across SFs in audio-visual trials. The
main conclusion to be extracted from this experiment is that our
results support the paradigm applied in experiment 1, showing that
multisensory interactions are larger when visual stimuli correspond to
low SFs and contrast levels.
GENERAL DISCUSSION
The brain combines incoming sensory information from
various modalities in order to optimize the organism’s interaction with the environment. Various animal and human studies have highlighted the putative neural mechanisms supporting these enhancements (Calvert et al. 2001; Meredith and
Stein 1986; Schroeder and Foxe 2005; Smiley and Falchier
2009; Watkins et al. 2006). However, one important, and yet
unresolved, question refers to the specific case of the putative
visual enhancement produced by concurrent sounds. As highlighted in the introduction, previous RT studies addressing
audio-visual integration (Cappe et al. 2009; Colonius and
Arndt 2001; Laurienti et al. 2004; Leo et al. 2008; Miller 1982)
have reported improvements in detection that cannot be accounted for solely as a result of a statistical advantage. Nevertheless, while these studies have consistently revealed the
existence of cross-modal enhancements, they cannot dissociate
and quantify the different sources contributing to the cross-modal benefit. One important constraint related to the classical
approach based on the RSE is that responses in this task are
given to any stimulus indiscriminately (i.e., visual or auditory),
and consequently the RSE paradigm fails to describe with
precision detection changes within one specific sensory modality. Hence, we decided to apply a different paradigm based on
the analysis of Piéron functions on visual response latencies, in
order to study specific changes in visual processing resulting
from stimulus-driven interactions with a concurrent sound. Our
results provide evidence for the coexistence of two functionally
distinct mechanisms that support the enhancement of vision by
sound: First, presenting a sound synchronously with a visual
stimulus can decrease RTs as a result of processes not necessarily supported by sensory interactions. This source of enhancement may be a consequence of the superior temporal
resolution of the auditory system (Lippert et al. 2007), in
addition to (or in combination with) the potential alerting
effects of sounds leading to increases in overt motor activation,
shortening response latency (startle effect; Valls-Solé et al.
1995). In line with the nonsensory specific nature of these
mechanisms, their contribution to RT speed-up was equivalent
across all intensities of visual contrast and SFs tested. Such
Fig. 7. Race model inequality tests in experiment 2. Black squares represent the empirical average RTs in the audio-visual condition
binned in 10 quantiles, while gray squares
represent the prediction of the probability summation model obtained from the auditory-only
and visual-only RT distributions (Miller’s
bound). Error bars represent SE. Shaded areas
indicate quantiles where the race model was
significantly violated (P < 0.05).
nonsensory specific enhancement was clearly reflected in the
reduction in the asymptotic term of the Piéron functions.
Most importantly, however, our results also provide strong
evidence for another source of enhancement, based on sensory
interactions between visual and auditory inputs at the detection
stage. This interaction was clearly dependent on stimulus
variations and, as we contend, it would be produced by a
modulation of the rate at which visual sensory evidence is
accrued (though, alternatively, it could also be based on a
reduction of the sensory threshold needed to elicit a response;
see below). In any case, this effect was reflected in the change
of the slope of the Piéron function (gain parameter k), which is
interpreted as an increase in the sensitivity of the visual system.
From a computational point of view, this advantage in detection can be explained in at least two alternative ways. On one
hand, visual and auditory signals may be integrated following
a linear summation model that assumes convergence of the
sensory inputs from the two modalities into a single physiological or psychophysical channel before a decision regarding
the presence of the stimulus (Meyer et al. 2005; López-Moliner
and Soto-Faraco 2007; Morrone et al. 1995). A second possibility is that the sound reduces the sensory threshold (for
example, by increasing visual cortex excitability; Romei et al.
2009) to trigger the response. (Please note that although the
main conclusions of this report are not affected by which of
these two particular alternatives is true, we attempted to tell
these two models of enhancement apart at the detection level,
without conclusive results; see APPENDIX). What remains clear
so far is that, regardless of the particular mechanism, an
important part of the auditory-induced enhancement of vision
reported here is due to changes at the detection stages of
stimulus processing.
To validate the original paradigm applied in experiment 1,
and bridge our results with extant multisensory literature, in
experiment 2 we followed the classical RSE approach. We
showed that sensory interactions conformed well to the pattern
of findings in experiment 1. Interestingly, and in stark contrast
with the nonspecific component of the audio-visual enhancement, both experiments converged in showing that improvements at the detection level of processing were selective for
low SFs. Further analyses of the slopes in experiment 1
allowed us to address this audio-visual enhancement across the
different levels of contrast. As demonstrated by Murray and
Plainis (2003), the low-intensity contrast region determines the
slope; hence the marked reduction of the slope in the sound-present condition could be construed as a behavioral analog of
the inverse effectiveness rule, often cited in the multisensory
integration literature (Meredith and Stein 1983; Wallace et al.
1998), that is, the lower the contrast intensity, the higher the
benefit produced by the sound (Fig. 3, Fig. 7) and, in consequence, the audio-visual gain. Congruently, experiment 2 also
yielded larger sensory interactions for the critical low-SF
condition at the low-contrast stimuli.
One might perhaps think that the reductions in RT in the
audio-visual conditions could be accounted for in terms of
response preparation (Nickerson 1973). According to this perspective, when one target and one accessory stimulus are
presented in temporal synchrony, the longer the processing
time for the target, the more time the accessory stimulus has to
work as a warning signal and prepare the system for responding to that target, leading to advantages in RTs. However, our
present results in experiments 1 (Table 1) and 2 (Fig. 6)
showed that while visual latencies are longer for the high SF,
the multisensory benefits in RT are more likely to occur at the
low SFs, thus effectively rejecting response preparation as a
viable account for the present pattern of results.
Most previous studies conducted in multisensory research
have not directly addressed possible differences in audio-visual
integration as a function of the well-known functional division
in visual processing between M- and P-pathways. In the
present study, the component of enhancement linked to the
detection stage revealed here was constrained to low SFs, and
thus the present findings bear on the potentially important role
of the low-frequency-tuned magnocellular visual pathway in
multisensory detection (see Jaekl and Soto-Faraco 2010 for
related findings). This fits well with prior knowledge concerning the physiology underlying low-level audio-visual interaction in the M-pathway and its functionality in terms of mediating detection of intensity changes. Moreover, from a neurophysiological perspective this dominance of the M-pathway in
multisensory interaction for detection is indirectly supported
by three additional lines of evidence. First, the SC, which has
been frequently considered as determinant for multisensory
enhancement (Leo et al. 2008; Meredith and Stein 1986, 1996),
receives connections from the magnocellular layers of the
lateral geniculate nucleus (LGN) of the thalamus via cortical
area V1 (Berson 1988), whereas evidence for connections from
the P-pathway is scant (Livingstone and Hubel 1988; Merigan
and Maunsell 1993). Second, anatomical inputs into V1 from
the auditory cortex occur primarily in the representation of the
peripheral visual field (Falchier et al. 2002), a region of space
that is mainly represented by the M-pathway (Silveira and
Perry 1991). Third, another relevant feature of the M-pathway
is its specific sensitivity to transient stimulation. Some recent
studies have suggested that pure transient information is integrated (Andersen and Mamassian 2008) and that it could be determinant for multisensory associations (Van der Burg et al. 2010;
Werner and Noppeney 2011).
Although our results support the important contribution of
the M-pathway to multisensory interactions in detection, it is
important to take into account the fact that the features of the
experiment probably prioritize fast detection responses and, in
consequence, processing within this particular visual pathway, which has shorter latencies. Therefore, our results do not
necessarily negate a possible role of the P-pathway in driving
multisensory interactions in other kinds of tasks more specifically tapping into parvocellular processing (e.g., cross-modal
object recognition; Giard and Peronnet 1999; Jaekl and Harris
2009; Molholm et al. 2002). Nevertheless, more research
specifically addressing this dissociation should be considered
in the future.
Conclusion

The present psychophysical analysis of RTs allowed us to shed light on the contribution of the different sources of interaction leading to sound-induced enhancement of vision, one of the most often cited but still controversial cases of cross-modal gain. First, the results of this study provide clear evidence that concurrent auditory signals increase sensitivity to visual stimuli at a detection-specific level. This increase in sensitivity is largest at low SFs and low contrast values of the visual stimulus. Second, the present paradigm allowed us to isolate and quantify the reduction in RTs to audio-visual stimuli that arises from more general processes, separable from the reduction produced by interactions at the detection stage. Although previous literature had hinted at a potential mixed contribution of detection-level and decision-level processes to multisensory enhancement, clear-cut evidence for the contribution of each type of source within the same paradigm was scant.

APPENDIX

Linear Summation Hypothesis

Under the linear summation hypothesis, the reduction in slope (gain parameter k) observed for the lower SFs is explained by the stimulus energy reaching the detection threshold θ at a faster rate because of the concomitant presence of a sound that is linearly added to the visual signal. The slope (k = θ/c) consists of a threshold θ and a rate c that determines the accumulation toward this threshold when the channel is stimulated by a given amount of energy E. According to this description, an accurate estimate of k depends on how well the energy values (E) of the stimuli used to fit the model are chosen. In our main analyses of experiment 1 we modeled the RTs as a function of visual contrast alone, as this was the modality the participants were asked to respond to. Hence, here we evaluate the possibility that the race to the threshold in the audio-visual condition was also determined by the auditory signal, at least for the lower SFs.

To determine whether the present data are consistent with a linear summation model, we estimated the specific contribution of the auditory stimulus to the different audio-visual conditions. Specifically, we fitted Eq. A8 (see Mathematical Derivation of Linear Summation Model below), including an additional term (Ea) as the only free parameter to account for the contribution of the auditory stimulus. We set k = θ/c for each SF to its respective value estimated from the visual-alone condition, under the assumption that the threshold and the gain of the channel do not depend on the nature of the stimulation; that is, we consider that the threshold θ used to trigger the response is the same for both stimulation conditions (visual only and audio-visual). We set t0 to the values obtained with the Piéron equation in the audio-visual condition, as this reflects the nonspecific enhancement of RT due to sound. The linear model with this single free parameter fitted the data remarkably well. As dictated by the physical stimuli shown to the participants, Ea denotes a constant stimulation (i.e., the sound was always the same). However, the estimated contribution of this parameter to the final RT is inversely proportional to contrast for each SF and, importantly, inversely proportional to SF: the estimated values of Ea were 0.053 and 0.035 for the 0.3 and 0.74 c/° SFs, whereas that for 5.93 c/° was nearly 0. In other words, the contribution of the sound to visual detection is greatest at low SFs. Importantly, this selective contribution accurately describes the data while keeping the slope (channel gain) at the same value as that obtained in the visual-alone condition.
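As an illustration of this single-free-parameter fit, the sketch below (assuming Python with NumPy/SciPy; the contrast values, RTs, and the frozen k and t0 are invented placeholders rather than values from the study) estimates Ea for one SF with k and t0 held fixed:

```python
# Minimal sketch of the Eq. A8 fit described above: k (= theta/c) and t0 are frozen
# at values taken from other fits, and Ea is the only free parameter.
# All numeric values below are invented for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def make_av_model(k, t0):
    """Return Eq. A8 as a function of contrast E with only Ea left free."""
    def rt_av(E, Ea):
        return k * (E + Ea) ** (-1) + t0
    return rt_av

k_vis, t0_av = 0.020, 0.22                             # assumed slope and intercept (placeholders)
contrast = np.array([0.05, 0.10, 0.20, 0.40, 0.80])    # visual contrast E
rt_av_obs = np.array([0.43, 0.36, 0.31, 0.27, 0.25])   # fabricated audio-visual mean RTs (s)

(Ea_hat,), _ = curve_fit(make_av_model(k_vis, t0_av), contrast, rt_av_obs,
                         p0=[0.02], bounds=(0.0, 1.0))
print(f"estimated auditory contribution Ea = {Ea_hat:.3f}")
```

Repeating the same fit separately for each SF yields one Ea per SF, which can then be compared across SFs in the way described above.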
Mathematical Derivation of Linear Summation Model

The means of visual RTs measured as a function of stimulus contrast are usually well described by the Piéron function:

RT = k·C^(−α) + t0   (A1)
Let us assume that the response mechanism is based on an accumulator model. The stimulus-dependent latency component (t = k·C^(−α)) is then a function of the time it takes for the mechanism to accumulate evidence (E, contrast in our case) until it reaches a threshold (θ):

t = θ/(cE)   (A2)
where cE denotes the rate at which the system accumulates the evidence. A more general form of Eq. A2 underlies decisional response mechanisms such as the LATER model (Carpenter and Williams 1995). RT can then be expressed as

RT = θ/(cE) + t0   (A3)
Combining Eqs. A2 and A3 and taking logarithms, we obtain

log t = log(RT − t0) = log(θ/c) − log E   (A4)
A more general form of Eq. A4 is obtained by introducing the exponent α as an extra free parameter (Eq. A4 corresponds to α = 1) and letting k = θ/c:

log(RT − t0) = log t = log k − α·log E   (A5)

Taking exponentials on both sides of Eq. A5, we obtain the Piéron function (Eq. A1):

RT = k·E^(−α) + t0   (A6)
Therefore the larger the rate at which the sensory evidence accumulates, the smaller the parameter k.
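For concreteness, here is a minimal sketch of how Eq. A6 can be fit to visual-alone mean RTs (Python with SciPy assumed; the contrast and RT values are fabricated for illustration and are not data from this study):

```python
# Minimal sketch: least-squares fit of the Pieron function RT = k * E**(-alpha) + t0 (Eq. A6)
# to visual-alone mean RTs. All numbers are invented placeholders.
import numpy as np
from scipy.optimize import curve_fit

def pieron(E, k, alpha, t0):
    return k * E ** (-alpha) + t0

contrast = np.array([0.05, 0.10, 0.20, 0.40, 0.80])   # stimulus energy E (contrast)
mean_rt = np.array([0.46, 0.40, 0.36, 0.33, 0.31])    # fabricated mean RTs (s)

# Fit k, alpha, and t0; bounds keep the parameters in a plausible range.
(k, alpha, t0), _ = curve_fit(pieron, contrast, mean_rt,
                              p0=[0.05, 1.0, 0.25],
                              bounds=([0, 0, 0], [np.inf, 3, 1]))
print(f"k = {k:.3f}, alpha = {alpha:.2f}, t0 = {t0:.3f} s")
```

The fitted k and t0 from such a visual-alone fit are the kind of quantities that are held fixed in the audio-visual fits discussed above.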
Any specific sensory effect must then affect the stimulus-dependent
part of Eq. A1:
t = (θ/c)·E^(−1) = θ/(cE)   (A7)
If a visual signal (E) and an auditory signal (Ea, constant in our case) were integrated within a single channel during the race to the threshold θ, without changing the rate c, in a linear summation fashion, then

t = θ/[c(E + Ea)] = (θ/c)·(E + Ea)^(−1)   (A8)
The true sensory benefit could then be captured in the audio-visual condition by an RT model that takes into account an auditory signal in the input, with a slope k = θ/c not different from the visual slope and an additive parameter t0 not different from the estimate obtained with Eq. A6, in which the auditory signal is not considered.
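To see why a constant Ea produces the contrast-dependent pattern reported above, a quick numeric illustration follows (the k and Ea values are invented placeholders, not fitted values):

```python
# Invented numbers illustrating Eq. A8: a constant auditory term Ea reduces the
# stimulus-dependent latency k/E by k/E - k/(E + Ea), an amount that shrinks as contrast E grows.
k, Ea = 0.020, 0.05   # assumed slope (theta/c) and auditory contribution; placeholders only
for E in (0.05, 0.20, 0.80):
    benefit = k / E - k / (E + Ea)
    print(f"contrast {E:.2f}: stimulus-dependent latency drops by about {1000 * benefit:.0f} ms")
```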
Visual Threshold Reduction by Sound Hypothesis
Alternatively, the reduction of the slope value at the lower SFs could be caused by an increase in the excitability of the system produced by the sound, leading to a reduction (Ha) of the threshold θ in the audio-visual conditions:

RT = [(θ − Ha)/c]·E^(−1) + t0   (A9)
Therefore, we fitted Eq. A9 with fixed values of θ and c whose ratio (θ/c) matched the slope observed in the visual condition (i.e., Ha = 0.0). In this way, we obtained an approximation of Ha for the different SFs. The estimated values were 2.07 and 1.977 for 0.3 c/° and 0.74 c/°, respectively, and nearly 0.0 for the high-frequency condition (5.93 c/°). This pattern was very similar to that obtained by assuming changes in the rate of sensory evidence accumulation; however, after statistical comparison of the goodness of fit of the linear summation model against the varying-threshold model, the linear summation model fared slightly better at all SFs, although significantly so only for the 0.3 c/° SF [F(2,3) = 15.64, P = 0.023]. We do not interpret these fits as conclusive about which particular mechanism best explains the results; nevertheless, we can claim that the changes in slope are linked to sensory processing, independently of the model applied, and are specifically larger for the low-SF visual channels than for the high-SF channels, highlighting their stimulus dependence.
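A generic way to compare the residual fits of the two single-parameter models is sketched below (this is only a variance-ratio comparison under stated assumptions, not necessarily the exact test used here; the residual sums of squares and degrees of freedom are invented):

```python
# Sketch: compare the goodness of fit of the linear summation (Eq. A8) and
# threshold reduction (Eq. A9) models at one SF via an F ratio of residual variances.
# The RSS and degrees-of-freedom values are invented placeholders.
from scipy import stats

def f_compare(rss_worse, rss_better, df_worse, df_better):
    """F ratio of residual variances; a small P favors the better-fitting model."""
    f = (rss_worse / df_worse) / (rss_better / df_better)
    p = stats.f.sf(f, df_worse, df_better)
    return f, p

rss_threshold, df_threshold = 4.2e-4, 3   # Eq. A9 fit (placeholder)
rss_linear, df_linear = 1.1e-4, 3         # Eq. A8 fit (placeholder)
f_val, p_val = f_compare(rss_threshold, rss_linear, df_threshold, df_linear)
print(f"F = {f_val:.2f}, P = {p_val:.3f}")
```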
ACKNOWLEDGMENTS
We thank Dr. Phil Jaekl for his helpful comments on an earlier version of
this article.
GRANTS
This research was supported by the Spanish Ministry of Science and
Innovation (PSI2010-15867, PSI2010-15426, and Consolider INGENIO
CSD2007-00012), Comissionat per a Universitats i Recerca del DIUE-Generalitat de Catalunya (SGR2009-092, SGR2009-308), and the European Research Council (StG-2010263145).
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s).
AUTHOR CONTRIBUTIONS
Author contributions: A.P.-B. performed experiments; A.P.-B. and J.L.-M.
analyzed data; A.P.-B., S.S.-F., and J.L.-M. interpreted results of experiments;
A.P.-B. and J.L.-M. prepared figures; A.P.-B., S.S.-F., and J.L.-M. drafted
manuscript; A.P.-B., S.S.-F., and J.L.-M. edited and revised manuscript;
A.P.-B., S.S.-F., and J.L.-M. approved final version of manuscript; A.P.-B. and J.L.-M. conception and design of research.
REFERENCES
Andersen TS, Mamassian P. Audiovisual integration of stimulus transients.
Vision Res 48: 2537–2544, 2008.
Arnold DH, Tear M, Schindel R, Roseboom W. Audio-visual speech cue
combination. PloS One 5: e10217, 2010.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical
and powerful approach to multiple testing. J R Stat Soc B 57: 289 –300,
1995.
Berson DM. Retinal and cortical inputs to cat superior colliculus: composition,
convergence and laminar specificity. Prog Brain Res 75: 17–26, 1988.
Bolognini N, Frassinetti F, Serino A, Làdavas E. “Acoustical vision” of
below threshold stimuli: interaction among spatially converging audiovisual
inputs. Exp Brain Res 160: 273–282, 2005.
Breitmeyer BG. Simple reaction time as a measure of the temporal response
properties of transient and sustained channels. Vision Res 15: 1411–1412,
1975.
Calvert GA, Hansen PC, Iversen SD, Brammer MJ. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage 14: 427–438, 2001.
Cappe C, Thut G, Romei V, Murray MM. Selective integration of auditory-visual looming cues by humans. Neuropsychologia 47: 1045–1052, 2009.
Carpenter RH. Contrast, probability, and saccadic latency: evidence for independence of detection and decision. Curr Biol 14: 1576–1580, 2004.
Carpenter RH, Williams ML. Neural computation of log likelihood in
control of saccadic eye movements. Nature 377: 59 – 62, 1995.
Colonius H, Arndt P. A two-stage model for visual-auditory interaction in
saccadic latencies. Percept Psychophys 63: 126 –147, 2001.
Corneil BD, Van Wanrooij M, Munoz DP, Van Opstal AJ. Auditory-visual
interactions subserving goal-directed saccades in a complex scene. J Neurophysiol 88: 438 – 454, 2002.
Diederich A, Colonius H. Bimodal and trimodal multisensory enhancement:
effects of stimulus onset and intensity on reaction time. Percept Psychophys
66: 1388 –1404, 2004.
Doyle MC, Snowden RJ. Identification of visual stimuli is improved by
accompanying auditory stimuli: the role of eye movements and sound
location. Perception 30: 795– 810, 2001.
Driver J, Spence C. Cross-modal links in spatial attention. Philos Trans R Soc
Lond B Biol Sci 353: 1319 –1331, 1998.
Driver J, Noesselt T. Multisensory interplay reveals crossmodal influences on
“sensory-specific” brain regions, neural responses, and judgments. Neuron
57: 11–23, 2008.
Ernst MO, Bülthoff HH. Merging the senses into a robust percept. Trends
Cogn Sci 8: 162–169, 2004.
Falchier A, Clavagnier S, Barone P, Kennedy H. Anatomical evidence of
multimodal integration in primate striate cortex. J Neurosci 22: 5749 –5759,
2002.
Frassinetti F, Bolognini N, Làdavas E. Enhancement of visual perception
by crossmodal visuo-auditory interaction. Exp Brain Res 147: 332–343,
2002.
Giard MH, Peronnet F. Auditory-visual integration during multimodal object
recognition in humans: a behavioral and electrophysiological study. J Cogn
Neurosci 11: 473– 490, 1999.
Green DM. Detection of multiple component signals in noise. J Acoust Soc
Am 30: 904 –911, 1958.
Harwerth RS, Levi DM. Reaction time as a measure of suprathreshold grating
detection. Vision Res 18: 1579 –1586, 1978.
Jaekl PM, Harris LR. Sounds can affect visual perception mediated primarily
by the parvocellular pathway. Vis Neurosci 26: 477– 486, 2009.
Jaekl PM, Soto-Faraco S. Audiovisual contrast enhancement is articulated
primarily via the M-pathway. Brain Res 1366: 85–92, 2010.
Kinchla RA. Detecting target elements in multielement arrays: a confusability model. Perception 15: 149–158, 1974.
Lakatos P, Chen CM, O’Connell MN, Mills A, Schroeder CE. Neuronal
oscillations and multisensory interaction in primary auditory cortex. Neuron
53: 279 –292, 2007.
Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT. Semantic
congruence is a critical factor in multisensory behavioral performance. Exp
Brain Res 158: 405–414, 2004.
Legge GE. Sustained and transient mechanisms in human vision: temporal and
spatial properties. Vision Res 18: 69 – 81, 1978.
Leo F, Bertini C, di Pellegrino G, Làdavas E. Multisensory integration for
orienting responses in humans requires the activation of the superior colliculus. Exp Brain Res 186: 67–77, 2008.
Levi DM, Harwerth RS, Manny RE. Suprathreshold spatial frequency
detection and binocular interaction in strabismic and anisometropic amblyopia. Invest Ophthalmol Vis Sci 18: 714 –725, 1979.
Lippert M, Logothetis NK, Kayser C. Improvement of visual contrast
detection by a simultaneous sound. Brain Res 1173: 102–109, 2007.
Livingstone MS, Hubel DH. Do the relative mapping densities of the magno- and parvocellular systems vary with eccentricity? J Neurosci 8: 4334–4339, 1988.
Logan GD, Cowan WB, Davis KA. On the ability to inhibit simple and choice
reaction time responses: a model and a method. J Exp Psychol Hum Percept
Perform 10: 276 –291, 1984.
López-Moliner J, Soto-Faraco S. Vision affects how fast we hear sounds
move. J Vis 7: 6.1– 6.7, 2007.
Manjarrez E, Mendez I, Martinez L, Flores A, Mirasso CR. Effects of
auditory noise on the psychophysical detection of visual signals: cross-modal stochastic resonance. Neurosci Lett 415: 231–236, 2007.
McDonald JJ, Teder-Sälejärvi WA, Hillyard SA. Involuntary orienting to
sound improves visual perception. Nature 407: 906 –908, 2000.
McDonald JJ, Teder-Sälejärvi WA, Ward LM. Multisensory integration
and crossmodal attention effects in the human brain. Science 292: 1791,
2001.
McDonald JJ, Teder-Sälejärvi WA, Di Russo F, Hillyard SA. Neural basis
of auditory-induced shifts in visual time-order perception. Nat Neurosci 8:
1197–1202, 2005.
Meredith MA, Stein BE. Visual, auditory, and somatosensory convergence
on cells in superior colliculus results in multisensory integration. J Neurophysiol 56: 640 – 662, 1986.
Meredith MA, Stein BE. Spatial determinants of multisensory integration in
cat superior colliculus neurons. J Neurophysiol 75: 1843–1857, 1996.
Meredith M, Stein B. Interactions among converging sensory inputs in the
superior colliculus. Science 221: 389 –391, 1983.
Merigan WH, Maunsell JH. How parallel are the primate visual pathways?
Annu Rev Neurosci 16: 369 – 402, 1993.
Meyer GF, Wuerger SM, Röhrbein F, Zetzsche C. Low-level integration of
auditory and visual motion signals requires spatial co-localisation. Exp
Brain Res 166: 538 –547, 2005.
Miller J. Divided attention: evidence for coactivation with redundant signals.
Cogn Psychol 14: 247–279, 1982.
Mitov D, Totev T. How many pathways determine the speed of grating
detection? Vision Res 45: 821– 825, 2005.
Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ.
Multisensory auditory-visual interactions during early sensory processing in
humans: a high-density electrical mapping study. Cogn Brain Res 14:
115–128, 2002.
Morrell F. Visual system’s view of acoustic space. Nature 238: 44 – 46, 1972.
Morrone MC, Burr DC, Vaina LM. Two stages of visual processing for
radial and circular motion. Nature 376: 507–509, 1995.
Murray IJ, Plainis S. Contrast coding and magno/parvo segregation revealed
in reaction time studies. Vision Res 43: 2707–2719, 2003.
Nickerson RS. Intersensory facilitation of reaction time: energy summation or
preparation enhancement? Psychol Rev 80: 489 –509, 1973.
Nieuwenhuis S, Ridderinkhof KR, Blom J, Band GP, Kok A. Error-related
brain potentials are differentially related to awareness of response errors:
evidence from an antisaccade task. Psychophysiology 38: 752–760, 2001.
Noesselt T, Tyll S, Boehler CN, Budinger E, Heinze HJ, Driver J.
Sound-induced enhancement of low-intensity vision: multisensory influences on human sensory-specific cortices and thalamic bodies relate to
perceptual enhancement of visual detection sensitivity. J Neurosci 30:
13609 –13623, 2010.
Noesselt T, Bergmann D, Hake M, Heinze HJ, Fendrich R. Sound increases
the saliency of visual events. Brain Res 1220: 157–163, 2008.
Nozawa G, Reuter-Lorenz PA, Hughes HC. Parallel and serial processes in
the human oculomotor system: bimodal integration and express saccades.
Biol Cybern 72: 19 –34, 1994.
Odgaard EC, Arieh Y, Marks LE. Cross-modal enhancement of perceived
brightness: sensory interaction versus response bias. Percept Psychophys 65:
123–132, 2003.
Otto TU, Mamassian P. Noise and correlations in parallel perceptual decision
making. Curr Biol 22: 1391–1396, 2012.
Pannunzi M, Pérez-Bellido A, Baños AP, López-Moliner J, Deco G,
Soto-Faraco S. Scrutinizing integrative effects in a multi-stimuli detection
task. Seeing Perceiving 25: 100, 2012.
Piéron H. Recherches sur les lois de variation des temps de latence sensorielle
en fonction des intensités excitatrices. L’Année Psychol 20: 17–96, 1914.
Pins D, Bonnet C. The Piéron function in the threshold region. Percept
Psychophys 62: 127–136, 2000.
Pins D, Bonnet C. On the relation between stimulus intensity and processing
time: Piéron’s law and choice reaction time. Perception 58: 390–400,
1996.
Plainis S, Murray IJ. Neurophysiological interpretation of human visual
reaction times: effect of contrast, spatial frequency and luminance. Neuropsychologia 38: 1555–1564, 2000.
Quick RF. A vector-magnitude model of contrast detection. Kybernetik 16:
65– 67, 1974.
Raab DH. Statistical facilitation of simple reaction times. Trans NY Acad Sci
24: 574 –590, 1962.
Romei V, Murray MM, Cappe C, Thut G. Preperceptual and stimulus-selective enhancement of low-level human visual cortex excitability by sounds. Curr Biol 19: 1799–1805, 2009.
Rowland BA, Quessy S, Stanford TR, Stein BE. Multisensory integration
shortens physiological response latencies. J Neurosci 27: 5879 –5884,
2007.
Schroeder CE, Foxe J. Multisensory contributions to low-level, “unisensory”
processing. Curr Opin Neurobiol 15: 454 – 458, 2005.
Silveira LC, Perry VH. The topography of magnocellular projecting ganglion
cells (M-ganglion cells) in the primate retina. Neuroscience 40: 217–237,
1991.
Smiley JF, Falchier A. Multisensory connections of monkey auditory cerebral
cortex. Hear Res 258: 37– 46, 2009.
Spence C, Driver J. Covert orienting in audition: endogenous and exogenous
mechanisms. J Exp Psychol Hum Percept Perform 20: 555–574, 1994.
Stein BE. Neural mechanisms for synthesizing sensory information and
producing adaptive behaviors. Exp Brain Res 123: 124 –135, 1998.
Stein BE, Stanford TR, Ramachandran R, Perrault TJ, Rowland BA.
Challenges in quantifying multisensory integration: alternative criteria,
models, and inverse effectiveness. Exp Brain Res 198: 113–126, 2009.
Stein BE, London N, Wilkinson LK, Price DD. Enhancement of perceived
visual intensity by auditory stimuli: a psychophysical analysis. J Cogn
Neurosci 8: 497–506, 1996.
Stein BE, Meredith MA, Huneycutt WS, McDade L. Behavioral indices of
multisensory integration: orientation to visual cues is affected by auditory
stimuli. J Cogn Neurosci 1: 12–24, 1989.
Thomas JP, Fagerholm P, Bonnet C. One spatial filter limits speed of
detecting low and middle frequency gratings. Vision Res 39: 1683–1693,
1999.
Treisman M. Combining information: probability summation and probability
averaging in detection and discrimination. Psychol Methods 3: 252–265, 1998.
Ulrich R, Miller J, Schröter H. Testing the race model inequality: an
algorithm and computer programs. Behav Res Methods 39: 291–302, 2007.
Valls-Solé J, Solé A, Valldeoriola F, Muñoz E, Gonzalez LE, Tolosa ES.
Reaction time and acoustic startle in normal human subjects. Neurosci Lett
195: 97–100, 1995.
Van der Burg E, Cass J, Olivers CN, Theeuwes J, Alais D. Efficient visual
search from synchronized auditory signals requires transient audiovisual
events. PloS One 5: e10664, 2010.
Wallace MT, Meredith MA, Stein BE. Converging influences from visual,
auditory, and somatosensory cortices onto output neurons of the superior
colliculus. J Neurophysiol 69: 1797–1809, 1993.
Wallace MT, Meredith MA, Stein BE. Multisensory integration in the
superior colliculus of the alert cat. J Neurophysiol 80: 1006 –1010, 1998.
Watkins S, Shams L, Tanaka S, Haynes JD, Rees G. Sound alters activity
in human V1 in association with illusory visual perception. Neuroimage 31:
1247–1256, 2006.
Werner S, Noppeney U. The contributions of transient and sustained response
codes to audiovisual integration. Cereb Cortex 21: 920 –931, 2011.
Wuerger SM, Hofbauer M, Meyer GF. The integration of auditory and
visual motion signals at threshold. Percept Psychophys 65: 1188 –1196,
2003.