In: Kokinov, B., Karmiloff-Smith, A., Nersessian, N. J. (eds.) European Perspectives on Cognitive Science. © New Bulgarian University Press, 2011 ISBN 978-954-535-660-5

Multimodal Temporal Processing Between Separate and Combined Modalities

Adam D. Danz (adam.danz@gmail.com)
Central and East European Center for Cognitive Science, New Bulgarian University
21 Montevideo St., 1618 Sofia, Bulgaria

Abstract

Previous research has shown that the auditory modality dominates in detecting temporal frequency changes when there is a discrepancy between the auditory and visual modalities. Little to no research has investigated how the visual and auditory modalities cooperate when the same temporal frequency is presented in parallel to both sensory modalities. In experiment I, detection of a 5% increase or decrease from a base frequency of 2Hz was examined in separate modalities. In experiment II, the frequencies were presented in parallel to both modalities. Comparison of these results supports multimodal sensory integration rather than auditory dominance of temporal perception.

Keywords: multimodal; temporal discrepancy; perception; RT; time; auditory; visual; sensory integration; cognition.

Introduction

Many experimental designs have been devised to study temporal processing using auditory stimuli (Schubotz, Friederici, & von Cramon, 2000), visual stimuli (Moutoussis, 1997), and even tactile stimuli (Macar et al., 2002). Researchers have studied how one percept can influence the other in multimodal temporal processing (Welch & DuttonHurt, 1986; Gebhard & Mowbray, 1959) and how various aspects within a single percept are perceived at different time courses (Moutoussis, 1997). The bulk of the literature concerning multimodal temporal processing focuses on a paradigm of competing modalities in order to determine which sensory modality is dominant in temporal perception.
The study herein, however, examines multimodal temporal stimuli in vision and audition as they are perceived when presented congruently in parallel (experiment II) compared to when these modalities are processed in isolation (experiment I). In this approach, sensory modalities are not competing against differing temporal representations; instead, they may either work together or allow one modality to dominate interpretation of the temporal stimuli.

Sensory influence on temporal processing

As Newton proclaimed in Principia Mathematica (1687), “absolute, true, and mathematical time, of itself and from its own nature, flows equably without regard to anything external”; yet, without sensory organs to detect this incorporeal flow, humans are able to synchronize with and recreate temporal patterns, even accurately predicting some durations of time (Rao et al., 2001). Even more intriguing, temporal perception is highly congruent across observers. Because human sensory organs are subject to limitations, the processing of external temporal events is affected and, as with every sensory perception, is not necessarily representative of the actual environment.

When two or more events occur without discernible succession, they are said to have occurred simultaneously, or ‘at the same time’. Perceived simultaneity does not conform to physical simultaneity. In an attempt to determine the limits of perceived simultaneity, many paradigms have been used, resulting in many differing conclusions. Hirsh & Sherrick (1961) examined the succession/simultaneity threshold of various sensory modalities in isolation, including vision, audition, and tactile perception. The results showed that successive stimuli separated by intervals under approximately 20ms were perceived as simultaneous across all modalities, whereas intervals greater than 20ms were perceived as successive.
Instead of single-modality succession of stimuli, Hirsh & Fraisse (1964) tested succession across modalities using an acoustic click and a brief flash of light. When the sound preceded the light, the threshold was measured at about 60ms, while light preceding the sound resulted in thresholds between 90 and 120ms (Hirsh & Fraisse, 1964). The contrast between these results demonstrates the sensitivity of temporal perception to sensory modality. However, such inconsistencies have also been shown within a single modality by merely changing the complexity of the stimuli (see Moutoussis, 1997). For example, when six letters are presented in random succession, they are perceived as simultaneous as long as the total duration does not exceed 90ms (Hylan, 1903). When four light-emitting diodes arranged in the shape of a diamond are lit in an ordered succession, the succession is perceived as simultaneous as long as the duration between the first and last flash of light is under 125ms (Lichtenstein, 1961).

With such variability in simultaneity-threshold measurement, it is not surprising that investigations of duration perception yield varying results. Experiments involving duration require participants either to estimate a duration after it has completed or to interact with a duration that is being perceived at the present time. For stimuli used in duration discrimination tasks, participants process not only the temporal span of the stimuli, as in simultaneity research, but also their semantic aspects. The ‘paradox of subjective duration’ (Pöppel, 1997) demonstrates that in retrospective evaluation of duration, a high memory load, consistent with a difficult task, complex stimuli, or both, will result in less attention towards the actual duration and therefore an overestimation of that duration.
On the other hand, during the experience of time passing, if the task and stimuli are complex, time seems to pass more quickly than during states of boredom (Pöppel, 1997; Fraisse, 1979).

The sensory modalities chosen for temporal perception research, the types of stimuli and their features, the complexity of the stimuli, and the response task all contribute to measures of temporal processing. Since these sensory-based confounds are unavoidable, researchers should be conscious of them when designing their experiments and interpreting their results.

Integration of Multimodal Sensory Data

The ‘modality appropriateness hypothesis’ states that in multimodal perception, the contributions of the various sensory modalities depend on the stimuli being perceived. Welch & DuttonHurt (1986) demonstrated this effect in the field of temporal perception using bimodal temporal stimuli, which resulted in auditory dominance over vision in the discrimination of temporal frequencies. In their experiment, designed discrepancies between auditory and visual frequencies produced an auditory bias over vision, demonstrating that when visual and auditory temporal frequencies are in conflict, the auditory information dominates the percept.

Whereas vision specializes in spatial perception, audition seems to dominate temporal perception, as a large number of studies confirm. The threshold of vision in discriminating between flashes of light is much lower than the threshold of audition in discriminating between successive bursts of sound, showing that temporal acuity is much higher in audition than in vision. This gives rise to the phenomenon referred to as ‘auditory driving’: when an auditory frequency is gradually increased or decreased while being compared to a steady visual frequency, the visual oscillations seem to increase or decrease along with the auditory stimuli even though they remain constant (Gebhard & Mowbray, 1959).
The visual system does not seem to yield the same effect, however, on auditory perception.

Sensory transduction itself operates on different time courses as sensory data are transformed into electrochemical impulses. Specifically, auditory information is encoded faster than visual information, and if a bimodal temporal stimulus is to be perceived as simultaneous, the brain must either lean towards the timing of one modality or integrate the incoming sensory data (evidence of integration sites for auditory and visual stimuli can be found in Bushara, Grafman, & Hallett, 2001, and Calvert et al., 2001).

The dynamic integration of temporal perception extends beyond sensory constraints. Using direct galvanic stimulation of the vestibular system, Trainor et al. (2009) were able to bias participants into perceiving otherwise ambiguous rhythms according to specific interpretations. The galvanic stimulation replicated the common experience of nodding the head to music, but without bodily movement. The vestibular system sends much of its information to the cerebellum, which has been shown to play a role in interval timing (Ivry, 1996).

Tempo Perception in Music

Though frequency perception and tempo in music research seem plausibly related, there is a communicative gap between music theorists and cognitive scientists, who currently use different lexicons to measure the same entity. Where frequency is measured in hertz (Hz) and durations in milliseconds (ms), tempo is measured in beats per minute (bpm), whereby a ‘beat’ is Euclidean in that it represents a point or marker without duration in and of itself. While psychophysical measurements concern thresholds of simultaneity and succession, music theorists broaden their scope to measure the range of tempi with which musicians may interact.
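The three units named above are related by simple conversions. As a minimal illustrative sketch (the function names are hypothetical and are not part of the original study), the following Python snippet converts a tempo in bpm into its frequency in Hz and its inter-beat interval in ms:

```python
def bpm_to_hz(bpm: float) -> float:
    """Beats per minute -> beats per second (Hz)."""
    return bpm / 60.0


def bpm_to_interval_ms(bpm: float) -> float:
    """Beats per minute -> time between beat onsets in milliseconds."""
    return 60_000.0 / bpm


def hz_to_bpm(hz: float) -> float:
    """Frequency in Hz -> beats per minute."""
    return hz * 60.0


# Tempi discussed in this paper: slow limit, preferred tempo, fast limit.
for bpm in (40, 120, 300):
    print(f"{bpm} bpm = {bpm_to_hz(bpm):.2f} Hz = "
          f"{bpm_to_interval_ms(bpm):.0f} ms per beat")
```

For the 120bpm tempo used as the baseline in the experiments, this gives 2Hz and 500ms per beat.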
Tempi above ~300bpm (5Hz, 200ms intervals) are perceptually difficult to discern in the context of music (Van Noorden & Moelants, 1999), while tempi under ~40bpm (0.67Hz, 1500ms intervals) result in perceptual isolation of each beat, because processing the two consecutive beats required to create a tempo is beyond the capacity of working memory (Van Noorden & Moelants, 1999). Moelants (2002) implied that there must be a point between these extremes where tempo perception is optimal. Researchers have settled on an optimal tempo centered around 120bpm (2Hz, 500ms intervals), which has also been replicated within the visual modality (Luck & Sloboda, 2007).

Methods

Experiment I: Separate Modalities

Participants

Twenty participants (7 male) took part in the study, with a mean age of 26.4 years (range 19-35; SD = 4.4). Participants with imperfect vision wore corrective lenses or glasses, and no participants were hearing impaired. All participants had little to no musical experience, which excluded experts in tempo discernment.

Stimuli and Design

The auditory pulse was a 440 Hz square-wave tone lasting 125ms, presented binaurally through full-coverage headphones, which produced the sound of one auditory ‘beep’. Each 125ms pulse followed by one empty interval constituted one cycle. Cycles repeated between 15 and 22 times, defining one stimulus. There was a 1500ms inter-trial interval, which also consisted of silence. The baseline tempo consisted of the 125ms tone followed by a 375ms period of silence before the following tone. Combined, the 500ms baseline cycle represented the preferred tempo of 120bpm (2Hz). During the auditory portion of the experiment, the computer monitor was gray as in the visual portion of the experiment; however, participants were not instructed to watch the monitor.

The visual stimulus consisted of a black circle with a diameter of 1.25cm centered on the screen against a light gray background, chosen to reduce contrast and thereby afterimage effects.
Similar to the auditory stimulus, the ‘dot’ appeared for 125ms followed by 375ms of the light gray background alone. This constituted the 500ms (120bpm, 2Hz) baseline cycle for the visual pulse.

Two experimentally manipulated conditions, two masking conditions, plus one set of catch trials, or control trials, were designed and replicated across both modalities. Unchanged frequencies, or ‘catch trials’, were included in the experiment so that participants would not signal a detection on every trial, thereby decreasing habitual guessing. The experimentally manipulated conditions all began at a baseline frequency of 500ms intervals (120bpm, 2Hz) and, after at least seven cycles of 500ms (but no more than 14 cycles), featured a sudden 5% increase or decrease in frequency. Masking trials consisted of either a 20% decrease or a 30% increase in frequency. Catch trials continued at the base frequency without change. Thus, the design of the experiment was a within-subject 2 (Modality: visual vs. auditory) x 2 (Directionality: decrease vs. increase) design. Eight different change points were used and repeated three times per item, making a total of 24 trials per condition per modality. Catch trials totaled 24 and masking trials totaled 48 (24 increases and 24 decreases) per modality.

Changes in frequency were created by increasing or decreasing the silent, or ‘empty’, intervals between pulses, whereby a decrease in interval created an increase in frequency and vice versa. Five percent increases contained an empty interval of 351ms and 5% decreases contained 410ms of silence between pulses. Changes occurred only once during the stimulus. Cycles continued for exactly eight pulses after the frequency change, ending with a silent interval. For this reason, the window of time for detection of the change differed between conditions, ranging from three to five seconds. While each modality was tested separately, all trials, including the masking and catch trials, were pseudo-randomized.
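The relation between a percentage change in frequency and the resulting empty interval can be sketched as follows. This is an illustration only, not the script used to build the stimuli: it assumes that the percentage is applied to the 2Hz base frequency and that the 125ms pulse stays fixed, and all function names are hypothetical. Under this assumption the 5% increase comes out near the reported 351ms, while the 5% decrease comes out near 401ms rather than the reported 410ms, so the exact rule used in the experiment may have differed.

```python
PULSE_MS = 125.0  # pulse duration reported in the text
BASE_HZ = 2.0     # baseline frequency reported in the text


def cycle_ms(frequency_hz: float) -> float:
    """Duration of one pulse-plus-gap cycle at the given frequency."""
    return 1000.0 / frequency_hz


def empty_interval_ms(percent_change: float) -> float:
    """Silent gap after a frequency change (assumes a fixed pulse duration)."""
    new_hz = BASE_HZ * (1.0 + percent_change / 100.0)
    return cycle_ms(new_hz) - PULSE_MS


def detection_window_s(percent_change: float, cycles_after_change: int = 8) -> float:
    """Time covered by the eight cycles that follow the change, in seconds."""
    new_hz = BASE_HZ * (1.0 + percent_change / 100.0)
    return cycles_after_change * cycle_ms(new_hz) / 1000.0


conditions = [("baseline", 0), ("5% increase", 5), ("5% decrease", -5),
              ("30% increase", 30), ("20% decrease", -20)]
for label, change in conditions:
    print(f"{label:>12}: empty interval {empty_interval_ms(change):6.1f} ms, "
          f"window {detection_window_s(change):.2f} s")
```

Under the same assumption, the detection windows run from about 3.08 s (30% increase) to 5.00 s (20% decrease), in line with the three-to-five-second range stated above.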
Procedure

Participants were tested individually using E-Prime 2.0 software (Schneider, Eschman, & Zuccolotto, 2002), wore full-coverage headphones with external volume control during the auditory portion of the experiment, and were left uninterrupted in a quiet room for the full duration of the experiment. Participants were instructed to adjust the volume to their preference at the beginning of the practice session. Half of the participants began with the auditory portion of the experiment while the other half began with the visual portion.

Participants were given written instructions to press the mouse button as soon as they detected a change in the frequency of the stimuli. They were informed that some changes would be more obvious than others and that some trials would not change at all. Instructions were followed by a practice session featuring one to two trials in each condition, including masking and catch trials. Each portion of the experiment required about thirty minutes, and participants were able to take a break between the auditory and visual portions of the experiment as well as after every 40 trials within each modality. After completing 120 trials, participants were tested in the other modality. The E-Prime software controlled stimulus presentation and timing and recorded responses and reaction times (Schneider, Eschman, & Zuccolotto, 2002).

Experiment II: Combined Modalities

Participants

Twenty-one participants (3 male) took part in the study, with a mean age of 25.3 years (range 20-36; SD = 4.7). Participants with imperfect vision wore corrective lenses or glasses. One participant had visibly noticeable amblyopia but performed better than average in the experiment. No other visual deficiencies were present among the participants, and all had unimpaired hearing. All participants had little to no musical experience, which excluded experts in tempo discernment. Three participants had taken part in experiment I at least three weeks prior to participation in this experiment.
Stimuli and Design

The only difference between the stimuli of this experiment and those of experiment I is that the visual and auditory pulses were combined to make one audio-visual pulse consisting of the 1.25cm black circle and the 440 Hz square-wave tone presented in controlled synchrony for 125ms. The same two conditions (decrease vs. increase) were used as in experiment I, as there were no other changes to the stimuli between experiments. The use of the E-Prime software ensured the physical simultaneity and congruency of the auditory and visual stimuli, each beginning and ending in parallel.

Procedure

The procedure matched that of experiment I but required half the time, about 30 minutes, due to the combination of modalities. Participants were instructed simply to press the mouse button if and when they detected a change in the frequency of the stimuli while ignoring unchanged catch trials. Participants were not directed to pay special attention to either the visual or the auditory modality; how they detected the changes in frequency was left to their discretion. This is important to note, as it provides a natural approach to detection and does not force unnatural attention to any particular modality. Together with the non-competing temporal frequencies, this provides a natural perception of frequencies as experienced outside of the laboratory.

Results

Experiment I: Separate Modalities

The data of four participants were withdrawn from analysis. Accuracy calculations showed that two had performed under chance level, as their d-prime (d') scores were less than 1.1 in all conditions within both modalities. The other two participants were excluded because they did not detect any changes in one of the experimental conditions. The following results are from the remaining 16 participants.
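For readers unfamiliar with d', the measure can be sketched as follows. This is a generic signal-detection computation, not the analysis script used in this study. It assumes that responses on frequency-change trials count as hits and responses on catch trials count as false alarms, and it applies a log-linear correction (an assumption added here) so that rates of 0 or 1 do not yield infinite values. The function name and all counts in the example are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int,
            false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total)
    keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 24 change trials and 24 catch trials.
print(round(d_prime(hits=18, misses=6, false_alarms=5, correct_rejections=19), 2))
```

A participant who responds equally often on change trials and catch trials obtains a d' of zero under this formulation, which corresponds to no sensitivity to the change.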
Dependent variables included reaction time (RT), number of pulses to detect temporal change (NoP), and accuracy of detection (measured in percent hits and d').

RT analysis

Reaction time analysis was based on correct responses only. For a response to be correct, detection was required to occur after the onset of the frequency change. This eliminated trials that did not feature a frequency change as well as trials where participants signaled detection before the programmed frequency change. RT was measured from the point of frequency change in each stimulus to the point of detection by the participant. For increases in frequency, the initial pulse after the increase is naturally presented sooner than in baseline cycles, allowing for immediate discrimination from the baseline frequency. Decreases in frequency, however, naturally present the initial pulse following the decrease later than expected. For this reason, all RT measurements in the decrease condition were adjusted by subtracting the baseline empty interval (375ms) from the final RT, since it is impossible to detect this change in frequency during this initial span of time. It is only after this initial 375ms that the frequency actually changes in decrease conditions, as the empty interval becomes longer in duration and timing mechanisms may begin to detect the change.

Table 1 shows mean RT and SD per condition in both sensory modalities. A 2 (Modality: visual vs. auditory) x 2 (Directionality: decreases vs. increases) repeated measures analysis of variance (ANOVA) on item and subject RT means concurred with preferred tempo research (Moelants, 2002), showing no main effect of increases or decreases from the 120bpm (2Hz, 500ms interval) base frequency (Fi (1, 7)=0.84; p>0.7; Fs (1, 15)=0.23; p>0.8). Furthermore, overall auditory detection of the frequency changes did not differ from visual performance (Fi (1, 7)=2.98; p>0.1; Fs (1, 15)=4.26; p>0.056).
An interaction was found between modality (auditory vs. visual) and directionality (increase vs. decrease) (Fi (1, 7)=14.26; p<0.006; Fs (1, 15)=14.29; p<0.001), showing first that increase conditions did not differ across modalities and second that RT was greater by about 540ms for auditory-decrease conditions than for visual decreases, which resided near the more quickly detected increase conditions (p<0.005; the Bonferroni test was used for all post-hoc analyses in both experiments).

Pulse analysis

While RT analysis shows meaningful results, measuring the number of pulses that transpire between frequency change and detection is an additional measure that sheds light upon aspects of temporal perception, and especially on music and cognition. RT alone does not describe how many stimulus cycles were required to detect the change in frequency. Additionally, pulse analysis maps more directly onto cognitive models of duration discrimination, since those models accumulate pulses and their output depends upon the pulse count. Furthermore, given that trials changed frequency once and only once during observation, and that mean RTs were considerably longer than one or two cycles of each frequency, there is evidence that detection of frequency change is holistic rather than a direct comparison of individual cycles. It might be the case that NoP is a richer measurement of temporal processing in such a cyclical experimental design, in that it explicitly shows the number of cycles needed before confidence is reached and a detection is signaled.

Though measurements of RT and NoP should coincide, there remains some variance between them due to how each variable is measured. Counting cycles between the frequency change and detection results in a whole number, whereas RT is measured to the millisecond.
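The relation between the two measures can be illustrated with a short sketch. It is one possible reading of the procedure rather than the analysis code of this study: it assumes that raw RT is timed from the start of the first changed empty interval and that a pulse is counted once its onset has occurred at or before the response. The function names and the example response times are hypothetical.

```python
BASELINE_EMPTY_MS = 375  # baseline empty interval reported in the text
PULSE_MS = 125           # pulse duration reported in the text


def adjusted_rt_ms(raw_rt_ms: int, direction: str) -> int:
    """Subtract the baseline gap for decreases, as described in the RT analysis."""
    if direction == "decrease":
        return raw_rt_ms - BASELINE_EMPTY_MS
    return raw_rt_ms


def pulses_before_response(raw_rt_ms: int, empty_ms: int) -> int:
    """Number of pulse onsets between the change and the response.

    Assumes time zero is the start of the first changed empty interval,
    so the first post-change pulse begins at empty_ms.
    """
    if raw_rt_ms < empty_ms:
        return 0
    cycle = empty_ms + PULSE_MS
    return (raw_rt_ms - empty_ms) // cycle + 1


# Two hypothetical responses in the 5% increase condition (351 ms gap):
for rt in (1800, 2200):
    print(rt, "ms ->", pulses_before_response(rt, empty_ms=351), "pulses")
```

In this sketch both hypothetical responses yield the same pulse count even though they lie 400ms apart, which is the kind of divergence between NoP and RT discussed here.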
For example, if participants signal detection after the sixth cycle, there can be an RT variance of up to 400ms depending on when the detection occurred within the empty interval. Secondly, in conditions of frequency decrease, fewer cycles will transpire in a fixed amount of time than in conditions of frequency increase or at the baseline frequency. These differences should be considered when comparing literature using differing systems of measurement. Future research would benefit from the use of both measurements.

The number of pulses required to detect the frequency change was analyzed using correct responses, resulting in exactly the same data set as the RT analysis. Pulses were counted after the first changed empty interval. Table 1 shows mean NoP and SD per condition in both sensory modalities. A 2 (Modality: visual vs. auditory) x 2 (Directionality: decreases vs. increases) repeated measures analysis of variance (ANOVA) on item and subject NoP means agreed with the RT findings, showing no main effects of directionality (increase vs. decrease) (Fi (1, 7)=0.90; p>0.3; Fs (1, 15)=1.78; p>0.2) nor modality (auditory vs. visual) (Fi (1, 7)=2.27; p>0.1; Fs (1, 15)=3.11; p>0.09). In further agreement with RT measurements, the interaction between modality (auditory vs. visual) and directionality (increase vs. decrease) (Fi (1, 7)=14.95; p<0.006; Fs (1, 15)=16.324; p<0.001) resulted in greater NoP for the auditory-decrease condition than for the other conditions, though not significantly different from visual-increase detection. This small discrepancy is most likely due to properties of NoP measurement, which yields more pulses for increase conditions than for decrease conditions.

Experiment II: Combined Modalities

Data from three participants were removed from analysis after accuracy calculations showed performance under chance level for at least one condition. An additional two participants were removed from analysis to match the number of participants analyzed in experiment I.
The following results are from the remaining 16 participants. Repeated measures analysis of variance (ANOVA) of item and subject RT means (Fi (1, 7)=7.63; p<0.002; Fs (1, 15)=6.47; p<0.002) showed a main effect of directionality (increase vs. decrease), with decreases detected faster by ~250ms on average. Analysis of NoP showed no effect (Fi (1, 7)=2.27; p>0.1; Fs (1, 15)=1.46; p>0.2), nor did the ANOVA of accuracy (Fi (1, 7)=0.01; p>0.9; Fs (1, 15)=0.002; p>0.9).

Accuracy analysis

Percent correct as well as d-prime (d') scores were examined in the analysis of accuracy, and both measurements were statistically analyzed separately. Here, ‘percent correct’ is defined as the percentage of experimental trials where participants signaled a detection after the change of frequency. In signal detection theory, these are known as ‘percent hits’. For a response to be correct, detection had to be signaled after the experimentally manipulated onset of the frequency change and before the end of the stimulus, exactly eight cycles after the frequency change. Because d' scores matched the accuracy measurements, they are not reported herein. Due to the design of both experiments, percent correct measurements are more appropriate than d' scores.

In experiment I, sixteen participants executing 240 trials each resulted in 3840 trials. Control conditions, or catch trials, accounted for 768 of these trials, while masking conditions accounted for 1536 trials. The remaining 1536 trials were experimental (frequency-change) trials. Of the control trials, featuring no change of frequency, 78% (598 trials) were correctly rejected. Of the frequency-change trials, 43% (660 trials) were detected, 48% (731 trials) were missed by participants, and on 9% (145 trials) participants signaled detection prior to the experimentally manipulated change of frequency.

Though these figures seem to hover at chance level of performance, it should be noted that in similar designs, 5% changes from similar baseline frequencies have been detected (Jongsma et al., 2007). A five percent change of frequency is not easily detected even among professional musicians (Danz & Janyan, 2009). Strategies of pure guessing also would have resulted in a greater number of incorrect detections prior to the experimentally manipulated change. For comparison, in the much easier masking trials, 30% increases were detected with 84% accuracy and 20% decreases with 85% accuracy.

A 2 (Modality: visual vs. auditory) x 2 (Directionality: decreases vs. increases) repeated measures analysis of variance (ANOVA) on item and subject accuracy means obtained a main effect of directionality (increase vs. decrease), with much greater accuracy in the frequency-increase conditions (Fi (1, 7)=427.25; p<0.001; Fs (1, 15)=50.83; p<0.001). A main effect of modality (auditory vs. visual) was also found, with auditory frequencies having higher accuracy (Fi (1, 7)=14.96; p<0.006; Fs (1, 15)=19.23; p<0.005). An interaction between modality and directionality explains these main effects (Fi (1, 7)=427.25; p<0.001; Fs (1, 15)=63.67; p<0.001), as the accuracy of detection of auditory increases far outperformed the other conditions. Audition detected 5% increases at a mean 75% success (p<0.001), while all other conditions were detected at chance levels (24%-38%). Among these conditions, auditory-decrease resulted in the lowest accuracy of 24% (p<0.03).

Table 1: Means and standard deviations

Experiment  Modality  Condition  RT (ms)  (SD)  NoP  (SD)  Accuracy  (SD)
Exp. 1      Auditory  total      2235     836   4.9  1.2   49.87%    30.26%
Exp. 1      Auditory  dec        2483     1111  5.5  1.4   24.48%    16.24%
Exp. 1      Auditory  inc        2154     708   4.4  0.7   75.26%    15.92%
Exp. 1      Visual    total      2146     1098  4.5  0.7   36.07%    16.45%
Exp. 1      Visual    dec        1942     1114  4.2  0.6   34.38%    18.29%
Exp. 1      Visual    inc        2331     1053  4.8  0.7   37.76%    14.79%
Exp. 2      Combined  total      2177     945   4.6  0.8   53.26%    19.16%
Exp. 2      Combined  dec        2052     883   4.5  0.6   53.39%    21.15%
Exp. 2      Combined  inc        2302     989   4.7  1.0   53.13%    17.64%

Discussion

Many multimodal investigations of temporal processing set sensory representations in competition with one another to determine how the mind best perceives and processes time. One eye may have better vision than the other, yet both contribute to perception. Likewise, time is perceived dynamically via many sensory organs and may even employ embodied variables from the vestibular system (Trainor et al., 2009), gait and motion patterns, the inherent rhythmicity of the heart, breathing patterns, and sleep cycles, in order to develop a conscious experience of what Newton claims ‘flows equably without regard to anything external’. In the current study, multimodal temporal frequencies were instead studied in parallel and in congruency and compared to the performance of isolated sensory modalities. A 5% frequency change from a base frequency of 2Hz is considered fairly difficult to detect, yet this condition has been used in other studies as well (Jongsma et al., 2007). In separate modalities, the data herein conflicted with the ‘preferred tempo’ findings (Moelants, 2002) in that auditory frequency increases from a 120bpm base were detected far more accurately than decreases, though RT between these conditions did not differ significantly, apart from a slightly quicker detection of decreases by the visual modality (see Table 1).
Due to this strength of auditory detection of 5% increases, the auditory modality outperformed the visual modality in accurately detecting the challenging condition, consistent with the bulk of the literature pointing towards the auditory system as dominant in temporal perception. However, when these modalities were presented in parallel without competition, the dominance of the auditory system disappeared, and accuracies for increase and decrease conditions varied by less than 1% and resided near chance level. While the phenomenon of auditory driving demonstrates the auditory system’s ability to override incompatible frequencies in the visual modality, when bimodal stimuli are presented in parallel, the auditory system loses its drive as it gives way to visual processing.

Furthermore, decrease conditions for auditory and visual modalities resulted in very low mean accuracies (24% and 34% respectively) in experiment I. However, in experiment II, with the combination of audition and vision, accuracy reached 53% in both conditions, almost doubling these means. This cannot be explained via auditory driving, since audition alone resulted in much lower accuracy than in combination with vision. This could point towards integrative effects of multimodal perception and away from the modality appropriateness hypothesis.

With regard to RT, the visual modality registered decreases in frequency about 300ms faster on average than increases, while auditory detection did not significantly differ between these conditions but registered decreases more than 500ms later than vision. In combined modalities, this visual pattern was preserved, as decreases were detected about 250ms faster on average than increases (see Table 1). Once again, auditory integration with the visual system improved detection of temporal frequency change when compared to audition alone.
While audition seems to lead perceptions of frequency in isolation or in sensory conflict situations, parallel and congruent multimodal frequencies demonstrate a more integrative and dynamic representation of time.

References

Bushara, K.O., Grafman, J., Hallett, M. (2001). Neural correlates of auditory-visual onset asynchrony detection. The Journal of Neuroscience, 21(1): 300-304.
Calvert, G.A., Hansen, P.C., Iversen, S.D., Brammer, M.J. (2001). Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage, 14: 427-438.
Danz, A., Janyan, A. (2009). Detecting audio-video tempo discrepancies between conductor and orchestra. In N.A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 3064-3069). Austin, TX: Cognitive Science Society.
Fraisse, P. (1979). Influence de la durée du traitement de l'information sur l'estimation d'une durée d'une seconde [Influence of the duration of information processing on the estimation of a duration of one second]. L'Année Psychologique, 79: 495-504.
Gebhard, J.W., Mowbray, G.H. (1959). On discriminating the rate of visual flicker and auditory flutter. American Journal of Psychology, 71: 521-528.
Hirsh, I.J., Fraisse, P. (1964). Simultanéité et succession de stimuli hétérogènes [Simultaneity and succession of heterogeneous stimuli]. L'Année Psychologique, 64: 1-19.
Hirsh, I.J., Sherrick, C.E. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62: 423-432.
Hylan, J.P. (1903). The distribution of attention. Psychological Review, 10(4): 373-403.
Ivry, R.B. (1996). The representation of temporal information in perception and motor control. Current Opinion in Neurobiology, 6: 851-857.
Jongsma, M.L.A., Meeuwissen, E., Vos, P.G., Maes, R. (2007). Rhythm perception: Speeding up or slowing down affects different subcomponents of the ERP P3 complex. Biological Psychology, 75(3): 219-228.
Lichtenstein, M. (1961). Phenomenal simultaneity with irregular timing of components of the visual stimuli. Perceptual and Motor Skills, 12: 47-60.
Luck, G., Sloboda, J. (2007). Synchronizing with complex biological motion: An investigation of musicians' synchronization with traditional conducting beat patterns. Music Performance Research, 1(1): 26-46.
Macar, F., Lejeune, H., Bonnet, M., Ferrara, A., Pouthas, V., Vidal, F., Maquet, P. (2002). Activation of the supplementary motor area and attentional networks during temporal processing. Experimental Brain Research, 142: 475-485.
Moelants, D. (2002). Preferred tempo reconsidered. Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney.
Moutoussis, K.Z. (1997). A direct demonstration of perceptual asynchrony in vision. Proceedings of the Royal Society of London, Biology, 264: 393-399.
Newton, I. (1687). Scholium to the definitions in Philosophiae Naturalis Principia Mathematica, Bk. 1; trans. Andrew Motte (1729), rev. Florian Cajori, Berkeley: University of California Press, 1934.
Pöppel, E. (1997). A hierarchical model of temporal perception. Trends in Cognitive Sciences, 1(2): 56-61.
Rao, S.M., Mayer, A.R., Harrington, D.L. (2001). The evolution of brain activation during temporal processing. Nature Neuroscience, 4(3): 317-323.
Schneider, W., Eschman, A., Zuccolotto, A. (2002). E-Prime user's guide. Pittsburgh: Psychology Software Tools Inc.
Schubotz, R., Friederici, A., von Cramon, Y. (2000). Time perception and motor timing: a common cortical and subcortical basis revealed by fMRI. NeuroImage, 11: 112.
Trainor, L.J., Gao, X., Lei, J., Lehtovaara, K., Harris, L.R. (2009). The primal role of the vestibular system in determining musical rhythm. Cortex, 45: 35-43.
Van Noorden, L., Moelants, D. (1999). Resonance in the perception of musical pulse. Journal of New Music Research, 28(1): 43-66.
Welch, R.B., DuttonHurt, L.D. (1986). Contribution of audition and vision to temporal rate perception. Perception and Psychophysics, 39(4): 294-300.