Empty and filled rhythms: An inquiry into the different cognitive processing of vocal and instrumental rhythms

Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts in the Graduate School of The Ohio State University

By Yong Jeon Cheong
Graduate Program in Music
The Ohio State University
2013

Master's Examination Committee: Professor Arved Ashby, Professor Jan Radzynski, Professor Udo Will, Advisor

Copyright by Yong Jeon Cheong 2013

Abstract

In contemporary thinking about music the distinction between vocal and instrumental music does not play a decisive role. However, physiological and imaging studies have shown that our brains process human speech sounds differently from non-speech sounds (Belin et al., 2000; Levy et al., 2003), and Hung (2011) found differences in the auditory processing of vocal rhythm and clapstick rhythm. Furthermore, a behavioral study by Klyn (2012) indicates that there are also differences in working memory between vocal rhythm and clapstick rhythm. The goal of this thesis is to test whether these differences between vocal and clapstick rhythms are due to one specific physical difference between voice and clapstick sounds, that is, the continuity (filled rhythm) or discontinuity (empty rhythm) of the events they contain. In the experiment, two participant groups (musicians and non-musicians) listened to stimulus pairs of three different sound types: vocal rhythm, filled instrumental rhythm, and empty instrumental rhythm. In order to test whether these rhythms are processed differently in working memory, stimulus pairs that were either identical or different were presented with both a 500 ms and a 15000 ms interstimulus interval (ISI). The participants were asked to make a same-or-different judgment on each pair while reaction time and error rate were recorded. The results indicate not only that musicians react faster and more accurately than non-musicians, but also that there is a significant difference between filled instrumental rhythm and empty instrumental rhythm. Immediate recall produces faster reactions than delayed recall, and the participants reacted faster in the same condition than in the different condition. Musicians showed no significant difference in accuracy between immediate and delayed recall, but non-musicians did. Overall, performance in the same condition is slightly better than in the different condition. Because of the considerable differences between filled instrumental rhythm and vocal rhythm, the contrast between 'empty' and 'filled' does not explain the difference in cognitive processing between vocal rhythm and instrumental rhythm identified in previous studies. It is proposed that these differences may instead be caused by a voice-specific combination of features such as amplitude, pitch, spectral, and timbral changes. The findings strengthen arguments for differential cognitive processing of vocal and instrumental music and support the idea of different origins of the two types of music.

Dedication

This thesis is dedicated to my family.

Acknowledgments

Sincere thanks to my advisor, Dr. Udo Will, for his assistance and advising throughout my master's study and this process.

Vita

February 1999 ........................... Busan High School of Arts
2003 ..................................... B.M. Music Composition, Ewha Womans University
2006 ..................................... M.M. Music Composition, Ewha Womans University
2009 ..................................... Researcher, Human-Robot Interaction Research Center, Korea Advanced Institute of Science and Technology
2011 to present .......................... Researcher, Ewha Music Research Center, Ewha Womans University
2012 to present .......................... Graduate Research Associate, Department of Music, The Ohio State University

Publications

Jee, E. S., Jeong, Y. J., Kim, C. H., & Kobayashi, H. (2010). Sound design for emotion and intention expression of socially interactive robots. Intelligent Service Robotics, 3(3), 199-206.
Jee, E. S., Cheong, Y. J., Kim, C. H., Kwon, D. S., & Kobayashi, H. (2009). Sound production for the emotional expression of socially interactive robots.
Cheong, Yong Jeon (2006). An Analysis of "Piano Etude No.1 & No.2" composed by Yong Jeon Cheong. M.M. Thesis. Ewha Womans University (Seoul, Korea).

Fields of Study
Major Field: Music

Table of Contents

Abstract ............................................................ ii
Dedication .......................................................... iv
Acknowledgments ..................................................... v
Vita ................................................................ vi
List of Figures ..................................................... x
Chapter 1: Introduction ............................................. 1
Chapter 2: Background ............................................... 5
2.1 Sound as communication system ................................... 5
2.2 Voice sound vs. non-voice sound ................................. 6
2.3 Rhythm .......................................................... 7
2.4 Memory .......................................................... 9
2.5 Rhythm in memory ................................................ 11
2.6 Filled rhythm vs. empty rhythm .................................. 14
Chapter 3: Experiment ............................................... 16
3.1 Experiment design ............................................... 16
3.2 Experiment materials ............................................ 17
3.3 Stimulus set .................................................... 25
3.4 Participants .................................................... 26
3.5 Equipment ....................................................... 27
3.6 Procedure ....................................................... 27
3.7 Data analysis ................................................... 29
Chapter 4: Experiment results ....................................... 31
4.1 Reaction time ................................................... 31
4.2 Accuracy ........................................................ 33
Chapter 5: Discussion ............................................... 37
Chapter 6: Conclusion ............................................... 42
References .......................................................... 45

List of Figures

Figure 1. Baddeley's working memory ................................. 11
Figure 2. Comparison of three different types of corresponding rhythms ..... 20
Figure 3. Comparison of two different interstimulus intervals (ISI) ..... 22
Figure 4. Difference point in vocal rhythm .......................... 23
Figure 5. Difference point in filled instrumental rhythm ............ 24
Figure 6. Difference point in empty instrumental rhythm ............. 25
Figure 7. Experimental procedure .................................... 28
Figure 8. The range of correct responses ............................ 30
Figure 9. Reaction time interaction between stimulus type and same/different condition for non-musicians and musicians ..... 33
Figure 10. Accuracy interaction between type and condition .......... 34
Figure 11. Interaction between memory task and same/different condition ..... 35
Figure 12. Interaction between stimulus type and same/different condition for non-musicians and musicians ..... 36

Chapter 1: Introduction

Why is it that many cultures seem to prefer vocal to instrumental music? Hunter-gatherer societies in South-East Asia, Australia and Africa have music dominated by vocal genres, although some, like the Pygmies of Central Africa, also have highly sophisticated instrumental genres. In classical South Asian music we find traditions in which an aesthetic goal of instrumentalists is the imitation of the flexibility of the human voice on their instruments. In some religious traditions music is only accepted if it is a form of vocal music. This is, for instance, the case for 'Beompae (범패; 梵唄),' one of three traditional Korean vocal genres, which is performed in the Buddhist ceremony 'Jae (재; 齋),' the monks chanting Sanskrit texts about the life of the Buddha (Korea Arts & Culture Education Service, 1980). In the history of sacred music in the West, the Roman Catholic Church excluded instrumental music from the liturgy for a couple of centuries.
The trace of this tradition still remains in some conservative dioceses that perform only Gregorian chants and a cappella music during the Holy Week services. For the Holy Thursday and Good Friday rituals, these Catholic churches have banned even the organ, which has otherwise been regarded as a church instrument. In the church, choir organs have supported vocal parts and been used as a substitute for missing voice parts in polyphonic music. In addition, in terms of portability, organs built into churches are too large to carry in comparison with other musical instruments like lutes. Itinerant musicians like troubadours and trouvères carried lutes on their backs, and so these small, movable instruments were associated not with religion but with entertainment. The negative connotations of secular traveling musicians and portable instruments may have been one cause for the exclusion of instrumental performance from the church. Although organs were introduced to the church in about the ninth century, the churches were reluctant to use organs in the liturgy until the fifteenth century, which shows that the church distinguished and separated vocal music from instrumental music for a long time (Bowles, 1957; Owen et al., 2001). Even now, some Catholic churches do not admit percussion instruments at mass.

Why, in the above examples, is a clear line drawn between vocal music and instrumental music? Why do most religious traditions distinguish vocal music from instrumental music? There are clearly a variety of reasons, from the use of words in vocal music to the association of improper behaviors with instrumental music, that contribute to this distinction. Interestingly, however, some recent research has found that there are also differences in our cognitive processing of vocal and instrumental music. Could these findings help us understand why the church has been reluctant to use instruments, or why there is a preference for vocal music in many musical traditions? In this thesis, I am not going to discuss the historical or philosophical background of the exclusion of instrumental music from the church liturgy in its early days. Rather, I would like to clarify why we process vocal music differently from instrumental music by focusing on one musical parameter, rhythm. More specifically, I concentrate on whether the observed differences between vocal and instrumental rhythms are related to the physical differences between continuous (filled) and discontinuous (empty) sounds.

Rhythm can be made of either continuous or discontinuous sounds, depending on the type of instrument and the manner of articulation. For example, string instruments that are bowed (e.g. violin and cello) produce continuous sound unless a performer plucks a string or uses intermittent bowing to produce staccato articulation. For wind and brass instruments, a performer's regulation of his or her breath determines whether the sound is continuous or discontinuous. Likewise, vocalists can also control their vocal cords and respiration to produce either continuous or discontinuous sounds. However, most percussion instruments, if sounded by a performer's regular beating action, only produce relatively short, discontinuous sounds. Two recent studies by Hung (2011) and Klyn (2012) report behavioral experiments that are the most relevant for this thesis; they found significant differences between vocal and clapstick rhythm in auditory and memory processing.
Their selection of stimuli (vocal and clapstick rhythms) is marked by the contrast between continuous and discontinuous sounds. They used songs from central Australian aboriginal singers as the vocal rhythm stimuli and clapstick rhythms as the instrumental stimuli. In the aboriginal performance context, clapsticks, an accompanying instrument that has been used for generations, serve to maintain the rhythm.

To begin with, let us examine the physical features of vocal rhythm and clapstick rhythm and the differences between them. Vocal rhythms are determined by a complex combination of duration control, pitch change, intensity regulation and timbre alteration. In addition, phonetic components like consonants and vowels (i.e. swift spectral changes) also contribute to the formation of vocal rhythms. The complicated entanglement of these various components produces the temporal structure of the sound that humans perceive as vocal rhythm. This contrasts with clapstick rhythm perception, because the physical characteristics of the clapstick sound do not vary to the same degree as the singing voice, due to the brief duration of the sound (i.e. fixed pitch and timbre). In addition, the sound energy of the clapstick is condensed into a short period while vocal rhythm has an energy that is more evenly distributed in time; accordingly, the perception of clapstick rhythm seems to depend more on the number and timing of successive events than on spectral, pitch or dynamic features. This thesis tests experimentally whether the different auditory and memory processing of vocal and instrumental rhythms is caused by this physical difference between voice and clapstick, in other words, the continuity (filled) or discontinuity (empty) of the sound, or whether vocal rhythms are processed differently from instrumental rhythms due to other features like pitch and timbre changes.

Chapter 2: Background

2.1 Sound as communication system

Any information from the outer world that is perceived through our sensory organs is processed in our mind either consciously or subconsciously. Sound is one of the most important sources of information for human orientation in the environment and for survival. For example, we can detect the movements of an object by listening to the sound it produces, and we can also estimate the rough distance of an object when it emits a sound. For instance, we can easily recognize an airplane in the air just by listening to the sound it produces, even when it flies in clouds. For many organisms, it is essential for survival to perceive sound signals because this allows them to identify dangers or prey. Furthermore, humans have developed two core communication systems: one is language and the other is music. Language and music are our main systems for transmitting information through sound, a wave produced by pressure changes in air. Sound communication is an economical system: producing and processing sound signals requires less energy than other forms of communication; a honeybee, for instance, must perform a waggle dance, moving its whole body, to create signals for communication with other bees. As humans have developed sound as the core medium of these two communication systems, our ears and brains have become specialized and sensitized to detect and analyze the sounds humans produce. The cone shape of our outer ear helps to locate and collect sound signals that are scattered in space.
The eardrum and the hair cells of the inner ear transform changes in air pressure, namely sound, into electrical signals that travel to the brain within milliseconds. Because of the importance of communication through sound, language and music have co-evolved with our brain throughout human history.

2.2 Voice sound vs. non-voice sound

Humans produce sound in two ways, either by using the vocal cords (i.e. vocalization) or by interacting with external objects (e.g. beating a drum or blowing a flute): the first produces voice sound and the second non-voice sound. In his comparative study, Fitch (2006) indicated that animals other than humans use either voice or non-voice sounds for communication, and suggested that the distinction between voice sound and non-voice sound may provide a better understanding of the origin of music. Currently, there are efforts to discover how the brain processes voice sounds differently from non-voice sounds. First, Belin et al. (2000) found voice-selective regions in the human auditory cortex, in the superior temporal sulcus (STS), and suggested that the STS is possibly the sensory counterpart of the face-selective area in the visual cortex. Their findings may improve our understanding of how we recognize other people via sound, which is important given that humans build their identity by building relationships with others. Similarly to Belin et al.'s research, a study by Binder et al. (2000) provided evidence of a specialized brain region that is activated preferentially by speech sounds, and concluded that the sensitivity to speech sounds in the STS is due not to the processing of lexical-semantic associations but to the processing of the complex acoustic features of speech. Next, Levy et al. (2003) pointed out that attention is an important factor for the voice-specific response (VSR) and proposed that this response is specific to voice stimuli in human listeners. In an EEG experiment, they observed a specific response elicited by the human voice at about 320 ms after stimulus presentation and considered this response a VSR because no comparable response was found when the stimuli were non-voice sounds. However, Joanisse and Gati (2003) could not identify a speech-selective region and suggested that overlapping brain areas process both speech and non-speech signals. The evidence for different processing of vocal and instrumental sound (Belin et al., 2000; Binder et al., 2000; Fecteau et al., 2004; Fitch, 2006) raised the question of whether there are also differences between vocal and instrumental music in terms of rhythm processing.

2.3 Rhythm

Before continuing to review studies on the differences in rhythm processing between vocal and instrumental music, it is necessary to define rhythm. What is rhythm? Rhythm is a problematic concept because there is no single accepted definition, and a number of related terms are often conflated with it. To reduce the confusion between these terms, I give operational definitions of several of them for this thesis. First of all, rhythm is defined here as a pattern of acoustic events in time; any type of temporal distribution of events can be a rhythm. Next, the first term frequently confused with rhythm is beat. In this thesis, a beat is a single acoustic event in time that can serve as a temporal reference point. The next term, pulse, refers to a series of regular beats: when beats are equally distributed in time, listeners feel a pulse underlying the music.
In everyday usage, beat is often used in the sense of pulse, but the two terms are distinguished here. Third, tempo is the speed of the pulse, measured in beats per minute (BPM). Fourth, meter is the organization of the beats of a pulse according to dynamic features, i.e. accents.

In order to understand musical rhythm processing, Hung (2011) performed a set of experiments from which she expected to obtain both behavioral and physiological indications of two separate cognitive pathways for vocal and clapstick rhythm. For her behavioral study, rhythm pairs were presented and the participants were asked to decide, after listening to each pair, whether the two rhythms were the same or different. The data suggested not only that reaction time (RT) differs depending on both stimulus type (vocal or clapstick rhythm) and condition (same or different), but also that vocal rhythm could easily be influenced by clapstick rhythm depending on the condition type. Although her behavioral study seemed to suggest the existence of two different types of rhythm processing, Hung could not find two completely separate pathways in her fMRI study, one for vocal and the other for clapstick rhythms. Like Joanisse and Gati (2003), Hung observed overlapping brain regions that were activated by both vocal and clapstick rhythms. Interestingly, her data provided some evidence that our brain treats the human voice preferentially, because vocal rhythm induces a higher level of activation than clapstick rhythm.

2.4 Memory

Let me briefly discuss memory before examining Klyn's (2012) study, which examines whether rhythmic memory treats vocal and instrumental sounds differently. The term memory refers to the processes of encoding, storing and retrieving information, and it has an influence on our experience of music because sound patterns that are directly perceived through our sensory organs remain available for only a limited amount of time. In other words, musical experience is restricted by the ephemerality of sound, but memory enables us to overcome this restriction. If we could not memorize the ominous motive of the movie "Jaws," its minor-second progression would not be meaningful over the course of the movie, because we would experience the music as always new. It is because we remember this minor-second motive that it creates tension, making us expect a scene in which the shark will appear and attack people. In psychology, the paradigm shift from behaviorism to cognitivism in the 1960s renewed researchers' interest in the mind, and memory became an important research topic for understanding how the mind works. Miller's study on the magic number 7 ± 2 triggered a new surge in memory studies (Miller, 1956). In behaviorist terms, the mind is a black box, and memory, as a part of the mind, is neither observable nor measurable. For cognitivists, however, the concepts of mind and memory are central, because memory enables us to do what we can do as human beings. Some psychologists have claimed that memory is the mark of being human, because we are not able to think the way we do without memory (Malcolm, 1963; Burge, 2003). Memory research distinguishes different types of memory (i.e. sensory memory, working memory, long-term memory, etc.) and sub-systems of memory. First of all, sensory memory is divided into iconic memory (vision), echoic memory (audition) and haptic memory (touch), in accordance with the different types of sensory information.
We can retain external stimuli in our mind from a couple of milliseconds up to about two seconds (Baddeley, 1997) because of the operation of sensory memory. Second, working memory, also known as short-term memory (STM), is immediate, but the information in working memory is not raw sensation. Working memory deals with information that is transferred from sensory memory and transformed so that we can manipulate stimuli mentally through rehearsal processes. Suppose you smell the odor of a flower that is familiar to you, but you cannot remember exactly what it is. As soon as you inhale the scent, sensory memory is activated; because the odor draws your attention, working memory starts to search for flowers that match it and enables you to compare several flower fragrances in your mind. Baddeley (2000) focused on working memory and established a model of it. According to his working memory model, the central executive, as a supervisory system, controls information by communicating with three slave systems, the visuospatial sketchpad, the episodic buffer and the phonological loop, during the rehearsal period (see fig. 1). The phonological store is one sub-system of the phonological loop; it can be understood as a sensory memory for linguistic sounds, and information in the store decays rapidly. The articulatory loop is the other sub-system of the phonological loop, and it allows us to keep the information in mind. The figure below shows Baddeley's model of working memory.

Figure 1. Baddeley's working memory

Working memory may extend to 10 to 12 s, even though it lasts on average 3 to 5 s (Snyder, 2000). It is known that there are individual differences in the capacity of working memory (Daneman and Carpenter, 1980; Swanson, 1993). Last, long-term memory (LTM) is associated with memories that can be recalled after months or years. LTM generally involves durable memories, but they can also be forgotten.

2.5 Rhythm in memory

As an extension of Hung's behavioral study, Klyn (2012) raised the question of how we remember rhythm and whether vocal rhythm differs from clapstick rhythm in terms of memory. Klyn's (2012) study consisted of two experiments designed to clarify the difference between vocal and clapstick rhythm in terms of sensory and working memory processing. The first experiment asked participants to identify the sameness or difference of rhythm pairs, presenting both vocal and clapstick rhythms. The pairs were presented with two different interstimulus interval (ISI) durations (0.5 s and 12.5 s) to investigate different memory conditions (sensory memory and working memory). His data show a superiority of musicians only in clapstick recognition on the memory task, and there was no strong evidence of accuracy differences between the musician and non-musician groups in the vocal rhythm condition. Also, a similar decline in accuracy for vocal rhythm was observed across the groups with the longer ISI. The participants reacted faster in the same condition than in the different condition, and reaction time showed a pattern similar to accuracy. Musicians were slightly faster than non-musicians. The duration of the ISI had a strong effect on participants' reaction time: the longer ISI seems to require more time to make a decision.
Klyn's second experiment is a variation of the first that explores possible differences in working memory in rhythm processing by adding two intervening factors: finger tapping and sub-vocal re-articulation. This experiment was designed to test one of Baddeley's slave systems, the phonological loop, which deals with sound information. Baddeley had shown that the phonological loop helps to maintain the timing of speech events in working memory through rehearsal with the vocal motor system. Hence, it is thought that it may have a general function for rhythm memory, and Klyn raised the question of the phonological loop's role in rhythm memory. The data obtained from musicians showed that finger-tapping improved rhythm memory performance for both voice and clapstick, but sub-vocal re-articulation did not reduce memory performance for either rhythm. In the sub-vocal condition, the participant repeated the word "the" at a steady tempo without actually producing the sound. This result indicates that the phonological loop does not seem to influence rhythm memory in decision experiments.

During the Fall semester of 2012, I ran a rhythm reproduction experiment with the same setup as Klyn's second experiment. I used the same stimuli and the same intervening factors of finger-tapping and sub-vocal re-articulation. The only difference was that my experiment asked the participants to reproduce what they had listened to. The results of the reproduction experiment have not yet been published or presented, but they differ from Klyn's results: I observed a significant deterioration of accuracy in the sub-vocal re-articulation tasks. The contrasting results between the decision and reproduction tasks indicate that the phonological loop seems to be involved not so much in the memorization of rhythm per se as in maintaining motor representations of the rhythms for their upcoming execution.

Given that this set of experiments shows a significant difference between vocal and clapstick rhythm in terms of human cognitive processing, I would like to answer the following questions: why is there a difference between vocal and clapstick rhythm processing? Could this difference be explained by the physical difference between vocal and clapstick rhythm, namely the continuity or discontinuity of the sound, or does it have something to do with the timbre differences of the sounds?

2.6 Filled rhythm vs. empty rhythm

In this section, I will clarify filled and empty rhythms with the examples of voice and clapstick. One of the rhythmic characteristics of human speech is that there is often no silence between successive acoustic events in a phrase. I call an interval that is continuous because of the absence of silence between two events a filled interval. If a speaker does not intentionally separate syllables or words, most vocal rhythms consist of continuous sounds. In a filled interval, the first sound continues until the following sound event occurs; that is, the time interval is signaled by the uninterrupted presentation of an auditory stimulus. In music, filled rhythm can be expressed by the term 'legato.' In contrast, an empty interval is marked by two consecutive acoustic events where the duration of the first event is shorter than the interval. I call it an empty interval because it contains an empty space, silence, between the two events in time. The time of an empty interval is signaled by a short presentation of two sound events: one at the beginning of the interval and the other at the end of the interval duration (i.e. two clinking sounds).
In terms of the sound envelope, a sound event of an empty interval often has a faster attack time and a relatively short sustain and decay when compared to that of a filled interval. An empty interval may be characterized by the musical term 'staccato.' One representative example of empty intervals is a clapstick rhythm.

Filled and empty intervals have attracted many psychologists interested in the human sensation of time. For instance, James (1890) noted that filled intervals seem longer than empty ones of the same duration, a phenomenon referred to as the filled-interval illusion. Experimenters have found not only that a filled interval is perceived as longer than an empty interval even if both intervals have the same duration (Goldstone and Goldfarb, 1963) but also that the timing of filled intervals is more accurate than that of empty ones (Rammsayer and Lima, 1991; Rammsayer and Skrandies, 1998). As Repp et al. (2009) pointed out, most of the psychophysical research on the duration perception of filled and empty intervals used only single intervals, so previous studies have certain limitations for our discussion of rhythms. In addition, filled sound has been studied in relation to the perceptual restoration of sound in the context of speech. A couple of studies could not find any superiority of recognition for filled sound when compared to recognition of empty sound (Miller and Licklider, 1950; Dirks and Bower, 1970). However, continuous speech, even when filled with noise, allowed listeners to make more accurate decisions than discontinuous speech (Cherry and Wiley, 1967; Holloway, 1970; Powers and Wilcox, 1977; Verschuure and Brocaar, 1983), not only because the silences of empty intervals may cause false transitions by interfering with sound recognition but also because the rhythmic introduction of silence disrupts the listener's perception (Bregman, 1990).

Chapter 3: Experiment

3.1 Experiment design

In this study, I hypothesized that filled and empty rhythms are treated differently in auditory and memory processing, and the hypothesis was tested with four factors: type, memory, condition and musician. First, stimulus type is the most important factor in this experiment. It comprises three different stimulus groups: vocal rhythms, filled instrumental rhythms and empty instrumental rhythms. Vocal rhythms, adopted from songs of central Australian aborigines, were taken as reference rhythms for the creation of the stimuli; the filled and empty instrumental rhythms were derived from the vocal ones, so the three sets correspond to each other. For details about the three stimulus groups see the 'Experiment materials' section below. The second factor is memory. In this thesis, memory is divided into sensory memory and working memory on the basis of how long information is stored in mind, which is a classical way of labeling memory in psychology. Sensory memory, also called perceptual memory, starts to work the moment our senses are activated by information and lasts up to two or three seconds. Information that is filtered and extracted with the aid of mechanisms like attention moves into the working memory stage, where it is temporarily stored for between about two and 15 seconds. The two kinds of memory are tested with two different ISIs: 500 ms for sensory memory and 15000 ms for working memory. The third factor is the same or different condition.
Bamber (1969) suggested that same and different responses are based on two different processes, and significant reaction time differences between the same and different conditions were found in both Hung's and Klyn's experiments. Specifically, in Klyn's experiment the participants gave their responses faster and more accurately in the same condition than in the different condition. The fourth factor is musical training. Musical training is one way of developing our sense of time (James, 1890). Gaver (1993) argued that there are two different ways of listening, musical listening and everyday listening, and musicians, through long periods of musical listening, may be trained to identify auditory exemplars relying on subtle acoustical parameters (Chartrand et al., 2008). Klyn's experiment shows that musicians have shorter reaction times and better accuracy in remembering instrumental rhythms.

3.2 Experiment materials

In the experiment design section I mentioned that this experiment has four factors: type, memory, condition and musician. However, I considered only three experimental factors when creating the stimuli, because the musician factor was tested by recruiting appropriate participants, one group of musicians and one group of non-musicians. To begin with, I explain in detail how the type factor was handled during stimulus creation. As mentioned above, this experiment has three different types of stimuli: vocal rhythm, filled instrumental rhythm and empty instrumental rhythm. First, fourteen mono sound clips sung by an Australian aboriginal male performer were selected as the vocal rhythm stimuli. Aboriginal songs were chosen for the experiment because I wanted to avoid any influence of semantic processing due to the participants' language background and understanding of the song texts. The second type, the filled instrumental rhythms, was derived from the fourteen vocal rhythms. Using the software Praat, I obtained quantitative data on the amplitude changes of each vocal rhythm, and the extracted amplitude data were re-synthesized with a single-pitch cello sample obtained from the University of Iowa Electronic Music Studios. The amplitude re-synthesis causes some problems; for instance, events that I perceived in the vocal rhythm disappear in the corresponding filled instrumental rhythm. This is because only intensity information was used to create the filled instrumental rhythm, while vocal rhythm is determined by syllable changes, pitch changes, intensity changes and timbre changes. Before settling on the cello sample, the re-synthesis was tried with a couple of flute and didgeridoo samples. At the beginning of creating the filled rhythm stimuli, I planned to extract both amplitude data and pitch contours from the vocal rhythms with Praat and tried to re-synthesize them with an instrumental sample, but this distorted the timbre of the instruments. This is because the frequency range of the male human voice does not correspond with the frequency range of the instruments. Specifically, the spectral distortion of the didgeridoo was the worst, sounding like noise from a sine wave generator, because the difference in fundamental frequency (F0) between the didgeridoo and the male voice was the widest. In addition to the didgeridoo, I also tried re-synthesis with a flute, but there was still spectral distortion, because the flute tones that correspond to a male voice lie in the low register of the instrument.
Low tones of the flute do not have the wide energy distribution in the spectrogram that is necessary to manipulate or re-synthesize the sound while keeping the flute's timbre. Therefore, I selected a cello to produce the stimuli, because cello tones correspond more closely to a male voice and are less distorted than flute sounds. In the case of aerophones, timbre, the quality of the sound, depends more on the physical structure of the instrument. However, re-synthesizing the cello with both the amplitude and the pitch information of the vocal rhythm still caused spectral changes in the cello sample. Finally, I ended up with amplitude re-synthesis only. The sound of a filled rhythm continues from onset to offset; in other words, the amplitude of a filled rhythm does not decrease to zero during the presentation.

The third type is the empty instrumental rhythm, which was derived from the filled rhythms. The amplitude peaks of the filled rhythms are perceived as acoustic events, so these peaks served as the reference points for creating the empty rhythms. Each event in the empty rhythms lasts 80 ms and consists of a 30 ms onset and a 50 ms decay. In order to produce a natural rather than artificial sound, a 30 ms fade-in was applied to the onset and a 50 ms fade-out to the decay in the audio editing program Audacity. The interval between the end of one event and the onset of the next was 'silenced' with the silence generation function of Audacity. Figure 2 shows an example of the three corresponding rhythms that were produced for each file using the Audacity program. The vocal rhythm is the reference sound for the filled instrumental rhythm, which is in turn the reference for the empty instrumental rhythm. I tried to keep the same number of events across the three rhythms, because the filled and empty instrumental rhythms are derivatives of the vocal rhythm and are meant to correspond to each other. However, the filled rhythm is re-synthesized only from the amplitude information of the vocal rhythm, so acoustic-rhythmic events that are not generated by amplitude changes in the vocal rhythm are sometimes missing.

Figure 2. Examples of the three types of corresponding rhythms used in the study: top is vocal rhythm; middle is filled instrumental rhythm; bottom is empty instrumental rhythm.

The top sound envelope in figure 2 is a sample vocal rhythm. It shows numerous spikes that are not preserved in the derivatives, and the filled and empty instrumental rhythms have smoother shapes. The different envelope shapes show that the vocal rhythm carries information that is missing in the filled and empty rhythms. However, the contour of the vocal rhythm is similar to that of the filled rhythm, which means that the vocal and filled rhythms share the same amplitude information. In this filled instrumental rhythm there are seven conspicuous peaks that we can perceive as sound events, because the seven major amplitude changes cause the perception of distinct events while the other, minor amplitude variations are not perceived as discrete events. Therefore, seven events were created in the empty rhythm.
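To make this derivation more concrete, the following minimal Python sketch reproduces the logic described above for a single file. It is an illustration only: the actual stimuli were produced with Praat and Audacity, and the file names, smoothing window and peak-picking thresholds used here are assumptions.

# Illustration only: derive a filled and an empty instrumental rhythm from a
# vocal recording, following the procedure described in the text. The actual
# stimuli were made with Praat and Audacity; file names and thresholds are assumed.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, find_peaks

sr, vocal = wavfile.read("vocal_rhythm.wav")              # hypothetical mono clip
vocal = vocal.astype(np.float64) / np.max(np.abs(vocal))

# 1. Amplitude envelope of the vocal rhythm (Hilbert magnitude, lightly smoothed).
env = np.abs(hilbert(vocal))
win = int(0.02 * sr)                                       # 20 ms smoothing window (assumed)
env = np.convolve(env, np.ones(win) / win, mode="same")

# 2. Filled rhythm: impose the vocal amplitude envelope on a sustained cello tone.
_, cello = wavfile.read("cello_monotone.wav")              # hypothetical sustained tone
cello = cello.astype(np.float64) / np.max(np.abs(cello))
cello = np.resize(cello, len(vocal))                       # loop/trim to the vocal duration
filled = cello * env

# 3. Empty rhythm: 80 ms events (30 ms rise, 50 ms decay) at the envelope peaks,
#    i.e. at the perceived events of the filled rhythm, with silence in between.
peaks, _ = find_peaks(env, distance=int(0.2 * sr), prominence=0.1)   # assumed thresholds
event = np.concatenate([np.linspace(0.0, 1.0, int(0.03 * sr)),
                        np.linspace(1.0, 0.0, int(0.05 * sr))])
empty = np.zeros(len(vocal))
for p in peaks:
    n = min(len(event), len(empty) - p)
    empty[p:p + n] = cello[p:p + n] * event[:n]

for name, sig in (("filled_rhythm.wav", filled), ("empty_rhythm.wav", empty)):
    wavfile.write(name, sr, (sig / np.max(np.abs(sig)) * 32767).astype(np.int16))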
Next, memory is the second factor in this experiment. Each pair is presented twice, each time with a different interstimulus interval. To investigate the differences between sensory and working memory, I use a comparison task in which the two sound clips of a pair are separated either by a long ISI of 15000 ms, for working memory, or, in a second presentation, by a short ISI of 500 ms, for sensory memory. Figure 3 compares the two ISIs.

Figure 3. Comparison of the two interstimulus intervals (ISI); the top image shows the 15 s ISI for the long memory task and the bottom image the 500 ms ISI for the short memory task.

The last factor considered during stimulus creation is the same or different condition. In the same condition the two sound clips in a rhythm pair are identical, while a rhythm pair in the different condition consists of one sound clip and its variant. Rhythmic variants were created with the sound editing program CoolEdit. For the vocal stimuli they were created by noticeably changing, either by shortening or by lengthening, one of the vowels, with special care taken to keep the editing artifacts inaudible. Figure 4 shows a comparison of the same and different conditions for a vocal rhythm; here the variant for the different condition was created by stretching the duration of a vowel.

Figure 4. Difference point in vocal rhythm: The first pair consists of two identical sound clips (same condition) and the second pair consists of a sound clip and its variant (different condition). The red arrow indicates the difference point between same and different.

For the filled rhythms, the different condition was created by manipulating the shape of the amplitude in CoolEdit. Smoothing a sharp amplitude change eliminates a sound event, while creating a sharp amplitude change produces one more event in the filled rhythm. Figure 5 shows a comparison of the same and different conditions for a filled instrumental rhythm; here the variant was created by filling in the fourth sharp amplitude change in the different condition, which makes us perceive one acoustic event less.

Figure 5. Difference point in filled instrumental rhythm: The first pair consists of two identical sound clips (same condition) and the second pair consists of a sound clip and its variant (different condition). The red arrow indicates the difference point between same and different.

For the empty instrumental rhythms, the variant was created either by adding one event or by eliminating one event in Audacity. Figure 6 shows the empty instrumental rhythm that corresponds to the filled instrumental rhythm above. Since we perceive one event less in the different condition of the filled instrumental rhythm in figure 5, I eliminated the event of the empty rhythm at the location where the event in the filled rhythm was missing; the location of the difference is thus the same in both the filled and the empty rhythms.

Figure 6. Difference point in empty instrumental rhythm: The first pair consists of two identical sound clips (same condition) and the second pair consists of a sound clip and its variant (different condition). The red arrow indicates the difference point between same and different.

During the creation of the variants, I carefully selected the locations of the difference points to avoid effects of their position, and tried to distribute them evenly throughout the sound clips: five pairs of each type have the difference point in the first third of the clip, four pairs of each type in the middle third, and four pairs of each type in the final third.
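As an illustration of how such a variant could be produced programmatically, the sketch below removes one event from, or copies an extra event into, an empty rhythm. This is a hypothetical numpy equivalent of the manual edits: the actual variants were edited by hand in CoolEdit and Audacity, and the file name and onset times are assumptions.

# Illustration only: create a 'different' variant of an empty rhythm by removing
# one 80 ms event or by pasting in an extra one. The real variants were edited
# manually in Audacity/CoolEdit; file name and onset times here are assumed.
import numpy as np
from scipy.io import wavfile

sr, rhythm = wavfile.read("empty_rhythm.wav")
rhythm = rhythm.astype(np.float64)

def remove_event(signal, onset_s, dur_s=0.08):
    """Silence one event (default 80 ms) starting at onset_s seconds."""
    out = signal.copy()
    start = int(onset_s * sr)
    out[start:start + int(dur_s * sr)] = 0.0
    return out

def add_event(signal, src_onset_s, dst_onset_s, dur_s=0.08):
    """Copy an existing event from src_onset_s and paste it at dst_onset_s."""
    out = signal.copy()
    n = int(dur_s * sr)
    src, dst = int(src_onset_s * sr), int(dst_onset_s * sr)
    out[dst:dst + n] = signal[src:src + n]
    return out

variant = remove_event(rhythm, onset_s=2.4)                # difference point is hypothetical
wavfile.write("empty_rhythm_variant.wav", sr, variant.astype(np.int16))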
3.3 Stimulus set

Before the experiment was run, all participants completed a practice section consisting of a set of trials to familiarize them with the task. One empty instrumental, one filled instrumental and one vocal rhythm pair with 500 ms ISI, each in both the different and the same condition, were presented in this practice section. The experiment itself consisted of two blocks, each of which had two sections. One block consists of two short sections, because the stimuli in this block have a 500 ms ISI, while the stimuli in the two sections of the other block have a 15000 ms ISI and are therefore called long sections. Each section has forty-two rhythm pairs: fourteen vocal, fourteen filled and fourteen empty rhythm stimuli. For each type, seven stimuli are same pairs and the other seven are different pairs. The stimuli in a short section are the same as those in one of the long sections; the stimuli are therefore presented in blocks to avoid successive repetition of the same stimuli. In addition, the stimuli were presented in random order within the sections.

3.4 Participants

Originally twenty-two people participated in this experiment; however, one female non-musician subject was eliminated because she gave 100% wrong responses in one section due to drowsiness. Ten subjects are non-musicians (average age 27.1 years, range 20 to 45) and the other eleven subjects are musicians (average age 28.27 years, range 22 to 34). Three criteria were applied to define non-musicians: a participant should have had no more than 5 years of formal musical training, should not have engaged in regular musical activity during the past three years, and should define himself or herself as a non-musician. Only participants fulfilling all three criteria were classified as non-musicians. The musicians are currently studying music at the institution or perform professionally, and on average they have played music for 20.55 years (range 13 to 26 years). None of the non-musicians is active in music performance, and their average period of formal music training is 2.55 years (range 0 to 5 years), except for one participant who had eight years of music lessons. The musician group consists of five males and six females and the non-musician group of five males and five females, which gives a good balance in terms of gender. Participants were mostly recruited from The Ohio State University community; one subject was from Columbus Community College.

3.5 Equipment

The experiment was run on a Sony VAIO laptop with the Windows XP operating system. Stimuli were presented with the DMDX software, through which the data were also collected.

3.6 Procedure

To begin with, all participants listened to an explanation of their rights as experimental subjects and gave verbal consent to proceed with the experiment before a short questionnaire about age, dominant hand, musical training, etc. was given. Next, each participant performed the practice section so that they could understand the experimental procedure. Participants were asked to make their decision as quickly and precisely as possible. Their decisions were recorded with the track-pad buttons of the laptop: if participants thought the two sounds in a pair were the same, they pressed the left button, which showed an "=" sign; if not, they pressed the right button, which showed a "≠" sign. In the practice section, all participants listened to test examples of vocal, filled instrumental and empty rhythms after seeing the question "Same or Different?" on the screen. After the training trials, participants performed the four experimental sections. The experiment has two blocks: one block with two long sections and the other block with two short sections.
Each section has twenty-one stimuli: seven vocal rhythms, seven filled rhythms and seven empty rhythms; within a section, the stimuli are presented in a random order to reduce order effects. Participants could take breaks of self-determined length between sections. The experiment, including introduction and training trials, takes thirty-five minutes in total. After their performance, the participants were asked during a debriefing session about their impressions of the experiment and the strategies they used to perform it. All participants received a five-dollar compensation, and three of the participants additionally received extra credit in their Speech and Hearing Science class. A short description of the experiment and its procedure is given in the flow chart below.

[Flow chart: Verbal consent and short questionnaire about subjects' background → Introduction to experiment and explanation of procedure → Practice session and subjects' feedback → Four sections (two long and two short blocks) with subject-controlled progress between sections → Debriefing and compensation]

Figure 7. Experiment procedure

3.7 Data analysis

The output from DMDX is the raw data of this experiment, from which three kinds of wrong responses were eliminated. The first kind is the wrong decision: participants pressed the "=" button after a different pair was presented, or the "≠" button after a same pair. The second kind is a decision made before the difference point. The difference point for the same condition is the time point where the second sound ends, while the difference point for the different condition is the time point where the second sound clip starts to differ from the first sound clip in the pair. The third kind is a decision made after the decision period. In order to obtain participants' reaction times, I subtract the difference point from the recorded time; the raw data come from DMDX, which records time from the moment the second sound clip in a rhythm pair is presented. After computing participants' reaction times, I filtered the reaction time data to remove the second and third types of wrong responses. First, negative reaction times mean that participants made a decision before the difference point, so they are wrong decisions. In addition, response latency is also considered: humans need at least 200 ms to respond after hearing a stimulus, because the brain has to process the stimulus and operate the motor system to press the button. Second, I also removed reaction times that were longer than 4 s after the difference point. These values appear as outliers in the reaction time distribution, and we can assume that participants may have been guessing if they did not provide their response promptly. Figure 8 shows the location of correct responses in time.

Figure 8. The range of correct responses: correct responses are located between 200 ms and 4000 ms after the difference point. ISI = interstimulus interval, RT = reaction time

Accuracy data are the numbers of correct responses in each category, transformed into percentages. The experiment has twelve within-subject categories (3 x 2 x 2). Thus, the number of correct responses is divided by the total number of responses in each category and multiplied by 100.
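The filtering and accuracy computation described in this section can be summarized in the following sketch (Python/pandas). The column names and file format are hypothetical, since the actual raw data are DMDX output files; the sketch only illustrates the steps described above.

# Illustration only: response filtering and accuracy computation as described
# in section 3.7. Column names and the CSV format are assumptions; the real
# raw data are DMDX output files.
import pandas as pd

raw = pd.read_csv("dmdx_output.csv")   # one row per trial (assumed format)
# Assumed columns: subject, type, memory, condition, response, correct_response,
# rt_from_second_clip (ms), difference_point (ms, relative to second-clip onset).

# Reaction time is measured from the difference point of each pair.
raw["rt"] = raw["rt_from_second_clip"] - raw["difference_point"]

# Keep correct decisions made between 200 ms and 4000 ms after the difference point.
valid = raw[(raw["response"] == raw["correct_response"])
            & (raw["rt"] >= 200) & (raw["rt"] <= 4000)]

# Accuracy per subject and cell (type x memory x condition), as a percentage,
# and mean reaction time per cell from the filtered data.
cells = ["subject", "type", "memory", "condition"]
accuracy = (valid.groupby(cells).size()
            / raw.groupby(cells).size() * 100).rename("percent_correct")
mean_rt = valid.groupby(cells)["rt"].mean()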
Chapter 4: Experiment results

4.1 Reaction time

A repeated-measures ANOVA with the factors type, memory, condition and musician was performed on the reaction time data. All main factors were found to be significant (Type: F (2,20) = 9.7652, p < 0.001; Memory: F (1,20) = 45.1678, p < 0.001; Condition: F (1,20) = 502.5527, p < 0.001; Musician: F (1,20) = 7.701, p = 0.012). There was a significant 2-way interaction for type and condition (F (2,42) = 29.679; p < 0.001), and the interaction between memory and type approached significance (F (2,42) = 3.21; p = 0.05).

First, the filled instrumental rhythms showed the slowest reaction time (1399 ms); the vocal rhythm mean is 1263 ms and the empty instrumental rhythm mean is 1211 ms. Second, the mean reaction time for the short memory task is 1132 ms, while the mean reaction time for the long memory task is 1450 ms. Third, the participants reacted faster in the same condition (905 ms) than in the different condition (1677 ms). Fourth, musicians were significantly faster (1176 ms) than non-musicians (1406 ms). Fifth, in the long memory task both groups showed significantly longer reaction times (non-musicians 1602 ms, musicians 1298 ms) than in the short memory task (non-musicians 1210 ms, musicians 1053 ms); although the non-musicians show a proportionally larger increase in the long memory condition (392 ms vs. 245 ms), this difference between the two participant groups is not significant (p = 0.135).

Next, the interaction of stimulus type and condition was significant, but there was no significant difference between musicians and non-musicians. The means for the same condition were 916 ms (voice), 873 ms (filled), and 926 ms (empty); for the different condition they were 1611 ms (voice), 1924 ms (filled), and 1496 ms (empty). For the different condition, the filled instrumental rhythms had the longest and the empty rhythms the shortest reaction time, in both memory conditions and for both subject groups. In terms of the interaction between condition and memory, for the same condition of the long memory task there is a significant difference (p = 0.005) between musicians and non-musicians, but not between the three stimulus types. For the same condition of the short memory task the empty rhythms have a slightly longer reaction time (not significant) than the two other rhythm types, and there are no significant differences between the subject groups.

Figure 9. Reaction time interaction between stimulus type and same/different condition for non-musicians (left) and musicians (right). d = different condition, s = same condition; einst = empty instrumental rhythms, finst = filled instrumental rhythms, voc = vocal rhythms

4.2 Accuracy

For each subject, percentage values were calculated from the decision data. The square roots of these percentage values were then subjected to an arcsine transformation, and a repeated-measures ANOVA with the factors type, memory, condition and musician was performed. All four main factors turned out to be significant. The factor Type was highly significant (F (2,19) = 165.27; p < 0.001), with 84.3% for vocal, 87.8% for empty, and 54.2% for filled instrumental rhythms; the factor Memory (F (1,19) = 45.1678; p < 0.001) indicates that there are more correct responses in the short (78.3%) than in the long memory task (72.8%); the factor Condition was also significant (F (1,19) = 5.595; p = 0.028), with 79.1% for the same and 71.9% for the different condition; and the factor Musician (F (1,19) = 7.47; p = 0.013) showed that musicians were generally better (80% correct) than non-musicians (71% correct).
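The accuracy analysis described above can be sketched as follows. The arcsine-square-root transformation follows directly from the text; the repeated-measures ANOVA is illustrated here with statsmodels' AnovaRM, which is only one possible implementation (the analysis software actually used is not specified) and handles the within-subject factors only, so the between-subject factor musician would require a mixed-design ANOVA instead.

# Illustration only: arcsine-square-root transformation of the accuracy
# percentages and a repeated-measures ANOVA over the within-subject factors.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

acc = accuracy.reset_index()                 # per-subject cell accuracies (see sketch above)
# Convert percentages to proportions so that arcsin(sqrt(p)) is defined on [0, 1].
acc["acc_t"] = np.arcsin(np.sqrt(acc["percent_correct"] / 100.0))

# Within-subject factors only; 'musician' (between subjects) is not handled by AnovaRM.
res = AnovaRM(acc, depvar="acc_t", subject="subject",
              within=["type", "memory", "condition"]).fit()
print(res.anova_table)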
Figure 10 below shows the overall accuracy for each category.

Figure 10. Accuracy interaction between type and condition; left graph for non-musicians, right graph for musicians; d = different condition, s = same condition; einst = empty instrumental rhythms, finst = filled instrumental rhythms, voc = vocal rhythms.

A significant interaction between type and memory (F (1,19) = 6.68; p = 0.0033) indicates that the effect of memory is mainly due to the better performance on empty and vocal rhythms in the short memory task, whereas the performance for the filled rhythms is the same for both memory tasks. There is also a significant interaction between memory and condition (F (1,19) = 24.76; p = 0.0033), which is caused by a significant decline in accuracy for the same condition and a minor improvement for the different condition between the short and the long memory task (see fig. 11).

Figure 11. Interaction between memory task and same/different condition. l = long memory task, s = short memory task. d (blue) = different condition, s (red) = same condition.

There was also a significant interaction between type and condition (F (1,19) = 12.47; p < 0.001), and a significant 3-way interaction between memory, type, and condition (F (2,38) = 12.39; p < 0.001). Both effects can be attributed to the changing responses of the non-musicians in the two memory tasks, whereas the musicians do not show a type x condition interaction (see fig. 12).

Figure 12. Interaction between stimulus type and same/different condition for non-musicians (left) and musicians (right). d = different condition, s = same condition; einst = empty instrumental rhythms, finst = filled instrumental rhythms, voc = vocal rhythms.

Chapter 5: Discussion

The experimental results are interesting and challenging. In terms of stimulus type, the participants showed the best performance for empty instrumental rhythms. First, it seems that the clear separation between the events of the empty instrumental rhythms helps participants to detect their rhythms more easily than those of the filled instrumental examples. The separation between the events of the empty instrumental rhythms is created by zero-level amplitude, i.e. by periods of silence, which may help the participants distinguish rhythmic events more clearly. This contrasts with the filled rhythms, for which the amplitude envelope does not reach zero level, so the separating region between events (i.e. between the amplitude maxima) still contains continuous sound. This may cause difficulties in detecting rhythmic events in terms of distinguishability: the amplitude maxima are not as clearly separated as they are in the empty rhythms. In other words, the clear separation of the empty pairs may make participants' decisions easy, while the absence of such clear distinctions in the filled pairs probably increases the difficulty. For instance, a couple of non-musician subjects reported that the empty rhythms reminded them of Morse code, but that they could not transform the filled rhythms into corresponding mental images. This suggests that differences in participants' performance on empty and filled rhythms are based not so much on the different physical contents as on the different levels of task difficulty.

In the experiment, empty rhythms show shorter reaction times and higher accuracy than filled rhythms. The study focused on how fast participants reacted and how accurate their responses were. In previous studies on empty and filled intervals, the filled intervals resulted in more accurate duration judgments than the empty intervals, which seems to contradict the results of this experiment.
In the experiment, the empty rhythms show shorter reaction times and higher accuracy than the filled rhythms; the study focused on how fast the participants reacted and how accurate their reactions were. When we examine previous studies on empty and filled intervals, the filled intervals resulted in more accurate duration judgments than the empty intervals, which seems to contradict the results of this experiment. In those previous studies, the stimuli for filled intervals are stable tones, with sound onset and offset defining the interval boundaries, whereas empty interval stimuli are usually created by two clicking sounds. However, the filled rhythm stimuli of my experiment are made up of several successive sound events between which the spaces are filled with sound, while the empty rhythms are composed of brief sound events. This difference in stimuli between the previous studies and my experiment most likely explains the different results, namely that in the present study the empty rhythms produce the better results.

Second, a comparison of the performance for filled instrumental rhythms and vocal rhythms suggests that the multiple components of vocal rhythm, i.e. amplitude, pitch, and timbre changes, help the participants recognize the vocal rhythmic patterns better. In the stimulus creation section I described that the filled rhythms are created only from the information about amplitude changes in the corresponding vocal rhythms, whereas the vocal rhythms are established by a combination of amplitude, pitch, spectral, and timbre changes. This means that the filled rhythms of my study are one-dimensional (the change of a single parameter causes the event segmentation), while the vocal rhythms are multidimensional. In terms of distinguishability, amplitude by itself has limitations, and the decreased performance for filled rhythms in comparison to vocal rhythms may be due to the lack of pitch, spectral, and timbre information in the sounds.

Third, performance for empty rhythms is slightly better than for vocal rhythms. Even though empty rhythms are one-dimensional, the silence between events creates a clear distinction, which lets us extract the rhythm information easily. In contrast, vocal rhythms are composed of several components from which we extract the rhythm information; in other words, we need to process several different types of information simultaneously. Because of this simultaneous processing, decisions on vocal rhythms do not take more time than those on empty rhythms, at least not in the short memory task.

Next, the short memory task shows faster reactions and more accurate responses than the long memory task. For longer ISIs, rhythmic memory seems to fade, making the comparison with the just-heard pattern more difficult and leading to longer reaction times and higher error rates. Gardiner et al. (1994) proposed a tempting idea that could explain this phenomenon by suggesting two separate modes of rehearsal: maintenance rehearsal and elaborative rehearsal. Maintenance rehearsal is related to knowing, not to remembering, while elaborative rehearsal affects remembering, not knowing. Gardiner et al.'s suggestion of two different rehearsal strategies could be applied to music memory research and may be tested in future studies. Craik and Watkins's (1973) study indicated that the duration of mental rehearsal is related not to short-term memory but to long-term memory, and that maintenance rehearsal may not improve memory performance.

In addition, the same condition shows faster reactions and slightly better accuracy. This result is consistent with that of Klyn's (2012) experiment. It is plausible that 'same' decisions require different processing than 'different' decisions. The slower reaction times for 'different' decisions may be related to an additional re-checking process when the brain detects or checks for differences (for a general discussion of the different processes involved in same and different decisions see Briggs and Johnsen, 1973; Krueger, 1978).
The interaction of type and same/different condition in the non-musician group shows the interesting phenomenon that for 'different' decisions there is no significant difference between empty and filled rhythms and the vocal rhythms yield the best accuracy, while the musicians still show their best performance on empty rhythms. The reaction of the non-musicians can be seen as additional support for the existence of a voice specific response (VSR) as proposed by Levy et al. (2003).

Finally, musical training contributes to the participants' ability to identify and judge the sameness or difference of stimuli, especially for empty instrumental rhythms. It seems that musical training improves the participants' rhythm perception and memory, but the improvement is strongest for empty rhythms and less so for vocal rhythms, which is also consistent with the findings of Klyn (2012). In addition, the musician participants of my experiment showed better performance for filled rhythms than the non-musicians. This implies that musicians may have a different representation of instrumental sounds than non-musicians, one that improves their ability to recognize instrumental rhythms quickly and correctly. It is possible that musicians, specifically those with Western music training, develop an ability to transform instrumental sound into visual, music notation-like representations, which seem to be more stable and easier to recall. Another possible explanation, an association of sound rhythms with bodily actions, comes from the musician participants' debriefing: a couple of musicians reported that, specifically for the filled rhythms, they combined the sound with images of playing the cello in order to distinguish the rhythmic events from the continuous vibration; this may have a stabilizing effect similar to that of the notation-like representation.

Chapter 6: Conclusion

Despite the significant difference between filled and empty rhythms that was identified in this thesis, the results do not support the research hypothesis that the differences between vocal and clapstick rhythms identified in previous studies are due to the difference between filled and empty rhythms. If the hypothesis had been correct, vocal rhythms should have shown results similar to those for filled rhythms. However, filled and vocal rhythms produced clearly different responses in the participants, both in terms of accuracy and reaction time. Most likely, the processing differences between clapstick and vocal rhythms have to do with the different dimensionality of the rhythms discussed above. The complex combination of features that forms the rhythms of voice sounds is, in our experience, linked to the vocal production of sound. On the other hand, from an 'ecological' perspective, instrumental sounds like those of clapsticks bear the hallmark of sounds produced by human interaction with external (resonating) objects, i.e. sounds produced by hitting, slapping, thumping, etc. The cognitive differences in the processing of vocal and instrumental sound may therefore reflect the different actions in and interactions with our environment that led, and still lead, to the production of these two types of sound. The experiment also shows the different influence of sensory and working memory on the participants' performance and confirms the existence of different cognitive processing for 'same' and 'different' recognitions and decisions.
In addition, musical training seems to improve accuracy for instrumental rhythms, especially for empty rhythms, and to contribute to the faster perception of empty instrumental rhythms. This indicates that musical education increases the differences between instrumental and vocal rhythm processing and helps to represent musical rhythms in more stable forms in memory.

What does this tell us? Music, as a parallel system to language, provides us with a primary modeling system that enables us to communicate and think in unique, non-linguistic terms (Blacking 1976). Language relies heavily on the human voice, while music mediates between voice and non-voice sounds; thus we do not tend to make a clear distinction between vocal and instrumental music when we think of music. It may therefore come as a surprise that there are cognitive differences in the processing of vocal and instrumental sound. However, as pointed out above, from an ecological perspective these differences are simply a reflection of different interactions in and with our environment and may shed new light on the origins of these different forms of sound production.

Through the present study I have tried to make a small contribution to our understanding of rhythm in the context of music. Musical rhythm can be produced by the human voice, by musical instruments, or by both. Given that the use of both voice and non-voice sound is quite unique to the human species, studies of vocal and instrumental rhythm may shed new light on the origins and evolution of music (Fitch, 2006). Although this study approached musical rhythm from a very specific angle, namely the question of whether the different cognitive processing of vocal and instrumental rhythms can be explained by the difference between empty and filled rhythms, I nevertheless hope that the present study contributes to tracing the origins of musical man (Blacking 1973).

References

Audacity Team. (2013). Audacity (Version 2.0.3.0) [Computer program]. Retrieved January 2013, from http://audacity.sourceforge.net/

Baddeley, A. D. (1997). Human memory: Theory and practice. Psychology Press.

Baddeley, A. D. (2000). The phonological loop and the irrelevant speech effect: Some comments on Neath (2000). Psychonomic Bulletin & Review, 7(3), 544-549.

Baddeley, A. D. (2010). Working memory. Current Biology, 20(4), R136-R140.

Baddeley, A. D., & Hitch, G. J. (1994). Developments in the concept of working memory. Neuropsychology, 8, 485-493.

Baddeley, A. D., & Logie, R. (1992). Auditory imagery and working memory. Auditory imagery, 179-197.

Bamber, D. (1969). Reaction times and error rates for "same"-"different" judgments of multidimensional stimuli. Perception & Psychophysics, 84, 213-219.

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403(6767), 309-312.

Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., Kaufman, J. N., & Possing, E. T. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex, 10(5), 512-528.

Blacking, J. (1973). How musical is man? University of Washington Press.

Blacking, J. (1992). The biology of music-making. In H. Meyer (Ed.), Ethnomusicology: An introduction (pp. 301-314). New York, NY: Norton.

Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.

Briggs, G. E., & Johnsen, A. M. (1973). On the nature of control processing in choice reactions. Memory & Cognition, 1, 91-100.
Boersma, P., & Weenink, D. (2013). Praat: doing phonetics by computer (Version 5.3.59) [Computer program]. Retrieved January 2013, from http://www.praat.org/

Bowles, E. A. (1957). Were musical instruments used in the liturgical service during the Middle Ages? The Galpin Society Journal, 10, 40-56.

Burge, T. (2003). Memory and persons. The Philosophical Review, 112(3), 289-337.

Chartrand, J. P., Peretz, I., & Belin, P. (2008). Auditory recognition expertise and domain specificity. Brain Research, 1220, 191-198.

Cherry, C., & Wiley, R. (1967). Speech communication in very noisy environments. Nature, 214, 1164.

Craik, F. I., & Watkins, M. J. (1973). The role of rehearsal in short-term memory. Journal of Verbal Learning and Verbal Behavior, 12(6), 599-607.

Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4), 450-466.

Dirks, D. D., & Bower, D. (1970). Effect of forward and backward masking on speech intelligibility. The Journal of the Acoustical Society of America, 47, 1003.

Dyirbal song poetry: Traditional songs of an Australian rainforest people. (1996). Mascot, N.S.W.: Larrikin.

Fecteau, S., Armony, J. L., Joanette, Y., & Belin, P. (2004). Is voice processing species-specific in human auditory cortex? An fMRI study. NeuroImage, 23(3), 840-848.

Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective. Cognition, 100(1), 173-215.

Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, 35(1), 116-124.

Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149-180). New York: Academic Press.

Gardiner, J. M., Gawlik, B., & Richardson-Klavehn, A. (1994). Maintenance rehearsal affects knowing, not remembering; elaborative rehearsal affects remembering, not knowing. Psychonomic Bulletin & Review, 1(1), 107-110.

Gaver, W. W. (1993). What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1-29.

Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893-906.

Goldfarb, J. L., & Goldstone, S. (1963). Time judgment: A comparison of filled and unfilled durations. Perceptual and Motor Skills, 16(2), 376.

Goldstone, S., & Goldfarb, J. L. (1963). Judgment of filled and unfilled durations: Intersensory factors. Perceptual and Motor Skills, 17(3), 763-774.

Holloway, C. M. (1970). Passing the strongly voiced components of noisy speech.

Hung, T. H. (2011). One music? Two musics? How many musics? Cognitive ethnomusicological, behavioral, and fMRI study on vocal and instrumental rhythm processing (Doctoral dissertation, The Ohio State University).

Ihle, R. C., & Wilsoncroft, W. E. (1983). The filled-duration illusion: Limits of duration of interval and auditory fillers. Perceptual and Motor Skills, 56(2), 655-660.

Joanisse, M. F., & Gati, J. S. (2003). Overlapping neural regions for processing rapid temporal cues in speech and nonspeech signals. NeuroImage, 19(1), 64-79.

James, W. (1890/1950). The principles of psychology.

Klyn, N. A. M. (2012). Working memory for rhythm (Master's thesis, The Ohio State University).

Korea Arts & Culture Education Service. (1980). Beompae: Systematic inventory of folk music 4. Seoul, Korea: Korea Arts & Culture Education Service.

Krueger, L. E. (1978). A theory of perceptual matching. Psychological Review, 85, 278-304.
Levy, D. A., Granot, R., & Bentin, S. (2003). Neural sensitivity to human voices: ERP evidence of task and attentional influences. Psychophysiology, 40(2), 291-305.

Malcolm, N. (1963). Three lectures on memory. Knowledge and certainty, 187-240.

Marsh, E., & Bower, G. (2004). The role of rehearsal and generation in false memory creation. Memory (Hove, England), 12(6), 748-761.

Merker, B., & Brown, S. (2001). The origins of music (new edition). The MIT Press.

Mithen, S. J. (2005). The singing Neanderthals: The origins of music, language, mind and body. Cambridge, MA: Harvard University Press.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Miller, G. A., & Licklider, J. C. R. (1950). The intelligibility of interrupted speech. The Journal of the Acoustical Society of America, 22, 167.

Owen, B., et al. (2001). Organ. Grove Music Online. Oxford Music Online. Oxford University Press. Retrieved April 2013, from http://www.oxfordmusiconline.com.proxy.lib.ohiostate.edu/subscriber/article/grove/music/44010.

Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology, 56, 89-114.

Powers, G. L., & Wilcox, J. C. (1977). Intelligibility of temporally interrupted speech with and without intervening noise. The Journal of the Acoustical Society of America, 61, 195.

Rammsayer, T. H., & Lima, S. D. (1991). Duration discrimination of filled and empty auditory intervals: Cognitive and perceptual factors. Perception & Psychophysics, 50(6), 565-574.

Rammsayer, T. H., & Skrandies, W. (1998). Stimulus characteristics and temporal information processing: Psychophysical and electrophysiological data. Journal of Psychophysiology.

Repp, B. H., & Bruttomesso, M. (2009). A filled duration illusion in music: Effects of metrical subdivision on the perception and production of beat tempo. Advances in Cognitive Psychology, 5, 114.

Saito, S. (2001). The phonological loop and memory for rhythms: An individual differences approach. Memory, 9(4), 313-322.

Snyder, B. (2000). Music and memory: An introduction. The MIT Press.

Swanson, H. L. (1993). Individual differences in working memory: A model testing and subgroup analysis of learning-disabled and skilled readers. Intelligence.

University of Iowa Electronic Music Studio. (2012). Musical instrument samples: Cello. Retrieved January 2013, from http://theremin.music.uiowa.edu/MIScello.html

Verschuure, J., & Brocaar, M. P. (1983). Intelligibility of interrupted meaningful and nonsense speech with and without intervening noise. Perception & Psychophysics, 33(3), 232-240.

Wallaschek, R., & Cattell, J. M. (1891). On the origin of music. Mind, 16(63), 375-388.

Will, U. (2004). Oral memory in Australian song performance and the Parry-Kirk debate: A cognitive ethnomusicological perspective. International Study Group on Music Archaeology, 10, 1-29.