Empty and filled rhythms:
An inquiry into the different cognitive processing of vocal and instrumental rhythms
Thesis
Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts
in the Graduate School of The Ohio State University
By
Yong Jeon Cheong
Graduate Program in Music
The Ohio State University
2013
Master’s Examination Committee:
Professor Arved Ashby
Professor Jan Radzynski
Professor Udo Will, Advisor
Copyright by
Yong Jeon Cheong
2013
Abstract
In contemporary thinking about music, the distinction between vocal and instrumental music does not play a decisive role. However, physiological and imaging studies have shown that our brain processes human speech sounds differently from non-speech sounds (Belin et al., 2000; Levy et al., 2003), and Hung (2011) found differences in the auditory processing of vocal rhythm and clapstick rhythm. Furthermore, a behavioral study by Klyn (2012) indicates that there are also differences in working memory between vocal rhythm and clapstick rhythm. The goal of this thesis is to test whether these differences between vocal and clapstick rhythms are due to one specific physical difference between voice and clapstick sounds, that is, the continuity (filled rhythm) or discontinuity (empty rhythm) of the events they contain.
In the experiment, two participant groups (musicians and non-musicians) listened to stimulus pairs of three different sound types: vocal rhythm, filled instrumental rhythm, and empty instrumental rhythm. In order to test whether these rhythms are processed differently in working memory, stimulus pairs that were either identical or different were presented with both a 500 ms and a 15000 ms interstimulus interval (ISI). The participants were asked to make a same-or-different judgment on each pair while reaction time and error rate were recorded.
The results indicate not only that musicians react faster and more accurately than non-musicians, but also that there is a significant difference between filled instrumental rhythm and empty instrumental rhythm. Immediate recall produced faster reactions than delayed recall, and the participants reacted faster in the same condition than in the different condition. Musicians showed no significant difference in accuracy between immediate and delayed recall, but non-musicians did. Overall, performance in the same condition was slightly better than in the different condition.
Because of the considerable differences between filled instrumental rhythm and vocal rhythm, the contrast between 'empty' and 'filled' does not explain the difference in cognitive processing between vocal rhythm and instrumental rhythm identified in previous studies. It is proposed that these differences may instead be caused by a voice-specific combination of features such as amplitude, pitch, spectral, and timbral changes. The findings strengthen arguments for differential cognitive processing of vocal and instrumental music and support the idea of different origins of the two types of music.
Dedication
This thesis is dedicated to my family.
Acknowledgments
Sincere thanks to my advisor, Dr. Udo Will, for his assistance and advising throughout
my master’s study and this process.
Vita
February 1999 ................................................Busan High School of Arts
2003 ...............................................................B.M. Music Composition, Ewha Womans
University
2006 ...............................................................M.M. Music Composition, Ewha Womans
University
2009 ...............................................................Researcher, Human-Robot Interaction
Research Center, Korea Advanced Institute
of Science and Technology
2011 to present ..............................................Researcher, Ewha Music Research Center,
Ewha Womans University
2012 to present ..............................................Graduate Research Associate, Department
of Music, The Ohio State University
Publications
Jee, E. S., Jeong, Y. J., Kim, C. H., & Kobayashi, H. (2010). Sound design for emotion
and intention expression of socially interactive robots. Intelligent Service Robotics, 3(3),
199-206.
Jee, E. S., Cheong, Y. J., Kim, C. H., Kwon, D. S., & Kobayashi, H. (2009). Sound
production for the emotional expression of socially interactive robots.
Cheong, Yong Jeon (2006). An Analysis of “Piano Etude No.1 & No.2” composed by
Yong Jeon Cheong. M. M. Thesis. Ewha Womans University (Seoul, Korea)
Fields of Study
Major Field: Music
Table of Contents
Abstract ............................................................................................................................... ii
Dedication .......................................................................................................................... iv
Acknowledgments............................................................................................................... v
Vita..................................................................................................................................... vi
List of Figures ..................................................................................................................... x
Chapter 1: Introduction ...................................................................................................... 1
Chapter 2: Background ........................................................................................................ 5
2.1 Sound as a communication system ........................................................................... 5
2.2 Voice sound vs. non-voice sound.............................................................................. 6
2.3 Rhythm ...................................................................................................................... 7
2.4 Memory ..................................................................................................................... 9
2.5 Rhythm in memory.................................................................................................. 11
2.6 Filled rhythm vs. empty rhythm.............................................................................. 14
Chapter 3: Experiment ...................................................................................................... 16
3.1 Experiment design................................................................................................... 16
3.2 Experiment materials............................................................................................... 17
3.3 Stimulus set ............................................................................................................. 25
3.4 Participants.............................................................................................................. 26
3.5 Equipment ............................................................................................................... 27
3.6 Procedure................................................................................................................. 27
3.7 Data analysis ........................................................................................................... 29
Chapter 4: Experiment results........................................................................................... 31
4.1 Reaction time........................................................................................................... 31
4.2 Accuracy.................................................................................................................. 33
Chapter 5: Discussion ....................................................................................................... 37
Chapter 6: Conclusion....................................................................................................... 42
References......................................................................................................................... 45
List of Figures
Figure 1. Baddeley’s working memory ........................................................................... 11
Figure 2. Comparison of three different types of corresponding rhythms........................ 20
Figure 3. Comparison of two different interstimulus intervals (ISI) ................................ 22
Figure 4. Difference point in vocal rhythm ...................................................................... 23
Figure 5. Difference point in filled instrumental rhythm.................................................. 24
Figure 6. Difference point in empty instrumental rhythm ................................................ 25
Figure 7. Experimental procedure..................................................................................... 28
Figure 8. The range of correct response............................................................................ 30
Figure 9. Reaction time interaction between stimulus type and same/different condition
for non-musicians and musicians...................................................................................... 33
Figure 10. Accuracy interaction between type and condition........................................... 34
Figure 11. Interaction between memory task and same/different condition..................... 35
Figure 12. Interaction between stimulus type and same/different condition for non-musicians and musicians................................................................................................... 36
Chapter 1: Introduction
Why is it that many cultures seem to prefer vocal to instrumental music? Hunter-gatherer societies in South-East Asia, Australia and Africa have music dominated by vocal genres, although some, like the Pygmies of Central Africa, also have highly sophisticated instrumental genres. In classical South Asian music we find traditions in which an aesthetic goal of instrumentalists is the imitation of the flexibility of the human voice on their instruments. In some religious traditions music is only accepted if it is a form of vocal music. This is, for instance, the case with 'Beompae (범패; 梵唄),' one of three traditional Korean vocal genres, which is performed in the Buddhist ceremony 'Jae (재; 齋),' in which monks chant Sanskrit texts about the life of Buddha (Korea Arts & Culture Education Service, 1980).
In the history of sacred music in the West, the Roman Catholic Church excluded instrumental music from the liturgy for centuries. A trace of this tradition still remains in some conservative dioceses that perform only Gregorian chant and a cappella music during the Holy Week services. For the Holy Thursday and Good Friday rituals, these Catholic churches have banned even the organ, which has otherwise been regarded as a church instrument.
In the church, choir organs have supported vocal parts and been used as a substitute for missing voice parts in polyphonic music. In addition, in terms of portability,
organs built into churches are too large to carry in comparison with other musical instruments like lutes. Itinerant musicians like troubadours and trouvères carried lutes on their backs, and thus these small, movable instruments were associated not with religion but with entertainment. The negative connotations of secular traveling musicians and portable instruments may have been one cause of the exclusion of instrumental performance from the church. Although organs were introduced to the church in about the ninth century, the churches were reluctant to use organs in the liturgy until the fifteenth century, which shows that the church distinguished and separated vocal music from instrumental music for a long time (Bowles, 1957; Owen et al., 2001). Even today, some Catholic churches do not admit percussion instruments at mass.
Why, in the above examples, is a clear line drawn between vocal music and instrumental music? Why do most religious traditions distinguish vocal music from instrumental music? There are clearly a variety of reasons, from the use of words in vocal music to the association of improper behaviors with instrumental music, that contribute to this distinction. Interestingly, however, some recent research has found that there are also differences in our cognitive processing of vocal and instrumental music. Could these findings help us understand why the church has been reluctant to use instruments, or why there is a preference for vocal music in many musical traditions?
In this thesis, I am not going to discuss the historical or philosophical backgrounds of the exclusion of instrumental music from the church liturgy in the early days. Rather, I would like to clarify why we process vocal music differently from instrumental music by focusing on one musical parameter: rhythm. More specifically, I
concentrate on whether the observed differences between vocal and instrumental rhythms
are related to the physical differences between continuous (filled) and discontinuous
(empty) sounds.
Rhythm can be made of either continuous or discontinuous sounds, depending on the type of instrument and the manner of articulation. For example, bowed string instruments (e.g. violin and cello) produce continuous sound unless a performer plucks a string or uses intermittent bowing to produce staccato articulation. For wind and brass instruments, a performer's regulation of his or her breath determines whether the sound is continuous or discontinuous. Likewise, vocalists can also control their vocal cords and respiration to produce either continuous or discontinuous sounds. However, most percussion instruments, if sounded by a performer's regular beating action, produce only relatively short, discontinuous sounds.
Two recent studies, by Hung (2011) and Klyn (2012), report behavioral experiments that are the most relevant for this thesis; they found significant differences between vocal and clapstick rhythm in auditory and memory processing. Their selection of stimuli (vocal and clapstick rhythms) is marked by the contrast between continuous and discontinuous sounds. They used songs from central Australian Aboriginal singers for the vocal rhythm stimuli and clapstick rhythms for the instrumental stimuli. In the Aboriginal performance context, clapsticks, accompanying instruments that have been used for generations, serve to maintain the rhythm.
To begin with, let’s examine physical features of vocal rhythm and clapstick
rhythm and find differences between them. Vocal rhythms are determined by a complex
3
combination of duration control, pitch change, intensity regulation and timbre alternation.
In addition, phonetic components like consonants and vowels (i.e. swift spectral changes)
also contribute to the formation of vocal rhythms. The complicated entanglement of these
various components produces the temporal structure of the sound that human can
perceive as vocal rhythm. This contrasts with clapstick rhythm perception because the
physical characteristics of the clapstick sound do not vary to the same degree as the
singing voice due to the brief duration of the sound (i.e. fixed pitch and timbre). In
addition, sound energy of the clapstick is condensed in a short period while vocal rhythm
has an evenly distributed energy in time, and accordingly, the perception of clapstick
rhythm seems to depend more on the number and the timing of successive events, rather
than on spectral, pitch or dynamic features.
This thesis tests experimentally whether the different auditory and memory processing of vocal and instrumental rhythms is caused by this physical difference between voice and clapstick, in other words, the continuity (filled) or discontinuity (empty) of sound, or whether vocal rhythms are processed differently from instrumental rhythms due to other features like pitch and timbre changes.
Chapter 2: Background
2.1 Sound as a communication system
Any information from the outer world that is perceived through our sensory organs is processed in our mind either consciously or subconsciously. Sound is one of the most important sources of information for human orientation in the environment and for survival. For example, we can detect the movements of an object by listening to the sound it produces, and we can also estimate its rough distance when it emits a sound: we can easily recognize an airplane in the air just by listening to the sound it produces, even when it flies in clouds. For many organisms, perceiving sound signals is essential for survival because it allows them to identify dangers or prey.
Furthermore, humans have developed two core communication systems: one is language and the other is music. Language and music are our main systems for transmitting information through sound, a wave produced by pressure changes in the air, in part because sound communication is economical: producing and processing sound signals requires less energy than other forms of communication. A honeybee, for instance, must perform a waggle dance, moving its whole body, to create signals for communication with other bees. As humans have developed sound as the core medium of these two communication systems, our ears and brains have become specialized and sensitized to detect and analyze the sounds that humans produce. The cone shape of our outer ear helps to locate and collect sound signals that are scattered in space. The eardrums, and the hair cells located in our inner ears, transform the changes of air pressure, namely sound, into electrical signals that travel to the brain within milliseconds. Because of the importance of
communication through sound, language and music have co-evolved with our brain
throughout human history.
2.2 Voice sound vs. non-voice sound
Humans produce sound in two ways, either by using the vocal cords (i.e. vocalization) or by interacting with external objects (e.g. beating a drum or blowing a flute): the first is voice sound and the second is non-voice sound. In his comparative study, Fitch (2006) indicated that animals other than humans use only either voice or non-voice sound for communication, and suggested that the distinction between voice sound and non-voice sound may provide a better understanding of the origin of music.
Currently, there are efforts to discover how the brain processes voice sound differently from non-voice sound. First, Belin et al. (2000) found voice-selective regions in the human auditory cortex, in the superior temporal sulcus (STS), and suggested that the STS is possibly the auditory counterpart of the face-selective area in the visual cortex. Their findings may improve our understanding of how we recognize other people via sound, which is important given that humans build their identity by building relationships with others. Similarly, a study by Binder et al. (2000) provided evidence of a specialized brain region that is activated preferentially by speech sounds, and concluded that the sensitivity to speech sound in the STS is due not to the processing of lexical-semantic associations but to the processing of the complex acoustic features of speech sound. Next, Levy et al. (2003) pointed out that attention is an important factor in the voice-specific response (VSR) and proposed that this response is elicited by voice stimuli in human listeners. In an EEG experiment, they observed a specific response elicited by the human voice at about 320 ms after stimulus presentation and considered this response the VSR because no comparable response was found when the stimuli were non-voice sounds. However, Joanisse and Gati (2003) could not identify a speech-selective region and suggested that overlapping brain areas process both speech and non-speech signals. The evidence for different processing of vocal and instrumental sound (Belin et al., 2000; Binder et al., 2000; Fecteau et al., 2004; Fitch, 2006) raises the question of whether there are differences between vocal and instrumental music in terms of rhythm processing.
2.3 Rhythm
Before continuing to review studies on the differences in rhythm processing between vocal and instrumental music, it is necessary to define rhythm. What is rhythm? Rhythm is a problematic concept because there is no single accepted definition, and a number of related terms are often conflated with it. To reduce the confusion between these terms, I operationalize definitions of several terms for this thesis. First of all, rhythm is defined as a pattern of acoustic events in time; any temporal distribution of events can be a rhythm. Next, the term most frequently confused with rhythm is beat. In this thesis, a beat is a single acoustic event in time that can serve as a temporal reference point. The next term, pulse, refers to a series of regular beats: when beats are equally distributed in time, listeners feel a pulse underlying the music. In everyday usage, beat is often used in the sense of pulse, but the two are distinguished here. Third, tempo is the speed of the pulse, measured in beats per minute (BPM). Fourth, meter is the organization of the beats of a pulse according to dynamic features (accents).
In order to understand musical rhythm processing, Hung (2011) performed a set of experiments from which she expected to obtain both behavioral and physiological indications of two separate cognitive pathways for vocal and clapstick rhythm. In her behavioral study, rhythm pairs were presented and the participants were asked to decide whether the two rhythms were the same or different after listening to each pair. The data suggested not only that reaction time (RT) differs depending on both stimulus type (vocal or clapstick rhythm) and condition (same or different) but also that vocal rhythm can easily be influenced by clapstick rhythm depending on the condition type. Although her behavioral study seemed to suggest the existence of two different types of rhythm processing, Hung could not find two completely separate pathways in her fMRI study, one for vocal rhythm and the other for clapstick rhythm. Like Joanisse and Gati (2003), Hung observed overlapping brain regions that were activated by both vocal and clapstick rhythms. Interestingly, her data provided some evidence that our brain treats the human voice preferentially, because vocal rhythm induces a higher level of activation than clapstick rhythm.
2.4 Memory
Let me briefly discuss memory before examining Klyn's (2012) study, which examines whether rhythmic memory treats vocal and instrumental sounds differently. The term memory refers to the processes of encoding, storing and retrieving information, and it influences our experience of music because sound patterns that are directly perceived through our sensory organs remain available for only a limited amount of time. In other words, musical experience is restricted by the ephemerality of sound, but memory enables us to overcome this restriction. If we could not remember the ominous motive of the movie Jaws, its minor-second progression would not be meaningful over the course of the movie, because we would experience the music as always new. It is memory of this motive that creates tension, leading us to expect a scene in which the shark appears and attacks.
In psychology, the paradigm shift from behaviorism to cognitivism in the 1960s renewed researchers' interest in the mind, and memory became an important research topic for understanding the workings of the mind. Miller's study on the magic number 7 ± 2 triggered a new surge in memory studies (Miller, 1956). In behaviorist terms, the mind is a black box, and memory, as a part of the mind, is neither observable nor measurable. For cognitivists, however, the concepts of mind and memory are central, because memory enables us to do what we can do as human beings. Some psychologists have claimed that memory is the mark of being human, because we would not be able to think the way we do without it (Malcolm, 1963; Burge, 2003). Memory research
distinguishes different types of memory (i.e. sensory memory, working memory, long-term memory, etc.) and sub-systems of memory.
First of all, sensory memory is divided into iconic memory (vision), echoic memory (audition) and haptic memory (touch), in accordance with the different types of sensory information. Because of the operation of sensory memory, we can retain external stimuli in our mind from a couple of milliseconds up to about two seconds (Baddeley, 1997). Second, working memory, also known as short-term memory (STM), is immediate, but the information in working memory is not raw sensation. Working memory deals with information that is transferred from sensory memory and transformed, so that we can manipulate stimuli in the mind through rehearsal processes at the working memory stage. Suppose you smell the odor of a flower that is familiar to you but cannot remember exactly which flower it is. As soon as you inhale the scent, your sensory memory is activated; because the odor draws your attention, working memory begins to search for flowers that match it, enabling you to compare several flower fragrances in your mind.
Baddeley (2000) focused on working memory and established a model of it. According to this model, the central executive, as a supervisory system, controls information by communicating with three slave systems, the visuospatial sketchpad, the episodic buffer and the phonological loop, during the rehearsal period (see fig. 1). The phonological store is one subsystem of the phonological loop and can be understood as sensory memory for linguistic sounds; information in the store decays rapidly. The articulatory loop is the other subsystem of the phonological loop and allows us to keep information in mind. The figure below shows Baddeley's model of working memory.
Figure 1. Baddeley's working memory
Working memory may extend to 10 to 12 s, although it lasts on average 3 to 5 s (Snyder, 2000). It is known that there are individual differences in the capacity of working memory (Daneman and Carpenter, 1980; Swanson, 1993). Last, long-term memory (LTM) is associated with memories that can be recalled after months or years. LTM generally involves durable memories, but they too can be forgotten.
2.5 Rhythm in memory
As an extension of Hung’s behavioral study, Klyn(2012) raised a question about
how we remember rhythm and whether vocal rhythm is different from clapstick rhythm
in terms of memory. Klyn’s (2012) study consisted of two experiments in order to clarify
the difference between vocal and clapstick rhythm in terms of sensory and working
11
memory processing. The first experiment asked participants to identify the sameness or
difference of rhythm pairs that present both vocal and clapstick rhythm simultaneously.
The pairs were presented with two different interstimulus interval (ISI) durations (0.5s
and 12.50s) to investigate different memory conditions (sensory memory and working
memory). His data shows a superiority of musicians only in clapstick recognition on the
memory task and there was no strong evidence of differences between musician and nonmusician group in the vocal rhythm condition in terms of accuracy. Also, similar decline
in accuracy for vocal rhythm was observed across the groups with longer ISI. The
participants reacted faster in same condition than in different condition. reaction time
shows a similar result to accuracy. Musicians were slight faster than non-musicians. The
duration of ISI has a strong effect on participants’ reaction time. The longer ISI seems to
require participants to need more time to make a decision.
Klyn’s second experiment is a variation of the first experiment that explores
possible differences in working memory in rhythm processing by adding two intervening
factors; finger tapping and sub-vocal re-articulation. This experiment is designed to test
one of Baddeley’s slave systems, phonological loop which deals with sound information.
Baddeley had shown that the phonological loop helps to maintain timing of speech events
in working memory by doing rehearsal with vocal motor system. Hence, it is thought that
it may have a general function for rhythm memory. Klyn raised a question about a role of
phonological loop in rhythm memory. The data obtained from musicians showed fingertapping improved performance on rhythm memory for voice and clapstick but sub-vocal
re-articulation did not reduce performance in memory of both rhythms. In the sub-vocal
12
condition, the participant repeated the word “the” at the steady tempo without producing
the sound “the.” This result indicates that the phonological loop does not seem to be
influential on rhythm memory in decision experiments.
During the Fall semester of 2012, I ran a rhythm reproduction experiment with the same setting as Klyn's second experiment. I used the same stimuli and the same intervening factors of finger tapping and sub-vocal re-articulation. The only difference was that my experiment asked the participants to reproduce what they had listened to. The results of the reproduction experiment have not yet been published or presented, but they differ from Klyn's results, because I observed a significant deterioration of accuracy in the sub-vocal re-articulation tasks. The contrasting results between the decision and reproduction tasks indicate that the phonological loop is involved not so much in the memorization of rhythm per se as in maintaining motor representations of the rhythms for their upcoming execution.
Given that this set of experiments shows significant differences between vocal and clapstick rhythm in terms of human cognitive processing, I would like to answer the following questions: why is there a difference between vocal and clapstick rhythm processing? Can this difference be explained by the physical difference between vocal and clapstick rhythm, namely, the continuity or discontinuity of sound, or does it have something to do with the timbre differences of the sounds?
2.6 Filled rhythm vs. empty rhythm
In this section, I will clarify filled and empty rhythms with the examples of voice and clapstick. One rhythmic characteristic of human speech is that there is often no silence between successive acoustic events in a phrase. I call an interval that is continuous, because of the absence of silence between its two events, a filled interval. If a speaker does not intentionally separate syllables or words, most vocal rhythms consist of continuous sounds. In a filled interval, the first sound continues until the following sound event occurs; that is, the interval of time is signaled by the uninterrupted presentation of an auditory stimulus. In music, filled rhythm can be expressed by the term 'legato.'
In contrast, an empty interval is marked by two consecutive acoustic events where the duration of the first event is shorter than the interval. I call it an empty interval because it contains an empty space, a silence, between the two events in time. The duration of an empty interval is signaled by the short presentation of two sound events, one at the beginning of the interval and the other at its end (e.g. two clicking sounds). In terms of the sound envelope, a sound event of an empty interval often has a faster attack time and a relatively short sustain level and decay time compared to that of a filled interval. An empty interval may be characterized by the musical term 'staccato.' One representative example of empty intervals is a clapstick rhythm.
Filled and empty intervals have attracted many psychologists interested in the human sensation of time. For instance, James (1890) noted that filled intervals seem longer than empty ones of the same duration, a phenomenon referred to as the filled-interval illusion. Experimenters have found not only that a filled interval is perceived as longer than an empty interval of the same duration (Goldstone and Goldfarb, 1963) but also that the timing of filled intervals is more accurate than that of empty ones (Rammsayer and Lima, 1991; Rammsayer and Skrandies, 1998). As Repp et al. (2009) pointed out, most of the psychophysical research on the duration perception of filled and empty intervals used only single intervals, so previous studies have certain limitations for our discussion of rhythms.
In addition, filled sound has been studied in relation to the perceptual restoration of sound in the context of speech. A couple of studies could not find any superiority of recognition for filled sound compared to empty sound (Miller and Licklider, 1950; Dirks and Bower, 1970). However, continuous speech, even when filled with noise, led listeners to make more accurate decisions than discontinuous speech (Cherry and Wiley, 1967; Holloway, 1970; Powers and Wilcox, 1977; Verschuure and Brocaar, 1983), not only because the silences of empty intervals may cause false transitions by interfering with sound recognition but also because the rhythmic introduction of silence disrupts the listener's perception (Bregman, 1990).
Chapter 3: Experiment
3.1 Experiment design
In this study, I hypothesized that filled and empty rhythms are treated differently in auditory and memory processing, and the hypothesis was tested with four factors: type, memory, condition and musician.
First, stimulus type is the most important factor in this experiment. The type factor consists of three different stimulus groups: vocal rhythm, filled instrumental rhythm and empty instrumental rhythm. Vocal rhythms, adopted from songs of central Australian Aborigines, were taken as reference rhythms for the creation of the stimuli: the filled and empty instrumental rhythms were derived from the vocal ones, so the three sets correspond to each other. For details about the three stimulus groups, see the 'Experiment materials' section below.
The second factor is memory. In this thesis, memory is divided into sensory memory and working memory on the basis of how long information is stored in the mind, which is a classical way of labeling memory in psychology. Sensory memory, also called perceptual memory, starts to work the moment our senses are activated by information and lasts up to two or three seconds. Information that is filtered and extracted with the aid of mechanisms like attention moves into the working memory stage, where information is temporarily stored for between about two and 15 seconds. The two types of memory are tested with two different ISIs: 500 ms for sensory memory and 15000 ms for working memory.
The third factor is the same-or-different condition. Bamber (1969) suggested that same and different responses are based on two different processes, and significant reaction time differences between the same and different conditions were found in both Hung's and Klyn's experiments. Specifically, the participants responded faster and more accurately in the same condition than in the different condition in Klyn's experiment.
The fourth factor is musical training. Musical training is one way of developing our sense of time (James, 1890). Gaver (1993) argued that there are two different ways of listening, musical listening and everyday listening, and musicians may be trained through long periods of musical listening to identify auditory exemplars by relying on subtle acoustic parameters (Chartrand et al., 2008). Klyn's experiment shows that musicians have shorter reaction times and better accuracy in remembering instrumental rhythms.
3.2 Experiment materials
In the experiment design section, I mentioned that this experiment has four factors: type, memory, condition and musician. However, I considered only three experimental factors when creating the stimuli, because the musician factor was addressed by recruiting appropriate participants, one group of musicians and one group of non-musicians.
To begin with, I explain in detail how the type factor was handled during stimulus creation. As mentioned above, this experiment has three different types of stimuli: vocal rhythm, filled instrumental rhythm and empty instrumental rhythm. First, fourteen mono sound clips sung by an Australian Aboriginal male performer were selected as vocal rhythm stimuli. Aboriginal songs were chosen for the experiment to avoid any influence of semantic processing arising from the participants' language background and understanding of the song texts.
The second type, the filled instrumental rhythms, was derived from the fourteen vocal rhythms. Using the software Praat, I obtained quantitative data on the amplitude changes of each vocal rhythm, and the extracted amplitude data were re-synthesized with a cello monotone sample from the University of Iowa Electronic Music Studios. The amplitude re-synthesis causes some problems: for instance, some events that I perceived in a vocal rhythm disappear in the corresponding filled instrumental rhythm. This is because only the intensity information was retained in creating the filled instrumental rhythm, while vocal rhythm is determined by syllable changes, pitch changes, intensity changes and timbre changes.
Before settling on the cello sample, I tried the re-synthesis with a couple of flute and didgeridoo samples. At the beginning of creating the filled rhythm stimuli, I planned to extract both amplitude data and pitch contours from the vocal rhythms with Praat and re-synthesize them with an instrumental sample, but this distorted the timbre of the instruments, because the frequency range of the male human voice does not correspond with the frequency range of the instruments. In particular, the spectral distortion of the didgeridoo was the worst; it sounded like noise from a sine wave generator, because the difference in fundamental frequency (F0) between the didgeridoo and the male voice was the widest.
In addition to the didgeridoo, I also tried re-synthesis with a flute, but there was still spectral distortion, because the flute tones that correspond to the male voice lie in the low register of the instrument. Low flute tones do not have the wide energy distribution in the spectrogram that is necessary to manipulate or re-synthesize the sound while keeping the flute's timbre.
Therefore, I selected a cello to produce the stimuli, because cello tones correspond more closely to a male voice and are less distorted than flute sounds. In the case of aerophones, timbre, the quality of sound, is more subject to the physical structure of the instrument. However, re-synthesizing the cello with both the amplitude and pitch information of the vocal rhythms still produced spectral changes in the cello sample. Finally, I settled on amplitude re-synthesis only. The sound of a filled rhythm continues from onset to offset; in other words, the amplitude of a filled rhythm does not decrease to zero during the presentation.
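To make the procedure concrete, the following is a minimal sketch of this amplitude re-synthesis step in Python rather than in Praat: it extracts a smoothed envelope from a voice clip and imposes it on a cello sample. The file names and the 10 ms smoothing window are hypothetical, and the Hilbert envelope stands in for Praat's intensity extraction.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, resample

# Hypothetical file names; both are assumed to be mono WAV files.
sr, voice = wavfile.read("vocal_rhythm.wav")
_, cello = wavfile.read("cello_monotone.wav")
voice = voice.astype(np.float64)
cello = cello.astype(np.float64)

# Amplitude envelope of the voice via the Hilbert transform,
# smoothed with a ~10 ms moving average.
envelope = np.abs(hilbert(voice))
win = int(0.010 * sr)
envelope = np.convolve(envelope, np.ones(win) / win, mode="same")

# Stretch or trim the cello tone to the voice's length, flatten its
# own amplitude, then impose the voice's envelope on it.
cello = resample(cello, len(voice))
cello /= np.mean(np.abs(cello)) + 1e-12
filled = cello * envelope

# Normalize and write the filled instrumental rhythm.
filled = filled / np.max(np.abs(filled)) * 0.9
wavfile.write("filled_rhythm.wav", sr, (filled * 32767).astype(np.int16))
```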
The third type is the empty instrumental rhythm, which was derived from the filled rhythms. The amplitude peaks of the filled rhythms can be perceived as acoustic events, so the peaks served as reference points for creating the empty rhythms. The duration of each event in the empty rhythms is 80 ms, consisting of a 30 ms onset and a 50 ms decay. In order to produce a natural rather than artificial sound, a 30 ms fade-in was used for the onset and a 50 ms fade-out for the decay, using the audio editing program Audacity. The interval between the end of one event and the onset of the next was silenced with the silence generation function of Audacity. Figure 2 shows an example of the three corresponding rhythms that were produced for each file using the Audacity program. The vocal rhythm is the reference sound for the filled instrumental rhythm, which is in turn the reference for the empty instrumental rhythm. I tried to keep the same number of events across the three rhythms, because the filled and empty instrumental rhythms are derivatives of the vocal rhythm and are meant to correspond to each other. However, since the filled rhythm is re-synthesized only from the amplitude information of the vocal rhythm, acoustic-rhythmic events that are not generated by amplitude changes in the vocal rhythm are sometimes missing.
Figure 2. Examples of the three types of corresponding rhythms used in the study: top is
vocal rhythm; middle is filled instrumental rhythm; bottom is empty instrumental rhythm.
The top sound envelope in figure 2 is a sample of vocal rhythm. Its numerous spikes are not preserved in the filled and empty instrumental rhythms, which have smoother shapes. The different envelope shapes show that vocal rhythm carries information that is missing in the filled and empty rhythms. However, the contour of the vocal rhythm is similar to that of the filled rhythm, which means that the vocal and filled rhythms share the same amplitude information. In this filled instrumental rhythm, there are seven conspicuous peaks that we can perceive as sound events, because the seven major amplitude changes cause the perception of distinct events while the other, minor amplitude variations are not perceived as discrete events. Therefore, seven events were created in the empty rhythm.
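The construction of the empty-rhythm events can be illustrated with a minimal sketch in Python, not the Audacity procedure itself, assuming a 44.1 kHz mono cello sample; the onset times below are hypothetical stand-ins for the amplitude peaks of a filled rhythm.

```python
import numpy as np
from scipy.io import wavfile

sr = 44100
onsets = [0.12, 0.55, 0.93, 1.40, 1.88, 2.31, 2.70]  # seconds (assumed)
total_dur = 3.2                                      # seconds (assumed)

attack = int(0.030 * sr)        # 30 ms fade-in
decay = int(0.050 * sr)         # 50 ms fade-out
event_len = attack + decay      # 80 ms per event

# Shape a short burst of the cello tone with the fade-in/fade-out
# envelope; everything between events remains silence.
_, cello = wavfile.read("cello_monotone.wav")  # assumed 44.1 kHz mono
burst = cello[:event_len].astype(np.float64)
burst *= np.concatenate([np.linspace(0.0, 1.0, attack),
                         np.linspace(1.0, 0.0, decay)])

out = np.zeros(int(total_dur * sr))
for t in onsets:
    i = int(t * sr)
    out[i:i + event_len] = burst   # onsets assumed to fit in the clip

out = out / np.max(np.abs(out)) * 0.9
wavfile.write("empty_rhythm.wav", sr, (out * 32767).astype(np.int16))
```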
Next, memory is the second factor in this experiment. Each pair is presented twice, each time with a different interstimulus time. To investigate the differences between sensory and working memory, I use a comparison task in which the two sounds of a pair are separated by a long ISI of 15000 ms for working memory and, in a second presentation, by a short ISI of 500 ms for sensory memory. Figure 3 compares the two different ISIs.
Figure 3. Comparison of two different interstimulus intervals (ISI); the top image shows the 15 s ISI for long memory and the bottom image the 500 ms ISI for short memory.
The last factor considered during stimulus creation is the same-or-different condition. In the same condition, the two sound clips in a rhythm pair are identical, while in the different condition the pair consists of one sound clip and a variant of it. Rhythmic variants were created with a sound editing program, CoolEdit. For the vocal stimuli, they were created by noticeably changing, either by shortening or by lengthening, one of the vowels, with special care taken to avoid audible editing artifacts. Figure 4 shows a comparison of the same and different conditions for vocal rhythm; the variant in the different condition was created by stretching the duration of a vowel.
Figure 4. Difference point in vocal rhythm: the first pair consists of two identical sound clips (same condition) and the second pair consists of a sound clip and its variant (different condition). The red arrow indicates the difference point between same and different.
For the filled different condition, I manipulated the shape of the amplitude envelope in CoolEdit. Smoothing a sharp amplitude change eliminates a sound event, while creating a sharp amplitude change produces one more event in the filled rhythm. Figure 5 shows a comparison of the same and different conditions for filled instrumental rhythm. In figure 5, the variant is created by filling in the fourth sharp amplitude change in the different condition, which makes us perceive one less acoustic event.
Figure 5. Difference point in filled instrumental rhythm: the first pair consists of two identical sound clips (same condition) and the second pair consists of a sound clip and its variant (different condition). The red arrow indicates the difference point between same and different.
In the empty instrumental condition, the variant is created either by adding one event or by eliminating one event in Audacity. Figure 6 shows the empty instrumental rhythm that corresponds to the filled instrumental rhythm above. Since we perceive one less event in the different condition of the filled instrumental rhythm in figure 5, I eliminated the event of the empty rhythm at the location where the event in the filled rhythm was missing. The location of the difference is the same in both the filled and empty rhythms.
Figure 6. Difference point in empty instrumental rhythm: the first pair consists of two identical sound clips (same condition) and the second pair consists of a sound clip and its variant (different condition). The red arrow indicates the difference point between same and different.
During the creation of the variants, I carefully selected the locations of the difference points to avoid effects of difference-point location. I tried to distribute the locations evenly throughout the sound clips: five pairs of each type have the difference point in the first third of the clip, four pairs of each type have the difference point in the middle, and four pairs of each type have the difference point in the final third.
3.3 Stimulus set
Before the experiment was run, all participants went through a practice section consisting of a set of trials to familiarize them with the experiment. One empty instrumental, one filled instrumental and one vocal rhythm pair with a 500 ms ISI, each in both the same and the different condition, were presented in this practice section.
The experiment consisted of two blocks, each of which had two sections. One block consists of two short sections, because the stimuli in that block have a 500 ms ISI, while the stimuli in the two sections of the other block have a 15000 ms ISI, so they are called long sections. Each block has forty-two rhythm pairs: fourteen stimuli from vocal rhythm, fourteen from filled rhythm and fourteen from empty rhythm. For each type, seven stimuli are same pairs and the other seven are different pairs. The stimuli in the short sections are the same as those in the long sections; the stimuli are presented in blocks to avoid successive repetition of the same stimuli, and within the sections they were presented in random order.
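The following is a minimal sketch of one plausible reading of that design, not the actual DMDX item lists: 14 base rhythms per type, 7 assigned to the same and 7 to the different condition, giving 42 pairs per ISI block split into two shuffled sections of 21. The stimulus IDs and the same/different assignment are assumptions.

```python
import random

TYPES = ["vocal", "filled", "empty"]

def make_block(isi_ms):
    """Build one ISI block of 42 pairs and split it into two sections."""
    pairs = []
    for t in TYPES:
        for stim_id in range(1, 15):               # 14 base rhythms
            cond = "same" if stim_id <= 7 else "different"
            pairs.append({"type": t, "stim": stim_id,
                          "cond": cond, "isi_ms": isi_ms})
    random.shuffle(pairs)                          # random order
    return pairs[:21], pairs[21:]                  # two sections of 21

short_sections = make_block(500)      # sensory-memory block
long_sections = make_block(15000)     # working-memory block
```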
3.4 Participants
Originally, twenty-two people participated in this experiment; however, one female non-musician subject was excluded because she gave 100% wrong responses in one section due to drowsiness. Ten subjects are non-musicians (average age 27.1 years, range 20 to 45) and the other eleven subjects are musicians (average age 28.27 years, range 22 to 34). Three criteria were applied to define non-musicians: a participant should have no more than 5 years of formal musical training; a participant should have had no regular musical activity for the past three years; and a participant should define himself or herself as a non-musician. Only participants fulfilling all three criteria were classified as non-musicians. The musicians are currently studying music at the university or performing professionally, and on average they have played music for 20.55 years (range 13 to 26 years). None of the non-musicians is active in music performance, and their average period of formal music training is 2.55 years (range 0 to 5 years), except for one case with eight years of music lessons. The musician group consists of five males and six females and the non-musician group of five males and five females, a fairly good balance in terms of gender. Participants were mostly recruited from The Ohio State University community, and one subject was from Columbus Community College.
3.5 Equipment
The experiment was performed on a Sony VAIO laptop running the Windows XP operating system. Stimuli were presented with the DMDX software, through which the data were obtained.
3.6 Procedure
To begin with, all participants listened to an explanation of their rights as experiment subjects and verbally consented to proceed with the experiment before being given a short questionnaire about age, dominant hand, musical training, etc. Next, each participant performed the practice section so that they could understand the experiment procedure. Participants were asked to make their decisions as quickly and precisely as possible. Their decisions were recorded via the track-pad of the laptop: if participants thought the two sounds in a pair were the same, they pressed the left track-pad button, which showed an "=" sign; if not, they pressed the right button, which showed a "≠" sign. In the practice section, all participants listened to test examples of vocal, filled instrumental and empty rhythm after seeing the question "Same or Different?" on the screen. After the training trials, participants performed the four experimental sections. The experiment has two blocks, one block with two long sections and the other with two short sections. Each section has twenty-one stimuli (seven vocal rhythms, seven filled rhythms and seven empty rhythms), and, in a section, the stimuli are presented in a random order to reduce order effects. Participants could control the break time between sections. The experiment, including introduction and training trials, takes thirty-five minutes in total. Afterwards, during debriefing, the participants were asked about their impressions of the experiment and the strategies they used to perform it. All participants received a five-dollar compensation, and three participants additionally received extra credit in their Speech and Hearing Science class. A short description of the experiment procedure is given in the flow chart below.
Verbal consent and short questionnaires about subjects' background → Introduction to experiment and explanation of procedure → Practice session and subjects' feedback → Four sections (two long and two short blocks), with subjects controlling progress between sessions → Debriefing and compensation
Figure 7. Experiment procedure
3.7 Data analysis
The output from DMDX is the raw data of this experiment, from which three kinds of wrong responses were eliminated. The first kind is the wrong decision: participants pressed the "=" button after a different-stimulus presentation or pressed "≠" after a same-stimulus presentation. The second kind is a decision made before the difference point. The difference point for the same condition is the time point where the second sound ends, while the difference point for the different condition is the time point where the second sound clip starts to differ from the first sound clip in a pair. The third kind is a decision made after the decision period.
In order to get participants’ reaction times, I subtract the difference point from
recorded time. The raw data is made by DMDX that records time since the second sound
clip in a rhythm pair is presented. After having participants’ reaction time, I filtered the
reaction time data to remove the second and third types of wrong responses. First,
negative reaction times mean that participants made a decision before the difference
point, which shows negative reaction times are wrong decision. In addition, latency of
reaction time is also considered. Humans need at least 200ms to give responses after
listening to stimuli because our brain processes stimuli and operates our motor system to
press the button. Second, I also removed reaction times that are longer than 4s after
difference point. These values appear outliers in the reaction time distribution graph. We
can assume that participants may guess answers if they do not provide their response
instantly. The figure 7 shows the location of correct responses in time.
Figure 8. The range of correct responses: correct responses are located between 200 ms and 4000 ms after the difference point. ISI = interstimulus interval, RT = reaction time.
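A minimal sketch of this cleanup, assuming the DMDX output has already been exported to a table; the column names are hypothetical, and DMDX's native output format differs.

```python
import pandas as pd

raw = pd.read_csv("dmdx_output.csv")
# 'recorded_ms' is the response time measured from the onset of the
# second clip; 'diff_point_ms' is the difference point of that trial.
raw["rt"] = raw["recorded_ms"] - raw["diff_point_ms"]

clean = raw[(raw["rt"] >= 200)                    # motor-latency floor
            & (raw["rt"] <= 4000)                 # guessing ceiling
            & (raw["response"] == raw["truth"])]  # drop wrong decisions
```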
The accuracy data are the number of correct responses in each category, transformed into percentages. This experiment has twelve within-subject categories (3 types × 2 memory tasks × 2 conditions). Thus, the number of correct responses was divided by the total number of responses in each category and multiplied by 100.
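A minimal sketch of this computation, including the arcsine-root transformation applied to the accuracy data in section 4.2; the trial table and its column names are assumptions.

```python
import numpy as np
import pandas as pd

trials = pd.read_csv("trials.csv")  # one row per trial; boolean 'correct'

# Percent correct per subject in each of the 12 cells
# (3 types x 2 memory tasks x 2 conditions).
acc = (trials.groupby(["subject", "type", "memory", "condition"])
             ["correct"].mean() * 100).reset_index(name="pct_correct")

# Square root of the proportion, then arcsine (cf. section 4.2).
acc["arcsine"] = np.arcsin(np.sqrt(acc["pct_correct"] / 100))
```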
Chapter 4: Experiment Results
4.1 Reaction time
A repeated measures ANOVA with the factors type, memory, condition and musician was performed on the reaction time data. All main factors were found to be significant (Type: F(2,20) = 9.7652, p < 0.001; Memory: F(1,20) = 45.1678, p < 0.001; Condition: F(1,20) = 502.5527, p < 0.001; Musician: F(1,20) = 7.701, p = 0.012). There was a significant 2-way interaction of type and condition (F(2,42) = 29.679, p < 0.001), and the interaction between memory and type approached significance (F(2,42) = 3.21, p = 0.05).
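A minimal sketch of the within-subject part of this analysis using statsmodels; the data layout is an assumption, and the between-subject factor (musician) would require a mixed ANOVA, for example by running the model per group or using a dedicated mixed-model routine.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format: one row per subject x type x memory x condition cell,
# with the mean reaction time in 'rt' (column names assumed).
df = pd.read_csv("reaction_times.csv")

res = AnovaRM(df, depvar="rt", subject="subject",
              within=["type", "memory", "condition"],
              aggregate_func="mean").fit()
print(res)
```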
First, the filled instrumental rhythm showed the slowest reaction time (1399 ms); the vocal rhythm mean is 1263 ms and the empty instrumental rhythm mean is 1211 ms. Second, the mean reaction time for the short memory task is 1132 ms, while the mean reaction time for the long memory task is 1450 ms. Third, the participants reacted faster in the same condition (905 ms) than in the different condition (1677 ms). Fourth, musicians were significantly faster (1176 ms) than non-musicians (1406 ms). Fifth, in the long memory task both musicians and non-musicians showed significantly longer reaction times (1298 ms and 1602 ms, respectively) than in the short memory task (1053 ms and 1210 ms, respectively). Although the non-musicians show a proportionally larger increase in reaction time in the long memory condition, this difference (245 ms vs. 392 ms) is not significant (p = 0.135).
Next, the interaction of stimulus type and condition was significant, but there was no significant difference between musicians and non-musicians. The means for the same condition were 916 ms (voice), 873 ms (filled), and 926 ms (empty); for the different condition they were 1611 ms (voice), 1924 ms (filled), and 1496 ms (empty). In the different condition, the filled instrumental rhythms had the longest and the empty rhythms the shortest reaction times, in both memory conditions and for both subject groups.
In terms of the interaction between condition and memory, in the same condition of the long memory task there is a significant difference (p = 0.005) between musicians and non-musicians, but not between the three stimulus types. In the same condition of the short memory task the empty rhythms have a slightly (not significantly) longer reaction time than the two other rhythm types, and there are no significant differences between the subject groups.
Figure 9. Reaction time interaction between stimulus type and same/different condition for non-musicians (left) and musicians (right). d = different condition, s = same condition; einst = empty instrumental rhythms, finst = filled instrumental rhythms, voc = vocal rhythms.
4.2 Accuracy
For each subject, percentage values were calculated from the decision data. The square roots of these percentage values were then subjected to an arcsine transformation, and a repeated measures ANOVA with the factors type, memory, condition and musician was performed. All four main factors turned out to be significant. The factor type was highly significant (F(2,19) = 165.27, p < 0.001), with 84.3% for vocal, 87.8% for empty, and 54.2% for filled instrumental rhythms. The factor memory (F(1,19) = 45.1678, p < 0.001) indicates that there are more correct responses in the short (78.3%) than in the long memory task (72.8%). The factor condition was also significant (F(1,19) = 5.595, p = 0.028), with 79.1% for the same and 71.9% for the different condition. The factor musician (F(1,19) = 7.47, p = 0.013) showed that musicians were generally better (80% correct) than non-musicians (71% correct). Figure 10 below shows the overall accuracy for each category.
Figure 10. Accuracy interaction between type and condition; left graph for non-musicians, right graph for musicians; d = different condition, s = same condition; einst = empty instrumental rhythms, finst = filled instrumental rhythms, voc = vocal rhythms.
A significant interaction between type and memory (F(1,19) = 6.68, p = 0.0033) indicates that the effect of memory is mainly due to the better performance on empty and vocal rhythms in the short memory task, whereas the performance for the filled rhythms is the same for both memory tasks.
There is also a significant interaction between memory and condition (F(1,19) = 24.76, p = 0.0033), which is caused by a significant decline in accuracy for the same condition and a minor improvement for the different condition between the short and long memory tasks (see fig. 11).
Figure 11. Interaction between memory task and same/different condition. l = long
memory task, s = short memory task. d (blue) = different condition, s (red) = same
condition.
There was also a significant interaction between type and condition (F(1,19) = 12.47, p < 0.001), and a significant 3-way interaction between memory, type, and condition (F(2,38) = 12.39, p < 0.001). Both effects can be attributed to the changing responses of the non-musicians in the two memory tasks, whereas the musicians do not show a type × condition interaction (see fig. 12).
Figure 12. Interaction between stimulus type and same/different condition for
non-musicians (left) and musicians (right). d = different condition, s = same condition;
einst = empty instrumental rhythms, finst = filled instrumental rhythms, voc = vocal rhythms.
Chapter 5: Discussion
The experimental results are interesting and challenging. In terms of stimulus
type, the participants showed the best performance for empty instrumental rhythms. First,
it seems that the clear separation between the events of empty instrumental rhythms helps
participants detect their rhythms more easily than those of the filled instrumental
examples. The separation between events of the empty instrumental rhythms is created by
zero-level amplitude, i.e. by periods of silence, which may help the participants
distinguish rhythmic events more clearly. This contrasts with the filled rhythms, for
which the amplitude envelope does not reach zero level, so the regions separating the
events, i.e. the stretches between amplitude maxima, still contain continuous sound.
This may make rhythmic events harder to detect: the amplitude maxima are not as clearly
separated as they are in empty rhythms. In other words, the clear separation within
empty pairs may make participants' decisions easy, while the absence of such a clear
distinction within filled pairs probably increases the difficulty. For instance, a couple
of non-musician subjects reported that empty rhythms reminded them of Morse code, but
that they could not transform filled rhythms into corresponding mental images. This
suggests that differences in participants' performance on empty and filled rhythms are
based not so much on the different physical contents as on the different levels of task
difficulty.
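To make this segmentation argument concrete, a toy Python sketch of threshold-based event detection on the two envelope types follows. The signals are synthetic stand-ins, not the actual stimuli, and the helper count_events is purely illustrative:

```python
import numpy as np

sr = 1000                                # toy sample rate (Hz)
t = np.arange(0, 2.0, 1 / sr)

# Synthetic envelopes with events at 0.2 s, 0.7 s, and 1.4 s.
empty = np.zeros_like(t)                 # silence between events
filled = np.full_like(t, 0.4)            # continuous sound floor
for onset in (0.2, 0.7, 1.4):
    burst = (t >= onset) & (t < onset + 0.1)
    empty[burst] = 1.0
    filled[burst] = 1.0

def count_events(envelope, threshold):
    """Count upward threshold crossings of an amplitude envelope."""
    above = envelope > threshold
    return int(np.sum(~above[:-1] & above[1:]))

print(count_events(empty, 0.5))   # 3 -- crossings are unambiguous
print(count_events(filled, 0.5))  # 3 here too, but only because the
                                  # floor (0.4) sits safely below the
                                  # threshold; as the floor approaches
                                  # the peaks, the crossings vanish.
```

In the empty case any threshold between zero and the peak recovers the events; in the filled case the usable threshold range shrinks with the distance between floor and peak, which mirrors the reduced distinguishability described above.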
In the experiment, empty rhythms show shorter reaction times and higher accuracy
than filled rhythms. The study focused on how fast participants reacted and how accurate
their reactions were. In previous studies on empty and filled intervals, the filled
intervals resulted in more accurate duration judgments than the empty intervals, which
seems to contradict the results of this experiment. In those previous studies, the
stimuli for filled intervals were stable tones, with sound onset and offset defining the
interval boundaries, whereas empty interval stimuli were usually created by two clicking
sounds. However, the filled rhythm stimuli of my experiment are made up of several
successive sound events between which the spaces are filled with sound, while the empty
rhythms are composed of brief sound events. This difference in stimuli between the
previous studies and my experiment most likely explains the different results, namely,
that in the present study the empty rhythms produced the better results.
Second, comparing performance for filled instrumental rhythms and vocal rhythms,
it seems that the multiple components of vocal rhythm, i.e. amplitude, pitch, and timbre
changes, help the participants recognize vocal rhythmic patterns better. In the stimulus
creation section I described that the filled rhythms were created using only the
amplitude-change information of the corresponding vocal rhythms. Vocal rhythms, however,
are established by a combination of amplitude, pitch, spectral, and timbral changes. This
means that the filled rhythms of my study are one-dimensional (the change of a single
parameter causes the event segmentation), while the vocal rhythms are multi-dimensional.
In terms of distinguishability, amplitude alone has its limits, and the decreased
performance for filled rhythms in comparison to vocal rhythms may be due to the lack of
pitch, spectral, and timbral information in the sounds.
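For illustration, such a one-dimensional construction could look like the following Python sketch. This is a plausible reconstruction under stated assumptions, not the thesis's actual procedure: the function name filled_from_vocal, the sine carrier, and the carrier and cutoff frequencies are all hypothetical choices (the real filled stimuli were instrumental sounds):

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def filled_from_vocal(vocal, sr, carrier_hz=220.0, cutoff_hz=20.0):
    """Impose the amplitude envelope of a vocal signal on a steady tone.

    The output carries only the amplitude-change information of the
    voice; its pitch, spectral, and timbral variation are discarded,
    which is what makes such a filled rhythm 'one-dimensional'.
    """
    env = np.abs(hilbert(vocal))              # raw amplitude envelope
    b, a = butter(2, cutoff_hz / (sr / 2))    # smooth fast fluctuations
    env = filtfilt(b, a, env)
    t = np.arange(len(vocal)) / sr
    return env * np.sin(2 * np.pi * carrier_hz * t)

# Usage (hypothetical): tone = filled_from_vocal(vocal_waveform, 44100)
```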
Third, performance for empty rhythms is slightly better than for vocal rhythms.
Even though empty rhythms are one-dimensional, the silence between events creates a clear
distinction, which makes it easy to extract the rhythmic information. In contrast, vocal
rhythms are composed of several components from which we extract rhythm information; in
other words, we need to process several different types of information simultaneously.
Because these components are processed simultaneously, decisions on vocal rhythms do not
take more time than those on empty rhythms, at least not in the short memory task.
Next, the short memory task shows faster reactions and more accurate responses
than the long memory task. For longer ISIs, rhythmic memory seems to fade, making the
comparison with the just-heard pattern more difficult and leading to longer reaction
times and higher error rates. Gardiner et al. (1994) proposed a tempting idea to explain
this phenomenon by suggesting two separate modes of rehearsal: maintenance rehearsal and
elaborative rehearsal. Maintenance rehearsal is related to knowing, not to remembering,
while elaborative rehearsal affects remembering, not knowing. Gardiner et al.'s
suggestion of two different rehearsal strategies could be applied to music memory
research and may be tested in future studies. Craik and Watkins's (1973) study indicated
that the duration of mental rehearsal is related not to short-term but to long-term
memory, and that maintenance rehearsal may not improve memory performance.
In addition, the same condition shows faster reactions and slightly better
accuracy. This result is consistent with that of Klyn's experiment (2012). It is
plausible that 'same' decisions require different processing than 'different' decisions.
The slower reaction times for 'different' decisions may be related to an additional
re-checking process when the brain detects or checks for differences (for a general
discussion of the different processes involved in same and different decisions, see
Briggs and Johnsen, 1973; Krueger, 1978). The interaction of type and condition in the
non-musician group shows the interesting phenomenon that for 'different' decisions there
is no significant difference between empty and filled rhythms, and vocal rhythms yield
the best accuracy, while musicians still show their best performance for empty rhythms.
The reaction of the non-musicians can be seen as additional support for the existence of
a voice-specific response (VSR) as proposed by Levy et al. (2003).
Finally, musical training contributes to the participants' ability to identify
and judge the sameness or difference of stimuli, especially for empty instrumental
rhythms. It seems that musical training improves the participants' rhythm perception and
memory, but the improvement is strongest for empty rhythms and less so for vocal rhythms,
which is also consistent with the findings of Klyn (2012). In addition, the musician
participants of my experiment showed better performance for filled rhythms than the
non-musicians. This implies that musicians may have a different representation of
instrumental sounds than non-musicians, one which improves the ability to recognize
instrumental rhythms more quickly and correctly. It is possible that musicians,
specifically those with Western music training, develop an ability to transform
instrumental sound into visual, music-notation-like representations that seem to be more
stable and easier to recall. Another possible explanation, an association of sound
rhythms with bodily actions, comes from the musician participants' debriefing: a couple
of musicians reported that, specifically for filled rhythms, they combined the sound with
images of playing the cello in order to distinguish rhythmic events from the continuous
vibration; this may have stabilizing effects similar to those of the notation-like
representation.
Chapter 6: Conclusion
Despite the significant difference between filled and empty rhythms that was
identified in this thesis, the results do not support the research hypothesis that the
differences between vocal and clapstick rhythms identified in previous studies are due to
the difference between filled and empty rhythms. If the hypothesis had been correct,
vocal rhythms should have shown results similar to those of filled rhythms. However,
filled and vocal rhythms produced clearly different responses in the participants, in
terms of both accuracy and reaction time. Most likely, the processing differences between
clapstick and vocal rhythms have to do with the different dimensionality of the rhythms
discussed above. The complex combination of features that forms the rhythms of voice
sounds is, in our experience, linked to the vocal production of sounds. On the other
hand, from an 'ecological' perspective, instrumental sounds like those of clapsticks bear
the hallmark of sounds produced by human interaction with external (resonating) objects,
i.e. sounds produced by hitting, slapping, thumping, etc. The cognitive differences in
the processing of vocal and instrumental sound may therefore reflect the different
actions in, and interactions with, our environment that led, and still lead, to the
production of these two types of sound.
The experiment also shows the different influence of sensory and working
memory on the participants' performance and confirms the existence of different
cognitive processing for 'same' and 'different' recognitions and decisions. In addition,
musical training seems to improve accuracy for instrumental rhythms, especially for empty
rhythms, and to contribute to the faster perception of empty instrumental rhythms. This
indicates that musical education increases the differences between instrumental and vocal
rhythm processing, and helps to represent musical rhythms in more stable forms in memory.
What does this tell us? Music, as a parallel system to language, provides us with
a primary modeling system that enables us to communicate and think in unique,
non-linguistic terms (Blacking 1976). Language relies heavily on the human voice, while
music mediates between voice and non-voice sounds; thus we do not tend to make a clear
distinction between vocal and instrumental music when we think of music. It may therefore
come as a surprise that there are cognitive differences in the processing of vocal and
instrumental sound. However, as pointed out above, from an ecological perspective these
differences are just a reflection of our different interactions in and with our
environment, and they may shed new light on the origins of these different forms of sound
production.
With the present study, I have tried to make a small contribution to our
understanding of rhythm in the context of music. Musical rhythm can be produced by the
human voice, by musical instruments, or by both. Given that the use of both voice and
non-voice sounds is quite unique to the human species, studies of vocal and instrumental
rhythm may shed new light on the origins and evolution of music (Fitch, 2006). Although
this study approached musical rhythm from a very specific angle, namely the question
whether the different cognitive processing of vocal and instrumental rhythms can be
explained by the difference between empty and filled rhythms, I nevertheless hope that
the present study contributes to tracing back the origin of musical man (Blacking 1973).
References
Audacity Team. (2013). Audacity. (Version 2.0.3.0.) [Computer program]. Retrieved
January, 2013, from http://audacity.sourceforge.net/
Baddeley, A. D. (1997). Human memory: Theory and practice. Psychology Press.
Baddeley, A. D. (2000). The phonological loop and the irrelevant speech effect: Some
comments on Neath (2000). Psychonomic Bulletin & Review, 7(3), 544-549.
Baddeley, A.D. (2010). Working memory. Current Biology, 20(4), R136–R140.
Baddeley, A.D., & Hitch, G. J. (1994). Developments in the concept of working memory.
Neuropsychology, 8, 485-493.
Baddeley, A. D., & Logie, R. (1992). Auditory imagery and working memory. Auditory
imagery, 179–197.
Bamber, D. (1969). Reaction times and error rates for “same”-“different” judgments of
multidimensional stimuli. Perception & Psychophysics, 84, 213-219.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in
human auditory cortex. Nature, 403(6767), 309-312.
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A.,
Kaufman, J. N., & Possing, E. T. (2000). Human temporal lobe activation by speech and
nonspeech sounds. Cerebral Cortex, 10(5), 512-528.
Blacking, J. (1973). How musical is man? University of Washington Press.
Blacking, J. (1992). The biology of music-making. In H. Meyer (Ed.), Ethnomusicology:
An Introduction (pp. 301-314). New York, NY: Norton.
Bregman, A. S. (1990). Auditory scene analysis. MIT Press: Cambridge, MA
Briggs, G. E., & Johnsen, A. M. (1973). On the nature of control processing in choice
reactions. Memory & Cognition, 1, 91-100.
Boersma, P., & Weenink, D. (2013). Praat: doing phonetics by computer (Version 5.3.59)
[Computer program]. Retrieved January, 2013, from http://www.praat.org/
Bowles, E. A. (1957). Were Musical Instruments Used in the Liturgical Service during
the Middle Ages? The Galpin Society Journal, 10, 40-56.
Burge, T. (2003). Memory and persons. The Philosophical Review, 112(3), 289-337.
Chartrand, J. P., Peretz, I., & Belin, P. (2008). Auditory recognition expertise and domain
specificity. Brain research, 1220, 191-198.
Cherry, C., & Wiley, R. (1967). Speech communication in very noisy environments.
Nature, 214, 1164.
Craik, F. I., & Watkins, M. J. (1973). The role of rehearsal in short-term memory.
Journal of verbal learning and verbal behavior, 12(6), 599-607
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and
reading. Journal of verbal learning and verbal behavior, 19(4), 450-466
Dirks, D. D., & Bower, D. (1970). Effect of forward and backward masking on speech
intelligibility. The Journal of the Acoustical Society of America, 47, 1003.
Dyirbal song poetry: Traditional songs of an Australian rainforest people. (1996).
Mascot, N.S.W.: Larrikin.
Fecteau, S., Armony, J. L., Joanette, Y., & Belin, P. (2004). Is voice processing
species-specific in human auditory cortex? An fMRI study. NeuroImage, 23(3), 840-848.
Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective.
Cognition, 100(1), 173-215.
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with
millisecond accuracy. Behavior Research Methods, 35(1), 116–124.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (p.
149-180). New York: Academic Press.
Gardiner, J. M., Gawlik, B., & Richardson-Klavehn, A. (1994). Maintenance rehearsal
affects knowing, not remembering; elaborative rehearsal affects remembering, not
knowing. Psychonomic Bulletin & Review, 1(1), 107-110.
Gaver, W.W. (1993). What in the world do we hear?: An ecological approach to auditory
event perception. Ecological Psychology, 5(1), 1-29.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain.
Journal of Cognitive Neuroscience, 19(5), 893-906.
Goldfarb, J. L., & Goldstone, S. (1963). Time judgment: A comparison of filled and
unfilled durations. Perceptual and Motor Skills, 16(2), 376-376
Goldstone, S., & Goldfarb, J. L. (1963). Judgment of filled and unfilled durations:
Intersensory factors. Perceptual and motor skills, 17(3), 763-774.
Holloway, C. M. (1970). Passing the strongly voiced components of noisy speech.
Hung, T. H. (2011). One music? Two musics? How many musics? Cognitive
ethnomusicological, behavioral, and fMRI study on vocal and instrumental rhythm
processing (Doctoral dissertation, Ohio State University).
Ihle, R. C., & Wilsoncroft, W. E. (1983). The filled-duration illusion: limits of duration
of interval and auditory fillers. Perceptual and motor skills, 56(2), 655-660.
Joanisse, M. F., & Gati, J. S. (2003). Overlapping neural regions for processing rapid
temporal cues in speech and nonspeech signals. Neuroimage, 19(1), 64-79.
James, W. (1890/1950). The Principles of Psychology.
Klyn, N. A. M. (2012). Working Memory for Rhythm (Master’s Thesis, The Ohio State
University).
Korea Arts & Culture Education Service. (1980). Beompae : Systematic inventory of
folk music 4. Seoul, Korea: Korea Arts & Culture Education Service.
Krueger, L. E. (1978). A theory of perceptual matching. Psychological Review, 85, 278-304.
Levy, D. A., Granot, R., & Bentin, S. (2003). Neural sensitivity to human voices: ERP
evidence of task and attentional influences. Psychophysiology, 40(2), 291-305.
Malcolm, N. (1963). Three lectures on memory. Knowledge and certainty, 187-240.
Marsh, E., & Bower, G. (2004). The role of rehearsal and generation in false memory
creation. Memory (Hove, England), 12(6), 748-761.
Merker, B., & Brown, S. (2001). The Origins of Music (New edition.). The MIT
Press.
Mithen, S. J. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind
and Body. Cambridge, MA: Harvard University Press.
Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our
capacity for processing information. Psychological Review, 63, 81-97.
Miller, G. A., & Licklider, J. C. R. (1950). The intelligibility of interrupted speech. The
Journal of the Acoustical Society of America, 22, 167.
Owen, B. et al. (2001). Organ. Grove Music Online. Oxford Music Online. Oxford
University Press. Retrieved April, 2013, from
http://www.oxfordmusiconline.com.proxy.lib.ohio-state.edu/subscriber/article/grove/music/44010
Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annu. Rev.
Psychol., 56, 89-114.
Powers, G. L., & Wilcox, J. C. (1977). Intelligibility of temporally interrupted speech
with and without intervening noise. The Journal of the Acoustical Society of America, 61,
195.
Rammsayer, T. H., & Lima, S. D. (1991). Duration discrimination of filled and empty
auditory intervals: Cognitive and perceptual factors. Perception & Psychophysics, 50(6),
565-574.
Rammsayer, T. H., & Skrandies, W. (1998). Stimulus characteristics and temporal
information processing: Psychophysical and electrophysiological data. Journal of
Psychophysiology.
Repp, B. H., & Bruttomesso, M. (2009). A filled duration illusion in music: Effects of
metrical subdivision on the perception and production of beat tempo. Advances in
Cognitive Psychology, 5, 114.
Saito, S. (2001). The phonological loop and memory for rhythms: An individual
differences approach. Memory, 9(4), 313–322.
Snyder, B. (2000). Music and memory: an introduction. The MIT Press.
Swanson, H. L. (1993). Individual differences in working memory: A model testing and
subgroup analysis of learning-disabled and skilled readers. Intelligence.
University of Iowa Electronic Music Studio. (2012). Musical instrument samples: Cello.
Retrieved January, 2013, from http://theremin.music.uiowa.edu/MIScello.html
Verschuure, J., & Brocaar, M. P. (1983). Intelligibility of interrupted meaningful and
nonsense speech with and without intervening noise. Perception & psychophysics, 33(3),
232-240.
Wallaschek, R., & Cattell, J. M. (1891). On the Origin of Music. Mind, 16(63), 375-388.
Will, U. (2004). Oral memory in Australian song performance and the Parry-Kirk debate:
a cognitive ethnomusicological perspective. International Study Group on Music
Archaeology, 10, 1–29.