Jennifer A. Alexander^a, Ann R. Bradlow^b, Richard D. Ashley^b, and Patrick C.M. Wong^b
^a Simon Fraser University, ^b Northwestern University
Speech and music both utilize pitch variation to
convey meaning, such as emotional affect. In
speech, pitch conveys pragmatic meaning and,
in tone languages, lexical information. This
study examines how experience processing
lexical pitch affects music-pitch perception.
Twenty-eight non-musicians (14 English and
14 Mandarin speakers) discriminated and
identified short melodies. The Mandarin
listeners more accurately discriminated the
melodies than the English listeners (Mann-Whitney
U=140.5, p<0.05; two-tailed t(21.86)=2.45, p<0.05, d=0.93), but the English
listeners more accurately matched the melodies
with graphical representations of the pitch
changes than the Mandarin listeners (Mann-Whitney U=26.5, p<0.005; two-tailed
t(25.44)=-3.94, p<0.001, d=1.15). Experience with
lexical-pitch processing may therefore enhance
attention to, and facilitate discrimination of,
rapidly-changing pitches. But, learned linguistic
pitch-pattern categories may interfere with, and
impair identification of, novel music-pitch
patterns. Results are discussed with respect to a
cognitive-processing framework involving the
influence of experientially-acquired pitch-category knowledge upon novel-pitch input.
Scientific areas: Speech Perception & Production,
Language/Music Interfaces
Music and speech are similar in certain respects.
For instance, both are generative, with simple
elements (e.g., music pitches or speech
segments) combining to create complex
structures (music melodies; speech utterances).
Both also use stress, duration, and pitch
(fundamental frequency/F0) for communicative
purposes. This study seeks to increase our
understanding of how experience processing
lexical pitch affects perception of music pitch.
Pitch in music can express composition (e.g.,
the key of a piece) and affect (e.g., sadness, via
a minor chord). The use of pitch in speech
differs across languages. In non-tone languages
such as English, pitch change across an
intonation phrase conveys pragmatic meaning
including affect (e.g., lower pitch, if the talker is
unhappy) and emphasis (e.g., “It’s a cat, not a
dog.”). In tone languages, pitch variation also
conveys lexical information. Pitch-contour and height contrasts over a tone-bearing unit
(typically the syllable) signal word meaning. In
Mandarin, the syllable /ma/ produced with the
high-level tone 1 means “mother,” while /ma/
means “hemp” when produced with the rising
tone 2, “horse” with the dipping tone 3, and
“scold” with the falling tone 4. Fig. 1 shows
pitch tracks of the four Mandarin tones.
[Figure 1 panels: Tone 1, Tone 2, Tone 3, Tone 4; x-axis: Time (s)]
Figure 1. Pitch tracks of the four Mandarin Chinese tones.
A growing body of literature is devoted to
how linguistic-pitch processing experience
affects music-pitch perception and vice-versa.
Music experience may facilitate processing of
prosody and intonation [Stevens et al., 1],
detection of small pitch changes in final words
and notes [e.g., Schön et al., 2], identification of
intonation-phrase nuclei [Dankovičová et al., 3],
and discrimination of phrasal intonation
contours (ibid.). Music experience is correlated
with comparatively high performance on
behavioral lexical-pitch perception tasks such as
recall and identification of lexical-tone variation
[Delogu et al., 4], non-native tone-pattern
learning [Wong & Perrachione, 5], and
identification and discrimination of isolated
lexical tones [e.g., Alexander et al., 6;
Gottfried, 7]. Music ability is also correlated
with increased neurophysiological linguistic-pitch-processing ability: relative to non-musicians, musicians display more robust and
faithful encoding of lexical pitch at the
brainstem [Wong et al., 8]. Tone language
experience may also influence nonspeech-pitch
processing ability. Native tone-language
speakers may be more likely than non-tone-language speakers to possess absolute pitch
[Deutsch et al., 9] and more easily discriminate
two-note music contours [Stevens et al., 1]. But
native Mandarin listeners more often
misidentify flat and falling sine-wave pitch
contours than English listeners [Bent et al., 10].
The above studies suggest that experience
processing one type of pitch (lexical or musical)
affects perception of the other. This study seeks
a more nuanced view of this phenomenon. We
examine the discrimination and identification of
pitch in short (five-note) musical melodies by
listeners who differ with respect to their
experience with linguistic pitch (native English
speakers and native Mandarin speakers). We
predict that the Mandarin listeners’ stored
lexical-pitch categories will interfere with their
ability to identify music pitch patterns, and that
they will perform the task more poorly than the
English listeners, who lack existing lexical pitch
categories. However, we predict that in the pitch
discrimination task, where listeners do not
compare the input to stored categories but
instead focus on small acoustic differences
between pairs, Mandarin listeners will prevail
due to their experience discriminating short,
rapidly-changing, lexical pitch sequences.
2.1. Stimuli
Stimuli consisted of 48 five-note melodies
replicated from Dowling [11]. In keeping with
Dowling’s methodology and terminology, three
types of contours – “standard,” “answer,” and
“target” – were generated. Standards began on
middle C (F0=262 Hz) and were in the key of C.
Via three successive random permutations,
standards included two probabilities of diatonic
tone steps: P (± 1 step)=0.67, and P (± 2
steps)=0.33. Answers and targets began on
either the E above or the A below middle C, but
remained in the key of C. These transpositions
are moderately distant from C both in pitch
level (+4 and -3 semitones, respectively) and in
shared pitches (3 and 4, respectively). Answers
had the same contours, diatonic intervals, and
tonal scales as the standards, but had different
interval sizes. Targets differed from standards
with respect to scale, being in the key of E or A.
Stimuli were generated via Finale on a
Macintosh G4 computer, converted to MIDI in
Grand Piano sound, normalized for duration at
1.8 s. [QuickTime 7 Pro v. 7.0.4], and converted
to .wav format at a 44.1 kHz sampling rate and
16-bit depth [Adensoft Audio MP3 Converter
version 1.2]. The 1.8-s.-long melodies were
challenging, but possible, to follow. The
duration of each note, 0.36 s., was near the
average duration of a Mandarin tone produced
in isolation (0.43 s., as averaged across multiple
tokens of each tone, produced in multiple
syllables, by two male Beijing Mandarin
speakers; Chang & Yao, [12]). These short,
rapidly-changing music pitches within a longer
melodic context were intended to be analogous
to the short, rapidly-changing lexical pitches
that occur across a phrase in Mandarin Chinese.
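As an illustration only, the stimulus construction described above can be sketched in Python. The sketch assumes that a standard is a random walk over C-major scale degrees with the stated step probabilities, that an answer shifts every scale degree while staying in C major, and that a target is an exact semitone transposition. The function names, the MIDI note numbering, and the random seed are assumptions introduced here; they are not taken from Dowling [11] or from the Finale workflow.

```python
import random

# Semitone offsets of the C-major scale degrees within one octave.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
MIDDLE_C = 60  # MIDI note number for middle C (about 262 Hz)


def degree_to_midi(degree):
    """Map a diatonic degree (0 = middle C, 1 = D, -1 = B below) to MIDI."""
    octave, step = divmod(degree, 7)
    return MIDDLE_C + 12 * octave + MAJOR_SCALE[step]


def make_standard(rng, n_notes=5):
    """Random walk over scale degrees: +/-1 step (p=0.67) or +/-2 steps (p=0.33)."""
    degrees = [0]  # every standard starts on middle C
    for _ in range(n_notes - 1):
        size = 1 if rng.random() < 0.67 else 2
        direction = rng.choice([-1, 1])
        degrees.append(degrees[-1] + direction * size)
    return degrees


def contour(midi_notes):
    """Direction of each successive pitch change: +1 for up, -1 for down."""
    return [1 if b > a else -1 for a, b in zip(midi_notes, midi_notes[1:])]


def intervals(midi_notes):
    """Signed size, in semitones, of each successive pitch change."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]


rng = random.Random(0)  # arbitrary seed, for reproducibility only
standard_degrees = make_standard(rng)
standard = [degree_to_midi(d) for d in standard_degrees]

# Assumed "answer": same scale degrees moved to start on E, still in C major.
answer = [degree_to_midi(d + 2) for d in standard_degrees]

# Assumed "target": exact transposition up 4 semitones (key of E).
target = [note + 4 for note in standard]

print("standard:", standard, "contour:", contour(standard))
print("answer:  ", answer, "intervals:", intervals(answer))
print("target:  ", target, "intervals:", intervals(target))
```

Under these assumptions all three melodies share one contour. The target keeps the standard's semitone intervals exactly, while the answer's semitone intervals can differ because the shift stays within the C-major scale. Starting on the A below middle C would use a degree shift of -2 and a transposition of -3 semitones instead.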
2.2. Participants
Participants were 14 native Mandarin (9 female)
and 14 native English (10 female) speakers. The
English speakers had no experience with any
tone language. The groups did not differ in age
(M=25 y.o., SD=3.98 y.) nor music-training
duration (t(25.97)= -0.82, p=0.42). All had
minimal music training (max=4.5 y. music
training (n=1); mode=0 y. (n=13); M=1.4 y.).
None reported hearing, speech, or neurological
deficits. Musicianship and language experience
were assessed via questionnaire.
2.3. Experiments
2.3.1. Experiment 1
Experiment 1 was a 2-AFC AX discrimination
task; melodies were arranged in pairs. In two
blocks of 48 trials (96 trials total), half were
“same” (identical) trials. In “different” trials, the
two melodies had the same starting pitch, but
differed by (a) the 2nd, 3rd, 4th, and 5th notes;
(b) the 3rd, 4th, and 5th notes; (c) the 4th and
5th notes; or (d) the 5th note. Each melody was
paired just once with any other melody, and
each appeared in one trial where (a) or (b) was
the case AND in one trial where either (c) or (d)
was the case. The experiment took place in a
quiet booth with a Dell computer. Stimuli were
presented in random order via Sennheiser HD
linear II or Sony Dynamic Stereo MDR-V700
headphones; accuracy and reaction time were
recorded via Cedrus Model RB-730 response
pad in E-PRIME [Schneider et al., 2002].
Results for experiment 1 are shown in Figure 2.
Fig. 2 shows that the Mandarin speakers more
accurately discriminated the melodies than
English speakers (Mann-Whitney U=140.5,
p<0.05; two-tailed t(21.86)=2.45, p<0.05,
d=0.93). Both groups spent the same amount of
time on the task (Mann-Whitney U=99, p>0.05;
two-tailed t(22.6)=0.12, p>0.05).
Figure 2. Music melody discrimination sensitivity.
2.3.2. Experiment 2
Experiment 2 was a 2-AFC identification task; a
melody corresponded to a sequence of four
arrows. Each arrow corresponded to a different
note in the melody. An up-pointing arrow
indicated that a note was higher in pitch relative
to the one preceding it; a down-pointing arrow,
the opposite. This aimed to mimic the concept
that relative pitch height and contour are
essential to lexical-tone identity while absolute
intervals are not [Morris, 13]. In a trial (see Fig.
3), participants heard one melody and saw two
arrow-sequences. Via button-press, they
indicated which sequence matched the melody.
[Figure 3 panel label: Trial 1]
Figure 3. Trial in music-melody identification task.
There were 2 blocks of 48 melodies (96 trials
total). Each of the 16 possible arrow
combinations appeared 6 times. The melody
matched arrow-sequence A in half the trials. ISI
was 3 s. Presentation order of experiments 1 and
2 was counterbalanced. Both experiments used
the same hardware, software, and instructions.
The results of experiment 2 are shown in Fig. 4.
Figure 4. Music melody identification accuracy.
Fig. 4 shows that the English speakers more
accurately identified the melodies than the
Mandarin speakers (Mann-Whitney U=26.5,
p<0.005; two-tailed t(25.44)= -3.94, p<0.001,
d=1.15). Both groups spent the same amount of
time on the task (Mann-Whitney U=117,
p>0.05; two-tailed t(22.76)=0.66, p>0.05).
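For readers who wish to see how statistics of this form are obtained, the following is a minimal sketch in plain Python. The fractional degrees of freedom reported above (e.g., 21.86 and 25.44, with 14 listeners per group) are consistent with Welch's unequal-variance t-test, but that is an inference and not something stated in the text. The function names and the two score lists are placeholders introduced here and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Placeholder scores for two groups (NOT the study's data).
group_a = [1, 2, 3, 4]
group_b = [2, 3, 4, 5]

t_value, df_value = welch_t(group_a, group_b)
print("Welch t:", t_value, "df:", df_value)
print("Cohen's d:", cohens_d(group_a, group_b))
print("Mann-Whitney U:", mann_whitney_u(group_a, group_b))
```

The sketch returns test statistics only. Obtaining p-values would additionally require the t distribution and the null distribution of U, which a statistics package would normally supply.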
This work was supported by an NU CogSci Fellowship
(J.A.); NIH grant DC005794 (A.B.); and NIH grants
HD051827 & DC007468 (P.W.). We thank our colleagues
and two anonymous reviewers for their comments.
Relative to the English speakers, the Mandarin
speakers more easily discriminated, but less
easily identified, the music-melodies. It seems
that experience discriminating lexical-pitch can
facilitate music-pitch discrimination. Mandarin
listeners’ experience attending to low-level
acoustic cues for tone differentiation may
generalize to and enhance their discrimination
of music pitch. But music-pitch identification,
which involves matching heard pitch patterns to
visual representations of those patterns, is
perhaps subject to the influence of existing
lexical-pitch categories. The music melodies
were similar to Mandarin tones presented in
context, in that they contained short, rapidly-changing pitches. The Mandarin listeners may
have encountered interference from stored
lexical-tone categories when identifying the
novel music-pitch input. The English listeners’
superior performance may be due to their lack
of this lexical-pitch category structure.
We also suggest another possibility for
why the English listeners outperformed the
Mandarin listeners on the identification task.
The Mandarin listeners knew Pinyin, a
Romanization system for Chinese that indicates
tone with diacritics (e.g., [mā má mǎ mà] for
[ma] with tones 1, 2, 3, 4). The arrow sequences
may have been confusing since they are similar,
but not identical, to the Pinyin diacritics. Also,
Mandarin tones undergo processes absent from the music-identification
task. In tone sandhi, the first of some sequences
of two tones will change; e.g., tone 3+tone 3 →
tone 2+tone 3. The Mandarin listeners may have
expected this process in the identification task,
e.g., if they heard a low-high-low-high pitch
sequence, they might have expected a dipping +
rising arrow sequence. The English listeners
would not have encountered such interference.
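The single sandhi example given above (two successive tone 3 syllables surfacing as tone 2 followed by tone 3) can be written as a small rule. This is a minimal sketch covering only that two-tone case. The function name is hypothetical, and other sandhi patterns and longer tone sequences are deliberately not modelled.

```python
def third_tone_sandhi(first, second):
    """Apply the tone 3 + tone 3 -> tone 2 + tone 3 rule to a two-tone sequence.

    Tones are given as integers 1-4; any other pair is returned unchanged.
    """
    if first == 3 and second == 3:
        return (2, 3)
    return (first, second)


print(third_tone_sandhi(3, 3))
print(third_tone_sandhi(3, 2))
```

A listener applying such a rule expects the surface pitch pattern to depart from the underlying tone sequence, which is the kind of expectation that the arrow sequences in the identification task would not satisfy.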
