MUSIC-MELODY PERCEPTION IN TONE-LANGUAGE AND NONTONE-LANGUAGE SPEAKERS

Jennifer A. Alexander (a), Ann R. Bradlow (b), Richard D. Ashley (b), and Patrick C.M. Wong (b)
(a) Simon Fraser University, (b) Northwestern University
jennifer_alexander@sfu.ca

ABSTRACT

Speech and music both utilize pitch variation to convey meaning, such as emotional affect. In speech, pitch conveys pragmatic meaning and, in tone languages, lexical information. This study examines how experience processing lexical pitch affects music-pitch perception. Twenty-eight non-musicians (14 English and 14 Mandarin speakers) discriminated and identified short melodies. The Mandarin listeners more accurately discriminated the melodies than the English listeners (Mann-Whitney U=140.5, p<0.05; two-tailed t(21.86)=2.45, p<0.05, d=0.93), but the English listeners more accurately matched the melodies with graphical representations of the pitch changes than the Mandarin listeners (Mann-Whitney U=26.5, p<0.005; two-tailed t(25.44)=-3.94, p<0.001, d=1.15). Experience with lexical-pitch processing may therefore enhance attention to, and facilitate discrimination of, rapidly-changing pitches. But learned linguistic pitch-pattern categories may interfere with, and impair identification of, novel music-pitch patterns. Results are discussed with respect to a cognitive-processing framework involving the influence of experientially-acquired pitch-category knowledge upon novel-pitch input.

Format: Poster presentation
Keywords: Pitch-perception, Music, Lexical-tone
Scientific areas: Speech Perception & Production, Language/Music Interfaces

1. INTRODUCTION

Music and speech are similar in certain respects. For instance, both are generative, with simple elements (e.g., music pitches or speech segments) combining to create complex structures (music melodies; speech utterances). Both also use stress, duration, and pitch (fundamental frequency/F0) for communicative purposes. This study seeks to increase our understanding of how experience processing lexical pitch affects perception of music pitch. Pitch in music can express composition (e.g., the key of a piece) and affect (e.g., sadness, via a minor chord). The use of pitch in speech differs across languages. In non-tone languages such as English, pitch change across an intonation phrase conveys pragmatic meaning including affect (e.g., lower pitch, if the talker is unhappy) and emphasis (e.g., “It’s a cat, not a dog.”). In tone languages, pitch variation also conveys lexical information. Pitch-contour and height contrasts over a tone-bearing unit (typically the syllable) signal word meaning. In Mandarin, the syllable /ma/ produced with the high-level tone 1 means “mother,” while /ma/ means “hemp” when produced with the rising tone 2, “horse” with the dipping tone 3, and “scold” with the falling tone 4. Fig. 1 shows pitch tracks of the four Mandarin tones.

[Figure 1. Pitch tracks of the four Mandarin Chinese tones. Panels: Tone 1, Tone 2, Tone 3, Tone 4; x-axis: Time (s).]

A growing body of literature is devoted to how linguistic-pitch processing experience affects music-pitch perception and vice-versa. Music experience may facilitate processing of prosody and intonation [Stevens et al., 1], detection of small pitch changes in final words and notes [e.g., Schön et al., 2], identification of intonation-phrase nuclei [Dankovičová et al., 3], and discrimination of phrasal intonation contours (ibid.). Music experience is correlated with comparatively high performance on behavioral lexical-pitch perception tasks such as recall and identification of lexical-tone variation [Delogu et al., 4], non-native tone-pattern learning [Wong & Perrachione, 5], and identification and discrimination of isolated lexical tones [e.g., Alexander et al., 6; Gottfried, 7].
Music ability is also correlated with increased neurophysiological linguistic-pitch-processing ability: relative to non-musicians, musicians display more robust and faithful encoding of lexical pitch at the brainstem [Wong et al., 8]. Tone-language experience may also influence nonspeech-pitch processing ability. Native tone-language speakers may be more likely than non-tone-language speakers to possess absolute pitch [Deutsch et al., 9] and more easily discriminate two-note music contours [Stevens et al., 1]. But native Mandarin listeners more often misidentify flat and falling sine-wave pitch contours than English listeners [Bent et al., 10].

The above studies suggest that experience processing one type of pitch (lexical or musical) affects perception of the other. This study seeks a more nuanced view of this phenomenon. We examine the discrimination and identification of pitch in short (five-note) musical melodies by listeners who differ with respect to their experience with linguistic pitch (native English speakers and native Mandarin speakers). We predict that the Mandarin listeners’ stored lexical-pitch categories will interfere with their ability to identify music pitch patterns, and that they will perform the task more poorly than the English listeners, who lack existing lexical pitch categories. However, we predict that in the pitch discrimination task, where listeners do not compare the input to stored categories but instead focus on small acoustic differences between pairs, Mandarin listeners will prevail due to their experience discriminating short, rapidly-changing, lexical pitch sequences.

2. METHOD

2.1. Stimuli

Stimuli consisted of 48 five-note melodies replicated from Dowling [11]. In keeping with Dowling’s methodology and terminology, three types of contours (“standard,” “answer,” and “target”) were generated. Standards began on middle C (F0=262 Hz) and were in the key of C.
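As a rough illustration of how five-note standards of this kind could be produced, the following Python sketch performs a random walk over C-major scale degrees, starting on middle C, using the step probabilities reported for the standards in this section (P(±1 step)=0.67, P(±2 steps)=0.33). It is not the authors' or Dowling's procedure: the function names, the equal chance of stepping up or down, and the equal-tempered conversion to Hz are all assumptions made for illustration.

```python
import random

# Semitone offsets of the C-major scale degrees within one octave.
C_MAJOR_SEMITONES = [0, 2, 4, 5, 7, 9, 11]
MIDDLE_C_HZ = 262.0  # F0 of middle C as reported in the text


def degree_to_hz(degree, tonic_hz=MIDDLE_C_HZ):
    """Convert a diatonic degree (0 = middle C) to Hz.

    Assumption: 12-tone equal temperament.
    """
    octave, step = divmod(degree, 7)  # floor division handles negative degrees
    semitones = 12 * octave + C_MAJOR_SEMITONES[step]
    return tonic_hz * 2 ** (semitones / 12)


def random_standard(rng, n_notes=5, p_one_step=0.67):
    """Random walk over diatonic degrees, starting on the tonic (degree 0).

    Assumption: up and down steps are equally likely.
    """
    degrees = [0]
    for _ in range(n_notes - 1):
        size = 1 if rng.random() < p_one_step else 2
        direction = rng.choice([-1, 1])
        degrees.append(degrees[-1] + direction * size)
    return degrees


rng = random.Random(0)  # fixed seed so the sketch is repeatable
melody = random_standard(rng)
print(melody)
print([round(degree_to_hz(d), 1) for d in melody])
```

Under the equal-temperament assumption, `degree_to_hz(2)` (the E above middle C, +4 semitones) is about 330.1 Hz and `degree_to_hz(-2)` (the A below middle C, -3 semitones) is about 220.3 Hz.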
Via three successive random permutations, standards included two probabilities of diatonic tone steps: P(±1 step)=0.67 and P(±2 steps)=0.33. Answers and targets began on either the E above or the A below middle C, but remained in the key of C. These transpositions are moderately distant from C both in pitch level (+4 and -3 semitones, respectively) and in shared pitches (3 and 4, respectively). Answers had the same contours, diatonic intervals, and tonal scales as the standards, but had different interval sizes. Targets differed from standards with respect to scale, being in the key of E or A. Stimuli were generated via Finale on a Macintosh G4 computer, converted to MIDI in Grand Piano sound, normalized for duration at 1.8 s [QuickTime 7 Pro v. 7.0.4], and converted to .wav format at a 44.1 kHz sampling rate and 16-bit depth [Adensoft Audio MP3 Converter version 1.2]. The 1.8-s-long melodies were challenging, but possible, to follow. The duration of each note, 0.36 s, was near the average duration of a Mandarin tone produced in isolation (0.43 s, as averaged across multiple tokens of each tone, produced in multiple syllables, by two male Beijing Mandarin speakers [Chang & Yao, 12]). These short, rapidly-changing music pitches within a longer melodic context were intended to be analogous to the short, rapidly-changing lexical pitches that occur across a phrase in Mandarin Chinese.

2.2. Participants

Participants were 14 native Mandarin (9 female) and 14 native English (10 female) speakers. The English speakers had no experience with any tone language. The groups did not differ in age (M=25 y.o., SD=3.98 y.) or in music-training duration (t(25.97)=-0.82, p=0.42). All had minimal music training (max=4.5 y. (n=1); mode=0 y. (n=13); M=1.4 y.). None reported hearing, speech, or neurological deficits. Musicianship and language experience were assessed via questionnaire.

2.3. Experiments

2.3.1. Experiment 1

Experiment 1 was a 2-AFC AX discrimination task; melodies were arranged in pairs. In two blocks of 48 trials (96 trials total), half were “same” (identical) trials. In “different” trials, the two melodies had the same starting pitch, but differed by (a) the 2nd, 3rd, 4th, and 5th notes; (b) the 3rd, 4th, and 5th notes; (c) the 4th and 5th notes; or (d) the 5th note. Each melody was paired just once with any other melody, and each appeared in one trial where (a) or (b) was the case and in one trial where either (c) or (d) was the case. The experiment took place in a quiet booth with a Dell computer. Stimuli were presented in random order via Sennheiser HD linear II or Sony Dynamic Stereo MDR-V700 headphones; accuracy and reaction time were recorded via a Cedrus Model RB-730 response pad in E-PRIME [Schneider et al., 2002]. Results for experiment 1 are shown in Figure 2.

[Figure 2. Music melody discrimination sensitivity.]

Fig. 2 shows that the Mandarin speakers more accurately discriminated the melodies than the English speakers (Mann-Whitney U=140.5, p<0.05; two-tailed t(21.86)=2.45, p<0.05, d=0.93). The groups did not differ significantly in the amount of time spent on the task (Mann-Whitney U=99, p>0.05; two-tailed t(22.6)=0.12, p>0.05).

2.3.2. Experiment 2

Experiment 2 was a 2-AFC identification task; a melody corresponded to a sequence of four arrows. Each arrow corresponded to one of the four notes following the first note of the melody. An up-pointing arrow indicated that a note was higher in pitch relative to the one preceding it; a down-pointing arrow, the opposite. This aimed to mimic the concept that relative pitch height and contour are essential to lexical-tone identity while absolute intervals are not [Morris, 13]. In a trial (see Fig. 3), participants heard one melody and saw two arrow-sequences. Via button-press, they indicated which sequence matched the melody.

[Figure 3. Trial in music-melody identification task. Labels: Trial 1; A; B.]

There were 2 blocks of 48 melodies (96 trials total). Each of the 16 possible arrow combinations appeared 6 times. The melody matched arrow-sequence A in half the trials. ISI was 3 s. Presentation order of experiments 1 and 2 was counterbalanced. Both experiments used the same hardware, software, and instructions. The results of experiment 2 are shown in Fig. 4.

[Figure 4. Music melody identification accuracy.]

Fig. 4 shows that the English speakers more accurately identified the melodies than the Mandarin speakers (Mann-Whitney U=26.5, p<0.005; two-tailed t(25.44)=-3.94, p<0.001, d=1.15). The groups did not differ significantly in the amount of time spent on the task (Mann-Whitney U=117, p>0.05; two-tailed t(22.76)=0.66, p>0.05).

3. RESULTS AND DISCUSSION

Relative to the English speakers, the Mandarin speakers more easily discriminated, but less easily identified, the music melodies. It seems that experience discriminating lexical pitch can facilitate music-pitch discrimination. Mandarin listeners’ experience attending to low-level acoustic cues for tone differentiation may generalize to and enhance their discrimination of music pitch. But music-pitch identification, which involves matching heard pitch patterns to visual representations of those patterns, is perhaps subject to the influence of existing lexical-pitch categories. The music melodies were similar to Mandarin tones presented in context, in that they contained short, rapidly-changing pitches. The Mandarin listeners may have encountered interference from stored lexical-tone categories when identifying the novel music-pitch input. The English listeners’ superior performance may be due to their lack of this lexical-pitch category structure.

We also suggest another possibility for why the English listeners outperformed the Mandarin listeners on the identification task. The Mandarin listeners knew Pinyin, a Romanization system for Chinese that indicates tone with diacritics (e.g., [mā má mǎ mà] for [ma] with tones 1, 2, 3, 4). The arrow sequences may have been confusing since they are similar, but not identical, to the Pinyin diacritics. Also, Pinyin reflects lexical-pitch alternation processes absent from the music-identification task. In tone sandhi, the first of some sequences of two tones will change; e.g., tone 3 + tone 3 → tone 2 + tone 3. The Mandarin listeners may have expected this process in the identification task; e.g., if they heard a low-high-low-high pitch sequence, they might have expected a dipping + rising arrow sequence. The English listeners would not have encountered such interference.

4. ACKNOWLEDGMENTS

This work was supported by an NU CogSci Fellowship (J.A.); NIH grant DC005794 (A.B.); and NIH grants HD051827 & DC007468 (P.W.). We thank our colleagues and two anonymous reviewers for their comments.

5. REFERENCES

[1] Stevens, C., Keller, P., & Tyler, M. (2004) Language tonality and its effects on the perception of contour in short spoken and musical items. In Proc. 8th Int Conf Music Percept & Cog (pp. 713-716); Evanston, IL.
[2] Schön, D., Magne, C., & Besson, M. (2004) The music of speech: Music training facilitates pitch processing in both music and language. Psychophys, 41, 341-349.
[3] Dankovičová, J., House, J., Crooks, A., & Jones, K. (2007) The Relationship between Musical Skills, Music Training, and Intonation Analysis Skills. Lang & Speech, 50(2), 177-225.
[4] Delogu, F., Lampis, G., & Belardinelli, M. (2006) Music-to-language transfer effect: may melodic ability improve learning of tonal languages by native nontonal speakers? Cogn Process, 7, 203-207.
[5] Wong, P. & Perrachione, T. (2007) Learning pitch patterns in lexical identification by native English-speaking adults. Appl Psych, 28, 565-585.
[6] Alexander, J., Wong, P., & Bradlow, A. (2005) Lexical Tone Perception in Musicians and Nonmusicians. In Proc Interspeech 2005, 9th Euro Conf Speech Comm & Tech; Lisbon, Portugal.
[7] Gottfried, T. (2007) Music and language learning. In O.-S. Bohn & M. Munro (Eds.), Language Experience in Second Language Speech Learning (pp. 221-237). Amsterdam: John Benjamins.
[8] Wong, P., Skoe, E., Russo, N., Dees, T., & Kraus, N. (2007) Musical Experience Shapes Human Brainstem Encoding of Linguistic Pitch Patterns. Nature Neuro, 10, 420-422.
[9] Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006) Absolute pitch among American and Chinese students: Prevalence differences, and evidence for a speech-related critical period. J Acoust Soc Am, 119(2), 719-722.
[10] Bent, T., Bradlow, A., & Wright, B. (2006) The Influence of Linguistic Experience on the Cognitive Processing of Pitch in Speech and Nonspeech Sounds. JEP:HPP, 32, 97-103.
[11] Dowling, W.J. (1978) Scale and Contour: Two Components of a Theory of Memory for Melodies. Psych Rev, 85(4), 341-354.
[12] Chang, C. & Yao, Y. (2007) Tone Production in Whispered Mandarin. In Proc ICPhS XVI (pp. 1085-1088); Saarbrücken, Germany.
[13] Morris, R. (1987) Composition with pitch-classes: a theory of compositional design. New Haven: Yale University Press.