NeuroImage 17, 1300 –1305 (2002) doi:10.1006/nimg.2002.1279 Human Cortical Dynamics Determined by Speech Fundamental Frequency Anna Mari Mäkelä,* ,† Paavo Alku,‡ Ville Mäkinen,* ,† Jussi Valtonen,* Patrick May,* and Hannu Tiitinen* ,† *Apperception & Cortical Dynamics (ACD), Department of Psychology, P. O. Box 13, FIN-00014 University of Helsinki, Finland; †BioMag Laboratory, Engineering Centre, Helsinki University Central Hospital, P. O. Box 340, FIN-00029 HUS, Finland; and ‡Laboratory of Acoustics and Audio Signal Processing, P. O. Box 3000, FIN-02015 Helsinki University of Technology, Finland Received April 1, 2002 Evidence for speech-specific brain processes has been searched for through the manipulation of formant frequencies which mediate phonetic content and which are, in evolutionary terms, relatively “new” aspects of speech. Here we used whole-head magnetoencephalography and advanced stimulus reproduction methodology to examine the contribution of the fundamental frequency F0 and its harmonic integer multiples in cortical processing. The subjects were presented with a vowel, a frequency-matched counterpart of the vowel lacking in phonetic contents, and a pure tone. The F0 of the stimuli was set at that of a typical male (i.e., 100 Hz), female (200 Hz), or infant (270 Hz) speaker. We found that speech sounds, both with and without phonetic content, elicited the N1m response in human auditory cortex at a constant latency of 120 ms, whereas pure tones matching the speech sounds in frequency, intensity, and duration gave rise to N1m responses whose latency varied between 120 and 160 ms. Thus, it seems that the fundamental frequency F0 and its harmonics determine the temporal dynamics of speech processing in human auditory cortex and that speech specificity arises out of cortical sensitivity to the complex acoustic structure determined by the human sound production apparatus. © 2002 Elsevier Science (USA) INTRODUCTION The ability to understand speech signals, despite their vast inherent diversity, is usually taken for granted. For example, the utterance “a,” be it spoken in a high-pitched female voice or in a low-pitched male voice, shouted or whispered, invariably represents the corresponding phoneme /a/. While still requiring a neuronal-level explanation, this aspect of speech perception is referred to as phonetic invariance. Consequently, the perception of speech has been linked to processes specific in human speech production (Liberman and Mattingly, 1989; Liberman and Whalen, 1053-8119/02 $35.00 © 2002 Elsevier Science (USA) All rights reserved. 2000; Rizzolatti and Arbib, 1998). Moreover, it has been suggested that the human brain carries out a categorization process (Eimas et al., 1987) which relies on phoneme representations shared through exposure to the language environment (Kuhl et al., 1992; Dehaene-Lambertz, 1997). These representations, then, enable us to detect, classify, and discriminate extremely rapidly between incoming speech stimuli from an early age onward (Dehaene-Lambertz and Dehaene, 1994). From the point of view of speech production, a voiced speech signal essentially consists of phonation and articulation (Fant, 1970; Kent and Read, 1992). In phonation, the vibrating vocal folds produce a periodic waveform, the glottal excitation. Due to this inherent periodicity, the spectra of voiced speech sounds consist of a fundamental frequency (F0) and its harmonic integer multiples (i.e., 2 䡠 F0, 3 䡠 F0, etc.). The F0 is determined by the rate of the vocal fold vibration which, in turn, is mainly governed by the size of the vocal folds (Titze, 1994). Humans are able to vary the F0 of speech extensively which enables changes in voice pitch. While the vocal fold vibration rate provides for a considerable dynamic range, the average F0 is about 100 Hz in males, 200 Hz in females, and up to 400 Hz in infants (Kent and Read, 1992). Articulation, then again, alters the shape and, consequently, the resonance frequencies of the oral cavity modifying the sound produced by the vocal folds. These resonances, termed the formants (F1, F2, F3, etc.), are the carriers of the phonemic information of speech. Consequently, the main spectral components of speech are F0 and its harmonics as well as the formant frequencies. While previous research in humans (Kuhl et al., 1992; Schwartz and Tallal, 1980; Studdert-Kennedy and Shankweiler, 1970) and the monkey (Tian et al., 2001) has gained several insights into the rapid cognitive processes involved in understanding speech, tackling the temporal as well as the spatial dynamics of the underlying brain activity precludes, to an extent (Belin et al., 2000; Zatorre et al., 1992), the use of slow hemo- 1300 1301 SPEECH F0 AND CORTICAL DYNAMICS dynamic brain imaging tools. Instead, we must turn to methodology capable of discriminating brain events with a millisecond and millimeter resolution, namely magneto- and electroencephalography (MEG and EEG; Hämäläinen et al., 1993). The majority of MEG and EEG studies on speech, however, have addressed speech perception by focusing only on the recently emerged phonetic aspects of speech (i.e., formant transitions in the F1–F2 space) without considering the role of F0 in conveying the information embedded in speech signals (Bolinger, 1989; Katz and Assmann, 2001; Patel and Balaban, 2001). Several converging lines of investigation, nevertheless, seem to suggest that the F0 of speech should perhaps not be overlooked. For example, by using pure tones and pure tone complexes it has been repeatedly shown that at frequencies below 500 Hz—in the F0 range of spoken language—the latency of the most prominent cortical response, the auditory N1(m), typically occurring 100 ms after stimulus onset, can increase up to 150 ms as stimulus frequency is lowered (Crottaz-Herbette and Ragot, 2000; Roberts et al., 2000; May et al., 1999). This slowing down of information processing at those frequencies where the speech F0 typically lies seems not only counterintuitive, but also inefficient from an evolutionary perspective as humans presumably used F0 for communication purposes before spoken language emerged (Liberman and Whalen, 2000; Rizzolatti and Arbib, 1998; Corballis, 1998). Further evidence for the functional significance of F0 can be drawn from studies which show that the human brain is capable of deducing F0 from the spectrum of the harmonics even if it is missing (Pantev et al., 1989; Ragot and Lepaul-Ercole, 1996). Obviously, there must be an ecologically valid reason for these calculations of F0 to occur, and an investigation into this issue might help explain human pitch perception and phenomena such as phonetic invariance. The need to control fully the stimulus parameters has, to date, apparently been the main justification for using pure tones in cognitive brain research. In some cases, the processing of speech has been approached by studying brain activity elicited by pure tone combinations (Patel and Balaban, 2001; Crottaz-Herbette and Ragot, 2000; Roberts et al., 2000; Ragot and LepaulErcole, 1996), while in others, speech sounds of multiple simultaneously varying parameters have been utilized (Schwartz and Tallal, 1980; Belin et al., 2000; Zatorre et al., 1992; Vihla et al., 2000). Recent advances in sound generation methodology (Alku et al., 1999), however, now enable us to combine the use of ecologically valid speech stimuli with full stimulus control. Using this methodology we recently succeeded in showing that already the presence of inherent periodicity of voiced speech is reflected in cortical brain responses (Alku et al., 2001). In the present study, we set out to investigate how fundamental frequency changes in speech are manifested in MEG recordings and, especially, in the N1m response. More specifically, we studied the cortical brain events elicited by speech sounds whose F0 ranged from that of a typical male speaker (i.e., 100 Hz) to that of a typical female (200 Hz) and infant (270 Hz) speaker. Using these three F0 values, the subjects (N ⫽ 13) were presented with a vowel, a frequencymatched counterpart of the vowel with no phonetic content (“pseudo-vowel” lacking in articulatory F1–F2 contribution), and a pure tone. METHODS Subjects. Thirteen volunteers (laboratory personnel and paid students, mean age 32 years, 2 females) participated in the study with informed consent. All were right-handed and none had a history of hearing or neurological disorders. The study was approved by the ethical committee of the Helsinki University Central Hospital. Stimulus preparation and presentation. The stimuli of the study consisted of three types of sounds: vowels, pseudo-vowels, and pure tones (Fig. 1). In each category, three stimuli of different fundamental frequency (F0) values were constructed. In order to produce sounds that differ only in terms of F0, we used a novel stimulus generation methodology, semi-synthetic speech generation (Alku et al., 1999). This synthesizes a speech sound by first estimating the excitation waveform of voiced speech generated by the fluctuation of the vocal folds—the glottal pulseform— from a natural utterance. By using this natural excitation waveform as an input to an artificial vocal tract modeled by a digital all-pole filter it is then possible to synthesize highly natural sounding, yet completely controllable speech stimuli. The stimulus production was started by creating a Finnish /a/ vowel with its F0 representing typical male speech (F0 ⫽ 100 Hz). The lowest four resonances of the vocal tract were adjusted to create the following formant pattern: F1 ⫽ 800 Hz, F2 ⫽ 1120 Hz, F3 ⫽ 1440 Hz, and F4 ⫽ 3100 Hz. By shortening the lengths of the consecutive cycles in the glottal excitation of this sound, it was possible to produce representatives of the same vowel by changing F0 only. With this principle, we synthesized two versions of the vowel /a/ corresponding to female and child speech with F0 values of 200 and 270 Hz, respectively. The second type of stimulus, the pseudo-vowel, was created as a composite of sinusoidals by mimicking the spectra of the three /a/ vowels of different F0 values. In making these signals, each tone was selected to match precisely (both in frequency and in amplitude) the harmonic in the spectrum of the corresponding vowel. The highest harmonic to be matched by a tone was limited to occur halfway between the F1 and F2 of the corre- 1302 MÄKELÄ ET AL. FIG. 1. Spectra on the three stimulus types shown with the fundamental frequency value F0 ⫽ 270 Hz. The spectrum of the pure tone (top) consists of only a single frequency component located at F0. The spectrum of the pseudo-vowel (middle) comprises the F0 and its integer multiples, the harmonics, up to the midway between the first and second formant of the vowel spectrum (bottom). The level difference of consecutive harmonics in the spectrum of the pseudovowel equals that in the vowel spectrum. (The bandwidth of the sounds was 11025 Hz, but the spectra are depicted over the frequency span of 5 kHz in order to clarify differences in the structure of lowest harmonics.) sponding vowel /a/ in order not to make the perceptual quality of the sound too close to that of the vowel /a/. Finally, the third stimulus type, the pure tone, was produced by using single frequencies corresponding to the F0 values of 100, 200, and 270 Hz. The stimuli were presented in separate sequences, at a constant interstimulus interval of 800 ms, with ⬎150 repetitions per stimulus type. The sequences were presented in pseudorandom order and the presentation order was counterbalanced across subjects. The stimulus duration was 200 ms and stimulus intensity was adjusted to 75 dB SPL. MEG recording and data analysis. Event-related magnetic fields were recorded using a whole-head magnetometer (Neuromag ⫺122, Neuromag Ltd., Helsinki, Finland) while the subject, instructed not to pay attention to the auditory stimulation, sat in a reclining chair and watched a silent movie. The stimuli were binaurally delivered through plastic tubes and ear pieces. The evoked responses were filtered with a passband of 1–30 Hz and averaged over a period of 600 ms, includ- ing a 100-ms prestimulus baseline. Electrodes monitoring both horizontal and vertical eye movements were used in removing artifacts defined as the absolute value exceeding 150 V. The N1m response was studied with respect to possible effects in amplitude, latency, and location (data of two subjects was rejected from further analyses due to poor signal-to-noise ratio). The analyses were carried out both for response waveforms and for estimated equivalent current dipoles (ECDs). The waveform analysis was carried out for the vector sum of the channel pair showing the most prominent activity at 100 ms, separately in each hemisphere and for each subject. For ECD estimation, a subset of 34 sensors over either the left or the right temporal brain areas were used in an unconstrained ECD fitting, with a goodness-of-fit ⱖ70% qualifying the data for further analyses. The ECD locations were estimated in a threedimensional coordinate system defined by the x-axis passing through the preauricular points (positive to the right), the y-axis passing through the nasion, and the z-axis as the vector cross-product of the x and y unit vectors. Statistical analyses of the data were performed using repeated measures ANOVA and Newman–Keuls post hoc tests. Finally, in order to illustrate the spatial extent of the activated cortical areas at the moment of the peak amplitude of the N1m response we used the minimum current estimate method (MCE; Uutela et al., 1999). RESULTS Figure 2 shows the spatial extent of cortical activation of a typical subject, illustrated with MCE, at the moment of the N1m peak amplitude for the three stimuli and the three fundamental frequencies used. It can be seen that all nine stimuli activated the same brain areas in the auditory cortex of each hemisphere. The ECD analysis further revealed that on the average, the speech stimuli elicited somewhat stronger activity (ECD strength 26 nAm) than the pure tones (ECD strength 21 nAm); however, the difference was not statistically significant (F(8, 2) ⫽ 2.42, P ⫽ ns). Neither hemispheric lateralization (F(10, 2) ⫽ 0.140, P ⫽ ns) nor location effects of stimulus type (F(10, 2) ⫽ 2.58, 0.31, 1.10 for x, y, and z coordinates, respectively; P ⫽ ns for all cases) were observable at this stage of auditory processing. Figure 3 summarizes the key finding of our study: the latency of the N1m exhibited highly systematic and statistically significant behavior as a function of stimulus type (F(2, 16) ⫽ 40.64, P ⬍ 0.001) and frequency (F(2, 16) ⫽ 10.76, P ⬍ 0.01), but no hemispheric distribution asymmetry (F(1, 8) ⫽ 0.67, P ⫽ ns). The latency of the N1m elicited by the vowel and the pseudo-vowel remained constant for all F0 values (P ⫽ ns), with the N1m always peaking at around 120 ms 1303 SPEECH F0 AND CORTICAL DYNAMICS FIG. 2. The spatial extent of cortical activation at the moment of the N1m peak amplitude. The vowel, pseudo-vowel, and tone stimuli elicited strong and localized activity in the supratemporal brain areas, as revealed by minimum current estimate modeling (MCE; Uutela et al., 1999; baseline, ⫺100 – 0 ms, data detrended and 40-Hz low-pass filtered, the center values of region of interest (ROI) for the x, y, and z coordinates are ⫺57, 9, and 51 mm in the left and 53, 7, and 51 mm in the right hemisphere). Cortical activations at the peak of the N1m response (t ⫽ latency of the response in milliseconds) for F0 values representing infant (270 Hz), female (200 Hz), and male (100 Hz) speech and pure tones of corresponding frequencies are depicted. after stimulus onset. There were no latency differences between the responses evoked by the pseudo-vowels and the vowels with the same F0 value (P ⫽ ns). This invariance was in striking contrast with the latency variation of the N1m responses elicited by the pure tones. In this case, the latency increased from 125 to 160 ms as a function of decreasing F0 (P ⬍ 0.001 for 270 vs 100 Hz and 200 vs 100 Hz and P ⬍ 0.05 for 270 vs 200 Hz). The latencies of the N1m responses evoked by the pure tones differed significantly from those evoked by the vowels and pseudo-vowels of the corresponding F0 values (P ⬍ 0.05 for vowel vs tone at 200 Hz; P ⬍ 0.001 for vowel vs tone and pseudo-vowel vs tone at 100 Hz). DISCUSSION FIG. 3. Latency behavior of the auditory N1m. An analysis of the temporal dynamics of the N1m revealed, both at the single-subject level and at across-subject averages, that the human brain times its response to auditory stimuli according to their spectral structure. Changes in the frequency of a pure tone result in a large latency variation of the N1m response, whose latency is shifted from 125 to 160 ms as the tone frequency is decreased from 270 to 100 Hz. When natural speech sounds matching the pure tone frequency in their F0 are presented, this latency variation is abolished, and both the left and the right hemispheres respond to the vowels and the pseudovowels with a constant latency of 120 ms (bars indicate the standard error of the mean). In conclusion, our results show that natural speech and artificial pure tone stimuli are processed differently already in the auditory sensory areas of the human brain. Whereas pure tones activate the auditory cortex with a latency which strongly depends on tone frequency (Crottaz-Herbette and Ragot, 2000; Roberts et al., 2000; May et al., 1999; Ragot and Lepaul-Ercole, 1996), natural speech sounds (with or without phonetic content) activate the auditory cortex at a constant latency of about 120 ms. Importantly, the timing of this activation is independent of the value of F0, that is, of whether the speech sound is that of a typical male, female, or infant speaker. Thus, the categorization of sounds as speech or nonspeech takes place very rapidly in dedicated neuronal circuits of both the left and the right temporal lobes. As these results were obtained in a passive listening condition, it seems that this categorization process occurs even without attentional engagement. Moreover, it can be directly observed through the behavior of the N1m response, without possibly distorting subtraction procedures (Vihla et al., 2000; Tiitinen et al., 1994; Näätänen et al., 1997). Furthermore, with regard to pho- 1304 MÄKELÄ ET AL. netic invariance, the N1m latency behavior indicates that the large variance due to changes in F0 is effectively canceled out, or filtered, via a constant-latency brain process related to speech categorization. As both the right and the left hemispheric responses were of equal amplitude, our results corroborate our previous observations (Alku et al., 2001) and suggest that sensory analyses of constant-frequency sounds in the unattended condition might not make use of the hemisphere-specific processes involved in analyzing more complex speech stimuli in various experimental settings (Zatorre et al., 1992; 2002; Zatorre and Belin, 2001; Belin et al., 2000; Roberts et al., 2000; Shtyrov et al., 2000; Gandour et al., 1998; Poeppel et al., 1996, 1997; Schwartz and Tallal, 1980). Furthermore, the present observations suggest that within each hemisphere, sensory analysis of pure tones and speech sounds takes place in the same cortical area. Whether this area is the primary auditory cortex (the medial portion of the Heschl’s gyri; Hari et al., 1980; Alho, 1995; Shreiner et al., 2000; Crottaz-Herbette and Ragot, 2000) or the secondary auditory areas in the immediate vicinity of the primary area (the anterolateral portions of the Heschl’s gyri and planum temporale; Johnsrude et al., 2000; Binder et al., 2001; Jäncke et al., 2002) remains to be established, probably through the combination of structural imaging, hemodynamic, and electromagnetic brain research tools (MRI, fMRI, and MEG/EEG). While pure tones have proven to be useful in revealing certain basic properties of the auditory cortex, such as tonotopic (Romani et al., 1982) and amplitopic (Pantev et al., 1989) organization, the approach used in previous research might not be suitable for helping us to understand the speech-related skills the human brain has developed over the evolutionary time scale. For example, with regard to the demonstrations of N1m latency variation for low-frequency pure tones (Crottaz-Herbette and Ragot, 2000; Roberts et al., 2000; May et al., 1999; Ragot and Lepaul-Ercole, 1996), the human brain, accustomed to dealing with speechspecific F0 and F1–F2 parameter spaces, might consider “simple” sinusoids so unusual and difficult that processing latency delays up to 50 ms (Roberts and Poeppel, 1996) have been observed. The two speech sounds of the present study were, nevertheless, treated similarly by auditory cortex. This can be attributed to the presence of the F0 and its harmonics in both sounds. Conversely, the different treatment given to speech sounds and pure tones can be explained by their spectral discrepancies: As the spectrum of high-pitched speech is characterized by a sparse harmonic structure, a major portion of the sound energy is located at the fundamental while the upper harmonics are attenuated (due to the large distance between integer multiples of F0). Consequently, there is a large spectral similarity between high- pitched speech sounds and pure tones. However, in the case of voices of moderate or low pitch, there is a large spectral discrepancy between a speech sound and a pure tone of an equal F0. A low-pitched speech sound, in addition to containing a strong frequency component at F0, also comprises the harmonics whose amplitudes are close to the amplitude of F0 due to the short frequency distance between the harmonics. To summarize, variations in F0 can be used to demonstrate that speech and nonspeech sounds are distinguished from one another at a very early stage of cortical processing. Furthermore, as this speech-specific cortical processing can be attributed to the acoustic properties of natural speech, determined by the anatomy and functioning of the human sound-production apparatus, it becomes obvious that speech production and perception are inseparably linked. These properties and their conjunctions, peculiar to the species-specific ability of humans to produce and understand spoken language, should be taken into careful consideration in future speech perception studies. ACKNOWLEDGMENTS This research was supported by the Academy of Finland, the University of Helsinki, and the Jenny and Antti Wihuri Foundation. We thank Mr. C. Bailey, Mr. S. Monto, and Prof. G. Nyman for their help in the preparation of the manuscript. REFERENCES Alho, K. 1995. Cerebral Generators of Mismatch Negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes. Ear Hearing 16: 38 –51. Alku, P., Sivonen, P., Palomäki, K., and Tiitinen, H. 2001. The periodic structure of vowel sounds is reflected in human electromagnetic brain responses. Neurosci. Lett. 298: 25–28. Alku, P., Tiitinen, H., and Näätänen, R. 1999. A method for generating natural-sounding stimuli for cognitive brain research. Clin. Neurophys. 110: 1329 –1333. Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., and Pike, B. 2000. Voice-selective areas in human auditory cortex. Nature 403: 309 – 312. Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., Kaufman, J. N., and Possing, E. T. 2000. Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10: 512–528. Bolinger, D. 1989. Intonation and Its Uses: Melody in Grammar and Discourse, Stanford Univ. Press, Stanford. Corballis, M. C. 1998. Cerebral asymmetry: Motoring on. Trends Cogn. Sci. 2: 152–157. Crottaz-Herbette, S., and Ragot, R. 2000. Perception of complex sounds: N1 latency codes pitch and topography codes spectra. Clin. Neurophys. 111: 1759 –1766. Dehaene-Lambertz, G. 1997. Electrophysiological correlates of categorical phoneme perception in adults. NeuroReport 8: 919 –924. Dehaene-Lambertz, G., and Dehaene, S. 1994. Speed and cerebral correlates of syllable discrimination in infants. Nature 370: 292– 295. Eimas, P. D., Miller, J. L., and Jusczyk, P. W. 1987. In Categorical Perception (S. Harnad, Ed.), pp. 161–188. Cambridge Univ. Press, New York. SPEECH F0 AND CORTICAL DYNAMICS Fant, G. 1970. Acoustic Theory of Speech Production. Mouton, The Hague. Gandour, J., Wong, D., and Hutchins, G. 1998. Pitch processing in the human brain is influenced by language experience. NeuroReport 9: 2115–2119. Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., and Lounasmaa, O. V. 1993. Magnetoencephalography: Theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys. 65: 413– 497. Hari, R., Aittoniemi, K., Järvinen, M.-L., Katila, T., and Varpula, T. 1980. Auditory evoked transient and sustained magnetic fields of the human brain. Exp. Brain. Res. 40: 237–240. Jäncke, L., Wüstenberg, T., Scheich, H., and Heinze, H.-J. 2002. Phonetic perception and the temporal cortex. NeuroImage 15: 733–746. Johnsrude, I. S., Penhune, V. B., and Zatorre, R. J. 2000. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123: 155–163. Katz, W. F., and Assmann, P. F. 2001. Identification of children’s and adults’ vowels: Intrinsic fundamental frequency, fundamental frequency dynamics, and presence of voicing. J. Phonetics 29: 23–51. Kent, R. D., and Read, C. 1992. The Acoustic Analysis of Speech, Singular, San Diego. Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., and Lindblom, B. 1992. Linguistic experience alters phonetic perception in infants by 6 months of age. Science 255: 606 – 608. Liberman, A. M., and Mattingly, I. G. 1989. A specialization for speech perception. Science 243: 489 – 494. Liberman, A. M., and Whalen, D. H. 2000. On the relation of speech to language. Trends Cogn. Sci. 4: 187–196. May, P., Tiitinen, H., Ilmoniemi, R. J., Nyman, G., Taylor, J. G., and Näätänen, R. 1999. Frequency change detection in human auditory cortex. J. Comp. Neurosci. 6: 99 –120. Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R. J., Luuk, A., Allik, J., Sinkkonen, J., and Alho, K. 1997. Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature 385: 432– 434. Pantev, C., Hoke, M., Lehnertz, K., and Lütkenhöner, B. 1989. Neuromagnetic evidence of an amplitopic organization of the human auditory cortex. Electroenceph. Clin. Neurophys. 72: 225–31. Pantev, C., Hoke, M., Lütkenhöner, B., and Lehnertz, K. 1989. Tonotopic organization of the auditory cortex: Pitch versus frequency representation. Science 246: 486 – 488. Patel, A. D., and Balaban, E. 2001. Human pitch perception is reflected in the timing of stimulus-related cortical activity. Nat. Neurosci. 4: 839 – 844. Poeppel, D., Phillips, C., Yellin, E., Rowley, H. A., Roberts, T. P. L., and Marantz, A. 1997. Processing of vowels in supratemporal auditory cortex. Neurosci. Lett. 221: 145–148. 1305 Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P. L., Rowley, H. A., Wexler, K., and Marantz, A. 1996. Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Cogn. Brain Res. 4: 231–242. Ragot, R., and Lepaul-Ercole, R. 1996. Brain potentials as objective indexes of auditory pitch extraction from harmonics. NeuroReport 7: 905–909. Rizzolatti, G., and Arbib, M. A. 1998. Language within our grasp. TINS 21: 188 –194. Roberts, T. P. L., Ferrari, P., Stufflebeam, S. M., and Poeppel, D. 2000. Latency of the auditory evoked neuromagnetic field components: Stimulus dependence and insights toward perception. J. Clin. Neurophys. 17: 114 –129. Roberts, T. P. L., and Poeppel, D. 1996. Latency of auditory evoked M100 as a function of tone frequency. NeuroReport 7: 1138 –1140. Romani, G. L., Williamson, S. J., Kaufman, L., and Brenner, D. 1982. Characterization of the human auditory cortex by the neuromagnetic method. Exp. Brain Res. 47: 381–393. Schreiner, C. E., Read, H. L., and Sutter, M. 2000. Modular organization of frequency integration in primary auditory cortex. Annu. Rev. Neurosci. 23: 501–529. Schwartz, J., and Tallal, P. 1980. Rate of acoustic change may underlie hemispheric specialization for speech perception. Science 207: 1380 –1381. Shtyrov, Y., Kujala, T., Lyytinen, H., Ilmoniemi, R. J., and Näätänen, R. 2000. Auditory cortex evoked magnetic fields and lateralization of speech processing. NeuroReport 11: 2893–2896. Studdert-Kennedy, M., and Shankweiler, D. 1970. Hemispheric specialization for speech perception. J. Acoust. Soc. Am. 48: 579 –594. Tian, B., Reser, D., Durham, A., Kustov, A., and Rauschecker, J. P. 2001. Functional specialization in Rhesus monkey auditory cortex. Science 292: 290 –293. Tiitinen, H., May, P., Reinikainen, K., and Näätänen, R. 1994. Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature 372: 90 –92. Titze, I. 1994. Principles of Voice Production, Prentice Hall, NJ. Uutela, K., Hämäläinen, M., and Somersalo, E. 1999. Visualization of magnetoencephalographic data using minimum current estimates. NeuroImage 10: 173–180. Vihla, M., Lounasmaa, O. V., and Salmelin, R. 2000. Cortical processing of change detection: Dissociation between natural vowels and two-frequency tones. Proc. Natl. Acad. Sci. USA 97: 10590 – 10594. Zatorre, R. J., and Belin, P. 2001. Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11: 946 –953. Zatorre, R. J., Belin, P., and Penhune, V. B. 2002. Structure and function of auditory cortex: Music and speech. Trends Cogn. Sci. 6: 37– 46. Zatorre, R. J., Evans, A. C., Meyer, E., and Gjedde, A. 1992. Lateralization of phonetic and pitch discrimination in speech processing. Science 256: 846 – 849.