Introduction

This document outlines a functional magnetic resonance imaging (fMRI) experiment designed to examine specific aspects of the auditory representation of speech sounds. The experiment concerns the neural response to a sudden disruption of the auditory feedback loop, elicited by an unexpected real-time acoustic shift of the subject's speech.

Background

When infants first learn to speak, they are learning a mapping between speech gestures and the sounds those gestures allow them to produce. Their own voices act as auditory feedback, enabling precise tuning, over time, of this motor-acoustic mapping. Adults with years of speech practice have well-tuned neural mappings. When there is a mismatch between the expected and observed acoustic consequences of an articulatory gesture, feedback control allows detection and then correction of the error. By influencing subjects' perception of their own speech, we can induce such a discrepancy and elicit compensatory movements that counteract the perceived error. Introducing gradual shifts in formant structure [1] or pitch [2] causes subjects to adapt gradually to the perturbation, producing speech with an opposing shift. In addition, recent experiments [3,4] have shown that subjects can compensate for rapidly introduced perturbations, both acoustic and motor. In this experiment, sudden auditory perturbations of subjects' speech will be introduced in an attempt to induce activity in the auditory error cells that detect this mismatch. Furthermore, we will attempt to distinguish between

Stimuli

Stimuli will consist of written text indicating the word the subject should speak aloud on each trial. The stimulus set will include one-syllable CVC words. On a randomly chosen fraction of trials, the subject's speech will be abruptly perturbed: the vowel's first formant (F1) will be shifted either up or down by approximately 200 Hz. We have chosen a perturbation rate of 20%.
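To make the perturbation concrete, the sketch below illustrates what a ±200 Hz F1 shift means acoustically using a simple cascade of two-pole formant resonators. The formant frequencies, bandwidths, and sampling rate are illustrative assumptions for an /ε/-like vowel, not the parameters of the actual real-time perturbation system:

```python
import numpy as np

FS = 10_000   # sampling rate in Hz (illustrative assumption)
NFFT = 8192   # frequency-grid resolution for peak picking

def formant_denominator(freqs, bws, fs=FS):
    """Denominator A(z) of a cascade of two-pole formant resonators.

    A formant at frequency f with bandwidth b contributes a conjugate
    pole pair at radius r = exp(-pi*b/fs) and angle 2*pi*f/fs.
    """
    a = np.array([1.0])
    for f, b in zip(freqs, bws):
        r = np.exp(-np.pi * b / fs)
        theta = 2.0 * np.pi * f / fs
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

def peak_in_band(a, lo_hz, hi_hz, fs=FS, nfft=NFFT):
    """Frequency (Hz) of the strongest resonance of 1/A(z) within a band."""
    ks = np.arange(int(lo_hz * nfft / fs), int(hi_hz * nfft / fs))
    grid = np.exp(-1j * np.outer(2.0 * np.pi * ks / nfft, np.arange(len(a))))
    mag = 1.0 / np.abs(grid @ a)   # magnitude response of the all-pole filter
    return ks[np.argmax(mag)] * fs / nfft

# Assumed formants for an /ε/-like vowel: F1=550, F2=1800, F3=2500 Hz
FORMANTS = [550.0, 1800.0, 2500.0]
BANDWIDTHS = [80.0, 120.0, 160.0]

a_normal = formant_denominator(FORMANTS, BANDWIDTHS)
a_up = formant_denominator([FORMANTS[0] + 200.0] + FORMANTS[1:], BANDWIDTHS)

if __name__ == "__main__":
    print("F1 peak, normal vowel (Hz):", round(peak_in_band(a_normal, 200, 1200)))
    print("F1 peak, up-perturbed (Hz):", round(peak_in_band(a_up, 200, 1200)))
```

Raising the F1 pole by 200 Hz moves the lowest spectral peak from roughly 550 Hz to roughly 750 Hz, which is the acoustic distance that carries a token across a vowel-category boundary for some subjects.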
This rate is slightly larger than the 15% guideline used in other standard/deviant experiments, but because the direction of the shift is unpredictable (10% in each direction at random), a larger proportion of deviants maximizes usable data while staying within the time constraints of an fMRI experiment.

The direction and degree of perturbation depend on the asymmetry of the subject's phonetic boundaries. A subject whose normal production of "bet" is acoustically closer to their æ/ε boundary than their I/ε boundary will experience the percept shown in gray in Table 1: an upward shift will be heard as "bat," whereas a downward shift of the same acoustic magnitude will be heard as a poor exemplar of "bet." However, a subject with different phonetic boundaries, or a different vowel pronunciation, could experience the reverse percept, shown in white. Subjects will be screened prior to scanning to establish their phonetic boundaries, thus determining which directions and degrees of shift are appropriate.

Table 1. Possible stimulus percepts per condition (* marks a shifted token heard as a poor exemplar of the intended word; "gray" and "white" denote the two boundary configurations described above)

                "bet" (gray)    "bet" (white)    "putt" (gray)    "putt" (white)
    Up-pert     bat [bæt]       bet *[bεt]       pot [pɑt]        putt *[pʌt]
    Down-pert   bet *[bεt]      bit [bIt]        putt *[pʌt]      put [pƱt]

Table 2. Number of stimulus presentations per condition

                Unperturbed    Up-pert    Down-pert    Total
    [bVt]            64            8            8         80
    [pVt]            64            8            8         80
    Total           128           16           16        160

Each stimulus will be presented visually for approximately 2 seconds, and eighty (80) repetitions of each word will be presented. Following word presentation, the subject will utter the word shown on the visual display; the 2-second interval after stimulus onset gives the subject time to speak the one-syllable word. The perturbation is introduced during this execution phase: 10% of trials will be accompanied by a downward shift and 10% by an upward shift, while the remaining 80% of trials will remain unperturbed.
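The per-condition counts in Table 2 can be turned into a randomized presentation schedule. The following is a minimal sketch assuming simple uniform shuffling; the actual presentation software may impose additional constraints (e.g., limiting runs of consecutive deviants):

```python
import random

WORDS = ["bet", "putt"]
# Per-word trial counts from Table 2: 64 unperturbed, 8 up-pert, 8 down-pert
COUNTS = {"none": 64, "up": 8, "down": 8}

def build_schedule(seed=None):
    """Return a shuffled list of (word, perturbation) trials, 160 in total."""
    rng = random.Random(seed)
    trials = [(word, pert)
              for word in WORDS
              for pert, n in COUNTS.items()
              for _ in range(n)]
    rng.shuffle(trials)
    return trials

schedule = build_schedule(seed=1)
assert len(schedule) == 160
assert sum(pert != "none" for _, pert in schedule) == 32   # 20% deviants
```

Seeding the generator makes a schedule reproducible per subject while keeping the perturbation direction unpredictable from the subject's point of view.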
After the 2-second delay, the stimulus presentation software will trigger the scanner to collect two volumes of functional data. There will then be a pause of 10 seconds before the next trial to allow the BOLD signal to return to steady state.

Timeline for a single trial. The inter-stimulus interval is approximately 15-18 seconds.

Session timing notes:
- 160 trials x ~15-18 seconds = ~40-48 minutes total experiment length (breaks still to be added)
- 5 runs of 32 trials x ~18 seconds/trial = ~9.6 minutes per run (roughly 26 unperturbed, 3 up-pert, and 3 down-pert trials per run)
- 160 trials x 20% perturbed stimuli = 32 deviant trials across (at most) 4 perturbed conditions

Experimental Protocol

The experiment is event-related, using the triggering mechanism employed in past studies. Because image acquisition is timed to occur several seconds after stimulus onset, subjects speak in relative silence. The acquisition parameters will be typical of those used in previous experiments: echo planar imaging; 30 slices covering the entire cortex and cerebellum, aligned to the AC-PC line; 5 mm slice thickness; 0 mm gap between slices; flip angle = 90°.

Subjects

Subjects will consist of right-handed men and women, ages 18-55, whose first language is American English.

References

[1] Houde JF and Jordan MI (1998). Sensorimotor adaptation in speech production. Science 279:1213-16.
[2] Jones JA and Munhall KG (2000). Perceptual calibration of F0 production: Evidence from feedback perturbation. J. Acoust. Soc. Am. 108(3):1246-51.
[3] Xu Y et al. (2004). Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences. J. Acoust. Soc. Am. 116(2):1168-78.
[4] Tourville JA et al. (?). Effects of acoustic and articulatory perturbation on cortical activity during speech production. (Poster; APE/PPB study.)