Journal of Speech and Hearing Research, Volume 26, 268-282, June 1983

PERCEPTUAL ORGANIZATION OF SPEECH SOUNDS BY INFANTS

JAMES HILLENBRAND
Northwestern University, Evanston, Illinois

An operant head-turn procedure was used to test whether 6-month-old infants recognize the auditory similarity of speech sounds sharing a value on a phonetic-feature dimension. One group of infants was reinforced for head turns when a change occurred from a series of repeating background stimuli containing nasal consonants ([m, n, ŋ]) to repetitions from a category of syllables containing voiced stop consonants ([b, d, g]), or to a change from stops to nasals. The stimuli were naturally produced by both male and female talkers. The performance of infants in this "phonetic" group was compared to that of infants in a "nonphonetic" control group. Using the same procedures, these infants were reinforced for head turns to a group of phonetically unrelated speech sounds. Results indicated that the performance of infants in the group trained on phonetically related speech sounds was far superior to that of infants in the nonphonetic control group. These findings suggest that prelinguistic infants can perceptually organize speech sounds on the basis of auditory properties related to feature similarity.

A major focus of speech-perception research over the past several decades has been an attempt to define phonetic categories in terms of acoustic properties--for example, to specify the acoustic attributes that define or "cue" the segment [g], or the feature [velar], in all the contexts in which it occurs. Much of the literature in this area has suggested that the critical cues to phonetic categories are often highly variable with changes in "context."
The physical cues to speech-sound categories have been found to vary with changes in noncritical dimensions such as the phonetic environment in which the segment appears, the position that the segment occupies within the syllable, and the talker who produces the utterance. These results, combined with a variety of other findings, have led some investigators to theorize that the cues to phonetic categories are not derived from the physical signal in a direct way. Specifically, the suggestion has been made that the perception of speech is mediated in some way by knowledge of how speech is produced. According to this view, the speech waveform is assumed to be interpreted in terms of the articulatory gestures that were used to produce the signal (Liberman, 1970; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Stevens & House, 1972).

Other investigators have argued that attempts to relate phonetic categories to the acoustic signal have failed to account seriously for the psychophysical processes involved in the coding of complex auditory signals. According to this point of view, invariant acoustic cues to phonetic categories can, in fact, be derived from the physical signal without appealing to articulatory knowledge (Fant, 1967; Kuhl, 1979a; Miller, Engebretson, Spenner, & Cox, 1977; Searle, Jacobson, & Rayment, 1979; Stevens & Blumstein, 1978).

Speech-perception research with infants can provide specific kinds of evidence on the contention that articulatory knowledge is a necessary condition for the categorization of speech sounds. The reasoning is relatively simple: Since prelinguistic infants are not assumed to possess sophisticated knowledge about the production of speech, demonstrations of phonetic categorization by infants will indicate the limits of the type of articulatory knowledge likely to be involved in this process.

© 1983, American Speech-Language-Hearing Association
0022-4685/83/2602-0268$01.00/0

In a recent series of experiments, Kuhl and her associates (Kuhl & Miller, 1982; Kuhl, 1977; 1979b; Holmberg, Morgan, & Kuhl, 1977; Kuhl & Hillenbrand, Note 1) attempted to determine the extent to which young infants recognize similarities among speech sounds when variations are introduced in noncritical dimensions. For example, an experiment by Kuhl (1979b) demonstrated that 6-month-old infants could detect a change from one category of vowels to another when the tokens varied randomly in talker and pitch contour. Infants in this experiment were initially trained to make a head turn for a visual reward when a change occurred from repetitions of a single token of [a], synthesized to simulate a male voice with a falling pitch contour, to repetitions of a single token of [i], produced by the same male "talker" with the same pitch contour. The infants were then gradually exposed to a number of novel tokens synthesized to simulate female and child talkers with either falling or rising pitch contours. The results showed that infants readily transferred learning from the tokens produced by the male talker to the novel tokens produced by female and child talkers. Similar experiments have tested the perception of an [a]-[ɔ] contrast across variations in talker and pitch contour (Kuhl, 1977), fricative contrasts across variations in vowel context and talker (Holmberg et al., 1977), a nasal-consonant place contrast across variations in vowel context and talker (Hillenbrand, 1980, Note 2), and, using a different version of the operant head-turn procedure, a stop-consonant place contrast across variations in vowel context (Fodor, Garrett, & Brill, 1975).

To date, infant research on phonetic categories has focused exclusively on the infant's ability to recognize phonetic similarity at the level of the phone, or phonetic segment.
The purpose of the present study was to extend these findings and to test infants on their ability to organize speech sounds at the more abstract level of the phonetic feature. The feature contrast was a stop/nasal distinction: [b, d, g] versus [m, n, ŋ]. This contrast seemed like a logical starting point for testing feature perception in infancy for two reasons. First, a good deal is known about the physical correlates of this distinction. During the occlusion portion of nasal consonants, a nasal murmur is produced that is characterized by (a) a low-frequency first resonance at 200-300 Hz, well separated from higher formants; (b) relatively high damping factors (large formant bandwidths and low formant levels); and (c) an antiformant that varies in frequency with place of articulation (Fant, 1960; Fujimura, 1962). Voiced stop consonants, on the other hand, (a) do not show a nasal murmur (although a low-frequency "voice bar" may be present during the occlusion), (b) are characterized by aperiodic release bursts, and (c) typically show more rapid changes in amplitude following release than nasal consonants (Fant, 1960).

A second reason for studying the stop/nasal contrast is that information is available on infants' discrimination of stop and nasal consonants. Evidence is available to show that infants can discriminate individual pairs of speech sounds differing in stop-consonant place of articulation (Eimas, 1974; Morse, 1972), nasal-consonant place of articulation (Hillenbrand, Note 2), and a stop-nasal manner-class contrast (Eimas & Miller, 1980). The present study examined the ability of infants to categorize speech sounds according to the stop-nasal distinction. In other words, the study was designed to determine whether infants recognize that the stops [b, d, g] are similar to one another and distinct from a class consisting of the nasals [m, n, ŋ].
METHODS

The general approach of the study was similar to the transfer-of-learning experiments by Kuhl and her colleagues (Kuhl, 1977; 1979b; Holmberg et al., 1977; Kuhl & Hillenbrand, Note 1). One group of 6-month-old infants was visually reinforced for head-turn responses when a change occurred from a background category of syllables containing nasal consonants ([m, n, ŋ]) to a comparison category of syllables containing voiced stop consonants ([b, d, g]), or to a change from stops to nasals. The speech sounds were produced by both male and female talkers. The performance of infants in this "phonetic" group was compared to the performance of a separate group of infants run in a procedurally identical "nonphonetic" condition. These infants were tested using the same pool of stimuli used in the phonetic condition, but the stimuli were assigned to reinforced and unreinforced categories in such a way that the categories could not be organized according to phonetic attributes or talker.

The procedure, which is described in detail below, used a visual reward to train an infant to make a head-turn response when a change occurred from a class of repeating background stimuli to repetitions from a comparison category. The experimental stages for the phonetic condition are shown in Table 1.

TABLE 1. Experimental stages for the phonetic condition.

Stage                      Category 1                                       Category 2
1: Initial training        ba (M)                                           ma (M)
2: Place variation         ba (M) da (M)                                    ma (M) na (M)
3: Talker x Place          ba (M) da (M) ba (F) da (F)                      ma (M) na (M) ma (F) na (F)
4: Transfer of learning    ba (M) da (M) ga (M) ba (F) da (F) ga (F)        ma (M) na (M) ŋa (M) ma (F) na (F) ŋa (F)

The first stage contrasted a single token of [ma] with a single token of [ba]. Both syllables were naturally produced by the same male voice. In the second stage, postdental consonants were added to each class; that is, [ma] and [na] were contrasted with [ba] and [da].
In the third stage, labial and postdental consonants produced by a female voice were added to each category. In the fourth and final stage, velar consonants were added to each class, resulting in a contrast between male and female [m, n, ŋ] and male and female [b, d, g]. Half of the infants were trained with the stop consonants as the comparison category, and half were trained with the nasal consonants as the comparison category. In the final stage of the experiment, the infant's task was to make a head-turn response whenever a change occurred from a category of nasal consonants to a category of voiced stop consonants--or from stop consonants to nasal consonants--independent of random variation in place of articulation and talker.

If subjects in this task succeeded in responding to the stimuli in the comparison category, it would be tempting to conclude that the infants recognized the similarity of speech sounds sharing a phonetic-feature value. It is possible, however, that infants might simply memorize which tokens were reinforced and which ones were not. Memorizing tokens, of course, would not necessarily require a perceptual grouping of the stimuli. To test for this possibility, the performance of infants run in the phonetic task described above was compared to the performance of a separate group of infants run in a nonphonetic condition. In the nonphonetic condition categories were arranged in such a way that the six stimuli in each class could not be organized according to phonetic or acoustic characteristics. Subjects were tested using the same procedures and equipment, plus the same pool of 12 stimuli as in the phonetic condition. The experimental stages for the nonphonetic condition are shown in Table 2.

TABLE 2. Experimental stages for the nonphonetic condition.

Stage    Category 1                                       Category 2
1        ba (F)                                           na (M)
2        ba (F) ŋa (M)                                    na (M) ga (F)
3        ba (F) ŋa (M) da (F) ma (M)                      na (M) ga (F) ma (F) ba (M)
4        ba (F) ŋa (M) da (F) ma (M) ga (M) na (F)        na (M) ga (F) ma (F) ba (M) da (M) ŋa (F)

Subjects were initially trained on a relatively gross contrast between a male [na] and a female [ba]. The subsequent stages were analogous to those of the phonetic condition in terms of the number of tokens added in each stage. However, sounds were added in such a way that, by the final stage, it was not possible to organize the stimuli along any simple dimension: Each class included an equal number of stops and nasals, male voices and female voices, labials, postdentals, and velars. As in the phonetic condition, half of the subjects were trained with category 1 as the comparison class and the other half with category 2. It was reasoned that the only way an infant could succeed on this task was to memorize which individual stimuli were reinforced and which ones were not. If the performance of infants in the phonetic group proved to be superior to that of the nonphonetic group, the effect could be attributed to perceptual categorization of the speech sounds by infants in the phonetic group.

Stimuli

The stimuli were naturally produced tokens of [m, n, ŋ, b, d, g] in prevocalic position with the vowel [a]. One adult male and one adult female produced several tokens of each syllable. Audio recordings were made in a sound-treated booth with a cardioid microphone (Sennheiser MKH 415T-U) and a high-quality full-track recorder (Nagra 4.2). The talkers were instructed to produce all stimuli with approximately equal durations, intensities, and slightly falling pitch contours. A VU meter was used to monitor intensity. The recorded stimuli were digitized and stored in the disk memory of a digital computer (DEC PDP 11/10).
A sample rate of 20 kHz was used with a maximum amplitude resolution of eight bits within a ±4-V dynamic range. All signals were low-pass filtered at 8 kHz and conditioned with an autocorrelator noise-reduction device (Phase Linear 1000).

One token of each syllable produced by the two talkers was selected for use in the discrimination tests. The tokens were chosen by selecting those stimuli that showed the closest match on computer-derived measurements of fundamental frequency contour, intensity contour, and duration. In the final set of stimuli there were no systematic differences between the stop and nasal categories in fundamental frequency, overall RMS intensity, or duration. (Measurements of these stimuli are given in Table A of the Appendix.) Formal listening tests showed that all stimuli were identified reliably by a panel of five adult listeners.

Audiotapes for discrimination testing were prepared by recording stimuli from the two categories on separate channels of tape. At the output of the D/A converter, the stimuli were low-pass filtered at 8 kHz, conditioned with an autocorrelator noise-reduction device (Phase Linear 1000), and recorded with a constant 1.7-sec onset-to-onset interstimulus interval. The onsets of the stimuli on the two channels of each tape were synchronized using a cueing procedure described by Hillenbrand, Minifie, and Edwards (1979). Gain settings at the input to the tape deck (TEAC 3340-S) were adjusted so that the two stimuli contrasted in the initial-training stage were balanced for loudness.

Calibration

Signals were calibrated by a combination of sound-level measurements and a loudness-balance procedure. The gain setting at the output of the tape deck was adjusted so that the peak intensity of one syllable in the initial-training pair measured 65 dBA, using the fast-response setting of a sound-level meter (Bruel & Kjaer, Model 2209).
A loudness-balance procedure was used to adjust the output gain of the channel carrying the contrasting syllables. An experimenter used an electronic switch to alternate between the two channels. The output gain of the channel carrying the contrasting syllables was adjusted until one adult listener judged that the two signals were equally loud. These same gain settings were used for the experimental conditions involving multiple tokens of the two categories. The loudness balance was checked as part of the daily calibration procedure.

Procedures

1. General. A schematic of the experimental site is shown in Figure 1. The infant was held on the parent's lap facing an assistant. An experimenter in an adjacent room controlled the equipment and was able to observe the infant on a video monitor. A loudspeaker (ElectroVoice SP-12) was positioned at a 90° angle to the assistant. In front of the speaker was an electrically operated stuffed toy bear in a smoked plexiglass box. When activated, the box was illuminated and the bear tapped on a drum.

FIGURE 1. Experimental site for the visually reinforced head-turn procedure (from Kuhl, 1979b).
[Legend: E = Experimenter; A = Assistant; P = Parent; I = Infant; VR = Visual reinforcer; C = Camera; M = Video monitor.]

FIGURE 2. Trial structure for the phonetic condition. The figure shows stimuli being presented before, during, and after change and control trials. The subscripts refer to the individual stimuli in the background and comparison categories. The example shown here is for stage 3 of the phonetic condition in which the stop category was reinforced (after Kuhl, 1979b).
[Diagram labels: TRIAL STRUCTURE; PRE-, OBSERVATION INTERVAL, POST-; CHANGE TRIAL; CONTROL TRIAL; TIME.]

FIGURE 3. Trial structure for the final stage of testing (stage 4). The stimuli are presented in random order, but each stimulus in the order is repeated three times (see Kuhl, 1979b).
[Diagram labels: TRIAL STRUCTURE; PRE-, OBSERVATION INTERVAL, POST-; CHANGE TRIAL; CONTROL TRIAL; TIME.]

The experiment was run with a tape deck (TEAC 3340-S) and a logic device. Throughout the entire experiment, tape-recorded stimuli were continuously presented at onset-to-onset intervals of 1.7 sec. The assistant's task was to keep the infant's attention by manipulating silent toys. When the assistant judged the infant to be in a "ready state," that is, quiet and attending to the toys, he pressed a button signaling the experimenter to initiate a 5-sec observation interval. Two kinds of trials could occur during the interval: change trials or control trials. Figure 2 shows stimuli being presented before, during, and after change and control trials for the phonetic condition. During a change trial, a silent switch initiated a change in tape-recorder channels from the repeating background category to three presentations from the comparison category. A hand-held vibrotactile device signaled the start of a 5-sec observation interval to the assistant; a small light mounted on the monitor signaled the start of the interval to the experimenter. If both the experimenter and the assistant judged that a head turn occurred during the observation interval, they independently pressed buttons that activated the visual reinforcer for 3 sec. AND-gate circuitry ensured that the reinforcer would be activated only on change trials in which both judges voted during the 5-sec observation interval.
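The reinforcement rule just described amounts to a three-input logical AND. The following is a minimal Python sketch of that rule, offered only as an illustration: the original apparatus used hardware AND-gate circuitry, and the function name `reinforcer_activated` and its parameter names are hypothetical, not taken from the study.

```python
def reinforcer_activated(is_change_trial: bool,
                         experimenter_vote: bool,
                         assistant_vote: bool) -> bool:
    """Return True only when the visual reinforcer should switch on.

    Assumed reading of the text: the reinforcer is activated only on
    change trials, and only if BOTH judges pressed their buttons during
    the observation interval. All names here are illustrative.
    """
    return is_change_trial and experimenter_vote and assistant_vote


# Enumerate all eight input combinations of the three conditions.
truth_table = {
    (change, experimenter, assistant): reinforcer_activated(
        change, experimenter, assistant)
    for change in (False, True)
    for experimenter in (False, True)
    for assistant in (False, True)
}
```

Under these assumptions, exactly one of the eight input combinations activates the reinforcer, so a head turn on a control trial, or a vote from only one judge, is never rewarded.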
During a control interval, the infant continued to hear stimuli from the background category. On control trials, both the experimenter and the assistant made a judgment about the occurrence of a head turn, but reinforcement was not provided, regardless of the infant's response.

For the final stage of testing (stage 4), stimuli were presented using a special three-repetition trial structure described by Kuhl (1979b). As shown in Figure 3, the stimuli were presented in random order, but each stimulus in the order was repeated three times. Since a single token was presented on any given trial, this format made it possible to assign the infant's response to a particular stimulus. On both change and control trials the experimenter recorded the stimulus that was presented and the infant's response. For all stages of the experiment an infant's performance was measured by comparing the proportion of head turns on change trials to the proportion of head turns on control trials.

To reduce the possibility that the parent or assistant might cue the infant's response, and to control for bias in judging head turns, music was presented over earphones to both adults in the test room at a level sufficient to mask a change from one stimulus to another. The experimenter was able to hear the stimuli over an audio monitor in the control room and therefore could have been biased in his judgment of head turns. Experimenter bias in this task would be revealed by his failure to agree with the assistant, who was unbiased. Interjudge agreement for all trials was 98%, indicating that experimenter bias did not play a large role in the judgment of head turns. When the two judges did fail to agree, the trials were always scored as errors. As a further effort to reduce the possibility of bias, an electronic probability generator, set at 50%, was used to determine whether a given observation interval would be a change or control trial.
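The scoring rules above can be sketched in Python. This is an illustration under stated assumptions, not the study's actual scoring procedure: the `Trial` record, the function names, and the example trials are all hypothetical. In particular, the sketch assumes that "scored as an error" means a miss on a change trial and a false head turn on a control trial; the text does not spell out that mapping.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Trial:
    is_change: bool          # True = change trial, False = control trial
    experimenter_vote: bool  # experimenter judged that a head turn occurred
    assistant_vote: bool     # assistant judged that a head turn occurred


def scored_head_turn(trial: Trial) -> bool:
    """Head-turn score for one trial (assumed mapping, see lead-in)."""
    if trial.experimenter_vote == trial.assistant_vote:
        return trial.experimenter_vote
    # Judges disagree: score the trial as an error. On a change trial the
    # error is a miss (no turn); on a control trial it is a false turn.
    return not trial.is_change


def head_turn_proportions(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Return (proportion on change trials, proportion on control trials)."""
    trial_list: List[Trial] = list(trials)
    change = [scored_head_turn(t) for t in trial_list if t.is_change]
    control = [scored_head_turn(t) for t in trial_list if not t.is_change]
    if not change or not control:
        raise ValueError("need at least one change and one control trial")
    return sum(change) / len(change), sum(control) / len(control)


def interjudge_agreement(trials: Iterable[Trial]) -> float:
    """Proportion of trials on which the two judges gave the same vote."""
    trial_list = list(trials)
    if not trial_list:
        raise ValueError("need at least one trial")
    agree = sum(t.experimenter_vote == t.assistant_vote for t in trial_list)
    return agree / len(trial_list)


# Made-up trials for illustration only; these are not data from the study.
example_trials = [
    Trial(is_change=True, experimenter_vote=True, assistant_vote=True),
    Trial(is_change=True, experimenter_vote=True, assistant_vote=False),
    Trial(is_change=False, experimenter_vote=False, assistant_vote=False),
    Trial(is_change=False, experimenter_vote=False, assistant_vote=True),
]
change_rate, control_rate = head_turn_proportions(example_trials)
agreement = interjudge_agreement(example_trials)
```

Keeping the disagreement rule in one small function makes the assumption easy to swap out if a different reading of "scored as errors" is preferred.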
Since previous work with the head-turn procedure suggested that long strings of change and control trials increased the probability of infant errors, the experimenter was instructed to override the probability generator for a single trial after three consecutive change or control trials (see Kuhl, 1979b).

2. Conditioning the head-turn response. The head-turn response was conditioned by initiating a change trial and, after a few presentations of the comparison stimulus, activating the visual reinforcer. After a variable number of these trials, most infants began to make head turns that anticipated the activation of the visual reinforcer. To be included in the experiment, an infant was required to make three consecutive anticipatory head turns. Subjects were allowed a maximum of 25 trials to meet the conditioning criterion. Testing on the initial-training stage was not begun until the infant met the conditioning criterion. Experience with the head-turn procedure has shown that infants who meet the conditioning criterion very quickly will sometimes perform poorly on the initial-training stage. For that reason, all infants were given a minimum of 15 conditioning trials.

3. Progressing subjects through the experiment. An infant advanced from one stage of the experiment to the next when he/she met an accuracy criterion of 9 correct responses in 10 consecutive trials, half being change trials and half being control trials. If an infant did not meet this 9-out-of-10 criterion in 20 trials, he/she was automatically progressed to the next stage of the experiment. When an infant reached the final stage of the experiment, he/she was given as close to 75 trials as possible. A variety of problems prevented this in some cases, including scheduling difficulties, experimenter error, and infants who had become fussy after prolonged testing. The number of trials run on the final stage ranged from 63 to 75, with an average of 68.9 trials.

4. Retraining. It was often the case that infants at various stages of testing would show a marked drop in performance. In many cases the infant appeared to have forgotten the experimental contingencies or seemed to lose interest in the task. Infants were retrained by the presentation of conditioning trials--change trials in which the visual reinforcer was manually activated if the infant did not respond within about 4 sec of the stimulus change. Two rules controlled the presentation of these retraining trials:

1. A single retraining trial was presented after three consecutive misses on change trials.
2. If after the first 15 trials of a session an infant had missed more than half of the change trials, the next five trials were retraining trials.

Regardless of the stage of testing that the infant was in, these retraining trials used the pair of stimuli from the initial-training stage.

5. Testing sessions. A test session was terminated when either the experimenter or the assistant judged that the baby was becoming tired or fussy or at the end of 30 trials. Testing sessions lasted about 10-15 minutes, with an average of 20 trials per session. Infants were usually given all of the trials for a particular experimental stage within the same session. However, if a session was terminated before an infant completed testing on a given stage, testing on the next session would resume where the infant left off. Seven or eight sessions were generally required to complete the experiment.

Subjects

The subjects were normal 5½- to 6½-month-old infants selected by mail solicitation to parents in the Seattle area. A parent questionnaire was used to screen out infants who (a) had been treated for middle-ear problems, (b) had a family history of congenital hearing loss, or (c) were born more than 2 weeks premature or 2 weeks late. Subjects were assigned randomly to either the phonetic or the nonphonetic group. A total of 23 subjects began testing.
Subjects were run until eight infants completed testing in each group. To be included in the study, an infant had to pass the conditioning criterion of three consecutive anticipatory head-turn responses in the first 25 trials of testing. Six subjects failed to pass the conditioning criterion on the [ma]-[ba] contrast for the phonetic study. One additional subject in the phonetic group was eliminated due to an experimenter error, leaving seven subjects in this group instead of eight. The nonphonetic condition offered subjects a much grosser, multidimensional contrast; consequently, only one subject in the nonphonetic condition failed to pass conditioning in the allotted 25 trials.

RESULTS

The most interesting results of this study come from an analysis of the babies' responses on the final stage of each condition. These analyses are discussed first, followed by a description of the infants' performance on the preliminary stages.

Figure 4 displays the percentages of head turns on change trials and on control trials for infants in the phonetic and nonphonetic groups for the final stage of testing. The graph shows that more head turns were observed on change as opposed to control trials for both groups of infants. The trial-type effect, however, was much more pronounced for the phonetic group. Infants in the two groups responded about equally often on control trials, but the phonetic infants responded much more often on change trials than the nonphonetic infants. A two-way analysis of variance for trial type and group, with repeated measures on the trial-type variable, revealed significant main effects for both trial type (F = 17.4; df = 1, 13; p < .001) and group (F = 8.0; df = 1, 13; p < .01). There was also a significant group × trial-type interaction (F = 7.2; df = 1, 13; p < .05), indicating that the trial-type effect was significantly larger for the phonetic group.
Post hoc analysis showed that the trial-type effect was statistically reliable for both the phonetic group (F = 11.9; df = 1, 6; p < .01) and the nonphonetic group (F = 8.0; df = 1, 7; p < .05). These comparisons indicate that infants in both groups performed significantly above chance on the final stage of testing, but that infants in the phonetic group performed with greater accuracy than those in the nonphonetic group.

FIGURE 4. Percent head-turn responses on change and control trials for infants in the phonetic (n = 7) and nonphonetic (n = 8) groups. The data in this figure and in Figures 5-11 are from the final stage of testing (stage 4).
[Panels: PHONETIC GROUP; NON-PHONETIC GROUP. x-axis: trial type (Change, Control); y-axis: percent head-turn responses.]

FIGURE 5. Percent head-turn responses to each of the stimuli presented during change trials (shaded columns) and control trials (unshaded columns) for the phonetic subgroup in which the stop category was reinforced (n = 3). The figures in parentheses indicate the number of times each stimulus was presented. M = male voice; F = female voice.
[Panel title: SUMMARY: PHONETIC GROUP, STOPS REINFORCED. x-axis: stimulus; y-axis: percent head-turn responses. Change trials: bM (16), dM (32), gM (10), bF (17), dF (19), gF (20). Control trials: mM (16), nM (16), ŋM (13), mF (15), nF (17), ŋF (13).]

It was also of interest to determine specifically how the subjects distributed their responses among the individual sounds in the reinforced and unreinforced categories. Figure 5 presents these data for the three infants in the phonetic group who were trained to turn to the stop category.
The six shaded columns to the left show the percentage of head turns to each of the six stop consonants presented on change trials; the six unshaded columns to the right show the same data for the six nasal consonants presented during control intervals. The stimulus is given on the horizontal axis. Since the stimuli were arranged in random order on the audiotape, the experimenter had no control over what stimulus would be presented on a given trial. As a consequence, the number of presentations of the stimuli varied somewhat.

The most obvious feature of Figure 5 is that, as noted previously, many more head turns were observed on change trials as compared to control trials. More specifically, however, infants seemed to turn in roughly equal proportions in response to each of the six sounds in the two categories; that is, they did not show any prominent, consistent preference for a particular talker or place-of-articulation value. This was also true for the subgroup of four infants reinforced for head turns in response to the nasal consonants (see Figure 6). Again, the general picture is one of a relatively even distribution of responses among the stimuli. It is especially interesting that the infants did not show a preference for the stimulus used in the initial-training stage, shown at the extreme left of each graph. In fact, Figure 6 shows a slight tendency to avoid the training token, although this effect is not particularly prominent.
mM nM riM ml~ nF r}g bM dM gM bF dF gg (26) (30) (19) (33) (24)(18) (21) (28) (23) (23) (22) (23) Control Change STIMULUS FIGURE 6. Percent head-turn responses to each Of the stimuli presented during change trials and control trials for the phonetic subgroup in which the nasal category was reinforced (n = 4). A c l e a r e r picture o f these results can b e o b t a i n e d b y c o m b i n i n g the data for all seven infants in the p h o n e t i c group. This can b e d o n e b y contrasting reinforced versus u n r e i n f o r c e d stimuli and c o l l a p s i n g the data into b r o a d e r categories such as "labial, male," " d e n t a l , male," and so on. A graph c o m b i n i n g the data from all subjects in the p h o n e t i c group is shown in F i g u r e 7. T h e i m p r e s s i o n of an e v e n d i s t r i b u t i o n of r e s p o n d i n g to the stimuli is even 274 Journal of Speech and Hearing Research 26 268-282 100- SUMMARY: PHONETIC GROUP ALL SUBJECTS 100" 90- uJ GO z no 09 t,u ~r 80" o3 IJJ 70- zrr 60- 60- P" 50-~ 50- Lu 40, 40- 6< w. I.:,, re LU Q. ~: 30- I M Y : NON-PHONETIC GROUP TRAINING STIMULUS:bF 90. o3 z o o. June 1983 80. 70 30- 204 10' 10" o LM DM VM LF DF VF LM DM VM LF DF VF (42) (62) (30) (50) (43) (36) (37) (44) (36) (38) (39) (36) Change bF rim f e l i e ~ rle gF mF bM dM qF (27) (17) (21) (27) (21) ( ~ ) (22) (25) (15) (25) (42) (10) Control STIMULUS FIGURE 7. Percent head-turn responses to each of the stimuli presented during change trials and control trials for all subjects in the phonetic group (n = 7). L = labial; D = postdental; V = velar. stronger in this graph. T h e m e a n r e s p o n s e p e r c e n t a g e to the reinforced stimuli was 67.5%, with a range o f only 8% and a standard d e v i a t i o n o f 2.9%. A t h r e e - w a y analysis of variance for talker (male vs. female), place of articulation (labial vs. p o s t d e n t a l vs. velar), a n d trial t y p e (change vs. 
control) revealed a significant main effect for the trial-type factor only (F = 13.4; df = 1, 6; p < .01). There were no effects for talker (F = 1.1; df = 1, 6; p NS) or place of articulation (F = 1.4; df = 2, 12; p NS), and none of the interactions approached significance.
The pattern of responding in the nonphonetic group was quite different from that of the phonetic group. Figure 8 shows the percentage of head turns to each of the stimuli presented to the group of four nonphonetic infants who were reinforced in initial training for head turns to [ba] (female). As a group, these infants tended to turn more often to the six stimuli in the reinforced class than to those in the unreinforced class (25% vs. 16%). But, unlike the pattern observed for the phonetic infants, the responses were distributed very unevenly among the six reinforced stimuli. Specifically, many more responses were cued by the [ba] (female) stimulus, which served as the reinforced token in the initial-training contrast. A very similar pattern can be seen in Figure 9 for the subgroup of four infants trained with the categories reversed, that is, the infants for whom [na] (male) served as the reinforced stimulus in initial training. Again, the infants were responding most often to the stimulus used in the initial-training contrast, with relatively low levels of responding to the other stimuli.
FIGURE 8. Percent head-turn responses to each of the stimuli presented during change trials and control trials for the nonphonetic subgroup in which [ba] (female) served as the training stimulus (n = 4).
FIGURE 9. Percent head-turn responses to each of the stimuli presented during change trials and control trials for the nonphonetic subgroup in which [na] (male) served as the training stimulus (n = 4).
As a group, the eight subjects in the nonphonetic condition responded to 29% of the change trials, compared to 19% of the control trials. However, when data are removed from trials on which training stimuli were presented, the rate of responding on change trials is only 18%, almost identical to the response rate on control trials. This suggests that the significant trial-type effect found for this group was due almost exclusively to responses to the training stimulus.
It was not possible to combine the data from the two subgroups in the nonphonetic condition. For the phonetic condition this was accomplished by combining the responses to reinforced stimuli which shared values on all dimensions except the stop/nasal dimension. This perfect symmetry, of course, did not exist for the nonphonetic categories. Consequently, it was not possible to line up each stimulus in one category with a stimulus in the other category that differed on a single feature value.
Profiles of Individual Subjects
The data presented thus far are the results of averages from groups of subjects. Results from the seven individual infants in the phonetic group are presented in Figure 10. Three measures are given to the right of each graph: (a) the percentage of head turns on change trials (CH), (b) the percentage of head turns on control trials (CL), and (c) the overall percent correct on both change and control trials (%C). These graphs should be examined with some caution because of the variation in the number of presentations of the stimuli, given in parentheses on the horizontal axis. Since the experimenter had no control over which stimulus was presented on a given trial, some of the data points in these graphs are based on very few responses. Examination of these data clearly shows that the infants do not form a homogeneous group. Two of the infants, Subjects 3 and 7, appeared to be responding randomly to the stimuli, while the remaining five infants performed with relatively high accuracy.
Figure 11 shows the response patterns of the eight infants tested in the nonphonetic group. Intersubject variability in the performance of these subjects is also evident. Some of the infants, particularly Subjects 1, 2, 4, and 6, apparently found the task very difficult and produced what seemed to be essentially random head-turn responses to the 12 stimuli. Other infants, however, responded with some consistency to the stimulus used in the initial-training contrast, shown at the extreme left of each graph. Subject 8, in fact, appeared to have memorized a second stimulus. It is interesting that this second stimulus ([ga], female) has little in common with the training stimulus ([na], male). On the other hand, Subject 3, who was initially trained to [ga] (female), responded almost exclusively to the tokens produced by the female talker. The pattern shown by this infant is more typical of other infants who have been run using this type of procedure--that is, some attempt by the infant to formulate a general rule to organize the stimulus categories (Kuhl, Holmberg, Morgan, Hillenbrand, & Cameron, Note 3).
Results from Preliminary Stages
The data described to this point were derived from analyses of the infants' responses on the final stage of the experiment. This section provides a brief description of the results from the preliminary stages of the experiment; a more detailed account of these results can be found in Hillenbrand (Note 4). Table 3 shows the results from the first three experimental stages and from the conditioning phase for infants in the phonetic and nonphonetic groups. For the conditioning phase the criterion was three consecutive anticipatory head turns; for the three experimental stages the criterion was nine correct responses in 10 consecutive trials.

TABLE 3. Number of trials required to reach criterion for subjects in the phonetic and nonphonetic groups.

Subject              Conditioning
Phonetic group
  1                  10
  2                  13
  3                  11
  4                  10
  5                  20
  6                  21
  7                  20
Nonphonetic group
  1                  8
  2                  6
  3                  7
  4                  9
  5                  20
  6                  9
  7                  5
  8                  3
Experimental stages 1, 2, 3: 20 __a -10 -- 16 _ -12 15 -- _ -17 14 - 10 10 10 10 - - - -
a Subject failed to meet criterion (indicated by dashes).
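The two criteria just described can be made concrete with a short sketch. The Python below is an illustration only, not the programming logic used in the study: the function names are hypothetical, the requirement that a full 10-trial window be observed before the accuracy criterion can be met is an assumption, and the 20-trial limit reflects the rule, reported for the experimental stages, that infants were moved to the next stage after 20 trials.

```python
from collections import deque


def trials_to_consecutive_criterion(responses, needed=3):
    """Return the 1-based trial on which `needed` consecutive successes
    are first completed, or None if that never happens."""
    run = 0
    for trial, success in enumerate(responses, start=1):
        run = run + 1 if success else 0
        if run >= needed:
            return trial
    return None


def trials_to_window_criterion(responses, needed=9, window=10, max_trials=20):
    """Return the 1-based trial on which at least `needed` of the last
    `window` trials were correct, or None if the criterion is not met
    within `max_trials` trials."""
    recent = deque(maxlen=window)  # sliding window of the most recent trials
    for trial, correct in enumerate(responses, start=1):
        if trial > max_trials:
            break
        recent.append(bool(correct))
        # Assumption: a full window must be observed before the test applies.
        if len(recent) == window and sum(recent) >= needed:
            return trial
    return None


if __name__ == "__main__":
    # Hypothetical response sequences, for illustration only.
    print(trials_to_consecutive_criterion([False, True, True, False, True, True, True]))
    print(trials_to_window_criterion([False, False] + [True] * 9))
    print(trials_to_window_criterion([True, False] * 10))
```

A return value of None from the second function corresponds to the dashes in Table 3, that is, a subject who did not reach the accuracy criterion within the trial limit.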
One fairly prominent finding from these tables is that, on the average, infants in the phonetic group required more trials to reach the conditioning criterion (mean = 15.0) than did infants in the nonphonetic group (mean = 8.7). This difference was predictable since the nonphonetic infants were trained on a contrast involving differences in several acoustic dimensions, while the phonetic infants were trained on a minimal pair. A second feature of interest in these tables is that in the majority of cases infants did not meet the 9-out-of-10 accuracy criterion and, consequently, were progressed to the next experimental stage after 20 trials. This was true for both groups and for all three stages. This was not particularly surprising since previous work has shown that infants typically require more than 20 trials to reach criterion on consonant contrasts (Holmberg et al., 1977). A more revealing picture of the infants' performance throughout the experiment can be seen by examining the overall percentage of correct responses as a function of the experimental stage. Mean and standard deviation percent correct for each experimental stage are plotted in Figure 12 for the phonetic group and in Figure 13 for the nonphonetic group. Figure 12 shows that there was no tendency for the performance of the phonetic infants to decline as the experiment became more complex. In fact, these data show a slight trend in the opposite direction.
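The text does not state how the overall percent correct (%C) was computed from the change-trial and control-trial response rates. A minimal sketch, assuming equal numbers of change and control trials, treats %C as the average of the hit rate (CH) and the correct-rejection rate (100 - CL). The function names are hypothetical; the (CH, CL, %C) triples are the values printed beside the individual graphs in Figure 10, listed without subject labels.

```python
def percent_correct(ch, cl):
    """Overall percent correct from the percentage of head turns on change
    trials (ch, hits) and on control trials (cl, false alarms).

    Assumption: change and control trials are equally frequent, so the two
    kinds of correct response are weighted equally."""
    if not (0 <= ch <= 100 and 0 <= cl <= 100):
        raise ValueError("rates must be percentages between 0 and 100")
    return (ch + (100.0 - cl)) / 2.0


# (CH, CL, reported %C) as printed beside the graphs in Figure 10.
FIGURE_10_VALUES = [
    (86, 30, 78),
    (100, 26, 87),
    (68, 15, 77),
    (88, 9, 90),
    (19, 26, 47),
    (34, 28, 53),
    (79, 9, 85),
]


def max_discrepancy(values):
    """Largest absolute gap between the computed and the reported %C."""
    return max(abs(percent_correct(ch, cl) - reported)
               for ch, cl, reported in values)


if __name__ == "__main__":
    for ch, cl, reported in FIGURE_10_VALUES:
        print(ch, cl, reported, percent_correct(ch, cl))
```

Under this assumption every reported %C in Figure 10 is matched to within half a percentage point, which is what rounding of the printed values would allow; the agreement is suggestive rather than proof of the scoring rule actually used.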
In contrast, the performance of the nonphonetic infants dropped rather sharply from stage 1 to stage 2 and remained at a relatively low level. These results suggest that the nonphonetic infants were able to learn the head-turn task but were unable to memorize the unrelated tokens that were added as the experiment progressed.
FIGURE 10. Individual response profiles for subjects in the phonetic group. The figures to the right of each graph indicate the percentage of responses on change trials (CH), the percentage of responses on control trials (CL), and the overall percent correct (%C). [Values printed beside the graphs, as CH, CL, %C: 86, 30, 78; 100, 26, 87; 68, 15, 77; 88, 9, 90; 19, 26, 47; 34, 28, 53; 79, 9, 85.]
FIGURE 11. Individual response profiles for subjects in the nonphonetic group. The figures to the right of each graph indicate the percentage of responses on change trials (CH), the percentage of responses on control trials (CL), and the overall percent correct (%C). [Values printed beside the graphs, as CH, CL, %C: 15, 18, 49; 18, 13, 53; 14, 8, 53; 37, 30, 54; 42, 10, 66; 32, 19, 57; 28, 24, 52; 43, 11, 66.]
FIGURE 12. Overall percent correct for each experimental stage for the phonetic group. The error bars indicate one standard deviation.
FIGURE 13. Overall percent correct for each experimental stage for the nonphonetic group. The error bars indicate one standard deviation.
An Additional Control Condition
As was discussed previously, the nonphonetic condition was designed to test infants on a set of stimuli comparable to that used in the phonetic condition but which could not be grouped on the basis of auditory similarity. The relatively good performance of infants in the phonetic group led to the conclusion that these subjects recognized similarities among sounds in the stimulus categories. However, infants in the phonetic group were initially trained on a more difficult contrast than were infants in the nonphonetic group. As a consequence, more infants in the phonetic group failed to meet the conditioning criterion. For this reason, it could be argued that the phonetic/nonphonetic difference was the result of bias in subject selection. It is possible that the more difficult initial-training contrast in the phonetic condition resulted in the selection of better subjects than those in the nonphonetic condition. To test for this possibility, an additional control condition was run using a nonphonetic task in which the initial-training contrast was the same as that for the phonetic group--[ma] versus [ba]. The experimental stages for this condition are shown in Table 4.

TABLE 4. Experimental stages for an additional nonphonetic control condition.

Stage   Category 1                                        Category 2
1       ba (M)                                            ma (M)
2       ba (M), ga (F)                                    ma (M), ŋa (M)
3       ba (M), ga (F), ma (F), na (M)                    ma (M), ŋa (M), da (F), ba (F)
4       ba (M), ga (F), ma (F), na (M), da (M), ŋa (F)    ma (M), ŋa (M), da (F), ba (F), ga (M), na (F)

As in the phonetic condition, the initial-training stage contrasted [ma] (male) with [ba] (male). However, as in the nonphonetic condition described previously, stimuli were added in subsequent stages in such a way that the categories could not be organized by talker or by place or manner of production. Testing procedures were identical to those described previously except that the tape deck and modular programming logic were replaced by a digital computer (DEC PDP 11/34). A computer program presented stimuli and controlled experimental contingencies according to the same rules and with the same timing parameters as were used to design the programming logic described previously. Six 5½- to 6½-month-old infants began testing; two of these subjects failed to pass the conditioning criterion. The results of this control experiment do not support the possibility that the phonetic/nonphonetic difference was due exclusively to bias in subject selection. Average performance on the initial-training stage was 68% correct, comparable to that of the phonetic group. However, unlike the performance of the phonetic group, these subjects' performance fell very close to chance and stayed there for the remaining stages. Average performance for the final stage was 58% correct.
These findings support the conclusion that infants in the phonetic condition performed well because they recognized the perceptual similarity of syllables sharing a value on a feature dimension.
DISCUSSION
The principal findings of this study were:
1. The overall performance of infants in the phonetic group was significantly better than that of the nonphonetic group.
2. The phonetic infants tended to distribute their responses more or less evenly among the stimuli in the reinforced category, while infants in the nonphonetic group tended to favor the stimulus that was used in the initial-training contrast.
3. There was no evidence of a systematic decline in the performance of phonetic infants as the experiment became more complex, whereas the performance of infants in the nonphonetic group tended to drop as tokens were added to the two categories.
These results suggest that infants do recognize the similarity of speech sounds that share a value on a phonetic-feature dimension. The alternate possibility that simple rote memorization was responsible for these results seems unlikely in light of the relatively poor overall performance of infants in the nonphonetic group. This same phonetic/nonphonetic difference was also found in a similar study examining categorization of fricatives (Kuhl et al., Note 3) and in a study examining categorization of nasal consonants (Hillenbrand, Note 2). It is important to point out, however, that the nonphonetic results do not prove that memorization was not involved in any form in the phonetic condition. It is a well-established finding that memorization is most efficient when the items to be recalled can be organized in some fashion (e.g., see Bartlett, 1932; Bransford & Franks, 1974; Tulving & Donaldson, 1972). The phonetic/nonphonetic effect suggests that if memorization was involved, the process was aided by the perceptual similarity of the speech sounds.
Whatever the exact role of memory in these experiments, it appears that recognition of perceptual similarity is a necessary condition for good performance on this kind of task.
One additional issue that needs to be addressed in interpreting these findings concerns the discriminability of tokens within the stop and nasal categories. To qualify as categorization, it must be demonstrated that the tokens in the particular class are being treated as equivalent but different. That is, it would not be interesting to demonstrate common responses to the class [b, d, g] if infants could not discriminate stop-consonant place of articulation. The literature provides ample evidence that infants can discriminate among voiced stop consonants (Eimas, 1974; Morse, 1972). In addition, a recent experiment using procedures very similar to those described in this report provides evidence for the discrimination of nasal-consonant place of articulation by young infants (Hillenbrand, Note 2). These discrimination results suggest that infants in the present study demonstrated what Bornstein (1981) has called "equivalence classification," or "the equivalent treatment of discriminably different stimuli based on their perceptual similarity" (p. 40).
Perceptual Development and Theories of Speech Perception
The present results extend the findings of previous research on infants in which speech-sound categorization was tested at the level of the phonetic segment (Fodor et al., 1975; Holmberg et al., 1977; Kuhl, 1977; 1979b; Kuhl & Miller, 1982; Hillenbrand, Note 2). Taken as a group, these studies suggest that young infants have relatively sophisticated abilities to focus on the critical acoustic dimensions that "define" speech-sound categories while ignoring prominent variation in noncritical dimensions. These findings are analogous to the more extensive developmental literature on perceptual constancies in vision.
The work of Bower (1964), for example, suggests that young infants perceive the true size of an object despite the substantial variations in retinal-image size that result when object-observer distance is changed. The exact role of experience is not clear in these vision experiments, nor is it a simple issue in relation to the infant studies on speech-sound categories. Since the subjects in these studies were not newborns, it is not possible to rule out learning or simply the effects of exposure to speech in accounting for these results. Two conclusions seem reasonable, however. First, if these abilities are learned, they are learned very quickly and apparently without any specific training. Second, and perhaps more important than the specific question of innateness, these abilities predate the acquisition of detailed knowledge of speech production and the acquisition of sophisticated speech-comprehension abilities. This observation bears directly on specific theoretical debates in speech-perception research. An important contention of "motor theories" of speech perception is that the invariance problem is resolved by processes that involve the mediation of articulatory knowledge. The results of the present study, and other demonstrations of perceptual constancy for speech by infants, suggest that sophisticated articulatory knowledge is not a necessary condition for the demonstration of these abilities. It appears that prelinguistic infants are capable of extracting the acoustic properties that form the basis of phonetic categories. If this general finding is corroborated by further research, it would seem to support the auditory-based theories proposed by Fant and others (Fant, 1967; Miller, 1977; Miller et al., 1977; Searle et al., 1979). However, it is possible to formulate a version of an articulation-based theory consistent with the infant findings.
It is necessary only to assume that the articulatory knowledge which mediates the perception of speech is phylogenetically rather than ontogenically acquired; that is, that part of human genetic endowment is a species-specific mechanism for speech perception. In fact, this sort of approach has been successful in explaining the perception of biologically relevant signals in other species (Hailman, 1969; Marler, 1970; 1976). However, recent experiments on speech perception by nonhuman listeners are not consistent with this view. Research on the dog (Baru, 1975) and the chinchilla (Burdick & Miller, 1975; Kuhl & Miller, 1975; 1978) suggests that nonhuman listeners are able to sort speech sounds on the basis of phonetic similarity across variations in noncritical dimensions. Taken together, the infant and animal findings suggest that acoustic invariants are available in the speech signal and, further, that the mammalian auditory system seems capable of extracting these properties in a variety of contexts.
Implications for Phonological Development
The phonetic condition contrasted a category of voiced stop consonants with a category of nasal consonants. The performance of subjects in this task indicates that infants are capable of organizing speech sounds on the basis of categories at least this broad or "abstract." The feature categories tested, however, are phonologically organized within even broader feature classes, such as [±continuant] or [±sonorant]. It would be interesting to determine whether infants are capable of organizing speech sounds based on very broad feature categories such as these. For example, would infants reinforced for head turns to nasal consonants also respond to presentations of other sonorants, such as liquids and semivowels, but not to presentations of obstruents, such as fricatives and affricates?
The importance of determining the infant's proclivities for classifying speech sounds is that these kinds of perceptual abilities may form the basis for acquiring phonological rules that appeal to feature categories. There are a number of phonological rules that appeal to the nasal/oral distinction. For example, in most dialects of American English, voiced stops that precede homorganic syllabic nasals are released nasally rather than orally (e.g., "sadden"). Most descriptions of phonological rule systems suggest that rules such as these are specified in terms of values on feature dimensions rather than individual phonetic segments. While the present results do not argue that infants are born with anything that could be described as "phonological knowledge," it is possible that the acquisition of phonological rules may be aided by the infant's recognition of the inherent perceptual similarity of speech sounds sharing particular feature values. On a related issue, some investigators have argued that children do not learn the sound system of their language in a straightforward "segment-by-segment" fashion, but rather by learning the hierarchical organization of features and feature contrasts (Blache, 1978; Jakobson, 1968; Smith, 1973). More detailed studies of the type presented here might reveal a relationship between the acquisition of phonological rules and phonetic segments and the relative difficulty of organizing speech sounds along various feature dimensions.
ACKNOWLEDGMENTS
This work is a portion of a dissertation conducted at the University of Washington's Child Development and Mental Retardation Center under the direction of Patricia Kuhl. Her careful guidance is gratefully acknowledged, as is the advice of Fred Minifie, Wesley Wilson, and Philip Dale. I would also like to thank Jean Tully, Tristan Holmberg, Chris Prall, and Kyum-Ha Lee for their valuable contributions to this project.
This work was supported by a research contract from the National Institute of Child Health and Human Development to Dr. Fred Minifie (NICHD HD-3-2793), a grant from the National Science Foundation to Dr. Patrieia Kuhl (BNS 79-13767), and by an Annual Fund Doctoral Fellowship to the author from the Graduate School of the University of Washington. REFERENCE NOTES 1. KUHL, P. K., & HILLENBRAND, J. Speech perception by young infants: Perceptual constancy for categories based on pitch contour. Paper presented at the biennial meeting of the Society for Research in Child Development, San Francisco, 1979. 2. HILLENBRAND,J. Speech perception by infants: Categorization along a nasal consonant place dimension. Manuscript submitted for publication. 3. KUIJL, P. K., HOLMBERG, T. L., MORGAN, K. A., HILLEN~RAND, J., & CAMERON,P. Perception of equivalence for fricatives in CV syllables. Manuscript in preparation. 4. HILLENBRAND,J. Perceptual organization of speech sounds by young infants. Unpublished doctoral dissertation, University of Washington, 1980. 5. PRALL, C. W., • HILLENBRAND,J. AUDED: A time-domain analysis and editing program for audio signals. Technical report, Northwestern University, Evanston, IL, 1980. REFERENCES BARTLETT, F. C. Remembering. Cambridge, England: Cambridge University Press, 1932. BARU, A. V. Discrimination of synthesized vowels [a] and Ill with varying parameters in dog. In G. Fant & M. A. A. Tathum (Eds.), Auditory analysis and perception of speech. New York: Academic Press, 1975. BLACI-IE, S. E. The acquisition of distinctive features. Baltimore: University Park Press, 1978. BORNSTEIN, M. H. Two kinds of perceptual organization near the beginning of life. In W. A. Collins (Ed.), Aspects of the development of competence. Hillsdale, NJ: Lawrence Erlbaum Associates, 1981. BOWER, T. G. R. Discrimination of depth in premotor infants. Psychonomic Science, 1964,1,368. BKaNSFOrU),J. D., & FRANKS,J. J. 
Memory for syntactic form as a function of semantic context. Journal of Experimental Psychology, 1974, 103, 1037-1039. BUttDICK, C. K., & MILLER, J. D. Speech perception by the chinchilla: Discrimination of sustained/a/and/i/. Journal of the Acoustical Society of America, 1975, 58, 415-427. EIMAS, P. D. Auditory and linguistic processing of cues for place of articulation by infants. Perception & Psychophysics, 1974, 16, 513-521. EIMAS, P. D., & MILLER,J. L. Discrimination of information for manner of articulation. Infant Behavior and Development, 1980, 3, 367;375. FANT, G, Acoustic theory of speech production. The Hague: Mouton, 1960. FANT, G. Auditory patterns of speech. In W. Wathen-Dunn (Ed.), Models for the perception of speech and visual form. Cambridge: MIT Press, 1967. HILLENBRAND: Infants' Organization of Speech Sounds FODOR, J. A., GARRETT, M. F., & BRILL, S. L. Pi-ka-pu. The perception of speech sounds by pre-linguistic infants. Perception & Psychophysics, 1975, 18, 74-78. FUJIMURA, O. Analysis of nasal consonants. Journal of the Acoustical Society of America, 1962,34, 1865-1875. HAILMAN,J. P. How an instinct is learned. Scientific American, 1969, 221, 98-106. HILLENBRAND,J. Categorization of stop and nasal consonants by young infants. Journal of the Acoustical Society of America, 1980, 68(Suppl. 1), S31(A). HILLENBRAND, J., MINIFIE, F. D., & EDWARDS, T. J. Tempo of spectrum change as a cue in speech-sound discrimination by infants. Journal of Speech and Hearing Research, 1979, 22, 147-165. HOLMBERG, T. L., MORGAN, K. A., & KUHL, P. K. Speech perception in early infancy: Discrimination of fricative consonants. Journal of the Acoustical Society of America, 1977, 62(Suppl. 1), $99(A). JAKOBSON, R. Child language, aphasia and phonological universals. The Hague: Mouton, 1968. KUHL, P. K. Speech perception in early infancy: Perceptual constancy for the vowel categories/a/and/o/. Journal of the Acoustical Society of America, 1977, 62(Suppl. 1), $39(A). KUHL, P. 
K. Models and mechanisms in speech perception: Species comparisons provide further contributions. Brain, Behavior and Evolution, 1979, 16,374-408. (a) KL~L, P. K. Speech perception in early infancy: Perceptual constancy for spectrally dissimilar vowel categories. Journal of the Acoustical Society of America, 1979, 66, 1668-1679. (b) KUHL, P. K., & MILLER, J. D. Speech perception by the chinchilla: Voiced-voiceless distinctions in alveolar plosive consonants. Science, 1975,190, 69-72. KUItL, P. K., & MILLER, J. D. Speech perception by the chinchilla: Identification for synthetic VOT stimuli. Journal of the Acoustical Society of America, 1978, 63, 905-917. KUHL, P. K., & MILLER, J. D. Discrimination of auditory target dimensions in the presence or absence of variation in a second dimension by infants. Perception & Psychophysics, 1982, 31,279-292,. LmERMAN, A. M. The grammars of speech and language. Cognitive Psychology, 1970, 1,301-323. 281 LIBERMAN, A. M., COOPER, F. S., SHANKWEILER,D. P., & STUDDERT-KENNEDY, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461. MARLER, P. A comparative approach to vocal learning'. Song development in white-crowned sparrows. Psychological Monographs, 1970, 71, 1-25. MARLER, P. Sensory templates in species-specific behavior. In J. Fentress (Ed.), Simpler networks and behavior. Sunderland: Sinauer Associates, 1976. MmLER, J. D. Perception of speech by animals: Evidence for speech processing by mammalian auditory systems. In T. H. Bullock (Ed.), Recognition of complex auditory signals. Berlin: Abakon Verlagsgesellschaft, 1977. MILLER,J. D., ENGEBRETSON,A. M., SPENNER,B. F., & Cox, J. R. Preliminary analysis of speech sounds with a digital model of the ear.Journal of the Acoustical Society of America, 1977, 62(Supph I), $13(A). MORSE, P. A. The discrimination of speech and non-speech in early infancy.Journal of Child Psychology, 1972,14,477-492. SEARLE, C. L., JACOBSON, J. Z., & RAYMENT, S. G. 
Stop consonant discrimination based on human audition. Journal of the Acoustical Society of America, 1979, 65, 799-809.
SMITH, N. V. The acquisition of phonology: A case study. Cambridge, England: Cambridge University Press, 1973.
STEVENS, K. N., & BLUMSTEIN, S. E. Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 1978, 64, 1358-1368.
STEVENS, K. N., & HOUSE, A. S. Speech perception. In J. V. Tobias (Ed.), Foundations of modern auditory theory (Vol. 2). New York: Academic Press, 1972.
TULVING, E., & DONALDSON, W. (Eds.). Organization of memory. New York: Academic Press, 1972.

Received March 2, 1982
Accepted August 12, 1982

Requests for reprints should be sent to James Hillenbrand, Department of Communicative Disorders, Northwestern University, 2299 Sheridan Road, Evanston, IL 60201.

APPENDIX

Table A shows the results of acoustic measurements on the stop-vowel and nasal-vowel stimuli used in the infant tests. All measurements were made using the program AUDED (Prall & Hillenbrand, Note 5) written for a DEC PDP 11 computer. Fundamental frequency was measured for the vocalic portion of each utterance by displaying successive 100-msec segments of the waveform on a high-resolution graphics terminal (Tektronix 4010) and using a cross-hair cursor to mark the boundaries of each pitch period. For simplicity, the table shows only mean fundamental frequency. All utterances showed rise/fall fundamental frequency contours. Intensity was measured by a program that simply calculated an RMS value over all data points in the waveform and converted the value to a decibel scale. All values in the table are given in relation to [ba] (male), which was arbitrarily set to 65 dB. The overall duration of each utterance was measured from the same graphics displays as those used to calculate fundamental frequency.

TABLE A.
Fundamental frequency, intensity, and duration measurements of the stop and nasal stimuli. Fundamental frequency means and standard deviations are given separately for the male and female talkers.

Stimuli         Fundamental        RMS intensity    Duration
                frequency (Hz)     (dB)             (msec)

ba (male)        80.7              65.0             479.2
da (male)        81.5              64.5             562.8
ga (male)        82.1              64.3             518.6
ba (female)     197.6              69.1             407.2
da (female)     194.8              69.5             474.4
ga (female)     197.4              68.4             560.4
mean             81.4/196.6        66.8             500.4
SD                0.7/1.6           2.4              59.4

ma (male)        83.9              64.8             505.6
na (male)        84.0              64.3             561.9
ŋa (male)        81.3              64.4             535.2
ma (female)     188.0              71.2             487.7
na (female)     197.0              71.9             508.2
ŋa (female)     192.4              71.5             484.9
mean             83.1/192.4        68.0             513.9
SD                1.5/4.5           3.9              29.6
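
The intensity and fundamental frequency measures described in the appendix can be illustrated with a short sketch. This is not the AUDED program, whose internals are not described here. The function names are hypothetical. The code assumes that each waveform is available as a plain list of sample values and that pitch-period boundaries are given as times in seconds. Averaging the per-period frequencies is only one plausible way to obtain a mean fundamental frequency; the appendix does not say which averaging rule was used.

```python
import math


def rms(samples):
    """Root-mean-square amplitude over all data points in the waveform."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def relative_db(samples, reference_samples, reference_db=65.0):
    """Intensity in dB relative to a reference utterance.

    The reference utterance is pinned to reference_db (65 dB for the
    male [ba] in the appendix); other utterances are expressed
    relative to it.
    """
    ratio = rms(samples) / rms(reference_samples)
    return reference_db + 20.0 * math.log10(ratio)


def mean_f0(period_marks_sec):
    """Mean fundamental frequency (Hz) from marked pitch-period boundaries.

    Assumption: the mean is taken over per-period frequencies (1 / period).
    """
    periods = [b - a for a, b in zip(period_marks_sec, period_marks_sec[1:])]
    return sum(1.0 / p for p in periods) / len(periods)


# Hypothetical data: a reference waveform and one with twice its amplitude.
reference = [1.0, -1.0] * 50
louder = [2.0, -2.0] * 50
level = relative_db(louder, reference)

# Hypothetical pitch-period marks spaced 10 msec apart.
f0 = mean_f0([0.00, 0.01, 0.02, 0.03])
```

Under these assumptions, a waveform with twice the RMS amplitude of the reference comes out about 6 dB above it, at roughly 71 dB, and marks spaced 10 msec apart give a mean fundamental frequency of about 100 Hz.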