Journal of Speech and Hearing Research, Volume 26, 268-282, June 1983

PERCEPTUAL ORGANIZATION OF SPEECH SOUNDS BY INFANTS

JAMES HILLENBRAND
Northwestern University, Evanston, Illinois
An operant head-turn procedure was used to test whether 6-month-old infants recognize the auditory similarity of speech
sounds sharing a value on a phonetic-feature dimension. One group of infants was reinforced for head turns when a change
occurred from a series of repeating background stimuli containing nasal consonants ([m, n, ŋ]) to repetitions from a category of
syllables containing voiced stop consonants ([b, d, g]), or from stops to nasals. The stimuli were naturally produced
by both male and female talkers. The performance of infants in this "phonetic" group was compared to that of infants in a
"nonphonetic" control group. Using the same procedures, these infants were reinforced for head turns to a group of phonetically
unrelated speech sounds. Results indicated that the performance of infants in the group trained on phonetically related speech
sounds was far superior to that of infants in the nonphonetic control group. These findings suggest that prelinguistic infants can
perceptually organize speech sounds on the basis of auditory properties related to feature similarity.
A major focus of speech-perception research over the
past several decades has been an attempt to define phonetic categories in terms of acoustic properties--for
example, to specify the acoustic attributes that define or
"cue" the segment [g], or the feature [velar], in all the
contexts in which it occurs. Much of the literature in this
area has suggested that the critical cues to phonetic
categories are often highly variable with changes in
"context." The physical cues to speech-sound categories
have been found to vary with changes in noncritical dimensions such as the phonetic environment in which the
segment appears, the position that the segment occupies
within the syllable, and the talker who produces the utterance. These results, combined with a variety of other
findings, have led some investigators to theorize that the
cues to phonetic categories are not derived from the
physical signal in a direct way. Specifically, the suggestion has been made that the perception of speech is
mediated in some way by knowledge of how speech is
produced. According to this view, the speech waveform
is assumed to be interpreted in terms of the articulatory
gestures that were used to produce the signal (Liberman,
1970; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Stevens & House, 1972).
Other investigators have argued that attempts to relate
phonetic categories to the acoustic signal have failed to
account seriously for the psychophysical processes involved in the coding of complex auditory signals. According to this point of view, invariant acoustic cues to
phonetic categories can, in fact, be derived from the
physical signal without appealing to articulatory knowledge (Fant, 1967; Kuhl, 1979a; Miller, Engebretson,
Spenner, & Cox, 1977; Searle, Jacobson, & Rayment,
1979; Stevens & Blumstein, 1978).
© 1983, American Speech-Language-Hearing Association
Speech-perception research with infants can provide
specific kinds of evidence on the contention that articulatory knowledge is a necessary condition for the
categorization of speech sounds. The reasoning is relatively simple: Since prelinguistic infants are not assumed to possess sophisticated knowledge about the
production of speech, demonstrations of phonetic
categorization by infants will indicate the limits of the
type of articulatory knowledge likely to be involved in
this process. In a recent series of experiments, Kuhl and
her associates (Kuhl & Miller, 1982; Kuhl, 1977; 1979b;
Holmberg, Morgan, & Kuhl, 1977; Kuhl & Hillenbrand,
Note 1) attempted to determine the extent to which
young infants recognize similarities among speech
sounds when variations are introduced in noncritical dimensions. For example, an experiment by Kuhl (1979b)
demonstrated that 6-month-old infants could detect a
change from one category of vowels to another when the
tokens varied randomly in talker and pitch contour. Infants in this experiment were initially trained to make a
head turn for a visual reward when a change occurred
from repetitions of a single token of [a], synthesized to
simulate a male voice with a falling pitch contour, to
repetitions of a single token of [i], produced by the same
male "talker" with the same pitch contour. The infants
were then gradually exposed to a number of novel tokens synthesized to simulate female and child talkers
with either falling or rising pitch contours. The results
showed that infants readily transferred learning from the
tokens produced by the male talker to the novel tokens
produced by female and child talkers.
Similar experiments have tested the perception of an
[a]-[ɔ] contrast across variations in talker and pitch contour (Kuhl, 1977), fricative contrasts across variations in
vowel context and talker (Holmberg et al., 1977), a
nasal-consonant place contrast across variations in vowel
context and talker (Hillenbrand, 1980, Note 2), and,
using a different version of the operant head-turn procedure, a stop-consonant place contrast across variations in
vowel context (Fodor, Garrett, & Brill, 1975).
To date, infant research on phonetic categories has focused exclusively on the infant's ability to recognize
phonetic similarity at the level of the phone, or phonetic
segment. The purpose of the present study was to extend
these findings and to test infants on their ability to organize speech sounds at the more abstract level of the
phonetic feature. The feature contrast was a stop/nasal
distinction: [b, d, g] versus [m, n, ŋ]. This contrast
seemed like a logical starting point for testing feature
perception in infancy for two reasons. First, a good deal
is known about the physical correlates of this distinction.
During the occlusion portion of nasal consonants, a nasal
murmur is produced that is characterized by (a) a low-frequency first resonance at 200-300 Hz, well separated
from higher formants; (b) relatively high damping factors
(large formant bandwidths and low formant levels); and
(c) an antiformant that varies in frequency with place of
articulation (Fant, 1960; Fujimura, 1962). Voiced stop
consonants, on the other hand, (a) do not show a nasal
murmur (although a low-frequency "voice bar" may be
present during the occlusion), (b) are characterized by
aperiodic release bursts, and (c) typically show more
rapid changes in amplitude following release than nasal
consonants (Fant, 1960). A second reason for studying
the stop/nasal contrast is that information is available on
infants' discrimination of stop and nasal consonants.
Infants have been shown to discriminate individual pairs of speech sounds differing in stop-consonant place of articulation (Eimas, 1974; Morse,
1972), nasal-consonant place of articulation (Hillenbrand, Note 2), and a stop-nasal manner-class contrast
(Eimas & Miller, 1980).
The present study examined the ability of infants to
categorize speech sounds according to the stop-nasal distinction. In other words, the study was designed to determine whether infants recognize that the stops [b, d, g]
are similar to one another and distinct from a class consisting of the nasals [m, n, ŋ].
METHODS
The general approach of the study was similar to the
transfer-of-learning experiments by Kuhl and her colleagues (Kuhl, 1977; 1979b; Holmberg et al., 1977; Kuhl
& Hillenbrand, Note 1). One group of 6-month-old infants was visually reinforced for head-turn responses
when a change occurred from a background category of
syllables containing nasal consonants ([m, n, ŋ]) to a
comparison category of syllables containing voiced stop
consonants ([b, d, g]), or from stops to nasals.
The speech sounds were produced by both male and
female talkers. The performance of infants in this "phonetic" group was compared to the performance of a separate group of infants run in a procedurally identical
"nonphonetic" condition. These infants were tested
using the same pool of stimuli used in the phonetic condition, but the stimuli were assigned to reinforced and
unreinforced categories in such a way that the categories
could not be organized according to phonetic attributes
or talker.
The procedure, which is described in detail below,
used a visual reward to train an infant to make a head-turn response when a change occurred from a class of
repeating background stimuli to repetitions from a comparison category. The experimental stages for the phonetic condition are shown in Table 1. The first stage contrasted a single token of [ma] with a single token of [ba].

TABLE 1. Experimental stages for the phonetic condition.

Stage                     Category 1                                        Category 2
1: Initial training       ba (M)                                            ma (M)
2: Place variation        ba (M), da (M)                                    ma (M), na (M)
3: Talker x Place         ba (M), da (M), ba (F), da (F)                    ma (M), na (M), ma (F), na (F)
4: Transfer of learning   ba (M), da (M), ga (M), ba (F), da (F), ga (F)    ma (M), na (M), ŋa (M), ma (F), na (F), ŋa (F)

Both syllables were naturally produced by the same
male voice. In the second stage, postdental consonants
were added to each class; that is, [ma] and [na] were contrasted with [ba] and [da]. In the third stage, labial and
postdental consonants produced by a female voice were
added to each category. In the fourth and final stage,
velar consonants were added to each class, resulting in a
contrast between male and female [m, n, ŋ] and male and
female [b, d, g]. Half of the infants were trained with the
stop consonants as the comparison category, and half
were trained with the nasal consonants as the comparison category.
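
The cumulative stage design summarized in Table 1 can be sketched as a small data structure. The sketch below is illustrative only and is not part of the original study: the names `PHONETIC_STAGES`, `is_cumulative`, and `tokens_added` are hypothetical, and each token is coded as consonant plus talker (for example, `bM` for male [ba] and `NgF` for female [ŋa]).

```python
# Illustrative sketch only; names and token codes are hypothetical.
# Token code = consonant + talker, e.g. "bM" is male [ba], "NgF" is female [ŋa].
PHONETIC_STAGES = {
    1: {"stops": ["bM"], "nasals": ["mM"]},
    2: {"stops": ["bM", "dM"], "nasals": ["mM", "nM"]},
    3: {"stops": ["bM", "dM", "bF", "dF"],
        "nasals": ["mM", "nM", "mF", "nF"]},
    4: {"stops": ["bM", "dM", "gM", "bF", "dF", "gF"],
        "nasals": ["mM", "nM", "NgM", "mF", "nF", "NgF"]},
}


def is_cumulative(stages):
    """Return True if every stage keeps all tokens of the previous stage."""
    ordered = [stages[k] for k in sorted(stages)]
    for prev, cur in zip(ordered, ordered[1:]):
        for category in prev:
            if not set(prev[category]) <= set(cur[category]):
                return False
    return True


def tokens_added(stages, stage):
    """Return the tokens that first appear at the given stage, per category."""
    prev = stages.get(stage - 1, {})
    return {cat: [t for t in toks if t not in prev.get(cat, [])]
            for cat, toks in stages[stage].items()}
```

Under this coding, `tokens_added(PHONETIC_STAGES, 4)` lists only the velar tokens, which matches the description of the final stage as the point at which velar consonants enter each class.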
In the final stage of the experiment, the infant's task
was to make a head-turn response whenever a change
occurred from a category of nasal consonants to a category of voiced stop consonants--or from stop consonants
to nasal consonants--independent of random variation in
place of articulation and talker. If subjects in this task
succeeded in responding to the stimuli in the comparison category, it would be tempting to conclude that the
infants recognized the similarity of speech sounds sharing a phonetic-feature value. It is possible, however, that
infants might simply memorize which tokens were reinforced and which ones were not. Memorizing tokens, of
course, would not necessarily require a perceptual
grouping of the stimuli. To test for this possibility, the
performance of infants run in the phonetic task described above was compared to the performance of a
separate group of infants run in a nonphonetic condition.
In the nonphonetic condition, categories were arranged
in such a way that the six stimuli in each class could not
be organized according to phonetic or acoustic characteristics. Subjects were tested using the same procedures
and equipment, plus the same pool of 12 stimuli as in the phonetic condition.

TABLE 2. Experimental stages for the nonphonetic condition.

Stage   Category 1                                        Category 2
1       ba (F)                                            na (M)
2       ba (F), ŋa (M)                                    na (M), ga (F)
3       ba (F), ŋa (M), da (F), ma (M)                    na (M), ga (F), ma (F), ba (M)
4       ba (F), ŋa (M), da (F), ma (M), ga (M), na (F)    na (M), ga (F), ma (F), ba (M), da (M), ŋa (F)

The experimental stages for the
nonphonetic condition are shown in Table 2. Subjects
were initially trained on a relatively gross contrast between a male [na] and a female [ba]. The subsequent
stages were analogous to those of the phonetic condition
in terms of the number of tokens added in each stage.
However, sounds were added in such a way that, by the
final stage, it was not possible to organize the stimuli
along any simple dimension: Each class included an
equal number of stops and nasals, male voices and
female voices, labials, postdentals, and velars. As in the
phonetic condition, half of the subjects were trained
with category 1 as the comparison class and the other
half with category 2. It was reasoned that the only way
an infant could succeed on this task was to memorize
which individual stimuli were reinforced and which
ones were not. If the performance of infants in the phonetic group proved to be superior to that of the nonphonetic group, the effect could be attributed to perceptual categorization of the speech sounds by infants in the
phonetic group.
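
The balance claimed for the final nonphonetic categories can be checked mechanically. The following is a minimal sketch, assuming the final-stage token lists given in Table 2; the token codes (consonant plus talker, with `Ng` standing for [ŋ]) and the helper names are hypothetical and were chosen only for illustration.

```python
from collections import Counter

# Illustrative sketch; token codes and helper names are hypothetical.
# Code = consonant + talker: "NgM" is male [ŋa], "bF" is female [ba], etc.
CATEGORY_1 = ["bF", "NgM", "dF", "mM", "gM", "nF"]
CATEGORY_2 = ["nM", "gF", "mF", "bM", "dM", "NgF"]

MANNER = {"b": "stop", "d": "stop", "g": "stop",
          "m": "nasal", "n": "nasal", "Ng": "nasal"}
PLACE = {"b": "labial", "m": "labial", "d": "postdental",
         "n": "postdental", "g": "velar", "Ng": "velar"}


def describe(token):
    """Split a token code into (manner, place, talker)."""
    consonant, talker = token[:-1], token[-1]
    return MANNER[consonant], PLACE[consonant], talker


def feature_counts(category):
    """Count how often each manner, place, and talker value occurs."""
    counts = Counter()
    for token in category:
        counts.update(describe(token))
    return counts


def is_balanced(category):
    """True if manner and talker split 3/3 and place splits 2/2/2."""
    c = feature_counts(category)
    return (c["stop"] == c["nasal"] == 3
            and c["M"] == c["F"] == 3
            and c["labial"] == c["postdental"] == c["velar"] == 2)
```

Both `is_balanced(CATEGORY_1)` and `is_balanced(CATEGORY_2)` hold for these lists, whereas a purely phonetic class such as the six stops would fail the check. This is the sense in which no single dimension separates the two nonphonetic classes.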
Stimuli
The stimuli were naturally produced tokens of [m, n, ŋ,
b, d, g] in prevocalic position with the vowel [a]. One
adult male and one adult female produced several tokens
of each syllable. Audio recordings were made in a
sound-treated booth with a cardioid microphone (Sennheiser MKH 415T-U) and a high-quality full-track recorder (Nagra 4.2). The talkers were instructed to produce all stimuli with approximately equal durations,
intensities, and slightly falling pitch contours. A VU
meter was used to monitor intensity. The recorded
stimuli were digitized and stored in the disk memory of
a digital computer (DEC PDP 11/10). A sample rate of 20
kHz was used with a maximum amplitude resolution of
eight bits within a ±4-V dynamic range. All signals were
low-pass filtered at 8 kHz and conditioned with an autocorrelator noise-reduction device (Phase Linear 1000).
One token of each syllable produced by the two talkers
was selected for use in the discrimination tests. The tokens were chosen by selecting those stimuli that showed
the closest match on computer-derived measurements of
fundamental frequency contour, intensity contour, and
duration. In the final set of stimuli there were no systematic differences between the stop and nasal
categories in fundamental frequency, overall RMS intensity, or duration. (Measurements of these stimuli are
given in Table A of the Appendix.) Formal listening tests
showed that all stimuli were identified reliably by a
panel of five adult listeners.
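
The digitization settings reported above imply a few derived quantities that are easy to verify. The arithmetic below is a rough sketch rather than a description of the original equipment: it assumes uniform quantization and assumes that the ±4-V range spans 8 V in total. The variable names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the digitization settings in the text.
# Assumptions (not stated in the source): uniform quantization, and a
# +/-4-V range that spans 8 V in total.
SAMPLE_RATE_HZ = 20_000      # sample rate reported in the text
BITS = 8                     # maximum amplitude resolution reported in the text
FULL_SCALE_VOLTS = 8.0       # assumed total span of the +/-4-V range
LOWPASS_HZ = 8_000           # low-pass filter cutoff reported in the text

nyquist_hz = SAMPLE_RATE_HZ / 2            # highest representable frequency
levels = 2 ** BITS                         # number of quantization levels
step_volts = FULL_SCALE_VOLTS / levels     # size of one quantization step
guard_band_hz = nyquist_hz - LOWPASS_HZ    # margin between cutoff and Nyquist
```

Under these assumptions the 8-kHz cutoff sits below the 10-kHz Nyquist frequency, and one quantization step corresponds to 31.25 mV.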
Audiotapes for discrimination testing were prepared
by recording stimuli from the two categories on separate
channels of tape. At the output of the D/A converter, the
stimuli were low-pass filtered at 8 kHz, conditioned with
an autocorrelator noise-reduction device (Phase Linear
1000), and recorded with a constant 1.7-sec onset-to-onset interstimulus interval. The onsets of the stimuli on
the two channels of each tape were synchronized using a
cueing procedure described by Hillenbrand, Minifie,
and Edwards (1979). Gain settings at the input to the
tape deck (TEAC 3340-S) were adjusted so that the two
stimuli contrasted in the initial-training stage were
balanced for loudness.
Calibration
Signals were calibrated by a combination of sound-level measurements and a loudness-balance procedure.
The gain setting at the output of the tape deck was adjusted so that the peak intensity of one syllable in the
initial-training pair measured 65 dBA, using the fast-response setting of a sound-level meter (Bruel & Kjaer,
Model 2209). A loudness-balance procedure was used to
adjust the output gain of the channel carrying the contrasting syllables. An experimenter used an electronic
switch to alternate between the two channels. The output gain of the channel carrying the contrasting syllables
was adjusted until one adult listener judged that the two
signals were equally loud. These same gain settings
were used for the experimental conditions involving
multiple tokens of the two categories. The loudness balance was checked as part of the daily calibration procedure.
Procedures
1. General. A schematic of the experimental site is
shown in Figure 1. The infant was held on the parent's
lap facing an assistant. An experimenter in an adjacent
room controlled the equipment and was able to observe
the infant on a video monitor. A loudspeaker (Electro-Voice SP-12) was positioned at a 90° angle to the assistant. In front of the speaker was an electrically operated
stuffed toy bear in a smoked plexiglass box. When activated, the box was illuminated and the bear tapped on a
drum.

FIGURE 1. Experimental site for the visually reinforced head-turn procedure (from Kuhl, 1979b). [Schematic of the test and control rooms. Labeled elements: experimenter (E), assistant (A), parent (P), infant (I), visual reinforcer, camera (C), video monitor (M), and equipment.]

FIGURE 2. Trial structure for the phonetic condition. The figure
shows stimuli being presented before, during, and after change
and control trials. The subscripts refer to the individual stimuli
in the background and comparison categories. The example
shown here is for stage 3 of the phonetic condition in which the
stop category was reinforced (after Kuhl, 1979b). [Diagram: stimulus sequences for a change trial and a control trial plotted against time, with the observation interval marked.]

FIGURE 3. Trial structure for the final stage of testing (stage 4).
The stimuli are presented in random order, but each stimulus in
the order is repeated three times (see Kuhl, 1979b). [Diagram: stimulus sequences for a change trial and a control trial plotted against time, with the observation interval marked.]

The experiment was run with a tape deck (TEAC
3340-S) and a logic device. Throughout the entire experiment, tape-recorded stimuli were continuously
presented at onset-to-onset intervals of 1.7 sec. The assistant's task was to keep the infant's attention by manipulating silent toys. When the assistant judged the
infant to be in a "ready state," that is, quiet and attending to the toys, he pressed a button signaling the experimenter to initiate a 5-sec observation interval. Two
kinds of trials could occur during the interval: change
trials or control trials. Figure 2 shows stimuli being
presented before, during, and after change and control
trials for the phonetic condition. During a change trial, a
silent switch initiated a change in tape-recorder channels from the repeating background category to three
presentations from the comparison category. A hand-held
vibrotactile device signaled the start of a 5-sec observation interval to the assistant; a small light mounted on
the monitor signaled the start of the interval to the experimenter. If both the experimenter and the assistant
judged that a head turn occurred during the observation
interval, they independently pressed buttons that activated the visual reinforcer for 3 sec. AND-gate circuitry
ensured that the reinforcer would be activated only on
change trials in which both judges voted during the
5-sec observation interval. During a control interval, the
infant continued to hear stimuli from the background
category. On control trials, both the experimenter and
the assistant made a judgment about the occurrence of a
head turn, but reinforcement was not provided, regardless of the infant's response. For the final stage of testing
(stage 4), stimuli were presented using a special three-repetition trial structure described by Kuhl (1979b). As
shown in Figure 3, the stimuli were presented in random order, but each stimulus in the order was repeated
three times. Since a single token was presented on any
given trial, this format made it possible to assign the infant's response to a particular stimulus. On both change
and control trials the experimenter recorded the stimulus
that was presented and the infant's response.
For all stages of the experiment, an infant's performance was measured by comparing the proportion of
head turns on change trials to the proportion of head
turns on control trials. To reduce the possibility that the
parent or assistant might cue the infant's response, and
to control for bias in judging head turns, music was presented over earphones to both adults in the test room at a
level sufficient to mask a change from one stimulus to
another. The experimenter was able to hear the stimuli
over an audio monitor in the control room and therefore
could have been biased in his judgment of head turns.
Experimenter bias in this task would be revealed by his
failure to agree with the assistant, who was unbiased. Interjudge agreement for all trials was 98%, indicating that
experimenter bias did not play a large role in the judgment of head turns. When the two judges did fail to
agree, the trials were always scored as errors. As a further effort to reduce the possibility of bias, an electronic
probability generator, set at 50%, was used to determine
whether a given observation interval would be a change
or control trial. Since previous work with the head-turn
procedure suggested that long strings of change and control trials increased the probability of infant errors, the
experimenter was instructed to override the probability
generator for a single trial after three consecutive change
or control trials (see Kuhl, 1979b).
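
The trial-selection and reinforcement logic described above can be sketched in a few lines. This is a software illustration under stated assumptions, not a model of the original logic device: the function names are hypothetical, and the scoring rule (a head turn is correct on a change trial, its absence is correct on a control trial) is an interpretation of the text rather than a quoted definition.

```python
import random


def next_trial_type(history, rng, p_change=0.5, max_run=3):
    """Pick 'change' or 'control' for the next observation interval.

    A 50% draw decides the trial type unless the last `max_run` trials
    were all of one type, in which case the other type is forced for a
    single trial (the experimenter's override described in the text).
    """
    recent = history[-max_run:]
    if len(recent) == max_run and len(set(recent)) == 1:
        return "control" if recent[0] == "change" else "change"
    return "change" if rng.random() < p_change else "control"


def reinforcer_fires(trial_type, experimenter_vote, assistant_vote):
    """AND-gate logic: reinforce only on change trials with both votes."""
    return bool(trial_type == "change" and experimenter_vote and assistant_vote)


def score_trial(trial_type, experimenter_vote, assistant_vote):
    """Return True for a correct trial; judge disagreements count as errors.

    Assumed scoring: a head turn is correct on a change trial, and the
    absence of a head turn is correct on a control trial.
    """
    if experimenter_vote != assistant_vote:
        return False
    head_turn = experimenter_vote
    return head_turn if trial_type == "change" else not head_turn
```

With the override in place, a simulated sequence never contains more than three consecutive trials of the same type, which is the constraint the override was meant to enforce.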
2. Conditioning the head-turn response. The head-turn response was conditioned by initiating a change
trial and, after a few presentations of the comparison
stimulus, activating the visual reinforcer. After a variable
number of these trials, most infants began to make head
turns that anticipated the activation of the visual reinforcer. To be included in the experiment, an infant was
required to make three consecutive anticipatory head
turns. Subjects were allowed a maximum of 25 trials to
meet the conditioning criterion. Testing on the initial-training stage was not begun until the infant met the
conditioning criterion. Experience with the head-turn
procedure has shown that infants who meet the conditioning criterion very quickly will sometimes perform
poorly on the initial-training stage. For that reason, all
infants were given a minimum of 15 conditioning trials.
3. Progressing subjects through the experiment. An infant advanced from one stage of the experiment to the
next when he/she met an accuracy criterion of 9 correct
responses in 10 consecutive trials, half being change
trials and half being control trials. If an infant did not
meet this 9-out-of-10 criterion in 20 trials, he/she was
automatically advanced to the next stage of the experiment. When an infant reached the final stage of the experiment, he/she was given as close to 75 trials as possible. A variety of problems prevented this in some cases,
including scheduling difficulties, experimenter error,
and infants who had become fussy after prolonged testing. The number of trials run on the final stage ranged
from 63 to 75, with an average of 68.9 trials.
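
The advancement rule can be expressed as a sliding-window check. The sketch below is one possible reading of the criterion, not the procedure as implemented in the study: it assumes that the 10 consecutive trials must contain exactly five change and five control trials, and the function name `trials_to_advance` is hypothetical.

```python
def trials_to_advance(trials, window=10, needed=9, max_trials=20):
    """Return the trial count at which the infant moves to the next stage.

    `trials` is a list of (trial_type, correct) pairs in presentation order.
    The infant advances once any `window` consecutive trials contain at
    least `needed` correct responses with equal numbers of change and
    control trials (an assumed reading of the criterion), or automatically
    after `max_trials` trials. Returns None if neither has happened yet.
    """
    for end in range(window, min(len(trials), max_trials) + 1):
        recent = trials[end - window:end]
        n_correct = sum(1 for _, correct in recent if correct)
        n_change = sum(1 for kind, _ in recent if kind == "change")
        if n_correct >= needed and n_change == window // 2:
            return end
    return max_trials if len(trials) >= max_trials else None
```

For example, an infant who responds correctly on ten alternating change and control trials advances after trial 10, whereas an infant who never reaches 9 correct in any window is advanced automatically after trial 20.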
4. Retraining. Infants at various stages of testing often showed a marked drop in performance. In many cases the infant appeared to have forgotten the experimental contingencies or seemed to lose
interest in the task. Infants were retrained by the presentation of conditioning trials--change trials in which the
visual reinforcer was manually activated if the infant did
not respond within about 4 sec of the stimulus change.
Two rules controlled the presentation of these retraining
trials:
1. A single retraining trial was presented after three consecutive misses on change trials.
2. If after the first 15 trials of a session an infant had missed
more than half of the change trials, the next five trials were
retraining trials. Regardless of the stage of testing that the
infant was in, these retraining trials used the pair of stimuli
from the initial-training stage.
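
The two retraining rules can be sketched as a function that is consulted after each trial is recorded. The sketch rests on several assumptions that the text does not settle: misses are counted across change trials only (intervening control trials are skipped), a retraining trial resets the run of misses, and retraining trials do not count toward the first 15 trials of a session. The names `trailing_misses` and `retraining_trials_due` are hypothetical.

```python
def trailing_misses(session):
    """Count consecutive missed change trials at the end of the session.

    Control trials are skipped; a hit or a retraining trial ends the run.
    """
    run = 0
    for kind, head_turn in reversed(session):
        if kind == "control":
            continue
        if kind == "change" and not head_turn:
            run += 1
        else:
            break
    return run


def retraining_trials_due(session, first_block=15, miss_run=3, block_size=5):
    """Return how many retraining trials the two rules call for right now.

    `session` is a list of (kind, head_turn) pairs for the current session,
    where kind is "change", "control", or "retrain".
    """
    if not session or session[-1][0] == "retrain":
        return 0
    test_trials = [t for t in session if t[0] != "retrain"]
    change_turns = [turn for kind, turn in test_trials if kind == "change"]

    # Rule 2: after the first 15 trials, more than half the change trials missed.
    if len(test_trials) == first_block and change_turns:
        if change_turns.count(False) > len(change_turns) / 2:
            return block_size

    # Rule 1: the most recent change trials form a run of three misses.
    if session[-1][0] == "change" and trailing_misses(session) >= miss_run:
        return 1
    return 0
```

A caller would append each retraining trial to the session as a `"retrain"` entry, so that the run of misses starts over after retraining has been given.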
5. Testing sessions. A test session was terminated
when either the experimenter or the assistant judged
that the baby was becoming tired or fussy or at the end
of 30 trials. Testing sessions lasted about 10-15 minutes,
with an average of 20 trials per session. Infants were
usually given all of the trials for a particular experimental stage within the same session. However, if a session
was terminated before an infant completed testing on a
given stage, testing on the next session would resume
where the infant left off. Seven or eight sessions were
generally required to complete the experiment.
Subjects
The subjects were normal 5½- to 6½-month-old infants
selected by mail solicitation to parents in the Seattle
area. A parent questionnaire was used to screen out infants who (a) had been treated for middle-ear problems,
(b) had a family history of congenital hearing loss, or (c)
were born more than 2 weeks premature or 2 weeks late.
Subjects were assigned randomly to either the phonetic
or the nonphonetic group. A total of 23 subjects began
testing. Subjects were run until eight infants completed
testing in each group. To be included in the study, an
infant had to pass the conditioning criterion of three consecutive anticipatory head-turn responses in the first 25
trials of testing. Six subjects failed to pass the conditioning criterion on the [ma]-[ba] contrast for the phonetic
study. One additional subject in the phonetic group was
eliminated due to an experimenter error, leaving seven
subjects in this group instead of eight. The nonphonetic
condition offered subjects a much grosser, multidimensional contrast; consequently, only one subject in the
nonphonetic condition failed to pass conditioning in the
allotted 25 trials.
RESULTS
The most interesting results of this study come from an
analysis of the babies' responses on the final stage of
each condition. These analyses are discussed first, followed by a description of the infants' performance on the
preliminary stages. Figure 4 displays the percentages of
head turns on change trials and on control trials for infants in the phonetic and nonphonetic groups for the
final stage of testing. The graph shows that more head
turns were observed on change as opposed to control
trials for both groups of infants. The trial-type effect,
however, was much more pronounced for the phonetic
group. Infants in the two groups responded about
equally often on control trials, but the phonetic infants
responded much more often on change trials than the
nonphonetic infants. A two-way analysis of variance for
trial type and group, with repeated measures on the
trial-type variable, revealed significant main effects for
both trial type (F = 17.4; df = 1, 13; p < .001) and group
(F = 8.0; df = 1, 13; p < .01). There was also a significant group × trial-type interaction (F = 7.2; df = 1, 13; p
< .05), indicating that the trial-type effect was significantly larger for the phonetic group. Post hoc analysis
showed that the trial-type effect was statistically reliable
for both the phonetic group (F = 11.9; df = 1, 6; p < .01)
and the nonphonetic group (F = 8.0; df = 1, 7; p < .05).
These comparisons indicate that infants in both groups
performed significantly above chance on the final stage
of testing, but that infants in the phonetic group performed with greater accuracy than those in the nonphonetic group.
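
The performance measure used throughout, the percentage of head turns on change trials compared with the percentage on control trials, can be computed from a simple trial log. The sketch below is illustrative: the function names are hypothetical, and the record format (pairs of trial type and head-turn outcome) is an assumption about how such data might be stored, not a description of the original scoring sheets.

```python
def percent_head_turns(trials):
    """Percent of trials with a head turn, separately for each trial type.

    `trials` is a list of (trial_type, head_turn) pairs, where trial_type
    is "change" or "control" and head_turn is True or False.
    """
    result = {}
    for kind in ("change", "control"):
        turns = [turn for t, turn in trials if t == kind]
        result[kind] = 100.0 * sum(turns) / len(turns) if turns else None
    return result


def trial_type_effect(trials):
    """Change-trial percentage minus control-trial percentage."""
    p = percent_head_turns(trials)
    return p["change"] - p["control"]
```

For a made-up log with three head turns on four change trials and one head turn on four control trials, the percentages are 75 and 25, and the trial-type effect is 50 percentage points. A larger difference corresponds to the more pronounced trial-type effect reported for the phonetic group.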

FIGURE 4. Percent head-turn responses on change and control
trials for infants in the phonetic (n = 7) and nonphonetic (n = 8)
groups. The data in this figure and in Figures 5-11 are from the
final stage of testing (stage 4). [Bar graph: percent head-turn responses by trial type (change, control), with separate panels for the phonetic group and the nonphonetic group.]

It was also of interest to determine specifically how
the subjects distributed their responses among the individual sounds in the reinforced and unreinforced
categories. Figure 5 presents these data for the three infants in the phonetic group who were trained to turn to
the stop category. The six shaded columns to the left
show the percentage of head turns to each of the six stop
consonants presented on change trials; the six unshaded
columns to the right show the same data for the six nasal
consonants presented during control intervals. The
stimulus is given on the horizontal axis. Since the stimuli
were arranged in random order on the audiotape, the experimenter had no control over what stimulus would be
presented on a given trial. As a consequence, the
number of presentations of the stimuli varied somewhat.
The most obvious feature of Figure 5 is that, as noted
previously, many more head turns were observed on
change trials as compared to control trials. More specifically, however, infants seemed to turn in roughly equal
proportions in response to each of the six sounds in the
two categories; that is, they did not show any prominent,
consistent preference for a particular talker or place-of-articulation value. This was also true for the subgroup of
four infants reinforced for head turns in response to the
nasal consonants (see Figure 6). Again, the general picture is one of a relatively even distribution of responses
among the stimuli. It is especially interesting that the
infants did not show a preference for the stimulus used
in the initial-training stage, shown at the extreme left of
each graph. In fact, Figure 6 shows a slight tendency to
avoid the training token, although this effect is not particularly prominent.

FIGURE 5. Percent head-turn responses to each of the stimuli
presented during change trials (shaded columns) and control
trials (unshaded columns) for the phonetic subgroup in which
the stop category was reinforced (n = 3). The figures in parentheses indicate the number of times each stimulus was presented. M = male voice; F = female voice. [Bar graph, "Summary: Phonetic group, stops reinforced," with stimulus on the horizontal axis and percent head-turn responses on the vertical axis. Presentations: change trials bM (16), dM (32), gM (10), bF (17), dF (19), gF (20); control trials mM (16), nM (16), ŋM (13), mF (15), nF (17), ŋF (13).]

FIGURE 6. Percent head-turn responses to each of the stimuli
presented during change trials and control trials for the phonetic subgroup in which the nasal category was reinforced (n =
4). [Bar graph, "Summary: Phonetic group, nasals reinforced," with stimulus on the horizontal axis and percent head-turn responses on the vertical axis. Presentations: change trials mM (26), nM (30), ŋM (19), mF (33), nF (24), ŋF (18); control trials bM (21), dM (28), gM (23), bF (23), dF (22), gF (23).]

A clearer picture of these results can be obtained by
combining the data for all seven infants in the phonetic
group. This can be done by contrasting reinforced versus
unreinforced stimuli and collapsing the data into broader
categories such as "labial, male," "dental, male," and so
on. A graph combining the data from all subjects in the
phonetic group is shown in Figure 7. The impression of
an even distribution of responding to the stimuli is even
stronger in this graph. The mean response percentage to
the reinforced stimuli was 67.5%, with a range of only
8% and a standard deviation of 2.9%. A three-way analysis of variance for talker (male vs. female), place of articulation (labial vs. postdental vs. velar), and trial type
(change vs. control) revealed a significant main effect for
the trial-type factor only (F = 13.4; df = 1, 6; p < .01).
There were no effects for talker (F = 1.1; df = 1, 6; p NS)
or place of articulation (F = 1.4; df = 2, 12; p NS), and
none of the interactions approached significance.

FIGURE 7. Percent head-turn responses to each of the stimuli
presented during change trials and control trials for all subjects
in the phonetic group (n = 7). L = labial; D = postdental; V =
velar. [Bar graph, "Summary: Phonetic group, all subjects," with stimulus on the horizontal axis and percent head-turn responses on the vertical axis. Presentations: change trials LM (42), DM (62), VM (30), LF (50), DF (43), VF (36); control trials LM (37), DM (44), VM (36), LF (38), DF (39), VF (36).]

[Figure 8 panel, "Summary: Non-phonetic group, training stimulus: bF": bar graph with stimulus on the horizontal axis and percent head-turn responses on the vertical axis. Numbers of presentations as printed, in plotted order: 27, 17, 21, 27, 21, (illegible), 22, 25, 15, 25, 42, 10.]

The pattern of responding in the nonphonetic group was quite different from that of the
phonetic group. Figure 8 shows the percentage of head turns to each of the stimuli
presented to the group of four nonphonetic infants who were reinforced in initial
training for head turns to [ba] (female). As a group, these infants tended to turn more
often to the six stimuli in the reinforced class than to those in the unreinforced class
(25% vs. 16%). But, unlike the pattern observed for the phonetic infants, the responses
were distributed very unevenly among the six reinforced stimuli. Specifically, many more
responses were cued by the [ba] (female) stimulus, which served as the reinforced token
in the initial-training contrast. A very similar pattern can be seen in Figure 9 for the
subgroup of four infants trained with the categories reversed, that is, the infants for
whom [na] (male) served as the reinforced stimulus in initial training. Again, the
infants were responding most often to the stimulus used in the initial-training
contrast, with relatively low levels of responding to the other stimuli. As a group, the
eight subjects in the nonphonetic condition responded to 29% of the change trials,
compared to 19% of the control trials. However, when data are removed from trials on
which training stimuli were presented, the rate of responding on change trials is only
18%, almost identical to the response rate on control trials. This suggests that the
significant trial-type effect found for this group was due almost exclusively to
responses to the training stimulus.
[Figure 8 graph. Panel title: NON-PHONETIC GROUP, TRAINING STIMULUS: bF. Vertical axis:
percent head-turn responses. Horizontal axis: STIMULUS, grouped into Change and Control
trials.]
FIGURE 8. Percent head-turn responses to each of the stimuli presented during change
trials and control trials for the nonphonetic subgroup in which [ba] (female) served as
the training stimulus (n = 4).
[Figure 9 graph. Panel title: NON-PHONETIC GROUP, TRAINING STIMULUS: nM. Vertical axis:
percent head-turn responses. Horizontal axis: STIMULUS, grouped into Change and Control
trials.]
FIGURE 9. Percent head-turn responses to each of the stimuli presented during change
trials and control trials for the nonphonetic subgroup in which [na] (male) served as
the training stimulus (n = 4).
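The recomputation reported for the nonphonetic group, in which trials presenting the training stimulus are dropped before the change-trial response rate is calculated, can be sketched roughly as follows. This is only an illustration: the `Trial` record, the `response_rate` helper, and the trial data are hypothetical and are not taken from the study.

```python
from typing import NamedTuple, Optional, Sequence


class Trial(NamedTuple):
    """One test trial (hypothetical record layout, not from the paper)."""
    stimulus: str     # e.g. "bF" for [ba] spoken by the female talker
    trial_type: str   # "change" or "control"
    head_turn: bool   # True if the infant turned during the trial


def response_rate(trials: Sequence[Trial], trial_type: str,
                  exclude: Optional[str] = None) -> float:
    """Percent of trials of one type with a head turn, optionally
    leaving out every trial on which `exclude` was the stimulus."""
    kept = [t for t in trials
            if t.trial_type == trial_type and t.stimulus != exclude]
    if not kept:
        raise ValueError("no trials left after filtering")
    return 100.0 * sum(t.head_turn for t in kept) / len(kept)


# Made-up data: responses concentrated on the training token "bF".
trials = (
    [Trial("bF", "change", True)] * 3 + [Trial("bF", "change", False)]
    + [Trial("gM", "change", True)] + [Trial("gM", "change", False)] * 3
    + [Trial("nM", "control", True)] + [Trial("nM", "control", False)] * 3
)

print(response_rate(trials, "change"))                 # all change trials
print(response_rate(trials, "change", exclude="bF"))   # training token removed
print(response_rate(trials, "control"))                # false-alarm rate
```

In this made-up example the change-trial rate falls to the control-trial rate once the training token is excluded, which is the same qualitative pattern the text describes for the nonphonetic infants.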
It was not possible to combine the data from the two subgroups in the nonphonetic
condition. For the phonetic condition this was accomplished by combining the responses
to reinforced stimuli which shared values on all dimensions except the stop/nasal
dimension. This perfect symmetry, of course, did not exist for the nonphonetic
categories. Consequently, it was not possible to line up each stimulus in one category
with a stimulus in the other category that differed on a single feature value.
TABLE 3. Number of trials required to reach criterion for subjects in the phonetic and nonphonetic groups.
Subject
Profiles of Individual Subjects
The data presented thus far are the results of averages from groups of subjects.
Results from the seven individual infants in the phonetic group are presented in
Figure 10. Three measures are given to the right of each graph: (a) the percentage of
head turns on change trials (CH), (b) the percentage of head turns on control trials
(CL), and (c) the overall percent correct on both change and control trials (%C). These
graphs should be examined with some caution because of the variation in the number of
presentations of the stimuli, given in parentheses on the horizontal axis. Since the
experimenter had no control over which stimulus was presented on a given trial, some of
the data points in these graphs are based on very few responses. Examination of these
data clearly shows that the infants do not form a homogeneous group. Two of the infants,
Subjects 3 and 7, appeared to be responding randomly to the stimuli, while the remaining
five infants performed with relatively high accuracy.
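The relationship among the three measures can be illustrated with a short sketch. It assumes that change and control trials were presented in equal numbers, so that percent correct is the average of the hit rate (CH) and the correct-rejection rate (100 - CL); this relationship is inferred from the values printed beside the individual graphs rather than stated in the text, and the function name `percent_correct` is hypothetical.

```python
import math


def percent_correct(ch: float, cl: float) -> int:
    """Overall percent correct from CH and CL, assuming equal numbers of
    change and control trials (an assumption, not stated in the paper).

    A head turn is correct on a change trial; withholding a head turn is
    correct on a control trial.
    """
    value = (ch + (100.0 - cl)) / 2.0
    # Round halves upward explicitly; Python's built-in round() rounds
    # halves to the nearest even integer instead.
    return math.floor(value + 0.5)


# CH/CL pairs as printed beside individual graphs in Figures 10 and 11.
for ch, cl in [(86, 30), (68, 15), (88, 9), (19, 26), (43, 11)]:
    print(ch, cl, percent_correct(ch, cl))
```

Under this assumption the computed values agree with the %C figures printed for the same panels (78, 77, 90, 47, and 66), and a subject responding at similar rates on change and control trials lands near 50%.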
Figure 11 shows the response patterns of the eight infants tested in the nonphonetic
group. Intersubject variability in the performance of these subjects is also evident.
Some of the infants, particularly Subjects 1, 2, 4, and 6, apparently found the task
very difficult and produced what seemed to be essentially random head-turn responses to
the 12 stimuli. Other infants, however, responded with some consistency to the stimulus
used in the initial-training contrast, shown at the extreme left of each graph.
Subject 8, in fact, appeared to have memorized a second stimulus. It is interesting that
this second stimulus ([ga], female) has little in common with the training stimulus
([na], male). On the other hand, Subject 3, who was initially trained to [ga] (female),
responded almost exclusively to the tokens produced by the female talker. The pattern
shown by this infant is more typical of other infants who have been run using this type
of procedure, that is, some attempt by the infant to formulate a general rule to
organize the stimulus categories (Kuhl, Holmberg, Morgan, Hillenbrand, & Cameron,
Note 3).
Results from Preliminary Stages
The data described to this point were derived from analyses of the infants' responses
on the final stage of the experiment. This section provides a brief description of the
results from the preliminary stages of the experiment; a more detailed account of these
results can be found in Hillenbrand (Note 4). Table 3 shows the results from the first
three experimental stages and from the conditioning phase for infants in the phonetic
and nonphonetic groups. For the conditioning phase the criterion was three consecutive
anticipatory head turns; for the three experimental stages the criterion was nine
correct responses in 10 consecutive trials.

Subject    Conditioning
Phonetic group
1          10
2          13
3          11
4          10
5          20
6          21
7          20
Nonphonetic group
1          8
2          6
3          7
4          9
5          20
6          9
7          5
8          3
[Experimental stage columns 1-3: cell positions are not recoverable in this copy.
Legible entries, in printed order: 20, 10, 16, 12, 15, 17, 14, 10, 10, 10, 10; the
remaining legible entries are dashes.]
a Subject failed to meet criterion (indicated by dashes).
One fairly prominent finding from these tables is that, on the average, infants in the
phonetic group required more trials to reach the conditioning criterion (x̄ = 15.0) than
did infants in the nonphonetic group (x̄ = 8.7). This difference was predictable since
the nonphonetic infants were trained on a contrast involving differences in several
acoustic dimensions, while the phonetic infants were trained on a minimal pair. A second
feature of interest in these tables is that in the majority of cases infants did not
meet the 9-out-of-10 accuracy criterion and, consequently, were progressed to the next
experimental stage after 20 trials. This was true for both groups and for all three
stages. This was not particularly surprising since previous work has shown that infants
typically require more than 20 trials to reach criterion on consonant contrasts
(Holmberg et al., 1977).
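The stage-advancement rule (nine correct responses in 10 consecutive trials, with progression to the next stage after 20 trials regardless) can be sketched as a sliding-window check. This is a minimal sketch, not the study's programming logic: the function name and parameters are hypothetical, and it assumes the criterion can first be met on the tenth trial of a stage.

```python
from typing import Optional, Sequence


def trials_to_criterion(outcomes: Sequence[bool], needed: int = 9,
                        window: int = 10,
                        max_trials: int = 20) -> Optional[int]:
    """Return the trial number on which the criterion is first met, or
    None if it is not met within `max_trials` (the subject then simply
    moves on to the next stage).

    `outcomes` holds one True/False per trial (True = correct response).
    """
    limit = min(len(outcomes), max_trials)
    for end in range(window, limit + 1):
        # Count correct responses in the 10 trials ending at trial `end`.
        if sum(outcomes[end - window:end]) >= needed:
            return end
    return None


perfect = [True] * 20
slow_start = [False, False] + [True] * 18
alternating = [True, False] * 10

print(trials_to_criterion(perfect))
print(trials_to_criterion(slow_start))
print(trials_to_criterion(alternating))
```

A subject who never strings together nine correct responses in any 10-trial window returns `None` here, which corresponds to the dashes in Table 3.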
A more revealing picture of the infants' performance throughout the experiment can be
seen by examining the overall percentage of correct responses as a function of the
experimental stage. Mean and standard deviation percent correct for each experimental
stage are plotted in Figure 12 for the phonetic group and in Figure 13 for the
nonphonetic group. Figure 12 shows that there was no tendency for the performance of the
phonetic infants to decline as the experiment became more complex. In fact, these data
show a slight trend in the opposite direction. In contrast, the performance of the
nonphonetic infants dropped rather sharply from stage 1 to stage 2 and remained at a
relatively low level. These results suggest that the nonphonetic infants were able to
learn the head-turn task but were unable to memorize the unrelated tokens that were
added as the experiment progressed.
[Figure 10 graphs: seven panels, one per subject (PHONETIC GROUP, Subjects 1-7).
Vertical axis: percent head-turn responses. Horizontal axis: STIMULUS, grouped into
Change and Control trials, with the number of presentations of each stimulus in
parentheses. Summary values legible beside the panels (CH, CL, %C): 86, 30, 78;
100, 26, 87; 68, 15, 77; 88, 9, 90; 19, 26, 47; 34, 28, 53; 79, 9, 85.]
FIGURE 10. Individual response profiles for subjects in the phonetic group. The figures
to the right of each graph indicate the percentage of responses on change trials (CH),
the percentage of responses on control trials (CL), and the overall percent correct
(%C).
[Figure 11 graphs: eight panels, one per subject (NON-PHONETIC GROUP, Subjects 1-8).
Vertical axis: percent head-turn responses. Horizontal axis: STIMULUS, grouped into
Change and Control trials, with the number of presentations of each stimulus in
parentheses. Summary values legible beside the panels (CH, CL, %C): 15, 18, 49;
18, 13, 53; 14, 8, 53; 37, 30, 54; 42, 10, 66; 32, 19, 57; 28, 24, 52; 43, 11, 66.]
FIGURE 11. Individual response profiles for subjects in the nonphonetic group. The
figures to the right of each graph indicate the percentage of responses on change trials
(CH), the percentage of responses on control trials (CL), and the overall percent
correct (%C).
[Figure 12 graph. Vertical axis: percent correct. Horizontal axis: EXPERIMENTAL STAGE
(1-4).]
FIGURE 12. Overall percent correct for each experimental stage for the phonetic group.
The error bars indicate one standard deviation.
[Figure 13 graph. Vertical axis: percent correct. Horizontal axis: EXPERIMENTAL STAGE
(1-4).]
FIGURE 13. Overall percent correct for each experimental stage for the nonphonetic
group. The error bars indicate one standard deviation.
An Additional Control Condition
As was discussed previously, the nonphonetic condition was designed to test infants on
a set of stimuli comparable to that used in the phonetic condition but which could not
be grouped on the basis of auditory similarity. The relatively good performance of
infants in the phonetic group led to the conclusion that these subjects recognized
similarities among sounds in the stimulus categories. However, infants in the phonetic
group were initially trained on a more difficult contrast than were infants in the
nonphonetic group. As a consequence, more infants in the phonetic group failed to meet
the conditioning criterion. For this reason, it could be argued that the
phonetic/nonphonetic difference was the result of bias in subject selection. It is
possible that the more difficult initial-training contrast in the phonetic condition
resulted in the selection of better subjects than those in the nonphonetic condition.
To test for this possibility, an additional control condition was run using a
nonphonetic task in which the initial-training contrast was the same as that for the
phonetic group, [ma] versus [ba]. The experimental stages for this condition are shown
in Table 4. As in the phonetic condition, the initial-training stage contrasted [ma]
(male) with [ba] (male). However, as in the nonphonetic condition described previously,
stimuli were added in subsequent stages in such a way that the categories could not be
organized by talker or by place or manner of production. Testing procedures were
identical to those described previously except that the tape deck and modular
programming logic were replaced by a digital computer (DEC PDP 11/34). A computer
program presented stimuli and controlled experimental contingencies according to the
same rules and with the same timing parameters as were used to design the programming
logic described previously. Six 5½- to 6½-month-old infants began testing; two of these
subjects failed to pass the conditioning criterion.
TABLE 4. Experimental stages for an additional nonphonetic control condition.
Stage   Category 1                                        Category 2
1       ba (M)                                            ma (M)
2       ba (M), ga (F)                                    ma (M), ŋa (M)
3       ba (M), ga (F), ma (F), na (M)                    ma (M), ŋa (M), da (F), ba (F)
4       ba (M), ga (F), ma (F), na (M), da (M), ŋa (F)    ma (M), ŋa (M), da (F), ba (F), ga (M), na (F)
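The design constraint on this control condition (categories that cannot be sorted by talker, place, or manner) can be checked mechanically. The sketch below is an illustration only: it encodes one reading of the Table 4 entries, treats a feature as "separating" when the two categories share no value on it, and uses hypothetical names (`separating_features`, `CATEGORY_1`, `CATEGORY_2`) that do not come from the paper.

```python
# Feature values for the six consonants; "postdental" follows the
# place label used in the Figure 7 caption.
MANNER = {"b": "stop", "d": "stop", "g": "stop",
          "m": "nasal", "n": "nasal", "ŋ": "nasal"}
PLACE = {"b": "labial", "m": "labial",
         "d": "postdental", "n": "postdental",
         "g": "velar", "ŋ": "velar"}


def features(token):
    """Map a (consonant, talker) pair to its three feature values."""
    consonant, talker = token
    return {"talker": talker,
            "place": PLACE[consonant],
            "manner": MANNER[consonant]}


def separating_features(cat1, cat2):
    """Features on which the two categories have no value in common."""
    found = []
    for name in ("talker", "place", "manner"):
        values1 = {features(t)[name] for t in cat1}
        values2 = {features(t)[name] for t in cat2}
        if values1.isdisjoint(values2):
            found.append(name)
    return found


# Tokens in the order they are added across stages (reading of Table 4).
CATEGORY_1 = [("b", "M"), ("g", "F"), ("m", "F"),
              ("n", "M"), ("d", "M"), ("ŋ", "F")]
CATEGORY_2 = [("m", "M"), ("ŋ", "M"), ("d", "F"),
              ("b", "F"), ("g", "M"), ("n", "F")]
STAGE_SIZES = (1, 2, 4, 6)  # tokens per category at stages 1-4

for stage, size in enumerate(STAGE_SIZES, start=1):
    print(stage, separating_features(CATEGORY_1[:size], CATEGORY_2[:size]))
```

Under this reading, the initial-training contrast is separable by manner alone, and by the final stage no single feature distinguishes the two categories, which is consistent with the description of the condition as nonphonetic.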
The results of this control experiment do not support the possibility that the
phonetic/nonphonetic difference was due exclusively to bias in subject selection.
Average performance on the initial-training stage was 68% correct, comparable to that of
the phonetic group. However, unlike the performance of the phonetic group, these
subjects' performance fell very close to chance and stayed there for the remaining
stages. Average performance for the final stage was 58% correct. These findings support
the conclusion that infants in the phonetic condition performed well because they
recognized the perceptual similarity of syllables sharing a value on a feature
dimension.
DISCUSSION
The principal findings of this study were:
1. The overall performance of infants in the phonetic
group was significantly better than that of the nonphonetic group.
2. The phonetic infants tended to distribute their responses more or less evenly among the stimuli in the
reinforced category, while infants in the nonphonetic
group tended to favor the stimulus that was used in
the initial-training contrast.
3. There was no evidence of a systematic decline in the
performance of phonetic infants as the experiment
became more complex, whereas the performance of
infants in the nonphonetic group tended to drop as
tokens were added to the two categories.
These results suggest that infants do recognize the
similarity of speech sounds that share a value on a
phonetic-feature dimension. The alternate possibility
that simple rote memorization was responsible for these
results seems unlikely in light of the relatively poor
overall performance of infants in the nonphonetic group.
This same phonetic/nonphonetic difference was also
found in a similar study examining categorization of
fricatives (Kuhl et al., Note 3) and in a study examining
categorization of nasal consonants (Hillenbrand, Note 2).
It is important to point out, however, that the nonphonetic results do not prove that memorization was not involved in any form in the phonetic condition. It is a
well-established finding that memorization is most efficient when the items to be recalled can be organized in
some fashion (e.g., see Bartlett, 1932; Bransford & Franks, 1974; Tulving & Donaldson,
1972). The phonetic/nonphonetic effect suggests that if memorization was involved, the
process was aided by the perceptual similarity of the speech sounds. Whatever the exact
role of memory in these experiments, it appears that recognition of perceptual similarity is a necessary condition
for good performance on this kind of task.
One additional issue that needs to be addressed in interpreting these findings concerns the discriminability of
tokens within the stop and nasal categories. To qualify as
categorization, it must be demonstrated that the tokens in the particular class are
being treated as equivalent even though they are discriminably different. That is, it
would not be interesting to demonstrate common responses to the class [b, d, g] if
infants could not discriminate stop-consonant place of articulation. The literature
provides ample evidence that infants can discriminate among voiced stop consonants
(Eimas, 1974; Morse, 1972). In addition, a recent experiment using procedures very
similar to those described in this report provides evidence for the discrimination of
nasal-consonant place of articulation by young infants (Hillenbrand, Note 2). These
discrimination results suggest that infants in the present study demonstrated what
Bornstein (1981) has called "equivalence classification," or "the equivalent treatment
of discriminably different stimuli based on their perceptual similarity" (p. 40).
Perceptual Development and Theories of Speech
Perception
The present results extend the findings of previous research on infants in which speech-sound categorization
was tested at the level of the phonetic segment (Fodor et
al., 1975; Holmberg et al., 1977; Kuhl, 1977; 1979b; Kuhl
& Miller, 1982; Hillenbrand, Note 2). Taken as a group,
these studies suggest that young infants have relatively
sophisticated abilities to focus on the critical acoustic
dimensions that "define" speech-sound categories while
ignoring prominent variation in noncritical dimensions.
These findings are analogous to the more extensive developmental literature on perceptual constancies in vision. The work of Bower (1964), for example, suggests
that young infants perceive the true size of an object despite the substantial variations in retinal-image size that
result when object-observer distance is changed.
The exact role of experience is not clear in these vision experiments, nor is it a simple issue in relation to
the infant studies on speech-sound categories. Since the
subjects in these studies were not newborns, it is not
possible to rule out learning or simply the effects of exposure to speech in accounting for these results. Two
conclusions seem reasonable, however. First, if these
abilities are learned, they are learned very quickly and
apparently without any specific training. Second, and
perhaps more important than the specific question of innateness, these abilities predate the acquisition of detailed knowledge of speech production and the acquisition of sophisticated speech-comprehension abilities.
This observation bears directly on specific theoretical
debates in speech-perception research. An important
contention of "motor theories" of speech perception is
that the invariance problem is resolved by processes that
involve the mediation of articulatory knowledge. The results of the present study, and other demonstrations of
perceptual constancy for speech by infants, suggest that
sophisticated articulatory knowledge is not a necessary
condition for the demonstration of these abilities. It appears that prelinguistic infants are capable of extracting
the acoustic properties that form the basis of phonetic
categories. If this general finding is corroborated by further research, it would seem to support the auditory-based theories proposed by Fant and others (Fant, 1967;
Miller, 1977; Miller et al., 1977; Searle et al., 1979).
However, it is possible to formulate a version of an
articulation-based theory consistent with the infant findings. It is necessary only to
assume that the articulatory knowledge which mediates the perception of speech is
phylogenetically rather than ontogenically acquired; that is, that part of human genetic
endowment is a species-specific mechanism for speech perception. In fact, this sort of
approach has been successful in explaining the perception of biologically relevant
signals in other species (Hailman, 1969; Marler, 1970; 1976). However, recent
experiments on speech perception by nonhuman listeners are not consistent with this
view. Research on the dog (Baru, 1975) and the chinchilla (Burdick & Miller, 1975; Kuhl
& Miller, 1975; 1978) suggests that
nonhuman listeners are able to sort speech sounds on
the basis of phonetic similarity across variations in noncritical dimensions. Taken together, the infant and animal findings suggest that acoustic invariants are available in the speech signal and, further, that the mammalian auditory system seems capable of extracting these
properties in a variety of contexts.
Implications for Phonological Development
The phonetic condition contrasted a category of voiced
stop consonants with a category of nasal consonants. The
performance of subjects in this task indicates that infants
are capable of organizing speech sounds on the basis of
categories at least this broad or "abstract." The feature
categories tested, however, are phonologically organized
within even broader feature classes, such as [±continuant] or [±sonorant]. It would be interesting to determine whether infants are capable of organizing
speech sounds based on very broad feature categories
such as these. For example, would infants reinforced for
head turns to nasal consonants also respond to presentations of other sonorants, such as liquids and semivowels,
but not to presentations of obstruents, such as fricatives
and affricates? The importance of determining the infant's proclivities for classifying speech sounds is that
these kinds of perceptual abilities may form the basis for
acquiring phonological rules that appeal to feature
categories.
There are a number of phonological rules that appeal
to the nasal/oral distinction. For example, in most
dialects of American English, voiced stops that precede
homorganic syllabic nasals are released nasally rather
than orally (e.g., "sadden"). Most descriptions of
phonological rule systems suggest that rules such as
these are specified in terms of values on feature dimensions rather than individual phonetic segments. While
the present results do not argue that infants are born
with anything that could be described as "phonological
knowledge," it is possible that the acquisition of
phonological rules may be aided by the infant's recognition of the inherent perceptual similarity of speech
sounds sharing particular feature values. On a related issue, some investigators have argued that children do not
learn the sound system of their language in a
straightforward "segment-by-segment" fashion, but
rather by learning the hierarchical organization of features and feature contrasts (Blache, 1978; Jakobson,
1968; Smith, 1973). More detailed studies of the type
presented here might reveal a relationship between the
acquisition of phonological rules and phonetic segments
and the relative difficulty of organizing speech sounds
along various feature dimensions.
ACKNOWLEDGMENTS
This work is a portion of a dissertation conducted at the University of Washington's Child Development and Mental Retardation Center under the direction of Patricia Kuhl. Her careful
guidance is gratefully acknowledged, as is the advice of Fred
Minifie, Wesley Wilson, and Philip Dale. I would also like to
thank Jean Tully, Tristan Holmberg, Chris Prall, and Kyum-Ha
Lee for their valuable contributions to this project. This work
was supported by a research contract from the National Institute
of Child Health and Human Development to Dr. Fred Minifie
(NICHD HD-3-2793), a grant from the National Science Foundation to Dr. Patricia Kuhl (BNS 79-13767), and by an Annual
Fund Doctoral Fellowship to the author from the Graduate
School of the University of Washington.
REFERENCE NOTES
1. KUHL, P. K., & HILLENBRAND, J. Speech perception by young infants: Perceptual
constancy for categories based on pitch contour. Paper presented at the biennial
meeting of the Society for Research in Child Development, San Francisco, 1979.
2. HILLENBRAND, J. Speech perception by infants: Categorization along a nasal consonant
place dimension. Manuscript submitted for publication.
3. KUHL, P. K., HOLMBERG, T. L., MORGAN, K. A., HILLENBRAND, J., & CAMERON, P.
Perception of equivalence for fricatives in CV syllables. Manuscript in preparation.
4. HILLENBRAND, J. Perceptual organization of speech sounds by young infants.
Unpublished doctoral dissertation, University of Washington, 1980.
5. PRALL, C. W., & HILLENBRAND, J. AUDED: A time-domain analysis and editing program for
audio signals. Technical report, Northwestern University, Evanston, IL, 1980.
REFERENCES
BARTLETT, F. C. Remembering. Cambridge, England: Cambridge University Press, 1932.
BARU, A. V. Discrimination of synthesized vowels [a] and [i]
with varying parameters in dog. In G. Fant & M. A. A.
Tatham (Eds.), Auditory analysis and perception of speech.
New York: Academic Press, 1975.
BLACHE, S. E. The acquisition of distinctive features. Baltimore: University Park Press, 1978.
BORNSTEIN, M. H. Two kinds of perceptual organization near
the beginning of life. In W. A. Collins (Ed.), Aspects of the
development of competence. Hillsdale, NJ: Lawrence
Erlbaum Associates, 1981.
BOWER, T. G. R. Discrimination of depth in premotor infants.
Psychonomic Science, 1964, 1, 368.
BRANSFORD, J. D., & FRANKS, J. J. Memory for syntactic form as
a function of semantic context. Journal of Experimental Psychology, 1974, 103, 1037-1039.
BURDICK, C. K., & MILLER, J. D. Speech perception by the
chinchilla: Discrimination of sustained /a/ and /i/. Journal of
the Acoustical Society of America, 1975, 58, 415-427.
EIMAS, P. D. Auditory and linguistic processing of cues for
place of articulation by infants. Perception & Psychophysics,
1974, 16, 513-521.
EIMAS, P. D., & MILLER, J. L. Discrimination of information for
manner of articulation. Infant Behavior and Development,
1980, 3, 367-375.
FANT, G. Acoustic theory of speech production. The Hague:
Mouton, 1960.
FANT, G. Auditory patterns of speech. In W. Wathen-Dunn
(Ed.), Models for the perception of speech and visual form.
Cambridge: MIT Press, 1967.
FODOR, J. A., GARRETT, M. F., & BRILL, S. L. Pi-ka-pu. The
perception of speech sounds by pre-linguistic infants. Perception & Psychophysics, 1975, 18, 74-78.
FUJIMURA, O. Analysis of nasal consonants. Journal of the
Acoustical Society of America, 1962, 34, 1865-1875.
HAILMAN, J. P. How an instinct is learned. Scientific American,
1969, 221, 98-106.
HILLENBRAND, J. Categorization of stop and nasal consonants
by young infants. Journal of the Acoustical Society of
America, 1980, 68(Suppl. 1), S31(A).
HILLENBRAND, J., MINIFIE, F. D., & EDWARDS, T. J. Tempo of
spectrum change as a cue in speech-sound discrimination by
infants. Journal of Speech and Hearing Research, 1979, 22,
147-165.
HOLMBERG, T. L., MORGAN, K. A., & KUHL, P. K. Speech perception in early infancy: Discrimination of fricative consonants. Journal of the Acoustical Society of America, 1977,
62(Suppl. 1), S99(A).
JAKOBSON, R. Child language, aphasia and phonological universals. The Hague: Mouton, 1968.
KUHL, P. K. Speech perception in early infancy: Perceptual
constancy for the vowel categories/a/and/o/. Journal of the
Acoustical Society of America, 1977, 62(Suppl. 1), $39(A).
KUHL, P. K. Models and mechanisms in speech perception:
Species comparisons provide further contributions. Brain,
Behavior and Evolution, 1979, 16,374-408. (a)
KL~L, P. K. Speech perception in early infancy: Perceptual
constancy for spectrally dissimilar vowel categories. Journal
of the Acoustical Society of America, 1979, 66, 1668-1679. (b)
KUHL, P. K., & MILLER, J. D. Speech perception by the chinchilla: Voiced-voiceless distinctions in alveolar plosive consonants. Science, 1975,190, 69-72.
KUItL, P. K., & MILLER, J. D. Speech perception by the chinchilla: Identification for synthetic VOT stimuli. Journal of the
Acoustical Society of America, 1978, 63, 905-917.
KUHL, P. K., & MILLER, J. D. Discrimination of auditory target
dimensions in the presence or absence of variation in a second dimension by infants. Perception & Psychophysics, 1982,
31,279-292,.
LmERMAN, A. M. The grammars of speech and language. Cognitive Psychology, 1970, 1,301-323.
LIBERMAN, A. M., COOPER, F. S., SHANKWEILER, D. P., &
STUDDERT-KENNEDY, M. Perception of the speech code.
Psychological Review, 1967, 74, 431-461.
MARLER, P. A comparative approach to vocal learning: Song
development in white-crowned sparrows. Psychological
Monographs, 1970, 71, 1-25.
MARLER, P. Sensory templates in species-specific behavior. In
J. Fentress (Ed.), Simpler networks and behavior. Sunderland: Sinauer Associates, 1976.
MILLER, J. D. Perception of speech by animals: Evidence for
speech processing by mammalian auditory systems. In T. H.
Bullock (Ed.), Recognition of complex auditory signals. Berlin: Abakon Verlagsgesellschaft, 1977.
MILLER, J. D., ENGEBRETSON, A. M., SPENNER, B. F., & COX, J.
R. Preliminary analysis of speech sounds with a digital model
of the ear. Journal of the Acoustical Society of America, 1977,
62(Suppl. 1), S13(A).
MORSE, P. A. The discrimination of speech and non-speech in
early infancy. Journal of Child Psychology, 1972, 14, 477-492.
SEARLE, C. L., JACOBSON, J. Z., & RAYMENT, S. G. Stop consonant discrimination based on human audition. Journal of the
Acoustical Society of America, 1979, 65, 799-809.
SMITH, N. V. The acquisition of phonology: A case study. Cambridge, England: Cambridge University Press, 1973.
STEVENS, K. N., & BLUMSTEIN, S. E. Invariant cues for place of
articulation in stop consonants. Journal of the Acoustical Society of America, 1978, 64, 1358-1368.
STEVENS, K. N., & HOUSE, A. S. Speech perception. In J. V.
Tobias (Ed.), Foundations of modern auditory theory (Vol. 2).
New York: Academic Press, 1972.
TULVING, E., & DONALDSON, W. (Eds.). Organization of memory. New York: Academic Press, 1972.
Received March 2, 1982
Accepted August 12, 1982
Requests for reprints should be sent to James Hillenbrand,
Department of Communicative Disorders, Northwestern University, 2299 Sheridan Road, Evanston, IL 60201.
APPENDIX
Table A shows the results of acoustic measurements on the stop-vowel and nasal-vowel stimuli used in the infant
tests. All measurements were made using the program AUDED (Prall & Hillenbrand, Note 5) written for a DEC PDP
11 computer. Fundamental frequency was measured for the vocalic portion of each utterance by displaying successive 100-msec segments of the waveform on a high-resolution graphics terminal (Tektronix 4010) and using a
cross-hair cursor to mark the boundaries of each pitch period. For simplicity, the table shows only mean fundamental frequency. All utterances showed rise/fall fundamental frequency contours. Intensity was measured by a program that simply calculated an RMS value over all data points in the waveform and converted the value to a decibel
scale. All values in the table are given in relation to [ba] (male), which was arbitrarily set to 65 dB. The overall
duration of each utterance was measured from the same graphics displays as those used to calculate fundamental
frequency.
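The intensity and fundamental frequency measurements described above can be illustrated with a minimal sketch. This is not the AUDED program; the function names are hypothetical, and two details are assumptions not stated in the text: that the decibel conversion uses 20 log10 of the RMS ratio against the reference utterance, and that mean fundamental frequency is the average of the reciprocals of the marked pitch periods.

```python
import math


def rms(samples):
    """Root-mean-square value over all data points in a waveform."""
    if not samples:
        raise ValueError("waveform must contain at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def relative_level_db(samples, reference_samples, reference_db=65.0):
    """Level in dB relative to a reference waveform pinned at reference_db.

    Assumption: 20*log10 of the RMS amplitude ratio. The appendix only says
    the RMS value was converted to a decibel scale with [ba] (male) set to 65 dB.
    """
    ratio = rms(samples) / rms(reference_samples)
    return reference_db + 20.0 * math.log10(ratio)


def mean_f0_hz(period_marks_sec):
    """Mean F0 from hand-marked pitch-period boundaries (times in seconds).

    Assumption: each period contributes 1/duration, and the results are averaged.
    """
    periods = [b - a for a, b in zip(period_marks_sec, period_marks_sec[1:])]
    if not periods:
        raise ValueError("need at least two boundary marks")
    return sum(1.0 / p for p in periods) / len(periods)


if __name__ == "__main__":
    # Synthetic 100-Hz sine sampled at 10 kHz stands in for a recorded utterance.
    reference = [math.sin(2 * math.pi * 100 * n / 10000) for n in range(1000)]
    louder = [2.0 * x for x in reference]
    print(relative_level_db(louder, reference))
    print(mean_f0_hz([0.0, 0.01, 0.02, 0.03]))
```

Pinning the reference utterance at 65 dB makes every other value a difference in level, which is all the table needs for comparing stimuli.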
TABLE A. Fundamental frequency, intensity, and duration measurements of the stop and nasal stimuli. Fundamental frequency means and standard deviations are given separately for the male and female talkers.

Stimuli        Fundamental        RMS intensity    Duration
               frequency (Hz)     (dB)             (msec)

ba (male)      80.7               65.0             479.2
da (male)      81.5               64.5             562.8
ga (male)      82.1               64.3             518.6
ba (female)    197.6              69.1             407.2
da (female)    194.8              69.5             474.4
ga (female)    197.4              68.4             560.4
mean           81.4/196.6         66.8             500.4
SD             0.7/1.6            2.4              59.4

ma (male)      83.9               64.8             505.6
na (male)      84.0               64.3             561.9
ŋa (male)      81.3               64.4             535.2
ma (female)    188.0              71.2             487.7
na (female)    197.0              71.9             508.2
ŋa (female)    192.4              71.5             484.9
mean           83.1/192.4         68.0             513.9
SD             1.5/4.5            3.9              29.6
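The summary rows of Table A can be recomputed from the individual entries. The sketch below assumes the SD rows are sample standard deviations (n - 1 denominator); the source does not say so, but that choice appears to agree with the printed mean and SD rows to within 0.1. The dictionary layout and function names are hypothetical.

```python
from statistics import mean, stdev

# Values transcribed from Table A. Within each list the order is the three
# male utterances followed by the three female utterances.
TABLE_A = {
    "stops": {
        "f0_male": [80.7, 81.5, 82.1],
        "f0_female": [197.6, 194.8, 197.4],
        "intensity": [65.0, 64.5, 64.3, 69.1, 69.5, 68.4],
        "duration": [479.2, 562.8, 518.6, 407.2, 474.4, 560.4],
    },
    "nasals": {
        "f0_male": [83.9, 84.0, 81.3],
        "f0_female": [188.0, 197.0, 192.4],
        "intensity": [64.8, 64.3, 64.4, 71.2, 71.9, 71.5],
        "duration": [505.6, 561.9, 535.2, 487.7, 508.2, 484.9],
    },
}


def summarize(values):
    """Return (mean, sample SD), each rounded to one decimal place."""
    # Assumption: stdev() uses the n - 1 denominator, taken here to match the table.
    return round(mean(values), 1), round(stdev(values), 1)


def summary_rows(table=TABLE_A):
    """Compute the mean/SD pair for every measure in every stimulus class."""
    return {
        stimulus_class: {name: summarize(vals) for name, vals in measures.items()}
        for stimulus_class, measures in table.items()
    }


if __name__ == "__main__":
    for stimulus_class, measures in summary_rows().items():
        for name, (m, sd) in measures.items():
            print(f"{stimulus_class:7s} {name:10s} mean={m} SD={sd}")
```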