The feature [sonorant] in auditory word identification

The feature [sonorant] in spoken word recognition
Chao-Yang Lee1, Danny R. Moates2, and Russell Fox2
1School of Hearing, Speech and Language Sciences; 2Department of Psychology,
Ohio University, Athens, OH 45701, USA
Running title: Sonorant in word recognition
Corresponding author:
Chao-Yang Lee
School of Hearing, Speech and Language Sciences
Grover Center W225
Ohio University
Athens, OH 45701, USA
Telephone: (740) 593-0232
Fax: (740) 593-0287
E-mail: leec1@ohio.edu
ABSTRACT
Distinctive features reveal the internal organization of phonetic segments, but
their role in representing and processing spoken words by humans has not been evaluated
extensively. It is normally assumed in models of spoken word recognition that all
phonetic segments or features are treated equally in lexical processing, but this
assumption has been challenged by findings showing varying degrees of difficulty in
word reconstruction. The present study examined the role of the feature [sonorant] in two
form priming experiments with the lexical decision task. Participants responded to
prime-target pairs in which a non-word prime contrasted with a real-word target in one
consonant; the substituted consonant either matched the target consonant in the feature
[sonorant] (e.g., another obstruent replacing the [f] of conform) or mismatched it (e.g.,
[kənwɔrm], with [w] replacing the [f] of conform). Responses were faster when the prime
and target matched in the feature [sonorant]. The effect, however, was limited to fricative
targets; neither stop targets nor sonorant targets showed a match effect. These findings
suggest that speech sounds classified by the feature [sonorant] are processed differently
during spoken word recognition and that this processing difference is modulated by
further featural classifications.
INTRODUCTION
Spoken word recognition involves extracting information from the acoustic signal
and mapping the input onto the mental lexicon. Naturally, two major issues in the study
of spoken word recognition are the mechanism of the mapping and the nature of lexical
representations. Cognitive models of spoken word recognition generally agree that the
mapping process involves lexical activation and competition (e.g.,
Luce & Pisoni, 1998; Marslen-Wilson & Welsh, 1978; McClelland & Elman, 1986;
Norris, 1994). As for lexical representations, units of lexical processing have been
proposed at the level of the word, the syllable, the phonetic segment, distinctive features,
and spectral templates.
The purpose of this study was to examine the role of distinctive features in spoken
word recognition. In particular, we investigated the role of the feature [sonorant] in
lexical processing with two form priming experiments. Although the nature of sublexical
representations remains debated, implicit in most models of spoken word recognition is
the assumption that all types of a sublexical unit are treated equally in lexical processing.
For example, for models with the phonetic segment as the basic unit, all phonetic
segments, vowels or consonants, are normally assumed to be equally effective in
participating in lexical activation and competition.
This assumption, however, has been challenged by studies showing the vowel
mutability effect (van Ooijen, 1996; Cutler, Sebastián-Gallés, Soler-Vilageliu, & van Ooijen,
2000). In particular, van Ooijen (1996) showed in a word reconstruction task that English
listeners tend to change vowels rather than consonants when asked to turn a non-word
sequence into a real word by changing one sound. For example, when given the non-word
sequence teeble, listeners are more likely to propose table rather than feeble. Cutler et al.
(2000) further showed in a cross-linguistic study on word reconstruction that the
tendency to change vowels rather than consonants appeared to be language independent,
reflecting the intrinsic differences between the information provided by vowels and
consonants. In other words, not all phonetic segments are treated equally in lexical
processing.
Obviously all sounds are not equal in their internal structure. Linguists have long
noted that the phonetic segment can be further analyzed into bundles of distinctive
features (Jakobson, 1928; Jakobson, Fant, & Halle, 1952; Chomsky & Halle, 1968).
Importantly, these categorical features are grounded in physical principles governing the
articulatory-acoustic-auditory relationships (Stevens, 1972, 1989, 1997). Specifically,
non-linear or “quantal” relations exist in the mapping from articulation onto acoustics and
from acoustics onto auditory responses. Consequently, continuous changes in one domain
(e.g., articulation) could result in discrete changes in another domain (e.g., acoustics). It
is these non-linear relations that serve as the basis for the categorically specified
distinctive features (Stevens, 1972, 1989, 1997, 2001).
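To make the quantal idea concrete, the toy sketch below (our own construction, not data from Stevens) maps a continuous articulatory parameter onto an acoustic output with two stable plateaus separated by a steep transition, so that equal articulatory steps produce little acoustic change within a region but an abrupt, category-like jump across the boundary.

```python
# Toy illustration of a quantal articulatory-acoustic relation.
# The tanh mapping below is a hypothetical stand-in, not a model of
# any real articulatory dimension.
import numpy as np

articulation = np.linspace(0.0, 1.0, 11)  # continuous articulatory scale
acoustic = 0.5 + 0.45 * np.tanh(20 * (articulation - 0.5))

for a, y in zip(articulation, acoustic):
    print(f"articulation = {a:.1f} -> acoustic output = {y:.2f}")
# Output stays near 0.05 over the lower half and near 0.95 over the
# upper half, with the change concentrated around the 0.5 boundary.
```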
Given the internal organization of phonetic segments revealed by featural
specifications, it is conceivable that distinctive features would play a role in spoken word
recognition. Indeed, sensitivity to featural specification in lexical access has been
demonstrated in many experimental investigations (Connine, Blasko, & Titone, 1993;
Connine, Blasko, & Wang, 1994; Milberg, Blumstein, & Dworetzky, 1988). Distinctive
features have also figured in some cognitive models of spoken word recognition. For
example, TRACE (McClelland & Elman, 1986) incorporates a feature-level
representation in addition to phonemic and lexical nodes. The Cohort model has also
acknowledged the role of features in lexical activation in that candidacy into the word-initial
cohort can tolerate certain featural mismatches (Marslen-Wilson, 1993; Marslen-Wilson
& Warren, 1994; Marslen-Wilson & Zwitserlood, 1989).
While ample evidence exists for a feature-level representation and for the role of
features in spoken word recognition, the implicit assumption remains that all types of
distinctive features are treated equally in lexical processing. From the vowel mutability
effect (van Ooijen, 1996; Cutler, et al., 2000), it is clear that processing differences can
exist between vowels and consonants. The next question is whether similar processing
differences also exist for types of contrasts specified by other distinctive features. The
answer to this question has implications for the speech sound structure proposed by
linguists (i.e., distinctive features and their organization) and the relevance of these
features in spoken word recognition.
There are reasons to expect that lexical processing differences are present for
contrasts other than the vowel-consonant distinction. It has been proposed that distinctive
features are not an unorganized bundle but rather are grouped into a hierarchical structure
(Clements, 1985; McCarthy, 1988; Halle, 1992; Halle & Stevens, 1991). For example,
there is a consensus that the major class features [consonantal] and [sonorant] form the
“root” of a feature tree and that other features are derived from the root with further
reference to specific articulators (Kenstowicz, 1994). Stevens (2002, 2006) developed a
model for lexical access based on the distinctive features proposed by Halle (1992). In
this model, acoustic “landmarks” for consonants, vowels, and glides are first identified.
Acoustic parameters and cues are then extracted from the vicinity of the landmarks to
estimate the values of other features. Based on the estimations, a lexical hypothesis is
generated and compared to words stored in the lexicon. Lexical access is achieved when
a match is found.
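As a rough illustration of these stages, the sketch below casts the landmark-to-lexicon pipeline as a feature-matching search; the toy lexicon, feature inventory, and matching rule are our own simplified assumptions, not Stevens' actual procedures.

```python
# Hypothetical sketch of Stevens' (2002, 2006) three processing stages.
# Stages 1-2 (landmark detection and cue extraction) are assumed to have
# already produced one feature bundle per segment; only [sonorant] is
# shown here for brevity.

# Toy lexicon: each word is a sequence of partial feature bundles
# (k-schwa-n-f-o-r-m for "conform").
LEXICON = {
    "conform": [{"sonorant": 0}, {"sonorant": 1}, {"sonorant": 1},
                {"sonorant": 0}, {"sonorant": 1}, {"sonorant": 1},
                {"sonorant": 1}],
}

def lexical_access(estimated):
    """Stage 3: compare estimated feature bundles against stored
    lexical representations; access succeeds on a full match."""
    for word, stored in LEXICON.items():
        if len(stored) == len(estimated) and all(
                s == e for s, e in zip(stored, estimated)):
            return word
    return None  # no lexical hypothesis matches

# Feature bundles that stages 1-2 might return for the input "conform".
estimated = [{"sonorant": 0}, {"sonorant": 1}, {"sonorant": 1},
             {"sonorant": 0}, {"sonorant": 1}, {"sonorant": 1},
             {"sonorant": 1}]
print(lexical_access(estimated))  # -> conform
```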
Compared to other cognitive models of spoken word recognition, Stevens’ (2002,
2006) model explicitly specifies lexical representation in terms of distinctive features.
Procedures have also been developed for automatically estimating the landmarks and
features. It is not clear, however, whether the proposed procedures also reflect lexical
processing by humans, particularly in real-time speech processing. That is, are human
listeners also engaged in consonant/vowel landmark detection prior to feature estimation?
Do human listeners evaluate all features simultaneously or do they give preference to
particular features? Studies showing the vowel mutability effect (van Ooijen, 1996;
Cutler, et al., 2000) appear to have provided evidence for the processing difference
between consonants and vowels. What remains to be evaluated is the processing of other
features by human listeners.
The feature [sonorant] is a good candidate for addressing the issue of processing
features by humans. Every language distinguishes sonorant consonants from obstruent
consonants just as all languages distinguish consonants from vowels. This is part of the
reason why [sonorant] is one of the two major class features placed at the root of feature
geometry (Kenstowicz, 1994). Furthermore,
[sonorant] is one of the articulator-free features (Halle, 1992), meaning that it does not
refer to any particular articulator, but rather reflects general characteristics of consonant
constriction in the vocal tract and the acoustic effect of forming the constriction (Stevens,
2002, 2006). In particular, obstruent consonants are produced with substantial intraoral
air pressure and sonorant consonants are produced without such significant pressure,
irrespective of the articulators involved. Despite the seemingly fundamental status of
these articulator-free features, Stevens’ (2002, 2006) model does not indicate whether
they are processed any differently from other features.
Nonetheless, there exists some evidence for the processing of [sonorant] by
human listeners. Marks, Moates, Bond and Stockmal (2002) conducted a word
reconstruction task using American English and Spanish materials. Participants heard a
non-word (e.g., bavalry or mavalry) that could be changed into a real word (e.g., cavalry)
by changing just one consonant. The consonant to be recovered was an obstruent in half
the cases and a sonorant in the other half. Half the obstruents were replaced with other
obstruents (match in [sonorant]) and half were replaced with sonorants (mismatch).
Similarly, half the sonorants were replaced with other sonorants (match) and the other
half were replaced by obstruents (mismatch). The results showed that when an obstruent
was replaced by another obstruent, reconstructing the correct word was significantly
more accurate than when the obstruent was replaced by a sonorant. In contrast, sonorant
target words showed no such effect. That is, accuracy of reconstructing sonorant target
words did not differ between the match and mismatch conditions.
Analogous to the vowel mutability effect, Marks et al. (2002) showed that speech
sounds divided into sonorants and obstruents were not processed equally by human
listeners. When the feature [sonorant] matched between a non-word stimulus and a target
word, word reconstruction was more accurate. However, this was true only for
obstruent target words, further illustrating the processing difference. The difference
between sonorant and obstruent target words was attributed to the observation that
sonorants were phonetically similar to vowels while obstruents were maximally distinct
from vowels. That is, sonorant consonants are probably more mutable than obstruent
consonants in spoken word recognition.
Marks et al. (2002) was the first study to evaluate the impact of the feature
[sonorant] in spoken word recognition by humans. The present study extended the Marks
et al. (2002) study in several ways. First, the form priming paradigm (Zwitserlood, 1996)
with a lexical decision task was used. The use of a task different from word
reconstruction would evaluate the generalizability of the [sonorant] effect found earlier.
More importantly, the speeded-response task could provide a potentially more sensitive
measure than accuracy alone, and would better assess the on-line nature of lexical
processing, as has been shown in earlier investigations on features (Connine, et al., 1993,
1994; Milberg, et al., 1988).
In the present study, prime-target pairs were constructed where the target (e.g.,
conform) was preceded by one of two types of non-word primes: one in which the
changed sound matched the target in [sonorant] (e.g., [f] replaced by another obstruent)
and one in which it mismatched the target in [sonorant] (e.g., [kənwɔrm], with [f]
replaced by [w]). If listeners are sensitive to the
[sonorant] specification in word recognition, response should be facilitated in the
matching condition relative to the mismatching condition.
Second, the present study divided obstruent consonants into fricatives and stops.
In a post hoc analysis not reported in the Marks et al. (2002) study, it was discovered that
among the obstruent words, response accuracy appeared to differ between fricative and
stop consonants. Coincidentally, these two classes of sounds are distinguished in the
feature system by another articulator-free feature [continuant]. A subsequent word
reconstruction study of the feature [continuant] revealed that reconstruction of fricative
words was less error-prone in the match condition than in the mismatch condition
(Moates, Sutherland, Bond, & Stockmal, manuscript in preparation). In contrast, stop
words showed no such difference. In other words, fricative words alone could be
responsible for the mismatch effect found in the Marks et al. (2002) study. For these
reasons, it was decided to examine fricatives vs. sonorants (Experiment 1) and stops vs.
sonorants (Experiment 2) separately to evaluate the potential difference between
fricatives and stops.
EXPERIMENT 1
Method
Materials
Ninety-eight English words were selected as the real-word targets in the priming
experiment. All words have one of 14 target phonemes including seven fricative
consonants [f, v, θ, ð, s, z, ʃ] and seven sonorant consonants [m, n, ŋ, l, r, j, w]. Half of
the words have two syllables and the other half have three syllables. Half of the words
have the target phoneme in the onset of the stressed syllable and the other half in the coda
of the stressed syllable. Since comparisons would be made between fricatives and
sonorants as target phonemes, care was taken to balance word frequency, number of
consonants, word recognition point, number of phonemes, and neighborhood density.
(Were these actually balanced? If so, should we report these numbers and statistical tests?)
Ideally there would be a total of 112 items, including 14 (seven fricatives and seven
sonorants) x 2 (two- vs. three-syllables) x 2 (syllable onset vs. coda) x 2 (tokens).
However, only 98 words could be selected due to phonotactic constraints (e.g., [ŋ] does
not appear in syllable-onset position; [j, w] do not appear in syllable-coda position).
For each word, two non-word primes were constructed by replacing the target
phoneme in the real word: one with a phoneme matching the value of the feature
[sonorant] in the target phoneme, and the other with a sound mismatching the value of the
feature [sonorant] in the target phoneme. For example, for the word conform, where the
target phoneme is [f], the matching prime replaced [f] with another obstruent and the
mismatching prime was [kənwɔrm], with [f] replaced by the sonorant [w].
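The match/mismatch manipulation reduces to comparing the [sonorant] value of the target phoneme with that of its replacement. The sketch below illustrates this check with a hypothetical feature table (ASCII stand-ins for the IPA symbols); it is not the procedure actually used to construct the materials.

```python
# Hypothetical [sonorant] feature table for the Experiment 1 target
# phonemes ("th", "dh", "sh", "ng" stand in for IPA symbols).
SONORANT = {
    "f": 0, "v": 0, "th": 0, "dh": 0, "s": 0, "z": 0, "sh": 0,  # fricatives
    "m": 1, "n": 1, "ng": 1, "l": 1, "r": 1, "j": 1, "w": 1,    # sonorants
}

def prime_type(target, replacement):
    """A prime matches the target in [sonorant] iff the substituted
    consonant carries the same feature value as the target phoneme."""
    return "match" if SONORANT[target] == SONORANT[replacement] else "mismatch"

print(prime_type("f", "s"))  # match: both [-sonorant]
print(prime_type("f", "w"))  # mismatch: [-sonorant] vs. [+sonorant]
```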
In addition to the real-word targets, 98 pronounceable non-word fillers were
constructed to serve as non-word targets. Similar to the word target setup, these
non-words included both two-syllable and three-syllable items with the target phoneme in
either the onset or coda of a stressed syllable. The 14 target phonemes were identical to
those used in the word targets. For each non-word, two non-word primes were
constructed by replacing the target phoneme in the non-word: one with a phoneme
matching the value of the feature [sonorant] in the target phoneme, and the other with a
sound mismatching the value of the feature [sonorant] in the target phoneme. For
example, for a non-word target whose critical sound was [f], a feature-matching
prime replaced [f] with the obstruent [b] and a mismatching prime replaced it with the
sonorant [m]. In other words, the
prime-target relationship in the non-word target set was identical to that in the word
target set. The complete set of stimuli is listed in Appendix A.
Participants
Forty undergraduate students (25 females and 15 males) at Ohio University
participated in the experiment. All were native speakers of American English with
normal hearing. They received partial course credit for participating in the experiment.
Procedure
The stimuli were recorded by a phonetically-trained female speaker of American
English. The recording was made in a sound-treated booth in the School of Hearing,
Speech and Language Sciences at Ohio University with a high-quality microphone
(Audio-Technica AT825 field recording microphone) connected through a preamplifier
and A/D converter (USBPre microphone interface) to a Windows personal computer
(Dell). The recording was sampled using the Brown Lab Interactive Speech System
(BLISS, Mertus, 2000) at 20 kHz with 14-bit quantization. The stimuli, saved as
individual audio files, were imported to AVRunner, the subject-testing program in BLISS,
for stimulus presentation.
Two stimulus lists were constructed with the following considerations. The
relationship between the prime and target (match, mismatch) was intended to be a
within-subject factor. It was also determined that participants would not hear the same stimulus
more than once during the experiment, to avoid any familiarity effects. To these ends, two
stimulus lists were constructed such that, for a given target, each of the two primes was
assigned to a different list; each list thus included both prime types without
repeating any stimulus. The fillers (non-word primes and non-word targets) were
assigned to the two lists in the same way. Therefore, no primes or targets were repeated
in any list. In sum, each list included 196 targets (98 word targets and 98 non-word
targets) with the two prime types (match, mismatch) equally distributed. Each participant
was randomly assigned to be tested on one list only. List presentation was
counterbalanced such that the two lists were presented equally often
across participants. For each participant, AVRunner assigned a uniquely randomized
presentation order such that no two participants received the same order of presentation.
The inter-stimulus interval between the prime and target was 50 milliseconds. The
inter-trial interval was three seconds.
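The list construction amounts to rotating the two prime types over two lists so that every target appears once per list, each list contains both prime types, and no participant hears any stimulus twice. A minimal sketch of that rotation, with hypothetical item structure and placeholder prime strings, is given below.

```python
# Minimal sketch of the two-list assignment described above.
# The item structure and prime strings are hypothetical placeholders.
items = [
    {"target": "conform", "match": "prime_a1", "mismatch": "prime_a2"},
    {"target": "target2", "match": "prime_b1", "mismatch": "prime_b2"},
    # ... one entry per word or non-word target
]

list_1, list_2 = [], []
for i, item in enumerate(items):
    # Alternate which list receives the matching prime, so each list
    # contains both prime types equally often with no repeated stimuli.
    if i % 2 == 0:
        list_1.append((item["match"], item["target"]))
        list_2.append((item["mismatch"], item["target"]))
    else:
        list_1.append((item["mismatch"], item["target"]))
        list_2.append((item["match"], item["target"]))
```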
Participants were tested individually in a quiet room in the School of Hearing,
Speech and Language Sciences at Ohio University. They listened to the stimuli through a
pair of high-quality headphones (Koss R80) connected to a Windows personal computer
(Dell). The participants were told that they would be listening to pairs of auditory stimuli
that could be real words or non-words in English. Their task was to judge whether the
second item in a pair was a real word or a non-word by pressing the computer keys
labeled with YES (for real words) or NO (for non-words). They were also instructed to
respond as quickly as possible as reaction time would be measured. Prior to the actual
experiment, 10 practice trials, none of which appeared in the actual experiment, were given to
familiarize the participants with the experimental procedure.
Data analysis
Response accuracy and reaction time were recorded by BLISS automatically.
Reaction time was measured from the onset of the target. Only responses to real word
targets were analyzed and only correct responses were included in the reaction time
analysis. Repeated measures ANOVAs were conducted on response accuracy and
reaction time with relation between prime and target (match, mismatch) and target
phoneme (fricative, sonorant) as fixed factors and participants as a random factor.
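For reference, an equivalent repeated measures analysis can be expressed in a few lines with standard statistical software; the sketch below uses Python's statsmodels with a hypothetical data layout and file name, and is not necessarily how the original analysis was run.

```python
# Hypothetical sketch of the repeated measures ANOVA on reaction time.
# Assumes a long-format table with one mean RT per participant x
# relation (match, mismatch) x target phoneme (fricative, sonorant)
# cell, computed over correct responses only.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("exp1_rt.csv")  # columns: participant, relation, phoneme, rt
result = AnovaRM(
    data=df,
    depvar="rt",
    subject="participant",
    within=["relation", "phoneme"],
).fit()
print(result)  # F and p values for the two main effects and the interaction
```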
Results
Figure 1 shows the average reaction time of lexical decision by relation and target
phoneme. The ANOVAs revealed a significant main effect of relation (F (1, 39) = 7.88, p
< .01). In particular, responses were faster when the prime and target matched in the
feature [sonorant] (950 ms) than when they mismatched (968 ms). The main effect of target
phoneme was also significant (F (1, 39) = 10.97, p < .005). Specifically, response was
faster for sonorants (943 ms) than for fricatives (976 ms). The relation-phoneme
interaction was also significant (F (1, 39) = 5.82, p < .05). As Figure 1 shows, the
interaction arose because the [sonorant] feature mismatch slowed down lexical decision
for fricatives but not for sonorants.
Table 1 shows the average number of errors in the lexical decision task by relation
and target phoneme. Overall, participants made very few errors. Still, the ANOVAs
revealed a significant main effect of target phoneme (F (1, 39) = 41.6, p < .0001).
Specifically, response was more accurate for sonorants (0.7 out of 49) than for fricatives
(1.7 out of 49). The relation-phoneme interaction was also significant (F (1, 39) = 5.03, p
< .05). The interaction arose because the [sonorant] feature mismatch resulted in more
errors for fricatives but not for sonorants. The pattern of errors is similar to that of the
reaction time, indicating no tradeoff between speed and accuracy.
Summary
As predicted, lexical decision response was faster when there was a match in the
feature [sonorant] between the prime and target, indicating that a match or mismatch in this
feature does impact lexical processing. However, the feature match facilitated responses
only when the target phoneme was a fricative consonant. In contrast, the feature match
did not make a difference when the target phoneme was a sonorant consonant. This
pattern is identical to what was found in Marks et al. (2002) with a word reconstruction
task. Together these findings suggest that the effect of feature match hinges on the type of
target phoneme involved. A mismatch in [sonorant] disrupted response to fricative targets
but not sonorant targets.
Would this result generalize to all obstruents? The next experiment examined the
feature match effect with another group of obstruent consonants, the stop consonants, to
evaluate whether the feature match effect was limited only to fricative consonants.
EXPERIMENT 2
Method
Materials
Ninety-two English words were selected to be the real-word targets in the priming
experiment. All words have one of 13 target phonemes including six stop consonants
[p, b, t, d, k, ɡ] and seven sonorant consonants [m, n, ŋ, l, r, j, w]. As in the previous
experiment, half of the words have two syllables and the other half have three syllables.
Half of the words have the target phoneme in the onset of the stressed syllable and the
other half in the coda of the stressed syllable. Since comparisons would be made between
stops and sonorants as target phonemes, care was taken to balance word frequency,
number of consonants, word recognition point, number of phonemes, and neighborhood
density. (Were these actually balanced? If so, should we report these numbers and
statistical tests?) Ideally there would be a total of 104 items, including 13 (six stops and
seven sonorants) x 2 (two- vs. three-syllables) x 2 (syllable onset vs. coda) x 2 (tokens).
However, only 92 words could be selected due to phonotactic constraints, as was noted in
Experiment 1.
For each word, two non-word primes were constructed by replacing the target
phoneme in the real word: one with a phoneme matching the value of the feature
[sonorant] in the target phoneme, and the other with a phoneme mismatching the value of
the feature [sonorant] in the target phoneme. For example, for the word pothole, where
the target phoneme is [p], a matching prime was [bɑthoʊl] and a mismatching prime was
[wɑthoʊl].
In addition to the real-word targets, 92 pronounceable non-word fillers were
constructed to serve as non-word targets. Similar to the word-target setup, these
non-words included both two-syllable and three-syllable items with the target phoneme in
either the onset or coda of a stressed syllable. The target phonemes were identical to
those used in the word targets. For each non-word, two non-word primes were
constructed by replacing the target phoneme in the non-word: one with a sound matching
the value of the feature [sonorant] in the target phoneme, and the other with a sound
mismatching the value of the feature [sonorant] in the target phoneme. For example, for
a non-word target whose critical sound was [p], a feature-matching prime replaced [p]
with the obstruent [s] and a mismatching prime replaced it with the sonorant [n]. In other
words, the prime-target
relationship in the non-word target set was identical to that in the word target set. The
complete set of stimuli is listed in Appendix B.
Participants
Forty undergraduate students (26 females and 14 males) at Ohio University
participated in the experiment. All were native speakers of American English with
normal hearing. They received partial course credit for participating in the experiment.
None of the participants participated in the previous experiment.
Procedure
The procedure was identical to that of Experiment 1.
Results
Figure 2 shows the average reaction time of lexical decision by relation and target
phoneme. The ANOVAs revealed no significant main effects or interaction. The average reaction
time for matching relation was 925 ms (SD = 121); for mismatching relation it was 933
ms (SD = 117). The average reaction time for stops was 926 ms (SD = 119); for
sonorants it was 933 ms (SD = 119). Although a mismatch appears to slow down stops
more than sonorants, the interaction was not statistically significant.
Table 2 shows the average number of errors in the lexical decision task by relation
and target phoneme. Again, participants made very few errors. The ANOVAs revealed a
significant main effect of relation (F (1, 39) = 40.03, p < .0001). In particular, responses
were more error-prone when the prime and target matched in the feature [sonorant] (1.8
out of 46) than when they did not match (0.9 out of 46). This pattern of results is
rather counter-intuitive given that one would expect the matching relation to facilitate
responses and to generate fewer errors. However, since the actual number of errors is
very small (4% for match and 2% for mismatch), the statistical difference found here may
not be meaningful.
Summary
In contrast to the fricative-sonorant comparison in Experiment 1, results from the
stop-sonorant comparison showed no significant reaction time difference between the
match and mismatch conditions for either stop or sonorant target phonemes. While the
null result for sonorant target phonemes was consistent with the finding from
Experiment 1, the lack of effect for stop consonants suggests that the feature matching
effect did not apply to all obstruent consonants.
GENERAL DISCUSSION
The research question in this study was whether a match or mismatch in the
feature [sonorant] would impact spoken word recognition by humans. Two form priming
experiments with the lexical decision task investigated this issue. The results showed that
reaction time was reduced when the prime and target matched in [sonorant] for obstruent
consonants but not sonorant consonants. The results further showed that the match effect
was restricted to fricative consonants only. Stop consonants did not show the match effect.
The first result replicated findings from Marks et al. (2002), who found word
reconstruction to be more accurate when target and replacing segments were matched on
the feature [sonorant] than when they were mismatched. This effect occurred for
obstruents but not sonorants, just as was found in the present study. Overall, the
finding that a feature mismatch has an impact on spoken word recognition indicates that the
distinctive features originally posited on the basis of articulation and acoustics are also
implicated in perceptual processing. The finding also challenges the assumption that all
phonetic segments and features are treated equally in spoken word recognition, as
responses to obstruents and sonorants showed two different patterns.
What could be the reason for the different patterns between sonorants and
obstruents? As noted earlier, Marks et al. (2002) speculated that sonorants were
phonetically similar to vowels, which could explain why a feature mismatch did not
disrupt word reconstruction for sonorants as substantially as it did for obstruents.
Articulatorily, both sonorants and obstruents are consonants, produced with a narrow
constriction in the vocal tract. Acoustically, the formation and subsequent release of the
constriction introduce acoustic discontinuities. However, the two classes are
different in that the articulation of sonorants involves an abrupt switching of the airflow
to a different path in the vocal tract without a substantial increase in intraoral air pressure
(Stevens, 1998, 2002). Liu (1996) developed a sonorant detector as part of an algorithm
for automatic speech recognition. She noted that energy in the second to fourth formant
range decreases at the formation and increases at the release of a sonorant consonant.
During the constriction, the spectrum remains relatively steady, especially at low
frequencies, since the vocal tract shape is relatively constant. Given that vowel landmarks
are where there is maximum amplitude in the first formant range (Stevens, 2002), there
seems to be some merit to the argument that sonorant consonants are phonetically
similar to vowels. This acoustic similarity could account for the obstruent-sonorant
contrast observed in Marks et al. (2002) and the current study.
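Liu's criterion can be visualized as tracking band-limited energy over time: energy in roughly the F2-F4 region falls at the formation and rises at the release of a sonorant consonant, while F1-region energy stays comparatively steady. The sketch below computes such energy contours with assumed band edges, filter order, and frame settings; it is only an illustration of the idea, not Liu's algorithm.

```python
# Illustrative band-energy computation behind a sonorant landmark
# detector in the spirit of Liu (1996). Band edges, filter order, and
# frame length are assumptions chosen for illustration.
import numpy as np
from scipy.signal import butter, sosfilt

def band_energy(signal, fs, lo, hi, frame=0.01):
    """Frame-by-frame energy (dB) of the signal band-passed to [lo, hi] Hz."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, signal)
    n = int(frame * fs)
    frames = band[: len(band) // n * n].reshape(-1, n)
    return 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)

fs = 20000                     # Hz, matching the stimuli's sampling rate
x = np.random.randn(fs)        # stand-in for one second of speech
low = band_energy(x, fs, 100, 900)      # ~F1 region
high = band_energy(x, fs, 1200, 3500)   # ~F2-F4 region
# At a sonorant closure, `high` drops while `low` stays relatively flat;
# the release shows the opposite pattern.
```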
The finding that responses to fricatives are different from responses to stops is
also noteworthy. Although obstruents overall had a feature match effect in Marks et al.
(2002), the contrast between the two experiments in the current study showed that the
source of the effect is limited to fricatives. The feature match effect is thus not spread
evenly across the class of obstruents. As noted, the feature [continuant] classifies
obstruent consonants into fricatives and stops. Stops are produced with a complete
closure in the vocal tract, and thus show an abrupt amplitude decrease at closure and an
increase at release. Fricatives, on the other hand, are produced with a sustained narrow
constriction and thus continuous turbulent noise (Stevens, 1998, 2002, 2006). Given their
distinct articulatory and acoustic properties, it is perhaps not surprising that they are
processed differently during spoken word recognition, as revealed by the current study.
(What are the other issues to be discussed? More general theoretical issues on
distinctive features and their relevance to perceptual processing? Discussions on
implications for specific spoken word recognition models? Methodological issues?)
REFERENCES
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York:
Harper & Row.
Clements, G. N. (1985). The geometry of phonological features. Phonology
Yearbook, 2, 225-252.
Connine, C. M., Blasko, D. G., and Titone, D. (1993). Do the beginnings of words
have a special status in auditory word recognition? Journal of Memory & Language, 32,
193-210.
Connine, C. M., Blasko, D. G., and Wang, J. (1994). Vertical similarity in spoken
word recognition: Multiple lexical activation, individual differences, and the role of
sentence context. Perception & Psychophysics, 56, 624-636.
Cutler, A., Sebastián-Gallés, N., Soler-Vilageliu, O., and van Ooijen, B. (2000).
Constraints of vowels and consonants on lexical selection: Cross-linguistic comparisons.
Memory & Cognition, 28, 746-755.
Halle, M. (1992). Features. In W. Bright (ed.), Oxford International Encyclopedia
of Linguistics (pp. 207-212). New York: Oxford University Press.
Halle, M., and Stevens, K. N. (1991). Knowledge of language and the sounds of
speech. In J. Sundberg, L. Nord, and R. Carlson (Eds.), Music, Language, Speech and
Brain (pp. 1-19). London: MacMillan.
Jakobson, R. (1928)
Jakobson, R., Fant, C. G. M., and Halle, M. (1952). Preliminaries to speech
analysis: The distinctive features and their correlates. MIT Acoustics Laboratory
Technical Report 13. Reprinted 1967, Cambridge, MA: MIT Press.
Kenstowicz, M. (1994). Phonology in Generative Grammar. Cambridge:
Blackwell Publishers.
Liu, S. A. (1996). Landmark detection for distinctive feature-based speech
recognition. Journal of the Acoustical Society of America, 100, 3417-3430.
Luce, P. A., and Pisoni, D. B. (1998). Recognizing spoken words: The
Neighborhood Activation Model. Ear & Hearing, 19, 1-36.
Marks, E. A., Moates, D. R., Bond, Z. S., and Stockmal, V. (2002). Word
reconstruction and consonant features in English and Spanish. Linguistics, 40, 421-438.
Marks, E. A., Moates, D. R., Bond, Z. S., and Vazquez, L. (2002). Vowel
mutability: The case of monolingual Spanish listeners and bilingual Spanish-English
listeners. Southwest Journal of Linguistics, 21, 73-99.
Marslen-Wilson, W. D. (1993). Issues of process and representation in lexical
access. In G. Altmann & R. Shillcock (Eds.), Cognitive models of speech processing: The
Second Sperlonga Meeting (pp. 187-210). Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W. D., and Warren, P. (1994). Levels of perceptual
representation and process in lexical access: Words, phonemes and features.
Psychological Review, 101, 653-675.
Marslen-Wilson, W. D., and Welsh, A. (1978). Processing interactions and lexical
access during word recognition in continuous speech. Cognitive Psychology, 10, 29-63.
Marslen-Wilson, W. D., and Zwitserlood, P. (1989). Accessing spoken words:
The importance of word onsets. Journal of Experimental Psychology: Human Perception
& Performance, 15, 576-585.
McCarthy, J. J. (1988). Feature geometry and dependency: a review. Phonetica,
45, 84–108.
McClelland, J. L., and Elman, J. L. (1986). The TRACE model of speech
perception. Cognitive Psychology, 18, 1-86.
Mertus, J. A. (2000). BLISS: The Brown Lab Interactive Speech System. Brown
University.
Milberg, W., Blumstein, S. E., and Dworetzky, B. (1988). Phonological factors in
lexical access: Evidence from an auditory lexical decision task. Bulletin of the
Psychonomic Society, 26, 305-308.
Moates, D. R., Sutherland, M. T., Bond, Z. S., and Stockmal, V. The feature
[continuant] in word reconstruction. Manuscript in preparation.
Norris, D. (1994). SHORTLIST: A connectionist model of continuous speech
recognition. Cognition, 52, 189-234.
Ooijen, B. van (1996). Vowel mutability and lexical selection in English:
Evidence from a word reconstruction task. Memory & Cognition, 24, 573-583.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from
articulatory-acoustic data. In P. B. Denes and E. E. David, Jr. (Eds.), Human Communication: A
Unified View (pp. 51–66). New York: McGraw-Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17,
3–46.
Stevens, K. N. (1997). Articulatory-acoustic-auditory relationships. In W. J.
Hardcastle and J. Laver (Eds.), The Handbook of Phonetic Sciences (pp. 507-538).
Oxford: Blackwell Publishers.
Stevens, K. N. (1998). Acoustic Phonetics. Cambridge: MIT Press.
Stevens, K. N. (2002). Toward a model for lexical access based on acoustic
landmarks and distinctive features. Journal of the Acoustical Society of America, 111,
1872-1891.
Stevens, K. N. (2006). Features in speech perception and lexical access. In D. B.
Pisoni and R. Remez (Eds.), The Handbook of Speech Perception (pp. 125-155).
Cambridge: Blackwell Publishers.
Zwitserlood, P. (1996). Form priming. Language and Cognitive Processes, 11,
589-596.
ACKNOWLEDGMENTS
We thank Sara Kellgreen for administering the experiment and assisting in data
analysis and Carla Youngdahl for recording the materials.
Table 1. Average number of errors (out of a possible 49) in the lexical decision task in
Experiment 1. Standard deviations are shown in parentheses.

                      Prime-target relation
Target phoneme        Match           Mismatch
Fricatives            1.53 (1.38)     1.88 (1.54)
Sonorants             0.8 (1.09)      0.6 (0.81)
Table 2. Average number of errors (out of a possible 46) in the lexical decision task in
Experiment 2. Standard deviations are shown in parentheses.

                      Prime-target relation
Target phoneme        Match           Mismatch
Stops                 1.65 (0.89)     1.05 (0.93)
Sonorants             1.85 (0.95)     0.73 (0.88)
FIGURE CAPTIONS
Figure 1. Average reaction time (in milliseconds) with standard error bars in
Experiment 1.
Figure 2. Average reaction time (in milliseconds) with standard error bars in
Experiment 2.
[Figure 1 about here: interaction bar plot of mean reaction time (ms) by consonant class
(obstruent, resonant) and prime-target relation (match, mismatch); error bars show ±1
standard error.]
[Figure 2 about here: interaction bar plot of mean response time (ms) by consonant class
(obstruent, resonant) and prime-target relation (match, mismatch); error bars show ±1
standard error.]