
The feature [sonorant] in spoken word recognition
Chao-Yang Lee1, Danny R. Moates2, and Russell Fox2
1School of Hearing, Speech and Language Sciences, 2Department of Psychology
Ohio University, Athens, OH 45701, USA
Running title: Sonorant in spoken word recognition
Corresponding author:
Chao-Yang Lee
School of Hearing, Speech and Language Sciences
Grover Center W225
Ohio University
Athens, OH 45701, USA
Telephone: (740) 593-0232
Fax: (740) 593-0287
E-mail: leec1@ohio.edu
ABSTRACT
Distinctive features characterize the internal organization of phonetic segments,
but their role in representing and processing spoken words has not been evaluated
extensively. It is normally assumed in models of spoken word recognition that all
phonetic segments and features are treated equally in lexical processing, but this
assumption has been challenged by findings showing varying degrees of difficulty in
word reconstruction. The present study examined the role of the feature [sonorant] in two
form priming experiments with the lexical decision task. Participants responded to
prime-target pairs in which non-word primes contrasted with real-word targets in one
consonant, either matching the target consonant in the feature [sonorant] (e.g., another
obstruent replacing the [f] in conform) or mismatching it (e.g., [kənwɔrm], with the
sonorant [w] replacing the [f]). Responses were faster when the prime and target
matched in the feature [sonorant]. The effect, however, was limited to fricative targets:
stop targets, like sonorant targets, showed no advantage for the matching primes.
These findings suggest that speech sounds classified by the feature [sonorant] are
processed differently during spoken word recognition and that this processing difference
is modulated by further featural classifications.
INTRODUCTION
Spoken word recognition involves extracting information from the acoustic signal
and mapping the information onto the mental lexicon. Naturally, two major issues in the
study of spoken word recognition are the mechanism of the mapping process and the
nature of lexical representations. Cognitive models of spoken word recognition have
established that the mapping process involves lexical activation and competition
(e.g., Luce & Pisoni, 1998; Marslen-Wilson & Welsh, 1978; McClelland & Elman, 1986;
Norris, 1994). As for lexical representations, units of lexical processing have been
proposed at different levels ranging from spectral templates and distinctive features to the
phonetic segment and the syllable.
The purpose of this study was to examine the role of distinctive features in spoken
word recognition. In particular, the role of the feature [sonorant] in lexical processing
was examined in two form priming experiments. Although the nature of sublexical
representations remains debated, implicit in most models of spoken word recognition is
the assumption that all tokens of a sublexical unit are treated equally in lexical processing.
For example, for models with the phonetic segment as the basic unit, all phonetic
segments, vowels or consonants, are assumed to be equally effective in participating in
lexical activation and competition.
This assumption, however, has been challenged by studies showing the vowel
mutability effect (van Ooijen, 1996; Cutler, Sebastián-Gallés, Soler-Vilageliu, & van Ooijen,
2000). In particular, van Ooijen (1996) showed in a word reconstruction task that English
listeners tend to change vowels rather than consonants when asked to turn a non-word
sequence into a real word by changing one sound. For example, when given the non-word
teeble, listeners are more likely to propose table rather than feeble. Cutler et al. (2000)
further showed in a cross-linguistic study on word reconstruction that the tendency to
change vowels rather than consonants appeared to be language independent, reflecting
the intrinsic differences between the information provided by vowels and consonants. In
other words, not all phonetic segments are treated equally in lexical processing.
Obviously, not all sounds are equal in their internal structure. Linguists have long
noted that the phonetic segment can be further analyzed into bundles of distinctive
features (Jakobson, Fant, & Halle, 1952; Chomsky & Halle, 1968). Importantly, these
categorical features are grounded in physical principles governing the articulatory-acoustic-auditory relationships (Stevens, 1972, 1989, 1997). Specifically, non-linear or
“quantal” relations exist in the mapping from articulation onto acoustics and from
acoustics onto auditory responses. Consequently, continuous changes in one domain (e.g.,
articulation) could result in discrete changes in another domain (e.g., acoustics). It is
these non-linear relations that serve as the basis for the categorically specified distinctive
features (Stevens, 1972, 1989, 1997).
Given the internal organization of phonetic segments revealed by featural
specifications, it is conceivable that distinctive features would play a role in spoken word
recognition. Indeed, sensitivity to featural specification in lexical access has been
demonstrated in many experimental investigations (Connine, Blasko, & Titone, 1993;
Connine, Blasko, & Wang, 1994; Milberg, Blumstein, & Dworetzky, 1988). Distinctive
features have also figured in some cognitive models of spoken word recognition. For
example, TRACE (McClelland & Elman, 1986) incorporates a feature-level
representation in addition to phonemic and lexical nodes. The Cohort model has also
acknowledged the role of sublexical features in lexical activation. Specifically, candidacy
into word-initial cohort can tolerate certain featural mismatches (Marslen-Wilson, 1993;
Marslen-Wilson & Warren, 1994; Marslen-Wilson & Zwitserlood, 1989).
While ample evidence exists for the feature-level representation and for the role
of features in spoken word recognition, the implicit assumption remains that all types of
distinctive features are treated equally in lexical processing. However, from the vowel
mutability effect (van Ooijen, 1996; Cutler et al., 2000), it is clear that processing
differences are present between vowels and consonants. The next question is whether
similar processing differences also exist for other types of contrasts specified by
distinctive features. The answer to this question has implications for the speech sound
structure proposed by linguists (i.e., distinctive features and their organization) and the
relevance of these features in the processing of spoken words.
There are reasons to expect that lexical processing differences are present for
contrasts other than the vowel-consonant distinction. It has been proposed that distinctive
features are not an unorganized bundle but rather are grouped into a hierarchical structure
(Clements, 1985; McCarthy, 1988; Halle, 1992; Halle & Stevens, 1991). For example,
there is a consensus that the major class features [consonantal] and [sonorant] form the
“root” of a feature tree and that other features are derived from the root with further
reference to specific articulators (Kenstowicz, 1994). Stevens (2002, 2005) developed a
model for lexical access based on the distinctive features proposed by Halle (1992). In
this model, acoustic “landmarks” for consonants, vowels, and glides are first identified.
Acoustic parameters and cues are then extracted from the vicinity of the landmarks to
estimate the values of other features. Based on the estimations, lexical hypotheses are
generated and compared to words stored in the lexicon. Lexical access is achieved when
a match is found.
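To make this sequence of stages concrete, the sketch below traces the model's logic in a deliberately simplified form. The data structures, feature values, and matching rule are illustrative assumptions for exposition, not components of Stevens' published implementation.

```python
# A minimal sketch of lexical access in the spirit of Stevens (2002, 2005):
# estimated feature bundles are matched against ordered feature bundles
# stored in the lexicon. The feature values and the scoring rule are
# illustrative assumptions, not the model's actual machinery.

from typing import Dict, List

FeatureBundle = Dict[str, int]  # feature name -> +1 or -1

# Hypothetical lexicon: each word is an ordered list of feature bundles.
LEXICON: Dict[str, List[FeatureBundle]] = {
    "sun": [{"sonorant": -1, "continuant": +1},   # [s]
            {"sonorant": +1, "continuant": +1},   # vowel
            {"sonorant": +1, "continuant": -1}],  # [n]
    "ton": [{"sonorant": -1, "continuant": -1},   # [t]
            {"sonorant": +1, "continuant": +1},   # vowel
            {"sonorant": +1, "continuant": -1}],  # [n]
}

def match_score(estimated: List[FeatureBundle],
                stored: List[FeatureBundle]) -> float:
    """Proportion of stored feature values that agree with the estimate."""
    if len(estimated) != len(stored):
        return 0.0
    pairs = [(est.get(feat), val)
             for est, sto in zip(estimated, stored)
             for feat, val in sto.items()]
    return sum(e == v for e, v in pairs) / len(pairs)

def access(estimated: List[FeatureBundle]) -> str:
    """Return the word whose stored feature sequence best fits the estimate."""
    return max(LEXICON, key=lambda w: match_score(estimated, LEXICON[w]))

# A feature estimate with a fricative-initial sequence maps onto "sun".
print(access([{"sonorant": -1, "continuant": +1},
              {"sonorant": +1, "continuant": +1},
              {"sonorant": +1, "continuant": -1}]))
```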
Compared to other cognitive models of spoken word recognition, Stevens’ (2002,
2005) model explicitly specifies lexical representation in terms of distinctive features.
Procedures have also been developed for automatically estimating the landmarks and
some features. It is not clear, however, whether the proposed procedures also reflect
lexical processing by humans, particularly in real-time speech processing. That is, are
human listeners also engaged in consonant/vowel landmark detection prior to feature
estimation? Do human listeners evaluate all features simultaneously or do they give
preference to particular features? Studies showing the vowel mutability effect (van
Ooijen, 1996; Cutler et al., 2000) appear to have provided evidence for the processing
difference between consonants and vowels (i.e., the feature [consonantal]). What remains
to be evaluated is the processing of other features by human listeners.
The feature [sonorant] is a good candidate for addressing this issue. Every
language distinguishes sonorant consonants from obstruent consonants, just as all
languages distinguish consonants from vowels. This is part of the reason why [sonorant]
is one of the two major class features placed at the root of
feature geometry (Kenstowicz, 1994). Furthermore, [sonorant] is one of the articulator-free features (Halle, 1992), meaning that it does not specify any particular articulator but
rather reflects general characteristics of consonant constriction in the vocal tract and the
acoustic effect of forming the constriction (Stevens, 2002, 2005). In particular,
irrespective of the articulators involved, obstruent consonants are produced with
substantial intraoral air pressure and sonorant consonants are produced without such
significant pressure. Despite the seemingly fundamental status of these articulator-free
features, neither cognitive models of spoken word recognition nor Stevens' (2002,
2005) model indicates whether these features are processed any differently from other features.
Nonetheless, there exists some evidence for the processing of [sonorant] by
human listeners. Marks, Moates, Bond and Stockmal (2002) conducted a word
reconstruction study using American English and Spanish materials. Participants heard a
non-word (e.g., bavalry or mavalry) that could be changed into a real word (e.g., cavalry)
by changing just one consonant. The consonant to be recovered was an obstruent in half
the cases and a sonorant in the other half. Half the obstruents were replaced with other
obstruents (match in [sonorant]) and half were replaced with sonorants (mismatch in
[sonorant]). Similarly, half the sonorants were replaced with other sonorants (match) and
the other half were replaced with obstruents (mismatch). The results showed that when an
obstruent was replaced by another obstruent, reconstructing the correct word was
significantly more accurate than when the obstruent was replaced by a sonorant. In
contrast, sonorant target words showed no such effect. That is, accuracy of reconstructing
sonorant target words did not differ between the match and mismatch conditions.
Analogous to the vowel mutability effect, Marks et al. (2002) showed that speech
sounds, when divided into sonorants and obstruents, were not processed equally by
human listeners. When there was a match in the feature [sonorant], word reconstruction
was more accurate. However, this held true only for obstruent target words and
not for sonorant target words. The processing difference between sonorants and
obstruents was attributed to the observation that sonorants are phonetically similar to
vowels while obstruents are maximally distinct from vowels. That is, sonorant
consonants are probably more “mutable” than obstruent consonants in spoken word
recognition.
Marks et al. (2002) was the first study to evaluate the impact of the feature
[sonorant] in spoken word recognition by humans. The present study extended Marks et
al. (2002) in several ways. First, the form priming paradigm (Zwitserlood, 1996) with a
lexical decision task was used. Form priming has been used extensively to investigate the
nature of lexical representations and processing. The use of a task different from word
reconstruction could evaluate the generalizability of the [sonorant] effect found in Marks
et al. (2002). More importantly, the speeded-response task could provide a potentially
more sensitive measure than accuracy alone and would better assess the on-line nature of
lexical processing, as has been shown in earlier investigations of features (Connine et al.,
1993, 1994; Milberg et al., 1988). In the present study, prime-target pairs were
constructed where the target (e.g., conform) was preceded by one of two types of
non-word primes: one with a sound change matching the target in [sonorant] (e.g.,
another obstruent replacing the [f] in conform) and the other with a sound change
mismatching the target in [sonorant] (e.g., [kənwɔrm], with the sonorant [w] replacing
the [f]). If listeners are sensitive to the [sonorant] specification in word
recognition, response should be facilitated in the matching condition relative to the
mismatching condition.
Second, the present study divided obstruent consonants into fricatives and stops.
In a post hoc analysis not reported in the Marks et al. (2002) study, it was discovered that
among the obstruent words, response accuracy appeared to differ between fricative and
stop consonants. Coincidentally, these two classes of sounds are distinguished in the
feature system by another articulator-free feature [continuant]. A subsequent word
reconstruction study of the feature [continuant] also revealed that reconstruction of
fricative words was less error-prone in the match condition than in the mismatch
condition (Moates, Sutherland, Bond, & Stockmal, manuscript in preparation). In contrast,
stop words showed no such difference. In other words, fricative words alone could be
responsible for the mismatch effect found in the Marks et al. (2002) study. For these
reasons, it was decided to examine fricatives vs. sonorants (Experiment 1) and stops vs.
sonorants (Experiment 2) separately to evaluate the potential difference between
fricatives and stops.
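The three consonant classes contrasted across the two experiments can be stated compactly in terms of these two articulator-free features. The sketch below makes the classification explicit; the segment inventory is abbreviated and the feature values simplified, so it is an illustration rather than a full phonological analysis.

```python
# Classifying segments by the articulator-free features [sonorant] and
# [continuant]. The inventory is abbreviated and some values simplified
# (e.g., the [continuant] value of laterals varies across analyses).

FEATURES = {
    #        ([sonorant], [continuant])
    "f": (-1, +1), "v": (-1, +1), "s": (-1, +1), "z": (-1, +1),  # fricatives
    "p": (-1, -1), "b": (-1, -1), "t": (-1, -1), "d": (-1, -1),  # stops
    "k": (-1, -1), "g": (-1, -1),
    "m": (+1, -1), "n": (+1, -1),                                # nasals
    "l": (+1, +1), "r": (+1, +1), "j": (+1, +1), "w": (+1, +1),  # liquids/glides
}

def segment_class(segment: str) -> str:
    """Map a segment onto the stimulus class used in the two experiments."""
    sonorant, continuant = FEATURES[segment]
    if sonorant == +1:
        return "sonorant"
    return "fricative" if continuant == +1 else "stop"

print(segment_class("f"), segment_class("p"), segment_class("w"))
```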
EXPERIMENT 1
Method
Materials
Ninety-eight English words were selected as real-word targets in the priming
experiment. All words have one of 14 target phonemes including seven fricative
consonants [f, v, θ, ð, s, z, ʃ] and seven sonorant consonants [m, n, ŋ, l, r, j, w]. Half of
the words have two syllables and the other half have three syllables. Half of the words
have the target phoneme in the onset of the stressed syllable and the other half in the coda
of the stressed syllable. Target words in the fricative and sonorant lists were balanced for
variables affecting lexical access. A set of t tests showed no significant difference in
word frequency (p = 0.68), number of segments (p = 0.72), number of consonants (p =
0.97), and uniqueness point (p = 0.24). The consonant change occurred before the
uniqueness point in all target words. Ideally there would be a total of 112 items, including
14 (seven fricatives and seven sonorants) x 2 (two- vs. three-syllables) x 2 (syllable onset
vs. coda) x 2 (tokens). However, only 98 words could be selected due to phonotactic
constraints (e.g., [ŋ] does not appear in syllable-onset position; [j, w] do not appear in
syllable-coda position).
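As an illustration of this balancing check, an independent-samples t test can be run on each lexical variable; the frequency values in the sketch below are hypothetical placeholders rather than the actual stimulus statistics.

```python
# A sketch of the balancing check: compare the fricative and sonorant
# target words on a lexical variable with an independent-samples t test.
# The frequency values are hypothetical placeholders.

from scipy.stats import ttest_ind

fricative_freq = [12, 5, 33, 8, 21, 14]  # hypothetical word frequencies
sonorant_freq = [10, 7, 29, 11, 18, 16]

t, p = ttest_ind(fricative_freq, sonorant_freq)
print(f"word frequency: t = {t:.2f}, p = {p:.2f}")  # balanced if p > .05
```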
For each word, two non-word primes were constructed by replacing the target
phoneme in the real word: one with a phoneme matching the value of the feature
[sonorant] in the target phoneme, and the other with a sound mismatching the value of the
feature [sonorant] in the target phoneme. For example, for the word conform, where the
target phoneme is [f], a matching prime was [knrm] and a mismatching prime was
[knwrm].
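The logic of the prime construction can be sketched as follows. The replacement segments in the actual study were chosen by the experimenters, so the selection rule below (taking the first available segment) is only an illustrative assumption.

```python
# A sketch of prime construction: replace the target phoneme with a segment
# that matches or mismatches it in [sonorant]. The selection rule here is
# an illustrative assumption, not the study's hand-selected replacements.

OBSTRUENTS = ["f", "v", "s", "z", "p", "b", "t", "d", "k", "g"]
SONORANTS = ["m", "n", "l", "r", "j", "w"]

def make_primes(segments, index):
    """Return (matching, mismatching) primes for the phoneme at `index`."""
    target = segments[index]
    same = OBSTRUENTS if target in OBSTRUENTS else SONORANTS
    other = SONORANTS if target in OBSTRUENTS else OBSTRUENTS
    match = next(s for s in same if s != target)  # same [sonorant] value
    mismatch = other[0]                           # opposite [sonorant] value
    build = lambda seg: segments[:index] + [seg] + segments[index + 1:]
    return build(match), build(mismatch)

# e.g., conform with the target [f] in fourth position:
print(make_primes(["k", "ə", "n", "f", "ɔ", "r", "m"], 3))
```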
In addition to the real-word targets, 98 pronounceable non-word fillers were
constructed to serve as non-word targets. Similar to the word-target setup, these non-words included both two-syllable and three-syllable items with the target phoneme in
either the onset or coda of a stressed syllable. The 14 target phonemes were identical to
those used in the word targets. For each non-word, two non-word primes were
constructed by replacing the target phoneme in the non-word: one with a phoneme
matching the value of the feature [sonorant] in the target phoneme, and the other with a
sound mismatching the value of the feature [sonorant] in the target phoneme. For
example, for the target [rflv], where the critical sound is [f], a feature-matching
prime was [rblv] and a mismatching prime was [rmlv]. In other words, the
prime-target relationship in the non-word target set was identical to that in the word
target set. The complete set of stimuli is listed in Appendix A.
Participants
Forty undergraduate students (25 females and 15 males) at Ohio University
participated in the experiment. All were native speakers of American English with self-
reported normal hearing, speech, and language. They received partial course credit for
participating in the experiment.
Procedure
The stimuli were recorded by a phonetically-trained female speaker of American
English. The recording was made in a sound-treated booth in the School of Hearing,
Speech and Language Sciences at Ohio University with a high-quality microphone
(Audio-Technica AT825 field recording microphone) connected through a preamplifier
and A/D converter (USBPre microphone interface) to a Windows personal computer
(Dell). The recording was sampled using the Brown Lab Interactive Speech System
(BLISS, Mertus, 2000) at 20 kHz with 14-bit quantization. The stimuli, saved as
individual audio files, were imported to AVRunner, the subject-testing program in BLISS,
for stimulus presentation.
Two stimulus lists were constructed with the following considerations. The
relationship between the prime and target (match, mismatch) was intended to be a within-subject factor. It was also determined that participants were not to hear the same stimulus
more than once during the experiment to avoid any familiarity or learning effects. To
these ends, two stimulus lists were constructed so that, for a given target, each of the
two primes was assigned to a different list; each list thus included both prime
types without repeating any stimulus. The fillers (non-word primes and non-word targets)
were assigned to the two lists in the same way. Therefore, no primes or targets were
repeated in any list. In sum, each list included 196 targets (98 word targets and 98 non-word targets) with the two prime types (match, mismatch) equally distributed. Each
participant was randomly assigned to be tested on one list only. The presentation of lists
was counterbalanced across participants such that the two lists were presented equally
often across participants. For each participant, AVRunner assigned a uniquely
randomized presentation order such that no two participants received the same order of
presentation. The inter-stimulus interval between the prime and target was 50
milliseconds. The inter-trial interval was three seconds.
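The list construction amounts to a simple counterbalancing scheme. A minimal sketch is given below, assuming item triples of (target, matching prime, mismatching prime); the item format is an assumption for illustration.

```python
# A minimal sketch of the two-list counterbalancing: each target occurs once
# per list, its two primes are split across the lists, and each participant
# hears a freshly randomized order. The item structure is an assumed format.

import random

def build_lists(items):
    """items: (target, match_prime, mismatch_prime) triples.
    Returns two lists of (prime, target) trials."""
    list_a, list_b = [], []
    for i, (target, match, mismatch) in enumerate(items):
        if i % 2 == 0:  # alternate so both lists get both prime types
            list_a.append((match, target))
            list_b.append((mismatch, target))
        else:
            list_a.append((mismatch, target))
            list_b.append((match, target))
    random.shuffle(list_a)  # unique presentation order per participant
    random.shuffle(list_b)
    return list_a, list_b
```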
Participants were tested individually in a quiet room in the School of Hearing,
Speech and Language Sciences at Ohio University. They listened to the stimuli through a
pair of high-quality headphones (Koss R80) connected to a Windows personal computer
(Dell). The participants were told that they would be listening to pairs of auditory stimuli,
where the second item could be a real word or a non-word in English. Their task was to
judge whether the second item in a pair was a real word or a non-word by pressing the
computer keys labeled with YES (for real words) or NO (for non-words). They were also
instructed to respond as quickly as possible as reaction time would be measured. Prior to
the actual experiment, 10 practice trials, none of which appeared in the experiment proper, were
given to familiarize the participants with the experimental procedure.
Many of the target words were of low frequency, and it was possible that
participants might not know some of them. Following the experiment, participants
received a word check sheet to test whether any target words were unfamiliar to them.
The sheet listed 20 phonotactically legal non-words and the 20 target words having the
lowest word frequencies. Participants were asked to circle all items they thought were not
words. If a participant circled any real words, those words were removed from that
participant's data set.
Data analysis
Response accuracy and reaction time were recorded by BLISS automatically.
Reaction time was measured from the onset of the target. Only responses to real word
targets were analyzed and only correct responses were included in the reaction time
analysis. Repeated measures ANOVAs were conducted on response accuracy and
reaction time with relation between prime and target (match, mismatch) and target
phoneme (fricative, sonorant) as fixed factors and participants as a random factor.
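A sketch of this analysis is shown below, assuming long-format data with one mean per participant per cell; the column names and reaction-time values are placeholders, not the reported data.

```python
# A sketch of the 2 (relation) x 2 (target phoneme) repeated measures ANOVA
# on reaction time, using statsmodels' AnovaRM. Values are placeholders.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "subject":  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "relation": ["match", "match", "mismatch", "mismatch"] * 3,
    "phoneme":  ["fricative", "sonorant"] * 6,
    "rt":       [960, 930, 995, 935, 955, 940, 990, 945, 970, 938, 1000, 941],
})

result = AnovaRM(df, depvar="rt", subject="subject",
                 within=["relation", "phoneme"]).fit()
print(result)
```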
Results
Figure 1 shows the average reaction time of lexical decision by relation and target
phoneme. The ANOVA revealed a significant main effect of relation (F (1, 39) = 7.88, p
< .01). In particular, response was faster when there was a match between the prime and
target phonemes (950 ms) than when there was no match (968 ms). The main effect of
target phoneme was also significant (F (1, 39) = 10.97, p < .005). Specifically, response
was faster for sonorants (943 ms) than for fricatives (976 ms). The relation-target
phoneme interaction was also significant (F (1, 39) = 5.82, p < .05). As Figure 1 shows,
the interaction arose because a [sonorant] feature mismatch slowed responses to
fricative targets but not to sonorant targets.
Table 1 shows the average number of errors in the lexical decision task by relation
and target phoneme. Overall, participants made very few errors. Still, the ANOVA
revealed a significant main effect of target phoneme (F (1, 39) = 41.6, p < .0001).
Specifically, response was more accurate for sonorants (0.7 out of 49) than for fricatives
(1.7 out of 49). The relation-phoneme interaction was also significant (F (1, 39) = 5.03, p
< .05). The interaction arose because the [sonorant] feature mismatch resulted in more
errors for fricatives but not for sonorants. The pattern of errors is similar to that of the
reaction time, indicating no tradeoff between speed and accuracy.
Summary
As predicted, lexical decision response was faster when there was a match in the
feature [sonorant] between the prime and target, indicating that a match or mismatch in this
feature does impact lexical processing. However, the feature match facilitated response
only when the target phoneme was a fricative consonant. In contrast, the feature match
did not make a difference when the target phoneme was a sonorant consonant. This
pattern is identical to what was found in Marks et al. (2002) with a word reconstruction
task. Together these findings suggest that the effect of feature match hinges on the type of
target phoneme involved. A mismatch in [sonorant] disrupted response to fricative targets
but not sonorant targets.
Would this result generalize to all obstruents? The next experiment examined the
feature match effect with another group of obstruent consonants, the stop consonants, to
evaluate whether the feature match effect was limited only to fricative consonants.
EXPERIMENT 2
Method
Materials
Ninety-two English words were selected to be the real-word targets in the priming
experiment. All words have one of 13 target phonemes including six stop consonants [p,
b, t, d, k, ɡ] and seven sonorant consonants [m, n, ŋ, l, r, j, w]. As in the previous
experiment, half of the words have two syllables and the other half have three syllables.
Half of the words have the target phoneme in the onset of the stressed syllable and the
other half in the coda of the stressed syllable.
Target words in the stop and sonorant lists were balanced for variables affecting
lexical access. A set of t tests showed no significant difference in word frequency (p =
0.57), number of segments (p = 0.45), number of consonants (p = 0.38), and uniqueness
point (p = 0.74). The consonant change occurred before the uniqueness point in all target
words. Ideally there would be a total of 104 items, including 13 (six stops and seven
sonorants) x 2 (two- vs. three-syllables) x 2 (syllable onset vs. coda) x 2 (tokens).
However, only 92 words could be selected due to phonotactic constraints, as was noted in
Experiment 1.
For each word, two non-word primes were constructed by replacing the target
phoneme in the real word: one with a phoneme matching the value of the feature
[sonorant] in the target phoneme, and the other with a phoneme mismatching the value of
the feature [sonorant] in the target phoneme. For example, for the word pothole, where
the target phoneme is [p], a matching prime was [bɑthol] and a mismatching prime was
[wɑthol].
In addition to the real-word targets, 92 pronounceable non-word fillers were
constructed to serve as non-word targets. Similar to the word-target setup, these non-words included both two-syllable and three-syllable items with the target phoneme in
either the onset or coda of a stressed syllable. The target phonemes were identical to
those used in the word targets. For each non-word, two non-word primes were
constructed by replacing the target phoneme in the non-word: one with a sound matching
the value of the feature [sonorant] in the target phoneme, and the other with a sound
mismatching the value of the feature [sonorant] in the target phoneme. For example, for
the target [ptl], where the critical sound is [p], a feature-matching prime was
[stl] and a mismatching prime was [ntl]. In other words, the prime-target
relationship in the non-word target set was identical to that in the word target set. The
complete set of stimuli is listed in Appendix B.
Participants
Forty undergraduate students (26 females and 14 males) at Ohio University
participated in the experiment. All were native speakers of American English with self-reported normal hearing, speech, and language. They received partial course credit for
participating in the experiment. None of the participants participated in the previous
experiment.
Procedure
The procedure was identical to that of Experiment 1.
Results
Figure 2 shows the average reaction time of lexical decision by relation and target
phoneme. The ANOVA revealed no significant main effects or interaction. The average reaction
time for the matching relation was 925 ms (SD = 121); for the mismatching relation it
was 933 ms (SD = 117). The average reaction time for stops was 926 ms (SD = 119); for
sonorants it was 933 ms (SD = 119). Although a mismatch appeared to slow responses to
stops more than to sonorants, the interaction was not statistically significant.
Table 2 shows the average number of errors in the lexical decision task by relation
and target phoneme. Again, participants made very few errors. The ANOVA revealed a
significant main effect of relation (F (1, 39) = 40.03, p < .0001). In particular, response
was more error-prone when the prime and target phonemes matched in the feature [sonorant] (1.8
out of 46) than when they did not match (0.9 out of 46). This result is rather
counter-intuitive given that one would expect the matching relation to facilitate responses
and to generate fewer errors. However, since the actual number of errors is very small
(4% for match and 2% for mismatch), the statistical difference found here may not be
meaningful.
Summary
In contrast to the fricative-sonorant comparison in Experiment 1, results from the
stop-sonorant comparison showed no significant reaction time difference between the
match and mismatch conditions for either stop or sonorant target phonemes. While the
null result for sonorant target phonemes was consistent with the finding from Experiment
1, the lack of effect for stop consonants suggests that the feature matching effect did not
apply to all obstruent consonants.
GENERAL DISCUSSION
The research question in this study was whether a match or mismatch in the
feature [sonorant] would impact spoken word recognition by humans. Two form priming
experiments with the lexical decision task investigated this issue. The results showed that
reaction was faster when the prime and target matched in [sonorant] for obstruent
consonants but not sonorant consonants. The results further showed that the match effect
was restricted to fricative consonants only. Stop consonants did not show the match effect.
The first result replicated findings from Marks et al. (2002), who found word
reconstruction to be more accurate when target and replacing segments were matched on
the feature [sonorant] than when they were mismatched. This effect occurred for
obstruents but not sonorants, just as was found in the present study. Taken together,
the finding that feature mismatch resulted in more errors and increased reaction time in
word recognition indicates that distinctive features, originally posited on the basis of
articulation and acoustics, are also implicated in perceptual processing. The findings also
challenge the assumption that all phonetic segments and features are treated equally in
spoken word recognition.
What could be the reason for the different patterns between sonorants and
obstruents? As noted earlier, Marks et al. (2002) speculated that sonorants were
phonetically similar to vowels, which could explain why a feature mismatch did not
disrupt word reconstruction for sonorants as substantially as it did for obstruents.
Articulatorily, both are consonants, which are produced with a narrow constriction in the
vocal tract. Acoustically, the formation and subsequent release of the constriction
each introduce an acoustic discontinuity into the signal. However,
obstruents and sonorants are different in that the articulation of sonorants involves an
abrupt switching of the airflow to a different path in the vocal tract without a substantial
increase in intraoral air pressure (Stevens, 1998, 2002). Liu (1996) developed a sonorant
detector as part of an algorithm for automatic speech recognition. She noted that energy
in the second to fourth formant range decreases at the formation and increases at the
release of a sonorant consonant. During the constriction, the spectrum remains relatively
steady, especially at low frequencies, since the vocal tract shape is relatively constant.
Given that vowel landmarks are where there is maximum amplitude in the first formant
range (Stevens, 2002), there seems to be some merit to the argument that sonorant
consonants are phonetically similar to vowels. This acoustic similarity could account
for the obstruent-sonorant contrast observed in Marks et al. (2002) and the current study.
This account is also consistent with the finding that vowels are appreciably easier to
change in the word reconstruction task (van Ooijen, 1996; Cutler et al., 2000).
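Liu's acoustic observation can be made concrete with a rough band-energy computation, sketched below. The band edges and the decision threshold are illustrative assumptions, not the parameters of her detector.

```python
# A rough sketch of the acoustic pattern noted above (cf. Liu, 1996): at the
# formation of a sonorant constriction, energy in the F2-F4 region drops
# while energy near F1 stays comparatively steady. Band edges and the
# threshold are illustrative assumptions, not Liu's actual parameters.

import numpy as np
from scipy.signal import spectrogram

def band_energy_db(signal, fs, lo, hi):
    """Frame-by-frame energy (dB) in the band [lo, hi] Hz."""
    f, t, sxx = spectrogram(signal, fs=fs, nperseg=int(0.025 * fs))
    band = (f >= lo) & (f <= hi)
    return t, 10 * np.log10(sxx[band].sum(axis=0) + 1e-12)

def sonorant_closure_candidates(signal, fs, drop_db=9.0):
    """Frames where F2-F4 energy falls sharply but F1-region energy holds."""
    t, high = band_energy_db(signal, fs, 1200, 3500)  # rough F2-F4 region
    _, low = band_energy_db(signal, fs, 100, 800)     # rough F1 region
    d_high, d_low = np.diff(high), np.diff(low)
    return t[1:][(d_high < -drop_db) & (np.abs(d_low) < drop_db / 3)]
```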
The finding that responses to fricatives are different from responses to stops is
also noteworthy. Although obstruents overall generated a feature match effect in Marks et
al. (2002), the contrast between the two experiments in the current study further showed
that the source of the effect is limited to fricatives. The feature match effect is thus not
spread evenly across all obstruents. As noted, the feature [continuant] classifies obstruent
consonants into fricatives and stops. Stops are produced with a complete closure in the
vocal tract; therefore they generate abrupt amplitude decrease and increase at consonant
closure and release. Fricatives, on the other hand, are produced with a sustained narrow
constriction and thus continuous turbulent noise (Stevens, 1998, 2002, 2005). Given their
distinct articulatory and acoustic properties, it is perhaps not surprising that they are
processed differently during spoken word recognition, as revealed by the current study.
How would the current results compare to other spoken word recognition studies
using the form priming paradigm? In the literature, two types of form priming have been
shown. First, the prime and target overlap at the stimulus onset in one or more segments,
using either words or pseudowords as primes. Priming has been shown with both lexical
decision and naming tasks. Second, the prime and target overlap in the rime of the target
word. Rime overlap generally leads to priming more often than does onset overlap, and
pseudoword primes produce greater facilitation than word primes. Rime priming has been
shown with both lexical decision and naming tasks.
Several studies illustrate these two types of priming. Radeau, Morais and Segui
(1995) used three-phoneme words as both primes and targets in lexical decision and
naming tasks. The primes overlapped the targets in either the first two segments or the
last two segments. Final overlap produced facilitation in both tasks, but initial overlap
produced no facilitation. Slowiaczek, McQueen, Soltano and Lynch (2000) also showed
final overlap priming with monosyllabic words in both naming and continuous lexical
decision tasks. The rime was a major contributor to the priming effect, but the amount of
phonological overlap was also an important contributor.
The Radeau et al. (1995) and Slowiaczek et al. (2000) studies used monosyllabic
targets, but the targets in our two studies were 2- and 3-syllable words. Marslen-Wilson,
Moss and van Halen (1996, Exp. 1) showed rime priming with 2- and 3-syllable Dutch
words. These words were semantic mediators to targets in a lexical decision task.
Nonword primes to the mediators differed from the mediators in only the first segment.
Emmorey (1989) used pairs of 2-syllable words in a lexical decision task. Words with a
strong-weak syllabic stress pattern (the pattern in 79% of our 2-syllable words) showed
large priming effects when primes and targets shared the last syllable. Sharing only the
rime (vowel plus final consonants) did not produce priming. Burton (1992) also used 2-syllable primes and targets in lexical decision and naming (shadowing) tasks. Both tasks
showed facilitation for second syllable overlap but no effect for initial syllable overlap.
Priming with initial overlap was shown by Corina (1992). Two-syllable items,
overlapping in the first syllable, produced significant priming in a lexical decision task.
The present experiments used primes that differed from the targets in only one
segment, showing a mix of onset and rime overlap. As noted in the Method, the target
segment varied in its position in the target word. For 66 (41%) of the words, the target
fell in the onset of the word. This is the condition for rime priming. An additional 12
items (7%) fell in the coda of the second or third syllable. This is the condition for onset
priming. The remaining 83 items fell in the onset of the second or third syllable or the
rime of the first syllable, conditions that do not clearly match the conditions for either
onset or rime priming. Nonetheless, post hoc analyses showed no interactions between
feature matching and target location (onset vs. coda) or between feature matching and
number of syllables (2 vs. 3), indicating the feature matching effect was uniform across
all stimuli.
The results reported in this study have some implications for models of spoken
word recognition. Three models of spoken word recognition incorporate feature
processors: TRACE (McClelland & Elman, 1986), the Distributed Cohort Model
(Gaskell & Marslen-Wilson, 1997), and Stevens' Feature-Based Model (Stevens, 2002,
2005). In TRACE, speech input is first assessed by a set of feature detectors. The effect
of a feature mismatch in the prime, as used in the present two experiments, is to reduce
the number of appropriate features activated for the target segment and its target word,
thereby reducing activation of the target word and its probability of being recognized. In
this manner, TRACE explains the effect of feature mismatch when fricatives are targets.
When the targets were stops or sonorants, however, there was no effect for feature
mismatch, and TRACE seemingly has no mechanism for explaining that outcome.
The Gaskell and Marslen-Wilson (1997) model is a distributed connectionist
model that represents lexical knowledge in a distributed substrate having abstract
representation of both the forms and meanings of words. Successive sets of distinctive
features, representing connected speech, are input to the model. The identity of the
phonological form of a word is assessed by the goodness of fit between the output
computed by the model and the nearest word entry in the distributed network. The effect
of feature mismatch in the primes, as occurred with fricatives in our Experiment 1, can be
explained by the poorer goodness of fit. When feature mismatch occurred with stops and
sonorants, as in our Experiment 2, the model predicts poorer goodness of fit as well, but
the results of the experiment showed no increase in latencies relative to the match
condition.
Stevens' Feature-Based Model estimates distinctive features from the acoustic
properties and landmarks in the speech signal. These features are matched against
ordered bundles of features representing the phonology of words in the mental lexicon.
Word identification occurs when a sequence of estimated features matches a sequence in
the mental lexicon. When a feature mismatch occurs, the latency for identifying a word
should increase, as occurred in the feature mismatch for fricatives in our Experiment 1.
The model does not explain why latencies were unaffected by the feature mismatch
condition for stops and sonorants.
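The prediction shared by all three models can be illustrated with a toy featural-fit computation, sketched below. The feature vectors and the assumed link between fit and latency are illustrative only, not taken from any of the models.

```python
# A toy illustration of the shared prediction: a prime mismatching the
# target in [sonorant] yields a poorer featural fit, which should slow
# recognition. Vectors and the fit-latency link are illustrative assumptions.

def featural_fit(prime, target):
    """Proportion of positions where prime and target share [sonorant]."""
    return sum(p == t for p, t in zip(prime, target)) / len(target)

# one [sonorant] value per segment position: +1 sonorant, -1 obstruent
target = [-1, +1, +1, -1, +1, +1, +1]          # e.g., conform
match_prime = [-1, +1, +1, -1, +1, +1, +1]     # [f] -> another obstruent
mismatch_prime = [-1, +1, +1, +1, +1, +1, +1]  # [f] -> a sonorant

print(featural_fit(match_prime, target))     # 1.0: full agreement
print(featural_fit(mismatch_prime, target))  # ~0.86: one-feature mismatch
```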
REFERENCES
Burton, M. W. (1992, November). Syllable priming in auditory word recognition.
Poster presented at the 33rd Annual Meeting of the Psychonomic Society, St. Louis, MO.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York:
Harper & Row.
Clements, G. N. (1985). The geometry of phonological features. Phonology
Yearbook, 2, 225-252.
Connine, C. M., Blasko, D. G., and Titone, D. (1993). Do the beginnings of words
have a special status in auditory word recognition? Journal of Memory & Language, 32,
193-210.
Connine, C. M., Blasko, D. G., and Wang, J. (1994). Vertical similarity in spoken
word recognition: Multiple lexical activation, individual differences, and the role of
sentence context. Perception & Psychophysics, 56, 624-636.
Corina, D. P. (1992). Syllable priming and lexical representations: Evidence from
experiments and simulations. In Proceedings of the Fourteenth Annual Conference of the
Cognitive Science Society (pp. 779-784). Bloomington: Indiana University.
Cutler, A., Sebastián-Gallés, N., Soler-Vilageliu, O., and van Ooijen, B. (2000).
Constraints of vowels and consonants on lexical selection: Cross-linguistic comparisons.
Memory & Cognition, 28, 746-755.
Emmorey, K. D. (1989). Auditory morphological priming in the lexicon.
Language and Cognitive Processes, 4, 73-92.
Gaskell, M. G., and Marslen-Wilson, W. D. (1997). Integrating form and
meaning: A distributed model of speech perception. Language and Cognitive Processes,
12, 613-656.
Halle, M. (1992). Features. In W. Bright (ed.), Oxford International Encyclopedia
of Linguistics (pp. 207-212). New York: Oxford University Press.
Halle, M., and Stevens, K. N. (1991). Knowledge of language and the sounds of
speech. In J. Sundberg, L. Nord, and R. Carlson (Eds.), Music, Language, Speech and
Brain (pp. 1-19). London: MacMillan.
Jakobson, R., Fant, C. G. M., and Halle, M. (1952). Preliminaries to speech
analysis: the distinctive features and their correlates. MIT Acoustics Laboratory
Technical Report 13. Reprinted 1967, Cambridge, MA: MIT Press.
Kenstowicz, M. (1994). Phonology in Generative Grammar. Cambridge:
Blackwell Publishers.
Liu, S. A. (1996). Landmark detection for distinctive feature-based speech
recognition. Journal of the Acoustical Society of America, 100, 3417-3430.
Luce, P. A., and Pisoni, D. B. (1998). Recognizing spoken words: The
Neighborhood Activation Model. Ear & Hearing, 19, 1-36.
Marks, E. A., Moates, D. R., Bond, Z. S., and Stockmal, V. (2002). Word
reconstruction and consonant features in English and Spanish. Linguistics, 40, 421-438.
Marks, E. A., Moates, D. R., Bond, Z. S., and Vazquez, L. (2002). Vowel
mutability: The case of monolingual Spanish listeners and bilingual Spanish-English
listeners. Southwest Journal of Linguistics, 21, 73-99.
Marslen-Wilson, W. D. (1993). Issues of process and representation in lexical
access. In G. Altmann & R. Shillcock (Eds.), Cognitive Models of Speech Processing:
The Second Sperlonga Meeting (pp. 187-210). Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W. D., Moss, H. E., and van Halen, S. (1996). Perceptual
distance and competition in lexical access. Journal of Experimental Psychology: Human
Perception and Performance, 22, 1376-1392.
Marslen-Wilson, W. D., and Warren, P. (1994). Levels of perceptual
representation and process in lexical access: Words, phonemes and features.
Psychological Review, 101, 653-675.
Marslen-Wilson, W. D., and Welsh, A. (1978). Processing interactions and lexical
access during word recognition in continuous speech. Cognitive Psychology, 10, 29-63.
Marslen-Wilson, W. D., and Zwitserlood, P. (1989). Accessing spoken words:
The importance of word onsets. Journal of Experimental Psychology: Human Perception
& Performance, 15, 576-585.
McCarthy, J. J. (1988). Feature geometry and dependency: a review. Phonetica,
45, 84–108.
McClelland, J. L., and Elman, J. L. (1986). The TRACE model of speech
perception. Cognitive Psychology, 18, 1-86.
Mertus, J. A. (2000). BLISS: The Brown Lab Interactive Speech System. Brown
University.
Milberg, W., Blumstein, S. E., and Dworetzky, B. (1988). Phonological factors in
lexical access: Evidence from an auditory lexical decision task. Bulletin of the
Psychonomic Society, 26, 305-308.
Moates, D. R., Sutherland, M. T., Bond, Z. S., and Stockmal, V. The feature
[continuant] in word reconstruction. Manuscript in preparation.
Norris, D. (1994). SHORTLIST: A connectionist model of continuous speech
recognition. Cognition, 52, 189-234.
van Ooijen, B. (1996). Vowel mutability and lexical selection in English:
Evidence from a word reconstruction task. Memory & Cognition, 24, 573-583.
Radeau, M., Morais, J., and Segui, J. (1995). Phonological priming between
monosyllabic spoken words. Journal of Experimental Psychology: Human Perception
and Performance, 21, 1297-1311.
Slowiaczek, L. M., McQueen, J. M., Soltano, E. G., and Lynch, M. (2000).
Phonological representations in prelexical speech processing: Evidence from form-based
priming. Journal of Memory and Language, 43, 530-560.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In P. B. Denes and E. E. David, Jr. (Eds.), Human Communication: A
Unified View (pp. 51–66). New York: McGraw-Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17,
3-46.
Stevens, K. N. (1997). Articulatory-acoustic-auditory relationships. In W. J.
Hardcastle and J. Laver (Eds.), The Handbook of Phonetic Sciences (pp. 507-538).
Oxford: Blackwell Publishers.
Stevens, K. N. (1998). Acoustic Phonetics. Cambridge: MIT Press.
Stevens, K. N. (2002). Toward a model for lexical access based on acoustic
landmarks and distinctive features. Journal of the Acoustical Society of America, 111,
1872-1891.
Stevens, K. N. (2005). Features in speech perception and lexical access. In D. B.
Pisoni and R. E. Remez (Eds.), The Handbook of Speech Perception (pp. 125-155).
Cambridge: Blackwell Publishers.
Zwitserlood, P. (1996). Form priming. Language and Cognitive Processes, 11,
589-596.
ACKNOWLEDGMENTS
We thank Sara Kellgreen for administering the experiment and assisting in data
analysis and Carla Youngdahl for recording the materials. We also thank Z. S. Bond for
many helpful discussions.
Table 1. Average number of errors (out of a possible 49) in the lexical decision task in
Experiment 1. Standard deviations are shown in parentheses.

                          Prime-target relation
Target phoneme        Match            Mismatch
Fricatives            1.53 (1.38)      1.88 (1.54)
Sonorants             0.80 (1.09)      0.60 (0.81)
Table 2. Average number of errors (out of a possible 46) in the lexical decision task in
Experiment 2. Standard deviations are shown in parentheses.

                          Prime-target relation
Target phoneme        Match            Mismatch
Stops                 1.65 (0.89)      1.05 (0.93)
Sonorants             1.85 (0.95)      0.73 (0.88)
FIGURE CAPTIONS
Figure 1. Average reaction time (+SE) for fricative and sonorant targets in
Experiment 1.
Figure 2. Average reaction time (+SE) for stop and sonorant targets in Experiment
2.