J Am Acad Audiol 12:514-522 (2001)

Sentence Recognition Materials Based on Frequency of Word Use and Lexical Confusability

Theodore S. Bell*
Richard H. Wilson†
Abstract
The sentence stimuli developed in this project combined aspects from several traditional
approaches to speech audiometry. Sentences varied with respect to frequency of word use
and phonetic confusability. Familiar consonant-vowel-consonant words, nouns and modifiers,
were used to form 500 sentences of seven to nine syllables. Based on concepts from the
Neighborhood Activation Model for spoken word recognition, each sentence contained three
key words that were all characterized as high- or low-use frequency and high or low lexical
confusability. Use frequency was determined by published indices of word use, and lexical
confusability was defined by a metric based on the number of other words that were similar
to a given word using a single phoneme substitution algorithm. Thirty-two subjects with normal
hearing were randomly assigned to one of seven presentation levels in quiet, and an additional 32 listeners were randomly assigned to a fixed-level noise background at one of six
signal-to-noise ratios. The results indicated that in both quiet and noise listening conditions, high-use words were more intelligible than low-use words, and there was an advantage for phonetically unique words; the position of the key word in the sentence was also a significant factor. These data formed the basis for a sequence of experiments that isolated significant nonacoustic sources of variation in spoken word recognition.
Key Words: Neighborhood Activation Model, speech intelligibility, word recognition
Abbreviations: CVC = consonant-vowel-consonant, HD = high frequency of use word from a dense neighborhood, HS = high frequency of use word from a sparse neighborhood, LD = low frequency of use word from a dense neighborhood, LS = low frequency of use word from a sparse neighborhood, NAM = Neighborhood Activation Model, SIN = Speech in Noise, SNR = signal-to-noise ratio
*Department of Communication Disorders, California State University at Los Angeles, Los Angeles, California; †James H. Quillen VA Medical Center, Mountain Home, Tennessee, and Departments of Surgery and Communication Disorders, East Tennessee State University, Johnson City, Tennessee

Reprint requests: Theodore S. Bell, Department of Communication Disorders, California State University at Los Angeles, 5151 State University Drive, Los Angeles, CA 90032-8170

Speech audiometry is used in the evaluation of auditory function as a diagnostic measure and as a measure of communication ability. For the most part, spondaic words and monosyllabic words have been used to assess these two aspects of word recognition abilities (e.g., Hudgins et al, 1947; Egan, 1948; Hirsh et al, 1952; Tillman and Carhart, 1966). Diagnostic tests require the sensitivity to discriminate
between listeners with normal hearing and
patients with various hearing impairments. For
diagnostic purposes, the test items should have
little redundancy (e.g ., monosyllabic word tests) .
It has been known for many years that word tests
provide useful but limited information about
receptive communication in everyday life by
individuals with hearing impairment . The
assessment of receptive communication abilities ideally should involve real-life speech materials and real-life listening conditions . This
report describes a corpus of sentence materials
that ultimately are intended for use in the clinical assessment of speech recognition abilities.
The target words in the sentences were selected
based on aspects of the Neighborhood Activation
Model (NAM) of spoken language (Luce, 1986).
There are considerable data to support the
NAM with regard to spoken word recognition
(e .g ., Pisoni and Luce, 1987 ; Goldinger et al,
1989 ; Luce and Pisoni, 1998 ; Dirks et al, 2001) .
The NAM assumes that the recognition of spoken words is characterized by a process in which
phonetically similar words in memory are organized for perceptual processing. Then the member of the activated set that is most consistent
with the acoustic-phonetic information in the
speech waveform is selected . Further, it is
assumed that word frequency (of occurrence)
biases responses toward the more likely, or more
frequent, members of the activated neighborhood . The NAM predicts that an increase in the
activation level of a stimulus word's similarity
neighborhood lowers the probability of identifying the stimulus word itself.
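One common formalization of this prediction (following Luce and Pisoni, 1998) expresses the probability of identifying a stimulus word with a frequency-weighted neighborhood probability rule of roughly the form

    p(ID) = (S x FreqS) / (S x FreqS + sum_j [Nj x FreqNj]),

where S and Nj index the acoustic-phonetic match of the input to the stimulus word and to its jth neighbor, and FreqS and FreqNj are their frequencies of occurrence. Adding neighbors, or raising their frequencies, enlarges the denominator and lowers the predicted probability of correct identification.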
Connected speech materials (sentences),
which are exemplary of everyday communication, are by definition a valid speech test paradigm for assessing the receptive communication
ability of individuals . Since the early development of speech recognition materials, sentence
materials have been used to evaluate communication systems and individuals (Fletcher, 1929 ;
Egan, 1944 ; Hudgins et al, 1947 ; Silverman and
Hirsh, 1955 ; Speaks and Jerger, 1965 ; Kalikow
et al, 1977) . Recently, several sentence tests
have been developed for use in assessing various aspects of speech recognition function, such
as the Connected Speech Test (Cox et al, 1987),
Speech in Noise (SIN) (Killion and Villchur,
1993), and Hearing in Noise Test (Nilsson et al,
1994) ; however, few have incorporated sentence
materials in a reliable and systematic protocol
for routine clinical use (Martin et al, 1998) .
The importance of the use of sentences in the
auditory evaluation of patients was emphasized
by Jerger et al (1968), who stated that sentence
tests compared with isolated word tests "manipulate a crucial parameter of ongoing speech, its
changing pattern over time" (p . 319) . A sentence
provides information about its constituent words
by providing the relationships among words .
The increased redundancy and semantic cues
in sentence materials result in a more rapid rise
in the psychometric function as compared to
monosyllabic words . There are several formats
employed using sentence materials, ranging from
simple interrogative sentences, which the subject answers (Fletcher, 1929 ; Hudgins et al, 1947),
to target-word formats, in which the subject
identifies target words within the sentence (Silverman and Hirsh, 1955 ; Berger, 1969) . Kalikow
et al (1977) and Bilger et al (1984) introduced the
concept of redundancy more formally into their
Speech Perception in Noise test by controlling the
predictability of the target word, which was
always the final word of the sentence . The difference between words with high and low predictability from the sentence cues may provide
a measure of the individual's cognitive and memory capabilities in speech perception .
Other sentence tests have been devised that
differ in format and composition, including artificial sentences (Speaks and Jerger, 1965), nonsense sentences (Nakatani and Dukes, 1973),
and meaningful sentences of everyday language
(Bench and Bamford, 1979 ; Plomp and Mimpen, 1979 ; Smoorenburg, 1986, 1989) . Plomp
and Mimpen analyzed whole sentences, words,
and individual phonemes, concentrating on the
analysis of many variables and processes from
sentence scores to phoneme scores . These sentences have been employed with an adaptive
procedure format to determine specific points
(e .g ., 50%) on the psychometric function . The
SIN test was based on the Institute of
Electrical and Electronics Engineers (IEEE)
sentences recorded by a female talker in a background of four-talker babble (Killion and Villchur,
1993) . Each sentence contains five key words
that are used for scoring . The SIN is used for
evaluation of hearing aids and is presented at
four signal-to-noise ratios (SNRs) in 83 and 53
dB SPL of noise in a sound field .
Nonacoustic knowledge sources have been
known to contribute to the identification of
words in normal continuous discourse (e .g .,
Marslen-Wilson and Tyler, 1980; Salasoo and Pisoni, 1985). Marslen-Wilson and Tyler found
that less than half of the acoustic-phonetic code
was required to understand words in normal
sentence contexts . Further support for this finding comes from studies that have used a stimulus gating paradigm, wherein measures were
collected that reflected the minimum acousticphonetic input required for word recognition
(Grosjean, 1980 ; Cotton and Grosjean, 1984 ;
Salasoo and Pisoni, 1985) . As Miller (1951)
demonstrated, these studies showed that less
stimulus information was required to identify
words in sentences than to identify the same
words in isolation . The results of Grosjean's
(1980) gating experiments suggested that incorrect responses in a recognition task included
not only acoustically similar words but also
semantically related words . These data were
used by Grosjean to refute the claim that only
acoustic-phonetic information was used to compose the set of possible lexical candidates . He
concluded that a model similar to Morton's (1979)
interactive logogen model was required to
explain these data ; it was suggested that both
acoustic and nonacoustic knowledge sources
interacted when possible word candidates were
selected by listeners.
Salasoo and Pisoni (1985) provided support
for Marslen-Wilson and Tyler's "principle of
bottom-up priority." Acoustic-phonetic patterns
are the primary source of information used to
form a set of lexical candidates accessible from
long-term memory, although semantic and syntactic information available from sentence contexts also provides additional candidates to the
pool of potential words. The balance between
these sources of knowledge in bottom-up and
top-down processes allows the listener to comprehend speech even when the encoding is impoverished either by noise or sensory impairment .
Assuming that the acoustic-phonetic code is
degraded for listeners with hearing loss as compared with listeners with normal hearing, the
degraded stimulus leads to an inherently larger
neighborhood . The impoverished sensory encoding of the impaired auditory systems leads to
acoustic-phonetic encodings that are "fuzzy,"
resulting in greater similarity to other words.
Thus, one consequence of an impaired auditory system is that lexical similarity neighborhoods are larger and word frequency of occurrence effects are diminished because the speech stimuli are inherently ill defined. If the NAM is applied to listeners with hearing impairment, the neighborhood would be dense because of the increased uncertainty introduced by the degraded encoding, leading to a larger set of alternatives in the word recognition process; although high frequency of occurrence words would still be more likely responses, the ratio of frequency of occurrence to neighborhood frequency would be dominated by the neighborhood frequencies.
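As a hypothetical illustration of that ratio (not data from the present study), a target word with a use frequency of 100 and two equally well-matched neighbors of frequency 50 each would be identified with probability 100/(100 + 100) = .50, whereas the same word surrounded by ten such neighbors drops to 100/(100 + 500) = .17, even though nothing about the acoustic-phonetic input has changed.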
The speech stimuli developed in this report
combine aspects from several traditional
approaches to speech audiometry. Sentences
vary with respect to redundancy and semantic
context, and the individual word constituents
vary with respect to word frequency of use and
phonetic confusability. The target words are
embedded in sentences in a format similar to the
Plomp and Mimpen (1979) or Bench and Bamford (1979) sentences. An important difference,
however, is that the target words vary with
respect to word frequency of occurrence and
word confusability. The sentences developed are
representative of everyday speech; specifically, the sentences are brief and easy to repeat. Phonemic content is equivalent across lists. The data
reported here provide the basis for a speech test
with potential clinical applications . A protocol
based on these materials could improve evaluation of speech communication problems associated with hearing impairment by addressing
receptive speech problems beyond issues related
to simple audibility of the signal .
METHOD
Materials
Monosyllabic consonant-vowel-consonant
(CVC) stimuli were selected as target words on
the basis of use frequency, lexical confusability,
and familiarity ratings . Here, monosyllabic
refers to spoken words as opposed to written
words. Words containing syllabic l, m, or n are
considered monosyllabic when spoken but polysyllabic when written. Word use frequency was
based on the Computational Analysis of Present Day English (Kucera and Francis, 1969), in
which samples of everyday reading materials
were analyzed for individual word use expressed
as the number of times the word was found per
million words sampled. Lexical confusability was defined as the number of other words in the language that are phonetically similar to a given target word: a word was considered similar to the target if it could be created from the target by substituting a single phoneme (a "single phoneme substitution" rule).
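In computational terms, the density metric amounts to counting the same-length lexicon entries that mismatch the target in exactly one phoneme position. The sketch below is illustrative only; it assumes a phonemic lexicon mapping each word to a list of phoneme symbols, and the miniature lexicon shown is a placeholder, not the word pool used in this study.

LEXICON = {
    "cheese": ["CH", "IY", "Z"],
    "keys":   ["K",  "IY", "Z"],
    "cheer":  ["CH", "IY", "R"],
    "choose": ["CH", "UW", "Z"],
    "cat":    ["K",  "AE", "T"],
}

def substitution_neighbors(word, lexicon):
    """All other words that differ from `word` by exactly one phoneme
    in the same position (single phoneme substitution)."""
    target = lexicon[word]
    hits = []
    for other, phones in lexicon.items():
        if other == word or len(phones) != len(target):
            continue
        if sum(p != q for p, q in zip(target, phones)) == 1:
            hits.append(other)
    return hits

def density(word, lexicon):
    """Neighborhood density = number of single-substitution neighbors."""
    return len(substitution_neighbors(word, lexicon))

# density("cheese", LEXICON) -> 3 (keys, cheer, choose) in this toy lexicon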
The terminology advanced by Luce (1986)
in the NAM of word recognition is used. A word
is considered "sparse" if it is relatively phonetically unique, that is, similar words are few in
number. A word is considered "dense" if it is phonetically similar to many other words in the lexicon . Sparse and dense metaphorically refer to
"similarity neighborhoods" in an assumed representation of the mental lexicon. The categorization of low- versus high-use frequency words
was based on the entire set of familiar monosyllabic CVC words in the pocket lexicon. Familiarity was described by a 7-point rating scale
applied to the entire pocket lexicon rated by Indiana University undergraduate student volunteers (Nusbaum et al, 1984). For the current
protocol, only words rated greater than 6.5
(highly familiar) were selected . The words were
sorted on the basis of use frequency with the
upper and lower thirds of the distribution
selected and labeled high and low use, respec-
Table 1  Example Sentences for Each of the Four Categories

Low use, sparse (target words: cheese, blaze, barge)
  The lump of cheese has turned sour.
  The chops will sizzle in the blaze.
  Toss the crab onto the barge.

Low use, dense (target words: dent, dine, roam)
  The dent in his new bike made him yell.
  I like to sip pop while I dine.
  The lamb likes to roam around the moat.

High use, sparse (target words: point, fog, large)
  The point of the knife is too sharp.
  The breeze helped to clear the fog.
  The rebel has a large horse to mount.

High use, dense (target words: knot, keep, some)
  The rope has been tied in a knot.
  Use a fan to keep the room cool.
  Heat some pea soup over the fire.

Use frequency (count per million) and density (number of words similar by single phoneme substitution) were also tabulated for one target word in each sample sentence (e.g., cheese: frequency 9, density 7; point: frequency 395; large: frequency 361; keep: frequency 264).
tively. Within the use categories, the words
were again sorted on the basis of lexical similarity. Within each category, words of high
(dense) and low (sparse) lexical confusability
were determined using a tertiary split. The following four categories of words formed a 2 x 2
factorial arrangement of high and low word use
frequency and dense and sparse lexical neighborhoods: low frequency of use, sparse neighborhood (LS); low frequency of use, dense neighborhood (LD); high frequency of use, sparse neighborhood (HS); and high frequency of use, dense neighborhood (HD).
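For concreteness, the tertile-split categorization described above can be sketched as follows; the record format, cutoff handling, and field names are illustrative assumptions rather than a reconstruction of the authors' procedure.

def tertile_cutoffs(values):
    """Values bounding the lower and upper thirds of the distribution."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[2 * n // 3]

def classify(words):
    """words: list of dicts with 'word', 'freq' (count per million), and 'density'.
    Returns LS/LD/HS/HD word lists; the middle third of each split is discarded."""
    lo_f, hi_f = tertile_cutoffs([w["freq"] for w in words])
    categories = {"LS": [], "LD": [], "HS": [], "HD": []}
    for use, keep in (("L", lambda f: f <= lo_f), ("H", lambda f: f >= hi_f)):
        group = [w for w in words if keep(w["freq"])]
        lo_d, hi_d = tertile_cutoffs([w["density"] for w in group])
        for w in group:
            if w["density"] <= lo_d:
                categories[use + "S"].append(w["word"])   # sparse neighborhood
            elif w["density"] >= hi_d:
                categories[use + "D"].append(w["word"])   # dense neighborhood
    return categories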
A pool of approximately 1800 words
resulted from this selection process . Each sen-
tence contained 3 key words selected from
within the same lexical category to form 500
sentences of seven to nine syllables . Examples
of sentences from each lexical category are
listed in Table 1 . The corresponding use frequency (count per million) and density (number of words similar by single phoneme
substitution rule) for selected words from each
category are also presented in Table 1 . The
target words were embedded in sentences in a
format similar to the Plomp and Mimpen (1979)
and Bench and Bamford (1979) sentences . An
important difference, however, was that the
target words varied with respect to word frequency of occurrence and word confusability.
A female speaker with a standard Midwestern dialect recited three repetitions of each
sentence while seated in a sound-attenuated
audiometric test chamber. A low-noise microphone (AKG, Model C460-B) and preamplifier
(Symetrix, Model SX202) were situated 7.5 cm
from the talker at a 20-degree angle of incidence. The sentences were recorded on digital
audiotape (Sony, Model 59ES). Levels were monitored using an oscilloscope throughout the single session in which the entire corpus was
recorded . The sentences were screened for intonation pattern, mispronunciations, peak clipping,
and extraneous noises . The best example of each
recorded sentence was then selected and transferred to a digital waveform editor (Kay Elemetrics, Computer Speech Lab, Model 1600-B).
Criteria included fidelity, dynamic range, extraneous sounds, pronunciation errors, naturalness, and prosody. The overall rms level of each
sentence was determined as the median rms of
overlapping 20-msec windows. Following this
analysis, the level of each of the individual sentences was adjusted so that every sentence had
an identical rms value (±0.1 dB). The level adjustment was accomplished by computing the scaling factor needed to produce the nominal median rms level, using the relation scaling factor = 10^(ΔdB/20), where ΔdB is the difference between the nominal median level and the actual level of the recorded utterance. The correction was applied by multiplying the voltage (D/A counts) of the individual sentence waveform by the computed scaling factor. The average scaling factor was <2 dB. Any
sentence that required scaling that produced
clipping of the waveform was discarded, and an
alternate sentence was substituted if available
from the original corpus of recorded materials .
The sentences were then transferred back onto
digital audiotape for experimental presentation . A 1000-Hz calibration signal was set to
coincide with the overall rms level of the adjusted
sentence lists.
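The level-equalization step lends itself to a short numerical summary. The sketch below assumes single-channel waveforms stored as 16-bit sample arrays and a nominal target level expressed in dB re full scale; the 20-msec window (with 50 percent overlap) and the 10^(ΔdB/20) scaling follow the description above, while everything else is an illustrative assumption.

import numpy as np

def median_rms_db(x, fs, win_s=0.020, hop_s=0.010):
    """Median rms level (dB re full scale) over overlapping 20-msec windows."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    levels = []
    for start in range(0, len(x) - win + 1, hop):
        frame = np.asarray(x[start:start + win], dtype=np.float64)
        rms = np.sqrt(np.mean(frame ** 2))
        levels.append(20.0 * np.log10(max(rms, 1e-12)))
    return float(np.median(levels))

def equalize(x, fs, target_db, full_scale=32767.0):
    """Scale `x` so its median-rms level equals `target_db`; return None if the
    rescaled waveform would clip (such sentences were discarded)."""
    x = np.asarray(x, dtype=np.float64)
    delta_db = target_db - median_rms_db(x / full_scale, fs)
    scale = 10.0 ** (delta_db / 20.0)            # scaling factor = 10^(delta dB / 20)
    y = x * scale
    if np.max(np.abs(y)) > full_scale:           # would clip: reject this sentence
        return None
    return y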
Subjects
In experiment 1 (quiet background), 32 paid
listeners were recruited . Experiment 2 (noise
background) involved an additional group of 32
paid subjects . All participants were determined
to have pure-tone thresholds less than 20 dB HL
(ANSI, 1996) at octave frequencies between 250
and 8000 Hz and normal middle ear function as
determined by otoscopic examination and aural
acoustic immittance measures . Speech audiometry measures were normal for all subjects .
There were no exclusion criteria with regard to
gender or ethnicity of the participants . The ages
of the subjects varied between 18 and 48 years .
English was the first language of all participants .
Procedure
Following audiometric evaluation, subjects
continued with a 45-minute test session during
which they were presented sentences in quiet or
noise. Following a brief familiarization period,
the sentences were presented in four blocks of
120 sentences corresponding to each of the four
lexical test conditions (LS, LD, HS, HD). The test
sentences originated from a digital audiotape
tape deck (Sony, Model 59ES) that was routed
through an audiometer (Grason-Stadler, Model
16) to earphones (TDH-50P) with Telephonics
cushions (P/N 510C017-1) to the subject, who was
seated in a double-walled audiometric test booth.
In experiment 1 (quiet), the subjects were randomly assigned to one of five presentation levels (14-22 dB SPL in 2-dB steps) . In experiment
2 (noise), the subjects were randomly assigned
to one of five SNRs (-8 to 0 dB SNR in 2-dB steps)
to form five groups of five to seven listeners in
each experiment . The noise was spectrally
shaped to match the long-term rms frequency
contour of the speech materials. The noise was
presented at 70 dB SPL, and the level of the
speech varied to produce the SNRs . The selection of presentation levels for the experiment was
based on preliminary data that established the
upper and lower limits of the psychometric function relating presentation level to recognition
performance. In both experiments, the entire corpus of sentences was presented in randomized
blocks, and the subjects' task was to repeat the
sentence . The experimenter monitored the subject responses and scored each of the three key
words in each sentence for accuracy. The dependent variable is the same in both experiments
(i .e ., word recognition) . In experiment 1, the
independent variables are word use, lexical confusability, target word position, and presentation
level . In experiment 2, the independent variables
are word use, lexical confusability, target word
position, and SNR .
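For reference, the masker construction and SNR manipulation described above can be sketched as follows; the sampling rate, spectrum-estimation settings, and filter order are illustrative choices and not the laboratory procedure.

import numpy as np
from scipy.signal import firwin2, lfilter, welch

def speech_shaped_noise(speech, fs, n_samples, numtaps=513, seed=0):
    """White noise filtered to approximate the long-term speech spectrum."""
    f, pxx = welch(speech, fs=fs, nperseg=2048)
    gain = np.sqrt(pxx / pxx.max())              # long-term magnitude contour
    fir = firwin2(numtaps, f, gain, fs=fs)       # FIR matched to that contour
    rng = np.random.default_rng(seed)
    return lfilter(fir, [1.0], rng.standard_normal(n_samples))

def rms(x):
    return np.sqrt(np.mean(np.asarray(x, dtype=np.float64) ** 2))

def mix_at_snr(speech, noise, snr_db):
    """Hold the noise level fixed and rescale the speech to the requested SNR.
    Assumes `noise` is at least as long as `speech`."""
    speech = np.asarray(speech, dtype=np.float64)
    scale = 10.0 ** (snr_db / 20.0) * rms(noise) / rms(speech)
    return scale * speech + noise[: len(speech)]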
RESULTS
Experiment 1: Sentence Recognition in Quiet
The percentage of target words correctly recognized was determined for each of the four sentence categories as a function of presentation
level and word location in the sentence (Fig . 1 and
Table 2, upper panel) . An analysis of variance
(ANOVA) was performed with four factors, with
presentation level forming a between-subject
factor (14, 16, 18, 20, and 22 dB SPL) and word
use (low, high), lexical similarity neighborhood
(sparse, dense), and position of the key word
(first, second, third) in the sentence forming
within-subject factors.
First, the high-use words (open symbols)
were significantly more intelligible by 10 to 30
percent than the low-use words (filled symbols)
(F = 22 .7, df = 1, 18, p < .001), even though all
words were familiar to the listeners and presented at equal rms levels . This significant difference was apparent at all presentation levels,
regardless of word position .
Second, there was a significant effect for
lexical confusability of the key words in the sentences (F = 71 .4, df = 1, 18, p < .0001), with
sparse words (squares) more intelligible than
dense words (circles). This significant difference was diminished at the higher presentation levels and was largest at the lower
presentation levels . The lexical confusability
effect (i .e ., sparse versus dense words) is generally smaller than the word use effect and also
tends to be larger for high-use words.
Third, there was also a significant difference in key word intelligibility as a function of its position in the sentences (F = 48.6, df = 2, 36, p < .0001) (Fig. 2, upper panel). The third word was less intelligible than the first and second words in the sentences. This difference may be the result of the natural inflection of the sentences; however, the pattern of results attributable to word use and lexical confusability did not remain constant for each of the key word positions, as indicated statistically by a three-way interaction among use, density, and word position (F = 23.4, df = 2, 36, p < .0001). Word position was included in both experiments to test the homogeneity of intelligibility of words within sentences.
Table 2  Percent Correct Recognition (and SDs) for the Three Word Positions and the Four Lexical Categories for the Quiet (14, 16, 18, 20, and 22 dB SPL) and Noise (-8, -6, -4, -2, and 0 dB SNR) Conditions

LS = low use, sparse; LD = low use, dense; HS = high use, sparse; HD = high use, dense. Row labels LS1 through HD3 denote the lexical category and the position (first, second, or third) of the key word in the sentence.
Figure 1  Psychometric functions relating percent correct to presentation level in quiet as a function of word use frequency (low use and high use) and lexical-phonetic confusability (sparse and dense). HS = high use, sparse; LS = low use, sparse; HD = high use, dense; LD = low use, dense.
Experiment 2: Sentence Recognition in Noise
As in the previous analysis, the percentage
of key words correctly recognized was determined for each of the four sentence categories
as a function of SNR (-8, -6, -4, -2, 0 dB SNR),
with the background noise presented at 70 dB
SPL (Fig . 3 and Table 2, lower panel) and word
location in the sentence (see Fig. 2, lower panel) .
An ANOVA was performed with four factors,
with SNR forming a between-subject factor and
word use (low, high), lexical similarity neighborhood (sparse, dense), and position of the key
word (first, second, third) in the sentence forming within-subject factors .
Again, the ANOVA indicated three significant differences. As shown in Figure 3, the response patterns in noise mirrored the patterns observed in quiet. First, high-use words (open symbols) were more intelligible than low-use words (filled symbols) (F = 32.4, df = 1, 18, p < .0001), even though all words were familiar to the listeners and presented at equal SNRs. This significant difference was apparent at all levels, regardless of the word position in the sentences (Fig. 2, lower panel).
Figure 2  Average percent correct for word use frequency (low and high use) and lexical-phonetic confusability (sparse and dense) as a function of word position in sentences (first, second, or third). HS = high use, sparse; LS = low use, sparse; HD = high use, dense; LD = low use, dense.
Second, there was a significant effect for
lexical confusability of the key words in the sentences (F = 23 .6, df = 1, 18, p < .0001), shown in
Figure 3 with sparse words (squares) more intelligible than dense words (circles). The data from
noise conditions are less variable (see Table 2),
and the relative size of the lexical effects does not
vary significantly with SNR. These data are similar to those obtained in the first experiment.
Third, there was also a significant difference in key word intelligibility as a function of
the position of the word in the sentence (F =
39 .0, df = 2, 36, p < .0001). Figure 2 (lower panel)
displays the mean values for word position for
each of the four lexical categories collapsed across
SNRs . The third word again was less intelligible
than the first and second words in the sentences .
Figure 3  Psychometric functions relating percent correct to the signal-to-noise ratio in a 70 dB SPL noise background as a function of word use frequency (low use and high use) and lexical-phonetic confusability (sparse and dense). HS = high use, sparse; LS = low use, sparse; HD = high use, dense; LD = low use, dense.

DISCUSSION

The fact that the difference in speech recognition scores between lexical conditions
was as large as 30 percent demonstrates that
nonacoustic sources contribute significantly to
spoken word recognition and thus partly
explains why speech recognition is so difficult
to reliably or efficiently measure. By isolating
these nonacoustic factors, speech recognition can
be more effectively assessed in research and
clinical settings, potentially separating peripheral and central contributions to speech perception. It is important to stress that all of the
key words in the sentences were simple, familiar monosyllables and that the difference
obtained between low- and high-use words does
not reflect language proficiency or vocabulary .
For example, the word "cheese" in Table 1 is a
relatively low-use word, but it is commonly
understood by all. Because of the large number
of target words, possible phonemic peculiarities
are minimized.
The third word typically was 5 to 10 percent
lower than other key words. The observation
that the third target word in the sentences was
less intelligible than the target words in the
first or second position may be attributable to
several causes . The most likely reason is the natural decrease in level and articulation common
at the end of spoken sentences. It may be possible to reduce the word to word variability in
these sentences by rescaling the speech and
noise to improve the level or SNR selectively for
the third target word of the sentences. The effect
of word position is relatively insignificant when
scoring target words . If scoring is based on full
sentence intelligibility (i .e ., all or none), then this
phenomenon with the third word is a significant
drawback because the sentence score will, in
essence, reflect the weakest word, which the current data indicate is typically the final word of the sentence.
As expected, the variability of speech recognition scores obtained in noise background listening conditions was less than that obtained
under quiet listening conditions . The dynamic
range of the psychometric functions in quiet
and noise was similar, as were the slopes of
the functions. This result supports the internal
validity of the measurements obtained in this
study. The use of sentence materials addresses
the issue of validity as well, especially in the
ability to generalize beyond the laboratory or
clinic to real-life speech samples. The test based
on these materials can be very efficient because
of the reduced variability as the result of isolating extraneous variation in percentage scores .
Further, the use of a sentence format gives the
instrument both validity and efficiency. Validity derives from the fact that sentence materials are used, composed entirely of simple and
familiar constituents . Efficiency derives from the
fact that three words are tested within a single
trial and also from the reduced variability.
Currently, we are in the process of refining these materials to reduce further the variability between sentences within lexical
categories . The fluctuations caused by semantic context are randomly distributed in these
materials, and future work should explore
interactive arrangements between these lexical variables and semantic context . The sentences on which the current data were based
were normalized to a common level (rms), thus
equating them on the basis of energy. The next
step in this process is to equate the sentences
on the basis of intelligibility based on psychometric functions obtained in the current study.
This refinement will enhance the homogeneity of the corpus of sentences, making these
materials more suitable for an adaptive psychophysical technique .
Acknowledgment . Appreciation is expressed to Sandy
Oba, Amy Schaeffer, Tina Stabinski, and Richard Wright
for their contributions to this project. This project was
made possible by funding to the Auditory Research
Laboratory at Mountain Home, Tennessee, from the
Rehabilitation, Research and Development Service,
Department of Veterans Affairs. Portions of this article
were presented at the Annual Conventions of the
American Academy of Audiology, Phoenix, Arizona, April
1993, and of the Association for Research in
Otolaryngology, St . Petersburg, Florida, February 1994.
REFERENCES
American National Standards Institute. (1996). American National Standard Specifications for Audiometers (ANSI S3.6-1996). New York: ANSI.

Bench J, Bamford J, eds. (1979). Speech-Hearing Tests and the Spoken Language of Hearing-Impaired Children. London: Academic Press.

Berger KW. (1969). Speech discrimination task using multiple-choice key words in sentences. J Audit Res 9:247-262.

Bilger RC, Nuetzel JM, Rabinowitz WM, Rzeczkowski C. (1984). Standardization of a test of speech perception in noise. J Speech Hear Res 27:32-48.

Cotton S, Grosjean F. (1984). The gating paradigm: a comparison of successive and individual presentation formats. Percept Psychophys 35:41-48.

Cox RM, Alexander GC, Gilmore C. (1987). Development of the Connected Speech Test (CST). Ear Hear 8(Suppl 5):119S-126S.
Dirks DD, Takayanagi S, Moshfegh A, Noffsinger D, Fausti SA. (2001). Examination of the neighborhood activation theory in normal and hearing-impaired listeners. Ear Hear 22:1-13.

Egan J. (1944). Articulation Testing Methods II. OSRD Report No. 3802. Cambridge, MA: Psychoacoustic Laboratory, Harvard University.

Egan J. (1948). Articulation testing methods. Laryngoscope 58:955-991.

Fletcher H. (1929). Speech and Hearing. New York: Van Nostrand.

Goldinger S, Luce P, Pisoni D. (1989). Priming lexical neighbors of spoken words: effects of competition and inhibition. J Memory Lang 28:501-518.

Grosjean F. (1980). Spoken word recognition processes and the gating paradigm. Percept Psychophys 28:267-283.

Hirsh IJ, Davis H, Silverman SR, Reynolds EG, Eldert E, Benson RW. (1952). Development of materials for speech audiometry. J Speech Hear Disord 17:321-337.

Hudgins CV, Hawkins JE, Karlin JE, Stevens SS. (1947). The development of recorded auditory tests for measuring hearing loss for speech. Laryngoscope 57:57-89.

Jerger J, Speaks C, Trammell J. (1968). A new approach to speech audiometry. J Speech Hear Disord 33:318-328.

Kalikow DN, Stevens KN, Elliott LL. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J Acoust Soc Am 61:1337-1351.

Killion MC, Villchur E. (1993). Kessler was right - partly: but SIN test shows some aids improve hearing in noise. Hear J 46:31-35.
Kucera H, Francis W. (1969). Computational Analysis of Present Day English. Providence, RI: Brown University Press.

Luce PA. (1986). A computational analysis of uniqueness points in auditory word recognition. Percept Psychophys 39:155-159.

Luce PA, Pisoni DB. (1998). Recognizing spoken words: the Neighborhood Activation Model. Ear Hear 19:1-36.

Marslen-Wilson WD, Tyler LK. (1980). The temporal structure of spoken language understanding. Cognition 8:1-71.

Martin FN, Champlin CA, Chambers JA. (1998). Seventh survey of audiological practices in the United States. J Am Acad Audiol 9:95-104.

Miller GA. (1951). Language and Communication. New York: McGraw-Hill.

Morton J. (1979). Facilitation in word recognition: experiments causing change in the logogen model. In: Kolers PA, Wrolstad ME, Bouma H, eds. Processing of Visible Language 1. New York: Plenum, 259-268.

Nakatani LH, Dukes KD. (1973). A sensitive test of speech communication quality. J Acoust Soc Am 53:1083-1092.

Nilsson M, Soli SD, Sullivan JA. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am 95:1085-1099.

Nusbaum HC, Pisoni DB, Davis CK. (1984). Sizing Up the Hoosier Mental Lexicon: Measuring the Familiarity of 20,000 Words. Research on Speech Perception Progress Report No. 10. Bloomington, IN: Indiana University Press.

Pisoni D, Luce PA. (1987). Acoustic-phonetic representation in word recognition. Cognition 25:21-52.

Plomp R, Mimpen AM. (1979). Improving the reliability of testing the speech reception threshold for sentences. Audiology 18:43-52.

Salasoo A, Pisoni DB. (1985). Interaction of knowledge sources in spoken word identification. J Memory Lang 24:210-231.

Silverman SR, Hirsh IJ. (1955). Problems related to the use of speech in clinical audiometry. Ann Otol Rhinol Laryngol 64:1234-1244.

Smoorenburg GF. (1986). Speech perception in individuals with noise-induced hearing loss and its implication for hearing loss criteria. In: Salvi RJ, Henderson D, Hamernik RP, eds. Basic and Applied Aspects of Noise-Induced Hearing Loss. New York: Plenum Press.

Smoorenburg GF. (1989). Speech Reception in Quiet and in Noisy Conditions by Individuals with Noise-Induced Hearing Loss in Relation to Their Audiogram. Report 1989-11:1-58. Soesterberg, The Netherlands: TNO Institute for Perception.

Speaks C, Jerger J. (1965). Method for measurement of speech identification. J Speech Hear Res 8:185-194.

Tillman TW, Carhart R. (1966). An Expanded Test for Speech Discrimination Utilizing CNC Monosyllabic Words. Northwestern University Auditory Test No. 6. Brooks Air Force Base, TX: USAF School of Aerospace Medicine Technical Report.