J Am Acad Audiol 12:514-522 (2001)

Sentence Recognition Materials Based on Frequency of Word Use and Lexical Confusability

Theodore S. Bell* and Richard H. Wilson†

Abstract

The sentence stimuli developed in this project combined aspects from several traditional approaches to speech audiometry. Sentences varied with respect to frequency of word use and phonetic confusability. Familiar consonant-vowel-consonant words, nouns and modifiers, were used to form 500 sentences of seven to nine syllables. Based on concepts from the Neighborhood Activation Model for spoken word recognition, each sentence contained three key words that were all characterized as high- or low-use frequency and high or low lexical confusability. Use frequency was determined by published indices of word use, and lexical confusability was defined by a metric based on the number of other words that were similar to a given word using a single phoneme substitution algorithm. Thirty-two subjects with normal hearing were randomly assigned to one of seven presentation levels in quiet, and an additional 32 listeners were randomly assigned to a fixed-level noise background at one of six signal-to-noise ratios. The results indicated that in both quiet and noise listening conditions, high-use words were more intelligible than low-use words, and there was an advantage for phonetically unique words; the position of the key word in the sentence was also a significant factor. These data formed the basis for a sequence of experiments that isolated significant nonacoustic sources of variation in spoken word recognition.

Key Words: Neighborhood Activation Model, speech intelligibility, word recognition

Abbreviations: CVC = consonant-vowel-consonant, HD = high frequency of use word from a dense neighborhood, HS = high frequency of use word from a sparse neighborhood, LD = low frequency of use word from a dense neighborhood, LS = low frequency of use word from a sparse neighborhood, NAM = Neighborhood Activation Model, SIN = Speech in Noise, SNR = signal-to-noise ratio

*Department of Communication Disorders, California State University at Los Angeles, Los Angeles, California; †James H. Quillen VA Medical Center, Mountain Home, Tennessee, and Departments of Surgery and Communication Disorders, East Tennessee State University, Johnson City, Tennessee

Reprint requests: Theodore S. Bell, Department of Communication Disorders, California State University at Los Angeles, 5151 State University Drive, Los Angeles, CA 90032-8170

Speech audiometry is used in the evaluation of auditory function as a diagnostic measure and as a measure of communication ability. For the most part, spondaic words and monosyllabic words have been used to assess these two aspects of word recognition abilities (e.g., Hudgins et al, 1947; Egan, 1948; Hirsh et al, 1952; Tillman and Carhart, 1966). Diagnostic tests require the sensitivity to discriminate between listeners with normal hearing and patients with various hearing impairments. For diagnostic purposes, the test items should have little redundancy (e.g., monosyllabic word tests). It has been known for many years that word tests provide useful but limited information about receptive communication in everyday life by individuals with hearing impairment. The assessment of receptive communication abilities ideally should involve real-life speech materials and real-life listening conditions.
This report describes a corpus of sentence materials that ultimately are intended for use in the clinical assessment of speech recognition abilities. The target words in the sentences were selected based on aspects of the Neighborhood Activation Model (NAM) of spoken language (Luce, 1986). There are considerable data to support the NAM with regard to spoken word recognition (e.g., Pisoni and Luce, 1987; Goldinger et al, 1989; Luce and Pisoni, 1998; Dirks et al, 2001). The NAM assumes that the recognition of spoken words is characterized by a process in which phonetically similar words in memory are organized for perceptual processing. The member of the activated set that is most consistent with the acoustic-phonetic information in the speech waveform is then selected. Further, it is assumed that word frequency (of occurrence) biases responses toward the more likely, or more frequent, members of the activated neighborhood. The NAM predicts that an increase in the activation level of a stimulus word's similarity neighborhood lowers the probability of identifying the stimulus word itself.

Connected speech materials (sentences), which are exemplary of everyday communication, are by definition a valid speech test paradigm for assessing the receptive communication ability of individuals. Since the early development of speech recognition materials, sentence materials have been used to evaluate communication systems and individuals (Fletcher, 1929; Egan, 1944; Hudgins et al, 1947; Silverman and Hirsh, 1955; Speaks and Jerger, 1965; Kalikow et al, 1977). Recently, several sentence tests have been developed for use in assessing various aspects of speech recognition function, such as the Connected Speech Test (Cox et al, 1987), Speech in Noise (SIN) (Killion and Villchur, 1993), and Hearing in Noise Test (Nilsson et al, 1994); however, sentence materials have rarely been incorporated into a reliable and systematic protocol for routine clinical use (Martin et al, 1998).

The importance of the use of sentences in the auditory evaluation of patients was emphasized by Jerger et al (1968), who stated that sentence tests, compared with isolated word tests, "manipulate a crucial parameter of ongoing speech, its changing pattern over time" (p. 319). A sentence provides information about its constituent words by providing the relationships among words. The increased redundancy and semantic cues in sentence materials result in a more rapid rise in the psychometric function as compared with monosyllabic words. Several formats have been employed with sentence materials, ranging from simple interrogative sentences, which the subject answers (Fletcher, 1929; Hudgins et al, 1947), to target-word formats, in which the subject identifies target words within the sentence (Silverman and Hirsh, 1955; Berger, 1969). Kalikow et al (1977) and Bilger et al (1984) introduced the concept of redundancy more formally into their Speech Perception in Noise test by controlling the predictability of the target word, which was always the final word of the sentence. The difference between words with high and low predictability from the sentence cues may provide a measure of the individual's cognitive and memory capabilities in speech perception.
Other sentence tests have been devised that differ in format and composition, including artificial sentences (Speaks and Jerger, 1965), nonsense sentences (Nakatani and Dukes, 1973), and meaningful sentences of everyday language (Bench and Bamford, 1979; Plomp and Mimpen, 1979; Smoorenburg, 1986, 1989). Plomp and Mimpen analyzed whole sentences, words, and individual phonemes, examining many variables and processes from sentence scores down to phoneme scores. These sentences have been employed with an adaptive procedure format to determine specific points (e.g., 50%) on the psychometric function. The SIN test was based on the Institute of Electrical and Electronics Engineers (IEEE) sentences recorded by a female talker in a background of four-talker babble (Killion and Villchur, 1993). Each sentence contains five key words that are used for scoring. The SIN is used for evaluation of hearing aids and is presented at four signal-to-noise ratios (SNRs) in 83 and 53 dB SPL of noise in a sound field.

Nonacoustic knowledge sources have been known to contribute to the identification of words in normal continuous discourse (e.g., Marslen-Wilson and Tyler, 1980; Salasoo and Pisoni, 1985). Marslen-Wilson and Tyler found that less than half of the acoustic-phonetic code was required to understand words in normal sentence contexts. Further support for this finding comes from studies that used a stimulus gating paradigm, in which measures were collected that reflected the minimum acoustic-phonetic input required for word recognition (Grosjean, 1980; Cotton and Grosjean, 1984; Salasoo and Pisoni, 1985). Consistent with what Miller (1951) demonstrated, these studies showed that less stimulus information was required to identify words in sentences than to identify the same words in isolation. The results of Grosjean's (1980) gating experiments suggested that incorrect responses in a recognition task included not only acoustically similar words but also semantically related words. These data were used by Grosjean to refute the claim that only acoustic-phonetic information was used to compose the set of possible lexical candidates. He concluded that a model similar to Morton's (1979) interactive logogen model was required to explain these data; it was suggested that both acoustic and nonacoustic knowledge sources interacted when possible word candidates were selected by listeners.

Salasoo and Pisoni (1985) provided support for Marslen-Wilson and Tyler's "principle of bottom-up priority." Acoustic-phonetic patterns are the primary source of information used to form a set of lexical candidates accessible from long-term memory, although semantic and syntactic information available from sentence contexts also provides additional candidates to the pool of potential words. The balance between these sources of knowledge in bottom-up and top-down processes allows the listener to comprehend speech even when the encoding is impoverished either by noise or by sensory impairment. Assuming that the acoustic-phonetic code is degraded for listeners with hearing loss as compared with listeners with normal hearing, the degraded stimulus leads to an inherently larger neighborhood. The impoverished sensory encoding of the impaired auditory system leads to acoustic-phonetic encodings that are "fuzzy," resulting in greater similarity to other words.
Thus, one consequence of an impaired auditory system is that lexical similarity neighborhoods are larger and word frequency of occurrence effects are diminished because the speech stimuli are inherently ill defined. If the NAM is applied to listeners with hearing impairment, the neighborhood would be dense because of the increased uncertainty from the degraded encoding, leading to a larger set of alternatives in the word recognition process; although high frequency of occurrence words would still be more likely responses, the ratio of word frequency to neighborhood frequency would be dominated by the neighborhood frequencies.

The speech stimuli developed in this report combine aspects from several traditional approaches to speech audiometry. Sentences vary with respect to redundancy and semantic context, and the individual word constituents vary with respect to word frequency of use and phonetic confusability. The target words are embedded in sentences in a format similar to the Plomp and Mimpen (1979) or Bench and Bamford (1979) sentences. An important difference, however, is that the target words vary with respect to word frequency of occurrence and word confusability. The sentences developed are representative of everyday speech; specifically, the sentences are brief and easy to repeat. Phonemic content is equivalent across lists. The data reported here provide the basis for a speech test with potential clinical applications. A protocol based on these materials could improve evaluation of speech communication problems associated with hearing impairment by addressing receptive speech problems beyond issues related to simple audibility of the signal.

METHOD

Materials

Monosyllabic consonant-vowel-consonant (CVC) stimuli were selected as target words on the basis of use frequency, lexical confusability, and familiarity ratings. Here, monosyllabic refers to spoken words as opposed to written words; words containing syllabic l, m, or n are considered monosyllabic when spoken but polysyllabic when written. Word use frequency was based on the Computational Analysis of Present Day English (Kucera and Francis, 1969), in which samples of everyday reading materials were analyzed for individual word use, expressed as the number of times the word was found per million words sampled. Lexical confusability was defined as the number of other words in the language that are phonetically similar to a given target word. A word was considered similar to the target word if it differed by a single phoneme; that is, a "single phoneme substitution" rule was employed in which a word was counted as similar if it could be created by substituting one phoneme. The terminology advanced by Luce (1986) in the NAM of word recognition is used. A word is considered "sparse" if it is relatively phonetically unique, that is, similar words are few in number. A word is considered "dense" if it is phonetically similar to many other words in the lexicon. Sparse and dense metaphorically refer to "similarity neighborhoods" in an assumed representation of the mental lexicon. The categorization of low- versus high-use frequency words was based on the entire set of familiar monosyllabic CVC words in the pocket lexicon. Familiarity was described by a 7-point rating scale applied to the entire pocket lexicon, rated by Indiana University undergraduate student volunteers (Nusbaum et al, 1984). For the current protocol, only words rated greater than 6.5 (highly familiar) were selected.
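The density metric and the frequency-to-neighborhood ratio discussed above are straightforward to compute. The sketch below is illustrative only and is not the authors' software: it assumes a hypothetical Python dictionary of phonemically transcribed words with placeholder frequency counts, and it implements only the single phoneme substitution rule used in this report.

# Hypothetical phonemic lexicon: word -> (phoneme transcription, frequency per million).
# The transcriptions and counts below are placeholders, not the published values.
LEXICON = {
    "cheese": (("CH", "IY", "Z"), 9),
    "keys":   (("K", "IY", "Z"), 12),
    "cheap":  (("CH", "IY", "P"), 6),
    "choose": (("CH", "UW", "Z"), 53),
}

def is_neighbor(a, b):
    # Two transcriptions are neighbors if they differ by exactly one phoneme substitution.
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def neighbors(word):
    # All other lexicon entries one substitution away from `word` (its similarity neighborhood).
    phones = LEXICON[word][0]
    return [w for w, (p, _) in LEXICON.items() if w != word and is_neighbor(phones, p)]

def density(word):
    # Neighborhood density: the number of single-substitution neighbors.
    return len(neighbors(word))

def frequency_ratio(word):
    # Rough frequency-weighted ratio suggested by the discussion above:
    # target frequency relative to target-plus-neighborhood frequency.
    freq = LEXICON[word][1]
    neighbor_freq = sum(LEXICON[w][1] for w in neighbors(word))
    return freq / (freq + neighbor_freq) if freq + neighbor_freq else 0.0

for w in LEXICON:
    print(w, density(w), round(frequency_ratio(w), 3))

Run against a complete transcribed lexicon rather than the four placeholder entries, the same density values would support the sparse/dense splits described next.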
The words were sorted on the basis of use frequency, with the upper and lower thirds of the distribution selected and labeled high and low use, respectively. Within the use categories, the words were again sorted on the basis of lexical similarity, and words of high (dense) and low (sparse) lexical confusability were identified using a tertiary split. The following four categories of words formed a 2 x 2 factorial arrangement of high and low word use frequency and dense and sparse lexical neighborhoods: low frequency of use, sparse neighborhood (LS); low frequency of use, dense neighborhood (LD); high frequency of use, sparse neighborhood (HS); and high frequency of use, dense neighborhood (HD). A pool of approximately 1800 words resulted from this selection process. Each sentence contained three key words selected from within the same lexical category, forming 500 sentences of seven to nine syllables. Examples of sentences from each lexical category are listed in Table 1. Use frequency (count per million) and density (number of words similar by the single phoneme substitution rule) for a selected target word in each sentence were also tabulated. The target words were embedded in sentences in a format similar to the Plomp and Mimpen (1979) and Bench and Bamford (1979) sentences. An important difference, however, was that the target words varied with respect to word frequency of occurrence and word confusability.

Table 1 Example Sentences for Each of the Four Lexical Categories

Low use, sparse: The lump of cheese has turned sour. / The chops will sizzle in the blaze. / Toss the crab onto the barge.
Low use, dense: The dent in his new bike made him yell. / I like to sip pop while I dine. / The lamb likes to roam around the moat.
High use, sparse: The point of the knife is too sharp. / The breeze helped to clear the fog. / The rebel has a large horse to mount.
High use, dense: The rope has been tied in a knot. / Use a fan to keep the room cool. / Heat some pea soup over the fire.

The three target words in each sentence are drawn from the indicated lexical category.

A female speaker with a standard Midwestern dialect recited three repetitions of each sentence while seated in a sound-attenuated audiometric test chamber. A low-noise microphone (AKG, Model C460-B) and preamplifier (Symetrix, Model SX202) were situated 7.5 cm from the talker at a 20-degree angle of incidence. The sentences were recorded on digital audiotape (Sony, Model 59ES). Levels were monitored using an oscilloscope throughout the single session in which the entire corpus was recorded. The recordings were screened for intonation pattern, mispronunciations, peak clipping, and extraneous noises, and the best example of each sentence was selected and transferred to a digital waveform editor (Kay Elemetrics, Computer Speech Lab, Model 1600-B); selection criteria included fidelity, dynamic range, extraneous sounds, pronunciation errors, naturalness, and prosody. The overall rms level of each sentence was determined as the median rms of overlapping 20-msec windows.
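The level measurement just described, together with the normalization step covered in the next paragraph, can be sketched as follows. This is not the authors' processing code; it assumes floating-point samples scaled to ±1, a known sampling rate, and a 50 percent window overlap, none of which are specified in the report.

import numpy as np

def median_rms_db(x, fs, win_ms=20.0, overlap=0.5):
    # Median rms level (dB re digital full scale) over overlapping windows of win_ms milliseconds.
    win = int(round(fs * win_ms / 1000.0))
    hop = max(1, int(round(win * (1.0 - overlap))))
    rms = [np.sqrt(np.mean(x[i:i + win] ** 2))
           for i in range(0, len(x) - win + 1, hop)]
    return 20.0 * np.log10(np.median(rms) + 1e-12)

def scale_to_level(x, fs, target_db):
    # Apply the 10**(dB difference / 20) scaling factor; reject any sentence whose
    # scaled waveform would clip, as the report describes.
    diff_db = target_db - median_rms_db(x, fs)
    scaled = x * 10.0 ** (diff_db / 20.0)
    if np.max(np.abs(scaled)) >= 1.0:
        raise ValueError("scaling would clip this sentence")
    return scaled

# Example with a synthetic two-second "sentence" at 44.1 kHz.
fs = 44100
sentence = 0.05 * np.random.randn(2 * fs)
leveled = scale_to_level(sentence, fs, target_db=-26.0)

The 10**(diff_db / 20) term is the scaling-factor relation given in the following paragraph.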
Following this analysis, the level of each of the individual sentences was adjusted so that every sentence had an identical rms value (±0.1 dB). This adjustment was accomplished by computing the scaling factor needed to produce the nominal median rms level using the relation scaling factor = 10^(ΔdB/20), where ΔdB is the difference between the nominal median level and the actual level of the recorded utterance. The correction was applied by multiplying the voltage (D/A counts) of the individual sentence waveform by the computed scaling factor. The average scaling factor was <2 dB. Any sentence for which the required scaling produced clipping of the waveform was discarded, and an alternate sentence was substituted if available from the original corpus of recorded materials. The sentences were then transferred back onto digital audiotape for experimental presentation. A 1000-Hz calibration signal was set to coincide with the overall rms level of the adjusted sentence lists.

Subjects

In experiment 1 (quiet background), 32 paid listeners were recruited. Experiment 2 (noise background) involved an additional group of 32 paid subjects. All participants were determined to have pure-tone thresholds less than 20 dB HL (ANSI, 1996) at octave frequencies between 250 and 8000 Hz and normal middle ear function as determined by otoscopic examination and aural acoustic immittance measures. Speech audiometry measures were normal for all subjects. There were no exclusion criteria with regard to gender or ethnicity of the participants. The ages of the subjects varied between 18 and 48 years. English was the first language of all participants.

Procedure

Following the audiometric evaluation, subjects continued with a 45-minute test session during which they were presented sentences in quiet or in noise. Following a brief familiarization period, the sentences were presented in four blocks of 120 sentences corresponding to each of the four lexical test conditions (LS, LD, HS, HD). The test sentences originated from a digital audiotape deck (Sony, Model 59ES) that was routed through an audiometer (Grason-Stadler, Model 16) to earphones (TDH-50P with Telephonics cushions, P/N 510C017-1) worn by the subject, who was seated in a double-walled audiometric test booth. In experiment 1 (quiet), the subjects were randomly assigned to one of five presentation levels (14-22 dB SPL in 2-dB steps). In experiment 2 (noise), the subjects were randomly assigned to one of five SNRs (-8 to 0 dB SNR in 2-dB steps), forming five groups of five to seven listeners in each experiment. The noise was spectrally shaped to match the long-term rms frequency contour of the speech materials. The noise was presented at 70 dB SPL, and the level of the speech was varied to produce the SNRs. The selection of presentation levels was based on preliminary data that established the upper and lower limits of the psychometric function relating presentation level to recognition performance. In both experiments, the entire corpus of sentences was presented in randomized blocks, and the subjects' task was to repeat the sentence. The experimenter monitored the subject responses and scored each of the three key words in each sentence for accuracy. The dependent variable is the same in both experiments (i.e., word recognition).
In experiment 1, the independent variables are word use, lexical confusability, target word position, and presentation level. In experiment 2, the independent variables are word use, lexical confusability, target word position, and SNR.

RESULTS

Experiment 1: Sentence Recognition in Quiet

The percentage of target words correctly recognized was determined for each of the four sentence categories as a function of presentation level and word location in the sentence (Fig. 1 and Table 2, upper panel). An analysis of variance (ANOVA) was performed with four factors, with presentation level forming a between-subject factor (14, 16, 18, 20, and 22 dB SPL) and word use (low, high), lexical similarity neighborhood (sparse, dense), and position of the key word (first, second, third) in the sentence forming within-subject factors.

Figure 1 Psychometric functions relating percent correct to presentation level in quiet as a function of word use frequency (low use and high use) and lexical-phonetic confusability (sparse and dense). HS = high use, sparse; LS = low use, sparse; HD = high use, dense; LD = low use, dense.

First, the high-use words (open symbols) were significantly more intelligible, by 10 to 30 percent, than the low-use words (filled symbols) (F = 22.7, df = 1, 18, p < .001), even though all words were familiar to the listeners and presented at equal rms levels. This significant difference was apparent at all presentation levels, regardless of word position. Second, there was a significant effect for lexical confusability of the key words in the sentences (F = 71.4, df = 1, 18, p < .0001), with sparse words (squares) more intelligible than dense words (circles). This difference was largest at the lower presentation levels and diminished at the higher presentation levels. The lexical confusability effect (i.e., sparse versus dense words) is generally smaller than the word use effect and also tends to be larger for high-use words. Third, there was also a significant difference in key word intelligibility as a function of its position in the sentence (F = 48.6, df = 2, 36, p < .0001) (Fig. 2, upper panel). The third word was less intelligible than the first and second words in the sentences.
This difference may be the result of the natural inflection of the sentences; however, the pattern of results attributable to word use and lexical confusability did not remain constant across the key word positions, as indicated statistically by a three-way interaction among use, density, and word position (F = 23.4, df = 2, 36, p < .0001). Word position is included in both experiments to test the homogeneity of intelligibility of words within sentences.

Figure 2 Average percent correct for word use frequency (low and high use) and lexical-phonetic confusability (sparse and dense) as a function of word position in sentences (first, second, or third). HS = high use, sparse; LS = low use, sparse; HD = high use, dense; LD = low use, dense.

Table 2 Percent Correct Recognition (and SDs) for the Three Word Positions and the Four Lexical Categories for the Quiet (14-22 dB SPL) and Noise (-8 to 0 dB SNR) Conditions
LS = low use, sparse; LD = low use, dense; HS = high use, sparse; HD = high use, dense.

Experiment 2: Sentence Recognition in Noise

As in the previous analysis, the percentage of key words correctly recognized was determined for each of the four sentence categories as a function of SNR (-8, -6, -4, -2, and 0 dB SNR), with the background noise presented at 70 dB SPL (Fig. 3 and Table 2, lower panel), and as a function of word location in the sentence (see Fig. 2, lower panel). An ANOVA was performed with four factors, with SNR forming a between-subject factor and word use (low, high), lexical similarity neighborhood (sparse, dense), and position of the key word (first, second, third) in the sentence forming within-subject factors. Again, the ANOVA indicated three significant differences.

As shown in Figure 3, the response patterns in noise mirrored the patterns observed in quiet. First, high-use words (open symbols) were more intelligible than low-use words (filled symbols) (F = 32.4, df = 1, 18, p < .0001), even though all words were familiar to the listeners and presented at equal SNRs.
This significant difference was apparent at all levels, regardless of the word position in the sentences (Fig. 2, lower panel). Second, there was a significant effect for lexical confusability of the key words in the sentences (F = 23.6, df = 1, 18, p < .0001), shown in Figure 3, with sparse words (squares) more intelligible than dense words (circles). The data from the noise conditions are less variable (see Table 2), and the relative size of the lexical effects does not vary significantly with SNR. These data are similar to those obtained in the first experiment. Third, there was also a significant difference in key word intelligibility as a function of the position of the word in the sentence (F = 39.0, df = 2, 36, p < .0001). Figure 2 (lower panel) displays the mean values for word position for each of the four lexical categories collapsed across SNRs. The third word again was less intelligible than the first and second words in the sentences.

Figure 3 Psychometric functions relating percent correct to the signal-to-noise ratio in a 70 dB SPL noise background as a function of word use frequency (low use and high use) and lexical-phonetic confusability (sparse and dense). HS = high use, sparse; LS = low use, sparse; HD = high use, dense; LD = low use, dense.

DISCUSSION

The fact that the difference in speech recognition scores between lexical conditions was as large as 30 percent demonstrates that nonacoustic sources contribute significantly to spoken word recognition and thus partly explains why speech recognition is so difficult to measure reliably or efficiently. By isolating these nonacoustic factors, speech recognition can be assessed more effectively in research and clinical settings, potentially separating peripheral and central contributions to speech perception. It is important to stress that all of the key words in the sentences were simple, familiar monosyllables and that the difference obtained between low- and high-use words does not reflect language proficiency or vocabulary. For example, the word "cheese" in Table 1 is a relatively low-use word, but it is commonly understood by all. Because of the large number of target words, possible phonemic peculiarities are minimized. The third word typically was 5 to 10 percent lower than the other key words. The observation that the third target word in the sentences was less intelligible than the target words in the first or second position may be attributable to several causes.
The most likely reason is the natural decrease in level and articulation common at the end of spoken sentences. It may be possible to reduce the word-to-word variability in these sentences by rescaling the speech and noise to improve the level or SNR selectively for the third target word of the sentences. The effect of word position is relatively unimportant when scoring target words. If scoring is based on full sentence intelligibility (i.e., all or none), however, this phenomenon with the third word is a significant drawback because the sentence score will, in essence, reflect the weakest word, which the current data indicate is typically the final word of the sentence.

As expected, the variability of speech recognition scores obtained in noise background listening conditions was less than that obtained under quiet listening conditions. The dynamic range of the psychometric functions in quiet and noise was similar, as were the slopes of the functions. This result supports the internal validity of the measurements obtained in this study. The use of sentence materials addresses the issue of validity as well, especially in the ability to generalize beyond the laboratory or clinic to real-life speech samples. A test based on these materials can be very efficient because of the reduced variability that results from isolating extraneous variation in percentage scores. Further, the use of a sentence format gives the instrument both validity and efficiency. Validity derives from the fact that sentence materials are used, composed entirely of simple and familiar constituents. Efficiency derives from the fact that three words are tested within a single trial and also from the reduced variability.

Currently, we are in the process of refining these materials to reduce further the variability between sentences within lexical categories. The fluctuations caused by semantic context are randomly distributed in these materials, and future work should explore interactive arrangements between these lexical variables and semantic context. The sentences on which the current data were based were normalized to a common level (rms), thus equating them on the basis of energy. The next step in this process is to equate the sentences on the basis of intelligibility, based on the psychometric functions obtained in the current study. This refinement will enhance the homogeneity of the corpus of sentences, making these materials more suitable for an adaptive psychophysical technique.

Acknowledgment. Appreciation is expressed to Sandy Oba, Amy Schaeffer, Tina Stabinski, and Richard Wright for their contributions to this project. This project was made possible by funding to the Auditory Research Laboratory at Mountain Home, Tennessee, from the Rehabilitation Research and Development Service, Department of Veterans Affairs. Portions of this article were presented at the Annual Conventions of the American Academy of Audiology, Phoenix, Arizona, April 1993, and of the Association for Research in Otolaryngology, St. Petersburg, Florida, February 1994.

REFERENCES

American National Standards Institute. (1996). American National Standard Specifications for Audiometers (ANSI S3.6-1996). New York: ANSI.

Bench J, Bamford J, eds. (1979). Speech-Hearing Tests and the Spoken Language of Hearing-Impaired Children. London: Academic Press.

Berger KW. (1969). Speech discrimination task using multiple-choice key words in sentences. J Audit Res 9:247-262.
Bilger RC, Nuetzel JM, Rabinowitz WM, Rzeczkowski C. (1984). Standardization of a test of speech perception in noise. J Speech Hear Res 27:32-48.

Cotton S, Grosjean F. (1984). The gating paradigm: a comparison of successive and individual presentation formats. Percept Psychophys 35:41-48.

Cox RM, Alexander GC, Gilmore C. (1987). Development of the Connected Speech Test (CST). Ear Hear 8(5 Suppl):119S-126S.

Dirks DD, Takayanagi S, Moshfegh A, Noffsinger D, Fausti SA. (2001). Examination of the neighborhood activation theory in normal and hearing-impaired listeners. Ear Hear 22:1-13.

Egan J. (1944). Articulation Testing Methods II. OSRD Report No. 3802. Cambridge, MA: Psychoacoustic Laboratory, Harvard University.

Egan J. (1948). Articulation testing methods. Laryngoscope 58:955-991.

Fletcher H. (1929). Speech and Hearing. New York: Van Nostrand.

Goldinger S, Luce P, Pisoni D. (1989). Priming lexical neighbors of spoken words: effects of competition and inhibition. J Memory Lang 28:501-518.

Grosjean F. (1980). Spoken word recognition processes and the gating paradigm. Percept Psychophys 28:267-283.

Hirsh IJ, Davis H, Silverman SR, Reynolds EG, Eldert E, Benson RW. (1952). Development of materials for speech audiometry. J Speech Hear Disord 17:321-337.

Hudgins CV, Hawkins JE, Karlin JE, Stevens SS. (1947). The development of recorded auditory tests for measuring hearing loss for speech. Laryngoscope 57:57-89.

Jerger J, Speaks C, Trammell J. (1968). A new approach to speech audiometry. J Speech Hear Disord 33:318-328.

Kalikow DN, Stevens KM, Elliot LL. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J Acoust Soc Am 61:1337-1351.

Killion MC, Villchur E. (1993). Kessler was right, partly: but SIN test shows some aids improve hearing in noise. Hear J 46:31-35.

Kucera F, Francis W. (1969). Computational Analysis of Present Day English. Providence, RI: Brown University Press.

Luce PA. (1986). A computational analysis of uniqueness points in auditory word recognition. Percept Psychophys 39:155-159.

Luce PA, Pisoni DB. (1998). Recognizing spoken words: the Neighborhood Activation Model. Ear Hear 19:1-36.

Marslen-Wilson WD, Tyler LK. (1980). The temporal structure of spoken language understanding. Cognition 8:1-71.

Martin FN, Champlin CA, Chambers JA. (1998). Seventh survey of audiological practices in the United States. J Am Acad Audiol 9:95-104.

Miller GA. (1951). Language and Communication. New York: McGraw-Hill.

Morton J. (1979). Facilitation in word recognition: experiments causing change in the logogen model. In: Kolers PA, Wrolstal ME, Bouma H, eds. Processing of Visible Language 1. New York: Plenum, 259-268.

Nakatani LH, Dukes KD. (1973). A sensitive test of speech communication quality. J Acoust Soc Am 53:1083-1092.

Nilsson M, Soli SD, Sullivan JA. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am 95:1085-1099.

Nusbaum HC, Pisoni DB, Davis CK. (1984). Sizing Up the Hoosier Mental Lexicon: Measuring the Familiarity of 20,000 Words. Research on Speech Perception Progress Report No. 10. Bloomington, IN: Indiana University Press.

Pisoni D, Luce PA. (1987). Acoustic-phonetic representation in word recognition. Cognition 25:21-52.

Plomp R, Mimpen AM. (1979).
Improving the reliability of testing the speech reception threshold for sentences. Audiology 18:43-52.

Salasoo A, Pisoni DP. (1985). Interaction of knowledge sources in spoken word identification. J Memory Lang 24:210-231.

Silverman SR, Hirsh IJ. (1955). Problems related to the use of speech in clinical audiometry. Ann Otol Rhinol Laryngol 64:1234-1244.

Smoorenburg GF. (1986). Speech perception in individuals with noise-induced hearing loss and its implication for hearing loss criteria. In: Salvi RJ, Henderson D, Hamernik RP, eds. Basic and Applied Aspects of Noise-Induced Hearing Loss. New York: Plenum Press.

Smoorenburg GF. (1989). Speech Reception in Quiet and in Noisy Conditions by Individuals with Noise-Induced Hearing Loss in Relation to Their Audiogram. Report 1989-11:1-58. Soesterberg, The Netherlands: TNO Institute for Perception.

Speaks C, Jerger J. (1965). Method for measurement of speech identification. J Speech Hear Res 8:185-194.

Tillman TW, Carhart R. (1966). An Expanded Test for Speech Discrimination Utilizing CNC Monosyllabic Words. Northwestern University Auditory Test No. 6. Brooks Air Force Base, TX: USAF School of Aerospace Medicine Technical Report.