Ozimek, Edward, Kutzner, Dariusz, Sęk, Aleksander and Wicher, Andrzej (2009) 'Polish sentence tests for measuring the intelligibility of speech in interfering noise', International Journal of Audiology, 48: 7, 433–443. Publisher: Informa Healthcare. DOI: 10.1080/14992020902725521. URL: http://dx.doi.org/10.1080/14992020902725521
Original Article
International Journal of Audiology 2009; 48:433–443

Polish sentence tests for measuring the intelligibility of speech in interfering noise

Edward Ozimek, Dariusz Kutzner, Aleksander Sęk, Andrzej Wicher
Institute of Acoustics, A. Mickiewicz University, Poznań, Poland

Key Words
Speech intelligibility; Sentence test; Psychometric function; Speech reception threshold

Abbreviations
CCITT: International Telegraph and Telephone Consultative Committee
SNR: Signal-to-noise ratio
SRT: Speech reception threshold

Abstract
The aim of this study was to develop Polish sentence tests for accurate measurement of speech intelligibility in masking interfering noise. Two sets of sentence lists have been developed. The first set was composed of 25 lists and was used for sentence intelligibility scoring. The second set was composed of 22 lists and was used for word intelligibility scoring. The lists in each set have been phonemically and statistically balanced. The speech reception threshold (SRT) and the slope of the psychometric function at the SRT point (S50) were determined in normal-hearing subjects. The mean SRT and the mean list-specific slope S50list for the first set were −6.1 dB and 25.5%/dB, respectively. The mean SRT and the mean list-specific S50list for the second set were −7.5 dB and 20.8%/dB. Owing to the relatively steep slope of the psychometric functions, the Polish sentence tests were shown to be accurate materials for speech intelligibility measurements against interfering noise. They are the first sentence speech-in-noise tests developed for Slavonic languages.

Many methods have been proposed for measuring speech intelligibility in quiet and against an interfering noise (Kalikow et al, 1977; Plomp & Mimpen, 1979; Hagerman, 1982; Nilsson et al, 1994; Pruszewicz et al, 1994a,b; Kollmeier & Wesselkamp, 1997; Brachmański & Staroniewicz, 1999; Versfeld et al, 2000; Wagener et al, 2003; Smits et al, 2004). The methods differ in such aspects as the structure of the speech material, details of the test procedure, presentation level, range of the signal-to-noise ratio, type of interfering noise, and presentation mode. As far as the structure of the speech material is concerned, one can distinguish two basic types of tests: sentence intelligibility tests (Plomp & Mimpen, 1979; Nilsson et al, 1994; Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000), and word intelligibility tests (Runge & Hosford-Dunn, 1985; Pruszewicz et al, 1994a,b; Bosman & Smoorenburg, 1995; Martin, 1997).
The sentence tests can be divided into those using meaningful, everyday utterances (Plomp & Mimpen, 1979; Smoorenburg, 1992; Nilsson et al, 1994; Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000; Ozimek et al, 2007), and those using semantically unpredictable sentences (Hagerman, 1982; Wagener et al, 1999a,b,c; Wagener et al, 2003).

Many sentence tests have been developed so far for measuring the speech reception threshold (SRT) in noise, defined as the signal-to-noise ratio (SNR), expressed in dB, that yields 50% speech intelligibility. They are based on utterances taken from everyday speech communication (Kalikow et al, 1977; Plomp & Mimpen, 1979; Nilsson et al, 1994; Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000). Some papers have shown that the speech intelligibility score is mainly determined by the SNR (Smoorenburg, 1992), but according to others, the intelligibility depends both on the SNR and on the presentation level (Hagerman, 1982; Studebaker et al, 1999) as well as on the speaker (Versfeld et al, 2000).

The second important type of speech intelligibility test is based on word material. Speech materials used in word intelligibility assessments include logatomes, i.e. nonsense words (Welge-Lüssen et al, 1997; Brachmański & Staroniewicz, 1999), numbers, and monosyllabic words (Pruszewicz et al, 1994a,b). Another test category comprises tests based on sequences of digit pairs and digit triplets (Smits et al, 2004; Wagener et al, 2005; Wilson et al, 2005; Ozimek et al, 2009).

Some authors suggest that, compared with word intelligibility tests, sentence intelligibility tests have in many cases proved to be a more accurate measure of speech intelligibility (Plomp & Mimpen, 1979; Hagerman, 1982; Nilsson et al, 1994; Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000; Wagener et al, 2003). They usually produce steeper psychometric functions than tests based on single words (Bosman & Smoorenburg, 1995).
Furthermore, word intelligibility tests are less suited for use with a fluctuating noise masker, because of listening in the dips (which might improve subjects’ performance), or with signal processing algorithms with long time constants (such as compression). An interesting comparison of the data obtained with word and sentence intelligibility tests, applied to two groups of subjects (with normal hearing and with sensorineural hearing loss), is given in Wilson et al (2007).

ISSN 1499-2027 print/ISSN 1708-8186 online. DOI: 10.1080/14992020902725521. © 2009 British Society of Audiology, International Society of Audiology, and Nordic Audiological Society.
Correspondence: Edward Ozimek, Institute of Acoustics, A. Mickiewicz University, Ul. Umultowska 85, 61-614 Poznań, Poland. Email: ozimaku@amu.edu.pl
Received: March 1, 2008. Accepted: January 5, 2009.

Differences between recognition of a whole sentence and identification of single, isolated words are usually related to context. Everyday utterances contain a lot of contextual information that helps the listener to deduce unintelligible words in the presented sentence. However, there is no general agreement among authors whether this cognitive factor increases or decreases the applicability and reliability of the tests. On the one hand, the tests are designed to reflect the natural communication process, of which context is a part. In general, tests including context information enable more accurate speech intelligibility measurement than semantically unpredictable sentences (Brand & Kollmeier, 2002). On the other hand, cognitive abilities might not be constant across subjects; this could introduce additional variability of data (Bronkhorst et al, 2002), which is not the case for tests based on semantically unpredictable sentences.

The present study deals with the preparation and evaluation of Polish sentence tests to be used for accurate measurement of the SRT in noise.
The tests are similar to the Dutch sentence tests (Plomp & Mimpen, 1979; Versfeld et al, 2000), the American sentence test (Nilsson et al, 1994), and the German sentence test (Kollmeier & Wesselkamp, 1997). Similarity means that all these tests comprise semantically neutral ‘everyday’ utterances of unfixed grammatical structure.

At present, there are only a few Polish tests for the measurement of speech intelligibility. One word test, developed by Pruszewicz et al (1994a,b), consists of 10 word lists. Each list contains 20 words representing the most frequent monosyllabic Polish nouns. Another Polish speech test, the so-called Corpora database (Grocholewski, 2001), consists of a collection of short sentences (114) and numbers (20 in each set), pronounced by about 70 different speakers. It has been used mainly in the study of automatic speech recognition. There is also a Polish test based on a collection of 20 logatome sets, each set composed of three lists and each list containing 100 logatomes (Brachmański & Staroniewicz, 1999). This test has been mainly used for assessment of the quality of electroacoustic and transmission systems.

Despite the fact that Polish is used by a population of about 50 million people, there is no Polish sentence test optimized for measuring speech intelligibility against a background noise. This fact prompted us to develop such a test, designed for adults. Speech materials used for reliable measurements of intelligibility should produce very steep psychometric functions to permit the detection of changes in intelligibility resulting from small differences in signal-to-noise ratio. Moreover, to keep a high accuracy of the measurement, the intelligibility across different lists should not vary significantly. Therefore, the test lists should yield statistically equivalent results, i.e. the SRT and the slope of the psychometric function at the SRT point (S50; see formula (2)) should not vary significantly across the lists chosen for speech intelligibility measurement.

Development of speech material

Preparation and recording of the sentence material

INITIAL SELECTION OF WRITTEN SENTENCES
The test was prepared in two stages. In the first stage, about 3500 sentences were selected automatically from a large digitalized database containing about 16 million written Polish sentences taken from everyday speech, literature, TV, and theatre. All of them fulfilled the definition of a sentence, i.e. they included a subject, an object, and a verb, and contained normal everyday contexts. The following criteria were used in the automatic selection of the sentences (Versfeld et al, 2000): the total number of syllables in a sentence should be eight or nine; the words in the sentences should not contain more than three syllables; and the sentences should not contain punctuation characters or capitals (excluding the initial capital). No duplicate sentences were selected.

The second stage of sentence selection was conducted manually on the basis of the following criteria: the sentence should be grammatically and syntactically correct and semantically neutral; in particular, political, war, and gender topics were excluded. Questions, proverbs, proper names, and exclamations were eliminated. This process reduced the set of sentences to 2000.

In the Polish language, unlike in English, all verbs are subject to declension with respect to the speaker’s gender. The phonemes distinguishing the present and the past tense are usually characterized by a relatively low energy. The low energy of these grammatically important morphemes might result in more errors and, consequently, increased scatter of the data both within a given sentence list and between lists. Certain verbs that might lead to ambiguities during the listening sessions and data analysis were therefore rejected. This limited the final set of sentences to 1200. The sentences chosen were composed of three to seven words (average 4.6 words per sentence) or eight to nine syllables, as suggested by Plomp & Mimpen (1979) and Versfeld et al (2000). The minimal and maximal number of morphemes across the sentences was six and nine, respectively (average 7.2 morphemes per sentence).

To see whether the phoneme distribution in the 1200 sentences is the same as the average phoneme distribution of the Polish language (Jassem, 1973), the chi-square test was performed. It showed that the two distributions matched almost perfectly {χ² = 14.81, d.f. = 38, p > 0.99}. Thus, the 1200-sentence set could be regarded as phonemically balanced with respect to average Polish speech.

RECORDING OF SPEECH MATERIAL
The sentence set was read out in a radio studio by a professional male speaker. The speaker was 27 years old and was a Polish radio announcer. He was asked to pronounce the sentences in a natural way. Therefore, in many sentences the level of the last word was slightly lower, resulting from the decreased vocal effort with which the speaker signalled the end of the utterance. Recording was performed using a Neumann U87 capacitor microphone. The microphone output fed one of the input channels of a Yamaha 02R mixer. In the mixer, the microphone signal was preamplified and converted into the digital domain at a sampling rate of 44.1 kHz and with a resolution of 24 bits. It was also digitally high-pass filtered at a cut-off frequency of 80 Hz. The signals were then sent via an optical connection (ADAT-type) to a PC and stored on a computer hard disk using Samplitude Pro v.8.2 software. To prepare the set of 1200 sentences, four recording sessions were necessary, each lasting no longer than two hours.

Masking noise generation
During measurements, the test sentences were mixed digitally with a speech babble noise (masker) with a constant level of 70 dB SPL.
The babble noise was generated by summing all of the sentences tested; its rms value was then adjusted so that a level of 70 dB SPL was obtained at the output of the Sennheiser HD580 headphones. Waveforms representing single sentences were shifted with respect to each other in the time domain and, additionally, some of them were reversed in the time domain. The sentences shifted in the time domain, as well as those reversed, were selected at random according to a uniform probability distribution. Moreover, the time shift applied was random too. This produced a stationary noise whose power spectrum was very similar to the spectra of the presented sentences. As a result, a 15-s sample of the speech babble noise was obtained. In the listening sessions, sentences (the longest lasted 2.3 s) were presented against a background of a randomly chosen portion of the babble noise.

The babble noise seems to be a much more effective masker than, for example, International Telegraph and Telephone Consultative Committee (CCITT) noise, which reflects the spectral properties of the English, German, Hungarian, Swedish, Russian, and Italian languages (Tarnoczy, 1971). First of all, the CCITT noise is characterized by a constant spectrum level up to 500 Hz and decreases by 12 dB/octave above that frequency. However, the long-term average speech spectrum (LTASS) of spoken Polish is markedly different from that of the CCITT noise, especially in the high-frequency region (Kosiel, 1972). The reason is that Polish is a sort of ‘consonant’ language: it contains only six vowels (plus two diphthongs) and thirty-one consonants, especially fricatives (Jassem, 1973).
The relatively large number of consonants brings about a marked increase in the spectrum level, especially above 5 kHz, where it reaches a higher value than that for the English, Dutch, or German languages. Moreover, using a noise whose power spectrum precisely matches the power spectra of the speech signals yields steeper psychometric functions than those obtained for less ‘spectrally fitted’ maskers (Wagener et al, 1999a,b,c, 2003).

The power spectral density of the babble noise, obtained by means of a fast Fourier transform (frequency resolution of 40 Hz), is depicted in Figure 1 (solid line). The spectrum reveals a specific minimum below 5 kHz and some enhancement above 5 kHz, reflecting the energy of consonants (mainly fricatives) typical of the Polish language. As can be seen from Figure 1, the power spectral density of the babble noise is somewhat different from that used by Versfeld et al (2000) and by Kollmeier & Wesselkamp (1997), which are also presented in this figure for comparison. However, basic statistical properties of the babble noise were very similar to those used by Kollmeier & Wesselkamp (1997) and Versfeld et al (2000) (1). The differences across the spectra presented reflect mainly individual properties of the German, Dutch, and Polish languages as well as the individual speakers’ voice features.

Figure 1. The long-term power spectrum of the babble noise (solid line) used in the present study. For comparison, the spectra of the maskers used in the studies by Versfeld et al (2000) (dashed line) and by Kollmeier & Wesselkamp (1997) (dotted line) are presented.

Listening sessions

APPARATUS
A computer-controlled Tucker-Davis Technologies (TDT) System 3 with a 24-bit digital real-time signal processor RP2 and a headphone amplifier HB7 was used to play back the sentence material with interfering noise at different SNRs.
The level (in dB SPL) of the output signal was calibrated with B&K instruments (artificial ear type 4153, connected with microphone type 4143, preamplifier type 2669, and amplifier type 2610). Speech signals were presented monaurally via the Sennheiser HD 580 headphones. Experimental sessions were controlled by experiment-specific software implemented in Matlab 6.5 (MathWorks).

PROCEDURE
The sentence sound pressure level was changed to obtain different SNRs. The SNR was defined as the ratio of the sentence rms to the interfering noise rms. The interfering noise was a gated signal (20-ms ramps) that started 300 ms before the onset of the sentence and ended 300 ms after the end of the sentence. Thus, the duration of the masking signal was determined by the duration of the masked utterance. All sentences were presented at three basic SNRs: −9 dB, −5 dB, and −1 dB. These three SNRs were chosen as the levels that optimally encompassed the average SRT value, determined on the basis of the pilot study. However, to obtain a more reliable fit of the psychometric function to the intelligibility data, two additional SNRs were chosen individually for each sentence. One of the additional SNRs was chosen from the range −1 to −5 dB, while the other was from the range −5 to −9 dB. Exact values of the additional SNRs depended on the highest absolute values of the second derivative of the psychometric function. Intelligibility scores obtained for the five SNRs allowed a much better fit of the psychometric function to each sentence.

The listening session was self-paced and was controlled by the subject, who was placed in an acoustically isolated room. All instructions were displayed on an LCD screen. When the ENTER key was pressed for the first time, written instructions appeared on the monitor. Pressing this key for the second time caused the playback of one sentence via the TDT 3 system as described above.
The subject’s task was to repeat the presented sentence as accurately as possible, and his/her response was recorded on the computer hard disk. Both oral and written responses were registered. The subjects’ oral answers were recorded by means of a separate signal channel (a condenser Sennheiser E914 microphone, a Yamaha MG10/2 mixer with preamplifier, and an external soundcard for a PC). Pressing the ENTER key again stopped the recording of the subject’s answer, and the subject’s task was then to type the sentence heard. To start listening to a new sentence, the subject had to press the ENTER key again. It should be emphasized that subjects were instructed to provide an answer even if they had understood only one word in a sentence. The double recording of the subject’s answer, i.e. collecting both oral and ‘typed’ responses, was aimed at minimizing potential ambiguities that might result from, for example, typing mistakes. In case of any ambiguities or typing errors in the ‘typed’ response, the subject’s recorded oral response was considered.

Each sentence from the initial set of 1200 was presented to each subject once. This means that each subject evaluated each sentence for one value of SNR only. To determine the intelligibility function for each sentence, five different SNRs were necessary. Therefore, 35 subjects were divided into seven virtual groups and each group evaluated one out of five different SNRs. The virtual groups were created for each sentence. The goal behind this approach was to avoid a situation where subjects from one group were systematically presented with one SNR only. Each SNR was presented to each subject 240 times (240 × 5 = 1200).

SUBJECTS
Thirty-five subjects (18 male and 17 female, aged from 18 to 25 years, mean age 22.7) participated in the experiment.
The subjects had hearing levels (HL) of less than 10 dB HL at all audiometric frequencies up to 8 kHz and had no history of hearing disorders. All of them were native, monolingual Polish speakers. Prior to the actual measurements, each subject had been trained for one to two hours to become familiar with the task. This training applied also to all subjects who participated in the next experiments. The additional sentences that had been employed in the training sessions were not used in the main experiment. Collection of responses for all 1200 sentences for one subject took about eight hours, but the duration of a single measurement session was limited to two hours. The subjects were allowed to have a rest whenever they wished to. The subjects were paid for their participation in the measurements.

Intelligibility functions

INTELLIGIBILITY SCORING
Intelligibility scores for each sentence were computed in two ways. In the first method, the subject’s response was considered correct when all the words constituting the presented sentence and their order were repeated correctly. In this case the score was 100%. Otherwise, the score was set to 0% (Versfeld et al, 2000). This type of scoring is called hereafter ‘sentence scoring’. Since sentence scoring produces data of a binary nature, this set is aimed at an adaptive up/down procedure that is commonly used in laboratory and clinical measurements.

The second way of scoring was based on the correct repetition of individual words in a sentence. The intelligibility score in this case was estimated as the ratio of the number of correctly repeated words to the overall number of presented words, multiplied by 100% (Kollmeier & Wesselkamp, 1997; Wagener et al, 1999a,b,c; Wagener et al, 2003). Missed words were treated as incorrectly repeated words. Moreover, since in Polish it is often the case that a change of the verb position does not affect the sentence meaning, the word order in subjects’ responses was not taken into account, i.e. if a word was repeated correctly but at an altered sentence position, it was scored as a correct answer. This approach is called the ‘word scoring’ method.

In order to determine the intelligibility score for a given sentence and SNR, intelligibility data were pooled across subjects, and the ratio of correctly repeated sentences (for sentence scoring) or of correctly repeated words (for word scoring) was computed.

Determination of psychometric functions and their parameters
The psychometric (intelligibility) function, i.e. the function which links the probability of a correct response to the SNR, is usually approximated by the standardized cumulative normal distribution (Versfeld et al, 2000; Smits et al, 2004). It is generally assumed that the relationship between speech intelligibility and the SNR is expressed by the function F(SNR):

$$F(\mathrm{SNR}) = \frac{100}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\mathrm{SNR}-\mathrm{SRT}}{\sigma}} e^{-t^{2}/2}\, dt \tag{1}$$

The cumulative normal distribution function, F(SNR), is characterized by two parameters: the SRT (i.e. the signal-to-noise ratio that produces 50% correct responses) and the standard deviation, σ, characterizing the spread of data at the SRT point. The standard deviation, σ, determines the S50 parameter, i.e. the slope of the cumulative distribution at the SRT point, which can be expressed by the following equation:

$$S_{50} = \frac{100}{\sigma\sqrt{2\pi}} \tag{2}$$

Having five data points describing the probability of a correct response at five SNRs for each sentence, it was possible to apply the maximum likelihood (ML) method to find the SRT and the corresponding slope S50. The iterative procedure searched for the SRT and S50 by minimizing the negative logarithm of the likelihood ratio (Versfeld et al, 2000; Wagener et al, 2003). The slope S50 of the psychometric function for a given sentence might be changed by modifying the intelligibility of particular words in this sentence (Kollmeier & Wesselkamp, 1997). For instance, the slope increases and, consequently, the corresponding standard deviation decreases, if the intelligibility of the respective words is equalized. Hence, S50 indirectly describes the spread of intelligibilities of the individual words constituting the sentence. However, in this work, the accuracy of the speech tests was obtained by selecting sentences characterized by relatively steep psychometric functions (Versfeld et al, 2000; Smits et al, 2004).

For each sentence, two psychometric functions were fitted to the results pooled across subjects: one for sentence scoring and one for word scoring. Figure 2 presents an example of psychometric functions fitted to intelligibility scores for two chosen sentences (left and right panel, respectively). Circles and solid lines present intelligibility data and the corresponding psychometric functions based on sentence scoring. Squares and dashed lines depict intelligibility data and the fitted psychometric functions for the word scoring method.

Figure 2. Intelligibility data (circles for sentence scoring and squares for word scoring) and fitted psychometric functions (solid line for sentence scoring and dashed line for word scoring) for two example sentences (left panel: Rzadko chodzą do teatru: They rarely go to a theatre; right panel: Wiele dzieci tam się bawi: Many children are playing there). Axes: SNR [dB] (horizontal, −11 to −1) and speech intelligibility [%] (vertical, 0 to 100). Left panel: sentence scoring SRT = −6.7 dB, S50 = 24.9%/dB; word scoring SRT = −8.2 dB, S50 = 17.4%/dB. Right panel: sentence scoring SRT = −5 dB, S50 = 22.2%/dB; word scoring SRT = −6.9 dB, S50 = 15.4%/dB.
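As an illustration of equations (1) and (2) and of the maximum-likelihood fit described above, the following Python sketch evaluates the cumulative-normal psychometric function and estimates the SRT and σ from pooled counts. It is not the authors' Matlab implementation: the function names, the binomial likelihood, and the simple grid search (standing in for the iterative procedure) are assumptions made for this example.

```python
import math

def psychometric(snr_db, srt_db, sigma_db):
    """Intelligibility in % from equation (1), written with the error function."""
    z = (snr_db - srt_db) / sigma_db
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def slope_s50(sigma_db):
    """Slope at the SRT point in %/dB, equation (2)."""
    return 100.0 / (sigma_db * math.sqrt(2.0 * math.pi))

def fit_ml(snrs_db, n_correct, n_trials):
    """Grid-search ML estimate of (SRT, sigma).

    Assumption: responses at each SNR are binomial; the grid limits and
    the 0.05 dB step are illustrative choices, not values from the paper.
    """
    best_nll, best_params = float("inf"), None
    for i in range(241):                      # SRT from -12.0 to 0.0 dB
        srt = -12.0 + 0.05 * i
        for j in range(111):                  # sigma from 0.5 to 6.0 dB
            sigma = 0.5 + 0.05 * j
            nll = 0.0
            for snr, k, n in zip(snrs_db, n_correct, n_trials):
                p = psychometric(snr, srt, sigma) / 100.0
                p = min(max(p, 1e-9), 1.0 - 1e-9)   # avoid log(0)
                nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
            if nll < best_nll:
                best_nll, best_params = nll, (srt, sigma)
    return best_params

# Hypothetical pooled counts for one sentence at five SNRs (7 listeners each).
snrs = [-9.0, -7.0, -5.0, -3.0, -1.0]
srt, sigma = fit_ml(snrs, n_correct=[0, 2, 5, 7, 7], n_trials=[7] * 5)
print(srt, sigma, slope_s50(sigma))
```

A gradient-based or simplex optimizer would normally replace the grid search; the grid is used here only to keep the sketch dependency-free.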
As shown in Figure 2, the intelligibility scores based on sentence and word scoring are very similar at relatively high SNRs, while for low SNRs the intelligibility for word scoring is better than that for sentence scoring. This simply results from the scoring method itself: at low SNRs subjects are unable to repeat the whole sentence correctly. However, it happens quite often that they correctly repeat one or more (yet not all) words in a sentence. Taking into account that sentences consisted on average of 4.6 words, and assuming that one word was correctly repeated, the word scoring method yields an intelligibility of about 20–25%, as depicted in the right panel of Figure 2 for SNR = −9 dB. On the other hand, at SNR = −1 dB the differences in intelligibility score between the two scoring methods are negligible. Thus, the psychometric functions obtained in the case of word scoring reveal a smaller S50 and a lower SRT than those for sentence scoring, as expected.

In this way, two sets of 1200 SRT and S50 values were determined, i.e. two SRT values (one for sentence scoring and one for word scoring) and two corresponding S50 values for each sentence. The most important statistics of the fitted psychometric functions, i.e. of the determined SRTs and S50s, are presented in Table 1.

Composition of the final sentence lists
The composition of the final sentence lists was aimed at developing statistically and phonemically equivalent sentence lists. It is important to minimize the scatter of SRTs across lists so that test reliability is adequate for clinical purposes. To do this, it was decided that the SRT of any sentence to be included in the final lists should lie within ±1.5 dB of the mean SRT of all sentences tested, i.e. −6.1 dB and −7.3 dB for sentence and word scoring, respectively. Moreover, to reduce the data scatter resulting from inequality of the intelligibility of individual words, only sentences with a slope S50 not smaller than 15%/dB were selected.
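The word-scoring arithmetic and the two selection criteria above can be sketched in Python as follows. The function names and the sample values are hypothetical and serve only as an illustration; the tolerance of 1.5 dB and the minimum slope of 15%/dB are taken from the text.

```python
def word_score(n_correct_words, n_words):
    """Word-scoring intelligibility in %: correct words over presented words."""
    return 100.0 * n_correct_words / n_words

def select_sentences(fits, mean_srt_db, srt_tolerance_db=1.5, min_s50=15.0):
    """Keep sentences whose SRT lies within the tolerance of the mean SRT
    and whose slope S50 is at least min_s50 (%/dB).

    fits: iterable of (sentence_id, srt_db, s50) tuples (hypothetical layout).
    """
    return [
        sid
        for sid, srt, s50 in fits
        if abs(srt - mean_srt_db) <= srt_tolerance_db and s50 >= min_s50
    ]

# One correct word in a four- or five-word sentence gives 25% or 20%.
print(word_score(1, 5), word_score(1, 4))          # 20.0 25.0

# Made-up fit results for four sentences, sentence scoring (mean SRT -6.1 dB).
fits = [("a", -6.0, 20.0), ("b", -6.2, 14.0), ("c", -9.0, 30.0), ("d", -5.0, 25.0)]
print(select_sentences(fits, mean_srt_db=-6.1))    # ['a', 'd']
```

Sentence "b" is rejected for its shallow slope and sentence "c" because its SRT lies more than 1.5 dB from the mean.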
As a result, 500 and 440 sentences fulfilling the above criteria for the sentence and word scoring methods, respectively, were selected. In this way, it was possible to create 25 lists (20 sentences each) for the sentence scoring method, and 22 lists (20 sentences each) for the word scoring method. The last stage of developing the Polish sentence test was to compile the sentences into final lists, so that the following criteria would be met:

• the lists should be statistically equivalent, i.e. the average SRT and S50 within each list must not depend on the list number,
• the lists should contain phonemically comparable content. The phoneme content should also be consistent with the reference phoneme distribution of the Polish language (Jassem, 1973).

A special algorithm performing Monte Carlo simulations was prepared and implemented in Matlab 7.0 (MathWorks). The steps in the compilation were as follows:

1. Random permutations of 500 and 440 sentences for the sentence and word scoring, respectively, were generated.
2. The random series of sentences were grouped into 25 (sentence scoring) and 22 (word scoring) preliminary lists of 20 sentences each.
3. For the 25 sentence scoring lists, two separate one-way ANOVAs were performed with respect to the SRT and slope S50, respectively. The lists were regarded as equivalent when there were no statistically significant differences across the lists in either the average SRTs or the average S50s. If there was a statistically significant difference in either of the analysed parameters, steps 1–3 were repeated. The same analysis was carried out independently for the 22 lists for word scoring.
4. In the last step, phoneme distribution analysis was performed. A specially designed algorithm transformed the written sentences into SAMPA-broad (Polish extension) (Wypych et al, 2003) phonemic code, taking into account the co-articulation effects using phoneme distribution rules. Subsequently, phonemic distributions for each list were determined and compared with the reference distribution for the Polish language (Jassem, 1973). The lists were regarded as phonemically balanced if, for each phoneme and each list, the frequency of occurrence of the phoneme did not deviate by more than ±2.5 percentage points from the mean frequency of occurrence of that phoneme in the Polish language. If a sentence permutation meeting the above criteria was not found, a new permutation was generated and steps 1–4 were repeated.

This algorithm led to the composition of a set of equivalent lists. Finally, 25 statistically and phonemically equivalent lists containing 20 sentences each for sentence scoring and 22 statistically and phonemically equivalent lists containing 20 sentences each for word scoring were obtained.

Table 1. Basic statistics of the SRTs and S50s for sentence and word scoring methods.
Statistic | Sentence scoring: SRT [dB] | Sentence scoring: S50 [%/dB] | Word scoring: SRT [dB] | Word scoring: S50 [%/dB]
mean | -6.1 | 22.6 | -7.3 | 19.3
s | 1.9 | 14.2 | 2.3 | 13.7
min | -8.9 | 10.1 | -9.9 | 9.4
max | -3.5 | 35.1 | -3.5 | 31.1
Kurtosis (normalized) | 0.71 | 0.73 | 0.86 | 0.83

The list-specific SRTlist for a list of 20 sentences was calculated as the average of the SRTs across the sentences in the list. The list-specific psychometric function, i.e. the function that models speech intelligibility for a list composed of sentences of different SRTs and S50s, can be computed according to the so-called probabilistic model proposed by Kollmeier (1990). This model states that the list-specific intelligibility function is a convolution of a mean psychometric function (i.e. a function with S50 equal to the mean S50 averaged across the sentences in the analysed list) and the distribution of SRTs within the list. The slope of the list-specific function, i.e. the list-specific slope S50list, can be calculated directly according to the following equation (Kollmeier, 1990):

$$ S_{50,\mathrm{list}} \approx \frac{S_{50,\mathrm{mean}}}{\sqrt{1 + \dfrac{16\, S_{50,\mathrm{mean}}^{2}\, \sigma_{\mathrm{SRT}}^{2}}{\left(\ln\left(2e^{1/2} - 1 + 2e^{1/4}\right)\right)^{2}}}} \qquad (3) $$

where S50mean is the mean slope (expressed in dB⁻¹, for example 20%/dB = 0.2 dB⁻¹) averaged across the sentences in a list, and σSRT is the standard deviation of the SRT across the sentences in a list. As follows from the above formula, a large list-specific slope requires both high slope values characterizing individual sentences and a small spread of the sentence-specific SRT values [3]. In an ideal situation, there is no spread of SRT values across the respective sentences, σSRT = 0 dB and, consequently, S50list = S50mean. Conversely, if σSRT > 0 dB, S50list < S50mean. A large S50list implies a low standard deviation of SRT estimation, SDSRT (Smits & Houtgast, 2006).

LISTS FOR SENTENCE INTELLIGIBILITY SCORING
The goodness-of-fit χ² test revealed that there were no significant differences in the mean phoneme distribution between the sentence lists and the reference phoneme distribution for the Polish language {χ² = 15.31, d.f. = 38, p > 0.99}. The phoneme balance is at least as good as that characterizing the tests developed by other authors (Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000). Figure 3 presents the average SRT values and corresponding standard deviations for the respective lists. The average SRT values for individual lists lie within the range from -6.3 dB to -5.8 dB (mean SRT = -6.1 dB). The mean list-specific slope S50list (determined according to formula (3)) is 25.5%/dB (i.e. 0.255 dB⁻¹). The separate one-way ANOVA analyses showed that neither the SRTs {F(24,499) = 0.54, p = 0.96} nor the slope S50 {F(24,499) = 0.55, p = 0.96} varied significantly across the lists.

[Figure 3: SRT averaged across sentences [dB] versus list index (1–25); mean SRT = -6.1 dB]
Figure 3. List-specific SRTs (filled circles) for respective lists optimized for sentence scoring (error bars depict standard deviations across sentences).

LISTS FOR WORD INTELLIGIBILITY SCORING
The goodness-of-fit χ² test revealed that there were no differences in the mean phoneme distribution between the sentence lists and the reference phoneme distribution {χ² = 15.41, d.f. = 38, p > 0.99}. Figure 4 depicts the average SRT values and the corresponding standard deviations for the respective lists. The intelligibility functions for word scoring generally reveal a smaller slope than those determined for sentence scoring; the mean list-specific slope is 20.8%/dB, i.e. close to that characterizing the German sentence test for word scoring (Kollmeier & Wesselkamp, 1997), i.e. 19.2%/dB. The average SRT values for the lists vary in the range from -7.6 dB to -7.2 dB. The mean SRT of -7.5 dB is 1.4 dB lower than that characterizing the set for sentence scoring, as expected (see section: Determination of psychometric functions and their parameters, above). The ANOVA revealed that neither the SRT nor the S50 varied significantly across lists ({F(21,439) = 0.29, p > 0.99} and {F(21,439) = 0.40, p > 0.99}, respectively). Therefore, the lists developed for word scoring may also be regarded as statistically and phonemically equivalent.

[Figure 4: SRT averaged across sentences [dB] versus list index (1–22); mean SRT = -7.5 dB]
Figure 4.
List-specific SRTs (filled circles) for respective lists optimized for word scoring (error bars depict standard deviations across sentences).

Verification of the reliability of the sentence tests

In the final stage of this study, verification experiments were carried out to check the reliability of the developed Polish sentence materials. The retest measurements were conducted by means of the constant stimuli paradigm as well as by the adaptive procedure.

CONSTANT STIMULI PARADIGM
A new group of 20 subjects (12 male and 8 female, mean age 24.8 years) with normal hearing participated in the measurements. All of them were native, monolingual Polish speakers. They were paid for taking part in the experiments. For ten subjects, the intelligibility data were estimated using sentence scoring, and for the other ten subjects using word scoring. Each subject was presented with five randomly chosen lists from each list set. As in traditional speech audiometry, the randomly chosen lists were presented at different SNRs. In this case, the SNR values were as follows: -10, -8, -6, -4, and -2 dB. Each list was presented to a given subject only once. For each subject, one SRT value and the corresponding S50 were determined (note that in this experiment the scatter of data both within and between the lists affected the measurement, hence S50 corresponds to S50list). For the list set optimized for sentence scoring, the mean SRT and the mean S50list averaged across the subjects were -6.0 dB and 24.2%/dB, respectively. The results of the retest measurements, therefore, corresponded very well to the test values, i.e. -6.1 dB and 25.5%/dB, respectively. The standard deviation of SRT across the subjects was 0.6 dB. As far as the data for word scoring are concerned, the mean SRT and mean S50 averaged across subjects were -7.3 dB and 21.9%/dB, respectively. In this case, the retest values also turned out to be very close to those obtained in the first experiment, i.e. -7.5 dB and 20.8%/dB, respectively. The standard deviation of SRT across subjects was 0.8 dB. Therefore, the results of the retest measurement for the constant stimuli paradigm confirmed the statistical properties of the materials developed.

ADAPTIVE PROCEDURE
Since speech intelligibility measurements are often carried out using an adaptive procedure, in the next step the statistical equivalence of the sentence lists was reanalysed by means of a staircase procedure that enables a direct SRT estimation for each list. This procedure could be used only for sentence scoring. The main retest measurements were carried out using the standard adaptive procedure with the one-up/one-down decision rule, i.e. the SNR was decreased after a correct response and increased after an incorrect one. The initial SNR was set to 2 dB, i.e. the speech was relatively easily understood at the beginning of the measurement. The initial step was 2 dB and was reduced to 1 dB after the first incorrect response. The SRT was computed as the mean of the last 12 SNR values, including the last SNR determined by the adaptive procedure (Smits et al, 2004; Smits & Houtgast, 2006). The main advantage of the adaptive procedure over the constant stimuli method is that only one list is required to estimate the SRT value, while for the constant stimuli method at least two different lists are needed to determine the psychometric function and the SRT point. Moreover, the intelligibility scores for these SNRs have to encompass the 50% intelligibility point; thus, in practice, some preliminary testing is often required prior to an actual SRT measurement, which is not the case for the adaptive method.

A new group of 10 normal-hearing subjects (four male and six female, mean age 23.2 years) took part in the experiments. All of them were native, monolingual Polish speakers.
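The one-up/one-down rule of the adaptive procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function name and the `respond` callback (a hypothetical stand-in for the listener) are invented for the sketch, and it is assumed that the reduced 1 dB step already applies to the SNR change that follows the first incorrect response.

```python
def one_up_one_down_srt(respond, n_sentences=20, initial_snr=2.0,
                        initial_step=2.0, final_step=1.0, n_average=12):
    """Run a simple one-up/one-down staircase and estimate the SRT.

    `respond(snr)` returns True when the whole sentence presented at
    `snr` (in dB) is repeated correctly (hypothetical listener model).
    Returns the SRT estimate and the full SNR track.
    """
    snr = initial_snr
    step = initial_step
    track = []
    for _ in range(n_sentences):
        track.append(snr)
        if respond(snr):
            snr -= step        # correct response: lower the SNR
        else:
            step = final_step  # assumption: small step from the first error on
            snr += step        # incorrect response: raise the SNR
    track.append(snr)          # SNR the procedure would present next
    last = track[-n_average:]  # last 12 values, including the final one
    return sum(last) / len(last), track


# Deterministic toy listener: correct whenever the SNR exceeds -6 dB.
srt, track = one_up_one_down_srt(lambda snr: snr > -6.0)
```

With this toy listener the track descends in 2 dB steps until the first error and then alternates between two adjacent SNR values, so the average of the last 12 values falls between them. A real listener responds probabilistically, which is why the track fluctuates around the 50% point rather than alternating exactly.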
The subjects were paid for their participation in the experiments. Each subject was presented with each sentence list taken from the set for sentence scoring. Figure 5 presents the results of the measurements, i.e. SRT values averaged across subjects and the corresponding standard deviations for the respective sentence lists. The obtained SRTs were subjected to within-subject ANOVA with respect to the list index. The ANOVA revealed that the SRT did not depend on the list index {F(24,249) = 0.85, p = 0.67}, i.e. the statistical balance of the sentence lists was confirmed.

[Figure 5: SRT [dB] versus list index (1–25); mean SRT = -6.2 dB]
Figure 5. Mean SRTs (filled circles) and corresponding standard deviations (error bars) obtained in retest measurement using the adaptive procedure (data averaged across subjects).

The minimal and maximal SRTs are -6.9 dB (list no. 4) and -5.8 dB (list no. 10), respectively, while the minimal and maximal standard deviations across subjects are 0.6 dB (list no. 1) and 1 dB (list no. 4), respectively. These values correspond very well to the inter-individual standard deviations characterizing the retest data obtained by Versfeld et al (2000), 1.07 dB; Plomp & Mimpen (1979), 0.9 dB; and Nilsson et al (1994), 1.13 dB. The mean SRT (averaged across the lists) and the corresponding standard deviation are -6.2 dB and 0.2 dB, respectively. The standard deviation across the lists corresponds very well to the 0.3 dB obtained by Kollmeier (1990), while it is lower than the standard deviation across the sentence lists developed by Versfeld et al (2000), i.e. 0.6 dB. The mean retest value obtained, i.e. -6.2 dB, is very close to the expected value, i.e. -6.1 dB; thus, the result of the retest measurements has confirmed the high accuracy and repeatability of the sentence materials developed.

Apart from the standard deviations describing the inter-individual differences and the differences between the respective lists, the standard deviations of the SRT estimates were determined, i.e. the spread of SNRs at which the last 12 sentences were presented was analysed. The mean σSRT was 1.7 dB, and corresponded very well to the theoretical value that can be computed by substituting the mean S50list value into equation (2), i.e. 1.6 dB.

Discussion

The purpose of this study was to develop a Polish sentence test for measuring speech intelligibility in the presence of masking noise. The obtained SRT and slope data have been compared with the data obtained for other European languages. In Table 2, the data and properties of various sentence intelligibility tests developed by different authors and those of the current study (bottom rows 9–13) are compared. It was found, for example, that the mean SRT for Polish sentence materials was close to the SRT for the German Göttingen test based on the word scoring method (Kollmeier & Wesselkamp, 1997), i.e. -6.23 dB. On the other hand, the SRT for Polish materials optimized for word scoring is about 1 dB lower than that of the Göttingen test, and is comparable to the SRT characterizing semantically unpredictable as well as syntactically and lexically constrained material (Hagerman, 1982; Wagener et al, 1999a,b,c, 2003). The mean SRT of the Polish sentence lists optimized for sentence scoring was approximately 2 dB lower than that obtained for Dutch tests (Plomp & Mimpen, 1979; Smoorenburg, 1992; Versfeld et al, 2000), and about 3 dB lower than the SRT characterizing American English materials (Nilsson et al, 1994).

Some similarities and differences can be found in the S50 parameter obtained for Polish and other tests. For example, the S50list obtained for the word scoring method (20.8%/dB) is similar to that characterizing the German Göttingen test (19.2%/dB) (Kollmeier & Wesselkamp, 1997), but is clearly different from the slopes obtained for Danish (13.2%/dB) (Wagener, 2003) or Swedish (16.0%/dB) (Hagerman, 1982).

Table 2. Properties of sentence intelligibility tests reported by different authors and obtained in the current study (rows 9–13).
No. | Language | Masker | Scoring method | SRT [dB] | Mean slope [%/dB] | Speaker | Remarks
1 | Dutch (Plomp and Mimpen, 1979) | Speech-shaped stationary noise | Sentence scoring | -4.5 | 15.9* | Female | Semantically predictable sentences (Plomp-type test)
2 | Dutch (Smoorenburg, 1992) | Speech-shaped stationary noise | Sentence scoring | -3.7 | 17.7* | Male | Plomp-type test
3 | Dutch (Versfeld et al, 2000) | Individually shaped white noise | Sentence scoring | -4.1 | 16.3* | Male, female | Plomp-type test
4 | German (Kollmeier and Wesselkamp, 1997) | Superposition of monosyllabic words | Word scoring | -6.2 | 19.2** | Male | Plomp-type test ('Göttingen test')
5 | German (Wagener, 1999a,b) | Babble noise | Word scoring | -7.1 | 17.1** | Male | Semantically unpredictable sentences ('Oldenburg' Satztest, OLSA), fixed grammatical structure
6 | Danish (Wagener, 2003) | Babble noise | Word scoring | -8.4 | 13.2** | Female | Semantically unpredictable sentences ('DANTALE II' test), fixed grammatical structure
7 | Swedish (Hagerman, 1982) | Babble noise | Word scoring | -8.1 | 16.0* | Female | Semantically unpredictable sentences, fixed grammatical structure
8 | American English (Nilsson, 1994) | Speech-shaped stationary noise | Sentence scoring | -2.9 | – | Male | Plomp-type test, adaptive procedure
9 | Polish | Babble noise | Sentence scoring | -6.1 | 25.6** | Male | Plomp-type test (37 lists of 13 sentences)
10 | Polish | Babble noise | Sentence scoring | -6.1 | 25.5** | Male | Plomp-type test (25 lists of 20 sentences)
11 | Polish | Babble noise | Word scoring | -7.5 | 20.8** | Male | Plomp-type test (25 lists of 20 sentences)
12 | Polish | Speech-shaped stationary noise | Sentence scoring | -6.7 | 22.5** | Male | Plomp-type test (25 lists of 20 sentences)
13 | Polish | Speech-shaped stationary noise | Word scoring | -7.8 | 18.6** | Male | Plomp-type test (25 lists of 20 sentences)
*mean slope S50mean, **list-specific slope S50list determined according to formula (3).

The differences across languages in SRTs and slopes may be due to many factors, such as differences in linguistic structure between languages, the amount of contextual information, the type of noise (masker) used, the speaker, and the presentation mode. As can be seen from the data in Table 2, there are significant differences in the masking signal, scoring method, predictability, and psychometric function parameters. Values of SRT vary across the tests from -8.4 dB (Wagener et al, 2003) to -2.9 dB (Nilsson et al, 1994). SRTs are typically lower for speech materials of a limited vocabulary (Hagerman, 1982; Wagener et al, 1999a,b,c; Wagener et al, 2003), while SRTs are the highest for Plomp-like tests (Plomp & Mimpen, 1979; Smoorenburg, 1992; Versfeld et al, 2000). This difference might be a consequence of using a very limited number of words in a test. For example, the entire OLSA or DANTALE tests are composed of 50 words (Wagener et al, 1999a,b,c, 2003). In contrast, in Plomp-like tests, neither word content nor grammatical structure is constant across the respective lists. What should be emphasized is the role of the masker type used in different studies in shaping the intelligibility data [4]. Although the long-term spectra of the different types of masker may be very similar, they may differ significantly in their temporal structure, e.g. a band of white noise versus a band of low-noise noise at the same bandwidth and power (Kohlrausch et al, 1997).

Another important issue is the choice of the optimal number of sentences used in a list, which determines the number of equivalent lists that can be developed from the selected sentence material and the duration of the intelligibility measurement. This topic is now under our investigation. The first data revealed that the 20-sentence lists used in this experiment can be reduced to 13-sentence lists without diminishing the accuracy of the SRT measurement and the phonemic balance (Ozimek et al, 2007). This number corresponds well with that proposed by Plomp & Mimpen (1979) and Versfeld et al (2000). From a practical point of view, it is more desirable to have 37 equivalent 13-sentence lists than 25 equivalent 20-sentence lists, because it permits a greater number of measurements. However, in the first case, the result of an adaptive measurement might be biased by an inappropriately chosen initial SNR value.

In order to determine an individual SRT value it is sufficient to collect the intelligibility scores for three to five randomly selected lists presented at different SNRs depending on the patient's hearing loss. Determination of the average SRT based on four randomly chosen lists takes about 15 minutes (the constant stimuli method). If the 'typed' subjects' responses are not recorded (as in traditional CVC speech audiometry), the time of data collection for one patient should decrease to less than 10 minutes. Speech intelligibility assessment using the adaptive procedure with the one-up/one-down decision rule is the most effective, since in this case a measurement takes approximately four minutes and only one list is required to estimate the SRT value.

One of the most important aspects of a speech intelligibility study is to apply the speech material in a real clinical situation to determine the intelligibility of subjects with hearing loss. It is well known that hearing-impaired subjects show worse performance in speech intelligibility tests than normal-hearing subjects do. The intelligibility deterioration may be attributed to several factors such as diminished audibility, reduced frequency selectivity, loudness recruitment, etc. We have recently undertaken an investigation applying the developed sentence test to hearing-impaired subjects showing sloping audiograms at higher frequencies or notch-like hearing loss within the range of 2–4 kHz, and to subjects with tinnitus, defined as a phantom auditory perception (Jastreboff, 1990; Ozimek et al, 2006a). The preliminary data indicate that for the subjects with sloping audiograms and hearing loss less than 40 dB HL, the mean SRT for sentence scoring is -2.3 dB, while for subjects with hearing loss above 40 dB HL, the mean SRT is higher and amounts to -0.7 dB (the corresponding mean SRT for sentence scoring and normal-hearing subjects is -6.1 dB, as shown in this study). Similar data have been obtained for the subjects with notch-like hearing loss. As can be seen, the separation in SRTs between the normal-hearing and hearing-impaired subjects is of the order of 3.8 to 5.4 dB and is narrower and lower than that (4–10 dB) obtained by Wilson et al (2007). The preliminary data also show that for the tinnitus subjects (without hearing loss), the mean SRT equals -4.2 dB, and is higher by 1.9 dB than that for normal-hearing subjects, indicating the effect of tinnitus. It is worth noting that even small differences in SRT between the normal-hearing and hearing-impaired subjects indicate significant changes in speech intelligibility. For example, a difference in speech-to-noise ratio of about 1 dB produces changes in intelligibility of about 7–19% (Nilsson et al, 1994).
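The sensitivity of intelligibility to small SNR changes can be made concrete with a short sketch. It assumes a logistic psychometric function whose slope at the 50% point equals S50, which is a common choice in speech-in-noise testing; the exact function fitted in this study may differ, and the function names below are hypothetical.

```python
import math


def intelligibility(snr_db, srt_db, s50_per_db):
    """Logistic psychometric function (assumed form for this sketch).

    Returns the proportion correct at `snr_db`; the value is 0.5 at the
    SRT and the slope there equals `s50_per_db` (expressed in 1/dB).
    """
    return 1.0 / (1.0 + math.exp(4.0 * s50_per_db * (srt_db - snr_db)))


def change_for_shift(shift_db, srt_db, s50_per_db):
    """Change in proportion correct when the SNR moves from the SRT by shift_db."""
    return intelligibility(srt_db + shift_db, srt_db, s50_per_db) - 0.5


# Sentence-scoring values reported in this study: SRT = -6.1 dB, S50list = 0.255 per dB.
gain = change_for_shift(1.0, -6.1, 0.255)
```

Under these assumptions, a 1 dB improvement in SNR near the SRT raises intelligibility by roughly 23 percentage points for the sentence-scoring lists; shallower slopes, such as those of the other tests in Table 2, give proportionally smaller changes.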
A substantial data set related to our ongoing study with hearing-impaired subjects will be reported in a separate paper. Finally, we would like to stress that, to the best of our knowledge, the Polish sentence test reported in this study is the first test of this type among Western Slavonic languages. The family of Slavonic languages comprises three main groups: Eastern (Russian, Ukrainian, and Byelorussian), Western (Polish, Czech, and Slovak), and Southern (Serbian-Croatian, Bulgarian, Slovenian, and Macedonian) languages, used by a population of about 300 million. This fact emphasizes the need for developing intelligibility tests for those languages.

Acknowledgements

This work was financially supported by the European Union FP6, Project 004171 HEARCOM, and the State Ministry of Education and Science. A portion of this work was presented at the 19th International Congress on Acoustics, Madrid, 2–7 September 2007.

Notes

[1] In characterizing some statistical properties of the babble noise used in this study, it was found that the crest factor Cf, reflecting the dynamic range of amplitude fluctuations, and the normalized kurtosis (kurt), characterizing the relation of the distribution of instantaneous values to the normal distribution, were Cf = 4.51 and kurt = 1.02 (kurt = m4/(3σ⁴), where m4 is the fourth central moment and σ is the standard deviation). These values are similar to Cf = 4.65 and kurt = 1.04, determined for the masker used by Versfeld et al (2000), and to Cf = 4.55 and kurt = 0.98, determined for the masker used by Kollmeier & Wesselkamp (1997). Furthermore, the Cf and kurt values for the envelopes of these noises were also similar: Cf = 3.84, kurt = 1.15 in Versfeld et al (2000); Cf = 3.30, kurt = 1.05 in Kollmeier & Wesselkamp (1997); and Cf = 3.48, kurt = 1.15 in the present study.
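The two masker statistics in Note 1 can be computed as in the following minimal Python sketch. It assumes that the crest factor is the ratio of the peak absolute amplitude to the rms value (the note does not spell out the definition) and that population moments are used; the function names are hypothetical.

```python
import math


def crest_factor(samples):
    """Peak absolute amplitude divided by the rms value (assumed definition)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return max(abs(x) for x in samples) / rms


def normalized_kurtosis(samples):
    """m4 / (3 * sigma**4), as defined in Note 1.

    m4 is the fourth central moment and sigma the standard deviation;
    the result equals 1 for a normal distribution.
    """
    n = len(samples)
    mu = sum(samples) / n
    m2 = sum((x - mu) ** 2 for x in samples) / n  # variance, sigma**2
    m4 = sum((x - mu) ** 4 for x in samples) / n
    return m4 / (3.0 * m2 ** 2)


# Sanity check on a square wave: the peak equals the rms, and m4 = sigma**4.
square = [1.0, -1.0, 1.0, -1.0]
cf_square = crest_factor(square)
kurt_square = normalized_kurtosis(square)
```

For the square wave the crest factor is 1 and the normalized kurtosis is 1/3, whereas Gaussian noise gives a normalized kurtosis close to 1, which is the reference against which the values in Note 1 (all near 1) can be read.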
[2] Although there are some standards aimed at optimizing the rms measurement of a speech signal (for example, ITU standard P.56 (1994) 'Objective measurement of active speech level'), a classical method of rms calculation was employed in this study. The main reason for using the classical method of rms calculation was to compare the obtained results with the outcomes of the previous studies (Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000). Furthermore, if the ITU standard had been used, it would have been impossible to apply the Polish sentence material to standard clinical audiometers.

[3] Equation (3) can also be used for a single sentence. According to this equation, a large S50 of a sentence implies a comparable intelligibility of the words constituting the sentence.

[4] Our preliminary data show that, for the developed speech material, the approximate slope of psychometric functions for speech-shaped and CCITT maskers is 22%/dB and 14%/dB, respectively (Ozimek et al, 2006b). The following SRTs have been obtained: speech-shaped white noise -7 dB and CCITT -16 dB, respectively. Therefore, the spectral and temporal properties of the masking signal have a direct influence on S50 and SRT values. Moreover, sensations produced by different maskers could influence the perception of speech under noisy conditions and, consequently, the parameters of the psychometric functions. An analysis of the values of perceptual parameters such as loudness, sharpness, and roughness of the maskers used in different studies has shown that they are not equal. Definitions and significance of these parameters are given in Zwicker & Fastl (1999). Estimation of these parameters was performed using the ArtemiS v.4.00.300 software (HEAD acoustics GmbH) and was followed by normalization of the masker's level to 70 dB SPL. It has turned out that the masker used in the present investigation is characterized by higher loudness (G = 37.2 sone) in comparison with G = 31.8 sone, determined for the masker used by Versfeld et al (2000), and G = 33.9 sone, determined for the masker used by Kollmeier & Wesselkamp (1997). The so-called sharpness of the masker in the present study was S = 3.7 acum, while that in the study by Versfeld et al (2000) was S = 3.0 acum, and in the study by Kollmeier & Wesselkamp (1997) it was S = 2.7 acum. The roughness of the masker in the present study was R = 3.3 asper, while that in the paper by Versfeld et al (2000) was R = 2.8 asper, and in Kollmeier & Wesselkamp's study (1997), R = 3.2 asper.

Conclusions

This study allows us to draw the following conclusions:

• The mean SRT and mean S50list values estimated for the Polish sentence tests using the sentence scoring method were -6.1 dB and 25.5%/dB, respectively, while those estimated using the word scoring method were -7.5 dB and 20.8%/dB. Thus, SRT and S50 values resulting from sentence scoring are different from those for word scoring.
• The slope of the psychometric functions for the Polish sentence test optimized for sentence scoring is slightly higher than that for other languages. The difference may be related to the type of masker used and the linguistic structure of the speech material in various languages.
• The Polish sentence lists are reliable and accurate, and yield repeatable speech intelligibility data when speech is presented against an interfering noise.

References

Bosman, A.J. & Smoorenburg, G.F. 1995. Intelligibility of Dutch CVC syllables and sentences for listeners with normal hearing and with three types of hearing impairment. Audiology, 34, 260–284.
Brachmański, S. & Staroniewicz, P. 1999.
Fonetyczna struktura materiału testowego stosowanego w subiektywnych pomiarach jakości mowy (in Polish) (Phonetic structure of test material for subjective assessments of speech quality). Speech and Language Technology, Poznań, 3, 71–80.
Brand, T. & Kollmeier, B. 2002. Efficient adaptive procedures for threshold and concurrent slope estimates for psychophysics and speech intelligibility tests. J Acoust Soc Am, 111, 2801–2810.
Bronkhorst, A.W., Brand, T. & Wagener, K. 2002. Evaluation of context effects in sentence recognition. J Acoust Soc Am, 111, 2874–2886.
Grocholewski, S. 2001. Statystyczne podstawy systemu ARM dla języka polskiego (in Polish) (Statistical description of automatic speech recognition system for Polish language). Poznań: Wydawnictwo Politechniki Poznańskiej.
Hagerman, B. 1982. Sentences for testing speech intelligibility in noise. Scand Audiol, 11, 79–87.
ITU. 1994. Objective measurement of active speech level. Standard P.56.
Jassem, W. 1973. Podstawy fonetyki akustycznej (in Polish) (Foundations of Acoustical Phonetics). Warszawa: PWN.
Jastreboff, P.J. 1990. Phantom auditory perception (tinnitus): mechanisms of generation and perception. Neurosci Res (NY), 8, 221–254.
Kalikow, D.N., Stevens, K.N. & Elliot, L.L. 1977. Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J Acoust Soc Am, 61, 1337–1351.
Kohlrausch, A., Fassel, R., van der Heijden, M., Kortekaas, R., van de Par, S., et al. 1997. Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations. Acustica/Acta Acustica, 83, 659–669.
Kollmeier, B. 1990. Messmethodik, Modellierung und Verbesserung der Verständlichkeit von Sprache (in German) (Methodology, modelling and improvement of speech intelligibility measurements). Habilitation thesis, Göttingen: University of Göttingen.
Kollmeier, B. & Wesselkamp, M. 1997. Development and evaluation of a sentence test for objective and subjective speech intelligibility assessment. J Acoust Soc Am, 102(4), 1085–1099.
Kosiel, U. 1972. Analiza statystyczna cech indywidualnych głosu w średnim widmie mowy polskiej (in Polish) (Statistical analysis of individual voice features on basis of average spectrum of Polish speech). PhD Dissertation, Warszawa: IPPT.
Martin, M. 1997. Speech Audiometry (2nd ed.). Wiley Publishers.
Nilsson, M., Soli, S.D. & Sullivan, J.A. 1994. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am, 95, 1085–1099.
Ozimek, E., Wicher, A., Szyfter, W. & Szymiec, E. 2006a. Distortion product otoacoustic emission (DPOAE) in tinnitus patients. J Acoust Soc Am, 119, 527–538.
Ozimek, E., Kutzner, D., Sęk, A. & Wicher, A. 2006b. The effect of the type of the interfering noise on the slope and SRT value of the intelligibility functions of speech test. XXVIII International Congress of Audiology, Innsbruck, 3–7 September, LO 63, 50.
Ozimek, E., Kutzner, D., Sęk, A. & Wicher, A. 2007. Polish sentence test for speech intelligibility measurements in masking conditions. 19th International Congress on Acoustics, Madrid, 2–7 September, CAS03-013, p. 148.
Ozimek, E., Kutzner, D., Sęk, A. & Wicher, A. 2008. Development and evaluation of Polish digit triplet test for auditory screening. Speech Comm, 51, 307–316.
Plomp, R. & Mimpen, A.M. 1979. Improving the reliability of testing the speech reception threshold for sentences. Audiology, 18, 43–53.
Pruszewicz, A., Demenko, G., Richter, L. & Wika, T. 1994a. Nowe listy artykulacyjne do badań audiometrycznych. Cz. 1 (in Polish) (New articulation lists for speech audiometry, Part 1). Otolaryngol Pol, 48, 50–55.
Pruszewicz, A., Demenko, G., Richter, L. & Wika, T. 1994b. Nowe listy artykulacyjne do badań audiometrycznych. Cz. 2 (in Polish) (New articulation lists for speech audiometry, Part 2). Otolaryngol Pol, 48, 56–62.
Runge, C.A. & Hosford-Dunn, H. 1985. Word recognition performance with modified CID W-22 word lists. J Speech Hear Res, 28(3), 355–362.
Smits, C. & Houtgast, T. 2006. Measurements and calculations on the simple up-down adaptive procedure for speech-in-noise tests. J Acoust Soc Am, 120(3), 1608–1621.
Smits, C., Kapteyn, T. & Houtgast, T. 2004. Development and validation of an automatic speech-in-noise screening test by telephone. Int J Audiol, 43, 15–28.
Smoorenburg, G.F. 1992. Speech reception in quiet and in noisy conditions by individuals with noise-induced hearing loss in relation to their tone audiogram. J Acoust Soc Am, 91(1), 421–437.
Studebaker, G.A., Sherbecoe, R.L., McDaniel, D.M. & Gwaltney, C.A. 1999. Monosyllabic word recognition at higher-than-normal speech and noise levels. J Acoust Soc Am, 105, 2431–2444.
Tarnoczy, T. 1971. Das durchschnittliche Energie-Spektrum der Sprache (für sechs Sprachen) (in German) (A long-term spectrum of speech (for six languages)). Acustica, 24, 46–74.
Versfeld, N.J., Daalder, L., Festen, J.M. & Houtgast, T. 2000. Method for the selection of sentence material for efficient measurement of the speech reception threshold. J Acoust Soc Am, 107, 1671–1684.
Wagener, K., Brand, T. & Kollmeier, B. 1999a. Development and evaluation of a German sentence test II: Optimization of the Oldenburg sentence test (in German). Z Audiol, 38, 44–56.
Wagener, K., Brand, T. & Kollmeier, B. 1999b. Development and evaluation of a German sentence test I: Design of the Oldenburg sentence test (in German). Z Audiol, 38, 4–15.
Wagener, K., Brand, T. & Kollmeier, B. 1999c. Development and evaluation of a German sentence test III: Evaluation of the Oldenburg sentence test (in German). Z Audiol, 38, 86–95.
Wagener, K., Eenbohm, F., Brand, T. & Kollmeier, B. 2005. Ziffern-Tripel-Test: Sprachverständlichkeitstest über das Telefon (in German) (Digit triplets test for speech intelligibility measurements via telephone). Z Audiol, Suppl. 8, CD-ROM.
Wagener, K., Josvassen, J.L. & Ardenkjaer, R. 2003. Design, optimization, and evaluation of a Danish sentence test in noise. Int J Audiol, 42(1), 10–17.
Welge-Lüssen, A., Hauser, R., Erdmann, J., Schwob, C. & Probst, R. 1997. Speech audiometry with logatomes. HNO-Universitätsklinik, Kantonsspital Basel, Switzerland.
Wilson, R.H., Burks, A.B. & Weakley, G.W. 2005. A comparison of word-recognition abilities assessed with digit pairs and digit triplets in multitalker babble. J Rehabil Res Dev, 42(4), 499–510.
Wilson, R.H., McArdle, R.A. & Smith, S.J. 2007. An evaluation of the BKB-SIN, HINT, QuickSIN, and WIN materials on listeners with normal hearing and listeners with hearing loss. J Speech Lang Hear Res, 50, 844–856.
Wypych, M., Demenko, G. & Baranowska, E. 2003. Grapheme-to-phoneme transcription algorithm based on the SAMPA alphabet extension for the Polish language. International Congress of Phonetic Science, Barcelona.
Zwicker, E. & Fastl, H. 1999. Psychoacoustics: Facts and Models. New York: Springer.