International Journal of Audiology Polish sentence tests for

This article was downloaded by: [International Society of Audiology]
On: 18 January 2010
Access details: [subscription number 789957306]
Publisher Informa Healthcare
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
International Journal of Audiology
Publication details, including instructions for authors and subscription information:
http://www.informaworld.com/smpp/title~content=t713721994
Polish sentence tests for measuring the intelligibility of speech in
interfering noise
Edward Ozimek a; Dariusz Kutzner a; Aleksander Sęk a; Andrzej Wicher a
a
Institute of Acoustics, A. Mickiewicz University, Poznań, Poland
To cite this Article: Ozimek, Edward, Kutzner, Dariusz, Sęk, Aleksander and Wicher, Andrzej (2009) 'Polish sentence tests
for measuring the intelligibility of speech in interfering noise', International Journal of Audiology, 48: 7, 433-443
To link to this Article: DOI: 10.1080/14992020902725521
URL: http://dx.doi.org/10.1080/14992020902725521
Original Article
International Journal of Audiology 2009; 48:433-443
Edward Ozimek
Dariusz Kutzner
Aleksander Sęk
Andrzej Wicher
Institute of Acoustics, A. Mickiewicz
University, Poznań, Poland
Key Words
Speech intelligibility
Sentence test
Psychometric function
Speech reception threshold
Abbreviations
CCITT: International Telegraph
and Telephone Consultative
Committee
SNR: Signal-to-noise ratio
SRT: Speech reception threshold
Polish sentence tests for measuring the
intelligibility of speech in interfering noise
Abstract
Sumario
The aim of this study was to develop Polish sentence tests
for accurate measuring of speech intelligibility in masking
interfering noise. Two sets of sentence lists have been
developed. The first set was composed of 25 lists and was
used for sentence intelligibility scoring. The second set
was composed of 22 lists and was used for word
intelligibility scoring. The lists in each set have been
phonemically and statistically balanced. The speech
reception threshold (SRT) and slope of the psychometric
function at the SRT point (S50) were determined in
normal-hearing subjects. It was found that the mean
SRT and mean list-specific S50list for the first set were
equal to -6.1 dB and 25.5%/dB, respectively. The mean
SRT and the mean list-specific S50list for the second set
were -7.5 dB and 20.8%/dB. Due to a relatively steep
slope of the psychometric functions, the Polish sentence
tests were shown to be accurate materials for speech
intelligibility measurements against interfering noise.
They are the first sentence speech-in-noise tests developed
for Slavonic languages.
El objetivo de este estudio fue desarrollar pruebas de
frases en Polaco para medir con exactitud la inteligibilidad del lenguaje en medio de ruido de balbuceo
enmascarante. Dos grupos de listas de frases han sido
desarrolladas. El primer grupo se compuso de 25 listas y
se usó para puntaje de inteligibilidad de frases. El
segundo grupo se compuso de 22 listas y se usó para
puntaje de inteligibilidad de palabras. Las listas en cada
grupo fueron balanceadas fonémica y estadísticamente.
El umbral de recepción del lenguaje (SRT) y la pendiente
de la función psicométrica en el punto del SRT (S50) se
determinaron en individuos normo-oyentes. Se encontró
que el SRT medio y la lista específica S50list media para el
primer grupo eran iguales a -6.1 dB y 25.5%/dB,
respectivamente. El SRT medio y la lista específica
S50list media para el segundo grupo fueron -7.5 dB y
20.8%/dB. Debido a la pendiente relativamente empinada
de las funciones psicométricas, se vio que las pruebas de
frases en Polaco eran materiales exactos para medidas de
inteligibilidad del lenguaje contra el ruido de interferencia. Ellas constituyen la primera prueba de frases en
ruido desarrolladas para las lenguas eslávicas.
Many methods have been proposed for measuring speech
intelligibility in quiet and against an interfering noise (Kalikow
et al, 1977; Plomp & Mimpen, 1979; Hagerman, 1982; Nilsson
et al, 1994; Pruszewicz et al, 1994a,b; Kollmeier & Wesselkamp,
1997; Brachmański & Staroniewicz, 1999; Versfeld et al, 2000;
Wagener et al, 2003; Smits et al, 2004). The methods differ in such
aspects as the structure of the speech material, details of the test
procedure, presentation level, range of the signal-to-noise ratio,
type of interfering noise, and presentation mode. As far as the
structure of the speech material is concerned, one can distinguish two basic types of tests: sentence intelligibility tests
(Plomp & Mimpen, 1979; Nilsson et al, 1994; Kollmeier &
Wesselkamp, 1997; Versfeld et al, 2000), and word intelligibility tests (Runge & Hosford-Dunn, 1985; Pruszewicz et al,
1994a,b; Bosman & Smoorenburg, 1995; Martin, 1997). The
sentence tests can be divided into those using meaningful,
everyday utterances (Plomp & Mimpen, 1979; Smoorenburg,
1992; Nilsson et al, 1994; Kollmeier & Wesselkamp, 1997; Versfeld
et al, 2000; Ozimek et al, 2007), and those using semantically unpredictable sentences (Hagerman, 1982; Wagener et al,
1999a,b,c; Wagener et al, 2003). Many sentence tests have been
developed so far for measuring the speech reception threshold
(SRT) in noise, defined as the signal-to-noise ratio (SNR)
expressed in dB that yields 50% speech intelligibility. They are
based on utterances taken from everyday speech communication
(Kalikow et al, 1977; Plomp & Mimpen, 1979; Nilsson et al, 1994;
Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000). Some
papers have shown that the speech intelligibility score is mainly
determined by the SNR (Smoorenburg, 1992), but according to
others, the intelligibility depends both on the SNR and on the
presentation level (Hagerman, 1982; Studebaker et al, 1999) as
well as on the speaker (Versfeld et al, 2000).
The second important type of speech intelligibility test is based on word
material. Speech materials used in word intelligibility assessments
include logatomes, i.e. nonsense words (Welge-Lüssen et al, 1997;
Brachmański & Staroniewicz, 1999), numbers, and monosyllabic
words (Pruszewicz et al, 1994a,b). Another test category comprises tests based on sequences of digit pairs and digit triplets
(Smits et al, 2004; Wagener et al, 2005; Wilson et al, 2005; Ozimek
et al, 2009). Some authors suggest that sentence intelligibility tests are,
in many cases, a more accurate measure of speech intelligibility than
word intelligibility tests (Plomp & Mimpen, 1979; Hagerman, 1982; Nilsson et al,
1994; Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000;
Wagener et al, 2003). Sentence tests usually produce steeper psychometric
functions than tests based on single words (Bosman & Smoorenburg, 1995). Furthermore, word intelligibility tests are less suited
for use with a fluctuating noise masker due to listening in dips
(which might improve subjects’ performance), or signal processing algorithms with long time constants (such as compression).
An interesting comparison of the data obtained with word and
sentence intelligibility tests, applied to two groups of subjects
(with normal hearing and with sensorineural hearing loss), is given
in Wilson et al (2007).
ISSN 1499-2027 print/ISSN 1708-8186 online
DOI: 10.1080/14992020902725521
© 2009 British Society of Audiology, International
Society of Audiology, and Nordic Audiological Society
Edward Ozimek
Institute of Acoustics, A. Mickiewicz University, Ul.
Umultowska 85, 61-614 Poznań, Poland.
Email: ozimaku@amu.edu.pl
Received: March 1, 2008
Accepted: January 5, 2009
Differences between recognition of a whole sentence and
identification of single, isolated words are usually related to context.
Everyday utterances contain a lot of contextual information that
helps the listener to deduce unintelligible words in the presented
sentence. However, there is no general agreement among authors as to
whether this cognitive factor increases or decreases the applicability
and reliability of the tests. On the one hand, the tests are designed
to reflect the natural communication process, of which context is a part.
In general, tests including context information enable more
accurate speech intelligibility measurement than semantically
unpredictable sentences (Brand & Kollmeier, 2002). On the other
hand, cognitive abilities might not be constant across subjects,
which could introduce additional variability in the data (Bronkhorst et al,
2002); this is not the case for tests based on semantically
unpredictable sentences.
The present study deals with the preparation and evaluation
of Polish sentence tests to be used for accurate measurement of
the SRT in noise. The tests are similar to the Dutch sentence
tests (Plomp & Mimpen, 1979; Versfeld et al, 2000), the
American sentence test (Nilsson et al, 1994), and the German
sentence test (Kollmeier & Wesselkamp, 1997). Similarity means
that all these tests comprise semantically neutral ‘everyday’
utterances of unfixed grammar structure. At present, there are
only a few Polish tests for the measurement of speech intelligibility. One word test developed by Pruszewicz et al (1994a,b)
consists of 10 word lists. Each list contains 20 words representing
the most frequent monosyllabic Polish nouns. Another Polish
speech test, the so-called Corpora (Grocholewski, 2001) database consists of a collection of short sentences (114) and
numbers (20 in each set), pronounced by about 70 different
speakers. It has been used mainly in the study of automatic
speech recognition. There is also a Polish test based on a
collection of 20 logatome sets, each set composed of three lists
and each list containing 100 logatomes (Brachmański &
Staroniewicz, 1999). This test has been mainly used for assessment of the quality of electroacoustic and transmission systems.
Despite the fact that Polish is used by a population of about
50 million people, there is no Polish sentence test optimized for
measuring speech intelligibility against a background noise. This
fact prompted us to develop such a test, designed for adults.
Speech materials used for reliable measurements of intelligibility
should produce very steep psychometric functions to permit the
detection of changes in intelligibility resulting from small
differences in signal-to-noise ratio. Moreover, to keep a high
accuracy of the measurement, the intelligibility across different
lists should not vary significantly. Therefore, the test lists should
yield statistically equivalent results, i.e. the SRT and the slope of the
psychometric function at the SRT point (S50; see formula (2))
should not vary significantly across the lists chosen for speech
intelligibility measurement.
Development of speech material
Preparation and recording of the sentence material
INITIAL SELECTION OF WRITTEN SENTENCES
The test was prepared in two stages. In the first stage, about 3500
sentences were selected automatically from a large digitized
database containing about 16 million written Polish sentences
taken from everyday speech, literature, TV, and theatre. All of
them fulfilled the definition of a sentence, i.e. they included a
subject, an object, and a verb, and contained normal everyday
contexts. The following criteria were used in the automatic
selection of the sentences (Versfeld et al, 2000): the total number
of syllables in a sentence should be eight or nine; the words in the
sentences should not contain more than three syllables; and the
sentences should not contain punctuation characters or capitals
(excluding the initial capital). No duplicate sentences were
selected. The second stage of sentence selection was conducted
manually on the basis of the following criteria: the sentence should
be grammatically and syntactically correct and semantically
neutral; namely, political, war, or gender topics were excluded.
Questions, proverbs, proper names, and exclamations were
eliminated. This process reduced the set of sentences to 2000.
In the Polish language, unlike in English, all verbs are subject
to declension with respect to the speaker’s gender. The phonemes
distinguishing the present and the past tense are usually
characterized by a relatively low energy. The low energy of these
grammatically important morphemes might result in more errors
and, consequently, increased scatter of the data both within a
given sentence list and between lists. Certain verbs that might
lead to ambiguities during the listening sessions and data
analysis were therefore rejected. This limited the final set of sentences to
1200. The sentences chosen were composed of three to seven
words (average 4.6 words per sentence) or eight to nine syllables,
as suggested by Plomp & Mimpen (1979) and Versfeld et al
(2000). The minimal and maximal number of morphemes across
the sentences was six and nine, respectively (average 7.2
morphemes per sentence).
To see whether the phoneme distribution in the 1200 sentences
is the same as the average phoneme distribution of the Polish
language (Jassem, 1973), the chi-square test was performed. It
showed that the two distributions matched nearly perfectly
(χ² = 14.81, d.f. = 38, p > 0.99). Thus, the 1200-sentence set
could be regarded as phonemically balanced with respect to
average Polish speech.
RECORDING OF SPEECH MATERIAL
The sentence set was read out in a radio studio by a professional
male speaker. The speaker’s age was 27 and he was a Polish radio
announcer. He was asked to pronounce the sentences in a
natural way. Therefore, in many sentences the level of the last
word was slightly lower, as a result of the decreased vocal effort
with which the speaker signalled the end of the utterance. Recording was
performed using a Neumann U87 capacitor microphone. The
microphone output fed one of the input channels of a Yamaha
02R mixer. In the mixer, the microphone signal was preamplified and converted into the digital domain at a sampling
rate of 44.1 kHz and with a resolution of 24 bits. It was also
digitally high-pass filtered at a cut-off frequency of 80 Hz. The
signals were then sent via an optical connection (ADAT-type) to
a PC and stored on a computer hard disk using Samplitude Pro
v.8.2 software. To prepare a set of 1200 sentences, four recording
sessions were necessary, each lasting no longer than two hours.
Masking noise generation
During measurements, the test sentences were mixed digitally
with a speech babble noise (masker) with a constant level of 70
dB SPL. The babble noise was generated by summing all of
the sentences tested; its rms value was then adjusted so that a level
of 70 dB SPL was obtained at the output of the Sennheiser HD 580
headphones. Waveforms representing single sentences were shifted
with respect to each other in the time domain and, additionally,
some of them were reversed in time. The sentences
to be shifted, as well as those to be reversed, were
selected at random according to a uniform probability distribution, and the time shift applied was also random. This
resulted in a stationary noise whose power
spectrum was very similar to the spectra of the presented
sentences. In this way, a 15-s sample of the speech babble noise
was obtained. In the listening sessions, sentences (the
longest of which lasted 2.3 s) were presented against a
randomly chosen portion of the babble noise.
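The construction of the babble noise can be illustrated with a short sketch. It is not the authors' implementation: the function name `make_babble`, the synthetic stand-in "sentences", the 50% reversal probability, and the wrap-around of shifted waveforms at the buffer end are all assumptions made for the example. The source states only that shifts and reversals were chosen at random from a uniform distribution.

```python
import math
import random

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def make_babble(sentences, n_out, target_rms, rng):
    """Sum randomly shifted (and sometimes time-reversed) sentences into one buffer.

    sentences  : list of sample lists (stand-ins for the recorded waveforms)
    n_out      : length of the noise buffer in samples
    target_rms : rms value the finished noise is scaled to
    """
    noise = [0.0] * n_out
    for s in sentences:
        # Assumption: each sentence is reversed with probability 0.5.
        wave = s[::-1] if rng.random() < 0.5 else s
        # Uniformly distributed time shift; samples wrap around (assumption).
        shift = rng.randrange(n_out)
        for i, v in enumerate(wave):
            noise[(shift + i) % n_out] += v
    gain = target_rms / rms(noise)
    return [v * gain for v in noise]

rng = random.Random(0)
# Synthetic 'sentences': short bursts of Gaussian samples, for illustration only.
fake_sentences = [[rng.gauss(0.0, 1.0) for _ in range(400)] for _ in range(50)]
babble = make_babble(fake_sentences, n_out=1000, target_rms=0.1, rng=rng)
```

In a real setting the target rms would be the digital value that calibration maps to 70 dB SPL at the headphone output; that mapping depends on the playback chain and is not modelled here.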
The babble noise seems to be a much more effective masker
than, for example, International Telegraph and Telephone
Consultative Committee (CCITT) noise reflecting spectral
properties of English, German, Hungarian, Swedish, Russian,
and Italian languages (Tarnoczy, 1971). First of all, the CCITT
noise is characterized by a constant spectrum level up to 500 Hz
and decreases by 12 dB/octave above that frequency. However,
the long-term average spectrum (LTASS) of the Polish spoken
language is markedly different from that of the CCITT,
especially in a high-frequency region (Kosiel, 1972). The reason
is that the Polish language is a sort of ‘consonant’ language: it
contains six vowels only (plus two diphthongs) and thirty-one
consonants, especially fricatives (Jassem, 1973). The relatively
large number of consonants brings about a marked increase in
the spectrum level, especially above 5 kHz, where it reaches a
higher value than that for the English, Dutch, or German
languages. Moreover, using the noise of the power spectrum that
precisely matches the power spectra of speech signals yields
steeper psychometric functions than those obtained for less
‘spectrally fitted’ maskers (Wagener et al, 1999a,b,c, 2003).
The power spectral density of the babble noise, obtained by
means of a fast Fourier transform (frequency resolution of 40
Hz), is depicted in Figure 1 (solid line). The spectrum reveals a
specific minimum below 5 kHz and some enhancement above 5
kHz, reflecting the energy of consonants (mainly fricatives)
typical of the Polish language. As can be seen from Figure 1, the
power spectral density of the babble noise is somewhat different
from those used by Versfeld et al (2000) and by Kollmeier &
Wesselkamp (1997), which are also shown in the figure
for comparison. However, the basic statistical properties of
the babble noise were very similar to those of the maskers used by Kollmeier &
Wesselkamp (1997) and Versfeld et al (2000)(1). The differences
across the spectra presented mainly reflect individual properties
of the German, Dutch, and Polish languages as well as the
individual speakers’ voice features.
Figure 1. The long-term power spectrum of the babble noise
(solid line) used in the present study. For comparison, the
spectra of the maskers used in the studies by Versfeld et al
(2000) (dashed line) and by Kollmeier & Wesselkamp (1997)
(dotted line) are presented.
Listening sessions
APPARATUS
A computer-controlled Tucker-Davis Technologies (TDT) System 3 with a 24-bit digital real-time signal processor RP2, and a
headphone amplifier HB7 was used to play back the sentence
material with interfering noise at different SNRs. The level
(in dB SPL) of the output signal was calibrated with B&K
instruments (artificial ear type 4153, connected with microphone
type 4143, preamplifier type 2669, and amplifier type 2610).
Speech signals were presented monaurally via the Sennheiser
HD 580 headphones. Experimental sessions were controlled by
experiment-specific software implemented in Matlab 6.5 (MathWorks).
PROCEDURE
The sentence sound pressure level was changed to obtain
different SNRs. The SNR was defined as the ratio of the
sentence rms to the interfering noise rms. The interfering noise
was a gated signal (20-ms ramps) and started 300 ms before the
onset of the sentence, and ended 300 ms after the end of the
sentence. Thus, the duration of the masking signal was
determined by the duration of a masked utterance. All sentences
were presented at three basic SNRs: -9 dB, -5 dB, and -1
dB. These three SNRs were chosen as the levels that optimally
encompassed the average SRT value, determined on the basis of
the pilot study. However, to obtain a more reliable fit of the
psychometric function to intelligibility data, two additional
SNRs were chosen individually with respect to each sentence.
One of the additional SNRs was chosen from the range -1 to
-5 dB, while the other was from the range -5 to -9 dB. Exact
values of the additional SNRs depended on the highest absolute
values of the second derivative of the psychometric function.
Intelligibility scores obtained for the five SNRs allowed a much
better fit of the psychometric function to each sentence.
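The level setting described in this procedure can be sketched as follows. The sketch assumes that the SNR in dB is 20·log10 of the rms ratio, that the noise rms is taken over the gated segment before ramping, and that the 20-ms ramps are linear; the helper name `mix_at_snr` and the toy signals are hypothetical and only serve the illustration.

```python
import math
import random

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(sentence, noise, snr_db, fs, lead_s=0.3, ramp_s=0.02):
    """Scale the sentence to the requested SNR and embed it in gated noise.

    The noise level is left unchanged; only the sentence gain is altered.
    `noise` must hold at least len(sentence) + 2 * lead samples.
    """
    lead = int(round(lead_s * fs))   # noise-only time before and after the sentence
    ramp = int(round(ramp_s * fs))   # onset/offset ramp length in samples
    total = len(sentence) + 2 * lead
    masker = list(noise[:total])
    # Gain for which 20*log10(rms_sentence / rms_noise) equals snr_db.
    gain = rms(masker) / rms(sentence) * 10.0 ** (snr_db / 20.0)
    # Linear onset and offset ramps on the masker (ramp shape is an assumption).
    for i in range(ramp):
        w = i / ramp
        masker[i] *= w
        masker[total - 1 - i] *= w
    mixed = masker[:]
    for i, v in enumerate(sentence):
        mixed[lead + i] += gain * v
    return mixed, gain

rng = random.Random(1)
fs = 1000  # Hz; kept low so the toy example stays small
sentence = [rng.gauss(0.0, 1.0) for _ in range(500)]
noise = [rng.gauss(0.0, 0.5) for _ in range(2000)]
mixed, gain = mix_at_snr(sentence, noise, snr_db=-5.0, fs=fs)
```

Because the masker length follows the sentence length, longer sentences automatically receive a longer noise segment, as described in the text.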
The listening session was self-paced and was controlled by the
subject, who was placed in an acoustically isolated room. All
instructions were displayed on an LCD screen. When the
ENTER key was pressed for the first time, written instructions
appeared on the monitor. Pressing this key for the second time
caused the playback of one sentence via the TDT 3 system as
described above. The subject’s task was to repeat the presented
sentence as accurately as possible, and his/her response was
recorded on the computer hard disk. Both oral and written
responses were registered. The recording of the subjects’ oral
answers was carried out by means of a separate signal channel (a
Sennheiser E914 condenser microphone, a Yamaha MG10/2
mixer with preamplifier, and an external soundcard for a PC).
By pressing the ENTER key again, the recording of the subject’s
answer was stopped, and the subject’s task was to type the
sentence heard. To start listening to a new sentence, the subject
had to press the ENTER key again. It should be emphasized
that subjects were instructed to provide an answer even if they
had understood only one word in a sentence. The double
recording of the subject’s answer, i.e. collecting both oral and
‘typed’ responses, was aimed at minimizing potential ambiguities
that might result from, for example, typing mistakes. In case of
any ambiguities or typing errors in the ‘typed’ response, the
subject’s recorded oral response was considered.
Each sentence from the initial set of 1200 was presented to
each subject once. This means that each subject evaluated each
sentence for one value of SNR only. To determine intelligibility
function for each sentence, five different SNRs were necessary.
Therefore, 35 subjects were divided into seven virtual groups and
each group evaluated one out of five different SNRs. The virtual
groups were created for each sentence. The goal behind this
approach was to avoid a situation where subjects from one group
were systematically presented with one SNR only. Each SNR was
presented to each subject 240 times (240 × 5 = 1200).
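One simple way to obtain such a balanced design is a rotation of the SNR condition over subjects and sentences. The sketch below is an assumption about how the balance could be achieved, not the scheme actually used by the authors; the rule `(subject + sentence) % 5` and all names are hypothetical. SNR conditions are referred to by index (0 to 4) because two of the five SNRs were chosen individually per sentence.

```python
from collections import Counter

N_SUBJECTS = 35
N_SENTENCES = 1200
N_SNRS = 5  # three basic SNRs plus two sentence-specific ones

def snr_index(subject, sentence):
    """Rotate the SNR condition with both the subject and the sentence index."""
    return (subject + sentence) % N_SNRS

# How often each subject hears each SNR condition.
per_subject = [
    Counter(snr_index(s, k) for k in range(N_SENTENCES)) for s in range(N_SUBJECTS)
]
# How many subjects evaluate each sentence at each SNR condition.
per_sentence = [
    Counter(snr_index(s, k) for s in range(N_SUBJECTS)) for k in range(N_SENTENCES)
]
```

With this rotation every subject meets each SNR condition 240 times, and under the assumption of equal group sizes every sentence is evaluated by 35 / 5 = 7 subjects at each SNR, so no subject is tied to a single SNR.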
SUBJECTS
Thirty-five subjects (18 male and 17 female, aged from 18 to 25
years, mean age 22.7) participated in the experiment. The
subjects had hearing levels (HL) of less than 10 dB HL at all
audiometric frequencies up to 8 kHz and had no history of
hearing disorders. All of them were native, monolingual Polish
speakers. Prior to the actual measurements, each subject had
been trained for one to two hours to become familiar with the
task. This training applied also to all subjects who participated
in the next experiments. The additional sentences that had been
employed in the training sessions were not used in the main
experiment. Collection of responses for all 1200 sentences for
one subject took about eight hours, but the duration of a single
measurement session was limited to two hours. The subjects were
allowed to have a rest whenever they wished to. The subjects
were paid for their participation in the measurements.
Intelligibility functions
INTELLIGIBILITY SCORING
Intelligibility scores for each sentence were computed in two
ways. In the first method, the subject’s response was considered
correct when all the words constituting the presented sentence
and their order were repeated correctly. In this case the score was
100%. Otherwise, the score was set to 0% (Versfeld et al, 2000).
This type of scoring is hereafter called ‘sentence scoring’. Since
sentence scoring produces data of a binary nature, this set is
aimed at an adaptive up/down procedure that is commonly used
in laboratory and clinical measurements.
The second way of scoring was based on the correct repetition
of individual words in a sentence. The intelligibility score in this
case was estimated as the ratio of the number of correctly repeated
words to the overall number of presented words, multiplied by
100% (Kollmeier & Wesselkamp, 1997; Wagener et al, 1999a,b,c;
Wagener, 2003). Missed words were treated as incorrectly
repeated words. Moreover, since in Polish it is often the case
that a change of the verb position does not affect the sentence
meaning, the word order in subjects’ responses was not taken
into account, i.e. if a word was repeated correctly but at an
altered sentence position, it was scored as a correct answer. This
approach is called the ‘word scoring’ method.
In order to determine the intelligibility score for a given
sentence and SNR, intelligibility data were pooled across subjects
and the proportion of correctly repeated sentences (sentence scoring)
or of correctly repeated words (word scoring) was computed.
Determination of psychometric functions and their parameters
The psychometric (intelligibility) function, i.e. the function
which links the probability of a correct response to the SNR, is
usually approximated by the standardized cumulative normal
distribution (Versfeld et al, 2000; Smits et al, 2004). It is
generally assumed that the relationship between speech intelligibility and the SNR is expressed by the function F(SNR):
$$F(SNR) = \frac{100}{\sqrt{2\pi}} \int_{-\infty}^{\frac{SNR - SRT}{\sigma}} e^{-t^{2}/2}\, dt \qquad (1)$$
The cumulative normal distribution function, F(SNR), is
characterized by two parameters: SRT (i.e. the signal-to-noise
ratio that produces 50% correct responses) and the standard
deviation, σ, characterizing the spread of data at the SRT point.
The standard deviation, σ, determines the S50 parameter, i.e. the
slope of the cumulative distribution at the SRT point, which can
be expressed by the following equation:
$$S50 = \frac{100}{\sigma \sqrt{2\pi}} \qquad (2)$$
Having five data points describing the probability of a correct
response at five SNRs for each sentence, it was possible to apply
maximum likelihood (ML) estimation to find the SRT and the corresponding slope S50. The iterative procedure searched for the SRT
and S50 by minimizing the negative logarithm of the
likelihood ratio (Versfeld et al, 2000; Wagener et al, 2003).
The slope S50 of the psychometric function for a given sentence might be changed by a modification of the intelligibility of
particular words in this sentence (Kollmeier & Wesselkamp,
1997). For instance, the slope increases and, consequently, the
corresponding standard deviation decreases, if the intelligibility of the respective words is equalized. Hence, S50 indirectly
describes the spread of intelligibilities of the individual words
constituting the sentence. However, in this work, the accuracy
of the speech tests was obtained by means of a selection of sentences characterized by relatively steep psychometric functions
(Versfeld et al, 2000; Smits et al, 2004).
For each sentence, two psychometric functions were fitted to
results pooled across subjects: one for sentence scoring and one
for word scoring. Figure 2 presents an example of psychometric
functions fitted to intelligibility scores for two chosen sentences
(left and right panel, respectively). Circles and solid lines present
intelligibility data and the corresponding psychometric functions
based on sentence scoring. Squares and dashed lines depict
intelligibility data and the fitted psychometric functions for the
word scoring method.
Figure 2. Intelligibility data (circles for sentence scoring and squares for word scoring) and fitted psychometric functions (solid line
for sentence scoring and dashed line for word scoring) for two example sentences (left panel: Rzadko chodzą do teatru: They
rarely go to a theatre; right panel: Wiele dzieci tam się bawi: Many children are playing there). Both panels plot speech intelligibility [%] against SNR [dB].
Fitted values given in the left panel: sentence scoring SRT = -6.7 dB, S50 = 24.9 %/dB; word scoring SRT = -8.2 dB, S50 = 17.4 %/dB.
Fitted values given in the right panel: sentence scoring SRT = -5 dB, S50 = 22.2 %/dB; word scoring SRT = -6.9 dB, S50 = 15.4 %/dB.
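The fitting step can be illustrated with a small sketch of equations (1) and (2) and a maximum-likelihood fit. Several things here are assumptions rather than the authors' procedure: the binomial likelihood, the coarse-to-fine grid search (used in place of the unspecified iterative minimizer), the search ranges, and the made-up response counts. The psychometric function is expressed as a proportion (0 to 1) instead of a percentage.

```python
import math

def psychometric(snr, srt, sigma):
    """Equation (1) as a proportion: cumulative normal distribution in SNR."""
    return 0.5 * (1.0 + math.erf((snr - srt) / (sigma * math.sqrt(2.0))))

def slope_s50(sigma):
    """Equation (2): slope at the SRT point in %/dB."""
    return 100.0 / (sigma * math.sqrt(2.0 * math.pi))

def neg_log_likelihood(srt, sigma, snrs, n_correct, n_trials):
    """Binomial negative log-likelihood of the pooled responses (assumed model)."""
    nll = 0.0
    for snr, k, n in zip(snrs, n_correct, n_trials):
        p = min(max(psychometric(snr, srt, sigma), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_ml(snrs, n_correct, n_trials):
    """Coarse-to-fine grid search for the (SRT, sigma) pair with the lowest NLL."""
    best = (float("inf"), 0.0, 1.0)
    srt_lo, srt_hi, sig_lo, sig_hi = -15.0, 5.0, 0.2, 10.0
    for _ in range(4):
        for i in range(41):
            srt = srt_lo + (srt_hi - srt_lo) * i / 40
            for j in range(41):
                sigma = sig_lo + (sig_hi - sig_lo) * j / 40
                nll = neg_log_likelihood(srt, sigma, snrs, n_correct, n_trials)
                if nll < best[0]:
                    best = (nll, srt, sigma)
        # Shrink the search box around the current best estimate.
        d_srt = (srt_hi - srt_lo) / 10
        d_sig = (sig_hi - sig_lo) / 10
        srt_lo, srt_hi = best[1] - d_srt, best[1] + d_srt
        sig_lo, sig_hi = max(0.05, best[2] - d_sig), best[2] + d_sig
    return best[1], best[2]

# Made-up counts for one sentence: 7 listeners at each of five SNRs.
snrs = [-9.0, -7.0, -5.0, -3.0, -1.0]
n_trials = [7, 7, 7, 7, 7]
n_correct = [0, 2, 5, 7, 7]
srt_hat, sigma_hat = fit_ml(snrs, n_correct, n_trials)
s50_hat = slope_s50(sigma_hat)
```

For word scoring the same fit could be applied with the number of presented words as `n_trials` and the number of correctly repeated words as `n_correct`, although treating words within a sentence as independent trials is a further simplification.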
As shown in Figure 2, the intelligibility scores based on
sentence and word scoring are very similar at relatively high
SNRs, while for low SNRs the intelligibility for word scoring is
better than that for sentence scoring. This simply results from
the scoring method itself: at low SNRs subjects are unable to
repeat the whole sentence correctly. However, it happens quite
often that they correctly repeat one or more (yet not all) words in
a sentence. Taking into account that sentences consisted on
average of 4.6 words and assuming that one word was correctly
repeated, the word scoring method results in an intelligibility
of about 20-25%, as depicted in the right panel of Figure 2 for
SNR = -9 dB. On the other hand, at SNR = -1 dB the
differences in intelligibility score between the two scoring methods are
negligible. Thus, the psychometric functions obtained in the case
of word scoring reveal a smaller S50 and lower SRT than those
for sentence scoring, as expected.
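The difference between the two scoring rules can be made concrete with a small sketch. The function names and the whitespace-based tokenization are assumptions; responses are assumed to be already lower-cased and free of typing errors. The example sentence is the right-panel sentence of Figure 2.

```python
def sentence_score(target, response):
    """Sentence scoring: 100 if all words are repeated in the presented order, else 0."""
    return 100.0 if response.split() == target.split() else 0.0

def word_score(target, response):
    """Word scoring: percentage of target words found in the response, order ignored."""
    target_words = target.split()
    remaining = response.split()
    hits = 0
    for w in target_words:
        if w in remaining:
            remaining.remove(w)  # each response word can match only once
            hits += 1
    return 100.0 * hits / len(target_words)

target = "wiele dzieci tam się bawi"          # five words
one_word = "dzieci"                           # only one word understood
reordered = "wiele dzieci się tam bawi"       # all words, altered order

scores = {
    "one_word": (sentence_score(target, one_word), word_score(target, one_word)),
    "reordered": (sentence_score(target, reordered), word_score(target, reordered)),
}
```

A single correct word in this five-word sentence gives a word score of 20% and a sentence score of 0%; for the average sentence length of 4.6 words the corresponding figure is 100 / 4.6, i.e. roughly 22%, which falls in the 20-25% range quoted above.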
In this way, two sets of 1200 SRT and S50 values were
determined, i.e. two (one for sentence scoring and one for word
scoring) SRT values and two corresponding S50 values for each
sentence. The most important statistics of the fitted psychometric functions, determined SRTs, and S50s, are presented in
Table 1.
Composition of the final sentence lists
The composition of the final sentence lists was aimed at
developing statistically and phonemically equivalent sentence
lists. It is important to minimize the scatter of SRTs across lists
so that test reliability is adequate for clinical purposes. To do
this, it was decided that the SRT of any sentence to be included
in the final lists should lie within ±1.5 dB of
the mean SRT of all sentences tested, i.e. -6.1 dB and
-7.3 dB for sentence and word scoring, respectively. Moreover,
to reduce the data scatter resulting from inequality of the
intelligibility of individual words, only sentences with slope S50
not smaller than 15%/dB were selected. As a result, 500 and 440
sentences fulfilling the above criteria for the sentence and word
scoring methods, respectively, were selected. In this way, it was
possible to create 25 lists (20 sentences each) for the sentence
scoring method, and 22 lists (20 sentences each) for the word
scoring method. The last stage of developing the Polish sentence
test was to compile the sentences into final lists, so that the
following criteria would be met:
• the lists should be statistically equivalent, i.e. the average SRT
and S50 within each list must not depend on the list number,
• the lists should contain phonemically comparable content.
The phoneme content should also be consistent with the
reference phoneme distribution of the Polish language (Jassem, 1973).
A special algorithm was prepared and implemented in Matlab
7.0 (MathWorks), which realized Monte Carlo simulations. The
steps in the compilation were as follows:
1. Random permutations of 500 and 440 sentences for the
sentence and word scoring, respectively, were generated.
2. The random series of sentences were grouped in 25 (sentence scoring) and 22 (word scoring) preliminary lists of 20
sentences each.
Table 1. Basic statistics of the SRTs and S50s for sentence and word scoring methods.
Parameter | Scoring method | mean | s | min | max | Kurtosis (normalized)
SRT [dB] | Sentence scoring | −6.1 | 1.9 | −8.9 | −3.5 | 0.71
S50 [%/dB] | Sentence scoring | 22.6 | 14.2 | 10.1 | 35.1 | 0.73
SRT [dB] | Word scoring | −7.3 | 2.3 | −9.9 | −3.5 | 0.86
S50 [%/dB] | Word scoring | 19.3 | 13.7 | 9.4 | 31.1 | 0.83
This algorithm led to the composition of a set of equivalent
lists. Finally, 25 statistically and phonemically equivalent lists
containing 20 sentences each for sentence scoring and 22
statistically and phonemically equivalent lists containing 20
sentences each for word scoring were obtained. The list-specific
SRTlist for a list of 20 sentences was calculated as the average
of the SRTs across the sentences in the list. The list-specific psychometric function, i.e. the function that models speech intelligibility, for a list composed of sentences with different
SRTs and S50s can be computed according to the so-called
probabilistic model proposed by Kollmeier (1990). This model
states that the list-specific intelligibility function is a convolution
of a mean psychometric function (i.e. a function of S50 equal to
mean S50 averaged across the sentences in an analysed list) and a
distribution of SRTs within a list. The slope of the list-specific
function, i.e. list-specific slope S50list, can be calculated directly
according to the following equation (Kollmeier, 1990):
$$
S50_{\mathrm{list}} = \frac{S50_{\mathrm{mean}}}{\sqrt{1 + \dfrac{16\, S50_{\mathrm{mean}}^{2}\, \sigma_{\mathrm{SRT}}^{2}}{\left(\ln\left(2e^{1/2} - 1 + 2e^{1/4}\right)\right)^{2}}}} \tag{3}
$$
LISTS FOR SENTENCE INTELLIGIBILITY SCORING
The goodness-of-fit χ² test revealed that there were no significant differences in the mean phoneme distribution between
the sentence lists and the reference phoneme distribution for the
Polish language {χ² = 15.31, d.f. = 38, p > 0.99}. The phoneme
balance is at least as good as that characterizing the tests
developed by other authors (Kollmeier & Wesselkamp, 1997;
Versfeld et al, 2000). Figure 3 presents the average SRT values
and corresponding standard deviations for respective lists.
The average SRT values for individual lists lie within the range
from −6.3 dB to −5.8 dB (mean SRT = −6.1 dB). The mean
list-specific slope S50list (determined according to formula
(3)) is 25.5%/dB (i.e. 0.255 dB⁻¹). Separate one-way
ANOVAs showed that neither the SRTs {F(24,499) =
0.54, p = 0.96} nor the slopes S50 {F(24,499) = 0.55, p = 0.96}
varied significantly across the lists.
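As an illustration of the phoneme-balance analysis, the sketch below computes a goodness-of-fit χ² statistic against a reference distribution and checks a ±2.5 percentage point tolerance per phoneme. The four-phoneme inventory and all counts are hypothetical (the article reports d.f. = 38, i.e. 39 phoneme categories), and computing the statistic on raw counts is an assumption, since the article does not state how it was computed.

```python
def phoneme_percentages(counts):
    """Convert raw phoneme counts into percentages of all phonemes in a list."""
    total = sum(counts.values())
    return {ph: 100.0 * n / total for ph, n in counts.items()}

def is_balanced(list_pct, ref_pct, tol_pp=2.5):
    """True if every phoneme is within +/- tol_pp percentage points of the reference."""
    return all(abs(list_pct.get(ph, 0.0) - ref) <= tol_pp
               for ph, ref in ref_pct.items())

def chi_square_stat(observed_counts, ref_pct):
    """Pearson goodness-of-fit statistic; d.f. = number of categories - 1."""
    total = sum(observed_counts.values())
    stat = 0.0
    for ph, ref in ref_pct.items():
        expected = total * ref / 100.0
        stat += (observed_counts.get(ph, 0) - expected) ** 2 / expected
    return stat

# Hypothetical reference distribution (percent) and counts for one list.
reference = {"a": 40.0, "e": 30.0, "t": 20.0, "k": 10.0}
counts = {"a": 41, "e": 29, "t": 21, "k": 9}

print(is_balanced(phoneme_percentages(counts), reference))
print(chi_square_stat(counts, reference))
```

A small χ² relative to the degrees of freedom, as reported for the Polish lists, indicates that the observed distribution is close to the reference.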
LISTS FOR WORD INTELLIGIBILITY SCORING
The goodness-of-fit χ² test revealed that there were no significant differences in the mean phoneme distribution between the sentence lists and the reference phoneme distribution {χ² = 15.41,
d.f. = 38, p > 0.99}. Figure 4 depicts the average SRT values and
the corresponding standard deviations for the respective lists.
The intelligibility functions for word scoring generally reveal a
smaller slope than those determined for sentence scoring; the
mean list-specific slope is 20.8%/dB, i.e. close to the 19.2%/dB characterizing the German sentence test for word scoring (Kollmeier &
Wesselkamp, 1997). The average SRT values for
the lists vary in the range from −7.6 dB to −7.2 dB. The mean
SRT = −7.5 dB is 1.4 dB lower than that characterizing the
set for sentence scoring, as expected (see section: Determination
of psychometric functions and their parameters, above). The
ANOVA revealed that neither the SRT nor the S50 varied
significantly across lists ({F(21,439) = 0.29, p > 0.99} and
{F(21,439) = 0.40, p > 0.99}, respectively). Therefore, the lists
developed for word scoring may also be regarded as statistically and phonemically equivalent.
3. For the 25 sentence scoring lists, two separate one-way ANOVAs
were performed with respect to the SRT and slope S50,
respectively. The lists were regarded as equivalent when there
were no statistically significant differences across the lists in
either the average SRTs or the average S50s. If there was a
statistically significant difference in either of the
analysed parameters, steps 1–3 were repeated. The same
analysis was carried out independently with respect to the 22
lists for word scoring.
4. In the last step, phoneme distribution analysis was performed. A specially designed algorithm transformed the
written sentences into SAMPA-broad (Polish extension)
(Wypych et al, 2003) phonemic code, taking into account
the co-articulation effects using phoneme distribution rules.
Subsequently, phonemic distributions for each list were
determined and compared with the reference distribution for the Polish language (Jassem, 1973). The lists were
regarded as phonemically balanced if, for each list, the
frequency of occurrence of each phoneme did
not deviate by more than ±2.5 percentage points from
the mean frequency of occurrence of that phoneme in the
Polish language. If a sentence permutation meeting the above
criteria was not found, a new permutation was generated and
steps 1–4 were repeated.
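The permutation-and-test loop of steps 1 to 3 can be sketched as follows. This is not the authors' Matlab code: the function names, the synthetic sentence pool, and the F threshold of 1.5 (roughly the 5% critical value for these degrees of freedom, used here instead of an exact p-value) are all assumptions, and the phoneme check of step 4 is omitted.

```python
import random

def anova_f(groups):
    """One-way ANOVA F statistic (between-group / within-group mean squares)."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        g_mean = sum(g) / len(g)
        ss_between += len(g) * (g_mean - grand_mean) ** 2
        ss_within += sum((x - g_mean) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def compose_lists(srt, s50, n_lists, list_size, f_crit, rng, max_tries=1000):
    """Shuffle sentence indices until neither SRT nor S50 differs across lists."""
    indices = list(range(n_lists * list_size))
    for _ in range(max_tries):
        rng.shuffle(indices)                                  # step 1: permutation
        lists = [indices[i * list_size:(i + 1) * list_size]   # step 2: grouping
                 for i in range(n_lists)]
        f_srt = anova_f([[srt[i] for i in lst] for lst in lists])  # step 3: ANOVAs
        f_s50 = anova_f([[s50[i] for i in lst] for lst in lists])
        if f_srt < f_crit and f_s50 < f_crit:
            return lists
    raise RuntimeError("no acceptable permutation found")

# Hypothetical pool: 500 sentences with made-up SRTs (dB) and slopes (%/dB).
rng = random.Random(0)
srt = [rng.gauss(-6.1, 0.8) for _ in range(500)]
s50 = [rng.gauss(22.0, 4.0) for _ in range(500)]
lists = compose_lists(srt, s50, n_lists=25, list_size=20, f_crit=1.5, rng=rng)
print(len(lists), "lists of", len(lists[0]), "sentences")
```

A fuller version would add the step 4 phoneme comparison inside the loop and reject permutations that fail it.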
where S50mean is the mean slope (expressed in dB⁻¹, for example
20%/dB = 0.2 dB⁻¹) averaged across the sentences in a list, and
σSRT is the standard deviation of the SRT across the sentences in
a list. As follows from the above formula, a large list-specific
slope requires both high slope values characterizing individual
sentences and a small spread of the sentence-specific SRT
values [3]. In an ideal situation, there is no spread of SRT values
for the respective sentences, σSRT = 0 dB and, consequently,
S50list = S50mean. Conversely, if σSRT > 0 dB, S50list < S50mean. A
large S50list implies a low standard deviation of SRT estimation,
SDSRT (Smits & Houtgast, 2006).
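To see how the spread of sentence SRTs flattens the list-specific function, formula (3) can be evaluated directly. The sketch below is a plain transcription of that formula; the function name and the example values are illustrative only, and slopes are passed in dB⁻¹ (e.g. 0.2 for 20%/dB).

```python
import math

# Constant from formula (3): ln(2*e^(1/2) - 1 + 2*e^(1/4)), about 1.58.
_K = math.log(2 * math.exp(0.5) - 1 + 2 * math.exp(0.25))

def list_slope(s50_mean, sigma_srt):
    """List-specific slope S50list (1/dB) from the mean sentence slope
    S50mean (1/dB) and the standard deviation of sentence SRTs (dB)."""
    return s50_mean / math.sqrt(1 + 16 * s50_mean ** 2 * sigma_srt ** 2 / _K ** 2)

# No spread of SRTs: the list slope equals the mean sentence slope.
print(list_slope(0.20, 0.0))
# A spread of 1 dB lowers the list slope below the mean sentence slope.
print(list_slope(0.20, 1.0))
```

With S50mean = 0.2 dB⁻¹, a spread of 1 dB reduces the list-specific slope to roughly 0.18 dB⁻¹.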
Figure 3. List-specific SRTs (filled circles) for respective lists
optimized for sentence scoring (error bars depict standard
deviations across sentences)
Figure 4. List-specific SRTs (filled circles) for respective lists
optimized for word scoring (error bars depict standard deviations across sentences)
Verification of the reliability of the sentence tests
In the final stage of this study, verification experiments were
carried out to check the reliability of the developed Polish
sentence materials. The retest measurements were conducted by
means of the constant stimuli paradigm as well as by the
adaptive procedure.
CONSTANT STIMULI PARADIGM
A new group of 20 subjects (12 male and 8 female, mean age 24.8
years), with normal hearing, participated in the measurements.
All of them were native, monolingual Polish speakers. They were
paid for taking part in the experiments. For ten subjects, the
intelligibility data were estimated using the sentence scoring, and
for the other ten subjects using the word scoring. Each subject
was presented with five randomly chosen lists from each list set.
As in traditional speech audiometry, the randomly chosen lists
were presented at different SNRs. In this case, SNR values were
as follows: −10, −8, −6, −4, and −2 dB. Each list was presented
to a given subject only once. For each subject, one SRT value
and the corresponding S50 were determined (note that in this
experiment the scatter of data both within and between the lists
affected the measurement, hence S50 corresponds to S50list). For
the list set optimized for sentence scoring, the mean SRT and the
mean S50list averaged across the subjects were −6.0 dB and
24.2%/dB, respectively. The results of the retest measurements,
therefore, corresponded very well to the test values, i.e. −6.1 dB
and 25.5%/dB, respectively. Standard deviation of SRT across
the subjects was 0.6 dB.
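Estimating one SRT and one slope from list scores collected at five fixed SNRs amounts to fitting a psychometric function to five points. The sketch below assumes a logistic function parameterized by SRT and S50, a form commonly used for sentence tests; the article's own equation for the psychometric function is not reproduced in this section, so this form, the brute-force least-squares grid search, and the synthetic scores are all assumptions.

```python
import math

def logistic(snr, srt, slope):
    """Intelligibility (0..1) at a given SNR; slope is S50 in 1/dB at the 50% point."""
    return 1.0 / (1.0 + math.exp(4.0 * slope * (srt - snr)))

def fit_srt_slope(snrs, scores):
    """Least-squares grid search over SRT (-12..0 dB) and slope (0.05..0.50 1/dB)."""
    best = None
    for i in range(241):                 # SRT candidates in 0.05 dB steps
        srt = -12.0 + 0.05 * i
        for j in range(10, 101):         # slope candidates in 0.005 1/dB steps
            slope = 0.005 * j
            err = sum((logistic(x, srt, slope) - y) ** 2
                      for x, y in zip(snrs, scores))
            if best is None or err < best[0]:
                best = (err, srt, slope)
    return best[1], best[2]

# SNRs used in the constant stimuli experiment, with synthetic list scores
# generated from a hypothetical listener (SRT = -6 dB, S50 = 0.25 1/dB).
snrs = [-10.0, -8.0, -6.0, -4.0, -2.0]
scores = [logistic(x, -6.0, 0.25) for x in snrs]
srt_hat, slope_hat = fit_srt_slope(snrs, scores)
print(round(srt_hat, 2), round(slope_hat, 3))
```

In practice the scores would be the proportion of correctly repeated sentences (or words) in each list, and a maximum-likelihood fit could replace the simple squared-error criterion.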
As far as the data for word scoring are concerned, mean SRT
and mean S50 averaged across subjects were −7.3 dB and 21.9%/dB,
respectively. In this case, the retest values also turned out to
be very close to those obtained in the first experiment, i.e. −7.5
dB and 20.8%/dB, respectively. Standard deviation of SRT across
subjects was equal to 0.8 dB. Therefore, the results of the retest
measurement for the constant stimuli paradigm confirmed the
statistical properties of the materials developed.
THE ADAPTIVE PROCEDURE
Since speech intelligibility measurements are often carried out
using an adaptive procedure, in the next step the statistical
equivalence of the sentence lists was reanalysed by means of a
staircase procedure that enables a direct SRT estimation for
each list. This procedure could be used only for sentence
scoring.
The main retest measurements were carried out using the
standard adaptive procedure with the one-up/one-down decision
rule, i.e. after a correct response the SNR was decreased, and
after an incorrect response it was increased. The initial SNR was set to 2
dB, i.e. the speech was relatively easily understood at the
beginning of the measurement. The initial step was 2 dB and
was reduced to 1 dB after the first incorrect response. The SRT
was computed as the mean of the last 12 SNR values, including
the last SNR determined by the adaptive procedure (Smits et al,
2004; Smits & Houtgast, 2006). The main advantage of the
adaptive procedure over the constant stimuli method is that only
one list is required to estimate the SRT value, while for the
constant stimuli method, at least two different lists are needed to
determine the psychometric function and the SRT point. Moreover, the intelligibility scores for these SNRs have to encompass
the 50% intelligibility point; thus, in practice, some preliminary
testing is often required prior to an actual SRT measurement,
which is not the case for the adaptive method.
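The one-up/one-down track described above can be sketched in a few lines. The function name and the response sequence are hypothetical, the starting SNR is a parameter set to the value printed in the text, and applying the reduced 1 dB step already to the first upward move is an assumption, because the text does not say which step size follows the first incorrect response.

```python
def staircase_srt(responses, start_snr=2.0, first_step=2.0, later_step=1.0, n_avg=12):
    """Run a one-up/one-down track over a list of True/False responses and
    return the SRT as the mean of the last n_avg SNR values, including the
    SNR that would have been presented after the final response."""
    if len(responses) + 1 < n_avg:
        raise ValueError("not enough trials to average")
    snr = start_snr
    step = first_step
    track = [snr]                 # SNR of each presented sentence, plus the next one
    for correct in responses:
        if correct:
            snr -= step           # correct response: lower the SNR (harder)
        else:
            step = later_step     # assumption: reduced step applies immediately
            snr += step           # incorrect response: raise the SNR (easier)
        track.append(snr)
    return sum(track[-n_avg:]) / n_avg

# Hypothetical 20-sentence list: four correct responses, then alternation.
responses = [True] * 4 + [False, True] * 8
print(staircase_srt(responses))
```

For this response pattern the track descends from 2 dB to −6 dB and then alternates between −5 dB and −6 dB, so the estimate is −5.5 dB.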
A new group of 10 normal-hearing subjects (four male and six female, mean age 23.2 years)
took part in the experiments. All of
them were native, monolingual Polish speakers. The subjects
were paid for their participation in the experiments. Each subject
was presented with each sentence list taken from the set for
sentence scoring. Figure 5 presents the results of measurements,
i.e. SRT values averaged across subjects and the corresponding
standard deviations for respective sentence lists.
The obtained SRTs were subjected to within-subject ANOVA
with respect to the list index. The ANOVA revealed that the SRT
did not depend on the list index {F(24,249) = 0.85, p = 0.67}, i.e.
the statistical equivalence of the respective sentence lists was confirmed.
Figure 5. Mean SRTs (filled circles) and corresponding standard deviations (error bars) obtained in retest measurement
using the adaptive procedure (data averaged across subjects).
The purpose of this study was to develop a Polish sentence test
for speech intelligibility measured in the presence of masking
noise. The obtained SRT and slope data have been compared
with the data obtained for other European
languages. In Table 2, the data and properties of various
sentence intelligibility tests developed by different authors and
those of the current study (shaded bottom section) are
compared. It was found, for example, that the mean SRT for
Polish sentence materials was close to the SRT for the German
Göttingen test based on the word scoring method (Kollmeier &
Wesselkamp, 1997), i.e. −6.23 dB. On the other hand, the SRT
for Polish materials optimized for word scoring is about 1 dB
lower than that of the Göttingen test, and is comparable to the SRT
characterizing semantically unpredictable as well as syntactically and lexically constrained material (Hagerman, 1982;
Wagener et al, 1999a,b,c, 2003). The mean SRT of the Polish
sentence lists optimized for sentence scoring was approximately
2 dB lower than that obtained for Dutch tests (Plomp &
Mimpen, 1979; Smoorenburg, 1992; Versfeld et al, 2000), and
about 3 dB lower than the SRT characterizing American
English materials (Nilsson et al, 1994). Some similarities and
differences can be found in the S50 parameter obtained for
Polish and other tests. For example, the S50list obtained for the
word scoring method (20.8%/dB) is similar to that characterizing the German Göttingen test (19.2%/dB) (Kollmeier &
Wesselkamp, 1997), but is clearly different from slopes obtained
for Danish (13.2%/dB) (Wagener et al, 2003), or Swedish (16.0%/dB)
(Hagerman, 1982).
The differences across languages in SRTs and slopes may be
due to many factors, such as differences in linguistic structure
between languages, the amount of contextual information, the
type of noise (masker) used, the speaker, and the presentation
mode. As can be seen from the data depicted in Table 2, there are
significant differences in the masking signal, scoring method,
predictability, and psychometric function parameters. Values of
SRT vary across the tests from −8.4 dB (Wagener et al, 2003) to
−2.9 dB (Nilsson et al, 1994). SRTs are typically lower for
speech materials of a limited vocabulary (Hagerman, 1982;
Wagener et al, 1999a,b,c; Wagener et al, 2003), while SRTs are
the highest for Plomp-like tests (Plomp & Mimpen, 1979;
Smoorenburg, 1992; Versfeld et al, 2000). This difference might
be a consequence of using a very limited number of words in a
test. For example, the entire OLSA or DANTALE tests are
composed of 50 words (Wagener et al, 1999a,b,c, 2003). In
contrast, in Plomp-like tests, neither word content nor grammatical
structure is constant across the respective lists.
What should be emphasized is the influence of the masker type used in
different studies on the intelligibility data [4]. Although the long-term spectra of the different types of masker may be very similar,
they may differ significantly in their temporal structure, e.g. a
band of white noise versus a band of low-noise noise at the same
bandwidth and power (Kohlrausch et al, 1997).
Another important issue is the choice of the optimal number
of sentences used in a list, which determines the number of
equivalent lists that can be developed from the selected sentence
material and the duration of the intelligibility measurement.
This topic is now under our investigation. The first data
revealed that the 20-sentence lists used in this experiment can
be reduced to 13-sentence lists without diminishing the accuracy
of the SRT measurement and the phonemic balance (Ozimek et
al, 2007). This number corresponds well with that proposed by
Plomp & Mimpen (1979) and Versfeld et al (2000). From a
practical point of view, it is more desirable to have 37 equivalent
13-sentence lists than 25 equivalent 20-sentence lists, because it
permits a greater number of measurements. However, in the first
case, the result of an adaptive measurement might be biased by
an inappropriately chosen initial SNR value. In order to
determine an individual SRT value it is sufficient to collect
the intelligibility scores for three to five randomly selected lists
presented at different SNRs depending on the patient’s hearing
loss. Determination of the average SRT based on four randomly
chosen lists takes about 15 minutes (the constant stimuli
method). If the ‘typed’ subjects’ responses are not recorded
(like in the traditional speech CVC audiometry), the time of
data collection for one patient should decrease to less than 10
minutes. Speech intelligibility assessment using the adaptive
procedure with the one-up/one-down decision rule is the most
effective, since in this case a measurement takes approximately
four minutes and one list only is required to estimate the SRT
value.
One of the most important aspects of the speech intelligibility
study is to apply the speech material in a real clinical situation to
determine the intelligibility of the subjects with hearing loss. It is
well known that hearing-impaired subjects show worse performance in speech intelligibility tests than normal-hearing subjects
do. The intelligibility deterioration may be attributed to several
factors such as: diminished audibility, reduced frequency
selectivity, loudness recruitment, etc. We have recently undertaken an investigation applying the developed sentence test to
hearing-impaired subjects showing sloping audiograms at
higher frequencies or notch-like hearing loss within the range of
2–4 kHz, and to subjects with tinnitus determined as a
The minimal and maximal SRTs are −6.9 dB (list no. 4) and
−5.8 dB (list no. 10), respectively, while the minimal and
maximal standard deviations across subjects are 0.6 dB (list no. 1)
and 1 dB (list no. 4), respectively. These values correspond very
well to the inter-individual standard deviations characterizing
the retest data obtained by Versfeld et al (2000), 1.07 dB; Plomp
& Mimpen (1979), 0.9 dB; and Nilsson et al (1994), 1.13 dB.
The mean SRT (averaged across the lists) and the corresponding standard deviation are −6.2 dB and 0.2 dB, respectively. The
standard deviation across the lists corresponds very well to the 0.3
dB obtained by Kollmeier (1990), while it is slightly lower than
the standard deviation across the sentence lists developed by
Versfeld et al (2000), i.e. 0.6 dB. The mean retest value obtained,
i.e. −6.2 dB, is very close to the expected value, i.e. −6.1 dB;
thus, the result of the retest measurements has confirmed the
high accuracy and repeatability of the sentence materials
developed.
Apart from the standard deviations describing the inter-individual differences and the differences between the respective lists, the standard deviations of the SRT estimates were
determined, i.e. the spread of SNRs at which the last 12
sentences were presented was analysed. The mean σSRT was 1.7
dB, and corresponded very well to the theoretical value that can
be computed by substituting the mean S50list value into equation
(2), i.e. 1.6 dB.
Discussion
Table 2. Properties of sentence intelligibility tests reported by different authors and obtained in the current study (shaded bottom section).
No. | Language | Masker | Scoring method | SRT [dB] | Mean slope [%/dB] | Speaker | Remarks
1 | Dutch (Plomp and Mimpen, 1979) | Speech-shaped stationary noise | Sentence scoring | −4.5 | 15.9* | Female | Semantically predictable sentences (Plomp-type test)
2 | Dutch (Smoorenburg, 1992) | Speech-shaped stationary noise | Sentence scoring | −3.7 | 17.7* | Male | Plomp-type test
3 | Dutch (Versfeld et al, 2000) | Individually shaped white noise | Sentence scoring | −4.1 | 16.3* | Male, female | Plomp-type test
4 | German (Kollmeier and Wesselkamp, 1997) | Superposition of monosyllabic words | Word scoring | −6.2 | 19.2** | Male | Plomp-type test ('Göttingen test')
5 | German (Wagener, 1999a,b) | Babble noise | Word scoring | −7.1 | 17.1** | Male | Semantically unpredictable sentences ('Oldenburg' Satztest, OLSA), fixed grammatical structure
6 | Danish (Wagener, 2003) | Babble noise | Word scoring | −8.4 | 13.2** | Female | Semantically unpredictable sentences ('DANTALE II' test), fixed grammatical structure
7 | Swedish (Hagerman, 1982) | Babble noise | Word scoring | −8.1 | 16.0* | Female | Semantically unpredictable sentences, fixed grammatical structure
8 | American English (Nilsson, 1994) | Speech-shaped stationary noise | Sentence scoring | −2.9 | | Male | Plomp-type test, adaptive procedure
9 | Polish | Babble noise | Sentence scoring | −6.1 | 25.6** | Male | Plomp-type test (37 lists of 13 sentences)
10 | Polish | Babble noise | Sentence scoring | −6.1 | 25.5** | Male | Plomp-type test (25 lists of 20 sentences)
11 | Polish | Babble noise | Word scoring | −7.5 | 20.8** | Male | Plomp-type test (25 lists of 20 sentences)
12 | Polish | Speech-shaped stationary noise | Sentence scoring | −6.7 | 22.5** | Male | Plomp-type test (25 lists of 20 sentences)
13 | Polish | Speech-shaped stationary noise | Word scoring | −7.8 | 18.6** | Male | Plomp-type test (25 lists of 20 sentences)
*mean slope S50mean, **list-specific slope S50list determined according to formula (3).
phantom auditory perception (Jastreboff, 1990; Ozimek et al,
2006a). The preliminary data indicate that for the subjects with
sloping audiograms and hearing loss less than 40 dB HL, the
mean SRT for sentence scoring is −2.3 dB, while for subjects
with hearing loss above 40 dB HL, the mean SRT is higher and
amounts to −0.7 dB (the corresponding mean SRT for sentence
scoring and normal-hearing subjects is −6.1 dB, as shown in this
study). Similar data have been obtained for the subjects with
notch-like hearing loss. As can be seen, the separation in SRTs
between the normal-hearing and hearing-impaired subjects is of the
order of 3.8 to 5.4 dB and is narrower and lower than that
(4–10 dB) obtained by Wilson et al (2007). The preliminary data
also show that for the tinnitus subjects (without hearing loss),
the mean SRT equals −4.2 dB, and is higher by 1.9 dB than
that for normal-hearing subjects, indicating the effect of tinnitus.
It is worth noting that even small differences in SRT between
the normal-hearing and hearing-impaired subjects indicate
significant changes in speech intelligibility. For example,
a difference in speech-to-noise ratio of about 1 dB produces
changes in intelligibility of about 7–19% (Nilsson et al, 1994). A
substantial data set related to our ongoing study with hearing-impaired subjects will be reported in a separate paper.
We would like to stress finally, that to the best of our
knowledge, the Polish sentence test reported in this study is
the first test of this type among Western Slavonic languages. The
family of Slavonic languages comprises three main groups:
Eastern (Russian, Ukrainian, and Byelorussian), Western (Polish, Czech, and Slovak), and Southern (Serbian-Croatian,
Bulgarian, Slovenian, and Macedonian) languages, used by a
population of about 300 million. This fact emphasizes the need
for developing intelligibility tests for those languages.
Conclusions
This study allows us to draw the following conclusions:
• The mean SRT and mean S50list values estimated for the Polish
sentence tests using the sentence scoring method were −6.1
dB and 25.5%/dB, respectively, while those estimated using
the word scoring method were −7.5 dB and 20.8%/dB. Thus,
SRT and S50 values resulting from sentence scoring are
different from those for word scoring.
• The slope of the psychometric functions for the Polish
sentence test optimized for sentence scoring is slightly higher
than that for other languages. The difference may be related to
the type of masker used, and the linguistic structure of the
speech material in various languages.
• The Polish sentence lists are reliable and accurate, and yield
repeatable speech intelligibility data when speech is
presented against an interfering noise.
Acknowledgements
This work was financially supported by the European Union
FP6, Project 004171 HEARCOM, and the State Ministry of
Education and Science. A portion of this work was presented at
the 19th International Congress on Acoustics, Madrid, 2–7
September 2007.
Notes
[1] In characterizing some statistical properties of the babble noise
used in this study, it was found that the crest factor Cf,
reflecting the dynamic range of amplitude fluctuations, and
normalized kurtosis (kurt), characterizing the relation of the
distribution of instantaneous values to the normal distribution, were Cf = 4.51 and kurt = 1.02 (kurt = m4/(3σ⁴), where
m4 is the fourth central moment and σ is the standard
deviation). These values are similar to Cf = 4.65 and kurt =
1.04, determined for the masker used by Versfeld et al (2000),
and to Cf = 4.55 and kurt = 0.98, determined for the masker used
by Kollmeier & Wesselkamp (1997). Furthermore, the Cf and
kurt values for the envelopes of these noises were also similar:
Cf = 3.84, kurt = 1.15 in Versfeld et al (2000); Cf = 3.30,
kurt = 1.05 in Kollmeier & Wesselkamp (1997); and Cf = 3.48, kurt = 1.15 in the present study.
[2] Although there are some standards aimed at optimizing the
rms measurement of a speech signal (for example, ITU
standard P.56 (1994) 'Objective measurement of active
speech level'), a classical method of rms calculation was
employed in this study. The main reason for using the
classical method of rms calculation was to compare the
obtained results with the outcomes of the previous studies
(Kollmeier & Wesselkamp, 1997; Versfeld et al, 2000).
Furthermore, if the ITU norm had been used, it would be
impossible to apply the Polish sentence material to standard
clinical audiometers.
[3] Equation (3) can also be used for a single sentence.
According to this equation, a large S50 of a sentence implies
a comparable intelligibility of words constituting the sentence.
[4] Our preliminary data show that, for the developed speech
material, the approximate slope of psychometric functions
for speech-shaped and CCITT maskers is 22%/dB and 14%/dB,
respectively (Ozimek et al, 2006b). The following SRTs
have been obtained: speech-shaped white noise −7 dB and
CCITT −16 dB, respectively. Therefore, the spectral and
temporal properties of the masking signal have a direct
influence on S50 and SRT values. Moreover, sensations
produced by different maskers could influence the perception
of speech under noisy conditions and, consequently, the
parameters of the psychometric functions. An analysis of
values of the perceptual parameters such as loudness,
sharpness, and roughness of the maskers used in different
studies has shown that they are not equal. Definitions and
significance of these parameters are given in Zwicker & Fastl
(1999). Estimation of these parameters was performed using
the ArtemiS v.4.00.300 software (HEAD acoustics GmbH)
and was followed by normalization of the masker's level to 70
dB SPL. It has turned out that the masker used in the present
investigation is characterized by higher loudness (G = 37.2
sone) in comparison with G = 31.8 sone, determined for the
masker used by Versfeld et al (2000), and G = 33.9 sone,
determined for the masker used by Kollmeier & Wesselkamp
(1997). The so-called sharpness of maskers in the present
study was S = 3.7 acum, while that in the study by Versfeld
et al (2000) was S = 3.0 acum; and in the study by Kollmeier
& Wesselkamp (1997) it was S = 2.7 acum. The roughness of
the masker in the present study was R = 3.3 asper; while that
in the paper by Versfeld et al (2000) was R = 2.8 asper; and in
Kollmeier & Wesselkamp's study (1997), R = 3.2 asper.
References
Bosman, A.J. & Smoorenburg, G.F. 1995. Intelligibility of Dutch CVC syllables and sentences for listeners with normal hearing and with three types of hearing impairment. Audiology, 34, 260–284.
Brachmański S. & Staroniewicz P. 1999. Fonetyczna struktura materiału testowego stosowanego w subiektywnych pomiarach jakości mowy. (in Polish) (Phonetic structure of test material for subjective assessments of speech quality). Speech and Language Technology. Poznań, 3, 71–80.
Brand, T. & Kollmeier, B. 2002. Efficient adaptive procedures for threshold and concurrent slope estimates for psychophysics and speech intelligibility tests. J Acoust Soc Am, 111, 2801–2810.
Bronkhorst, A.W., Brand, T. & Wagener, K. 2002. Evaluation of context effects in sentence recognition. J Acoust Soc Am, 111, 2874–2886.
Grocholewski, S. 2001. Statystyczne podstawy systemu ARM dla języka polskiego. (in Polish) (Statistical description of automatic speech recognition system for Polish language). Poznań: Wydawnictwo Politechniki Poznańskiej.
Hagerman, B. 1982. Sentences for testing speech intelligibility in noise. Scand Audiol, 11, 79–87.
ITU. 1994. Objective measurement of active speech level. Standard P.56.
Jassem, W. 1973. Podstawy fonetyki akustycznej (in Polish) (Foundations of Acoustical Phonetics). Warszawa: PWN.
Jastreboff P.J. 1990. Phantom auditory perception tinnitus; mechanism of generation and perception. Neurosci Res (NY), 8, 221–254.
Kalikow, D.N., Stevens, K.N. & Elliot, L.L. 1977. Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J Acoust Soc Am, 61, 1337–1351.
Kohlrausch, A., Fassel, R., van der Heijden, M., Kortekaas, R., van de Par, S., et al. 1997. Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations. Acustica. Acta Acustica, 83, 659–669.
Kollmeier B. 1990. Messmethodik, Modellierung und Verbesserung der Verständlichkeit von Sprache. (in German) (Methodology, modelling and improvement of speech intelligibility measurements). Habilitation thesis, Göttingen, University of Göttingen.
Kollmeier, B. & Wesselkamp, M. 1997. Development and evaluation of a sentence test for objective and subjective speech intelligibility assessment. J Acoust Soc Am, 102(4), 2412–2424.
Kosiel U. 1972. Analiza statystyczna cech indywidualnych głosu w średnim widmie mowy polskiej. (in Polish) (Statistical analysis of individual voice features on basis of average spectrum of Polish speech). PhD Dissertation, Warszawa: IPPT.
Martin M. 1997. Speech Audiometry. Wiley Publishers (2nd ed.).
Nilsson, M., Soli, S.D. & Sullivan, J.A. 1994. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am, 95, 1085–1099.
Ozimek, E., Wicher, A., Szyfter, W. & Szymiec, E. 2006a. Distortion product otoacoustic emission (DPOAE) in tinnitus patients. J Acoust Soc Am, 119, 527–538.
Ozimek E., Kutzner D., Sęk A. & Wicher A. 2006b. The effect of the type of the interfering noise on the slope and SRT value of the intelligibility functions of speech test. XXVIII International Congress of Audiology, Innsbruck, 3–7 September, LO 63, 50.
Ozimek E., Kutzner D., Sęk A. & Wicher A. 2007. Polish sentence test for speech intelligibility measurements in masking conditions. 19th International Congress on Acoustics, Madrid, 2–7 September, CAS03-013, p. 148.
Ozimek E., Kutzner D., Se˛ k A. & Wicher A. 2008. Development and
evaluation of Polish digit triplet test for auditory screening. Speech
Comm, 51, 307316.
Plomp, R. & Mimpen, A.M. 1979. Improving the reliability of testing the
speech reception threshold for sentences. Audiology, 18, 4353.
Pruszewicz, A., Demenko, G., Richter, L. & Wika, T. 1994a. Nowe listy
artykulacyjne do badań audiometrycznych. Cz. 1. (in Polish) (New
articulation lists for speech audiometry, Part 1). Otolaryngol Pol, 48,
5055.
Pruszewicz, A., Demenko, G., Richter, L. & Wika, T. 1994b. Nowe listy
artykulacyjne do badań audiometrycznych. Cz. 2. (in Polish) (New
articulation lists for speech audiometry, Part 2). Otolaryngol Pol, 48,
5662.
Runge, C.A. & Hosford-Dunn, H. 1985. Word recognition performance
with modified CID W-22 word lists. J Speech Hear Res, 28(3),
355362.
Smits, C. & Houtgast, T. 2006. Measurements and calculations on the
simple up-down adaptive procedure for speech-in-noise tests.
J Acoust Soc Am, 120(3), 16081621.
Smits, C., Kapteyn, T. & Houtgast, T. 2004. Development and validation
of an automatic speech-in-noise screening test by telephone. Int J
Audiol, 43, 15–28.
Smoorenburg, G.F. 1992. Speech reception in quiet and in noisy
conditions by individuals with noise-induced hearing loss in relation
to their tone audiogram. J Acoust Soc Am, 91(1), 421–437.
Studebaker, G.A., Sherbecoe, R.L., McDaniel, D.M. & Gwaltney, C.A.
1999. Monosyllabic word recognition at higher-than-normal speech
and noise levels. J Acoust Soc Am, 105, 2431–2444.
Tarnoczy, T. 1971. Das durchschnittliche Energie-Spektrum der Sprache
(für sechs Sprachen). (in German) (A long-term spectrum of speech
(for six languages)). Acustica, 24, 46–74.
Versfeld, N.J., Daalder, L., Festen, J.M. & Houtgast, T. 2000. Method for
the selection of sentence material for efficient measurement of the
speech reception threshold. J Acoust Soc Am, 107, 1671–1684.
Wagener, K., Brand, T. & Kollmeier, B. 1999a. Development and
evaluation of a German sentence test II: Optimization of the
Oldenburg sentence test. (in German). Z Audiol, 38, 44–56.
Wagener, K., Brand, T. & Kollmeier, B. 1999b. Development and
evaluation of a German sentence test I: Design of the Oldenburg
sentence test. (in German). Z Audiol, 38, 4–15.
Wagener, K., Brand, T. & Kollmeier, B. 1999c. Development and
evaluation of a German sentence test III: Evaluation of the
Oldenburg sentence test. (in German). Z Audiol, 38, 86–95.
Wagener, K., Eenbohm, F., Brand, T. & Kollmeier, B. 2005. Ziffern-Tripel-Test:
Sprachverständlichkeitstest über das Telefon. (in German)
(Digit triplets test for speech intelligibility measurements via
telephone). Z Audiol, Suppl. 8, p. CD-ROM.
Wagener, K., Josvassen, J.L. & Ardenkjaer, R. 2003. Design, optimization,
and evaluation of a Danish sentence test in noise. Int J Audiol,
42(1), 10–17.
Welge-Lüssen, A., Hauser, R., Erdmann, J., Schwob, C. & Probst, R. 1997.
Speech audiometry with logatomes. HNO-Universitätsklinik,
Kantonsspital Basel, Switzerland.
Wilson, R.H., Burks, A.B. & Weakley, G.W. 2005. A comparison of
word-recognition abilities assessed with digit pairs and digit triplets
in multitalker babble. J Rehabil Res Dev, 42(4), 499–510.
Wilson, R.H., McArdle, R.A. & Smith, S.J. 2007. An evaluation of the
BKB-SIN, HINT, QuickSIN, and WIN materials on listeners with
normal hearing and listeners with hearing loss. J Speech Lang Hear
Res, 50, 844–856.
Wypych, M., Demenko, G. & Baranowska, E. 2003. Grapheme-to-phoneme
transcription algorithm based on the SAMPA alphabet
extension for the Polish language. International Congress of Phonetic
Science, Barcelona.
Zwicker, E. & Fastl, H. 1999. Psychoacoustics: Facts and Models. New
York: Springer.