Room acoustics and speech perception
Prepared for Seminars in Hearing
Arthur Boothroyd, Ph.D.
Distinguished Professor Emeritus, City University of New York
Scholar in Residence, San Diego State University
Visiting Scientist, House Ear Institute
Contact Information:
Arthur Boothroyd
2550 Brant Street, San Diego, CA 92101
(619) 231 7948 (Voice and FAX)
(619) 392 1740 (Mobile)
aboothroyd@cox.net
www.arthurboothroyd.com
Acknowledgement
Preparation of this article was supported by NIDRR grant number H133E010107.
Key Words:
Classroom acoustics, room acoustics, speech perception, reverberation, sound-field amplification, FM amplification.
Abbreviations
SAI - Speech Audibility Index
AI - Articulation Index
SII - Speech Intelligibility Index
STI - Speech Transmission Index
RT - Reverberation time
CVC - Consonant-vowel-consonant
dB - Decibel
Hz - Hertz
SPL - Sound pressure level
LTASS - Long-term average speech spectrum
CASPA - Computer-assisted speech perception assessment
FM - Frequency modulation
ANSI - American National Standards Institute
ASHA - American Speech-Language Hearing Association
Learning Outcomes: On completion of this article the reader will understand (1)
the variables that need to be considered in evaluating room acoustics and (2) the
effects of these variables on speech perception in a classroom.
Abstract
The acoustic speech signal received by a listener is a function of the source,
distance, early reverberation, late reverberation, and noise. Specifically, it
depends on the Speech Audibility Index, which is defined, here, as the proportion
of the combined direct speech and early reverberation (also known as early
reflections) whose level is above that of the combined noise and late
reverberation. Speech Audibility Index rises from 0 to 100% as the effective
signal-to-noise ratio rises from -15 to +15 dB. Both reverberation and ambient
noise need to be low in order to maintain Speech Audibility Index at an optimal
level. Speech Audibility Index can be used to predict various measures of speech
perception, but the results are highly dependent on the complexity of the
language and the characteristics of the listener. Conditions that are tolerable for
normally hearing adults in casual conversation can be difficult for adults and
children in learning situations, and intolerable for persons with deficits of hearing,
language, attention or processing. Sound-field amplification can improve Speech
Audibility Index for all listeners in a noisy room. It offers less benefit when the
primary problem is reverberation and, if improperly installed, can make the
reverberation problem worse. There is no good substitute for reverberation
control. Audiologists have an important contribution to make in the identification
and resolution of continuing inadequacies of classroom acoustics.
Introduction
Room acoustics have a major effect on the transmission of speech sounds
from talker to listener. Four principal factors are involved: distance, early
reverberation, late reverberation, and noise. The present paper outlines the
effects of these factors on the reception and perception of speech.
The initial speech signal
Before examining what happens to speech in a room, it is important to define
the original acoustic signal.
i) Long-term average level
For present purposes, I will consider the original acoustic signal to be
that measured at 1 foot from the lips. At this distance, the long-term
speech level of a typical talker, averaged over 10 or 20 seconds, is
around 70 dB SPL. It is important to remember, however, that this value
is summed across frequency and averaged over time.
ii) Long-term average spectrum
The heavy line in Figure 1 shows the long-term level of a 12 second
speech sample measured in 1/3-octave bands. This is the Long-Term Average
Speech Spectrum or LTASS. The level is highest in the low-frequency bands,
and falls at the rate of around 6 dB per octave at frequencies above 500 Hz
(see, also, Cox and Moore, 1988; Boothroyd, Erickson, and Medwetsky, 1994).
[Figure 1 about here. Plotted quantities: long-term broad-band rms level; long-term 1/3-octave rms level; measured 1/3-octave peak level; and the idealized short-term (50 ms) range. Axes: speech level in dB SPL versus frequency in Hz (125 to 8000 Hz).]
Figure 1. One-third octave spectral analysis of a 12 second sample of male
speech measured at a distance of 1 foot. The shaded area
extends from 15 dB below to 15 dB above the long-term average
speech spectrum (LTASS) and indicates the approximate
distribution of useful acoustic information.
It is a characteristic of the acoustic speech signal that most of the energy
(and, therefore, the loudness) is carried in the lower frequencies - below
1000 Hz (i.e., the region covered by the first vocal-tract formant). Most of
the intelligibility, however, is carried in the weaker, higher frequencies -
between 1000 and 3000 Hz (i.e., the region covered by the second vocal-tract
formant). Note that, because the overall level is summed across frequency,
it is some 7 dB higher than the average level in the low-frequency bands.
iii) Short-term variation
When the speech signal in each frequency band is measured over short
time intervals, similar to the integration time of the human ear (50 to 100
msec), the level varies over a range of approximately 30 dB from 15 dB
below the long-term average to 15 dB above it. The shaded area in
Figure 1 represents this range. Note that, in any given band, the
difference between the level at which speech is just audible, and the
level at which the listener receives all of the useful information, is
approximately 30 dB.
It will be seen from this analysis that the use of a single number to represent
speech level can be misleading. Much of the frequency-specific information in
speech is at levels well below the long-term average, especially in the higher
frequencies. Note, however, that, for the normally hearing listener, some of the
high-frequency discrepancy measured in the sound field is offset by head-baffle
and ear-canal resonance effects.
The effects of distance on the direct speech signal
As the speech travels from the mouth of the talker, the acoustical energy is
spread over an increasingly large area and the average decibel level falls. To a
first approximation, this effect follows the 6 dB rule. That is, the average speech
level falls by 6 dB for every doubling of distance from the lips. If, for example, the
average level is 70 dB SPL at 1 foot, then it is 64 dB SPL at 2 feet, 58 dB SPL at
4 feet and so on. This relationship is illustrated by the broken curve (labeled
"Direct signal only") in Figure 2. In the open air, listeners receive only the direct
speech signal.
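A minimal Python sketch of the 6 dB rule, assuming only the 70 dB SPL reference level at 1 foot quoted above:

```python
import math

def direct_level(distance_ft, ref_level_db=70.0, ref_distance_ft=1.0):
    """Direct speech level under the 6 dB rule (level falls 6 dB per doubling of distance)."""
    return ref_level_db - 20.0 * math.log10(distance_ft / ref_distance_ft)

for d in (1, 2, 4, 8, 16):
    print(f"{d:2d} ft: {direct_level(d):.1f} dB SPL")
# prints roughly 70, 64, 58, 52, and 46 dB SPL
```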
Direct and reverberant sound
In enclosed spaces, however, listeners also receive speech via
reverberation. Reverberation refers to the persistence of sound in a room
because of multiple, repeated, reflections from the boundaries. During sound
generation, the reverberant sound is more or less uniformly distributed
throughout the room. The level of this reverberant sound in relation to the level of
the original source depends on the room size, the absorptive properties of its
boundaries and the directionality (also known as Q) of the source (Davis and
Davis, 1997). When the sound source stops, the reverberant sound level begins
to fall but it takes some time for it to become inaudible. The time taken for the
level to fall by 60 dB is known as the reverberation time (RT60). This quantity
provides a rough measure of the reverberant properties of a room. Reverberation
times in large, reflective spaces such as gymnasia can be as high as 2 or 3
seconds. In small classrooms with many absorbent surfaces (including the
surfaces of the students), reverberation times may be as low as 0.3 or 0.4 seconds.
At any point in the room, a listener receives both direct sound, whose level
follows the 6 dB rule, and reverberant sound, whose level is relatively
independent of distance. When the listener is close to the source, the level of the
direct sound exceeds that of the reverberant sound.
[Figure 2 about here. Curves: direct signal only; reverberation only; direct signal plus reverberation. The critical distance (6 ft) is marked, as are the regions in which reverberation is negligible and in which the direct signal is negligible. Axes: average speech level in dB SPL versus distance in feet (0 to 25 ft).]
Figure 2. Predicted long-term average speech level as a function of distance from the
talker in a room measuring 30x20x9 feet with a reverberation time of 0.5
seconds.
When the listener is far from the source, the reverberant sound dominates. The
critical distance is defined as the distance at which the levels of the direct and
reverberant sound are equal. At distances less than one third of the critical
distance, the direct sound is 10 dB or more stronger than the reverberant sound
and reverberation can generally be ignored. At distances greater than three times
the critical distance, the direct sound is 10 dB or more weaker than the
reverberant sound and the received signal can be considered entirely reverberant.
These points are illustrated in Figure 2, which shows total speech level
(direct plus reverberant) as a function of distance for a small room (30x20x9 feet)
with a relatively short reverberation time (0.5 seconds) and a talker with a Q (i.e.,
directionality) of 3.5. In this example, the estimated critical distance is 6 feet. It
will be seen that most of the listeners are receiving a mixture of direct and
reverberant speech. Those in the last three rows, however, are listening only to
the reverberant speech. Note that most of the listeners experience an increase in
received speech level because of reverberation. For children with elevated
sound-field thresholds, this increase may improve audibility. As will be seen in a
moment, however, the gain in audibility (i.e., reception) does not necessarily
translate into improved intelligibility (i.e., perception).
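The curves in Figure 2 can be approximated with standard statistical room-acoustics formulas. The Python sketch below is a rough reconstruction rather than the calculation used for the figure: it assumes the common approximation for critical distance, Dc = 0.057 sqrt(Q V / RT60) with the volume in cubic metres and Dc in metres, and treats the reverberant level as independent of distance. With the room and talker described above, it gives a critical distance of about 6 feet.

```python
import math

FT3_TO_M3 = 0.0283168   # cubic feet to cubic metres
FT_PER_M = 1.0 / 0.3048

def critical_distance_ft(volume_ft3, rt60_s, q=1.0):
    """Approximate distance at which direct and reverberant levels are equal."""
    volume_m3 = volume_ft3 * FT3_TO_M3
    dc_m = 0.057 * math.sqrt(q * volume_m3 / rt60_s)   # statistical-acoustics approximation
    return dc_m * FT_PER_M

def total_speech_level(distance_ft, dc_ft, ref_level_db=70.0):
    """Direct level (6 dB rule) combined with a distance-independent reverberant level."""
    direct = ref_level_db - 20.0 * math.log10(distance_ft)
    reverberant = ref_level_db - 20.0 * math.log10(dc_ft)   # equals the direct level at Dc
    return 10.0 * math.log10(10 ** (direct / 10) + 10 ** (reverberant / 10))

dc = critical_distance_ft(30 * 20 * 9, rt60_s=0.5, q=3.5)   # the room of Figure 2
print(f"critical distance: about {dc:.1f} ft")              # about 6 ft
for d in (2, 6, 12, 24):
    print(f"{d:2d} ft: {total_speech_level(d, dc):.1f} dB SPL")
```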
Early and late reverberation
When considering the effects of reverberation on speech perception, it is
important to distinguish between early and late components. The early
components of reverberation (more commonly referred to as early reflections)
arrive at the listener's ear soon enough after the original sound was generated to
enhance both audibility and intelligibility. In contrast, late reverberation arrives at
the listener's ear too long after the original sound to be integrated with the direct
sound or with the early components of reverberation. Moreover, it
interferes with the recognition of subsequent sounds. The effect of late
reverberation is illustrated by the sound spectrograms of Figure 3. The upper
panel shows the spectrogram of a short phrase without any reverberation. The
lower panel shows the spectrogram of the same phrase subjected to
reverberation, with a reverberation time of 0.5 seconds. In other words, this
spectrogram illustrates the speech signal, as it would be received by a child
sitting in the last three rows in Figure 2. Note how the sound patterns associated
with one speech sound intrude into the next.
[Figure 3 about here. Spectrograms of the phrase "Mary had a little lamb", with phoneme labels marked. Axes: frequency in kHz (0 to 10) versus time in seconds (0 to 1.3).]
Figure 3. Spectrograms of a short phrase without reverberation (upper panel) and
after reverberation (lower panel). The reverberation time is 0.5 seconds. The
intensity range between black and white is 30 dB.
Because they interfere with intelligibility, the late components of reverberation
are equivalent to noise. In a very real sense, the speech signal generates its own
masking noise. It can be shown that the effective signal-to-noise ratio in
reverberant speech falls linearly with the logarithm of the reverberation time, as
illustrated in Figure 4. If we assume that the effective signal-to-noise ratio needs
to be 15 dB for full audibility of the useful information in the reverberant speech
signal, it will be seen that this criterion is met only for reverberation times below
about 0.2 seconds. This conclusion applies to listeners who are so far from the
talker that the contribution of the direct speech signal is negligible (i.e., 3 or more
times the critical distance). Listeners who are closer than this will gain additional
advantage from the direct speech signal.
[Figure 4 about here. Axes: effective signal-to-noise ratio in dB (-15 to +15) versus RT60 in seconds (0.1 to 10, logarithmic scale).]
Figure 4. Estimated effective signal-to-noise ratio, as a function of reverberation
time, for the reverberant speech signal (i.e., with no contribution from
the direct speech signal). The broken line shows the signal-to-noise
criterion for full access to the useful acoustic information.
Self-masking in the reverberant speech signal places a limit on its intelligibility.
Based on empirical data from Peutz, the percent phoneme recognition error in
consonant-vowel-consonant words can be assumed to be about nine times the
reverberation time in seconds (Peutz, 1997). Thus, the condition illustrated by the
lower spectrogram of Figure 3 should cause a phoneme recognition error in
isolated monosyllables of around 4.5%. When this amount is added to the
residual phoneme recognition error of around 1.5% typically observed under
ideal circumstances, the total is 6%, giving a phoneme recognition score of 94%.
The data of Peutz are based on the recognition of consonants in Dutch but the
rule of thumb works quite well for the recognition of English phonemes in CVCs.
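Peutz's rule of thumb lends itself to a back-of-the-envelope calculation. The sketch below simply restates the arithmetic of the preceding paragraph, with the 1.5% residual error taken from the text:

```python
def phoneme_score_in_reverberation(rt60_s, residual_error_pct=1.5):
    """Percent phonemes correct: error is about 9 x RT60 (Peutz), plus the residual error."""
    error_pct = 9.0 * rt60_s + residual_error_pct
    return 100.0 - error_pct

print(phoneme_score_in_reverberation(0.5))   # 94.0, the condition of Figure 3
print(phoneme_score_in_reverberation(2.0))   # 80.5, e.g. a reflective gymnasium
```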
Noise
Potential sources of actual noise (i.e., other than the speech itself) are
numerous and have both internal and external origins. Sound from external
sources can be air-borne or structure-borne. Some of the most common sources
are air and road traffic, heating, ventilating and air conditioning, external human
activity (including speech), and internal human activity (also including speech).
The total effective noise signal is a combination of actual noise and late
reverberation. The effect of the actual noise can be considered negligible if its
level is 10 dB or more below that of the late reverberation. Similarly, the effect of
late reverberation can be considered negligible if its level is 10 dB or more below
that of the actual noise.
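Because levels combine as powers rather than as decibel values, the 10 dB criterion is easy to verify numerically. A minimal sketch of the combination rule:

```python
import math

def combined_level_db(level_a_db, level_b_db):
    """Power-sum of two uncorrelated sound levels, expressed in dB."""
    return 10.0 * math.log10(10 ** (level_a_db / 10) + 10 ** (level_b_db / 10))

print(round(combined_level_db(40.0, 40.0), 1))   # 43.0: equal components add 3 dB
print(round(combined_level_db(40.0, 30.0), 1))   # 40.4: a component 10 dB down adds less than 0.5 dB
```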
Effective signal-to-noise ratio
We are now in a position to define the effective signal-to-noise ratio for an
individual listening to speech in a room. The effective signal is the combination of
direct speech and early reverberation. The effective noise is the combination of
actual noise and late reverberation. The effective signal-to-noise ratio is the
decibel difference between the two.
It is important in this context to note that noise measurements in a classroom
do not take account of late reverberation. As a result, empirical measurements of
signal-to-noise ratio can be quite misleading. It would be possible, for example,
to measure a good signal-to-noise ratio in a quiet but highly reverberant room
and to conclude, erroneously, that the conditions are good for speech perception.
Speech Audibility Index (SAI)
If the listener is to have access to all of the useful information in the speech
signal, the effective signal-to-noise ratio at each frequency needs to be at least
15 dB. This will place the short-term speech peaks (which are 15 dB above the
average level) at least 30 dB above the effective noise. Anything less than this
will reduce the available information until, at an effective signal-to-noise ratio of
-15 dB, the short-term speech peaks will become inaudible and the available
information will be zero. In order to simplify the evaluation of telecommunication
systems, early researchers developed the Articulation Index, which specifies the
proportion of the useful acoustic information available to the listener (French and
Steinberg, 1947; Fletcher, 1953; ANSI, 1995). The Speech Intelligibility Index
(SII) is a modified version of the Articulation index (ANSI, 2002a). Neither of
these metrics, however, accounts for the effects of the late components of
reverberation. For this reason, I am using an alternative term – Speech Audibility
Index.
Speech Audibility Index (SAI) is defined here as the proportion of the useful
speech signal (direct speech plus early reverberation) that is above the level of
the effective noise (actual noise plus late reverberation). Speech Audibility Index
(SAI) is similar to the Speech Transmission Index (STI) (Steeneken and
Houtgast, 1973). STI, however, accounts for both noise and reverberation in
terms of changes in the amplitude envelope of speech.
As in basic Articulation Index theory it may be assumed that the useful speech
information in any frequency band is uniformly distributed over a range of 30 dB,
from 15 dB below, to 15 dB above the average - as indicated in Figure 1. Thus,
the contribution of a given frequency band to Speech Audibility Index rises from 0
to its maximum value as the effective signal-to-noise ratio in that band rises from
-15 to +15 dB. When the signal-to-noise ratio reaches 15 dB in all significant
frequency bands, the Speech Audibility Index is 1 or 100%. If we assume that the
signal-to-noise ratio is the same in all frequency bands, then Speech Audibility
Index is given by:
SAI = (sn + 15)/30    (1)
Where:
SAI = Speech Audibility Index with limits of 0 and 1, and
sn = the overall decibel difference between the useful speech signal and
the effective noise
Note that signal-to-noise ratios of -15 dB, 0 dB, and +15 dB give Speech
Audibility Indices of 0%, 50%, and 100%, respectively. The assumption, here, is
that both speech and noise are measured in terms of long-term average or Leq. If
speech level is measured using the instantaneous setting of a sound level meter,
the average vowel peaks will be some 5 dB above the long-term average level,
and a measured signal-to-noise ratio of 20 dB would then be needed for a
Speech Audibility Index of 100%.
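Equation (1), with its limits applied, amounts to the following sketch (illustrative only; the optional 5 dB correction reflects the note about peak readings from a sound level meter):

```python
def speech_audibility_index(sn_db, peak_measured=False):
    """Speech Audibility Index from the effective signal-to-noise ratio, equation (1)."""
    if peak_measured:
        sn_db -= 5.0          # vowel peaks read ~5 dB above the long-term average level
    sai = (sn_db + 15.0) / 30.0
    return min(max(sai, 0.0), 1.0)   # limits of 0 and 1

for snr in (-15, 0, 15):
    print(snr, speech_audibility_index(snr))               # 0.0, 0.5, 1.0
print(20, speech_audibility_index(20, peak_measured=True))   # 1.0 for a 20 dB meter reading
```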
Predicting Speech Intelligibility from Speech Audibility Index
i) Phoneme recognition.
Phoneme recognition can be predicted from Speech Audibility Index using
probability theory. The underlying assumption is that each portion of the
30 dB range makes an independent contribution to the probability
of recognition. For present purposes, we also will assume that the
effective signal-to-noise ratio is constant across frequency. In other words,
this is a single-band implementation of the model. The results are shown
in Figure 5. Also shown in Figure 5 are empirical data obtained from
normally hearing adults listening to consonant-vowel-consonant words in
steady-state noise that was spectrally matched to the long-term average
spectrum of the speech. These data were obtained using CASPA software
(Mackersie, Boothroyd, and Minnear, 2001). Because the noise was
spectrally matched to the speech of the talker used for testing, the signal-to-noise
ratio was the same for all frequency bands. This spectral
matching is the reason for the steepness of the performance vs. intensity
function. When listening in other noises, such as white noise, pink noise,
speech-shaped noise, or multi-talker babble, the signal-to-noise ratio
usually varies with frequency and the slope of the performance vs.
intensity function is less than is shown here.
[Figure 5 about here. Axes: phoneme recognition probability (0 to 1, or 0 to 100%) versus signal-to-noise ratio in dB (-20 to +20, bottom axis) and Speech Audibility Index (top axis). The fitted curve is y = (1 - 0.0054^((x+15)/30))^1.47, where x is the signal-to-noise ratio in dB.]
Figure 5. Measured and predicted phoneme recognition, in consonant-vowel-consonant
words, as a function of signal-to-noise ratio (bottom axis)
and Speech Audibility Index (top axis). Data points are means for
eight normally hearing adults listening in steady-state, spectrally-matched
noise. The equation for the curve is derived from probability
theory.
ii) Recognition of CVC words in isolation
In previous studies (Boothroyd, 1985; Boothroyd and Nittrouer, 1988), it
has been shown that the recognition probability of whole consonant-vowel-consonant
syllables can be predicted from the recognition probability of the constituent
phonemes by the equation:
w = p^j    (2)
where:
w = syllable recognition probability,
p = phoneme recognition probability and
j is a dimensionless exponent representing the effective number of
independently perceived phonemes per syllable.
In nonsense syllables, or highly unfamiliar words, each phoneme in a word
must be perceived independently if the word is to be perceived correctly.
The resulting prediction that j = 3.0 for consonant-vowel-consonant
syllables has been confirmed experimentally. When normally hearing
adults listen to meaningful consonant-vowel-consonant words, however,
the value of j drops to between 2.0 and 2.5, reflecting the fact that
recognition of one phoneme in a word increases the probability of
recognition of the others. This effect is illustrated in Figure 6, which shows
recognition for unfamiliar words (j = 3.0) and familiar words (j = 2.0) as
functions of Speech Audibility Index and effective signal-to-noise ratio.
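Equation (2) can be illustrated in a few lines. The phoneme probability used below is an arbitrary example value, not a datum from the study:

```python
def word_probability(p, j):
    """Whole-word recognition probability from phoneme probability, equation (2): w = p^j."""
    return p ** j

p = 0.90   # example phoneme recognition probability
print(f"unfamiliar CVC words (j = 3.0): {word_probability(p, 3.0):.2f}")   # 0.73
print(f"familiar CVC words   (j = 2.0): {word_probability(p, 2.0):.2f}")   # 0.81
```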
[Figure 6 about here. Curves: familiar words and unfamiliar words. The annotations mark a difference of 13 percentage points, equivalent to about 2 dB. Axes: word recognition probability (0 to 1, or 0 to 100%) versus effective signal-to-noise ratio in dB (-20 to +20, bottom axis) and Speech Audibility Index (top axis).]
Figure 6. Predicted recognition of familiar and unfamiliar consonant-vowel-consonant
words, as functions of signal-to-noise ratio (bottom axis) and
Speech Audibility Index (top axis).
The point needing emphasis here is that classroom communication
automatically involves the presentation of unfamiliar vocabulary. Listening
conditions that are adequate for the recognition of familiar words may not
be adequate for the recognition of unfamiliar words, which remain
nonsense until given meaning in the learning process. It will be seen that
the difference between familiar and unfamiliar words results in a difference
of recognition probability that, for normally hearing listeners, can be as
high as 13 percentage points - equivalent to a change in signal-to-noise
ratio in the region of 2 dB. The effect will be even greater for words
containing more than three phonemes.
iii) Recognition of words in sentence context
One can predict recognition probability for words in context from that for
CVC words in isolation using the following equation (Boothroyd, 1985;
Boothroyd and Nittrouer, 1988):
ws = 1 - (1 - wi)^k    (3)
Where:
ws = recognition probability for words in sentences,
wi = recognition probability for CVC words in isolation and
k = a dimensionless exponent reflecting the effect of sentence
context.
The value of k is determined by a variety of factors. These include the
length, complexity, syntactic structure and meaning of the sentence and
the language knowledge, world knowledge and processing skills of the
listener (Boothroyd, 2002). In Articulation Index theory, the exponent k
would be referred to as a proficiency factor. It can be thought of as
equivalent to a proportional increase in the number of independent
channels of information. Consider, for example, the frequency spectrum
divided into many equally important bands. The addition of sentence
context when listening via a single band would increase word recognition
by the same amount as listening via k bands, but without sentence
context.
By combining equations (1) through (3), we can predict word
recognition in sentences as a function of effective signal-to-noise ratio.
The results are shown in the upper panel of Figure 7. The solid line uses
values of j = 2.0 and k = 7, representing familiar words in simple
sentences. The broken line uses values of j = 3.0 and k = 2, representing
unfamiliar words in complex sentences. It will be seen from the upper
panel of Figure 7 that the effects of sentence complexity and/or the
listener's world and language knowledge can have an enormous effect on
recognition in poor acoustic conditions. In this example, a normally
hearing adult could achieve 95% word recognition in casual conversation
under conditions that give only 36% word recognition to a child trying to
follow new and difficult material. The child would need a 9 dB
improvement in effective signal-to-noise ratio in order to match the adult's
performance. This kind of discrepancy can lead to erroneous conclusions
by adults about the adequacy of inferior classroom acoustics.
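A sketch of the chain from phoneme probability to words in sentences, using equations (2) and (3), is given below. It is illustrative only: the phoneme probability is an assumed input (in the article it comes from the curve of Figure 5), and the (j, k) pairs are those quoted above. The example value of 0.59 is chosen because it approximately reproduces the 95% versus 36% contrast just described.

```python
def words_in_sentences(p, j, k):
    """Phoneme probability -> isolated CVC words (eq. 2) -> words in sentences (eq. 3)."""
    w_isolated = p ** j                       # equation (2): w = p^j
    return 1.0 - (1.0 - w_isolated) ** k      # equation (3): ws = 1 - (1 - wi)^k

p = 0.59   # assumed phoneme recognition probability for some listening condition
print(f"simple sentences, familiar words    (j = 2, k = 7): {words_in_sentences(p, 2.0, 7.0):.2f}")
print(f"complex sentences, unfamiliar words (j = 3, k = 2): {words_in_sentences(p, 3.0, 2.0):.2f}")
# roughly 0.95 and 0.37
```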
[Figure 7 about here. Each panel plots word recognition probability (0 to 1, or 0 to 100%) against signal-to-noise ratio in dB (-20 to +20, bottom axis) and Speech Audibility Index (top axis), with separate curves for simple sentences with familiar words and complex sentences with unfamiliar words. Upper panel (normal hearing): annotations of 11.5 dB and 38 percentage points. Lower panel (50 dB unaided hearing loss, plus amplification): annotations of 6.5 dB and 36 percentage points.]
Figure 7. Predicted recognition of words in simple and complex sentences as a
function of signal-to-noise ratio (bottom axis) and Speech Audibility Index
(top axis). The upper panel applies to persons with normal hearing. The
lower panel applies to a hypothetical person with a 50 dB sensorineural
hearing loss.
iv) Effect of sensorineural hearing loss
So far, all of the analyses have assumed normal peripheral auditory
function. Clearly, individuals with sensorineural hearing loss have speech
perception difficulties over and above those caused by poor listening
conditions. These effects cannot be modeled precisely with existing
knowledge. On average, however, it can be assumed that individuals with
uncomplicated sensorineural damage lose about 1 percentage point in
aided phoneme recognition for every decibel of unaided three-frequency-average
loss in excess of 20 dB. This approximate relationship is derived
from clinical experience and a variety of research studies (e.g., Boothroyd,
1984), but it does not take account of audiogram slope or deficits of
language, attention or processing. The effect of this correction on the
prediction of word recognition in sentences is shown in the lower panel of
Figure 7. The assumption is of a person with a flat 50 dB sensorineural
hearing loss. It is predicted that this individual needs a 6 dB increase of
effective signal-to-noise ratio, relative to a person with normal hearing, in
order to meet a 95% criterion for word recognition in simple sentences.
When listening to unfamiliar words in complex sentences, however, this
criterion will only provide about 36% recognition. At least another 10 dB
increase in effective signal-to-noise ratio will be needed to bring this
individual close to her optimum word recognition score in complex
sentences and, even then, the score will only be around 55%.
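The 1-percentage-point-per-decibel correction can be expressed compactly. The sketch below is a rough rendering of the rule of thumb, not the full model behind Figure 7:

```python
def aided_phoneme_score(score_with_normal_hearing_pct, unaided_pta_db):
    """Subtract ~1 percentage point of aided phoneme recognition per dB of
    unaided three-frequency-average loss in excess of 20 dB."""
    penalty = max(0.0, unaided_pta_db - 20.0)
    return max(0.0, score_with_normal_hearing_pct - penalty)

print(aided_phoneme_score(94.0, 50.0))   # 64.0: a flat 50 dB loss costs 30 percentage points
print(aided_phoneme_score(94.0, 20.0))   # 94.0: no correction at or below 20 dB
```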
The lower panel of Figure 7 illustrates the serious challenge faced by
children with hearing loss, including those using cochlear implants, when
trying to follow complex instructional material in the mainstream setting. It
also illustrates how easy it can be to underestimate this challenge on the
basis of observations of the child's ability to understand simple material in
familiar everyday contexts.
Practical Implications
The obvious implication of the foregoing is that the effective speech-to-noise
ratio in classrooms must be high if the occupants are to have adequate access to
the acoustic information in the speech of teachers and classmates. In other
words, the combination of direct speech and the early components of
reverberation should be high in relation to the combination of noise and the late
components of reverberation.
It is not clear, however, that one needs to aim for a Speech Audibility Index
of 100%, which could require a noise level of 20 dBA or less and a reverberation
time of 0.2 seconds or less (conditions one might expect in a recording studio or
an audiological test booth). The speech signal is highly redundant, both
acoustically and linguistically. In other words, the same information is often
available from more than one spectral or temporal location in the signal. Because
of this redundancy, excellent levels of speech perception are usually attainable
with less than full access. A reasonable target for Speech Audibility Index, even
for complex materials, can be as low as 70 to 75% (or an effective signal-to-noise
ratio of 6 to 7 dB) – as is evident from Figures 5 through 7.
It must be stressed that redundancy is a relative term. The redundancy in
speech is highly dependent on the language material and on the auditory,
cognitive and linguistic status of the listener. What is acceptable for a given
listener and a given situation may be unacceptable for a different listener and/or
a different situation. Acoustical criteria need to be especially stringent for young
children, children listening in a non-native language, and children with deficits of
hearing, cognition, language, attention, auditory processing or language
processing. Because any classroom may contain one or more such children, it is
reasonable to demand a stringent criterion for all.
The recently promulgated American standard for new or refurbished
classrooms calls for noise levels to be 35 dBA or lower when the room is
unoccupied. Reverberation times are to be 0.6 seconds or lower in small-to-medium-sized
classrooms and 0.7 seconds or lower in large classrooms when the rooms are
unoccupied (ANSI, 2002b). The reverberation criteria are, perhaps,
not as stringent as they could be. The American Speech-Language Hearing
Association (1995) recommends a reverberation time of 0.4 seconds or less for
an occupied classroom containing children with hearing loss. This criterion
translates into approximately 0.45 seconds or less for an unoccupied room. The
new ANSI standard, however, does provide a reasonable compromise between
the ideal and the affordable.
When the ANSI criteria are applied to a room of the size illustrated in Figure
2, they translate into a Speech Audibility Index in the region of 70% for students
who are farthest from the teacher. Some correction is needed, however, because
the presence of the students actually lowers the reverberation. Twenty or 25
students in a room of this size might lower reverberation time by 0.05 seconds to
0.55 seconds. This change would increase Speech Audibility Index to around
72% for students at the back of the room. On the other hand, the students are
also a potential source of background noise. The magnitude of this noise will
depend on a variety of factors, including classroom discipline. If we assume,
however, that the occupied noise level rises to 45 dBA, with the students present,
then the Speech Audibility Index for those at the back of the room will fall to
around 66%, which may be only marginally adequate for the reception of
unfamiliar words in complex sentences (see Figure 7). Unfortunately, no physical
design standard for room acoustics can adequately address the issue of noise
generated by the intended listeners.
A second implication of the material presented here is that decisions about
the need for, and success of, acoustic modifications should be based on
acoustical measurements and not on the apparent ease of everyday
conversation between proximate adults. If administrators need data with
ecological validity, older children can be given a simple open-set dictation test,
using monosyllabic words.
Sound-field amplification is often suggested as a cost-effective substitute for
acoustical treatment. A microphone is placed a few inches from the mouth of the
teacher where it picks up a signal with excellent effective signal-to-noise ratio.
This signal is then distributed to one or more strategically placed loudspeakers.
Sound-field amplification can be very beneficial when the primary problem is
ambient noise, because it increases the level of the speech signal without
increasing the noise. In addition, sound-field amplification can offset the negative
effects of distance. But this technology is less effective when the primary problem
is reverberation. Under this condition, any increase in speech level produces an
identical increase in the level of the late reverberation, and the net gain in
effective signal-to-noise ratio, for children who are not close to a loudspeaker, is zero.
In fact, the presence of several loudspeakers in the room can actually increase
the level of late reverberation for children who are not close to a loudspeaker.
This is not to say that sound-field amplification is useless in reverberant
conditions. Directional loudspeaker arrays can increase the ratio of direct to
reverberant sound and children sitting close to a loudspeaker will enjoy improved
perception. The extreme instance of this last approach is the desk-mounted
loudspeaker. Because the child is close to the loudspeaker, the volume can be
kept low so as not to increase reverberation for other children. Of course, this
approach only helps the child with the loudspeaker.
It is clear that the first step in dealing with poor room acoustics should be
the installation of sound absorption to reduce reverberation time to acceptable
levels. When this has been done, a sound-field system can be an effective way
both to improve signal-to-noise ratio and to counteract the effects of distance – at
least for the speech of the person with the microphone. If, for any reason,
reverberation cannot be lowered to appropriate levels, any attempt to improve
listening conditions with sound-field amplification requires extreme care in
selection, installation, and adjustment.
For the child who is wearing a hearing aid or cochlear implant, there is the
option of a wireless link (usually FM) from a remote teacher microphone to the
sensory aid itself. An FM amplification system is, in fact, the most effective way
to enhance Speech Audibility Index – at least for the speech of the person with
the microphone. With that microphone only a few inches from the talker’s mouth,
the signal level and signal-to-noise ratios could be increased by some 15 dB for
the child at the back of the room illustrated in Figure 2. This assumes, however,
that the microphone in the hearing aid or implant has been deactivated. While
deactivation of the local (also known as environmental) microphone may be
appropriate for a college student listening to a lecture, it is not appropriate for
younger children with hearing loss who are in primary or secondary education.
Activation of the local microphone is critical for auditory feedback of self-generated
speech and for hearing the comments and responses of fellow
students. As soon as this microphone is turned on, however, the noise and late
reverberation that it picks up are in danger of eliminating some or all of the
benefits of the remote microphone. Careful adjustment of the relative gains via
the two microphones is essential if this problem is to be avoided (American
Speech-Language Hearing Association, 2002).
Room acoustics is a complex, multidisciplinary topic with serious
ramifications. The consequences of poor acoustics have been known for years,
as have the solutions (for an excellent review, see Crandell and Smaldino, 2000).
Nevertheless, many students are expected to listen and learn in rooms with poor
acoustics. This is the equivalent of expecting them to read and learn in darkened
rooms using poor Xerox copies of their texts. The contributions of knowledgeable
Educational and Rehabilitative Audiologists are essential as we continue to work
towards the goal of an acoustically viable learning environment for all children.
For Additional Information
The analyses developed in this paper are incorporated into sound-field simulation
software developed by the author for Phonic Ear Inc. This software (Sound-field
Wizard) may be downloaded, free of charge, from www.phonicear.com or from
www.arthurboothroyd.com.
References
American National Standards Institute, (1995). American national standard
method for measuring the intelligibility of speech over communications
systems. ANSI S3.2-1989 (R 1995).
American National Standards Institute, (2002a). American national standard
methods for calculation of the speech intelligibility index. ANSI S3.5-1997 (R
2002).
American National Standards Institute, (2002b). Acoustical performance criteria,
design requirements, and guidelines for classrooms. ANSI S12.60-2002.
American Speech-Language Hearing Association, (1995, March). Acoustics in
educational settings: position statement and guidelines. ASHA, 37, (suppl. 14),
pp. 15-19.
American Speech-Language Hearing Association, (2002). Guidelines for fitting
and monitoring FM systems. ASHA Desk Reference, Volume II, pp 151-171.
Boothroyd, A. (1984). Auditory perception of speech contrasts by subjects with
sensorineural hearing loss. Journal of Speech and Hearing Research, 27, 134-144.
Boothroyd, A. (1985). Evaluation of speech production in the hearing-impaired:
some benefits of forced-choice testing. Journal of Speech and Hearing Research,
28, 185-196.
Boothroyd, A. (2002). Influence of context on the perception of spoken language.
In: Proc. Congreso Internacional de Foniatría, Audiología, Logopedia y
Psicología del lenguaje. Universidad Pontificia de Salamanca.
Boothroyd, A., and Nittrouer, S. (1988). Mathematical treatment of context effects
in phoneme and word recognition. Journal of the Acoustical Society of America,
84, 101-114.
Boothroyd, A., Erickson, F., & Medwetsky, L. (1994). The hearing aid input: a
phonemic approach to assessing the spectral distribution of speech. Ear and
Hearing, 15, 432-442.
Cox, R.M. and Moore, J.R. (1988). Composite speech spectrum for hearing aid
gain prescriptions. Journal of Speech and Hearing Research, 31, 102-107.
Crandell, C.C. and Smaldino, J.J. (2000). Classroom acoustics for children with
normal hearing and with hearing impairment. Language, Speech, and Hearing
Services in Schools, 31, 362-370.
Davis, D. and Davis, C. (1997). Sound system engineering (second edition).
Newton, MA: Focal Press.
Fletcher, H. (1953). Speech and hearing in communication. New York: Van
Nostrand. (Available in the ASA edition, edited by Jont Allen and published by
the Acoustical Society of America in 1995).
French, N.R. and Steinberg, J.C. (1947). Factors governing the intelligibility of
speech sounds. Journal of the Acoustical Society of America, 19, 90-119.
Mackersie, C.L., Boothroyd, A., and Minnear, D. (2001). Evaluation of the
Computer-Assisted Speech Perception Test (CASPA). Journal of the American
Academy of Audiology. 12, 390-396.
Peutz, V. (1997), Speech recognition and information. Appendix 10 in: Davis, D.
and Davis, C. (1997). Sound system engineering (second edition), pp. 639-644.
Newton, MA: Focal Press.
Steeneken, H.J.M. and Houtgast, T. (1973). The modulation transfer function in
room acoustics as a predictor of speech intelligibility, Acustica, 28, 66-73.