Room acoustics and speech perception
Prepared for Seminars in Hearing
Arthur Boothroyd, Ph.D.
Distinguished Professor Emeritus, City University of New York
Scholar in Residence, San Diego State University
Visiting Scientist, House Ear Institute
Contact Information:
Arthur Boothroyd
2550 Brant Street, San Diego, CA 92101
(619) 231 7948 (Voice and FAX)
(619) 392 1740 (Mobile)
aboothroyd@cox.net
www.arthurboothroyd.com
Acknowledgement
Preparation of this article was supported by NIDRR grant number H133E010107.
Key Words:
Classroom acoustics, room acoustics, speech perception, reverberation, sound-field amplification, FM amplification.
Abbreviations
SAI - Speech Audibility Index
AI - Articulation Index
SII - Speech Intelligibility Index
STI - Speech Transmission Index
RT - Reverberation time
CVC - Consonant-vowel-consonant
dB - Decibel
Hz - Hertz
SPL - Sound pressure level
LTASS - Long-term average speech spectrum
CASPA - Computer-assisted speech perception assessment
FM - Frequency modulation
ANSI - American National Standards Institute
ASHA - American Speech-Language Hearing Association
Learning Outcomes: On completion of this article the reader will understand (1)
the variables that need to be considered in evaluating room acoustics and (2) the
effects of these variables on speech perception in a classroom.
Abstract
The acoustic speech signal received by a listener is a function of the source,
distance, early reverberation, late reverberation, and noise. Specifically, it
depends on the Speech Audibility Index, which is defined, here, as the proportion
of the combined direct speech and early reverberation (also known as early
reflections) whose level is above that of the combined noise and late
reverberation. Speech Audibility Index rises from 0 to 100% as the effective
signal-to-noise ratio rises from -15 to +15 dB. Both reverberation and ambient
noise need to be low in order to maintain Speech Audibility Index at an optimal
level. Speech Audibility Index can be used to predict various measures of speech
perception, but the results are highly dependent on the complexity of the
language and the characteristics of the listener. Conditions that are tolerable for
normally hearing adults in casual conversation can be difficult for adults and
children in learning situations, and intolerable for persons with deficits of hearing,
language, attention or processing. Sound-field amplification can improve Speech
Audibility Index for all listeners in a noisy room. It offers less benefit when the
primary problem is reverberation and, if improperly installed, can make the
reverberation problem worse. There is no good substitute for reverberation
control. Audiologists have an important contribution to make in the identification
and resolution of continuing inadequacies of classroom acoustics.
Introduction
Room acoustics have a major effect on the transmission of speech sounds
from talker to listener. Four principal factors are involved: distance, early
reverberation, late reverberation, and noise. The present paper outlines the
effects of these factors on the reception and perception of speech.
The initial speech signal
Before examining what happens to speech in a room, it is important to define
the original acoustic signal.
i) Long-term average level
For present purposes, I will consider the original acoustic signal to be
that measured at 1 foot from the lips. At this distance, the long-term
speech level of a typical talker, averaged over 10 or 20 seconds, is
around 70 dB SPL. It is important to remember, however, that this value
is summed across frequency and averaged over time.
ii) Long-term average spectrum
The heavy line in Figure 1 shows the long-term level of a 12 second
speech sample measured in 1/3-octave bands. This is the Long-Term Average
Speech Spectrum or LTASS. The level is highest in the low-frequency bands,
and falls at the rate of around 6 dB per octave at frequencies above 500 Hz
(see, also, Cox and Moore, 1988; Boothroyd, Erickson, and Medwetsky, 1994).
[Figure 1 about here. Plotted quantities: long-term broad-band rms level; long-term 1/3-octave rms level; measured 1/3-octave peak level; and the idealized short-term (50 ms) range. Axes: speech level in dB SPL versus frequency in Hz (125 to 8000 Hz).]
Figure 1. One-third octave spectral analysis of a 12 second sample of male
speech measured at a distance of 1 foot. The shaded area
extends from 15 dB below to 15 dB above the long-term average
speech spectrum (LTASS) and indicates the approximate
distribution of useful acoustic information.
It is a characteristic of the acoustic speech signal that most of the energy
(and, therefore, the loudness) is carried in the lower frequencies - below
1000 Hz (i.e., the region covered by the first vocal-tract formant). Most of
the intelligibility, however, is carried in the weaker, higher frequencies -
between 1000 and 3000 Hz (i.e., the region covered by the second vocal-tract
formant). Note that, because the overall level is summed across frequency,
it is some 7 dB higher than the average level in the low-frequency bands.
iii) Short-term variation
When the speech signal in each frequency band is measured over short
time intervals, similar to the integration time of the human ear (50 to 100
msec), the level varies over a range of approximately 30 dB from 15 dB
below the long-term average to 15 dB above it. The shaded area in
Figure 1 represents this range. Note that, in any given band, the
difference between the level at which speech is just audible, and the
level at which the listener receives all of the useful information, is
approximately 30 dB.
It will be seen from this analysis that the use of a single number to represent
speech level can be misleading. Much of the frequency-specific information in
speech is at levels well below the long-term average, especially in the higher
frequencies. Note, however, that, for the normally hearing listener, some of the
high-frequency discrepancy measured in the sound field is offset by head-baffle
and ear-canal resonance effects.
The effects of distance on the direct speech signal
As the speech travels from the mouth of the talker, the acoustical energy is
spread over an increasingly large area and the average decibel level falls. To a
first approximation, this effect follows the 6 dB rule. That is, the average speech
level falls by 6 dB for every doubling of distance from the lips. If, for example, the
average level is 70 dB SPL at 1 foot, then it is 64 dB SPL at 2 feet, 58 dB SPL at
4 feet and so on. This relationship is illustrated by the broken curve (labeled
"Direct signal only") in Figure 2. In the open air, listeners receive only the direct
speech signal.
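A minimal Python sketch of the 6 dB rule, assuming only the 70 dB SPL reference level at 1 foot quoted above:

```python
import math

def direct_level(distance_ft, ref_level_db=70.0, ref_distance_ft=1.0):
    """Direct speech level under the 6 dB rule (level falls 6 dB per doubling of distance)."""
    return ref_level_db - 20.0 * math.log10(distance_ft / ref_distance_ft)

for d in (1, 2, 4, 8, 16):
    print(f"{d:2d} ft: {direct_level(d):.1f} dB SPL")
# prints roughly 70, 64, 58, 52, and 46 dB SPL
```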
Direct and reverberant sound
In enclosed spaces, however, listeners also receive speech via
reverberation. Reverberation refers to the persistence of sound in a room
because of multiple, repeated, reflections from the boundaries. During sound
generation, the reverberant sound is more or less uniformly distributed
throughout the room. The level of this reverberant sound in relation to the level of
the original source depends on the room size, the absorptive properties of its
boundaries and the directionality (also known as Q) of the source (Davis and
Davis, 1997). When the sound source stops, the reverberant sound level begins
to fall but it takes some time for it to become inaudible. The time taken for the
level to fall by 60 dB is known as the reverberation time (RT60). This quantity
provides a rough measure of the reverberant properties of a room. Reverberation
times in large, reflective spaces such as gymnasia can be as high as 2 or 3
seconds. In small classrooms with many absorbent surfaces (including the
surfaces of the students), reverberation times may be as low as 0.3 or 0.4 seconds.
At any point in the room, a listener receives both direct sound, whose level
follows the 6 dB rule, and reverberant sound, whose level is relatively
independent of distance. When the listener is close to the source, the level of the
direct sound exceeds that of the reverberant sound.
[Figure 2 about here. Curves: direct signal only; reverberation only; direct signal plus reverberation. The critical distance (6 ft) is marked, as are the regions in which reverberation is negligible and in which the direct signal is negligible. Axes: average speech level in dB SPL versus distance in feet (0 to 25 ft).]
Figure 2. Predicted long-term average speech level as a function of distance from the
talker in a room measuring 30x20x9 feet with a reverberation time of 0.5
seconds.
When the listener is far from the source, the reverberant sound dominates. The
critical distance is defined as the distance at which the levels of the direct and
reverberant sound are equal. At distances less than one third of the critical
distance, the direct sound is 10 dB or more stronger than the reverberant sound
and reverberation can generally be ignored. At distances greater than three times
the critical distance, the direct sound is 10 dB or more weaker than the
reverberant sound and the received signal can be considered entirely reverberant.
These points are illustrated in Figure 2, which shows total speech level
(direct plus reverberant) as a function of distance for a small room (30x20x9 feet)
with a relatively short reverberation time (0.5 seconds) and a talker with a Q (i.e.,
directionality) of 3.5. In this example, the estimated critical distance is 6 feet. It
will be seen that most of the listeners are receiving a mixture of direct and
reverberant speech. Those in the last three rows, however, are listening only to
the reverberant speech. Note that most of the listeners experience an increase in
received speech level because of reverberation. For children with elevated
sound-field thresholds, this increase may improve audibility. As will be seen in a
moment, however, the gain in audibility (i.e., reception) does not necessarily
translate into improved intelligibility (i.e., perception).
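The curves in Figure 2 can be approximated with standard statistical room-acoustics formulas. The Python sketch below is a rough reconstruction rather than the calculation used for the figure: it assumes the common approximation for critical distance, Dc = 0.057 sqrt(Q V / RT60) with the volume in cubic metres and Dc in metres, and treats the reverberant level as independent of distance. With the room and talker described above, it gives a critical distance of about 6 feet.

```python
import math

FT3_TO_M3 = 0.0283168   # cubic feet to cubic metres
FT_PER_M = 1.0 / 0.3048

def critical_distance_ft(volume_ft3, rt60_s, q=1.0):
    """Approximate distance at which direct and reverberant levels are equal."""
    volume_m3 = volume_ft3 * FT3_TO_M3
    dc_m = 0.057 * math.sqrt(q * volume_m3 / rt60_s)   # statistical-acoustics approximation
    return dc_m * FT_PER_M

def total_speech_level(distance_ft, dc_ft, ref_level_db=70.0):
    """Direct level (6 dB rule) combined with a distance-independent reverberant level."""
    direct = ref_level_db - 20.0 * math.log10(distance_ft)
    reverberant = ref_level_db - 20.0 * math.log10(dc_ft)   # equals the direct level at Dc
    return 10.0 * math.log10(10 ** (direct / 10) + 10 ** (reverberant / 10))

dc = critical_distance_ft(30 * 20 * 9, rt60_s=0.5, q=3.5)   # the room of Figure 2
print(f"critical distance: about {dc:.1f} ft")              # about 6 ft
for d in (2, 6, 12, 24):
    print(f"{d:2d} ft: {total_speech_level(d, dc):.1f} dB SPL")
```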
Early and late reverberation
When considering the effects of reverberation on speech perception, it is
important to distinguish between early and late components. The early
components of reverberation (more commonly referred to as early reflections)
arrive at the listener's ear soon enough after the original sound was generated to
enhance both audibility and intelligibility. In contrast, late reverberation arrives at
the listener's ear too long after the original sound to be integrated with the direct
sound or with the early components of reverberation. Moreover, it
interferes with the recognition of subsequent sounds. The effect of late
reverberation is illustrated by the sound spectrograms of Figure 3. The upper
panel shows the spectrogram of a short phrase without any reverberation. The
lower panel shows the spectrogram of the same phrase subjected to
reverberation, with a reverberation time of 0.5 seconds. In other words, this
spectrogram illustrates the speech signal, as it would be received by a child
sitting in the last three rows in Figure 2. Note how the sound patterns associated
with one speech sound intrude into the next.
[Figure 3 about here. Spectrograms of the phrase "Mary had a little lamb", with phoneme labels marked. Axes: frequency in kHz (0 to 10) versus time in seconds (0 to 1.3).]
Figure 3. Spectrograms of a short phrase without reverberation (upper panel) and
after reverberation (lower panel). The reverberation time is 0.5 seconds. The
intensity range between black and white is 30 dB.
Because they interfere with intelligibility, the late components of reverberation
are equivalent to noise. In a very real sense, the speech signal generates its own
masking noise. It can be shown that the effective signal-to-noise ratio in
reverberant speech falls linearly with the logarithm of the reverberation time, as
illustrated in Figure 4. If we assume that the effective signal-to-noise ratio needs
to be 15 dB for full audibility of the useful information in the reverberant speech
signal, it will be seen that this criterion is met only for reverberation times below
about 0.2 seconds. This conclusion applies to listeners who are so far from the
talker that the contribution of the direct speech signal is negligible (i.e., 3 or more
times the critical distance). Listeners who are closer than this will gain additional
advantage from the direct speech signal.
[Figure 4 about here. Axes: effective signal-to-noise ratio in dB (-15 to +15) versus RT60 in seconds (0.1 to 10, logarithmic scale).]
Figure 4. Estimated effective signal-to-noise ratio, as a function of reverberation
time, for the reverberant speech signal (i.e., with no contribution from
the direct speech signal). The broken line shows the signal-to-noise
criterion for full access to the useful acoustic information.
Self-masking in the reverberant speech signal places a limit on its intelligibility.
Based on empirical data from Peutz, the percent phoneme recognition error in
consonant-vowel-consonant words can be assumed to be about nine times the
reverberation time in seconds (Peutz, 1997). Thus, the condition illustrated by the
lower spectrogram of Figure 3 should cause a phoneme recognition error in
isolated monosyllables of around 4.5%. When this amount is added to the
residual phoneme recognition error of around 1.5% typically observed under
ideal circumstances, the total is 6%, giving a phoneme recognition score of 94%.
The data of Peutz are based on the recognition of consonants in Dutch but the
rule of thumb works quite well for the recognition of English phonemes in CVCs.
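Peutz's rule of thumb lends itself to a back-of-the-envelope calculation. The sketch below simply restates the arithmetic of the preceding paragraph, with the 1.5% residual error taken from the text:

```python
def phoneme_score_in_reverberation(rt60_s, residual_error_pct=1.5):
    """Percent phonemes correct: error is about 9 x RT60 (Peutz), plus the residual error."""
    error_pct = 9.0 * rt60_s + residual_error_pct
    return 100.0 - error_pct

print(phoneme_score_in_reverberation(0.5))   # 94.0, the condition of Figure 3
print(phoneme_score_in_reverberation(2.0))   # 80.5, e.g. a reflective gymnasium
```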
Noise
Potential sources of actual noise (i.e., other than the speech itself) are
numerous and have both internal and external origins. Sound from external
sources can be air-borne or structure-borne. Some of the most common sources
are air and road traffic, heating, ventilating and air conditioning, external human
activity (including speech), and internal human activity (also including speech).
The total effective noise signal is a combination of actual noise and late
reverberation. The effect of the actual noise can be considered negligible if its
level is 10 dB or more below that of the late reverberation. Similarly, the effect of
late reverberation can be considered negligible if its level is 10 dB or more below
that of the actual noise.
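Because levels combine as powers rather than as decibel values, the 10 dB criterion is easy to verify numerically. A minimal sketch of the combination rule:

```python
import math

def combined_level_db(level_a_db, level_b_db):
    """Power-sum of two uncorrelated sound levels, expressed in dB."""
    return 10.0 * math.log10(10 ** (level_a_db / 10) + 10 ** (level_b_db / 10))

print(round(combined_level_db(40.0, 40.0), 1))   # 43.0: equal components add 3 dB
print(round(combined_level_db(40.0, 30.0), 1))   # 40.4: a component 10 dB down adds less than 0.5 dB
```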
Effective signal-to-noise ratio
We are now in a position to define the effective signal-to-noise ratio for an
individual listening to speech in a room. The effective signal is the combination of
direct speech and early reverberation. The effective noise is the combination of
actual noise and late reverberation. The effective signal-to-noise ratio is the
decibel difference between the two.
It is important in this context to note that noise measurements in a classroom
do not take account of late reverberation. As a result, empirical measurements of
signal-to-noise ratio can be quite misleading. It would be possible, for example,
to measure a good signal-to-noise ratio in a quiet but highly reverberant room
and to conclude, erroneously, that the conditions are good for speech perception.
Speech Audibility Index (SAI)
If the listener is to have access to all of the useful information in the speech
signal, the effective signal-to-noise ratio at each frequency needs to be at least
15 dB. This will place the short-term speech peaks (which are 15 dB above the
average level) at least 30 dB above the effective noise. Anything less than this
will reduce the available information until, at an effective signal-to-noise ratio of
-15 dB, the short-term speech peaks will become inaudible and the available
information will be zero. In order to simplify the evaluation of telecommunication
systems, early researchers developed the Articulation Index, which specifies the
proportion of the useful acoustic information available to the listener (French and
Steinberg, 1947; Fletcher, 1953; ANSI, 1995). The Speech Intelligibility Index
(SII) is a modified version of the Articulation index (ANSI, 2002a). Neither of
these metrics, however, accounts for the effects of the late components of
reverberation. For this reason, I am using an alternative term – Speech Audibility
Index.
Speech Audibility Index (SAI) is defined here as the proportion of the useful
speech signal (direct speech plus early reverberation) that is above the level of
the effective noise (actual noise plus late reverberation). Speech Audibility Index
(SAI) is similar to the Speech Transmission Index (STI) (Steeneken and
Houtgast, 1973). STI, however, accounts for both noise and reverberation in
terms of changes in the amplitude envelope of speech.
As in basic Articulation Index theory it may be assumed that the useful speech
information in any frequency band is uniformly distributed over a range of 30 dB,
from 15 dB below, to 15 dB above the average - as indicated in Figure 1. Thus,
the contribution of a given frequency band to Speech Audibility Index rises from 0
to its maximum value as the effective signal-to-noise ratio in that band rises from
-15 to +15 dB. When the signal-to-noise ratio reaches 15 dB in all significant
frequency bands, the Speech Audibility Index is 1 or 100%. If we assume that the
signal-to-noise ratio is the same in all frequency bands, then Speech Audibility
Index is given by:
SAI = (sn + 15)/30    (1)
Where:
SAI = Speech Audibility Index with limits of 0 and 1, and
sn = the overall decibel difference between the useful speech signal and
the effective noise
Note that signal-to-noise ratios of -15 dB, 0 dB, and +15 dB give Speech
Audibility Indices of 0%, 50%, and 100%, respectively. The assumption, here, is
that both speech and noise are measured in terms of long-term average or Leq. If
speech level is measured using the instantaneous setting of a sound level meter,
the average vowel peaks will be some 5 dB above the long-term average level,
and a measured signal-to-noise ratio of 20 dB would then be needed for a
Speech Audibility Index of 100%.
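Equation (1), with its limits applied, amounts to the following sketch (illustrative only; the optional 5 dB correction reflects the note about peak readings from a sound level meter):

```python
def speech_audibility_index(sn_db, peak_measured=False):
    """Speech Audibility Index from the effective signal-to-noise ratio, equation (1)."""
    if peak_measured:
        sn_db -= 5.0          # vowel peaks read ~5 dB above the long-term average level
    sai = (sn_db + 15.0) / 30.0
    return min(max(sai, 0.0), 1.0)   # limits of 0 and 1

for snr in (-15, 0, 15):
    print(snr, speech_audibility_index(snr))               # 0.0, 0.5, 1.0
print(20, speech_audibility_index(20, peak_measured=True))   # 1.0 for a 20 dB meter reading
```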
Predicting Speech Intelligibility from Speech Audibility Index
i) Phoneme recognition.
Phoneme recognition can be predicted from Speech Audibility Index using
probability theory. The underlying assumption is that each portion of the
30 dB range makes an independent contribution to the probability
of recognition. For present purposes, we also will assume that the
effective signal-to-noise ratio is constant across frequency. In other words,
this is a single-band implementation of the model. The results are shown
in Figure 5. Also shown in Figure 5 are empirical data obtained from
normally hearing adults listening to consonant-vowel-consonant words in
steady-state noise that was spectrally matched to the long-term average
spectrum of the speech. These data were obtained using CASPA software
(Mackersie, Boothroyd, and Minnear, 2001). Because the noise was
spectrally matched to the speech of the talker used for testing, the signal-to-noise
ratio was the same for all frequency bands. This spectral
matching is the reason for the steepness of the performance vs. intensity
function. When listening in other noises, such as white noise, pink noise,
speech-shaped noise, or multi-talker babble, the signal-to-noise ratio
usually varies with frequency and the slope of the performance vs.
intensity function is less than is shown here.
[Figure 5 about here. Axes: phoneme recognition probability (0 to 1, or 0 to 100%) versus signal-to-noise ratio in dB (-20 to +20, bottom axis) and Speech Audibility Index (top axis). The fitted curve is y = (1 - 0.0054^((x+15)/30))^1.47, where x is the signal-to-noise ratio in dB.]
Figure 5. Measured and predicted phoneme recognition, in consonant-vowel-consonant
words, as a function of signal-to-noise ratio (bottom axis)
and Speech Audibility Index (top axis). Data points are means for
eight normally hearing adults listening in steady-state, spectrally-matched
noise. The equation for the curve is derived from probability
theory.
ii) Recognition of CVC words in isolation
In previous studies (Boothroyd, 1985; Boothroyd and Nittrouer, 1988), it
has been shown that the recognition probability of whole consonant-vowel-consonant
syllables can be predicted from the recognition probability of the constituent
phonemes by the equation:
w = p^j    (2)
where:
w = syllable recognition probability,
p = phoneme recognition probability and
j is a dimensionless exponent representing the effective number of
independently perceived phonemes per syllable.
In nonsense syllables, or highly unfamiliar words, each phoneme in a word
must be perceived independently if the word is to be perceived correctly.
The resulting prediction that j = 3.0 for consonant-vowel-consonant
syllables has been confirmed experimentally. When normally hearing
adults listen to meaningful consonant-vowel-consonant words, however,
the value of j drops to between 2.0 and 2.5, reflecting the fact that
recognition of one phoneme in a word increases the probability of
recognition of the others. This effect is illustrated in Figure 6, which shows
recognition for unfamiliar words (j = 3.0) and familiar words (j = 2.0) as
functions of Speech Audibility Index and effective signal-to-noise ratio.
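Equation (2) can be illustrated in a few lines. The phoneme probability used below is an arbitrary example value, not a datum from the study:

```python
def word_probability(p, j):
    """Whole-word recognition probability from phoneme probability, equation (2): w = p^j."""
    return p ** j

p = 0.90   # example phoneme recognition probability
print(f"unfamiliar CVC words (j = 3.0): {word_probability(p, 3.0):.2f}")   # 0.73
print(f"familiar CVC words   (j = 2.0): {word_probability(p, 2.0):.2f}")   # 0.81
```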
[Figure 6 about here. Curves: familiar words and unfamiliar words. The annotations mark a difference of 13 percentage points, equivalent to about 2 dB. Axes: word recognition probability (0 to 1, or 0 to 100%) versus effective signal-to-noise ratio in dB (-20 to +20, bottom axis) and Speech Audibility Index (top axis).]
Figure 6. Predicted recognition of familiar and unfamiliar consonant-vowel-consonant
words, as functions of signal-to-noise ratio (bottom axis) and
Speech Audibility Index (top axis).
The point needing emphasis here is that classroom communication
automatically involves the presentation of unfamiliar vocabulary. Listening
conditions that are adequate for the recognition of familiar words may not
be adequate for the recognition of unfamiliar words, which remain
nonsense until given meaning in the learning process. It will be seen that
the difference between familiar and unfamiliar words results in a difference
of recognition probability that, for normally hearing listeners, can be as
high as 13 percentage points - equivalent to a change in signal-to-noise
ratio in the region of 2 dB. The effect will be even greater for words
containing more than three phonemes.
iii) Recognition of words in sentence context
One can predict recognition probability for words in context from that for
CVC words in isolation using the following equation (Boothroyd, 1985;
Boothroyd and Nittrouer, 1988):
ws = 1 - (1 - wi)^k    (3)
Where:
ws = recognition probability for words in sentences,
wi = recognition probability for CVC words in isolation and
k = a dimensionless exponent reflecting the effect of sentence
context.
The value of k is determined by a variety of factors. These include the
length, complexity, syntactic structure and meaning of the sentence and
the language knowledge, world knowledge and processing skills of the
listener (Boothroyd, 2002). In Articulation Index theory, the exponent k
would be referred to as a proficiency factor. It can be thought of as
equivalent to a proportional increase in the number of independent
channels of information. Consider, for example, the frequency spectrum
divided into many equally important bands. The addition of sentence
context when listening via a single band would increase word recognition
by the same amount as listening via k bands, but without sentence
context.
By combining equations (1) through (3), we can predict word
recognition in sentences as a function of effective signal-to-noise ratio.
The results are shown in the upper panel of Figure 7. The solid line uses
values of j = 2.0 and k = 7, representing familiar words in simple
sentences. The broken line uses values of j = 3.0 and k = 2, representing
unfamiliar words in complex sentences. It will be seen from the upper
panel of Figure 7 that the effects of sentence complexity and/or the
listener's world and language knowledge can have an enormous effect on
recognition in poor acoustic conditions. In this example, a normally
hearing adult could achieve 95% word recognition in casual conversation
under conditions that give only 36% word recognition to a child trying to
follow new and difficult material. The child would need a 9 dB
improvement in effective signal-to-noise ratio in order to match the adult's
performance. This kind of discrepancy can lead to erroneous conclusions
by adults about the adequacy of inferior classroom acoustics.
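A sketch of the chain from phoneme probability to words in sentences, using equations (2) and (3), is given below. It is illustrative only: the phoneme probability is an assumed input (in the article it comes from the curve of Figure 5), and the (j, k) pairs are those quoted above. The example value of 0.59 is chosen because it approximately reproduces the 95% versus 36% contrast just described.

```python
def words_in_sentences(p, j, k):
    """Phoneme probability -> isolated CVC words (eq. 2) -> words in sentences (eq. 3)."""
    w_isolated = p ** j                       # equation (2): w = p^j
    return 1.0 - (1.0 - w_isolated) ** k      # equation (3): ws = 1 - (1 - wi)^k

p = 0.59   # assumed phoneme recognition probability for some listening condition
print(f"simple sentences, familiar words    (j = 2, k = 7): {words_in_sentences(p, 2.0, 7.0):.2f}")
print(f"complex sentences, unfamiliar words (j = 3, k = 2): {words_in_sentences(p, 3.0, 2.0):.2f}")
# roughly 0.95 and 0.37
```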
[Figure 7 about here. Each panel plots word recognition probability (0 to 1, or 0 to 100%) against signal-to-noise ratio in dB (-20 to +20, bottom axis) and Speech Audibility Index (top axis), with separate curves for simple sentences with familiar words and complex sentences with unfamiliar words. Upper panel (normal hearing): annotations of 11.5 dB and 38 percentage points. Lower panel (50 dB unaided hearing loss, plus amplification): annotations of 6.5 dB and 36 percentage points.]
Figure 7. Predicted recognition of words in simple and complex sentences as a
function of signal-to-noise ratio (bottom axis) and Speech Audibility Index
(top axis). The upper panel applies to persons with normal hearing. The
lower panel applies to a hypothetical person with a 50 dB sensorineural
hearing loss.
iv) Effect of sensorineural hearing loss
So far, all of the analyses have assumed normal peripheral auditory
function. Clearly, individuals with sensorineural hearing loss have speech
perception difficulties over and above those caused by poor listening
conditions. These effects cannot be modeled precisely with existing
knowledge. On average, however, it can be assumed that individuals with
uncomplicated sensorineural damage lose about 1 percentage point in
aided phoneme recognition for every decibel of unaided three-frequency-average
loss in excess of 20 dB. This approximate relationship is derived
from clinical experience and a variety of research studies (e.g., Boothroyd,
1984), but it does not take account of audiogram slope or deficits of
language, attention or processing. The effect of this correction on the
prediction of word recognition in sentences is shown in the lower panel of
Figure 7. The assumption is of a person with a flat 50 dB sensorineural
hearing loss. It is predicted that this individual needs a 6 dB increase of
effective signal-to-noise ratio, relative to a person with normal hearing, in
order to meet a 95% criterion for word recognition in simple sentences.
When listening to unfamiliar words in complex sentences, however, this
criterion will only provide about 36% recognition. At least another 10 dB
increase in effective signal-to-noise ratio will be needed to bring this
individual close to her optimum word recognition score in complex
sentences and, even then, the score will only be around 55%.
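The 1-percentage-point-per-decibel correction can be expressed compactly. The sketch below is a rough rendering of the rule of thumb, not the full model behind Figure 7:

```python
def aided_phoneme_score(score_with_normal_hearing_pct, unaided_pta_db):
    """Subtract ~1 percentage point of aided phoneme recognition per dB of
    unaided three-frequency-average loss in excess of 20 dB."""
    penalty = max(0.0, unaided_pta_db - 20.0)
    return max(0.0, score_with_normal_hearing_pct - penalty)

print(aided_phoneme_score(94.0, 50.0))   # 64.0: a flat 50 dB loss costs 30 percentage points
print(aided_phoneme_score(94.0, 20.0))   # 94.0: no correction at or below 20 dB
```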
The lower panel of Figure 7 illustrates the serious challenge faced by
children with hearing loss, including those using cochlear implants, when
trying to follow complex instructional material in the mainstream setting. It
also illustrates how easy it can be to underestimate this challenge on the
basis of observations of the child's ability to understand simple material in
familiar everyday contexts.
Practical Implications
The obvious implication of the foregoing is that the effective speech-to-noise
ratio in classrooms must be high if the occupants are to have adequate access to
the acoustic information in the speech of teachers and classmates. In other
words, the combination of direct speech and the early components of
reverberation should be high in relation to the combination of noise and the late
components of reverberation.
It is not clear, however, that one needs to aim for a Speech Audibility Index
of 100%, which could require a noise level of 20 dBA or less and a reverberation
time of 0.2 seconds or less (conditions one might expect in a recording studio or
an audiological test booth). The speech signal is highly redundant, both
acoustically and linguistically. In other words, the same information is often
available from more than one spectral or temporal location in the signal. Because
of this redundancy, excellent levels of speech perception are usually attainable
with less than full access. A reasonable target for Speech Audibility Index, even
for complex materials, can be as low as 70 to 75% (or an effective signal-to-noise
ratio of 6 to 7 dB) – as is evident from Figures 5 through 7.
It must be stressed that redundancy is a relative term. The redundancy in
speech is highly dependent on the language material and on the auditory,
cognitive and linguistic status of the listener. What is acceptable for a given
listener and a given situation may be unacceptable for a different listener and/or
a different situation. Acoustical criteria need to be especially stringent for young
children, children listening in a non-native language, and children with deficits of
hearing, cognition, language, attention, auditory processing or language
processing. Because any classroom may contain one or more such children, it is
reasonable to demand a stringent criterion for all.
The recently promulgated American standard for new or refurbished
classrooms calls for noise levels to be 35 dBA or lower when the room is
unoccupied. Reverberation times are to be 0.6 seconds or lower in small-to-medium-sized
classrooms and 0.7 seconds or lower in large classrooms when the rooms are
unoccupied (ANSI, 2002b). The reverberation criteria are, perhaps,
not as stringent as they could be. The American Speech-Language Hearing
Association (1995) recommends a reverberation time of 0.4 seconds or less for
an occupied classroom containing children with hearing loss. This criterion
translates into approximately 0.45 seconds or less for an unoccupied room. The
new ANSI standard, however, does provide a reasonable compromise between
the ideal and the affordable.
When the ANSI criteria are applied to a room of the size illustrated in Figure
2, they translate into a Speech Audibility Index in the region of 70% for students
who are farthest from the teacher. Some correction is needed, however, because
the presence of the students actually lowers the reverberation. Twenty or 25
students in a room of this size might lower reverberation time by 0.05 seconds to
0.55 seconds. This change would increase Speech Audibility Index to around
72% for students at the back of the room. On the other hand, the students are
also a potential source of background noise. The magnitude of this noise will
depend on a variety of factors, including classroom discipline. If we assume,
however, that the occupied noise level rises to 45 dBA, with the students present,
then the Speech Audibility Index for those at the back of the room will fall to
around 66%, which may be only marginally adequate for the reception of
unfamiliar words in complex sentences (see Figure 7). Unfortunately, no physical
design standard for room acoustics can adequately address the issue of noise
generated by the intended listeners.
A second implication of the material presented here is that decisions about
the need for, and success of, acoustic modifications should be based on
acoustical measurements and not on the apparent ease of everyday
conversation between proximate adults. If administrators need data with
ecological validity, older children can be given a simple open-set dictation test,
using monosyllabic words.
Sound-field amplification is often suggested as a cost-effective substitute for
acoustical treatment. A microphone is placed a few inches from the mouth of the
teacher where it picks up a signal with excellent effective signal-to-noise ratio.
This signal is then distributed to one or more strategically placed loudspeakers.
Sound-field amplification can be very beneficial when the primary problem is
ambient noise, because it increases the level of the speech signal without
increasing the noise. In addition, sound-field amplification can offset the negative
effects of distance. But this technology is less effective when the primary problem
is reverberation. Under this condition, any increase in speech level produces an
identical increase in the level of the late reverberation, and the net gain in
effective signal-to-noise ratio, for children who are not close to a loudspeaker, is zero.
In fact, the presence of several loudspeakers in the room can actually increase
the level of late reverberation for children who are not close to a loudspeaker.
This is not to say that sound-field amplification is useless in reverberant
conditions. Directional loudspeaker arrays can increase the ratio of direct to
reverberant sound and children sitting close to a loudspeaker will enjoy improved
perception. The extreme instance of this last approach is the desk-mounted
loudspeaker. Because the child is close to the loudspeaker, the volume can be
kept low so as not to increase reverberation for other children. Of course, this
approach only helps the child with the loudspeaker.
It is clear that the first step in dealing with poor room acoustics should be
the installation of sound absorption to reduce reverberation time to acceptable
levels. When this has been done, a sound-field system can be an effective way
both to improve signal-to-noise ratio and to counteract the effects of distance – at
least for the speech of the person with the microphone. If, for any reason,
reverberation cannot be lowered to appropriate levels, any attempt to improve
listening conditions with sound-field amplification requires extreme care in
selection, installation, and adjustment.
For the child who is wearing a hearing aid or cochlear implant, there is the
option of a wireless link (usually FM) from a remote teacher microphone to the
sensory aid itself. An FM amplification system is, in fact, the most effective way
to enhance Speech Audibility Index – at least for the speech of the person with
the microphone. With that microphone only a few inches from the talker’s mouth,
the signal level and signal-to-noise ratios could be increased by some 15 dB for
the child at the back of the room illustrated in Figure 2. This assumes, however,
that the microphone in the hearing aid or implant has been deactivated. While
deactivation of the local (also known as environmental) microphone may be
appropriate for a college student listening to a lecture, it is not appropriate for
younger children with hearing loss who are in primary or secondary education.
Activation of the local microphone is critical for auditory feedback of self-generated
speech and for hearing the comments and responses of fellow
students. As soon as this microphone is turned on, however, the noise and late
reverberation that it picks up are in danger of eliminating some or all of the
benefits of the remote microphone. Careful adjustment of the relative gains via
the two microphones is essential if this problem is to be avoided (American
Speech-Language Hearing Association, 2002).
Room acoustics is a complex, multidisciplinary topic with serious
ramifications. The consequences of poor acoustics have been known for years,
as have the solutions (for an excellent review, see Crandell and Smaldino, 2000).
Nevertheless, many students are expected to listen and learn in rooms with poor
acoustics. This is the equivalent of expecting them to read and learn in darkened
rooms using poor Xerox copies of their texts. The contributions of knowledgeable
Educational and Rehabilitative Audiologists are essential as we continue to work
towards the goal of an acoustically viable learning environment for all children.
For Additional Information
The analyses developed in this paper are incorporated into sound-field simulation
software developed by the author for Phonic Ear Inc. This software (Sound-field
Wizard) may be downloaded, free of charge, from www.phonicear.com or from
www.arthurboothroyd.com.
References
American National Standards Institute, (1995). American national standard
method for measuring the intelligibility of speech over communications
systems. ANSI S3.2-1989 (R 1995).
American National Standards Institute, (2002a). American national standard
methods for calculation of the speech intelligibility index. ANSI S3.5-1997 (R
2002).
American National Standards Institute, (2002b). Acoustical performance criteria,
design requirements, and guidelines for classrooms. ANSI S12.60-2002.
American Speech-Language Hearing Association, (1995, March). Acoustics in
educational settings: position statement and guidelines. ASHA, 37, (suppl. 14),
pp. 15-19.
American Speech-Language Hearing Association, (2002). Guidelines for fitting
and monitoring FM systems. ASHA Desk Reference, Volume II, pp 151-171.
Boothroyd, A. (1984). Auditory perception of speech contrasts by subjects with
sensorineural hearing loss. Journal of Speech and Hearing Research, 27, 134-144.
Boothroyd, A. (1985). Evaluation of speech production in the hearing-impaired:
some benefits of forced-choice testing. Journal of Speech and Hearing Research,
28, 185-196.
Boothroyd, A. (2002). Influence of context on the perception of spoken language.
In: Proc. Congreso Internacional de Foniatría, Audiología, Logopedia y
Psicología del lenguaje. Universidad Pontificia de Salamanca.
Boothroyd, A., and Nittrouer, S. (1988). Mathematical treatment of context effects
in phoneme and word recognition. Journal of the Acoustical Society of America,
84, 101-114.
Boothroyd, A., Erickson, F., & Medwetsky, L. (1994). The hearing aid input: a
phonemic approach to assessing the spectral distribution of speech. Ear and
Hearing, 15, 432-442.
Cox, R.M. and Moore, J.R. (1988). Composite speech spectrum for hearing aid
gain prescriptions. Journal of Speech and Hearing Research, 31, 102-107.
Crandell, C.C. and Smaldino, J.J. (2000). Classroom acoustics for children with
normal hearing and with hearing impairment. Language, Speech, and Hearing
Services in Schools, 31, 362-370.
Davis, D. and Davis, C. (1997). Sound system engineering (second edition).
Newton, MA: Focal Press.
Fletcher, H. (1953). Speech and hearing in communication. New York: Van
Nostrand. (Available in the ASA edition, edited by Jont Allen and published by
the Acoustical Society of America in 1995).
French, N.R. and Steinberg, J.C. (1947). Factors governing the intelligibility of
speech sounds. Journal of the Acoustical Society of America, 19, 90-119.
Mackersie, C.L., Boothroyd, A., and Minnear, D. (2001). Evaluation of the
Computer-Assisted Speech Perception Test (CASPA). Journal of the American
Academy of Audiology. 12, 390-396.
Peutz, V. (1997), Speech recognition and information. Appendix 10 in: Davis, D.
and Davis, C. (1997). Sound system engineering (second edition), pp. 639-644.
Newton, MA: Focal Press.
Steeneken, H.J.M. and Houtgast, T. (1973). The modulation transfer function in
room acoustics as a predictor of speech intelligibility, Acustica, 28, 66-73.