EVALUATION OF INFINITE PEAK CLIPPING AS A MEANS OF AMPLITUDE COMPRESSION

by

Karen Ann Silletto

B.S.E.E., University of California at Davis (1982)

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 1984

Massachusetts Institute of Technology 1984

Signature of Author: Department of Electrical Engineering and Computer Science, July 13, 1984

Certified by: Dr. P.M. Zurek, Thesis Supervisor

Accepted by: A.C. Smith, Chairman, Departmental Committee on Graduate Students

ABSTRACT

Infinite peak clipping is a simple and effective technique for processing speech prior to transmission over an analog channel with limited dynamic range. However, the technique has yet to be applied seriously to the problem of limited auditory dynamic range accompanying sensorineural hearing loss. In this thesis, an attempt was made to evaluate the potential for infinite peak clipping as a means of amplitude compression. Measurements were made of the compression of the range of speech levels when clipping is followed by post-filtering (as is presumed to occur when clipped speech is analyzed by the ear). Spectral distortions caused by clipping were also assessed.

The widths of the level distributions of clipped and filtered speech were found to be relatively independent of the characteristics of both the pre-filter and post-filter. Measured ranges between the 10% and 90% cumulative levels were about 10-15 dB as compared to the input ranges of 30 to 40 dB.
The spectra of clipped unvoiced speech sounds can be predicted analytically. The spectra of clipped voiced sounds cannot be easily predicted and so several cases were examined empirically. In general, despite radical distortion of the input time wave, the spectral distortions are not severe. Pre-filtering schemes designed to preserve relevant cues in the clipped spectra are discussed. It is concluded that infinite peak clipping deserves detailed study as a means of extreme amplitude compression with hearing-impaired listeners.

Thesis Supervisor: Dr. Patrick M. Zurek
Title: Research Scientist, Research Laboratory of Electronics

ACKNOWLEDGMENT

First, I wish to thank Pat Zurek for his continued guidance and support on this project. I would also like to thank all the members of the Communications Biophysics Group for helping to make the laboratory such an enjoyable place to work. Although I cannot mention all the people who have been wonderfully supportive and encouraging, I would like to thank my office-mates, Dan Leotta, Mike McConnell, and Jean Reid (not forgetting Mineola Minos), for making the office such a home away from home. Also, special thanks to Dan Leotta and Diane Bustamante for their extra help when I really needed it. Finally, I would like to thank my parents and family for their love and continued support.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
  1.1 Problem Definition
  1.2 Previous Research
  1.3 Rationale for Present Study
CHAPTER 2 METHODS
  2.1 Inputs
  2.2 Filtering and Clipping
    2.2.1 Pre-filters
    2.2.2 Clippers
    2.2.3 Post-filters
  2.3 Dynamic Range Analysis
    2.3.1 Level Detection
    2.3.2 Probability Distribution Analysis
  2.4 Spectral Analysis
    2.4.1 H.P. Spectrum Analyzer
    2.4.2 ILS Spectral Analysis
  2.5 Control Measurements and System Evaluation
    2.5.1 Evaluation of the Level-Distribution Measurement System
    2.5.2 Determination of Sample Size for Sentences
CHAPTER 3 RESULTS
  3.1 Comparison of Distributions in 16 Frequency Channels
  3.2 Comparison of Digital and Analog Implementations
    3.2.1 Digital vs. Analog Sentences
    3.2.2 Digital vs. Analog Processing
  3.3 Compression Results
    3.3.1 Pre-filtered Speech
    3.3.2 Post-filtered Speech
  3.4 Spectral Modifications
    3.4.1 Spectra of Clipped Unvoiced Speech
    3.4.2 Phase Dependence of Clipped Voiced Sounds
    3.4.3 Spectra of Clipped Vowels
CHAPTER 4 DISCUSSION
  4.1 Compression Results
  4.2 Comparison with an AGC System
  4.3 Spectral Modifications
  4.4 Implications for Hearing Aid Design
  4.5 Recommendations for Future Work
REFERENCES
APPENDIX A Probability Densities of Test Signals
APPENDIX B 10 Harvard Sentences

LIST OF FIGURES

1.1 A running spectrum of a two-word utterance, before and after infinite peak clipping (taken from Licklider et al., 1948)
2.1 System block diagram
2.2 Measured and calculated ranges for narrowband noise after squaring and lowpass filtering
2.3 Measured and calculated ranges for tones after squaring and lowpass filtering with an RC-lowpass filter
2.4 Ranges in the 1/3-octave band centered at 1000 Hz as a function of sentence sample size
2.5 Ranges across 13 frequency channels comparing inputs of 10 and 20 digital Harvard sentences
3.1 Ranges across 13 frequency bands for both the clipping and nonclipping conditions
3.2 Clipped and nonclipped ranges comparing inputs of digital and analog Harvard sentences
3.3 Effect of different pre-filters on ranges in 13 frequency bands
3.4 Effect of different narrowband pre-filters on ranges in the 1000 Hz frequency band
3.5 Effect of varying the post-filter bandwidth and slopes. Pre-filter is an allpass filter
3.6 Effect of varying the post-filter bandwidth and slopes. Pre-filter is an optimal filter and differentiator
3.7-3.12 Spectra of synthetic consonant noise bursts before and after clipping
3.13-3.19 Spectra of filtered noise before and after clipping
3.20 The phase dependence of clipped voiced sounds. Phase shifted time-waves
3.21 The magnitude spectra of phase shifted time-waves
3.22-3.39 Spectra of synthetic vowels before and after clipping
A1 Probability density functions for narrowband noise after squaring and lowpass filtering for R less than 1
A2 Probability density functions for narrowband noise after squaring and lowpass filtering for R greater than 1

Chapter 1 INTRODUCTION

1.1 Problem Definition

Sensorineural hearing impairments are characterized by elevated hearing thresholds without a corresponding elevation in the discomfort threshold.
This reduction in dynamic range, if severe enough, can necessitate a compromise between speech intelligibility and comfort. Consequently, amplitude compression of speech has been extensively studied as a means of matching the dynamic range of speech and other acoustic signals to the reduced dynamic range of the impaired listener.

The most common means of amplitude compression has been automatic gain control (AGC), in which the gain of an output amplifier is controlled by an estimate of the input amplitude. Results of studies of AGC in hearing aids are reviewed by Braida et al. (1979). While there are clear benefits and applications for such systems, there are also limitations in the hearing-aid application. For instance, due to the "attack time" of an AGC system, the effectiveness in protecting the user from high-level transient sounds is limited. Another limitation is that the actual degree of compression is less than that specified (DeGennaro et al., 1981).

Infinite peak-clipping is a simple method of amplitude compression that drastically reduces the dynamic range of the speech signal by clipping the input time-wave so that a rectangular wave of constant amplitude results. Only the zero-crossings of the original waveform are preserved. Since the output (envelope) amplitude is constant, infinite peak clipping produces a maximal reduction of range of the wideband signal. The obvious cost of this range compression is severe waveform distortion. However, it has been found that, despite this extreme distortion, infinitely-clipped speech can be highly intelligible (Licklider, 1946; Licklider, Bindra, Pollack, 1948). Thus, it would seem that peak clipping should be considered as a means of amplitude compression in hearing aids, and perhaps also in tactile aids and cochlear implants, since it achieves large compression of the amplitude range of speech with relatively little loss in intelligibility.
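As a minimal illustration of the operation just described, the following Python sketch replaces every sample of a waveform by its sign, so that the output has constant magnitude and only the zero-crossings of the input survive. The function names (`infinite_peak_clip`, `zero_crossings`), the test waveform, and the 8000-Hz sampling rate are hypothetical and chosen only for illustration; mapping samples that are exactly zero to zero is an assumption of this sketch.

```python
import math


def infinite_peak_clip(samples):
    """Keep only the sign of each sample (+1.0, -1.0, or 0.0 for exact zeros)."""
    clipped = []
    for x in samples:
        if x > 0:
            clipped.append(1.0)
        elif x < 0:
            clipped.append(-1.0)
        else:
            clipped.append(0.0)  # assumption: a null input gives a null output
    return clipped


def zero_crossings(samples):
    """Indices i at which the sign changes between sample i and sample i + 1."""
    return [i for i in range(len(samples) - 1)
            if samples[i] * samples[i + 1] < 0]


# Hypothetical test wave: a 100-Hz tone with a slow 5-Hz amplitude modulation.
RATE = 8000  # samples per second (illustrative value)
wave = [
    (1.0 + 0.9 * math.sin(2 * math.pi * 5 * n / RATE))
    * math.sin(2 * math.pi * 100 * (n + 0.5) / RATE)
    for n in range(800)
]
clipped = infinite_peak_clip(wave)

# The clipped wave has unit magnitude everywhere, yet its zero-crossings
# fall at exactly the same sample positions as those of the input.
print(zero_crossings(wave) == zero_crossings(clipped))
```

The amplitude modulation of the input is removed entirely from the wideband output, which is the sense in which the compression is maximal.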
Though clipping produces a constant-amplitude wideband signal, the perceived loudness may not be constant because the ear effectively filters incoming sounds into frequency bands (critical bands) whose outputs are believed to be relevant to loudness (Green and Swets, 1966, Chapter 10). The amplitude range of speech after post-filtering has not yet been studied and clearly needs to be understood in order to evaluate clipping as a means of amplitude compression.

The effects of clipping on cues important for speech intelligibility are also not well understood. In particular, while it is generally accepted that the short-time spectral amplitude pattern of speech is important, the changes in this pattern resulting from infinite peak clipping are not understood. It is possible that, even though wideband clipping produces fair intelligibility, multiple narrowband clippers could better preserve spectral amplitude patterns and consequently increase intelligibility.

The general problem addressed in this thesis is how infinite peak clipping might be used to compress the amplitude range of speech for the hearing-impaired. More specifically, this thesis will investigate the influence of pre- and post-filtering on the amplitude distributions and spectral patterns of clipped speech. The amplitude range of filtered-clipped-filtered speech will be measured and the spectra of clipped waveforms will be analyzed in order to evaluate the potential for clipping as a means of compression, as well as to understand past studies which examined the intelligibility of clipped speech.

1.2 Previous Research

Peak clipping was first introduced in radio transmitters during WWII as a means for maximizing the intelligibility over a transmission channel with limited dynamic range. Licklider (1946) studied different types of amplitude distortion and concluded that peak-clipping was the least detrimental to intelligibility. Licklider's experiments showed that, since infinite peak clipping only preserves the zero crossings of the input signal, the pattern of instantaneous amplitudes in the speech waveform is not essential for intelligibility.

Pollack (1952) confirmed Licklider's results and showed that the intelligibility of infinitely-clipped speech, relative to unmodified speech (with signals equated in terms of peak amplitude and with noise added after the clipper), is a function of SNR and roughly independent of the frequency range of the speech signal. Clipped speech is more intelligible than normal speech at low SNR's because the square speech has more power per unit peak amplitude than normal speech. At high SNR's, the distortion introduced by the clipper degrades intelligibility relative to unmodified speech.

Intelligibility of clipped speech is improved by highpass filtering the speech before clipping (Licklider, Bindra, Pollack, 1948; Pollack, 1952; Thomas and Niederjohn, 1970; Thomas and Ravindran, 1971). One interpretation of this finding (Thomas and Niederjohn, 1970) is that a highpass pre-filter with cutoff at 1100 Hz reduces high-amplitude, low-frequency signals and results in better preservation of the second formant, which is very important for intelligibility.

There has been one investigation of clipped speech with hearing-impaired listeners. Thomas and Sparks (1971) compared the intelligibility of filtered-clipped speech with linearly-amplified speech by testing 17 of the ears of 16 hearing-impaired subjects. For 13 of the 17 cases, filtered-clipped speech was more intelligible than linearly-amplified speech with the two types of speech equated in terms of their overall SPLs. However, these results are open to the criticism that the linear reference condition may have been less than optimal.

Hildebrant (1982) investigated a multiband clipping system that filtered the incoming speech into frequency regions corresponding to the frequency ranges of the first three formants. The bands were individually clipped, filtered again and then summed together. A reduced dynamic range was created for normal listeners by adding white noise after processing and restricting the wideband signal below an artificial "discomfort threshold". Intelligibility with clipped speech was found to be much better than with linearly-amplified speech. A similar system was studied by Guidarelli (1981), who found that intelligibility was greatest with 4 octave-wide or 6 two-third-octave-wide channels.

The general conclusion that can be drawn from past studies is that infinite peak clipping reduces the dynamic range of wideband speech while maintaining a fair degree of intelligibility. This fundamental result, that "amplitude information" in a speech waveform can be discarded without a drastic degradation of intelligibility, initially appears quite surprising. However, the key to understanding this result lies in the realization that the "amplitude information" is also coded in the zero-crossings of the wave. Although it is not at all obvious from inspection of the wave, the spectra of clipped signals are often very similar to the spectra of unclipped signals. Licklider, Bindra, and Pollack (1948) clearly emphasized the importance of examining the spectral changes caused by clipping in order to understand the effect of clipping on intelligibility. Figure 1.1, taken from their paper, shows a running spectrum (a 3-D plot of amplitude vs. time and frequency) of a two-word utterance before and after infinite peak clipping. From this frequency analysis it is clear why clipping has little effect on intelligibility: the spectral patterns of clipped and unclipped speech are not drastically different.

Figure 1.1 A running spectrum of a two-word utterance, before and after infinite peak clipping (taken from Licklider et al., 1948).
[Figure omitted. Panels: NO CLIPPING; INFINITE PEAK CLIPPING. Horizontal axis: TIME.]

1.3 Rationale for Present Study

The rationale for studying the effect of pre- and post-filters on the range of clipped speech and the spectral distortions caused by clipping follows from an underlying model and set of assumptions about the operation of the ear and the perception of speech. First, it is generally believed that the ear analyzes incoming speech into frequency bands that are approximately one-third of an octave wide. In the present study, it is assumed that the quantity relevant to dynamic range is the envelope of these bandpass signals. A major portion of this thesis addresses the question: what are the effects of post-filtering on the envelope distributions of infinitely-clipped speech waves?

A second important assumption is that the primary cues to intelligibility lie in the spectral (magnitude) patterns of speech. In particular, it is assumed that local maxima in the spectra (formants) are vital to intelligibility. The effects of clipping on these patterns will be studied as well as ways, such as pre-filtering, in which the spectral distortions can be minimized.

Chapter 2 Methods

A block diagram of the system used in the present study is shown in Figure 2.1. Input signals (speech and other test signals) were either first filtered with a pre-filter and clipped, or they were unmodified. The spectra of the two signals at point 'A' were compared. The dynamic ranges of these signals were assessed by measuring the distributions of the signal envelopes after bandpass filtering.

2.1 Inputs

Harvard sentences (IEEE, 1969) spoken by a male talker were chosen as input signals. It was desired to use a speech sample that had an amplitude distribution that was characteristic of spoken English, but yet was of manageable size for digital processing. The number of sentences was determined by preliminary measurements described below (Section 2.5).

Figure 2.1 System block diagram
[Block diagram omitted. Blocks: Input; Pre-filter; Infinite peak clipper; point (A). Single channel path: Post-filter; Level Detector; Level Distribution Probability Analysis. Multiple channel path: Post-filter (Multifilter, 16 bands); Level Detectors; Probability Analysis. Spectral analysis: H.P. 3582A Spectrum Analyzer; H.P. 7004B X-Y Recorder; ILS Analysis. Legend: d = digital, a = analog.]

2.2 Filtering and Clipping

Many of the operations shown in Figure 2.1 could be implemented either digitally (not in real time), or with analog equipment. The letters 'd' and 'a' in Figure 2.1 indicate digital and analog capability.

2.2.1 Pre-filters

The analog pre-filter was a 2-pole, 1100-Hz highpass filter suggested by Thomas and Niederjohn (1970). For ease of reference, this filter is termed "optimal" because Thomas and Niederjohn found it to give the best intelligibility of a number of pre-filters. Also for ease of reference, the condition in which the signal is clipped without pre-filtering will be termed "allpass" pre-filtering.

An ILS (Interactive Laboratory System) software package, written by Signal Technology Incorporated, was used to design digital filters and to filter waveforms. An infinite impulse response Butterworth design was used for the digital filters. Digital pre-filters were an allpass filter (i.e. no pre-filtering), a digital "optimal" filter, and a single-pole, 6 dB/octave, 8000-Hz highpass filter, which will be termed "differentiator" for obvious reasons.

Speech was also digitally pre-filtered into the three formant regions that Hildebrant (1982) described. The first formant filter was a bandpass filter between 200 Hz and 900 Hz, the second formant filter was 900 Hz to 2800 Hz, and the third formant filter was 2800 Hz to 6000 Hz. Finally, narrowband pre-filters with bandwidths less than 231 Hz and center frequency at 1000 Hz were used for the purpose of exploring the effects of the pre-filter bandwidth and slope.

2.2.2 Clippers

Analog clipping was performed by a Schmitt trigger with an adjustable hysteresis "dead zone" (Hildebrant, 1982). The dead zone was adjusted to a minimum value which prevented triggering by internal noise so that no output signal resulted from a null input signal. Digital clipping was performed by keeping only the sign of the digitized waveform values (except zero, which remained zero).

2.2.3 Post-filters

The clipped waveform (at point 'A' in Figure 2.1) could be post-filtered (before detection) into adjacent "critical-band" filters using a General Radio (1925) multifilter, or digitally post-filtered into a single frequency band. The multifilter contains 30 one-third octave, 6-pole (3-zero), Butterworth filters with center frequencies that range from 25 Hz to 8000 Hz. The filter skirt slopes near the passband are about 36 dB/octave and flatten out to about 18 dB/octave due to the three zeros in the transfer function.

Only the outputs from 19 one-third octave bands between 125 Hz and 8000 Hz are used, since the most important cues for intelligibility lie in this frequency range. Because critical bands are proportionately broader at lower frequencies, the outputs from the three lowest of these one-third octave bands are summed to form channel 1, and the outputs from third-octaves centered at 250 Hz and 315 Hz are summed to form channel 2. The remaining channels are the outputs of single third-octave bands. The bandwidths and center frequencies of the 16 frequency channels are listed in Table I.
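The channel bandwidths can be approximated from the center frequencies alone. The sketch below assumes ideal one-third-octave band edges, a factor of 2^(1/6) below and above each center frequency; this is an assumption of the example and not a measured property of the multifilter. The function names are hypothetical. Under this assumption a single band at 1000 Hz comes out between 231 and 232 Hz wide, which is consistent with the 231-Hz figure quoted for the one-third octave band at 1000 Hz.

```python
def third_octave_edges(center_hz):
    """Ideal lower and upper band edges of a one-third-octave band."""
    half_step = 2.0 ** (1.0 / 6.0)  # half of one-third octave, as a frequency ratio
    return center_hz / half_step, center_hz * half_step


def channel_bandwidth(center_freqs_hz):
    """Bandwidth spanned by one or more adjacent one-third-octave bands."""
    edges = [third_octave_edges(fc) for fc in center_freqs_hz]
    lowest = min(low for low, _ in edges)
    highest = max(high for _, high in edges)
    return highest - lowest


# Channel groupings as described in the text: the three lowest bands are
# summed into channel 1, the next two into channel 2, and each remaining
# band forms a channel of its own.
channel_1 = channel_bandwidth([125.0, 160.0, 200.0])
channel_2 = channel_bandwidth([250.0, 315.0])
band_1000 = channel_bandwidth([1000.0])

for name, width in [("channel 1", channel_1),
                    ("channel 2", channel_2),
                    ("1000-Hz band", band_1000)]:
    print(f"{name}: {width:.1f} Hz")
```

Because the band edges scale with center frequency, every single-band channel has the same fractional bandwidth of about 23 percent.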
For greater flexibility in examining the effect of post-filter skirts and bandwidths, single band post-filters were implemented digitally. Post-filter skirts were varied by changing the number of poles in the filter transform from 2 to 8 poles. The bandwidth of the postfilter was varied from 231 Hz to .29 Hz. The bandpass filtered output was input into a single analog detector.

Table I

Channel   Center      Bandwidth   RC-filter time     -3 dB       R
number    frequency   (Hz)        constant           frequency
          (Hz)                    (microseconds)     (Hz)
 1          160         113        17825.0               9       12.50
 2          280         130         9390.0              17        7.65
 3          400          93         5730.0              28        3.32
 4          500         115         4456.0              36        3.19
 5          630         146         3390.0              47        3.11
 6          800         185         2657.9              60        3.08
 7         1000         231         1989.4              80        2.89
 8         1250         289         2021.0              79        3.66
 9         1600         371         1559.7             102        3.63
10         2000         463         1368.7             117        3.95
11         2500         579          859.4             185        3.12
12         3150         729          859.4             185        3.94
13         4000         926          251.5             531        1.74
14         5000        1158          198.9             795        1.46
15         6300        1459           99.9            1592        0.92
16         8000        1852           39.9            3979        0.47

2.3 Dynamic Range Analysis

2.3.1 Level Detection

The hardware used to detect the envelope of the signal is part of a multi-channel Automatic Gain Control compression system designed and developed by Coln (1979). Each level detector performs a short-term rms calculation in which the bandpass signal is first squared and then filtered by an RC-lowpass filter. The output of one of these detectors is an estimate of the logarithm of the signal envelope squared. There are 16 level detectors, one for each of the 16 bandpass outputs of the multifilter. When a single-band digital post-filter is employed, the signal (after D/A conversion) is presented directly to the level detector with an 80 Hz RC-filter bandwidth (channel 7).

Each RC-lowpass filter time constant can be adjusted by varying the filter capacitor. In channels 1 through 12, the time constants were adjusted to keep the second-harmonic ripple at the output of the filter down to 6 percent at twice the center frequency of the input signal. However, in the two lowest channels, the bandwidth is larger than one-third octave, therefore a greater percentage of the signal envelope will be filtered, resulting in a "smoother" signal envelope. In channels 13 through 16, the RC-filter bandwidth is larger in proportion to the signal center frequency than in the other channels, so more second harmonic ripple will be included in the estimate of the envelope. The filter time-constants are listed in Table I.

2.3.2 Probability Distribution Analysis

Samples from the 16 level detector outputs were digitized, and "bins" corresponding to the decibel level and channel of each sample were incremented. The signal in each band was sampled at a rate of 400 samples/second, giving a total sampling rate of 6400 samples/second. The binwidth was 1/2 dB and the dynamic range, which was determined by the level detector, was 80 dB in each band.

Normalized histograms of the decibel level of the signal envelope are plotted for each of the 16 frequency channels. The distributions are normalized to the largest histogram value in the 16 channels. The rms level of each amplitude distribution is calculated along with the 10% and 90% levels. The 10% level is the amplitude level which only 10% of the speech samples exceed; 90% of the samples have greater amplitudes than the 90% point. Henceforth in this thesis, the term "range" applied to a level distribution will mean the difference between the 10% and 90% levels.

Samples of silence or low-level noise may corrupt the signal distributions. In cases where the signal and noise distributions are clearly separated, the noise samples can simply be deleted. The modified file will produce a more accurate signal amplitude distribution. Such distributions will be labeled "edited" in the following sections.

2.4 Spectral Analysis

Spectral analysis was performed using two methods. The H.P. 3582A spectrum analyzer computed and averaged the spectral magnitudes of a number of time samples. Spectral graphs were plotted with the H.P. 7004B X-Y recorder. The ILS software package was also used to compute and plot the magnitude spectrum of a single time sample. Both methods used the Fast Fourier Transform (FFT) algorithm.

2.4.1 H.P. Spectrum Analyzer

The H.P. 3582A spectrum analyzer transforms a finite segment of time data into a discrete frequency spectrum using an FFT implementation. In single-channel mode, 1024 samples of the input signal are transformed by the FFT into 256 complex values. The duration, T, of a time frame is determined by the selected frequency span and the number of samples,

T = 1024/(4*frequency span) = 256/frequency span.

For example, to compute the spectrum over a 5000 Hz span, a 51.2 msec time segment is sampled. The discrete frequency points are spaced at equal intervals, delta F = 1/T. In this example the frequency resolution is 19.5 Hz.

The H.P. analyzer also allows for up to 256 spectra to be averaged to give a better estimate of the spectrum of a random signal. Time samples were weighted with a Hanning window in order to minimize both the amplitude and frequency uncertainty in the spectrum.

2.4.2 ILS Spectral Analysis

The spectrum of a finite digital wave can be computed and graphed with an ILS routine which computes a 1024-point FFT. The 1024-point time sample was weighted with a Hamming window in order to minimize the frequency and amplitude distortion. Since only 1024 sample points are used, static time waves which repeat periodically in less than 1024 points (such as static vowels) are appropriate signals for this analysis.

2.5 Control Measurements and System Evaluation

2.5.1 Evaluation of the Level Distribution Measurement System

In order to assess the performance of the level detector system, the ranges of signals with known distributions were measured. Analog narrowband random noise and tones were input directly into the level detector in channel 7, which has an RC-filter cutoff of 80 Hz. Narrowband random noise was used as a test signal because its envelope is known to follow a Rayleigh probability density. The narrowband noise was centered at 1000 Hz, and the bandwidth of the noise ranged from 3 Hz to 231 Hz.

The measured range for a 10-Hz band of noise was 11 dB, 2 dB smaller than the expected range of 13.4 dB for a Rayleigh-distributed variable. The measured range for a 10-Hz tone was 13 dB, 3 dB smaller than the expected range of 16 dB for a tone.

The discrepancy between the ranges of the measured and expected envelope distributions can be attributed to the influence of the RC-lowpass filter. Calculated probability densities for narrowband noise and tones after squaring and lowpass filtering are given in Appendix A.

The calculated and measured ranges for noisebands and tones are graphed as a function of R, the ratio of input signal bandwidth (or frequency) to RC-filter bandwidth, in Figures 2.2 and 2.3, respectively. It is evident from these figures that the measured ranges follow the calculated ranges to within about 1 dB. As the noise bandwidth or the tone frequency increases, the difference between the input envelope range (dashed line) and the lowpass-filtered envelope range increases. If the RC-filter bandwidth is much larger than the noise bandwidth (or tone frequency), and if the second-harmonic ripple can be neglected, then the effect of lowpass filtering will be small. Residual deviations between the calculated and measured ranges may be attributable to sampling errors or to circuit imperfections.
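The two expected ranges quoted above (13.4 dB for a Rayleigh-distributed envelope and 16 dB for a tone) can be checked directly from the 10% and 90% exceedance levels. The sketch below is illustrative only and its function names are hypothetical. It assumes that the detected quantity is the squared envelope, and it treats the tone case as a squared sinusoid observed at uniformly distributed phase, that is, a tone slow enough to pass the lowpass filter unsmoothed; that reading of the "tone" case is an assumption of this example.

```python
import math


def rayleigh_range_db(upper=0.10, lower=0.90):
    """10%-90% range of the squared envelope of narrowband Gaussian noise.

    For a Rayleigh envelope the squared envelope is exponentially
    distributed: P(power > x) = exp(-x / mean_power).
    """
    x_upper = -math.log(upper)  # level exceeded by 10% of the samples
    x_lower = -math.log(lower)  # level exceeded by 90% of the samples
    return 10.0 * math.log10(x_upper / x_lower)


def squared_tone_range_db(upper=0.10, lower=0.90):
    """10%-90% range of sin^2(theta) with theta uniformly distributed.

    P(sin^2 > x) = 1 - (2 / pi) * asin(sqrt(x)), so the level exceeded
    with probability p is sin^2((pi / 2) * (1 - p)).
    """
    def level(p):
        return math.sin(0.5 * math.pi * (1.0 - p)) ** 2

    return 10.0 * math.log10(level(upper) / level(lower))


print(f"Rayleigh envelope: {rayleigh_range_db():.1f} dB")
print(f"Squared tone:      {squared_tone_range_db():.1f} dB")
```

Both values are independent of the absolute signal level, since the range is a ratio of two levels of the same distribution.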
A possible reason for the measured range being slightly greater than the calculated range for noise with bandwidths larger than approximately 80 Hz is that the analysis did not include the effects of second harmonic ripple. As the noise bandwidth increases, the signal envelope will include more second-harmonic ripple which will increase the measured level range.

Figure 2.2 Measured and calculated ranges for narrowband noise (centered at 1000 Hz) after squaring and lowpass filtering with an RC-lowpass filter. Range is graphed as a function of the input noise bandwidth divided by the RC-filter bandwidth. Dashed line represents the expected range for a Rayleigh distributed variable.
[Plot omitted. Horizontal axis: R = Input Noise Bandwidth/Post-detector Bandwidth. Legend: calculated range, measured range.]

Figure 2.3 Measured and calculated ranges for tones after squaring and lowpass filtering with an RC-lowpass filter. Range is graphed as a function of the tone frequency divided by the RC-filter bandwidth.
[Plot omitted. Horizontal axis: R = Tone Frequency/Post-detector Bandwidth. Legend: calculated range, measured range.]

It is expected that the decibel underestimation of the input envelope range will be the same for signals, such as speech, with level ranges that are much wider than those of noise or tones. If we assume that such wide distributions result from a large but slow modulation of a bandpass process, then the resulting level distribution should be the convolution of the distributions of the modulation and the bandpass envelope. If the modulation is sufficiently slow, its variation will be unaffected by the lowpass filter. Thus, for example, if there is a 5-dB underestimation of the input 13.4 dB range of narrowband noise, the underestimation will still be about 5 dB if the noise is slowly modulated over a range of, for instance, 40 dB.

2.5.2 Determination of Sample Size for Sentences

The range of a bandpass speech signal estimated with an input of n sentences should converge to a constant value as n is increased. The sample size used in the main study was chosen by preliminary measurements in which the number of sentences in the sample was increased until there was only a small change in the range for an increase in the sample size.

The level ranges of single sentences were measured after digitally post-filtering by a one-third octave (231 Hz) filter centered at 1000 Hz. The ranges were found to vary from 25 dB to 46 dB among a sample of 10 sentences (Figure 2.4). Since this range varied 21 dB for 10 single sentences, a larger sample of sentences was considered.

The sample size was increased to four, ten, or twenty sentences, and the range was measured (Figure 2.4). (The 10-sentence sample contained the original 4 sentences, and the 20-sentence sample contained those same 10 sentences.) It appeared that a sample size of 10 or 20 sentences would be appropriate since the measured ranges, 36 dB and 38 dB, were close to the results obtained by Dunn and White (1939) for one-half octave wide distributions above 500 Hz, and by Krieg (1980) and DeGennaro et al. (1981) for a third-octave distribution.

In order to examine all channels, the sentences were bandpass filtered with the analog one-third octave multifilter. The measured ranges in 13 frequency channels for an input sample of 10 and 20 sentences were compared (Figure 2.5). (The ranges in channels 14, 15, and 16 were omitted because the input was filtered at 4.5 kHz). In all 13 channels, the ranges differed by at most 3 dB. Since the difference was small, 10 Harvard sentences were chosen for the sample data. The 10 sentences chosen are listed in Appendix B. The sentences are approximately 3 seconds long, so 30 seconds of data at 400 samples/second, or 12000 samples of data are included in each histogram from which range is estimated.

Figure 2.4 Ranges in the one-third octave band centered at 1000 Hz as a function of sentence sample size.
[Plot omitted. Horizontal axis: Number of Digital Harvard Sentences.]

Figure 2.5 Ranges across 13 frequency channels comparing inputs of 10 and 20 digital Harvard sentences.
[Plot omitted. Horizontal axis: Band Center Frequency (kHz). Legend: 10 sentences, 20 sentences.]

Chapter 3 Results

3.1 Comparison of Distributions in 16 Frequency Channels

The histograms of one-third octave bands of clipped and unmodified speech were investigated in the following experiments. First, the 10 digital sentences were pre-filtered with an 1100-Hz highpass filter (analog), clipped, and post-filtered by the multifilter; second, the sentences were clipped (pre-filter was an allpass filter) and post-filtered by the multifilter. The reference condition of speech filtered only by the multifilter was also investigated.

The amplitude ranges, shown in Figure 3.1, exhibit a dependence on band center frequency. Part of this dependence may be attributed to the variation across bands of the quantity R, the ratio of input bandwidth to the detector's RC-filter bandwidth. From the analysis presented in Section 2.5.1 (see Figure 2.3), we expect a smaller range in the channels centered at 160 Hz and 280 Hz, because R is larger (see Table I). Conversely, there should be relatively less smoothing of the envelope in the higher frequency channel 13 (centered at 4000 Hz) since the bandwidth of the RC-filter is larger in proportion to the signal bandwidth (R is smaller) than in the lower-frequency channels. There will also be more second-harmonic ripple in channel 13 which will tend to increase the range. (Note that results for only the first 13 channels are plotted because the digital signals were lowpass filtered at 4.5 kHz, and the three highest channels are centered at frequencies above that cutoff).

Figure 3.1 Ranges across 13 frequency bands for clipping and nonclipping conditions. Input is 10 digital Harvard sentences. F1 = 1100 Hz highpass filter (analog), C = Infinite peak clipper (analog), F2 = Multifilter.
[Plot omitted. Horizontal axis: Band Center Frequency (kHz). Legend: F2; F1/C/F2; C/F2.]

The amplitude range for speech filtered by the multifilter varies at most 15 dB across all frequency channels and averages 32 dB. The ranges reported here agree reasonably well with the results of Krieg (1980), who used the present system, as well as with Dunn and White (1940), who used a different measurement procedure (peak pressures in 1/8 second intervals). The present measurements are slightly smaller than those by DeGennaro et al. (1981), who also used the same system for measuring bandpass envelopes. It should be noted that while DeGennaro et al. (1981) display a 50-dB range for one talker and a single band (Figure 4 in their paper), their subsequent figures (5, 6, and 7) for at least two talkers and all sixteen bands show nearly all ranges to be between 30 and 40 dB.

Clipping reduces the range of speech by about 15 to 20 dB across all frequency bands. The range of bandpass speech that was pre-filtered with an 1100-Hz highpass filter and clipped varies 12 dB across frequency bands and averages 15.8 dB.
The range of clipped speech with no pre-filtering varies 11 dB and averages 9.5 dB across bands. This difference between highpass pre-filtering and no pre-filtering will be discussed further in Section 3.3.1.

In subsequent experiments in which a single channel is analyzed, the band centered at 1000 Hz is used. This band was chosen as representative because it is approximately in the middle of the speech frequency range, and also because the measured ranges in this band are near the respective averages across bands for the three conditions in Figure 3.1.

3.2 Comparison of Digital and Analog Implementations

3.2.1 Digital vs. Analog Sentences

In order to identify any differences due to digital or analog sentence inputs, histograms made with tape-recorded analog sentences and digital sentences were compared. Thirty tape-recorded analog Harvard sentences and 10 digital Harvard sentences (after D/A conversion and lowpass filtering at 4.5 kHz) were processed identically with the analog equipment. Ranges were compared in the measurements for bandpass filtered speech and clipped/filtered speech. In the clipping condition, these sentences were pre-filtered with the 1100-Hz highpass filter, infinitely peak-clipped, and bandpass filtered into the sixteen channels by the multifilter. The analog sentence histograms were "edited" (as described in Section 2.3.2) to eliminate obvious noise. No editing was necessary for the digital sentence histograms.

The ranges are shown in Figure 3.2. Except for discrepancies at 800 Hz and 2500 Hz, the difference between the ranges for analog and digital sentences in the nonclipped condition is less than 5 dB. (Note that the three bands above 4.5 kHz are included for the analog sentences.) With the exception of channel 1, a larger range was measured with analog sentences than with digital sentences. In the clipping condition, the difference between the ranges for digital and analog sentences is 2 to 3 dB. The smaller discrepancy here may be attributed to the clipper's "dead-zone", which was adjusted so that there was no output produced by tape hiss (when the speech was off). Because the distributions of the unclipped analog sentences are wide (30 to 40 dB), it is more likely for the speech and noise distributions to overlap in this condition than with clipped speech. Under clipping, the mixing of noise and speech distributions is reduced because the range of speech is reduced.

[Figure 3.2: range (dB) plotted against band center frequency (kHz), with separate symbols for analog sentences (edited) and digital sentences in the conditions F2 and F1/C/F2.]
Figure 3.2 Ranges across 13 or 16 frequency bands comparing digital and analog Harvard sentences as inputs.
F1 = 1100 Hz highpass filter (analog)
C = Infinite peak clipper (analog)
F2 = Multifilter

It appears from these measurements that digital sentences are superior to analog sentences in terms of signal-to-noise ratio. However, for the clipping condition, analog and digital sentences produce similar results.

3.2.2 Digital vs. Analog Processing

In order to investigate the effects of digital processing, a number of different digital pre- and post-filters were designed. Histograms after digital and analog processing were compared. The 10 digital sentences listed in Appendix B were used as test inputs. The analog multifilter, 1100 Hz highpass filter, and clipper were used for analog processing. Counterpart digital filters were designed for digital processing. The comparison was made on the one-third octave distribution centered at 1000 Hz (channel 7). Digital and analog processing were compared with three different pre-filters (an allpass filter, an 1100-Hz highpass filter, and a one-third octave (231 Hz) filter centered at 1000 Hz) before clipping, as well as for the reference condition of no clipping.

Table II A comparison of ranges in channel 7 (bandwidth=231 Hz, center frequency=1000 Hz) after digital and analog processing of 10 digital sentences.
F2 = 1/3 octave (231-Hz bandwidth) filter centered at 1000 Hz
F1 = 1100-Hz highpass pre-filter
C = Infinite peak clipper

Range (dB)             F2    C/F2    F1/C/F2    F2/C/F2
Digitally processed    36     11       13          6
Analog processed       34     14       18          7

Table II shows that the ranges between digital and analog processing differ by at most 5 dB. An interesting fact is that pre-filtering the speech with a one-third octave filter decreases the range from 36 to 6 dB. This will be discussed in Section 3.3.1.

3.3 Compression Results

3.3.1 Pre-filtering

The following experiments were conducted to determine the effect of various pre-filters on the range of clipped speech. The pre-filters used here were those employed in various studies of clipped speech. The first is an allpass filter (i.e. no pre-filtering) as used by Licklider (1948), for example. The second is an RC highpass filter with cutoff at 8000 Hz (subsequently referred to as a differentiator) as used by Licklider and Pollack (1948), and Thomas and Niederjohn (1968). The third is an 1100-Hz highpass filter (12 dB/octave) (subsequently referred to as the "optimal" filter) suggested by Thomas and Niederjohn (1968). A fourth is the set of pre-filters described by Hildebrant (1982) that was designed to separate the formant regions of speech. The first-formant filter is a 4-pole Butterworth bandpass filter of 200 Hz to 900 Hz (spanning channels 1 through 6), the second-formant filter is a 10-pole Butterworth bandpass filter of 900 Hz to 2800 Hz (spanning channels 7 through 11), and the third-formant filter covers the range from 2800 Hz to 6000 Hz (channels 12 and 13) and is also a 10-pole Butterworth filter.
The ranges in the thirteen channels under the various configurations are shown in Figure 3.3. Clipped speech with no pre-filtering has the lowest overall range, from 9 dB to 20 dB across frequency bands and averaging 13.7 dB. Differentiating the sentences before clipping increases the range relative to no pre-filtering by about 6 dB across all bands; these ranges vary from 13 dB to 27 dB and average 19.8 dB. Pre-filtering with the optimal filter does not increase the range quite as much as differentiating; the range is 9 dB to 22 dB across all bands and averages 15.5 dB. In the low-frequency bands, speech pre-filtered with the first formant filter has the smallest range of the four types of pre-filtering (averaging 12.8 dB in the first 6 channels). The second and third formant filters produce an effect similar to the optimal and differentiator filters. The average ranges in these two formant filter regions are 16.6 dB in channels 7 through 11 and 14.5 dB in channels 12 and 13.

[Figure 3.3: range (dB) plotted against band center frequency (kHz), with separate symbols for the conditions F1/C/F, F2/C/F, F3/C/F, C/F, Fopt/C/F, and Fdif/C/F.]
Figure 3.3 Effect of different pre-filters on ranges in 13 frequency bands.
F1 = 200-900 Hz bandpass digital filter
F2 = 900-2800 Hz bandpass digital filter
F3 = 2800-6000 Hz bandpass digital filter
Fopt = 1100 Hz highpass digital filter
Fdif = 8000 Hz highpass digital filter
C = Infinite peak clipper
F = Multifilter

The effects of varying pre-filter bandwidth and rejection slope were investigated in the following experiments. The sentences were digitally pre-filtered, clipped, and digitally post-filtered with a one-third octave bandpass filter centered at 1000 Hz, which had the same filter-skirt slope as the pre-filter, also centered at 1000 Hz. The pre-filter bandwidth varied from 231 Hz to 29 Hz, and the slopes of the pre-filter and post-filter skirts were varied together by changing the Butterworth filter transfer function from 2-pole to 8-pole. The signal level was detected in channel 7 after bypassing the multifilter (RC-filter bandwidth is 80 Hz).

Plots comparing the ranges of speech pre-filtered with the narrow-band filters to the range for speech pre-filtered with the allpass, optimal, and differentiator filters are shown in Figure 3.4. Except in the case of a 2-pole Butterworth filter, the range for clipped speech that was pre-filtered into a one-third octave (231 Hz) band is 5-8 dB smaller than for speech that was pre-filtered with the differentiator or optimal filters. Generally, as the pre-filter bandwidth decreases, the range increases. Given that the pre-filter transfer function has more than 2 poles, varying the pre-filter slope does not change the ranges appreciably.

[Figure 3.4: four panels (2-pole, 4-pole, 6-pole, and 8-pole Butterworth filter), each showing range (dB) against pre-filter bandwidth (Hz) of 29, 58, 115, 173, and 231, followed by the Fopt, Diff, and All conditions.]
Figure 3.4 Effect of different narrowband pre-filters on ranges. Also plotted are the ranges with the differentiator (Diff), the 1100-Hz highpass filter (Fopt), and the allpass filter (All). Post-filter is 1/3-octave centered at 1000 Hz. Pre-filter and post-filter slopes are the same. For the cases of 'Diff', 'Fopt', and 'All' these slopes apply only to the post-filter.
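The level detection behind these range figures can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: it assumes a square-law detector followed by a one-pole RC lowpass (80 Hz, as quoted above), and takes the range as the spread between the 10% and 90% points of the cumulative level distribution, as stated in the abstract. The function names, the test signal, and the 8000 Hz sampling rate are hypothetical, and the pre- and post-filters are omitted.

```python
import math

def infinite_peak_clip(x):
    """Infinite peak clipping: keep only the sign of each sample."""
    return [1.0 if s >= 0.0 else -1.0 for s in x]

def detect_level_db(x, fs, rc_bandwidth_hz=80.0):
    """Square each sample, smooth with a one-pole RC lowpass, return level in dB."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * rc_bandwidth_hz / fs)
    y = x[0] * x[0]                      # start the filter at the first squared sample
    levels = []
    for s in x:
        y += alpha * (s * s - y)         # RC lowpass update
        levels.append(10.0 * math.log10(max(y, 1e-12)))  # floor avoids log(0)
    return levels

def level_range_db(levels, lo=0.10, hi=0.90):
    """Spread between the lo and hi points of the cumulative level distribution."""
    ordered = sorted(levels)
    n = len(ordered)
    return ordered[int(hi * (n - 1))] - ordered[int(lo * (n - 1))]

# Hypothetical test signal: a 1000-Hz tone slowly modulated between 0.1 and 1.0.
fs = 8000
t = [n / fs for n in range(fs)]
tone = [(0.55 + 0.45 * math.sin(2 * math.pi * 4 * ti)) * math.sin(2 * math.pi * 1000 * ti)
        for ti in t]

range_in = level_range_db(detect_level_db(tone, fs))
range_clipped = level_range_db(detect_level_db(infinite_peak_clip(tone), fs))
print(round(range_in, 1), round(range_clipped, 1))
```

The unfiltered clipped wave has constant magnitude, so its detected level does not vary at all; the non-zero ranges reported in this chapter are measured after the post-filter, which this sketch leaves out.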
The pre-filters with a 2-pole transfer function produce an effect on the low frequency part of the spectrum similar to the differentiator. Since the speech spectrum is dominated by low-frequency components, the ranges are similar (upper left panel of Figure 3.4). Other discrepancies from the general trends appear for speech that is pre-filtered with a 29 Hz, 6-pole Butterworth filter.

Clipping clearly compresses the amplitude range of speech. Digitally pre-filtering and clipping produces output ranges of 8 to 16 dB in the 1000-Hz frequency band, a reduction of more than 20 dB (Figure 3.4). Across all frequency bands the range is slightly larger, from 10 to 23 dB (Figure 3.3); that is, clipping decreases the range by 15 to 20 dB across all frequency bands.

3.3.2 Post-filtering

The effect of post-filtering the clipped speech was determined by varying the bandwidths and slopes of the post-filter. Post-filters were centered at 1000 Hz and the bandwidth was decreased from one-third octave (231 Hz) to 1/24 octave (29 Hz). The slopes of the post-filter skirts were increased by varying the bandpass Butterworth filter transfer function from 2-pole to 8-pole.

Variations in post-filtering were examined with clipped speech that was pre-filtered with either the allpass filter, the optimal filter, or the differentiator. With the differentiator as pre-filter, only the effect of varying post-filter slopes was examined; the post-filter bandwidth was fixed at one-third octave.

Figures 3.5 and 3.6 show that for all pre-filters examined, variations in post-filter bandwidth and skirt slopes have relatively small effects on compression. Figure 3.5 presents range results as a function of post-filter bandwidth and skirt slopes with no pre-filtering (i.e. an allpass pre-filter). The range of the clipped speech varies from 9-15 dB, with the exception of the measurement with the 29 Hz bandwidth 8-pole Butterworth post-filter. Results with optimal and differentiator pre-filtering, shown in Figure 3.6, exhibit a similar spread of ranges.

3.4 Spectral Modifications

Speech sounds can be classified into two categories determined by the presence or absence of voicing. This distinction is important in the analysis of the spectra of clipped speech sounds. The spectra of the two types of speech sounds are analyzed in the following sections.

[Figure 3.5: four panels (2-pole, 4-pole, 6-pole, and 8-pole Butterworth post-filter, center frequency = 1000 Hz), each showing range (dB) against post-filter bandwidth (Hz) of 29, 58, 115, 174, and 232.]
Figure 3.5 Effect of varying the post-filter bandwidth and slopes. Pre-filter is an allpass filter.

[Figure 3.6: four panels (2-pole, 4-pole, 6-pole, and 8-pole Butterworth post-filter, center frequency = 1000 Hz), each showing range (dB) against post-filter bandwidth (Hz) of 29, 58, 115, 174, and 232.]
Figure 3.6 Effect of varying the post-filter bandwidth and slopes. Pre-filter is an optimal filter (*) or differentiator (o).
3.4.1 Spectra of Clipped Unvoiced Speech

It has been shown (Papoulis, 1965; Fawe, 1966) that for gaussian noise as input, the output power spectrum Gy(w) after infinite clipping can be determined from the input power spectrum Gx(w) by the "arcsin law" relating their respective autocorrelation functions, Ry(t) and Rx(t). Thus,

$$\frac{R_y(t)}{R_y(0)} = \frac{2}{\pi}\,\arcsin\left[\frac{R_x(t)}{R_x(0)}\right]$$

This relation was verified by measuring the spectra of various noises before and after clipping. The first set of noises consisted of synthesized tokens of the consonants /f,sh,s/ and the burst portions of /p,t,k/. Clipping was done by the analog clipper and spectra were analyzed with the spectrum analyzer. The power spectrum of each synthetic consonant was also calculated from the filter transfer function from which it was generated, and the arcsin law was applied to these spectra using the FFT.

Figures 3.7 through 3.12 show both the measured spectra and the spectra calculated from the arcsin law. (Since the noise bursts were generated with a sampling rate of 10 kHz, only a 5 kHz range is shown.) From the figures it is clear that the clipped spectra follow the arcsin law.

[Figure 3.7: "Spectrum of /F/"; two plots of level (dB) against frequency (0 to 5000 Hz), with the clipped spectrum labeled in each.]
Figure 3.7 Spectra of the synthetic consonant noise burst, /f/, measured before and after clipping. In the top plot, the solid line is the ideal input spectrum and the dashed line is the spectrum calculated from the arcsin law. In the bottom plot, the lower trace is the measured input spectrum and the upper trace is the measured clipped spectrum.

[Figure 3.8: "Spectrum of /SH/"; two plots of level (dB) against frequency (0 to 5000 Hz), with the clipped spectrum labeled in each.]
Figure 3.8 Spectra of the synthetic consonant noise burst, /sh/.
Plots are the same as in Figure 3.7.

[Figure 3.9: "Spectrum of /S/"; two plots of level (dB) against frequency (0 to 5000 Hz), with the clipped spectrum labeled in each.]
Figure 3.9 Spectra of the synthetic consonant noise burst, /s/. Plots are the same as in Figure 3.7.

[Figure 3.10: "Spectrum of /P/"; two plots of level (dB) against frequency (0 to 5000 Hz), with the clipped spectrum labeled in each.]
Figure 3.10 Spectra of the synthetic consonant noise burst, /p/. Plots are the same as in Figure 3.7.

[Figure 3.11: "Spectrum of /T/"; two plots of level (dB) against frequency (0 to 5000 Hz), with the clipped spectrum labeled in each.]
Figure 3.11 Spectra of the synthetic consonant noise burst, /t/. Plots are the same as in Figure 3.7.

[Figure 3.12: "Spectrum of /K/"; two plots of level (dB) against frequency (0 to 5000 Hz), with the clipped spectrum labeled in each.]
Figure 3.12 Spectra of the synthetic consonant noise burst, /k/. Plots are the same as in Figure 3.7.

In order to look at relative changes in the clipped spectrum after varying spectral frequencies and amplitudes, random noise was filtered with the General Radio 1925 1/3 octave multifilter. The output was clipped and analyzed with the H.P. spectrum analyzer. Power spectra were also calculated by varying the parameters of the filter transfer function of the synthesized consonants, and the arcsin law was applied. (Thus, the calculation was performed using a filter transfer function different from the one used empirically. This difference can be seen between the unclipped calculated and measured spectra in the figures.) Spectra with a single peak at 315 Hz and 1250 Hz are shown in Figures 3.13 and 3.14.
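The arcsin-law calculation described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used for the figures: the thesis applied the law using the FFT, whereas the hypothetical `power_spectrum` helper below uses a direct cosine transform of the autocorrelation to stay within the Python standard library. `simulate_clipped_correlation` is an added Monte Carlo check, with an arbitrary seed and sample count, that clipping a pair of correlated Gaussian samples gives the correlation the law predicts.

```python
import math
import random

def clipped_autocorrelation(rx):
    """Arcsin law: normalized autocorrelation of infinitely clipped Gaussian noise.

    rx is the input autocorrelation at lags 0, 1, 2, ...; the result is Ry/Ry(0).
    """
    r0 = rx[0]
    return [(2.0 / math.pi) * math.asin(max(-1.0, min(1.0, r / r0))) for r in rx]

def power_spectrum(r):
    """Power spectrum from a one-sided autocorrelation (assumed even in lag)."""
    n = len(r)
    spectrum = []
    for k in range(n):
        w = math.pi * k / n              # frequency from 0 up to (not including) pi
        spectrum.append(r[0] + 2.0 * sum(r[m] * math.cos(w * m) for m in range(1, n)))
    return spectrum

def simulate_clipped_correlation(rho, n_pairs=100_000, seed=1):
    """Estimate the correlation of sign(x) and sign(y) for Gaussian x, y with correlation rho."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(n_pairs):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        y = rho * u + math.sqrt(1.0 - rho * rho) * v
        acc += (1 if u >= 0.0 else -1) * (1 if y >= 0.0 else -1)
    return acc / n_pairs

predicted = clipped_autocorrelation([1.0, 0.5])[1]   # (2/pi) * asin(0.5) = 1/3
simulated = simulate_clipped_correlation(0.5)
print(round(predicted, 4), round(simulated, 3))
```

To predict a clipped spectrum from an input spectrum, one would obtain the input autocorrelation from the input power spectrum, pass it through `clipped_autocorrelation`, and transform back with `power_spectrum`.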
Spectra with two peaks are shown in Figures 3.15 through 3.19.

Several observations can be made about spectral modifications caused by clipping. First, clipping tends to fill in "valleys" in the spectrum, much as an increased noise floor would. Second, as expected from the Fourier series for a square wave, harmonics appear at odd multiples of the spectral peaks. This is especially noticeable in Figures 3.13 and 3.14, which display a single peak. Third, the amplitude relation between two spectral peaks after clipping, when neither is obscured by distortion components, is nearly the same as in the input (Figures 3.16, 3.18, 3.19). Low-level peaks can be obscured by the harmonics of higher-level peaks, especially if the frequency of the low-level peak is greater than the frequency of the higher-level peak (Figures 3.15 vs. 3.16 and 3.17 vs. 3.19). The low-level peak can also be enhanced by the harmonics of higher-level peaks (Figure 3.15).

[Figures 3.13 through 3.19: each shows two plots of level (dB) against frequency (Hz), with the clipped spectrum labeled.]
Figure 3.13 Spectra of filtered narrowband noise centered at 315 Hz measured before and after clipping. The top plot is calculated from the arcsin law using a different transfer function from that measured for the input noise. The bottom plot is measured with the spectrum analyzer. The format of this plot is the same as in Figures 3.7-3.12.
Figure 3.14 As in Figure 3.13 but with input noise centered at 1250 Hz.
Figure 3.15 As in Figure 3.13 but with input noise peaks centered at 250 Hz and 1250 Hz.
Figure 3.16 As in Figure 3.13 but with input noise peaks centered at 250 Hz and 1250 Hz.
Figure 3.17 As in Figure 3.13 but with input noise peaks centered at 315 Hz and 1250 Hz.
Figure 3.18 As in Figure 3.13 but with input noise peaks centered at 315 Hz and 1250 Hz.
Figure 3.19 As in Figure 3.13 but with input noise peaks centered at 315 Hz and 1250 Hz.

3.4.2 Phase Dependence of Clipped Voiced Sounds

For the case of voiced sounds, which consist of harmonic components, it is expected that the arcsin law will not be applicable in general, although it may possibly provide a useful approximation. In this section the problem in predicting the clipped spectrum of harmonic sounds is demonstrated.

The problem can be seen by constructing a signal that is the sum of harmonically related sine waves and varying the phases of the components (Figure 3.20). The three frequency components were chosen in a one-third octave band with center frequency of 1000 Hz.
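The construction described above can be sketched as follows, as an illustration rather than the procedure used for the figures. Three tones at 900, 1000, and 1100 Hz with relative amplitudes of 0, -2, and -5 dB (the values given in the Figure 3.20 caption) are summed with different phase sets and then infinitely clipped. The 10 kHz sampling rate, the sine-phase convention, the helper names, and the phase sets for cases B and C (read from a partly garbled caption) are assumptions.

```python
import cmath
import math

FS = 10000                     # assumed sampling rate (Hz)
N = 100                        # one 10-ms period of the 100-Hz fundamental
FREQS = (900, 1000, 1100)      # tone frequencies (Hz)
AMPS = tuple(10 ** (db / 20) for db in (0, -2, -5))

def three_tone(phases_deg):
    """Sum of three harmonically related sine waves with the given phases (degrees)."""
    return [sum(a * math.sin(2 * math.pi * f * n / FS + math.radians(p))
                for a, f, p in zip(AMPS, FREQS, phases_deg))
            for n in range(N)]

def clip(x):
    """Infinite peak clipping: keep only the sign of each sample."""
    return [1.0 if s >= 0.0 else -1.0 for s in x]

def magnitude_at(x, freq_hz):
    """Magnitude of the DFT bin at freq_hz (a multiple of FS/N = 100 Hz)."""
    k = round(freq_hz * N / FS)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * n / N) for n, s in enumerate(x)))

# Phase sets for cases A, B, C; B and C are assumed readings of the caption.
phase_sets = {"A": (0, 0, 0), "B": (90, 180, 90), "C": (0, 270, 180)}

for name, phases in phase_sets.items():
    x = three_tone(phases)
    y = clip(x)
    before = [round(magnitude_at(x, f), 2) for f in FREQS]
    after = [round(magnitude_at(y, f), 2) for f in FREQS]
    print(name, "input:", before, "clipped:", after)
```

The input magnitudes printed for the three cases are identical, since changing the phases leaves the input power spectrum (and hence the autocorrelation) unchanged; the clipped magnitudes can then be compared across cases to see the phase dependence.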
Any prediction of the clipped spectrum based solely on the input auto-correlation function would be independent of the input phases. It is clear that the clipped spectra are rather dependent on the phases of the individual frequency components in the input (panels A-C of Figure 3.21). The prediction from the arcsin law is shown in Panel D of Figure 3.21. It will be noted that the gross spectral shape of the clipped spectra is consistent across the three cases and follows the shape predicted by the arcsin law. However, the validity of the arcsin law approximation in other harmonic cases is unknown. Given that fine details of the spectrum are lost as a result of "critical-band" filtering by the ear, an approximation as good as that shown in Figure 3.21 would probably be adequate. Determining the goodness of the arcsin approximation for harmonic signals is a worthwhile but lengthy task which was not pursued here because of time constraints.

If the tones in the signal are not harmonically related, phase shifts of the components can be viewed as a shift in the time origin, and there should be no effect of phase shifts on the clipped spectrum. However, this case is not relevant to speech signals, since the components of voiced sounds are harmonically related.

[Figure 3.20: three time waveforms, panels (A), (B), and (C), plotted from 0.0064 s to 0.0264 s.]
Figure 3.20 Differing time-waves caused by phase shifting. The signal was generated by summing three harmonically related tones with frequencies of 900, 1000, and 1100 Hz, and relative amplitudes of 0, -2, and -5 dB respectively. The three tones (900, 1000, and 1100 Hz) in Panel A have zero phase. The phases of the three tones were respectively 90, 180, and 90 degrees in Panel B. The phases were 0, 270, and 180 degrees in Panel C.
[Figure 3.21: four panels, (A) through (D), each showing level (dB) against frequency (0 to 4000 Hz).]
Figure 3.21 The phase dependence of harmonically related tones. Panels A, B, and C show the clipped magnitude spectra with the three signals in Figure 3.20 as input. Panel D shows the prediction of the arcsin law applied to the input.

3.4.3 Spectra of Clipped Vowels

In order to observe the effects of clipping on vowel spectra, measurements were made with steady-state synthetic vowels. The spectra of the vowels were evaluated with the ILS spectral analysis package.

The vowels were synthesized using a model in which the glottal source is passed through a filter designed to model the vocal tract (Rabiner and Schafer, 1978). Parameters were varied to change the frequencies and bandwidths of the formants. In the present implementation, an impulse was used as a glottal pulse waveform in order to remove the negative spectral tilt associated with the more natural waveform. It is easier to evaluate the clipped spectra after eliminating this low-frequency emphasis. The vowels were synthesized with a fundamental frequency of 125 Hz. Thus, the spectrum of each vowel is a sequence of harmonics spaced 125 Hz apart with spectral-envelope peaks at specified formant frequencies.

The spectra of clipped synthetic vowels are presented in Figures 3.22 through 3.39. There are some major differences between the clipped spectra of vowels and noise. A clipped vowel spectrum is a sequence of harmonics with the same inter-component spacing of 125 Hz as the unmodified spectrum, but with components extending to higher frequencies. As with noise, harmonics appear at odd multiples of the spectral peaks. However, there is no counterpart to the "noise floor" seen with clipped noise.

Clipped harmonics of the fundamental frequency components can emphasize or de-emphasize formant peaks. For example, in Figures 3.22 through 3.25, the formant is at 1000 Hz, a multiple of the pitch frequency, and the clipped spectrum has peaks only at odd harmonics of this formant. In another example, where the formant of the vowel is higher in frequency and not exactly at a harmonic of the pitch frequency (Figures 3.26-3.29), the clipped spectrum contains other spectral peaks at low frequencies. However, the original formant is preserved as an absolute maximum in the spectral envelope. The extent to which the formant is retained in the clipped spectrum, even when the input spectral peak is very shallow, is quite remarkable (Figures 3.25 and 3.29).

The clipped spectra of vowels with two formants have peaks not only at odd multiples of the formants but at other frequencies as well (Figure 3.30). Again, clipping seems to emphasize individual formants that are difficult to see in the unmodified spectrum (Figure 3.31).

In Figures 3.32-3.34, the two formants (at 1000 Hz and 3000 Hz) are not only harmonically related to the pitch, but the higher frequency formant is an odd multiple of the lower frequency formant. After clipping, narrow spectral peaks appear at only odd harmonics of the formants. Even as the bandwidths of the formants are varied, no new peaks appear in the spectrum (as seen in Figures 3.30 and 3.31), and the spectrum remains unchanged except for the spectral height of the peaks.

When two formants are not harmonics of the fundamental pitch frequency, the formants are preserved but the clipped spectrum contains other spectral peaks (Figures 3.35-3.39). Similarly to noise, lower-amplitude formants in the spectrum are obscured after clipping (Figure 3.37).
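The odd-harmonic pattern noted for both noise and vowels can be seen in the simplest limiting case, a single sinusoid, which infinite clipping turns into a square wave. The sketch below is an illustration only, not one of the thesis's measurements; the 16 kHz sampling rate, the 1000 Hz tone, and the small phase offset (chosen so that no sample falls exactly on a zero crossing) are assumptions.

```python
import cmath
import math

FS = 16000            # assumed sampling rate (Hz)
F0 = 1000             # frequency of the sinusoid (Hz)
N = FS // F0          # 16 samples per period

def clipped_sine():
    """One period of an infinitely clipped sinusoid (a sampled square wave)."""
    offset = math.pi / 16   # keeps every sample away from a zero crossing
    return [1.0 if math.sin(2 * math.pi * F0 * n / FS + offset) >= 0.0 else -1.0
            for n in range(N)]

def harmonic_magnitudes(x, count):
    """Normalized DFT magnitudes at harmonics 1..count of the period length."""
    n = len(x)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(x))) / n
            for k in range(1, count + 1)]

mags = harmonic_magnitudes(clipped_sine(), 7)
for k, m in enumerate(mags, start=1):
    print(f"{k * F0:5d} Hz  {m:.4f}")
```

Energy appears at 1000, 3000, 5000, and 7000 Hz with decreasing magnitude, while the even multiples are empty apart from rounding error.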
- Similarly the (OB) 1 00 70 6o 30 20 10 1000 0 3000 2000 FREQUENCY (HZ) 4000 500 08) 100 60 'A( 70 60 20 10 0 0 1000 2000 3000 FREQUENCY 4000 S000 (HZ) - 76 - Figure 3.22 Spectra of a synthetic steady-state vowel measured The before (top plot) and after (bottom plot) clipping. Hz. 50 formant frequency is 1000 Hz and bandwidth is (DB) 100 80 70 Ah I I I A so 11111 HA V IIIIIIIIII IIIIIII III I)IIIII IIIIIIIII I IIIII 11 111111 lifill" 111IM 11 , IIH IIIII 11111111 11 H IM 111111111MI IIII uvd Jj ju tin m 4fW 30 20 10 0 0 2000 3000 FREQUENCY (HZ) 4000 5000 (OB) 100 80 I A 70 so 20 10 0 0 1000 2000 3000 FREQUENCY 4000 5000 (HZ) Figure 3.23 As in Figure 3.22 with a 1000 Hz formant frequency and a 200 Hz bandwidth. - 77 100 90 80 I Hit IlHil s0 111111111 11IN I II 111 11111111 11111 11H I111111 11111 1111IN 111111111 1111 11H 1IlUV ' W -44M 1111 1111 H ill 11 111 111 1111 1 III III II; NI W 11111 30 20 10 0 0 3000 2000 FREQUENCY (HZ) 4000 5000 (0B) 100 70 11111 .1111 H I IJ 11 H IH A III . III Jill so A il 111 111 11 11111 1 11111111 11 11111 ji I I I Ill 111 1111 H I 111 1 1 111, 111 11 1111111 H I II I M 1111 1 111111111 111H I M 11111H 11111111 111111 1HIH IIN IIHMv 30 IU I I I20 to 0 0 1000 2000 3000 FREQUENCY (HZ) 5000 Figure 3.24 As in Figure 3.22 with a 1000 Hz formant frequency and a 500 Hz bandwidth. - 78 (MB) 100 I W 111 )I 11 IHII U F- 11 11 A I ' I !111 HI I III M 70 1]~~~~~ 111- 1IIII I s0 30 20 10 0 0 3000 2000 FREQUENCY (HZ) (OB) - 100 80 70 * * j li.1 60 11 40 30 20 10 1000 0 2000 3000 FREQUENCY (HZ) 4000 0100 As in Figure 3.22 with a 1000 Hz formant frequency Figure 3.25 and a 1000 Hz bandwidth. - 79 @- - - -- - .. -1, 11/ -- , - - - -z A - ---------- - .(OB) 1 0 80 70 II. I i IIIII . fva 1 f I s0 l I I I Im 30 20 0 0 1000 I 4000 3000 2000 FREQUENCY (HZ) 1. I. 
,I I________ - (DB) 1100 I__ SO 70 f ______________ 60 - - - 1 - - ________ 20 10 0 1000 2000 FREQUENCY 3000 (HZ) 4000 EQO 0 Figure 3.26 As in Figure 3.22 with a 2700 Hz formant frequency and a 50 Hz bandwidth. - 80 - -1- I I I DB) 100 90 80 70 IliI________1_11 60 40 30 to 2000 3000 FREQUENCY (HZ) 4000 0 5000 (OB) 1 100 60 70 60 III III 'lH 1 if iM H 1 40 20 to 0 0 1.000 2000 3000 FREQUENCY (HZ) S000 - 81 - Figure 3.27 As in Figure 3.22 with a 2700 Hz formant frequency and a 100 Hz bandwidth. 80 70 60 40 30 20 20il30l 01 0 0 2000 3000 FREQUENCY (HZ) 4000 Eooo (OB) 100 80 HIM M HIM II IIIH JI 11111 I I 1 111111 11111 HIM I 1 11 ji il 11 4111 1 H M W HIC HIIIIIII11 . 1111H II .114 111 11 1,111 ( H ill 1 11 11 11 U ll I'll 1 11 Ij I I IM I 11 1 '1 H IP 11 1 lj 11 11j I( 11I 111 1 M 11111H IIII. 1111111 11111 1 111 111 1 IN I I III Io 70 60 SO 40 30 20 10 0 1000 2000 3000 FREQUENCY (HZ) 4000 0 5000 Figure 3.28 As in Figure 3.22 with a 2700 Hz formant frequency and a 500 Hz bandwidth. - 82 III HIM HINIII IIM IIII MI Jill I IIII Jill 11 H 111 111111-11 111 11111 1111111 1 111 9 1 11 s0 1111 1111 11Jill 0 UA-1.1 1) [1 Ij so 111111( U l III HIM 30 20 t0 0 1000 0 3000 2000 FREQUENCY (HZ) 4000 5000 (DB) t00 80 I E 70 A 60 30 20 to 1000 0 2000 3000 FREQUENCY 5200 - 0 (HZ) Figure 3.29 As in Figure 3.22 with a 2700 Hz formant frequency and a 1000 Hz bandwidth. - 83 100 I I I s0 l 70 Lai 60 30 10 0 0 3000 2000 FREQUENCY 4000 (HZ) DB) 100 ilt( 70 60 50 30 20 10 0 0 3000 2000 FREQUENCY 4000 5000 (HZ) - 84 - First formant frequency is 1000 Hz Figure 3.30 As in Figure 3.22. formant frequency is 2000 Hz with Second with 50 Hz bandwidth. 50 Hz bandwidth. 100 60 A 70 T. IN III I 11 11 1 11 HIM IN J 11 I I I], r4 i 4 1111 I HIP 11111 11 1111111111. 1111i 11 11 Ili Ilim , 1 1 .1. I fl 11 1111111 11 I I 111 J I llu l I 1 1 1)11 1 11111 11,11MIll 111 1) 111111 1 .. 
11 1111 I III HIM I P, I I 1 11P III 1 11 111111 JIM Mill Hil I 30 10 0 0 1000 4000 2000 3000 FREQUENCY (HZ) Soo 100 IN I 111 L II II II I t 70 60 IIIIIIIIIII I H 111111 I I I i ill I IIIIH I 11111 11 ll 11.1 U1111 i j 1 1 1 ' Jill i ll Hil 0 111111 1 11 H I h ill 11 H il 11M 11 M ill 1. 1111I I1 1111111 1 i ll ill Hill L 10 0 0 2000 3000 FREQUENCY 4000 E000 (HZ) Figure 3.31 As in Figure 3.22. First formant frequency is 1000 Hz with 500 Hz bandwidth. Second formant frequency is 2000 Hz with - 85 - 500 Hz bandwidth. 00 A1 so 70 60 30 20 10 0 0 1000 2000 3000 FREQUENCY (HZ) 4000 5000 OB) 100 ___( 630 70 60 30 20 0 2000 3000 FREQUENCY (HZ) 1000 4000 S000 - 86 - As in Figure 3.22. First formant frequency is 1000 Hz Figure 3.32 with 50 Hz bandwidth. Sencond formant frequency is 3000 Hz with 500 Hz bandwidth. \ t I IH II 0sL.Jj 100 I H IIII 30 20 to 0 0 1000 Sao0 3000 2000 FREQUENCY (HZ) (DB) 1 100 s0 70 s0 , --------- 40 20 10 0 0 1000 2000 3000 FREQUENCY 4000 S000 (HZ) - 87 - Figure 3.33 As in Figure 3.22. First formant frequency is 1000 Hz Second formant frequency is 3000 Hz with with 100 Hz bandwidth. bandwidth. 100 Hz (Ut:$) 1 100 90 70 .~ .. 11 11 111M INIIII III 60 30 10 0 a 1000 2000 3000 FREQUENCY (HZ) 4000 S000 (OB) 100 80 -- 70 60 40 30 20 10 0 100 2000 3000 FREQUENCY (HZ) 0 4000 As in Figure 3.22. First formant frequency is 1000 Hz Figure 3.34 Second formant frequency is 3000 Hz with with 500 Hz bandwidth. - 88 - 50 Hz bandwidth. __A I I___ 100 ___ s0 40 30 20 10 0 0 1000 2000 3000 FREQUENCY (HZ) 4000 SOO OB) 100 70 30 IH I~ II 50 30 20 to 0 Figu re 3.35 1000 2000 3000 FREQUENCY (HZ) 4000 5000 0 As in Figure 3.22. First formant frequency is 1000 Hz Second formant frequency is 2700 Hz with with 50 Hz bandwidth. 50 Hz ba ndwidth. 
Figure 3.36 As in Figure 3.22. First formant frequency is 1000 Hz with 500 Hz bandwidth. Second formant frequency is 2700 Hz with 500 Hz bandwidth.

Figure 3.37 As in Figure 3.22. First formant frequency is 600 Hz with 50 Hz bandwidth. Second formant frequency is 2300 Hz with 500 Hz bandwidth.

Figure 3.38 As in Figure 3.22. First formant frequency is 600 Hz with 100 Hz bandwidth. Second formant frequency is 2300 Hz with 100 Hz bandwidth.

Figure 3.39 As in Figure 3.22. First formant frequency is 600 Hz with 500 Hz bandwidth. Second formant frequency is 2300 Hz with 50 Hz bandwidth.

Chapter 4

Discussion

The present study was designed to assess infinite peak clipping as a means of reducing the amplitude variations in speech levels. The study was based on two assumptions about hearing and the perception of speech. The first is that the ear analyzes sound into "critical-band" filters, the outputs of which are relevant to the perception of loudness. The second assumption is that the primary cues to the intelligibility of speech lie in the spectral (magnitude) patterns of speech. The investigation was designed to answer the following questions based on these assumptions: what are the effects of post-filtering after infinite-peak clipping, and how can spectral distortion be minimized? The goal was to determine if an amplitude compression system based on infinite peak clipping could be designed which would give maximum compression and minimum spectral distortion, thereby maximizing intelligibility for individuals with sensorineural hearing loss.

4.1 Compression Results

The general result from the study of compression was the insensitivity of the amplitude range of clipped speech to the characteristics of the filters that preceded and followed the clipper. Regardless of which pre-filters and post-filters were used, clipping reduced the range of speech by at least 15 to 20 dB. Output ranges were on the order of 10 to 20 dB.

The general trends of the effects of pre- and post-filtering of clipped speech can be understood qualitatively with concepts discussed by DeGennaro et al. (1981). The variation in a bandpass speech envelope can be modeled as consisting of slow and rapid components. The rapid fluctuations result from the narrowband filtering (the rate of fluctuations is on the order of the bandwidth) and account for 10-15 dB of the measured range. The slow component (with fluctuations on the order of a few per second) results from variations in overall intensity and spectral distribution across the different speech sounds. The slow component accounts for about 30 dB of the range. To a first approximation, the effect of infinite peak clipping on the range measured after narrow post-filtering is to remove the slow component. An example is a third-octave band of noise that is slowly modulated over a large decibel range, then clipped and post-filtered by a third-octave filter with the same center frequency. The zero-crossing pattern is unaffected by the slow modulation, so the clipped noise will exhibit the same envelope variations as if there were no slow modulations.
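The argument that a slow modulation leaves the zero-crossing pattern untouched can be checked with a short sketch. This is an illustration under stated assumptions, not the procedure used in the study: the carrier is a sum of four sinusoids standing in for a band of noise, the slow gain sweeps over 30 dB, and the names `infinite_peak_clip`, `carrier`, and `gain_db` are hypothetical.

```python
import math
import random


def infinite_peak_clip(samples):
    """Hypothetical helper: keep only the sign of each sample (+1.0 or -1.0)."""
    return [1.0 if s >= 0.0 else -1.0 for s in samples]


rng = random.Random(0)   # fixed seed, chosen arbitrarily
N = 4000                 # number of samples (assumed)

# Stand-in for a narrow band of noise: a few closely spaced sinusoids
# with random phases (frequencies in cycles per sample, assumed).
freqs = [0.095, 0.100, 0.105, 0.110]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
carrier = [
    sum(math.sin(2.0 * math.pi * f * n + p) for f, p in zip(freqs, phases))
    for n in range(N)
]

# Slow, strictly positive gain that sweeps from -15 dB to +15 dB.
gain_db = [15.0 * math.sin(2.0 * math.pi * n / N) for n in range(N)]
modulated = [c * 10.0 ** (g / 20.0) for c, g in zip(carrier, gain_db)]

# A positive gain never changes the sign of a sample, so both clipped
# waveforms are identical sample by sample.
same_pattern = infinite_peak_clip(carrier) == infinite_peak_clip(modulated)
print("gain sweep (dB):", max(gain_db) - min(gain_db))
print("clipped waveforms identical:", same_pattern)
```

Because the two clipped waveforms are identical, any post-filter applied to them produces identical outputs, which is the sense in which clipping removes the slow component.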
The preceding analysis applies to the case where the post-filter is in the spectral region that is dominant at the input to the clipper. If this dominant region changes over time, the variations in the clipped spectral distributions will contribute to the measured output range.

This discussion aids in interpreting the finding that the 16 band ranges with no pre-filtering were slightly smaller than with a highpass pre-filter (the optimal filter or differentiator), or a bandpass pre-filter (formant filters). With no pre-filtering, the input speech waveform (and hence the sequence of zero-crossings) is dominated by the low-frequency first formant. The harmonics in the higher frequency regions will be relatively constant across the various speech sounds after clipping. If the first-formant amplitude is reduced by highpass filtering, the input zero-crossing sequence will exhibit greater variability, resulting in greater variation in the clipped spectrum across the different speech sounds.

It was also observed that ranges were reduced further if the pre-filter bandwidth was narrowed to one-third octave or less. This effect can be understood by considering the case of a very small pre-filter bandwidth. The input to the clipper will approximate a tone with slowly varying amplitude and frequency modulation. Since the clipped signal will be frequency modulated with constant amplitude, the third-octave filter output will also be a tone with slow frequency modulation and an amplitude modulation resulting from the FM-to-AM conversion by the post-filter. Thus, the narrower the input band, the less the frequency modulation and consequently the less amplitude modulation there will be after post-filtering.
The finding that the clipped range is not strongly dependent on the bandwidth or slope of the post-filter implies that the exact characteristics of the ear's critical-band filters need not be known in order to make meaningful measurements of range.

It is interesting to speculate on whether there is a minimum range of speech levels that can be achieved after (auditory) post-filtering by any compression system. When one considers that the envelope ranges of narrowband noise and harmonic complexes are on the order of 10-15 dB, one must wonder how the variations in the speech envelope could be further reduced. It would seem that further reduction of these envelope variations could be achieved only in special cases.

4.2 Comparison with AGC Systems

Krieg (1980) studied multiband amplitude compression, achieved by independent AGC in each band. The input speech was CVC lists spoken by a male talker, and the compression ratios in the third-octave channels were 3 (with a deviation of approximately 0.5). The input ranges in the 16 frequency bands were 25 to 42 dB, averaging 36.8 dB across all frequency bands (Krieg, 1980, Figure 9). After compression, the ranges were 14 to 38 dB, averaging 24.3 dB across all 16 channels (Krieg, 1980, Figure 16). The actual amount of compression was approximately 12 dB (Krieg, 1980, Figures 11 and 15) and the effective compression ratio was approximately 1.5. Infinite peak clipping, in the present study, reduced the range of speech by approximately 20 dB, 8 dB more than the AGC system used by Krieg (1980). Assuming that greater intelligibility results from presenting a greater range of speech within the residual hearing area of the listener, then clipping should allow superior intelligibility results when the listener's dynamic range is less than about 20 dB.
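The comparison above rests on simple arithmetic, worked through in the sketch below. The numbers are those quoted in the text. Treating the effective compression ratio as the ratio of mean input range to mean output range is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Mean level ranges quoted from Krieg (1980) in the text above.
input_range_db = 36.8    # mean input range across the 16 bands
output_range_db = 24.3   # mean range after multiband AGC

# Approximate range reduction found for infinite peak clipping in the present study.
clipping_reduction_db = 20.0

# Range reduction achieved by the AGC system (about 12 dB in the text).
agc_reduction_db = input_range_db - output_range_db

# Effective compression ratio, assumed here to mean input range over output range.
effective_ratio = input_range_db / output_range_db

# Additional reduction from clipping relative to AGC (about 8 dB in the text).
extra_reduction_db = clipping_reduction_db - agc_reduction_db

print(f"AGC reduction: {agc_reduction_db:.1f} dB")
print(f"effective compression ratio: {effective_ratio:.2f}")
print(f"extra reduction from clipping: {extra_reduction_db:.1f} dB")
```

The computed values agree with the rounded figures in the text to within about half a decibel.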
4.3 Spectral Modifications

The spectral pattern of clipped unvoiced sounds can be predicted from the arcsin law (Section 3.4). After clipping, spectral peaks will appear at odd multiples of the original spectral peaks. Clipping spreads the spectrum, filling in the low-level region much as a noise floor would. Clipped spectra with two formants have harmonic peaks at not only odd multiples of the formants but at combination frequencies. The lower-frequency formant, when stronger, tends to obscure the higher-frequency formant.

The clipped spectra of voiced sounds are dependent on the phases of the input frequency components, so the arcsin law cannot be applied. Except for the fact that the clipped harmonic spectrum contains frequency components with the same fundamental spacing as the unmodified spectrum (i.e. there is no counterpart to the "noise floor" of unvoiced sounds), the general results are similar with voiced and unvoiced sounds. Spectral peaks which appear at odd multiples of the formants and combination frequencies after clipping will obscure the low-level formants.

A remarkable aspect of clipping vowels is the degree to which spectral peaks can be enhanced. This finding suggests that infinite peak clipping could be exploited in schemes for formant detection and tracking.
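The arcsin law invoked at the start of this section can be checked numerically with a small sketch. It assumes the usual statement of the law for zero-mean Gaussian noise: if the input has normalized autocorrelation rho at some lag, the infinitely clipped signal has autocorrelation (2/pi) arcsin(rho) at that lag. The first-order autoregressive noise model, the sample count, the seed, and the function name `clipped_lag1_correlation` are assumptions for illustration only.

```python
import math
import random


def clipped_lag1_correlation(a, n_samples=40000, seed=1):
    """Estimate the lag-1 correlation of infinitely clipped Gaussian noise.

    The noise is a first-order autoregressive process with unit variance
    whose own lag-1 correlation equals a (an assumed, illustrative model).
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                  # start in the stationary state
    innovation_scale = math.sqrt(1.0 - a * a)
    prev_sign = 1.0 if x >= 0.0 else -1.0
    total = 0.0
    for _ in range(n_samples):
        x = a * x + innovation_scale * rng.gauss(0.0, 1.0)
        sign = 1.0 if x >= 0.0 else -1.0     # infinite peak clipping
        total += prev_sign * sign
        prev_sign = sign
    return total / n_samples


a = 0.5
measured = clipped_lag1_correlation(a)
predicted = (2.0 / math.pi) * math.asin(a)   # arcsin law; equals 1/3 for a = 0.5

print(f"measured:  {measured:.3f}")
print(f"predicted: {predicted:.3f}")
```

The measured value is a statistical estimate and should agree with the prediction only to within sampling error. Transforming the predicted autocorrelation to the frequency domain is how the clipped spectrum of an unvoiced sound would be obtained; the sketch stops at a single lag.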
4.4 Implications for Hearing Aid Design

As stated above, the present study sought to determine the effects of infinite peak clipping in order to minimize both output ranges in critical-band-like filters and spectral distortions. It was found that output ranges are relatively insensitive to pre-filtering and post-filtering. If minimizing output ranges were the only criterion, there would be either no pre-filtering or narrowband pre-filtering. However, pre-filters like Thomas and Niederjohn's (1968) highpass filter and Hildebrant's (1982) 1100-Hz formant filters produced ranges only a few dB larger than no pre-filter and are known to produce slightly greater intelligibility. The differentiator pre-filter gave the largest ranges of the pre-filters investigated.

The present results on spectral distortions will help to determine a choice of pre-filter. Even though the narrowband filters produce the greatest amount of compression, the relative spectral qualities among frequency bands would be lost in such a system. For multiple-formant sounds it would seem that a pre-filtering scheme that separates the formants, like Hildebrant's (1982), would be beneficial for preserving the formants (without losing the relationship between bands) and avoiding interactions in the clipper. However, for single-peak sounds, this scheme may result in spurious output peaks, or a less well-defined peak than in the clipper input. In this respect, a single channel with highpass pre-filtering might prove superior.

Of course, the particular hearing loss of the individual should also be considered. For example, if the individual depends only on first formant cues, and has virtually no second formant hearing, then pre-filtering with an optimal filter will hinder intelligibility by decreasing the saliency of the cues.

4.5 Recommendations for Future Work

Future work should test intelligibility with a 3-band clipper and optimal filter/clipper systems on hearing-impaired listeners with small dynamic ranges. Ideally, results could be compared with performance on other amplitude compression systems using the same subjects. Examination of the types of speech errors for both hearing-impaired and normal subjects should help to determine the next course of action.

REFERENCES

Braida, L.D., et al. (1979). "Hearing Aids--A Review of Past Research on Linear Amplification, Amplitude Compression, and Frequency Lowering," ASHA Monograph No. 19

Coln, M.C.W. (1979). "A Computer Controlled Multiband Amplitude Compressor," Master's Thesis, Mass. Inst. Tech., Cambridge, MA.

De Gennaro, S., Braida, L.D., Durlach, N.I. (1981). "A Statistical Analysis of Third-Octave Speech Amplitude Distributions," Paper presented at the 101st meeting of the Acoustical Society of America, Ottawa, Ontario, Canada

De Gennaro, S., Krieg, K.R., Braida, L.D., Durlach, N.I. (1981). "Third-Octave Analysis of Multichannel Amplitude Compressed Speech," Proceedings of IEEE Inter. Conf. on Acoust., Speech and Signal Processing, Atlanta, Georgia

De Gennaro, S. (1982). "An Analytical Study of Syllabic Compression for Severely Impaired Listeners," PhD Thesis, Mass. Inst. Tech., Cambridge, MA.

Dunn, H.K., White, S.D. (1940). "Statistical Measurements on Conversational Speech," JASA 11, pp278-288

Fawe, A.L. (1966). "Interpretation of Infinitely Clipped Speech Properties," IEEE Trans. on Audio and Electroacoustics 14, pp178-183

Green, D.M., Swets, J.A. (1966). Signal Detection Theory and Psychophysics, (John Wiley and Sons, New York)

Guidarelli, G. (1981). "Intelligibility and Naturalness Improvement of Infinitely Clipped Speech," Acoustics Letters 4, No.7

Hildebrant, E.M. (1982). "An Electronic Device to Reduce the Dynamic Range of Speech," Bachelor's Thesis, Mass. Inst. of Tech., Cambridge, MA

IEEE (1969). "Recommended practice for speech quality measurements," IEEE Trans. Audio Electroacoust. 17, pp225-246

Kac, M., Siegert, A.J.F. (1947). "On the Theory of Noise in Radio Receivers with Square Law Detectors," J. Applied Physics 18, pp383-397

Krieg, K.R. (1980). "Third Octave Band Level Distributions of Amplitude Compressed Speech," Bachelor's Thesis, Mass. Inst. of Tech., Cambridge, MA.

Licklider, J.C.R. (1946). "Effects of Amplitude Distortion upon the Intelligibility of Speech," J. Acoust. Soc. Am.
18, pp429-434

Licklider, J.C.R., Bindra, D., Pollack, I. (1948). "The Intelligibility of Rectangular Speech-Waves," Am. J. Psychology 61, pp1-20

Licklider, J.C.R., Pollack, I. (1948). "Effects of Differentiation, Integration, and Infinite Peak Clipping upon the Intelligibility of Speech," JASA 20, pp42-51

Papoulis, A. (1965). Probability, Random Variables and Stochastic Processes, (McGraw-Hill, New York)

Pollack, I. (1952). "On the Effect of Frequency and Amplitude Distortion on the Intelligibility of Speech in Noise," JASA 24, pp538-540

Rabiner, L.R., Schafer, R.W. (1978). Digital Processing of Speech Signals, (Prentice-Hall, New Jersey)

Thomas, I.B., Niederjohn, R.J. (1970). "The Intelligibility of Filtered-Clipped Speech in Noise," J. Audio Engin. Soc. 18, pp299-303

Thomas, I.B., Sparks, D.W. (1971). "Discrimination of Filtered/Clipped Speech by Hearing-Impaired Subjects," JASA 49, pp1881-1887

Appendix A

Probability Densities of Test Signals

Narrow-band Noise

The probability density for the random variable, V, generated by squaring and then lowpass-filtering a band of noise centered at a high frequency was calculated by Kac and Siegert (1947) to be

$$P(V) = \sum_{i=1}^{\infty} \frac{\exp(-V/\lambda_i)}{\lambda_i \prod_{s \neq i} \left(1 - \lambda_s/\lambda_i\right)}, \qquad V > 0 \tag{A1}$$

where $\lambda_i$ is the ith eigenvalue of the integral equation. For the specific case where the input noise spectrum is that of a "simple-tuned" circuit and the lowpass filter is a simple RC-lowpass filter, the eigenvalues are found from the zeroes of the (R-1)th order Bessel function,

$$J_{R-1}\left(2\sqrt{R/\lambda_i}\right) = 0 \tag{A2}$$

where R is the ratio of the input noise bandwidth to the lowpass filter's -3 dB bandwidth.

Probability density functions were computed for values of R between 0 and 1 (Figure A1), and values between 1 and 8 (Figure A2). The case of R=0 corresponds to that of a Rayleigh random variable squared, or an exponentially-distributed variable. The range was computed to be the decibel difference between the two values, V10 and V90, at which the cumulative distribution function is equal to 0.10 and 0.90, respectively. The range values are graphed in Figure 2.2.
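For the limiting case R=0 the range has a closed form, which the sketch below evaluates. It assumes a unit-mean exponential variable and takes the decibel difference as 10 log10 of the ratio V90/V10, on the grounds that V is a squared (power-like) quantity; both choices, and the function name `exponential_percentile`, are assumptions for illustration rather than details confirmed by the text.

```python
import math


def exponential_percentile(p, mean=1.0):
    """Value V0 such that Pr(V < V0) = p for an exponential variable.

    Inverts the cumulative distribution 1 - exp(-V0 / mean).
    """
    return -mean * math.log(1.0 - p)


v10 = exponential_percentile(0.10)
v90 = exponential_percentile(0.90)

# Decibel difference between the 90% and 10% points (power-like quantity assumed).
range_db = 10.0 * math.log10(v90 / v10)

print(f"V10 = {v10:.4f}, V90 = {v90:.4f}, range = {range_db:.1f} dB")
```

Under these assumptions the range comes out near 13.4 dB, and it does not depend on the mean because only the ratio V90/V10 enters. This falls inside the 10-15 dB span quoted in Chapter 4 for narrowband noise envelopes.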
Tones

An input sine wave with unit amplitude and frequency $f = \omega/2\pi$ is considered in this section. Squaring and lowpass filtering of this signal results in the cyclic variable

$$V = \frac{1 - |H(2f)|\cos(4\pi f t)}{2} \tag{A3}$$

where $|H(f)| = 1/\sqrt{1 + R^2}$ and R is the ratio of f to the -3 dB bandwidth of the RC-lowpass filter. Because of symmetry in the time-wave, the calculations can be limited to the range for $\alpha = 4\pi f t$ of $0 < \alpha < \pi$. The cumulative distribution function for V is found by integrating the probability density function on $\alpha$ (which is assumed uniformly distributed) between the limits determined by the inverse function,

$$\alpha = F^{-1}(V) = \cos^{-1}\left[\frac{1 - 2V}{|H(2f)|}\right] \tag{A4}$$

The cumulative distribution function is derived below.

$$\Pr(V < V_0) = \Pr\left(\alpha < F^{-1}(V_0)\right) = \frac{1}{\pi}\arccos\left[\frac{1 - 2V_0}{|H(2f)|}\right] \tag{A5}$$

The two values, V10 and V90, at which the cumulative distribution is equal to 0.10 and 0.90, were calculated from equation A5. The range, which is the decibel difference of these two values, is graphed as a function of R in Figure 2.3.

Figure A1 Probability density functions for narrowband noise after squaring and lowpass filtering for R less than 1. R is the ratio of the input noise bandwidth to the lowpass filter's -3 dB cutoff. [Plot: probability density versus V, 0 to 4; curves labeled by R, including R = 1/2, 1/4, 1/8 and 0 (exponential density).]

Figure A2 Probability density functions for narrowband noise after squaring and lowpass filtering for R greater than or equal to 1. [Plot: probability density versus V, 0 to 4; curves labeled by R, up to R = 8.]
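The tone range defined through equation A5 can be evaluated directly by inverting the cumulative distribution. The sketch below does this for a given value of |H(2f)|, passed in as the hypothetical parameter `h2f`. As in the narrow-band case, the decibel difference is taken as 10 log10 of V90/V10, which is an assumption; the mapping from R to |H(2f)| is left to the caller rather than assumed.

```python
import math


def tone_percentile(p, h2f=1.0):
    """Value V0 with Pr(V < V0) = p, from inverting equation A5.

    Setting arccos((1 - 2*V0) / h2f) / pi = p gives
    V0 = (1 - h2f * cos(pi * p)) / 2, where h2f stands for |H(2f)|.
    """
    return 0.5 * (1.0 - h2f * math.cos(math.pi * p))


def tone_range_db(h2f=1.0):
    """Decibel difference between the 90% and 10% points of V."""
    return 10.0 * math.log10(tone_percentile(0.90, h2f) / tone_percentile(0.10, h2f))


for h2f in (1.0, 0.5, 0.1):
    print(f"|H(2f)| = {h2f:.1f}: range = {tone_range_db(h2f):.1f} dB")
```

With no attenuation of the double-frequency component (|H(2f)| = 1) the range is close to 16 dB, and it shrinks toward 0 dB as the lowpass filter removes that component, which is the qualitative trend one would expect for the curve in Figure 2.3.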
APPENDIX B

10 Harvard sentences used as input for experiments. Speaker is male.

1. A large size in stockings is hard to sell.
2. The birch canoe slid on the smooth planks.
3. Glue the sheet to the dark blue background.
4. It's easy to tell the depth of a well.
5. These days a chicken leg is a rare dish.
6. Rice is often served in round bowls.
7. The juice of lemons makes fine punch.
8. The box was thrown beside the parked truck.
9. The hogs were fed chopped corn and garbage.
10. Four hours of steady work faced us.