by (1982)

advertisement
EVALUATION OF
INFINITE PEAK CLIPPING
AS A MEANS OF AMPLITUDE COMPRESSION
by
Karen Ann Silletto
If
B.S.E.E.,
University of California at Davis
(1982)
Submitted to the Department of
Electrical Engineering and Computer Science
in Partial Fulfillment of the
Requirements for the
Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September
1984
Massachusetts Institute of Technology 1984
Signature of Author
Department of Electxtial Engineeringj aad C6mputer Science
July 13, 1984
Certified by:
Dr.
P.M/ Zurek,
Thesis
Supervisor
Accepted by:
A.C.Smith, Chairman
Departmental Committee on Graduate Students
NSTTUTE
OF TFCNOLoGY
ivASSACHUSEdiS
OCT 0 4 1984
LIBRA1IES
ARCHIVES
EVALUATION OF
INFINITE PEAK CLIPPING
AS A MEANS OF AMPLITUDE COMPRESSION
by
KAREN ANN SILLETTO
Submitted to the Department of Electrical Engineering and
Computer Science on July 13, 1984 in partial fulfillment
of the requirements for the Degree of
Master of Science in Electrical Engineering
ABSTRACT
Infinite
technique
peak
for
clipping
is
a
simple
and
processing speech prior to transmission over
an analog channel with limited dynamic range.
technique
However,
the
has yet to be applied seriously to the problem of
limited auditory dynamic
hearing
effective
loss.
In
range
this
accompanying
thesis,
an
sensorineural
attempt
was made to
evaluate the potential for infinite peak clipping as a means
of
amplitude
compression.
Measurements
were made of the
compression of the range of speech levels when
followed
clipped
by
(as
post-filtering
speech
is
analyzed
by
clipping
is presumed to occur when
the
ear).
2
-
distortions caused by clipping were also assessed.
-
is
Spectral
and
widths
of
filtered
speech
were found to be relatively independent of
the
characteristics
post-filter.
level
clipped
of
distributions
The
of
Measured
the
both
ranges
and
pre-filter
between
cumulative levels were about 10-15 dB
10%
the
as
the
and
90%
to
the
compared
input ranges of 30 to 40 dB.
The clipped spectra of unvoiced speech
The
analytically.
predicted
sounds
of
spectra
clipped
can
voiced
were
sounds cannot be easily predicted and so several cases
examined
empirically.
In
distortion of the input time wave,
not
severe.
despite
general,
amplitude
that infinite peak clipping as
deserves
compression
Dr.
Patrick M.
Title: Research Scientist,
detailed
study
Zurek
Research Laboratory of
-
3
-
Electronics
It
is
a means of extreme
hearing-impaired listeners.
Thesis Supervisor:
are
to preserve
designed
relevant cues in the clipped spectra are discussed.
concluded
radical
spectral distortions
schemes
Pre-filtering
the
be
with
ACKNOWLEDGMENT
First,
Pat
I wish to thank
his
for
Zurek
continued
I would also like to
guidance
and support on this project.
thank all
the members of the Communications Biophysics Group
for
to make the laboratory such an enjoyable place
helping
to work.
Although I cannot mention all the people who
I would like to
been wonderfully supportive and encouraging,
office-mates,
forgetting Mineola Minos),
away from home.
McConnell,
Also,
Leotta,
Dan
special thanks to
and Diane Bustamante
really needed
it.
and
Reid
(not
Dan
Leotta,
Mike
for their extra help when I
I would like to thank my parents
Finally,
and family for their love,
Jean
for making the office such a home
and continued support.
-
4
-
thank my
have
TABLE OF CONTENTS
......................
2
ACKNOWLEDGEMENTS ..............
4
TABLE OF CONTENTS
.............
5
...............
7
ABSTRACT
LIST OF FIGURES
CHAPTER 1
9
INTRODUCTION .......
.........
0
9
1.1
Problem Definition
1.2
Previous Research
12
1.3
Rationale for Pres ent Study
16
CHAPTER 2
METHODS
18
............
18
2.1
Inputs
2.2
Filtering and Clipping
2.4
2.5
6
...
2.2.1
Pre-filters
2.2.2
Clippers
2.2.3
Post-filters
....
20
a
20
.
a
......
21
..
21
Dynamic Range Analysis
.
24
2.3.1
Level
.
24
2.3.2
Probability Distribu tion Analysis
Detection
Spectral Analysis
......... ...............
.25
26
2.4.1
H.P. Spectrum Analyz er .............. 26
2.4.2
ILS Spectral Analysi s ..............27
Control Measurements and System
Evaluation...................... ..........
-
5
-
2.3
...........
28
2.5.1
Evaluation of the Level-Distribution
Measurement System.................... a .28
2.5.2
CHAPTER 3
Determination of Sample Size for
Sentences........................
RESULTS
3.1
37
..............................
Comparison of Distributions in 16
Frequency Channels..................
3.2
37
Comparison of Digital and Analog
Implementations..................... ......
40
......
40
.....
43
3.2.1
Digital vs.
3.2.2
Digital vs. Analog Processin g
Compression Results
3.3
3.4
CHAPTER 4
.32
Analog Sentences
............... .......
3.3.1
Pre-filtered Speech ........
3.3.2
Post-filtered Speech
45
. 45
.......
50
Spectral Modifications ............ .......
51
.......
3.4.1
Spectra of Clipped Unvoiced Speech 54
3.4.2
Phase Dependence of Clipped Voiced
Sounds...................... .......
69
3.4.3
Spectra of Clipped Vowels
.. .......
73
.94
DISCUSSION ........................
4.1
Compression Results
4.2
Comparison with an AGC System ...
4. 3
Spectral Modifications ..........
4.4
Implications for Hearing Aid Desi gn
4.5
Recommendations for Future Work
. 98
...
.
APPENDIX A
Probability Densities of Test Signals
APPENDIX B
10 Harvard Sentences
.
.
.
..
.
.
-
.
6
..
99
......... 101
REFERENCES
-
.
. 95
.............
...................... 102
.... 104
..................... 109
LIST OF FIGURES
1.1 A running spectrum of a two-word utterance, before
and after infinite peak clipping (Taken from Licklider
et al., 1948).....
....................................... 15
2.1 System Block Diagram................................. 19
2.2 Measured and calculated ranges for narrowband noise
after squaring and lowpass filtering.....................
30
2.3 Measured and calculated ranges for tones after squaring
and lowpass filtering with an RC-lowpass filter.......... 31
2.4 Ranges
in the 1/3-octave band centered at 1000 Hz as a
function of'sentence sample size......................... 34
2.5 Ranges across 13 frequency channels comparing inputs
of 10 and 20 digital Harvard sentences.................... 36
3.1 Ranges across 13 frequency bands for both the clipping
and nonclipping conditions................................. 38
3.2 Clipped and nonclipped ranges comparing inputs of
digital and analog Harvard sentences...................... 42
3.3 Effect of different pre-filters on ranges in 13
frequency bands..........................................
47
3.4 Effect of different narrowband pre-filters on ranges
in the 1000 Hz frequency band............................. 49
3.5 Effect of varying the post-filter bandwidth and slopes.
- 7
-
Pre-filter is an allpass filter........................... 52
3.6 Effect of varying the post-filter bandwidth and slopes.
Pre-filter is an optimal filter and differentiator....... 53
3.7-3.12 Spectra of synthetic consonant noise bursts
before and after clipping................................
55
3.13-3.19 Spectra of filtered noise before and after
clipping................................................. 62
3.20 The phase dependence of clipped voiced sounds.
Phase shifted time-waves.................................
71
The magnitude spectra of phase shifted time-waves...
72
3.21
3.22-3.39 Spectra of synthetic vowels before and after
clipping.................................................
76
Al
Probability density functions for narrowband noise after
squaring and lowpass filtering for R less than 1........ 107
- 8
-
A2
Probability density functions for narrowband noise after
squaring and lowpass filtering for R greater than 1..... 108
Chapter 1
INTRODUCTION
1.1 Problem Definition
Sensorineural hearing impairments are characterized
elevated
hearing
without
thresholds
a
by
corresponding
elevation in the discomfort threshold.
This
reduction
dynamic
can
necessitate
range,
compromise
if
between
Consequently,
severe
speech
amplitude
enough,
intelligibility
compression
of
and
speech
in
a
comfort.
has
been
extensively studied as a means of matching the dynamic range
of
speech and other acoustic signals to the reduced dynamic
range of the impaired listener.
The most common means of amplitude compression has been
automatic
gain control (AGC) in which the gain of -an output
amplifier
is
amplitude.
controlled
Results
of
by
and
limitations
estimate
of
(1979).
While
there
for such systems,
applications
in the hearing-aid application.
due to the "attack time" of an AGC system,
limitation
-
9
are
clear
For
instance,
the effectiveness
sounds
is
is that the actual degree of
-
Another
input
there are also
in protecting the user from high-level transient
limited.
the
studies of AGC in hearing aids are
reviewed by Braida et al.
benefits
an
compression is less than that specified
(DeGennaro
et
al.,
1981).
Infinite peak-clipping is a simple method of
that
compression
amplitude
reduces the dynamic range of
drastically
so that
the speech signal by clipping the input time-wave
of
zero-crossings
the output
Since
the
waveform
original
preserved.
are
infinite
(envelope) amplitude is constant,
produces
clipping
peak
Only the
constant amplitude results.
of
wave
rectangular
of
reduction
range
maximal
a
the
wideband signal.
The obvious cost of this range
waveform
it
However,
distortion.
highly
be
Bindra, Pollack, 1948).
clipping
should
in
compression
implants
be
tactile
and
it
Thus,
as
considered
hearing
Licklider,
(Licklider, 1946;
intelligible
aids
speech
infinitely-clipped
despite this extreme distortion,
can
found that,
been
has
severe
is
compression
and
aids,
a
means
it
peak
of amplitude
perhaps
also
since
that
seem
would
cochlear
achieves
large
compression of the amplitude range of speech with relatively
little loss in intelligibility.
Though clipping produces a constant-amplitude
signal,
the
perceived loudness may not be constant because
the ear effectively filters incoming sounds
(critical
bands)
outputs
whose
-
10
-
bands
wideband
into
frequency
are believed to be
relevant to loudness
The
amplitude
(Green and Swets,
1966,
Chapter
10).
range of speech after post-filtering has not
yet been studied and clearly needs to be understood in order
to evaluate clipping as a means of amplitude compression.
The effects of clipping on important
intelligibility
are
particular,
while
short-time
spectral
important,
also
it
is
not
amplitude
clipping
for
speech
understood.
accepted
pattern
that
of
changes in this pattern resulting
wideband
multiple
well
generally
peak clipping are not understood.
though
cues
speech
from
fair
the
is
infinite
It is possible that,
produces
In
even
intelligibility,
narrowband clippers could better preserve spectral
amplitude
and
patterns
increase
consequently
intelligibility.
The general problem addressed in
infinite
peak
clipping
the
amplitude range of speech for
specifically,
be
might
this
used
thesis will investigate the
the
patterns
of
clipped
of
clipped
how
compress
the
The
measured
well
and
of
the
amplitude
to understand past studies which
as
-
11
-
as
range
and
waveforms will be analyzed in order to
evaluate the potential for clipping as a means of
compression,
More
influence of
distributions
speech.
filtered-clipped-filtered speech will be
spectra
to
is
hearing-impaired.
pre- and post-filtering on the amplitude
spectral
thesis
examined the
intelligibility of clipped speech.
1.2 Previous Research
Peak
transmitters
clipping
was
during
WWII
intelligibility over a
range.
dynamic
first
a
as
means
showed
experiments
only
preserves
with
limited
(1946) studied different types of
Licklider
detrimental
least
radio
for maximizing the
channel
transmission
amplitude distortion and concluded
the
in
introduced
to
was
peak-clipping
Licklider's
intelligibility.
since
that
that
clipping
peak
infinite
the zero crossings of the input signal,
pattern of instantaneous amplitudes in the
the
waveform
speech
is not essential for intelligibility.
Pollack
the
that
(1952) confirmed Licklider's results and showed
of
intelligibility
relative to unmodified speech,
signals
equated
in
range
of
is a
function
the
and
roughly
square
speech
than normal speech.
the
clipper degrades
(with
SNR
has
independent
speech signal.
more intelligible than normal speech at
the
of
speech,
terms of peak amplitude and with noise
added after the clipper)
frequency
infinitely-clipped
of
the
Clipped speech is
low
SNR's
because
more power per unit peak amplitude
At high SNR's, distortion introduced by
intelligibility relative to unmodified
-
12
-
speech.
Intelligibility
highpass
filtering
Bindra,
Pollack,
Niederjohn,
of
clipped
the
is
improved
by
speech before clipping (Licklider,
1948;
1970;
speech
Pollack,
Thomas
and
1952;
Thomas
and
Ravindran,
1971).
One
interpretation of this finding (Thomas and Niederjohn, 1970)
is that a highpass pre-filter with cutoff at 1100 Hz reduces
high-amplitude, low-frequency signals and results in
preservation
better
of the second formant, which is very important
for intelligibility.
There has been one investigation of clipped speech with
hearing-impaired
listeners.
Thomas
and
Sparks
(1971)
compared the intelligibility of filtered-clipped speech with
linearly-amplified
speech
by
hearing-impaired subjects.
For 13
speech
was
linearly-amplified speech
with
filtered-clipped
equated
in
terms
of
testing
their
of
the
ears
17
of
cases,
intelligible
more
the
17
two
types
overall SPLs.
of
16
the
than
speech
However, these
results are open to the criticism that the linear
reference
condition may have been less than optimal.
Hildebrant (1982)
system
that
filtered
investigated
the
regions corresponding to
formants.
The
incoming
the
frequency
ranges
multiband
a
clipping
speech into frequency
of
the
first
three
bands were individually clipped,
-
13
-
filtered again and then summed together.
A reduced
dynamic
range was created for normal listeners by adding white noise
after processing and restricting the wideband signal
below an artificial "discomfort threshold".
with
clipped
speech
linearly-amplified
greatest
speech.
(1981),
Guidarelli
with
was
4
who
much
to
be
Intelligibility
better
than
with
A similar system was studied by
found
intelligibility
that
octave-wide
or
6
was
two-third-octave-wide
channels.
The general conclusion that
drawn
from
range of wideband speech while maintaining a fair degree
This
intelligibility.
"amplitude
discarded
information"
result,
fundamental
in
a
speech
waveform
understanding
this
lies
result
wave.
Although
it
is
the
However,
in
often very similar to the
Licklider,
can
be
not
at
key
to
all
obvious
signals
from
are
spectra of unclipped signals.
and
Bindra,
Pollack
(1948)
clearly
the importance of examining the spectral changes
caused by clipping in order
effect
of
to
understand
the
clipping on intelligibility.
-
14
-
small
the
zero-crossings
inspection of the wave, the spectra of clipped
emphasized
that
realization that
the
"amplitude information" is also coded in the
a
of
without a drastic degradation of intelligibility,
initially appears quite surprising.
of
past
that infinite peak clipping reduces the dynamic
is
studies
be
can
relatively
Figure 1.1,
SW --
4E
SEN ---------
rIME--m
CH
I..
NO CLIPPING
SH----- E
INFINITE
I.-
8 EN-----------;H
PEAK
CUPPING
-
15
-
Figure 1.1 A running spectrum of a two-word utterance,
before and after infinite peak clipping (taken from
Licklider et al., 1948).
taken from their paper,
of
amplitude
vs.
shows a running spectrum
time
and
utterance before and after
this
frequency)
infinite
(a 3-D plot
of
peak
a
two-word
clipping.
From
frequency analysis it is clear why clipping has little
effect on intelligibility -
the spectral patterns of clipped
and unclipped speech are not drastically different.
1.3 Rationale for Present Study
The rationale for
studying
the
effect
of
pre-
and
post-filters on the range of clipped speech and the spectral
distortions caused by clipping follows
model
an
from
underlying
and set of assumptions about the operation of the ear
and the
perception
of
speech.
First,
incoming
that
the
ear
analyzes
frequency bands
that
are
approximately
believed
octave
wide.
In the present study,
it
is
generally
speech
one-third
signals.
the question:
of
an
it is assumed that the
quantity relevant to dynamic range is the envelope of
bandpass
into
A major portion of this
these
thesis addresses
what are the effects of post-filtering on the
-
16
-
envelope distributions of infinitely-clipped speech waves?
A
to
second important assumption is that the primary
intelligibility lie in the
of speech.
In particular,
spectral
cues
(magnitude) patterns
it is assumed that
local
maxima
in the spectra (formants) are vital to intelligibility.
effects of clipping on these patterns
well
as
ways, such as pre-filtering,
-
17
-
distortions can be minimized.
will
be
studied
The
as
in which the spectral
Chapter
2
Methods
A
is
block diagram of the system used in the present study
shown
in
Figure
2.1.
test signals) were either
and
clipped,
or
these
distributions
of
the
were
pre-filter
The dynamic
assessed
signal
a
The spectra of the
'A' were compared.
signals
(speech and other
first filtered with
they were unmodified.
two signals at point
of
Input signals
by
envelopes
ranges
measuring
after
the
bandpass
filtering.
2.1 Inputs
Harvard sentences
were
chosen
as
input
(IEEE,
1969)
signals.
speech sample that had an amplitude
characteristic
by
It
talker
was desired to use a
distribution
that
was
of spoken English, but yet was of manageable
size for digital processing.
determined
spoken by a male
preliminary
The number
measurements
-
18
-
(Section 2.5).
of
sentences
described
was
below
Level Distribution
Post-filter
Level
Probability
Detector Analysis
I
(a)
(a)
(d,a)
--
..--
-
'
Single Channel
-0
(A)
(d,a)
Pre-filter
O----
--
i
Infinite
eak clipper
-
-
16 bands
Input
(d,a)
(d,a)
Post-filter
(Multifilter)
(a)
*
Level
Probability
Detectors Analysis
(a)
(a)
d = digital
a = analog
H.P. 3582A
0--Spectrum Analyzer
J
(a)
ILS Analysis
(d)
Figure 2.1
System block diagram
H.P. 7004B
X-Y Recorder
(a)
Multiple Channel
2.2 Filtering and Clipping
Figure
Many of the operations shown in
implemented
'd'
The letters
analog equipment.
'a'
and
in
be
or with
in real time),
(not
digitally
either
could
2.1
Figure
2.1
indicate digital and analog capability.
2.2.1 Pre-filters
1100-Hz
highpass
suggested by Thomas and Niederjohn (1970).
For ease
The analog pre-filter was a
filter
2-pole,
of reference, this filter is termed "optimal" because Thomas
and
Niederjohn found it to give the best intelligibility of
in
condition
the
which
of
Also for ease
a number of pre-filters.
is
signal
reference,
clipped
the
without
pre-filtering will be termed "allpass" pre-filtering.
software package,
by
written
design
the
infinite
Signal
highpass
filter,
and
used to
waveforms.
An
used for
a
digital
pre-filters
allpass
an
a
filter
was
Butterworth design was
Digital
filter,
pre-filtering),
Incorporated,
and
filters
response
the digital filters.
"optimal"
(Interactive Laboratory System)
Technology
digital
impulse
ILS
single-pole,
were
filter
6
(i.e.
dB/octave
no
8000-Hz
which will be termed "differentiator" for
-
20
-
A
obvious reasons.
Speech
into
formant
the
described.
three
was
regions
Hz to 2800 Hz,
Hz.
digitally
that
pre-filtered
Hildebrant
(1982)
bandpass
filter
The first formant filter was a
filter was 900
the second formant
between 200 Hz and 900 Hz,
6000
also
the
to
narrowband pre-filters with bandwidths
Finally,
less than 231 Hz and center frequency at 1000 Hz
for
Hz
and the third formant filter was 2800
were
used
purpose of exploring the effects of the pre-filter
bandwidth and slope.
2.2.2 Clippers
Analog clipping was performed by a.Schmitt trigger with
an
The
adjustable
dead
prevented
zone
to
adjusted
was
triggering
zone"
"dead
hysteresis
internal
by
a
(Hildebrant, 1982).
minimum
performed
waveform values
by
(except zero,
Digital
clipping
the sign of the digitized
only
keeping
which
noise so that no output
signal resulted from a null input signal.
was
value
which remained zero).
2.2.3 Post-filters
The clipped waveform (at point
post-filtered
"critical-band"
detection)
(before
filters
using
-
21
a
-
be
'A' in Figure 2.1)
General
into
Radio
could
adjacent
(1925)
or
multifilter,
frequencies that range
slopes
skirt
a
single
one-third
30
contains
multifilter
filters with center
Butterworth
(3-zero),
6-pole
octave,
The
band.
frequency
into
post-filtered
digitally
Hz.
from 25 Hz to 8000
The
filter
the passband are about 36 dB/octave and
near
flatten out to about 18 dB/octave due to the three zeros
in
the transfer function.
from 19 one-third octave bands between
Only the outputs
125
since the most important cues
and 8000 Hz are used,
Hz
the
from
outputs
frequencies,
proportionately
are
bands
critical
three
broader
lowest
one-third octave bands are summed to form channel
outputs
from third-octaves
summed
to
form
channel
centered at 250 Hz and
2.
frequencies
lower
at
of
these
1,
and the
315 Hz are
The remaining channels are the
outputs of single third-octave bands.
center
Because
range.
frequency
for intelligibility lie in this
The
bandwidths
of the 16 frequency channels
and
are listed
in Table I.
For greater flexibility
examining
in
skirts and bandwidths,
post-filter
the
effect
single band post-filters
Post-filter skirts were
were implemented digitally.
of
varied
by changing the number of poles in the filter transform from
from
231
Hz
The bandwidth of the
to
.29
The
Hz.
-
22
postfilter
was
varied
filtered bandpass output was
-
2 to 8 poles.
Table
I
RC-Filter
Time
-3 dB
Constant
Frequency
Channel
Bandwidth
Center
number
frequency
R
1
160 Hz
113 Hz
2
280
130
9390.0
17
7.65
3
400
93
5730.0
28
3.32
4
500
115
4456.0
36
3.19
5
630
146
3390.0
47
3.11
6
800
185
2657.9
60
3.08
7
1000
231
1989.4
80
2.89
8
1250
289
2021.0
79
3.66
9
1600
371
1559.7
102
3.63
10
2000
463
1368.7
117
3.95
11
2500
579
859.4
185
3.12
12
3150
729
859.4
185
3.94
13
4000
926
251.5
531
1.74
14
5000
1158
198.9
795
1.46
15
6300
1459
99.9
1592
0.92
16
8000
1852
39.9
3979
0.47
-
23
-
17825.0 usecs
9 Hz
12.50
input into a single analog detector.
2.3
Dynamic Range Analysis
Level Detection
2.3.1
The hardware used to detect the envelope of the
is
part
a
of
multi-channel
Automatic
Gain
Control
by
Coln
(1979).
compression system designed and developed
short-term rms calculation
Each level detector performs a
which the bandpass
signal
in
signal is first squared and then filtered
output
of
one
of these
detectors is an estimate of the logarithm
of
the
envelope
by
an
The
filter.
RC-lowpass
There are 16 level detectors,
squared.
16 bandpass outputs of the multifilter.
is
post-filter
digital
(after D/A conversion)
employed,
the
down
signal.
bandwidth
is
to
to
the
level
(channel 7).
In channels
filter capacitor.
time constants were adjusted
ripple
the output of the filter
filter time constant can be adjusted by
to
keep
1 through 12,
the
the
second-harmonic
6 percent at twice the center frequency of
However,
larger
in
the
than
two
one-third
-
24
-
varying the
single-band
When a
is presented directly
detector with an 80 Hz RC-filter
Each RC-lowpass
one for each of the
lowest
octave,
channels
the
therefore a
greater percentage of the signal envelope will be
resulting
in
through 16,
to
the
a
In channels 13
the RC-filter bandwidth is larger in
input
channels,
"smoother" signal envelope.
filtered,
signal
center
frequency
proportion
than in the other
so more second harmonic ripple will be included in
the estimate of the envelope.
The filter time-constants are
listed in Table I.
2.3.2 Probability Distribution Analysis
Samples
from
the
16
level
detector
outputs
were
digitized, and "bins" corresponding to the decibel level and
channel of each sample were incremented.
band
was
sampled at a rate of 400 samples/second,
1/2
dB
The
samples/second.
total sampling rate of 6400
was
The signal in each
giving a
binwidth
which was determined by
and the dynamic range,
the level detector, was 80 dB in each band.
of
Normalized histograms
signal
envelope
are
plotted
decibel
the
level
for each of the
channels.
The distributions are normalized to
histogram
value
distribution is
amplitude
in the 16 channels.
along
calculated
levels.,
The
10%
level
which only 10% of the speech samples
have
greater
amplitudes
-
25
-
samples
of
the
16 frequency
the
largest
The rms level of each
with
the
10%
and
90%
is the amplitude level
exceed;
than
the
90%
90%
of
the
point.
Henceforth in this thesis,
level
the term
"range"
applied
to
a
distribution will mean the difference between the 10%
and 90% levels.
Samples of silence or low-level noise may
distributions.
distributions
be
simply
In
cases
where
the
are clearly separated,
deleted.
corrupt
signal
the noise
and
the
noise
samples
can
The modified file will produce a more
accurate signal amplitude distribution.
Such
distributions
will be labeled "edited" in the following sections.
2.4 Spectral Analysis
Spectral analysis was performed using two methods.
H.P.
spectrum
3528A
analyzer
graphs
were plotted with the H.P.
and averaged the
computed
spectral magnitudes of a number of time
samples.
spectrum
of
Spectral
7004B X-Y recorder.
ILS software package was also used to compute and
magnitude
The
a single time sample.
plot
The
the
Both methods
used the Fast Fourier Transform (FFT) algorithm.
Spectrum Analyzer
The H.P.
segment
of
3582A spectrum analyzer transforms
discrete
spectrum using an
FFT
time
data
into a discrete
implementation.
-
26
-
2.4.1 H.P.
In
a
finite
frequency
single-channel
mode,
1024
samples
of the input signal are transformed by
the FFT into 256 complex values.
frame
The duration,
T,
of a time
is determined by the selected frequency spans and the
number of samples,
T = 1024/(4*frequency span) = 256/frequency span.
For example,
51.2
to compute the spectrum over a
time segment is sampled.
msec
points are spaced
at
equal
5000 Hz span,
a
The discrete frequency
intervals,
vF=l/T.
In
this
example frequency resolution is 19.5 Hz.
analyzer also allows for up to 256 spectra to
The H.P.
be
to give a better
averaged
estimate of the spectrum of a
Time samples were weighted
random signal.
with
a
Hanning
window in order to minimize both the amplitude and frequency
uncertainty in the spectrum.
2.4.2 ILS Spectral Analysis
The spectrum of a finite digital wave can
and
be
computed
graphed with an ILS routine which computes a 1024-point
FFT.
window
The
1024-point time sample was weighted with a Hamming
in
distortion.
order
to
the frequency and amplitude
minimize
Since only 1024 points are
repeat
waves
which
points
(such as
periodically
static vowels)
-
27
static
time
in less than 1024 sample
are appropriate
this analysis.
-
used,
signals
for
2.5 Control Measurements and System Evaluation
2.5.1
Evaluation
of
the
Level
Distribution
Measurement
System
In
order
detector
system,
were measured.
were
to
input
assess
the
performance
was
known to
Analog narrowband
level
random
as
follow
noise
and
tones
into the level detector in channel 7,
directly
used
narrowband
the
ranges of signals with known distributions
which has an RC-filter cutoff of 80 Hz.
noise
of
a
the
Narrowband
random
test signal because its envelope is
Rayleigh
probability
noise was centered at 1000 Hz,
density.
The
and the bandwidth
of the noise ranged from 3 Hz to 231 Hz.
The measured range for a
2
dB
smaller
than
the
10-Hz band of noise was 11 db,
expected
Rayleigh-distributed variable.
10-Hz
tone
was 13 dB,
The
range
of 13.4 dB for a
measured
range
for
a
3 dB smaller than the expected range
of 16 dB for a tone.
The
ranges
the
discrepancy
of
the
influence
probability
between
the
measured
and
expected
envelope distributions can be attributed to
of
the
densities
RC-lowpass
filter.
Calculated
for narrowband noise and tones after
-
28
-
squaring and lowpass filtering are given in Appendix A.
The
calculated
and measured ranges
graphed as a function
bandwidth
(or
R,
frequency)
Figures 2.2 and 2.3,
figures
of
that
the
frequency
the
to
ratio
the
measured
increases,
envelope
range
(dashed
envelope
range
increases.
of
input
RC-filter
respectively.
ranges to within about 1 dB.
tone
for noisebands and tones are
signal
bandwidth in
It is evident from these
ranges
follow
the calculated
As the noise bandwidth or
the
the difference between the input
line)
and
If
the
lowpass-filtered
the RC-filter bandwidth is
much larger than the noise bandwidth
(or
tone
frequency),
and if the second-harmonic ripple can be neglected,
then the
effect of lowpass filtering will be small.
Residual deviations between the calculated and measured
ranges
may be attributable to sampling errors or to circuit
imperfections.
being
A possible reason
slightly
for
the
measured
range
greater than the calculated range for noise
with bandwidths larger than approximately 80 Hz is that
analysis
did
ripple.
As
not
the
include
noise
the
the
effects of second harmonic
bandwidth
increases,
the
signal
envelope will include more second-harmonic ripple which will
increase the measured level range.
It is expected that the decibel underestimation of
input
envelope
range will be the same for
signals,
such as
29
-
speech, with level ranges that are much wider than those
-
the
of
j.
s-
= calculated range
o=
measu red
range
14-
+
t2
0
1
0
~
0
+- 0
+
C0
+
6
0
4-
4-
2-
313 .0625
=
0.2
0.5
.0
2.0
Input Noise Bandwidth/Post-detector
4.0
6.0
Bandwidth
-
30
-
Figure 2.2
Measured and calculated ranges for narrowband noise
(centered at 1000 Hz) after squaring and lowpass filtering
with an RC-lowpass filter.
Range is graphed as a function
of the input noise bandwidth divided by the RC-filter
Dashed line represents the expected range for
bandwidth.
a Rayleigh distributed variable.
20
calculated range
measured
range
---------------------------------------------------------
0
+
0
14
0
12
-
#-
0
a,
(Y_c
8
0
+
4
A-
0
20
0.0
-/,
.0313 .062S
0.12S
0.2S
0.S
L.
R = Tone Frequency/Post-detector
2.0
4.0
--
8.0
Bandwidth
-
31
-
Figure 2.3 Measured and calculated ranges for tones after squaring
Range is
and lowpass filtering with an RC-lowpass filter.
graphed as a function of the tone frequency divided by the
RC-filter bandwidth.
noise
or
tones.
result from a
process,
If we assume that such wide distributions
large
but
slow
modulation
of
a
then the resulting level distribution should be the
convolution of the distributions of the modulation
its variation will be
for example,
unaffected
by
the
lowpass
filter.
the underestimation
about 5 dB if the noise is slowly modulated
be
still
the
if there is a 5-dB underestimation of the
input 13.4 dB range of narrowband noise,
will
and
If the modulation is sufficiently slow,
envelope.
bandpass
Thus,
bandpass
over a range of,
for instance,
40 dB.
2.5.2 Determination of Sample Size for Sentences
The range of a bandpass speech signal estimated with an
of n sentences should converge to a constant value as
input
n is
The sample size used in the main study
increased.
chosen
by
preliminary
measurements in which the number of
increased until there was only a
sentences in the sample was
small
change
in
the
was
for an increase in the sample
range
size.
The level ranges
after
single
post-filtering
filter centered at 1000 Hz.
-
32
sentences
were
measured
by a one-third octave (231
The ranges
-
Hz)
digitally
of
were
found
to
vary
from
25
dB
(Figure 2.4).
to
46 dB among a
Since this range,
varied 21 dB,
sample of 10 sentences
for
10
single
a larger sample of sentences was considered.
The sample size was increased to four, ten,
sentences
and
the
range
was measured
20-sentence
36 dB and 38 dB,
one-half octave wide di'stributions above 500 Hz,
(1981)
(1980) and DeGennaro et al.
Krieg
and
would
sentences
were close to the results obtained by Dunn and White
for
(The
sample contained those same 10 sentences.)
since the measured ranges,
appropriate
twenty
sentences,
It appeared that a sample size of 10 or 20
be
or
(Figure 2.4).
10-sentence sample contained the original 4
the
sentences,
(1939)
and by
for a third-octave
distribution.
frequency
The measured ranges in 13
input
(Figure
In
2.5).
sentences
the
of
sample
were
10
20
and
order
for
an
sentences were compared
examine
to
channels
all
channels,
the
bandpass filtered with the analog one-third
and
16
were omitted because the input was filtered at 4.5 kHz).
In
octave multifilter.
dB.
the
Since
were chosen
are
15,
frequency channels the ranges differed by at
difference was small,
for the sample data.
seconds long,
The
most
3
10 Harvard sentences
10
sentences
chosen
The sentences are aproximately 3
listed in Appendix B.
so 30 seconds of data at
-
33
-
all 13
(The ranges in channels 14,
400
samples/second,
s0 -
1
1
1
1
1
1
1
1
I
I
10
I
12
14
I
1G
46 H-
40 I-
Cn
301-
20
0
I
2
i
I
4
G
9
I
19
I
I
20
22
Number oF Digital Harvard Sentences
-
34
-
Figure 2.4 Ranges in the one-third octave band centered at
1000 Hz as a function of sentence sample size.
or 12000 samples of data are included in each histogram from
-
35
-
which range is estimated.
j. = 10
o
=
sentences_
20 sentences
4S40 -
0a
<U
20-
10-
0-
.16
.28
.4
.83
.
.8
1.0 1.2S 1.9
Band Center Frequency
2.0 2.9 3.19 4.0
(kHz)
-
36
-
Figure 2.5
Ranges across 13 frequency channels comparing
inputs of 10 and 20 digital Harvard sentences.
Chapter 3
Results
3.1 Comparison of Distributions in 16 Frequency Channels
The histograms of one-third octave bands of clipped
First,
experiments.
(pre-filter
by
the
and post-filtered
condition
reference
The
multifilter.
clipped
and post-filtered by the
highpass filter (analog), clipped,
1100-Hz
were
sentences were pre-filtered with an
the
second,
multifilter;
following
the
sentences
the 10 digital
an allpass filter)
was
in
investigated
were
speech
unmodified
and
speech
of
filtered only by the multifilter was also investigated.
The amplitude ranges,
Figure
shown in
dependence on band center frequency.
may be attributed
quantity
the
R,
to
Figure
(see
From the analysis presented in
envelope
in
Section
expect a smaller range in the
we
2.3),
because R
is
larger
I).
there should be relatively less smoothing of
the
higher
channel
frequency
13
(see
at
4000
larger
in
proportion
bandwidth of the RC-filter is
-
channels
(centered
-
37
-
Conversely,
the
Table
the
in
of
bands
across
channels centered at 160 Hz and 280 Hz,
than
a
ratio of input bandwidth to the detector's
RC-filter bandwidth.
2.5.1
exhibit
Part of this dependence
variation
the
3.1,
Hz)
since the
to
the
I
I
[= F2
= F1/C/F2
0= C/F2
0
45-
0
0
03
35-
ea1
001
:7)
03
C
30-
03
0
0
25-
-
41-
0
0
20is
of10
0
-1-
-
1-0
-1-
0
0
0
I-
0
0
-
0
0
0
.1
I I
.28
I I
.4
I
.9
I
.63
I
.8
I
I
1.0 1.26 1.6
0
I
I~ I
2.0 2.9 3.19 4.0
Band Center Frequency (kHz)
Figure 3.1 Ranges across 13 frequency bands for clipping and.
Input is 10 digital Harvard sentences.
nonclipping conditions.
F1 = 1100 Hz highpass filter (analog)
C = Infinite peak clipper (analog)
-
38
-
F2 = Multifilter
signal
bandwidth
(R is smaller)
than in the lower-frequency
There will also be more second-harmonic ripple
channels.
channel 13 which will tend to increase the range.
are
results for only the first 13 channels
in
(Note that
because
plotted
the digital signals were lowpass filtered at 4.5 kHz,
and the
three highest channels are centered at frequencies above that
cutoff).
The
amplitude
multifilter
varies
range
at
for
15
most
channels and averages 32 dB.
well
with
system,
speech
dB
The
as well as with Dunn and
ranges
White
The present measurements
than
reported
used the same system as
envelopes.
It
should
by
reasonably
agree
(1940)
are
who. used
a
here
for
smaller
slightly
DeGennaro et al.
used
frequency
(peak pressures in 1/8 second
intervals).
those
all
across
the
(1980), who used the present
the results of Krieg
different measurement procedure
by
filtered
(1981), who also
measuring
bandpass
be noted that while DeGennaro et al.
(1981) display a 50-dB range for one talker and a single band
(Figure
and 7)
4
in
their paper),
their subsequent
for at least two talkers and all
sixteen
39
-
nearly all ranges to be between 30 and 40 dB.
-
figures
bands
(5,
6,
show
Clipping reduces the range of speech by about 15
dB
20
The range of bandpass speech
across all frequency bands.
that was pre-filtered with an
to
1100-Hz
highpass
filter
and
clipped varies 12 dB across frequency bands and averages 15.8
dB.
by
The range of clipped speech with no pre-filtering varies
11
This difference
dB and averages 9.5 dB across bands.
will
be
channel
is
between highpass pre-filtering and no pre-filtering
discussed further in Section 3.3.1.
In subsequent experiments
the
analyzed,
band
in which a single
approximately
was chosen as representative because it is
the
middle
the speech frequency range,
of
This band
centered at 1000 Hz is used.
band
the measured ranges in this
are
in
and also because
near
averages across bands for the three conditions
the
respective
in Figure 3.1.
3.2 Comparison of Digital and Analog Implementations
3.2.1 Digital vs.
Analog Sentences
In order to identify any differences
due
to
digital
or
analog sentence inputs,
with tape-recorded analog
were compared.
sentences
and
histograms made
digital
sentences
Thirty tape-recorded analog Harvard sentences
and 10 digital Harvard sentences
(after
D/A
conversion
and
filtering at 4.5 kHz) were processed identically with
-
40
-
lowpass
in the measurements
the analog equipment.
Ranges
were
compared
filtered speech and clipped/filtered speech.
for
bandpass
In the clipping
condition,
these sentences were pre-filtered with the 1100-Hz
highpass
filter,
infinitely
filtered into the sixteen channels by the
analog
sentence
histograms
were
No
for
at
discrepancies
ranges
the
between
difference
was
editing
The ranges
3.2.
are shown in Figure
Except
The
"edited" (as described in
the digital sentence histograms.
for
bandpass
multifilter.
Section 2.3.2) to eliminate obvious noise.
neccessary
and
peak-clipped,
800
for
sentences in the nonclipped condition
and
2500
and
digital
is
less
the
Hz,
than
analog
5
dB.
(Note that the three bands above 4.5 kHz are included for the
analog sentences.) With the exception of channel 1,
was
range
measured
a
larger
with analog sentences than with digital
sentences.
In the clipping condition,
the
difference
between
the
3 dB.
The
and analog sentences is
2 to
smaller discrepancy here may be attributed to
the
for
digital
which
"dead-zone"
was
produced by tape hiss
distributions
to 40 dB),
it
distributions
clipper's
adjusted so that there was no output
(when the speech was off).
Because the
of the unclipped analog sentences are wide
is
more
for
likely
the
speech
and
(30
noise
to overlap in this condition than with clipp.ed
-
41
-
ranges
SI
I
I
3
s0
I
I
I
I
Analog Sentences (edited)
-
I
I
I
I
I
= F2
4-
= F2
= FI/C/F2
x
= FI/C/F2
0
45
3
0
0
0
0
+
33
3-
c
I
OLgftel Sentences
4++
-
30 -
+
0
+
2S
0
0
20
xx
15
0
06o
10
I
-
I
.16 .29 .4
I
I
I
I
I
I II|
|
|
.S .G3 .8 1.0 1.2S LB 2.0 2.G3.194.0 S.0 6.3 9.0
-
0
Band Center Frequency (kHz)
Figure 3.2 Ranges across 13 or 16 frequency bands comparing
digital and analog Harvard sentences as inputs.
Fl = 1100 Hz highpass filter (analog)
C = Infinite peak clipper (analog)
-
42
-
F2 = Multifilter
speech.
Under clipping,
distributions
is
the
reduced
mixing
because
of
noise
the
and
range
speech
of speech is
reduced.
It
appears
from
these
are
superior
to
sentences
signal-to-noise ratio.
measurements
analog
that
digital
in
terms of
sentences
However, for the clipping
condition,
analog and digital sentences produce similar results.
3.2.2 Digital vs.
Analog Processing
In order to investigate
pre-
different
designed.
and
and
The 10 digital sentences
were compared.
were used as test inputs.
highpass filter,
processing.
digital
of
digital
post-filters,
Histograms after
Counterpart
effects
the
of
were
processing
listed in Appendix B
1100
Hz
and clipper were used for analog processing.
digital
were
filters
designed
for
digital
The comparison was made on the one-third octave
different pre-filters
(an allpass
and a one-third octave
Hz)
filters
analog
(channel 7).
Digital and analog processing were compared
1000
number
multifilter,
The analog
distribution centered at 1000 Hz
filter,
a
before
clipping
well
43
-
condition of no clipping.
as
three
an 1100-Hz highpass
filter
(231 Hz)
as
-
filter,
with
for
centered
at
the reference
Table II
A comparison of
ranges
center
frequency=1000
in
channel
(bandwidth=231 Hz,
7
Hz)
after
digital and analog
processing of 10 digital sentences.
F2 = 1/3 octave (231-Hz bandwidth)
Fl = 1100-Hz highpass pre-filter
C = Infinite peak clipper
Digitally
processed
filter centered at 1000 Hz
Fl/C/F2
F2/C/F2
F2
C/F2
36
11
13
6
34
14
18
7
processed
-
44
-
Analog
Table II
and
shows that the
differ
processing by at most 5 dB.
analog
speech
is that pre-filtering the
with
between
digital
An interesting fact
one-third
a
octave
This will be
from 36 to 6 dB.
range
the
decreases
filter
ranges
discussed in Section 3.3.1.
3.3 Compression Results
Pre-filtering
3.3.1
the
various
of
effect
The pre-filters used here
various
studies
used
by
Niederjohn
filter)
Licklider
and
The third is an
(1968).
(1948)
1100-Hz
and
and Niederjohn (1968).
as
and
dB/octave)
referred to as the
(subsequently
"optimal"
A
fourth
is the set of pre-filters described by Hildebrant
The
first-formant
Butterworth
filter
filter
bandpass
-
45
-
4-pole
(1948),
Thomas
(12
(1982) that was designed to separate the formant
speech.
in
The first is an allpass
as used by Licklider
Pollack
suggested by Thomas
pre-filter
employed
(subsequently referred to as a differentiator)
filter
highpass
those
The second is an RC highpass filter with cutoff
for example.
at 8000 Hz
were
of clipped speech.
no pre-filtering)
(i.e.
determine
pre-filters on the range of clipped
speech.
filter
to
conducted
The following experiments were
is
a
regions
of
200 Hz to 900 Hz,
(spanning
channels
1
through 6),
the second-formant
10-pole Butterworth
through
11),
and
filter is a 900 Hz to 2800 Hz,
bandpass
filter
(spanning
channels
7
the third-formant filter covers the range
from 2800 Hz to 6000 Hz
(channels 12 and 13)
is
and
also
a
10-pole Butterworth filter.
The ranges in the thirteen channels
under
from 9
no pre-filtering has the lowest overall range,
20
averaging
and
dB
Differentiating the sentences before clipping
range
relative
19.8
with
Pre-filtering
dB.
increase the range quite as
range
to
bands.
increases
the
to no pre-filtering by about 6 dB across all
these ranges vary from 13 dB to
bands;
dB
frequency
across
dB
13.7
various
Clipped speech with
are shown in Figure 3.3.
configurations
the
dB
as
and
differentiating;
is 9 dB to 22 dB across all bands and averages
In the low-frequency
bands,
speech
average
filter does not
optimal
an
much
27
pre-filtered
the
15.5 dB.
with
the
first formant filter has the smallest range of the four types
of pre-filtering
The
second
similar to
average
and
the
ranges
(averaging 12.8 dB in the
third
formant
filters
and
differentiator
first 6 channels).
produce
optimal
an effect
filters.
The
in these two formant filter regions are 16.6
dB in channels 7 through 11 and 14.5 dB in
-
46
-
13.
channels
12
and
Fl/C/F
= F2/C/F
= F3/C/F
x
= C/F
= Fopt/C/F
= FdtF/C/F
35-
30+
7
~2520 CL/
\
x
+-,
XV
x
Zk\4
x"
X
/
10 0
7-
S_
-
.16 .28 .4
.5 .S3 .8 10 1.25 1.6 ZO 2.5 3.15 4.0
Band Center Frequency (kHz)
Figure 3.3
Effect of different pre-filters on ranges in 13
frequency bands.
-
47
-
F1 = 200-900 Hz bandpass digital filter
F2 = 900-2800 Hz bandpass digital filter
F3 = 2800-6000 Hz bandpass digital filter
Fopt = 1100 Hz highpass digital filter
Fdif = 8000 Hz highpass digital filter
C = Infinite
peak clipper
F = Multifilter
The
experiments.
The
in
investigated
were
sentences
and
bandwidth
pre-filter
varying
were
slope
rejection
the
following
pre-filtered,
digitally
and digitally post-filtered with a one-third octave
clipped,
filter centered at
bandpass
filter-skirt
Hz.
of
effects
slope
1000
which
Hz,
same
the
had
the pre-filter also centered at 1000
as
The pre-filter bandwidth varied from 231 Hz
to 29 Hz and
the slope of the pre-filter and post-filter skirt were varied
together by changing the Butterworth filter transfer function
2-pole
from
channel
to
signal level was detected in
The
8-pole.
the
after bypassing
(RC-filter bandwidth is 80 Hz)
7
multifilter.
comparing the ranges of speech
Plots
the
narrow-band
allpass,
optimal,
3.4.
filter,
the range
speech
to
and differentiator
Except
Figure
into
filters
pre-filtered
with the
shown
are
filters
with
in
case of a 2-pole Butterworth
the
for
pre-filtered
for clipped speech
that
was
pre-filtered
a one-third octave (231 Hz) band is 5-8 dB smaller than
speech that
optimal
was
pre-filtered
filters.
pre-filter decreases,
pre-filter
transform
with
the
differentiator
as
the
bandwidth
Generally,
the range increases.
has
more
than
2
-
48
-
pre-filter slope does not change the ranges
Given
poles,
or
of
the
that
the
varying the
appreciably.
2-pole Butterworth Filter
4-pole Butterworth Filter
20
-
-
20
-
18
16-
t4
-
12
to
0,
0,
1-4
M
X
11
=1
-
11S
-
2q G
Bandwidth
- 1_7
231 -
-
-
O
-
-p
2-
-2q
All-
-
-
6-pole Butterworth Filter
I
I
I
I
I
S-pole
All
- OIFP - ;oat -
-
P'eFlIter
Butterworth Filter
I
-
20
I
1-73 - 231 -
Bandwidth (Hz)
P-e-FE~ter
(Hz)
S2 - 1S
-
4
2
14
161-
*
-
18
14
12
14
14
-4
14
14
14
1-
4-
2
2
* 2q S
-
IlS
-
173 - 2.31
Bandwidth (Hz)
-
-
AL-
2q
8
is
Bandwidth
P
t73
2.1
-
- OFP
A
(Hz)
Figure 3.4
Effect of different narrowband pre-filters on
ranges.
Also plotted are the ranges with the differentiator
(Diff), the 1100-Hz highpass filter (Fopt), and the allpass
filter (All). Post-filter is 1/3-octave centered at 1000 Hz.
Pre-filter and post-filter slopes
are the
same.
For the
-
49
-
cases of 'Diff', 'Fopt', and 'All' these slopes apply only
to the post-filter.
The pre-filters with a
produce
2-pole
transfer
effect on the low frequency part of the spectrum
an
similar to the differentiator.
Since the speech spectrum
dominated by low-frequency components,
(upper left pannel of Figure 3.4).
the
general
trends
with a 29 Hz,
Clipping
speech.
appear
are similar
from
for speech that is pre-filtered
clearly
compresses
the
16
dB
more
in
than
the
20
amplitude
1000-Hz
dB
(Figure 3.3);
that is,
range
of
clipping produces output
frequency
(Figure
3.4).
frequency bands the range is slightly larger,
dB
is
6-pole Butterworth filter.
ranges of 8 to
of
the ranges
Other discrepancies
Digitally pre-filtering and
reduction
will
function
band,
a
Across all
from 10
to
23
clipping decreases the range by 15
to 20 dB across all frequency bands.
3.3.2 Post-filtering
The effect of
by
post-filter.
bandwidth
skirts
were
the
the
bandwidths
clipped
and
speech
slopes
Post-filters were centered at 1000 Hz
was
1/24 octave
varying
decreased
(29 Hz).
one-third octave
from
varying
by
to
2-pole
-
50
the
was
of the
and
(231 Hz)
The slopes of the post-filter
increased
transfer function
from
-
determined
post-filtering
the
to
bandpass
Butterworth filter
8-pole.
Variations
in
were
post-filtering
examined
pre-filtered with either
or
filter,
the
differentiator.
that
filter,
allpass
the
speech
with
was
optimal
the
With the differentiator as
pre-filter,
only the effect of varying postfilter slopes
examined;
the
was
post-filter bandwidth was fixed at one-third
octave.
3.5
Figures
in
variations
examined,
show
3.6
and
all
for
that
pre-filters
Figure
slopes have relatively small effects on compression.
3.5
presents
filter)
allpass
slopes.
skirt
and
bandwidth
3.6,
as a function of
pre-filter
post-filter
(i.e.
bandwidth
an
and
range of the clipped speech varies from
The
with the exception of the measurement with the 29 Hz
9-15 dB,
optimal
no
with
results
range
skirt
and
bandwidth
post-filter
8-pole
Butterworth
pre-filtering,
and differentiator
with
Results
filter.
shown
in
Figure
exhibit a similar spread of ranges.
3.4 Spectral Modifications
Speech
determined
sounds can
by
distinction is
clipped
speech
the
classified
be
or
presence
into
absence
important in the analysis of
sounds.
The
spectra
of
two
categories
of voicing.
the
spectra
-
51
of
the two types of
speech sounds are analyzed in the following sections.
-
This
2-pole Butterworth post-filter
center frequency=OOOHz
I
is-
I
4-pole Butterworth post-filter
center frequency=OOOHz
I
I
I
18-
1.-
I
I
29
sa
I
I
I
I
T
I
-
16Is14
12
-
*
*
-
12
*
*
-
C
a-
4-
4-
2-
z
2q
sa
ils
174
Post-ftIter Bandwidth
232
a
(Hz)
Lis
Post-Fi[ter
174
232
Bandwidth
(Hz)
3-pole Butterworth post-filter
center frequency=OOHz
6-pole Butterworth post-filter
center frequency-COOHz
te - I
1.8-
1.4
1.4H
I
I
I
I
I
I
5
I
*
*
Is-
I
-1
2
2VY
-I
2
_2
29
I
1.1U5
Post-Filter
Figure 3.5
slopes.
174
Bandwidth
2q
232
G2
IUs
Post-rtiter
(Hz)
174
Bandwidth
232
(Hz)
Effect of varying the post-filter bandwidth and
Pre-filter is an allpass filter.
-
52
-
I
S
2-pole Butterworth post-filter
I
I8-
4-pole Butterworth post-filter
center frequency=l00oHz
I
I
I
1
1
:8
center frequency=100OHz
I
1.
I1
II
6
ts
Ist4
-
'
12
-
0
0
*
14
:2
*
t2:4
8-
-
12
0
6-
II
3
2q
sa
-
I
lls
I
I
I
I
174
L
a
232
2q
Post-filter Bandwidth (Hz)
Post-Flter
6-cole Butterworth post-filter
center frequency=1000Hz
I
I
I
I I
i
1*
I]
i
Bandwidth
(Hz)
3-pole Butterworth post-filter
center frequency=lOOHz
I
t I
I
IS
-
771
:15L74 232
a
0
*
0
'4
I
-
*
-
10
14
A
H
*
12
Z
80
0~
caH
-4
2t
2
Us
Post-Filter
232
:74
Bandwidth
0
(Hz)
20
Sa
,-Is
Post-4'iter
232
222
17.
Bandwidth
(Hz)
-
53
-
Effect of varying the post-filter bandwidth and
Figure 3.6
Pre-filter is an optimal filter (*) or
slopes.
differentiator (o).
3.4.1 Spectra of Clipped Unvoiced Speech
It has been shown
gaussian
noise
as
(Papoulis, 1965;
input,
the
after infinite clipping can
power
spectrum
Gx(w)
by
by
determined
the
= 2/TY
1966)
output power
"arcsin
respective autocorrelation functions,
Ry(t)/Ry(O)
Fawe,
spectrum Gy(w)
from
noises
Ry(t)
/f,sh,s/
and
the
synthesized
burst
input
and Rx(t).
Thus,
arcsinLRx(t)/Rx(O)]
before and after clipping.
noises consisted of
the
law" relating their
This relation was verified by measuring the
various
that for
tokens
of
spectra
of
The first set of
the
portions of /p,t,k/.
consonants
Clipping was
done by the analog clipper and spectra were analyzed with the
spectrum
consonant
function
analyzer.
was
also
The
power
calculated
spectrum
from
from which it was generated,
the
of each synthetic
filter
transfer
and the arcsin law was
applied to these spectra using the FFT.
Figures
and
the
3.7 through 3.12 show both the measured
spectra calculated from the arcsin law.
noise bursts were generated with a sampling rate of
only
a
5 kHz range is shown).
-
54
(Since the
10
kHz,
From the figures it is clear
that the clipped spectra follow the arcsin law.
-
spectra
Spectrum of /F/
0
-
-20
clipped
a
~i%
(n
C
q,40
0
L
4-
4)
a-60
-70
-80
100
0
I
I
I
2000
3000
4000
5000
Frequency (Hz)
1%
-10
-20
-30
clippU,
-40
-50
CL.
-60
-70
-80
0
1000
3000
2000
FREQUENCY
Figure 3.7
4000
5000
(HZ)
Spectra of the synthetic consonant noise burst,
/f/,
- 55
-
In the top plot, the
measured before and after clipping.
solid line is the ideal input spectrum and the dashed line
is the spectrum calculated from the arcsin law.
In the bottom
plot, the lower trace is the measured input spectrum and the
upper trace is the measured clipped spectrum.
Spectrum of /SH/
-20
-
0
-3
-'
4
'-,.
-clipped
a
C
C4
~i
+A
-so
0
lows
4000
Frequency (Hz)
0
I
I
-10
-20
-30
U -50
clipped
-
-
-60
-70
-80
0
II
1000
I -I
3000
2000
FREQUENCY (HZ)
4000
5000
- 56
-
Figure 3.8
Spectra of the synthetic consonant noise burst, /sh/.
Plots are the same as in Figure 3.7.
_T
-10
clipped
C
C140
C-
V)
M
---
4
e
400
Frequency (Hz)
0
-10
-20
2 -30
-
Z-4o
clipped
S-50
-60
-70
-80
1000
2000
3000
FREQUENCY
41000
5000
(Hz)
Figure 3.9 Spectra of the synthetic consonant noise burst, /s/.
Plots are the same as in Figure 3.7.
-
57
-
12 -
Spectrum of /S/
. ..............
Spectrum
of /P/
-T
0
-101
clipped
-
a
4
- --------
J
in
0
U3
-.60
0
3000
000
Frequency (Hz)
100
40M0
5000
0
-10
-20
-30
-
clipped
~-40
1
CL
-60
-j
-
-70
-80
0
1000
2000
3000
FREQUENCY
(Hz)
400
Figure 3.10 Spectra of the synthetic consonant noise burst,
Plots are the same as in Figure 3 .7.
-
58
p/.
Spectrum of /T/
-101--
-clipped
1-o
C
II
I
I
e
1000
_
0
a(I)
-60I-
-80
20Me
5000
300
Frequency (Hz)
0
-10
-20
'0-30
clipped
I-40
-j
-50
CA)
-60
-70
I
-80-.
0)
1000
I
I
2000
3000
I
400r)
5000
FREQUENCY (4Z)
-
59
-
Figure 3.11 Spectra of the synthetic consonant noise burst, /t/.
Plots are the same as in Figure 3.7.
Spectrum of /K/
0
10
I
--
I
Ca 3-1
- - -
-
'
clipped
-------7-----
aC
use
cn
-/0
-
-60
-88
I
vi
.300
200
LL.
4000
5000
Frequency (Hz)
Z0
210
41)
I-
clipped
SO
U~)
-70
-80
1000
I
2000
I
3000
FREQUENCY (HZ)
4000
500
- 60
-
Figure 3.12 Spectra of the synthetic consonant noise burst, /k/.
Plots are the same as in Figure 3.7.
In order to look at
spectrum
after
relative
changes
in
the
clipped
varying spectral frequencies and amplitudes,
random noise was filtered with the
1925
General
Radio
1/3
octave multifilter.
The output was clipped and analyzed with
the H.P.
Power spectra were
analyzer.
also
calculated
by
varying the parameters of the filter transfer function of the
synthesized
(Thus,
consonants
and
the
arcsin
law
was
applied.
the calculation was performed using a filter transfer
one
This
empirically.
used
be seen between the unclipped calculated and
can
difference
the
from
function different
measured spectra in the figures.) Spectra with a single
at
315
Hz
and
peak
1250 Hz are shown in Figures 3.13 and 3.14.
Spectra with two peaks are
shown
Figures
in
3.15
through
3.19.
Several observations can
modifications
fill
be
peaks.
3.13 and 3.14
the
spectral
much as
an increased noise
Second, as expected from the Fourier series for
a square wave, harmonics
spectral
about
First, clipping tends to
by clipping.
caused
in "valleys" in the spectrum,
floor would.
made
This
which
appear
at
odd
multiples
of
the
especially noticeable in Figures
is
a
display
single
peak.
Third,
the
-
61
-
amplitude relation between two spectral peaks after clipping,
I
I
I I
'
a
-to
a
-50
a-3
o
---
-7
as
E
I
I
IT
1~
LE0
to"M
Frequency (Hz)
I
I
2QSS
25Q
0
-10 r-
-20
-30
Z
4
-
d
9-50
ULU
156,
-60
-70
-80 0
Figure
3.13
500
100
FREQueNCY
([A500
2000
2500
Spectra of filtered narrowband noise centered at
- 62
-
The top plot is
315 Hz measured before and after clipping.
calculated from the arcsin law using a different transfer
function from that measured for the input noise. The bottom
plot is measured with the spectrum analyzer.
The format of
this plot is the same as in Figures 3.7-3.12.
_
a
I
I
I
I
I
t1--
-
CO
-
clipped
I
I
I
7095
aBG
0
,CU -50
0
Q-
If)
-0SI
-7,
-00
I
W
to"
2U06
I
4WQ
3UU
I
I
S"
Sees
Iees
I
teee
Frequency (Hz)
II
i
-10
-20
-30
-I clipped -S-50
.................-70
-80
IW
200
=7 =m
10000
-
63
-
As in Figure 3.13 but with input noise centered
Figure 3.14
at 1250 Hz.
_
1
1
1
1
a -1
-to
-2S--
C
0-4.
0
--
-
-
clipped
aO
-8sf--
-0S
L.
a
200
_258
Frequency (Hz)
0
-10
-20
-30
0--ippe
-50
-60
-70 1
-80
500
FREQUENCY
(RO
2OW
2UO
Figure 3.15
As in Figure 3.13 but with input noise peaks
centered at 250 Hz and 1250 Hz.
-
64
-10-
-20-
30
-
clipped
a.
(n
70-
-
-M
-80
0
low
2000
3000
4000
5000
Frequency (Hz)
-20
-30
-clipped
-60
-70
800
FREQUEICY
(HZ
-
65
-
Figure 3.16
As in Figure 3.13 but with input noise peaks
centered at 250 Hz and 1250 Hz.
I
-1
-Wo-
-201
C
q40
-
.-
H-
clipped
-
-------------
-80
I.
I
0
I000
I
300
200
I
4000
5000
Frequency (Hz)
I
I
OT
I
-10 1-20
-30
-
clipped
1-50
-I
-70
-80
U
LWJ
LIAAJ
FREQiENCy
30 C
(Hz)
50Y
3.17 As in Figure 3.13 but with input noise peaks
centered at 315 Hz and 1250 Hz.
-
66
-
Figure
3
i
I
-
-20-
30
c
---
,-
-clipped
-
-
140 L
a.
I)
-60
-70-
..
-.
0
1000
2008
3000
4000
Sam
Frequency (Hz)
0
-10
-20
-30
t-50
-70
-80o
mu
3m)
32(W)
FRE'UENCY (Hz)
-
67
-
Figure 3.18
As in Figure 3.13 but with input noise peaks
centered at 315 Hz and 1250 Hz.
I-
I
i
-10
-20
'30
- ~clipped
---
C
q401
a.
U)
C-
-60
tese
4090
-I-
iT
6000
Frequency (Hz)
0
-20
-
-
-10
-30
-
clipped
9-50
-70
-80 1
0
1 AJ
2W
FRElUNY (OZ
limo
-
68
-
Figure 3.19
As in Figure 3.13 but with input noise peaks
centered at 315 Hz and 1250 Hz.
when neither is obscurred by distortion components,
is nearly
the
Low-level
same as the input (Figures 3.16,
peaks can be
peaks,
obscurred
especially
by
the
3.18,
3.19).
harmonics
of
higher-level
if the frequency of the low-level peak is
greater than the frequency of the higher-level peak
3.15
vs.
3.16 and 3.17 vs.
also be enhanced
by
the
3.19).
harmonics
(Figures
The low-level peak can
of
higher-level
peaks
(Figure 3.15).
3.4.2 Phase Dependence of Clipped Voiced Sounds
For the case of voiced sounds, which consist of harmonic
components,
it
is
expected that the arcsin
applicable in general,
useful
although it
approximation.
predicting
the
clipped
In
this
spectrum
may
possibly
section
of
law will not be
the
harmonic
provide
a
problem
in
sounds
is
demonstrated.
The problem can be seen by constructing a
the
sum
phases
of
signal that is
harmonically related sine waves and varying the
of the components
(Figure 3.20).
The three
frequency
-
69
-
components were chosen in a one-third octave band with center
frequency of 1000 Hz.
Any prediction of the clipped spectrum based
the
input
dependent
on
It
the
is
clear
phases
that
of
clipped
the
in the input (panels A-C
components
spectra
individual
of
It will be noted that
consistent
cases
frequency
3.21).
Figure
the
three
the
across
gross
The
of
the
is unknown.
shape
spectral
clipped
follows the shape predicted by the arcsin law.
validity
are
from the arcsin law is shown is Panel D of Figure
prediction
rather
on
auto-correlation function would be independent of
the input phases.
3.21.
solely
is
spectra
and
However,
the
arcsin law approximation in other harmonic
Given that fine details
of
the
spectrum
are lost as a result of "critical-band" filtering by the ear,
an approximation as good as that shown in Figure
probably be adequate.
approximation
for
lengthy
which
task
3.21
would
Determining the goodness of the arcsin
harmonic
was
is
signals
not
pursued
a
worthwhile
but
here because of time
constraints.
If the tones in the signal are not harmonically related,
phase
shifts
time orgin,
the
of
components can be viewed as
a shift in the
and there should be no effect of phase shifts
clipped spectrum.
speech signals,
However,
this case is not relevant to
since the components
-
70
-
harmonically related.
on
of
voiced
sounds
are
VALUES
1i 0
8
0
(A)
0 --
-8
0
-16
VALUES
16( 0
0
0
(B)
0
VV
0r
-160
VV
VALUES
|
iC
||
0 --
(C)
-8
-1G6
BEG
=
.006400 SEC
MID = .01640 SEC
ENO = .02640 SEC
-
71
-
Figure 3.20 Differing time-waves caused by phase shifting.
The signal was generated by summing three harmonically
related tones with frequencies of 900, 1000, and 1100 Hz,
and relative amplitudes of 0, -2, and -5 dB respectively.
The three tones (900, 1000, and 1100 Hz) in Panel A have
zero phase. The phases of the three tones were respectively
The phases were 90,
0, 180, and 90 degrees in Panel B.
270, and 180 degrees in Panel C.
(A)
(B)
(OB)
T 10 0
f A
(DB)
-1100
s0
70
_
_
A7-..
SO
IWill I)I1A
IIHIH IIf 1
Jill HIA 111991111 i v
40
40
30
20
20
to
3000
2000
FREQUENCY (HZ)
Y
I
3000
2000
FREQUENCY (HZ)
4000
I
(08)
100
too
qG
90
1
80
IA
111111
MITI, I
41
r )j
0
4000
so
70
11
so
MIR h 1.1 1111 1 li
w
11U
M 11.11111W HIMIll H
LU.A.
11111MM 11
40
C40
U
30 V) 30
20
20
I-4
to
0
2000
3000
FREQUENCY (HZ)
4000
0
(C)
t000
2000
3000
4000
Frequency (Hz)
(D)
Figure 3.21
The phase dependence of harmonically related tones.
Panels A, B, and C show the clipped. magnitude spectra with
Panel D shows
the three signals in Figure 3.20 as input.
the input.
to
applied
law
the
arcsin
the prediction of
-
72
3.4.3 Spectra of Clipped Vowels
In order to observe the effects
spectra,
vowels.
measurements
of
clipping
on
vowel
were made with steady-state synthetic
The spectra of the vowels were
evaluated
with
the
which
the
ILS spectral analysis package.
The vowels were synthesized using a model
glottal
model
the
is passed through a filter designed to model
source
the vocal tract
(Rabiner and Schafer,
impulse
1978).
the
change
to
varied
were
In the
bandwidths of the formants.
an
present
Parameters
frequencies
of
and
implementation,
was used as a glottal pulse waveform in order to
remove the negative spectral tilt associated
natural
in
It
waveform.
is
easier
to
the
with
more
evaluate the clipped
spectra after eliminating this low-frequency emphasis.
frequency
The vowels were synthesized with a fundamental
of 125 Hz.
Thus,
the spectrum of each vowel is a sequence of
harmonics spaced 125 Hz apart with spectral-envelope peaks at
specified formant frequencies.
The spectra of clipped synthetic vowels are presented in
Figures
There are some major differences
3.22 through 3.39.
between the clipped spectra of vowels and noise.
spectrum
is
a
sequence
-
73
-
vowel
of
A
clipped
harmonics with the same
inter-component spacing of 125 Hz as the unmodified spectrum,
but with components extending to higher frequencies.
noise, harmonics appear at
peaks.
However,
odd
of
multiples
the
As with
spectral
"noise floor"
there is no counterpart to the
seen with clipped noise.
harmonics
Clipped
components
of
the
frequency
fundamental
can emphasize or de-emphasize
example, in Figures 3.22 through 3.25,
formant peaks.
For
the formant is at 1000
a multiple of the pitch period, and the clipped spectrum
Hz,
has peaks only at odd harmonics of this formant.
In
another
example where the formant of the vowel is higher in frequency
and not exactly at a harmonic of the
3.26-3.29,
at
low
as
an
However,
the
original
(Figures
formant
absolute maximum in the spectral
The extent to which the formant is
spectrum,
period
the clipped spectrum contains other spectral peaks
frequencies.
preserved
pitch
in
retained
even when the input spectr'al peak is
the
is
envelope.
clipped
very shallow,
is quite remarkable (Figures 3.25 and 3.29).
The clipped spectra of vowels
peaks
with
two
formants
have
at not only odd multiples of the formants but at other
frequencies as well
emphasize
(Figure 3.30).
individual
Again,
-
the unmodified spectrum (figure 3.31).
74
to
that are difficult to see in
formants
-
clipping seems
In Figures
3000
Hz)
3.32-3.34,
the two formants
(at 1000
Hz
are not only harmonically related to the pitch, but
the higher frequency formant is an odd multiple of the
frequency
and
formant.
After
clipping,
lower
narrow spectral peaks
Even
appear at only odd harmonics of the formants.
as
the
bandwidths of the formants are varied, no new peaks appear in
the spectrum (as seen in Figures
spectrum
3.30
and
remains unchanged except for the
3.31),
and
the
spectral height of
the peaks.
When two formants are not harmonics of
pitch
frequency,
fundamental
the formants are preserved but the clipped
spectrum contains other spectral peaks
to noise,
lower-amplitude
-
(Figures
75
3.35-3.39).
formants in the spectrum
(Figure 3.37).
are obscurred after clipping.
-
Similarly
the
(OB)
1 00
70
6o
30
20
10
1000
0
3000
2000
FREQUENCY (HZ)
4000
500
08)
100
60
'A(
70
60
20
10
0
0
1000
2000
3000
FREQUENCY
4000
S000
(HZ)
-
76
-
Figure 3.22 Spectra of a synthetic steady-state vowel measured
The
before (top plot) and after (bottom plot) clipping.
Hz.
50
formant frequency is 1000 Hz and bandwidth is
(DB)
100
80
70
Ah I I I
A
so
11111 HA V IIIIIIIIII IIIIIII III I)IIIII IIIIIIIII
I IIIII 11 111111
lifill"
111IM 11 , IIH IIIII
11111111
11 H IM
111111111MI IIII
uvd
Jj ju
tin m 4fW
30
20
10
0
0
2000
3000
FREQUENCY (HZ)
4000
5000
(OB)
100
80
I
A
70
so
20
10
0
0
1000
2000
3000
FREQUENCY
4000
5000
(HZ)
Figure 3.23 As in Figure 3.22 with a 1000 Hz formant frequency
and a 200 Hz bandwidth.
-
77
100
90
80
I
Hit
IlHil
s0
111111111 11IN I II 111 11111111
11111 11H I111111
11111 1111IN 111111111
1111
11H
1IlUV '
W -44M
1111 1111 H ill 11 111 111 1111 1
III III
II;
NI W
11111
30
20
10
0
0
3000
2000
FREQUENCY (HZ)
4000
5000
(0B)
100
70
11111 .1111
H I IJ 11 H IH A III . III Jill
so
A il 111
111 11
11111 1 11111111
11 11111 ji I I I Ill 111
1111 H I
111 1 1 111, 111 11 1111111 H I II I M 1111 1 111111111
111H I M 11111H 11111111
111111 1HIH
IIN IIHMv 30
IU I I I20
to
0
0
1000
2000
3000
FREQUENCY (HZ)
5000
Figure 3.24 As in Figure 3.22 with a 1000 Hz formant frequency
and a 500 Hz bandwidth.
- 78
(MB)
100
I
W
111
)I
11
IHII
U
F-
11 11
A
I
'
I
!111
HI I III M
70
1]~~~~~ 111- 1IIII I
s0
30
20
10
0
0
3000
2000
FREQUENCY (HZ)
(OB)
- 100
80
70
*
*
j
li.1
60
11
40
30
20
10
1000
0
2000
3000
FREQUENCY (HZ)
4000
0100
As in Figure 3.22 with a 1000 Hz formant frequency
Figure 3.25
and a 1000 Hz bandwidth.
-
79
@- - -
--
- ..
-1, 11/ --
, - -
-
-z A
- ---------- -
.(OB)
1 0
80
70
II. I i
IIIII
.
fva
1
f
I
s0
l
I I
I Im
30
20
0
0
1000
I
4000
3000
2000
FREQUENCY (HZ)
1.
I.
,I
I________
-
(DB)
1100
I__
SO
70
f
______________
60
-
- -
1
-
-
________
20
10
0
1000
2000
FREQUENCY
3000
(HZ)
4000
EQO
0
Figure 3.26
As in Figure 3.22 with a 2700 Hz formant frequency
and a 50 Hz bandwidth.
-
80
-
-1-
I
I
I
DB)
100
90
80
70
IliI________1_11
60
40
30
to
2000
3000
FREQUENCY (HZ)
4000
0
5000
(OB)
1 100
60
70
60
III III
'lH 1 if iM H
1
40
20
to
0
0
1.000
2000
3000
FREQUENCY (HZ)
S000
-
81
-
Figure 3.27 As in Figure 3.22 with a 2700 Hz formant frequency
and a 100 Hz bandwidth.
80
70
60
40
30
20
20il30l
01
0
0
2000
3000
FREQUENCY (HZ)
4000
Eooo
(OB)
100
80
HIM
M HIM
II
IIIH
JI 11111 I I 1
111111 11111 HIM
I 1 11 ji
il 11 4111 1 H M
W HIC HIIIIIII11 . 1111H II
.114 111 11 1,111
( H ill 1 11 11
11 U ll
I'll 1 11
Ij I I IM I 11 1 '1 H IP 11
1
lj 11
11j
I( 11I 111 1 M
11111H IIII. 1111111
11111 1 111 111
1 IN I I
III
Io
70
60
SO
40
30
20
10
0
1000
2000
3000
FREQUENCY (HZ)
4000
0
5000
Figure 3.28 As in Figure 3.22 with a 2700 Hz formant frequency
and a 500 Hz bandwidth.
-
82
III HIM HINIII IIM IIII MI
Jill I IIII Jill
11
H 111 111111-11
111 11111 1111111 1 111
9 1 11
s0
1111 1111
11Jill
0 UA-1.1
1) [1 Ij
so
111111( U l
III HIM
30
20
t0
0
1000
0
3000
2000
FREQUENCY (HZ)
4000
5000
(DB)
t00
80
I
E
70
A
60
30
20
to
1000
0
2000
3000
FREQUENCY
5200
-
0
(HZ)
Figure 3.29 As in Figure 3.22 with a 2700 Hz formant frequency
and a 1000 Hz bandwidth.
-
83
100
I
I I
s0
l
70
Lai
60
30
10
0
0
3000
2000
FREQUENCY
4000
(HZ)
DB)
100
ilt(
70
60
50
30
20
10
0
0
3000
2000
FREQUENCY
4000
5000
(HZ)
- 84
-
First formant frequency is 1000 Hz
Figure 3.30 As in Figure 3.22.
formant frequency is 2000 Hz with
Second
with 50 Hz bandwidth.
50 Hz bandwidth.
100
60
A
70
T. IN III I
11
11
1
11
HIM
IN
J
11
I I I],
r4
i
4
1111 I
HIP
11111 11
1111111111.
1111i 11
11
Ili Ilim , 1 1
.1.
I fl 11 1111111 11
I I 111 J I
llu l I 1
1
1)11
1 11111 11,11MIll
111
1) 111111 1 .. 11 1111
I III HIM
I
P,
I
I
1 11P III 1 11
111111
JIM Mill Hil I
30
10
0
0
1000
4000
2000
3000
FREQUENCY (HZ)
Soo
100
IN
I 111 L
II II II I
t
70
60
IIIIIIIIIII I H 111111 I I I i ill
I IIIIH I
11111 11 ll 11.1 U1111 i
j 1 1 1 ' Jill i ll
Hil
0
111111 1 11 H I h ill 11
H il
11M 11 M ill 1.
1111I I1 1111111 1
i ll
ill
Hill
L
10
0
0
2000
3000
FREQUENCY
4000
E000
(HZ)
Figure 3.31 As in Figure 3.22.
First formant frequency is 1000 Hz
with 500 Hz bandwidth.
Second formant frequency is 2000 Hz with
-
85
-
500 Hz bandwidth.
00
A1
so
70
60
30
20
10
0
0
1000
2000
3000
FREQUENCY (HZ)
4000
5000
OB)
100
___(
630
70
60
30
20
0
2000
3000
FREQUENCY (HZ)
1000
4000
S000
-
86
-
As in Figure 3.22.
First formant frequency is 1000 Hz
Figure 3.32
with 50 Hz bandwidth.
Sencond formant frequency is 3000 Hz with
500 Hz bandwidth.
\
t
I
IH
II
0sL.Jj
100
I H IIII
30
20
to
0
0
1000
Sao0
3000
2000
FREQUENCY (HZ)
(DB)
1 100
s0
70
s0
,
---------
40
20
10
0
0
1000
2000
3000
FREQUENCY
4000
S000
(HZ)
-
87
-
Figure 3.33 As in Figure 3.22. First formant frequency is 1000 Hz
Second formant frequency is 3000 Hz with
with 100 Hz bandwidth.
bandwidth.
100 Hz
(Ut:$)
1 100
90
70
.~
..
11 11 111M
INIIII
III
60
30
10
0
a
1000
2000
3000
FREQUENCY (HZ)
4000
S000
(OB)
100
80
--
70
60
40
30
20
10
0
100
2000
3000
FREQUENCY (HZ)
0
4000
As in Figure 3.22. First formant frequency is 1000 Hz
Figure 3.34
Second formant frequency is 3000 Hz with
with 500 Hz bandwidth.
-
88
-
50 Hz bandwidth.
__A
I
I___
100
___
s0
40
30
20
10
0
0
1000
2000
3000
FREQUENCY (HZ)
4000
SOO
OB)
100
70
30
IH
I~
II
50
30
20
to
0
Figu re 3.35
1000
2000
3000
FREQUENCY (HZ)
4000
5000
0
As in Figure 3.22. First formant frequency is 1000 Hz
Second formant frequency is 2700 Hz with
with 50 Hz bandwidth.
50 Hz ba ndwidth.
-
89
-
\ 1-
100
80
A
I I A
60
11111 11 1
H ill
M ill
4 11111 11
11111
Jill
20
10
0
1000
4000
2000
3000
FREQUENCY (HZ)
5000
OB)
100
80
I
A
70
s0
jt
U
H IM
11111111111
I 11 Ill I I
W,
J
I
111111
IIJ 11,11
d
IN l I
Ill H ill
H I L
11 J I I I I
I I II
I I II I J
M illi I
ji H II
IIIIIII
11111111
M ill
20
10
0
0
1000
2000
3000
FREQUENCY
4000
E000
(HZ)
First formant frequency is 1000 Hz
As in Figure 3.22.
formant frequency is 2700 HzSecond
bandwidth.
with 500 Hz
with 500 Hz bandwidth.
-
90
-
Figure 3.36
1 100
80
70
6o
I- J1111
H
11 11 IN
I II U
111
11 (1 IRM
11114 1
AI 1
I I
jII1
O-
11.1
11
1
11111 11 1
111111
1111H
111 1 11111111 11
Iffl - t 1 111111 111 1111
9 111 1.1
U1111 ti 11
1i 11 11 1111111111 1 1
I
30
20
p
I
P I
I
I
l'
I
10
0
0
2000
3000
FREQUENCY (HZ)
4000
S000
(B)
100
70
630
-
-
-
4
I
1'i
30
20
10
0
1000
3000
2000
FREQUENCY
4000
0
5000
(HZ)
As in Figure 3.22. First formant frequency is 600 Hz
Figure 3.37
Sencond formant frequency is 2300 Hz
Hz
bandwidth.
with 50
with 500 Hz bandwidth.
-
91
80
11
I W
911
IIM 111
1111 -
111
5
s0
J1
IIIHIIII
PH 111W40101
i
so
IMU HIM I
HIM 11111
IIIH IIII...
11
.30
10
0
0
4000
(
3000
2000
FREQUENCY (HZ)
OB)
100
s0
1 A I
I I HIM
... 1 M l
I
1
1
1 l
1 1
A l 1 1
11 1
30
:1
20
10
0
0
3000
2000
FREQUENCY (HZ)
4000
500
First formant frequency is 600 Hz
As in Figure 3.22.
Figure 3.38
Second formant frequency is 2300 Hz
with 100 Hz bandwidth.
with 100 Hz bandwidth.
-
92
DB)
100
I AI(
I
11
s0
30
20
10
4000
3000
2000
FREQUENCY (HZ)
0
(DB)
100
60
H ill I 1
1111 11 11
1 1
so
I
IJ 111
wif
111'
i IN A
till 1
11 11 H
11 U I Ill
30
20
10
(0
0
2000
FREQUENCY
3000
(HZ)
4000
5000
Figure 3.39
As in Figure 3.22.
FIrst formant frequency is 600 Hz
with 500 Hz bandwidth.
Second formant frequency is 2300 Hz
with 50 Hz bandwidth.
-
93
Chapter
4
Discussion
The present study was designed to assess
clipping
as
The study was
of
two
about hearing and the perception of speech.
The
analyzes
intelligibility
of
speech.
determine
infinite
into
"critical-band"
The
based
if
peak
was
investigation
on these assumptions:
spectral
an
clipping
answer
what are the effects
infinite-peak
distortion be minimized?
amplitude
to
designed
compression
clipping,
and
The goal was to
system
based
on
be designed which would give
could
maximum compression and minimum spectral distortion,
maximizing
cues
lie in the spectral (magnitude) patterns
of post-filtering speech after
can
sound
the
The second assumption is that the primary
to
how
on
the outputs of which are relevant to the perception
loudness.
questions
based
following
first is that the ear
filters,
peak
a means of reducing the amplitude variations in
speech levels.
assumptions
infinite
thereby
intelligibility for individuals with sensorineural
-
94
-
hearing loss.
4.1 Compression Results
The general result from the study of compression was the
insensitivity of the amplitude range of clipped
speech to the
characteristics of the filters that preceded and followed the
of
which pre-filters and post-filters
clipper.
Regardless
were used,
clipping reduced the range of speech by
least
Output ranges were on the order of 10 to 20 dB.
15 to 20 dB.
The
at
general
trends
the
of
pre-
of
effects
and
post-filtering clipped speech can be understood qualitatively
with concepts discussed by DeGennaro
et
al.
(1981).
in a bandpass speech envelope can be modeled as
fluctuations
slow
and
of
components
result from the narrowband filtering
(the rate of
distributions
component
the
in
across
(with
results
from
intensity
overall
different
accounts for about
The slow component
few per second)
fluctuations on the order of a
speech
measured
slow component.
that
is
slowly
and
spectral
in
The
sounds.
30 dB of the range.
approximation, the effect of infinite peak
range
account
and
fluctuations are on the order of the bandwidth)
for 10-15 dB of the measured range.
rapid
The
components.
rapid
consisting
variation
The
slow
To a first
on
clipping
the
after narrow post-filtering is to remove the
An example is a third-octave band
modulated
noise
over a large decibel range,
then
with
the
95
-
clipped and post-filtered by a third-octave filter
-
of
same
center
frequency.
The
unaffected by the slow modulation,
zero-crossing
pattern
so the clipped noise
exhibit the same envelope variations as
is
will
if there were no slow
modulations.
The preceding analysis applies to
post-filter is
case
where
the
in the spectral region that is dominant at the
input to the clipper.
time,
the
variations
If this dominant region
changes
over
in the clipped spectral distributions will
contribute to the measured output range.
This discussion aids in interpreting
with
no
pre-filtering
the
16
band
the
ranges
smaller than with a highpass pre-filter (the
or
differentiator),
filters).
(and
or
a
bandpass
With no pre-filtering,
hence
the
sequence of
regions
will
reduced
zero-crossing
in
be
amplitude
sequence
speech
relatively
by
will
filter
(formant
waveform
is dominated by
The harmonics in the higher
various speech sounds after clipping.
is
pre-filter
the input
that
were slightly
optimal
zero-crossings)
the low-frequency first formant.
frequency
finding
constant across the
If
the
first-formant
highpass filtering,
exhibit
greater
the input
variability,
resulting in greater variation in the clipped spectrum across
-
96
-
the different speech sounds.
It was also observed that ranges were reduced further if
pre-filter bandwidth was narrowed to one-third octave or
the
less.
This effect can be understood by considering the
of
very
a
small
The input to the
bandwidth.
pre-filter
case
clipper will approximate a tone with slowly varying amplitude
and
Since the clipped signal will be
modulation.
frequency
the
third-octave
frequency modulation and an
from
the
the
less
with slow
modulation
resulting
a
the
and
modulation
there will be
modulation
amplitude
Thus,
post-filter.
frequency
the less
narrower the input band,
consequently
by
of
tone
be
amplitude
conversion
FM-to-AM
also
will
filter
output
the
frequency modulated with constant amplitude,
after post-filtering.
The finding that
dependent
inplies
the
clipped
on
the
bandwidth
that
the
exact
filters
critical-band
or
is
range
of
slope
not
be
strongly
the post-filter
of
characteristics
need
not
the
ear's
known in order to make
meaningful measurements of range.
It is interesting to speculate on
minimimum
range
considers
that
compression
the
variations
a
system.
When
-
10-15 dB,
one must
in the speech envelope could be
97
-
how
is
the envelope ranges of narrowband noise
and harmonic complexes are on the order of
wonder
there
of speech levels that can be achieved after
(auditory) post-filtering by any
one
whether
further reduced.
these
It would seem
envelope
that
further
reduction
of
variations could be achieved only in special
cases.
4.2 Comparison with AGC Systems
Krieg
achieved
(1980)
by
studied
multiband
independent AGC
amplitude
in each band.
was CVC lists spoken by a male talker,
ratios
third-octave
across all
in
0.5
ranges
16
channels
were
channels
the
compression,
was
and
the
to
25
compression
input
dB,
averaging
36.8 dB
Figure
After
averaging 24.3 dB
1980,
Figures
in the
present
speech by approximately 20 dB,
the AGC system used by Krieg
results
9).
11 and 15)
and
ratio was approximately 1.5.
Infinite peak clipping,
intelligibility
The
16).
The actual amount of compression
approximately 12 dB (Krieg,
of
42
1980,
(Krieg,
all frequency bands.
range
and
11
ranges were 14 to 38 dB,
the effective compression
the
The input speech
in the 16 frequency bands were 3 (with a deviation of
approximately
across
compression
(1980).
from
Assuming
study,
reduced
8 dB more than
that
greater
presenting a greater range of
speech within the residual hearing area of the listener, then
clipping
should
allow superior intelligibility results when
-
98
-
the listener's dynamic range is less than about 20 dB.
4.3 Spectral Modifications
The spectral pattern of clipped unvoiced sounds
predicted from the arcsin law (Section 3.4).
peaks.
Clipping
the
spreads the spectrum,
the low-level region much as a noise
be
After clipping,
spectral peaks will appear at odd multiples of
spectral
can
floor
original
filling in
would.
Clipped
spectra with two formants have harmonic peaks at not only odd
multiples of the
formants
but
at
combination
tends to obscure
lower-frequency formant, when stronger,
The
frequencies.
the higher-frequency formant.
The clipped spectra of voiced sounds
the
phases
are
of the input frequency components,
law cannot be aplied.
Except for the fact that
dependent
on
so the arcsin
the
clipped
harmonic spectrum contains frequency components with the same
fundamental spacing as the unmodified spectrum
is
no
counterpart to the
the general
sounds.
results are
"noise floor"
similar
with
(i.e.
there
of unvoiced sounds),
voiced
and
unvoiced
Spectral peaks which appear at odd multiples of the
formants and
combination
frequencies
after
clipping
will
obscure the low-level formants.
A
which
remarkable aspect of clipping vowels
spectral peaks can be enhanced.
This
is the degree
finding suggests
99
-
that infinite peak clipping could be exploited in schemes
-
to
for
formant detection and tracking.
4.4 Implications for Hearing Aid Design
As stated above,
the
effects
of
the present study sought
to
determine
infinite peak clipping in order to minimize
both output ranges in critical-band-like filters and spectral
distortions.
It was found that output ranges are relatively
insensitive
to
pre-filtering.
post-filtering
and
If
output
minimizing
be
criterion,
there
narrowband
pre-filtering.
and
would
Hildebrant's
(1982)
1100-Hz
formant
intelligibility.
on
ranges were the only
no
pre-filtering
highpass
or
filter
and
filters produced ranges only a
few dB larger than no pre-filter and
greater
dependent
However, pre-filters like Thomas
(1968)
Niederjohn's
either
slightly
are
known
to
produce
The differentiator pre-filter gave
the largest ranges of the pre-filters investigated.
The present results on spectral distortions will help to
determine a choice of pre-filter.
filters produce
relative
the
spectral
amount
greatest
qualities
lost in such a system.
Even though the narrowband
of
compression,
among frequency bands would be
For multiple-formant sounds
seem that a pre-filtering scheme that separates
like Hildebrant's
formants
(1982), would be beneficial
it
would
the formants,
for
preserving
(without losing the relationship between bands)
-
100
-
the
the
and avoiding
interactions
in
the
clipper.
However,
for
single-peak sounds, this scheme may result in spurious output
peaks,
or a less well-defined peak than
this
respect,
a
single-channel
in
the
clipper
input.
with
In
highpass
pre-filtering might prove superior.
Of course, the particular loss of the individual
also
be
considered.
only on first formant
formant
hearing,
then
should
For example if the individual depends
cues,
and
has
virtually
no
second
pre-filtering with an optimal filter
will hinder intelligibility by decreasing the saliency of the
cues.
4.5 Recomendations for Future Work
Future work should test intelligibiltiy
and
clipper
hearing-impaired
Ideally,
optimal
filter/clipper
listeners
with
small
with
a
3-band
systems
dynamic
on
ranges.
results could be compared with performance on other
amplitude
Examination
compression
of
the
systems
types
hearing-impaired and normal
using
of
the
speech
same
errors
subjects.
for
both
subjects should help to determine
-
101
-
the next course of action.
REFERENCES
Braida,
L.D.,
et al.
(1979).
"Hearing
Aids--A
review
of
Past
Research
on
Linear
Amplification,
Amplitude
19
Compression, and Frequency Lowering," ASHA Monograph No.
Controlled
Multiband
"A
Computer
(1979).
Coln, M.C.W.
Tech.,
Inst.
Amplitude Compressor," Master's Thesis, Mass.
Cambridge, MA.
"A
(1981).
N.I.
Durlach,
De Gennaro, S., Braida, L.D.,
Amplitude
Speech
of
Third-Octave
Analysis
Statictical
the
Distributions," Paper presented at the 101st meeting of
Accoustical Society of America, Ottawa, Ontario, Canada
De Gennaro,
S.
Krieg,
K.R.,
Braida
L.D.,
Durlach,
N.I.
"Third-Octave Analysis of Multichannel Amplitude
(1981).
Compressed Speech," Proceedings of IEEE Inter.
Conf.
on
Acoust.,
Speech and Signal Processing,
Atlanta,
Georgia
De Gennaro, S.
(1982).
"An Analytical
Study of Syllabic
Compression
for Severely
Impaired Listeners," PhD Thesis,
Mass.
Dunn,
Inst.
Tech.,
H.K., White,
Cambridge,
S.D.
MA.
(1940).
"Statistical
Measurements
on Conversational Speech," JASA 11, pp278-288
Fawe, A.L.
Speech
(1966).
"Interpretation of
Properties,"
IEEE
Infinitely
Trans.
on
Clipped
Audio
and
.Electroacoustics 14, ppl78-183
Green, D.M., Swets, J.A.
(1966).
Signal
Detection
and Psychophysics, (John Wiley and Sons, New York)
Guidarelli,
G.
(1981).
"Intelligibility
and
Theory
Naturalness
Improvement of Infinitely Clipped Speech," Acoustics Letters
4, No.7
Hildebrant, E.M.
(1982).
"An Electronic Device
to
Reduce
Thesis, Mass.
Dynamic
Range
of
Speech,"
Bachelor's
the
Inst.
of Tech., Cambridge, MA
practice
"Recommended
Trans.
Audio
IEEE
-
102
-
IEEE
(1969).
measurements."
pp225-246
for
speech
quality
17,
Electroacoust.
Kac, M., Siegert, A.J.F.
(1947).
"On the Theory
of
Noise
in
Radio
Receivers with Square Law Detectors," J.
Applied
Physics 18, pp383-397
Krieg, K.R.
(1980).
"Third Octave Band Level Distributions
of
Amplitude
Compressed
Speech," Bachelor's Thesis, Mass.
Inst.
of Tech., Cambridge, MA.
Licklider, J.C.R.
(1946).
"Effects of Amplitude Distortion
upon the Intelligibility of Speech," J.
Acoust.
Soc.
Am.
18, pp429-434
Licklider, J.C.R., Bindra, D., Pollack,
I.
(1948).
Intelligibility
of
Rectangular
Speech-Waves,"
Am.
"The
J.
Psychology 61, ppl-20
Licklider,
J.C.R.,
Pollack,
Differentiation,
Integration,
I.
and
(1948).
Infinite
"Effects
of
Peak Clipping
upon the Intelligibility of Speech," JASA 20, pp42-51
Papoulis, A.
(1965).
Stochastic Processes,
Pollack, I.
(1952).
Amplitude
Distortion
Noise," JASA 24,
Rabiner,
I.B.,
Random
Variables
and
"On the
Effect
of
Frequency
and
on
the
Intelligibility of Speech in
pp538-540
L.R., Schafer, R.W.
Speech Signals,
Thomas,.
Probability,
(McGraw-Hill, New York)
(1978).
(Prentice-Hall,
Digital Processing of
New Jersey)
Niederjohn,
R.J.
Intelligibility of Filtered-Clipped
Audio Engin.
Soc.
18, pp299-303
(1970).
Speech
"The
in Noise," J.
Thomas, I.B.,
Sparks,
D.W.
(1971).
"Discrimination
of
Filtered/Clipped
Speech by Hearing-Impaired Subjects," JASA
-
103
-
49, ppl881-1887
Appendix A
Probability Densities of Test Signals
Narrow-band Noise
The probability density
generated
for
the
random
variable,
V,
squaring and then lowpass-filtering a band of
by
noise centered at a high frequency was calculated by Kac and
Siegert
(1947)
to
00
P(V)
i
where
integral
=
be
exp(-V/) s)
----------------s
(1
i3s)
is the
ith
eigenvalue
noise spectrum is that of a
lowpass
filter
is
in
(Al)
the
a
"simple-tuned" circuit
simple
RC-lowpass
eigenvalues are found from the zeroes of the
Bessel
function,
where
solution
L2(R/,
i)
an
R
and
the
filter,
the
(R-1)th
J = 0.
(A2)
104
-
R-/
-
order
is the ratio of the input noise
bandwidth to the lowpass filter's -3 dB bandwidth.
J
of
the specific case where the input
For
equation.
V>O
Probability density functions were computed for values of
between
0
and
1
(Figure
(Figure A2).
The case of
Rayleigh
random
Al),
R=0
and values between 1 and 8
corresponds
variable
to
that
squared,
exponentially-distributed variable.
of
or
The range was
equal
distribution
at which the cumulative
to 0.10 and 0.90,
a
an
computed
to be the decibel difference between the the two values,
and V90,
R
function
V10
is
The range values are
respectively.
graphed in Figure 2.2.
Tones
wave
An input sine
frequency
amplitude
and
is considered in this section.
f=w/2T(
and lowpass
unit
with
filtering of this signal results
cyclic
Squaring
in the variable
1 - /H(2f)//cos(4Thft)
--------------------- (A3)
V =-----------
2
where
H(f)
=
1/
1+RA
and R is the ratio of f
to
the
-
105
-
RC-lowpass filter.
-3
dB
bandwidth
of
the
Because of symmetry in the time-wave,
can
be
limited
cumulative
to
the
calculations
the range for c^'= 47'fft of 0<
distribution
function
for
V
is
/<17/.
found
The
by
integrating the probability density function on o/ (which is
assumed uniformly distributed)
between the
limits determined
by the inverse function,
cX =F
(1-2V)
(V) = cos L ------------
]
(A4)
.
/H(2f)/
The cumulative distribution function is derived below.
Pr(c/< F
=
(Vo))
/v
=
arccos L(1-2V)// H(2f)/
The two values,
distribution is equal
equation A5.
these
two
-
Pr(V< Vo)
V10 and V90,
at
I
which
(A5)
the
cumulative
to 0.10 and 0.90, were calculated from
The range, which is the decibel difference
values,
is graphed as a
-
106
-
2.3.
function of R
of
in Figure
I
-
2.0
~~1
R
.8
1/2
1/4
1/8
0 (exponential
density)
t-
.41
0.0
0
2
1
3
4
V
-
107
-
Figure Al Probability density functions for narrowband noise
after squaring and lowpass filtering for R less than 1.
input noise bandwidth to the lowpass
R is the ratio of the
filter's -3dB cutoff.
-
2 .0
1.8
R8--1
R -- 8
/
I
1.2
\
\
-1
4
-x
>\
2
14
0
1
2
3
4
V
-
108
-
Figure A2 Probability density functions for narrowband noise
after squaring and lowpass filtering for R greater -than or
equal to 1.
APPENDIX B
10 Harvard sentences used as
is male.
1. A
input for experiments.
Speaker
large size in stockings is hard to sell.
2.
The birch canoe slid on the smooth planks.
3.
Glue the sheet to the dark,
4.
It's
5.
These days a chiken leg is a rare dish.
6.
Rice is often served in round bowls.
7.
The juice of lemons makes fine punch.
8.
The box was thrown beside the parked truck.
easy to tell
blue background.
the depth of a well.
9. The hogs were fed chopped corn and garbage.
-
109
-
10. Four hours of steady work faced us.
Download