Analysis of Frequency-Smearing Models Simulating Hearing Loss
by
Alan T. Asbeck
Submitted to the Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology
May 21, 2003
Copyright 2003 M. I. T. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 21, 2003

Certified by
Louis D. Braida
Henry Ellis Warren Professor of Electrical Engineering
Thesis Supervisor

Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Theses
Analysis of Frequency-Smearing Models Simulating Hearing Loss
by
Alan T. Asbeck
Submitted to the Department of Electrical Engineering and Computer Science
on May 21, 2003, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
Various authors have created models of the psychoacoustic effects of sensorineural hearing loss that transform ordinary sounds into sounds that evoke the perception of hearing loss in normal-hearing listeners. In this thesis, models of reduced frequency resolution, reduced temporal resolution, and loudness recruitment with absolute threshold loss were evaluated using a model of human perception. Confusion matrices of vowels were simulated using these models and compared to confusion matrices from five hearing-impaired subjects. The model of human perception was found to use spectral cues differently than humans actually do, making it difficult to evaluate the hearing loss models and causing the simulated confusion matrices to differ from the real subjects' matrices. Even so, the frequency-smearing models caused the total error rate to increase with increasing smearing bandwidth, and the results were generally consistent with the expected behavior.
Thesis Supervisor: Louis D. Braida
Title: Henry Ellis Warren Professor of Electrical Engineering
Acknowledgments
I first want to thank my advisor, Professor Louis Braida, for his help and support
throughout this project, and for giving me a project in the first place. I especially
want to thank him for putting up with my tendency to procrastinate, and for spending
so much time reading my thesis and being there at the end.
I want to thank Michael "Q" Chin, for really making my time here a wonderful
experience, and for helping me think through this project and other things throughout
the year. I've really appreciated the chance we've had to get to know each other, through everything from the "sea-snakes" on the chalkboard to talking about the various topics you have come up with.
I also want to thank Peninah Rosengard, Charlotte Reed, Andrew Oxenham, and everyone else in the Sensory Communication group who has helped me along the way.
Finally, I acknowledge the NIH and NSF, which have provided funding for my
work on this thesis.
Contents

1 Introduction 25
  1.1 Background 25
  1.2 Physiology of Hearing Impairment 25
  1.3 Psychoacoustics of Hearing Impairment 28
  1.4 Simulating Hearing Loss 31
    1.4.1 Reasons for Simulating Hearing Loss 31
    1.4.2 Previous Work Simulating Hearing Loss 32
  1.5 Problem Statement 35

2 Signal Processing in the Experiment 36
  2.1 Overview of Experiment 36
  2.2 Overview of Signal Processing 37
    2.2.1 Processing by Real Subjects 37
    2.2.2 Processing by the Simulation 37
  2.3 Detail on the Brantley Confusion Matrix Experiment 40
  2.4 Speech Stimuli and Vowel Extraction 41
  2.5 Signal Processing Simulating Hearing Loss 42
    2.5.1 Methods of Frequency Smearing 42
    2.5.2 Method of Recruitment 61
    2.5.3 Combining the Smearing Methods and Recruitment 71
  2.6 d' Discrimination Model 71
  2.7 Decision Model 75

3 Results 78
  3.1 Total Error Rates 78
  3.2 Locations of Errors in Confusion Matrices 79
    3.2.1 Comparison of Moore and ter Keurs Methods 84
  3.3 Effects of Processing Parameters on the Results 85
    3.3.1 Loudness Recruitment and Audibility 85
    3.3.2 Male vs. Female Speakers 86
  3.4 Results with Simulating Full Audibility 87
    3.4.1 Error Rates 88
    3.4.2 Locations of Errors 89
    3.4.3 Comparison of Moore and ter Keurs Methods 89

4 Discussion 118
  4.1 Why the Simulation Did Not Work 118
  4.2 Other Sources of Error 122
  4.3 Conclusions that Can Be Drawn 122
    4.3.1 Total Error Rates and Smearing and Recruitment 123
    4.3.2 Comparing Moore and ter Keurs Smearing 126
    4.3.3 The Effect of Audibility 127
  4.4 Potential Perceptual Models in Humans 128

5 Conclusions 132
  5.1 Summary of Findings 132
  5.2 Relevance to Hearing Aid Signal Processing 133
  5.3 Recommendations for Future Work 133

A Audiograms and Confusion Matrices for the Real Subjects 135

B Tables of Computed Confusion Matrices 140

C Plots of χ² Values 151

D Recommendations for an Improved Simulation 183
  D.1 Rationale and Background 183
  D.2 Overview of Model 183
  D.3 Details of Model 184
  D.4 Simulating Hearing Loss 186
  D.5 Possible Sources of Error 187

References 189
List of Figures

1-1 Figure showing how the basilar membrane is compressive for tones at the best frequency of the location of measurement. This figure shows "the level of a masker required to mask [a] 6-kHz [tone] signal, as a function of signal level" (Oxenham and Plack, 1997). The plot of the on-frequency masker is linear, as would be expected. However, the off-frequency masker shows compression, indicating both that the basilar membrane is compressive and that it responds linearly for tones an octave below the best frequency of a given location. Taken from Oxenham and Plack, 1997. 27

1-2 Loudness matching curves for subjects with unilateral hearing loss. The thin line at a 45-degree angle is the expected result if the two ears gave equal loudness results for a tone at a given level. The fact that the actual data lie at a steeper slope than this indicates that for the impaired ear, as the stimulus sound pressure level increases a little, the apparent loudness increases more rapidly than in a normal ear. Also, the asterisks indicate the subjects' absolute thresholds; these are elevated in the impaired ear relative to the normal ear. Taken from Moore et al., 1996, with the 45-degree line added. 30

2-1 Signal processing overview. 38
2-2 Block diagrams of the Moore frequency-smearing algorithm. The top block diagram shows the main processing sequence, and the bottom block diagram shows detail for the "Smear Spectrum" block in the top part. Taken from Baer and Moore, 1993. 44

2-3 Spectra resulting from different amounts of Moore smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -40 dB increments. The labels to the left of the traces indicate the amount of smearing, in terms of the effective number of ERBs of the widened auditory filters; US indicates the unsmeared signal. 47

2-4 Channels resulting from different amounts of Moore smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -8 dB increments. The labels to the left of the traces indicate the amount of smearing, in terms of the effective number of ERBs of the widened auditory filters; US indicates the unsmeared signal. 48

2-5 Sample spectra showing the results of smearing, for a synthetic vowel /ae/. The different frames (vertical dimension in the figure) show successive chunks of signal that are partially overlapping. As can be seen in the right column, showing the output spectra, the effective degree of smearing is less than was intended. The intended degree of smearing is seen in the middle column, which shows the results after smearing but before the overlap-add process. Taken from Baer and Moore, 1993. 50

2-6 Block diagram showing the details of the ter Keurs processing method. Taken from ter Keurs et al., 1992. 52

2-7 Block diagram showing the details of the Spectral Smearing block from 2-6. Taken from ter Keurs et al., 1992. 53

2-8 Sample of the outputs of the ter Keurs frequency-smearing algorithm. The top log spectrum, labeled "US," shows the unsmeared signal, while the bottom three spectra show the results of smearing of 1/8, 1/2, and 2 octaves, as labeled. Taken from ter Keurs et al., 1992. 54

2-9 Spectra resulting from different amounts of ter Keurs smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -50 dB increments. The labels to the left of the traces indicate the amount of smearing, in octaves; US indicates the unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs of smearing. 55

2-10 Channels resulting from different amounts of ter Keurs smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -12 dB increments. The labels to the left of the traces indicate the amount of smearing, in octaves; US indicates the unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs of smearing. 56

2-11 Block diagram of the perceptual model of human temporal processing used by Hou and Pavlovic in providing rationale for their temporal smearing model. Taken from Hou and Pavlovic, 1994. 58

2-12 Block diagram of the Hou temporal smearing model. The top diagram (a) shows the overall signal processing path, and the bottom diagram (b) shows the specifics of how temporal smearing is accomplished. Figure reproduced from Hou and Pavlovic, 1994. 59
2-13 Sample of outputs from the Hou temporal smearing model. The top panels (a and b) show the unsmeared signal and its spectrum, and the lower panels show the signal and its spectrum after it has been smeared with an ERD of 0.1 ms and 28 ms, respectively. Figure taken from Hou and Pavlovic, 1994. 62

2-14 Spectra resulting from different amounts of Hou smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -50 dB increments. The labels to the left of the traces indicate the amount of smearing, that is, the equivalent rectangular duration of the smearing temporal window, in msec; US indicates the unsmeared signal. 63

2-15 Channels resulting from different amounts of Hou smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -8 dB increments. The labels to the left of the traces indicate the amount of smearing, that is, the equivalent rectangular duration of the smearing temporal window, in msec; US indicates the unsmeared signal. 64

2-16 Block diagram of the model of loudness recruitment by Moore and Glasberg, 1993. The parallel arrows between the middle blocks represent the outputs of the 23 bandpass filters in the first block. Figure taken from Moore and Glasberg, 1993. 65

2-17 Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The vowel was at a level of 100 dB. Recruitment of N = 1.0 (where N is the exponent to which the envelope is raised) corresponds to a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50 dB, and in general the recruitment amounts correspond to hearing losses via the equation N = 100 dB / (100 dB − hearing loss in dB). No additional offsets were used in creating the plot: the effective level changes with increasing recruitment due to the increased absolute threshold. 68
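The loss-to-exponent mapping in the caption above can be sketched numerically. This is only an illustrative reading of the relation N = 100 dB / (100 dB − hearing loss in dB); the function name is hypothetical and this is not the thesis's actual envelope-expansion code:

```python
def recruitment_exponent(hearing_loss_db: float) -> float:
    """Map a flat hearing loss in dB to the envelope-expansion
    exponent N via N = 100 / (100 - loss), as in the caption.
    (Illustrative sketch only, not the thesis's processing code.)"""
    if not 0.0 <= hearing_loss_db < 100.0:
        raise ValueError("hearing loss must be in [0, 100) dB")
    return 100.0 / (100.0 - hearing_loss_db)

# Spot checks against the caption: a 0 dB loss gives N = 1.0
# (no expansion), and a flat 50 dB loss gives N = 2.0.
assert recruitment_exponent(0.0) == 1.0
assert recruitment_exponent(50.0) == 2.0
```

Note how the mapping diverges as the loss approaches 100 dB, which is consistent with 100 dB acting as the limit of recruitment in the captions that follow.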
2-18 Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The vowel was at a level of 80 dB. Recruitment of N = 1.0 (where N is the exponent to which the envelope is raised) corresponds to a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50 dB, and the other recruitment amounts correspond to hearing losses via the equation N = 100 dB / (100 dB − hearing loss in dB). No additional offsets were used in creating the plot: the effective level changes with increasing recruitment due to the increased absolute threshold. 69

2-19 Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The solid lines are the result of recruitment with a simulated loss of 0 dB, and the dotted lines are the result of recruitment with a flat loss of 50 dB. The dashed line shows channels made from the original signal without recruitment, showing that simulating a loss of 0 dB does not distort the signal very much. Signals with levels of 120, 100, and 80 dB were recruited corresponding to losses of 0 dB and 50 dB, creating pairs of lines for each signal level. The 120 dB signal is above 100 dB, the limit of recruitment, so recruitment has little effect. The 100 dB signal has its highest parts at around the limit of recruitment, so they are not affected, while the higher-frequency portions of the signal are lower in level and so are distorted. The 80 dB signal is severely distorted and reduced in level by recruitment. 70
2-20 Block diagram of the d' discrimination model. The bottom block comes after the top block in the processing path. The top block is used to create channels from all vowels processed, and the bottom block takes all combinations of pairs of vowels and produces a d' distance between each pair. In the top figure, the signal is first filtered into bands. Then, in each band, the average power per unit length of the signal is found by first squaring it, summing over the signal length, dividing by the signal length, and then normalizing to account for variation in signal length between stimuli. Finally, the natural log of the resulting number is taken to produce a channel. In the bottom diagram, the difference (Δi) between corresponding channels is found for a pair of stimuli, as indicated in the plot. These differences are then put through the formula in the bottom of the figure. E is an arbitrary constant that is a scale factor for all d' distances. 73
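The channel computation described in the caption above can be sketched as follows. The bandpass filterbank and the length normalization are omitted, and since the combining formula appears only in the thesis figure, a Euclidean combination scaled by E is assumed here for illustration; all names are hypothetical:

```python
import math

def channel_values(band_signals):
    """For each bandpass-filtered signal: square it, sum over the
    signal length, divide by the length (average power), then take
    the natural log to produce one 'channel' value per band.
    (Filterbank and length normalization omitted from this sketch.)"""
    channels = []
    for band in band_signals:
        avg_power = sum(x * x for x in band) / len(band)
        channels.append(math.log(avg_power))
    return channels

def d_prime_distance(channels_a, channels_b, E=1.0):
    """Combine the per-channel differences (the delta_i values) into
    a single d' distance. The exact formula is in the figure; a
    Euclidean combination with scale factor E is assumed here."""
    return E * math.sqrt(sum((a - b) ** 2
                             for a, b in zip(channels_a, channels_b)))
```

Because the channels are log powers, equal-ratio level changes in a band produce equal channel differences, which is what makes a single scale factor E plausible across all pairs.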
2-21 Illustration of the decision model's function in two dimensions. Stimulus centers are represented by the small points in the figure, and are surrounded by Gaussian distributions in perceptual space. Response centers are found by taking the centroid of the stimulus centers of a single type. The probability distributions around each stimulus center are assumed to evoke responses corresponding to the closest response center. The lines dividing the figure are the boundaries of these so-called response regions. 76

2-22 Block diagram of the decision model. 77
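The decision rule in the caption above, with response centers as centroids and each point assigned to the closest center, amounts to a nearest-centroid classifier. A minimal sketch follows; the function names and the use of squared Euclidean distance are assumptions for illustration:

```python
def response_centers(stimulus_centers):
    """Compute each vowel's response center as the centroid of its
    stimulus centers, as in the caption. `stimulus_centers` maps a
    vowel label to a list of points (tuples) in perceptual space."""
    centers = {}
    for label, points in stimulus_centers.items():
        dims = len(points[0])
        centers[label] = tuple(sum(p[d] for p in points) / len(points)
                               for d in range(dims))
    return centers

def respond(point, centers):
    """Assign a point in perceptual space to the label of the
    closest response center (squared Euclidean distance assumed)."""
    return min(centers, key=lambda label:
               sum((point[d] - centers[label][d]) ** 2
                   for d in range(len(point))))
```

The "response regions" in the figure are then exactly the sets of points that `respond` maps to a given label, i.e. the Voronoi cells of the response centers.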
3-1 Total error rates for the simulation using subject JB's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate. 94

3-2 Total error rates for the simulation using subject JO's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate. 95

3-3 Total error rates for the simulation using subject MG's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate. 96

3-4 Total error rates for the simulation using subject PW's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate. 97

3-5 Total error rates for the simulation using subject RC's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate. 98
3-6 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject JB's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate. 99

3-7 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject JO's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate. 100

3-8 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject MG's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate. 101

3-9 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject PW's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate. 102

3-10 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject RC's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate. 103

3-11 Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects and smearing amounts, for the reduced-audibility simulation. 104

3-12 A comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same. 105

3-13 A comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.5, and the Moore smearing had a multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same. 106
3-14 Graph showing the percentage of stimuli that had audible channels for a given channel number, for each subject. Note that the channels correspond to a log scale with channel 1 centered at 100 Hz, channel 12 at 1575 Hz, and channel 19 at 4839 Hz (for reference). Also, in most plots two groups of curves can be seen. The set of curves to the left corresponds to no recruitment, and the set to the right is with recruitment. 107

3-15 Plot showing the stimulus centers and response centers for the stimuli from the male and female speakers combined (top plot), and just the female speaker (bottom plot). The stimulus centers are shown by the small black dots, and the response centers are shown by the open circles. In each plot, the response centers are the average of the stimulus centers for each vowel. Only half of the total number of stimulus centers were plotted for computational reasons. As can be seen around the "AH" in the top plot, the stimuli corresponding to the male and female speakers formed two distinct clusters for each vowel. The clusters are not distinct for the other vowels in this plot due to the perspective. Also, it can be seen that the stimulus centers for each vowel are less spread out in the bottom plot. 108

3-16 Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects, with the smearing multiplier fixed at 1.0, for the reduced-audibility simulation. 109

3-17 Total error rates for the simulation using subject JB's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate. 110

3-18 Total error rates for the simulation using subject JO's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate. 111

3-19 Total error rates for the simulation using subject MG's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate. 112

3-20 Total error rates for the simulation using subject PW's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate. 113

3-21 Total error rates for the simulation using subject RC's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate. 114

3-22 Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects and smearing amounts, for the simulation in which all frequencies were audible. 115
3-23 Assuming full audibility, a comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same. 116

3-24 Assuming full audibility, a comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs and Moore results both had a smearing multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same. 117

4-1 Average vowel lengths for the different vowels. The error bars represent one standard deviation of the vowel length. The averages were nearly identical for male and female speakers. 130

4-2 Examples of spectra. In both cases, the d' discrimination algorithm will produce equal d' distances between all stimuli. However, a real human listener could easily tell that the third spectrum was much different from the other two. In Stimulus Set A, the spectrum heights are A, 2A, and A for spectra 1, 2, and 3, respectively. The widths of the spectra are such that the total area under all of the spectra is the same. 131

A-1 Audiogram for subject PW. 135

A-2 Audiograms for subjects JB and JO. 136

A-3 Audiograms for subjects RC and MG. 137

C-1 Original simulation including reduced audibility. 153
C-2 Original simulation including reduced audibility. 154
C-3 Original simulation including reduced audibility. 155
C-4 Original simulation including reduced audibility. 156
C-5 Original simulation including reduced audibility. 157
C-6 Original simulation including reduced audibility. 158
C-7 Original simulation including reduced audibility. 159
C-8 Original simulation including reduced audibility. 160
C-9 Original simulation including reduced audibility. 161
C-10 Original simulation including reduced audibility. 162
C-11 Original simulation including reduced audibility. 163
C-12 Original simulation including reduced audibility. 164
C-13 Original simulation including reduced audibility. 165
C-14 Original simulation including reduced audibility. 166
C-15 Original simulation including reduced audibility. 167
C-16 All frequencies audible condition. 168
C-17 All frequencies audible condition. 169
C-18 All frequencies audible condition. 170
C-19 All frequencies audible condition. 171
C-20 All frequencies audible condition. 172
C-21 All frequencies audible condition. 173
C-22 All frequencies audible condition. 174
C-23 All frequencies audible condition. 175
C-24 All frequencies audible condition. 176
C-25 All frequencies audible condition. 177
C-26 All frequencies audible condition. 178
C-27 All frequencies audible condition. 179
C-28 All frequencies audible condition. 180
C-29 All frequencies audible condition. 181
C-30 All frequencies audible condition. 182

D-1 Diagram of new hearing loss model. 188
List of Tables
2.1 Table of the levels at which the low-passed (below 2500 Hz) and high-passed (above 2500 Hz) portions of the CVC syllables were presented to the subjects in the confusion matrix experiment by Brantley (1990-1991). 41

2.2 Table of the different processing conditions used in the simulation. 71

3.1 Table of the average simulated confusion matrices for each processing condition. Averages were taken across subjects and smearing amounts. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. 83

3.2 Table of the simulated confusion matrices for approximately equivalent total smearing amounts for the Moore and ter Keurs methods, without recruitment. The results shown in the table are the averages across all subjects. The top row contains a Moore smearing with a widening factor of 4.5, and ter Keurs smearing of 0.75 octaves. The total error rate with that amount of smearing for the Moore method was a little lower than the total error rate for that amount of ter Keurs smearing. The bottom row contains a Moore smearing with a widening factor of 6, and ter Keurs smearing of 1.125 octaves. The total error rate with that amount of smearing for the Moore method was a little higher than the total error rate for that amount of ter Keurs smearing. 85

3.3 Table of the average simulated confusion matrices for each processing condition. Averages were taken across subjects for a smearing multiplier of 1. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. 91

3.4 Table of the average simulated confusion matrices for each processing condition, assuming that all frequencies are audible. Averages were taken across subjects and smearing amounts. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. 92

3.5 Table of the simulated confusion matrices for approximately equivalent total smearing amounts for the Moore and ter Keurs methods, without recruitment, assuming that all frequencies are audible. The results shown in the table are the averages across all subjects. The top row contains a Moore smearing with a widening factor of 4.5, and ter Keurs smearing of 0.75 octaves. The total error rate with that amount of smearing for the Moore method was somewhat (~5%) lower than the total error rate for that amount of ter Keurs smearing. The bottom row contains a Moore smearing with a widening factor of 6, and ter Keurs smearing of 1.5 octaves. The total error rate with that amount of smearing for the Moore method was almost identical to the total error rate for that amount of ter Keurs smearing. 93

A.1 Confusion matrix for subject JB. 138
A.2 Confusion matrix for subject JO. 138
A.3 Confusion matrix for subject MG. 138
A.4 Confusion matrix for subject PW. 139
A.5 Confusion matrix for subject RC. 139

B.1 Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 141

B.2 Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 142

B.3 Confusion matrices for the simulation done with subject JO's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 143

B.4 Confusion matrices for the simulation done with subject JO's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 144

B.5 Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 145

B.6 Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 146

B.7 Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 147

B.8 Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 148

B.9 Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 149

B.10 Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 150
Chapter 1
Introduction
1.1
Background
Current technology makes it possible to do substantial signal processing in hearing aids. Accordingly, the goal of hearing aid research is to devise signal processing schemes that enable hearing-impaired individuals to experience sound as much like normal-hearing listeners as possible, or at least to understand speech consistently in various environments and at various levels. To develop such signal processing algorithms, it is necessary to understand how hearing-impaired people experience sound, so that any differences can be accounted for. Thus, hearing aid research is currently focused largely on developing models of how hearing-impaired people experience sound. This chapter discusses the physiology of hearing impairment and the perceptual consequences of hearing loss to provide background for understanding the models, and also reviews previous work on hearing loss simulations.
1.2
Physiology of Hearing Impairment
The first stages of the auditory system include the outer, middle, and inner ear, after
which the auditory nerve takes the resulting signals to the brain. The outer and
middle ear serve to collect and channel the sound into the head. The inner ear serves
several functions: it acts as a sort of frequency analyzer, to separate the sound into
different frequency bands; it amplifies the sound; and it converts the mechanical sound
signal into neural signals. These functions take place within the inner ear structure
known as the cochlea. The cochlea is essentially a coiled tube that is partitioned
lengthwise by several membranes. One of these is known as the basilar membrane
(BM), which is a stiff structure that performs the first stage of frequency analysis
through its mechanical properties: it is about 100 times stiffer at the base (the place
where the sound first encounters it) than at the apex (the other end from the base).
This variable stiffness means that different locations on the BM will respond (oscillate)
the best to different input frequencies: it acts as a frequency-to-place converter. The
frequency that causes the maximum response of a particular location on the BM is
known as the best frequency (BF) or characteristic frequency (CF) of that location.
High frequencies cause the basal end to be excited more, while low frequencies
travel down the length of the BM and create more of a response toward the apical
end.
Attached to the BM are rows of cells known as outer hair cells (OHCs). These
cells serve primarily as mechanical transducers, actively causing the BM to vibrate
even more in response to incoming sounds than it would on its own. This causes
the BM to be even more frequency selective (that is, exhibit sharper tuning to tone
stimuli) and causes the incoming sounds to be amplified. It has been found that
due to the OHCs, the transfer function between the incoming sound level and the
resulting BM oscillation displacement at a particular location is compressive for CF
tones. This compression takes place for tones at input levels between about 30-90 dB
SPL (Oxenham and Plack, 1997). A graph showing this transfer function is shown in
figure 1-1. For tones that are an octave or more lower than the CF of a location on
the BM, the BM responds linearly. This is also shown in figure 1-1.
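The compressive input-output behavior just described can be illustrated with a simple "broken-stick" function. This is only a schematic sketch, not a fitted model from this thesis or from Oxenham and Plack (1997); the knee points (30 and 90 dB SPL) and the compressive slope (0.2 dB/dB) are assumed round numbers.

```python
def bm_output_db(level_db, knee_lo=30.0, knee_hi=90.0, slope=0.2):
    """Broken-stick sketch of basilar-membrane response at CF.

    Growth is linear (1 dB/dB) below knee_lo, compressive (slope dB/dB)
    between the knees, and linear again above knee_hi.  Output is in
    arbitrary dB units; all parameter values are illustrative.
    """
    if level_db <= knee_lo:
        return level_db
    if level_db <= knee_hi:
        return knee_lo + slope * (level_db - knee_lo)
    return knee_lo + slope * (knee_hi - knee_lo) + (level_db - knee_hi)

# With a 0.2 dB/dB slope, a 60 dB change in input level (30 to 90 dB SPL)
# produces only a 12 dB change in response.
print(bm_output_db(30.0))  # 30.0
print(bm_output_db(90.0))  # 42.0
```

In an impaired cochlea with severe OHC loss, the middle segment would revert to a slope of 1, i.e. a purely linear response.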
The motion of the BM is detected by another type of cell that is found along the
length of the BM, the inner hair cell (IHC). These cells connect to auditory nerve
[Figure 1-1 appears here: masker level (dB SPL) as a function of signal level (dB SPL) for normal and impaired hearing, with on-frequency (6 kHz) and off-frequency (3 kHz) maskers.]
Figure 1-1: Figure showing how the basilar membrane is compressive for tones at
the best frequency of the location of measurement. This figure shows "the level
of a masker required to mask [a] 6-kHz [tone] signal, as a function of signal level"
(Oxenham and Plack, 1997). The plot of the on-frequency masker is linear, as would
be expected. However, the off-frequency masker shows compression, indicating both
that the basilar membrane is compressive and that it responds linearly for tones an
octave below the best frequency of a given location. Taken from Oxenham and Plack,
1997.
fibers, and act to convert the amplitude (or velocity) of BM oscillation into neural
firing patterns. In response to frequencies below about 2 kHz, the IHCs cause spikes
on the auditory nerve to be "phase-locked" to those frequencies, meaning that the
spike trains are in phase with the BM motion caused by the signal, and the temporal
patterns of the signal itself. At higher frequencies, auditory nerve fiber firing patterns
are not phase-locked to the BM motion, meaning that the timing of their spikes no longer follows the fine structure of the signal.
When sensorineural hearing loss occurs, typically the OHCs and/or IHCs are
damaged. This damage can occur in any location along the length of the cochlea,
which translates to different frequency regions having different amounts of hearing
loss due to the frequency-to-place transformation in the cochlea. IHC loss is primarily
a conduction loss, as the IHCs act as sensors of the BM motion. This type of hearing
loss can typically be corrected for by linear amplification of the incoming signal.
However, in the majority of cases, OHC loss is the more dominant effect in hearing
loss. The effect of OHC loss is that the sharp frequency tuning of the BM is lost in
regions where there has been OHC damage, since the OHCs will no longer function
as active amplifiers. Additionally, those regions of the cochlea do not have the gain in
signal level caused by the OHCs. This means that compression will not take place to
as large a degree. In cases of severe OHC loss, no compression or signal gain remains,
and the BM responds linearly to the incoming sound waves.
The current experiment, and hearing aid research in general, are focused on
studying the effects of hearing loss caused by OHC damage alone. The next section
will discuss the perceptual consequences of this type of hearing loss.
1.3
Psychoacoustics of Hearing Impairment
In discussing the perceptual consequences of hearing loss, it will be helpful to first
define some terms commonly used in discussing auditory perception. The first idea is
the "auditory filter." If the transfer function from a sound in air to a person's internal
perception of the sound is examined, a good model for this transfer function is a filter
bank of auditory filters. The specific form of the auditory filters will be described
later, but in general their center frequencies are logarithmically spaced above around
600 Hz (below this they have constant spacing). For moderate input sound levels
the filters are symmetric, but for high input sound levels they are asymmetric and
respond more to frequencies higher than the center frequency. The next idea is the
"excitation pattern," which is defined as "the output from the auditory filters as a
function of filter center frequency..." (Baer and Moore, 1993). Or in other words, if
a sound is played to a listener, the excitation pattern is the plot of how much each
auditory filter responds versus the center frequencies of the auditory filters.
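The bandwidths of these auditory filters are commonly summarized by the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), which is not stated explicitly in this chapter but underlies the filter shapes used later. A brief sketch:

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth of the normal auditory filter
    centered at f_hz, per the Glasberg and Moore (1990) fit:
    ERB = 24.7 * (4.37 * f_kHz + 1).
    """
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# Bandwidth is nearly constant at low frequencies and grows roughly in
# proportion to center frequency above a few hundred Hz, which is why the
# filters are approximately logarithmically spaced there.
for f in (100.0, 1000.0, 4000.0):
    print(f, round(erb_hz(f), 1))
```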
The psychoacoustic effects of OHC loss are generally described in terms of loudness
recruitment and absolute threshold reduction, and frequency and temporal smearing.
These will be discussed in turn.
Loudness recruitment is a phenomenon that stems from the reduced gain of the
cochlea in regions where there is OHC loss. In normal ears, quiet sounds are amplified
to a moderate level through the compressive behavior of the cochlea. However, at
and above about 100 dB, the incoming sounds are no longer amplified (Moore et al.,
1996). In impaired ears, quiet sounds are not amplified, and so below a certain level
threshold, sounds cannot be heard at all. This is an absolute threshold elevation. For
sounds just above the threshold, the perceived loudness increases more rapidly with
increases in signal level than it would for normal-hearing listeners. This is because of
the lack of or reduced compression in the cochlea, and because the perceived loudness
is related to the amount of BM motion. A graph showing equal-loudness contours for
subjects with unilateral hearing losses is shown in figure 1-2.
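Recruitment simulations of the kind evaluated later in this thesis typically expand the signal envelope so that loudness grows abnormally quickly above an elevated threshold. The sketch below is a deliberately crude, single-band version of this idea; the Moore and Glasberg (1993) method operates on the envelopes of several frequency bands, and the parameter values here (reference RMS, expansion power, frame length) are arbitrary.

```python
import numpy as np

def simulate_recruitment(x, fs, power=2.0, ref_rms=0.1, frame_s=0.01):
    """Single-band envelope-expansion sketch of recruitment simulation.

    Each frame's RMS level r is mapped to ref_rms * (r / ref_rms) ** power,
    so signals at the reference level pass unchanged while quieter signals
    are attenuated progressively more, mimicking the steeper loudness
    growth above an elevated threshold.
    """
    frame = max(1, int(frame_s * fs))
    y = np.asarray(x, dtype=float).copy()
    for i in range(0, len(y), frame):
        seg = y[i:i + frame]                      # view into y; edited in place
        r = np.sqrt(np.mean(seg ** 2)) + 1e-12    # frame RMS (avoid divide-by-0)
        seg *= (r / ref_rms) ** (power - 1.0)     # gain that expands the envelope
    return y
```

With power = 2, a signal 20 dB below the reference level comes out 40 dB below it, which is the qualitative shape of the loudness-matching curves in figure 1-2.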
Frequency and temporal smearing occurs because of the less-sharp tuning of the
BM in an impaired cochlea. Locations on the BM at which OHC damage has occurred
respond to a wider range of frequencies than they would in a normal ear.
The psychoacoustic effect of this is that an impaired listener's auditory filters are wider than normal, and the excitation patterns produced are also less
[Figure 1-2 appears here: loudness-matching data for three subjects, plotting level in the normal ear against level in the impaired ear (dB SPL), with the tone varied in either the normal or the impaired ear.]
Figure 1-2: Loudness matching curves for subjects with unilateral hearing loss. The
thin line at a 45-degree angle is the expected result if the two ears gave equal loudness
results for a tone at a given level. The fact that the actual data is at a steeper slope
than this indicates that for the impaired ear, as the stimulus sound pressure level
increases a little, the apparent loudness increases more rapidly than in a normal ear.
Also, the asterisks indicate the subjects' absolute thresholds; these are elevated in
the impaired ear relative to the normal ear. Taken from Moore et al., 1996, with the
45-degree line added.
sharp. Also, since a sum of many frequencies (which is what comes through a wider auditory filter) produces a less regular temporal pattern than a single frequency or just a few frequencies (which is what would come through normal-width auditory filters), the temporal patterns at the output of each auditory filter are more complex.
Moore et al. (1992) provide a convenient summary of the psychoacoustic changes
produced by hearing impairments:
(1) It would produce broader excitation patterns for sinusoidal signals.
For complex signals containing many harmonics (e.g. speech sounds) this
would make the individual components harder to resolve.
(2) For complex signals such as speech it would result in a blurring of
the spectral envelope, making spectral features such as formants harder
to detect and discriminate (Leek et al., 1987).
(3) It would change the time patterns at the outputs of the auditory filters,
generally giving more complex stimulus waveforms (Rosen and Fourcin,
1986).
It is also important to note that these perceptual effects of hearing loss are all
closely related: it is likely not possible to create neural stimuli that code for one
effect but not the others (Moore et al., 1992). In the process of changing either the
frequency or temporal patterns in a signal, the perception of the other will almost
certainly be changed due to their being coded in the same IHC responses.
1.4
Simulating Hearing Loss
1.4.1
Reasons for Simulating Hearing Loss
As stated in the background, in order to develop signal processing algorithms for use
in hearing aids, it is necessary to understand how hearing-impaired people experience
sound, so that any differences can be accounted for.
It would be very useful to
have a computer model that would accurately represent hearing-impaired listeners'
responses, because then simulations could be done using the model to predict real
listeners' responses and provide the optimal hearing aid correction. Also, it is useful to
model the psychoacoustic effects of hearing loss separately. Though it is not possible
to isolate the effects in real listeners, it would be beneficial to determine which effects are most deleterious to speech reception, because hearing aid signal
processing schemes may be able to correct for one effect (at a possible expense to the
other effects, but with a net benefit to the hearing aid user). Finally, hearing loss
models can be used to educate normal-hearing listeners about hearing impairments.
Such education could be useful for people who interact with hearing-impaired listeners
frequently.
1.4.2
Previous Work Simulating Hearing Loss
Simulations of hearing loss were first produced in the 1930s, when Steinberg and Gardner (1937) used masking noise to simulate increased hearing thresholds and
loudness recruitment.
However, with masking noise, the mechanisms in the ear and brain that cause the percept of hearing loss in normal-hearing listeners are thought to be different from the mechanisms of hearing impairment in real hearing-impaired listeners (Graf, 1997).
More recently, advances in signal
processing capabilities have allowed attempts to be made at more accurate hearing
loss simulations. Generally, studies have attempted to generate stimuli that, when
played to a normal-hearing listener, simulate one aspect of hearing loss at a time.
Some studies have simulated the aspects of reduced time or frequency resolution, and
some have simulated loudness recruitment. A few studies have used a recruitment
model in conjunction with a reduced-resolution model to try to simulate hearing loss
more generally. However, as was pointed out in the previous section, it is not generally
possible to create signals that, when presented to normal-hearing listeners, will evoke
a completely accurate perception of hearing loss in all respects.
Moore and Glasberg (1993) simulated threshold elevation and loudness
recruitment with a method used later in the current experiment.
They simulated
moderate and severe flat hearing loss conditions, and a sloping moderate-to-severe
hearing loss condition. They examined responses of normal-hearing subjects listening
to the stimuli, which were sentences containing key words. They found that with
speech processed in quiet, at low presentation levels the intelligibility was reduced. If
the speech was amplified to normal conversation levels before processing (simulating a
linear hearing aid), however, it was highly intelligible. In contrast, if the speech
contained a competing talker in the background, intelligibility was compromised even
at high presentation levels. These results were extended by Moore et al., 1995, who
simulated recruitment in the same way except with a background of speech-shaped
noise. They found that intelligibility was reduced for the unamplified case, but if
linear amplification was applied then intelligibility was almost as good as in the
unprocessed (control) condition; performance decreased much less than with the single-talker interference in the previous experiment.
A number of experiments have simulated reduced frequency selectivity. Baer and
Moore (1993) processed sentences and presented them at a moderate level to normal-hearing listeners, using an algorithm also used in the current simulation. They found
that if speech was processed in a background of quiet, even simulating auditory
filters with 6 times the normal width did not affect the intelligibility very much.
If the speech was processed in a background of speech-shaped noise, intelligibility
significantly decreased.
The intelligibility decreased more with larger amounts of
smearing and with lower signal-to-noise ratios. The authors state that these results
are consistent with the behaviors of real hearing-impaired listeners. Another study by
Baer and Moore (1994) extended these results by processing speech in the presence
of a competing talker. They found that intelligibility was reduced, but not as much as with the noise background.
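The spirit of such spectral-smearing algorithms can be conveyed by smoothing a short-term power spectrum with a frequency-dependent window whose width is a multiple of the normal auditory-filter bandwidth. The sketch below is not the Baer and Moore (1993) algorithm (which uses a matrix of broadened auditory filter shapes and overlap-add resynthesis); the Gaussian window and the broadening factor are simplifying assumptions.

```python
import numpy as np

def smear_spectrum(power_spec, fs, broaden=3.0):
    """Toy spectral smearing: each bin of a half-spectrum is replaced by a
    weighted average of its neighbors.  The averaging width is the normal
    ERB at that bin's frequency (Glasberg and Moore, 1990 formula) times a
    broadening factor, so smearing is wider at high frequencies.
    """
    n = len(power_spec)
    freqs = np.arange(n) * fs / (2.0 * (n - 1))   # bins of an rFFT half-spectrum
    out = np.empty(n)
    for k, f in enumerate(freqs):
        erb = 24.7 * (4.37 * f / 1000.0 + 1.0)    # normal auditory-filter width
        sigma = broaden * erb / 2.0               # Gaussian width (assumed form)
        w = np.exp(-0.5 * ((freqs - f) / sigma) ** 2)
        out[k] = np.dot(w, power_spec) / w.sum()
    return out
```

Applied frame by frame to a spectrogram, this flattens spectral peaks and valleys, which is the formant blurring described in the summary from Moore et al. (1992) above.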
ter Keurs et al. (1992) tested a different spectral smearing algorithm, also used
in the current simulation. They examined both the speech reception threshold (SRT)
for sentences and in a second experiment, vowel (and consonant-not discussed here)
confusions caused by the smearing. They found that the subjects' SRT increased
once the smearing bandwidth increased beyond the ear's critical bandwidth, about
1/3 to 1/2 octaves, and continued to increase after this point.
In the confusion
matrix experiments, the total error rate for vowel confusions remained close to zero
until somewhere between 1/2 and 2 octaves of smearing. The particular confusions
causing the errors tended to be with the back vowels. These results held for both
vowels in quiet and in a noise background. A second experiment by ter Keurs et al.
(1993) examined SRTs to sentences spoken by a male speaker (the first experiment
had used a female speaker) and found similar results regarding the critical bandwidth.
Hou and Pavlovic (1994) performed temporal smearing on vowel-consonant
nonsense syllables to evaluate its effect on speech intelligibility, using an algorithm
repeated in the current experiment. They found that with small smearing durations,
speech was not degraded. Even with the larger durations, speech intelligibility was
only reduced when the signals were low-pass filtered. Also, they demonstrated that
the spectral characteristics of the signals were largely unaltered.
Nejime and Moore (1997) simulated the combined effects of frequency smearing
followed by loudness recruitment on speech intelligibility, using the models from Baer
and Moore (1993) and Moore and Glasberg (1993).
Their stimuli were sentences
in speech-shaped noise. They tested a simulated moderate flat loss or moderate-to-severe sloping loss for both frequency smearing and loudness recruitment, and found
that speech intelligibility was degraded substantially, and could not be corrected for
by amplification alone.
In general, these studies have shown that simulations of frequency smearing and
loudness recruitment degrade speech intelligibility for sentences with key words, for
both male and female speakers. Temporal smearing does not seem to have as much
of an effect on speech intelligibility. However, aside from the ter Keurs et al. (1992) experiment, these studies have not verified the accuracy of the smearing methods beyond gross intelligibility rates. Even in the ter Keurs et al. (1992) paper, a
comparison between the confusion matrix resulting from simulated smearing and a
confusion matrix corresponding to real hearing-impaired listeners was not provided.
So, questions remain about how well these smearing methods simulate real hearing
impairments.
1.5
Problem Statement
Most previous studies on hearing loss simulation have examined the amount of speech
intelligibility degradation caused by the hearing loss simulations without examining
how well the percept evoked in normal-hearing listeners matched what hearing-impaired listeners experience. The simulations described in this thesis attempt to
evaluate different hearing loss models of reduced frequency or temporal resolution
and loudness recruitment/threshold elevation in order to determine their accuracy.
To accomplish this task, a model of human speech perception is employed.
This
provides a deterministic way of evaluating the hearing loss simulations, and allows
for multiple parameter values to be considered efficiently. It is also convenient that
the stimuli under consideration, vowels, are determined largely by their spectra; the
perceptual model being used operates solely on the basis of signals' spectra, allowing
discrimination to proceed without interference from other signal characteristics.
Chapter 2
Signal Processing in the
Experiment
2.1
Overview of Experiment
This thesis involved simulating hearing loss using several models.
In particular,
the models of hearing loss that were being compared and evaluated performed
frequency smearing and loudness recruitment. These models converted waveforms
corresponding to ordinary speech into waveforms that would evoke the percept of
hearing loss in normal-hearing listeners.
It was desired to determine how well these models simulated what real hearing-impaired people experience when they hear sounds.
In order to determine what
people experience, something was needed to convert the physical sound into a person's
perception of the sound. This could either be a real person (i.e., subjects could listen
to sounds processed by the hearing loss models and report on their perceptions), or
it could be a model simulating how normal-hearing people perceive sounds. In this
thesis, the latter option was chosen in order to try to accurately simulate the entire
pathway from a sound in air to what a hearing-impaired listener would experience
in response to that sound. The combination of a hearing loss model followed by a
perceptual model would ideally react the same way to stimuli as would a real person.
So, to test this simulation, the same stimuli were presented to real hearing-impaired
subjects as well as to the simulation, and the responses of each were compared.
2.2
Overview of Signal Processing
A block diagram showing an overview of all the signal processing steps that were done
is shown in figure 2-1. The bottom path in the block diagram shows the processing by
real subjects, while the top path shows the steps in the simulation. The stimuli used
to compare the simulation to real subjects were a set of CVC tokens, approximately half spoken by a male speaker and half by a female speaker.
2.2.1
Processing by Real Subjects
The real hearing-impaired subjects performed their "processing" as follows: five
hearing-impaired subjects listened to the set of CVC tokens, and confusion matrices
were generated based on the subject's responses.
This step was done by Merry
Brantley in the Sensory Communication group of the Research Laboratory of
Electronics at MIT in 1990 and 1991. Although confusion matrices for both the
vowels and the consonants were generated in the previous experiment by Brantley,
only the vowel confusion matrices were used in the current research.
2.2.2
Processing by the Simulation
In the simulation, processing was done as follows: first, the vowels were extracted
from the CVC tokens. Next, the extracted vowels were put through the hearing loss
models, which involved simulating frequency smearing, loudness recruitment, or a
combination of the two. The models used in this section were taken from papers by
Baer and Moore (1993), ter Keurs et al. (1992), Hou and Pavlovic (1994), Moore and
Glasberg (1993), Farrar et al. (1987), and Braida (1991). The hearing loss models
produced a set of waveforms that would sound to normal-hearing listeners as if the
listeners had hearing loss. The combination of the vowel extraction block and the
[Figure 2-1 appears here: Signal Processing Overview. Simulation path: CVCs said by a male and female speaker -> Extract Vowels -> Simulated Hearing Loss (frequency smearing and/or recruitment) -> Make Channels -> Compare channels to find d's -> matrix of d' values for vowel pairs -> Decision Model -> simulated confusion matrix. Real-subject path: hearing-impaired listeners -> confusion matrix. The two confusion matrices are compared with a Chi-squared measure.]
simulated hearing loss block formed the hearing loss model portion of the simulation.
The subsequent stages in the simulation made up the perceptual model portion of
the simulation.
The first stage of the perceptual model shall be referred to as the "d' discrimination
algorithm," and it was concerned with determining how different two sounds would
be perceptually. This model was created by Farrar et al. (1987). The first stage
of the model, which shall be referred to as the "make channels" block, took as an
input a single vowel waveform and essentially found the signal power in a number
of consecutive frequency bands. The outputs from this model will be referred to as
"channels." The next stage of the d' discrimination algorithm took the channels from
two different vowels, and computed a d' distance between them. This d' distance was
interpreted as being the distance between the two vowels in "perceptual space," that
is, how different they would be perceived to be by a human listener. In response to
the set of processed vowel waveforms, the d' discrimination algorithm generated a
matrix of d' distances between all vowel pairs.
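A minimal sketch of this kind of pipeline is shown below: band powers ("channels") are computed from the magnitude spectrum, and a d' distance is taken between channel vectors. The band edges and the Euclidean-distance form of d' are hypothetical stand-ins; the actual Farrar et al. (1987) model differs in detail.

```python
import numpy as np

def make_channels(x, fs, edges):
    """Sketch of a 'make channels' stage: power (in dB) of the signal in
    consecutive frequency bands delimited by `edges` (Hz), computed from
    the magnitude spectrum.  Band edges here are arbitrary.
    """
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spec[(freqs >= lo) & (freqs < hi)]
        out.append(10.0 * np.log10(band.sum() + 1e-12))  # floor avoids log(0)
    return np.array(out)

def dprime_distance(ch_a, ch_b, sigma_db=1.0):
    """One plausible (hypothetical) d' measure: Euclidean distance between
    channel-level vectors, scaled by an assumed per-channel level jnd."""
    return float(np.linalg.norm(ch_a - ch_b)) / sigma_db
```

Two vowels with similar formant structure yield nearby channel vectors and a small d'; spectrally distinct vowels yield a large one.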
The second stage of the perceptual model converted this set of d' distances into
a confusion matrix. This was done by the decision model block. This block first
used classical multi-dimensional scaling to produce a map of the vowels that was
reflective of the d' between-vowel distances. This map was also interpreted as being
in perceptual space, as its dimensions were in units of d'. To take into account
the perceptual uncertainty that is present when a person is exposed to a stimulus,
each point was converted into a Gaussian probability distribution centered at the
point's location. Finally, a decision algorithm was applied as follows: the centroid
of the locations for each vowel was computed. These centroids were interpreted to
be the "response centers" in perceptual space, while the individual vowel locations
surrounded by Gaussians were interpreted to be the "stimulus centers."
An ideal
detector algorithm was used such that the stimulus centers (or really, the parts of the
Gaussians around them) closest to a response center were classified as corresponding
to that response. In this manner a confusion matrix was generated corresponding to
the processed sounds, completing the simulation pathway.
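The decision-model stages described above (classical MDS, Gaussian perceptual noise, nearest-centroid classification) can be sketched as follows. This is a generic reconstruction under stated assumptions (unit-variance noise on the d' scale, simple nearest-centroid responses), not the thesis's exact implementation.

```python
import numpy as np

def classical_mds(d, dims=2):
    """Classical (Torgerson) multidimensional scaling: embed points so that
    Euclidean distances approximate the given d' distance matrix."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j                    # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)                 # ascending eigenvalues
    order = np.argsort(vals)[::-1][:dims]          # keep the top `dims`
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def simulate_confusions(points, labels, n_per_stim=200, rng=None):
    """Decision-model sketch: blur each stimulus point with unit-variance
    Gaussian noise (natural on a d' scale) and assign each noisy sample to
    the nearest response centroid.  Returns (label list, confusion matrix).
    """
    rng = np.random.default_rng(rng)
    labs = sorted(set(labels))
    idx = {l: k for k, l in enumerate(labs)}
    cents = {l: points[[i for i, x in enumerate(labels) if x == l]].mean(axis=0)
             for l in labs}
    conf = np.zeros((len(labs), len(labs)), dtype=int)
    for i, l in enumerate(labels):
        noisy = points[i] + rng.standard_normal((n_per_stim, points.shape[1]))
        for s in noisy:
            resp = min(labs, key=lambda m: np.linalg.norm(s - cents[m]))
            conf[idx[l], idx[resp]] += 1
    return labs, conf
```

Well-separated vowels (in d' units) produce a nearly diagonal confusion matrix; vowels closer together than a few d' units begin to be confused.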
Finally, to compare the responses created by the simulation to those generated
by real listeners, a Chi-squared-like measure was used to evaluate how closely the
confusion matrices matched.
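One common form of such a measure is sketched below; the exact Chi-squared-like statistic used in the thesis may differ.

```python
import numpy as np

def chi2_distance(obs, sim):
    """A Chi-squared-style distance between two confusion matrices:
    sum over cells of (obs - sim)^2 / max(sim, 1).  The floor of 1 in the
    denominator (an assumption here) keeps near-empty cells from dominating.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(((obs - sim) ** 2 / np.maximum(sim, 1.0)).sum())
```

Identical matrices give a distance of zero; larger values indicate a poorer match between the simulated and observed confusion patterns.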
2.3
Detail on the Brantley Confusion Matrix Experiment
The experiment that produced the confusion matrices used in this thesis examined
the effect of filtering on audibility in hearing-impaired listeners; it was conducted
by Brantley in 1990-1991. The five subjects in this experiment had moderate or
moderately severe hearing losses; three subjects' losses were flat, and two were sloping.
Audiograms for the five subjects are presented in figures A-1, A-2, and A-3, in
Appendix A.
The experiment used part of a set of 862 consonant-vowel-consonant (CVC)
syllables of the form /ə/-CVC, where /ə/ is the unstressed schwa. The set of CVC
syllables had the vowels /i, ɑ, u, ɪ, ɛ, ʊ/ and the 16 consonants /p, t, k, b, d, g, f,
θ, s, ʃ, v, ð, z, tʃ, h, ʒ/. The Brantley experiment limited its stimuli to the 12
consonants /p, t, k, b, d, g, s, f, θ, v, z, ð/, so only about 484 syllables were used.
The mean duration of the syllables spoken by the male speaker was 634 ms, and the
syllables spoken by the female speaker had a mean duration of 574 ms (the vowels
used in this thesis spoken by the male and female speakers had approximately equal
durations). The stimuli in the Brantley experiment were lowpass filtered at 9 kHz,
and digitized at a sampling rate of 20 kHz with 12-bit resolution.
Each stimulus was filtered into a low-passed portion (below 2500 Hz) and a
high-passed portion (above 2500 Hz). Each portion was presented separately to the
five subjects in a background of quiet, and the level was adjusted to maximize the
comprehension of the CVC stimuli. Then, the low-passed and high-passed portions
of the signal were added together, using the levels found in the first part of the
Subject    Low-passed portion    High-passed portion
           [dB SPL]              [dB SPL]
JB         80                    95
JO         110                   105
MG         107                   110
PW         85                    88
RC         80                    102

Table 2.1: Table of the levels at which the low-passed (below 2500 Hz) and high-passed
(above 2500 Hz) portions of the CVC syllables were presented to the subjects
in the confusion matrix experiment by Brantley (1990-1991).
experiment, and a confusion matrix experiment was done. In the experiment, each
stimulus was presented to each subject 50 times, and confusion matrices for the vowels
and consonants were generated. A table of the levels at which the high-passed and
low-passed portions of the signals were presented to each subject is shown in table
2.1.
Since the simulations done in this thesis attempted to duplicate this confusion
matrix experiment, a simulation was done using the parameters for each subject
(hearing loss and levels used in the experiment). It was hoped that the results of the
simulations would correspond to the confusion matrices produced by the respective
subjects. The description of the simulation in the following sections describes it as if
just one set of parameters was being used, but in reality the simulation was run five
times, corresponding to once for each subject. The confusion matrices produced by
the real subjects are shown in Appendix A.
2.4 Speech Stimuli and Vowel Extraction
The stimuli used in the simulation were the same as described above in the section on
the Brantley confusion matrix experiment, except that the simulation used the entire
set of stimuli with 16 possible consonants, instead of being limited to 12 consonants
as in the actual experiment. Also, the sampling rate of the stimuli in this thesis was
10 kHz instead of 20 kHz, thereby limiting the bandwidth of the stimuli to 5 kHz.
For the remainder of this thesis, the vowels /ɑ, i, ɛ, ɪ, u, ʊ/ will be referred to as AH,
EE, EH, II, OO, and UH, respectively.
Though these tokens contained both consonants and vowels, only the vowels were
used in the simulation. The vowels were thus extracted from the entire word and
used in isolation. This is another difference between the Brantley confusion matrix
experiment and the simulation in this thesis: in the Brantley experiment, the vowels
were heard in the context of consonants, but in this simulation they were in isolation.
Vowel extraction was achieved by making use of the fact that the energy in the vowels
was usually much greater than that of the consonants. The waveform for each word
was squared to find the power, and the envelope of this power signal was found using
a second-order lowpass filter with a cutoff frequency of about 10.3 Hz. Next, a threshold
was set at 0.2 times the maximum value of the power envelope. The longest stretch of
contiguous signal above this threshold was judged to be the vowel (this criterion
correctly identified the general region of the vowel in all cases). Next, to eliminate
the transition regions between the vowel and surrounding consonants, 20 ms of signal
was removed from the beginning of the vowel, and 35 ms of signal was removed from
the end of the vowel. The extracted vowels were listened to in order to verify that the
vowels had been extracted well. A few vowels were removed from the set due to clicks
or other signal artifacts. The syllable waveforms at this point had a sampling rate of
Fs = 10000 Hz. They were normalized to have a specified average power per unit time. It
was also at this point that the signal was scaled so that the levels above and below
2500 Hz matched those that were used in the confusion matrix experiment with real
subjects, for a given subject.
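The extraction steps above can be sketched as follows. The exact form of the second-order 10.3 Hz lowpass filter is not specified in the text, so a Butterworth filter applied with zero-phase filtering stands in for it here; the function and parameter names are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_vowel(x, fs=10000, cutoff=10.3, thresh_frac=0.2,
                  trim_start_ms=20, trim_end_ms=35):
    """Extract the vowel as the longest stretch of the power envelope above
    0.2 times its maximum, then trim the consonant transition regions."""
    power = x ** 2
    # the text specifies a second-order lowpass at ~10.3 Hz; a Butterworth
    # filter applied zero-phase (filtfilt) is an implementation choice here
    b, a = butter(2, cutoff / (fs / 2))
    env = filtfilt(b, a, power)
    above = env > thresh_frac * env.max()
    # locate runs of samples above threshold; edges come in (start, end) pairs
    edges = np.flatnonzero(np.diff(np.concatenate(([0], above.astype(np.int8), [0]))))
    starts, ends = edges[::2], edges[1::2]
    k = np.argmax(ends - starts)                  # longest contiguous run
    s = starts[k] + int(trim_start_ms * fs / 1000)
    e = ends[k] - int(trim_end_ms * fs / 1000)
    return x[s:e]
```

The 20 ms and 35 ms trims remove the vowel-consonant transition regions, at the cost of slightly shortening the extracted vowel.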
2.5 Signal processing simulating hearing loss
2.5.1 Methods of frequency smearing
Three methods of frequency smearing were considered: two simulated reduced
frequency selectivity in the impaired ear (Baer and Moore, 1993; ter Keurs et al.,
1992), and one simulated reduced temporal
resolution in the impaired ear (Hou and Pavlovic, 1994). These methods shall be
referred to as the "Moore," "ter Keurs," and "Hou" methods in short. In all of these
methods, the purpose of processing was to generate a time waveform that would
sound to normal-hearing listeners as if they had hearing loss.
Moore Smearing
The "Moore" method was first presented in Moore et al.
(1992), then modified
somewhat in Baer and Moore (1993) and later expanded upon and further modified
by Nejime and Moore (1997). The method used in this experiment is that presented
in the Baer and Moore (1993) paper. This smearing method attempted to create
an excitation pattern for a normal-hearing listene evoked by a processed sound that
would resemble the excitation pattern for an impaired-hearing listener by a normal
sound.
This attempted to duplicate the effect of the wider-than-normal auditory
filters and reduced frequency selectivity found in impaired-hearing listeners.
The
particular degree of widening was intended to resemble that found in individuals
with "moderate or severe cochlear hearing loss" (Baer and Moore, 1993).
This
processing notably did not attempt to duplicate any time-domain features of what
hearing-impaired listeners would experience, but only spectral differences, as it is
likely impossible to accomplish both simultaneously (Baer and Moore, 1993).
Block diagrams of the processing done in this method are shown in figure 2-2.
This method of smearing used overlap-and-add to process the input signal over time.
8 msec chunks of the waveform were extracted, then windowed, and an FFT was
done on each chunk. Next, the FFT was separated into the amplitude and phase
components. The phase component was left unmodified during the processing. The
amplitude component was squared to generate the power, and then was smeared as
follows. Auditory filters of normal width and of a width N times larger than normal
were calculated, with center frequencies at each of the frequencies in the FFT. These
auditory filters were stored in a matrix, such that if an FFT was multiplied by the
Figure 2-2: Block diagrams of the Moore frequency smearing algorithm. The top block
diagram shows the main processing sequence, and the bottom block diagram shows
detail for the "Smear Spectrum" block in the top part. Taken from Baer and Moore,
1993.
matrix, each frequency in the FFT would contribute its own excitation pattern to
the final result. In other words, the end result of an FFT multiplied by the matrix
was the excitation pattern generated by a stimulus with that FFT, and using those
auditory filters.
To create the effect of normal-hearing listeners having widened auditory filters,
the signal was multiplied by the wider-than-normal filter matrix, and then multiplied
by the inverse of the normal-width filter matrix. This procedure removed the effect of a normal-hearing listener's
own auditory filters, leaving just the wider-than-normal filters to create an excitation
pattern.
After this smearing procedure, the square root of the result was taken
to convert back to amplitude, and this was combined with the unmodified phase.
An inverse FFT was taken, and then different processed chunks of the signal were
combined using the overlap-add procedure to create the final output waveform.
The particular details of the processing were as follows. The original vowel signals,
at a sampling rate of Fs=10000 Hz, were resampled to 16000 Hz for the Moore
processing. Next, the vowel waveforms were divided up into frames for processing
(analogous to the inverse of overlap-and-add). Frames were of length 128 samples,
corresponding to 8 ms, with a shift of 64 samples (4 ms) between frames. The frames
were then windowed with Hamming windows of length 128 samples, then padded
with 64 zeros on each side, to create a total length of 256 samples. An FFT of each
frame was then taken, and separated into the amplitude and phase. The amplitude
component was squared, then multiplied by two matrices. One represented auditory
filters widened by a factor of N, and one was the inverse of a matrix of normal-width
auditory filters. The auditory filters were symmetric and were of the form of a roex(p)
filter:
W(g) = (1 + pg) exp(-pg),    (2.1)
"where g is the [frequency] deviation from the center frequency (fc) of the filter
divided by fc." W(g) is the resulting filter shape, and p is a measure of the sharpness
of the filter. p is calculated using the following equations:
p = 4fc/ERB,    (2.2)

ERB = 24.7(0.00437fc + 1).    (2.3)
In these equations (and elsewhere), ERB is the equivalent rectangular bandwidth
of the filter. To simulate widening of the auditory filters, the value of p calculated
via these equations was then divided by the widening factor (N). Or equivalently,
the calculated ERB was multiplied by the widening factor before calculating p. For
the normal-width filters, p was not modified. The frequencies in the FFT (and hence
the center frequencies of the auditory filters) ranged from 0 to 8000 Hz, in steps of
62.5 Hz.
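The roex(p) filter matrices of equations 2.1-2.3 and the multiply-then-invert smearing step might be sketched as follows. This is an illustrative reconstruction, not the thesis code; details such as the handling of the 0 Hz bin are assumptions.

```python
import numpy as np

def roex_filter_matrix(freqs, widen=1.0):
    """One symmetric roex(p) auditory filter per FFT frequency (eqs. 2.1-2.3).
    Row i holds the filter centered at freqs[i]; multiplying a power spectrum
    by this matrix yields an excitation pattern."""
    A = np.zeros((len(freqs), len(freqs)))
    for i, fc in enumerate(freqs):
        if fc == 0:
            A[i, i] = 1.0          # handling of the 0 Hz bin is an assumption
            continue
        erb = 24.7 * (0.00437 * fc + 1.0) * widen   # ERB, widened by N
        p = 4.0 * fc / erb                          # filter sharpness
        g = np.abs(freqs - fc) / fc                 # normalized deviation
        A[i] = (1.0 + p * g) * np.exp(-p * g)       # eq. 2.1
    return A

def moore_smear(power, freqs, widen=3.0):
    """Apply the widened filters, then remove the normal-width ones."""
    A_wide = roex_filter_matrix(freqs, widen)
    A_norm = roex_filter_matrix(freqs, 1.0)
    return np.linalg.solve(A_norm, A_wide @ power)
```

With widen=1.0 the two matrices cancel and the power spectrum passes through unchanged, which is a convenient sanity check on the inversion.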
After the smearing was performed, the frequencies higher than 6875 Hz (the 110th
frequency) were set to 0. Next, the square root of the result was taken to generate
the spectrum amplitude instead of power. Finally, the original phase was recombined
with the modified spectrum, and an inverse FFT was taken to create a time-domain
signal of length 256. These 256-point frames were recombined using the overlap-add
method with a shift between frames of 64 samples. In the Baer and Moore paper
(1993), the resulting signal was low-pass filtered at 7 kHz both before and after the
smearing process; however, this was not done in the current implementation of the
smearing algorithm. Finally, the power of the smeared waveform was normalized
to equal the power of the original input signal, in order to keep the level constant
across stimuli. Examples of the outputs of the processing by the Moore method in
the present simulation are shown in figures 2-3 and 2-4. The second figure shows the
spectrum converted into "channels," which essentially shows the excitation pattern
produced by the signal. The process of creating these channels will be explained later.
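The frame-based overlap-add pipeline described above can be sketched generically as follows, with the spectral smearing abstracted into a `modify` callback (a hypothetical name); this is a sketch of the framing and reconstruction only, not the thesis implementation.

```python
import numpy as np

def overlap_add_process(x, modify, frame_len=128, hop=64, pad=64):
    """Hamming-window 128-sample frames, zero-pad to 256, FFT, modify the
    magnitude (keeping the phase), inverse FFT, and overlap-add with a
    half-frame shift."""
    win = np.hamming(frame_len)
    n_fft = frame_len + 2 * pad
    out = np.zeros(len(x) + n_fft)
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = np.concatenate([np.zeros(pad),
                                x[start:start + frame_len] * win,
                                np.zeros(pad)])
        spec = np.fft.fft(frame)
        new_spec = modify(np.abs(spec)) * np.exp(1j * np.angle(spec))
        out[start:start + n_fft] += np.real(np.fft.ifft(new_spec))
    return out[pad:pad + len(x)]     # re-align with the input
```

With an identity `modify`, the output reproduces the input (scaled by the constant overlapped-window sum), which illustrates why the overlap-add reconstruction partially undoes the intended smearing: adjacent frames average back toward the original spectrum.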
It should be noted that in the implementation in this research, symmetric auditory
filters were used. This assumption is realistic at moderate sound levels, but both the
final stages of processing and the real hearing-impaired listener experiment used high
sound levels, such that this assumption would not be accurate. This is one potential
source of error in the results, and will be discussed later.
Also, the Baer and Moore (1993) paper discussed the actual effective degree of
smearing created by the model relative to the intended degree of smearing. This
is illustrated in figure 2-5, showing examples of a vowel spectrum before and after
smearing.
They found that, as a result of the overlap-add procedure, the final waveform had
an effective degree of smearing much lower than specified.
For an
intended smearing factor of 3, a representative vowel had an actual smearing factor
of about 1.5, as determined by a least-squares matching over the formant-containing
frequencies. This result also applies to the present simulation, and it is likely that it
applies to the ter Keurs et al. (1992) algorithm as well, which also uses the overlap-add method. The ter Keurs et al. algorithm used longer windows than the Baer and
Moore algorithm, so the overlap-add method may have less of an effect on reducing
Figure 2-3: Spectra resulting from different amounts of Moore smearing on the vowel
II taken from the word BIIDH. All files were put through the recruitment algorithm
but with no recruitment. The unsmeared signal is shown at the regular level, while
the other traces have been offset by -40 dB increments. The labels to the left of the
traces indicate the amount of smearing, in terms of the effective number of ERBs of
the widened auditory filters; US indicates the unsmeared signal.
Figure 2-4: Channels resulting from different amounts of Moore smearing on the vowel
II taken from the word BIIDH. All files were put through the recruitment algorithm
but with no recruitment. The unsmeared signal is shown at the regular level, while
the other traces have been offset by -8 dB increments. The labels to the left of the
traces indicate the amount of smearing, in terms of the effective number of ERBs of
the widened auditory filters; US indicates the unsmeared signal.
the effective degree of smearing.
ter Keurs
Like the Moore method of smearing, the ter Keurs method (ter Keurs et al., 1992,
and ter Keurs et al., 1993) simulated only reduced frequency resolution. Unlike the
Moore method, the algorithm for smearing did not attempt to simulate the frequency
resolution of hearing-impaired listeners exactly; rather, it was a more general-purpose
algorithm for reducing spectral resolution. The
model did not smear the actual spectrum of the signal, as did the Moore model, but
rather the spectral envelope, and then imposed the smeared spectrum on the original
signal: ter Keurs et al. (1992) write, "in this way spectral contrasts [are] gradually
reduced, but the signal's phase and harmonic structure [are] preserved." The smearing
was accomplished by first projecting the spectral envelope onto a log-frequency scale,
then convolving it with a Gaussian-shaped filter that effectively acted as a low-pass
filter. The convolution acted to average the envelope over B octaves, where B is the
effective ERB of the filter.
The overall processing system used in the ter Keurs et al. (1992) model is shown in
figure 2-6, and the specifics of the spectral smearing are shown in more detail in figure
2-7. A sample of the output of the model used by ter Keurs et al. is shown in figure
2-8, while samples of the output of the implementation of the ter Keurs method used in
the current simulation (which differs from the original slightly) are shown in figures
2-9 and 2-10. The signal processing sequence is rather similar to the Moore method,
in general; it divides the signal into frames, does an FFT, smearing, an inverse FFT,
then creates the new signal via the overlap-add method. In detail, the processing
was as follows. The vowel waveforms were first resampled from Fs = 10000 Hz to
Fs = 15625 Hz. (The waveforms used in the experiments run by ter
Keurs et al. (1992 and 1993) were bandlimited to 6250 Hz.) A vowel to be smeared
was then divided into 256-sample frames (corresponding to 16.4 ms) with a shift of 64
samples (4.1 ms) between frames. Each frame was then windowed using a Hamming
Figure 2-5: Sample spectra showing the results of smearing, for a synthetic vowel
/ae/. The different frames (vertical dimension in the figure) show successive chunks
of signal that are partially overlapping. As can be seen in the right column, showing
the output spectra, the effective degree of smearing is less than was intended. The
intended degree of smearing is seen in the middle column, which shows the results
after smearing but before the overlap-add process. Taken from Baer and Moore, 1993.
window, and padded with 384 zeros on each side to make a total length of 1024
samples. An FFT was taken of the result, and the power spectrum was found by
squaring the amplitude of the FFT.
At this point, the processing done in the current research differed slightly from
that which was described in the ter Keurs et al. papers. In the current simulation,
the envelope of the power spectrum was found by low-pass filtering it twice (in
succession) with a first-order low-pass filter with a 3 dB frequency of 894.4 Hz. In
the paper, the spectral envelope was found by taking the log of the power spectrum
then an inverse FFT to find the signal's cepstrum. In ter Keurs et al. (1992), they
write about the subsequent processing, "This cepstrum is "liftered" and through a
forward FFT the (log) spectral envelope is obtained. The envelope is then adjusted
to run smoothly over the spectral peaks." It is unclear at what point in ter Keurs
et al.'s implementation the inverse log of the spectral envelope was taken; they might
have performed the convolution with the Gaussian filter on the log spectral envelope
rather than the linear spectral envelope, as was done in the current
simulation. This is another potential way that the processing techniques could have
differed.
After the spectral envelope was found, it was projected onto a log base 2
frequency scale. In the current implementation, since this projection is non-uniform,
intermediate values were interpolated in a stepwise manner: the values between one
known point and the next were set to the value of the lower-frequency point. Next,
the log-scale power spectrum was smeared by convolving it with a Gaussian of the
following form:
W(log2 f) = exp(-((log2 f)/B)^2),    (2.4)
where log2 f is the base 2 logarithm of the frequency scale, and B is the ERB of the
Gaussian, in octaves, as previously stated. Further details about the mathematics
behind this smearing process can be found in ter Keurs et al. (1993). After the
log-scale power spectrum was smeared, it was converted back to a linear frequency
Figure 2-6: Block diagram showing the details of the ter Keurs processing method.
Taken from ter Keurs et al., 1992.
Figure 2-7: Block diagram showing the details of the Spectral Smearing block from
figure 2-6. Taken from ter Keurs et al., 1992.
Figure 2-8: Sample of the outputs of the ter Keurs frequency smearing algorithm.
The top log spectrum, labeled "US," shows the unsmeared signal, while the bottom
three spectra show the results of smearing of 1/8, 1/2, and 2 octaves, as labeled.
Taken from ter Keurs et al., 1992.
Figure 2-9: Spectra resulting from different amounts of ter Keurs smearing on the
vowel II taken from the word BIIDH. All files were put through the recruitment
algorithm but with no recruitment. The unsmeared signal is shown at the regular
level, while the other traces have been offset by -50 dB increments. The labels to
the left of the traces indicate the amount of smearing, in octaves; US indicates the
unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs
of smearing.
Figure 2-10: Channels resulting from different amounts of ter Keurs smearing on
the vowel II taken from the word BIIDH. All files were put through the recruitment
algorithm but with no recruitment. The unsmeared signal is shown at the regular
level, while the other traces have been offset by -12 dB increments. The labels to
the left of the traces indicate the amount of smearing, in octaves; US indicates the
unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs
of smearing.
scale, then imposed on the original spectrum by multiplying the original by the ratio
of the old and new amplitudes (i.e., the square root of the power spectrum). The
inverse FFT was taken, and the result was then windowed again. This second window
was of length 1024 samples, and had a "unity central part of 256 samples and sine
wave-squared skirts." Finally, the output signal was reconstructed by overlap-adding
the frames, with a shift of 64 samples between frames. The output signal was then
truncated appropriately to match the length of the input signal, and normalized to
match the power per unit time of the input signal.
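A sketch of the log-frequency Gaussian smearing of the spectral envelope (eq. 2.4) follows. The stepwise projection matches the description above; the grid resolution, kernel truncation, and interpolation back to the linear scale are assumptions of this sketch.

```python
import numpy as np

def terkeurs_smear_envelope(env, freqs, B=0.5, n_log=512):
    """Project a spectral envelope onto a log2-frequency scale, convolve it
    with the Gaussian of eq. 2.4 (ERB of B octaves), and project back."""
    pos = freqs > 0
    logf = np.log2(freqs[pos])
    grid = np.linspace(logf[0], logf[-1], n_log)
    # stepwise projection: hold the value of the lower-frequency point
    idx = np.searchsorted(logf, grid, side='right') - 1
    env_log = env[pos][idx]
    step = grid[1] - grid[0]
    u = np.arange(-(n_log // 2), n_log // 2) * step
    w = np.exp(-(u / B) ** 2)
    w /= w.sum()                                   # unit-area smearing kernel
    smeared = np.convolve(env_log, w, mode='same')
    out = env.copy()
    out[pos] = np.interp(logf, grid, smeared)      # back to the linear scale
    return out
```

Because the kernel is normalized to unit area, a flat envelope passes through unchanged (away from the edges), while spectral peaks are averaged over roughly B octaves and therefore flattened.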
Hou
The method of smearing created by Hou and Pavlovic (1994) attempted to simulate
the reduced temporal resolution found in hearing-impaired listeners. As such, the
frequency content of the processed signals was close to normal; indeed, one of the
intentions of the authors was to not corrupt the signal's spectrum, as they intended to
isolate the reduced temporal resolution aspect of hearing loss while leaving out the
other aspects of hearing loss. Also unlike the previous two models, this algorithm does
not use an overlap-add paradigm, but instead first filters the signal into a number of
adjacent frequency bands, then does processing on the entire waveform for each band.
This was done to simulate the ear's analysis of the sound by the "auditory filters"
into similar adjacent frequency bands (or channels).
By doing temporal smearing
within each frequency region separately, the brain would be presented with a signal
that would be similar to what hearing-impaired listeners' brains would receive.
The model's method of reducing temporal resolution was based on a model of
human perception that included a few stages beyond the separation of the signal into
different frequency channels. A block diagram of this model is shown in figure 2-11.
In this model, after the signal is filtered into adjacent frequency bands, a nonlinear
device transforms the signal. A squaring operation was assumed by the authors,
although other nonlinear devices such as rectification are possible (Hou and Pavlovic,
1994). After the nonlinear stage is a temporal window that uses the information under
Figure 2-11: Block diagram of perceptual model of human temporal processing, used
by Hou and Pavlovic in providing rationale for their temporal smearing model. Taken
from Hou and Pavlovic, 1994.
the window; and finally a decision device compares the resulting signals. The Hou
and Pavlovic model incorporates stages to exploit the properties of this perceptual
model, as is explained more in the following paragraph.
The specifics of the implementation of the Hou and Pavlovic (1994) processing
method are as follows (see figure 2-12).
The 10000 Hz vowel signals were first
resampled to 16000 Hz, to have a sampling rate similar to the other smearing methods.
Next, the signal is filtered into 32 channels by fourth-order Gammatone filters spaced
one ERB apart. The Gammatone filter has a shape very similar to that of the roex(p)
filter used in the Moore method. (The authors considered using roex(p) filters, which
are presumably a better representation of the true auditory filters, but decided on
Gammatone filters for their calculation efficiency.) For the purposes of implementing
the described model as closely as possible, the current research also used Gammatone
filters to create the channels for this stage. The impulse response of the Gammatone
filter is
γ(t) = t^3 exp(-2πbt) cos(2πf0t),    (2.5)
Figure 2-12: Block diagram of the Hou temporal smearing model. The top diagram
(a) shows the overall signal processing path, and the bottom diagram (b) shows the
specifics of how temporal smearing is accomplished. Figure reproduced from Hou and
Pavlovic, 1994.
"where t is time, fo is the center frequency of the filter, and b = 1.019B is the
parameter determining the sharpness of the filter," with B being the ERB of the filter
(Hou and Pavlovic, 1994). In implementing the Gammatone filter bank, the paper
used 23 filters covering a frequency span of 200-5000 Hz, as the stimuli they used
were bandlimited to this frequency range. However, in the present implementation
stimuli were used that had frequencies below 200 Hz and (due to upsampling to 16000
Hz) up to 8000 Hz. Though the frequencies above 5000 Hz were generated by the
upsampling process and were extremely small in magnitude, filters were used up to
this frequency to make the results as compatible as possible with the other smearing
methods.
After the signal is separated into channels, each channel is processed separately
for smearing. The temporal smearing process, shown in figure 2-12 (b), first involved
finding the temporal envelope of the channel, using a Hilbert transform. The envelope
was then squared, and convolved with a "smearing temporal window." This smearing
window has a shape similar to that of the auditory temporal window used in the
perception model described above; both of these windows are roex(Tp) functions.
These have the form of
W(t) = (1 + 2t/Tp) exp(-2t/Tp),    (2.6)
where t is time, Tp is a time constant related to the window, and ERD = 2Tp, where
ERD is the equivalent rectangular duration of the window. In the paper, the value
of ERD ranged from 0 to 28 ms. After the envelope of the signal was convolved with
this temporal window, a square root was taken. Through this process of squaring the
envelope, convolving it with the auditory temporal window, and taking the square
root, the net effect was that, for a normal-hearing listener hearing the smeared signal,
the effective auditory window was the convolution of the listener's internal auditory
window and the smearing temporal window:
Wperceived,new(t) = Wsmearing(t) * Winternal(t).    (2.7)
Following the square root, the original channel waveform is then scaled to have
the new envelope (appropriately time-aligned). This preserves the fine structure but
changes the envelope of the channel. Then, each channel is re-filtered with the same
Gammatone filter that was used to generate the channel in the first place, in an
attempt to reduce the spectral consequences of the time-domain smearing process.
Finally, the smeared channels are added together again to produce the final output
signal, and the result is normalized to the power per unit time of the original input
signal. A sample of outputs from the model used by Hou and Pavlovic is shown in
figure 2-13, while examples of outputs from the model used in the present simulation
(which differs slightly from the original) are shown in figures 2-14 and 2-15.
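The per-channel temporal smearing (envelope squared, convolved with a roex(Tp) window, square root, envelope re-imposed) might be sketched as follows; the window length and normalization are assumptions of this sketch, not values from the paper.

```python
import numpy as np
from scipy.signal import hilbert

def roex_window(t, Tp):
    """roex(Tp) temporal window, eq. 2.6."""
    return (1.0 + 2.0 * t / Tp) * np.exp(-2.0 * t / Tp)

def hou_smear_channel(ch, fs=16000, erd_ms=28.0):
    """Square the Hilbert envelope of one channel, convolve it with the
    smearing window, take the square root, and re-impose the result on the
    fine structure."""
    env = np.abs(hilbert(ch))                 # temporal envelope
    fine = ch / np.maximum(env, 1e-12)        # fine structure
    Tp = (erd_ms / 1000.0) / 2.0              # ERD = 2*Tp
    t = np.arange(0.0, 8.0 * Tp, 1.0 / fs)    # window support (assumption)
    w = roex_window(t, Tp)
    w /= w.sum()                              # preserve the envelope level
    sm_power = np.convolve(env ** 2, w)[:len(env)]
    return fine * np.sqrt(np.maximum(sm_power, 0.0))
```

Applied to an amplitude-modulated channel, this leaves the carrier (fine structure) intact while strongly attenuating envelope fluctuations faster than roughly 1/ERD, which is exactly the reduced temporal resolution the model aims to simulate.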
2.5.2 Method of Recruitment
The method of loudness recruitment was taken from the paper by Moore and Glasberg
(1993). The algorithm simulates the effects of the threshold elevation found in hearing-impaired people, and the rapid increase in loudness that occurs at audible levels.
Loudness recruitment appears to stem from impaired-hearing listeners not having
as much of the compressive nonlinearity normally found in the cochlea.
Since a
normal-hearing listener's cochlea will act to compress incoming sounds over a large
range of levels, to induce the perception in normal-hearing listeners of not having
this compression, the processing procedure expanded the envelopes of the signals
exponentially. Also, in listeners with only one impaired ear, the loudness perceived in
the impaired ear equals the loudness in the normal ear at a certain (high) level. This
effect was also included in the simulation by not expanding portions of the signal
above a threshold.
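A minimal sketch of envelope expansion for a single band follows, assuming a hypothetical expansion exponent `n` and a `match_level` above which the signal is left unchanged; the actual Moore and Glasberg (1993) algorithm derives these quantities per frequency band from the simulated hearing loss.

```python
import numpy as np
from scipy.signal import hilbert

def expand_channel(ch, n=2.0, match_level=1.0):
    """Raise the envelope of one band to the power n (expansion), scaled so
    that an envelope at match_level is unchanged; levels at or above
    match_level are left alone, mimicking loudness matching at high levels."""
    env = np.abs(hilbert(ch))
    fine = ch / np.maximum(env, 1e-12)
    new_env = np.where(env < match_level,
                       match_level * (env / match_level) ** n,   # expanded
                       env)                                      # unchanged
    return fine * new_env
```

With n = 2, a signal 20 dB below the match level comes out 40 dB below it, reproducing the steeper-than-normal growth of loudness near threshold while leaving high-level portions untouched.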
A general block diagram of the recruitment processing procedure can be found
in figure 2-16. As can be seen in this diagram, the signal to be processed was first
broken into a number of frequency bands.
As with the similar step in the Hou
smearing method, this step confined the effects of the processing so that the different
"channels" would be processed independently.
In this recruitment algorithm, the
Figure 2-13: Sample of outputs from the Hou temporal smearing model. The top
panels (a and b) show the unsmeared signal and its spectrum, and the lower panels
show the signal and its spectrum after it has been smeared with an ERD of 0.1 ms
and 28 ms, respectively. Figure taken from Hou and Pavlovic, 1994.
Spectra of signals with different amounts of Hou smearing.
100
50
us
14
0M28
0
-50
42
- 100
56
-150
84
-200
!
0
500
1000
1500
2000
2500
frequency [Hz]
3000
3500
4000
4500
5000
Figure 2-14: Spectra resulting from different amounts of Hou smearing on the vowel
II taken from the word BIIDH. All files were put through the recruitment algorithm
but with no recruitment. The unsmeared signal is shown at the regular level, while
the other traces have been offset by -50 dB increments. The labels to the left of the
traces indicate the amount of smearing, that is, the equivalent rectangular duration
of the smearing temporal window, in msec; US indicates the unsmeared signal.
Channels corresponding to signals with different amounts of Hou smearing.
Figure 2-15: Channels resulting from different amounts of Hou smearing on the vowel
II taken from the word BIIDH. All files were put through the recruitment algorithm
but with no recruitment. The unsmeared signal is shown at the regular level, while
the other traces have been offset by -8 dB increments. The labels to the left of the
traces indicate the amount of smearing, that is, the equivalent rectangular duration
of the smearing temporal window, in msec; US indicates the unsmeared signal.
[Block diagram: INPUT -> AUDITORY FILTER BANK -> CONSTRUCT ANALYTIC SIGNAL -> SEPARATE ENVELOPE AND FINE STRUCTURE -> EXPAND AND SCALE ENVELOPE -> RECOMBINE ENVELOPE AND FINE STRUCTURE -> ADD UP CHANNELS]
Figure 2-16: Block diagram of the model of loudness recruitment by Moore and
Glasberg, 1993. The parallel arrows between the middle blocks represent the outputs
of the 23 bandpass filters in the first block. Figure taken from Moore and Glasberg,
1993.
filters used to create these channels had bandwidths similar to those of the auditory
filters that would be found in impaired-hearing listeners. This was not to simulate
additional frequency smearing, but to try to duplicate the frequency independence that
would be found in impaired-hearing listeners. Furthermore, this bandpass filtering
allowed the amount of recruitment being simulated at different frequencies to be
varied. Next, the envelope of each channel was found, and the fine structure was
set aside for later use. The envelope of the signal was raised to a power N, then
the fine structure was recombined with the now-recruited envelope. The purpose of
expanding the envelope while leaving the fine structure alone was to minimize
spectral distortion. Finally, the channels were recombined to form a single output
waveform.
The details of the processing are as follows. In the paper, the filters in the initial
filter bank were each formed of a series of four first-order Gammatone filters. In
the present implementation, however, it was decided to use a single fourth-order
Gammatone filter as had been done in the Hou method. The simulation described in
the paper used thirteen filters, with center frequencies from 100 to 5837 Hz. However,
it was found that using the filters as described caused undesired spectral distortion in
the resulting signal. So, instead, 23 filters were used, with center frequencies from 100
to 9190 Hz. The first six filters were spaced every 100 Hz, and thereafter they had a
spacing of one ERB, with the ERBs corresponding to the auditory filters of a
normal-hearing listener, 0.16 fcenter.¹ The filter center frequencies were the same as those
¹The 100-Hz spacing of the low-frequency auditory filters with non-constant ERBs probably
contributed to some spectral distortion, but having logarithmically-spaced auditory filters there
used later in the d' discrimination model, and the filter spacings were described in
that paper (Farrar et al., 1987). Though the filters in the recruitment algorithm were
spaced every ERB that a normal-hearing listener would have, the filters themselves
had ERBs three times that width, corresponding to the ERBs of impaired-hearing
listeners, as explained above. This led to a gain in the signal which was removed at
the end of the processing for this algorithm.
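The filter-bank stage just described can be sketched as follows. This is a minimal Python/NumPy illustration rather than the thesis implementation: the FIR approximation of the fourth-order Gammatone, the 1.019 ERB-to-bandwidth factor, and the function names are assumptions, and the sampling rate must be high enough to cover the highest center frequency.

```python
import numpy as np

def gammatone_fir(fc, erb, fs, dur=0.025):
    """FIR approximation of a fourth-order gammatone filter.
    The bandwidth parameter b relates to the ERB of a fourth-order
    gammatone by b ~= 1.019 * ERB (a standard approximation)."""
    t = np.arange(1, int(dur * fs)) / fs
    b = 1.019 * erb
    h = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return h / np.max(np.abs(np.fft.rfft(h)))      # roughly unit peak gain

def filter_bank_outputs(x, fs):
    """23 channels: centers every 100 Hz up to 600 Hz, then spaced one
    normal ERB (0.16*fc) apart; filter ERBs are 3x the normal width."""
    centers = [100.0 * k for k in range(1, 7)]
    while len(centers) < 23:
        centers.append(centers[-1] * 1.16)         # spacing of one ERB
    chans = [np.convolve(x, gammatone_fir(fc, 3 * 0.16 * fc, fs), mode="same")
             for fc in centers]
    return np.array(chans), centers
```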
After filtering into channels, the envelope of each channel was found via a Hilbert
transform, and the fine structure was calculated by dividing the original signal by
the envelope.
Next, the envelope was filtered with a low-pass filter with a 3-dB
point of 100 Hz. The purpose of this step was to prevent spectral distortion; also,
Moore and Glasberg (1993) found that if the envelope were not low-pass filtered, the
resulting sounds "had a somewhat 'crunchy' quality." After the channel envelopes
were low-pass filtered, each envelope was expanded in the following manner. It was
assumed that the loudness recruitment would not take place above 100 dB, that is,
above this level the loudness hearing-impaired listeners would experience would match
the loudness experienced by normal-hearing listeners.
If the unprocessed channel envelope, in dB, is Lu; the envelope after amplitude
expansion is Lp; and the power that the envelope is raised to (only below 100 dB)
is N, then the following formula was used to simulate amplitude expansion:

Lp = NLu + K.  (2.8)

K is a constant defined by K = (1 - N)100 such that at 100 dB, Lu = Lp = 100
dB. Substituting in for K, we have:

Lp = NLu - (N - 1)100.  (2.9)
If Lu was above 100 dB, Lp was set to equal Lu. This method of expansion was
the one used in Nejime and Moore (1997), while the method in Moore and Glasberg
would have been even more unnatural. The best solution would probably be to have fixed-width
auditory filters with ERBs corresponding to an fcenter of 600 Hz.
(1993) multiplied the original Hilbert envelope by the low-pass filtered envelope raised
to the power N - 1.
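The envelope expansion just described can be sketched as follows. This is an illustrative Python/SciPy sketch, not the thesis code: the function name is hypothetical, and it assumes envelope levels are expressed in dB re an arbitrary reference.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def expand_channel(chan, fs, N, limit_db=100.0):
    """Expand one channel's envelope: Hilbert envelope, 100-Hz low-pass,
    then Lp = N*Lu - (N - 1)*100 for levels below the 100-dB limit
    (levels at or above the limit are left unchanged)."""
    analytic = hilbert(chan)
    env = np.maximum(np.abs(analytic), 1e-12)
    fine = chan / env                              # fine structure
    b, a = butter(2, 100.0 / (fs / 2.0))           # 3-dB point at 100 Hz
    env = np.maximum(filtfilt(b, a, env), 1e-12)   # smooth the envelope
    Lu = 20.0 * np.log10(env)                      # envelope level in dB
    Lp = np.where(Lu < limit_db, N * Lu - (N - 1.0) * limit_db, Lu)
    return fine * 10.0 ** (Lp / 20.0)              # recombine
```

With N = 1 the expansion is the identity on the envelope level, matching the no-recruitment condition described later.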
The current implementation assumed that a value of N = 2 corresponded to a 50
dB hearing loss. At this point, an attempt was made to fit the simulation's parameters
to the audiograms of the different real subjects that took part in the confusion matrix
experiment. To do this, for each subject, different values of N were assigned to each
channel in accordance with how much hearing loss the subject had at that frequency.
Since loudness recruitment is caused by outer hair cell loss, the amount of which is
correlated with the amount of hearing loss, the value of N for the channel with center
frequency fc was determined by the equation

N(fc) = 100 / (100 - HL(fc)),  (2.10)

where HL(fc) is the amount of hearing loss at frequency fc, in dB.
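Equation 2.10 can be applied per channel by interpolating a subject's audiogram at the channel center frequencies; a brief sketch, in which the audiogram values and channel center frequencies are purely illustrative:

```python
import numpy as np

def recruitment_exponent(hl_db):
    """Equation 2.10: N = 100 / (100 - HL), with HL the hearing loss in dB."""
    return 100.0 / (100.0 - np.asarray(hl_db, dtype=float))

# Per-channel N from an audiogram, interpolated at the channel center
# frequencies.  These audiogram and center-frequency values are made up.
audiogram_f = [250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]   # Hz
audiogram_hl = [10.0, 20.0, 35.0, 50.0, 60.0, 70.0]            # dB HL
channel_fc = [100.0, 300.0, 600.0, 1200.0, 2400.0, 4800.0]     # Hz
N_per_channel = recruitment_exponent(np.interp(channel_fc, audiogram_f, audiogram_hl))
```

A flat 50-dB loss gives N = 2, consistent with the assumption stated above.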
Finally, the independently-recruited channels were added back together, and the
output was scaled so that if the recruitment parameter was N = 1, i.e. no recruitment
took place, then the level of the resulting spectrum matched the original signal's
spectrum as much as was possible. The algorithm did modify the signal slightly, even
if no recruitment took place, through the process of dividing the signal into channels
and adding them back together again. Also, in the implementation by Moore and
Glasberg (1993), the stimuli they used were low-pass filtered at 7000 Hz. Examples of
the output from the current simulation are shown in figures 2-17, 2-18, and 2-19. As
can be seen in these figures, the recruitment process produces both spectral distortion
and absolute level shifts for signals below 100 dB, while for signals at 100 dB there
is a little spectral distortion, and for signals well above 100 dB there is no distortion
or change in level.
Channels corresponding to different amounts of recruitment on an originally 100 dB signal.
Figure 2-17: Channels resulting from different amounts of recruitment on the vowel
II taken from the word BIIDH. The vowel was at a level of 100 dB. Recruitment of
N = 1.0 (where N is the exponent to which the envelope is raised) corresponds to
a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of
50 dB, and in general the recruitment amounts correspond to hearing losses via the
equation N = 100 dB/(100 dB - hearing loss in dB). No additional offsets were used in
creating the plot: the effective level changes with increasing recruitment due to the
increased absolute threshold.
Channels corresponding to different amounts of recruitment on an originally 80 dB signal.
Figure 2-18: Channels resulting from different amounts of recruitment on the vowel
II taken from the word BIIDH. The vowel was at a level of 80 dB. Recruitment of
N = 1.0 (where N is the exponent to which the envelope is raised) corresponds to a
hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50
dB, and the other recruitment amounts correspond to hearing losses via the equation
N = 100 dB/(100 dB - hearing loss in dB). No additional offsets were used in creating
the plot: the effective level changes with increasing recruitment due to the increased
absolute threshold.
Effect of recruitment on different signal levels
Figure 2-19: Channels resulting from different amounts of recruitment on the vowel
II taken from the word BIIDH. The solid lines are the result of recruitment with a
simulated loss of 0 dB, and the dotted lines are the result of recruitment with a flat
loss of 50 dB. The dashed line shows channels made from the original signal without
recruitment, showing that simulating a loss of 0 dB does not distort the signal very
much. Signals with levels of 120, 100, and 80 dB were recruited corresponding to
losses of 0 dB and 50 dB, creating pairs of lines for each signal level. The 120 dB
signal is above 100 dB, the limit of recruitment, so recruitment has little effect. The
100 dB signal has its highest parts at around the limit of recruitment, so they are not
affected, while the higher-frequency portions of the signal are lower in level and so are
distorted. The 80 dB signal is severely distorted and reduced in level by recruitment.
First Stage          Second Stage
(no smearing)        (none)
Moore smearing       (none)
ter Keurs smearing   (none)
Hou smearing         (none)
(no smearing)        Recruitment
Moore smearing       Recruitment
ter Keurs smearing   Recruitment
Hou smearing         Recruitment
Table 2.2: Table of the different processing conditions used in the simulation.
2.5.3 Combining the Smearing Methods and Recruitment
In the simulation, in some processing conditions the various smearing methods were
applied in isolation, and in other processing conditions the model of recruitment was
applied after one of the smearing methods had been applied. The case of smearing
followed by recruitment was previously studied by Nejime and Moore (1997) using
the same Moore frequency smearing function and model of recruitment used in the
current simulation. Since the model of recruitment tended to modify the spectrum
a little even if no recruitment was being done, in the current simulation all of the
processing conditions were put through the recruitment model, but for the conditions
where recruitment was not desired, a recruitment parameter of N = 1 was used. The
eight different processing conditions used in this simulation are shown in table 2.2.
In the table, an entry specifying recruitment indicates a value of N other than 1.
2.6 d' Discrimination Model
The next steps in the signal processing chain are part of the perceptual model. This
model is based around the d' discrimination model (Farrar et al., 1987). In essence,
the d' discrimination model assumes that people will discriminate between sounds
based on their spectral differences. So, the model first finds a representation of the
spectra of the signals to be compared. It assumes that signals will be classified based
on the linear differences between their spectra, then it computes a linear difference
between pairs of spectra. Support for the model's accuracy is found in the Farrar et
al. (1987) paper, in which listeners' responses to speech sounds modeled by shaped
noise were reasonably well predicted by the model.
The model involved two sub-blocks, which are illustrated in figure 2-20. In the
first block, the make channels block, the model filters the signal into a number of
bands spaced by one ERB, as was done in the Hou and recruitment models. The
center frequencies of the filters were the same as those used in the recruitment model.
As with the Hou and recruitment models, this stage is meant to simulate the auditory
filtering performed in humans. One difference from the Hou and recruitment models
was the specific shape of the filters used to form the channels. The Farrar et al. paper
used an earlier model of the filters, also derived by Patterson (1974). The form of
these filters was

Hj(f) = [1 + ((f - fc,j)/α)²]^(-2),  (2.11)

where fc,j is the jth filter's center frequency, and α is a constant that determines the
width of the filter via ERB = (π/2)α. The ERBs of the filters were set to 100 Hz for
the lowest 6 filters (up to 600 Hz), and to 0.16 fc,j for the filters centered at
frequencies higher than 600 Hz. There were 23 filters total.
power per unit length was found for each filter output, then the natural log (ln) was
taken of the result. The resulting quantity shall be referred to as the "channels" from
a signal. It is important to note that though the signals used had different lengths,
and even the ter Keurs processing method required a different sampling rate than
the other methods, the process of finding the power per unit time accommodates
these differences. The steps just described, filtering the signal into these channels, took
place in the make channels block in the main block diagram in figure 2-1, as well as
in figure 2-20.
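The make channels computation can be sketched in the frequency domain as follows. This is an illustrative Python sketch, not the thesis code: the function name is hypothetical, and it assumes the filter weighting may be applied directly to the power spectrum rather than by filtering in the time domain.

```python
import numpy as np

def make_channels(x, fs, n_filters=23):
    """'Make channels' sketch: weight the power spectrum with filters
    Hj(f) = [1 + ((f - fc,j)/alpha)^2]^(-2), sum the weighted power,
    divide by the signal length, and take the natural log."""
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    P = np.abs(np.fft.rfft(x)) ** 2
    centers = [100.0 * k for k in range(1, 7)]
    while len(centers) < n_filters:
        centers.append(centers[-1] * 1.16)         # one normal ERB apart
    out = []
    for fc in centers:
        erb = 100.0 if fc <= 600.0 else 0.16 * fc
        alpha = erb / (np.pi / 2.0)                # ERB = (pi/2) * alpha
        H = (1.0 + ((f - fc) / alpha) ** 2) ** -2.0
        out.append(np.log(np.sum(H * P) / len(x) + 1e-300))
    return np.array(out)
```

Dividing by the signal length before the log is what makes channels from stimuli of different lengths (and sampling rates) comparable, as noted above.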
The next block in the d' discrimination algorithm compares the channels from
[Diagram panels: "Make Channels" (filter into 23 channels; square, sum over the vowel length, normalize, take ln) and "Compare vowel pairs to find d's" (channel differences Δi between vowels a and b; d' = sqrt(Σi Δi²)/σ).]
Figure 2-20: Block diagram of d' discrimination model. The bottom block comes
after the top block in the processing path. The top block is used to create channels
from all vowels processed, and the bottom block takes all combinations of pairs of
vowels, and produces a d' distance between each pair. In the top figure, the signal
is first filtered into bands. Then, in each band, the average power per unit length
of the signal is found by first squaring it, summing over the signal length, dividing
by the signal length, and then normalizing to account for variation in signal length
between stimuli. Finally, the natural log of the resulting number is taken to produce a
channel. In the bottom diagram, the difference (Δi) between corresponding channels
is found for a pair of stimuli, as indicated in the plot. These differences are then put
through the formula in the bottom of the figure. σ is an arbitrary constant that is a
scale factor for all d' distances.
pairs of signals to find the d' distance between each pair of signals. This is illustrated
in the bottom part of figure 2-20. For a given pair of signals, the difference between
each corresponding channel from the two signals was found, labeled Ai in the figure.
Then, the d' value was found via

d' = sqrt( Σ_{i=1}^{23} Δi² ) / σ,  (2.12)

in which Δi and σ have been previously described. At the output of this processing
block, then, was a matrix of d' values for all combinations of signal pairs.
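Equation 2.12, applied to all pairs of channel vectors at once, can be sketched as below; the function name is hypothetical.

```python
import numpy as np

def dprime_matrix(channels, sigma=1.0):
    """Equation 2.12 for all pairs: d'(a,b) = sqrt(sum_i Delta_i^2) / sigma,
    where `channels` has one row of (log-power) channel values per signal."""
    C = np.asarray(channels, dtype=float)
    diff = C[:, None, :] - C[None, :, :]           # Delta_i for every pair
    return np.sqrt(np.sum(diff ** 2, axis=-1)) / sigma
```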
One additional processing step was done while finding d' values for pairs of signals.
To simulate the effect of the real subjects' hearing thresholds, the audibility of the
signals was computed, for each subject separately. The levels of the channels were
first converted into physical sound pressure levels through a gain. Next, this physical
level was compared against the real subjects' absolute hearing thresholds. In the d'
discrimination algorithm, if the level of any channel from either signal being compared
fell below a subject's hearing threshold, the difference between that channel and its
pair was removed from the calculation of d' for that subject. Or in other words, a
simulation was done for each real subject, and the audibility of the stimuli to that
subject was computed. In doing this, it was hoped that the results found for each
subject would match the subject's actual confusion matrix.
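The audibility step can be sketched as follows; the fixed gain converting channel levels to physical sound pressure levels is a placeholder for the calibration described above, and the function name is an assumption.

```python
import numpy as np

def dprime_audible(chan_a, chan_b, threshold_db, gain_db=0.0, sigma=1.0):
    """d' between two signals' channels, dropping every channel in which
    either signal falls below the listener's absolute threshold.  The
    fixed gain converting channel levels to SPL is a placeholder."""
    a = np.asarray(chan_a, dtype=float) + gain_db
    b = np.asarray(chan_b, dtype=float) + gain_db
    audible = (a >= threshold_db) & (b >= threshold_db)
    delta = (a - b)[audible]
    return np.sqrt(np.sum(delta ** 2)) / sigma
```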
Finally, it should be noted that while the d' discrimination model assumed
that humans discriminate between sounds on the basis of the spectrum alone, the
Moore processing method (and to some extent the ter Keurs processing method)
only attempted to simulate the spectral effects of reduced frequency resolution in
the impaired ear. So, while this perceptual model may be leaving some elements
of the signals out relative to how real people would perceive sounds, it is entirely
appropriate for evaluation of these frequency smearing models. The Hou model,
however, attempted to simulate reduced timing resolution, and in fact tried to limit
the spectral consequences. There were time differences between the lengths of the
vowels in the CVC tokens, so the real subjects' comprehension would be affected by
temporal smearing, but in the present simulation this was not taken into account.
2.7 Decision Model
Once the d' values for all of the pairs of signals had been found, some way was
needed to convert these results into a confusion matrix, for comparison with the
real subjects' results. A decision model was used that assumed the following. Each
stimulus generates a vector of perceptual cues in a perceptual space internal to a
listener. The locations of these points in the perceptual space will be referred to
as "stimulus centers." The listener also creates a set of "response centers," points
in the perceptual space that are used as models or prototypes for each category of
stimulus. In the present simulation, it is assumed there is one response center for
each vowel, while the stimulus centers are the many examples of each vowel that are
presented to a listener. It is assumed that the response centers are used in classifying
the stimulus centers (or stimuli) into different categories. It is also assumed that, due
to uncertainty in a listener's mental representation, the cue vector for each stimulus
center is the center of a multi-dimensional Gaussian probability distribution. The
distribution corresponds to the various cue vectors that could be used by a listener
in the actual decision of classifying that stimulus, with the distribution probabilities
being the associated probabilities of those cue vectors being used. The last step of the
decision model assumes that a stimulus will be classified in the category corresponding
to the response center that its cue vector is closest to. A simple illustration of the
model is shown in figure 2-21.
A block diagram of the inner workings of the decision model is shown in figure
2-22. In the implementation of this model, first the matrix of d' values was put into
a classical multi-dimensional scaling algorithm that produced a map of the points
that accurately represented their d' distances.
Only the first three dimensions of
the resulting map were used, but these provided sufficient accuracy. It was assumed
[Scatter plot: squares mark response centers; x, o, and filled-dot symbols mark the stimulus centers for response centers 1, 2, and 3, respectively. Each stimulus creates perceptual cues in a Gaussian distribution about the stimulus center associated with it. Axes: Perceptual Dimension 1 and Perceptual Dimension 2, in units of d'.]
Figure 2-21: Illustration of decision model function in two dimensions. Stimulus
centers are represented by the small points in the figure, and are surrounded by
Gaussian distributions in perceptual space. Response centers are found by taking the
centroid of the stimulus centers of a single type. The probability distributions around
each stimulus center are assumed to evoke responses corresponding to the closest
response center. The lines dividing the figure are the boundaries of these so-called
response regions.
[Block diagram: matrix of d's between vowel pairs -> multi-dimensional scaling -> map of stimulus centers in perceptual space -> find centroids of stimulus centers (find response centers) -> find boundaries between response centers -> sum the contributions from each vowel type in each response region -> confusion matrix.]
Figure 2-22: Block diagram of decision model.
that this resulting map was the perceptual representation of the stimuli by a listener:
the vector of coordinates of each point in this multi-dimensional map was the cue
vector for that point in a perceptual space, and the dimensions of the map were the
dimensions of the perceptual space.
To implement the perceptual uncertainty of each stimulus center, each stimulus
center was replaced by a multi-dimensional Gaussian centered at the stimulus center's
location. Also, the locations of the response centers were assumed to be the centroids
of the stimulus centers for each type of vowel. Finally, each point in the Gaussian
distributions surrounding the stimulus centers was assigned to a category based on
which response center it was closest to. A confusion matrix was formed by summing
the probabilities of the points for each stimulus category that were classified as each
of the possible responses.
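The Gaussian-plus-nearest-centroid classification just described can be approximated by Monte Carlo sampling, as in the sketch below. This is an assumption-laden illustration: the thesis evaluated the distribution probabilities over response regions directly, whereas here random draws stand in for that integration, and all names are hypothetical.

```python
import numpy as np

def confusion_matrix(points, labels, var=0.1, n_draws=2000, seed=0):
    """Monte Carlo sketch of the decision model: response centers are the
    per-vowel centroids of the stimulus centers; each stimulus is blurred
    by a spherical Gaussian (variance `var`) and every sample is assigned
    to the nearest response center.  Rows are normalized to percentages."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    vowels = sorted(set(labels))
    centers = np.array([points[[l == v for l in labels]].mean(axis=0)
                        for v in vowels])
    cm = np.zeros((len(vowels), len(vowels)))
    for p, lab in zip(points, labels):
        draws = p + rng.normal(scale=np.sqrt(var), size=(n_draws, p.size))
        dist = np.linalg.norm(draws[:, None, :] - centers[None, :, :], axis=-1)
        row = vowels.index(lab)
        for k in dist.argmin(axis=1):
            cm[row, k] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```

The `points` argument corresponds to the first three dimensions of the multi-dimensional scaling output described above.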
In the simulation, the following parameters were used in this model: only the
first 3 dimensions of the perceptual (d') map were used, so the Gaussians were in 3
dimensions. A variance of 0.1 (in d' units) was assumed; to give an idea of the scale,
the extent of the d' map was usually about 4-5 d' units in the first dimension and
about 3-4 d' units in the second dimension.
Chapter 3
Results
Tables of simulated confusion matrices for all processing conditions are in Appendix
B, and tables of the real subjects' actual confusion matrices are found in Appendix C.
3.1 Total Error Rates
Plots of the total error rates for the simulated confusion matrices fitted to the different
subjects' data can be found in figures 3-1 through 3-10. Figures 3-1 through 3-5 show
the total error rates of the simulated confusion matrices, while figures 3-6 through
3-10 show the error rates relative to the confusion matrices for unsmeared stimuli.
All figures in this chapter are located at the end in a dedicated section.
The total error rates of the simulated confusion matrices are higher than the total
error rates of the confusion matrices produced by the real subjects in all but 17 out
of 120 smearing conditions. Of these 17 exceptions, 15 were either the unsmeared
stimuli or stimuli processed with the Hou smearing condition, which tended to give
results similar to the unsmeared stimuli. Moreover, the unsmeared stimuli have higher
error rates than the real subjects in two cases; in one case they are almost identical,
and in the other two cases they are within 7 percent. This indicates that the total
error rates are limited by the perceptual model, since that was the processing stage
responsible for the unsmeared stimuli producing large total error rates.
If we subtract the errors in the unsmeared stimuli from the rest (that
is, find the error rates relative to the unsmeared stimuli), then the error rates are
comparable to those of the actual subjects in a significant portion of the conditions.
However, subtracting the unsmeared confusion matrices from those of the other
processing conditions does not produce confusions identical to those made by real
subjects, as will be discussed later.
With increasing smearing for the Moore and ter Keurs methods, the total error
rate generally increases approximately linearly, but with a varying slope. As the
smearing is increased by a factor of 2, there is a change in total error rate of between
0 and ~18 percent, with the majority of cases having a change of 5-10 percent. This
change is independent of whether or not recruitment occurs. With the Hou smearing
and no smearing, there is little or no increase in error rate as amount of smearing
increases.
The other trend in the total error rates is that the error rates with smearing
followed by recruitment were above the error rates for the same smeared conditions
but with no recruitment in all except 1 of 60 cases. Recruitment added between 0
and 20 percentage points to the total error rate, with a typical change of about 10 points. This also
meant that the error rates with recruitment were typically far above the subjects'
actual error rates.
3.2 Locations of Errors in Confusion Matrices
The particular errors in the simulated confusion matrices occur in locations generally
different from the errors in the real people's confusion matrices.
There is some
overlap in the real and simulated confusions, but there are also many confusions that
are different between the real subjects and the simulation. With the real subjects,
errors are typically in a few locations, while in the simulations they are much more
widespread. Sample confusion matrices from the processed conditions are shown in
table 3.1. These trends can be seen in a visual inspection of the sample confusion
matrices.
Also, the locations of the maximum errors in the simulated confusion
matrices tended to be in the same positions regardless of the type and degree of
smearing, or which subject's parameters were used in doing the processing. Generally,
as the amount of smearing increased or recruitment was added, the confusions in the
existing locations increased in magnitude, while new confusions in other locations were
added that generally had smaller magnitudes. Errors in this same set of locations
also occurred in the confusion matrices for the unsmeared stimuli, showing that this
pattern was likely a consequence of the perceptual model.
Plots showing how well the simulated confusion matrices correspond to the real
confusion matrices can be seen in figures C-1 to C-15 in Appendix C, and a plot
showing how well the average data in table 3.1 fits to the real confusion matrices can
be seen in figure 3-11. These plots show a χ² measure and a Mean Squared Error
(MSE) measure of the goodness of fit of the simulated matrices to the real subjects'
matrices. The MSE measure was computed by

MSE = N Σ_{i=1}^{36} (O_i - E_i)²,  (3.1)

where N is the total number of stimulus presentations, O_i is an entry in a real subject's
confusion matrix, E_i is an entry in a simulated confusion matrix, and the sum is over
all 36 entries in a confusion matrix. The χ² measure was computed by

χ² = N Σ_{i=1}^{36} (O_i - E_i)²/E_i,  for E_i > 0.5,  (3.2)

where N, O_i, and E_i are the same as in equation 3.1.
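The two goodness-of-fit measures can be computed together; a brief sketch (the function name is hypothetical, and the placement of N as a multiplier follows the equations as given above), with simulated entries at or below 0.5 excluded from the χ² sum:

```python
import numpy as np

def fit_measures(O, E, n_presentations):
    """Equations 3.1 and 3.2: MSE and chi-square between a real (O) and a
    simulated (E) confusion matrix; simulated entries E <= 0.5 are
    excluded from the chi-square sum."""
    O = np.asarray(O, dtype=float).ravel()
    E = np.asarray(E, dtype=float).ravel()
    mse = n_presentations * np.sum((O - E) ** 2)
    keep = E > 0.5
    chi2 = n_presentations * np.sum((O[keep] - E[keep]) ** 2 / E[keep])
    return mse, chi2
```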
In the χ² measure, any entries in the simulated confusion matrices less than 0.5
were excluded because very small non-zero values would cause excessive contributions
to the measure, and zero values would cause values equal to infinity. Even so, if an
entry in the simulated confusion matrix is small and the entry in the real confusion
matrix is large, that element will contribute to the total χ² measure disproportionately
more than it probably should for a "fair" measure. For this reason, the MSE measure
is possibly better to use in evaluating the goodness of fit between the real and
simulated confusion matrices, although this measure has the problem that if there
are small entries in both the real and simulated confusion matrices, the contribution
of those elements is fairly trivial. It is this reason that causes some values to be
excessively small (as can be seen in the plots). The plots show the natural log of both
of these measures.
In general, as can be seen in the χ² plots in figures C-1 to C-15, fitting
the recruitment and audibility parameters to correspond to one of the subjects'
audiograms did not produce results that fit that subject better. Also, the χ² values
produced were generally quite large, indicating that the simulated confusion matrices
did not fit the real data very well. Values ranged from about 20-50 for subjects JO
and RC and about 150 for the other subjects with the χ² measure, and about
7-20 with the MSE measure (on the plot, the natural log of these values is shown). With
the χ² measure, the processing condition that produced the best results varied quite a
bit, with almost all processing conditions producing the best fit for some subject's real
confusion matrix. Also, there does not seem to be any benefit from using or not using
recruitment to produce a better fit. With the MSE measure, in nearly
all of the cases the unsmeared and Hou-smearing-without-recruitment conditions
produced the best-fitting confusion matrices. There seemed to be slightly better
performance from the unrecruited conditions relative to the recruited conditions with
this measure. Also, typically with both measures of goodness of fit, the Moore method
without recruitment did better than the ter Keurs method without recruitment for
the same smearing multiplier.
Two of the subjects' (JO and RC) confusion matrices tend to fit the predictions
better than did the other subjects' confusion matrices, regardless of which subject's
audiogram the recruitment and audibility parameters in the simulation were fit to.
This was true especially for the unsmeared and Hou smearing with no recruitment
conditions. This effect occurred because the real confusion matrices for these subjects
had small total error rates, and the errors that did occur were spread out over many
low values rather than having a few large values, which is what happened with the
other subjects' real confusion matrices. Since the simulated confusion matrices tended
to produce confusions in many locations, the contribution towards the χ² measure
from each of the confusions was smaller, but there were more such contributions. The
end result was a smaller value of χ².
Also, because the simulated confusion matrices corresponding to the unsmeared
and Hou without recruitment conditions typically had the smallest error rates among
the simulated confusion matrices, and because the simulated confusion matrices
tended to have confusions in the wrong locations, these conditions tended to fit the
real data better. The Hou method does not cause much change in the spectrum, as
expected, so the confusion matrices produced by it are not much different than the
unsmeared confusion matrices. Also, and notably, the unsmeared and Hou without
recruitment conditions usually had total error rates closest to the total error rates
of the real confusion matrices as well. These conditions fit subjects JO's and RC's
confusion matrices particularly well due to the combination of both confusion matrices
having small error rates, which tended to contribute less to the χ² measures as
described above.
If the confusion matrices resulting from the unsmeared conditions were subtracted
from the other simulated confusion matrices, there was still no benefit in accuracy,
as the resulting confusions still tended to be in the same locations.
[Table 3.1 data: 6x6 vowel confusion matrices (stimuli and responses AH, EE, EH, II, OO, UH) for the no-smearing, Moore, ter Keurs, and Hou conditions, without recruitment (left) and with recruitment (right); the individual entries are not legible in this copy.]
Table 3.1: Table of the average simulated confusion matrices for each processing
condition. Averages were taken across subjects and smearing amounts. From top to
bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs
smearing, and Hou smearing. The left column contains the results of smearing
without recruitment, and the right column contains the results of smearing followed
by recruitment.
3.2.1 Comparison of Moore and ter Keurs Methods
Calculations showed that widening the ERBs by a factor of 3 in the Moore method
produced the same effective bandwidth as smearing by 0.75 octaves in the ter
Keurs method. However, the ter Keurs method almost always produced error rates
about 5-15% higher than the Moore method. This can be seen easily in the graphs
of total error rate (figures 3-1 to 3-5), since the points corresponding to 3
times the width and 0.75 octaves of smearing were set to be the "base"
conditions, with the smearing multiplier equal to 1. As the smearing multiplier
increases, the amount of smearing remains about the same for the two methods.
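The bandwidths behind this equivalence can be sketched with the standard Glasberg-Moore ERB formula. Treating the ter Keurs smearing band as geometrically centered is an assumption here; the single-frequency values do not match exactly, so the equivalence was presumably computed as an average over the speech band.

```python
def erb_hz(f_hz):
    # Equivalent rectangular bandwidth of the auditory filter at
    # center frequency f_hz (Glasberg & Moore, 1990), in Hz.
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def octave_band_hz(f_hz, octaves):
    # Width in Hz of a smearing band `octaves` octaves wide,
    # geometrically centered on f_hz (an assumption about centering).
    half = octaves / 2.0
    return f_hz * (2.0 ** half - 2.0 ** -half)

# Compare 3x-widened ERBs against 0.75-octave bands at a few frequencies.
for f in (250.0, 1000.0, 4000.0):
    print(f, 3.0 * erb_hz(f), octave_band_hz(f, 0.75))
```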
A table showing the average confusion matrices for the Moore and ter Keurs
smearing methods with approximately the same total error rates is shown in table
3.2. As this table shows, the two methods produce somewhat different amounts of
smearing on average, though the locations at which confusions occur are
extremely similar. In confusing EH and II with UH, the Moore method on average
produces larger II-UH confusions than EH-UH confusions, while the ter Keurs
method produces the reverse. Also, the ter Keurs method on average produces
larger II-OO confusions than the Moore method. These are the primary differences
between the average confusions created by smearing, although other minor
differences remain. Though these differences appear in the averages, the
individual subjects' results vary considerably, and in some cases the reverse is
true.
If the Moore and ter Keurs smearing methods (without recruitment) are compared
with approximately equal total error rates, it is found that in almost all of the cases,
with both measures of the goodness of fit, the ter Keurs smearing method gives a worse
fit to the real subjects' data than the Moore method. This can be seen in figures 3-12
and 3-13, which show the fitting measures compared with different smearing amounts
but the other parameters the same. In the figures the ter Keurs results (dotted line)
are almost always above the Moore results (solid line).
[Table 3.2 body: 6x6 vowel confusion matrices (rows and columns AH, EE, EH, II, OO, UH) for Moore smearing (4.5x and 6x widening) alongside ter Keurs smearing (0.75 and 1.125 octaves).]
Table 3.2: Table of the simulated confusion matrices for approximately equivalent
total smearing amounts for the Moore and ter Keurs methods, without recruitment.
The results shown in the table are the averages across all subjects. The top row
contains a Moore smearing with a widening factor of 4.5, and ter Keurs smearing
of 0.75 octaves. The total error rate with that amount of smearing for the Moore
method was a little lower than the total error rate for that amount of ter Keurs
smearing. The bottom row contains a Moore smearing with a widening factor of 6,
and ter Keurs smearing of 1.125 octaves. The total error rate with that amount of
smearing for the Moore method was a little higher than the total error rate for that
amount of ter Keurs smearing.
3.3 Effects of Processing Parameters on the Results
This section will discuss how altering different parameters in the processing affected
the resulting confusion matrices.
3.3.1 Loudness Recruitment and Audibility
The effect of the loudness recruitment algorithm is mainly to lower the level of the
spectrum and linearly increase the level differences between different parts of the
spectrum. Recruitment also tends to introduce some distortion in the signal, causing
its spectrum to be smoothed out similarly to the effect of the frequency smearing.
However, this effect occurs to the largest extent in the high-frequency parts of the
signal. These high frequency portions of the signal are frequently not audible in the
current simulation, thereby removing this effect of recruitment. A plot showing the
percent of stimuli that had a given channel audible vs. the channel number is shown
in figure 3-14. Because of this, the main effects of recruitment in this simulation were
likely to alter the audibility of the different channels, and to increase the d' distances
through expanding the spectrum.
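A minimal sketch of such an expansive level mapping is given below. The per-channel hearing loss drives the expansion, and the uncomfortable-loudness level of 100 dB and the normal threshold of 0 dB are hypothetical values; the thesis's actual recruitment algorithm is not reproduced here.

```python
def simulate_recruitment(level_db, hearing_loss_db, ucl_db=100.0,
                         normal_thresh_db=0.0):
    # Stretch the impaired listener's reduced dynamic range
    # [normal_thresh + hearing_loss, ucl] onto the normal range
    # [normal_thresh, ucl].  The output level is lowered overall and
    # level differences between spectral components grow linearly,
    # matching the qualitative effects described in the text.
    impaired_thresh = normal_thresh_db + hearing_loss_db
    gain = (ucl_db - normal_thresh_db) / (ucl_db - impaired_thresh)
    return normal_thresh_db + (level_db - impaired_thresh) * gain
```

With a 50 dB loss, for example, the mapping doubles level differences: a 10 dB contrast in the input becomes a 20 dB contrast in the output.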
Also, for two of the subjects (JO and MG), the stimuli were presented at a high
level (>105 dB) such that recruitment did not make as much of a difference, only
causing a change in total error rate of up to about 8% for the Moore and ter Keurs
smearing methods, and about 10% for the Hou and unsmeared conditions.
The smaller influence of recruitment can also be seen by inspecting the
confusion matrices for these subjects in Appendix B.
One additional note should be made about the audibility of the high frequencies.
It is clear that something is wrong with the audibility in the current
simulation. In the original experiment with real subjects, the levels of high-
and low-passed sounds with a cutoff of 2500 Hz were adjusted to maximize the
subjects' comprehension of the syllables being played. The current simulation
predicts that, at the ideal levels found in the previous experiment, the
high-passed portion of the signal would likely have been entirely inaudible for
all of the subjects. This was clearly not the case. To address this, a second
simulation was run that assumed all frequencies were audible for all subjects.
The results are discussed later.
3.3.2 Male vs. Female Speakers
In the simulation of the perceptual model, the stimuli corresponding to the male
speaker and the female speaker formed two quite distinct clusters in the perceptual
space (of d' distances) map. As such, the male and female speakers' stimuli were
put through the perceptual model (and classified) separately. With the male and
female speaker stimuli classified separately, a single confusion matrix was
formed at the end by averaging the male-speaker and female-speaker confusion
matrices. This should accurately reflect the responses of a person exposed to
the entire stimulus set, since approximately half of the vowel samples were
spoken by the male speaker and half by the female speaker. The confusion
matrices produced by the simulation with the male and female points separated
were more accurate (as determined by the χ² and MSE measures) than if the
stimuli spoken by the male and female speakers were mixed together in the
perceptual model. This was largely because, when they were classified together,
the distribution of stimulus centers corresponding to each vowel tended to be
much larger than the distribution of stimulus centers from either speaker
separately. The larger distribution of
stimulus centers resulting from the combined male and female simulation, as well
as the distinct clusters formed in the perceptual map of the stimuli from the two
speakers, are illustrated in figure 3-15. A table of the averages across subjects, with a
smearing multiplier of 1, is shown in table 3.3. In this table, the amounts of confusions
are larger than with the averages from the male and female speaker stimuli classified
separately, despite the fact that the smearing multiplier of 1 data corresponded to
the lowest error rates in the separate simulation. A plot showing how well the
χ² and MSE measures of the data from the table fit the real subjects' confusion
matrices is in figure 3-16. In the figure, the MSE measure, and the χ² measure
for subjects JO and RC, show that this combined simulation fits much more
poorly than when the male and female speakers' stimuli are classified
separately and then combined. The other subjects show comparable results on the
χ² measure.
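The combination step described above reduces to a simple average of the two separately computed matrices; the equal weighting below assumes the two speakers contributed equally many tokens, as the text states was approximately the case.

```python
import numpy as np

def combine_speakers(male_cm, female_cm):
    # Average the confusion matrices obtained by classifying the
    # male- and female-speaker stimuli separately.  With half the
    # tokens from each speaker, this approximates a single listener's
    # matrix over the whole stimulus set.
    return (np.asarray(male_cm, dtype=float)
            + np.asarray(female_cm, dtype=float)) / 2.0
```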
3.4 Results with Simulating Full Audibility
In this new simulation, it was assumed that all frequencies were audible to all subjects;
so, all channels were included in calculating the d' distance between pairs of vowels.
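Under the common assumption that the channels act as independent observations, per-channel sensitivities combine as a root sum of squares. The thesis's exact combination rule is not shown in this chunk, so the following is a sketch:

```python
import math

def pairwise_dprime(channel_dprimes, audible=None):
    # Combine per-channel d' values between a pair of vowels into a
    # single perceptual distance, including only the audible channels.
    # In the full-audibility simulation, every channel is included.
    if audible is None:
        audible = [True] * len(channel_dprimes)
    return math.sqrt(sum(d * d for d, a in zip(channel_dprimes, audible) if a))
```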
3.4.1 Error Rates
Graphs of the total error rate assuming that all frequencies are audible are shown in
figures 3-17 through 3-21. In these plots, the points corresponding to the
unrecruited signals are the same in every plot, because the only difference
between them was the audibility of the channels, and that difference has now
been removed.
The error rates of the unrecruited stimuli with full audibility decreased, or
stayed about the same, relative to the error rates of the reduced-audibility
stimuli. This shows that decreasing the audibility of high frequencies has a
detrimental effect on perception.
Even with the reduced error rates, the subjects' actual error rates still tended to
be below the calculated error rates. The results of the ter Keurs smearing method
were never below the actual error rates; with small amounts of smearing, however,
the Moore method results could match or be close to the actual error rates. Also, in
all subjects, the unsmeared confusion matrices were below or close to the actual error
rate, showing that it could be theoretically possible to apply a smearing algorithm
and match the error rate exactly. The error rates with recruitment were usually far
above the subjects' actual error rates, as with the reduced audibility case.
Also as with the reduced audibility simulation, the error rates for the Moore and
ter Keurs smearing methods increased with the amount of smearing approximately
linearly, while the error rates for the Hou and unsmeared conditions did not increase.
Comparing recruitment with and without all frequencies being audible, the error
rates corresponding to the recruited signals generally stayed the same or decreased a
little with full audibility relative to the simulation with reduced audibility. Although
increasing the audibility will tend to decrease the error rate (as shown by the
unrecruited stimuli), making use of all the frequencies includes the frequencies that
were smeared by recruitment, which would tend to increase the error rate. So, it is
hard to conclude anything based on this finding. Also, since the audibility differences
have been removed, the only difference in the recruited signals between subjects is
the amount of recruitment applied at different frequencies (according to the subjects'
audiograms). It can be seen in the figures that the between-subject error rates with
recruitment can vary by up to 20 percent, showing that recruitment is a significant
factor in producing confusions when the affected frequencies are audible.
3.4.2 Locations of Errors
In general, with full audibility the locations of the confusions (and maximum
confusions) were usually the same as those with reduced audibility. The amounts
of the confusions produced did vary relative to the reduced-audibility case, however.
Tables showing the average confusion matrices for the different processing conditions
and comparing the Moore and ter Keurs smearing results for similar total error rates
are in tables 3.4 and 3.5, respectively. A plot showing the χ² and MSE measures
of how well the average data in table 3.4 fit the real subjects' confusion
matrices can be seen in figure 3-22. As with the reduced-audibility case,
altering the parameters of the simulation to favor one subject over the others
did not actually produce a better fit for that subject.
Also, the plots showing the χ² and MSE goodness-of-fit measures were very
similar to those for the reduced-audibility case. These can be seen in
Appendix C, figures C-16 through C-30. The values in these figures are also
large, as in the reduced-audibility condition. With both goodness-of-fit
measures, using recruitment tended to produce worse results than not using
recruitment in the majority of cases, though not universally. As with the
reduced-audibility case, the unsmeared and Hou smearing conditions (without
recruitment), and subjects JO and RC, tended to produce better fits than the
other conditions. In general, the goodness of fit with the χ² measure tended to
be better in the all-frequencies-audible condition, while with the MSE measure
it was about the same.
3.4.3 Comparison of Moore and ter Keurs Methods
With the full-audibility condition, if the Moore and ter Keurs smearing methods
(without recruitment) are compared with approximately equal total error rates, it is
found that neither method consistently gives better results than the other. This is
different from the reduced-audibility case, in which the Moore method was better.
This can be seen in figures 3-23 and 3-24, which show the fitting measures compared
with different smearing amounts but the other parameters the same.
[Table 3.3 body: 6x6 vowel confusion matrices (rows and columns AH, EE, EH, II, OO, UH) for each processing condition at a smearing multiplier of 1, without recruitment (left) and with recruitment (right).]
Table 3.3: Table of the average simulated confusion matrices for each processing
condition. Averages were taken across subjects for a smearing multiplier of 1. From
top to bottom, the processing conditions are: No smearing, Moore smearing, ter
Keurs smearing, and Hou smearing. The left column contains the results of smearing
without recruitment, and the right column contains the results of smearing followed
by recruitment.
[Table 3.4 body: 6x6 vowel confusion matrices (rows and columns AH, EE, EH, II, OO, UH) for each processing condition, assuming all frequencies are audible, without recruitment (left) and with recruitment (right).]
Table 3.4: Table of the average simulated confusion matrices for each processing
condition, assuming that all frequencies are audible. Averages were taken across
subjects and smearing amounts. From top to bottom, the processing conditions are:
No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left
column contains the results of smearing without recruitment, and the right column
contains the results of smearing followed by recruitment.
[Table 3.5 body: 6x6 vowel confusion matrices (rows and columns AH, EE, EH, II, OO, UH) for Moore smearing (4.5x and 6x widening) alongside ter Keurs smearing (0.75 and 1.5 octaves), assuming all frequencies are audible.]
Table 3.5: Table of the simulated confusion matrices for approximately equivalent
total smearing amounts for the Moore and ter Keurs methods, without recruitment,
assuming that all frequencies are audible. The results shown in the table are the
averages across all subjects. The top row contains a Moore smearing with a widening
factor of 4.5, and ter Keurs smearing of 0.75 octaves. The total error rate with that
amount of smearing for the Moore method was somewhat (~5%) lower than the
total error rate for that amount of ter Keurs smearing. The bottom row contains a
Moore smearing with a widening factor of 6, and ter Keurs smearing of 1.5 octaves.
The total error rate with that amount of smearing for the Moore method was almost
identical to the total error rate for that amount of ter Keurs smearing.
[Figure 3-1 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject JB; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.; x-axis: amount of smearing = multiplier * [3x width, 0.75 octave, 28 ms].]
Figure 3-1: Total error rates for the simulation using subject JB's audiogram. Solid
lines indicate results with just smearing and no recruitment, as indicated in the legend,
and dotted lines indicate those conditions followed by recruitment. The dashed line
is subject JB's actual error rate.
[Figure 3-2 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject JO; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-2: Total error rates for the simulation using subject JO's audiogram. Solid
lines indicate results with just smearing and no recruitment, as indicated in the legend,
and dotted lines indicate those conditions followed by recruitment. The dashed line
is subject JO's actual error rate.
[Figure 3-3 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject MG; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-3: Total error rates for the simulation using subject MG's audiogram. Solid
lines indicate results with just smearing and no recruitment, as indicated in the legend,
and dotted lines indicate those conditions followed by recruitment. The dashed line
is subject MG's actual error rate.
[Figure 3-4 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject PW; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-4: Total error rates for the simulation using subject PW's audiogram. Solid
lines indicate results with just smearing and no recruitment, as indicated in the legend,
and dotted lines indicate those conditions followed by recruitment. The dashed line
is subject PW's actual error rate.
[Figure 3-5 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject RC; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-5: Total error rates for the simulation using subject RC's audiogram. Solid
lines indicate results with just smearing and no recruitment, as indicated in the legend,
and dotted lines indicate those conditions followed by recruitment. The dashed line
is subject RC's actual error rate.
[Figure 3-6 plot: error rate relative to the unsmeared condition (%) vs. smearing multiplier (1.0-2.0) for subject JB; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-6: Error rates relative to the total error rate for unsmeared stimuli for
the simulation using subject JB's audiogram. Solid lines indicate results with just
smearing and no recruitment, as indicated in the legend, and dotted lines indicate
those conditions followed by recruitment. The dashed line is subject JB's actual error
rate.
[Figure 3-7 plot: error rate relative to the unsmeared condition (%) vs. smearing multiplier (1.0-2.0) for subject JO; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-7:
Error rates relative to the total error rate for unsmeared stimuli for
the simulation using subject JO's audiogram. Solid lines indicate results with just
smearing and no recruitment, as indicated in the legend, and dotted lines indicate
those conditions followed by recruitment. The dashed line is subject JO's actual error
rate.
[Figure 3-8 plot: error rate relative to the unsmeared condition (%) vs. smearing multiplier (1.0-2.0) for subject MG; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-8: Error rates relative to the total error rate for unsmeared stimuli for
the simulation using subject MG's audiogram. Solid lines indicate results with just
smearing and no recruitment, as indicated in the legend, and dotted lines indicate
those conditions followed by recruitment. The dashed line is subject MG's actual
error rate.
[Figure 3-9 plot: error rate relative to the unsmeared condition (%) vs. smearing multiplier (1.0-2.0) for subject PW; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-9: Error rates relative to the total error rate for unsmeared stimuli for
the simulation using subject PW's audiogram. Solid lines indicate results with just
smearing and no recruitment, as indicated in the legend, and dotted lines indicate
those conditions followed by recruitment. The dashed line is subject PW's actual
error rate.
[Figure 3-10 plot: error rate relative to the unsmeared condition (%) vs. smearing multiplier (1.0-2.0) for subject RC; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-10:
Error rates relative to the total error rate for unsmeared stimuli for
the simulation using subject RC's audiogram. Solid lines indicate results with just
smearing and no recruitment, as indicated in the legend, and dotted lines indicate
those conditions followed by recruitment. The dashed line is subject RC's actual error
rate.
103
.: . . ..
In(Chi-squared)
MK H MK H-R- RRRR
8
measure for the average of the simulated data
I
: . . I . . .
I . . . . I .
MG
P.W
. .
M..
MK H MK H-I- MK H MK H- - MK H MK H-I- MK H MK HRRRR
R R R R --- R R R R- - - R R R R---
7
6
CO
c4
-
-35
1
2
:JO:
0
:
RO-
30
25
15
20
10
Processing condition and real subject compared to the simulation
0,
35
40
S3
In( (O-E) ) measure for the average of the simulated data
5
-
MK H MK H- - MK H MK H- - MK H MK HRRRR.........RR
RRRR
- MK H MK H-- MK H MK H RRRR
RRRR -
4
c'J
2
w
0
0
-1
JO
JB
-2
0
5
MG
PPW
.
30
15
20
25
10
Processing condition and real subject compared to the simulation
RC.
F
35
40
Figure 3-11: Plot showing how well the average of all of the confusion matrix data
fit to the subject's actual confusion matrices. Averages were taken across all subjects
and smearing amounts, for the reduced-audibility simulation.
[Figure 3-12 plots: comparison of Moore and ter Keurs smearing with about equal error rates; ln(chi-squared) measure (top) and ln((O-E)^2) measure (bottom) vs. different subjects fitted; Moore smearing of S.M. = 1.5, ter Keurs smearing of S.M. = 1.0.]
Figure 3-12: A comparison of the ter Keurs results (dotted line) and Moore results
(solid line) if the total error rates are approximately the same for the two smearing
conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore
smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted
in the same vertical space), the subject used to create the simulated matrix and the
real subject it was being compared to were the same.
[Figure 3-13 plots: comparison of Moore and ter Keurs smearing with about equal error rates; ln(chi-squared) measure (top) and ln((O-E)^2) measure (bottom) vs. different subjects fitted; Moore smearing of S.M. = 2.0, ter Keurs smearing of S.M. = 1.5.]
Figure 3-13: A comparison of the ter Keurs results (dotted line) and Moore results
(solid line) if the total error rates are approximately the same for the two smearing
conditions. The ter Keurs result had a smearing multiplier of 1.5, and the Moore
smearing had a multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted
in the same vertical space), the subject used to create the simulated matrix and the
real subject it was being compared to were the same.
[Figure 3-14 plots: fraction of stimuli with a given channel audible vs. channel number (2-18), one panel per subject (JB, JO, MG, PW, RC).]
Figure 3-14: Graph showing the percentage of stimuli that had audible channels for
a given channel number, for each subject. Note that the channels correspond to a log
scale with channel 1 centered at 100 Hz, channel 12 at 1575 Hz, and channel 19 at
4839 Hz (for reference). Also, it can be seen that in most plots two groups of curves
can be seen. The set of curves to the left corresponds to no recruitment, and the set
to the right is with recruitment.
[Figure 3-15 plots: stimulus and response centers for unsmeared stimuli in the perceptual space (dimensions 1 and 2, in units of d'); top panel: male and female speakers combined; bottom panel: a single speaker.]
Figure 3-15: Plot showing the stimulus centers and response centers for the stimuli
from the male and female speakers combined (top plot), and just the female speaker
(bottom plot). The stimulus centers are shown by the small black dots, and the
response centers are shown by the open circles. In each plot, the response centers
are the average of the stimulus centers for each vowel. Only half of the total number
of stimulus centers were plotted for computational reasons. As can be seen around
the "AH" in the top plot, the stimuli corresponding to the male and female speakers
formed two distinct clusters for each vowel. The clusters are not distinct for the other
vowels in this plot due to the perspective. Also, it can be seen that the stimulus
centers for each vowel are less spread out in the bottom plot.
[Figure 3-16 plots: ln(chi-squared) measure (top) and ln((O-E)^2) measure (bottom) for the average of results with S.M. = 1.0 of the combined male and female speaker data, vs. processing condition and real subject compared to the simulation.]
Figure 3-16: Plot showing how well the average of all of the confusion matrix data fit
to the subject's actual confusion matrices. Averages were taken across all subjects,
with the smearing multiplier fixed at 1.0, for the reduced-audibility simulation.
[Figure 3-17 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject JB, assuming all frequencies are audible; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-17: Total error rates for the simulation using subject JB's audiogram,
assuming that all frequencies are audible to the subject. Solid lines indicate results
with just smearing and no recruitment, as indicated in the legend, and dotted lines
indicate those conditions followed by recruitment. The dashed line is subject JB's
actual error rate.
[Figure 3-18 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject JO, assuming all frequencies are audible; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-18: Total error rates for the simulation using subject JO's audiogram,
assuming that all frequencies are audible to the subject. Solid lines indicate results
with just smearing and no recruitment, as indicated in the legend, and dotted lines
indicate those conditions followed by recruitment. The dashed line is subject JO's
actual error rate.
[Figure 3-19 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject MG, assuming all frequencies are audible; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-19: Total error rates for the simulation using subject MG's audiogram,
assuming that all frequencies are audible to the subject. Solid lines indicate results
with just smearing and no recruitment, as indicated in the legend, and dotted lines
indicate those conditions followed by recruitment. The dashed line is subject MG's
actual error rate.
[Figure 3-20 plot: total error rate (%) vs. smearing multiplier (1.0-2.0) for subject PW, assuming all frequencies are audible; legend: no smearing, Moore sm., ter Keurs sm., Hou sm.]
Figure 3-20: Total error rates for the simulation using subject PW's audiogram,
assuming that all frequencies are audible to the subject. Solid lines indicate results
with just smearing and no recruitment, as indicated in the legend, and dotted lines
indicate those conditions followed by recruitment. The dashed line is subject PW's
actual error rate.
[Plot: "Total error rates for subject RC, assuming all frequencies are audible." Legend: No smearing, Moore sm., ter Keurs sm., Hou sm. X-axis: smearing multiplier, where amount of smearing = multiplier * [3x width, 0.75 octave, 28 ms]; y-axis: total error rate in percent.]
Figure 3-21: Total error rates for the simulation using subject RC's audiogram,
assuming that all frequencies are audible to the subject. Solid lines indicate results
with just smearing and no recruitment, as indicated in the legend, and dotted lines
indicate those conditions followed by recruitment. The dashed line is subject RC's
actual error rate.
[Plot, two panels: "ln(Chi-squared) measure for the average simulated data with all frequencies audible" and "ln((O-E)^2) measure for the average simulated data with all frequencies audible." X-axis: processing condition and real subject (JO, JB, MG, PW, RC) compared to the simulation.]
Figure 3-22: Plot showing how well the average of all of the confusion matrix data
fit to the subject's actual confusion matrices. Averages were taken across all subjects
and smearing amounts, for the simulation in which all frequencies were audible.
[Plot, two panels: the ln(Chi-squared) measure and the ln((O-E)^2) measure. X-axis: different subjects fitted; Moore smearing of S.M. = 1.5, ter Keurs smearing of S.M. = 1.0.]
Figure 3-23: Assuming full audibility, a comparison of the ter Keurs results (dotted
line) and Moore results (solid line) if the total error rates are approximately the same
for the two smearing conditions. The ter Keurs result had a smearing multiplier of
1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and
Moore values (plotted in the same vertical space), the subject used to create the
simulated matrix and the real subject it was being compared to were the same.
[Plot, two panels: the ln(Chi-squared) measure and the ln((O-E)^2) measure. X-axis: different subjects fitted; Moore smearing of S.M. = 2.0, ter Keurs smearing of S.M. = 2.0.]
Figure 3-24: Assuming full audibility, a comparison of the ter Keurs results (dotted
line) and Moore results (solid line) if the total error rates are approximately the
same for the two smearing conditions. The ter Keurs and Moore results both had a
smearing multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted in
the same vertical space), the subject used to create the simulated matrix and the real
subject it was being compared to were the same.
Chapter 4
Discussion
4.1
Why the Simulation Did Not Work
The primary problem in this simulation seemed to be that the largest confusions
produced at the output of the perceptual model tended to occur in the same
locations, regardless of the amount of smearing, or even whether there was any
smearing at all. A certain set of confusions was present in the results for the
unsmeared stimuli. As
the amount of smearing increased or recruitment was added, the degree of confusions
at these locations increased, and at other locations new confusions arose, but these
were generally of a smaller magnitude than the original confusions. With recruitment,
there tended to be confusions in almost all entries in the resulting confusion matrix,
and even with no recruitment but large amounts of smearing there were many nonzero
entries in the confusion matrices.
Real normal-hearing listeners make few vowel confusions in quiet, and they still
make few even when the stimuli have been smeared by large amounts (Baer and Moore,
1993; ter Keurs et al., 1992).
Because humans do not make confusions where the perceptual model in the simulation
does, this implies that humans use a different algorithm for doing perception than
the model. Either humans must use different cues than the model, or they must use
the cues (i.e. the spectrum) in a different manner.
Vowel duration is another cue that can be used to discriminate vowels. Certainly,
the vowels presented to the subjects in the real experiment differed in duration, as
shown by figure 4-1. However, the standard deviation of these lengths was large, and
this was almost certainly not the only difference between the real human perceptual
model and the simulated perceptual model.
Instead, it is more likely that humans use the spectral cues of the vowels in a
different way than the simulated perceptual model. This is easily illustrated by the
diagrams in figure 4-2. On the left side of the diagram, there are three sample spectra,
with peaks centered at two possible frequencies representing the formants in a vowel.
Spectra 1 and 2 have a single peak at the first frequency, but the width and height
of the formant vary. Spectrum 3 has a smaller peak at the first frequency, but also
a peak at the second frequency. Because of the differences in the "formants," the
d' algorithm used in the simulation would conclude that the spectra were equidistant
from each other in a perceptual space, shown in the bottom portion of the diagram.
However, because of the presence of the second peak, a human would judge that the
third spectrum was far from the other two. This is illustrated in the perceptual
map corresponding to a real human, in the bottom right of the diagram. So, in this
manner, the d' algorithm would classify the spectra differently than a real human
would, which would result in different confusions.
Another example, on the right side of the figure, shows a spectrum that is flat
except for a single bump. In spectrum 1, the bump is wide and short, while in
spectrum 2 it is narrower but taller, and in spectrum 3 it is even more narrow and
tall so as to be tonelike. The d' algorithm in the simulation would judge these to be
equally different from each other, but a human would easily be able to judge that
the tonelike stimulus was very different from the other spectra. It should also be
pointed out that the d' algorithm used in the simulation was developed and tested on
broadband noise stimuli, and vowels and tones are not noise.
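The geometry of these examples can be made concrete with a toy construction. The sketch below (hypothetical two-bin spectra, not the actual thesis stimuli) builds three spectra whose pairwise Euclidean, d'-style distances are exactly equal, even though only the third spectrum has energy at the second frequency, which is precisely the cue a human listener would exploit:

```python
import numpy as np

# Three toy "spectra" over two frequency bins (f1, f2), chosen so that a
# Euclidean (d'-like) distance treats them as an equilateral triangle.
A = 1.0
s1 = np.array([A, 0.0])                       # single peak at f1
s2 = np.array([2 * A, 0.0])                   # taller peak at f1
s3 = np.array([1.5 * A, np.sqrt(3) / 2 * A])  # smaller f1 peak plus an f2 peak

d12 = np.linalg.norm(s1 - s2)
d13 = np.linalg.norm(s1 - s3)
d23 = np.linalg.norm(s2 - s3)
# All three distances equal A: the metric sees the spectra as equidistant,
# even though only s3 has energy at the second frequency.
```

A distance computed bin-by-bin has no notion of "a new peak appeared"; it only accumulates per-bin differences, which is the failure mode described above.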
So, it is proposed that the following situation occurred in this simulation, which
caused the results to be other than desired:
regardless of how real people do
discrimination, the d' algorithm used in this simulation should reflect the fact that
some of the vowels' spectra got closer together as a result of smearing, and thus
would be confused in the perceptual model. Assuming that the smearing models
more or less correctly duplicate the smearing process that goes on in hearing-impaired
listeners, the confusions that real people make should be reflected in the d' distances
between those vowels getting smaller. Let us assume for the moment that this actually
happened in the current simulation.
Even with the d' algorithm reflecting that the vowels humans typically confuse
were getting closer together through smearing, another effect was also taking place
in the current simulation. In particular, the d' discrimination algorithm used in the
simulation will confuse some vowels that real humans could easily tell apart, for
example by their spectral pattern. An example of this was illustrated above. This
effect most likely occurred in the current simulation, producing confusions between
stimuli that humans would have no difficulty distinguishing, because humans use a
different discrimination algorithm. This is why confusions existed even in the
unsmeared stimuli.
When smearing was applied, these already-confused stimuli became more similar
to each other, along with the stimuli that humans would be confused by. So, although
the confusions made by real people were represented in the perceptual map (or
confusion matrices), their effect was masked by the other confusions increasing in
magnitude. In some cases, this effect could have occurred to such a degree that the
simulation would predict the absence of a confusion between some stimuli, while a
human would be confused.
So, because of this background of large numbers of confusions, it was very difficult
to detect the real confusions. Notably, the smearing algorithms did almost-linearly
increase the total error rate of the confusion matrices, so it is clear they were doing
something that contributed towards the signals being confused with one another.
However, it is unknown whether they were correctly simulating the frequency
smearing or instead creating inaccurate confusions.
One way of testing whether or not this interpretation of the results is correct
is to examine the directions that the stimuli moved on the average relative to one
another. A necessary constraint for this hypothesis being correct is that if there
are confusions produced by the real subjects, then the vowels involved must move
closer together as smearing is increased. If the vowels moved further apart (i.e. the
d' distance increased), then either this explanation must be incorrect, the smearing
and recruitment algorithms are not functioning properly, or some other mechanism
besides outer hair cell loss is responsible for the real subject's hearing loss.
Examining the relative motion of individual pairs of vowels is likely the only way
to determine whether the smearing/recruitment results are accurate, but even then
the confusions that differ from those humans would make could still change and
obscure the desired confusions.
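The necessary constraint described above could be checked mechanically. The sketch below is a hypothetical helper (the function name and data layout are assumptions, not part of the thesis software): given d' distance matrices at increasing smearing amounts, it tests whether each pair of vowels that the real subjects confused moves monotonically closer together.

```python
import numpy as np

def pairs_move_closer(dprime_by_smearing, confused_pairs):
    """Check a necessary condition for the hypothesis: for vowel pairs that
    real subjects confuse, the simulated d' distance should not increase as
    the smearing amount increases.

    dprime_by_smearing: list of square d' distance matrices, ordered by
    increasing smearing amount (row/column indices are vowel indices).
    confused_pairs: (i, j) vowel index pairs confused by real subjects.
    """
    results = {}
    for i, j in confused_pairs:
        dists = [m[i][j] for m in dprime_by_smearing]
        # Non-increasing distance across smearing levels is consistent with
        # the hypothesis; any increase argues against it.
        results[(i, j)] = all(b <= a for a, b in zip(dists, dists[1:]))
    return results
```

A pair that moves apart under smearing would flag either a failure of this explanation, a malfunctioning smearing/recruitment algorithm, or a loss mechanism other than outer hair cell loss.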
The possibility of using vowel duration as a cue is worth discussing further.
It could be the case that the real subjects were able to distinguish between sounds
on the basis of the vowels' durations, even if their spectra were very similar. If this
were the case, we would expect to see fewer confusions in those locations with very
different durations in the real confusion matrices. Assuming that the extracted vowel
durations are indeed reflective of the actual vowel durations perceived by the real
subjects, vowel duration was probably not an important cue in vowel discrimination.
Many of the real subjects made AH-II and AH-UH confusions, but the duration
differences of these vowels are the largest among the stimulus set. It was also the case
that vowels with similar durations did tend to be confused by the real subjects, and
in some cases (i.e. OO-UH) the confusion magnitudes were quite large, which would
be consistent with vowel duration being used as a cue. However, the existence of
confusions between vowels of very different durations argues against vowel duration
being an important cue.
4.2
Other Sources of Error
Besides the perceptual model operating differently from how real humans do
perception, there were several other possible sources of error that would cause the
simulation results to not duplicate the real subjects' confusion matrices well.
First, the real subjects' confusion matrices differed somewhat in terms of the
location and severity of confusions, though there were a number of similarities. Many
of the locations at which confusions occurred could be found in as many as three
or four out of the five subjects (see Appendix A for the actual confusion matrices).
However, some variation still remained. It seems reasonable to hypothesize that the
mechanisms of hearing loss in the different subjects varied to some extent because
of this. Other factors that could have varied between subjects include the amount
of outer hair cell versus inner hair cell loss and possible neural conduction losses
in the auditory nerve.
In the current simulation, it was assumed that all of the
hearing loss was due to outer hair cell loss.
To the extent that this assumption
was not true, there could have been differences between the simulated and real
confusion matrices. Also, the current simulation simulated a constant widening of
the auditory filters for all subjects (which would correspond to a flat loss), regardless
of the subjects' audiograms. While the assumption that the subjects' audiograms are
flat is approximately true, it is definitely not the case for subjects MG and RC. It
is interesting to note that despite this assumption not being true, subject RC's real
confusion matrix correlated better with the simulated confusion matrices than the
other subjects' (besides JO's) confusion matrices. This lends support to the earlier
explanation that it was the lower total error rates and distribution of errors that
caused these subjects' confusion matrices to fit the simulated matrices better.
4.3
Conclusions that Can Be Drawn
Despite the fact that the simulated confusion matrices tended to not have their largest
confusions in the same locations as the real confusion matrices, some conclusions
still can be drawn from the results. The smearing and recruitment algorithms
still introduced errors into the confusion matrices, indicating that something was
occurring. The confusion matrices reflect the amount of frequency smearing that
occurred through the processing, because the perceptual model acted as a sort of
spectrum analyzer.
4.3.1
Total Error Rates and Smearing and Recruitment
It is interesting to compare the exact error rates produced by this simulation with
the error rates found when Baer and Moore (1993) and ter Keurs et al. (1992)
simulated smearing, and Moore and Glasberg (1993) simulated recruitment, and
presented the results to normal-hearing listeners.
Baer and Moore (1993) presented frequency-smeared words to listeners at 65 dB SPL
in quiet. With the No Smearing, ERB x3, and ERB x6 conditions, the total error
rates were about 0% for the first two and 1.7% for the third condition. This is
very low compared to the total error rates found
in the current simulation with Moore smearing and no recruitment, which were in
the 10-30% range for smearing of ERB x3, and in the 25-40% range for the ERB x6
condition. Also, in the current simulation as the frequency smearing was increased
from ERB x3 to ERB x6 the total error rate increased about 10-15% on average.
This is far more than the change found by Baer and Moore. One factor that could
potentially explain this difference is that in the Baer and Moore experiment, listeners
heard entire words, while in the current experiment vowel identification in nonsense
syllables was studied. Listeners in the Baer and Moore experiment could potentially
combine cues from different phonemes in the entire word to improve comprehension.
ter Keurs et al. (1992) simulated frequency smearing with isolated vowels in quiet
that had been equalized in length and presented at 65 dBA. They only
performed smearing at 1/8, 1/2, and 2 octave amounts, and found that with the
unsmeared, 1/8, and 1/2 octave conditions, the total error rate was only 3%, but
with 2 octave smearing it jumped to 68%. In comparison, in the current simulation
only the smearing amounts of 0.75, 1.125, and 1.5 octaves were investigated (without
recruitment). For all subjects, the total error rates were about 25-35% at 0.75 octave
smearing, about 30-37% for 1.125 octave smearing, and about 33-40% for 1.5 octaves
of smearing. These error rates are clearly between 3% and 68%, but it should also be
noted that in the simulation, the unsmeared vowels had a total error rate of 6-20%.
Because the smearing amounts in the current simulation did not overlap well with
the smearing amounts in the ter Keurs et al. (1992) experiment, it is difficult to form
conclusions about the accuracy of the simulation based on just the total error rates.
Also, in the ter Keurs et al. experiment, they used 11 different vowels, while the
simulation and experiment by Brantley only used 6.
However, ter Keurs et al. (1992) also formed a confusion matrix based on the
subjects' responses to the 2-octave simulation. A comparison of this result with the
current simulation is justified, as one would expect the confusions to be approximately
in the same locations in both cases, even though the confusion amounts may be
different. ter Keurs et al. found that almost all of the confusions were vowels being
judged by subjects as being back vowels. The experiment used all of the vowels in the
current simulation except UH, but it instead used "AW" which is similar. They also
found a large confusion with EH being classified as AH, which also appeared in the
current simulation, and they found minor confusions with II being classified as AH
and EH, the former of which was not found in the current simulation, but the latter
was. In the current simulation, there do exist many confusions with vowels being
classified as back vowels, with the exceptions of EE-UH, AH-OO, and EH-OO. This
suggests that the simulation may be somewhat reflective of the responses of human
listeners. However, the simulation also produced large confusions at other locations,
most notably II-EE, OO-EE, AH-EH, EE-EH, EH-II, OO-II, UH-EH, and UH-II.
This large amount of other confusions limits the degree to which it can be concluded
that the simulation and experiment produced the same results.
Moore and Glasberg (1993) processed sentences with the recruitment algorithm
and presented them to subjects, using different input levels and different amounts of
recruitment. It is unfortunately hard to compare the results of their experiment with
the current simulation, because they only presented the 50% correct points in the
paper. With a severe flat hearing loss simulated, the 50% correct point was with an
input level of about 79 dB SPL, and with a moderate-to-severe sloping loss simulated,
the 50% correct point was with an input level of about 52 dB SPL. In comparison,
in the current simulation subjects MG and RC had an approximately moderate-to-severe
sloping loss. For subject MG, the input level was about 108 dB for both high
and low frequencies, and simulating recruitment produced a total error rate of about
14%. For subject RC, the input level was 80 dB SPL up to 2500 Hz, then 102 dB SPL
above 2500 Hz. This produced an error rate of nearly 30%. In both of these cases,
the input levels were higher and the resulting error rates were lower than those in the
Moore and Glasberg experiment. This is consistent with the simulation operating
correctly, but it is hard to tell to what extent.
Also in the current experiment, subjects JB, JO, and PW had approximately flat
moderate to severe losses. Subjects JB and PW had the input signals present at levels
of (80,95) and (85,88) dB SPL, where the numbers in parenthesis indicate the levels
below and above 2500 Hz, respectively. In response, subject JB's total error rate was
about 40%, and subject PW's was about 27%. For these subjects, the error rate for
unsmeared stimuli was 21% and 12%, respectively, indicating that this contributed
somewhat to the recruited signals' total error rates. Even so, the results for subject JB
are roughly comparable to the results found in the Moore and Glasberg experiment.
The results for subject PW are consistent with recruitment causing increased error
rates, but cannot specify to what degree. For subject JO, the signal was presented
at a high level (105,110) dB SPL, causing the error rate to be only 15%.
In general, it was found in the simulation that the total error rate as a result
of recruitment was comparable to the total error rates for Moore smearing of ERB
x3 or ERB x4.5, and usually less than the total error rates for all amounts of ter
Keurs smearing. In comparison, the results of the actual experiments indicate that
Moore smearing should cause a very low error rate, and that the error rate for
recruitment should be greater.
Moreover, in the current experiment, the input levels to the
recruitment algorithm were larger than those used in the actual experiments, but the
resulting error rates were still larger than those corresponding to the Moore smearing.
This likely indicates that the effect of the perceptual model was to drastically increase
the error rates due to Moore smearing, and perhaps also increase the error rates due
to recruitment.
Comparing the results of the Moore and ter Keurs smearing, the Moore results
in the simulation had lower error rates than those from the ter Keurs smearing for
comparable amounts of smearing.
This result from the simulation is most likely
consistent with the experimental results from Baer and Moore, and ter Keurs et al.,
although in the ter Keurs et al. (1992) experiment, no data is available for smearing
bandwidths between 1/2 and 2 octaves.
Comparing the results of the ter Keurs smearing and recruitment, recruitment
almost always caused total error rates below those from the ter Keurs smearing. Given
that in the simulation the input levels to the recruitment algorithm were larger than
those used in the experiment by Moore and Glasberg, which would tend to reduce the
error rates, the simulation results are likely consistent with the experimental results,
although again data for the ter Keurs et al. experiment is not known for intermediate
smearing amounts.
4.3.2
Comparing Moore and ter Keurs Smearing
In the reduced audibility condition, it was found that the results of the Moore
smearing fit the real subjects' confusion matrices better than did the results of
the ter Keurs smearing, for similar total error rates. It is not obvious why this
occurred. The Moore method usually produced slightly fewer confusions than the
ter Keurs method, and had larger confusions in a few locations. This may have
caused a few large elements in the real confusion matrices to be excluded from the
χ² calculation, although the MSE measure also indicated, though less strongly, that
the Moore method was better. In any event, because of this result it can tentatively
be concluded that (without recruitment) the Moore smearing method produces more
accurate results than the ter Keurs smearing method. This result involved comparing
all of the simulated matrices to all of the real subjects' matrices, so it does not
necessarily indicate that the Moore method is better at creating the specific
confusions made by a particular subject when the model parameters are set to
correspond to that subject (although this was also true); rather, it indicates
that the Moore method is better at creating confusions that are generally made
by hearing-impaired listeners. However, it should be noted that though the Moore
method produced better fits than the ter Keurs method, the fits still were not very
accurate.
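For reference, generic versions of the two fit measures used throughout this comparison can be sketched as follows. This is a simplified reading of the ln(χ²) and ln((O-E)²) measures applied to confusion matrices; the exact normalization and cell handling used in the thesis may differ.

```python
import numpy as np

def ln_chi_squared(observed, expected, eps=1e-9):
    """ln(chi-squared) fit between a real (observed) and simulated (expected)
    confusion matrix; eps guards against empty expected cells."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float) + eps
    return np.log(np.sum((o - e) ** 2 / e))

def ln_sse(observed, expected):
    """ln of the summed squared error: the ln((O-E)^2)-style measure."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return np.log(np.sum((o - e) ** 2))
```

Lower values of either measure indicate a simulated matrix that lies closer to the real subject's matrix; the χ² form weights errors in sparsely populated cells more heavily than the plain squared-error form.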
4.3.3
The Effect of Audibility
In comparing the results between the reduced-audibility and full-audibility
simulations, it is clear that the audibility of the different parts of the spectrum
makes a great deal of difference for unrecruited stimuli. The total error rates for the
unrecruited signals (both smeared and not smeared) decreased with the full audibility
simulation, and the amounts of the confusions changed somewhat as well, though the
locations generally remained the same. Presumably, increasing the audibility of the
stimuli would allow for better discrimination because more spectral cues would be
available, even if they were distorted from smearing.
It is interesting to note that the total error rates of the recruited signals did
not decrease very much with increased audibility. The recruitment contributed to the
signal degradation both by changing the level of the signal and by introducing spectral
smearing, particularly at high frequencies. In real hearing-impaired subjects, using
a compression hearing aid could counteract the level-changing effects of recruitment,
but will not affect the frequency smearing effects. It would be interesting to compare
the relative contributions of these effects of loudness recruitment, by simulating level
changes without the frequency-smearing effects and comparing that to the standard
recruitment algorithm preceded by a compression stage.
Finally, the relative performance of the Moore and ter Keurs smearing methods
(with no recruitment) with reduced and full audibility merits discussion.
In the
reduced audibility case, the Moore method results fit the real subjects' confusion
matrices better than the ter Keurs method results, but in the full-audibility case
this difference disappeared. This could mean several things. It could be that the
confusions by both the Moore and ter Keurs methods were so different from the
real subjects' confusion matrices that it was by chance that the Moore method fit
better in the reduced audibility simulation, and that neither really fits well. This
is definitely a possibility given the large values of the χ² measures of goodness of
fit. A second possibility is that the Moore method produces better results only if
there are minimal spectral cues. This could be true even if it is the case that the
reduced audibility simulation is terribly inaccurate, and that in the real confusion
matrix experiment many more frequencies were audible than in the simulation. If the
reduced audibility simulation is inaccurate, and the real subjects heard most of the
frequencies, this would imply that the Moore method is not better in general, but
only for the situation in which the frequency cues are minimal.
4.4
Potential Perceptual Models in Humans
It was hypothesized that the primary inaccuracy in this simulation, which caused
the relatively poor results, was that the d' discrimination model was different
from how real human listeners do perception. The question remains as to what would
be a better model of human perception, and this section addresses that question.
In the example given of two pairs of spectra that had the same d' value but varied
in how hard it would be for humans to discriminate between them (figure 4-2), the
primary difference between spectral pairs was the presence or absence of distinct
spectral features. In Case A, the spectra would be easy for humans to discriminate
because they contained sharp spectral peaks and notches. In Case B, the spectra
contained no distinct features, only a difference in level between the high and low
frequencies.
One possible perceptual model that would account for the ability of spectral
features to help in signal discrimination is as follows. First, it is assumed that humans
do discrimination based on the heights and frequencies of the formants.
"Model"
vowels (that would correspond to the response centers in this simulation) would be
known to have formants of certain heights and at certain frequencies relative to one
another. Discrimination of a sample unknown vowel would proceed by first identifying
the formants present in the sample vowel's spectrum, via the spectral peaks relative
to spectral notches. The frequencies of these formants would be noted, and then the
heights of the formants would be computed relative to an average vowel spectrum. A
vector of cues would then be formed of the relative distances between the formants,
and the heights of the formants.
Finally, a d' discrimination algorithm would be
used on this cue vector, instead of on the original spectrum, and from this point the
perceptual algorithm would proceed as before.
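The proposed perceptual model above can be sketched in code. Everything here is hypothetical (the function names, the simple peak-picking rule, and the number of formants are assumptions); it only illustrates the idea of forming a cue vector from formant spacings and relative heights, and then applying a d'-style distance to that vector rather than to the raw spectrum.

```python
import numpy as np

def formant_cue_vector(spectrum_db, freqs_hz, avg_spectrum_db, n_formants=2):
    """Hypothetical cue extraction: find spectral peaks (formant candidates),
    then form a vector of formant spacings and of formant heights measured
    relative to an average vowel spectrum."""
    # Local maxima of the spectrum serve as formant candidates.
    peaks = [i for i in range(1, len(spectrum_db) - 1)
             if spectrum_db[i] > spectrum_db[i - 1]
             and spectrum_db[i] >= spectrum_db[i + 1]]
    # Keep the n_formants most prominent peaks, reordered by frequency.
    peaks = sorted(sorted(peaks, key=lambda i: -spectrum_db[i])[:n_formants])
    formant_freqs = np.array([freqs_hz[i] for i in peaks])
    # Heights measured relative to the average vowel spectrum.
    rel_heights = np.array([spectrum_db[i] - avg_spectrum_db[i] for i in peaks])
    # Cue vector: distances between adjacent formants plus relative heights.
    return np.concatenate([np.diff(formant_freqs), rel_heights])

def d_prime_distance(cues_a, cues_b, sigma=1.0):
    """d'-style distance between two cue vectors, assuming equal-variance
    Gaussian internal noise on each cue dimension."""
    return np.linalg.norm((cues_a - cues_b) / sigma)
```

Under such a model, the tonelike and noiselike bumps of Case B would yield very different cue vectors (a sharp peak is detected; a broad one may not be), recovering the human judgment that the d' metric on raw spectra misses.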
[Plot: "Average vowel durations" for the vowels AH, EE, EH, II, OO, and UH, with error bars; y-axis: vowel duration.]
Figure 4-1: Average vowel lengths for the different vowels. The error bars represent
one standard deviation of the vowel length. The averages were nearly identical for
male and female speakers.
[Diagram: example spectra for Stimulus Set A and Stimulus Set B. Either stimulus set results in perceptual maps like those shown, which have different results for the simulation and for human listeners; the map axes are Perceptual Dimensions 1 and 2, in units of d'.]
Figure 4-2: Examples of spectra. In both cases, the d' discrimination algorithm will
produce equal d' distances between all stimuli. However, a real human listener could
easily distinguish that the third spectrum was much different than the other two.
In Stimulus Set A, the spectrum heights are A, 2A, and A, for spectra 1,2, and 3,
respectively. The widths of the spectra are such that the total area under all of the
spectra is the same.
Chapter 5
Conclusions
5.1
Summary of Findings
Primarily, it was found that the d' algorithm used in this simulation used spectral
cues differently than real humans do in perception. Because of this, confusions
produced by the simulation tended to not occur in the same locations as the real
subjects' confusions. This was evident because the unsmeared signals were judged to
have confusions generally in the same locations as the confusions from the smeared
signals.
Due to these factors, it was hard to judge the accuracy of the smearing models. In
the case of limited audibility of the spectrum, such that just low-frequency cues are
available, the Moore model did better than the ter Keurs model in terms of predicting
the confusions made by the hearing-impaired listeners, but in both cases the fit was
not very good.
Even so, increasing error rates with increasing smearing bandwidths occurred with
the Moore and ter Keurs methods, both with and without recruitment. This shows
that to some degree the smearing algorithms were acting correctly. Also, recruitment
caused much-increased error rates, due to the spectral smearing it caused as well as the
spectral level changes, which also is consistent with its acting correctly. In general,
the error rates caused by the simulation seemed to be in support of the models
operating correctly, with the exception of the Moore model without recruitment, and
the unsmeared stimuli.
We conclude that this method of simulation has potential to be very accurate,
provided a good discrimination algorithm exists. Given the current discrimination
algorithm, it is not possible to form strong conclusions about the relative accuracy of
the Moore and ter Keurs models.
5.2
Relevance to Hearing Aid Signal Processing
Assuming that the model of loudness recruitment used in this simulation is correct,
it was determined that recruitment causes spectral smearing as well as causing level
changes in the spectrum. This is significant because it sets a limit on how well
compression hearing aids could correct for loudness recruitment. Compression hearing
aids could only correct for level changes, not frequency smearing, and could
possibly introduce additional spectral changes as well. This limit could be further
studied by simulations of recruitment preceded by a compression algorithm.
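Such a study could start from something as simple as a static broadband compressor run ahead of the recruitment simulation. The sketch below is only illustrative: the threshold, ratio, and frame size are arbitrary assumptions, and real hearing aids use multiband compression with attack and release dynamics.

```python
import numpy as np

def compress(signal, fs, threshold_db=60.0, ratio=2.0, frame_ms=10.0):
    """Minimal static broadband compressor sketch (hypothetical parameters):
    frames whose level exceeds the threshold are attenuated so that output
    level grows by 1 dB per `ratio` dB of input level above threshold."""
    frame = max(1, int(fs * frame_ms / 1000))
    out = signal.astype(float).copy()
    for start in range(0, len(out), frame):
        seg = out[start:start + frame]          # view into out: edits in place
        rms = np.sqrt(np.mean(seg ** 2)) + 1e-12
        level_db = 20 * np.log10(rms)
        if level_db > threshold_db:
            gain_db = (threshold_db - level_db) * (1 - 1 / ratio)
            seg *= 10 ** (gain_db / 20)
    return out
```

Comparing error rates for recruitment alone against compression-then-recruitment would separate the level-changing contribution of recruitment, which compression can counteract, from its frequency-smearing contribution, which it cannot.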
This simulation also seems to imply that there is a benefit to making all frequencies audible, even if the ear has reduced frequency resolution at those frequencies, since reduced error rates were found in the simulation condition in which all frequencies were considered audible. This conclusion, however, is somewhat contingent on the accuracy of the discrimination model.
5.3
Recommendations for Future Work
There is much potential in a simulation of hearing loss that does not require stimuli to be presented to and classified by normal-hearing subjects, because any stimuli played to such subjects are necessarily single waveforms, and they are subjected to sharp frequency and temporal analysis by the subjects' normal ears. Simulations of hearing loss must necessarily operate on the different frequency bands of a signal, since in impaired ears the signal degradation occurs while the signal is being analyzed by the cochlea. However, if the results of processing in different frequency bands are added together again to form a single waveform for presentation to normal-hearing subjects, many of the effects of hearing loss will be undone.
In contrast, having multiple frequency bands as the output of a hearing loss simulator could produce quite accurate results, if the model is correct. However, some way of determining how humans would perceive the result of the processing is needed. So, future work on determining an accurate model of human perception
could be quite beneficial. The results of such work could have application to speech
recognition systems as well as to hearing aid research.
Once a good model of human perception is achieved, or even with the models
currently available (assuming that some other way of analyzing the results is possible),
work could focus on creating improved simulations of hearing loss that would feed
into these perceptual models. One possibility for an improved model is described
in Appendix D. This model attempts to accurately simulate all of the cochlear
nonlinearities, so effects such as two-tone suppression could be seen. Additionally, it
simulates the linear response of a location on the basilar membrane to low-frequency
tones.
Finally, one other application for a model of human perception would be to
determine to what extent the spectral analysis by a normal ear undoes the spectral
smearing created by the hearing loss models currently employed that create output
waveforms for presentation to normal-hearing listeners. Using a perceptual model,
the limits on how much of different aspects of hearing loss could be simulated by
presenting smeared sounds to normal-hearing listeners could be determined.
Appendix A
Audiograms and Confusion
Matrices for the Real Subjects
Following are the audiograms of the subjects who participated in the confusion matrix experiment. The amounts of hearing loss were used to tailor the simulation to try to achieve results similar to the subjects'.
[Figure omitted: audiogram plot, hearing level vs. frequency [Hz].]
Figure A-1: Audiogram for subject PW.
[Figure omitted: audiogram plots, hearing level vs. frequency [Hz].]
Figure A-2: Audiograms for subjects JB and JO.
[Figure omitted: audiogram plots, hearing level vs. frequency [Hz].]
Figure A-3: Audiograms for subjects RC and MG.
Next are tables of the real subjects' actual confusion matrices. The confusion
matrices listed here had the bias removed and were reconstructed using algorithm
"a" in 4 dimensions by Dr. Louis Braida at MIT.
      AH   EE   EH   II   OO   UH
AH    82    5    1    0    0   10
EE     1   98    0    0    0    0
EH     0    0   98    1    0    0
II     5    0    0   89    1    4
OO     0    1    3    1   93    1
UH    10    1    1   17   55   14
Table A.1: Confusion matrix for subject JB.
      AH   EE   EH   II   OO   UH
AH    88    5    0    1    0    5
EE     0  100    0    0    0    0
EH     0    0   98    1    0    0
II     2    0    1   89    0    7
OO     0    0    6    1   88    4
UH     2    1    0    9    0   85
Table A.2: Confusion matrix for subject JO.
      AH   EE   EH   II   OO   UH
AH    95    2    0    0    0    1
EE     0   93    0    0    0    5
EH     0    0   97    1    0    0
II     1    0    0   98    0    0
OO     0    0    0    0   97    1
UH     0    0    0    2   51   46
Table A.3: Confusion matrix for subject MG.
      AH   EE   EH   II   OO   UH
AH    89    1    1    1    0    6
EE     2   97    0    0    0    0
EH     2    0   84   13    0    0
II     1    0    2   88    2    5
OO     0    0    0    0   68   31
UH     0    0    0    0   39   60
Table A.4: Confusion matrix for subject PW.
      AH   EE   EH   II   OO   UH
AH   100    0    0    0    0    0
EE     0  100    0    0    0    0
EH     0    0   93    0    6    0
II     0    0    1   92    2    4
OO     0    0   11    3   84    0
UH     0    0    0    3    3   92
Table A.5: Confusion matrix for subject RC.
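As a worked illustration of how such matrices are summarized, the total error rate is the fraction of responses falling off the main diagonal. The sketch below is not code from the thesis; it assumes rows are stimuli, columns are responses, and entries are approximately percentages, using the entries of Table A.5 above.

```python
import numpy as np

def total_error_rate(conf) -> float:
    """Fraction of responses falling off the main diagonal of a confusion matrix."""
    conf = np.asarray(conf, dtype=float)
    return 1.0 - np.trace(conf) / conf.sum()

# Entries of Table A.5 (subject RC); rows/columns ordered AH, EE, EH, II, OO, UH
rc = np.array([
    [100,   0,  0,  0,  0,  0],
    [  0, 100,  0,  0,  0,  0],
    [  0,   0, 93,  0,  6,  0],
    [  0,   0,  1, 92,  2,  4],
    [  0,   0, 11,  3, 84,  0],
    [  0,   0,  0,  3,  3, 92],
])
print(round(total_error_rate(rc), 3))  # -> 0.056
```

Normalizing by the matrix total (rather than assuming each row sums to exactly 100) keeps the estimate consistent when rounding makes the row sums fall slightly short of 100.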
Appendix B
Tables of Computed Confusion
Matrices
The "standard amounts of smearing" used were: a widening factor of 3 for the Moore
method, 0.75 octaves for the ter Keurs method, and 28 ms for the Hou method. So,
for example, the Moore method in the Smearing Multiplier = 1.5 column simulates
auditory filters widened by a factor of 1.5 * 3 = 4.5.
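The scaling rule above can be written out as a small sketch (the names here are mine, not identifiers from the thesis):

```python
# Standard amounts of smearing, per the text above.
STANDARD_SMEARING = {
    "Moore": 3.0,       # auditory-filter widening factor
    "ter Keurs": 0.75,  # smearing bandwidth in octaves
    "Hou": 28.0,        # temporal smearing window in ms
}

def effective_smearing(method: str, multiplier: float) -> float:
    """Effective amount of smearing for a given Smearing Multiplier column."""
    return STANDARD_SMEARING[method] * multiplier

print(effective_smearing("Moore", 1.5))  # -> 4.5 (the example from the text)
```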
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.1: Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, the smearing conditions were: no smearing, Moore smearing only, ter Keurs smearing only, and Hou smearing only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.2: Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, the smearing conditions were: Moore smearing then recruitment, ter Keurs smearing then recruitment, Hou smearing then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.3: Confusion matrices for the simulation done with subject JO's parameters. From top to bottom, the smearing conditions were: no smearing, Moore smearing only, ter Keurs smearing only, and Hou smearing only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.4: Confusion matrices for the simulation done with subject JO's parameters. From top to bottom, the smearing conditions were: Moore smearing then recruitment, ter Keurs smearing then recruitment, Hou smearing then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.5: Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, the smearing conditions were: no smearing, Moore smearing only, ter Keurs smearing only, and Hou smearing only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.6: Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, the smearing conditions were: Moore smearing then recruitment, ter Keurs smearing then recruitment, Hou smearing then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.7: Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, the smearing conditions were: no smearing, Moore smearing only, ter Keurs smearing only, and Hou smearing only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.8: Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, the smearing conditions were: Moore smearing then recruitment, ter Keurs smearing then recruitment, Hou smearing then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.9: Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, the smearing conditions were: no smearing, Moore smearing only, ter Keurs smearing only, and Hou smearing only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
[Table omitted: 6x6 vowel confusion matrices (AH, EE, EH, II, OO, UH), four smearing conditions stacked in each of three Smearing Multiplier columns (1.0, 1.5, 2.0).]
Table B.10: Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, the smearing conditions were: Moore smearing then recruitment, ter Keurs smearing then recruitment, Hou smearing then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see the beginning of the appendix).
Appendix C
Plots of χ² Values
This appendix contains plots showing how well the simulated confusion matrices match the real subjects' confusion matrices. To determine how well the confusion matrices match, a χ² measure and a similar measure, (O − E)², were employed. The natural log of each is plotted in the top and bottom parts of each figure, respectively.
Each plot compares one simulated confusion matrix to the real confusion matrices from all of the subjects. The title of each plot specifies the subject whose audiogram was used to generate the simulated confusion matrix, and the labels at the bottom of each column specify the real confusion matrix that it is being compared to. Each sub-column (indicated by the dotted lines) corresponds to one of the eight processing conditions. A label at the top of the sub-column indicates the processing condition for that column. The top symbol stands for the type of smearing, if any: '-' = no smearing, 'M' = Moore smearing, 'K' = ter Keurs smearing, and 'H' = Hou smearing. The bottom symbol indicates whether the smearing was followed by recruitment: '-' = no recruitment, and 'R' = recruitment. In the title, 'S.M.' stands for "smearing multiplier," a scale factor indicating the amount of smearing. The actual amount of smearing used to generate the stimuli was the smearing multiplier times the standard amount (3 times the auditory filter width for the Moore method, 0.75 octaves for the ter Keurs method, and 28 msec for the Hou method). So, for example, if a plot says S.M. = 2.0, then the Moore method used smearing of 6 times the normal auditory filter width. The first 15 plots in this appendix correspond to the reduced audibility condition (i.e., the normal simulation), and the last 15 plots correspond to data generated assuming that all frequencies were audible. These latter 15 are labeled as such in their captions.
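The two comparison measures described above can be sketched as follows. These are assumed forms (the thesis does not spell out the exact normalization); O is a real subject's confusion matrix and E is the corresponding simulated one.

```python
import numpy as np

def ln_chi_squared(observed, expected, eps=1e-9):
    """Natural log of the chi-squared statistic between two confusion matrices."""
    O = np.asarray(observed, dtype=float)
    E = np.asarray(expected, dtype=float)
    # eps guards against division by zero for empty expected cells
    return float(np.log(np.sum((O - E) ** 2 / (E + eps))))

def ln_squared_diff(observed, expected):
    """Natural log of the summed squared differences, the (O - E)^2 measure."""
    O = np.asarray(observed, dtype=float)
    E = np.asarray(expected, dtype=float)
    return float(np.log(np.sum((O - E) ** 2)))
```

Lower values of either measure indicate a closer match between the simulated and real matrices.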
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject JB with S.M.=1.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-1: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject JB with S.M.=1.5, plotted against processing condition and real subject compared to the simulation.]
Figure C-2: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject JB with S.M.=2.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-3: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject JO with S.M.=1.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-4: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject JO with S.M.=1.5, plotted against processing condition and real subject compared to the simulation.]
Figure C-5: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject JO with S.M.=2.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-6: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject MG with S.M.=1.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-7: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject MG with S.M.=1.5, plotted against processing condition and real subject compared to the simulation.]
Figure C-8: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject MG with S.M.=2.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-9: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject PW with S.M.=1.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-10: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject PW with S.M.=1.5, plotted against processing condition and real subject compared to the simulation.]
Figure C-11: Original simulation including reduced audibility.
[Figure omitted: ln(Chi-squared) measure (top) and ln((O−E)²) measure (bottom) for the simulation fitted to subject PW with S.M.=2.0, plotted against processing condition and real subject compared to the simulation.]
Figure C-12: Original simulation including reduced audibility.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject RC with S.M.=1.0, shown for each processing condition and real subject.]
Figure C-13: Original simulation including reduced audibility.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject RC with S.M.=1.5, shown for each processing condition and real subject.]
Figure C-14: Original simulation including reduced audibility.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject RC with S.M.=2.0, shown for each processing condition and real subject.]
Figure C-15: Original simulation including reduced audibility.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject JB with S.M.=1.0, shown for each processing condition and real subject.]
Figure C-16: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject JB with S.M.=1.5, shown for each processing condition and real subject.]
Figure C-17: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject JB with S.M.=2.0, shown for each processing condition and real subject.]
Figure C-18: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject JO with S.M.=1.0, shown for each processing condition and real subject.]
Figure C-19: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject JO with S.M.=1.5, shown for each processing condition and real subject.]
Figure C-20: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject JO with S.M.=2.0, shown for each processing condition and real subject.]
Figure C-21: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject MG with S.M.=1.0, shown for each processing condition and real subject.]
Figure C-22: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject MG with S.M.=1.5, shown for each processing condition and real subject.]
Figure C-23: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject MG with S.M.=2.0, shown for each processing condition and real subject.]
Figure C-24: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject PW with S.M.=1.0, shown for each processing condition and real subject.]
Figure C-25: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject PW with S.M.=1.5, shown for each processing condition and real subject.]
Figure C-26: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject PW with S.M.=2.0, shown for each processing condition and real subject.]
Figure C-27: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject RC with S.M.=1.0, shown for each processing condition and real subject.]
Figure C-28: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject RC with S.M.=1.5, shown for each processing condition and real subject.]
Figure C-29: All frequencies audible condition.
[Plots: ln(Chi-squared) and ln((O-E)^2) measures for the simulation fitted to subject RC with S.M.=2.0, shown for each processing condition and real subject.]
Figure C-30: All frequencies audible condition.
Appendix D
Recommendations for an Improved Simulation
D.1 Rationale and Background
It has been found that a location on the basilar membrane (BM) has a transfer
function that responds compressively to tones at its characteristic frequency (CF),
while it responds linearly to tones an octave or more below its CF (Oxenham and
Plack, 1997). The current simulation does not take this effect into account. More
generally, an improved simulation of hearing loss is desirable. This appendix
describes one possible model that should address these concerns.
D.2 Overview of Model
The model proposed here is largely based on a model of the BM's response to low-side
suppression, described in Cai and Geisler (1996), which simulates the response of
a single location on the BM to a low-frequency suppressor tone together with a tone
at that location's CF. The model described here uses the same principles, but
extends them to multiple locations on the BM. It simulates the overall pattern of
BM motion found in the ear to produce a set of "channels." Each channel corresponds
to the pattern of BM motion that would be detected by a single inner hair cell in
the cochlea; alternatively, the channels can be thought of as the outputs of a set
of filters similar to auditory filters. Although it would be desirable for a
simulation to produce a single waveform that can be played to normal-hearing
listeners in order to simulate hearing loss, it is probably not possible to achieve
all of the effects of hearing loss that way (Moore et al., 1992). However, this
thesis made use of a perceptual model that performs discrimination on the outputs
of a set of auditory filters. This approach is potentially quite powerful because
it does not require the signal to be analyzed by a normal ear first, whose good
frequency and time resolution would tend to remove the simulated effects of
hearing loss. Given these potential benefits, the hearing-loss model described here
has multiple outputs corresponding to the outputs of a filter bank. The resulting
signals could be analyzed either by the perceptual model used in this thesis or by
another model that takes similar inputs.
D.3 Details of Model
A diagram outlining the model is shown in figure D-1. The model works as follows.
First, a signal is filtered into a number of frequency bands by a series of
logarithmically spaced filters, such that the filters are of increasing width.
This step is crudely intended to approximate the filtering that would occur on the
basilar membrane in the absence of active amplification and tuning. The entire
model, including this step, filters the whole signal on a continuous-time basis;
that is, it does not break the input signal up into frames as was done in the Moore
and ter Keurs methods in this thesis. This avoids the negative effects of the
overlap-add method, as explained in Chapter 2 of this thesis.
What follows is easiest to explain from the perspective of how a single output
channel is formed. Each channel, as previously stated, corresponds to the pattern
of BM motion that would be detected by a single inner hair cell. This pattern is
similar to that seen at the outputs of auditory filters, if they are simulated, but
it differs in some ways. The channels include low frequencies with a gain of 1,
while conventional auditory filters exclude low frequencies. Also, the overall
frequency selectivity of these filters probably differs from that of typical
auditory filters.

To produce a channel, the outputs from each of the initial bandpass filters are
passed through a set of gains and then added together. This simulates the
magnitudes of travelling waves on the BM as they pass the particular location
corresponding to the channel. A plot of one possible set of gains for a single
channel is shown in the bottom right portion of the diagram. The top of the plot
shows the contributions from low frequencies: since a location on the BM responds
linearly to frequencies an octave or more below its CF, the low frequencies
contribute with a gain of 1 to the channel being examined. The middle of the plot
shows that frequencies around the CF of the channel are amplified (how much
amplification is used is described shortly). At the bottom of the plot, frequencies
higher than the channel CF make no contribution to that location on the BM. Once
the contributions of all the input frequencies are added together, the result is a
channel.
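A minimal sketch of this channel-forming step follows. The gain pattern here is an assumption for illustration: 0 dB at and below one octave under the CF, rising to an assumed 30 dB maximum at the CF, and zero contribution above the CF.

```python
import numpy as np

def channel_gains(band_cfs, channel_cf, max_gain_db=30.0):
    """Gain applied to each band's output when forming one channel.
    The 30 dB peak and the linear-in-log-frequency rise over the top
    octave are illustrative assumptions, not values from the thesis."""
    gains = np.zeros(len(band_cfs))
    for i, f in enumerate(band_cfs):
        if f > channel_cf:
            gains[i] = 0.0                      # above CF: no contribution
        elif f <= channel_cf / 2:
            gains[i] = 1.0                      # an octave or more below: gain 1
        else:
            frac = np.log2(2 * f / channel_cf)  # 0..1 across the top octave
            gains[i] = 10 ** (max_gain_db * frac / 20)
    return gains

def form_channel(band_signals, band_cfs, channel_cf):
    """Weighted sum of band outputs: the simulated BM motion at one place."""
    return channel_gains(band_cfs, channel_cf) @ band_signals
```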
The particular gains used on each of the input frequencies are not fixed, however.
The BM is known to respond compressively to sounds between roughly 30 and 90 dB
SPL, and the model must take this into account. How this is done is based on the
Cai and Geisler (1996) model previously mentioned, which simulates the active
amplification of BM motion by the outer hair cells (OHCs). The signal corresponding
to the BM motion at the location being examined (i.e., the channel, in this model)
is passed through a nonlinear transfer function that represents the response of the
OHCs. This transfer function between the BM motion and the "OHC output affecting
the gain" is sigmoidal and asymmetric, as shown in the middle left of the diagram.
The output of the OHC transfer function is then passed through a lowpass filter to
simulate the fact that high-Q filters cannot respond instantaneously to changes in
gain. Finally, the output of this lowpass filter is used to select the set of gains
applied to the input frequencies contributing to the channel.
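This feedback path could be sketched as follows. The sigmoid's saturation levels and slope, and the lowpass cutoff, are assumed values for illustration, not parameters taken from Cai and Geisler.

```python
import numpy as np

def ohc_transfer(x, sat_pos=1.0, sat_neg=0.3, slope=4.0):
    """Asymmetric sigmoidal OHC transfer function: the positive and
    negative halves saturate at different (assumed) levels."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0,
                    sat_pos * np.tanh(slope * x / sat_pos),
                    sat_neg * np.tanh(slope * x / sat_neg))

def gain_control_signal(channel, fs, cutoff=100.0):
    """OHC output smoothed by a one-pole lowpass filter, modeling the
    fact that high-Q filters cannot change gain instantaneously.  The
    result would select the gain curve at each instant."""
    a = np.exp(-2.0 * np.pi * cutoff / fs)
    out = np.empty(len(channel))
    acc = 0.0
    for i, v in enumerate(np.abs(ohc_transfer(channel))):
        acc = a * acc + (1.0 - a) * v
        out[i] = acc
    return out
```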
The particular set of gains chosen for the input frequencies thus depends on the
level of the channel; it is in this manner that the compression of the BM is taken
into consideration. The set of gains also changes shape somewhat with level. This
reflects the effect known as "the upward spread of masking"; the change in the gain
curves with level can be seen in the plot at the bottom of the diagram, where the
curves at higher levels amplify more and more of the low frequencies. The exact
curves to use for the gains are the responses of the BM to off-CF tones, which are
still being determined experimentally. Note also that the feedback system between
the channels and the gains of the input frequencies could vary considerably, as
long as the appropriate set of gains is selected for a given channel level.

Because this model combines different frequencies nonlinearly to simulate a single
channel, various nonlinear phenomena of the BM, such as suppression, should be
accurately simulated at the outputs.
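The level-dependent gain selection described above might be sketched like this. The 30-90 dB SPL compression range comes from the text; the curve shapes themselves are placeholders for the BM off-CF responses that are still being measured.

```python
import numpy as np

def level_dependent_gains(band_cfs, channel_cf, level_db, max_gain_db=40.0):
    """Gain curve for one channel as a function of channel level.
    Two assumed level effects: peak gain shrinks as level rises from
    30 to 90 dB SPL (compression), and the amplified region extends
    further below the CF (upward spread of masking)."""
    frac = np.clip((level_db - 30.0) / 60.0, 0.0, 1.0)
    peak_db = max_gain_db * (1.0 - frac)        # less active gain at high levels
    octaves = 1.0 + frac                        # wider low-frequency skirt when loud
    band_cfs = np.asarray(band_cfs, dtype=float)
    gains = np.ones(len(band_cfs))
    lo = channel_cf / 2.0 ** octaves
    for i, f in enumerate(band_cfs):
        if f > channel_cf:
            gains[i] = 0.0                      # above CF: no contribution
        elif f > lo:
            rise = np.log2(f / lo) / octaves    # 0..1 across the skirt
            gains[i] = 10.0 ** (peak_db * rise / 20.0)
    return gains
```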
D.4 Simulating Hearing Loss
The model described thus far is intended to simulate a normal ear. To alter it to
produce the responses of an impaired ear, the only parameter that needs to change
is the outer hair cell transfer function. Since the compressive action of the outer
hair cells drives the BM nonlinearities, removing this compression simulates
hearing loss. Making the outer hair cell transfer function more linear causes the
on-CF frequencies to be amplified less, effectively including more of the
surrounding frequencies and thereby causing frequency smearing. The level of the
channel is also decreased because less gain is applied, which simulates the
absolute threshold elevation found in impaired ears. The effect of loudness
recruitment likewise follows from this reduction in the compressive action of the
outer hair cells.
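One way to realize this change, sketched here under the assumption of a simple linear blend between the normal compressive function and a unit-slope linear response (the thesis specifies only that the transfer function be "made more linear"):

```python
import numpy as np

def ohc_transfer(x, loss=0.0, sat=1.0, slope=4.0):
    """OHC transfer function with a hearing-loss parameter.
    loss=0.0 gives the fully compressive (normal) function; loss=1.0
    gives a unit-slope linear response, i.e. no active gain: small
    signals come out smaller (threshold elevation) and compression
    disappears (loudness recruitment).  The blend is an assumption."""
    x = np.asarray(x, dtype=float)
    compressive = sat * np.tanh(slope * x / sat)
    linear = x
    return (1.0 - loss) * compressive + loss * linear
```

With loss near 1, on-CF components receive far less gain, so the channel is dominated relatively more by the surrounding frequencies that enter with gain 1, which is the frequency smearing described above.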
D.5 Possible Sources of Error
Cai and Geisler (1996) commented that the cutoff frequency of the lowpass filter in
their model needed to be adjusted for different input frequencies. In this model as
well, a fixed cutoff for a given channel's lowpass filter may not produce correct
gains for all possible sets of input frequencies. In addition, the filtering at the
input of this model was somewhat arbitrarily determined, and may not be completely
accurate. Finally, it may prove difficult to find the correct sets of gain curves
for the various channel levels.
[Figure D-1 shows a block diagram of the model. The original signal is split into
frequency bands from low to high; each band is scaled by a gain (gain = 1 for low
frequencies, a variable gain for frequencies near the channel CF) and summed into
Channels 1 through N. Each channel feeds a Compression stage and an LPF whose
output selects the set of gains for that channel. Insets show the compressive
input-output function for a normal-hearing listener versus a more linear one for
an impaired listener, and the pattern of gains for the different frequencies
contributing to a channel (low frequencies at gain 1, amplification near the
channel's center frequency, no contribution above it).]
Figure D-1: Diagram of new hearing loss model.
References
Baer, Thomas, and Moore, Brian C. J. 1993. Effects of spectral smearing on the
intelligibility of sentences in noise. J. Acoust. Soc. Am., 94(3), 1229-1241.
Baer, Thomas, and Moore, Brian C. J. 1994. Effects of spectral smearing on the
intelligibility of sentences in the presence of interfering speech. J. Acoust. Soc.
Am., 95(4), 2277-2280.
Braida, Louis D. 1991. Crossmodal integration in the identification of consonant
segments. The Quarterly Journal of Experimental Psychology, 43A(3), 647-677.
Farrar, Catherine L., Reed, Charlotte M., Ito, Yoshiko, Durlach, Nathaniel I.,
Delhorne, Lorraine A., Zurek, Patrick M., and Braida, Louis D. 1987. Spectral-shape
discrimination. I. Results from normal-hearing listeners for stationary broadband
noises. J. Acoust. Soc. Am., 81(4), 1085-1092.
Graf, Issac John. 1997 (Sept.). Simulation of the effects of sensorineural hearing loss.
Master's thesis, Massachusetts Institute of Technology, Department of Electrical
Engineering and Computer Science.
Hou, Zezhang, and Pavlovic, Chaslav V. 1994. Effects of temporal smearing on
temporal resolution, frequency selectivity, and speech intelligibility. J. Acoust.
Soc. Am., 96(3), 1325-1340.
Leek, M. R., Dorman, M. F., and Summerfield, Q. 1987. Minimum spectral contrast
for vowel identification by normal-hearing and hearing-impaired listeners. J.
Acoust. Soc. Am., 81, 148-154.
Moore, B. C. J., Wojtczak, M., and Vickers, D. A. 1996. Effect of loudness recruitment
on the perception of amplitude modulation. J. Acoust. Soc. Am., 100, 481-489.
Moore, Brian C. J., and Glasberg, Brian R. 1993. Simulation of the effects of loudness
recruitment and threshold elevation on the intelligibility of speech in quiet and
in a background of speech. J. Acoust. Soc. Am., 94(4), 2050-2062.
Moore, Brian C. J., and Glasberg, Brian R. 1997. A model of loudness perception
applied to cochlear hearing loss. Auditory Neuroscience, 3, 289-311.
Moore, Brian C.J., Glasberg, Brian R., and Simpson, Andrew. 1992. Evaluation of a
method of simulating reduced frequency selectivity. J. Acoust. Soc. Am., 91(6),
3402-3423.
Moore, Brian C.J., Glasberg, Brian R., and Vickers, Deborah A. 1995. Simulation of
the effects of loudness recruitment on the intelligibility of speech in noise. British
Journal of Audiology, 29, 131-143.
Nejime, Yoshito, and Moore, Brian C. J. 1997. Simulation of the effect of threshold
elevation and loudness recruitment combined with reduced frequency selectivity
on the intelligibility of speech in noise. J. Acoust. Soc. Am., 102(1), 603-615.
Oxenham, A. J., and Plack, C. J. 1997. A behavioral measure of basilar-membrane
nonlinearity in listeners with normal and impaired hearing. J. Acoust. Soc. Am.,
101(6), 3666-3675.
Cai, Y., and Geisler, C. D. 1996. Suppression in auditory-nerve fibers of cats
using low-side suppressors. III. Model results. Hearing Res., 96, 126-140.
Patterson, R. D. 1974. Auditory filter shape. J. Acoust. Soc. Am., 55, 802-809.
Rosen, S., and Fourcin, A. 1986. Frequency selectivity and the perception of speech.
In: Moore, B. C. J. (ed), Frequency selectivity in hearing. London: Academic.
Steinberg, J. C., and Gardener, M. B. 1937. The dependence of hearing impairment
on sound intensity. J. Acoust. Soc. Am., 9(1), 11-23.
ter Keurs, Mariken, Festen, Joost M., and Plomp, Reiner. 1992. Effect of spectral
envelope smearing on speech reception. I. J. Acoust. Soc. Am., 91(5), 2872-2880.
ter Keurs, Mariken, Festen, Joost M., and Plomp, Reiner. 1993. Effect of spectral
envelope smearing on speech reception. II. J. Acoust. Soc. Am., 93(3), 1547-1552.