Analysis of Frequency-Smearing Models Simulating Hearing Loss

by Alan T. Asbeck

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

May 21, 2003

Copyright 2003 M.I.T. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 21, 2003

Certified by: Louis D. Braida, Henry Ellis Warren Professor of Electrical Engineering, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Analysis of Frequency-Smearing Models Simulating Hearing Loss

by Alan T. Asbeck

Submitted to the Department of Electrical Engineering and Computer Science on May 21, 2003, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

Various authors have created models of the psychoacoustic effects of sensorineural hearing loss that transform ordinary sounds into sounds that evoke the perception of hearing loss in normal-hearing listeners. In this thesis, models of reduced frequency resolution, reduced temporal resolution, and loudness recruitment with absolute threshold loss were evaluated using a model of human perception. Confusion matrices of vowels were simulated using these models and compared to confusion matrices from five hearing-impaired subjects. It was found that the model of human perception used spectral cues in a different way than actually occurs in humans, making it difficult to evaluate the hearing loss models and causing the simulated confusion matrices to differ from the real subjects' matrices.
Even so, the frequency smearing models caused the total error rate to increase with increasing smearing bandwidth, and the results were generally consistent with the expected behavior.

Thesis Supervisor: Louis D. Braida
Title: Henry Ellis Warren Professor of Electrical Engineering

Acknowledgments

I first want to thank my advisor, Professor Louis Braida, for his help and support throughout this project, and for giving me a project in the first place. I especially want to thank him for putting up with my tendency to procrastinate, and for spending so much time reading my thesis and being there at the end. I want to thank Michael "Q" Chin for really making my time here a wonderful experience, and for helping me think through this project and other things throughout the year. I've really appreciated the time we've had to get to know each other, including everything from the "sea-snakes" on the chalkboard to talking about the various topics you have come up with. I also want to thank Peninah Rosengard, Charlotte Reed, Andrew Oxenham, and everyone else in the Sensory Communication group who has helped me along the way. Finally, I acknowledge the NIH and NSF, which provided funding for my work on this thesis.

Contents

1 Introduction
1.1 Background
1.2 Physiology of Hearing Impairment
1.3 Psychoacoustics of Hearing Impairment
1.4 Simulating Hearing Loss
1.4.1 Reasons for Simulating Hearing Loss
1.4.2 Previous Work Simulating Hearing Loss
1.5 Problem Statement
2 Signal Processing in the Experiment
2.1 Overview of Experiment
2.2 Overview of Signal Processing
2.2.1 Processing by Real Subjects
2.2.2 Processing by the Simulation
2.3 Detail on the Brantley Confusion Matrix Experiment
2.4 Speech Stimuli and Vowel Extraction
2.5 Signal Processing Simulating Hearing Loss
2.5.1 Methods of Frequency Smearing
2.5.2 Method of Recruitment
2.5.3 Combining the Smearing Methods and Recruitment
2.6 d' Discrimination Model
2.7 Decision Model
3 Results
3.1 Total Error Rates
3.2 Locations of Errors in Confusion Matrices
3.2.1 Comparison of Moore and ter Keurs Methods
3.3 Effects of Processing Parameters on the Results
3.3.1 Loudness Recruitment and Audibility
3.3.2 Male vs. Female Speakers
3.4 Results with Simulating Full Audibility
3.4.1 Error Rates
3.4.2 Locations of Errors
3.4.3 Comparison of Moore and ter Keurs Methods
4 Discussion
4.1 Why the Simulation Did Not Work
4.2 Other Sources of Error
4.3 Conclusions that Can Be Drawn
4.3.1 Total Error Rates and Smearing and Recruitment
4.3.2 Comparing Moore and ter Keurs Smearing
4.3.3 The Effect of Audibility
4.4 Potential Perceptual Models in Humans
5 Conclusions
5.1 Summary of Findings
5.2 Relevance to Hearing Aid Signal Processing
5.3 Recommendations for Future Work
A Audiograms and Confusion Matrices for the Real Subjects
B Tables of Computed Confusion Matrices
C Plots of χ² Values
D Recommendations for an Improved Simulation
D.1 Rationale and Background
D.2 Overview of Model
D.3 Details of Model
D.4 Simulating Hearing Loss
D.5 Possible Sources of Error
References

List of Figures

1-1 Figure showing how the basilar membrane is compressive for tones at the best frequency of the location of measurement. This figure shows "the level of a masker required to mask [a] 6-kHz [tone] signal, as a function of signal level" (Oxenham and Plack, 1997). The plot of the on-frequency masker is linear, as would be expected. However, the off-frequency masker shows compression, indicating both that the basilar membrane is compressive and that it responds linearly for tones an octave below the best frequency of a given location. Taken from Oxenham and Plack, 1997.

1-2 Loudness matching curves for subjects with unilateral hearing loss. The thin line at a 45-degree angle is the expected result if the two ears gave equal loudness results for a tone at a given level.
The fact that the actual data is at a steeper slope than this indicates that for the impaired ear, as the stimulus sound pressure level increases a little, the apparent loudness increases more rapidly than in a normal ear. Also, the asterisks indicate the subjects' absolute thresholds; these are elevated in the impaired ear relative to the normal ear. Taken from Moore et al., 1996, with the 45-degree line added.

2-1 Signal processing overview.

2-2 Block diagrams of the Moore frequency smearing algorithm. The top block diagram shows the main processing sequence, and the bottom block diagram shows detail for the "Smear Spectrum" block in the top part. Taken from Baer and Moore, 1993.

2-3 Spectra resulting from different amounts of Moore smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -40 dB increments. The labels to the left of the traces indicate the amount of smearing, in terms of the effective number of ERBs of the widened auditory filters; US indicates the unsmeared signal.

2-4 Channels resulting from different amounts of Moore smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -8 dB increments. The labels to the left of the traces indicate the amount of smearing, in terms of the effective number of ERBs of the widened auditory filters; US indicates the unsmeared signal.

2-5 Sample spectra showing the results of smearing, for a synthetic vowel /ae/. The different frames (vertical dimension in the figure) show successive chunks of signal that are partially overlapping.
As can be seen in the right column, showing the output spectra, the effective degree of smearing is less than was intended. The intended degree of smearing is seen in the middle column, which shows the results after smearing but before the overlap-add process. Taken from Baer and Moore, 1993.

2-6 Block diagram showing the details of the ter Keurs processing method. Taken from ter Keurs et al., 1992.

2-7 Block diagram showing the details of the Spectral Smearing block from Figure 2-6. Taken from ter Keurs et al., 1992.

2-8 Sample of the outputs of the ter Keurs frequency smearing algorithm. The top log spectrum, labeled "US," shows the unsmeared signal, while the bottom three spectra show the results of smearing of 1/8, 1/2, and 2 octaves, as labeled. Taken from ter Keurs et al., 1992.

2-9 Spectra resulting from different amounts of ter Keurs smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -50 dB increments. The labels to the left of the traces indicate the amount of smearing, in octaves; US indicates the unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs of smearing.

2-10 Channels resulting from different amounts of ter Keurs smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -12 dB increments. The labels to the left of the traces indicate the amount of smearing, in octaves; US indicates the unsmeared signal.
One octave of smearing is approximately equivalent to 3.9 ERBs of smearing.

2-11 Block diagram of the perceptual model of human temporal processing used by Hou and Pavlovic in providing rationale for their temporal smearing model. Taken from Hou and Pavlovic, 1994.

2-12 Block diagram of the Hou temporal smearing model. The top diagram (a) shows the overall signal processing path, and the bottom diagram (b) shows the specifics of how temporal smearing is accomplished. Figure reproduced from Hou and Pavlovic, 1994.

2-13 Sample of outputs from the Hou temporal smearing model. The top panels (a and b) show the unsmeared signal and its spectrum, and the lower panels show the signal and its spectrum after it has been smeared with an ERD of 0.1 ms and 28 ms, respectively. Figure taken from Hou and Pavlovic, 1994.

2-14 Spectra resulting from different amounts of Hou smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -50 dB increments. The labels to the left of the traces indicate the amount of smearing, that is, the equivalent rectangular duration of the smearing temporal window, in msec; US indicates the unsmeared signal.

2-15 Channels resulting from different amounts of Hou smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -8 dB increments. The labels to the left of the traces indicate the amount of smearing, that is, the equivalent rectangular duration of the smearing temporal window, in msec; US indicates the unsmeared signal.
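The octave-to-ERB equivalence quoted in the captions for Figures 2-9 and 2-10 can be checked numerically. The sketch below assumes the standard Glasberg and Moore (1990) ERB-rate scale, which is not reproduced in these captions; under that scale the 3.9-ERB figure holds for an octave in the low-frequency region (around 250-500 Hz), with an octave spanning more ERBs at higher frequencies.

```python
import math

def erb_number(f_hz: float) -> float:
    """ERB-rate at frequency f_hz, per the Glasberg and Moore (1990)
    formula: ERB-number = 21.4 * log10(4.37 * f/1000 + 1)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erbs_per_octave(f_low_hz: float) -> float:
    """Number of ERBs spanned by the octave starting at f_low_hz."""
    return erb_number(2.0 * f_low_hz) - erb_number(f_low_hz)

print(f"{erbs_per_octave(250.0):.1f}")  # 3.9
```

Because the ERB scale is roughly logarithmic only above a few hundred hertz, the octave-to-ERB conversion is frequency dependent; the single 3.9 figure in the captions is an approximation.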
2-16 Block diagram of the model of loudness recruitment by Moore and Glasberg, 1993. The parallel arrows between the middle blocks represent the outputs of the 23 bandpass filters in the first block. Figure taken from Moore and Glasberg, 1993.

2-17 Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The vowel was at a level of 100 dB. Recruitment of N = 1.0 (where N is the exponent to which the envelope is taken) corresponds to a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50 dB, and in general the recruitment amounts correspond to hearing losses via the equation N = 100 dB / (100 dB - hearing loss in dB). No additional offsets were used in creating the plot: the effective level changes with increasing recruitment due to the increased absolute threshold.

2-18 Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The vowel was at a level of 80 dB. Recruitment of N = 1.0 (where N is the exponent to which the envelope is taken) corresponds to a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50 dB, and the other recruitment amounts correspond to hearing losses via the equation N = 100 dB / (100 dB - hearing loss in dB). No additional offsets were used in creating the plot: the effective level changes with increasing recruitment due to the increased absolute threshold.

2-19 Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The solid lines are the result of recruitment with a simulated loss of 0 dB, and the dotted lines are the result of recruitment with a flat loss of 50 dB. The dashed line shows channels made from the original signal without recruitment, showing that simulating a loss of 0 dB does not distort the signal very much.
Signals with levels of 120, 100, and 80 dB were recruited corresponding to losses of 0 dB and 50 dB, creating pairs of lines for each signal level. The 120 dB signal is above 100 dB, the limit of recruitment, so recruitment has little effect. The 100 dB signal has its highest parts at around the limit of recruitment, so they are not affected, while the higher-frequency portions of the signal are lower in level and so are distorted. The 80 dB signal is severely distorted and reduced in level by recruitment.

2-20 Block diagram of the d' discrimination model. The bottom block comes after the top block in the processing path. The top block is used to create channels from all vowels processed, and the bottom block takes all combinations of pairs of vowels and produces a d' distance between each pair. In the top figure, the signal is first filtered into bands. Then, in each band, the average power per unit length of the signal is found by first squaring it, summing over the signal length, dividing by the signal length, and then normalizing to account for variation in signal length between stimuli. Finally, the natural log of the resulting number is taken to produce a channel. In the bottom diagram, the difference (Δi) between corresponding channels is found for a pair of stimuli, as indicated in the plot. These differences are then put through the formula in the bottom of the figure. E is an arbitrary constant that is a scale factor for all d' distances.

2-21 Illustration of the decision model's function in two dimensions. Stimulus centers are represented by the small points in the figure, and are surrounded by Gaussian distributions in perceptual space. Response centers are found by taking the centroid of the stimulus centers of a single type.
The probability distributions around each stimulus center are assumed to evoke responses corresponding to the closest response center. The lines dividing the figure are the boundaries of these so-called response regions.

2-22 Block diagram of the decision model.

3-1 Total error rates for the simulation using subject JB's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate.

3-2 Total error rates for the simulation using subject JO's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate.

3-3 Total error rates for the simulation using subject MG's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate.

3-4 Total error rates for the simulation using subject PW's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate.

3-5 Total error rates for the simulation using subject RC's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate.
3-6 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject JB's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate.

3-7 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject JO's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate.

3-8 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject MG's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate.

3-9 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject PW's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate.

3-10 Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject RC's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate.
3-11 Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects and smearing amounts, for the reduced-audibility simulation.

3-12 A comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same.

3-13 A comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.5, and the Moore smearing had a multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same.

3-14 Graph showing the percentage of stimuli that had audible channels for a given channel number, for each subject. Note that the channels correspond to a log scale, with channel 1 centered at 100 Hz, channel 12 at 1575 Hz, and channel 19 at 4839 Hz (for reference). Also, in most plots two groups of curves can be seen: the set of curves to the left corresponds to no recruitment, and the set to the right is with recruitment.

3-15 Plot showing the stimulus centers and response centers for the stimuli from the male and female speakers combined (top plot), and just the female speaker (bottom plot).
The stimulus centers are shown by the small black dots, and the response centers are shown by the open circles. In each plot, the response centers are the average of the stimulus centers for each vowel. Only half of the total number of stimulus centers were plotted, for computational reasons. As can be seen around the "AH" in the top plot, the stimuli corresponding to the male and female speakers formed two distinct clusters for each vowel. The clusters are not distinct for the other vowels in this plot due to the perspective. Also, it can be seen that the stimulus centers for each vowel are less spread out in the bottom plot.

3-16 Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects, with the smearing multiplier fixed at 1.0, for the reduced-audibility simulation.

3-17 Total error rates for the simulation using subject JB's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate.

3-18 Total error rates for the simulation using subject JO's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate.

3-19 Total error rates for the simulation using subject MG's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment.
The dashed line is subject MG's actual error rate.

3-20 Total error rates for the simulation using subject PW's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate.

3-21 Total error rates for the simulation using subject RC's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate.

3-22 Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects and smearing amounts, for the simulation in which all frequencies were audible.

3-23 Assuming full audibility, a comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same.

3-24 Assuming full audibility, a comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs and Moore results both had a smearing multiplier of 2.0.
For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was being compared to were the same.

4-1 Average vowel lengths for the different vowels. The error bars represent one standard deviation of the vowel length. The averages were nearly identical for male and female speakers.

4-2 Examples of spectra. In both cases, the d' discrimination algorithm will produce equal d' distances between all stimuli. However, a real human listener could easily distinguish that the third spectrum was much different than the other two. In Stimulus Set A, the spectrum heights are A, 2A, and A, for spectra 1, 2, and 3, respectively. The widths of the spectra are such that the total area under all of the spectra is the same.

A-1 Audiogram for subject PW.

A-2 Audiograms for subjects JB and JO.

A-3 Audiograms for subjects RC and MG.

C-1 Original simulation including reduced audibility.
C-2 Original simulation including reduced audibility.
C-3 Original simulation including reduced audibility.
C-4 Original simulation including reduced audibility.
C-5 Original simulation including reduced audibility.
C-6 Original simulation including reduced audibility.
C-7 Original simulation including reduced audibility.
C-8 Original simulation including reduced audibility.
C-9 Original simulation including reduced audibility.
C-10 Original simulation including reduced audibility.
C-11 Original simulation including reduced audibility.
C-12 Original simulation including reduced audibility.
C-13 Original simulation including reduced audibility.
C-14 Original simulation including reduced audibility.
C-15 Original simulation including reduced audibility.
C-16 All frequencies audible condition.
C-17 All frequencies audible condition.
C-18 All frequencies audible condition.
C-19 All frequencies audible condition.
C-20 All frequencies audible condition.
C-21 All frequencies audible condition.
C-22 All frequencies audible condition.
C-23 All frequencies audible condition.
C-24 All frequencies audible condition.
C-25 All frequencies audible condition.
C-26 All frequencies audible condition.
C-27 All frequencies audible condition.
C-28 All frequencies audible condition.
C-29 All frequencies audible condition.
C-30 All frequencies audible condition.

D-1 Diagram of new hearing loss model.

List of Tables

2.1 Table of the levels at which the low-passed (below 2500 Hz) and high-passed (above 2500 Hz) portions of the CVC syllables were presented to the subjects in the confusion matrix experiment by Brantley (1990-1991).
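The captions for Figures 2-17 and 2-18 above quote the mapping between the recruitment exponent N and the simulated flat hearing loss, N = 100 dB / (100 dB - loss dB). A minimal sketch of that mapping follows; the dB-domain form of the envelope expansion is an assumption here, chosen to be consistent with the 100 dB recruitment limit described in the caption for Figure 2-19 (levels at or above the limit pass through unchanged, lower levels are expanded downward).

```python
RECRUITMENT_LIMIT_DB = 100.0  # level at and above which recruitment has little effect

def recruitment_exponent(loss_db: float) -> float:
    """Exponent N applied to the channel envelope:
    N = 100 dB / (100 dB - loss_dB), per the captions of Figs. 2-17/2-18.
    loss_db = 0 gives N = 1 (no recruitment); loss_db = 50 gives N = 2."""
    if not 0.0 <= loss_db < RECRUITMENT_LIMIT_DB:
        raise ValueError("flat loss must lie in [0, 100) dB")
    return RECRUITMENT_LIMIT_DB / (RECRUITMENT_LIMIT_DB - loss_db)

def recruited_level_db(level_db: float, loss_db: float) -> float:
    """Assumed dB-domain equivalent of raising the envelope to the power N:
    the output level is expanded about the 100 dB limit, so an 80 dB input
    with a 50 dB flat loss comes out at 60 dB."""
    if level_db >= RECRUITMENT_LIMIT_DB:
        return level_db  # above the recruitment limit: essentially unaffected
    n = recruitment_exponent(loss_db)
    return RECRUITMENT_LIMIT_DB - n * (RECRUITMENT_LIMIT_DB - level_db)
```

This reproduces the behavior described for Figure 2-19: a 120 dB signal is essentially unaffected, a 100 dB signal sits at the limit, and an 80 dB signal is strongly reduced in level.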
2.2 Table of the different processing conditions used in the simulation.

3.1 Table of the average simulated confusion matrices for each processing condition. Averages were taken across subjects and smearing amounts. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment.

3.2 Table of the simulated confusion matrices for approximately equivalent total smearing amounts for the Moore and ter Keurs methods, without recruitment. The results shown in the table are the averages across all subjects. The top row contains Moore smearing with a widening factor of 4.5, and ter Keurs smearing of 0.75 octaves. The total error rate with that amount of smearing for the Moore method was a little lower than the total error rate for that amount of ter Keurs smearing. The bottom row contains Moore smearing with a widening factor of 6, and ter Keurs smearing of 1.125 octaves. The total error rate with that amount of smearing for the Moore method was a little higher than the total error rate for that amount of ter Keurs smearing.

3.3 Table of the average simulated confusion matrices for each processing condition. Averages were taken across subjects for a smearing multiplier of 1. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment.

3.4 Table of the average simulated confusion matrices for each processing condition, assuming that all frequencies are audible. Averages were taken across subjects and smearing amounts.
From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. 3.5 . . . . . . . . . . . . . . 92 Table of the simulated confusion matrices for approximately equivalent total smearing amounts for the Moore and ter Keurs methods, without recruitment, assuming that all frequencies are audible. The results shown in the table are the averages across all subjects. The top row contains a Moore smearing with a widening factor of 4.5, and ter Keurs smearing of 0.75 octaves. The total error rate with that amount of smearing for the Moore method was a somewhat (~5%) lower than the total error rate for that amount of ter Keurs smearing. The bottom row contains a Moore smearing with a widening factor of 6, and ter Keurs smearing of 1.5 octaves. The total error rate with that amount of smearing for the Moore method was almost identical to the total error rate for that amount of ter Keurs smearing. . . . . . . . . . . . A.1 93 Confusion matrix for subject JB. . . . . . . . . . . . . . . . . . . . . 138 A.2 Confusion matrix for subject JO. . . . . . . . . . . . . . . . . . . . . 138 A.3 Confusion matrix for subject MG. . . . . . . . . . . . . . . . . . . . . 138 22 A.4 Confusion matrix for subject PW. . . . . . . . . . . . . . . . . . . . . 139 A.5 Confusion matrix for subject RC. . . . . . . . . . . . . . . . . . . . . 139 B.1 Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Mulitplier" times a 'standard amount of smearing' (see beginning of the appendix). 
141 B.2 Confusion matrices for the simulation done with subject JB's parameters.From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Mulitplier" times a 'standard amount of smearing' (see beginning of the appendix). B.3 Confusion matrices for the simulation done with subject . . . 142 JO's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Mulitplier" times a 'standard amount of smearing' (see beginning of the appendix). 143 B.4 Confusion matrices for the simulation done with subject JO's parameters.From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Mulitplier" times a 'standard amount of smearing' (see beginning of the appendix). . . . 144 B.5 Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Mulitplier" times a 'standard amount of smearing' (see beginning of the appendix). 145 23 B.6 Confusion matrices for the simulation done with subject MG's parameters.From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Mulitplier" times a 'standard amount of smearing' (see beginning of the appendix). . . . 146 B.7 Confusion matrices for the simulation done with subject RC's parameters. 
From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 147

B.8 Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 148

B.9 Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 149

B.10 Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix). 150

Chapter 1

Introduction

1.1 Background

Due to the current state of technology, it is possible to do substantial signal processing in hearing aids. The goal of hearing aid research is therefore to develop signal processing schemes that enable hearing-impaired individuals to experience sound as much like normal-hearing listeners as possible, or at least to understand speech consistently in various environments and at various levels.
To develop such signal processing algorithms, it is necessary to understand how hearing-impaired people experience sound, so that any differences can be accounted for. Thus, hearing aid research is currently focused largely on developing models of how hearing-impaired people experience sound. This chapter discusses the physiology of hearing impairment and the perceptual consequences of hearing loss, to provide background for understanding the models, and also reviews previous work on hearing loss simulations.

1.2 Physiology of Hearing Impairment

The first stages of the auditory system include the outer, middle, and inner ear, after which the auditory nerve carries the resulting signals to the brain. The outer and middle ear serve to collect and channel the sound into the head. The inner ear serves several functions: it acts as a frequency analyzer, separating the sound into different frequency bands; it amplifies the sound; and it converts the mechanical sound signal into neural signals. These functions take place within the inner ear structure known as the cochlea.

The cochlea is essentially a coiled tube that is partitioned lengthwise by several membranes. One of these is the basilar membrane (BM), a stiff structure that performs the first stage of frequency analysis through its mechanical properties: it is about 100 times stiffer at the base (where the sound first encounters it) than at the apex (the opposite end). This variable stiffness means that different locations on the BM respond (oscillate) best to different input frequencies: the BM acts as a frequency-to-place converter. The frequency that causes the maximum response at a particular location on the BM is known as the best frequency (BF) or characteristic frequency (CF) of that location.
High frequencies cause the basal end to be excited more, while low frequencies travel down the length of the BM and create more of a response toward the apical end.

Attached to the BM are rows of cells known as outer hair cells (OHCs). These cells serve primarily as mechanical transducers, actively causing the BM to vibrate more in response to incoming sounds than it would on its own. This makes the BM more frequency selective (that is, it exhibits sharper tuning to tone stimuli) and amplifies the incoming sounds. It has been found that, due to the OHCs, the transfer function between the incoming sound level and the resulting BM oscillation displacement at a particular location is compressive for CF tones. This compression takes place for tones at input levels between about 30-90 dB SPL (Oxenham and Plack, 1997). A graph showing this transfer function is shown in figure 1-1. For tones that are an octave or more below the CF of a location on the BM, the BM responds linearly; this is also shown in figure 1-1.

Figure 1-1: Figure showing how the basilar membrane is compressive for tones at the best frequency of the location of measurement. This figure shows "the level of a masker required to mask [a] 6-kHz [tone] signal, as a function of signal level" (Oxenham and Plack, 1997). The plot of the on-frequency masker is linear, as would be expected. However, the off-frequency masker shows compression, indicating both that the basilar membrane is compressive and that it responds linearly for tones an octave below the best frequency of a given location. Taken from Oxenham and Plack, 1997.

The motion of the BM is detected by another type of cell found along the length of the BM, the inner hair cell (IHC). These cells connect to auditory nerve fibers, and act to convert the amplitude (or velocity) of BM oscillation into neural firing patterns. In response to frequencies below about 2 kHz, the IHCs cause spikes on the auditory nerve to be "phase-locked" to those frequencies, meaning that the spike trains are in phase with the BM motion caused by the signal, and with the temporal patterns of the signal itself. At higher frequencies, auditory nerve fiber firing patterns are not phase-locked to the BM motion, meaning that the timing of their firing is not correlated with the fine structure of the signal.

When sensorineural hearing loss occurs, typically the OHCs and/or IHCs are damaged. This damage can occur at any location along the length of the cochlea, which translates to different frequency regions having different amounts of hearing loss, due to the frequency-to-place transformation in the cochlea. IHC loss is primarily a conduction loss, as the IHCs act as sensors of the BM motion. This type of hearing loss can typically be corrected by linear amplification of the incoming signal. However, in the majority of cases, OHC loss is the more dominant effect in hearing loss. The effect of OHC loss is that the sharp frequency tuning of the BM is lost in regions where there has been OHC damage, since the OHCs no longer function as active amplifiers. Additionally, those regions of the cochlea lack the gain in signal level provided by the OHCs, so compression does not take place to as large a degree. In cases of severe OHC loss, no compression or signal gain remains, and the BM responds linearly to the incoming sound waves. The current experiment, and hearing aid research in general, are focused on studying the effects of hearing loss caused by OHC damage alone. The next section discusses the perceptual consequences of this type of hearing loss.
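The compressive input-output behavior described above can be caricatured as a broken-stick function. In the sketch below, the 30-90 dB SPL compression region follows the range cited in the text, but the 0.2 dB/dB compression slope and the low-level active gain are illustrative assumptions, not values from this thesis or from the Oxenham and Plack data:

```python
import numpy as np

def bm_output_db(level_db, knee_lo=30.0, knee_hi=90.0, slope=0.2, gain=40.0):
    """Broken-stick caricature of the normal BM input-output function at CF.

    Linear (1 dB/dB) with active gain below knee_lo, compressive between
    knee_lo and knee_hi (the ~30-90 dB SPL range cited in the text), and
    linear again above knee_hi.  The compression slope and low-level gain
    are illustrative assumptions only.
    """
    level_db = np.asarray(level_db, dtype=float)
    out = np.empty_like(level_db)
    lo = level_db < knee_lo
    hi = level_db > knee_hi
    mid = ~lo & ~hi
    out[lo] = level_db[lo] + gain                      # active amplification
    out[mid] = knee_lo + gain + slope * (level_db[mid] - knee_lo)
    out[hi] = knee_lo + gain + slope * (knee_hi - knee_lo) + (level_db[hi] - knee_hi)
    return out

def bm_output_db_impaired(level_db):
    """With total OHC loss: no active gain and no compression -- a linear response."""
    return np.asarray(level_db, dtype=float)
```

Comparing the two functions makes the perceptual consequences discussed in the next section concrete: the impaired response lacks the low-level gain (threshold elevation) and grows at 1 dB/dB where the normal response grows at only ~0.2 dB/dB (recruitment).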
1.3 Psychoacoustics of Hearing Impairment

In discussing the perceptual consequences of hearing loss, it will be helpful to first define some terms commonly used in auditory perception. The first is the "auditory filter." If the transfer function from a sound in air to a person's internal perception of the sound is examined, a good model for this transfer function is a bank of auditory filters. The specific form of the auditory filters will be described later, but in general their center frequencies are logarithmically spaced above around 600 Hz (below this they have constant spacing). For moderate input sound levels the filters are symmetric, but for high input sound levels they are asymmetric and respond more to frequencies higher than the center frequency.

The next term is the "excitation pattern," which is defined as "the output from the auditory filters as a function of filter center frequency..." (Baer and Moore, 1993). In other words, if a sound is played to a listener, the excitation pattern is a plot of how much each auditory filter responds, versus the center frequencies of the auditory filters.

The psychoacoustic effects of OHC loss are generally described in terms of loudness recruitment and absolute threshold elevation, and frequency and temporal smearing. These will be discussed in turn.

Loudness recruitment is a phenomenon that stems from the reduced gain of the cochlea in regions where there is OHC loss. In normal ears, quiet sounds are amplified to a moderate level through the compressive behavior of the cochlea. However, at and above about 100 dB, incoming sounds are no longer amplified (Moore et al., 1996). In impaired ears, quiet sounds are not amplified, and so below a certain level threshold, sounds cannot be heard at all. This is an absolute threshold elevation.
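Threshold elevation and recruitment of this kind are often simulated by expanding the signal's envelope. A single-band caricature is sketched below; published simulations (e.g. Moore and Glasberg, 1993, used later in this thesis) operate separately in several frequency bands, and the parameter values here are illustrative only:

```python
import numpy as np
from scipy.signal import hilbert

def simulate_recruitment(x, alpha=2.0, ref=1.0):
    """Single-band caricature of threshold elevation plus loudness recruitment.

    The Hilbert envelope is raised to the power `alpha` (expansion about the
    reference level `ref`, where normal and impaired loudness are assumed to
    meet).  Sounds well below `ref` are attenuated toward inaudibility, while
    loudness grows abnormally quickly just above the elevated threshold.
    `alpha` and `ref` are illustrative, not values from the thesis.
    """
    x = np.asarray(x, dtype=float)
    env = np.abs(hilbert(x))                # instantaneous envelope
    env = np.maximum(env, 1e-8)             # avoid divide-by-zero
    gain = (env / ref) ** (alpha - 1.0)     # expansion: quiet parts attenuated
    return x * gain
```

With alpha = 2, a 10 dB change in envelope level becomes a 20 dB change after processing, which is the steepened loudness growth described in the next paragraph.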
For sounds just above the threshold, the perceived loudness increases more rapidly with increases in signal level than it would for normal-hearing listeners. This is because of the reduced or absent compression in the cochlea, and because the perceived loudness is related to the amount of BM motion. A graph showing equal-loudness contours for subjects with unilateral hearing losses is shown in figure 1-2.

Figure 1-2: Loudness matching curves for subjects with unilateral hearing loss. The thin line at a 45-degree angle is the expected result if the two ears gave equal loudness results for a tone at a given level. The fact that the actual data lie at a steeper slope than this indicates that for the impaired ear, as the stimulus sound pressure level increases a little, the apparent loudness increases more rapidly than in a normal ear. Also, the asterisks indicate the subjects' absolute thresholds; these are elevated in the impaired ear relative to the normal ear. Taken from Moore et al., 1996, with the 45-degree line added.

Frequency and temporal smearing occur because of the less-sharp tuning of the BM in an impaired cochlea. Locations on the BM at which OHC damage has occurred respond to a wider range of frequencies than they would in a normal ear. The psychoacoustic effect of this is that an impaired listener's auditory filters are wider than normal, and the excitation patterns produced are also less sharp. Also, since a sum of many frequencies (which is what comes through a widened auditory filter) produces a less-regular temporal pattern than a single frequency or just a few frequencies (which is what would come through normal-width auditory filters), the temporal patterns at the output of each auditory filter are more complex. Moore et al.
(1992) provide a convenient summary of the psychoacoustic changes produced by hearing impairments:

(1) It would produce broader excitation patterns for sinusoidal signals. For complex signals containing many harmonics (e.g. speech sounds) this would make the individual components harder to resolve.

(2) For complex signals such as speech it would result in a blurring of the spectral envelope, making spectral features such as formants harder to detect and discriminate (Leek et al., 1987).

(3) It would change the time patterns at the outputs of the auditory filters, generally giving more complex stimulus waveforms (Rosen and Fourcin, 1986).

It is also important to note that these perceptual effects of hearing loss are closely related: it is likely not possible to create neural stimuli that code for one effect but not the others (Moore et al., 1992). In the process of changing either the frequency or temporal patterns in a signal, the perception of the other will almost certainly be changed, because both are coded in the same IHC responses.

1.4 Simulating Hearing Loss

1.4.1 Reasons for Simulating Hearing Loss

As stated in the background, in order to develop signal processing algorithms for use in hearing aids, it is necessary to understand how hearing-impaired people experience sound, so that any differences can be accounted for. It would be very useful to have a computer model that accurately represented hearing-impaired listeners' responses, because simulations could then be run with the model to predict real listeners' responses and provide the optimal hearing aid correction. It is also useful to model the psychoacoustic effects of hearing loss separately.
Though it is not possible to isolate the effects in real listeners, it would be beneficial to determine which effects are most deleterious to speech reception, because hearing aid signal processing schemes may be able to correct for one effect (at a possible expense to the other effects, but with a net benefit to the hearing aid user). Finally, hearing loss models can be used to educate normal-hearing listeners about hearing impairments. Such education could be useful for people who interact frequently with hearing-impaired listeners.

1.4.2 Previous Work Simulating Hearing Loss

Simulations of hearing loss were first produced in the 1930s, when Steinberg and Gardner (1937) used masking noise to simulate increased hearing thresholds and loudness recruitment. However, with masking noise, the mechanisms in the ear and brain that cause the percept of hearing loss in normal-hearing listeners are thought to be different from the mechanisms of hearing impairment in real hearing-impaired listeners (Graf, 1997).

More recently, advances in signal processing capabilities have allowed attempts at more accurate hearing loss simulations. Generally, studies have attempted to generate stimuli that, when played to a normal-hearing listener, simulate one aspect of hearing loss at a time. Some studies have simulated reduced time or frequency resolution, and some have simulated loudness recruitment. A few studies have used a recruitment model in conjunction with a reduced-resolution model to try to simulate hearing loss more generally. However, as was pointed out in the previous section, it is not generally possible to create signals that, when presented to normal-hearing listeners, will evoke a completely accurate perception of hearing loss in all respects.

Moore and Glasberg (1993) simulated threshold elevation and loudness recruitment with a method used later in the current experiment.
They simulated moderate and severe flat hearing loss conditions, and a sloping moderate-to-severe hearing loss condition. They examined responses of normal-hearing subjects listening to the stimuli, which were sentences containing key words. They found that for speech processed in quiet, intelligibility was reduced at low presentation levels. If the speech was amplified to normal conversational levels before processing (simulating a linear hearing aid), however, it was highly intelligible. In contrast, if the speech contained a competing talker in the background, intelligibility was compromised even at high presentation levels. These results were extended by Moore et al. (1995), who simulated recruitment in the same way but with a background of speech-shaped noise. They found that intelligibility was reduced in the unamplified case, but that if linear amplification was applied, intelligibility was almost as good as in the unprocessed (control) condition; performance decreased much less than with the single-talker interference in the previous experiment.

A number of experiments have simulated reduced frequency selectivity. Baer and Moore (1993) processed sentences and presented them at a moderate level to normal-hearing listeners, using an algorithm also used in the current simulation. They found that if speech was processed in a background of quiet, even simulating auditory filters with 6 times the normal width did not affect intelligibility very much. If the speech was processed in a background of speech-shaped noise, intelligibility decreased significantly. The intelligibility decreased more with larger amounts of smearing and with lower signal-to-noise ratios. The authors state that these results are consistent with the behavior of real hearing-impaired listeners. Another study by Baer and Moore (1994) extended these results by processing speech in the presence of a competing talker.
They found that intelligibility was reduced, but not as much as with the noise background.

ter Keurs et al. (1992) tested a different spectral smearing algorithm, also used in the current simulation. They examined both the speech reception threshold (SRT) for sentences and, in a second experiment, vowel (and consonant, not discussed here) confusions caused by the smearing. They found that the subjects' SRT increased once the smearing bandwidth exceeded the ear's critical bandwidth, about 1/3 to 1/2 octave, and continued to increase beyond this point. In the confusion matrix experiments, the total error rate for vowel confusions remained close to zero until somewhere between 1/2 and 2 octaves of smearing. The particular confusions causing the errors tended to be among the back vowels. These results held both for vowels in quiet and in a noise background. A second experiment by ter Keurs et al. (1993) examined SRTs for sentences spoken by a male speaker (the first experiment had used a female speaker) and found similar results regarding the critical bandwidth.

Hou and Pavlovic (1994) performed temporal smearing on vowel-consonant nonsense syllables to evaluate its effect on speech intelligibility, using an algorithm repeated in the current experiment. They found that with small smearing durations, speech was not degraded. Even with the larger durations, speech intelligibility was reduced only when the signals were low-pass filtered. They also demonstrated that the spectral characteristics of the signals were largely unaltered.

Nejime and Moore (1997) simulated the combined effects of frequency smearing followed by loudness recruitment on speech intelligibility, using the models from Baer and Moore (1993) and Moore and Glasberg (1993). Their stimuli were sentences in speech-shaped noise.
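The flavor of spectral smearing in the style of ter Keurs et al. — smoothing the short-term spectral envelope with a Gaussian on a log-frequency axis — can be sketched for a single spectrum as follows. The mapping from the quoted octave bandwidths to the Gaussian width, and the bin-by-bin implementation, are assumptions of this sketch, not the published algorithm:

```python
import numpy as np

def smear_spectrum(mag, freqs, bw_octaves):
    """Smooth a magnitude spectrum with a Gaussian on a log-frequency axis,
    in the spirit of the ter Keurs et al. (1992) spectral-envelope smearing.

    `bw_octaves` is treated here as the Gaussian's -6 dB full bandwidth in
    octaves (an assumption of this sketch).  `freqs` must be > 0, i.e. the
    DC bin should be excluded.
    """
    mag = np.asarray(mag, dtype=float)
    logf = np.log2(np.asarray(freqs, dtype=float))
    # -6 dB full width of B octaves  ->  sigma = B / (2*sqrt(2*ln 2)) octaves
    sigma = bw_octaves / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    out = np.empty_like(mag)
    for k in range(len(mag)):
        w = np.exp(-0.5 * ((logf - logf[k]) / sigma) ** 2)
        out[k] = np.sum(w * mag) / np.sum(w)   # normalized weighted average
    return out
```

In a full simulation this smoothing would be applied frame by frame to overlapping analysis windows and the signal resynthesized, but the single-frame version above is enough to see why formant peaks blur as the bandwidth grows.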
They tested a simulated moderate flat loss or moderate-to-severe sloping loss for both frequency smearing and loudness recruitment, and found that speech intelligibility was degraded substantially and could not be corrected by amplification alone.

In general, these studies have shown that simulations of frequency smearing and loudness recruitment degrade speech intelligibility for sentences with key words, for both male and female speakers. Temporal smearing does not seem to have as much of an effect on speech intelligibility. However, aside from the ter Keurs et al. (1992) experiment, these studies have not attempted to verify the accuracy of the smearing methods beyond gross intelligibility rates. Even in the ter Keurs (1992) paper, a comparison between the confusion matrix resulting from simulated smearing and a confusion matrix corresponding to real hearing-impaired listeners was not provided. So, questions remain about how well these smearing methods simulate real hearing impairments.

1.5 Problem Statement

Most previous studies on hearing loss simulation have examined the amount of speech intelligibility degradation caused by the hearing loss simulations without examining how well the percept evoked in normal-hearing listeners matched what hearing-impaired listeners experience. The simulations described in this thesis attempt to evaluate different hearing loss models of reduced frequency or temporal resolution and loudness recruitment/threshold elevation in order to determine their accuracy. To accomplish this task, a model of human speech perception is employed. This provides a deterministic way of evaluating the hearing loss simulations, and allows multiple parameter values to be considered efficiently.
It is also convenient that the stimuli under consideration, vowels, are determined largely by their spectra; the perceptual model being used operates solely on the basis of signals' spectra, allowing discrimination to proceed without interference from other signal characteristics.

Chapter 2

Signal Processing in the Experiment

2.1 Overview of Experiment

This thesis involved simulating hearing loss using several models. In particular, the models of hearing loss that were compared and evaluated performed frequency smearing and loudness recruitment. These models converted waveforms corresponding to ordinary speech into waveforms that would evoke the percept of hearing loss in normal-hearing listeners.

It was desired to determine how well these models simulated what real hearing-impaired people experience when they hear sounds. In order to determine what people experience, something was needed to convert the physical sound into a person's perception of the sound. This could either be a real person (i.e., subjects could listen to sounds processed by the hearing loss models and report on their perceptions), or it could be a model simulating how normal-hearing people perceive sounds. In this thesis, the latter option was chosen, in order to try to accurately simulate the entire pathway from a sound in air to what a hearing-impaired listener would experience in response to that sound. The combination of a hearing loss model followed by a perceptual model would ideally react to stimuli in the same way as a real person. So, to test this simulation, the same stimuli were presented to real hearing-impaired subjects as well as to the simulation, and the responses of each were compared.

2.2 Overview of Signal Processing

A block diagram showing an overview of all the signal processing steps is shown in figure 2-1. The bottom path in the block diagram shows the processing by real subjects, while the top path shows the steps in the simulation.
The stimuli used to compare the simulation to real subjects were a set of CVC tokens, approximately half spoken by a male speaker and half by a female speaker.

2.2.1 Processing by Real Subjects

The real hearing-impaired subjects performed their "processing" as follows: five hearing-impaired subjects listened to the set of CVC tokens, and confusion matrices were generated based on the subjects' responses. This step was done by Merry Brantley in the Sensory Communication group of the Research Laboratory of Electronics at MIT in 1990 and 1991. Although confusion matrices for both the vowels and the consonants were generated in the previous experiment by Brantley, only the vowel confusion matrices were used in the current research.

2.2.2 Processing by the Simulation

In the simulation, processing was done as follows: first, the vowels were extracted from the CVC tokens. Next, the extracted vowels were put through the hearing loss models, which involved simulating frequency smearing, loudness recruitment, or a combination of the two. The models used in this section were taken from papers by Baer and Moore (1993), ter Keurs et al. (1992), Hou and Pavlovic (1994), Moore and Glasberg (1993), Farrar et al. (1987), and Braida (1991). The hearing loss models produced a set of waveforms that would sound to normal-hearing listeners as if the listeners had hearing loss.

Figure 2-1: Signal processing overview. In the simulation path ("Simulation: Signal Processing"), the CVCs pass through an "Extract Vowels" block, then "Simulated Hearing Loss: frequency smearing and/or recruitment" (together forming the hearing-loss model), then a "Make Channels" block and the d' discrimination algorithm, which produce a matrix of d' values for vowel pairs, and finally a decision model, which produces a confusion matrix (these later blocks form the perceptual model). In the real-subject path, the CVCs, said by a male and a female speaker, are heard by hearing-impaired listeners, yielding a confusion matrix. The two confusion matrices are compared with a Chi-squared measure.

The combination of the vowel extraction block and the simulated hearing loss block formed the hearing loss model portion of the simulation.
The subsequent stages in the simulation made up the perceptual model portion of the simulation. The first stage of the perceptual model shall be referred to as the "d' discrimination algorithm"; it was concerned with determining how perceptually different two sounds would be. This model was created by Farrar et al. (1987). Its first stage, which shall be referred to as the "make channels" block, took as input a single vowel waveform and essentially found the signal power in a number of consecutive frequency bands. The outputs from this block will be referred to as "channels." The next stage of the d' discrimination algorithm took the channels from two different vowels and computed a d' distance between them. This d' distance was interpreted as the distance between the two vowels in "perceptual space," that is, how different they would be perceived to be by a human listener. In response to the set of processed vowel waveforms, the d' discrimination algorithm generated a matrix of d' distances between all vowel pairs.

The second stage of the perceptual model converted this set of d' distances into a confusion matrix. This was done by the decision model block. This block first used classical multi-dimensional scaling to produce a map of the vowels that reflected the between-vowel d' distances. This map was also interpreted as being in perceptual space, as its dimensions were in units of d'. To take into account the perceptual uncertainty that is present when a person is exposed to a stimulus, each point was converted into a Gaussian probability distribution centered at the point's location. Finally, a decision algorithm was applied as follows: the centroid of the locations for each vowel was computed. These centroids were interpreted to be the "response centers" in perceptual space, while the individual vowel locations surrounded by Gaussians were interpreted to be the "stimulus centers."
An ideal detector algorithm was then used: the stimulus centers (or really, the parts of the Gaussians around them) closest to a response center were classified as corresponding to that response. In this manner a confusion matrix was generated corresponding to the processed sounds, completing the simulation pathway. Finally, to compare the responses created by the simulation to those generated by real listeners, a Chi-squared-like measure was used to evaluate how closely the confusion matrices matched.

2.3 Detail on the Brantley Confusion Matrix Experiment

The experiment that produced the confusion matrices for this thesis examined the effect of filtering on audibility in hearing-impaired listeners, and was done by Brantley in 1990-1991. The five subjects in this experiment had moderate or moderate-to-severe hearing losses; three subjects' losses were flat, and two were sloping. Audiograms for the five subjects are presented in figures A-1, A-2, and A-3 in Appendix A.

The experiment used part of a set of 862 consonant-vowel-consonant (CVC) syllables of the form /ə/-CVC, where /ə/ is the unstressed schwa. The set of CVC syllables had the vowels /i, ɑ, u, ɪ, ɛ, ʊ/ and a set of 16 consonants. The Brantley experiment limited its stimuli to the 12 consonants /p, t, k, b, d, g, s, f, θ, v, z, ð/, so only about 484 syllables were used. The mean duration of the syllables spoken by the male speaker was 634 ms, and the syllables spoken by the female speaker had a mean duration of 574 ms (the vowels used in this thesis spoken by the male and female speakers had approximately equal durations). The stimuli in the Brantley experiment were lowpass filtered at 9 kHz and digitized at a sampling rate of 20 kHz with 12-bit resolution.

Each stimulus was filtered into a low-passed portion (below 2500 Hz) and a high-passed portion (above 2500 Hz).
Each portion was presented separately to the five subjects in a background of quiet, and the level was adjusted to maximize the comprehension of the CVC stimuli. Then, the low-passed and high-passed portions of the signal were added together, using the levels found in the first part of the experiment, and a confusion matrix experiment was done. In the experiment, each stimulus was presented to each subject 50 times, and confusion matrices for the vowels and consonants were generated. A table of the levels at which the high-passed and low-passed portions of the signals were presented to each subject is shown in table 2.1.

Subject                         JB    JO    MG    PW    RC
Low-passed portion [dB SPL]     80   110   107    85    80
High-passed portion [dB SPL]    95   105   110    88   102

Table 2.1: Table of the levels at which the low-passed (below 2500 Hz) and high-passed (above 2500 Hz) portions of the CVC syllables were presented to the subjects in the confusion matrix experiment by Brantley (1990-1991).

Since the simulations done in this thesis attempted to duplicate this confusion matrix experiment, a simulation was done using the parameters for each subject (hearing loss and levels used in the experiment). It was hoped that the results of the simulations would correspond to the confusion matrices produced by the respective subjects. The description of the simulation in the following sections describes it as if just one set of parameters was being used, but in reality the simulation was run five times, once for each subject. The confusion matrices produced by the real subjects are shown in Appendix A.

2.4 Speech Stimuli and Vowel Extraction

The stimuli used in the simulation were the same as described above in the section on the Brantley confusion matrix experiment, except that the simulation used the entire set of stimuli with 16 possible consonants, instead of being limited to 12 consonants as in the actual experiment.
Also, the sampling rate of the stimuli in this thesis was 10 kHz instead of 20 kHz, thereby limiting the bandwidth of the stimuli to 5 kHz. For the remainder of this thesis, the vowels /ɑ, i, ε, ɪ, u, ʊ/ will be referred to as AH, EE, EH, II, OO, and UH, respectively. Though these tokens contained both consonants and vowels, only the vowels were used in the simulation. The vowels were thus extracted from the entire word and used in isolation. This is another difference between the Brantley confusion matrix experiment and the simulation in this thesis: in the Brantley experiment, the vowels were heard in the context of consonants, but in this simulation they were in isolation. Vowel extraction was achieved by making use of the fact that the energy in the vowels was usually much greater than that of the consonants. The waveform for each word was squared, finding the power, and the envelope of this was found using a second-order lowpass filter with a cutoff frequency of about 10.3 Hz. Next, a threshold was set to be 0.2 times the maximum value of the power envelope. The longest stretch of contiguous signal above this threshold was judged to be the vowel (this criterion correctly identified the general region of the vowel in all cases). Next, to eliminate the transition regions between the vowel and surrounding consonants, 20 ms of signal was removed from the beginning of the vowel, and 35 ms of signal was removed from the end of the vowel. The extracted vowels were listened to in order to verify that the vowels had been extracted well. A few vowels were removed from the set due to clicks or other signal artifacts. The syllable waveforms at this point had a sampling rate of Fs = 10000 Hz. They were normalized to have a specified average power per unit time. It was also at this point that the signal was scaled so that, for a given subject, the levels above and below 2500 Hz matched those used in the confusion matrix experiment with real subjects.
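The vowel-extraction procedure described above (square the waveform, lowpass the power at about 10.3 Hz, threshold at 0.2 of the envelope maximum, keep the longest contiguous run above threshold, and trim 20 ms from the onset and 35 ms from the offset) can be sketched as follows. This is a reconstruction under stated assumptions; the Butterworth filter type and the function name are choices of this sketch, not details from the thesis:

```python
import numpy as np
from scipy.signal import butter, lfilter

def extract_vowel(x, fs=10000, cutoff=10.3, thresh_frac=0.2,
                  trim_start=0.020, trim_end=0.035):
    """Locate and extract the vowel from a CVC syllable: find the power
    envelope, threshold it, take the longest run above threshold, and
    trim the consonant-vowel transition regions."""
    power = x.astype(float) ** 2
    b, a = butter(2, cutoff / (fs / 2))            # 2nd-order lowpass envelope
    env = lfilter(b, a, power)
    above = env > thresh_frac * env.max()
    # find the longest contiguous run of samples above threshold
    best_len, best_start, run_start = 0, 0, None
    for i, flag in enumerate(np.append(above, False)):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start > best_len:
                best_len, best_start = i - run_start, run_start
            run_start = None
    start = best_start + int(trim_start * fs)      # drop 20 ms at vowel onset
    end = best_start + best_len - int(trim_end * fs)  # drop 35 ms at offset
    return x[start:end]
```

Because the 10.3-Hz envelope filter introduces some lag, the detected region is shifted slightly relative to the true vowel boundaries; the trimming margins absorb part of this.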
2.5 Signal processing simulating hearing loss

2.5.1 Methods of frequency smearing

Three methods of smearing were considered: two methods simulated reduced frequency selectivity in the impaired ear (Baer and Moore, 1993; ter Keurs et al., 1992), and one method simulated reduced temporal resolution in the impaired ear (Hou and Pavlovic, 1994). These methods shall be referred to as the "Moore," "ter Keurs," and "Hou" methods for short. In all of these methods, the purpose of processing was to generate a time waveform that would sound to normal-hearing listeners as if they had hearing loss.

Moore Smearing

The "Moore" method was first presented in Moore et al. (1992), then modified somewhat in Baer and Moore (1993), and later expanded upon and further modified by Nejime and Moore (1997). The method used in this experiment is that presented in the Baer and Moore (1993) paper. This smearing method attempted to create an excitation pattern, evoked in a normal-hearing listener by a processed sound, that would resemble the excitation pattern evoked in an impaired-hearing listener by the unprocessed sound. This attempted to duplicate the effect of the wider-than-normal auditory filters and reduced frequency selectivity found in impaired-hearing listeners. The particular degree of widening was intended to resemble that found in individuals with "moderate or severe cochlear hearing loss" (Baer and Moore, 1993). This processing notably did not attempt to duplicate any time-domain features of what hearing-impaired listeners would experience, but only spectral differences, as it is likely impossible to accomplish both simultaneously (Baer and Moore, 1993). Block diagrams of the processing done in this method are shown in figure 2-2. This method of smearing used overlap-and-add to process the input signal over time. 8-ms chunks of the waveform were extracted, then windowed, and an FFT was done on each chunk.
Next, the FFT was separated into the amplitude and phase components. The phase component was left unmodified during the processing. The amplitude component was squared to generate the power, and then was smeared as follows. Auditory filters of normal width and of a width N times larger than normal were calculated, with center frequencies at each of the frequencies in the FFT. These auditory filters were stored in a matrix, such that if an FFT was multiplied by the matrix, each frequency in the FFT would contribute its own excitation pattern to the final result. In other words, the end result of an FFT multiplied by the matrix was the excitation pattern generated by a stimulus with that FFT, using those auditory filters.

Figure 2-2: Block diagrams of the Moore frequency smearing algorithm. The top block diagram shows the main processing sequence, and the bottom block diagram shows detail for the "Smear Spectrum" block in the top part. Taken from Baer and Moore, 1993.

To create the effect of normal-hearing listeners having widened auditory filters, the signal was multiplied by the wider-than-normal filter matrix in order to generate the effect of wider auditory filters, and then multiplied by the inverse of the normal-width filter matrix. This procedure removed the effect of a normal-hearing listener's own auditory filters, leaving just the wider-than-normal filters to create an excitation pattern. After this smearing procedure, the square root of the result was taken to convert back to amplitude, and this was combined with the unmodified phase.
An inverse FFT was taken, and then the different processed chunks of the signal were combined using the overlap-add procedure to create the final output waveform. The particular details of the processing were as follows. The original vowel signals, at a sampling rate of Fs = 10000 Hz, were resampled to 16000 Hz for the Moore processing. Next, the vowel waveforms were divided up into frames for processing (analogous to the inverse of overlap-and-add). Frames were of length 128 samples, corresponding to 8 ms, with a shift of 64 samples (4 ms) between frames. The frames were then windowed with Hamming windows of length 128 samples, then padded with 64 zeros on each side, to create a total length of 256 samples. An FFT of each frame was then taken, and separated into the amplitude and phase. The amplitude component was squared, then multiplied by two matrices. One represented auditory filters widened by a factor of N, and one was the inverse of a matrix of normal-width auditory filters. The auditory filters were symmetric and were of the form of a roex(p) filter:

(2.1) W(g) = (1 + pg) exp(-pg),

"where g is the [frequency] deviation from the center frequency (fc) of the filter divided by fc." W(g) is the resulting filter shape, and p is a measure of the sharpness of the filter. p is calculated using the following equations:

(2.2) ERB = 24.7(0.00437 fc + 1),

(2.3) p = 4 fc / ERB.

In these equations (and elsewhere), ERB is the equivalent rectangular bandwidth of the filter. To simulate widening of the auditory filters, the value of p calculated via these equations was then divided by the widening factor (N). Or equivalently, the calculated ERB was multiplied by the widening factor before calculating p. For the normal-width filters, p was not modified. The frequencies in the FFT (and hence the frequencies at which the auditory filters were centered) ranged from 0 to 8000 Hz in steps of 62.5 Hz.
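The roex(p) filter matrices and the two matrix multiplications described above can be sketched as follows. This is a simplified reconstruction (a small frequency grid, no windowing or overlap-add, and illustrative function names), not the thesis implementation:

```python
import numpy as np

def roex_matrix(freqs, widen=1.0):
    """Matrix of symmetric roex(p) auditory filters, one row per center
    frequency: row i holds W(g) = (1 + p g) exp(-p g) centered at
    freqs[i], with its ERB widened by the given factor (equations
    2.1-2.3 in the text)."""
    mat = np.zeros((len(freqs), len(freqs)))
    for i, fc in enumerate(freqs):
        if fc == 0:
            mat[i, i] = 1.0                      # degenerate DC "filter"
            continue
        erb = 24.7 * (0.00437 * fc + 1.0) * widen
        p = 4.0 * fc / erb
        g = np.abs(freqs - fc) / fc
        mat[i] = (1 + p * g) * np.exp(-p * g)
    return mat

def smear_power_spectrum(power, freqs, widen=3.0):
    """Moore-style smearing: compute the excitation pattern with widened
    filters, then 'undo' the normal-width filtering via a matrix inverse."""
    a_normal = roex_matrix(freqs)
    a_wide = roex_matrix(freqs, widen)
    return np.linalg.solve(a_normal, a_wide @ power)
```

With widen = 1 the two matrices cancel and the spectrum passes through unchanged; with widen > 1 spectral contrast is reduced. Note that the inverse of the normal-width matrix is not guaranteed to keep the result non-negative, which is one practical caveat of this formulation.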
After the smearing was performed, the frequencies higher than 6875 Hz (the 110th frequency) were set to 0. Next, the square root of the result was taken to generate the spectrum amplitude instead of power. Finally, the original phase was recombined with the modified spectrum, and an inverse FFT was taken to create a time-domain signal of length 256. These 256-point frames were recombined using the overlap-add method with a shift between frames of 64 samples. In the Baer and Moore paper (1993), the resulting signal was low-pass filtered at 7 kHz both before and after the smearing process; however, this was not done in the current implementation of the smearing algorithm. Finally, the power of the smeared waveform was normalized to equal the power of the original input signal, in order to keep the level constant across stimuli. Examples of the outputs of the processing by the Moore method in the present simulation are shown in figures 2-3 and 2-4. The second figure shows the spectrum converted into "channels," which essentially shows the excitation pattern produced by the signal. The process of creating these channels will be explained later. It should be noted that the implementation in this research used symmetric auditory filters. This assumption is realistic at moderate sound levels, but both the final stages of processing and the real hearing-impaired listener experiment used high sound levels, at which this assumption would not be accurate. This is one potential source of error in the results, and will be discussed later. Also, the Baer and Moore (1993) paper discussed the actual effective degree of smearing created by the model relative to the intended degree of smearing. This is illustrated in figure 2-5, showing examples of a vowel spectrum before and after smearing. They found that as a result of the overlap-add procedure, the final waveform had an effective degree of smearing much less than specified.
For an intended smearing factor of 3, a representative vowel had an actual smearing factor of about 1.5, as determined by a least-squares matching over the formant-containing frequencies. This result also applies to the present simulation, and it is likely that it applies to the ter Keurs et al. (1992) algorithm as well, which also uses the overlap-add method. The ter Keurs et al. algorithm used longer windows than the Baer and Moore algorithm, so the overlap-add method may have less of an effect on reducing the effective degree of smearing.

Figure 2-3: Spectra resulting from different amounts of Moore smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -40 dB increments. The labels to the left of the traces indicate the amount of smearing, in terms of the effective number of ERBs of the widened auditory filters; US indicates the unsmeared signal.

Figure 2-4: Channels resulting from different amounts of Moore smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -8 dB increments. The labels to the left of the traces indicate the amount of smearing, in terms of the effective number of ERBs of the widened auditory filters; US indicates the unsmeared signal.
ter Keurs Smearing

Like the Moore method of smearing, the ter Keurs method (ter Keurs et al., 1992, and ter Keurs et al., 1993) simulated only reduced frequency resolution. Unlike the Moore method, the algorithm did not attempt to simulate the frequency resolution of hearing-impaired listeners exactly; rather, it was a more general-purpose smearing algorithm that simulated reduced frequency resolution. The model did not smear the actual spectrum of the signal, as did the Moore model, but rather the spectral envelope, and then imposed the smeared envelope on the original signal: ter Keurs et al. (1992) write, "in this way spectral contrasts [are] gradually reduced, but the signal's phase and harmonic structure [are] preserved." The smearing was accomplished by first projecting the spectral envelope onto a log-frequency scale, then convolving it with a Gaussian-shaped filter that effectively acted as a low-pass filter. The convolution acted to average the envelope over B octaves, where B is the effective ERB of the filter. The overall processing system used in the ter Keurs et al. (1992) model is shown in figure 2-6, and the specifics of the spectral smearing are shown in more detail in figure 2-7. A sample of the output of the model used by ter Keurs et al. is shown in figure 2-8, while samples of the output of the implementation of the ter Keurs method used in the current simulation (which differs slightly from the original) are shown in figures 2-9 and 2-10. The signal processing sequence is, in general, rather similar to the Moore method: it divides the signal into frames, does an FFT, smears the spectrum, does an inverse FFT, then creates the new signal via the overlap-add method. In detail, the processing was as follows. The vowel waveforms were first resampled to Fs = 15625 Hz, having previously been at Fs = 10000 Hz. (The waveforms used in the experiments run by ter Keurs et al. (1992 and 1993) were bandlimited to 6250 Hz.)
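The core of the ter Keurs approach, smearing a spectral envelope by Gaussian convolution on a log2-frequency axis, can be sketched as follows. This is a reconstruction under stated assumptions: the uniform 512-point octave grid, the stepwise ("previous-value") projection, the kernel normalization, and the function name are choices of this sketch rather than details from the papers:

```python
import numpy as np

def smear_envelope_log2(env, freqs, b_octaves):
    """Smear a spectral envelope on a log2-frequency axis by convolving
    with the Gaussian W(x) = exp(-(x / B)^2), with x in octaves and B
    the kernel width in octaves (a ter Keurs-style smearing sketch)."""
    fpos = freqs[freqs > 0]
    logf = np.log2(fpos)
    # uniform grid in octaves covering the positive frequencies
    grid = np.linspace(logf.min(), logf.max(), 512)
    # stepwise ("previous-value") projection onto the log grid
    idx = np.searchsorted(logf, grid, side='right') - 1
    env_log = env[freqs > 0][idx]
    # Gaussian smearing kernel on the octave axis, normalized to unit sum
    dx = grid[1] - grid[0]
    x = np.arange(-len(grid) // 2, len(grid) // 2) * dx
    w = np.exp(-(x / b_octaves) ** 2)
    w /= w.sum()
    smeared = np.convolve(env_log, w, mode='same')
    # map back to the original linear-frequency bins
    out = env.copy()
    out[freqs > 0] = np.interp(np.log2(fpos), grid, smeared)
    return out
```

A sharp envelope peak is flattened and spread over roughly B octaves, which is the reduction of spectral contrast that the processing is meant to produce.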
Figure 2-5: Sample spectra showing the results of smearing, for a synthetic vowel /æ/. The different frames (vertical dimension in the figure) show successive chunks of signal that are partially overlapping. As can be seen in the right column, showing the output spectra, the effective degree of smearing is less than was intended. The intended degree of smearing is seen in the middle column, which shows the results after smearing but before the overlap-add process. Taken from Baer and Moore, 1993.

A vowel to be smeared was then divided into 256-sample frames (corresponding to 16.4 ms) with a shift of 64 samples (4.1 ms) between frames. Each frame was then windowed using a Hamming window, and padded with 384 zeros on each side to make a total length of 1024 samples. An FFT was taken of the result, and the power spectrum was found by squaring the amplitude of the FFT. At this point, the processing done in the current research differed slightly from that described in the ter Keurs et al. papers. In the current simulation, the envelope of the power spectrum was found by low-pass filtering it twice (in succession) with a first-order low-pass filter with a 3-dB frequency of 894.4 Hz.
In the paper, the spectral envelope was found by taking the log of the power spectrum, then an inverse FFT, to find the signal's cepstrum. Regarding the subsequent processing, ter Keurs et al. (1992) write, "This cepstrum is "liftered" and through a forward FFT the (log) spectral envelope is obtained. The envelope is then adjusted to run smoothly over the spectral peaks." It is unclear at what point in ter Keurs et al.'s implementation the inverse log of the spectral envelope was taken; they might have convolved the Gaussian filter with the log spectral envelope rather than with the linear spectral envelope, as was done in the current simulation. This is another potential way that the processing techniques could have differed. After the spectral envelope was found, it was projected onto a log base 2 frequency scale. In the current implementation, since this projection is non-uniform, intermediate values were interpolated in a stepwise manner: the values between one known point and the next were set to the value of the lower-frequency point. Next, the log-scale power spectrum was smeared by convolving it with a Gaussian of the following form:

(2.4) W(log2 f) = exp[-(log2 f / B)^2],

where log2 f is the base 2 logarithm of the frequency scale, and B is the ERB of the Gaussian, in octaves, as previously stated. Further details about the mathematics behind this smearing process can be found in ter Keurs et al. (1993). After the log-scale power spectrum was smeared, it was converted back to a linear frequency scale.

Figure 2-6: Block diagram showing the details of the ter Keurs processing method.
Taken from ter Keurs et al., 1992.

Figure 2-7: Block diagram showing the details of the Spectral Smearing block from figure 2-6. Taken from ter Keurs et al., 1992.

Figure 2-8: Sample of the outputs of the ter Keurs frequency smearing algorithm. The top log spectrum, labeled "US," shows the unsmeared signal, while the bottom three spectra show the results of smearing of 1/8, 1/2, and 2 octaves, as labeled. Taken from ter Keurs et al., 1992.

Figure 2-9: Spectra resulting from different amounts of ter Keurs smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -50 dB increments. The labels to the left of the traces indicate the amount of smearing, in octaves; US indicates the unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs of smearing.

Figure 2-10: Channels resulting from different amounts of ter Keurs smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment.
The unsmeared signal is shown at the regular level, while the other traces have been offset by -12 dB increments. The labels to the left of the traces indicate the amount of smearing, in octaves; US indicates the unsmeared signal. One octave of smearing is approximately equivalent to 3.9 ERBs of smearing.

The smeared envelope was then imposed on the original spectrum by multiplying the original spectrum by the ratio of the new and old envelope amplitudes (i.e., the square roots of the smeared and original power spectra). The inverse FFT was taken, and the result was then windowed again. This second window was of length 1024 samples, and had a "unity central part of 256 samples and sine wave-squared skirts." Finally, the output signal was reconstructed by overlap-adding the frames, with a shift of 64 samples between frames. The output signal was then truncated appropriately to match the length of the input signal, and normalized to match the power per unit time of the input signal.

Hou Smearing

The method of smearing created by Hou and Pavlovic (1994) attempted to simulate the reduced temporal resolution found in hearing-impaired listeners. As such, the frequency content of the processed signals was close to normal; indeed, one of the intentions of the authors was to not corrupt the signal's spectrum, as they intended to isolate the reduced temporal resolution aspect of hearing loss while leaving out the other aspects. Also unlike the previous two models, this algorithm does not use an overlap-add paradigm, but instead first filters the signal into a number of adjacent frequency bands, then does processing on the entire waveform for each band. This was done to simulate the ear's analysis of the sound by the "auditory filters" into similar adjacent frequency bands (or channels). By doing temporal smearing within each frequency region separately, the brain would be presented with a signal similar to what hearing-impaired listeners' brains would receive.
The model's method of reducing temporal resolution was based on a model of human perception that included a few stages beyond the separation of the signal into different frequency channels. A block diagram of this model is shown in figure 2-11. In this model, after the signal is filtered into adjacent frequency bands, a nonlinear device transforms the signal. A squaring operation was assumed by the authors, although other nonlinear devices such as rectification are possible (Hou and Pavlovic, 1994). After the nonlinear stage is a temporal window that uses the information under the window; finally, a decision device compares the resulting signals. The Hou and Pavlovic model incorporates stages to exploit the properties of this perceptual model, as is explained in the following paragraph.

Figure 2-11: Block diagram of the perceptual model of human temporal processing, used by Hou and Pavlovic in providing rationale for their temporal smearing model. Taken from Hou and Pavlovic, 1994.

The specifics of the implementation of the Hou and Pavlovic (1994) processing method are as follows (see figure 2-12). The 10000 Hz vowel signals were first resampled to 16000 Hz, to have a sampling rate similar to the other smearing methods. Next, the signal is filtered into 32 channels by fourth-order Gammatone filters spaced one ERB apart. The Gammatone filter has a shape very similar to that of the roex(p) filter used in the Moore method. (The authors considered using roex(p) filters, which are presumably a better representation of the true auditory filters, but decided on Gammatone filters for their computational efficiency.) For the purposes of implementing the described model as closely as possible, the current research also used Gammatone filters to create the channels for this stage.
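As an illustration of the gammatone filterbank stage, the following sketch samples the standard fourth-order gammatone impulse response and applies it by FIR convolution. The 30-ms truncation, the FFT-based gain normalization, and the function names are assumptions of this sketch, not details from Hou and Pavlovic:

```python
import numpy as np

def gammatone_ir(fc, fs, dur=0.030):
    """Sampled impulse response of a 4th-order gammatone filter,
    g(t) = t^3 exp(-2 pi b t) cos(2 pi fc t) with b = 1.019 * ERB(fc),
    normalized to roughly unit gain at fc (an FIR approximation)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (0.00437 * fc + 1.0)
    b = 1.019 * erb
    g = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    # normalize so a tone at fc passes with approximately unit gain
    spec = np.fft.rfft(g, 8192)
    k = int(round(fc * 8192 / fs))
    return g / np.abs(spec[k])

def gammatone_bank(x, fs, centre_freqs):
    """Split a signal into channels with one gammatone filter per
    center frequency (FIR convolution; a sketch, not the thesis code)."""
    return np.array([np.convolve(x, gammatone_ir(fc, fs), mode='same')
                     for fc in centre_freqs])
```

A tone falls almost entirely into the channel whose center frequency matches it, which is the frequency-analysis behavior the filterbank is meant to mimic.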
The impulse response of the Gammatone filter is

(2.5) g(t) = t^3 exp(-2πbt) cos(2πf0t),

"where t is time, f0 is the center frequency of the filter, and b = 1.019B is the parameter determining the sharpness of the filter," with B being the ERB of the filter (Hou and Pavlovic, 1994).

Figure 2-12: Block diagram of the Hou temporal smearing model. The top diagram (a) shows the overall signal processing path, and the bottom diagram (b) shows the specifics of how temporal smearing is accomplished. Figure reproduced from Hou and Pavlovic, 1994.

In implementing the Gammatone filter bank, the paper used 23 filters covering a frequency span of 200-5000 Hz, as the stimuli they used were bandlimited to this frequency range. However, the present implementation used stimuli that had frequencies below 200 Hz and (due to upsampling to 16000 Hz) up to 8000 Hz. Though the frequencies above 5000 Hz were generated through the upsampling process and were extremely small in magnitude, to make the results as compatible as possible with the other smearing methods, filters were used up to this frequency. After the signal is separated into channels, each channel is processed separately for smearing. The temporal smearing process, shown in figure 2-12 (b), first involved finding the temporal envelope of the channel, using a Hilbert transform. The envelope was then squared, and convolved with a "smearing temporal window." This smearing window has a shape similar to that of the auditory temporal window used in the perception model described above; both of these windows are roex(Tp) functions.
These have the form of

(2.6) W(t) = (1 + 2t/Tp) exp(-2t/Tp),

where t is time, Tp is a time constant of the window, and ERD = 2Tp, where ERD is the equivalent rectangular duration of the window. In the paper, the value of ERD ranged from 0 to 28 ms. After the envelope of the signal was convolved with this temporal window, a square root was taken. Through this process of squaring the envelope, convolving it with the smearing temporal window, and taking the square root, the net effect was that, if a normal-hearing listener heard the smeared signal, the effective auditory window would be the convolution of the listener's internal auditory window and the smearing temporal window:

(2.7) Wperceived,new(t) = Wsmearing(t) * Winternal(t).

Following the square root, the original channel waveform is then scaled to have the new envelope (appropriately time-aligned). This preserves the fine structure but changes the envelope of the channel. Then, each channel is re-filtered with the same Gammatone filter that was used to generate the channel in the first place, in an attempt to reduce the spectral consequences of the time-domain smearing process. Finally, the smeared channels are added together again to produce the final output signal, and the result is normalized to the power per unit time of the original input signal. A sample of outputs from the model used by Hou and Pavlovic is shown in figure 2-13, while examples of outputs from the model used in the present simulation (which differs slightly from the original) are shown in figures 2-14 and 2-15.

2.5.2 Method of Recruitment

The method of loudness recruitment was taken from the paper by Moore and Glasberg (1993). The algorithm simulates the effects of the threshold elevation found in hearing-impaired people, and the abnormally rapid growth of loudness that occurs at audible levels.
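Returning to the Hou method, the per-channel envelope-smearing steps described above (square the Hilbert envelope, convolve with the roex(Tp) window of equation 2.6, take the square root, and re-impose the result on the fine structure) can be sketched as follows. The window length (about five ERDs) and the unit-sum normalization are assumptions of this sketch:

```python
import numpy as np
from scipy.signal import hilbert

def roex_window(erd_s, fs):
    """roex(Tp) smearing window W(t) = (1 + 2t/Tp) exp(-2t/Tp), with
    equivalent rectangular duration ERD = 2 Tp, sampled and normalized
    to unit sum so that the overall envelope energy is preserved."""
    tp = erd_s / 2.0
    t = np.arange(int(5 * erd_s * fs)) / fs      # ~5 ERDs covers the tail
    w = (1 + 2 * t / tp) * np.exp(-2 * t / tp)
    return w / w.sum()

def smear_channel(ch, fs, erd_s):
    """Temporal smearing of one filterbank channel: square the Hilbert
    envelope, convolve with the roex window, take the square root, and
    re-impose the smeared envelope on the original fine structure
    (a sketch of the Hou-style procedure)."""
    env = np.abs(hilbert(ch))
    sm = np.sqrt(np.convolve(env ** 2, roex_window(erd_s, fs))[:len(ch)])
    fine = ch / np.maximum(env, 1e-12)           # unit-envelope fine structure
    return fine * sm
```

Applied to an amplitude-modulated tone, the processing leaves the carrier intact but flattens the modulation, which is exactly the loss of temporal detail the method is meant to simulate.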
Loudness recruitment appears to stem from impaired-hearing listeners not having as much of the compressive nonlinearity normally found in the cochlea. Since a normal-hearing listener's cochlea acts to compress incoming sounds over a large range of levels, to induce in normal-hearing listeners the perception of not having this compression, the processing procedure expanded the envelopes of the signals. Also, in listeners with only one impaired ear, the loudness perceived in the impaired ear equals the loudness in the normal ear at a certain (high) level. This effect was also included in the simulation by not expanding portions of the signal above a threshold level. A general block diagram of the recruitment processing procedure can be found in figure 2-16. As can be seen in this diagram, the signal to be processed was first broken into a number of frequency bands. As with the similar step in the Hou smearing method, this step confined the effects of the processing so that the different "channels" would be processed independently.

Figure 2-13: Sample of outputs from the Hou temporal smearing model. The top panels (a and b) show the unsmeared signal and its spectrum, and the lower panels show the signal and its spectrum after it has been smeared with an ERD of 0.1 ms and 28 ms, respectively. Figure taken from Hou and Pavlovic, 1994.

Figure 2-14: Spectra resulting from different amounts of Hou smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment.
The unsmeared signal is shown at the regular level, while the other traces have been offset by -50 dB increments. The labels to the left of the traces indicate the amount of smearing, that is, the equivalent rectangular duration of the smearing temporal window, in msec; US indicates the unsmeared signal.

Figure 2-15: Channels resulting from different amounts of Hou smearing on the vowel II taken from the word BIIDH. All files were put through the recruitment algorithm but with no recruitment. The unsmeared signal is shown at the regular level, while the other traces have been offset by -8 dB increments. The labels to the left of the traces indicate the amount of smearing, that is, the equivalent rectangular duration of the smearing temporal window, in msec; US indicates the unsmeared signal.

Figure 2-16: Block diagram of the model of loudness recruitment by Moore and Glasberg, 1993. The parallel arrows between the middle blocks represent the outputs of the 23 bandpass filters in the first block. Figure taken from Moore and Glasberg, 1993.

In this recruitment algorithm, the filters used to create these channels had bandwidths similar to those of the auditory filters that would be found in impaired-hearing listeners. This was not to simulate additional frequency smearing, but to try to duplicate the frequency independence that would be found in impaired-hearing listeners. Furthermore, this bandpass filtering allowed the amount of recruitment being simulated at different frequencies to be varied.
Next, the envelope of each channel was found, and the fine structure was set aside for later use. The envelope of the signal was raised to a power N, then the fine structure was recombined with the now-recruited envelope. The purpose of expanding the envelope while leaving the fine structure alone was to minimize spectral distortion. Finally, the channels were recombined to form a single output waveform.

The details of the processing are as follows. In the paper, the filters in the initial filter bank were each formed of a series of four first-order Gammatone filters. In the present implementation, however, it was decided to use a single fourth-order Gammatone filter, as had been done in the Hou method. The simulation described in the paper used thirteen filters, with center frequencies from 100 to 5837 Hz. However, it was found that using the filters as described caused undesired spectral distortion in the resulting signal. So, instead, 23 filters were used, with center frequencies from 100 to 9190 Hz. The first six filters were spaced every 100 Hz, and thereafter they had a spacing of one ERB, with the ERBs corresponding to the auditory filters of a normal-hearing listener, 0.16·f_center.[1] The filter center frequencies were the same as those used later in the d' discrimination model, and the filter spacings were described in that paper (Farrar et al., 1987). Though the filters in the recruitment algorithm were spaced every ERB that a normal-hearing listener would have, the filters themselves had ERBs three times that width, corresponding to the ERBs of impaired-hearing listeners, as explained above. This led to a gain in the signal which was removed at the end of the processing for this algorithm.

[1] The 100-Hz spacing of the low-frequency auditory filters with non-constant ERBs probably contributed to some spectral distortion, but having logarithmically-spaced auditory filters there would have been even more unnatural. The best solution would probably be to have fixed-width auditory filters with ERBs corresponding to an f_center of 600 Hz.

After filtering into channels, the envelope of each channel was found via a Hilbert transform, and the fine structure was calculated by dividing the original signal by the envelope. Next, the envelope was filtered with a low-pass filter with a 3-dB point of 100 Hz. The purpose of this step was to prevent spectral distortion; also, Moore and Glasberg (1993) found that if the envelope were not low-pass filtered, the resulting sounds "had a somewhat 'crunchy' quality." After the channel envelopes were low-pass filtered, each envelope was expanded in the following manner. It was assumed that loudness recruitment would not take place above 100 dB; that is, above this level the loudness hearing-impaired listeners would experience would match the loudness experienced by normal-hearing listeners. If the unprocessed channel envelope, in dB, is Lu; the envelope after amplitude expansion is Lp; and the power that the envelope is raised to (only below 100 dB) is N, then the following formula was used to simulate amplitude expansion:

Lp = N·Lu + K. (2.8)

K is a constant defined by K = (1 - N)·100, such that at 100 dB, Lu = Lp = 100 dB. Substituting in for K, we have:

Lp = N·Lu - (N - 1)·100. (2.9)

If Lu was above 100 dB, Lp was set equal to Lu. This method of expansion was the one used in Nejime and Moore (1997), while the method in Moore and Glasberg (1993) multiplied the original Hilbert envelope by the low-pass filtered envelope raised to the power N - 1. The current implementation assumed that a value of N = 2 corresponded to a 50 dB hearing loss. At this point, an attempt was made to fit the simulation's parameters to the audiograms of the different real subjects that took part in the confusion matrix experiment.
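The amplitude-expansion step of equations 2.8 and 2.9 can be sketched as follows (a minimal illustration, assuming levels are already expressed in dB; the function and parameter names are mine, not from the thesis):

```python
def expand_envelope_db(L_u, N, pivot_db=100.0):
    """Expand an envelope level L_u (in dB) by exponent N (eq. 2.9).

    At or above pivot_db (the 100-dB recruitment limit) the level is
    returned unchanged, matching the rule that Lp = Lu there."""
    if L_u >= pivot_db:
        return L_u
    # Lp = N*Lu - (N - 1)*100, so Lp = Lu exactly at the 100-dB pivot
    return N * L_u - (N - 1.0) * pivot_db
```

With N = 2, a channel envelope at 50 dB maps to 0 dB while one at 100 dB is unchanged, which is the 50-dB flat-loss calibration described above.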
To do this, for each subject, different values of N were assigned to each channel in accordance with how much hearing loss the subject had at that frequency. Since loudness recruitment is caused by outer hair cell loss, the amount of which is correlated with the amount of hearing loss, the value of N for the channel with center frequency f_c was determined by the equation

N(f_c) = 100 / (100 - HL(f_c)), (2.10)

where HL(f_c) is the amount of hearing loss at frequency f_c, in dB. Finally, the independently-recruited channels were added back together, and the output was scaled so that if the recruitment parameter was N = 1, i.e. no recruitment took place, the level of the resulting spectrum matched the original signal's spectrum as closely as possible. The algorithm did modify the signal slightly even if no recruitment took place, through the process of dividing the signal into channels and adding them back together again. Also, in the implementation by Moore and Glasberg (1993), the stimuli they used were low-pass filtered at 7000 Hz. Examples of the output from the current simulation are shown in figures 2-17, 2-18, and 2-19. As can be seen in these figures, the recruitment process produces both spectral distortion and absolute level shifts for signals below 100 dB; for signals at 100 dB there is a little spectral distortion, and for signals well above 100 dB there is no distortion or change in level.

Figure 2-17: Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The vowel was at a level of 100 dB.
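The mapping of equation 2.10 from a channel's hearing loss to its expansion exponent is simple enough to verify directly (a sketch; the function name is mine):

```python
def loss_to_exponent(HL_db):
    """Recruitment exponent N for a channel with hearing loss HL_db (eq. 2.10)."""
    return 100.0 / (100.0 - HL_db)
```

A 0-dB loss gives N = 1 (no expansion) and a 50-dB loss gives N = 2, matching the calibration stated above; the exponent grows without bound as the loss approaches 100 dB.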
Recruitment of N = 1.0 (where N is the exponent to which the envelope is raised) corresponds to a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50 dB, and in general the recruitment amounts correspond to hearing losses via the equation N = 100 dB/(100 dB - hearing loss dB). No additional offsets were used in creating the plot: the effective level changes with increasing recruitment due to the increased absolute threshold.

Figure 2-18: Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The vowel was at a level of 80 dB. Recruitment of N = 1.0 (where N is the exponent to which the envelope is raised) corresponds to a hearing loss of 0 dB, recruitment of N = 2.0 corresponds to a flat hearing loss of 50 dB, and the other recruitment amounts correspond to hearing losses via the equation N = 100 dB/(100 dB - hearing loss dB). No additional offsets were used in creating the plot: the effective level changes with increasing recruitment due to the increased absolute threshold.

Figure 2-19: Channels resulting from different amounts of recruitment on the vowel II taken from the word BIIDH. The solid lines are the result of recruitment with a simulated loss of 0 dB, and the dotted lines are the result of recruitment with a flat loss of 50 dB. The dashed line shows channels made from the original signal without recruitment, showing that simulating a loss of 0 dB does not distort the signal very much.
Signals with levels of 120, 100, and 80 dB were recruited corresponding to losses of 0 dB and 50 dB, creating pairs of lines for each signal level. The 120 dB signal is above 100 dB, the limit of recruitment, so recruitment has little effect. The 100 dB signal has its highest parts at around the limit of recruitment, so they are not affected, while the higher-frequency portions of the signal are lower in level and so are distorted. The 80 dB signal is severely distorted and reduced in level by recruitment.

Table 2.2: Table of the different processing conditions used in the simulation.

First Stage          | Second Stage
(none)               | (none)
Moore smearing       | (none)
ter Keurs smearing   | (none)
Hou smearing         | (none)
(none)               | Recruitment
Moore smearing       | Recruitment
ter Keurs smearing   | Recruitment
Hou smearing         | Recruitment

2.5.3 Combining the Smearing Methods and Recruitment

In the simulation, in some processing conditions the various smearing methods were applied in isolation, and in other processing conditions the model of recruitment was applied after one of the smearing methods. The case of smearing followed by recruitment was previously studied by Nejime and Moore (1997) using the same Moore frequency smearing function and model of recruitment used in the current simulation. Since the model of recruitment tended to modify the spectrum slightly even if no recruitment was being done, in the current simulation all of the processing conditions were put through the recruitment model, but for the conditions where recruitment was not desired, a recruitment parameter of N = 1 was used. The eight different processing conditions used in this simulation are shown in table 2.2. In the table, an entry specifying recruitment indicates a value of N other than 1.

2.6 d' Discrimination Model

The next steps in the signal processing chain are part of the perceptual model. This model is based around the d' discrimination model (Farrar et al., 1987).
In essence, the d' discrimination model assumes that people will discriminate between sounds based on their spectral differences. So, the model first finds a representation of the spectra of the signals to be compared. It assumes that signals will be classified based on the linear differences between their spectra, so it computes a linear difference between pairs of spectra. Support for the model's accuracy is found in the Farrar et al. (1987) paper, in which listeners' responses to speech sounds modeled by shaped noise were reasonably well predicted by the model. The model involves two sub-blocks, which are illustrated in figure 2-20. In the first block, the make-channels block, the model filters the signal into a number of bands spaced by one ERB, as was done in the Hou and recruitment models. The center frequencies of the filters were the same as those used in the recruitment model. As with the Hou and recruitment models, this stage is meant to simulate the auditory filtering performed in humans. One difference from the Hou and recruitment models was the specific shape of the filters used to form the channels. The Farrar et al. paper used an earlier model of the filters, also derived by Patterson (1974). The form of these filters was

H_j(f) = 1 / [1 + ((f - f_c,j)/a)²]², (2.11)

where f_c,j is the jth filter's center frequency, and a is a constant that determines the width of the filter via ERB = (π/2)·a. The ERBs of the filters were set to 100 Hz for the lowest 6 filters (up to 600 Hz), and were set to 0.16·f_c,j for the filters centered at frequencies higher than 600 Hz. There were 23 filters total. After filtering, the total power per unit length was found for each filter output, then the natural log (ln) was taken of the result. The resulting quantity shall be referred to as the "channels" from a signal.
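As a sanity check on the filter shape, a squared-Lorentzian magnitude of the form 1/[1 + ((f - f_c)/a)²]² is the reading of Patterson's filter consistent with the stated relation ERB = (π/2)·a, which can be verified numerically (my own sketch, not code from the thesis):

```python
import math

def patterson_weight(f, f_c, a):
    # Squared-Lorentzian filter magnitude; its ERB works out to (pi/2)*a
    return 1.0 / (1.0 + ((f - f_c) / a) ** 2) ** 2

# ERB = (integral of the filter over frequency) / (peak value).
# The peak value is 1 at f = f_c, so a Riemann sum of the filter suffices.
a, f_c = 100.0, 1000.0
dx = 0.001  # step in units of (f - f_c)/a, integrated over +/- 50 widths
integral = sum(patterson_weight(f_c + k * dx * a, f_c, a) * dx
               for k in range(-50_000, 50_001))
erb = a * integral  # numerically close to (pi/2)*a ~ 157.08 for a = 100
```

The analytic result follows because the integral of (1 + x²)⁻² over the real line is π/2.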
It is important to note that though the signals used had different lengths, and the ter Keurs processing method even required a different sampling rate than the other methods, the process of finding the power per unit time accommodates these differences. The steps just described, filtering the signal into these channels, took place in the make-channels block in the main block diagram in figure 2-1, as well as in figure 2-20.

Figure 2-20: Block diagram of the d' discrimination model. The bottom block comes after the top block in the processing path. The top block is used to create channels from all vowels processed, and the bottom block takes all combinations of pairs of vowels and produces a d' distance between each pair. In the top figure, the signal is first filtered into bands. Then, in each band, the average power per unit length of the signal is found by first squaring it, summing over the signal length, dividing by the signal length, and then normalizing to account for variation in signal length between stimuli. Finally, the natural log of the resulting number is taken to produce a channel. In the bottom diagram, the difference (Δ_i) between corresponding channels is found for a pair of stimuli, as indicated in the plot. These differences are then put through the formula in the bottom of the figure. ε is an arbitrary constant that is a scale factor for all d' distances.

The next block in the d' discrimination algorithm compares the channels from pairs of signals to find the d' distance between each pair of signals. This is illustrated in the bottom part of figure 2-20.
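The make-channels computation for a single band (average power per unit length, followed by the natural log) can be sketched as follows (illustrative only; the names are mine):

```python
import math

def channel_value(band_samples):
    """One 'channel': ln of the average power per sample of a filtered band.

    Dividing by the number of samples normalizes for stimuli of different
    durations (and, to first order, different sampling rates)."""
    mean_power = sum(x * x for x in band_samples) / len(band_samples)
    return math.log(mean_power)
```

For example, a band alternating between +1 and -1 has mean power 1 and channel value ln 1 = 0, regardless of how long the stimulus is.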
For a given pair of signals, the difference between each pair of corresponding channels from the two signals was found, labeled Δ_i in the figure. Then, the d' value was found via

d' = (1/ε)·sqrt( Σ_{i=1..23} Δ_i² ), (2.12)

in which Δ_i and ε have been previously described. At the output of this processing block, then, was a matrix of d' values for all combinations of signal pairs. One additional processing step was done while finding d' values for pairs of signals. To simulate the effect of the real subjects' hearing thresholds, the audibility of the signals was computed for each subject separately. The levels of the channels were first converted into physical sound pressure levels through a gain. Next, this physical level was compared against the real subjects' absolute hearing thresholds. In the d' discrimination algorithm, if the level of any channel from either signal being compared fell below a subject's hearing threshold, the difference between that channel and its pair was removed from the calculation of d' for that subject. In other words, a simulation was done for each real subject, and the audibility of the stimuli to that subject was computed. In doing this, it was hoped that the results found for each subject would match the subject's actual confusion matrix. Finally, it should be noted that while the d' discrimination model assumed that humans discriminate between sounds on the basis of the spectrum alone, the Moore processing method (and to some extent the ter Keurs processing method) only attempted to simulate the spectral effects of reduced frequency resolution in the impaired ear. So, while this perceptual model may leave out some elements of the signals relative to how real people perceive sounds, it is entirely appropriate for evaluating these frequency smearing models. The Hou model, however, attempted to simulate reduced timing resolution, and in fact tried to limit the spectral consequences.
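Combining equation 2.12 with the audibility rule just described gives something like the following (a sketch, not the thesis code; it assumes the thresholds are expressed in the same units as the channel levels):

```python
import math

def d_prime(ch_a, ch_b, thresholds=None, eps=1.0):
    """d' distance between two channel vectors (eq. 2.12).

    If per-channel thresholds are given, any channel that falls below the
    subject's absolute threshold in either signal is dropped from the sum."""
    total = 0.0
    for i, (a, b) in enumerate(zip(ch_a, ch_b)):
        if thresholds is not None and (a < thresholds[i] or b < thresholds[i]):
            continue  # channel inaudible to this subject: exclude its difference
        total += (a - b) ** 2
    return math.sqrt(total) / eps
```

Dropping inaudible channels shrinks d', so a pair of stimuli that differ mainly in inaudible frequency regions becomes harder to discriminate in the simulation, as intended.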
There were time differences between the lengths of the vowels in the CVC tokens, so the real subjects' comprehension would be affected by temporal smearing, but in the present simulation this was not taken into account.

2.7 Decision Model

Once the d' values for all of the pairs of signals had been found, some way was needed to convert these results into a confusion matrix for comparison with the real subjects' results. A decision model was used that assumed the following. Each stimulus generates a vector of perceptual cues in a perceptual space internal to a listener. The locations of these points in the perceptual space will be referred to as "stimulus centers." The listener also creates a set of "response centers," points in the perceptual space that are used as models or prototypes for each category of stimulus. In the present simulation, it is assumed there is one response center for each vowel, while the stimulus centers are the many examples of each vowel that are presented to a listener. It is assumed that the response centers are used in classifying the stimulus centers (or stimuli) into different categories. It is also assumed that, due to uncertainty in a listener's mental representation, the cue vector for each stimulus center is the center of a multi-dimensional Gaussian probability distribution. The distribution corresponds to the various cue vectors that could be used by a listener in the actual decision of classifying that stimulus, with the distribution probabilities being the associated probabilities of those cue vectors being used. The last step of the decision model assumes that a stimulus will be classified in the category corresponding to the response center that its cue vector is closest to. A simple illustration of the model is shown in figure 2-21. A block diagram of the inner workings of the decision model is shown in figure 2-22.
In the implementation of this model, first the matrix of d' values was put into a classical multi-dimensional scaling algorithm that produced a map of points accurately representing their d' distances. Only the first three dimensions of the resulting map were used, but these provided sufficient accuracy.

Figure 2-21: Illustration of the decision model in two dimensions. Stimulus centers are represented by the small points in the figure, and are surrounded by Gaussian distributions in perceptual space. Response centers are found by taking the centroid of the stimulus centers of a single type. The probability distributions around each stimulus center are assumed to evoke responses corresponding to the closest response center. The lines dividing the figure are the boundaries of these so-called response regions. Both axes are perceptual dimensions in units of d'.

Figure 2-22: Block diagram of the decision model.

It was assumed that this resulting map was the perceptual representation of the stimuli by a listener: the vector of coordinates of each point in this multi-dimensional map was the cue vector for that point in a perceptual space, and the dimensions of the map were the dimensions of the perceptual space.
To implement the perceptual uncertainty of each stimulus center, each stimulus center was replaced by a multi-dimensional Gaussian centered at the stimulus center's location. Also, the locations of the response centers were assumed to be the centroids of the stimulus centers for each type of vowel. Finally, each point in the Gaussian distributions surrounding the stimulus centers was assigned to a category based on which response center it was closest to. A confusion matrix was formed by summing the probabilities of the points for each stimulus category that were classified as each of the possible responses. In the simulation, the following parameters were used in this model: only the first 3 dimensions of the perceptual (d') map were used, so the Gaussians were in 3 dimensions. A variance of d' = 0.1 was assumed; to give an idea of the scale, the maximum dimensions of the d' map were usually about 4-5 d' units in the first dimension and about 3-4 d' units in the second dimension.

Chapter 3

Results

Tables of simulated confusion matrices for all processing conditions are in Appendix B, and tables of the real subjects' actual confusion matrices are found in Appendix C.

3.1 Total Error Rates

Plots of the total error rates for the simulated confusion matrices fitted to the different subjects' data can be found in figures 3-1 through 3-10. Figures 3-1 through 3-5 show the total error rates of the simulated confusion matrices, while figures 3-6 through 3-10 show the error rates relative to the confusion matrices for unsmeared stimuli. All figures in this chapter are located in a dedicated section at the end of the chapter. The total error rates of the simulated confusion matrices are higher than the total error rates of the confusion matrices produced by the real subjects in all but 17 of the 120 smearing conditions.
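The decision stage (a Gaussian blur around each stimulus center, classification to the nearest response center, and a tally into a confusion matrix) can be sketched with a Monte-Carlo approximation. Everything here, including sampling the Gaussians instead of integrating them exactly, is my own illustration rather than the thesis implementation:

```python
import math
import random

def simulate_confusions(stim_centers, labels, sigma=math.sqrt(0.1), n=500, seed=0):
    """Monte-Carlo sketch of the decision model.

    Each stimulus center is blurred by an isotropic Gaussian (variance 0.1
    in d' units, as in the text) and each sample is assigned to the nearest
    response center, where response centers are the per-label centroids."""
    rng = random.Random(seed)
    cats = sorted(set(labels))
    resp = {}
    for c in cats:  # response centers: centroids of each label's stimulus centers
        pts = [p for p, l in zip(stim_centers, labels) if l == c]
        resp[c] = [sum(v) / len(pts) for v in zip(*pts)]
    conf = {a: {b: 0 for b in cats} for a in cats}
    for p, l in zip(stim_centers, labels):
        for _ in range(n):
            q = [x + rng.gauss(0.0, sigma) for x in p]  # perturbed cue vector
            best = min(cats, key=lambda c: sum((u - v) ** 2
                                               for u, v in zip(q, resp[c])))
            conf[l][best] += 1
    return conf
```

With well-separated clusters the off-diagonal counts stay at zero; confusions appear only when the Gaussians spill across a response-region boundary.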
Of these, 15 were either the unsmeared stimuli or stimuli processed with the Hou smearing condition, which tended to give similar results to the unsmeared stimuli. Moreover, the unsmeared stimuli have higher error rates than the real subjects in two cases, in one case they are almost identical, and in the other two cases they are within 7 percent. This indicates that the total error rates are limited by the perceptual model, since that was the processing stage responsible for the unsmeared stimuli producing large total error rates. If we subtract the errors in the unsmeared stimuli from the rest of them (that is, find the error rates relative to the unsmeared stimuli), then the error rates are comparable to those of the actual subjects in a significant portion of the conditions. However, subtracting the confusion matrices from the other processing conditions does not produce confusions identical to those made by real subjects, as will be discussed later. With increasing smearing for the Moore and ter Keurs methods, the total error rate generally increases approximately linearly, but with a varying slope. As the smearing is increased by a factor of 2, there is a change in total error rate of between 0 and about 18 percent, with the majority of cases having a change of 5-10 percent. This change is independent of whether or not recruitment occurs. With the Hou smearing and no smearing, there is little or no increase in error rate as the amount of smearing increases. The other trend in the total error rates is that the error rates with smearing followed by recruitment were above the error rates for the same smeared conditions without recruitment in all but 1 of 60 cases. Recruitment added a total error rate of between 0 and 20 percent, with a typical change of about 10 percent. This also meant that the error rates with recruitment were typically far above the subjects' actual error rates.
3.2 Locations of Errors in Confusion Matrices

The particular errors in the simulated confusion matrices occur in locations generally different from the errors in the real subjects' confusion matrices. There is some overlap in the real and simulated confusions, but there are also many confusions that differ between the real subjects and the simulation. With the real subjects, errors are typically concentrated in a few locations, while in the simulations they are much more widespread. Sample confusion matrices from the processed conditions are shown in table 3.1. These trends can be seen in a visual inspection of the sample confusion matrices. Also, the locations of the maximum errors in the simulated confusion matrices tended to be in the same positions regardless of the type and degree of smearing, or which subject's parameters were used in the processing. Generally, as the amount of smearing increased or recruitment was added, the confusions in the existing locations increased in magnitude, while new confusions with generally smaller magnitudes were added in other locations. Errors in this same set of locations also occurred in the confusion matrices for the unsmeared stimuli, showing that this pattern was likely a consequence of the perceptual model. Plots showing how well the simulated confusion matrices correspond to the real confusion matrices can be seen in figures C-1 to C-15 in Appendix C, and a plot showing how well the average data in table 3.1 fits the real confusion matrices can be seen in figure 3-11. These plots show a χ² measure and a Mean Squared Error (MSE) measure of the goodness of fit of the simulated matrices to the real subjects' matrices. The MSE measure was computed by

MSE = (1/N) Σ_{i=1..36} (O_i - E_i)², (3.1)

where N is the total number of stimulus presentations, O_i is an entry in a real subject's confusion matrix, E_i is an entry in a simulated confusion matrix, and the sum is over all 36 entries in a confusion matrix.
The χ² measure was computed by

χ² = Σ_{i=1..36} (O_i - E_i)²/E_i, for E_i > 0.5, (3.2)

where O_i and E_i are the same as in equation 3.1. In the χ² measure, any entries in the simulated confusion matrices less than 0.5 were excluded, because very small non-zero values would cause excessive contributions to the measure, and zero values would produce values equal to infinity. Even so, if an entry in the simulated confusion matrix is small and the entry in the real confusion matrix is large, that element will contribute disproportionately more to the total χ² measure than it probably should for a "fair" measure. For this reason, the MSE measure is possibly better for evaluating the goodness of fit between the real and simulated confusion matrices, although this measure has the problem that if there are small entries in both the real and simulated confusion matrices, the contribution of those elements is fairly trivial. It is this that causes some values to be excessively small (as can be seen in the plots). The plots show the natural log of both of these measures. In general, as can be seen in the χ² plots in figures C-1 to C-15, fitting the recruitment and audibility parameters to correspond to one of the subjects' audiograms did not produce results that fit that subject better. Also, the χ² values produced were generally quite large, indicating that the simulated confusion matrices did not fit the real data very well. Values ranged from about 20-50 for subjects JO and RC and about 150 for the other subjects for the χ² measure, and about 7-20 for the MSE measure (on the plot, the natural log of these values is shown). With the χ² measure, the processing condition that produced the best results varied quite a bit, with almost all processing conditions producing the best fit for some subject's real confusion matrix. Also, there does not seem to be any benefit of using recruitment or not using recruitment to produce a better fit.
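Under the reconstructions of equations 3.1 and 3.2 used here (a 1/N normalization for the MSE, and a Pearson-style sum for χ² with the E > 0.5 cutoff; the garbled originals admit other readings), both fit measures can be sketched as:

```python
def fit_measures(real, sim, n_presentations):
    """MSE (eq. 3.1) and chi-squared (eq. 3.2) between two confusion
    matrices, each flattened to a sequence of 36 entries.

    Entries of the simulated matrix at or below 0.5 are excluded from
    chi-squared, as described in the text."""
    mse = sum((o - e) ** 2 for o, e in zip(real, sim)) / n_presentations
    chi2 = sum((o - e) ** 2 / e for o, e in zip(real, sim) if e > 0.5)
    return mse, chi2
```

For example, with real entries [10, 0], simulated entries [8, 2], and N = 10 presentations, the MSE is (4 + 4)/10 = 0.8 and χ² is 4/8 + 4/2 = 2.5.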
With the MSE measure, in nearly all of the cases the unsmeared and Hou-smearing-without-recruitment conditions produced the best-fitting confusion matrices. There seemed to be slightly better performance from the unrecruited conditions relative to the recruited conditions with this measure. Also, with both measures of goodness of fit, the Moore method without recruitment typically did better than the ter Keurs method without recruitment for the same smearing multiplier. Two of the subjects' (JO and RC) confusion matrices tend to fit the predictions better than the other subjects' confusion matrices did, regardless of which subject's audiogram the recruitment and audibility parameters in the simulation were fit to. This was true especially for the unsmeared and Hou-smearing-with-no-recruitment conditions. This effect occurred because the real confusion matrices for these subjects had small total error rates, and the errors that did occur were spread out over many low values rather than concentrated in a few large values, which is what happened with the other subjects' real confusion matrices. Since the simulated confusion matrices tended to produce confusions in many locations, the contribution toward the χ² measure from each of the confusions was smaller, but there were more such contributions. The end result was a smaller value of χ². Also, because the simulated confusion matrices corresponding to the unsmeared and Hou-without-recruitment conditions typically had the smallest error rates among the simulated confusion matrices, and because the simulated confusion matrices tended to have confusions in the wrong locations, these conditions tended to fit the real data better. The Hou method does not cause much change in the spectrum, as expected, so the confusion matrices produced by it are not much different from the unsmeared confusion matrices.
Also, and notably, the unsmeared and Hou-without-recruitment conditions usually had total error rates closest to the total error rates of the real confusion matrices as well. These conditions fit subjects JO's and RC's confusion matrices particularly well because both sets of confusion matrices had small error rates, which tended to contribute less to the χ² measures as described above. If the confusion matrices resulting from the unsmeared conditions were subtracted from the other simulated confusion matrices, there was still no benefit in accuracy, as the resulting confusions still tended to be in the same locations.

Table 3.1: Table of the average simulated confusion matrices for each processing condition (stimuli and responses are the vowels AH, EE, EH, II, OO, UH). Averages were taken across subjects and smearing amounts. From top to bottom, the processing conditions are: no smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. [The individual matrix entries were scrambled in extraction and are not reproduced here.]
3.2.1 Comparison of Moore and ter Keurs Methods

Calculations showed that widening the ERBs by 3 times in the Moore method produced the same bandwidth as smearing by 0.75 octaves in the ter Keurs method. However, the ter Keurs method almost always produced higher error rates than the Moore method, by about 5-15%. This can be easily seen on the graphs of total error rate (figures 3-1 to 3-5), since the points corresponding to 3 times the width and 0.75 octaves of smearing were set to be the "base" conditions with a smearing multiplier equal to 1. As the smearing multiplier increases, the relative amount of smearing remains about the same for the two methods. A table showing the average confusion matrices for the Moore and ter Keurs smearing methods with approximately the same total error rates is shown in table 3.2. As can be seen in this table, there are a few differences in the amounts of smearing caused by each method on average, though the locations at which confusions occur are extremely similar. In confusing EH and II with UH, on average the Moore method produces larger II-UH confusions than EH-UH confusions, while the ter Keurs method produces the reverse. Also, the ter Keurs method produces larger II-OO confusions than the Moore method on average. These are the primary differences between the average confusions created by smearing, although other minor differences remain. Though these differences can be seen in the averages, the results for the individual subjects vary to a large degree, with the reverse being true in some cases. If the Moore and ter Keurs smearing methods (without recruitment) are compared at approximately equal total error rates, it is found that in almost all of the cases, with both measures of goodness of fit, the ter Keurs smearing method gives a worse fit to the real subjects' data than the Moore method.
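As a rough consistency check (my own back-of-envelope calculation, assuming a normal ERB of 0.16·f_c and a smearing band geometrically centered on f_c), tripling the ERB corresponds to a band of roughly 0.7 octaves, in line with the approximate 0.75-octave equivalence stated above:

```python
import math

# A band of width w*f_c geometrically centered on f_c spans b octaves, where
# 2**(b/2) - 2**(-b/2) = w. Solve for b with w = 3 * 0.16 (the tripled ERB).
w = 3 * 0.16
y = (w + math.sqrt(w * w + 4.0)) / 2.0  # positive root of y - 1/y = w
octaves = 2.0 * math.log2(y)
# octaves comes out near 0.69, i.e. roughly the 0.75-octave ter Keurs base setting
```

The small remaining gap is unsurprising, since the two methods define their smearing bandwidths differently.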
This can be seen in figures 3-12 and 3-13, which show the fitting measures compared at different smearing amounts with the other parameters the same. In the figures, the ter Keurs results (dotted line) are almost always above the Moore results (solid line).

Table 3.2: Table of the simulated confusion matrices for approximately equivalent total smearing amounts for the Moore and ter Keurs methods, without recruitment. The results shown in the table are averages across all subjects. The top row contains Moore smearing with a widening factor of 4.5 and ter Keurs smearing of 0.75 octaves; the total error rate for that amount of Moore smearing was a little lower than the total error rate for that amount of ter Keurs smearing. The bottom row contains Moore smearing with a widening factor of 6 and ter Keurs smearing of 1.125 octaves; the total error rate for that amount of Moore smearing was a little higher than the total error rate for that amount of ter Keurs smearing. [Confusion-matrix entries omitted.]

3.3 Effects of Processing Parameters on the Results

This section discusses how altering different parameters in the processing affected the resulting confusion matrices.

3.3.1 Loudness Recruitment and Audibility

The effect of the loudness recruitment algorithm is mainly to lower the level of the spectrum and linearly increase the level differences between different parts of the spectrum.
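A minimal sketch of the kind of level mapping this describes: a linear-in-dB expansion in which a reference level is left unchanged and a level at the impaired threshold is mapped down to 0 dB. This is an illustrative form of recruitment simulation, not necessarily the exact algorithm used in the thesis.

```python
def recruit_db(level_db, impaired_thresh_db, ref_db=100.0):
    """Map an input level (dB) through a linear-in-dB expansion: the
    reference level ref_db is unchanged, and a level at the impaired
    threshold maps to 0 dB (inaudible to a normal-hearing listener)."""
    n = ref_db / (ref_db - impaired_thresh_db)  # expansion ratio > 1
    return ref_db + n * (level_db - ref_db)

# With a 50 dB threshold: levels drop overall, and a 25 dB input
# difference (100 vs 75) becomes a 50 dB output difference (100 vs 50).
for lvl in (100.0, 75.0, 50.0):
    print(lvl, "->", recruit_db(lvl, 50.0))
```

This captures both effects named in the text: the spectrum is lowered overall, and level differences between spectral regions are expanded linearly.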
Recruitment also tends to introduce some distortion into the signal, causing its spectrum to be smoothed out similarly to the effect of the frequency smearing. However, this effect occurs to the largest extent in the high-frequency parts of the signal. These high-frequency portions of the signal are frequently not audible in the current simulation, thereby removing this effect of recruitment. A plot showing the percentage of stimuli that had a given channel audible vs. the channel number is shown in figure 3-14. Because of this, the main effects of recruitment in this simulation were to alter the audibility of the different channels and to increase the d' distances by expanding the spectrum. Also, for two of the subjects (JO and MG), the stimuli were presented at a high level (>105 dB), such that recruitment did not make as much of a difference, causing a change in total error rate of only up to about 8% for the Moore and ter Keurs smearing methods, and about 10% for the Hou and unsmeared conditions. The smaller influence of recruitment can also be seen by inspecting the confusion matrices associated with these subjects, in Appendix B. One additional note should be made about the audibility of the high frequencies. It is clear that something is wrong with the audibility in the current simulation. In the original experiment with real subjects, the levels of high- and low-passed sounds with a cutoff of 2500 Hz were adjusted to maximize the subjects' comprehension of the syllables being played. The current simulation predicts that, at the ideal levels found in the previous experiment, the high-passed portion of the signal would likely have been entirely inaudible for all of the subjects. This was clearly not the case, so something is probably wrong with the simulation. To accommodate for this, a second simulation was done that assumed all frequencies were audible for all subjects. The results of this are discussed later.

3.3.2 Male vs.
Female Speakers

In the simulation of the perceptual model, the stimuli corresponding to the male speaker and the female speaker formed two quite distinct clusters in the perceptual-space (d'-distance) map. As such, the male and female speakers' stimuli were put through the perceptual model (and classified) separately. With the stimuli classified separately, the confusion matrices for the male speaker and the female speaker were averaged to form a single confusion matrix at the end. This accurately reflects the responses of a person exposed to the entire stimulus set, as approximately half of the vowel samples were spoken by the male speaker and half by the female speaker. The confusion matrices produced by the simulation with the male and female points separated were more accurate (as determined by the χ² and MSE measures) than if the stimuli spoken by the male and female speakers were mixed together in the perceptual model. This was largely because, when they were classified together, the distribution of stimulus centers corresponding to each vowel tended to be much larger than the distribution of stimulus centers from either the male or the female speaker separately. The larger distribution of stimulus centers resulting from the combined male-and-female simulation, as well as the distinct clusters formed in the perceptual map of the stimuli from the two speakers, are illustrated in figure 3-15. A table of the averages across subjects, with a smearing multiplier of 1, is shown in table 3.3. In this table, the amounts of confusion are larger than in the averages from the male and female speaker stimuli classified separately, despite the fact that the smearing-multiplier-of-1 data corresponded to the lowest error rates in the separate simulation. A plot showing the χ² and MSE measures of how well the data from the table fit the real subjects' confusion matrices is in figure 3-16.
In the figure, the MSE measure and the χ² measure for subjects JO and RC show that this combined simulation fits much more poorly than if the male and female speakers' stimuli are classified separately and then combined. The other subjects show comparable results on the χ² measure.

3.4 Results with Simulating Full Audibility

In this new simulation, it was assumed that all frequencies were audible to all subjects, so all channels were included in calculating the d' distance between pairs of vowels.

3.4.1 Error rates

Graphs of the total error rate assuming that all frequencies are audible are shown in figures 3-17 through 3-21. In these plots, the points corresponding to the unrecruited signals are the same in every plot, because the only difference between them was the audibility of the points, and that difference has now been removed. The error rates of the unrecruited stimuli with full audibility decreased or stayed about the same relative to the error rates of the reduced-audibility stimuli. This shows that decreasing the audibility of high frequencies has a detrimental effect on perception. Even with the reduced error rates, the subjects' actual error rates still tended to be below the calculated error rates. The results of the ter Keurs smearing method were never below the actual error rates; with small amounts of smearing, however, the Moore method results could match or come close to the actual error rates. Also, for all subjects, the unsmeared confusion matrices were below or close to the actual error rate, showing that it could theoretically be possible to apply a smearing algorithm and match the error rate exactly. The error rates with recruitment were usually far above the subjects' actual error rates, as in the reduced-audibility case.
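The channel-selection step just described can be sketched as follows. The thesis does not spell out the combination rule at this point, so the root-sum-of-squares combination over independent channels used here is an assumption (it is a standard rule for combining d' across independent observations).

```python
import math

def total_dprime(channel_dprimes, audible):
    """Combine per-channel d' values over audible channels only,
    assuming independent channels (root-sum-of-squares rule)."""
    return math.sqrt(sum(d * d for d, a in zip(channel_dprimes, audible) if a))

# Hypothetical per-channel d' values for a vowel pair.
dps = [3.0, 4.0, 12.0]
print(total_dprime(dps, [True, True, False]))  # high channel inaudible -> 5.0
print(total_dprime(dps, [True, True, True]))   # full audibility -> 13.0
```

Under this rule, restoring audibility of a channel can only increase the d' distance between a vowel pair, which is consistent with the lower unrecruited error rates seen under full audibility.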
As with the reduced-audibility simulation, the error rates for the Moore and ter Keurs smearing methods increased approximately linearly with the amount of smearing, while the error rates for the Hou and unsmeared conditions did not increase. Comparing recruitment with and without all frequencies audible, the error rates corresponding to the recruited signals generally stayed the same or decreased slightly with full audibility relative to the simulation with reduced audibility. Although increasing the audibility tends to decrease the error rate (as shown by the unrecruited stimuli), making use of all the frequencies includes the frequencies that were smeared by recruitment, which tends to increase the error rate; so it is hard to conclude anything from this finding. Also, since the audibility differences have been removed, the only difference in the recruited signals between subjects is the amount of recruitment applied at different frequencies (according to the subjects' audiograms). It can be seen in the figures that the between-subject error rates with recruitment can vary by up to 20 percent, showing that recruitment is a significant factor in producing confusions when the affected frequencies are audible.

3.4.2 Locations of Errors

In general, with full audibility the locations of the confusions (and maximum confusions) were usually the same as those with reduced audibility. The amounts of the confusions produced did vary relative to the reduced-audibility case, however. Tables showing the average confusion matrices for the different processing conditions, and comparing the Moore and ter Keurs smearing results for similar total error rates, are in tables 3.4 and 3.5, respectively. A plot showing the χ² and MSE measures for how well the average data in table 3.4 fit the real subjects' confusion matrices can be seen in figure 3-22.
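The χ² and (O-E)² fitting measures used throughout these comparisons can be sketched as follows. This is a minimal version with illustrative counts; the exact cell weighting and any handling of zero expected cells in the thesis code are not stated here, so those details are assumptions.

```python
def chi_squared(observed, expected):
    """Pearson chi-squared between an observed (real) and an expected
    (simulated) confusion matrix; expected cells must be nonzero."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))

def sq_error(observed, expected):
    """Summed squared error between matrix entries: the (O - E)^2 measure."""
    return sum((o - e) ** 2
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))

# Illustrative 2x2 confusion counts.
obs = [[90, 10], [20, 80]]
exp = [[80, 20], [10, 90]]
print(chi_squared(obs, exp), sq_error(obs, exp))
```

Both measures are zero for a perfect fit and grow with the mismatch; the figures in this chapter plot the natural log of each measure. Note that small-error-rate matrices contribute small cell values and hence small χ² terms, which is why the low-error subjects tended to show the best apparent fits.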
As with the reduced-audibility case, altering the parameters of the simulation to fit one of the subjects did not produce results that fit that subject better than the others. Also, the plots showing the χ² and MSE goodness-of-fit measures were very similar to those for the reduced-audibility case; these can be seen in Appendix C, in figures C-16 through C-30. The values in these figures are also large, as in the reduced-audibility condition. With both goodness-of-fit measures, using recruitment tended to produce worse results than not using recruitment in the majority of cases, but this was not true in general. As with the reduced-audibility case, the unsmeared and Hou smearing conditions (without recruitment), and subjects JO and RC, tended to produce better fits than the other conditions. In general, the goodness of fit with the χ² measure tended to be better in the all-frequencies-audible condition, while with the MSE measure the goodness of fit was about the same.

3.4.3 Comparison of Moore and ter Keurs Methods

In the full-audibility condition, if the Moore and ter Keurs smearing methods (without recruitment) are compared at approximately equal total error rates, it is found that neither method consistently gives better results than the other. This is different from the reduced-audibility case, in which the Moore method was better. This can be seen in figures 3-23 and 3-24, which show the fitting measures compared at different smearing amounts with the other parameters the same.
Table 3.3: Table of the average simulated confusion matrices for each processing condition. Averages were taken across subjects for a smearing multiplier of 1. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. [Confusion-matrix entries omitted.]
Table 3.4: Table of the average simulated confusion matrices for each processing condition, assuming that all frequencies are audible. Averages were taken across subjects and smearing amounts. From top to bottom, the processing conditions are: No smearing, Moore smearing, ter Keurs smearing, and Hou smearing. The left column contains the results of smearing without recruitment, and the right column contains the results of smearing followed by recruitment. [Confusion-matrix entries omitted.]

Table 3.5: Table of the simulated confusion matrices for approximately equivalent total smearing amounts for the Moore and ter Keurs methods, without recruitment, assuming that all frequencies are audible.
The results shown in the table are the averages across all subjects. The top row contains Moore smearing with a widening factor of 4.5 and ter Keurs smearing of 0.75 octaves; the total error rate for that amount of Moore smearing was somewhat (~5%) lower than the total error rate for that amount of ter Keurs smearing. The bottom row contains Moore smearing with a widening factor of 6 and ter Keurs smearing of 1.5 octaves; the total error rate for that amount of Moore smearing was almost identical to the total error rate for that amount of ter Keurs smearing. [Confusion-matrix entries omitted.]

Figure 3-1: Total error rates for the simulation using subject JB's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate. (The horizontal axis in these plots is the smearing multiplier, where the amount of smearing = multiplier × [3× ERB width, 0.75 octave, 28 ms].)

Figure 3-2: Total error rates for the simulation using subject JO's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate.
Figure 3-3: Total error rates for the simulation using subject MG's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate.

Figure 3-4: Total error rates for the simulation using subject PW's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate.

Figure 3-5: Total error rates for the simulation using subject RC's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate.
Figure 3-6: Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject JB's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate.

Figure 3-7: Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject JO's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate.

Figure 3-8: Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject MG's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate.

Figure 3-9:
Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject PW's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate.

Figure 3-10: Error rates relative to the total error rate for unsmeared stimuli for the simulation using subject RC's audiogram. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate.

Figure 3-11: Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects and smearing amounts, for the reduced-audibility simulation.
Figure 3-12: A comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was compared to were the same.

Figure 3-13: A comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.5, and the Moore smearing had a multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was compared to were the same.
Figure 3-14: Graph showing the percentage of stimuli that had audible channels for a given channel number, for each subject. Note that the channels correspond to a log scale, with channel 1 centered at 100 Hz, channel 12 at 1575 Hz, and channel 19 at 4839 Hz (for reference). Also, in most plots two groups of curves can be seen: the set of curves to the left corresponds to no recruitment, and the set to the right is with recruitment.

Figure 3-15: Plot showing the stimulus centers and response centers for the stimuli from the male and female speakers combined (top plot), and just the female speaker (bottom plot). The stimulus centers are shown by the small black dots, and the response centers are shown by the open circles. In each plot, the response centers are the average of the stimulus centers for each vowel. Only half of the total number of stimulus centers were plotted for computational reasons. As can be seen around the "AH" in the top plot, the stimuli corresponding to the male and female speakers formed two distinct clusters for each vowel.
The clusters are not distinct for the other vowels in this plot due to the perspective. Also, it can be seen that the stimulus centers for each vowel are less spread out in the bottom plot.

Figure 3-16: Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects, with the smearing multiplier fixed at 1.0, for the reduced-audibility simulation.

Figure 3-17: Total error rates for the simulation using subject JB's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JB's actual error rate.
Figure 3-18: Total error rates for the simulation using subject JO's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject JO's actual error rate.

Figure 3-19: Total error rates for the simulation using subject MG's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject MG's actual error rate.

Figure 3-20: Total error rates for the simulation using subject PW's audiogram, assuming that all frequencies are audible to the subject.
Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject PW's actual error rate.

Figure 3-21: Total error rates for the simulation using subject RC's audiogram, assuming that all frequencies are audible to the subject. Solid lines indicate results with just smearing and no recruitment, as indicated in the legend, and dotted lines indicate those conditions followed by recruitment. The dashed line is subject RC's actual error rate.

Figure 3-22: Plot showing how well the average of all of the confusion matrix data fit the subjects' actual confusion matrices. Averages were taken across all subjects and smearing amounts, for the simulation in which all frequencies were audible.
Figure 3-23: Assuming full audibility, a comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs result had a smearing multiplier of 1.0, and the Moore smearing had a multiplier of 1.5. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was compared to were the same.

Figure 3-24: Assuming full audibility, a comparison of the ter Keurs results (dotted line) and Moore results (solid line) when the total error rates are approximately the same for the two smearing conditions. The ter Keurs and Moore results both had a smearing multiplier of 2.0. For each pair of ter Keurs and Moore values (plotted in the same vertical space), the subject used to create the simulated matrix and the real subject it was compared to were the same.
Chapter 4

Discussion

4.1 Why the Simulation Did Not Work

The primary problem in this simulation seemed to be that the largest confusions produced at the output of the perceptual model tended to occur in the same locations, regardless of the amount of smearing, or even of whether there was any smearing at all. A certain set of confusions was present in the results for the unsmeared stimuli. As the amount of smearing increased or recruitment was added, the degree of confusion at these locations increased and new confusions arose at other locations, but the new confusions were generally of smaller magnitude than the original ones. With recruitment, there tended to be confusions in almost all entries of the resulting confusion matrix, and even with no recruitment but large amounts of smearing there were many nonzero entries in the confusion matrices.

Real normal-hearing listeners make few vowel confusions in quiet, even when the stimuli have been smeared by large amounts (Baer and Moore, 1993; ter Keurs et al., 1992). Because humans do not make confusions where the perceptual model in the simulation does, humans must use a different algorithm for perception than the model: either they use different cues than the model, or they use the same cues (i.e., the spectrum) in a different manner.

Vowel duration is another cue that can be used to discriminate vowels. Certainly, the vowels presented to the subjects in the real experiment differed in duration, as shown in figure 4-1. However, the standard deviation of these durations was large, and duration was almost certainly not the only difference between the real human perceptual process and the simulated perceptual model. Instead, it is more likely that humans use the spectral cues of the vowels in a different way than the simulated perceptual model does. This is easily illustrated by the diagrams in figure 4-2.
On the left side of the diagram, there are three sample spectra, with peaks centered at two possible frequencies representing the formants in a vowel. Spectra 1 and 2 have a single peak at the first frequency, but the width and height of the formant varies. Spectrum 3 has a smaller peak at the first frequency, but also a peak at the second frequency. Because of the differences in the "formants," the d' algorithm used in the simulation would judge the spectra to be equidistant from each other in a perceptual space, as shown in the bottom portion of the diagram. However, because of the presence of the second peak, a human would judge the third spectrum to be far from the other two. This is illustrated in the perceptual map corresponding to a real human, in the bottom right of the diagram. In this manner, the d' algorithm would classify the spectra differently than a real human would, resulting in different confusions.

Another example, on the right side of the figure, shows a spectrum that is flat except for a single bump. In spectrum 1 the bump is wide and short; in spectrum 2 it is narrower but taller; and in spectrum 3 it is narrower and taller still, so as to be tonelike. The d' algorithm in the simulation would judge these to be equally different from each other, but a human could easily judge that the tonelike stimulus was very different from the other spectra. It should also be pointed out that the d' algorithm used in the simulation was developed and tested on broadband noise stimuli, and vowels and tones are not noise.

So, it is proposed that the following situation occurred in this simulation and caused the results to be other than desired. Regardless of how real people perform discrimination, the d' algorithm used in this simulation should reflect the fact that some of the vowels' spectra moved closer together as a result of smearing, and thus would be confused in the perceptual model.
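The geometric point can be illustrated numerically. The sketch below uses hypothetical spectra, with a plain Euclidean distance standing in for the d' metric; none of this is code or data from the thesis, and the exact geometry differs from figure 4-2. It constructs three spectra that are exactly equidistant under the metric, even though only the third contains a second peak that a listener would notice immediately.

```python
import numpy as np

def peak(freqs, center, width, height):
    """A smooth spectral bump standing in for a formant."""
    return height * np.exp(-0.5 * ((freqs - center) / width) ** 2)

freqs = np.linspace(0.0, 4000.0, 512)      # frequency axis in Hz
u = peak(freqs, 500.0, 100.0, 1.0)         # direction: taller first peak
u /= np.linalg.norm(u)
v = peak(freqs, 2000.0, 100.0, 1.0)        # direction: a new second peak
v /= np.linalg.norm(v)

# Place three spectra at the corners of an equilateral triangle, so every
# pairwise Euclidean distance (the stand-in for d') is identical.
a = 0.5
s1 = peak(freqs, 500.0, 100.0, 1.0)
s2 = s1 + a * u                                   # same peak, taller
s3 = s1 + (a / 2) * u + (a * np.sqrt(3) / 2) * v  # adds a second peak

d12 = np.linalg.norm(s1 - s2)
d13 = np.linalg.norm(s1 - s3)
d23 = np.linalg.norm(s2 - s3)
high = freqs > 1500.0
print(round(d12, 3), round(d13, 3), round(d23, 3))          # -> 0.5 0.5 0.5
print(s1[high].max() < 1e-6, s2[high].max() < 1e-6,
      s3[high].max() > 0.05)                                # -> True True True
```

The metric treats all three spectra as equally distinct, yet only `s3` has any energy near 2000 Hz, which is exactly the kind of qualitative feature the text argues a human listener would seize on.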
Assuming that the smearing models more or less correctly duplicate the smearing process that occurs in hearing-impaired listeners, the confusions that real people make should be reflected in the d' distances between those vowels becoming smaller. Let us assume for the moment that this actually happened in the current simulation. Even if the d' algorithm reflected that the vowels humans typically confuse were drawn closer together by smearing, another effect was also taking place. In particular, the d' discrimination algorithm used in the simulation will confuse some vowels that real humans could easily tell apart, for example by their spectral pattern; an example of this was illustrated above. This effect most likely occurred in the current simulation, causing certain confusions to appear between stimuli that humans would have no difficulty with, because of the different discrimination algorithm humans use. These confusions were present even for the unsmeared stimuli. When smearing was applied, these already-confused stimuli became more similar to each other, along with the stimuli that humans would genuinely confuse. So, although the confusions made by real people were represented in the perceptual map (and confusion matrices), their effect was masked by the growth of the other confusions. In some cases, this effect could have occurred to such a degree that the simulation would predict the absence of a confusion between stimuli that a human would in fact confuse. Because of this background of large numbers of confusions, it was very difficult to detect the real ones. Notably, the smearing algorithms did increase the total error rate of the confusion matrices almost linearly, so it is clear they were doing something that contributed to the signals being confused with one another.
However, it is unknown whether the assumption that they were correctly simulating the frequency smearing is correct, or whether they were creating inaccurate confusions.

One way of testing whether this interpretation of the results is correct is to examine the directions in which the stimuli moved, on average, relative to one another. A necessary condition for this hypothesis to be correct is that for confusions produced by the real subjects, the vowels involved must move closer together as smearing is increased. If the vowels moved further apart (i.e., the d' distance increased), then either this explanation is incorrect, the smearing and recruitment algorithms are not functioning properly, or some mechanism besides outer hair cell loss is responsible for the real subjects' hearing loss. Examining the relative motion of individual pairs of vowels is likely the only way to determine whether the smearing/recruitment results are accurate, but even then the confusions unlike those that humans would make could still shift and obscure the desired confusions.

The possibility of vowel duration being used as a cue is worth discussing further. It could be that the real subjects were able to distinguish between sounds on the basis of the vowels' durations, even if their spectra were very similar. If this were the case, we would expect to see fewer confusions in the real confusion matrices at those locations where durations differ greatly. Assuming that the extracted vowel durations are indeed reflective of the actual vowel durations perceived by the real subjects, vowel duration was probably not an important cue in vowel discrimination: many of the real subjects made AH-II and AH-UH confusions, yet the duration differences of these vowels are the largest among the stimulus set. It was also the case that vowels with similar durations did tend to be confused by the real subjects, and in some cases (i.e.
OO-UH) the confusion magnitudes were quite large, which would be consistent with vowel duration being used as a cue. However, the existence of confusions between vowels of very different durations argues against vowel duration being an important cue.

4.2 Other Sources of Error

Besides the perceptual model operating differently from how real humans perceive, there were several other possible sources of error that could cause the simulation results to duplicate the real subjects' confusion matrices poorly. First, the real subjects' confusion matrices differed somewhat in the location and severity of confusions, though there were a number of similarities. Many of the locations at which confusions occurred could be found in as many as three or four of the five subjects (see Appendix A for the actual confusion matrices), but some variation still remained. Because of this, it seems reasonable to hypothesize that the mechanisms of hearing loss varied to some extent across subjects. Factors that could have varied between subjects include the amount of outer hair cell versus inner hair cell loss and possible neural conduction losses in the auditory nerve. In the current simulation, it was assumed that all of the hearing loss was due to outer hair cell loss; to the extent that this assumption was not true, there could have been differences between the simulated and real confusion matrices.

Also, the current simulation applied a constant widening of the auditory filters for all subjects (which would correspond to a flat loss), regardless of the subjects' audiograms. While the assumption of a flat audiogram is approximately true for most subjects, it is definitely not the case for subjects MG and RC. It is interesting to note that despite this, subject RC's real confusion matrix correlated better with the simulated confusion matrices than the other subjects' (besides JO's) confusion matrices did.
This lends support to the earlier explanation that it was the lower total error rates and the distribution of errors that caused these subjects' confusion matrices to fit the simulated matrices better.

4.3 Conclusions that Can Be Drawn

Despite the fact that the simulated confusion matrices tended not to have their largest confusions in the same locations as the real confusion matrices, some conclusions can still be drawn from the results. The smearing and recruitment algorithms still introduced errors into the confusion matrices, indicating that something was occurring. The confusion matrices are reflective of the amount of frequency smearing introduced by the processing, because the perceptual model acted as a sort of spectrum analyzer.

4.3.1 Total Error Rates and Smearing and Recruitment

It is interesting to compare the exact error rates produced by this simulation with the error rates found when Baer and Moore (1993) and ter Keurs et al. (1992) simulated smearing, and Moore and Glasberg (1993) simulated recruitment, and presented the results to normal-hearing listeners. Baer and Moore (1993) presented frequency-smeared words to listeners in quiet at 65 dB SPL. With the No Smearing, ERB x3, and ERB x6 conditions, the total error rates were about 0% for the first two conditions and 1.7% for the third. This is very low compared to the total error rates found in the current simulation with Moore smearing and no recruitment, which were in the 10-30% range for smearing of ERB x3 and in the 25-40% range for the ERB x6 condition. Also, in the current simulation, as the frequency smearing was increased from ERB x3 to ERB x6 the total error rate increased about 10-15% on average. This is far more than the change found by Baer and Moore.
One factor that could potentially explain this difference is that in the Baer and Moore experiment listeners heard entire words, while the current experiment studied vowel identification in nonsense syllables. Listeners in the Baer and Moore experiment could potentially combine cues from different phonemes in the entire word to improve comprehension.

ter Keurs et al. (1992) simulated frequency smearing with isolated vowels in quiet that had been equalized in length and presented at 65 dBA. They performed smearing only at 1/8, 1/2, and 2 octave amounts, and found that in the unsmeared, 1/8, and 1/2 octave conditions the total error rate was only 3%, but with 2 octave smearing it jumped to 68%. In comparison, in the current simulation only the smearing amounts of 0.75, 1.125, and 1.5 octaves were investigated (without recruitment). For all subjects, the total error rates were about 25-35% at 0.75 octave smearing, about 30-37% for 1.125 octave smearing, and about 33-40% for 1.5 octaves of smearing. These error rates are clearly between 3% and 68%, but it should also be noted that in the simulation the unsmeared vowels had a total error rate of 6-20%. Because the smearing amounts in the current simulation did not overlap well with those in the ter Keurs et al. (1992) experiment, it is difficult to form conclusions about the accuracy of the simulation based on just the total error rates. Also, the ter Keurs et al. experiment used 11 different vowels, while the simulation and the experiment by Brantley used only 6. However, ter Keurs et al. (1992) also formed a confusion matrix based on the subjects' responses in the 2-octave condition. A comparison of this result with the current simulation is justified, as one would expect the confusions to be approximately in the same locations in both cases, even though the confusion amounts may differ. ter Keurs et al.
found that almost all of the confusions consisted of vowels being judged by subjects as back vowels. Their experiment used all of the vowels in the current simulation except UH, instead using "AW," which is similar. They also found a large confusion of EH being classified as AH, which also appeared in the current simulation, and minor confusions of II being classified as AH and as EH; the former was not found in the current simulation, but the latter was. In the current simulation, many confusions do exist in which vowels are classified as back vowels, with the exception of EE-UH, AH-OO, and EH-OO. This suggests that the simulation may be somewhat reflective of the responses of human listeners. However, the simulation also produced large confusions at other locations, most notably II-EE, OO-EE, AH-EH, EE-EH, EH-II, OO-II, UH-EH, and UH-II. This large number of other confusions limits the degree to which it can be concluded that the simulation and experiment produced the same results.

Moore and Glasberg (1993) processed sentences with the recruitment algorithm and presented them to subjects, using different input levels and different amounts of recruitment. It is unfortunately hard to compare the results of their experiment with the current simulation, because they only presented the 50% correct points in their paper. With a severe flat hearing loss simulated, the 50% correct point was at an input level of about 79 dB SPL, and with a moderate-to-severe sloping loss simulated, the 50% correct point was at an input level of about 52 dB SPL. In comparison, in the current simulation subjects MG and RC had approximately moderate-to-severe sloping losses. For subject MG, the input level was about 108 dB for both high and low frequencies, and simulating recruitment produced a total error rate of about 14%. For subject RC, the input level was 80 dB SPL up to 2500 Hz and 102 dB SPL above 2500 Hz, which produced an error rate of nearly 30%.
In both of these cases, the input levels were higher and the resulting error rates lower than those in the Moore and Glasberg experiment. This is consistent with the simulation operating correctly, but it is hard to tell to what extent. Also in the current experiment, subjects JB, JO, and PW had approximately flat moderate-to-severe losses. Subjects JB and PW had the input signals presented at levels of (80, 95) and (85, 88) dB SPL, where the numbers in parentheses indicate the levels below and above 2500 Hz, respectively. In response, subject JB's total error rate was about 40% and subject PW's was about 27%. For these subjects, the error rate for unsmeared stimuli was 21% and 12%, respectively, indicating that this contributed somewhat to the recruited signals' total error rates. Even so, the results for subject JB are roughly comparable to the results found in the Moore and Glasberg experiment. The results for subject PW are consistent with recruitment causing increased error rates, but cannot specify to what degree. For subject JO, the signal was presented at a high level, (105, 110) dB SPL, and the error rate was only 15%.

In general, it was found in the simulation that the total error rate resulting from recruitment was comparable to the total error rates for Moore smearing of ERB x3 or ERB x4.5, and usually less than the total error rates for all amounts of ter Keurs smearing. In comparison, the results of the actual experiments indicate that Moore smearing should cause a very low error rate and that the error rate for recruitment should be greater than it. Moreover, in the current experiment, the input levels to the recruitment algorithm were larger than those used in the actual experiments, yet the resulting error rates were still larger than those corresponding to the Moore smearing.
This likely indicates that the effect of the perceptual model was to drastically increase the error rates due to Moore smearing, and perhaps also to increase the error rates due to recruitment.

Comparing the results of the Moore and ter Keurs smearing, the Moore results in the simulation had lower error rates than those from the ter Keurs smearing for comparable amounts of smearing. This result from the simulation is most likely consistent with the experimental results from Baer and Moore and from ter Keurs et al., although in the ter Keurs et al. (1992) experiment no data are available for smearing bandwidths between 1/2 and 2 octaves. Comparing the results of the ter Keurs smearing and recruitment, recruitment almost always caused total error rates below those from the ter Keurs smearing. Given that in the simulation the input levels to the recruitment algorithm were larger than those used in the experiment by Moore and Glasberg, which would tend to reduce the error rates, the simulation results are likely consistent with the experimental results, although again data for the ter Keurs et al. experiment are not available for intermediate smearing amounts.

4.3.2 Comparing Moore and ter Keurs Smearing

In the reduced audibility condition, it was found that the results of the Moore smearing fit the real subjects' confusion matrices better than did the results of the ter Keurs smearing, for similar total error rates. It is not obvious why this occurred. The Moore method usually produced slightly fewer confusions than the ter Keurs method, and had larger confusions in a few locations. This may have caused a few large elements in the real confusion matrices to be excluded from the χ² calculation; the MSE measure also indicated that the Moore method was better, though less strongly. In any event, because of this result it can tentatively be concluded that (without recruitment) the Moore smearing method produces more accurate results than the ter Keurs smearing method.
This result involved the comparison of all of the simulated matrices to all of the real subjects' matrices, so it does not necessarily indicate that the Moore method is better at creating the specific confusions made by a particular subject when the parameters in the model are set to correspond to those of that subject (even though this was also true); rather, it indicates that the Moore method is better at creating the confusions that are generally made by hearing-impaired listeners. It should be noted, however, that though the Moore method produced better fits than the ter Keurs method, the fits still were not very accurate.

4.3.3 The Effect of Audibility

In comparing the results of the reduced-audibility and full-audibility simulations, it is clear that the audibility of the different parts of the spectrum makes a great deal of difference for unrecruited stimuli. The total error rates for the unrecruited signals (both smeared and unsmeared) decreased in the full-audibility simulation, and the magnitudes of the confusions changed somewhat as well, though the locations generally remained the same. Presumably, increasing the audibility of the stimuli allows for better discrimination because more spectral cues are available, even if they are distorted by smearing.

It is interesting to note that the total error rates of the recruited signals did not decrease very much with increased audibility. The recruitment contributed to the signal degradation both by changing the level of the signal and by introducing spectral smearing, particularly at high frequencies. In real hearing-impaired subjects, using a compression hearing aid could counteract the level-changing effects of recruitment, but would not affect the frequency-smearing effects.
It would be interesting to compare the relative contributions of these effects of loudness recruitment by simulating level changes without the frequency-smearing effects, and comparing that to the standard recruitment algorithm preceded by a compression stage.

Finally, the relative performance of the Moore and ter Keurs smearing methods (with no recruitment) under reduced and full audibility merits discussion. In the reduced-audibility case, the Moore method results fit the real subjects' confusion matrices better than the ter Keurs method results did, but in the full-audibility case this difference disappeared. This could mean several things. It could be that the confusions produced by both the Moore and ter Keurs methods were so different from the real subjects' confusion matrices that it was by chance that the Moore method fit better in the reduced-audibility simulation, and that neither really fits well. This is definitely a possibility given the large values of the χ² measures of goodness of fit. A second possibility is that the Moore method produces better results only if there are minimal spectral cues. This could be true even if the reduced-audibility simulation is terribly inaccurate and, in the real confusion matrix experiment, many more frequencies were audible than in the simulation. If the reduced-audibility simulation is inaccurate and the real subjects heard most of the frequencies, this would imply that the Moore method is not better in general, but only for situations in which the frequency cues are minimal.

4.4 Potential Perceptual Models in Humans

It was hypothesized that the primary inaccuracy in this simulation, which caused the relatively poor results, was that the d' discrimination model was different from how real human listeners perceive. The question remains as to what would be a better model of human perception, and this section addresses that question.
In the example given of two pairs of spectra that had the same d' value but varied in how hard it would be for humans to discriminate between them (figure 4-2), the primary difference between the spectral pairs was the presence or absence of distinct spectral features. In Case A, the spectra would be easy for humans to discriminate because they contained sharp spectral peaks and notches. In Case B, the spectra contained no distinct features, only a difference in level between the high and low frequencies.

One possible perceptual model that would account for the ability of spectral features to aid signal discrimination is as follows. First, it is assumed that humans perform discrimination based on the heights and frequencies of the formants. "Model" vowels (corresponding to the response centers in this simulation) would be known to have formants of certain heights and at certain frequencies relative to one another. Discrimination of an unknown sample vowel would proceed by first identifying the formants present in the sample vowel's spectrum, via the spectral peaks relative to spectral notches. The frequencies of these formants would be noted, and then the heights of the formants would be computed relative to an average vowel spectrum. A vector of cues would then be formed from the relative distances between the formants and the heights of the formants. Finally, a d' discrimination algorithm would be applied to this cue vector instead of to the original spectrum, and from this point the perceptual algorithm would proceed as before.

[Plot: average vowel durations, in seconds, for the vowels AH, EE, EH, II, OO, UH.]

Figure 4-1: Average vowel lengths for the different vowels. The error bars represent one standard deviation of the vowel length. The averages were nearly identical for male and female speakers.
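The proposed procedure (find formant peaks, note their frequencies, compute their heights relative to an average spectrum, then run d' on the resulting cue vector) could be sketched as follows. The peak-picking rule, the "tallest peaks" criterion, and the flat reference spectrum in the example are placeholder choices for illustration, not specifics from the thesis.

```python
import numpy as np

def formant_cue_vector(spectrum_db, freqs, avg_spectrum_db, n_formants=3):
    """Cue vector for the proposed formant-based perceptual model.

    1. Find candidate formants as local maxima of the spectrum.
    2. Keep the n_formants tallest peaks, in frequency order.
    3. Return the spacings between adjacent formant frequencies together
       with each formant's height relative to an average vowel spectrum.
    A d' stage would then operate on this vector rather than on the raw
    spectrum, so a new spectral peak changes the cue vector dramatically.
    """
    s = np.asarray(spectrum_db, dtype=float)
    i = np.arange(1, len(s) - 1)
    peaks = i[(s[i] > s[i - 1]) & (s[i] > s[i + 1])]   # strict local maxima
    idx = np.sort(peaks[np.argsort(s[peaks])[::-1][:n_formants]])
    rel_heights = s[idx] - np.asarray(avg_spectrum_db, dtype=float)[idx]
    return np.concatenate([np.diff(freqs[idx]), rel_heights])

# Toy spectrum with two "formants" at 500 and 1500 Hz over a flat reference.
freqs = np.linspace(0.0, 4000.0, 401)
spec = (10.0 * np.exp(-0.5 * ((freqs - 500.0) / 100.0) ** 2)
        + 6.0 * np.exp(-0.5 * ((freqs - 1500.0) / 100.0) ** 2))
cues = formant_cue_vector(spec, freqs, np.zeros_like(freqs), n_formants=2)
print(cues)   # spacing ~1000 Hz, relative heights ~10 and ~6 dB
```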
[Diagram: Stimulus Set A and Stimulus Set B, each consisting of three spectra. Either stimulus set results in perceptual maps like those described below, with different outcomes for the simulation and for human listeners; the maps are plotted along perceptual dimensions in units of d'.]

Figure 4-2: Examples of spectra. In both cases, the d' discrimination algorithm will produce equal d' distances between all stimuli. However, a real human listener could easily distinguish that the third spectrum is very different from the other two. In Stimulus Set A, the spectrum heights are A, 2A, and 16A for spectra 1, 2, and 3, respectively. The widths of the spectra are such that the total area under all of the spectra is the same.

Chapter 5

Conclusions

5.1 Summary of Findings

Primarily, it was found that the d' algorithm used in this simulation used spectral cues differently than real humans do in perception. Because of this, the confusions produced by the simulation tended not to occur in the same locations as the real subjects' confusions. This was evident because the unsmeared signals were judged to have confusions generally in the same locations as the confusions from the smeared signals. Due to these factors, it was hard to judge the accuracy of the smearing models. In the case of limited audibility of the spectrum, such that just low-frequency cues are available, the Moore model did better than the ter Keurs model in terms of predicting the confusions made by the hearing-impaired listeners, but in both cases the fit was not very good. Even so, increasing error rates with increasing smearing bandwidths occurred with both the Moore and ter Keurs methods, with and without recruitment. This shows that to some degree the smearing algorithms were acting correctly.
Also, recruitment caused much-increased error rates, due to the spectral smearing it caused as well as the spectral level changes, which is also consistent with its acting correctly. In general, the error rates produced by the simulation seemed to support the models operating correctly, with the exception of the Moore model without recruitment and the unsmeared stimuli. We conclude that this method of simulation has the potential to be very accurate, provided a good discrimination algorithm exists. Given the current discrimination algorithm, it is not possible to form strong conclusions about the relative accuracy of the Moore and ter Keurs models.

5.2 Relevance to Hearing Aid Signal Processing

Assuming that the model of loudness recruitment used in this simulation is correct, it was determined that recruitment causes spectral smearing as well as level changes in the spectrum. This is significant because it sets a limit on how well compression hearing aids can correct for loudness recruitment. Compression hearing aids can only correct for level changes, not frequency smearing, and could possibly introduce additional spectral changes as well. This limit could be further studied by simulations of recruitment preceded by a compression algorithm. Also, this simulation seems to imply that there is a benefit to increasing the audibility of all frequencies, even if the ear has reduced frequency resolution at those frequencies. This conclusion follows from the fact that reduced error rates were found in the simulation in which all frequencies were considered audible. It is, however, somewhat contingent on the accuracy of the discrimination model.
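The compression idea mentioned above can be made concrete with a static (instantaneous) compression rule of the kind such a simulation might prepend to the recruitment algorithm. The threshold and ratio below are illustrative placeholders, not fitted values.

```python
import numpy as np

def compress_level(level_db, threshold_db=45.0, ratio=2.0):
    """Static compression of a level in dB.

    Below threshold the level is unchanged; above it, every `ratio` dB of
    input excursion becomes 1 dB of output excursion. This addresses only
    the level-changing side of recruitment, not its spectral smearing.
    """
    level_db = np.asarray(level_db, dtype=float)
    over = np.maximum(level_db - threshold_db, 0.0)
    return level_db - over * (1.0 - 1.0 / ratio)

# A 2:1 compressor halves excursions above 45 dB: 65 dB in -> 55 dB out.
print(compress_level(65.0), compress_level(40.0))   # -> 55.0 40.0
```

Comparing recruitment alone with compression followed by recruitment, as suggested in the text, would isolate how much of the recruitment-induced degradation is level change and how much is irreducible smearing.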
5.3 Recommendations for Future Work

There is much potential in having a simulation of hearing loss that does not require stimuli to be presented to and classified by normal-hearing subjects, because any stimuli played to such subjects will necessarily be single waveforms, and they will be subjected to sharp frequency and temporal analysis by the subjects' normal ears. Simulations of hearing loss must necessarily operate on the different frequency bands of a signal, since in impaired ears the signal degradation occurs during the process of its being analyzed by the cochlea. However, if the results of processing in different frequency bands are added together again to form a single waveform for presentation to normal-hearing subjects, many of the effects of hearing loss will be undone. In contrast, having multiple frequency bands as the output of a hearing loss simulator could produce quite accurate results, if the model is correct. However, some way of determining how humans would perceive the result of the processing is needed. So, future work on determining an accurate model of human perception could be quite beneficial. The results of such work could have application to speech recognition systems as well as to hearing aid research.

Once a good model of human perception is achieved, or even with the models currently available (assuming that some other way of analyzing the results is possible), work could focus on creating improved simulations of hearing loss that would feed into these perceptual models. One possibility for an improved model is described in Appendix D. This model attempts to accurately simulate all of the cochlear nonlinearities, so that effects such as two-tone suppression could be seen. Additionally, it simulates the linear response of a location on the basilar membrane to low-frequency tones.
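Returning to the point about recombining frequency bands: the band split itself is invertible, which is why summing per-band outputs back into a single waveform hands the signal over to the normal ear's own sharp analysis. A crude additive FFT filterbank (with arbitrary band boundaries, chosen only for illustration) demonstrates the perfect-reconstruction property.

```python
import numpy as np

# Split a signal into frequency bands and sum the bands back: the original
# waveform is recovered exactly, so any per-band structure produced by a
# hearing loss simulator collapses into one waveform that the listener's
# normal auditory filters then re-analyze.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

X = np.fft.rfft(x)
edges = [0, 64, 128, 256, 513]          # arbitrary band boundaries (FFT bins)
bands = []
for lo, hi in zip(edges[:-1], edges[1:]):
    B = np.zeros_like(X)
    B[lo:hi] = X[lo:hi]                 # keep only this band's bins
    bands.append(np.fft.irfft(B, n=len(x)))

resynth = np.sum(bands, axis=0)
print(np.allclose(resynth, x))          # -> True
```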
Finally, one other application for a model of human perception would be to determine to what extent the spectral analysis performed by a normal ear undoes the spectral smearing created by the hearing loss models currently employed, which create output waveforms for presentation to normal-hearing listeners. Using a perceptual model, one could determine the limits on how much of the different aspects of hearing loss can be simulated by presenting smeared sounds to normal-hearing listeners.

Appendix A

Audiograms and Confusion Matrices for the Real Subjects

Following are the audiograms of the subjects that participated in the confusion matrix experiment. The amounts of hearing loss were used to tailor the simulation to try to achieve results similar to the subjects'.

[Audiogram plot: hearing loss (dB) versus frequency (Hz) for subject PW.]
Figure A-1: Audiogram for subject PW.

[Audiogram plots: hearing loss (dB) versus frequency (Hz) for subjects JB and JO.]
Figure A-2: Audiograms for subjects JB and JO.

[Audiogram plots: hearing loss (dB) versus frequency (Hz) for subjects RC and MG.]
Figure A-3: Audiograms for subjects RC and MG.

Next are tables of the real subjects' actual confusion matrices. The confusion matrices listed here had the bias removed and were reconstructed using algorithm "a" in 4 dimensions by Dr. Louis Braida at MIT.

     AH   EE   EH   II   OO   UH
AH   82    1    0    5    0   10
EE    5   98    0    0    1    1
EH    1    0   98    0    3    1
II    0    0    1   89    1   17
OO    0    0    0    1   93   55
UH   10    0    0    4    1   14

Table A.1: Confusion matrix for subject JB.

     AH   EE   EH   II   OO   UH
AH   88    0    0    2    0    2
EE    5  100    0    0    0    1
EH    0    0   98    1    6    0
II    1    0    1   89    1    9
OO    0    0    0    0   88    0
UH    5    0    0    7    4   85

Table A.2: Confusion matrix for subject JO.

     AH   EE   EH   II   OO   UH
AH   95    0    0    1    0    0
EE    2   93    0    0    0    0
EH    0    0   97    0    0    0
II    0    0    1   98    0    2
OO    0    0    0    0   97   51
UH    1    5    0    0    1   46

Table A.3: Confusion matrix for subject MG.

     AH   EE   EH   II   OO   UH
AH   89    2    2    1    0    0
EE    1   97    0    0    0    0
EH    1    0   84    2    0    0
II    1    0   13   88    0    0
OO    0    0    0    2   68   39
UH    6    0    0    5   31   60

Table A.4: Confusion matrix for subject PW.

     AH   EE   EH   II   OO   UH
AH  100    0    0    0    0    0
EE    0  100    0    0    0    0
EH    0    0   93    1   11    0
II    0    0    0   92    3    3
OO    0    0    6    2   84    3
UH    0    0    0    4    0   92

Table A.5: Confusion matrix for subject RC.

Appendix B

Tables of Computed Confusion Matrices

The "standard amounts of smearing" used were: a widening factor of 3 for the Moore method, 0.75 octaves for the ter Keurs method, and 28 ms for the Hou method.
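The column scaling used throughout the tables below can be expressed directly (a small helper of our own devising; the names are ours, but the standard amounts are those quoted above):

```python
# Standard amounts of smearing for each method, taken from the text.
STANDARD_SMEARING = {
    "Moore (auditory-filter widening factor)": 3.0,
    "ter Keurs (bandwidth, octaves)": 0.75,
    "Hou (window length, ms)": 28.0,
}

def smearing_amounts(multiplier):
    """Scale each method's standard smearing amount by the smearing multiplier."""
    return {method: multiplier * std
            for method, std in STANDARD_SMEARING.items()}

for sm in (1.0, 1.5, 2.0):
    print(sm, smearing_amounts(sm))
```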
So, for example, the Moore method in the Smearing Multiplier = 1.5 column simulates auditory filters widened by a factor of 1.5 × 3 = 4.5.

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.1: Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.2: Confusion matrices for the simulation done with subject JB's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.3: Confusion matrices for the simulation done with subject JO's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.4: Confusion matrices for the simulation done with subject JO's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.5: Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.6: Confusion matrices for the simulation done with subject MG's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.7: Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.8: Confusion matrices for the simulation done with subject RC's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.9: Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, for each row the smearing conditions were: No smearing, Moore sm. only, ter Keurs sm. only, Hou sm. only. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

[6×6 vowel confusion matrices (AH, EE, EH, II, OO, UH) for smearing multipliers 1.0, 1.5, and 2.0.]
Table B.10: Confusion matrices for the simulation done with subject PW's parameters. From top to bottom, for each row the smearing conditions were: Moore sm. then recruitment, ter Keurs sm. then recruitment, Hou sm. then recruitment, and just recruitment. The amount of smearing in each column was the "Smearing Multiplier" times a 'standard amount of smearing' (see beginning of the appendix).

Appendix C

Plots of χ² Values

This appendix contains plots showing how well the simulated confusion matrices match the real subjects' confusion matrices. To determine how well the confusion matrices match, a χ² measure and a similar measure, (O − E)², were employed. The natural log of each is plotted in the top and bottom parts of each figure, respectively. Each plot compares one simulated confusion matrix to the real confusion matrices from all of the subjects.
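The two comparison measures can be sketched as follows (a minimal illustration with function names of our own; the exact normalization and handling of zero expected cells used in the thesis are not specified here, so the `eps` guard is our assumption):

```python
import numpy as np

def ln_chi_squared(observed, expected, eps=1e-9):
    """Natural log of a chi-squared-style distance between two confusion
    matrices: the sum of (O - E)^2 / E over all cells."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return np.log(np.sum((o - e) ** 2 / (e + eps)))

def ln_squared_error(observed, expected):
    """Natural log of the plain summed squared error, sum of (O - E)^2."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return np.log(np.sum((o - e) ** 2))
```

Here `observed` would be a simulated confusion matrix and `expected` a real subject's matrix; smaller values of either measure indicate a closer match.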
The title of each plot specifies the subject whos audiogram was used to generate the simulated confusion matrix, and the labels at the bottom of each column specify the real confusion matrix that it is being compared to. Each sub-column (indicated by the dotted lines) corresponds to one of the eight processing conditions. A label at the top of the sub-column indicates the processing condition for that column. The top symbol stands for the type of smearing, if any: '-' = no smearing, 'M' = Moore smearing, 'K' = ter Keurs smearing, and 'H' = Hou smearing. The bottom symbol stands for whether the smearing was followed by recruitment or not: '-' = no recruitment, and 'R' = recruitment. In the title, 'S.M.' stands for "smearing multiplier," which is a scale factor indicating the amount of smearing. The actual amount of smearing used to generate the stimuli were (3 times the auditory filter width for the Moore method, 0.75 octaves for the ter Keurs method, and 28 msec for the Hou method) times the smearing multiplier. So, for example, if a plot says S.M. = 2.0 then that means e.g. the Moore method used smearing of 6 times 151 the normal auditory filter width. The first 15 plots in this appendix correspond to the reduced audibility condition (i.e. the normal simulation), and the last 15 plots correspond to data generated assuming that all frequencies were audible. These latter 15 are labeled as such in their captions. 152 ln(Chi-squared) measure for the simulation fitted to subject JB with S.M.=1.0 . .H.- - MK H MK H. . . . . . . . . . . . .- . MK . .H .. MK H- - MK .H .MK - .MM--MM R RR R -RRRR--- R R RR--. R R RR-R R RR-- - -8 7 -- ci4 O . H - 2 1 0 5 ) :PW. MG: JB : RC : 40 35 30 25 20 15 10 Processing condition and real subject compared to the simulation in( (O-E)2 ) measure for the simulation fitted to subject JB with S.M.=1.0 5 -M - -- K H M KH--M K H M K H- - MK H MK H - RRRR---- RRRR - RRRR- -- -'M KH M K H- RR RR MK H'MK' HM -RRR R 4 3 2 C,*J 0 1 0 -1 JO. 
JB -2 I 5 MG, RC- Pw 30 25 10 20 15 Processing condition and real subject compared to the simulation 35 Figure C-1: Original simulation including reduced audibility. 153 40 ln(Chi-squared) measure for the simulation fitted to subject JB with S.M.=1.5 9 - .. .. . . . . . .H. M M. MK H MK H- - MK H MK H - - MK H MK H- - MK H MK H-- MK HMK HR R R -R-R-R-R - R RR RI-*- -*- RR-R-R R.R R R--- -.- 8 7 6 5 co 4 -B 3 2 1 J:j~ 0 5 RC- :~ MG: 4) 35 30 25 20 15 10 Processing condition and real subject compared to the simulation In( (O-E)2 ) measure for the simulation fitted to subject JB with S.M.=1.5 5 I '. :MK HMK- H- -- RR RR- MK: HM K H'- - - - R'R'R R - MK H MK H - -M RR R R K: H M K H'- - I - - MK*H MKH- -RPR R'R -* - -- RRRR - 4 3 76J 2 0\ 1 0 -1 JO: JB -2 ) 5 MG RC PW 30 25 20 15 10 Processing condition and real subject compared to the simulation 35 Figure C-2: Original simulation including reduced audibility. 154 40 ln(Chi-squared) measure for the simulation fitted to subject JB with S.M.=2.0 9 .'R. ' . MK H MK H - - -- 8 MH' MK H MK H-- MK H MKH K- HK HR' Re R R - - -* - R'R' R'R -- -- -'R' R' R' R M RKH- -R'R: R'R 7 cD6 :3 u4 2 1 JO JB 0I t IL I ____I__I__ Pw MG: 30 25 20 15 10 Processing condition and real subject compared to the simulation 5 RC : : : 4) 35 0 In( (O-E)2 ) measure for the simulation fitted to subject JB with S.M.=2.0 - MK HM K H- - -- - RRRR---- - MK HM K H - - MK H MK H - R R R'R R R R R - MK H M K H - - MK H MK H RR RR ---- R R R R ---- 4 3 2 C\J 1 C 0 -1 JO: JB -2 ) 5 MG: PW 30 25 20 15 10 Processing condition and real subject compared to the simulation RC 35 Figure C-3: Original simulation including reduced audibility. 155 40 In(Chi-squared) measure for the simulation fitted to subject JO with S.M.=1.0 9 - MK H MK H-I- MK H MK H -I-MK H MK H*-- MK H MK H- - MK H MK HRRR R---RR R R -'- -R'R R R ---. . . .R - .-'- . .- . . .R .R- --. . . 
RRRR----RR -'-"-'-'R'R'R 8 7 SD6 0 54 :3 2 1 JB 0 JO 5 MG: PW 30 25 20 10 15 Processing condition and real subject compared to the simulation RC 40 35 In( (O-E)2 ) measure for the simulation fitted to subject JO with S.M.=1.0 5 MK H MK H- -MK HMKH- - MK H'MK H- -MKH MKH-- MK H MK H R RR . .R .R . . .--. --. . .- .RRRR --R R R R-I --- -RRRR---. . . ....... 4 3 2 W 0 C 1 0 -1 JO JB 0 5 MG PW 30 25 20 15 10 Processing condition and real subject compared to the simulation RC 35 Figure C-4: Original simulation including reduced audibility. 156 40 In(Chi-squared) measure for the simulation fitted to subject JO with S.M.=1.5 9 _ MK H MK H-I- MK HM K HR'R'R'R --.R'RR:R1 8 K H MK H -'--R'RR -- MK H MK MRR R- -I- RR MK H MK H- R R R R - 7 6 Cz a)5 LFL. -0 2 1 0 5 --- Jo - ' 5 - MG PW: 30 25 20 15 10 Processing condition and real subject compared to the simulation FC 40 35 In( (O-E) ) measure for the simulation fitted to subject JO with S.M.=1 .5 I . 1 r r 1 . .. . - 1 - .-MK H MK HMK: HMK H- MK H MK H- - MK H MK H -K' H-MK: H RRR RRR R---- . . . .R .R .R.R . .-R '- RRR R--R R R'R ----. . . . 4 3 2 C\ 1 0 -1 JB -2 0 I 5 JO MG RC PW 30 25 20 15 10 Processing condition and real subject compared to the simulation 35 Figure C-5: Original simulation including reduced audibility. 157 40 In(Chi-squared) measure for the simulation fitted to subject JO with S.M.=2.0 n1- MK H MK H*- - MK H MK HR* R RR '-'R"R'R* R - - - - R R R R- MK H MK H - -- 8 - - - - - -' -'H MK H MK HR R R R - M 'H MK HMK RR R R 7 - L 6 -5 Co - 4 C) c3 2 1 JB C I 5 JO MG: P W. . 30 25 20 15 10 Processing condition and real subject compared to the simulation RC: 40 35 ln( (0-E)2 ) measure for the simulation fitted to subject JO with S.M.=2.0 -I r. . . . 1 - 5 I - H-- M K H M K H- - MK H M K H - - MK HMK*H :HM MK HK H R RRR R RR R R .R- R R RR----R . . .R .R*.R .R ---. . . . .R .. 
4 3 2 0 1 0 -1 JB -2 0 I 5 Jo MG PW 30 25 20 10 15 Processing condition and real subject compared to the simulation RC 35 Figure C-6: Original simulation including reduced audibility. 158 40 In(Chi-squared) measure for the simulation fitted to subject MG with S.M.=1.0 9 MK H M K H-- M -- R RR R- -- 8 MK H MK HMK H K HM K H MHK HK H- - MH - R RR R RR R R-- - --- R -. R RR R--- 7 a)6 Cz 05 -C PW1 2 1 JO JB 01 1 5 0 M G: 1 1L 30 25 20 15 10 Processing condition and real subject compared to the simulation 40 35 In( (O-E) ) measure for the simulation fitted to subject MG with S.M.=1.0 MK H -- MK -MK H MKH- - H MKR. H-- M MKR"RR: R R R---'-R'R'RR-R R - R - -- MK H MK H- R .MK - .- .H MK HR*R'RR---'-RR*R'R -R 4 3 2 C\J w 0: 1 0 -1 -2 r ) Jo JB 5 MG RC PW . 30 25 10 20 15 Processing condition and real subject compared to the simulation 35 Figure C-7: Original simulation including reduced audibility. 159 40 In(Chi-squared) measure for the simulation fitted to subject MG with S.M.=1.5 9 . . . I - I MK H MK H- -'MK H MK H-I- MK H 'MK H -R R R . . . .R.R. R-R - - -- R R'R'R -.-.-.-. R R R R- - - -R . - MK H MK H - 8 - MK H MK H . . . . . RRR 7 2 1 JO JB Cz RC- .PW: 30 25 20 15 10 Processing condition and real subject compared to the simulation 5 0 MG: 35 4C In( (O-E) ) measure for the simulation fitted to subject MG with S.M.=1.5 -MK - - MK H MK H- MKHMK:H - MK*HMK H-- MKH:MK HMK HR-R RR R R - - - R R R R -* -. -. - R RRRR-R - R R - R- R-- -'- . . . . .- - . . . . . ~ .- CD C\1 wU - 1 2 JB J. MG PW .RC . . . 2- - 0 5 30 25 20 15 10 Processing condition and real subject compared to the simulation 35 Figure C-8: Original simulation including reduced audibility. 160 40 9 - 8 In(Chi-squared) measure for the simulation fitted to subject MG with S.M.=2.0 I. I 1 . . . . MK H MK H-I- MKHMK H-I- MK H MK H-I- MK H MK H- - MK H MK H-- -- .R .RR - RR R -"- R R R R- -'- - R RRR -. .R .--. . . .R .R.R 7 6 D-5 4 - M .H - J -C. 
JO K C-) c53 2 1 0 0 5 ..PW MG SI Jo 30 25 20 15 10 Processing condition and real subject compared to the simulation RC 35 40 In( (0-E)2 ) measure for the simulation fitted to subject MG with S.M.=2.0 5 I I - MK'H M K H' -- - -R: R*R R -'M K --- -I H M K H - - MK H: M K H - - M KH M K H RR RR RRRR--- - RRRR-- -- -MK H MK H - ---- R RR R 4 -LT3 CDJ 2 0 1 0 RO- -1 Jo JB -2 0 5 J MG: Pw 30 25 20 15 10 Processing condition and real subject compared to the simulation RC. . . 35 Figure C-9: Original simulation including reduced audibility. 161 40 In(Chi-squared) measure for the simulation fitted to subject PW with S.M.=1.0 1 9 8 . -MK H MK H-RR R - -: MK H MK H- J13 JO - RRR R H RMK MK H ---- R R RR - MK H MK H -- - MK H MK - H MKR R RR RRRR---- 7 'a6 Co 05 4 3 2 1 0 Co Pw 30 25 20 15 10 Processing condition and real subject compared to the simulation 5 ) MG 40 35 In( (O-E)2 ) measure for the simulation fitted to subject PW with S.M.=1.0 5 MK H --- . 1 . [ . . . -MKHMKH -- MK H MK H-I- MK H MK H'-- MK'H MK HRRRR RR R R---R R R R- --R RRR ---R R R R---- MK H- 4 3 2 C\J 0 1 0 -1 JB -2 0 I 5 JO. - MG RC PW: 30 25 20 15 10 Processing condition and real subject compared to the simulation 35 Figure C-10: Original simulation including reduced audibility. 162 40 In(Chi-squared) measure for the simulation fitted to subject PW with S.M.=1 .5 9 - MK H MK H - - MK H MK H- - MK H MK H- R'RRR--- - R R R ---- R R: R R - -R 8 - - - R MK H MK H R-R RR - MK H MK H- -- R R'RRF 7 RC.. 6 ~ (D5 -3 2 _j-- 1 0 JO- .JB 0 PW : :G 5 40 35 30 25 20 15 10 Processing condition and real subject compared to the simulation In( (O-E)2 ) measure for the simulation fitted to subject PW with S.M.=1.5 5 . . . . . . . . _MK H MK H -- MK H MK H -R R-[ -- - '- .. MK H MK H -- MK H MK H -R R RRRR RR--R I.. MK*H MK H R RR - 4 3 2 0 1 j-- 0 -1 S -2 ) JB J0, 5 MG' PW 30 25 20 15 10 Processing condition and real subject compared to the simulation RC 35 Figure C-11: Original simulation including reduced audibility. 
163 40 In(Chi-squared) measure for the simulation fitted to subject PW with S.M.=2.0 9 . .1 RR -- R 8 I I I RR R--- K HMH MK H MK H- - R- ---- - MK H MK H - - 'RR R-R - MK H MK H KH M K Hi - - MK H MK H - - 7 o5 04 FL s3 J 2 1 JO: J13 A 0 5 MG: RC PW: 40 35 30 25 20 15 10 Processing condition and real subject compared to the simulation In( (0-E)2 ) measure for the simulation fitted to subject PW with S.M.=2.0 5 I ,... -M --- K H:MK H '-R'R* R'R - K H: M K H-- - -R* R R*R M K: H M K H MRR R R - - M K H M K. H- R'R'R'R R RRR 4 3 C\1 2 w 1 -IB -- :JB: Jo 0 -1 -2 0 5 MG . P.W. 30 25 20 15 10 Processing condition and real subject compared to the simulation RC 35 Figure C-12: Original simulation including reduced audibility. 164 40 In(Chi-squared) measure for the simulation fitted to subject RC with S.M.=1.0 I 9 MH MK H -i- MK H MK H- - MK HMK H --- -- R.RR _ _: - RRR-' :- m~ 8 M KH M K* HR R - MK H MKH-RC RR R 7 .: .: . .. . '6 05 - :3 M : 2 1 JO. JB 0 1 MqG- RC: 40 35 30 25 20 15 10 Processing condition and real subject compared to the simulation 5 In( (O-E)2 ) measure for the simulation fitted to subject RC with S.M.=1.0 r 5 JBRR R R MK H MK H - ---- R RRR - - I - - - - M K: H MK H RRR R - ---R* R R R - - -.- R R RR- MK HG K'H- - M K H M K H- 4 3 2 0 0 -1 JB . 0 5 JO . MG . PW 30 25 20 15 10 Processing condition and real subject compared to the simulation RC 35 Figure C-13: Original simulation including reduced audibility. 165 40 In(Chi-squared) measure for the simulation fitted to subject RC with S.M.=1.5 - 9 - MK H MK H-- MK H MK H- -'MK H MK H - -R R. RR - -. R R R R R RR-- - - 8 K'H'MK:H'- - MK H MK H - R 7 F6 - MK: H MK H - 2 1 JI3 0 5 4 MG: JO: PW....................... RC: 4) 35 30 25 20 15 10 Processing condition and real subject compared to the simulation In( (O-E)2 ) measure for the simulation fitted to subject RC with S.M.=1.5 5 1 . 
[Plots omitted: each figure shows the ln(Chi-squared) measure (top panel) and the ln((O-E)^2) measure (bottom panel) for the simulation fitted to one subject, plotted against processing condition and real subject (JB, JO, MG, PW, RC).]

Figure C-14: Original simulation including reduced audibility.

Figure C-15: Original simulation including reduced audibility. (Subject RC, S.M. = 2.0.)

Figure C-16: All frequencies audible condition. (Subject JB, S.M. = 1.0.)

Figure C-17: All frequencies audible condition. (Subject JB, S.M. = 1.5.)

Figure C-18: All frequencies audible condition. (Subject JB, S.M. = 2.0.)

Figure C-19: All frequencies audible condition. (Subject JO, S.M. = 1.0.)

Figure C-20: All frequencies audible condition. (Subject JO, S.M. = 1.5.)

Figure C-21: All frequencies audible condition. (Subject JO, S.M. = 2.0.)

Figure C-22: All frequencies audible condition. (Subject MG, S.M. = 1.0.)

Figure C-23: All frequencies audible condition. (Subject MG, S.M. = 1.5.)

Figure C-24: All frequencies audible condition. (Subject MG, S.M. = 2.0.)

Figure C-25: All frequencies audible condition. (Subject PW, S.M. = 1.0.)

Figure C-26: All frequencies audible condition. (Subject PW, S.M. = 1.5.)

Figure C-27: All frequencies audible condition. (Subject PW, S.M. = 2.0.)

Figure C-28: All frequencies audible condition. (Subject RC, S.M. = 1.0.)

Figure C-29: All frequencies audible condition. (Subject RC, S.M. = 1.5.)
Figure C-30: All frequencies audible condition. (Subject RC, S.M. = 2.0.)

Appendix D

Recommendations for an Improved Simulation

D.1 Rationale and Background

It has been found that a location on the basilar membrane (BM) has a transfer function that acts compressively on tones at its characteristic frequency (CF), while it responds linearly to tones an octave or more below its CF (Oxenham and Plack, 1997). The current simulation does not take this effect into account. More generally, an improved simulation of hearing loss is desirable. This appendix describes one possible model that should address these concerns.

D.2 Overview of Model

The model proposed here is largely based on a model of the BM's response to low-side suppression. That model, described in Cai and Geisler (1996), simulates the response of a single location on the BM to a tone at that location's CF together with a low-frequency suppressor tone that suppresses it. The model described here uses the same principles, but expands them to multiple locations on the BM.
It generally simulates the pattern of BM motion found in the ear, producing a set of "channels." Each channel corresponds to the pattern of BM motion that would be detected by a single inner hair cell in the cochlea; alternatively, the channels can be thought of as the outputs of a set of filters similar to auditory filters.

Although it would be desirable to have a simulation that produces a single waveform that can be played to normal-hearing listeners in order to simulate hearing loss, it is probably not possible to achieve all of the effects of hearing loss this way (Moore et al., 1992). However, this thesis made use of a perceptual model that performs discrimination on the outputs of a set of auditory filters. This approach is potentially quite powerful, because it does not require the signal to be analyzed by a normal ear first; the normal ear's good frequency and time resolution would tend to remove the simulated effects of hearing loss. Because of these potential benefits, the hearing-loss simulation described here has multiple outputs corresponding to the outputs of a filter bank. The resulting signals could be analyzed either by the perceptual model used in this thesis or by another model that takes similar inputs.

D.3 Details of Model

A diagram outlining the model is shown in Figure D-1. The model works as follows. First, the signal is filtered into a number of frequency bands by a series of logarithmically spaced filters, such that the filters are of increasing width. This step is crudely intended to approximate the filtering that would occur on the basilar membrane in the absence of active amplification and tuning. The entire model, including this step, filters the whole signal on a continuous-time basis; that is, it does not break the input signal into frames as was done in the Moore and ter Keurs methods in this thesis. This avoids the negative effects of the overlap-add method, as explained in Chapter 2.
What follows is easiest to explain from the perspective of how a single output channel is formed. Each channel, as previously stated, corresponds to the pattern of BM motion that would be detected by a single inner hair cell. This pattern is similar to that seen at the outputs of auditory filters, if they are simulated, but it differs in some ways: the channels include low frequencies with a gain of 1, while conventional auditory filters exclude low frequencies, and the overall frequency selectivity of these filters is probably different from that of typical auditory filters.

To produce a channel, the outputs from each of the initial bandpass filters are scaled by a set of gains and then added together. This simulates the magnitudes of travelling waves on the BM as they pass the particular location corresponding to the channel. A plot of a possible set of gains for a single channel is shown in the bottom right portion of the diagram. At the top of the plot are the contributions from low frequencies: since a location on the BM responds linearly to frequencies an octave or more below its CF, the low frequencies contribute to the channel with a gain of 1. In the middle of the plot, frequencies around the channel's CF are amplified (how much amplification is used will be described shortly). At the bottom of the plot, frequencies higher than the channel CF make no contribution to that location on the BM. Once the contributions of all the input frequencies are added together, the result is a channel.

The particular gains applied to the input frequencies are not fixed, however. The BM is known to respond compressively to sounds between about 30 and 90 dB SPL, and this must be taken into account in the model. How this is done is based on the Cai and Geisler (1996) model previously mentioned.
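The channel-forming step above, with unity gain below the CF, a boost at the CF, and nothing above it, amounts to a weighted sum of the band signals. The toy sketch below uses a fixed (level-independent) gain pattern to make the weighting visible; `channel_gains`, `form_channel`, and the peak-gain value are hypothetical stand-ins, and the level-dependent gain selection the model actually calls for is described next.

```python
import numpy as np

def channel_gains(n_bands, cf_band, peak_gain=30.0):
    """Fixed gain pattern for one channel (real model re-selects these with level)."""
    g = np.zeros(n_bands)
    g[:cf_band] = 1.0        # low frequencies pass with unity gain (linear low-side response)
    g[cf_band] = peak_gain   # active amplification at the channel's CF
    # bands above the CF contribute nothing to this BM location (stay 0)
    return g

def form_channel(bands, cf_band, peak_gain=30.0):
    """Weighted sum of band signals -> one channel (BM motion at one place)."""
    g = channel_gains(bands.shape[0], cf_band, peak_gain)
    return g @ bands         # sum_k g[k] * bands[k], sample by sample

# Toy input: 4 "bands" of constant ones makes the weighting obvious.
bands = np.ones((4, 5))
ch = form_channel(bands, cf_band=2, peak_gain=10.0)
print(ch)   # each sample = 1 + 1 + 10 + 0 = 12
```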
The model simulates the active amplification of BM motion by the outer hair cells (OHCs). The signal corresponding to the BM motion at the location being examined (i.e., the channel, in this model) is passed through a nonlinear transfer function that represents the response of the OHCs. This transfer function between the BM motion and the "OHC output affecting the gain" is sigmoidal and asymmetric, as shown in the middle left of the diagram. The output of the OHC transfer function is then put through a lowpass filter, to simulate the fact that high-Q filters cannot respond instantaneously to changes in gain. Finally, the output of this lowpass filter is used to select a set of gains for the input frequencies contributing to the channel. The particular set of gains thus depends on the level of the channel; it is in this manner that the compression of the BM is taken into account.

The set of gains also changes shape somewhat with level, producing the effect known as the "upward spread of masking"; the change of the gain curves with level can be seen in the plot at the bottom of the diagram, where at higher levels the curves amplify more and more of the low frequencies. The exact curves to use for the gains are the responses of the BM to off-CF tones, which are still being determined experimentally. It should also be noted that the feedback system between the channels and the gains of the input frequencies could vary considerably, as long as the appropriate set of gains is selected for a given channel level. Because this model combines different frequencies nonlinearly to form each channel, nonlinear BM phenomena such as suppression should be accurately simulated at the outputs.

D.4 Simulating Hearing Loss

The model described thus far is intended to simulate a normal ear.
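The feedback path described in the preceding section (channel, then asymmetric OHC sigmoid, then lowpass filter, then gain selection), together with the linearization this section proposes, can be sketched numerically as follows. This is a minimal illustration under assumed shapes and constants, not the Cai and Geisler (1996) implementation: `ohc_transfer`, `on_cf_gain`, the `ohc_strength` knob, and every constant in them are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def ohc_transfer(v):
    """Asymmetric, saturating sigmoid standing in for the OHC response."""
    pos = 1.2 * np.tanh(2.5 * np.maximum(v, 0.0))   # positive half saturates near +1.2
    neg = 0.6 * np.tanh(2.5 * np.minimum(v, 0.0))   # negative half near -0.6 (asymmetry)
    return pos + neg

def on_cf_gain(channel, fs, ohc_strength=1.0, max_gain=30.0, cutoff=50.0):
    """Slowly varying on-CF gain selected by the smoothed OHC output.

    ohc_strength=1.0 models the normal, compressive ear; 0.0 removes the
    active gain entirely (the linearization used to simulate hearing loss).
    """
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    drive = sosfilt(sos, np.abs(ohc_transfer(channel)))   # gain cannot change instantly
    return 1.0 + ohc_strength * (max_gain - 1.0) / (1.0 + 5.0 * drive)

fs = 16000
t = np.arange(fs // 4) / fs
soft = 0.05 * np.sin(2 * np.pi * 1000 * t)
loud = 2.00 * np.sin(2 * np.pi * 1000 * t)
g_soft = on_cf_gain(soft, fs).mean()                  # large gain for a weak input
g_loud = on_cf_gain(loud, fs).mean()                  # reduced gain for a strong input
g_impaired = on_cf_gain(loud, fs, ohc_strength=0.0)   # no active gain at all
```

With `ohc_strength=0.0` the gain is 1 everywhere: less on-CF amplification, a lower channel level, and no compressive variation of gain with level, which is the qualitative behavior this section attributes to the impaired ear.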
To alter the model to produce the responses corresponding to an impaired ear, the only parameter that needs changing is the outer hair cell transfer function. Since the compressive action of the outer hair cells drives the BM nonlinearities, this compression simply must be removed to simulate hearing loss. Making the outer hair cell transfer function more linear causes the on-CF frequencies to be amplified less, thereby effectively including more of the surrounding frequencies and producing frequency smearing. The level of the channel is also decreased, because less gain is applied; this simulates the absolute threshold elevation found in impaired ears. The effect of loudness recruitment likewise follows from this reduction in the compressive action of the outer hair cells.

D.5 Possible Sources of Error

In their paper, Cai and Geisler (1996) commented that the cutoff frequency of the lowpass filter in their model needed to be adjusted for different input frequencies. It may be the case in this model as well that a fixed lowpass cutoff for a given channel will not produce correct gains for all possible sets of input frequencies. Also, the filtering at the input of this model was somewhat arbitrarily determined and may not be completely accurate. Finally, it may prove difficult to find the correct sets of gain curves for the various channel levels.

[Diagram: the original signal is split into channels 1 through N by bandpass filters; each channel feeds a Compression stage and LPF whose output selects a variable gain, while low frequencies pass with gain 1. The compressive input-output curve of a normal-hearing listener is contrasted with the more linear curve of an impaired-hearing listener, and the gain pattern for a channel is 1 at low frequencies, peaks at the channel's center frequency, and is zero at high frequencies.]
Figure D-1: Diagram of new hearing loss model, showing the gain of different frequencies contributing towards a channel.

References

Baer, Thomas, and Moore, Brian C. J. 1993. Effects of spectral smearing on the intelligibility of sentences in noise. J. Acoust. Soc. Am., 94(3), 1229-1241.

Baer, Thomas, and Moore, Brian C. J. 1994. Effects of spectral smearing on the intelligibility of sentences in the presence of interfering speech. J. Acoust. Soc. Am., 95(4), 2277-2280.

Braida, Louis D. 1991. Crossmodal integration in the identification of consonant segments. The Quarterly Journal of Experimental Psychology, 43A(3), 647-677.

Farrar, Catherine L., Reed, Charlotte M., Ito, Yoshiko, Durlach, Nathaniel I., Delhorne, Lorraine A., Zurek, Patrick M., and Braida, Louis D. 1987. Spectral-shape discrimination. I. Results from normal-hearing listeners for stationary broadband noises. J. Acoust. Soc. Am., 81(4), 1085-1092.

Graf, Issac John. 1997 (Sept.). Simulation of the effects of sensorineural hearing loss. Master's thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science.

Hou, Zezhang, and Pavlovic, Chaslav V. 1994. Effects of temporal smearing on temporal resolution, frequency selectivity, and speech intelligibility. J. Acoust. Soc. Am., 96(3), 1325-1340.

Leek, M. R., Dorman, M. F., and Summerfield, Q. 1987. Minimum spectral contrast for vowel identification by normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am., 81, 148-154.

Moore, B. C. J., Wojtczak, M., and Vickers, D. A. 1996. Effect of loudness recruitment on the perception of amplitude modulation. J. Acoust. Soc. Am., 100, 481-489.

Moore, Brian C. J., and Glasberg, Brian R. 1993. Simulation of the effects of loudness recruitment and threshold elevation on the intelligibility of speech in quiet and in a background of speech. J. Acoust. Soc. Am., 94(4), 2050-2062.

Moore, Brian C. J., and Glasberg, Brian R. 1997.
A model of loudness perception applied to cochlear hearing loss. Auditory Neuroscience, 3, 289-311.

Moore, Brian C. J., Glasberg, Brian R., and Simpson, Andrew. 1992. Evaluation of a method of simulating reduced frequency selectivity. J. Acoust. Soc. Am., 91(6), 3402-3423.

Moore, Brian C. J., Glasberg, Brian R., and Vickers, Deborah A. 1995. Simulation of the effects of loudness recruitment on the intelligibility of speech in noise. British Journal of Audiology, 29, 131-143.

Nejime, Yoshito, and Moore, Brian C. J. 1997. Simulation of the effect of threshold elevation and loudness recruitment combined with reduced frequency selectivity on the intelligibility of speech in noise. J. Acoust. Soc. Am., 102(1), 603-615.

Oxenham, A. J., and Plack, C. J. 1997. A behavioral measure of basilar-membrane nonlinearity in listeners with normal and impaired hearing. J. Acoust. Soc. Am., 101(6), 3666-3675.

Cai, Y., and Geisler, C. D. 1996. Suppression in auditory-nerve fibers of cats using low-side suppressors. III. Model results. Hearing Res., 96, 126-140.

Patterson, R. D. 1974. Auditory filter shape. J. Acoust. Soc. Am., 55, 802-809.

Rosen, S., and Fourcin, A. 1986. Frequency selectivity and the perception of speech. In: Moore, B. C. J. (ed), Frequency selectivity in hearing. London: Academic.

Steinberg, J. C., and Gardner, M. B. 1937. The dependence of hearing impairment on sound intensity. J. Acoust. Soc. Am., 9(1), 11-23.

ter Keurs, Mariken, Festen, Joost M., and Plomp, Reinier. 1992. Effect of spectral envelope smearing on speech reception. I. J. Acoust. Soc. Am., 91(5), 2872-2880.

ter Keurs, Mariken, Festen, Joost M., and Plomp, Reinier. 1993. Effect of spectral envelope smearing on speech reception. II. J. Acoust. Soc. Am., 93(3), 1547-1552.