A ‘Goodness of Fit’ calculator

Mary Hostler, John Hostler, John Bamford, Helen Whitehouse. HCD group, University of Manchester.

Introduction

The use of DSP hearing aids means that a wide range of options and settings is available to the audiologist at fitting. These options include fast-acting, nonlinear wide dynamic range compression, multiple channels, frequency shaping in numerous bands, noise reduction algorithms, speech enhancement algorithms, feedback suppression, multiple memories and multiple microphones. One consequence of this flexibility is that at the verification stage of the fitting process DSP hearing aids can be matched more closely to prescribed targets (O’Donnell 2001). However, in order to achieve this, audiologists have to make subjective judgements about a ‘good’ match to targets, using real ear gain and sound pressure level measures, or coupler measures incorporating Real Ear to Coupler Differences (RECDs).

Guidance for making these judgements is given in ‘Good practice guidance for adult hearing aid fittings and services’ by Gatehouse et al (2000). It recommends that only tolerances within +/- 5 dB at 250, 500, 1000 and 2000 Hz and within +/- 8 dB at 3000 and 4000 Hz are acceptable. It also recommends that the slope in each octave should be within +/- 5 dB/octave of the target slope.

These ideas have been reflected in a few studies that have proposed objective scaled measures of ‘goodness of fit’ (GoF). Hall and Rowson (Hall, 1997) developed a measure based on initial proposals by Bamford (1997 unpublished). Their method calculated the difference in dB between the actual gain and the target values at four frequencies: 250, 500, 1000 and 2000 Hz. In order to account for slope, the difference value was doubled where there was a change in direction. This value was then subtracted from one hundred, giving a higher score for a closer fit.
Hall’s method makes only a crude adjustment for slope differences, and it penalises ‘under-fit’ equally to ‘overshoot’, although in terms of access to the Long Term Average Speech Spectrum (LTASS) the effects of these errors are very different. ‘Under-fit’ reduces access to softer speech sounds, whereas the consequences of ‘overshoot’ depend on dynamic range and ‘headroom’ in the hearing aid output limitation settings. Nevertheless, Hall’s GoF measure was used, though not further developed, by Dighe (2001) when investigating discrepancies in real ear measures of children fitted with hearing aids chosen either by traditional or theoretical fitting methods.

In the course of our work on the Modernising Children’s Hearing Aid Services (MCHAS) project we decided to try to develop an improved GoF measure, with the intention of:

- Providing a more objective measure of how well a hearing aid fitting meets targets (relying less on the subjective judgement of individuals) and thus facilitating hearing aid selection and verification procedures;
- Investigating the differences between proprietary fittings and generic (published) fitting targets;
- Interpreting variance in outcome and benefit measures.

We recognised, of course, that ‘goodness of fit’ is an essentially contestable concept, and one that can be defined in many ways. An essential part of our work was to refine the concept so that a sensible numerical score could be given.
We made a number of assumptions, including:

- A ‘good’ fit is one where the actual gain meets but does not greatly exceed the target gain at each frequency;
- The ‘match’ between actual and target gain is more crucial at some frequencies than at others;
- A ‘good’ fit would be expected to correlate well with other measures (speech discrimination, aided audibility measures etc.), although this will also depend on the validity of the targets prescribed by a particular fitting procedure;
- There would be a reasonable degree of agreement among experienced professional audiologists as to what constitutes ‘goodness of fit’.

Developing a GoF calculator

We decided at the outset that the GoF measure should be calculated by a spreadsheet program. We thought it would be ideal if the user could simply type in the figures for target gain and actual gain at various frequencies and have the spreadsheet calculate a GoF score automatically: this became the design brief that was ultimately achieved. We worked towards it by an iterative process. The earliest version simply tested the concept: it set up a spreadsheet in which target and actual gains were typed in and a score was generated using arbitrary weightings and measures. These measures and weightings were then refined and adjusted so as to produce a better match with the other measures referred to above. The current version of the calculator is Mark IV, and the arithmetic implemented in the spreadsheet is explained in Appendix 1.

As mentioned above, an important requirement for us was that scores generated by the calculator would correlate with the subjective judgements of experienced clinicians, which we assumed would themselves show a large measure of agreement. In order to test this hypothesis, and to establish a benchmark by which the calculated GoF scores could be verified, we used representative data from 30 of the subjects at one site participating in one of the MCHAS studies.
Hearing thresholds for each subject were provided by the participating site and we calculated the DSL target gain values for each of them. (DSL targets were used in this exercise, but the GoF calculator could be used to rate the fitting to any prescription target.) These target gain values were accompanied by the actual hearing aid gain measures that had been gathered by audiologists on site as data for the MCHAS study. GoF scores for the closeness of the 30 hearing aid fittings to the targets were also generated by the calculator.

The fittings and targets, displayed both graphically and in tabular form, were sent to 15 experienced paediatric audiologists in the UK, USA and Canada. These individuals were requested to score each example for ‘goodness of fit’ with respect to aided hearing for speech at normal conversational levels, using a five-point scale in which 1 = very good and 5 = poor. However, no guidance was given to them on how ‘goodness of fit’ was to be interpreted, since we wished to ascertain whether (as hypothesised) they would agree on this.

Ratings on the 1 to 5 scale for all 30 fittings were received from all the clinicians we contacted. Understandably, there was greater agreement over some fittings than over others, but the ratings varied by no more than one scale point for 40% of the fittings and by no more than two for 90%. One respondent’s ratings correlated with the average ratings for the whole group at only 0.81, but for all the rest the correlation was between 0.91 and 0.96 (mean = 0.94), supporting our hypothesis that there would be a high degree of unanimity among professionals. The GoF scores produced by the Mark IV calculator correlated with the average of the clinicians’ ratings for each fitting at 0.923.

GoF and AAI

As noted above, one of our assumptions in developing the GoF calculator was that the scores it produces would correlate with other measures as well.
One measure that particularly interested us was the Aided Audibility Index (AAI), for which scores were available in the MCHAS study data. These scores had been generated by a computer program called the Situational Hearing-Aid Response Profile (SHARP) developed by Stelmachowicz, Lewis, Karasek & Creutz (1994). The AAI in the SHARP program, like other Audibility Indices, is based on signal audibility above hearing threshold in a number of frequency bands, and utilises a frequency importance weighting which relates to the contribution made by each frequency band to speech recognition. The program calculates the AAI as a score between 0 and 1 (1 means that the signal is fully audible) and is designed to accommodate non-linear systems using gain information for a range of input levels across 13 different speech spectra.

We hypothesised that if the GoF scores from our calculator were really an objective measure of ‘goodness of fit’, there would be a strong correlation with the AAI scores for the same hearing aid fittings. We calculated the GoF and AAI scores for 97 analogue hearing aid fittings and 98 DSP nonlinear fittings and found that the correlation for the latter (with AAI scores for speech at 1 metre) was significant beyond the p = 0.01 level. Correlations between the AAI scores and the GoF scores for analogue hearing aids were not significant, which accords with conclusions reached by O’Donnell (2001), who found that DSP hearing aids are significantly better at meeting fitting targets than aids using analogue technology.

Future developments

The present version of the GoF calculator (Mark IV) has been in use for the past year and our colleagues are now planning significant improvements to it. These developments are intended to overcome some of the limitations that derive from the relatively simple algorithm that the calculator employs (Appendix 1).
The developments are planned with four main objectives in mind:

- To reflect the different prescriptive procedures that are used to generate the fitting targets. Different procedures have differing rationales, and the consequences of exceeding or under-achieving the target figure at a particular frequency therefore depend on the importance given to the target figure by the specific prescriptive procedure that is employed.
- To take better account of the degree of hearing loss of the hearing aid user. Clearly, the practical significance of exceeding or falling short of the target gain will depend on the user’s residual dynamic range: for a severe-profound hearing loss, for example, the consequences of falling short of the target will be more serious than for a mild loss.
- To reflect better the non-linearity of the latest DSP aids, which typically apply differential gain to different levels of signal input. Ideally the calculator should accept data for actual gain achieved in quiet, moderate and loud conditions, and should compute from these an overall ‘goodness of fit’.
- To incorporate the saturation response to an input of 90 dB or more. At present the calculator accepts data relating to gain (actual and target) alone: ideally it would also take account of output levels and apply a penalty when these reach or exceed the predicted ULLs.

Conclusion

In validation, the GoF calculator as it stands (Mark IV) generated scores that correlated strongly with the subjective judgements of experienced clinicians. It has a potential clinical use at the verification stage of the hearing aid fitting process, as a quick and simple means of judging the closeness of a hearing aid fit to targets. The calculator has proved easy to use.
With further refinement and validation the calculator could be utilised by clinicians and Teachers of the Deaf as a quick means of rating a hearing aid fitting, with possible further uses in research into different prescriptive targets and their relationship to hearing aid benefit and satisfaction.

References

Dighe, A. (2001) Discrepancies in real ear measurements between two groups of school children with sensori-neural hearing loss wearing hearing aids chosen by a traditional or theoretical fitting method. Unpublished MSc dissertation, University of Manchester.

Gatehouse, S., Stephens, S.D.G., Davis, A.C. and Bamford, J.M. (2001) Good practice guidance for adult hearing aid fittings and services. BAAS newsletter, issue 6.

Hall, R.L. (1997) The hearing aid outcome measures of self-perceived benefit, satisfaction and use and their relationship with goodness of fit of a hearing aid. Unpublished MSc dissertation, University of Manchester.

O’Donnell, J. (2001) Achieving DSL prescriptive targets with analogue and digital hearing aids. Unpublished MSc dissertation, University of Manchester.

Stelmachowicz, P., Lewis, D., Karasek, A. and Creutz, T. (1994) Situational Hearing Aid Response Profile (SHARP version 2.0). Omaha, Neb.: Boys Town National Research Hospital.

Appendix 1

HOW THE GOODNESS OF FIT CALCULATOR WORKS

Summary

The Mk IV calculator uses three measures to calculate ‘goodness of fit’. It allocates ‘penalty points’ for badness of fit in respect of three features of the fitting. It uses these points to calculate the three measures independently of each other, and then combines the measures to calculate a GoF score. The three features measured are:

A) ‘Close fit’, for which ‘badness’ is the difference between the actual gain (AG) and the target gain (TG) at each frequency.

B) ‘Similar shape’, for which ‘badness’ is the extent to which the shape of the AG curve differs from the shape of the TG curve.

C) ‘Adequate gain’, for which ‘badness’ is the difference between the total AG provided by the hearing aid and the total TG required.

A) How ‘close fit’ is measured

At each frequency, the difference between AG and TG is calculated. Differences of less than 2 dB are awarded zero penalty points (at each frequency). Differences that exceed the following limits are awarded the maximum of 4 penalty points (at each frequency):

Maximum penalties awarded when                           500 Hz   1 kHz   2 kHz   4 kHz
AG is less than TG by more than these limits
  (the aid is underfitted)                                15 dB   15 dB   20 dB   25 dB
AG is more than TG by more than these limits
  (the aid is overfitted)                                 20 dB   20 dB   25 dB   25 dB

Differences greater than 2 dB and less than the limits are awarded penalty points pro rata, up to a maximum of 4 points (at each frequency). The number of penalty points awarded is divided by the maximum (at each frequency), and the mean of those scores constitutes measure A.

B) How ‘similar shape’ is calculated

Two measures are used to calculate the similarity between the TG curve and the AG curve. These are the ‘gradient’ and the ‘slope’ between the gain figures at each adjacent pair of frequencies.

The gradient between each pair of figures is defined as the second minus the first. Thus if the TG at 1 kHz is 22 dB and at 2 kHz it is 30 dB, the gradient is +8 dB. To measure the similarity of gradients, the gradient for each pair of AG figures is subtracted from the gradient for the corresponding TG pair. If the difference between the gradients is 15 dB or more, two penalty points are awarded; if it is less, penalty points are awarded pro rata, up to a maximum of two (for each pair).

The slope between each pair of figures is defined as follows: ‘up’ if the gradient is positive and greater than +1; ‘down’ if the gradient is negative and less than -1; ‘flat’ if it is neither.
The similarity of slopes is measured as follows: if the slope between an adjacent pair of AG figures is the same as the slope between the corresponding TG pair (e.g. both gradients are positive and greater than +1), no penalty points are awarded; if the two slopes are different, one penalty point is awarded if either slope is ‘flat’ and two penalty points are awarded if one is ‘up’ and the other is ‘down’.

The total number of penalty points awarded for both gradient and slope is divided by the possible maximum: this constitutes measure B.

C) How ‘adequate gain’ is calculated

The total of the TG figures, for all four frequencies, is calculated. The total of the AG figures is also calculated. The difference between the two totals, divided by the TG total, constitutes measure C.

How the GoF score is calculated

The three measures, A, B and C, are weighted in the ratio 3:1:1, summed and then divided by 5. The result is subtracted from 1 in order to produce a GoF score in the range between 1 (best) and 0 (worst).
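
The arithmetic described in this appendix can be sketched in a few lines of Python. This is an illustrative reading of the text, not the Mark IV spreadsheet itself, and it rests on several assumptions that the appendix does not state explicitly: ‘pro rata’ for measure A is taken as a linear scale from 0 points at 2 dB to 4 points at the limit; the first row of limits (15, 15, 20, 25 dB) is taken to apply to underfitting and the second (20, 20, 25, 25 dB) to overfitting; measure C uses the absolute difference between the totals and is capped at 1; and the total target gain is assumed to be positive. All function and constant names are hypothetical.

```python
from statistics import mean

# Limits in dB at 500 Hz, 1 kHz, 2 kHz and 4 kHz.
# Assumption: first table row = underfit limits, second row = overfit limits.
UNDERFIT_LIMIT_DB = (15, 15, 20, 25)
OVERFIT_LIMIT_DB = (20, 20, 25, 25)
NO_PENALTY_DB = 2.0  # differences smaller than this score zero points


def close_fit(target, actual):
    """Measure A: mean of per-frequency penalty points, each divided by 4."""
    scores = []
    for tg, ag, under, over in zip(target, actual,
                                   UNDERFIT_LIMIT_DB, OVERFIT_LIMIT_DB):
        diff = ag - tg
        limit = under if diff < 0 else over
        # Assumed linear 'pro rata' scale: 0 points at 2 dB, 4 at the limit.
        points = 4.0 * (abs(diff) - NO_PENALTY_DB) / (limit - NO_PENALTY_DB)
        points = min(4.0, max(0.0, points))
        scores.append(points / 4.0)
    return mean(scores)


def slope(gradient):
    """Classify a gradient as 'up', 'down' or 'flat' (threshold of 1 dB)."""
    if gradient > 1:
        return "up"
    if gradient < -1:
        return "down"
    return "flat"


def similar_shape(target, actual):
    """Measure B: gradient and slope penalties over the possible maximum."""
    pairs = len(target) - 1
    total = 0.0
    for i in range(pairs):
        tg_grad = target[i + 1] - target[i]
        ag_grad = actual[i + 1] - actual[i]
        # Gradient penalty: 2 points at 15 dB or more, pro rata below that.
        total += min(2.0, 2.0 * abs(tg_grad - ag_grad) / 15.0)
        tg_slope, ag_slope = slope(tg_grad), slope(ag_grad)
        if tg_slope != ag_slope:
            # 1 point if either slope is flat, 2 if they are opposite.
            total += 1.0 if "flat" in (tg_slope, ag_slope) else 2.0
    return total / (4.0 * pairs)  # at most 2 + 2 points per pair


def adequate_gain(target, actual):
    """Measure C: difference between total AG and total TG, over total TG."""
    # Assumptions: absolute difference, capped at 1, positive total target.
    return min(1.0, abs(sum(target) - sum(actual)) / sum(target))


def gof_score(target, actual):
    """Combine A, B and C in the ratio 3:1:1; 1 is best and 0 is worst."""
    a = close_fit(target, actual)
    b = similar_shape(target, actual)
    c = adequate_gain(target, actual)
    return 1.0 - (3.0 * a + b + c) / 5.0


if __name__ == "__main__":
    # Made-up gains in dB at 500 Hz, 1 kHz, 2 kHz and 4 kHz.
    target_gain = [20, 22, 30, 35]
    actual_gain = [18, 22, 25, 20]
    print(round(gof_score(target_gain, actual_gain), 3))
```

In this sketch a fitting that matches its targets exactly scores 1, and the made-up example loses most of its score through the 15 dB shortfall at 4 kHz, which affects all three measures at once. A different reading of ‘pro rata’, or of which row of limits applies to underfitting, would change the intermediate values but not the overall structure.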