HACAN
Heathrow Association for the Control of Aircraft Noise
President: Professor Walter Holland CBE MD FRCP FFPHM
PO Box 339, Richmond, Surrey TW9 3RB
Tel: 0181 876 0455 Fax: 0181 878 0881

PROOF OF EVIDENCE
H. F. JONES
VALIDITY OF LEQ AS A PREDICTOR OF THE IMPACT OF AIRCRAFT NOISE ON PEOPLE
June 1997
HAC 62

Personal Details

I hold an MA in mathematics from the University of Cambridge and a PhD in Theoretical Physics from the University of London, and am currently a Senior Lecturer in the Physics Department of Imperial College. I have lived in Richmond since 1965 (when the number of air transport movements, or ATMs, was 180,000), and have been a committee member of HACAN (originally KACAN) since 1971. The occasion of my joining KACAN was when I took part in a guided tour of the Richmond Green area given by the Richmond Society, and found the commentary frequently inaudible because of aircraft noise. This was about the time when, following the lengthening of the northern runway, all westerly landings were assigned to the southern runway, giving a dramatic foretaste of life without alternation. It seemed to me that the efforts of the Richmond Society on the ground were being negated by this assault from the air. Since then I have been involved in both the Fourth Terminal Inquiry and the first Fifth Terminal Inquiry and have seen the number of ATMs rise to 427,000.

Summary

This paper is concerned with the validity of the noise index Leq as a predictor of subjective disturbance in the population affected by aircraft noise at Heathrow at the present time or in the future. Both Leq and its predecessor NNI are based on social surveys, of which the most recent was carried out in 1982, when the aircraft noise climate at Heathrow was very different from the present one.
I question the validity of the analysis of that survey, noting in particular the large uncertainties involved and the fact that the noise indices studied in depth in that analysis are not the same as that actually adopted when the changeover to Leq was implemented. As a consequence I cast doubt on the official assumption that 57 Leq represents the "onset of community disturbance". A wealth of subjective evidence suggests that the present Leq indexation is already at variance with the true disturbance. It seems that, contrary to the Leq model, sheer numbers of aircraft (greater than those experienced at any site in the 1982 study) cause severe annoyance, even when the noise of some individual aircraft is reduced. Without another survey to check and recalibrate it, extrapolation of the model further into the future, to predict the effect of a Fifth Terminal, is not credible.

Table of Contents

1. Introduction
2. History. NNI
3. Change to Leq
4. Critique of Leq as a Predictive Tool
   4.1 Doubling the Numbers
   4.2 Offsetting Noise vs. Number
   4.3 Spreading the Noise
   4.4 Concentrating the Noise
5. Leq/NNI as a Snapshot
6. Critique of DR 8402
   6.1 Multiple Regression Analysis
   6.2 Unequal Sample Sizes/Response Rates
   6.3 Extrapolation vs. Interpolation
   6.4 The Ratio k
   6.5 MRAs with 1-week Leq
   6.6 3-month Leq and NNI
7. Critique of DORA 9023
   7.1 16-hour vs. 24-hour Leq
   7.2 Benchmark for High Annoyance
   7.3 Benchmark for Onset of Community Disturbance
8. Leq as a Measure of Disturbance at the Present Time
   8.1 Beyond the 57 Leq Contour
   8.2 Night Flights
9. Conclusions
Definitions
References
Figures
Attachment 1

1. Introduction

1.1 The problem of (aircraft) noise is that it causes disturbance to people. Thus the aim of any noise index must be to provide a valid measure of the subjective annoyance experienced by the population affected.
To quote Adams and McManus (Noise and Noise Law, Wiley Chancery Law, p.55): "To be useful, there must be a good correlation between the parameter selected and the subjective response to the noise from the point of view of annoyance and noise intrusion". What is measured by the Department of Transport is the geographical distribution of the average noise energy, in the form of a particular variant of Leq, and the fundamental question is whether this correlates sufficiently strongly with annoyance that it can be used as a substitute for checking that annoyance directly. 1.2 In an ideal world such a check could be provided by conducting social surveys on a regular basis, say every two years or so. In that way one could measure the extent of aircraft noise disturbance based on the direct experience of the people affected and identify such correlations as exist between that disturbance and the numbers of aircraft and the average noise of each as these change over time. However, this has not been considered a practicable proposition because of the expense and effort that would be involved, although that could be considerably reduced by undertaking smaller surveys in representative areas. 1.3 Instead, the modus operandi at British airports has been to conduct infrequent large-scale surveys, to try and identify correlations at the time of those surveys between subjective annoyance on the one hand and the number of aircraft and their individual noise on the other. In between surveys the index thus devised, which is described in terms of the physical characteristics of the noise, is updated by measurements and calculations, and regarded as accurately reflecting the subjective disturbance. 1.4 Out of this methodology have arisen two large edifices for the assessment of aircraft noise disturbance, based in turn on NNI and Leq. 
Both incorporate an implicit trade-off between noise and number of aircraft - that is to say that the index at any location, and by inference the subjective annoyance, does not change if the number of aircraft increases, provided that the average noise of each decreases by a corresponding amount. Thus in Leq terms a doubling of numbers can be offset by a barely perceptible (footnote 1) 3dB decrease in average loudness. This is the reason for the shrinking of the Leq contours and so, it is implied, of community disturbance in recent years.

[Footnote 1: Ref. 1, para. 11]

1.5 The central point of this paper is that these edifices are built on very shaky foundations. The Leq system, in particular, is based on a social survey which was carried out in 1982, but is being applied today in vastly changed circumstances, and even being relied upon to predict future disturbance in 2016. My criticism of this system is two-fold. Firstly I expose some of the shortcomings of the analysis of the social survey of 1982 in DR 8402 and DORA 9023, which led to the current system. The second criticism is of a more general nature, namely that the index is being used to extrapolate in time, over rather a long time scale, from the conditions of 1982. As the length of the extrapolation increases, so inevitably will the uncertainties. My conclusion is that the noise contour system as currently applied by the Department of Transport cannot be relied on. Indeed there are many indications, from the analysis of letters to the Inquiry, from individual testimony to the Inquiry, from complaints data, from the distribution of HACAN membership etc., as detailed in another HACAN submission, that it considerably underestimates the present level of disturbance.

1.6 Assessing noise disturbance is by no means a simple matter, both from the point of view of the human response and that of the measurement of the characteristics of the noise environment.
The human response is both a physical and a psychological reaction to noise, which depends on many factors, including pitch, tone, intermittency, ambient noise levels, socio-economic class, etc. etc., and has a great deal of variability from one individual to another. The physical characteristics of the noise are also rather complicated, as reflected by the number of different measures which are used, such as dBA, dBD, PNdB, EPNdB, Leq of many different varieties, DNL, LAmax, LNP etc. (See Adams and McManus, Attachment 1). These all involve some form of averaging, with different weightings, over frequency and/or time - and their validity or appropriateness can change with time, for example as the mix of the aircraft fleet changes, both in size and in type of engines - from propellers to turbo-props, to jets, to fan-jets etc. 1.7 There is clearly some attraction in extracting from this complicated situation a single index, which at Heathrow has become (16-hour, 3-month) Leq, but its limitations should be clearly recognized, and it should not be elevated to the status of an icon. After all, contours of Leq are just that, contours of Leq. What are really needed are contours of disturbance. The Leq methodology can help draw a contour of disturbance at a particular time. If by interviewing representative samples of the affected population a strong correlation between levels of disturbance and Leq is established then it may be valid to draw a complete contour around an airport based on Leq calculations, thus avoiding the cost of a much larger interview programme. Again, changes in Leq may be able to act as a surrogate for changes in disturbance over short periods of time provided that flight numbers and average noise levels change by only modest amounts, but when these change significantly the index needs to be recalibrated by asking people about their subjective experience of current noise levels. Such a recalibration is long overdue at Heathrow. 
1.8 The relevance of this to the Fifth Terminal proposals is two-fold. Firstly, we believe, and will give evidence to show, that the Leq contours, as currently interpreted, greatly underestimate the extent of the disturbance from the airport at the present time. The population has been subject to a huge increase in the number of operations since the commitment to limit annual numbers to 275,000 was abandoned in 1985, and has therefore been deprived of the improvement in the noise climate which would have resulted from the introduction of quieter engines with a fixed number of operations. Instead, the Leq system has been used to claim, without further verification, that the reduction in the average noise per aircraft has more than offset the effect of increasing numbers. This is quite contrary to the experience of HACAN members, for whom the large increase in numbers, even with somewhat quieter aircraft, has resulted in more disturbance rather than less.

1.9 Secondly, the projections which are being made as far ahead as 2016 are on even more shaky ground. As has been detailed in HAC 1, we are firmly of the belief that the full utilization of T5 would require a very substantial increase in numbers. We do not believe the claim that the effect of such an increase would be offset by a reduction in the noise of individual aircraft, and look to a limit on numbers as the only sure way of improving the situation.

1.10 Such a limit could take the form of an imposed limit, as the government currently applies its powers in relation to Stansted, for example. Alternatively, and less satisfactorily, there could be the practical operational limit of the capacity of Heathrow’s runways with present restrictions on night flights maintained or improved, and the alternation system maintained. It is widely accepted that this limit is around 475,000.
In HACAN’s view, a Fifth Terminal designed to take an additional 30 million passengers per annum cannot possibly be accommodated within current operational practices.

2. The History of NNI (the Noise and Number Index)

2.1 The index NNI was established as a result of the report of the Wilson Committee(1) (1963).

    NNI = L + 15 log N - 80

where
    L = logarithmic average of peak noise levels (PNdB),
    N = number of aircraft per day
    (roughly, PNdB = dBA + 13)

[The average is taken over a 12-hour day (06.00 - 18.00 GMT) for the 3 months mid-June to mid-September, and aircraft with L below 80 PNdB are not counted.]

2.1.1 Insofar as it concerned aircraft noise, the report was based on a social survey(2) (September 1961), with a sample size of 1909, combined with measurements of noise exposure. The NNI index was put forward as the best correlation between annoyance and exposure to aircraft noise at that time.

2.1.2 The following correspondences were subsequently adopted, although their basis is not entirely clear:

    35 NNI   "Onset of Community Disturbance"
    45 NNI   "Moderate Annoyance"
    55 NNI   "High Annoyance"

2.1.3 The only such correspondence which was explicitly suggested in the Wilson report (p. 211) was that "exposure to aircraft noise reaches an unreasonable level in the range 50 - 60 NNI." The further correspondences seem to be based on Figure 2 of the report, reproduced here as Fig. 1, plotting average annoyance rating against NNI. The rating "moderate" does seem to correspond roughly to 45 NNI, but it is interesting to note that the rating "little" corresponds to 32 NNI rather than 35 NNI. The figure of 32 NNI is also picked out in Ref. 6 (para 18) as a level below which "very few people find noise to be a major disamenity".

2.2 A critique was given in a KACAN paper(3) in 1969.

2.2.1 The main points made were that
(a) The NNI might well have to be reviewed whenever the situation changed qualitatively, for example by the change in the nature of the fleet.
(b) The use of log N rather than N had not been well established. This has enormous implications when the NNI is extrapolated to larger numbers of aircraft.
(c) The duration of the noise was not taken into account.
(d) It is rather unlikely that a single variable can adequately correlate with the whole spectrum of noise exposure and annoyance.

2.2.2 To amplify this latter point it is worth reproducing a plot (Fig. 2) of selected NNI levels against L and N, produced by P. Davies, former general secretary of FHANG. From this figure it can be seen that the same level of 55 NNI can be produced by such widely differing scenarios (all per NNI day) as:
    400 flights at 96 PNdB,
    150 flights at 102 PNdB (the night-time take-off limit),
    46 flights at 110 PNdB (the day-time take-off limit), or
    1 Concorde at take-off (130 PNdB)

(e) In terms of subjective annoyance NNI attempts to represent an average, but there is a wide variation in people's sensitivity to noise.
(f) The Wilson Committee emphasized the tentative nature of its conclusions, which, however, were subsequently treated as definitive.

2.3 The index was monitored in the Second Survey of Aircraft Noise Annoyance around Heathrow(4), carried out in September 1967 with a sample size of 4699 adults and published in 1971.

2.3.1 The authors claimed to find no increase in annoyance since the earlier survey, in spite of the increase in aircraft numbers, and speculated whether this was evidence of acclimatization among the population affected.

2.3.2 Using multiple regression analysis to correlate various combinations of L and N with the annoyance scale N/1 they suggested that the degree of correlation was very insensitive to the precise value of the coefficient K in the combination L + K log(N+1). But in fact there was a marginally better correlation using N itself, in the form L + 0.1(N+1)-70.
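The scenarios listed in §2.2.2 can be checked against the NNI formula quoted in §2.1. The following is a minimal sketch, assuming that every flight in a scenario has the same peak level, so that the logarithmic average L is simply that level; the function name `nni` and the scenario labels are illustrative only.

```python
import math

def nni(peak_pndb: float, flights_per_day: float) -> float:
    """NNI = L + 15 log10(N) - 80, as quoted in section 2.1.

    Assumption: all flights share one peak level, so L equals that level.
    """
    return peak_pndb + 15.0 * math.log10(flights_per_day) - 80.0

# Scenarios from section 2.2.2: (label, peak level in PNdB, flights per NNI day)
scenarios = [
    ("400 flights at 96 PNdB", 96.0, 400),
    ("150 flights at 102 PNdB", 102.0, 150),
    ("46 flights at 110 PNdB", 110.0, 46),
    ("1 flight at 130 PNdB", 130.0, 1),
]

results = {label: nni(level, n) for label, level, n in scenarios}
for label, value in results.items():
    print(f"{label}: {value:.1f} NNI")
```

Under the formula as quoted, the first three scenarios come out within half a unit of 55 NNI, while a single 130 PNdB event evaluates to 50 NNI.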
2.4 As was pointed out in the KACAN response(5), the evidence for acclimatization was far from conclusive: there had been significant population shifts in the intervening period, and also the alternation system had been introduced by 1967. Alternation, or half-day noise relief, is a measure for alleviating noise disturbance which is very important to HACAN's members, and we will return to this point later on (§ 4.4).

2.5 The status and validity of NNI was reviewed in a DORA paper(6) in 1981 which broadly endorsed the index, though leaving the way open for future reviews in the light of changing circumstances, in particular the trend to a larger number of somewhat quieter aircraft. International comparisons were made in an Annex, from which it is to be noted that the indices used in Germany and The Netherlands were broadly similar to NNI, while those used in several other countries, in particular the USA, were basically of the Leq type.

3. The Change to Leq

3.1 In 1985 a CAA paper, DR Report 8402, was published(7) which recommended a change from NNI to Leq as a physical index which correlated better with subjective annoyance. The paper was based on noise measurements together with a social survey carried out in 1982 in which 2097 people were interviewed, at the three London airports Heathrow, Gatwick and Luton, and also at Manchester and Aberdeen.

3.1.1 The principal difference between NNI and Leq (Equivalent Continuous Sound Level) lies in the relative weighting of noise and number. Leq is defined as that continuous noise level (dBA) which, over a specified period of time, would have the same acoustic energy as the succession of discrete noise events. If all the events have the same duration and noise level this reduces to

    Leq = L + 10 log N + const. (dBA).

3.1.2 Thus, compared with NNI, the weighting of log N is reduced from 15 to 10, which clearly has important implications, to be discussed below.
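The reduction described in §3.1.1 can be illustrated numerically. The sketch below assumes a deliberately crude model in which each noise event is a rectangular pulse of constant level and fixed duration; the 20-second duration, the 80 dBA level and the count of 500 events are hypothetical values chosen only for illustration, and the function names are not taken from any report.

```python
import math

def leq(event_levels_dba, event_duration_s, period_s):
    """Energy-average level over the period for a list of discrete events.

    Assumption: each event is a rectangular pulse at a constant level.
    """
    energy = sum(10 ** (level / 10) for level in event_levels_dba) * event_duration_s
    return 10 * math.log10(energy / period_s)

def leq_identical(level_dba, n_events, event_duration_s, period_s):
    """Closed form for identical events: L + 10 log10(N) + const."""
    const = 10 * math.log10(event_duration_s / period_s)
    return level_dba + 10 * math.log10(n_events) + const

PERIOD_S = 16 * 3600   # a 16-hour day
DURATION_S = 20.0      # hypothetical event duration

general = leq([80.0] * 500, DURATION_S, PERIOD_S)
closed_form = leq_identical(80.0, 500, DURATION_S, PERIOD_S)
doubled = leq_identical(80.0, 1000, DURATION_S, PERIOD_S)

print(f"energy sum: {general:.2f} dBA, closed form: {closed_form:.2f} dBA")
print(f"effect of doubling the number of events: {doubled - closed_form:+.2f} dB")
```

In this model the constant depends only on the event duration and the averaging period, which is why, for identical events, the number of events enters only through the term 10 log N.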
3.2 After public consultation the paper was followed up(8) in 1990 by a further report, DORA Report 9023, which contained detailed proposals for changing from NNI to Leq. An important element of this report was the recommendation for setting the benchmarks in Leq corresponding to the "onset of community disturbance", "moderate disturbance" and "high disturbance" at 57 Leq, 63 Leq and 69 Leq respectively. As already mentioned, these had previously been set in NNI terms at 35 NNI, 45 NNI and 55 NNI respectively.

4. Critique of Leq as a Predictive Tool

4.1 Doubling the Numbers

4.1.1 Suppose that the numbers of aircraft doubled while the noise of each remained the same. Then the Leq would increase by approximately 3 ( = 10 log 2). But it will be noted that the increments between the different levels of subjective annoyance have been set at 6. So apparently the annoyance would only move half way to the next benchmark, say from 57 Leq to 60 Leq. In fact the claim is that the numbers would have to quadruple before the population became just moderately disturbed. Or again, given that the current number of ATMs at Heathrow (427,000) is roughly equal to the numbers at Gatwick, Stansted and Luton combined, according to the Leq model the population around Heathrow would hardly notice if all the latter flights were transferred to Heathrow. This seems so patently absurd that it calls into question the whole concept of Leq as a tool for quantifying changes in the response of the population over time.

4.1.2 In this regard NNI is equally implausible. Doubling the numbers would lead to an increase in NNI of 4.5, and quadrupling to an increase of 9, compared with the steps of 10 set between the benchmarks of "onset of disturbance", "moderate annoyance" and "high annoyance".

4.2 Offsetting Noise vs.
Numbers 4.2.1 The actual trend which has occurred over the last few decades at Heathrow, and is likely to continue for some time, is of increasing numbers of aircraft coupled with decreasing noise levels of individual aircraft, although, as detailed in HAC 63, the scope for further reductions in landing noise is severely limited. Leq does not change if, say, the number of planes is doubled but they are each 3dB quieter. This sort of trade-off explains how the noise contours have shrunk in recent years and are predicted to do so in general, though even then growing in some areas, in the future. But how much credence should we place in this shrinking when it goes against all the evidence of increasing public protest and is based on the premise that a barely perceptible change in perceived loudness can completely offset a doubling of numbers? 4.2.2 The trade-off in the case of NNI is of the same general nature, although the numbers are given somewhat more weighting. There a doubling of numbers could apparently be offset by a 4.5dB reduction in average loudness. 4.3 Spreading the Noise 4.3.1 One of the alleviative measures most valued by local residents is the alternation system, whereby aircraft land on one runway and take off on the other for half of the day (07.00 - 15.00), the roles of the two runways being reversed for the other half of the day (15.00 - 23.00), and the overall pattern being rotated on a weekly basis. The result for the majority of residents is that they suffer aircraft noise for half of the day but have a period of respite for the other half. In the alternative scenario of mixed mode, which would marginally increase runway capacity, the same number of aircraft would be spread continuously throughout the day, with a longer gap between aircraft but no respite. It is absolutely unequivocal which of these scenarios the residents prefer (See HAC 60), yet there would be no difference in Leq. 
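The arithmetic behind §4.1 and §4.2 can be set out as a short sketch. It assumes the simplified forms of the two indices, with a weighting of 10 on log N for Leq and 15 for NNI, and treats the change in average loudness as a single number in dB; the function names are illustrative only.

```python
import math

def leq_change(number_ratio: float, level_change_db: float = 0.0) -> float:
    """Change in Leq when numbers are multiplied by number_ratio and the
    average level changes by level_change_db (weighting of 10 on log N)."""
    return 10 * math.log10(number_ratio) + level_change_db

def nni_change(number_ratio: float, level_change_db: float = 0.0) -> float:
    """Same change expressed in NNI (weighting of 15 on log N)."""
    return 15 * math.log10(number_ratio) + level_change_db

LEQ_BENCHMARK_STEP = 6    # 57, 63, 69 Leq
NNI_BENCHMARK_STEP = 10   # 35, 45, 55 NNI

print(f"doubling:    Leq {leq_change(2):+.1f}, NNI {nni_change(2):+.1f}")
print(f"quadrupling: Leq {leq_change(4):+.1f}, NNI {nni_change(4):+.1f}")
print(f"doubling with aircraft 3 dB quieter:   Leq {leq_change(2, -3.0):+.2f}")
print(f"doubling with aircraft 4.5 dB quieter: NNI {nni_change(2, -4.5):+.2f}")
```

Neither function takes the time of day of the flights as an input, which is the point made in §4.3.1: within the averaging period the index is the same whether a given number of flights is concentrated in half the day or spread evenly across it.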
4.4 Concentrating the Noise

4.4.1 Suppose that the Government decided that runway 27L would always be used for landing, and runway 27R always for take-off, as indeed was the threat when a third parallel runway was considered at Heathrow. That means that for half the population the numbers would be roughly doubled, whereas some would have the numbers greatly reduced. Thus at a stroke roughly half the population would be removed from the 57 Leq contour, so that one could claim that "the number of people affected by aircraft noise" had been drastically reduced. But of course the population still affected would be much worse off, with many more inside the higher contours. The shape of the whole Leq "mountain" is important, not just the headline 57 Leq contour.

5. Leq/NNI as a Snapshot

5.1 How can we reconcile the evident failure of NNI or Leq to tally with people's reaction to a changing situation with the weighty surveys on which they were based? To answer this question we need to look at the methodology of those surveys. To be specific let us just refer to the ANIS survey described in Ref. 7. A sample population was chosen at 26 sites, selected to have a wide variation in numbers N and loudness L. Then the subjective response of the sample was compared with the physical noise data N and L, and a search was made for a single variable of the form L + k log N which would have the best correlation with the subjective response. The claim was that a particular form of Leq, which corresponds to k=10 in the above expression (footnote 2), was well correlated with the subjective response, better than NNI, which corresponds to k=15. This was the basis for the change from NNI to Leq.

5.2 However, it is very important to realize that what is being undertaken is a snapshot of the situation at a particular time, and indeed over a limited range of the variables L and N.
Thus, if we accept the results of the study it would be reasonable to use it by interpolation to estimate the subjective response of people other than those sampled, with intermediate values of N and L, at that time. However, what is on much more shaky ground is extrapolation to a future situation in which the typical values of N or L lie outside the range covered in the study. As far as Heathrow is concerned the present numbers of aircraft indeed go beyond those measured in the survey, as detailed in the following section. 5.3 A graphical illustration of the pitfalls of extrapolation is given in Fig. 3. This is a hypothetical example, but one not without relevance to noise indices such as NNI or Leq. Two completely different functions of x are plotted, one logarithmic, one linear. Nonetheless over a limited range of x, say from 200 to 500, they agree fairly well, and if one is a good fit to some data within that range, the other will also be a reasonable fit. However, when the two fits are extended (extrapolated) beyond that range, they differ markedly. Yet we can not with any confidence prefer one curve over the other, or indeed any other curve which would give a reasonable fit to the data within the limited range 200-500. 5.4 The point that any noise measure has a limited range of validity in time was explicitly acknowledged in the introduction to Ref. 4, one of whose objectives was "to investigate whether the findings of the 1961 survey remain valid in 1967" and in paragraph 19 of Ref. 6, which states "There is now a considerable amount of experience of the usefulness and validity of the NNI for immediate control and short term development, but less certainty about its use for those long term planning purposes where some new circumstances need to be envisaged. ... There is therefore an argument in favour of testing the Index to ensure that it can continue to be representative of annoyance in these changing conditions." 
It was implicitly acknowledged by the setting up of the ANIS study, to re-evaluate NNI in the light of changing circumstances.

6. Critique of DR 8402

6.0.1 The paper DR 8402 (Ref. 7) was the basis for the changeover from NNI to Leq, which was claimed to have a better correlation with subjective disturbance. The methodology was to perform multiple regression analyses to see how the various measures of subjective annoyance were correlated with the noise data at the time of the ANIS study (1982).

[Footnote 2: But see §6.6]

6.1 Multiple Regression Analysis

6.1.1 Multiple regression analysis (MRA) involves finding that linear combination of independent variables which best explains the variation in a given dependent variable. In the present case the independent variables are for the most part noise data, and the dependent variable is some measure of subjective annoyance. The subjective measures primarily used in DR 8402 were AVOGAS, the average annoyance rating on the (old) Guttman scale, ARCBOTH, the percentage of the sample population considering aircraft noise to be the most bothersome noise, VMANN, the percentage very much annoyed by noise in general and ARCNA, the percentage finding the levels of aircraft noise not acceptable. The noise measures included average daily numbers of aircraft above a certain noise threshold, average peak noise levels, again above various thresholds, NNI, and various versions of Leq. The latter were averaged over various periods (three months, 1 week, 24 hours and 16 hours, and also over various modes of operation of the airport). It was found that some of the correlations were significantly improved by including WORKAP, the percentage of the sample population whose work was in some way connected with the airport, among the independent variables.
6.1.2 As a gauge of how good a fit is, the multiple correlation coefficient, R, has a rather simple interpretation, namely that 100×R²% of the variation of the dependent variable (in this case AVOGAS, ARCBOTH etc.) is explained by the fit. Thus, for example, R=0.9 (R²=0.81) means that 81% of the variation is explained by the fit. However, it is clear that this percentage can always be increased by bringing in more independent variables, so the number of independent variables also affects the significance of the fit. This can be partly taken into account by using an 'adjusted' value of R, denoted by Ra, but strictly speaking the correct test involves a statistic F derived from R, whose significance can then be read off from published tables.

6.2 Unequal Sample Size/Response Rates

6.2.1 A general reservation which applies to all of the MRAs performed in DR 8402 is that the dependent variables AVOGAS etc. are treated as single, exact numbers, whereas they are in fact averages over a sample population, each with an individual error, or variance. In such a situation the correct procedure is known as maximum likelihood analysis, which places less emphasis on those data which have a large error and more on those which are better determined. The quantity to be minimized in this case is chi squared (χ²) rather than 1-R². If all the samples were of the same size and all the response rates the same this more general analysis would reduce to MRA. However, this is not the case - the sample sizes vary from 66 to 101 (Table C2) and the response rates from 55% to 78.3% (Table 5.1), so the results obtained by MRA in DR 8402 are indeed subject to the above criticism. That is, the fits taking into account the sampling errors on AVOGAS etc. would actually differ from those obtained by the simpler analysis.
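The distinction drawn in §6.2.1 can be illustrated with a straight-line fit in one independent variable. The sketch below is not the analysis of DR 8402: it minimises chi squared for a line y = a + b x, with each point weighted by the inverse square of its own error, and all the data values and errors are hypothetical numbers invented purely for illustration.

```python
def weighted_line_fit(x, y, sigma):
    """Return (intercept, slope) minimising
    chi^2 = sum(((y_i - a - b * x_i) / sigma_i) ** 2)."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    slope = (sw * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return intercept, slope

# Hypothetical site data: noise index, mean annoyance score, error on the mean.
index = [50.0, 55.0, 60.0, 65.0, 70.0]
annoyance = [1.0, 1.9, 2.4, 3.6, 3.9]
equal_errors = [0.2] * 5                      # same sample size everywhere
unequal_errors = [0.1, 0.4, 0.1, 0.4, 0.1]    # some sites poorly determined

simple_fit = weighted_line_fit(index, annoyance, equal_errors)
weighted_fit = weighted_line_fit(index, annoyance, unequal_errors)

print(f"equal errors:   slope {simple_fit[1]:.4f}")
print(f"unequal errors: slope {weighted_fit[1]:.4f}")
```

With equal errors the weights cancel and the result is the ordinary least-squares line; with unequal errors the poorly determined points pull less on the fit, so the fitted slope changes.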
6.2.2 Moreover, in the presence of sampling errors, even if these are all equal so that the fits are the same, the F test of the simple MRA overestimates the significance of the fits. That is because the simple MRA is just a fit to the central values of AVOGAS etc., ignoring the fact that these central values have an uncertainty.

6.3 Extrapolation vs. Interpolation

6.3.1 Again, a legitimate use of such analyses is to interpolate within the range of noise and numbers covered by the survey, but extrapolation beyond that range is a much more dangerous activity. In para. 8.30 of DR 8402 it was claimed that the data set, which was designed to include areas of high numbers/low noise, was appropriate for future conditions. However, the authors clearly did not anticipate the scale of the growth in numbers which has occurred since then, which means that in many areas present flight numbers now exceed the maximum which occurred in 1982.

6.3.2 At that time the number of air transport movements (ATMs) was about 250,000, and the numbers were subject to a limit of 275,000, but this was abandoned in 1985 following the first Terminal 5 Inquiry partly on the grounds that the airport had essentially reached saturation point as far as runway capacity was concerned. Far from this being the case, the latest figures (January 1997) show a throughput of 427,000 ATMs.

6.3.3 In 1982 the sample area affected by the largest number of aircraft movements was East Sheen, which suffered from a daily (24-hour) average of 319 movements in worst mode (westerly landing on either runway). Today the corresponding figure would be 420,000/365/2 = 575 movements, an 80% increase. Even those areas affected by only a single runway are now subjected to 288 flights per day, close to the maximum in 1982. Thus the extension to today's situation of fits set up on the basis of the 1982 measurements does indeed involve a considerable amount of extrapolation.
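The figures in §6.3.3 follow from simple division, sketched below. The round annual total of 420,000 and the 1982 figure of 319 are taken from the text; reading the single-runway figure of 288 as a further halving between the two runways is an assumption made here, and the variable names are illustrative.

```python
ANNUAL_MOVEMENTS = 420_000   # round figure used in section 6.3.3
EAST_SHEEN_1982 = 319        # daily movements, worst mode, 1982 survey

per_day = ANNUAL_MOVEMENTS / 365
landings_per_day = per_day / 2               # half of all movements are landings
single_runway_per_day = landings_per_day / 2  # assumption: split evenly between two runways
increase = landings_per_day / EAST_SHEEN_1982 - 1

print(f"landings per day, both runways: {landings_per_day:.0f}")
print(f"landings per day, one runway:   {single_runway_per_day:.0f}")
print(f"increase over 1982 maximum:     {increase:.0%}")
```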
6.4 The Ratio k

6.4.1 Coming now to the actual MRAs performed in DR 8402, the basis for the preference of Leq over NNI was an analysis which examined the correlation of AVOGAS (the average annoyance level on the old Guttman scale) with noise, in the form of average peak noise level L over various periods (3-month, 1-week, 24-hour day, 16-hour day) and with various thresholds (80, 75, 70 dBA), and the corresponding numbers N of aircraft (actually the logarithms of those numbers). The aim of the analysis was to find the ratio k of the coefficient of log N to that of L in the linear fit. The result from MRA1 was k=6.4 in a fit whose overall R² was 0.6667. This seems to show that the coefficient k=15 which occurs in the NNI combination is not validated, but it also casts doubt on the validity of the 3-month Leq which was used in this analysis, for which the nominal k would be 10 if all the aircraft noise events were identical. However, as discussed later, over the actual aircraft mix at the time the coefficient is rather larger, of the order of 13. Moreover, given the value of R, the fit is not a very good one, explaining only some 67% of the variation in AVOGAS (64% if Ra² is used).

6.5 MRAs with 1-week Leq

6.5.1 The best fits (MR7) to the annoyance data are obtained by using 1-week 24-hour Leq (i.e. the 24-hour Leq in the week immediately preceding the interview) as the independent variable, taking into account the percentage of the sample (WORKAP) whose employment was connected with the airport. For some of the dependent variables (AVOGAS and ARCBOTH) a jump at 57 Leq was also introduced.

6.5.2 For example MR7B (VMANN vs. W1LQ24 and WORKAP), which is shown as a graph of the adjusted VMANN against W1LQ24 in Fig. 9.4, reproduced here as Fig. 4, gives R² = 0.8402 (Ra² = 0.8283), and thus explains 84% of the variation.
However, it is very important to note that in the production of noise contours the three-month version of Leq (M3LQ) has to be used for obvious practical reasons, so this correlation is not as useful as it might seem.

6.6 3-month Leq and NNI

6.6.1 If one compares NNI with M3LQ24 (3-month, 24-hour Leq), the two are very highly correlated. This is shown in Fig. 9.1 of DR 8402, reproduced here as Fig. 5. The correlation coefficient r is not quoted, but is in fact 0.98647 (r² = 0.97), which means that the correlation is very good indeed.³

6.6.2 Another way of looking at this is an analysis, not performed in the paper, of M3LQ versus M3L80 and LM3N80, namely the 3-month average of peak noise levels with a threshold of 80 dBA and the logarithm of the corresponding numbers. A similar analysis can be performed for thresholds of 75 and 70 dBA. The correlation is actually better than any of the multiple correlations presented in the paper, with R² = 0.979, but the relative coefficient of log N to L is 12.9 rather than 10. As mentioned above, the ratio 10 is what one would obtain if all the aircraft noise events were identical, but this analysis shows that over the spread of aircraft types at the time of the survey the coefficient was approximately 13, quite close to the 15 of NNI. This explains why, over that data sample, the two are not so different. However, it is not at all clear that this relation will stay the same as time progresses and the aircraft mix changes, and we see again the possible dangers of extrapolation.

7. Critique of DORA 9023

7.1 16-hour Leq vs. 24-hour Leq

7.1.1 In DORA 9023, it was decided to use 16-hour Leq, i.e. Leq averaged over the period 0700-2300 local time, rather than 24-hour Leq, for noise classification purposes. It is therefore necessary to establish the relationship between 16-hour and 24-hour Leq, shown in Fig. 6, and between NNI and 16-hour Leq, shown in Fig. 7.
To a good approximation M3LQ16 = M3LQ24 + 1.3 dBA under the conditions prevailing in 1982.

7.1.2 The reason for the preference for 16-hour Leq was the perception, with which I would concur, that night-time disturbance is a separate problem from day-time disturbance. Moreover, the averaging process implicit in Leq is likely to be much less appropriate for night-time disturbance because, in the middle of the night at least, the disturbance is produced by well-separated individual noise events.

³ The linear relationship is of the approximate form NNI = 1.36 M3LQ24 - 41.

7.1.3 However, the choice of 16-hour Leq has the inevitable result that any change in the pattern of night-time disturbance, such as the recent reduction of the night quota period, which caused a huge community reaction, is not reflected at all in the published (day-time) Leq contours. This point is discussed in more detail in §8.2.

7.2 Benchmark for High Annoyance

7.2.1 The benchmark for high annoyance was eventually set at 69 Leq (16 hours), which corresponds roughly to 67.4 Leq (24 hours). This seems not unreasonable, always remembering that the analysis refers to the situation in 1982: at this level of 24-hour Leq, aircraft noise was found to be very much annoying to about 2/3 of the population, "not acceptable" to about 3/4 and the "most bothersome noise" to 9/10 (para 9.16, p. 62 of DR 8402). It corresponds to 51 NNI according to the regression analysis of Fig. 7. By 1988, however, the relationship had changed (8), so that 69 Leq (16 hours) then corresponded to 53.5 NNI. In either case the NNI value is less than 55, so the upper benchmark seems to err on the safe side.

7.3 Benchmark for Onset of Community Disturbance

7.3.1 The benchmark for "Onset of Community Disturbance" was set at 57 Leq (16 hours), which would correspond to 55.5 Leq (24 hours), or 34.5 NNI.
Part of the basis for this decision, as discussed in para 2.4.2 of DORA 9023, was the set of Figures 1-5 of that paper, plotting various indicators of community disturbance against (3-month, 24-hour) Leq. It is worth examining Figures 1 and 2, reproduced here as Figs. 8 and 9, in more detail, since they correspond directly to a figure already given in DR 8402, Fig. 7.5 (here Fig. 10), which plots VMANN, the percentage of the population very much annoyed by aircraft noise, against three-month, 24-hour Leq, M3LQ24.

7.3.2 It is noteworthy that this latter figure, which incorporates error bars, gives no evidence of any threshold around 55 Leq. The stretched scale and the suggestive shading in Fig. 1, and the aggregation of points in Fig. 2, of DORA 9023 are artefacts which tend to give the misleading impression of such a threshold.

7.3.3 Two further points should be made in this connection. Firstly, these two figures make no allowance for WORKAP, the percentage of the population whose work is associated with the airport. This latter was shown in DR 8402 (in conjunction with 1-week Leq) to be an important "confounding factor", which tends to mask the true annoyance. A multiple regression analysis, not performed in DR 8402, of VMANN vs. M3LQ24 and WORKAP gives a fitted value of very nearly 20% at 55.5 Leq (24 hours), corresponding to 57 Leq (16 hours), for the adjusted percentage very much annoyed (Fig. 11).

7.3.4 Secondly, all such fits have a substantial margin of uncertainty, i.e. there is a large spread about the regression line. In this particular fit the standard deviation is 7.37. There is a 30% chance that the true value of VMANN differs from the regression equation by more than one standard deviation, and a 5% chance that it differs by two standard deviations, i.e. by about 15 percentage points. Thus it can be very misleading to draw a regression as a line if one forgets the broad swathe of uncertainty on either side.
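The size of the "swathe" in para. 7.3.4 can be illustrated with a minimal sketch. It assumes that the scatter about the regression line is normally distributed, which the text does not state, and it takes the fitted value at 55.5 Leq (24 hours) as exactly 20%, where the text says "very nearly 20%". The function names are illustrative only.

```python
import math

# Figures taken from paras. 7.3.3 and 7.3.4.
RESIDUAL_SD = 7.37     # standard deviation about the regression line (percentage points)
FITTED_VMANN = 20.0    # ASSUMPTION: "very nearly 20%" is taken as exactly 20


def two_sided_tail(k: float) -> float:
    """Probability that a normal variable lies more than k standard
    deviations from its mean, in either direction.
    ASSUMPTION: the residuals are normally distributed."""
    return math.erfc(k / math.sqrt(2.0))


def band(centre: float, sd: float, k: float) -> tuple:
    """Interval of +/- k standard deviations about the fitted value."""
    return (centre - k * sd, centre + k * sd)


for k in (1, 2):
    low, high = band(FITTED_VMANN, RESIDUAL_SD, k)
    print(f"+/-{k} sd: {low:.1f}% to {high:.1f}%, "
          f"chance of lying outside = {two_sided_tail(k):.1%}")
```

Under the normality assumption the tail probabilities come out at roughly 32% and 4.6%, in line with the rounded 30% and 5% quoted above, and the two-standard-deviation band around a fitted 20% runs from about 5% to about 35%.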
7.3.5 These points apply to each of the benchmarks, but are perhaps of particular importance when one is trying to establish a "threshold of annoyance".

7.3.6 The ultimate criteria for the choice of scales were derived from comparisons between average GAS scores, NNI and 3-month, 16-hour Leq in 1982, and also in unpublished Leq measurements made in 1988. The three options for translating between the old benchmarks in NNI and new benchmarks in 16-hour Leq, as presented in a table at the bottom of p. 29 of DORA 9023, were:

1) The best fit between NNI and Leq in 1982, without reference to the data on subjective annoyance.
2) The 1982 Leq values which corresponded to the same GAS scores which in the original social survey of 1967 occurred at 35, 45 and 55 NNI respectively.
3) The best fit between NNI and Leq in 1988, again without reference to the data on subjective annoyance.

7.3.7 Of these, the second seems the most reasonable. After all, it is subjective annoyance which we are trying to gauge. However, it is important to note that the correlation between AVOGAS and 3-month, 16-hour Leq is not particularly impressive, especially when the WORKAP factor is not taken into account (in DR 8402 the best correlation between AVOGAS and the noise metrics involved both WORKAP and a step at 57 Leq (1 week, 24 hours)). The correlation coefficient r is only 0.76 (r² = 0.57), and correspondingly the standard deviation is large, as can be seen visually from Fig. 12. Its actual value is 0.6. To put this number into context, the differences in AVOGAS between "low", "moderate" and "high" annoyance in 1967 were 0.82. Thus the error in this relation is of the order of the intervals between the different categories. For a similar reason the reduction in AVOGAS at 35 NNI between 1967 and 1982 is statistically insignificant.

7.3.8 It was ultimately decided to set the lower threshold at 57 Leq (3 months, 16 hours). From the point of view of AVOGAS this is not an unreasonable figure.
But, as we have stressed above, it is essentially impossible to locate the threshold, if indeed one exists, with any precision. The reasons, again, are that the correlation between AVOGAS and Leq is not very strong, the errors are large, and the confounding factor of WORKAP has not been taken into consideration. Moreover, we should never forget that the analysis was based on a social survey carried out in 1982, since when conditions at Heathrow have changed dramatically.

7.3.9 The measure VMANN discussed above shows no threshold at all. Another measure used extensively in DR 8402, but not used in DORA 9023, is ARCBOTH, the percentage of the population considering aircraft noise to be the most bothersome noise. When fitted against 3-month, 16-hour Leq, taking into account the WORKAP factor, this measure seems to exhibit a step at about 60 Leq, as shown in Fig. 13. However, the average adjusted percentage of the nine sample areas below this step is some 36%, so it hardly represents a threshold of disturbance. At 57 Leq the fitted value is 39%.

7.3.10 This section has unfortunately been rather technical, but it is important to distinguish between all the different variants of Leq which have been used, and to point out the large errors involved, bearing in mind that an error of just 3 dB in setting the threshold could "justify" a doubling of aircraft numbers! The authors of DORA 9023 themselves (p. 30) admit that "This kind of analysis has to be largely a matter of judgement: there are statistical and methodological uncertainties and the numbers are indicative rather than definitive". Nonetheless the figure of 57 Leq (3 months, 16 hours) has been elevated to the status of an icon, and the numbers inside the 57 Leq contour are confidently equated by government ministers, and in BAA's statement of case, to the numbers of people actually disturbed at any time, now or in the future.

8.
Leq as a Measure of Disturbance at the Present Time

8.0.1 I have cast doubt on the validity of Leq for making predictions about future levels of community disturbance from aircraft noise. In addition there are many reasons, apart from those mentioned in the previous section, for believing that the 57 Leq contour seriously underestimates the current level of disturbance.

8.1 It is clear a priori that the 57 Leq contour is only one aspect of the whole noise climate of the airport. As discussed above, it is very important to consider the populations inside the contours of higher levels. It is equally important not to neglect the lower levels. It is a myth to suppose that the disturbance ceases abruptly at 57 Leq: there is obviously not a sharp cutoff. In a separate proof of evidence (HAC 64) HACAN will show that there are large numbers of HACAN members, affiliated amenity societies, complainants and registered objectors to this Inquiry lying outside the 57 Leq contour.

8.2 The estimates for Leq are derived for a 16-hour day (07.00 - 23.00) during a three-month period in the summer. Thus Leq takes no account of night flights, which are known to be a huge cause of distress to residents and the most frequent cause of complaint, nor of the noise in the "shoulder hours", which has increased dramatically in recent years. To spell this out in a little more detail, the present night noise regime, which dates from 1993, specifies two periods: the "night period", from 23.00 - 07.00, when a take-off noise limit of 102 PNdB and various other restrictions apply, and the "night quota period", from 23.30 - 06.00, when the number of aircraft is restricted. The "shoulder hours" are the night-time periods either side of the quota period, namely 23.00 - 23.30 and 06.00 - 07.00.
The end of the period when a numerical limit applied during the winter was brought forward from 06.30 to 06.00, and between 1991 and 1996 the number of flights between 06.00 and 07.00 increased from 7,301 to 11,924 (letter from NATS to Dr. J. Cavalla). None of these changes is reflected in any way in the calculation of the 57 Leq contour.

9. Conclusions

9.1 The crucial question we have been addressing is the validity of Leq as an objective measure of subjective disturbance, and in particular whether the 57 Leq contour accurately delineates the boundary beyond which aircraft noise is not perceived to be a significant problem.

9.2 The first issue is whether Leq is well-correlated with subjective disturbance. In DR 8402 the best such correlations involved W1LQ24, the 1-week, 24-hour version of Leq, and took account of the WORKAP factor. But the usefulness of such a correlation is not at all clear, since for practical purposes one is forced to use the 3-month version of Leq.

9.3 This was indeed done in DORA 9023, and the picture was further complicated by the eventual decision to use 16-hour Leq, which did not feature at all in the multiple regression analyses carried out in DR 8402. Moreover, no account was taken of the WORKAP factor, which had proved so important in the earlier document. Thus there is an unfortunate mismatch between DR 8402, which finds good correlation using an impractical version of Leq, and DORA 9023, which uses M3LQ16, which has only a moderate correlation with subjective annoyance.

9.4 The second issue, the identification of 57 Leq (M3LQ16) with the onset of community disturbance, is even more fraught. In many subjective measures there is no clear threshold at all and, because the correlation is not particularly high, the errors are very large, and yet in terms of population, the difference between 57 Leq and, say, 54 Leq is considerable.
As quoted above, the authors of DORA 9023 acknowledged the limitations of this analysis, but, as with NNI, such reservations have tended to be ignored.

9.5 In any case, all of this discussion refers to a social survey conducted in 1982, when conditions at Heathrow were very different. From many other indicators, discussed in section 8, it seems clear that the 57 Leq contour does not now have the significance attributed to it, if indeed it ever had. What is clearly needed, following the precedents of 1967 and 1982, is another social survey to recalibrate the contours to present conditions. In the absence of such a recalibration the Inquiry cannot place any weight on Leq contours as currently interpreted.

Definitions

Decibel: $L = 10 \log_{10}\left[(p/p_{\mathrm{ref}})^2\right]$.

A-weighting: The pressures at different frequencies are weighted differently, to mimic the response of the human ear.

Typical sound levels (from Ref. 1):

65 dBA   Busy restaurant or canteen
69 dBA   Vacuum cleaner in home (at 10')
76 dBA   Inside compartment of suburban electric train
80 dBA   Ringing alarm at 2'
86 dBA   Printing press (medium size automatic)
92 dBA   Heavy diesel vehicle at 25'

The present take-off limits at Heathrow are 110 PNdB = 97 dBA during the day and 102 PNdB = 89 dBA at night.

Leq: $L_{\mathrm{eq}} = 10 \log_{10} \langle (p_A/p_{\mathrm{ref}})^2 \rangle$ over some time period
$= 10 \log_{10} \langle 10^{L_A/10} \rangle$
$= \mathrm{SEL} + 10 \log_{10} N - \mathrm{const}$.
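The last form of the Leq definition can be illustrated with a minimal sketch. It assumes the usual convention that SEL is referred to a duration of one second, so that the constant is 10 log10 of the averaging period in seconds; the document itself writes only "const". The function names and the example SEL of 90 dBA are hypothetical.

```python
import math

SECONDS_16_HOURS = 16 * 3600   # the 0700-2300 averaging period, in seconds


def leq_from_sels(sels_dba, period_s):
    """Leq from individual event SELs: sum the event energies, then
    average over the period.
    ASSUMPTION: SEL is referred to 1 second."""
    energy = sum(10 ** (sel / 10) for sel in sels_dba)
    return 10 * math.log10(energy / period_s)


def leq_identical(sel_dba, n_events, period_s):
    """Closed form for N identical events: SEL + 10 log10 N - 10 log10 T."""
    return sel_dba + 10 * math.log10(n_events) - 10 * math.log10(period_s)


# Hypothetical example: identical events of SEL 90 dBA over a 16-hour day.
base = leq_identical(90.0, 300, SECONDS_16_HOURS)
doubled = leq_identical(90.0, 600, SECONDS_16_HOURS)
print(f"doubling the number of events adds {doubled - base:.2f} dB")
```

For identical events the coefficient of log N is exactly 10, and doubling N adds 10 log10 2, about 3 dB, to Leq. This is the trade-off referred to in para. 7.3.10: a 3 dB reduction in the level of each event offsets a doubling of numbers in the index.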
M3LQ24: 3-month, 24-hour Leq
W1LQ24: 1-week, 24-hour Leq (measured in the preceding week)
M3LQ16: 3-month, 16-hour Leq (from 0700 hours to 2300 hours)
AVOGAS: average score on the old Guttman annoyance scale
ARCBOTH: percentage of the population finding aircraft noise the most bothersome noise
ADJBOTH: ARCBOTH adjusted for the WORKAP factor
VMANN: percentage of the population finding aircraft noise very annoying
ADJVM: VMANN adjusted for the WORKAP factor
ARCNA: percentage of the population finding aircraft noise not acceptable
WORKAP: percentage of the population whose work is connected with the airport

References

1) Noise: Final Report. (The 'Wilson Report') Cmnd. 2056, HMSO (1963). Reprinted 1971.
2) Aircraft Noise Annoyance Around London (Heathrow) Airport. SS 337 (1963).
3) The Noise and Number Index. Kew Papers on Aircraft Noise, no. 3.
4) Second Survey of Aircraft Noise Annoyance around London (Heathrow) Airport. MIL Research Limited. HMSO (1971).
5) The Second Survey of Aircraft Noise Annoyance around London (Heathrow) Airport. Kew Papers on Aircraft Noise, no. 4.
6) The Noise and Number Index. DORA Communication 7907 (1981).
7) United Kingdom Aircraft Noise Index Study: main report. P. Brooker, J. B. Critchley, D. J. Monkman and C. Richmond. DR Report 8402 (1985).
8) The Use of Leq as an Aircraft Noise Index. J. B. Critchley and J. B. Ollerhead. DORA Report 9023 (1990).

Figures

Fig. 1 Annoyance rating vs. NNI. (Ref. 1, p. 208)
Fig. 2 NNI levels vs. L and N. (P. Davies)
Fig. 3 Extrapolation vs. interpolation
Fig. 4 Adjusted percentage 'very much annoyed' vs. 1-week, 24-hour Leq. (Ref. 7, p. 108)
Fig. 5 NNI vs. 3-month, 24-hour Leq. (Ref. 7, p. 105)
Fig. 6 3-month, 16-hour Leq vs. 3-month, 24-hour Leq. (constructed from data given in Ref. 7)
Fig. 7 NNI vs. 3-month, 16-hour Leq. (constructed from data given in Ref. 7)
Fig. 8 Percentage 'very much annoyed' vs. 3-month, 24-hour Leq. (Ref. 8, p. 35)
Fig.
9 Aggregated graph derived from the previous figure. (Ref. 8, p. 35)
Fig. 10 Percentage 'very much annoyed' vs. 3-month, 24-hour Leq. (Ref. 7, p. 102)
Fig. 11 Adjusted percentage 'very much annoyed' vs. 3-month, 24-hour Leq. (constructed from data given in Ref. 7)
Fig. 12 Average annoyance rating on the old Guttman scale vs. 3-month, 16-hour Leq. (constructed from data given in Ref. 7)
Fig. 13 Adjusted percentage finding aircraft noise to be the most bothersome noise vs. 3-month, 16-hour Leq. (constructed from data given in Ref. 7)