AQ eReporting: technical issues – informal discussion document
EIONET AQ workshop, 6-7/9/2015

Based on EEA and ETC/ACM experiences to date, several technical questions have been identified during the compilation and handling of AQ data reported under the IPR. The Eionet Air Quality meeting provides an informal forum where some of these issues may be discussed with air quality reporting experts. This document briefly summarises six of the issues where some technical discussion and proposals, within the limits of the developed eReporting system, may be beneficial to ensure a clear understanding of how the data reported by countries are presently being handled.

1 Rounding

The IPR Guidance Document indicates that "assessment data have to be compared to the environmental objectives (i.e. limit value, target value, etc.) in the same numeric accuracy as is used for the specification of the environmental objective in the Directive" (p. 10).

Issue

This rule leads to rounding which may appear excessive in some cases, in particular for assessment purposes. This is the case for e.g. BaP, for which the target value (1 ng/m3) has to be exceeded by more than 50 % to be considered as an exceedance. In practice it appears that several Member States consider one or two decimals when verifying compliance.

The IPR Guidance also notes that "AQUILA has recommended that in the case of polycyclic aromatic hydrocarbons the target value, lower and upper assessment thresholds should be quoted to the number of significant figures commensurate with the allowable uncertainty in their determination. In most cases this will mean two significant digits (e.g. target value of 1.0 ng/m3 for benzo[a]pyrene). AQUILA's recommendation will be taken into account in any future revision of the relevant requirements."

For compliance reporting the legal requirement is to report to the same numeric accuracy as specified in the Directive (i.e. in the case of BaP this means one significant digit); nevertheless, Member States are encouraged to follow the good practice of providing at least two significant digits. For AQ assessment purposes (e.g. the AQ report and other outputs), the EEA presently uses the rounding rules presented in the Annex, which provide relevant classes of values for graphs and maps.

Discussion question: Do you agree with the rounding rules presented in the Annex for assessment purposes?

2 Data capture, time coverage and data coverage (from IPR guidance part 1)

The data capture is the proportion (%) of valid measurements obtained within the measurement period defined by the time coverage. For ozone measurements, the measurement period must be divided into summer and winter seasons.

The data capture DC in a given averaging period (e.g. a year) is calculated as follows:

DC = Nvalid / Ntotal * 100 %

where
Nvalid = number of valid hourly/daily measurements in the measurement period (defined by the time coverage)
Ntotal = total number of hours/days in that measurement period.

The time coverage TC is the proportion (%) of a calendar year (or of the summer season (April-September) in the case of indicative ozone measurements) for which measurements were originally planned. The time coverage for a given averaging period (year/season, ...) is calculated as follows:

TC = Nplanned / Naveraging_period * 100 %

where
Nplanned = number of hours/days on which measurements were planned to take place
Naveraging_period = total number of hours/days in the given averaging period (e.g. year/season).
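For illustration only, a minimal Python sketch of these two ratios, using the numbers from the BaP example discussed under the Issue below (the function and variable names are ours, not part of the reporting system):

def data_capture(n_valid, n_total):
    """Data capture DC (%): valid measurements divided by the total number of
    hours/days in the measurement period defined by the time coverage."""
    return 100.0 * n_valid / n_total

def time_coverage(n_planned, n_averaging_period):
    """Time coverage TC (%): planned measurement time divided by the total
    number of hours/days in the averaging period (e.g. year or season)."""
    return 100.0 * n_planned / n_averaging_period

# BaP example from the IPR guidance: 156 planned sampling days in a 365-day year,
# of which 132 delivered valid daily values.
print(round(time_coverage(156, 365), 2))  # 42.74 (%)
print(round(data_capture(132, 156), 1))   # 84.6 (%), rounded to 85 % for the check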
Issue

The eReporting system, as described in the IPR guidance document and implemented by the EEA system, does not make it possible to distinguish between "planned" missing (or invalid or not reported at all) data and "unplanned" missing (or invalid or not reported at all) data. If data are not provided, the system can only assume that the data are missing. The rules on time coverage and data capture requirements may therefore lead to ambiguous cases. This is illustrated below using the example given in the IPR Guidance document part 1, p. 48 (4 BaP daily).

Situation a
Minimum time coverage: 33 %
Planned measurement time: 156 days
Planned measurement time > minimum time coverage: 100 * 156 / 365 = 42.74 % > 33 %
Nvalid = 132 days
Data capture = 100 * 132 / 156 = 84.6 %, rounded to 85 %
Valid

Situation b
If we modify the planned measurement time from 156 to, say, 200 days, but have at the end of the monitoring period the same number of valid measurements (132), this becomes:
Minimum time coverage: 33 % over the year
Planned measurement time: 200 days
Planned measurement time > minimum time coverage: 100 * 200 / 365 = 54.79 % > 33 %
Nvalid = 132 days
Data capture = 100 * 132 / 200 = 66 % < 85 %
Not valid, although the same number of valid measurements is available and the time coverage criterion is fulfilled.

In practice, the system calculates DC and TC as follows:

Data Capture (DC) = [count valid_data] / ([count valid_data] + [count notvalid_maint] + [count notvalid_other])

Time Coverage (TC) = ([count valid_data] + [count notvalid_maint] + [count notvalid_other]) / [Interval]

In order to avoid the ambiguities which might result, the system also calculates the data coverage (DCov):

Data Coverage (DCov) = [count valid_data] / [Interval] = TC * DC

In the case of continuous measurement (TC = 100 %), DC = DCov. In the example presented above for BaP, both situations would obtain the same data coverage (132 valid data over one year, i.e. approximately 36.2 %) and would therefore both be considered as valid (> 33 %).

Discussion question: Do you agree with the proposed calculation and Data Coverage approach?

3 Maintenance and calibration

1. Should the 5 % allowance for maintenance and calibration (and therefore the 85 % data capture criterion) still apply when raw data are aggregated to another averaging time, for example hourly data aggregated into daily data used to check compliance, as for PM10? Applying the 75 % rule for calculating the daily values and the 85 % rule on daily values for the number of exceedances means that 64 % of hourly values are sufficient to calculate valid statistics (see the sketch after this section). For simplicity, it seems reasonable to keep the rules as they are applied at the moment.

2. Should the 5 % for maintenance and calibration also be considered when the time coverage is < 100 % (discontinuous measurement, e.g. BaP), or should it be proportional to the time coverage? It seems reasonable to consider that this 5 % should only apply to continuous measurement; in the case of discontinuous measurement, maintenance and calibration should be done outside the measurement periods. It must however be noted that the IPR guidance document does not make this distinction between continuous and discontinuous methods, and the 5 % seems to apply to both cases (see the example on BaP above).

Discussion question: Do you agree with the above answers?
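To make the combination of thresholds in point 1 concrete, the sketch below (simplified, assumed logic; not the actual system implementation) keeps a daily mean only if at least 75 % of that day's hourly values are valid and accepts the year if at least 85 % of the daily values are available, so that in the worst case 0.75 * 0.85 = 0.6375, i.e. about 64 % of the hourly values, is sufficient.

def valid_daily_means(hourly_values, min_hourly_share=0.75):
    """Aggregate hourly values (None = missing/invalid) into daily means,
    keeping only days with at least 75 % valid hourly values."""
    daily = []
    for day_start in range(0, len(hourly_values), 24):
        day = hourly_values[day_start:day_start + 24]
        valid = [v for v in day if v is not None]
        if len(valid) >= min_hourly_share * len(day):
            daily.append(sum(valid) / len(valid))
    return daily

def year_accepted(daily_means, days_in_year=365, min_daily_share=0.85):
    """Apply the 85 % data capture criterion to the daily values."""
    return len(daily_means) >= min_daily_share * days_in_year

# Tiny synthetic example: two days of data, the second day has too many gaps.
hours = [10.0] * 24 + [10.0] * 10 + [None] * 14
print(valid_daily_means(hours))  # [10.0]  (second day rejected: only 10/24 hours valid)
print(0.75 * 0.85)               # 0.6375  (worst-case share of hourly values, about 64 %)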
4 Validity flag and verification status

Validity flags associated with the data are:
1   valid
2   valid, but below detection limit; measurement value given
3   valid, but below detection limit and number replaced by 0.5 * detection limit
-99 not valid due to station maintenance or calibration
-1  not valid due to other circumstances or data simply missing

Verification flags associated with the data are:
1   verified
2   preliminary verified
3   not verified

The Directive foresees minimum percentages of valid values for calculating statistics relevant for compliance checking. For the E1a submission (the annual submission of data related to reporting year Y-1, to be made by the end of September of the following year), all data are supposed to be fully verified. This is not the case for the data transferred under E2a (up-to-date, UTD), which can be associated with different verification levels (usually they are not or only partly verified).

Issue

Both E1a data and E2a data are merged into a single database within the system. Verification flags and the origin of the data remain associated with the values. Because only the last transmitted version of the data sets is kept within the database, this mixed origin should not normally raise any issue: the UTD data (E2a with the verification flag set to not or partly verified) should be overwritten by the official delivery submitted later (E1a with the verification flag set to fully verified). Some special cases may however occur:
- the country does not submit officially through E1a data which were previously transmitted as UTD (E2a);
- the country, for whatever reason, has not completed the verification of all data submitted in E1a.

As statistics are calculated from all data, it might happen in these cases that unverified or partly verified data are used for calculating compliance statistics. It is suggested to flag the statistics when they are calculated with data other than fully verified. A single "partly verified" or "not verified" value would trigger this flagging.

Discussion question: Do you agree with the suggested approach?
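To make the suggested flagging rule concrete, a minimal sketch is given below (the flag values follow the list above; the function name and the representation of the underlying data are illustrative assumptions, not the actual system): a statistic is flagged as soon as a single underlying value carries a verification flag other than 1 (verified).

VERIFIED, PRELIMINARY_VERIFIED, NOT_VERIFIED = 1, 2, 3

def statistic_fully_verified(verification_flags):
    """Return True only if every underlying value is fully verified (flag 1);
    a single preliminary verified or not verified value triggers the flagging."""
    return all(flag == VERIFIED for flag in verification_flags)

# Example: one preliminary verified (E2a/UTD) value is enough to flag the statistic.
print(statistic_fully_verified([1, 1, 1, 1]))  # True  -> statistic based on fully verified data
print(statistic_fully_verified([1, 2, 1, 1]))  # False -> flag the statistic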
5 AEI calculation method

The official method to calculate the AEI (Average Exposure Indicator), as set out in the AQ Directive, is as follows:
- Select the PM2.5 SamplingPoints set up for the AEI.
- For each selected PM2.5 SamplingPoint, calculate the annual mean and the corresponding data capture for the years X, X-1 and X-2. Reject those SamplingPoints with a data capture < 85 %.
- Calculate the annual average AEI contribution for the years X, X-1 and X-2. The average concentration for each year is calculated as the simple arithmetic average of the values for the retained locations.
- Calculate the AEI as the simple arithmetic average of the values for the three years involved.
- Rounding (to integer if above 10 µg/m3, to 1 decimal if below 10 µg/m3) is done at this level.

Issue

The official methodology leaves room for technical interpretation (different sets of time series over the 3-year period, rejected time series, etc.). From past experience, it appears that (ETC/ACM):
- "not all MS have delivered an 'official' AEI-value;
- when the AEI has been reported, MS may not report over the same 3-year period;
- a list of 'official' AEI-monitoring stations as selected by the MS is not available for all countries …"

The methodology should be further detailed in order to ensure harmonised and consistent implementation in all Member States.

The methodology used by ETC/ACM until now is as follows:
"- select for a three-year period all operational (sub)urban background stations (operational is defined as having a data coverage of 75 % or more);
- calculate for each year a national average from the station annual means;
- next, average this over the three years.
In none of these steps have numbers been rounded. Only when presenting the AEI values should numbers be rounded to one decimal. This calculation procedure is similar to the one described in the AQD, only the selection of stations differs", although the percentage of valid data (75 % rather than 85 %) also differs; this is the threshold used by the EEA for assessment purposes. For compliance evaluation, the threshold of 85 % would be applied.

Discussion question: Do you agree with the approach developed by ETC/ACM for assessment purposes, in particular for selecting stations? What should be done in case the set of stations differs from year to year (data rejected because of an insufficient percentage of valid data, new stations or stations disappearing)? This approach would not alter in any way the official method and legal requirement established by the Ambient Air Quality Directive, which is to apply a threshold of 85 % to evaluations and assessments done to check compliance.

6 CSI4 – Exceedance of air quality limit values in urban areas

The methodology used at present to calculate the EEA core set indicator on exceedance of air quality limit values in urban areas is as follows:

"For PM10, PM2.5, O3, NO2 and SO2 only stations with a data time coverage of at least 75 % per calendar year are used. That is, in the case of daily values, having more than 274 valid daily values per calendar year (or 275 days in a leap year), and in the case of hourly values, having more than 6570 valid hourly values per calendar year (or 6588 hours in a leap year). For B(a)P the minimum data time coverage accepted is 14 % (51 days), according to the data quality objectives related to indicative measurements in Directive 2004/107/EC (EU, 2004).

For every year, each city i in country j, and every pollutant, the total number of urban or suburban traffic stations (nit) and the total number of urban or suburban background stations (nib) are obtained. Ptj % of the total population of the city (Popi) is proportionally assigned to the traffic stations and Pbj % of Popi is proportionally assigned to the background stations. So, every traffic station has an allocated population equal to (Ptj / 100) * Popi / nit and every background station has an allocated population equal to (Pbj / 100) * Popi / nib."

This population allocation is illustrated in the sketch at the end of this section.

Issue

It happens, mostly for PM, that two or more measurement methods are used at the same station. At the moment, the values associated with both of them are taken into account in the calculation of the indicator, provided they both fulfil the time coverage criteria. This introduces a bias into the results, i.e. a "double" counting of measurements at a single station.

Discussion question: If just one measurement method is to be taken for incorporation in the CSI004 statistics, which should it be?
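To illustrate the population allocation quoted above, a minimal sketch is given below (the function name and the numbers in the example are illustrative assumptions, not official country-specific shares):

def allocate_population(pop_city, n_traffic, n_background, pct_traffic, pct_background):
    """Allocate a city's population between its urban/suburban traffic and background
    stations: (Pt/100)*Pop/nt per traffic station, (Pb/100)*Pop/nb per background station."""
    per_traffic = (pct_traffic / 100.0) * pop_city / n_traffic if n_traffic else 0.0
    per_background = (pct_background / 100.0) * pop_city / n_background if n_background else 0.0
    return per_traffic, per_background

# Hypothetical city: 500 000 inhabitants, 2 traffic and 3 background stations,
# with assumed shares of 30 % (traffic) and 70 % (background).
print(allocate_population(500_000, 2, 3, 30, 70))  # (75000.0, 116666.66666666667)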
7 Annex: Rounding for EEA assessment

/*NO2*/
statistics_average_group='day' and statistics_percentage_valid>=75 and component_code='8' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=20 | 20< - <=30 | 30< - <=40 | 40< - <=50 | 50<
(Rounding to integer for all classes)

/*SO2*/
statistics_average_group='day' and statistics_percentage_valid>=75 and component_code='1' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=5 | 5< - <=10 | 10< - <=20 | 20< - <=25 | 25<
(Rounding to integer for all classes)

/*PM10 mean*/
statistics_average_group='day' and statistics_percentage_valid>=75 and component_code='5' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=20 | 20< - <=31 | 31< - <=40 | 40< - <=50 | 50<
(Rounding to integer for all classes)

/*PM10 P[90.4]*/ --New map
statistics_average_group='day' and statistics_percentage_valid>=75 and component_code='5' and statistics_year=2012 and statistic_shortname='P90.4'
Classes: <=20 | 20< - <=40 | 40< - <=50 | 50< - <=75 | 75<
(Rounding to integer for all classes)

/*PM2.5*/
statistics_average_group='day' and statistics_percentage_valid>=75 and component_code='6001' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=10 | 10< - <=20 | 20< - <=25 | 25< - <=30 | 30<
(Rounding to integer for all classes)

/*CO*/
statistics_average_group='dymax' and statistics_percentage_valid>=75 and component_code='10' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=0.5 | 0.5< - <=1.0 | 1.0< - <=2.0 | 2.0< - <=2.5 | 2.5<
(Rounding to one decimal for all classes)

/*C6H6*/
statistics_percentage_valid>=50 and component_code='20' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=1.7 | 1.7< - <=2.0 | 2.0< - <=3.5 | 3.5< - <=5.0 | 5.0<
(Rounding to one decimal for all classes)

/*O3*/
statistics_average_group='dymax' and statistics_percentage_valid>=75 and component_code='7' and statistics_year=2012 and statistic_shortname='mean'
Classes: <=60 | 60< - <=80 | 80< - <=100 | 100<
(Rounding to integer for all classes)

/*O3 P[93.2]*/
statistics_average_group='dymax' and statistics_percentage_valid>=75 and component_code='7' and statistics_year=2012 and statistic_shortname='P93.2'
Classes: <=80 | 80< - <=100 | 100< - <=120 | 120< - <=140 | 140<
(Rounding to integer for all classes)

/*BaP*/
statistics_percentage_valid>=14 and component_code in ('6015','5029','5129','1029') and statistics_year=2012 and statistic_shortname='mean'
Classes: <=0.12 | 0.12< - <=0.40 | 0.40< - <=0.60 | 0.60< - <=1.00 | 1.00<
(Rounding to two decimals for all classes)

/*Pb*/
statistics_percentage_valid>=14 and component_code in ('12','5012') and statistics_year=2012 and statistic_shortname='mean'
Classes: <=0.02 | 0.02< - <=0.10 | 0.10< - <=0.50 | 0.50< - <=1.00 | 1.00<
(Rounding to two decimals for all classes)

/*Cd*/
statistics_percentage_valid>=14 and component_code in ('14','5014') and statistics_year=2012 and statistic_shortname='mean'
Classes: <=1 | 1< - <=2 | 2< - <=5 | 5< - <=8 | 8<
(Rounding to integer for all classes)

/*As*/
statistics_percentage_valid>=14 and component_code in ('18','5018') and statistics_year=2012 and statistic_shortname='mean'
Classes: <=1 | 1< - <=3 | 3< - <=6 | 6< - <=9 | 9<
(Rounding to integer for all classes)

/*Ni*/
Classes: <=5 | 5< - <=10 | 10< - <=20 | 20< - <=30 | 30<
(Rounding to integer for all classes)
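For illustration, a minimal sketch of how the class boundaries and rounding above can be applied, taking the NO2 annual mean entry as an example (the function name is ours; the Annex does not specify how exact half values should be rounded, so Python's default rounding is used here):

def classify_no2_annual_mean(value_ug_m3):
    """Assign a NO2 annual mean (in µg/m3) to the Annex classes after rounding
    to integer: <=20 | 20< - <=30 | 30< - <=40 | 40< - <=50 | 50<."""
    v = round(value_ug_m3)  # rounding to integer, as specified for NO2
    labels = ["<=20", "20< - <=30", "30< - <=40", "40< - <=50"]
    for bound, label in zip([20, 30, 40, 50], labels):
        if v <= bound:
            return label
    return "50<"

# Example: 40.4 µg/m3 rounds to 40 and therefore falls in the "30< - <=40" class.
print(classify_no2_annual_mean(40.4))  # 30< - <=40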