Comparison of EPA Sum06 Secondary Ozone Exposure Estimates with Observed Data

by Stefan Falke, Bret Schichtel, and Luis Vasconcelos
CAPITA
August 11, 1997
DRAFT

Introduction
Data Used
Methodology
    Sum06 calculation
    Differences in Sum06 calculation
    EPA 3-month maximum Sum06 grid
    EPA Grid to Data 3-month max. Sum06 comparison
Results
    EPA Grid Evaluation
Discussion
Acknowledgments
References

Introduction

EPRI is reviewing the U.S. EPA's economic analysis for the proposed secondary ozone standard. In support of this review, CAPITA is evaluating the ozone exposure estimates used in EPA's analysis.
This paper compares three-month maximum ozone Sum06 values calculated from the AIRS database with those estimated by the EPA in its review of the secondary ozone standard. The objectives of this evaluation are twofold: first, to reproduce the Sum06 (the summation of hourly ozone concentrations greater than or equal to 60 ppb) at the monitoring stations used in the EPA analysis, and second, to assess the performance of the EPA estimation at monitoring sites not included in the EPA analysis.

Data Used

Two sets of ozone data were used in this analysis: an integrated hourly ozone database derived from a number of monitoring networks, and Sum06 3-month maximum ozone exposure estimates derived from the Environmental Research Laboratory's Geographic Information System (GIS). The hourly ozone data used in this report were collected from multiple sources:

Data Set: AIRS; CASTNet; SCION; LADCO; GEORGIA; NORTH CAROLINA
Supplying Organization: EPA; EPA; Southern Oxidant Study; Lake Michigan Air Directors Consortium; State of Georgia; State of North Carolina
Years: 1991, 1995; 1991, 1995; 1993, 1995; 1991 (88, 93, 95); 1988, 91, 93, 95

Data from each network were extracted and combined into a single integrated data set. The details of the data sources and quality control procedures are discussed in the report "Preparation of Ozone Files for Data Analysis" by Husar and Husar (1996). An initial examination of average daily maximum ozone maps revealed anomalous ozone "holes" and peaks at unexpected locations. For those sites, the hourly and daily maximum ozone values were re-examined for possible inconsistencies. Sudden systematic changes in the ozone concentrations, as well as major deviations from neighboring sites, were the main clues for anomalous behavior. As a result of this quality control process, 6 of roughly 1000 monitoring sites were discarded.
The database described in Husar and Husar (1996) contains data from the Eulerian Model Evaluation and Field Study (EMEFS) covering 1988 through June 1990. The EMEFS ozone concentrations in this database were found to be systematically low and were removed for this analysis (see "Validation of an Ozone Integrated Database"). The remaining data were used in all subsequent computations exactly as submitted by the networks.

Methodology

The calculation of the ozone Sum06 followed the method used by the EPA. Monthly and 3-month maximum Sum06 values were calculated for each of the EPA and non-EPA monitoring sites. Values from the 1990 EPA Sum06 grid were extracted for each grid cell containing a monitoring site and were compared with 1990 Sum06 values calculated from observed data to evaluate the accuracy of the EPA estimates.

Sum06 calculation

A 3-month Sum06 value is the summation of all daytime hourly ozone concentrations greater than or equal to 0.06 ppm (60 ppb) over a continuous three-month period. The three-month period fell within the EPA-defined ozone season, which varies by region of the U.S. For example, the southern states have a year-long ozone season, while Montana's ozone season runs from June through September. Because the Sum06 metric is a summation, it is necessary to correct for all missing data. The EPA (1996) recommended corrections were used and are presented below:

Daily Sum06 values were created by summing hourly ozone observations greater than or equal to 60 ppb over the 12-hour period from 8:00 AM to 8:00 PM of each day. All days that had 70% or more valid data were flagged as valid days.

Monthly Sum06 values were computed by summing all daily Sum06 values for a given month. A monthly Sum06 value was calculated for a monitoring site only if at least 70% of the days in the month were valid.
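The daily and monthly steps described so far can be sketched in Python as follows. This is a minimal illustration, not the EPA or CAPITA code: the function names are hypothetical, missing hours are assumed to be marked with None, and the monthly sum is assumed to include only days flagged as valid, which the text does not state explicitly.

```python
from typing import Optional, Sequence

THRESHOLD_PPB = 60.0     # hours at or above this value count toward Sum06
HOURS_PER_DAY = 12       # daytime window, 8:00 AM to 8:00 PM
MIN_VALID_PERCENT = 70   # validity cutoff applied to both days and months


def daily_sum06(hourly_ppb: Sequence[Optional[float]]) -> Optional[float]:
    """Return the daily Sum06 in ppb-hrs, or None if the day is not valid.

    hourly_ppb holds the 12 daytime observations; None marks a missing hour.
    """
    if len(hourly_ppb) != HOURS_PER_DAY:
        raise ValueError("expected 12 daytime hourly values")
    valid = [v for v in hourly_ppb if v is not None]
    # Integer arithmetic avoids floating-point trouble at the 70% boundary.
    if 100 * len(valid) < MIN_VALID_PERCENT * HOURS_PER_DAY:
        return None
    return sum(v for v in valid if v >= THRESHOLD_PPB)


def monthly_sum06(days: Sequence[Sequence[Optional[float]]]) -> Optional[float]:
    """Return the uncorrected monthly Sum06 in ppb-hrs.

    Returns None when fewer than 70% of the days in the month are valid.
    Assumption: only valid days contribute to the monthly sum.
    """
    daily = [daily_sum06(day) for day in days]
    valid_days = [d for d in daily if d is not None]
    if 100 * len(valid_days) < MIN_VALID_PERCENT * len(days):
        return None
    return sum(valid_days)
```

With a 12-hour window, the 70% rule means a day needs at least 9 valid hours, since 8 of 12 is only 67%. Dividing the result by 1000 converts ppb-hrs to the ppm-hrs used elsewhere in this report.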
Valid monthly values were corrected for missing data by multiplying by M/m, where M is the total number of hours in the month between 8 AM and 8 PM and m is the number of those hours with ozone concentrations.

Three-month Sum06 values were calculated for each site by summing the monthly values for every three consecutive months, i.e. May - July, June - August, etc. If a monthly Sum06 value was missing but the two months adjacent to it had at least 90% valid days, then a Sum06 value was calculated for the missing month as the weighted average of the adjacent months, where the weight was the number of days in the month.

The maximum 3-month Sum06 was found from the 3-month values.

Differences in Sum06 calculation

The methodology outlined above was applied to 1990 hourly ozone observations at AIRS monitoring sites. The resulting three-month maximum Sum06 values (called CAPITA Sum06 in this discussion) differed from those used by the EPA in the generation of its exposure estimates (EPA Sum06). Figure A is a correlation plot of the EPA Sum06 with the CAPITA Sum06. The scatter shows that for many sites the CAPITA Sum06 is larger than the EPA Sum06. The CAPITA Sum06 is lower than the EPA Sum06 for only a few sites, and these have 3-month Sum06 values less than 30 ppm-hrs.

Figure A.

Analysis of the EPA 3-month maximum indicated that it was derived without the correction of monthly Sum06 for missing hourly ozone outlined in the methodology section above. The CAPITA Sum06 was recalculated without the correction of monthly values, and the results are correlated with the EPA Sum06 in Figure B. Many of the CAPITA Sum06 points that had been higher than the EPA Sum06 were brought into agreement, but differences between the two calculations remain.

Figure B.

Further evaluation of the EPA Sum06 values revealed that the correction for missing monthly values in the 3-month Sum06 calculation was different from that outlined in the methodology.
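The M/m correction and the 3-month maximum, including the imputation of a missing month, can be sketched in Python as follows. This is a hedged illustration rather than the code used for either the CAPITA or the EPA values: the function and parameter names are hypothetical, missing months are assumed to be marked with None, and the "weighted average" is assumed to weight each adjacent month by its own number of days. The adjacent-month validity requirement is exposed as a parameter so that the effect of relaxing it can be examined.

```python
from typing import List, Optional, Sequence


def correct_monthly_sum06(raw_sum06: float, window_hours: int,
                          observed_hours: int) -> float:
    """Scale a valid monthly Sum06 by M/m to account for missing hours.

    window_hours   -- M, all 8 AM to 8 PM hours in the month
    observed_hours -- m, those hours that have an ozone observation
    """
    if not 0 < observed_hours <= window_hours:
        raise ValueError("observed_hours must be between 1 and window_hours")
    return raw_sum06 * window_hours / observed_hours


def max_three_month_sum06(monthly: Sequence[Optional[float]],
                          valid_days: Sequence[int],
                          days_in_month: Sequence[int],
                          min_adjacent_percent: int = 90) -> Optional[float]:
    """Return the largest sum over three consecutive months, or None.

    monthly lists the monthly Sum06 values in calendar order (None = missing).
    A missing month is imputed only if both neighbours have values and meet
    the valid-day requirement.
    """
    filled: List[Optional[float]] = list(monthly)
    for i in range(1, len(monthly) - 1):
        if monthly[i] is not None:
            continue
        prev_value, next_value = monthly[i - 1], monthly[i + 1]
        if prev_value is None or next_value is None:
            continue
        neighbours_ok = all(
            100 * valid_days[j] >= min_adjacent_percent * days_in_month[j]
            for j in (i - 1, i + 1)
        )
        if neighbours_ok:
            # Assumption: weights are the day counts of the adjacent months.
            w_prev, w_next = days_in_month[i - 1], days_in_month[i + 1]
            filled[i] = ((prev_value * w_prev + next_value * w_next)
                         / (w_prev + w_next))
    totals = [
        sum(window)
        for window in (filled[i:i + 3] for i in range(len(filled) - 2))
        if None not in window
    ]
    return max(totals) if totals else None
```

Passing uncorrected monthly values instead of the output of correct_monthly_sum06 reproduces the "without correction" variant, and lowering min_adjacent_percent allows more missing months to be imputed, so more 3-month windows become available for the maximum.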
The methodology states that the months adjacent to a missing month must have at least 90% valid days to be used in imputing a value for the missing month, but in the calculation of the EPA Sum06 this requirement appears to have been less strict, at about 70% valid days. Using a 70% restriction instead of 90%, the CAPITA Sum06 was recalculated and correlated with the EPA Sum06 in Figure C. All but a few sites were brought into agreement. The sites that still differed had different monthly Sum06 values, and the cause of these differences is unknown. Since no correction was applied to the monthly Sum06 values, it would seem that the hourly ozone concentrations used in the monthly Sum06 calculation were themselves different.

Figure C.

A question remaining after this analysis is whether the EPA Sum06 values described in this section were used in generating the EPA gridded Sum06 estimates described in the next section. One way to address this is to examine differences between the data-derived Sum06 and the gridded Sum06. The gridded Sum06 corresponding to each monitoring site location was extracted from the EPA GIS grid and compared with the CAPITA Sum06 and the EPA Sum06, and the average Sum06 over all monitoring sites was calculated for each. The EPA GIS Sum06 values averaged 20.68 ppm-hrs, the CAPITA Sum06 averaged 21.59 ppm-hrs, and the EPA Sum06 averaged 20.66 ppm-hrs. The EPA Sum06 average is very close to the EPA GIS average, indicating that the EPA Sum06 values were used in the generation of the EPA gridded Sum06 estimates. Averaging Sum06 values over the entire U.S. is a crude and very simple way of comparing the data, but it does provide some initial insight into the source of the gridded Sum06 values.

EPA 3-month maximum Sum06 grid

The EPA used a GIS (Geographic Information System) to derive its Sum06 maps from the monitoring station point values. The GIS uses a potential exposure surface (PES) as a model of the spatial variation of ozone.
The PES incorporates factors such as temperature, cloud cover, elevation, wind direction, and ozone precursor emission sources. The GIS-generated 1990 3-month maximum Sum06 grid, with a resolution of 10 km x 10 km, is shown in Figure 1. The San Joaquin Valley in California had the largest 3-month max. Sum06 (>50 ppm-hrs). The Southeast was also a region of high Sum06, with values above 30 ppm-hrs. The southern parts of Illinois and Ohio also show elevated Sum06, as do New Jersey, Delaware, and Maryland.

Figure 1. GIS generated EPA grid of 3-month maximum ozone Sum06 for 1990.

EPA Grid to Data 3-month max. Sum06 comparison

The 1990 3-month max. Sum06 values calculated directly from the observed data were compared with those of EPA's grid. Grid values at the locations of monitoring stations were extracted and compared with the data-derived Sum06 using correlation plots, differences, and ratios. The point values for differences and ratios were interpolated to a grid using inverse distance weighting interpolation.

Results

EPA Grid Evaluation

The comparison of EPA Sum06 values with those calculated from the data was done at the sites used in the EPA analysis as well as at a set of stations not used by the EPA. The locations of the EPA sites, along with a contoured grid of their data-derived 3-month Sum06 values, are displayed in Figure 2. The underlying contour was generated using inverse distance interpolation and exhibits unrealistic spreading of Sum06 values into areas with no monitoring sites, e.g. the high Sum06 values in Nevada and central Texas. These areas differ from the estimates in EPA's GIS grid, but the Sum06 spatial patterns in areas of high site density are similar to those in EPA's grid (Figure 1), although the elevated values in the East appear to be higher than indicated by the EPA grid. Figure 3 contains a map of the non-EPA site locations. The size of the squares is proportional to the three-month maximum Sum06 at each monitoring site.
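The inverse distance weighting used for the difference and ratio grids and for the contours in Figure 2 can be sketched in Python as follows. This is a generic illustration under stated assumptions: the function name is hypothetical, coordinates are taken to be planar, all sites contribute to every estimate, and the distance exponent of 2 is a common default rather than a value given in this report.

```python
import math
from typing import Sequence, Tuple

Site = Tuple[float, float, float]  # (x, y, value) in planar coordinates


def idw_estimate(x: float, y: float, sites: Sequence[Site],
                 power: float = 2.0) -> float:
    """Estimate the value at (x, y) as a distance-weighted mean of site values.

    Each site is weighted by 1 / distance**power. The exponent is an
    assumption; the report does not state the one actually used.
    """
    if not sites:
        raise ValueError("at least one site is required")
    numerator = 0.0
    denominator = 0.0
    for site_x, site_y, value in sites:
        distance = math.hypot(x - site_x, y - site_y)
        if distance == 0.0:
            return value  # the point coincides with a monitoring site
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator
```

Because the estimate is always a weighted mean of the site values, a point far from every monitor still receives a value within the range of the observations instead of falling toward a background level. That property is consistent with the spreading into unmonitored areas noted above.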
The spatial density of the monitoring sites is adequate except in the Central Plains states. The highest Sum06 values are in California, the Southeast, the Midwest, and on the Atlantic Coast.

Figure 2. Three month maximum Sum06 values at sites included in EPA's analysis.

Figure 3. Three month maximum Sum06 values at sites not included in EPA's analysis.

Figure 4 contains scatterplots of the 3-month maximum Sum06 obtained from the grid against their measured data counterparts. Grid values for EPA sites tend to agree with the data. At lower values (<25 ppm-hrs), the grid was mostly larger than the data, whereas higher Sum06 data values were underestimated by the grid. More scatter is seen at non-EPA sites (Figure 4b), and the grid values are biased low. The two outliers at EPA GIS values near 100 ppm-hrs cause the regression line fit to give a slope less than one. The scatter of the data points indicates a bias in the grid toward underestimating the data Sum06. One reason for the scatter seen in Figure 4 is that the values extracted from the EPA grid were at the centroid of the 10 km x 10 km grid cell and not at the exact coordinates of the monitoring stations. A 1:1 correspondence between a site's Sum06 and its grid value can be achieved only if the site is located exactly at the center of the grid cell. This rarely, if ever, occurs, and therefore an inherent uncertainty exists in the extraction of grid Sum06 values.

Figure 4. Correlation of EPA grid vs. data Sum06 a) at EPA sites b) at non-EPA sites.

The differences between the EPA gridded Sum06 and data Sum06 values are displayed in Figure 5a for the EPA stations and Figure 5b for the non-EPA sites. Most of the EPA sites have a grid-data difference between -3 and +3 ppm-hrs. A few sites are outside this range, namely sites in California, eastern Utah, southeastern Missouri, and South Carolina. It is possible that the western sites have such large differences because of the highly textured topography in their areas.
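The positional error introduced by sampling the grid at cell centroids can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and it assumes square 10 km x 10 km cells, planar coordinates in kilometres, and a grid origin at (0, 0), none of which is specified for the EPA grid in this report.

```python
import math
from typing import Tuple

CELL_KM = 10.0  # assumed cell edge length


def cell_centroid(x_km: float, y_km: float,
                  cell_km: float = CELL_KM) -> Tuple[float, float]:
    """Return the centroid of the grid cell that contains the point."""
    col = math.floor(x_km / cell_km)
    row = math.floor(y_km / cell_km)
    return ((col + 0.5) * cell_km, (row + 0.5) * cell_km)


def centroid_offset_km(x_km: float, y_km: float,
                       cell_km: float = CELL_KM) -> float:
    """Distance from a site to the centroid where its grid value is read."""
    centre_x, centre_y = cell_centroid(x_km, y_km, cell_km)
    return math.hypot(x_km - centre_x, y_km - centre_y)


def max_centroid_offset_km(cell_km: float = CELL_KM) -> float:
    """Largest possible offset: half the cell diagonal (a site at a corner)."""
    return cell_km * math.sqrt(2.0) / 2.0
```

For a 10 km cell the largest possible offset is half the cell diagonal, about 7.07 km, reached when a site sits at a cell corner. A site at the centroid has zero offset, which is the only case in which the grid value and the site value refer to the same location.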
The grid values used in the difference were obtained at the centroids of the 10 km x 10 km grid cells and not at the sites' geographical coordinates. This could produce a Sum06 value for a point up to 7 km away from the monitoring site and at a substantially higher or lower elevation. The contoured plots in the figures were created by first calculating the Sum06 difference at each of the monitoring sites and then interpolating this difference.

Figure 5. Difference between grid and data 3-month max. Sum06 values a) at EPA sites and b) at non-EPA sites.

The non-EPA sites show larger differences than the EPA sites. The grid values obtained at most sites are lower than their calculated Sum06, particularly in much of the eastern half of the U.S., where the EPA grid estimates are more than 9 ppm-hrs lower than the measured data.

Discussion

A concern arising from this analysis is that the EPA Sum06 values almost always underestimate what is calculated from the measured data. We are currently verifying that the method used to calculate the 3-month max. Sum06 was identical to that used in the generation of the EPA grids, and we are investigating whether any of the bias is caused by extracting grid values at grid cell centroids rather than at exact station locations.

Acknowledgments

References