ANN Modeling of O3 at Rural Monitoring Sites in the West Central Airshed Zone of Alberta

Paper # 1120

Warren B. Kindzierski, Mohamed Gamal El-Din, Madhan Selvaraj, and Yaming He
Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Alberta T6G 2M8

ABSTRACT

The West Central Airshed Society zone in west central Alberta has experienced high ground level ozone concentrations at a rural background station (Hightower Ridge) and at a rural station closer to anthropogenic activity (Tomahawk). The objective of this study was to assess the feasibility of using artificial neural network (ANN) modeling to evaluate historical ambient air quality and meteorological data collected at these two stations. The purpose of the evaluation was to attempt to explain the role that meteorological versus anthropogenic factors had in contributing to observed hourly average ozone concentrations. Hourly average ozone concentrations were analyzed using data for the months of March through September over the years 2001 and 2002. Peak averaged ozone concentrations occurred between 3:00 pm and 5:00 pm in the range 79 to 88 µg/m³ (40 to 45 ppb). The highest averaged ozone concentrations occurred at the higher elevation station (Hightower Ridge, elevation 1,500 m above sea level). The lowest averaged ozone concentrations occurred during morning hours at each station. Much lower concentrations occurred at the lower elevation station (Tomahawk, elevation 800 m above sea level) than at Hightower Ridge. This diurnal pattern (peak levels in mid-afternoon and lower levels in early morning) is due to vertical convective mixing during daytime hours and the absence of this mixing during nighttime and early morning hours. Meteorological conditions are the most important factors related to the behavior of ozone concentrations observed at these stations. This behavior was confirmed using ANN modeling.
Temperature, relative humidity, and pressure were all related to ozone concentrations. This points strongly to natural phenomena as the main contributors to the presence of ground level ozone at these stations. Anthropogenic factors (precursor air pollutants originating from the activity of humans) are much less important.

INTRODUCTION

Ozone (O3) is a reactive gas that can form from the action of sunlight on man-made precursors (i.e. hydrocarbons and nitrogen oxides emitted in fuel combustion) in urban areas. Exposure to elevated levels of O3 in urban areas is reported to cause a wide variety of toxicological effects in sensitive receptors.[1-3] Exposure-related effects are reported to include lung inflammation, effects on host-defense mechanisms, reduced pulmonary function, and adverse changes in lung biochemistry.

O3 is also a naturally occurring trace constituent of the atmosphere. There is controversy regarding how much of the ambient O3 monitored at ground level is natural and how much is produced from man-made precursors.[2] Estimates of the natural component of O3 vary in the literature. Background O3 concentrations vary by geographic location, altitude and season. The natural component of the background originates from three sources:[2]

1. Stratospheric O3 that is transported down to the troposphere (the lower part of the atmosphere, extending from the surface up to a height of about 7 km).
2. O3 formed from photochemically-initiated oxidation of methane and carbon monoxide produced by biogenic sources (living organisms or their remains) and released by geological material.
3. Photochemically-initiated oxidation of biogenic volatile organic compounds (VOCs).
At certain times of the year there may be stratospheric intrusion of ozone down to ground level.[4] These occurrences are reported to be rare, of short duration, and typically associated with strong frontal passages or severe thunderstorms.[4]

Rural long-term average O3 concentrations are reported to be relatively high in the Canadian prairies when compared with concentrations in cities or more southerly locations.[1] Further, background O3 transported from rural locations plays a role in the occurrence of high O3 concentrations at urban air monitoring stations. These occurrences generally arise under hot stagnant weather conditions, when photochemical reactions involving man-made precursor (anthropogenic) emissions can add to the background (natural) ozone.

The West Central Airshed Society (WCAS) zone encompasses an area of 35,000 square kilometers and is located in the west central region of the province of Alberta (Figure 1). The western side of the airshed is heavily forested and characterized by foothills and mountainous terrain. The eastern side of the airshed encroaches upon the Capital Region of Alberta (Edmonton and surrounding area) and is characterized by more gently rolling terrain with greater anthropogenic development (e.g. gas plants and coal-fired power plants) and residential acreage developments. Air quality is monitored on a continuous basis at five stations in the WCAS airshed. In the past, high hourly-average ground level ozone has been recorded at rural air monitoring stations (e.g. Hightower Ridge, elevation 1,500 m above sea level) and at a station situated in closer proximity to anthropogenic activity (e.g. Tomahawk, elevation 800 m above sea level).

The objective of this study was to assess the feasibility of using artificial neural network (ANN) modeling to evaluate historical ambient air quality and meteorological data collected at West Central Airshed Society air monitoring stations.
The purpose of the evaluation was to attempt to explain the role that meteorological versus anthropogenic factors had in contributing to observed hourly-average ground level O3 levels.

METHODS

A systematic approach was followed in developing the ANN models. Two years of historical data (2001 and 2002) were initially considered for this study. Only data for the months of March to September of each year were used in developing the models, because peak O3 concentrations mostly occur during this part of the year. Focusing on this period allows a better understanding of which input variables may be important in contributing to peak O3 concentration events.

Figure 1. Province of Alberta showing West Central Airshed Society Zone and location of Hightower Ridge and Tomahawk air monitoring stations.
[Map labels: West Central Airshed Society Zone, Edmonton, Hightower Ridge, Tomahawk, Calgary]

Hourly average concentrations of NO, NO2, SO2, and PM2.5 were used in developing the station models as precursor and surrogate air pollutants originating from the activity of humans. Summary statistics for these pollutants and for ozone are presented in Table 1 for the Hightower Ridge and Tomahawk stations. Hourly average values for meteorological input variables for each of the stations included wind speed, wind direction, global solar radiation, relative humidity, temperature, deviation of wind direction, and deviation of wind speed. Hourly average atmospheric pressure data for the same time period were obtained in electronic format from Environment Canada, Edmonton, AB and used in the following manner:

1. Hourly atmospheric pressure data from an Environment Canada weather station location in Jasper, Alberta were matched to the Hightower Ridge station dataset.
2. Hourly atmospheric pressure data from the Environment Canada weather station location in Violet Grove, Alberta were matched to the Tomahawk station dataset.

Table 1. Summary hourly average concentrations for the Hightower Ridge and Tomahawk monitoring stations using data for the months of March through September over the years 2001 and 2002.

Station          Statistic  NO (ppb)  NO2 (ppb)  SO2 (ppb)  PM2.5 (µg/m³)  O3 (ppb)
Hightower Ridge  Mean       0.1       0.7        0.2        2.4            44
Hightower Ridge  S.D.       0.3       0.9        0.6        3.0            12
Hightower Ridge  Min        0         0          0          0.             5.3
Hightower Ridge  Max        14.8      15         11         36             81
Hightower Ridge  90%ile     0.2       1.4        0.6        5.5            59
Tomahawk         Mean       0.47      3.4        1.1        3.7            34
Tomahawk         S.D.       1.3       3.4        2.4        4.8            13
Tomahawk         Min        0         0          0          0              0.8
Tomahawk         Max        28.3      36         52         121            91
Tomahawk         90%ile     1.3       6.9        2.4        8.3            50

S.D. = standard deviation; Min = minimum; Max = maximum

Indexed temporal input variables (year, month, day and hour) were included as separate inputs.

Model performance was evaluated after training each network. A number of criteria are available for this purpose; the coefficient of determination (R²) was used in this study. R² indicates the proportion of variance in the dependent (output) variable that is explained by the independent (input) variables. Results from ANN models with different structures can be compared directly using this criterion.

ANN modeling was performed using the commercial neural net software product NeuroShell®2 (Ward Systems Group Inc., Frederick, MD) operated on an IBM® compatible computer in a Microsoft® Windows environment. The following systematic approach was used:

1. Historical data were initially screened for erroneous data using basic statistics. Missing values or extremely high values were removed from the data patterns.
2. The remaining data patterns for each station were divided into two subsets, a training subset and a production subset, in a ratio of 3 to 2.
3. Initial best performing (baseline) ANN models were determined by running a series of simulations with the data patterns while varying the number of hidden layer neurons and the number of epochs. A logistic activation function was used for the hidden and output layers.
4. Using the best performing ANN models from Step 3, another series of simulations was run to identify input variables that were important contributors to prediction of the output variable (hourly-average ground level O3 concentration). This was accomplished by removing one input variable at a time and re-running the ANN model with the remaining input variables. Any change in R² of 0.03 or more was deemed to indicate that the removed input variable was an important contributor to the prediction of the output. For the purposes of this evaluation, input variables with this property were considered "core" or "important" variables.
5. Another series of simulations was run to identify input variables of secondary importance. This was accomplished by adding one input variable at a time to the models developed with the core input variables and then re-running the models. Any increase in R² was attributed to the added input variable.
6. A final series of simulations was run to optimize the network architecture of the ANN models by using the core and secondary input variables and varying the number of epochs and the number of hidden layer neurons. Model results were then evaluated to identify the best network architecture for ANN modeling.

RESULTS AND DISCUSSION

Hourly Average O3 Concentrations

Hourly average O3 concentrations were computed for each station using data for the months of March through September over the years 2001 and 2002. These concentrations are plotted in Figures 2 and 3 for the Hightower Ridge and Tomahawk stations. In these figures, "0" represents midnight and "12" represents noon (12:00 pm). Figures 2 and 3 indicate that peak hourly-averaged concentrations occur between hours 15 and 17 (3:00 pm to 5:00 pm) in the range 79 to 88 µg/m³ (40 to 45 ppb). Slightly higher hourly-averaged concentrations occur at the higher elevation station (Hightower Ridge).
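The hourly averaging and the paired units quoted above (79 to 88 µg/m³ alongside 40 to 45 ppb) can be reproduced with a short sketch. The Python below is illustrative only. The function names and sample records are hypothetical, and the conversion conditions (25 °C and 1 atm, giving a molar volume of about 24.45 L/mol) are an assumption, since the paper does not state which reference conditions were used.

```python
from collections import defaultdict

# Assumed reference conditions (not stated in the paper): 25 degC and 1 atm,
# where one mole of an ideal gas occupies about 24.45 L.
MOLAR_VOLUME_L = 24.45
O3_MOLAR_MASS_G = 48.00


def ppb_to_ug_m3(ppb, molar_mass=O3_MOLAR_MASS_G, molar_volume=MOLAR_VOLUME_L):
    """Convert a mixing ratio in ppb to a mass concentration in ug/m3."""
    return ppb * molar_mass / molar_volume


def diurnal_means(records):
    """Average concentrations by hour of day.

    records: iterable of (hour, o3_ppb) pairs with hour in 0..23, where 0 is
    midnight and 12 is noon. A value of None marks a missing observation and
    is skipped. Returns {hour: mean o3_ppb} for hours that have data.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, o3_ppb in records:
        if o3_ppb is None:
            continue
        totals[hour] += o3_ppb
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}


# Made-up observations for illustration only, not data from the stations.
sample = [(6, 20.0), (6, 24.0), (15, 44.0), (15, 46.0), (15, None)]
means = diurnal_means(sample)
peak_hour = max(means, key=means.get)
```

Under the assumed conditions, `ppb_to_ug_m3(40)` and `ppb_to_ug_m3(45)` round to 79 and 88, which matches the range reported in the text.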
The lowest hourly-averaged concentrations occur during the morning hours. In addition, much lower concentrations occur at the lower elevation station (Tomahawk) compared to Hightower Ridge. These trends are notable and their significance is explained below.

The distinctly different diurnal patterns in hourly average O3 concentrations at Hightower Ridge (high elevation rural site) and Tomahawk (lower elevation rural site) have been observed by others at high versus low elevation sites in the rural eastern United States.[5] It was reported that these patterns reflect the fact that the O3 concentration is strongly influenced by meteorological factors. Similar afternoon O3 concentrations were observed at both high and low elevation rural sites; however, much lower concentrations were observed at low elevation sites during the stable nighttime hours. A similar trend was observed here: much lower O3 concentrations occurred during morning hours at the lower elevation sites (Tomahawk and Violet Grove) compared to the higher elevation site (Hightower Ridge).

This trend arises because, at night, O3 is destroyed near the surface through physical contact with surface vegetation.[5] Since O3 formation and vertical convective mixing are inhibited at night, the O3 destroyed near the earth's surface is not replenished during the night hours. In the morning hours, with the onset of vertical convective mixing, O3 from the upper layers is delivered to the surface, resulting in an increase in the surface concentration. By 3:00 pm to 5:00 pm (in the case of WCAS monitoring stations), the intensity of the vertical mixing is such that there is no vertical gradient of ozone. During this time, monitors located at low and high elevation sites show similar hourly average values (i.e. 40 to 45 ppb). With the diminishing of convective mixing in the late afternoon, O3 destruction near the surface causes concentrations to decline once again.
At high elevation sites, O3 monitors are exposed to air masses that are not in contact with the surface, which explains the near-constant O3 values throughout the day.[5]

Figure 2. Hourly average O3 concentrations for Hightower Ridge station using data for the months of March through September over the years 2001 and 2002.
[Plot: O3 concentration (ppb) versus hour of the day]

Figure 3. Hourly average O3 concentrations for Tomahawk station using data for the months of March through September over the years 2001 and 2002.
[Plot: O3 concentration (ppb) versus hour of the day]

Hightower Ridge Station ANN Model

As a result of Step 3, the initial (baseline) ANN model developed for the Hightower Ridge station using all variables had a coefficient of determination (R²) value of 0.79 (Table 2). In identifying input variables that were important contributors to prediction of the output variable (Step 4), month of the year and relative humidity emerged as core variables. A number of variables were identified as being of secondary importance (Step 5): year, hour, SO2, NO2, NO, PM2.5, temperature, global solar radiation, pressure, wind direction, and wind direction deviation.

Table 2. Results of ANN modeling for Hightower Ridge and Tomahawk stations using data for the months of March through September over the years 2001 and 2002.

Initial (baseline) ANN model coefficient of determination (R²) value using all variables
    Hightower Ridge Station: 0.79
    Tomahawk Station: 0.81
Core variables
    Hightower Ridge Station: month, relative humidity
    Tomahawk Station: year, month, NO2 concentration, temperature, relative humidity, pressure
Variables of secondary importance
    Hightower Ridge Station: year, hour, SO2 concentration, NO2 concentration, NO concentration, PM2.5 concentration, temperature, global solar radiation, pressure, wind direction, wind direction deviation
    Tomahawk Station: none
Retrained ANN model coefficient of determination (R²) value using only core variables and variables of secondary importance
    Hightower Ridge Station: 0.81
    Tomahawk Station: 0.78

Relative humidity and temperature were observed to be important, and have been shown by others[6] to be important variables related to ground level O3 concentrations. The importance of hour of day is assumed to be an artifact of the diurnal variation observed in the hourly-average O3 concentration (Figure 2) and in meteorological parameters (e.g. temperature).

Atmospheric pressure values were related to O3 concentrations, although the model results did not indicate an obvious and strong relationship. As indicated previously, at certain times of the year and in special geographical settings there may be stratospheric intrusion of ozone down to ground level.[4] These occurrences are reported to be of short duration and typically associated with strong frontal passages or severe thunderstorms. These types of weather phenomena are associated with changes in atmospheric pressure.

Wind direction deviation was found to be an important input variable for the Hightower Ridge station. The annual average value of this parameter was 27° for the Hightower Ridge station, which falls in the category of unstable atmospheric conditions.[7,8] These turbulent conditions have been reported by others[9,10] to be a major cause of natural phenomena associated with O3 concentrations.

Global solar radiation emerged as an important input variable.
Global solar exposure would tend to be highest in the spring and early summer months, coincident with the position of the sun over the northern hemisphere. Springtime high O3 concentrations are a common occurrence in the northern hemisphere.[11]

It was unexpected that precursor and/or surrogate indicators of anthropogenic factors (i.e. air pollutant concentrations for NO, NO2, SO2, and PM2.5) were shown to be important input parameters. However, air pollutant concentrations at the Hightower Ridge station are much lower than those observed at Tomahawk (Table 1). As indicated previously, the Hightower Ridge station is located at an elevation of 1,500 m above sea level and is used as an indicator of regional background air quality for that area. Air pollutant concentrations for these parameters at the Tomahawk station are higher.

Upon closer examination of wind direction statistics (data not shown), it was observed that winds blow from westerly and south-westerly directions the majority of the time. These winds originate from wilderness and mountainous areas that lack major anthropogenic sources within the airshed. It is therefore suggested that some of the air pollutants related to O3 concentrations (e.g. at least SO2) come from air masses originating from the south and southwest outside of the airshed (e.g. transboundary origins). As hourly-average O3 concentrations were related to SO2, this indicates that at least some of the O3 has a transboundary origin (i.e. related to westerly and south-westerly directions).

The final step (Step 6) consisted of using the core and secondary input variables identified in Steps 4 and 5 and retraining the model to identify the optimum network architecture. The best (most efficient) performing model had an R² value of 0.81 when trained with 1,300 epochs and 15 hidden layer neurons.
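The input screening of Step 4 and the architecture search of Step 6 can be sketched generically. The Python below is a minimal illustration under stated assumptions, not the NeuroShell 2 procedure itself. `score_fn` and `train_fn` are hypothetical callables that stand in for training an ANN and returning its R² on held-out patterns, the R² formula is the usual 1 - SS_res/SS_tot definition (assumed here), and the toy scores are invented for demonstration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def screen_core_inputs(inputs, score_fn, threshold=0.03):
    """Step 4 sketch: remove one input at a time and flag those whose
    removal changes R2 by at least `threshold`.

    score_fn(list_of_inputs) is a hypothetical callable that trains a model
    on those inputs and returns its R2.
    """
    baseline = score_fn(list(inputs))
    core = []
    for name in inputs:
        reduced = [v for v in inputs if v != name]
        if abs(baseline - score_fn(reduced)) >= threshold:
            core.append(name)
    return core


def best_architecture(epoch_options, neuron_options, train_fn):
    """Step 6 sketch: try every (epochs, hidden neurons) pair and keep the
    one with the highest R2. train_fn(epochs, neurons) is hypothetical."""
    best = None
    for epochs in epoch_options:
        for neurons in neuron_options:
            score = train_fn(epochs, neurons)
            if best is None or score > best[0]:
                best = (score, epochs, neurons)
    return best


# Invented stand-ins for model training, for demonstration only.
TOY_CONTRIBUTION = {"month": 0.30, "relative_humidity": 0.25, "wind_speed": 0.01}


def toy_score(inputs):
    return 0.20 + sum(TOY_CONTRIBUTION[name] for name in inputs)


def toy_train(epochs, neurons):
    return 0.80 - abs(epochs - 200) / 10000 - abs(neurons - 8) / 100


core = screen_core_inputs(list(TOY_CONTRIBUTION), toy_score)
best = best_architecture([100, 200, 400], [4, 8, 16], toy_train)
```

With these toy inputs, `core` keeps only the two variables whose removal moves the score by more than 0.03, and `best` selects the (epochs, neurons) pair at which the toy score peaks. In practice each call to `score_fn` or `train_fn` would be a full training run evaluated on the production subset.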
The final model optimized for the best architecture was compared against actual output data to examine how well it could predict hourly-average O3 concentrations (Figure 4). This model was able to follow the highs and lows with a reasonable degree of accuracy.

Figure 4. Actual versus predicted ground level O3 concentrations for the Hightower Ridge station ANN model optimized for the best network architecture (R² = 0.81).
[Plot: actual and network-predicted O3 concentration (ppb) versus pattern number]

Tomahawk Station ANN Model

As a result of Step 3, the initial (baseline) ANN model developed for the Tomahawk station had a coefficient of determination (R²) value of 0.81 (Table 2). In identifying input variables that were important contributors to prediction of the output variable (Step 4), year, month, NO2 concentration, temperature, relative humidity, and pressure emerged as core variables for the Tomahawk station (Table 2). Based upon the results of Step 5, none of the input variables were identified as being of secondary importance.

Year emerged as an important variable. The reasons for this are unknown, as only two years' worth of data were evaluated. The inclusion of data from the months of March to September, together with the monthly variation observed in the concentration of O3 and in meteorological parameters (data not shown), may be the reason for the importance of month as an input variable.

As indicated previously, relative humidity and temperature were observed to be important variables related to ground level O3 concentrations.[6] The annual average relative humidity value for the Tomahawk station was 64%. Relative humidity values >60% were related to elevated O3 concentrations in urban areas of Edmonton and Calgary, Alberta.[6]

Atmospheric pressure values were related to O3 concentrations. These findings again indicate a possible relationship between weather phenomena (i.e. associated changes in atmospheric pressure) and ground level O3 concentrations at the Tomahawk station.

It was unknown whether indicators of anthropogenic factors (e.g. air pollutant concentrations for NO, NO2, SO2, and/or PM2.5) would be shown to be important input parameters. Air pollutant concentrations for these parameters at the Tomahawk station are higher than those observed at Hightower Ridge (refer to Table 1). Four coal-fired power plants and three natural gas processing plants exist within a 35-km radius of the Tomahawk station within the West Central Airshed Society zone. Only NO2 concentrations were related to O3 concentrations, which may indicate a more important role of anthropogenic activities at this station.

The final step (Step 6) consisted of using the core and secondary input variables identified in Steps 4 and 5 and retraining the model to identify the optimum network architecture. The best (most efficient) performing model had an R² value of 0.78 when trained with 500 epochs and 14 hidden layer neurons. The final model optimized for the best architecture was compared against actual output data to examine how well it could predict hourly-average O3 concentrations (Figure 5). This model was able to follow the highs and lows with a reasonable degree of accuracy.

Figure 5. Actual versus predicted ground level O3 concentrations for the Tomahawk station ANN model optimized for the best network architecture (R² = 0.78).
[Plot: actual and network-predicted O3 concentration (ppb) versus pattern number]

FINDINGS

Evaluation of hourly average O3 concentrations at two rural air monitoring stations in west central Alberta for the months of March through September found that peak averaged levels occurred between 3:00 pm and 5:00 pm in the range 79 to 88 µg/m³ (40 to 45 ppb).
The highest averaged O3 concentrations occurred at the higher elevation background station (Hightower Ridge). The lowest averaged ozone concentrations occurred during morning hours at each station. Much lower concentrations occurred at the lower elevation station (Tomahawk). This diurnal pattern is due to vertical convective mixing during daytime hours and the absence of this mixing during nighttime and early morning hours.

Meteorological conditions are the most important factors related to the behavior of O3 concentrations observed at these stations. This behavior was confirmed using ANN modeling. Temperature, relative humidity, and pressure were all related to ozone concentrations. This points strongly to natural phenomena as the main contributors to the presence of ground level ozone at these stations. Anthropogenic factors (precursor air pollutants originating from the activity of humans) were unimportant at the rural background station (Hightower Ridge) and much less important at Tomahawk.

REFERENCES

1. National Ambient Air Quality Objectives for Ground-level Ozone: Science Objectives and Guidelines, Health Canada: Ottawa, ON, 1999.
2. Review of National Ambient Air Quality Standards for Ozone: Assessment of Scientific and Technical Information, U.S. Environmental Protection Agency, U.S. Government Printing Office: Washington, D.C., 1996.
3. Burnett, R.T.; Brook, J.R.; Yung, W.T.; Dales, R.E. Environ. Res. 1997, 72, 24-31.
4. Guidance Document for the Management of Fine Particulate Matter and Ozone in Alberta, Clean Air Strategic Alliance: Edmonton, AB, 2003.
5. 1992 Regional Ozone Concentrations in the Northeastern United States, Northeast States for Coordinated Air Use Management: Boston, MA, 1993.
6. Application of Artificial Neural Network for Modeling Ground-level Ozone Concentrations in Calgary and Edmonton, Alberta, Su, H., Department of Civil and Environmental Engineering, University of Alberta: Edmonton, AB, 2004.
7. Guideline on Air Quality Models (revised), U.S. Environmental Protection Agency, U.S. Government Printing Office: Washington, D.C., 1986.
8. Mitchell, A.E.; Timbre, K.O. Atmospheric stability class from horizontal wind fluctuation. 72nd Ann. Meet. Air Pollut. Control Assoc., Cincinnati, OH, 1979.
9. Davies, T.D.; Schuepbach, E. Atmos. Environ. 1994, 28, 53-68.
10. Chung, Y.S.; Dann, T. Atmos. Environ. 1985, 19, 157-162.
11. Monks, P.S. Atmos. Environ. 2000, 34, 3545-3561.