Bayesian Modeling of Air Pollution Effects on Clinic Visits for Lower Respiratory Illness in Small Areas
J.S. Hwang and C.C. Chan
Academia Sinica and Taiwan University

Outline
- Environmental epidemiological studies of air pollution
- Study objective and small area design
- Environment and health data
- Statistical models
- Main findings
- Discussion

Epidemiological studies
Apply time-domain methods to demonstrate associations between air pollution and various health effects in single geographical areas. These studies share two common features:
1. They are mainly carried out in places with a large population, in order to collect sufficient daily events for time series analysis.
2. They aggregate data measured at several stations in a large area to represent population exposures.
Exposure misclassification is often compounded in these studies because spatial variation of individual exposure is typically not considered.

Possible solutions
- Create less heterogeneous exposures by clustering hospitals around a monitoring station, as suggested by Burnett et al.
  - Exposure attribution based on clustered hospitals remains a serious challenge because some hospitals are located as far as 200 km from any monitoring station.
- Known census clusters will provide exposure populations with smaller and more homogeneous regions (Zidek et al.).
  - Many important explanatory factors are either unmeasured or unavailable in all clusters.
  - Census areas are not equivalent to clinic catchment areas.
  - Daily outcomes in small census subdivisions are sparse when the health outcome is a serious illness.

The Study Design and Objective
Small Area Design
- Cluster clinics around a monitoring station to create a relatively homogeneous area of about 20 km².
- The population at risk in each area is the estimated service coverage of all clinics in that area.
- Population exposure is represented by measurements from the monitoring station.
- The health outcome is daily clinic visits for lower respiratory illness.

Objective
Use daily pollutant levels and clinic visit data for lower respiratory illness recorded in 50 small areas to estimate the health effects of air pollution.

Statistical Analysis
1. Estimate the population at risk for each area and convert daily clinic visit counts to daily clinic visit rates.
2. Phase I: Use linear models to model temporal patterns in order to obtain an estimated pollution-health effect for each area.
3. Phase II: Use Bayesian hierarchical models to combine the estimated pollution-health effects across the 50 communities.

The Data
Study communities
- 50 townships and city districts across the island
- Include rural, urban and industrial areas
- Population densities range from 250 to 28,000 persons/km²
Environmental variables
- 24-hour averages for NO2, SO2 and PM10
- Daily maximum O3 and 8-hour running average for CO
- Daily maximum temperature and average dew point

Clinic Visits
- The National Health Insurance (NHI) program covers medical services for about 96% of Taiwan's population of 23 million through a network of 16,122 contracted medical institutions.
- The large computerized clinic visit records contain the clinic's ID, township name, date of visit, patient's ID, gender, birthday, cause of visit, payments and other fields.
- One year of records (1998) from the 50 study communities is used.
- Clinic visits due to lower respiratory illness, such as acute bronchitis, acute bronchiolitis and pneumonia, are used as the health outcome.
- The population at risk is classified into 3 age groups: children (0-14), adults (15-64) and elderly (65+).

Population at Risk
- Define the population at risk for a selected community as those who would go to the clinics in that community whenever they need to make a medical visit, that is, the service coverage of all the clinics in the community.
- This includes some non-resident daytime workers who may visit clinics in the community, but excludes residents who prefer to use medical resources outside the community.

Data Summary
- Estimated population at risk ranged from 19,000 to 278,000.
- The averages of daily average NO2, SO2, PM10 and CO levels were 23.6 ppb, 5.4 ppb, 58.9 µg/m³ and 1.0 ppm, and the average of daily maximum O3 levels was 54.2 ppb.
- Yearly average temperature ranged from 24.6℃ to 30.6℃.
- The average daily rate of clinic visits due to lower respiratory illness was 1.34 per 1000.
- The average rates were 2.39, 0.88 and 1.02 per 1000 for the children, adult and elderly groups, respectively.

Population Estimation
- The problem is similar to estimating the number of unseen species in ecological studies, using only the numbers of individuals captured during a fixed interval of time.
- Use clinic visits due to all diseases recorded in the study communities during 1998 to estimate the population at risk.
- An individual's $x$ clinic visits in a community during one year are analogous to a species having $x$ members captured during one unit of time.
- For the species problem, the $x$ members are assumed unrelated, while one person's $x$ clinic visits are generally correlated.
- The assumption may still be satisfied when we count only the first of consecutive visits with the same diagnosis in a short time period.
- Let $n_x$ be the number of people having exactly $x$ clinic visits in a community during 1998.
- $\sum_{x \ge 1} n_x$ is the total number of different people who made at least one clinic visit in that community in 1998.
- The number of people who made no clinic visits in 1998 but would do so if they were later sick is $n_0$.
- Assume that all $n_0$ people will eventually get sick and visit one of the clinics in this community in the coming $t$ years.
- The expected number of $n_0$ is denoted by $\Delta(t)$ in the unseen species problem.
- Efron and Thisted (Biometrika, 1976) proposed
  $$\hat{\Delta}(t) = \sum_{x=1}^{x_0} h_x n_x, \qquad h_x = (-1)^{x+1}\, t^x \Pr(B \ge x),$$
  where $B$ is $\mathrm{Bin}(x_0,\, 1/(1+t))$.
- Ideally, one should choose an appropriate $t$ value to obtain a less biased population estimate without excess uncertainty.
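
To see how the estimate depends on $t$, the smoothed Efron and Thisted estimator can be sketched in a few lines of Python. This is only a minimal illustration, assuming the form $\hat{\Delta}(t) = \sum_{x=1}^{x_0} h_x n_x$ with $h_x = (-1)^{x+1} t^x \Pr(B \ge x)$ and $B \sim \mathrm{Bin}(x_0, 1/(1+t))$. The function names, the truncation point `x0=5` and the visit counts are hypothetical and are not taken from the study.

```python
from math import comb


def binom_upper_tail(n, p, x):
    """Return Pr(B >= x) for B ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def unseen_estimate(visit_counts, t, x0):
    """Smoothed Efron-Thisted estimate of new people expected in the next t periods.

    visit_counts maps x (number of visits, x >= 1) to n_x (people with exactly x visits).
    x0 is the truncation point of the sum (an assumed tuning value in this sketch).
    """
    p = 1.0 / (1.0 + t)
    total = 0.0
    for x in range(1, x0 + 1):
        # Alternating-sign weight, damped by the binomial tail probability.
        h_x = (-1) ** (x + 1) * t**x * binom_upper_tail(x0, p, x)
        total += h_x * visit_counts.get(x, 0)
    return total


# Hypothetical counts n_1, ..., n_5 for one community (not study data).
example_counts = {1: 12000, 2: 7000, 3: 4000, 4: 2500, 5: 1500}

for t in (1, 5):
    print(t, round(unseen_estimate(example_counts, t, x0=5)))
```

Adding the estimate to the number of people observed at least once gives an estimated population at risk for the community under these assumptions.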
- Our choice of $t = 5$ is based on the observations that
  - patients' medical-seeking behavior was stable under the NHI program, and
  - there were limited changes in the demographics of the study communities in the past six years in Taiwan.

Validity of the population estimator
- We estimated the number of people not recorded in the 1997 database who appeared in 1998.
- The mean absolute relative difference between the estimated number of additional subjects, $\hat{\Delta}(t = 1)$, and the new patients actually observed in 1998 was less than 2% across the study communities.

Phase I modeling
- Use the daily visit rate on the log scale, instead of the count, as the response variable.
- Daily series of rates for each sub-population, by area and age group, are modeled separately.
- Our models are general linear regressions with seasonal autoregressive moving average residual processes.
- The regression terms/confounding variables were chosen through extensive exploratory data analyses.

The Model:
$$\log(y_{iat}) = \gamma_{iah0} + \gamma_{iah1}\mathrm{SUN} + \gamma_{iah2}\mathrm{MON} + \gamma_{iah3}\mathrm{SAT} + \gamma_{iah4}\mathrm{SH} + \gamma_{iah5}\mathrm{WIN} + \gamma_{iah6}\mathrm{SUM} + \gamma_{iah7}\mathrm{TG32}_{it} + \gamma_{iah8}\mathrm{TL32}_{it} + \gamma_{iah9}\mathrm{TP3}_{it} + \gamma_{iah10}\mathrm{DEW}_{it} + \gamma_{iah11}\mathrm{DP3}_{it} + \beta_{iah}\,\mathrm{POLL}_{i,t-h} + W_{iaht},$$
where
- $y_{iat}$ is the observed clinic visit rate of the $a$th age group in the $i$th community on the $t$th day;
- $\mathrm{POLL}_{i,t-h}$ is the level of the pollutant on day $t-h$, where $t$ is the current day and $h$ ranges from 0 to 2;
- $\beta_{iah}$ is the pollution coefficient;
- $W_{iaht} \sim \mathrm{SARIMA}(1,0,0) \times (1,0,0)_7$.

Model Selection
- The model was examined at several communities, with a mean R-squared of 0.53 in fitting the data of all the sub-populations.
- Ideally, we would explore the data to find the best model for each combination of 5 air pollutants, 3 time lags and 4 age categories in each of the 50 locations.
- However, for reasons of efficiency, we apply this single regression model to all sub-populations in all 50 locations at this phase.

- Health impact is measured as the percentage increase in clinic visit rates that corresponds to a 10% increase in local air pollution levels.
- The percentage change is expressed by
  $$100\,\{\exp(0.1\,\bar{X}_{ih}\,\hat{\beta}_{iah}) - 1\},$$
  where $\hat{\beta}_{iah}$ is the estimated pollution coefficient for community $i$, age group $a$ and lag $h$, and $\bar{X}_{ih}$ is the corresponding average pollution level.
- The 95% confidence interval for the percentage change is constructed by replacing $\hat{\beta}_{iah}$ with $\hat{\beta}_{iah} \pm 2\hat{\sigma}_{iah}$, where $\hat{\sigma}_{iah}$ is the standard error.

Phase II modeling
The second phase of hierarchical modeling uses variables describing community characteristics and spatial dependency
- to modify the pollution coefficient estimate in each location, and
- to obtain an overall pollution coefficient estimate across multiple locations.

Three stages:
First, the 50 estimated pollution coefficients for a single pollutant, a fixed age group and a fixed time lag, denoted $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_{50})'$, are assumed to be multivariate normal, that is,
$$\hat{\beta} \sim N_{50}(\beta, \Sigma),$$
where $\beta = (\beta_1, \ldots, \beta_{50})'$, $\Sigma = \mathrm{diag}(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_{50}^2)$, and $\hat{\sigma}_i$ is the estimated standard error of $\hat{\beta}_i$.

Second, spatial variation among the 50 mean pollution coefficients is further modeled as
$$\beta_i = \alpha_0 + \alpha_1 Z_{1i} + \cdots + \alpha_q Z_{qi} + \varepsilon_i, \qquad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = \sigma^2 \exp\{-d_{ij}/R\},$$
where $d_{ij}$ is the Euclidean distance between the air monitoring stations for communities $i$ and $j$, and $R$ is a range parameter. For the current study, we use each community's population density, annual average temperature and annual levels of the 5 major pollutants to construct the regression terms
$$Z_i = [sPD,\ sT,\ sNO_2,\ sSO_2,\ sPM_{10},\ sO_3,\ sCO]_i.$$

The intercept $\alpha_0$ can be interpreted as an overall pollution coefficient for any location with mean predictors. The other coefficients, $\alpha_1, \ldots, \alpha_7$, reflect the modification or adjustment of the local pollution coefficient ($\beta_i$) by the location's population density, long-term average temperature and pollution levels. Based on empirical correlograms for the 50 estimated pollution coefficients, the range parameter $R$ is fixed at 5 km.
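
The spatial structure of the second stage can be illustrated with a short sketch that builds the covariance matrix $\sigma^2 \exp\{-d_{ij}/R\}$ from station coordinates, with $R$ defaulting to the 5 km used here. The function name, the coordinates and the value of `sigma2` are hypothetical and chosen only for illustration. In the model, $\sigma^2$ is a parameter with its own prior rather than a fixed input.

```python
import math


def exponential_cov(coords_km, sigma2, range_km=5.0):
    """Return the matrix sigma2 * exp(-d_ij / R) as a list of lists.

    coords_km holds (x, y) station positions in km; d_ij is Euclidean distance.
    """
    n = len(coords_km)
    return [
        [
            sigma2 * math.exp(-math.dist(coords_km[i], coords_km[j]) / range_km)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical station coordinates in km (not the study's monitoring stations).
stations = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
cov = exponential_cov(stations, sigma2=1.0)
for row in cov:
    print([round(v, 4) for v in row])
```

With a 5 km range, stations 5 km apart keep a correlation of $e^{-1}$, while stations 50 km apart are nearly uncorrelated, so spatial dependence in this form acts mainly between neighbouring communities.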

Third, we complete the hierarchical structure with a proper prior model for $\alpha$ and $\sigma^2$.
- We use conjugate priors: a normal prior $\alpha \sim N(\mu, C)$ and an inverse gamma prior $\sigma^2 \sim IG(a, b)$.
- The hyperparameters $\mu, C, a, b$ are chosen to reflect no information on $\alpha$ and $\sigma^2$.
- Bayesian inference is based on the posterior distribution of $\beta$, $\alpha$ and $\sigma^2$ given the Phase I estimates $\hat{\beta}$, $\hat{\sigma}$ and the specified hyperparameters.
- Samples from these posteriors can be obtained with an MCMC algorithm, or simply by using the BUGS software.

Results
From Phase I analysis
- Variation in clinic visits was likely related to variation in NO2, CO, SO2 and PM10 exposures.
- There was no significant pollution effect for ozone exposures.
- The association was significant for the current day but less significant at a 1-day lag in most of the 50 communities.
- There was significant intra-community and inter-community variability in the estimated percentage changes of clinic visit rates across the 50 communities.

[Figure 2: two panels, Lag0 and Lag1 (Phase I model for NO2 in all ages combined); axes: Area vs. % increase in clinic visit rate]
Figure 2. Effects of a 10% increase in nitrogen dioxide with 0-1 day lags on percentage change in clinic visit rates for lower respiratory illness among 50 communities, as estimated by the Phase I model.

From Phase II analysis
- The 95% posterior support intervals of the estimated overall pollution coefficient ($\alpha_0$) showed that clinic visits were related to NO2, CO, SO2 and PM10 exposures but not O3.
- An individual community's pollution coefficient for NO2 was negatively adjusted by long-term PM10 and O3 exposure.
- The acute effect of SO2 exposure was negatively adjusted by the community's population density and by long-term PM10 and SO2 exposure.
- The acute effect of SO2 exposure was positively affected by the community's annual CO and O3 concentrations.
- The acute effect of CO exposure was negatively affected by the community's population density and by long-term exposure to PM10 and O3.
- The acute effect of PM10 exposure was slightly affected, negatively by long-term exposure to PM10 and positively by long-term exposure to CO.

In summary, an area's annual PM10 level is a major effect modifier.
- The short-term effects of air pollution on lower respiratory illness would be lower in areas with a high PM10 average.
- Yearly averages of a community's NO2 and SO2 levels, however, had no significant influence on the acute effects of the 5 pollutants in the Phase II models.

[Figure 3: panels by pollutant; x-axis: Covariate (Overall, PD, Temp, NO2, SO2, PM10, O3, CO); y-axis: Coefficients]
Figure 3. Posterior means and 95% posterior support intervals of the overall true pollution coefficients and the coefficients of community covariates in the Phase II model.

[Figure 4: two panels; axes: Area (plus overall) vs. % increase in clinic visit rate]
Figure 4. Effects of a 10% increase in nitrogen dioxide with 0-1 day lags on percentage change in clinic visit rates for lower respiratory illness among 50 communities, as estimated by the Phase II model.
[Figure 4 panels: Lag0 and Lag1; model for NO2 in all ages combined; x-axis: Area]

Findings in summary
- NO2 had the greatest estimated percentage increases in daily clinic visit rates for a 10% increase in pollution levels.
- The pollution effects were always greatest for current-day exposures and decreased significantly as the exposure time lag increased, for NO2, CO, SO2 and PM10.
- The magnitudes of the pollution effects appeared to increase with age, the elderly being the most susceptible.
- The short-term effects of air pollution on lower respiratory illness would be lower in areas with a high PM10 average.

[Figure 5: two panels, Lag0 and Lag1; x-axis: pollutant (NO2, SO2, PM10, O3, CO) by age group (Chi., Adu., Eld., All); y-axis: Percent changes]
Figure 5. Estimated percent changes in clinic visit rates for a 10% increase in national air pollution levels with 0-1 day lags by ages and pollutants.

Discussion
Subject matter
- Few epidemiologic studies have related clinic visits for minor illness to ambient air pollution.
- Studies on minor health effects of air pollution should be encouraged, even though the major ongoing epidemiologic studies on air pollution currently concern mortality.
- From a scientific viewpoint, studies on minor health effects can strengthen consistency in the biological plausibility of the mortality effects of air pollution.
- From a public health viewpoint, a minor health effect usually affects a large population and can lead to death in susceptible populations.

Population at risk estimation is an important issue in environmental health studies.

Gaussian linear process for rates versus Poisson process for counts
- The linear predictors of these two models are the same except for one constant term, the population at risk on the log scale.
- A minor difference between these two models is the assumed variance structure.
- The Gaussian process provides flexible model selection, diagnostics and simplified computation.

High collinearity among air pollutants prevents us from using multi-pollutant models.

Joint tempo-spatial models can fit the multiple time series of rate data simultaneously. However, model selection and computation are challenges.

Some other challenging issues in epidemiologic studies of air pollution:
- Why did the exposure-response slopes for individual air pollutants vary significantly among study sites?
- Were the pollution effects due to a single pollutant or to mixtures of air pollutants?
- What was the relationship between chronic and acute exposure effects?