Air Pollution Effects on Daily Clinic Visits for Lower Respiratory Illness

Bayesian Modeling of Air Pollution Effects on Clinic Visits
for Lower Respiratory Illness in Small Areas
J.S. Hwang and C.C. Chan
Academia Sinica and Taiwan University
Outline
 Environmental epidemiological studies of air pollution
 Study objective and small area design
 Environment and health data
 Statistical models
 Main findings
 Discussion
Epidemiological studies
 Apply time-domain methods to demonstrate
associations between air pollution and various health
effects in single geographical areas.
 These studies share two common features:
1. They are mainly carried out in places with a large population, so that
enough daily events are collected for time series analysis.
2. They aggregate data measured at several stations in a
large area to represent population exposure.
 Exposure misclassification is often compounded in these
studies because spatial variation in individual
exposure is typically not considered.
 Possible solutions
 Create less heterogeneous exposures by clustering
hospitals around a monitoring station as suggested by
Burnett et al.

Exposure attribution based on clustered hospitals
remains a serious challenge because some hospitals
are located as far as 200 km away from any monitoring
station.
 Using known census clusters defines exposure populations
over smaller and more homogeneous regions (Zidek et
al.).

Many important explanatory factors are either
unmeasured or unavailable in all clusters.

Census areas are not equivalent to clinic catchment
areas.

Daily outcomes in small census subdivisions are sparse
when the health outcome is a serious illness.
The Study Design and Objective
 Small Area Design
 Cluster clinics around a monitoring station to create
a relatively homogeneous area of about 20 km2.
 Population at risk of each area is the estimated service
coverage of all clinics in that area.
 Population exposure is represented by measurements from
the monitoring station.
 Health outcome is daily clinic visit for lower respiratory
illness.
 Objective
 Use daily pollutant levels and daily clinic visits for lower
respiratory illness recorded in 50 small areas to
estimate the health effects of air pollution.
 Statistical Analysis
1. Estimate population at risk for each area and convert daily
clinic visit counts to daily clinic visit rates.
2. Phase I: Use linear models to model temporal patterns in
order to obtain estimated pollution-health effect for each
area.
3. Phase II: Use Bayesian hierarchical models to combine
the estimated pollution-health effects across the 50
communities.
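Step 1 of the analysis plan above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name, the per-1000 scaling argument, and the treatment of zero-visit days (returned as None because the logarithm is undefined there) are assumptions made for this sketch.

```python
import math

def daily_log_rates(counts, population_at_risk, per=1000):
    """Convert daily visit counts to rates per `per` people and to log rates."""
    rates = [per * c / population_at_risk for c in counts]
    # Assumption of this sketch: days with zero visits have no log rate.
    log_rates = [math.log(r) if r > 0 else None for r in rates]
    return rates, log_rates

# Hypothetical counts for three days in an area with 30,000 people at risk.
rates, log_rates = daily_log_rates([30, 45, 0], population_at_risk=30000)
```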
The Data
 Study communities
 50 townships and city districts across the island
Include rural, urban and industrial areas
Population densities range from 250 to 28,000
persons/km2
 Environmental variables
 24-hour average for NO2, SO2 and PM10
 Daily maximum O3 and 8-hour running average for CO
 Daily maximum temperature and average dew point
 Clinic Visits
 The National Health Insurance (NHI) program covers medical services
for about 96% of Taiwan's population of 23 million through a network of
16,122 contracted medical institutions.
 The extensive computerized clinic visit records contain the clinic's ID,
township name, date of visit, patient's ID, gender, birthday,
cause of visit, payments and other fields.
 One-year records from the 50 study communities in 1998.
 Clinic visits due to lower respiratory illness, such as acute
bronchitis, acute bronchiolitis, and pneumonia, are used as
the health outcome.
 Classify the population at risk into 3 age groups: children
(0-14), adults (15-64) and elderly (65+).
Population at Risk
 Define population at risk for a selected community as
those who would go to the clinics in the community
whenever they need to make medical visits, which is
the service coverage of all the clinics in the community.
 Include some non-resident daytime workers who may
visit clinic in the community, but exclude residents who
prefer to use medical resources outside the community.
Data Summary
 Estimated population at risk ranged from 19,000 to 278,000.
 The averages of daily average NO2, SO2, PM10, and CO
levels were 23.6 ppb, 5.4 ppb, 58.9 µg/m3, and 1.0 ppm, respectively, and
the average daily maximum O3 level was 54.2 ppb.
 Yearly average temperature ranged from 24.6℃ to 30.6℃.
 The average of daily rates of clinic visits due to lower
respiratory illness was 1.34 per 1000.
 The average rates are 2.39, 0.88 and 1.02 per 1000 for the
children, adults and elderly groups, respectively.
Population Estimation
 Similar to estimating the number of unseen species in
ecological studies, using only the numbers of individuals
captured during a fixed interval of time.
 Use clinic visits due to all diseases recorded in the study
communities during 1998 to estimate population at risk.
 An individual making x clinic visits in a community
during one year is analogous to a species having x
members captured during one unit of time.
 For the species problem, the x members are assumed
unrelated, whereas one person's x clinic visits are generally
correlated.
 The assumption may still be approximately satisfied if we count only the
first visit among consecutive visits with the same diagnosis in a
short time period.
 Let $n_x$ be the number of people having exactly x clinic
visits in a community during 1998.
 $\sum_{x \ge 1} n_x$ is the total number of different people who
made at least one clinic visit in that community in 1998.
 The number of people who made no clinic visits in 1998
but would do so if they were later sick is $n_0$.
 Assume that all $n_0$ people will eventually get sick and
visit one of the clinics in this community in the coming t
years.
 The expected number of $n_0$ is denoted by $\Delta(t)$ in the
unseen species problem.
 Efron and Thisted (Biometrika, 1976) proposed
$$\hat{\Delta}(t) = \sum_{x=1}^{x_0} h_x n_x$$
with $h_x = (-1)^{x+1} t^x \Pr(B \ge x)$, where $B$ is $\mathrm{Bin}(x_0, 1/(1+t))$.
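The estimator above can be written as a short Python function. This is a sketch under stated assumptions rather than the authors' implementation: the function name, the dictionary input format, and the truncation point x0 used in the example are hypothetical, and the visit frequencies are made-up numbers.

```python
from math import comb

def efron_thisted(n, t, x0):
    """Estimate the number of unseen individuals expected to appear within t time units.

    n  : dict mapping x -> n_x, the number of people with exactly x visits
    t  : extrapolation horizon in units of the observation period
    x0 : truncation point of the sum (a tuning choice, hypothetical here)
    """
    p = 1.0 / (1.0 + t)
    # Probability mass function of B ~ Bin(x0, 1/(1+t)).
    pmf = [comb(x0, k) * p**k * (1.0 - p) ** (x0 - k) for k in range(x0 + 1)]
    total = 0.0
    for x in range(1, x0 + 1):
        tail = sum(pmf[x:])                      # Pr(B >= x)
        h_x = (-1) ** (x + 1) * t**x * tail      # alternating-sign weight
        total += h_x * n.get(x, 0)
    return total

# Made-up frequencies: 100 people with one visit, 40 people with two visits.
estimate = efron_thisted({1: 100, 2: 40}, t=1, x0=2)
```

The binomial tail probabilities damp the higher-order terms, which would otherwise grow like t^x for t greater than 1.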
 Ideally, one should choose an appropriate t value to
obtain less biased population estimation without excess
uncertainty.
 Our choice of t = 5 is based on the observations that
 Patients' care-seeking behavior was stable under the
NHI program
 There were limited changes in the demographics of the study communities
in the past six years in Taiwan.
 Validity of the population estimator
 We estimated the number of people not recorded in the
1997 database who appeared in 1998.
 The mean absolute relative difference between the
estimated number of additional subjects, $\hat{\Delta}(t = 1)$, and the number of
new patients actually observed in 1998 was less than 2% across
study communities.
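The validation summary above can be computed as in the sketch below. The function name is hypothetical, the numbers are invented for illustration, and taking the observed count as the denominator of the relative difference is an assumption, since the slides do not state which quantity was used.

```python
def mean_abs_relative_difference(estimated, observed):
    """Average of |estimated - observed| / observed over communities."""
    pairs = list(zip(estimated, observed))
    return sum(abs(e - o) / o for e, o in pairs) / len(pairs)

# Invented example: two communities, estimated vs. actually observed new patients.
score = mean_abs_relative_difference([102, 196], [100, 200])
```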
Phase I modeling
 Use daily visit rate in log scale instead of count as
response variable.
 Daily series of rates for each sub-population by area
and age group are modeled separately.
 Our models are general linear regressions with
seasonal autoregressive moving average residual
processes.
 The regression terms/confounding variables were
chosen through extensive exploratory data analyses.
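The Model:
$$\begin{aligned}
\log(y_{iat}) ={}& \alpha_{iah0} + \alpha_{iah1}\,\mathrm{SUN} + \alpha_{iah2}\,\mathrm{MON} + \alpha_{iah3}\,\mathrm{SAT} + \alpha_{iah4}\,\mathrm{SH} \\
&+ \alpha_{iah5}\,\mathrm{WIN} + \alpha_{iah6}\,\mathrm{SUM} + \alpha_{iah7}\,\mathrm{TG32}_{it} + \alpha_{iah8}\,\mathrm{TL32}_{it} \\
&+ \alpha_{iah9}\,\mathrm{TP3}_{it} + \alpha_{iah10}\,\mathrm{DEW}_{it} + \alpha_{iah11}\,\mathrm{DP3}_{it} \\
&+ \beta_{iah}\,\mathrm{POLL}_{i,t-h} + W_{iaht},
\end{aligned}$$
where $y_{iat}$ is the observed clinic visit rate of the a-th age
group in the i-th community on the t-th day.
 $\mathrm{POLL}_{i,t-h}$ is the level of pollutant at day $t - h$, where t is
the current day and h ranges from 0 to 2.
 $\beta_{iah}$ is the pollution coefficient.
 $W_{iaht} \sim \mathrm{SARIMA}(1,0,0) \times (1,0,0)_7$.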
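For readers less familiar with the seasonal notation, the residual process above can be expanded as follows. The symbols for the autoregressive parameters, the innovation, and the backshift operator are notation assumed for this sketch and do not appear in the slides.

```latex
% B is the backshift operator, B W_t = W_{t-1}.
% \phi (lag-1 AR), \Phi (weekly AR) and e_{iaht} (white noise) are assumed notation.
(1 - \phi B)(1 - \Phi B^{7})\, W_{iaht} = e_{iaht}
\quad\Longleftrightarrow\quad
W_{iaht} = \phi\, W_{iah,t-1} + \Phi\, W_{iah,t-7} - \phi\Phi\, W_{iah,t-8} + e_{iaht}
```

The residual on a given day therefore depends on the previous day, on the same weekday one week earlier, and on their interaction at lag 8.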
 Model Selection
 The model was examined in several communities, with a
mean R-squared of 0.53 in fitting the data of all the
sub-populations.
 Ideally, we can explore the data to find the best models
for each setting of the combination of 5 air pollutants, 3
time lags, and 4 age categories in all 50 locations,
respectively.
 However, because of efficiency considerations we apply
this single regression model to all sub-populations in all
50 locations at this phase.
 Health impact is measured as the percentage increase
in clinic visit rates that corresponds to a 10% increase
in local air pollution levels.
 The percentage change is expressed by
$100\{\exp(0.1\, X_{ih} \hat{\beta}_{iah}) - 1\}$, where $\hat{\beta}_{iah}$ is the estimated
pollution coefficient for community i, age group a, and
lag h, and $X_{ih}$ is the corresponding average pollution
level.
 The 95% confidence interval for the percentage change
is constructed by replacing $\hat{\beta}_{iah}$ with $\hat{\beta}_{iah} \pm 2\hat{\sigma}_{iah}$, where
$\hat{\sigma}_{iah}$ is the standard error.
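The percentage change and its approximate 95% interval can be computed as in this sketch. The function and argument names are hypothetical, and the coefficient and standard error in the example are invented; only the average NO2 level of 23.6 ppb comes from the data summary.

```python
import math

def percent_change(beta_hat, se, mean_level, rel_increase=0.10):
    """Percent change in visit rate for a relative increase in the pollutant level.

    Returns (estimate, lower, upper), where the bounds plug in
    beta_hat - 2*se and beta_hat + 2*se.
    """
    def pct(b):
        return 100.0 * (math.exp(rel_increase * mean_level * b) - 1.0)
    return pct(beta_hat), pct(beta_hat - 2.0 * se), pct(beta_hat + 2.0 * se)

# Invented coefficient and standard error for a pollutant averaging 23.6 ppb.
est, lower, upper = percent_change(beta_hat=0.004, se=0.001, mean_level=23.6)
```

Because the exponential is monotone, transforming the two endpoints of the coefficient interval gives the endpoints of the interval for the percentage change.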
Phase II modeling
 The second phase of hierarchical modeling uses
variables describing each community's characteristics and spatial
dependency
 To modify the pollution coefficient estimate in each location,
 To obtain an overall pollution coefficient estimate across
multiple locations.
 Three stages:
 First, the 50 estimated pollution coefficients for a single
pollutant, a fixed age group and time lag, denoted as
$\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_{50})'$, are assumed to be multivariate normal,
that is
$$\hat{\beta} \sim N_{50}(\beta, \Sigma),$$
where $\beta = (\beta_1, \ldots, \beta_{50})'$ and $\Sigma = \mathrm{diag}(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_{50}^2)$, and $\hat{\sigma}_i$ is
the estimated standard error of $\hat{\beta}_i$.
 Second, spatial variation among the 50 mean pollution
coefficients is further modeled as follows.
$$\beta_i = \gamma_0 + \gamma_1 Z_{1i} + \cdots + \gamma_q Z_{qi} + \epsilon_i,$$
$$\mathrm{Cov}(\epsilon_i, \epsilon_j) = \sigma^2 \exp\{-d_{ij}/R\},$$
where $d_{ij}$ is the Euclidean distance between the air monitoring
stations for communities i and j, and R is a range parameter.
For the current study, we use each community's population
density, annual average temperature, and annual levels
of the 5 major pollutants to construct the regression terms
$$Z_i = [\mathrm{sPD}, \mathrm{sT}, \mathrm{sNO_2}, \mathrm{sSO_2}, \mathrm{sPM_{10}}, \mathrm{sO_3}, \mathrm{sCO}]_i.$$
The intercept $\gamma_0$ can be interpreted as an overall
pollution coefficient for any location with mean predictors.
The other coefficients, $\gamma_1, \ldots, \gamma_7$, reflect the modification or
adjustment of the local pollution coefficient ($\beta_i$) by the
location's population density, long-term average
temperature and pollution levels.
Based on empirical correlograms for the 50 estimated
pollution coefficients, the range parameter R is fixed at 5
km.
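The exponentially decaying spatial dependence with the range fixed at 5 km can be illustrated with a few lines of NumPy. This is only a sketch: the function name, the planar station coordinates in km, and the variance value are assumptions made for the example.

```python
import numpy as np

def exponential_covariance(coords_km, sigma2, range_km=5.0):
    """Covariance matrix sigma2 * exp(-d_ij / R) from station coordinates in km."""
    coords = np.asarray(coords_km, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    return sigma2 * np.exp(-dist / range_km)

# Hypothetical stations: the first two are 5 km apart, the third is 10 km from the first.
cov = exponential_covariance([[0.0, 0.0], [3.0, 4.0], [0.0, 10.0]], sigma2=2.0)
```

With R = 5 km, two stations 5 km apart have correlation exp(-1), about 0.37, and the dependence is negligible beyond a few tens of kilometres.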
 Third, we complete the hierarchical structure with a proper
prior model for $\gamma$ and $\sigma^2$.
We use conjugate priors, a normal prior $\gamma \sim N(\mu, C)$ and
an inverse gamma prior $\sigma^2 \sim IG(a, b)$, in our model.
The hyperparameters $\mu, C, a, b$ in our model are chosen
to reflect no information on $\gamma$ and $\sigma^2$.
 The Bayesian inference is based on the posterior
distribution of $\beta$, $\gamma$ and $\sigma^2$ given the Phase I estimates
$\hat{\beta}$, $\hat{\sigma}$ and the specified hyperparameters.
 Samples from these posteriors can be obtained with an
MCMC algorithm, or simply with the BUGS software.
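One possible MCMC scheme for the three-stage model is a Gibbs sampler that cycles through the full conditionals of the true coefficients, the regression coefficients, and the spatial variance. The sketch below is an illustration under stated assumptions and not the authors' implementation (the slides point to BUGS): the function and variable names are hypothetical, and the vague hyperparameters (prior mean 0, prior covariance 10^6 times the identity, a = b = 0.01) are example values chosen here.

```python
import numpy as np

def draw_mvn(rng, mean, cov):
    """Draw from N(mean, cov) via a Cholesky factor of the symmetrized covariance."""
    chol = np.linalg.cholesky((cov + cov.T) / 2.0)
    return mean + chol @ rng.standard_normal(len(mean))

def gibbs_phase2(beta_hat, se, Z, D, R=5.0, a=0.01, b=0.01,
                 prior_prec=1e-6, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for beta_hat ~ N(beta, diag(se^2)),
    beta ~ N(Z coef, sigma2 * exp(-D/R)), coef ~ N(0, I/prior_prec), sigma2 ~ IG(a, b).
    Returns draws of (coef, sigma2) after burn-in, one row per iteration."""
    rng = np.random.default_rng(seed)
    n, q = Z.shape
    H_inv = np.linalg.inv(np.exp(-D / R))      # inverse spatial correlation matrix
    S_inv = np.diag(1.0 / se ** 2)             # Phase I precisions
    C_inv = prior_prec * np.eye(q)             # vague normal prior with mean zero
    coef, sigma2 = np.zeros(q), 1.0
    draws = []
    for it in range(n_iter):
        # 1) true coefficients given everything else
        V_b = np.linalg.inv(S_inv + H_inv / sigma2)
        m_b = V_b @ (S_inv @ beta_hat + H_inv @ (Z @ coef) / sigma2)
        beta = draw_mvn(rng, m_b, V_b)
        # 2) regression coefficients given beta and sigma2
        V_c = np.linalg.inv(Z.T @ H_inv @ Z / sigma2 + C_inv)
        m_c = V_c @ (Z.T @ H_inv @ beta / sigma2)
        coef = draw_mvn(rng, m_c, V_c)
        # 3) spatial variance: inverse gamma, drawn as 1 / Gamma(shape, scale = 1 / rate)
        resid = beta - Z @ coef
        rate = b + 0.5 * resid @ H_inv @ resid
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / rate)
        if it >= burn:
            draws.append(np.concatenate([coef, [sigma2]]))
    return np.array(draws)
```

The first column of the returned array holds draws of the intercept, so its mean and 2.5% and 97.5% quantiles give a posterior mean and a 95% posterior interval for the overall pollution coefficient.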
Results
 From Phase I analysis
 Variation in clinic visits was likely related to variation in NO2,
CO, SO2 and PM10 exposures.
 There was no significant pollution effect for ozone
exposures.
 Significant association was seen at current day but less
significant at 1-day lag among most of these 50
communities.
 Significant intra-community and inter-community variability
in the estimated percentage changes of clinic visit rates
across 50 communities.
Figure 2. Effects of a 10% increase in nitrogen dioxide with 0-1 day lags on
percentage change in clinic visit rates for lower respiratory illness among 50
communities, as estimated by the Phase I model.
[Figure: Phase I model for NO2 in all ages combined. Panels: Lag0 and Lag1. Axes: Area and % increase in clinic visit rate.]
 From Phase II analysis
 The 95% posterior support intervals of the estimated overall
pollution coefficient ($\gamma_0$) showed that clinic visits were
related to NO2, CO, SO2 and PM10 exposures but not O3.
 An individual community's pollution coefficient for NO2 was
negatively adjusted by long-term PM10 and O3 exposure.
 The acute effect of SO2 exposure was negatively adjusted
by the community's population density and long-term PM10 and SO2
exposure.
 The acute effect of SO2 exposure was positively affected
by the community's annual CO and O3 concentrations.
 The acute effect of CO exposure was negatively affected by
the community's population density and long-term exposure to PM10
and O3.
 The acute effect of PM10 exposure was slightly affected,
negatively by long-term exposure to PM10 and positively by long-term
exposure to CO.
 In summary, an area's annual PM10 level is a major effect
modifier. The short-term effects of air pollution on lower
respiratory illness would be lower in areas with a high average PM10
level.
 Yearly averages of community’s NO2 and SO2 levels,
however, had no significant influence on the acute effects of
the 5 pollutants in the Phase II models.
Figure 3. Posterior means and 95% posterior support intervals of the overall
true pollution coefficients and the coefficients of community covariates in the
Phase II model.
[Figure: panels by pollutant. Axes: Covariate (Overall, PD, Temp, NO2, SO2, PM10, O3, CO) and Coefficients.]
Figure 4. Effects of a 10% increase in nitrogen dioxide with 0-1 day lags on
percentage change in clinic visit rates for lower respiratory illness among 50
communities, as estimated by the Phase II model.
[Figure: Model for NO2 in all ages combined. Panels: Lag0 and Lag1. Axes: Area (individual communities and overall) and % increase in clinic visit rate.]
 Findings in summary
 NO2 had the greatest estimated percentage increases in
daily clinic visit rates for a 10% increase in pollution levels.
 The pollution effects were always the greatest for
current-day exposures and decreased significantly as
exposure time lags increased for NO2, CO, SO2 and PM10.
 The magnitudes of pollution effects appeared to increase
with age, the elderly being the most susceptible.
 The short-term effects of air pollution on lower respiratory
illness would be lower in areas with a large PM10 average.
Figure 5. Estimated percent changes in clinic visit rates for a 10% increase in
national air pollution levels with 0-1 day lags by ages and pollutants.
[Figure: panels Lag0 and Lag1. Pollutants: NO2, SO2, PM10, O3, CO. Age groups: Chi., Adu., Eld., All. Axis: Percent changes.]
Discussion
 Subject matter
 Few epidemiologic studies have related clinic visits for minor
illness to ambient air pollution.
 Studies on minor health effects of air pollution should be
encouraged, even though most major ongoing
epidemiologic studies of air pollution concern mortality.
 From a scientific viewpoint, studies on minor health
effects can strengthen the consistency and biological
plausibility of the mortality effects of air pollution.
 From a public health viewpoint, a minor health effect usually
affects a large population and can lead to
death among susceptible groups.
 Population at risk estimation is an important issue in
environmental health studies.
 Gaussian linear process for rates versus Poisson process
for counts
 The linear predictors of these two models are the same
except for one constant term, the population at risk in log scale;
 A minor difference between these two models is the
assumed variance structure;
 Gaussian process provides us with flexible model
selection, diagnostics and simplified computation.
 High collinearity among air pollutants prevents us from
using multi-pollutant models.
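The comparison between the Gaussian model for rates and the Poisson model for counts made above can be written out as a worked equation. The symbols for counts, population at risk, covariates, and coefficients are notation assumed for this sketch.

```latex
% Poisson model for daily counts c_{it}, with the population at risk N_i as an offset
\log E(c_{it}) = \log N_i + x_{it}'\theta, \qquad \mathrm{Var}(c_{it}) = E(c_{it})

% Gaussian model for the log of the daily rate c_{it} / N_i
\log(c_{it} / N_i) = \log(c_{it}) - \log N_i = x_{it}'\theta + W_{it}
```

Moving the log population term to the left-hand side leaves the same linear predictor in both models, so the remaining difference lies in the variance structure: equal to the mean for the Poisson model, and governed by the residual process on the log scale for the Gaussian model.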
 Joint spatio-temporal models can fit the multiple time series
of rate data simultaneously. However, model selection
and computation are challenging.
 Some other challenging issues in epidemiologic studies
of air pollution:
 Why do the exposure-response slopes for individual air
pollutants vary significantly among study sites?
 Are the pollution effects due to a single pollutant or to
mixtures of air pollutants?
 What is the relationship between chronic and acute
exposure effects?