This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2008, The Johns Hopkins University and Francesca Dominici. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
How Risky is Breathing?
Statistical Methods in Air Pollution
Risk Estimation
Francesca Dominici
Department of Biostatistics
Bloomberg School of Public Health
Johns Hopkins University
From crisis to questions
• We began with a crisis, the
London fog of 1952, and have
moved to questions:
– Are there adverse effects of
today’s air pollution?
– How large are these risks?
Particulate levels: 3,000 µg/m³

This image has been deleted because JHSPH OpenCourseWare
was not able to secure permission for its use.
December 5 1952: London's Piccadilly Circus at midday
Maureen Scholes, a nurse at the Royal London Hospital in 1952,
says the smog penetrated through clothes, blackening undergarments
4,000 deaths the first week
8,000 over the next 2 months
Source: Royal London Hospital Archives and Museum
Designer Smog Masks: London, 1950s
Davis When Smoke Ran Like Water (2002)
Air pollution and mortality:
Then and now
London, December, 1952
Mortality and PM10 in Chicago, 2000
"London "Killer" Smog of 1952" from Environmental Health. Available at: http://ocw.jhsph.edu. Copyright © Johns
Hopkins Bloomberg School of Public Health. Creative Commons BY-NC-SA. Adapted from Turco, R. P.
Air pollution and health:
Fundamental questions
Is there a risk at current levels?
Yes
How can we estimate it?
By integrating national data sets
and developing methods to analyze
them
How big is the risk?
The risk is very small but everyone is
exposed!
What causes it?
???
Bad air day?
Chicago PM2.5 = 10 µg/m³
Bad air day?
Chicago PM2.5 = 20 µg/m³
Bad air day?
Chicago PM2.5 = 30 µg/m³
Standard setting process in the US is evidence-based
National Data Sets
National Morbidity Mortality Air
Pollution Study
• Collected data from the 100 largest cities in the
United States
– Daily mortality
– Daily temperature
– Daily level of PM10
• Long time series
– 1987 to 2000
The National Medicare Cohort Study,
1999-2005 (MCAPS)
• Medicare data include:
– Billing claims for everyone over 65
enrolled in Medicare (~48 million
people):
• date of service
• treatment, disease (ICD-9)
• age, gender, and race
• place of residence (zip code)
• Approximately 204 counties linked
to the PM2.5 monitoring network
MCAPS study population: 204 counties with populations larger
than 200,000 (11.5 million people)
Please visit www.biostat.jhsph.edu/MCAPS for maps and other
MCAPS information
Daily time series of hospitalization rates and PM2.5
levels in Los Angeles county (1999-2005)
Please visit www.biostat.jhsph.edu/MCAPS for maps and other
MCAPS information
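The step from billing claims to a daily time series can be sketched as follows. This is only an illustration: the record layout, the `daily_series` helper, the county population, the PM2.5 values, and the choice of ICD-9 codes (410, acute myocardial infarction; 428, heart failure) are all assumptions made for the example, not the MCAPS data model.

```python
from collections import Counter
from datetime import date

# Hypothetical claim records (date of service, county of residence, ICD-9 code).
claims = [
    {"date": date(2002, 7, 1), "county": "Los Angeles", "icd9": "428"},
    {"date": date(2002, 7, 1), "county": "Los Angeles", "icd9": "410"},
    {"date": date(2002, 7, 1), "county": "Los Angeles", "icd9": "486"},
    {"date": date(2002, 7, 2), "county": "Los Angeles", "icd9": "428"},
]
at_risk = {"Los Angeles": 1_000_000}  # made-up number of enrollees at risk
pm25 = {                              # made-up daily PM2.5 levels in ug/m3
    ("Los Angeles", date(2002, 7, 1)): 22.0,
    ("Los Angeles", date(2002, 7, 2)): 18.5,
}
CARDIO_CODES = {"410", "428"}         # illustrative subset of cardiovascular codes


def daily_series(claims, codes, at_risk, pm25):
    """Count matching admissions per county-day and link them to PM2.5."""
    counts = Counter((c["county"], c["date"]) for c in claims if c["icd9"] in codes)
    rows = []
    for county, day in sorted(pm25):
        n = counts.get((county, day), 0)
        rows.append({
            "county": county,
            "date": day,
            "admissions": n,
            "rate_per_100k": 1e5 * n / at_risk[county],
            "pm25": pm25[(county, day)],
        })
    return rows


series = daily_series(claims, CARDIO_CODES, at_risk, pm25)
for row in series:
    print(row)
```

Days with a PM2.5 reading but no matching claim get a count of zero, which matters for a count model fitted later.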
Statistical Ideas
3 Statistical Ideas for
Analysis of Observational
Studies
1. Adjusting for confounding
– Semi-Parametric Regression
2. Combining health risk estimates
across counties
– Bayesian Hierarchical Models
3. Accounting for the uncertainty in the
selection of the statistical model
– Model averaging for confounding
adjustment
Statistical Methods for multi-site
time series studies
• Compare day-to-day variations in
hospital admission rates with day-to-day
variations in pollution levels
within the same community
• Avoid problem of unmeasured
differences among populations
• Key confounders: seasonal effects of
infectious diseases and weather
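Why seasonal confounding matters for this design can be seen in a small synthetic example. The sketch below is an assumption-laden illustration: both series are invented, and a centered 29-day moving average stands in for the smooth functions of time used in the actual regressions. Pollution and admissions share a short-term signal but have opposite seasonal cycles, so the raw correlation has the wrong sign until the slow variation is removed.

```python
import math


def centered_moving_average(series, half_width):
    """Mean over a centered window, returned only where the window fits."""
    width = 2 * half_width + 1
    return [sum(series[t - half_width:t + half_width + 1]) / width
            for t in range(half_width, len(series) - half_width)]


def short_term(series, half_width=14):
    """Day-to-day variation: the series minus its slow-moving trend."""
    trend = centered_moving_average(series, half_width)
    core = series[half_width:len(series) - half_width]
    return [x - m for x, m in zip(core, trend)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Invented two-year daily series.
days = range(730)
season = [math.cos(2 * math.pi * t / 365) for t in days]   # peaks in winter
wiggle = [(2 * t) % 13 - 6 for t in days]                   # short-term signal
pollution = [30 - 10 * s + w for s, w in zip(season, wiggle)]
admissions = [100 + 20 * s + 0.5 * w for s, w in zip(season, wiggle)]

raw_corr = pearson(pollution, admissions)
adj_corr = pearson(short_term(pollution), short_term(admissions))
print(f"raw correlation:       {raw_corr:.2f}")
print(f"after removing season: {adj_corr:.2f}")
```

The raw correlation is negative because the seasonal cycles dominate, while the day-to-day variations are strongly positively related.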
Statistical Methods
Within city: Semi-parametric regressions
for estimating associations between day-to-day
variations in air pollution and
mortality controlling for confounding
factors
Across cities: Hierarchical Models for
estimating:
– national-average relative rate
– exploring heterogeneity of air pollution
effects across the country
Dominici Samet Zeger JRSSA 2000
Confounding
• The association between air
pollution and mortality is potentially
confounded by:
– Weather
– Other pollutants
– Seasonality
– Long-term trend
1) Semi-parametric regression model
for estimating health risk within a
county
$$\log E[Y_t^c] = \log N_t^c + \beta^c x_t^c + s(\mathrm{temp}) + s(\mathrm{time})$$
– $Y_t^c$: # of adverse events on day t
– $N_t^c$: # of people at risk on day t
– $x_t^c$: air pollution series
– $\beta^c$: health risk
– $s(\mathrm{temp})$, $s(\mathrm{time})$: time varying confounders
(weather variables, seasonality)
Kelsall Samet Zeger Xu AJE 1997
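A minimal sketch of such a within-county fit, assuming a log-linear Poisson regression with the log of the population at risk as an offset. To stay dependency-free it swaps the smooth functions s(·) for a single annual sine/cosine pair, and it fits noise-free synthetic counts so that the known coefficient is recovered. The data, the helper names `solve` and `fit_poisson`, and all parameter values are illustrative assumptions, not the published analysis.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_poisson(y, X, log_n, iters=25):
    """Newton-Raphson fit of log E[y_t] = log_n_t + X_t . coef (column 0 = intercept)."""
    p = len(X[0])
    coef = [math.log(sum(y) / sum(math.exp(o) for o in log_n))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(o + sum(c * v for c, v in zip(coef, row)))
              for o, row in zip(log_n, X)]
        grad = [sum(row[j] * (yt - m) for row, yt, m in zip(X, y, mu))
                for j in range(p)]
        hess = [[sum(row[j] * row[k] * m for row, m in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        coef = [c + s for c, s in zip(coef, solve(hess, grad))]
    return coef


# Synthetic two-year daily series (all values are made up for illustration).
days = range(730)
pm = [20 + 8 * math.sin(2 * math.pi * t / 365 + 1) + 0.5 * ((7 * t) % 11 - 5)
      for t in days]
X = [[1.0, x, math.sin(2 * math.pi * t / 365), math.cos(2 * math.pi * t / 365)]
     for t, x in zip(days, pm)]
log_n = [math.log(1_000_000)] * 730           # people at risk each day
true_coef = [math.log(2e-5), 0.0005, 0.10, 0.20]
# Noise-free "counts": the expected number of events under the model.
y = [math.exp(o + sum(c * v for c, v in zip(true_coef, row)))
     for o, row in zip(log_n, X)]

coef = fit_poisson(y, X, log_n)
beta = coef[1]                                 # log relative rate per 1 ug/m3
pct_per_10 = 100 * (math.exp(10 * beta) - 1)   # % increase per 10 ug/m3
print(f"beta = {beta:.6f}, % increase per 10 ug/m3 = {pct_per_10:.3f}")
```

The last line converts the log relative rate into the scale usually reported: the percentage increase in the event rate for a 10 µg/m³ increase in pollution, 100(exp(10β) − 1).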
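2) Bayesian hierarchical
models for pooling risks across
cities
$$\hat{\beta}^c = \beta^c + \epsilon^c, \qquad \epsilon^c \sim N(0, v^c)$$
$$\beta^c = \mu + \delta^c, \qquad \delta^c \sim N(0, \tau^2)$$
– $\hat{\beta}^c$: county-specific risk estimate
– $\beta^c$: county-specific true risk
– $\epsilon^c$: within-county statistical error
– $\mu$: pooled risk
– $\tau^2$: across-county variance
of the true risks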
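The two-level structure (estimate = true risk + within-county error; true risk = pooled risk + across-county deviation) can be illustrated with a simple stand-in. The sketch below is not a Bayesian fit: it swaps in the DerSimonian-Laird method-of-moments estimator for the across-county variance and then applies the usual normal-normal shrinkage formula with that variance plugged in. The function name and the three county estimates are hypothetical.

```python
def pool_random_effects(estimates, variances):
    """Pooled risk, its standard error, across-county variance, shrunken estimates."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    # Method-of-moments (DerSimonian-Laird) estimate of the across-county variance.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    # Pooled risk: weights combine within-county and across-county variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    se_mu = (1.0 / sum(w_star)) ** 0.5
    # County-specific estimates shrunk toward the pooled risk.
    shrunk = [mu + tau2 / (tau2 + v) * (b - mu)
              for b, v in zip(estimates, variances)]
    return mu, se_mu, tau2, shrunk


# Hypothetical county estimates (% increase per 10 ug/m3) and their variances.
beta_hat = [0.2, 0.6, 1.0]
v = [0.04, 0.04, 0.04]
mu, se_mu, tau2, shrunk = pool_random_effects(beta_hat, v)
print(f"pooled risk = {mu:.2f} (SE {se_mu:.2f}), tau^2 = {tau2:.2f}")
print("shrunken county estimates:", [round(b, 2) for b in shrunk])
```

Noisy county estimates are pulled toward the pooled risk, more strongly when the within-county variance is large relative to the across-county variance. A full Bayesian analysis would also propagate the uncertainty in the across-county variance, which this plug-in version ignores.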
3) Do I have the “right” statistical
model?
Explore the sensitivity of the risk
estimates to the statistical model
Sensitivity of the national average lag effect of PM10 on
mortality to different statistical models to adjust for
confounding (NMMAPS 1987-2000)
[Figure: estimates across different statistical models to adjust for confounding, labeled weak, moderate, and strong; the reported estimate is marked.]
Peng Dominici Louis JRSSC 2006
3) Do I have the “right” statistical
model?
[Diagram with nodes X, Z1, Z2, and Y]
Z1 is a predictor of Y
Z2 is a confounder
Estimating risks by averaging across statistical models
Regression model                               Weights based on      Weights based on ability
                                               prediction (BIC)      to adjust for confounding
$y = \beta x + \gamma_1 z_1$                   0.9                   0.0
$y = \beta x + \gamma_2 z_2$                   0.0                   0.9
$y = \beta x + \gamma_1 z_1 + \gamma_2 z_2$    0.1                   0.1
3) Model averaging for confounding
adjustment in observational studies
• We assign zero weights to models that have
optimal prediction properties but that do not
include all the potential confounders
• We identify all the potential confounders by
searching for good predictors of the
exposure X
• Theoretical results and simulation studies
have shown that this approach outperforms
existing methods to account for model
uncertainty
Crainiceanu Dominici Parmigiani Biometrika 2007
Wang Crainiceanu Parmigiani Dominici technical report 2007
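The effect of the two weighting schemes can be worked through with a few lines of arithmetic. In the sketch below, the weights are read from the slide as (0.9, 0.0, 0.1) for prediction and (0.0, 0.9, 0.1) for confounding adjustment across the three regression models, which is an assumption about the slide's layout. The model-specific estimates and the BIC values are hypothetical, and `bic_weights` uses the standard approximation that a model's weight is proportional to exp(−BIC/2).

```python
import math


def bic_weights(bics):
    """Approximate model weights, proportional to exp(-BIC / 2)."""
    best = min(bics)
    raw = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, weights):
    """Weighted average of model-specific estimates; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(e * w for e, w in zip(estimates, weights))


# Hypothetical estimates of the effect of X from the three models:
# (x + z1), (x + z2), (x + z1 + z2). The first omits the confounder z2.
estimates = [1.5, 1.0, 1.0]
prediction_weights = [0.9, 0.0, 0.1]    # favors the model with the predictor z1
confounding_weights = [0.0, 0.9, 0.1]   # zero weight if the confounder is missing

avg_prediction = model_average(estimates, prediction_weights)
avg_confounding = model_average(estimates, confounding_weights)
print(f"averaged with prediction weights:  {avg_prediction:.2f}")
print(f"averaged with confounding weights: {avg_confounding:.2f}")

# Hypothetical BIC values that give weights of roughly 0.9, 0.0, 0.1.
print([round(w, 2) for w in bic_weights([100.0, 120.0, 104.4])])
```

With prediction-based weights the average inherits most of the bias of the model that omits the confounder; with confounding-based weights that model contributes nothing.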
Biostatistics in Action:
The weight of the
evidence
Full-text available at
http://content.nejm.org/cgi/content/abstract/343/24/1742
O3 and Mortality (November 17, 2004)
Full-text available at
http://jama.ama-assn.org/cgi/content/full/292/19/2372
PM2.5 and Hospital Admissions (March 8, 2006)
Full-text available at
http://jama.ama-assn.org/cgi/content/full/295/10/1127
The new challenge:
Estimating the toxicity of
the PM complex mixture
New Scientific Questions
and Statistical Challenges
What are the mechanisms of
PM toxicity?
– Size?
– Chemical components?
– Sources?
New Methods for estimating health
effects of complex mixtures
• Emission sources: Biomass burning, Vehicles, Crustal
• Chemical constituents: K, Cl, EC, OC, SO4, NO3, Si, Ca, Al, Fe
• Size: PM2.5, PM10-2.5
• Total mass: PM10
Bell Dominici Ebisu Zeger Samet
EHP 2007
% change in CVD hospitalization rate associated with a 10 µg/m³
increase in PM10-2.5 on average across 108 US counties
(1999-2005)
[Figure: four panels (PM10-2.5 alone; PM2.5 alone; PM10-2.5 adjusted by PM2.5; PM2.5 adjusted by PM10-2.5). x-axis: Lag (0, 1, 2); y-axis: % increase in admission with a 10 µg/m³ increase, from -1.5 to 2.]
Peng Bell Chang McDermott Zeger Samet Dominici tech report 2007
The policy impact
NAAQS: Science has had an
Impact
• From US EPA NAAQS Criteria Document 1996: “Many
of the time-series epidemiology studies looking for
associations between O3 exposure and daily human
mortality have been difficult to interpret because of
methodological or statistical weaknesses, including
the failure to account for other pollutants and
environmental effects.”
• From US EPA Criteria Document 2006: “While
uncertainties remain in some areas, it can be
concluded that robust associations have been
identified between various measures of daily O3
concentrations and increased risk of mortality.”
Assessing the Public Health Impact of the Air Quality Regulations
Reproducible research
• We want to reproduce previous
findings
– “Did you do what you said you did?”
• Test assumptions, robustness of
findings; check methodology
– “Is what you did any good?”
• Implement and test new methodology
– “I can do it better!”
Peng Dominici Zeger
AJE 2006
NMMAPSdata package for R
• R is a free software environment for statistical
analysis and graphics
• NMMAPSdata package contains the entire updated
(1987-2000) NMMAPS database as an add-on
module for R
• Supplemental code available online for reproducing
canonical NMMAPS analysis and other analyses
• iHAPSS: Internet-based Health and Air Pollution
Surveillance System
– http://www.ihapss.jhsph.edu/
Peng Welty R news 2004
Zeger Peng McDermott Dominici Samet HEI 2006
A new book to appear this
summer…
Roger Peng & Francesca Dominici
Environmental Epidemiology with R:
A Case Study in Air Pollution and Health
Concluding Thoughts
Biostatistics in Action!
Questions, Data, Methods, Evidence, Policy
Methods can be used to address other questions beyond
air pollution: analyses of observational studies
• The weight of the evidence:
– Has an explicit role in the Clean Air Act
• New NAAQS process
• New Research underway: especially PM Components
and Sources – the cycle begins anew
Collaborators
in the BSPH
• Michelle Bell
• Patrick Breysse
• Ciprian Crainiceanu
• Mary Fox
• Alyson Geyh
• Aidan McDermott
• Tom Louis
• Giovanni Parmigiani
• Roger Peng
• Jonathan Samet
• Ron White
• Scott Zeger
PhD Students
• Howard Chang
• Sandy Eckel
• Sorina Eftim
• Jennifer Feder
• Haley Hedlin
• Yun Lu
• Chi Wang
• Yijie Zhou
Medicare data users and collaborators
in the BSPH and SOM
• Gerald Anderson
• Emily Smith
• Ben Brooke
• Lia Clattenburg
• Robert Herbert
• Peter Pronovost
Funding sources
• EPA: PM Research Center (Samet)
• NIEHS: Training grant in Environmental
Biostatistics (Louis and Dominici)
• NIEHS R01: Statistical methods in
Environmental Epidemiology (Dominici)