LINKAGE OF HEALTH SURVEY DATA FROM THE
NATIONAL CENTER FOR HEALTH STATISTICS TO AIR
QUALITY DATA FROM THE U.S. ENVIRONMENTAL
PROTECTION AGENCY
Jennifer Parker, jdparker@cdc.gov, National Center for Health Statistics, Centers for
Disease Control and Prevention, USA
ABSTRACT
The objective of this paper is to describe analytic issues posed by data files obtained by
linking air quality data from the United States Environmental Protection Agency (US EPA)
with health data from the National Center for Health Statistics (NCHS). The combination of
detailed demographic and health data with routinely monitored environmental exposure data
has the potential to enhance our understanding of environmental impacts on health. NCHS
has linked selected data systems to air pollution indicators, albeit at different levels of
geography. Because air monitors are placed for regulatory, not scientific purposes, they may
not correspond to areas sampled in national surveys. Consequently, there are challenges
when analyzing linked data.
Between 20% and 70% of the survey records are linked to exposure estimates from air
monitoring stations, depending on the pollutant(s) and the spatial and temporal criteria for
estimation. Statistical analysis is complicated by adding spatially correlated exposure
variables to complex sample surveys, where stratification, clustering, and survey weights
already need to be considered.
Despite the added complexities, these linked files provide a national resource for
understanding possible health impacts from environmental pollutants. A further
understanding of the statistical issues can help with future linkages of health survey data with
other environmental data, such as climate and water.
Key words: health survey, air pollution
1. Introduction
Several data systems in the United States (US) provide policy makers with information on
current health conditions and trends. The National Center for Health Statistics (NCHS)
conducts several nationally representative surveys, including the National Health and
Nutrition Examination Survey (NHANES) and the National Health Interview Survey (NHIS)
and coordinates the national vital statistics program, which includes all birth and
death records. Area-level environmental information, often collected for
regulatory purposes, is housed in other government agencies, including the US
Environmental Protection Agency (US EPA). Combining these sources of information can
give researchers a means of understanding the role of environmental factors in health, which
can, in turn, be used to inform regulatory decisions. This paper describes examples of NCHS
geographically linked data files.
2. Examples of geographically linked files
2.1. NCHS NHIS linked to US EPA air quality data
The NHIS is a continuous, complex sample survey of the non-institutionalized civilian
population of the US, with annual data release. Currently the NHIS collects information in
about 400 geographic locations (primary sampling units (PSUs)), from about 35,000
households or 87,000 individuals. The main objective of the NHIS is to monitor the health of
the United States population through the collection and analysis of data on a broad range of
health topics. A major strength of this survey lies in the ability to display these health
characteristics by many demographic and socioeconomic characteristics.
The US EPA Air Quality System (AQS) contains ambient air pollution data collected by EPA,
state, local, and tribal air pollution control agencies from thousands of monitoring stations.
Measurements are obtained for several pollutants, including ozone, particulate matter, sulphur
dioxide (SO2), and nitrogen dioxide (NO2), although the collections vary from hourly to every
3-6 days, depending on equipment. For each monitor location, both individual measurements
with their collection dates and annual summary estimates are available.
Annual air pollution estimates were linked to the 1986-2005 NHIS by averaging
measurements collected at air monitors located 1) within specified distances of the
respondent’s residence and 2) within an administrative unit, county (Parker et al., 2008a).
Two distance-based geographic criteria are considered here: 5 miles (~8 km) and 20 miles (~32
km). Table 1 shows that the success of the linkage varied by pollutant and criterion for the
2003-2005 NHIS. Although PM2.5 monitors are in more locations than monitors for other
pollutants, less than 40% of respondents live within 5 miles of a PM2.5 monitor; less than 20%
are within 5 miles of an NO2 or SO2 monitor. Despite interest in synergistic pollution effects,
less than half of respondents could be linked to multiple pollutants using the 20 mile criterion.
Respondents with linked data were more likely to be urban and poor than those who could not
be linked (not shown). However, associations between pollution and health status are
relatively robust to linkage, particularly when characteristics associated with the linkage are
considered (Parker et al., 2008b).
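The distance-based criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure documented in Parker et al. (2008a): the great-circle (haversine) distance, the unweighted mean across monitors, the field names (`lat`, `lon`, `annual_mean`), and the coordinates and values are all assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def link_exposure(residence, monitors, radius_miles):
    """Average the annual values of all monitors within radius_miles of the residence.

    Returns None when no monitor qualifies, i.e. the record cannot be linked.
    """
    lat, lon = residence
    values = [
        m["annual_mean"]
        for m in monitors
        if haversine_miles(lat, lon, m["lat"], m["lon"]) <= radius_miles
    ]
    return sum(values) / len(values) if values else None

# Hypothetical monitors and residence (illustrative values only).
monitors = [
    {"lat": 38.90, "lon": -77.00, "annual_mean": 12.0},  # about 1 mile away
    {"lat": 39.05, "lon": -77.00, "annual_mean": 10.0},  # about 10 miles away
]
residence = (38.90, -77.02)

print(link_exposure(residence, monitors, 5))    # only the nearer monitor qualifies
print(link_exposure(residence, monitors, 20))   # both monitors are averaged
print(link_exposure(residence, monitors, 0.5))  # no monitor in range: unlinked
```

Returning `None` for unlinked records keeps them in the file, so that the characteristics of linked and unlinked respondents can still be compared.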
2.2. NCHS NHANES linked to US EPA air quality data
NHANES is designed to assess the health and nutritional status of adults and children in the
United States. The NHANES is much smaller than the NHIS, with considerably fewer PSUs
and about 10,000 participants in a 2-year data release. However, the survey is unique in that
it combines interviews and physical examinations. The NHANES interview includes
demographic, socioeconomic, dietary, and health-related questions. The
examination component consists of medical, dental, and physiological
measurements, as well as laboratory tests administered by highly trained
medical personnel. Among its many uses, NHANES findings are the basis for national
standards for such measurements as height, weight, and blood pressure.
The 1999-2006 NHANES was linked to AQS data using both the geographic criteria described above
and criteria relative to the examination date (e.g. pollution values for the day, week, and 6
weeks prior to the exam); temporal criteria were included because of the number of acute
measurements collected. Table 2 shows that linkage depends more on the spatial than the
temporal criteria and that linkage is similar for adults with and without chronic bronchitis.
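The temporal criteria can be illustrated with a short Python sketch that averages a monitor's daily values over a window ending the day before the exam. Whether the actual linkage excluded the exam day, and how it combined several monitors, is not stated here, so both choices below are assumptions, as are the function name `window_mean` and the readings.

```python
from datetime import date, timedelta

def window_mean(daily, exam_date, days_before):
    """Mean of the daily values in the days_before days preceding exam_date.

    The exam day itself is excluded (an assumption). Returns None when the
    monitor reported nothing in the window, i.e. the record is unlinked.
    """
    start = exam_date - timedelta(days=days_before)
    values = [v for d, v in daily.items() if start <= d < exam_date]
    return sum(values) / len(values) if values else None

exam = date(2005, 6, 15)

# Hypothetical PM2.5 readings from a monitor that samples every few days.
daily = {
    date(2005, 5, 20): 7.0,
    date(2005, 6, 6): 9.0,
    date(2005, 6, 9): 14.0,
    date(2005, 6, 12): 10.0,
}

print(window_mean(daily, exam, 1))   # no reading on the previous day
print(window_mean(daily, exam, 7))   # readings from 9 and 12 June
print(window_mean(daily, exam, 42))  # all four readings
```

A monitor that samples only every few days often has no reading on the day before an exam, so a 1-day window of this kind links fewer records than the longer windows.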
The complex design of the NHANES, with strong geographic clustering, can affect inference
from these files through relatively large design effects. Design effects, which represent the
increase in variance due to the complex sample design, are somewhat higher when using
longer temporal and broader spatial criteria (e.g. measurements averaged for the entire year
for monitors within 20 miles) compared to those created using shorter time windows and
shorter radii criteria (e.g. measurements the day prior to the exam within 5 miles).
Furthermore, unweighted pollution estimates have wider distributions than weighted
estimates, likely due to the relatively greater inclusion of over-sampled groups (such as
African Americans and Mexicans) in the geographically linked files.
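As a rough illustration of why geographically assigned exposures produce large design effects, the sketch below compares the variance of a mean computed from cluster (PSU) means with the variance that simple random sampling would give. It is a deliberately simplified, hypothetical example: it assumes equal-sized clusters and ignores survey weights, stratification, and finite population corrections, all of which an analysis of the linked files would have to handle with survey software.

```python
from statistics import mean, variance

def design_effect(clusters):
    """Ratio of the clustered-design variance of the mean to the SRS variance.

    Assumes equal-sized clusters sampled with replacement and no weights.
    """
    cluster_means = [mean(c) for c in clusters]
    # Variance of the overall mean, estimated from between-cluster variation.
    var_clustered = variance(cluster_means) / len(clusters)
    # Variance of the overall mean if the same values were a simple random sample.
    pooled = [x for c in clusters for x in c]
    var_srs = variance(pooled) / len(pooled)
    return var_clustered / var_srs

# Hypothetical exposure values: everyone in a PSU shares the same monitors,
# so exposure is constant within each PSU.
psus = [
    [10.0, 10.0, 10.0, 10.0],
    [12.0, 12.0, 12.0, 12.0],
    [14.0, 14.0, 14.0, 14.0],
]

print(design_effect(psus))  # approximately 5.5
```

When respondents in the same PSU are assigned nearly identical exposure values, almost all variation lies between PSUs, and the effective sample size approaches the number of PSUs rather than the number of respondents.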
3. Conclusion
Linkage of health surveys with monitored air pollution data depends on the geographic range,
or spatial scale, and the time-frame of interest, or temporal scale. Air quality monitors are
not in all locations and are not necessarily operational all the time. National surveys collect
information on a sample of locations, although some are in the field throughout the year (e.g.
NHIS). In the case of NHANES, however, the mobile examination units are only in a
particular location for a short period each year. Issues in combining misaligned data were
described in detail by Gotway and Young (2002).
On the other hand, associations found using regression models that include factors related to
linkage (e.g. poverty status) have been shown to be relatively robust to the linkage issues,
providing support for additional types of environmental linkages and further examination of
the statistical challenges. Importantly, the linkage of information collected for different
purposes by separate agencies can enhance the usefulness of data from each agency.
The geographically linked files are restricted-use to protect the privacy of survey respondents.
Data users can apply to use these files, or request that additional geographically linked files
be created for a specific research or policy question, through the NCHS Research Data Center
(RDC, http://www.cdc.gov/rdc/), where output can be assessed for disclosure risks.
Table 1: Percentage of 2003-2005 NHIS respondents linked to EPA AQS, by pollutant and
geographic linkage criteria.

                             Geographic linkage criteria for air pollution monitors
Pollutant                    County of residence   5 miles from residence   20 miles from residence
                                                   Percentage
Particulate matter, PM2.5    71                    37                       80
Particulate matter, PM10     53                    27                       68
Carbon monoxide              49                    24                       61
Ozone                        69                    31                       79
Nitrogen dioxide             44                    19                       55
Sulfur dioxide               45                    18                       58
All 6 pollutants above       28                    6                        42

Table 2: Percentage of NHANES respondents linked to particulate matter, PM2.5, by temporal and
geographic criteria for linkage, and chronic bronchitis status. 1999-2006 NHANES linked to EPA AQS.

                                                  Disease status
Temporal/geographic linkage criteria              With chronic bronchitis   Without chronic bronchitis
                                                  Percentage
6 weeks prior to exam/20 miles from residence     64                        71
6 weeks prior to exam/5 miles from residence      32                        36
1 week prior to exam/20 miles from residence      64                        71
1 week prior to exam/5 miles from residence       32                        36
1 day prior to exam/20 miles from residence       45                        51
1 day prior to exam/5 miles from residence        18                        21

References
1. Parker, J., Kravets, N., and Woodruff, T. (2008a) Linkage of the National Health Interview Survey to Air
Quality Data. Vital and Health Statistics. Series 2. No 145. National Center for Health Statistics. Hyattsville,
Maryland.
2. Parker, J., Woodruff, T., Akinbami, L., and Kravets, N. (2008b) Linkage of the US National Health Interview
Survey to air monitoring data: An evaluation of different strategies. Environmental Research 106:384-392.
3. Gotway, C. and Young, L. (2002) Combining Incompatible Spatial Data. Journal of the American
Statistical Association 97:632-648.