LINKAGE OF HEALTH SURVEY DATA FROM THE NATIONAL CENTER FOR HEALTH STATISTICS TO AIR QUALITY DATA FROM THE U.S. ENVIRONMENTAL PROTECTION AGENCY

Jennifer Parker, jdparker@cdc.gov, National Center for Health Statistics, Centers for Disease Control and Prevention, USA

ABSTRACT

The objective of this paper is to describe analytic issues posed by data files obtained by linking air quality data from the United States Environmental Protection Agency (US EPA) with health data from the National Center for Health Statistics (NCHS). The combination of detailed demographic and health data with routinely monitored environmental exposure data has the potential to enhance our understanding of environmental impacts on health. NCHS has linked selected data systems to air pollution indicators, albeit at different levels of geography. Because air monitors are placed for regulatory, not scientific, purposes, they may not correspond to areas sampled in national surveys. Consequently, there are challenges when analyzing linked data. Between 20% and 70% of the survey records are linked to exposure estimates from air monitoring stations, depending on the pollutant(s) and the spatial-temporal criteria for estimation. Statistical analysis is complicated by adding spatially correlated exposure variables to complex sample surveys, where stratification, clustering, and survey weights already need to be considered. Despite the added complexities, these linked files provide a national resource for understanding possible health impacts from environmental pollutants. A further understanding of the statistical issues can help with future linkages of health survey data with other environmental data, such as climate and water.

Key words: health survey, air pollution

1. Introduction

Several data systems in the United States (US) provide policy makers with information on current health conditions and trends.
The National Center for Health Statistics (NCHS) conducts several nationally representative surveys, including the National Health and Nutrition Examination Survey (NHANES) and the National Health Interview Survey (NHIS), and coordinates the national vital statistics program, which includes all birth and death records. Area-level environmental information, often collected for regulatory purposes, is housed in other government agencies, including the US Environmental Protection Agency (US EPA). Combining these sources of information can give researchers the means to understand the role of environmental factors in health, which can, in turn, be used to inform regulatory decisions. This paper describes examples of NCHS geographically linked data files.

2. Examples of geographically linked files

2.1. NCHS NHIS linked to US EPA air quality data

The NHIS is a continuous, complex sample survey of the non-institutionalized civilian population of the US, with annual data releases. Currently the NHIS collects information in about 400 geographic locations (primary sampling units (PSUs)), from about 35,000 households or 87,000 individuals. The main objective of the NHIS is to monitor the health of the United States population through the collection and analysis of data on a broad range of health topics. A major strength of this survey lies in the ability to display these health characteristics by many demographic and socioeconomic characteristics.

The US EPA Air Quality System (AQS) contains ambient air pollution data collected by EPA, state, local, and tribal air pollution control agencies from thousands of monitoring stations. Measurements are obtained for several pollutants, including ozone, particulate matter, sulfur dioxide (SO2), and nitrogen dioxide (NO2), although the collection frequency varies from hourly to every 3-6 days, depending on equipment. For each monitor location, both individual measurements with their collection dates and annual summary estimates are available.
Annual air pollution estimates were linked to the 1986-2005 NHIS by averaging measurements collected at air monitors located 1) within specified distances of the respondent’s residence and 2) within an administrative unit, the county (Parker et al., 2008a). Two distance-based geographic criteria are considered here: 5 miles (~8 km) and 20 miles (~32 km). Table 1 illustrates that the success of the linkage varied by pollutant and criteria for the 2003-2005 NHIS. Although PM2.5 monitors are in more locations than monitors for other pollutants, fewer than 40% of respondents live within 5 miles of a PM2.5 monitor; fewer than 20% are within 5 miles of an NO2 or SO2 monitor. Despite interest in synergistic pollution effects, fewer than half of respondents (42%) could be linked to all six pollutants using the 20-mile criterion. Respondents with linked data were more likely to be urban and poor than those who could not be linked (not shown). However, associations between pollution and health status are relatively robust to linkage, particularly when characteristics associated with the linkage are considered (Parker et al., 2008b).

2.2. NCHS NHANES linked to US EPA air quality data

NHANES is designed to assess the health and nutritional status of adults and children in the United States. The NHANES is much smaller than the NHIS, with considerably fewer PSUs and about 10,000 participants in a 2-year data release. However, the survey is unique in that it combines interviews and physical examinations. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. Among its many uses, NHANES findings are the basis for national standards for such measurements as height, weight, and blood pressure.
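The distance-based linkage described in Section 2.1 can be illustrated with a minimal sketch. It assumes great-circle (haversine) distance between residence and monitor and a simple unweighted mean of the annual values from all monitors inside the radius; the paper does not specify either detail, and the actual procedure is documented in Parker et al. (2008a). All function names, field names, coordinates, and pollution values below are hypothetical.

```python
import math
from statistics import mean

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def link_annual_exposure(residence, monitors, radius_miles):
    """Average the annual values of all monitors within radius_miles.

    residence: (lat, lon) tuple in degrees.
    monitors: list of dicts with hypothetical keys 'lat', 'lon', 'annual_mean'.
    Returns None when no monitor falls inside the radius (an unlinked record).
    """
    lat, lon = residence
    nearby = [
        m["annual_mean"]
        for m in monitors
        if haversine_miles(lat, lon, m["lat"], m["lon"]) <= radius_miles
    ]
    return mean(nearby) if nearby else None


# Made-up monitors roughly 2, 14, and 69 miles north of a made-up residence.
monitors = [
    {"lat": 39.03, "lon": -77.0, "annual_mean": 12.0},
    {"lat": 39.20, "lon": -77.0, "annual_mean": 14.0},
    {"lat": 40.00, "lon": -77.0, "annual_mean": 20.0},
]
home = (39.0, -77.0)

for radius in (5, 20):
    print(radius, link_annual_exposure(home, monitors, radius))
```

In this sketch a wider radius links more respondents but averages over monitors that are farther from the residence, which is the trade-off behind the different linkage percentages reported for the 5-mile and 20-mile criteria.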
The 1999-2006 NHANES was linked to AQS data using both the geographic criteria used above and criteria relative to the examination date (e.g. pollution values for the day, week, and 6 weeks prior to exam); temporal criteria were included because of the number of acute measurements collected. Table 2 shows that linkage depends more on the spatial than the temporal criteria and that linkage is similar for adults with and without chronic bronchitis. The complex design of the NHANES, with strong geographic clustering, can affect inference from these files through relatively large design effects. Design effects, which represent the increase in variance due to the complex sample design, are somewhat higher when using longer temporal and broader spatial criteria (e.g. measurements averaged for the entire year for monitors within 20 miles) than when using shorter time windows and shorter radii (e.g. measurements the day prior to the exam within 5 miles). Furthermore, unweighted pollution estimates have wider distributions than weighted estimates, likely due to the relatively greater inclusion of over-sampled groups (such as African Americans and Mexican Americans) in the geographically linked files.

3. Conclusion

Linkage of health surveys with monitored air pollution data depends on the geographic range, or spatial scale, and the time-frame of interest, or temporal scale. Air quality monitors are not in all locations and are not necessarily operational all the time. National surveys collect information on a sample of locations, although some are in the field throughout the year (e.g. NHIS). In the case of NHANES, however, the mobile examination units are only in a particular location for a short period each year. Issues in combining misaligned data were described in detail by Gotway and Young (2002). On the other hand, associations found using regression models that include factors related to linkage (e.g.
poverty status) have been shown to be relatively robust to the linkage issues, providing support for additional types of environmental linkages and further examination of the statistical challenges. Importantly, the linkage of information collected for different purposes by separate agencies can enhance the usefulness of data from each agency.

The geographically linked files are restricted-use to protect survey respondent privacy. Data users can submit an application to use these files, or request that additional geographically linked files be created for a specific research or policy question, through NCHS’s Research Data Center (RDC, http://www.cdc.gov/rdc/), where output can be assessed for disclosure risks.

Table 1: Percentage of 2003-2005 NHIS respondents linked to EPA AQS, by pollutant and geographic linkage criteria.

                              Geographic linkage criteria for air pollution monitors
Pollutant                     County of residence   5 miles from residence   20 miles from residence
                                                      Percentage
Particulate matter, PM2.5              71                     37                       80
Particulate matter, PM10               53                     27                       68
Carbon monoxide                        49                     24                       61
Ozone                                  69                     31                       79
Nitrogen dioxide                       44                     19                       55
Sulfur dioxide                         45                     18                       58
All 6 pollutants above                 28                      6                       42

Table 2: Percentage of NHANES respondents linked to particulate matter, PM2.5, by temporal and geographic criteria for linkage, and chronic bronchitis status. 1999-2006 NHANES linked to EPA AQS.

                                                          Disease status
Temporal/Geographic linkage criteria              With Chronic Bronchitis   Without Chronic Bronchitis
                                                            Percentage
6 weeks prior to exam/20 miles from residence               64                         71
6 weeks prior to exam/5 miles from residence                32                         36
1 week prior to exam/20 miles from residence                64                         71
1 week prior to exam/5 miles from residence                 32                         36
1 day prior to exam/20 miles from residence                 45                         51
1 day prior to exam/5 miles from residence                  18                         21

References

1. Parker, J., Kravets, N., and Woodruff, T. (2008a). Linkage of the National Health Interview Survey to Air Quality Data. Vital and Health Statistics. Series 2. No 145.
National Center for Health Statistics. Hyattsville, Maryland.
2. Parker, J., Woodruff, T., Akinbami, L., and Kravets, N. (2008b). Linkage of the US National Health Interview Survey to air monitoring data: An evaluation of different strategies. Environmental Research 106:384-392.
3. Gotway, C. and Young, L. (2002). Combining Incompatible Spatial Data. Journal of the American Statistical Association 97:632-648.