Practical issues and tools for modeling temporal and spatio-temporal trends in atmospheric pollutant monitoring data
Paul D. Sampson
Department of Statistics, University of Washington
The International Environmetric Society, Modelling Spatio-Temporal Trends Workshop
3 November 2003

Our experience in analysis of trends in atmospheric pollutants

Part I: Meteorological adjustment and long-term temporal trends in ozone
• Meteorological adjustment of western Washington and northwest Oregon surface ozone observations with investigation of trends. Reynolds, Das, Sampson, Guttorp, NRCSE TRS #15 (http://www.nrcse.washington.edu/pdf/trs15_doe.pdf)
• Meteorological adjustment of Chicago, Illinois, surface ozone observations with investigation of trends. Reynolds, Caccia, Sampson, Guttorp, NRCSE TRS #25 (http://www.nrcse.washington.edu/pdf/trs25_chicago.pdf)
• A review of statistical methods for the meteorological adjustment of tropospheric ozone. Thompson, Reynolds, Cox, Guttorp, Sampson, Atmospheric Environment 35, 617-630, 2001.

Part II: Spatial trend for health effects studies
• Spatial estimation of ambient air concentrations for ozone, 1986-94, for chronic health effects modeling in 83 counties in the U.S. Current contract with U.S. EPA.
• Spatio-temporal modeling and prediction of ambient PM2.5 concentrations for acute and chronic health effects modeling with the NIH/NHLBI cohort study, MESA (Multi-Ethnic Study of Atherosclerosis). Current proposal and ongoing collaboration with colleagues at the Univ of Washington’s Northwest Center for Particulate Matter and Health.

Spatio-temporal modeling of ambient PM exposure for chronic health effect studies
Paul D. Sampson
Department of Statistics, University of Washington
Northwest Center for Particulate Matter & Health
External Science Advisory Committee Meeting
12 November 2003

Motivation for fine(r)-scale spatial modeling of pollutant exposure for chronic health effect studies:
• Major North American cohort studies of PM use a single community-wide exposure/monitor to characterize a metropolitan area. This fails to address important local spatial variation of air pollutants known to exist within regions.
• Hoek et al.: 3-component regression model to predict exposure to air pollutants (black smoke and NO2). Incorporates (a) regional background levels, (b) urban gradient (based on population density) and (c) proximity to heavily-trafficked roadways and other point sources.
• Build on this approach to combine, in a spatial model, average concentration data from fixed-site ambient monitors and spatial covariate information encoded in a GIS, including population density, proximity to roads, and traffic density.

Aside: U.S. EPA currently funded Epidemiologic Research on Health Effects of Long-Term Exposure to Ambient P.M. and Other Air Pollutants (June 2003)
• Laden, Schwartz et al. (Harvard): Chronic Exposure to Particulate Matter and Cardiopulmonary Disease. Nurses Health Study: prospective cohort study of 121,700 women throughout U.S.
• Knutson, Beeson et al. (Loma Linda): Relating Cardiovascular Disease Risk to Ambient Air Pollutants using GIS and Bayesian Neural Networks. AHSMOG study.
• Samet, Zeger, Dominici et al. (Johns Hopkins): Chronic and Acute Exposure to Ambient Fine Particulate Matter and Other Air Pollutants: National Cohort Studies of Mortality and Morbidity. Data from Medicare beneficiary file and National Claims History File.
• Diez-Roux, Keeler, Samson, Lin (Michigan): Long-Term Exposure to Ambient PM and Subclinical Atherosclerosis. MESA Study.
EPA apparently mandated/directed that all these studies be concerned with computing exposure estimates from ambient monitoring data and GIS-based information on local traffic, pop density, … . Jon Samet (Johns Hopkins): “EPA should invest in drawing national maps of exposure as all our research groups are trying to do the same thing.”

Applications:
• “MESA Air”: NHLBI-funded Multi-Ethnic Study of Atherosclerosis: effects of ambient PM (and other pollutants) on subclinical cardiovascular function
  – 8700 subjects, aged 50-89, from 9 communities, assessed prospectively, longitudinally.
  – Monitoring data and exposure assessment:
    • Current AQS PM monitors (mostly 3-day sampling)
    • Supplemental monitors, up to 5 per community (2-week integrated measurements of key pollutants)
    • Mobile gradient monitoring (2-week integrated sampling)
    PLUS
    • Distances to nearest major roadways with traffic volume and composition
    • Distances to pollutant point source
    EVALUATION on PM2.5 and co-pollutants measured at 10% of homes
• Preliminary demonstration of spatio-temporal modeling using S. Calif ozone data.

Personal PM exposure for subject i at time t: sum of non-ambient (N) and ambient (A) components:
$$E^P_{it} = E^N_{it} + E^A_{it}$$
Ambient exposure is ambient concentration times an ambient exposure attenuation factor reflecting time spent outside the home and particle infiltration into the home:
$$E^A_{it} = \alpha_{it} C^A_{it}$$
Model for ambient concentration: trend + residual
$$C^A_{it} = \mu(s_i, t) + v(s_i, t)$$

Smoothly varying spatio-temporal trend is further decomposed:
$$\mu(s_i, t) = \mu_1(s_i) + \mu_2(s_i, t)$$
• the 1st term represents long-term mean concentration and will derive from a Bayesian analysis of a spatial regression model combining average concentration data from fixed-site ambient monitors and spatial covariate information encoded in a GIS.
• the 2nd component represents mainly smooth seasonal temporal variation.
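The exposure and trend equations above can be traced through with a small numerical sketch. Everything in it is an illustrative assumption rather than a value from the study: the 26 two-week periods, the long-term mean of 15, the cosine seasonal shape, the attenuation factor of 0.6, the non-ambient exposure of 5, and the helper name `personal_exposure`.

```python
import numpy as np

def personal_exposure(C_A, alpha, E_N):
    """Return (E_P, E_A), where E_A = alpha * C_A and E_P = E_N + E_A."""
    E_A = alpha * C_A          # ambient exposure: attenuated ambient concentration
    return E_N + E_A, E_A      # personal exposure: non-ambient plus ambient

# Hypothetical setup: one site, 26 two-week periods (one year).
T = 26
t = np.arange(T)
rng = np.random.default_rng(0)

mu1 = 15.0                                 # assumed long-term mean at the site
mu2 = 4.0 * np.cos(2 * np.pi * t / T)      # assumed smooth seasonal component
v = rng.normal(0.0, 1.5, size=T)           # assumed residual variation
C_A = mu1 + mu2 + v                        # ambient concentration: trend + residual

alpha = np.full(T, 0.6)                    # assumed attenuation factor
E_N = np.full(T, 5.0)                      # assumed non-ambient exposure

E_P, E_A = personal_exposure(C_A, alpha, E_N)
print(np.round(E_P, 1))
```

In the modeling strategy described here only the ambient concentration term is estimated from monitoring data; the attenuation factor and non-ambient component are treated as given in this sketch.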
$$C^A_{it} = \mu(s_i, t) + v(s_i, t)$$
The variance model for the residual term represents the spatio-temporal variation considered primarily at the 2-week time scale of the fixed sites and mobile gradient monitors. Estimation of this component will be based on (extensions to) the Bayesian model for the Sampson-Guttorp spatial deformation approach to nonstationary spatial covariance as demonstrated in Damian et al. (2001, 2003). This modeling strategy accommodates the spatially varying effects of predominant meteorology, coast lines and topographic features that underlie the statistical relationship between time-varying pollutant levels at different points in space.

[Figure: Spatial Analysis Region Definitions]
[Figure: Region 1: Northeast, all 125 sites and target counties]
[Figure: Region 6: S Calif, all 82 sites and target counties]
[Figure: Region 6: S Calif, all 94 sites, fitting (63) and validation (31); Los Angeles County]
[Figure: Region 6: S. Calif. Starplot of temporal trend coefficients (LA); sites 061111003, 061112003, 060719004, 060714003, 060370113, 060374002]
[Figure: sqrt(max 8hr O3) time series, 1987-1994, at sites 60714003, 60719004, 60374002]
[Figure: sqrt(max 8hr O3) time series, 1987-1994, at sites 60370113, 61112003, 61111003]
[Figure: Some Southern California PM2.5 Monitors: s9002v1, s1201v1, s1002v1, s1103v1, s1301v1, s0002v1, s2005v1, s1601v1, s8001v1, s1003v1, s4002v1]
[Figure: log PM2.5 time series, 1999-2001, at monitors s0002v1, s1002v1, s1003v1, s1103v1, s1201v1, s1301v1]
[Figure: log PM2.5 time series, 1999-2001, at monitors s1601v1, s2005v1, s4002v1, s8001v1, s9002v1]
[Figure: biweekly log PM2.5, 1999-2001, at monitors s0002v1, s1002v1, s1003v1, s1103v1, s1201v1, s1301v1]
[Figure: biweekly log PM2.5, 1999-2001, at monitors s1601v1, s2005v1, s4002v1, s8001v1, s9002v1]

(1) Estimation of the long-term mean spatial field
Following Hoek and colleagues (2002 Atmos Env, 2003 Epidemiology), assume the regression model can be written
$$\mu_1(s_i) = \beta_0 + \beta_1 v_{i1} + \beta_2 v_{i2} + \cdots + \beta_k v_{ik}$$
where $v_1, \ldots, v_k$ represent pop density, proximity to roads, traffic density, and possibly local topographic and climatic wind patterns. The Bayesian analysis incorporates prior information on the parameters and on the spatial covariance structure of residuals from this regression model in a manner similar to that of our Bayesian framework for spatial estimation of the residual component (see (3) below).

Mean field
Note that monitoring observations will be used directly in the estimation, not just in the specification or calibration of the regression model as in the work of Hoek et al. I.e., in Hoek et al., (long-term) exposure is estimated as:
$$\hat{C}(s_i) = \hat{\mu}_1(s_i) = \hat{\beta}_0 + \hat{\beta}_1 v_{i1} + \hat{\beta}_2 v_{i2} + \cdots + \hat{\beta}_k v_{ik}$$
In our (geostatistical) approach, we will be estimating the space-time field, $\hat{C}(s_i, t)$; the long-term exposure at a point includes an estimated (“kriged”) spatial residual and can be written:
$$\hat{C}(s_i) = \hat{\mu}_1(s_i) + \hat{e}(s_i, t) = \hat{\beta}_0 + \hat{\beta}_1 v_{i1} + \hat{\beta}_2 v_{i2} + \cdots + \hat{\beta}_k v_{ik} + \hat{e}(s_i, t)$$

(2) Smooth, spatially varying, temporal variation.
$$\mu(s_i, t) = \mu_1(s_i) + \mu_2(s_i, t)$$
The spatial index in the 2nd component allows for the possibility that the magnitude and precise details of the seasonal variation may vary from location to location over the spatial scale of the regional target communities. Preliminary analysis of PM2.5 monitoring data in the Los Angeles county region suggests some spatial variation in seasonality, but in some regions we expect to find that this seasonal variation is homogeneous, permitting an additive (separable) decomposition of the spatio-temporal trend:
$$\mu(s_i, t) = \mu_1(s_i) + \mu_2(t)$$

Trend decomposition
Characterize and estimate the seasonal structure of air pollutant concentrations in terms of a model written as:
$$C(s_i, t) = \mu_1(s_i) + \mu_2(s_i, t) + \varepsilon(s_i, t)$$
$$\mu_2(s_i, t) = \sum_{j=1}^{J} \beta_j(s_i) f_j(t)$$
where the $f_j(t)$ are temporal basis functions describing possible seasonal trend patterns, and the $\beta_j(s_i)$ represent spatially varying coefficients of these trend patterns. Example: O3 trend components.
(What do we expect with PM more generally?)

We compute trend components empirically as smoothed versions of the temporal singular vectors of the $T \times N$ data matrix (rather than assuming parametric forms such as trigonometric functions). Arbitrary amounts of missing data are accommodated in an EM-like iterative calculation of the SVD. The Bayesian spatial regression model can incorporate the coefficients of these trend components as spatial fields, and thus provide the basis for estimation of $\mu_2(s_i, t)$ at target homes.

[Figure: Singular values of T=2912 x S=545 observation matrix; singular value vs index, 1:545]
[Figure: Annual Trend Component 1, 1987-1994 (two panels: 1987-1990 and 1991-1994)]
[Figure: Annual Trend Component 2, 1987-1994 (two panels: 1987-1990 and 1991-1994)]
[Figure: Annual Trend Component 3, 1987-1994 (two panels: 1987-1990 and 1991-1994)]
[Figure: Annual Trend Component 4, 1987-1994 (two panels: 1987-1990 and 1991-1994)]
[Figure: Region 6: S. Calif. Starplot of temporal trend coefficients (LA); sites 061111003, 061112003, 060719004, 060714003, 060370113, 060374002]
[Figure: Linear Model of Coregionalization with Gaussian (co)-variograms (fit.lmc=T); semivariance vs distance for mu, lin, b1, b2, b3, b4 and their cross-variograms]
[Figure: Ordinary kriging prediction of mu; axes x, y]
[Figure: Ordinary kriging prediction of b2; axes x, y]
[Figure: Fitted trend (solid) vs Predicted (dashed), sqrt Ozone vs Date, 1989-1994, at sites 060371002, 060371301, 060375001]

(3) Nonstationary residual spatio-temporal variation.
$$C^A_{it} = \mu(s_i, t) + v(s_i, t)$$
Final component: spatio-temporal variation at the (2-week) time scale of the fixed sites and mobile gradient monitors. The Sampson-Guttorp spatial deformation approach (Damian et al. 2001, 2003) is used to model the nonstationary spatial covariance structure. It allows for spatially varying effects of predominant meteorology, coast lines and topographic features that underlie the statistical relationship between time-varying pollutant levels at different points in space.
Bayesian analysis provides a full posterior distribution for the model parameters, and thus a ready computation of multiple imputations of exposures for the health effects analysis.

[Figure: 63 Region 6 monitoring sites and their representation in a deformed coordinate system reflecting spatial covariance]
[Figure: Region 6 S. Calif: Correlation vs Geographic Distance (km) and Correlation vs D-plane Distance]
[Figure: Observed vs Predicted ozone at 3 validation sites (060296001, 060371103, 060831015); Zpredf[, j] vs Zval[, j]]
[Figure: Observed (points) vs Predicted (lines), sqrt Ozone vs Date, 1989-1994: 060371002]
[Figure: Observed (points) vs Predicted (lines), sqrt Ozone vs Date, 1989-1994: 060371301]
[Figure: Observed (points) vs Predicted (lines), sqrt Ozone vs Date, 1989-1994: 060375001]

Conclusion:
• We can estimate/predict both the day-to-day deviations from the trend, and the seasonal shape of the trend quite well, but
• We sometimes miss the long-term mean. => need to incorporate extra local information to predict the mean concentration.

Technical details

Details, issues, and extensions
• Gaussian assumption after transformation
• Current AQS data sampling usually every 3 days; proposed sampling on 2-week intervals
• Conditional, hierarchical approach to estimating the parameters of our space-time models from this “incomplete data,” beginning with models estimated from the longer-term AQS data and then updating estimated model parameters with data from the new fixed and mobile monitors.
• First stage of analysis: build separate models and estimates for the three major exposures of interest, PM2.5, NOx, and O3.
• Second stage: take advantage of the association between PM2.5 and NOx in a multivariate (“co-kriging”) analysis that assumes only that spatial nonstationarity can be expressed in a common underlying deformed coordinate system.

$$C(s_i, t) = \mu_1(s_i) + \sum_{j=1}^{J} \beta_j(s_i) f_j(t) + \varepsilon(s_i, t) = \sum_{j=0}^{J} \beta_{ij} f_j(t) + \varepsilon_{it}$$
or
$$C = \beta F + \varepsilon$$
where we are writing $C$ as an $S \times T$ (space-time) matrix of observations, and $\beta$ is an $S \times (J+1)$ matrix of coefficients, with columns containing the coefficient values at the S observation sites (i=1,…,S), multiplying the $(J+1) \times T$ matrix $F$, whose rows contain the values of the basis functions at the T observation times. Obvious calculation is an SVD of the concentration matrix $C$.
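The matrix form above can be sketched with NumPy on a complete (no missing values) synthetic matrix. The function name `svd_trend_basis`, the synthetic sizes, and the two "true" basis functions are assumptions made for illustration only; the sketch applies no smoothing to the singular vectors and is not the code used for the analyses shown here.

```python
import numpy as np

def svd_trend_basis(C, n_basis):
    """Truncated SVD of an S x T matrix C.

    Returns beta (S x n_basis) and F (n_basis x T) such that beta @ F is
    the best rank-n_basis least-squares approximation of C.
    """
    U, d, Vt = np.linalg.svd(C, full_matrices=False)
    F = Vt[:n_basis, :]                  # rows: temporal basis functions
    beta = U[:, :n_basis] * d[:n_basis]  # site-specific coefficients
    return beta, F

# Hypothetical data: S sites, T times, constant plus annual cycle, small noise.
rng = np.random.default_rng(1)
S, T = 20, 200
t = np.arange(T)
F_true = np.vstack([np.ones(T), np.cos(2 * np.pi * t / 50)])
beta_true = np.column_stack([3 + 0.5 * rng.normal(size=S), rng.normal(size=S)])
noise = 0.05 * rng.normal(size=(S, T))
C = beta_true @ F_true + noise

beta, F = svd_trend_basis(C, n_basis=2)
residual = C - beta @ F
print(np.linalg.norm(residual) / np.linalg.norm(C))  # relative size of residual
```

The rows of `F` are orthonormal, so they recover the span of the underlying trend patterns rather than the patterns themselves; the coefficients in `beta` absorb the singular values.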
$$C = \beta F + \varepsilon = U D V' = \sum_{j=1}^{\min(S,T)} d_j u_j v_j' = \sum_{j=1}^{J+1} d_j u_j v_j' + \sum_{j=J+2}^{\min(S,T)} d_j u_j v_j' = U_{(J+1)} D_{(J+1)} V_{(J+1)}' + \sum_{j=J+2}^{\min(S,T)} d_j u_j v_j'$$
where the columns of the (truncated) matrix of right singular vectors are considered to represent the values of the J+1 temporal basis functions: $F = V_{(J+1)}'$.
Issues: Smoothness of the singular vectors as components of trend; computation with missing data.

[Figure: N=63, S. Calif: 4 samples from the posterior distribution of deformations reflecting spatial covariance]
[Figure: Region 6: S. Calif. Posterior sample site variances; empirical variogram of log site variances vs D-plane Distance; circle radii proportional to (detrended) log site variances]
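On the missing-data issue raised for the SVD, one common scheme is iterative imputation: fill the missing entries, compute a truncated SVD, replace the missing entries with the low-rank fit, and repeat. The sketch below assumes this scheme; the slides say only that the calculation is "EM-like", so the details (row-mean initialization, fixed rank, stopping rule) and the name `em_svd` are assumptions, and every row is assumed to have at least one observed value.

```python
import numpy as np

def em_svd(C, rank, n_iter=500, tol=1e-9):
    """Iteratively impute NaN entries of C using a rank-`rank` SVD fit.

    Returns the completed matrix and the truncated SVD factors (U, d, Vt)
    of the completed matrix. Observed entries are never modified.
    """
    C = np.asarray(C, dtype=float)
    miss = np.isnan(C)
    # Initial fill: mean of the observed values in each row (site).
    X = np.where(miss, np.nanmean(C, axis=1, keepdims=True), C)
    for _ in range(n_iter):
        U, d, Vt = np.linalg.svd(X, full_matrices=False)
        fit = (U[:, :rank] * d[:rank]) @ Vt[:rank]
        X_new = np.where(miss, fit, C)   # update missing cells only
        done = np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X)
        X = X_new
        if done:
            break
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return X, U[:, :rank], d[:rank], Vt[:rank]

# Hypothetical data: exact rank-2 matrix with about 20% of entries missing.
rng = np.random.default_rng(2)
S, T = 30, 104
t = np.arange(T)
F_true = np.vstack([np.ones(T), np.cos(2 * np.pi * t / 52)])
truth = np.column_stack([3 + rng.normal(size=S), rng.normal(size=S)]) @ F_true
C_obs = truth.copy()
mask = rng.random((S, T)) < 0.2
C_obs[mask] = np.nan

X, U, d, Vt = em_svd(C_obs, rank=2)
row_fill = np.where(mask, np.nanmean(C_obs, axis=1, keepdims=True), C_obs)
err_mean = np.sqrt(np.mean((row_fill[mask] - truth[mask]) ** 2))
err_em = np.sqrt(np.mean((X[mask] - truth[mask]) ** 2))
print(err_mean, err_em)  # imputation RMSE: row-mean fill vs iterative SVD
```

In practice the rank would be chosen from the singular-value plot, and the resulting right singular vectors would still need smoothing before being used as trend components.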