file - Environmental Health

SUPPLEMENTAL MATERIALS
Areal weighting method validation
To validate our areal weighting approach to reformulating incongruent administrative indicators, we aggregated three high-resolution air pollution surfaces with differing spatial patterns to each administrative unit. We then reformulated CD-, PP- and SD-level concentrations to UHFs. Comparing actual and reformulated UHF-level concentrations across pollutants and administrative units shows how well the method reproduces global spatial variation when the underlying within-area variability is unavailable. Kernel density plots and their bandwidths (Figure S1) supported reformulating CD- and PP-level data to UHFs, but not SD-level data.
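The following is a minimal sketch of area-weighted reformulation. It assumes that concentrations are treated as intensive variables, so each target value is an area-weighted mean. It also assumes that the overlap areas between source units (e.g., CDs) and target units (UHFs) have already been computed in a GIS. The function names and toy numbers are hypothetical. The validation summary (Pearson r and mean absolute difference) is one simple way to compare actual and reformulated values; it is not necessarily the comparison used in this analysis.

```python
import numpy as np


def reformulate_area_weighted(source_values, overlap_area):
    """Area-weighted mean of an intensive variable (e.g., a pollutant concentration).

    source_values: shape (n_source,), one value per source unit (e.g., CD or PP).
    overlap_area:  shape (n_source, n_target), intersection area between each
                   source unit and each target unit (e.g., UHF).
    Returns shape (n_target,): the reformulated target-unit values.
    """
    values = np.asarray(source_values, dtype=float)
    area = np.asarray(overlap_area, dtype=float)
    covered = area.sum(axis=0)  # area of each target unit covered by source units
    if np.any(covered <= 0):
        raise ValueError("each target unit must overlap at least one source unit")
    # Sum of source values weighted by overlap area, divided by total overlap area.
    return values @ area / covered


def validation_summary(actual, reformulated):
    """Pearson r and mean absolute difference between actual and reformulated values."""
    actual = np.asarray(actual, dtype=float)
    reformulated = np.asarray(reformulated, dtype=float)
    r = np.corrcoef(actual, reformulated)[0, 1]
    mad = np.mean(np.abs(actual - reformulated))
    return r, mad


# Hypothetical toy data: 3 source units, 3 target units (areas in any consistent unit).
source_conc = np.array([10.0, 12.0, 20.0])
overlap = np.array([[4.0, 0.0, 1.0],
                    [1.0, 3.0, 0.0],
                    [0.0, 2.0, 4.0]])
target_reformulated = reformulate_area_weighted(source_conc, overlap)  # approx. [10.4, 15.2, 18.0]

target_actual = np.array([10.0, 16.0, 17.0])  # hypothetical "true" target values
r, mad = validation_summary(target_actual, target_reformulated)
```

Concentrations are averaged rather than summed. Count-type indicators would instead be apportioned by area share, which is a different formula.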
Spatial Regression
We explored multiple spatial statistical techniques that account for spatial dependence in bivariate correlations: Geographically Weighted Regression (GWR), Conditional Autoregressive (CAR) models and Spatial (Simultaneous) Autoregressive (SAR) models. Given the irregular shape and size of the units, we selected SAR as most appropriate for NYC administrative data.

GWR allows regression coefficients to vary across space (i.e., non-stationarity). Each observation (e.g., sampling point or areal unit) is the target of a separate regression, spatially weighted against the entire domain [1]. GWR is most useful in multivariate models for which an inverse-distance weighting scheme is desirable (e.g., proximity analysis), and for research questions focusing on locally varying predictor-outcome relationships.
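For illustration only, here is a bare-bones GWR sketch with a fixed Gaussian kernel and a user-supplied bandwidth. Reference [1] also covers adaptive kernels and data-driven bandwidth selection (e.g., cross-validation or AIC), which are omitted here. Each observation gets its own weighted least-squares fit, with weights that decay with distance from that observation. The function name `gwr_coefficients` and the simulated data are hypothetical.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Local GWR coefficients using a fixed Gaussian distance-decay kernel.

    coords:    (n, 2) coordinates of points or areal-unit centroids
    X:         (n, k) predictors; an intercept column is prepended here
    y:         (n,) outcome
    bandwidth: kernel bandwidth, in the same units as coords
    Returns an (n, k + 1) array with one row of coefficients per observation.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    betas = np.empty((len(y), X1.shape[1]))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances from target observation i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian weights, equal to 1 at i
        XtW = X1.T * w                                   # X'W without building diag(w)
        betas[i] = np.linalg.solve(XtW @ X1, XtW @ y)    # weighted least squares at i
    return betas


# Hypothetical example: the effect of x increases from west to east.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
slope = 0.5 + 0.2 * coords[:, 0]  # true local slope varies with the first coordinate
y = 1.0 + slope * x + rng.normal(scale=0.1, size=200)
local = gwr_coefficients(coords, x[:, None], y, bandwidth=1.5)
```

As a sanity check, a very large bandwidth drives every weight toward 1. Each local fit then collapses to the single global OLS estimate.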
CAR and SAR models both account for spatial autocorrelation globally. However, CAR models require a symmetric covariance structure, which constrains how the spatial weights can be defined. As a result, CAR spatial weights are often specified as continuous inverse-distance decay functions, which are not ideal for irregularly shaped and sized areal units [2,3]. Some analyses have reported negligible differences between SAR and CAR results [4,5]. Even so, SAR's requirements, including a flexible definition of spatial weights that need not be symmetric, better match NYC administrative data.
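The symmetry point can be made concrete with a small numeric sketch. It uses a hypothetical four-unit contiguity structure with unequal numbers of neighbors and an arbitrary ρ = 0.5. Row-standardized weights W are not symmetric. Even so, the SAR covariance, proportional to [(I − ρW)ᵀ(I − ρW)]⁻¹, is symmetric by construction. Plugging the same W into the CAR form (I − ρW)⁻¹ does not give a valid covariance. A CAR built on row-standardized weights must also fix the conditional variances (here M = diag(1 / number of neighbors)), which ties the weights and variances together.

```python
import numpy as np

# Hypothetical binary contiguity (symmetric) among 4 irregular areal units.
C = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
n_neighbors = C.sum(axis=1)          # [2, 3, 2, 1]: unequal, as with irregular units
W = C / n_neighbors[:, None]         # row-standardized weights: not symmetric
I = np.eye(len(C))
rho = 0.5                            # arbitrary autocorrelation parameter


def is_symmetric(a):
    return np.allclose(a, a.T)


# SAR covariance (up to sigma^2): [(I - rho W)'(I - rho W)]^{-1}, symmetric for any W.
A = I - rho * W
sar_cov = np.linalg.inv(A.T @ A)

# Reusing the same W in the CAR form (I - rho W)^{-1} gives a non-symmetric matrix.
car_naive = np.linalg.inv(I - rho * W)

# A valid CAR with row-standardized W needs conditional variances M = diag(1 / n_neighbors),
# so that (I - rho W)^{-1} M equals (diag(n_neighbors) - rho C)^{-1}, which is symmetric.
M = np.diag(1.0 / n_neighbors)
car_cov = np.linalg.inv(I - rho * W) @ M

print(is_symmetric(W), is_symmetric(sar_cov), is_symmetric(car_naive), is_symmetric(car_cov))
# False True False True
```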
SAR specification begins with diagnostic tests for spatial autocorrelation (e.g., Moran’s I, p < 0.05) in the residuals of single-predictor Ordinary Least Squares (OLS) regressions. Autocorrelation among area-level measures may arise in two ways. The first is underlying social and/or chemical processes that produce inherent spatial clustering (e.g., higher air pollution concentrations closer to fixed sources). The second is “spill-over effects,” i.e., a mismatch between the true scale of the underlying process and the administrative unit used (e.g., a neighborhood split across two Police Precincts).
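A minimal sketch of this residual diagnostic is shown below. It assumes a row-standardized rook-contiguity weights matrix on a hypothetical regular grid; real NYC units would need contiguity derived from their polygons. For simplicity, the p-value is a one-sided permutation pseudo p-value. OLS residuals are not strictly exchangeable, so analytic tests for regression residuals are often preferred in practice. All function names are hypothetical.

```python
import numpy as np


def rook_weights(nrow, ncol):
    """Row-standardized rook-contiguity weights for a regular nrow x ncol grid."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def morans_i(z, W):
    """Moran's I of z under spatial weights W (zero diagonal)."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)


def ols_residual_morans_i(y, x, W, n_perm=999, seed=0):
    """Fit single-predictor OLS, then test its residuals for positive autocorrelation."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    observed = morans_i(e, W)
    rng = np.random.default_rng(seed)
    perms = np.array([morans_i(rng.permutation(e), W) for _ in range(n_perm)])
    p = (1 + np.sum(perms >= observed)) / (n_perm + 1)  # one-sided pseudo p-value
    return observed, p


# Hypothetical data on a 10 x 10 grid with spatially autocorrelated errors.
W = rook_weights(10, 10)
rng = np.random.default_rng(42)
x = rng.normal(size=100)
u = np.linalg.solve(np.eye(100) - 0.8 * W, rng.normal(size=100))
y = 2.0 + 0.5 * x + u
I_obs, p_value = ols_residual_morans_i(y, x, W)
```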
To account for residual autocorrelation, SAR incorporates either a spatial lag term (SARlag) or a spatial error term (SARerr) [6,7].

Generally, where residual autocorrelation is inherent in the predictor variable(s), SARlag models may be more appropriate. They apply a weighted autoregressive term (Wy) to the response variable: y = ρWy + Xβ + ε. Here W is the spatial weights matrix, X the matrix of predictors, and ε a vector of i.i.d. error terms.

SARerr models are useful when spatial dependence is observed primarily in the residuals. They incorporate an autoregressive error term: y = Xβ + ε, with ε = λWε + u. Here ε is a vector of spatially correlated error terms and u is a vector of i.i.d. errors [8,9].

We had no a priori hypothesis about the nature of the spatial processes operating across our multiple indicators. We therefore assumed that different units and variables might give rise to diverse autocorrelation structures, and used Lagrange Multiplier test statistics to choose the SAR model type (i.e., error or lag), following standard decision criteria [7, pp. 196-200].
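The Lagrange Multiplier step can be sketched as follows. The code computes the standard LM-error and LM-lag statistics and their robust variants from OLS residuals, and compares each with a χ²(1) distribution. The hypothetical `choose_sar_type` helper encodes a simplified reading of the criteria in [7]:

- If neither simple test is significant, keep OLS.
- If only one is significant, choose that model.
- If both are significant, go with the larger robust statistic.

It is not GeoDa's code. The simulated data follow the SARerr form y = Xβ + ε, ε = λWε + u on a hypothetical grid.

```python
import numpy as np
from scipy.stats import chi2


def rook_weights(nrow, ncol):
    """Row-standardized rook-contiguity weights for a regular nrow x ncol grid."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def lm_tests(y, X, W):
    """LM-error, LM-lag and robust variants; X must include an intercept column."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / n
    T = np.trace(W.T @ W + W @ W)
    WXb = W @ (X @ beta)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # residual-maker matrix
    nJ = T + (WXb @ M @ WXb) / s2
    d_err = (e @ W @ e) / s2
    d_lag = (e @ W @ y) / s2
    stats = {
        "LM_err": d_err ** 2 / T,
        "LM_lag": d_lag ** 2 / nJ,
        "RLM_err": (d_err - (T / nJ) * d_lag) ** 2 / (T - T ** 2 / nJ),
        "RLM_lag": (d_lag - d_err) ** 2 / (nJ - T),
    }
    # Each entry: (statistic, p-value from chi-square with 1 df).
    return {name: (value, chi2.sf(value, df=1)) for name, value in stats.items()}


def choose_sar_type(tests, alpha=0.05):
    """Simplified LM decision rule: returns 'OLS', 'SARerr' or 'SARlag'."""
    err_sig = tests["LM_err"][1] < alpha
    lag_sig = tests["LM_lag"][1] < alpha
    if not (err_sig or lag_sig):
        return "OLS"
    if err_sig != lag_sig:
        return "SARerr" if err_sig else "SARlag"
    # Both simple tests significant: compare robust statistics (both chi-square, 1 df).
    return "SARerr" if tests["RLM_err"][0] >= tests["RLM_lag"][0] else "SARlag"


# Hypothetical SARerr data: y = X b + e, e = lambda W e + u, with lambda = 0.7.
# (A SARlag analogue would be y = solve(I - rho W, X b + u).)
W = rook_weights(15, 15)
n = W.shape[0]
rng = np.random.default_rng(7)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.7 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + eps
tests = lm_tests(y, X, W)
decision = choose_sar_type(tests)
```

With data simulated this way, one would expect the error-model statistics to dominate. In any real application, though, the decision should be read alongside the test values themselves.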
Despite widespread univariate autocorrelation, relatively few (20%) bivariate comparisons called for SAR. The main analysis therefore prioritized comparability among r-values and used only OLS r-values to estimate spatial correlation. Model fit improved in all SAR models, as measured by the likelihood ratio test [7].

SAR model type was not patterned by administrative unit or by stressor construct. Most comparisons (88%, n = 63) called for an error model, versus a lag model (n = 9). Figure S2 shows SAR pseudo-r-values and illustrates the irregular nature of spatial dependence structures across units of aggregation and constructs.

Although the two are not directly comparable, all SAR pseudo-r-values were stronger than the corresponding OLS r-values. The size of that increase, however, varied substantially. Among SARerr models the mean difference was 0.34 (range 0.02 to 0.81). Among SARlag models it was smaller (mean 0.14, range 0.01 to 0.42) but also highly variable.
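Below is a sketch of the likelihood ratio comparison for the error model. It assumes maximum likelihood estimation via the concentrated log-likelihood and, for transparency, a simple grid search over λ; GeoDa and similar software use numerical optimizers instead. The test statistic LR = 2(ln L_SARerr − ln L_OLS) is compared with χ²(1), because the SARerr model adds one parameter. The function names and the simulated grid data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def rook_weights(nrow, ncol):
    """Row-standardized rook-contiguity weights for a regular nrow x ncol grid."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood given residuals (sigma^2 set to its MLE)."""
    n = len(resid)
    s2 = resid @ resid / n
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)


def ols_loglik(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return gaussian_loglik(y - X @ beta)


def sar_error_loglik(y, X, W, lam):
    """Concentrated log-likelihood of y = X b + e, e = lam W e + u, at fixed lam."""
    A = np.eye(len(y)) - lam * W
    ys, Xs = A @ y, A @ X                      # spatially filtered data
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    _, logdet = np.linalg.slogdet(A)           # log|det(I - lam W)|, the Jacobian term
    return gaussian_loglik(ys - Xs @ beta) + logdet


def lr_test_sar_error(y, X, W, grid=None):
    """Grid-search ML for lam, then LR test of SARerr against OLS (chi-square, 1 df)."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)   # inside (-1, 1) for row-standardized W
    ll = np.array([sar_error_loglik(y, X, W, lam) for lam in grid])
    best = int(np.argmax(ll))
    lr = 2.0 * (ll[best] - ols_loglik(y, X))
    return grid[best], lr, chi2.sf(lr, df=1)


# Hypothetical SARerr data (lambda = 0.6) on a 12 x 12 grid.
W = rook_weights(12, 12)
n = W.shape[0]
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(size=n))
lam_hat, lr, p = lr_test_sar_error(y, X, W)
```

At λ = 0 the SARerr log-likelihood reduces exactly to the OLS log-likelihood. This is why the LR statistic cannot be negative when the search grid includes zero.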
Importantly, SAR specification is determined jointly by all model covariates. Multivariate epidemiological model specification therefore depends on the underlying structures and spatial interactions present. The predominance of error over lag SAR models in this analysis may serve as a starting point for spatial adjustment in more complex multivariable models. Spatial regression models are relatively new to environmental health research. Incorporating sensitivity tests for spatial autocorrelation into preliminary exploration and variable selection proved useful for understanding SAR model specification.
REFERENCES
1. Fotheringham AS, Brunsdon C, Charlton ME: Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester: Wiley; 2002.
2. Goovaerts P: Geostatistical analysis of county-level lung cancer mortality rates in the southeastern United States. Geogr Anal 2010, 42:32-52.
3. Kelsall J, Wakefield J: Modeling spatial variation in disease risk: a geostatistical approach. J Am Stat Assoc 2002, 97:692-701.
4. Wall MM: A close look at the spatial structure implied by the CAR and SAR models. J Stat Plan Inference 2004, 121:311-324.
5. Lichstein JW, Simons TR, Shriner SA, Franzreb KE: Spatial autocorrelation and autoregressive models in ecology. Ecol Monogr 2002, 72:445-463.
6. Anselin L, Bera AK: Spatial dependence in linear regression models with an introduction to spatial econometrics. In Handbook of Applied Economic Statistics. Edited by Ullah A, Giles DEA. New York: Marcel Dekker; 1998.
7. Anselin L: Exploring Spatial Data with GeoDa: A Workbook. Urbana-Champaign, IL: Center for Spatially Integrated Social Science, University of Illinois; 2005.
8. Kissling WD, Carl G: Spatial autocorrelation and the selection of simultaneous autoregressive models. Global Ecol Biogeogr 2008, 17:59-71.
9. de Smith M, Goodchild M, Longley P: Spatial autoregressive and Bayesian modeling. In Geospatial Analysis, 4th edition. [http://www.spatialanalysisonline.com/HTML/index.html]
Table S1: Unit of aggregation (MAUP) effects on correlation measures. Correlation coefficients (Pearson r-values) between multiple pollutants and Census variables, aggregated to three different administrative units.

Unit         Census variable   PM2.5     EC    NO2    SO2     O3
CD (n=59)    % NonWhite        -0.21  -0.10  -0.14   0.03   0.07
             % < 200% FPL       0.01   0.13   0.02   0.24  -0.02
             % Unemp            0.04   0.17   0.02   0.33  -0.09
PP (n=74)    % NonWhite        -0.16  -0.01  -0.11   0.10   0.01
             % < 200% FPL      -0.19  -0.11  -0.18   0.18   0.32
             % Unemp           -0.05   0.01  -0.08   0.25   0.25
UHF (n=34)   % NonWhite        -0.27  -0.20  -0.23   0.03   0.32
             % < 200% FPL       0.08   0.14   0.10   0.20  -0.01
             % Unemp           -0.02   0.07   0.00   0.29  -0.02

FPL, federal poverty level; Unemp, unemployed.

Table S2: SAR and OLS r-values, by stressor construct. Shaded areas indicate SAR pseudo-r-values (tan = SAR error model, blue = SAR lag model). Bold values indicate rho ≥ 0.60.
                                    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27
 1. Larceny                         1
 2. Murder                      -0.13     1
 3. Assault                      0.64  0.68     1
 4. Robbery                      0.33  0.60  0.90     1
 5. Burglary                     0.90  0.13  0.40  0.62     1
 6. Safety                      -0.33  0.73  0.82  0.73 -0.06     1
 7. Child abuse                 -0.02  0.85  0.88  0.82 -0.01  0.85     1
 8. Parks unclean               -0.37  0.20 -0.38  0.02 -0.26  0.15 -0.74     1
 9. Sidewalks unclean           -0.16  0.83  0.86  0.81  0.41  0.76  0.86  0.10     1
10. Serious housing violations  -0.26  0.66  0.71  0.62  0.08  0.84  0.67  0.09  0.74     1
11. Air quality complaints       0.84 -0.34 -0.40  0.03  0.43 -0.52 -0.77 -0.80 -0.26 -0.36     1
12. Crowding                    -0.21  0.17  0.45  0.46  0.41  0.39  0.27  0.14  0.48  0.58 -0.20     1
13. No insurance                -0.44  0.21  0.41  0.18 -0.41  0.51  0.28  0.30  0.38  0.50 -0.43  0.63     1
14. Without needed care         -0.30  0.41  0.41  0.39  0.03  0.53  0.38  0.15  0.35  0.63 -0.29  0.34  0.50     1
15. No medical provider         -0.03  0.09  0.22  0.22  0.09  0.44  0.15 -0.05  0.32  0.43 -0.03  0.60  0.72  0.22     1
16. Public health insurance     -0.44  0.45  0.68  0.50  0.04  0.70  0.74  0.28  0.69  0.63 -0.52  0.79  0.56  0.37  0.46     1
17. Freq. noise disrupt          0.23  0.11  0.54  0.52  0.13  0.39  0.34 -0.06  0.24  0.30  0.22  0.33  0.23  0.25  0.24  0.39     1
18. Traffic noise disrupt        0.45 -0.19  0.14  0.18  0.10 -0.02 -0.11 -0.20 -0.01 -0.09  0.45  0.05 -0.03 -0.04  0.17  0.01  0.73     1
19. Neighbor noise disrupt      -0.28  0.50  0.47  0.26 -0.06  0.56  0.53  0.36  0.47  0.51 -0.31  0.41  0.41  0.26  0.18  0.52  0.42  0.01     1
20. Delayed rent/mortgage       -0.37  0.67  0.74  0.60  0.38  0.82  0.73  0.52  0.28  0.81 -0.54  0.53  0.62  0.56  0.60  0.67  0.41 -0.32  0.64     1
21. Food Stamp enrollment       -0.27  0.66  0.82  0.85  0.10  0.84  0.95  0.69  0.87  0.74 -0.38  0.46  0.36  0.40  0.26  0.88  0.42 -0.02  0.52  0.70     1
22. Less high school education  -0.25  0.31  0.49  0.49  0.14  0.66  0.52  0.04  0.52  0.61 -0.28  0.83  0.55  0.49  0.55  0.80  0.36  0.04  0.31  0.63  0.62     1
23. Unemployed                   0.20  0.39  0.52  0.48  0.30  0.83  0.36  0.00  0.35  0.42  0.04  0.28  0.41  0.46  0.73  0.21  0.40  0.08  0.33  0.63  0.31  0.46     1
24. Poverty                     -0.08  0.51  0.72  0.60 -0.03  0.75  0.78  0.19  0.80  0.80 -0.40  0.79  0.57  0.42  0.52  0.92  0.57  0.18  0.55  0.63  0.91  0.80  0.28     1
25. % NonWhite                  -0.34  0.64  0.72  0.58 -0.01  0.79  0.71  0.10  0.50  0.72 -0.52  0.37  0.50  0.59  0.44  0.70  0.20  0.14  0.30  0.77  0.73  0.60  0.36  0.55     1
26. % African American          -0.21  0.75  0.55  0.52 -0.02  0.59  0.48  0.15  0.51  0.49 -0.35  0.08  0.17  0.48  0.02  0.24  0.00  0.25  0.30  0.68  0.49  0.11  0.32  0.16  0.70     1
27. % Hispanic                  -0.25  0.14  0.40  0.35 -0.10  0.59  0.63  0.01  0.27  0.63 -0.32  0.60  0.49  0.39  0.55  0.70  0.34  0.06  0.24  0.47  0.64  0.76  0.20  0.63  0.57 -0.06     1

Column headings 1-27 refer to the correspondingly numbered row variables; the matrix is lower-triangular, with 1 on the diagonal.
FIGURE CAPTIONS
Figure S1: Kernel density plots comparing distributions of pollutant concentrations (wintertime PM2.5 and SO2, and summertime O3) for UHFs (row 1) and for values reformulated to UHFs from other administrative units (rows 2-4).