Strength of Spatial Correlation and Spatial Designs: Effects on Covariance Estimation

Kathryn M. Irvine
Oregon State University
Alix I. Gitelman
Sandra E. Thompson
R82-9096-01
The research described in this presentation has been funded by the U.S.
Environmental Protection Agency through the STAR Cooperative Agreement
CR82-9096-01 Program on Designs and Models for Aquatic Resource Surveys
at Oregon State University. It has not been subjected to the Agency's review and
therefore does not necessarily reflect the views of the Agency, and no official
endorsement should be inferred.
Talk Outline
• Stream Sulfate Concentration
– Geostatistical Model
– Preliminary Findings
• Simulations
• Results
– Parameter Estimation
• Discussion
Study Objective:
Model the spatial heterogeneity of sulfate concentration
in streams of the Mid-Atlantic U.S.
Why stream sulfate concentration?
– Indirectly toxic to fish and aquatic biota
• Decrease in streamwater pH
• Increase in metal concentrations (Al)
– Observed positive spatial relationship with
atmospheric SO₄²⁻ deposition
(Kaufmann et al. 1991)
The Data
• EMAP (Environmental Monitoring and Assessment Program) water chemistry data
– 322 stream locations
• Watershed variables:
– % forest, % agriculture, % urban, % mining
– % within ecoregions with high sulfate
adsorption soils
• National Atmospheric Deposition Program (NADP)
EMAP and NADP locations
[Map: EMAP stream sites (MAHA/MAIA) and NADP sites]
Geostatistical Model
$Y(s) = X(s)\beta + \varepsilon(s)$    (1)
where $Y(s)$ is the vector of observed ln(SO₄²⁻) concentrations at stream locations $s$,
$X(s)$ is the matrix of watershed explanatory variables,
$\beta$ is the vector of unknown regression coefficients, and
$\varepsilon(s)$ is the spatial error process, with
$\varepsilon(s) \sim N_n(0, \Sigma), \qquad \Sigma = \tau^2 I + \sigma^2 \exp(-\phi D)$
where $D$ is the matrix of pairwise distances (exp is applied elementwise),
$\phi$ is 1/range,
$\sigma^2$ is the partial sill, and
$\tau^2$ is the nugget.
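A minimal NumPy sketch of model (1), assuming an intercept-only mean and locations on the 10 × 10 square used in the simulations. It builds $\Sigma = \tau^2 I + \sigma^2\exp(-\phi D)$ and draws one realization through a Cholesky factor. The function names (`exp_cov`, `simulate_field`) are hypothetical; this is not the RandomFields-based code used in the study.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance matrix D for an (n, 2) array of locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def exp_cov(coords, sigma2, tau2, phi):
    """Sigma = tau2 * I + sigma2 * exp(-phi * D), with exp applied elementwise."""
    d = pairwise_distances(coords)
    return tau2 * np.eye(len(coords)) + sigma2 * np.exp(-phi * d)


def simulate_field(coords, X, beta, sigma2, tau2, phi, rng):
    """Draw Y = X beta + eps with eps ~ N_n(0, Sigma)."""
    sigma = exp_cov(coords, sigma2, tau2, phi)
    chol = np.linalg.cholesky(sigma)  # Sigma = L L^T
    eps = chol @ rng.standard_normal(len(coords))
    return X @ beta + eps


rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(144, 2))
X = np.ones((144, 1))  # intercept-only mean (assumption for illustration)
# range parameter 1 (phi = 1), nugget-to-sill ratio 0.5, total sill 1
y = simulate_field(coords, X, np.array([0.0]), sigma2=0.5, tau2=0.5, phi=1.0, rng=rng)
```

The nugget adds $\tau^2$ to the diagonal only, so the variance of each observation is $\sigma^2 + \tau^2$ (the sill) while covariances between distinct sites come from the partial sill alone.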
Effective Range
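Definition:
1) Distance beyond which the correlation between observations is less than or equal to 0.05.
2) Distance at which the semi-variogram reaches 95% of the sill.
For the exponential model above, both definitions give
$\text{Effective range} = -\frac{1}{\phi}\log\left(0.05\,\frac{\sigma^2 + \tau^2}{\sigma^2}\right)$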
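The formula can be checked numerically. This sketch assumes that "range parameter" means $1/\phi$ and that the nugget-to-sill ratio is $\tau^2/(\sigma^2+\tau^2)$; that reading is an assumption, but it reproduces the simulation values listed later in the talk. The helper names are hypothetical.

```python
import math


def effective_range(range_param, ratio):
    """Effective range for the exponential model with nugget.

    range_param = 1 / phi; ratio = tau2 / (sigma2 + tau2) (assumed definition).
    Solves sigma2 * exp(-phi * h) / (sigma2 + tau2) = 0.05 for h.
    """
    partial_share = 1.0 - ratio  # sigma2 / (sigma2 + tau2)
    return -range_param * math.log(0.05 / partial_share)


def semivariogram(h, sigma2, tau2, phi):
    """gamma(h) = tau2 + sigma2 * (1 - exp(-phi * h)) for h > 0, and 0 at h = 0."""
    if h == 0:
        return 0.0
    return tau2 + sigma2 * (1.0 - math.exp(-phi * h))


for ratio in (0.10, 0.50, 0.90):
    print(ratio, round(effective_range(1, ratio), 2), round(effective_range(3, ratio), 2))
```

At the effective range the semi-variogram equals $0.95(\sigma^2+\tau^2)$, which is why definitions 1 and 2 coincide for this model. Evaluated at range parameters 1 and 3, the function matches the "Effective Range Values for Simulations" table to within about 0.01 when the 0.33 and 0.67 ratios are taken as 1/3 and 2/3.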
[Figure: Empirical semi-variogram with ML and REML fitted models; nugget, partial sill, and effective ranges (197 km and 272 km) marked; axes: distance (0–300 km) vs. semi-variance (0.0–1.5)]
Interpretations of Spatial Covariance Parameters
• Patch Characteristics
(Rossi et al. 1992; Robertson and Gross 1994; Dalthorp et al. 2000;
Schwarz et al. 2003 and more)
– Effective Range ~ Size of Patch
– Nugget ~ Tightness of Patches
• Sample Design Modifications
– Effective Range: spacing needed for independent samples
– Nugget: Measurement Error
Why Are the Estimates Different?
Simulation Study
Strength of Spatial Correlation?
– Nugget-to-sill ratio and/or range parameter
• Mardia & Marshall (1984): measurement error increases the
variability of ML estimates of the range
• Zimmerman & Zimmerman (1991): REML and ML perform better
when the spatial signal is weak (short range)
• Lark (2000): ML performs better than the method of moments (MOM)
with a short range and a large nugget-to-sill ratio
• Thompson (2001): estimation for the Matérn model with 20% and
50% nugget under different spatial designs
Is the spatial correlation too weak?
Effective Range Values for Simulations
Nugget-to-Sill Ratio   Range Parameter = 1   Range Parameter = 3
0.10                   2.89                  8.67
0.33                   2.59                  7.77
0.50                   2.30                  6.90
0.67                   1.90                  5.70
0.90                   0.69                  2.07
EMAP estimates re-scaled: range parameter ~1.5, nugget-to-sill ratio ~0.50
Is it the spatial sample design?
– Cluster designs are optimal for covariance parameter estimation
(Pettitt and McBratney 1993; Müller and Zimmerman 1999; Zhu and Stein 2005; Xia et al. 2006;
Zimmerman 2006; Zhu and Zhang 2006)
[Figure: Example lattice, random, and cluster designs on the 10 × 10 square, for n = 144 and n = 361]
Zimmerman (2006) and Thompson (2001)
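The three design types can be generated along the following lines. The lattice and uniform-random constructions are standard; the cluster construction (36 cluster centres, points scattered within radius 0.5) is a hypothetical stand-in, because the slides do not give the exact cluster layout of Zimmerman (2006) or Thompson (2001). All function names are illustrative.

```python
import numpy as np


def lattice_design(n, side=10.0):
    """Square lattice of n = k * k points spanning [0, side]^2."""
    k = int(round(np.sqrt(n)))
    if k * k != n:
        raise ValueError("lattice design needs a perfect-square n")
    g = np.linspace(0.0, side, k)
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])


def random_design(n, rng, side=10.0):
    """n locations drawn uniformly on [0, side]^2."""
    return rng.uniform(0.0, side, size=(n, 2))


def cluster_design(n, rng, n_clusters=36, radius=0.5, side=10.0):
    """Hypothetical cluster design: uniform cluster centres, with points spread
    uniformly over a disk of the given radius around each centre."""
    centers = rng.uniform(radius, side - radius, size=(n_clusters, 2))
    which = np.arange(n) % n_clusters  # assign points to clusters round-robin
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))  # uniform over the disk area
    offsets = np.column_stack([r * np.cos(angle), r * np.sin(angle)])
    return centers[which] + offsets


rng = np.random.default_rng(0)
designs = {
    n: {"lattice": lattice_design(n),
        "random": random_design(n, rng),
        "cluster": cluster_design(n, rng)}
    for n in (144, 361)
}
```

Keeping the square fixed while n grows from 144 (12 × 12) to 361 (19 × 19) is what makes this an infill rather than an increasing-domain comparison.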
Simulation Study
• Spatial Designs: Lattice, Random, Cluster
• Range Parameter = 1 and 3
• Nugget/Sill Ratio:
0.10, 0.33, 0.50, 0.67, 0.90
• n = 144 and n = 361 (infill asymptotics)
• 100 realizations per combination
• Simulated with the RandomFields package in R
• Estimation using R code (Ver Hoef 2004)
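The estimation code itself (Ver Hoef 2004) is not shown in the slides. As a hedged illustration of what ML and REML maximize, here is a NumPy sketch that profiles out the regression coefficients and the total sill and then minimizes over a coarse grid of ($\phi$, nugget-to-sill ratio). The grid search, the grid limits, and the names `neg_profile_loglik` and `fit_grid` are assumptions chosen for clarity, not the study's optimizer.

```python
import numpy as np


def neg_profile_loglik(phi, ratio, y, X, D, reml=False):
    """Negative profile log-likelihood (constants dropped) for
    Sigma = s2 * R,  R = ratio * I + (1 - ratio) * exp(-phi * D),
    where ratio = tau2 / (sigma2 + tau2) and s2 = sigma2 + tau2 is profiled out."""
    n, p = X.shape
    R = ratio * np.eye(n) + (1.0 - ratio) * np.exp(-phi * D)
    L = np.linalg.cholesky(R)
    Xw = np.linalg.solve(L, X)  # L^{-1} X
    yw = np.linalg.solve(L, y)  # L^{-1} y
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)  # GLS estimate of beta
    resid = yw - Xw @ beta
    q = resid @ resid  # (y - X beta)^T R^{-1} (y - X beta)
    logdet_R = 2.0 * np.sum(np.log(np.diag(L)))
    if reml:
        # REML: divide by n - p and add log|X^T R^{-1} X|
        s2 = q / (n - p)
        _, logdet_xrx = np.linalg.slogdet(Xw.T @ Xw)
        obj = 0.5 * ((n - p) * np.log(s2) + logdet_R + logdet_xrx)
    else:
        s2 = q / n
        obj = 0.5 * (n * np.log(s2) + logdet_R)
    return obj, s2, beta


def fit_grid(y, X, D, phis, ratios, reml=False):
    """Minimise the profile objective over a (phi, ratio) grid."""
    best = None
    for phi in phis:
        for ratio in ratios:
            obj, s2, beta = neg_profile_loglik(phi, ratio, y, X, D, reml)
            if best is None or obj < best[0]:
                best = (obj, phi, ratio, s2, beta)
    _, phi, ratio, s2, beta = best
    return {"phi": phi, "nugget": ratio * s2,
            "partial_sill": (1.0 - ratio) * s2, "beta": beta}


# One simulated realisation: range parameter 1 (phi = 1), ratio 0.5, n = 144
rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(144, 2))
D = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
Sigma = 0.5 * np.eye(144) + 0.5 * np.exp(-1.0 * D)
y = np.linalg.cholesky(Sigma) @ rng.standard_normal(144)
X = np.ones((144, 1))  # intercept-only mean (assumption)

phis = np.linspace(0.2, 3.0, 15)
ratios = np.linspace(0.05, 0.95, 19)
fit_ml = fit_grid(y, X, D, phis, ratios)
fit_reml = fit_grid(y, X, D, phis, ratios, reml=True)
```

The two criteria differ only in the REML denominator n − p and the extra $\log|X^T R^{-1} X|$ term, which account for the degrees of freedom used in estimating $\beta$; a real study would use a continuous optimizer rather than a grid.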
1. Estimation of Covariance Parameters
The Effective Range
Results for Estimation of Effective Range
Percentiles (10%, 50%, 90%) of the estimation error, where Estimation Error = estimate - truth

Range Parameter = 1
Ratio   Design    Method        10%       50%       90%
0.10    grid      ML          -0.88     -0.20      0.76
0.10    grid      REML        -0.82      0.00      1.11
0.10    random    ML          -0.88     -0.25      0.62
0.10    random    REML        -0.79     -0.08      0.92
0.10    cluster   ML          -1.01     -0.27      0.94
0.10    cluster   REML        -0.92     -0.11      1.40
0.90    grid      ML        -359.92     -0.20      0.48
0.90    grid      REML      -300.49      0.10    502.84
0.90    random    ML        -341.89     -0.16      0.59
0.90    random    REML      -295.10      0.06   1390.17
0.90    cluster   ML          -7.21     -0.30      0.65
0.90    cluster   REML        -1.04     -0.02     30.84

Range Parameter = 3
Ratio   Design    Method        10%       50%       90%
0.10    grid      ML          -4.79     -2.18      2.31
0.10    grid      REML        -4.40     -0.86      9.89
0.10    random    ML          -4.48     -2.01      3.02
0.10    random    REML        -4.03     -0.58     10.89
0.10    cluster   ML          -5.07     -2.45      3.98
0.10    cluster   REML        -4.66     -0.74     12.75
0.90    grid      ML         -37.75     -1.31      0.77
0.90    grid      REML        -2.18     -0.10   2464.44
0.90    random    ML         -30.42     -1.34      0.84
0.90    random    REML        -2.15     -0.14    726.00
0.90    cluster   ML          -2.53     -1.40      1.63
0.90    cluster   REML        -1.91      0.35   1255.04
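The table summarizes each cell by percentiles of estimate − truth. A small NumPy sketch of that summary follows (function names hypothetical). It also shows one mechanical route to very large negative errors at a nugget-to-sill ratio of 0.90, offered as a possible explanation rather than one stated in the talk: when the estimated partial sill is below 5% of the total sill, the plug-in effective-range formula returns a negative value, and a small estimated $\phi$ magnifies it.

```python
import numpy as np


def effective_range_hat(phi, sigma2, tau2):
    """Plug-in effective range: -(1 / phi) * log(0.05 * (sigma2 + tau2) / sigma2)."""
    return -np.log(0.05 * (sigma2 + tau2) / sigma2) / phi


def error_percentiles(estimates, truth, q=(10, 50, 90)):
    """Percentiles of the estimation error (estimate - truth)."""
    err = np.asarray(estimates, dtype=float) - truth
    return np.percentile(err, q)


# A fit that assigns almost all variance to the nugget (sigma2 share below 5%)
# gives a negative effective range; a small phi inflates its magnitude.
print(effective_range_hat(phi=0.05, sigma2=0.01, tau2=0.99))
```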
Summary: Covariance Parameter Estimation
• Effective Range:
– ML underestimates the truth
– REML estimates are more right-skewed, with large 90th percentiles (large nugget-to-sill ratio and range parameter)
• Partial Sill:
– ML underestimates the truth
– REML estimates are more right-skewed, with large 90th percentiles
• Nugget:
– Estimated well, particularly with the cluster design
Discussion
– Which estimation method to use?
– Consistency results for $\sigma^2\phi$
(Chen et al. 2000, Zhang and Zimmerman 2005)
– Uncertainty estimates for REML and ML
• REML: Increasing Domain (Cressie and Lahiri 1996)
• ML: Increasing Domain and Infill Asymptotics
(Zhang and Zimmerman 2005)
Acknowledgements
• Co-Authors
• Jay Ver Hoef, Alan Herlihy, Andrew
Merton, Lisa Madsen
Questions
Results
1. Estimation of Covariance Parameters
2. Estimation of Autocorrelation Function
Results:
2. Estimation of Autocorrelation Function
Cluster Design
[Figure: ML and REML estimated autocorrelation functions by range parameter; axes: distance (0–6) vs. autocorrelation (0.0–1.0)]
Summary:
Estimation of Autocorrelation Function
• Overall Patterns:
– ML and REML both perform poorly with stronger
spatial correlation (larger effective ranges)
– REML: large variability
– ML: underestimation
– ‘Best’ case:
cluster design with range parameter = 1 and n = 361
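The curves compared in these figures are the correlation function implied by model (1): 1 at distance 0, then a jump down to $\sigma^2/(\sigma^2+\tau^2)$ because of the nugget, decaying as $\exp(-\phi h)$. The sketch below evaluates an estimated and a true curve over distances up to 6 and reports their largest gap. The maximum-gap summary and the fitted values are hypothetical illustrations, not the metric or results used in the talk.

```python
import math


def autocorrelation(h, sigma2, tau2, phi):
    """rho(h) = 1 at h = 0, and sigma2 * exp(-phi * h) / (sigma2 + tau2) for h > 0."""
    if h == 0:
        return 1.0
    return sigma2 * math.exp(-phi * h) / (sigma2 + tau2)


def max_acf_gap(est, truth, distances):
    """Largest absolute difference between estimated and true ACFs on a grid."""
    return max(abs(autocorrelation(h, *est) - autocorrelation(h, *truth))
               for h in distances)


truth = (0.5, 0.5, 1.0)   # (sigma2, tau2, phi): ratio 0.5, range parameter 1
est = (0.35, 0.55, 1.6)   # hypothetical fitted values, illustration only
grid = [0.1 * i for i in range(1, 61)]  # distances 0.1 to 6.0
print(round(max_acf_gap(est, truth, grid), 3))
```

Because underestimating $\sigma^2$ or overestimating $\phi$ both pull the curve down, this kind of comparison reflects the underestimation pattern noted for ML above.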
Wet Atmospheric Sulfate Deposition
http://www.epa.gov/airmarkets/cmap/mapgallery/mg_wetsulfatephase1.html
Estimated Autocorrelation Function for ln(SO₄²⁻)
[Figure: ML and REML fits; axes: distance (0–500 km) vs. estimated autocorrelation (0.0–1.0)]
Sketch of watershed with overlaid land-cover map (legend: Forest, Mining, Urban, Agriculture)
2. Estimation of Autocorrelation Function
Lattice Design
[Figure: ML and REML estimated autocorrelation functions by range parameter; axes: distance (0–6) vs. autocorrelation (0.0–1.0)]
2. Estimation of Autocorrelation Function
Random Design
[Figure: ML and REML estimated autocorrelation functions by range parameter; axes: distance (0–6) vs. autocorrelation (0.0–1.0)]