Local Enhancement of Global Estimation

Molly Leecaster, Ph.D.
Kerry Ritter, Ph.D.
DAMARS and STARMAP 2nd Annual Conference
Oregon State University Corvallis, OR
August 11, 2003
Acknowledgement
PROJECT FUNDING
• The work reported here was developed under the STAR
Research Assistance Agreement CR-829095 awarded by the
U.S. Environmental Protection Agency (EPA) to Colorado State
University. This presentation has not been formally reviewed by
EPA. The views expressed here are solely those of the
presenter and STARMAP, the Program they represent. EPA
does not endorse any products or commercial services
mentioned in this presentation.
Outline of Presentation
• Introduction
• Two-stage sample design
• Spatial modeling of binary EMAP data
– Indicator kriging
– Conditional autoregressive model
• Simulation Example
• Future work
Introduction
• EMAP developed for estimation of areal extent of
resources
– Sample locations are spatially separated
• EMAP participants are interested in global estimation
but also have local concerns
– Spatial modeling
• EMAP data does not provide information on the local
spatial structure required for good spatial models
• Therefore ….
Augment EMAP design to improve spatial modeling
Goals
• Present enhancement to EMAP design
• Use of enhanced sample in spatial models of
indicator data
– Indicator kriging
– Conditional autoregressive model
Two-stage: Systematic Grid Plus
Star Cluster Sample Design
• Two-stage because two goals
– Systematic (EMAP) grid for global structure
– Star cluster sample for variogram estimation
• Enhance EMAP design with additional sample
locations
– Ideal for areal extent and prediction
– Ideal for variogram estimation
Two-Stage Design
[Figure: simulated data set StarData1F with the two-stage sample overlaid; x and y axes run from 0 to 80. Legend: pink = absence, blue = presence, black = systematic sites, green = star clusters 1, orange = star clusters 2.]
Stage One:
Systematic Component (EMAP)
• Based on global estimation requirements
– e.g. 30 spatially separated locations per stratum
Stage Two:
Star Cluster Component
• Star clusters of sample sites around stage-one
locations
• Star clusters provide an estimate of small-scale pairwise variance
• Star clusters also provide many added pairs of
samples at various distance lags
• Star clusters provide directional information at small
scale
• How to specify star clusters?
Stage Two:
Star Cluster Component
• Location of star clusters
– Adaptive, locate at specified observed response
• Does this bias the variogram estimation?
– Random stage-one locations
– Systematic subset of stage-one locations
• Size of star clusters
– Diameter of star = variogram range
– Diameter of star > variogram range
• Number of star clusters
– At least two, but how many more?
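The slides leave the geometry of a star cluster open. As a rough sketch only, the Python below lays out a square systematic grid and adds clusters of sites on arms radiating from randomly chosen stage-one locations. The function names, the square grid, the six-arm layout, the two sites per arm, and the spacings are all assumptions made for illustration; none of them is taken from the presentation.

```python
import numpy as np

def systematic_grid(n_side, spacing, origin=(0.0, 0.0)):
    """Square grid of n_side x n_side sites (a stand-in for the stage-one grid)."""
    xs = origin[0] + spacing * np.arange(n_side)
    ys = origin[1] + spacing * np.arange(n_side)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def star_cluster(center, n_arms=6, pts_per_arm=2, step=2.0):
    """Extra sites on equally spaced arms radiating from one stage-one site.

    The arm count, points per arm, and step are hypothetical choices.
    """
    center = np.asarray(center, dtype=float)
    angles = 2.0 * np.pi * np.arange(n_arms) / n_arms
    radii = step * np.arange(1, pts_per_arm + 1)
    offsets = [(r * np.cos(a), r * np.sin(a)) for a in angles for r in radii]
    return center + np.array(offsets)

# Stage one: systematic sites. Stage two: stars at random stage-one sites.
stage_one = systematic_grid(n_side=5, spacing=15.0, origin=(10.0, 10.0))
rng = np.random.default_rng(0)
chosen = rng.choice(len(stage_one), size=2, replace=False)
stage_two = np.vstack([star_cluster(stage_one[i]) for i in chosen])
design = np.vstack([stage_one, stage_two])
print(design.shape)
```

Each star adds short-lag pairs in six directions, which is the information a widely spaced grid alone cannot supply to the variogram.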
Spatial Models for Binary Data
• Indicator kriging for geo-referenced data
• Conditional autoregressive model for binary lattice
data
Indicator Kriging
• Binary geo-referenced data
• Spatial correlation structure modeled from data
• Precision of predictions depends on sample spacing
and variogram parameters
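Ordinary Indicator Kriging
• Estimate local indicator mean, $F^*_{oIK}(u; z_k)$, at each location $u$
• Apply simple IK estimator using estimated mean
$$[I(u; z_k)]^*_{OK} = \sum_{\alpha=1}^{n(u)} \lambda_\alpha^{SK}(u; z_k)\, I(u_\alpha; z_k) + \lambda_m^{SK}(u; z_k)\, F^*_{oIK}(u; z_k)$$
where $\lambda_m^{SK}(u; z_k) = 1 - \sum_{\alpha=1}^{n(u)} \lambda_\alpha^{SK}(u; z_k)$ is the weight given to the mean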
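A minimal numerical sketch of ordinary indicator kriging at one location may help here. It solves the ordinary kriging system directly, which is a standard equivalent of applying the simple kriging estimator with a locally estimated mean. The function names, the spherical covariance, the four-site example, and the clipping of the estimate to [0, 1] are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def spherical_cov(h, sill, rng_a, nugget=0.0):
    """Covariance implied by a spherical variogram (sill is the total sill)."""
    h = np.asarray(h, dtype=float)
    partial = sill - nugget
    c = np.where(h < rng_a,
                 partial * (1 - 1.5 * h / rng_a + 0.5 * (h / rng_a) ** 3),
                 0.0)
    return np.where(h == 0, sill, c)

def ordinary_indicator_kriging(sites, indicators, target, sill, rng_a, nugget=0.0):
    """Return (estimated probability, kriging variance, weights) at target."""
    sites = np.asarray(sites, dtype=float)
    indicators = np.asarray(indicators, dtype=float)
    n = len(sites)
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    # Ordinary kriging system: covariances bordered by the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_cov(d, sill, rng_a, nugget)
    A[n, n] = 0.0
    c0 = spherical_cov(np.linalg.norm(sites - np.asarray(target, dtype=float), axis=1),
                       sill, rng_a, nugget)
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    lam, mu = sol[:n], sol[n]
    # Kriged indicators can stray outside [0, 1]; clip as a simple correction.
    est = float(np.clip(lam @ indicators, 0.0, 1.0))
    var = float(sill - lam @ c0 - mu)
    return est, var, lam

sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
indicators = [1, 0, 0, 1]
est, var, lam = ordinary_indicator_kriging(sites, indicators, (5.0, 5.0),
                                           sill=0.25, rng_a=22.0)
print(est, var, lam.sum())
```

At the centre of the four sites the weights are equal by symmetry, so the estimate is the local mean of the indicators; at a sampled site the estimator reproduces the observed indicator.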
Conditional Autoregressive Model
for Binary Data
• Binary lattice data
• Spatial correlation structure assumed: locally
(neighborhood) dependent Markov random field
• Neighborhood defined as fixed pattern of surrounding
grid points
• Precision of predictions depends on neighborhood
structure, grid size, and variance of response
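Conditional Autoregressive Model
for Binary Data
$$y_i = x_i a_i$$
$$x_i \sim \mathrm{Bernoulli}(p_i)$$
$$p_i = \frac{\exp(\beta_0 + s_i \beta_i)}{1 + \exp(\beta_0 + s_i \beta_i)}$$
$y_i$: observed presence/absence
$x_i$: true presence/absence
$a_i$: sample indicator
$s_i$: sum of neighborhood presence/absence
$\beta_0$, $\beta_i$: model coefficients (the Greek letters are assumed; the original symbols are not legible)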
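To make the logistic form concrete, the sketch below computes the neighborhood sums and the resulting presence probabilities for a toy lattice with fixed coefficients. It only evaluates the conditional probability; it does not fit the model. The names `neighbor_sum` and `presence_probability`, the four-site chain, the coefficient values, and the use of a single shared neighborhood coefficient `beta1` are assumptions for illustration.

```python
import numpy as np

def neighbor_sum(x, neighbors):
    """s_i: sum of presence/absence values over the neighbors of each site."""
    x = np.asarray(x)
    return np.array([x[list(nb)].sum() for nb in neighbors])

def presence_probability(s, beta0, beta1):
    """Logistic link: p_i = exp(b0 + b1 * s_i) / (1 + exp(b0 + b1 * s_i))."""
    eta = beta0 + beta1 * np.asarray(s, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))

# Toy lattice: four sites in a chain, each neighboring the adjacent sites.
neighbors = [[1], [0, 2], [1, 3], [2]]
x = np.array([1, 1, 0, 0])   # true presence/absence
a = np.array([1, 0, 1, 0])   # 1 where the site was sampled
y = x * a                    # observed presence/absence (0 where unsampled)

s = neighbor_sum(x, neighbors)
p = presence_probability(s, beta0=-1.0, beta1=1.0)  # hypothetical coefficients
print(s, np.round(p, 3), y)
```

With a positive neighborhood coefficient, sites surrounded by more presences receive a higher probability, which is how the model tends to connect observed presences.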
Comparison of Models
• Ordinary Indicator Kriging
– Advantages
• Knowledge of spatial relationship improves prediction
• Assumed spatial relationship based on data
– Disadvantages
• Not robust to variogram mis-specification
• Requires strong stationarity assumption
• Conditional autoregressive
– Advantages
• No need to estimate or model variogram
• Can be used without geo-referenced data
– Disadvantages
• Assumed spatial relationship based on a grid size that could
be inaccurate
Simulation Example
• Used simulation so spatial structure was known
• Simulated response from a specified variogram model
onto a 50 × 50 hexagonal grid of points
• Specified presence/absence cutoff
• Applied two-stage sample design (2 realizations)
• Estimated and modeled variogram from sample data
– For some samples, made two manual fits and one automatic fit
• Predicted probability of presence using indicator
kriging and conditional autoregressive model
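The variogram estimation step can be sketched as follows: an empirical semivariogram of the indicator data, binned by distance, plus a spherical model to compare against it. This is an illustration under assumed names (`empirical_variogram`, `spherical_variogram`) and an assumed equal-width binning rule; it is not the S-Plus procedure used for the slides.

```python
import numpy as np

def empirical_variogram(sites, values, lag_width, n_lags):
    """Half the mean squared difference of all site pairs, grouped by distance bin."""
    sites = np.asarray(sites, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(sites), k=1)          # every unordered pair once
    dist = np.linalg.norm(sites[i] - sites[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    bins = np.floor(dist / lag_width).astype(int)
    lags, gamma, counts = [], [], []
    for b in range(n_lags):
        mask = bins == b
        if mask.any():                               # skip empty distance bins
            lags.append(dist[mask].mean())
            gamma.append(0.5 * sqdiff[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(lags), np.array(gamma), np.array(counts)

def spherical_variogram(h, sill, rng_a, nugget=0.0):
    """Spherical model: rises from the nugget to the sill at the range."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / rng_a - 0.5 * (h / rng_a) ** 3)
    return np.where(h >= rng_a, sill, np.where(h > 0, inside, 0.0))

# Tiny example: four sites on a line with alternating presence/absence.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1, 0, 1, 0]
lags, gamma, counts = empirical_variogram(sites, values, lag_width=1.0, n_lags=4)
print(lags, gamma, counts)
print(spherical_variogram([0.0, 11.0, 22.0, 30.0], sill=0.4, rng_a=22.0))
```

The pair counts per bin show why added short-distance sites matter: with a widely spaced grid alone, the first few lags contain few or no pairs.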
Simulation Methods
• Simulated data from Gaussian random field (S-Plus)
– Spherical variogram, range = 22, sill = 0.4, nugget = 0
– Simulated value > 2 => presence
• Sample Designs
– Systematic sample (n=30)
– Systematic sample plus 2 star clusters (n=54)
– Systematic sample plus 4 star clusters (n=78)
• Models
– Indicator kriging
– Conditional autoregressive model
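The simulation set-up above can be approximated in Python as a rough sketch. It draws one Gaussian random field with a spherical covariance (range 22, sill 0.4, nugget 0) on a small hexagonal grid and applies the cutoff of 2. The grid here is 20 × 20 rather than 50 × 50 to keep the covariance matrix small, and the field mean of 1.7 is a hypothetical value because the slides do not state one. The function names are also assumptions; the original work used S-Plus.

```python
import numpy as np

def hex_grid(n_rows, n_cols, spacing=1.0):
    """Hexagonal lattice: alternate rows are shifted by half a spacing."""
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    x = spacing * (cols + 0.5 * (rows % 2))
    y = spacing * (np.sqrt(3) / 2) * rows
    return np.column_stack([x.ravel(), y.ravel()])

def spherical_cov(h, sill, rng_a):
    """Covariance implied by a spherical variogram with zero nugget."""
    h = np.asarray(h, dtype=float)
    return np.where(h < rng_a,
                    sill * (1 - 1.5 * h / rng_a + 0.5 * (h / rng_a) ** 3),
                    0.0)

def simulate_field(coords, sill, rng_a, mean, seed=0):
    """One realization of a Gaussian field with the given covariance."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    cov = spherical_cov(d, sill, rng_a)
    # Eigen-decomposition tolerates tiny negative eigenvalues from round-off.
    w, v = np.linalg.eigh(cov)
    root = v * np.sqrt(np.clip(w, 0.0, None))
    z = np.random.default_rng(seed).standard_normal(len(coords))
    return mean + root @ z

coords = hex_grid(20, 20, spacing=1.6)
field = simulate_field(coords, sill=0.4, rng_a=22.0, mean=1.7, seed=1)  # mean is hypothetical
presence = field > 2.0       # cutoff from the slides: value > 2 => presence
print(presence.mean())
```

Because the spatial structure is specified up front, predictions from any sample of `presence` can be checked against the full simulated surface.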
Data Simulation with Sample Sites
[Figure: simulated data set StarData1F with sample sites; x and y axes run from 0 to 80. Legend: pink = absence, blue = presence, black = systematic sites, green = star clusters 1, orange = star clusters 2.]
Variogram for Sample Designs
[Figure: three variogram panels (gamma vs. distance, distance 0 to 50): Systematic, Systematic + 2 Stars, Systematic + 4 Stars.]
Sample       Range   Sill   Nugget
Systematic   17      0.17   0
Sys. + 2     20      0.4    0
Sys. + 4     14      0.4    0
Systematic Sample Results
[Figure: three map panels for Data 1F (x and y from 0 to 80; predicted probability scale 0 to 1): Ordinary Indicator Kriging Predictions From Systematic Sample; Conditional Autoregressive Model Predictions From Systematic Sample; StarData1F.]
Systematic Sample with 2 Stars
[Figure: three map panels for Data 1F (x and y from 0 to 80; predicted probability scale 0 to 1): Ordinary Indicator Kriging Predictions From Systematic + 2 Star Sample; Conditional Autoregressive Model Predictions From Systematic + 2 Star Sample; StarData1F.]
Systematic Sample with 4 Stars
[Figure: three map panels for Data 1F (x and y from 0 to 80; predicted probability scale 0 to 1): Ordinary Indicator Kriging Predictions From Systematic + 4 Star Sample; Conditional Autoregressive Model Predictions From Systematic + 4 Star Sample; StarData1F.]
Three Fits: Systematic + 2 Stars
[Figure: three variogram panels (gamma vs. distance, distance 0 to 50): Automatic Fit, Manual Fit #1, Manual Fit #2. Objective values shown on the panels: 0.1467, 0.2307, and 0.197. The accompanying Range / Sill / Nugget table includes the rows (20, 0.4, 0) and (11, 0.27, 0) and a third fit with range 17.]
All use correct model
Predictions from 3 Variogram Fits
[Figure: Ordinary Indicator Kriging Predictions From Systematic + 2 Star Sample on Data 1F under the Automatic Fit, Manual Fit #1, and Manual Fit #2 variograms, shown with StarData1F; x and y from 0 to 80; predicted probability scale 0 to 1.]
Comparison of Prediction Errors
• Sensitivity
– Proportion of presence sites predicted to be present
• Specificity
– Proportion of absence sites predicted to be absent
• True Positive Rate
– Proportion of predicted presence sites that truly are
present (i.e., positive predictive value)
• True Negative Rate
– Proportion of predicted absence sites that truly are
absent (i.e., negative predictive value)
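The four measures can be computed from a vector of true presence/absence values and a vector of predicted probabilities, as in the sketch below. The function name `prediction_summary` and the toy data are assumptions for illustration. "True positive rate" and "true negative rate" follow the definitions on this slide, which condition on the prediction rather than on the truth.

```python
import numpy as np

def prediction_summary(truth, prob, cutoff=0.5):
    """Sensitivity, specificity, and the slide's true positive/negative rates."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(prob, dtype=float) > cutoff   # positive if probability > cutoff
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # presences predicted present
        "specificity": ratio(tn, tn + fp),          # absences predicted absent
        "true_positive_rate": ratio(tp, tp + fp),   # predicted presences truly present
        "true_negative_rate": ratio(tn, tn + fn),   # predicted absences truly absent
    }

truth = [1, 1, 0, 0, 0]
prob = [0.9, 0.4, 0.6, 0.2, 0.1]
at_05 = prediction_summary(truth, prob, cutoff=0.5)
at_03 = prediction_summary(truth, prob, cutoff=0.3)
print(at_05)
print(at_03)
```

Lowering the cutoff from 0.5 to 0.3 labels more sites as present, so sensitivity can only stay the same or rise while specificity can only stay the same or fall.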
Comparison of Predictions (Data1F)
(positive if probability > 0.5) (Auto, Manual #2)
Model                       Sample                Sensitivity  Specificity  True Positive Rate  True Negative Rate
Indicator Kriging           Systematic            28%          98%          85%                 74%
Indicator Kriging           Systematic + 2 Stars  41%          94%          77%                 77%
                            (Auto, Manual #2)     (36%, 27%)   (96%, 99%)   (80%, 76%)          (90%, 74%)
Indicator Kriging           Systematic + 4 Stars  32%          97%          85%                 75%
Conditional Autoregressive  Systematic            15%          96%          63%                 70%
Conditional Autoregressive  Systematic + 2 Stars  56%          85%          64%                 80%
Conditional Autoregressive  Systematic + 4 Stars  54%          86%          65%                 80%
Comparison of Predictions (Data1F)
(positive if probability > 0.3) (Auto, Manual #2)
Model                       Sample                Sensitivity  Specificity  True Positive Rate  True Negative Rate
Indicator Kriging           Systematic            48%          91%          71%                 78%
Indicator Kriging           Systematic + 2 Stars  59%          85%          65%                 81%
                            (Auto, Manual #2)     (56%, 44%)   (87%, 93%)   (67%, 76%)          (80%, 78%)
Indicator Kriging           Systematic + 4 Stars  49%          91%          73%                 79%
Conditional Autoregressive  Systematic            48%          80%          53%                 76%
Conditional Autoregressive  Systematic + 2 Stars  80%          46%          42%                 83%
Conditional Autoregressive  Systematic + 4 Stars  80%          49%          43%                 83%
Data Simulation with Sample Sites
[Figure: simulated data set StarData3F with sample sites; x and y axes run from 0 to 80. Legend: pink = absence, blue = presence, black = systematic sites, green = star clusters 1, orange = star clusters 2.]
Variograms for Sample Designs
[Figure: three variogram panels (gamma vs. distance, distance 0 to 40): Systematic, Systematic + 2 Stars, Systematic + 4 Stars.]
Sample       Range   Sill   Nugget
Systematic   15      0.27   0
Sys. + 2     12      0.30   0.05
Sys. + 4     13      0.30   0.03
Systematic Sample Results
[Figure: three map panels for Data 3F (x and y from 0 to 80; predicted probability scale 0 to 1): Ordinary Indicator Kriging Predictions From Systematic Sample; Conditional Autoregressive Model Predictions From Systematic Sample; StarData3F.]
Systematic Sample with 2 Stars
[Figure: three map panels for Data 3F (x and y from 0 to 80; predicted probability scale 0 to 1): Ordinary Indicator Kriging Predictions From Systematic + 2 Star Sample; Conditional Autoregressive Model Predictions From Systematic + 2 Star Sample; StarData3F.]
Systematic Sample with 4 Stars
[Figure: three map panels for Data 3F (x and y from 0 to 80; predicted probability scale 0 to 1): Ordinary Indicator Kriging Predictions From Systematic + 4 Star Sample; Conditional Autoregressive Model Predictions From Systematic + 4 Star Sample; StarData3F.]
Three Fits: Systematic
[Figure: three variogram panels (gamma vs. distance, distance 0 to 40): Automatic Fit, Manual Fit #1, Manual Fit #2. Objective values shown on the panels: 0.0356, 0.0519, and 0.0333. The accompanying Range / Sill / Nugget table includes the row (8, 0.22, 0) and fits with ranges 15 and 30.]
All use correct model
Predictions from 3 Variogram Fits
[Figure: Ordinary Indicator Kriging Predictions From Systematic Sample on Data 3F under the Automatic Fit, Manual Fit #1, and Manual Fit #2 variograms, shown with StarData3F; x and y from 0 to 80; predicted probability scale 0 to 1.]
Comparison of Predictions (Data3F)
(positive if probability > 0.5) (Auto, Manual #2)
Model                       Sample                Sensitivity  Specificity  True Positive Rate  True Negative Rate
Indicator Kriging           Systematic            31%          92%          65%                 73%
                            (Auto, Manual #2)     (1%, 15%)    (99%, 97%)   (88%, 69%)          (68%, 70%)
Indicator Kriging           Systematic + 2 Stars  21%          96%          75%                 72%
Indicator Kriging           Systematic + 4 Stars  24%          97%          81%                 72%
Conditional Autoregressive  Systematic            7%           98%          65%                 69%
Conditional Autoregressive  Systematic + 2 Stars  17%          97%          71%                 71%
Conditional Autoregressive  Systematic + 4 Stars  18%          99%          88%                 71%
Comparison of Predictions (Data3F)
(positive if probability > 0.3) (Auto, Manual #2)
Model                       Sample                Sensitivity  Specificity  True Positive Rate  True Negative Rate
Indicator Kriging           Systematic            62%          80%          60%                 81%
                            (Auto, Manual #2)     (72%, 37%)   (69%, 89%)   (53%, 63%)          (84%, 75%)
Indicator Kriging           Systematic + 2 Stars  43%          90%          68%                 77%
Indicator Kriging           Systematic + 4 Stars  44%          91%          71%                 77%
Conditional Autoregressive  Systematic            68%          57%          41%                 77%
Conditional Autoregressive  Systematic + 2 Stars  78%          58%          47%                 84%
Conditional Autoregressive  Systematic + 4 Stars  80%          56%          47%                 85%
Simulation Conclusions - Design
• Two star clusters improved small-scale features of
variogram
• Two star clusters improved prediction accuracy
• Four star clusters offered little improvement over two
stars
Simulation Conclusions - Models
• Variogram model affects predictions
• Kriging tends toward overall mean probability of
presence, i.e. it smooths
• Kriging builds patches whose diameter is
approximately the range of the variogram
• Conditional autoregressive model attempts to
connect observed presence
• Neither model had consistently higher sensitivity or
specificity
Future Work
• Further simulation studies on two-stage design
– Effect of sample size
– Number of star clusters necessary to improve
variogram estimation
– Effect of size of star clusters
– Bias from adaptive second-stage sampling
– Advantages of indicator kriging and conditional
autoregressive model
– Sensitivity of conditional autoregressive model to
initial values, prior distributions, and grid size
– Sensitivity of kriging to variogram model
specification
Future Work
• Apply two-stage sample design to real data
– DDT data from Santa Monica Bay, CA
– EMAP data and local monitoring data
• Freely distribute functions for applying the conditional
autoregressive model on a hexagon lattice
– Functions in R to produce hexagon lattice input for
WinBUGS
– File in WinBUGS to apply model
• Investigate optimal grid size to achieve EMAP and
spatial modeling goals
Systematic (EMAP) Grid Based on
Variogram Model
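• Kriging variance
$$\sigma^2_{OK}(u) = C(0) - \sum_{\alpha=1}^{n(u)} \lambda_\alpha^{OK}(u)\, C(u_\alpha - u) - \mu_{OK}(u)$$
where
$C(0)$ is the covariance at distance 0
$\lambda_\alpha^{OK}$ is the kriging weight
$C(u_\alpha - u)$ is the distance-dependent covariance term
$\mu_{OK}(u)$ is the Lagrange parameter of the ordinary kriging system
• Analog for conditional autoregressive model
$$\lambda_\alpha^{AL} = \begin{cases} 1/n(u), & \alpha \text{ in neighborhood of } u \\ 0, & \alpha \text{ not in neighborhood of } u \end{cases}$$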
Systematic (EMAP) Grid Based on
Variogram Model
• Prediction variance is minimized by large covariance
between prediction location and sample locations
• For kriging, grid refers to sample locations
• For conditional autoregressive, grid refers to sample
locations and prediction locations
• Goal: sample locations “close” together
– Samples too far apart =>
• Kriging -> correctly uses no spatial relationship
• Conditional autoregressive -> incorrectly uses
assumed spatial relationship
– Samples too close together => waste of resources
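The effect of grid spacing on the kriging variance can be illustrated numerically. The sketch below computes the ordinary kriging variance at the centre of a square cell with one sample at each corner, for several spacings, using a spherical covariance with range 22 and sill 0.4 as in the simulation. The square four-site cell, the function names, and the chosen spacings are assumptions for illustration only.

```python
import numpy as np

def spherical_cov(h, sill, rng_a):
    """Covariance implied by a spherical variogram with zero nugget."""
    h = np.asarray(h, dtype=float)
    return np.where(h < rng_a,
                    sill * (1 - 1.5 * h / rng_a + 0.5 * (h / rng_a) ** 3),
                    0.0)

def ok_variance_at_cell_center(spacing, sill=0.4, rng_a=22.0):
    """Ordinary kriging variance and weights at the centre of a four-site cell."""
    sites = spacing * np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
    target = np.array([0.5 * spacing, 0.5 * spacing])
    n = len(sites)
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_cov(d, sill, rng_a)
    A[n, n] = 0.0
    c0 = spherical_cov(np.linalg.norm(sites - target, axis=1), sill, rng_a)
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    lam, mu = sol[:n], sol[n]
    # sigma^2 = C(0) - sum(lambda * C(u_alpha - u)) - mu
    return float(sill - lam @ c0 - mu), lam

results = {}
for spacing in (5.0, 20.0, 40.0):
    var, lam = ok_variance_at_cell_center(spacing)
    results[spacing] = var
    print(spacing, round(var, 4), np.round(lam, 3))
```

Once every sample lies beyond the variogram range from the prediction location and from each other (spacing 40 here), the covariance terms vanish and the variance reaches sill × (1 + 1/n): kriging falls back to the plain average of the samples, which is the "no spatial relationship" case described above.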