Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models

A Collaborative Approach to
Analyzing Stream Network Data
Andrew A. Merton
Overview

The material presented here is a subset of the
work done by Erin Peterson for her Ph.D.

Interested in developing geostatistical models for
predicting water quality characteristics in stream
segments


Data: Maryland Biological Stream Survey (MBSS)
The scope and nature of the problem requires
interdisciplinary collaboration

Ecology, geoscience, statistics, others…
Stream Network Data

The response data comprise observations within a stream network
• What does it mean to be a “neighbor” in such a framework?
• How does one characterize the distance between “neighbors”?
• Should distance measures be confined to the stream network?
• Does flow (direction) matter?

Stream Network Data

Potential explanatory variables are not restricted
to be within the stream network


Topography, soil type, land usage, etc.
How does one sensibly incorporate these
explanatory variables into the analysis?

Can we develop tools to aggregate upstream
watershed covariates for subsequent downstream
segments?
Competing Models

Given a collection of competing models, how
does one select the “best” model?
• Is one subset of explanatory variables better or closer to the “true” model?
• Should one assume correlated residuals and, if so, what form should the correlation function take?
• How does the distance measure impact the choice of correlation function?
Functional Distances &
Spatial Relationships
Geostatistical models are based on
straight-line distance
[Figure: sample sites A, B, and C on a stream network]
Straight-line Distance (SLD)
Is this an appropriate measure of distance?
Influential continuous landscape variables: geology type or acid rain
(As the crow flies…)
Functional Distances &
Spatial Relationships
Distances and relationships are represented
differently depending on the distance
measure
[Figure: sample sites A, B, and C on a stream network]
Symmetric Hydrologic Distance (SHD)
Hydrologic connectivity
(As the fish swims…)
Functional Distances &
Spatial Relationships
Distances and relationships are represented
differently depending on the distance
measure
[Figure: sample sites A, B, and C on a stream network]
Asymmetric Hydrologic Distance (AHD)
Longitudinal transport of material
(As the sh*t flows…)
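The three distance measures can be compared on a toy network. The sketch below is only an illustration: the `DOWNSTREAM` table, the `COORDS` values, and the function names are hypothetical, and it assumes that asymmetric hydrologic distance is defined only for flow-connected pairs (one site directly upstream of the other).

```python
import math

# Hypothetical toy network: node -> (next node downstream, channel length in km).
# Sites A and B sit on separate tributaries that join at J; site C lies below J.
DOWNSTREAM = {"A": ("J", 3.0), "B": ("J", 4.0), "J": ("C", 2.0), "C": (None, 0.0)}
# Hypothetical planar coordinates in km, used only for straight-line distance.
COORDS = {"A": (0.0, 4.0), "B": (3.0, 4.0), "C": (1.5, 0.0)}


def path_to_outlet(node):
    """Map each node at or below `node` to its along-channel distance from `node`."""
    dist, out = 0.0, {}
    while node is not None:
        out[node] = dist
        node, length = DOWNSTREAM[node]
        dist += length
    return out


def sld(a, b):
    """Straight-line distance (as the crow flies)."""
    return math.dist(COORDS[a], COORDS[b])


def shd(a, b):
    """Symmetric hydrologic distance: shortest path along the channel network."""
    pa, pb = path_to_outlet(a), path_to_outlet(b)
    shared = [pa[n] + pb[n] for n in pa if n in pb]
    return min(shared) if shared else None  # None: sites are in separate networks


def ahd(a, b):
    """Asymmetric hydrologic distance: defined only when one site drains to the other."""
    pa, pb = path_to_outlet(a), path_to_outlet(b)
    if b in pa:
        return pa[b]
    if a in pb:
        return pb[a]
    return None  # not flow-connected, so the pair are not neighbors


for pair in [("A", "B"), ("A", "C")]:
    print(pair, sld(*pair), shd(*pair), ahd(*pair))
```

In this toy case A and B are close in a straight line, farther apart along the channel, and not neighbors at all under the asymmetric measure, which is the kind of contrast the three slides describe.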
Candidate Models

Restrict the model space to general linear models
$Z \sim N(X\beta, \Sigma) = N(X\beta, \sigma^2 \Omega)$
• Look at all possible subsets of explanatory variables X (Hoeting et al)
• Require a correlation structure that can accommodate the various distance measures
• Could assume that the residuals are spatially independent, i.e., $\Sigma = \sigma^2 I$ (probably not best)
• Ver Hoef et al propose a better solution

Asymmetric Autocovariance Models for
Stream Networks

Weighted asymmetric hydrologic distance
(WAHD)

Developed by Jay Ver Hoef, National
Marine Mammal Laboratory, Seattle

Moving average models

Incorporates flow and uses hydrologic
distance

Represents discontinuity at confluences
[Figure: stream network with an arrow marking the direction of flow]
Exponential Correlation Structure

The exponential correlation function can be used for both SLD and SHD
$\Omega = [\omega_{ij}]_{i,j=1}^{n}$ such that
$\omega_{ij} = \begin{cases} 1 & h_{ij} = 0 \\ (1 - \theta_1)\exp(-h_{ij}/\theta_2) & h_{ij} > 0 \end{cases}$
For AHD, one must multiply $\Omega$ (element-wise) by the weight matrix A, i.e., $\omega_{ij}^* = a_{ij}\,\omega_{ij}$, hence WAHD
• The weights represent the proportion of flow volume that the downstream location receives from the upstream location
• Estimating the $a_{ij}$ is non-trivial: special GIS tools are needed (Theobald et al)
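A minimal sketch of the correlation structure above, assuming NumPy is available. The distance matrix `h`, the parameter values, and the weight matrix `A` are made-up illustrations, not MBSS values; in practice the weights would come from the GIS tools.

```python
import numpy as np


def exponential_correlation(h, theta1, theta2):
    """omega_ij = 1 where h_ij == 0, else (1 - theta1) * exp(-h_ij / theta2)."""
    h = np.asarray(h, dtype=float)
    omega = (1.0 - theta1) * np.exp(-h / theta2)
    omega[h == 0] = 1.0
    return omega


# Hypothetical separation distances (km) between three sites.
h = np.array([[0.0, 2.0, 5.0],
              [2.0, 0.0, 3.0],
              [5.0, 3.0, 0.0]])
omega = exponential_correlation(h, theta1=0.2, theta2=4.0)

# Hypothetical weight matrix: sites 0 and 1 are flow-connected, site 2 is not
# connected to either, so its off-diagonal weights are zero.
A = np.array([[1.0, 0.6, 0.0],
              [0.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
omega_star = A * omega  # element-wise product, as in omega*_ij = a_ij * omega_ij
print(omega_star)
```

The element-wise product forces the correlation between sites that are not flow-connected to zero, while connected pairs keep a correlation scaled by their weight.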

GIS Tools
Theobald et al have created automated tools to extract data
about hydrologic relationships between sample points
Visual Basic for Applications programs that:
1. Calculate separation distances between sites
• SLD, SHD, asymmetric hydrologic distance (AHD)
2. Calculate watershed covariates for each stream segment
• Functional Linkage of Watersheds and Streams (FLoWS)
3. Convert GIS data to a format compatible with statistics software
[Figure: three panels showing sites 1, 2, and 3 connected under SLD, SHD, and AHD]
Spatial Weights for WAHD
Proportional influence (PI): influence of each neighboring sample site on a downstream sample site
• Weighted by catchment area: surrogate for flow
1. Calculate influence of each upstream segment on segment directly downstream
2. Calculate the proportional influence of one sample site on another
• Multiply the edge proportional influences
3. Output:
• n×n weighted incidence matrix
[Figure: stream network showing stream segments and a stream confluence]
[Figure: stream network with segments labelled A through H and survey sites marked]
Site PI = B * D * F * G
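The proportional-influence steps can be sketched as follows. This is an illustration under stated assumptions, not the FLoWS implementation: the `SEGMENTS` table and its catchment areas are hypothetical, and the segment PI is assumed to be a segment's share of the total catchment area entering its confluence.

```python
# Hypothetical network: segment -> (segment directly downstream, catchment area in km^2).
SEGMENTS = {
    "A": ("C", 10.0),
    "B": ("C", 30.0),
    "C": ("E", 40.0),
    "D": ("E", 60.0),
    "E": (None, 100.0),
}


def segment_pi(seg):
    """Step 1: share of the catchment area entering the confluence that comes from `seg`."""
    down = SEGMENTS[seg][0]
    if down is None:  # outlet segment: nothing to share with
        return 1.0
    total = sum(area for d, area in SEGMENTS.values() if d == down)
    return SEGMENTS[seg][1] / total


def site_pi(upstream, downstream):
    """Step 2: product of segment PIs along the path; 0.0 if not flow-connected."""
    pi, seg = 1.0, upstream
    while seg != downstream:
        if seg is None:  # ran off the outlet without reaching `downstream`
            return 0.0
        pi *= segment_pi(seg)
        seg = SEGMENTS[seg][0]
    return pi


# Step 3: n x n weighted incidence matrix (rows: upstream, columns: downstream).
names = sorted(SEGMENTS)
weights = [[site_pi(u, d) for d in names] for u in names]
for name, row in zip(names, weights):
    print(name, [round(w, 3) for w in row])
```

Because each site PI is a product of shares between 0 and 1, influence can only shrink as more confluences lie between two sites.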
Parameter Estimation

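Maximize the (profile) likelihood to obtain estimates for $\beta$, $\theta$, and $\sigma^2$
Profile likelihood:
$\ell_{profile}(\theta; \hat\beta, \hat\sigma^2, Z) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\hat\sigma^2) - \frac{1}{2}\log|\Omega| - \frac{n}{2}$
MLEs:
$\hat\beta(\theta) = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} Z$
$\hat\sigma^2(\theta) = \frac{(Z - X\hat\beta)'\,\Omega^{-1}(Z - X\hat\beta)}{n}$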
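A minimal sketch of evaluating the profile likelihood for a fixed correlation matrix, assuming NumPy. The function name `profile_loglik` and the four-point data set are illustrative only; a full fit would recompute the correlation matrix for each candidate $\theta$ and maximize the returned value with a numerical optimizer.

```python
import numpy as np


def profile_loglik(Z, X, omega):
    """Return (log-likelihood, beta_hat, sigma2_hat) for a fixed correlation matrix."""
    n = len(Z)
    oi_X = np.linalg.solve(omega, X)  # Omega^{-1} X
    oi_Z = np.linalg.solve(omega, Z)  # Omega^{-1} Z
    beta_hat = np.linalg.solve(X.T @ oi_X, X.T @ oi_Z)
    resid = Z - X @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(omega, resid) / n
    _, logdet = np.linalg.slogdet(omega)  # log|Omega|, computed stably
    ll = (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2_hat)
          - 0.5 * logdet - 0.5 * n)
    return ll, beta_hat, sigma2_hat


# Hypothetical data: intercept plus one covariate, four observations.
Z = np.array([1.0, 2.0, 4.0, 3.0])
X = np.column_stack([np.ones(4), np.arange(4.0)])

# With Omega = I the estimates reduce to ordinary least squares.
ll, beta_hat, sigma2_hat = profile_loglik(Z, X, np.eye(4))
print(ll, beta_hat, sigma2_hat)
```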
Model Selection

Hoeting et al adapted the corrected Akaike Information Criterion (AICC) for spatial models
• AICC estimates the difference between the candidate model and the “true” model
• Select models with small AICC
$AICC = -2\,\ell_{profile}(\theta; \beta, \sigma^2, Z) + 2n\,\frac{p + k + 1}{n - p - k - 2}$
where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters
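The criterion is simple to compute once the maximized profile log-likelihood is available. In the sketch below the function name `aicc` and the two candidate models with their log-likelihood values are invented for illustration.

```python
def aicc(loglik, n, p, k):
    """AICC = -2 * loglik + 2n * (p + k + 1) / (n - p - k - 2)."""
    if n - p - k - 2 <= 0:
        raise ValueError("need n > p + k + 2")
    return -2.0 * loglik + 2.0 * n * (p + k + 1) / (n - p - k - 2)


# Hypothetical candidates: (label, maximized log-likelihood, p, k).
candidates = [
    ("independent errors", -120.0, 3, 0),
    ("exponential correlation", -112.0, 3, 2),
]
n = 50
scores = {label: aicc(ll, n, p, k) for label, ll, p, k in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

The penalty term grows with both the number of regression coefficients and the number of autocorrelation parameters, so a spatial model has to improve the likelihood enough to pay for its extra parameters.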
Spatial Distribution of MBSS Data
[Figure: map of MBSS sample sites with north arrow]
Summary Statistics for Distance Measures
• Distance measure greatly impacts the number of
neighboring sites as well as the median, mean, and
maximum separation distance between sites
Summary statistics for distance measures in kilometers using DO (n=826).
Distance Measure                        N Pairs   Min   Median    Mean     Max
Straight Line Distance                   340725  0.05   101.02  118.16  385.53
Symmetric Hydrologic Distance             62625  0.05   156.29  187.10  611.74
Pure Asymmetric Hydrologic Distance *      1117  0.05     4.49    5.83   27.44
* Asymmetric hydrologic distance is not weighted here
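As a quick arithmetic check on the table, the straight-line pair count equals the number of distinct pairs among n = 826 sites, and the hydrologic measures retain only a fraction of those pairs. The variable names below are illustrative.

```python
import math

n_sites = 826
all_pairs = math.comb(n_sites, 2)  # n(n-1)/2 distinct site pairs

# N Pairs column copied from the table above.
n_pairs = {"SLD": 340725, "SHD": 62625, "AHD": 1117}
share = {name: count / all_pairs for name, count in n_pairs.items()}

for name, frac in share.items():
    print(f"{name}: {frac:.2%} of all site pairs")
```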
Comparing Distance Measures

The “selected” models (one for each distance measure) were compared by computing the mean square prediction error (MSPE)
• GLM: Assumed independent errors
• Withheld the same 100 (randomly) selected records from each model fit
• Want MSPE to be small
$MSPE = \frac{\sum_{i=1}^{n_p} (Z_i - \hat{Z}_i)^2}{n_p}$
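The MSPE formula translates directly into code. The function name `mspe` and the withheld values and predictions below are made up for illustration.

```python
def mspe(observed, predicted):
    """Mean square prediction error over the withheld records."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((z - z_hat) ** 2 for z, z_hat in zip(observed, predicted)) / len(observed)


# Hypothetical withheld observations and predictions from one model.
withheld = [8.1, 7.4, 9.0, 6.5]
predicted = [7.9, 7.8, 8.5, 6.5]
print(mspe(withheld, predicted))
```

Using the same withheld records for every model, as the slide describes, keeps the comparison between distance measures fair.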
Comparing Distance Measures
Prediction Performance for Various Responses
[Figure: bar charts of MSPE for each response (ANC, DOC, COND, DO, PHLAB, NO3, SO4, TEMP) under four models: GLM, SLD (SL), SHD (SH), and WAHD (WAH)]
Maps of the Relative Weights

Generated maps by kriging (interpolation)

Predicted values are linear combinations of the “observed” data, i.e.,
$E(Z_2 \mid Z_1) = \left( X_2 (X_1^T \Omega_{11}^{-1} X_1)^{-1} X_1^T \Omega_{11}^{-1} + \Omega_{21}\Omega_{11}^{-1} \left( I - X_1 (X_1^T \Omega_{11}^{-1} X_1)^{-1} X_1^T \Omega_{11}^{-1} \right) \right) Z_1 = M Z_1$
$Z_1$ is the observed data, $Z_2$ is the predicted value, $\Omega_{11}$ is the correlation matrix for the observed sites, and $\Omega_{21}$ is the correlation matrix between the prediction site and the observed sites
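A minimal sketch of the weight matrix M, assuming NumPy and a symmetric positive definite correlation matrix for the observed sites. The function name `kriging_weights`, the site coordinates, the correlation parameters, and the observed values are all hypothetical.

```python
import numpy as np


def kriging_weights(X1, X2, omega11, omega21):
    """Return M such that E(Z2 | Z1) = M @ Z1 (omega11 assumed symmetric)."""
    oi_X1 = np.linalg.solve(omega11, X1)              # Omega11^{-1} X1
    gls = np.linalg.solve(X1.T @ oi_X1, oi_X1.T)      # (X1' O^{-1} X1)^{-1} X1' O^{-1}
    c = np.linalg.solve(omega11, omega21.T).T         # Omega21 Omega11^{-1}
    n = omega11.shape[0]
    return X2 @ gls + c @ (np.eye(n) - X1 @ gls)


# Hypothetical example: four observed sites on a line, one prediction site at 1.5 km.
coords = np.array([0.0, 1.0, 2.0, 3.0])
h11 = np.abs(coords[:, None] - coords[None, :])
omega11 = np.where(h11 == 0, 1.0, 0.9 * np.exp(-h11 / 2.0))
h21 = np.abs(np.array([1.5])[:, None] - coords[None, :])
omega21 = 0.9 * np.exp(-h21 / 2.0)

X1 = np.ones((4, 1))  # intercept-only design for the observed sites
X2 = np.ones((1, 1))  # and for the prediction site
M = kriging_weights(X1, X2, omega11, omega21)

Z1 = np.array([7.9, 8.4, 8.1, 7.6])  # made-up observations
print(M, M @ Z1)
```

Each row of M holds the relative weights that the observed sites receive in the prediction, which is what the maps of relative weights display for a single prediction site.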
Relative Weights Used to Make Prediction at Site 465
[Figure: four map panels, one per model: General Linear Model, Straight-line, Symmetric Hydrologic, Weighted Asymmetric Hydrologic]
Residual Correlations for Site 465
[Figure: four map panels, one per model: General Linear Model, Straight-line, Symmetric Hydrologic, Weighted Asymmetric Hydrologic]
Some Comments on the Sampling Design
Probability-based random survey design
• Designed to maximize spatial independence of survey
sites
• Does not adequately represent spatial relationships in
stream networks using hydrologic distance measures
244 sites did not have neighbors
Sample Size = 881
Number of sites with ≥ 1 neighbor: 393
Mean number of neighbors per site: 2.81
[Figure: histogram of frequency against number of neighboring sites]
Conclusions

A collaborative effort enabled the analysis of a
complicated problem
• Ecology – Posed the problem of interest and provided insight into variable (model) selection
• Geoscience – Developed powerful tools based on GIS
• Statistics – Developed valid covariance structures and model selection techniques
• Others – e.g., very understanding (and sympathetic) spouses…
