Predicting Water Quality Impaired Stream Segments using
Landscape-scale Data and a Regional Geostatistical Model
Erin E. Peterson
Postdoctoral Research Fellow
CSIRO Mathematical and Information Sciences Division
March 3, 2006
www.csiro.au
Space-Time Aquatic Resources
Modeling and Analysis Program
The work reported here was developed under STAR Research
Assistance Agreement CR-829095 awarded by the U.S.
Environmental Protection Agency (EPA) to Colorado State
University. This presentation has not been formally reviewed by
EPA. EPA does not endorse any products or commercial services
mentioned in this presentation.
This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095
Collaborators
Dr. David M. Theobald
Natural Resource Ecology Lab
Department of Recreation & Tourism
Colorado State University, USA
Dr. N. Scott Urquhart
Department of Statistics
Colorado State University, USA
Dr. Jay M. Ver Hoef
National Marine Mammal Laboratory, Seattle, USA
Andrew A. Merton
Department of Statistics
Colorado State University, USA
Overview
• Introduction
• Background
• Patterns of spatial autocorrelation in stream water chemistry
• Visualizing model predictions
• Current and future research in SEQ
Purpose of Our Research
Water Quality Monitoring Goals
• Create a regional water quality assessment
• Identify water quality impaired stream segments
Purpose
• Demonstrate a geostatistical methodology based on
– Coarse-scale GIS data
– Field surveys
• Predict water quality characteristics about stream segments throughout a region
How are geostatistical models different from traditional statistical models?
Traditional statistical models (non-spatial)
• Residual error (ε) is assumed to be uncorrelated
• ε = unexplained variability in the data
$Y = X\beta + \varepsilon$
Geostatistical models
• Residual errors are correlated through space
• Spatial patterns in residual error resulting from unidentified process(es)
• Model spatial structure in the residual error
• Explain additional variability in the data
• Generate predictions at unobserved sites
$Y(s) = X(s)\beta + \varepsilon(s)$
Geostatistical Modelling
Fit an autocovariance function to data

Describes relationship between observations based on separation
distance
3 Autocovariance Parameters
1) Nugget: variation between sites as separation distance approaches zero
2) Sill: delineated where semivariance asymptotes
3) Range: distance within which spatial autocorrelation occurs
[Figure: semivariogram of semivariance (0 to 10) against separation distance (0 to 1000), with the nugget, sill, and range marked]
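As a rough illustration of how the three parameters shape a fitted curve, here is a minimal Python sketch of a spherical semivariogram. The spherical form is an assumption, chosen only because it reaches the sill exactly at the range; the function name and the parameter values (nugget 2, sill 10, range 600) are hypothetical and are not read off the slide.

```python
import numpy as np

def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical semivariance at separation distance h (sill = total sill)."""
    h = np.asarray(h, dtype=float)
    ratio = np.clip(h / range_, 0.0, 1.0)  # beyond the range the curve stays flat
    gamma = nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio**3)
    return np.where(h == 0, 0.0, gamma)    # semivariance is zero at zero separation

# Hypothetical parameter values, for illustration only
h = np.array([0.0, 1e-9, 300.0, 600.0, 1000.0])
gamma = spherical_semivariogram(h, nugget=2.0, sill=10.0, range_=600.0)
```

Just above zero separation the curve starts at the nugget, it climbs until the range, and it stays at the sill for any larger distance.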
Distance Measures and Spatial Relationships
[Figure: stream network with sites A, B, and C, repeated for each distance measure]
Straight Line Distance (SLD)
• As the crow flies
Symmetric Hydrologic Distance (SHD)
• As the fish swims
Weighted asymmetric hydrologic distance (WAHD)
• As the water flows
• Incorporate flow direction & flow volume
Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M. (2006) Spatial Statistical Models that Use Flow and Stream Distance, Environmental and Ecological Statistics, to appear.
Challenge:
• Spatial autocovariance models developed for SLD may not be valid for hydrologic distances
– Covariance matrix is not positive definite
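One way to see what "not positive definite" means in practice is to inspect the eigenvalues of a candidate covariance matrix. The sketch below is only a numerical check under stated assumptions: the function name, the tolerance, and both 3 x 3 matrices are hypothetical and are not derived from any stream distances in this study.

```python
import numpy as np

def is_positive_definite(cov, tol=1e-10):
    """Return True if cov is symmetric and all of its eigenvalues exceed tol."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    return bool(np.linalg.eigvalsh(cov).min() > tol)

# Hypothetical matrices: the first is a valid covariance matrix, the second is not
valid = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
invalid = np.array([[1.0, 0.9, -0.9],
                    [0.9, 1.0, 0.9],
                    [-0.9, 0.9, 1.0]])

valid_ok = is_positive_definite(valid)
invalid_ok = is_positive_definite(invalid)
```

A matrix that fails this check cannot be used for kriging, because it would imply negative variances for some linear combinations of the sites.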
Asymmetric Autocovariance Models for Stream Networks
Weighted asymmetric hydrologic distance (WAHD)
• Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle, WA, USA
• Moving average models
• Incorporate flow volume, flow direction, and use hydrologic distance
• Positive definite covariance matrices
Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M., Spatial Statistical Models that Use Flow and Stream Distance, Environmental and Ecological Statistics. In Press.
Objectives
Evaluate 8 chemical response variables
1. pH measured in the lab (PHLAB)
2. Conductivity (COND) measured in the lab μmho/cm
3. Dissolved oxygen (DO) mg/l
4. Dissolved organic carbon (DOC) mg/l
5. Nitrate-nitrogen (NO3) mg/l
6. Sulfate (SO4) mg/l
7. Acid neutralizing capacity (ANC) μeq/l
8. Temperature (TEMP) °C
Determine which distance measure is most appropriate
• SLD, SHD, WAHD?
• More than one?
Find the range of spatial autocorrelation
Maryland Biological Stream Survey (MBSS) Data
Maryland Department of Natural Resources
• Maryland, USA
• 1995, 1996, 1997
Stratified probability-based random survey design
• 1st, 2nd, and 3rd order non-tidal streams
• 955 sites
• 881 sites after pre-processing
• 17 interbasins
[Figure: map of the study area in Maryland, USA, showing Baltimore, Annapolis, Washington D.C., and Chesapeake Bay]
Spatial Distribution of MBSS Data
[Figure: map of the MBSS data with north arrow]
Functional Linkage of Watersheds and Streams (FLoWS)
Create data for geostatistical modelling
1. Calculate watershed covariates for each stream segment
2. Calculate separation distances between sites
• SLD, SHD, Asymmetric hydrologic distance (AHD)
3. Calculate the spatial weights for the WAHD
4. Convert GIS data to a format compatible with statistics software
FLoWS website: http://www.nrel.colostate.edu/projects/starmap
[Figure: three panels showing sites 1, 2, and 3 under SLD, SHD, and AHD]
Spatial Weights for WAHD
Proportional influence (PI): influence of each neighboring survey site on a downstream survey site
• Weighted by catchment area: Surrogate for flow volume
1. Calculate the PI of each upstream segment on the segment directly downstream
Segment PI of A = Watershed Area A / Watershed Area (A + B)
2. Calculate the PI of one survey site on another site
• Flow-connected sites
• Multiply the segment PIs
[Figure: watersheds of segments A and B joining upstream of segment C]
[Figure: stream network with survey sites and stream segments A to H]
Site PI = B * D * F * G
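The two PI steps can be sketched in a few lines of Python. This is a minimal illustration, not the FLoWS implementation: the function names are hypothetical, and the watershed areas and the segment PI values standing in for B, D, F, and G are invented for the example.

```python
from math import prod, sqrt

def segment_pis(upstream_areas):
    """PI of each upstream segment at a confluence: its share of the combined watershed area."""
    total = sum(upstream_areas.values())
    return {name: area / total for name, area in upstream_areas.items()}

def site_pi(pis_along_path):
    """PI of an upstream site on a flow-connected downstream site: product of segment PIs."""
    return prod(pis_along_path)

# Step 1: hypothetical watershed areas for segments A and B meeting at a confluence
confluence = segment_pis({"A": 30.0, "B": 70.0})

# Step 2: hypothetical segment PIs standing in for B, D, F, and G on the slide
pi_site = site_pi([0.7, 0.5, 0.8, 1.0])

# The WAHD correlation uses the square root of this weight
weight = sqrt(pi_site)
```

Because each segment PI is a share of the area at its confluence, the PIs entering any confluence sum to one, and a site PI can only shrink as more confluences lie between two sites.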
Data for Geostatistical Modelling
Distance matrices
• SLD, SHD, AHD
Spatial weights matrix
• Contains flow dependent weights for WAHD
Watershed covariates
• Lumped watershed covariates
• Mean elevation, % Urban
Observations
• MBSS survey sites
Geostatistical Modeling Methods
Validation Set
• Unique for each chemical response variable
Initial Covariate Selection
• 5 covariates
Model Development
• Restricted model space to all possible linear models
• 4 model sets
Response          Significant Covariates
ANC (μeq/l)       PASTUR, LOWURB, WOODYWET, YR96, YR97
COND (μmho/cm)    HIGHURB, LOWURB, COALMINE, YR96, NORTHING
DOC (mg/l)        WOODYWET, CONIFER, MIXEDFOR, LOWURB, NORTHING
DO (mg/l)         DECIDFOR, HIGHURB, WOODYWET, YR96, YR97
NO3 (mg/l)        PASTUR, PROBCROP, ROWCROP, LOWURB, WATER
pH Lab            PROBCROP, DECIDFOR, WOODYWET, ACREAGE, CONIFER
SO4 (mg/l)        LOWURB, COALMINE, NORTHING, ER67, ER69
TEMP (°C)         PROBCROP, LOWURB, WATER, YR96, YR97
Geostatistical Modelling Methods
Geostatistical model parameter estimation
Maximize the profile log-likelihood function
Log-likelihood function of the parameters $(\theta, \beta, \sigma^2)$ given the observed data Z is:
$$\ell(\theta, \beta, \sigma^2; Z) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\left|\sigma^2\Sigma\right| - \frac{1}{2\sigma^2}(Z - X\beta)'\,\Sigma^{-1}(Z - X\beta)$$
Maximizing the log-likelihood with respect to $\beta$ and $\sigma^2$ yields:
$$\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Z$$
and
$$\hat\sigma^2 = \frac{(Z - X\hat\beta)'\,\Sigma^{-1}(Z - X\hat\beta)}{n}$$
Both maximum likelihood estimators can be written as functions of $\theta$ alone
Derive the profile log-likelihood function by substituting the MLEs $(\hat\beta, \hat\sigma^2)$ back into the log-likelihood function
$$\ell_{profile}(\theta; \hat\beta, \hat\sigma^2, Z) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\hat\sigma^2) - \frac{1}{2}\log|\Sigma| - \frac{n}{2}$$
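The substitution step can be checked numerically. The following Python sketch is not the code used in the study: it assumes a correlation matrix Sigma has already been built for some value of the autocorrelation parameters, and the simulated data, the one-dimensional coordinates, the exponential correlation with range 10, and the function names are all hypothetical.

```python
import numpy as np

def profile_loglik(Sigma, Z, X):
    """Profile log-likelihood for a fixed correlation matrix Sigma, plus the MLEs."""
    n = len(Z)
    Si_X = np.linalg.solve(Sigma, X)
    Si_Z = np.linalg.solve(Sigma, Z)
    beta_hat = np.linalg.solve(X.T @ Si_X, X.T @ Si_Z)       # GLS estimate of beta
    resid = Z - X @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(Sigma, resid) / n   # MLE of sigma^2
    _, logdet = np.linalg.slogdet(Sigma)
    ll = (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2_hat)
          - 0.5 * logdet - 0.5 * n)
    return ll, beta_hat, sigma2_hat

def full_loglik(beta, sigma2, Sigma, Z, X):
    """Full Gaussian log-likelihood, used here only to check the profile version."""
    n = len(Z)
    resid = Z - X @ beta
    _, logdet = np.linalg.slogdet(sigma2 * Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - quad / (2 * sigma2)

# Hypothetical data: 12 sites on a line, exponential correlation with range 10
rng = np.random.default_rng(0)
coords = np.linspace(0.0, 50.0, 12)
D = np.abs(coords[:, None] - coords[None, :])
Sigma = np.exp(-D / 10.0)
X = np.column_stack([np.ones(12), rng.normal(size=12)])
Z = X @ np.array([2.0, 0.5]) + rng.normal(size=12)

ll, beta_hat, sigma2_hat = profile_loglik(Sigma, Z, X)
```

In a full fit, an optimizer would call `profile_loglik` repeatedly while varying only the autocorrelation parameters that define Sigma, which is what makes profiling attractive.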
Geostatistical Modeling Methods
Correlation matrix for SLD and SHD models
Fit exponential autocorrelation function
$$C_1(h; \theta_1, \theta_2) = \begin{cases} 1 & \text{if } h = 0 \\ (1 - \theta_1)\exp(-h/\theta_2) & \text{if } h > 0 \end{cases}$$
where $C_1$ is the correlation based on the distance between two sites, h, given the autocorrelation parameter estimates: nugget ($\theta_0$), sill ($\theta_1$), and range ($\theta_2$).
Correlation matrix for WAHD model
• Fit exponential autocorrelation function ($C_1$)
• Hadamard (element-wise) product of $C_1$ & square root of spatial weights matrix forced into symmetry ($\prod_{j \in B_D} \sqrt{w_j}$)
$$C(s_i, s_j \mid \theta) = \begin{cases} 0 & \text{if locations are not flow connected,} \\ C_1(0) + \theta_0 & \text{if location 1 = location 2,} \\ \prod_{j \in B_D} \sqrt{w_j}\; C_1(h) & \text{otherwise.} \end{cases}$$
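A small Python sketch may help show how the two correlation matrices differ. It is an illustration under stated assumptions rather than the study's code: the three-site network, the distances, the site PI weights, and the parameter values are hypothetical, the weights matrix is assumed to be already symmetric with ones on the diagonal and zeros for pairs that are not flow connected, and any extra nugget term on the diagonal is left out.

```python
import numpy as np

def exp_corr(h, nugget_prop, range_):
    """Exponential correlation: 1 at h = 0, (1 - nugget_prop) * exp(-h / range_) otherwise."""
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, 1.0, (1.0 - nugget_prop) * np.exp(-h / range_))

def wahd_corr(H, W, nugget_prop, range_):
    """Element-wise product of the exponential correlation and sqrt of the weights matrix."""
    return np.sqrt(W) * exp_corr(H, nugget_prop, range_)

# Hypothetical network: sites A and B sit on separate branches upstream of site C
H = np.array([[0.0, 5.0, 2.0],     # hydrologic distances (km)
              [5.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
W = np.array([[1.0, 0.0, 0.6],     # symmetric site PI weights; 0 = not flow connected
              [0.0, 1.0, 0.4],
              [0.6, 0.4, 1.0]])

C_shd = exp_corr(H, nugget_prop=0.1, range_=10.0)
C_wahd = wahd_corr(H, W, nugget_prop=0.1, range_=10.0)
```

In this toy network A and B are correlated under SHD but have zero correlation under WAHD, because water does not flow from one to the other.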
Geostatistical Modeling Methods
Model selection within model set
• GLM: Akaike Information Corrected Criterion (AICC)
• Geostatistical models: Spatial AICC (Hoeting et al., in press)
$$AICC = -2\,\ell_{profile}(\theta; \hat\beta, \hat\sigma^2, Z) + 2n\,\frac{p + k + 1}{n - p - k - 2}$$
where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters.
http://www.stat.colostate.edu/~jah/papers/spavarsel.pdf
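The spatial AICC is simple enough to compute by hand, but a short Python sketch makes the role of each count explicit. The function name and the example inputs (a profile log-likelihood of -100, 50 observations, 5 covariates, 3 autocorrelation parameters) are hypothetical.

```python
def spatial_aicc(profile_loglik, n, p, k):
    """Spatial AICC: n observations, p - 1 covariates, k autocorrelation parameters."""
    if n - p - k - 2 <= 0:
        raise ValueError("n must exceed p + k + 2 for the correction term to be defined")
    return -2.0 * profile_loglik + 2.0 * n * (p + k + 1) / (n - p - k - 2)

# Hypothetical example: 5 covariates plus an intercept gives p = 6
example = spatial_aicc(profile_loglik=-100.0, n=50, p=6, k=3)
```

Lower values indicate the preferred model within a model set, and the penalty grows quickly as p + k approaches n.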
Model selection between model types
• 100 Predictions: Universal kriging algorithm
• Mean square prediction error (MSPE)
• Cannot use AICC to compare models based on different distance measures
Model comparison
• r2 for observed vs. predicted values
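Both comparison statistics can be computed from a validation set in a few lines. In this Python sketch r2 is taken to be the squared Pearson correlation between observed and predicted values, which is an assumption about how the slide's r2 was calculated; the function names and the four observed and predicted values are hypothetical.

```python
import numpy as np

def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)

# Hypothetical validation values, for illustration only
observed = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.5, 2.5, 2.5, 4.5])

error = mspe(observed, predicted)
fit = r_squared(observed, predicted)
```

MSPE is in squared units of the response, so it is only comparable across models for the same response variable, whereas r2 can be compared across responses.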
Results
Summary statistics for distance measures
• Spatial neighborhood differs
• Affects number of neighboring sites
• Affects median, mean, and maximum separation distance
Summary statistics for distance measures in kilometers using DO (n=826).
Distance Measure                        N Pairs   Min    Median   Mean     Max
Straight Line Distance                  340725    0.05   101.02   118.16   385.53
Symmetric Hydrologic Distance           62625     0.05   156.29   187.10   611.74
Pure Asymmetric Hydrologic Distance *   1117      0.05   4.49     5.83     27.44
* Asymmetric hydrologic distance is not weighted here
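As a quick arithmetic check on the N Pairs column, and assuming the straight line distance row counts every unordered pair of the n = 826 DO sites:

```latex
\[
\binom{826}{2} = \frac{826 \times 825}{2} = 340{,}725
\]
```

This matches the straight line distance row exactly. The two hydrologic rows contain far fewer pairs, presumably because a hydrologic distance is only defined for sites in the same network, and an asymmetric one only for flow-connected sites.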
Results
Range of spatial autocorrelation differs
Mean Range Values
• SLD = 28.2 km
• SHD = 88.03 km
• WAHD = 57.8 km
Shortest for SLD
TEMP = shortest range values
DO = largest range values
[Figure: bar chart of range (km, axis 0 to 100) by response variable (ANC, COND, DOC, DO, NO3, PHLAB, SO4, TEMP) for SLD, SHD, and WAHD; two bars exceed the axis and are labeled 180.79 and 301.76]
Results
Distance Measures
• GLM always has less predictive ability
• More than one distance measure usually performed well
– SLD, SHD, WAHD: PHLAB & DOC
– SLD and SHD: ANC, DO, NO3
– WAHD & SHD: COND, TEMP
• SLD distance: SO4
[Figure: bar charts of MSPE for the GLM, SLD, SHD, and WAHD models, one panel per response variable (ANC, COND, DOC, DO, NO3, PHLAB, SO4, TEMP)]
Results
Predictive ability of models
r2
• Strong: ANC, COND, DOC, NO3, PHLAB
• Weak: DO, TEMP, SO4
[Figure: bar chart of r2 (0 to 1.00) by response variable (ANC, COND, DOC, DO, NO3, PHLAB, SO4, TEMP) for the GLM, SLD, SHD, and WAHD models]
Discussion
Distance measure influences how spatial relationships are represented in a stream network
• Site’s relative influence on other sites
• Dictates form and size of spatial neighborhood
Important because…
• Impacts accuracy of the geostatistical model predictions
[Figure: panels labeled SLD, SHD, and WAHD]
Discussion
Patterns of spatial autocorrelation found at relatively coarse scale
• Geostatistical models describe more variability than GLM
SLD, SHD, and WAHD represent spatial autocorrelation in continuous coarse-scale variables
• > 1 distance measure performed well
• SLD never substantially inferior
• Do not represent movement through network
Different range of spatial autocorrelation?
• Larger SHD and WAHD range values
• Separation distance larger when restricted to network
[Figure: panels labeled SLD and SHD]
Discussion
Probability-based random survey design negatively affected WAHD
• Maximize spatial independence of sites
• Does not represent spatial relationships in networks
• Validation sites randomly selected
244 sites did not have neighbors
Sample Size = 881
Number of sites with ≤1 neighbor: 393
Mean number of neighbors per site: 2.81
[Figure: histogram of frequency against number of neighboring sites (0 to 17)]
Discussion
WAHD models explained more variability as neighboring sites increased
• Not when neighbors had:
– Similar watershed conditions
– Significantly different chemical response values
[Figure: difference (O – E) against number of neighboring sites (0 to 17) for WAHD and GLM]
Discussion
GLM predictions improved as number of neighbors increased
• Clusters of sites in space have similar watershed conditions
– Statistical regression pulled towards the cluster
• GLM contained hidden spatial information
– Explained additional variability in data with more neighbors
[Figure: difference (O – E) against number of neighboring sites (0 to 17) for WAHD and GLM]
Predictive Ability of Geostatistical Models
[Figure: response variables (COND, SO4, ANC, PH, NO3, DOC, TEMP, DO) placed by scale of unknown influential processes (fine to coarse) against r2 (0 to 1.0)]
Conclusions
1) Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale
2) Geostatistical models improve the accuracy of water chemistry predictions
3) Patterns of spatial autocorrelation differ between chemical response variables
• Ecological processes acting at different spatial scales affect conditions at the survey site
4) SLD is the most suitable distance measure in Maryland for these chemical response variables at this time
• Unsuitable survey designs
• SHD: GIS processing time is prohibitive
Conclusions
5) Results are scale specific
• Spatial patterns change with survey scale
• Other patterns may emerge at shorter separation distances
6) Further research is needed at finer scales
• Watershed or small stream network
Visualization of Model Predictions
Demonstrate how a geostatistical methodology can be used to complement regional water quality monitoring efforts
1) Predict regional water quality conditions
2) Identify the spatial location of potentially impaired stream segments
MBSS 1996 DOC
n     Min   1st Qu.   Median   Mean   3rd Qu.   Max    σ2
312   0.6   1.2       1.7      1.9    2.7       15.9   1.8
[Figure: map with north arrow and scale bar in kilometers (0 to 20)]
Spatial Patterns in Model Fit
Squared Prediction Error (SPE)
Generate Model Predictions
Prediction sites
• Study area
– 1st, 2nd, and 3rd order non-tidal streams
– 3083 segments = 5973 stream km
• ID downstream node of each segment
– Create prediction site
• More than one site at each confluence
Generate predictions and prediction variances
• SLD Mariah model
• Universal kriging algorithm
• Assigned predictions and prediction variances back to stream segments in GIS
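For readers unfamiliar with universal kriging, the Python sketch below shows the standard predictor and prediction variance for a linear mean with spatially correlated errors. It is a minimal illustration under stated assumptions, not the study's implementation: it swaps in an exponential covariance without a nugget in place of the Mariah model, uses straight-line distance, and all coordinates, covariate values, observations, and parameter values are hypothetical.

```python
import numpy as np

def exp_cov(d, sill, range_):
    """Exponential covariance without a nugget (a stand-in for the Mariah model)."""
    return sill * np.exp(-np.asarray(d, dtype=float) / range_)

def pairwise_sld(a, b):
    """Straight-line distances between the rows of a and the rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def universal_krige(coords, Z, X, coords0, X0, sill=1.0, range_=10.0):
    """Universal kriging predictions and variances at the prediction sites coords0."""
    Sigma = exp_cov(pairwise_sld(coords, coords), sill, range_)    # n x n
    c0 = exp_cov(pairwise_sld(coords, coords0), sill, range_)      # n x m
    Si_X = np.linalg.solve(Sigma, X)
    Si_c0 = np.linalg.solve(Sigma, c0)
    XtSiX = X.T @ Si_X
    beta = np.linalg.solve(XtSiX, Si_X.T @ Z)                      # GLS trend estimate
    pred = X0 @ beta + Si_c0.T @ (Z - X @ beta)                    # trend + kriged residual
    d = X0.T - X.T @ Si_c0                                         # p x m
    var = (sill - np.sum(c0 * Si_c0, axis=0)
           + np.sum(d * np.linalg.solve(XtSiX, d), axis=0))        # adds trend uncertainty
    return pred, var

# Hypothetical observed sites: coordinates (km), an intercept plus one covariate, a response
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
X = np.column_stack([np.ones(5), [100.0, 120.0, 90.0, 150.0, 110.0]])
Z = np.array([1.2, 2.0, 1.1, 3.4, 1.9])

# Two prediction sites: the first coincides with the fifth observed site
coords0 = np.array([[5.0, 5.0], [2.0, 7.0]])
X0 = np.array([[1.0, 110.0], [1.0, 95.0]])

pred, var = universal_krige(coords, Z, X, coords0, X0)
```

With no nugget the predictor interpolates exactly, so the first prediction reproduces the observed value with zero variance, while the second carries a positive prediction variance.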
DOC Predictions (mg/l)
Weak Model Fit
Strong Model Fit
Water Quality Attainment by Stream Kilometres
Threshold values for DOC
• Set by Maryland Department of Natural Resources
• High DOC values may indicate biological or ecological stress
Threshold   DOC (mg/l)   Stream Kilometers   Percent
Low         < 5.0        5387.67             90.2
Medium      5.0 - 8.0    400.19              6.7
High        > 8.0        185.16              3.1
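The attainment summary amounts to classifying each segment's predicted DOC and summing stream length by class. The Python sketch below assumes that values of exactly 5.0 or 8.0 mg/l fall in the Medium class; the function names and the four example segments are hypothetical, while the kilometre totals passed to `percent_of_total` are the ones in the table.

```python
def doc_class(doc_mg_l):
    """Classify a DOC prediction (mg/l) using the thresholds in the table."""
    if doc_mg_l < 5.0:
        return "Low"
    if doc_mg_l > 8.0:
        return "High"
    return "Medium"

def attainment_by_km(segments):
    """Sum stream kilometres per class from (predicted DOC, segment length in km) pairs."""
    totals = {"Low": 0.0, "Medium": 0.0, "High": 0.0}
    for doc, km in segments:
        totals[doc_class(doc)] += km
    return totals

def percent_of_total(totals):
    """Convert kilometre totals to percentages rounded to one decimal place."""
    total = sum(totals.values())
    return {name: round(100.0 * km / total, 1) for name, km in totals.items()}

# Hypothetical segments: (predicted DOC in mg/l, length in km)
segments = [(1.7, 2.5), (6.2, 1.0), (9.4, 0.5), (4.9, 3.0)]
totals = attainment_by_km(segments)

# Kilometre totals from the table reproduce its Percent column
table_percent = percent_of_total({"Low": 5387.67, "Medium": 400.19, "High": 185.16})
```

Summing by stream length rather than by segment count keeps short segments from dominating the regional percentages.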
Current and Future Research in SEQ
Different ways to capture spatial information
1) Geostatistical models
• Attempt to explain spatial relationship between response variables
• May represent another ecological process that is affecting them
2) Spatial location of covariates
• Does the spatial location of landuse within the watershed affect the response?
• Does the spatial configuration of landuse affect the response?
3) Stream network configuration and connectivity
• How does the configuration of the network affect the response?
• Are stream segments within one network really connected?
Geostatistical Models
Covariance Matched Constrained Kriging (CMCK)
$$Y(s) = \mu(s) + \int K_r(|u - s|)\,\frac{\omega(u)}{\omega(s)}\,x(u)\,du$$
• $\mu(s)$: mean; constant here but might incorporate other covariates
• $K_r$: kernel function; governs spatial dependence
• $\omega(u)/\omega(s)$: weight function for relative stream orders or watershed areas
• $x(u)$: independent Gaussian process
• |u-s| = river distance d
Cressie, N., Frey, J., Harch, B., and Smith, M.: 2006, ‘Spatial Prediction on a River Network’, Journal of Agricultural, Biological, and Environmental Statistics, to appear.
Geostatistical Models
[Figure: stream network with sites A, B, and C]
Covariance Matched Constrained Kriging (CMCK)
• Combination of distance measures
Cressie, N., Frey, J., Harch, B., and Smith, M.: 2006, ‘Spatial Prediction on a River Network’, Journal of Agricultural, Biological, and Environmental Statistics, to appear.
Geostatistical Models and the EHMP
Develop geostatistical models
• Individual indices and multivariate indicators
– Physical/Chemical
– Nutrients
– Fish
– Ecosystem Processes
– Invertebrates
Determine which distance measure(s) to use
• One distance measure: SLD, SHD, WAHD
• More than one distance measure: CMCK (covariance matched constrained kriging)
• Based on statistical evidence, ecological expertise, and survey design
Make model predictions
Spatial Location of Watershed Attributes
Lumped non-spatial watershed attributes
Covariate    Description
AREA         Catchment area (ha)
URBAN        % Urban
BARREN       % Barren
WATER        % Open Water
CONIFER      % Conifer or evergreen forest type
DECIDFOR     % Deciduous forest type
MIXEDFOR     % Mixed forest type
EMERGWET     % Emergent Herbaceous Wetlands
WOODYWET     % Woody or shrubby wetlands
COALMINE     % Coalmine
EASTING      Easting - Albers Equal Area Conic
NORTHING     Northing - Albers Equal Area Conic
ER63-ER69    Omernik's Level III Ecoregion
MEANELEV     Mean elevation in the watershed
SLOPE        Mean slope in the watershed
ARGPERC      % Argillaceous rock type in watershed
CARPERC      % Carbonic rock type in watershed
FELPERC      % Felsic rock type in watershed
MAFPERC      % Mafic rock type in watershed
SILPERC      % Siliceous rock type in watershed
MEANK        Mean soil erodibility factor in watershed (adjusted for rock fragments)
MAXTEMP      Mean annual maximum temperature (°C)
MINTEMP      Mean minimum temperature for January - April (°C)
PRECIP       Mean precipitation for January - April (mm)
ANPRECIP     Mean annual precipitation
Spatial Location of Watershed Attributes
• Buffer streams using straight-line distance
• Overland hydrologic distance to stream
• Straight-line distance from stream outlet
• Overland hydrologic distance + instream distance to stream outlet
Spatial Configuration of Watershed Attributes
• How large or small are patches of landuse?
• How complex is the shape?
• Is landuse clumped or dissected?
• Is landuse adjacent to stream?
Network Configuration
Network Connectivity
[Figure: stream network with survey sites and barriers]
• Represent connectivity on a regional scale
• Define individual networks
Network Configuration and Connectivity
Measure network size and complexity
Questions? Comments?
Erin E. Peterson
Phone: +61 7 3214 2914
Email: Erin.Peterson@csiro.au
www.csiro.au