Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models

N. Scott Urquhart
Joint work with Erin P. Peterson, Andrew A. Merton, David M. Theobald, and Jennifer A. Hoeting
All of Colorado State University, Fort Collins, CO 80523-1877

MSS/MBSS # 1

FUNDING ACKNOWLEDGEMENT
The work reported here today was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and STARMAP, the program he represents. EPA does not endorse any products or commercial services mentioned in this presentation.
This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095.

[Map: Maryland Biological Stream Survey (MBSS) sample site locations. Legend: MBSS sample sites; 1:100,000 National Hydrography Dataset; Maryland.]

OUR PATH TODAY
• What are “Spatial Statistical Models”?
• Measuring Distance in Space
• The Maryland Biological Stream Survey
  • Outstanding data set to compare models
• A Few Results
• Work in Progress

GATHERING SOME INSIGHTS
Raise your hand if you
• Had a statistics course, even in the distant past
• Remember doing a t-test
• Did a simple linear regression (fitted a line)
• Did a multiple regression
• Examined model failures
• Did analyses accommodating “correlated errors”
• Have used spatial statistics, e.g., kriging

STATISTICS AND PREDICTION
OBJECTIVE:
• Measure relevant responses, like dissolved organic carbon (DOC), and related variables at suitable sites, then
• Develop a formula to predict DOC at unvisited sites
Why?
• Clean Water Act (CWA) 303(d) requires states to identify “impacted” waters and plan to eliminate the impact
• What state has the $ to evaluate every water?
• Predict, instead.

PREDICTIVE VARIABLES
Predict DOC from measures such as
• Area above the stream evaluation point
• % Barren
• % High Intensity Urban
• % Woody Wetland (*)
• % Conifer or Evergreen Forest Type (*)
• % Mixed Forest Type (*)
• % Low Intensity Urban (*)
• To accommodate year differences: 1996 & 1997 (*)

GIS TOOLS
These variables require
• Efficient delineation of the watershed above any point
• STARMAP has developed such software
  • It is available
  • Documented in a poster

PREDICTIVE MODELS
The classical regression model would be:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
where the $\varepsilon_i$ have a constant variance and are UNCORRELATED.
BUT “Everything is related to everything else, but near things are more related than distant things” (Tobler, 1970).
Thus the “uncorrelated” assumption above is indefensible in many cases.

SO WHAT IS SPATIAL STATISTICS?
Spatial statistics is a set of techniques which
• Allow correlated data
• Index the amount of correlation by the distance between the points
• Incorporate this correlation into predictions

SO WHAT IS SPATIAL STATISTICS II?

WHAT ARE “SPATIAL STATISTICAL MODELS”?
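
As a rough illustration of the three ideas listed under "So what is spatial statistics?", the sketch below builds a covariance matrix that decays with straight-line distance, fits the regression by generalized least squares, and adds spatially weighted residuals to each prediction. It is only a minimal sketch under stated assumptions: the exponential covariance form, the function names (pairwise_distance, exponential_covariance, krige), the simulated data, and the parameter values are all hypothetical and are not taken from the MBSS analysis.

```python
import numpy as np

def pairwise_distance(a, b):
    """Straight-line (Euclidean) distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def exponential_covariance(dist, nugget, partial_sill, range_km):
    """Assumed form: partial_sill * exp(-d / range_km), plus the nugget where d == 0."""
    return partial_sill * np.exp(-dist / range_km) + nugget * (dist == 0)

def krige(coords, X, y, new_coords, new_X, nugget, partial_sill, range_km):
    """Regression with correlated errors: GLS coefficients plus kriged residuals."""
    sigma = exponential_covariance(
        pairwise_distance(coords, coords), nugget, partial_sill, range_km
    )
    # Covariance between observed and new sites; the nugget is treated as
    # site-specific noise, so it is not shared with the new sites.
    c0 = partial_sill * np.exp(-pairwise_distance(coords, new_coords) / range_km)
    sigma_inv_X = np.linalg.solve(sigma, X)
    # Generalized least squares estimate of the regression coefficients.
    beta = np.linalg.solve(X.T @ sigma_inv_X, sigma_inv_X.T @ y)
    residuals = y - X @ beta
    # Regression mean plus the distance-weighted residuals of the observed sites.
    predictions = new_X @ beta + c0.T @ np.linalg.solve(sigma, residuals)
    return beta, predictions

# Hypothetical data: 30 sites in a 100 km square, an intercept and one covariate.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(30, 2))
X = np.column_stack([np.ones(30), rng.uniform(0.0, 10.0, size=30)])
y = X @ np.array([0.2, 0.05]) + rng.normal(0.0, 0.1, size=30)

# Predict back at the first two observed sites, using illustrative parameters.
beta, predictions = krige(
    coords, X, y, coords[:2], X[:2],
    nugget=0.0, partial_sill=0.05, range_km=20.0,
)
```

With a zero nugget, as here, the predictor reproduces the observed values at observed sites. A positive nugget would smooth them instead, and a different covariance function could be swapped into exponential_covariance without changing the rest of the sketch.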

MEASURING DISTANCE IN SPACE

The Maryland Biological Stream Survey
Outstanding data set to compare models

A FEW RESULTS

WORK IN PROGRESS

The Clean Water Act (CWA) of 1972 requires states, tribes, & territories to
• Identify water quality (WQ) impaired stream segments
• Create a priority ranking of those segments
• Calculate the Total Maximum Daily Load (TMDL) for each impaired segment based upon chemical and physical WQ standards
• Produce a biennial inventory characterizing regional WQ

The Problem
• It is impossible to physically sample every stream within a large area
  • Too many stream segments
  • Limited personnel
  • Cost associated with sampling
• Probability-based inferences are used to generate regional estimates of WQ
  • In miles by stream order
  • Does not indicate where WQ impaired segments are located
• A rapid and cost-efficient method is needed to locate potentially impaired stream segments throughout large areas

Our Approach
• Develop a geostatistical model based on coarse-scale geographical information system (GIS) data
• Make predictions for every stream segment throughout a large area
  • Generate a regional estimate of stream condition
  • Identify potentially WQ impaired stream segments

Dissolved Organic Carbon (DOC) Example
Fit a geostatistical model to DOC data and coarse-scale watershed characteristics
• Maryland Biological Stream Survey data, 1996
• 7 interbasins & 343 DOC survey sites
• GIS data:

GIS data, scale, and sources.
Dataset                                       Scale        Source
USGS National Hydrography Dataset (NHD)       1:250,000    http://nhd.usgs.gov/
USGS National Land Cover Dataset (NLCD)       30 meter     http://landcover.usgs.gov/natllandcover.asp
National Elevation Dataset (NED)              30 meter     http://ned.usgs.gov/
Omernik's Level III Ecoregion                 1:7,500,000  http://www.epa.gov/wed/pages/ecoregions/level_iii.htm
USGS Lithology                                1:250,000    USEPA Western Ecology Division, Corvallis, OR
PRISM (Parameter-elevation Regressions on
  Independent Slopes Model) temperature data  4 kilometer  http://www.ocs.orst.edu/prism/faq.phtml

Methods
Pre-process GIS data
• “Snap” survey sites to streams
• Calculate watershed attributes using the Functional Linkage of Watersheds and Streams (FLoWS) tools (Theobald et al., 2005; Peterson et al., in review)
Calculate distance matrices for model selection
• R statistical software
• x,y coordinates for observed survey sites

Covariates selected using the Leaps and Bounds regression algorithm.
Description                                       Covariate
% Water                                           WATER
% Emergent wetlands                               EMERGWET
% Woody wetlands                                  WOODYWET
% Felsic rock type in watershed                   FELPERC
Mean minimum temperature (°C), January to April   MINTEMP
Omernik's Level 3 Ecoregion 64                    ER64
Omernik's Level 3 Ecoregion 65                    ER65
Omernik's Level 3 Ecoregion 66                    ER66
Omernik's Level 3 Ecoregion 67                    ER67
Omernik's Level 3 Ecoregion 69                    ER69

• Test all possible linear models using the 10 covariates
  • 1024 models (2^10 = 1024)
• Distance measure: straight-line (Euclidean) distance
• Autocorrelation function: Mariah
• Estimate autocorrelation parameters: nugget, sill, and range

Model Results
• Range of spatial autocorrelation: 21.09 kilometers
• Significant watershed attributes: WATER, EMERGWET, WOODYWET, FELPERC, and MINTEMP

Summary statistics for log10 DOC and model covariates.
Variable          Min    1st Qu.  Median  Mean   3rd Qu.  Max    σ²
log10 DOC (mg/l)  -0.22  0.08     0.24    0.28   0.43     1.20   0.25
WATER (%)         0      0        0.16    0.25   0.28     4.64   0.44
EMERGWET (%)      0      0        0.13    0.26   0.35     4.85   0.44
WOODYWET (%)      0      0        0.27    1.24   1.15     22.01  3.28
FELPERC (%)       0      0        0.31    26.81  55.26    100    36.14
MINTEMP (°C)      -5.88  -3.06    -2.39   -2.49  -1.4     0.03   1.47

Model fit
• Leave-one-out cross validation method and universal kriging
• Overall MSPE = 0.93, R² = 0.72
• One strongly influential site
  • R² without the influential site = 0.66
• East-West trend in model fit
• Conservative model fit: tends to underestimate DOC
• 35 MSPE values > 1.5
  • These sites have covariate values similar to those of nearby sites, but considerably different DOC values

Model Predictions
Create prediction sites
• 1st, 2nd, and 3rd order non-tidal stream segments
• 3083 prediction sites = downstream node of each GIS stream segment
  • Downstream node ensures that the entire segment is located in the same watershed
  • More than one prediction location at stream confluences
  • Covariates for prediction sites represent the conditions upstream from the segment, not the stream confluence
Calculate distance matrices for model predictions
• Include observed and predicted survey sites
Generate predictions and prediction variances
• Assign values back to stream segments in GIS
• Universal kriging algorithm

Prediction statistics
Summary statistics for DOC predictions and prediction variances.
Variable              Min    1st Qu.  Median  Mean   3rd Qu.  Max
Predictions           0.8    1.5      1.9     2.7    3.0      40.4
Prediction Variances  0.049  0.095    0.122   0.171  0.193    2.597

• 18 prediction values > 15.9 mg/l
  • Also possessed the 18 largest prediction variances
  • Located in watersheds with large WATER, EMERGWET, or WOODYWET values
  • Covariate values this large are not represented in the observed covariate data
• Predictions represent 5973.03 kilometers of stream length

Stream habitat characterization estimated as a percentage of stream miles by DOC (mg/l) threshold during 1996.
Thresholds   Miles    Kilometers  Percent
DOC < 5      3347.74  5387.67     90.2
5 ≤ DOC ≤ 8  248.67   400.19      6.7
DOC > 8      115.06   185.16      3.1
Total        3711.46  5973.03     100

Products
• Geostatistical model used to predict segment-scale WQ conditions at unobserved locations
• Map of the study area that shows the likelihood of WQ impairment for each segment
  • Can be tied to threshold values or WQ standards
• Technical and Regulatory Services Administration within the Maryland Department of the Environment
  • Modifying the USGS NHD to include watershed impairments & stream-use designations by NHD segment
  • Frank Siano, personal communication
• A methodology that illustrates how agencies can accomplish spatial analysis using GIS data, MBSS data, and geostatistics

The Advantages
• Additional sampling is not necessary
• Complements existing methodologies
• Derive a regional estimate of stream condition in two ways:
  • Probability-based inferences about stream miles by stream order
  • Sum prediction values in miles by stream order
• Identify potentially WQ impaired stream segments
• Methodology can be used for regulated constituents as well
  • Nitrate, acid neutralizing capacity, pH, and conductivity can be accurately predicted using geostatistical models (Peterson et al., in review2)
• Identify spatial patterns of WQ throughout a large area
• Identify areas where additional samples would provide the most information
• Model results can be displayed visually
  • Allows professionals to communicate results easily to a wide variety of audiences

References
Hoeting J.A., Davis R.A., Merton A.A., & Thompson S.E. (in press) Model Selection for Geostatistical Models. Ecological Applications. http://www.stat.colostate.edu/%7Ejah/papers/index.html
Peterson E.E., Theobald D.M., & Ver Hoef J.M. (in review1) Support for geostatistical modeling on stream networks: Developing valid covariance matrices based on hydrologic distance and stream flow. Freshwater Biology.
Peterson E.E., Merton A.A., Theobald D.M., & Urquhart N.S. (in review2) Patterns of Spatial Autocorrelation in Stream Water Chemistry. Environmental Monitoring.
Theobald D.M., Norman J., Peterson E.E., & Ferraz S. (2005) Functional Linkage of Watersheds and Streams (FLoWS): Network-based ArcGIS tools to analyze freshwater ecosystems. Proceedings of the ESRI User Conference 2005. July 26, 2005, San Diego, CA, USA.

Acknowledgements
The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency to the Space Time Aquatic Resource Modeling and Analysis Program (STARMAP) at Colorado State University. This poster has not been formally reviewed by the EPA. The views expressed here are solely those of the authors. The EPA does not endorse any products or commercial services presented in this poster.