Multi-Lag Cluster Enhancement of Fixed Grids for Variogram Estimation for Near Coastal Systems

Kerry J. Ritter, SCCWRP
Molly Leecaster, SCCWRP
N. Scott Urquhart, CSU
Ken Schiff, SCCWRP
Dawn Olsen, City of San Diego
Tim Stebbins, City of San Diego

Project Funding
• The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.
• Southern California Coastal Water Research Project (SCCWRP)

Background
• Maps of sediment condition are important for making decisions regarding pollutant discharge
• Maps in marine systems are rare
• Special study by San Diego Municipal Wastewater Treatment Plant
• Objective: To build statistically defensible maps of chemical constituents and biological indices around two sewage outfalls
  – Point Loma
  – South Bay

Point Loma and South Bay Outfalls

TYPICAL DESIGN SITUATION
• Many features of the real situation are unknown.
  – Here: The nature of the semivariogram
• Multiple Responses
  – What is a good solution for one response may not be a good design for another!
• Time constraint
  – Answer was required by June 14, 2004

Two-Phase Approach
• Phase I: Model spatial variability at various spatial scales (e.g., variogram)
  – This summer
• Phase II: Use information from Phase I to design a survey that meets accuracy requirements
  – Next summer (2005)
• Current focus is on Phase I

Variogram
[Figure: variogram plot of gamma versus distance, with the nugget, sill, and range marked]

Design Considerations for Modeling the Variogram
• Sufficient replication at various spatial scales
  – Variogram model
  – Parameter estimates
• Adequate spatial coverage to support investigating
  – Stationarity
  – Isotropy vs.
    Anisotropy
  – Strata
• Allow for multiple responses

Empirical Variograms (Point Loma 2000 Regional Survey)
[Figure: empirical variograms (gamma versus distance) for TOC, copper, chromium, and zinc, plus a lag distribution panel (number of pairs versus lag distance, km). Panel annotations (R = range, S = sill, N = nugget), in the order extracted: R=5.09, S=36.27, N=0.00; R=8.8, S=0.077, N=0.0242; R=6.14, S=218.55, N=0.00; R=2.75, S=22.53, N=0.00]

Multi-Lag Cluster (MLC) Enhancements to Fixed Grids
• Clusters of sites, spaced at various lag distances, are placed around fixed locations on an existing grid.
• Allows the current monitoring grid to remain intact.
• Provides replication at multiple spatial scales

There are many ways to allocate resources within the MLC
• Economic constraints: limit total number of samples
  – (e.g., 100 in Point Loma)
• More clusters with fewer sites within a cluster? Or fewer clusters with more sites?
• Shorter sample spacing or larger sample spacing?
• What is the best (decent!) design configuration?

Choosing the Best Design
Case Study: Point Loma
• Three design configurations
  – S, STAR, and S with satellites
• Two sets of lag classes
  – Shorter vs.
    larger sample spacing
• Compare lag distributions
• Simulation study
  – Simulate response
  – Consider different models of spatial variability
• Compare relative performance of designs for estimating parameters

“STAR” and “S” Cluster Designs
[Figure: STAR design and S design panels, X (km) versus Y (km), axes 0 to 100]

“S” and “S with Satellites” Design
[Figure: S design and S with Satellites design panels, X (km) versus Y (km), axes 0 to 100]

[Figures: individual panels of the STAR, S, and S with Satellites designs, same axes]

Sample Allocation
• Star
  – Grid stations = 12
  – 5 “STAR” clusters of size 17 (3 at grid stations, 2 at sites of interest)
  – 1 “S” cluster of size 9
  – Field duplicates = 9
  – Total samples = 12 + 3*(17-1) + 2*(17) + 9 + 9 = 112
• S
  – Grid stations = 12
  – 11 “S” clusters of size 9 (5 at grid stations, 6 at sites of interest)
  – Field duplicates = 6
  – Total samples = 12 + 5*(9-1) + 6*(9) + 6 = 112
• S with Satellites
  – Grid stations = 12
  – 8 “S” clusters of size 9 (4 at grid stations, 4 at sites of interest)
  – 8 satellites added to each of 3 “S” clusters
  – Field duplicates = 8
  – Total samples = 12 + 4*(9-1) + 4*(9) + 3*(8) + 8 = 112

“Star” Cluster Design
[Figure: Point Loma, 5 Star + 1 S cluster; two panels, X (km) versus Y (km)]

“S” Cluster Design
[Figure: S design at Point Loma; one panel for Lag = 0.05, 0.10, 0.20, 0.50 and one for Lag = 0.05, 0.25, 1.00, 3.00; X (km) versus Y (km)]

“S” Cluster with Satellites
[Figure: S with Satellites design at Point Loma; two panels, X (km) versus Y (km)]

Omnidirectional Lag Distribution
[Figure: omnidirectional lag distributions (number of pairs versus pairwise lag distance) for the S, Star, and SSAT designs; one panel for Lag = 0.05, 0.10, 0.20, 0.50 and one for Lag = 0.05, 0.25, 1.00, 3.00]

Directional Lag Distribution
Lag = 0.05, 0.10, 0.20, 0.50 (Lag = 0.05, 0.25, 1.00, 3.00 is similar)
[Figure: directional lag distributions (number of pairs versus pairwise lag distance) for directions 0, 45, 90, and 135, comparing the S, STAR, and SSAT designs]

Simulation Study
• 3 grid enhancements: S, STAR, S with Satellites
• Two sets of lag classes of size 4
  – 0.05, 0.10, 0.20, 0.50 (km)
  – 0.05, 0.25, 1, 3 (km)
• Spherical variogram
  – Range = 1, 2, 4, 6
  – Nugget = 0.00, 0.10
  – Sill = 1
• 1000 simulations
• Fit using an automated procedure in S-PLUS
  – This may have introduced artifacts

Percent Difference from Target Range (Median Range)
S=1, N=0.10
[Figure: percent of target versus true range for the S, Star, and SSAT designs; one panel for Lag = 0.05, 0.10, 0.20, 0.50 and one for Lag = 0.05, 0.25, 1.00, 3.00]

Percent Difference from Target Sill (Median Sill)
S=1, N=0.10
[Figure: percent of target versus true range for the S, Star, and SSAT designs; one panel for Lag = 0.05, 0.10, 0.20, 0.50 and one for Lag = 0.05, 0.25, 1.00, 3.00]

Percent Difference from Target Nugget (Median Nugget)
S=1, N=0.10
[Figure: median percent difference versus true range for the S, STAR, and SSAT designs; one panel for Lag = 0.05, 0.10, 0.20, 0.50 and one for Lag = 0.05, 0.25, 1.00, 3.00]

Summary
• STAR performed better than S and S with Satellites for estimating variogram parameters
  – Robust to different lag classes
• Multiple lag distances are better than increased replication at fewer lag distances
• Larger lag classes generally did better than shorter lag classes (eliminates
  “holes”)

Final Design
• Five “S” clusters; includes 10 duplicates (five at star centers and five elsewhere)

Further Research
• Choose another variogram model
  – Exponential
• Choose another variogram fitting algorithm
  – REML
• Simulate anisotropy
• Investigate robustness to model misspecification
• Explore other designs

STARMAP and CITY OF SAN DIEGO?
• Outreach to a member of the EPA affiliates
• Research opportunity – real problem
  – Mapping consequences
  – Apparently no other US data exist that are
    • spatially intense and
    • near coastal
  – This mapping requirement resulted from SD’s permit renewal
  – Similar repeats are very likely

MORE GENERAL QUESTION
• How much spatial correlation is there in aquatic systems, after accounting for habitat features?
  – I am trying to assemble spatially intense relevant data sets in a number of settings
  – Ask for such data sets at EMAP 2004 Symposium in May
    • Have located a few

SPATIALLY INTENSE DATA SETS OF ENVIRONMENTAL RESPONSES
• Ohio River – Have 400+ sites
  – Josh French is looking at this data
• Have about 60 Virginia stream sites
  – On two streams
• Access to a northeast estuary study, 100+ points
  – Some spatial correlation demonstrated
• Detroit River – fairly short segment, 60+ points
• San Diego study = near coastal

SPATIALLY INTENSE DATA SETS OF ENVIRONMENTAL RESPONSES
• Have nothing on wetlands
• Other possibilities
  – San Francisco Bay
• Preliminary observation
  – SD data shows greater range in the semivariogram than I had expected
  – Even after accounting for depth or particle size
  – Why had I expected a shorter range? Effluent is fresh water; it rises fast from the outfall. Coastal and tidal currents are strong there.

END OF PLANNED PRESENTATION
• Questions and suggestions are welcome
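
Supplementary sketch
The Variogram and Simulation Study slides refer to a spherical variogram described by a nugget, sill, and range, and the lag distribution slides count site pairs per lag distance. Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sill is taken to be the total sill (nugget included), which the slides do not state; the function names, the bin width, and the transect coordinates are hypothetical; and the study itself fit its models in S-PLUS, not with this code.

```python
import math
from itertools import combinations


def spherical_semivariogram(h, nugget, sill, rng):
    """Spherical model gamma(h).

    Assumption: `sill` is the total sill (nugget included), so the
    partial sill is `sill - nugget`. `rng` is the range.
    """
    if h <= 0:
        return 0.0          # gamma(0) = 0 by definition
    if h >= rng:
        return sill         # flat at the sill beyond the range
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def lag_pair_counts(sites, bin_width):
    """Count site pairs per distance bin.

    Key k covers distances in [k * bin_width, (k + 1) * bin_width).
    """
    counts = {}
    for (x1, y1), (x2, y2) in combinations(sites, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Hypothetical transect (km): successive spacings of 0.05, 0.25, 1.00, 3.00,
# mimicking the larger of the two lag sets on the slides.
transect = [(0.0, 0.0), (0.05, 0.0), (0.30, 0.0), (1.30, 0.0), (4.30, 0.0)]
counts = lag_pair_counts(transect, bin_width=0.5)
print(sum(counts.values()))  # 5 sites give 10 pairs in total

# Semivariance at 1 km for one simulated setting (sill 1, nugget 0.10, range 2).
print(spherical_semivariogram(1.0, nugget=0.10, sill=1.0, rng=2.0))
```

Counting pairs per bin for each candidate design is one way to spot empty lag bins, the “holes” mentioned in the Summary, before any field sampling is done.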