Colorado State University’s EPA-FUNDED SPACE-TIME AQUATIC RESOURCES MODELING and ANALYSIS PROGRAM (STARMAP)

Jennifer A. Hoeting and N. Scott Urquhart
Associate Professor and Senior Research Scientist
Department of Statistics
Colorado State University
Fort Collins, CO 80523-1877

STARMAP FUNDING
Space-Time Aquatic Resources Modeling and Analysis Program

The work reported here today was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenters and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.

This research is funded by U.S. EPA – Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095

Overview of Presentation
1. EPA’s Request for Applications (RFA)
2. CSU’s Response = STARMAP
3. A summary of some of the goals and recent accomplishments of the four STARMAP projects
4. Opportunities for Cooperation

EPA’s REQUEST FOR APPLICATIONS (RFA)
Content Requirements
• Research in Statistics
  o Directed toward using, in part, data gathered by probability surveys of the “EMAP-sort.”
• Training of “future generations” of environmental statisticians
• Outreach to the states and tribes

EPA’s REQUEST FOR APPLICATIONS (RFA) - continued
• Major Administrative Requirement
  o “… each of the two programs established will involve collaborative research at multiple, geographically diverse sites.”
• Two Programs:
  1. Oregon State University: Design-based/model-assisted survey methodology
  2.
Colorado State University: Spatial and temporal modeling, incorporating hierarchical survey design, data analysis, modeling

RESPONSE to RFA from CSU
• Institutions:
  o Colorado State University
     o Department of Statistics
     o Natural Resources Ecology Lab
  o Oregon State University
• Including work at
  o Iowa State University
  o University of Alaska, Fairbanks
  o University of Washington
  o Southern California Coastal Water Research Project (SCCWRP)
  o Water Quality Technology, Inc

STARMAP Overview
Goals of STARMAP:
• Develop statistical methods for aquatic resources
• Extend current methods for sampling design and modeling
• Emphasize spatio-temporal data: spatially explicit data collected over time

STARMAP Overview
• Most statistical techniques taught in graduate statistics classes assume that the observations are uncorrelated
• Reality: aquatic resources that are nearby in space are typically more similar than those far apart
• STARMAP aims to
  1. Develop sampling methods to enhance EMAP designs
  2. Develop statistical methods which make the best use of all available current data

STARMAP Types of available data
• A response of interest
  o A probability sample in a region, e.g., 305(b)
  o Some purposefully chosen points in the region
  o Spatially “intensive” points near some of the observation locations
  o Response may be multivariate
• Predictors
  o Some at observation locations only
  o Some at whatever density desired from GIS

STARMAP PROJECTS
1. Combining Environmental Data Sets
2. Local Estimation
3. Indicator Development
4. Outreach

STARMAP PROJECT 1: COMBINING ENVIRONMENTAL DATA SETS
Project leader: Jennifer Hoeting, CSU Department of Statistics
Two of the goals of the project:
1. Develop models and methodology for modeling aquatic resource data
2.
Enhance EMAP designs

STARMAP PROJECT 1: A closer look at one of the projects
Goal 1: Develop models and methodology for modeling aquatic resource data
• Challenges:
  o Spatially explicit, but incomplete coverage over space
  o Form of the response
• Example: Compositional data
  o What proportion of the species of fish at a sample location are in three pollution (or thermal) tolerance categories: intolerant, intermediate, and tolerant?
  o Can we relate multiple compositions to environmental covariates in a scientifically meaningful way?

Modeling compositional data: Motivating Problem
• Stream sites in the Mid-Atlantic region of the United States were visited
  o Response: For each site, each observed fish species was cross-categorized according to several traits
  o Predictors: Environmental variables were also measured at each site (e.g., precipitation, chloride concentration, …)
• How can we determine whether, and which, collected environmental variables affect species trait compositions?

Modeling compositional data: Sampling locations for Mid-Atlantic Highlands Region
[Figure: map of sampling locations]

Modeling compositional data: Discrete Compositions and Probability Models
• Compositional data are multivariate observations $Z = (Z_1, \ldots, Z_D)$ subject to the constraints that $\sum_i Z_i = 1$ and $Z_i \ge 0$.
• Compositional data are usually modeled with the Logistic-Normal distribution (Aitchison 1986).
  o LN model defined for positive compositions only, $Z_i > 0$
• Problem: With discrete counts one has a non-trivial probability of observing 0 individuals in a particular category

Modeling compositional data: Random effects discrete regression model
• Developed a new model: the random effects discrete regression model
• Developed Bayesian methods to estimate the parameters of this model
• Developed graphical models theory which allows for statistically sound displays of the results

Modeling compositional data: Random effects discrete regression model
• Sampling of individuals occurs at many different random sites, $i = 1, \ldots, S$, where covariates are measured only once per site
• Hierarchical model for individual probabilities:
  [Equation not recoverable from the slide text. Legible pieces: the density $f_{\mathrm{REDR}}(y \mid x)$ has an exponential form with random effects $\varepsilon_f \sim \mathrm{MVN}(0, \Sigma_f)$, and its terms are defined by cases according to whether $f$ is complete in $G$ or not.]

Modeling compositional data: Example Chain Graph
[Figure: example chain graph]
• Mathematical graphs are used to illustrate complex dependence relationships in a multivariate distribution
• A random vector is represented as a set of vertices, V.
• Pairs of vertices are connected by directed or undirected edges depending on the nature of each pair’s association

Modeling compositional data: Fish Species Richness in the Mid-Atlantic Highlands
• 91 stream sites in the Mid-Atlantic region of the United States were visited in an EPA EMAP study
• Response composition: Observed fish species were cross-categorized according to 2 discrete variables:
  1. Habit
     • Column species
     • Benthic species
  2. Pollution tolerance
     • Intolerant
     • Intermediate
     • Tolerant

Modeling compositional data: Stream Covariates
Environmental covariates: values were measured at each site for the following covariates:
1. Mean watershed precipitation (m)
2. Minimum watershed elevation (m)
3. Turbidity (ln NTU)
4. Chloride concentration (ln meq/L)
5. Sulfate concentration (ln meq/L)
6. Watershed area (ln km²)

Modeling compositional data: Fish Species Functional Groups
Posterior suggested chain graph for independence model (lowest DIC model)
[Figure: chain graph with nodes Precipitation, Elevation, Area, Turbidity, Sulfate, Chloride, Habit, and Tolerance]
Edge exclusion determined from 95% HPD intervals for parameters and off-diagonal elements of Ø.

Modeling compositional data: A summary
The Random Effects Discrete Regression Model
• Allows for multivariate composition response
• Provides a statistically defensible graphical model interpretation
• Offers measures of uncertainty and inferences not available using other techniques for species trait and related analyses
• Allows for predictions at unobserved locations

STARMAP PROJECT 1: Some Recent Accomplishments
Goal 1: Develop models and methodology for modeling aquatic resource data
Other projects aimed at goal 1:
• Models for radio telemetry habitat association data
  o Radio-tagged fish are monitored over time
  o Goal: extend existing models to account for seasonal changes in fish habitat types
• Model selection for geo-statistical models
  o When predicting a continuous response, which covariates are best?
  o Does spatial correlation affect model selection? (YES!)

STARMAP PROJECT 1: Some Recent Accomplishments
Goal 2: Enhance EMAP designs
• How should EMAP-type sampling be intensified to estimate spatial correlation?
  o Current context: City of San Diego and Southern California Coastal Water Research Project (SCCWRP)
     o Accurate maps of environmental measures around San Diego’s oceanic sewage outfall
• How to Get From 305(b) Survey Results to Identify 303(d) Sites?
  o STARMAP organized a morning of talks on this topic at the recent EMAP Conference

STARMAP PROJECT 2: Local Inferences from Aquatic Studies
Project leader: Jay Breidt, CSU Department of Statistics
Goals:
1.
Develop techniques for small area estimation
2. Develop methods to estimate the cumulative distribution function
3. Develop methods to infer causality from non-experimental spatially referenced data

STARMAP PROJECT 2: Some Recent Accomplishments
Goal 1: Small area estimation
• Combining probability survey data with non-probability data to make spatially explicit predictions
• Bayesian models to construct a set of ensemble estimates to predict some response
• Data are not observed everywhere, but the methods will provide predictions over the entire region along with estimates of uncertainty
• Current emphasis: characteristics of water quality for the Mid-Atlantic Highlands region

STARMAP PROJECT 2: Some Recent Accomplishments
• Goal 1: Developing and comparing different methods for small area estimation
  o Developing new semi-parametric methods
  o Semi-parametric methods can combine the benefits of parametric and non-parametric methods
• Goal 2: Nonparametric regression estimators for two-stage samples
  o Incorporates auxiliary information available at the level of the primary sampling unit
  o Current emphasis: EMAP Northeast Lakes
• Presented results at recent EMAP conference

STARMAP PROJECT 3: Development and Evaluation of Aquatic Indicators
Project leader: Dave Theobald, CSU Natural Resources Ecology Lab
Two of the project goals:
1. Develop and determine landscape indicators for analyses of EMAP data
2.
Develop better GIS tools for relevant agencies

STARMAP PROJECT 3: Some Recent Accomplishments
Goal 1: Develop and determine landscape indicators for analyses of EMAP data
• Developing predictors for stream size and flow status to overcome limitations of the National Hydrological Database
• Estimation of regional indicators of taxa richness
• Classification of perennial versus non-perennial streams
• Quantifying taxa richness in terms of rarity assessed by a fixed count
• Sampling macroinvertebrates: compositing and structure of variance
• Compiling indicators and additional GIS data coverage for MAHA and Western Pilot Study

STARMAP PROJECT 3: Some Recent Accomplishments
Goal 2: Develop better GIS tools
• Software for Generalized Random Tessellation Stratified (GRTS) sampling
• GRTS: Robust spatially balanced random sampling
• Software implements the GRTS algorithm in ARCVIEW
• Software is in final testing stages

Laramie Foothills Study Area and Sample Points
[Figure: map of study area and sample points]

Photo interpretation points displayed with predicted current condition map
[Figure: map]

STARMAP PROJECT 4: OUTREACH
Project leader: Scott Urquhart, CSU Department of Statistics
Project goals:
1. Identify and establish statistical needs of states, tribes and local agencies
2. Prepare content material relevant to target audience

STARMAP PROJECT 4: Outreach
• Learning Materials for Aquatic Monitoring
  1. Individualized interface
     o Images can vary by geographic context
     o Content varies by responsibility level
     o Supports language variation
  2. Browser based
     o Also available on a CD ROM
        • Avoids internet delays for learners at remote sites & in the field
        • Customizable environment
  3.
Materials are under active development
     o Interface & initial materials tested late last summer by monitoring personnel in state agencies, Region 10 and NGOs
     o Anticipate videotaping of EMAP training session in Corvallis later this month; material to be included in “How to Monitor”
     o See poster and reprint for more info

STARMAP PROJECT 4: Recent Accomplishments
• Content:
  o Monitoring Objectives
  o Methods for Site Selection
  o What/How to Monitor
  o How to Monitor = Field Operations
  o How to Summarize
  o Case Studies
     o Planning studies
     o Site selection
     o Analyses

STARMAP Training future environmental statisticians
• Graduate students graduated
  o 1 Ph.D. + 1 affiliated student in landscape ecology
  o 4 M.S.
• Current graduate students
  o 6 Ph.D. students, including two in landscape ecology
  o 2 M.S. students
• Post doctoral fellows: one at present; seeking others
• Early career professionals
  o 3 young faculty
  o 2 agency employees

STARMAP Training future environmental statisticians
Colorado State University’s PRIMES program
• PRogram for Interdisciplinary Mathematics, Ecology and Statistics
• NSF IGERT program aimed at training graduate students in this interdisciplinary area
• Works well with STARMAP as both have similar goals
• Allows us to offer new classes and support students in many ways
• Opportunities for visitors and joint research!

OPPORTUNITIES FOR COOPERATION
• GIS-based GRTS site selection
• New analysis needs
• We are looking for aquatic environmental data sets
  o Which are spatially intense
     o For example, sites 100s of meters to a few km apart
  o Or which include spatial locations and were collected over a long time frame (> 5 time points)
  o Identified several such possible sets at EMAP Conference
• Involvement in Evolving Learning Materials
  o Testing
  o Suggestions
  o Case studies
     o We could analyze some data for you to make these

CHECK OUT WHAT WE ARE DOING
• STARMAP Web Site: http://www.stat.colostate.edu/starmap/
  o This presentation will be posted there soon.
• Team members here are …
• Questions Are Welcome!
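
Returning to the compositional-data slides: the point that the Logistic-Normal model requires strictly positive parts can be illustrated with a minimal Python sketch. It assumes the additive log-ratio transform of Aitchison (1986), under which a composition is Logistic-Normal when its log-ratios are multivariate normal. The function names and the fish counts are hypothetical and are not taken from the Mid-Atlantic data.

```python
import math


def closure(counts):
    """Convert category counts to a composition (proportions summing to 1)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one observed individual")
    return [c / total for c in counts]


def alr(composition):
    """Additive log-ratio transform with the last part as the reference.

    Defined only when every part is strictly positive.
    """
    if any(z <= 0 for z in composition):
        raise ValueError("log-ratio undefined: composition contains a zero part")
    ref = composition[-1]
    return [math.log(z / ref) for z in composition[:-1]]


# Hypothetical species counts in the three tolerance categories
# (intolerant, intermediate, tolerant); illustrative values only.
site_a = closure([3, 5, 2])
site_b = closure([0, 6, 4])  # no intolerant species observed at this site

alr_a = alr(site_a)  # works: all parts are positive

try:
    alr_b = alr(site_b)
except ValueError as err:
    alr_b = None  # the transform cannot be applied to a zero part
    print("site_b:", err)
```

A site with no species in one category yields a zero part, so the log-ratio is undefined there. This is the situation the slides describe as the motivation for the random effects discrete regression model.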