A Comparison of Variance Estimates of Stream Network Resources
Sarah J. Williams
Candidate for the degree of Master of Science
Colorado State University, Department of Statistics
October 25, 2006

But First: What have I been doing since June?
…Not Insurance
…Not Pharmaceutical Industry
…Not Government

Noel-Levitz
‘Your Partner in Higher Education’
Targeted market research
Student satisfaction surveys
E-Learning Tools
Retention strategy
Effective recruitment strategies

Noel-Levitz
Two main offices: Denver and Iowa City, IA
Full time consultants
Associate consultants
Marketing
Sales
Administrative
IT
The rest of us work in analysis in differing capacities

Forecast PLUS (F+)
Recruitment models
SmartApproach
Prospect
Inquiry
Admit
Econometric models
FAS

What else do I do?
Product Enhancements
Software testing
Methodology research
Emerging Marketing Product
Discrete Choice Modeling
Assisting Consultants and TPSS staff

What else do I do? (continued)
National Conference
Campus Visits
SAS programming classes
Meetings…meetings…meetings!

Today’s Outline
1. The problem
2. Two methods of variance estimation
   Local Area Neighborhood estimate
   Linear model and components of variance
3. Comparing the two estimates
4. Practical implications

Part 1: The Problem
Surveys of aquatic resources pose challenges in both sampling and analysis
Studies of aquatic resources need a proper temporal design that allows for trend detection
   The revisit structure may be the most difficult part of the study to define
   Waterways are highly variable and volatile
Studies of aquatic resources need a proper spatial design that preserves spatial proximity
   Generalized Random Tessellation Stratified (GRTS) sample
With well-planned panel designs, researchers can achieve both of these objectives

Part 2: Two methods of variance estimation
Of course, there are many methods for estimating the variance of a sample
Today, we will focus on two of these methods
   Local Area Neighborhood Estimate (NBH)
   Linear model components of variance

Local Area Neighborhood Estimate
Design-based
Compares well to a Horvitz-Thompson estimate

$$S^2_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \bar{z}_{D(s_i)} \right)^2$$

where $R$ is the set of sampled sites, $D(s_i)$ is the local neighborhood of site $s_i$, $w_{ij}$ are the neighborhood weights, $\pi(s_j)$ is the inclusion probability of site $s_j$, and $\bar{z}_{D(s_i)}$ is the weighted neighborhood mean.

Mixed Linear Model and Components of Variance

$$y = X\beta + Z\gamma + \varepsilon$$

Why use this method?
Design-based estimators have no inclusion of time
Specifications of the study

Mixed Linear Model and Components of Variance

$$Y_{ij} = S_i + T_j + E_{ij}$$
$$S_i \sim (0, \sigma^2_{site}), \quad T_j \sim (\mu(t_j), \sigma^2_{year}), \quad E_{ij} \sim (0, \sigma^2_{resid})$$

$$Y_{ij} = \mu(t_j) + S_i + T_j + E_{ij}$$
$$S_i \sim (0, \sigma^2_{site}), \quad T_j \sim (0, \sigma^2_{year}), \quad E_{ij} \sim (0, \sigma^2_{resid})$$

In either case above we have that for a single observation:

$$E[Y_{ij}] = \mu(t_j)$$
$$Var[Y_{ij}] = \sigma^2_{site} + \sigma^2_{year} + \sigma^2_{resid}$$

Part 3: Comparing the two methods
Coho Salmon of the Pacific Northwest
The Oregon Plan for Salmon and Watersheds
GRTS design (40 panels)
3-year salmon lifecycle
Each monitoring area is a stratum

The Data
35 responses of interest (Landscape & habitat)
6 regions of interest
8 years available
1,535 site visits (1,055 distinct sites)

Variance estimates (Site, Year, and Residual are the linear model estimates; Local is the NBH estimate):

Response    Region  Count  Site       Year      Residual  Site+Res   Local      SRS Var
LWDPIECE1   1-NC    329    77.73      13.54     149.59    227.32     138.47     234.91
LWDPIECE1   2-MC    308    81.11      -0.43     14.43     95.54      52.19      96.13
SECCHNAREA  1-NC    314    247805.84  -9.22     92810.91  340616.75  225939.60  342010.30
SECCHNAREA  2-MC    304    191217.93  -3020.04  73495.43  264713.36  77529.41   262645.77

The same estimates as a fraction of the SRS variance:

Response    Region  Count  Site/SRS  Year/SRS  Res/SRS  Site+Res/SRS  Local/SRS
LWDPIECE1   1-NC    329    0.331     0.058     0.637    0.968         0.589
LWDPIECE1   2-MC    308    0.844     -0.004    0.150    0.994         0.543
SECCHNAREA  1-NC    314    0.725     0.000     0.271    0.996         0.661
SECCHNAREA  2-MC    304    0.728     -0.011    0.280    1.008         0.295

Results and Conclusions
[Figure: Local Variance against Residual Variance Component, All Regions; both axes run from 0.0 to 1.0]
As residual component of variance increases, so too does NBH
[Figure: Local Variance against Site Variance Component, All Regions; both axes run from 0.0 to 1.0]
As site component of variance increases, NBH decreases

Results and Conclusions

$$\frac{1}{35} \sum_{i=1}^{35} \frac{s^2_{resid,i}}{s^2_{NBH,i}} = 0.63$$

Indicating that on average, the residual component of variance is 0.63 times the NBH estimate

$$\frac{1}{35} \sum_{i=1}^{35} \frac{s^2_{site,i}}{s^2_{NBH,i}} = 1.32$$

Indicating that on average, the site component of variance is 1.32 times the NBH estimate

It was also of interest to
express the NBH estimate as a linear combination of the residual and site components of variance:

$$\hat{s}^2_{NBH,i} = 0.42 \, s^2_{site,i} + 0.72 \, s^2_{resid,i}$$

This relation is forced through the origin.
Using $corr(s^2_{NBH,i}, \hat{s}^2_{NBH,i})^2 = 0.517$, interpretable as $R^2$, shows moderate strength of this relation

What does it all mean? Practical Implications
When is it appropriate to use the linear model method over the local area estimate of variance?
The local estimate will capture all variance due to residual effect, and we have seen that time variance is relatively negligible
The results of this study show that the NBH accounts for roughly 60%-70% of variance due to site
Note again the inverse relationship of site variance with NBH
This project is an intermediate step in the larger problem of accurately modeling the trend of an environmental indicator

Acknowledgements
N. Scott Urquhart
Don Stevens
Tom Kincaid
Kim Jones
Oregon Department of Fisheries and Wildlife

The work reported here was developed under the STAR Research Assistance Agreements CR-829095 awarded to Colorado State University and CR829096 awarded to Oregon State University by the U.S. Environmental Protection Agency (EPA). This presentation has not been formally reviewed by the EPA. The work done and views expressed here are solely those of the author and STARMAP. EPA does not endorse any products or commercial services mentioned in this report.

Thank you for listening today

Resources
Hocking, R.R. 2002. Methods and Applications of Linear Models. Wiley.
Horvitz, D.G., and Thompson, D.J. 1952. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47: 663-685.
Särndal, C., Swensson, B., and Wretman, J. 1992. Model Assisted Survey Sampling. Springer-Verlag.
Stevens, D.L., and Olsen, A.R. 1999. Spatially Restricted Surveys Over Time for Aquatic Resources. Journal of Agricultural, Biological and Environmental Statistics, 4(4): 415-428.
Stevens, D.L. 2003. Sampling Design and Statistical Analysis Methods for the Integrated Biological and Physical Monitoring of Oregon Streams. The Oregon Plan for Salmon and Watersheds.
Stevens, D.L., and Olsen, A.R. 2003. Variance estimation for spatially balanced samples of environmental resources. Environmetrics, submitted.
Strahler, A.N. 1957. Quantitative analysis of watershed geomorphology. Transactions of the American Geophysical Union, 38: 913-920.
Urquhart, N.S., Overton, W.S., and Birkes, D.S. 1993. Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns. Statistics for the Environment, chapter 3: 71-85.
Urquhart, N.S., Paulsen, S.G., and Larsen, D.P. 1998. Monitoring for policy-relevant regional trends over time. Ecological Applications, 8: 246-257.
Urquhart, N.S., and Kincaid, T.M. 1999. Designs for detecting trend from repeated surveys of ecological resources. Journal of Agricultural, Biological and Environmental Statistics, 4: 404-414.
The Oregon Plan for Watersheds and Salmon, Biennial Report, 2005.
www.epa.gov/emap
www.dfw.state.or.us
Map obtained from http://nrimp.dfw.state.or.us/crl/default.aspx?PN=GCAs
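
Supplementary check of the Part 3 tables
The ratio columns reported in Part 3 (each estimate divided by the SRS variance) can be recomputed from the variance estimates shown on the preceding slide. The sketch below is illustrative only: the dictionary layout and the function name `ratios_to_srs` are assumptions made for this example, and the numbers are transcribed from the slides rather than read from the original survey data.

```python
# Variance estimates transcribed from the Part 3 slide.
# Tuple order: site, year, residual, local (NBH), SRS variance.
ESTIMATES = {
    ("LWDPIECE1", "1-NC"): (77.73, 13.54, 149.59, 138.47, 234.91),
    ("LWDPIECE1", "2-MC"): (81.11, -0.43, 14.43, 52.19, 96.13),
    ("SECCHNAREA", "1-NC"): (247805.84, -9.22, 92810.91, 225939.60, 342010.30),
    ("SECCHNAREA", "2-MC"): (191217.93, -3020.04, 73495.43, 77529.41, 262645.77),
}


def ratios_to_srs(site, year, resid, local, srs):
    """Express each variance estimate as a fraction of the SRS variance.

    Hypothetical helper for this example; not part of the original analysis.
    """
    return {
        "site": site / srs,
        "year": year / srs,
        "resid": resid / srs,
        "site+resid": (site + resid) / srs,
        "local": local / srs,
    }


RATIOS = {key: ratios_to_srs(*values) for key, values in ESTIMATES.items()}

if __name__ == "__main__":
    for (response, region), ratios in RATIOS.items():
        cells = "  ".join(f"{name}={value:.3f}" for name, value in ratios.items())
        print(f"{response:<11}{region:<6}{cells}")
```

Under these assumptions, the printed values can be compared row by row with the ratio table. A negative year ratio simply reflects a negative year variance-component estimate in the source table.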