Sampling Perennial Streams: An Application in Model Based Optimal Design

William Coar and F. Jay Breidt

Abstract

Previous sample designs for perennial streams rely on the River Reach File 3 (RF3) classification in the National Hydrography Dataset (NHD) to establish a sampling frame of perennial reaches. Recent studies have provided evidence of misclassification, resulting in an incomplete frame, biased results and high costs. This application in optimal design investigates the use of auxiliary data to assist in predicting the current status of a stream segment as either perennial or nonperennial. The entire NHD database could then be used as a complete frame. Based on this information, optimal designs for fixed cost are derived for the complete and incomplete frame. Anticipated mean square error under a superpopulation model is then used to compare estimators from the complete and incomplete frame.

Model

Assume reach $i$ is truly perennial with probability $p(x_i)$, for known auxiliary $x_i$. For $Y_i > 0$, $Z_i > 0$ with probability $p(x_i)$.

Define:
• $x_i$ = vector of known covariates
• $F_i$ = known RF3 classification (1 if classified perennial, 0 otherwise)
• $P_i \sim \mathrm{Bernoulli}(p(x_i))$
• $Y_i \sim (\mu_i(F_i), \sigma_i^2(F_i))$
• $Z_i = Y_i P_i$

Suppose the total of some attribute about perennial streams is
\[ \theta = \sum_{i \in U} Z_i = \sum_{i \in U} Y_i P_i . \]

If $\{Z_i\}_{i \in U}$ are a realization from a superpopulation model, then an anticipated mean squared error of an estimator $\hat\theta$ is defined as
\[ \mathrm{AMSE}(\hat\theta) = E\,V_p(\hat\theta) + V\big(E_p(\hat\theta) - \theta\big) + \big[E\big(E_p(\hat\theta) - \theta\big)\big]^2 , \]
where $E$ and $V$ are taken under the superpopulation model and the subscript $p$ denotes the sampling design.

Optimal Design for Horvitz-Thompson Estimator

For simplicity, assume $Z_i \mid F_i$ are independent. The cost of sampling element $i$ is $c_i$. First order inclusion probabilities that minimize AMSE for a fixed expected cost $C$ are said to be optimal.

Under the complete frame (entire NHD):
\[ \pi_i = \frac{C \sqrt{\mu_{2i}/c_i}}{\sum_{j \in U} \sqrt{\mu_{2j}\, c_j}} \]

Under the incomplete frame (RF3 classified perennial):
\[ \pi_i = \frac{C\, F_i \sqrt{\mu_{2i}/c_i}}{\sum_{j \in U} F_j \sqrt{\mu_{2j}\, c_j}} \]

where $\mu_{2i} = E[Y_i^2 \mid P_i = 1, x_i]\, p(x_i)$.

Relative Mean Square Error Efficiency

The AMSE of the biased estimator $\hat\theta_b$ relative to the unbiased estimator $\hat\theta_u$ can be used to investigate gains or losses with particular model and cost structures. Under equal expected costs for both designs, the ratio $\mathrm{AMSE}(\hat\theta_b) / \mathrm{AMSE}(\hat\theta_u)$ has a closed form.

[Equation: $\mathrm{AMSE}(\hat\theta_b) / \mathrm{AMSE}(\hat\theta_u)$ expressed through the quantities $q$ and $r$, the expected cost $C$, the non-frame cost $c_{nf}$, and sums over $U$ of $(1 - F_i)\mu_{2i}$ and $(1 - F_i)\sqrt{\mu_{2i}}$.]

Relative efficiency of the biased scheme will vary with cost structures, relative frame sizes, and moment structure.

Example

Let the relative frame size of the incomplete frame as well as the expected cost vary.

[Figure: Relative AMSE for Decreasing Cost in the Excluded Frame Elements. Panels: Cnf = 4 x Cf, Cnf = 3 x Cf, Cnf = 2 x Cf, Cnf = 1 x Cf. Axes: Expected Cost (0 to 400) and Relative Size of Incomplete Frame (0.80 to 0.95).]

[Figure: Relative AMSE for Increasing Second Moment. Panels: Mu(2)nf = 1 x Mu(2)f, Mu(2)nf = 2 x Mu(2)f, Mu(2)nf = 3 x Mu(2)f, Mu(2)nf = 4 x Mu(2)f. Axes: Expected Cost (0 to 400) and Relative Size of Incomplete Frame (0.80 to 0.95).]

Results

• The complete frame assures positive first order inclusion probabilities for every perennial stream, yielding an unbiased estimator.
• Elimination of the frame bias.
• Optimal design gives higher inclusion probabilities to truly perennial streams than to non-perennial streams.
• Optimal design samples elements with large second moment with higher probability.
• Optimal design samples low cost elements with higher probability.

Future Work

• Additional numerical investigation of AMSE
• Investigate optimal design for regression estimator
• Investigate asymptotic properties of the AMSE ratio under different formulations (fixed frame bias, small frame bias, negligible frame bias)
• Continue work on estimating $p(x_i)$

This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095.

DISCLAIMER

The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the authors and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this poster.
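The first order inclusion probabilities that minimize AMSE for a fixed expected cost C can be illustrated with a small numerical sketch. It assumes the allocation takes the usual form in which the inclusion probability is proportional to the square root of the second moment divided by the cost, scaled so that the expected cost equals C, and it does not cap probabilities at 1. The function name, argument names, and the four-reach data set are hypothetical and chosen only for illustration.

```python
import math


def optimal_inclusion_probs(mu2, cost, budget, in_frame=None):
    """Sketch of cost-constrained optimal inclusion probabilities.

    mu2      -- second moments mu_2i for each reach (assumed known)
    cost     -- per-reach sampling costs c_i
    budget   -- fixed expected cost C
    in_frame -- optional 0/1 frame indicators F_i; None means complete frame

    Assumes pi_i is proportional to F_i * sqrt(mu_2i / c_i).
    """
    if in_frame is None:
        in_frame = [1] * len(mu2)
    weights = [f * math.sqrt(m / c) for f, m, c in zip(in_frame, mu2, cost)]
    # Expected cost of the unscaled weights: sum of c_i * w_i.
    denom = sum(w * c for w, c in zip(weights, cost))
    if denom == 0:
        raise ValueError("no frame element has a positive second moment")
    probs = [budget * w / denom for w in weights]
    if max(probs) > 1:
        raise ValueError("budget too large: some probability exceeds 1")
    return probs


# Hypothetical four-reach population; reach 3 is excluded from the RF3 frame.
mu2 = [4.0, 1.0, 9.0, 1.0]
cost = [1.0, 1.0, 4.0, 4.0]
frame = [1, 1, 0, 1]
budget = 2.0

complete = optimal_inclusion_probs(mu2, cost, budget)
incomplete = optimal_inclusion_probs(mu2, cost, budget, frame)

for name, probs in (("complete", complete), ("incomplete", incomplete)):
    expected_cost = sum(c * p for c, p in zip(cost, probs))
    print(name, [round(p, 4) for p in probs], "expected cost:", round(expected_cost, 4))
```

In this toy population both designs spend the same expected cost, but the incomplete frame gives the excluded reach zero inclusion probability, which is the source of the frame bias discussed on the poster.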