Sampling Perennial Streams: An Application in Model Based Optimal Design

William Coar and F. Jay Breidt
Abstract
Previous sample designs for perennial streams rely on the River Reach File 3 (RF3) classification in the National Hydrography Dataset (NHD) to establish a sampling frame of perennial reaches. Recent studies have provided evidence of misclassification, resulting in an incomplete frame, biased results, and high costs. This application in optimal design investigates the use of auxiliary data to assist in predicting the current status of a stream segment as either perennial or non-perennial. The entire NHD database could then be used as a complete frame. Based on this information, optimal designs for fixed cost are derived for the complete and incomplete frames. Anticipated mean square error under a superpopulation model is then used to compare estimators from the complete and incomplete frames.

Model
Suppose the total of some attribute about perennial streams is
\[ \tau = \sum_{U} Z_i = \sum_{U} Y_i P_i . \]
Assume reach i is truly perennial with probability p(x_i), for known auxiliary information x_i. For Y_i > 0, Z_i > 0 with probability p(x_i).
Define:
\[ Y_i \sim \left(\mu_i(F_i),\ \sigma_i^2(F_i)\right), \qquad P_i \sim \mathrm{Bernoulli}\left(p(x_i)\right), \]
x_i = vector of known covariates
F_i = known RF3 classification (1 if the reach is classified perennial, 0 otherwise)
The total splits into the part covered by the RF3 frame and the part excluded from it:
\[ \tau = \sum_{U} Y_i P_i F_i + \sum_{U} Y_i P_i (1 - F_i) . \]
If \{Z_i\}_{i \in U} are a realization from a superpopulation model \xi, then the anticipated mean squared error of an estimator \hat\tau under sampling design p is defined as
\[ AMSE(\hat\tau) = E_\xi V_p(\hat\tau) + V_\xi E_p(\hat\tau - \tau) + \left[ E_\xi E_p(\hat\tau - \tau) \right]^2 . \]

Optimal Design for Horvitz-Thompson Estimator
For simplicity, assume the Z_i | F_i are independent. The cost of sampling element i is c_i. First order inclusion probabilities that minimize AMSE for a fixed expected cost C are said to be optimal.
Under the complete frame (entire NHD):
\[ \pi_i = \frac{C \sqrt{\mu_{2i} / c_i}}{\sum_{U} \sqrt{\mu_{2i}\, c_i}} \]
Under the incomplete frame (RF3 Classified Perennial):
\[ \pi_i = \frac{C\, F_i \sqrt{\mu_{2i} / c_i}}{\sum_{U} F_i \sqrt{\mu_{2i}\, c_i}} \]
where \mu_{2i} = E[Y_i^2 \mid P_i = 1, x_i]\, p(x_i). In both cases the scaling makes the expected cost \sum_{U} c_i \pi_i equal to C.

Relative Mean Square Error
The AMSE of the biased estimator \hat\tau_b relative to the unbiased estimator \hat\tau_u can be used to investigate gains or losses with particular model and cost structures. Under equal expected costs for both designs,
\[ \frac{AMSE(\hat\tau_b)}{AMSE(\hat\tau_u)} \approx 1 - \frac{1}{(qr+1)^2} - \frac{2qr}{(qr+1)^2} + \text{(frame-bias terms)} \]
where
\[ r = \frac{\sum_{U} F_i \sqrt{\mu_{2i}}}{\sum_{U} (1 - F_i) \sqrt{\mu_{2i}}}, \qquad q^2 = \frac{c_f}{c_{nf}}, \]
and c_f and c_{nf} are the sampling costs of elements inside and outside the RF3 frame. The frame-bias terms carry the contribution of the excluded elements through \sum_{U} (1 - F_i) \mu_{2i} and \left(\sum_{U} (1 - F_i) \mu_i\right)^2, each scaled by
\[ \frac{C}{c_{nf}\, (qr+1)^2 \left( \sum_{U} (1 - F_i) \sqrt{\mu_{2i}} \right)^2} . \]

Example
Let the relative frame size of the incomplete frame as well as the expected cost vary.
[Figure: Relative AMSE for Decreasing ... Panels: Cnf = 1 x Cf, Cnf = 2 x Cf, Cnf = 3 x Cf, Cnf = 4 x Cf. Axes: Expected Cost (0 to 400) and Relative Size of Incomplete Frame (0.80 to 0.95).]
[Figure: Relative AMSE for Increasing Second Moment in the Excluded Frame Elements. Panels: Mu(2)nf = 1 x Mu(2)f, Mu(2)nf = 2 x Mu(2)f, Mu(2)nf = 3 x Mu(2)f, Mu(2)nf = 4 x Mu(2)f. Axes: Expected Cost (0 to 400) and Relative Size of Incomplete Frame (0.80 to 0.95).]

Results
Elimination of the frame bias
•Complete frame assures positive first order inclusion probabilities for every perennial stream, yielding an unbiased estimator.
•Optimal design gives higher inclusion probabilities to truly perennial streams than to non-perennial.
Cost
•Optimal design samples low cost elements with higher probability.
Variability
•Optimal design samples elements with large second moment with higher probability.
Efficiency
Relative efficiency of the biased scheme will vary with cost structures, relative frame sizes, and moment structure.

Future Work
•Additional numerical investigation of AMSE
•Investigate optimal design for regression estimator
•Investigate asymptotic properties of the AMSE ratio under different formulations (fixed frame bias, small frame bias, negligible frame bias).
•Continue work on estimating p(x_i)

This research is funded by U.S. EPA – Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095

DISCLAIMER
The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the authors and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this poster.
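
The square-root form of the optimal inclusion probabilities can be motivated with a short Lagrange-multiplier argument. This is only a sketch, and it assumes Poisson sampling (an assumption made here, not stated on the poster), so that the anticipated design variance of the Horvitz-Thompson estimator reduces to a sum over elements. The multiplier \lambda is introduced for illustration.

```latex
% Assumption: Poisson sampling with independent Z_i, so that
% E_xi V_p(\hat\tau) = \sum_U \mu_{2i} (1/\pi_i - 1).
\begin{align*}
&\text{minimize } \sum_{U} \mu_{2i}\left(\frac{1}{\pi_i} - 1\right)
  \quad \text{subject to} \quad \sum_{U} c_i \pi_i = C, \\
&\mathcal{L} = \sum_{U} \frac{\mu_{2i}}{\pi_i}
  + \lambda \Big( \sum_{U} c_i \pi_i - C \Big), \\
&\frac{\partial \mathcal{L}}{\partial \pi_i}
  = -\frac{\mu_{2i}}{\pi_i^2} + \lambda c_i = 0
  \;\Longrightarrow\; \pi_i = \sqrt{\frac{\mu_{2i}}{\lambda c_i}}, \\
&\sum_{U} c_i \pi_i = C
  \;\Longrightarrow\; \frac{1}{\sqrt{\lambda}} = \frac{C}{\sum_{U} \sqrt{\mu_{2i} c_i}}
  \;\Longrightarrow\; \pi_i = \frac{C \sqrt{\mu_{2i} / c_i}}{\sum_{U} \sqrt{\mu_{2i} c_i}}, \\
&\text{minimum value: } \frac{1}{C}\Big( \sum_{U} \sqrt{\mu_{2i} c_i} \Big)^2 - \sum_{U} \mu_{2i}.
\end{align*}
```

With only two cost levels, c_f inside the RF3 frame and c_{nf} outside it, the leading term of the minimum becomes c_{nf} (qr+1)^2 (\sum_U (1 - F_i) \sqrt{\mu_{2i}})^2 / C, which is the scaling that appears in the relative AMSE.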
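
The comparison between the complete-frame and incomplete-frame designs can be tried numerically. The following is a minimal sketch, assuming Poisson sampling and independent Z_i, so that the anticipated design variance is the sum of mu2_i * (1/pi_i - 1). The function names `optimal_probs` and `amse`, the symbol `mu1` for E[Z_i], and every number in the toy population are hypothetical illustrations, not values from the poster.

```python
import math


def optimal_probs(mu2, cost, budget, in_frame=None):
    """Inclusion probabilities proportional to F_i * sqrt(mu2_i / c_i),
    scaled so that the expected cost sum(c_i * pi_i) equals the budget."""
    if in_frame is None:
        in_frame = [1] * len(mu2)  # complete frame: every element is eligible
    norm = sum(f * math.sqrt(m * c) for f, m, c in zip(in_frame, mu2, cost))
    probs = [budget * f * math.sqrt(m / c) / norm
             for f, m, c in zip(in_frame, mu2, cost)]
    if max(probs) > 1.0:
        raise ValueError("budget too large: some inclusion probabilities exceed 1")
    return probs


def amse(mu1, mu2, probs):
    """Anticipated MSE of the Horvitz-Thompson total.

    Assumes Poisson sampling and independent Z_i (an assumption of this sketch).
    Elements with pi_i == 0 are excluded from the frame and contribute bias.
    """
    design_var = sum(m2 * (1.0 / p - 1.0) for m2, p in zip(mu2, probs) if p > 0)
    excluded = [(m1, m2) for m1, m2, p in zip(mu1, mu2, probs) if p == 0]
    bias_var = sum(m2 - m1 ** 2 for m1, m2 in excluded)   # V_xi of the design bias
    bias_sq = sum(m1 for m1, _ in excluded) ** 2          # squared expected bias
    return design_var + bias_var + bias_sq


# Hypothetical toy population of four reaches; the last one is outside RF3.
mu1 = [1.5, 2.5, 0.8, 1.5]    # E[Z_i]
mu2 = [4.0, 9.0, 1.0, 4.0]    # E[Z_i^2]
cost = [1.0, 1.0, 1.0, 4.0]   # c_i, with the excluded reach four times as costly
rf3 = [1, 1, 1, 0]            # F_i
budget = 1.5                  # expected cost C

probs_u = optimal_probs(mu2, cost, budget)        # complete frame
probs_b = optimal_probs(mu2, cost, budget, rf3)   # incomplete frame
amse_u = amse(mu1, mu2, probs_u)
amse_b = amse(mu1, mu2, probs_b)
print("relative AMSE (biased / unbiased):", amse_b / amse_u)
```

A ratio below 1 in this toy setting means the incomplete-frame design is more efficient despite its bias. Raising `mu1` and `mu2` for the excluded reach, or lowering its cost, moves the ratio in favor of the complete frame.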