Ratio and Regression Sampling, Example

Simple Random Sampling
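$n$ items are measured out of $N$ possible items (sometimes $N$ is infinite).

Sample mean (sum over all $n$ items, then divide by $n$):
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Sample variance (for $\sum y_i^2$, square each value and then add them):
$$s_y^2 = \frac{\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 / n}{n-1}$$

Variance of the mean:
$$s_{\bar{y}}^2 = \frac{s_y^2}{n}\left(\frac{N-n}{N}\right) \quad \text{without replacement}$$
$$s_{\bar{y}}^2 = \frac{s_y^2}{n} \quad \text{with replacement or when } N \text{ is very large}$$

Total and its variance:
$$\hat{y}_T = N\,\bar{y} \qquad s_{\hat{y}_T}^2 = N^2\, s_{\bar{y}}^2$$
(When $y$ is already on a per ha basis, e.g., m$^3$ per ha, use $A$ (area in ha) instead of $N$ in calculating $\hat{y}_T$ and $s_{\hat{y}_T}^2$.)

Coefficient of variation:
$$CV = \frac{s_y}{\bar{y}} \times 100$$

95% Confidence Intervals for the mean of the population:
$$\bar{y} \pm t_{n-1,\,1-\alpha/2}\; s_{\bar{y}}$$
95% Confidence Intervals for the total of the population:
$$\hat{y}_T \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{y}_T}$$

Percent error:
$$\text{Percent error achieved} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{n-1,\,1-\alpha/2}\; s_{\bar{y}}}{\bar{y}} \times 100 \quad \text{for a completed survey}$$
$$\text{Desired percent error (PE)} = \frac{AE}{\bar{y}} \times 100 \quad \text{when AE is given for a new survey}$$

Sample Size
Without replacement:
$$n = \frac{1}{\dfrac{AE^2}{t^2\, s_y^2} + \dfrac{1}{N}} = \frac{1}{\dfrac{PE^2}{t^2\, CV^2} + \dfrac{1}{N}}$$
With replacement or very large $N$:
$$n = \frac{t^2\, s_y^2}{AE^2} = \frac{t^2\, CV^2}{PE^2}$$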
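The simple random sampling formulas above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the t-value is assumed to be looked up by the caller (for $n-1$ degrees of freedom), and rounding the sample size up to a whole number is an added convention.

```python
import math

def srs_summary(y, N=None, t=2.0):
    """Mean, variance, standard error, and percent error for a simple random sample.

    N=None means sampling with replacement or a very large population.
    t is assumed to come from a t-table for n-1 degrees of freedom.
    """
    n = len(y)
    mean = sum(y) / n
    var = (sum(v * v for v in y) - sum(y) ** 2 / n) / (n - 1)
    fpc = 1.0 if N is None else (N - n) / N   # finite population correction
    se = math.sqrt(var / n * fpc)
    return {"mean": mean, "var": var, "se": se,
            "half_width": t * se, "pe": 100 * t * se / mean}

def srs_sample_size(var, AE, t=2.0, N=None):
    """Sample size for allowable error AE (same units as the mean), rounded up."""
    if N is None:
        return math.ceil(t * t * var / (AE * AE))
    return math.ceil(1.0 / (AE * AE / (t * t * var) + 1.0 / N))
```

For example, `srs_summary([2, 4, 6, 8])` gives a mean of 5 and a sample variance of about 6.67; the confidence interval is then `mean ± half_width`.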
Systematic Sampling
Strip Sampling
• Each strip is one observation: the value for each strip is $y_i$ in the simple random sampling formulas, and $n$ is the number of strips selected for measurement.
• Randomly select the first strip.
• Intensity is calculated as:
$$I = \frac{n\,a}{A}$$
• The number of strips needed can be calculated from the intensity desired, the size of the area ($A$), and the area of each strip ($a$).
Line Plot Sampling (or grid sampling)
• Each plot is one observation: the value for each plot is $y_i$ in the simple random sampling formulas, and $n$ is the number of plots.

Sample size: 1) $n = \dfrac{I\,A}{a}$, or 2) use the simple random sampling equations for a specified AE or PE, or 3) the sample size is limited by cost.

Spacing:
$$d\,(\text{m}) = \sqrt{\frac{A\,(\text{m}^2)}{n}} \quad \text{or} \quad L\,(\text{m}) = \frac{A\,(\text{m}^2)}{n \times B\,(\text{m})} \quad \text{or} \quad B\,(\text{m}) = \frac{A\,(\text{m}^2)}{n \times L\,(\text{m})}$$
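A short sketch of the intensity and spacing calculations follows. The function names are hypothetical, all areas are assumed to be in m$^2$, and rounding the number of plots up is an added convention.

```python
import math

def plots_for_intensity(intensity, area_m2, plot_area_m2):
    """n = I * A / a, rounded up to a whole number of plots (or strips)."""
    return math.ceil(intensity * area_m2 / plot_area_m2)

def square_grid_spacing(area_m2, n):
    """d = sqrt(A / n): distance in metres between plots on a square grid."""
    return math.sqrt(area_m2 / n)

def line_spacing(area_m2, n, plot_spacing_m):
    """L = A / (n * B): distance between lines when plots are B metres apart."""
    return area_m2 / (n * plot_spacing_m)
```

For a 10 ha area (100000 m$^2$) sampled at 5% intensity with 400 m$^2$ plots, `plots_for_intensity(0.05, 100000, 400)` returns 13 plots.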
Line Intersect Sampling
For each line $i$ with length $L$ in metres, $d_{ij}$ in cm, and $\theta_{ij}$ in degrees, where $m_i$ is the number of pieces crossed by the line:
$$y_i\,(\text{m}^3/\text{ha}) = \frac{\pi^2}{8L} \sum_{j=1}^{m_i} \frac{d_{ij}^2}{\cos(\theta_{ij})}$$
$$y_i\,(\text{stems}/\text{ha}) = \frac{10000\,\pi}{2L} \sum_{j=1}^{m_i} \frac{1}{l_{ij}\,\cos(\theta_{ij})}$$
The $y_i$ are then used in the simple random sampling equations to get an estimate of the mean, confidence intervals for this, and an estimate of the total with an associated confidence interval. The sample size, $n$, is the number of lines.
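As an illustration, the per-line values could be computed as below. The function names are hypothetical, and the piece lengths $l_{ij}$ are assumed to be in metres, which the sheet does not state.

```python
import math

def lis_volume_per_ha(line_length_m, diameters_cm, angles_deg):
    """y_i in m^3/ha: pi^2 / (8 L) * sum(d^2 / cos(theta))."""
    total = sum(d * d / math.cos(math.radians(a))
                for d, a in zip(diameters_cm, angles_deg))
    return math.pi ** 2 / (8 * line_length_m) * total

def lis_stems_per_ha(line_length_m, lengths_m, angles_deg):
    """y_i in stems/ha: 10000 pi / (2 L) * sum(1 / (l * cos(theta))).

    Piece lengths in metres are an assumption of this sketch.
    """
    total = sum(1.0 / (l * math.cos(math.radians(a)))
                for l, a in zip(lengths_m, angles_deg))
    return 10000 * math.pi / (2 * line_length_m) * total
```

Each line yields one $y_i$; the list of $y_i$ values over all lines is then treated as a simple random sample of size $n$ (the number of lines).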
Sampling for Proportions of Observations that have a particular characteristic
Single Observations, No stratification
$$\hat{p} = \frac{\text{number having the characteristic}}{\text{number sampled}} = \frac{\sum_{i=1}^{n} y_i}{n}$$
where $y_i = 1$ if the observation has the characteristic and 0 otherwise, and $N$ is the total number of observations in the population.

Confidence intervals:
One way: normal approximation when $n\hat{p} > 5$ and $n\hat{q} > 5$:
$$\hat{q} = 1 - \hat{p}$$
$$s_{\hat{p}} = \sqrt{\frac{\hat{p}\,\hat{q}}{n-1}\left(1 - \frac{n}{N}\right)}$$
$$\hat{p} \pm \left( z_{1-\alpha/2}\; s_{\hat{p}} + \frac{1}{2n} \right)$$
For example, for a 95% confidence interval, z = 1.96. For a 90% confidence interval, z = 1.64.
Another way: for 95% confidence intervals, use the binomial graph.
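A minimal sketch of the normal-approximation interval for a proportion, including the $1/(2n)$ continuity term. The function name is hypothetical, and `N=None` is used here to mean a very large population.

```python
import math

def proportion_ci(successes, n, N=None, z=1.96):
    """Return (p_hat, s_p, (lower, upper)) using the normal approximation."""
    p = successes / n
    q = 1 - p
    fpc = 1.0 if N is None else 1 - n / N   # finite population correction
    s_p = math.sqrt(p * q / (n - 1) * fpc)
    half = z * s_p + 1 / (2 * n)            # includes the continuity term
    return p, s_p, (p - half, p + half)
```

With 30 of 100 sampled items having the characteristic, the 95% interval is roughly 0.205 to 0.395.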
Testing Hypotheses
One proportion: Is the proportion more than (some value)? (Note: could also test whether the proportion is less than a value.)
$$H_0: p \le (\text{some value}) \qquad H_1: p > (\text{some value})$$
$$z = \frac{\hat{p} - (\text{some value})}{s_{\hat{p}}}$$
Compare to z critical from a normal distribution table. If z calculated is greater than z in the table, then reject $H_0$. For example: z critical for $\alpha = 0.05$ is 1.64.
Comparing proportions from two populations:
$$H_0: p_1 \le p_2 \qquad H_1: p_1 > p_2$$
$$s^2_{\hat{p}_1} = \frac{\hat{p}_1\,\hat{q}_1}{n_1 - 1}\left(\frac{N_1 - n_1}{N_1}\right) \qquad s^2_{\hat{p}_2} = \frac{\hat{p}_2\,\hat{q}_2}{n_2 - 1}\left(\frac{N_2 - n_2}{N_2}\right)$$
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{s^2_{\hat{p}_1} + s^2_{\hat{p}_2}}}$$
Compare to z critical from a normal distribution table. If z calculated is greater than z in the table, then reject $H_0$. For example: z critical for $\alpha = 0.05$ is 1.64.
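The two-population comparison can be sketched as follows. The function names are hypothetical, and the test is assumed to be one-sided (is $p_1$ larger than $p_2$?), matching the critical value of 1.64.

```python
import math

def prop_variance(p, n, N=None):
    """Variance of a sample proportion, with optional finite population correction."""
    fpc = 1.0 if N is None else (N - n) / N
    return p * (1 - p) / (n - 1) * fpc

def two_proportion_z(p1, n1, p2, n2, N1=None, N2=None):
    """z statistic for comparing proportions from two independent samples."""
    return (p1 - p2) / math.sqrt(prop_variance(p1, n1, N1)
                                 + prop_variance(p2, n2, N2))
```

For $\hat{p}_1 = 0.4$ and $\hat{p}_2 = 0.3$ with 101 observations each, z is about 1.49, which is below 1.64, so $H_0$ would not be rejected at $\alpha = 0.05$.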
Sample size for estimating one proportion:
$$n = \frac{1}{\dfrac{AE^2}{z^2\,\hat{p}\,\hat{q}} + \dfrac{1}{N}}$$
Proportions in clusters (e.g., plots, trucks, houses, or any group of items)
Two types of analysis:
1. Give equal weight to the proportion from each cluster. Calculate a proportion for each cluster, and
treat each proportion as yi in simple random sampling equations
2. Give plots with more items more weight. Equations are then based on ratio of means sampling:
$N$ = number of clusters in the population
$n$ = number of clusters sampled
$m_i$ = number of items in cluster $i$
$a_i$ = number of items in cluster $i$ that have the characteristic (e.g., dead, or whatever characteristic)

The average number of items per cluster:
$$\bar{m} = \frac{\sum_{i=1}^{n} m_i}{n}$$
The estimated proportion:
$$\hat{p} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i}$$
Need also: $\sum_{i=1}^{n} a_i^2$, $\sum_{i=1}^{n} m_i^2$, and $\sum_{i=1}^{n} a_i m_i$.
$$s_{\hat{p}}^2 = \left(\frac{N-n}{N}\right) \frac{1}{n\,\bar{m}^2} \left( \frac{\sum a_i^2 - 2\hat{p} \sum a_i m_i + \hat{p}^2 \sum m_i^2}{n-1} \right)$$
95% Confidence Intervals for the proportion:
$$\hat{p} \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{p}}$$
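A sketch of the second (ratio of means) analysis for clustered proportions. The function name is hypothetical, and `N=None` is used to mean that the number of clusters in the population is very large.

```python
import math

def cluster_proportion(a, m, N=None):
    """Ratio-of-means proportion and its standard error.

    a[i] = items with the characteristic in cluster i, m[i] = items in cluster i.
    """
    n = len(m)
    m_bar = sum(m) / n
    p = sum(a) / sum(m)
    ss = (sum(ai * ai for ai in a)
          - 2 * p * sum(ai * mi for ai, mi in zip(a, m))
          + p * p * sum(mi * mi for mi in m))
    fpc = 1.0 if N is None else (N - n) / N
    var_p = fpc / (n * m_bar ** 2) * ss / (n - 1)
    return p, math.sqrt(max(var_p, 0.0))   # guard against tiny negative rounding
```

When every cluster has the same number of items, this reduces to the first analysis (treating each cluster proportion as $y_i$), which is a convenient check.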
Stratified Random Sampling
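For each stratum $h$ (use the simple random sampling equations for data from the stratum):
$n_h$ items are measured out of $N_h$ possible items (sometimes $N_h$ is infinite).

Stratum mean (sum over all $n_h$ items):
$$\bar{y}_h = \frac{\sum_{i=1}^{n_h} y_{ih}}{n_h}$$
Stratum variance (for $\sum y_{ih}^2$, square each value and sum):
$$s_{y_h}^2 = \frac{\sum_{i=1}^{n_h} y_{ih}^2 - \left(\sum_{i=1}^{n_h} y_{ih}\right)^2 / n_h}{n_h - 1}$$
Variance of the stratum mean:
$$s_{\bar{y}_h}^2 = \frac{s_{y_h}^2}{n_h}\left(\frac{N_h - n_h}{N_h}\right) \quad \text{or} \quad s_{\bar{y}_h}^2 = \frac{s_{y_h}^2}{n_h}$$
Stratum total and its variance:
$$\hat{y}_{Th} = N_h\,\bar{y}_h \qquad s_{\hat{y}_{Th}}^2 = N_h^2\, s_{\bar{y}_h}^2$$
$$CV_h = \frac{s_{y_h}}{\bar{y}_h} \times 100$$
(When $\bar{y}_h$ is already on a per ha basis, e.g., m$^3$/ha, use $A_h$ (area in ha) instead of $N_h$ in calculating $\hat{y}_{Th}$ and $s_{\hat{y}_{Th}}^2$.)

95% Confidence Intervals for the mean for Stratum $h$:
$$\bar{y}_h \pm t_{n_h-1,\,1-\alpha/2}\; s_{\bar{y}_h}$$
95% Confidence Intervals for the total for Stratum $h$:
$$\hat{y}_{Th} \pm t_{n_h-1,\,1-\alpha/2}\; s_{\hat{y}_{Th}}$$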
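Overall (putting strata together to get estimates for the entire population):
$$W_h = \frac{N_h}{N} \quad \text{or} \quad W_h = \frac{A_h}{A}$$
where $W_h$ is the relative size of Stratum $h$ and $L$ is the number of strata.
$$\bar{y}_{ST} = \sum_{h=1}^{L} W_h\,\bar{y}_h = \sum_{h=1}^{L} \frac{N_h}{N}\,\bar{y}_h \qquad \hat{y}_{TST} = N\,\bar{y}_{ST}$$
$$s_{\bar{y}_{ST}}^2 = \sum_{h=1}^{L} W_h^2\, s_{\bar{y}_h}^2 = \sum_{h=1}^{L} \frac{N_h^2}{N^2}\, s_{\bar{y}_h}^2 \qquad s_{\hat{y}_{TST}}^2 = N^2\, s_{\bar{y}_{ST}}^2$$
(If $\bar{y}_{ST}$ is expressed on a per ha basis, e.g., m$^3$/ha, then replace $N$ by $A$ (area for the population) to obtain $\hat{y}_{TST}$ and $s_{\hat{y}_{TST}}^2$.)

95% Confidence Intervals for the mean for the population:
$$\bar{y}_{ST} \pm t_{EDF}\; s_{\bar{y}_{ST}}$$
95% Confidence Intervals for the total for the population:
$$\hat{y}_{TST} \pm t_{EDF}\; s_{\hat{y}_{TST}}$$
where EDF (effective degrees of freedom) is somewhere between the smallest $n_h - 1$ and the sum of the $n_h - 1$ for all strata combined. EDF can be calculated by:
$$EDF = \frac{\left(s_{\bar{y}_{ST}}^2\right)^2}{\sum_{h=1}^{L} \dfrac{\left(W_h^2\, s_{\bar{y}_h}^2\right)^2}{n_h - 1}} = \frac{\left(s_{\bar{y}_{ST}}^2\right)^2}{\sum_{h=1}^{L} \dfrac{\left(\frac{N_h^2}{N^2}\, s_{\bar{y}_h}^2\right)^2}{n_h - 1}}$$
$$\text{Percent error achieved} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{EDF}\; s_{\bar{y}_{ST}}}{\bar{y}_{ST}} \times 100 \quad \text{for a completed survey}$$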
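The stratified estimates and the effective degrees of freedom can be sketched as follows. The function name and the input layout (a list of dictionaries with hypothetical keys `"N"` and `"y"`) are assumptions for illustration, and sampling without replacement within each stratum is assumed.

```python
import math

def stratified_estimate(strata):
    """Return (stratified mean, its standard error, EDF).

    strata: list of {"N": stratum size N_h, "y": list of sampled values}.
    """
    N = sum(s["N"] for s in strata)
    mean_st = var_st = edf_den = 0.0
    for s in strata:
        y, N_h = s["y"], s["N"]
        n_h = len(y)
        W_h = N_h / N
        mean_h = sum(y) / n_h
        var_h = (sum(v * v for v in y) - sum(y) ** 2 / n_h) / (n_h - 1)
        var_mean_h = var_h / n_h * (N_h - n_h) / N_h
        mean_st += W_h * mean_h
        var_st += W_h ** 2 * var_mean_h
        edf_den += (W_h ** 2 * var_mean_h) ** 2 / (n_h - 1)
    edf = var_st ** 2 / edf_den
    return mean_st, math.sqrt(var_st), edf
```

The EDF returned is generally not an integer; how it is rounded before looking up $t_{EDF}$ is left to the user.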
Sample Size Calculations:
In the formulas below, $s_h^2$ is the sample variance of $y$ within stratum $h$.

Equal allocation:
$$n_h = n / L$$
where $n$ is calculated as:
Without replacement:
$$n = \frac{L\, t^2 \sum_{h=1}^{L} W_h^2\, s_h^2}{AE^2 + t^2\, \dfrac{\sum_{h=1}^{L} W_h\, s_h^2}{N}} = \frac{L\, t^2 \sum_{h=1}^{L} N_h^2\, s_h^2}{N^2\, AE^2 + t^2 \sum_{h=1}^{L} N_h\, s_h^2}$$
With replacement or when $N$ is very large:
$$n = \frac{L\, t^2 \sum_{h=1}^{L} W_h^2\, s_h^2}{AE^2} = \frac{L\, t^2 \sum_{h=1}^{L} N_h^2\, s_h^2}{N^2\, AE^2}$$

Proportional allocation:
$$n_h = n\,\frac{N_h}{N} = n\, W_h$$
where $n$ is calculated as:
Without replacement:
$$n = \frac{t^2 \sum_{h=1}^{L} W_h\, s_h^2}{AE^2 + t^2\, \dfrac{\sum_{h=1}^{L} W_h\, s_h^2}{N}} = \frac{N\, t^2 \sum_{h=1}^{L} N_h\, s_h^2}{N^2\, AE^2 + t^2 \sum_{h=1}^{L} N_h\, s_h^2}$$
With replacement or when $N$ is very large:
$$n = \frac{t^2 \sum_{h=1}^{L} W_h\, s_h^2}{AE^2} = \frac{N\, t^2 \sum_{h=1}^{L} N_h\, s_h^2}{N^2\, AE^2}$$
Neyman allocation:
$$n_h = n\, \frac{W_h\, s_h}{\sum_{h=1}^{L} W_h\, s_h}$$
where $n$ is calculated as:
Without replacement:
$$n = \frac{t^2 \left(\sum_{h=1}^{L} W_h\, s_h\right)^2}{AE^2 + t^2\, \dfrac{\sum_{h=1}^{L} W_h\, s_h^2}{N}} = \frac{t^2 \left(\sum_{h=1}^{L} N_h\, s_h\right)^2}{N^2\, AE^2 + t^2 \sum_{h=1}^{L} N_h\, s_h^2}$$
With replacement or when $N$ is very large:
$$n = \frac{t^2 \left(\sum_{h=1}^{L} W_h\, s_h\right)^2}{AE^2} = \frac{t^2 \left(\sum_{h=1}^{L} N_h\, s_h\right)^2}{N^2\, AE^2}$$
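To compare proportional and Neyman allocation numerically, the total sample sizes can be sketched as below. The function names are hypothetical, `s2` holds the stratum variances $s_h^2$, `t` is supplied by the caller, and results are left unrounded.

```python
import math

def total_n_proportional(W, s2, AE, t=2.0, N=None):
    """Total n under proportional allocation (N=None: very large population)."""
    num = t * t * sum(w * v for w, v in zip(W, s2))
    return num / (AE * AE + (0.0 if N is None else num / N))

def total_n_neyman(W, s2, AE, t=2.0, N=None):
    """Total n under Neyman allocation."""
    num = t * t * sum(w * math.sqrt(v) for w, v in zip(W, s2)) ** 2
    fpc_term = 0.0 if N is None else t * t * sum(w * v for w, v in zip(W, s2)) / N
    return num / (AE * AE + fpc_term)

def allocate_neyman(n, W, s2):
    """Split n among strata in proportion to W_h * s_h."""
    weights = [w * math.sqrt(v) for w, v in zip(W, s2)]
    total = sum(weights)
    return [n * w / total for w in weights]
```

With two equal-sized strata and variances 1 and 9, an allowable error of 1, and t = 2, proportional allocation needs 20 observations while Neyman allocation needs 16, split as 4 and 12.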
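Optimum allocation ($c_h$ is the cost per sampling unit in stratum $h$):
$$n_h = n\, \frac{W_h\, s_h / \sqrt{c_h}}{\sum_{h=1}^{L} W_h\, s_h / \sqrt{c_h}}$$
where $n$ is calculated as:
Without replacement:
$$n = \frac{t^2 \left(\sum_{h=1}^{L} W_h\, s_h \sqrt{c_h}\right) \left(\sum_{h=1}^{L} \dfrac{W_h\, s_h}{\sqrt{c_h}}\right)}{AE^2 + t^2\, \dfrac{\sum_{h=1}^{L} W_h\, s_h^2}{N}} = \frac{t^2 \left(\sum_{h=1}^{L} N_h\, s_h \sqrt{c_h}\right) \left(\sum_{h=1}^{L} \dfrac{N_h\, s_h}{\sqrt{c_h}}\right)}{N^2\, AE^2 + t^2 \sum_{h=1}^{L} N_h\, s_h^2}$$
With replacement or when $N$ is very large:
$$n = \frac{t^2 \left(\sum_{h=1}^{L} W_h\, s_h \sqrt{c_h}\right) \left(\sum_{h=1}^{L} \dfrac{W_h\, s_h}{\sqrt{c_h}}\right)}{AE^2} = \frac{t^2 \left(\sum_{h=1}^{L} N_h\, s_h \sqrt{c_h}\right) \left(\sum_{h=1}^{L} \dfrac{N_h\, s_h}{\sqrt{c_h}}\right)}{N^2\, AE^2}$$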
Ratio/Regression Sampling
$n$ = number of items with both $x$ and $y$ measures
$N$ = number of items in the population
$x$ is the auxiliary variable
The population mean $\mu_x$ (and total) of $x$ is known, as $x$ is measured on all $N$ items.

Using Mean of Ratios
$$r_i = y_i / x_i \qquad \bar{r} = \frac{\sum_{i=1}^{n} r_i}{n}$$
$$s_r^2 = \frac{\sum_{i=1}^{n} r_i^2 - \left(\sum_{i=1}^{n} r_i\right)^2 / n}{n-1}$$
$$s_{\bar{r}}^2 = \frac{s_r^2}{n}\left(\frac{N-n}{N}\right) \quad \text{or} \quad s_{\bar{r}}^2 = \frac{s_r^2}{n}$$
$$\bar{y}_r = \bar{r}\,\mu_x \qquad s_{\bar{y}_r} = s_{\bar{r}}\,\mu_x$$
$$\hat{y}_{Tr} = N\,\bar{y}_r \qquad s_{\hat{y}_{Tr}} = N\, s_{\bar{y}_r}$$
$$CV = \frac{s_r}{\bar{r}} \times 100$$
(When $\bar{y}_r$ is already on a per ha basis, e.g., m$^3$/ha, use $A$ (area in ha) instead of $N$ in calculating $\hat{y}_{Tr}$ and $s_{\hat{y}_{Tr}}$.)

95% Confidence Intervals for the mean of the population:
$$\bar{y}_r \pm t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_r}$$
95% Confidence Intervals for the total of the population:
$$\hat{y}_{Tr} \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{y}_{Tr}}$$
$$\text{Achieved percent error} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_r}}{\bar{y}_r} \times 100$$

Sample size without replacement:
$$n = \frac{1}{\dfrac{PE^2}{t^2\, CV^2} + \dfrac{1}{N}}$$
With replacement or very large $N$:
$$n = \frac{t^2\, CV^2}{PE^2}$$
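A minimal sketch of the mean of ratios estimator, assuming the population mean of $x$ is supplied as `mu_x`. The function name is hypothetical.

```python
import math

def mean_of_ratios(x, y, mu_x, N=None):
    """Return (estimated mean of y, its standard error) from per-item ratios."""
    n = len(x)
    r = [yi / xi for xi, yi in zip(x, y)]
    r_bar = sum(r) / n
    var_r = (sum(v * v for v in r) - sum(r) ** 2 / n) / (n - 1)
    fpc = 1.0 if N is None else (N - n) / N
    se_r = math.sqrt(max(var_r, 0.0) / n * fpc)   # guard against rounding below zero
    return r_bar * mu_x, se_r * mu_x
```

For x = [2, 4, 5] and y = [4, 12, 10] the ratios are 2, 3, and 2, so with $\mu_x = 3$ the estimated mean is 7 with a standard error of 1.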
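Using Ratio of Means
$$\hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}$$
$$s_{\hat{R}}^2 = \frac{1}{\mu_x^2} \left( \frac{\sum y_i^2 - 2\hat{R} \sum x_i y_i + \hat{R}^2 \sum x_i^2}{n-1} \right) \frac{1}{n} \left(\frac{N-n}{N}\right)$$
$$\bar{y}_R = \hat{R}\,\mu_x \qquad s_{\bar{y}_R} = s_{\hat{R}}\,\mu_x$$
$$\hat{y}_{TR} = N\,\bar{y}_R \qquad s_{\hat{y}_{TR}} = N\, s_{\bar{y}_R}$$
$$CV = \frac{s_1}{\bar{y}_R} \times 100 = \frac{\sqrt{\left(\sum y_i^2 - 2\hat{R} \sum x_i y_i + \hat{R}^2 \sum x_i^2\right) / (n-1)}}{\bar{y}_R} \times 100$$
(When $\bar{y}_R$ is already on a per ha basis, e.g., m$^3$/ha, use $A$ (area in ha) instead of $N$ in calculating $\hat{y}_{TR}$ and $s_{\hat{y}_{TR}}$.)

95% Confidence Intervals for the mean of the population:
$$\bar{y}_R \pm t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_R}$$
95% Confidence Intervals for the total of the population:
$$\hat{y}_{TR} \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{y}_{TR}}$$
$$\text{Percent error} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_R}}{\bar{y}_R} \times 100$$

Sample Size
Without replacement:
$$n = \frac{1}{\dfrac{AE^2}{t^2\, s_1^2} + \dfrac{1}{N}} = \frac{1}{\dfrac{PE^2}{t^2\, CV^2} + \dfrac{1}{N}}$$
With replacement or very large $N$:
$$n = \frac{t^2\, s_1^2}{AE^2} = \frac{t^2\, CV^2}{PE^2}$$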
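The ratio of means estimator can be sketched as follows. The function name is hypothetical, and `mu_x` is the known population mean of $x$.

```python
import math

def ratio_of_means(x, y, mu_x, N=None):
    """Return (estimated mean of y, its standard error, CV in percent)."""
    n = len(x)
    R = sum(y) / sum(x)
    ss = (sum(yi * yi for yi in y)
          - 2 * R * sum(xi * yi for xi, yi in zip(x, y))
          + R * R * sum(xi * xi for xi in x))
    s1_sq = max(ss, 0.0) / (n - 1)          # guard against rounding below zero
    fpc = 1.0 if N is None else (N - n) / N
    var_R = s1_sq / (n * mu_x ** 2) * fpc
    y_R = R * mu_x
    return y_R, math.sqrt(var_R) * mu_x, 100 * math.sqrt(s1_sq) / y_R
```

For x = [1, 2, 3], y = [2, 3, 7], and $\mu_x = 2$, the ratio is 2, the estimated mean is 4, and the CV is 25%.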
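Using Regression
Need: $\sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} y_i^2$, $\bar{y}$, $\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} x_i^2$, $\bar{x}$, and $\sum_{i=1}^{n} x_i y_i$.
$$SPXY = \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right) / n$$
$$SSX = \sum x_i^2 - \left(\sum x_i\right)^2 / n$$
$$SSY = \sum y_i^2 - \left(\sum y_i\right)^2 / n$$
then:
$$\text{slope: } b_1 = \frac{SPXY}{SSX} \qquad \text{intercept: } b_0 = \bar{y} - b_1\,\bar{x}$$
$$\bar{y}_{lr} = b_0 + b_1\,\mu_x \qquad \hat{y}_{Tlr} = N\,\bar{y}_{lr}$$
$$s_{\bar{y}_{lr}}^2 = SE_E^2 \left( \frac{1}{n} + \frac{(\mu_x - \bar{x})^2}{SSX} \right) \left(\frac{N-n}{N}\right) \qquad s_{\hat{y}_{Tlr}} = N\, s_{\bar{y}_{lr}}$$
$$SE_E^2 = \frac{SS_{RES}}{n-2} \qquad SS_{RES} = SSY - b_1\, SPXY$$
(When $\bar{y}_{lr}$ is already on a per ha basis, e.g., m$^3$/ha, use $A$ (area in ha) instead of $N$ in calculating $\hat{y}_{Tlr}$ and $s_{\hat{y}_{Tlr}}$.)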
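The regression estimator can be sketched from the sums listed above. The function name is hypothetical, and `mu_x` is the known population mean of $x$.

```python
import math

def regression_estimate(x, y, mu_x, N=None):
    """Return (estimated mean of y, its standard error, b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    spxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ssx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    ssy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    b1 = spxy / ssx
    b0 = y_bar - b1 * x_bar
    ss_res = ssy - b1 * spxy
    see_sq = max(ss_res, 0.0) / (n - 2)     # squared standard error of estimate
    fpc = 1.0 if N is None else (N - n) / N
    var_mean = see_sq * (1 / n + (mu_x - x_bar) ** 2 / ssx) * fpc
    return b0 + b1 * mu_x, math.sqrt(var_mean), b0, b1
```

For x = [1, 2, 3, 4] and y = [2, 4, 5, 9], the slope is 2.2 and the intercept is -0.5, so with $\mu_x = 3$ the estimated mean is 6.1.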
95% Confidence Intervals for the mean of the population:
$$\bar{y}_{lr} \pm t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_{lr}}$$
95% Confidence Intervals for the total of the population:
$$\hat{y}_{Tlr} \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{y}_{Tlr}}$$
$$\text{Achieved percent error} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_{lr}}}{\bar{y}_{lr}} \times 100$$
Sample Size: not covered in FRST 339.
Double Sampling
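Phase 1: Large sample for $x$ ($n'$ observations on $x$)
$$\bar{x}' = \frac{\sum_{i=1}^{n'} x_i}{n'} \qquad s_{x'}^2 = \frac{\sum_{i=1}^{n'} x_i^2 - \left(\sum_{i=1}^{n'} x_i\right)^2 / n'}{n' - 1}$$
Phase 2: Smaller sample, measure $y$ also ($n$ observations on $x$ and on $y$)
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$s_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1} \qquad s_x^2 = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n-1}$$
$$s_{xy} = \frac{\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right) / n}{n-1}$$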
Double Sampling: Using ratio of means (the relationship between $y$ and $x$ goes through zero):
$$\hat{R}_{2P} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}$$
$$\bar{y}_{2P} = \hat{R}_{2P}\,\bar{x}' \qquad \hat{y}_{T2P} = N\,\bar{y}_{2P}$$
$$s_{\bar{y}_{2P}}^2 = \frac{s_y^2 + \hat{R}_{2P}^2\, s_x^2 - 2\hat{R}_{2P}\, s_{xy}}{n} + \frac{2\hat{R}_{2P}\, s_{xy} - \hat{R}_{2P}^2\, s_{x'}^2}{n'} - \frac{s_y^2}{N}$$
NOTE: the last term drops out if $N$ is very large or if sampling is with replacement. Can use $s_x^2$ (from Phase 2) instead of $s_{x'}^2$ (from Phase 1).
$$s_{\hat{y}_{T2P}}^2 = N^2\, s_{\bar{y}_{2P}}^2$$
(When $\bar{y}_{2P}$ is already on a per ha basis, e.g., m$^3$/ha, use $A$ (area in ha) instead of $N$ in calculating $\hat{y}_{T2P}$ and $s_{\hat{y}_{T2P}}$.)

95% Confidence Intervals for the mean for the population:
$$\bar{y}_{2P} \pm t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_{2P}}$$
95% Confidence Intervals for the total for the population:
$$\hat{y}_{T2P} \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{y}_{T2P}}$$
$$\text{Percent error achieved} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_{2P}}}{\bar{y}_{2P}} \times 100$$
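A sketch of the double sampling ratio of means estimator follows. The function name is hypothetical; it returns the variance rather than the standard error, because with small samples the estimated variance can come out negative when the Phase 1 and Phase 2 variances of $x$ disagree strongly.

```python
def double_sampling_ratio(x_phase1, x, y, N=None):
    """Return (estimated mean of y, estimated variance of that mean).

    x_phase1: the n' Phase 1 values of x; x, y: the n paired Phase 2 values.
    """
    n1, n = len(x_phase1), len(x)
    x1_bar = sum(x_phase1) / n1
    var_x1 = (sum(v * v for v in x_phase1) - sum(x_phase1) ** 2 / n1) / (n1 - 1)
    var_y = (sum(v * v for v in y) - sum(y) ** 2 / n) / (n - 1)
    var_x = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)
    s_xy = (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)
    R = sum(y) / sum(x)
    var_mean = ((var_y + R * R * var_x - 2 * R * s_xy) / n
                + (2 * R * s_xy - R * R * var_x1) / n1)
    if N is not None:
        var_mean -= var_y / N               # last term drops out for very large N
    return R * x1_bar, var_mean
```

The standard error is the square root of the returned variance, provided that variance is positive.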
Double Sampling: Using regression (the relationship between $y$ and $x$ can go through something other than zero)
$$b = \frac{s_{xy}}{s_x^2}$$
where $b$ is the estimated slope between $y$ and $x$.
$$\bar{y}_{lr2P} = \bar{y} - b\,(\bar{x} - \bar{x}')$$
$$\hat{y}_{Tlr2P} = N\,\bar{y}_{lr2P}$$
Variance of the mean, $s^2_{\bar{y}_{lr2P}}$:
2
2
2
 n  1   s y  s xy  / s x


n
 n  2  
2
 


  1  1  1    b 2  s x 

 
n n' 
n' 


$$s^2_{\hat{y}_{Tlr2P}} = N^2\, s^2_{\bar{y}_{lr2P}}$$
(When $\bar{y}_{lr2P}$ is already on a per ha basis, e.g., m$^3$/ha, use $A$ (area in ha) instead of $N$ in calculating $\hat{y}_{Tlr2P}$ and $s^2_{\hat{y}_{Tlr2P}}$.)

95% Confidence Intervals for the mean of the population:
$$\bar{y}_{lr2P} \pm t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_{lr2P}}$$
95% Confidence Intervals for the total of the population:
$$\hat{y}_{Tlr2P} \pm t_{n-1,\,1-\alpha/2}\; s_{\hat{y}_{Tlr2P}}$$
$$\text{Achieved percent error} = \frac{\text{CI width}}{\text{est. mean}} \times 100 = \frac{t_{n-1,\,1-\alpha/2}\; s_{\bar{y}_{lr2P}}}{\bar{y}_{lr2P}} \times 100$$