Ratio and Regression Sampling, Example

Simple Random Sampling n items measured out of N possible items sometimes N is inf inite  n y y i 1 i n  y sum over all n n i i 1  y  square each value and sy then add them 2 i 1 2 i y  sy  2 sy 2 i   y i  / n n items  2 2 sy n 1 2 sy  N  n     without replacement ; n  N  2 with replacement or when N is very l arg e n yT  N  y s yT  N 2  s y 2 2 ( when y is already on a per ha basis (e.g ., m 3 per ha, then use A area in ha  2 instead of N in calculatin g yT and s yT ) CV  sy y  100 95% Confidence Intervals for the mean of the population : y  /  t n 1,1 / 2  s y 95% Confidence Intervals for the total of the population : yT  /  t n 1,1 / 2  s yT Percent error achieved  t n 1,1 / 2  s y CI width  100   100 for a completed survey est . mean y AE  100 when AE is given for a new survey y without replacement : with replacement or very l arg e N : Desired Percent error ( PE )  Sample Size n 1 AE 2 1  N t 2  sy2 1  1 PE 2  2 N t  CV 2 n t 2  sy AE 2 2 t 2  CV 2  PE 2 Systematic Sampling Strip Sampling  each strip is one observation, and the value for each strip is yi in the simple random sampling formula, and n is the number of strips selected for measurement  randomly select the first strip  Intensity is calculated as: I na A  Can calculate number of strips needed based on intensity desired, the size of the area (A), and the area under each plot (a) Line Plot Sampling (or grid sampling)  each plot is one observation, and the value for each plot is yi in the simple random sampling formula, and n is the number of plots. IA or 2) use simple random sampling equations for a specified AE a or PE or 3) sample size is lim ited by cos t Sample Size : 1) n  Spacing : d ( m)  A(m 2 ) n L ( m)  or A(m 2 ) A(m 2 ) or B(m)  n  B ( m) n  L ( m) Line Intersect Sampling For each line with length L in metres, dij in cm, and  ij in deg rees : y i (m 3 / ha )  2 8 L mi  j 1 dij 2 cos( ij ) 10000   y i ( stems / ha )  2 L mi 1  lij  cos( ij ) j 1 then the y i are used in the simple random sampling equations to get an estimate of the mean , confidence int ervals for this, and an estimate of the total with an associated confidence int erval. The sample size , n, is the number of lines . Sampling for Proportions of Observations that have a particular characteristic Single Observations, No stratification n number having the characteri stic ˆ  p  number of sampled y i 1 i n yi  1 if it has the characteristic and 0 otherwise N is the total number of observations in the population Confidence intervals: One way: normal approximation when np > 5 or nq > 5 qˆ  1  pˆ s pˆ  pˆ qˆ  n 1   n 1 N pˆ  z   s pˆ  2 1 2n For example, for a 95% confidence interval, z=1.96. For a 90% confidence interval, z=1.64. Another way: for 95 % confidence intervals, use the binomial graph. Testing Hypotheses, One proportion: Is the proportion more than (some value)? (Note: could also test if the proportion was less than a value) H 0 : p  ( some value) H 0 : p  ( some value) z pˆ  ( some value) s pˆ Compare to z critical from a normal distributi on table. If z calculated is greater than z in the table, then reject H 0 . For example : z critical for   0.05 is 1.64 Comparing proportions from two populations: H0 : p1  p2 H0 : p1  p2 s 2 pˆ1  z pˆ 1  qˆ1 n1  1  N 1  n1    N 1   s 2 pˆ 2  ˆ 2  qˆ 2 p n2  1  N 2  n2   N2    ( pˆ 1  pˆ 2 )  0 s 2 pˆ1  s 2 pˆ 2 Compare to z critical from a normal distributi on table. If z calculated is greater than z in the table, then reject H 0 . For example : z critical for   0.05 is 1.64 Sample size for estimating one proportion: n 1 1 AE 2  2 N z  pˆ  qˆ Proportions in clusters (e.g., plots, trucks, houses, or any group of items) Two types of analysis: 1. Give equal weight to the proportion from each cluster. Calculate a proportion for each cluster, and treat each proportion as yi in simple random sampling equations 2. Give plots with more items more weight. Equations are then based on ratio of means sampling: N=number of clusters in the population n=number of clusters sampled mi= number of items in cluster i ai = number of items in cluster i that have the characteristic (e.g. dead, or whatever characteristic) m  n  mi n the average number of items per cluster i 1 n p  ai i 1 n  mi the estimated proportion i 1 n Need also :  ai n 2 i 1 sp 2 sp  ( N  n) 1    N nm2 sp  mi i 1  ai n n i 1 i 1  aimi   ai  mi 2 2  2 p  aimi  p 2  mi 2 n 1 2 95% Confidence Intervals for the proportion : p  /  t n 1,1 / 2  s p Stratified Random Sampling For each stratum (use simple random sampling equations for data from the stratum): n h items measured out of N h possible items sometimes N h is inf inite  n yh  s yh y i 1 ih nh i 1 y  2 n y sum over all ih   y ih  / nh 2 2 ih s yh  N h  nh   nh  N h 2 s yh nh  1 yTh  N h  y h nh items  2 s yTh  N h  s yh 2 2 2   n  y  square each and sum  i 1 2 ih s   or s yh 2  yh nh  s yh CVh   100 yh 2 ( when y h is already on a per ha basis e.g . m 3 / ha , then use Ah area in ha  2 instead of N h in calculatin g yTh and s yTh ) 95% Confidence Intervals for the mean for Stratum h : y h  /  t n 1,1 / 2  s yh 95% Confidence Intervals for the total for Stratum h : yTh  /  t n 1,1 / 2  s yTh Overall (putting strata together to get estimates for the entire population): Nh A or Wh  h where Wh is the relative size of Stratum h N A L L N   Wh  y h   h  y h yTST  N  y ST h 1 h 1 N Wh  y ST L s yST   Wh  s yh 2 2 2 h 1 2 L N 2   h2  s yh h 1 N s yTST  N 2  s yST 2  2  (if y st is exp ressed on a per ha basis e.g . m 3 / ha then replace N by A area for population  to obtain yTST and s yTST 2 95% Confidence Intervals for the mean for the population : y ST  /  t EDF  s yST 95% Confidence Intervals for the total for the population : yTST  /  t EDF  s yTST where EDF is somewhere between the smallest nh  1 up to the sum of the nh  1 for all strata combined . EDF can be calculated by : s  s  2 2 EDF  L  h 1 W y ST 2 h  s yh nh  1 2 2  2 2  y ST  Nh2 2    s y 2 h  L  N    nh  1 h 1 Percent error achieved  2 t EDF  s yST CI width  100   100 for a completed survey est . mean y ST Stratified Random Sampling (Continued) Sample Size Calculations: Equal allocation : nh  n / L where n is calculated as : without replacement L n  L L  t 2  W 2h  s2h h 1 L  Wh  s 2 h AE 2  t 2  h  1  L  t 2   N 2h  s2h h 1 L N 2  AE 2  t 2   N h  s 2 h h 1 N for with replacement or when N is very l arg e L n  L  t 2  W 2h  s2h h 1 AE 2 L  L  t 2   N 2h  s2h h 1 2 N  AE 2 Pr oportional allocation : N n n  h  n  Wh N where n is calculated as : without replacement L n  L t 2   Wh  s 2 h h 1 L AE 2  t 2   Wh  s 2 h h 1  N  t 2   Nh  s2h h 1 L N 2  AE 2  t 2   N h  s 2 h h 1 N for with replacement or when N is very l arg e L n  t 2   Wh  s 2 h h 1 AE 2 L  N  t 2   Nh  s2h h 1 N  AE 2 2 Stratified Random Sampling (Continued) Neyman allocation : nh  n  Wh  s h L W h 1 h  sh where n is calculated as : without replacement  L  t    Wh  s h   h 1  2  L  t    N h  sh   h 1  2 n   L AE 2  t 2   Wh  s 2 2 2 h h 1 L N 2  AE 2  t 2   N h  s 2 h h 1 N for with replacement or when N is very l arg e  L  t    Wh  s h   h 1  2 AE 2 2 n   L  t    N h  sh   h 1   2 2 N  AE 2 2 Optimum allocation : nh  n  Wh  s h L W h 1 h  sh ch ch where n is calculated as : n  without  L  L W  s t 2    Wh  s h c h   h h ch  h 1  h  1 replacement     L  Wh  s 2  h AE 2  t 2  h  1  L  L N  s t 2    N h  s h c h   h h ch  h 1  h  1     L N 2  AE 2  t 2   N h  s 2 h h 1 N for with replacement or when N is very l arg e : n   L  L W  s t 2    Wh  s h c h   h h ch  h 1  h  1 AE 2       L  L N  s t 2    N h  s h c h   h h ch  h 1  h  1 N 2  AE 2     Ratio/Regression Sampling n  number of items with both x and y measures; N  number of items in the population x is the auxilliary var iable  x and  x known, as x is measured on all N items Using Mean of Ratios h n y i 1 sr 2 h i n / xi  ri r i 1 r  2 i i 1 r 2 i   ri  / n r i i 1 n 2 sr  n 1 sr  N n   n  N  yr  r   x sr CV  2 sr  100 r 2 sr  sr  2 sr 2 yT r  N  y r s yr  s r   x s yT r  N  s yr   ( when y r is already on a per ha basis e.g. m 3 / ha , then use A area in ha  instead of N in calculatin g yT r nd s yT r ) 95% Confidence Intervals for the mean of the population : y r  /  t n 1,1 / 2  s y r 95% Confidence Intervals for the total of the population : yTr  /  t n 1,1 / 2  s yT r Achieved Percent error  t n 1,1 / 2  s y r CI width  100   100 est . mean yr Sample Size without replacemen t : 1 n 1 PE 2  2 N t  CV 2 with replacemen t or very l arg e N : t 2  CV 2 n PE 2 Ratio/Regression Sampling (continued) Using Ratio of Means n R y i 1 n x i 1 sR  i y  x sR i sR 2 2 2  1    y i  2  R   xi y  R 2  xi   N  n        2     N  n  1 n   x     2 yR  R   x yT R  N  y R y s CV  1  100  yR 2 i  2  R   xi y  R 2  xi (n  1) 2 yR s yR  s R   x s yT R  N  s y R   100  ( when y R is already on a per ha basis e.g . m 3 / ha , then use A area in ha  instead of N in calculatin g yT R nd s yT R ) 95% Confidence Intervals for the mean of the population : y R  /  t n 1,1 / 2  s y R 95% Confidence Intervals for the total of the population : yT R  /  t n 1,1 / 2  s yT R Percent error  Sample Size n 1 AE 2 1  N t 2  s1 2 t n 1,1 / 2  s y R CI width  100   100 est . mean yR without replacement :  1 1 PE 2  N t 2  CV 2 with replacement or very l arg e N : t 2  s1 t 2  CV 2  AE 2 PE 2 2 n Ratio/Regression Sampling (continued) Using Regression n n n  yi Need :  yi i 1 y 2 y i 1 i n i 1 n n x i 1 n i x i 1 n x y 2 i i 1 i x i SPXY   xi y i   xi  y i  / n x i 1 i n SSX   xi   xi  / n 2 2 SSY   y i   y i  / n 2 2 then : slope : b1  SPXY SSX int ercept : b0  y  b1  x y lr  b0  b1   x yT lr  N  y lr 1  x  x   N n  SE E      n SSX  N  2 s ylr s yT lr  N  s ylr SS RES n2 SE E   SS RES  SSY  b1  SPXY  ( when y lr is already on a per ha basis e.g. m 3 / ha , then use A area in ha  instead of N in calculatin g yT lr and s yT lr ) 95% Confidence Intervals for the mean of the population : y lr  /  t n 1,1 / 2  s ylr 95% Confidence Intervals for the total of the population : yT lr  /  t n 1,1 / 2  s yT lr Achieved Percent Sample Size : error  not cov ered t n 1,1 / 2  s ylr CI width  100  est . mean y lr in FRST 339  100 Double Sampling Phase 1: Large sample for x 2  n'  xi    xi  / n'   i 1  s 2 x '  i 1 n'1 n' n' x '   xi n' n' observations on x i 1 2 Phase 2: Smaller sample, measure y also. n observations on x and on y n y  yi i 1 s n x s2x  i 2 2 y y    xi  / n 2 i   yi  / n x n 1 x y 2 n 1 n 2 s xy  i i x i 1 i n   xi  yi  / n n 1 Double Sampling: Using ratio of means (relationship between y and x goes through zero): n Rˆ 2 P  y i x i i 1 n i 1  y x y 2 P  Rˆ 2 P  x ' yT 2  P  N  y 2  P 2 s 2 y  Rˆ 2 P  s 2 x  2  Rˆ 2 P  s xy 2 2  Rˆ 2 P  s xy  Rˆ 2 P  s 2 x ' s2 y s y2  P    n n' N NOTE : last term drops out if N is very l arg e or if sampling is with replacemen t. 2 Can use s 2 x  from Phase 2  instead of s 2 x '  from Phase 1 s 2 yT 2  P  N 2  s 2 y2  P   ( when y 2 P is already on a per ha basis e.g. m 3 / ha , then use A area in ha  instead of N in calculatin g yT 2 P and s yT 2  P ) 95% Confidence Intervals for the mean for the population : y 2 P  /  t n 1,1 / 2  s y2  P 95% Confidence Intervals for the total for the population : yT 2 P  /  t n 1,1 / 2  s yT 2  P Percent error achieved  t n 1,1 / 2  s y2  P CI width  100   100 est . mean y 2 P Double Sampling Double Sampling: Using regression (relationship between y and x can go through something other than zero) b s xy sx 2 where b is the estimated slope between y and x y lr 2  P  y  b( x  x ' ) s 2 ylr 2  P yT lr 2  P  N  y lr 2  P 2 2 2  n  1   s y  s xy  / s x   n  n  2   2       1  1  1    b 2  s x     n n'  n'    s 2 yT lr 2  P  N 2  s 2 ylr 2  P   ( when y lr 2  P is already on a per ha basis e.g. m 3 / ha , then use A area in ha  instead of N in calculatin g y lr 2  P and s 2 yT lr 2  P ) 95% Confidence Intervals for the mean of the population : y lr 2  P  /  t n 1,1 / 2  s ylr 2  P 95% Confidence Intervals for the total of the population : y T lr 2  P  /  t n 1,1 / 2  s yT lr 2  P Achieved Percent error  t n 1,1 / 2  s ylr 2  P CI width  100   100 est . mean y lr 2  P

Ratio and Regression Sampling, Example

Related documents

Products

Support

Ratio and Regression Sampling, Example

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib