Simple Random Sampling n items measured out of N possible items sometimes N is inf inite n y y i 1 i n y sum over all n n i i 1 y square each value and sy then add them 2 i 1 2 i y sy 2 sy 2 i y i / n n items 2 2 sy n 1 2 sy N n without replacement ; n N 2 with replacement or when N is very l arg e n yT N y s yT N 2 s y 2 2 ( when y is already on a per ha basis (e.g ., m 3 per ha, then use A area in ha 2 instead of N in calculatin g yT and s yT ) CV sy y 100 95% Confidence Intervals for the mean of the population : y / t n 1,1 / 2 s y 95% Confidence Intervals for the total of the population : yT / t n 1,1 / 2 s yT Percent error achieved t n 1,1 / 2 s y CI width 100 100 for a completed survey est . mean y AE 100 when AE is given for a new survey y without replacement : with replacement or very l arg e N : Desired Percent error ( PE ) Sample Size n 1 AE 2 1 N t 2 sy2 1 1 PE 2 2 N t CV 2 n t 2 sy AE 2 2 t 2 CV 2 PE 2 Systematic Sampling Strip Sampling each strip is one observation, and the value for each strip is yi in the simple random sampling formula, and n is the number of strips selected for measurement randomly select the first strip Intensity is calculated as: I na A Can calculate number of strips needed based on intensity desired, the size of the area (A), and the area under each plot (a) Line Plot Sampling (or grid sampling) each plot is one observation, and the value for each plot is yi in the simple random sampling formula, and n is the number of plots. IA or 2) use simple random sampling equations for a specified AE a or PE or 3) sample size is lim ited by cos t Sample Size : 1) n Spacing : d ( m) A(m 2 ) n L ( m) or A(m 2 ) A(m 2 ) or B(m) n B ( m) n L ( m) Line Intersect Sampling For each line with length L in metres, dij in cm, and ij in deg rees : y i (m 3 / ha ) 2 8 L mi j 1 dij 2 cos( ij ) 10000 y i ( stems / ha ) 2 L mi 1 lij cos( ij ) j 1 then the y i are used in the simple random sampling equations to get an estimate of the mean , confidence int ervals for this, and an estimate of the total with an associated confidence int erval. The sample size , n, is the number of lines . Sampling for Proportions of Observations that have a particular characteristic Single Observations, No stratification n number having the characteri stic ˆ p number of sampled y i 1 i n yi 1 if it has the characteristic and 0 otherwise N is the total number of observations in the population Confidence intervals: One way: normal approximation when np > 5 or nq > 5 qˆ 1 pˆ s pˆ pˆ qˆ n 1 n 1 N pˆ z s pˆ 2 1 2n For example, for a 95% confidence interval, z=1.96. For a 90% confidence interval, z=1.64. Another way: for 95 % confidence intervals, use the binomial graph. Testing Hypotheses, One proportion: Is the proportion more than (some value)? (Note: could also test if the proportion was less than a value) H 0 : p ( some value) H 0 : p ( some value) z pˆ ( some value) s pˆ Compare to z critical from a normal distributi on table. If z calculated is greater than z in the table, then reject H 0 . For example : z critical for 0.05 is 1.64 Comparing proportions from two populations: H0 : p1 p2 H0 : p1 p2 s 2 pˆ1 z pˆ 1 qˆ1 n1 1 N 1 n1 N 1 s 2 pˆ 2 ˆ 2 qˆ 2 p n2 1 N 2 n2 N2 ( pˆ 1 pˆ 2 ) 0 s 2 pˆ1 s 2 pˆ 2 Compare to z critical from a normal distributi on table. If z calculated is greater than z in the table, then reject H 0 . For example : z critical for 0.05 is 1.64 Sample size for estimating one proportion: n 1 1 AE 2 2 N z pˆ qˆ Proportions in clusters (e.g., plots, trucks, houses, or any group of items) Two types of analysis: 1. Give equal weight to the proportion from each cluster. Calculate a proportion for each cluster, and treat each proportion as yi in simple random sampling equations 2. Give plots with more items more weight. Equations are then based on ratio of means sampling: N=number of clusters in the population n=number of clusters sampled mi= number of items in cluster i ai = number of items in cluster i that have the characteristic (e.g. dead, or whatever characteristic) m n mi n the average number of items per cluster i 1 n p ai i 1 n mi the estimated proportion i 1 n Need also : ai n 2 i 1 sp 2 sp ( N n) 1 N nm2 sp mi i 1 ai n n i 1 i 1 aimi ai mi 2 2 2 p aimi p 2 mi 2 n 1 2 95% Confidence Intervals for the proportion : p / t n 1,1 / 2 s p Stratified Random Sampling For each stratum (use simple random sampling equations for data from the stratum): n h items measured out of N h possible items sometimes N h is inf inite n yh s yh y i 1 ih nh i 1 y 2 n y sum over all ih y ih / nh 2 2 ih s yh N h nh nh N h 2 s yh nh 1 yTh N h y h nh items 2 s yTh N h s yh 2 2 2 n y square each and sum i 1 2 ih s or s yh 2 yh nh s yh CVh 100 yh 2 ( when y h is already on a per ha basis e.g . m 3 / ha , then use Ah area in ha 2 instead of N h in calculatin g yTh and s yTh ) 95% Confidence Intervals for the mean for Stratum h : y h / t n 1,1 / 2 s yh 95% Confidence Intervals for the total for Stratum h : yTh / t n 1,1 / 2 s yTh Overall (putting strata together to get estimates for the entire population): Nh A or Wh h where Wh is the relative size of Stratum h N A L L N Wh y h h y h yTST N y ST h 1 h 1 N Wh y ST L s yST Wh s yh 2 2 2 h 1 2 L N 2 h2 s yh h 1 N s yTST N 2 s yST 2 2 (if y st is exp ressed on a per ha basis e.g . m 3 / ha then replace N by A area for population to obtain yTST and s yTST 2 95% Confidence Intervals for the mean for the population : y ST / t EDF s yST 95% Confidence Intervals for the total for the population : yTST / t EDF s yTST where EDF is somewhere between the smallest nh 1 up to the sum of the nh 1 for all strata combined . EDF can be calculated by : s s 2 2 EDF L h 1 W y ST 2 h s yh nh 1 2 2 2 2 y ST Nh2 2 s y 2 h L N nh 1 h 1 Percent error achieved 2 t EDF s yST CI width 100 100 for a completed survey est . mean y ST Stratified Random Sampling (Continued) Sample Size Calculations: Equal allocation : nh n / L where n is calculated as : without replacement L n L L t 2 W 2h s2h h 1 L Wh s 2 h AE 2 t 2 h 1 L t 2 N 2h s2h h 1 L N 2 AE 2 t 2 N h s 2 h h 1 N for with replacement or when N is very l arg e L n L t 2 W 2h s2h h 1 AE 2 L L t 2 N 2h s2h h 1 2 N AE 2 Pr oportional allocation : N n n h n Wh N where n is calculated as : without replacement L n L t 2 Wh s 2 h h 1 L AE 2 t 2 Wh s 2 h h 1 N t 2 Nh s2h h 1 L N 2 AE 2 t 2 N h s 2 h h 1 N for with replacement or when N is very l arg e L n t 2 Wh s 2 h h 1 AE 2 L N t 2 Nh s2h h 1 N AE 2 2 Stratified Random Sampling (Continued) Neyman allocation : nh n Wh s h L W h 1 h sh where n is calculated as : without replacement L t Wh s h h 1 2 L t N h sh h 1 2 n L AE 2 t 2 Wh s 2 2 2 h h 1 L N 2 AE 2 t 2 N h s 2 h h 1 N for with replacement or when N is very l arg e L t Wh s h h 1 2 AE 2 2 n L t N h sh h 1 2 2 N AE 2 2 Optimum allocation : nh n Wh s h L W h 1 h sh ch ch where n is calculated as : n without L L W s t 2 Wh s h c h h h ch h 1 h 1 replacement L Wh s 2 h AE 2 t 2 h 1 L L N s t 2 N h s h c h h h ch h 1 h 1 L N 2 AE 2 t 2 N h s 2 h h 1 N for with replacement or when N is very l arg e : n L L W s t 2 Wh s h c h h h ch h 1 h 1 AE 2 L L N s t 2 N h s h c h h h ch h 1 h 1 N 2 AE 2 Ratio/Regression Sampling n number of items with both x and y measures; N number of items in the population x is the auxilliary var iable x and x known, as x is measured on all N items Using Mean of Ratios h n y i 1 sr 2 h i n / xi ri r i 1 r 2 i i 1 r 2 i ri / n r i i 1 n 2 sr n 1 sr N n n N yr r x sr CV 2 sr 100 r 2 sr sr 2 sr 2 yT r N y r s yr s r x s yT r N s yr ( when y r is already on a per ha basis e.g. m 3 / ha , then use A area in ha instead of N in calculatin g yT r nd s yT r ) 95% Confidence Intervals for the mean of the population : y r / t n 1,1 / 2 s y r 95% Confidence Intervals for the total of the population : yTr / t n 1,1 / 2 s yT r Achieved Percent error t n 1,1 / 2 s y r CI width 100 100 est . mean yr Sample Size without replacemen t : 1 n 1 PE 2 2 N t CV 2 with replacemen t or very l arg e N : t 2 CV 2 n PE 2 Ratio/Regression Sampling (continued) Using Ratio of Means n R y i 1 n x i 1 sR i y x sR i sR 2 2 2 1 y i 2 R xi y R 2 xi N n 2 N n 1 n x 2 yR R x yT R N y R y s CV 1 100 yR 2 i 2 R xi y R 2 xi (n 1) 2 yR s yR s R x s yT R N s y R 100 ( when y R is already on a per ha basis e.g . m 3 / ha , then use A area in ha instead of N in calculatin g yT R nd s yT R ) 95% Confidence Intervals for the mean of the population : y R / t n 1,1 / 2 s y R 95% Confidence Intervals for the total of the population : yT R / t n 1,1 / 2 s yT R Percent error Sample Size n 1 AE 2 1 N t 2 s1 2 t n 1,1 / 2 s y R CI width 100 100 est . mean yR without replacement : 1 1 PE 2 N t 2 CV 2 with replacement or very l arg e N : t 2 s1 t 2 CV 2 AE 2 PE 2 2 n Ratio/Regression Sampling (continued) Using Regression n n n yi Need : yi i 1 y 2 y i 1 i n i 1 n n x i 1 n i x i 1 n x y 2 i i 1 i x i SPXY xi y i xi y i / n x i 1 i n SSX xi xi / n 2 2 SSY y i y i / n 2 2 then : slope : b1 SPXY SSX int ercept : b0 y b1 x y lr b0 b1 x yT lr N y lr 1 x x N n SE E n SSX N 2 s ylr s yT lr N s ylr SS RES n2 SE E SS RES SSY b1 SPXY ( when y lr is already on a per ha basis e.g. m 3 / ha , then use A area in ha instead of N in calculatin g yT lr and s yT lr ) 95% Confidence Intervals for the mean of the population : y lr / t n 1,1 / 2 s ylr 95% Confidence Intervals for the total of the population : yT lr / t n 1,1 / 2 s yT lr Achieved Percent Sample Size : error not cov ered t n 1,1 / 2 s ylr CI width 100 est . mean y lr in FRST 339 100 Double Sampling Phase 1: Large sample for x 2 n' xi xi / n' i 1 s 2 x ' i 1 n'1 n' n' x ' xi n' n' observations on x i 1 2 Phase 2: Smaller sample, measure y also. n observations on x and on y n y yi i 1 s n x s2x i 2 2 y y xi / n 2 i yi / n x n 1 x y 2 n 1 n 2 s xy i i x i 1 i n xi yi / n n 1 Double Sampling: Using ratio of means (relationship between y and x goes through zero): n Rˆ 2 P y i x i i 1 n i 1 y x y 2 P Rˆ 2 P x ' yT 2 P N y 2 P 2 s 2 y Rˆ 2 P s 2 x 2 Rˆ 2 P s xy 2 2 Rˆ 2 P s xy Rˆ 2 P s 2 x ' s2 y s y2 P n n' N NOTE : last term drops out if N is very l arg e or if sampling is with replacemen t. 2 Can use s 2 x from Phase 2 instead of s 2 x ' from Phase 1 s 2 yT 2 P N 2 s 2 y2 P ( when y 2 P is already on a per ha basis e.g. m 3 / ha , then use A area in ha instead of N in calculatin g yT 2 P and s yT 2 P ) 95% Confidence Intervals for the mean for the population : y 2 P / t n 1,1 / 2 s y2 P 95% Confidence Intervals for the total for the population : yT 2 P / t n 1,1 / 2 s yT 2 P Percent error achieved t n 1,1 / 2 s y2 P CI width 100 100 est . mean y 2 P Double Sampling Double Sampling: Using regression (relationship between y and x can go through something other than zero) b s xy sx 2 where b is the estimated slope between y and x y lr 2 P y b( x x ' ) s 2 ylr 2 P yT lr 2 P N y lr 2 P 2 2 2 n 1 s y s xy / s x n n 2 2 1 1 1 b 2 s x n n' n' s 2 yT lr 2 P N 2 s 2 ylr 2 P ( when y lr 2 P is already on a per ha basis e.g. m 3 / ha , then use A area in ha instead of N in calculatin g y lr 2 P and s 2 yT lr 2 P ) 95% Confidence Intervals for the mean of the population : y lr 2 P / t n 1,1 / 2 s ylr 2 P 95% Confidence Intervals for the total of the population : y T lr 2 P / t n 1,1 / 2 s yT lr 2 P Achieved Percent error t n 1,1 / 2 s ylr 2 P CI width 100 100 est . mean y lr 2 P