Statistical Sampling

Simple Random Sampling
• Every possible combination of sample units has an equal and independent chance of being selected.
• However…

Systematic Sampling
• Beware coincidental bias between the sample interval and natural features of the area:
  • Ridges
  • River bends
  • Etc.

Stratified Random Sampling
• The point is to reduce variability within strata.
• Example: if you were measuring average estrogen levels in humans, you would stratify male versus female.
• Can you think of some forest examples?

Sampling
• Mean: in Excel, =AVERAGE(A1:An)
• Variance: the mean of the squared deviations
• Standard deviation: the square root of the variance

Standard Deviation
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
• Use the Excel function =STDEV(A1:An) or =STDEV.S(A1:An)

Exercise in Random Sampling
• The student heights are the population.
• Calculate the population mean, etc.
• Take a systematic 20% sample and compare its estimates with the population values.
• Take a 50% sample (systematic or random) and compare the results.
• Calculate the mean, variance, SD and CV of both the population and the samples.

Variability
• The differences between individuals or units in a population.

Standard Error of the Mean
• Equals the standard deviation of all possible sample means around the true population mean.

Finite Population Correction Factor
• Reduces the standard error when relatively large samples are drawn from finite populations.

Confidence Interval
• Specifies the precision of the sample mean in relation to the population mean.

Student's t Distribution

Effect of Standard Deviation
The red distribution has a mean of 40 and a standard deviation of 5; the blue distribution has a mean of 60 and a standard deviation of 10. For the red distribution, 68% of the distribution lies between 35 and 45; for the blue distribution, 68% lies between 50 and 70 (the mean plus or minus one standard deviation in each case).

Sampling Error
Rather than work with absolute confidence limits, convert them to a percent of the sample mean, which is called the sampling error.
The notation in the handbook is an upper case E. Take the confidence interval quantity and scale it to the sample mean by dividing by the sample mean, then multiply by 100 to express it as a percent. Expressed this way, the estimate of the mean is plus or minus the derived percentage. For example, at 95% confidence, an estimate of the mean has a confidence interval of 46.4 plus or minus 2.6. As a sampling error percent, the mean is plus or minus 5.6% (2.6 / 46.4 × 100), which says we are 95% confident that the true population mean falls within 5.6% of the estimate.

Determining Sample Size
$$n = \frac{t^2 \, CV^2}{E^2}$$
• For a 95% confidence level, the t value approaches 2 as the sample size gets large, so a t value of 2 is commonly used when estimating sample size.
• The CV is the relative variability in the population being sampled. Use the population CV if it is known, or an estimate if it is not.
• E is the desired sampling error, for example 10%.

Items with Possible Impacts on Sampling Intensity

Coefficient of Variation
• The relative variability in the population being sampled
• A unitless measure useful for comparing sampling methods
• Sample standard deviation divided by sample mean, times 100: CV = (sd / mean) × 100

Using CV for Comparison
Because CVs have no associated unit of measure, they can be useful in comparing sampling methods to determine which is most efficient. So which method of sampling would require fewer samples?

Effect of CV Change
As the coefficient of variation increases, so does the required sample size.

Sampling Intensity Revisited
• The USFS Way

Sample Selection (from Precruise Data)
1. Determine the sampling error for the sale as a whole (set to 10%).
2. Subdivide (or stratify) the sale population into sampling components as needed to reduce the variability within the sampling strata.
3. Calculate the coefficient of variation (CV) by stratum and a weighted CV over all strata (covered in more detail later in the statistics lectures).
4.
Calculate the number of plots for the sale as a whole and then distribute them by stratum.

Number of Plots
• The value of t is assumed to be 2.
• The error is set at 10%.

Distribute Plots by Stratum
• For each stratum, the calculation would look like this:
  • n1 = (17.6 × 185) / 67.9 = 48 plots
  • n2 = (7.7 × 185) / 67.9 = 21 plots
  • n3 = (7.2 × 185) / 67.9 = 20 plots
  • n4 = (35.4 × 185) / 67.9 = 96 plots
• These total the 185 plots for the entire tract.

Tree Expansion Factor
• 1 divided by the product of the fixed plot size and the number of plots: $Ft = \frac{1}{SZ \times n}$
• n = number of plots
• SZ = fixed plot size
• Ft = tree factor

Sample Error Example: Step 1 (Calculate Standard Error)
Plots 1/5 acre in size were used in this example, and the stratum covers 18 acres. So the total number of potential plots for the stratum is 5 plots per acre × 18 acres = 90. Notice the application of the finite population correction factor (FCF) in this method.

Sample Error Example: Step 2
Recall that the standard error was calculated as 8.3 ft³. The resulting sampling error of 36.2% is well above the level we set to begin with (10%). Implications?
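The Exercise in Random Sampling can also be worked outside Excel. The sketch below is one possible way to do it in Python, using only the standard library. The height values are made up for illustration, and the function names (`describe`, `systematic_sample`) are hypothetical helpers rather than anything from the handbook.

```python
import statistics

def describe(values, is_population=False):
    """Return the mean, variance, SD and CV (%) of a list of numbers.

    With is_population=False the n - 1 denominator is used (like Excel
    STDEV.S); with is_population=True the n denominator is used.
    """
    mean = statistics.mean(values)
    if is_population:
        variance = statistics.pvariance(values)
        sd = statistics.pstdev(values)
    else:
        variance = statistics.variance(values)
        sd = statistics.stdev(values)
    return {"mean": mean, "variance": variance, "sd": sd, "cv": sd / mean * 100}

def systematic_sample(values, fraction, start=0):
    """Take every k-th unit, where k = 1 / fraction (20% means every 5th unit)."""
    interval = round(1 / fraction)
    return values[start::interval]

# Hypothetical student heights in inches (illustrative data only).
heights = [62, 64, 65, 66, 66, 67, 68, 68, 69, 70,
           70, 71, 71, 72, 72, 73, 74, 74, 75, 76]

population = describe(heights, is_population=True)
sample_20 = describe(systematic_sample(heights, 0.20))
sample_50 = describe(systematic_sample(heights, 0.50))

for label, stats in [("population", population),
                     ("20% sample", sample_20),
                     ("50% sample", sample_50)]:
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

Changing `start` shifts the systematic sample along the list, which is a quick way to see how much the estimates depend on where the interval happens to begin.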
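The standard error, confidence interval and sampling error steps can be sketched the same way. This is a minimal illustration, not the handbook's procedure: it assumes the finite population correction is applied as sqrt(1 - n/N), takes t as a parameter defaulting to 2 (as in the sample size discussion), and uses hypothetical function names.

```python
import math
import statistics

def potential_plots(acres, plot_size_acres):
    """Number of plots that would cover the whole stratum (population size N)."""
    return round(acres / plot_size_acres)

def standard_error(values, population_size=None):
    """Standard error of the mean, optionally with a finite population correction."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    if population_size is not None:
        # ASSUMPTION: the correction is written as sqrt(1 - n / N).
        se *= math.sqrt(1 - n / population_size)
    return se

def confidence_interval(mean, se, t=2.0):
    """Lower and upper confidence limits: mean plus or minus t times SE."""
    half_width = t * se
    return mean - half_width, mean + half_width

def sampling_error_percent(half_width, mean):
    """Confidence interval half-width expressed as a percent of the mean (E)."""
    return half_width / mean * 100

# Figures quoted in the slides: 1/5-acre plots on 18 acres, and 46.4 +/- 2.6.
print(potential_plots(18, 1 / 5))                       # 90
print(round(sampling_error_percent(2.6, 46.4), 1))      # 5.6
```

Passing `population_size` only matters when the sample is a sizeable share of the potential plots; for a small sample from a large stratum the correction is close to 1.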
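The plot count and its distribution by stratum can be checked with a short script. The sketch assumes the usual sample size relationship n = t² × CV² / E², rounded up, and treats the figures 17.6, 7.7, 7.2 and 35.4 as each stratum's share of the weighted CV (they sum to 67.9); both are interpretations of the slides, and the function names are hypothetical.

```python
import math

def plots_needed(cv_percent, error_percent=10.0, t=2.0):
    """Number of plots for a given CV (%) and desired sampling error (%)."""
    return math.ceil((t ** 2 * cv_percent ** 2) / error_percent ** 2)

def distribute_plots(stratum_cv_shares, total_plots):
    """Allocate plots to strata in proportion to each stratum's CV share."""
    weighted_cv = sum(stratum_cv_shares)
    return [round(share * total_plots / weighted_cv)
            for share in stratum_cv_shares]

# ASSUMPTION: these are the per-stratum contributions to the weighted CV.
shares = [17.6, 7.7, 7.2, 35.4]

total = plots_needed(sum(shares))            # t = 2, E = 10%
print(total, distribute_plots(shares, total))  # 185 [48, 21, 20, 96]

# Effect of CV change: doubling the CV quadruples the required plots.
print(plots_needed(30), plots_needed(60))      # 36 144
```

Because each stratum is rounded separately, the allocations will not always sum exactly to the total; they do here, but a real cruise design may need to adjust one stratum by a plot.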