1 Estimating the “correct” sample size

The width of a confidence interval shrinks in proportion to the square root of the sample size, so there are diminishing returns to scaling up your sample. Surveying people is costly and time-consuming, so the sample size must be planned for the best return on investment. However, you also need workable margins of error.

2 General Considerations in Estimating Sample Size

There are two questions in determining sample size:
How confident do you want to be (your confidence level)?
How small do you want your margin of error to be?
The formulas for calculating sample size depend on having an idea of these parameters. You would not want a confidence level below 95%, and you normally want the margin of error to be as low as possible.

3 Estimating sample size for proportions

The first step in determining sample size is to ask: how small do you want your confidence interval to be? Say you want to be sure that the population proportion will fall within 2% of your sample proportion 95 times out of 100. What you are asking for is that

$$1.96 \times SE_p = 2\%$$

and we can use the standard error statistic to estimate the sample size that returns a 2% margin.

4 The SE statistic in estimating sample size for proportions

$$SE_p = \sqrt{\frac{\pi(100 - \pi)}{n}}$$

This is the same SE statistic we used before, except that the sample proportion p has been replaced by the population proportion π.
Again, there are two problems to be solved:
First, we don't know what π is.
Second, we don't know what n is (that's what we need to find).
Use theory for one and then solve for the other.

5 Using theory to solve for π

$$1.96 \times SE_p = 1.96 \times \sqrt{\frac{50(100 - 50)}{n}} = 2$$

The π term has been replaced by 50% because π(100 − π) is at its largest when π = 50. That makes 50% the most cautious assumption: no other value of π could call for a bigger sample.
The 1.96 × SE_p = 2% formula is now restated as above, leaving one unknown value, n, to be solved by algebra.

6 Using algebra to solve for n

To solve for n:

$$\frac{1.96 \times 50}{\sqrt{n}} = 2$$
$$\sqrt{n} = \frac{1.96 \times 50}{2} = 49$$
$$n = 49^2 = 2{,}401$$

The sample size should be 2,401 to give a 2% margin of error at 95% confidence.

7 The SE statistic in estimating sample size for means

You want to be 95% sure the population mean is within 1.5 grade points of the sample mean?

$$n = \left(\frac{C \times \sigma}{\delta}\right)^2$$

Where:
C is the z-value for the confidence level you want.
σ (small letter sigma) is the population standard deviation.
δ (small letter delta) is the largest amount by which you are willing to let the sample mean differ from the population mean (the margin of error).
So where do you find all the values?

8 The SE statistic in estimating sample size for means

$$n = \left(\frac{C \times \sigma}{\delta}\right)^2$$

C is 1.96, the z-value for 95% (your chosen confidence level).
δ is 1.5 grade points (your chosen margin of error).
σ is something you don't know: the population standard deviation. But you can substitute the sample standard deviation 's' for this:

$$n = \left(\frac{C \times s}{\delta}\right)^2$$

9 The SE statistic in estimating sample size for means

But where do you get s, since you don't have your survey done yet and are still trying to work out the sample size? To estimate σ you can:
1. Use a previous survey.
2. Do a pilot study.
3. Use secondary data.
4. Use your own or another researcher's experience.
Methods 1 and 2 are the best, and a pilot study of 30 cases would provide a theoretically sound sample standard deviation to use as a substitute. We have the small n = 63 survey with a standard deviation of 12, so we can use this:

10 This slide is from earlier in the lecture. It shows how the substitution of the sample SD for the population standard deviation works.

$$SE = \frac{\sigma}{\sqrt{n}}$$

We still have the σ term, but it can be exchanged for the sample standard deviation, since we know from theory that a sample's mean, standard deviation, and proportion are close enough to the population's mean, standard deviation, and proportion for the substitution to work.
Thus the formula becomes:

$$SE = \frac{s}{\sqrt{n}}$$

And this is adequate for calculations. More to the point, there are no unknowns, so it can be solved.

11 The SE statistic in estimating sample size for means

This:

$$n = \left(\frac{C \times \sigma}{\delta}\right)^2$$

Becomes this:

$$n = \left(\frac{C \times s}{\delta}\right)^2$$

With the substitution method the sample size formula now has all of the parameters it needs: C = 1.96, s = 12, δ = 1.5. The formula is simply 1.96 × s/√n = δ solved for n, so δ sits inside the square. Worked through:

$$n = \left(\frac{1.96 \times 12}{1.5}\right)^2 = 15.68^2 \approx 245.9$$

Rounding up, you would need a sample size of 246 students to be 95% sure that the population mean is within 1.5 grade points of the sample mean. This agrees with the table on slide 13, where n = 240 gives a 95% interval of 1.52.

12 Diminishing Returns on Sample Size

The first thing to remember concerns the sampling fraction (n/N): it doesn't matter how big the population is from which you draw your sample, because it is the sample size that matters, not the fraction of the population that the sample represents. If n keeps increasing, your confidence interval keeps narrowing until, when n = N, the confidence interval would be zero: your sample is the population. But by how much would you have to increase the sample size, and is it worth it?

13 Effect of diminishing returns on sample size using sd = 12.0

Change in 'n' produces rapidly decreasing change in 'CI'. A larger 'n' helps up to a point, then its advantage diminishes rapidly.

n (doubled)   SE in %   CI @ 95%   Change in CI   Change in CI for every n
30            2.19      4.29       -1.2577        -0.04192
60            1.55      3.04       -0.8893        -0.01482
120           1.10      2.15       -0.6289        -0.00524
240           0.77      1.52       -0.4447        -0.00185
480           0.55      1.07       -0.3144        -0.00066
960           0.39      0.76       -0.2223        -0.00023
1920          0.27      0.54       -0.1572        -0.00008
3840          0.19      0.38       -0.1112        -0.00003
7680          0.14      0.27       -0.0786        -0.00001
15360         0.10      0.19       -0.1898        -0.00001

Each change is measured from that row to the next doubling. The last row has no further doubling to compare against, so its change is measured against zero.

14 Diminishing Returns on Sample Size (cont…)

[Figure: Change in CI for every change in 'n'. Axes: Confidence Interval (CI %); Doublings of 'n' starting at 30.]
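The sample size calculations for proportions and means, and the effect of repeatedly doubling 'n', can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the default arguments, and the small rounding tolerance are assumptions introduced here, not part of the lecture. It assumes, as the slides do, that the margin of error is 1.96 times the standard error.

```python
import math

Z_95 = 1.96  # z-value for 95% confidence, as used in the slides


def _round_up(n_exact: float) -> int:
    """Round up to a whole respondent, ignoring tiny floating-point noise."""
    return math.ceil(n_exact - 1e-9)


def sample_size_for_proportion(margin_pct: float, pi_pct: float = 50.0,
                               z: float = Z_95) -> int:
    """Solve z * sqrt(pi * (100 - pi) / n) = margin for n (all in percent)."""
    return _round_up((z * math.sqrt(pi_pct * (100.0 - pi_pct)) / margin_pct) ** 2)


def sample_size_for_mean(sd: float, margin: float, z: float = Z_95) -> int:
    """Solve z * sd / sqrt(n) = margin for n."""
    return _round_up((z * sd / margin) ** 2)


def doubling_table(sd: float, start: int = 30, doublings: int = 10,
                   z: float = Z_95):
    """Return (n, SE, CI) tuples for n = start, 2*start, 4*start, ..."""
    rows = []
    for k in range(doublings):
        n = start * 2 ** k
        se = sd / math.sqrt(n)
        rows.append((n, se, z * se))
    return rows


if __name__ == "__main__":
    print("proportion, 2% margin:", sample_size_for_proportion(2.0))
    print("mean, sd=12, margin=1.5:", sample_size_for_mean(12.0, 1.5))
    previous_ci = None
    for n, se, ci in doubling_table(12.0):
        # Change relative to the previous (half-sized) sample, if there is one.
        change = "" if previous_ci is None else f"{ci - previous_ci:+.4f}"
        print(f"{n:>6} {se:6.2f} {ci:6.2f} {change}")
        previous_ci = ci
```

Running the script prints the two sample sizes followed by one line per doubling of 'n'. Each doubling shrinks the interval by the same factor of 1/√2 (about 29%), so the absolute gain gets smaller every time while the cost of surveying doubles.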