8 - Determining the Sample Size Necessary to Obtain a Desired Margin of Error When Estimating μ and p with a Confidence Interval

Population Mean (μ)

In handout 7 – Sampling Distributions we found that the interval from

$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \quad \text{up to} \quad \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$$

had a 95% chance of covering the population mean μ, provided we "knew" the population standard deviation σ. The margin of error for this interval is

$$\text{Margin of Error} = 1.96\,\frac{\sigma}{\sqrt{n}}$$

If we wanted our margin of error to be at most E units, what sample size should we use? Requiring $1.96\,\sigma/\sqrt{n} \le E$ and solving for n gives $\sqrt{n} \ge 1.96\,\sigma/E$, that is, $n \ge (1.96\,\sigma/E)^2$.

This says that to obtain a 95% CI for μ with a margin of error no larger than E we should use a sample size of

$$n = \left(\frac{1.96\,\sigma}{E}\right)^2$$

rounded up to the next whole number. However, we cannot calculate this in practice unless we know σ, which of course we don't; furthermore, we don't even know s, the sample standard deviation, until we have our data in hand. Thus in order to use this result we need to plug in a "best guess" for σ. This guess might come from:

- A pilot study where s = sample standard deviation is calculated
- Prior studies (literature reviews)
- An approximation based on the range, $\sigma \approx \text{Range}/4$. Granted, we don't know the range until the data is collected, but we might be able to guess the largest and smallest values we expect to see when we collect our data.

In general, using a σ which is too large is better than using one that is too small: a guess that is too large only makes the sample bigger than necessary, while a guess that is too small leaves the margin of error larger than E.

Example: What sample size would be necessary to estimate the mean cholesterol level for the population of females between the ages of 30 – 40 with a 95% confidence interval that has a margin of error no larger than 5 mg/dl?

Population Proportion (p)

In handout 7 – Sampling Distributions we found that the interval from

$$\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}} \quad \text{up to} \quad \hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}$$

had a 95% chance of covering the population proportion p. The margin of error for this interval is

$$\text{Margin of Error} = 1.96\sqrt{\frac{p(1-p)}{n}}$$

If we wanted this to be at most E units, what sample size should we use?
Requiring $1.96\sqrt{p(1-p)/n} \le E$ and solving for n gives $n \ge 1.96^2\,p(1-p)/E^2$. This says that to obtain a 95% CI for p with a margin of error no larger than E we should use a sample size of

$$n = \frac{1.96^2\,p(1-p)}{E^2}$$

rounded up to the next whole number. However, we cannot calculate this in practice unless we know p, which of course we don't; furthermore, we don't even know $\hat{p}$, the sample proportion, until we have our data in hand. In order to use this result we need to plug in a "best guess" for p. This guess might come from:

- A pilot study where $\hat{p}$ = sample proportion is calculated
- Prior studies
- The worst-case scenario, noting that $p(1-p) \le .25$ and is equal to .25 when p = .50. Using p = .50 simplifies the formula to

$$n = \frac{1.96^2}{4E^2}$$

If you have no "best guess" for p, this conservative approach is the one you should take.

Example: How many patients would be needed to estimate the success rate of a medical procedure, if researchers initially believe the success rate is no smaller than 85% and wish to estimate the true success rate using a 95% confidence interval with a margin of error no larger than E = .03? What if they wish to assume nothing about the success rate initially?
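The two sample-size formulas in this handout can be checked numerically with a short Python sketch. The function names `sample_size_mean` and `sample_size_proportion` are illustrative and do not come from the handout. For the proportion example, p = .85 is used as the best guess because it is the value in the believed range (85% or higher) that makes p(1 − p) largest. The handout gives no σ for the cholesterol example, so the value 40 mg/dl below is purely a hypothetical guess used to show the call.

```python
import math

Z_95 = 1.96  # multiplier the handout uses for 95% confidence


def sample_size_mean(sigma_guess, margin, z=Z_95):
    """Smallest whole n with z * sigma_guess / sqrt(n) <= margin."""
    return math.ceil((z * sigma_guess / margin) ** 2)


def sample_size_proportion(margin, p_guess=0.5, z=Z_95):
    """Smallest whole n with z * sqrt(p_guess * (1 - p_guess) / n) <= margin.

    The default p_guess = 0.5 is the conservative worst case,
    since p * (1 - p) is never larger than 0.25.
    """
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Proportion example: success rate believed to be at least 85%, E = .03.
n_with_guess = sample_size_proportion(0.03, p_guess=0.85)

# Same example, assuming nothing about the success rate (worst case p = .50).
n_worst_case = sample_size_proportion(0.03)

# Mean example with E = 5 mg/dl. ASSUMPTION: sigma_guess = 40 is a
# hypothetical value for illustration; the handout does not supply one.
n_mean = sample_size_mean(sigma_guess=40, margin=5)

print(n_with_guess)  # 545
print(n_worst_case)  # 1068
print(n_mean)        # 246 (under the hypothetical sigma of 40)
```

Rounding up with `math.ceil` keeps the margin of error at or below E; rounding down would let it slightly exceed E. Assuming nothing about p roughly doubles the required sample in this example, which is the price of the conservative approach.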