Chapter 7 Confidence Intervals and Sample Size

The inferences discussed in Chapters 5 and 6 were based on an a priori hypothesis that the researcher had about a population. However, there are times when researchers do not have a hypothesis. In such cases they would simply like a good estimate of the parameter.

Point estimate: a single statistic, such as $\hat{p}$ or $\bar{x}$, used to estimate the parameter.
Confidence interval: a point estimate ± margin of error.

Chapter 7 - Page 183

The logic behind the creation of confidence intervals can be demonstrated using the empirical rule, otherwise known as the 68-95-99.7 rule. Because the sampling distribution is approximately normal and centered at the parameter, about 95% of all possible statistics fall within 2 standard errors of the parameter. An interval extending 2 standard errors on either side of a randomly obtained statistic therefore contains the parameter about 95% of the time.

To determine a more precise critical value, use the probabilities in the z table to find the z-value. The critical value is found by first determining the area in one tail. The area in the left tail ($A_L$) is found by subtracting the degree of confidence from 1 and then dividing by 2:

$$A_L = \frac{1 - \text{degree of confidence}}{2}$$

For example, substituting into the formula for a 95% confidence interval produces

$$A_L = \frac{1 - 0.95}{2} = 0.025$$

The critical z value for an area of 0.025 to the left is –1.96. It is found in the row for -1.9 and the column for 0.06 of the z table below. The table gives the area to the left of z. The row gives z to one decimal place, and the column gives the second decimal place.

| z | 0.09 | 0.08 | 0.07 | 0.06 | 0.05 | 0.04 | 0.03 | 0.02 | 0.01 | 0.00 |
|---|---|---|---|---|---|---|---|---|---|---|
| -2.2 | 0.0110 | 0.0113 | 0.0116 | 0.0119 | 0.0122 | 0.0125 | 0.0129 | 0.0132 | 0.0136 | 0.0139 |
| -2.1 | 0.0143 | 0.0146 | 0.0150 | 0.0154 | 0.0158 | 0.0162 | 0.0166 | 0.0170 | 0.0174 | 0.0179 |
| -2.0 | 0.0183 | 0.0188 | 0.0192 | 0.0197 | 0.0202 | 0.0207 | 0.0212 | 0.0217 | 0.0222 | 0.0228 |
| -1.9 | 0.0233 | 0.0239 | 0.0244 | 0.0250 | 0.0256 | 0.0262 | 0.0268 | 0.0274 | 0.0281 | 0.0287 |
| -1.8 | 0.0294 | 0.0301 | 0.0307 | 0.0314 | 0.0322 | 0.0329 | 0.0336 | 0.0344 | 0.0351 | 0.0359 |
| -1.7 | 0.0367 | 0.0375 | 0.0384 | 0.0392 | 0.0401 | 0.0409 | 0.0418 | 0.0427 | 0.0436 | 0.0446 |

The critical z value of –1.96 is also called the 2.5th percentile. That means 2.5% of all possible standardized statistics (z-scores) fall below that value.

Critical values can also be found using a TI-84 calculator. Press 2nd DISTR, select 3:invNorm(, and enter invNorm(percentile, mean, standard deviation). For example, invNorm(0.025,0,1) gives –1.95996, which rounds to –1.96.

Find the critical z values for the following.

| Degree of Confidence | 0.90 | 0.95 | 0.99 |
|---|---|---|---|
| Area in Left Tail | | 0.025 | |
| z* | | 1.96 | |

Confidence intervals for means require a critical value, t*, which is found in the t table. These critical values depend on both the degree of confidence and the sample size, or more precisely, the degrees of freedom. For example, the t* value for a 95% confidence interval with 7 degrees of freedom is 2.365.

| Confidence level | 20% | 50% | 80% | 90% | 95% | 98% | 99% | 99.9% |
|---|---|---|---|---|---|---|---|---|
| One-tail probability | 0.4 | 0.25 | 0.1 | 0.05 | 0.025 | 0.01 | 0.005 | 0.0005 |
| Two-tail probability | 0.8 | 0.5 | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 | 0.001 |
| df = 5 | 0.267 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 6.869 |
| df = 6 | 0.265 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.959 |
| df = 7 | 0.263 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 5.408 |
| df = 8 | 0.262 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 | 5.041 |

A problem is that the standard errors of the sampling distributions contain parameters we do not know:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Therefore we have to estimate $p$ and $\sigma$, using $\hat{p}$ and $s$. The estimated standard errors then become:

$$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

| Parameter | Sampling Distribution of | Estimated Standard Error |
|---|---|---|
| Proportion for one population, $p$ | $\hat{p}$ | $s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ |
| Difference between proportions for two populations, $p_A - p_B$ | $\hat{p}_A - \hat{p}_B$ | $s_{\hat{p}_A - \hat{p}_B} = \sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}$ |
| Mean for one population, or mean difference for dependent data | $\bar{x}$ | $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ |
| Difference between means of two independent populations, $\mu_A - \mu_B$ | $\bar{x}_A - \bar{x}_B$ | $s_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{(n_A-1)s_A^2 + (n_B-1)s_B^2}{n_A+n_B-2}}\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$ |

The reasoning process for determining the formulas for the confidence intervals is the same in all cases.

1. Determine the degree of confidence. The most common are 95%, 99% and 90%.
2. Use the degree of confidence along with the appropriate table (z* or t*) to find the critical value.
3. Multiply the critical value by the standard error to find the margin of error.
4. The confidence interval is the statistic plus or minus the margin of error.

Notice that all the confidence intervals have the same format, even though some look more difficult than others:

statistic ± margin of error
statistic ± critical value × estimated standard error

Confidence interval for the proportion of one population:

$$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Confidence interval for the difference in proportions between two populations, where $\hat{q} = 1 - \hat{p}$:

$$(\hat{p}_A - \hat{p}_B) \pm z^*\sqrt{\frac{\hat{p}_A\hat{q}_A}{n_A} + \frac{\hat{p}_B\hat{q}_B}{n_B}}$$

Confidence interval for the mean of one population:

$$\bar{x} \pm t^*\frac{s}{\sqrt{n}}$$

Confidence interval for the difference between two independent means:

$$(\bar{x}_A - \bar{x}_B) \pm t^*\sqrt{\frac{(n_A-1)s_A^2 + (n_B-1)s_B^2}{n_A+n_B-2}}\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$

where $t^*$ is the appropriate percentile from the $t(n_A+n_B-2)$ distribution.

| | Proportions (categorical data) | Means (quantitative data) |
|---|---|---|
| 1 sample | $\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ | $\bar{x} \pm t^*\frac{s}{\sqrt{n}}$, df = $n - 1$ |
| Assumptions | $np \ge 5$ and $n(1-p) \ge 5$ | If $n < 30$, the population is approximately normally distributed. |
| Calculator (STAT > TESTS) | A: 1-PropZInt | 8: TInterval |
| 2 samples | $(\hat{p}_A - \hat{p}_B) \pm z^*\sqrt{\frac{\hat{p}_A\hat{q}_A}{n_A} + \frac{\hat{p}_B\hat{q}_B}{n_B}}$ | $(\bar{x}_A - \bar{x}_B) \pm t^*\sqrt{\frac{(n_A-1)s_A^2 + (n_B-1)s_B^2}{n_A+n_B-2}}\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$, df = $n_A + n_B - 2$ |
| Assumptions | $np \ge 5$ and $n(1-p) \ge 5$ for both populations | If $n < 30$, the populations are approximately normally distributed. |
| Calculator (STAT > TESTS) | B: 2-PropZInt | 0: 2-SampTInt (choose Pooled: Yes to match the pooled formula) |

What does a confidence interval mean? For a 95% confidence interval, 95% of all possible statistics are within z* (or t*) standard errors of the mean of the sampling distribution, which is the parameter being estimated. Before the sample is drawn, there is therefore a 95% probability that the randomly selected data will produce one of those statistics. In that case, the confidence interval created from it will contain the parameter. Whether a particular interval actually includes the parameter is unknown. We only know that if the sampling process were repeated a large number of times, producing many confidence intervals, about 95% of them would contain the parameter.

1. Find the 95% confidence interval for the proportion of households prepared for a natural disaster. Assume that a random sample of 900 households was taken. Of these, 98 claimed they are prepared. Can we conclude that more than 10% are prepared?

2. Find the 90% confidence interval for the difference between two proportions: the proportion of households in tornado/hurricane areas that are prepared for a disaster, and the proportion of households in earthquake areas that are prepared. Assume a random sample is taken from both populations. In tornado country, 122 of 800 households are prepared. In earthquake country, 98 of 900 are prepared.

3. Find the 99% confidence interval for the average daily caloric intake of US residents. Sample mean = 3250 calories, SD = 600 calories, n = 18.

4. Find the 95% confidence interval for the mean difference between a person's daily caloric intake during a diet and before the diet. The same six subjects were measured twice, so the data are dependent.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Before (calories) | 3820 | 3550 | 2840 | 4280 | 2960 | 2540 |
| During (calories) | 3760 | 3650 | 2530 | 3460 | 2960 | 2530 |
| During - Before | -60 | 100 | -310 | -820 | 0 | -10 |

5. Find the 99% confidence interval for the difference in daily caloric intake between Canadian residents and Americans. The table below shows the mean, standard deviation and sample size for the two samples (units: calories per day).

| | Canadians | Americans |
|---|---|---|
| Mean | 2950 | 3250 |
| Standard deviation | 550 | 600 |
| Sample size, n | 14 | 18 |

Sample Size Estimation

The margin of error for a proportion is

$$E = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Solving for $n$ gives $n = \frac{z^{*2}\,\hat{p}(1-\hat{p})}{E^2}$. Before the sample is taken, $\hat{p}$ is unknown. Using $\hat{p} = 0.5$ makes $\hat{p}(1-\hat{p})$ as large as possible (0.25), so the resulting sample size is large enough whatever $\hat{p}$ turns out to be:

$$n = \frac{0.25\,z^{*2}}{E^2} \quad \text{or} \quad n = \frac{z^{*2}}{4E^2}$$

Round the result up to the next whole number.

Example 2. Estimate the sample size needed for a national presidential poll if the desired margin of error is 3%. Assume a 95% degree of confidence.
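The sample-size formula can be checked numerically. The following is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The function names `z_critical` and `sample_size_for_proportion` are hypothetical helpers, not part of the textbook or the calculator workflow. The sketch finds z* from the left-tail area $(1 - \text{confidence})/2$ and applies $n = z^{*2}\hat{p}(1-\hat{p})/E^2$ with the conservative default $\hat{p} = 0.5$. It then rounds up.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Positive critical value z* for a two-sided interval.

    The left-tail area is A_L = (1 - confidence) / 2, so z* is the
    percentile that leaves A_L in the right tail.
    """
    area_left = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - area_left)


def sample_size_for_proportion(margin, confidence, p_hat=0.5):
    """Smallest whole n for which the margin of error is at most `margin`.

    Uses n = z*^2 * p_hat * (1 - p_hat) / E^2. The default p_hat = 0.5
    maximizes p_hat * (1 - p_hat) at 0.25, which gives n = z*^2 / (4 E^2).
    """
    z = z_critical(confidence)
    n = z ** 2 * p_hat * (1 - p_hat) / margin ** 2
    return math.ceil(n)  # round up so the margin of error is not exceeded


if __name__ == "__main__":
    # Example 2: 3% margin of error at 95% confidence, no prior estimate of p.
    print(sample_size_for_proportion(0.03, 0.95))  # prints 1068
```

For Example 2, $1.96^2 / (4 \times 0.03^2) \approx 1067.1$, which must be rounded up to 1068. Passing a prior estimate such as `p_hat=0.2` gives a smaller n. That is why 0.5 is the safe choice when nothing is known about the proportion.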
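The z* and t* critical values from the tables at the start of this chapter can also be reproduced in software, much as invNorm does on the TI-84. The sketch below is illustrative and is not how the printed tables were produced. For z*, it uses `statistics.NormalDist().inv_cdf` from the Python standard library (3.8+). The standard library has no t distribution, so t* is approximated instead: the sketch integrates the t density with Simpson's rule and then searches for the percentile by bisection. This numerical method is a swapped-in technique, and the helper names `z_star`, `t_pdf`, `t_cdf` and `t_star` are hypothetical.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Positive z* for a two-sided interval, using A_L = (1 - confidence) / 2."""
    area_left = (1 - confidence) / 2
    # inv_cdf(area_left) is the negative critical value, e.g. about -1.96 for 95%.
    return -NormalDist().inv_cdf(area_left)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) as 0.5 plus the integral of the density from 0 to x (Simpson's rule)."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_star(confidence, df):
    """Positive t* with P(T <= t*) = 1 - (1 - confidence) / 2, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for conf in (0.90, 0.95, 0.99):
        print(conf, round(z_star(conf), 3))
    print(round(t_star(0.95, 7), 3))  # close to the table value 2.365
```

The loop prints z* for the three degrees of confidence in the critical-value exercise. `t_star(0.95, 7)` agrees with the df = 7, 95% entry of the t table. In practice a statistics package or the calculator would be used; the numerical route here only keeps the example free of third-party dependencies.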
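All four interval formulas in this chapter follow the pattern statistic ± critical value × estimated standard error. The sketch below encodes that pattern once in a shared helper and applies it to some of the exercises. It is a minimal Python sketch with hypothetical function names (`ci`, `one_proportion_ci`, `two_proportion_ci`, `one_mean_ci`, `pooled_two_mean_ci`). z* comes from `statistics.NormalDist`. t* is passed in by the caller, to be read from the t table or a calculator. The two-sample mean function assumes the pooled standard error given in this chapter.

```python
import math
from statistics import NormalDist, mean, stdev


def z_star(confidence):
    """Positive critical value z* with (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def ci(statistic, critical_value, std_error):
    """statistic ± critical value × estimated standard error."""
    margin = critical_value * std_error
    return statistic - margin, statistic + margin


def one_proportion_ci(successes, n, confidence):
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return ci(p_hat, z_star(confidence), se)


def two_proportion_ci(x_a, n_a, x_b, n_b, confidence):
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return ci(p_a - p_b, z_star(confidence), se)


def one_mean_ci(x_bar, s, n, t_crit):
    """t_crit is t* with df = n - 1, read from a t table or calculator."""
    return ci(x_bar, t_crit, s / math.sqrt(n))


def pooled_two_mean_ci(x_a, s_a, n_a, x_b, s_b, n_b, t_crit):
    """t_crit is t* with df = n_a + n_b - 2; uses the pooled standard error."""
    pooled_var = ((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n_a + 1 / n_b)
    return ci(x_a - x_b, t_crit, se)


if __name__ == "__main__":
    # Exercise 1: 98 of 900 households prepared, 95% confidence.
    print(one_proportion_ci(98, 900, 0.95))
    # Exercise 2: 122/800 (tornado) vs 98/900 (earthquake), 90% confidence.
    print(two_proportion_ci(122, 800, 98, 900, 0.90))
    # Exercise 4: dependent data, so use the During - Before differences
    # with t* = 2.571 from the df = 5, 95% entry of the t table.
    diffs = [-60, 100, -310, -820, 0, -10]
    print(one_mean_ci(mean(diffs), stdev(diffs), len(diffs), 2.571))
```

For Exercise 1 the interval is roughly (0.089, 0.129). It includes 0.10, so these data do not let us conclude that more than 10% of households are prepared. Exercise 4 is treated as one sample of differences. This matches the "mean difference for dependent data" row of the standard-error table.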
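The repeated-sampling meaning of a confidence interval, described earlier in this chapter, can be illustrated by simulation. The idea is to draw many random samples from a population whose parameter is known and build a 95% interval from each. We then count how often the intervals contain the parameter. The sketch below assumes a hypothetical population in which 11% of households are prepared, sampled 900 at a time. These numbers are illustrative only and are not data from the chapter. It uses only the Python standard library, with a fixed seed so the result is reproducible. The function name `coverage_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(p, n, confidence, repetitions, seed=0):
    """Fraction of simulated one-proportion intervals that contain the true p."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(repetitions):
        # One random sample of n households: each is "prepared" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    # Hypothetical population: p = 0.11, samples of 900, 95% intervals.
    print(coverage_rate(p=0.11, n=900, confidence=0.95, repetitions=2000))
```

The printed proportion should be close to 0.95. Any single interval either contains p or it does not, but in the long run about 95% of the intervals do.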