Math 3680 Lecture #15

Confidence Intervals

Review: Suppose that $E(X) = \mu$ and $SD(X) = \sigma$. Recall the following two facts about the average $\bar{X}$ of $n$ observations drawn with replacement:

$E(\bar{X}) = \mu, \qquad SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \sigma_{\bar{X}}$

Estimation

Example: A university has 25,000 registered students. In a survey of 318 students, the average age of the sample is found to be 22.4, with a sample SD of 4.5 years. Estimate the average age of all 25,000 students, and attach a standard error to this estimate.

Wrong Answer: The average age of the student body is exactly 22.4 years.

What is wrong with this simplistic analysis?

Answer: Of course, we estimate the average of the population to be 22.4 years, but this estimate will not be exact. To determine the magnitude of the error, we need to find the SE, and that means a box model.

[Box model: 25,000 tickets, Average = ??, SD = ??; 318 draws]

Bootstrap Estimation: Although the SD of the box is unknown, we estimate it by the SD of the sample:

SD of box $\approx 4.5$

SE of the sample average $\approx \sqrt{\frac{25000 - 318}{25000 - 1}} \cdot \frac{4.5}{\sqrt{318}} \approx 0.251$. (Why?)

Conclusion: The average age is about 22.4 years, give or take 0.251 years or so.

Confidence Intervals: Large samples or known $\sigma$

[Figure: normal curve with the central 68% shaded, between -0.994458 and 0.994458]

We say that the range 22.4 ± 0.251 years = 22.149 to 22.651 years is a 68% confidence interval for the average age of the population.

[Figure: normal curve with the central 95% shaded, between -1.95996 and 1.95996]

We say that the range 22.4 ± (1.96)(0.251) years = 21.909 to 22.891 years is a 95% confidence interval for the average age of the population.

[Figure: normal curve with the central 99.7% shaded, between -2.96774 and 2.96774]

We say that the range 22.4 ± (2.968)(0.251) years = 21.656 to 23.144 years is a 99.7% confidence interval for the average age of the population.

[Figure: normal curve with the central area $1 - 2\alpha$ shaded, between $z_\alpha$ and $z_{1-\alpha}$]

In general, we say that the range

$\bar{X} + z_\alpha \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha} \frac{\sigma}{\sqrt{n}}$

is a $1 - 2\alpha$ confidence interval for the population average $\mu$.
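The student-age calculation above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names `se_of_average` and `z_interval` are made up for this sketch, and the z-values come from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def se_of_average(sample_sd, n, population_size=None):
    """Standard error of a sample average, using the sample SD in place of the box SD.

    If population_size is given, the draws are treated as made without
    replacement and the finite-population correction factor is applied.
    """
    se = sample_sd / sqrt(n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


def z_interval(sample_mean, se, level):
    """Large-sample confidence interval: sample_mean +/- z * se."""
    # level = 1 - 2a, so the upper cutoff z_{1-a} sits at probability 0.5 + level / 2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return sample_mean - z * se, sample_mean + z * se


# Numbers from the student-age example (hypothetical variable names)
se = se_of_average(4.5, 318, population_size=25000)
print(f"SE = {se:.3f}")
for level in (0.68, 0.95, 0.997):
    low, high = z_interval(22.4, se, level)
    print(f"{level:.1%} interval: {low:.3f} to {high:.3f} years")
```

With the lecture's numbers this should give an SE of about 0.251 and a 95% interval close to 21.909 to 22.891 years. The 68% interval comes out slightly narrower than 22.4 ± 0.251, because the lecture rounds the cutoff 0.994458 up to 1.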
Logic:

$P\left(z_\alpha < Z < z_{1-\alpha}\right) = 1 - 2\alpha$

$P\left(z_\alpha < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{1-\alpha}\right) = 1 - 2\alpha$

$P\left(z_\alpha \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{1-\alpha} \frac{\sigma}{\sqrt{n}}\right) = 1 - 2\alpha$

$P\left(z_\alpha \frac{\sigma}{\sqrt{n}} < \mu - \bar{X} < z_{1-\alpha} \frac{\sigma}{\sqrt{n}}\right) = 1 - 2\alpha$ (multiply through by $-1$ and use the symmetry $z_\alpha = -z_{1-\alpha}$)

$P\left(\bar{X} + z_\alpha \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha} \frac{\sigma}{\sqrt{n}}\right) = 1 - 2\alpha$

Observations:

1) We are NOT saying that 95% of the students are between 21.9 and 22.9 years old. This is patently ridiculous, of course.

2) We are NOT saying that there is a 95% chance that the average age is between 21.9 and 22.9 years. The population average is constant: it is either in this range or it is not.

3) The true interpretation is as follows: if many people independently run this experiment and each finds a 95% confidence interval, then the true population parameter will lie in about 95% of these intervals.

[Figure: 100 different 95% confidence intervals]

[Figure: 100 different 68% confidence intervals]

[Figure: 100 different 95% confidence intervals, n = 4 × 318 = 1272]

4) In the previous problem, we replaced the population SD $\sigma$ with the sample SD $s$. (When did we do this?) As it turns out, this makes little practical difference for large samples. More on this later when we consider small samples.

5) The normal approximation has been used. As discussed earlier, a large number of draws is required for this assumption to hold.

6) Remember: There is no such thing as a 100% confidence interval. In practice, scientists often use 95% as a balance between a high confidence level and a narrow confidence interval.

Example: In a simple random sample of 680 households (in a city of millions), the average number of TV sets is 1.86, with an SD of 0.80. Find a 95% confidence interval for the average number of TV sets per household in the city.

True or false:
(i) 1.86 ± 0.06 is a 95% confidence interval for this population average.
(ii) 1.86 ± 0.06 is a 95% confidence interval for this sample average.
(iii) There is a 95% chance for the population average to be in the range 1.86 ± 0.06.

Example: The chart below shows platelet counts among 120 geriatric patients.
Find a 95% confidence interval for the average platelet count among geriatric patients.

132 127 214 184 181 211 190 139 112 105 174 143 135 185 120 235 142 129 134 154 117 125 194 163 181 108 212 129 126 256 106 142 110 114 143 129 125 203 168 162 176 198 131 129 125 254 228 174 125 142 194 104 107 188 179 198 184 115 229 103 126 208 208 138 123 244 139 108 142 175 181 184 137 106 178 150 238 101 169 105 142 105 101 110 117 139 147 106 115 131 196 112 111 102 124 180 111 178 148 125 120 146 139 247 176 179 170 141 147 119 232 141 112 104 242 187 129 133 185 151

Fill in the blanks with either box or draws.
Probabilities are used when reasoning from the __________ to the _____________.
Confidence levels are used when reasoning from the ____________ to the ______________.

Fill in the blank with either observed or expected.
The chance error is in the _______________ value.

Fill in the blank with either sample or population.
The confidence level is for the ______________ average.

Confidence Intervals: Projecting Sample Size

Example: In a preliminary simple random sample of 680 households (in a city of millions), the average number of TV sets in the sample households is 1.86, with an SD of 0.80. Suppose that it’s desired to construct a 90% confidence interval which has a margin of error of 0.03. How large a sample would be necessary?

Solution:

[Figure: normal curve with the central 90% shaded, between -1.64485 and 1.64485]

$z_{0.95} \cdot \frac{0.8}{\sqrt{n}} \le 0.03$

$1.645 \cdot \frac{0.8}{\sqrt{n}} \le 0.03$

$43.867 \le \sqrt{n}$

$1924.3 \le n$

So, the sample size should be at least 1925.

Confidence Intervals: Small samples

Example: A biological research team measures the weights of 14 chipmunks, randomly chosen. Find a 90% confidence interval for the average weight of chipmunks.

7.6 8.2 8.66 9.41 8.45 8.08 8.86 7.48 9.24 9.34 9.58 10.1 8.55 9.15

Note: The previous calculations used the fact that

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

approximately follows the normal curve for large values of $n$. In this problem, we cannot use this approximation.
However, for both small and large samples, we can use the fact that

$\frac{\bar{X} - \mu}{S/\sqrt{n}}$

approximately follows the Student’s t-distribution with $n - 1$ degrees of freedom.

[Figure: curve with the central area $1 - 2\alpha$ shaded, between $t_{n-1,\alpha}$ and $t_{n-1,1-\alpha}$]

In general, we say that the range

$\bar{X} + t_{n-1,\alpha} \frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha} \frac{s}{\sqrt{n}}$

is a $1 - 2\alpha$ confidence interval for the population average $\mu$.

[Figure: curve with the central 90% shaded, between -1.77093 and 1.77093. Excel: TINV(0.1, 13)]

Therefore, the 90% confidence interval is

$8.76 \pm 1.77093 \cdot \frac{0.76}{\sqrt{14}}$,

or 8.40 to 9.12 ounces.

Note: Be sure you look up the correct number on the table in the back of the book. The numbers at the bottom of Table 4 specify the two-sided confidence levels.

Example: Duracell tests 12 batteries in flashlights. They determine that the average life of the batteries in this sample is 3.58 hours, with a sample SD of 1.58 hours. Find a 95% confidence interval for the average life of a Duracell battery in a flashlight.

Repeat if 100 batteries were tested (with the same sample mean and SD as above).

Note: In previous lectures, we considered another technique of inferring information about the box from the draws, namely hypothesis testing. Confidence intervals provide a method of estimating the average of the box. Hypothesis testing checks if the difference between the supposed box average and the sample average is either real or due to chance.
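As a check on the small-sample method, the chipmunk interval worked earlier can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the lecture's own procedure: the helper name `t_interval` is made up here, and the critical value 1.77093 is simply copied from the lecture (Excel's TINV(0.1, 13)) rather than computed, since Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Chipmunk weights (ounces) from the lecture example
weights = [7.6, 8.2, 8.66, 9.41, 8.45, 8.08, 8.86,
           7.48, 9.24, 9.34, 9.58, 10.1, 8.55, 9.15]

# Two-sided 90% cutoff for 13 degrees of freedom, as quoted in the lecture
T_CRIT_90_DF13 = 1.77093


def t_interval(data, t_crit):
    """Confidence interval for the average: mean +/- t_crit * s / sqrt(n).

    stdev() is the sample SD (divisor n - 1), which is the s used with
    the t-distribution on n - 1 degrees of freedom.
    """
    n = len(data)
    center = mean(data)
    half_width = t_crit * stdev(data) / sqrt(n)
    return center - half_width, center + half_width


low, high = t_interval(weights, T_CRIT_90_DF13)
print(f"90% interval: {low:.2f} to {high:.2f} ounces")
```

The result should agree with the 8.40 to 9.12 ounces found above. For a different sample size or confidence level, the critical value would have to be looked up again for the matching degrees of freedom.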