HON 180 Confidence Intervals for Averages

1. A utility company serves 50,000 households. As part of a survey of customer attitudes, they take a simple random sample of 750 of these households. The average number of television sets in the sample households turns out to be 1.86, and the SD is 0.80. If possible, find a 95%-confidence interval for the average number of television sets in all 50,000 households. If this isn’t possible, explain why not.
Answer. $1.86 \pm 2 \times \frac{\sqrt{750} \times 0.8}{750} \approx 1.86 \pm 0.06$, or 1.80 to 1.92.

2. Out of the 750 households in the survey in the previous exercise, 451 have computers. If possible, find a 90%-confidence interval for the percentage of all the 50,000 households with computers. If this isn’t possible, explain why not.
Answer. $\frac{451}{750} \approx 0.60$. So the 90% confidence interval is $0.60 \pm 1.64 \times \frac{\sqrt{750} \times \sqrt{0.6 \times 0.4}}{750} \approx 0.60 \pm 0.03$, or about 57% to 63%.

3. Out of the 750 households in the survey in the first question, 749 have at least one television set. If possible, find a 95%-confidence interval for the percentage of all the 50,000 households with at least one television set. If this isn’t possible, explain why not.
Answer. It is not possible because the box model in this case is very skewed. A sample of 750 is not large enough for the distribution of sample percentages to be normal. All of our confidence interval calculations rely on the distribution of sample percentages being normal. The Central Limit Theorem guarantees that if the sample size is large enough, the distribution of sample percentages will be normal. However, if the parent population is very skewed, the sample size needs to be really large (in the thousands).

4. As part of the survey described in exercise 1, all persons age 16 and over in the 750 sample households are interviewed. This makes 1,528 people. On average, the sample people watched 5.20 hours of television that Sunday before the survey, and the SD was 4.50 hours.
If possible, find a 95%-confidence interval for the average number of hours spent watching television on that Sunday by all persons age 16 and over in the 50,000 households. If this isn’t possible, explain why not.
Answer. It is not possible because this sample is a cluster sample, not a simple random sample: the people were not drawn individually, but came along with their households. We do not have methods to calculate the SE for cluster samples.

5. (a) As his sample, a psychology instructor takes all the students in his class. Is this a probability sample? A cluster sample?
Answer. It is neither a probability sample nor a cluster sample of a well-defined population. It is a particular group of students who chose to take a certain course at a certain time from a particular instructor.
(b) A sociologist interviews the first 100 subjects who walk through a shopping mall one day. Does she have a probability sample? A cluster sample?
Answer. It is neither a probability sample nor a cluster sample. It is a convenience sample of shoppers.

6. One year, there were about 3000 institutions of higher learning in the U.S. As part of a continuing study of higher education, the Carnegie Commission took a simple random sample of 400 of these institutions. The average enrollment in the 400 sample schools was 3700, and the SD was 6500. The Commission estimates the average enrollment of all 3000 institutions to be around 3700; they put a give-or-take number of 325 on this estimate. Say whether each of the following statements is true or false, and explain. If you need more information to decide, say what you need and why.
(a) An approximate 68%-confidence interval for the average enrollment of all the 3000 institutions runs from 3375 to 4025.
True. The SE is indeed $\frac{\sqrt{400} \times 6500}{400} = 325$, and 3700 ± 325 gives 3375 to 4025.
(b) If a statistician takes a simple random sample of 400 institutions out of 3000, and goes one SE either way from the average enrollment of the 400 sample schools, there is about a 68% chance that his interval will cover the average enrollment of all the 3000 schools.
True. This statement accurately describes what a 68% confidence interval tells you.
(c) The enrollments of all 3000 institutions follow the normal curve.
False. The sample had a mean of 3700 and an SD of 6500. We can take 6500 as an estimate of the SD for the 3000 institutions. But if the SD is 6500 and the distribution is normal, then we would expect approximately 16% of the institutions to have enrollments less than about 3700 − 6500 = −2800, which is impossible.
(d) About 68% of the schools in the sample had enrollments in the range 3700 ± 6500.
We need more information, namely the enrollments of the sample schools themselves. It is possible that 68% of the schools in the sample are in that range, but because the data do not follow the normal curve, the 68% rule cannot be relied on here.
(e) It is estimated that 68% of the 3000 institutions of higher learning in the U.S. enrolled between 3700 − 325 = 3375 and 3700 + 325 = 4025 students.
False. The 68% probability in a “68%-confidence interval” refers to the probability that the process of choosing a random sample and forming a confidence interval from it will yield an interval that includes the unknown population parameter. The 68% does not refer to the parent population. Furthermore, the SD is about 6500, so it is impossible for 68% of the school enrollments to lie in the narrow band from 3375 to 4025.
(f) The normal curve can’t be used to figure confidence intervals here at all because the data don’t follow the normal curve.
False. While it is true that the data don’t follow the normal curve, we can in fact use the normal curve to figure confidence intervals here, because the Central Limit Theorem tells us the sample averages will follow the normal curve even when the parent population is not normal, as long as the sample size is large enough. A sample of size 400 is large enough here for the normal approximation to apply.
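
The arithmetic behind the intervals in exercises 1, 2, and 6(a) can be checked with a short Python sketch. The function names (`se_of_average`, `se_of_proportion`, `interval`) are illustrative choices, not part of the course materials. Following the worked answers, the sketch uses the multipliers 2, 1.64, and 1 from the solutions and applies no correction factor for sampling without replacement.

```python
import math


def se_of_average(sd, n):
    """SE for the average of a simple random sample: sqrt(n) * SD / n."""
    return math.sqrt(n) * sd / n


def se_of_proportion(p, n):
    """SE for a sample proportion, using the 0-1 box SD sqrt(p * (1 - p))."""
    return math.sqrt(n) * math.sqrt(p * (1 - p)) / n


def interval(estimate, se, z):
    """Return (low, high) = estimate minus/plus z standard errors."""
    margin = z * se
    return estimate - margin, estimate + margin


# Exercise 1: average number of TV sets, 95% interval (multiplier 2).
tv_low, tv_high = interval(1.86, se_of_average(0.80, 750), z=2)

# Exercise 2: proportion of households with computers, 90% interval.
p_hat = 451 / 750
pc_low, pc_high = interval(p_hat, se_of_proportion(p_hat, 750), z=1.64)

# Exercise 6(a): average enrollment, 68% interval (one SE either way).
enroll_se = se_of_average(6500, 400)
en_low, en_high = interval(3700, enroll_se, z=1)

print(f"Exercise 1: {tv_low:.2f} to {tv_high:.2f}")
print(f"Exercise 2: {100 * pc_low:.0f}% to {100 * pc_high:.0f}%")
print(f"Exercise 6(a): SE = {enroll_se:.0f}, {en_low:.0f} to {en_high:.0f}")
```

The same helpers should not be applied to exercises 3 and 4: the formulas assume a simple random sample that is large enough for the normal approximation, which is exactly what fails in those two cases.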