HON 180 Confidence Intervals for Averages

1. A utility company serves 50,000 households. As part of a survey of customer attitudes,
they take a simple random sample of 750 of these households. The average number of
television sets in the sample households turns out to be 1.86, and the SD is 0.80. If
possible, find a 95%-confidence interval for the average number of television sets in all
50,000 households. If this isn’t possible, explain why not.
Answer. $1.86 \pm 2 \times \frac{\sqrt{750} \times 0.8}{750} \approx 1.86 \pm 0.06$, or 1.80 to 1.92.
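The arithmetic in this answer can be checked with a short Python sketch. The function name `mean_confidence_interval` is hypothetical, and the multiplier 2 for 95% confidence is taken from the answer above.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, z=2.0):
    """Return (low, high) for sample_mean +/- z * SE of the average."""
    # SE for the sum is sqrt(n) * SD; dividing by n gives the SE for the average.
    se_average = math.sqrt(n) * sample_sd / n
    margin = z * se_average
    return sample_mean - margin, sample_mean + margin

tv_low, tv_high = mean_confidence_interval(1.86, 0.80, 750)
print(f"{tv_low:.2f} to {tv_high:.2f}")  # prints "1.80 to 1.92"
```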
2. Out of the 750 households in the survey in the previous exercise, 451 have computers. If
possible, find a 90%-confidence interval for the percentage of all the 50,000 households
with computers. If this isn’t possible, explain why not.
Answer. $\frac{451}{750} \approx 0.60$. So the 90% confidence interval is
$0.60 \pm 1.64 \times \frac{\sqrt{750} \times \sqrt{0.6 \times 0.4}}{750} \approx 0.60 \pm 0.03$, or about 57% to 63%.
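As a rough check of this calculation, the following Python sketch repeats it with the unrounded sample fraction. The function name `percentage_confidence_interval` is hypothetical, and the multiplier 1.64 for 90% confidence is the one used in the answer above.

```python
import math

def percentage_confidence_interval(count, n, z):
    """Return (low, high) as fractions for the sample fraction +/- z * SE."""
    p = count / n
    sd_box = math.sqrt(p * (1 - p))   # SD of a 0-1 box, estimated from the sample
    se_fraction = sd_box / math.sqrt(n)
    return p - z * se_fraction, p + z * se_fraction

pc_low, pc_high = percentage_confidence_interval(451, 750, z=1.64)
print(f"about {pc_low:.0%} to {pc_high:.0%}")  # prints "about 57% to 63%"
```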
3. Out of the 750 households in the survey in the first question, 749 have at least one
television set. If possible, find a 95%-confidence interval for the percentage of all the
50,000 households with at least one television set. If this isn’t possible, explain why not.
Answer. It is not possible because the box model in this case is very skewed. A sample
of 750 is not large enough for the distribution of sample percentages to be approximately
normal. All of our confidence interval calculations rely on the distribution of sample
percentages being approximately normal. The Central Limit Theorem guarantees that if the
sample size is large enough, the distribution of sample percentages will be approximately
normal. However, if the parent population is very skewed, the sample size needs to be
really large (in the thousands).
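One way to see the problem is to apply the usual formula anyway and look at the result. The sketch below is only an illustration of why the normal approximation fails here, not a valid interval; the variable names are hypothetical.

```python
import math

n_households = 750
with_tv = 749

p_tv = with_tv / n_households
se_tv = math.sqrt(p_tv * (1 - p_tv)) / math.sqrt(n_households)

# Naive "95%" interval: sample fraction +/- 2 SE.
naive_low = p_tv - 2 * se_tv
naive_high = p_tv + 2 * se_tv

print(f"naive interval: {naive_low:.2%} to {naive_high:.2%}")
print("upper end is above 100%:", naive_high > 1.0)
```

The upper end of the naive interval lands above 100%, which no percentage can reach, so the symmetric normal-curve interval does not describe the chance variability in this sample percentage.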
4. As part of the survey described in exercise 1, all persons age 16 and over in the 750
sample households are interviewed. This makes 1,528 people. On average, the sample
people watched 5.20 hours of television on the Sunday before the survey, and the SD was
4.50 hours. If possible, find a 95%-confidence interval for the average number of hours
spent watching television on that Sunday by all persons age 16 and over in the 50,000
households. If this isn’t possible, explain why not.
Answer. It is not possible because this sample is a cluster sample, not a simple random
sample. We do not have methods to calculate the SE for cluster samples.
5. (a) As his sample, a psychology instructor takes all the students in his class. Is this a
probability sample? A cluster sample?
Answer. It is neither a probability sample nor a cluster sample of a well-defined
population. It is a particular group of students who are choosing to take a certain course
at a certain time from a particular instructor.
(b) A sociologist interviews the first 100 subjects who walk through a shopping mall one
day. Does she have a probability sample? A cluster sample?
Answer. It is neither a probability sample nor a cluster sample. It is a convenience
sample of shoppers.
6. One year, there were about 3000 institutions of higher learning in the U.S. As part of a
continuing study of higher education, the Carnegie Commission took a simple random
sample of 400 of these institutions. The average enrollment in the 400 sample schools
was 3700, and the SD was 6500. The Commission estimates the average enrollment of
all 3000 institutions to be around 3700; they put a give-or-take number of 325 on this
estimate. Say whether each of the following statements is true or false, and explain. If
you need more information to decide, say what you need and why.
(a) An approximate 68%-confidence interval for the average enrollment of all the 3000
institutions runs from 3375 to 4025.
True. The SE is indeed $\frac{\sqrt{400} \times 6500}{400} = 325$.
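A minimal Python sketch of this SE calculation and the resulting one-SE interval, using only the numbers given in the exercise (the variable names are hypothetical):

```python
import math

n_schools = 400
sample_average = 3700
sample_sd = 6500

# SE for the average = (sqrt(n) * SD) / n
se_enrollment = math.sqrt(n_schools) * sample_sd / n_schools

interval_68 = (sample_average - se_enrollment, sample_average + se_enrollment)
print(se_enrollment)   # prints 325.0
print(interval_68)     # prints (3375.0, 4025.0)
```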
(b) If a statistician takes a simple random sample of 400 institutions out of 3000, and
goes one SE either way from the average enrollment of the 400 sample schools, there is
about a 68% chance that his interval will cover the average enrollment of all the 3000
schools.
True. This statement accurately describes what a 68% confidence interval tells you.
(c) The enrollments of all 3000 institutions follow the normal curve.
False. The sample had a mean of 3700 and an SD of 6500. We can take 6500 as an
estimate of the SD for the 3000 institutions. But if the SD is 6500 and the distribution is
normal, then we would expect approximately 16% of the institutions to have enrollments
less than about 3700 − 6500 = −2800, which is impossible.
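The argument can be checked numerically. The sketch below assumes, for the sake of contradiction, that enrollments follow a normal curve with mean 3700 and SD 6500, and uses Python's standard-library `statistics.NormalDist` to find the share of institutions that curve would place below one SD under the mean and below zero.

```python
from statistics import NormalDist

# Hypothetical normal curve with the sample's mean and SD.
enrollment_curve = NormalDist(mu=3700, sigma=6500)

# Share of the curve more than one SD below the mean.
share_below_one_sd = enrollment_curve.cdf(3700 - 6500)

# Share of the curve below zero students.
share_below_zero = enrollment_curve.cdf(0)

print(f"below one SD under the mean: {share_below_one_sd:.1%}")
print(f"below zero: {share_below_zero:.1%}")
```

Under that assumption, roughly 16% of institutions would fall more than one SD below the mean and more than a quarter would have negative enrollment, so the normal curve cannot describe these data.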
(d) About 68% of the schools in the sample had enrollments in the range 3700 ± 6500.
We need more information. It is possible that 68% of the schools in the sample are in
that range.
(e) It is estimated that 68% of the 3000 institutions of higher learning in the U.S. enrolled
between 3700 – 325 = 3375 and 3700 + 325 = 4025 students.
False. The 68% probability in a “68%-confidence interval” refers to the probability that
the process of choosing a random sample and forming a confidence interval from it will
yield an interval that includes the unknown population parameter. The 68% does not
refer to the parent population. Furthermore, the SD is about 6500, so it is
implausible that 68% of the school enrollments lie in the narrow band from 3375 to 4025.
(f) The normal curve can’t be used to figure confidence intervals here at all because the
data don’t follow the normal curve.
False. While it is true that the data don’t follow the normal curve, we can in fact use the
normal curve to figure confidence intervals here, because the Central Limit Theorem tells
us that the sample averages will approximately follow the normal curve even when the
parent population is not normal, as long as the sample size is large enough. A sample of
size 400 is large enough here for the normal approximation to apply.
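This claim can be explored with a small simulation. The sketch below builds a made-up, strongly right-skewed population of 3000 values (an assumption for illustration only, not the Carnegie data), draws many simple random samples of 400, and counts how often the sample average plus or minus one SE covers the population average.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population of 3000 "enrollments" (exponential, mean near 3700).
population = [random.expovariate(1 / 3700) for _ in range(3000)]
true_average = sum(population) / len(population)

def one_se_interval_covers(sample):
    """Return True if sample average +/- 1 SE covers the population average."""
    n = len(sample)
    avg = sum(sample) / n
    sd = math.sqrt(sum((x - avg) ** 2 for x in sample) / n)
    se = sd / math.sqrt(n)
    return avg - se <= true_average <= avg + se

trials = 2000
hits = sum(one_se_interval_covers(random.sample(population, 400))
           for _ in range(trials))
coverage = hits / trials
print(f"share of intervals covering the true average: {coverage:.1%}")
```

Even though the population is far from normal, the share of covering intervals should come out in the neighborhood of 68%. It tends to run slightly higher because the samples are drawn without replacement from only 3000 values, which makes the plain SE a little too large.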