Problem Set 5

Department of Urban Studies and Planning
Massachusetts Institute of Technology
11.220 Quantitative Reasoning and Statistical Methods for Planning I
Spring 1999
Homework Set #5 Solutions
Comparison and Explanation—
Experimental Design, Statistical Hypothesis Testing
Question 1¹
In a particular community, the mean water usage per household for the month of January
1993 was .060 acre-feet—an acre-foot is the amount of water necessary to cover one acre
to a depth of one foot. This figure is based on the usage of ALL households in the
community during that month. During 1994 a countywide water conservation campaign
was conducted. In January 1995, a random sample of 50 homes was selected, and water
usage was recorded for each home in the sample. The mean water usage per household in
the sample was .054 acre-feet with a standard deviation (s) of .016 acre-feet.
[6]
(a)
The county supervisors would like to know whether the data support the claim
that mean January water usage has declined. Has it? Conduct an appropriate
statistical hypothesis test and draw whatever conclusion you feel is justified by the
data.
This is a one-tailed hypothesis-test problem:
$H_0: \mu = 0.060$
$H_a: \mu < 0.060$
where $\mu$ = true mean household water usage for January 1995.
Since this is a lower-tailed test, we compute the P-value by calculating the value of the z test
statistic
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

¹ This question is adapted from Example 9.10, page 317 of Jay Devore and Roxy Peck, Statistics: The Exploration
and Analysis of Data, third edition (Pacific Grove, California: Duxbury Press, 1997).


$z = \frac{0.054 - 0.060}{0.016/\sqrt{50}} = \frac{-0.006}{0.0023} \approx -2.61$
This result implies that the observed mean of 0.054 is roughly 2.6 standard deviations below the
hypothesized value. The P-value is then equal to the area under the z-curve and to the left of
-2.61. From the z-table (Table II of Weiss), the P-value = 0.0045.
Using a 0.01 significance level, Ho would be rejected (since 0.0045 < 0.01), suggesting that the
mean usage was lower. Notice that the rejection of Ho would not be justified if a very small
significance level such as 0.001 had been selected.
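The lower-tailed z test in part (a) can be checked with a few lines of Python. This is a minimal sketch that uses only the standard library (the normal CDF is written with `math.erf`); the function names are illustrative, not from the text. One small difference: the sketch carries the unrounded standard error (0.016/√50 ≈ 0.00226 rather than 0.0023), so it gives z ≈ -2.65 and a P-value of about 0.004 instead of -2.61 and 0.0045. The decisions at both significance levels are unchanged.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lower_tailed_z_test(xbar, mu0, s, n):
    """Large-sample z test of H0: mu = mu0 against Ha: mu < mu0.

    Returns the z statistic and the lower-tail P-value, P(Z <= z).
    """
    standard_error = s / math.sqrt(n)
    z = (xbar - mu0) / standard_error
    return z, normal_cdf(z)

# January 1995 sample from Question 1
z, p_value = lower_tailed_z_test(xbar=0.054, mu0=0.060, s=0.016, n=50)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject H0 at alpha = 0.01:", p_value < 0.01)
print("Reject H0 at alpha = 0.001:", p_value < 0.001)
```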
[6]
(b)
Of course, the county supervisors would like to conclude even more than simply
that water usage has gone down. Comment briefly on the experimental design
used here. How could it have been improved? What would that improvement
have accomplished?
Since water usage often depends on income and education, a simple random sample may
not be the best basis for formulating an environmental policy for the area. Indeed, the elasticity of
demand for water may vary considerably as the socioeconomic texture of a neighborhood
changes. It would thus be more appropriate to use stratified random sampling by household
income level or by education. It is also important to have a control in the study to offer some
insights regarding causality. Therefore, it may be useful to compare the changes in water
consumption rates in a nearby community with similar geographic and hydro-geological
characteristics where a formal conservation program was not instituted. Such a control would
help us determine whether the conservation program was actually a contributing factor in the
change or whether external factors were responsible for the shift in water usage.
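Stratified random sampling, as suggested above, can be sketched as follows: group the households by stratum, split the total sample size across strata in proportion to their sizes, and draw a simple random sample inside each stratum. This is an illustrative sketch only; the household frame, the three income bands, and their sizes are hypothetical, and proportional allocation (with largest-remainder rounding so the counts add up to n) is one common choice among several.

```python
import random
from collections import Counter, defaultdict

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Uses largest-remainder rounding so the allocations always sum to n.
    """
    total = sum(stratum_sizes.values())
    quotas = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

def stratified_sample(units, stratum_of, n, seed=None):
    """Draw a simple random sample within each stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    alloc = proportional_allocation({s: len(m) for s, m in strata.items()}, n)
    sample = []
    for s, members in strata.items():
        sample.extend(rng.sample(members, alloc[s]))
    return sample

# Hypothetical frame: (household_id, income_band) for 1,000 households.
bands = ["low"] * 400 + ["middle"] * 400 + ["high"] * 200
households = list(enumerate(bands))
sample = stratified_sample(households, stratum_of=lambda h: h[1], n=50, seed=1995)
print(Counter(band for _, band in sample))
```

With these made-up stratum sizes, a sample of 50 homes is split 20/20/10, so each income band is represented in proportion to its share of the community.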
Question 2
The Internal Revenue Service has completed a study of tax compliance in the New
England states. One of its conclusions was the following:
“The percentage of taxpayers in Massachusetts that under-report their true
incomes is significantly higher than the percentage of taxpayers in
Connecticut who under-report their true incomes.”
Surprisingly, despite this result, the Internal Revenue Service also announced that
it was not going to invest any further resources in trying to decrease under-reporting in Massachusetts.
[6]
(a)
Explain how the decision not to attempt to decrease under-reporting in Massachusetts
might be consistent with the fact that they found a “significant” difference
between the two states. Use the notions of practical significance and statistical
significance in your answer.
The statement implies that a statistical test may have been performed on the tax data which
determined that the difference in under-reporting between the two states was “significant”
in the sense that the researchers could reject the null hypothesis of no difference. Statistical
significance only means that the difference is unlikely to be due to chance; with large samples,
even a small difference can be statistically significant. However, the cost of enforcement to
improve reporting in Massachusetts may be too high compared with the revenue lost through
under-reporting. Hence a practical decision was made not to pursue the matter. In other words,
the difference in under-reporting was practically insignificant, given various resource
constraints.
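To see why statistical and practical significance can part ways, the sketch below holds a tiny gap in under-reporting rates fixed and lets the sample size grow. The rates (10.1% versus 10.0%) and sample sizes are invented for illustration; the study's actual figures are not given. The statistic is the usual pooled two-proportion z, compared here with 1.645, the one-tailed 5% critical value.

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical under-reporting rates: 10.1% (MA) vs 10.0% (CT).
for n in (1_000, 100_000, 1_000_000):
    z = two_proportion_z(0.101, 0.100, n, n)
    verdict = "significant" if z > 1.645 else "not significant"
    print(f"n = {n:>9,} per state: z = {z:.2f} ({verdict} at the 5% level)")
```

The 0.1 percentage-point gap is the same in every row; only the sample size changes, yet at a million taxpayers per state it becomes "significant" while remaining far too small to justify costly enforcement.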
Question 3²
This problem is in honor of my son, a college freshman and golf team member.
Recently Luc bought a new driver (the longest-hitting club). Prior to buying this club he
had carefully collected statistics on how far he was able to hit the ball with his old driver.
The mean distance was 200 yards.
He has just begun using the new driver and is curious whether the average distance of his
drives has gone up. He took a simple random sample of his drives using the new driver
and came up with the following observations:
205, 198, 220, 210, 194, 201, 213, 191, 211, 203
² This question is adapted from Exercise 9.53, page 337 of Devore and Peck, Statistics: The Exploration and
Analysis of Data.
[6]
(a)
Based upon these observations can Luc conclude that the length of his drives has
gone up, on average, with this new club?
First let us calculate some of the sample statistics for the data:
n=10,
sample mean = 204.60, sample standard deviation (s) = 9.03
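Let us use a one-sample t test (one-tailed) for this problem, with $H_0: \mu = 200$ and $H_a: \mu > 200$,
where $\mu$ is the true mean drive distance with the new driver:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{204.60 - 200}{9.03/\sqrt{10}} \approx 1.61$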
Degrees of freedom = n-1 = 9
The critical value of t for α = 0.05 and 9 degrees of freedom is 1.833 (from Weiss, Table IV).
Since the critical value of t is greater than the test statistic, we cannot reject our null hypothesis.
Thus Luc’s new driver is not hitting significantly longer drives (in a statistical sense) than his
old driver.
Alternatively, the P-value = 0.071 (from a calculator). Since this value is greater than the usual
significance level of 0.05, we fail to reject the null hypothesis.
The values in Table IV indicate that we would reject our null hypothesis only if α were set at 0.10
(a rather high risk of Type I error).
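The sample statistics, t statistic, and one-tailed P-value above can be reproduced with the standard library alone. This is a sketch under one stated substitution: instead of a calculator's t-distribution routine, the upper-tail probability is obtained by integrating the Student t density with Simpson's rule (the helper names `t_pdf` and `t_upper_tail` are illustrative). The critical value 1.833 is the tabled t value for α = 0.05 and 9 degrees of freedom.

```python
import math
import statistics

drives = [205, 198, 220, 210, 194, 201, 213, 191, 211, 203]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, by Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The t density is symmetric, so P(T > 0) = 0.5.
    return 0.5 - total * h / 3

n = len(drives)
xbar = statistics.mean(drives)
s = statistics.stdev(drives)            # sample standard deviation (n - 1 divisor)
t = (xbar - 200) / (s / math.sqrt(n))   # H0: mu = 200 versus Ha: mu > 200
p_value = t_upper_tail(t, df=n - 1)
print(f"mean = {xbar:.2f}, s = {s:.2f}, t = {t:.2f}, P-value = {p_value:.3f}")
print("Reject H0 at alpha = 0.05:", t > 1.833)   # 1.833 = t(0.05, 9 df), from a t table
```

The printed P-value lands close to the 0.071 quoted above, and the rejection check is `False`, matching the conclusion that the new driver's advantage is not statistically significant at the 0.05 level.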
[6]
(b)
What does a normal probability plot of the data tell you about the appropriateness
of the significance test you used in part (a)? [We have only discussed this in
passing in class, but using the text and Excel you should be able to construct a
normal probability plot for these data and answer this question without difficulty.]
To construct the normal probability plot, we arrange the data in ascending order and plot them
against the normal scores for n = 10 (given in Table III of Weiss).
Shot Distance    Normal Score
191              -1.55
194              -1.00
198              -0.65
201              -0.37
203              -0.12
205               0.12
210               0.37
211               0.65
213               1.00
220               1.55
[Figure: Normal Probability Plot for Luc's Driver Shots, normal score (vertical axis) plotted against distance (horizontal axis)]
The plot is roughly linear, which suggests that the data are approximately normally
distributed; hence the use of a t test is reasonable. (With only 10 observations, the plot can
reveal only gross departures from normality.)
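The normal scores and a numerical summary of how straight the plot is can be computed as below. This sketch makes a swap worth flagging: it approximates the normal scores with Blom's formula, Φ⁻¹((i - 3/8)/(n + 1/4)), via the standard library's `statistics.NormalDist`, so individual values may differ from Table III of Weiss in the second decimal place. The correlation between the ordered distances and the scores is offered only as an informal gauge of linearity (values near 1 indicate a nearly straight plot).

```python
import math
from statistics import NormalDist

drives = [205, 198, 220, 210, 194, 201, 213, 191, 211, 203]

def blom_normal_scores(n):
    """Approximate expected normal order statistics (Blom's formula)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

ordered = sorted(drives)
scores = blom_normal_scores(len(ordered))
for d, z in zip(ordered, scores):
    print(f"{d:>4}  {z:6.2f}")
print(f"correlation between distance and normal score: {pearson_r(ordered, scores):.3f}")
```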
[3]
(c)
What is the most worrisome threat to internal validity in this “experiment?”
The most worrisome threat to internal validity in this problem is one of “maturation.” In other
words, Luc could be performing better over time irrespective of the age or design of the club.
Question 4
The following text setting up this problem is taken from this year’s midterm exam:
The McClatchy News Service (San Luis Obispo Telegram-Tribune, June 13, 1991)
conducted a study to measure the degree of violence on television. They reported the
results of three simple random samples of prime-time television hours. The following
table summarizes the information reported for the three major networks:³
Network    Mean Number of Violent Acts Per Hour
ABC        15.6
CBS        11.9
NBC        11.0
Suppose that each of these sample means was computed on the basis of viewing 50
randomly selected prime-time hours and that the population standard deviation for each
of the three networks is known to be σ = 5.
[6]
(a)
The questions that you were asked in the midterm based on these data asked you
to focus on them from an estimation point of view. You calculated confidence
intervals and looked at the overlap between those confidence intervals to compare
the level of violence on ABC as compared to the other networks. We now have
the mathematical tools to approach this in a more mathematically precise fashion.
Conduct a statistical hypothesis test to decide whether or not you can conclude
based on these sample results that ABC programs are more violent, on average,
than CBS programs.
$H_0: \mu_{ABC} = \mu_{CBS}$ (mean violent acts per hour are the same)
$H_a: \mu_{ABC} > \mu_{CBS}$ (mean violent acts per hour are higher for ABC)
We need a z test for two population means with independent samples (the population standard
deviations are known). The z statistic in this case is calculated by the following formula:
$z = \frac{(\bar{x}_{ABC} - \bar{x}_{CBS}) - (\mu_{ABC} - \mu_{CBS})}{\sqrt{\sigma_{ABC}^2/n_{ABC} + \sigma_{CBS}^2/n_{CBS}}}$
³ This question is adapted from Question 8.15, pages 282-283 of Devore and Peck, Statistics: The Exploration and
Analysis of Data.
CBS -ABC = 0
The standard error term in the denominator happens to be = 1 in this case
Therefore z= 15.6-11.9 = 3.7
This value gives a P value =1- 0.9999=.00001, which means that we reject our null hypothesis
Therefore the difference between ABC and CBS violent acts per hours is statistically significant.
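The two-sample z test with known population standard deviations can be sketched as follows. This is an illustration using only the standard library (the upper-tail probability comes from `math.erfc`), and the function name is hypothetical. With σ = 5 and n = 50 for each network the standard error is exactly 1, so z is simply the difference in sample means.

```python
import math

def two_sample_z_known_sigma(x1, x2, sigma1, sigma2, n1, n2):
    """Upper-tailed z test of H0: mu1 = mu2 vs Ha: mu1 > mu2 with known sigmas."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (x1 - x2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z)
    return z, p_value

z, p = two_sample_z_known_sigma(15.6, 11.9, 5, 5, 50, 50)   # ABC vs CBS
print(f"z = {z:.2f}, one-tailed P-value = {p:.5f}")
```

The unrounded P-value is about 0.0001, consistent with the table reading. Calling the same function with NBC's mean of 11.0 in place of 11.9 gives z = 4.6, an even more extreme result.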
[3]
(b)
Without actually performing the calculations, given what you have discovered in
your answer to part (a) what can you conclude about whether ABC programs are
more violent, on average, than NBC programs?
Given the very low P-value for the result in part (a), and the fact that the difference in mean
violent acts per hour between ABC and NBC (15.6 versus 11.0) is even greater than that between
ABC and CBS (15.6 versus 11.9), with the same standard error, we would also reject the null
hypothesis that mean violent acts per hour on the two networks are equal, in favor of the
alternative that ABC programs are more violent, on average, than NBC programs.
Question 5⁴
A study compares two groups of mothers with young children who were on welfare two
years ago. One group attended a voluntary training program offered free of charge at a
local vocational school that was advertised in the local news media. The other group did
not choose to attend the training program. A researcher finds out about this “natural
experiment” (an experiment that has happened without being designed by a researcher)
and is interested in seeing what can be concluded from these results.
The researcher takes all of the data available from the vocational training program and
calculates the proportion of the mothers who are still on welfare. Furthermore, he
decides to treat these data as though they are the result of a simple random sample taken
from all of those who might have enrolled in the voluntary training program. [I know that
this seems odd, but the fact is that it is often done, though that decision is typically not as
explicit as I have made it here.]
He then takes a simple random sample of welfare mothers with children who decided not
to participate in the program. [For the purposes of this problem, assume that the
researcher can identify this parent population and take a simple random sample from it.]
Finally, he conducts a statistical hypothesis test and concludes: “I conclude (1) that the
proportion of mothers who enrolled in voluntary training and were still on welfare is
lower than the proportion of mothers who did not enroll in the program and were still on
welfare and (2) that that difference is statistically significant (P< .01).”
You are on the staff of a member of Congress who is interested in the plight of welfare
mothers. She asks for your advice:
[6]
(a)
Explain to the Congresswoman what statistical hypothesis test the researcher
conducted. You will need to explain why a statistical hypothesis test was called
for (rather than just simply looking at the observed difference between the two
proportions).
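The researcher probably conducted a one-tailed two-sample z test for the difference between two
population proportions (the proportions of mothers still on welfare in each group), using the
pooled sample proportion to compute the standard error, to determine whether the apparent
difference is statistically significant. A test is called for because both groups are samples: even if
the two population proportions were equal, the sample proportions would differ somewhat by
chance. The test provides a probabilistic measure of how plausible it is that a difference this large
arose by chance alone, which is what justifies rejecting our null hypothesis and asserting that there
is a difference beyond just a “chance” occurrence. However, the test does not by itself establish
causality, since there may be other exogenous factors at play. Controls need to be instituted in the
experimental design to ensure that the difference was not caused by these factors.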
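For the Congresswoman's staff, the mechanics of a one-tailed two-proportion z test can be sketched in a few lines. The counts below are invented purely for illustration (the researcher's data are not given): 30 of 120 trained mothers and 80 of 200 sampled non-participants still on welfare. The function name is hypothetical; the statistic uses the pooled proportion under H0.

```python
import math

def two_proportion_z_test_lower(x1, n1, x2, n2):
    """One-tailed z test of H0: p1 = p2 vs Ha: p1 < p2, pooled standard error.

    x1, x2 are counts still on welfare; n1, n2 are the group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z < z)
    return p1, p2, z, p_value

# Invented counts, for illustration only.
p1, p2, z, p_value = two_proportion_z_test_lower(x1=30, n1=120, x2=80, n2=200)
print(f"trained: {p1:.2f}, not trained: {p2:.2f}, z = {z:.2f}, P = {p_value:.4f}")
```

With these made-up numbers the P-value falls below .01, the kind of result the researcher reported. As the answer stresses, a small P-value says nothing about why the groups differ.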
⁴ This question is adapted from Review Exercise 8.59, pages 513-514, of David S. Moore, Statistics: Concepts and
Controversies, fourth edition (New York: W. H. Freeman, 1996). Please note that the wording is not identical to the
original.
[4]
(b)
Explain to the Congresswoman in simple language what “statistically significant
(P < .01)” means.
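“P” stands for the probability that a difference at least as large as the one observed in the study
samples would occur by chance if the two populations actually had the same proportion still on
welfare. “P < .01” means that this probability is less than 1 in 100, so chance alone is an unlikely
explanation for the difference.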
[4]
(c)
Is this study good evidence that requiring job training of all welfare mothers
would reduce the percent who remain on welfare several years later? In
answering this question you may want to think of issues of both internal and
external validity.
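This evidence is not convincing on its own. Treatments were not randomly assigned, but instead
were chosen by the mothers. Mothers who chose to attend a job training program may be more
inclined to get themselves off welfare (a threat to internal validity). In addition, results for
volunteers may not carry over to mothers who would be required to attend (a threat to external
validity).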
Question 6
NO GENERIC SOLUTIONS FOR THIS QUESTION
Pick a question that is of interest to you and that you believe could be answered through
carefully designing an experiment, conducting the experiment, and conducting a
statistical hypothesis test on the collected data in order to draw conclusions.
[2]
(a)
Clearly pose your question.
[6]
(b)
Describe the experiment that you would use and explain how its design would
help you to eliminate threats to internal and external validity.
[6]
(c)
Invent data that might have resulted from your experiment and conduct an
appropriate statistical hypothesis test.
[4]
(d)
Draw whatever conclusions are appropriate from your calculations.