Type I and II Errors and Power
1. TV safety. The manufacturer of a metal stand for home
TV sets must be sure that its product will not fail
under the weight of the TV. Since some larger sets
weigh nearly 300 pounds, the company's safety
inspectors have set a standard of ensuring that the
stands can support an average of at least 500 pounds.
Their inspectors regularly subject a random sample of
the stands to increasing weight until they fail. They
test the hypothesis H0: μ = 500 against HA: μ < 500,
using the level of significance α = 0.01. If the sample
of stands fails to pass this safety test, the inspectors
will not certify the product for sale to the general
public.
a) Is this an upper-tail or lower-tail test? In the context
of the problem, why do you think this is important?
b) Explain what will happen if the inspectors commit
a Type I error.
c) Explain what will happen if the inspectors commit a
Type II error.
2. Catheters. During an angiogram, heart problems can be
examined via a small tube (a catheter) threaded into
the heart from a vein in the patient's leg. It's important
that the company manufacturing the catheter
maintain a diameter of 2.00 mm. (The standard
deviation is quite small.) Each day, quality control
personnel make several measurements to test H0: μ =
2.00 against HA: μ ≠ 2.00 at a significance level of α =
0.05. If they discover a problem, they will stop the
manufacturing process until it is corrected.
a) Is this a one-sided or two-sided test? In the context
of the problem, why do you think this is important?
b) Explain in this context what happens if the quality
control people commit a Type I error.
c) Explain in this context what happens if the quality
control people commit a Type II error.
3. TV safety revisited. The manufacturer of the metal TV
stands in Exercise 1 is thinking of revising its safety
test.
a) If the company's lawyers are worried about being
sued for selling an unsafe product, should they
increase or decrease the value of α? Explain.
b) In this context, what is meant by the power of the
test?
c) If the company wants to increase the power of the
test, what options does it have? Explain the
advantages and disadvantages of each option.
4. Catheters again. The catheter company in Exercise 2 is
reviewing its testing procedure.
a) Suppose the significance level is changed to α =
0.01. Will the probability of Type II error increase,
decrease, or remain the same?
b) What is meant by the power of the test the company
conducts?
c) Suppose the manufacturing process is slipping out
of proper adjustment. As the actual mean diameter of
the catheters produced gets farther and farther above
the desired 2.00 mm, will the power of the quality
control test increase, decrease, or remain the same?
d) What could they do to improve the power of the
test?
7. Testing cars. A clean air standard requires that vehicle
exhaust emissions not exceed specified limits for
various pollutants. Many states require that cars be
tested annually to be sure they meet these standards.
Suppose state regulators double check a random
sample of cars that a suspect repair shop has certified
as okay. They will revoke the shop's license if they
find significant evidence that the shop is certifying
vehicles that do not meet standards.
a) In this context, what is a Type I error?
b) In this context, what is a Type II error?
c) Which type of error would the shop's owner
consider more serious?
d) Which type of error might environmentalists
consider more serious?
8. Quality control. Production managers on an assembly
line must monitor the output to be sure that the level
of defective products remains small. They periodically
inspect a random sample of the items produced. If
they find a significant increase in the proportion of
items that must be rejected, they will halt the
assembly process until the problem can be identified
and repaired.
a) In this context, what is a Type I error?
b) In this context, what is a Type II error?
c) Which type of error would the factory owner
consider more serious?
d) Which type of error might customers consider more
serious?
9. Cars again. As in Exercise 7, state regulators are
checking up on repair shops to see if they are
certifying vehicles that do not meet pollution
standards.
a) In this context, what is meant by the power of the
test the regulators are conducting?
b) Will the power be greater if they test 20 or 40 cars?
Why?
c) Will the power be greater if they use a 5% or a 10%
level of significance? Why?
d) Will the power be greater if the repair shop's
inspectors are only a little out of compliance or a lot?
Why?
10. Production. Consider again the task of the quality
control inspectors in Exercise 8.
a) In this context, what is meant by the power of the
test the inspectors conduct?
b) They are currently testing 5 items each hour.
Someone has proposed they test 10 each hour instead.
What are the advantages and disadvantages of such a
change?
c) Their test currently uses a 5% level of significance.
What are the advantages and disadvantages of
changing to an alpha level of 1%?
d) Suppose that as a day passes one of the machines
on the assembly line produces more and more items
that are defective. How will this affect the power of
the test?
11. Equal opportunity? A company is sued for job
discrimination because only 19% of the newly hired
candidates were minorities when 27% of all applicants
were minorities. Is this strong evidence that the
company's hiring practices are discriminatory?
a) Is this a one-tailed or a two-tailed test? Why?
b) In this context, what would a Type I error be?
c) In this context, what would a Type II error be?
d) In this context, describe what is meant by the power
of the test.
e) If the hypothesis is tested at the 5% level of
significance instead of 1%, how will this affect the
power of the test?
f) The lawsuit is based on the hiring of 37 employees.
Is the power of the test higher than, lower than, or the
same as it would be if it were based on 87 hires?
12. Stop signs. Highway safety engineers test new road
signs, hoping that increased reflectivity will make
them more visible to drivers. Volunteers drive through
a test course with several of the new and old style
signs and rate which kind shows up the best.
a) Is this a one-tailed or a two-tailed test? Why?
b) In this context, what would a Type I error be?
c) In this context, what would a Type II error be?
d) In this context, describe what is meant by the power
of the test.
e) If the hypothesis is tested at the 1% level of
significance instead of 5%, how will this affect the
power of the test?
f) The engineers hoped to base their decision on the
reactions of 50 drivers, but time and budget
constraints may force them to cut back to 20. How
would this affect the power of the test? Explain.
13. Dropouts. A Statistics professor has observed that for
several years about 13% of the students who initially
enroll in his Introductory Statistics course withdraw
before the end of the semester. A salesman suggests
that he try a statistics software package that gets
students more involved with computers, predicting
that it will cut the dropout rate. The software is
expensive, and the salesman offers to let the professor
use it for a semester to see if the dropout rate goes
down significantly. The professor will have to pay for
the software only if he chooses to continue using it.
a) Is this a one-tailed or two-tailed test? Explain.
b) Write the null and alternative hypotheses.
c) In this context, explain what would happen if the
professor makes a Type I error.
d) In this context, explain what would happen if the
professor makes a Type II error.
e) What is meant by the power of this test?
14. Ads. A company is willing to renew its advertising
contract with a local radio station only if the station
can prove that more than 20% of the residents of the
city have heard the ad and recognize the company's
product. The radio station conducts a random phone
survey of 400 people.
a) What are the hypotheses?
b) The station plans to conduct this test using a 10%
level of significance, but the company wants the
significance level lowered to 5%. Why?
c) What is meant by the power of this test?
d) For which level of significance will the power of
this test be higher? Why?
e) They finally agree to use α = 0.05, but the company
proposes that the station call 600 people instead of the
400 initially proposed. Will that make the risk of Type
II error higher or lower? Explain.
15. Dropouts, part II. Initially, 203 students signed up
for the Stats course in Exercise 13. They used the
software suggested by the salesman, and only 11
dropped out of the course.
a) Should the professor spend the money for this
software? Support your recommendation with an
appropriate test.
b) Explain carefully what your P-value means in this
context.
16. Testing the ads. The company in Exercise 14 contacts
600 people selected at random, and only 133
remember the ad.
a) Should the company renew the contract? Support
your recommendation with an appropriate test.
b) Explain carefully what your P-value means in this
context.
17. Suppose that a study is designed to choose between the
hypotheses:
Null hypothesis: Population proportion is 0.25.
Alternative hypothesis: Population proportion is
higher than 0.25.
On the basis of a sample of size 500, the sample
proportion is 0.29. The standard deviation for the
potential sample proportions in this case is about 0.02.
a. Compute the standardized score corresponding to
the sample proportion of 0.29, assuming the null
hypothesis is true.
b. What is the percentile for the standardized score
computed in part a?
c. Based on the results of parts a and b, make a
conclusion. Be explicit about the wording of your
conclusion and justify your answer.
d. To compute the standardized score in part a, you
assumed the null hypothesis was true. Explain why
you could not compute a standardized score under
the assumption that the alternative hypothesis was
true.
18. Consider the medical testing situation, in which the
null hypothesis is that the patient does not have the
disease and the alternative hypothesis is that they do.
a. Give an example of a medical situation in which a
type 1 error would be more serious.
b. Give an example of a medical situation in which a
type 2 error would be more serious.
19. For each of the situations in Exercise 18, explain the
two errors that could be made and what the
consequences would be.
20. Explain why we can specify the probability of making
a type 1 error, given that the null hypothesis is true,
but we cannot specify the probability of making a type
2 error, given that the alternative hypothesis is true.
21. Given the convention of declaring that a result is
“statistically significant” if the p-value is 0.05 or less,
what decision would be made concerning the null and
alternative hypotheses in each of the following cases?
Be explicit about the wording of the decision.
a. P-value = 0.35
b. P-value = 0.04
22. We learned that researchers have discovered a link
between vertex baldness and heart attacks in men.
a. State the null hypothesis and the alternative
hypothesis used to investigate whether or not there
is such a relationship.
b. Discuss what would constitute a type 1 error in this
study.
c. Discuss what would constitute a type 2 error in this
study.
23. A report in the Davis Enterprise (April 6, 1994,
p. A11) was headlined, “Highly educated people are less
likely to develop Alzheimer’s disease, a new study
suggests.”
a. State the null and alternative hypotheses the
researchers would have used in this study.
b. What do you think the headline is implying about
statistical significance? Restate the headline in
terms of statistical significance.
Answers
1. a) Lower-tail. The inspectors are looking for evidence
that the stands hold less than 500 pounds on average,
because a stand that is too weak is the safety risk.
b) Reject that the stands can hold at least 500 lbs when
they can. In other words, they will decide the stands are
unsafe when they are in fact safe.
c) Fail to reject that the stands can hold at least 500
lbs when they cannot. In other words, they will decide
the stands are safe when they're not.
2. a) Two-sided. If they're too big, they won't fit through
the vein. If they're too small, they probably won't work
well.
b) The catheters are rejected when in fact the diameters
are fine, and the manufacturing process is needlessly
stopped.
c) Catheters that do not meet specifications are allowed
to be produced and sold.
3. a) Increase α. This means a smaller chance of declaring
the stands safe if they are not (fewer Type II errors), at
the cost of more Type I errors.
b) The probability of correctly detecting that the stands
are not capable of holding more than 500 pounds.
c) Decrease the standard deviation: probably costly.
Increase the sample size: takes more time for testing
and is costly. Increase α: more Type I errors. Increase
the "design load" to be well above 500 pounds: again,
costly.
4. a) Increase
b) The probability of correctly detecting deviations
from 2 mm in diameter.
c) Increase
d) Increase the sample size or increase α.
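The answers to Exercise 4 can be checked numerically. The sketch below computes the power of a two-sided z-test of H0: μ = 2.00 under the normal model. The exercise gives neither the standard deviation nor the sample size, so sigma = 0.02 mm and n = 10 are purely illustrative assumptions, as is the function name two_sided_power.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def two_sided_power(mu_true, mu0=2.00, sigma=0.02, n=10, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = mu0 when the true mean is mu_true.

    sigma and n are hypothetical values chosen for illustration only.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)      # rejection cutoff for |z|
    shift = (mu_true - mu0) / (sigma / sqrt(n))     # true mean in standard-error units
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)      # P(z > z_crit) under mu_true
    lower = STD_NORMAL.cdf(-z_crit - shift)         # P(z < -z_crit) under mu_true
    return upper + lower

for mu in (2.00, 2.01, 2.02, 2.03):
    print(f"true mean {mu:.2f} mm: "
          f"power {two_sided_power(mu):.3f} at alpha = 0.05, "
          f"{two_sided_power(mu, alpha=0.01):.3f} at alpha = 0.01")
```

Under these assumptions the printed power rises as the true mean drifts away from 2.00 mm (part c), and is lower at α = 0.01 than at α = 0.05, which is the same as saying the Type II error probability is higher (part a).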
7. a) It is decided the shop is not meeting standards when it
is.
b) The shop is certified as meeting standards when it is
not.
c) Type I
d) Type II
8. a) Deciding there has been an increase in defective items
when there has not.
b) Deciding the number of defectives is small, when it
has increased.
c) Probably Type II, depending on the costs of shutting
the line down. Generally, because of warranty costs
and lost customer loyalty, defects that are caught in the
factory are much cheaper to fix than defects found after
items are sold.
d) Type II
9. a) The probability of detecting that the shop is not
meeting standards when in fact it is not.
b) 40 cars. Larger n.
c) 10%. More chance to reject H0.
d) A lot. Larger differences are easier to detect.
10. a) The probability of deciding there are too many
defectives when this is true.
b) Advantage: more power. Disadvantage: more work
testing.
c) Advantage: smaller Type I error chance.
Disadvantage: less power.
d) Increases because the effect size increases.
11. a) One-tailed. The company wouldn't be sued if "too
many" minorities were hired.
b) Deciding the company is discriminating when it is
not.
c) Deciding the company is not discriminating when it
is
d) The probability of correctly detecting discrimination
when it exists.
e) Increases power.
f) Lower, since n is smaller.
12. a) One-tailed because we are interested only in whether
they are more visible. If the new design is less visible,
we don't care how much less visible it is.
b) Deciding the signs are more visible when they are
not.
c) Failing to decide new signs are more visible when
they are.
d) Ability (probability) to detect that a more visible
sign works.
e) It will decrease power, because you'll need more
evidence to reject H0.
f) It will decrease power because a smaller sample makes
the SD of the sampling distribution larger.
13. a) One-tailed. Software is supposed to decrease the
dropout rate.
b) H0: p = 0.13; HA: p < 0.13
c) He buys the software when it doesn't help students.
d) He doesn't buy the software when it does help
students.
e) The probability of correctly deciding the software is
helpful.
14. a) H0: p = 0.20; HA: p > 0.20
b) The company wants more "proof" the ad is effective
before deciding it is.
c) The probability of correctly deciding that more than
20% have heard the ad and recognize the product when
it's true.
d) 10%. More chance to reject H0.
e) Lower. Larger n has more power and smaller Type
II error probability.
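Parts d) and e) of Exercise 14 can be illustrated with a rough power calculation for the upper-tail test of H0: p = 0.20, using the normal approximation. Power depends on the true proportion, which the exercise does not give, so p_true = 0.25 below is a hypothetical value chosen only for illustration; the function name upper_tail_power is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def upper_tail_power(p_true, n, alpha, p0=0.20):
    """Approximate power of the test H0: p = p0 vs HA: p > p0 (normal model)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    # Spread of the sample proportion if the true proportion is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - STD_NORMAL.cdf((cutoff - p_true) / se_true)

P_TRUE = 0.25  # hypothetical true proportion, not given in the exercise
for n in (400, 600):
    for alpha in (0.05, 0.10):
        power = upper_tail_power(P_TRUE, n, alpha)
        print(f"n = {n}, alpha = {alpha:.2f}: power = {power:.3f}, "
              f"Type II risk = {1 - power:.3f}")
```

For any fixed true proportion above 0.20, the larger α and the larger sample each give more power, and so a smaller Type II error probability.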
15. a) z = -3.21, P-value = 0.0007. The change is statistically
significant. A 95% confidence interval is (2.3%, 8.5%).
This is clearly lower than 13%. If the reduction in dropouts
justifies the cost, the professor should consider
buying the software.
b) The chance of observing 11 or fewer dropouts in a
class of 203 is only 0.07% if the dropout rate is really
13%.
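The figures in the answer to Exercise 15 can be reproduced with a one-proportion z-test. This is a minimal sketch using the normal approximation and only the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

dropouts, n, p0 = 11, 203, 0.13    # data from Exercise 15, null value from Exercise 13

p_hat = dropouts / n                          # observed dropout rate
se_null = sqrt(p0 * (1 - p0) / n)             # standard deviation assuming H0 is true
z = (p_hat - p0) / se_null
p_value = NormalDist().cdf(z)                 # lower tail, because HA: p < 0.13

# 95% confidence interval, using the standard error based on p_hat
se_hat = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.975)
ci = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.1%}, {ci[1]:.1%})")
```

The test statistic uses the null value 0.13 in its standard deviation, while the confidence interval uses the observed proportion, which is why two different standard errors appear.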
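16. a) z = 1.33, P-value = 0.0923. At α = 0.05 this is not
significant evidence that more than 20% of residents
remember the ad, so the company should not renew
the contract.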
b) There is a 9.23% chance of having 133 or more of
600 people in a random sample remember the ad if in
fact 20% of people do.
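17. a. z = 2.066, using the unrounded standard deviation
√(0.25 × 0.75/500) ≈ 0.01936; with the rounded value
0.02, z = 2.0.
b. About the 98th percentile (p-value = 0.0194).
c. Assuming α = 0.05: if the population proportion were
0.25, the probability of a sample proportion of 0.29 or
higher in a sample of 500 would be only 1.94%. Since
this is below 0.05, we reject the claim that the
population proportion is 0.25.
d. The alternative says only that the population
proportion is higher than 0.25, not what its exact value is.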
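The standardized score and percentile in Exercise 17 can be checked directly. The sketch below is a plain normal-approximation calculation, with illustrative variable names. It computes z both with the unrounded standard deviation and with the rounded 0.02 quoted in the exercise.

```python
from math import sqrt
from statistics import NormalDist

p0, p_hat, n = 0.25, 0.29, 500     # null value, sample proportion, sample size

sd_exact = sqrt(p0 * (1 - p0) / n)            # about 0.0194; the exercise rounds to 0.02
z_exact = (p_hat - p0) / sd_exact
z_rounded = (p_hat - p0) / 0.02

percentile = NormalDist().cdf(z_exact)        # proportion of z-scores below z_exact
p_value = 1 - percentile                      # upper tail, because HA: p > 0.25

print(f"z (unrounded SD) = {z_exact:.3f}, z (SD = 0.02) = {z_rounded:.1f}")
print(f"percentile = {percentile:.1%}, p-value = {p_value:.4f}")
```

Both calculations use the null value 0.25, which is the point of part d: the alternative hypothesis names no single proportion to plug into the standard deviation formula.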
18. a. Minor disease with serious treatment, e.g., tonsillitis
and the treatment is surgery
b. Being infected with HIV
19. Type 1 error (α) = deciding there is an effect when
there isn’t; type 2 error (β) = deciding there is no
effect when there is.
20. We do not know which specific value covered by the
alternative hypothesis is the true one.
21. a. Fail to reject H0.
b. Reject H0.
22. a. H0: baldness has no effect on heart attacks; H1:
baldness has an effect on heart attacks.
b. Type 1 error (α) = deciding there is an effect when
there isn’t.
c. Type 2 error (β) = deciding there is no effect when
there is.
23. a. H0: education has no effect on developing
Alzheimer’s disease;
H1: higher education decreases the likelihood of
developing Alzheimer’s disease.
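b. The headline implies the result was statistically
significant: highly educated people were significantly
less likely to develop Alzheimer’s disease than less
educated people.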