Type I and II Errors and Power

1. TV safety. The manufacturer of a metal stand for home TV sets must be sure that its product will not fail under the weight of the TV. Since some larger sets weigh nearly 300 pounds, the company's safety inspectors have set a standard of ensuring that the stands can support an average of at least 500 pounds. Their inspectors regularly subject a random sample of the stands to increasing weight until they fail. They test the hypothesis H0: μ = 500 against HA: μ < 500, using the level of significance α = 0.01. If the sample of stands fails to pass this safety test, the inspectors will not certify the product for sale to the general public.
a) Is this an upper-tail or lower-tail test? In the context of the problem, why do you think this is important?
b) Explain what will happen if the inspectors commit a Type I error.
c) Explain what will happen if the inspectors commit a Type II error.

2. Catheters. During an angiogram, heart problems can be examined via a small tube (a catheter) threaded into the heart from a vein in the patient's leg. It's important that the company that manufactures the catheter maintain a diameter of 2.00 mm. (The standard deviation is quite small.) Each day, quality control personnel make several measurements to test H0: μ = 2.00 against HA: μ ≠ 2.00 at a significance level of α = 0.05. If they discover a problem, they will stop the manufacturing process until it is corrected.
a) Is this a one-sided or two-sided test? In the context of the problem, why do you think this is important?
b) Explain in this context what happens if the quality control people commit a Type I error.
c) Explain in this context what happens if the quality control people commit a Type II error.

3. TV safety revisited. The manufacturer of the metal TV stands in Exercise 1 is thinking of revising its safety test.
a) If the company's lawyers are worried about being sued for selling an unsafe product, should they increase or decrease the value of α? Explain.
b) In this context, what is meant by the power of the test?
c) If the company wants to increase the power of the test, what options does it have? Explain the advantages and disadvantages of each option.

4. Catheters again. The catheter company in Exercise 2 is reviewing its testing procedure.
a) Suppose the significance level is changed to α = 0.01. Will the probability of Type II error increase, decrease, or remain the same?
b) What is meant by the power of the test the company conducts?
c) Suppose the manufacturing process is slipping out of proper adjustment. As the actual mean diameter of the catheters produced gets farther and farther above the desired 2.00 mm, will the power of the quality control test increase, decrease, or remain the same?
d) What could they do to improve the power of the test?

7. Testing cars. A clean air standard requires that vehicle exhaust emissions not exceed specified limits for various pollutants. Many states require that cars be tested annually to be sure they meet these standards. Suppose state regulators double check a random sample of cars that a suspect repair shop has certified as okay. They will revoke the shop's license if they find significant evidence that the shop is certifying vehicles that do not meet standards.
a) In this context, what is a Type I error?
b) In this context, what is a Type II error?
c) Which type of error would the shop's owner consider more serious?
d) Which type of error might environmentalists consider more serious?

8. Quality control. Production managers on an assembly line must monitor the output to be sure that the level of defective products remains small. They periodically inspect a random sample of the items produced. If they find a significant increase in the proportion of items that must be rejected, they will halt the assembly process until the problem can be identified and repaired.
a) In this context, what is a Type I error?
b) In this context, what is a Type II error?
c) Which type of error would the factory owner consider more serious?
d) Which type of error might customers consider more serious?

9. Cars again. As in Exercise 7, state regulators are checking up on repair shops to see if they are certifying vehicles that do not meet pollution standards.
a) In this context, what is meant by the power of the test the regulators are conducting?
b) Will the power be greater if they test 20 or 40 cars? Why?
c) Will the power be greater if they use a 5% or a 10% level of significance? Why?
d) Will the power be greater if the repair shop's inspectors are only a little out of compliance or a lot? Why?

10. Production. Consider again the task of the quality control inspectors in Exercise 8.
a) In this context, what is meant by the power of the test the inspectors conduct?
b) They are currently testing 5 items each hour. Someone has proposed they test 10 each hour instead. What are the advantages and disadvantages of such a change?
c) Their test currently uses a 5% level of significance. What are the advantages and disadvantages of changing to an alpha level of 1%?
d) Suppose that as a day passes one of the machines on the assembly line produces more and more items that are defective. How will this affect the power of the test?

11. Equal opportunity? A company is sued for job discrimination because only 19% of the newly hired candidates were minorities when 27% of all applicants were minorities. Is this strong evidence that the company's hiring practices are discriminatory?
a) Is this a one-tailed or a two-tailed test? Why?
b) In this context, what would a Type I error be?
c) In this context, what would a Type II error be?
d) In this context, describe what is meant by the power of the test.
e) If the hypothesis is tested at the 5% level of significance instead of 1%, how will this affect the power of the test?
f) The lawsuit is based on the hiring of 37 employees.
Is the power of the test higher than, lower than, or the same as it would be if it were based on 87 hires?

12. Stop signs. Highway safety engineers test new road signs, hoping that increased reflectivity will make them more visible to drivers. Volunteers drive through a test course with several of the new and old style signs and rate which kind shows up the best.
a) Is this a one-tailed or a two-tailed test? Why?
b) In this context, what would a Type I error be?
c) In this context, what would a Type II error be?
d) In this context, describe what is meant by the power of the test.
e) If the hypothesis is tested at the 1% level of significance instead of 5%, how will this affect the power of the test?
f) The engineers hoped to base their decision on the reactions of 50 drivers, but time and budget constraints may force them to cut back to 20. How would this affect the power of the test? Explain.

13. Dropouts. A Statistics professor has observed that for several years about 13% of the students who initially enroll in his Introductory Statistics course withdraw before the end of the semester. A salesman suggests that he try a statistics software package that gets students more involved with computers, predicting that it will cut the dropout rate. The software is expensive, and the salesman offers to let the professor use it for a semester to see if the dropout rate goes down significantly. The professor will have to pay for the software only if he chooses to continue using it.
a) Is this a one-tailed or two-tailed test? Explain.
b) Write the null and alternative hypotheses.
c) In this context, explain what would happen if the professor makes a Type I error.
d) In this context, explain what would happen if the professor makes a Type II error.
e) What is meant by the power of this test?

14. Ads.
A company is willing to renew its advertising contract with a local radio station only if the station can prove that more than 20% of the residents of the city have heard the ad and recognize the company's product. The radio station conducts a random phone survey of 400 people.
a) What are the hypotheses?
b) The station plans to conduct this test using a 10% level of significance, but the company wants the significance level lowered to 5%. Why?
c) What is meant by the power of this test?
d) For which level of significance will the power of this test be higher? Why?
e) They finally agree to use α = 0.05, but the company proposes that the station call 600 people instead of the 400 initially proposed. Will that make the risk of Type II error higher or lower? Explain.

15. Dropouts, part II. Initially, 203 students signed up for the Stats course in Exercise 13. They used the software suggested by the salesman, and only 11 dropped out of the course.
a) Should the professor spend the money for this software? Support your recommendation with an appropriate test.
b) Explain carefully what your P-value means in this context.

16. Testing the ads. The company in Exercise 14 contacts 600 people selected at random, and only 133 remember the ad.
a) Should the company renew the contract? Support your recommendation with an appropriate test.
b) Explain carefully what your P-value means in this context.

17. Suppose that a study is designed to choose between the hypotheses:
Null hypothesis: Population proportion is 0.25.
Alternative hypothesis: Population proportion is higher than 0.25.
On the basis of a sample of size 500, the sample proportion is 0.29. The standard deviation for the potential sample proportions in this case is about 0.02.
a. Compute the standardized score corresponding to the sample proportion of 0.29, assuming the null hypothesis is true.
b. What is the percentile for the standardized score computed in part a?
c.
Based on the results of parts a and b, make a conclusion. Be explicit about the wording of your conclusion and justify your answer.
d. To compute the standardized score in part a, you assumed the null hypothesis was true. Explain why you could not compute a standardized score under the assumption that the alternative hypothesis was true.

18. Consider the medical testing situation, in which the null hypothesis is that the patient does not have the disease and the alternative hypothesis is that they do.
a. Give an example of a medical situation in which a type 1 error would be more serious.
b. Give an example of a medical situation in which a type 2 error would be more serious.

19. For each of the situations in Exercise 18, explain the two errors that could be made and what the consequences would be.

20. Explain why we can specify the probability of making a type 1 error, given that the null hypothesis is true, but we cannot specify the probability of making a type 2 error, given that the alternative hypothesis is true.

21. Given the convention of declaring that a result is “statistically significant” if the p-value is 0.05 or less, what decision would be made concerning the null and alternative hypotheses in each of the following cases? Be explicit about the wording of the decision.
a. P-value = 0.35
b. P-value = 0.04

22. We learned that researchers have discovered a link between vertex baldness and heart attacks in men.
a. State the null hypothesis and the alternative hypothesis used to investigate whether or not there is such a relationship.
b. Discuss what would constitute a type 1 error in this study.
c. Discuss what would constitute a type 2 error in this study.

23. A report in the Davis Enterprise (April 6, 1994, p. A11) was headlined, “Highly educated people are less likely to develop Alzheimer’s disease, a new study suggests.”
a. State the null and alternative hypotheses the researchers would have used in this study.
b. What do you think the headline is implying about statistical significance? Restate the headline in terms of statistical significance.

Answers

1. a) Lower-tail. We want to show it will not hold 500 pounds (or more) easily.
b) Reject that the stand can hold at least 500 lbs when it can. In other words, they will decide the stands are unsafe when they are in fact safe.
c) Fail to reject that the stand can hold at least 500 lbs when it cannot. In other words, they will decide the stands are safe when they're not.

2. a) Two-sided. If they're too big, they won't fit through the vein. If they're too small, they probably won't work well.
b) The catheters are rejected when in fact the diameters are fine, and the manufacturing process is needlessly stopped.
c) Catheters that do not meet specifications are allowed to be produced and sold.

3. a) Increase α. This means a smaller chance of declaring the stands safe if they are not (fewer Type II errors), at the cost of more Type I errors.
b) The probability of correctly detecting that the stands are not capable of holding at least 500 pounds.
c) Decrease the standard deviation (probably costly). Increase the sample size (takes more time for testing and is costly). Increase α (more Type I errors). Increase the "design load" to be well above 500 pounds (again, costly).

4. a) Increase
b) The probability of correctly detecting deviations from 2 mm in diameter.
c) Increase
d) Increase the sample size or increase α.

7. a) It is decided the shop is not meeting standards when it is.
b) The shop is certified as meeting standards when it is not.
c) Type I
d) Type II

8. a) Deciding there has been an increase in defective items when there has not.
b) Deciding the number of defectives is small when it has increased.
c) Probably Type II, depending on the costs of shutting the line down. Generally, because of warranty costs and lost customer loyalty, defects that are caught in the factory are much cheaper to fix than defects found after items are sold.
d) Type II

9. a) The probability of detecting the shop is not meeting standards when it is not.
b) 40 cars. Larger n.
c) 10%. More chance to reject H0.
d) A lot. Larger differences are easier to detect.

10. a) The probability of deciding there are too many defectives when this is true.
b) Advantage: more power. Disadvantage: more work testing.
c) Advantage: smaller Type I error chance. Disadvantage: less power.
d) Increases because the effect size increases.

11. a) One-tailed. The company wouldn't be sued if "too many" minorities were hired.
b) Deciding the company is discriminating when it is not.
c) Deciding the company is not discriminating when it is.
d) The probability of correctly detecting discrimination when it exists.
e) Increases power.
f) Lower, since n is smaller.

12. a) One-tailed because we are interested only in whether they are more visible. If the new design is less visible, we don't care how much less visible it is.
b) Deciding the signs are more visible when they are not.
c) Failing to decide new signs are more visible when they are.
d) Ability (probability) to detect that a more visible sign works.
e) It will decrease power, because you'll need more evidence to reject H0.
f) It will decrease power because a smaller sample makes the standard deviation of the sample proportion larger.

13. a) One-tailed. Software is supposed to decrease the dropout rate.
b) H0: p = 0.13; HA: p < 0.13
c) He buys the software when it doesn't help students.
d) He doesn't buy the software when it does help students.
e) The probability of correctly deciding the software is helpful.

14. a) H0: p = 0.20; HA: p > 0.20
b) The company wants more "proof" the ad is effective before deciding it is.
c) The probability of correctly deciding that more than 20% have heard the ad and recognize the product when it's true.
d) 10%. More chance to reject H0.
e) Lower. Larger n has more power and smaller Type II error probability.

15. a) z = -3.21, p = 0.0007. The change is statistically significant.
A 95% confidence interval is (2.3%, 8.5%). This is clearly lower than 13%. If the cost of the software justifies it, the professor should consider buying the software.
b) The chance of observing 11 or fewer dropouts in a class of 203 is only 0.07% if the dropout rate is really 13%.

16. a) z = 1.33, p = 0.0923. The company should not renew the contract.
b) There is a 9.23% chance of having 133 or more of 600 people in a random sample remember the ad if in fact 20% of people do.

17. a. z = 2.066, using the unrounded standard deviation sqrt(0.25 × 0.75 / 500) ≈ 0.0194. With the rounded value 0.02 given in the exercise, z = (0.29 - 0.25) / 0.02 = 2.0.
b. About the 98th percentile; the corresponding p-value is 0.0194.
c. Assuming α = 0.05: if the population proportion really is 0.25, the probability of getting a sample proportion of 0.29 or higher in a sample of 500 is about 1.94%. Since this is less than 0.05, we reject the claim that the population proportion is 0.25.
d. The alternative hypothesis says only that the population proportion is higher than 0.25; it does not give an exact value to standardize against.

18. a. Minor disease with serious treatment, e.g., tonsillitis where the treatment is surgery.
b. Being infected with HIV.

19. Type 1 error = deciding there is an effect when there isn’t; Type 2 error = deciding there is no effect when there is.

20. We do not know which specific value in the alternative hypothesis is true.

21. a. Fail to reject H0.
b. Reject H0.

22. a. H0: baldness has no effect on heart attacks; HA: baldness has an effect on heart attacks.
b. Type 1 error = deciding there is an effect when there isn’t.
c. Type 2 error = deciding there is no effect when there is.

23. a. H0: education has no effect on developing Alzheimer’s disease; HA: higher education decreases the likelihood of developing Alzheimer’s disease.
b. The headline implies that the result was statistically significant. Restated: the study found a statistically significant relationship in which more highly educated people were less likely to develop Alzheimer’s disease.
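
The z-scores and P-values quoted in Answers 15, 16, and 17 can be reproduced with a short calculation. What follows is a minimal sketch in Python using only the standard library. The helper name one_prop_z and its arguments are illustrative assumptions, not part of the exercises. It uses the normal approximation, with the standard deviation computed under the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z(successes, n, p0, tail):
    """One-proportion z-test (hypothetical helper for illustration).

    tail is "lower" for HA: p < p0 and "upper" for HA: p > p0.
    Returns the z-score and the one-sided P-value.
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)      # standard deviation assuming H0 is true
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf(z)         # standard normal cumulative probability
    p_value = cdf if tail == "lower" else 1 - cdf
    return z, p_value


# Exercise 15: 11 dropouts out of 203, H0: p = 0.13, HA: p < 0.13
z15, p15 = one_prop_z(11, 203, 0.13, "lower")

# Exercise 16: 133 of 600 remember the ad, H0: p = 0.20, HA: p > 0.20
z16, p16 = one_prop_z(133, 600, 0.20, "upper")

# Exercise 17: sample proportion 0.29 with n = 500, i.e. 145 successes
z17, p17 = one_prop_z(145, 500, 0.25, "upper")

for label, z, p in [("15", z15, p15), ("16", z16, p16), ("17", z17, p17)]:
    print(f"Exercise {label}: z = {z:.2f}, P-value = {p:.4f}")
```

The values should agree with the answer key to the precision quoted there. Exercise 17 comes out near 2.07 rather than 2.0 because the sketch uses the unrounded standard deviation instead of the 0.02 given in the exercise.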
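
The qualitative claims about power in Answers 9, 10, and 14 (a larger sample, a larger significance level, and a larger effect each increase power) can also be checked numerically. The sketch below assumes the upper-tail setup of Exercise 14 (H0: p = 0.20) and a hypothetical true proportion of 0.25, which does not appear in the exercises and is chosen only for illustration. The function name power_upper is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def power_upper(p0, p_true, n, alpha):
    """Approximate power of an upper-tail one-proportion z-test.

    p_true is a hypothetical true proportion, assumed for illustration.
    """
    std = NormalDist()
    # Smallest sample proportion that leads to rejecting H0 at level alpha
    critical = p0 + std.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Probability of exceeding that cutoff when the true proportion is p_true
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std.cdf((critical - p_true) / se_true)


base = power_upper(0.20, 0.25, 400, 0.05)
bigger_alpha = power_upper(0.20, 0.25, 400, 0.10)    # alpha 5% -> 10%
bigger_n = power_upper(0.20, 0.25, 600, 0.05)        # n 400 -> 600
bigger_effect = power_upper(0.20, 0.30, 400, 0.05)   # true p farther from 0.20

print(f"baseline power:      {base:.3f}")
print(f"alpha = 0.10:        {bigger_alpha:.3f}")
print(f"n = 600:             {bigger_n:.3f}")
print(f"true p = 0.30:       {bigger_effect:.3f}")
```

Each of the three variations should print a power larger than the baseline, and the Type II error probability in each case is one minus the power shown.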