252y0511 2/25/05 (Open in ‘Print Layout’ format)
Possible Rubric for Statistics Exams
I have been hearing a lot about rubrics lately, and have taken a while to be assured that they are not the materials that the third pig built his house out of. My first attempt at this came to me in a recent assessment meeting.
1. Did the student make a good effort to understand the question? This would include asking the instructor and consulting notes and texts if he/she did not understand what was desired.
2. Was the method used to solve the problem the best and most appropriate for the problem?
3. Was the method used correctly?
4. Did the student present the solution in such a way that the instructor can understand how the student got the answers presented? This should include all formulas, equations and tables used. Is it evident from the way the work is presented that the student understood what he/she was doing? Is it legible?
5. Was the conclusion stated clearly? Was the null hypothesis rejected or not rejected? What were the implications of the conclusion for a relevant goal, for example the decision to buy a new product?
In view of what was said here, it is incredible that, on every exam I give, students give me confidence intervals and tests for means when I ask for confidence intervals and tests for medians, variances and even proportions. Check the wording on the questions that you misunderstood. Though one question on the multiple choice part of the exam takes a bit of thought, can you identify what wording in the question made you think it was about a mean? Can you tell me what it was?
It is also remarkable that there are any people out there who do not know that proportions, probabilities and p-values (which are probabilities) must be between zero and one.
It is also amazing to me that so many of you cannot express the difference between $t$ and $z$.
In the most practical sense a value of $t$ comes from the t table and must be used with $s$, the sample standard deviation, in confidence intervals and tests for the population mean. There are only a few other cases where we use $t$ and they will be discussed later in the course. On the other hand $z$, which comes only from the bottom line of the t table, but can be calculated using the table of the standardized Normal distribution, must be used with $\sigma$, the population standard deviation, in confidence intervals and tests for the mean. $z$ is also used in large sample tests for the population proportion, population mean, population standard deviation, population median and the means of the Poisson and Binomial distributions if the correct formulas are used, but don’t push it. The Normal distribution should not be used if more accurate methods are available. In any case, look at Things You Should Never Do on an Exam or Anywhere Else before you do another assignment and frequently thereafter.
Cheating
Most instructors would consider any collusion on the take-home exam cheating. I am a bit more lenient, since I believe peer-learning is important. But there are limits. Helping one another with methodology is acceptable, but when students copy one another’s numbers, it is not acceptable. On this exam, I saw many mistakes that I doubt that the individual would have made if he/she were working alone. Furthermore, some copying was so blatant that errors appeared because one student could not read another student’s handwriting. (Several papers copied ‘interval’ as ‘internal’.) This sort of communication, where one person is doing all the work and a second individual is simply copying, and copying badly, is not cooperation. I am afraid that more evidence of this sort of cheating will send me and the exams to the dean.
ECO 252 QBA2
FIRST HOUR EXAM
February 28 2005
Name ___KEY___________
Hour of class registered _____ Class attended if different ____
Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not usually acceptable.
I. (8 points) Do all the following. $x \sim N(1.5, 6)$
Do not make diagrams of $x$ with zero in the middle. Make up your mind! If you are diagramming $x$, put the mean in the middle; if you are diagramming $z$, put zero in the middle. Copies of the diagrams are below, but need a vertical line at 1.5 or zero to be completely useful. Remember $z = \frac{x - \mu}{\sigma}$ and that this equation implies that if we have $z$ and need a value of $x$, as in part 4, we use $x = \mu + z\sigma$.
1. $P(0 \le x \le 21.00) = P\left(\frac{0 - 1.5}{6} \le z \le \frac{21 - 1.5}{6}\right) = P(-0.25 \le z \le 3.25) = P(-0.25 \le z \le 0) + P(0 \le z \le 3.25) = .0987 + .4994 = .5981$
Make a diagram! Your diagram for $x$ should show a Normal curve with a vertical line at 1.5 in the middle and the area shaded from zero to 21. Your diagram for $z$ shows a Normal curve with a vertical line at zero in the middle and the area shaded from -0.25 to 3.25. Because this area is on both sides of zero, add.
[Figure, graph for x: Normal curve with mean 1.5 and standard deviation 6; the area between 0 and 21 is 0.5981.]
[Figure, graph for z: Normal curve with mean 0 and standard deviation 1; the area between -0.25 and 3.25 is 0.5981.]
2. $P(-7.00 \le x \le 1.50) = P\left(\frac{-7.00 - 1.5}{6} \le z \le \frac{1.50 - 1.5}{6}\right) = P(-1.42 \le z \le 0) = .4222$
Make a diagram! Your diagram for $z$ shows a Normal curve with a vertical line at zero in the middle and the area shaded from -1.42 to zero. Because this area starts at zero, you do not need to add or subtract.
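The standardization used in problems 1 and 2 can be checked numerically. The following is only a sketch using Python's standard library `statistics.NormalDist` in place of the printed Normal table; the helper names `prob_between` and `to_z` are made up for this illustration and are not part of the exam or the course outline.

```python
from statistics import NormalDist

# Distribution stated in Part I: mean 1.5, standard deviation 6.
x_dist = NormalDist(mu=1.5, sigma=6)
z_dist = NormalDist()  # standard Normal: mean 0, standard deviation 1


def prob_between(dist, lo, hi):
    """Area under the Normal curve of `dist` between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)


def to_z(x, mu=1.5, sigma=6):
    """Standardize x using z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Problem 1: P(0 <= x <= 21), once on the x scale and once on the z scale.
p1_x = prob_between(x_dist, 0, 21)
p1_z = prob_between(z_dist, to_z(0), to_z(21))

# Problem 2: P(-7 <= x <= 1.5).
p2_x = prob_between(x_dist, -7, 1.5)

print(round(p1_x, 4), round(p1_z, 4), round(p2_x, 4))
```

Because the code does not round $z$ to two decimal places, its answer for problem 2 agrees with the value quoted in the x-scale diagram rather than with the table lookup at $z = -1.42$.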
[Figure, graph for x: Normal curve with mean 1.5 and standard deviation 6; the area between -7 and 1.5 is 0.4217.]
[Figure, graph for z: Normal curve with mean 0 and standard deviation 1; the area between -1.42 and 0 is 0.4222.]
3. $P(x \ge 10.22) = P\left(z \ge \frac{10.22 - 1.5}{6}\right) = P(z \ge 1.45) = P(z \ge 0) - P(0 \le z \le 1.45) = .5 - .4265 = .0735$
[Figure, graph for x: Normal curve with mean 1.5 and standard deviation 6; the area to the right of 10.22 is 0.0731.]
[Figure, graph for z: Normal curve with mean 0 and standard deviation 1; the area to the right of 1.45 is 0.0735.]
4. $x_{.08}$
Make a diagram showing a Normal curve centered at zero, with 8% above $z_{.08}$, 42% between $z_{.08}$ and zero and 50% below zero. Recall that $z_{.08}$ is 8% from the top of the distribution and 50% - 8% = 42% from zero. So $P(0 \le z \le z_{.08}) = .4200$. According to the Normal table $P(0 \le z \le 1.40) = .4192$ and $P(0 \le z \le 1.41) = .4207$. So $z_{.08}$ is between 1.40 and 1.41, and either would be an acceptable answer, but $z_{.08} = 1.405$ would be better. Now, if we use $x = \mu + z\sigma$, we get $x_{.08} = 1.5 + 1.40(6) = 9.90$ or $x_{.08} = 1.5 + 1.405(6) = 9.93$ or $x_{.08} = 1.5 + 1.41(6) = 9.96$.
Check: $P(x \ge 9.93) = P\left(z \ge \frac{9.93 - 1.5}{6}\right) = P(z \ge 1.405) \approx P(z \ge 0) - P(0 \le z \le 1.41) = .5 - .4207 = .0793 \approx .08$.
[Figure: Normal curve with mean 1.5 and standard deviation 6; the area to the right of 9.9 is 0.0808.]
[Figure: Normal curve with mean 1.5 and standard deviation 6; the area to the right of 9.93 is 0.0800.]
[Figure: Normal curve with mean 1.5 and standard deviation 6; the area to the right of 9.96 is 0.0793.]
II. (5 points-2 point penalty for not trying part a.)
(Mansfield) A random sample is taken of the length in feet of aluminum foil rolls. The following data is found. (Recomputing what I’ve done for you is a great way to waste time.)
        x        x^2
1       74.88    5607.0144
2       75.86    5754.7396
3       74.81    5596.5361
4       74.28    5517.5184
5       74.35    5527.9225
6       73.41    5389.0281
7       74.66    5574.1156
Sum     522.25   38966.8747
a. Compute the sample standard deviation, $s$, of the lengths. Show your work! (2)
b. Compute a 99% confidence interval for the mean, $\mu$. (2)
c. Is the population mean significantly different from 75.8 ft? (1)
Solution:
a. $\bar{x} = \frac{\sum x}{n} = \frac{522.25}{7} = 74.6071$
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{38966.8747 - 7(74.6071)^2}{6} = \frac{3.3384}{6} = 0.55640$
$s = \sqrt{0.55640} = 0.7459$.
Note that excessive rounding can throw this answer way off. Using $\bar{x} = 74.61$, I got $s^2 = 0.05167$ and $s = 0.2273$.
b. Compute a 99% confidence interval for the mean, $\mu$. (2)
Given: $\bar{x} = 74.6071$, $s = 0.7459$, $n = 7$ and $\alpha = .01$. So $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{0.7459}{\sqrt{7}} = \sqrt{\frac{0.55640}{7}} = \sqrt{0.079486} = 0.2819$, $DF = n - 1 = 6$ and $t_{.005}^{(6)} = 3.707$.
$\mu = \bar{x} \pm t_{\alpha/2}\, s_{\bar{x}} = 74.6071 \pm 3.707(0.2819) = 74.607 \pm 1.045$ or 73.562 to 75.652.
c. Since 75.8 is not in the confidence interval, the sample mean of 74.6071 is significantly different from 75.8.
III. Do all of the following Problems (18+ points) Show your work except in multiple choice questions. (Actually, it doesn’t hurt there either.) If the answer is ‘None of the above,’ put in the correct answer.
1. When a p-value is smaller than a significance level
a) A type one error has been committed
b) A type two error has been committed
c) *The null hypothesis is rejected
d) The alternative hypothesis is rejected
e) The critical value is correct.
2. The t distribution should be used when the parent (underlying) population
a) *Is Normal, the population standard deviation is unknown and we are testing a mean.
b) Is Normal, the population standard deviation is known and we are testing a mean.
c) Is Normal, the mean of the population is unknown and we are testing a mean.
d) Is binomial and we are testing for a proportion.
e) The t distribution should be used in all of these cases.
3. The Normal distribution can be used in all of the cases below except when:
a) We are testing a mean, the population standard deviation is unknown and the sample is large.
b) We are testing a proportion and the sample is large.
c) We are testing a variance and the sample is large.
d) We are testing the mean of a Poisson distribution and the sample is large.
e) *All of the above are cases when the Normal distribution can be used. [6]
4. (Lange) The state wants to estimate the proportion of the labor force that was unemployed in North Hotzeplotz and wants to be 99% confident that their estimate is within 5% (written 0.05) of the population proportion. If the proportion is probably about 15%, how large a sample is needed? (3) [9]
Solution: The outline says “The usually suggested formula is $n = \frac{pqz^2}{e^2}$, but since $p$ is usually unknown, a conservative choice is to set $p = 0.5$. This is the formula everyone forgets that we covered.” We don’t need to use .5, since we have an estimate of $p = .15$, which implies $q = 1 - p = .85$. So we try $n = \frac{pqz^2}{e^2} = \frac{.15(.85)(2.576)^2}{(.05)^2} = 338.42$. Use at least 339 respondents.
5. I wish to do a test to see if the average level of satisfaction of my employees is above 75 on a zero to 100 scale. I take a survey of 30 of my 100 employees and get a mean of 76. I assume that the data is Normally distributed with a standard deviation of 8. What are my null and alternate hypotheses?
a) H 0 : 75 and H 1 : 75
b) H 0 : 75 and H 1 : 75
c) * $H_0: \mu \le 75$ and $H_1: \mu > 75$
d) H 0 : 75 and H 1 : 75
e) H 0 : 75 and H 1 : 75
f) H 0 : 75 and H 1 : 75
g) None of the above.
6. I wish to do a test to see if the average level of satisfaction of my employees is above 75 on a zero to 100 scale. I take a survey of 30 of my 100 employees and get a mean of 76.
I assume that the data is Normally distributed with a standard deviation of 8. Assume that your null and alternate hypotheses in question 5 are correct and that your confidence level is 92%. Find a critical value for the sample mean. Show clearly what formulas you are using. (3) [14]
Solution: $H_0: \mu \le 75$ and $H_1: \mu > 75$. $\alpha = .08$, $n = 30$, $N = 100$, $\mu_0 = 75$, $\sigma = 8$, $\bar{x} = 76$. The formula table says:
Interval for: Mean ($\sigma$ known)
  Confidence Interval: $\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$
  Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
  Test Ratio: $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$
  Critical Value: $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Interval for: Mean ($\sigma$ unknown)
  Confidence Interval: $\mu = \bar{x} \pm t_{\alpha/2}\, s_{\bar{x}}$, $DF = n - 1$
  Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
  Test Ratio: $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$
  Critical Value: $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2}\, s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
The alternate hypothesis says that the mean is above 75, so we find a critical value above 75. You found $z_{.08}$ in Part I of this exam!!
$\bar{x}_{cv} = \mu_0 + z_{\alpha}\,\sigma_{\bar{x}} = 75 + z_{.08}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = 75 + 1.405\sqrt{\frac{64}{30}\cdot\frac{100 - 30}{100 - 1}} = 75 + 1.405\sqrt{\frac{64(70)}{30(99)}} = 75 + 1.405\sqrt{1.5085} = 75 + 1.405(1.228) = 75 + 1.73 = 76.73$
You would get about the same value if you used $z_{.08} = 1.40$ or $z_{.08} = 1.41$. Note: The finite population correction had to appear on the exam, but without it $\sigma_{\bar{x}} = \frac{8}{\sqrt{30}} = 1.46$.
7. (Dummeldinger) A movie rental chain is considering opening a new outlet. The corporation will open an outlet only if more than 5000 out of the 20000 households in the area have DVD players. It randomly selects 300 households and finds that 96 have DVD players. What are our null and alternative hypotheses?
a) H 0 : 5000 and H1 : 5000
b) H 0 : p 5000 and H1 : p 5000
c) H 0 : p .32 and H1 : p .32
d) * $H_0: p \le .25$ and $H_1: p > .25$
e) H 0 : .25 and H1 : .25
f) H 0 : 5000 and H1 : 5000 [16]
Note that $\bar{p} = \frac{96}{300} = .32$ is a statistic, not a parameter, and that we know it is true.
8. We wish to determine if the median income in an area exceeds $40000. A random sample of 250 households was selected. 104 had incomes above $40000. Let p be the proportion with incomes above $40000.
Our null hypotheses include (3):
a) H 0 : 40000
b) H 0 : 40000
c) H 0 : p 0.5
d) * $H_0: p \le 0.5$
e) H 0 : p 0.5
f) H 0 : 40000
g) * $H_0: \eta \le 40000$
h) H 0 : 0.5
i) H 0 : 0.5 [19]
Explanation: “The median income in an area exceeds $40000” is written $\eta > 40000$, where $\eta$ is the population median, and must be an alternative hypothesis, since it does not contain an equality. Its opposite is $H_0: \eta \le 40000$. The table in the outline reads as below.
Hypotheses about a median | Hypotheses about a proportion, if p is the proportion above $\eta_0$ | Hypotheses about a proportion, if p is the proportion below $\eta_0$
$H_0: \eta = \eta_0$, $H_1: \eta \ne \eta_0$ | $H_0: p = .5$, $H_1: p \ne .5$ | $H_0: p = .5$, $H_1: p \ne .5$
$H_0: \eta \ge \eta_0$, $H_1: \eta < \eta_0$ | $H_0: p \ge .5$, $H_1: p < .5$ | $H_0: p \le .5$, $H_1: p > .5$
$H_0: \eta \le \eta_0$, $H_1: \eta > \eta_0$ | $H_0: p \le .5$, $H_1: p > .5$ | $H_0: p \ge .5$, $H_1: p < .5$
9. We wish to determine if the median income in an area exceeds $40000. A random sample of 250 households was selected. 104 had incomes above $40000. Let p be the proportion with incomes above $40000. Assume that your null hypotheses in 8) are correct and test the hypothesis.
a) Using a test ratio and a p-value. (2).
b) Using a critical value for $\bar{x}$, $\bar{p}$, $s$ or the median as appropriate. (2).
c) Using a confidence interval. (2, 3 or 5 depending on your level of chutzpah).
d) (Extra Credit) Redo a), b) or c) assuming that the sample of 250 came from a population of 2000.
Solution: From the formula table we have:
Interval for: Proportion
  Confidence Interval: $p = \bar{p} \pm z_{\alpha/2}\, s_{\bar{p}}$, where $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$
  Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$
  Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$
  Critical Value: $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\,\sigma_{\bar{p}}$, where $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$
The hypotheses were $H_0: \eta \le 40000$ and $H_1: \eta > 40000$ and are now $H_0: p \le 0.5$ and $H_1: p > 0.5$. So $n = 250$, $p_0 = 0.5$, $\bar{p} = \frac{104}{250} = .416$ and I assumed that $\alpha = .05$.
$\sigma_{\bar{p}} = \sqrt{\frac{0.5(0.5)}{250}} = \sqrt{.001} = .03162$, $s_{\bar{p}} = \sqrt{\frac{0.416(0.584)}{250}} = \sqrt{.0009718} = .03117$
a) Using a test ratio and a p-value. (2). $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.416 - .5}{.03162} = -2.66$. Since this is a right-sided test, $pvalue = P(z \ge -2.66) = .5 + .4961 = .9961$. Since this is above $\alpha = .05$, do not reject the null hypothesis.
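The test ratio and right-sided p-value in 9a can be reproduced with a short script. This is a minimal sketch, assuming the Normal approximation the solution uses and replacing the printed Normal table with `statistics.NormalDist`; the function name `proportion_z_test` is hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Right-sided large-sample test of H0: p <= p0 against H1: p > p0.

    Returns the sample proportion, the standard error under H0,
    the test ratio z and the right-tail p-value P(Z >= z).
    """
    p_bar = successes / n
    sigma_p = sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_bar
    z = (p_bar - p0) / sigma_p
    p_value = 1 - NormalDist().cdf(z)
    return p_bar, sigma_p, z, p_value


# Problem 9: 104 of 250 sampled households have incomes above $40000.
p_bar, sigma_p, z, p_value = proportion_z_test(104, 250, 0.5)
print(f"p_bar={p_bar:.3f} sigma_p={sigma_p:.5f} z={z:.2f} p-value={p_value:.4f}")
```

A negative test ratio in a right-sided test always gives a p-value above .5, so the null hypothesis cannot be rejected at any conventional significance level.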
[Figure: Normal curve with mean 0 and standard deviation 1; the area to the right of -2.66 is 0.9961.]
b) Using a critical value for $\bar{x}$, $\bar{p}$, $s$ or the median as appropriate. (2). The only reasonable possibility is a critical value for $\bar{p}$. Since the alternative hypothesis is $p > .5$, we need a critical value above 0.5. $\bar{p}_{cv} = p_0 + z_{.05}\,\sigma_{\bar{p}} = .5 + 1.645(.03162) = .552$. Make a diagram of a Normal curve with a mean at 0.5 and a rejection zone above 0.552. Since $\bar{p} = .416$ is not in the rejection zone, do not reject the null hypothesis.
c) Using a confidence interval. (2, 3 or 5 depending on your level of chutzpah). If we do it the easy way, use a one-sided confidence interval in the same direction as the alternative hypothesis. $p \ge \bar{p} - z_{\alpha}\, s_{\bar{p}} = .416 - 1.645(.03117) = .365$. Since it is not impossible for the proportion to be both at least .365 and at most .5, do not reject the null hypothesis. Make a diagram with an almost Normal curve centered at .416. To represent the confidence interval, show a shaded area above .365. To represent the null hypothesis show a second shaded area below .5. Note that they overlap. Actually, it is sufficient to show that $p_0 = .5$ is in the first shaded area.
If you have real nerve try a confidence interval for the median. It says in the outline to use $x_k$, where $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$, as the lower limit in a 2-sided confidence interval. The upper limit would be $x_{n-k+1}$. Since our alternative hypothesis is $H_1: \eta > 40000$, we want a lower limit in a 1-sided confidence interval. So we use $k = \frac{n + 1 - z_{.05}\sqrt{n}}{2} = \frac{251 - 1.645\sqrt{250}}{2} = 112.495$. Our interval will be $\eta \ge x_{113}$, where $x_{113}$ is the 113th number when the numbers are put in order. We can say $P(\eta \ge x_{113}) \approx .95$ and if $x_{113}$ is above 40000, reject the null hypothesis.
d) (Extra Credit) Redo a), b) or c) assuming that the sample of 250 came from a population of 2000. In Section 8.7, the text says to use $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}\cdot\frac{N - n}{N - 1}} = \sqrt{\frac{0.416(0.584)}{250}\cdot\frac{1750}{1999}} = \sqrt{.0008507} = .02917$ in a confidence interval. By analogy $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}\cdot\frac{N - n}{N - 1}} = \sqrt{\frac{.5(.5)}{250}\cdot\frac{1750}{1999}} = \sqrt{.0008754} = .02959$ would be used in test ratios or critical values. The quantity under the square root is the variance of the sample proportion under the Hypergeometric distribution.

ECO252 QBA2
FIRST EXAM
February 28 2005
TAKE HOME SECTION
Name: _________________________
Student Number and class: _________________________
IV. Do at least 3 problems (at least 7 each) (or do sections adding to at least 20 points - Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons are not usually acceptable.
1. (Dummeldinger) You are an automobile manufacturer and the EPA has just estimated that your 2005 Prejector model gets 35 miles per gallon on the highway. You wish to prove that the Prejector gets more than 35 mpg. 50 of the current model are tested with the results below. To personalize the data below take the last digit of your student number, divide it by 10 and add it to the numbers below. (For example, Seymour Butz’s student number is 976502, so he will add 0.20 and change the data to 44.64, 48.04, 37.57 etc., but see the hint below; you do not need to write down the numbers that you are using, just your computations.)
Miles per gallon
44.44 47.84 34.59 32.02 35.61 42.56 40.92 33.56 44.26 21.41
42.70 44.70 37.37 36.27 35.80 46.52 24.56 38.24 41.37 33.47
43.45 35.72 43.58 33.07 39.57 23.55 51.16 45.40 37.59 41.49
21.05 44.28 29.14 29.54 34.07 48.02 43.41 41.86 42.31 23.98
24.03 35.20 27.58 41.13 32.18 39.03 44.44 36.78 35.47 33.88
Hint: $n = 50$, $\sum x = 1860.17$, $\sum x^2 = 71904.65$, $\sum (x + a) = \sum x + na$, $\sum (x + a)^2 = \sum (x^2 + 2ax + a^2) = \sum x^2 + 2a\sum x + na^2$
Assume that the Normal distribution applies to the data and use a 99% confidence level.
a. Find the sample mean and sample standard deviation of your data, showing your work. (1)
b. State your null and alternative hypotheses (1)
c. Test the hypothesis using a test ratio (1)
d. Test the hypothesis using a critical value for a sample mean. (1)
e. Test the hypothesis using a confidence interval (1)
f. Find an approximate p-value for the null hypothesis. (1)
g. On the basis of your tests, is the EPA right? Why? (1)
h. Assume that the Normal distribution does not apply and, using the data as given above, test that the median is above 35. (3)
i. (Extra credit) Again, use the data as given and do an approximate 99% 2-sided confidence interval for the median.
2. Once again, assume that the Normal distribution applies, but assume a population standard deviation of 7 and that we are testing whether the mean is below 36 mpg. (99% confidence level)
a. State your null and alternative hypotheses (1)
b. Find a p-value for the null hypothesis using the mean that you found in a. (1)
c. Create a power curve for the test. (6)
3.
a. Assume that you are testing the hypothesis $\eta < 36$ using the original data. Let $p$ be the proportion of the data above 36, so that, according to the outline, your alternate hypothesis is $p < .5$. Using a 99% confidence level find a critical value for $\bar{p}$, how many items in the sample of 50 would have to be above 36 for you to reject the null hypothesis (This answer should either say ‘between 0 and ?’ or ‘between ? and 50.’) (2)
b. Using the proportion of numbers above 36 in the original data, find a p-value for the null hypothesis. (1)
c. (Extra credit) Create a power curve for the test by using the alternate hypothesis in b and finding the power for other values of $p_1$. (up to 6)
d. Assume that $p = .5$, how large a sample would you need to estimate the proportion above 36 with an error of .01? How much would you cut down the sample size if you used the proportion that you actually found? Illustrate how much the required sample size would fall if you lowered the confidence level. (3)
e. Use the proportion that you found in 3b) to create a 2-sided confidence interval for the proportion above 36. Does it differ significantly from .5? Why? (2)
4.
a. Take the standard deviation that you found in 1), add the same quantity that you added in part 1) to it. (For example, Seymour Butz’s student number is 976502 and he found $s = 7.12$, so he will add 0.20 to it and use 7.32.)
b. Test the hypothesis that the standard deviation is 6. (99% confidence level) Use a test ratio. (2) Find a p-value for your answer in 4a). (1)
c. Do a 99% confidence interval for the standard deviation (2)
d. (Extra credit) Redo 4a) using an appropriate confidence interval. (2)
e. (Extra credit) Find critical values for $s$ in 4a). (1)
f. A bank's average default rate on loans is supposedly 7 per month. In the first month there are 13 defaults. Test the first assertion assuming a Poisson distribution. Use a two-sided test with a 1% significance level. (2)
g. In 4f) find what values of $x$ (the number of defaults in the first month) would enable you not to reject the null hypothesis. (2)
h. (Extra credit) Assume that the bank, in fact, has an average default rate on loans of 9 per month, what is the probability that you will fail to reject your null hypothesis that the mean is 7, using the ‘accept’ zone that you found in g)?

1.
(Dummeldinger) You are an automobile manufacturer and the EPA has just estimated that your 2005 Prejector model gets 35 miles per gallon on the highway. You wish to prove that the Prejector gets more than 35 mpg. 50 of the current model are tested with the results below. To personalize the data below take the last digit of your student number, divide it by 10 and add it to the numbers below. (For example, Seymour Butz’s student number is 976502, so he will add 0.20 and change the data to 44.64, 48.04, 37.57 etc., but see the hint below; you do not need to write down the numbers that you are using, just your computations.)
Miles per gallon
44.44 47.84 34.59 32.02 35.61 42.56 40.92 33.56 44.26 21.41
42.70 44.70 37.37 36.27 35.80 46.52 24.56 38.24 41.37 33.47
43.45 35.72 43.58 33.07 39.57 23.55 51.16 45.40 37.59 41.49
21.05 44.28 29.14 29.54 34.07 48.02 43.41 41.86 42.31 23.98
24.03 35.20 27.58 41.13 32.18 39.03 44.44 36.78 35.47 33.88
Hint: $n = 50$, $\sum x = 1860.17$, $\sum x^2 = 71904.65$, $\sum (x + a) = \sum x + na$, $\sum (x + a)^2 = \sum x^2 + 2a\sum x + na^2$
Assume that the Normal distribution applies to the data and use a 99% confidence level.
a. Find the sample mean and sample standard deviation of your data, showing your work. (1)
Seymour wouldn’t use the formulas above, so he actually added all the numbers and their squares. He got $\sum x = 1861.17$ and $\sum x^2 = 71979.1$. If he had had the sense to use the formulas above, he would have found $\sum (x + 0.20) = 1860.17 + 50(.02) = 1861.17$ and $\sum (x + 0.20)^2 = 71904.65 + 2(.02)(1860.17) + 50(.02)^2 = 71904.65 + 74.41 + 0.02 = 71979.08$. So I will use $\sum x = 1861.17$, $\sum x^2 = 71979.08$ and $n = 50$.
$\bar{x} = \frac{\sum x}{n} = \frac{1861.17}{50} = 37.223$
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{71979.08 - 50(37.223)^2}{49} = 55.133$
$s = \sqrt{55.133} = 7.425$
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{55.133}{50}} = 1.050$ and $n = 50$.
b. State your null and alternative hypotheses (1)
$H_0: \mu \le 35$ and $H_1: \mu > 35$. The formula table says:
Interval for: Mean ($\sigma$ unknown)
  Confidence Interval: $\mu = \bar{x} \pm t_{\alpha/2}\, s_{\bar{x}}$, $DF = n - 1$
  Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
  Test Ratio: $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$
  Critical Value: $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2}\, s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Here $\alpha = .01$, $\mu_0 = 35$, $\bar{x} = 37.223$, $s_{\bar{x}} = 1.050$ and $n = 50$. For a one sided test use $t_{.01}^{(49)} = 2.405$.
c. Test the hypothesis using a test ratio (1)
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{37.223 - 35}{1.050} = 2.117$. Make a diagram showing an almost-Normal curve with a vertical bar at $t = 0$. Show a 1% rejection zone above $t_{.01}^{(49)} = 2.405$. Since 2.117 is not in the rejection zone, do not reject the null hypothesis.
d. Test the hypothesis using a critical value for a sample mean. (1)
The alternative hypothesis tells us that we need a critical value above 35. The form will be $\bar{x}_{cv} = \mu_0 + t_{\alpha}\, s_{\bar{x}} = 35 + 2.405(1.050) = 37.525$. Make a diagram showing an almost-Normal curve with a vertical bar at $\mu_0 = 35$. Show a 1% rejection zone above 37.525. Since $\bar{x} = 37.223$ is not in the rejection zone, do not reject the null hypothesis.
e. Test the hypothesis using a confidence interval (1)
In view of the alternative hypothesis, the one-sided confidence interval will have the form $\mu \ge \bar{x} - t_{\alpha}\, s_{\bar{x}} = 37.223 - 2.405(1.050) = 34.70$. Make a diagram showing an almost-Normal curve with a vertical bar at $\bar{x} = 37.223$. Represent the confidence interval by shading the entire area above 34.70. Represent the null hypothesis, $H_0: \mu \le 35$, by shading the entire area below 35. Since these areas overlap, the confidence interval does not contradict the null hypothesis.
f. Find an approximate p-value for the null hypothesis. (1)
Recall that $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{37.223 - 35}{1.050} = 2.117$. Since we are doing a right-sided test, the p-value would be $P(t \ge 2.117)$. Remember that we have $n - 1 = 50 - 1 = 49$ degrees of freedom. If we try to locate 2.117 on the 49 df line of the t table we find that it is between $t_{.025}^{(49)} = 2.010$ and $t_{.01}^{(49)} = 2.405$. Remember that by the definition of the symbols, $P(t \ge t_{.025}^{(49)}) = .025$ and $P(t \ge t_{.01}^{(49)}) = .01$. Since 2.117 lies between the two values of $t$ that we found on the table, $.025 > P(t \ge 2.117) > .01$, or $.025 > pvalue > .01$. This is verified by running tAreaA (see the figure below). If we use $\alpha = .01$, we can see that the p-value is above the significance level, which means we do not reject the null hypothesis.
[Figure: t curve with 49 degrees of freedom and standard deviation 1.02105; the area to the right of 2.117 is 0.0197.]
g. On the basis of your tests, is the EPA right? Why? (1)
Since we have not rejected the null hypothesis, $H_0: \mu \le 35$, we cannot dispute the EPA’s statement that the mean gas mileage is 35 mpg.
h. Assume that the Normal distribution does not apply and, using the data as given above, test that the median is above 35. (3)
The original data is repeated with values above 35 starred.
44.44* 34.59 35.61* 40.92* 44.26* 42.70* 47.84* 32.02 42.56* 33.56
21.41 44.70* 37.37* 36.27* 35.80* 46.52* 24.56 38.24* 41.37* 33.47
43.45* 35.72* 43.58* 33.07 39.57* 23.55 51.16* 45.40* 37.59* 41.49*
21.05 44.28* 29.14 29.54 34.07 48.02* 43.41* 41.86* 42.31* 23.98
24.03 35.20* 27.58 41.13* 32.18 39.03* 44.44* 36.78* 35.47* 33.88
It looks to me as if 33 out of 50 are above 35. Using the outline, we have ‘The median mpg is above 35’ is written $\eta > 35$, where $\eta$ is the population median, and must be an alternative hypothesis since it does not contain an equality. Its opposite is $H_0: \eta \le 35$. Let us say that $p$ is the proportion above 35. The table in the outline reads as below.
Hypotheses about a median | Hypotheses about a proportion, if p is the proportion above $\eta_0$ | Hypotheses about a proportion, if p is the proportion below $\eta_0$
$H_0: \eta = \eta_0$, $H_1: \eta \ne \eta_0$ | $H_0: p = .5$, $H_1: p \ne .5$ | $H_0: p = .5$, $H_1: p \ne .5$
$H_0: \eta \ge \eta_0$, $H_1: \eta < \eta_0$ | $H_0: p \ge .5$, $H_1: p < .5$ | $H_0: p \le .5$, $H_1: p > .5$
$H_0: \eta \le \eta_0$, $H_1: \eta > \eta_0$ | $H_0: p \le .5$, $H_1: p > .5$ | $H_0: p \ge .5$, $H_1: p < .5$
So our hypotheses become $H_0: p \le .5$ and $H_1: p > .5$. There are three ways to go from here.
(α) According to the binomial table with $p = .5$ and $n = 50$, $pvalue = P(x \ge 33) = 1 - P(x \le 32) = 1 - .98358 = .0164$. This is below $\alpha = .05$.
(β) If we do a conventional test of a proportion, $\bar{p} = \frac{33}{50} = .66$, $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.5(.5)}{50}} = .0707$, and $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.66 - .5}{.0707} = 2.263$. We can test this against $z_{.05} = 1.645$ or $z_{.01} = 2.327$ or we can say $pvalue = P(z \ge 2.263) = .5 - P(0 \le z \le 2.26) = .5 - .4881 = .0119$.
(γ) From the outline, we could also use $z = \frac{2x - n}{\sqrt{n}} = \frac{2(33) - 50}{\sqrt{50}} = 2.263$. This is identical to (β).
These all lead to a rejection of our null hypothesis if $\alpha = .05$ or to a failure to reject the hypothesis if $\alpha = .01$.
i. (Extra credit) Again, use the data as given and do an approximate 99% 2-sided confidence interval for the median. The original data is presented in order.
Rank  x       Rank  x       Rank  x
1     21.05   18    35.20   35    42.31
2     21.41   19    35.47   36    42.56
3     23.55   20    35.61   37    42.70
4     23.98   21    35.72   38    43.41
5     24.03   22    35.80   39    43.45
6     24.56   23    36.27   40    43.58
7     27.58   24    36.78   41    44.26
8     29.14   25    37.37   42    44.28
9     29.54   26    37.59   43    44.44
10    32.02   27    38.24   44    44.44
11    32.18   28    39.03   45    44.70
12    33.07   29    39.57   46    45.40
13    33.47   30    40.92   47    46.52
14    33.56   31    41.13   48    47.84
15    33.88   32    41.37   49    48.02
16    34.07   33    41.49   50    51.16
17    34.59   34    41.86
It says in the outline to use $x_k$, where $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$, as the lower limit in a 2-sided confidence interval. The upper limit would be $x_{n-k+1}$. If we want a 2-sided interval with $\alpha = .01$, use $z_{.005} = 2.576$. So $k = \frac{50 + 1 - 2.576\sqrt{50}}{2} = 16.392$, which rounds down to 16, and $n - k + 1 = 50 - 16 + 1 = 35$. From the data above $x_{16} = 34.07$ and $x_{35} = 42.31$. So we can say that $P(34.07 \le \eta \le 42.31) \approx .99$.
Let us try to verify this by use of the binomial table with $p = .5$ and $n = 50$. $P(16 \le x \le 35) = P(x \le 35) - P(x \le 15) = .99870 - .00330 = .9954$. The next smallest interval would be $x_{17}$ to $x_{34}$, and $P(17 \le x \le 34) = P(x \le 34) - P(x \le 16) = .99670 - .00767 = .9890$ is too small if we want to be conservative.
2. Once again, assume that the Normal distribution applies, but assume a population standard deviation of 7 and that we are testing whether the mean is below 36 mpg. (99% confidence level)
a. State your null and alternative hypotheses (1)
$H_0: \mu \ge 36$ and $H_1: \mu < 36$.
b. Find a p-value for the null hypothesis using the mean that you found in a. (1)
Remember that $n = 50$ and that Seymour found $\bar{x} = 37.223$. If $\sigma = 7$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{50}} = \sqrt{\frac{49}{50}} = 0.9899$. $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{37.223 - 36}{0.9899} = 1.24$ and since this is a left-sided test, $pvalue = P(z \le 1.24) = .5 + .3925 = .8925$.
c. Create a power curve for the test. (6)
Remember that we had $H_0: \mu \ge 36$ and $H_1: \mu < 36$, so that the critical value must be below 36. The formula table gives us $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, which becomes $\bar{x}_{cv} = \mu_0 - z_{\alpha}\,\sigma_{\bar{x}} = 36 - 2.327(0.9899) = 33.697$. The diagram for the test is a Normal curve centered on a vertical line at 36. Below 36, the value 33.697 cuts off a 1% rejection zone. So we do not reject the null hypothesis if the sample mean is above 33.697. Let us use the following points for our operating characteristic curve: 36.00, 34.8, 33.697, 32.4 and 31.2. Since the difference between 36 and 33.697 is about 2.4, I used a distance of half that or 1.2 to pick my points. We will not reject the null hypothesis if the sample mean is above 33.697. $\sigma_{\bar{x}} = 0.9899$. So
$\mu_1 = 36$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 36) = P\left(z \ge \frac{33.697 - 36}{0.9899}\right) = P(z \ge -2.33) = .5 + .4901 = .9901 = 1 - \alpha$. Power $= 1 - \beta = 1 - .9901 = .0099$
$\mu_1 = 34.8$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 34.8) = P\left(z \ge \frac{33.697 - 34.8}{0.9899}\right) = P(z \ge -1.11) = .5 + .3665 = .8665$. Power $= 1 - \beta = 1 - .8665 = .1335$
$\mu_1 = 33.697$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 33.697) = P\left(z \ge \frac{33.697 - 33.697}{0.9899}\right) = P(z \ge 0) = .5$. Power $= 1 - \beta = 1 - .5 = .5$
$\mu_1 = 32.4$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 32.4) = P\left(z \ge \frac{33.697 - 32.4}{0.9899}\right) = P(z \ge 1.31) = .5 - .4049 = .0951$. Power $= 1 - \beta = 1 - .0951 = .9049$
$\mu_1 = 31.2$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 31.2) = P\left(z \ge \frac{33.697 - 31.2}{0.9899}\right) = P(z \ge 2.52) = .5 - .4941 = .0059$. Power $= 1 - \beta = 1 - .0059 = .9941$
The power curve is a simple graph of these points. The x axis goes from about 31 to 36 and the y axis from zero to 1. The curve falls from almost 1 or 100% to .01 or 1% at 36.
3.
a. Assume that you are testing the hypothesis $\eta < 36$ using the original data. Let $p$ be the proportion of the data above 36, so that, according to the outline, your alternate hypothesis is $p < .5$. Using a 99% confidence level find a critical value for $\bar{p}$, how many items in the sample of 50 would have to be above 36 for you to reject the null hypothesis (This answer should either say ‘between 0 and ?’ or ‘between ? and 50.’) (2)
We had $H_0: \eta \ge 36$ and $H_1: \eta < 36$ which (according to the table in problem 1h) became $H_0: p \ge .5$ and $H_1: p < .5$. We need a critical value for a one-sided test that is below .5. We have already seen that $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.5(.5)}{50}} = .0707$, so that $\bar{p}_{cv} = p_0 - z_{\alpha}\,\sigma_{\bar{p}} = .5 - 2.327(.0707) = .3354$. This is about 16 items out of 50, so that, if there are between 0 and 16 items above 36, we reject the null hypothesis. If you have more sense than I did and use the Binomial table instead, you will find that $P(x \le 16) = .00767$ is the highest probability below 1% on the $n = 50$, $p = .5$ part of the table.
b. Using the proportion of numbers above 36 in the original data, find a p-value for the null hypothesis. (1)
If you look at the numbers in order in problem 1i, you will see that 22 are below 36 and 28 are above 36. The proportion is thus $\bar{p} = \frac{28}{50} = .56$. Since this is a left-sided test, $pvalue = P(\bar{p} \le .56) = P\left(z \le \frac{.56 - .5}{.0707}\right) = P(z \le 0.84) = .5 + .2995 = .7995$. The killjoys who used the Binomial table got a much more accurate value of $P(x \le 28) = .83888$.
c. (Extra credit) Create a power curve for the test by using the alternate hypothesis in b and finding the power for other values of $p_1$. (up to 6)
Remember that the hypotheses are $H_0: p \ge .5$ and $H_1: p < .5$, and that $\bar{p}_{cv} = .3354$. If the proportion is below .5, we will not reject the null hypothesis if $\bar{p}$ is above .3354. The halfway point between .5 and .3354 is .4177, which is about .08 below .5. I used .5, .42, .3353, .26 and .18. Note that I ignored the failure of everyone who did this question to recompute $\sigma_{\bar{p}}$.
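The power calculation for this left-sided proportion test, with the standard error recomputed at each alternative value $p_1$, can be sketched as follows. This is an illustration under the same Normal approximation, not the outline's own procedure; the function name `power_lower_tail` is hypothetical, and the critical value is derived from the exact Normal quantile rather than the 2.327 read from the t table, so it differs from .3354 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_lower_tail(p1, p_cv, n):
    """Power of a left-sided proportion test when the true proportion is p1.

    We reject H0 when the sample proportion falls below p_cv, so the power is
    P(p_bar <= p_cv | p = p1), with the standard error recomputed at p1.
    """
    sigma_p1 = sqrt(p1 * (1 - p1) / n)
    return STD_NORMAL.cdf((p_cv - p1) / sigma_p1)


n, p0, alpha = 50, 0.5, 0.01
sigma_p0 = sqrt(p0 * (1 - p0) / n)
p_cv = p0 - STD_NORMAL.inv_cdf(1 - alpha) * sigma_p0  # critical value below .5

# Same kind of grid as in the text: p0, the critical value, and points
# roughly .08 apart on either side of it.
for p1 in (0.5, 0.42, p_cv, 0.26, 0.18):
    print(f"p1={p1:.4f}  power={power_lower_tail(p1, p_cv, n):.4f}")
```

At $p_1 = p_0$ the power equals the significance level, and at $p_1 = \bar{p}_{cv}$ it equals one half, which are the two anchor points of the power curve.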
$p_1 = .5$: $\sigma_p = .0707$. $\beta = P(\bar{p} \ge .3353 \mid p_1 = .5) = P\left(z \ge \frac{.3353 - .5}{.0707}\right) = P(z \ge -2.33) = .5 + .4901 = .9901$. Power $= 1 - \beta = 1 - .9901 = .0099$.
$p_1 = .42$: $\sigma_p = \sqrt{\frac{.42(.58)}{50}} = .0698$. $\beta = P(\bar{p} \ge .3353 \mid p_1 = .42) = P\left(z \ge \frac{.3353 - .42}{.0698}\right) = P(z \ge -1.21) = .5 + .3869 = .8869$. Power $= 1 - \beta = 1 - .8869 = .1131$.
$p_1 = .3353$: $\sigma_p = \sqrt{\frac{.3353(.6647)}{50}} = .0668$. $\beta = P(\bar{p} \ge .3353 \mid p_1 = .3353) = P\left(z \ge \frac{.3353 - .3353}{.0668}\right) = P(z \ge 0) = .5$. Power $= 1 - \beta = 1 - .5 = .5$.
$p_1 = .26$: $\sigma_p = \sqrt{\frac{.26(.74)}{50}} = .0620$. $\beta = P(\bar{p} \ge .3353 \mid p_1 = .26) = P\left(z \ge \frac{.3353 - .26}{.0620}\right) = P(z \ge 1.21) = .5 - .3869 = .1131$. Power $= 1 - \beta = 1 - .1131 = .8869$.
$p_1 = .18$: $\sigma_p = \sqrt{\frac{.18(.82)}{50}} = .0543$. $\beta = P(\bar{p} \ge .3353 \mid p_1 = .18) = P\left(z \ge \frac{.3353 - .18}{.0543}\right) = P(z \ge 2.87) = .5 - .4979 = .0021$. Power $= 1 - \beta = 1 - .0021 = .9979$.

d. Assume that $p = .5$. How large a sample would you need to estimate the proportion above 36 with an error of .01? How much would you cut down the sample size if you used the proportion that you actually found? Illustrate how much the required sample size would fall if you lowered the confidence level. (3)
The outline says “The usually suggested formula is $n = \frac{pqz^2}{e^2}$, but since $p$ is usually unknown, a conservative choice is to set $p = 0.5$.” This is the formula everyone forgets that we covered. Assume $\alpha = .01$. So $n = \frac{pqz^2}{e^2} = \frac{.5(.5)(2.576)^2}{(.01)^2} = 16589.44$, and we use 16590. We actually found $\bar{p} = \frac{28}{50} = .56$. If we use .56 instead, $n = \frac{.56(.44)(2.576)^2}{(.01)^2} = 16350.55$, and we use 16351. This is 98.6% of the previous value, but higher values of $p$ could bring considerable savings. Now, if we switch from a 99% confidence level to a 95% confidence level, $n = \frac{.56(.44)(1.960)^2}{(.01)^2} = 9465.70$. We use 9466, which is 58% of our second value and 57% of our original sample size and thus represents a considerable saving.

e. Use the proportion that you found in 3b) to create a 2-sided confidence interval for the proportion above 36. Does it differ significantly from .5? Why? (2)
$\bar{p} = \frac{28}{50} = .56$, $p = \bar{p} \pm z_{\alpha/2}\, s_p$ and $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{.56(.44)}{50}} = .0702$. A 99% confidence interval would be $.56 \pm 2.576(.0702) = .56 \pm .181$ or .379 to .741. If you used a 95% confidence level instead, you would get $.56 \pm 1.960(.0702) = .56 \pm .138$ or .422 to .698. Because the confidence interval includes .5, the difference is not significant at the 95% or 99% level.

4. a.
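Take the standard deviation that you found in 1), add the same quantity that you added in part 1) to it. (For example, Seymour Butz’s student number is 976502 and he found $s = 7.12$, so he added 0.20 to it and used 7.32.) (No credit.)

b. Test the hypothesis that the standard deviation is 6. (99% confidence level) Use a test ratio. (2) Find a p-value for your answer in 4a). (1)
Our hypotheses are $H_0: \sigma = 6$ and $H_1: \sigma \ne 6$. Since $n = 50$, we are in a large sample situation. The outline says $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ and for large samples $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$. So $\chi^2 = \frac{49(7.425)^2}{6^2} = 75.0389$. Assume $\alpha = .01$, $DF = 50 - 1 = 49$ and $z = \sqrt{2(75.0389)} - \sqrt{2(49) - 1} = \sqrt{150.0778} - \sqrt{97} = 12.2506 - 9.84886 = 2.40$. For a 2-sided test, make a Normal curve with a vertical line at the center where $z = 0$. Rejection zones will be above $z_{.005} = 2.576$ and below $-z_{.005} = -2.576$. Since 2.40 is between the critical values, do not reject the null hypothesis. Note that for a 5% significance level, the hypothesis would be rejected. Since this is a 2-sided test, we use $pvalue = 2P(z \ge 2.40) = 2(.5 - .4918) = .0164$.

c. Do a 99% confidence interval for the standard deviation (2).
The outline gives $\frac{s\sqrt{2DF}}{z_{\alpha/2} + \sqrt{2DF}} \le \sigma \le \frac{s\sqrt{2DF}}{-z_{\alpha/2} + \sqrt{2DF}}$. If $\alpha = .01$, $z_{\alpha/2} = z_{.005} = 2.576$ and $\sqrt{2DF} = \sqrt{2(49)} = \sqrt{98} = 9.899$. The interval is thus $\frac{7.425(9.899)}{2.576 + 9.899} \le \sigma \le \frac{7.425(9.899)}{-2.576 + 9.899}$ or $\frac{73.500}{12.475} \le \sigma \le \frac{73.500}{7.323}$ or $5.892 \le \sigma \le 10.037$.

d. (Extra credit) Redo 4b) using an appropriate confidence interval. (2)
$H_0: \sigma = 6$ and $H_1: \sigma \ne 6$ and $\alpha = .01$. For $\alpha = .01$ the appropriate interval is the 99% interval from 4c), 5.892 to 10.037. Since 6 is on this interval, do not reject the null hypothesis, which agrees with 4b). If we use $z_{.025} = 1.960$ instead (a 5% significance level), the interval becomes $\frac{7.425(9.899)}{1.960 + 9.899} \le \sigma \le \frac{7.425(9.899)}{-1.960 + 9.899}$ or $\frac{73.500}{11.859} \le \sigma \le \frac{73.500}{7.939}$ or 6.197 to 9.260. Since 6 is not on this interval, reject the null hypothesis at the 5% level, as noted in 4b).

e. (Extra credit) Find critical values for $s$ in 4a). (1)
The easiest way to do this is to use the formula sheet, which says $s_{cv} = \frac{\sigma_0\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$. With $z_{.025} = 1.960$ these become $s_{cv} = \frac{6(9.899)}{1.960 + 9.899} = \frac{59.394}{11.859} = 5.008$ and $s_{cv} = \frac{6(9.899)}{-1.960 + 9.899} = \frac{59.394}{7.939} = 7.481$. Note that $s = 7.425$ is between these two values, so that we cannot reject the null hypothesis. (With $z_{.005} = 2.576$, which matches $\alpha = .01$, the values are $\frac{59.394}{12.475} = 4.761$ and $\frac{59.394}{7.323} = 8.111$, and $s = 7.425$ is again between them.)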
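The arithmetic in 4b) and 4c) can be checked with a short Python sketch. It is not part of the exam or the outline: the function names are made up for illustration, and the Normal probabilities come from `math.erf` rather than the printed table, so the last digit may differ slightly from the hand calculation.

```python
import math

def norm_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sd_z_test(s, sigma0, n):
    """Large-sample test ratio for H0: sigma = sigma0, with a 2-sided p-value."""
    df = n - 1
    chi_sq = df * s ** 2 / sigma0 ** 2                 # chi-square statistic
    z = math.sqrt(2 * chi_sq) - math.sqrt(2 * df - 1)  # large-sample z
    p_value = 2 * (1 - norm_cdf(abs(z)))               # doubled tail area
    return chi_sq, z, p_value

def sd_interval(s, n, z_crit):
    """Large-sample confidence interval for sigma, using sqrt(2*DF)."""
    root = math.sqrt(2 * (n - 1))
    return s * root / (z_crit + root), s * root / (-z_crit + root)

# Values used in the solution: s = 7.425, sigma0 = 6, n = 50.
chi_sq, z, p_value = sd_z_test(s=7.425, sigma0=6, n=50)
low, high = sd_interval(s=7.425, n=50, z_crit=2.576)

print(f"chi-square = {chi_sq:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"99% interval for sigma: {low:.3f} to {high:.3f}")
```

Passing `z_crit=1.960` to `sd_interval` gives the 95% version of the interval.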
f. A bank's average default rate on loans is supposedly 7 per month. In the first month there are 13 defaults. Test the first assertion assuming a Poisson distribution. Use a two-sided test with a 1% significance level. (2)
This is essentially problem B4, which was assigned. $\alpha = .01$. If we assume that the distribution is Poisson, our hypotheses are $H_0$: Poisson(7) and $H_1$: not Poisson(7). Though it is possible to put together a rejection region, the easiest way to do this is to use the Poisson(7) table and a p-value approach. Since this is a 2-sided test, we double p-values. If we look up the probability that $x$ is 13 or larger in the Poisson table, we find: $pvalue = 2P(x \ge 13) = 2[1 - P(x \le 12)] = 2(1 - .9730) = 2(.0270) = .0540$. Since $pvalue > \alpha$, do not reject $H_0$.

g. In 4f) find what values of $x$ (the number of defaults in the first month) would enable you not to reject the null hypothesis. (2)
Try $x = 14$. $pvalue = 2P(x \ge 14) = 2[1 - P(x \le 13)] = 2(1 - .98719) = 2(.01281) = .0256$.
If $x = 15$, $pvalue = 2P(x \ge 15) = 2[1 - P(x \le 14)] = 2(1 - .99428) = 2(.00572) = .0114$.
If $x = 16$, $pvalue = 2P(x \ge 16) = 2[1 - P(x \le 15)] = 2(1 - .99757) = 2(.00243) = .00486$.
If $x = 2$, $pvalue = 2P(x \le 2) = 2(.02964) = .0593$.
If $x = 1$, $pvalue = 2P(x \le 1) = 2(.00730) = .0146$.
If $x = 0$, $pvalue = 2P(x \le 0) = 2(.00091) = .00182$.
If $\alpha = .01$, the p-value is below the significance level for $x \ge 16$ and $x = 0$. (In a 2-sided test, the trick is to look for probabilities below $\frac{\alpha}{2} = .005$ and above $1 - \frac{\alpha}{2} = .995$.) So we do not reject the null hypothesis if $1 \le x \le 15$.

h. (Extra credit) Assume that the bank, in fact, has an average default rate on loans of 9 per month. What is the probability that you will fail to reject your null hypothesis that the mean is 7, using the ‘accept’ zone that you found in g)?
This is actually the easiest problem on the exam. If the mean is 9, $P(1 \le x \le 15) = P(x \le 15) - P(x \le 0) = .97796 - .00012 = .97784$. The power is $1 - .97784 = .02216$, or about 2.2%, which is about as bad as it gets. This takes us back to my statement in lecture that a 1-period trial has very little power. If, instead, you let the trial run for 12 months, the mean would be $12(7) = 84$ and the standard deviation would be $\sqrt{84}$.
The critical values would be $x_{cv} = 84 \pm 2.576\sqrt{84}$. It should be very easy to show that if, in fact, $\mu = 12(9) = 108$, $\beta$ (the probability of failing to reject the null hypothesis) is much lower.
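For anyone who wants to check the Poisson work in 4f) through 4h) without the tables, here is a rough Python sketch. The helper names are hypothetical and the probabilities are summed directly, so they may differ from the printed table in the last digit. The 12-month part rests on two assumptions that the solution does not spell out: the Normal approximation is used for both means, and the standard deviation under a true mean of 108 is taken to be $\sqrt{108}$.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable, by direct summation."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)          # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i            # P(X = i) from P(X = i - 1)
        total += term
    return total

def norm_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p(x, mean):
    """Doubled tail probability, the p-value convention used in 4f)."""
    if x >= mean:
        tail = 1.0 - poisson_cdf(x - 1, mean)   # P(X >= x)
    else:
        tail = poisson_cdf(x, mean)             # P(X <= x)
    return min(1.0, 2.0 * tail)

alpha = 0.01

# 4g) values of x for which H0: mean = 7 is not rejected.
accept = [x for x in range(0, 41) if two_sided_p(x, 7) >= alpha]

# 4h) chance of landing in that zone when the mean is really 9.
beta_1 = poisson_cdf(accept[-1], 9) - poisson_cdf(accept[0] - 1, 9)

# 12-month trial: Normal approximation (an assumption of this sketch).
lower = 84 - 2.576 * math.sqrt(84)
upper = 84 + 2.576 * math.sqrt(84)
sd_alt = math.sqrt(108)             # assumed sd when the 12-month mean is 108
beta_12 = norm_cdf((upper - 108) / sd_alt) - norm_cdf((lower - 108) / sd_alt)

print(f"p-value for x = 13: {two_sided_p(13, 7):.4f}")
print(f"do not reject for x from {accept[0]} to {accept[-1]}")
print(f"1 month:   beta = {beta_1:.5f}, power = {1 - beta_1:.5f}")
print(f"12 months: beta = {beta_12:.3f}, power = {1 - beta_12:.3f}")
```

Under these assumptions the 12-month trial fails to reject roughly half the time, against about 98% of the time for the single month.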