252y0421 3/29/04    ECO252 QBA2    SECOND HOUR EXAM    March 24, 2004
Name: _____________________    Hour of Class Registered (circle): 10am  11am
Show your work! Make diagrams! The exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all of the following. $x \sim N(2, 9)$, i.e. $\mu = 2$ and $\sigma = 9$. If you are not using the supplement table, make sure that I know it.

1. $P(-27.88 \le x \le 0) = P\left(\frac{-27.88 - 2}{9} \le z \le \frac{0 - 2}{9}\right) = P(-3.32 \le z \le -0.22)$
$= P(-3.32 \le z \le 0) - P(-0.22 \le z \le 0) = .4995 - .0871 = .4124$
Make a diagram! For $z$, draw a Normal curve with zero in the middle. Shade the area between -3.32 and -0.22 and note that it is all on one side of the mean, so that you subtract the area between -0.22 and zero from the area between -3.32 and zero.

2. $P(1 \le x \le 16) = P\left(\frac{1 - 2}{9} \le z \le \frac{16 - 2}{9}\right) = P(-0.11 \le z \le 1.56)$
$= P(-0.11 \le z \le 0) + P(0 \le z \le 1.56) = .0438 + .4406 = .4844$
Make a diagram! For $z$, draw a Normal curve with zero in the middle. Shade the area between -0.11 and 1.56 and note that it is on both sides of the mean, so that you add the area between -0.11 and zero to the area between zero and 1.56.

3. $F(16)$ (the cumulative probability up to 16): $F(16) = P(x \le 16) = P\left(z \le \frac{16 - 2}{9}\right) = P(z \le 1.56) = P(z \le 0) + P(0 \le z \le 1.56) = .5 + .4406 = .9406$
Make a diagram! For $z$, draw a Normal curve with zero in the middle. Shade the entire area below 1.56 and note that it is on both sides of the mean, so that you add the area below zero to the area between zero and 1.56.

4. $x_{.115}$. First we must find $z_{.115}$. This is the value of $z$ that has $P(z > z_{.115}) = .115$, or $P(0 \le z \le z_{.115}) = .5 - .115 = .3850$. On the Normal table, the closest we can find to .3850 is $P(0 \le z \le 1.20) = .3849$. So $z_{.115} = 1.20$ and $x_{.115} = \mu + z_{.115}\sigma = 2 + 1.20(9) = 12.8$.
Check: $P(x > 12.8) = P\left(z > \frac{12.8 - 2}{9}\right) = P(z > 1.20) = .5 - .3849 = .1151 \approx .115$.
Make a diagram! For $z$, draw a Normal curve with zero in the middle. Divide the area above zero into 11.5% above $z_{.115}$ and 50% - 11.5% below $z_{.115}$.

II. (24+ points) Do all of the following (2 points each unless noted otherwise). Note the following:
1. You will be penalized if you do not compute the sample variance of the $x_L$ column in question 1.
2. This test is normed on 50 points, but there are more points possible, including the take-home. You may not finish the exam and might want to skip some questions.
3. A table identifying methods for comparing 2 samples is at the end of the exam.
4. If you answer 'None of the above' in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.

Questions 1-6 refer to Exhibit 1.

Exhibit 1: (Edited from problems presented by Samuel Wathen, with one small error, for Lind et al. 2002) The first two columns below are evaluations of a sample of five products, first at FIFO and, second, at LIFO. Based on the results shown, is LIFO more effective than FIFO in keeping the value of inventory lower? (Assume that the underlying distribution is Normal.)

Product   x_F    x_L    d = x_F - x_L    x_F^2     x_L^2     d^2
   1      225    221          4          50625     48841      16
   2      119    100         19          14161     10000     361
   3      100    113        -13          10000     12769     169
   4      212    200         12          44944     40000     144
   5      248    245          3          61504     60025       9
 Total    904    879         25         181234    171635     699

Minitab calculated the following sample statistics:

Variable   n    Mean    Median   StDev   SE Mean
  x_F      5   180.8     212.0    66.7      29.8
  x_L      5   175.8     200.0    ____      ____
   d       5    5.00      4.00   11.98      5.36

1. Compute the standard deviation of $x_L$. You may use any of the material given in Exhibit 1.
Solution: $s_L^2 = \frac{\sum x_L^2 - n\bar{x}_L^2}{n - 1} = \frac{171635 - 5(175.8)^2}{4} = 4276.7$ and $s_L = \sqrt{4276.7} = 65.396$.
Note: If you wasted our time using the definitional formula, see the end of Part II.
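As a quick machine check of Question 1 (my own sketch, not part of the original exam), the shortcut formula used above and the definitional formula give the same variance:

x_L = [221, 100, 113, 200, 245]
n = len(x_L)
mean = sum(x_L) / n                                            # 879/5 = 175.8
shortcut = (sum(v * v for v in x_L) - n * mean**2) / (n - 1)   # (171635 - 5(175.8)^2)/4
definitional = sum((v - mean)**2 for v in x_L) / (n - 1)       # 17106.8/4, the same
print(shortcut, definitional, shortcut**0.5)                   # 4276.7, 4276.7, s_L ~ 65.4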
2. What is the null hypothesis?
a) $\mu_F = \mu_L$   b) $\mu_F \ne \mu_L$   c) *$\mu_F \le \mu_L$   d) $\mu_F \ge \mu_L$   e) None of the above.
Explanation: The question seems to be asking if $\mu_L < \mu_F$. This is the same as $\mu_F > \mu_L$, which cannot be a null hypothesis because it does not contain an equality. It must be an alternate hypothesis, so that the null hypothesis is its opposite, $\mu_F \le \mu_L$.

3. What is (are) the degrees of freedom?
a) *4   b) 5   c) 8   d) 15   e) 10
Explanation: Since each line represents one product, this is paired data. Our variable is thus really $d$, which contains only 5 numbers, so there are $n - 1 = 4$ degrees of freedom.

4. If you used the 5% level of significance, what is the appropriate t or z value from the tables?
a) 2.571   b) 2.776   c) 2.262   d) 2.228   e) 1.645   f) 1.960   g) *None of the above.
Explanation: This is a one-sided 5% test, and the alternate hypothesis, $\mu_F > \mu_L$, is the same as $\mu_D > 0$, where $D = x_F - x_L$. The test ratio $t = \frac{\bar{d} - 0}{s_{\bar{d}}} = \frac{5 - 0}{5.36} = 0.93$ and must be larger than $t_{.05}^{(4)} = 2.132$, which is not among the choices.

5. What is the value of your calculated t or z?
a) *0.933   b) 2.776   c) 0.477   d) 2.028   e) None of the above.

6. What is your decision at the 5% significance level?
a) Do not reject the null hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.
b) Reject the null hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.
c) Reject the alternative hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.
d) *Do not reject the null hypothesis and conclude that LIFO is not more effective in keeping the value of the inventory lower.
e) None of the above.

7. Find an approximate p-value for the null hypothesis that you tested. Please explain your result!
Solution: We need $P(t > 0.933)$. If we check the line for 4 degrees of freedom, we find $t_{.20}^{(4)} = 0.941$ and $t_{.25}^{(4)} = 0.741$, which means that $P(t > 0.941) = .20$ and $P(t > 0.741) = .25$. Since 0.933 lies between these values, it must be true that $.20 < P(t > 0.933) < .25$. There is some flexibility here depending on your answer to Question 5.

8. A manufacturer revises a manufacturing process and finds a fall in the defect rate of 4% ± 5%.
a) The fall in defects is statistically significant because 5% is larger than 4%.
b) The fall in defects is statistically significant because the confidence interval supports $H_0$.
c) *The fall in defects is not statistically significant because 4% is smaller than 5%.
d) The fall in defects is not statistically significant because the confidence interval would lead us to reject $H_0$.
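A sketch of the paired test behind Questions 3-7 (my own illustration, not part of the exam; the alternative keyword assumes scipy 1.6 or later):

from scipy.stats import ttest_rel

x_F = [225, 119, 100, 212, 248]
x_L = [221, 100, 113, 200, 245]
# one-sided paired test of H1: mu_F > mu_L, i.e. the mean of d = x_F - x_L is positive
t, p = ttest_rel(x_F, x_L, alternative='greater')
print(t, p)   # t ~ 0.933 and p ~ .20, so do not reject H0 at the 5% level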
Questions 9-11 refer to Exhibit 2.

Exhibit 2: (Edited from problems presented by Samuel Wathen) A group of adults and a group of children both tried Wow! Cereal. Was there a difference in how adults and kids responded to it?

Group                Number in sample   Number who liked it   Fraction of sample who liked it
Adults (Group 1)           250                  187                      .748
Children (Group 2)         100                   66                      .660
Total                      350                  253                      .723

$\frac{p_1 q_1}{n_1} = \frac{.748(.252)}{250} = .000754$,  $\frac{p_2 q_2}{n_2} = \frac{.660(.340)}{100} = .002244$,  $\frac{\bar{p}\bar{q}}{n} = \frac{.723(.277)}{350} = .000572$

9. What is the null hypothesis?
a) $\mu_1 = \mu_2$   b) $\mu_1 < \mu_2$   c) $\mu_1 > \mu_2$   d) *$p_1 = p_2$   e) $p_1 < p_2$   f) $p_1 > p_2$   g) None of the above.
Explanation: There is no reasonable way to define a mean here, and there is no reason to assume that one fraction is larger than the other before we look at the data. Of course b), c), e) and f) do not contain equalities and cannot be null hypotheses.

10. Calculate a 99% confidence interval for the difference between the fraction of adults and fraction of kids that liked Wow! Explain why you reject or do not reject the null hypothesis. (4)
Solution: The outline says $\Delta p = (p_1 - p_2) \pm z_{\alpha/2} s_{\Delta p}$, where $s_{\Delta p} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. The tables say $z_{.005} = 2.576$, so the interval is $\Delta p = (.748 - .660) \pm 2.576\sqrt{.000754 + .002244} = .088 \pm 2.576\sqrt{.002998} = .088 \pm 2.576(.05475) = .088 \pm 0.141$. Since this interval includes zero, do not reject the null hypothesis.

11. (Extra credit) Calculate a 77% confidence interval for the difference between the fraction of adults and fraction of kids that liked Wow! (2)
Solution: On page 1, we found $z_{.115} = 1.20$. Since the confidence level is 77%, the significance level is 23% and half the significance level is 11.5%. The interval is thus $\Delta p = .088 \pm 1.20(.05475) = .088 \pm 0.066$.
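The intervals in Questions 10 and 11 can be checked mechanically; this sketch (mine) reproduces both, including the $z_{.115} \approx 1.20$ multiplier found in Part I:

from math import sqrt
from scipy.stats import norm

p1, n1 = 187 / 250, 250                          # adults
p2, n2 = 66 / 100, 100                           # children
s_dp = sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)     # ~ .05475
for conf in (0.99, 0.77):                        # Questions 10 and 11
    z = norm.ppf(1 - (1 - conf) / 2)             # 2.576, then ~1.20
    print(p1 - p2 - z * s_dp, p1 - p2 + z * s_dp)
# both intervals include zero, so do not reject p1 = p2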
Questions 12-14 refer to Exhibit 3.

Exhibit 3: (Edited from problems presented by Samuel Wathen) A survey was taken among 100 randomly selected property owners to see if opinion about a street widening was related to the distance of front footage they owned. The results appear below.

                       Opinion
Front-Footage      For   Undecided   Against
Under 45 feet       12        4          4
45-120 feet         35        5         30
Over 120 feet        3        2          5

12. How many degrees of freedom are there?
a) 2   b) 3   c) *4, since $(r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$   d) 5   e) 9   f) None of the above.

13. What is the value of E for people in favor of the project who own less than 45 feet of frontage?
a) *10, since $E = p_r \times (\text{column total}) = .20(50) = 10$   b) 12   c) 35   d) 50   e) None of the above.

Front-Footage      For   Undecided   Against   Total    p_r
Under 45 feet       12        4          4       20     .20
45-120 feet         35        5         30       70     .70
Over 120 feet        3        2          5       10     .10
Total               50       11         39      100

14. Assume that the computed value of chi-square is 8.5.
a) What is the null hypothesis that you are testing? (2)
Solution: Opinion and front footage are independent.
b) What is your conclusion? Why? (3)
Solution: We do not reject the null hypothesis at the 5% level because 8.5 is below $\chi^2_{.05}(4) = 9.4877$.

15. Turn in your computer output from computer problem 1 only, tucked inside this exam paper. (3 points - 2 point penalty for not handing this in.)

16. The following output is from a computer problem very much like the one you did to compare two sets of data. Two production processes are in use. I wish to compare numbers of defects in Process A and Process B to test the statement "The number of defects in process A is significantly lower than in process B." Three tests are done. Assume that the underlying distribution is Normal. a) Which of the three tests should we use? b) What is the null hypothesis as we use it? c) Should we reject the null hypothesis? Why?

Test 1: MTB > twosamplet 'A' 'B'

Two-Sample T-Test and CI: A, B
Two-sample T for A vs B
     N    Mean   StDev   SE Mean
A   90   220.5    34.7       3.7
B  110   300.5    82.7       7.9
Difference = mu A - mu B
Estimate for difference: -79.98
95% CI for difference: (-97.15, -62.81)
T-Test of difference = 0 (vs not =): T-Value = -9.20  P-Value = 0.000  DF = 152

This tests $H_0: \mu_A = \mu_B$ against $H_1: \mu_A \ne \mu_B$.

Test 2: MTB > twosamplet 'A' 'B';
        SUBC> alter 1.

Two-Sample T-Test and CI: A, B
Two-sample T for A vs B
     N    Mean   StDev   SE Mean
A   90   220.5    34.7       3.7
B  110   300.5    82.7       7.9
Difference = mu A - mu B
Estimate for difference: -79.98
95% lower bound for difference: -94.36
T-Test of difference = 0 (vs >): T-Value = -9.20  P-Value = 1.000  DF = 152

This tests $H_0: \mu_A \le \mu_B$ against $H_1: \mu_A > \mu_B$.

Test 3: MTB > Twosamplet 'A' 'B';
        SUBC> alter -1.

Two-Sample T-Test and CI: A, B
Two-sample T for A vs B
     N    Mean   StDev   SE Mean
A   90   220.5    34.7       3.7
B  110   300.5    82.7       7.9
Difference = mu A - mu B
Estimate for difference: -79.98
95% upper bound for difference: -65.59
T-Test of difference = 0 (vs <): T-Value = -9.20  P-Value = 0.000  DF = 152

This tests $H_0: \mu_A \ge \mu_B$ against $H_1: \mu_A < \mu_B$.

Solution: a), b) Test 3 is appropriate because it tests $H_0: \mu_D \ge 0$ against $H_1: \mu_D < 0$, where $D = \mu_A - \mu_B$, and $\mu_D < 0$ is equivalent to $\mu_A < \mu_B$, which is what the statement asserts. c) Since the p-value is below any significance level we might use, we reject the null hypothesis.

17. (Extra credit) My boss objects that he thinks that the variances are equal, so that I used the wrong test. I go back to the computer and do the following. (The null hypothesis is equal variances.) Was I right? Why?

MTB > %VarTest c3 c4;
SUBC> Unstacked.

Test for Equal Variances
F-Test (normal distribution)
Test Statistic: 0.176
P-Value       : 0.000

Solution: I was right. The null hypothesis is equal variances, and the p-value is below any significance level that I would use, so we reject equal variances; the unequal-variance test was the correct choice.

18. (Extra credit) Now my beloved boss says that maybe the underlying distribution is not Normal. I go back to the computer and run the following. Process A results are in C3. Process B results are in C4. Remember that there are 90 data items for process A and 110 for process B. What are our hypotheses and results?

MTB > Stack c3 c4 c5;
SUBC> Subscripts c6;
SUBC> UseNames.
MTB > Rank c5 c7.

This stacks the 2 sets of results together so they can be ranked. C7 now contains the ranks.

MTB > Unstack (c7);
SUBC> Subscripts c6;
SUBC> After;
SUBC> VarNames.

Ranks for A are now in C7_A. Ranks for B are now in C7_B.

MTB > sum c8
Sum of C7_A = 6008.0
MTB > sum c9
Sum of C7_B = 14092

Solution: We use the Wilcoxon-Mann-Whitney test for two independent samples to compare the medians. According to the outline, for values of $n_1$ and $n_2$ that are too large for the tables, $W$ has the Normal distribution with mean $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$ and variance $\sigma_W^2 = \frac{1}{6} n_2 \mu_W$, where $W$ is the smaller of the rank sums and $n_1$ is the size of the sample it comes from. If the significance level is 5% and the test is one-sided, we reject our null hypothesis if $z = \frac{W - \mu_W}{\sigma_W}$ lies below -1.645.

So $n_1 = 90$, $n_2 = 110$ and $W = 6008$ is the smaller of the rank sums. $\mu_W = \frac{1}{2}(90)(90 + 110 + 1) = 45(201) = 9045$ and $\sigma_W^2 = \frac{1}{6}(110)(9045) = 165825$. So $z = \frac{W - \mu_W}{\sigma_W} = \frac{6008 - 9045}{\sqrt{165825}} = \frac{-3037}{407.2} = -7.46$. Since the p-value would be $P(z < -7.46) = .5 - .5 \approx 0$, we would reject the null hypothesis $\eta_A \ge \eta_B$ at any significance level.
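A sketch (mine) of the Question 18 normal approximation, using only the sample sizes and the smaller rank sum from the output above:

from math import sqrt

n1, n2, W = 90, 110, 6008         # W is the smaller rank sum (process A)
mu_W = n1 * (n1 + n2 + 1) / 2     # 9045
var_W = n2 * mu_W / 6             # 165825
print((W - mu_W) / sqrt(var_W))   # ~ -7.46, far below -1.645: reject H0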
Questions 19-22 refer to Exhibit 4.

Exhibit 4: (Edited from problems presented by Samuel Wathen) A professor asserts that she uses a Normal curve with a mean of 75 and a standard deviation of 10 to grade students. Last year's grades are below. Test to see if the professor's assertions are correct at the 99% confidence level.

Row    Grade   Interval        E        O     O^2/E
1      A       90+          7.6820     15    29.2892
2      B       80-90       27.7955     20    14.3908
3      C       70-80       44.0450     40    36.3265
4      D       60-70       27.7955     30    32.3793
5      F       Below 60     7.6820     10    13.0174
Total                     115.0000    115   125.4032

19. Show the calculations necessary to get the number that were expected to get B's.
Solution: $P(80 \le x \le 90) = P\left(\frac{80 - 75}{10} \le z \le \frac{90 - 75}{10}\right) = P(0.5 \le z \le 1.5) = .4332 - .1915 = .2417$ and $.2417(115) = 27.7955$.

20. What table value of chi-square would you use to test the professor's assertion?
Solution: $\chi^2_{.01}(4) = 13.2767$.

21. What is the calculated value of chi-square?
Solution: $\sum \frac{O^2}{E} - n = 125.4032 - 115 = 10.4032$.

22. Explain your conclusion.
Solution: Since the calculated chi-square is smaller than the table chi-square, do not reject the null hypothesis that the grades follow a Normal distribution.

Answer to Question 1 using the definitional formula:

Row     x_L    x_L - 175.8    (x_L - 175.8)^2
1       221        45.2           2043.04
2       100       -75.8           5745.64
3       113       -62.8           3943.84
4       200        24.2            585.64
5       245        69.2           4788.64
Total   879         0.0          17106.80

$s_L^2 = \frac{\sum (x_L - \bar{x}_L)^2}{n - 1} = \frac{17106.80}{4} = 4276.7$ and $s_L = \sqrt{4276.7} = 65.396$, as before.

Methods for comparing two samples:

Location, Normal distribution, compare means:         paired samples, Method D4; independent samples, Methods D1-D3.
Location, distribution not Normal, compare medians:   paired samples, Method D5b; independent samples, Method D5a.
Proportions:                                          Method D6.
Variability, Normal distribution, compare variances:  Method D7.

ECO252 QBA2    SECOND EXAM    March 24, 2004    TAKE HOME SECTION
Name: _________________________    Student Number: _________________________

III. Neatness counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points)

1) Chi-squared and Related Tests (Bassett et al.) To personalize the data below, change the number of stations reporting 4 thunderstorms to the second-to-last digit of your student number. This will change the total number of stations reporting. For example, Seymour Butz's student number is 976500, so he will change the number of stations reporting 4 thunderstorms to zero, and the total number of stations reporting will be 22 + 37 + 20 + 13 + 0 + 2 = 94.

a) 100 weather stations reported the following in August 2003:

Number of thunderstorms x:                    0    1    2    3   4   5
Number of stations reporting x storms, O:    22   37   20   13   6   2

In the region in question, the number of thunderstorms per month is believed to have a Poisson distribution with a mean of 1. Test to see if this is appropriate using a chi-squared method. For example, if 5 stations reported 2 thunderstorms and 5 stations reported 3 thunderstorms and there were only 10 stations, the total number of storms reported would be 2(5) + 3(5) = 25, and the average number of storms reported would be 25/10 = 2.5. (4)

b) Repeat the test using the Kolmogorov-Smirnov method. (3)

c) Find the average number of storms per station and use it to generate a Poisson table on Minitab. To do so, follow the example below, replacing 0.732 with your mean (a number like 1.723). In column 1 (C1) place the numbers 0 through 10. Head column 1 'k', column 2 'P(k)' and column 3 'P(x le k)' or something similar. ('le' stands for '≤'.)

MTB > PDF c1 c2;
SUBC> Poisson 0.732.
MTB > CDF c1 c3;
SUBC> Poisson 0.732.
MTB > print c1 - c3

Data Display
Row    k      P(k)      P(x le k)
1      0   0.480946     0.48095
2      1   0.352053     0.83300
3      2   0.128851     0.96185
4      3   0.031440     0.99329
5      4   0.005753     0.99904
6      5   0.000842     0.99989
7      6   0.000103     0.99999
8      7   0.000011     1.00000
9      8   0.000001     1.00000
10     9   0.000000     1.00000
11    10   0.000000     1.00000

This table tells us that, for a Poisson distribution with a mean of 0.732, $P(x = 3) = .031440$ and $P(x \le 3) = .99329$.
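The same table can be built without Minitab; a sketch of mine using scipy, keeping the placeholder mean of 0.732 from the example above:

from scipy.stats import poisson

mean = 0.732   # replace with your own sample mean, as instructed above
for k in range(11):
    print(k, poisson.pmf(k, mean), poisson.cdf(k, mean))
# the row for k = 3 gives P(3) ~ .031440 and P(x <= 3) ~ .99329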
To keep the numbers correct, you could merge the data for k = 5 to 10 into a category of '5 or more storms.' Decide whether a chi-squared or K-S method is appropriate (only one method is!) and test for a Poisson distribution with your mean, remembering that you estimated the mean from your data. (4)

d) (Extra credit) Two dice were thrown 180 times with the results below. Test the hypothesis that the distribution follows the binomial distribution with n = 2 and p = .15. (2)

Number of sixes x:    0    1   2
Frequency O:        105   70   5

e) (Extra extra credit) Test the data in d) for a binomial distribution in general by using $\hat{p} = \frac{\text{Total number of sixes}}{\text{Total number of throws}}$. (2)

Solution: I will use Seymour's version here and try to put the others into an appendix.

a) With Seymour's personalization, the 94 weather stations reported the following in August 2003:

Number of thunderstorms x:                    0    1    2    3   4   5
Number of stations reporting x storms, O:    22   37   20   13   0   2

$H_0$: Poisson(1). This is the data from the Supplementary Materials book:

x      P(x)        F(x)
0    0.367879    0.36788
1    0.367879    0.73576
2    0.183940    0.91970
3    0.061313    0.98101
4    0.015328    0.99634
5    0.003066    0.99941
6    0.000511    0.99992
7    0.000073    0.99999
8    0.000009    1.00000

So we need to put together a version of E. Note that O adds to n = 94, so if we take P(x) and multiply by 94 we get (34.5806, 34.5806, 17.2904, 5.7634, 1.4408, 0.2882, 0.0480, 0.0069, 0.0008). The last three, at least, are too small to use, so we combine everything from x = 4 up into one cell and get the table below.

Row    O       E          E - O     (E-O)^2    (E-O)^2/E      O^2/E
1     22    34.5806     12.5806    158.272      4.57690     13.9963
2     37    34.5806     -2.4194      5.853      0.16927     39.5886
3     20    17.2904     -2.7096      7.342      0.42464     23.1343
4     13     5.7634     -7.2366     52.368      9.08628     29.3229
5      2     1.7847     -0.2153      0.046      0.02597      2.2413
      94    93.9997     -0.0003                14.2833     108.2831

$\chi^2_{.05}(4) = 9.4877$. Depending on which method you used, $\chi^2 = \sum \frac{(O - E)^2}{E} = 14.2833$ or $\chi^2 = \sum \frac{O^2}{E} - n = 108.2831 - 94 = 14.2831$. These are both above $\chi^2_{.05}(4)$, so reject the null hypothesis.

b) Repeat the test using the Kolmogorov-Smirnov method. (3)
First, take O, divide by n = 94, and make the result into a cumulative distribution $F_o$. Copy $F(x)$ from the Poisson(1) table, label it $F_e$, and compute $D = |F_o - F_e|$.

Row    O      O/n        F_o       F_e        D
1     22   0.234043   0.23404   0.36788   0.133837
2     37   0.393617   0.62766   0.73576   0.108100
3     20   0.212766   0.84043   0.91970   0.079274
4     13   0.138298   0.97872   0.98101   0.002287
5      0   0.000000   0.97872   0.99634   0.017617
6      2   0.021277   1.00000   0.99941   0.000590

The maximum D is .133837. According to the K-S table, this should be compared with $\frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{94}} = .1403$. Because the maximum deviation is not above .1403, do not reject the null hypothesis.

c) We multiply O by x and get 22(0) + 37(1) + 20(2) + 13(3) + 0(4) + 2(5) = 0 + 37 + 40 + 39 + 0 + 10 = 126 storms in all, so the average is 126/94 = 1.3404 storms per station. We generate the part of the Poisson table that we need, multiply P(x) by n = 94, and use a chi-squared method. Note that we have lost a degree of freedom by computing the mean from the data, which is why we can't use the K-S method. We compare our computed chi-square of 5.9364 to $\chi^2_{.05}(3) = 7.8147$ and do not reject the null hypothesis, $H_0$: Poisson.

For a Poisson distribution with a mean of 1.3404:

x      P(x)
0    0.261846
1    0.350873
2    0.235085
3    0.105005
4    0.035177
5    0.009427
6    0.002105
7    0.000403
8    0.000068
9    0.000010
10   0.000001

Row    O       E         E - O     (E-O)^2    (E-O)^2/E     O^2/E
1     22    24.6135     2.6135     6.8303     0.27750     19.6640
2     37    32.9821    -4.0179    16.1437     0.48947     41.5074
3     20    22.0980     2.0980     4.4016     0.19918     18.1012
4     13     9.8704    -3.1296     9.7942     0.99227     17.1218
5      0     3.3066     3.3066    10.9336     3.30660      0.0000
6      2     1.1293    -0.8707     0.7581     0.67132      3.5420
      94    93.9999    -0.0001                5.93644     99.9364

(Check: $\sum \frac{O^2}{E} - n = 99.9364 - 94 = 5.9364$.)
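A machine check of part a) (my own sketch, using Seymour's counts and the same merged tail cell):

import numpy as np
from scipy.stats import poisson, chi2

O = np.array([22, 37, 20, 13, 2.0])     # Seymour's counts; the last cell is x >= 4
p = poisson.pmf(np.arange(4), 1.0)
p = np.append(p, 1 - p.sum())            # P(x >= 4) for the merged tail cell
E = O.sum() * p
print(((O - E)**2 / E).sum(), chi2.ppf(0.95, 4))   # ~14.28 > 9.4877: reject Poisson(1)
# part c) is the same computation with the estimated mean and one less degree of freedom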
d) (Extra credit) Two dice were thrown 180 times with the results below. Test the hypothesis that the distribution follows the binomial distribution with n = 2 and p = .15. (2)

Number of sixes x:    0    1   2
Frequency O:        105   70   5

e) (Extra extra credit) Test the data in d) for a binomial distribution in general by using $\hat{p} = \frac{\text{Total number of sixes}}{\text{Total number of throws}}$. (2)

I did these together. Since the total number of sixes was 1(70) + 2(5) = 80, $\hat{p} = \frac{80}{180} = .4444$. My table for p = .4444 could have been generated by $P(x) = C^n_x p^x q^{n-x}$, where $q = 1 - p$, but I used

MTB > cdf c7 c10;
SUBC> binomial 2 .4444.
MTB > pdf c7 c11;
SUBC> binomial 2 .4444.

The table for p = .15 could have been done the same way or with the formula. I used the table in the Supplement and then took the differences between the numbers. I got the E column by multiplying P(x) by 180.

          p = .15              p = .4444
x      F(x)     P(x)        F(x)       P(x)
0     0.7225   0.7225      0.30869   0.308691
1     0.9775   0.2550      0.80251   0.493817
2     1.0000   0.0225      1.00000   0.197491

Only one method was needed in each of d) and e). In d), if you used chi-squared, you should have gotten the following. $H_0$: Binomial(n = 2, p = .15).

Row   x     O        E        E - O     (E-O)^2    (E-O)^2/E     O^2/E
1     0    105    130.05      25.05     627.503     4.8251      84.775
2     1     70     45.90     -24.10     580.810    12.6538     106.754
3     2      5      4.05      -0.95       0.903     0.2228       6.173
           180    180.00       0.00                17.7017     197.702

Since $\chi^2_{.05}(2) = 5.9915$ and our computed chi-square of $197.702 - 180 = 17.7017$ is larger, we reject the null hypothesis.

The K-S method is probably easier for d). I got the following.

Row   x     O       E        O/n        F_o       F_e        D
1     0    105   130.05   0.583333   0.58333   0.7225   0.139167
2     1     70    45.90   0.388889   0.97222   0.9775   0.005278
3     2      5     4.05   0.027778   1.00000   1.0000   0.000000

The maximum D is .139167. According to the K-S table, this should be compared with $\frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{180}} = .1014$. Because the maximum deviation is above .1014, reject the null hypothesis.

e) In this section we have lost a degree of freedom by estimating p from the data, so only the chi-squared method can be used. Since $\chi^2_{.05}(1) = 3.84146$ is way below our chi-square of 74.248, we reject the null hypothesis, $H_0$: Binomial.

Row   x     O         E         E - O      (E-O)^2    (E-O)^2/E     O^2/E
1     0    105     55.5644    -49.4356    2443.87     43.9827     198.418
2     1     70     88.8871     18.8871     356.72      4.0132      55.126
3     2      5     35.5484     30.5484     933.21     26.2517       0.703
           180    180.0000     -0.0000                74.2476     254.248
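A sketch (mine) covering both binomial fits in d) and e); note how the degrees of freedom drop from 2 to 1 once p is estimated:

import numpy as np
from scipy.stats import binom, chi2

O = np.array([105, 70, 5.0])                # counts of 0, 1, 2 sixes in 180 throws
for p, df in [(0.15, 2), (80 / 180, 1)]:    # d) p given; e) p estimated, one df lost
    E = 180 * binom.pmf([0, 1, 2], 2, p)
    chi_sq = ((O - E)**2 / E).sum()
    print(chi_sq, chi2.ppf(0.95, df))       # 17.70 > 5.99 and 74.25 > 3.84: reject both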
2) (Meyer and Krueger) WEFA compiled the following random samples of single-family home prices in the eastern and western parts of the US (in $thousands). (Note: in this problem it is OK to use Excel or Minitab as a help, but you must fool me into believing that you did it by hand.)

Row   City - E               x1        City - W            x2      City No.
1     Albany NY            108.607     Bakersfield CA    137.171      1
2     Allentown PA          85.250     Fresno CA         107.627      2
3     Baltimore MD         112.747     Orange C. CA      204.862      3
4     Bergen NJ            195.232     Portland OR       123.605      4
5     Boston MA            180.865     Riverside CA      123.836      5
6     Buffalo NY            83.122     Sacramento CA     120.232      6
7     Charlestown SC        92.840     San Diego CA      172.601      7
8     Charlotte NC         104.433     San Francisco CA  220.067      8
9     Greensboro NC         97.638     San Jose CA       224.828      9
10    Greenville SC         88.355     Seattle WA        147.854     10
11    Harrisburg PA         79.846     Stockton CA        98.440     11
12    Hartford CT          129.130     Tacoma WA         119.884     12
13    Middlesex NJ         169.540
14    Monmouth NJ          137.859
15    New Haven CT         134.856
16    New York NY          170.830
17    Newark NJ            187.128
18    Philadelphia PA      114.553
19    Raleigh/Durham NC    119.355
20    Rochester NY          85.043
21    Springfield MA       102.678
22    Syracuse NY           82.372
23    Washington DC        155.176

These are available on the website in Minitab. Minitab reports the following sample statistics.

Variable    n    Mean     Median   StDev
x1         23   122.50    112.75   37.20
x2         12   150.10    130.50   44.50

You may use the statistics given for x1, but personalize the data for Western cities as follows: use the fourth digit of your student number to pick the first city to be eliminated, and then eliminate the third city after that. (You may, if you wish, drop the last two digits of the prices in the Western cities.) For example, Seymour Butz's student number is 976500, so he will use the number 5 to eliminate cities 5 (Riverside) and 8 (San Francisco). If the fourth digit of your student number is zero, eliminate cities 10 and 1. You will thus have only 10 cities in your second sample.

a. Compute a (mean and) standard deviation for your personalized second sample. Show your work! (2)
b. Test to see if there is a significant difference between the mean home prices in the eastern and western US. You may assume that the samples come from Normal populations with equal variances, though there are 2 points extra credit if you do not assume equal variances. You may use a test ratio, critical value or a confidence interval (4 points) or all three of these (6 points, assuming that you get the same conclusion for all of them).
c. Test the variances to find out if you were or would have been justified to assume equality of variances. Were you? (2)
d. (Extra credit) Use a Lilliefors test to see if the Western data is Normally distributed. (2)
e. (Extra credit) Assume that the data is not normally distributed and test to see if there is a significant difference between the medians. (3)

Solution: Because there is no way that I will find time to do the individual solutions, you will have to make do with the original data.

a. Compute a (mean and) standard deviation for your personalized second sample. Show your work! (2)

Row      x2        x2^2
1      137.171    18815.9
2      107.627    11583.6
3      204.862    41968.4
4      123.605    15278.2
5      123.836    15335.4
6      120.232    14455.7
7      172.601    29791.1
8      220.067    48429.5
9      224.828    50547.6
10     147.854    21860.8
11      98.440     9690.4
12     119.884    14372.2
Total  1801.0    292129

From the formula sheet (Table 20), $\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{1801}{12} = 150.08$ and $s_2^2 = \frac{\sum x_2^2 - n_2 \bar{x}_2^2}{n_2 - 1} = \frac{292129 - 12(150.08)^2}{11} = 1985.5384$, so $s_2 = \sqrt{1985.5384} = 44.559$.
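A quick check of part a. (my own sketch, on the unpersonalized Western sample):

x2 = [137.171, 107.627, 204.862, 123.605, 123.836, 120.232,
      172.601, 220.067, 224.828, 147.854, 98.440, 119.884]
n = len(x2)
mean = sum(x2) / n                                        # 1801.007/12 ~ 150.08
var = (sum(v * v for v in x2) - n * mean**2) / (n - 1)    # shortcut formula, ~1985.5
print(mean, var**0.5)                                     # s_2 ~ 44.56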
b. Test to see if there is a significant difference between the mean home prices in the eastern and western US. You may use a test ratio, critical value or a confidence interval (4 points) or all three of these (6 points, assuming that you get the same conclusion for all of them).

From the syllabus supplement, the hypotheses are $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$, i.e. $H_0: D = 0$ against $H_1: D \ne 0$, where $D = \mu_1 - \mu_2$ and $\bar{d} = \bar{x}_1 - \bar{x}_2$.

Difference between two means, $\sigma$ unknown, variances assumed equal (Method D2): the confidence interval is $D = \bar{d} \pm t_{\alpha/2} s_{\bar{d}}$, the test ratio is $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, and the critical values are $\bar{d}_{cv} = D_0 \pm t_{\alpha/2} s_{\bar{d}}$, where $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, $s_{\bar{d}} = \hat{s}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $DF = n_1 + n_2 - 2$.

Difference between two means, $\sigma$ unknown, variances assumed unequal (Method D3): the interval, test ratio and critical values have the same form, but $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $DF = \left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2 \bigg/ \left[\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}\right]$.

I will use the Minitab sample statistics reported above: $\bar{d} = \bar{x}_1 - \bar{x}_2 = 122.5 - 150.1 = -27.6$.

If we assume that the variances are equal, $\hat{s}_p^2 = \frac{22(1383.84) + 11(1980.25)}{33} = 1582.64$, so that $s_{\bar{d}}^2 = \hat{s}_p^2\left(\frac{1}{23} + \frac{1}{12}\right) = 1582.64(0.126812) = 200.698$ and $s_{\bar{d}} = \sqrt{200.698} = 14.1668$. Then $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-27.6}{14.1668} = -1.948$ with $df = n_1 + n_2 - 2 = 23 + 12 - 2 = 33$. Make a diagram: show an almost Normal curve with a center at zero and critical values at $-t_{.025}^{(33)} = -2.035$ and $t_{.025}^{(33)} = 2.035$. Since the computed value of t is between these, do not reject the null hypothesis.

If we do not assume equal variances, use the following worksheet: $\frac{s_1^2}{n_1} = \frac{1383.84}{23} = 60.1670$ and $\frac{s_2^2}{n_2} = \frac{1980.25}{12} = 165.021$, so $s_{\bar{d}}^2 = 60.1670 + 165.021 = 225.188$ and $s_{\bar{d}} = \sqrt{225.188} = 15.0063$. Then

$DF = \frac{(225.188)^2}{\frac{(60.1670)^2}{22} + \frac{(165.021)^2}{11}} = \frac{50709.6}{164.548 + 2475.63} = 19.2069$.

Round this down and use 19 degrees of freedom. $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-27.6}{15.0063} = -1.839$. Make a diagram: show an almost Normal curve with a center at zero and critical values at $-t_{.025}^{(19)} = -2.093$ and $t_{.025}^{(19)} = 2.093$. Since the computed value of t is between these, do not reject the null hypothesis.

c. Test the variances to find out if you were or would have been justified to assume equality of variances. Were you? (2)

From Table 3 (ratio of variances, Method D7), with $DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$: the hypotheses are $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$, the test ratios are $F^{(DF_1, DF_2)} = \frac{s_1^2}{s_2^2}$ and $F^{(DF_2, DF_1)} = \frac{s_2^2}{s_1^2}$, and the confidence interval is $\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2}^{(DF_1, DF_2)}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2}^{(DF_2, DF_1)}$.

For a two-sided test, compare $F^{(11,22)} = \frac{s_2^2}{s_1^2} = \frac{1980.25}{1383.84} = 1.431$ against $F_{.025}^{(11,22)} = 2.65$. We should also compare $F^{(22,11)} = \frac{s_1^2}{s_2^2}$ against $F_{.025}^{(22,11)}$, but that test ratio is below 1 and cannot possibly be above its critical value. Since both ratios are below their critical values, we cannot reject the null hypothesis, so the assumption of equal variances was justified.
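Parts b. and c. can be verified from the summary statistics alone; a sketch of mine using scipy (ttest_ind_from_stats covers both Methods D2 and D3):

from scipy.stats import ttest_ind_from_stats, f

# part b: two-sample t from the Minitab summary statistics (East, then West)
for equal_var in (True, False):     # pooled Method D2, then Satterthwaite Method D3
    t, p = ttest_ind_from_stats(122.50, 37.20, 23, 150.10, 44.50, 12,
                                equal_var=equal_var)
    print(t, p)                      # t ~ -1.95 and -1.84; p > .05: do not reject
# part c: two-sided F test of equal variances at the 5% level
print(44.50**2 / 37.20**2, f.ppf(0.975, 11, 22))   # 1.431 is below the critical value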
d. (Extra credit) Use a Lilliefors test to see if the Western data is Normally distributed. (2)

This is a decidedly machine-aided solution. The computer got a mean of 150.084 and a standard deviation of 44.5448. Note that the data has been put in order. For the first row, $z = \frac{x - \bar{x}}{s} = \frac{98.440 - 150.084}{44.5448} = -1.15937$. The computation of $F_o$ is not shown, but comes from the fact that each of the numbers is $\frac{1}{12}$ of the data, so that the observed cumulative distribution consists of $\frac{1}{12}, \frac{2}{12}, \frac{3}{12}$, etc. From the Normal table, $F_e = P(z \le -1.15937) \approx .5 - P(-1.16 \le z \le 0) = .5 - .3770 = .1230$; this is less accurate than the machine value .123153 below.

Row      x          z         F_o        F_e          D
1       98.440   -1.15937   0.08333   0.123153   0.039819
2      107.627   -0.95313   0.16667   0.170262   0.003596
3      119.884   -0.67797   0.25000   0.248896   0.001104
4      120.232   -0.67016   0.33333   0.251379   0.081954
5      123.605   -0.59443   0.41667   0.276111   0.140556
6      123.836   -0.58925   0.50000   0.277848   0.222152
7      137.171   -0.28989   0.58333   0.385952   0.197382
8      147.854   -0.05006   0.66667   0.480037   0.186629
9      172.601    0.50549   0.75000   0.693394   0.056606
10     204.862    1.22973   0.83333   0.890601   0.057268
11     220.067    1.57107   0.91667   0.941917   0.025250
12     224.828    1.67795   1.00000   0.953322   0.046678

The maximum difference is .222152. Compare this to the 5% value for n = 12 on the Lilliefors table, which is .242. Since the maximum observed difference is below the critical value, do not reject the null hypothesis.

e. (Extra credit) Assume that the data is not normally distributed and test to see if there is a significant difference between the medians. (3)

If we use a Wilcoxon-Mann-Whitney rank test, we get the following.

Row      x1        r1        x2        r2
1      108.607     13      137.171     23
2       85.250      5      107.627     12
3      112.747     14      204.862     33
4      195.232     32      123.605     19
5      180.865     30      123.836     20
6       83.122      3      120.232     18
7       92.840      7      172.601     29
8      104.433     11      220.067     34
9       97.638      8      224.828     35
10      88.355      6      147.854     25
11      79.846      1       98.440      9
12     129.130     21      119.884     17
13     169.540     27       Total     274
14     137.859     24
15     134.856     22
16     170.830     28
17     187.128     31
18     114.553     15
19     119.355     16
20      85.043      4
21     102.678     10
22      82.372      2
23     155.176     26
        Total     356

According to the outline, for values of $n_1$ and $n_2$ that are too large for the tables, $W$ has the Normal distribution with mean $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$ and variance $\sigma_W^2 = \frac{1}{6} n_2 \mu_W$. If the significance level is 5% and the test is two-sided, we reject our null hypothesis if $z = \frac{W - \mu_W}{\sigma_W}$ does not lie between $\pm 1.960$.

So $n_1 = 12$ and $n_2 = 23$. $W = 274$ is the smaller of the rank sums. $\mu_W = \frac{1}{2}(12)(12 + 23 + 1) = 6(36) = 216$ and $\sigma_W^2 = \frac{1}{6}(23)(216) = 828$. We have $z = \frac{W - \mu_W}{\sigma_W} = \frac{274 - 216}{\sqrt{828}} = 2.015$. Since this is not between the critical values, reject the null hypothesis of equal medians.
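Finally, a sketch (mine) reproducing the rank computation in part e.:

from math import sqrt
from scipy.stats import rankdata

x1 = [108.607, 85.250, 112.747, 195.232, 180.865, 83.122, 92.840, 104.433,
      97.638, 88.355, 79.846, 129.130, 169.540, 137.859, 134.856, 170.830,
      187.128, 114.553, 119.355, 85.043, 102.678, 82.372, 155.176]
x2 = [137.171, 107.627, 204.862, 123.605, 123.836, 120.232, 172.601,
      220.067, 224.828, 147.854, 98.440, 119.884]

ranks = rankdata(x1 + x2)            # rank all 35 prices together
W = ranks[len(x1):].sum()            # rank sum of the Western sample: 274
n1, n2 = len(x2), len(x1)            # n1 goes with the smaller rank sum
mu_W = n1 * (n1 + n2 + 1) / 2        # 216
var_W = n2 * mu_W / 6                # 828
print((W - mu_W) / sqrt(var_W))      # ~2.015, outside +-1.960: reject equal medians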