252x0333 11/18/03
ECO252 QBA2 THIRD HOUR EXAM Nov 25 2003
Name
Hour of Class Registered (Circle)

I. (30+ points) Do all of the following (2 points each unless noted otherwise).

TABLE 11-0
Shiffler and Adams present the partially complete ANOVA table below, which resulted from the analysis of a problem with 3 rows and 3 columns.

ANOVA
Source of Variation    SS     df    MS    F    F.05
Columns                18
Rows                   40
Interaction
Within (Error)        208
Total                 296     62

1. Complete the table. Assume a 5% significance level. You may not be able to find exactly the degrees of freedom you are looking for in the F table, but you should be able to come close. (4)
2. Is there significant interaction? Explain your answer.

TABLE 13-6
The following Minitab table (with many parts deleted) was obtained when "Score received on an exam (measured in percentage points)" (Y) is regressed on "percentage attendance" (X) for 22 students in a Statistics for Business and Economics course.

Regression Analysis: Score versus Attendance
The regression equation is
Score = …… + ….. Attendance

Predictor     Coef      SE Coef    T        P
Constant      39.3927   37.2435    1.0576   0.3028
Attendance    0.34058   0.52852    0.6444   0.5266

S = 20.2598   R-Sq = 2.034%   R-Sq(adj) = -2.864%

Analysis of Variance
Source           DF    SS    MS    F    P
Regression        1                     0.523
Residual Error   20
Total            21

3. Referring to Table 13-6, which of the following statements is true?
a) -2.86% of the total variability in score received can be explained by percentage attendance.
b) -2.86% of the total variability in percentage attendance can be explained by score received.
c) 2% of the total variability in score received can be explained by percentage attendance.
d) 2% of the total variability in percentage attendance can be explained by score received.

4. Referring to Table 13-6, which of the following statements is true?
a) If attendance increases by 0.341%, the estimated average score received will increase by 1 percentage point.
b) If attendance increases by 1%, the estimated average score received will increase by 39.39 percentage points.
c) If attendance increases by 1%, the estimated average score received will increase by 0.341 percentage points.
d) If the score received increases by 39.39%, the estimated average attendance will go up by 1%.

5. (Text CD problem 12.51) The manager of a commercial mortgage department has collected data over 104 weeks on the number of mortgages approved. The data are the x and O columns below (x is the number of mortgages approved in a week and O is the number of weeks in which that happened; for example, there were 32 weeks in which 2 mortgages were approved), and the problem asks whether the data follow a Poisson distribution.

Row    x     O      E
 1     0    13    12.7355
 2     1    25    26.7445
 3     2    32    28.0817
 4     3    17    19.6572
 5     4     9    10.3200
 6     5     6     4.3344
 7     6     1     1.5170
 8     7     1     0.4551
 9     8     0     0.1195
10     9     0     0.0279
11    10     0     0.0059
12    11     0     0.0011
13    12     0     0.0002
Total      104   104.000

Since we have no guide as to what the parameter of the distribution is, the products of the x and O columns were summed, showing that 219 mortgages were approved over the 104 weeks, an average of about 2.1 mortgages per week. The E column above is the computer-generated Poisson distribution (mean 2.1) multiplied by 104. In a Kolmogorov-Smirnov procedure we turn O and E into cumulative distributions and compare them, as is done below.

Row    Fo        Fe        D
 1     0.12500   0.12246   0.0025435
 2     0.36538   0.37962   0.0142304
 3     0.67308   0.64963   0.0234453
 4     0.83654   0.83864   0.0021047
 5     0.92308   0.93787   0.0147973
 6     0.98077   0.97955   0.0012180
 7     0.99038   0.99414   0.0037536
 8     1.00000   0.99851   0.0014857
 9     1.00000   0.99966   0.0003369
10     1.00000   0.99993   0.0000689
11     1.00000   0.99999   0.0000126
12     1.00000   1.00000   0.0000019
13     1.00000   1.00000   0.0000000

Assume this is correct, and explain how you would finish this analysis and why you would or would not reject the null hypothesis. (4)

6.
Referring to the previous problem, a more direct method of comparing the observed and expected data is shown below. Answer the following questions.
a) What method is being used? (1)
b) How many degrees of freedom do we have? (1)
c) Why are the columns shorter here than in Problem 5? (1)
d) Do we reject our null hypothesis? Why? (3)

Row     O       E          O²/E
1       13     12.7355    13.2700
2       25     26.7445    23.3693
3       32     28.0817    36.4650
4       17     19.6572    14.7020
5        9     10.3200     7.8488
6        6      4.3344     8.3056
7        2      2.1267     1.8808
Total  104    104.0000   105.8415

7. In problems 5 and 6, one of the methods was used improperly. Which one? Why?

8. Random samples of salaries (in thousands) for lawyers in 3 cities are presented by Dummeldinger. They are repeated in the three left columns below.

Row   Atlanta   DC     LA     rank-At   rank-DC   rank-LA
1     45.5      41.5   52.0   12.0       7.0      17.5
2     47.9      40.1   72.0   13.0       5.0      21.0
3     43.1      39.0   41.0   11.0       3.5       6.0
4     42.0      56.5   54.0    8.5      20.0      19.0
5     49.0      37.0   33.0   14.5       2.0       1.0
6     52.0      49.0   42.0   17.5      14.5       8.5
7     39.0      43.0   50.0    3.5      10.0      16.0
Sum                           80.0      62.0      89.0

You are asked to analyze them, which you do using a Kruskal-Wallis procedure. You are aware that the tables you have are only appropriate for columns with 5 or fewer items in them, so you drop the last two items in each column and, after ranking the items from 1 to 15, get a Kruskal-Wallis H of 1.82. If you use the tables, what did you test and what is the conclusion? (3)

9. You remember how to work with column sizes that are too large for the table. You rank the data as shown in the three right columns above. Compute the Kruskal-Wallis H and use it to test your null hypothesis at the 5% significance level. (3)

10. The Kruskal-Wallis test above was done on the assumption that the underlying data did not follow the Normal distribution. Let's assume that you found out that the underlying distributions were Normal and had a common variance. The method to use would be:
a) Friedman test
b) Chi-squared test
c) One-way ANOVA
d) Two-way ANOVA

TABLE 13-8
The regression equation is
GPA = 0.5681 + 0.1021 ACT

Predictor   Coef     SE Coef   T        P
Constant    0.5681   0.9284    0.6119   0.5630
ACT         0.1021   0.0356    2.8633   0.0286

S = 0.2691   R-Sq = 57.74%   R-Sq(adj) = 50.69%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   0.5940   0.5940   8.1986   0.0287
Residual Error    6   0.4347   0.0724
Total             7   1.0287

It is believed that GPA (grade point average, based on a four-point scale) should have a positive linear relationship with ACT scores. Given above is the Minitab output from regressing GPA on ACT scores using a data set of 8 randomly chosen students from a Big Ten university.

11. Referring to Table 13-8, the interpretation of the coefficient of determination in this regression is that
a) 57.74% of the total variation of ACT scores can be explained by GPA.
b) ACT scores account for 57.74% of the total fluctuation in GPA.
c) GPA accounts for 57.74% of the variability of ACT scores.
d) none of the above

12. Referring to Table 13-8, the value of the measured test statistic to test whether there is any linear relationship between GPA and ACT is
a) 0.0356.
b) 0.1021.
c) 0.7598.
d) 2.8633.

13. Referring to Table 13-8, what is the predicted average value of GPA when ACT = 20?
a) 2.61
b) 2.66
c) 2.80
d) 3.12

14. Referring to Table 13-8, what are the decision and conclusion on testing whether there is any linear relationship at the 1% level of significance between GPA and ACT scores?
a) Do not reject the null hypothesis; hence, there is not sufficient evidence to show that ACT scores and GPA are linearly related.
b) Reject the null hypothesis; hence, there is not sufficient evidence to show that ACT scores and GPA are linearly related.
c) Do not reject the null hypothesis; hence, there is sufficient evidence to show that ACT scores and GPA are linearly related.
d) Reject the null hypothesis; hence, there is sufficient evidence to show that ACT scores and GPA are linearly related.
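The arithmetic behind several Part I items (completing the ANOVA table in Problem 1, the Poisson expected counts, Kolmogorov-Smirnov D and chi-squared statistic in Problems 5 and 6, the Kruskal-Wallis H in Problem 9, and R² and the prediction in Table 13-8) can be sketched in Python as a check on hand work. This is a minimal sketch, not the author's solution. It assumes the Poisson mean is rounded to 2.1, as the E column implies, and it uses the common large-sample approximation 1.36/√n for the 5% Kolmogorov-Smirnov critical value (an assumption; the course table may differ). The helper names `cumulative` and `average_ranks` are hypothetical.

```python
import math

# Problem 1: complete the two-way ANOVA table (3 rows x 3 columns).
ss = {"Columns": 18.0, "Rows": 40.0, "Within": 208.0}
ss_total, df_total = 296.0, 62
ss["Interaction"] = ss_total - sum(ss.values())        # 296 - 266 = 30
df = {"Columns": 3 - 1, "Rows": 3 - 1, "Interaction": (3 - 1) * (3 - 1)}
df["Within"] = df_total - sum(df.values())             # 62 - 8 = 54
ms = {k: ss[k] / df[k] for k in df}
f_ratio = {k: ms[k] / ms["Within"] for k in ("Columns", "Rows", "Interaction")}

# Problem 5: Poisson expected counts, mean 2.1 (219 mortgages / 104 weeks, rounded).
observed = [13, 25, 32, 17, 9, 6, 1, 1, 0, 0, 0, 0, 0]   # x = 0, 1, ..., 12
n_weeks = sum(observed)                                   # 104
lam = 2.1
expected = [n_weeks * math.exp(-lam) * lam ** x / math.factorial(x)
            for x in range(len(observed))]

def cumulative(counts, total):
    """Running proportions: turns counts into a cumulative distribution."""
    running, out = 0.0, []
    for c in counts:
        running += c
        out.append(running / total)
    return out

# Kolmogorov-Smirnov: largest gap between cumulative observed and expected proportions.
fo = cumulative(observed, n_weeks)
fe = cumulative(expected, n_weeks)
d_max = max(abs(a - b) for a, b in zip(fo, fe))
ks_crit_approx = 1.36 / math.sqrt(n_weeks)   # assumed large-sample 5% critical value

# Problem 6: chi-squared with x >= 6 pooled into one cell, computed as sum(O^2/E) - n.
obs_pooled = observed[:6] + [sum(observed[6:])]
exp_pooled = expected[:6] + [sum(expected[6:])]
chi_sq = sum(o * o / e for o, e in zip(obs_pooled, exp_pooled)) - n_weeks

# Problem 9: Kruskal-Wallis H on the full samples; tied values share the average rank.
def average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

salaries = {
    "Atlanta": [45.5, 47.9, 43.1, 42.0, 49.0, 52.0, 39.0],
    "DC": [41.5, 40.1, 39.0, 56.5, 37.0, 49.0, 43.0],
    "LA": [52.0, 72.0, 41.0, 54.0, 33.0, 42.0, 50.0],
}
pooled = [v for col in salaries.values() for v in col]
ranks = average_ranks(pooled)
rank_sums, start = {}, 0
for city, col in salaries.items():
    rank_sums[city] = sum(ranks[start:start + len(col)])
    start += len(col)
n_all = len(pooled)
h_stat = (12 / (n_all * (n_all + 1))
          * sum(r * r / len(salaries[c]) for c, r in rank_sums.items())
          - 3 * (n_all + 1))

# Table 13-8: R^2 = SSR / SST and the point prediction at ACT = 20.
r_sq = 0.5940 / 1.0287
gpa_at_20 = 0.5681 + 0.1021 * 20

print(ss["Interaction"], df["Within"], {k: round(v, 3) for k, v in f_ratio.items()})
print(round(d_max, 4), round(ks_crit_approx, 4), round(chi_sq, 4))
print(rank_sums, round(h_stat, 4), round(r_sq, 4), round(gpa_at_20, 4))
```

The F ratios, D, χ² and H printed here still have to be compared with the appropriate table values (F, Kolmogorov-Smirnov, chi-squared) before any conclusion is drawn.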
ECO252 QBA2 Third EXAM Nov 25 2003
TAKE HOME SECTION
Name: _________________________
Social Security Number: _________________________

Please Note: computer problems 2 and 3 should be turned in with the exam. In problem 2, the two-way ANOVA table should be completed. The three F tests should be done at the 5% significance level, and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the regression line is.

II. Do the following: (23+ points). Assume a 5% significance level. Show your work!

1. Assume that each column below represents a random sample of sales of the popular cereal brand ‘Whee!’ as it was moved from shelf 1 (lowest) to shelf 4 (highest) in a group of supermarkets. Assume that the underlying distribution is Normal and test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$.
a) Before you start, add the second-to-last digit of your Social Security number to the 451 in column 4 and find the sample variance of sales from shelf 4. For example, Seymour Butz’s SS number is 123456789, so he will change 451 to 459. This should not change the results by much. (2)
b) Test the hypothesis. (6) Show your work. It is legitimate to check your results by running these problems on the computer, but I expect to see hand computations for every part of them.
c) Compare means two by two, using any one appropriate statistical method, to find out which shelves are significantly better than others. (3)
d) (Extra Credit) What if you found out that each row represented one store? If this changes your analysis, redo the analysis. (5)
e) (Extra Credit) What if you found out that each row represented one store and that the underlying distribution was not Normal? If this changes your analysis, redo the analysis. (5)
f) I did some subsequent analysis on this problem.
The output, in part, said:
Levene's Test (any continuous distribution)
Test Statistic: 0.609
P-Value       : 0.613
What was I testing for and what should my conclusion be? (2)

Sales of ‘Whee’ Cereal
Row   Shelf 1   Shelf 2   Shelf 3   Shelf 4
1     336       440       464       354
2     417       277       479       423
3     208       374       492       321
4     420       421       456       424
5     366       481       338       518
6     227       349       413       451
7     357       328       383       311
8     353       449       554       462
9     518       462       497       339
10    388       373       510       202

Sum of shelf 1 = 3590.0    Sum of squares of shelf 1 = 1362860
Sum of shelf 2 = 3954.0    Sum of squares of shelf 2 = 1602366
Sum of shelf 3 = 4586.0    Sum of squares of shelf 3 = 2140264

2. A company operating in 12 regions gives us its advertising expenses as a percent of those of its leading competitor, and its sales as a percent of those of its leading competitor.

Row   Ad    Sales
1      77    85
2     110   103
3     110   102
4      93   109
5      90    85
6      95   103
7     100   110
8      85    86
9      96    92
10     83    87
11    100    98
12     95   108

Sum of Ad = 1134.0       Sum of squares of Ad = 108258
Sum of Sales = 1168.0    Sum of squares of Sales = 114750

Note that the sum and sum of squares of sales can’t be used directly, but they should help you to get the corrected numbers. Change the 103 in the ‘Sales’ column by adding the second-to-last digit of your Social Security number to it. For example, Seymour Butz’s SS number is 123456789, so he will change 103 to 111. This should not change the results by much. The question is whether our relative advertising expenses affect our relative sales, so ‘Sales’ should be your dependent variable and ‘Ad’ should be your independent variable. Show your work. It is legitimate to check your results by running the problem on the computer, but I expect to see hand computations that show clearly where you got your numbers for every part of this problem.
a. Compute the regression equation $\hat{Y} = b_0 + b_1 x$ to predict ‘Sales’ on the basis of ‘Ad’. (2)
b. Compute $R^2$. (2)
c. Compute $s_e$. (2)
d. Compute $s_{b_0}$ and do a significance test on $b_0$. (2)
e. Do an ANOVA table for the regression.
What conclusion can you draw from this table about the relationship between advertising expenditures and sales? Why? (2)
f. It is proposed to raise our expenditures to 110% of our competitors’ in every region. Use this to find a predicted value for sales and to create a confidence interval for sales. Explain the difference between this and a prediction interval, and when the prediction interval would be more useful. (3)
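The hand-computation formulas used in Part II (sample variance from sums of squares, the one-way ANOVA sums of squares, and the simple-regression quantities $b_0$, $b_1$, $R^2$, $s_e$, $s_{b_0}$ and the interval standard errors) can be checked with a short Python sketch. It is a sketch under stated assumptions, not the expected answer: `DIGIT` is a hypothetical placeholder for the Social Security digit (0 leaves 451 unchanged), the regression uses the Sales data exactly as printed with no digit added, and the value 2.228 for $t_{.025}$ with 10 degrees of freedom is taken from a t table. It does not cover the pairwise comparisons, the per-store (blocked or Friedman) versions, or Levene's test.

```python
import math

# Problem II.1: DIGIT is a hypothetical placeholder for the second-to-last SS digit.
DIGIT = 0
shelves = [
    [336, 417, 208, 420, 366, 227, 357, 353, 518, 388],
    [440, 277, 374, 421, 481, 349, 328, 449, 462, 373],
    [464, 479, 492, 456, 338, 413, 383, 554, 497, 510],
    [354, 423, 321, 424, 518, 451 + DIGIT, 311, 462, 339, 202],
]

def sample_variance(xs):
    """s^2 = (sum of x^2 - n * xbar^2) / (n - 1), the hand-computation formula."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar * xbar) / (n - 1)

def one_way_anova(groups):
    """Return SSB, SSW, SST and F = MSB / MSW for a one-way layout."""
    n = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / n     # (grand total)^2 / n
    sst = sum(x * x for g in groups for x in g) - correction
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssw = sst - ssb
    k = len(groups)
    f = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, sst, f

var_shelf4 = sample_variance(shelves[3])
ssb, ssw, sst, f_shelves = one_way_anova(shelves)

# Problem II.2: regress Sales (y) on Ad (x), data exactly as printed (no digit added).
ad = [77, 110, 110, 93, 90, 95, 100, 85, 96, 83, 100, 95]
sales = [85, 103, 102, 109, 85, 103, 110, 86, 92, 87, 98, 108]

def simple_regression(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum(v * v for v in x) - n * xbar ** 2
    syy = sum(v * v for v in y) - n * ybar ** 2
    sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = b1 * sxy                 # regression sum of squares
    sse = syy - ssr                # error sum of squares
    s_e = math.sqrt(sse / (n - 2))
    return {"n": n, "xbar": xbar, "sxx": sxx, "b0": b0, "b1": b1,
            "r_sq": ssr / syy, "s_e": s_e, "ssr": ssr, "sse": sse}

fit = simple_regression(ad, sales)
s_b0 = fit["s_e"] * math.sqrt(1 / fit["n"] + fit["xbar"] ** 2 / fit["sxx"])

# Part f: point prediction at Ad = 110 and the standard errors behind the two intervals.
x0 = 110
y_hat = fit["b0"] + fit["b1"] * x0
h0 = 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
se_mean = fit["s_e"] * math.sqrt(h0)          # confidence interval for the mean
se_pred = fit["s_e"] * math.sqrt(1 + h0)      # prediction interval for one region
t_10 = 2.228   # t(.025) with n - 2 = 10 degrees of freedom, from a t table
conf_int = (y_hat - t_10 * se_mean, y_hat + t_10 * se_mean)
pred_int = (y_hat - t_10 * se_pred, y_hat + t_10 * se_pred)

print(round(var_shelf4, 2), round(f_shelves, 4))
print({k: round(v, 4) for k, v in fit.items()}, round(s_b0, 4))
print(round(y_hat, 2), [round(v, 2) for v in conf_int], [round(v, 2) for v in pred_int])
```

The only difference between the two intervals in the sketch is the extra 1 under the square root in `se_pred`, which is why the prediction interval is always the wider of the two.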