5/3/99 252x9943
ECO252 QBA2
FINAL EXAM
May 5, 1999

Name
Hour of Class Registered (Circle) MWF 10 11 TR 12:30 2:00

I. (16 points) Do all the following.

1. Hand in your fourth regression problem (2 points) and answer the following questions.
a. For the regression of the number of hours of work against the number of machines, what coefficients are significant at the 1% level? Why? What about the 5% level? (2)
b. Would you say that the regression of number of hours of work against the number of machines and months of experience is more successful than the regression against machines alone? Why? (3)
c. What was the surprise that occurred when you did the stepwise regression? (2)

2. The following pages show the regression of the variable 'mins', the winning time in minutes in a triathlon, against some of the following independent variables:
'female'  A dummy variable that is 1 if the contestant is female.
'swim'    Number of miles of swimming
'bike'    Number of miles of biking
'run'     Number of miles of running
c6        'swim' multiplied by 'female'
c7        'bike' multiplied by 'female'
c8        'run' multiplied by 'female'
c9        'swim' squared
c10       'bike' squared
c11       'run' squared
a. In the regression of 'mins' against 'female', 'swim', 'bike' and 'run', which coefficients have signs that look wrong? Why? Which coefficients are not significant at the 99% confidence level? (3)
b. Look at the regression of 'mins' against 'run', c8 and c11 and the regression of 'mins' against 'run' and c8. Use α = .10. Does either seem to be an improvement over the regression of 'mins' against 'run' alone? Why? (2)
c. Explain the meaning of the F test in the regression of 'mins' against 'female', 'swim', 'bike' and 'run'. What is being tested and what are the conclusions? (2)
d. The printout concludes with a printout of the data and of a correlation matrix. What does this suggest about the problems that are occurring with these regressions?
(2)

Worksheet size: 100000 cells

MTB > RETR 'C:\MINITAB\LR13-49.MTW'.
Retrieving worksheet from file: C:\MINITAB\LR13-49.MTW
Worksheet was saved on 5/ 3/1999

MTB > regress c1 on 4 c2 c3 c4 c5

Regression Analysis

The regression equation is
mins = - 24.6 + 35.5 female - 25.0 swim + 7.13 bike - 6.37 run

Predictor       Coef     Stdev   t-ratio       p
Constant      -24.57     20.13     -1.22   0.241
female         35.47     14.77      2.40   0.030
swim          -25.01     45.75     -0.55   0.593
bike           7.130     1.331      5.36   0.000
run           -6.372     5.384     -1.18   0.255

s = 33.02    R-sq = 98.0%    R-sq(adj) = 97.4%

Analysis of Variance

SOURCE        DF        SS        MS         F       p
Regression     4    786104    196526    180.29   0.000
Error         15     16351      1090
Total         19    802455

SOURCE        DF    SEQ SS
female         1      6291
swim           1    726098
bike           1     52189
run            1      1526

Unusual Observations
Obs.  female     mins      Fit  Stdev.Fit  Residual  St.Resid
  1     0.00   489.25   547.00      17.48    -57.75    -2.06R
 18     1.00   660.48   582.47      17.48     78.01     2.79R

R denotes an obs. with a large st. resid.

MTB > regress c1 on 1 c5

Regression Analysis

The regression equation is
mins = - 19.2 + 23.6 run

Predictor       Coef     Stdev   t-ratio       p
Constant      -19.25     23.19     -0.83   0.417
run           23.615     1.582     14.92   0.000

s = 57.74    R-sq = 92.5%    R-sq(adj) = 92.1%

Analysis of Variance

SOURCE        DF        SS        MS         F       p
Regression     1    742445    742445    222.69   0.000
Error         18     60011      3334
Total         19    802455

Unusual Observations
Obs.     run     mins      Fit  Stdev.Fit  Residual  St.Resid
  1     26.2    489.2    599.5       25.7    -110.2    -2.13R
 12     18.6    589.1    420.0       16.4     169.1     3.05R

R denotes an obs. with a large st. resid.

MTB > regress c1 on 2 c5 c8

Regression Analysis

The regression equation is
mins = - 19.2 + 22.1 run + 3.02 C8

Predictor       Coef     Stdev   t-ratio       p
Constant      -19.25     21.83     -0.88   0.390
run           22.106     1.705     12.96   0.000
C8             3.017     1.659      1.82   0.087

s = 54.36    R-sq = 93.7%    R-sq(adj) = 93.0%

Analysis of Variance

SOURCE        DF        SS        MS         F       p
Regression     2    752216    376108    127.27   0.000
Error         17     50240      2955
Total         19    802455

SOURCE        DF    SEQ SS
run            1    742445
C8             1      9771

Unusual Observations
Obs.     run     mins      Fit  Stdev.Fit  Residual  St.Resid
  2     18.6    505.1    391.9       21.9     113.2     2.27R
 11     26.2    540.9    639.0       32.5     -98.1    -2.25R
 12     18.6    589.1    448.0       21.9     141.0     2.83R

R denotes an obs. with a large st. resid.

MTB > regress c1 on 2 c5 c11

Regression Analysis

The regression equation is
mins = - 102 + 39.6 run - 0.519 C11

Predictor       Coef     Stdev   t-ratio       p
Constant     -101.71     49.18     -2.07   0.054
run           39.550     8.654      4.57   0.000
C11          -0.5192    0.2778     -1.87   0.079

s = 54.11    R-sq = 93.8%    R-sq(adj) = 93.1%

Analysis of Variance

SOURCE        DF        SS        MS         F       p
Regression     2    752675    376337    128.52   0.000
Error         17     49780      2928
Total         19    802455

SOURCE        DF    SEQ SS
run            1    742445
C11            1     10230

Unusual Observations
Obs.     run     mins      Fit  Stdev.Fit  Residual  St.Resid
 12     18.6    589.1    454.3       24.0     134.8     2.78R

R denotes an obs. with a large st. resid.

MTB > regress c1 on 3 c5 c8 c11

Regression Analysis

The regression equation is
mins = - 102 + 38.0 run + 3.02 C8 - 0.519 C11

Predictor       Coef     Stdev   t-ratio       p
Constant     -101.71     45.45     -2.24   0.040
run           38.042     8.033      4.74   0.000
C8             3.017     1.526      1.98   0.066
C11          -0.5192    0.2567     -2.02   0.060

s = 50.01    R-sq = 95.0%    R-sq(adj) = 94.1%

Analysis of Variance

SOURCE        DF        SS        MS         F       p
Regression     3    762446    254149    101.64   0.000
Error         16     40009      2501
Total         19    802455

SOURCE        DF    SEQ SS
run            1    742445
C8             1      9771
C11            1     10230

Unusual Observations
Obs.     run     mins      Fit  Stdev.Fit  Residual  St.Resid
 12     18.6    589.1    482.3       26.3     106.7     2.51R

R denotes an obs. with a large st. resid.
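The summary figures in each regression printout above can be recomputed from its Analysis of Variance table. The following is a minimal sketch in Python (not part of the Minitab session; the function name anova_summary is hypothetical) using the standard definitions R-sq = SSR/SST, R-sq(adj) = 1 - MSE/(SST/DF total) and F = MSR/MSE. It takes the total as SSR + SSE, which can differ from the printed total by one unit of rounding.

```python
def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute R-sq, adjusted R-sq and F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    r_sq = ss_regression / ss_total
    # Adjusted R-sq compares the error mean square with the total mean square.
    r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
    f_stat = (ss_regression / df_regression) / (ss_error / df_error)
    return r_sq, r_sq_adj, f_stat


# (SS regression, SS error, DF regression, DF error) copied from the printouts.
printouts = {
    "female swim bike run": (786104, 16351, 4, 15),
    "run": (742445, 60011, 1, 18),
    "run C8": (752216, 50240, 2, 17),
    "run C11": (752675, 49780, 2, 17),
    "run C8 C11": (762446, 40009, 3, 16),
}

for predictors, table in printouts.items():
    r_sq, r_sq_adj, f_stat = anova_summary(*table)
    print(f"{predictors:22s} R-sq={r_sq:.1%}  R-sq(adj)={r_sq_adj:.1%}  F={f_stat:.2f}")
```

Comparing R-sq(adj) rather than R-sq across these rows is the usual way to account for the extra predictors, since R-sq alone cannot fall when a variable is added.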
MTB > print c1-c11

Data Display

Row     mins  female   swim    bike    run     C6      C7     C8      C9
  1  489.250       0   2.40   112.0   26.2   0.00     0.0    0.0  5.7600
  2  505.150       0   2.00   100.0   18.6   0.00     0.0    0.0  4.0000
  3  245.500       0   1.20    55.3   13.1   0.00     0.0    0.0  1.4400
  4  204.400       0   1.50    48.0   10.0   0.00     0.0    0.0  2.2500
  5  114.533       0   0.93    24.8    6.2   0.00     0.0    0.0  0.8649
  6  108.267       0   0.93    24.8    6.2   0.00     0.0    0.0  0.8649
  7   79.417       0   0.50    18.0    5.0   0.00     0.0    0.0  0.2500
  8  566.500       0   2.40   112.0   26.2   0.00     0.0    0.0  5.7600
  9   74.983       0   0.50    20.0    4.0   0.00     0.0    0.0  0.2500
 10  116.117       0   0.60    25.0    6.2   0.00     0.0    0.0  0.3600
 11  540.933       1   2.40   112.0   26.2   2.40   112.0   26.2  5.7600
 12  589.067       1   2.00   100.0   18.6   2.00   100.0   18.6  4.0000
 13  280.100       1   1.20    55.3   13.1   1.20    55.3   13.1  1.4400
 14  235.033       1   1.50    48.0   10.0   1.50    48.0   10.0  2.2500
 15  127.167       1   0.93    24.8    6.2   0.93    24.8    6.2  0.8649
 16  120.750       1   0.93    24.8    6.2   0.93    24.8    6.2  0.8649
 17   90.317       1   0.50    18.0    5.0   0.50    18.0    5.0  0.2500
 18  660.483       1   2.40   112.0   26.2   2.40   112.0   26.2  5.7600
 19   83.150       1   0.50    20.0    4.0   0.50    20.0    4.0  0.2500
 20  131.817       1   0.60    25.0    6.2   0.60    25.0    6.2  0.3600

Row      C10     C11
  1  12544.0  686.44
  2  10000.0  345.96
  3   3058.1  171.61
  4   2304.0  100.00
  5    615.0   38.44
  6    615.0   38.44
  7    324.0   25.00
  8  12544.0  686.44
  9    400.0   16.00
 10    625.0   38.44
 11  12544.0  686.44
 12  10000.0  345.96
 13   3058.1  171.61
 14   2304.0  100.00
 15    615.0   38.44
 16    615.0   38.44
 17    324.0   25.00
 18  12544.0  686.44
 19    400.0   16.00
 20    625.0   38.44

MTB > Correlation c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11.

Correlations (Pearson)

           mins  female    swim    bike     run      C6      C7      C8
female    0.089
swim      0.951   0.000
bike      0.984   0.000   0.973
run       0.962   0.000   0.965   0.985
C6        0.510   0.792   0.432   0.420   0.417
C7        0.584   0.716   0.480   0.494   0.487   0.982
C8        0.564   0.726   0.470   0.479   0.487   0.980   0.993
C9        0.956   0.000   0.985   0.979   0.982   0.426   0.483   0.478
C10       0.975   0.000   0.954   0.989   0.983   0.412   0.488   0.478
C11       0.928   0.000   0.932   0.954   0.985   0.403   0.471   0.479

             C9     C10
C10       0.982
C11       0.974   0.975

II.
Do at least 4 of the following 7 problems (at least 15 each) (or do sections adding to at least 60 points; anything extra you do helps, and grades wrap around). Show your work! State H0 and H1 where applicable. Use a significance level of 5% unless noted otherwise.

1. a. Premiums on a group of 11 closed-end mutual funds were as follows. (These are in per cent, but that shouldn't affect your analysis.) Test the hypothesis that the mean is 3 per cent using (i) either a test ratio or a critical value and (ii) a confidence interval. (6)

+4.7  -0.7  +5.3  +9.2  -0.3  -0.3  +5.0  +0.4  -1.9  +0.5  -3.1

b. Test that the following data (i) has a Poisson distribution (6) and (ii) has a Poisson distribution with a mean of 4.5 (6). If you do both parts, do only one with a chi-square method.

x    0   1   2   3   4   5   6   7
O   23  19  42  60  89  79  48  40

2. Eight technicians are asked to take a test and are then rated by their supervisors. Scores and ratings follow, with the addition of productivity figures. (Use α = .01)

Technician    Test Score   Performance ranking   Productivity
                  x1               x2                 x3
Armstrong         83                3                180
Brubecker         68                7                170
Cooper            60                6                164
Dollfuss          81                4                182
Ezekiel           74                5                174
Fassbinder        95                1                191
Goodwrench        90                2                195
Hingle            66                8                160

Σ x1 = 617, Σ x1^2 = 48631, Σ x2 = 36, Σ x2^2 = 204, Σ x3 = 1416, Σ x3^2 = 251702.

a. Compute the correlation between x1 and x2 and test it for significance. (5)
b. Compute the rank correlation between x1 and x2 and test it for significance. Which of these two measures (rank or conventional correlation) is most appropriate here? Why? (5)
c. Compute Kendall's W for these data and test it for significance. (6)
d. Test the hypothesis that the correlation between x1 and x2 is .8. (5)

3. Samples of demand for four types of sailboat sold by your firm are as follows:

                       West Coast   East Coast   Total
Pirates Revenge            74          146        220
Jolly Roger                54          110        164
Bluebeard's Treasure       46          100        146
Ahab's Quest               50          120        170
Total                     224          476        700

Do all tests at the 95% confidence level.
a.
Management had initially assumed that the proportion of total sales of "Pirates Revenge" would be at most 30% of sales. Test this. (3)
b. Test the hypothesis that sales of the "Pirates Revenge" are the same proportion of sales on both the East and West Coast. (4)
c. Test the hypothesis that sales on the West Coast follow a uniform distribution (i.e. that each model is the same proportion of West Coast sales). (5)
d. Test the hypothesis that the proportions of each boat sold are the same on both coasts. (5)

4. Data on passengers (in thousands), advertising (in $thousands) and National Income (in $trillions) appear below. (Use α = .05)

pass    adv    inc   season
   y     x1     x2       x3
  15     10   2.40        1
  17     12   2.72        1
  13      8   2.08        1
  23     17   3.68        1
  16     10   2.56        1
  21     15   3.36        0
  14     10   2.24        0
  20     14   3.20        0
  26     19   3.84        0
  18     10   2.72        0
  17     11   2.07        0
  18     13   2.33        0
  23     16   2.98        0
  15     10   1.94        1
  16     12   2.17        1

a) Compute Σ x3 y. (2) (This must be done correctly to get full credit for b.)
b) Compute a simple regression of passengers against National Income. (6)
c) Compute R^2. (4)
d) Compute s_e. (3)
e) Compute s_b1 (the std deviation of the coefficient of National Income) and do a confidence interval for β1. (3)
f) Do a confidence interval for Passengers when income is $4.10 trillion. (3) At what income will this interval be smallest? (1)

Σ y = 272, Σ y^2 = 5128, Σ x1 = 187, Σ x1^2 = 2469, Σ x1 y = 3549, Σ x2 y = 759.090, Σ x2 = 40.29, Σ x2^2 = 113.339, Σ x1 x2 = 525.38 and n = 15. You do not need all of these.

5. Data from problem 4 is repeated below. (Use α = .01)

Σ y = 272, Σ y^2 = 5128, Σ x1 = 187, Σ x1^2 = 2469, Σ x1 y = 3549, Σ x2 y = 759.090, Σ x2 = 40.29, Σ x2^2 = 113.339, Σ x1 x2 = 525.38 and n = 15.

a. Do a multiple regression of passengers against advertising and National Income. (12)
b. Compute R^2 and R^2 adjusted for degrees of freedom for both this and the previous problem. Compare the values of R^2 adjusted between this and the previous problem. Use an F test to compare R^2 here with the R^2 from the previous problem. (5)
c.
Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict the number of passengers when we spend $13 (thousand) on advertising and National Income is $3.5 (trillion). (2)
e. The regression on the previous page was run with the command
MTB > regress C1 on 1 C3;
SUBC> dw.
As a result, the last line of the regression read
Durbin-Watson statistic = 0.71
What did I test for and what was the meaning of the last line of the regression? Assume a confidence level and use the tables in the text.

6. (Watch it!) Three methods are used to train candidates for the FAA pilots exam. Scores for trainees are shown below classified by method.

            Method
Video       Audio
Cassette    Cassette    Classroom
   72          73          68
   86          75          83
   80          60          50
   91          52          91
   46          84          84
   68          76          77
   75                      94
                           81
                           92
                           90

a. Assume that the data is normal and compare the means for the first two methods. (Assume unequal variances.) (5)
b. Do the same for all three methods. (You may assume equal variances now.) (7)
c. Test column 1 to see if it has the normal distribution. (5)

Note: For the first column:
x1             72     86     80     91     46     68     75
(x1 - x̄)/s  -0.14   0.82   0.41   1.16  -1.91  -0.41   0.07

Note: Σ x1 = 518, Σ x1^2 = 39626, Σ x3 = 810, Σ x3^2 = 67240.

4/29/99 252x9942
7. (Watch it!) Three methods are used to train candidates for the FAA pilots exam. Scores for trainees are shown below classified by method.

Method
Video Cassette    x1   72  86  80  91  46  68  75
Audio Cassette    x2   73  75  60  52  84  76
Classroom         x3   68  83  50  91  84  77  94  81  92  90

a. Using a sign test, check x3 to see if it has a median of 85. (4)
b. Repeat the test on x3 using a more powerful method. (5)
c. Apply the Runs Test as follows: Write down the numbers in x1 and x3 together in order. Underneath the numbers write down A if the number comes from x1 and C if it comes from x3. You will have a sequence like AACACC…. In case of a tie remove both tying numbers from your test.
Do a runs test on the resulting sequence to see if the A’s and C’s appear randomly. Congratulations! You have just done a Wald-Wolfowitz Test for the equality of means in two (nonnormal) samples. If the sequence is random, the means are equal. (6)
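
The mechanics of the procedure in part c can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed solution: the function names are hypothetical, a "tie" is taken to mean a value that occurs in both samples (every occurrence is dropped), and the mean and variance of the number of runs are the usual large-sample formulas. With samples this small, a table of critical values for the runs test would normally replace the normal approximation.

```python
from math import sqrt


def label_sequence(sample_a, sample_c):
    """Merge both samples in ascending order and return the A/C label string.

    Assumption: a value present in both samples counts as a tie and is
    removed from both before the merge.
    """
    tied = set(sample_a) & set(sample_c)
    labelled = [(value, "A") for value in sample_a if value not in tied]
    labelled += [(value, "C") for value in sample_c if value not in tied]
    labelled.sort(key=lambda pair: pair[0])
    return "".join(label for _, label in labelled)


def count_runs(sequence):
    """Count maximal blocks of identical consecutive labels."""
    if not sequence:
        return 0
    return 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


def runs_moments(sequence):
    """Mean and standard deviation of the run count under randomness."""
    n1, n2 = sequence.count("A"), sequence.count("C")
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return mean, sqrt(variance)


x1 = [72, 86, 80, 91, 46, 68, 75]
x3 = [68, 83, 50, 91, 84, 77, 94, 81, 92, 90]

sequence = label_sequence(x1, x3)
runs = count_runs(sequence)
expected_runs, sd_runs = runs_moments(sequence)
print(sequence, runs, round(expected_runs, 2), round(sd_runs, 2))
```

The observed number of runs is then compared with the value expected under randomness; unusually few runs would mean the A's and C's cluster at opposite ends of the ordering.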