252x0781 12/05/07

ECO252 QBA2 Final Exam
December 12-15, 2007
Version 1

Name and Class hour:_________________________

I. (25+ points) Do all the following. Note that answers without reasons and/or citation of appropriate statistical tests receive no credit. Most answers require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven’t done it lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere Else). There are over 150 possible points, but the exam is normed on 75 points.

In the Lees’ 2000 text they noted that before 1979 the Federal Reserve targeted interest rates, letting the money supply grow in such a way that interest rates would remain stable. After 1979, the Fed switched to targeting the money supply. The Lees did a regression of the money supply against GNP (I had to replace this with GDP), the prime rate (PrRt) and a dummy variable (Dummy) that is 1 before 1979 and zero from 1979 until 1990, when their analysis stops. They report a high R-squared and extremely significant coefficients for the prime rate, GNP and the dummy variable, which seems to tell us that the Fed’s change of regime had a real effect on the money supply. Later in the text they suggest the addition of an interaction variable (GDPPr), which is the product of the prime rate and GDP, and a second interaction variable (GDPdum), which is the product of GDP and the dummy variable. I added the year and its square measured from 1958, population, and GDP squared. My attempt to update the Lees’ results was terribly discouraging. The dependent variable is M1 or its logarithm (logM1).

----- 12/3/2007 11:31:46 PM -----
Welcome to Minitab, press F1 for help.
MTB > WOpen "C:\Documents and Settings\RBOVE\My Documents\Minitab\M1PrRGDP.MTW".
Retrieving worksheet from file: 'C:\Documents and Settings\RBOVE\My Documents\Minitab\M1PrRGDP.MTW'
Worksheet was saved on Mon Dec 03 2007

MTB > print c5 c2 c4 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15

Data Display

Row    C5      M1   PrRt          GDP  Dummy   GDPPr  GDPdum  year  yearsq
  1  1959   140.0   4.50      $506.60      1    2280   506.6     1       1
  2  1960   140.7   5.00      $526.40      1    2632   526.4     2       4
  3  1961   145.2   4.50      $544.70      1    2451   544.7     3       9
  4  1962   147.8   4.50      $585.60      1    2635   585.6     4      16
  5  1963   153.3   4.50      $617.70      1    2780   617.7     5      25
  6  1964   160.3   4.50      $663.60      1    2986   663.6     6      36
  7  1965   167.8   4.50      $719.10      1    3236   719.1     7      49
  8  1966   172.0   5.52      $787.80      1    4349   787.8     8      64
  9  1967   183.3   5.50      $832.60      1    4579   832.6     9      81
 10  1968   197.4   6.50      $910.00      1    5915   910.0    10     100
 11  1969   203.9   8.23      $984.60      1    8103   984.6    11     121
 12  1970   214.4   8.00    $1,038.50      1    8308  1038.5    12     144
 13  1971   228.3   5.50    $1,127.10      1    6199  1127.1    13     169
 14  1972   249.2   5.04    $1,238.30      1    6241  1238.3    14     196
 15  1973   262.9   7.49    $1,382.70      1   10356  1382.7    15     225
 16  1974   274.2  11.54    $1,500.00      1   17310  1500.0    16     256
 17  1975   287.1   7.07    $1,638.30      1   11583  1638.3    17     289
 18  1976   306.2   7.20    $1,825.30      1   13142  1825.3    18     324
 19  1977   330.9   6.75    $2,030.90      1   13709  2030.9    19     361
 20  1978   357.3   8.63    $2,294.70      1   19803  2294.7    20     400
 21  1979   381.8  11.65    $2,563.30      0   29862     0.0    21     441
 22  1980   408.5  12.63    $2,789.50      0   35231     0.0    22     484
 23  1981   436.7  20.03    $3,128.40      0   62662     0.0    23     529
 24  1982   474.8  16.50    $3,255.00      0   53708     0.0    24     576
 25  1983   521.4  10.50    $3,536.70      0   37135     0.0    25     625
 26  1984   551.6  12.60    $3,933.20      0   49558     0.0    26     676
 27  1985   619.8   9.78    $4,220.30      0   41275     0.0    27     729
 28  1986   724.7   8.50    $4,462.80      0   37934     0.0    28     784
 29  1987   750.2   8.25    $4,739.50      0   39101     0.0    29     841
 30  1988   786.7   9.00    $5,103.80      0   45934     0.0    30     900
 31  1989   792.9  11.07    $5,484.40      0   60712     0.0    31     961
 32  1990   824.7  10.00    $5,803.10      0   58031     0.0    32    1024
 33  1991   896.9   8.50    $5,995.90      0   50965     0.0    33    1089
 34  1992  1024.8   6.50    $6,337.70      0   41195     0.0    34    1156
 35  1993  1129.7   6.00    $6,657.40      0   39944     0.0    35    1225
 36  1994  1150.7   7.25    $7,072.20      0   51273     0.0    36    1296
 37  1995  1127.4   9.00    $7,397.70      0   66579     0.0    37    1369
 38  1996  1081.4   8.25    $7,816.90      0   64489     0.0    38    1444
 39  1997  1072.8   8.50    $8,304.30      0   70587     0.0    39    1521
 40  1998  1095.9   8.50    $8,747.00      0   74350     0.0    40    1600
 41  1999  1123.0   7.75    $9,268.40      0   71830     0.0    41    1681
 42  2000  1087.7   9.50    $9,817.00      0   93262     0.0    42    1764
 43  2001  1182.0   6.98   $10,128.00      0   70693     0.0    43    1849
 44  2002  1219.5   4.75   $10,469.60      0   49731     0.0    44    1936
 45  2003  1305.5   4.22   $10,960.80      0   46255     0.0    45    2025
 46  2004  1375.2   4.01   $11,685.90      0   46860     0.0    46    2116
 47  2005  1373.2   6.01   $12,433.90      0   74728     0.0    47    2209
 48  2006  1365.9   8.02   $13,194.70      0  105821     0.0    48    2304

Row     Pop      GDPsq   log M1   logM1l
  1  176289     256644  4.94164  4.89222
  2  179979     277097  4.94663  4.94164
  3  182992     296698  4.97811  4.94663
  4  185771     342927  4.99586  4.97811
  5  188483     381553  5.03240  4.99586
  6  191141     440365  5.07705  5.03240
  7  193526     517105  5.12277  5.07705
  8  195576     620629  5.14749  5.12277
  9  197457     693223  5.21112  5.14749
 10  199399     828100  5.28523  5.21112
 11  201385     969437  5.31763  5.28523
 12  203984    1078482  5.36784  5.31763
 13  206827    1270354  5.43066  5.36784
 14  209284    1533387  5.51826  5.43066
 15  211357    1911859  5.57177  5.51826
 16  213342    2250000  5.61386  5.57177
 17  215465    2684027  5.65983  5.61386
 18  217583    3331720  5.72424  5.65983
 19  219760    4124555  5.80182  5.72424
 20  222095    5265648  5.87858  5.80182
 21  224567    6570507  5.94490  5.87858
 22  227225    7781310  6.01249  5.94490
 23  229466    9786887  6.07925  6.01249
 24  231664   10595025  6.16289  6.07925
 25  233792   12508247  6.25652  6.16289
 26  235825   15470062  6.31282  6.25652
 27  237924   17810932  6.42940  6.31282
 28  240133   19916584  6.58576  6.42940
 29  242289   22462860  6.62034  6.58576
 30  244499   26048774  6.66785  6.62034
 31  246819   30078643  6.67570  6.66785
 32  249623   33675970  6.71502  6.67570
 33  252981   35950817  6.79894  6.71502
 34  256514   40166441  6.93225  6.79894
 35  259919   44320975  7.02971  6.93225
 36  263126   50016013  7.04813  7.02971
 37  266278   54725965  7.02767  7.04813
 38  269394   61103926  6.98601  7.02767
 39  272647   68961398  6.97803  6.98601
 40  275854   76510009  6.99933  6.97803
 41  279040   85903239  7.02376  6.99933
 42  282217   96373489  6.99182  7.02376
 43  285226  102576384  7.07496  6.99182
 44  288126  109612524  7.10620  7.07496
 45  290796  120139137  7.17434  7.10620
 46  293638  136560259  7.22635  7.17434
 47  296507  154601869  7.22490  7.22635
 48  299398  174100108  7.21957  7.22490

I followed the course suggested by the textbook to find what variables were actually important in predicting the money supply.

Results for: M1PrRGDP.MTW

MTB > Regress c2 5 c4 c6 c7 c10 c12;
SUBC> Constant;
SUBC> VIF;
SUBC> Brief 2.

Regression 1

Regression Analysis: M1 versus PrRt, GDP, Dummy, year, Pop

The regression equation is
M1 = 2874 - 19.1 PrRt + 0.0714 GDP - 115 Dummy + 46.2 year - 0.0149 Pop

Predictor       Coef   SE Coef      T      P      VIF
Constant        2874      1232   2.33  0.025
PrRt         -19.116     3.941  -4.85  0.000    2.241
GDP          0.07138   0.01762   4.05  0.000   62.461
Dummy        -114.81     48.62  -2.36  0.023    8.260
year           46.23     15.57   2.97  0.005  668.523
Pop        -0.014888  0.007176  -2.07  0.044  917.418

S = 57.7863   R-Sq = 98.4%   R-Sq(adj) = 98.2%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       5  8498077  1699615  508.98  0.000
Residual Error  42   140249     3339
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454
year     1    80187
Pop      1    14371

Unusual Observations
Obs  PrRt       M1     Fit  SE Fit  Residual  St Resid
 23  20.0   436.70  361.08   37.33     75.62      1.71 X
 35   6.0  1129.70  982.60   18.35    147.10      2.68R
 36   7.3  1150.70  986.80   14.01    163.90      2.92R
 37   9.0  1127.40  975.89   11.81    151.51      2.68R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

So the regression above was my first attempt. There are several questions that can be asked at this point.

1) Why does this regression look awfully good as far as significance and the amount of the variation in the Y variable that is explained by the equation? (3)

2) There are only two coefficients here whose sign you can predict in advance. What are they, what did you predict and why and were you right? (2)

3) What does the Analysis of Variance tell us? What hypothesis did it cause you to reject? (1)

MTB > Regress c2 4 c4 c6 c7 c10;
SUBC> Constant;
SUBC> VIF;
SUBC> Brief 2.

Regression 2

Regression Analysis: M1 versus PrRt, GDP, Dummy, year

The regression equation is
M1 = 321 - 20.7 PrRt + 0.0415 GDP - 174 Dummy + 14.5 year

Predictor     Coef  SE Coef      T      P     VIF
Constant    321.24    66.06   4.86  0.000
PrRt       -20.668    4.016  -5.15  0.000   2.160
GDP        0.04152  0.01055   3.94  0.000  20.791
Dummy      -173.71    40.96  -4.24  0.000   5.444
year        14.530    3.077   4.72  0.000  24.254

S = 59.9651   R-Sq = 98.2%   R-Sq(adj) = 98.0%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       4  8483706  2120927  589.83  0.000
Residual Error  43   154620     3596
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454
year     1    80187

Unusual Observations
Obs  PrRt       M1     Fit  SE Fit  Residual  St Resid
 23  20.0   436.70  371.34   38.39     65.36      1.42 X
 35   6.0  1129.70  982.21   19.04    147.49      2.59R
 36   7.3  1150.70  988.13   14.53    162.57      2.79R
 37   9.0  1127.40  980.00   12.08    147.40      2.51R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

MTB > Regress c2 3 c4 c6 c7;
SUBC> Constant;
SUBC> VIF;
SUBC> Brief 2.
Regression 3

Regression Analysis: M1 versus PrRt, GDP, Dummy

The regression equation is
M1 = 451 - 14.3 PrRt + 0.0865 GDP - 240 Dummy

Predictor      Coef   SE Coef      T      P    VIF
Constant     450.99     73.19   6.16  0.000
PrRt        -14.269     4.605  -3.10  0.003  1.914
GDP        0.086456  0.005548  15.58  0.000  3.875
Dummy       -239.76     46.90  -5.11  0.000  4.809

S = 73.0515   R-Sq = 97.3%   R-Sq(adj) = 97.1%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       3  8403519  2801173  524.91  0.000
Residual Error  44   234807     5337
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454

Unusual Observations
Obs  PrRt      M1    Fit  SE Fit  Residual  St Resid
 23  20.0   436.7  435.7    43.7       1.0      0.02 X
 35   6.0  1129.7  941.0    20.6     188.7      2.69R
 36   7.3  1150.7  959.0    16.0     191.7      2.69R
 37   9.0  1127.4  962.1    14.0     165.3      2.30R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

4) What did I do to get from Regression 1 to Regression 3 and why? (2)

5) Why was I now ready to quit dropping variables and do a ‘best subsets’ regression? (1) [9]

6) What would the money supply be that would be predicted for 1970 assuming that the numbers given for 1970 are correct? By what percent is it off the actual value? (2)

7) Can you make this into a rough prediction interval? Does this include the actual value for 1970? (2) [13]

MTB > BReg c2 c4 c6 c7;
SUBC> NVars 1 3;
SUBC> Best 2;
SUBC> Constant.

Regression 4

Best Subsets Regression: M1 versus PrRt, GDP, Dummy

Response is M1

Vars  R-Sq  R-Sq(adj)  Mallows Cp       S  PrRt  GDP  Dummy
   1  95.6       95.6        26.5  90.432         X
   1  67.8       67.1       477.7  246.02              X
   2  96.7       96.5        11.6  79.727         X    X
   2  95.7       95.5        28.1  91.197   X     X
   3  97.3       97.1         4.0  73.051   X     X    X

8) What is Regression 4 telling me to do? Why can you say that? (2)

MTB > Regress c2 3 c4 c6 c7;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.

Regression 5
Regression Analysis: M1 versus PrRt, GDP, Dummy

The regression equation is
M1 = 451 - 14.3 PrRt + 0.0865 GDP - 240 Dummy

Predictor      Coef   SE Coef      T      P    VIF
Constant     450.99     73.19   6.16  0.000
PrRt        -14.269     4.605  -3.10  0.003  1.914
GDP        0.086456  0.005548  15.58  0.000  3.875
Dummy       -239.76     46.90  -5.11  0.000  4.809

S = 73.0515   R-Sq = 97.3%   R-Sq(adj) = 97.1%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       3  8403519  2801173  524.91  0.000
Residual Error  44   234807     5337
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454

Unusual Observations
Obs  PrRt      M1    Fit  SE Fit  Residual  St Resid
 23  20.0   436.7  435.7    43.7       1.0      0.02 X
 35   6.0  1129.7  941.0    20.6     188.7      2.69R
 36   7.3  1150.7  959.0    16.0     191.7      2.69R
 37   9.0  1127.4  962.1    14.0     165.3      2.30R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.445619

[Figure: Residual Plots for M1]

9) Regression 5 is just a repeat of Regression 3, but now I am doing residual analysis. What are the Durbin-Watson statistic and the plot of residuals vs. order telling me is present? What 2 conditions for regression seem to be being violated? (3) [18]

MTB > Regress c2 4 c4 c6 c7 c13;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.
Regression 6

Regression Analysis: M1 versus PrRt, GDP, Dummy, GDPsq

The regression equation is
M1 = 131 - 13.1 PrRt + 0.187 GDP - 26.3 Dummy - 0.000007 GDPsq

Predictor         Coef     SE Coef      T      P     VIF
Constant        131.36       64.18   2.05  0.047
PrRt           -13.142       3.050  -4.31  0.000   1.919
GDP            0.18659     0.01370  13.62  0.000  53.994
Dummy           -26.33       41.88  -0.63  0.533   8.764
GDPsq      -0.00000671  0.00000088  -7.59  0.000  33.120

S = 48.3231   R-Sq = 98.8%   R-Sq(adj) = 98.7%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       4  8537916  2134479  914.07  0.000
Residual Error  43   100410     2335
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454
GDPsq    1   134396

Unusual Observations
Obs  PrRt       M1      Fit  SE Fit  Residual  St Resid
 23  20.0   436.70   386.21   29.65     50.49      1.32 X
 35   6.0  1129.70   997.46   15.53    132.24      2.89R
 36   7.3  1150.70  1020.24   13.32    130.46      2.81R
 37   9.0  1127.40  1026.39   12.53    101.01      2.16R
 42   9.5  1087.70  1191.94   14.93   -104.24     -2.27R
 48   8.0  1365.90  1320.38   30.91     45.52      1.23 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.551845

[Figure: Residual Plots for M1]

10) I now felt free to add the square of GDP as a new independent variable. What happened to the VIFs? Do I care? Why? (2)

11) What did adding the square of GDP do to the significance of my coefficients and the fraction of the variation of Y that is explained by the equation? (2) [22]

MTB > let c14 = loge (c2)
MTB > Regress c14 4 c4 c6 c7 c13;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.
Regression 7

Regression Analysis: log M1 versus PrRt, GDP, Dummy, GDPsq

The regression equation is
log M1 = 4.79 + 0.00846 PrRt + 0.000453 GDP + 0.0289 Dummy - 0.000000 GDPsq

Predictor         Coef     SE Coef       T      P     VIF
Constant        4.7882      0.1358   35.26  0.000
PrRt          0.008461    0.006453    1.31  0.197   1.919
GDP         0.00045309  0.00002899   15.63  0.000  53.994
Dummy          0.02889     0.08862    0.33  0.746   8.764
GDPsq      -0.00000002  0.00000000  -11.66  0.000  33.120

S = 0.102246   R-Sq = 98.5%   R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       4  29.3981  7.3495  703.01  0.000
Residual Error  43   0.4495  0.0105
Total           47  29.8476

Source  DF   Seq SS
PrRt     1   1.2680
GDP      1  25.6375
Dummy    1   1.0725
GDPsq    1   1.4202

Unusual Observations
Obs  PrRt  log M1     Fit  SE Fit  Residual  St Resid
 23  20.0  6.0792  6.1618  0.0627   -0.0826     -1.02 X
 42   9.5  6.9918  7.2158  0.0316   -0.2239     -2.30R
 48   8.0  7.2196  7.0393  0.0654    0.1803      2.29RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.306367

[Figure: Residual Plots for log M1]

12) I just replaced the money supply by its logarithm. The residual analysis tells me this was a sort of good idea. What does that mean? (1) [23]

13) What is really weird about these coefficients? Which one has the wrong sign? (1)

MTB > Regress c14 3 c4 c6 c13;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.

Regression 8
Regression Analysis: log M1 versus PrRt, GDP, GDPsq

The regression equation is
log M1 = 4.83 + 0.00732 PrRt + 0.000445 GDP - 0.000000 GDPsq

Predictor         Coef     SE Coef       T      P     VIF
Constant       4.83016     0.04310  112.06  0.000
PrRt          0.007316    0.005359    1.37  0.179   1.351
GDP         0.00044536  0.00001650   26.99  0.000  17.854
GDPsq      -0.00000002  0.00000000  -15.60  0.000  18.176

S = 0.101203   R-Sq = 98.5%   R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       3  29.3970  9.7990  956.75  0.000
Residual Error  44   0.4506  0.0102
Total           47  29.8476

Source  DF   Seq SS
PrRt     1   1.2680
GDP      1  25.6375
GDPsq    1   2.4915

Unusual Observations
Obs  PrRt  log M1     Fit  SE Fit  Residual  St Resid
 23  20.0  6.0792  6.1606  0.0620   -0.0814     -1.02 X
 42   9.5  6.9918  7.2104  0.0267   -0.2186     -2.24R
 48   8.0  7.2196  7.0413  0.0644    0.1783      2.28RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.289829

[Figure: Residual Plots for log M1]

MTB > Regress c14 2 c6 c13;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.

Regression 9

Regression Analysis: log M1 versus GDP, GDPsq

The regression equation is
log M1 = 4.87 + 0.000457 GDP - 0.000000 GDPsq

Predictor         Coef     SE Coef       T      P     VIF
Constant       4.87027     0.03184  152.96  0.000
GDP         0.00045654  0.00001446   31.58  0.000  13.455
GDPsq      -0.00000002  0.00000000  -18.76  0.000  13.455

S = 0.102169   R-Sq = 98.4%   R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF      SS      MS        F      P
Regression       2  29.378  14.689  1407.18  0.000
Residual Error  45   0.470   0.010
Total           47  29.848

Source  DF  Seq SS
GDP      1  25.705
GDPsq    1   3.673

Unusual Observations
Obs    GDP  log M1     Fit  SE Fit  Residual  St Resid
 42   9817  6.9918  7.1988  0.0256   -0.2070     -2.09R
 47  12434  7.2249  7.0925  0.0478    0.1324      1.47 X
 48  13195  7.2196  7.0041  0.0590    0.2154      2.58RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.208342

[Figure: Residual Plots for log M1]

MTB > Regress c14 3 c6 c13 c8;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.

Regression 10

Regression Analysis: log M1 versus GDP, GDPsq, GDPPr

The regression equation is
log M1 = 4.87 + 0.000465 GDP - 0.000000 GDPsq - 0.000001 GDPPr

Predictor         Coef     SE Coef       T      P     VIF
Constant       4.86787     0.03240  150.23  0.000
GDP         0.00046548  0.00002208   21.08  0.000  30.892
GDPsq      -0.00000002  0.00000000  -16.38  0.000  17.958
GDPPr      -0.00000070  0.00000130   -0.54  0.593   5.889

S = 0.102985   R-Sq = 98.4%   R-Sq(adj) = 98.3%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       3  29.3810  9.7937  923.42  0.000
Residual Error  44   0.4667  0.0106
Total           47  29.8476

Source  DF   Seq SS
GDP      1  25.7052
GDPsq    1   3.6727
GDPPr    1   0.0031

Unusual Observations
Obs    GDP  log M1     Fit  SE Fit  Residual  St Resid
 42   9817  6.9918  7.1826  0.0396   -0.1908     -2.01R
 48  13195  7.2196  6.9803  0.0741    0.2393      3.35RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.196041

Residual Plots for log M1: Not shown.

14) What has happened to significance and the fraction of the variation in the dependent variable explained by the regression in Regressions 8, 9 and 10? In terms of significance etc., which of these 3 is the ‘best’ regression? Why would the Chairman of the FRB be very annoyed? (3) [27]

MTB > Regress c14 4 c6 c13 c8 c15;
SUBC> GFourpack;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 2.

Regression 11

Regression Analysis: log M1 versus GDP, GDPsq, GDPPr, logM1l

The regression equation is
log M1 = - 0.174 + 0.000001 GDP - 0.000000 GDPsq - 0.000001 GDPPr + 1.04 logM1l

Predictor         Coef     SE Coef      T      P      VIF
Constant       -0.1738      0.2820  -0.62  0.541
GDP         0.00000085  0.00002708   0.03  0.975  383.407
GDPsq      -0.00000000  0.00000000  -0.36  0.723  136.981
GDPPr      -0.00000109  0.00000045  -2.39  0.021    5.902
logM1l         1.04474     0.05838  17.89  0.000   80.236

S = 0.0358443   R-Sq = 99.8%   R-Sq(adj) = 99.8%

Analysis of Variance
Source          DF       SS      MS        F      P
Regression       4  29.7924  7.4481  5797.02  0.000
Residual Error  43   0.0552  0.0013
Total           47  29.8476

Source  DF   Seq SS
GDP      1  25.7052
GDPsq    1   3.6727
GDPPr    1   0.0031
logM1l   1   0.4114

Unusual Observations
Obs    GDP   log M1      Fit   SE Fit  Residual  St Resid
 28   4463  6.58576  6.49641  0.00824   0.08935      2.56R
 37   7398  7.02767  7.09767  0.00984  -0.07000     -2.03R
 38   7817  6.98601  7.07589  0.00849  -0.08988     -2.58R
 48  13195  7.21957  7.18793  0.02830   0.03164      1.44 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.17315

Residual Plots for log M1: Not displayed.

15) So what problem did this fix? Incidentally, what I added to the independent variables (logM1l) was the logarithm of the money supply of the previous period. (1) [28]

II. Do at least 4 of the following 8 Problems (at least 12 each) (or do sections adding to at least 50 points). Anything extra you do helps, and grades wrap around. It is especially important to do more if you have skipped much of parts I or II. Show your work! State $H_0$ and $H_1$ where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests. That is, explain your hypotheses and what values from what table were used to test them. Clearly label what section of each problem you are doing! The entire test has about 160 points, but 70 is considered a perfect score.
Don’t waste our time by telling me that two means, proportions, variances or medians don’t look the same to you. You need statistical tests! There are some blank pages below. Put your name on as many loose pages as possible! Mark sections of your answer clearly.

1) Multiple choice.

a) If I want to test to see if the mean of $x_1$ is larger than the mean of $x_2$, my null hypothesis is: (Note: $D = \mu_1 - \mu_2$) Only check one answer! (2)
i) $\mu_1$ __ $\mu_2$ and $D$ __ 0
ii) $\mu_1$ __ $\mu_2$ and $D$ __ 0
iii) $\mu_1$ __ $\mu_2$ and $D$ __ 0
iv) $\mu_1$ __ $\mu_2$ and $D$ __ 0
v) $\mu_1$ __ $\mu_2$ and $D$ __ 0
vi) $\mu_1$ __ $\mu_2$ and $D$ __ 0
vii) $\mu_1$ __ $\mu_2$ and $D$ __ 0
viii) $\mu_1$ __ $\mu_2$ and $D$ __ 0

b) Compared to multiple regression, simple regression is different in having only one
i) Observation
ii) Parameter
iii) Dependent variable
iv) Independent variable
v) Y-intercept
vi) All of the above

c) For the following quantities, mark their lines with yes (Y) or no (N) as to whether they must be positive.
___ $R^2$ adjusted for degrees of freedom
___ The correlation $r_{x_1 x_2}$ between two independent variables $x_1$ and $x_2$
___ $S_{xy} = \sum xy - n\bar{x}\bar{y}$
___ The coefficient $b_0$ in a multiple regression.

d) Assume that we wish to test the hypothesis that a mean is greater than 3 and we compute the ratio $t = \frac{\bar{x} - 3}{s_{\bar{x}}}$, where our sample statistics are computed from a sample of 29. If $\alpha = .05$, we reject the null hypothesis if
i) t is above 1.645 or below -1.645
ii) t is above 1.960 or below -1.960
iii) t is below -1.645
iv) t is below -1.960
v) t is above 1.645
vi) t is above 1.960
vii) None of the above. (Fill in a more appropriate answer!)

e) Consumers are asked to take the Pepsi Challenge. They were asked which cola they preferred and the number that preferred Pepsi was recorded. Sample 1 was males and sample 2 was females. The following was run on Minitab.

MTB > PTwo 109 46 52 13;
SUBC> Pooled.
Test and CI for Two Proportions

Sample   X    N  Sample p
1       46  109  0.422018
2       13   52  0.250000

Difference = p (1) - p (2)
Estimate for difference: 0.172018
95% CI for difference: (0.0221925, 0.321844)
Test for difference = 0 (vs not = 0): Z = 2.12  P-Value = 0.034

On the basis of the printout above we can say one of the following.
i) At a 99% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi.
ii) At a 95% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi.
iii) At a 99% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi equals the proportion of women that prefer Pepsi.
iv) At a 96% confidence level there is insufficient evidence to indicate that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi.

f) A researcher is comparing room temperatures preferred by random samples of 135 adults and 80 children. The Minitab output follows.

MTB > TwoT 135 77.5 4.5 80 76.5 2.5;
SUBC> Alternative 1.

Two-Sample T-Test and CI

Sample    N   Mean  StDev  SE Mean
1       135  77.50   4.50     0.39
2        80  76.50   2.50     0.28

Difference = mu (1) - mu (2)
Estimate for difference: 1.000
95% lower bound for difference: 0.211
T-Test of difference = 0 (vs >): T-Value = 2.09  P-Value = 0.019  DF = 212

On the basis of what you see here and the way we have stated null-alternate hypothesis pairs in class, we come to the following conclusion if we use a 99% confidence level.
i) Do not reject $H_0: \mu_1$ __ $\mu_2$
ii) Do not reject $H_0: \mu_1$ __ $\mu_2$
iii) Do not reject $H_0: \mu_1$ __ $\mu_2$
iv) Reject $H_0: \mu_1$ __ $\mu_2$
v) Reject $H_0: \mu_1$ __ $\mu_2$
vi) Reject $H_0: \mu_1$ __ $\mu_2$
vii) None of the above (Fill in a more appropriate answer!)

2) The data below represent the sales of Friendly Autos for 7 randomly selected months.
They believe that the number of cars sold depends on the average price for that month (in $ thousands), the number of advertising spots that appeared on the local TV station and whether other types of advertising were used in that month (a dummy variable that is 1 if other types of advertising were used in a given month).

Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

Sum of Sold = 81, Sum of Price = 198.4, Sum of Adv = 97, Sum of Sold squared = 979, Sum of Price squared = 5624.44, Sum of Adv squared = 1517, Sum of Sold * Price = 2297.4, Sum of Sold * Adv = 1206, Sum of Price * Adv = 2751.4.

a) If advertising (Adv) is $x_5$ (it isn’t) and Type is $x_6$, compute $\sum x_5 x_6$. (2)
b) Compute the coefficients of the equation $\hat{Y} = b_0 + b_1 x$ to predict the value of ‘Sold’ on the basis of ‘Price.’ (5)
c) Compute $R^2$ and $R^2$ adjusted for degrees of freedom. (4)
d) Compute the standard error $s_e$. (3)
e) Is the slope of the simple regression significant at the 1% level? Do not answer this question without appropriate calculations! (3) [17]
f) Is the sign of the coefficient of Price what you expected? Why or why not? (1)
g) Predict the average number of cars that will be sold when the price is $30 thousand using the equation you got and make it into an appropriate interval. (4)
h) Do a 99% confidence interval for $\beta_0$, the y-intercept. (3) [24, 36]

3) The data below represent the sales of Friendly Autos for 7 randomly selected months. They believe that the number of cars sold depends on the average price for that month (in $ thousands), the number of advertising spots that appeared on the local TV station and whether other types of advertising were used in that month (a dummy variable that is 1 if other types of advertising were used in a given month).
Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

Sum of Sold = 81, Sum of Price = 198.4, Sum of Adv = 97, Sum of Sold squared = 979, Sum of Price squared = 5624.44, Sum of Adv squared = 1517, Sum of Sold * Price = 2297.4, Sum of Sold * Adv = 1206, Sum of Price * Adv = 2751.4.

a) Do a multiple regression of ‘Sold’ against ‘Price’ and ‘Advertising.’ Attempts to recycle $b_1$ from the previous page or to compute $b_2$ by using a simple regression formula won’t work and won’t get any credit. (12)
b) Compute $R^2$ and $R^2$ adjusted for degrees of freedom. (3)
c) i) Do an ANOVA for the simple regression using either your regression sum of squares or $R^2$. (2)
ii) Do a similar ANOVA for the multiple regression. (2)
iii) Combine the two ANOVAs to do an F test to see if the addition of ‘Adv’ was worthwhile. (2) [21]
d) Predict the average number of cars that will be sold when the price is $30 thousand and there are 15 spots using the equation you got and make it into an appropriate interval. (3) [24, 60]

4) The data below represent the sales of Friendly Autos for 7 randomly selected months. They believe that the number of cars sold depends on the average price for that month (in $ thousands), the number of advertising spots that appeared on the local TV station and whether other types of advertising were used in that month (a dummy variable that is 1 if other types of advertising were used in a given month).

Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

The Minitab output below gives the full regression of ‘Sold’ against all three independent variables.

Regression Analysis: Sold versus Adv, Price, Type

The regression equation is
Sold = 8.46 + 0.487 Adv - 0.153 Price + 0.982 Type

Predictor     Coef  SE Coef      T      P
Constant     8.457    6.990   1.21  0.313
Adv        0.48699  0.01696  28.72  0.000
Price      -0.1530  …………   …………  0.586
Type        0.9815   0.2297   4.27  0.024

S = 0.218501   R-Sq = 99.7%   R-Sq(adj) = 99.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  41.571  13.857  290.24  0.000
Residual Error   3   0.143   0.048
Total            6  41.714

Source  DF  Seq SS
Adv      1  40.404
Price    1   0.295
Type     1   0.872

a) Using the material in this output, find the value of $R^2$ for a regression against ‘Adv’ alone. (2)
b) Look at the line that represents the coefficient of ‘Price.’ What about the coefficient makes me happy? What about the coefficient makes me sad? (2)
c) Find the partial correlation of ‘Type’ with ‘Sold.’ (2)
d) Since you now have enough information to do it, use an F test to see whether the addition of the two advertising independent variables as a pair was worthwhile. (4) [10]

Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

Sum of Sold = 81, Sum of Price = 198.4, Sum of Adv = 97, Sum of Sold squared = 979, Sum of Price squared = 5624.44, Sum of Adv squared = 1517, Sum of Sold * Price = 2297.4, Sum of Sold * Adv = 1206, Sum of Price * Adv = 2751.4.

e) Compute the correlation between ‘Adv’ and ‘Price’ and test it for significance. Try to use the spare parts that you already have. (4) [14]
f) Test the same correlation to see if it is 0.2. (4) [18, 78]
g) Don’t forget to hand in your last computer problem. Check here if you did. __________________. (2 to 7) [78+]

5) The manager of a computer network has the following data on the 200 service interruptions that have occurred over the last 100 days.
Service interruptions per day (x) and number of days observed (O):

x        0   1   2   3   4   5   6   7  Total
O        2  51  18  12  11   4   1   1    100

Number of heads in 5 flips (x) and number of times observed (O):

x        0   1   2   3   4   5
O        3  16  30  29  18   4

a) Test to see if these follow a Poisson distribution. (6)
b) Use another method to test whether this has a Poisson distribution with a parameter of 1.8. (5)
c) A coin is to be tested to see if it is fair. In order to test it the coin is given 5 flips 100 times and the number of heads in 5 flips is recorded in the second table above. This means that there are a total of 500 flips and the coin has come up heads 255 times. Construct a 99% confidence interval for the proportion of times it comes up heads. Test the hypothesis that the proportion is 50% using this interval. (4)
d) The distribution shown here should be a binomial distribution with $n = 5$ and $p = .5$. A more powerful test of the fairness of the coin should be to use probabilities from your cumulative binomial table to check whether this distribution is correct. (4) [19, 97]
e) Assume that a coin is flipped 20 times and comes up heads half the time. If the sequence of heads and tails is HHHTTTHHHTTTHHHTTTTH, can we say that the sequence is random? (This is not a yes or no question; I want a statistical test for randomness!) (2)
f) Now assume that there are 5 times as many flips and 5 times as many runs and heads half the time. Can we say that the sequence is random now? (3) [24, 102]

6) Do the following. Use a 1% significance level in this problem!

a) (Multiple choice) I wish to test to see if a distribution is Normal, but I must first use my data to figure out the mean and standard deviation. I have 100 data points divided into 0 to under 20, 20 to under 40, 40 to under 60, 60 to under 80 and 80 to under 100. Assume that my expected frequency is 5 or larger for each class. I could use
(i) A chi-squared test with 4 degrees of freedom or a Kolmogorov-Smirnov test.
(ii) A chi-squared test with 2 degrees of freedom or a Kolmogorov-Smirnov test.
(iii) A chi-squared test with 4 degrees of freedom or a Lilliefors test.
(iv) A chi-squared test with 2 degrees of freedom or a Lilliefors test.
(v) Only a Lilliefors test.
(vi) Only a Kolmogorov-Smirnov test.
(vii) Only a chi-squared test. (2)

b) (Bassett et al) An industrial process is run at 4 different temperatures on four different days. A random sample of 3 units is taken and scored. The results are as follows. Do the scores differ according to temperature?

100C degrees  120C degrees  140C degrees  160C degrees
          41            54            50            38
          44            56            52            36
          48            53            48            41

Minitab has computed the following. Sum of 100C = 133, Sum of 120C = 163, Sum of 140C = 150, Sum of 160C = 115, Sum of squares of 100C = 5921, Sum of squares of 120C = 8861, Sum of squares of 140C = 7508, Sum of squares of 160C = 4421, Bartlett's Test - Test statistic = 1.22, p-value = 0.748 and Levene's Test - Test statistic = 0.43, p-value = 0.736. Assume that the scores are not considered to come from the Normal distribution, state your null hypothesis and test it. (5)
c) Assume that the scores are considered to come from the Normal distribution, state your null hypothesis and test it. (6)
d) Why were the Bartlett and Levene tests run? Which of the two is correct here if the underlying distribution is Normal? What do they tell us? (2) [15]
e) Ignore everything that has gone before. Assume that the Normal distribution applies and test the hypothesis that the mean of the 120C population is larger than the mean of the 100C population. Assume that the underlying distributions are Normal and have equal variances (4) or assume that the underlying distributions are Normal and do not necessarily have equal variances. (6) Do not do both! [19, 116]

7) The following are tests of proportions. (Bassett et al). You must do legitimate tests at the 10% significance level.

a) Is there any association between forecasted and observed rainfall? 141 forecasts are considered.
Observed Rainfall  No Rain Forecasted  Light Rain Forecasted  Heavy Rain Forecasted
None                               34                     24                     17
Light Rain                         21                      4                      3
Heavy Rain                         23                      9                     38

State your null and alternative hypotheses and test it. (7)
b) Are there significant differences in the proportions of female insects in 3 different locations? In location 1, 44% of 100 bugs are female. In location 2, 43% of 200 bugs are female. In location 3, 55% of 200 bugs are female. First test to see if there is a significant difference between the proportions in locations 1 and 2. (4)
c) In b, test whether proportions of females are independent of location using all three proportions. (5) [16, 132]

8) The following are odds and ends that don’t fit anywhere else. We are selling our production in an imperfect market. x1 is the number of units produced and x2 is our revenue. r1 and r2 are the ranks of the items in x1 and x2. $\alpha = .05$

Row   x1   x2  r1  r2
  1  330  221   7   7
  2  263  194   5   5
  3  428  245   9  10
  4  584  243  10   8
  5  423  244   8   9
  6  219  171   4   4
  7  308  213   6   6
  8  123  108   1   1
  9  173  143   3   3
 10  140  120   2   2

Minitab has computed the following: Sum of x1 = 2991, Sum of x2 = 1902, Sum of squares of x1 = 1088721, Sum of squares of x2 = 386210 and Sum of x1 x2 = 631812.

a) Test x1 to see if its median is 200. Do not use the sign test or compute any medians. (4)
b) Assuming that x1 and x2 are both random samples from a nonnormal distribution, test to see if they have similar medians. (4)
c) Compute the correlation between x1 and x2 and the rank correlation between them. Why is the rank correlation higher? (6)
d) Test the rank correlation for significance. (2) [16]
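
The arithmetic behind part c of Problem 8 can be sketched in a few lines of Python. This is only an illustration, not the official solution. It assumes the usual computational formulas, $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ for the ordinary correlation and $r_s = 1 - 6\sum d^2 / (n(n^2-1))$ for the rank correlation, and the rank formula only holds when there are no ties, which is the case for this data. The function names `pearson`, `ranks` and `spearman` are made up for the example.

```python
import math

# Data copied from the Problem 8 table.
x1 = [330, 263, 428, 584, 423, 219, 308, 123, 173, 140]
x2 = [221, 194, 245, 243, 244, 171, 213, 108, 143, 120]


def pearson(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy), built from raw sums."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) to n. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


r = pearson(x1, x2)
r_s = spearman(x1, x2)
print(f"r = {r:.4f}, r_s = {r_s:.4f}")
```

The `ranks` helper should reproduce the r1 and r2 columns of the table, which is a quick check that the data were entered correctly. Testing the rank correlation for significance in part d still requires the critical value from the rank correlation table, which this sketch does not attempt to supply.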