252x0343 12/06/03
ECO252 QBA2
FINAL EXAM
DEC 11, 2003
Name
Hour of Class Registered _______

I. (25+ points) Do all the following. Note that answers without reasons receive no credit. Most answers require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value.

The fourth computer problem involved the regression of the Y variable below against some, but not all, of the X variables.

Column  Variables in Data Set
C2      X2   Type: 1 = Private, 0 = Public
C3      X1   First Quartile SAT
C4      X5   3rd Quartile SAT
C5      X4   Room and Board Cost
C6      Y    Annual Total Cost
C7      X6   Average Indebtedness at Graduation
C8      X3   Interaction = X1 * X2

You were directed to hand in the computer output (4 points) and your answer to problem 14.37 or the equivalent problem in the 8th edition (up to 7 points). I ran the same problem you did, but went on to add X4 and X5 to the input. The output appears on pages 1-7. My first two regressions were stepwise regressions. The second stepwise regression is set up to force the dummy variable designating ‘type of university’ into the equation. This means that our first equation in regression 2 is essentially $\hat{Y} = b_0 + b_2 X_2$.

a) According to the first regression in regression 2, what are the mean annual total costs for public and private universities, and how does the printout show us that they are significantly different? (2)
b) Regressions 3 and 4 are the regressions you supposedly did. According to regression 4, for a public university the constant in the regression equation is $b_0 = 1013$ and the slope relative to the first quartile SAT is $b_1 = 11.339$. For a private university, the equation relating annual total costs to the first quartile SAT effectively has both a different intercept and a different slope; what is the equation? Are the intercepts and slopes for public and private universities in fact significantly different? What tells us this? (3)
(Extra credit: at what SAT level do public and private universities have the same cost? (2))
c) Regression 6 should be the best of all the regressions, because it has the most independent variables and the highest R-squared, but it isn’t.
   (i) Look at the coefficients of the independent variables and ignore their significance. One of those coefficients is incredibly unreasonable; which one is it? (1)
   (ii) Which coefficients are significant at the 1% level, and why? (2) What about the 10% level? (1) Compare the adjusted R-squared values with those of the other regressions; what do they tell us? (1) Look at the VIFs; what do they imply? (2)
d) Do an F test to tell whether adding X3, X4 and X5 as a package to equation 3 with only X1 and X2 was useful. What is your conclusion? (4)
e) I didn’t follow directions when I did a prediction interval for equation 3, so it should disagree with yours. I added some guesses as to (median?) values for X3, X4 and X5. What does the printout say I used? What would you expect to happen to the size of the prediction interval if our addition of new variables gives us a better estimate of Y? Did it happen? Cite numbers. (3)
f) Use the method suggested in the text, using the standard error $s_e$, to compute a prediction interval for the same values of the independent variables and equation 3. How accurate is it? (3)
32

————— 12/5/2003 7:15:10 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Colleges2002.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Colleges2002.MTW
# Worksheet was saved on Fri Dec 05 2003

Results for: Colleges2002.MTW

MTB > Stepwise c6 c3 c2 c8 c5 c4;
SUBC> AEnter 0.15;
SUBC> ARemove 0.15;
SUBC> Constant.

1) Stepwise Regression: Annual Total versus First quarti, Type of Scho, ...
Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Annual T on 5 predictors, with N = 80

Step             1       2
Constant     12198   -1021

inter        10.42    8.35
T-Value      17.40   11.97
P-Value      0.000   0.000

First qu              13.3
T-Value               4.62
P-Value              0.000

S             3058    2724
R-Sq         79.51   83.95
R-Sq(adj)    79.25   83.54
C-p           20.2     1.3

More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no

MTB > Stepwise c6 c3 c2 c8 c5 c4;
SUBC> Force c2;
SUBC> AEnter 0.15;
SUBC> ARemove 0.15;
SUBC> Constant.

2) Stepwise Regression: Annual Total versus First quarti, Type of Scho, ...

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Annual T on 5 predictors, with N = 80

Step             1       2       3
Constant     12478   -7264    1013

Type of      11646    8732   -3016
T-Value      13.97   11.57   -0.48
P-Value      0.000   0.000   0.630

First qu              19.5    11.3
T-Value               7.37    2.25
P-Value              0.000   0.027

inter                         11.2
T-Value                       1.90
P-Value                      0.061

S             3610    2783    2737
R-Sq         71.44   83.24   84.00
R-Sq(adj)    71.07   82.81   83.37
C-p           58.1     4.7     3.1

More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no

MTB > Name c18 = 'RESI1'
MTB > Regress c6 2 c3 c2;
SUBC> Residuals 'RESI1';
SUBC> GHistogram;
SUBC> GNormalplot;
SUBC> GFits;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> Predict c9 c10;
SUBC> Brief 2.
3) Regression Analysis: Annual Total versus First quarti, Type of Scho

The regression equation is
Annual Total Cost = - 7264 + 19.5 First quartile SAT + 8732 Type of School

Predictor       Coef   SE Coef       T       P    VIF
Constant       -7264      2728   -2.66   0.009
First qu      19.524     2.651    7.37   0.000    1.4
Type of       8732.4     754.7   11.57   0.000    1.4

S = 2783     R-Sq = 83.2%     R-Sq(adj) = 82.8%

Analysis of Variance
Source            DF           SS           MS        F       P
Regression         2   2963313624   1481656812   191.26   0.000
Residual Error    77    596492779      7746659
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
 27       1040      21484   13041      514       8443      3.09R
 56       1010      15722   21188      560      -5466     -2.00R
 61       1320      17526   27240      578      -9714     -3.57R
R denotes an observation with a large standardized residual

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20993      579   ( 19839, 22147)   ( 15332, 26654)

Values of Predictors for New Observations
New Obs   First qu   Type of
1             1000      1.00

Residual Histogram for Annual T
Normplot of Residuals for Annual T
Residuals vs Fits for Annual T

MTB > %Resplots c18 c2;
SUBC> Title "Residuals vs Type".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs Type of Scho

MTB > %Resplots c18 c3;
SUBC> Title "Residuals vs Type".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs First quarti

MTB > Name c19 = 'RESI2'
MTB > Regress c6 3 c3 c2 c8;
SUBC> Residuals 'RESI2';
SUBC> GHistogram;
SUBC> GNormalplot;
SUBC> GFits;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> Predict c9 c10 c11;
SUBC> Brief 2.

4) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = 1013 + 11.3 First quartile SAT - 3016 Type of School + 11.2 inter

Predictor       Coef   SE Coef       T       P     VIF
Constant        1013      5120    0.20   0.844
First qu      11.339     5.039    2.25   0.027     5.2
Type of        -3016      6234   -0.48   0.630    97.2
inter         11.177     5.889    1.90   0.061   120.6

S = 2737     R-Sq = 84.0%     R-Sq(adj) = 83.4%

Analysis of Variance
Source            DF           SS          MS        F       P
Regression         3   2990309581   996769860   133.02   0.000
Residual Error    76    569496823     7493379
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989
inter        1     26995957

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
  3        800       9476   10084     1176       -608     -0.25 X
  9       1250      13986   15186     1303      -1200     -0.50 X
 27       1040      21484   12805      520       8679      3.23R
 61       1320      17526   27718      622     -10192     -3.82R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20513      623   ( 19271, 21755)   ( 14921, 26105)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter
1             1000      1.00    1000

MTB > Name c20 = 'RESI3'
MTB > Regress c6 4 c3 c2 c8 c5;
SUBC> Residuals 'RESI3';
SUBC> Constant;
SUBC> VIF;
SUBC> Predict c9 c10 c11 c12;
SUBC> Brief 2.

5) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = - 13 + 11.4 First quartile SAT - 3053 Type of School + 10.9 inter + 0.165 Room and Board

Predictor       Coef   SE Coef       T       P     VIF
Constant         -13      5483   -0.00   0.998
First qu      11.382     5.064    2.25   0.028     5.2
Type of        -3053      6263   -0.49   0.627    97.3
inter         10.928     5.934    1.84   0.069   121.3
Room and      0.1655    0.3062    0.54   0.591     1.9

S = 2750     R-Sq = 84.1%     R-Sq(adj) = 83.2%

Analysis of Variance
Source            DF           SS          MS       F       P
Regression         4   2992518033   748129508   98.91   0.000
Residual Error    75    567288370     7563845
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989
inter        1     26995957
Room and     1      2208452

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
  9       1250      13986   15174     1309      -1188     -0.49 X
 27       1040      21484   12880      541       8604      3.19R
 61       1320      17526   27621      650     -10095     -3.78R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20071     1030   ( 18020, 22122)   ( 14221, 25922)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter   Room and
1             1000      1.00    1000       5000

MTB > Name c21 = 'RESI4'
MTB > Regress c6 5 c3 c2 c8 c5 c4;
SUBC> Residuals 'RESI4';
SUBC> Constant;
SUBC> VIF;
SUBC> Predict c9 c10 c11 c12 c13;
SUBC> Brief 2.

6) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = 5873 + 26.2 First quartile SAT - 4605 Type of School + 12.2 inter + 0.150 Room and Board - 17.0 Third quartile SAT

Predictor       Coef   SE Coef       T       P     VIF
Constant        5873      8515    0.69   0.493
First qu       26.23     17.18    1.53   0.131    59.2
Type of        -4605      6502   -0.71   0.481   104.5
inter         12.162     6.096    2.00   0.050   127.7
Room and      0.1503    0.3070    0.49   0.626     1.9
Third qu      -17.01     18.81   -0.90   0.369    58.0

S = 2754     R-Sq = 84.2%     R-Sq(adj) = 83.2%

Analysis of Variance
Source            DF           SS          MS       F       P
Regression         5   2998718897   599743779   79.10   0.000
Residual Error    74    561087507     7582264
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989
inter        1     26995957
Room and     1      2208452
Third qu     1      6200863

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
  3        800       9476    9606     1323       -130     -0.05 X
  9       1250      13986   15381     1331      -1395     -0.58 X
 27       1040      21484   13192      642       8292      3.10R
 61       1320      17526   27217      789      -9691     -3.67R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20002     1034   ( 17942, 22061)   ( 14141, 25862)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter   Room and   Third qu
1             1000      1.00    1000       5000       1200

MTB > Name c22 = 'RESI5'
MTB > Regress c6 2 c3 c5 ;
SUBC> Residuals 'RESI5';
SUBC> Constant;
SUBC> VIF;
SUBC> Predict c9 c12 ;
SUBC> Brief 2.
7) Regression Analysis: Annual Total versus First quarti, Room and Boa

The regression equation is
Annual Total Cost = - 24258 + 27.9 First quartile SAT + 1.84 Room and Board

Predictor       Coef   SE Coef       T       P    VIF
Constant      -24258      3686   -6.58   0.000
First qu      27.927     3.532    7.91   0.000    1.2
Room and      1.8439    0.3534    5.22   0.000    1.2

S = 3959     R-Sq = 66.1%     R-Sq(adj) = 65.2%

Analysis of Variance
Source            DF           SS           MS       F       P
Regression         2   2352968982   1176484491   75.06   0.000
Residual Error    77   1206837422     15673213
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Room and     1    426662346

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
 14        920       7210   16752     1006      -9542     -2.49R
 16       1120       9451   18272      593      -8821     -2.25R
 41       1060      25865   14472      849      11393      2.95R
 53        900      17886   18758     1441       -872     -0.24 X
 61       1320      17526   26398      845      -8872     -2.29R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         12889      820   ( 11255, 14522)   (  4838, 20939)

Values of Predictors for New Observations
New Obs   First qu   Room and
1             1000       5000

II. Do at least 4 of the following 6 Problems (at least 13 each) (or do sections adding to at least 50 points; anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests; that is, explain your hypotheses and what values from what table were used to test them.

1. A marketing analyst collects data on the screen size and price of the two models produced by a competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe model (zero for the regular model) and rx1 is the column in which you may rank x1.
Row    price   size  model
           y     x1     x2    x1^2  x2^2      y^2     x1 y  x2 y  x1 x2   rx1
  1   371.69     13      0     169     0   138153     4832            0     1
  2   403.61     19      0     361     0   162901     7669            0     3
  3   484.41     21      0     441     0   234653    10173            0
  4   492.89     17      1     289     1   242941     8379           17
  5   606.25     25      0     625     0   367539    15156            0
  6   634.41     21      1     441     1   402476    13323           21
  7   651.00    210      0   44100     0   423801   136710            0
  8   806.25     25      1     625     1   650039    20156           25
  9  1131.00    290      0   84100     0  1279161   327990            0   9.0
 10  1739.00    370      1  136900     1  3024121   643430          370  10.0
Sum  7320.51   1011      4  268051     4  6925785  1187817          433

a. Fill in the x2 y column. (2)
b. Compute the simple regression of price against size. (6)
c. Compute R squared and R squared adjusted for degrees of freedom. (3)
d. Compute the standard error $s_e$. (3)
e. Compute $s_{b_1}$ and make it into a confidence interval for $\beta_1$. (3)
f. Do a prediction interval for the price of a model with a 19 inch screen. (4)
21

2. A marketing analyst collects data on the screen size and price of the two models produced by a competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe model (zero for the regular model) and rx1 is the column in which you will rank x1.

Row    price   size  model
           y     x1     x2    x1^2  x2^2      y^2     x1 y  x2 y  x1 x2   rx1
  1   371.69     13      0     169     0   138153     4832            0     1
  2   403.61     19      0     361     0   162901     7669            0     3
  3   484.41     21      0     441     0   234653    10173            0
  4   492.89     17      1     289     1   242941     8379           17
  5   606.25     25      0     625     0   367539    15156            0
  6   634.41     21      1     441     1   402476    13323           21
  7   651.00    210      0   44100     0   423801   136710            0
  8   806.25     25      1     625     1   650039    20156           25
  9  1131.00    290      0   84100     0  1279161   327990            0   9.0
 10  1739.00    370      1  136900     1  3024121   643430          370  10.0
Sum  7320.51   1011      4  268051     4  6925785  1187817          433

a. Do a multiple regression of price against size and model. (10)
b. Compute R-squared and R-squared adjusted for degrees of freedom for this regression and compare them with the values for the previous problem. (4)
c. Using either R-squared values or SST, SSR and SSE, do F tests (ANOVA).
First check the usefulness of the simple regression and then the value of ‘model’ as an improvement to the regression. (6)
d. Predict the price of a deluxe model with a 19 inch screen. How much change is there from your last prediction? (2)
22

3. A marketing analyst collects data on the screen size and price of the two models produced by a competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe model (zero for the regular model) and rx1 is the column in which you will rank x1.

Row    price   size  model
           y     x1     x2    x1^2  x2^2      y^2     x1 y  x2 y  x1 x2   rx1
  1   371.69     13      0     169     0   138153     4832            0     1
  2   403.61     19      0     361     0   162901     7669            0     3
  3   484.41     21      0     441     0   234653    10173            0
  4   492.89     17      1     289     1   242941     8379           17
  5   606.25     25      0     625     0   367539    15156            0
  6   634.41     21      1     441     1   402476    13323           21
  7   651.00    210      0   44100     0   423801   136710            0
  8   806.25     25      1     625     1   650039    20156           25
  9  1131.00    290      0   84100     0  1279161   327990            0   9.0
 10  1739.00    370      1  136900     1  3024121   643430          370  10.0
Sum  7320.51   1011      4  268051     4  6925785  1187817          433

a. Compute the correlation between price and size and check to see if it is significant, using the spare parts from problem 1 if you have them. (5)
b. Use the same correlation to test the hypothesis that the correlation is .85. (4)
c. Do ranks for the values of ‘size’ in the rx1 column, compute a rank correlation between price and size, and test it for significance using the rank correlation table if possible. (5)
14

4. Explain the following.
a. Under what circumstances could you use a Chi-squared method to test for Normality but not a Kolmogorov-Smirnov test? (2)
b. Under what circumstances could you use a Lilliefors test to test for Normality but not a Kolmogorov-Smirnov test? (2)
c. Under what circumstances could you use a Kruskal-Wallis test to test whether four distributions are similar but not a one-way ANOVA? (2)
d. What 2 tests can be used to test for the equality of two medians? Which is more powerful? (2)
f.
A random sample of 21 Porsche drivers were asked how many miles they had driven in the last year and a frequency table was constructed of the data.

Miles            Observed frequency
0 – 4000                 2
4000 – 8000              7
8000 – 12000             7
Over 12000               5

Does the data follow a Normal distribution with a mean of 8000 and a standard deviation of 2000? Do not cut the number of groups below what is presented here. Find the appropriate E or cumulative E and do the test. (6)

5. (Ullman) A Latin Square is an extremely effective way of doing a 3 way ANOVA. In this example the data is arranged in 4 rows and 4 columns. There are 3 factors. Factor A is rows (machines), Factor B is columns (operators), and Factor C (materials) is shown by a tag C1, C2, C3 or C4. Each material appears once in each row and each column. These are times to do a job categorized by machines, operators and cutting material. The rules are just the same as in any ANOVA: degrees of freedom add up and sums of squares add up. I am going to set this up as a 2 way ANOVA with one measurement per cell. There is no interaction. We, of course, assume that the parent distribution is Normal.

           B1      B2      B3      B4    Sum   n_i   x̄_i.    SS   x̄_i.²
A1       7 C1    6 C4    5 C3    6 C2     24     4   6.00   146
A2       4 C2    9 C1    1 C4    3 C3     17     4   4.25   107
A3       5 C3    4 C2    6 C1    4 C4     19     4   4.75    93
A4       3 C4    2 C3    1 C2   10 C1     16     4   4.00   114
Sum        19      21      13      23     76    16
n_j         4       4       4       4     16 (n)
x̄_.j     4.75    5.25    3.25    5.75        (x̄)
SS         99     137      63     161
x̄_.j²

Here SS is the sum of the squared observations $\sum x_{ijk}^2$ in that row or column.

You now have a choice.

a) If you are a real wimp, you will pretend that each column is a random sample and compare the means of each operator. (5) The table will look like that below.

Source     SS   DF   MS   F   F.05
Between
Within
Total

b) If you are less wimpy, you will pretend that this is a 2-way ANOVA and your table will look like that below. (8)

Source      SS   DF   MS   F   F.05
Rows A
Columns B
Within
Total

c) If you are very daring, you will try the table below.
(11) To do this you need to know that the means for the 4 materials are 8, 3.75, 3.75 and 3.50 and that the factor C sum of squares is $SSC = 4\sum \bar{x}_{..k}^2 - n\bar{x}^2 = 4\left(8^2 + 3.75^2 + 3.75^2 + 3.50^2\right) - 16(\quad)^2 = ?$. I think that the degrees of freedom should be obvious. Please don’t make the same mistakes you made on the last exam! You have 3 null hypotheses. Tell me what they are and whether you reject them.

Source        SS   DF   MS   F   F.05
Rows A
Columns B
Materials C
Within
Total

d) Assuming that your data is cross classified, compare the means of columns 1 and 4 using a 2-sample method. (3)
e) Assume that this is the equivalent of a 2-way one-measurement per cell ANOVA, but that the underlying distribution is not Normal, and do an appropriate rank test. (5)

6. a. A stock moves up and down as follows. In 36 days it goes up 14 times and down 22 times.

UDDDDUUUDUDDDUUDDDDUDDDUUDDDUDUUDDDU

(i) Test these movements for randomness. (5)
(ii) Take the first half of the series and test it for randomness (and don’t repeat exactly what you did in part (i)). (4)
b. Explain, briefly, why I did not bother with a Durbin-Watson test in the regression that began the exam. (2)
c. Test the hypothesis that the population the D’s and U’s above came from is evenly split between D’s and U’s. (4)
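
The mechanics behind the randomness test in problem 6a(i) can be sketched in code. This is only an illustration, not the exam's official solution: it assumes the Wald-Wolfowitz runs test with a large-sample normal approximation, whereas the course may intend a runs-test table instead. The function name `runs_test` and its returned keys are hypothetical names chosen for this sketch.

```python
import math


def runs_test(series):
    """Runs test for a two-symbol sequence (normal approximation; an assumption here)."""
    symbols = sorted(set(series))
    if len(symbols) != 2:
        raise ValueError("series must contain exactly two distinct symbols")
    n1 = series.count(symbols[0])
    n2 = series.count(symbols[1])
    n = n1 + n2
    # A new run starts wherever a symbol differs from its predecessor.
    runs = 1 + sum(1 for a, b in zip(series, series[1:]) if a != b)
    # Mean and variance of the number of runs under the null of randomness.
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    return {"n1": n1, "n2": n2, "runs": runs, "mean": mean, "var": var, "z": z}


# The 36-day series from problem 6a; symbols sort as D first, then U.
moves = "UDDDDUUUDUDDDUUDDDDUDDDUUDDDUDUUDDDU"
result = runs_test(moves)
print(result)
```

The resulting z would be compared with the two-sided critical value for the chosen significance level. For the half series in part (ii) the counts are small, so a table-based test is probably more appropriate than this approximation.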