MIDTERM 2, STAT 512
20 pt total

1. (4 pt) A multiple regression was run with 65 cases and 7 explanatory variables. Give the degrees of freedom (numerator and denominator) for the F statistic that tests the null hypothesis that the coefficients of the first 3 explanatory variables are all equal to zero.

The numerator df is 3 (the number of coefficients tested) and the denominator df is 65 - 8 = 57 (the number of cases minus the 8 parameters of the full model, intercept included).

2. (8 pt) Refer to the SAS output marked OUTPUT FOR PROBLEM 3. The data are from a study of 78 7th grade students. The goal is to predict GRADE (average school grade on a scale of 0 to 11) from variables which include IQ (score on an I.Q. test) and GENDER (0 = female, 1 = male).

a. (2 pt) Using the output for the simple linear regression, does there appear to be a linear relationship between GRADE and IQ? Give a test statistic with degrees of freedom and p-value to support your answer (you may use other evidence as well).

There does appear to be a linear relationship: the t-test for the slope of iq gives t = 7.14 with p < .0001 (equivalently, F = 51.01 on 1 and 76 df). The number of df for this t-test is 78 - 2 = 76, the error df in the ANOVA table; with this many df you could have used the normal distribution as well.

b. (2 pt) Individual #51 has GRADE = 0.53 and IQ = 103. What value of GRADE is predicted for this individual by the estimated simple linear regression model? Calculate the residual ei for this observation.

Plug the value IQ = 103 into the fitted regression model to obtain -3.557 + 0.10102*103 = 6.85. The residual is the observed value minus the predicted value: 0.53 - 6.85 = -6.32.

c. (2 pt) The variable IQGEN is the product of IQ and GENDER. Examine the output for the model involving these three variables. Write down the estimated regression equation for this model. Also write down the two separate fitted lines for female and male students.

The estimated regression equation is grade = -2.25 - 3.84*gender + 0.094*iq + 0.027*iqgen. The fitted line is -2.25 + 0.094*iq for females (gender = 0) and (-2.25 - 3.84) + (0.094 + 0.027)*iq = -6.09 + 0.121*iq for males (gender = 1).

d. (2 pt) Examine the results of the t-tests for the three regression coefficients as well as the result of the (general linear) F-test labeled “SAMELINE”.
The results of this general linear test were produced with the SAS input line “test gender, iqgen;”. State the null hypotheses tested by each of these four tests and whether that hypothesis is rejected. What apparent conflict do you see between the results of these tests? Explain why such a conflict might arise and suggest one possible action that might be used to eliminate this conflict.

Each t-test tests the null hypothesis that the corresponding coefficient is zero, given that the other two variables are in the model. Out of the three t-tests, only the test for iq produces a significant result (t = 4.66, p < .0001), so the hypothesis that the iq coefficient is zero is rejected: there is a non-negligible linear relationship between iq and grade. The hypotheses that the gender coefficient is zero (p = 0.2097) and that the iqgen coefficient is zero (p = 0.3432) are not rejected. The general linear test tests the null hypothesis that the gender and iqgen coefficients are both zero, that is, that one line fits both genders. It is rejected (F = 3.84 on 2 and 74 df, p = 0.0259), which suggests that iqgen and gender, taken together, do contribute to the explanation of the total variation in grade; thus, one shouldn't drop both of them from the full model at once. The apparent conflict is that neither variable is significant on its own, yet the pair is. This is the result of multicollinearity: gender and iqgen are strongly correlated (r = 0.986 in the CORR output). One possible action is to drop one of the two, for example iqgen, which is what the next problem does.

3. (8 pt) Refer to the SAS output labeled OUTPUT FOR PROBLEM 4. This continues the analysis begun in problem 3 using GRADE, IQ, and GENDER. Now the additional variables AGE (in years) and SC (score on a “self-concept” scale) are included (and IQGEN is removed). You may also use the OUTPUT FOR PROBLEM 3 results for this problem.

a. (3 pt) Examine the results of the model that includes IQ, AGE, GENDER, and SC (the “full” model). Which variable(s), if any, would you consider eliminating from the model? Justify your answer extensively using information such as the results of hypothesis tests, extra sums of squares, and R2 values, as well as any other evidence that may support your argument.

At first sight, the most “suspicious” variable is iq: its Type II SS (47.15) is very different from its Type I SS (136.32). Its t-test, however, is clearly significant (t = 4.70, p < .0001). Age is the only variable that is not significant at the 0.05 level (t = -1.82, p = 0.0723), although its Type I and Type II SS (8.59 and 7.08) are close to each other.
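The reduced model doesn't contain age but contains iq; note that it is quite satisfactory, as all of its variables are significant and R^2 has not changed much (0.5208 versus 0.5417; adjusted R^2 0.5014 versus 0.5166). Moreover, the overall F-test is significant in both models. The conclusion would be that age is the variable to consider removing from the full model; iq should stay, since its t-test is significant in both models.

b. (2 pt) Does multicollinearity appear to be an issue in this analysis? Explain your reasoning, making specific reference to the parameter estimates and the results of hypothesis tests, as well as any other evidence that may support your argument.

Yes, some multicollinearity is present here. The large difference between the Type I and Type II SS for iq (136.32 versus 47.15) is the most obvious sign of it. The iq estimate also shifts as variables are added: 0.101 in the simple regression, 0.084 in the reduced model, and 0.074 in the full model. Another sign is an individual test that is not significant (age, p = 0.0723) while the overall F-test is highly significant. The pairwise correlations among the predictors are only moderate, though (for example, 0.49 between iq and sc and -0.38 between iq and age), so the problem is much milder than it was for gender and iqgen.

c. (2 pt) Which variable do you think is the most important explanatory variable? Do you recommend using this variable alone in the model? Justify your answer.

The most important variable is probably iq: it has the largest correlation with grade (0.634), the largest t value, and the largest Type II SS in the full model. It is probably not a great idea to use it on its own: note that, even in the reduced model, sc and gender are statistically significant in addition to iq, and R^2 rises from 0.4016 with iq alone to 0.5208.

d. (1 pt) What are the dimensions of the design matrix for the full model in this problem?

It is 78 by 5: one row per student, and one column for the intercept plus one for each of IQ, AGE, GENDER, and SC.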
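The arithmetic behind the answers to Problems 1 and 2 can be rechecked with a few lines of code. The following is a minimal sketch in Python, which is not part of the original SAS analysis; the function and constant names are made up for illustration, and the coefficients are copied from the parameter estimates in OUTPUT FOR PROBLEM 3.

```python
# Coefficients copied from the parameter estimates in OUTPUT FOR PROBLEM 3.
SLR_INTERCEPT, SLR_SLOPE = -3.55706, 0.10102  # model: grade on iq
B0, B_IQ, B_GENDER, B_IQGEN = -2.25235, 0.09400, -3.84266, 0.02656


def error_df(n_cases, n_predictors):
    """Error df: number of cases minus number of parameters (slopes plus intercept)."""
    return n_cases - (n_predictors + 1)


def predict_slr(iq):
    """Fitted GRADE from the simple linear regression on IQ."""
    return SLR_INTERCEPT + SLR_SLOPE * iq


def residual(observed, fitted):
    """Residual: observed value minus fitted value."""
    return observed - fitted


def line_for_gender(gender):
    """(intercept, slope) of the fitted GRADE-on-IQ line for gender 0 or 1."""
    return B0 + B_GENDER * gender, B_IQ + B_IQGEN * gender


if __name__ == "__main__":
    print("Problem 1 denominator df:", error_df(65, 7))
    print("Problem 2a t-test df:", error_df(78, 1))
    fitted = predict_slr(103)
    print("Problem 2b fitted and residual:",
          round(fitted, 2), round(residual(0.53, fitted), 2))
    print("Problem 2c female line:", line_for_gender(0))
    print("Problem 2c male line:", line_for_gender(1))
```

Using the unrounded estimates avoids the small discrepancies that appear when the slope is rounded to 0.1 before multiplying by 103.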
OUTPUT FOR PROBLEM 3

The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        136.31881     136.31881     51.01   <.0001
Error              76        203.10809       2.67247
Corrected Total    77        339.42689

Root MSE           1.63477    R-Square   0.4016
Dependent Mean     7.44654    Adj R-Sq   0.3937
Coeff Var         21.95343

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept    1             -3.55706          1.55176     -2.29     0.0247   -6.64766   -0.46645
iq           1              0.10102          0.01414      7.14     <.0001    0.07285    0.12919

The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3        155.42484      51.80828     20.84   <.0001
Error              74        184.00205       2.48651
Corrected Total    77        339.42689

Root MSE           1.57687    R-Square   0.4579
Dependent Mean     7.44654    Adj R-Sq   0.4359
Coeff Var         21.17586

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -2.25235          2.15377     -1.05     0.2991
iq           1              0.09400          0.02017      4.66     <.0001
gender       1             -3.84266          3.03670     -1.27     0.2097
iqgen        1              0.02656          0.02784      0.95     0.3432

Test sameline Results for Dependent Variable grade

Source         DF   Mean Square   F Value   Pr > F
Numerator       2       9.55302      3.84   0.0259
Denominator    74       2.48651

[Plot: grade versus iq with the fitted line grade = -3.5571 + 0.101 iq; N 78, Rsq 0.4016, Adj Rsq 0.3937, RMSE 1.6348; vertical axis 0 to 12, horizontal axis iq 70 to 140]

[Plot: second panel for the same fit, plotted against iq 70 to 140, vertical axis -8 to 4 (apparently the residuals); N 78, Rsq 0.4016, Adj Rsq 0.3937, RMSE 1.6348]

OUTPUT FOR PROBLEM 4

The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4        183.86686      45.96672     21.57   <.0001
Error              73        155.56003       2.13096
Corrected Total    77        339.42689

Root MSE           1.45978    R-Square   0.5417
Dependent Mean     7.44654    Adj R-Sq   0.5166
Coeff Var         19.60348

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|    Type I SS   Type II SS
Intercept    1              3.62511          4.43504      0.82     0.4164   4325.17293      1.42371
iq           1              0.07401          0.01573      4.70     <.0001    136.31881     47.14706
age          1             -0.52028          0.28534     -1.82     0.0723      8.58581      7.08463
gender       1             -0.91623          0.34531     -2.65     0.0098     15.00824     15.00220
sc           1              0.05166          0.01541      3.35     0.0013     23.95401     23.95401

The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3        176.78223      58.92741     26.81   <.0001
Error              74        162.64466       2.19790
Corrected Total    77        339.42689

Root MSE           1.48253    R-Square   0.5208
Dependent Mean     7.44654    Adj R-Sq   0.5014
Coeff Var         19.90901

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -4.05384          1.41211     -2.87     0.0053
iq           1              0.08412          0.01495      5.62     <.0001
sc           1              0.05129          0.01565      3.28     0.0016
gender       1             -0.96852          0.34948     -2.77     0.0071

The CORR Procedure

6 Variables: grade iq age sc iqgen gender

Simple Statistics

Variable    N        Mean     Std Dev         Sum     Minimum     Maximum
grade      78     7.44654     2.09956   580.83000     0.53000    10.76000
iq         78   108.92308    13.17097        8496    72.00000   136.00000
age        78    12.74359     0.63319   994.00000    12.00000    15.00000
sc         78    56.96154    12.41223        4443    20.00000    80.00000
iqgen      78    66.85897    55.44758        5215           0   136.00000
gender     78     0.60256     0.49254    47.00000           0     1.00000

Pearson Correlation Coefficients, N = 78

             grade         iq        age         sc      iqgen     gender
grade      1.00000    0.63373   -0.38927    0.54183   -0.00505   -0.09733
iq         0.63373    1.00000   -0.38236    0.49315    0.30884    0.19142
age       -0.38927   -0.38236    1.00000   -0.17808   -0.04358    0.00214
sc         0.54183    0.49315   -0.17808    1.00000    0.16141    0.09519
iqgen     -0.00505    0.30884   -0.04358    0.16141    1.00000    0.98562
gender    -0.09733    0.19142    0.00214    0.09519    0.98562    1.00000
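
The general linear (extra sum of squares) F statistics can be recomputed from the error lines of the ANOVA tables above. The following is a minimal sketch in Python rather than SAS; the helper name general_linear_f is made up for illustration, and the SSE and df values are copied from the output. It reproduces the SAMELINE test and the test for dropping age from the full model.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F = [(SSE_R - SSE_F) / (df_R - df_F)] / (SSE_F / df_F)."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# SAMELINE test: reduced model is grade on iq,
# full model is grade on iq, gender, and iqgen.
f_sameline = general_linear_f(203.10809, 76, 184.00205, 74)

# Dropping age: reduced model is grade on iq, sc, and gender,
# full model adds age.
f_age = general_linear_f(162.64466, 74, 155.56003, 73)

print(f"SAMELINE: F = {f_sameline:.2f} on 2 and 74 df")
print(f"age: F = {f_age:.2f} on 1 and 73 df, sqrt(F) = {f_age ** 0.5:.2f}")
```

The first value should agree with the F Value in the Test sameline table. For a single coefficient the F statistic is the square of the t statistic, so the square root of the second value should agree with the absolute t value for age in the full model.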