Multiple Linear Regression - Solutions

1. Relationship Between Eighth Grade IQ, Eighth Grade Abstract Reasoning and Ninth Grade Math Score

For a statistics class project, students examined the relationship between x1 = 8th grade IQ, x2 = 8th grade Abstract Reasoning and y = 9th grade math scores for 20 students. The data are displayed below.

Student         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Math Score     33  31  35  38  41  37  37  39  43  40  41  44  40  45  48  45  31  47  43  48
IQ             95 100 100 102 103 105 106 106 106 109 110 110 111 112 112 114 114 115 117 118
Abstract Reas  28  24  29  30  33  32  34  36  38  39  40  43  41  42  46  44  41  47  42  49

Use Minitab on the dataset Finals found in the Datasets folder in ANGEL. Do Stat > Regression > Regression, enter the variable math score in the Response window, and enter IQ and Abstract_Reas in the Predictors window. Click ‘Storage’ and then select ‘Residuals’ and ‘Fits’. These will be stored in columns C4 and C5 and named RESI1 and FITS1. Your output should look as follows:

Regression Analysis: Math Score versus IQ, Abstract_Reas

The regression equation is
Math Score = 54.1 - 0.484 IQ + 1.02 Abstract_Reas

Predictor         Coef  SE Coef      T      P
Constant         54.05    22.99   2.35  0.031
IQ             -0.4836   0.2955  -1.64  0.120
Abstract_Reas   1.0185   0.2656   3.84  0.001

S = 3.00271   R-Sq = 70.5%   R-Sq(adj) = 67.1%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  366.92  183.46  20.35  0.000
Residual Error  17  153.28    9.02
Total           19  520.20

a. What is the regression equation? Provide an interpretation of each slope in terms of the change in Y per unit change in X.

Math Score = 54.1 - 0.484 IQ + 1.02 Abstract_Reas

In multiple linear regression, each slope is interpreted as follows: for a one-unit increase in Xi, while holding the other predictors constant (i.e. not changing), the predicted Y changes by the amount and in the direction of the slope for Xi.
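In symbols, the same interpretation can be written as below. The notation (y-hat for the predicted math score, b0, b1, b2 for the estimated coefficients) is introduced here for illustration and does not appear in the Minitab output.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2,
\qquad
\hat{y}(x_1 + 1,\, x_2) - \hat{y}(x_1,\, x_2) = b_1,
\qquad
\hat{y}(x_1,\, x_2 + 1) - \hat{y}(x_1,\, x_2) = b_2 .
```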
So here, holding Abstract Reasoning constant, a 1-unit increase in IQ decreases the predicted math score by 0.484 points; holding IQ constant, a 1-unit increase in Abstract Reasoning increases the predicted math score by 1.02 points.

b. Create two scatter plots of the measurements using Graph > Scatter Plot > Simple. Select math score as the response (y-variable) and IQ as the predictor (x-variable); then enter math score again as a y-variable with Abstract_Reas as its x-variable. Select Multiple Graphs and click the radio button for “In separate panels of the same graph”. Describe the relationship between math score, abstract reasoning and IQ.

[Figure: Scatterplot of Math Score vs IQ, Abstract_Reas. Two panels with IQ and Abstract_Reas on the x-axes and Math Score on the y-axis.]

There is a positive relationship between the response variable (math score) and each of the explanatory variables (IQ and abstract reasoning). However, the slope coefficient for IQ in the regression model is negative! This results from how the coefficients are calculated in multiple regression. In simple linear regression, the slope estimate is tied directly to the correlation between X and Y. In multiple linear regression, this simple correlation loses its relevance. Instead, a partial correlation comes into play: the relationship between Y and one predictor after accounting for the other predictor.

c. Based on the output, what are the tests of the slopes for this regression equation? That is, provide the null and alternative hypotheses, the test statistic, the p-value of the test, and state your decision and conclusion.

Ho: B1 = 0   Ha: B1 ≠ 0
The test statistic is -1.64 with a p-value of 0.120. Since this p-value is greater than 0.05, we would NOT reject Ho. This means that when Abstract Reasoning is already in the model, IQ is not a statistically significant linear predictor of ninth grade math scores.

Ho: B2 = 0   Ha: B2 ≠ 0
The test statistic is 3.84 with a p-value of 0.001. Since this p-value is less than 0.05, we would REJECT Ho.
This means that when IQ is already in the model, Abstract Reasoning is a statistically significant linear predictor of ninth grade math scores.

d. From the output, what is the meaning of the ANOVA F-test? Provide the two hypothesis statements, decision and conclusion.

Ho: B1 = B2 = 0   Ha: at least one of these slopes does not equal zero.
With a test statistic of 20.35 and a p-value reported as 0.000, we reject Ho and conclude that at least one of the slopes does not equal zero. NOTE: this rejection does not tell us which slope(s) is/are significant, only that at least one is.

e. Check the assumptions of constant variance and normality. For constant variance, create a scatterplot (under Graph) of the residuals versus each of the predictor variables. For normality, use Graph > Probability Plot > Single and graph the residuals. What are your conclusions based on these graphs?

Both scatterplots provide an indication of an outlier (bottom right of each panel). The probability plot tests the null hypothesis that the residuals come from a normal distribution, and this hypothesis is rejected (p-value less than 0.005). Together, these give evidence that the data do not satisfy the assumptions of normality and constant variance. Handling possible outlier(s) in multiple linear regression is analogous to the methods used in simple linear regression.

[Figure: Scatterplot of RESI1 vs IQ, Abstract_Reas. Two panels with IQ and Abstract_Reas on the x-axes and RESI1 on the y-axis.]

[Figure: Probability Plot of RESI1, Normal - 95% CI. Mean = -1.77636E-14, StDev = 2.840, N = 20, AD = 1.170, P-Value < 0.005.]
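The Minitab results in this solution can be cross-checked outside Minitab. The following is a minimal sketch in plain Python (no statistics library assumed) that refits the model from the data table using the normal equations for two predictors. The names fit_two_predictors, cross and result are illustrative and not part of the assignment. P-values are not computed, since that would require a t or F distribution function.

```python
# Data transcribed from the table at the top of this handout (20 students).
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    n = len(y)
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y, syy = cross(x1, y), cross(x2, y), cross(y, y)
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the two slopes.
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    ss_reg = b1 * s1y + b2 * s2y   # regression sum of squares
    ss_err = syy - ss_reg          # residual error sum of squares
    mse = ss_err / (n - 3)         # three estimated coefficients
    se_b1 = (mse * s22 / det) ** 0.5
    se_b2 = (mse * s11 / det) ** 0.5
    return {
        "b0": b0, "b1": b1, "b2": b2,
        "t1": b1 / se_b1, "t2": b2 / se_b2,
        "s": mse ** 0.5,
        "r_sq": ss_reg / syy,
        "f": (ss_reg / 2) / mse,
        "r_y_x1": s1y / (s11 * syy) ** 0.5,    # simple correlation: math score, IQ
        "r_x1_x2": s12 / (s11 * s22) ** 0.5,   # correlation between the predictors
    }


result = fit_two_predictors(math_score, iq, abstract_reas)
for name, value in result.items():
    print(f"{name:8s} {value:10.4f}")
```

Run on the 20 students, this should agree with the Coef, T, S, R-Sq and F values in the Minitab output up to rounding. It also reports the simple correlation of math score with IQ (about 0.67, positive) and the correlation between the two predictors (about 0.93). The strong overlap between IQ and Abstract Reasoning is consistent with the negative IQ slope discussed in part b.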