Multiple Linear Regression - Solutions
1 Relationship Between Eighth Grade IQ, Eighth Grade Abstract Reasoning and Ninth Grade Math Score
For a statistics class project, students examined the relationship between x1 = 8th grade IQ,
x2 = 8th grade Abstract Reasoning and y = 9th grade math scores for 20 students. The data are
displayed below.
Student   Math Score   IQ    Abstract Reas
1         33           95    28
2         31           100   24
3         35           100   29
4         38           102   30
5         41           103   33
6         37           105   32
7         37           106   34
8         39           106   36
9         43           106   38
10        40           109   39
11        41           110   40
12        44           110   43
13        40           111   41
14        45           112   42
15        48           112   46
16        45           114   44
17        31           114   41
18        47           115   47
19        43           117   42
20        48           118   49
Use Minitab on the dataset Finals found in the Datasets folder in ANGEL. Choose
Stat > Regression > Regression, enter the variable Math Score in the Response window, and enter
IQ and Abstract_Reas in the Predictors window. Click ‘Storage’ and then check ‘Residuals’ and
‘Fits’. These will be stored in columns C4 and C5 and named RESI1 and FITS1. Your output should
look as follows:
Regression Analysis: Math Score versus IQ, Abstract_Reas

The regression equation is
Math Score = 54.1 - 0.484 IQ + 1.02 Abstract_Reas

Predictor         Coef   SE Coef      T      P
Constant         54.05     22.99   2.35  0.031
IQ             -0.4836    0.2955  -1.64  0.120
Abstract_Reas   1.0185    0.2656   3.84  0.001

S = 3.00271   R-Sq = 70.5%   R-Sq(adj) = 67.1%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  366.92  183.46  20.35  0.000
Residual Error  17  153.28    9.02
Total           19  520.20
a. What is the regression equation? Provide an interpretation of each slope in terms of the change
in Y per unit change in X.
Math Score = 54.1 - 0.484 IQ + 1.02 Abstract_Reas
In multiple linear regression, the slope indicates “for a unit change in Xi while holding the other
predictors constant (i.e. not changing), Y will change by the amount and direction of the slope for
Xi”. So here, when holding abstract reasoning constant, for a 1 unit increase in IQ the predicted
math score will decrease by 0.484 points; when holding IQ constant, for a 1 unit increase in
Abstract Reasoning the predicted math score will increase by 1.02 points.
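As a cross-check on the Minitab output, the coefficients can be reproduced directly from the 20
data rows. The following is a minimal Python sketch and not part of the original Minitab workflow;
the helper names `centered_cross_product` and `fit_two_predictors` are illustrative assumptions. It
solves the two-predictor least-squares equations in closed form from centered sums of squares and
cross-products, and the result should agree with the Minitab coefficients up to rounding.

```python
# Data copied from the table of 20 students.
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]


def centered_cross_product(u, v):
    """Return the sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    s11 = centered_cross_product(x1, x1)
    s22 = centered_cross_product(x2, x2)
    s12 = centered_cross_product(x1, x2)
    s1y = centered_cross_product(x1, y)
    s2y = centered_cross_product(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = sum(y) / len(y) - b1 * sum(x1) / len(x1) - b2 * sum(x2) / len(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(math_score, iq, abstract_reas)
print(f"Math Score = {b0:.2f} + ({b1:.4f}) IQ + ({b2:.4f}) Abstract_Reas")
```

The closed form is used here only because there are exactly two predictors; with more predictors
a general linear-algebra solver would be the usual choice.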
b. Create two scatter plots of the measurements by Graph > Scatter Plot > Simple. Select math score
as the response (y-variable) and IQ as the predictor (x-variable); then enter math score again as a
y-variable and enter Abstract Reas as its x-variable. Select Multiple Graphs and click the radio
button for “In separate panels of the same graph”. Describe the relationship between math score,
abstract reasoning and IQ.
[Figure: Scatterplot of Math Score vs IQ, Abstract_Reas. Two panels, Math Score (y-axis) against
IQ and against Abstract_Reas (x-axes).]
There is a positive relationship between the response variable (math score) and each of the
explanatory variables (IQ and abstract reasoning). However, the slope coefficient for IQ in the
regression model is negative! This follows from how the coefficients are calculated in multiple
regression. In simple linear regression the slope estimate is tied directly to the correlation
between X and Y. In multiple linear regression each slope is estimated with the other predictors
held constant, so the simple correlation loses its relevance and a partial correlation comes into
play instead.
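The sign change can be checked numerically. The minimal Python sketch below (helper names are
illustrative assumptions, not Minitab functions) computes the simple-regression slope of math score
on IQ alone, together with the pairwise correlations. On these data the IQ-only slope is positive,
and IQ and abstract reasoning are strongly correlated with each other, which is the situation in
which a coefficient can change sign once the other predictor is held constant.

```python
from math import sqrt

# Data copied from the table of 20 students.
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]


def centered_cross_product(u, v):
    """Return the sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    return centered_cross_product(u, v) / sqrt(
        centered_cross_product(u, u) * centered_cross_product(v, v)
    )


# Slope of the simple regression of math score on IQ alone.
simple_slope_iq = (centered_cross_product(iq, math_score)
                   / centered_cross_product(iq, iq))

r_iq_math = correlation(iq, math_score)
r_ar_math = correlation(abstract_reas, math_score)
r_iq_ar = correlation(iq, abstract_reas)

print(f"Slope of Math Score on IQ alone: {simple_slope_iq:.3f}")
print(f"r(IQ, Math Score)            = {r_iq_math:.3f}")
print(f"r(Abstract_Reas, Math Score) = {r_ar_math:.3f}")
print(f"r(IQ, Abstract_Reas)         = {r_iq_ar:.3f}")
```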
c. Based on the output, what is the test of the slope for this regression equation? That is, provide the null
and alternative hypotheses, the test statistic, p-value of the test, and state your decision and conclusion.
Ho: B1 = 0 Ha: B1 ≠ 0 The test statistic is -1.64 with a p-value of 0.120. Since this p-value is
greater than 0.05, we would NOT reject Ho. This means that when Abstract Reasoning is
already in the model, IQ is not a statistically significant linear predictor of ninth grade
math scores.
Ho: B2 = 0 Ha: B2 ≠ 0 The test statistic is 3.84 with a p-value of 0.001. Since this p-value is
less than 0.05, we would REJECT Ho. This means that when IQ is already in the model,
Abstract Reasoning is a statistically significant linear predictor of ninth grade math scores.
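Each T value in the output is the coefficient divided by its standard error. The short Python
sketch below redoes that arithmetic from the printed Coef and SE Coef columns. Because those
printed inputs are rounded, the result can differ from Minitab's T in the last digit. The critical
value 2.110 (two-sided, 5% level, 17 degrees of freedom) is taken from a standard t table and is an
assumption added here; it does not appear in the Minitab output.

```python
# Coefficients and standard errors copied from the Minitab output.
coef = {"IQ": -0.4836, "Abstract_Reas": 1.0185}
se_coef = {"IQ": 0.2955, "Abstract_Reas": 0.2656}

# Two-sided 5% critical value for t with n - 3 = 20 - 3 = 17 degrees of
# freedom, from a standard t table (assumption, not in the output).
T_CRITICAL = 2.110

# T = Coef / SE Coef for each slope.
t_stats = {name: coef[name] / se_coef[name] for name in coef}

for name, t in t_stats.items():
    decision = "reject Ho" if abs(t) > T_CRITICAL else "do not reject Ho"
    print(f"{name}: t = {t:.3f}, {decision} at the 0.05 level")
```

Comparing |t| with the critical value leads to the same decisions as comparing the p-values with
0.05.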
d. From the output, what is the meaning of the ANOVA F-test? Provide the two hypothesis statements,
decision and conclusion.
Ho: B1 = B2 = 0 and Ha: at least one of these slopes does not equal zero.
With a test statistic of 20.35 and a p-value reported as 0.000, we reject Ho and conclude that at
least one of the slopes does not equal zero. NOTE: this rejection does not tell which slope(s)
is/are significant, just that at least one is significant.
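The F statistic and the summary measures S, R-Sq and R-Sq(adj) all follow from the sums of squares
and degrees of freedom in the Analysis of Variance table. The Python sketch below redoes that
arithmetic with values copied from the output (variable names are illustrative); results should
match the printed values up to rounding.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, ss_error, ss_total = 366.92, 153.28, 520.20
df_regression, df_error, df_total = 2, 17, 19

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F compares explained to unexplained variation per degree of freedom.
f_stat = ms_regression / ms_error

# S is the square root of the mean squared error.
s = ms_error ** 0.5

# R-Sq is the share of total variation explained by the regression;
# R-Sq(adj) penalizes it for the number of predictors.
r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

print(f"MS Regression = {ms_regression:.2f}, MS Error = {ms_error:.2f}")
print(f"F = {f_stat:.2f}, S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```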
e. Check assumptions of constant variance and normality by creating a Scatterplot under Graphs of the
residuals versus each of the predictor variables. For the normality plot, see Graphs > Probability Plot >
Single and graph the residuals. What are your conclusions based on these graphs?
Both scatterplots give an indication of an outlier (bottom right of each panel). The probability
plot tests the null hypothesis that the residuals come from a normal distribution, and this
hypothesis is rejected (p-value less than 0.005). Together the graphs give evidence that the data
do not satisfy the assumptions of normality and constant variance. Handling possible outlier(s) in
multiple linear regression is analogous to the methods used in simple linear regression.
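To see which observation produces the outlying point, the residuals can be recomputed from the
fitted equation. The Python sketch below is an illustration only: it uses the rounded coefficients
printed in the Minitab output, so its residuals will differ slightly from the stored RESI1 column,
and dividing by S gives only a rough scale, not Minitab's standardized residual.

```python
# Data copied from the table of 20 students.
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
abstract_reas = [28, 24, 29, 30, 33, 32, 34, 36, 38, 39,
                 40, 43, 41, 42, 46, 44, 41, 47, 42, 49]

# Rounded coefficients and S as printed in the Minitab output.
B0, B1, B2 = 54.05, -0.4836, 1.0185
S = 3.00271

# Fitted values and residuals (observed minus fitted).
fits = [B0 + B1 * x1 + B2 * x2 for x1, x2 in zip(iq, abstract_reas)]
residuals = [y - fit for y, fit in zip(math_score, fits)]

# Student (numbered from 1) with the largest residual in absolute value.
worst_student, worst_residual = max(
    enumerate(residuals, start=1), key=lambda pair: abs(pair[1])
)

print(f"Largest residual: student {worst_student}, "
      f"residual = {worst_residual:.2f} "
      f"({worst_residual / S:.2f} times S)")
```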
[Figure: Scatterplot of RESI1 vs IQ, Abstract_Reas. Two panels, RESI1 (y-axis) against IQ and
against Abstract_Reas (x-axes).]
[Figure: Probability Plot of RESI1, Normal - 95% CI. Percent (y-axis) against RESI1 (x-axis).
Legend: Mean = -1.77636E-14, StDev = 2.840, N = 20, AD = 1.170, P-Value < 0.005.]