MIDTERM2, STAT 512
20 pt total

1. (4pt) A multiple regression was run with 65 cases and 7 explanatory
variables. Give the degrees of freedom for the F statistics (for numerator and
denominator) that tests the null hypothesis that the coefficients of the first 3
explanatory variables are all equal to zero.
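The numerator df is 3, the number of coefficients set to zero under the null hypothesis. The denominator df is the error df of the full model, n - p - 1 = 65 - 7 - 1 = 57.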
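The degrees-of-freedom rule can be written as a small Python sketch, assuming the full model contains an intercept. The function name general_linear_test_df is hypothetical. As a sanity check, it also reproduces the 2 and 74 df that SAS reports for the sameline test in the OUTPUT FOR PROBLEM 3 (n = 78, three explanatory variables, two coefficients tested).

```python
def general_linear_test_df(n, p, q):
    """Degrees of freedom for the F test of H0: q of the p slope coefficients are zero.

    n: number of cases, p: number of explanatory variables in the full model
    (an intercept is assumed), q: number of coefficients set to zero under H0.
    """
    numerator_df = q                 # one df per coefficient constrained by H0
    denominator_df = n - (p + 1)     # error df of the full model: n minus the number of parameters
    return numerator_df, denominator_df


print(general_linear_test_df(65, 7, 3))   # Problem 1: (3, 57)
print(general_linear_test_df(78, 3, 2))   # sameline test in OUTPUT FOR PROBLEM 3: (2, 74)
```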
2. (8 pt) Refer to the SAS output marked OUTPUT FOR PROBLEM 3. The data are from
a study of 78 seventh-grade students. The goal is to predict GRADE (average school grade
on a scale of 0 to 11) from variables which include IQ (score on an I.Q. test) and
GENDER (0 = female, 1 = male).
a. (2pt) Using the output for the simple linear regression, does there appear to
be a linear relationship between GRADE and IQ? Give a test statistic with
degrees of freedom and p-value to support your answer (you may use other
evidence as well).
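There does appear to be a linear relationship. The t-test for the slope of iq gives t = 7.14 with 78 - 2 = 76 df and p < .0001; equivalently, the ANOVA F-test gives F = 51.01 on 1 and 76 df with p < .0001 (F = t^2 up to rounding). In addition, iq alone explains about 40% of the variation in grade (R^2 = 0.4016). With 76 df the t distribution is very close to the normal, so the normal distribution could have been used as well.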
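The slope test statistics can be recomputed from the printed output. This sketch uses only numbers copied from the simple linear regression of grade on iq in OUTPUT FOR PROBLEM 3; the variable names are illustrative.

```python
# Numbers copied from the simple linear regression (grade on iq) in OUTPUT FOR PROBLEM 3.
n = 78
b1, se_b1 = 0.10102, 0.01414        # slope estimate for iq and its standard error
ss_model, ss_error = 136.31881, 203.10809

df_error = n - 2                     # two parameters estimated: intercept and slope
t_stat = b1 / se_b1                  # t statistic for H0: beta1 = 0
f_stat = (ss_model / 1) / (ss_error / df_error)   # ANOVA F = MSR / MSE on (1, df_error) df
r_squared = ss_model / (ss_model + ss_error)      # SSR / SSTO

print(f"df = {df_error}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, "
      f"F = {f_stat:.2f}, R^2 = {r_squared:.4f}")
```

The small gap between t^2 and F comes only from rounding in the printed estimate and standard error.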
b. (2pt) Individual #51 has GRADE = 0.53 and IQ = 103. What value of GRADE is
predicted for this individual by the estimated simple linear regression model?
Calculate the residual e_i for this observation.
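Plug IQ = 103 into the fitted model: -3.55706 + 0.10102*103 = 6.848, or about 6.85. The residual is observed minus fitted: e_51 = 0.53 - 6.85 = -6.32, so this student's grade is far below the value the model predicts.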
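The prediction and residual for individual #51 can be computed with the unrounded estimates from the simple linear regression output. The sketch below uses the standard definition of a residual as observed minus fitted.

```python
# Estimates from the simple linear regression output (unrounded).
b0, b1 = -3.55706, 0.10102

iq_51, grade_51 = 103, 0.53      # observed values for individual #51

y_hat = b0 + b1 * iq_51          # fitted (predicted) grade
residual = grade_51 - y_hat      # e_i = Y_i - Yhat_i: observed minus fitted

print(f"fitted = {y_hat:.3f}, residual = {residual:.3f}")
```

Using the full slope matters here: any rounding error in the slope is multiplied by 103.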
c. (2pt) The variable IQGEN is the product of IQ and GENDER. Examine the
output for the model involving these three variables. Write down the
estimated regression equation for this model. Also write down the two
separate fitted lines for female and male students.
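The fitted model is grade = -2.252 + 0.0940*iq - 3.843*gender + 0.0266*iqgen.
For females (gender = 0, so iqgen = 0) the fitted line is grade = -2.252 + 0.0940*iq.
For males (gender = 1, so iqgen = iq) it is grade = (-2.252 - 3.843) + (0.0940 + 0.0266)*iq = -6.095 + 0.1206*iq.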
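Gender is coded 0/1 and iqgen = iq * gender, so the fitted line for each group comes from setting gender to 0 or 1 in the interaction model. Here is a minimal sketch using the estimates from OUTPUT FOR PROBLEM 3. The helper name fitted_line is hypothetical.

```python
# Coefficients from the model with iq, gender and iqgen = iq * gender.
b0, b_iq, b_gender, b_iqgen = -2.25235, 0.09400, -3.84266, 0.02656


def fitted_line(gender):
    """Return (intercept, slope in iq) for gender code 0 (female) or 1 (male)."""
    intercept = b0 + b_gender * gender   # gender shifts the intercept
    slope = b_iq + b_iqgen * gender      # iqgen = iq * gender shifts the slope
    return intercept, slope


for g, label in [(0, "female"), (1, "male")]:
    a, b = fitted_line(g)
    print(f"{label}: grade = {a:.3f} + {b:.4f} * iq")
```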
d. (2pt) Examine the results of the t-tests for the three regression coefficients
as well as the result of the (general linear) F-test labeled “SAMELINE”. The
results of this general linear test were produced with the SAS input line
“test gender, iqgen;”. State the null hypotheses tested by each of
these four tests and whether that hypothesis is rejected. What apparent
conflict do you see between the results of these tests? Explain why such a
conflict might arise and suggest one possible action that might be used to
eliminate this conflict.
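Each t-test asks whether one coefficient is zero given that the other two predictors are already in the model. For iq (H0: beta_iq = 0), t = 4.66 and p < .0001, so H0 is rejected: there is a non-negligible linear relationship between iq and grade. For gender (H0: beta_gender = 0), t = -1.27 and p = 0.2097; for iqgen (H0: beta_iqgen = 0), t = 0.95 and p = 0.3432. Neither of these is rejected, so taken one at a time, gender and iqgen show no statistically significant relationship with grade. The general linear test SAMELINE tests H0: beta_gender = beta_iqgen = 0, i.e. that males and females share a single regression line. Here F = 3.84 on 2 and 74 df with p = 0.0259, so H0 is rejected at the 5% level: gender and iqgen, taken together, do contribute to explaining the variation in grade, and one shouldn't drop both of them from the model at once.
The apparent conflict (each variable is unimportant on its own, yet the pair is important) is the result of multicollinearity. Gender and iqgen are very strongly correlated (r = 0.98562 in the correlation output), so once one of them is in the model the other adds little, even though together they carry real information. One possible remedy is to drop one of the two, for example iqgen, and refit with iq and gender only, which gives parallel lines for the two genders. Another is to center iq at its mean before forming the product, which usually reduces the correlation between the product term and gender.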
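The SAMELINE F statistic can be reproduced from the error sums of squares of the two models in OUTPUT FOR PROBLEM 3, using the general linear test statistic F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]. The sketch also computes 1 / (1 - r^2) for the gender-iqgen correlation. The R^2 from regressing gender on iq and iqgen is at least r^2, so this value is a lower bound on the variance inflation factor for gender in that model. The function name general_linear_f is hypothetical.

```python
# Error sums of squares and error df from OUTPUT FOR PROBLEM 3.
sse_reduced, df_reduced = 203.10809, 76   # reduced model: grade ~ iq
sse_full, df_full = 184.00205, 74         # full model: grade ~ iq + gender + iqgen


def general_linear_f(sse_r, df_r, sse_f, df_f):
    """Return (F*, numerator mean square, denominator mean square) for a nested-model test."""
    ms_numerator = (sse_r - sse_f) / (df_r - df_f)   # extra SS per constrained coefficient
    ms_denominator = sse_f / df_f                     # MSE of the full model
    return ms_numerator / ms_denominator, ms_numerator, ms_denominator


f_star, ms_num, ms_den = general_linear_f(sse_reduced, df_reduced, sse_full, df_full)

# Lower bound on the VIF for gender: R^2(gender | iq, iqgen) >= corr(gender, iqgen)^2.
r_gender_iqgen = 0.98562
vif_gender_lower_bound = 1 / (1 - r_gender_iqgen ** 2)

print(f"F* = {f_star:.2f} (MS num = {ms_num:.5f}, MS den = {ms_den:.5f})")
print(f"VIF(gender) >= {vif_gender_lower_bound:.1f}")
```

The mean squares match the Numerator and Denominator rows of the SAS test output. A VIF lower bound of about 35 is far above the common rule-of-thumb threshold of 10.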
3. (8 pt) Refer to the SAS output labeled OUTPUT FOR PROBLEM 4. This continues the
analysis begun in problem 3 using GRADE, IQ, and GENDER. Now the additional
variables AGE (in years) and SC (score on a “self-concept” scale) are included (and
IQGEN is removed). You may also use the OUTPUT FOR PROBLEM 3 results for this
problem.
a. (3pt) Examine the results of the model that includes IQ, AGE, GENDER, and
SC (the “full” model). Which variable(s), if any, would you consider
eliminating from the model? Justify your answer extensively using
information such as the results of hypothesis tests, extra sums of squares,
and R2 values, as well as any other evidence that may support your
argument.
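Age is the natural candidate for elimination. It is the only predictor whose t-test is not significant at the 5% level (t = -1.82, p = 0.0723), and its extra sum of squares given the other three variables is small (Type II SS = 7.08). Its Type I SS (8.59) is similar, so its contribution does not depend much on the order of entry. Iq is the most “suspicious” variable at first sight because its Type II SS (47.15) is much smaller than its Type I SS (136.32), which means part of what it explains on its own is shared with the other predictors. Its t-test, however, is still highly significant (t = 4.70, p < .0001), so it should stay in the model. The reduced model, which drops age but keeps iq, is quite satisfactory: all remaining variables are significant, R^2 falls only from 0.5417 to 0.5208 (adjusted R^2 from 0.5166 to 0.5014), and the overall F-test is significant in both models (F = 21.57 and F = 26.81, both p < .0001). The conclusion is that age can be removed from the full model.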
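The extra sum of squares for age can be checked directly. Dropping age raises SSE from 155.56003 to 162.64466, and the difference equals the Type II SS for age. This sketch uses numbers copied from OUTPUT FOR PROBLEM 4 to form the partial F statistic, compare it with the square of the t statistic for age, and compute the partial R^2.

```python
# Error sums of squares from OUTPUT FOR PROBLEM 4.
sse_full, df_full = 155.56003, 73          # grade ~ iq + age + gender + sc
sse_no_age, df_no_age = 162.64466, 74      # grade ~ iq + sc + gender

extra_ss_age = sse_no_age - sse_full       # SSR(age | iq, gender, sc), i.e. the Type II SS for age
mse_full = sse_full / df_full
f_age = (extra_ss_age / (df_no_age - df_full)) / mse_full   # partial F on (1, 73) df
t_age = -0.52028 / 0.28534                  # t statistic for age in the full model
partial_r2_age = extra_ss_age / sse_no_age  # share of the remaining variation explained by age

print(f"extra SS = {extra_ss_age:.5f}, F = {f_age:.3f}, t^2 = {t_age ** 2:.3f}, "
      f"partial R^2 = {partial_r2_age:.4f}")
```

F is about 3.32 on 1 and 73 df. This matches t^2 and therefore the p-value of 0.0723. A partial R^2 of about 0.044 means age accounts for only about 4% of the variation left unexplained by iq, gender, and sc.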
b. (2pt) Does multicollinearity appear to be an issue in this analysis? Explain
your reasoning, making specific reference to the parameter estimates and
the results of hypothesis tests, as well as any other evidence that may
support your argument.
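Yes, some multicollinearity is present. The clearest sign is the gap between the Type I and Type II SS for iq: entered first, iq accounts for 136.32, but given age, gender, and sc it adds only 47.15. Much of what iq explains is therefore shared with the other predictors. The correlation output shows that iq is moderately correlated with sc (r = 0.49315) and with age (r = -0.38236). Another sign is that age is not individually significant (p = 0.0723) even though the overall F-test is highly significant (F = 21.57, p < .0001). The multicollinearity is moderate rather than severe, however. When age is dropped, the estimates for iq, sc, and gender change only modestly (0.07401 to 0.08412, 0.05166 to 0.05129, and -0.91623 to -0.96852), and their standard errors stay about the same.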
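The variance inflation factor is one standard way to quantify multicollinearity: VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing predictor k on the other predictors. Equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The sketch below is illustrative and rests on two assumptions. It uses the rounded correlations printed by PROC CORR, and it inverts the matrix with a hand-written Gauss-Jordan routine rather than a library call. The helper name invert is hypothetical.

```python
# Pairwise correlations among the four predictors of the full model (from PROC CORR, rounded).
names = ["iq", "age", "gender", "sc"]
R = [
    [1.00000, -0.38236, 0.19142, 0.49315],
    [-0.38236, 1.00000, 0.00214, -0.17808],
    [0.19142, 0.00214, 1.00000, 0.09519],
    [0.49315, -0.17808, 0.09519, 1.00000],
]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]          # scale the pivot row to get a 1 on the diagonal
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]  # clear the column
    return [row[n:] for row in aug]                   # right half now holds the inverse


R_inv = invert(R)
vif = {name: R_inv[i][i] for i, name in enumerate(names)}
for name, v in vif.items():
    print(f"VIF({name}) = {v:.2f}")
```

With these correlations, every VIF comes out below 2. The largest, about 1.55, is for iq, which is far below the common rule-of-thumb threshold of 10.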
c. (2pt) Which variable do you think is the most important explanatory
variable? Do you recommend using this variable alone in the model? Justify
your answer.
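The most important explanatory variable is iq. It has the largest correlation with grade (r = 0.63373) and the largest t-statistic in both the full and the reduced model (t = 4.70 and t = 5.62). On its own, it explains about 40% of the variation in grade (R^2 = 0.4016). It is probably not a good idea to use iq alone, though. In the reduced model, sc and gender are still statistically significant (p = 0.0016 and p = 0.0071) after adjusting for iq, and adding them raises R^2 from 0.4016 to 0.5208.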
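A general linear test can check whether sc and gender add to iq. It compares the model with iq alone (OUTPUT FOR PROBLEM 3) to the reduced model with iq, sc, and gender (OUTPUT FOR PROBLEM 4). A minimal sketch follows; the variable names are illustrative.

```python
# Error sums of squares from the two nested models.
sse_iq_only, df_iq_only = 203.10809, 76   # grade ~ iq (OUTPUT FOR PROBLEM 3)
sse_three, df_three = 162.64466, 74       # grade ~ iq + sc + gender (OUTPUT FOR PROBLEM 4)

q = df_iq_only - df_three                 # number of added coefficients (sc and gender)
f_add = ((sse_iq_only - sse_three) / q) / (sse_three / df_three)
r2_gain = (sse_iq_only - sse_three) / 339.42689   # increase in R^2 (SSTO is the same in both outputs)

print(f"F = {f_add:.2f} on ({q}, {df_three}) df; R^2 increases by {r2_gain:.4f}")
```

F is about 9.2 on 2 and 74 df. That is well above the 5% critical value of F(2, 74), roughly 3.1, so a model with iq alone would leave out real explanatory power.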
d. (1pt) What are the dimensions of the design matrix for the full model in this
problem?
It is 78 by 5: one row per student and one column each for the intercept, iq, age, gender, and sc.
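The dimensions can be cross-checked against the output. The design matrix has one column per estimated parameter, so the column count should equal n minus the error df in the full-model ANOVA table (78 - 73 = 5). In this short sketch, the column labels are illustrative.

```python
n = 78                                               # number of students (rows)
columns = ["intercept", "iq", "age", "gender", "sc"]  # one column per estimated parameter

design_shape = (n, len(columns))
error_df = n - len(columns)   # should equal the Error DF reported for the full model

print(f"design matrix: {design_shape[0]} x {design_shape[1]}, error df = {error_df}")
```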
OUTPUT FOR PROBLEM 3
The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              1   136.31881     136.31881      51.01    <.0001
Error             76   203.10809       2.67247
Corrected Total   77   339.42689

Root MSE           1.63477    R-Square    0.4016
Dependent Mean     7.44654    Adj R-Sq    0.3937
Coeff Var         21.95343

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|    95% Confidence Limits
Intercept    1    -3.55706     1.55176      -2.29      0.0247    -6.64766    -0.46645
iq           1     0.10102     0.01414       7.14      <.0001     0.07285     0.12919
The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              3   155.42484      51.80828      20.84    <.0001
Error             74   184.00205       2.48651
Corrected Total   77   339.42689

Root MSE           1.57687    R-Square    0.4579
Dependent Mean     7.44654    Adj R-Sq    0.4359
Coeff Var         21.17586

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    -2.25235     2.15377      -1.05      0.2991
iq           1     0.09400     0.02017       4.66      <.0001
gender       1    -3.84266     3.03670      -1.27      0.2097
iqgen        1     0.02656     0.02784       0.95      0.3432

Test sameline Results for Dependent Variable grade
                         Mean
Source         DF      Square    F Value    Pr > F
Numerator       2     9.55302       3.84    0.0259
Denominator    74     2.48651
[Figure: fitted line plot of grade against iq, grade = -3.5571 + 0.101 iq (N = 78, R-Sq = 0.4016, Adj R-Sq = 0.3937, RMSE = 1.6348); horizontal axis iq from 70 to 140, vertical axis grade from 0 to 12]
[Figure: a second plot against iq for the same fit, with the vertical axis running from -8 to 4 (probably the residuals)]
OUTPUT FOR PROBLEM 4
The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              4   183.86686      45.96672      21.57    <.0001
Error             73   155.56003       2.13096
Corrected Total   77   339.42689

Root MSE           1.45978    R-Square    0.5417
Dependent Mean     7.44654    Adj R-Sq    0.5166
Coeff Var         19.60348

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|     Type I SS    Type II SS
Intercept    1     3.62511     4.43504       0.82      0.4164    4325.17293       1.42371
iq           1     0.07401     0.01573       4.70      <.0001     136.31881      47.14706
age          1    -0.52028     0.28534      -1.82      0.0723       8.58581       7.08463
gender       1    -0.91623     0.34531      -2.65      0.0098      15.00824      15.00220
sc           1     0.05166     0.01541       3.35      0.0013      23.95401      23.95401
The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              3   176.78223      58.92741      26.81    <.0001
Error             74   162.64466       2.19790
Corrected Total   77   339.42689

Root MSE           1.48253    R-Square    0.5208
Dependent Mean     7.44654    Adj R-Sq    0.5014
Coeff Var         19.90901

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    -4.05384     1.41211      -2.87      0.0053
iq           1     0.08412     0.01495       5.62      <.0001
sc           1     0.05129     0.01565       3.28      0.0016
gender       1    -0.96852     0.34948      -2.77      0.0071
The CORR Procedure
6 Variables: grade iq age sc iqgen gender

Simple Statistics
Variable    N         Mean     Std Dev          Sum     Minimum     Maximum
grade      78      7.44654     2.09956    580.83000     0.53000    10.76000
iq         78    108.92308    13.17097         8496    72.00000   136.00000
age        78     12.74359     0.63319    994.00000    12.00000    15.00000
sc         78     56.96154    12.41223         4443    20.00000    80.00000
iqgen      78     66.85897    55.44758         5215           0   136.00000
gender     78      0.60256     0.49254     47.00000           0     1.00000

Pearson Correlation Coefficients, N = 78
             grade         iq        age         sc      iqgen     gender
grade      1.00000    0.63373   -0.38927    0.54183   -0.00505   -0.09733
iq         0.63373    1.00000   -0.38236    0.49315    0.30884    0.19142
age       -0.38927   -0.38236    1.00000   -0.17808   -0.04358    0.00214
sc         0.54183    0.49315   -0.17808    1.00000    0.16141    0.09519
iqgen     -0.00505    0.30884   -0.04358    0.16141    1.00000    0.98562
gender    -0.09733    0.19142    0.00214    0.09519    0.98562    1.00000