Multicollinearity Exercise

Use the attached SAS output to answer the questions. [OPTIONAL: Copy the SAS program
below into the SAS editor window and run it.] Please don’t print out all the output shown below, or
the output from the SAS job if you decide to run it.
1. Use at least three different methods to diagnose whether multicollinearity is a problem for this
set of data.
2. Identify which variables are key participants in the most serious near-linear dependency in the
data. (How do you know this?)
3. Which variable has the “wrong sign” for its coefficient in this regression? Explain why its sign
is wrong.
4. What is the smallest value of the ridge constant (k) that “fixes” the sign of the coefficient you
named in #3?
5. What is the smallest value of the ridge constant (k) that reduces all VIFs so that they are
below the guideline of 10?
6. What is the smallest value of k that seems (in your opinion) to stabilize the coefficients?
7. If one principal component is removed, give the estimated coefficients for X1, X2, X3, and X4.
Does this fix the one with the “wrong” sign?
*******************************************************************
*  LAW SCHOOL ADMISSION DATA                                      *
*  PARTLY FROM PAGE 599 OF SMITH                                  *
*******************************************************************;
**** DATA FOR 20 STUDENTS ******
  Y  IS THE LAW SCHOOL GPA
  X1 IS THE UNDERGRADUATE SCHOOL GPA
  X2 IS THE LMAT PERCENTILE
  X3 IS A RATING OF THE UNDERGRADUATE SCHOOL QUALITY
  X4 IS THE GRE SCORE;
DATA LAW;
INPUT Y X1 X2 X3 X4 NO $; CARDS;
3.42 3.28 .96  6 1330 1
3.60 3.18 .97  7 1370 2
3.28 2.89 .93  5 1140 3
3.75 3.72 .99  8 1520 4
3.36 3.18 .95  6 1270 5
3.96 3.50 .98  8 1450 6
3.31 3.04 .94  5 1200 7
3.33 3.87 .95  5 1340 8
3.60 3.54 .96  7 1350 9
4.00 3.27 .99 10 1480 a
3.28 3.30 .95  5 1280 b
3.44 3.29 .91  7 1080 c
3.25 3.17 .93  5 1170 d
3.75 3.62 .97  8 1410 e
3.30 3.34 .96  5 1330 f
3.20 3.08 .90  4 1010 g
3.50 3.37 .96  6 1340 h
3.28 3.16 .94  5 1220 i
3.17 3.20 .95  4 1270 j
3.31 3.10 .94  5 1210 k
;
TITLE 'LAW SCHOOL ADMISSIONS DATA';
PROC CORR; VAR Y X1 X2 X3 X4;
PROC REG; MODEL Y=X1 X2 X3 X4
  / COLLIN VIF;
PROC REG RIDGE = 0 TO .01 BY .001 OUTEST=B;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
PROC PLOT; PLOT (X1 X2 X3 X4) * _RIDGE_ / VREF=0 VPOS=25 HPOS=45;
PROC REG DATA = LAW RIDGE = 0 TO .01 BY .001 OUTEST=C OUTVIF;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
PROC REG DATA = LAW PCOMIT = 1 2 3 OUTEST=C;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
RUN;
LAW SCHOOL ADMISSIONS DATA
Correlation Analysis

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 20

              Y         X1         X2         X3         X4

Y       1.00000    0.47331    0.76094    0.95925    0.76574
            0.0     0.0350     0.0001     0.0001     0.0001

X1      0.47331    1.00000    0.52911    0.42078    0.65377
         0.0350        0.0     0.0164     0.0647     0.0018

X2      0.76094    0.52911    1.00000    0.69724    0.98781
         0.0001     0.0164        0.0     0.0006     0.0001

X3      0.95925    0.42078    0.69724    1.00000    0.69983
         0.0001     0.0647     0.0006        0.0     0.0006

X4      0.76574    0.65377    0.98781    0.69983    1.00000
         0.0001     0.0018     0.0001     0.0006        0.0
Model: MODEL1
Dependent Variable: Y

Analysis of Variance

                     Sum of         Mean
Source      DF      Squares       Square      F Value     Prob>F

Model        4      1.07143      0.26786       56.542     0.0001
Error       15      0.07106      0.00474
C Total     19      1.14249

Root MSE      0.06883     R-square     0.9378
Dep Mean      3.45450     Adj R-sq     0.9212
C.V.          1.99243
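The summary statistics in the ANOVA table are tied together by simple identities. As a check, a short Python sketch (an illustration only; the exercise itself uses SAS) recomputes F, R-square, Adj R-sq, Root MSE, and C.V. from the printed sums of squares and degrees of freedom. Because the inputs are the rounded printed values, the last digit can differ slightly from the SAS listing.

```python
# Recompute the ANOVA summary statistics from the printed sums of squares.
ss_model, ss_error, ss_total = 1.07143, 0.07106, 1.14249
df_model, df_error, df_total = 4, 15, 19
dep_mean = 3.45450

ms_model = ss_model / df_model                    # Mean Square, Model
ms_error = ss_error / df_error                    # Mean Square, Error (MSE)
f_value = ms_model / ms_error                     # F Value
r_square = ss_model / ss_total                    # R-square
adj_r_sq = 1 - ms_error / (ss_total / df_total)   # Adj R-sq
root_mse = ms_error ** 0.5                        # Root MSE
cv = 100 * root_mse / dep_mean                    # C.V., in percent of the mean

print(f"F = {f_value:.3f}, R-square = {r_square:.4f}, Adj R-sq = {adj_r_sq:.4f}")
print(f"Root MSE = {root_mse:.5f}, C.V. = {cv:.3f}")
```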
Parameter Estimates

                Parameter        Standard    T for H0:                  Variance
Variable  DF     Estimate           Error   Parameter=0  Prob > |T|     Inflation

INTERCEP   1    -2.378637     24.38266385       -0.098      0.9236    0.00000000
X1         1     0.125719      0.64288464        0.196      0.8476   96.67364263
X2         1     6.058256     32.14872369        0.188      0.8531  2280.9435770
X3         1     0.129773      0.01417076        9.158      0.0001    1.99014566
X4         1    -0.000878      0.00646192       -0.136      0.8937  2880.9838356
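The variance inflation factor for predictor j is VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing Xj on the other predictors; this equals the j-th diagonal element of the inverse of the predictors' correlation matrix. A minimal Python sketch of that identity, using the correlations among X1 to X4 from the PROC CORR output above. Those correlations are printed to only five decimals and the matrix is nearly singular, so the results only approximate the VIF column.

```python
import numpy as np

# Correlations among X1, X2, X3, X4, copied from the PROC CORR output.
R = np.array([
    [1.00000, 0.52911, 0.42078, 0.65377],
    [0.52911, 1.00000, 0.69724, 0.98781],
    [0.42078, 0.69724, 1.00000, 0.69983],
    [0.65377, 0.98781, 0.69983, 1.00000],
])

# VIF_j = 1 / (1 - R_j^2) is the j-th diagonal element of R^{-1}.
vif = np.diag(np.linalg.inv(R))

for name, v in zip(["X1", "X2", "X3", "X4"], vif):
    print(f"{name}: VIF ~ {v:.2f}")
```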
Collinearity Diagnostics

                         Condition  Var Prop  Var Prop  Var Prop  Var Prop  Var Prop
Number   Eigenvalue          Index  INTERCEP        X1        X2        X3        X4

     1      4.95347        1.00000    0.0000    0.0000    0.0000    0.0012    0.0000
     2      0.04096       10.99716    0.0000    0.0001    0.0000    0.5624    0.0000
     3      0.00348       37.70759    0.0000    0.0022    0.0000    0.3141    0.0004
     4      0.00209       48.66811    0.0000    0.0150    0.0000    0.1117    0.0005
     5   1.47569E-7           5794    1.0000    0.9827    1.0000    0.0106    0.9991
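Each condition index is the square root of the largest eigenvalue divided by that dimension's eigenvalue. The Python sketch below recomputes the indices from the printed eigenvalues. It assumes, as the COLLIN option does, that the intercept column is included and every column of X is scaled to unit length, so X'X has 1s on its diagonal and its five eigenvalues sum to 5. The cutoff of 30 is a commonly quoted rule of thumb, not something stated in this exercise.

```python
import numpy as np

# Eigenvalues printed under "Collinearity Diagnostics".
eigenvalues = np.array([4.95347, 0.04096, 0.00348, 0.00209, 1.47569e-7])

# Condition index for dimension k: sqrt(largest eigenvalue / eigenvalue k).
condition_index = np.sqrt(eigenvalues.max() / eigenvalues)

# With unit-length scaling the diagonal of X'X is all 1s, so the trace
# (the sum of the eigenvalues) equals the number of columns, here 5.
print("sum of eigenvalues:", round(float(eigenvalues.sum()), 4))
print("condition indices:", np.round(condition_index, 2))

# Rule-of-thumb cutoff (an assumption, not from the SAS output).
print("dimensions with index above 30:", np.flatnonzero(condition_index > 30) + 1)
```

The small differences from the printed Condition Index column (for example, about 37.73 instead of 37.70759) come from the eigenvalues being rounded in the listing.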
OBS  _MODEL_  _TYPE_  _DEPVAR_  _RIDGE_  _PCOMIT_    _RMSE_

  1  MODEL1   PARMS   Y            .         .      0.068829
  2  MODEL1   RIDGE   Y          0.000       .      0.068829
  3  MODEL1   RIDGE   Y          0.001       .      0.068871
  4  MODEL1   RIDGE   Y          0.002       .      0.068880
  5  MODEL1   RIDGE   Y          0.003       .      0.068886
  6  MODEL1   RIDGE   Y          0.004       .      0.068892
  7  MODEL1   RIDGE   Y          0.005       .      0.068899
  8  MODEL1   RIDGE   Y          0.006       .      0.068907
  9  MODEL1   RIDGE   Y          0.007       .      0.068915
 10  MODEL1   RIDGE   Y          0.008       .      0.068925
 11  MODEL1   RIDGE   Y          0.009       .      0.068935
 12  MODEL1   RIDGE   Y          0.010       .      0.068947

OBS  INTERCEP       X1        X2        X3           X4     Y

  1  -2.37864   0.12572   6.05826   0.12977   -.00087848    -1
  2  -2.37864   0.12572   6.05826   0.12977   -.00087848    -1
  3   0.89844   0.03989   1.73602   0.12932   -.00000776    -1
  4   1.17880   0.03249   1.36524   0.12907   0.00006863    -1
  5   1.28047   0.02977   1.23008   0.12883   0.00009765    -1
  6   1.33140   0.02838   1.16182   0.12859   0.00011320    -1
  7   1.36094   0.02755   1.12178   0.12836   0.00012307    -1
  8   1.37947   0.02700   1.09627   0.12813   0.00013001    -1
  9   1.39160   0.02663   1.07920   0.12790   0.00013524    -1
 10   1.39968   0.02637   1.06748   0.12767   0.00013938    -1
 11   1.40504   0.02617   1.05935   0.12744   0.00014279    -1
 12   1.40850   0.02603   1.05373   0.12721   0.00014568    -1
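The RIDGE= runs add a ridge constant k to the diagonal of the predictors' correlation matrix R before solving. A minimal Python sketch of that idea, assuming the correlation-form formulas b*(k) = (R + kI)^{-1} r_xy for the standardized coefficients and diag[(R + kI)^{-1} R (R + kI)^{-1}] for the ridge VIFs reported by OUTVIF. The helper names `ridge_standardized` and `ridge_vif` are hypothetical, and the rounded correlations from PROC CORR make the numbers approximate.

```python
import numpy as np

# Correlations copied from the PROC CORR output (five decimals, so approximate).
R = np.array([                                      # among X1, X2, X3, X4
    [1.00000, 0.52911, 0.42078, 0.65377],
    [0.52911, 1.00000, 0.69724, 0.98781],
    [0.42078, 0.69724, 1.00000, 0.69983],
    [0.65377, 0.98781, 0.69983, 1.00000],
])
r_xy = np.array([0.47331, 0.76094, 0.95925, 0.76574])  # corr(Xj, Y)


def ridge_standardized(R, r_xy, k):
    """Hypothetical helper: standardized ridge coefficients (R + kI)^{-1} r_xy."""
    return np.linalg.solve(R + k * np.eye(len(R)), r_xy)


def ridge_vif(R, k):
    """Hypothetical helper: ridge VIFs, diag of (R + kI)^{-1} R (R + kI)^{-1}."""
    A = np.linalg.inv(R + k * np.eye(len(R)))
    return np.diag(A @ R @ A)


# Same grid as RIDGE = 0 TO .01 BY .001 in the SAS job.
ks = np.round(np.arange(0.0, 0.0105, 0.001), 3)
vifs = np.array([ridge_vif(R, k) for k in ks])
coefs = np.array([ridge_standardized(R, r_xy, k) for k in ks])

for k, v, b in zip(ks, vifs, coefs):
    print(f"k={k:.3f}  VIF={np.round(v, 2)}  std. coef={np.round(b, 4)}")
```

At k = 0 the ridge VIFs reduce to the ordinary VIFs, and each ridge VIF shrinks as k grows. Converting the standardized coefficients back to the units of the SAS estimates would also need the standard deviations of Y and each X, which the output does not show.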
Plot of X1*_RIDGE_. [Ridge trace: X1 coefficient (vertical axis, 0.00 to 0.15) versus the ridge regression control value _RIDGE_ (horizontal axis, 0.000 to 0.010).]
Plot of X2*_RIDGE_. [Ridge trace: X2 coefficient (vertical axis, 0 to 6) versus the ridge regression control value _RIDGE_ (horizontal axis, 0.000 to 0.010).]
Plot of X3*_RIDGE_. [Ridge trace: X3 coefficient (vertical axis, 0.10 to 0.15) versus the ridge regression control value _RIDGE_ (horizontal axis, 0.000 to 0.010).]
Plot of X4*_RIDGE_. [Ridge trace: X4 coefficient (vertical axis, -0.001 to 0.00025, reference line at 0) versus the ridge regression control value _RIDGE_ (horizontal axis, 0.000 to 0.010).]
OBS  _MODEL_  _TYPE_    _DEPVAR_  _RIDGE_  _PCOMIT_    _RMSE_

  1  MODEL1   PARMS     Y            .         .      0.068829
  2  MODEL1   RIDGEVIF  Y          0.000       .          .
  3  MODEL1   RIDGE     Y          0.000       .      0.068829
  4  MODEL1   RIDGEVIF  Y          0.001       .          .
  5  MODEL1   RIDGE     Y          0.001       .      0.068871
  6  MODEL1   RIDGEVIF  Y          0.002       .          .
  7  MODEL1   RIDGE     Y          0.002       .      0.068880
  8  MODEL1   RIDGEVIF  Y          0.003       .          .
  9  MODEL1   RIDGE     Y          0.003       .      0.068886
 10  MODEL1   RIDGEVIF  Y          0.004       .          .
 11  MODEL1   RIDGE     Y          0.004       .      0.068892
 12  MODEL1   RIDGEVIF  Y          0.005       .          .
 13  MODEL1   RIDGE     Y          0.005       .      0.068899
 14  MODEL1   RIDGEVIF  Y          0.006       .          .
 15  MODEL1   RIDGE     Y          0.006       .      0.068907
 16  MODEL1   RIDGEVIF  Y          0.007       .          .
 17  MODEL1   RIDGE     Y          0.007       .      0.068915
 18  MODEL1   RIDGEVIF  Y          0.008       .          .
 19  MODEL1   RIDGE     Y          0.008       .      0.068925
 20  MODEL1   RIDGEVIF  Y          0.009       .          .
 21  MODEL1   RIDGE     Y          0.009       .      0.068935
 22  MODEL1   RIDGEVIF  Y          0.010       .          .
 23  MODEL1   RIDGE     Y          0.010       .      0.068947

OBS  INTERCEP        X1        X2        X3        X4     Y

  1  -2.37864    0.1257      6.06   0.12977     -0.00    -1
  2      .      96.6736   2280.94   1.99015   2880.98    -1
  3  -2.37864    0.1257      6.06   0.12977     -0.00    -1
  4      .       3.9053     59.05   1.95543     74.08    -1
  5   0.89844    0.0399      1.74   0.12932     -0.00    -1
  6      .       2.1861     17.99   1.94548     22.21    -1
  7   1.17880    0.0325      1.37   0.12907      0.00    -1
  8      .       1.8012      8.89   1.93597     10.72    -1
  9   1.28047    0.0298      1.23   0.12883      0.00    -1
 10      .       1.6537      5.48   1.92660      6.41    -1
 11   1.33140    0.0284      1.16   0.12859      0.00    -1
 12      .       1.5803      3.84   1.91732      4.34    -1
 13   1.36094    0.0275      1.12   0.12836      0.00    -1
 14      .       1.5373      2.93   1.90811      3.19    -1
 15   1.37947    0.0270      1.10   0.12813      0.00    -1
 16      .       1.5090      2.37   1.89898      2.48    -1
 17   1.39160    0.0266      1.08   0.12790      0.00    -1
 18      .       1.4887      2.00   1.88992      2.02    -1
 19   1.39968    0.0264      1.07   0.12767      0.00    -1
 20      .       1.4731      1.74   1.88093      1.70    -1
 21   1.40504    0.0262      1.06   0.12744      0.00    -1
 22      .       1.4606      1.55   1.87201      1.47    -1
 23   1.40850    0.0260      1.05   0.12721      0.00    -1

OBS  _MODEL_  _TYPE_  _DEPVAR_  _RIDGE_  _PCOMIT_   _RMSE_   INTERCEP        X1        X2        X3           X4    Y

  1  MODEL1   PARMS   Y            .         .     0.06883   -2.37864   0.12572   6.05826   0.12977   -.00087848   -1
  2  MODEL1   IPC     Y            .         1     0.06670    1.52761   0.02349   0.90751   0.12951   0.00015692   -1
  3  MODEL1   IPC     Y            .         2     0.10775   -0.66338  -0.13747   3.65788   0.06579   0.00053840   -1
  4  MODEL1   IPC     Y            .         3     0.13095   -0.76037   0.20866   2.78208   0.03571   0.00051382   -1
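The PCOMIT = 1 2 3 run reports incomplete principal component (IPC) estimates, which omit the last 1, 2, or 3 principal components of the predictors. A minimal Python sketch of that idea, assuming (as with ridge) that the components come from the correlation matrix R of X1 to X4: drop the components with the smallest eigenvalues and rebuild the standardized coefficients from the rest. `ipc_standardized` is a hypothetical helper, and the rounded correlations make its output approximate; it returns standardized coefficients, not the original-unit estimates in the table above.

```python
import numpy as np

# Correlation-form inputs copied from the PROC CORR output (five decimals).
R = np.array([                                      # among X1, X2, X3, X4
    [1.00000, 0.52911, 0.42078, 0.65377],
    [0.52911, 1.00000, 0.69724, 0.98781],
    [0.42078, 0.69724, 1.00000, 0.69983],
    [0.65377, 0.98781, 0.69983, 1.00000],
])
r_xy = np.array([0.47331, 0.76094, 0.95925, 0.76574])  # corr(Xj, Y)


def ipc_standardized(R, r_xy, omit):
    """Hypothetical helper: drop the `omit` smallest-eigenvalue components."""
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh sorts eigenvalues ascending
    lam = eigvals[omit:]                   # keep all but the `omit` smallest
    V = eigvecs[:, omit:]
    # Sum over retained components of (v_i' r_xy / lambda_i) * v_i.
    return V @ ((V.T @ r_xy) / lam)


print("eigenvalues of R:", np.round(np.linalg.eigvalsh(R), 5))
for m in (0, 1, 2, 3):
    print(f"omit {m}:", np.round(ipc_standardized(R, r_xy, m), 4))
```

With no components omitted the result equals the ordinary least squares solution R^{-1} r_xy; each omitted component removes one nearly degenerate direction from the estimate.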