Stat 5511

Stat 5511 - Lab 10
(1) Weighted least squares
Create a SAS file “lab10_1.sas” containing the following code:
options ls=80;
data one;
infile '~/5511/txt_files/data-table-B01.txt';
input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
run;
proc reg data=one;
model y=x7;
output out=a1 r=resid;
run;
proc plot data=a1;
plot y*x7;
/*To see if there is a linear relationship between x7 and y.*/
plot resid*x7;
/*To see if there is equal variance.*/
run;
data two;
set a1;
/*Read the data from data set a1.*/
sqrr=resid*resid;
/*Create a new variable: the squared residual.*/
run;
proc reg data=two;
/*Regress the squared residuals on x7; the predicted values estimate the variances SD_i^2.*/
model sqrr=x7;
output out=a2 p=pred_s;
run;
data three;
set a2;
wt=1/pred_s;
run;
proc reg data=three;
model y=x7/clb;
weight wt;
run;
/* The weight is wt = 1/SD_i^2, the reciprocal of the estimated variance. */
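A linear fit to the squared residuals can produce predicted variances that are zero or negative, in which case 1/pred_s is not a usable weight. The following is a minimal sketch of a guard, assuming data set a2 (with the variable pred_s) already exists from the steps above. The data set name three_safe is hypothetical, not part of the lab.

```sas
/* Sketch: assumes data set a2 with pred_s was created above. */
data three_safe;                     /* hypothetical data set name */
set a2;
if pred_s > 0 then wt = 1/pred_s;    /* weight = 1 / estimated variance */
else wt = .;                         /* missing weight: the observation is left out of the weighted fit */
run;
proc means data=three_safe n nmiss min max;
var pred_s wt;                       /* quick check of fitted variances and weights */
run;
proc reg data=three_safe;
model y=x7/clb;
weight wt;
run;
```

If NMISS for wt is greater than zero, some observations were dropped from the weighted fit, and a different variance model may be worth trying.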
(2) Example of problem 6.5 on p.199
options ls=80;
data;
infile '~/5511/txt_files/data-table-B01.txt';
input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
proc reg;
model y=x1 x2 x3 x4 x5 x6 x7 x8 x9/influence;
output out=a cookd=cook_d;
proc print data=a;
var cook_d;
run;

You can do some analysis like the following:
p = 9 + 1 = 10 (nine regressors plus the intercept)
n = 28, so 2p/n = 20/28 = 0.7143
From the hat matrix, all h_ii < 0.7143, so there are no leverage points.
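The leverage check above can also be done in a data step rather than by reading the printed output. This is a sketch only: the data set names b1, lev and lev_flag and the variable names hii, cutoff and high_lev are hypothetical, and p = 10 and n = 28 are taken from the calculation above.

```sas
options ls=80;
data b1;                                 /* hypothetical data set name */
infile '~/5511/txt_files/data-table-B01.txt';
input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
run;
proc reg data=b1;
model y=x1 x2 x3 x4 x5 x6 x7 x8 x9;
output out=lev h=hii;                    /* H= saves the hat-matrix diagonal h_ii */
run;
data lev_flag;
set lev;
cutoff = 2*10/28;                        /* 2p/n with p=10, n=28 */
high_lev = (hii > cutoff);               /* 1 if h_ii exceeds the cutoff, else 0 */
run;
proc print data=lev_flag;
var hii cutoff high_lev;
run;
```

Any row with high_lev = 1 would be a candidate leverage point under the 2p/n rule.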

You can also see that every value of Cook's D is less than 1, so by this criterion there are no obvious
influential points.

DFFITS, DFBETAS and COVRATIO can also be used for this analysis.
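As a sketch of how DFFITS and COVRATIO might be saved and screened, the code below uses the OUTPUT statement of PROC REG. The cutoffs 2*sqrt(p/n) for |DFFITS| and 1 +/- 3p/n for COVRATIO are commonly quoted rules of thumb and are an assumption here, not something stated in the lab. The names b2, infl, infl_flag, dff, covr, big_dffits and odd_covratio are hypothetical. DFBETAS are printed by the INFLUENCE option and are not saved in this sketch.

```sas
options ls=80;
data b2;                                         /* hypothetical data set name */
infile '~/5511/txt_files/data-table-B01.txt';
input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
run;
proc reg data=b2;
model y=x1 x2 x3 x4 x5 x6 x7 x8 x9/influence;    /* prints DFFITS, DFBETAS, COVRATIO */
output out=infl dffits=dff covratio=covr;        /* save DFFITS and COVRATIO */
run;
data infl_flag;
set infl;
big_dffits = (abs(dff) > 2*sqrt(10/28));         /* assumed cutoff 2*sqrt(p/n) */
odd_covratio = (abs(covr - 1) > 3*10/28);        /* assumed cutoff 1 +/- 3p/n */
run;
proc print data=infl_flag;
var dff covr big_dffits odd_covratio;
run;
```

With n = 28 and p = 10 the quantity 3p/n is larger than 1, so only the upper COVRATIO limit can actually be exceeded.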
(3) Example of problem 9.2 on p.300
options ls=80;
data;
infile '~/5511/txt_files/data-table-B01.txt';
input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
proc reg;
model y=x1 x2 x4 x7 x8 x9/selection=f;
proc reg;
model y=x1 x2 x4 x7 x8 x9/selection=b;
proc reg;
model y=x1 x2 x4 x7 x8 x9/selection=stepwise;
proc reg;
model y=x1 x2 x4 x7 x8 x9/selection=rsquare ADJRSQ cp best=3;
run;
With SELECTION=RSQUARE, the BEST= option sets the maximum number of subset models reported for each
model size.
For example, the last model will give you the following results:

Number in              Adjusted
  Model     R-Square   R-Square      C(p)   Variables in Model
      1       0.5447     0.5272   26.4722   x8
      1       0.3519     0.3270   47.8393   x1
      1       0.2974     0.2704   53.8843   x7
------------------------------------------------------------------
      2       0.7433     0.7227    6.4577   x2 x8
      2       0.6597     0.6325   15.7240   x2 x7
      2       0.6068     0.5754   21.5835   x1 x2
------------------------------------------------------------------
      3       0.7863     0.7596    3.6881   x2 x7 x8
      3       0.7775     0.7497    4.6637   x1 x2 x8
      3       0.7495     0.7182    7.7709   x2 x8 x9
------------------------------------------------------------------
      4       0.8012     0.7666    4.0385   x2 x7 x8 x9
      4       0.7949     0.7593    4.7301   x1 x2 x8 x9
      4       0.7893     0.7527    5.3561   x2 x4 x7 x8
------------------------------------------------------------------
      5       0.8069     0.7630    5.4064   x1 x2 x7 x8 x9
      5       0.8065     0.7625    5.4477   x2 x4 x7 x8 x9
      5       0.7956     0.7491    6.6594   x1 x2 x4 x8 x9
------------------------------------------------------------------
      6       0.8106     0.7564    7.0000   x1 x2 x4 x7 x8 x9

C(p) is one criterion for choosing a subset model. In general, small values of C(p) are desirable.
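To see how the Adjusted R-Square and C(p) columns relate to R-Square, the sketch below recomputes both for the subset x2 x7 x8 from the tabulated values. It assumes n = 28 and the standard formulas: adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1), and C(p) = SSE_p/MSE_full - n + 2p with p = k + 1, where SSE_p/SSE_full is rewritten as (1 - R-square_p)/(1 - R-square_full). The data set and variable names are hypothetical.

```sas
data cp_check;                   /* hypothetical data set name */
n = 28;                          /* number of observations */
k = 3;                           /* regressors in the subset x2 x7 x8 */
k_full = 6;                      /* regressors in the full candidate model */
rsq = 0.7863;                    /* subset R-Square from the table */
rsq_full = 0.8106;               /* full-model R-Square from the table */
adj_rsq = 1 - (1 - rsq)*(n - 1)/(n - k - 1);
cp = (1 - rsq)/(1 - rsq_full)*(n - k_full - 1) - n + 2*(k + 1);
put adj_rsq= cp=;                /* write both values to the SAS log */
run;
```

The values written to the log should be close to the tabulated 0.7596 and 3.6881. Small differences arise because the R-Square inputs are rounded to four decimals. For the full model the formula gives C(p) = p = 7, which matches the last row of the table.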