Stat 5511 - Lab 10

(1) Weighted Least Squares

Create a SAS file "lab10_1.sas" containing the following code:

    options ls=80;
    data one;
      infile '~/5511/txt_files/data-table-B01.txt';
      input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
    run;

    proc reg data=one;
      model y=x7;
      output out=a1 r=resid;
    run;

    proc plot data=a1;
      plot y*x7;        /* To see if there is a linear relationship between x7 and y. */
      plot resid*x7;    /* To see if the variance is constant. */
    run;

    data two;
      set a1;           /* Read data from data set a1. */
      sqrr=resid*resid; /* Create a new variable: the squared residual. */
    run;

    proc reg data=two;  /* Fit a least squares model to predict SD_i^2. */
      model sqrr=x7;
      output out=a2 p=pred_s;
    run;

    data three;
      set a2;
      wt=1/pred_s;
    run;

    proc reg data=three;
      model y=x7/clb;
      weight wt;        /* Weight = 1 / SD_i^2 */
    run;

(2) Example of problem 6.5 on p. 199

    options ls=80;
    data;
      infile '~/5511/txt_files/data-table-B01.txt';
      input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
    proc reg;
      model y=x1 x2 x3 x4 x5 x6 x7 x8 x9/influence;
      output out=a cookd=cook_d;
    proc print data=a;
      var cook_d;
    run;

You can do some analysis like the following:

    p = 9 + 1 = 10,  n = 28,  2p/n = 0.7143

From the hat matrix, no h_ii exceeds 0.7143, so there are no leverage points. You can also see that all values of Cook's D are less than 1, so by this criterion there are no obvious influential points. DFFITS, DFBETAS and COVRATIO can also be used in the analysis.

(3) Example of problem 9.2 on p. 300

    options ls=80;
    data;
      infile '~/5511/txt_files/data-table-B01.txt';
      input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
    proc reg;
      model y=x1 x2 x4 x7 x8 x9/selection=f;         /* forward selection */
    proc reg;
      model y=x1 x2 x4 x7 x8 x9/selection=b;         /* backward elimination */
    proc reg;
      model y=x1 x2 x4 x7 x8 x9/selection=stepwise;  /* stepwise selection */
    proc reg;
      model y=x1 x2 x4 x7 x8 x9/selection=rsquare ADJRSQ cp best=3;
    run;

If SELECTION=RSQUARE, the BEST= option sets the maximum number of subset models reported for each size.
For example, the last model statement gives the following results:

    Number in             Adjusted
      Model    R-Square   R-Square      C(p)   Variables in Model

        1       0.5447     0.5272    26.4722   x8
        1       0.3519     0.3270    47.8393   x1
        1       0.2974     0.2704    53.8843   x7
    ------------------------------------------------------------------
        2       0.7433     0.7227     6.4577   x2 x8
        2       0.6597     0.6325    15.7240   x2 x7
        2       0.6068     0.5754    21.5835   x1 x2
    ------------------------------------------------------------------
        3       0.7863     0.7596     3.6881   x2 x7 x8
        3       0.7775     0.7497     4.6637   x1 x2 x8
        3       0.7495     0.7182     7.7709   x2 x8 x9
    ------------------------------------------------------------------
        4       0.8012     0.7666     4.0385   x2 x7 x8 x9
        4       0.7949     0.7593     4.7301   x1 x2 x8 x9
        4       0.7893     0.7527     5.3561   x2 x4 x7 x8
    ------------------------------------------------------------------
        5       0.8069     0.7630     5.4064   x1 x2 x7 x8 x9
        5       0.8065     0.7625     5.4477   x2 x4 x7 x8 x9
        5       0.7956     0.7491     6.6594   x1 x2 x4 x8 x9
    ------------------------------------------------------------------
        6       0.8106     0.7564     7.0000   x1 x2 x4 x7 x8 x9

C(p) is one criterion for choosing a subset model. In general, small values of C(p) are desirable. Here the three-variable model x2 x7 x8 has the smallest C(p), 3.6881.
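
To make the C(p) column easier to interpret, the following sketch writes out the usual Mallows statistic and checks one table entry by hand. It assumes that the C(p) reported here is the standard form, with the error variance estimated by the mean square error of the full six-variable model, and that n = 28 as in part (2), since the same data file is used. The symbols SSE_p, p_full and R^2_p are notation introduced for this illustration; p counts the intercept.

```latex
% Mallows' C_p for a subset model with p parameters (intercept included),
% assuming sigma^2 is estimated by the MSE of the full model:
\[
  C_p \;=\; \frac{SSE_p}{\hat{\sigma}^2_{\text{full}}} - n + 2p
      \;=\; (n - p_{\text{full}})\,\frac{1 - R^2_p}{1 - R^2_{\text{full}}} - n + 2p .
\]
% Check for the subset x2 x7 x8: p = 4, n = 28, p_full = 7,
% R^2_p = 0.7863, R^2_full = 0.8106 (both taken from the table).
\[
  C_p \;\approx\; 21 \cdot \frac{0.2137}{0.1894} - 28 + 8 \;\approx\; 3.69 .
\]
% For the full model the ratio equals 1, so C_p = 21 - 28 + 14 = 7 = p_full.
```

The hand value agrees with the tabulated 3.6881 up to the rounding of the R-square values to four decimals, and the full model reproduces the tabulated 7.0000 exactly, which is consistent with the assumed formula.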