Lab 7- Stat 5511

Review from the previous lab: Type I SS and Type II SS
options ls=80;
data one;
infile '~/5511/txt_files/data-table-B01.txt';
input y x1 x2 x3 x4 x5 x6 x7 x8 x9;
run;
proc reg data=one;
model y=x2 x7 x8/ss1 ss2;
/*Treat this one as Model 1.*/
model y=x2 x8;
/*Treat this one as Model 2.*/
run;
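As a reminder of what the two options print, here is a sketch in the usual extra-sum-of-squares notation. SSR(A | B) means the increase in the regression sum of squares when A is added to a model that already contains B. SS1 gives sequential sums of squares in the order the predictors are listed in Model 1. SS2 gives each predictor's sum of squares adjusted for all the other predictors in Model 1. Because Model 2 is Model 1 without x7, comparing the two models gives the partial F-test for x7.

```latex
\begin{aligned}
\text{SS1 (sequential):}\quad & SSR(x_2),\; SSR(x_7 \mid x_2),\; SSR(x_8 \mid x_2, x_7)\\
\text{SS2 (partial):}\quad & SSR(x_2 \mid x_7, x_8),\; SSR(x_7 \mid x_2, x_8),\; SSR(x_8 \mid x_2, x_7)\\
SSR(x_7 \mid x_2, x_8) &= SSE(\text{Model 2}) - SSE(\text{Model 1})\\
F^* &= \frac{SSR(x_7 \mid x_2, x_8)/1}{SSE(\text{Model 1})/(n-4)}
\end{aligned}
```

The last predictor listed (x8) has the same Type I and Type II SS. For a single dropped predictor, F* equals the square of the t statistic for x7 in Model 1.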
(1) Create a SAS file that contains the following code:
options ls=80;
data pizza;
infile 'lab6_2.dat';
/*lab6_2.dat was created in the previous lab.*/
input month $ x1 x2 y;
run;
proc reg data=pizza;
model y = x1 x2/p r influence;
output out=result p=pred_y r=residual;
test x1,x2=0;
proc plot data=result;
plot residual*x1;
plot residual*x2;
plot residual*pred_y;
run;
proc univariate data=pizza normal plot;
var y;
/* NORMAL requests tests of normality (e.g. Shapiro-Wilk);
PLOT requests a stem-and-leaf plot, a box plot, and a normal probability plot.*/
run;
To get a partial F-test, use the test statement in proc reg. For example, suppose you want to test
whether at least one of x1 and x2 is significant. The hypothesis is H0: β1 = β2 = 0, and you write
test x1, x2 = 0; (equivalently, test x1=0, x2=0;). Since the partial F-test in this example tests all
the slope parameters in the model, it is equivalent to the ANOVA F-test. The influence option
requests a detailed analysis of the influence of each observation on the estimates and the
predicted values.
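In the general linear test form (standard textbook notation, not SAS output labels), the statistic that test x1, x2 = 0 reports compares the full model F with the reduced model R in which β1 = β2 = 0:

```latex
\begin{aligned}
H_0 &: \beta_1 = \beta_2 = 0 \qquad H_a: \text{not both } \beta_1, \beta_2 \text{ equal } 0\\
F^* &= \frac{[SSE(R) - SSE(F)]/(df_R - df_F)}{SSE(F)/df_F}
     = \frac{(SSTO - SSE)/2}{SSE/(n-3)} = \frac{MSR}{MSE}
\end{aligned}
```

Here the reduced model contains only the intercept. As a result, SSE(R) = SSTO, df_R = n - 1, and df_F = n - 3, so the numerator reduces to SSR/2 = MSR. F* is therefore exactly the ANOVA F statistic, compared against F(2, n - 3).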
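Residual plots help detect departures from the model assumptions: a curved pattern suggests a
missing nonlinear term, a funnel shape suggests nonconstant variance, and isolated points may be outliers.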
Note: When using proc plot, the first variable you enter in the plot statement goes on the y-axis,
and the second variable goes on the x-axis.
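The normality assumption concerns the error terms, not y itself. A reasonable extra check is therefore to run the same normality tests on the residuals. The sketch below reuses the result data set and the residual variable created by the output statement above:

```sas
/* Normality check on the residuals saved by the OUTPUT statement of PROC REG */
proc univariate data=result normal plot;
  var residual;   /* NORMAL: normality tests; PLOT: stem-and-leaf, box plot, normal probability plot */
run;
```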
(1b) Another way of doing the same as above (the code for both versions can be found on pp.
138-139, but with different data sets):
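options linesize=80;
data pizza;
infile 'lab6_2.dat';
input month $ x1 x2 y;
run;
proc reg data=pizza;
model y=x1 x2;
/* residual. and predicted. are PROC REG keywords for the residuals and fitted values;
npp.*residual. gives a normal probability plot of the residuals. */
plot residual.*(predicted. x1 x2);
plot npp.*residual.;
run;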
(2) Generating random variables
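The following code generates 1000 observations from the standard normal distribution, computes
their normal scores, and plots the scores against the data: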
options ls=80;
data one;
do i=1 to 1000;
y=normal(1);
output;
end;
run;
proc sort data=one;
by y;
run;
proc rank normal=blom out=yrank;
/*Here you create output data set yrank, which includes
variables y and nscore.*/
var y;
ranks nscore;
run;
proc plot data=yrank;
plot nscore*y;
label nscore = 'normal score'
y = 'normal random variable';
run;
proc print data=yrank;
run;
The normal function always returns standard normal values, so here we create 1000 random
numbers with distribution N(0,1) (normal with mean 0 and standard deviation 1).
If we want a normal random variable with mean 2 and standard deviation 5, then instead of
y=normal(1), we can use y=2+(normal(1) * 5).
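This works because Z ~ N(0,1) implies μ + σZ ~ N(μ, σ²). The sketch below draws 1000 values from N(2, 5²), and PROC MEANS lets you confirm that the sample mean and standard deviation are close to 2 and 5. The data set name two and the variable name n are illustrative only:

```sas
data two;                      /* illustrative data set name */
  do i = 1 to 1000;
    n = 2 + (normal(1) * 5);   /* mean 2, standard deviation 5 */
    output;
  end;
run;

/* The sample mean and standard deviation should be near 2 and 5 */
proc means data=two mean std;
  var n;
run;
```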
In the proc rank statement, the option normal=blom tells proc rank to convert the ranks into
normal scores using Blom's method, and out=yrank names the output data set. The ranks statement
names the new variable (nscore) that holds the scores.
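For reference, the PROC RANK documentation gives Blom's normal score for the observation with rank r_i among n nonmissing values as follows, where Φ^{-1} is the standard normal quantile function:

```latex
\text{nscore}_i = \Phi^{-1}\!\left(\frac{r_i - 3/8}{n + 1/4}\right)
```

For example, with n = 1000 the smallest value (r_i = 1) gets Φ^{-1}(0.625/1000.25) ≈ Φ^{-1}(0.000625), which is about -3.23.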
Note:
(a) In our example, we have normal(1); the argument is the seed. In place of the 1, we could put
any other positive integer (less than 2^31 - 1). With a positive seed, the data we obtain will be the
same every time we run our SAS file. If the seed is 0 or negative (e.g. -1), SAS uses the time of day
as the seed, so the generated values change every time we run SAS.
The label statement in proc plot simply attaches a descriptive label to each axis (by default, the
variable name appears on the axis).
(b) Similarly, we can use uniform(2) or ranexp(-1) instead of normal(1) to generate random
numbers with distributions U(0,1) or Exp(1).
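The sketch below follows the same pattern as the normal example. The data set name three and the variables u and e are illustrative, and both seeds are positive, so the values are reproducible. U(0,1) has mean 0.5 and standard deviation about 0.289, and Exp(1) has mean 1 and standard deviation 1, so PROC MEANS gives a rough check:

```sas
data three;                 /* illustrative data set name */
  do i = 1 to 1000;
    u = uniform(2);         /* U(0,1) */
    e = ranexp(3);          /* Exp(1), mean 1 */
    output;
  end;
run;

proc means data=three mean std;
  var u e;
run;
```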
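(c) proc rank computes the normal scores from the ranks with the appropriate formula (here, Blom's).
Plotting these scores against the data, as the code above does, gives a normal probability plot: if the
points fall roughly on a straight line, the data are consistent with a normal distribution.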
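One way to measure how straight the normal plot is uses the correlation between the data and their normal scores. This sketch reuses the yrank data set created above. A coefficient close to 1 is consistent with normality. This is the idea behind the correlation test for normality, whose critical values are tabulated in regression textbooks. Compare the result against your text's table rather than a fixed cutoff.

```sas
/* Correlation between the data and their Blom normal scores */
proc corr data=yrank;
  var y nscore;
run;
```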
(d) We can also use a normal probability-probability (P-P) plot to check the normality of the data,
using the following statements:
proc univariate data=one;
ppplot y;
run;
However, some terminals cannot display this graph. For example, you cannot view this normal
probability plot directly if you connect to the server through a PuTTY terminal.
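If the graph cannot be displayed in your session, one workaround is to write the plots to a PDF file and open that file on your own computer. This sketch assumes your SAS installation supports ODS Graphics and ODS PDF, and the file name normplot.pdf is only an example. The QQPLOT statement adds a normal Q-Q plot, which is the more common form of normal probability plot:

```sas
ods graphics on;
ods pdf file='normplot.pdf';   /* example file name */

proc univariate data=one;
  var y;
  ppplot y / normal(mu=est sigma=est);   /* normal P-P plot */
  qqplot y / normal(mu=est sigma=est);   /* normal Q-Q plot with reference line */
run;

ods pdf close;
```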