Stat 5511

Stat 5511 - Lab 5 – Fall 2011
1) Learn how to plot confidence and prediction bands (see figure 2.12 on p. 46 of text)
options ls=80;
data temp; /* names the data temp */
infile '~/5511/txt_files/data-table-B02.txt';
input y x1 x2 x3 x4 x5;
proc reg data=temp;
model y=x1/p cli clm; /* print predicted values with prediction (cli) and mean (clm) limits */
output out=cplot p=pred_y l95m=lm u95m=um l95=lp u95=up;
proc plot data=cplot;
plot pred_y*x1='*' lm*x1='-' um*x1='-' lp*x1='+' up*x1='+'/overlay;
run;
This code uses several statements we have not yet seen. The output statement is available in
several procedures. It saves information created by the reg procedure that would otherwise be
lost (i.e. the information we are saving with output is not kept in memory after the procedure
ends). The out= option names the SAS data set that we are saving. This data set includes every
variable and observation from the input data set as well as the values requested on the rest of
the output statement. p= saves the predicted y values, l95m= saves the lower bound of the 95%
confidence interval for the mean response at each observation, u95m= saves the corresponding
upper bound, and l95=/u95= save the lower/upper bounds of the 95% prediction intervals.
The name on the right side of each equals sign is the variable name we assign to the saved
values (e.g. the output statement has p=pred_y, so to use the predicted values we refer to the
variable pred_y). Note: This is the opposite of an assignment in the body of our SAS file, where
the new variable is on the left (e.g. new_y=y-10 defines new_y from the value of y for each
observation).
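One way to confirm what the output statement saved is to list the new data set. The step below is only a sketch; it assumes the proc reg step above has already run and created cplot with the variable names chosen there.

```sas
/* Sketch: list the variables saved by the output statement.    */
/* Assumes cplot was created by the proc reg step shown above.  */
proc print data=cplot;
var x1 y pred_y lm um lp up; /* original x and y plus the five saved columns */
run;
```

Each row should show lp < lm < pred_y < um < up, since the prediction limits lie outside the confidence limits for the mean.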
In the plot procedure, we can choose a plotting symbol for each plot request. The first request
uses asterisks (*) for the predicted values (the fitted line), the second and third use minus
signs (-) for the confidence limits for the mean, and the fourth and fifth use plus signs (+) for
the prediction limits. The slash on the plot statement does the same thing as the slash on the
model statement: it separates the arguments of the statement from the options. The overlay
option draws all five plots on the same set of axes.
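The difference between the two pairs of bands can be seen from the usual simple linear regression formulas, sketched here in LaTeX. The notation is our assumption and may differ from the text: x_0 is the x value at which the band is evaluated, MS_Res is the residual mean square, S_xx is the sum of squared deviations of x about its mean, and n is the number of observations.

```latex
% 95% confidence limits for the mean response at x_0 (the '-' curves)
\hat{y}_0 \pm t_{0.025,\,n-2}
  \sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right)}

% 95% prediction limits for a new observation at x_0 (the '+' curves)
\hat{y}_0 \pm t_{0.025,\,n-2}
  \sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right)}
```

The extra 1 under the square root is why the prediction limits always lie outside the confidence limits for the mean, and the (x_0 - x̄)² term is why both bands widen as x_0 moves away from the mean of x.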
2) Matrices in SAS with SAS/IML (Interactive Matrix Language):
options ls=80;
proc iml; /* here we use Interactive Matrix Language */
A={1 1 1, 1 -1 0, 1 2 3};/*input matrix by row, separated by comma*/
b={4,6,7};
C={1 2 1, 0 4 5, 3 1 3};
print A b C;
trA=t(A);/*transpose of A*/
print trA;
D=A*C; /* matrix multiplication */
print D;
A_inv=inv(A); /* inverse of A*/
s=A_inv*b;
print A_inv;
print s;
quit; /* leave proc iml (statements in iml execute immediately, so no run is needed) */
Here are the results:
The SAS System                    07:41 Friday, February 25, 2011   1

        A                b            C
  1     1     1          4      1     2     1
  1    -1     0          6      0     4     5
  1     2     3          7      3     1     3

       trA
  1     1     1
  1    -1     2
  1     0     3

        D
  4     7     9
  1    -2    -4
 10    13    20

            A_inv
        1 0.3333333 -0.333333
        1 -0.666667 -0.333333
       -1 0.3333333 0.6666667

        s
3.6666667
-2.333333
2.6666667
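Since s=A_inv*b is the solution of the linear system A*s=b, the result can be checked by multiplying back. The sketch below also uses the IML solve function, which solves the system without forming the inverse explicitly; the names s2 and check are ours and are not part of the lab.

```sas
proc iml;
A={1 1 1, 1 -1 0, 1 2 3};
b={4,6,7};
s2=solve(A,b);  /* solves A*s2=b directly, without computing inv(A) */
check=A*s2;     /* should reproduce b up to rounding */
print s2 check;
quit;
```

s2 should agree with the s printed above, and check should match b.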
3) In the following SAS code, x and y are roughly related by y=3x, while the values of rand_x
and rand_y were created by a random number generator.
options ls=80;
data two;
input x y rand_x rand_y;
lines;
10 30 6 15
12 34 47 11
6 16 24 20
9 29 36 0
5 15 2 3
8 23 7 9
9 26 38 0
7 18 47 17
7 19 5 11
10 28 0 3
;
proc corr;
var x y rand_x rand_y;
proc corr fisher (rho0=0.6);
var x y;
run;
proc corr is the procedure that calculates correlation coefficients for your data (and a
corresponding p-value for each, based on the null hypothesis ρ=0). For proc corr to be
meaningful, you should always list at least two variables in the var statement.
In the output, the top number in each cell is the correlation and the number underneath it is
the p-value. Also, you can see that the correlation matrix is symmetric (since ρ(x,y)=ρ(y,x)).
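For reference, the p-value reported for a Pearson correlation corresponds to the usual t test of ρ=0, sketched below. The symbols are our notation: r is the sample correlation and n is the number of observations (n=10 in this data set, so the statistic has 8 degrees of freedom).

```latex
% Test statistic for H_0: \rho = 0, compared with a t distribution on n-2 df
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad t \sim t_{n-2} \ \text{under } H_0
```

A correlation near 0, as we would expect between rand_x and rand_y, gives a small |t| and a large p-value, while the strong linear relationship between x and y gives a large |t| and a small p-value.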
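To test H0: ρ=0.6, use the fisher option in proc corr. This option applies the Fisher z
transformation to the sample correlation; the transformed value is approximately normally
distributed, which gives a test of the hypothesized value and confidence limits for ρ. The
syntax for the fisher option is shown in the example code.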
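The idea behind the fisher option can be written out as follows. This is a sketch of the standard large-sample result in our own notation (r is the sample correlation, ρ_0 the hypothesized value, n the sample size); proc corr may additionally apply a small bias adjustment, so its printed values can differ slightly from a hand calculation.

```latex
% Fisher z transformation of the sample correlation
z_r = \tfrac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) = \tanh^{-1}(r)

% Approximate sampling distribution
z_r \;\approx\; N\!\left(\tanh^{-1}(\rho),\ \frac{1}{n-3}\right)

% Test statistic for H_0: \rho = \rho_0, compared with a standard normal
Z = \sqrt{n-3}\,\bigl(\tanh^{-1}(r) - \tanh^{-1}(\rho_0)\bigr)
```

In this example ρ_0=0.6 and n=10, so the difference of the transformed values is multiplied by √7.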