Lesson 8 - Topics
• Creating SAS datasets from procedures
• Using ODS and data steps to make reports
• Using PROC RANK
• Programs 14-15 in course notes
• LSB 4:11;5:3
Making SAS Datasets From Procedures
Output from SAS PROCs can be put into SAS datasets:
1. To do further processing of the information from the output
2. To reformat output to make a report
3. To restructure original SAS dataset or create new variables
Ways to Put Output into SAS Datasets
• Using OUTPUT statement available from many procedures
• Using ODS OUTPUT statement – any output table can be
put into a SAS dataset
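To use ODS OUTPUT you need the name of the output table. A minimal sketch of one way to find it, using ODS TRACE, which writes the name of each output table to the log. The dataset sashelp.class and the variable weight are stand-ins chosen for illustration, not part of the course data.

```sas
* Assumption: sashelp.class is used only as a convenient sample dataset;
ODS TRACE ON;    /* write the name of each output table to the log */
PROC UNIVARIATE DATA = sashelp.class;
VAR weight;
RUN;
ODS TRACE OFF;   /* stop writing table names to the log */
```

The log should then list the tables that PROC UNIVARIATE produced, and any of those names can be used on the left side of an ODS OUTPUT statement.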
Report We Want to Generate
Quartiles of Weight by Gender and Center
sex      clinic    N     P25     P50     P75
Male     A         9   180.0   190.0   208.0
Male     B        16   158.3   174.8   218.3
Male     C        29   178.0   199.5   220.5
Male     D        11   172.0   184.5   194.0
Female   A         6   125.0   143.5   160.5
Female   B         9   150.0   164.5   184.0
Female   C         6   132.5   134.3   138.5
Female   D         6   131.0   137.5   148.5
Program 14
LIBNAME class 'C:\SAS_Files';
* Will use SAS dataset version of TOMHS data;
DATA wt;
SET class.tomhsp (KEEP=ptid age sex clinic wtbl wt12);
wtchg = wt12 - wtbl;
RUN;
PROC FORMAT;
VALUE sexF 1 = 'Male' 2 = 'Female';
RUN;
* Create report by sex and clinic of univariate info;
PROC SORT DATA = wt;
BY sex clinic;
RUN;
PROC UNIVARIATE DATA = wt NOPRINT;
BY sex clinic;
VAR wt12 ;
OUTPUT OUT=univinfo   /* Name of new dataset */
N      = n            /* Statistic name = variable name */
Q1     = p25
MEDIAN = p50
Q3     = p75 ;
RUN;
Dataset univinfo will have one observation for each combination of sex and clinic.
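The OUTPUT statement is available in other procedures as well. As a sketch of an alternative (not the approach used in Program 14), PROC MEANS with a CLASS statement can produce a similar dataset without sorting first. It assumes the dataset wt from Program 14; the output dataset name meansinfo is made up for illustration.

```sas
* Assumes dataset wt from Program 14. The name meansinfo is hypothetical;
PROC MEANS DATA = wt NOPRINT NWAY;   /* NWAY keeps only the sex*clinic rows */
CLASS sex clinic;                    /* CLASS does not require sorted data */
VAR wt12;
OUTPUT OUT = meansinfo
N      = n
Q1     = p25
MEDIAN = p50
Q3     = p75 ;
RUN;
```

PROC MEANS also adds the automatic variables _TYPE_ and _FREQ_ to the output dataset; they can be dropped if they are not needed in the report.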
PROC PRINT DATA = univinfo;
FORMAT sex sexF.;
RUN;
Obs   sex      clinic    n      p75      p50      p25
1     Male     A         9   208.00   190.00   180.00
2     Male     B        16   218.25   174.75   158.25
3     Male     C        29   220.50   199.50   178.00
4     Male     D        11   194.00   184.50   172.00
5     Female   A         6   160.50   143.50   125.00
6     Female   B         9   184.00   164.50   150.00
7     Female   C         6   138.50   134.25   132.50
8     Female   D         6   148.50   137.50   131.00
PROC PRINT DATA = univinfo NOOBS;
VAR sex clinic n p25 p50 p75;
FORMAT sex sexF. p25 p50 p75 6.1 ;
TITLE 'Quartiles of Weight by Gender/Center';
RUN;
Quartiles of Weight by Gender/Center
sex      clinic    N     P25     P50     P75
Male     A         9   180.0   190.0   208.0
Male     B        16   158.3   174.8   218.3
Male     C        29   178.0   199.5   220.5
Male     D        11   172.0   184.5   194.0
Female   A         6   125.0   143.5   160.5
Female   B         9   150.0   164.5   184.0
Female   C         6   132.5   134.3   138.5
Female   D         6   131.0   137.5   148.5
Using ODS to Send Output to a SAS Dataset
Syntax:
ODS OUTPUT output-table = new-data-set;
* Output quantile table to a dataset;
ODS OUTPUT quantiles = qwt;
PROC UNIVARIATE DATA = wt ;
VAR wtbl wt12 ;
RUN;
ODS OUTPUT CLOSE ;
PROC PRINT DATA=qwt;
RUN;
Display of Output Dataset
Obs   Varname   Quantile      Estimate
1     wtbl      100% Max        279.30
2     wtbl      99%             274.15
3     wtbl      95%             246.40
4     wtbl      90%             237.40
5     wtbl      75% Q3          215.15
6     wtbl      50% Median      192.65
7     wtbl      25% Q1          165.90
8     wtbl      10%             141.50
9     wtbl      5%              137.40
10    wtbl      1%              130.25
11    wtbl      0% Min          128.50
12    wt12      100% Max        271.50
13    wt12      99%             271.50
14    wt12      95%             239.00
15    wt12      90%             227.00
16    wt12      75% Q3          202.50
17    wt12      50% Median      180.00
18    wt12      25% Q1          153.50
19    wt12      10%             133.00
20    wt12      5%              130.00
21    wt12      1%              123.00
22    wt12      0% Min          123.00
We would like to put the wtbl and wt12 quantiles side-by-side
* Separate the data into 2 datasets;
DATA wtbl wt12 ;
SET qwt;
if varname = 'wtbl' then output wtbl; else
if varname = 'wt12' then output wt12;
RUN;
* PROC DATASETS used for changing variable names;
PROC DATASETS ;
MODIFY wtbl;
RENAME estimate = wtbl;
MODIFY wt12;
RENAME estimate = wt12;
RUN;
QUIT;
* Put 2 datasets side-by-side;
DATA all;
MERGE wtbl wt12;
DROP varname;
RUN;
PROC PRINT;
RUN;
Obs   Quantile        wtbl     wt12
1     100% Max      279.30   271.50
2     99%           274.15   271.50
3     95%           246.40   239.00
4     90%           237.40   227.00
5     75% Q3        215.15   202.50
6     50% Median    192.65   180.00
7     25% Q1        165.90   153.50
8     10%           141.50   133.00
9     5%            137.40   130.00
10    1%            130.25   123.00
11    0% Min        128.50   123.00
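The split, rename, and merge steps can also be combined. The following is a sketch of one possible shortcut, not the method used in the course notes: the WHERE= and RENAME= dataset options select and rename while the data are read, so a single DATA step is enough. It assumes the dataset qwt created by the ODS OUTPUT statement; the output dataset name all2 is made up for illustration.

```sas
* Assumes dataset qwt from the ODS OUTPUT step. The name all2 is hypothetical;
DATA all2;
/* Read qwt twice: once for each variable, renaming estimate on the way in */
MERGE qwt (WHERE=(varname = 'wtbl') RENAME=(estimate = wtbl))
      qwt (WHERE=(varname = 'wt12') RENAME=(estimate = wt12));
DROP varname;
RUN;
PROC PRINT DATA = all2;
RUN;
```

Like the original, this is a one-to-one MERGE with no BY statement, so it relies on the quantiles appearing in the same order for both variables.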
ODS OUTPUT ParameterEstimates (persist=proc) = betas;
PROC REG DATA=WT;
MODEL dbpchg = wtchg age sex;
RUN;
PROC REG data=wt;
MODEL sbpchg = wtchg age sex;
RUN;
ODS OUTPUT CLOSE;
PROC PRINT DATA=betas;
RUN;
Display of Output Dataset - Report
Obs   Dependent   Variable    Estimate   StdErr   tValue   Probt
1     dbpchg      Intercept     -0.059    6.431    -0.01    0.99
2     dbpchg      wtchg          0.175    0.073     2.38    0.02
3     dbpchg      age           -0.101    0.112    -0.91    0.37
4     dbpchg      sex           -2.622    1.572    -1.67    0.10
5     sbpchg      Intercept     -3.849   13.304    -0.29    0.77
6     sbpchg      wtchg          0.364    0.152     2.40    0.02
7     sbpchg      age           -0.042    0.231    -0.18    0.86
8     sbpchg      sex           -4.118    3.253    -1.27    0.21
Display of Output Dataset Using BY Statement
PROC PRINT;
VAR variable estimate stderr tvalue probt;
BY dependent NOTSORTED;
FORMAT estimate 7.3 stderr 7.3 probt pvalue5.2 ;
Dependent=dbpchg
Obs   Variable    Estimate   StdErr   tValue   Probt
1     Intercept     -0.059    6.431    -0.01    0.99
2     wtchg          0.175    0.073     2.38    0.02
3     age           -0.101    0.112    -0.91    0.37
4     sex           -2.622    1.572    -1.67    0.10
Dependent=sbpchg
Obs   Variable    Estimate   StdErr   tValue   Probt
5     Intercept     -3.849   13.304    -0.29    0.77
6     wtchg          0.364    0.152     2.40    0.02
7     age           -0.042    0.231    -0.18    0.86
8     sex           -4.118    3.253    -1.27    0.21
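Once the estimates are in a dataset, a DATA step can do further processing on them. A minimal sketch, assuming the dataset betas created above: it adds approximate 95% confidence limits using the normal multiplier 1.96. The names betas_ci, lower, and upper are made up for illustration, and the CLB option on the MODEL statement of PROC REG would be the way to get exact t-based limits.

```sas
* Assumes dataset betas from ODS OUTPUT. Names betas_ci, lower, upper are hypothetical;
DATA betas_ci;
SET betas;
lower = estimate - 1.96*stderr;   /* approximate lower 95% limit */
upper = estimate + 1.96*stderr;   /* approximate upper 95% limit */
RUN;
PROC PRINT DATA = betas_ci;
VAR dependent variable estimate lower upper;
FORMAT estimate lower upper 7.3;
RUN;
```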
PROC RANK
• Used to divide observations into equal size
categories based on values of a variable
• Creates a new variable containing the categories
• New variable is added to the dataset or to a new
dataset
• Example: Divide weight change into 5 equal
categories (quintiles)
PROC RANK SYNTAX
PROC RANK DATA = dataset OUT = outdataset GROUPS = # of categories;
VAR varname;
RANKS newvarname;
Most of the time you can set OUT to be the same dataset
specified in DATA.
PROC RANK writes no output
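A small sketch of the quintile example, using made-up data so it can be run on its own. The dataset demo, its ten values, and the names demo5 and qwtchg are hypothetical and are not from TOMHS. With GROUPS=5 the new variable takes the values 0 through 4; the PROC RANK documentation gives the group as FLOOR(rank*k/(n+1)), where k is the GROUPS= value and n is the number of nonmissing values.

```sas
* Hypothetical toy data: ten weight changes, not from TOMHS;
DATA demo;
INPUT id wtchg;
DATALINES;
1 -20
2 -15
3 -12
4 -9
5 -7
6 -4
7 -2
8 0
9 3
10 8
;
RUN;

* GROUPS=5 asks for quintiles, coded 0 through 4;
PROC RANK DATA = demo GROUPS = 5 OUT = demo5;
VAR wtchg;
RANKS qwtchg;
RUN;

* PROC RANK prints nothing, so list the new dataset to see the groups;
PROC PRINT DATA = demo5;
RUN;
```

Writing to a separate dataset (demo5) rather than overwriting demo keeps the original data unchanged, which can be convenient while experimenting.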
PROGRAM 15
LIBNAME class 'C:\SAS_Files';
DATA wtchol;
SET class.tomhsp (KEEP=ptid clinic sex wtbl
wt12 cholbl chol12);
wtchg = wt12 - wtbl;
cholchg = chol12 - cholbl;
RUN;
*This PROC will add a new variable to dataset
which is the tertile of weight change. The
new variable will be 0,1,or 2;
PROC RANK DATA = wtchol
GROUPS=3 OUT = wtchol;
VAR wtchg;
RANKS twtchg;   /* Name of new variable */
RUN;
PARTIAL LOG
8    DATA wtchol;
9    SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
10   wtchg = wt12 - wtbl;
11   cholchg = chol12 - cholbl;
12   RUN;
NOTE: There were 100 observations read from the data set CLASS.TOMHSP.
NOTE: The data set WORK.WTCHOL has 100 observations and 9 variables.
19   PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol;
20   VAR wtchg; RANKS twtchg;
21   RUN;
NOTE: The data set WORK.WTCHOL has 100 observations and 10 variables.
PROC FREQ DATA = wtchol;
TABLES twtchg;
RUN;
OUTPUT:
Rank for Variable wtchg
                                   Cumulative    Cumulative
twtchg    Frequency     Percent     Frequency       Percent
------------------------------------------------------------
0                31       33.70            31         33.70
1                30       32.61            61         66.30
2                31       33.70            92        100.00
Frequency Missing = 8
PROC PRINT DATA = wtchol (obs=20);
VAR ptid wtchg twtchg;
TITLE 'Partial Listing of Dataset wtchol with new variable added';
RUN;
Partial Listing of Dataset wtchol with new variable added
Obs   PTID       wtchg   twtchg
1     A00083    -12.00        1
2     A00301       .          .
3     A00312     -9.50        1
4     A00354    -21.00        0
5     A00400       .          .
6     A00504     -9.25        1
7     A00608       .          .
8     A00720    -18.50        0
9     A00762     -5.25        2
10    A00811     -6.75        1
PROC MEANS N MEAN MIN MAX MAXDEC=2;
VAR cholchg wtchg;
CLASS twtchg;
TITLE 'Mean Cholesterol Change by Tertile of Weight Change';
RUN;
Mean Cholesterol Change by Tertile of Weight Change
The MEANS Procedure
Rank for
Variable     N
wtchg      Obs   Variable     N      Mean   Minimum   Maximum
--------------------------------------------------------------
0           31   cholchg     30    -13.43    -55.00     47.00
                 wtchg       31    -22.51    -36.50    -14.30
1           30   cholchg     30     -4.70    -37.00     26.00
                 wtchg       30    -10.21    -14.00     -6.80
2           31   cholchg     31     -0.74    -52.00     45.00
                 wtchg       31     -1.82     -6.50     13.00
--------------------------------------------------------------
Cutpoints for tertiles: the Minimum and Maximum of wtchg within each group
Could graph this data in an x-y plot (3 points)
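One possible way to make that plot, sketched here as a suggestion rather than part of Program 15: save the tertile means with the OUTPUT statement of PROC MEANS, then plot them. It assumes the dataset wtchol with the variable twtchg from Program 15 and a SAS version that includes PROC SGPLOT; the names tmeans, mwtchg, and mcholchg are made up for illustration.

```sas
* Assumes dataset wtchol from Program 15. Names tmeans, mwtchg, mcholchg are hypothetical;
PROC MEANS DATA = wtchol NOPRINT NWAY;
CLASS twtchg;                          /* one output row per tertile */
VAR wtchg cholchg;
OUTPUT OUT = tmeans MEAN = mwtchg mcholchg;
RUN;

* Plot mean cholesterol change against mean weight change (3 points);
PROC SGPLOT DATA = tmeans;
SERIES X = mwtchg Y = mcholchg / MARKERS;
XAXIS LABEL = 'Mean weight change';
YAXIS LABEL = 'Mean cholesterol change';
RUN;
```

Observations with a missing twtchg are left out of the CLASS groups by default, so the plot should show only the three tertile points.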