SAS Example A10

advertisement
SAS Example A10
data sales;
infile “U:\Documents\...\sales.txt”;
input Region : $8. State $2. +1 Month monyy5. Headcnt
Revenue Expenses;
format Month monyy5. Revenue dollar12.2;
run;
proc sort;
by Region State Month;
run;
proc print;
run;
Mervyn Marasinghe
proc print label;
by Region State ;
format Expenses dollar10.2 ;
label State= State Month= Month Revenue=“Sales Revenue”
Expenses=“Overhead Expenses”;
id Region State;
sum Revenue Expenses;
sumby Region;
title “ Sales report by State and Region”;
run;
2
OutputDeliverySystem(ODS)
Sample Data Set sales.txt
SOUTHERN FL JAN78 10 10000 8000
SOUTHERN FL FEB78 10 11000 8500
SOUTHERN FL MAR78 9 13500 9800
SOUTHERN GA JAN78 5 8000 2000
SOUTHERN GA FEB78 7 6000 1200
PLAINS NM MAR78 2
500 1350
NORTHERN MA MAR78 3 1000 1500
NORTHERN NY FEB78 4 2000 4000
NORTHERN NY MAR78 5 5000 6000
EASTERN NC JAN78 12 20000 9000
EASTERN NC FEB78 12 21000 8990
EASTERN NC MAR78 12 20500 9750
EASTERN VA JAN78 10 15000 7500
EASTERN VA FEB78 10 15500 7800
EASTERN VA MAR78 11 16600 8200
CENTRAL OH JAN78 13 21000 12000
CENTRAL OH FEB78 14 22000 13000
CENTRAL OH MAR78 14 22500 13200
CENTRAL MI JAN78 10 10000 8000
CENTRAL MI FEB78 9 11000 8200
CENTRAL MI MAR78 10 12000 8900
CENTRAL IL JAN78 4 6000 2000
CENTRAL IL FEB78 4 6100 2000
CENTRAL IL MAR78 4 6050 2100
ODS allows flexibility in presenting output from
SAS procedures
ODS can arrange output in more presentable
ways
It also can create output in a variety of formats
(called destinations)
Examples of currently available ODS
destinations:
LISTING: produces traditional monospace
output
PS or PDF: output that is formatted for a high-
resolution printer such as PostScript or PDF
HTML: output that is formatted in various
markup languages such as HTML
RTF: output that is formatted for use with
Microsoft Word
Traditionally, results of a SAS procedure were
displayed in a the output window in the LISTING
format.
Use the Results tab in the Preferences window
to change HTML to LISTING format
3
SAS ODS Example 1
Current Default Settings in SAS 9.3
The default destination in the SAS windowing
environment is HTML (changed from
traditional LISTING destination)
data oranges;
The default style for the HTML destination is
HTMLBlue
datalines;
input Variety $ Flavor texture Looks;
Total=Flavor+Texture+Looks;
navel 9 8 6
This new style is an all-color style that is
designed to integrate tables and modern
statistical graphics
temple 7 7 7
valencia 8 9 9
In the past, SAS users used SAS/Graph to
produce high quality graphics. We’ll call these
traditional SAS graphics.
Template-based graphics produced by ODS
Graphics (different from traditional SAS
graphics) is now used for producing graphs is
most SAS procs
Previously, in the SAS Windowing
environment, ODS GRAPHICS ON statement
was used to enable (and ODS GRAPHICS
OFF statement to disable) ODS Graphics
mandarin 5 7 8
;
proc sort data=oranges;
by descending Total;
run;
ods rtf file=“U:\Documents\...\oranges.rtf";
proc print data=oranges;
title 'Taste Test Results for Oranges';
run;
ods rtf close;
6
PROC MEANS (continued)
PROC MEANS
Statistics:
PROC MEANS options ;
N, NMISS, MEAN, STD, MIN, MAX, RANGE,
SUM, VAR, USS, CSS, CV, STDERR, T, PRT,
SUMWGT
CLASS variables ;
VAR variables ;
TYPES request(s);
WAYS list;
BY variables ;
FREQ variable ;
WEIGHT variable ;
ID variables ;
OUTPUT OUT=SAS_dataset statistics ;
Example of OUTPUT statement ;
output out=stats mean=Av_Ht Av_Wt
stderr=SE_Ht SE_Wt;
Options: DATA= , PRINT, MAXDEC= , FW= ,
MISSING, NWAY, IDMIN, DESCENDING,
ORDER= , VARDEF= , statistics
MAXDEC = n no. of decimal places (2)
FW = n field width (12)
For printing
computed
statistics
ORDER=FREQ, DATA, INTERNAL, FORMATTED
VARDEF=DF, N, WGT, WDF
7
8
PROC UNIVARIATE
SAS Example B5
data biology;
input Id Sex $ Age Year Height Weight;
datalines;
7389 M 24 4 69.2 132.5
3945 F 19 2 58.5 112.0
4721 F 20 2 65.3 98.6
……………………………..
……………………………..
6327 M 20 1 70.2 135.4
8472 F 20 4 66.8 142.6
4875 M 20 1 74.2 160.4
;
PROC UNIVARIATE options ;
proc means data=biology fw=8 maxdec=3;
class Year Sex;
var Height Weight;
output out=stats mean=Av_Ht Av_Wt
stderr=SE_Ht SE_Wt;
run;
Some Options: DATA = , NOPRINT, PLOT, FREQ, NORMAL,
ALPHA=, CIBASIC (TYPE= ALPHA=), MU0= ,
TRIM= (TYPE= ALPHA=) , PCTLDEF=, VARDEF=,
VAR variables ;
CLASS variable-1 <(v-options)>
< variable-2 <(v- options)> >
...< / KEYLEVEL= value1 | ( value1 value2 ) >;
BY variables ;
ID variable ;
OUTPUT < OUT=SAS-data-set >
< keyword_1=names...keyword_k=names >
< percentile-options >;
PCTLDEF=1,2,3,4 OR 5 (5 methods of computing percentiles)
VARDEF=DF, N, WEIGHT, WDF (divisor for
computing variance)
proc print data=stats;
title “Biology Class Data Set: Output Statement”;
run;
NORMAL
computes the Shapiro-Wilk statistic W if n
2000 or
the Kolmogorov-Smirnov statistic D if n > 2000
10
PROC UNIVARIATE (continued)
SAS ODS Example 2
TYPE= LOWER, UPPER, TWOSIDED
data biology;
input Id Sex $ Age Year Height Weight;
datalines;
7389 M 24 4 69.2 132.5
3945 F 19 2 58.5 112.0
4721 F 20 2 65.3 98.6
……………………………..
……………………………..
6327 M 20 1 70.2 135.4
8472 F 20 4 66.8 142.6
4875 M 20 1 74.2 160.4
;
(specify type of confidence intervals)
TRIM= list of integers or fractions specifying amount of
trimming
ALPHA=.05 (for both CIBASIC and TRIM specifications)
Class Options: The v-options are MISSING or ORDER=
ORDER = FREQ, DATA, INTERNAL, FORMATTED
specifies how levels of the variable are ordered in the output
Some Keywords for use with OUTPUT statement:
ods rtf file=“U:\Documents\...\progB2_out.rtf";
ods select BasicMeasures Quantiles TestsForNormality;
N, NMISS, NOBS, MEAN, SUM, SD, VAR, SKEWNESS,
KURTOSIS, SUMWGT, MAX, MIN, RANGE, Q3, MEDIAN,
Q1, ORANGE, P1, P5, P10, P90, P95, P99, MODE,
SIGNRANK, NORMAL, PCTLNAME= , PCTLPTS= ,
PCTLPRE=
proc univariate data=biology Normal;
var Height;
title “Biology class: Analysis of Height Distribution”;
run;
Example:
proc univariate data=survey;
class county;
var acreage rainfall;
output out=new mean=ave1 ave2 var= var1 var2;
ods rtf close;
11
PROC TABULATE
SAS Example C12 (simplified)
data rainfall;
PROC TABULATE options ;
input Control Seeded @@;
Logseed = log10(seeded);
label Logseed="Log of Seeded Rainfall";
datalines;
1202.6 2745.6 830.1 1697.8 372.4 1656.0 345.5 978.0 321.2 703.4
244.3 489.1 163.0 430.0 147.8 334.1 95.0 302.8 87.0 274.7
81.2 274.7 68.5 255.0 47.3 242.5 41.1 200.7 36.6 198.6
29.0 129.6 28.6 119.0 26.3 118.3 26.1 115.3 24.4 92.4
21.7 40.6 17.3 32.7 11.5 31.4 4.9 17.5 4.9 7.7
1.0
4.1
CLASS variable ;
VAR variable ;
FREQ variable ;
WEIGHT variable ;
BY variable ;
TABLE expression, expression,
… / options ;
KEYLABEL keyword=‘text’ … ;
Options: DATA= , MISSING, FORMAT= ,
ORDER= , FORMCHAR( ) = ‘ ’,
NOSEPS, DEPTH=
;
proc univariate data=rainfall;
var Logseed;
probplot Logseed/normal(mu=est sigma=est);
FORMAT = any valid SAS or user defined format
(BEST12.2)
run;
ORDER = FREQ, DATA, INTERNAL,FORMATTED
FORMCHAR=‘|- - - -| + | - - -’
DEPTH = 10
14
PROC TABULATE (continued)
PROC TABULATE (continued)
TABLE statement
Examples of TABLE statements.
TABLE expression 1, expression 2, expression 3 /options;
class variable
defines
defines
pages
rows
defines
analysis variables
TABLE PRODUCT, REGION, SALETYPE*(QUANTITY AMOUNT);
columns
‘Expression’ consists of:
page
expression
Classification variables (CLASS variables)
Analysis variables (variables in VAR)
Statistics (N, MEAN, STD, MIN, MAX, etc.)
Format specifications
Other expressions
row
expression
column
expression
In the examples below, the required table format is
specified with a TABLE statement and the output
produced by different TABLE statements are
sketched.
Example Expressions:
Assume we have the variables A,B,C,D,X and Y defined in
SAS statements as follows:
CLASS A B C D ;
VAR X Y ;
The TABLE statement used appears on top of the
table produced.
A*B
A*B*C
A*MEAN*X
A B
A B*C
(A B)*C
15
16
PROC TABULATE Examples
Examples of TABLE statement
Example 1: TABLE REGION,POP ;
First we will consider only three variables:
REGION, CITYSIZE, and POP where REGION
and CITYSIZE are CLASS variables and POP is an
analysis variable (numerical, continuous-valued).
Each observation in the data contains data for a
single city for many cities in several regions.
Variable
Description
REGION
code for region of the country
CITYSIZE
code for relative population size
(S=small, M=medium, L=large)
POP
urban population
POP
SUM
REGION
NC
4650000.00
NE
6666000.00
SO
6864000.00
WE
8376000.00
Example 2: TABLE REGION,CITYSIZE*POP *SUM ;
CITYSIZE
Most applications like this need both CLASS and
VAR statements in the PROC TABULATE step in
addition to the TABLE statement:
L
M
S
POP
POP
POP
SUM
SUM
SUM
PROC TABULATE;
REGION
TITLE “ Population by Region”;
NC
3750000.00
750000.00
15000.00
CLASS REGION ;
NE
5022000.00
1422000.00
222000.00
VAR POP ;
SO
4488000.00
2088000.00
288000.00
WE
5592000.00
2592000.00
192000.00
17
18
Example 4: TABLE REGION*CITYSIZE*(SUM MEAN),POP ;
Example 3: TABLE REGION*CITYSIZE,POP*SUM ;
POP
REGION
CITYSIZE
NC
L
SUM
M
SUM
750000.00
MEAN
125000.00
SUM
150000.00
MEAN
SUM
REGION
CITYSIZE
NC
L
3750000.00
M
750000.00
S
150000.00
L
5022000.00
M
1422000.00
S
222000.00
L
4488000.00
M
2088000.00
NE
SO
WE
S
288000.00
L
5592000.00
M
2592000.00
S
192000.00
POP
S
MEAN
NE
L
SUM
M
SUM
MEAN
S
25000.00
5022000.00
837000.00
1422000.00
237000.00
SUM
222000.00
L
SUM
M
SUM
MEAN
S
625000.00
MEAN
MEAN
SO
3750000.00
37000.00
4488000.00
748000.00
2088000.00
MEAN
348000.00
SUM
288000.00
MEAN
48000.00
WE
19
20
Next, add four additional variables to the
example:
Example 5:
TABLE PRODUCT, REGION CITYSIZE,SALETYPE*(QUANTITY AMOUNT);
PRODUCT A100
PRODUCT and SALETYPE are class variables:
SALETYPE
R
QUANTITY and AMOUNT are analysis
variables.
W
QUANTITY
AMOUNT
QUANTITY
AMOUNT
SUM
SUM
SUM
SUM
Modified proc step:
REGION
NC
1250.00
31250.00
1250.00
25000.00
PROC TABULATE;
NE
1600.00
40000.00
1600.00
32000.00
SO
1880.00
47000.00
1880.00
37600.00
WE
1840.00
46000.00
1840.00
36800.00
63800.00
TITLE “ Sales by Region and City Size”;
CLASS REGION PRODUCT SALETYPE ;
CITYSIZE
L
3190.00
79750.00
3190.00
VAR POP QUANTITY AMOUNT;
M
2440.00
61000.00
2440.00
48800.00
S
940.00
23500.00
940.00
18800.00
The report produced by the following TABLE
statement contains a separate table for each
PRODUCT and provides an analysis of
wholesale and retail sales by REGION and by
CITYSIZE.
PRODUCT A200
SALETYPE
R
W
QUANTITY
AMOUNT
QUANTITY
AMOUNT
SUM
SUM
SUM
SUM
REGION
NC
21
1295.00
32375.00
1295.00
25900.00
22
Download