Sorting, Printing, and Summarizing Your Data
Lecture 5
Review
• Creating and Redefining Variables
• SAS Functions
• IF-THEN Statements
• Grouping Observations with IF-THEN/ELSE
• Subsetting Data
• Simplifying Programs with Arrays
Lecture Structure
• Using SAS Procedures
• Printing Your Data with PROC PRINT
• Changing the Appearance of Printed Values with Formats
• Summarizing Your Data Using PROC MEANS
Using SAS Procedures
LABEL ReceiveDate = 'Date order was received'
ShipDate = 'Date merchandise was shipped';
Printing Your Data with PROC PRINT
PROC PRINT DATA = data-set NOOBS LABEL;
• If you don’t want observation numbers, use the NOOBS option in the PROC PRINT statement.
• To print the labels instead of the variable names, add the LABEL option as well.
The following are optional statements that sometimes come in handy:
BY variable-list; The BY statement starts a new section in the output for each new value of the BY variables and prints the values of the BY variables at the top of each section. The data must be presorted by the BY variables.
ID variable-list; When you use the ID statement, the observation numbers are not printed. Instead, the variables in the ID variable list appear on the left-hand side of the page.
SUM variable-list; The SUM statement prints sums for the variables in the list.
VAR variable-list; The VAR statement specifies which variables to print and the order. Without a VAR statement, all variables in the SAS data set are printed in the order that they occur in the data set.
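The ID, VAR, and SUM statements and the LABEL option described above can be tried together in a small sketch. The data set name candy_demo and the label text 'Quantity sold' are assumptions made for illustration; the three rows are taken from the candy data used in this lecture, with the date and candy type left out so the step runs on its own.

```sas
* Hypothetical mini data set, typed inline so no external file is needed;
DATA candy_demo;
   INPUT Name $ Class Quantity;
   LABEL Quantity = 'Quantity sold';
   DATALINES;
Adriana 21 7
Nathan 14 19
Matthew 14 14
;
RUN;

* ID puts Name on the left in place of observation numbers;
* LABEL prints the label text instead of the variable name;
PROC PRINT DATA = candy_demo LABEL;
   ID Name;
   VAR Class Quantity;
   SUM Quantity;
   TITLE 'ID, VAR, and SUM Sketch';
RUN;
```

Because ID is used here, NOOBS is not needed: the observation numbers are already replaced by the Name column.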
Printing Your Data with PROC PRINT
DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Candy.dat';
   INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $
         Quantity;
   Profit = Quantity * 1.25;
PROC SORT DATA = sales;
   BY Class;
PROC PRINT DATA = sales;
   BY Class;
   SUM Profit;
   VAR Name DateReturned CandyType Profit;
   TITLE 'Candy Sales for Field Trip by Class';
RUN;

Contents of Candy.dat:
Adriana    21 3/21/2008 MP  7
Nathan     14 3/21/2008 CD 19
Matthew    14 3/21/2008 CD 14
Claire     14 3/22/2008 CD 11
Caitlin    21 3/24/2008 CD  9
Ian        21 3/24/2008 MP 18
Chris      14 3/25/2008 CD  6
Anthony    21 3/25/2008 MP 13
Stephen    14 3/25/2008 CD 10
Erika      21 3/25/2008 MP 17
Changing the Appearance of Printed Values with Formats
Character   $formatw.
Numeric     formatw.d
Date        formatw.
FORMAT statement
FORMAT Profit Loss DOLLAR8.2 SaleDate MMDDYY8.;
FORMAT statements can go in either DATA steps or PROC steps. If the FORMAT
statement is in a DATA step, then the format association is permanent and is stored with
the SAS data set. If the FORMAT statement is in a PROC step, then it is temporary—
affecting only the results from that procedure.
PUT statement
PUT Profit DOLLAR8.2 Loss DOLLAR8.2 SaleDate MMDDYY8.;
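The difference between a permanent and a temporary format association, and the PUT statement writing formatted values, can be seen in a short sketch. The data set name format_demo and the three values are hypothetical, chosen only so the step runs without an external file.

```sas
* Hypothetical one-observation data set with made-up values;
DATA format_demo;
   Profit = 8.75;
   Loss = 2.5;
   SaleDate = '21MAR2008'D;
   * FORMAT in a DATA step: stored with the data set;
   FORMAT SaleDate MMDDYY8.;
   * PUT writes the formatted values to the SAS log;
   PUT Profit DOLLAR8.2 Loss DOLLAR8.2 SaleDate MMDDYY8.;
RUN;

* FORMAT in a PROC step: applies to this output only;
PROC PRINT DATA = format_demo;
   FORMAT Profit Loss DOLLAR8.2;
   TITLE 'Permanent and Temporary Formats';
RUN;
```

A later procedure run on format_demo without its own FORMAT statement would still show SaleDate with MMDDYY8., while Profit and Loss would revert to their default appearance.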
Changing the Appearance of Printed Values with Formats
DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Candy.dat';
   INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $
         Quantity;
   Profit = Quantity * 1.25;
PROC PRINT DATA = sales;
   VAR Name DateReturned CandyType Profit;
   FORMAT DateReturned DATE9. Profit DOLLAR6.2;
   TITLE 'Candy Sale Data Using Formats';
RUN;

Contents of Candy.dat:
Adriana    21 3/21/2008 MP  7
Nathan     14 3/21/2008 CD 19
Matthew    14 3/21/2008 CD 14
Claire     14 3/22/2008 CD 11
Caitlin    21 3/24/2008 CD  9
Ian        21 3/24/2008 MP 18
Chris      14 3/25/2008 CD  6
Anthony    21 3/25/2008 MP 13
Stephen    14 3/25/2008 CD 10
Erika      21 3/25/2008 MP 17
Summarizing Your Data Using PROC MEANS
PROC MEANS options;
If you do not specify any options, MEANS will print the number of non-missing values, the mean,
the standard deviation, and the minimum and maximum values for each variable.
MAX      the maximum value
MIN      the minimum value
MEAN     the mean
MEDIAN   the median
MODE     the mode (new in SAS 9.2)
N        number of non-missing values
NMISS    number of missing values
RANGE    the range
STDDEV   the standard deviation
SUM      the sum
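Listing statistic keywords in the PROC MEANS statement replaces the default set of statistics with the ones requested. The sketch below assumes a hypothetical data set flowers_demo holding the first three rows of the flower data used later in this lecture, with the sale date left out to keep the INPUT statement simple.

```sas
* Hypothetical inline subset of the flower sales data;
DATA flowers_demo;
   INPUT CustomerID $ Petunia SnapDragon Marigold;
   DATALINES;
756-01 120 80 110
834-01 90 160 60
901-02 50 100 75
;
RUN;

* Request only the listed statistics instead of the defaults;
PROC MEANS DATA = flowers_demo N NMISS MEAN MEDIAN RANGE SUM;
   TITLE 'Selected Statistics for Flower Sales';
RUN;
```

CustomerID is a character variable, so it is left out of the analysis automatically; only the three numeric variables are summarized.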
Summarizing Your Data Using PROC MEANS
If you use the PROC MEANS statement with no other statements, then you will
get statistics for all observations and all numeric variables in your data set.
Here are some of the optional statements you may want to use:
BY variable-list; The BY statement performs separate analyses for each level of the variables in the list. The data must first be sorted in the same order as the variable-list. (You can use PROC SORT to do this.)
CLASS variable-list; The CLASS statement also performs separate analyses for each level of the variables in the list, but its output is more compact than with the BY statement, and the data do not have to be sorted first.
VAR variable-list; The VAR statement specifies which numeric variables to use in the analysis. If it is absent, then SAS uses all numeric variables.
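A CLASS statement can be tried on unsorted data to see how it differs from BY. In this sketch the data set name class_demo is hypothetical, the rows are a subset of the flower data from this lecture, and Month is typed in directly (5 or 6, matching the sale dates) rather than computed with the MONTH function. The rows are deliberately not in Month order.

```sas
* Hypothetical inline subset, intentionally not sorted by Month;
DATA class_demo;
   INPUT CustomerID $ Month Petunia SnapDragon Marigold;
   DATALINES;
834-01 6 80 60 100
756-01 5 120 80 110
901-02 6 60 60 60
834-01 5 90 160 60
;
RUN;

* CLASS groups by Month without a preceding PROC SORT;
PROC MEANS DATA = class_demo;
   CLASS Month;
   VAR Petunia SnapDragon Marigold;
   TITLE 'Flower Sales by Month Using CLASS';
RUN;
```

Replacing CLASS with BY in this step would require running PROC SORT with BY Month first, since BY-group processing expects sorted data.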
Summarizing Your Data Using PROC MEANS
A wholesale nursery is selling garden flowers, and they want to summarize their
sales figures by month. The data file which follows contains the customer ID,
date of sale, and number of petunias, snapdragons, and marigolds sold:
DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Flowers.dat';
   INPUT CustomerID $ SaleDate MMDDYY10. Petunia SnapDragon
         Marigold;
   Month = MONTH(SaleDate);
PROC SORT DATA = sales;
   BY Month;
* Calculate means by Month for flower sales;
PROC MEANS DATA = sales;
   BY Month;
   VAR Petunia SnapDragon Marigold;
   TITLE 'Summary of Flower Sales by Month';
RUN;

Contents of Flowers.dat:
756-01 05/04/2008 120 80 110
834-01 05/12/2008 90 160 60
901-02 05/18/2008 50 100 75
834-01 06/01/2008 80 60 100
756-01 06/11/2008 100 160 75
901-02 06/19/2008 60 60 60
756-01 06/25/2008 85 110 100
Exercise
• Download the dataset “Flowers.dat” from the folder “05 code and data” on our Blackboard.
• Summarize this dataset using PROC MEANS by CustomerID. (You do not need to submit the result.)
The data file which follows contains the customer ID, date of sale, and number of petunias, snapdragons, and marigolds sold:
756-01 05/04/2008 120 80 110
834-01 05/12/2008 90 160 60
901-02 05/18/2008 50 100 75
834-01 06/01/2008 80 60 100
756-01 06/11/2008 100 160 75
901-02 06/19/2008 60 60 60
756-01 06/25/2008 85 110 100
/* Sample code: replace the placeholder parts (shown in red on the slide) */
DATA dataname;
   INFILE 'Locate your dataset here';
   INPUT identify your data with right format;
PROC function_name DATA = dataname;
   BY variable_name;
   VAR othervariable you want to show in your output;
Exercise Result
DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Flowers.dat';
   INPUT CustomerID $ SaleDate MMDDYY10. Petunia SnapDragon Marigold;
* BY-group processing requires the data to be sorted by CustomerID first;
PROC SORT DATA = sales;
   BY CustomerID;
* Calculate summary statistics by CustomerID;
PROC MEANS DATA = sales;
   BY CustomerID;
   VAR Petunia SnapDragon Marigold;
RUN;