Lecture 5 Sorting, Printing, and Summarizing Your Data

advertisement
Sorting, Printing, and
Summarizing Your Data
(Chapter in the 4 Little SAS Book)
Animal Science 500
Lecture No. 5
September 14, 2010
IOWA STATE UNIVERSITY
Department of Animal Science
Using a Procedure (PROC)
 Using
the procedure statement is like filling
out a form


You fill in the blanks
Choose from the list of options
 PROC


statement
All procedures start with this statement
Is followed by the name of the procedure desired (Print,
Means, Tabulate, etc.)
IOWA STATE UNIVERSITY
Department of Animal Science
PROC statements
 If
you have many SAS dataset you are using
you can specify which dataset you want the
procedure to applied to

PROC Contents Data = Pig12;
 This

statement is optional
If the data statement is not present the Procedure will
be applied to the last data set created (not necessarily
used).
IOWA STATE UNIVERSITY
Department of Animal Science
By statement
 The
by statement is required for only one
Procedure, the PROC Sort

Obviously if you are sorting your data set you want to
sort it by some variable contained in your data.
 In
all other Procedures the by statement is
optional.
IOWA STATE UNIVERSITY
Department of Animal Science
Title and Footnote statements
 Title

prints at the top of your output
10 line limit
 Footnote

10 line limit
 You

prints at the bottom of your output
can place these anywhere you want
Applies to the Procedure you are working with so you
might want to just include it in this particular section
IOWA STATE UNIVERSITY
Department of Animal Science
Title and Footnote statements
 When
using the title and footnote statements
SAS does not care if you use the single quotes
‘test’ or the double quotes “test” as long as
you are consistent.
 Do
not mix types of quotes
IOWA STATE UNIVERSITY
Department of Animal Science
Label statement
 The
default situation in SAS uses the variable
names to label your output
 Can


create more descriptive labels
Limit up to 256 characters long limit
Example Label = DOT ‘date on test’
ADG = ‘Average Daily Gain’;
IOWA STATE UNIVERSITY
Department of Animal Science
Subsetting in Procedures with the Where
Statement
 The
Where statement can be used with almost any
PROC statement
 The
Where statement can be used to subset the
data much like an If – Then statement
 Advantage


for the Where statement
The IF – Then statement only works in the Data step
The Where statement works in both the Data and the
Procedure step.
IOWA STATE UNIVERSITY
Department of Animal Science
Subsetting in Procedures with the Where
Statement
 Examples

using the Where statement
In the Data step






Where Backfat le .50 backfat group =1;
Where Backfat ge .50 AND le .75 backfat group =2;
Where Backfat ge .75 AND le 1.00 backfat group =3;
Else Backfat group =.;
Run;
Quit;
IOWA STATE UNIVERSITY
Department of Animal Science
Subsetting in Procedures with the Where
Statement
 Examples

using the Where statement
In the Proc Step





Where Backfat Group = 1
Title ‘Leanest Pigs From The Experiment 1’
Footnote ‘Experimental Pigs with .50 inches of backfat and
less’;
Run;
Quit;
IOWA STATE UNIVERSITY
Department of Animal Science
Subsetting in Procedures with the PROC Sort
Statement
 Many


reasons to sort your data
Think of examples from labs
You can use the PROC Sort statement with the by option;

In the by statement you can include as many variables as you want
 Example – Proc sort; by pen sex trt;
 So it would sort by pen, then within pen sort sex and then, within pen
and sex sort by treatment;
 Often more useful to sort by one variable so that other procedures can
be performed.
IOWA STATE UNIVERSITY
Department of Animal Science
Subsetting in Procedures with the PROC Sort
Statement
 The
NODUPKEY eliminates duplicate observations
that have the same values for the BY variable
 Use

this with the DUPOUT option
This option will put the deleted observations in new data set
 Example

Data sort; by pen; Out = Pig13 NODUPKEY DUPOUT =
duplicates;
IOWA STATE UNIVERSITY
Department of Animal Science
Subsetting in Procedures with the PROC Sort
Statement
 Example

Data new Proc sort; by pen; Out = Pig13 NODUPKEY
DUPOUT = duplicates;
 This




is useful for working with field data
DHIA records
Records from Breed Associations (Yorkshire, Landrace,
Angus, Hereford, etc.)
How might you find?
Data= name; Proc Sort; by herd id; Out = cleandata
NODUPKEY DUPOUT = duplicates;
IOWA STATE UNIVERSITY
Department of Animal Science
Printing your Data with PROC PRINT
 We
have used PROC PRINT many times in lab
 What
is new is that there are several options in
SAS associated with the PROC PRINT
 Like
other statements you can tell SAS what
data set to use for printing

PROC PRINT DATA = PIG12
IOWA STATE UNIVERSITY
Department of Animal Science
Printing your Data with PROC PRINT
 PROC
PRINT Data = PIG12 NOOBS LABEL;
 The
NOOBS results in SAS NOT printing the
observation number by each line


Might want the observation number to check counts for
some reason so you would omit the NOOBS option
Use the NOOBS because it just takes up space
IOWA STATE UNIVERSITY
Department of Animal Science
Printing your Data with PROC PRINT
 Variable


ID statement prints out the IDs rather than the observation
number.
Useful to compare a printed list from SAS output to your
original data that you have stored electronically or on paper.
 The


option used with PROC PRINT
SUM option
This option sums the variable specified
Could be particularly useful when looking at pen weights
rather than individual wts. when pen is the experimental unit.
IOWA STATE UNIVERSITY
Department of Animal Science
Printing your Data with PROC PRINT
 The

SUM option
PROC Print Data = Pig 12;
by pen;
Sum wtgain;
VAR ID Pen Sex ADG wtgain;
Title ‘Weight Gain Summed by Pens’;
RUN;
QUIT;
IOWA STATE UNIVERSITY
Department of Animal Science
SAS Can be Used to Write Simple Reports
 For
example you just wanted to summerize the
data from Pig12
 Example
 Data
_Null_;
Infile ‘some data file source’;
Input variable names;
Created variable like ADG;
Tell where to put it;
File ‘c:\ somefile\data.txt’ Print;
Title;
Put @5 ‘Sale Report for ‘Name’ from classroom’ Class // @5 ‘Congratulations! You sold ‘
Quantity ’ boses of candy’ / @5 ‘and earned ‘ Profit DOLLAR6.2 ‘ for out field trip.’;
Put_PAGE_;
IOWA STATE UNIVERSITY
Department of Animal Science
Summarizing Your Data Using PROC MEANS
 We
have started to use PROC MEANS in LAB;
 Numerous

options to use with PROC MEANS
Use by including them behind PROC MEANS Option
 The
default will print the number of non-missing
values, the mean, the standard deviation, and the
minimum – maximum value for each observation.
 The


options available include;
Median –the numeric value separating the higher half of a
sample
Mode – value occurring most frequently
IOWA STATE UNIVERSITY
Department of Animal Science
Summarizing Your Data Using PROC MEANS
 Options



cont’
Nmiss – number of missing values by variable
Range – range of values
Sum – the sum
 Use
the default PROC MEANS; without the var
option;


If you do not wish to obtain means for every trait you can
indicate which variable means you wish to obtain
PROC MEANS var ADG, Backfat, LMA;
IOWA STATE UNIVERSITY
Department of Animal Science
Summarizing Your Data Using PROC MEANS
 You
can subset the data and obtain means using
the by statement

You will get n sets of means depending on how many levels
the by variable you sorted by
IOWA STATE UNIVERSITY
Department of Animal Science
Counting your Data Using PROC FREQ
 PROC



A frequency table gives you simple counts or number of
variables you have for each
When you have counts for one variable it is called a
one-way frequencies
When you have two or more variables the counts are
called two-way, three-way and so forth



FREQ
Alternatively the multiple variables are called Cross tabulations
Used most frequently to show distribution of data
variables
Can be used to identify data irregularities
IOWA STATE UNIVERSITY
Department of Animal Science
Counting your Data Using PROC FREQ
 Basis
form is PROC FREQ;
Tables variable – combinations;
 Options if any are included after a / after the
tables statement i.e.


Tables variable – combinations / list;
The options available are:






List – prints cross-tabulations in list format rather than grid
Missing – Includes missing values in frequency statistics
NOCOL – suppresses printing column percentages in corsstabulations
NOPERCENT- suppresses printing of percentages
NOROW - suppresses printing row percentages in cross-tabulations
Out = data-set writes a data set containing frequencies
IOWA STATE UNIVERSITY
Department of Animal Science
Counting your Data Using PROC FREQ
 Options
if any are included after a / after the
tables statement i.e.






Tables variable – combinations / list;
The options cont’ are:
Chisq - performs the standard Pearson chi-square test on
the table(s) requested.
Expected – prints the expected number of observations in
each cell under the null hypothesis.
Exact - requests Fisher's exact test for the table(s). This is
automatically computed for 2 x 2 tables.
Sparse - produces a full table, even if the table has many
cells containing no observations.
IOWA STATE UNIVERSITY
Department of Animal Science
Example of One–Way Contingency Table
Frequency
Percent
Cumulative
Frequency
JAZZ
273
61.21
273
61.21
CLASSICAL
59
13.23
332
74.44
POP
49
10.99
381
85.43
GOSPEL
44
9.87
425
95.29
RAP
21
4.71
446
100.00
CATEGORY
IOWA STATE UNIVERSITY
Department of Animal Science
Cumulative
Percentage
Example of Two –Way Contingency Table
Table of hichol1 by hichol2
hichol1
Frequeny
Percent
1
2
Total
hichol2
|
1 |
2 |
Total
-----------+-----------+-------| 21 | 21
|
42
| 22.83 | 22.83 |
45.65
-----------+------------+-------| 23
| 27
|
50
| 25.00 | 29.35 |
54.35
-----------+------------+-------44
48
92
47.83
52.17
100.00
Frequency Missing = 2
IOWA STATE UNIVERSITY
Department of Animal Science
Using PROC TABULATE
 Every
summary statistic the TABULATE
computes can be produced by other
procedures
Print
 Means, and
 Freq
But some people think PROC TABULATE output is much
“prettier”.

IOWA STATE UNIVERSITY
Department of Animal Science
Using PROC TABULATE
 General

Form
PROC TABULATE;


CLASS –classification variable list;
TABLE page-dimension, row-dimension, column-dimension
CLASS statement tell SAS which variables contain
categorical data to be used for dividing the observations
into groups

In our example data set this would mean things like
test, sex, pen, treatment, breed
IOWA STATE UNIVERSITY
Department of Animal Science
Using PROC TABULATE
 General

PROC TABULATE;





Form
CLASS –classification variable list;
TABLE page-dimension, row-dimension, column-dimension
TABLE statement defines only one table but you might
have multiple TABLE statements
If a variable is listed in the CLASS statement, then, by
default, PROC TABULATE produces simple counts of
the number of observations in each category fo that
variable.
PROC TABULATE offers many other statistics too.
IOWA STATE UNIVERSITY
Department of Animal Science
Using PROC TABULATE
 General

PROC TABULATE;




Form
CLASS –classification variable list;
TABLE page-dimension, row-dimension, column-dimension
Each TABLE statement can specify up to 3 dimensions
The dimensions separated by commas , and tell SAS
which variables are used for the pages, rows, and
columns



Specify one dimension then it becomes the column dimension
Specify two dimensions then they become rows and columns
Specify three dimensions then they become row, columns and
pages
IOWA STATE UNIVERSITY
Department of Animal Science
Using PROC TABULATE
 General

PROC TABULATE;



Form
CLASS –classification variable list;
TABLE page-dimension, row-dimension, column-dimension
The default results in observations that are missing are
not included in the in the tables

To keep the missing values included in the report then the code
PROC TABULATE MISSING; must be used
IOWA STATE UNIVERSITY
Department of Animal Science
Example form Generated by PROC TABULATE
 PROC



TABULATE;
CLASS GENDER;
VAR AGE INCOME EDUC;
TABLE (AGE INCOME EDUC)*MEAN, GENDER ALL;
 RUN;
 QUIT;
IOWA STATE UNIVERSITY
Department of Animal Science
Example form Generated by PROC TABULATE
 PROC



TABULATE;
CLASS GENDER;
VAR AGE INCOME EDUC;
TABLE (AGE INCOME EDUC)*MEAN, GENDER ALL;
 RUN;
 QUIT;
IOWA STATE UNIVERSITY
Department of Animal Science
Output from a PROC TABULATE example
Output 1.1
Variable
Label
N
Mean
Std Dev
Minimum
Maximum
-------------------------------------------------------------------------------------------------------------------------AGE
Age
6639
48.614
16.598
25.000
90.000
INCOME
Income
6639
25065.797
23850.488
0.000
263253.000
EDUC
Education
6639
13.040
2.953
4.000
19.000
--------------------------------------------------------------------------------------------------------------------------
IOWA STATE UNIVERSITY
Department of Animal Science
Output from a PROC TABULATE example
Output 1.2
GENDER = Female
Variable
Label
N
Mean
Std Dev
Minimum
Maximum
-------------------------------------------------------------------------------------------------------------------------AGE
Age
3559
49.528
17.158
25.000
90.000
INOME
Income
3559
17780.087
17070.596
0.000
263253.000
EDUC
Education
3559
12.932
2.899
4.000
19.000
------------------------------------------------------------------------------------------------------------------------GENDER = Male
Variable
Label
N
Mean
Std Dev
Minimum
Maximum
-------------------------------------------------------------------------------------------------------------------------AGE
Age
3080
47.558
15.864
25.000
INCOME
Income
3080
33484.577
27520.481
0.000
EDUC
Education
3080
13.165
3.011
4.000
90.000
251998.000
19.000
-------------------------------------------------------------------------------------------------------------
IOWA STATE UNIVERSITY
Department of Animal Science
Output from a PROC TABULATE example
Output 1.3
------------------------------------------------------------------------------------------------------------------------|
|
GENDER
|
|
|
|-------------------------------|
|
| Female |
Male
|
|
ALL
|
|-----------------------------------------------------+---------------+--------------+-------------------------------|
Age MEAN
|
49.53 |
47.56 |
48.61
|
|-----------------------------------------------------+---------------+--------------+--------------------------------|
|
Income MEAN
| 17780.09 | 33484.58 |
25065.80
|
|-----------------------------------------------------+---------------+--------------+--------------------------------|
|
Education MEAN
|
12.93
|
13.17 |
13.04
-------------------------------------------------------------------------------------------------------------------------
IOWA STATE UNIVERSITY
Department of Animal Science
|
Download