SAS-data-set - Mainframes Online Training

advertisement
Mainframe Online Training
Producing Descriptive Statistics
Statistical Analysis System
Poldsni Anil Kumar
Mainframe Online Training
Producing Descriptive Statistics
 Introduction
 Computing Statistics Using PROC MEANS
 Creating a Summarized Data Set Using PROC SUMMARY
 Producing Frequency Tables Using PROC FREQ
2
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Introduction
 PROC REPORT is the ability to summarize large amounts of data by
producing descriptive statistics. However, there are SAS procedures that
are designed specifically to produce various types of descriptive statistics and
to display them in meaningful reports
 If the data values that you want to describe are continuous numeric values ,
then you can use the MEANS procedure or the SUMMARY procedure to
calculate statistics such as the mean, sum, minimum, and maximum.
 If the data values that you want to describe are discrete then you can use the
FREQ procedure to show the distribution of these values.
3
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 1)
 The MEANS procedure provides descriptive statistics such as the mean, minimum, and
maximum provide useful information about numeric data.
 Procedure Syntax
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
RUN;
Where
 SAS-data-set is the name of the data set to be used
 statistic-keyword(s) specify the statistics to compute
 option(s) control the content, analysis, and appearance
Example
proc means data=perm.survey;
run;
4
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 2)
Selecting Statistics
 Consider that you want to see the median and range of Perm. Survey numeric
values, add the MEDIAN and RANGE keywords as options.
Example
proc means data=perm.survey median range;
run;
The following keywords can be used with PROC
MEANS to compute statistics:
Keyword Description
MAX
5
Maximum value
MEAN
Average
MODE
Value that occurs most frequently
MIN
Minimum value
VAR
Variance
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 3)
Limiting Decimal Places
To limit decimal places, use the MAXDEC= option in the PROC MEANS statement, and set
it equal to the length that you prefer.
Syntax
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> MAXDEC=n;
where n specifies the maximum number of decimal places.
Example
proc means data=clinic.diabetes min max maxdec=0;
run;
6
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 4)
Specifying Variables in PROC MEANS
To specify the variables that PROC MEANS analyzes, add a VAR statement and
list the variable names.
 General form, VAR statement:
VAR variable(s);
Example
proc means data=clinic.diabetes min max maxdec=0;
var age height weight;
run;
In addition to listing variables separately,
you can use a numbered range of variables
var item1-item5;
7
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 5)
Group Processing Using the CLASS Statement
To produce separate analyses of grouped observations, add a CLASS statement to the
MEANS procedure.
General form, CLASS statement:
CLASS variable(s);
 where variable(s) specifies category variables for group processing.
CLASS variables can be either character or numeric, but they should contain a limited
number of discrete values that represent meaningful groupings.
Example
proc means data=clinic.heart maxdec=1;
var arterial heart cardiac urinary;
class survive sex;
run;
8
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 6)
Group Processing Using the BY Statement
Like the CLASS statement, the BY statement specifies variables to use for
categorizing observations.
 General form, BY statement:
BY variable(s);
Difference between BY and CLASS
1.Unlike CLASS processing, BY processing requires that your data already be
sorted or indexed in the order of the BY variables.
2.BY group results have a layout that is different from the layout of CLASS group
results. Note that the BY statement in the program below creates four small
tables; a CLASS statement would produce a single large table.
9
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 7)
Example for BY statement.
proc sort data=clinic.heart out=work.heartsort;
by survive sex;
run;
proc means data=work.heartsort maxdec=1;
var arterial heart cardiac urinary;
by survive sex;
run;
10
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 8)
Creating a Summarized Data Set Using PROC MEANS
You might want to create an output SAS data set that contains just the
summarized variable.
 General form, OUTPUT statement:
OUTPUT OUT=SAS-data-set statistic=variable(s);
where
 OUT= specifies the name of the output data set
 statistic= specifies the summary statistic written out
11
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 9)
Specifying the STATISTIC= Option
You can specify which statistics to produce in the output data set. To do so, you
must specify the statistic and then list all of the variables. The variables must
be listed in the same order as in the VAR statement. You can specify more
than one statistic in the OUTPUT statement.
proc means data=clinic.diabetes;
PROC MEANS in SAS LIST
var age height weight;
class sex;
output out=work.sum_gender
mean=AvgAge AvgHeight AvgWeight
min=MinAge MinHeight MinWeight;
run;
PROC MEANS OUTPUT TO SAS DATASET
To see the contents of the output data set,
submit the following PROC PRINT step.
12
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Computing Statistics Using PROC MEANS(Page 10)
Creating only the output data set
You can use the NOPRINT option in the PROC MEANS statement to prevent
the default report from being created. For example, the following program
creates only the output data set:
Example
proc means data=clinic.diabetes noprint;
var age height weight;
class sex;
output out=work.sum_gender
mean=AvgAge AvgHeight AvgWeight;
run;
13
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Creating a Summarized Data Set Using PROC SUMMARY
 You can also create a summarized output data set by using PROC SUMMARY.
 The difference between the two procedures is that PROC MEANS produces a report
by default. By contrast, to produce a report in PROC SUMMARY, you must include a
PRINT option in the PROC SUMMARY statement.
Example
proc summary data=clinic.diabetes print;
var age height weight;
class sex;
output out=work.sum_gender
mean=AvgAge AvgHeight AvgWeight;
run;
14
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 1)
 The FREQ procedure is a descriptive procedure as well as a statistical
procedure. It produces one-way and n-way frequency tables,.
 You can use the FREQ procedure to create cross-tabulation tables that
summarize data for two or more categorical variables by showing the number
of observations for each combination of variable values.
 General form, basic FREQ procedure:
PROC FREQ <DATA=SAS-data-set>;
RUN;
 By default, PROC FREQ creates a one-way table with the frequency, percent,
cumulative frequency, and
cumulative percent of every
value of all variables in a data set.
15
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 2)
Example
 For example, the following FREQ procedure creates a frequency table for each
variable in the data set Parts. Widgets. All the unique values are shown for
ItemName, LotSize, and Region.
proc freq data=parts.widgets;
run;
16
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 3)
Specifying Variables in PROC FREQ
 By default, the FREQ procedure creates frequency tables for every variable in
your data set.
 To specify the variables to be processed by the FREQ procedure, include a
TABLES statement.
Syntax
TABLES variable(s);
where variable(s) lists the variables to include.
17
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 4)
Example
Consider the SAS data set Finance.Loans. The variables Rate and Months are
best described as categorical values, so they are the best choices for
frequency tables.
proc freq data=finance.loans;
tables rate months;
run;
18
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 5)
Creating Two-Way Tables
It is often helpful to crosstabulate frequencies with the values of other variables.
For example, census data is typically crosstabulated with a variable that
represents geographical regions.
Syntax
TABLES variable-1 *variable-2 <* ... variable-n>;
where (for two-way tables)
 variable-1 specifies table rows and variable-2 specifies table columns.
When crosstabulations are specified, PROC FREQ produces tables with
cells that contain
 column cell frequency
 cell percentage of total frequency
 cell percentage of row frequency
 cell percentage of frequency.
19
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 6)
For example, the following program creates the two-way table shown
below.
Tables
proc format;
LEGEND BOX
value wtfmt low-139='< 140'
140-180='140-180‘
181-high='> 180';
value htfmt low-64='< 5''5"'
65-70='5''5-10"'
71-high='> 5''10"';
run;
proc freq data=clinic.diabetes;
tables weight*height;
format weight wtfmt.
height htfmt.;
run;
20
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 7)
Creating N-Way Tables
 For a frequency analysis of more than two variables, use PROC FREQ to
create n-way crosstabulations. A series of two-way tables is produced, with a
table for each level of the other variables.
 For example, suppose you want to add the variable Sex to your crosstabulation
of Weight and Height in the data set Clinic.Diabetes. Add Sex to the TABLES
statement, joined to the other variables with an asterisk (*).
 Example
tables sex*weight*height;
 The order of the variables is important. In n-way tables, the last two variables of
the TABLES statement become the two-way rows and columns.
21
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 8)
Example
proc format;
value wtfmt low-139='< 140‘
140-180='140-180‘
181-high='> 180';
value htfmt low-64='< 5''5"'
65-70='5''5-10"‘
71-high='> 5''10"';
run;
proc freq data=clinic.diabetes;
tables sex*weight*height;
format weight wtfmt. height htfmt.;
run;
22
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 9)
Changing the Table Format
CROSSLIST option to your TABLES statement displays cross-tabulation tables
proc format;
value wtfmt low-139='< 140‘
140-180='140-180‘
181-high='> 180';
value htfmt low-64='< 5''5"'
65-70='5''5-10"‘
71-high='> 5''10"';
run;
proc freq data=clinic.diabetes;
tables sex*weight*height/crosslist;
format weight wtfmt. height htfmt.;
run;
23
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 10)
Creating Tables
When three or more variables are specified, the multiple levels of n-way tables
can produce considerable output. Such bulky, often complex crosstabulations
are often easier to read as a continuous list in List Format
Syntax
TABLES variable-1 *variable-2 <* … variable-n> / LIST;
Example
proc format;
value wtfmt low-139='< 140'
….;
run;
proc freq data=clinic.diabetes;
tables sex*weight*height / list;
format weight wtfmt. height htfmt.;
run;
24
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
Producing Frequency Tables Using PROC FREQ (Page 11)
Suppressing Table Information
 NOFREQ suppresses cell frequencies
 NOPERCENT suppresses cell percentages
 NOROW supresses row percentages
 NOCOL suppresses column percentages.
Example
proc freq data=clinic.diabetes;
tables sex*weight / nofreq norow nocol;
format weight wtfmt.;
run;
25
SAS Presentation | IBM Confidential
|
15-Mar-16
Mainframe Online Training
26
SAS Presentation | IBM Confidential
|
15-Mar-16
Download