SAS PROC FREQ

1. Introduction

Frequency tables show the distribution of a variable's values. Crosstabulation tables show the combined frequency distribution of two or more variables. For one-way tables, PROC FREQ can compute chi-square tests for equal or specified proportions. For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ performs stratified analysis, computing statistics within as well as across strata.

2. Syntax

    PROC FREQ options;
        OUTPUT <OUT= SAS-data-set> <output-statistic-list>;
        TABLES requests / options;
        WEIGHT variable;
        EXACT statistic-keywords;
        BY variable-list;

3. Details

a) The following options are available in the PROC FREQ statement:

    COMPRESS
    DATA= SAS-data-set
    ORDER= INTERNAL|FREQ|DATA|FORMATTED
    FORMCHAR(1,2,7)= 'string'
    PAGE
    NOPRINT

COMPRESS
The COMPRESS option begins the next one-way frequency table on the current page if there is enough space to begin the table. By default, the next one-way table begins on the current page only if the entire table fits on that page.

ORDER= INTERNAL | FREQ | DATA | FORMATTED
The ORDER= option specifies the order in which the variable levels are reported.

    INTERNAL    Levels are ordered by their internal value.
    FREQ        Levels are ordered by descending frequency count.
    DATA        Levels are ordered as they appear in the input SAS data set.
    FORMATTED   Levels are ordered by their external formatted value.

Default: INTERNAL
Note: The ORDER= option does not apply to missing values, which are always ordered first, or to observations with zero weights.

FORMCHAR(1,2,7)= 'string'
The FORMCHAR option defines the characters used to construct the outlines and dividers for the cells of contingency tables. The string should be three characters long. The characters denote (1) the vertical divider, (2) the horizontal divider, and (7) the vertical-horizontal intersection.

Default: FORMCHAR(1,2,7)= '|-+'

PAGE
The PAGE option requests that PROC FREQ print only one table per page.
NOPRINT
The NOPRINT option suppresses all printed output from PROC FREQ. A NOPRINT option is also available in the TABLES statement; there it suppresses printing of the tables but still allows printing of the statistics requested by the ALL, CHISQ, CMH, EXACT, MEASURES, and PLCORR options.

b) OUTPUT <OUT= SAS-data-set> <output-statistic-list>;

The OUTPUT statement creates a SAS data set containing statistics computed by PROC FREQ. The output data set can include any statistic requested in the TABLES statement. You can request groups of statistics with keywords identical to the TABLES statement options that request them: AGREE, ALL, CHISQ, CMH, CMH1, CMH2, EXACT, MEASURES, and PLCORR. Alternatively, request individual statistics by specifying keywords from the list below:

    AJCHI     BDCHI     CMHCOR    CMHGA     CMHRMS    CONTGY    EXACT     JT
    KAPPA     LAMCR     LAMDAS    LAMRC     MCNEM     MHCHI     MHOR      MHRRC1
    MHRRC2    N         PHI       PLCORR    RDIF1     RDIF2     RRC1      RRC2
    RSK11     RSK12     RSK21     RSK22     RELRISK   RISKDIFF  SMDCR     SMDRC
    STUTC     TREND     TSYMM     U         CQ        CRAMV     EQKAPS    EQWTKAPS
    LGOR      LGRRC1    LGRRC2    LRCHI     NMISS     PCHI      PCORR     RROR
    RSK1      RSK2      RISKDIFF1 UCR       RISKDIFF2 URC       SCORR     WTKAPPA

Only one OUTPUT statement is allowed for each execution of the FREQ procedure. When there are multiple TABLES statements, the contents of the output data set correspond to the last TABLES statement; when a TABLES statement contains multiple table requests, the contents correspond to the last table request.

The output data set contains one observation per stratum, holding the requested statistics. The variable name for each requested statistic is the keyword enclosed in underscores (for example, _PCHI_). If a statistic has a corresponding p-value, the variable name for the p-value is formed by adding P and an underscore before the keyword (for example, P_PCHI). The data set also includes the BY variables, if any, and the variables that identify the stratum.

c) TABLES requests / options;

The TABLES statement requests that tables be produced. Any number of TABLES statements can be included.
If no TABLES statement is given, one-way frequencies are produced for all of the variables in the data set.

To request a one-way frequency table for a variable, name the variable in a TABLES statement. For example:

    PROC FREQ; TABLES a;

For a crosstabulation table of two variables, give their names separated by an asterisk. The first variable's values form the rows of the table, and the second variable's values form the columns. For example:

    PROC FREQ; TABLES a*b;

For n-way crosstabulation tables, the last variable's values form the columns and the next-to-last variable's values form the rows. Each level (or combination of levels) of the other variables forms one stratum. A contingency table is produced for each stratum.

    TABLES requests / options;

Options that can be used in the TABLES statement:

    General:
        LIST MISSING OUT= V5FMT
    Request statistical analysis:
        AGREE ALL CHISQ CL CMH CMH1 CMH2 EXACT JT MEASURES PLCORR RISKDIFF TESTF= TESTP= TREND
    Statistical details:
        ALPHA= CONVERGE= MAXITER= RELRISK SCORES=
    Request additional table information:
        CELLCHI2 CUMCOL DEVIATION EXPECTED MISSPRINT SPARSE TOTPCT
    Suppress printing:
        NOCOL NOCUM NOFREQ NOPERCENT NOPRINT NOROW

NOTE: See the SAS online manual for more details.

d) WEIGHT variable;

Normally, each observation contributes a value of 1 to the frequency counts. When a WEIGHT statement appears, each observation instead contributes the value of the weight variable for that observation. The values do not have to be integers.

Negative values of the weight variable are allowed. Because negative values cannot correspond to actual frequencies, the total frequency, percentages, and statistical calculations are undefined and are therefore not printed when there are negative weights. If the value of the weight variable is missing or zero, the corresponding observation is ignored.

Only one WEIGHT statement can be used, and it applies to the counts collected for all tables.
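As a minimal sketch of how the WEIGHT statement combines with a two-way table request, assume a hypothetical data set named cells in which each observation is one cell of a 2x2 table and the variable count holds that cell's frequency. The data set name, variable names, and counts are invented for illustration only.

```sas
/* Hypothetical data: one observation per cell of a 2x2 table.  */
/* The data set name, variable names, and counts are made up.   */
DATA cells;
   INPUT a $ b $ count;
   DATALINES;
yes yes 20
yes no  10
no  yes 15
no  no  25
;
RUN;

/* Each observation contributes COUNT (rather than 1) to its    */
/* cell. Values of a form the rows and values of b form the     */
/* columns. ORDER=DATA keeps the levels in input order.         */
PROC FREQ DATA=cells ORDER=DATA;
   TABLES a*b / CHISQ EXPECTED NOCOL;
   WEIGHT count;
RUN;
```

Here CHISQ requests the chi-square tests, EXPECTED adds expected cell frequencies to the table, and NOCOL suppresses the column percentages; all three are TABLES options from the list above.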
e) EXACT statistic-keywords;

The EXACT statement specifies the statistics for which exact p-values are calculated. You can request exact computations for groups of statistics by specifying keywords identical to the TABLES statement options AGREE, CHISQ, and MEASURES. You can request exact p-values for an individual statistic by specifying the corresponding keyword from the following list. Note that the keyword RROR requests exact confidence bounds for the odds ratio for 2x2 tables.

    JT LRCHI MHCHI PCORR SCORR WTKAP KAPPA MCNEM PCHI RROR TREND

f) BY <DESCENDING> variables ... <NOTSORTED>;

A BY statement is used with a procedure to obtain separate analyses of the observations in groups defined by the BY variables. The data set being processed need not have been sorted by the SORT procedure, but unless NOTSORTED is specified it must be in the same order as though PROC SORT had sorted it. If you have used a FORMAT or ATTRIB statement to group a continuous variable into discrete groups, the BY statement creates BY groups based on the formatted values. You can also ensure that variables are processed in ascending order by creating an index for one or more variables in the SAS data set.

Usage of the BY statement differs from procedure to procedure. Please refer to the Users' Guide for the details.
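The BY, EXACT, and OUTPUT statements can be combined in one step. The following is only a sketch under stated assumptions: the data set trial, the variables clinic, treatment, response, and count, the output data set name stats, and all of the counts are hypothetical.

```sas
/* Hypothetical data: all names and counts are illustrative.    */
DATA trial;
   INPUT clinic $ treatment $ response $ count;
   DATALINES;
A drug    yes 8
A drug    no  2
A placebo yes 3
A placebo no  7
B drug    yes 6
B drug    no  4
B placebo yes 4
B placebo no  6
;
RUN;

/* Put the data in BY-variable order before using BY.           */
PROC SORT DATA=trial;
   BY clinic;
RUN;

/* One treatment-by-response table per clinic. EXACT PCHI asks  */
/* for an exact p-value for the Pearson chi-square, and OUTPUT  */
/* writes the Pearson chi-square statistics to the data set     */
/* stats, one observation per BY group.                         */
PROC FREQ DATA=trial;
   BY clinic;
   TABLES treatment*response / CHISQ;
   EXACT PCHI;
   WEIGHT count;
   OUTPUT OUT=stats PCHI;
RUN;

/* Inspect the saved statistics. */
PROC PRINT DATA=stats;
RUN;
```

The PROC SORT step could be dropped if the data were already grouped by clinic; if the groups were present but not in sorted order, NOTSORTED on the BY statement would be the alternative.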