SAS--Proc Means (Descriptive Stats)

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
SAS—Proc Means
Proc Means is the basic SAS command used to compute descriptive statistics for numeric, measurement
variables. (Proc Means should not be used for character/text variables, nor for nominal or ordinal
numeric variables.)
The keywords below specify the statistics that you want Proc Means to compute and the
order in which to display them in the output.
Descriptive statistics keywords used in Proc Means:
N (for each variable, gives number of rows in data set with non-missing data)
NMISS (for each variable, gives number of rows with missing data for that variable)
SUM (for each variable, gives sum of all values in all rows for that variable)
MEAN (gives mean of each variable)
MEDIAN (gives median of each variable)
MODE (gives mode of each variable)
MAX (gives max of each variable)
MIN (gives min of each variable)
RANGE (gives range of each variable)
VAR (calculates the variance for each variable)
VARDEF= either N or DF (sets the divisor used for VAR and STDDEV: DF divides by n-1, N divides by n)
STDDEV or STD (calculates the standard deviation for each variable)
CV (calculates the coefficient of variation for each variable)
SKEWNESS or SKEW (calculates skewness for each variable)
KURTOSIS or KURT (calculates kurtosis for each variable)
STDERR (calculates the standard error for each sample mean)
CLM (calculates two-sided confidence limits for each sample mean)
UCLM (calculates one-sided, upper, confidence limit for each sample mean)
LCLM (calculates one-sided, lower, confidence limit for each sample mean)
maxdec=2 (sets the number of decimal places to show in the output)
Examples
The following command will calculate a zillion descriptive statistics for every numeric, measurement
variable in dataset01. The "vardef=df" command calculates statistics for a data set that is a sample
of the population (if your data set covers the entire population, then use "vardef=n" instead). The
"maxdec=3" command sets the number of decimal places that will be shown in the output to 3.
proc means data=dataset01 vardef=df maxdec=3 N NMISS MEAN MEDIAN MODE VAR STD
CV MIN MAX RANGE SUM SKEW KURT;
run;
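To see what the vardef choice changes, here is a minimal sketch that runs Proc Means twice on a made-up four-row data set. The data set name "toy" and its values are assumptions for illustration only; they are not part of dataset01.

```sas
/* Hypothetical toy data, for illustration only */
data toy;
  input X1;
  datalines;
2
4
6
8
;
run;

/* Sample formulas: variance divides by n-1 */
proc means data=toy vardef=df maxdec=3 N MEAN VAR STD;
  var X1;
run;

/* Population formulas: variance divides by n */
proc means data=toy vardef=n maxdec=3 N MEAN VAR STD;
  var X1;
run;
```

The mean is 5 in both runs. The sum of squared deviations is 20, so the variance should be 20/3 = 6.667 with vardef=df and 20/4 = 5.000 with vardef=n.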
Note: Standard error (STDERR), confidence limits for the mean (CLM, UCLM, LCLM), and the Student's t-test
are calculated for a sample of data (rather than data on the whole population), so you must use VARDEF=DF.
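As a sketch of the note above, the following commands request the standard error, two-sided confidence limits, and a t-test for one variable. The variable X1 is assumed to exist in dataset01. The alpha=0.05 option asks for 95% confidence limits. The T and PROBT keywords are not in the keyword list above; they request the t statistic and its two-tailed p-value for the hypothesis that the population mean is zero.

```sas
/* vardef=df is required for STDERR, CLM, T, and PROBT */
proc means data=dataset01 vardef=df maxdec=3 alpha=0.05
           N MEAN STDERR CLM T PROBT;
  var X1;   /* X1 is assumed to be a numeric, measurement variable */
run;
```

Replacing CLM with UCLM or LCLM alone would give a one-sided limit instead.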
Suppose you want descriptive statistics for just some of the variables (particular columns) in your
dataset. Use the "var" command to specify which variables you want to analyze. The following
commands produce output for variables X1, X2, and X3 only:
proc means data=dataset01 vardef=df maxdec=3 N NMISS MEAN MEDIAN
MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
var X1 X2 X3;
run;
Suppose you want descriptive statistics for just some of the observations (particular rows) in your
dataset. First, sort the data set using the Proc Sort command. Second, use a "by" command in Proc
Means to produce a set of descriptive statistics for each value of the "by" variable. For example, the
following commands sort dataset01 by variable "Region", then Proc Means produces a separate set of
output for each Region:
proc sort data=dataset01;
by Region;
run;
proc means data=dataset01 vardef=df maxdec=3 N NMISS MEAN MEDIAN
MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
by Region;
run;
Or you could use "where" instead of "by" to analyze a single group ("where" does not require the data to be sorted first):
proc means data=dataset01 vardef=df maxdec=3 N NMISS MEAN MEDIAN
MODE VAR STD CV MIN MAX RANGE SUM SKEW KURT;
where Region='coast';
run;
The "where" command tells SAS to run Proc Means on only those data rows that meet
the condition specified in the "where" command.
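A "where" condition can also combine several tests and be used together with "var". The following is a minimal sketch; the extra condition X1 > 0 is a hypothetical example, and X1 and X2 are assumed to be numeric variables in dataset01.

```sas
/* No proc sort is needed before a "where" command */
proc means data=dataset01 vardef=df maxdec=3 N MEAN STD MIN MAX;
  where Region='coast' and X1 > 0;   /* X1 > 0 is a hypothetical extra condition */
  var X1 X2;
run;
```

Text comparisons in "where" are case-sensitive, so 'coast' will not match rows where Region is stored as 'Coast'.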