AGRO 6005

Lecture 3
Descriptive Statistics
The procedures most commonly used to compute descriptive statistics are MEANS,
UNIVARIATE, FREQ, TABULATE, and SUMMARY.
The MEANS procedure provides data summarization tools to compute descriptive
statistics for variables across all observations and within groups of observations. For
example, PROC MEANS
• calculates descriptive statistics based on moments
• estimates quantiles, including the median
• calculates confidence limits for the mean
• identifies extreme values
• performs a t test.
By default, PROC MEANS displays output. You can also use the OUTPUT statement to
store the statistics in a SAS data set.
Statistic keywords listed in the PROC MEANS statement specify which statistics to compute
and the order in which they appear in the output. The available keywords are
Descriptive statistic keywords
CLM (confidence limits for the mean)
CSS (corrected sum of squares)
CV (coefficient of variation)
KURTOSIS|KURT (coefficient of kurtosis)
LCLM (one-sided lower confidence limit for the mean)
MAX
MEAN
MIN
N
NMISS
RANGE
SKEWNESS|SKEW (coefficient of skewness)
STDDEV|STD (standard deviation)
STDERR (standard error of the mean)
SUM
SUMWGT (sum of weights)
UCLM (one-sided upper confidence limit for the mean)
USS (uncorrected sum of squares)
VAR (variance)
Quantile statistic keywords
MEDIAN|P50
P1
P5
P10
P90
P95
P99
Q1|P25
Q3|P75
QRANGE
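Hypothesis testing keywords
PROBT (two-tailed p-value for Student's t statistic)
T (Student's t statistic for testing that the population mean is 0)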
If no keywords are specified, the default statistics are N, MEAN, STD, MIN, and MAX.
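As a minimal sketch of the keyword list and the OUTPUT statement described above, the step below assumes the SASHELP.CLASS sample data set that ships with SAS. The output data set name work.stats_class and the variable names after each equals sign are hypothetical, chosen only for illustration.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is available. */
proc means data=sashelp.class n mean std clm t probt maxdec=2;
  class sex;               /* compute the statistics within each group */
  var height weight;
  /* Store selected statistics in a data set (hypothetical names) */
  output out=work.stats_class
         mean=mean_height mean_weight
         std=sd_height sd_weight;
run;

/* Inspect the stored statistics */
proc print data=work.stats_class;
run;
```

The keywords are displayed in the order in which they are listed. Because a CLASS statement is present, the output data set contains one overall row (_TYPE_=0) in addition to one row per group.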
The UNIVARIATE procedure provides data summarization tools, high-resolution
graphics displays, and information on the distribution of numeric variables. For example,
PROC UNIVARIATE
• calculates descriptive statistics based on moments
• calculates the median, mode, range, and quantiles
• calculates the robust estimates of location and scale
• calculates confidence limits
• tabulates extreme observations and extreme values
• generates frequency tables
• plots the data distribution
• performs tests for location and normality
• performs goodness-of-fit tests for fitted parametric and nonparametric
distributions
• creates histograms and optionally superimposes density curves for fitted
continuous distributions (beta, exponential, gamma, lognormal, and Weibull) and
for kernel density estimates
• creates quantile-quantile plots and probability plots for various theoretical
distributions and optionally superimposes a reference line that corresponds to the
specified or estimated location and scale parameters for the theoretical
distribution
• creates one-way and two-way comparative histograms, comparative
quantile-quantile plots, and comparative probability plots
• insets tables of statistics in the graphical displays (high-resolution graphs)
• creates output data sets with requested statistics, histogram intervals, and
parameters of the fitted distributions.
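Several of the capabilities listed above can be sketched in a single step. The example assumes the SASHELP.CLASS sample data set; the output data set work.univ_height and its variable names are hypothetical.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is available. */
proc univariate data=sashelp.class normal;   /* NORMAL requests tests for normality */
  var height;
  histogram height / normal;                 /* superimpose a fitted normal density */
  qqplot height / normal(mu=est sigma=est);  /* reference line from estimated parameters */
  inset n mean std / position=ne;            /* statistics table inset in the Q-Q plot */
  /* Save requested statistics (hypothetical names) */
  output out=work.univ_height
         mean=mean_height median=median_height qrange=iqr_height;
run;
```

The INSET statement applies to the plot statement that precedes it, so here the inset appears in the quantile-quantile plot.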
The FREQ procedure is a descriptive as well as a statistical procedure that produces
one-way to n-way frequency and crosstabulation tables. Frequency tables concisely
describe your data by reporting the distribution of variable values. Crosstabulation
tables, also known as contingency tables, summarize data for two or more classification
variables by showing the number of observations for each combination of variable
values.
For one-way frequency tables, PROC FREQ can compute statistics to test for equal
proportions, specified proportions, or the binomial proportion. For contingency tables,
PROC FREQ can compute various statistics to examine the relationships between two
classification variables adjusting for any stratification variables. PROC FREQ
automatically displays the output in a report and can also save the output in a SAS data
set.
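The one-way tests mentioned above can be requested roughly as follows. This is a sketch that assumes the SASHELP.CLASS sample data set; the null proportions in TESTP= and the output data set name work.freq_sex are hypothetical values chosen for illustration.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is available. */

/* Chi-square test for equal proportions and a binomial proportion test */
proc freq data=sashelp.class;
  tables sex / chisq binomial(p=0.5);
run;

/* Chi-square test for specified proportions (hypothetical values),
   saving the frequency table to a data set */
proc freq data=sashelp.class;
  tables sex / chisq testp=(0.4 0.6) out=work.freq_sex;
run;
```

The binomial test refers to the first level of the table variable, and the TESTP= values are matched to the levels in the order in which they appear in the table.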
The TABULATE procedure displays descriptive statistics in tabular format, using some
or all of the variables in a data set. You can create a variety of tables ranging from
simple to highly customized.
PROC TABULATE computes many of the same statistics that are computed by other
descriptive statistical procedures such as MEANS, FREQ, and REPORT. PROC
TABULATE provides
• simple but powerful methods to create tabular reports
• flexibility in classifying the values of variables and establishing hierarchical
relationships between the variables
• mechanisms for labeling and formatting variables and procedure-generated
statistics.
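A short sketch of these three features (a hierarchical row dimension, labels, and formats) is given below. It assumes the SASHELP.CLASS sample data set; the labels and formats are illustrative choices.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is available. */
proc tabulate data=sashelp.class;
  class sex age;
  var height;
  /* Rows: age nested within sex, plus an overall row.
     Columns: statistics of height, each with its own format. */
  table sex*age all,
        height*(n mean*f=6.1 std*f=6.2)
        / box='Height by sex and age';
  keylabel mean='Mean' std='Std Dev' all='Total';   /* relabel statistics */
run;
```

The asterisk nests one element within another, and the comma separates the row dimension from the column dimension.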
Example:
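/* Send all procedure output to an RTF file */
ods rtf file='e:\conferencia3.rtf';
options pageno=1;
/* Read the dry matter (matseca) data: each record holds ciclo (cycle), fecha (date), */
/* germopl (germplasm), and variedad (variety), followed by one value per block */
data matseca;
  infile 'e:\conferencia3.txt';
  input ciclo fecha : mmddyy8. germopl $ variedad @@;
  do bloque=1 to 3;
    input ms @@;
    matseca=ms/100;   /* divide the raw value by 100 */
    output;           /* write one observation per block */
  end;
  format fecha date7.;
  drop ms;
run;
proc sort data=matseca;
  by ciclo variedad;
run;
/* List only the grasses (germopl='Gram.'), grouped by cycle */
proc print data=matseca;
  where germopl='Gram.';
  by ciclo;
  var fecha variedad bloque matseca;
  title 'Data for Grasses';
run;
title 'Descriptive Statistics';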
proc means data=matseca n mean stddev q1 p50 q3 qrange;
class germopl variedad;
var matseca;
run;
proc tabulate data=matseca;
class ciclo germopl variedad;
var matseca;
table germopl*variedad*ciclo, matseca*(mean std);
run;
proc sort data=matseca;
by germopl;
proc univariate data=matseca;
by germopl;
var matseca;
run;
proc freq data=matseca;
tables germopl*ciclo;
run;
/* Second example: PROC FREQ on pre-tabulated cell counts */
proc format;
  value degree 1='Less than high school'
               2='High school or junior college'
               3='Bachelor or graduate';
run;
data table;
  input degree opinion $ count;
  format degree degree.;
  datalines;
1 against 178
1 indiff 138
1 favor 108
2 against 570
2 indiff 648
2 favor 442
3 against 138
3 indiff 252
3 favor 252
;
run;
title 'Using PROC FREQ';
/* WEIGHT makes each record count as COUNT observations */
proc freq data=table order=data;
  weight count;
  tables degree*opinion / nopercent nocol;
run;
ods rtf close;