AGRO 6005, Lecture 3
Descriptive Statistics

The procedures most often used to compute descriptive statistics are MEANS, UNIVARIATE, FREQ, TABULATE, and SUMMARY.

The MEANS procedure provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations. For example, PROC MEANS:
- calculates descriptive statistics based on moments
- estimates quantiles, including the median
- calculates confidence limits for the mean
- identifies extreme values
- performs a t test

By default, PROC MEANS displays its output. You can also use the OUTPUT statement to store the statistics in a SAS data set. Statistic keywords in the PROC MEANS statement specify which statistics to compute and the order in which they are displayed in the output. The available keywords are:

Descriptive statistic keywords:
- CLM (two-sided confidence limits for the mean)
- CSS (corrected sum of squares)
- CV (coefficient of variation)
- KURTOSIS|KURT (coefficient of kurtosis)
- LCLM (one-sided lower confidence limit for the mean)
- MAX, MEAN, MIN, N, NMISS, RANGE
- SKEWNESS|SKEW (coefficient of skewness)
- STDDEV|STD (standard deviation)
- STDERR (standard error of the mean)
- SUM
- SUMWGT (sum of weights)
- UCLM (one-sided upper confidence limit for the mean)
- USS (uncorrected sum of squares)
- VAR (variance)

Quantile statistic keywords:
- MEDIAN|P50, P1, P5, P10, P90, P95, P99, Q1|P25, Q3|P75, QRANGE

Hypothesis testing keywords:
- PROBT (p-value of the t test), T (t statistic for the hypothesis that the population mean is 0)

If no statistic keywords are given, PROC MEANS reports N, MEAN, STD, MIN, and MAX.

The UNIVARIATE procedure provides data summarization tools, high-resolution graphics displays, and information on the distribution of numeric variables.
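Both PROC MEANS and PROC UNIVARIATE compute their moment-based statistics from the same sample formulas. As a reference sketch, assuming n unweighted, nonmissing values x_1, ..., x_n and the default variance divisor n - 1 (VARDEF=DF), they are usually defined as below; weights or other VARDEF= settings change the divisors.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{USS} = \sum_{i=1}^{n} x_i^2, \qquad
\mathrm{CSS} = \sum_{i=1}^{n} (x_i - \bar{x})^2
$$

$$
s^2 = \frac{\mathrm{CSS}}{n-1}, \qquad
s = \sqrt{s^2}, \qquad
\mathrm{STDERR} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{CV} = 100\,\frac{s}{\bar{x}}
$$

$$
\mathrm{SKEWNESS} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^{3}
$$

$$
\mathrm{KURTOSIS} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^{4} - \frac{3(n-1)^2}{(n-2)(n-3)}
$$

$$
T = \frac{\bar{x} - 0}{s/\sqrt{n}}, \qquad
\mathrm{CLM} = \bar{x} \pm t_{1-\alpha/2,\;n-1}\,\frac{s}{\sqrt{n}}, \qquad
\mathrm{RANGE} = \mathrm{MAX} - \mathrm{MIN}, \qquad
\mathrm{QRANGE} = Q_3 - Q_1
$$

Here alpha is 0.05 unless changed with the ALPHA= option, and PROBT is the two-sided p-value of T from a t distribution with n - 1 degrees of freedom.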
For example, PROC UNIVARIATE:
- calculates descriptive statistics based on moments
- calculates the median, mode, range, and quantiles
- calculates robust estimates of location and scale
- calculates confidence limits
- tabulates extreme observations and extreme values
- generates frequency tables
- plots the data distribution
- performs tests for location and normality
- performs goodness-of-fit tests for fitted parametric and nonparametric distributions
- creates histograms and optionally superimposes density curves for fitted continuous distributions (beta, exponential, gamma, lognormal, and Weibull) and for kernel density estimates
- creates quantile-quantile plots and probability plots for various theoretical distributions and optionally superimposes a reference line that corresponds to the specified or estimated location and scale parameters of the theoretical distribution
- creates one-way and two-way comparative histograms, comparative quantile-quantile plots, and comparative probability plots
- insets tables of statistics in the graphical displays (high-resolution graphs)
- creates output data sets with requested statistics, histogram intervals, and parameters of the fitted distributions

The FREQ procedure is both a descriptive and a statistical procedure that produces one-way to n-way frequency and crosstabulation tables. Frequency tables concisely describe your data by reporting the distribution of variable values. Crosstabulation tables, also known as contingency tables, summarize data for two or more classification variables by showing the number of observations for each combination of variable values. For one-way frequency tables, PROC FREQ can compute statistics to test for equal proportions, specified proportions, or the binomial proportion. For contingency tables, PROC FREQ can compute various statistics to examine the relationship between two classification variables, adjusting for any stratification variables.
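A minimal sketch of these tests, assuming the degree-by-opinion cell counts used in the PROC FREQ example at the end of this lecture. On the one-way table, CHISQ tests equal proportions and BINOMIAL(LEVEL='favor') estimates the proportion of 'favor' answers (tested against 0.5 by default); on the two-way table, CHISQ tests for association between education and opinion. The data set name SURVEY is hypothetical.

```sas
/* Cell counts from the degree-by-opinion example in this lecture. */
/* SURVEY is a hypothetical data set name.                         */
data survey;
   input degree opinion $ count;
   datalines;
1 against 178
1 indiff 138
1 favor 108
2 against 570
2 indiff 648
2 favor 442
3 against 138
3 indiff 252
3 favor 252
;

proc freq data=survey order=data;
   weight count;   /* each record stands for COUNT observations */
   /* One-way table: chi-square test of equal proportions and  */
   /* binomial proportion of 'favor' with confidence limits     */
   tables opinion / chisq binomial(level='favor');
   /* Two-way table: chi-square tests of association */
   tables degree*opinion / chisq nopercent nocol;
run;
```

Because the data hold one record per cell, the WEIGHT statement is what makes PROC FREQ treat each row as COUNT observations rather than one.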
PROC FREQ automatically displays its results in a report and can also save them in a SAS data set.

The TABULATE procedure displays descriptive statistics in tabular format, using some or all of the variables in a data set. You can create a variety of tables ranging from simple to highly customized. PROC TABULATE computes many of the same statistics that are computed by other descriptive statistical procedures such as MEANS, FREQ, and REPORT. PROC TABULATE provides:
- simple but powerful methods to create tabular reports
- flexibility in classifying the values of variables and establishing hierarchical relationships between the variables
- mechanisms for labeling and formatting variables and procedure-generated statistics

Example:

```sas
ods rtf file='e:\conferencia3.rtf';
options pageno=1;

/* Each record has cycle, date, germplasm type, and variety, followed */
/* by one dry-matter value per block (3 blocks). The @@ line-hold     */
/* specifier lets several records share one input line.               */
data matseca;
   infile 'e:\conferencia3.txt';
   input ciclo fecha : mmddyy8. germopl $ variedad @@;
   do bloque=1 to 3;
      input ms @@;
      matseca=ms/100;   /* raw value divided by 100 */
      output;
   end;
   format fecha date7.;
   drop ms;
run;

proc sort data=matseca;
   by ciclo variedad;
run;

proc print data=matseca;
   where germopl='Gram.';
   by ciclo;
   var fecha variedad bloque matseca;
   title 'Data for grasses (Gramineae)';
run;

title 'Descriptive statistics';
proc means data=matseca n mean stddev q1 p50 q3 qrange;
   class germopl variedad;
   var matseca;
run;

proc tabulate data=matseca;
   class ciclo germopl variedad;
   var matseca;
   table germopl*variedad*ciclo, matseca*(mean std);
run;

/* BY-group processing requires sorting by the BY variable */
proc sort data=matseca;
   by germopl;
run;

proc univariate data=matseca;
   by germopl;
   var matseca;
run;

proc freq data=matseca;
   tables germopl*ciclo;
run;

/* Labels for the education levels */
proc format;
   value degree 1='Less than high school'
                2='High school or junior college'
                3='Bachelor or graduate';
run;

/* One record per cell: education level, opinion, and cell count */
data table;
   input degree opinion $ count;
   format degree degree.;
   datalines;
1 against 178
1 indiff 138
1 favor 108
2 against 570
2 indiff 648
2 favor 442
3 against 138
3 indiff 252
3 favor 252
;

title 'Using PROC FREQ';
proc freq data=table order=data;
   weight count;   /* each record stands for COUNT observations */
   tables degree*opinion / nopercent nocol;
run;

ods rtf close;
```
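The example above only displays results. A minimal sketch of two follow-ups, assuming the MATSECA data set it creates is still available: the OUTPUT statement of PROC MEANS stores one row of statistics per germplasm group in a data set, and the NORMAL option plus a HISTOGRAM statement in PROC UNIVARIATE add normality tests and a histogram with a fitted normal curve. The output data set name SUMMARY_MS and the variable names N_OBS, MEAN_MS, SD_MS, and SE_MS are hypothetical.

```sas
/* Assumes MATSECA (created by the example above) exists. */
/* BY-group processing requires sorting by GERMOPL.       */
proc sort data=matseca;
   by germopl;
run;

/* NOPRINT suppresses the printed report; OUTPUT OUT= saves   */
/* the requested statistics, one row per germplasm group, in  */
/* the hypothetical data set SUMMARY_MS.                      */
proc means data=matseca noprint;
   by germopl;
   var matseca;
   output out=summary_ms n=n_obs mean=mean_ms std=sd_ms stderr=se_ms;
run;

proc print data=summary_ms noobs;
   title 'Statistics saved with the OUTPUT statement';
run;

/* NORMAL requests tests for normality; the HISTOGRAM statement */
/* draws a histogram and overlays a fitted normal density.      */
proc univariate data=matseca normal;
   by germopl;
   var matseca;
   histogram matseca / normal;
   title 'Normality checks by germplasm';
run;
```

Besides the requested statistics, SUMMARY_MS also contains the automatic variables _TYPE_ and _FREQ_ that PROC MEANS adds to every OUTPUT data set.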