Goodness-of-Fit Tests

When you specify the NORMAL option in the PROC UNIVARIATE statement or you request a fitted parametric distribution in the HISTOGRAM statement, the procedure computes goodness-of-fit tests for the null hypothesis that the values of the analysis variable are a random sample from the specified theoretical distribution. See Example 4.22.

When you specify the NORMAL option, these tests, which are summarized in the output table labeled "Tests for Normality," include the following:

• Shapiro-Wilk test
• Kolmogorov-Smirnov test
• Anderson-Darling test
• Cramér-von Mises test

The Kolmogorov-Smirnov statistic, the Anderson-Darling statistic, and the Cramér-von Mises statistic are based on the empirical distribution function (EDF).

Shapiro-Wilk Statistic

If the sample size $n$ is less than or equal to 2000 and you specify the NORMAL option, PROC UNIVARIATE computes the Shapiro-Wilk statistic $W$ (also denoted as $W_n$ to emphasize its dependence on the sample size $n$). Small values of $W$ lead to the rejection of the null hypothesis of normality. The distribution of $W$ is highly skewed. Seemingly large values of $W$ (such as 0.90) may be considered small and lead you to reject the null hypothesis.

The method for computing the $p$-value (the probability of obtaining a $W$ statistic less than or equal to the observed value) depends on $n$. For $n = 3$, the probability distribution of $W$ is known and is used to determine the $p$-value. For $n > 3$, a normalizing transformation is computed:

\[ Z_n = \begin{cases} \dfrac{-\log\left(\gamma - \log(1 - W_n)\right) - \mu}{\sigma} & \text{if } 4 \le n \le 11 \\[1ex] \dfrac{\log(1 - W_n) - \mu}{\sigma} & \text{if } 12 \le n \le 2000 \end{cases} \]

The values of $\sigma$, $\gamma$, and $\mu$ are functions of $n$ obtained from simulation results. Large values of $Z_n$ indicate departure from normality, and because the statistic $Z_n$ has an approximately standard normal distribution, this distribution is used to determine the $p$-values for $n > 3$.

EDF Goodness-of-Fit Tests

When you fit a parametric distribution, PROC UNIVARIATE provides a series of goodness-of-fit tests based on the empirical distribution function (EDF).
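As an outside-of-SAS cross-check of the Shapiro-Wilk test described above, the following sketch uses SciPy, whose `scipy.stats.shapiro` computes a $W$ statistic and $p$-value with a comparable Royston-style approximation (the simulated data and seed are illustrative, not from the original):

```python
# Sketch: computing a Shapiro-Wilk W statistic and p-value outside SAS.
# This is not the PROC UNIVARIATE implementation; scipy.stats.shapiro uses
# a comparable normalizing-transformation approach for moderate n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=50)  # hypothetical analysis variable

W, p_value = stats.shapiro(x)
print(f"W = {W:.4f}, p-value = {p_value:.4f}")

# Small W (hence a small p-value) leads to rejecting the null hypothesis of
# normality; for data actually drawn from a normal distribution, W is close to 1.
```

Note that, as the text warns, $W$ values that look large in absolute terms (such as 0.90) can still be small enough to reject normality, so the $p$-value, not $W$ itself, should drive the decision.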
Given $n$ observations $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ ordered from smallest to largest, the values $U_{(i)}$ are computed by applying the transformation $U_{(i)} = F(X_{(i)})$, as discussed in the next three sections. PROC UNIVARIATE provides three EDF tests:

• Kolmogorov-Smirnov
• Anderson-Darling
• Cramér-von Mises

The following sections provide formal definitions of these EDF statistics.

Kolmogorov D Statistic

The Kolmogorov-Smirnov statistic ($D$) is defined as

\[ D = \sup_x \lvert F_n(x) - F(x) \rvert \]

The Kolmogorov-Smirnov statistic is computed as the maximum of $D^+$ and $D^-$, where $D^+$ is the largest vertical distance between the EDF and the distribution function when the EDF is greater than the distribution function, and $D^-$ is the largest vertical distance when the EDF is less than the distribution function:

\[ D^+ = \max_i \left( \frac{i}{n} - U_{(i)} \right) \qquad D^- = \max_i \left( U_{(i)} - \frac{i-1}{n} \right) \qquad D = \max\left( D^+, D^- \right) \]

PROC UNIVARIATE uses a modified Kolmogorov $D$ statistic to test the data against a normal distribution with mean and variance equal to the sample mean and variance.

Anderson-Darling Statistic

The Anderson-Darling statistic ($A^2$) is computed as

\[ A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \log U_{(i)} + \log\left( 1 - U_{(n+1-i)} \right) \right] \]

Cramér-von Mises Statistic

The Cramér-von Mises statistic ($W^2$) is computed as

\[ W^2 = \sum_{i=1}^{n} \left( U_{(i)} - \frac{2i - 1}{2n} \right)^2 + \frac{1}{12n} \]
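The three EDF statistics can be sketched directly from their standard computational formulas. The example below tests a sample against a fully specified standard normal distribution; it is an illustration under that assumption, not the PROC UNIVARIATE implementation (which, when parameters are estimated from the data, uses modified statistics):

```python
# Sketch: Kolmogorov-Smirnov D, Anderson-Darling A^2, and Cramér-von Mises W^2
# for a sample tested against a fully specified N(0, 1) distribution.
# The data are simulated; U_(i) = F(X_(i)) with F the hypothesized CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=25))   # ordered observations X_(1) <= ... <= X_(n)
n = len(x)
u = stats.norm.cdf(x)              # U_(i) = F(X_(i))
i = np.arange(1, n + 1)

# Kolmogorov-Smirnov: largest vertical distances above and below the CDF
d_plus = np.max(i / n - u)
d_minus = np.max(u - (i - 1) / n)
D = max(d_plus, d_minus)

# Anderson-Darling: u[::-1] supplies U_(n+1-i)
A2 = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))

# Cramér-von Mises
W2 = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)

print(f"D = {D:.4f}, A^2 = {A2:.4f}, W^2 = {W2:.4f}")
```

As a sanity check, the $D$ computed this way agrees with `scipy.stats.kstest(x, "norm").statistic`, since both are $\max(D^+, D^-)$ against the same hypothesized distribution.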