Goodness-of-Fit Tests

When you specify the NORMAL option in the PROC UNIVARIATE statement or you
request a fitted parametric distribution in the HISTOGRAM statement, the procedure
computes goodness-of-fit tests for the null hypothesis that the values of the analysis
variable are a random sample from the specified theoretical distribution. See Example
4.22.
When you specify the NORMAL option, these tests, which are summarized in the output
table labeled "Tests for Normality," include the following:
• Shapiro-Wilk test
• Kolmogorov-Smirnov test
• Anderson-Darling test
• Cramér-von Mises test
The Kolmogorov-Smirnov statistic, the Anderson-Darling statistic, and the Cramér-von
Mises statistic are based on the empirical distribution function (EDF).
Shapiro-Wilk Statistic
If the sample size is less than or equal to 2000 and you specify the NORMAL option,
PROC UNIVARIATE computes the Shapiro-Wilk statistic, $W$ (also denoted as $W_n$ to
emphasize its dependence on the sample size $n$). Small values of $W$ lead to the rejection of
the null hypothesis of normality. The distribution of $W$ is highly skewed. Seemingly large
values of $W$ (such as 0.90) may be considered small and lead you to reject the null
hypothesis. The method for computing the $p$-value (the probability of obtaining a $W$
statistic less than or equal to the observed value) depends on $n$. For $n = 3$, the probability
distribution of $W$ is known and is used to determine the $p$-value. For $4 \le n \le 2000$, a
normalizing transformation is computed:
$$
Z_n =
\begin{cases}
\dfrac{-\log\bigl(\gamma - \log(1 - W_n)\bigr) - \mu}{\sigma} & \text{if } 4 \le n \le 11 \\[2ex]
\dfrac{\log(1 - W_n) - \mu}{\sigma} & \text{if } 12 \le n \le 2000
\end{cases}
$$
The values of $\sigma$, $\gamma$, and $\mu$ are functions of $n$ obtained from simulation results. Large values
of $Z_n$ indicate departure from normality, and because the statistic $Z_n$ has an approximately
standard normal distribution, this distribution is used to determine the $p$-values for
$4 \le n \le 2000$.
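The normalizing transformation can be sketched in Python as follows. This is only an illustration, not the procedure's implementation: the function names are hypothetical, and the constants `mu`, `sigma`, and `gamma` (which depend on the sample size) are taken as caller-supplied inputs rather than computed here. The p-value is the upper-tail standard normal probability, because large values of the transformed statistic indicate departure from normality.

```python
import math


def shapiro_wilk_z(w, n, mu, sigma, gamma=None):
    """Normalizing transformation of the Shapiro-Wilk statistic W_n.

    mu, sigma and gamma are assumed to be supplied by the caller;
    this sketch does not derive them from n.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("this sketch requires 0 < W < 1")
    if 4 <= n <= 11:
        if gamma is None:
            raise ValueError("gamma is required for 4 <= n <= 11")
        return (-math.log(gamma - math.log(1.0 - w)) - mu) / sigma
    if 12 <= n <= 2000:
        return (math.log(1.0 - w) - mu) / sigma
    raise ValueError("the transformation applies only for 4 <= n <= 2000")


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For example, `upper_tail_p_value(shapiro_wilk_z(w, n, mu, sigma))` gives an approximate p-value for a sample of size 12 or more, once suitable constants are available.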
EDF Goodness-of-Fit Tests
When you fit a parametric distribution, PROC UNIVARIATE provides a series of
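goodness-of-fit tests based on the empirical distribution function (EDF). Given $n$
observations $X_{(1)}, \ldots, X_{(n)}$, ordered from smallest to largest, the values
$U_{(i)} = F(X_{(i)})$ are computed by applying the probability integral transformation
$U = F(X)$, where $F$ is the fitted distribution function, as discussed in the next three sections.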
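As a rough illustration of this step, the sketch below sorts the observations and applies a fitted distribution function to each one. It assumes a normal fit with mean and standard deviation estimated from the sample (using the n - 1 denominator). The function name `pit_values` is hypothetical, and other fitted distributions would substitute their own distribution function.

```python
from statistics import NormalDist, mean, stdev


def pit_values(data):
    """Return the ordered values U_(i) = F(X_(i)).

    F is assumed here to be a normal distribution function whose mean and
    standard deviation are estimated from the sample itself.
    """
    fitted = NormalDist(mean(data), stdev(data))
    return [fitted.cdf(x) for x in sorted(data)]
```

If the fitted distribution is correct, the returned values should behave roughly like an ordered sample from the uniform distribution on (0, 1). That is the property the EDF statistics measure departures from.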
PROC UNIVARIATE provides three EDF tests:
• Kolmogorov-Smirnov
• Anderson-Darling
• Cramér-von Mises
The following sections provide formal definitions of these EDF statistics.
Kolmogorov D Statistic
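The Kolmogorov-Smirnov statistic ($D$) is defined as
$$
D = \sup_x \left| F_n(x) - F(x) \right|
$$
where $F_n(x)$ is the EDF and $F(x)$ is the specified distribution function.
The Kolmogorov-Smirnov statistic is computed as the maximum of $D^+$ and $D^-$, where
$D^+$ is the largest vertical distance between the EDF and the distribution function when the
EDF is greater than the distribution function, and $D^-$ is the largest vertical distance when
the EDF is less than the distribution function:
$$
D^+ = \max_i \left( \frac{i}{n} - U_{(i)} \right), \qquad
D^- = \max_i \left( U_{(i)} - \frac{i-1}{n} \right), \qquad
D = \max\left( D^+, D^- \right)
$$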
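The computation from the transformed values can be sketched as below. The function name `kolmogorov_d` is hypothetical. The input is assumed to hold values of the fitted distribution function at each observation, and the result is the unmodified statistic, without any adjustment for estimated parameters.

```python
def kolmogorov_d(u):
    """Return (D_plus, D_minus, D) from transformed values U_i = F(X_i)."""
    u = sorted(u)
    n = len(u)
    # Largest amount by which the EDF lies above the distribution function.
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    # Largest amount by which the EDF lies below the distribution function.
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)
```

For the three values 0.1, 0.4, and 0.7, the upper distance is about 0.3 (at the largest value) and the lower distance is about 0.1 (at the smallest), so the statistic is about 0.3.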
PROC UNIVARIATE uses a modified Kolmogorov $D$ statistic to test the data against a
normal distribution with mean and variance equal to the sample mean and variance.
Anderson-Darling Statistic
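The Anderson-Darling statistic ($A^2$) is computed as
$$
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left[ (2i - 1) \log U_{(i)} + (2n + 1 - 2i) \log\left( 1 - U_{(i)} \right) \right]
$$
where $U_{(i)} = F(X_{(i)})$ are the ordered transformed values and $\log$ is the natural logarithm.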
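A minimal Python sketch of the standard computational formula is shown below. It weights the natural logarithms of each ordered transformed value and of its complement, then subtracts the averaged sum from -n. The function name `anderson_darling` is hypothetical, and the inputs are assumed to lie strictly between 0 and 1.

```python
import math


def anderson_darling(u):
    """Return A^2 from transformed values U_i = F(X_i), each in (0, 1)."""
    u = sorted(u)
    n = len(u)
    total = sum(
        (2 * i - 1) * math.log(ui) + (2 * n + 1 - 2 * i) * math.log(1.0 - ui)
        for i, ui in enumerate(u, start=1)
    )
    return -n - total / n
```

Because of the logarithms, transformed values near 0 or 1 contribute heavily, which is why this statistic is sensitive to discrepancies in the tails.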
Cramér-von Mises Statistic
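The Cramér-von Mises statistic ($W^2$) is computed as
$$
W^2 = \sum_{i=1}^{n} \left( U_{(i)} - \frac{2i - 1}{2n} \right)^2 + \frac{1}{12n}
$$
where $U_{(i)} = F(X_{(i)})$ are the ordered transformed values.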
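The standard computational formula can be sketched in Python as follows. It sums the squared differences between each ordered transformed value and the midpoint (2i - 1)/(2n) of its EDF step, then adds 1/(12n). The function name `cramer_von_mises` is hypothetical.

```python
def cramer_von_mises(u):
    """Return W^2 from transformed values U_i = F(X_i)."""
    u = sorted(u)
    n = len(u)
    squared_gaps = sum(
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )
    return squared_gaps + 1.0 / (12 * n)
```

When every transformed value sits exactly at its step midpoint, the sum vanishes and the statistic takes its smallest possible value, 1/(12n).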