Minitab material on test for Normality

NORMTEST

Note: NORMTEST replaces %NORMPLOT.

Stat > Basic Statistics > Normality Test
or Graph > Probability Plot

Command Syntax

NORMTEST C              Generates a normal probability plot
  PTILES C or K...K     Specifies a set of reference percents
  DVALUE K...K          Shows the percents at the reference x-scale positions
  PERCENT               Specifies a percent y-scale
  PROBABILITY           Specifies a probability y-scale
  SCORES                Specifies a percentile y-scale
  RJTEST                Specifies the Ryan-Joiner test (similar to Shapiro-Wilk test)
  KSTEST                Specifies the Kolmogorov-Smirnov goodness-of-fit test
  SPVALUE C             Stores the p-value of test
  TITLE "text"          Specifies a graph title
  GSAVE "file"          Saves the graph in a Minitab Graphics Format (MGF) file

Generates a normal probability plot. Normal plots use the values in the input column as x-values. The grid on the graph resembles the grids found on normal probability paper. The horizontal axis is a linear scale. The line forms an estimate of the cumulative distribution function for the population from which data are drawn.

By default, an Anderson-Darling test for normality is performed and the numerical results are displayed with the graph. You can also use a Ryan-Joiner test (similar to a Shapiro-Wilk test) or a Kolmogorov-Smirnov test.

Subcommands

PTILES C or K...K
DVALUE K...K
Specifies a set of reference percents. The values must be between 0 and 100 when percents are used as the y-scale type, or between 0 and 1 when probability is the y-scale type. Minitab marks each percent in the column with a horizontal reference line on the plot, and marks each line with the percent value. Minitab draws a vertical reference line where the horizontal reference line intersects the line fit to the data, and marks this line with the estimated data value. Use DVALUE to show the percents at the reference x-scale positions.

PERCENT
Specifies a percent y-scale.

PROBABILITY
Specifies a probability y-scale.

SCORES
Specifies a percentile y-scale.

RJTEST
KSTEST
There are three types of goodness-of-fit test: a chi-square based test, an ECDF based test, and a correlation based test. By default, Minitab uses the Anderson-Darling test, which is an ECDF based test. Use RJTEST to perform a Ryan-Joiner test, which is a correlation based test; use KSTEST to perform a Kolmogorov-Smirnov test, which is also an ECDF based test.

When your α-value is larger than the p-value displayed with the graph, you should reject the hypothesis of normality. The α-value (also known as the significance level) is the probability that you will reject the hypothesis of normality when the hypothesis is true. For example, if you are using an α-value of 0.10 and the p-value displayed in the Graph window is 0.07, then you would reject the hypothesis of normality at the 0.10 level.

SPVALUE C
Stores the p-value of the test.

TITLE "text"
Use TITLE to specify a title for the graph. When you omit this subcommand, Minitab displays a default title.

GSAVE "filename"
Use GSAVE to save the graph in a Minitab Graphics Format (MGF) file. Unless you specify a file extension or use a graphics format subcommand, Minitab automatically adds the extension MGF to the file name. If you save the plot, you can view it later with GVIEW and edit the plot with graph editing tools. See GSAVE for more information on this subcommand.

Example of Normality Test

In an operating engine, parts of the crankshaft move up and down. AtoBDist is the distance (in mm) from the actual (A) position of a point on the crankshaft to a baseline (B) position.
To ensure production quality, a manager took five measurements each working day in a car assembly plant, from September 28 through October 15, and then ten per day from the 18th through the 25th. You wish to see if these data follow a normal distribution, so you use Normality Test.

1  Open the worksheet CRANKSH.MTW.
2  Choose Stat > Basic Statistics > Normality Test.
3  In Variable, enter AtoBDist. Click OK.

Graph window output: see below.

This is a Minitab run on the dataset mentioned above.

Retrieving worksheet from file: '\\purple2\resource\wminitab14\Data\Cranksh.MTW'
Worksheet was saved on Fri Sep 12 2003

Results for: Cranksh.MTW

MTB > describe c1

Descriptive Statistics: AtoBDist

Variable    N  N*   Mean  SE Mean  StDev  Minimum      Q1  Median     Q3
AtoBDist  125   0  0.442    0.312  3.491   -7.303  -2.243   0.130  3.607

Variable  Maximum
AtoBDist    8.023

MTB > NormTest c1;
SUBC>   KSTest.

Probability Plot of AtoBDist

[Graph: Probability Plot of AtoBDist, Normal. Horizontal axis: AtoBDist (-10 to 10). Vertical axis: Percent (0.1 to 99.9). Legend: Mean 0.4417, StDev 3.491, N 125, KS 0.094, P-Value <0.010]

MTB > normtest c1;
SUBC>   rjtest.

Probability Plot of AtoBDist

[Graph: Probability Plot of AtoBDist, Normal. Horizontal axis: AtoBDist (-10 to 10). Vertical axis: Percent (0.1 to 99.9). Legend: Mean 0.4417, StDev 3.491, N 125, RJ 0.990, P-Value 0.066]

MTB > normtest c1

Probability Plot of AtoBDist

[Graph: Probability Plot of AtoBDist, Normal. Horizontal axis: AtoBDist (-10 to 10). Vertical axis: Percent (0.1 to 99.9). Legend: Mean 0.4417, StDev 3.491, N 125, AD 0.891, P-Value 0.022]

If your data are perfectly normal, then the data points on the probability plot will form a straight line. The red line forms an estimate of the cumulative distribution function for the population from which the data are drawn. Your text says that if the data are skewed to the left, the plot will rise rapidly at first and then level off, while if they are skewed to the right, the plot will rise slowly at first and quickly later.
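
The relationship between the fitted line, the reference percents (PTILES) and the reference data values (DVALUE) can be sketched in a few lines of Python. This is only an illustration, not Minitab's implementation. It assumes the fitted line is the normal CDF with the Mean and StDev shown in the graph legend, and it assumes median-rank plotting positions (i - 0.3)/(n + 0.4) for the points; the handout does not state which plotting-position formula Minitab uses. The function names are hypothetical.

```python
from statistics import NormalDist

# Mean and StDev as reported in the AtoBDist graph legend.
# Assumption: the fitted line is the CDF of this normal distribution.
fitted = NormalDist(mu=0.4417, sigma=3.491)


def plotting_positions(n):
    """Median-rank plotting positions, in percent, for n sorted points.

    Assumption: (i - 0.3) / (n + 0.4) is one common convention; the
    handout does not say which formula Minitab applies.
    """
    return [100 * (i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def probability_plot_points(data):
    """Pair each sorted observation (x) with its plotting position (y)."""
    xs = sorted(data)
    return list(zip(xs, plotting_positions(len(xs))))


def value_at(percent):
    """PTILES direction: data value where the fit crosses a percent line."""
    return fitted.inv_cdf(percent / 100)


def percent_at(x):
    """DVALUE direction: percent read off the fitted line at data value x."""
    return 100 * fitted.cdf(x)


if __name__ == "__main__":
    for p in (5, 50, 95):
        print(f"{p}% reference line meets the fit at x = {value_at(p):.3f}")
    print(f"x = 0 lies at {percent_at(0):.1f}% on the fitted line")
```

Under these assumptions the 50% reference line meets the fit at the sample mean, and points that fall away from the fitted values in the tails are the ones that show up as departures from the line on the plot.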

Interpreting the results

The graphical output is a plot of normal probabilities versus the data. The data depart from the fitted line most evidently in the extremes, or distribution tails. The Anderson-Darling test's p-value indicates that, at α levels greater than 0.022, there is evidence that the data do not follow a normal distribution. There is a slight tendency for these data to be lighter in the tails than a normal distribution because the smallest points are below the line and the largest point is just above the line. A distribution with heavy tails would show the opposite pattern at the extremes.

Many statistical procedures assume the data follow a normal distribution. In order to verify this assumption, you can perform a normality test on your data. MINITAB provides three normality tests that you can choose from:

· Anderson-Darling - This test has good power and is especially effective at detecting departure from normality in the high and low values of a distribution.
· Ryan-Joiner (similar to Shapiro-Wilk) - This test also has good power. It is based on the correlation between the sample data and the data one would expect from a normal distribution.
· Kolmogorov-Smirnov - This is a popular test of normality, but tends to be less powerful than the other two.

Choosing a normality test

You have a choice of hypothesis tests for testing normality:

· Anderson-Darling test (the default), which is an ECDF (empirical cumulative distribution function) based test
· Ryan-Joiner test [4], [9] (similar to the Shapiro-Wilk test [10], [11]), which is a correlation based test
· Kolmogorov-Smirnov test [8], an ECDF based test

The Anderson-Darling and Ryan-Joiner tests have similar power for detecting non-normality. The Kolmogorov-Smirnov test has less power; see [3], [8], and [9] for discussions of these tests for normality.

The common null hypothesis for these three tests is H0: data follow a normal distribution. If the p-value of the test is less than your α level, reject H0.
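
To make the distinction between ECDF based and correlation based tests concrete, the following Python sketch computes the three test statistics with textbook formulas. It is a hedged illustration, not Minitab's code: it assumes the sample mean and sample standard deviation are plugged into the normal CDF, it assumes normal scores of the form (i - 3/8)/(n + 1/4) for the Ryan-Joiner correlation, and it does not attempt the p-values, which require tables or approximations that are not reproduced here. The function names and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def normality_statistics(data):
    """Return (AD, KS, RJ) statistics for H0: data follow a normal distribution."""
    x = sorted(data)
    n = len(x)
    # Assumption: fit the normal CDF using the sample mean and sample StDev.
    fitted = NormalDist(mean(x), stdev(x))
    f = [fitted.cdf(v) for v in x]

    # Anderson-Darling (ECDF based): weights the tails more heavily.
    ad = -n - sum(
        (2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i]))
        for i in range(n)
    ) / n

    # Kolmogorov-Smirnov (ECDF based): largest gap between ECDF and fitted CDF.
    ks = max(max((i + 1) / n - f[i], f[i] - i / n) for i in range(n))

    # Ryan-Joiner (correlation based): correlation of sorted data with
    # approximate normal scores (assumed form: (i - 3/8) / (n + 1/4)).
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((i + 1 - 0.375) / (n + 0.25)) for i in range(n)]
    mx, ms = mean(x), mean(scores)
    sxy = sum((a - mx) * (b - ms) for a, b in zip(x, scores))
    sxx = sum((a - mx) ** 2 for a in x)
    sss = sum((b - ms) ** 2 for b in scores)
    rj = sxy / math.sqrt(sxx * sss)

    return ad, ks, rj


def reject_normality(p_value, alpha):
    """Decision rule from the text: reject H0 when the p-value is below alpha."""
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical sample, for illustration only.
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
    ad, ks, rj = normality_statistics(sample)
    print(f"AD = {ad:.3f}, KS = {ks:.3f}, RJ = {rj:.3f}")
```

Applied to the p-values reported for AtoBDist, the decision rule at α = 0.05 rejects normality for the Anderson-Darling test (0.022) but not for the Ryan-Joiner test (0.066), which shows that the choice of test can change the conclusion.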

The results of each test are accompanied by a normal probability plot that can also help you determine whether your data follow a normal distribution.

Copyright © 2000-2005 Minitab Inc. All rights reserved.

The graphs below are data points generated randomly from a Chi-squared distribution.

[Graph: Probability Plot of C1, Normal. Horizontal axis: C1 (0 to 25). Vertical axis: Percent (1 to 99). Legend: Mean 10.20, StDev 5.540, N 20, KS 0.122, P-Value >0.150]

[Graph: Probability Plot of C1, Normal. Horizontal axis: C1 (0 to 25). Vertical axis: Percent (1 to 99). Legend: Mean 10.20, StDev 5.540, N 20, RJ 0.970, P-Value >0.100]

[Graph: Probability Plot of C1, Normal. Horizontal axis: C1 (0 to 25). Vertical axis: Percent (1 to 99). Legend: Mean 10.20, StDev 5.540, N 20, AD 0.473, P-Value 0.217]

References

Basic Statistics

[1] S.F. Arnold (1990). Mathematical Statistics. Prentice-Hall.
[2] M.B. Brown and A.B. Forsythe (1974). "Robust Tests for the Equality of Variances," Journal of the American Statistical Association, 69, 364-367.
[3] R.B. D'Agostino and M.A. Stephens, Eds. (1986). Goodness-of-Fit Techniques, Marcel Dekker.
[4] J.J. Filliben (1975). "The Probability Plot Correlation Coefficient Test for Normality," Technometrics, 17, 111.
[5] T.P. Hettmansperger and S.J. Sheather (1986). "Confidence Intervals Based on Interpolated Order Statistics," Statistics and Probability Letters, 4, 75-79.
[6] N.L. Johnson and S. Kotz (1969). Discrete Distributions, John Wiley & Sons.
[7] H. Levene (1960). Contributions to Probability and Statistics, Stanford University Press.
[8] H.W. Lilliefors (1967). "On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown," Journal of the American Statistical Association, 62, 399-402.
[9] T.A. Ryan, Jr. and B.L. Joiner (1976). "Normal Probability Plots and Tests for Normality," Technical Report, Statistics Department, The Pennsylvania State University. (Available from Minitab Inc.)
[10] S.S. Shapiro and R.S. Francia (1972). "An Approximate Analysis of Variance Test for Normality," Journal of the American Statistical Association, 67, 215-216.
[11] S.S. Shapiro and M.B. Wilk (1965). "An Analysis of Variance Test for Normality (Complete Samples)," Biometrika, 52, 591.