Checking for Normality

Checking for normality:
What is a normal distribution?
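A normal distribution is a probability distribution whose density follows the formula:
f(x) = 1 / (σ √(2π)) · exp( −(x − μ)² / (2σ²) )
where μ is the mean and σ is the standard deviation.
This produces a bell-shaped probability density curve that is symmetric about the mean.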
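As a rough illustration, the density of a normal distribution can be evaluated and plotted directly in MATLAB. The values of mu and sigma below are assumptions chosen only for the example; they do not come from the lab data.

```matlab
% Assumed parameters, for illustration only.
mu = 0;       % mean
sigma = 1;    % standard deviation

% Evaluate the normal density on a grid covering mu +/- 4 standard deviations.
x = linspace(mu - 4*sigma, mu + 4*sigma, 401);
f = (1 ./ (sigma * sqrt(2*pi))) .* exp(-(x - mu).^2 ./ (2 * sigma^2));

% Draw the bell-shaped curve.
plot(x, f);
xlabel('x');
ylabel('f(x)');
title('Normal probability density');

% The area under the curve over this range should be very close to 1.
areaUnderCurve = trapz(x, f);
```

Changing mu shifts the curve left or right, and changing sigma makes it wider or narrower.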
Why check for normality?
Normality is one of the most important conditions for parametric statistical tests, which include t-tests and ANOVAs.
If the data set is not approximately normal, these tests can have a high chance of false positives.
In that case, the appropriate tests are non-parametric.
More information about parametric and non-parametric tests is available at this link:
http://www.creative-wisdom.com/teaching/WBI/parametric_test.shtml
How to check for normality?
● Qualitatively, you can use the normplot(data) command in MATLAB.
● Enter your data into a row or column vector, and this command draws the normal probability plot.
● The data values are plotted as blue crosses against the predicted normal probabilities, together with a red reference line drawn through the first and third quartiles of the data.
● If the blue crosses look approximately linear (close to the red line), then the data is approximately normal.
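A minimal sketch of the qualitative check is shown here. Both samples are simulated and hypothetical (they are not the class data); the second is deliberately skewed so the two plots can be compared.

```matlab
% Simulated, hypothetical samples for illustration only.
rng(0);                               % fix the random seed so the plots are repeatable
normalData = 70 + 5*randn(50,1);      % roughly normal sample
skewedData = randn(50,1).^2;          % right-skewed sample (squared normal values)

% Normal probability plots side by side.
figure;
subplot(1,2,1);
normplot(normalData);
title('Roughly normal sample');

subplot(1,2,2);
normplot(skewedData);
title('Skewed sample');
```

In the first panel the crosses should stay close to the red line, while in the second they are expected to curve away from it, especially in the tails.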
To be more objective and precise, you can use the chi squared goodness of fit test or the Lilliefors test.
Chi Squared Goodness of Fit:
The conditions for using a chi squared goodness of fit test are:
● The sampling method is simple random sampling.
● The population is at least 10 times as large as the sample.
● The variable under study is categorical (continuous data is first grouped into bins, which chi2gof does automatically).
● The expected value of the number of sample observations in each level of the
variable is at least 5.
Once these conditions are met, see pages 2 and 3 of the Chi Squared Tutorial to learn how to
conduct the test by hand.
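The by-hand calculation can be sketched in MATLAB as follows. This is only an illustration under stated assumptions: the data are simulated, the choice of 8 equal-probability bins is arbitrary, and the degrees of freedom are reduced by 2 because the mean and standard deviation are estimated from the sample.

```matlab
% Simulated, hypothetical sample for illustration only.
rng(1);
data = 10 + 2*randn(200,1);
n = numel(data);

% Estimate the parameters of the normal distribution from the sample.
mu = mean(data);
sigma = std(data);

% Group the data into bins. The edges are quantiles of the fitted normal,
% so every bin has the same expected count (first and last edges are -Inf and Inf).
nbins = 8;                                      % assumed bin count
edges = norminv((0:nbins)/nbins, mu, sigma);
observed = histcounts(data, edges);
expected = n * diff(normcdf(edges, mu, sigma)); % n/nbins in every bin

% Chi squared statistic: sum of (observed - expected)^2 / expected.
chi2stat = sum((observed - expected).^2 ./ expected);

% Degrees of freedom: bins - 1 - number of estimated parameters.
df = nbins - 1 - 2;
p = 1 - chi2cdf(chi2stat, df);

fprintf('chi2 = %.3f, df = %d, p = %.4f\n', chi2stat, df, p);
```

With 200 observations and 8 bins, each expected count is 25, which satisfies the condition that every expected count is at least 5.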
For the ECG lab, however, you can simply use a MATLAB command to check the normality of
the class data:
[h,p] = chi2gof(data,'Alpha',alpha)
Inputs:
● data: your data; make sure that it is all in one column or one row.
● 'Alpha',alpha: the significance level that you want the test to use, passed as a name-value pair. The default is 0.05.
Outputs:
● h: this will be 1 if the null hypothesis that the distribution is normal can be rejected, and
will be 0 when the null cannot be rejected.
● p: the p-value of the chi squared test.
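A minimal usage sketch, assuming a simulated sample in place of the class data (the values below are hypothetical):

```matlab
% Simulated, hypothetical sample standing in for the class data.
rng(2);
data = 60 + 8*randn(100,1);

% Chi squared goodness of fit test against a normal distribution.
alpha = 0.05;                           % significance level
[h, p] = chi2gof(data, 'Alpha', alpha);

% Interpret the result.
if h == 1
    fprintf('Reject normality at alpha = %.2f (p = %.4f)\n', alpha, p);
else
    fprintf('Cannot reject normality at alpha = %.2f (p = %.4f)\n', alpha, p);
end
```

Note that h = 0 does not prove the data is normal; it only means the test found no significant evidence against normality.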
However, when the sample size is too small, this test will not work. In that case, use the
Lilliefors test for normality instead.
Lilliefors Test for Normality
OPTIONAL: to learn more about the test, go to: http://en.wikipedia.org/wiki/Lilliefors_test
To use the Lilliefors test in MATLAB: [h,p] = lillietest(data,'Alpha',alpha)
The inputs and outputs are the same as for the chi squared goodness of fit test.
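A short sketch of the call on a small sample. The twelve values below are made up for illustration and are not lab measurements.

```matlab
% Made-up small sample, for illustration only.
data = [71 68 75 80 66 73 77 69 72 74 70 78];

% Lilliefors test for normality at the 5% significance level.
[h, p] = lillietest(data, 'Alpha', 0.05);

fprintf('h = %d, p = %.4f\n', h, p);
```

By default lillietest looks up the p-value in a table covering roughly 0.001 to 0.50, so it may issue a warning and report the nearest tabulated value when the true p-value lies outside that range.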
Examples:
Chi squared goodness of fit when it works:
Chi squared goodness of fit when there is not enough data:
This is the time to use the lilliefors test:
The Lilliefors test on the same data set returns a defined p-value and also does not reject the
null hypothesis that the distribution is normal.
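The fallback described in these examples could be sketched as follows. The sample is simulated and hypothetical, and the sketch assumes that chi2gof reports a NaN p-value when too few bins remain to carry out the test, which is the expected behavior for a sample this small.

```matlab
% Simulated, hypothetical small sample for illustration only.
rng(3);
smallData = 5 + randn(8,1);

% Try the chi squared goodness of fit test first (it may warn on small samples).
[h, p] = chi2gof(smallData);
testUsed = 'chi squared goodness of fit';

% If no p-value could be computed, fall back to the Lilliefors test.
if isnan(p)
    [h, p] = lillietest(smallData);
    testUsed = 'Lilliefors';
end

fprintf('%s test: h = %d, p = %.4f\n', testUsed, h, p);
```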
Conclusion:
Checking for normality is an important step before using parametric statistical tests. A normal
probability plot is quick to produce but subjective, because you have to judge whether the data
points fall along a line. The chi squared goodness of fit test is simple to run and gives good
results for large sample sizes; when the sample size is too small, use the Lilliefors test instead.