Quantile-Quantile Plots

Also called QQ plots or normal probability plots.

Uses
- Check whether data came from a normal distribution
- Suggest how the data differ from a normal distribution
- See whether two datasets come from the same distribution

General idea
Using $\bar{x}$ and $s$ from the data, plot the sorted data against the values you would expect if the data came from a normal distribution. If the data are normal, the points should follow a straight line.

Advantages of a QQ plot over a histogram
- You can see whether the data's tails are thicker or thinner than those of a normal distribution.
- Skewness can be more apparent.

Disadvantages of a QQ plot
- Not intuitive
- Takes practice
- Styles of QQ plots differ between software packages, books, etc.

Practice data: n = 25, $\bar{x} \approx 108.52$, $s \approx 19.15$, obtained from a normal distribution with mean 100 and standard deviation 20.

 i   Data   Standardized   Percentile   Z quantile   Expected value
 1     66       -2.22076         0.02     -2.05375           69.198
 2     75       -1.75070         0.06     -1.55477           78.751
 3     83       -1.33287         0.10     -1.28155           83.983
 4     84       -1.28064         0.14     -1.08032           87.836
 5     91       -0.91504         0.18     -0.91537           90.994
 6     94       -0.75836         0.22     -0.77219           93.735
 7     99       -0.49722         0.26     -0.64335           96.202
 8    102       -0.34053         0.30     -0.52440           98.480
 9    103       -0.28830         0.34     -0.41246          100.623
10    103       -0.28830         0.38     -0.30548          102.671
11    104       -0.23607         0.42     -0.20189          104.654
12    107       -0.07939         0.46     -0.10043          106.597
13    112        0.18176         0.50      0.00000          108.520
14    114        0.28621         0.54      0.10043          110.443
15    115        0.33844         0.58      0.20189          112.386
16    117        0.44290         0.62      0.30548          114.369
17    118        0.49513         0.66      0.41246          116.417
18    118        0.49513         0.70      0.52440          118.561
19    122        0.70404         0.74      0.64335          120.838
20    124        0.80850         0.78      0.77219          123.305
21    125        0.86073         0.82      0.91537          126.046
22    127        0.96518         0.86      1.08032          129.204
23    131        1.17410         0.90      1.28155          133.057
24    133        1.27855         0.94      1.55477          138.289
25    146        1.95752         0.98      2.05375          147.842

Columns: Standardized $= (x_i - \bar{x})/s$; Percentile $p_i = (i - 0.5)/n$ for the i-th smallest value; Z quantile $z_i$ is the standard normal quantile for $p_i$; Expected value $= \bar{x} + s z_i$, the data value you would expect at that percentile if the data came from a normal distribution.

Different software packages use different algorithms for dividing the interval from 0 to 1 into n equally spaced points.

Idea: The 10th percentile of the data should match the 10th percentile you would expect from a normal distribution with mean $\bar{x}$ and standard deviation $s$; the same holds for the 50th percentile, and so on.
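A minimal sketch of the practice-data calculations, written in Python as an assumption (the original plots were made in Minitab and R). It sorts the data, computes $\bar{x}$ and $s$, uses the plotting positions $p_i = (i - 0.5)/n$, and converts each $p_i$ to a standard normal quantile with the standard library's statistics.NormalDist.inv_cdf. The function name qq_table is hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Practice data from the table above (n = 25).
data = [66, 75, 83, 84, 91, 94, 99, 102, 103, 103, 104, 107, 112,
        114, 115, 117, 118, 118, 122, 124, 125, 127, 131, 133, 146]

def qq_table(values):
    """Return (x_i, standardized, p_i, z_i, expected) rows for a normal QQ plot.

    Uses the plotting positions p_i = (i - 0.5) / n; other software may use
    a different rule, which shifts the z quantiles slightly.
    """
    xs = sorted(values)
    n = len(xs)
    xbar = mean(xs)
    s = stdev(xs)               # sample standard deviation (divisor n - 1)
    std_normal = NormalDist()   # mean 0, standard deviation 1
    rows = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n
        z = std_normal.inv_cdf(p)   # standard normal quantile for p
        rows.append((x, (x - xbar) / s, p, z, xbar + s * z))
    return rows

for x, zx, p, z, expected in qq_table(data):
    print(f"{x:4d} {zx:9.5f} {p:5.2f} {z:9.5f} {expected:8.3f}")
```

Plotting the data column against the Z quantile column gives the normal QQ plot; points close to a straight line are consistent with normality.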
Equivalent idea: The p-th percentile of the standardized values $(x_i - \bar{x})/s$ should match the p-th percentile of the standard normal distribution.

[Figures: Minitab "Probability Plot of data, Normal" and "Probability Plot of data, Normal - 95% CI" for the practice data; y-axis Percent, x-axis data; legend Mean 108.5, StDev 19.15, N 25, AD 0.214, P-Value 0.831]

The first graph is from Stat > Basic Statistics > Normality Test. The second graph is from Graph > Probability Plot and includes a 95% pointwise confidence interval for what you would expect if the data came from a normal distribution. The p-values are from the Anderson-Darling test for normality. There are a number of other tests for normality.

[Figures: four QQ plots of the practice data: "Normal Q-Q Plot" with Theoretical Quantiles on the x-axis and Sample Quantiles on the y-axis; "Normal Q-Q Plot" with the axes swapped; Standard Normal Quantiles against Sample Quantiles, both on the standardized scale; and Data values on the x-axis against a y-axis labeled with both Percents (0.02 to 0.98) and Z-scores]

The above four plots were programmed in R.

Note how the y-axis of the Minitab probability plot has unequal spacing for the percents. Some standard normal quantiles:

Percent (as a proportion)   Standard normal quantile
         0.975                      1.960
         0.900                      1.282
         0.800                      0.842
         0.700                      0.524
         0.600                      0.253
         0.500                      0.000
         0.400                     -0.253
         0.300                     -0.524
         0.200                     -0.842
         0.100                     -1.282
         0.025                     -1.960
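To see why the percent axis is unevenly spaced, the sketch below (Python, standard library only; axis_positions is a hypothetical helper) maps each percent in the table of standard normal quantiles to its z value. It assumes the percent axis is a normal probability scale, so each label sits at its z position: a step from 0.5 to 0.6 covers about 0.25 units, while a step from 0.8 to 0.9 covers about 0.44.

```python
from statistics import NormalDist

# Percents (as proportions) from the table of standard normal quantiles.
percents = [0.025, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.975]

def axis_positions(props):
    """Map each proportion to its standard normal quantile.

    On a normal probability scale the percent labels sit at these z positions,
    so equal steps in percent become unequal steps along the axis.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf(p) for p in props]

positions = axis_positions(percents)
for p, z in zip(percents, positions):
    print(f"{p:5.3f} -> z = {z:6.3f}")

# Distance between adjacent labels: small near 0.5, larger toward the tails.
gaps = [b - a for a, b in zip(positions, positions[1:])]
```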
[Figure: the Minitab "Probability Plot of data, Normal" of the practice data again (Mean 108.5, StDev 19.15, N 25, AD 0.214, P-Value 0.831)]

Right-skewed example:
[Figures: "Histogram of Chi-sq df=4" with fitted normal curve (Mean 4.011, StDev 3.009, N 500); "Probability Plot of Chi-sq df=4, Normal - 95% CI" (AD 13.472, P-Value <0.005)]

Left-skewed example:
[Figures: "Histogram of Left Skewed" with fitted normal curve (Mean 35.99, StDev 3.009, N 500); "Probability Plot of Left Skewed, Normal - 95% CI" (AD 13.472, P-Value <0.005)]

Heavy-tailed example:
[Figures: "Histogram of t-distn, df=4" with fitted normal curve (Mean 0.06165, StDev 1.356, N 500); "Probability Plot of t-distn, df=4, Normal - 95% CI" (AD 4.404, P-Value <0.005)]

The same distribution with a small sample:
[Figures: "Histogram of t, df=4, n=20" with fitted normal curve (Mean -0.1607, StDev 1.399, N 20); "Probability Plot of t, df=4, n=20, Normal - 95% CI" (AD 0.733, P-Value 0.047)]

Light-tailed example:
[Figures: "Histogram of Beta(2,2)" with fitted normal curve (Mean 0.5039, StDev 0.2308, N 500); "Probability Plot of Beta(2,2), Normal - 95% CI" (AD 2.741, P-Value <0.005)]

QQ plots can also be used to plot random values against random values to see whether two distributions are the same.

[Figure: "30 random values from two distributions": Znorm (y-axis) against tDistnDF4 (x-axis)]
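A sketch of a two-sample QQ plot in Python, assuming equal sample sizes as in the 30-value example. Student t values are generated from the definition $T = Z/\sqrt{V/\mathrm{df}}$, with $V$ a chi-square variable built from df squared standard normals. The helper names sample_t and two_sample_qq are hypothetical, and the seed is arbitrary.

```python
import math
import random

def sample_t(df, rng):
    """Draw one Student t value as Z / sqrt(V / df), with V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(v / df)

def two_sample_qq(xs, ys):
    """Pair the sorted values of two equal-size samples for a QQ plot.

    Unequal sample sizes would need interpolated quantiles instead.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(xs), sorted(ys)))

rng = random.Random(1)  # arbitrary seed, for reproducibility only
t_values = [sample_t(4, rng) for _ in range(30)]     # plays the role of tDistnDF4
z_values = [rng.gauss(0.0, 1.0) for _ in range(30)]  # plays the role of Znorm

pairs = two_sample_qq(t_values, z_values)
for t, z in pairs:
    print(f"{t:7.3f} {z:7.3f}")
```

Plotting each pair with the t value on the horizontal axis and the normal value on the vertical axis mirrors the figure. If both samples came from the same distribution, the points should fall near the line y = x; with only 30 values, the heavier tails of the t distribution tend to show up, if at all, as a few stretched points at the ends.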