Q-Q Plots for Normality

advertisement
Q-Q Plots to Assess Normality
Suppose that a sample x1,…,xn ordered from smallest to largest values: x(1)≤x(2)≤…≤x(n). The
proportion of the sample that is at or below x(j) might be thought of as j/n but since obviously
there is the possibility of ties, so this is usual calculated as (j-½)/n. If we have random variable Z
with pdf f(Z), the jth quantile is a number q(j) such that P[Z≤q(j)]=(j-½ )/n. If Z is standard
q( j) 1
 1 z2
normal, the quantiles are found by solving the equation 
e 2 dz  ( j  12 ) / n for q(j). Of

2
course, if we had a non-standard normal we would have to be slightly more complex, since we
should transform this to +q(j). If the sample is drawn from a normal distribution, then x(j)
should equal +q(j). To see whether this is true, we draw a Q-Q diagram (see SPSS drop-down
list for Descriptive Statistics) with points (x(1), q(1)), …,(x(n), q(n)), and this should be a straight
line. In the graph below, the Fortune 500 “sales” Q-Q plot is seen. Notice that sales tend to
below the line at extremes and above the line in the middle, suggesting that this is a curve, not a
line of data. Hence, sales are not very “normal.”
Normal Q-Q Plot of sales
120,000
Expected Normal Value
100,000
80,000
60,000
40,000
20,000
0
0
20,000
40,000
60,000
80,000
Observed Value
100,000
120,000
140,000
__
This procedure can be generalized to multi-dimensions as follows. For each observation
i, compute the Mahalanobis squared distance: d i2  (x i  x)' S 1 (x i  x) . These are on the
diagonal of the matrix (X-1 x ' )S-1(X-1 x ' )’. If X is normal, then this should be 2p-distributed.
Rank order the Mahalanobis squared distances, compare them to the quantiles of the 2p
distribution; there should be linearity with slope 1. SPSS allows you to draw a Q-Q plot for 2p.
Download