Q-Q Plots to Assess Normality

Suppose that a sample $x_1, \dots, x_n$ is ordered from smallest to largest: $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$. The proportion of the sample at or below $x_{(j)}$ might be thought of as $j/n$, but because ties are possible it is usually calculated as $(j - \tfrac{1}{2})/n$. This correction also keeps the proportion for $x_{(n)}$ below 1, so that its normal quantile is finite.

If we have a random variable $Z$ with pdf $f(z)$, the $j$th quantile is a number $q_{(j)}$ such that $P[Z \le q_{(j)}] = (j - \tfrac{1}{2})/n$. If $Z$ is standard normal, the quantiles are found by solving the equation

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{q_{(j)}} e^{-z^2/2}\, dz = \frac{j - \tfrac{1}{2}}{n}$$

for $q_{(j)}$. Of course, for a non-standard normal with mean $\mu$ and standard deviation $\sigma$ things are slightly more complex, since the quantile must be transformed to $\mu + \sigma q_{(j)}$.

If the sample is drawn from a normal distribution, then $x_{(j)}$ should approximately equal $\mu + \sigma q_{(j)}$. To see whether this is true, we draw a Q-Q diagram (see the SPSS drop-down list for Descriptive Statistics) with points $(x_{(1)}, q_{(1)}), \dots, (x_{(n)}, q_{(n)})$, which should lie close to a straight line.

In the graph below, the Fortune 500 “sales” Q-Q plot is seen. Notice that sales tend to fall below the line at the extremes and above the line in the middle, suggesting that the points follow a curve, not a line. Hence, sales are not very “normal.”

[Figure: Normal Q-Q Plot of sales. Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]

This procedure can be generalized to multiple dimensions as follows. For each observation $i$, compute the squared Mahalanobis distance:

$$d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x}).$$

These are the diagonal elements of the matrix $(X - \mathbf{1}\bar{x}') S^{-1} (X - \mathbf{1}\bar{x}')'$. If $X$ is multivariate normal, then the $d_i^2$ should be approximately $\chi^2_p$-distributed. Rank-order the squared Mahalanobis distances and compare them to the quantiles of the $\chi^2_p$ distribution; the plot should be linear with slope 1. SPSS allows you to draw a Q-Q plot for $\chi^2_p$.
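
The two constructions described above can be sketched in Python as follows. This is an illustrative sketch, not the SPSS procedure: the function names (`normal_qq_points`, `mahalanobis_sq_2d`, `chi2_qq_points_2d`) are hypothetical, only the standard library is used, and the multivariate part is restricted to the assumed case of p = 2 variables, where the chi-square CDF is 1 - exp(-x/2) and the quantile therefore has a closed form. For general p one would typically take the quantiles from a statistics library instead (for example `scipy.stats.chi2.ppf`).

```python
import math
from statistics import NormalDist, mean


def normal_qq_points(sample):
    """Return the pairs (x_(j), q_(j)) for a normal Q-Q plot."""
    xs = sorted(sample)                      # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    std_normal = NormalDist()                # standard normal: mean 0, sd 1
    # q_(j) solves P[Z <= q_(j)] = (j - 1/2) / n
    return [(x, std_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(xs, start=1)]


def mahalanobis_sq_2d(data):
    """Squared Mahalanobis distances for bivariate observations (p = 2 assumed)."""
    n = len(data)
    m1 = mean(row[0] for row in data)
    m2 = mean(row[1] for row in data)
    # Entries of the sample covariance matrix S (divisor n - 1)
    s11 = sum((a - m1) ** 2 for a, _ in data) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in data) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    distances = []
    for a, b in data:
        u, v = a - m1, b - m2
        # (u, v) S^{-1} (u, v)' with S^{-1} = [[s22, -s12], [-s12, s11]] / det
        distances.append((s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det)
    return distances


def chi2_qq_points_2d(data):
    """Return the pairs (ordered d^2, chi-square quantile) for p = 2."""
    d2 = sorted(mahalanobis_sq_2d(data))
    n = len(d2)
    # For p = 2 the chi-square quantile at probability u is -2 ln(1 - u)
    return [(d, -2.0 * math.log(1.0 - (j - 0.5) / n))
            for j, d in enumerate(d2, start=1)]
```

Plotting the first coordinate of each pair against the second gives the Q-Q diagram. For the normal case the points should lie near a straight line; for the chi-square case they should lie near a line with slope 1.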