Why Kendall Tau?

G. E. NOETHER

In a 1980 issue of Teaching Statistics, D. Griffiths supports the use of the Spearman rank correlation coefficient on the grounds that "it is the one which is commonly used." The claim may well be true, but it is a poor excuse for ignoring the practical and pedagogical advantages of the Kendall coefficient.

All too often, statistics users compute a quantity such as a correlation coefficient without asking what the quantity means. In Griffiths' discussion, what operational interpretation can we attach to the value r of the Spearman correlation coefficient between the density of public houses and the density of places of worship? We are given a formal explanation for the rather awkward-looking algebraic structure of the Spearman coefficient, but we are told nothing about how to interpret the result of our computations.

The fact is that it is no easy matter to assign an operational interpretation to the Spearman coefficient.

The Kendall coefficient, on the other hand, has an intuitively simple interpretation.

What is more, its algebraic structure is much simpler than that of the Spearman coefficient. It can even be computed from the actual observations without first converting them to ranks.

A correlation coefficient is intended to measure "strength of relationship," but different correlation coefficients measure strength of relationship in different ways. A product moment coefficient, a Spearman coefficient, and a Kendall coefficient, all equal to 1/3, mean three rather different things.

Only the Kendall coefficient has a simple interpretation.

When statisticians talk of strength of relationship, they usually have in mind the strength of the tendency of two variables, X and Y, to move in the same (or opposite) direction. The Kendall coefficient measures this tendency in a very direct and easily understood way. Let $(X_i, Y_i)$ and $(X_j, Y_j)$ be a pair of (bivariate) observations. If $X_j - X_i$ and $Y_j - Y_i$ have the same sign, we say that the pair is concordant; if they have opposite signs, we say that the pair is discordant. In the (x, y)-plane, two points one of which lies below and to the left of the other form a concordant pair, while two points one of which lies above and to the left of the other form a discordant pair. In a sample of n observations we can form $n(n-1)/2$ pairs, corresponding to the choices $1 \le i < j \le n$. Let C stand for the number of concordant pairs and D for the number of discordant pairs. A simple way to measure strength of relationship is to compute $S = C - D$, a quantity known as Kendall S. A preponderance of concordant pairs, resulting in a large positive value of S, indicates a strong positive relationship between X and Y; a preponderance of discordant pairs, resulting in a large negative value of S, indicates a strong negative relationship between X and Y.
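The counting just described can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the names `kendall_counts` and `kendall_s` are hypothetical, and treating a pair with a tie in X or in Y as neither concordant nor discordant is an assumption, since ties are not discussed here. Because only the signs of the differences matter, the raw observations can be used directly, without first converting them to ranks.

```python
from itertools import combinations


def kendall_counts(xs, ys):
    """Return (C, D): the numbers of concordant and discordant pairs among
    all n(n-1)/2 choices i < j. Pairs tied in X or in Y (an assumption)
    are counted as neither."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    c = d = 0
    for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2):
        product = (xj - xi) * (yj - yi)
        if product > 0:      # differences have the same sign: concordant
            c += 1
        elif product < 0:    # differences have opposite signs: discordant
            d += 1
    return c, d


def kendall_s(xs, ys):
    """Kendall S = C - D."""
    c, d = kendall_counts(xs, ys)
    return c - d


# Raw observations and their ranks give the same S.
print(kendall_s([1.2, 3.4, 2.2, 5.0], [10, 30, 25, 20]))  # 2
print(kendall_s([1, 3, 2, 4], [1, 4, 3, 2]))              # 2
```

For the four observations (1.2, 10), (3.4, 30), (2.2, 25) and (5.0, 20), four of the six pairs are concordant and two are discordant, so S = 2 whether the values or their ranks are used.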

As a measure of strength of relationship, S has a disadvantage. Its range depends on the sample size n.

But a simple standardization gets around this problem. S can vary between $-n(n-1)/2$ and $+n(n-1)/2$, so if we compute $t = 2S/[n(n-1)]$, we always have $-1 \le t \le 1$.

The maximum value +1 is achieved if all $n(n-1)/2$ pairs are concordant; correspondingly, the minimum value -1 is achieved if all pairs are discordant. The quantity t is known as the Kendall rank correlation coefficient tau.
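The standardization can be illustrated with a small helper; `standardize_s` is a hypothetical name used only for this sketch. It also shows why S on its own is hard to read: the same S = 2 corresponds to t = 1/3 when n = 4, but only to t = 2/45 when n = 10.

```python
from fractions import Fraction


def standardize_s(s: int, n: int) -> Fraction:
    """Convert Kendall S to t = 2S / [n(n-1)], returned as an exact fraction."""
    if n < 2:
        raise ValueError("need at least two observations")
    n_pairs = n * (n - 1) // 2
    # S = C - D ranges from -n(n-1)/2 (all discordant) to +n(n-1)/2 (all concordant).
    if not -n_pairs <= s <= n_pairs:
        raise ValueError("S must lie between -n(n-1)/2 and +n(n-1)/2")
    return Fraction(2 * s, n * (n - 1))


print(standardize_s(2, 4))    # 1/3
print(standardize_s(2, 10))   # 2/45
print(standardize_s(6, 4))    # 1  (all six pairs concordant)
```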

Since C equals the number of concordant pairs among all $n(n-1)/2$ pairs in the sample, $2C/[n(n-1)]$ is an estimate of the probability $p_C$ that two observations $(X_i, Y_i)$ and $(X_j, Y_j)$ are concordant.

Similarly, $2D/[n(n-1)]$ is an estimate of the probability $p_D$ that two observations are discordant. It follows that $t = 2C/[n(n-1)] - 2D/[n(n-1)]$ is an estimate of the parameter $\tau = p_C - p_D$.
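As a sketch, the two estimates and their difference can be computed exactly with Python's `fractions` module. The function name `concordance_estimates` is hypothetical, and pairs tied in X or in Y are assumed to count as neither concordant nor discordant.

```python
from fractions import Fraction
from itertools import combinations


def concordance_estimates(xs, ys):
    """Return (pC_hat, pD_hat, t) with pC_hat = 2C/[n(n-1)],
    pD_hat = 2D/[n(n-1)] and t = pC_hat - pD_hat."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need two equal-length samples with at least two observations")
    n_pairs = n * (n - 1) // 2
    c = d = 0
    for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2):
        product = (xj - xi) * (yj - yi)
        if product > 0:
            c += 1          # concordant pair
        elif product < 0:
            d += 1          # discordant pair
    p_c = Fraction(c, n_pairs)
    p_d = Fraction(d, n_pairs)
    return p_c, p_d, p_c - p_d


p_c, p_d, t = concordance_estimates([1.2, 3.4, 2.2, 5.0], [10, 30, 25, 20])
print(p_c, p_d, t)   # 2/3 1/3 1/3
```

In this small sample the estimated probability of concordance is twice that of discordance, and t = 1/3.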

We shall call $\tau$ the Kendall correlation coefficient in the (X, Y)-population. It is an intuitively simple measure of strength of relationship between X and Y.

For example, in a population with $\tau = 1/3$, two observations $(X_i, Y_i)$ and $(X_j, Y_j)$ are twice as likely to be concordant as discordant. More generally, in a population with Kendall correlation coefficient $\tau$ in which ties have probability zero (so that $p_C + p_D = 1$), the odds $p_C/p_D$ of concordance against discordance equal $(1+\tau)/(1-\tau)$. Corresponding properties of the Spearman coefficient are considerably more complex.
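Assuming ties have probability zero, so that $p_C + p_D = 1$, the odds follow in two lines from $\tau = p_C - p_D$:

```latex
\begin{aligned}
p_C + p_D = 1,\quad \tau = p_C - p_D
  &\;\Longrightarrow\; p_C = \frac{1+\tau}{2},\qquad p_D = \frac{1-\tau}{2},\\
\frac{p_C}{p_D} = \frac{1+\tau}{1-\tau},
  &\qquad \tau = \tfrac{1}{3} \;\Longrightarrow\; \frac{p_C}{p_D} = \frac{4/3}{2/3} = 2.
\end{aligned}
```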

Following Griffiths, we start with $\Sigma d^2$ as an intuitively reasonable measure of the extent to which two sets of ranks differ (or agree). In this case, as Griffiths points out, standardization involves the by no means elementary task of determining the maximum value of $\Sigma d^2$, in addition to reversing the direction of increase of $\Sigma d^2$. Kruskal has given an interpretation of the Spearman coefficient that involves concordance relationships among three sets of observations. This interpretation is considerably more complex than that of the Kendall coefficient and will not be attempted in this note.
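For comparison, here is a sketch of that standardization, assuming untied ranks 1, ..., n. The largest possible $\Sigma d^2$ is $n(n^2-1)/3$, reached when one ranking is the exact reverse of the other; rescaling so that $\Sigma d^2 = 0$ gives +1 and the maximum gives -1 yields the familiar $r_s = 1 - 6\Sigma d^2/[n(n^2-1)]$. The function names are hypothetical.

```python
from fractions import Fraction


def max_sum_d2(n: int) -> int:
    """Largest possible sum of squared rank differences for n untied ranks,
    attained when one ranking is the reverse of the other: n(n^2 - 1)/3."""
    return n * (n * n - 1) // 3


def spearman_from_ranks(rx, ry) -> Fraction:
    """Spearman coefficient r_s = 1 - 2*sum(d^2)/max_sum_d2(n)
    = 1 - 6*sum(d^2)/[n(n^2 - 1)]. Assumes rx and ry are permutations of 1..n."""
    n = len(rx)
    if n < 2 or len(ry) != n:
        raise ValueError("need two equal-length rankings with n >= 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Reverse the direction (small sum_d2 -> large r_s) and rescale to [-1, 1].
    return 1 - Fraction(2 * sum_d2, max_sum_d2(n))


print(spearman_from_ranks([1, 3, 2, 4], [1, 4, 3, 2]))   # 2/5
```

For the rankings (1, 3, 2, 4) and (1, 4, 3, 2), four of the six pairs are concordant, so the Kendall t is 1/3, whereas the Spearman coefficient is 2/5: a small instance of two coefficients measuring strength of relationship in different ways.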

On practical and pedagogical grounds, the Kendall coefficient has substantial advantages over the Spearman coefficient. This writer is not aware of any theoretical reasons for preferring the Spearman coefficient. Quite the contrary: Kendall S has much greater universality, so much so that a good deal of what is called non-parametric statistics can be built around S. A recent book by C. Leach does exactly that.

Addendum: A recent paper in Teaching Statistics by D. Wilkie discusses a pictorial representation of the Kendall coefficient in terms of a quantity c equal to the number of crossings among the n lines which connect X- and Y-ranks. A little consideration shows that discordant pairs of observations produce crossings while concordant pairs do not. Thus Wilkie's c equals our D. If, as Wilkie seems to assume, there are no tied ranks, then $C + D = n(n-1)/2$, and t can be written as

$t = \frac{n(n-1)/2 - 2D}{n(n-1)/2} = 1 - \frac{4D}{n(n-1)}$,

which is the expression given by Wilkie. (Note: Wilkie's $\tau$ is our t.)
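Wilkie's construction can be mimicked by drawing the n lines between two parallel axes, from each X-rank to the corresponding Y-rank. Two such lines cross exactly when their Y-ranks appear in the opposite order to their X-ranks, so counting crossings amounts to counting inversions. The sketch below assumes untied ranks, as Wilkie does; the function names are hypothetical.

```python
from fractions import Fraction


def crossings(x_ranks, y_ranks):
    """Count crossings among the n lines joining each X-rank to its Y-rank.
    Assumes no ties (each list is a permutation of 1..n)."""
    # List the Y-ranks in order of X-rank; a crossing is a pair out of order.
    order = [y for _, y in sorted(zip(x_ranks, y_ranks))]
    n = len(order)
    return sum(1 for i in range(n) for j in range(i + 1, n) if order[i] > order[j])


def tau_from_crossings(x_ranks, y_ranks):
    """t = 1 - 4c / [n(n-1)], with c the number of crossings (= D)."""
    n = len(x_ranks)
    return 1 - Fraction(4 * crossings(x_ranks, y_ranks), n * (n - 1))


print(crossings([1, 3, 2, 4], [1, 4, 3, 2]))            # 2
print(tau_from_crossings([1, 3, 2, 4], [1, 4, 3, 2]))   # 1/3
```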

University of Connecticut

References

Griffiths, D. (1980). A Pragmatic Approach to Spearman's Rank Correlation Coefficient. Teaching Statistics 2, pp. 10-13.

Kruskal, W. (1958). Ordinal Measures of Association. Journal of the American Statistical Association 53, pp. 814-861.

Leach, C. (1979). Introduction to Statistics: A Nonparametric Approach for the Social Sciences. Wiley.

Wilkie, D. (1980). Pictorial Representation of Kendall's Rank Correlation Coefficient. Teaching Statistics 2, pp. 76-78.
