© Journal of Applied Mathematics & Decision Sciences, 1(1), 13-25 (1997)
Reprints available directly from the Editor. Printed in New Zealand.

Extensions to the Kruskal-Wallis Test and a Generalised Median Test with Extensions

J.C.W. RAYNER   john rayner@uow.edu.au
Department of Applied Statistics, University of Wollongong, Northfields Avenue, NSW 2522, Australia

D.J. BEST   johnb@biomsyd.dfst.csiro.au
CSIRO Mathematical and Information Sciences, PO Box 52, North Ryde, NSW 2113, Australia

Abstract. The data for the tests considered here may be presented in two-way contingency tables with all marginal totals fixed. We show that Pearson's test statistic $X_P^2$ (P for Pearson) may be partitioned into useful and informative components. The first detects location differences between the treatments, and the subsequent components detect dispersion and higher order moment differences. For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test; the subsequent components are the extensions. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. In this situation the location-detecting first component of $X_P^2$ reduces to the usual median test statistic when there are only two categories. Subsequent components detect higher moment departures from the null hypothesis of equal treatment effects.

Keywords: Categorical data, components, non-parametric tests, orthonormal polynomials, two-way data.

1. Introduction

The idea of decomposing a test statistic into orthogonal contrasts, as in the analysis of variance, has long been appreciated by statisticians as a way of making hypothesis tests more informative. A similar approach is pursued in the authors' smooth goodness of fit work (see Rayner and Best, 1989): omnibus test statistics are partitioned into smooth components.
We define the components of a test statistic to be asymptotically pairwise independent, each asymptotically having a chi-squared distribution, and such that their sum gives the original test statistic. The components provide powerful directional tests and permit a convenient and informative scrutiny of the data. This approach is applied to Spearman's test in Best and Rayner (1996) and Rayner and Best (1996a); Rayner and Best (1996b) gave an overview of the approach applied to several commonly used nonparametric tests, including the Friedman and Durbin tests.

Data for a generalisation of the median test that we subsequently propose, and for the Kruskal-Wallis test both with and without ties, may be presented in the form of two-way tables with fixed marginal totals. We derive the covariance matrix of the entries in such tables and then partition a multiple of $X_P^2$ into components that detect location and higher moment differences between rows. For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations.

We also propose a generalisation of the well-known median test. The location-detecting first component of $X_P^2$ reduces to the usual median test statistic when there are only two categories. Using more categories allows components other than this location component to be calculated. These additional components, which detect dispersion and higher moment effects, are not available when using the usual median test.

The structure of this paper is as follows. In the next section the model for two-way contingency tables with fixed marginal totals is given, and Pearson's $X_P^2$ is derived as a test statistic for the null hypothesis of like rows. In section three a multiple of $X_P^2$ is partitioned into components.
The material in sections 2 and 3 will be familiar to many readers, but is necessary background for the new work. In section four it is shown that when there are no ties the first component is the usual Kruskal-Wallis statistic; the non-location-detecting components are our extensions. Section five generalises the treatment to when there are ties. Section six introduces a generalisation of the usual median test, which is thus identified as a location-detecting test; the extensions permit dispersion and other effects to be detected.

2. A Model and Pearson's $X_P^2$ Test

Suppose we have a two-way table of counts $N_{ij}$, with $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The row and column totals, $n_{i.}$, $i = 1, \ldots, r$, and $n_{.j}$, $j = 1, \ldots, c$, respectively, are known constants, in which $n_{i.} = \sum_j n_{ij}$, $n_{.j} = \sum_i n_{ij}$ and $n_{..} = \sum_i n_{i.} = \sum_j n_{.j}$ is the grand total of the observations. Under the null hypothesis of simple random sampling, the likelihood was given by Roy and Mitra (1956) as

$\Big( \prod_{i=1}^{r} n_{i.}! \Big) \Big( \prod_{j=1}^{c} n_{.j}! \Big) \Big/ \Big( n_{..}! \prod_{i=1}^{r} \prod_{j=1}^{c} n_{ij}! \Big).$

The models for tables with just one set of marginal totals fixed, or with only the grand total fixed, are quite different from our model in which all row and column totals are fixed; see Lancaster (1969, chapter XI section 2, pp. 212-217). This likelihood can be expressed as a product of extended or multivariate hypergeometric probability functions:

$\prod_{i=2}^{r} \Big\{ \Big( \prod_{j=1}^{c} {}^{n_{1j}+\cdots+n_{ij}}C_{n_{ij}} \Big) \Big/ \; {}^{n_{1.}+\cdots+n_{i.}}C_{n_{i.}} \Big\}.$

To find moments of the $N_{ij}$, expectations may be taken with respect to the distribution of the second row conditional on knowledge of the column sums of the first two rows, then conditional on the column sums of the first three rows, and so on. It suffices to know the moments of the extended hypergeometric distribution; details are given in the Appendix. We find

$E[N_{ij}] = n_{i.} n_{.j}/n_{..}, \quad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$

Write $N_i = (N_{i1}, \ldots, N_{ic})^T$, $i = 1, \ldots, r$, and $N = (N_1^T, \ldots, N_r^T)^T$, so that $N$ is the vector of all the cell counts.
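Under simple random sampling with all margins fixed, a table can be generated by randomly pairing fixed row labels with fixed column labels. The following simulation sketch is ours, not the paper's (the margins chosen are arbitrary); it checks the expectation above, and the variance formula derived in the Appendix, by Monte Carlo:

```python
import numpy as np

# Simulate the fixed-margins null model: the n.. items carry fixed row labels
# and fixed column labels, and a uniformly random pairing of the two label
# sets realises simple random sampling. (Illustrative sketch; margins arbitrary.)
rng = np.random.default_rng(0)
row_totals = np.array([5, 7, 8])       # n_i.
col_totals = np.array([6, 4, 6, 4])    # n_.j
n = row_totals.sum()                   # n_.. = 20

row_labels = np.repeat(np.arange(3), row_totals)
col_labels = np.repeat(np.arange(4), col_totals)

reps = 40000
n11 = np.empty(reps)
for k in range(reps):
    paired = rng.permutation(col_labels)          # random pairing of labels
    n11[k] = np.sum((row_labels == 0) & (paired == 0))

mean_theory = row_totals[0] * col_totals[0] / n   # n_1. n_.1 / n_.. = 1.5
var_theory = (mean_theory * (1 - col_totals[0] / n)
              * (n - row_totals[0]) / (n - 1))    # Appendix variance formula
print(n11.mean(), mean_theory)
print(n11.var(), var_theory)
```

Both empirical moments agree with the formulas to Monte Carlo accuracy, and the same pairing device supplies a permutation P-value for any of the statistics considered later.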
The joint covariance matrix of $N_i$ and $N_j$ is, for $i \neq j$,

$\mathrm{cov}(N_i, N_j^T) = -(n_{i.} n_{j.}/n_{..})\, R, \quad \text{in which} \quad R = \Big\{ \mathrm{diag}\big(n_{.r}/n_{..}\big) - \big(n_{.r} n_{.s}/n_{..}^2\big) \Big\} \, n_{..}/(n_{..}-1),$

while $\mathrm{cov}(N_i) = n_{i.}(1 - n_{i.}/n_{..})\, R$. Write $f_i = n_{i.}/n_{..}$, $i = 1, \ldots, r$, and $f_j = n_{.j}/n_{..}$, $j = 1, \ldots, c$. The covariance matrix of $N$ is then

$\mathrm{cov}(N) = n_{..} \big\{ \mathrm{diag}(f_i) - (f_i f_j) \big\} \otimes R,$

where $\otimes$ is the direct or Kronecker product; see Lancaster (1969) for details about direct or Kronecker sums and products. Now define the standardised cell counts

$Z_{ij} = \big( N_{ij} - E[N_{ij}] \big) \big/ \sqrt{n_{..} f_i f_j}, \quad i = 1, \ldots, r \text{ and } j = 1, \ldots, c,$

and $Z = (Z_{11}, \ldots, Z_{1c}, \ldots, Z_{r1}, \ldots, Z_{rc})^T$. Here and subsequently $I_a$ is the $a$ by $a$ identity matrix and $1_a$ the $a$ by 1 vector with every element one. In standardised form

$\mathrm{cov}(Z) = \big\{ I - \big( \sqrt{f_i f_j} \big) \big\} \otimes R.$

The matrix $\{I - (\sqrt{f_i f_j})\}$ has $r-1$ latent roots one and one latent root zero. The latent roots of $R$ are difficult to find in general, but their asymptotic limits follow from Lancaster (1969, Chapter V.3). Lancaster showed that the quadratic form with vector the standardised cell counts and matrix essentially $R$ is the familiar Pearson goodness of fit statistic, with asymptotic distribution $\chi^2_{c-1}$. Hence the latent roots of $R$ are asymptotically one $c-1$ times and zero once. So under the null hypothesis of simple random sampling, $Z$ has zero mean and covariance matrix $\mathrm{cov}(Z)$, which asymptotically has $(r-1)(c-1)$ latent roots one and the remaining $r+c-1$ latent roots zero.

In the well known and often used `classical' model, $r$ and $c$ are fixed and the total count $n_{..} \to \infty$. The test statistic $X_P^2$ is given by

$X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \big( N_{ij} - n_{i.} n_{.j}/n_{..} \big)^2 \big/ \big( n_{i.} n_{.j}/n_{..} \big) = Z^T Z.$

We now confirm that our model leads to this test statistic. Suppose $H$ is orthogonal and diagonalises $\mathrm{cov}(Z)$. Asymptotically we then have

$H^T \mathrm{cov}(Z) H = I_{(r-1)(c-1)} \oplus O_{r+c-1},$

where $\oplus$ means the direct or Kronecker sum. Define $Y = H^T Z$.
Now $Z^T Z = Y^T Y$, in which $Y$, by the multivariate Central Limit Theorem, is asymptotically $N_{rc}(0, \mathrm{diag}(I_{(r-1)(c-1)}, 0_{r+c-1}))$ under the null hypothesis of simple random sampling. It follows that under the null hypothesis, $X_P^2 = Z^T Z = Y^T Y$ asymptotically has the $\chi^2_{(r-1)(c-1)}$ distribution.

3. Partitioning Pearson's Statistic

We now show that $X_P^2$ may be partitioned into components, the $s$th of which detects $s$th moment departures from the null hypothesis of similarly distributed rows (treatments). The elements of $Y$ are such that $X_P^2 = \sum_{i=1}^{rc} Y_i^2$. There is some choice in defining the $Y_i$, as $H$ is not yet fully specified; our aim is to find $Y_i$ that can be easily and usefully interpreted. To achieve one such partition, first suppose that $\{g_s(j)\}$ is the set of polynomials orthonormal on $\{n_{.j}/n_{..}\}$. See the Appendix for the definitions of the first two polynomials and the derivation of subsequent polynomials. This approach results, when there are no ties, in the first component being the Kruskal-Wallis test. Write $g_s$ for the $c$ by 1 vector with $j$th element $\sqrt{n_{.j}/n_{..}}\; g_s(j)$, and define $G_s$ to be the $r$ by $rc$ block diagonal matrix

$G_s = \mathrm{diag}(g_s^T, \ldots, g_s^T) \quad (r \text{ diagonal blocks}),$

and $V_s = G_s Z$. The elements of $Y$ may be considered in blocks of $r$, the $s$th block corresponding to the polynomial of order $s$; these blocks are asymptotically mutually independent. Write $Y = (V_1^T, \ldots, V_c^T)^T$, in which

$V_1 = (Y_1, \ldots, Y_r)^T, \quad \ldots, \quad V_{c-1} = (Y_{(c-2)r+1}, \ldots, Y_{(c-1)r})^T$

and $V_c = 0$ (all the $V_s$ are $r$ by 1), so that

$X_P^2 = Z^T Z = Y^T Y = V_1^T V_1 + \cdots + V_{c-1}^T V_{c-1}.$

This partitions $X_P^2$ into $c-1$ components $V_s^T V_s$, $s = 1, \ldots, c-1$. The $V_s$ are asymptotically mutually independent and asymptotically $N_r(0, \mathrm{diag}(I_{r-1}, 0))$, so that the $V_s^T V_s$ are asymptotically mutually independent $\chi^2_{r-1}$.
Explicitly we have, for $s = 1, \ldots, c-1$,

$V_s = G_s Z = \Big( \sum_{j=1}^{c} \sqrt{n_{.j}/n_{..}} \; g_s(j) \, Z_{ij} \Big).$

Because $V_s$ involves, through $g_s$, a polynomial of order $s$, the elements of $V_s$ are polynomials of order $s$ in the elements of $N$. Under the null hypothesis $E[Z] = 0$, but when this is not true $E[V_s]$ involves moments up to order $s$ of $Z$. So for $s = 1, \ldots, c-1$, $V_s^T V_s$ detects $s$th moment departures from the null hypothesis of similarly distributed rows (treatments).

Instructors Example. See Conover (1980, p. 233). Three instructors assign grades in five categories according to the following table.

                      Grade
                A    B    C    D    E   Total
Instructor 1    4   14   17    6    2    43
Instructor 2   10    6    9    7    6    38
Instructor 3    6    7    8    6    1    28
Total          20   27   34   19    9   109

Conover (1980) found the Kruskal-Wallis statistic adjusted for ties to be 0.3209, which is to be compared with the $\chi^2_2$ (5%) point of 5.991. We find the location-detecting component $V_1^T V_1$ to have P-value 0.85, confirming, as Conover reported, that "none of the instructors can be said to grade higher or lower than the others on the basis of the evidence presented". However the dispersion-detecting component $V_2^T V_2$ has P-value 0.01, indicating a significant variability difference. From the data it appears that the first instructor is less variable than the other two. In fact, $9.643 = (-2.113)^2 + (2.274)^2 + (-0.031)^2$, with the elements of $v_2 = (-2.113, 2.274, -0.031)^T$ being values of approximately standard normally distributed contributions from instructors 1, 2 and 3 respectively. The first instructor is less variable than the third, who is less variable than the second; this can be formalised by an LSD analysis. The residual $X_P^2 - V_1^T V_1 - V_2^T V_2$ has P-value 0.75, indicating no further effects in the data.
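The analysis of the instructors' data can be reproduced numerically. In the sketch below the function names are ours, the orthonormal polynomials are built by Gram-Schmidt rather than the Emerson (1968) recurrence, and the grade categories are scored by their midranks, which is what the Kruskal-Wallis correspondence of section 5 requires:

```python
import numpy as np

def orthonormal_polys(x, w, degree):
    """g_0, ..., g_degree with sum_j w_j g_s(x_j) g_t(x_j) = delta_st,
    built by Gram-Schmidt on powers of the centred scores."""
    x, w = np.asarray(x, float), np.asarray(w, float)
    xc = x - np.sum(w * x)                 # centre for numerical stability
    polys = [np.ones_like(x)]
    for d in range(1, degree + 1):
        p = xc ** d
        for q in polys:
            p = p - np.sum(w * p * q) * q  # remove lower-order parts
        polys.append(p / np.sqrt(np.sum(w * p * p)))
    return polys

def partition_pearson(N, x):
    """Return X_P^2 and its components V_s'V_s, s = 1, ..., c-1, for an
    r x c table N with column scores x (the section 3 partition)."""
    N = np.asarray(N, float)
    r, c = N.shape
    row, col, n = N.sum(1), N.sum(0), N.sum()
    E = np.outer(row, col) / n
    Z = (N - E) / np.sqrt(E)               # standardised cell counts
    f = col / n                            # weights n_.j / n_..
    g = orthonormal_polys(x, f, c - 1)
    comps = []
    for s in range(1, c):
        V = Z @ (np.sqrt(f) * g[s])        # V_s, one entry per instructor
        comps.append(float(V @ V))
    return float(np.sum(Z ** 2)), comps

N = [[4, 14, 17, 6, 2],
     [10, 6, 9, 7, 6],
     [6, 7, 8, 6, 1]]
midranks = [10.5, 34.0, 64.5, 91.0, 105.0]   # midrank scores for grades A..E
XP2, comps = partition_pearson(N, midranks)
print(round(XP2, 3), round(comps[0], 3))     # X_P^2 = 11.985, location 0.324
```

The location component agrees with the tabled value 0.324 (and $(n_{..}-1)/n_{..}$ times it recovers Conover's ties-adjusted 0.3209), while the components sum exactly to $X_P^2 = 11.985$.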
Partition of $X_P^2$ for the instructors' data

Statistic                           Degrees of freedom    Value    P-value
$V_1^T V_1$                                 2             0.324     0.85
$V_2^T V_2$                                 2             9.643     0.01
$X_P^2 - V_1^T V_1 - V_2^T V_2$             4             2.021     0.75
$X_P^2$                                     8            11.985     0.15

4. The Kruskal-Wallis Test with No Ties

We now consider models that lead to the Kruskal-Wallis test when there are no ties. The latent roots of $\mathrm{cov}(Z)$ will be found explicitly rather than asymptotically, as in section 2. We show that $X_P^2$ is not an appropriate test statistic but, nevertheless, its components are. The first component is the Kruskal-Wallis test statistic, and the subsequent components provide informative extensions.

Suppose we have distinct observations $x_{ij}$, $x_{ij}$ being the $j$th of $n_i$ observations on the $i$th of $t$ treatments. All $n = n_1 + \cdots + n_t$ observations are combined, ordered and ranked, and the sums $R_i$ of the ranks obtained by the $i$th treatment calculated. The Kruskal-Wallis statistic is

$KW = \big\{ 12 / [n(n+1)] \big\} \sum_{i=1}^{t} R_i^2/n_i - 3(n+1).$

See, for example, Conover (1980, section 5.2). The data may be presented as a $t$ by $n$ contingency table of counts $\{N_{ij}\}$, with $N_{ij} = 1$ if rank $j$ is allotted to treatment $i$, and $N_{ij} = 0$ if rank $j$ is allotted to some other treatment. The row and column totals are all fixed: the row totals are the treatment sample sizes, so that $n_{i.} = n_i$ for $i = 1, \ldots, t$, while the column totals are all one: $n_{.j} = 1$ for $j = 1, \ldots, n$. Such a table has $X_P^2 = n(t-1)$ no matter what the $\{N_{ij}\}$. Since $X_P^2$ is constant, it has P-value 1; clearly $X_P^2$ is not a suitable test statistic. The model of section 2 holds, except that now

$R = \{n/(n-1)\} I_n - (n-1)^{-1} 1_n 1_n^T.$

This matrix has one latent root zero and $n-1$ latent roots $n/(n-1)$. It follows that $\mathrm{cov}(Z)$ has $(t-1)(n-1)$ latent roots $n/(n-1)$, and the remaining $t+n-1$ latent roots zero.
So if $H$ is orthogonal and diagonalises $\mathrm{cov}(Z)$, then

$H^T \mathrm{cov}(Z) H = \{n/(n-1)\} \, \mathrm{diag}\big( I_{(t-1)(n-1)}, 0_{t+n-1} \big).$

Define

$Y = \sqrt{(n-1)/n} \; H^T Z, \quad \text{so that} \quad Y^T Y = \{(n-1)/n\} Z^T Z = \{(n-1)/n\} X_P^2.$

With $r$ replaced by $t$ and $c$ replaced by $n$, this is the same as in section three. As in section 2, we are interested in the distribution theory as $n \to \infty$. However, there $Z$ was an $rc$ by 1 vector of fixed length; here $Z$ is a $tn$ by 1 vector. Fortunately, it is not the asymptotic distribution of $Z$ that is required. First, recall that $X_P^2$ has a fixed value, $n(t-1)$, for all tables, and so is not available as a test statistic. Second, as in section three, the multivariate Central Limit Theorem shows that each $V_s$ is asymptotically $N(0, \mathrm{diag}(I_{t-1}, 0))$. Moreover, consideration of all pairs $V_s$, $V_{s'}$ shows that they are asymptotically jointly multivariate normal, and since their covariance matrix is zero, they are asymptotically pairwise independent. The $V_s^T V_s$ still partition $\{(n-1)/n\} X_P^2$. It is the pairwise independence and convenient $\chi^2_{t-1}$ distribution of each $V_s^T V_s$ that makes data analysis so informative and convenient. What is lost by the unavailability of $X_P^2$ is demonstrated in the Employees Example below: there is no residual available to assess whether there are higher moment differences between the treatments.

We now show that, provided there are no ties, $V_1^T V_1$ is the Kruskal-Wallis statistic, so that the subsequent $V_s^T V_s$ provide extensions to the Kruskal-Wallis test. First note that $\{g_s(j)\}$ is the set of polynomials orthonormal on the discrete uniform distribution, so that $g_1(j) = aj + b$, $j = 1, \ldots, n$, in which

$a = \sqrt{12/(n^2-1)} \quad \text{and} \quad b = -a(n+1)/2.$

Also $E[N_{ij}] = n_i/n$, and the rank sum for treatment $i$ is $R_i = \sum_{j=1}^{n} j N_{ij}$, $i = 1, \ldots, t$.
Now since $\sum_{j} g_1(j) g_0(j)(1/n) = 0$,

$\sum_{j=1}^{n} g_1(j) Z_{ij} \big/ \sqrt{n} = \sum_{j=1}^{n} (aj + b)\big( N_{ij} - n_i/n \big) \big/ \sqrt{n_i} = \{ a R_i + b n_i \} \big/ \sqrt{n_i},$

so that the $i$th element of $V_1$ is $\sqrt{(n-1)/n}\, \{a R_i + b n_i\}/\sqrt{n_i}$, $i = 1, \ldots, t$. Now

$V_1^T V_1 = Y_1^2 + \cdots + Y_t^2 = \frac{n-1}{n} \sum_{i=1}^{t} \{a R_i + b n_i\}^2 \big/ n_i = \frac{12}{n(n+1)} \sum_{i=1}^{t} \frac{R_i^2}{n_i} - 3(n+1)$

after some manipulation. This is the Kruskal-Wallis statistic, well known to be sensitive to location departures from the null hypothesis. Since $V_s$ assesses $s$th moment departures between treatments, we have partitioned the statistic $\{(n-1)/n\} X_P^2$ into asymptotically pairwise independent components $V_s^T V_s$, $s = 1, \ldots, n-1$, each with the $\chi^2_{t-1}$ distribution, and such that the $s$th detects $s$th moment departures from the hypothesis of similarly distributed rows (treatments). Since the first of these is the Kruskal-Wallis statistic, the subsequent components provide extensions to the Kruskal-Wallis test.

Employees Example. Conover (1980, p. 238, exercise 2) gave an exercise in which 20 new employees are randomly assigned to four different job training programmes. At the end of their training the employees are ranked, with a low ranking reflecting low job ability.

Programme   Ranks
    1       2, 4, 6, 7, 10
    2       1, 3, 8, 11, 12
    3       5, 14, 16, 19, 20
    4       9, 13, 15, 17, 18

The value of the Kruskal-Wallis statistic is 9.72, with $\chi^2_3$ P-value 0.021, but Monte Carlo permutation test P-value 0.010. The latter is more likely to be accurate, as the sample size is small. Further components are not significant. An LSD analysis can be used to show that programmes 1 and 2 and programmes 3 and 4 are equally effective, with 3 and 4 being superior.

5. The Kruskal-Wallis Test with Ties

If there are ties, the data may be presented as a $t$ by $c$ contingency table of counts $\{N_{ij}\}$, with the row totals fixed at the treatment sample sizes, so that again $n_{i.} = n_i$, $i = 1, \ldots, t$, while the column totals are no longer all one.
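The central identity of section 4 — that $V_1^T V_1$ computed from the 0-1 rank table is exactly the Kruskal-Wallis statistic — can be checked directly on the Employees data. A short sketch (ours, not the paper's):

```python
import numpy as np

# Employees example: ranks obtained by the four training programmes.
ranks = {1: [2, 4, 6, 7, 10],
         2: [1, 3, 8, 11, 12],
         3: [5, 14, 16, 19, 20],
         4: [9, 13, 15, 17, 18]}
n = sum(len(v) for v in ranks.values())   # 20 observations, no ties

# Kruskal-Wallis statistic from the rank sums R_i.
kw = (12.0 / (n * (n + 1))) * sum(sum(r) ** 2 / len(r)
                                  for r in ranks.values()) - 3 * (n + 1)
print(round(kw, 2))                        # 9.72

# The same value as the location component V_1'V_1 of the t x n 0-1 table:
# the i-th element of V_1 is sqrt((n-1)/n) (a R_i + b n_i)/sqrt(n_i),
# with a = sqrt(12/(n^2-1)) and b = -a(n+1)/2, as in section 4.
a = np.sqrt(12.0 / (n ** 2 - 1))
b = -a * (n + 1) / 2.0
v1 = [np.sqrt((n - 1) / n) * (a * sum(r) + b * len(r)) / np.sqrt(len(r))
      for r in ranks.values()]
print(round(float(np.dot(v1, v1)), 2))     # 9.72
```

The two computations agree to floating-point accuracy, illustrating that the component form loses nothing relative to the classical rank-sum formula.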
The covariance matrix of $Z$ is

$\mathrm{cov}(Z) = \big\{ I_t - \big( \sqrt{f_i f_j} \big) \big\} \otimes R, \quad \text{with} \quad R = \Big\{ \mathrm{diag}\big( n_{.u}/n_{..} \big) - \big( n_{.u} n_{.v}/n_{..}^2 \big) \Big\} \, n_{..}/(n_{..}-1).$

As in section 2, the latent roots of $R$ are zero once and asymptotically one $c-1$ times. It follows that $\mathrm{cov}(Z)$ has $(t-1)(c-1)$ latent roots asymptotically one, and the remaining $t+c-1$ latent roots zero. With suitable modifications the partitioning of section three holds: for $s = 1, \ldots, c-1$,

$V_s = G_s Z = \Big( \sum_{j=1}^{c} \sqrt{n_{.j}/n_{..}} \; g_s(j) \, Z_{ij} \Big).$

Note that $\{g_s(j)\}$ is the set of polynomials orthonormal on $\{n_{.j}/n_{..}\}$, not on the discrete uniform as in the previous section when there were no ties. This is the partition derived in section 3 for $X_P^2$. So the first component of $X_P^2$ in the Instructors example is the Kruskal-Wallis statistic corrected for ties, and the subsequent components are extensions to the Kruskal-Wallis test adjusted for ties. Note that for this example the model assumed in section 3, with fixed numbers of rows and columns, is more plausible than the model of this section, since $c = 5$ is hardly large.

6. Generalised Median Tests

Conover (1980, section 4.3) described the median test, in which random samples are taken from each of $r$ populations. Each random sample is classified as above or below the grand median (the median of the combined random samples), forming an $r$ by 2 contingency table with fixed marginal totals. The usual chi-squared test, based on $X_P^2$, is then applied to this contingency table. If instead of the grand median a `grand quantile' is used, the resulting test is described as a quantile test: see Conover (1980, p. 174). These tests can be generalised by choosing $c$ instead of two categories for the combined random samples, and so forming an $r$ by $c$ contingency table of counts $N_{ij}$ of the number of observations for the $i$th sample in the $j$th category. This table has all row and column totals fixed, and can be tested for row consistency using the results of sections 2 and 3.
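The generalised procedure can be sketched end to end on hypothetical data (the samples, cutpoints and names below are ours, not Conover's). With two categories at the grand median the first component reproduces the median test statistic exactly, as shown in the Appendix; with four categories at the quartiles, location, dispersion and skewness components all become available:

```python
import numpy as np

def orthonormal_polys(x, w, degree):
    # Gram-Schmidt construction; see the Appendix for the first two polynomials.
    x, w = np.asarray(x, float), np.asarray(w, float)
    xc = x - np.sum(w * x)
    polys = [np.ones_like(x)]
    for d in range(1, degree + 1):
        p = xc ** d
        for q in polys:
            p = p - np.sum(w * p * q) * q
        polys.append(p / np.sqrt(np.sum(w * p * p)))
    return polys

def components(N, x):
    # Components V_s'V_s of X_P^2 for an r x c fixed-margins table (section 3).
    N = np.asarray(N, float)
    row, col, n = N.sum(1), N.sum(0), N.sum()
    Z = (N - np.outer(row, col) / n) / np.sqrt(np.outer(row, col) / n)
    f = col / n
    g = orthonormal_polys(x, f, N.shape[1] - 1)
    return [float((Z @ (np.sqrt(f) * g[s])) @ (Z @ (np.sqrt(f) * g[s])))
            for s in range(1, N.shape[1])]

# Hypothetical data: three samples with equal means but unequal spreads.
rng = np.random.default_rng(1)
samples = [rng.normal(0, 1, 20), rng.normal(0, 3, 20), rng.normal(0, 3, 20)]
pooled = np.concatenate(samples)

# Two categories at the grand median: the classical median test.
cut = np.median(pooled)
N2 = np.array([[np.sum(s < cut), np.sum(s >= cut)] for s in samples])
b, a = N2.sum(0)                      # b below, a above the cut
n = a + b
T2 = (n ** 2 / (a * b)) * np.sum((N2[:, 0] - N2.sum(1) * b / n) ** 2 / N2.sum(1))
assert abs(components(N2, [1.0, 2.0])[0] - T2) < 1e-8   # Appendix reduction

# Four categories at the quartiles: location, dispersion, skewness components.
cuts = np.quantile(pooled, [0.25, 0.5, 0.75])
N4 = np.array([np.bincount(np.searchsorted(cuts, s), minlength=4)
               for s in samples])
print([round(v, 2) for v in components(N4, [1.0, 2.0, 3.0, 4.0])])
```

For equal-mean, unequal-spread samples it is the second (dispersion) component that is designed to pick up the difference — precisely the effect the two-category median test cannot see.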
The first three, say, components of $X_P^2$ are of particular interest, indicating location, dispersion and skewness differences between treatments. It is routine to show that the location component $V_1^T V_1$ of $X_P^2$ reduces to the median test statistic when observations are classified into just two categories; this is shown in the Appendix. The result identifies the median test as a location-detecting test. To detect up to $s$th moment differences between the populations requires categorisation into $s+1$ categories and the use of the components $V_1^T V_1, \ldots, V_s^T V_s$. If there are as many categories as observations and each category has one observation, the test based on the location component is the Kruskal-Wallis test, which is known to be more powerful than the median test. Using more than two categories will result in less loss of information due to categorisation compared to the median test, and will permit assessment of higher moment differences between the treatments.

Corn Example. Conover (1980, p. 172) gave the example of four different methods of growing corn. He classified the data as greater than 89 and up to 88, and applied
The ner classication, compare to that employed by the median test, has uncovered a variability dierence between the methods: methods 3 and 4 are signicantly less variable than 1 and 2. T XP T : : T : : XP First Second Third Fourth Total Quartile Quartile Quartile Quartile 0 3 4 2 9 1 6 3 0 10 0 0 1 6 7 8 0 0 0 8 9 9 8 8 34 Method 1 Method 2 Method 3 Method 4 Total Appendix The Orthogonal Polynomials The rst two polynomials, dened on 1 and orthonormal with regard to the weights 1 , are 1 ( ) and 2 ( ), given explicitly by: x p ::: pc g xj g in which = xc ( )=( ; ) p g1 xj and ::: xj xj = ( ) = f( ; )2 ; 3 ( ; ) g2 xj X a xj c j xj pj r =1 = X c j =1 ( ; ) xj xj r pj =2 and a 2 ; 2g = ; 4 j + =1 2 3 =2 ::: ; c 2 2 ;0 5 : : The subsequent polynomials 3 ( ) ;1( ) may be derived by using the useful recurrence relations in Emerson (1968). In the text we have taken, as in many applications, = , = 1 . g xj j j xj ::: ::: gc xj c Derivation of the Covariance Matrix of the Cell Counts In section 2 the method used to nd the moments of the is described. To nd 21 ], we take 21 j 1 + 2 =1 ], then the conditional expectation Nij E N E N N j N j j ::: c of this expression with the sum of the rst three columns being known, and so on. The successive expectations are EXTENDED KRUSKAL-WALLIS AND MEDIAN TESTS ( n2: N11 f ( + 21 ) ( 1 + 2 ) + 2 )gf( 1 + 2 )( N n2: = n1: .. . = n : n : n : n : N11 n : + 23 + N21 )( N31 = n1: + n2: + n3: )g )gf( 1 + 2 ) ( 1 + 2 + 3 )g + ( ;1) ) ( 1 + + )gf 1g = 2 1 By symmetry ] = , =2 and = 1 . 
By dierence the expectations for the rst row may be obtained, giving the familiar f ( n2: = n1: + f( 1 + n : n2: n c E Nij : = n : ni: n:j =n:: E Nij In the same way n : = n : n : ::: ]= i ni: n:j =n:: n : ::: ::: =1 i n : nc: r ::: n : n : ::: and r n:1 n: n : n: =n:: : j ( ; 1)] = 2 ( 2 ; 1) 1( from which we obtain ( 21 ), and E N21 N21 ::: n: j =1 c ::: c: ; 1) f ( ; 1)g = n:: n:: var N ( var Nij )= ni: n:j 1; n:: ; n:j n:: n:: ni: n:: ;1 i =1 ::: and r j =1 ::: c: Similarly ( cov Nij Nik )=; ni: n:j n:k ; n:: n:: n:: n:: ni: ;1 i =1 ::: r and j i =1 ::: r and j 6= = 1 k ::: c: ::: c By symmetry ( cov Nrj Nsj )=; n:j nr: ns: ; n:: n:: n:: n:: n:j ;1 6= = 1 k and by the expectation argument again ( cov Nir Njs )= ni: n:r nj: n:s n:: n:: 1 n:: ;1 6= = 1 i j ::: and r Write N = ( 1 ) , =1 and N = (N 1 covariance matrix of N and N is, for 6= , Ni i ::: Nic T i (N N )=; i ::: j i T r j T ::: r 6= = 1 s N ). T r ::: c: The joint ( ; 1) ; ;1 Now since the f g are such that the row and column totals are known constants, (N N 1 + + N ) = 0 for = 1 . So if we write = , =1 , and cov i j ni: nj: n:: n:: diag n:r n:: n:: n:r n:s n:: : Nij cov i ::: r i ::: r fj n:j =n:: j ::: c J.C.W. RAYNER AND D.J. BEST 24 R= diag then the covariance matrix of N cov (Ni ) = ; X i 6= cov ( ; 1) ; is n:r n:: n:: i (N N )= i j i ;1 n:: X j n:r n:s 6= fi fj R= fi (1 ; )R fi j which agrees with direct calculation. So if is the Kronecker product, the covariance matrix of N is (N ) = f Recall that we have dened =( 1 , and Z = ( 11 1 cov Zij ::: c Z ::: ( ) ; ( )g p ; ]) ], = 1 ) . It follows that diag fj Z c ::: cov Nij fi fj E Nij Ztl : : : q (Z ) = fI ; ( The location component V 1 V 1 of T fi fj X 2 P = E Nij i ::: r T Ztc r R: ])g R and = j : reduces to the median test statistic If there are observations below a predetermined point in the combined sample, and above it, then Conover (1980, p. 
172) gave the median test statistic as

$T = \frac{n_{..}^2}{ab} \sum_{i=1}^{r} \Big( N_{i1} - \frac{n_{i.} b}{n_{..}} \Big)^2 \Big/ n_{i.}.$

It is routine to show that $g_1(j) = (x_j - \mu)/\sigma$, $j = 1, 2$, in which $\mu$ and $\sigma$ are the mean and standard deviation of the distribution defined by $P(X = x_1) = b/n_{..}$ and $P(X = x_2) = a/n_{..}$. It follows that $g_1(1) = -\sqrt{a/b}$ and $g_1(2) = \sqrt{b/a}$, so that

$V_{1i} = \big\{ N_{i1} g_1(1) + N_{i2} g_1(2) \big\} \big/ \sqrt{n_{i.}} = -\Big( N_{i1} - \frac{n_{i.} b}{n_{..}} \Big) \frac{n_{..}}{\sqrt{ab\, n_{i.}}},$

using $N_{i2} = n_{i.} - N_{i1}$ and $\sqrt{a/b} + \sqrt{b/a} = n_{..}/\sqrt{ab}$. Squaring and summing over $i$ gives, as required, $V_1^T V_1 = T$.

References

1. D.J. Best and J.C.W. Rayner. Nonparametric analysis for doubly ordered two-way contingency tables. Biometrics, 52:1153-1156, 1996.
2. W.J. Conover. Practical Nonparametric Statistics (2nd ed.). Wiley, New York, 1980.
3. P.L. Emerson. Numerical construction of orthogonal polynomials from a general recurrence formula. Biometrics, 24:695-701, 1968.
4. H.O. Lancaster. The Chi-squared Distribution. Wiley, New York, 1969.
5. J.C.W. Rayner and D.J. Best. Smooth Tests of Goodness of Fit. Oxford University Press, New York, 1989.
6. J.C.W. Rayner and D.J. Best. Smooth extensions of Pearson's product moment correlation and Spearman's rho. Statistics and Probability Letters, 30(2):171-177, 1996a.
7. J.C.W. Rayner and D.J. Best. Extensions to some important nonparametric tests. In Proceedings of the A.C. Aitken Centenary Conference, Dunedin 1995 (L. Kavalieris et al., eds.), University of Otago Press, Dunedin, 257-266, 1996b.
8. S.N. Roy and S.K. Mitra. An introduction to some non-parametric generalisations of analysis of variance and multivariate analysis. Biometrika, 43:361-376, 1956.