© Journal of Applied Mathematics & Decision Sciences, 1(1), 13-25 (1997)
Reprints Available directly from the Editor. Printed in New Zealand.
Extensions to the Kruskal-Wallis Test and a
Generalised Median Test with Extensions
J.C.W. RAYNER
john rayner@uow.edu.au
Department of Applied Statistics, University of Wollongong, Northfields Avenue, NSW 2522,
Australia
D.J. BEST
johnb@biomsyd.dfst.csiro.au
CSIRO Mathematical and Information Sciences, PO Box 52, North Ryde, NSW 2113, Australia
Abstract. The data for the tests considered here may be presented in two-way contingency tables with all marginal totals fixed. We show that Pearson's test statistic $X_P^2$ (P for Pearson) may be partitioned into useful and informative components. The first detects location differences between the treatments, and the subsequent components detect dispersion and higher order moment differences. For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test. The subsequent components are the extensions. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. In this situation the location-detecting first component of $X_P^2$ reduces to the usual median test statistic when there are only two categories. Subsequent components detect higher moment departures from the null hypothesis of equal treatment effects.

Keywords: Categorical data, Components, Non-parametric tests, Orthonormal polynomial, Two-way data.
1. Introduction
The idea of decomposing a test into orthogonal contrasts, as in the analysis of
variance, has long been appreciated by statisticians as a way of making hypothesis
tests more informative. In the authors' smooth goodness of fit work (see Rayner and Best, 1989), a similar approach is pursued. Omnibus test statistics are partitioned into smooth components. We define the components of a test statistic to be asymptotically pairwise independent, with each asymptotically having the chi-squared distribution, and such that their sum gives the original test statistic. The components provide powerful directional tests and permit a convenient and informative scrutiny of the data. This approach is applied to Spearman's test in Best and Rayner (1996) and Rayner and Best (1996a). Rayner and Best (1996b) gave an overview of this approach applied to several commonly used nonparametric tests, including the Friedman and Durbin tests.
Data for a generalisation of the median test that we subsequently propose, and
for the Kruskal-Wallis test both with and without ties, may be presented in the
form of two-way tables with fixed marginal totals. We derive the covariance matrix of entries in such tables and then partition a multiple of $X_P^2$ into components that detect location and higher moment differences between rows.
For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. The location detecting first component of $X_P^2$ reduces to the usual median test statistic when there are only two categories. Using more categories allows components other than this location component to be calculated. These additional components, which detect dispersion and higher moment effects, are not available when using the usual median test.
The structure of this paper is as follows. In the next section the model for two-way contingency tables with fixed marginal totals is given, and Pearson's $X_P^2$ is derived as a test statistic for the null hypothesis of like rows. In section three a multiple of $X_P^2$ is partitioned into components. The material in sections 2 and 3 will be familiar to many readers, but is necessary background for the new work. In section four it is shown that when there are no ties the first component is the usual Kruskal-Wallis statistic. The non-location detecting components are our extensions. Section five generalises the treatment to when there are ties. Section six introduces a generalisation of the usual median $X_P^2$ test, which is thus identified as a location detecting test; the extensions permit dispersion and other effects to be detected.
2. A Model and Pearson's $X_P^2$ Test
Suppose we have a two-way table of counts $N_{ij}$, with $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The row and column totals, respectively $n_{i\cdot}$, $i = 1, \ldots, r$, and $n_{\cdot j}$, $j = 1, \ldots, c$, are known constants. Under the null hypothesis of simple random sampling, the likelihood was given by Roy and Mitra (1956) as

$$\left\{\prod_{i=1}^{r} n_{i\cdot}!\right\}\left\{\prod_{j=1}^{c} n_{\cdot j}!\right\}\bigg/\left\{n_{\cdot\cdot}!\prod_{i=1}^{r}\prod_{j=1}^{c} N_{ij}!\right\},$$

in which $n_{\cdot\cdot} = \sum_{i=1}^{r} n_{i\cdot} = \sum_{j=1}^{c} n_{\cdot j}$ is the grand total of the observations.
The models for tables with just one set of marginal totals fixed, or only the grand total fixed, are quite different from our model in which all row and column totals are fixed. See Lancaster (1969, chapter XI section 2, pp. 212-217). This likelihood can be expressed as a product of extended or multivariate hypergeometric probability functions:

$$\prod_{i=2}^{r}\left\{\left[\prod_{j=1}^{c}\,{}^{N_{1j}+\cdots+N_{ij}}C_{N_{ij}}\right]\bigg/\,{}^{n_{1\cdot}+\cdots+n_{i\cdot}}C_{n_{i\cdot}}\right\}.$$
To find moments of the $N_{ij}$, expectations may be taken with respect to the distribution of the second row conditional on knowledge of the column sums of the first two rows, then conditional on the column sums of the first three rows, and so on. It suffices to know the moments of the extended hypergeometric distribution. Details are given in the Appendix. We find

$$E[N_{ij}] = n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}, \qquad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$

Write $N_i = (N_{i1}, \ldots, N_{ic})^T$, $i = 1, \ldots, r$, and $N = (N_1^T, \ldots, N_r^T)^T$, so that $N$ is the vector of all the cell counts. The joint covariance matrix of $N_i$ and $N_j$ is, for $i \neq j$,

$$\mathrm{cov}(N_i, N_j) = -\frac{n_{i\cdot}n_{j\cdot}}{n_{\cdot\cdot}^2(n_{\cdot\cdot}-1)}\left[\mathrm{diag}\{n_{\cdot u}(n_{\cdot\cdot}-n_{\cdot u})\} - (n_{\cdot u}n_{\cdot v})_{u \neq v}\right];$$

the full covariance matrix of $N$ is derived in the Appendix. Write $f_i = n_{i\cdot}/n_{\cdot\cdot}$, $i = 1, \ldots, r$, and $f_j = n_{\cdot j}/n_{\cdot\cdot}$, $j = 1, \ldots, c$, and let $R$ be the $c$ by $c$ matrix with diagonal elements $(n_{\cdot\cdot} - n_{\cdot u})/(n_{\cdot\cdot} - 1)$ and $(u, v)$th off-diagonal elements $-\sqrt{n_{\cdot u}n_{\cdot v}}/(n_{\cdot\cdot} - 1)$. Now define the standardised cell counts

$$Z_{ij} = (N_{ij} - E[N_{ij}])/\sqrt{n_{\cdot\cdot}f_i f_j}, \qquad i = 1, \ldots, r \text{ and } j = 1, \ldots, c,$$

and write $Z = (Z_{11}, \ldots, Z_{1c}, \ldots, Z_{r1}, \ldots, Z_{rc})^T$, the vector of all the standardised counts. Throughout, $I_a$ denotes the $a$ by $a$ identity matrix and $1_a$ the $a$ by 1 vector with every element one. Then

$$\mathrm{cov}(Z) = \{I - (\sqrt{f_i f_j})\} \otimes R,$$

where $(\sqrt{f_i f_j})$ is the $r$ by $r$ matrix with $(i, j)$th element $\sqrt{f_i f_j}$ and $\otimes$ is the direct or Kronecker product. See Lancaster (1969) for details about direct or Kronecker sums and products.

The matrix $\{I - (\sqrt{f_i f_j})\}$ has $r - 1$ latent roots 1 and one latent root zero. The latent roots of $R$ are difficult to find in general, but their asymptotic limits follow from Lancaster (1969, Chapter V.3). Lancaster showed that the quadratic form with vector the standardised cell counts and matrix essentially $R$ is the familiar Pearson goodness of fit statistic, with asymptotic distribution $\chi^2_{c-1}$. Hence the latent roots of $R$ are asymptotically one $c - 1$ times and zero once. So under the null hypothesis of simple random sampling, $Z$ has zero mean and covariance matrix $\mathrm{cov}(Z)$, which asymptotically has $(r-1)(c-1)$ latent roots one, and the remaining
In the well known and often used `classical' model, $r$ and $c$ are fixed and the total count $n_{\cdot\cdot} \to \infty$. The test statistic $X_P^2$ is given by

$$X_P^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(N_{ij} - n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot})^2}{n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}} = Z^T Z.$$
We now confirm that our model leads to this test statistic. Suppose $H$ is orthogonal and diagonalises $\mathrm{cov}(Z)$. Asymptotically we then have

$$H^T \mathrm{cov}(Z)\,H = I_{(r-1)(c-1)} \oplus 0_{(r+c-1)},$$

where $\oplus$ means direct or Kronecker sum. Define $Y = H^T Z$. Now $Z^T Z = Y^T Y$, in which $Y$, by the multivariate Central Limit Theorem, is asymptotically $N_{rc}(0, [I_{(r-1)(c-1)} \oplus 0_{(r+c-1)}])$ under the null hypothesis of simple random sampling. It follows that under the null hypothesis, $X_P^2 = Z^T Z = Y^T Y$ asymptotically has the $\chi^2_{(r-1)(c-1)}$ distribution.
3. Partitioning Pearson's Statistic
We now show that $X_P^2$ may be partitioned into components, the $s$th of which detects $s$th moment departures from the null hypothesis of similarly distributed rows (treatments).

The elements of the $Y$ of section 2 are such that $X_P^2 = \sum_{i=1}^{rc} Y_i^2$. There is some choice in defining the $Y_i$, as $H$ is not yet fully specified. In doing so, our aim is to find $Y_i$ that can be easily and usefully interpreted. To achieve one such partition, first suppose that $\{g_s(j)\}$ is the set of polynomials orthonormal on $\{n_{\cdot j}/n_{\cdot\cdot}\}$. See the Appendix for the definitions of the first two polynomials and the derivation of subsequent polynomials. This approach results, when there are no ties, in the first component being the Kruskal-Wallis test. Write $h_s$ for the $c$ by 1 vector with $j$th element $g_s(j)\sqrt{n_{\cdot j}/n_{\cdot\cdot}}$, taking $h_1, \ldots, h_{c-1}$ from the polynomials of orders $1, \ldots, c-1$ and $h_c$ from the constant polynomial $g_0 = 1$. Define $G = (G_1, \ldots, G_c)$, in which $G_s$ is the $rc$ by $r$ matrix

$$G_s = I_r \otimes h_s = \begin{pmatrix} h_s & 0 & \cdots & 0 \\ 0 & h_s & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_s \end{pmatrix}.$$

Define $Y = \sqrt{(n_{\cdot\cdot}-1)/n_{\cdot\cdot}}\;G^T Z$. The elements of $Y$ may be considered in blocks of $r$, the $s$th block corresponding to the polynomial of order $s$. These blocks are asymptotically mutually independent. Write $Y = (V_1^T, \ldots, V_c^T)^T$, in which

$$V_1 = (Y_1, \ldots, Y_r)^T, \quad \ldots, \quad V_{c-1} = (Y_{(c-2)r+1}, \ldots, Y_{(c-1)r})^T,$$

and $V_c = 0$, since the row totals are fixed (all the $V_s$ are $r$ by 1), so that

$$\frac{n_{\cdot\cdot}-1}{n_{\cdot\cdot}}\,X_P^2 = \frac{n_{\cdot\cdot}-1}{n_{\cdot\cdot}}\,Z^T Z = Y^T Y = V_1^T V_1 + \cdots + V_{c-1}^T V_{c-1}.$$

This partitions $\{(n_{\cdot\cdot}-1)/n_{\cdot\cdot}\}X_P^2$ into components $V_s^T V_s$, $s = 1, \ldots, c-1$. The $V_s$ are asymptotically mutually independent and asymptotically $N_r(0, [I_{r-1} \oplus 0])$, so that the $V_s^T V_s$ are asymptotically mutually independent $\chi^2_{r-1}$. Explicitly we have, for $s = 1, \ldots, c-1$,

$$V_s = \sqrt{\frac{n_{\cdot\cdot}-1}{n_{\cdot\cdot}}}\;G_s^T Z = \left(\sqrt{\frac{n_{\cdot\cdot}-1}{n_{\cdot\cdot}}}\,\sum_{j=1}^{c} g_s(j)\,\frac{N_{ij} - n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}}{\sqrt{n_{i\cdot}}}\right)_{i = 1, \ldots, r}.$$

Because $V_s$ involves, through $g_s$, a polynomial of order $s$, the elements of $V_s$ are polynomials of order $s$ in the elements of $N$. Under the null hypothesis $E[Z] = 0$, but when this is not true $E[V_s]$ involves moments up to order $s$ of $Z$. So for $s = 1, \ldots, c-1$, $V_s^T V_s$ detects $s$th moment departures from the null hypothesis of similarly distributed rows (treatments).
Instructors Example. See Conover (1980, p. 233). Three instructors assign grades in five categories according to the following table.

                       Grade
               A    B    C    D    E   Total
Instructor 1   4   14   17    6    2     43
Instructor 2  10    6    9    7    6     38
Instructor 3   6    7    8    6    1     28
Total         20   27   34   19    9    109
Conover (1980) found the Kruskal-Wallis statistic adjusted for ties to be 0.3209, which is to be compared with the $\chi^2_2$ (5%) point of 5.991. We find the location detecting component $V_1^T V_1$ to have P-value 0.85, confirming, as Conover reported, that "none of the instructors can be said to grade higher or lower than the others on the basis of the evidence presented". However the dispersion detecting component $V_2^T V_2$ has P-value 0.01, indicating a significant variability difference. From the data it appears that the first instructor is less variable than the other two. In fact, $9.643 = (-2.113)^2 + (2.274)^2 + (-0.031)^2$, with the elements of $v_2 = (-2.113, 2.274, -0.031)^T$ being values of approximately standard normally distributed contributions from instructors 1, 2 and 3 respectively. The first instructor is less variable than the third, who is less variable than the second. This can be formalised by a LSD analysis. The residual $X_P^2 - V_1^T V_1 - V_2^T V_2$ has P-value 0.75, indicating no further effects in the data.
J.C.W. RAYNER AND D.J. BEST
18
Partition of 2 for instructor's data
Statistics
Degrees of Freedom Value P-value
V1V1
2
0.324
0.85
XP
T
V2V2
2 ;V V ;V V
1 1
2 2
T
T
XP
T
2
XP
2
9.643
0.01
4
8
2.021
11.985
0.75
0.15
4. The Kruskal-Wallis Test with No Ties
We now consider models that lead to the Kruskal-Wallis test when there are no ties. The latent roots of $\mathrm{cov}(Z)$ will be found explicitly rather than asymptotically as in section 2. We show that $X_P^2$ is not an appropriate test statistic, but nevertheless, its components are. The first component is the Kruskal-Wallis test statistic, and the subsequent components provide informative extensions.

Suppose we have $n$ distinct observations $x_{ij}$, $x_{ij}$ being the $j$th of $n_i$ observations on the $i$th of $t$ treatments. All $n = n_1 + \cdots + n_t$ observations are combined, ordered, ranked, and the sums $R_i$ of the ranks obtained by the $i$th treatment calculated. The Kruskal-Wallis statistic is

$$\{12/[n(n+1)]\}\sum_{i=1}^{t} R_i^2/n_i - 3(n+1).$$

See for example, Conover (1980, section 5.2). The data may be presented as a $t$ by $n$ contingency table of counts $\{N_{ij}\}$, with $N_{ij} = 1$ if rank $j$ is allotted to treatment $i$, and $N_{ij} = 0$ if rank $j$ is allotted to some other treatment. The row and column totals are all fixed: the row totals are the treatment sample sizes, so that $n_{i\cdot} = n_i$ for $i = 1, \ldots, t$, while the column totals are all one: $n_{\cdot j} = 1$ for $j = 1, \ldots, n$. Such a table has $X_P^2 = n(t-1)$ no matter what the $\{N_{ij}\}$. Since $X_P^2$ is constant, it has P-value 1. Clearly $X_P^2$ is not a suitable test statistic.
The model of section 2 holds, except that now

$$R = \frac{n}{n-1}\left(I_n - n^{-1}1_n 1_n^T\right).$$

This matrix has one latent root zero and $n-1$ latent roots $n/(n-1)$. It follows that $\mathrm{cov}(Z)$ has $(t-1)(n-1)$ latent roots $n/(n-1)$, and the remaining $t+n-1$ latent roots zero. So if $H$ is orthogonal and diagonalises $\mathrm{cov}(Z)$ then

$$H^T \mathrm{cov}(Z)\,H = \frac{n}{n-1}\left[I_{(t-1)(n-1)} \oplus 0_{(t+n-1)}\right].$$

Define

$$Y = \sqrt{\frac{n-1}{n}}\,H^T Z.$$

Then

$$Y^T Y = \frac{n-1}{n}\,Z^T Z = \frac{n-1}{n}\,X_P^2.$$
With $r$ replaced by $t$ and $c$ replaced by $n$, this is the same as in section three. As in section 2, we are interested in the distribution theory as $n \to \infty$. However, there $Z$ was an $rc$ by 1 vector of fixed length; here $Z$ is a $tn$ by 1 vector. Fortunately, it is not the asymptotic distribution of $Z$ that is required. First recall that $X_P^2$ has a fixed value, $n(t-1)$, for all tables, and so is not available as a test statistic. Second, as in section three, the multivariate Central Limit Theorem shows that each $V_s$ is asymptotically $N_t(0, [I_{t-1} \oplus 0])$. Moreover, consideration of all pairs $V_s$, $V_u$ shows that they are asymptotically jointly multivariate normal, and since their covariance matrix is zero, they are asymptotically pairwise independent. The $V_s^T V_s$ still partition $\{(n-1)/n\}X_P^2$. It is the pairwise independence and convenient $\chi^2_{t-1}$ distribution of each $V_s^T V_s$ that makes data analysis so informative and convenient. What is lost by the unavailability of $X_P^2$ is demonstrated in the Employees Example below: there is no residual available to assess if there are higher moment differences between the treatments.

We now show that provided there are no ties, $V_1^T V_1$ is the Kruskal-Wallis statistic, so that the subsequent $V_s^T V_s$ provide extensions to the Kruskal-Wallis test. First note that $\{g_s(j)\}$ is the set of polynomials orthonormal on the discrete uniform distribution, so that $g_1(j) = aj + b$, $j = 1, \ldots, n$, in which

$$a = \sqrt{12/(n^2-1)} \quad\text{and}\quad b = -\sqrt{3(n+1)/(n-1)} = -\{(n+1)/2\}a.$$

Here $E[N_{ij}] = n_i/n$, and the rank sum for treatment $i$, $i = 1, \ldots, t$, is $R_i = \sum_{j=1}^{n} j\,N_{ij}$. Now since $n_{\cdot j} = 1$ for $j = 1, \ldots, n$ and $\sum_{j=1}^{n} g_1(j)\,g_0(j)(1/n) = 0$,

$$\sum_{j=1}^{n} Z_{ij}\,g_1(j) = \sqrt{n/n_i}\,\sum_{j=1}^{n}(N_{ij} - n_i/n)(aj + b) = \sqrt{n/n_i}\,(aR_i + bn_i) = a\sqrt{n/n_i}\left\{R_i - \frac{(n+1)n_i}{2}\right\}.$$
Now

$$V_1^T V_1 = Y_1^2 + \cdots + Y_t^2 = \frac{n-1}{n}\sum_{i=1}^{t}\left\{\frac{1}{\sqrt{n}}\sum_{j=1}^{n} g_1(j)\,Z_{ij}\right\}^2 = \frac{(n-1)a^2}{n}\sum_{i=1}^{t}\frac{1}{n_i}\left\{R_i - \frac{(n+1)n_i}{2}\right\}^2 = \frac{12}{n(n+1)}\sum_{i=1}^{t}\frac{R_i^2}{n_i} - 3(n+1)$$

after some manipulation. This is the Kruskal-Wallis statistic, well known to be sensitive to location departures from the null hypothesis. Since $V_s$ assesses $s$th moment departures between treatments, we have partitioned the statistic $\{(n-1)/n\}X_P^2$ into asymptotically pairwise independent components, $V_s^T V_s$, $s = 1, \ldots, n-1$, each with the $\chi^2_{t-1}$ distribution, and such that the $s$th detects $s$th moment departures from the hypothesis of similarly distributed rows (treatments). Since the first of these is the Kruskal-Wallis statistic, the subsequent components provide extensions to the Kruskal-Wallis test.
Employees Example. Conover (1980, p. 238, exercise 2) gave an exercise in which 20 new employees are randomly assigned to four different job training programmes. At the end of their training the employees are ranked, with a low ranking reflecting a low job ability.

Programme   Ranks
1           2, 4, 6, 7, 10
2           1, 3, 8, 11, 12
3           5, 14, 16, 19, 20
4           9, 13, 15, 17, 18

The value of the Kruskal-Wallis statistic is 9.72, with $\chi^2_3$ P-value 0.021, but Monte Carlo permutation test P-value 0.010. The latter is more likely to be accurate as the sample size is small. Further components are not significant. An LSD analysis can be used to show that programmes 1 and 2 and programmes 3 and 4 are equally effective, with 3 and 4 being superior.
5. The Kruskal-Wallis Test with Ties
If there are ties, the data may be presented as a $t$ by $c$ contingency table of counts $\{N_{ij}\}$, with the row totals fixed at the treatment sample sizes, so again $n_{i\cdot} = n_i$, $i = 1, \ldots, t$, while the column totals are no longer all one. The covariance matrix of $Z$ is

$$\mathrm{cov}(Z) = \{I_t - (\sqrt{f_i f_j})\} \otimes R,$$

in which $R$ is as in section 2: its diagonal elements are $(n_{\cdot\cdot} - n_{\cdot u})/(n_{\cdot\cdot} - 1)$ and its $(u, v)$th off-diagonal elements are $-\sqrt{n_{\cdot u}n_{\cdot v}}/(n_{\cdot\cdot} - 1)$. As in section 2, the latent roots of $R$ are zero once and asymptotically one $c - 1$ times. It follows that $\mathrm{cov}(Z)$ has $(t-1)(c-1)$ latent roots asymptotically one, and the remaining $t + c - 1$ latent roots zero. With suitable modifications the partitioning of section three holds. For $s = 1, \ldots, c-1$, $V_s$ has $i$th element

$$V_{si} = \sqrt{\frac{n_{\cdot\cdot}-1}{n_{\cdot\cdot}}}\,\sum_{j=1}^{c} g_s(j)\,\frac{N_{ij} - n_i n_{\cdot j}/n_{\cdot\cdot}}{\sqrt{n_i}}.$$

Note that $\{g_s(j)\}$ is the set of polynomials orthonormal on $\{n_{\cdot j}/n_{\cdot\cdot}\}$, not on the discrete uniform as in the previous section when there were no ties. This is the partition derived in section 3 for $X_P^2$. So the first component of $X_P^2$ in the Instructors Example is the Kruskal-Wallis statistic corrected for ties. The subsequent components are extensions to the Kruskal-Wallis test adjusted for ties. Note that for this example the model assumed in section 3, with fixed numbers of rows and columns, is more plausible than the model of this section, since $c = 5$ is hardly large.
6. Generalised Median Tests
Conover (1980, section 4.3) described the median test, in which random samples are taken from each of $r$ populations. Each random sample is classified as above and below the grand median (the median of the combined random samples), forming an $r$ by 2 contingency table with fixed marginal totals. The usual chi-squared test, based on $X_P^2$, is then applied to this contingency table.

If instead of the grand median, a `grand quantile' is used, the resulting test is described as a quantile test: see Conover (1980, p. 174). These tests can be generalised by choosing $c$ categories for the combined random samples instead of two, and so forming an $r$ by $c$ contingency table of counts $N_{ij}$ of the number of observations for the $i$th sample in the $j$th category. This table has all row and column totals fixed and can be tested for row consistency using the results of sections 2 and 3. The first three, say, components of $X_P^2$ are of particular interest, indicating location, dispersion and skewness differences between treatments.

It is routine to show that the location component $V_1^T V_1$ of $X_P^2$ reduces to the median test statistic when observations are classified into just two categories. This is shown in the Appendix. The result identifies the median test as a location detecting test. To detect up to $s$th moment differences between the populations requires categorisation into $s+1$ categories and the use of the $V_2, \ldots, V_s$ components. If there are as many categories as observations and each category has one observation, the test based on the location component is the Kruskal-Wallis test, which is known to be more powerful than the median test. Using more than two categories will result in less loss of information due to categorisation compared with the median test, and will permit assessment of higher moment differences between the treatments.
Corn Example. Conover (1980, p. 172) gave the example of four different methods of growing corn. He classified the data as greater than 89 and up to 88 and applied the median test. In this form this does not conform to the fixed margins model. If the objective were to divide the data into groups of the lowest 18 and highest 16 observations, it would conform to the fixed margins model. We now classify the data into four approximately equal groups.

Using the median test, Conover reported a P-value "slightly less than 0.001": the method median yields are clearly different. We calculate $X_P^2 = 49.712$ on 9 degrees of freedom. In addition $V_1^T V_1 = 25.723$, $V_2^T V_2 = 19.972$ and $V_3^T V_3 = 2.574$, all on 3 degrees of freedom. The location and dispersion components and $X_P^2$ are all significant, with P-values all zero to three decimal places. The residual or skewness component has $\chi^2_3$ P-value 0.45. The finer classification, compared with that employed by the median test, has uncovered a variability difference between the methods: methods 3 and 4 are significantly less variable than 1 and 2.
           First     Second    Third     Fourth
           Quartile  Quartile  Quartile  Quartile   Total
Method 1      0         3         4         2         9
Method 2      1         6         3         0        10
Method 3      0         0         1         6         7
Method 4      8         0         0         0         8
Total         9         9         8         8        34
Appendix
The Orthogonal Polynomials
The first two polynomials, defined on $x_1, \ldots, x_c$ and orthonormal with regard to the weights $p_1, \ldots, p_c$, are $g_1(x_j)$ and $g_2(x_j)$, given explicitly by

$$g_1(x_j) = (x_j - \mu)/\sqrt{\mu_2} \quad\text{and}\quad g_2(x_j) = \left\{(x_j - \mu)^2 - \mu_3(x_j - \mu)/\mu_2 - \mu_2\right\}\left\{\mu_4 - \mu_3^2/\mu_2 - \mu_2^2\right\}^{-0.5},$$

in which

$$\mu = \sum_{j=1}^{c} x_j p_j \quad\text{and}\quad \mu_r = \sum_{j=1}^{c}(x_j - \mu)^r p_j, \quad r = 2, 3, 4.$$

The subsequent polynomials $g_3(x_j), \ldots, g_{c-1}(x_j)$ may be derived by using the useful recurrence relations in Emerson (1968). In the text we have taken, as in many applications, $x_j = j$, $j = 1, \ldots, c$.
Derivation of the Covariance Matrix of the Cell Counts
In section 2 the method used to find the moments of the $N_{ij}$ is described. To find $E[N_{21}]$, we take $E[N_{21} \mid N_{1j} + N_{2j},\ j = 1, \ldots, c]$, then the conditional expectation of this expression with the column sums of the first three rows being known, and so on. The successive expectations are

$$n_{2\cdot}(N_{11} + N_{21})/(n_{1\cdot} + n_{2\cdot}),$$
$$\{n_{2\cdot}/(n_{1\cdot} + n_{2\cdot})\}\{(n_{1\cdot} + n_{2\cdot})(N_{11} + N_{21} + N_{31})/(n_{1\cdot} + n_{2\cdot} + n_{3\cdot})\},$$
$$\vdots$$
$$\{n_{2\cdot}/(n_{1\cdot} + \cdots + n_{(r-1)\cdot})\}\{(n_{1\cdot} + \cdots + n_{(r-1)\cdot})\,n_{\cdot 1}/(n_{1\cdot} + \cdots + n_{r\cdot})\} = n_{2\cdot}n_{\cdot 1}/n_{\cdot\cdot}.$$

By symmetry $E[N_{ij}] = n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}$, $i = 2, \ldots, r$ and $j = 1, \ldots, c$. By difference the expectations for the first row may be obtained, giving the familiar

$$E[N_{ij}] = n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}, \qquad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$

In the same way

$$E[N_{21}(N_{21} - 1)] = n_{2\cdot}(n_{2\cdot} - 1)\,n_{\cdot 1}(n_{\cdot 1} - 1)/\{n_{\cdot\cdot}(n_{\cdot\cdot} - 1)\},$$

from which we obtain $\mathrm{var}(N_{21})$, and

$$\mathrm{var}(N_{ij}) = \frac{n_{i\cdot}n_{\cdot j}}{n_{\cdot\cdot}}\left(1 - \frac{n_{\cdot j}}{n_{\cdot\cdot}}\right)\frac{n_{\cdot\cdot} - n_{i\cdot}}{n_{\cdot\cdot} - 1}, \qquad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$

Similarly

$$\mathrm{cov}(N_{ij}, N_{ik}) = -\frac{n_{i\cdot}}{n_{\cdot\cdot}}\,\frac{n_{\cdot j}n_{\cdot k}}{n_{\cdot\cdot}}\,\frac{n_{\cdot\cdot} - n_{i\cdot}}{n_{\cdot\cdot} - 1}, \qquad i = 1, \ldots, r \text{ and } j \neq k = 1, \ldots, c.$$

By symmetry

$$\mathrm{cov}(N_{rj}, N_{sj}) = -\frac{n_{\cdot j}}{n_{\cdot\cdot}}\,\frac{n_{r\cdot}n_{s\cdot}}{n_{\cdot\cdot}}\,\frac{n_{\cdot\cdot} - n_{\cdot j}}{n_{\cdot\cdot} - 1}, \qquad r \neq s = 1, \ldots, r \text{ and } j = 1, \ldots, c,$$

and by the expectation argument again

$$\mathrm{cov}(N_{ir}, N_{js}) = \frac{n_{i\cdot}n_{\cdot r}}{n_{\cdot\cdot}}\,\frac{n_{j\cdot}n_{\cdot s}}{n_{\cdot\cdot}}\,\frac{1}{n_{\cdot\cdot} - 1}, \qquad i \neq j = 1, \ldots, r \text{ and } r \neq s = 1, \ldots, c.$$
Write $N_i = (N_{i1}, \ldots, N_{ic})^T$, $i = 1, \ldots, r$, and $N = (N_1^T, \ldots, N_r^T)^T$. The joint covariance matrix of $N_i$ and $N_j$ is, for $i \neq j$,

$$\mathrm{cov}(N_i, N_j) = -\frac{n_{i\cdot}n_{j\cdot}}{n_{\cdot\cdot}^2(n_{\cdot\cdot} - 1)}\left[\mathrm{diag}\{n_{\cdot u}(n_{\cdot\cdot} - n_{\cdot u})\} - (n_{\cdot u}n_{\cdot v})_{u \neq v}\right] = -n_{i\cdot}n_{j\cdot}R_0,$$

say. Now since the $\{N_{ij}\}$ are such that the row and column totals are known constants, $\mathrm{cov}(N_i, N_1 + \cdots + N_r) = 0$ for $i = 1, \ldots, r$. So if we write $f_i = n_{i\cdot}/n_{\cdot\cdot}$, $i = 1, \ldots, r$, then the covariance matrix of $N_i$ is

$$\mathrm{cov}(N_i) = -\sum_{j \neq i} \mathrm{cov}(N_i, N_j) = n_{i\cdot}\sum_{j \neq i} n_{j\cdot}\,R_0 = n_{\cdot\cdot}^2 f_i(1 - f_i)R_0,$$

which agrees with direct calculation. So if $\otimes$ is the Kronecker product, the covariance matrix of $N$ is

$$\mathrm{cov}(N) = n_{\cdot\cdot}^2\left\{\mathrm{diag}(f_i) - (f_i f_j)\right\} \otimes R_0.$$

Recall that we have defined $Z_{ij} = (N_{ij} - E[N_{ij}])/\sqrt{n_{\cdot\cdot}f_i f_j}$, $i = 1, \ldots, r$ and $j = 1, \ldots, c$, with $f_j = n_{\cdot j}/n_{\cdot\cdot}$, and $Z = (Z_{11}, \ldots, Z_{1c}, \ldots, Z_{r1}, \ldots, Z_{rc})^T$. It follows that

$$\mathrm{cov}(Z) = \{I - (\sqrt{f_i f_j})\} \otimes R,$$

in which $R = \mathrm{diag}(\sqrt{f_j})^{-1}\, n_{\cdot\cdot} R_0\, \mathrm{diag}(\sqrt{f_j})^{-1}$ is the matrix of section 2, with diagonal elements $(n_{\cdot\cdot} - n_{\cdot u})/(n_{\cdot\cdot} - 1)$ and $(u, v)$th off-diagonal elements $-\sqrt{n_{\cdot u}n_{\cdot v}}/(n_{\cdot\cdot} - 1)$.
The location component $V_1^T V_1$ of $X_P^2$ reduces to the median test statistic

If there are $b$ observations below a predetermined point in the combined sample, and $a$ above it, then Conover (1980, p. 172) gave the $X_P^2$ median test statistic as

$$T = \frac{n_{\cdot\cdot}^2}{ab}\sum_{i=1}^{r}\left(N_{i1} - n_{i\cdot}b/n_{\cdot\cdot}\right)^2/n_{i\cdot}.$$

It is routine to show that $g_1(j) = (j - \mu)/\sigma$, $j = 1, 2$, in which $\mu$ and $\sigma$ are the mean and standard deviation of the distribution defined by $P(X = 1) = b/n_{\cdot\cdot}$ and $P(X = 2) = a/n_{\cdot\cdot}$. It follows that $\mu = 1 + a/n_{\cdot\cdot}$ and $\sigma^2 = ab/n_{\cdot\cdot}^2$. Now, since $N_{i2} - E[N_{i2}] = -(N_{i1} - E[N_{i1}])$,

$$\sqrt{\frac{n_{\cdot\cdot}}{n_{\cdot\cdot}-1}}\,\sqrt{n_{i\cdot}}\,V_{1i} = \sum_{j=1}^{2} g_1(j)\left(N_{ij} - E[N_{ij}]\right) = -\left(N_{i1} - n_{i\cdot}b/n_{\cdot\cdot}\right)\left\{\sqrt{a/b} + \sqrt{b/a}\right\} = -\left(N_{i1} - n_{i\cdot}b/n_{\cdot\cdot}\right)n_{\cdot\cdot}/\sqrt{ab}.$$

It follows that, as required,

$$V_1^T V_1 = \frac{n_{\cdot\cdot}-1}{n_{\cdot\cdot}}\,T,$$

the median test statistic apart from a factor that tends to one.
References
1. D.J. Best and J.C.W. Rayner. Nonparametric analysis for doubly ordered two-way contingency tables. Biometrics, 52:1153-1156, 1996.
2. W.J. Conover. Practical Nonparametric Statistics (2nd ed.). Wiley, New York, 1980.
3. P.L. Emerson. Numerical construction of orthogonal polynomials from a general recurrence
formula. Biometrics, 24:695-701, 1968.
4. H.O. Lancaster. The Chi-squared Distribution. Wiley, New York, 1969.
5. J.C.W. Rayner and D.J. Best. Smooth Tests of Goodness of Fit. Oxford University Press,
New York, 1989.
6. J.C.W. Rayner and D.J. Best. Smooth extensions of Pearson's product moment correlation and Spearman's rho. Statistics and Probability Letters, 30(2):171-177, 1996a.
7. J.C.W. Rayner and D.J. Best. Extensions to some important nonparametric tests. In Proceedings of the A.C. Aitken Centenary Conference, Dunedin 1995, (Kavalieris, L. et al. ed.),
University of Otago Press, Dunedin, 257-266, 1996b.
8. S.N. Roy and S.K. Mitra. An introduction to some non-parametric generalisations of analysis
of variance and multivariate analysis. Biometrika, 43:361-376, 1956.