STATISTICAL ANALYSIS OF PAIRED SAMPLE DATA BY RANKS

Science Journal of Mathematics & Statistics
ISSN:2276-6324
Published By
Science Journal Publication
http://www.sjpub.org/sjms.html
© Author(s) 2012. CC Attribution 3.0 License.
International Open Access Publisher
OYEKA, I. C.A, UMEH, E.U
Accepted 18th June, 2012.
DEPARTMENT OF STATISTICS, NNAMDI AZIKIWE UNIVERSITY, AWKA
Email: editus2002@yahoo.com
ABSTRACT
This paper develops a rank-based test statistic for testing the
equality of two population medians. Instead of taking the
differences between pairs of observations and using these
differences with the sign test, or taking the absolute values of
these differences, assigning them ranks and applying the Wilcoxon
signed rank sum test, one may first assign ranks to the members of
each pair of observations and then use these ranks to develop a
rank-based test statistic for testing the equality of two population
medians. From the results obtained, the proposed method is observed
to be more efficient than the Wilcoxon signed rank sum test, which
in turn is more efficient than the ordinary sign test, because it
uses all available information, unlike the ordinary sign test and
the Wilcoxon signed rank sum test, which ignore zero differences.
The proposed method also has the added advantage that it enables
the statistical comparison of the medians of two related
populations whose measurements are on as low as the ordinal scale.
KEYWORDS: Median, Wilcoxon Signed Ranks, Test Statistic, Paired
Sample, Data
INTRODUCTION
If the usual assumptions of continuity and normality
are satisfied, then the parametric paired sample t-test
may be used to test the null hypothesis that two
populations have equal medians. This parametric
method may not, however, be properly used for this
purpose if the necessary assumptions are not
satisfied. In this case, the use of non-parametric
procedures is indicated. Some of the non-parametric
methods that readily suggest themselves here are the sign
test and the Wilcoxon signed rank test (Gibbons 1971;
Gibbons 1993; Oyeka 2009; Oyeka et al 2010;
Zimmerman 1998; Harrell 1999; Corder and
Foreman 2009; Wilcoxon 1945; Lowry 2011).
However, instead of taking the differences between
pairs of observations and using these differences
with the sign test, or taking the absolute values of
these differences, assigning them ranks and applying
the Wilcoxon signed rank sum test, one may first
assign ranks, either from the smaller to the larger or
from the larger to the smaller, to the members of each
pair of observations and then use these ranks to develop a
rank-based test statistic for testing the equality of
two population medians. This procedure is proposed
and developed in this paper.
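Suppose $(x_{i1}, x_{i2})$, equivalently $(x_i, y_i)$, is the ith pair in a
paired random sample of size n drawn from
populations X and Y, for i = 1, 2, …, n. X and Y may be
related or independent populations measured on at
least the ordinal scale. Let $x_{i1}$ be assigned a rank $r_{i1}$ = 2, 1.5
or 1 if $x_{i1}$ is greater than, equal to or less than $x_{i2}$,
respectively. Similarly, let $x_{i2}$ be assigned a rank
$r_{i2}$ = 2, 1.5 or 1 if $x_{i2}$ is greater than, equal to or less
than $x_{i1}$, respectively.
Let
$$ r_i = r_{i2} - r_{i1} \tag{1} $$
Define
$$ u_i = \begin{cases} 1, & \text{if } r_i > 0 \\ 0, & \text{if } r_i = 0 \\ -1, & \text{if } r_i < 0 \end{cases} \tag{2} $$
for i = 1, 2, …, n.
Let
$$ \pi^+ = P(u_i = 1); \quad \pi^0 = P(u_i = 0); \quad \pi^- = P(u_i = -1) \tag{3} $$
where
$$ \pi^+ + \pi^0 + \pi^- = 1 \tag{4} $$
Define
$$ W = \sum_{i=1}^{n} u_i \tag{5} $$
Now
$$ E(u_i) = \pi^+ - \pi^-; \quad Var(u_i) = \pi^+ + \pi^- - (\pi^+ - \pi^-)^2 \tag{6} $$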
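The ranking scheme of Equations (1), (2) and (5) can be sketched in Python as follows. This is a minimal illustration rather than the authors' own implementation: the function names `pair_ranks`, `sign_indicator` and `w_statistic` and the four sample pairs are hypothetical, and the observations are assumed to support the usual comparison operators (ordinal categories such as letter grades would first have to be mapped to ordered codes).

```python
def pair_ranks(x1, x2):
    """Return (r1, r2): rank 2 for the larger member, 1 for the smaller, 1.5 each if tied."""
    if x1 > x2:
        return 2.0, 1.0
    if x1 < x2:
        return 1.0, 2.0
    return 1.5, 1.5


def sign_indicator(r1, r2):
    """Equation (2): u = 1, 0 or -1 according to the sign of r = r2 - r1."""
    r = r2 - r1  # Equation (1)
    return (r > 0) - (r < 0)


def w_statistic(pairs):
    """Equation (5): W is the sum of u_i over all pairs; the u_i are returned too."""
    u = [sign_indicator(*pair_ranks(x1, x2)) for x1, x2 in pairs]
    return sum(u), u


# Hypothetical data: (first element, second element) of each pair.
sample_pairs = [(3, 4), (5, 5), (2, 1), (7, 9)]
w, u_values = w_statistic(sample_pairs)
print(u_values, w)
```

Because only the ordering within each pair is used, the same sketch applies to any measurement scale on which "greater than", "equal to" and "less than" are meaningful.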
If paired observations are randomly drawn from
populations X and Y, then $\pi^+$, $\pi^0$ and $\pi^-$ are
respectively the probabilities that the second
elements in the pairs, the observations from
population Y, are on the average greater than, equal
to or less than the first elements in the pairs, the
observations from population X. These probabilities
are estimated respectively as
$$ \hat{\pi}^+ = \frac{f^+}{n}; \quad \hat{\pi}^0 = \frac{f^0}{n}; \quad \hat{\pi}^- = \frac{f^-}{n} \tag{7} $$
where $f^+$, $f^0$ and $f^-$ are respectively the numbers of 1s,
0s and −1s in the frequency distribution of
the n values of these numbers in $u_i$, i = 1, 2, …, n.
Also
$$ E(W) = \sum_{i=1}^{n} E(u_i) = n(\pi^+ - \pi^-) \tag{8} $$
and
$$ Var(W) = \sum_{i=1}^{n} Var(u_i) = n\left(\pi^+ + \pi^- - (\pi^+ - \pi^-)^2\right) \tag{9} $$
If paired observations are randomly selected from
populations X and Y, then $\pi^+ - \pi^-$ is a measure of
the probability that the second elements in the pairs,
the observations from Y, are on the average greater than
the first elements, minus the probability that
they are on the average less than the first elements
in the pairs, the observations from population X.
It is estimated from Equation (8) as
$$ \hat{\pi}^+ - \hat{\pi}^- = \frac{W}{n} \tag{10} $$
Using Equation (10) in Equation (9), we obtain a sample
estimate of the variance of W as
$$ Var(W) = n(\hat{\pi}^+ + \hat{\pi}^-) - \frac{W^2}{n} \tag{11} $$
Now if X and Y have equal population medians, then
$\pi^+ - \pi^-$ would be expected to be equal to zero.
However, often of more general interest is to
determine whether the two population medians
differ by some constant value. This is equivalent to
determining whether $\pi^+$ and $\pi^-$ differ by some
constant, $\theta_0$ say. In other words, a null hypothesis of
research interest may be
$$ H_0: \pi^+ - \pi^- = \theta_0 \quad \text{vs} \quad H_1: \pi^+ - \pi^- > \theta_0, \text{ say } (-1 < \theta_0 < 1) \tag{12} $$
Now if the null hypothesis of Equation (12) is true,
then the test statistic
$$ \chi^2 = \frac{(W - n\theta_0)^2}{Var(W)} = \frac{(W - n\theta_0)^2}{n(\hat{\pi}^+ + \hat{\pi}^-) - \frac{W^2}{n}} \tag{13} $$
has approximately the chi-square distribution with
one (1) degree of freedom for sufficiently large n and
may be used to test the null hypothesis of Equation
(12). The null hypothesis is rejected at the $\alpha$ level of
significance if
$$ \chi^2 \ge \chi^2_{1-\alpha;\,1} \tag{14} $$
The test statistic for the null hypothesis usually
tested with the sign test ($\theta_0 = 0$) is
$$ \chi^2 = \frac{W^2}{Var(W)} = \frac{W^2}{n(\hat{\pi}^+ + \hat{\pi}^-) - \frac{W^2}{n}} \tag{15} $$
$H_0$ is rejected if Equation (14) is satisfied.
The proposed method is similar to the ties-adjusted
modified sign test and yields similar results. An
advantage of the proposed method over the ordinary
sign test and the Wilcoxon signed rank sum test is
that, unlike the last two procedures, the proposed
method enables the statistical comparison of the
medians of two related populations whose
measurements are on as low as the ordinal scale. If the
three methods can all be used with a set of data,
the proposed method is generally more powerful
than the Wilcoxon signed rank sum test, which is
itself more powerful than the ordinary sign test. This
is because the proposed method uses all available
information, unlike the ordinary sign test and the
Wilcoxon signed rank sum test, which often ignore
zero differences.
To prove that the proposed test statistic W is
generally more powerful than the Wilcoxon signed
rank sum test statistic T, which ignores zero absolute
differences, it would be sufficient to show that W is
relatively more efficient than T for a specified
sample size. To show this, we note that the variance
of T for a given sample size n is
$$ Var(T) = \frac{n(n+1)(2n+1)}{24} \tag{16} $$
Using the variance of W given in Equation (9), we
calculate the efficiency of W relative to T as
$$ RE(W,T) = \frac{Var(T)}{Var(W)} = \frac{n(n+1)(2n+1)}{24\,n\left(\pi^+ + \pi^- - (\pi^+ - \pi^-)^2\right)} = \frac{(n+1)(2n+1)}{24\left(\pi^+ + \pi^- - (\pi^+ - \pi^-)^2\right)} \ge \frac{(n+1)(2n+1)}{24} $$
since $(\pi^+ - \pi^-)^2 \ge 0$ and $\pi^+ + \pi^- \le 1$
(see Equations 3 and 4).
How to Cite this Article: Oyeka, I. C.A Umeh, E.u, “Statistical Analysis of Paired Sample Data by Ranks,” Science Journal Of Mathematics and Statistics, Volume
2012 (2012), Article ID sjms-102, 5 Pages, doi: 10.7237/sjms/102
$$ \therefore \; RE(W,T) > 1 \tag{17} $$
for all n ≥ 3.
Hence the proposed test statistic W is relatively
more efficient, and thus more powerful, than the
Wilcoxon signed rank sum test statistic T for all
sample sizes n ≥ 3.
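The variance estimate, the chi-square statistic and the efficiency bound derived above can be checked numerically with a short Python sketch. The function names and the parameter name `theta0` (for the hypothesized constant in the null hypothesis) are assumptions made for illustration only; the input counts are those of the letter-grade example (5 values of +1, 12 values of −1, n = 18). The upper-tail probability uses the identity P(χ² > x) = erfc(√(x/2)) for one degree of freedom, so no third-party library is needed.

```python
import math


def var_w_estimate(f_plus, f_minus, n):
    """Sample variance of W: n * (pi_plus_hat + pi_minus_hat) - W**2 / n."""
    w = f_plus - f_minus
    return (f_plus + f_minus) - w ** 2 / n


def chi_square_stat(f_plus, f_minus, n, theta0=0.0):
    """Test statistic (W - n * theta0)**2 / Var(W); theta0 is an assumed name."""
    w = f_plus - f_minus
    return (w - n * theta0) ** 2 / var_w_estimate(f_plus, f_minus, n)


def chi2_upper_tail_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def re_lower_bound(n):
    """Lower bound (n + 1)(2n + 1) / 24 on RE(W, T) used in the efficiency argument."""
    return (n + 1) * (2 * n + 1) / 24.0


# Counts from the letter-grade example: f+ = 5, f- = 12, n = 18.
chi2 = chi_square_stat(5, 12, 18)
p_upper = chi2_upper_tail_1df(chi2)
print(round(chi2, 3), round(p_upper, 4), round(p_upper / 2, 4))
print([round(re_lower_bound(n), 3) for n in (2, 3, 4)])
```

Without intermediate rounding the statistic comes out at about 3.43, against the 3.428 obtained in the text from proportions rounded to three decimals. The P-values reported in the examples appear to correspond to half the upper-tail chi-square probability, that is, a one-sided test; this is an inference from the reported numbers rather than something the paper states. The bound is below 1 at n = 2 and above 1 from n = 3 onward, in line with the condition n ≥ 3.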
Illustrative Example 1:
To illustrate the application of the proposed method
to ordinal data, we here use the results in letter
grades of a random sample of 18 students in two
related courses they took during their first and
second years of study in the university. The results
are presented in Table 1. Also shown in Table 1 are
the ranks assigned to these grades, their differences
and the values of $u_i$ (Equation 2).

Table 1: Letter Grades of a Random Sample of Students and Values of u_i

S/No   Grade in        Grade in        Rank of x_i1   Rank of x_i2   Difference           u_i
       Year 1 (x_i1)   Year 2 (x_i2)   (r_i1)         (r_i2)         r_i = r_i1 - r_i2    (Eqn 2)
1      E               C+              1              2              -1                   -1
2      C               C+              1              2              -1                   -1
3      B+              A+              1              2              -1                   -1
4      F               C               1              2              -1                   -1
5      A-              C-              2              1               1                    1
6      B               B+              1              2              -1                   -1
7      B-              F               2              1               1                    1
8      E               B+              1              2              -1                   -1
9      C               B+              1              2              -1                   -1
10     C-              C-              1.5            1.5             0                    0
11     C               C-              2              1               1                    1
12     A+              B+              2              1               1                    1
13     F               C-              1              2              -1                   -1
14     E               B+              1              2              -1                   -1
15     C               C-              2              1               1                    1
16     B               A               1              2              -1                   -1
17     B               A+              1              2              -1                   -1
18     A               A+              1              2              -1                   -1

Note that in Table 1 the difference is taken as $r_{i1} - r_{i2}$, the
reverse of Equation (1), so that $u_i = 1$ indicates a better grade
in Year 1. This changes the sign of W but not the value of the
chi-square statistic when the hypothesized difference is zero.
From the last column of Table 1 we
have $f^+ = 5$, $f^0 = 1$ and $f^- = 12$, so that from
Equation (7) we have $\hat{\pi}^+ = 5/18 = 0.278$;
$\hat{\pi}^0 = 1/18 = 0.056$; and $\hat{\pi}^- = 12/18 = 0.667$.
Also W = 5 − 12 = −7 and from Equation (9) we have
$$ Var(W) = (18)\left(0.278 + 0.667 - (0.278 - 0.667)^2\right) = (18)(0.794) = 14.292 $$
Hence the test statistic for the null hypothesis of
Equation (12), with the hypothesized difference equal to zero,
is from Equation (13)
$$ \chi^2 = \frac{(-7)^2}{14.292} = 3.428 \quad (\text{P-value} = 0.0322) $$
which with 1 degree of freedom is statistically
significant at $\alpha$ = 0.05. This indicates that students
may have on the average improved their
performance in the two courses taken during their
first and second years of study. Note that the ordinary
sign test and the Wilcoxon signed rank sum test cannot be
applied to this type of data, which consists of ordinal
letter grades. We now further illustrate the method with
ratio-type data.
Illustrative Example 2:
In a study to compare the actual with the ideal family
size of married women, a random sample of 24
married women was selected and asked to state the
actual number of children they had and the ideal
number of children they would like to have. The
results are as follows:
S/No   Actual x_i1   Ideal x_i2   r_i1   r_i2   r_i = r_i2 - r_i1   u_i (Eqn 2)
1      4             5            1      2      +1                   1
2      1             5            1      2      +1                   1
3      6             5            2      1      -1                  -1
4      1             6            1      2      +1                   1
5      7             5            2      1      -1                  -1
6      1             9            1      2      +1                   1
7      4             4            1.5    1.5     0                   0
8      2             6            1      2      +1                   1
9      8             8            1.5    1.5     0                   0
10     5             5            1.5    1.5     0                   0
11     4             4            1.5    1.5     0                   0
12     4             5            1      2      +1                   1
13     5             6            1      2      +1                   1
14     5             6            1      2      +1                   1
15     4             4            1.5    1.5     0                   0
16     4             6            1      2      +1                   1
17     5             6            1      2      +1                   1
18     3             7            1      2      +1                   1
19     10            9            2      1      -1                  -1
20     5             5            1.5    1.5     0                   0
21     6             4            2      1      -1                  -1
22     5             6            1      2      +1                   1
23     4             5            1      2      +1                   1
24     0             5            1      2      +1                   1

Here $u_i = 1$ if $r_i > 0$, $u_i = 0$ if $r_i = 0$ and $u_i = -1$ if $r_i < 0$.
The number of +1s is $f^+ = 14$, the number of 0s is $f^0 = 6$ and the
number of −1s is $f^- = 4$.
$$ \therefore \; \hat{\pi}^+ = \frac{14}{24} = 0.583; \quad \hat{\pi}^0 = \frac{6}{24} = 0.250; \quad \hat{\pi}^- = \frac{4}{24} = 0.167 $$
$$ W = 14 - 4 = 10 $$
$$ Var(W) = 24(0.583 + 0.167) - \frac{10^2}{24} = 18 - 4.167 = 13.833 $$
$$ \chi^2 = \frac{10^2}{13.833} = 7.229 \quad (\text{P-value} = 0.0036) $$
which with 1 degree of freedom is highly statistically
significant, indicating that the actual and desired numbers
of children differ significantly.
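The figures for this example can be reproduced directly from the raw family-size data. The following Python sketch is one way to do so under the definitions given earlier; the variable names are illustrative, and the sign of (ideal − actual) is used as a shortcut for $u_i$, since it always has the same sign as the rank difference $r_{i2} - r_{i1}$.

```python
actual = [4, 1, 6, 1, 7, 1, 4, 2, 8, 5, 4, 4, 5, 5, 4, 4, 5, 3, 10, 5, 6, 5, 4, 0]
ideal = [5, 5, 5, 6, 5, 9, 4, 6, 8, 5, 4, 5, 6, 6, 4, 6, 6, 7, 9, 5, 4, 6, 5, 5]

n = len(actual)

# u_i = +1 if ideal > actual, 0 if tied, -1 if ideal < actual.
u = [(y > x) - (y < x) for x, y in zip(actual, ideal)]

f_plus, f_zero, f_minus = u.count(1), u.count(0), u.count(-1)
w = sum(u)

# Sample variance of W: n * (pi_plus_hat + pi_minus_hat) - W**2 / n.
var_w = n * (f_plus / n + f_minus / n) - w ** 2 / n

# Chi-square statistic for a hypothesized difference of zero.
chi2 = w ** 2 / var_w

print(f_plus, f_zero, f_minus, w, round(var_w, 3), round(chi2, 3))
# prints: 14 6 4 10 13.833 7.229
```

Tied pairs stay in the sample: they contribute zero to W but still count toward n, which is how the method retains the information in zero differences.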
If we were to apply the ordinary sign test, the
effective sample size to use would be n = 24 − 6
= 18, since there are altogether 6 tied observations in
the data. Since the number of minus signs, which is 4, is
less than the number of plus signs, which is 14, we
calculate the probability of obtaining at most x = 4
minus signs out of a total of n = 18 signs (4 minus signs and 14
plus signs) under the null hypothesis of equal
population medians (p = 0.50), obtaining
$$ P(X \le 4) = \sum_{x=0}^{4} \binom{18}{x} (0.50)^{18} = 0.01544 $$
which is less than $\alpha/2 = 0.025$.
Hence we reject the null hypothesis at the 5% level of
significance.
To apply the Wilcoxon signed rank sum
test, we use the differences $d_i = x_{i2} - x_{i1}$, their absolute
values $|d_i|$ and the ranks of the nonzero $|d_i|$, which are as
follows (zero differences are not ranked):

S/No   d_i = x_i2 - x_i1   |d_i|   Rank of |d_i|
1       1                  1       5
2       4                  4       14
3      -1                  1       5
4       5                  5       16.5
5      -2                  2       11
6       8                  8       18
7       0                  0       --
8       4                  4       14
9       0                  0       --
10      0                  0       --
11      0                  0       --
12      1                  1       5
13      1                  1       5
14      1                  1       5
15      0                  0       --
16      2                  2       11
17      1                  1       5
18      4                  4       14
19     -1                  1       5
20      0                  0       --
21     -2                  2       11
22      1                  1       5
23      1                  1       5
24      5                  5       16.5

From the last column of this table, the sum of the ranks
assigned to the absolute values
of differences with negative signs is
$$ T = 5 + 11 + 5 + 11 = 32 $$
Now the expected value of T is
$$ E(T) = \frac{n(n+1)}{4} = \frac{18(19)}{4} = 85.5 $$
and the variance of T is
$$ Var(T) = \frac{n(n+1)(2n+1)}{24} = \frac{18(19)(37)}{24} = 527.25 $$
Hence the standardized test statistic is
$$ Z = \frac{T - E(T)}{\sqrt{Var(T)}} = \frac{32 - 85.5}{\sqrt{527.25}} = -2.33 \quad (\text{P-value} = 0.0099) $$
which is statistically significant.
Note that for the present illustrative example, the
efficiency of the proposed test statistic W relative
to Wilcoxon's signed rank sum test statistic T is
$$ RE(W,T) = \frac{527.25}{13.833} = 38.12 $$
showing that the proposed method is much more
efficient than the Wilcoxon signed rank sum test,
which in turn is more efficient than the ordinary sign
test.
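The sign test probability, the Wilcoxon rank sum and the relative efficiency for the family-size data can be recomputed with the Python sketch below. It is a hedged illustration, not the authors' code: `average_ranks` is a hypothetical helper that assigns mid-ranks to ties, and the normal approximation for T is the standard large-sample form assumed here.

```python
import math

actual = [4, 1, 6, 1, 7, 1, 4, 2, 8, 5, 4, 4, 5, 5, 4, 4, 5, 3, 10, 5, 6, 5, 4, 0]
ideal = [5, 5, 5, 6, 5, 9, 4, 6, 8, 5, 4, 5, 6, 6, 4, 6, 6, 7, 9, 5, 4, 6, 5, 5]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


# Nonzero differences d_i = ideal - actual; zero differences are dropped.
d = [y - x for x, y in zip(actual, ideal) if y != x]
n = len(d)

# Sign test: lower-tail binomial probability of at most k minus signs, p = 0.5.
k = sum(1 for v in d if v < 0)
p_sign = sum(math.comb(n, x) for x in range(k + 1)) / 2 ** n

# Wilcoxon signed rank sum: T is the rank sum of the negative differences.
ranks = average_ranks([abs(v) for v in d])
t_neg = sum(r for r, v in zip(ranks, d) if v < 0)
e_t = n * (n + 1) / 4
var_t = n * (n + 1) * (2 * n + 1) / 24
z = (t_neg - e_t) / math.sqrt(var_t)

# Relative efficiency uses the sample variance of W computed on all 24 pairs.
w = (n - k) - k
var_w = n - w ** 2 / len(actual)
re_w_t = var_t / var_w

print(round(p_sign, 5), t_neg, e_t, var_t, round(z, 2), round(re_w_t, 2))
```

The relative efficiency comes out at about 38.11 when Var(W) is not rounded first; the value 38.12 in the text follows from using Var(W) = 13.833.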
SUMMARY AND CONCLUSION
This paper developed a rank-based test statistic for
testing the equality of two population medians.
Instead of taking the differences between pairs of
observations and using these differences with the
sign test, or taking the absolute values of these
differences, assigning them ranks and applying the
Wilcoxon signed rank sum test, one may first assign
ranks to the members of each pair of observations and then
use these ranks to develop a rank-based test statistic for
testing the equality of two population medians.
From the results obtained, the proposed method is
observed to be more efficient than the Wilcoxon
signed rank sum test, which in turn is more efficient
than the ordinary sign test, because it uses all available
information. It also has an added advantage over
them in that it enables the statistical comparison of
the medians of two related populations whose
measurements are on as low as the ordinal scale.
REFERENCES
1. Corder, G. W. and Foreman, D. I.: Non-Parametric Statistics for Non-Statisticians: A Step-by-Step Approach. Wiley, New Jersey, 2009
2. Gibbons, J. D.: Non-Parametric Statistical Inference. McGraw-Hill, New York, 1971
3. Gibbons, J. D.: Non-Parametric Statistics: An Introduction. Sage Publications, Newbury Park, 1993
4. Hollander, M. and Wolfe, D. A.: Non-Parametric Statistical Methods (2nd Edition). Wiley-Interscience, New York, 1999
5. Lowry, Richard: Concepts and Applications of Inferential Statistics. Retrieved 24th March 2011
6. Oyeka, C. A., Ebuh, G. U., Nwankwo, C. C., Obiora-Ilouno, H., Ibeakuzie, P. O., Utazi, C.: A Statistical Comparison of Test Scores: A Non-Parametric Approach. Journal of Mathematical Sciences, Vol. 21, No. 1 (2010), 77-87
7. Siegel, S.: Non-Parametric Statistics for the Behavioural Sciences. McGraw-Hill Kogakusha, Ltd, Tokyo
8. Wilcoxon, Frank: Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1 (6): 80-83, 1945