Science Journal of Mathematics & Statistics, ISSN: 2276-6324
Published by Science Journal Publication, http://www.sjpub.org/sjms.html
© Author(s) 2012. CC Attribution 3.0 License.
International Open Access Publisher

STATISTICAL ANALYSIS OF PAIRED SAMPLE DATA BY RANKS

OYEKA, I. C. A. and UMEH, E. U.
DEPARTMENT OF STATISTICS, NNAMDI AZIKIWE UNIVERSITY, AWKA
Email: editus2002@yahoo.com
Accepted 18th June, 2012.

ABSTRACT

This paper develops a rank-based test statistic for testing the equality of two population medians. Instead of taking the differences between pairs of observations and using these differences with the sign test, or taking the absolute values of these differences, assigning them ranks and applying the Wilcoxon signed rank sum test, one may first assign ranks to the members of each paired observation and then use these ranks to develop a rank-based test statistic for testing the equality of two population medians. The proposed method is found to be more efficient than the Wilcoxon signed rank sum test, which is in turn more efficient than the ordinary sign test, because it uses all available information, unlike the ordinary sign test and the Wilcoxon signed rank sum test, which ignore zero differences. It also has an added advantage over these two tests: it enables the statistical comparison of the medians of two related populations whose measurements are on as low as the ordinal scale.

KEYWORDS: Median, Wilcoxon Signed Ranks, Test Statistic, Paired Sample, Data

INTRODUCTION

If the usual assumptions of continuity and normality are satisfied, then the parametric paired sample t-test may be used to test the null hypothesis that two populations have equal medians. This parametric method may not, however, be properly used for this purpose if the necessary assumptions are not satisfied. In this case, the use of non-parametric procedures is indicated.
Some of the non-parametric methods that readily suggest themselves here are the sign test and the Wilcoxon signed rank test (Gibbons 1971; Gibbons 1993; Oyeka 2009; Oyeka et al. 2010; Zimmerman 1998; Harrell 1999; Corder and Foreman 2009; Wilcoxon 1945; Lowry 2011). However, instead of taking the differences between pairs of observations and using these differences with the sign test, or taking the absolute values of these differences, assigning them ranks and applying the Wilcoxon signed rank sum test, one may first assign ranks, either from the smaller to the larger or from the larger to the smaller, to the members of each paired observation and then use these ranks to develop a rank-based test statistic for testing the equality of two population medians. This procedure is proposed and developed in this paper.

Suppose $(x_{i1}, x_{i2})$ (equivalently $(x_i, y_i)$) is the $i$th pair in a paired random sample of size $n$ drawn from populations X and Y, for $i = 1, 2, \ldots, n$. X and Y may be related or independent populations measured on at least the ordinal scale.

Let $x_{i1}$ be assigned a rank $r_{i1} = 2$, $1.5$ or $1$ if $x_{i1}$ is greater than, equal to or less than $x_{i2}$, respectively. Similarly, let $x_{i2}$ be assigned a rank $r_{i2} = 2$, $1.5$ or $1$ if $x_{i2}$ is greater than, equal to or less than $x_{i1}$, respectively. Let

$$ r_i = r_{i2} - r_{i1} \tag{1} $$

Define

$$ u_i = \begin{cases} 1, & \text{if } r_i > 0 \\ 0, & \text{if } r_i = 0 \\ -1, & \text{if } r_i < 0 \end{cases} \tag{2} $$

for $i = 1, 2, \ldots, n$. Let

$$ \pi^{+} = P(u_i = 1); \quad \pi^{0} = P(u_i = 0); \quad \pi^{-} = P(u_i = -1) \tag{3} $$

where

$$ \pi^{+} + \pi^{0} + \pi^{-} = 1 \tag{4} $$

Define

$$ W = \sum_{i=1}^{n} u_i \tag{5} $$

Now

$$ E(u_i) = \pi^{+} - \pi^{-}; \quad \mathrm{Var}(u_i) = \pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^{2} \tag{6} $$

If paired observations are randomly drawn from populations X and Y, then $\pi^{+}$, $\pi^{0}$ and $\pi^{-}$ are respectively the probabilities that the second elements in the pairs, the observations from population Y, are on the average greater than, equal to or less than the first elements in the pairs, the observations from population X.
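The rank assignment and the scores $u_i$ of Equations 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pair_ranks` and `paired_rank_scores` and the four sample pairs are hypothetical and introduced here only for demonstration. Any values that can be ordered (numbers, or ordinal categories mapped to an ordered code) may be used.

```python
def pair_ranks(x1, x2):
    """Return (r1, r2): rank 2 for the larger member of the pair,
    1 for the smaller, and 1.5 for each member when they are tied."""
    if x1 > x2:
        return 2.0, 1.0
    if x1 < x2:
        return 1.0, 2.0
    return 1.5, 1.5


def paired_rank_scores(pairs):
    """Return the scores u_i in {-1, 0, 1} for (x_i1, x_i2) pairs,
    using the rank difference r_i = r_i2 - r_i1 of Equation 1."""
    scores = []
    for x1, x2 in pairs:
        r1, r2 = pair_ranks(x1, x2)
        r = r2 - r1
        scores.append((r > 0) - (r < 0))  # sign of r_i, Equation 2
    return scores


# Hypothetical pairs, for illustration only.
pairs = [(4, 5), (6, 5), (4, 4), (1, 9)]
u = paired_rank_scores(pairs)
W = sum(u)  # Equation 5
print(u, W)  # [1, -1, 0, 1] 1
```

Because only the ordering within each pair is used, the scores depend on which member is larger and not on the size of the difference.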
These probabilities are estimated respectively as

$$ \hat{\pi}^{+} = \frac{f^{+}}{n}; \quad \hat{\pi}^{0} = \frac{f^{0}}{n}; \quad \hat{\pi}^{-} = \frac{f^{-}}{n} \tag{7} $$

where $f^{+}$, $f^{0}$ and $f^{-}$ are respectively the numbers of 1s, 0s and $-1$s in the frequency distribution of the $n$ values of $u_i$, $i = 1, 2, \ldots, n$.

Also

$$ E(W) = \sum_{i=1}^{n} E(u_i) = n(\pi^{+} - \pi^{-}) \tag{8} $$

and

$$ \mathrm{Var}(W) = \sum_{i=1}^{n} \mathrm{Var}(u_i) = n\left(\pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^{2}\right) \tag{9} $$

If paired observations are randomly selected from populations X and Y, then $\pi^{+} - \pi^{-}$ is the probability that the second elements in the pairs, the observations from Y, are on the average greater than the first elements in the pairs, the observations from X, minus the probability that they are on the average less. From Equation 8 it is estimated as

$$ \hat{\pi}^{+} - \hat{\pi}^{-} = \frac{W}{n} = \frac{f^{+} - f^{-}}{n} \tag{10} $$

Using Equation 10 in Equation 9, we obtain a sample estimate of the variance of W as

$$ \mathrm{Var}(W) = n(\hat{\pi}^{+} + \hat{\pi}^{-}) - \frac{W^{2}}{n} \tag{11} $$

Now if X and Y have equal population medians, then $\pi^{+} - \pi^{-}$ would be expected to be equal to zero. However, it is often of more general interest to determine whether the two population medians differ by some constant value. This is equivalent to determining whether $\pi^{+}$ and $\pi^{-}$ differ by some constant $\theta_0$, say. In other words, a null hypothesis of research interest may be

$$ H_0: \pi^{+} - \pi^{-} = \theta_0 \quad \text{vs} \quad H_1: \pi^{+} - \pi^{-} > \theta_0, \text{ say } (-1 < \theta_0 < 1) \tag{12} $$

Now if the null hypothesis of Equation 12 is true, then the test statistic

$$ \chi^{2} = \frac{(W - n\theta_0)^{2}}{\mathrm{Var}(W)} = \frac{(W - n\theta_0)^{2}}{n(\hat{\pi}^{+} + \hat{\pi}^{-}) - \frac{W^{2}}{n}} \tag{13} $$

has approximately the chi-square distribution with one (1) degree of freedom for sufficiently large n and may be used to test the null hypothesis of Equation 12. The null hypothesis is rejected at the $\alpha$ level of significance if

$$ \chi^{2} \ge \chi^{2}_{1-\alpha;\,1} \tag{14} $$

The test statistic for the null hypothesis usually tested with the sign test ($\theta_0 = 0$) is obtained by setting $\theta_0 = 0$ in Equation 13, that is, $\chi^{2} = W^{2}/\mathrm{Var}(W)$; $H_0$ is rejected if Equation 14 is satisfied.

The proposed method is similar to the ties-adjusted modified sign test and yields similar results. An advantage of the proposed method over the ordinary sign test and the Wilcoxon signed rank sum test is that, unlike these two procedures, it enables the statistical comparison of the medians of two related populations whose measurements are on as low as the ordinal scale. If the three methods can equally be used with a set of data, the proposed method is generally more powerful than the Wilcoxon signed rank sum test, which is itself more powerful than the ordinary sign test. This is because the proposed method uses all available information, unlike the ordinary sign test and the Wilcoxon signed rank sum test, which often ignore zero differences.

To prove that the proposed test statistic W is generally more powerful than the Wilcoxon signed rank sum test statistic T, which ignores zero absolute differences, it would be sufficient to show that W is relatively more efficient than T for a specified sample size. To show this, we note that the variance of T for a given sample size n is

$$ \mathrm{Var}(T) = \frac{n(n+1)(2n+1)}{24} \tag{15} $$

Using the variance of W given in Equation 9, we calculate the efficiency of W relative to T as

$$ RE(W, T) = \frac{\mathrm{Var}(T)}{\mathrm{Var}(W)} = \frac{n(n+1)(2n+1)}{24\,n\left(\pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^{2}\right)} = \frac{(n+1)(2n+1)}{24\left(\pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^{2}\right)} \ge \frac{(n+1)(2n+1)}{24} \tag{16} $$

since $(\pi^{+} - \pi^{-})^{2} \ge 0$ and $\pi^{+} + \pi^{-} \le 1$ (see Equations 3 and 4). Therefore

$$ RE(W, T) > 1 \tag{17} $$

for all $n \ge 3$. Hence the proposed test statistic W is relatively more efficient, and thus more powerful, than the Wilcoxon signed rank sum test statistic T for all sample sizes $n \ge 3$.
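The test of Equations 10 to 14 and the lower bound in Equation 16 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `paired_rank_test` and `relative_efficiency_bound` are hypothetical, and the scores used (14 ones, 6 zeros and 4 minus ones) are chosen to match the frequencies of Illustrative Example 2. The returned `p_value` is the upper-tail probability of the chi-square distribution with 1 degree of freedom, computed with the standard identity $P(\chi^{2}_{1} > x) = \mathrm{erfc}(\sqrt{x/2})$; half of it is the one-sided normal tail probability.

```python
import math


def paired_rank_test(u, theta0=0.0):
    """Chi-square test of H0: pi_plus - pi_minus = theta0,
    given the scores u_i in {-1, 0, 1} of Equation 2."""
    n = len(u)
    f_plus = sum(1 for v in u if v == 1)
    f_minus = sum(1 for v in u if v == -1)
    w = f_plus - f_minus                      # W, Equations 5 and 10
    var_w = (f_plus + f_minus) - w * w / n    # Equation 11
    if var_w == 0:
        # Happens when every pair differs in the same direction.
        raise ValueError("estimated Var(W) is zero; statistic undefined")
    chi2 = (w - n * theta0) ** 2 / var_w      # Equation 13
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 d.f.
    return {"W": w, "var_W": var_w, "chi2": chi2, "p_value": p_value}


def relative_efficiency_bound(n):
    """Lower bound (n + 1)(2n + 1) / 24 on RE(W, T), Equation 16."""
    return (n + 1) * (2 * n + 1) / 24


# Scores with the frequencies f+ = 14, f0 = 6, f- = 4.
u = [1] * 14 + [0] * 6 + [-1] * 4
result = paired_rank_test(u)
print(round(result["var_W"], 3), round(result["chi2"], 3))  # 13.833 7.229
print(relative_efficiency_bound(3) > 1, relative_efficiency_bound(2) > 1)  # True False
```

The guard on a zero variance estimate is a design choice of this sketch; the paper does not discuss that case.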
To illustrate the application of the proposed method to ordinal data, we here use the results, in letter grades, of a random sample of 18 students in two related courses they took during their first and second years of study in the university. The results are presented in Table 1. Also shown in Table 1 are the ranks assigned to these grades, their differences and the values of $u_i$ (Equation 2). Note that the differences in Table 1 are taken as $r_i = r_{i1} - r_{i2}$, so that a negative $u_i$ marks a student whose grade improved from Year 1 to Year 2.

Table 1: Letter Grades of a Random Sample of Students and Values of u_i

S/No  Grade in Year 1  Grade in Year 2  Rank of x_i1  Rank of x_i2  Difference            u_i
      (x_i1)           (x_i2)           (r_i1)        (r_i2)        r_i = r_i1 - r_i2     (Eqn 2)
1     E                C+               1             2             -1                    -1
2     C                C+               1             2             -1                    -1
3     B+               A+               1             2             -1                    -1
4     F                C                1             2             -1                    -1
5     A-               C-               2             1              1                     1
6     B                B+               1             2             -1                    -1
7     B-               F                2             1              1                     1
8     E                B+               1             2             -1                    -1
9     C                B+               1             2             -1                    -1
10    C-               C-               1.5           1.5            0                     0
11    C                C-               2             1              1                     1
12    A+               B+               2             1              1                     1
13    F                C-               1             2             -1                    -1
14    E                B+               1             2             -1                    -1
15    C                C-               2             1              1                     1
16    B                A                1             2             -1                    -1
17    B                A+               1             2             -1                    -1
18    A                A+               1             2             -1                    -1

From the last column of Table 1 we have $f^{+} = 5$, $f^{0} = 1$ and $f^{-} = 12$, so that from Equation 7 we have

$$ \hat{\pi}^{+} = \frac{5}{18} = 0.278; \quad \hat{\pi}^{0} = \frac{1}{18} = 0.056; \quad \hat{\pi}^{-} = \frac{12}{18} = 0.667 $$

Also $W = 5 - 12 = -7$, and from Equation 9 we have

$$ \mathrm{Var}(W) = (18)\left(0.278 + 0.667 - (0.278 - 0.667)^{2}\right) = (18)(0.794) = 14.292 $$

Hence the test statistic for the null hypothesis of Equation 12 (with $\theta_0 = 0$) is, from Equation 13,

$$ \chi^{2} = \frac{(-7)^{2}}{14.292} = 3.428 \quad (\text{P-value} = 0.0322) $$

which with 1 degree of freedom is statistically significant at $\alpha = 0.05$ when the P-value is read as one-sided, in line with the one-sided alternative of Equation 12 (the two-sided critical value $\chi^{2}_{0.95;\,1} = 3.841$ is not exceeded). This indicates that students may on the average have improved their performance in the two courses taken during their first and second years of study. Note that the ordinary sign test and the Wilcoxon signed rank sum test, which are based on numerical differences, cannot be applied to this type of data, which consists of ordinal letter grades.

We now further illustrate the method with ratio-type data.

Illustrative Example 2: In a study to compare the actual with the ideal family size of married women, a random sample of 24 married women was selected and asked to state the actual number of children they had and the ideal number of children they would like to have.
The results are as follows:

S/No  Actual  Ideal   r_i1  r_i2  r_i =          u_i      d_i =          |d_i|  Rank of
      x_i1    x_i2                r_i2 - r_i1    (Eqn 2)  x_i2 - x_i1           |d_i|
1     4       5       1     2     +1              1       +1             1      5
2     1       5       1     2     +1              1       +4             4      14
3     6       5       2     1     -1             -1       -1             1      5
4     1       6       1     2     +1              1       +5             5      16.5
5     7       5       2     1     -1             -1       -2             2      11
6     1       9       1     2     +1              1       +8             8      18
7     4       4       1.5   1.5    0              0        0             0
8     2       6       1     2     +1              1       +4             4      14
9     8       8       1.5   1.5    0              0        0             0
10    5       5       1.5   1.5    0              0        0             0
11    4       4       1.5   1.5    0              0        0             0
12    4       5       1     2     +1              1       +1             1      5
13    5       6       1     2     +1              1       +1             1      5
14    5       6       1     2     +1              1       +1             1      5
15    4       4       1.5   1.5    0              0        0             0
16    4       6       1     2     +1              1       +2             2      11
17    5       6       1     2     +1              1       +1             1      5
18    3       7       1     2     +1              1       +4             4      14
19    10      9       2     1     -1             -1       -1             1      5
20    5       5       1.5   1.5    0              0        0             0
21    6       4       2     1     -1             -1       -2             2      11
22    5       6       1     2     +1              1       +1             1      5
23    4       5       1     2     +1              1       +1             1      5
24    0       5       1     2     +1              1       +5             5      16.5

The number of +1s is $f^{+} = 14$, the number of 0s is $f^{0} = 6$ and the number of $-1$s is $f^{-} = 4$. Therefore

$$ \hat{\pi}^{+} = \frac{14}{24} = 0.583; \quad \hat{\pi}^{0} = \frac{6}{24} = 0.250; \quad \hat{\pi}^{-} = \frac{4}{24} = 0.167 $$

$$ W = f^{+} - f^{-} = 14 - 4 = 10 $$

$$ \mathrm{Var}(W) = 24(0.583 + 0.167) - \frac{(10)^{2}}{24} = 18 - 4.167 = 13.833 $$

$$ \chi^{2} = \frac{(10)^{2}}{13.833} = 7.229 \quad (\text{P-value} = 0.0036) $$

which with 1 degree of freedom is highly statistically significant, indicating that the actual and desired numbers of children differ significantly.

If we were to apply the ordinary sign test, the effective sample size to use would be n = 24 - 6 = 18, since there are altogether 6 tied observations in the data. Since the number of minus signs, which is 4, is less than the number of plus signs, which is 14, we calculate the probability of obtaining at most x = 4 minus signs out of a total of n = 18 signs (4 minus signs and 14 plus signs) under the null hypothesis of equal population medians ($\pi = 0.50$), obtaining

$$ P(X \le 4) = \sum_{x=0}^{4} \binom{18}{x} (0.50)^{18} = 0.01544 $$

which is less than $\alpha/2 = 0.025$. Hence we reject the null hypothesis at the 5% level of significance.

To apply the Wilcoxon signed rank sum test, we note from column 10 of the above table that the sum of the ranks assigned to the absolute values of the differences with negative signs is 5 + 11 + 5 + 11 = 32 = T. Now the expected value of T is

$$ E(T) = \frac{n(n+1)}{4} = \frac{18(19)}{4} = 85.5 $$

and the variance of T is

$$ \mathrm{Var}(T) = \frac{n(n+1)(2n+1)}{24} = \frac{18(19)(37)}{24} = 527.25 $$

so that

$$ \chi^{2} = \frac{(T - E(T))^{2}}{\mathrm{Var}(T)} = \frac{(32 - 85.5)^{2}}{527.25} = 5.429 \quad (\text{P-value} = 0.0099) $$

which is statistically significant.

Note that for the present illustrative example, the efficiency of the proposed test statistic W relative to Wilcoxon's signed rank sum test statistic T is

$$ RE(W, T) = \frac{\mathrm{Var}(T)}{\mathrm{Var}(W)} = \frac{527.25}{13.833} = 38.12 $$

showing that the proposed method is much more efficient than the Wilcoxon signed rank sum test, which is in turn more efficient than the ordinary sign test.

SUMMARY AND CONCLUSION

This paper developed a rank-based test statistic for testing the equality of two population medians.
Instead of taking the differences between pairs of observations and using these differences with the sign test, or taking the absolute values of these differences, assigning them ranks and applying the Wilcoxon signed rank sum test, one may first assign ranks to the members of each paired observation and then use these ranks to develop a rank-based test statistic for testing the equality of two population medians. From the results above, the proposed method is observed to be more efficient than the Wilcoxon signed rank sum test, which is in turn more efficient than the ordinary sign test, since it uses all available information. It also has an added advantage over them, since it enables the statistical comparison of the medians of two related populations whose measurements are on as low as the ordinal scale.

REFERENCES

1. Corder, G. W. and Foreman, D. I.: Non-Parametric Statistics for Non-Statisticians: A Step-by-Step Approach. Wiley, New Jersey, 2009.
2. Gibbons, J. D.: Non-Parametric Statistical Inference. McGraw-Hill, New York, 1971.
3. Gibbons, J. D.: Non-Parametric Statistics: An Introduction. Sage Publications, Newbury Park, 1993.
4. Hollander, M. and Wolfe, D. A.: Non-Parametric Statistical Methods (2nd Edition). Wiley Interscience, New York, 1999.
5. Lowry, Richard: Concepts and Applications of Inferential Statistics. Retrieved 24th March 2011.
6. Oyeka, C. A., Ebuh, G. U., Nwankwo, C. C., Obiora-Ilouno, H., Ibeakuzie, P. O., Utazi, C.: A Statistical Comparison of Test Scores: A Non-Parametric Approach. Journal of Mathematical Sciences, Vol. 21, No. 1 (2010), 77-87.
7. Siegel, S.: Non-Parametric Statistics for the Behavioural Sciences. McGraw-Hill Kogakusha, Ltd, Tokyo.
8. Wilcoxon, Frank: Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1 (6): 80-83, 1945.

How to Cite this Article: Oyeka, I. C. A., Umeh, E. U., "Statistical Analysis of Paired Sample Data by Ranks," Science Journal of Mathematics and Statistics, Volume 2012 (2012), Article ID sjms-102, 5 Pages, doi: 10.7237/sjms/102