Course: D0722 - Statistika dan Aplikasinya (Statistics and Its Applications)
Year: 2010

Nonparametric Statistics
Session 13

Learning Outcomes
• By the end of this session, students are expected to be able to:
1. apply nonparametric statistics: the sign test and the runs test;
2. apply nonparametric statistics: rank tests.

COMPLETE BUSINESS STATISTICS, 5th edition. Aczel/Sounderpandian. McGraw-Hill/Irwin. © The McGraw-Hill Companies, Inc., 2002

Nonparametric Tests
• Nonparametric tests are distribution-free methods: they make no assumptions about the population distribution.
• Types of tests
• Sign tests
» Sign Test: comparing paired observations
» McNemar Test: comparing qualitative variables
» Cox and Stuart Test: detecting trend
• Runs tests
» Runs Test: detecting randomness
» Wald-Wolfowitz Test: comparing two distributions

Nonparametric Tests (Continued)
• Ranks tests
» Mann-Whitney U Test: comparing two populations
» Wilcoxon Signed-Rank Test: paired comparisons
» Comparing several populations: ANOVA with ranks
   • Kruskal-Wallis Test
   • Friedman Test: repeated measures
• Spearman Rank Correlation Coefficient
• Chi-Square Tests
» Goodness of Fit
» Testing for independence: Contingency Table Analysis
» Equality of Proportions

14-2 Sign Test
• Comparing paired observations
Paired observations: X and Y, with p = P(X > Y)
• Two-tailed test: H0: p = 0.50, H1: p ≠ 0.50
• Right-tailed test: H0: p ≤ 0.50, H1: p > 0.50
• Left-tailed test: H0: p ≥ 0.50, H1: p < 0.50
• Test statistic: T = number of + signs

Sign Test Decision Rule
• Small sample: binomial test
For a two-tailed test, find the critical point C1 whose cumulative binomial probability (parameters n and p = 0.50) is as close as possible to α/2, and define C2 = n − C1. Reject the null hypothesis if T ≤ C1 or T ≥ C2.
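The two-tailed small-sample rule can be sketched in Python with the standard library only. This is a minimal sketch, not the textbook's procedure verbatim: the helper names `binom_cdf`, `two_tailed_critical_points` and `sign_test_rejects` are hypothetical, and C1 is taken as the value whose cumulative probability F(C1) is nearest to α/2, which is how the rule above reads.

```python
from math import comb


def binom_cdf(x, n, p=0.5):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def two_tailed_critical_points(n, alpha=0.05):
    """Return (C1, C2): F(C1) is as close as possible to alpha/2, and C2 = n - C1."""
    target = alpha / 2
    c1 = min(range(n + 1), key=lambda c: abs(binom_cdf(c, n) - target))
    return c1, n - c1


def sign_test_rejects(t, n, alpha=0.05):
    """Two-tailed sign test: reject H0 (p = 0.50) if T <= C1 or T >= C2."""
    c1, c2 = two_tailed_critical_points(n, alpha)
    return t <= c1 or t >= c2


# n = 15 nonzero differences and T = 12 plus signs, as in the CEO example.
c1, c2 = two_tailed_critical_points(15, 0.05)
print(c1, c2, sign_test_rejects(12, 15, 0.05))
```

For n = 15 and α = 0.05 this gives C1 = 3 and C2 = 12, since F(3) = 0.01758 is closer to 0.025 than F(4) = 0.05923 is.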
For a right-tailed test, reject H0 if T ≥ C, where C is the value of the binomial distribution with parameters n and p = 0.50 such that the sum of the probabilities of all values greater than or equal to C is as close as possible to the chosen level of significance, α. For a left-tailed test, reject H0 if T ≤ C, where C is chosen so that the sum of the probabilities of all values less than or equal to C is as close as possible to α.

Example
CEO   Before   After   Sign
1     3        4       +
2     5        5       0
3     2        3       +
4     2        4       +
5     4        4       0
6     2        3       +
7     1        2       +
8     5        4       −
9     4        5       +
10    5        4       −
11    3        4       +
12    2        5       +
13    2        5       +
14    2        3       +
15    1        2       +
16    3        2       −
17    4        5       +

n = 15 (the two zero differences are dropped), T = 12
α/2 = 0.025: C1 = 3, C2 = 15 − 3 = 12
H0 is rejected, since T ≥ C2.

Cumulative Binomial Probabilities (n = 15, p = 0.5)
x    F(x)
0    0.00003
1    0.00049
2    0.00369
3    0.01758
4    0.05923
5    0.15088
6    0.30362
7    0.50000
8    0.69638
9    0.84912
10   0.94077
11   0.98242
12   0.99631
13   0.99951
14   0.99997
15   1.00000

14-3 The Runs Test: A Test for Randomness
A run is a sequence of like elements that is preceded and followed by different elements or by no element at all.
Case 1: S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E   R = 20   Apparently nonrandom
Case 2: SSSSSSSSSS|EEEEEEEEEE                       R = 2    Apparently nonrandom
Case 3: S|EE|SS|EEE|S|E|SS|E|S|EE|SSS|E              R = 12   Perhaps random

A two-tailed hypothesis test for randomness:
H0: Observations are generated randomly
H1: Observations are not generated randomly
Test statistic: R = number of runs
Reject H0 at level α if R ≤ C1 or R ≥ C2, as given in Table 8, with total tail probability P(R ≤ C1) + P(R ≥ C2) = α.

Runs Test: Examples
Table 8 (excerpt): cumulative probabilities F(r) = P(R ≤ r), where r is the number of runs
(n1, n2)   r = 11   12      13      14      15      16      17      18      19      20
(10, 10)   0.586    0.758   0.872   0.949   0.981   0.996   0.999   1.000   1.000   1.000

Case 1: n1 = 10, n2 = 10, R = 20: p-value ≈ 0
Case 2: n1 = 10, n2 = 10, R = 2: p-value ≈ 0
Case 3: n1 = 10, n2 = 10, R = 12: p-value = 2P(R ≥ 12) = 2[1 − F(11)] = (2)(1 − 0.586) = (2)(0.414) = 0.828, so H0 is not rejected.

Ranks Tests
• Mann-Whitney U Test: comparing two populations
• Wilcoxon Signed-Rank Test: paired comparisons
• Comparing several populations: ANOVA with ranks
   • Kruskal-Wallis Test
   • Friedman Test: repeated measures

The Mann-Whitney U Test (Comparing Two Populations)
The null and alternative hypotheses:
H0: The distributions of the two populations are identical
H1: The two population distributions are not identical
The Mann-Whitney U statistic:
$$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$$
where R1 is the sum of the ranks from sample 1, n1 is the sample size from population 1, and n2 is the sample size from population 2.
$$E[U] = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
The large-sample test statistic:
$$z = \frac{U - E[U]}{\sigma_U}$$

The Mann-Whitney U Test: Example 14-4
Model   Time   Rank
A       35     5
A       38     8
A       40     10
A       42     12
A       41     11
A       36     6
B       29     2
B       27     1
B       30     3
B       33     4
B       39     9
B       37     7

Rank sums: R1 = 52 (model A), 26 (model B)
$$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 = (6)(6) + \frac{(6)(6+1)}{2} - 52 = 5$$

Cumulative distribution function of the Mann-Whitney U statistic (n1 = 6, n2 = 6)
u     P(U ≤ u)
...
4     0.0130
5     0.0206
6     0.0325
...
P(U ≤ 5) = 0.0206
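The Example 14-4 computation can be reproduced with a short Python sketch. Assumptions: the helper names `average_ranks` and `mann_whitney_u` are hypothetical; tied values receive the average of their positions (the example has no ties); and z is included only to illustrate the large-sample formula, since with n1 = n2 = 6 the exact U table above is the appropriate reference.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..N in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based positions i..j -> ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (R1, U, z) with U = n1*n2 + n1(n1+1)/2 - R1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # the first n1 pooled values come from sample 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, u, (u - mean_u) / sd_u


model_a = [35, 38, 40, 42, 41, 36]
model_b = [29, 27, 30, 33, 39, 37]
r1, u, z = mann_whitney_u(model_a, model_b)
print(r1, u, round(z, 3))
```

The sketch returns R1 = 52 and U = 5, matching the slide.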
The Wilcoxon Signed-Ranks Test (Paired Ranks)
The null and alternative hypotheses:
H0: The median difference between populations 1 and 2 is zero
H1: The median difference between populations 1 and 2 is not zero
Find the difference for each pair, D = x1 − x2, and then rank the absolute values of the differences (pairs with D = 0 are dropped). The Wilcoxon T statistic is the smaller of the sum of the positive ranks and the sum of the negative ranks:
$$T = \min\left(\sum(+),\ \sum(-)\right)$$
For small samples, a left-tailed test is used, with the critical values in Appendix C, Table 10.
$$E[T] = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
The large-sample test statistic:
$$z = \frac{T - E[T]}{\sigma_T}$$

Example
Store   Sold (1)   Sold (2)   D = x1 − x2   |D|   Rank |D|   Rank (D > 0)   Rank (D < 0)
1       56         40         16            16    9.0        9.0            0
2       48         70         −22           22    12.0       0.0            12
3       100        60         40            40    15.0       15.0           0
4       85         70         15            15    8.0        8.0            0
5       22         8          14            14    7.0        7.0            0
6       44         40         4             4     2.0        2.0            0
7       35         45         −10           10    6.0        0.0            6
8       28         7          21            21    11.0       11.0           0
9       52         60         −8            8     5.0        0.0            5
10      77         70         7             7     3.5        3.5            0
11      89         90         −1            1     1.0        0.0            1
12      10         10         0             *     *          *              *
13      65         85         −20           20    10.0       0.0            10
14      90         61         29            29    13.0       13.0           0
15      70         40         30            30    14.0       14.0           0
16      33         26         7             7     3.5        3.5            0
Sum                                                          86             34

T = 34, n = 15
Critical values of T (n = 15): P = 0.05: 30; P = 0.025: 25; P = 0.01: 20; P = 0.005: 16
Since T = 34 exceeds even the P = 0.05 critical value of 30, H0 is not rejected. (Note the arithmetic error in the text for store 13.)

Example 14-7 (hypothesized median Md0 = 149)
Hourly messages   Md0   D = x1 − x2   |D|   Rank |D|   Rank (D > 0)   Rank (D < 0)
151               149   2             2     1.0        1.0            0.0
144               149   −5            5     2.0        0.0            2.0
123               149   −26           26    13.0       0.0            13.0
178               149   29            29    15.0       15.0           0.0
105               149   −44           44    23.0       0.0            23.0
112               149   −37           37    20.0       0.0            20.0
140               149   −9            9     4.0        0.0            4.0
167               149   18            18    10.0       10.0           0.0
177               149   28            28    14.0       14.0           0.0
185               149   36            36    19.0       19.0           0.0
129               149   −20           20    11.0       0.0            11.0
160               149   11            11    6.0        6.0            0.0
110               149   −39           39    21.0       0.0            21.0
170               149   21            21    12.0       12.0           0.0
198               149   49            49    25.0       25.0           0.0
165               149   16            16    8.0        8.0            0.0
109               149   −40           40    22.0       0.0            22.0
118               149   −31           31    16.5       0.0            16.5
155               149   6             6     3.0        3.0            0.0
102               149   −47           47    24.0       0.0            24.0
164               149   15            15    7.0        7.0            0.0
180               149   31            31    16.5       16.5           0.0
139               149   −10           10    5.0        0.0            5.0
166               149   17            17    9.0        9.0            0.0
182               149   33            33    18.0       18.0           0.0
Sum                                                    163.5          161.5

$$E[T] = \frac{n(n+1)}{4} = \frac{(25)(25+1)}{4} = 162.5$$
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{25(25+1)((2)(25)+1)}{24}} = \sqrt{\frac{33150}{24}} = 37.165$$
The large-sample test statistic:
$$z = \frac{T - E[T]}{\sigma_T} = \frac{163.5 - 162.5}{37.165} = 0.027$$
(Using the smaller sum, 161.5, gives z = −0.027 and the same conclusion.) H0 cannot be rejected.

The Kruskal-Wallis Test: A Nonparametric Alternative to One-Way ANOVA
The Kruskal-Wallis hypothesis test:
H0: All k populations have the same distribution
H1: Not all k populations have the same distribution
The Kruskal-Wallis test statistic:
$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1)$$
If each nj > 5, then H is approximately distributed as a χ² random variable with k − 1 degrees of freedom.

Example: The Kruskal-Wallis Test
Software   Time   Rank
1          45     14
1          38     10
1          56     16
1          60     17
1          47     15
1          65     18
2          30     8
2          40     11
2          28     7
2          44     13
2          25     5
2          42     12
3          22     4
3          19     3
3          15     1
3          31     9
3          27     6
3          17     2

Group   Rank sum
1       90
2       56
3       25

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) = \frac{12}{18(18+1)}\left(\frac{90^2}{6} + \frac{56^2}{6} + \frac{25^2}{6}\right) - 3(18+1) = \frac{12}{342}\left(\frac{11861}{6}\right) - 57 = 12.3625$$
χ²(2, 0.005) = 10.5966, so H0 is rejected.

Further Analysis (Pairwise Comparisons of Average Ranks)
If the null hypothesis in the Kruskal-Wallis test is rejected, we may also wish to compare each pair of populations to determine which are different and which are the same.
The pairwise comparison test statistic:
$$D = \left|\bar{R}_i - \bar{R}_j\right|$$
where $\bar{R}_i$ is the mean of the ranks of the observations from population i.
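The Kruskal-Wallis statistic H and the pairwise statistic D can be computed together. The Python sketch below uses the software-timing data from the example above. Assumptions: the helper names are hypothetical, ties would receive average ranks, and no tie correction is applied to H (the slide's formula has none, and these data contain no ties).

```python
from itertools import combinations


def average_ranks(values):
    """Ranks 1..N in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, rank sums R_j), with H = 12/(n(n+1)) * sum(R_j^2 / n_j) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # rank all observations together
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    h = 12 / (n * (n + 1)) * sum(r**2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return h, rank_sums


# Times for software 1, 2 and 3 from the example.
groups = [
    [45, 38, 56, 60, 47, 65],
    [30, 40, 28, 44, 25, 42],
    [22, 19, 15, 31, 27, 17],
]
h, rank_sums = kruskal_wallis(groups)
mean_ranks = [r / len(g) for r, g in zip(rank_sums, groups)]
# D = |mean rank i - mean rank j| for every pair of populations.
pairwise_d = {
    (i + 1, j + 1): abs(mean_ranks[i] - mean_ranks[j])
    for i, j in combinations(range(len(groups)), 2)
}
print(round(h, 4), rank_sums, {k: round(v, 2) for k, v in pairwise_d.items()})
```

This reproduces the rank sums 90, 56 and 25 and H ≈ 12.36. The exact value of D2,3 is 31/6 ≈ 5.17; the slide's 5.16 comes from subtracting the rounded mean ranks.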
The critical point for the pairwise comparisons:
$$C_{KW} = \sqrt{\chi^2_{\alpha,\,k-1}\,\frac{n(n+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
Reject if D > C_KW.

Pairwise Comparisons: Example 14-8
Critical point, with χ²(0.01, 2) = 9.21034:
$$C_{KW} = \sqrt{\chi^2_{\alpha,\,k-1}\,\frac{n(n+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} = \sqrt{(9.21034)\,\frac{18(18+1)}{12}\left(\frac{1}{6} + \frac{1}{6}\right)} = \sqrt{87.49823} = 9.35$$
$$\bar{R}_1 = \frac{90}{6} = 15, \qquad \bar{R}_2 = \frac{56}{6} = 9.33, \qquad \bar{R}_3 = \frac{25}{6} = 4.17$$
$$D_{1,2} = |15 - 9.33| = 5.67, \qquad D_{1,3} = |15 - 4.17| = 10.83^{***}, \qquad D_{2,3} = |9.33 - 4.17| = 5.16$$
Only D1,3 exceeds C_KW = 9.35, so populations 1 and 3 differ.

The Friedman Test for a Randomized Block Design
The Friedman test is a nonparametric version of the randomized block design ANOVA. This design is sometimes called a two-way ANOVA with one item per cell, because the blocks can be viewed as one factor and the treatment levels as the other. The test is based on ranks.
The Friedman hypothesis test:
H0: The distributions of the k treatment populations are identical
H1: Not all k distributions are identical
The Friedman test statistic:
$$\chi^2 = \frac{12}{nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1)$$
where n is the number of blocks and R_j is the sum of the ranks (assigned within each block) for treatment j. The chi-square distribution has k − 1 degrees of freedom.

The Spearman Rank Correlation Coefficient
The Spearman rank correlation coefficient is the simple correlation coefficient calculated from variables whose original values have been converted to ranks.
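The definition just given (the simple correlation coefficient applied to ranks) can be checked numerically. The Python sketch below, with hypothetical helper names, computes r_s both as the Pearson correlation of the ranks and with the no-ties shortcut 1 − 6Σd²/(n(n² − 1)), using the MMI and S&P100 data of Example 14-11. With no ties the two agree.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..N in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Simple (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman r_s as the Pearson correlation of the ranks (also valid with ties)."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_shortcut(x, y):
    """r_s = 1 - 6*sum(d_i^2) / (n(n^2 - 1)); exact only when there are no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n**2 - 1))


mmi = [220, 218, 216, 217, 215, 213, 219, 236, 237, 235]
sp100 = [151, 150, 148, 149, 147, 146, 152, 165, 162, 161]
r_rank = spearman(mmi, sp100)
r_short = spearman_shortcut(mmi, sp100)
print(round(r_rank, 4), round(r_short, 4))
```

Both functions give r_s ≈ 0.9758 for these data.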
The Spearman rank correlation coefficient (assuming no ties):
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = R(x_i) - R(y_i)$$
Null and alternative hypotheses:
H0: ρs = 0
H1: ρs ≠ 0
Critical values for small-sample tests are in Appendix C, Table 11.
Large-sample test statistic:
$$z = r_s\sqrt{n - 1}$$

Spearman Rank Correlation Coefficient: Example 14-11
MMI   S&P100   R-MMI   R-S&P   Diff   Diff²
220   151      7       6       1      1
218   150      5       5       0      0
216   148      3       3       0      0
217   149      4       4       0      0
215   147      2       2       0      0
213   146      1       1       0      0
219   152      6       7       −1     1
236   165      9       10      −1     1
237   162      10      9       1      1
235   161      8       8       0      0
Sum                                   4

Table 11 (excerpt), α = 0.005
n     critical value
7     (no value)
8     0.881
9     0.833
10    0.794
11    0.818
...

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{(6)(4)}{(10)(10^2 - 1)} = 1 - \frac{24}{990} = 0.9758 > 0.794$$
H0 is rejected.

A Chi-Square Test for Goodness of Fit
Steps in a chi-square analysis:
• Formulate the null and alternative hypotheses.
• Compute the frequencies of occurrence that would be expected if the null hypothesis were true (expected cell counts).
• Note the actual, observed cell counts.
• Use the differences between the expected and observed cell counts to find the chi-square statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
• Compare the chi-square statistic with critical values from the chi-square distribution (with k − 1 degrees of freedom) to test the null hypothesis.

Example: Goodness-of-Fit Test for the Multinomial Distribution
The null and alternative hypotheses:
H0: The probabilities of occurrence of events E1, E2, ..., Ek are given by p1, p2, ..., pk
H1: The probabilities of the k events are not as specified in the null hypothesis
Assuming equal probabilities, p1 = p2 = p3 = p4 = 0.25, and n = 80:
Preference      Tan   Brown   Maroon   Black   Total
Observed        12    40      8        20      80
Expected (np)   20    20      20       20      80
O − E           −8    20      −12      0       0

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(-8)^2}{20} + \frac{(20)^2}{20} + \frac{(-12)^2}{20} + \frac{(0)^2}{20} = 30.4 > \chi^2_{(0.01,\,3)} = 11.3449$$
H0 is rejected at the 0.01 level.

Contingency Table Analysis: A Chi-Square Test for Independence
                          First classification category
Second classification     1     2     3     4     5     Row total
category
1                         O11   O12   O13   O14   O15   R1
2                         O21   O22   O23   O24   O25   R2
3                         O31   O32   O33   O34   O35   R3
4                         O41   O42   O43   O44   O45   R4
5                         O51   O52   O53   O54   O55   R5
Column total              C1    C2    C3    C4    C5    n

Contingency Table Analysis: A Chi-Square Test for Independence
A and B are independent if P(A ∩ B) = P(A)P(B).
If the first and second classification categories are independent, the expected cell count is
$$E_{ij} = \frac{R_i C_j}{n}$$
Null and alternative hypotheses:
H0: The two classification variables are independent of each other
H1: The two classification variables are not independent
Chi-square test statistic for independence:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Degrees of freedom: df = (r − 1)(c − 1)

Contingency Table Analysis: Example
                 Industry type
                 Service               Nonservice            Total
Profit           42                    18                    60
  (Expected)     (60·48/100) = 28.8    (60·52/100) = 31.2
Loss             6                     34                    40
  (Expected)     (40·48/100) = 19.2    (40·52/100) = 20.8
Total            48                    52                    100

ij    O    E      O − E    (O − E)²   (O − E)²/E
11    42   28.8   13.2     174.24     6.0500
12    18   31.2   −13.2    174.24     5.5846
21    6    19.2   −13.2    174.24     9.0750
22    34   20.8   13.2     174.24     8.3769
χ² = 29.0865

χ²(0.01, (2−1)(2−1)) = 6.63490
H0 is rejected at the 0.01 level, and we conclude that the two variables are not independent.

Yates correction for a 2×2 table:
$$\chi^2 = \sum_{i}\sum_{j} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}$$

SUMMARY
Nonparametric statistics:
• Sign test
• Runs test
• Rank tests
• Kruskal-Wallis test
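Both chi-square examples in this session, the goodness-of-fit test and the 2 × 2 independence test (with and without the Yates correction), reduce to the same sum over cells. The Python sketch below uses hypothetical function names and computes only the statistics and degrees of freedom; critical values still come from a chi-square table.

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: sum of (O_i - E_i)^2 / E_i over the k categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table, yates=False):
    """Return (chi-square statistic, degrees of freedom (r - 1)(c - 1)).

    With yates=True each cell contributes (|O - E| - 0.5)^2 / E,
    the Yates correction for 2x2 tables.
    """
    expected = expected_counts(table)
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e) - 0.5 if yates else o - e
            stat += diff**2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Goodness of fit: Tan, Brown, Maroon, Black with equal probabilities, n = 80.
gof_stat = chi_square_gof([12, 40, 8, 20], [20, 20, 20, 20])

# Independence: rows Profit/Loss, columns Service/Nonservice.
profit_loss = [[42, 18], [6, 34]]
ind_stat, ind_df = chi_square_independence(profit_loss)
yates_stat, _ = chi_square_independence(profit_loss, yates=True)
print(round(gof_stat, 4), round(ind_stat, 4), ind_df, round(yates_stat, 4))
```

The sketch reproduces χ² = 30.4 and χ² = 29.0865 with df = 1. With the Yates correction the independence statistic falls to about 26.92, still well above χ²(0.01, 1) = 6.63490, so the conclusion is unchanged.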