Document 14246863

Course: D0722 - Statistics and Its Applications
Year: 2010
Nonparametric Statistics
Session 13
Learning Outcomes
• By the end of this session, students are expected to be able to:
1. apply nonparametric statistics: the sign test and the runs test
2. apply nonparametric statistics: rank tests
COMPLETE BUSINESS STATISTICS
5th edition
Nonparametric Tests
• Nonparametric tests: distribution-free methods making no assumptions about the population distribution
Types of tests
• Sign tests
  » Sign Test: Comparing paired observations
  » McNemar Test: Comparing qualitative variables
  » Cox and Stuart Test: Detecting trend
• Runs tests
  » Runs Test: Detecting randomness
  » Wald-Wolfowitz Test: Comparing two distributions
McGraw-Hill/Irwin
Aczel/Sounderpandian
© The McGraw-Hill Companies, Inc., 2002
Nonparametric Tests (Continued)
• Ranks tests
  » Mann-Whitney U Test: Comparing two populations
  » Wilcoxon Signed-Rank Test: Paired comparisons
  » Comparing several populations: ANOVA with ranks
    - Kruskal-Wallis Test
    - Friedman Test: Repeated measures
• Spearman Rank Correlation Coefficient
• Chi-Square Tests
  » Goodness of Fit
  » Testing for independence: Contingency Table Analysis
  » Equality of Proportions
14-2 Sign Test
• Comparing paired observations
  Paired observations: X and Y
  p = P(X > Y)
• Two-tailed test: H0: p = 0.50, H1: p ≠ 0.50
• Right-tailed test: H0: p ≤ 0.50, H1: p > 0.50
• Left-tailed test: H0: p ≥ 0.50, H1: p < 0.50
• Test statistic: T = Number of + signs
Sign Test Decision Rule
• Small Sample: Binomial Test
For a two-tailed test, find a critical point C1 whose cumulative binomial probability is as close as possible to α/2, and define C2 as n - C1. Reject the null hypothesis if T ≤ C1 or T ≥ C2.
For a right-tailed test, reject H0 if T ≥ C, where C is the value of the binomial distribution with parameters n and p = 0.50 such that the sum of the probabilities of all values greater than or equal to C is as close as possible to the chosen level of significance, α.
For a left-tailed test, reject H0 if T ≤ C, where C is defined in the same way using the sum of the probabilities of all values less than or equal to C.
Example
CEO   Before   After   Sign (After - Before)
 1      3        4       +
 2      5        5       0
 3      2        3       +
 4      2        4       +
 5      4        4       0
 6      2        3       +
 7      1        2       +
 8      5        4       -
 9      4        5       +
10      5        4       -
11      3        4       +
12      2        5       +
13      2        5       +
14      2        3       +
15      1        2       +
16      3        2       -
17      4        5       +
n = 15 (the two ties, CEOs 2 and 5, are dropped)
T = 12
α/2 ≈ 0.025
C1 = 3, C2 = 15 - 3 = 12
H0 rejected, since T ≥ C2
Cumulative Binomial Probabilities (n = 15, p = 0.5)
 x    F(x)
 0    0.00003
 1    0.00049
 2    0.00369
 3    0.01758   (C1)
 4    0.05923
 5    0.15088
 6    0.30362
 7    0.50000
 8    0.69638
 9    0.84912
10    0.94077
11    0.98242
12    0.99631
13    0.99951
14    0.99997
15    1.00000
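The sign test on the CEO data can be reproduced with a short script. This is a minimal sketch, not the textbook's procedure verbatim: the function names `binom_cdf` and `sign_test` are illustrative, and the tail target of 0.025 is an assumption taken from the example's critical point C1 = 3.

```python
from math import comb

def binom_cdf(x, n, p=0.5):
    """Cumulative probability P(T <= x) for T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def sign_test(before, after, tail_target=0.025):
    """Two-tailed sign test; ties are dropped before counting + signs."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    t = sum(1 for d in diffs if d > 0)  # T = number of + signs
    # C1 is the point whose cumulative probability is closest to the tail target
    c1 = min(range(n + 1), key=lambda x: abs(binom_cdf(x, n) - tail_target))
    c2 = n - c1
    return n, t, c1, c2, (t <= c1 or t >= c2)

# Data from the CEO example (Before and After scores for 17 CEOs)
before = [3, 5, 2, 2, 4, 2, 1, 5, 4, 5, 3, 2, 2, 2, 1, 3, 4]
after = [4, 5, 3, 4, 4, 3, 2, 4, 5, 4, 4, 5, 5, 3, 2, 2, 5]

n, t, c1, c2, reject = sign_test(before, after)
print(n, t, c1, c2, reject)
```

With these inputs the script finds n = 15 and T = 12, matching the slide, and the critical points agree with the cumulative binomial table.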
14-3 The Runs Test - A Test for Randomness
A run is a sequence of like elements that are preceded and followed by different elements or no element at all.
Case 1: S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E : R = 20 Apparently nonrandom
Case 2: SSSSSSSSSS|EEEEEEEEEE : R = 2 Apparently nonrandom
Case 3: S|EE|SS|EEE|S|E|SS|E|S|EE|SSS|E : R = 12 Perhaps random
A two-tailed hypothesis test for randomness:
H0: Observations are generated randomly
H1: Observations are not generated randomly
Test statistic: R = Number of runs
Reject H0 at level α if R ≤ C1 or R ≥ C2, as given in Table 8, with total tail probability P(R ≤ C1) + P(R ≥ C2) = α
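Counting runs is easy to automate. The sketch below uses the three cases from this slide; the helper name `count_runs` is illustrative.

```python
from itertools import groupby

def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))

# The three cases from the slide, with the run separators removed
case1 = "S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E|S|E".replace("|", "")
case2 = "SSSSSSSSSS|EEEEEEEEEE".replace("|", "")
case3 = "S|EE|SS|EEE|S|E|SS|E|S|EE|SSS|E".replace("|", "")

runs = [count_runs(c) for c in (case1, case2, case3)]
print(runs)
```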
Runs Test: Examples
Table 8: cumulative probabilities F(r) = P(R ≤ r)
(n1,n2)    Number of Runs (r)
           11     12     13     14     15     16     17     18     19     20
(10,10)    0.586  0.758  0.872  0.949  0.981  0.996  0.999  1.000  1.000  1.000
   .
   .
   .
Case 1: n1 = 10, n2 = 10, R = 20, p-value ≈ 0
Case 2: n1 = 10, n2 = 10, R = 2, p-value ≈ 0
Case 3: n1 = 10, n2 = 10, R = 12, p-value ≈ 2P[R ≥ 12] = 2[1 - F(11)] = (2)(1 - 0.586) = (2)(0.414) = 0.828
H0 not rejected
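The Table 8 entries can be checked against the exact distribution of the number of runs under randomness. The sketch below uses the standard combinatorial formula for that distribution, which is an assumption about how the table was built rather than something the slides state; `runs_pmf` and `runs_cdf` are illustrative names.

```python
from math import comb

def runs_pmf(r, n1, n2):
    """Exact P(R = r) for n1 symbols of one kind and n2 of the other, all orders equally likely."""
    if r < 2 or r > n1 + n2:
        return 0.0
    total = comb(n1 + n2, n1)
    k, odd = divmod(r, 2)
    if odd:
        # r = 2k + 1: one symbol type forms k + 1 runs, the other k runs
        ways = comb(n1 - 1, k) * comb(n2 - 1, k - 1) + comb(n1 - 1, k - 1) * comb(n2 - 1, k)
    else:
        # r = 2k: each symbol type forms k runs, and either type may start
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    return ways / total

def runs_cdf(r, n1, n2):
    """Cumulative probability F(r) = P(R <= r)."""
    return sum(runs_pmf(x, n1, n2) for x in range(2, r + 1))

f11 = runs_cdf(11, 10, 10)
p_value_case3 = 2 * (1 - f11)  # two-tailed p-value for R = 12
print(round(f11, 3), round(p_value_case3, 3))
```

The computed F(11) agrees with the tabulated 0.586, and the Case 3 p-value comes out at about 0.828.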
Ranks Tests
• Ranks tests
  » Mann-Whitney U Test: Comparing two populations
  » Wilcoxon Signed-Rank Test: Paired comparisons
  » Comparing several populations: ANOVA with ranks
    - Kruskal-Wallis Test
    - Friedman Test: Repeated measures
The Mann-Whitney U Test (Comparing Two Populations)
The null and alternative hypotheses:
H0: The distributions of two populations are identical
H1: The two population distributions are not identical
The Mann-Whitney U statistic:
$$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \qquad R_1 = \sum \text{Ranks from sample 1}$$
where n1 is the sample size from population 1 and n2 is the sample size from population 2.
$$E[U] = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
The large-sample test statistic:
$$z = \frac{U - E[U]}{\sigma_U}$$
The Mann-Whitney U Test: Example 14-4
Model   Time   Rank
A        35      5
A        38      8
A        40     10
A        42     12
A        41     11
A        36      6
B        29      2
B        27      1
B        30      3
B        33      4
B        39      9
B        37      7
Rank sum: Model A = 52, Model B = 26
$$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 = (6)(6) + \frac{(6)(6+1)}{2} - 52 = 5$$
Cumulative Distribution Function of the Mann-Whitney U Statistic (n1 = 6, n2 = 6)
u     P(U ≤ u)
.
.
.
4     0.0130
5     0.0206   (P(U ≤ 5))
6     0.0325
.
.
.
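Example 14-4 can be reproduced directly. The sketch below computes U from the timing data and then obtains the exact cumulative probability by enumerating every way of assigning 6 of the 12 ranks to sample 1; the enumeration is an illustrative substitute for the printed table, and the function names are assumptions. It also assumes no tied observations.

```python
from itertools import combinations

def mann_whitney_u(sample1, sample2):
    """U = n1*n2 + n1(n1+1)/2 - R1, assuming no tied observations."""
    pooled = sorted(sample1 + sample2)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank[v] for v in sample1)
    return n1 * n2 + n1 * (n1 + 1) // 2 - r1, r1

def exact_cdf(u, n1, n2):
    """P(U <= u) under H0, where every rank assignment is equally likely."""
    const = n1 * n2 + n1 * (n1 + 1) // 2
    hits = total = 0
    for subset in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if const - sum(subset) <= u:
            hits += 1
    return hits / total

model_a = [35, 38, 40, 42, 41, 36]
model_b = [29, 27, 30, 33, 39, 37]

u, r1 = mann_whitney_u(model_a, model_b)
p_left = exact_cdf(u, len(model_a), len(model_b))
print(u, r1, round(p_left, 4))
```

For larger samples full enumeration becomes expensive, which is where the large-sample z statistic is the practical choice.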
The Wilcoxon Signed-Ranks Test (Paired Ranks)
The null and alternative hypotheses:
H0: The median difference between populations 1 and 2 is zero
H1: The median difference between populations 1 and 2 is not zero
Find the difference between the observations in each pair, D = x1 - x2, and then rank the absolute values of the differences.
The Wilcoxon T statistic is the smaller of the sum of the positive ranks and the sum of the negative ranks:
$$T = \min\left[\sum(+), \sum(-)\right]$$
For small samples, a left-tailed test is used, using the values in Appendix C, Table 10.
$$E[T] = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
The large-sample test statistic:
$$z = \frac{T - E[T]}{\sigma_T}$$
Example
Store   Sold (1)   Sold (2)   D = x1-x2   ABS(D)   Rank ABS(D)   Rank (D>0)   Rank (D<0)
 1        56         40          16         16        9.0           9.0           0
 2        48         70         -22         22       12.0           0.0          12
 3       100         60          40         40       15.0          15.0           0
 4        85         70          15         15        8.0           8.0           0
 5        22          8          14         14        7.0           7.0           0
 6        44         40           4          4        2.0           2.0           0
 7        35         45         -10         10        6.0           0.0           6
 8        28          7          21         21       11.0          11.0           0
 9        52         60          -8          8        5.0           0.0           5
10        77         70           7          7        3.5           3.5           0
11        89         90          -1          1        1.0           0.0           1
12        10         10           0          *         *             *            *
13        65         85         -20         20       10.0           0.0          10
14        90         61          29         29       13.0          13.0           0
15        70         40          30         30       14.0          14.0           0
16        33         26           7          7        3.5           3.5           0
Sum:                                                                86           34
T = 34
n = 15
Critical points of T for n = 15:
P = 0.05: 30
P = 0.025: 25
P = 0.01: 20
P = 0.005: 16
H0 is not rejected (Note the arithmetic error in the text for store 13)
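The rank sums in this example can be recomputed with a small script. This is a sketch under the slide's conventions (zero differences dropped, tied absolute differences given their average rank); `average_ranks` and `wilcoxon_t` are illustrative names.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_t(x1, x2):
    """Return (T, sum of positive ranks, sum of negative ranks, n) with zero differences dropped."""
    d = [a - b for a, b in zip(x1, x2) if a != b]
    ranks = average_ranks([abs(v) for v in d])
    pos = sum(r for r, v in zip(ranks, d) if v > 0)
    neg = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(pos, neg), pos, neg, len(d)

sold1 = [56, 48, 100, 85, 22, 44, 35, 28, 52, 77, 89, 10, 65, 90, 70, 33]
sold2 = [40, 70, 60, 70, 8, 40, 45, 7, 60, 70, 90, 10, 85, 61, 40, 26]

t_stat, pos_sum, neg_sum, n_pairs = wilcoxon_t(sold1, sold2)
print(t_stat, pos_sum, neg_sum, n_pairs)
```

Since T = 34 exceeds the tabulated critical point of 30 at P = 0.05, the left-tailed small-sample rule does not reject H0.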
Example 14-7
Hourly Messages   Md0   D = x1-x2   ABS(D)   Rank ABS(D)   Rank (D>0)   Rank (D<0)
151               149       2          2        1.0           1.0          0.0
144               149      -5          5        2.0           0.0          2.0
123               149     -26         26       13.0           0.0         13.0
178               149      29         29       15.0          15.0          0.0
105               149     -44         44       23.0           0.0         23.0
112               149     -37         37       20.0           0.0         20.0
140               149      -9          9        4.0           0.0          4.0
167               149      18         18       10.0          10.0          0.0
177               149      28         28       14.0          14.0          0.0
185               149      36         36       19.0          19.0          0.0
129               149     -20         20       11.0           0.0         11.0
160               149      11         11        6.0           6.0          0.0
110               149     -39         39       21.0           0.0         21.0
170               149      21         21       12.0          12.0          0.0
198               149      49         49       25.0          25.0          0.0
165               149      16         16        8.0           8.0          0.0
109               149     -40         40       22.0           0.0         22.0
118               149     -31         31       16.5           0.0         16.5
155               149       6          6        3.0           3.0          0.0
102               149     -47         47       24.0           0.0         24.0
164               149      15         15        7.0           7.0          0.0
180               149      31         31       16.5          16.5          0.0
139               149     -10         10        5.0           0.0          5.0
166               149      17         17        9.0           9.0          0.0
182               149      33         33       18.0          18.0          0.0
Sum:                                                         163.5        161.5
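$$E[T] = \frac{n(n+1)}{4} = \frac{(25)(25+1)}{4} = 162.5$$
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{25(25+1)((2)(25)+1)}{24}} = \sqrt{\frac{33150}{24}} = 37.165$$
The large-sample test statistic:
$$z = \frac{T - E[T]}{\sigma_T} = \frac{163.5 - 162.5}{37.165} = 0.027$$
(The calculation uses the positive-rank sum, 163.5; the smaller sum, 161.5, gives z = -0.027 and the same conclusion.)
H0 cannot be rejected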
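The large-sample calculation is short enough to verify numerically. The sketch below plugs in the values from this example; the function name `wilcoxon_large_sample` is illustrative.

```python
from math import sqrt

def wilcoxon_large_sample(t, n):
    """Return (E[T], sigma_T, z) for the large-sample Wilcoxon signed-rank test."""
    expected = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return expected, sigma, (t - expected) / sigma

# Values from Example 14-7: n = 25 differences, rank sum 163.5
expected_t, sigma_t, z = wilcoxon_large_sample(163.5, 25)
print(expected_t, round(sigma_t, 3), round(z, 3))
```

A z value this close to zero is far inside any conventional two-tailed critical region boundary such as ±1.96.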
The Kruskal-Wallis Test - A Nonparametric Alternative to One-Way ANOVA
The Kruskal-Wallis hypothesis test:
H0: All k populations have the same distribution
H1: Not all k populations have the same distribution
The Kruskal-Wallis test statistic:
$$H = \frac{12}{n(n+1)} \left( \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right) - 3(n+1)$$
If each nj > 5, then H is approximately distributed as a χ² with k - 1 degrees of freedom.
Example: The Kruskal-Wallis Test
Software   Time   Rank
1           45     14
1           38     10
1           56     16
1           60     17
1           47     15
1           65     18
2           30      8
2           40     11
2           28      7
2           44     13
2           25      5
2           42     12
3           22      4
3           19      3
3           15      1
3           31      9
3           27      6
3           17      2

Group   Rank Sum
1        90
2        56
3        25

$$H = \frac{12}{n(n+1)} \left( \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right) - 3(n+1) = \frac{12}{18(18+1)} \left( \frac{90^2}{6} + \frac{56^2}{6} + \frac{25^2}{6} \right) - 3(18+1) = \left(\frac{12}{342}\right)\left(\frac{11861}{6}\right) - 57 \approx 12.3626$$
χ²(2, 0.005) = 10.5966, so H0 is rejected.
Further Analysis (Pairwise Comparisons of Average Ranks)
If the null hypothesis in the Kruskal-Wallis test is rejected, then we may wish, in addition, to compare each pair of populations to determine which are different and which are the same.
The pairwise comparison test statistic:
$$D = \left| \bar{R}_i - \bar{R}_j \right|$$
where $\bar{R}_i$ is the mean of the ranks of the observations from population i.
The critical point for the paired comparisons:
$$C_{KW} = \sqrt{ \chi^2_{\alpha,\,k-1} \left[ \frac{n(n+1)}{12} \right] \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }$$
Reject if D > C_KW
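Pairwise Comparisons: Example 14-8
Critical point:
$$C_{KW} = \sqrt{ \chi^2_{\alpha,\,k-1} \left[ \frac{n(n+1)}{12} \right] \left( \frac{1}{n_i} + \frac{1}{n_j} \right) } = \sqrt{ (9.21034) \left[ \frac{18(18+1)}{12} \right] \left( \frac{1}{6} + \frac{1}{6} \right) } = \sqrt{87.49823} = 9.35$$
Mean ranks:
$$\bar{R}_1 = \frac{90}{6} = 15, \qquad \bar{R}_2 = \frac{56}{6} = 9.33, \qquad \bar{R}_3 = \frac{25}{6} = 4.17$$
Pairwise differences:
D(1,2) = |15 - 9.33| = 5.67
D(1,3) = |15 - 4.17| = 10.83 *** (exceeds 9.35)
D(2,3) = |9.33 - 4.17| = 5.16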
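The pairwise comparisons can be scripted from the rank sums. In this sketch the chi-square critical value 9.21034 is taken from the slide (upper 0.01 point with 2 degrees of freedom) rather than computed, and `kw_pairwise` is an illustrative name.

```python
from itertools import combinations
from math import sqrt

def kw_pairwise(rank_sums, sizes, chi2_crit):
    """Compare mean ranks for every pair of groups against the critical point C_KW."""
    n = sum(sizes)
    mean_ranks = [r / m for r, m in zip(rank_sums, sizes)]
    results = {}
    for i, j in combinations(range(len(sizes)), 2):
        d = abs(mean_ranks[i] - mean_ranks[j])
        c_kw = sqrt(chi2_crit * (n * (n + 1) / 12) * (1 / sizes[i] + 1 / sizes[j]))
        results[(i + 1, j + 1)] = (d, c_kw, d > c_kw)
    return results

# Rank sums and group sizes from the Kruskal-Wallis example
results = kw_pairwise([90, 56, 25], [6, 6, 6], chi2_crit=9.21034)
for pair, (d, c_kw, significant) in results.items():
    print(pair, round(d, 2), round(c_kw, 2), significant)
```

Only the pair (1, 3) exceeds the critical point, matching the starred entry on the slide.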
The Friedman Test for a Randomized Block Design
The Friedman test is a nonparametric version of the randomized block design ANOVA. Sometimes this design is referred to as a two-way ANOVA with one item per cell because it is possible to view the blocks as one factor and the treatment levels as the other factor. The test is based on ranks.
The Friedman hypothesis test:
H0: The distributions of the k treatment populations are identical
H1: Not all k distributions are identical
The Friedman test statistic:
$$\chi^2 = \frac{12}{nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1)$$
where n is the number of blocks, k is the number of treatments, and R_j is the sum of the ranks for treatment j.
The degrees of freedom for the chi-square distribution are (k - 1).
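The slides give no worked example for the Friedman test, so the sketch below uses a small made-up data set purely for illustration: 4 blocks and 3 treatments, with values that are not from the course material. Ranks are assigned within each block, and ties within a block are assumed absent; `friedman_statistic` is an illustrative name.

```python
def friedman_statistic(blocks):
    """Return (chi-square statistic, treatment rank sums); rows are blocks, columns are treatments."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for row in blocks:
        # rank the k treatments within this block (assumes no ties in a block)
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return chi2, rank_sums

# Hypothetical measurements: 4 blocks (rows) by 3 treatments (columns)
blocks = [
    [10, 12, 15],
    [9, 14, 13],
    [11, 13, 16],
    [8, 10, 12],
]

chi2, rank_sums = friedman_statistic(blocks)
print(rank_sums, chi2)
```

The resulting statistic would be compared with a chi-square critical value on k - 1 = 2 degrees of freedom.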
The Spearman Rank Correlation Coefficient
The Spearman Rank Correlation Coefficient is the simple correlation coefficient calculated from variables converted to ranks from their original values.
The Spearman Rank Correlation Coefficient (assuming no ties):
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad \text{where } d_i = R(x_i) - R(y_i)$$
Null and alternative hypotheses:
H0: ρs = 0
H1: ρs ≠ 0
Critical values for small sample tests from Appendix C, Table 11
Large sample test statistic:
$$z = r_s \sqrt{n - 1}$$
Spearman Rank Correlation Coefficient: Example 14-11
MMI   S&P100   R-MMI   R-S&P   Diff   Diffsq
220    151       7       6       1      1
218    150       5       5       0      0
216    148       3       3       0      0
217    149       4       4       0      0
215    147       2       2       0      0
213    146       1       1       0      0
219    152       6       7      -1      1
236    165       9      10      -1      1
237    162      10       9       1      1
235    161       8       8       0      0
Sum:                                    4

Table 11: α = 0.005
 n    Critical value
 .
 7    -----
 8    0.881
 9    0.833
10    0.794
11    0.818
 .

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{(6)(4)}{(10)(10^2 - 1)} = 1 - \frac{24}{990} = 0.9758 > 0.794$$
H0 rejected
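Example 14-11 can be checked with a few lines of code. This sketch applies the no-ties formula from the slides to the MMI and S&P 100 values; `spearman_no_ties` is an illustrative name and the function is not suitable for data with tied values.

```python
def spearman_no_ties(x, y):
    """Spearman rank correlation via the no-ties formula; returns (r_s, sum of squared rank differences)."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1)), d2

mmi = [220, 218, 216, 217, 215, 213, 219, 236, 237, 235]
sp100 = [151, 150, 148, 149, 147, 146, 152, 165, 162, 161]

r_s, d2 = spearman_no_ties(mmi, sp100)
print(d2, round(r_s, 4))
```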
A Chi-Square Test for Goodness of Fit
Steps in a chi-square analysis:
• Formulate null and alternative hypotheses
• Compute frequencies of occurrence that would be expected if the null hypothesis were true - expected cell counts
• Note actual, observed cell counts
• Use differences between expected and actual cell counts to find the chi-square statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
• Compare the chi-square statistic with critical values from the chi-square distribution (with k-1 degrees of freedom) to test the null hypothesis
Example: Goodness-of-Fit Test for the Multinomial Distribution
The null and alternative hypotheses:
H0: The probabilities of occurrence of events E1, E2, ..., Ek are given by p1, p2, ..., pk
H1: The probabilities of the k events are not as specified in the null hypothesis
Assuming equal probabilities, p1 = p2 = p3 = p4 = 0.25 and n = 80
Preference      Tan   Brown   Maroon   Black   Total
Observed         12     40       8       20      80
Expected (np)    20     20      20       20      80
(O-E)            -8     20     -12        0       0
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(-8)^2}{20} + \frac{(20)^2}{20} + \frac{(-12)^2}{20} + \frac{(0)^2}{20} = 30.4 > \chi^2_{(0.01,\,3)} = 11.3449$$
H0 is rejected at the 0.01 level.
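The goodness-of-fit statistic for the color preference data can be computed as follows. This is a minimal sketch; `chi_square_gof` is an illustrative name, and the critical value 11.3449 is taken from the slide rather than computed.

```python
def chi_square_gof(observed, probs):
    """Return (chi-square statistic, expected counts) for hypothesized cell probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

observed = [12, 40, 8, 20]       # Tan, Brown, Maroon, Black
probs = [0.25, 0.25, 0.25, 0.25]  # equal preference under H0

stat, expected = chi_square_gof(observed, probs)
critical = 11.3449  # chi-square(0.01, 3) from the slide
print(round(stat, 1), expected, stat > critical)
```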
Contingency Table Analysis: A Chi-Square Test for Independence
Second Classification     First Classification Category
Category                  1     2     3     4     5     Row Total
1                         O11   O12   O13   O14   O15   R1
2                         O21   O22   O23   O24   O25   R2
3                         O31   O32   O33   O34   O35   R3
4                         O41   O42   O43   O44   O45   R4
5                         O51   O52   O53   O54   O55   R5
Column Total              C1    C2    C3    C4    C5    n
Contingency Table Analysis: A Chi-Square Test for Independence
A and B are independent if: P(A∩B) = P(A)P(B).
If the first and second classification categories are independent: Eij = (Ri)(Cj)/n
Null and alternative hypotheses:
H0: The two classification variables are independent of each other
H1: The two classification variables are not independent
Chi-square test statistic for independence:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Degrees of freedom: df = (r-1)(c-1)
Expected cell count:
$$E_{ij} = \frac{R_i C_j}{n}$$
Contingency Table Analysis: Example
              Industry Type
              Service             Nonservice          Total
Profit        42                  18                   60
(Expected)    (60*48/100)=28.8    (60*52/100)=31.2
Loss          6                   34                   40
(Expected)    (40*48/100)=19.2    (40*52/100)=20.8
Total         48                  52                  100

ij    O     E       O-E      (O-E)²    (O-E)²/E
11    42    28.8    13.2     174.24    6.0500
12    18    31.2   -13.2     174.24    5.5846
21     6    19.2   -13.2     174.24    9.0750
22    34    20.8    13.2     174.24    8.3769
                             χ²:       29.0865
χ²(0.01, (2-1)(2-1)) = 6.63490
H0 is rejected at the 0.01 level and it is concluded that the two variables are not independent.
Yates corrected χ² for a 2x2 table:
$$\chi^2 = \sum \frac{\left( \left| O_{ij} - E_{ij} \right| - 0.5 \right)^2}{E_{ij}}$$
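The independence test for the profit and industry data, with and without the Yates correction, can be sketched as follows. The function name `chi_square_independence` and its `yates` flag are illustrative; the Yates-corrected value is not given on the slide and is shown here only as the result of applying the formula above.

```python
def chi_square_independence(table, yates=False):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            diff = abs(observed - expected) - 0.5 if yates else observed - expected
            stat += diff ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Profit, Loss; columns: Service, Nonservice
table = [[42, 18], [6, 34]]

stat, df = chi_square_independence(table)
stat_yates, _ = chi_square_independence(table, yates=True)
print(round(stat, 4), df, round(stat_yates, 4))
```

Both versions of the statistic are well above the critical value 6.63490, so the conclusion does not depend on the continuity correction here.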
SUMMARY
Nonparametric statistics:
Sign test
Runs test
Rank tests
Kruskal-Wallis test