Chi-Square Applications
liliksugiharti@yahoo.co.id
lilik for mm
Introduction
• You are the manager of The Sheraton Hotel Group.
Guests who are satisfied with the quality of service
during their stay are more likely to return on a future
vacation and to recommend the hotel to friends and
relatives. To assess the quality of the services provided
by your hotels, guests are encouraged to complete a
satisfaction survey when they check out. You need to
analyze the data from these surveys to determine the
overall satisfaction with the service provided, the
likelihood that guests will return to the hotel, and the
reasons some guests indicate that they will not return.
χ² test for the difference
between two proportions
• Compares the tallies or counts of categorical
responses between two independent groups, arranged in a
two-way cross-classification table
(contingency table)
• H0: there is no difference between the two
population proportions
– H0: p1 = p2
• H1: the two population proportions are not the same
– H1: p1 ≠ p2
Characteristics of
The Chi-Square Distribution
• It is never negative
• There is a family of chi-square
distributions
– The shape of the chi-square distribution does
not depend on the size of the sample, but on the
number of categories used (k)
• It is positively skewed
– As the number of degrees of freedom increases, the
distribution begins to approximate the normal
distribution
[Figure: CHI-SQUARE DISTRIBUTION. Curves for df = 3, df = 5, and df = 10; horizontal axis: χ²]
Chi-Square Test
• Compares several proportions (multinomial test)
• One of the nonparametric, or distribution-free, tests of hypothesis
• Data: nominal-scale or ordinal-scale
• The test statistic is:

\chi^2 = \sum_{\text{all cells}} \frac{(f_0 - f_e)^2}{f_e}

• The χ² test statistic is equal to the squared difference between the
observed and expected frequencies, divided by the expected
frequency in each cell of the table, summed over all cells
• f0 is the observed frequency in a particular cell of a contingency table
• fe is the theoretical or expected frequency in a particular cell if the null
hypothesis is true
                 Column variable (group)
Row variable     1          2          Totals
Successes        X1         X2         X
Failures         n1-X1      n2-X2      n-X
Totals           n1         n2         n
• X1 = number of successes in group 1
• X2 = number of successes in group 2
• n1-X1 = number of failures in group 1
• n2-X2 = number of failures in group 2
• X = X1+X2 is the total number of successes
• n-X = (n1-X1)+(n2-X2) is the total number of failures
• n1 = the sample size in group 1
• n2 = the sample size in group 2
• n = n1+n2 is the total sample size
Example: are you likely to choose
this hotel again?

                        Hotel
Choose hotel again?     Sheraton Lagoon   Sheraton Nusa Dua   Total
Yes                     163               154                 317
No                       64               108                 172
Total                   227               262                 489
Minitab output
Chi-Square Test: Lagoon, Nusa Dua
Expected counts are printed below observed counts

          Lagoon   Nusa Dua   Total
    1        163        154     317
          147.16     169.84

    2         64        108     172
           79.84      92.16

Total        227        262     489

Chi-Sq = 1.706 + 1.478 + 3.144 + 2.724 = 9.053
DF = 1, P-Value = 0.003
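The Minitab figures can be checked by hand. The sketch below is only an illustration in plain Python (not the Minitab procedure itself); it assumes the column totals 227 and 262 belong to Lagoon and Nusa Dua respectively, and it uses the fact that for 1 degree of freedom the upper-tail chi-square probability equals erfc(sqrt(x/2)).

```python
import math

# Observed counts from the hotel example (rows: yes/no; columns: Lagoon, Nusa Dua).
observed = [[163, 154],
            [64, 108]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell under H0: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (fo - fe)^2 / fe.
chi_sq = sum((fo - fe) ** 2 / fe
             for obs_row, exp_row in zip(observed, expected)
             for fo, fe in zip(obs_row, exp_row))

# With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print([[round(fe, 2) for fe in row] for row in expected])
print(round(chi_sq, 3), round(p_value, 3))
```

The printed expected counts, statistic, and p-value should agree with the Minitab output to the rounding shown there.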
Chi-Square Test (cont.)
• The chi-square test is used to:
– Test whether an observed set of frequencies
could have come from a hypothesized
population distribution
– Determine whether the sample observations
come from a particular distribution such as the
normal distribution
– Perform contingency table analysis, which tests
whether two traits or characteristics are related
(test of independence)
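Rejection and non-rejection
area
Reject H0 if χ² > χ²U
Otherwise do not reject H0
[Figure: chi-square distribution starting at 0. The area (1-α) to the left of the critical value χ²U is the region of non-rejection; the area α to its right is the region of rejection.]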
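As a rough illustration of the decision rule, the sketch below computes the critical value for the special case of 1 degree of freedom and applies the rule to the statistic from the hotel example. The significance level α = 0.05 and the helper names `critical_value_df1` and `decide` are assumptions made for this example; the slides do not state them.

```python
from statistics import NormalDist

def critical_value_df1(alpha):
    """Upper-tail chi-square critical value for 1 degree of freedom.

    A chi-square variable with 1 d.f. is the square of a standard normal
    variable, so the cut-off is the square of the two-tailed z cut-off.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2

def decide(chi_sq, critical):
    """Apply the rule: reject H0 if the statistic exceeds the critical value."""
    return "reject H0" if chi_sq > critical else "do not reject H0"

alpha = 0.05                      # assumed significance level
critical = critical_value_df1(alpha)
print(round(critical, 3))
print(decide(9.053, critical))    # 9.053 is the statistic from the hotel example
```

For other degrees of freedom the critical value would normally be read from a chi-square table or obtained from statistical software.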
• If the null hypothesis is true, the computed χ²
statistic should be close to zero because the squared
difference between what is actually observed in each cell,
f0, and what is theoretically expected, fe, would be very
small
• On the other hand, if H0 is false, and there are real
differences in the population proportions, the computed χ²
statistic is expected to be large. This is because the
difference between what is actually observed in each cell
and what is theoretically expected will be magnified
when the differences are squared
Goodness-of-Fit Test:
Equal Expected Frequencies
• The purpose of Goodness-of-Fit Test is to
compare an observed set of frequencies (fo)
to an expected set of frequencies (fe).
• H0: there is no difference between fo and fe
• H1: there is a difference between fo and fe
• The critical value is a chi-square value with (k
- 1) degrees of freedom, where k is the
number of categories
Example: Sales of Football Players' Jerseys

Player      Number sold (fo)   Expected number sold (fe)
Owen              13                    20
Ronaldo           33                    20
Nesta             14                    20
Dida               7                    20
Beckham           36                    20
Zidane            17                    20
TOTAL            120                   120
Cont.

Player      fo     fe    (fo - fe)   (fo - fe)²   (fo - fe)²/fe
Owen        13     20       -7           49           2.45
Ronaldo     33     20       13          169           8.45
Nesta       14     20       -6           36           1.80
Dida         7     20      -13          169           8.45
Beckham     36     20       16          256          12.80
Zidane      17     20       -3            9           0.45
TOTAL      120    120        0                       34.40 = χ²
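The jersey calculation can be reproduced in a few lines. This is a minimal sketch in plain Python, assuming equal expected frequencies (total sales divided evenly over the six players), as in the example.

```python
# Jerseys sold per player (fo), in the order listed in the table.
observed = [13, 33, 14, 7, 36, 17]

# Equal expected frequencies: total sales spread evenly over k categories.
k = len(observed)
expected = [sum(observed) / k] * k

# Contribution of each category: (fo - fe)^2 / fe.
contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi_sq = sum(contributions)
df = k - 1

print([round(c, 2) for c in contributions])
print(round(chi_sq, 2), df)
```

The slides do not state a significance level for this example. If α = 0.05 is assumed, the tabulated critical value for 5 degrees of freedom is 11.070, so a statistic of 34.40 would lead to rejecting H0.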
Goodness-of-Fit Test:
Unequal Expected Frequencies
• Example:
A lecturer expects the exam grades to be distributed as A
= 40%, B = 40%, and C = 20%. The exam results
show the following grade distribution:
A: 30 students   B: 20 students   C: 10 students
Test at the 10% level of significance whether this
grade distribution agrees with the lecturer's
expectation.
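One way to work this example is sketched below in plain Python. The expected frequencies are the stated proportions times the sample size of 60. Because there are exactly 2 degrees of freedom here, the upper-tail probability has the closed form exp(-x/2), which gives the critical value without a table; this shortcut applies only to d.f. = 2.

```python
import math

observed = {"A": 30, "B": 20, "C": 10}           # exam results
proportions = {"A": 0.40, "B": 0.40, "C": 0.20}  # lecturer's expected distribution

n = sum(observed.values())
expected = {grade: n * p for grade, p in proportions.items()}

chi_sq = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
df = len(observed) - 1

# For exactly 2 degrees of freedom, P(chi-square > x) = exp(-x / 2),
# so the critical value at level alpha is -2 * ln(alpha).
alpha = 0.10
critical = -2 * math.log(alpha)
p_value = math.exp(-chi_sq / 2)

print(expected)
print(round(chi_sq, 3), df, round(critical, 3), round(p_value, 3))
print("reject H0" if chi_sq > critical else "do not reject H0")
```

The statistic (about 2.5) is below the 10% critical value (about 4.605), so under these assumptions the observed grades are consistent with the lecturer's expected distribution.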
Limitations of Chi-Square
• If there are only two cells, the expected
frequency in each cell should be 5 or more
• For more than two cells, chi-square should
not be used if more than 20% of the
cells have an expected
frequency of less than 5.
Example

Level of Management          fo     fe
Foreman                      30     32
Supervisor                  110    113
Manager                      86     87
Middle Manager               23     24
Assistant vice president      5      2
Vice president                5      4
Senior vice president         4      1
TOTAL                       263    263
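Because three of the seven expected frequencies are below 5, the three vice-president levels are combined into a single category:

Level of Management          fo     fe
Foreman                      30     32
Supervisor                  110    113
Manager                      86     87
Middle Manager               23     24
Vice president               14      7
TOTAL                       263    263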
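The effect of combining categories can be illustrated with a short sketch. The helper names `share_of_small_cells` and `chi_square` are made up for this example; the data are the management-level frequencies from the two tables.

```python
def share_of_small_cells(expected, threshold=5):
    """Fraction of cells whose expected frequency is below the threshold."""
    return sum(fe < threshold for fe in expected) / len(expected)

def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Seven original levels of management.
fo_full = [30, 110, 86, 23, 5, 5, 4]
fe_full = [32, 113, 87, 24, 2, 4, 1]

# The three vice-president levels merged into one category.
fo_merged = fo_full[:4] + [sum(fo_full[4:])]
fe_merged = fe_full[:4] + [sum(fe_full[4:])]

print(round(share_of_small_cells(fe_full), 3), round(chi_square(fo_full, fe_full), 3))
print(round(share_of_small_cells(fe_merged), 3), round(chi_square(fo_merged, fe_merged), 3))
```

In the seven-category version, most of the statistic comes from the three cells with very small expected frequencies, which is why the 20% guideline calls for merging them before testing.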
Goodness-of-Fit Test for Normality
• Purpose: To test whether the observed
frequencies in a frequency distribution
match the theoretical normal distribution.
• Procedure:
– Determine the mean and standard deviation
of the frequency distribution.
– Compute the z-value for the lower class limit
and the upper class limit for each class.
– Determine fe for each category
– Use the chi-square goodness-of-fit test to
determine if fo coincides with fe.
EXAMPLE: Distribution of Salary

Salary ($000)   Frequency
20 – 30              4
30 – 40             20
40 – 50             41
50 – 60             44
60 – 70             29
70 – 80             16
80 – 90              2
90 – 100             4
TOTAL              160

μ = 54.03
σ = 13.76
Salary ($000)   z value            Area      fe
Under 30        under -1.75        0.0401     6.416
30 – 40         -1.75 to -1.02     0.1138    18.208
40 – 50         -1.02 to -0.29     0.2320    37.120
50 – 60         -0.29 to 0.43      0.2805    44.880
60 – 70         0.43 to 1.16       0.2106    33.696
70 – 80         1.16 to 1.89       0.0936    14.976
80 or more      over 1.89          0.0294     4.704
TOTAL                              1        160

z = \frac{x - \mu}{\sigma}
Calculation for Chi-Square

Salary ($000)   fo     fe        (fo - fe)   (fo - fe)²   (fo - fe)²/fe
Under 30          4     6.416     -2.416       5.837        0.910
30 – 40          20    18.208      1.792       3.211        0.176
40 – 50          41    37.120      3.880      15.054        0.406
50 – 60          44    44.880     -0.880       0.774        0.017
60 – 70          29    33.696     -4.696      22.052        0.654
70 – 80          16    14.976      1.024       1.049        0.070
80 or more        6     4.704      1.296       1.680        0.357
TOTAL           160   160                                   2.590 = χ²
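The normality procedure can be sketched with Python's standard library. This is an illustration only: it uses the mean 54.03 and standard deviation 13.76 given for the salary data and computes the normal areas exactly with `statistics.NormalDist`, whereas the slide rounds each z value to two decimals before looking up the area. The expected frequencies and the statistic therefore come out close to, but not identical to, the tabulated values.

```python
from statistics import NormalDist

mean, sd = 54.03, 13.76                   # values given for the salary data
fo = [4, 20, 41, 44, 29, 16, 6]           # observed counts, last two classes merged
upper_limits = [30, 40, 50, 60, 70, 80]   # upper class limits ($000); last class is open-ended
n = sum(fo)

normal = NormalDist(mean, sd)

# Cumulative probability at each class boundary, with 0 and 1 at the open ends.
cdf = [0.0] + [normal.cdf(x) for x in upper_limits] + [1.0]

# Expected frequency of a class = n * (area under the normal curve in that class).
fe = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(fo, fe))

print([round(e, 3) for e in fe])
print(round(chi_sq, 3))
```

Treating the first class as "under 30" and the last as "80 or more" makes the class areas sum to 1, so the expected frequencies sum to the sample size of 160.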
• If the mean and standard deviation of the
population are known and we wish to test
whether some sample data conform to the
normal distribution,
d.f. = k - 1
• If, on the other hand, the mean and standard
deviation of the population are not known and we
wish to test whether some sample data follow
the normal distribution,
d.f. = k – p – 1
(where p is the number of population
parameters estimated from the sample
data)
Contingency Table Analysis
• Contingency table analysis is used to test whether
two traits or variables are related.
→ Two-way classification table
• Each observation is classified according to two
variables.
• d.f. = (number of rows - 1)(number of columns - 1).
• The expected frequency (fe) is computed as:

f_e = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}

• Coefficient of contingency:

C = \sqrt{\frac{\chi^2}{\chi^2 + N}}
Example
A production manager investigates the defect rate of the
production machines. Observations of the goods
produced are as follows:

Condition    Machine 1   Machine 2   Machine 3
Defective        12          15           6
Good             88         105          74

Are the defects caused by the machines,
or merely by chance? Test at α = 0.05
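A possible way to work the machine example is sketched below in plain Python. It computes the expected frequencies as row total times column total divided by the grand total, then the chi-square statistic and the degrees of freedom. Since d.f. = 2 here, the critical value follows from the closed form -2 ln(α), a shortcut that holds only for 2 degrees of freedom. The coefficient of contingency is taken as C = sqrt(χ² / (χ² + N)), its standard definition.

```python
import math

# Rows: defective, good. Columns: machine 1, machine 2, machine 3.
observed = [[12, 15, 6],
            [88, 105, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fe = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum((fo - fe) ** 2 / fe
             for o_row, e_row in zip(observed, expected)
             for fo, fe in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# df is 2 here, where P(chi-square > x) = exp(-x / 2) holds exactly.
alpha = 0.05
critical = -2 * math.log(alpha)

# Coefficient of contingency: C = sqrt(chi^2 / (chi^2 + N)).
c_coef = math.sqrt(chi_sq / (chi_sq + n))

print(round(chi_sq, 3), df, round(critical, 3), round(c_coef, 3))
print("reject H0" if chi_sq > critical else "do not reject H0")
```

The statistic (about 1.379) is well below the critical value (about 5.991), so under these assumptions the data give no evidence that the defect rate depends on the machine.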
Example
A research institute investigates whether there is a relationship
between the type of newspaper read and
social group. The results are as
follows:

              Newspaper
Group         A      B      C
Upper       170    124     90
Middle      120    112    100
Lower       130     90     88

Test at α = 0.1