Chi-Square Applications
liliksugiharti@yahoo.co.id
lilik for mm

Introduction
• You are the manager of The Sheraton Hotel Group. Guests who are satisfied with the quality of service during their stay are more likely to return on a future vacation and to recommend the hotel to friends and relatives. To assess the quality of the services provided by your hotels, guests are encouraged to complete a satisfaction survey when they check out. You need to analyze the data from these surveys to determine overall satisfaction with the service provided, the likelihood that guests will return to the hotel, and the reasons some guests indicate that they will not return.

χ² test for the difference between two proportions
• Compares the tallies or counts of categorical responses between two independent groups, arranged in a two-way cross-classification table (contingency table)
• H0: there is no difference between the two population proportions
  – H0: p1 = p2
• H1: the two population proportions are not the same
  – H1: p1 ≠ p2

Characteristics of the Chi-Square Distribution
• It is never negative
• There is a family of chi-square distributions
  – The shape of the chi-square distribution does not depend on the size of the sample, but on the number of categories used (k)
• It is positively skewed
  – As the number of d.f.
increases, the distribution begins to approximate the normal distribution

[Figure 2-2: Chi-square distribution; curves for df = 3, df = 5, and df = 10; horizontal axis: χ²]

Chi-Square Test
• Compares several proportions (multinomial test)
• One of the nonparametric or distribution-free tests of hypothesis
• Data: nominal-scale or ordinal-scale
• The test statistic is:
  $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
• The χ² test statistic equals the squared difference between the observed and expected frequencies, divided by the expected frequency, summed over all cells of the table
• fo is the observed frequency in a particular cell of a contingency table
• fe is the theoretical or expected frequency in a particular cell if the null hypothesis is true

Row variable    Column variable (group)
                1          2          Totals
Successes       X1         X2         X
Failures        n1 − X1    n2 − X2    n − X
Totals          n1         n2         n

• X1 = number of successes in group 1
• X2 = number of successes in group 2
• n1 − X1 = number of failures in group 1
• n2 − X2 = number of failures in group 2
• X = X1 + X2 is the total number of successes
• n − X = (n1 − X1) + (n2 − X2) is the total number of failures
• n1 = the sample size in group 1
• n2 = the sample size in group 2
• n = n1 + n2 is the total sample size

Example: Are you likely to choose this hotel again?

Choose hotel again?    Sheraton Lagoon    Sheraton Nusa Dua    Total
Yes                    163                154                  317
No                     64                 108                  172
Total                  227                262                  489

Minitab output

Chi-Square Test: Lagoon, Nusa Dua
Expected counts are printed below observed counts

         Lagoon   Nusa Dua   Total
    1       163        154     317
         147.16     169.84
    2        64        108     172
          79.84      92.16
Total       227        262     489

Chi-Sq = 1.706 + 1.478 + 3.144 + 2.724 = 9.053
DF = 1, P-Value = 0.003
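As a rough check on the Minitab output above, the same χ² statistic can be recomputed from the 2×2 hotel table. The sketch below is plain Python, not Minitab's own procedure; the function name `chi_square_2x2` is hypothetical, and the p-value line relies on the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom.

```python
import math

# Observed counts from the hotel example ("Choose hotel again?").
# Rows: yes / no; columns: Lagoon / Nusa Dua. The cells are assigned so
# that the column totals are 227 and 262, as in the Minitab output.
observed = [[163, 154],
            [64, 108]]

def chi_square_2x2(table):
    """Return (chi2, expected) for a 2x2 table of counts (hypothetical helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under H0: row total x column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    return chi2, expected

chi2, expected = chi_square_2x2(observed)
# Valid for 1 degree of freedom only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), round(p_value, 3))
```

The printed statistic and p-value should agree with the Chi-Sq and P-Value lines of the Minitab output, so H0: p1 = p2 is rejected at the usual significance levels.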
• The chi-square test is used to:
  – Test whether an observed set of frequencies could have come from a hypothesized population distribution
  – Determine whether the sample observations come from a particular distribution, such as the normal distribution
  – Test whether two traits or characteristics are related (contingency table analysis, or test of independence)

Rejection and non-rejection areas
• Reject H0 if χ² > χ²U; otherwise do not reject H0
[Figure: chi-square distribution starting at 0; the area (1 − α) to the left of the critical value χ²U is the region of non-rejection, and the area α to its right is the region of rejection]

• If the null hypothesis is true, the computed χ² statistic should be close to zero, because the squared difference between what is actually observed in each cell (fo) and what is theoretically expected (fe) would be very small
• On the other hand, if H0 is false and there are real differences in the population proportions, the computed χ² statistic is expected to be large, because the differences between what is actually observed in each cell and what is theoretically expected are magnified when they are squared

Goodness-of-Fit Test: Equal Expected Frequencies
• The purpose of the goodness-of-fit test is to compare an observed set of frequencies (fo) to an expected set of frequencies (fe).
• H0: there is no difference between fo and fe
• H1: there is a difference between fo and fe
• The critical value is a chi-square value with (k − 1) degrees of freedom, where k is the number of categories

Example: Sales of Football Players' Jerseys

Player     Number sold (fo)    Expected number sold (fe)
Owen       13                  20
Ronaldo    33                  20
Nesta      14                  20
Dida       7                   20
Beckham    36                  20
Zidane     17                  20
TOTAL      120                 120
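The jersey sales table above can be turned into a χ² statistic with a few lines of Python. This is a minimal sketch under stated assumptions: the significance level α = 0.05 is an assumption (the example does not state one), and the critical value 11.070 is the tabulated χ² value for 5 degrees of freedom at that level.

```python
# Observed jersey sales (fo) from the table above; the null hypothesis
# of equal sales gives the same expected count (fe) for every player.
observed = [13, 33, 14, 7, 36, 17]
k = len(observed)
expected = [sum(observed) / k] * k   # 120 / 6 = 20 per player

# Chi-square statistic: sum of (fo - fe)^2 / fe over all categories.
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = k - 1

# Assumption: alpha = 0.05, for which the tabulated critical value
# with 5 degrees of freedom is 11.070.
critical_value = 11.070
reject_h0 = chi2 > critical_value
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the statistic is far above the critical value, the hypothesis of equal sales across players would be rejected under this assumed α.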
Jersey sales example (cont.)

Player     fo     fe     (fo − fe)    (fo − fe)²    (fo − fe)²/fe
Owen       13     20     −7           49            2.45
Ronaldo    33     20     13           169           8.45
Vieri      14     20     −6           36            1.80
Buffon     7      20     −13          169           8.45
Beckham    36     20     16           256           12.80
Zidane     17     20     −3           9             0.45
Total                    0                          χ² = 34.40

Goodness-of-Fit Test: Unequal Expected Frequencies
• Example: A lecturer expects the exam grades to be distributed as A = 40%, B = 40%, and C = 20%. The exam results show the following grade distribution:
  A: 30 students
  B: 20 students
  C: 10 students
  Test at the 10% level of significance whether this grade distribution matches the lecturer's expectation.

Limitations of Chi-Square
• If there are only two cells, the expected frequency in each cell should be 5 or more
• For more than two cells, chi-square should not be used if more than 20% of the cells have an expected frequency less than 5.

Example

Level of Management         fo     fe
Foreman                     30     32
Supervisor                  110    113
Manager                     86     87
Middle Manager              23     24
Assistant vice president    5      2
Vice president              5      4
Senior vice president       4      1
TOTAL                       263    263

Three of the seven cells have fe less than 5, which exceeds the 20% limit, so the three vice-president categories are combined:

Level of Management    fo     fe
Foreman                30     32
Supervisor             110    113
Manager                86     87
Middle Manager         23     24
Vice president         14     7
TOTAL                  263    263

Goodness-of-Fit Test for Normality
• Purpose: To test whether the observed frequencies in a frequency distribution match the theoretical normal distribution.
• Procedure:
  – Determine the mean and standard deviation of the frequency distribution.
  – Compute the z-value for the lower class limit and the upper class limit of each class.
  – Determine fe for each category.
  – Use the chi-square goodness-of-fit test to determine whether fo coincides with fe.
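The procedure above can be sketched in Python using the figures from the salary example in this deck (mean 54.03, standard deviation 13.76, n = 160, classes from "under 30" to "80 or more"). This is an illustrative sketch, not the deck's own calculation: `normal_cdf` is a hypothetical helper built on `math.erf`, and because it uses unrounded z-values instead of a printed z-table, the expected frequencies differ slightly from table-based values.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values taken from the salary example in this deck ($000).
mean, std_dev, n = 54.03, 13.76, 160
limits = [30, 40, 50, 60, 70, 80]        # boundaries between classes
observed = [4, 20, 41, 44, 29, 16, 6]    # last two classes combined

# Cumulative probabilities at each boundary, padded with 0 and 1 so the
# open-ended first and last classes are covered.
cum = [0.0] + [normal_cdf((x - mean) / std_dev) for x in limits] + [1.0]
areas = [hi - lo for lo, hi in zip(cum, cum[1:])]
expected = [n * a for a in areas]        # fe = area x n

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print([round(fe, 2) for fe in expected], round(chi2, 2))
```

The class areas sum to 1 by construction, so the expected frequencies always add up to n, which is a quick sanity check on the class boundaries.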
EXAMPLE: Distribution of Salary

Salary ($000)    Frequency
20 – 30          4
30 – 40          20
40 – 50          41
50 – 60          44
60 – 70          29
70 – 80          16
80 – 90          2
90 – 100         4
TOTAL            160

Mean = 54.03, standard deviation = 13.76

Salary ($000)    z value           Area      fe
Under 30         under −1.75       0.0401    6.416
30 – 40          −1.75 to −1.02    0.1138    18.208
40 – 50          −1.02 to −0.29    0.2320    37.120
50 – 60          −0.29 to 0.43     0.2805    44.880
60 – 70          0.43 to 1.16      0.2106    33.696
70 – 80          1.16 to 1.89      0.0936    14.976
80 or more       over 1.89         0.0294    4.704
TOTAL                              1.0000    160

fe = area × 160

Calculation for Chi-Square

Salary ($000)    fo     fe         (fo − fe)    (fo − fe)²    (fo − fe)²/fe
Under 30         4      6.416      −2.416       5.837         0.910
30 – 40          20     18.208     1.792        3.211         0.176
40 – 50          41     37.120     3.880        15.054        0.406
50 – 60          44     44.880     −0.880       0.774         0.017
60 – 70          29     33.696     −4.696       22.052        0.654
70 – 80          16     14.976     1.024        1.049         0.070
80 or more       6      4.704      1.296        1.680         0.357
TOTAL            160    160                                   χ² = 2.590

• Suppose we know the mean and standard deviation of the population and wish to find whether some sample data conform to the normal distribution: d.f. = k − 1
• On the other hand, if we do not know the mean and standard deviation of the population but wish to test whether some sample data follow the normal distribution: d.f. = k − p − 1 (where p is the number of population parameters estimated from the sample data)

Contingency Table Analysis
• Contingency table analysis is used to test whether two traits or variables are related, using a two-way classification table.
• Each observation is classified according to two variables.
• d.f. = (number of rows − 1)(number of columns − 1).
• The expected frequency (fe) is computed as:
  $f_e = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$
• Coefficient of contingency:
  $C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$

Example
A production manager studies the defect rate of the production machines. Observations of the goods produced are as follows:

Condition    Machine 1    Machine 2    Machine 3
Defective    12           15           6
Good         88           105          74

Are the defects caused by the machines, or are they merely due to chance?
Test at α = 0.05.

Example
A research institute studies whether there is a relationship between the type of newspaper read and the social group. The results are as follows:

Group:            Upper    Middle    Lower
Newspaper A:      170      120       130
Newspaper B, C:   124 90 112 100 90 88

Test at α = 0.1.
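The contingency table formulas can be illustrated with the machine-defect example given earlier (defective and good items for machines 1 to 3, tested at α = 0.05). The code below is a sketch in plain Python rather than a prescribed solution; the p-value line uses the identity P(χ² > x) = exp(−x/2), which is exact only for 2 degrees of freedom, as is the case for a 2×3 table.

```python
import math

# Machine-defect example: rows are condition (defective, good),
# columns are machines 1, 2 and 3.
observed = [[12, 15, 6],
            [88, 105, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fe = (row total x column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((fo - fe) ** 2 / fe
           for obs_row, exp_row in zip(observed, expected)
           for fo, fe in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for exactly 2 degrees of freedom: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
# Coefficient of contingency C = sqrt(chi2 / (chi2 + N)).
c = math.sqrt(chi2 / (chi2 + n))
print(round(chi2, 3), df, round(p_value, 3), round(c, 3))
```

Since the p-value is well above α = 0.05, the data give no evidence that the defect rate depends on the machine; the small coefficient of contingency points the same way.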