Practical Statistics
Chi-Square Statistics

There are six statistics that will answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. Comparison of Means
5. Correlation
6. Regression

Chi-square:
Chi-square is a simple test for counts, which means nominal data and, in some cases, ordinal data.

Chi-square: There are three types:
1. Test for population variance
2. Test of “goodness-of-fit”
3. Contingency table analysis, which is essentially a measure of association!

1. Test for population variance

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

Where n = sample size, S^2 = sample variance, and \sigma^2 = hypothesized population variance.

2. Test of “goodness-of-fit”

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Where o = frequency of actual observation, and e = frequency you expected to find.

Coin thrown 100 times:
Expect (e): heads = 50, tails = 50
Observed (o): heads = 40, tails = 60
Is this a “fair” coin?

According to marketing research, the clientele of a Monkey Shine Restaurant is made up of 30% Western businessmen, 30% women who stop in while shopping, 30% Chinese businessmen, and 10% tourists. A random sample of 600 customers at the Kowloon Monkey Shine found 150 Western businessmen, 190 Chinese businessmen, 100 tourists, and 160 women who were shopping. Is the clientele at this establishment different from the norm for this company?

Type               Percent   Expected   Observed
Western Business     30%       180        150
Chinese Business     30%       180        190
Women Shoppers       30%       180        160
Tourist              10%        60        100
Total               100%       600        600

$$\chi^2 = \frac{(150-180)^2}{180} + \frac{(190-180)^2}{180} + \frac{(160-180)^2}{180} + \frac{(100-60)^2}{60}$$

= 5.00 + 0.56 + 2.22 + 26.67 = 34.45
With (4 - 1) = 3 degrees of freedom.

The chi-square distribution is highly skewed, and its shape depends on how many degrees of freedom (df) a problem has.

The chi-square for the restaurant problem was:
Chi-square = 34.45, df = 3
By looking in a table, the critical value of chi-square with df = 3 (at the 0.05 level) is 7.82.
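As a minimal sketch (not part of the original slides), the goodness-of-fit arithmetic for the restaurant and coin examples can be reproduced with Python's standard library. The function names `chi_square_gof` and `p_value_df3` are hypothetical, and the p-value helper is a closed form that holds only for df = 3. Carrying unrounded terms gives 34.44 for the restaurant; the 34.45 on the slide comes from adding the rounded terms.

```python
import math


def chi_square_gof(observed, expected):
    """Goodness-of-fit chi-square: sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df3(chi2):
    """Upper-tail probability for a chi-square with exactly 3 df.

    Assumption: this closed form is valid only for df = 3; it is not a
    general-purpose chi-square routine.
    """
    return (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))


# Restaurant example. Order: Western, Chinese, women shoppers, tourists.
shares = [0.30, 0.30, 0.30, 0.10]            # company-wide percentages
observed = [150, 190, 160, 100]              # Kowloon sample
n = sum(observed)                            # 600 customers
expected = [share * n for share in shares]   # 180, 180, 180, 60

restaurant_chi2 = chi_square_gof(observed, expected)
restaurant_p = p_value_df3(restaurant_chi2)

# Coin example: 100 throws, 40 heads and 60 tails against 50/50 expected.
coin_chi2 = chi_square_gof([40, 60], [50, 50])

print(f"restaurant: chi-square = {restaurant_chi2:.2f}, p < .001: {restaurant_p < 0.001}")
print(f"coin:       chi-square = {coin_chi2:.2f} (df = 1)")
```

For the coin, the statistic is compared against the tabled critical value for df = 1 (3.84 at the 0.05 level) in the same way the restaurant value is compared against 7.82.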
If the Kowloon clientele really followed the percentages found in the marketing research, the probability of obtaining a chi-square this large (34.45, df = 3) would be p < .001.
http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html

Looking at the individual terms, the largest contribution to chi-square clearly came from the tourists:

$$\chi^2 = \frac{(150-180)^2}{180} + \frac{(190-180)^2}{180} + \frac{(160-180)^2}{180} + \frac{(100-60)^2}{60}$$

= 5.00 + 0.56 + 2.22 + 26.67 = 34.45, df = 3

Hence, the Kowloon property is attracting more tourists than would be expected at a Monkey Shine.

3. Contingency table analysis

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Where o = frequency of actual observation, and e = frequency you expected to find.

A contingency table is a table with numbers grouped by frequency.

Consider a study: There are three groups: brand loyal customers, regular buyers, and occasional buyers. Each is asked if they like the taste of the new product over the old. They answer with a “yes” or a “no.”

A contingency table would look like this:

              YES    NO   Totals
Loyal          50    40     90
Regular        60    40    100
Occasional     40    40     80
Total         150   120    270

All the numbers in the table are “observed” frequencies (o). So, what are the expected values?

The expected values (e) are the frequencies we would see if the answers were spread across the groups purely in proportion to the totals, that is, if customer type and answer were unrelated. For each cell, they can be calculated by multiplying the row total by the column total and dividing by the total number of observations.
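The row-total times column-total rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than anything from the slides: the helper name `expected_counts` is hypothetical, and the table is entered as a list of rows in the order Loyal, Regular, Occasional with columns YES, NO.

```python
def expected_counts(table):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed taste-test counts; columns are YES, NO.
observed = [
    [50, 40],   # Loyal
    [60, 40],   # Regular
    [40, 40],   # Occasional
]

expected = expected_counts(observed)
for label, row in zip(["Loyal", "Regular", "Occasional"], expected):
    print(f"{label:<11}", [round(e, 1) for e in row])
```

Each row of expected counts sums to the same row total as the observed table, which is a quick check that the rule was applied correctly.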
Observed frequencies (o):

              YES    NO   Totals
Loyal          50    40     90
Regular        60    40    100
Occasional     40    40     80
Total         150   120    270

For example, the expected value (e) of “loyal” and “yes” would be (150 X 90)/270 = 50.
For example, the expected value (e) of “regular” and “no” would be (120 X 100)/270 = 44.4.

The expected values (e) for the entire table would be:

              YES     NO    Totals
Loyal         50.0   40.0     90
Regular       55.6   44.4    100
Occasional    44.4   35.6     80
Total        150    120      270

The chi-square value is calculated for every cell, and then summed over all the cells.

For Cell A (Loyal, YES): (50 - 50)^2/50 = 0
For Cell D (Regular, NO): (40 - 44.4)^2/44.4 = 0.44

Contribution of every cell:

              YES    NO
Loyal          0      0
Regular       .35    .44
Occasional    .44    .54

Chi-square = 0 + 0 + .35 + .44 + .44 + .54 = 1.77
The df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2

A chi-square with df = 2 has a critical value of 5.99 (at the 0.05 level). This chi-square = 1.77, so the results are nonsignificant.
http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html
The probability = 0.4127.

This means the observed differences are consistent with chance: there is no evidence of an association between customer type and taste preference.

Note: This type of chi-square is a test of association using nothing but counts (frequency); VERY useful in business research.
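The whole contingency-table test can be put together as a short sketch using only Python's standard library. The function name `chi_square_independence` is hypothetical, the critical value 5.99 is the one quoted above, and the p-value line uses the closed form exp(-chi2/2), which is valid only when df = 2. Because the code carries unrounded expected counts, it reports chi-square = 1.80 and p of about 0.407, slightly different from the 1.77 and 0.4127 obtained above with expected values rounded to one decimal; the conclusion is the same.

```python
import math


def chi_square_independence(table):
    """Return (chi-square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e                         # this cell's contribution

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Observed taste-test counts; rows are Loyal, Regular, Occasional; columns YES, NO.
observed = [[50, 40], [60, 40], [40, 40]]

chi2, df = chi_square_independence(observed)

CRITICAL_DF2 = 5.99                # tabled critical value for df = 2, 0.05 level
p_value = math.exp(-chi2 / 2)      # assumption: closed form valid only for df = 2
significant = chi2 > CRITICAL_DF2

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.4f}, significant: {significant}")
```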
Service Encounter and Personality

Normally, 60% of our shoppers are women. Is our sample representative?
Expected in a sample of 271:
0.6 X 271 = 163 women (rounded)
0.4 X 271 = 108 men (rounded)

Do men and women shop at different times?