Fundamental Statistics in Applied Linguistics Research
Spring 2010
Weekend MA Program on Applied English
Dr. Da-Fu Huang

7. Finding group differences with Chi-Square when all variables are categorical

7.1 Two main uses of the chi-square test
- Test for goodness of fit of the data
- Test for group independence

7.2 Test for goodness of fit
- Only one categorical variable, with 2 or more levels
- Tests whether the observed frequencies match the frequencies expected if every level were equally likely
- Measures how good the fit is to the probabilities that we expect

Desired foreign language at one university (χ² = 8.2, df = 4, p ≈ .085)

Language    Frequency
Chinese        23
Spanish        20
French         15
German         13
Japanese       29

7.3 Test for group independence
- 2 or more categorical variables, each with 2 or more levels
- Tests whether there is any association between the variables

Desired foreign language at two universities

Language    HT U    BC U    Total
Chinese       23      14       37
Spanish       20      25       45
French        15      10       25
German        13      26       39
Japanese      29      25       54
Total        100     100      200

Observed and expected frequencies for the foreign language survey

Observed frequencies
         Chinese   Spanish   French   German   Japanese   Total
HT U        23        20       15       13        29       100
BC U        14        25       10       26        25       100
Total       37        45       25       39        54       200

Expected frequencies = (row total × column total) / grand total
         Chinese        Spanish        French         German         Japanese
HT U     (100×37)/200   (100×45)/200   (100×25)/200   (100×39)/200   (100×54)/200
BC U     (100×37)/200   (100×45)/200   (100×25)/200   (100×39)/200   (100×54)/200

         Chinese   Spanish   French   German   Japanese
HT U      18.5      22.5      12.5     19.5      27
BC U      18.5      22.5      12.5     19.5      27

χ² = Σ [(O − E)² / E] = 8.374, df = 4, p ≈ .079
df = number of levels − 1 for goodness of fit; df = (rows − 1) × (columns − 1) for group independence
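The two chi-square computations above can be checked by hand. The following is a minimal sketch in plain Python, not the SPSS procedure used in the course; the helper names (`chi2_sf`, `goodness_of_fit`, `independence`) are made up for this illustration, and the goodness-of-fit helper assumes equal expected frequencies across levels, as in section 7.2.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability (p-value) of a chi-square statistic x with df degrees of freedom."""
    half = x / 2.0
    if df % 2 == 0:
        # Closed-form series for even df
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite series
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def goodness_of_fit(observed):
    """One-way chi-square, assuming every level is equally likely."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

def independence(table):
    """Two-way chi-square on a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = (row total * column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df), expected

# Survey counts from the slides: Chinese, Spanish, French, German, Japanese
one_university = [23, 20, 15, 13, 29]
two_universities = [[23, 20, 15, 13, 29],   # HT U
                    [14, 25, 10, 26, 25]]   # BC U

gof_stat, gof_df, gof_p = goodness_of_fit(one_university)
ind_stat, ind_df, ind_p, ind_expected = independence(two_universities)
print(round(gof_stat, 3), gof_df, round(gof_p, 3))
print(round(ind_stat, 3), ind_df, round(ind_p, 3))
print(ind_expected[0])
```

Run on the survey counts, the sketch reproduces the χ² values of 8.2 and 8.374 with df = 4 in both cases.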
7.4 Situations that look like Chi-square but are not
- Scenario #1: Case study, only one participant (use the binomial test)
- Scenario #2: Binary choice, only one variable with exactly 2 levels (use the binomial test)
- Scenario #3: Matched pairs with categorical outcome (use the McNemar test)
- Scenario #4: Summary over a number of similar items by the same participants

Application activities (8.1.4): pp. 215-216

7.5 Data inspection: Tables and Crosstabs

7.5.1 Summary tables for goodness-of-fit data
Analyze > Descriptive Statistics > Frequencies

Student English proficiency
         Frequency   Percent   Valid Percent   Cumulative Percent
Low           9        16.7        16.7              16.7
Mid          26        48.1        48.1              64.8
High         19        35.2        35.2             100.0
Total        54       100.0       100.0

7.5.2 Summary tables for group-independence data (crosstabs)
Analyze > Descriptive Statistics > Crosstabs
Move variables into Row, Column, and Layer (when there are more than 2 variables)

Student English proficiency * Major1 Crosstabulation (Count)
                               Major1
Student English proficiency    non-English majors   English majors   Total
High                                    9                  4           13
Mid                                    25                  4           29
Low                                    12                  0           12
Total                                  46                  8           54

7.5.3 Bar plots with one and two categorical variables
Graphs > Legacy Dialogs > Bar
- With one variable, choose Simple, and Summaries for Groups of Cases
- With 2 variables, choose Clustered, and Summaries for Groups of Cases
- Put the variables in the “Category Axis” and “Define Clusters by” boxes

[Figure: bar plot with one categorical variable]
[Figure: bar plots with two categorical variables]

7.6 Assumptions of Chi-Square (pp. 226-228)
- Independence of observations (no repeated measures)
- Nominal data (no inherent rank or order)
- Normality of the data (in practice: an expected frequency of at least 5 in every cell)
- Non-occurrences must be included as well as occurrences

7.7 Chi-Square statistic test

7.7.1 One-way goodness-of-fit Chi-Square in SPSS
Analyze > Nonparametric Tests > Chi-Square
Put the variable in the “Test Variable List” box

Student English proficiency
         Observed N   Expected N   Residual
Low           9          18.0        -9.0
Mid          26          18.0         8.0
High         19          18.0         1.0
Total        54

Test Statistics
               Student English proficiency
Chi-Square     8.111 (a)
df             2
Asymp. Sig.    .017
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 18.0.

7.7.2 Two-way group-independence Chi-Square in SPSS
Analyze > Descriptive Statistics > Crosstabs
- Tick the “Display clustered bar charts” box for a bar plot
- Open Statistics and tick the “Chi-Square” and “Phi and Cramer’s V” boxes
- Open Cells and tick “Expected values” and all of the boxes under “Percentages”

Chi-Square statistic test (Two-way group-independence)

Test Statistics
               Student English proficiency   Students from different colleges
Chi-Square     8.111 (a)                     16.000 (b)
df             2                             4
Asymp. Sig.    .017                          .003
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 18.0.
b. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 10.8.

Chi-Square Tests
                                Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square              10.431 (a)    8   .236
Likelihood Ratio                10.737        8   .217
Linear-by-Linear Association     2.182        1   .140
N of Valid Cases                54
a. 11 cells (73.3%) have expected count less than 5. The minimum expected count is .33.

Notes on the output:
- Likelihood Ratio: an alternative to the Pearson Chi-Square. It should be equivalent to the Chi-Square when sample sizes are large.
- Linear-by-Linear Association: assumes that the variables are ordinal. Report this if your variables have inherent rank.

Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi          .440    .236
                     Cramer's V   .311    .236
N of Valid Cases                  54

Measures of effect size for the chi-square
- Phi: for 2 × 2 contingency tables (2 levels per variable)
- Cramer’s V: for tables larger than 2 × 2 (more than 2 levels per variable)
- w = phi for 2 × 2 tables; w = V × √(r − 1) for more than 2 levels (V = Cramer’s V; r = the number of rows or columns, whichever is smaller)
- Odds ratio = (N11 × N22) / (N12 × N21), with table subscripts:
    N11   N12
    N21   N22

Reporting Chi-square test results
- Contingency table with a summary of data and statistical results
- Chi-square value
- df
- p-value
- Effect size (for the test for group independence): phi, Cramer’s V, w, or odds ratio

Example reporting (p. 239)

Contingency table (2 × 3)
Student English proficiency * Major1 Crosstabulation (Count)
                               Major1
Student English proficiency    non-English majors   English majors   Total
High                                    9                  4           13
Mid                                    25                  4           29
Low                                    12                  0           12
Total                                  46                  8           54

Application activities 8.5.3 (p. 234) with Chi-Square in SPSS
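The effect-size formulas above (phi, Cramer's V, w, odds ratio) can be sketched in a few lines of Python. This is an illustration rather than SPSS output: the function names are made up for this sketch, the 3 × 5 table shape is an assumption inferred from the reported df of 8 and the 15 cells in the footnote, and the odds ratio is shown on the High and Mid rows of the proficiency-by-major crosstab only because an odds ratio needs a 2 × 2 table.

```python
import math

def phi_coefficient(chi2, n):
    """Phi = sqrt(chi-square / N)."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (r - 1))), r = smaller table dimension."""
    r = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (r - 1)))

def effect_size_w(v, n_rows, n_cols):
    """w = V * sqrt(r - 1), r = smaller table dimension."""
    return v * math.sqrt(min(n_rows, n_cols) - 1)

def odds_ratio(n11, n12, n21, n22):
    """Odds ratio for a 2 x 2 table = (N11 * N22) / (N12 * N21)."""
    return (n11 * n22) / (n12 * n21)

# Pearson chi-square and N from the SPSS output above;
# the 3 x 5 table shape is an assumption (df = 8 = (3 - 1) * (5 - 1)).
chi2, n = 10.431, 54
phi = phi_coefficient(chi2, n)
v = cramers_v(chi2, n, 3, 5)
w = effect_size_w(v, 3, 5)

# Illustration only: High and Mid rows of the proficiency-by-major crosstab
high_vs_mid = odds_ratio(9, 4, 25, 4)

print(round(phi, 3), round(v, 3), round(w, 3), round(high_vs_mid, 2))
```

With these inputs the sketch agrees with the Symmetric Measures table (phi .440, Cramer's V .311), and w comes out equal to phi, as expected from w = V × √(r − 1).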