Week 7 Chi-square

READING
Text reading: Dancey and Reidy. In the 3rd edition it is Chapter 8; in the 2007 fourth edition it is Chapter 9.

CHAPTER OVERVIEW
Earlier, you learned how to analyse the relationship between two continuous variables using Pearson's r. You also learned what was meant by a correlation coefficient, and that r is a measure of effect size. You have seen how to represent such relationships on scatterplots. In this chapter, we discuss the relationship between categorical variables. Chi-square (χ²) is a measure of the association between two categorical variables.

You learned about categorical variables in Chapter 1. If, for instance, we classify people into groups based on gender (i.e., males vs. females) and the type of high school they attended (public vs. private), these are examples of categorical variables. In the same way, if we classify people by ethnic group, religion, or the country in which they live, these are all categorical judgments.

In this chapter, then, you will learn how to:
- analyse the association between categorical variables
- report another measure of effect (Cramer's V)
- report the results of such analyses.

The same advice as given before applies here. It is important to get a general idea of how this statistic might be useful to you in the future. It is not important to try to memorise everything in the textbook. When you need to do this in real life, that is when you need the book.

One final point: quite often (in our experience with graduate students) we compute chi-square from raw data tallies without using SPSS. There are several Internet sites that we use when we do this. So if you ever find yourself doing chi-squares, email Greg Yates, who can show you several other quick ways to compute directly from tallied data.

KEY TERMS
1- Frequency counts
2- Categorical variables
3- Pearson’s chi-square
4- Phi
5- Cramer’s V
6- Lambda

KEY POINTS
1- Categorical variables take a value that is one of several possible categories.
2- Categorical variables have no numerical meaning. Examples of categorical variables include hair color, gender, field of study, college attended, and political affiliation.
3- In a study asking respondents to identify their field of study as physical education or early childhood education, each respondent will answer with exactly one of these categories. These are the values the variable takes. Physical education is not a variable, but field of study is.
4- Categorical variables are often disguised as quantitative variables. For example, one might record gender coded as male = 0 and female = 1.
5- With frequency counts, the data are not scores; they are the number of participants who fall into a certain category. χ² is particularly appropriate for frequency count data.
6- χ² is a measure of association or relationship between categorical variables, developed by Karl Pearson in 1900. It enables us to see whether the frequency counts we obtain when asking participants which category they are in are significantly different from the frequency counts we would expect by chance.
7- One-variable χ² (goodness-of-fit test) is used when we have one variable only.
8- Independence χ² (2x2) is used when we are looking for an association between two variables, each with two categories. For example, the relationship between gender (males vs. females) and field of study (physical education vs. early childhood education).
9- Independence χ² (r x 2) is used when we are looking for an association between two variables, where one variable has two categories, such as gender (males vs. females), and the other variable has more than two categories, such as field of study (physical education vs. early childhood education vs. curriculum design).
10- A χ² probability of .05 or less is commonly interpreted by social scientists as justification for rejecting the null hypothesis that the row variable is unrelated (that is, only randomly related) to the column variable. Put bluntly, we conclude that there is a significant relationship between the two categorical variables.
11- Phi and Cramer’s V are measures of the strength of association between two categorical variables. Phi is used with 2x2 contingency tables, where each variable has only two categories. If one of the two categorical variables contains more than two categories, then Cramer’s V is preferred to Phi, because in larger tables Phi is no longer bounded by 1 and can exceed it.
12- Lambda measures the proportional reduction in error that is achieved when membership of a category of one variable is used to predict category membership on the other variable. A lambda value of 1 means that one variable perfectly predicts the other, whereas a lambda value of 0 indicates that one variable does not predict the other.

Web activities
http://www.ruf.rice.edu/~lane/stat_sim/contingency/index.html
This site supposedly provides a simulation of contingency tables, of which the chi-square test is one type. To be honest, we have played with it, but did not feel it helped us much.

Chi-square calculators are available on the Internet. Here are two of them, which might be useful sites if you need to use chi-square in your projects (Note: these pages are calculators, not demos, but you can play with them by inventing data to pop into the cells).
http://www.quantitativeskills.com/sisa/statistics/twoby2.htm
http://statpages.org/ctab2x2.html

ACTIVE LEARNING AND OPPORTUNITIES
Multiple Choice Questions

1- A one-variable χ² is also called
A- Goodness-of-fit test
B- χ² test of independence
C- χ² 4x2
D- χ² 2x2

2- The value of χ² will always be
A- Positive
B- Negative
C- High
D- It depends

3- The Yates’ correction is sometimes used by researchers when:
A- Cell sizes are huge
B- Cell sizes are small
C- They analyse data from 2 x 2 contingency tables
D- Both b and c

4- If you are performing a 4 x 4 χ² analysis and find you have violated the assumptions, then you need to:
A- Look at the results for a Fisher’s exact probability test
B- Look to see whether it is possible to collapse categories
C- Investigate the possibility of a t-test
D- Disregard the assumption that you have violated.

5- Which of the statements below is false of chi-square testing?
A- Chi-square tests can be used to check how well a theoretical model fits the data.
B- Chi-square can be applied to continuous variables; it just means that a larger contingency table is needed.
C- Chi-square is used in research to measure the association between two categorical variables.

The questions after the tables are based on the output shown in the following tables.

gender * academic background Crosstabulation
Count

                  academic background
gender            0        1        Total
0                 28       17       45
1                 33       22       55
Total             61       39       100

Chi-Square Tests

                               Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             15.300(b)   1    .001
Continuity Correction(a)       15.300      1    .001
Likelihood Ratio               15.300      1    .001
Fisher's Exact Test                                           .001         .000
Linear-by-Linear Association   15.300      1    .001
N of Valid Cases               100

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 17.55.

Symmetric Measures

                                     Value    Approx. Sig.
Nominal by Nominal   Phi             .366     .010
                     Cramer's V      .366     .010
N of Valid Cases                     100

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

1- How many students are in the total sample of the analysis?
2- Has the analysis violated the assumption about expected counts of less than 5? Explain.
3- Is there a significant relationship between gender and academic background? Explain.
4- What is the strength of the relationship between gender and academic background, if any? Is it significant? Why?
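
Computing chi-square from tallied data
The key points say that χ² compares the frequency counts we obtain with the counts we would expect by chance, and that Cramer's V reports the strength of the association. As a rough illustration of how this can be done directly from a table of tallies without SPSS, here is a minimal sketch in Python. The function name chi_square_from_tallies and the tallies are invented for illustration; they are not the data in the output above, and this is not the method used by the web calculators listed earlier. The sketch stops at the χ² value, degrees of freedom, minimum expected count and Cramer's V; turning χ² into a probability still needs a chi-square table or a statistics package.

```python
from math import sqrt


def chi_square_from_tallies(table):
    """Pearson chi-square for a table of frequency counts (list of rows).

    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count for each cell = row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-square = sum over all cells of (observed - expected)^2 / expected
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Cramer's V = sqrt(chi2 / (N * (k - 1))), where k is the smaller of the
    # number of rows and columns. For a 2x2 table this equals Phi.
    k = min(len(row_totals), len(col_totals))
    cramers_v = sqrt(chi2 / (n * (k - 1)))

    return {
        "chi2": chi2,
        "df": df,
        "min_expected": min(min(row) for row in expected),
        "cramers_v": cramers_v,
    }


# Invented tallies: rows are the two categories of one variable,
# columns are the two categories of the other.
tallies = [[30, 10],
           [20, 40]]

result = chi_square_from_tallies(tallies)
print(result)
```

The min_expected value is there because of the usual assumption check (expected counts of less than 5); if it is small, the χ² value should be treated with caution.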