Chi-Square Test of Independence

Definitions
• A chi-square test (denoted χ²) is used to determine whether there is an association between two categorical variables.
• In a chi-square test we test the null hypothesis H0 that there is no association between the categorical variables (they are independent) against the alternative hypothesis HA that there is an association between them (they are dependent).
• In health research, the chi-square test is frequently used to assess whether disease (present/absent) is associated with exposure (yes/no). For example: does smoking affect fracture risk?

Assumptions
The assumptions that must be met when using a chi-square test are:
1. Each observation must be independent.
2. Each participant is represented in the table only once.

Crosstabulations
• The data for chi-square tests are summarized using crosstabulations, with one variable down the left side and the other across the top.
• These tables are sometimes called frequency or contingency tables.
• Table 8.3 is called a 2 × 2 table because each variable has two levels (two rows and two columns). A table can have larger dimensions when either the exposure or the disease has more than two levels.
• In a contingency table, one variable (usually the exposure) forms the rows and the other variable (usually the disease) forms the columns, although the arrangement can be reversed.
• It is important to identify which variable is the outcome and which is the explanatory variable, so that the percentages displayed are the ones appropriate for answering the research question:
▪ Arrangement A (explanatory variable in the rows): use row percentages.
▪ Arrangement B (explanatory variable in the columns): use column percentages.

Crosstabulations Examples
• A group of sports psychologists, the Westchester Academy of Sports Psychologists (WASPS), decide to explore the relationship between exercise behavior and mood, and to investigate whether these factors vary between men and women. WASPS recruit 100 participants, 45 men and 55 women.

A.
Two variables example:
• Variable 1: mood status (depressed vs. not depressed).
• Variable 2: gender (male vs. female).
▪ This is an example of a 2 × 2 cross-tabulation (mood status vs. gender).
▪ The numbers within the cells of this table are called observed frequencies.
▪ For example, 21 males and 39 females are depressed.

B. Two variables example:
• Variable 1: mood status (depressed vs. not depressed).
• Variable 2: exercise frequency (frequent, infrequent and none).
▪ This is an example of a 2 × 3 cross-tabulation (mood status vs. exercise frequency) because there are two rows and three columns.

C. Three variables example:
• Variable 1: mood status (depressed vs. not depressed).
• Variable 2: exercise frequency (frequent, infrequent and none).
• Variable 3: gender (male vs. female).
▪ This is an example of a 2 × 2 × 3 cross-tabulation because there are two gender groups (male and female), two depression groups (depressed and not depressed) and three exercise frequency groups (frequent, infrequent and none).
▪ For example, 6 depressed males exercise frequently.

Expected Counts
• The observed count is the actual count in the sample and is shown in each cell of the crosstabulation.
• Expected counts are the frequencies that would appear in each cell of the crosstabulation if the null hypothesis were true (that is, if there were no association between the variables).
• The expected count is the value expected by chance alone and is calculated for each cell as:

$$\text{Expected count} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

• For the mood status vs. gender example (Example A: 45 men, 55 women, 60 depressed, 40 not depressed, 100 in total), the expected frequencies for the four cells are:
▪ Depressed male: $E_{11} = \frac{45 \times 60}{100} = 27$
▪ Depressed female: $E_{12} = \frac{55 \times 60}{100} = 33$
▪ Non-depressed male: $E_{21} = \frac{45 \times 40}{100} = 18$
▪ Non-depressed female: $E_{22} = \frac{55 \times 40}{100} = 22$
Placing the expected frequencies next to the observed ones gives a clearer picture of how far the observed data are from what independence would predict.

Calculating chi-square value
• The Pearson chi-square value is calculated by summing the following quantity over all cells:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

• Another way to phrase the null hypothesis for a chi-square test is that there is no significant difference between the observed frequencies and the expected frequencies.
• We reject the null hypothesis if the chi-square value calculated from the data exceeds the critical chi-square value from the table, or equivalently if the p-value < 0.05.
• If the observed and expected values are similar (the differences between them are small), then the chi-square value will be close to zero and therefore will not be significant.
• For the mood status vs. gender example (Example A), first find the expected count for each cell of the crosstabulation (27, 33, 18 and 22, as calculated above), then sum over the four cells:

$$\chi^2 = \frac{(21 - 27)^2}{27} + \frac{(39 - 33)^2}{33} + \frac{(24 - 18)^2}{18} + \frac{(16 - 22)^2}{22} = 6.061$$

Fisher Exact Test
• Fisher's exact test is a statistical test used to determine whether there is a non-random association between two categorical variables (similar to the chi-square test).
• It is usually used for smaller sample sizes.
• If more than 20% of the cells in the cross-tabulation have an expected frequency of less than 5, or if any expected cell count is less than 1, we should use Fisher's exact test (in SPSS this is available only for 2 × 2 tables).
• When you run a chi-square test in SPSS on a 2 × 2 table, SPSS automatically reports Fisher's exact test as well, so it is available to refer to when the expected cell counts are too low.

Fisher Exact Test Example
When we calculate the expected counts we get:
▪ E11 = 2.455
▪ E12 = 6.545
▪ E21 = 3.545
▪ E22 = 9.455
Two of the four cells (50%, which is more than 20%) have expected counts less than 5, so we cannot use the chi-square test and should use Fisher's exact test instead.

Odds Ratio (OR)
• The odds ratio (OR) is a measure of association between an exposure and an outcome.
• The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of the exposure.
• The odds of an event are defined as the likelihood that the event will occur, expressed relative to the likelihood that the event will not occur.
• If A is the probability of subjects being affected and B is the probability of subjects not being affected, then odds = A/B.
• OR > 1 indicates increased occurrence of the event in the presence of the exposure (exposure and outcome are associated).
• OR < 1 indicates decreased occurrence of the event in the presence of the exposure (exposure and outcome are associated).
• OR = 1 indicates that exposure and outcome are independent.

Odds Ratio (OR) Example

Risk factor/exposure    Depression present (case)    Depression absent (control)
With exercise           22                           23
No exercise             25                           7

In this example, we calculate the odds ratio from the 2 × 2 cross-tabulation as follows:

$$\text{OR} = \frac{\text{odds of depression among those who exercise}}{\text{odds of depression among those who do not exercise}} = \frac{22/23}{25/7} = \frac{22 \times 7}{23 \times 25} = 0.268$$

OR = 0.268 < 1, which means the odds of depression are lower among people who exercise (outcome and exposure are related).

Running Chi-square Test in SPSS
• Example: Describe the effect of smoking on fracture risk.
▪ Hypothesis:
H0: Smoking status is not associated with fracture risk (independent).
H1: Smoking status is associated with fracture risk (dependent).

1. Define your categorical variables (Smoker, Fractures) in Variable View -> Values.
2. Go to Analyze -> Descriptive Statistics -> Crosstabs.
3. Add your exposure variable (Smoker) to Column and your outcome variable (Fractures) to Row (the opposite arrangement also works).
4. Click Statistics and tick (✓) Chi-square.
5. Click Continue.
6. Click Cells and tick (✓) the options you need (for this example, the expected counts alongside the observed counts, and the column percentages; the exposure factor is in the columns, so column percentages are the appropriate ones).
7. Click Continue.
8. Tick (✓) Display clustered bar charts to represent your data graphically.
9. Click OK.

▪ Go to the Output window to view your analysis.
▪ In the Crosstabulation (1st table), the expected counts are close to the observed counts.
▪ In the Crosstabulation (1st table), 21% of non-smokers have fractures, whereas 16% of smokers do. If smoking increased fracture risk, we would expect the opposite pattern.
▪ The condition stated below the Chi-Square Tests table (2nd table) is met, so we use the chi-square test rather than Fisher's exact test.
▪ Pearson's chi-square p-value = 0.324 > 0.05, so we do not reject H0.
▪ There is no statistically significant association between smoking and fracture risk, so the data are consistent with the two being independent.

• To show data values on the graph, double-click the chart and the Chart Editor window will open.
• Go to Elements -> Show Data Labels.
• Close the Chart Editor window.
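
The hand calculations in these notes can also be cross-checked outside SPSS. The following is a minimal Python sketch, assuming only the standard library; it is not SPSS output. The helper names (expected_counts, pearson_chi_square, p_value_df1, needs_fisher, odds_ratio) are hypothetical and chosen only for illustration. The p-value shortcut applies to 2 × 2 tables only (1 degree of freedom), and the cell values 24 and 16 for non-depressed males and females are the ones used in the chi-square calculation for the mood status vs. gender example.

```python
import math

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def p_value_df1(chi2):
    """Upper-tail p-value, valid ONLY for 1 degree of freedom (a 2 x 2 table)."""
    return math.erfc(math.sqrt(chi2 / 2))

def needs_fisher(expected):
    """Rule from the notes: more than 20% of expected counts < 5, or any < 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 > 0.20 or any(e < 1 for e in cells)

def odds_ratio(table):
    """Odds ratio for [[a, b], [c, d]]: (a / b) / (c / d)."""
    (a, b), (c, d) = table
    return (a / b) / (c / d)

# Rows: depressed, not depressed. Columns: male, female.
mood_by_gender = [[21, 39], [24, 16]]
# Rows: with exercise, no exercise. Columns: depression present, absent.
exercise_by_depression = [[22, 23], [25, 7]]

print(expected_counts(mood_by_gender))        # [[27.0, 33.0], [18.0, 22.0]]
chi2 = pearson_chi_square(mood_by_gender)
print(round(chi2, 3))                         # 6.061
print(p_value_df1(chi2) < 0.05)               # True, so H0 is rejected here
print(needs_fisher([[2.455, 6.545], [3.545, 9.455]]))  # True
print(round(odds_ratio(exercise_by_depression), 3))    # 0.268
```

The counts behind the smoking and fracture output are not given in these notes, so that example is not reproduced here. With the real counts, the same functions would apply to that 2 × 2 table.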