Uploaded by Mariam Khalil

Chapter 6 Chi-Square

Chi-Square Test of Independence
Definitions
• A chi-square test (denoted χ²) is used to determine whether there is an association between two categorical variables.
• In a chi-square test we test the null hypothesis 𝐻0 that there is no association between the categorical variables (independence) against the alternative 𝐻𝐴 that there is an association between them (dependence).
• In health research, a chi-square test is frequently used to assess whether disease (present/absent) is associated with exposure (yes/no). For example, does smoking affect fracture risk?
Assumptions
The assumptions that must be met when using a chi-square test are that:
1. Each observation must be independent.
2. Each participant is represented in the table once only
Crosstabulations
• The data for chi-square tests are summarized using crosstabulations.
• These tables are sometimes called frequency or contingency tables.
• Table 8.3 is called a 2 × 2 table because each variable has two levels. A table can have larger dimensions when either the exposure or the disease has more than two levels.
• In a contingency table, one variable (usually the exposure) forms the rows and the other variable (usually the disease) forms the columns, although the arrangement can be reversed.
• It is important to identify which variable is the outcome variable and which is the explanatory variable, so that the percentages displayed are the ones that answer the research question. The table can be arranged in one of two ways, labelled A and B:
[Figure: table arrangements A and B]
• For arrangement A, use row percentages.
• For arrangement B, use column percentages.
Crosstabulations Examples
➢ A group of sports psychologists, the Westchester Academy of Sports Psychologists (WASPS), decide to explore the relationship between exercise behavior and mood, and to investigate whether these factors vary between men and women. WASPS recruit 100 participants, 45 men and 55 women.
A. Two variables example:
• Variable(1): mood status – depressed vs. not depressed.
• Variable(2): Gender – male vs. female.
▪ It is an example of a 2 × 2 cross-tabulation (mood status vs gender).
▪ The numbers within the cells of this table are called observed frequencies.
▪ For example, there are 21 depressed males and 39 depressed females.
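The choice between row and column percentages can be sketched in Python for example A. This is only an illustration: the counts 21 and 39 are quoted above, while 24 and 16 are derived here from the stated totals of 45 men and 55 women, and the function names `row_percentages` and `column_percentages` are hypothetical helpers, not part of any package.

```python
# Observed counts for the mood-by-gender example.
# 21 and 39 are quoted in the text; 24 and 16 follow from the stated
# totals of 45 men and 55 women (45 - 21 and 55 - 39).
observed = {
    "depressed": {"male": 21, "female": 39},
    "not depressed": {"male": 24, "female": 16},
}


def row_percentages(table):
    """Percentage of each cell within its row."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Percentage of each cell within its column."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    return {
        row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


row_pct = row_percentages(observed)
col_pct = column_percentages(observed)

# Gender is in the columns here, so column percentages compare the groups.
for gender in ("male", "female"):
    print(f"{gender}: {col_pct['depressed'][gender]:.1f}% depressed")
# male: 46.7% depressed
# female: 70.9% depressed
```

Because gender sits in the columns of this table, the column percentages are the ones that compare depression between men and women; the row percentages would instead describe the gender split within each mood group.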
B. Two variables example:
• Variable(1): mood status – depressed vs. not depressed.
• Variable(2): Exercise frequency – frequent, infrequent and none.
▪ It is an example of a 2 × 3 cross-tabulation (mood status vs exercise frequency) because there are two rows and three columns.
C. Three variables example:
• Variable(1): mood status – depressed vs. not depressed.
• Variable(2): Exercise frequency – frequent, infrequent and none.
• Variable(3): Gender – male vs. female.
▪ It is an example of a 2 × 2 × 3 cross-tabulation because there are two gender groups (male and female), two depression groups (depressed and not depressed) and three exercise frequency groups (frequent, infrequent and none).
▪ For example, 6 depressed males exercise frequently.
▪ The cell counts add up to the total of 100 participants.
Expected Counts
• The observed count is the actual count in the sample and is shown in each cell of the crosstabulation.
• Expected counts are the frequencies we would see in each cell of the crosstabulation if the null hypothesis were true (that is, no association between the variables).
• The expected count is the value expected by chance alone and is calculated for each cell as:
$E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
• From the previous example (see slide 6), the expected frequencies for each cell are obtained as follows:
▪ Depressed Male = $E_{11} = \dfrac{45 \times 60}{100} = 27$
▪ Depressed Female = $E_{12} = \dfrac{55 \times 60}{100} = 33$
▪ Non-depressed Male = $E_{21} = \dfrac{45 \times 40}{100} = 18$
▪ Non-depressed Female = $E_{22} = \dfrac{55 \times 40}{100} = 22$
Now we can see a clearer picture of how observed and expected frequencies might be associated.
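The expected-count calculation above can be reproduced with a short Python sketch. The function name `expected_counts` is a hypothetical helper, and the table layout (rows for mood, columns for gender) is an assumption based on how the cells E11 to E22 are labelled.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: depressed, not depressed. Columns: male, female.
observed = [[21, 39],
            [24, 16]]

expected = expected_counts(observed)
print(expected)  # [[27.0, 33.0], [18.0, 22.0]]
```

The same function works for larger tables, such as the 2 × 3 example, because it only relies on the row and column totals.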
Calculating chi-square value
• The Pearson chi-square value is calculated by the following summation over all cells:
$\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
• Another way to phrase the null hypothesis for a chi-square test is that there is no significant difference between the observed frequencies and the expected frequencies.
• We reject the null hypothesis if the chi-square value from the test statistic exceeds the critical chi-square value from the table, or if the p-value < 0.05.
• If the observed and expected values are similar (small differences), then the chi-square value will be close to zero and therefore will not be significant.
• From the previous example (see slide 6): to calculate the chi-square value, we first find the expected counts for each cell of the crosstabulation (see slide 10).
▪ $\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \dfrac{(21 - 27)^2}{27} + \dfrac{(39 - 33)^2}{33} + \dfrac{(24 - 18)^2}{18} + \dfrac{(16 - 22)^2}{22} = 6.061$
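As a rough check of the calculation above, the following Python sketch computes the Pearson chi-square value for the same table using only the standard library. Two details are not stated in the slides and are assumptions here: a 2 × 2 table has (2 − 1)(2 − 1) = 1 degree of freedom, and the 5% critical value for 1 degree of freedom is 3.841. The function names are hypothetical, and the p-value formula used is valid only for 1 degree of freedom.

```python
import math


def chi_square_statistic(observed):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: depressed, not depressed. Columns: male, female.
observed = [[21, 39],
            [24, 16]]

chi2 = chi_square_statistic(observed)
p = p_value_df1(chi2)
CRITICAL_DF1_5PCT = 3.841  # assumed table value for df = 1, alpha = 0.05

print(round(chi2, 3))            # 6.061
print(chi2 > CRITICAL_DF1_5PCT)  # True, so H0 would be rejected
```

Under these assumptions the statistic exceeds the critical value and the p-value is below 0.05, so both forms of the decision rule point the same way.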
Fisher Exact Test
• Fisher's exact test is a statistical test used to determine whether there are non-random associations between two categorical variables (similar to the chi-square test).
• It is usually used for smaller sample sizes.
• If more than 20% of the cells in the cross-tabulation have an expected frequency of less than 5, or if any expected cell count is less than 1, we should use Fisher's exact test (SPSS provides this only for 2 × 2 tables).
• When you run a chi-square test in SPSS and the condition for Fisher's exact test is met, SPSS automatically provides Fisher's exact test (for 2 × 2 tables) for us to refer to when the expected cell counts are too low.
Fisher Exact Test Example
When we calculate the expected counts we get:
▪ E11 = 2.455
▪ E12 = 6.545
▪ E21 = 3.545
▪ E22 = 9.455
Two of the four cells (50%, which is more than 20%) have expected counts less than 5, so we cannot use the chi-square test and should use Fisher's exact test instead.
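The rule for choosing between the two tests can be sketched as a small Python check. This is an illustration only: `fisher_recommended` is a hypothetical helper, and it assumes that both thresholds (the 20% share and the minimum of 1) apply to expected counts.

```python
def fisher_recommended(expected, threshold=5, max_share=0.20):
    """Return True when the expected counts are too low for a chi-square test.

    Assumed rule: more than 20% of cells have an expected count below 5,
    or any expected count is below 1.
    """
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < threshold)
    share_small = small / len(cells)
    return share_small > max_share or min(cells) < 1


# Expected counts from the Fisher exact test example.
expected = [[2.455, 6.545],
            [3.545, 9.455]]

print(fisher_recommended(expected))  # True: 2 of 4 cells are below 5
```

For the mood-by-gender example, where the expected counts are 27, 33, 18 and 22, the same check returns False, so the chi-square test is appropriate there.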
Odds Ratio (OR)
• The odds ratio (OR) is a measure of association between an exposure and an outcome.
• The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
• The odds of an event are defined as the likelihood that the event will occur, expressed as a proportion of the likelihood that the event will not occur.
• If A is the probability that subjects are affected and B is the probability that subjects are not affected, then odds = A/B.
• OR > 1 indicates increased occurrence of the event in the presence of the exposure (exposure and outcome are associated).
• OR < 1 indicates decreased occurrence of the event in the presence of the exposure (exposure and outcome are associated).
• OR = 1 indicates that exposure and outcome are independent.
Odds Ratio (OR) Example
In this example, we calculate the odds ratio from the 2 × 2 cross-tabulation as follows:

Risk factor/exposure    Depression present (case)    Depression absent (control)
With Exercise           22                           23
No Exercise             25                           7

▪ $OR = \dfrac{\text{odds of depression when exposed to exercise}}{\text{odds of depression when not exposed to exercise}}$
▪ $OR = \dfrac{22/23}{25/7} = \dfrac{22 \times 7}{23 \times 25} = 0.268$
OR = 0.268 < 1, which means that depression occurs less often when people exercise (outcome and exposure are related).
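The odds ratio calculation and its interpretation can be sketched in Python as follows. The function names `odds_ratio` and `interpret` are hypothetical helpers written for this example, and the interpretation strings simply restate the three rules given earlier.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome with exposure divided by odds without exposure."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed


def interpret(or_value):
    """Restate the OR > 1, OR < 1 and OR = 1 rules in words."""
    if or_value > 1:
        return "increased occurrence of the outcome with exposure"
    if or_value < 1:
        return "decreased occurrence of the outcome with exposure"
    return "exposure and outcome are independent"


# With exercise: 22 depressed, 23 not depressed.
# No exercise:   25 depressed, 7 not depressed.
or_value = odds_ratio(22, 23, 25, 7)

print(round(or_value, 3))   # 0.268
print(interpret(or_value))  # decreased occurrence of the outcome with exposure
```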
Running Chi-square Test in SPSS
• Example: describe the effect of smoking on fracture risk.
▪ Hypotheses:
𝐻0: Smoking status is not associated with fracture risk (independent).
𝐻1: Smoking status is associated with fracture risk (dependent).
1. Define your categorical variables (Smoker, Fractures) from Variable View -> Values.
2. Go to Analyze -> Descriptive Statistics -> Crosstabs.
3. Add your exposure variable (Smoker) to Column and your outcome variable (Fractures) to Row (you can also do the opposite).
4. Click on Statistics and tick (✓) Chi-square.
5. Click Continue.
6. Click on Cells and tick (✓) the options shown on the slide (Note: we placed our exposure factor in the columns, so we take the column percentages).
7. Click Continue.
8. Tick (✓) Display clustered bar charts to represent your data graphically.
9. Click OK.
▪ Go to the Output window to view your analysis.
▪ In the Crosstabulation (1st table), the expected counts are close to the observed counts.
▪ In the Crosstabulation (1st table), 21% of non-smokers have fracture risk while 16% of smokers have fracture risk (we would expect the opposite if smoking increased fracture risk).
▪ The condition noted below the Chi-Square Tests table (2nd table) is satisfied, so we use the chi-square test rather than Fisher's exact test.
▪ Pearson's chi-square p-value = 0.324 > 0.05, so we do not reject 𝐻0.
▪ There is no significant association between smoking and fracture risk; they are independent.
• To show data values on the graph, double-click the chart; the Chart Editor window will open.
• Go to Elements -> Show Data Labels.
• Close the Chart Editor window.