Chi-Square

SPSS Session 5:
Association between Nominal
Variables Using Chi-Square Statistic
Learning Objectives
• Review previous statistical tests and how they
relate to the levels of measurement for a
variable
• Describe how a Chi-Square test uses cross
tabulation
• Using a table, calculate a Chi-Square statistic
• Conduct a Chi-Square analysis in SPSS and
interpret the findings
Statistical Tests Review
• t-tests
– Mean differences between two groups
– Uses a t-test statistic
– One nominal variable and one interval/ratio variable
• ANOVA (Analysis of Variance)
– Mean differences between three or more groups
– Uses an F-test statistic
– One nominal variable and one interval/ratio variable
• Correlation
– Test of association between two interval/ratio variables
– Uses Pearson’s r-values as a correlation coefficient
– Tells you the magnitude, direction, and statistical significance of the
relationship
• Regression
– Test of prediction between two interval/ratio variables
– Uses an F-test statistic
– Helps create a useful equation for prediction of the value in one variable
from another
Chi-Square (χ2) Test
• The Chi-Square (χ2) test offers a test of
association between two nominal/ordinal
variables
• Social work examples:
– Gender and whether or not an individual reports
having elevated distress scores
– Gender and either normal, borderline, or abnormal
scores on a measure
– Previous involvement in child protection services and
whether or not an individual reports having elevated
distress scores
Cross Tabulation Example 1
• The Chi-Square (χ2) test is best shown
through the use of a cross tabulation table
• Take for example a table that displays the
relationship between gender of the
parent/carer and whether or not they
reported having elevated distress scores on
the GHQ measure
• This will be a 2x2 table with each variable
having two categories
We can see the following from this table:
• 6 men reported normal levels of distress (subclinical) on the GHQ measure
• 8 men reported clinically elevated levels of distress on the GHQ measure
• 47 women reported normal levels of distress (subclinical) on the GHQ measure
• 34 women reported clinically elevated levels of distress on the GHQ measure
• There were 14 men and 81 women in the sample
• 53 individuals reported normal levels of distress (subclinical) on the GHQ
measure
• 42 individuals reported clinically elevated levels of distress on the GHQ measure
• 95 individuals were in the sample
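The marginal totals listed above follow directly from the four cell counts. A minimal Python sketch of that bookkeeping, using only the counts from this slide (the dictionary keys are illustrative labels, not SPSS variable names):

```python
# Observed counts from the gender x GHQ cross tabulation
# (rows: gender, columns: GHQ category). Labels are illustrative only.
observed = {
    "Men":   {"Subclinical": 6,  "Clinical": 8},
    "Women": {"Subclinical": 47, "Clinical": 34},
}

# Row totals: number of men and women in the sample.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column totals: number of respondents in each GHQ category.
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Grand total: everyone in the table.
n = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(n)
```

The row totals, column totals, and grand total are the "marginal totals" that the expected-value formula later in this session relies on.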
Cross Tabulation Example 2
• A second example is the relationship
between the degree of previous
involvement in child protection services
and whether or not parents/carers reported having
elevated distress scores on the GHQ
measure
• This will be a 3x2 table with degree of
previous involvement having three
categories
The Chi-Square (χ2) Test Process
• The χ2 test is designed to test for differences
between what was observed and what
you would expect if there were no association
between the variables
• Observed numbers are the actual counts in each
category in your table
• Expected numbers are the counts each category
would hold if the cases were spread across the
table in proportion to the row and column totals
Expected vs. Observed Values
• Expected values are those in each category if
there were no association between your two
variables.
• Expected values represent your null hypothesis, that
there is no association between the variables
• Let’s look at an example
Chi-Square Example 1
• In our child protection study, we later
collected information about whether cases were
re-referred for child protection services within a
year of their closure.
• Cases were viewed as successfully closed if
there was no later re-referral for additional
child protection services.
• This information is stored in the “Case_Referral”
variable, a nominal variable with two
categories.
Chi-Square Example 1
• We wanted to know if there was an
association between whether a parent/carer
reported elevated levels of psychological
distress and later re-referral for additional
child protection services.
• We would expect that these variables would
be associated (research hypothesis).
• Our null hypothesis would be that no
significant association exists.
Chi-Square Example 1: Observed Values
• Here is the cross tabulation table from this
analysis.
• These are our observed values
Chi-Square Example 1: Observed Values
• If the null hypothesis was true and no
association between these variables existed,
what would the numbers in the table be?
Chi-Square Example 1: Expected Values
• If the null hypothesis was true and no
association between these variables existed,
these would be the expected values:
                   No referral   Referral
Subclinical GHQ         27          26
Clinical GHQ            22          20
Chi-Square Example 1: Expected Values
• For each category, the numbers are spread
across the table in proportion to the row and
column totals, as though the null hypothesis were
true and no association between the variables exists.
Expected Values in Chi-Square
• Expected values for each category are
calculated by this formula:
$$E = \frac{(R)(C)}{N}$$
E = Expected Frequency
R = Marginal Row total
C = Marginal Column total
N = Total number of cases
                   No referral   Referral   Marginal totals
Subclinical GHQ        27.3        25.7          53
Clinical GHQ           21.7        20.3          42
Marginal totals        49          46            95
Calculating Expected Values
Expected values
Subclinical scores and no referral = (53 x 49)/95 = 27.3
Subclinical scores and referral = (53 x 46)/95 = 25.7
Clinical score and no referral = (42 x 49)/95 = 21.7
Clinical score and referral = (42 x 46)/95 = 20.3
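The four calculations above apply E = (R)(C)/N once per cell. A short Python sketch of the same arithmetic, assuming only the marginal totals shown on this slide (the labels are illustrative):

```python
# Marginal totals from the GHQ x Case_Referral table on this slide.
row_totals = {"Subclinical": 53, "Clinical": 42}    # GHQ category
col_totals = {"No referral": 49, "Referral": 46}    # later re-referral
n = 95                                              # total number of cases

# E = (R * C) / N for every cell of the table.
expected = {
    (row, col): r * c / n
    for row, r in row_totals.items()
    for col, c in col_totals.items()
}

for cell, value in expected.items():
    print(cell, round(value, 1))
```

Keeping the unrounded values in `expected` and rounding only for display avoids carrying rounding error into later steps.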
Chi-Square Statistic
• The Chi-Square (χ2) assesses the size of the
differences between the observed and
expected values in a cross tabulation table
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed values (the actual data)
E = Expected values (if null hypothesis is true)
Chi-Square Statistic Calculated
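$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\begin{aligned}
\chi^2 &= \frac{(33-27.3)^2}{27.3} + \frac{(20-25.7)^2}{25.7} + \frac{(26-20.3)^2}{20.3} + \frac{(16-21.7)^2}{21.7} \\
&= \frac{32.49}{27.3} + \frac{32.49}{25.7} + \frac{32.49}{20.3} + \frac{32.49}{21.7} \\
&= 1.19 + 1.264 + 1.6 + 1.497 \\
&= 5.55
\end{aligned}$$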
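The hand calculation can be checked in a few lines of Python. This is a sketch using the observed and expected values from the slides. The helper name `chi_square` is illustrative. It also repeats the sum with unrounded expected values to show how much the rounding to one decimal place matters.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cell order follows the hand calculation: subclinical/no referral,
# subclinical/referral, clinical/referral, clinical/no referral.
observed = [33, 20, 26, 16]
rounded_expected = [27.3, 25.7, 20.3, 21.7]

# Unrounded expected values from E = (R * C) / N.
exact_expected = [53 * 49 / 95, 53 * 46 / 95, 42 * 46 / 95, 42 * 49 / 95]

chi2_rounded = chi_square(observed, rounded_expected)
chi2_exact = chi_square(observed, exact_expected)
print(round(chi2_rounded, 2), round(chi2_exact, 3))
```

The statistic from the rounded expected values matches the hand result. The unrounded version comes out slightly smaller.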
Chi-Square (χ2) Example 1
• We wanted to know if there was an association
between whether a parent/carer reported
elevated levels of psychological distress and later
re-referral for additional child protection services.
• We would expect that these variables would be
associated (research hypothesis).
• Our null hypothesis would be that no significant
association exists.
• Our hand-calculated χ2 = 5.55 is approximate because we rounded the expected values.
• To determine if it is statistically significant and not
due to chance, let’s turn to SPSS
Chi-Square (χ2) Example 1: SPSS
• From the “Analyze” menu, select “Descriptive
Statistics”, and finally “Crosstabs”
Chi-Square (χ2) Example 1: SPSS
• Find “GHQ_Cutoff_4” variable which is our
variable indicating whether a parent or carer
reported clinically elevated GHQ scores
• Place this variable in the “Row(s)” list
• Find “Case_Referral” variable which indicates
whether a case was referred later for additional
child protection services
• Place this variable in the “Column(s)” list
Chi-Square (χ2) Example 1: SPSS
• Within the “Statistics” menu, select “Chi-Square”
• Press “Continue”
Chi-Square (χ2) Example 1: SPSS
• Within the “Cells” menu, there are plenty of
options:
Chi-Square (χ2) Example 1: SPSS
• Within the “Cells” menu, there are plenty of
options:
– Counts: will provide the observed and expected
values for the formula we have seen previously
– Percentages: will provide all of the percentages per
category, marginal totals, or grand total in the
analysis which can be useful
• For our analysis, just select “Observed” and
“Expected” counts for now.
• Press “Continue” and then “OK” to conduct
analysis
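For readers who want to see what the Crosstabs procedure is doing, the sketch below mimics it in Python. It is an analogy, not SPSS syntax. The list of cases is rebuilt from the observed counts in this example rather than read from the actual data file, and the category labels are illustrative.

```python
from collections import Counter

# One (GHQ category, referral status) pair per case, rebuilt from the
# observed counts in this example; real data would come from the data file.
cases = (
    [("Subclinical", "No referral")] * 33
    + [("Subclinical", "Referral")] * 20
    + [("Clinical", "No referral")] * 16
    + [("Clinical", "Referral")] * 26
)

# Cross tabulation: count how many cases fall in each row/column pair.
observed = Counter(cases)
n = len(cases)
row_totals = Counter(ghq for ghq, _ in cases)
col_totals = Counter(ref for _, ref in cases)

# Print observed and expected counts per cell, as requested in "Cells".
for (ghq, ref), count in sorted(observed.items()):
    expected = row_totals[ghq] * col_totals[ref] / n
    print(f"{ghq:12} {ref:12} observed={count:3d} expected={expected:5.1f}")
```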
Chi-Square (χ2) Example 1: SPSS
• From the new output, we see the new cross
tabulation table with the observed and expected
values
Chi-Square (χ2) Example 1: SPSS
• Finally, we look to the top of the last table for our
significance value and the SPSS calculated χ2 value
• The χ2 value is 5.480, and the significance level is
p<.05. It is slightly lower than our hand calculation of 5.55
because we rounded the expected values before summing.
• There is a statistically significant association
between these two nominal/ordinal variables.
• We would reject the null hypothesis.
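The significance value SPSS reports can be approximated by hand for this 2x2 case. The sketch below assumes one degree of freedom, where the upper-tail probability of a chi-square statistic equals erfc(√(χ2/2)). That shortcut does not hold for larger tables, and the function name is illustrative.

```python
import math

def chi_square_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1 only, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = 5.480     # value reported by SPSS for Example 1
alpha = 0.05     # significance level used in this session
p_value = chi_square_p_value_df1(chi2)
reject_null = p_value < alpha
print(round(p_value, 3), reject_null)
```

A p-value below α = .05 is what leads to rejecting the null hypothesis here.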
Chi-Square (χ2) Example 1: SPSS
• Whether a parent/carer reported elevated GHQ
scores was associated with future referrals to
additional child protection services (χ2 =5.480,
df=1, p<.05).
• These variables are associated but interpreting
the Chi-Square test requires a visual inspection
to know where exactly our observed values were
different than our expected values.
• Look at the cross tabulation table, and compare observed
and expected values per category
• New referrals were less common than expected in the group of
parents/carers reporting lower GHQ scores (20 observed vs. 25.7 expected).
• New referrals were more common than expected in the group with elevated
GHQ scores (26 observed vs. 20.3 expected).
• With these findings and a statistically significant χ2 value,
we can conclude that parent/carer psychological distress was
associated with future need for additional child protection
services.
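The visual inspection described above amounts to subtracting each expected value from its observed value. A minimal Python sketch using the Example 1 counts (labels are illustrative):

```python
# Observed counts and marginal totals for Example 1.
observed = {
    ("Subclinical", "No referral"): 33,
    ("Subclinical", "Referral"): 20,
    ("Clinical", "No referral"): 16,
    ("Clinical", "Referral"): 26,
}
row_totals = {"Subclinical": 53, "Clinical": 42}
col_totals = {"No referral": 49, "Referral": 46}
n = 95

# Observed minus expected: positive means more cases than the null predicts.
differences = {
    (ghq, ref): count - row_totals[ghq] * col_totals[ref] / n
    for (ghq, ref), count in observed.items()
}
for cell, diff in differences.items():
    print(cell, round(diff, 1))
```

A positive difference in the clinical/referral cell and a negative one in the subclinical/referral cell correspond to the pattern described in the bullets.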
Chi-Square (χ2) Example 2
• Now let’s consider the Chi-Square analysis
example from earlier.
• We wanted to know if the gender of the
respondent (parent or carer) was associated with
their reporting of elevated GHQ scores or not.
Chi-Square (χ2) Example 2: SPSS
• Suppose for a moment that we have no reason
to suspect that gender is associated with
psychological distress.
• Our research hypothesis then would actually be
the null hypothesis!
• We would hypothesize that gender and elevated
psychological distress scores are unrelated.
• To demonstrate our research hypothesis, we
would hope to fail to reject the null hypothesis.
Chi-Square (χ2) Example 2: SPSS
• From the “Analyze” menu, select “Descriptive
Statistics”, and finally “Crosstabs”
Chi-Square (χ2) Example 2: SPSS
• Find the “Gender_Respondent” variable, which is the gender of the
parent or carer who responded to the family
questionnaire
• Place this variable in the “Row(s)” list
• Find “GHQ_Cutoff_4” variable which is our
variable indicating whether a parent or carer
reported clinically elevated GHQ scores
• Place this variable in the “Column(s)” list
Chi-Square (χ2) Example 2: SPSS
• Within the “Statistics” menu, select “Chi-Square”
• Press “Continue”
• Within the “Cells” menu, select “Observed” and
“Expected”, and then “Continue”.
• Press “OK” to conduct analysis
• Below is the cross tabulation between the two
variables.
• Note that the observed values are rather close to
the expected values.
• This may indicate that there is not much of an
association between the variables.
Chi-Square (χ2) Example 2: SPSS
• The last table is consistent with our research hypothesis,
which was the null hypothesis on this occasion.
• Gender and reporting of elevated GHQ scores do
not have a statistically significant association in
our study (χ2= 1.113, df=1, p>.05). The obtained
p-value in this analysis was .291, well above our
standard of α=.05.
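As a check on these figures, the whole Example 2 calculation can be sketched in Python from the gender x GHQ counts given earlier in the session. It computes the Pearson statistic without a continuity correction. The p-value line uses the erfc shortcut, which is valid only for one degree of freedom.

```python
import math

# Observed counts: rows are gender, columns are GHQ category.
observed = [[6, 8],      # men: subclinical, clinical
            [47, 34]]    # women: subclinical, clinical

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square: sum of (O - E)^2 / E with E = (R * C) / N for each cell.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1), which is 1 for a 2x2 table.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The erfc shortcut below is valid only when df == 1.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), df, round(p_value, 3))
```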