Categorical Data

Chi-Square
Advanced Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of Cognitive Science
© 2013, Michael Kalsher
Outline
• Review of Important Definitions and Concepts
• Chi-Square Tests
    • Goodness of Fit
    • Test of Independence
• Sample Problems
What are Non-parametric Statistics?
Methods of analyzing data that examine the
relative position or rank of the data rather than
the actual values.
Non-parametric statistics do not:
• assume that the data come from a normal
distribution.
• create any parameter estimates (e.g., means,
standard deviations) to assess whether one set
of numbers is statistically different from another
set of numbers.
Chi-Square Goodness of Fit
This test is used to evaluate questions concerning
the probabilities associated with each value of a
variable by comparing an observed frequency
distribution to an expected frequency distribution.
It’s most often used when a researcher has the observed frequencies for several mutually exclusive categories and wants to decide whether they occurred equally often.
Chi-Square Goodness of Fit:
Test Assumptions
1. Random and independent sampling.
2. Sample size must be sufficiently large (a common rule of thumb is an expected frequency of at least 5 in every category).
3. Values of the variable are mutually exclusive and exhaustive: every subject must fall in exactly one category.
Note: If these assumptions are not met, the critical values in the chi-square table are not necessarily correct.
Chi-Square Goodness of Fit:
Computing by hand
Note: df = k − 1, where k is the number of categories.
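The following is a minimal Python sketch of the hand computation. It applies χ² = Σ (O − E)²/E to a goodness-of-fit problem in which, by default, every category is expected to occur equally often (E = N / k). The function name `goodness_of_fit` is our own. The counts (30, 20, 25, 25) are hypothetical and are not from the slides. The function also flags any category whose expected frequency is below 5, following the common rule of thumb for "sufficiently large".

```python
from typing import List, Optional, Tuple


def goodness_of_fit(observed: List[int],
                    expected: Optional[List[float]] = None) -> Tuple[float, int, List[int]]:
    """Chi-square goodness of fit.

    If `expected` is omitted, all k categories are assumed equally likely,
    so each expected frequency is N / k.
    Returns (chi_square, df, indices of categories with expected < 5).
    """
    k = len(observed)
    n = sum(observed)
    if expected is None:
        expected = [n / k] * k
    # Sum of (O - E)^2 / E over all categories
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    # Rule-of-thumb check on the sample-size assumption
    small = [i for i, e in enumerate(expected) if e < 5]
    return chi_square, df, small


# Hypothetical counts for four mutually exclusive categories (N = 100)
chi_square, df, small = goodness_of_fit([30, 20, 25, 25])
print(chi_square, df, small)  # 2.0 3 []
```

To decide significance, compare the statistic with the critical value for df = k − 1 from the chi-square table.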
Critical Values for the Chi-Square Test
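The critical-value table itself did not survive extraction. As a stand-in, the sketch below computes upper-tail critical values using only the Python standard library. It evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function, then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_critical` are our own. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` should give the same numbers.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable with `df` degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function:
    P(a, z) = z**a * exp(-z) / Gamma(a) * sum_{n>=0} z**n / (a (a+1) ... (a+n)),
    with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(df: int, alpha: float = 0.05) -> float:
    """Value x with P(X > x) = alpha, found by bisection."""
    hi = max(1.0, float(df))
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen the bracket until it contains x
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid                         # too much area left in the upper tail
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in (1, 2, 3):
    print(df, round(chi2_critical(df), 3))
# 1 3.841
# 2 5.991
# 3 7.815
```

An obtained χ² larger than the critical value for its df is significant at the chosen alpha.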
Chi-Square Test of Independence
This test is used to examine whether two categorical variables are related. It assesses whether the observed frequencies for each combination of categories differ from those that would be expected by chance, that is, if the two variables were independent.
One common use is to determine whether there is an association between two independent variables.
The Chi-Square Test of Independence:
Computing by hand
Note: df = (k − 1)(q − 1), where k and q are the numbers of categories of the two variables.
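Below is a minimal Python sketch of the hand computation for a table with k rows and q columns. The function name is our own. It computes each expected frequency from the row and column totals, sums (O − E)²/E over all cells, and returns df = (k − 1)(q − 1). As a check, it is run on the cat-training counts reported in the SPSS example in this section (food: 28 danced, 10 did not; affection: 48 danced, 114 did not). This is the uncorrected Pearson statistic; no continuity correction is applied.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int, List[List[float]]]:
    """Pearson chi-square test of independence for a k x q table of counts.

    Expected frequency of a cell = row total * column total / grand total.
    Returns (chi_square, df, expected table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, expected


# Rows: food, affection; columns: danced, did not dance
cats = [[28, 10], [48, 114]]
chi_square, df, expected = chi_square_independence(cats)
print(round(chi_square, 2), df)  # 25.36 1
```

The result matches the χ²(1) = 25.36 reported for the SPSS analysis.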
Chi-Square Example Using SPSS
A researcher is interested in whether cats can be trained
to line dance. He recruits 200 cats and then tries to train
them to line dance by giving them either food or affection
as a reward for “dance-like” behavior. At the end of the
week he counts how many of the cats could line dance
and how many could not.
We have two categorical variables: Training (food vs.
affection) and Dance (each cat learned to dance or it did
not).
Open cat.sav
How big is the effect? Cramér’s V
Cramér’s V = .36, p < .01.
Because Cramér’s V ranges from 0 to 1, a value of .36 indicates a medium association between type of training and whether the cats danced.
It can be interpreted much like a correlation coefficient. The significance level indicates that an association this strong would be highly unlikely if type of training and dancing were unrelated.
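Cramér’s V can be obtained from the chi-square statistic as V = √(χ² / (N × (min(k, q) − 1))), where N is the total count and k and q are the numbers of rows and columns. The sketch below applies this formula to the values reported for the cat example: χ²(1) = 25.36 and N = 200 cats in a 2 × 2 table. The helper name `cramers_v` is our own.

```python
import math


def cramers_v(chi_square: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V = sqrt(chi^2 / (N * (min(rows, cols) - 1)))."""
    return math.sqrt(chi_square / (n * (min(n_rows, n_cols) - 1)))


# Reported values for the cat example: chi^2 = 25.36, N = 200, 2 x 2 table
v = cramers_v(25.36, 200, 2, 2)
print(round(v, 2))  # 0.36
```

For a 2 × 2 table, min(k, q) − 1 = 1, so V reduces to √(χ²/N), which equals the absolute value of phi.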
How big is the effect? Odds Ratio
1. Calculate the odds that a cat danced given that it received food as a reward.
Odds(dancing after food) = (number that had food and danced) / (number that had food but didn’t dance) = 28/10 = 2.8
2. Calculate the odds that a cat danced given that it received affection as a reward.
Odds(dancing after affection) = (number that had affection and danced) / (number that had affection but didn’t dance) = 48/114 = 0.421
3. Calculate the odds ratio.
Odds ratio = Odds(dancing after food) / Odds(dancing after affection) = 2.8 / 0.421 = 6.65
There was a significant association between the type of training and whether or not cats would dance, χ²(1) = 25.36, p < .001. Based on the odds ratio, this seems to reflect the fact that the odds of a cat dancing were 6.65 times higher if it was trained with food than if it was trained with affection.
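The three steps above can be reproduced in a few lines of Python. This is a sketch. The helper `odds_ratio` is a name of our own, and the cell layout in its docstring is an assumption about how the 2 × 2 table is arranged. Note that (a/b)/(c/d) simplifies to the cross-product form (a·d)/(b·c), here 28 × 114 / (10 × 48).

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2 x 2 table laid out as

                 outcome   no outcome
        group 1     a          b
        group 2     c          d

    i.e. (a / b) / (c / d).
    """
    return (a / b) / (c / d)


odds_food = 28 / 10         # food: danced / did not dance
odds_affection = 48 / 114   # affection: danced / did not dance
print(round(odds_food, 3), round(odds_affection, 3))  # 2.8 0.421
print(round(odds_ratio(28, 10, 48, 114), 2))          # 6.65
```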
Alternative Method of Data Entry
This process tells the computer to weight each category combination by the number in the column labeled Frequency. For example, the computer “pretends” there are 28 rows of data with the category combination 0,0, representing cats that were trained with food and danced.
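The following Python sketch mimics this weighting by expanding each summary row into that many individual cases. The frequencies come from the odds-ratio calculation in this section. The slides state only that the combination 0,0 means food and danced. The rest of the coding (Training 1 = affection, Dance 1 = did not dance) is our assumption, and `expand_weighted` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple

# Summary rows: (training, dance, frequency).
# Assumed coding: training 0 = food, 1 = affection;
#                 dance 0 = danced, 1 = did not dance.
summary = [(0, 0, 28), (0, 1, 10), (1, 0, 48), (1, 1, 114)]


def expand_weighted(rows: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Turn each (training, dance, frequency) row into `frequency` cases."""
    cases: List[Tuple[int, int]] = []
    for training, dance, freq in rows:
        cases.extend([(training, dance)] * freq)
    return cases


cases = expand_weighted(summary)
print(len(cases))              # 200
print(Counter(cases)[(0, 0)])  # 28
```

Either form of the data yields the same contingency table. Weighting simply avoids typing 200 separate rows.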
Chi-Square Example:
Computing expected frequencies for hand-computation
A researcher wants to know if there is a significant difference in the
frequencies with which males come from small, medium, or large
cities as contrasted with females. The two variables we are
considering here are hometown size (small, medium, or large) and
sex (male or female). Another way of putting our research question is:
Is gender independent of size of hometown?
The data for 30 females and 6 males are in the following table.
Frequency with which males and females come from small, medium, and large cities

        Small   Medium   Large   Totals
Female  10      14       6       30
Male    4       1        1       6
Totals  14      15       7       36
The formula for chi-square is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where:
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the two-way chi-square statistic are:
df = (Columns − 1) × (Rows − 1)
Computing Expected Frequencies
Expected frequency for each cell = (the cell’s column total × the cell’s row total) / grand total
In our example:
Column Totals are 14 (small), 15 (medium), and 7 (large).
Row Totals are 30 (female) and 6 (male).
Grand total is 36.
The expected frequencies:
1. Small female cell: 14 × 30 / 36 = 11.667
2. Medium female cell: 15 × 30 / 36 = 12.500
3. Large female cell: 7 × 30 / 36 = 5.833
4. Small male cell: 14 × 6 / 36 = 2.333
5. Medium male cell: 15 × 6 / 36 = 2.500
6. Large male cell: 7 × 6 / 36 = 1.167
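The same rule (column total × row total / grand total) can be applied to every cell at once. The following is a minimal Python sketch using the hometown counts from the table above. The function name `expected_frequencies` is our own.

```python
from typing import List


def expected_frequencies(table: List[List[int]]) -> List[List[float]]:
    """Expected count for each cell = column total * row total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[c * r / grand_total for c in col_totals] for r in row_totals]


# Rows: female, male; columns: small, medium, large
hometown = [[10, 14, 6], [4, 1, 1]]
for label, row in zip(["Female", "Male"], expected_frequencies(hometown)):
    print(label, [round(e, 3) for e in row])
# Female [11.667, 12.5, 5.833]
# Male [2.333, 2.5, 1.167]
```

The expected frequencies always add up to the grand total (36 here). This is a quick check on hand calculations.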
Observed frequencies, expected frequencies, and (O − E)²/E for males and females from small, medium, and large cities

        Small                  Medium                 Large                  Totals
        O    E       (O−E)²/E  O    E       (O−E)²/E  O    E       (O−E)²/E
Female  10   11.667  0.238     14   12.500  0.180     6    5.833   0.005     30
Male    4    2.333   1.191     1    2.500   0.900     1    1.167   0.024     6
Totals  14                     15                     7                      36
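The slides stop at the per-cell contributions. Summing the six (O − E)²/E values gives the test statistic for this example. The following is a worked sketch that compares the statistic with the tabled .05 critical value for df = 2 (5.991):

```latex
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 \approx 2.54 \\
df &= (3 - 1)(2 - 1) = 2 \\
\chi^2 &\approx 2.54 < \chi^2_{.05}(2) = 5.991
\end{aligned}
```

On these numbers, we would fail to reject the hypothesis that sex and hometown size are independent. However, all three expected frequencies in the male row are below 5, so the sample-size assumption for the test is doubtful for this table.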