Fundamental Statistics in Applied Linguistics Research Spring 2010

Weekend MA Program on Applied English
Dr. Da-Fu Huang
7. Finding group differences with Chi-Square when all variables are categorical
7.1 Two main uses of the chi-square test
• Test for goodness of fit of the data
• Test for group independence
7.2 Test for goodness of fit
• Only one categorical variable with 2 or more levels of choices
• Tests whether the observed frequencies match the frequencies expected if every choice were equally likely
• Measures how good the fit is to the probabilities that we expect
Desired Foreign Language at one university (χ² = 8.2, p = .09, df = 4)

Language   Chinese  Spanish  French  German  Japanese
Count      23       20       15      13      29
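The goodness-of-fit statistic for this table can be checked by hand. The following is a minimal sketch in plain Python, assuming (as the slide does) that every language is equally likely to be chosen; the variable names are illustrative only.

```python
# Observed counts from the one-university survey above
observed = {"Chinese": 23, "Spanish": 20, "French": 15, "German": 13, "Japanese": 29}

total = sum(observed.values())        # 100 respondents
expected = total / len(observed)      # 20 per language if all are equally likely

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                # number of levels minus 1

print(f"chi-square = {chi_square:.1f}, df = {df}")  # chi-square = 8.2, df = 4
```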
7.3 Test for group independence
• 2 or more categorical variables with 2 or more levels of choices
• Tests whether there is any association between the variables
Desired Foreign Language at two universities

        Chinese  Spanish  French  German  Japanese  Total
HT U    23       20       15      13      29        100
BC U    14       25       10      26      25        100
Total   37       45       25      39      54        200
Observed and expected frequencies for the foreign language survey

Observed frequencies
        Chinese  Spanish  French  German  Japanese  Total
HT U    23       20       15      13      29        100
BC U    14       25       10      26      25        100
Total   37       45       25      39      54        200

Expected frequencies (row total * column total / grand total)
        Chinese       Spanish       French        German        Japanese
HT U    (100*37)/200  (100*45)/200  (100*25)/200  (100*39)/200  (100*54)/200
BC U    (100*37)/200  (100*45)/200  (100*25)/200  (100*39)/200  (100*54)/200

HT U    18.5          22.5          12.5          19.5          27
BC U    18.5          22.5          12.5          19.5          27
χ² = Σ [(O - E)² / E] = 8.374, p = .07, df = 4
(for a test of independence, df = (# rows - 1) * (# columns - 1) = 1 * 4 = 4)
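The expected counts and the chi-square value for the two-university table can be reproduced from the observed counts alone. This is a minimal sketch in plain Python, not the SPSS procedure itself; the variable names are illustrative.

```python
# Observed counts (rows: HT U, BC U; columns: Chinese, Spanish, French, German, Japanese)
observed = [
    [23, 20, 15, 13, 29],
    [14, 25, 10, 26, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_square:.3f}, df = {df}")  # chi-square = 8.374, df = 4
```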
7.4 Situations that look like Chi-square but are not
• Scenario #1: Case study, only one participant
  - The binomial test
• Scenario #2: Binary choice, only one variable with exactly 2 levels
  - The binomial test
• Scenario #3: Matched pairs with categorical outcome
  - The McNemar test
• Scenario #4: Summary over a number of similar items by the same participants
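As an illustration of the first three scenarios, here is a minimal sketch of an exact two-sided binomial test in plain Python. The counts are hypothetical and are not taken from the course data. The exact form of the McNemar test is treated here as a binomial test on the discordant pairs (the pairs whose outcome changed), which is one standard way to compute it.

```python
from math import comb

def binomial_test_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one."""
    probs = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed_prob = probs[successes]
    # A small relative tolerance guards against floating-point ties
    return sum(pr for pr in probs if pr <= observed_prob * (1 + 1e-9))

# Hypothetical case study: one learner picks the target form in 15 of 20 trials
binomial_p = binomial_test_two_sided(15, 20)

# Hypothetical matched pairs: 12 learners changed from wrong to right,
# 3 changed from right to wrong (only these discordant pairs enter the test)
mcnemar_p = binomial_test_two_sided(12, 12 + 3)

print(f"binomial test p = {binomial_p:.3f}")
print(f"exact McNemar p = {mcnemar_p:.3f}")
```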
Application activities (8.1.4): PP215-216
7.5 Data inspection: Tables and Crosstabs
7.5.1 Summary tables for goodness-of-fit data
• Analyze > Descriptive Statistics > Frequencies

Student English proficiency
               Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Low     9          16.7     16.7           16.7
       Mid     26         48.1     48.1           64.8
       High    19         35.2     35.2           100.0
       Total   54         100.0    100.0
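The Percent and Cumulative Percent columns follow directly from the frequencies. A minimal sketch in plain Python, using the counts from the table above (the variable names are illustrative):

```python
# Frequencies from the Student English proficiency table
counts = {"Low": 9, "Mid": 26, "High": 19}
total = sum(counts.values())

rows = []
cumulative = 0
for level, n in counts.items():
    cumulative += n
    percent = round(100 * n / total, 1)
    cumulative_percent = round(100 * cumulative / total, 1)
    rows.append((level, n, percent, cumulative_percent))

for level, n, percent, cumulative_percent in rows:
    print(f"{level:<5} {n:>3} {percent:>6} {cumulative_percent:>6}")
print(f"Total {total:>3}")
```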
7.5.2 Summary tables for group-independence data (crosstabs)
• Analyze > Descriptive Statistics > Crosstabs
• Move variables into Row, Column, and Layer (when more than 2 variables)

Student English proficiency * Major1 Crosstabulation
Count
                                      Major1
                                      non-English majors  English majors  Total
Student English proficiency   High    9                   4               13
                              Mid     25                  4               29
                              Low     12                  0               12
Total                                 46                  8               54
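A crosstab is simply a count of how often each combination of levels occurs. The sketch below shows the idea in plain Python. The case-level records are rebuilt from the cell counts above, because the original data file is not part of these slides; that expansion is an assumption made for illustration.

```python
from collections import Counter

# Cell counts from the crosstab above: (proficiency, major) -> count
cell_counts = {
    ("High", "non-English"): 9, ("High", "English"): 4,
    ("Mid", "non-English"): 25, ("Mid", "English"): 4,
    ("Low", "non-English"): 12, ("Low", "English"): 0,
}

# One record per participant (rebuilt from the counts; an assumption, see above)
records = [pair for pair, n in cell_counts.items() for _ in range(n)]

crosstab = Counter(records)                           # count each combination
row_totals = Counter(prof for prof, _ in records)     # totals per proficiency level
col_totals = Counter(major for _, major in records)   # totals per major

for prof in ("High", "Mid", "Low"):
    print(prof, crosstab[(prof, "non-English")], crosstab[(prof, "English")], row_totals[prof])
print("Total", col_totals["non-English"], col_totals["English"], len(records))
```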
7.5.3 Bar plots with one and two categorical variables
• Graphs > Legacy Dialogs > Bar
• With one variable, choose Simple, and Summaries For Groups Of Cases
• With 2 variables, choose Clustered, and Summaries For Groups Of Cases. Put the variables in the “Category Axis” and “Define clusters by” boxes

[Figure: Bar plot with one categorical variable]
[Figure: Bar plots with two categorical variables]
7.6 Assumptions of Chi-Square (PP226-228)
• Independence of observations (no repeated measures)
• Nominal data (no inherent rank or order)
• Expected frequencies are large enough for the chi-square approximation to hold (at least 5 expected cases in every cell)
• Non-occurrences must be included as well as occurrences
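The cell-size assumption can be checked before running the test. The sketch below, in plain Python, applies the check to the proficiency-by-major crosstab from section 7.5.2 and prints a note in the style of the SPSS footnotes. It is an illustration only, not SPSS output.

```python
# Observed counts from the proficiency-by-major crosstab
# (rows: High, Mid, Low; columns: non-English majors, English majors)
observed = [
    [9, 4],
    [25, 4],
    [12, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total
expected = [r * c / grand_total for r in row_totals for c in col_totals]

small_cells = [e for e in expected if e < 5]
share = 100 * len(small_cells) / len(expected)

print(f"{len(small_cells)} cells ({share:.1f}%) have expected count less than 5.")
print(f"The minimum expected count is {min(expected):.2f}.")
```

With half of the cells below 5, the chi-square approximation is doubtful for this table.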
7.7 Chi-Square statistic test
7.7.1 One-way goodness-of-fit Chi-Square in SPSS
• Analyze > Nonparametric Tests > Chi-Square
• Put variable in “Test Variable List” box

Student English proficiency
        Observed N  Expected N  Residual
Low     9           18.0        -9.0
Mid     26          18.0        8.0
High    19          18.0        1.0
Total   54

Test Statistics
              Student English proficiency
Chi-Square    8.111 (a)
df            2
Asymp. Sig.   .017
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 18.0.
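The SPSS values above can be verified by hand. The sketch below is plain Python under one stated assumption: the p-value helper `chi2_sf_even_df` is a hypothetical name, and the closed form it uses is valid only when df is even (as it is here, df = 2). For other cases a statistics package would be needed.

```python
from math import exp, factorial

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable.
    This closed form is valid only for a positive even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))

# Observed counts for Low, Mid, High; equal expected counts of 18 each
observed = [9, 26, 19]
expected = sum(observed) / len(observed)
residuals = [o - expected for o in observed]

chi_square = sum(r**2 / expected for r in residuals)
df = len(observed) - 1
p_value = chi2_sf_even_df(chi_square, df)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```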
7.7.2 Two-way group-independence Chi-Square in SPSS
• Analyze > Descriptive Statistics > Crosstabs
• Tick “Display clustered bar charts” box for a bar plot
• Open Statistics and tick “Chi-Square” and “Phi and Cramer’s V” boxes
• Open Cells and tick “Expected values” and all of the boxes under “Percentages”
Chi-Square statistic test (Two-way group-independence)

Test Statistics
              Student English proficiency   students from different colleges
Chi-Square    8.111 (a)                     16.000 (b)
df            2                             4
Asymp. Sig.   .017                          .003
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 18.0.
b. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 10.8.
Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square             10.431 (a)   8    .236
Likelihood Ratio               10.737       8    .217
Linear-by-Linear Association   2.182        1    .140
N of Valid Cases               54
a. 11 cells (73.3%) have expected count less than 5. The minimum expected count is .33.

• Likelihood Ratio: an alternative to the Pearson Chi-Square. It should be equivalent to the Pearson Chi-Square when sample sizes are large.
• Linear-by-Linear Association: assumes that the variables are ordinal. Report this if your variables have inherent rank.
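To see how close the likelihood ratio and Pearson statistics can be, the sketch below computes both for the two-university language survey from section 7.3 (N = 200), since the raw counts behind the table above are not shown. It uses the standard likelihood ratio form G = 2 Σ O ln(O / E), which is not spelled out on the slides, and plain Python only.

```python
from math import log

# Observed counts (rows: HT U, BC U; columns: Chinese, Spanish, French, German, Japanese)
observed = [
    [23, 20, 15, 13, 29],
    [14, 25, 10, 26, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pair every observed count with its expected count
cells = [
    (o, r * c / grand_total)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
]

pearson = sum((o - e) ** 2 / e for o, e in cells)
# Cells with an observed count of 0 contribute nothing to G
likelihood_ratio = 2 * sum(o * log(o / e) for o, e in cells if o > 0)

print(f"Pearson chi-square = {pearson:.3f}")
print(f"Likelihood ratio   = {likelihood_ratio:.3f}")
```

With 200 cases the two values differ only slightly, which is the pattern the note above describes.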
Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi          .440    .236
                     Cramer's V   .311    .236
N of Valid Cases                  54
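Both measures in this table can be derived from the Pearson chi-square value, using the standard definitions phi = √(χ² / N) and V = √(χ² / (N * (k - 1))), where k is the smaller of the number of rows and columns. The sketch below is plain Python. The value k = 3 is an assumption: it is inferred from df = 8 and the 15 cells implied by the footnote to the Chi-Square Tests table, and is not stated on the slides.

```python
from math import sqrt

chi_square = 10.431      # Pearson chi-square from the Chi-Square Tests table
n_cases = 54             # N of Valid Cases
smaller_dimension = 3    # assumption: a 3 x 5 table (df = 8, 15 cells)

phi = sqrt(chi_square / n_cases)
cramers_v = sqrt(chi_square / (n_cases * (smaller_dimension - 1)))

print(f"Phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}")
```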
Measures of effect size for the chi-square
• Phi (2 x 2 contingency tables, with 2 levels per variable)
• Cramer’s V (tables larger than 2 x 2, with more than 2 levels per variable)
• w = phi (2 x 2 tables); w = V * √(r - 1) (more than 2 levels), where V = Cramer’s V and r = the number of rows or columns, whichever is smaller
• Odds ratio = (N11 * N22) / (N12 * N21)

Table subscripts
        Column 1  Column 2
Row 1   N11       N12
Row 2   N21       N22
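Both formulas are short enough to compute directly. In the sketch below (plain Python), V = .311 is taken from the Symmetric Measures table, r = 3 is an assumption about that table's smaller dimension, and the 2 x 2 counts for the odds ratio are hypothetical values chosen only to show the arithmetic.

```python
from math import sqrt

# w from Cramer's V
cramers_v = 0.311        # from the Symmetric Measures table
smaller_dimension = 3    # assumption: smaller of the number of rows and columns
w = cramers_v * sqrt(smaller_dimension - 1)

# Odds ratio from a 2 x 2 table (hypothetical counts)
n11, n12 = 30, 10
n21, n22 = 15, 25
odds_ratio = (n11 * n22) / (n12 * n21)

print(f"w = {w:.2f}")                    # w = 0.44
print(f"odds ratio = {odds_ratio:.1f}")  # odds ratio = 5.0
```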
Reporting Chi-square test results
• Contingency table with a summary of data and statistical results
• Chi-square value
• df
• p-value
• Effect size (for test for group independence): Phi, Cramer’s V, w, or odds ratio
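The reporting elements listed above can be assembled into a single results string. The sketch below is plain Python and shows one possible layout, not a required format. The helper names `format_p` and `report_chi_square` are hypothetical, and the numbers come from the Chi-Square Tests and Symmetric Measures tables.

```python
def format_p(value, digits=2):
    """Format a probability or effect size without the leading zero, e.g. 0.236 -> '.24'."""
    return f"{value:.{digits}f}".lstrip("0")

def report_chi_square(chi_square, df, n, p, effect_name, effect_value):
    """Build a one-line summary: statistic, df, N, p-value, and effect size."""
    return (
        f"chi-square({df}, N = {n}) = {chi_square:.2f}, "
        f"p = {format_p(p)}, {effect_name} = {format_p(effect_value)}"
    )

summary = report_chi_square(10.431, 8, 54, 0.236, "Cramer's V", 0.311)
print(summary)  # chi-square(8, N = 54) = 10.43, p = .24, Cramer's V = .31
```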
Example reporting (P239)
Contingency table (2 X 3)

Student English proficiency * Major1 Crosstabulation
Count
                                      Major1
                                      non-English majors  English majors  Total
Student English proficiency   High    9                   4               13
                              Mid     25                  4               29
                              Low     12                  0               12
Total                                 46                  8               54
Application activities 8.5.3 (P234) with Chi-Square in SPSS