
Week 7
Chi-square
READING
Text reading
In Dancey and Reidy, 3rd edition, this is Chapter 8.
In the 2007 fourth edition, it is Chapter 9.
CHAPTER OVERVIEW
Earlier, you learned how to analyse the relationship between two continuous variables
using Pearson's r. You also learned what was meant by a correlation coefficient, and
that r is a measure of effect size. You have seen how to represent such relationships
on scatterplots. In this chapter, we discuss the relationship between categorical
variables.
Chi-square (χ2) is a measure of the association between two categorical variables.
You have learned about categorical variables in Chapter 1. If, for instance, we classify
people into groups based on gender (i.e., males vs. females), and type of high school
they have attended (public vs. private), these are examples of categorical variables. In
the same way, if we classify people by ethnic group, religion, or the country in which
they live, these are all categorical judgments. In this chapter then, you will learn how
to:
1- analyse the association between categorical variables
2- report another measure of effect (Cramer's V)
3- report the results of such analyses.
But the same advice as given before applies here. It is important to get a general idea
of how this statistic might be useful to you in the future. It is not important to try to
memorise everything in that textbook. When you need to do this in real life, then you
need that book.
One final point: quite often (in our experience with graduate students) we compute
chi-square from raw data tallies without using SPSS. There are several Internet
sites that we use when we do this. So if you ever find yourself doing chi-squares,
email Greg Yates, who can show you several other quick ways to compute them directly
from tallied data.
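As a rough illustration of what computing chi-square directly from tallies involves, here is a minimal Python sketch. It is not the method used by any of the Internet sites mentioned above. The function names and the tallies are invented for the example, the goodness-of-fit function assumes equal expected frequencies, and the p-value helper is only valid for 1 degree of freedom (a 2x2 table).

```python
import math


def chi_square_goodness_of_fit(observed):
    """One-variable chi-square, assuming equal expected frequencies."""
    n = sum(observed)
    expected = n / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, len(observed) - 1  # statistic, degrees of freedom


def chi_square_independence(table):
    """Pearson chi-square for an r x c table of counts (no Yates' correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total x column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail probability for chi-square with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical tallies: rows are one variable, columns are the other.
tallies = [[30, 10],
           [15, 25]]
chi2, df = chi_square_independence(tallies)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value_df1(chi2):.4f}")
```

For tables larger than 2x2 the statistic and degrees of freedom are still correct, but the p-value would need a chi-square table or a statistics package.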
KEY TERMS
1- Frequency counts
2- Categorical variables
3- Pearson’s chi-square
4- Phi
5- Cramer’s V
6- Lambda
KEY POINTS
1- Categorical variables take a value that is one of several possible categories.
2- Categorical variables have no numerical meaning. Examples of categorical
variables include hair color, gender, field of study, college attended, and
political affiliation.
3- In a study asking respondents to identify themselves according to their field of
study as physical education or early childhood education, each respondent will
answer with exactly one of these categories. These are the values the variable
takes. Physical education is not a variable but field of study is.
4- Often categorical variables are disguised as quantitative variables. For
example, one might record gender information coded as male = 0 and female
= 1.
5- With frequency counts, the data are not scores but the number of
participants who fall into each category. χ2 is particularly appropriate for
frequency count data.
6- χ2 is a measure of association or relationship between categorical variables,
developed by Karl Pearson in 1900. It enables us to see whether the
frequency counts we obtain when asking participants which category they are
in differ significantly from the frequency counts we would expect by
chance.
7- One-variable χ2 (goodness-of-fit test) is used when we have one variable only.
8- Independence χ2 (2x2) is used when we are looking for association between
two variables, each with two categories. For example, the relationship between
gender (males vs. females) and field of study (physical education vs. early
childhood education).
9- Independence χ2 (r x 2) is used when we are looking for association between
two variables, where one variable has two categories, such as gender (males vs.
females), and the other variable has more than two categories, such as field of
study (physical education vs. early childhood education vs. curriculum
design).
10- A χ2 probability of .05 or less is commonly interpreted by social scientists as
justification for rejecting the null hypothesis that the row variable is unrelated
(that is, only randomly related) to the column variable. Put bluntly, such a result
indicates a significant relationship between the two categorical variables.
11- Phi and Cramer’s V are measures of the strength of association between two
categorical variables. Phi is used with 2x2 contingency tables, where each
variable has only two categories. If one of the two categorical variables contains
more than two categories, then Cramer’s V is preferred to Phi because Phi
can then exceed 1, whereas Cramer’s V always lies between 0 and 1.
12- Lambda measures the proportional reduction in error that is achieved when
membership of a category of one variable is used to predict category
membership on the other variable. A lambda value of 1 means that one
variable perfectly predicts the other, whereas a lambda value of 0 indicates
that one variable does not predict the other.
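To make the effect-size measures in points 11 and 12 more concrete, here is a hedged Python sketch of how Cramer's V and lambda can be obtained from a table of frequency counts. The function names and example tallies are our own inventions. Lambda is computed in its asymmetric form (predicting the column variable from the row variable), which is an assumption about which version is wanted. For a 2x2 table, the Cramer's V returned here is the same number as Phi.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square for an r x c table of frequency counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (observed - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for observed, c in zip(row, col_totals)
    )


def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (N * (smaller table dimension - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(pearson_chi_square(table) / (n * k))


def lambda_columns_from_rows(table):
    """Proportional reduction in error when rows are used to predict columns."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors made by always guessing the most common column category
    errors_without = n - max(col_totals)
    # Errors made by guessing the most common column category within each row
    errors_with = n - sum(max(row) for row in table)
    if errors_without == 0:
        raise ValueError("lambda is undefined: only one column category occurs")
    return (errors_without - errors_with) / errors_without


# Hypothetical tallies, not taken from any output in this unit.
tallies = [[30, 10],
           [15, 25]]
print(f"Cramer's V = {cramers_v(tallies):.3f}")
print(f"lambda     = {lambda_columns_from_rows(tallies):.3f}")
```

A table in which every row falls into a single, different column (for example [[40, 0], [0, 40]]) gives both V = 1 and lambda = 1, matching the description of perfect prediction above.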
Web activities
http://www.ruf.rice.edu/~lane/stat_sim/contingency/index.html
This site supposedly provides a simulation of contingency tables, of which the
chi-square test is one type. To be honest, we have played with it, but did not feel
it helped us much.
Chi-square calculators are available on the Internet. Here are two of them, which
might be useful if you need to use chi-square in your projects (note: these pages are
calculators, not demos, but you can play with them by inventing data to pop into the
cells).
http://www.quantitativeskills.com/sisa/statistics/twoby2.htm
http://statpages.org/ctab2x2.html
ACTIVE LEARNING AND OPPORTUNITIES
Multiple Choice Questions
1- A one-variable χ2 is also called
A- Goodness-of-fit test
B- χ2 test of independence
C- χ2 4x2
D- χ2 2x2
2- The value of χ2 will always be
A- Positive
B- Negative
C- High
D- It depends
3- The Yates’ correction is sometimes used by researchers when:
A- Cell sizes are huge
B- Cell sizes are small
C- They analyse data from 2 x 2 contingency tables
D- Both B and C
4- If you are performing a 4 x 4 χ2 analysis and find you have violated the
assumptions, then you need to:
A- Look at the results for a Fisher’s exact probability test
B- Look to see whether it is possible to collapse categories
C- Investigate the possibility of a t-test
D- Disregard the assumption that you have violated
5- Which of the statements below is false about chi-square testing?
A- Chi square tests can be used to check how well a theoretical model fits the data
B- Chi square can be applied to continuous variables; it just means that a larger
contingency table is needed.
C- Chi square is used in research to measure the association between two
categorical variables.
The following questions are based on the output shown in the tables below.
gender * academic background Crosstabulation
Count
                 academic background
gender           0        1        Total
0                28       17       45
1                33       22       55
Total            61       39       100
Chi-Square Tests
                               Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             15.300(b)   1    .001
Continuity Correction(a)       15.300      1    .001
Likelihood Ratio               15.300      1    .001
Fisher's Exact Test                                           .001         .000
Linear-by-Linear Association   15.300      1    .001
N of Valid Cases               100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 17.55.
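As a rough check on footnote b, the sketch below recomputes the expected counts from the crosstabulation counts above, using expected count = row total x column total / N. The variable names are ours and are not part of the SPSS output.

```python
# Observed counts from the gender * academic background crosstabulation
observed = [[28, 17],
            [33, 22]]

row_totals = [sum(row) for row in observed]        # 45, 55
col_totals = [sum(col) for col in zip(*observed)]  # 61, 39
n = sum(row_totals)                                # 100

# Expected count for each cell if the two variables were unrelated
expected = [[r * c / n for c in col_totals] for r in row_totals]

min_expected = min(min(row) for row in expected)
cells_below_5 = sum(1 for row in expected for e in row if e < 5)

print(f"minimum expected count = {min_expected:.2f}")
print(f"cells with expected count below 5 = {cells_below_5}")
```

The smallest expected count comes from the smallest row total and the smallest column total (45 x 39 / 100), which agrees with the 17.55 reported in the footnote.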
Symmetric Measures
                                    Value    Approx. Sig.
Nominal by Nominal   Phi            .366     .010
                     Cramer's V     .366     .010
N of Valid Cases                    100
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
1- How many students are in the total sample for the analysis?
2- Has the analysis violated the assumption that no cell has an expected
count of less than 5? Explain.
3- Is there a significant relationship between gender and academic
background? Explain.
4- What is the strength of the relationship between gender and academic
background, if any? Is it significant? Why?