Chi Square

Quantitative Methods
Part 3
Chi-Squared Statistic
Recap on T-Statistic
• It used the mean and standard error of a population sample
• The data is on an “interval” or scale
• Mean and standard error are the parameters
• This approach is known as parametric
• Another approach is non-parametric testing

Introduction to Chi-Squared
• It does not use the mean and standard error of a population sample
• Each respondent can only choose one category (unlike the scale in the T-Statistic)
• The expected frequency in every category must be at least 5 for the test to be valid
• If any category has an expected frequency of less than 5, you need to increase your sample size

Example using Chi-Squared

• “Is there a preference amongst the UW student population for a particular web browser?” (Dr C Price’s Data)
◦ They could only indicate one choice
◦ These are the observed frequencies of responses from the sample

                       Firefox  IExplorer  Safari  Chrome  Opera
Observed frequencies        30          6       4       8      2
Was it just chance?

How confident am I?
◦ Was the sample representative of all UW
students?
◦ Was it just chance?

• Chi-Squared test for significance
◦ There are some variations on the test
◦ The simplest starts from the null hypothesis
• H0: The students show “no preference” for a particular browser
Chi-Squared: “Goodness of fit” (No preference)
• H0: The students show no preference for a particular browser
• This leads to a hypothetical, or expected, distribution of frequencies
◦ We would expect an equal number of respondents per category
◦ We had 50 respondents and 5 categories, so 50 / 5 = 10 per category

                       Firefox  IExplorer  Safari  Chrome  Opera
Expected frequencies        10         10      10      10     10
Expected frequency table
Stage 1: Formulation of Hypothesis
• H0: There is no preference in the underlying population for the factor suggested.
• H1: There is a preference in the underlying population for the factor suggested.
• The basis of the chi-squared test is to compare the observed frequencies against the expected frequencies
Stage 2: Expected Distribution

• As our “null hypothesis” is no preference, we need to work out the expected frequencies:
◦ You would expect each category to have the same number of respondents
◦ Show this in an “Expected frequency” table
◦ Each expected frequency has to be at least 5 for the test to be valid

                       Firefox  IExplorer  Safari  Chrome  Opera
Expected frequencies        10         10      10      10     10
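The expected distribution under “no preference” can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide: the variable names are made up for the example, and the validity check assumes the common rule of thumb that every expected frequency is at least 5.

```python
# Observed counts from the browser survey (50 respondents, 5 categories).
observed = {"Firefox": 30, "IExplorer": 6, "Safari": 4, "Chrome": 8, "Opera": 2}

n = sum(observed.values())   # total respondents
k = len(observed)            # number of categories

# Under "no preference", every category gets the same expected count.
expected = {browser: n / k for browser in observed}

# Rule-of-thumb check (assumption: "at least 5" in every category).
valid = all(fe >= 5 for fe in expected.values())

print(expected)
print("expected frequencies large enough:", valid)
```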
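Stage 3a: Level of confidence

• Choose the level of confidence, expressed as a significance level (often 0.05)
◦ 0.05 means we accept a 5% chance of concluding there is a preference when the result is really just chance
◦ Equivalently, we want to be 95% confident that our conclusion is not due to chance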
Stage 3b: Degrees of freedom
• We need to find the degrees of freedom (df)
• This is calculated from the number of categories: df = number of categories − 1
◦ We had 5 categories, so df = 5 − 1 = 4
Stage 3: Critical value of Chi-Squared

• In order to compare our calculated chi-squared value with the “critical value” in the chi-squared table we need:
◦ Level of confidence (0.05)
◦ Degrees of freedom (4)

• Our critical value from the table = 9.49
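If no printed table is to hand, the critical value can be reproduced numerically. The sketch below is not how the slides obtain it (they use the table). It relies on the fact that, for an even number of degrees of freedom, the upper-tail probability of the chi-squared distribution has a closed form, and it bisects for the point where that tail equals 0.05. The function names are hypothetical and the shortcut does not work for odd df.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-squared; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2.0
    # For df = 2m: exp(-x/2) * sum_{i < m} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

def chi2_critical_even_df(alpha, df, lo=0.0, hi=200.0):
    """Bisect for the x whose upper-tail probability equals alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid   # tail still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

categories = 5
df = categories - 1
critical = chi2_critical_even_df(0.05, df)
print(df, round(critical, 2))
```

Rounded to two decimals this agrees with the table value of 9.49 used on the slide.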
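Stage 4: Calculate statistics
• We compare the observed against the expected for each category by taking the difference
• We square each difference and divide it by the expected frequency
• We add all of them up

            Firefox  IExplorer  Safari  Chrome  Opera
Observed         30          6       4       8      2
Expected         10         10      10      10     10

$$\chi^2 = \sum \frac{(F_O - F_E)^2}{F_E} = \frac{(30-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(4-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(2-10)^2}{10} = 40 + 1.6 + 3.6 + 0.4 + 6.4 = 52$$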
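The same calculation as a minimal Python sketch, assuming the standard chi-squared statistic (squared difference divided by the expected frequency, summed over categories). The list names are illustrative only.

```python
# Observed and expected counts, in the order
# Firefox, IExplorer, Safari, Chrome, Opera.
observed = [30, 6, 4, 8, 2]
expected = [10, 10, 10, 10, 10]

# Sum of (observed - expected)^2 / expected over all categories.
chi_squared = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(round(chi_squared, 2))
```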
Stage 5: Decision

• Can we reject the H0 that students show no preference for a particular browser?
◦ Our value of 52 is way beyond the critical value of 9.49. We are 95% confident the value did not occur by chance
• So yes, we can safely reject the null hypothesis
• Which browser do they prefer?
◦ Firefox, as it is way above the expected frequency of 10
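One way to back up the decision and the follow-up question is to look at how much each category contributes to the total. The sketch below takes the critical value 9.49 from the slide as given. The names `contributions` and `over_represented` are invented for this example.

```python
browsers = ["Firefox", "IExplorer", "Safari", "Chrome", "Opera"]
observed = [30, 6, 4, 8, 2]
expected = 10.0
critical_value = 9.49   # from the chi-squared table (0.05, df = 4)

# Contribution of each category to the chi-squared total.
contributions = {b: (fo - expected) ** 2 / expected
                 for b, fo in zip(browsers, observed)}
chi_squared = sum(contributions.values())

# Decision: reject H0 when the statistic exceeds the critical value.
reject_null = chi_squared > critical_value

# Only categories above their expected count can be "preferred".
over_represented = [b for b, fo in zip(browsers, observed) if fo > expected]

print("reject H0:", reject_null)
print("largest contribution:", max(contributions, key=contributions.get))
print("above expected:", over_represented)
```

Firefox is both the largest contributor and the only category above its expected frequency, which matches the conclusion on the slide.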
Chi-Squared: “No Difference from a Comparison Population”.

• RQ: Are drivers of high performance cars more likely to be involved in accidents?
◦ Sample n = 50, plus Market Research (MR) data on the proportion of people driving each category of car
◦ Once the expected frequencies under the null hypothesis have been worked out, the analysis is the same as the no preference calculation

       High Performance  Compact  Midsize  Full size
FO                   20       14        9          9
MR%                 10%      40%      30%        20%
FE        5 (10% of 50)       20       15         10
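The only new step in this variant is that the expected frequencies come from the comparison population rather than from an equal split. A small sketch of that step, using the market research percentages from the table (the dictionary names are illustrative):

```python
n = 50  # sample size from the slide

# Market research percentages for each car category.
market_share_pct = {
    "High Performance": 10,
    "Compact": 40,
    "Midsize": 30,
    "Full size": 20,
}

# Expected frequency = sample size * population proportion.
expected = {car: n * pct / 100 for car, pct in market_share_pct.items()}

print(expected)
```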
Chi-Squared test for “Independence”.

• What makes computer games fun?
• Review found the following
◦ Factors (Mastery, Challenge and Fantasy)
◦ Different opinion depending on gender
• Research sample of 50 males and 50 females

          Mastery  Challenge  Fantasy
Male           10         32        8
Female         24          8       18
Observed frequency table
What is the research question?

• A single sample with individuals measured on 2 variables
◦ RQ: “Is there a relationship between fun factor and gender?”
◦ H0: “There is no such relationship”

• Two separate samples representing 2 populations (male and female)
◦ RQ: “Do male and female players have different preferences for fun factors?”
◦ H0: “Male and female players do not have different preferences”
Chi-Squared analysis for “Independence”.
• Establish the null hypothesis (previous slide)
• Determine the critical value of chi-squared, which depends on the confidence limit (0.05) and the degrees of freedom.
◦ df = (R – 1)*(C – 1) = 1 * 2 = 2 (R=2, C=3)

          Mastery  Challenge  Fantasy
Male           10         32        8
Female         24          8       18

• Look up in chi-squared table
◦ Critical value of chi-squared = 5.99
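The degrees of freedom and the table lookup can be checked with a short sketch. It assumes a known property of the chi-squared distribution: with df = 2 the upper-tail probability is exp(-x/2), so the critical value has a closed form. This shortcut is specific to df = 2 and is not a general method.

```python
import math

rows, cols = 2, 3                 # 2 genders, 3 fun factors
df = (rows - 1) * (cols - 1)      # degrees of freedom for independence

alpha = 0.05
# For df = 2, P(X > x) = exp(-x / 2); solve exp(-x / 2) = alpha for x.
critical = -2.0 * math.log(alpha)

print(df, round(critical, 2))
```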
Chi-Squared analysis for “Independence”.

• Calculate the expected frequencies
◦ Add up each column and divide by the number of groups (in this case 2)
◦ Easier if you have an equal number for each gender (if not, come and see me)

               Mastery  Challenge  Fantasy  Respondents
Male (FO)           10         32        8           50
Female (FO)         24          8       18           50
Cat total           34         40       26
Male (FE)           17         20       13
Female (FE)         17         20       13
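The “add each column and divide by 2” shortcut is a special case of the usual rule for contingency tables: expected count = row total × column total / grand total. The sketch below uses that general rule, which also covers unequal group sizes. It is an illustration rather than the method shown on the slide, and the variable names are invented.

```python
observed = {
    "Male":   [10, 32, 8],    # Mastery, Challenge, Fantasy
    "Female": [24, 8, 18],
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count for each cell: row total * column total / grand total.
expected = {
    group: [row_totals[group] * ct / grand_total for ct in col_totals]
    for group in observed
}

print(col_totals)
print(expected)
```

With 50 males and 50 females this reduces to half of each column total, giving 17, 20 and 13 as in the table.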
Chi-Squared analysis for “Independence”.

• Calculate the statistics using the chi-squared formula
◦ Ensure you include both male and female data

$$\chi^2 = \frac{(10-17)^2}{17} + \frac{(32-20)^2}{20} + \dots + \frac{(24-17)^2}{17} + \frac{(8-20)^2}{20} + \dots = 24.01$$
               Mastery  Challenge  Fantasy
Male (FO)           10         32        8
Female (FO)         24          8       18
Male (FE)           17         20       13
Female (FE)         17         20       13
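A minimal sketch of the full sum over all six cells, using the observed and expected tables above. The nested-list layout and the names are assumptions made for the example.

```python
# Rows: Male, Female. Columns: Mastery, Challenge, Fantasy.
observed = [[10, 32, 8], [24, 8, 18]]
expected = [[17, 20, 13], [17, 20, 13]]

# Sum (FO - FE)^2 / FE over every cell, male and female rows alike.
chi_squared = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

print(round(chi_squared, 2))
```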
Stage 5: Decision

• Can we reject the null hypothesis?
◦ Our value of 24.01 is way beyond the critical value of 5.99. We are 95% confident the value did not occur by chance
• Conclusion: We are 95% confident that there is a relationship between gender and fun factor
• But what else can we get from this?
◦ Significant fun factor for males = Challenge
◦ Significant fun factor for females = Mastery and Fantasy
               Mastery  Challenge  Fantasy
Male (FO)           10         32        8
Female (FO)         24          8       18
Male (FE)           17         20       13
Female (FE)         17         20       13
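To see where the relationship comes from, one option is to list, for each gender, the factors whose observed count is above the expected count, together with each cell's contribution to the total. This is an informal reading aid sketched under that assumption, not a formal post-hoc test. The names are hypothetical.

```python
factors = ["Mastery", "Challenge", "Fantasy"]
observed = {"Male": [10, 32, 8], "Female": [24, 8, 18]}
expected = [17, 20, 13]   # same for both genders (equal group sizes)

above_expected = {}
for gender, counts in observed.items():
    above_expected[gender] = []
    for factor, fo, fe in zip(factors, counts, expected):
        contribution = (fo - fe) ** 2 / fe
        if fo > fe:
            above_expected[gender].append(factor)
        print(f"{gender:6} {factor:9} FO-FE={fo - fe:+d} "
              f"contribution={contribution:.2f}")

print(above_expected)
```

Males are above expectation only on Challenge, and females on Mastery and Fantasy, in line with the bullets on the slide.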
Workshop
• Work on Workshop 7 activities
• Your journal (Homework)
• Your Literature Review (Complete/update)

References


• Dr C. Price’s notes, 2010
• Gravetter, F. and Wallnau, L. (2003) Statistics for the Behavioral Sciences, New York: West Publishing Company