The Chi-Square Distribution

Goodness Of Fit
The purpose of a chi-square goodness-of-fit test is to compare
an observed distribution to an expected distribution.
For example, suppose there are four entrances to a building. You want to
know if the four entrances are equally used. You observe a random sample
of 400 people entering the building:
Entrance    Observed Frequency    Expected Frequency
Main               140                   100
Back               120                   100
Side 1              90                   100
Side 2              50                   100
Total              400                   400
H0: pM = pB = pS1 = pS2
H1: The proportions are not all equal.
If the entrances are equally utilized, we would expect each entrance to be
used approximately 25% of the time. Is the difference shown above
statistically significant?
Chi Square Test
If the observed frequencies are obtained from a random sample and each
expected frequency is at least 5, the sampling distribution for the
goodness-of-fit test is a chi-square distribution with k - 1 degrees of
freedom, where k = the number of categories.
Test Statistic

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]

fo (O) = observed frequency in each category
fe (E) = expected frequency in each category
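The statistic can be computed directly from the two lists of frequencies. The following is a minimal Python sketch, not part of the original slides; the function name chi_square_statistic is an illustrative choice. It is applied to the four-entrance data given earlier.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Four-entrance example: 400 people, 100 expected at each entrance.
entrance_observed = [140, 120, 90, 50]
entrance_expected = [100, 100, 100, 100]

entrance_chi_square = chi_square_statistic(entrance_observed, entrance_expected)
entrance_df = len(entrance_observed) - 1  # k - 1 degrees of freedom

print(f"chi-square = {entrance_chi_square:.2f}, df = {entrance_df}")
```

For the entrance data this prints chi-square = 46.00, df = 3.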
Goodness-of-Fit Test: Equal
Expected Frequencies
Let fo and fe be the observed and expected frequencies,
respectively.
H0: There is no difference between the observed and
expected frequencies.
H0: p1 = p2 = p3 = p4
H1: There is a difference between the observed and the
expected frequencies.
H1: The proportions are not all equal.
The test uses k - 1 degrees of freedom (where
k = the number of categories).
See Table, p. 495.
[Figure: chi-square distributions for df = 3, df = 5, and df = 10, plotted against χ²]
EXAMPLE
The following information shows the number of employees
absent by day of the week at a large manufacturing plant.
At the .05 level of significance, is there a difference in the
absence rate by day of the week?
Day          Frequency
Monday          120
Tuesday          45
Wednesday        60
Thursday         90
Friday          130
Total           445
EXAMPLE continued
The expected frequency for each day is
(120 + 45 + 60 + 90 + 130)/5 = 89.
There are (5 - 1) = 4 degrees of freedom.
The critical value is 9.488 (Appendix B, p. 495).
Example continued
Day          Freq.   Expec.   (fo - fe)²/fe
Monday        120      89        10.80
Tuesday        45      89        21.75
Wednesday      60      89         9.45
Thursday       90      89         0.01
Friday        130      89        18.89
Total         445     445        60.90

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 60.90 \]

Because the computed value of chi-square is
greater than the critical value, H0 is
rejected.
We conclude that there is a difference in the
number of workers absent by day of the week.
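The absence example can be reproduced in a few lines. This is a hedged sketch rather than the textbook's procedure: the function name equal_frequency_test and the returned dictionary keys are illustrative, and the critical value 9.488 is taken from the example rather than computed.

```python
def equal_frequency_test(observed, critical_value):
    """Goodness-of-fit test when every category has the same expected frequency."""
    k = len(observed)
    expected = sum(observed) / k  # same expected count for each category
    statistic = sum((fo - expected) ** 2 / expected for fo in observed)
    return {
        "expected": expected,
        "df": k - 1,
        "chi_square": statistic,
        "reject_h0": statistic > critical_value,
    }


absences = [120, 45, 60, 90, 130]  # Monday through Friday
# Critical value for the .05 level with 4 degrees of freedom, as quoted in the example.
result = equal_frequency_test(absences, critical_value=9.488)

print(result["expected"], result["df"], round(result["chi_square"], 2), result["reject_h0"])
```

The decision rule is kept as a plain comparison against a table value, which mirrors how the example itself proceeds.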


Example
Goodness of Fit
A seller of baseball cards wants to know if the demand for the following
6 cards is the same.

Card             Cards Sold
Tom Seaver           13
Nolan Ryan           33
Ty Cobb              14
George Brett          7
Hank Aaron           36
Johnny Bench         17
Total               120
Goodness of Fit Test
MegaStat
                Observed   Expected      O - E   (O - E)² / E   % of chisq
Tom Seaver         13         20        -7.000       2.450          7.12
Nolan Ryan         33         20        13.000       8.450         24.56
Ty Cobb            14         20        -6.000       1.800          5.23
George Brett        7         20       -13.000       8.450         24.56
Hank Aaron         36         20        16.000      12.800         37.21
Johnny Bench       17         20        -3.000       0.450          1.31
Total             120        120         0.000      34.400        100.00

34.40 chi-square
5 df
1.98E-06 p-value
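The columns of this output, and the p-value, can be checked with the Python standard library. The sketch below is an assumption about how such numbers can be obtained, not a description of MegaStat's internals. The helper chi_square_p_value is a hypothetical name; it uses the finite-sum closed forms of the chi-square upper-tail probability that hold for whole-number degrees of freedom.

```python
import math


def chi_square_p_value(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


cards = ["Tom Seaver", "Nolan Ryan", "Ty Cobb", "George Brett", "Hank Aaron", "Johnny Bench"]
sold = [13, 33, 14, 7, 36, 17]
expected = sum(sold) / len(sold)  # 20 cards each if demand is equal

contributions = [(o - expected) ** 2 / expected for o in sold]
chi_square = sum(contributions)
p_value = chi_square_p_value(chi_square, df=len(sold) - 1)

# One row per card: O - E, (O - E)^2 / E, and percent of the chi-square total.
for name, o, c in zip(cards, sold, contributions):
    print(f"{name:<13} {o - expected:8.3f} {c:8.3f} {100 * c / chi_square:7.2f}")
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.2e}")
```

A statistics library would normally supply the tail probability; the closed form is used here only to keep the sketch free of third-party dependencies.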
Goodness Of Fit
(unequal frequencies)
Example - Goodness Of Fit
(unequal frequencies)
The Bank of America (BoA) credit card department knows from national
US government records that 5% of all US VISA card holders have no
high school diploma, 15% have a high school diploma, 25% have some
college, and 55% have a college degree. Given the information below, at
the 1% level of significance, can we conclude that BoA card holders are
significantly different from the rest of the nation?
Education         Observed Frequency   Expected Frequency
Some HS                  50                  25   = (500)(.05)
HS Diploma              100                  75   = (500)(.15)
Some College            190                 125   = (500)(.25)
College Degree          160                 275   = (500)(.55)
Total                   500                 500
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 115.22 \]

\[ \chi^2_C = 11.345 \]

df = (4 - 1) = 3
Reject H0
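When the expected frequencies are unequal, each one is the sample size times the hypothesized proportion. A short Python sketch of this example follows. The variable names are illustrative, and the critical value 11.345 is copied from the example rather than computed.

```python
# Hypothesized national proportions and the observed BoA sample counts.
proportions = {"Some HS": 0.05, "HS Diploma": 0.15, "Some College": 0.25, "College Degree": 0.55}
observed = {"Some HS": 50, "HS Diploma": 100, "Some College": 190, "College Degree": 160}

n = sum(observed.values())  # 500 card holders in the sample
expected = {level: n * p for level, p in proportions.items()}

chi_square = sum(
    (observed[level] - expected[level]) ** 2 / expected[level] for level in proportions
)
df = len(proportions) - 1
critical_value = 11.345  # 1% level, df = 3, as quoted in the example
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```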
Limitations of Chi-Square
1.) If there are only 2 cells, the expected frequency in each
cell should be at least 5.
2.) For more than 2 cells, chi-square should not be used if
more than 20% of the cells have expected frequencies less
than 5.
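The two rules above are easy to check before running a test. The function below is a minimal sketch; its name, parameters, and default thresholds are illustrative choices that simply encode the two rules as stated, and the second list of expected frequencies is hypothetical.

```python
def expected_frequencies_ok(expected, minimum=5, max_share_below=0.20):
    """Check the two rules of thumb for expected cell frequencies."""
    below = sum(1 for fe in expected if fe < minimum)
    if len(expected) <= 2:
        # Rule 1: with only 2 cells, every expected frequency must reach the minimum.
        return below == 0
    # Rule 2: with more than 2 cells, at most 20% of cells may fall below the minimum.
    return below / len(expected) <= max_share_below


print(expected_frequencies_ok([25, 75, 125, 275]))   # credit card example: True
print(expected_frequencies_ok([3, 3, 14, 20, 60]))   # hypothetical: 2 of 5 cells below 5, False
```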
Roll-Of-The-Die Experiment
Outcome   Observed Frequency   Expected Frequency
1                 3                    5
2                 6                    5
3                 2                    5
4                 3                    5
5                 9                    5
6                 7                    5
TOTAL            30                   30
Two-thirds of the computed chi-square value is
accounted for by just two categories (outcomes).
Although the expected frequency is not less than
5, too much weight may be given to these
categories. More experimental trials should be
conducted to increase the number of observations.
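The "two-thirds" claim can be verified from the die data. The sketch below is illustrative only and uses hypothetical variable names. It computes each outcome's contribution to chi-square and the share taken by the two largest contributions.

```python
observed = [3, 6, 2, 3, 9, 7]  # outcomes 1 through 6
expected = [5] * 6             # 30 rolls, 5 expected per face

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Rank the outcomes by how much each contributes to the statistic.
ranked = sorted(enumerate(contributions, start=1), key=lambda pair: pair[1], reverse=True)
top_two_outcomes = [outcome for outcome, _ in ranked[:2]]
top_two_share = sum(value for _, value in ranked[:2]) / chi_square

print(f"chi-square = {chi_square:.2f}")
print(f"outcomes {top_two_outcomes} account for {top_two_share:.1%} of it")
```

Outcomes 5 and 3 contribute 3.2 and 1.8 of the total 7.6, which is about 66%.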
Goodness of Fit Test
MegaStat
           observed   expected      O - E   (O - E)² / E   % of chisq
               3        5.000      -2.000       0.800         10.53
               6        5.000       1.000       0.200          2.63
               2        5.000      -3.000       1.800         23.68
               3        5.000      -2.000       0.800         10.53
               9        5.000       4.000       3.200         42.11
               7        5.000       2.000       0.800         10.53
Total         30       30.000       0.000       7.600        100.00

7.60 chi-square
5 df
.1797 p-value
Independence &
Contingency Tables
Contingency Table Analysis
A contingency table is used to investigate whether two
traits or characteristics are related. Each observation is
classified according to two criteria.
The degrees of freedom is equal to:
df = (# rows - 1)(# columns - 1).
The expected frequency is computed as:
Expected Frequency = (row total)(column total)/Grand Total
EXAMPLE
Is there a relationship between the location of an accident
and the gender of the person involved in the accident? A
sample of 150 accidents reported to the police was
classified by location and gender. At the .05 level of
significance, can we conclude that gender and the location
of the accident are related?
Gender    Work   Home   Other   Total
Male       60     20     10      90
Female     20     30     10      60
Total      80     50     20     150
EXAMPLE continued
The expected frequency for the work-male intersection is computed
as (90)(80)/150=48. Similarly, you can compute the expected
frequencies for the other cells.
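The remaining cells follow the same (row total)(column total)/Grand Total rule. As a small Python sketch, with a function name chosen only for illustration, all six expected frequencies can be produced at once:

```python
def expected_table(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Work, Home, Other.
accidents = [
    [60, 20, 10],
    [20, 30, 10],
]
expected = expected_table(accidents)

for gender, row in zip(["Male", "Female"], expected):
    print(gender, row)
```

The first entry of the Male row is the 48 computed above; the rest are 30 and 12 for Male, and 32, 20, and 8 for Female.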
H0: Gender and location are not related.
H1: Gender and location are related.
EXAMPLE continued
H0 is rejected if the computed value of χ² is greater than
5.991. There are (2 - 1)(3 - 1) = 2 degrees of freedom.
Find the value of χ².
2

60  48 2

48

10  82
 ... 
8
 16 .667
H0 is rejected. We conclude that gender and location are related.
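The whole contingency-table test can be sketched in Python as follows. This is an illustration under stated assumptions, not the textbook's code: the function name independence_test and its return keys are hypothetical, and the critical value 5.991 is taken from the example.

```python
def independence_test(observed, critical_value):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (fo - fe) ** 2 / fe

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return {"chi_square": statistic, "df": df, "reject_h0": statistic > critical_value}


# Rows: Male, Female. Columns: Work, Home, Other.
accidents = [
    [60, 20, 10],
    [20, 30, 10],
]
result = independence_test(accidents, critical_value=5.991)  # .05 level, df = 2

print(f"chi-square = {result['chi_square']:.3f}, df = {result['df']}, "
      f"reject H0: {result['reject_h0']}")
```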
MegaStat Example
Contingency Tables
A crime agency wants to know if a male released from prison and returned to
his hometown has an easier (or more difficult) time adjusting to civilian life.
Residence After                   Adjustment to Civilian Life
Release From Prison   Outstanding   Good   Fair   Unsatisfactory   Total
Hometown                   27        35     33          25          120
Not Hometown               13        15     27          25           80
Total                      40        50     60          50          200

MegaStat
Chi-square Contingency Table Test for Independence

                          Outstanding    Good     Fair   Unsatisfactory    Total
Hometown      Observed         27         35       33          25           120
              Expected       24.00      30.00    36.00       30.00        120.00
Not Hometown  Observed         13         15       27          25            80
              Expected       16.00      20.00    24.00       20.00         80.00
Total         Observed         40         50       60          50           200
              Expected       40.00      50.00    60.00       50.00        200.00

5.73 chi-square
3 df
.126 p-value
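The chi-square value and p-value reported for this table can be checked with the standard library alone. The sketch below is an assumption about one way to reproduce the numbers, not MegaStat's implementation. The p-value line uses the closed form of the chi-square upper tail that is valid only for 3 degrees of freedom, which is what this 2 x 4 table has.

```python
import math

# Rows: Hometown, Not Hometown. Columns: Outstanding, Good, Fair, Unsatisfactory.
adjustment = [
    [27, 35, 33, 25],
    [13, 15, 27, 25],
]

row_totals = [sum(row) for row in adjustment]
col_totals = [sum(col) for col in zip(*adjustment)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(adjustment):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total  # expected frequency
        chi_square += (fo - fe) ** 2 / fe

df = (len(row_totals) - 1) * (len(col_totals) - 1)
assert df == 3, "the closed form below holds only for 3 degrees of freedom"

# Upper-tail probability for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (
    math.erfc(math.sqrt(chi_square / 2))
    + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2)
)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Because the p-value is above .05, this data would not lead to rejecting independence at the .05 level.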