Chap. 9: The Chi-Square Test & The Analysis of Contingency Tables

Statistics for Business and
Economics
Chapter 9
Categorical Data Analysis
Learning Objectives
1. Explain χ² Test for Proportions
2. Explain χ² Test of Independence
3. Solve Hypothesis Testing Problems
• More Than Two Population Proportions
• Independence
Data Types
• Quantitative: discrete or continuous
• Qualitative
Qualitative Data
• Qualitative random variables yield responses that classify
– Example: gender (male, female)
• Measurement reflects number in category
• Nominal or ordinal scale
• Examples
– What make of car do you drive?
– Do you live on-campus or off-campus?
Hypothesis Tests for Qualitative Data
• Proportion, 1 population: Z test
• Proportion, 2 populations: Z test
• Proportion, more than 2 populations: χ² test
• Independence: χ² test
Chi-Square (χ²) Test for k Proportions
Multinomial Experiment
• n identical trials
• k outcomes to each trial
• Constant outcome probability, pk
• Independent trials
• Random variable is count, nk
• Example: ask 100 people (n) which of 3 candidates (k) they will vote for
One-Way Contingency Table
Shows number of observations in k independent groups (outcomes or variable levels)

Outcomes (k = 3)
Candidate              Tom   Bill   Mary   Total
Number of responses     35     20     45     100
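• Hypothesis test of whether the three probabilities are equal:
– H0: Prob(Tom) = Prob(Mary) = Prob(Bill) = 1/3
• Could we instead use three separate proportion tests, that is, test
– H1: Prob(Tom) = 1/3
– H2: Prob(Mary) = 1/3
– H3: Prob(Bill) = 1/3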
Calculate the probability of incorrectly rejecting the null using the "common sense" test based on the three individual t-statistics.
• To simplify the calculation, suppose that the sample proportions $\hat p_1$, $\hat p_2$ and $\hat p_3$ are independently distributed. Let $t_1$, $t_2$ and $t_3$ be the t-statistics:
$$t_1 = \frac{\hat p_1 - 1/3}{SE(\hat p_1)}, \qquad t_2 = \frac{\hat p_2 - 1/3}{SE(\hat p_2)}, \qquad t_3 = \frac{\hat p_3 - 1/3}{SE(\hat p_3)}$$
• The "common sense" test is: reject $H_0: p_1 = p_2 = p_3 = 1/3$ if |t1| > 1.96 and/or |t2| > 1.96 and/or |t3| > 1.96. What is the probability that this "common sense" test rejects H0 when H0 is actually true? (It should be 5%.)
Probability of incorrectly rejecting the null
$$
\begin{aligned}
&\Pr_{H_0}(|t_1| > 1.96 \text{ and/or } |t_2| > 1.96 \text{ and/or } |t_3| > 1.96) \\
&= \Pr_{H_0}(|t_1| > 1.96,\ |t_2| > 1.96,\ |t_3| > 1.96) \\
&\quad + \Pr_{H_0}(\text{exactly two of } |t_1|, |t_2|, |t_3| \text{ exceed } 1.96) \quad \text{(3 cases)} \\
&\quad + \Pr_{H_0}(\text{exactly one of } |t_1|, |t_2|, |t_3| \text{ exceeds } 1.96) \quad \text{(3 cases)} \\
&= .05 \times .05 \times .05 + 3 \times .05 \times .95 \times .05 + 3 \times .95 \times .95 \times .05 \\
&= .143 = 14.3\%
\end{aligned}
$$
(each joint probability factors into a product because t1, t2, t3 are independent by assumption)
which is not the desired 5%.
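The 14.3% figure can be checked numerically. The following is a minimal sketch in Python, assuming (as the slide does) that the three tests are independent and each rejects with probability .05 under H0. The variable names are illustrative only.

```python
from itertools import product

alpha = 0.05   # rejection probability of each individual test under H0
m = 3          # number of individual tests

# Enumerate every reject / do-not-reject pattern and add up the
# probability of the patterns that contain at least one rejection.
size_enumerated = 0.0
for pattern in product([True, False], repeat=m):
    if any(pattern):
        prob = 1.0
        for rejected in pattern:
            prob *= alpha if rejected else 1 - alpha
        size_enumerated += prob

# Shortcut: 1 minus the probability that none of the tests rejects.
size_shortcut = 1 - (1 - alpha) ** m

print(round(size_enumerated, 3), round(size_shortcut, 3))  # 0.143 0.143
```

Both routes give the same size, since the seven rejecting patterns are exactly the complement of "no test rejects".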
The size of a test is the actual rejection rate under the null hypothesis.
• The size of the "common sense" test isn't 5%.
• Its size actually depends on the correlation between t1, t2 and t3 (and thus on the correlation between the sample proportions of the three groups).
Two Solutions
• Use a different critical value in this procedure, not 1.96 (this is the "Bonferroni method"). This is rarely used in practice.
• Use a different test statistic that tests all the hypotheses at once.
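As an illustration of the Bonferroni idea, the sketch below splits α = .05 evenly across the three two-sided tests and looks up the matching normal critical value with Python's standard-library `statistics.NormalDist`. The even split and the normal approximation for the t-statistics are assumptions made for this sketch, not something the slides specify.

```python
from statistics import NormalDist

alpha = 0.05   # desired overall size
m = 3          # number of individual tests

per_test_alpha = alpha / m   # Bonferroni: each test gets alpha / m

# Two-sided critical values from the standard normal distribution.
z_unadjusted = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - per_test_alpha / 2)

# Size of the adjusted procedure if the tests were independent.
size_if_independent = 1 - (1 - per_test_alpha) ** m

print(round(z_unadjusted, 2), round(z_bonferroni, 2))
print(round(size_if_independent, 4))
```

With the larger critical value, the probability of at least one false rejection stays at or below 5%, at the cost of a more conservative test.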
Chi-Square (χ²) Test for k Proportions
• Tests equality (=) of proportions only
– Example: p1 = .2, p2 = .3, p3 = .5
• One variable with several levels
• Uses one-way contingency table
Conditions Required for a Valid χ² Test: One-Way Table
1. A multinomial experiment has been conducted
2. The sample size n is large: E(ni) is greater than or equal to 5 for every cell
χ² Test for k Proportions
Hypotheses & Statistic
1. Hypotheses
H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0 (the pi,0 are the hypothesized probabilities)
Ha: At least one pi is different from above
2. Test Statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$
where $n_i$ is the observed count and $E(n_i) = n p_{i,0}$ is the expected count
3. Degrees of Freedom: k – 1, where k is the number of outcomes
χ² Test Basic Idea
1. Compares observed count to expected count assuming null hypothesis is true
2. The closer the observed counts are to the expected counts, the more consistent the data are with H0
• Measured by squared difference relative to expected count
– Reject H0 for large values of χ²
Finding Critical Value
Example
What is the critical χ² value if k = 3, and α = .05?
df = k - 1 = 2, so the critical value is 5.991: reject H0 if χ² > 5.991, otherwise do not reject H0.
If ni = E(ni) in every cell, χ² = 0.

χ² Table (Portion)
              Upper Tail Area
DF     .995     …     .95      …     .05
1      ...      …     0.004    …     3.841
2      0.010    …     0.103    …     5.991
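The two critical values used in this chapter can be reproduced without a table. The sketch below relies on two special cases: with df = 2 the chi-square upper-tail area is exp(-x/2), and with df = 1 a chi-square variable is the square of a standard normal. It is only a check for these two cases, not a general chi-square quantile routine.

```python
import math
from statistics import NormalDist

alpha = 0.05

# df = 2: upper-tail area is exp(-x / 2), so solve exp(-x / 2) = alpha.
crit_df2 = -2 * math.log(alpha)

# df = 1: chi-square is Z squared, so the critical value is the
# square of the two-sided normal critical value.
crit_df1 = NormalDist().inv_cdf(1 - alpha / 2) ** 2

print(round(crit_df1, 3), round(crit_df2, 3))  # 3.841 5.991
```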
χ² Test for k Proportions
Example
As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?
χ² Test for k Proportions
Solution
• H0: p1 = p2 = p3 = 1/3
• Ha: At least 1 is different
• α = .05
• n1 = 63, n2 = 45, n3 = 72
• Critical Value(s): df = k – 1 = 2, so reject H0 if χ² > 5.991

Expected counts:
$$E(n_i) = n p_{i,0}, \qquad E(n_1) = E(n_2) = E(n_3) = 180\,(1/3) = 60$$
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$$

• Test Statistic: χ² = 6.3
• Decision: Reject at α = .05
• Conclusion: There is evidence of a difference in proportions
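The calculation for the performance-evaluation example can be sketched in a few lines of Python. The function name `chi_square_k_proportions` is made up for this illustration, and the critical value 5.991 is taken from the χ² table for df = 2.

```python
def chi_square_k_proportions(observed, null_probs):
    """Return the chi-square statistic and the expected counts n * p_i0."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

observed = [63, 45, 72]            # employees rating each method as fair
null_probs = [1 / 3, 1 / 3, 1 / 3]  # H0: p1 = p2 = p3 = 1/3

stat, expected = chi_square_k_proportions(observed, null_probs)

# Large-sample condition: every expected count should be at least 5.
large_enough = all(e >= 5 for e in expected)

critical_value = 5.991             # chi-square table, df = 2, alpha = .05
reject = stat > critical_value

print(round(stat, 2), large_enough, reject)  # 6.3 True True
```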
Hypothesis Tests for Qualitative Data
• Proportion, 1 population: Z test
• Proportion, 2 populations: Z test or χ² test
• Proportion, more than 2 populations: χ² test
• Independence: χ² test
Contingency Table Example
Left-Handed vs. Gender
• Dominant Hand: Left vs. Right
• Gender: Male vs. Female
• 2 categories for each variable, so called a 2 x 2 table
• Suppose we examine a sample of 300 children
Contingency Table Example
(continued)
Sample results organized in a contingency table (sample size = n = 300):
• 120 Females, 12 were left handed
• 180 Males, 24 were left handed

            Hand Preference
Gender     Left   Right   Total
Female       12     108     120
Male         24     156     180
Total        36     264     300
Contingency Table Example
Solution
• H0: p1 = p2
• Ha: p1 ≠ p2 (the two proportions differ)
• α = .05
• n1 = 12, n2 = 24 (left-handed females and left-handed males)
• Critical Value(s): df = (2 – 1)(2 – 1) = 1, so reject H0 if χ² > 3.841

Pooled proportion of left-handed children:
$$\bar p = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12$$
If the two proportions are equal, then
P(Left Handed | Female) = P(Left Handed | Male) = .12
i.e., we would expect (.12)(120) = 14.4 females to be left handed and (.12)(180) = 21.6 males to be left handed
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$

• Test Statistic: χ² = 0.7576
• Decision: Do not reject at α = .05, because 0.7576 < 3.841
• Conclusion: There is not sufficient evidence of a difference in proportions
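For a 2 x 2 table, the χ² statistic equals the square of the pooled two-proportion Z statistic, which is why either test can be used for two populations. The sketch below checks this on the hand-preference data. Variable names are illustrative, and 3.841 is the table value for df = 1.

```python
import math

left = [12, 24]      # left-handed females, left-handed males
n = [120, 180]       # number of females, number of males

p_pooled = sum(left) / sum(n)   # 36 / 300 = 0.12

# Chi-square statistic over the four cells (left and right, per gender).
chi2 = 0.0
for lefties, size in zip(left, n):
    for observed, prob in ((lefties, p_pooled), (size - lefties, 1 - p_pooled)):
        expected = size * prob
        chi2 += (observed - expected) ** 2 / expected

# Pooled two-proportion Z statistic.
p1, p2 = left[0] / n[0], left[1] / n[1]
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n[0] + 1 / n[1]))
z = (p1 - p2) / se

reject = chi2 > 3.841   # critical value for df = 1, alpha = .05

print(round(chi2, 4), round(z ** 2, 4), reject)  # 0.7576 0.7576 False
```

Because 0.7576 is below 3.841 (equivalently |z| is below 1.96), H0 is not rejected for these data.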
χ² Test of Independence
χ² Test of Independence
• Shows if a relationship exists between two qualitative variables
– One sample is drawn
– Does not show causality
• Uses two-way contingency table
χ² Test of Independence
Contingency Table
Shows number of observations from 1 sample jointly in 2 qualitative variables

                  House Location
House Style     Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160

Rows are the levels of variable 1 (house style); columns are the levels of variable 2 (house location).
Conditions Required for a Valid χ² Test: Independence
1. Multinomial experiment has been conducted
2. The sample size, n, is large: Eij is greater than or equal to 5 for every cell
χ² Test of Independence
Hypotheses & Statistic
1. Hypotheses
• H0: Variables are independent
• Ha: Variables are related (dependent)
2. Test Statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$$
where $n_{ij}$ is the observed count and $E_{ij}$ is the expected count
3. Degrees of Freedom: (r – 1)(c – 1), where r is the number of rows and c is the number of columns
χ² Test of Independence
Expected Counts
1. Statistical independence means joint probability equals product of marginal probabilities
2. Compute marginal probabilities and multiply for joint probability
3. Expected count is sample size times joint probability
Expected Count Example

                  Location
House Style     Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160

Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6
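Expected Count Calculation
$$E_{ij} = \frac{R_i C_j}{n}$$
where $R_i$ is the total of row i, $C_j$ is the total of column j, and n is the sample size

                     House Location
                  Urban          Rural
House Style     Obs.   Exp.    Obs.   Exp.    Total
Split-Level       63   54.6      49   57.4      112
Ranch             15   23.4      33   24.6       48
Total             78   78        82   82        160

Expected counts: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, 48·82/160 = 24.6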
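The rule "row total times column total, divided by n" can be applied to a whole table at once. The following is a minimal Python sketch in which the helper name `expected_counts` is hypothetical and the table is stored as a list of rows.

```python
def expected_counts(table):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Split-Level, Ranch. Columns: Urban, Rural.
house = [[63, 49],
         [15, 33]]

expected = expected_counts(house)
for row in expected:
    print([round(e, 1) for e in row])
```

Each row and column of the expected table has the same total as the observed table, which is a quick way to check the arithmetic.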
χ² Test of Independence
Example
As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

                  House Location
House Style     Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160
χ² Test of Independence
Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): reject H0 if χ² > 3.841

Eij ≥ 5 in all cells:
E11 = 112·78/160 = 54.6, E12 = 112·82/160 = 57.4, E21 = 48·78/160 = 23.4, E22 = 48·82/160 = 24.6

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$$
$$= \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} = 8.41$$

• Test Statistic: χ² = 8.41
• Decision: Reject at α = .05
• Conclusion: There is evidence of a relationship
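The whole test of independence for the house data fits in one short function. This is a sketch only: the function name `chi_square_independence` is invented for illustration, and the critical value 3.841 is read from the χ² table rather than computed.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Rows: Split-Level, Ranch. Columns: Urban, Rural.
house = [[63, 49],
         [15, 33]]

stat, df, expected = chi_square_independence(house)
reject = stat > 3.841   # chi-square table, df = 1, alpha = .05

print(round(stat, 2), df, reject)  # 8.41 1 True
```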
χ² Test of Independence
Thinking Challenge
You're a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

               Diet Pepsi
Diet Coke     No    Yes   Total
No            84     32     116
Yes           48    122     170
Total        132    154     286
χ² Test of Independence
Solution*
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): reject H0 if χ² > 3.841

Eij ≥ 5 in all cells:

                  Diet Pepsi
                No             Yes
Diet Coke     Obs.   Exp.    Obs.   Exp.    Total
No              84   53.5      32   62.5      116
Yes             48   78.5     122   91.5      170
Total          132   132      154   154       286

Expected counts: 116·132/286 = 53.5, 116·154/286 = 62.5, 170·132/286 = 78.5, 170·154/286 = 91.5

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$$
$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} = 54.29$$

• Test Statistic: χ² = 54.29
• Decision: Reject at α = .05
• Conclusion: There is evidence of a relationship
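A note on rounding, offered as a sketch: the value 54.29 appears to come from expected counts rounded to one decimal place. Using unrounded expected counts gives a slightly smaller statistic, and the decision is the same either way. The helper name below is hypothetical.

```python
def chi_square_from_expected(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Rows: Diet Coke No, Yes. Columns: Diet Pepsi No, Yes.
cola = [[84, 32],
        [48, 122]]

row_totals = [sum(row) for row in cola]
col_totals = [sum(col) for col in zip(*cola)]
n = sum(row_totals)

exact = [[r * c / n for c in col_totals] for r in row_totals]
rounded = [[round(e, 1) for e in row] for row in exact]

stat_exact = chi_square_from_expected(cola, exact)
stat_rounded = chi_square_from_expected(cola, rounded)

print(round(stat_exact, 2), round(stat_rounded, 2))
print(stat_exact > 3.841, stat_rounded > 3.841)
```

Carrying full precision through the calculation and rounding only the final statistic avoids this small discrepancy.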
Example
• The meal plan selected by 200 students is shown below:

                  Number of meals per week
Class Standing    20/week   10/week   none   Total
Fresh.                 24        32     14      70
Soph.                  22        26     12      60
Junior                 10        14      6      30
Senior                 14        16     10      40
Total                  70        88     42     200
Example
(continued)
• The hypothesis to be tested is:
H0: Meal plan and class standing are independent
(i.e., there is no relationship between them)
H1: Meal plan and class standing are dependent
(i.e., there is a relationship between them)
Example: Expected Cell Frequencies
(continued)
Expected cell frequencies if H0 is true:
$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$
Example for one cell (Junior, 20/wk):
$$f_e = \frac{30 \times 70}{200} = 10.5$$

Expected:
                  Number of meals per week
Class Standing    20/wk   10/wk   none   Total
Fresh.             24.5    30.8   14.7      70
Soph.              21.0    26.4   12.6      60
Junior             10.5    13.2    6.3      30
Senior             14.0    17.6    8.4      40
Total                70      88     42     200
Example: The Test Statistic
(continued)
• The test statistic value is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(24 - 24.5)^2}{24.5} + \frac{(32 - 30.8)^2}{30.8} + \cdots + \frac{(10 - 8.4)^2}{8.4} = 0.709$$
$\chi^2_{0.05}$ = 12.592 from the chi-squared distribution with (4 – 1)(3 – 1) = 6 degrees of freedom
Example: Decision and Interpretation
(continued)
The test statistic is $\chi^2_{STAT}$ = 0.709; $\chi^2_{0.05}$ with 6 d.f. = 12.592
Decision Rule: If $\chi^2_{STAT}$ > 12.592, reject H0; otherwise, do not reject H0
Here, $\chi^2_{STAT}$ = 0.709 < $\chi^2_{0.05}$ = 12.592, so do not reject H0
Conclusion: there is not sufficient evidence that meal plan and class standing are related at α = 0.05
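The meal-plan example can be reproduced end to end, including an upper-tail probability. The sketch below uses the closed form of the chi-square upper tail that holds when the degrees of freedom are even (here df = 6). The helper names are made up for illustration, and the shortcut does not apply to odd df.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!"""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# Rows: Fresh., Soph., Junior, Senior. Columns: 20/week, 10/week, none.
meals = [[24, 32, 14],
         [22, 26, 12],
         [10, 14, 6],
         [14, 16, 10]]

row_totals = [sum(row) for row in meals]
col_totals = [sum(col) for col in zip(*meals)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
stat = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(meals, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

p_value = chi2_upper_tail_even_df(stat, df)
reject = stat > 12.592   # chi-square table, df = 6, alpha = .05

print(round(stat, 3), df, reject)  # 0.709 6 False
```

The upper-tail probability at the observed statistic is far above .05, which agrees with the table-based decision not to reject H0.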
Conclusion
1. Explained χ² Test for Proportions
2. Explained χ² Test of Independence
3. Solved Hypothesis Testing Problems
• More Than Two Population Proportions
• Independence