Towson University

OPRE504 Study Guide
Chapter 13
Chi-Square Tests for Counts
Common Procedures for Conducting Chi-Square Tests
1. State Hypotheses
2. Find Expected Values for Each Cell
3. Compute Squared Residuals for Each Cell
4. Standardize Squared Residuals Using Expected Value
5. Sum up Standardized Squared Residuals to Derive the Chi-Squared Statistic (χ²):
   $\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$
6. Find the Degrees of Freedom
   df = number of cells – 1 for Goodness of Fit tests;
   df = (# of rows – 1) x (# of columns – 1) for Homogeneity and Independence Tests.
7. Determine the Critical Value of χ² using the χ² table, based on df and the alpha level.
8. Decision: Reject H0 if χ² > critical value; otherwise, fail to reject H0.
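The procedure above can be sketched in a few lines of Python. This is only an illustration, not part of the handout: the function names and the die-roll counts are made up for the example, and the critical value is typed in from a chi-square table rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Steps 3-5: sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(chi2, critical_value):
    """Step 8: reject H0 only when the statistic exceeds the critical value."""
    return "reject H0" if chi2 > critical_value else "fail to reject H0"


# Hypothetical data (not from the handout): 60 rolls of a die,
# so 10 rolls are expected per face if the die is fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1   # goodness of fit: number of cells - 1
critical = 11.070        # chi-square table value for df = 5, alpha = 0.05
decision = decide(chi2, critical)
print(chi2, df, decision)
```

The same two helpers apply to all three tests in this chapter; only the way the expected counts and the degrees of freedom are obtained changes.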
I. Goodness of Fit Test
Purpose:
Compare the distribution of a single variable to an expected distribution. It is a test of how well the
distribution of counts in one categorical variable matches the distribution predicted by a model.
Typical Research Questions:
Whether a particular day of the week is more likely to show a gain in the DJIA than any other;
whether a die is fair (or some faces are more likely to appear); whether M&M's contain more candies of a
particular color than advertised; whether every number is equally likely to be drawn in a lottery;
whether there is employment discrimination based on ethnicity, etc.
Hypotheses:
H0: The distributions of observed counts and expected counts across all categories of a single
variable are the same (there is no particular pattern).
Ha: The distributions of observed counts and expected counts across all categories of a single
variable are different (there is a pattern in the distribution).
Degrees of Freedom:
df = number of categories in the categorical variable – 1
Chaodong Han
OPRE504
Data Analysis and Decisions Class Handout
Page 1 of 8
Q13.1 [Sharpe 2011, Chapter 13, Exercise 7, p.424] Maryland has a Pick-3 Lottery where 3
random digits are drawn each day. A fair game depends on every value (0-9) being equally likely
at each of the three positions. To investigate the randomness, we collect data of winning digits
over a 32-week period as shown in the following table. Is each of the digits from 0 to 9 equally
likely to be drawn?
Group            0       1       2        3       4        5       6        7        8        9
Observed Count   62      55      66       64      75       57      71       74       69       61
Observed (%)     9.480%  8.410%  10.092%  9.786%  11.468%  8.716%  10.856%  11.315%  10.550%  9.327%

1. State Hypotheses:
   H0:
   Ha:
2. Compute Expected Count and Standardized Squared Residuals (Steps 2-4)
If the digits are drawn at random, each of the 10 digits (0-9) has an equal chance of appearing, so
for a total of 654 digits the expected count in each group is 10% x 654 = 65.4.
Group  Observed Count  Observed %  Expected Count  Residual (Obs-Exp)  Squared Residual  Standardized Squared Residual
0      62              9.480%
1      55              8.410%
2      66              10.092%
3      64              9.786%
4      75              11.468%
5      57              8.716%
6      71              10.856%
7      74              11.315%
8      69              10.550%
9      61              9.327%

3. Chi-square Statistic
   χ² =
4. Determine Critical Value of χ²
   df = 10 (10 groups) – 1 = 9; 5% significance level: χ²* = CHIINV(0.05, 9) =
5. Decision
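One way to check the hand calculation for Q13.1 is the short Python sketch below. It is not part of the handout; the variable names are illustrative, and the critical value 16.919 is entered from a chi-square table (the value CHIINV(0.05, 9) should return to three decimals) rather than computed.

```python
# Observed counts for digits 0-9, copied from the Q13.1 table.
observed = [62, 55, 66, 64, 75, 57, 71, 74, 69, 61]

total = sum(observed)               # 654 digits in all
expected = total / len(observed)    # equal chance for each digit: 65.4

# One row per digit: (digit, observed, residual, squared, standardized squared).
rows = []
for digit, obs in enumerate(observed):
    residual = obs - expected
    rows.append((digit, obs, residual, residual ** 2, residual ** 2 / expected))

chi2 = sum(row[4] for row in rows)  # chi-square statistic
df = len(observed) - 1              # 10 groups - 1 = 9
critical = 16.919                   # table value for df = 9, alpha = 0.05

for row in rows:
    print("%d  %d  %+.1f  %.2f  %.3f" % row)
print("chi2 = %.3f, df = %d, reject H0: %s" % (chi2, df, chi2 > critical))
```

Each printed row corresponds to one line of the worksheet table, so the output can be compared column by column with the values filled in by hand.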
More Exercises:
Guided Example- Stock Market Patterns (pp.403-405);
Chapter 13: Exercises 2, 3, 4, 5, 8, 38.
II. Chi-Square Test of Homogeneity
Purpose:
Compare the distribution of counts for two or more groups on the same categorical variable. This
categorical variable has multiple categories.
Typical Research Questions:
Whether the responses to a survey question vary across different groups; for example,
“whether the distribution of responses about the importance of looking good is the same across
five countries (China, France, India, U.K. and U.S.).”
Hypotheses:
H0: The distribution of responses is homogeneous across all groups
Ha: The distribution of responses is not homogeneous across all groups
Degrees of Freedom:
df = (number of response categories – 1) x (number of groups – 1)
Q13.2 [Sharpe 2011, Chapter 13, Exercise 14, pp.425-6] A European manufacturer of
automobiles claims that its cars are preferred by the younger generation and would like to target
university students in its next ad campaign. Suppose we test its claim with our own survey using
a random survey of cars parked in the student and staff parking lots respectively at a large
university. The car brands are reported by country of origin in the following table. Are there
differences in the national origin of cars driven by students and staff?
           American Brand   European Brand   Asian Brand
Student    107              33               55
Staff      105              12               47

1. State Hypotheses
   H0:
   Ha:
2. Compute expected value

OBSERVED         Student   Staff   Total   Observed Brand Distribution
American Brand   107       105     212
European Brand   33        12      45
Asian Brand      55        47      102
Total            195       164     359     100.00%

EXPECTED         Student   Staff   Expected Total
American Brand
European Brand
Asian Brand

3. Compute Standardized Squared Differences

                 (obs-exp)          (obs-exp)^2        (obs-exp)^2/exp
                 Student   Staff    Student   Staff    Student   Staff
American Brand
European Brand
Asian Brand

4. Chi-square Statistic
   χ² =
5. Degrees of Freedom
   df = (no. of car brands – 1) x (no. of driver groups – 1) =
6. Critical Value of Chi-square χ²* for alpha level = 0.05 and df =
   χ²* = CHIINV
7. Decision
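The steps for Q13.2 can be checked with a sketch like the following. It is not part of the handout: the dictionary layout and names are illustrative. It computes each expected count as (row total x column total) / grand total, which gives the same result as multiplying the overall brand percentage by the group total. The critical value 5.991 is typed in from a chi-square table for df = 2 and alpha = 0.05.

```python
# Observed counts from the Q13.2 table: brand -> (Student, Staff).
observed = {
    "American": (107, 105),
    "European": (33, 12),
    "Asian": (55, 47),
}

# Column (group) totals and the grand total.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

chi2 = 0.0
expected = {}
for brand, row in observed.items():
    row_total = sum(row)
    # Expected count under homogeneity: row total x column total / grand total.
    expected[brand] = tuple(row_total * c / grand_total for c in col_totals)
    for obs, exp in zip(row, expected[brand]):
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(col_totals) - 1)   # (3 - 1) x (2 - 1)
critical = 5.991                                   # table value, df = 2, alpha = 0.05
print("chi2 = %.3f, df = %d, reject H0: %s" % (chi2, df, chi2 > critical))
```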
More exercises:
Chapter 13, Guided Example – Attitudes on Appearance (pp.410-411)
Chapter 13: Exercises 13, 14, 25, 26, 30, 37 and 41.
III. Chi-Square Test of Independence
Purpose:
When we have a two-variable contingency table for one sample and each variable has an
exhaustive list of categories, we would like to know whether one variable is independent of the
other variable. It uses the same calculation as a test of homogeneity.
Hypotheses:
H0: Two variables are independent of each other
Ha: Two variables are not independent of each other
Degrees of Freedom:
According to the contingency table, df = (no. of rows – 1) x (no. of columns – 1)
Q13.3 [Sharpe 2011, Chapter 13, Exercise 10, p.424] The following table shows the rank
attained by male and female officers in the New York City Police Department (NYPD). Do these
data indicate that men and women are equitably represented at all levels of the department?
Rank           Male     Female
Officer        21,900   4,281
Detective      4,058    806
Sergeant       3,898    415
Lieutenant     1,333    89
Captain        359      12
Higher Ranks   218      10

1. Hypotheses
2. Expected Counts
(1) Compute male and female proportions and total by category

               Male     Female   Total
Officer        21,900   4,281    26,181
Detective      4,058    806      4,864
Sergeant       3,898    415      4,313
Lieutenant     1,333    89       1,422
Captain        359      12       371
Higher Ranks   218      10       228
Total          31,766   5,613    37,379
Proportion                       100%
(2) Compute expected male and female counts by multiplying overall male and female
proportions with the total for each category. For example, male officers = 84.98% x
26,181 = 22,249.5; female captain = 15.02% x 371 = 55.7
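As a quick check of step (2), the sketch below reproduces the two worked figures from the totals in the table. It is an illustration only, with made-up variable names; it uses unrounded proportions, so the results can differ in the last digit from values obtained with 84.98% and 15.02%.

```python
# Totals by rank and by gender, copied from the table in step (1).
rank_totals = {
    "Officer": 26181,
    "Detective": 4864,
    "Sergeant": 4313,
    "Lieutenant": 1422,
    "Captain": 371,
    "Higher Ranks": 228,
}
male_total, female_total = 31766, 5613
grand_total = male_total + female_total   # 37,379

# Overall proportions of men and women in the department.
male_share = male_total / grand_total
female_share = female_total / grand_total

# Expected (male, female) count for each rank under independence.
expected = {
    rank: (male_share * total, female_share * total)
    for rank, total in rank_totals.items()
}

print("male share = %.2f%%, female share = %.2f%%"
      % (100 * male_share, 100 * female_share))
for rank, (exp_male, exp_female) in expected.items():
    print("%-12s %10.1f %10.1f" % (rank, exp_male, exp_female))
```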
Expected       Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks
Total          31,766   5,613

3. Compute Squared Residuals for All Cells

Residuals (obs-exp)            Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks

Squared Residuals (obs-exp)^2  Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks

4. Standardize Squared Residuals [(obs-exp)^2/exp]

Standardized Squares           Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks

5. Chi-Squared Statistic (χ²)
   χ² =
6. Determine Critical χ²*
   df = (number of rows – 1) x (number of columns – 1) =
   5% alpha level, χ²* = CHIINV(      ) =
7. Decision
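Steps 3 to 7 for Q13.3 can be sketched as below. This is not part of the handout: the names are illustrative, and the critical value 11.070 is entered from a chi-square table for df = 5 and alpha = 0.05. Besides the statistic, the sketch records each cell's standardized squared residual, which shows where the observed counts depart most from independence.

```python
# Observed counts from the Q13.3 table: rank -> (Male, Female).
observed = {
    "Officer": (21900, 4281),
    "Detective": (4058, 806),
    "Sergeant": (3898, 415),
    "Lieutenant": (1333, 89),
    "Captain": (359, 12),
    "Higher Ranks": (218, 10),
}
genders = ("Male", "Female")

gender_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(gender_totals)

# Standardized squared residual (obs - exp)^2 / exp for every cell.
contributions = {}
for rank, row in observed.items():
    rank_total = sum(row)
    for j, obs in enumerate(row):
        exp = rank_total * gender_totals[j] / grand_total
        contributions[(rank, genders[j])] = (obs - exp) ** 2 / exp

chi2 = sum(contributions.values())
df = (len(observed) - 1) * (len(genders) - 1)   # (6 - 1) x (2 - 1)
critical = 11.070                               # table value, df = 5, alpha = 0.05
largest = max(contributions, key=contributions.get)

print("chi2 = %.1f, df = %d, reject H0: %s" % (chi2, df, chi2 > critical))
print("largest contribution:", largest)
```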
More Exercises:
Chapter 13: Guided Example – Personal Appearance and Age (pp.415 – 416)
Chapter 13: Exercises 10, 11, 15, 16, 27, 28, 32, 35, 36, 39, 40, 42, 43, and 44
IV. Compare Two Proportions and Confidence Intervals for the Difference of Two Proportions
Q13.4 [Sharpe 2011, p.412] A study conducted by the U.S. Department of Commerce
surveyed 24-year-old Americans to see if they had finished high school and reports:

                 Men      Women    Total
HS Diploma       10,579   11,169   21,748
No HS Diploma    1,881    1,509    3,390
Total            12,460   12,678   25,138
a) Test whether the distribution of high school diplomas is different for men and women.
Step 1: Overall percentage for completing high school =
        Overall percentage for not completing high school =
Step 2: Under the homogeneity assumption, we would expect the same percentages to occur in both the
Men and Women groups.
Men with HS diploma:   12,460 x      % =
Women with HS diploma: 12,678 x      % =

EXPECTED         Men      Women
HS Diploma
No HS Diploma
Total            12,460   12,678
Step 3: Calculate Standardized Difference

Expected         Men         Women
HS Diploma       10,779.70   10,968.30
No HS Diploma    1,680.30    1,709.70
Total            12,460      12,678

                 obs-exp         (obs-exp)^2     (obs-exp)^2/exp
                 Men    Women    Men    Women    Men    Women
HS Diploma
No HS Diploma
Sum:

Chi-square statistic: χ² =
Step 4: Degrees of Freedom df =
Step 5: Critical Value of Chi-square given df =    and alpha = 0.05, χ²* = CHIINV
Step 6: Decision:
Note:
A Chi-square test on a 2x2 contingency table with df=(2-1) x (2-1) =1 is equivalent to testing
whether two proportions (for men and women in the above question) are equal.
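The equivalence stated in the note can be illustrated numerically. The sketch below is not part of the handout: it assumes the usual two-proportion z-test with a pooled standard error (a formula the handout does not spell out) and checks that the squared z statistic matches the chi-square statistic for the Q13.4 table. All names are illustrative.

```python
import math

# Observed counts from the Q13.4 table.
hs = {"Men": 10579, "Women": 11169}
no_hs = {"Men": 1881, "Women": 1509}
n = {g: hs[g] + no_hs[g] for g in hs}        # group sizes: 12,460 and 12,678
grand_total = sum(n.values())                # 25,138
pooled = sum(hs.values()) / grand_total      # overall HS completion rate

# Chi-square statistic for the 2x2 table (df = 1).
chi2 = 0.0
for g in hs:
    for obs, share in ((hs[g], pooled), (no_hs[g], 1 - pooled)):
        exp = n[g] * share                   # expected count under homogeneity
        chi2 += (obs - exp) ** 2 / exp

# Two-proportion z statistic with a pooled standard error (an assumption here).
p_women = hs["Women"] / n["Women"]
p_men = hs["Men"] / n["Men"]
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n["Women"] + 1 / n["Men"]))
z = (p_women - p_men) / se_pooled

print("chi2 = %.3f, z^2 = %.3f" % (chi2, z ** 2))
```

The critical values line up in the same way: the table value 3.841 for df = 1 at alpha = 0.05 is 1.96 squared.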
Confidence Interval for the Difference of Two Proportions:
$\hat{p}_1$ is the proportion for group 1, $\hat{q}_1 = 1 - \hat{p}_1$; $\hat{p}_2$ is the proportion for group 2, $\hat{q}_2 = 1 - \hat{p}_2$
CI: $(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE(\hat{p}_1 - \hat{p}_2)$, where $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
Q13.4 b) Are women more likely to complete high school? Construct a 95% confidence interval.
Proportion of women who have high school diplomas:
$\hat{p}_1$ =
$\hat{q}_1 = 1 - \hat{p}_1$ =
$n_1$ =
Proportion of men who have high school diplomas:
$\hat{p}_2$ =
$\hat{q}_2 = 1 - \hat{p}_2$ =
$n_2$ =
$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$ =
CI = $(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE(\hat{p}_1 - \hat{p}_2)$ =
Conclusion:
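The interval for Q13.4 b) can be checked with the sketch below. It is not part of the handout: the names are illustrative, group 1 is taken to be women and group 2 men, and z* = 1.96 is the standard normal critical value for 95% confidence.

```python
import math

# Counts from the Q13.4 table; group 1 = women, group 2 = men.
women_hs, n1 = 11169, 12678
men_hs, n2 = 10579, 12460

p1 = women_hs / n1     # proportion of women with a HS diploma
q1 = 1 - p1
p2 = men_hs / n2       # proportion of men with a HS diploma
q2 = 1 - p2

# Standard error of the difference, using the unpooled formula given above.
se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z_star = 1.96          # critical value for a 95% confidence interval
diff = p1 - p2
ci = (diff - z_star * se, diff + z_star * se)

print("p1 = %.4f, p2 = %.4f, SE = %.5f" % (p1, p2, se))
print("95%% CI for p1 - p2: (%.4f, %.4f)" % ci)
```

If the whole interval lies above zero, the data support the claim that women are more likely than men to have completed high school.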
More exercises: Chapter 13: Exercises 17, 18, 19, 20, 21 and 22.