Handout 11: Analysis of Categorical Data

Hypothesis Test for a Population Proportion, $\pi$
H0: $\pi = \pi_0$
Ha:
1. $\pi > \pi_0$
2. $\pi < \pi_0$
3. $\pi \neq \pi_0$

T.S.: $z = \dfrac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}$

R.R. For a probability of a Type-I error $\alpha$:
1. Reject H0 if $z > z_{\alpha}$
2. Reject H0 if $z < -z_{\alpha}$
3. Reject H0 if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$

Note: Under H0, $\sigma_{\hat{\pi}} = \sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}}$

Assumptions:
1. A random sample is selected from a population.
2. The sample size is sufficiently large ($n\pi_0 \ge 5$ and $n(1 - \pi_0) \ge 5$) such that the sampling distribution of the sample proportion $\hat{\pi}$ is approximately normal.
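A minimal Python sketch of this test, assuming Python 3.8+ for `statistics.NormalDist`. The function name `one_proportion_z_test` and its `alternative` argument are illustrative choices, not part of Minitab. It reports a p-value rather than a rejection region; rejecting H0 when the p-value is below $\alpha$ gives the same decision as the R.R. above.

```python
from statistics import NormalDist


def one_proportion_z_test(x, n, pi0, alternative="two-sided"):
    """Large-sample z test of H0: pi = pi0 (hypothetical helper).

    alternative: "greater" (Ha 1: pi > pi0), "less" (Ha 2: pi < pi0)
    or "two-sided" (Ha 3: pi != pi0). Returns (z, p_value).
    """
    # Assumption 2: n*pi0 >= 5 and n*(1 - pi0) >= 5.
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("sample too small for the normal approximation")
    pi_hat = x / n
    se0 = (pi0 * (1 - pi0) / n) ** 0.5  # standard error of pi_hat under H0
    z = (pi_hat - pi0) / se0
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "greater":
        p_value = 1.0 - phi(z)
    elif alternative == "less":
        p_value = phi(z)
    elif alternative == "two-sided":
        p_value = 2.0 * (1.0 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# DMV example: 160 of 200 cars passed; H0: pi = 0.70 vs Ha: pi != 0.70.
z, p = one_proportion_z_test(160, 200, 0.70)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Comparing the returned p-value with $\alpha$ stands in for looking up $z_{\alpha}$ or $z_{\alpha/2}$ in a table.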
Example State DMV records indicate that of all vehicles undergoing emissions testing during the previous year, 70%
passed on the first try. A random sample of 200 cars tested in a particular county during the current year yields 160 that
passed on the initial test. Does this suggest that the population proportion for this county during the current year differs
from the previous statewide proportion? Conduct a hypothesis test using $\alpha = .05$.
Minitab Commands: > stat > basic statistics > 1 Proportion
Minitab Output
Test and CI for One Proportion
Test of p = 0.7 vs p not = 0.7

Sample    X    N  Sample p  95% CI                Z-Value  P-Value
1       160  200  0.800000  (0.744564, 0.855436)     3.09    0.002
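As a worked check (hand arithmetic from the formulas above, not additional Minitab output): here $\hat{\pi} = 160/200 = 0.80$ and $\pi_0 = 0.70$, and the large-sample condition holds since $n\pi_0 = 140$ and $n(1 - \pi_0) = 60$.

```latex
\sigma_{\hat{\pi}} = \sqrt{\frac{0.70(1 - 0.70)}{200}} = \sqrt{0.00105} \approx 0.0324,
\qquad
z = \frac{0.80 - 0.70}{0.0324} \approx 3.09 > z_{.025} = 1.96
```

Since this is alternative 3 and $z > z_{.025}$, H0 is rejected (equivalently, P-Value = 0.002 < .05): the county's pass rate appears to differ from 70%. The 95% CI uses the sample proportion in its standard error, $0.80 \pm 1.96\sqrt{0.80(0.20)/200} = 0.80 \pm 0.0554$, which matches (0.744564, 0.855436).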
Hypothesis Test about the Difference Between Two Population Proportions
Using Independent Random Samples
H0: $\pi_1 - \pi_2 = 0$
Ha:
1. $\pi_1 - \pi_2 > 0$
2. $\pi_1 - \pi_2 < 0$
3. $\pi_1 - \pi_2 \neq 0$

T.S.: $z = \dfrac{\hat{\pi}_1 - \hat{\pi}_2}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $\hat{\pi} = \dfrac{x_1 + x_2}{n_1 + n_2}$

R.R. For a given value of $\alpha$:
1. Reject H0 if $z > z_{\alpha}$
2. Reject H0 if $z < -z_{\alpha}$
3. Reject H0 if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$

Assumptions:
1. Independent random samples.
2. $n_1$ and $n_2$ are sufficiently large ($n_1\hat{\pi}_1$, $n_1(1 - \hat{\pi}_1)$, $n_2\hat{\pi}_2$, and $n_2(1 - \hat{\pi}_2)$ are at least 5) such that the sampling distribution of $(\hat{\pi}_1 - \hat{\pi}_2)$ is approximately normal.
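A minimal Python sketch of the pooled two-proportion test, again assuming Python 3.8+ for `statistics.NormalDist`. The name `two_proportion_z_test` is hypothetical. Because $n_i\hat{\pi}_i = x_i$, the large-sample check can be done directly on the success and failure counts.

```python
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled large-sample z test of H0: pi1 - pi2 = 0 (hypothetical helper).

    alternative: "greater" (Ha 1), "less" (Ha 2) or "two-sided" (Ha 3).
    Returns (z, p_value).
    """
    # Assumption 2: n1*p1, n1*(1-p1), n2*p2, n2*(1-p2) are the counts below.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("samples too small for the normal approximation")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "greater":
        p_value = 1 - phi(z)
    elif alternative == "less":
        p_value = phi(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Republicans (sample 1) vs Democrats (sample 2), Ha: pi1 - pi2 > 0.
z, p = two_proportion_z_test(109, 200, 86, 200, alternative="greater")
print(f"z = {z:.2f}, p = {p:.3f}")
```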
Example A law student believes that the proportion of registered Republicans in favor of additional tax incentives is
greater than the proportion of registered Democrats in favor of such incentives. The student acquired independent random
samples of 200 Republicans and 200 Democrats and found that 109 Republicans and 86 Democrats were in favor of additional
tax incentives. Use the data to test H0: $\pi_1 - \pi_2 = 0$ versus Ha: $\pi_1 - \pi_2 > 0$, where $\pi_1$ and $\pi_2$ are the proportions of Republicans and Democrats in favor. Use $\alpha = .05$.
Minitab Commands: > stat > basic statistics > 2 Proportions > use pooled estimate
Minitab Output
Test and CI for Two Proportions
Sample    X    N  Sample p
1       109  200  0.545000
2        86  200  0.430000
Difference = p (1) - p (2)
Estimate for difference: 0.115
95% lower bound for difference: 0.0333288
Test for difference = 0 (vs > 0): Z = 2.30
P-Value = 0.011
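As a worked check by hand: all four counts (109, 91, 86, 114) are at least 5, and the pooled estimate and test statistic are

```latex
\hat{\pi} = \frac{109 + 86}{200 + 200} = 0.4875,
\qquad
z = \frac{0.545 - 0.430}{\sqrt{0.4875(0.5125)\left(\frac{1}{200} + \frac{1}{200}\right)}}
  = \frac{0.115}{0.0500} \approx 2.30 > z_{.05} = 1.645
```

so H0 is rejected at $\alpha = .05$ (P-Value = 0.011): the data support the student's belief. The 95% lower bound that Minitab prints is based on the unpooled standard error, $0.115 - 1.645\sqrt{0.545(0.455)/200 + 0.430(0.570)/200} \approx 0.115 - 0.0817 = 0.0333$.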
Chi-Square Distributions
Right-skewed distributions with a minimum value of 0.
A specific chi-square distribution is identified by a parameter called the degrees of freedom.
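Python's standard library has no chi-square distribution, so the following sketch computes right-tail areas through the standard identity that a chi-square variable with df degrees of freedom is a gamma variable with shape df/2 and scale 2, using the power series for the regularized incomplete gamma function. The name `chi2_sf` is hypothetical; with SciPy installed, `scipy.stats.chi2.sf(x, df)` returns the same quantity.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square distribution with df degrees of freedom.

    Sums the power series for the regularized lower incomplete gamma
    function P(a, t) with a = df/2 and t = x/2, then returns 1 - P(a, t).
    Intended for the moderate x values met in these examples.
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0:
        return 1.0  # a chi-square variable is never negative
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term: t^0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= t / (a + n)  # t^n / (a (a+1) ... (a+n))
        total += term
    lower = total * math.exp(a * math.log(t) - t - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# The df parameter picks the member of the family: the same cutoff
# leaves a different right-tail area for each df.
for df in (1, 2, 5, 10):
    print(df, round(chi2_sf(5.991, df), 4))
```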
Chi-Square Goodness-of-Fit Test
H0: $\pi_1$ = hypothesized proportion for category 1
$\vdots$
$\pi_k$ = hypothesized proportion for category k
Ha: H0 is not true, so at least one of the category proportions differs from the corresponding hypothesized value.
Test Statistic: $\chi^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
where expected cell count = $n \times$ (hypothesized proportion for that category) and $n$ is the total sample size
Rejection Region: Reject H0 if $\chi^2 > \chi^2_{\alpha,\,k-1}$
Assumptions:
1. A random sample is selected from the population.
2. Expected cell count $\ge 5$ in all cells.
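A minimal Python sketch of the goodness-of-fit statistic; the name `chi_square_gof` is hypothetical. It returns the statistic, the $k - 1$ degrees of freedom, and the expected counts, and leaves the comparison with $\chi^2_{\alpha,\,k-1}$ to the reader or a table.

```python
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic (hypothetical helper).

    observed: counts for categories 1..k.
    proportions: hypothesized proportions pi_1..pi_k from H0 (must sum to 1).
    Returns (chi_square, df, expected_counts).
    """
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]  # expected count = n * pi_i
    if min(expected) < 5:
        raise ValueError("every expected cell count must be at least 5")
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, len(observed) - 1, expected


# M&M example: observed color counts and the manufacturer's stated proportions.
counts = [84, 79, 75, 49, 36, 47]
stated = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]
chi_square, df, expected = chi_square_gof(counts, stated)
print(f"chi-square = {chi_square:.3f} on {df} df")
```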
Example M&M’s plain chocolate candies come in six different colors: brown, yellow, red, orange, green, and tan.
According to the manufacturer (Mars, Inc.), the color ratio in each large production batch is 30% brown, 20% yellow,
20% red, 10% orange, 10% green, and 10% tan. To test this claim, a professor at Carleton College (Minnesota) had
students count the colors of M&M’s found in “fun size” bags of candy (Teaching Statistics, Spring 1993). The results for
the 370 M&M’s are shown in the table. [Note: In 1995, Mars, Inc. added a seventh color - blue - to bags of M&M’s.]
Color     # M&M’s
Brown          84
Yellow         79
Red            75
Orange         49
Green          36
Tan            47
Total         370
Conduct a test to determine whether the true percentages of the colors produced differ from the manufacturer’s stated
percentages. Use $\alpha = .05$.
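No Minitab output is given for this example, so here is the computation worked by hand from the formulas above. With $n = 370$, the expected counts are $370(0.30) = 111$ for brown, $370(0.20) = 74$ for yellow and red, and $370(0.10) = 37$ for orange, green, and tan, all at least 5.

```latex
\begin{aligned}
\chi^2 &= \frac{(84-111)^2}{111} + \frac{(79-74)^2}{74} + \frac{(75-74)^2}{74}
        + \frac{(49-37)^2}{37} + \frac{(36-37)^2}{37} + \frac{(47-37)^2}{37} \\
       &= 6.568 + 0.338 + 0.014 + 3.892 + 0.027 + 2.703 \approx 13.54 \\
\chi^2 &= 13.54 > \chi^2_{.05,\,5} = 11.070 \;\Rightarrow\; \text{reject } H_0
\end{aligned}
```

With $k - 1 = 5$ degrees of freedom, the data suggest the color percentages differ from the stated ones; brown (84 observed vs. 111 expected) contributes most. Since $\chi^2_{.025,\,5} = 12.833$ and $\chi^2_{.01,\,5} = 15.086$, the P-value lies between .01 and .025.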
Chi-Square Test of Independence
H0: The two variables are independent.
Ha: The two variables are dependent (related).
Test Statistic: $\chi^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
where expected cell count = (row total $\times$ column total)/total sample size
Rejection Region: Reject H0 if $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$, where $r$ and $c$ are the numbers of rows and columns in the table
Assumptions:
1. A random sample is selected from the population.
2. Expected cell count $\ge 5$ in all cells.
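A minimal Python sketch of the test-of-independence statistic for an $r \times c$ table stored as a list of rows; the name `chi_square_independence` is hypothetical. Expected counts follow the (row total × column total)/n rule above.

```python
def chi_square_independence(table):
    """Chi-square test-of-independence statistic (hypothetical helper).

    table: list of rows, each a list of observed counts.
    Returns (chi_square, df, expected) where expected[i][j] is
    (row i total * column j total) / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    if min(min(row) for row in expected) < 5:
        raise ValueError("every expected cell count must be at least 5")
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return chi_square, df, expected


# Party-by-opinion table from the example below
# (rows: Democrat, Republican, None; columns: Favor, Not in favor).
votes = [[16, 24], [21, 17], [11, 13]]
chi_square, df, expected = chi_square_independence(votes)
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

Transposing the table (as Minitab does in its output) leaves the statistic and degrees of freedom unchanged.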
Example Opinion polls often provide information on how different groups’ opinions vary on controversial issues. A
random sample of 102 registered voters was taken from the Supervisor of Election’s roll. Each of the registered voters
was asked the following two questions:
1. What is your political party affiliation?
2. Are you in favor of increased arms spending?
The results are summarized in the table below.
                     Opinion
Party          Favor   Not in favor
Democrat          16             24
Republican        21             17
None              11             13
Conduct a test to determine whether the opinions of individuals concerning military spending are related to party affiliation.
Minitab Commands: > stat > tables > Chi-Square Test
Minitab Output:
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
(Rows 1 and 2 are Favor and Not in favor; columns C1, C2, and C3 are Democrat, Republican, and None.)

            C1      C2      C3   Total
    1       16      21      11      48
         18.82   17.88   11.29
         0.424   0.544   0.008

    2       24      17      13      54
         21.18   20.12   12.71
         0.376   0.483   0.007

Total       40      38      24     102

ChiSq = 1.841, DF = 2, P-Value = 0.398
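The output can be checked by hand. With $r = 3$ parties and $c = 2$ opinions, the degrees of freedom are $(3 - 1)(2 - 1) = 2$, and for 2 degrees of freedom the chi-square right-tail area has the closed form $e^{-x/2}$. The example does not state $\alpha$; assuming $\alpha = .05$:

```latex
P\text{-value} = P(\chi^2_2 > 1.841) = e^{-1.841/2} \approx 0.398,
\qquad
\chi^2 = 1.841 < \chi^2_{.05,\,2} = 5.991
```

so H0 is not rejected: these data give no evidence that opinion on arms spending is related to party affiliation. All expected counts are at least 5 (the smallest is 11.29), so the assumption holds.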