Analysis of Categorical Data

Hypothesis Test for a Population Proportion, $\pi$

$H_0$: $\pi = \pi_0$

$H_a$:
1. $\pi > \pi_0$
2. $\pi < \pi_0$
3. $\pi \ne \pi_0$

T.S.: $z = \dfrac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}$

R.R.: For a probability $\alpha$ of a Type-I error:
1. Reject $H_0$ if $z > z_{\alpha}$
2. Reject $H_0$ if $z < -z_{\alpha}$
3. Reject $H_0$ if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$

Note: Under $H_0$, $\sigma_{\hat{\pi}} = \sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}$

Assumptions:
1. A random sample is selected from a population.
2. The sample size is sufficiently large ($n\pi_0 \ge 5$ and $n(1 - \pi_0) \ge 5$) such that the sampling distribution of the sample proportion $\hat{\pi}$ is approximately normal.

Example
State DMV records indicate that of all vehicles undergoing emissions testing during the previous year, 70% passed on the first try. A random sample of 200 cars tested in a particular county during the current year yields 160 that passed on the initial test. Does this suggest that the population proportion for this county during the current year differs from the previous statewide proportion? Conduct a hypothesis test using $\alpha = .05$.

Minitab Commands: > stat > basic statistics > 1 Proportion

Minitab Output:

Test and CI for One Proportion

Test of p = 0.7 vs p not = 0.7

Sample    X    N  Sample p                95% CI  Z-Value  P-Value
1       160  200  0.800000  (0.744564, 0.855436)     3.09    0.002

Hypothesis Test about the Difference Between Two Population Proportions Using Independent Random Samples

$H_0$: $\pi_1 - \pi_2 = 0$

$H_a$:
1. $\pi_1 - \pi_2 > 0$
2. $\pi_1 - \pi_2 < 0$
3. $\pi_1 - \pi_2 \ne 0$

T.S.: $z = \dfrac{\hat{\pi}_1 - \hat{\pi}_2}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\hat{\pi} = \dfrac{x_1 + x_2}{n_1 + n_2}$

R.R.: For a given value of $\alpha$:
1. Reject $H_0$ if $z > z_{\alpha}$
2. Reject $H_0$ if $z < -z_{\alpha}$
3. Reject $H_0$ if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$

Assumptions:
1. Independent random samples.
2. $n_1$ and $n_2$ are sufficiently large ($n_1\hat{\pi}_1$, $n_1(1 - \hat{\pi}_1)$, $n_2\hat{\pi}_2$, and $n_2(1 - \hat{\pi}_2)$ are at least 5) such that the sampling distribution of $(\hat{\pi}_1 - \hat{\pi}_2)$ is approximately normal.

Example
A law student believes that the proportion of registered Republicans in favor of additional tax incentives is greater than the proportion of registered Democrats in favor of such incentives.
The student acquired independent random samples of 200 Republicans and 200 Democrats and found that 109 Republicans and 86 Democrats were in favor of additional tax incentives. Use the data to test $H_0$: $\pi_1 - \pi_2 = 0$ versus $H_a$: $\pi_1 - \pi_2 > 0$. Use $\alpha = .05$.

Minitab Commands: > stat > basic statistics > 2 Proportions > use pooled estimate

Minitab Output:

Test and CI for Two Proportions

Sample    X    N  Sample p
1       109  200  0.545000
2        86  200  0.430000

Difference = p (1) - p (2)
Estimate for difference: 0.115
95% lower bound for difference: 0.0333288
Test for difference = 0 (vs > 0): Z = 2.30  P-Value = 0.011

Chi-Square Distributions

Chi-square distributions are right-skewed with a minimum value of 0. The specific chi-square distribution is indicated by a parameter called the degrees of freedom.

Chi-Square Goodness-of-Fit Test

$H_0$: $\pi_1$ = hypothesized proportion for category 1, ..., $\pi_k$ = hypothesized proportion for category $k$

$H_a$: $H_0$ is not true, so at least one of the category proportions differs from the corresponding hypothesized value.

Test Statistic: $\chi^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

where the expected cell count for a category is the sample size times the hypothesized proportion for that category.

Rejection Region: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha,\, k-1}$

Assumptions:
1. A random sample is selected from the population.
2. Expected cell count $\ge 5$ in all cells.

Example
M&M's plain chocolate candies come in six different colors: brown, yellow, red, orange, green, and tan. According to the manufacturer (Mars, Inc.), the color ratio in each large production batch is 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% tan. To test this claim, a professor at Carleton College (Minnesota) had students count the colors of M&M's found in "fun size" bags of candy (Teaching Statistics, Spring 1993). The results for the 370 M&M's are shown in the table. [Note: In 1995, Mars, Inc. added a seventh color, blue, to bags of M&M's.]
Color    # M&M's
Brown         84
Yellow        79
Red           75
Orange        49
Green         36
Tan           47
Total        370

Conduct a test to determine whether the true percentages of the colors produced differ from the manufacturer's stated percentages. Use $\alpha = .05$.

Chi-Square Test of Independence

$H_0$: The two variables are independent
$H_a$: The two variables are dependent (related)

Test Statistic: $\chi^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

where expected cell count = (row total $\times$ column total) / total sample size

Rejection Region: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha,\, (r-1)(c-1)}$

Assumptions:
1. A random sample is selected from the population.
2. Expected cell count $\ge 5$ in all cells.

Example
Opinion polls often provide information on how different groups' opinions vary on controversial issues. A random sample of 102 registered voters was taken from the Supervisor of Election's roll. Each of the registered voters was asked the following two questions:
1. What is your political party affiliation?
2. Are you in favor of increased arms spending?

The results are summarized in the table below.

                  Opinion
Party         Favor  Do not favor
Democrat         16            24
Republican       21            17
None             11            13

Conduct a test to determine if the opinions of individuals concerning military spending are related to party affiliation.

Minitab Commands: > stat > tables > Chi-Square Test

Minitab Output:

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           C1     C2     C3  Total
    1      16     21     11     48
        18.82  17.88  11.29
        0.424  0.544  0.008

    2      24     17     13     54
        21.18  20.12  12.71
        0.376  0.483  0.007

Total      40     38     24    102

ChiSq = 1.841, DF = 2, P-Value = 0.398
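
The four worked examples in these notes can be checked by hand or with a few lines of code. The sketch below recomputes each test statistic from the formulas given above, using only the Python standard library. The function names are illustrative assumptions and are not part of Minitab or of these notes; the chi-square critical values for $\alpha = .05$ are copied from a standard chi-square table.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def one_proportion_z(x, n, pi0):
    """z statistic for H0: pi = pi0, with the standard error computed under H0."""
    return (x / n - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 - pi2 = 0, using the pooled estimate of pi."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def goodness_of_fit_chi2(observed, hypothesized):
    """Chi-square statistic and degrees of freedom (k - 1) for a goodness-of-fit test."""
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, hypothesized))
    return stat, len(observed) - 1

def independence_chi2(table):
    """Chi-square statistic and degrees of freedom (r-1)(c-1) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail chi-square critical values for alpha = .05 (from a standard table).
CHI2_CRIT_05 = {2: 5.991, 5: 11.070}

# Emissions example: two-sided test of pi = 0.70 with 160 passes out of 200.
z_emissions = one_proportion_z(160, 200, 0.70)
p_emissions = 2 * normal_upper_tail(abs(z_emissions))

# Tax incentive example: upper-tailed test, Republicans (sample 1) vs Democrats (sample 2).
z_tax = two_proportion_z(109, 200, 86, 200)
p_tax = normal_upper_tail(z_tax)

# M&M's example: observed color counts vs the manufacturer's stated proportions.
mm_stat, mm_df = goodness_of_fit_chi2(
    [84, 79, 75, 49, 36, 47],
    [0.30, 0.20, 0.20, 0.10, 0.10, 0.10],
)

# Arms spending example: rows are parties, columns are (favor, do not favor).
# Minitab's layout is the transpose of this table; the statistic is unchanged.
arms_stat, arms_df = independence_chi2([[16, 24], [21, 17], [11, 13]])

print(f"One proportion:   z = {z_emissions:.2f}, p = {p_emissions:.3f}")
print(f"Two proportions:  z = {z_tax:.2f}, p = {p_tax:.3f}")
print(f"Goodness of fit:  chi2 = {mm_stat:.2f}, df = {mm_df}, "
      f"reject H0: {mm_stat > CHI2_CRIT_05[mm_df]}")
print(f"Independence:     chi2 = {arms_stat:.3f}, df = {arms_df}, "
      f"reject H0: {arms_stat > CHI2_CRIT_05[arms_df]}")
```

Under these calculations the z statistics (3.09 and 2.30) and the independence statistic (1.841 with 2 degrees of freedom) agree with the Minitab output above to the printed precision. For the M&M's data, which has no Minitab output in these notes, the statistic comes out to about 13.54 with 5 degrees of freedom, which exceeds the tabled critical value of 11.070.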