Chapter 12 Inferences on Categorical Data

Ch 12.1 Goodness-of-Fit Test

Objective A: Goodness-of-Fit Test

Example 1: Determine the expected counts for each outcome.

$E_i = \mu_i = n p_i$

$E_1 = 700(0.15) = 105$
$E_2 = 700(0.30) = 210$
$E_3 = 700(0.35) = 245$
$E_4 = 700(0.20) = 140$

Example 2: How are $E_1, E_2, \ldots$ calculated?

Stat --> Calculator --> Binomial --> Input the following --> Compute!

Since the grand total is (260 + 400 + 280 + 50 + 10) = 1000, for X = 0 the expected count is 1000 * 0.2401 = 240.1. We use the same procedure to find the remaining $E_i$.

(a) Determine the $\chi^2$ test statistic.

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(260 - 240.1)^2}{240.1} + \frac{(400 - 411.6)^2}{411.6} + \frac{(280 - 264.6)^2}{264.6} + \frac{(50 - 75.6)^2}{75.6} + \frac{(10 - 8.1)^2}{8.1} = 11.987$

(b) Determine the degrees of freedom.

DF = k - 1 = 5 - 1 = 4

(c) Use StatCrunch to determine the P-value.

Stat --> Calculator --> Chi-Square --> Standard --> Input the following. The hypothesis tests of this section are always right-tailed.

P-value = 0.0174

(d) Test the hypothesis at the 0.05 level of significance.

Recall: Ho: The random variable X is binomial with n = 4, p = 0.3.

Since the P-value (0.0174) is less than alpha (0.05), reject Ho. There is sufficient evidence to support the claim that the random variable X is not binomial with n = 4, p = 0.3.

Example 3: Use StatCrunch to perform the hypothesis test of Example 2 at the 0.01 level of significance.

(a) Setup

Ho: The random variable X is binomial with n = 4, p = 0.3.
H1: The random variable X is not binomial with n = 4, p = 0.3.

(b) P-value

Open StatCrunch --> Input Observed Counts in Var1 and Expected Counts in Var2 --> Stat --> Goodness-of-fit --> Chi-Square test --> Select Var1 for Observed and select Var2 for Expected --> Compute!

P-value = 0.0174

(c) Conclusion

Since the P-value (0.0174) is not less than alpha (0.01), we do not reject the null hypothesis. There is not sufficient evidence to reject the claim that the random variable X is binomial with n = 4, p = 0.3.
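The goodness-of-fit arithmetic of Examples 2 and 3 can be cross-checked outside StatCrunch. The following is a minimal Python sketch, not part of the original notes: the function names are hypothetical, and the right-tail formula used here is a closed form that is valid only when the degrees of freedom are even (here DF = 4).

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_right_tail_even_df(x, df):
    """Right-tail area of the chi-square distribution.

    Assumption: df is a positive even integer, so the tail area has the
    closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))

# Observed counts for X = 0, 1, 2, 3, 4 (from Example 2).
observed = [260, 400, 280, 50, 10]
n_trials, p_success = 4, 0.3

total = sum(observed)  # grand total, 1000
expected = [total * binomial_pmf(k, n_trials, p_success) for k in range(n_trials + 1)]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
p_value = chi_square_right_tail_even_df(chi2, df)

print("expected counts:", [round(e, 1) for e in expected])
print("chi-square:", round(chi2, 3), "DF:", df, "P-value:", round(p_value, 4))
for alpha in (0.05, 0.01):
    decision = "reject Ho" if p_value < alpha else "do not reject Ho"
    print(f"alpha = {alpha}: {decision}")
```

The same P-value leads to different decisions at the two significance levels, which is the point of comparing Example 2 with Example 3.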
Example 4:

Total = 53 + 66 + 38 + 96 + 88 + 59 = 400

Ho: p1 = 0.12, p2 = 0.15, p3 = 0.12, p4 = 0.23, p5 = 0.23, p6 = 0.15
H1: At least one of the proportions is not equal to its claimed value.

E1 = 0.12 * 400 = 48, E2 = 0.15 * 400 = 60, E3 = 0.12 * 400 = 48, E4 = 0.23 * 400 = 92, E5 = 0.23 * 400 = 92, E6 = 0.15 * 400 = 60.

X         Observed   Expected
Brown        53         48
Yellow       66         60
Red          38         48
Blue         96         92
Orange       88         92
Green        59         60

Open StatCrunch --> Input Observed Counts in Var1 and Expected Counts in Var2 --> Stat --> Goodness-of-fit --> Chi-Square test --> Select Var1 for Observed and select Var2 for Expected --> Compute!

Since the P-value (0.613) is not less than alpha, do not reject the null hypothesis. There is not sufficient evidence to reject the distribution claimed by the manufacturer of M&Ms.

Ch 12.2 Tests for Independence

Objective A: Tests for Independence

Example 1:

(a) Compute the expected values of each cell under the assumption of independence.

Assuming that an individual's opinion and gender are independent, the expected count for each cell is

$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$

row 1 total = 196 + 199 = 395        col 1 total = 196 + 239 = 435
row 2 total = 239 + 249 = 488        col 2 total = 199 + 249 = 448
Grand total = 196 + 199 + 239 + 249 = 883

$E_{11} = \frac{(395)(435)}{883} = 194.592$        $E_{12} = \frac{(395)(448)}{883} = 200.408$
$E_{21} = \frac{(488)(435)}{883} = 240.408$        $E_{22} = \frac{(488)(448)}{883} = 247.592$

Summarize the observed counts and expected counts in a table, with the expected counts shown in parentheses.

Gender          Pro Life          Pro Choice        Row Total
Men             196 (194.592)     199 (200.408)     395
Women           239 (240.408)     249 (247.592)     488
Column Total    435               448               883 (Grand Total)

(b) Verify that the requirements for performing a chi-square test of independence are satisfied.

(1) All expected frequencies are greater than or equal to 1. "True"
(2) No more than 20% of the expected frequencies are less than 5. "True"

(c) Determine the $\chi^2$ test statistic.
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(196 - 194.592)^2}{194.592} + \frac{(199 - 200.408)^2}{200.408} + \frac{(239 - 240.408)^2}{240.408} + \frac{(249 - 247.592)^2}{247.592} = 0.0363$

(d) Determine the degrees of freedom.

DF = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1

(e) Use StatCrunch to determine the P-value.

Stat --> Calculator --> Chi-Square --> Standard --> Input the following. The hypothesis tests of this section are always right-tailed.

(f) Test whether an individual's opinion regarding abortion is independent of gender at the 0.10 level of significance.

Ho: An individual's opinion regarding abortion is independent of gender.
H1: An individual's opinion regarding abortion is dependent on gender.

Since the P-value (0.849) is not less than alpha (0.10), do not reject the null hypothesis. There is not sufficient evidence to conclude that an individual's opinion regarding abortion is dependent on gender.

Example 2: Use StatCrunch to redo Example 1, testing whether an individual's opinion regarding abortion is independent of gender at the 0.10 level of significance.

(a) Setup

Ho: An individual's opinion regarding abortion is independent of gender.
H1: An individual's opinion regarding abortion is dependent on gender.

(b) P-value

Open StatCrunch --> Input the data (see below) --> Stat --> Tables --> Contingency --> With Summary --> Under Select Column(s), click Pro Life and Ctrl-click Pro Choice --> Under Row Labels, select Gender --> Under Display, select Expected count --> Compute!

StatCrunch Results: [output not shown]

(c) Conclusion

Since the P-value (0.8489) is not less than alpha (0.10), do not reject the null hypothesis. There is not sufficient evidence to conclude that an individual's opinion regarding abortion is dependent on gender.

Example 3: Test whether prenatal care and the wantedness of pregnancy are independent at the 0.05 level of significance.

(a) Setup

Ho: Prenatal care and wantedness of pregnancy are independent.
H1: Prenatal care and wantedness of pregnancy are dependent.
(b) P-value

Open StatCrunch --> Input the data (see below) --> Stat --> Tables --> Contingency --> With Summary --> Under Select Column(s), click Less than 3 months, 3 to 5 Months, and More Than 5 Months (use Ctrl) --> Under Row Labels, select Wantedness of Pregnancy --> Under Display, select Expected count --> Compute!

StatCrunch Results: [output not shown]

(c) Conclusion

Since the P-value (0.0003) is less than alpha (0.05), reject the null hypothesis. There is sufficient evidence to conclude that prenatal care and wantedness of pregnancy are dependent.

Ch 12.3 Comparing Three or More Means - One-Way Analysis of Variance, ANOVA

Objective A: One-Way ANOVA Test (Supplemental Materials)

Example 1:

(a) Ho: $\mu_1 = \mu_2 = \mu_3$
H1: At least one of the means is different from the others.

(b) Requirements

1. There are 3 simple random samples.
2. The 3 samples are independent of each other.
3. Normal probability plots indicate that the sample data come from a normal population.
4. If the largest sample standard deviation is no more than twice the smallest sample standard deviation, we can assume the populations have the same variance.

Stat --> Summary Stats --> Columns --> Input the following, then click Compute!

StatCrunch Results: [output not shown]

The largest sample standard deviation (0.10997) is not larger than twice the smallest (2 × 0.06375 = 0.12750). Thus we can assume the populations have the same variance.

(c) Test Ho: $\mu_1 = \mu_2 = \mu_3$

Stat --> ANOVA --> One Way --> Select all three columns (Simple, Go/No Go, Choice) --> Compute!

StatCrunch Results: [output not shown]

Note: We could have obtained each sample standard deviation using ANOVA instead of using Summary Stats.

Let’s try to understand the meaning of each output.

Column statistics

Std. Error for Simple = St. Dev. of Simple divided by the square root of n:
$s_{\bar{x}_1} = \frac{s_1}{\sqrt{n_1}} = \frac{0.063750817}{\sqrt{6}} = 0.026026162$

Std. Error for Go/No Go:
$s_{\bar{x}_2} = \frac{s_2}{\sqrt{n_2}} = \frac{0.10996803}{\sqrt{6}} = 0.044894258$

Std. Error for Choice:
$s_{\bar{x}_3} = \frac{s_3}{\sqrt{n_3}} = \frac{0.091593486}{\sqrt{6}} = 0.037392884$

ANOVA table

SS(Columns) = SS(treatment) = SS(between samples/groups) = 0.052888444
The sum of squares due to treatment, which is a measure of the variation between the sample means.
MS(Columns) = MS(treatment) = SS(treatment)/DF of treatment = 0.052888444/2 = 0.026444222
The mean square due to treatment.

SS(Error) = SS(within samples/groups) = 0.1227325
The sum of squares due to error, which is a measure of the variation within each treatment group.

MS(Error) = SS(Error)/DF of error = 0.1227325/15 = 0.0081821667
The mean square due to error.

SS(Total) = SS(treatment) + SS(error) = 0.052888444 + 0.1227325 = 0.17562094

F-Stat = MS(treatment)/MS(error) = 0.026444222/0.0081821667 = 3.2319339

P-value = the right-tailed area beyond an F-Stat of 3.2319339 (F distribution) = 0.0681

Since the P-value (0.0681) is not less than alpha (0.05), do not reject Ho: $\mu_1 = \mu_2 = \mu_3$. There is not sufficient evidence to support the claim that at least one population mean is different from the others.

(d) Draw boxplots of the three stimuli.

Stat --> Graph --> Boxplots --> Select all three columns --> Check draw boxes horizontally --> Compute!

The visual display supports $\mu_1 = \mu_2$, but $\mu_3 = \mu_1 = \mu_2$ is not as obvious.

Ch 12.4 Two-Way Analysis of Variance (Supplemental Materials)

Objective A: Two-Way ANOVA Test

An example of an interaction effect is sleeping pills and alcohol. They are usually not fatal when taken alone, but can be fatal when combined.

Example 1:

Example 2: Redo Example 1 using StatCrunch.

(a) Verify that the largest sample standard deviation is no more than twice the smallest standard deviation. If this is true, we can assume the populations have the same variance.

Input the data in StatCrunch.

Stat --> Summary Stats --> Columns --> Input the following, then click Compute!

StatCrunch Results: [output not shown]

The largest Std. dev. is 2.6457513 and the smallest Std. dev. is 1.5275252. Since 2.6457513 is not larger than twice 1.5275252 (3.0550504), we can assume the populations have the same variance.

(b) Test whether there is an interaction effect between the drug dosage and age.

Inputting the data is a bit tricky for a two-way ANOVA. Use cut and paste or type the data in to produce three columns:
Response (HDL), Row Factor (Age), and Col Factor (Dosage); see below.

Stat --> ANOVA --> Two Way --> Input the following --> Compute!

StatCrunch Results: [output not shown]

Hit > for the next page of the StatCrunch output:

[interaction plot not shown]

The above graph is called an interaction plot. We look at how close to parallel the lines are. Since the lines are roughly parallel, we conclude there is no interaction between age and drug dosage.

(c) If there is no interaction between age and drug dosage, determine whether there is sufficient evidence to conclude that the mean increase in HDL cholesterol is different (i) among the drug dosage groups, (ii) among the age groups.

(i) Since the P-value (<0.0001) is less than alpha (0.05), reject Ho. There is sufficient evidence to support the claim that HDL cholesterol changes with dosage.

Hit > for the next page of the StatCrunch output:

[plot of mean HDL by drug dosage not shown]

The above graph indicates that the mean HDL increases as the drug dosage increases.

(ii) Since the P-value (0.0838) is not less than alpha (0.05), do not reject Ho. There is not sufficient evidence to support the claim that HDL cholesterol changes with age.
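
As a cross-check on the ANOVA-table arithmetic of Ch 12.3, Example 1, the mean squares, F statistic, and P-value can be recomputed from the two sums of squares reported there. This is a minimal Python sketch, not part of the original notes: the helper name is hypothetical, and the right-tail formula is a closed form that holds only when the numerator degrees of freedom equal 2, as they do for three groups.

```python
# Values copied from the one-way ANOVA output in Ch 12.3, Example 1.
ss_treatment = 0.052888444   # SS(Columns), variation between sample means
ss_error = 0.1227325         # SS(Error), variation within groups
k = 3                        # number of groups (Simple, Go/No Go, Choice)
n_total = 18                 # three samples of size 6

df_treatment = k - 1         # 2
df_error = n_total - k       # 15

ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_stat = ms_treatment / ms_error

def f_right_tail_numerator_df_2(f, df2):
    """Right-tail area of the F distribution.

    Assumption: the numerator degrees of freedom are exactly 2, in which
    case the tail area reduces to (df2 / (df2 + 2 f)) ** (df2 / 2).
    """
    return (df2 / (df2 + 2 * f)) ** (df2 / 2)

p_value = f_right_tail_numerator_df_2(f_stat, df_error)

print("MS(treatment):", ms_treatment)
print("MS(error):", ms_error)
print("F-Stat:", round(f_stat, 4), "P-value:", round(p_value, 4))
print("reject Ho" if p_value < 0.05 else "do not reject Ho")
```

For designs with more than three groups the numerator degrees of freedom exceed 2, and a general F-distribution routine from a statistics library would be needed instead of this shortcut.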