M&Ms Two-way Tables

Ellen Gundlach
STAT 301 Course Coordinator
Purdue University

M&Ms Color Distribution % according to their website

                       Brown  Yellow   Red  Blue  Orange  Green
Plain                     13      14    13    24      20     16
Peanut                    12      15    12    23      23     15
Peanut Butter/Almond      10      20    10    20      20     20

Skittles Color Distribution % according to their hotline

            Red  Orange  Yellow  Green  Purple
Skittles     20      20      20     20      20

My M&Ms data in counts

          Brown  Yellow   Red  Blue  Orange  Green  Total
Plain        14      10    10     8       4      8     54
Peanut        2       3     5     0       8      4     22
Total        16      13    15     8      12     12     76

My M&Ms data: joint % (divide counts by total = 76)

          Brown  Yellow   Red  Blue  Orange  Green
Plain      18.4    13.2  13.2  10.5     5.3   10.5
Peanut      2.6     3.9   6.6   0      10.5    5.3

My M&Ms data: marginal %s for color (add down the columns)

                  Brown  Yellow   Red  Blue  Orange  Green  Total
Plain              18.4    13.2  13.2  10.5     5.3   10.5
Peanut              2.6     3.9   6.6   0      10.5    5.3
Marg. for color    21.0    17.1  19.8  10.5    15.8   15.8    100

My M&Ms data: marginal %s for flavor (add across the rows)

          Brown  Yellow   Red  Blue  Orange  Green  Marg. for flavor
Plain      18.4    13.2  13.2  10.5     5.3   10.5              71.1
Peanut      2.6     3.9   6.6   0      10.5    5.3              28.9
Total                                                            100

My M&Ms data: joint and marginal %s

                  Brown  Yellow   Red  Blue  Orange  Green  Marg. for flavor
Plain              18.4    13.2  13.2  10.5     5.3   10.5              71.1
Peanut              2.6     3.9   6.6   0      10.5    5.3              28.9
Marg. for color    21.0    17.1  19.8  10.5    15.8   15.8               100

Conditional distribution of flavor for color
• We know the color of our M&M already, but now how is flavor distributed for this color?

$$\frac{\text{joint \% of color and flavor}}{\text{marginal \% of color}}$$

Conditional distribution example
• We know we have a red M&M, so what is the probability it is a plain M&M?

$$\frac{\text{joint \% of red and plain}}{\text{marginal \% of red}} = \frac{13.2}{19.8} = 66.7\%$$

Conditional distribution of color for flavor
• We know the flavor of our M&M already, but now how is color distributed for this flavor?

$$\frac{\text{joint \% of color and flavor}}{\text{marginal \% of flavor}}$$

Conditional distribution example
• We know we have a peanut M&M, so what is the probability it is green?
$$\frac{\text{joint \% of peanut and green}}{\text{marginal \% of peanut}} = \frac{5.3}{28.9} = 18.3\%$$

Conditional distributions in general
Conditional distribution of X for Y (we know Y for sure already, but we want to know the probability or % of having X be true as well):

$$\frac{\text{joint \% of X and Y}}{\text{marginal \% of Y (what we know for sure)}}$$

Bar graphs for conditional distribution of color for both flavors
[Figure: two bar graphs of percent (vertical axis) by color (horizontal axis). Left panel: Conditional distribution of color for Milk Chocolate M&Ms. Right panel: Conditional distribution of color for Peanut M&Ms, which has no blue bar.]

Chi-squared hypothesis test
H0: There is no association between color distribution and flavor for M&Ms.
Ha: There is association between color distribution and flavor for M&Ms.
Use α = 0.01 for this story.

Full-class M&Ms data in counts (large sample size necessary for test)

          Brown  Yellow   Red  Blue  Orange  Green
Plain       147     302   264   407     330    373
Peanut       69     110    70   162     148    123

Chi-squared test SPSS results

Chi-Square Tests
                      Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square    14.396a    5   .013
Likelihood Ratio      14.623     5   .012
N of Valid Cases      2505
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 58.81.

Chi-squared test conclusions
• Test statistic = 14.396 and P-value = 0.013
• Since P-value is > our α of 0.01, we do not reject H0.
• We do not have enough evidence to say there is association between color distribution and flavor for M&Ms.

Skittles vs. M&Ms
• Now we will compare the proportion of yellow candies for Skittles and for M&Ms.
• The previous two-way table with plain and peanut M&Ms was of size 2 x 6.
• This table will be of size 2 x 2 because we only care about whether a candy is yellow or non-yellow.
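Before moving on to the 2 x 2 comparison, the plain-versus-peanut calculations above can be checked with a short script. This is only a sketch, not the SPSS procedure used for the slides: it uses the Python standard library, the function names are made up for illustration, and the cell order of the full-class table (brown through green in each row) is an assumption reconstructed so that the row totals, N = 2505, and the reported statistic all agree. The P-value uses the closed-form upper tail of the chi-squared distribution that holds for odd degrees of freedom.

```python
import math

COLORS = ["brown", "yellow", "red", "blue", "orange", "green"]

# Small sample from "My M&Ms data in counts".
MY_COUNTS = {"plain": [14, 10, 10, 8, 4, 8], "peanut": [2, 3, 5, 0, 8, 4]}

# Full-class counts. ASSUMPTION: cells are ordered brown..green as in COLORS.
CLASS_COUNTS = [
    [147, 302, 264, 407, 330, 373],  # plain
    [69, 110, 70, 162, 148, 123],    # peanut
]


def flavor_given_color(counts, flavor, color):
    """Conditional % of a flavor among candies of one color (from counts)."""
    j = COLORS.index(color)
    color_total = sum(row[j] for row in counts.values())
    return 100 * counts[flavor][j] / color_total


def color_given_flavor(counts, color, flavor):
    """Conditional % of a color among candies of one flavor (from counts)."""
    row = counts[flavor]
    return 100 * row[COLORS.index(color)] / sum(row)


def chi_squared_independence(table):
    """Return (Pearson statistic, df, smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min_expected


def chi2_upper_tail_odd_df(x, df):
    """P(chi-squared with odd df exceeds x), using the closed-form series."""
    if df % 2 == 0:
        raise ValueError("this closed form only covers odd df")
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))  # equals 2 * P(Z > chi)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    return tail + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * series


plain_given_red = flavor_given_color(MY_COUNTS, "plain", "red")
green_given_peanut = color_given_flavor(MY_COUNTS, "green", "peanut")
stat, df, min_expected = chi_squared_independence(CLASS_COUNTS)
p_value = chi2_upper_tail_odd_df(stat, df)

print(f"plain given red:    {plain_given_red:.1f}%")
print(f"green given peanut: {green_given_peanut:.1f}%")
print(f"chi-squared = {stat:.3f}, df = {df}, P-value = {p_value:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
print("reject H0" if p_value < 0.01 else "do not reject H0")
```

Working directly from counts gives about 18.2% for green given peanut; the 18.3% on the slide comes from dividing the already rounded percentages 5.3 and 28.9. The chi-squared statistic, minimum expected count, and P-value should agree with the SPSS output to the digits shown there.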
Full-class M&Ms and Skittles data in counts (large sample size necessary for test)

               Yellow  Non-Yellow  Total
Plain M&Ms        302        1521   1823
Skittles          361        1351   1712
Total             663        2872   3535

Chi-squared hypothesis test
H0: There is no association between color distribution and type of candy.
Ha: There is association between color distribution and type of candy.
Use α = 0.01 for this story.

Chi-squared test SPSS results

Chi-Square Tests
                           Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                          (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square         11.839b    1   .001
Continuity Correction a    11.544     1   .001
Likelihood Ratio           11.840     1   .001
Fisher's Exact Test                                     .001         .000
N of Valid Cases           3535
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 321.09.

Chi-squared test conclusions
• Test statistic = 11.839 and P-value = 0.001
• Since P-value is < our α of 0.01, we reject H0.
• We have evidence that there is association between color distribution and type of candy.

Another way to do this test
Since this is a 2 x 2 table, and if we are only interested in a 2-sided (≠) hypothesis test, we can use the 2-sample proportions test here.

2-sample proportion test hypotheses
H0: p_M&Ms = p_Skittles
Ha: p_M&Ms ≠ p_Skittles

Defining the proportions

$$p_{\text{M\&Ms}} = \frac{\#\ \text{yellow M\&Ms}}{\text{total}\ \#\ \text{M\&Ms}} \qquad p_{\text{Skittles}} = \frac{\#\ \text{yellow Skittles}}{\text{total}\ \#\ \text{Skittles}}$$

Test statistic

$$Z = \frac{\hat{p}_{\text{M\&Ms}} - \hat{p}_{\text{Skittles}}}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_{\text{M\&Ms}}} + \dfrac{1}{n_{\text{Skittles}}}\right)}}$$

where $\hat{p}$ is the pooled sample proportion of yellow candies (both samples combined).

Results from the proportion test
• Sample proportions: $\hat{p}_{\text{M\&Ms}} = 0.166$ and $\hat{p}_{\text{Skittles}} = 0.211$
• Test statistic Z = -3.44
• P-value = 2(0.0003) = 0.0006
• Since P-value < our α of 0.01, we reject H0.

Conclusion to the proportion test
• We have evidence the proportion of yellow M&Ms is not the same as the proportion of yellow Skittles.
• In other words, the type of candy makes a difference to the color distribution.

How do our results from the 2 tests compare?
• The X² test statistic = 11.839, which is actually the (Z test statistic = -3.44)².
• If you take into account the rounding, the P-values for both tests are 0.001.
• We rejected H0 in both tests.

When do you use which test?
• Chi-squared tests are best for:
  - two-sided hypothesis tests only
  - 2 x 2 or bigger tables
• Proportion (Z) tests are best for:
  - one- or two-sided hypothesis tests
  - only 2 x 2 tables
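The agreement between the two tests on the yellow versus non-yellow table can be checked numerically. The sketch below is an illustration under stated assumptions rather than the SPSS analysis from the slides: it uses only the Python standard library, the function names are invented here, the Z statistic uses the pooled proportion, and the chi-squared statistic is the Pearson version without the continuity correction.

```python
import math

# Full-class counts from the yellow / non-yellow table.
YELLOW_MM, N_MM = 302, 1823   # plain M&Ms
YELLOW_SK, N_SK = 361, 1712   # Skittles


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic and its two-sided P-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return z, p_value


def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


z, z_p_value = two_proportion_z(YELLOW_MM, N_MM, YELLOW_SK, N_SK)
chi2 = pearson_chi_squared([
    [YELLOW_MM, N_MM - YELLOW_MM],
    [YELLOW_SK, N_SK - YELLOW_SK],
])
# With 1 degree of freedom, the chi-squared upper tail is the two-sided normal tail.
chi2_p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"Z = {z:.2f}, Z squared = {z * z:.3f}, P-value = {z_p_value:.4f}")
print(f"chi-squared = {chi2:.3f}, P-value = {chi2_p_value:.4f}")
```

Because a 2 x 2 Pearson statistic equals the square of the pooled Z statistic, the two P-values printed here coincide, which is why both tests lead to the same decision at α = 0.01. A one-sided alternative would need the Z test, since squaring discards the sign.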