Activity 13
“I Didn’t Get Enough Blues!”

Questions to keep in mind:
i. What is the purpose of this new procedure? What is it called?
ii. How do I calculate the chi-square number?
iii. How do I calculate df?
iv. What does the new distribution look like?
v. How do I calculate the P-value using the table?
vi. How do I calculate the P-value using the calculator?

Note: Create document YI_Activity13.doc with Tables 1 and 2, the density curve graphs, and answers to questions i. through vi. Submit your work on edmodo.com.

Materials needed: One 1.69-ounce bag of plain M&M's per student.

The M&M/Mars Company, headquartered in Hackettstown, New Jersey, makes plain and peanut chocolate candies. In 1995, they decided to replace the tan-colored M&M’s with a new color. After conducting an extensive national preference survey, they decided to replace the tan M&M's with blue M&M's. The company's Consumer Affairs Department announced:

On average, the new mix of colors of M&M's Plain Chocolate Candies will contain 30 percent browns, 20 percent each of yellows and reds and 10 percent each of oranges, greens, and blues.

They explained:

While we mix the colors as thoroughly as possible, the above ratios may vary somewhat, especially in the smaller bags. This is because we combine the various colors in large quantities for the last production stage (printing). The bags are then filled on high-speed packaging machines by weight, not by count.

The purpose of this activity is to compare the color distribution of M&M's in your individual bag with the advertised distribution. We will want to see if there is sufficient evidence to dispute the company's claim for their distribution. In order to use as random a sample as possible, it is best if the bags of M&M's are purchased at different stores and not obtained from one or a few sources of supply.
According to the M&M's website, each package of Milk Chocolate M&M's should contain 24% blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow M&M's.

1. Open your bag and carefully count the number of M&M's of each color (brown, yellow, red, orange, green, and blue) as well as the total number of M&M's in the bag.

2. Fill in the counts, by color, and the total number of M&M's in the "Observed" column of Table 1.

Table 1:

Color     Observed   Expected   (O – E)² / E
Brown
Yellow
Red
Orange
Green
Blue
Total                           χ² =

3. To obtain the expected counts, multiply the total number of M&M's in your bag by the company's stated percentages (expressed in decimal form) for each of the colors.

4. For each color, perform this calculation: (observed – expected)² / expected, and enter the result in the last column of the table. Then add up all of these calculated values, and name the sum χ². Keep this number handy; you will use it later in the chapter.

5. If your sample reflects the distribution advertised by the M&M/Mars Company, then there should be very little difference between the observed counts and the expected counts. Hence the calculated values making up the sum χ² should be very small. Are the entries in the last column all about the same, or do any of the quantities stand out as "significantly" larger? Did you get more of a particular color than you expected? Did you get fewer of a particular color than you expected?

6. Combine the counts obtained by all the students in your class to obtain a total count of M&M's of each color. Record the results in Table 2. You will need this data in the exercises.

Table 2:

Color     Observed   Expected   (O – E)² / E
Brown
Yellow
Red
Orange
Green
Blue
Total                           χ² =

7. Record the total number of M&M's in each student's bag in your class. How did your bag compare with those of your classmates?

Test for Goodness of Fit

You could use the z test described in the last chapter to test the hypotheses

H0: p = 0.10
Ha: p < 0.10

where p is the proportion of blue M&M's.
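As a rough illustration of that one-color z test, here is a minimal Python sketch. The bag counts (5 blues out of 56 candies) are hypothetical numbers made up for this example, and the variable names are illustrative only; the hypothesized proportion 0.10 is the advertised share of blues.

```python
import math

# HYPOTHETICAL bag (made-up numbers, not real data): 5 blues out of 56 candies.
n = 56
blue = 5
p0 = 0.10  # advertised proportion of blues, used as the null value

p_hat = blue / n                               # sample proportion of blues
standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
z = (p_hat - p0) / standard_error              # one-proportion z statistic

# Ha is p < 0.10, so the P-value is the lower-tail area P(Z <= z).
p_value = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"p-hat = {p_hat:.4f}, z = {z:.3f}, P-value = {p_value:.4f}")
```

With a single small bag the expected number of blues is only 5.6, so the normal approximation behind this z test is rough at best; the sketch is meant to show the arithmetic, not to endorse the conclusion.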
You could then perform additional tests of significance for each of the remaining colors. But this would be inefficient. More important, it wouldn't tell us how likely it is that six sample proportions differ from the values stated by M&M/Mars as much as our sample does. There is a single test that can be applied to see if the observed sample distribution is significantly different from the hypothesized population distribution. It is called the chi-square (χ²) test for goodness of fit.

The Chi-Square (χ²) Distributions

• The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by one parameter, called the degrees of freedom.
• Since we are working with percentages, five of the six percentages are free to vary, but the sixth is not, since all six have to add to 100%.
• In this case, we say that there are 6 − 1 = 5 degrees of freedom.

Draw the density curves for three members of the chi-square family of distributions.
a. Adjust your WINDOW settings as shown.
b. Enter functions Y1, Y2, and Y3 in the Y= editor as illustrated below. χ²pdf can be found in the DISTR menu (2nd VARS), option 6, on the TI-83 and in the CATALOG under Flash Apps on the TI-89.
c. Change the graph style on Y2 to a thick line.
d. Graph the chi-square density curves for df = 1, 4, and 8 on the same axes.

• As the degrees of freedom increase, the density curves become less skewed and larger values become more probable.
• Table E in the back of the book gives critical values for chi-square distributions.
• You can use Table E if software does not give you P-values for a chi-square test.

The chi-square density curves have the following properties:
i. The total area under a chi-square curve is equal to 1.
ii. Each chi-square curve with df > 2 begins at 0 on the horizontal axis, increases to a peak, and then approaches the horizontal axis asymptotically from above. (The curves for df = 1 and df = 2 have no interior peak; they decrease from the start.)
iii. Each chi-square curve is skewed to the right. As the number of degrees of freedom increases, the curve becomes more and more symmetrical and looks more like a normal curve.

We use the chi-square density curve with n − 1 degrees of freedom, where n is the number of outcome categories (here, the six colors), to calculate the P-value in a goodness of fit test.

A goodness of fit test is used to help determine whether a population has a certain hypothesized distribution, expressed as proportions of population members falling into each of several outcome categories. To test the hypothesis

H0: the actual population proportions are equal to the hypothesized proportions

first calculate the chi-square test statistic (the sum recorded as χ² in Tables 1 and 2)

X² = Σ (O − E)² / E

Then X² has approximately a χ² distribution with n − 1 degrees of freedom. For a test of H0 against the alternative hypothesis

Ha: the actual population proportions are different from the hypothesized proportions

the P-value is P(χ² ≥ X²).
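The whole procedure (expected counts, the (O − E)²/E contributions, the chi-square statistic, df, and the P-value) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the calculator or Table E method: the observed counts are hypothetical numbers made up for illustration, the hypothesized proportions are the advertised 1995 mix quoted in this activity, and `chi_square_sf` is a helper written for this example that evaluates the upper-tail chi-square area using only the standard `math` module.

```python
import math

# Advertised 1995 color mix quoted in this activity (hypothesized proportions).
ADVERTISED = {"brown": 0.30, "yellow": 0.20, "red": 0.20,
              "orange": 0.10, "green": 0.10, "blue": 0.10}

# HYPOTHETICAL counts for one bag (made up for illustration, not real data).
observed = {"brown": 17, "yellow": 11, "red": 12,
            "orange": 6, "green": 5, "blue": 5}


def chi_square_sf(x, df):
    """Upper-tail area P(chi-square >= x) for a positive integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) for even df
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) for odd df
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


total = sum(observed.values())
# Expected count = bag total times the advertised proportion (decimal form).
expected = {color: total * p for color, p in ADVERTISED.items()}
# One (O - E)^2 / E contribution per color.
contributions = {color: (observed[color] - expected[color]) ** 2 / expected[color]
                 for color in observed}
chi_square = sum(contributions.values())
df = len(observed) - 1                          # six colors, so df = 5
p_value = chi_square_sf(chi_square, df)

for color in observed:
    print(f"{color:7s} O = {observed[color]:3d}  E = {expected[color]:6.2f}  "
          f"(O-E)^2/E = {contributions[color]:.4f}")
print(f"chi-square = {chi_square:.4f}, df = {df}, P-value = {p_value:.4f}")
```

To try it on a real bag, replace the counts in `observed` with your own Table 1 (or the class totals from Table 2). A large P-value means the bag is consistent with the advertised mix; a small one is evidence against it.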