Skittles Term Project for Statistics
by Chris Blake

This project will help us better understand the contents of packages of Skittles. We will look closely at the amounts of each color of Skittles and compare them. I will also compare the contents of my Skittles bag with the others in the class. I’ve started by totaling each color of Skittles and creating a pie chart, a bar graph, and a Pareto chart to help us compare the data.

[Pie chart: Class Total Color Relationships. Colors 1 through 5; slice labels 19%, 24%, 20%, 18%, 19%]
[Bar graph: Class Total Color Relationships. Series 1 through 5; vertical axis 0 to 350; one bar labeled 294]
[Pareto chart: Total Class Amounts of Candies by Color. Colors 1 through 5; vertical axis 0 to 350]

I observe that red candies are by far the most produced and packaged of the Skittles colors. I would have expected yellow or orange to be the most common, I guess because it always seems like there are too many yellow and orange. I would do away with both of those colors if it were up to me.

[Bar graph: My Bag of Skittles Color Relationships. Colors 1 through 5; vertical axis 0 to 18]

The amounts of each color of Skittles in my bag do reflect the class data. As the graph shows, my bag has mostly red, then green, then purple, and yellow and orange are tied.

Five-number summary for the total number of candies per bag:
Minimum = 51, Q1 = 59, Median = 60, Q3 = 61.5, Maximum = 64
Sample standard deviation = 3.12

Here are the frequencies of candies per bag from 21 total bags:

[Bar graph: Candies per Bag. Bags 1 through 21; vertical axis 0 to 70]

I observe that the Skittles company is pretty consistent at putting a certain number of candies into each bag, as well as a certain mix of colors. The distribution is roughly bell shaped but skewed to the left, with a tail on the left. These graphs don’t reflect what I thought I’d see. I thought there were many more yellow and orange candies per bag, and I was wrong.
However, I did think that they would be consistent on the number of candies per bag. The overall data from the whole class agrees with the data from my one bag. My bag had 60 candies, and there were 21 total bags of Skittles. I found that the pie chart and bar graphs were the easiest to read and comprehend when dealing with this type of data. I don’t believe a scatter plot would be good for something like this, because it would be hard to understand the data and compare differences.

Confidence Interval Estimates

Explain in general the purpose and meaning of a confidence interval.

A confidence interval is a type of interval estimate of a population parameter. The confidence level is the proportion of such intervals, built from repeated samples, that would capture the true population parameter.

Construct a 99% confidence interval estimate for the true proportion of yellow candies.

The true proportion of yellow candies falls within 0.165 to 0.222 at a 99% confidence level, and the sample proportion from my calculations is inside this range. Please see answers on the attached spreadsheet.

Construct a 95% confidence interval estimate for the true mean number of candies per bag.

The true mean falls within 58.566 to 61.234 at a 95% confidence level, and the sample mean from my calculations is inside this range. Please see answers on the attached spreadsheet.

Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.

The true standard deviation falls within 2.27 < σ < 4.855 at a 98% confidence level, and the sample standard deviation from my calculations is inside this range. Please see answers on the attached spreadsheet.

Discuss and interpret the results of each of your three interval estimates. Include neatly written and scanned copies of your work.

After calculating the above, I determined that about 19% of the candies are yellow, which lies inside the 99% confidence interval, and that the average number of candies per bag is about 60.
The standard deviation falls between 2.27 and 4.855, which is consistent with the sample standard deviation of 3.12 that I calculated.

Hypothesis Tests

Explain in general the purpose and meaning of a hypothesis test.

A hypothesis test uses sample data to decide whether a claim about a population is plausible. The goal is to either reject or fail to reject the null hypothesis.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.

P = 0.003, which is less than 0.05, so I reject the null hypothesis. Work on spreadsheet.

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55.

P = 6.19 × 10^-13, which is basically 0 and is less than 0.01, so I reject the null hypothesis. Work on spreadsheet.

Discuss and interpret the results of each of your two hypothesis tests. Include neatly written and scanned copies of your work.

20% of the Skittles are NOT red. My calculations said 23% are red, which is pretty close, but P is much lower than alpha here, so the null hypothesis is rejected. The data do NOT support the null hypothesis.

55 is NOT the mean number of candies per bag. I calculated that the mean is about 60, and I can tell by completing a Z-test on my TI-84 that P is less than alpha here. There is not sufficient evidence to support the claim, so the null hypothesis is rejected.

Reflection

Discuss the conditions for doing interval estimates and hypothesis tests and discuss whether or not your samples met these conditions. What possible errors could have been made by using this data? How could the sampling method be improved? State what conclusions you have drawn from your statistical research.

You should have a random sample larger than 30 if possible. We only had 21 bags of candy, which was n for some of these calculations, and 1,257 total candies, which is plenty to be used as n for the others. An easy error for me was to switch those two numbers: I used 1,257 as n when I should have used 21, and vice versa. The method could be improved by adding more samples.
That means more bags of candy. My conclusion is that red candies are the most prevalent color in a bag of Skittles, orange are the least prevalent, and the mean number of candies per bag is 59.9.

In reflection, I can see that the information I learned in this class will be helpful to me in other classes as well as in my general life. I understand how the odds for and against an outcome work much better than I did before. I’ve found myself questioning the methods of collecting data in studies I’ve recently read, and I understand the boxplots and graphs that are used in studies much more clearly. My problem-solving skills are more calculated altogether after practicing statistics. When I make a decision, I’ve always calculated risks and rewards in my head in order to try to make the right decision. Now I can actually calculate some of those outcomes and make even more educated decisions.

In a meeting with my daughter’s teacher the other day, I was presented with two pages of boxplots. As the teacher explained these to my wife and me, she acted as if we would not understand, and I felt like she hated going over this stuff with parents. However, I did understand exactly what she was talking about as she went through the means, standard deviations, and more. I reassured her that I knew what she was talking about and showed my understanding by pointing out each quartile to my wife and explaining, in my way, what these numbers meant to us as they related to our child. It made me feel great to have that understanding, and it made life a bit easier for me in that moment.

As I move forward through life, I will view graphs, charts, plots, and even correlation with clear eyes and confidence. So although this class was very difficult for me, I do appreciate that I was able to attend and learn all that I have.
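
As a check on the two hypothesis tests reported earlier, the Z-tests can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the spreadsheet or TI-84 work itself. The function names are hypothetical. The red count of 294 is assumed from the one bar label that survives in the class bar graph, since 294 out of 1,257 is about 23%. Both tests are treated as two-tailed because each claim is stated as an equality.

```python
from math import erfc, sqrt


def two_tailed_p(z):
    """Two-tailed P-value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))


def proportion_z_test(successes, n, p0):
    """Z-test of H0: population proportion equals p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, two_tailed_p(z)


def mean_z_test(mean, sd, n, mu0):
    """Z-test of H0: population mean equals mu0, using the sample SD."""
    z = (mean - mu0) / (sd / sqrt(n))
    return z, two_tailed_p(z)


# Claim: 20% of all Skittles are red (alpha = 0.05).
# ASSUMPTION: 294 red candies out of the 1,257 in the class sample.
red_z, red_p = proportion_z_test(294, 1257, 0.20)
reject_red = red_p < 0.05

# Claim: the mean number of candies per bag is 55 (alpha = 0.01).
# Uses the reported mean 59.9, SD 3.12, and 21 bags.
mean_z, mean_p = mean_z_test(59.9, 3.12, 21, 55)
reject_mean = mean_p < 0.01

print("Red proportion: z = %.2f, P = %.3f, reject H0: %s"
      % (red_z, red_p, reject_red))
print("Mean per bag:   z = %.2f, P = %.2e, reject H0: %s"
      % (mean_z, mean_p, reject_mean))
```

Under these assumptions both P-values come out close to the reported values (about 0.003 and on the order of 10^-13), and both null hypotheses are rejected.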