Skittles Term Project for Statistics

by
Chris Blake
This project will help us to better understand the contents of packages of Skittles. We
will look closely at the amounts of each color of Skittles and compare them. I will also compare
the contents of my Skittles bag with the others in the class.
I’ve started by totaling each color of Skittles and creating a pie chart, a bar graph and a
Pareto chart to help us compare the data.
[Pie chart: Class Total Color Relationships. Five color slices labeled 19%, 24%, 20%, 18% and 19%.]
[Bar graph: Class Total Color Relationships. Five color series, vertical axis from 0 to 350, one bar labeled 294.]
[Pareto chart: Total Class Amounts of Candies by Color. Five colors, vertical axis from 0 to 350.]
I’m observing that red candies are the most produced and packaged of the colors of
Skittles. I would have expected yellow or orange to be the most common, I guess because it
always seems like there are too many yellow and orange. I would do away with both of those
colors if it were up to me.
[Bar graph: My bag of Skittles color relationships. Five colors, vertical axis from 0 to 18.]
The amounts of each color of Skittles in my bag do reflect the class totals. As you
see above, my bag has mostly red, then green, then purple, and yellow and orange are tied
for the fewest.
Five-number summary for the total number of candies per bag:
Median = 60, Q1 = 59, Q3 = 61.5, Minimum = 51, Maximum = 64
Sample standard deviation = 3.12
Here are my frequencies of candies per bag from 21 total bags:
[Bar graph: Candies per bag. Bags 1 through 21 on the horizontal axis, vertical axis from 0 to 70.]
I’m observing that the Skittles company is pretty consistent at putting a certain number
of candies into each bag, as well as a certain number of each color into each bag. The
distribution of candies per bag is roughly bell shaped but skewed to the left, with a tail on
the left. These graphs don’t reflect what I thought I’d see. I thought there were many more yellow
and orange per bag and was wrong. However, I did think that they would be consistent on the
number of candies per bag. The overall data from the whole class agrees with the data from my
one bag. My bag had 60 candies, which matches the class median of 60, and there were 21 total
bags of Skittles. I found that the pie chart and bar graphs were the easiest to read and
comprehend when dealing with this type of data. I don’t believe a scatter plot would be good
for something like this because it would be hard to understand the data and compare differences.
Confidence Interval Estimates
Explain in general the purpose and meaning of a confidence interval.
A confidence interval is a type of interval estimate of a population parameter. It gives a
range of plausible values for the parameter based on a sample. The level of confidence
indicates how often intervals built this way would capture the true population parameter
if we took many samples.
Construct a 99% confidence interval estimate for the true proportion of yellow candies.
The true proportion of yellow candies falls between 0.165 and 0.222 at a 99% confidence
level, and my calculations confirm this. Please see answers on the attached Spreadsheet.
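As a rough check on the interval above, here is a minimal sketch of the usual normal-approximation interval for a proportion. The yellow count of 243 is an assumption picked to be about 19% of the 1,257 candies; the real count is on the spreadsheet, and the function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value (about 2.576 for 99% confidence).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# ASSUMPTION: 243 yellow candies is a hypothetical count (about 19% of 1,257).
low, high = proportion_ci(243, 1257, 0.99)
print(f"99% CI for the proportion of yellow: {low:.3f} to {high:.3f}")
```

With that assumed count, the result lands very close to the 0.165 to 0.222 reported above.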
Construct a 95% confidence interval estimate for the true mean number of candies per bag.
The true mean falls between 58.566 and 61.234 at a 95% confidence level, and my
calculations confirm this. Please see answers on the attached Spreadsheet.
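The interval for the mean can be sketched from the summary numbers in this report (mean 59.9, sample standard deviation 3.12, 21 bags). The reported interval matches a z critical value. Because there are only 21 bags and the population standard deviation is unknown, a t critical value is often recommended, so the sketch shows both. The 2.086 is copied from a standard t table for 20 degrees of freedom, and the function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean, sd, n, critical_value):
    """Interval for a mean: mean plus or minus critical_value * sd / sqrt(n)."""
    margin = critical_value * sd / sqrt(n)
    return mean - margin, mean + margin


n, mean, sd = 21, 59.9, 3.12

# z critical value for 95% confidence (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)
z_low, z_high = mean_ci(mean, sd, n, z_crit)

# t critical value for 95% confidence and 20 degrees of freedom,
# copied from a standard t table.
T_CRIT_95_DF20 = 2.086
t_low, t_high = mean_ci(mean, sd, n, T_CRIT_95_DF20)

print(f"z interval: {z_low:.3f} to {z_high:.3f}")
print(f"t interval: {t_low:.3f} to {t_high:.3f}")
```

The t interval is a little wider than the z interval, but both are centered on 59.9 and both contain 60.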
Construct a 98% confidence interval estimate for the standard deviation of the number of candies
per bag.
The true standard deviation falls between 2.27 and 4.855 at a 98% confidence level, and my
calculations confirm this. Please see answers on the attached Spreadsheet.
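The standard deviation interval uses the chi-square distribution with n - 1 degrees of freedom. A minimal sketch, assuming the two critical values are read from a standard chi-square table for 20 degrees of freedom (they are typed in as constants here, not computed), and with a function name made up for illustration:

```python
from math import sqrt

# Chi-square critical values for 20 degrees of freedom at 98% confidence,
# copied from a standard chi-square table.
CHI2_RIGHT = 37.566  # area of 0.01 to the right
CHI2_LEFT = 8.260    # area of 0.99 to the right


def sd_ci(sd, n, chi2_right, chi2_left):
    """Interval for a population standard deviation."""
    df = n - 1
    low = sqrt(df * sd ** 2 / chi2_right)
    high = sqrt(df * sd ** 2 / chi2_left)
    return low, high


sd_low, sd_high = sd_ci(3.12, 21, CHI2_RIGHT, CHI2_LEFT)
print(f"98% CI for the standard deviation: {sd_low:.3f} to {sd_high:.3f}")
```

The larger critical value goes with the lower limit and the smaller one with the upper limit, which is easy to get backwards.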
Discuss and interpret the results of each of your three interval estimates. Include neatly written
and scanned copies of your work.
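After calculating the above, I am 99% confident that the true proportion of yellow candies
is between 16.5% and 22.2%, which fits the 19% yellow I found in the class data. I am 95%
confident that the true mean number of candies per bag is between 58.566 and 61.234, which
fits the average of about 60 that I calculated. I am 98% confident that the true standard
deviation falls between 2.27 and 4.855, which fits the sample standard deviation that I
calculated, which is 3.12.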
Hypothesis Tests
Explain in general the purpose and meaning of a hypothesis test.
A hypothesis test uses sample data to decide whether there is enough evidence against a
claim about a population. The goal is to either reject or fail to reject the null
hypothesis.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
P = 0.003, which is less than 0.05, so I reject the null hypothesis. Work on spreadsheet.
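Here is a minimal sketch of a two-tailed z-test for one proportion. It assumes that the bar labeled 294 in the class bar graph is the red count, which agrees with the 23% red mentioned in this report. The function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z-test of the claim that the true proportion equals p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not p_hat.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# ASSUMPTION: 294 red candies out of 1,257 (read from the bar graph label).
z_red, p_red = one_proportion_z_test(294, 1257, 0.20)
print(f"z = {z_red:.3f}, P = {p_red:.4f}")
print("Reject H0" if p_red < 0.05 else "Fail to reject H0")
```

With that assumed count the P-value comes out near the 0.003 reported above.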
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55.
P = 6.19 × 10^-13, which is basically 0 and is less than 0.01, so I reject the null
hypothesis. Work on spreadsheet.
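The test of the mean can be sketched as a two-tailed z-test using the summary numbers in this report (mean 59.9, sample standard deviation 3.12, 21 bags). Treating the sample standard deviation as if it were the population value is an assumption. With only 21 bags a t-test is often preferred, though the P-value is so small here that the conclusion would very likely be the same. The function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_mean_z_test(sample_mean, sd, n, mu0):
    """Two-tailed z-test of the claim that the true mean equals mu0."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    # Using the lower tail keeps precision when the P-value is tiny.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z_mean, p_mean = one_mean_z_test(59.9, 3.12, 21, 55)
print(f"z = {z_mean:.3f}, P = {p_mean:.3g}")
print("Reject H0" if p_mean < 0.01 else "Fail to reject H0")
```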
Discuss and interpret the results of each of your two hypothesis tests. Include neatly written and
scanned copies of your work.
20% of the Skittles are NOT red. My calculations said 23% are red, which looks close to
20%, but with 1,257 candies the difference is too large to be chance alone. P is much
lower than alpha here, so the null hypothesis is rejected. There is sufficient evidence
against the claim that 20% of Skittles are red.
55 is NOT the mean # of candies per bag. I calculated that 60 is the mean # of candies and
I can tell by completing a Z-test on my TI84 that P is less than alpha here. There is
sufficient evidence against the claim that the mean is 55, so the null hypothesis is rejected.
Reflection
Discuss the conditions for doing interval estimates and hypothesis tests and discuss whether or
not your samples met these conditions. What possible errors could have been made by using this
data? How could the sampling method be improved? State what conclusions you have drawn
from your statistical research.
You should have a random sample, and it should be larger than 30 if possible. We only had
21 bags of candy used as n on some of these, which is less than 30, and 1,257 total candies,
which is plenty to be used as n on some of the other calculations. An easy error for me was
to switch those 2 numbers up. I used 1,257 as n when I should have used 21 and vice versa.
The method could be improved by adding more samples, meaning more bags of candy. My
conclusion is that red candies are the most prevalent color of candies in a bag of Skittles,
orange are the least prevalent and the mean # of candies per bag is 59.9.
In reflection I can see that the information I learned in this class will be helpful to me in
other classes as well as in my general life. I understand how the odds for and against an
outcome work much better than I did before. I’ve found myself questioning the methods of
collecting data in studies I’ve recently read, and I understand the boxplots and graphs that are
used in studies in a much clearer way.
My problem-solving skills are more calculated altogether after practicing Statistics.
When I make a decision I’ve always calculated risks and rewards in my head, in order to try to
make the right decision. Now I can actually calculate some of those outcomes and make even
more educated decisions.
In a meeting with my daughter’s teacher the other day I was presented with two pages
of box plots. As the teacher explained these to my wife and me she acted as if we would not
understand, and I felt like she hated going over this stuff with parents. However, I did
understand exactly what she was talking about as she went through the means, standard
deviations and more. I reassured her that I knew what she was talking about and showed my
understanding by pointing out each quartile to my wife and explaining, in my way, what these
numbers meant to us as they related to our child.
It made me feel great to have that understanding and made life a bit easier for me in
that moment.
As I move forward through life I will view graphs, charts, plots and even correlation with
clear eyes and confidence. So although this class was very difficult for me, I do appreciate that I
was able to attend and learn all that I have.