Let’s flip a coin

Making Data-Based Decisions
We’re going to flip a coin 10 times.
What results do you think we will get?
The Research Question…
Hypotheses:
Null hypothesis: A coin toss will result in 50% heads and 50% tails.
• Expected data: Equal numbers of heads and tails
Alternative hypothesis 1: Heads will occur more often than tails.
• Expected data: More heads than tails
Alternative hypothesis 2: Heads will occur less often than tails.
• Expected data: Fewer heads than tails
Testing:
• Flip coin 10 times
Results - out of 10 flips
What is the minimum number of heads that you would expect if:
• the null hypothesis is correct? Why?
  (A coin toss will result in 50% heads and 50% tails.)
• alternative hypothesis 1 is correct? Why?
  (Heads will occur more often than tails.)
• alternative hypothesis 2 is correct? Why?
  (Heads will occur less often than tails.)
Results, coin flipped 10 times
• 8, or 80%, of the flips were heads
Which hypothesis does this support?
• Null hypothesis: A coin toss will result in 50% heads and 50% tails.
• Alternative hypothesis 1: Heads will occur more often than tails.
• Alternative hypothesis 2: Heads will occur less often than tails.
Results - out of 10 flips
• 8, or 80%, of the flips were heads.
Is it possible that we could have gotten 8 heads if the
other hypotheses were correct, too?
What if we could actually calculate the
likelihood of getting at least 8 heads?
For instance
• If the null hypothesis is correct, what is the probability that we could have gotten at least 8 heads?
• If alternative hypothesis 1 is correct, what is the probability that we could have gotten at least 8 heads?
Testing the null hypothesis is the easiest…why?
The problem is…
• Null hypothesis – we expect equal numbers of heads and tails
• Alternative 1 – how many more heads would we expect?
WE KNOW EXACTLY WHAT TO EXPECT FOR THE
NULL, BUT HAVE NO IDEA WHAT VALUES TO
EXPECT FOR THE ALTERNATIVE
Let’s set up a simulation…
• Let’s flip a coin 10 times as one sample
  • How many heads would you expect to get? Explain.
TO DETERMINE PROBABILITY REPEAT A LOT OF
TIMES AND SEE HOW OFTEN WE GET AT LEAST 8
HEADS
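The repeat-and-count idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool used to produce the numbers in these slides: the function names `simulate_heads_counts` and `proportion_at_least`, the seed, and the choice of 10,000 samples are all assumptions made for the example.

```python
import random
from collections import Counter


def simulate_heads_counts(num_samples, flips_per_sample=10, seed=None):
    """Flip a fair coin `flips_per_sample` times, `num_samples` times over.

    Returns a Counter mapping "number of heads in a sample" to how many
    samples produced that number. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        # Each flip is heads with probability 0.5, as the null hypothesis says.
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_sample))
        counts[heads] += 1
    return counts


def proportion_at_least(counts, threshold):
    """Share of samples with at least `threshold` heads."""
    total = sum(counts.values())
    hits = sum(n for heads, n in counts.items() if heads >= threshold)
    return hits / total


counts = simulate_heads_counts(num_samples=10_000, seed=1)
for heads in range(11):
    print(heads, counts[heads], counts[heads] / 10_000)
print("Estimated P(at least 8 heads):", proportion_at_least(counts, 8))
```

Because the flips are random, a different seed (or no seed) gives slightly different counts each run; the estimate settles down as the number of samples grows.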
Simulation – 20 samples
[Dot plot: one X per sample, 20 in total. x-axis: number of heads, 0 to 10 (labeled Number of “correct flips”); y-axis: count of samples.]
Number of heads               0     1     2     3     4     5     6     7     8     9     10
Chance of getting this many   0.0   0.05  0.0   0.15  0.15  0.25  0.2   0.1   0.05  0.05  0
Question…
Why don’t we get 5 heads every time we flip a coin 10
times?
Why are some values not represented? We didn’t get any
samples with 0, 2, or 10 heads.
The Impact of Sampling
• We are sampling.
• We don’t expect every sample to look exactly like the population.
• There is going to be variability because of chance.
Simulation – 20 samples
Number of heads   Total times occurred   Probability of occurring
0                 0                      0.0
1                 1                      0.05
2                 0                      0.0
3                 3                      0.15
4                 3                      0.15
5                 5                      0.25
6                 4                      0.2
7                 2                      0.1
8                 1                      0.05
9                 1                      0.05
10                0                      0
Simulation – 10000 samples
Number of heads   Total times occurred   Probability of occurring
0                 5                      0.0005
1                 92                     0.009
2                 461                    0.046
3                 1154                   0.115
4                 1981                   0.198
5                 2537                   0.245
6                 2063                   0.206
7                 1117                   0.117
8                 479                    0.048
9                 101                    0.010
10                6                      0.0006
The Big Question…
What is the likelihood (probability) of having AT LEAST
8 heads in our sample (getting 8, 9, or 10 heads)?
p = 0.048 (8 heads) + 0.010 (9 heads) + 0.0006 (10 heads), so p = 0.0586
p = 0.0586, so the likelihood of this occurring is roughly 6 times
out of 100.
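For a fair coin the same probability can also be worked out exactly rather than estimated from a simulation, by counting how many of the 2^10 equally likely flip sequences contain 8, 9, or 10 heads. The sketch below assumes a fair coin (the null hypothesis); the function name `prob_at_least` is made up for this example.

```python
from math import comb


def prob_at_least(k_min, n, p_heads=0.5):
    """Exact probability of getting at least `k_min` heads in `n` flips."""
    return sum(
        comb(n, k) * p_heads**k * (1 - p_heads) ** (n - k)
        for k in range(k_min, n + 1)
    )


# 45 + 10 + 1 = 56 of the 1024 possible sequences have 8, 9, or 10 heads.
p_exact = prob_at_least(8, 10)
print(round(p_exact, 4))  # 0.0547
```

The exact value, 56/1024, is close to the simulated 0.0586; the gap is the sampling variability of the simulation itself.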
Logic of Statistical Testing
• Inferring from samples – INFERENTIAL STATISTICS
• Scientists collect data from a sample and determine whether or not that sample provides EVIDENCE AGAINST the null hypothesis.
• If the null hypothesis is true, what is the probability we would have randomly chosen a sample with the values we observed?
• Analysis:
  • By looking at the probability of obtaining at least 8 heads (80%) in a sample of 10 flips, we can make a decision.
  • THIS PROBABILITY IS OFTEN CALLED THE P-VALUE
Our Example
• The likelihood of getting at least 8 heads out of 10 in our sample if the null hypothesis were actually true is p = 0.0586, meaning it would occur roughly 6 times out of 100.
• Do you consider this value low or high?
• Do you think it provides enough evidence against the null hypothesis?
Statistical Significance
• Need a cut point for the p-value
• Common “cut points”: 0.05, 0.01, 0.001
• If the p-value < 0.05,
  • You say the result is “statistically significant” and you reject the null hypothesis.
  • If the null hypothesis is true, randomly getting the observed sample is unlikely.
  • This provides evidence against the null hypothesis and we would REJECT the null hypothesis, suggesting that one of the alternative hypotheses is correct.
Statistical Significance
• If the p-value ≥ 0.05,
  • You say the results were “not statistically significant”.
  • If the null hypothesis is true, randomly getting the observed sample is reasonably likely.
  • This does not provide evidence against the null hypothesis and we would FAIL TO REJECT the null hypothesis. Failing to reject the null does not prove it true, and it does not let us reject the alternative hypotheses; it only means the data do not give enough evidence to support them.
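The cut-point rule from the last two slides can be written as a tiny function. This is a sketch for illustration only: the name `decide` is hypothetical, and treating a p-value exactly equal to the cut-off as "not significant" is an assumed convention.

```python
def decide(p_value, cutoff=0.05):
    """Apply the cut-point rule: reject the null only if p is below the cut-off."""
    if p_value < cutoff:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# The coin example: p = 0.0586 with the common 0.05 cut-off.
print(decide(0.0586))  # fail to reject the null hypothesis
```

With a different cut point the same p-value could lead to a different decision, which is why the cut point should be chosen before looking at the data.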
Statistical Tests/Hypothesis Testing/Inferential Tests:
• All statistical tests provide a p-value: the probability that results at least as extreme as yours would have occurred if the null hypothesis were true.
• They use information from your data (mean, standard deviation, etc.) to figure out a probability based upon a population that meets the null hypothesis (much like our coin simulation).
• You use the p-value to make a data-driven decision.
Question:
• What do you think would happen to the probability of getting 80% heads if we had flipped more times?
  • 16 heads out of 20? p = 0.01
  • 40 heads out of 50? p < 0.0001
• Increasing your sample size decreases the chance that your results will be impacted by errors or chance factors that might mask differences.
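To see how the same 80% becomes less and less likely under the null as the number of flips grows, the exact chance of "at least this many heads" can be computed for each sample size. This sketch assumes a fair coin and a one-sided count of heads; `prob_at_least` is a hypothetical helper name. The figures quoted above may come from a simulation or a different test, so the exact values printed here need not match them digit for digit.

```python
from math import comb


def prob_at_least(k_min, n):
    """Exact P(at least `k_min` heads in `n` fair coin flips)."""
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2**n


# Same proportion of heads (80%), increasing number of flips.
for heads, flips in [(8, 10), (16, 20), (40, 50)]:
    print(f"{heads} of {flips}: p = {prob_at_least(heads, flips):.6f}")
```

The probability shrinks quickly as the sample grows, so 80% heads is much stronger evidence against a fair coin in 50 flips than in 10.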
Example Hypotheses and p-value
Cut-off value is 0.05

Null hypothesis: The mean life-span is 15 years.
  P-value: 0.078
  Decision: Do not reject the null hypothesis (P > 0.05)
  Interpretation: There is not enough evidence to suggest the mean life-span is not 15 years.

Null hypothesis: The correlation between amount of nutrient and growth is 0.
  P-value: 0.010
  Decision: Reject the null hypothesis (P < 0.05)
  Interpretation: There is evidence to suggest the correlation is not zero.

Null hypothesis: The mean height of plants exposed to sunlight equals the mean height of plants not exposed to light.
  P-value: 0.0001
  Decision: Reject the null hypothesis (P < 0.05)
  Interpretation: There is evidence to suggest light makes a difference to plant growth.