Section 4.5
Confidence Intervals
and Hypothesis Tests
Bootstrap and Randomization Distributions
Bootstrap Distribution
• Estimates the distribution of sample statistics
• Centered around the observed sample statistic
• Simulate sampling from the population by resampling from the original sample
Randomization Distribution
• Estimates the distribution of sample statistics, if H0 is true
• Centered around the null hypothesized value
• Simulate samples assuming H0 were true
• Big difference: a randomization distribution assumes
H0 is true, while a bootstrap distribution does not
Which Distribution?
• Let μ be the average amount of sleep college students get
per night. Data was collected on a sample of students, and
for this sample x̄ = 6.7 hours.
• A bootstrap distribution is generated to create a confidence
interval for μ, and a randomization distribution is
generated to see if the data provide evidence that μ > 7.
• Which distribution below is the bootstrap distribution?
(a) is centered
around the
sample statistic,
6.7
Which Distribution?
• Intro stat students are surveyed, and we find that 152 out
of 218 are female. Let p be the proportion of intro stat
students at that university who are female.
• A bootstrap distribution is generated for a confidence
interval for p, and a randomization distribution is
generated to see if the data provide evidence that p > 1/2.
• Which distribution is the randomization distribution?
(a) is centered
around the null
value, 1/2
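The difference in centers can be checked by simulation. The following is a minimal sketch, not the method used to draw the slide's plots: it uses only the counts given above (152 of 218), while the number of simulated samples (2000) and the seed are arbitrary assumptions.

```python
import random

random.seed(0)  # arbitrary seed so the sketch is reproducible

n, females = 218, 152          # survey counts from the slide
p_hat = females / n            # observed sample proportion (about 0.697)
p_null = 0.5                   # value stated in H0
reps = 2000                    # number of simulated samples (an assumption)

# The original sample coded as 1 = female, 0 = not female.
sample = [1] * females + [0] * (n - females)

# Bootstrap: resample n students with replacement from the original sample.
boot_props = [sum(random.choices(sample, k=n)) / n for _ in range(reps)]

# Randomization: simulate n students, each female with probability p_null,
# which is what H0 claims about the population.
rand_props = [
    sum(random.random() < p_null for _ in range(n)) / n for _ in range(reps)
]

boot_center = sum(boot_props) / reps
rand_center = sum(rand_props) / reps
print(f"bootstrap center:     {boot_center:.3f}  (sample statistic {p_hat:.3f})")
print(f"randomization center: {rand_center:.3f}  (null value {p_null})")
```

The bootstrap proportions should pile up near 152/218, and the randomization proportions near 1/2, which is how the two plots can be told apart.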
Body Temperature
We created a bootstrap distribution for average
body temperature by resampling with replacement
from the original sample (
Body Temperature
We also created a randomization distribution to see if average
body temperature differs from 98.6°F by adding 0.34 (= 98.6 − 98.26) to every
value to make the null true, and then resampling with
replacement from this modified sample:
Body Temperature
• These two distributions are identical (up to random
variation from simulation to simulation) except for
the center
• The bootstrap distribution is centered around the
sample statistic, 98.26, while the randomization
distribution is centered around the null hypothesized
value, 98.6
• The randomization distribution is equivalent to the
bootstrap distribution, but shifted over
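The shift can be sketched in code. The original body temperature sample is not reproduced in these slides, so the sketch below uses SYNTHETIC stand-in values (50 made-up temperatures near 98.26); the helper `resample_means`, the sample size, and the number of resamples are all assumptions for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is reproducible

# SYNTHETIC stand-in data: 50 made-up temperatures near 98.26,
# NOT the real sample used in the slides.
sample = [round(random.gauss(98.26, 0.7), 1) for _ in range(50)]
x_bar = statistics.mean(sample)
null_value = 98.6


def resample_means(values, reps=2000):
    """Means of `reps` resamples drawn with replacement from `values`."""
    k = len(values)
    return [sum(random.choices(values, k=k)) / k for _ in range(reps)]


# Bootstrap distribution: resample from the original sample.
boot_means = resample_means(sample)

# Randomization distribution: shift every value so the sample mean equals
# the null value, then resample from the shifted sample.
shift = null_value - x_bar
shifted_sample = [v + shift for v in sample]
rand_means = resample_means(shifted_sample)

for name, means in (("bootstrap", boot_means), ("randomization", rand_means)):
    print(f"{name:>13}: center {statistics.mean(means):.2f}, "
          f"spread {statistics.stdev(means):.3f}")
```

The two spreads should agree up to simulation noise, while the centers differ by roughly the shift.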
Body Temperature
[Figure: bootstrap distribution centered at 98.26; randomization distribution centered at 98.6]
H0: μ = 98.6
Ha: μ ≠ 98.6
Body Temperature
[Figure: bootstrap distribution centered at 98.26; randomization distribution centered at 98.4]
H0: μ = 98.4
Ha: μ ≠ 98.4
Intervals and Tests
If a 95% CI contains the parameter in H0, then a
two-tailed test should not reject H0 at a 5%
significance level.
If a 95% CI misses the parameter in H0, then a
two-tailed test should reject H0 at a 5%
significance level.
Intervals and Tests
• A confidence interval represents plausible
values for the population parameter
• If the null hypothesized value IS NOT within
the CI, it is not plausible and should be
rejected
• If the null hypothesized value IS within the CI,
it is plausible and should not be rejected
Body Temperatures
• Using bootstrapping, we found a 95% confidence
interval for the mean body temperature to be
(98.05, 98.47)
• This does not contain 98.6, so at α = 0.05 we
would reject H0 for the hypotheses
H0: μ = 98.6
Ha: μ ≠ 98.6
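The rule connecting a 95% CI to a two-tailed test at α = 0.05 is simple enough to write as a function. This is only an illustrative sketch; the name `two_tailed_decision` is hypothetical, and the two null values tried are the ones used for body temperature in these slides (98.6 and 98.4).

```python
def two_tailed_decision(ci, null_value):
    """Decision of a two-tailed test at alpha = 0.05, read off a 95% CI."""
    low, high = ci
    if low <= null_value <= high:
        return "do not reject H0"  # null value is plausible
    return "reject H0"             # null value is not plausible


body_temp_ci = (98.05, 98.47)  # bootstrap 95% CI from the slide
for null_value in (98.6, 98.4):
    print(f"H0: mu = {null_value}: {two_tailed_decision(body_temp_ci, null_value)}")
```

The same check applies to any parameter, as long as the confidence level and the significance level match (95% with α = 0.05).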
Both Father and Mother
• “Does a child need both a father and a mother to grow
up happily?”
• Let p be the proportion of adults aged 18-29 in 2010
who say yes. A 95% CI for p is (0.487, 0.573).
• Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, do we
reject H0 or not reject H0?
Do not reject H0
0.5 is within the CI, so it is a plausible value for p.
http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1
Both Father and Mother
• “Does a child need both a father and a mother to grow
up happily?”
• Let p be the proportion of adults aged 18-29 in 1997
who say yes. A 95% CI for p is (0.533, 0.607).
• Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, do we
reject H0 or not reject H0?
Reject H0
0.5 is not within the CI, so it is not a plausible value for p.
http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1
Intervals and Tests
• Confidence intervals are most useful when you
want to estimate population parameters
• Hypothesis tests and p-values are most useful
when you want to test hypotheses about
population parameters
• Confidence intervals give you a range of plausible
values; p-values quantify the strength of evidence
against the null hypothesis
Interval, Test, or Neither?
Are the following questions best assessed using a
confidence interval, a hypothesis test, or is statistical
inference not relevant?
• Do a majority of adults riding a bicycle wear a
helmet?
• On average, how much more do adults who played
sports in high school exercise than adults who did
not play sports in high school?
• On average, were the 23 players on the 2010
Canadian Olympic hockey team older than the 23
players on the 2010 US Olympic hockey team?
Cautions About Significance
• With small sample sizes, even large
differences or effects may not be significant
• With large sample sizes, even a very small
difference or effect can be significant
• A statistically significant result is not
always practically significant, especially
with large sample sizes
Statistical vs Practical Significance
Example: Suppose a weight loss program
recruits 10,000 people for a randomized
experiment.
• A difference in average weight loss of only
0.5 lbs could be found to be statistically
significant
• Suppose the experiment lasted for a year. Is
a loss of ½ a pound practically significant?
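A rough calculation shows why sample size matters here. The sketch below uses a normal approximation for a difference in two means, which is a swapped-in shortcut rather than the randomization approach of this chapter. It assumes two equal groups and a standard deviation of 10 lbs for weight loss in each group; neither figure comes from the slide, and the function name `two_sided_p` is hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_group):
    """Approximate two-tailed p-value for a difference in two group means.

    Assumes both groups share the same standard deviation `sd` and size.
    """
    se = math.sqrt(2 * sd ** 2 / n_per_group)  # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


diff = 0.5       # observed difference in average weight loss (lbs), from the slide
assumed_sd = 10  # ASSUMED standard deviation of weight loss (lbs)

for n_per_group in (50, 5000):  # 5000 per group matches 10,000 people in total
    p = two_sided_p(diff, assumed_sd, n_per_group)
    print(f"n per group = {n_per_group:>4}: p-value = {p:.3f}")
```

Under these assumptions the same 0.5 lb difference is far from significant with 50 people per group but significant at α = 0.05 with 5,000 per group, even though its practical importance has not changed.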
Diet and Sex of Baby
Are certain foods in your diet associated with
whether or not you conceive a boy or a girl?
To study this, researchers asked women about
their eating habits, including asking whether or
not they ate 133 different foods regularly.
For each of the 133 foods studied, a hypothesis
test was conducted for a difference between
mothers who conceived boys and girls in the
proportion who consume each food.
http://www.newscientist.com/article/dn13754-breakfast-cereals-boost-chances-of-conceiving-boys.html
Hypothesis Tests
• State the null and alternative hypotheses
pb: proportion of mothers who have boys that consume the
food regularly
pg: proportion of mothers who have girls that consume the
food regularly
H0: pb = pg
Ha: pb ≠ pg
• If there are NO differences (all null hypotheses are true),
about how many significant differences would be found
using α = 0.05?
133 × 0.05 = 6.65
• A significant difference was found for breakfast cereal
(mothers of boys eat more), prompting the headline
“Breakfast Cereal Boosts Chances of Conceiving Boys”.
How might you explain this?
Random chance; several tests (about 6 or 7)
are going to be significant, even if no
differences exist
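The "about 6 or 7" figure can be checked by simulation. This is a sketch under stated assumptions: the 133 tests are treated as independent, and each p-value is drawn as uniform on (0, 1), which is how a p-value behaves when its null hypothesis is true and the test statistic is continuous. The real food tests were run on the same women, so they are unlikely to be fully independent.

```python
import random

random.seed(2)  # arbitrary seed so the sketch is reproducible

alpha = 0.05
n_tests = 133   # number of foods tested, from the slide
reps = 5000     # number of simulated studies (an assumption)

# For each simulated study, count how many of the 133 null-true tests
# come out "significant" (p-value below alpha) purely by chance.
counts = [
    sum(random.random() < alpha for _ in range(n_tests)) for _ in range(reps)
]

expected = n_tests * alpha
average = sum(counts) / reps
print(f"expected false positives per study:  {expected:.2f}")
print(f"simulated average per study:         {average:.2f}")
print(f"range across simulated studies:      {min(counts)} to {max(counts)}")
```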
Multiple Testing
When multiple hypothesis tests are
conducted, the chance that at least one test
incorrectly rejects a true null hypothesis
increases with the number of tests.
If the null hypotheses are all true, about a proportion α of the
tests will yield statistically significant
results just by random chance.
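How quickly the chance of at least one false rejection grows can be sketched with a short calculation. It assumes the tests are independent, which real studies often are not, so the numbers are a guide rather than exact; the function name is hypothetical.

```python
def prob_at_least_one_false_rejection(k, alpha=0.05):
    """P(at least one of k independent null-true tests is significant)."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20, 133):
    prob = prob_at_least_one_false_rejection(k)
    print(f"{k:>3} tests: {prob:.3f}")
```

With a single test the chance is just α; with 133 tests, as in the diet study, at least one false rejection is close to certain.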
Multiple Comparisons
• Consider a topic that is being
investigated by research teams all
over the world
• Using α = 0.05, about 5% of teams are going
to find something significant, even if the
null hypothesis is true
Multiple Comparisons
• Consider a research team/company
doing many hypothesis tests
• Using α = 0.05, about 5% of tests are going
to be significant, even if the null
hypotheses are all true
Multiple Comparisons
This is a serious problem
• The most important thing is to be aware of
this issue, and not to trust claims that are
obviously one of many tests (unless they
specifically mention an adjustment for
multiple testing)
• There are ways to account for this (e.g.
Bonferroni’s Correction), but these are
beyond the scope of this class
Publication Bias
Publication bias refers to the fact that usually
only the significant results get published
• The one study that turns out significant gets
published, and no one knows about all the
non-significant results
• This, combined with the problem of multiple
comparisons, can yield very misleading
results
Jelly Beans Cause Acne!
http://xkcd.com/882/
Summary
• If a null hypothesized value lies inside a 95% CI, a
two-tailed test using α = 0.05 would not reject H0
• If a null hypothesized value lies outside a 95% CI,
a two-tailed test using α = 0.05 would reject H0
• Statistical significance is not always the same as
practical significance
• Using α = 0.05, about 5% of hypothesis tests will lead
to rejecting the null when the null
hypotheses are all true