Chapter 2 Slides

Estimation:
How Large is the Effect?
Chapter 2
Chapter Overview

So far, we can only say things like
◦ “We have strong evidence that the long-run
probability Buzz pushes the correct button is
larger than 0.5.”
◦ “We do not have strong evidence kids have a
preference between candy and a toy when trick-or-treating.”

We want a method that says
◦ “I believe 68 to 75% of all elections can be
correctly predicted by the competent face
method.”
Chapter Overview
Estimation tells how large the effect is,
through an interval of values.
 We can be 95% confident that taking aspirin
bi-daily reduces the rate of heart attacks by
somewhere between 30% and 50%.
“If the election were held today, would you vote
for Barack Obama or Mitt Romney?”
 51% responded Obama (margin of error is ± 3
percentage points)
Confidence Intervals
These interval estimates of a population
parameter are called confidence intervals.
 We will find confidence intervals three ways.

◦ Through a series of tests of significance to see
which proportions are plausible values for the
parameter.
◦ Using the standard deviation of the simulated null
distribution to help us determine the width of the
interval.
◦ Through traditional theory-based methods.
Statistical Inference –
Confidence Intervals
Section 2.1
Can Dogs Sniff Out Cancer?
Example 2.1
Marine sniffing samples
Can Dogs Sniff Out Cancer?
Marine, a dog originally trained for water rescues,
was tested to see if she could detect whether a
patient had colorectal cancer by smelling a sample of
their breath.
 She first smells a bag from a patient with colorectal
cancer.
 Then she smells 5 other samples; 4 from normal
patients and 1 from a person with colorectal
cancer
 She is trained to sit next to the bag that matches
the scent of the initial bag (the “cancer scent”) by
being rewarded with a tennis ball.
Can Dogs Sniff Out Cancer?
Marine was tested with a set of 33 trials.


Null hypothesis: Marine is randomly
guessing which bag is the cancer specimen
(𝜋 = 0.20)
Alternative hypothesis: Marine can detect
cancer better than guessing (𝜋 > 0.20)
𝜋 represents her long-run probability of
identifying the cancer specimen.
Can Dogs Sniff Out Cancer?
30 out of 33 trials resulted in Marine
correctly identifying the bag from the
cancer patient
 So our sample proportion is

$\hat{p} = \frac{30}{33} \approx 0.909$
Do you think Marine can detect cancer?
What sort of p-value will we get?
Can Dogs Sniff Out Cancer?
Our sample proportion lies more than 10 standard
deviations above the mean, and hence our p-value is essentially 0.
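The "more than 10 standard deviations" claim can be checked by hand. The sketch below is a minimal illustration, assuming the SD of the null distribution is taken from the formula sqrt(𝜋(1 − 𝜋)/n) rather than from a simulation, and using an exact binomial tail in place of the applet. The variable names are hypothetical.

```python
from math import comb, sqrt

n, successes = 33, 30   # Marine's trials and correct identifications
pi_null = 0.20          # null hypothesis: random guessing among 5 bags

p_hat = successes / n
# Assumed: theoretical SD of the sample proportion under the null
sd_null = sqrt(pi_null * (1 - pi_null) / n)
z = (p_hat - pi_null) / sd_null

# Exact one-sided p-value: P(X >= 30) when pi = 0.20
p_value = sum(
    comb(n, k) * pi_null**k * (1 - pi_null) ** (n - k)
    for k in range(successes, n + 1)
)

print(f"p-hat = {p_hat:.3f}, standardized statistic = {z:.2f}")
print(f"one-sided p-value = {p_value:.2e}")
```

The p-value is not literally 0, but it is far too small to show up in a simulation of ordinary size.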
Can Dogs Sniff Out Cancer?




We found that Marine’s probability of
picking the correct specimen is greater than
0.2, but what is it?
Since our sample proportion is about 0.909,
it is plausible that 0.909 is a value for the
probability. What about other values?
Is it plausible that Marine’s probability is
actually 0.70 and she had a lucky day?
Is a sample proportion of 0.909 unlikely if
𝜋 = 0.70?
Can Dogs Sniff Out Cancer?
H0: 𝜋 = 0.70 Ha: 𝜋 ≠ 0.70
 We get a small p-value (0.011) so we can
rule out 0.70 as a plausible value for 𝜋.

Can Dogs Sniff Out Cancer?
What about 0.80?
 Is 0.909 unlikely if 𝜋 = 0.80?

Can Dogs Sniff Out Cancer?

H0: 𝜋 = 0.80
Ha: 𝜋 ≠ 0.80

We get a large p-value (0.146) so 0.80 is a
plausible value for Marine’s long-run probability.
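The two tests above can be sketched without the applet. This is a minimal illustration, assuming an exact binomial calculation in which the two-sided p-value adds up every outcome that is no more likely than the one observed. The applet uses simulation, so its p-values (0.011 and 0.146) will not match these exactly. The function name `two_sided_p_value` is hypothetical.

```python
from math import comb

def two_sided_p_value(successes, n, pi_null):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed count under pi_null."""
    pmf = [comb(n, k) * pi_null**k * (1 - pi_null) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(p for p in pmf if p <= cutoff)

for pi_null in (0.70, 0.80):
    p = two_sided_p_value(30, 33, pi_null)
    print(f"H0: pi = {pi_null:.2f}  two-sided p-value = {p:.3f}")
```

The conclusions agree with the slides: 0.70 gets a small p-value and is ruled out, while 0.80 gets a large p-value and stays plausible.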
Developing a range of plausible values
If I get a small p-value (like I did with 0.70) I
will conclude that the value under the null is
not plausible. This is when we conclude the
alternative hypothesis.
 If I get a large p-value (like I did with 0.80) I
will conclude the value under the null is
plausible. This is when I can’t conclude the
alternative.

Developing a range of plausible values
Let’s use the one-proportion inference
applet to find a range of plausible values
for Marine’s long-run probability of
choosing the correct specimen.
 We will keep the sample proportion the
same and change the possible values of 𝜋.
 We will first use 0.10 as our cutoff value
for whether a p-value is small or large. (This is
called the significance level.)

Can Dogs Sniff Out Cancer?
We should have found that values between
0.79 and 0.96 are plausible values for
Marine’s probability of picking the correct
specimen.
 We can do more tests and find a more
precise interval to be 0.787 to 0.966.

Probability under null   p-value   Plausible?
0.785                    0.085     No
0.786                    0.093     No
0.787                    0.110     Yes
0.788                    0.144     Yes
…                        …         Yes
0.965                    0.109     Yes
0.966                    0.102     Yes
0.967                    0.094     No
0.968                    0.088     No
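The repeated-testing search can be automated. The sketch below is a rough illustration, assuming exact binomial two-sided p-values instead of the applet's simulated ones, so the endpoints it reports may differ slightly from 0.787 and 0.966. The helper names are hypothetical.

```python
from math import comb

def two_sided_p_value(successes, n, pi_null):
    """Exact two-sided p-value (outcomes no more likely than the observed one)."""
    pmf = [comb(n, k) * pi_null**k * (1 - pi_null) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-9)
    return sum(p for p in pmf if p <= cutoff)

def plausible_interval(successes, n, alpha):
    """Test every null value 0.001, 0.002, ..., 0.999 and keep those
    whose p-value is larger than the significance level alpha."""
    grid = [i / 1000 for i in range(1, 1000)]
    plausible = [pi for pi in grid if two_sided_p_value(successes, n, pi) > alpha]
    return min(plausible), max(plausible)

low90, high90 = plausible_interval(30, 33, alpha=0.10)  # 90% confidence
low95, high95 = plausible_interval(30, 33, alpha=0.05)  # 95% confidence
print(f"90% interval: ({low90:.3f}, {high90:.3f})")
print(f"95% interval: ({low95:.3f}, {high95:.3f})")
```

Lowering the significance level from 0.10 to 0.05 can only add plausible values, so the 95% interval is at least as wide as the 90% interval.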
Can Dogs Sniff Out Cancer?
(0.787, 0.966) is called a confidence
interval.
 Since we used 10% as our significance level,
this is a 90% confidence interval. (100% −
10%)
 90% is the confidence level of the interval
of plausible values.

Can Dogs Sniff Out Cancer?
We would say we are 90% confident that
Marine’s probability of correctly picking
the bag with breath from the cancer
patient from among 5 bags is between
0.787 and 0.966.
 This is a more precise statement than our
initial significance test which concluded
Marine’s probability was more than 0.20.

Significance Level
Confidence Level
Typically we use 0.05 for our significance
level. There is nothing magical about 0.05.
We could set up our test to make it harder
to conclude the alternative (smaller
significance level) or easier (larger
significance level).
 If we increase the confidence level from
90% to 95%, what will happen to the width
of the confidence interval?

Can Dogs Sniff Out Cancer?
Since the confidence level gives an
indication of how sure we are that we
captured the actual value of the parameter
in our interval, to be more sure our
interval should be wider.
 How would we obtain a wider interval of
plausible values to represent a 95%
confidence level?

◦ Use a 5% significance level in the tests.
Can Dogs Sniff Out Cancer?
Values that correspond to 2-sided p-values
larger than 0.05 should now be in our
interval.
 Using the table we developed, what is the
95% confidence interval for Marine’s long-run probability?

Exploration 2.1: Kissing Right
As you work through this exploration …
Exploration 2.1
Your first test will be one-sided, but after
that everything is a two-sided test.
 The sample proportion stays constant.
 The parameter under the null will change
since we are testing to see if different
parameters are plausible or not.
 For small p-values we can rule the
parameter out.
 For large p-values, the parameter is
plausible.

2SD and Theory-Based Methods
for Confidence Intervals
Section 2.2
Introduction



In Section 2.1 we found confidence intervals
by doing repeated tests of significance
(changing the value in the null hypothesis) to
find a range of values that were plausible for
the population parameter (long run relative
frequency or probability).
This is a very tedious way to construct a
confidence interval.
Today, we will look at two other ways to
construct confidence intervals.
◦ Two Standard Deviations (2SD).
◦ Theory-based.
Example 2.2: Halloween
Treats (continued)
In Example 1.5 we looked at a study
where researchers investigated whether
children show a preference for toys or
candy.
 Test households in five Connecticut
neighborhoods offered children two
plates:

◦ One with candy
◦ One with small, inexpensive toys
Halloween Treats

We tested the hypotheses:
◦ Null: The proportion of trick-or-treaters who
choose candy is 0.5. (π= 0.5)
◦ Alternative: The proportion of trick-or-treaters
who choose candy is not 0.5. (π ≠ 0.5)

Our sample proportion was:
◦ 148 out of 283 (52.3%) chose candy
Halloween Treats
When we ran this test, we got a very large p-value
so we could not conclude that π was not 0.5. This
means 0.5 is a plausible value for π.
Halloween Treats
What are some other plausible values for π?
 Through repeated tests we come up with:

Prob. under null   2-sided p-value   Plausible value (@0.05)?
0.463              0.048             No
0.464              0.049             No
0.465              0.057             Yes
……                 ……                ……
0.581              0.053             Yes
0.582              0.047             No
0.583              0.046             No
Halloween Treats
Thus we found a 95% confidence interval of 0.465
to 0.581.
 This means, we are 95% confident that the
probability a child will choose candy while trick-or-treating is between 0.465 and 0.581.
 Does it make sense that 0.5 is in the interval?
Why?

Halloween Treats

Remember that a two-sided p-value of less than 0.05
corresponds roughly to a standardized statistic of
2 or larger (or -2 or smaller).
This is because, for most symmetric, bell-shaped
distributions, about 95% of the values fall within
2 SD of the mean.
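For the normal distribution, the "about 95% within 2 SD" rule can be checked directly. The sketch below is a small illustration using the standard normal curve from Python's `statistics` module. Real null distributions are only approximately normal, so treat the numbers as a guide.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Share of the distribution within 2 SD of the mean
within_2sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Two-sided p-value for a standardized statistic of exactly 2
two_sided_p_at_2 = 2 * (1 - standard_normal.cdf(2))

print(f"within 2 SD: {within_2sd:.4f}")
print(f"two-sided p-value at z = 2: {two_sided_p_at_2:.4f}")
```

The two numbers add to 1: a standardized statistic beyond 2 in either direction lands in the outer 5% or so of the distribution.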
Halloween Treats


So we could say that a parameter value is
plausible if it is within 2 standard deviations
(SD) from our best estimate of the
parameter, our observed sample statistic.
This gives us the simple formula for a 95%
confidence interval of
$\hat{p} \pm 2 \times SD$
Halloween Treats

Using the 2SD method on our Halloween treat
data we get a 95% confidence interval
0.523 ± 2(0.030)
0.523 ± 0.060
The 0.060 above is called the margin of
error.
 The interval can also be written as we did
before using just the endpoints; (0.463, 0.583)
 This is approximately what we got with our
range of plausible values method
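The 2SD calculation can be reproduced in a few lines. This sketch assumes the SD of the null distribution is computed from the formula sqrt(0.5 × 0.5 / n) rather than read off a simulation, which gives about 0.030 here. The variable names are hypothetical.

```python
from math import sqrt

n, chose_candy = 283, 148
p_hat = chose_candy / n

# Assumed: theoretical SD of the null distribution when pi = 0.5
sd_null = sqrt(0.5 * 0.5 / n)

margin_of_error = 2 * sd_null
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(f"p-hat = {p_hat:.3f}, SD = {sd_null:.3f}, margin of error = {margin_of_error:.3f}")
print(f"95% interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

Working with unrounded values shifts the endpoints by about 0.001 compared with the slide, which rounds the SD to 0.030 before doubling it.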

Halloween Treats



The 2SD method only gives us a 95%
confidence interval
If we want a different level of confidence, we
can use the range of plausible values (hard) or
theory-based methods (easy).
In the theory-based applet, you can input any
level of confidence and the applet will
calculate the confidence interval for you. This
is valid provided there are at least 10
successes and 10 failures in your sample
(validity conditions).
Applets
Let’s check out this example using applets
and doing the 2SD method and theory-based method.
 Remember 52.3% of 283 trick-or-treaters
chose candy.

Predicting Elections from Faces
(continued)
Exploration 2.2
Factors that affect the width of a
confidence interval
Section 2.3
Factors Affecting
Confidence Interval Widths

Level of confidence (e.g., 90% vs. 95%)
◦ As we increase the confidence level, we
increase the width of the interval.

Sample size
◦ As sample size increases, variability decreases
and hence the standard deviation will be
smaller. This will result in a narrower interval.

Sample proportion
◦ Intervals are wider when p̂ is closer to 0.5.
Level of Confidence
Let’s take another look at the St. George
Hospital heart transplant data.
 In the sample of 361 patients, 71 of them
died within 30 days of their heart
transplant, and 290 survived.
 Each of these counts of “successes” and
“failures” is greater than 10, so we can
use our theory-based applet to play
around with confidence intervals.

Level of Confidence
Since the standard
deviation is predictable,
we can use the Theory-Based
Inference Applet
to easily find a
confidence interval for
the 30-day mortality
rate for heart transplant
patients at St. George’s.
(Notice that the
confidence level can be
adjusted.)
Level of Confidence



What happens to the width of the confidence
interval as we change the level of confidence?
As the level of confidence increases, the width
of our confidence interval increases (and hence
the margin of error increases).
We are more confident of capturing our
parameter in a wider range of values.
Level of confidence   Confidence interval   Written as p̂ ± margin of error
80%                   (0.170, 0.224)        0.197 ± 0.027
90%                   (0.163, 0.231)        0.197 ± 0.034
95%                   (0.156, 0.238)        0.197 ± 0.041
99%                   (0.143, 0.251)        0.197 ± 0.054
Sample Size

We know as sample size increases, the
variability (and thus standard deviation) in
our null distribution decreases
[Figure: simulated null distributions for n = 90 (SD = 0.054), n = 361 (SD = 0.026), and n = 1444 (SD = 0.013)]

Sample size   SD of null distr.   Margin of error    Confidence interval
90            0.053               2 × SD = 0.106     (0.091, 0.303)
361           0.027               2 × SD = 0.054     (0.143, 0.251)
1444          0.013               2 × SD = 0.026     (0.171, 0.223)
Sample Size
 (With everything else staying the same) increasing the sample size will
make a confidence interval narrower.
Notice:
 The observed sample proportion is the midpoint. (That won’t change.)
 Margin of error is a multiple of the standard deviation, so as the standard
deviation decreases, so will the margin of error.
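The effect of sample size on the margin of error can be sketched as follows. This is an illustration only, assuming the 2SD margin with the SD taken from sqrt(𝜋(1 − 𝜋)/n) at 𝜋 = 0.5, which is close to the SD values shown for the three sample sizes. The function name `two_sd_margin` is hypothetical.

```python
from math import sqrt

def two_sd_margin(n, pi=0.5):
    """2SD margin of error, with the SD from the formula sqrt(pi(1 - pi)/n)."""
    return 2 * sqrt(pi * (1 - pi) / n)

margins = {n: two_sd_margin(n) for n in (90, 361, 1444)}
for n, margin in margins.items():
    print(f"n = {n:4d}  margin of error = {margin:.3f}")
```

Because n sits under a square root, quadrupling the sample size (361 to 1444) cuts the margin of error in half rather than to a quarter.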
Formula for Theory-Based
Confidence Interval

The Theory-Based Inference applet uses the
one proportion z-interval formula to
calculate confidence intervals:
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
The width of the confidence interval increases
as level of confidence increases (see z*)
 The width of the confidence interval decreases
as the sample size increases
 The value of p̂ also has a more subtle effect. The
farther it is from 0.5, the smaller the width.
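The one-proportion z-interval can be computed directly for the St. George's data (71 deaths among 361 patients). The sketch below is a minimal version, assuming z* is taken from the standard normal distribution via `statistics.NormalDist`. The applet may round differently, so the last digit of an endpoint can differ. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Theory-based interval: p-hat +/- z* x sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    # z* leaves (1 - confidence)/2 in each tail of the standard normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for level in (0.80, 0.90, 0.95, 0.99):
    low, high = one_proportion_z_interval(71, 361, level)
    print(f"{level:.0%} interval: ({low:.3f}, {high:.3f})  width = {high - low:.3f}")
```

Only z* changes from one line of output to the next, which is why the interval widens as the confidence level rises.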

Summary
Increasing the level of confidence will
make a confidence interval wider.
 Increasing the sample size will make a
confidence interval narrower.
 Moving p̂ farther from 0.5 will make a
confidence interval narrower.

What does 95% confidence mean?
If we repeatedly sampled from a
population and constructed 95%
confidence intervals, 95% of our intervals
should contain the population parameter.
 Notice the interval is the random event
here.
 The population parameter is a fixed
number; we just don’t know what it is.
 Simulating Confidence Intervals Applet.
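The repeated-sampling idea can also be tried without the applet. The sketch below is a rough simulation under stated assumptions: the true parameter is set to 0.2 and the sample size to 361 (both chosen only for illustration), and each interval is a theory-based 95% interval with z* = 1.96. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(true_pi, n, reps, z_star=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true parameter."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and build its interval
        successes = sum(rng.random() < true_pi for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_pi <= p_hat + margin:
            hits += 1
    return hits / reps

rate = coverage(true_pi=0.2, n=361, reps=2000)
print(f"share of intervals containing the parameter: {rate:.3f}")
```

The parameter stays fixed at 0.2 throughout; what changes from sample to sample is the interval, and roughly 95% of those intervals should capture it.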

Type I error




Think back to the St. George’s example that
looked at deaths of heart transplant patients.
We concluded that their death rate was
higher than the national average. Suppose
this resulted in ceasing operations at the
hospital.
Also suppose that in reality their rate was
really the same as the national average.
What we have done is to reject a true null
hypothesis. This is called a type I error and
is sometimes referred to as a false alarm.
Type II error
Now suppose we obtained a large p-value so we
didn’t get significant results in the St. George’s
example.
 Hence, we could not conclude that their death
rate was higher than the national average. And
heart operations continued at the hospital.
 Also suppose that in reality their rate was in fact
higher than the national average.
 What we have done is to not reject a false null
hypothesis. This is called a type II error and is
sometimes referred to as a missed opportunity.

Type I and Type II Errors
                                 What is true (unbeknownst to us)
What we decide (based on data)   Null hypothesis is true      Null hypothesis is false
Reject null hypothesis           Type I Error (false alarm)   Correct Decision
Do not reject null hypothesis    Correct Decision             Type II Error (missed opportunity)
Type I and Type II errors
                                                   Null is true                     Null is false
                                                   (Defendant is innocent)          (Defendant is guilty)
Reject null (Jury finds defendant guilty)          Type I Error:                    Correct Decision:
                                                   Innocent person goes to prison   Guilty person goes to prison
Not reject null (Jury finds defendant not guilty)  Correct Decision:                Type II Error:
                                                   Innocent person is set free      Guilty person is set free
The probabilities of
Type I and Type II errors
The significance level is the criterion used
to reject the null hypothesis. We have
been using 0.05 as our significance level.
 The probability of a type I error is the
significance level. (Suppose the
significance level is 0.05. If the null is true
we would reject it 5% of the time and
thus make a type I error 5% of the time.)
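The link between the significance level and the type I error rate can be checked with a short calculation. This sketch rests on assumptions not in the slides: the null hypothesis 𝜋 = 0.5 is true, n = 283 (the Halloween sample size), and the test rejects whenever the standardized statistic is at least 1.96 in absolute value, a normal-approximation stand-in for "p-value below 0.05". The function name is hypothetical.

```python
from math import comb, sqrt

def type_i_error_rate(n, pi_null, z_cutoff=1.96):
    """Probability of rejecting a TRUE null: add up the binomial
    probabilities of every count whose standardized statistic
    is at least z_cutoff in absolute value."""
    sd = sqrt(pi_null * (1 - pi_null) / n)
    rate = 0.0
    for k in range(n + 1):
        z = (k / n - pi_null) / sd
        if abs(z) >= z_cutoff:
            rate += comb(n, k) * pi_null**k * (1 - pi_null) ** (n - k)
    return rate

rate = type_i_error_rate(283, 0.5)
print(f"type I error rate = {rate:.3f}")
```

The result is close to 0.05 but not exactly equal, because counts are whole numbers and the cutoff cannot land precisely on the 5% mark.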

Type I and Type II Errors

Think back to the dog sniffing cancer
study:
◦ Describe what a type I error would be.
◦ Describe what a type II error would be.
Competitive advantage to Uniform
Color (continued)
Exploration 2.3A