Confidence Intervals
and Hypothesis tests
with Proportions
What happens to your
confidence as the interval
gets smaller?
Your confidence level decreases
with smaller intervals
Confidence level
• Is the success rate of the method
used to construct the interval
• Using this method, ____% of the
time the intervals constructed will
contain the true population
parameter
Critical value (z*)
• Found from the confidence level
• The upper z-score with probability p lying to
its right under the standard normal curve
Confidence level    Tail area    z*
90%                 .05          1.645
95%                 .025         1.96
99%                 .005         2.576
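The critical values for 90%, 95%, and 99% confidence can be reproduced with a short Python sketch. It assumes the standard library's `statistics.NormalDist`; the function name `critical_value` is only illustrative and does not come from the slides.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Return z* so that the central area under the standard normal curve
    equals confidence_level (each tail holds half of what is left over)."""
    tail_area = (1 - confidence_level) / 2
    # inv_cdf gives the z-score with the requested area to its LEFT,
    # so ask for everything except the upper tail.
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

Each tail area is (1 - confidence level) / 2, which is why a 95% interval uses the z-score with .025 to its right.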
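Confidence interval for a
population proportion:
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Statistic ± Critical value × Standard deviation of the statistic
Margin of error: $z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
But do we know the population proportion?
No, so we use p-hat in the standard deviation.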
What are the steps for performing a
confidence interval?
1.) Assumptions
• SRS of context
• Approximate Normal distribution because
np > 10 & n(1-p) > 10
• Population is at least 10n
2.) Calculations
3.) Conclusion
We are ________%
confident that the
true proportion
context is between
______ and ______.
• As the confidence level increases, do the intervals
generally get wider or more narrow? Explain.
• As the sample size increases, do the intervals
generally get wider or more narrow? Explain.
• When 100 confidence intervals are generated, why
are they all different?
• If the confidence level selected is 90%, about how
many of 100 intervals will cover the true percentage of
orange balls? Will exactly this number of intervals
cover the true percentage each time 100 intervals are
created? Explain.
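These questions can be explored with a small simulation. The sketch below is only an illustration: the true proportion of orange balls (0.3), the sample size (200), and the seeds are made-up values, and `simulate_coverage` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist


def simulate_coverage(true_p, n, confidence_level, num_intervals, seed):
    """Build num_intervals confidence intervals from fresh random samples
    and count how many of them contain the true proportion."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    covered = 0
    for _ in range(num_intervals):
        # Draw one sample of size n; each draw is a success with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            covered += 1
    return covered


# Assumed values for illustration only: 30% orange balls, samples of 200.
for seed in (1, 2, 3):
    count = simulate_coverage(true_p=0.3, n=200, confidence_level=0.90,
                              num_intervals=100, seed=seed)
    print(f"seed {seed}: {count} of 100 intervals cover the true proportion")
```

Every sample gives a different p-hat, so every interval is different, and the count of covering intervals tends to land near 90 without being exactly 90 on each run.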
A May 2000 Gallup Poll found that
38% of a random sample of 1012
adults said that they believe in
ghosts. Find a 95% confidence
interval for the true proportion of
adults who believe in ghosts.
Step 1: check assumptions!
• Have an SRS of adults
• np = 1012(.38) = 384.56 & n(1-p) = 1012(.62) = 627.44
Since both are greater than 10, the distribution can be
approximated by a normal curve
• Population of adults is at least 10,120.
Step 2: make calculations
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .38 \pm 1.96 \sqrt{\frac{.38(.62)}{1012}} = (.35, .41)$
Step 3: conclusion in context
We are 95% confident that the true proportion of
adults who believe in ghosts is between 35% and
41%.
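The same calculation can be sketched in Python. This is a minimal illustration using the standard library; `one_proportion_interval` is a hypothetical name, and the normality check simply mirrors the np > 10 and n(1-p) > 10 rule from the assumptions step.

```python
import math
from statistics import NormalDist


def one_proportion_interval(p_hat, n, confidence_level):
    """Return (lower, upper) for a one-proportion z-interval."""
    # Normal approximation check, using p-hat because p is unknown.
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("normal approximation is not justified")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Gallup example: 38% of 1012 adults, 95% confidence.
low, high = one_proportion_interval(0.38, 1012, 0.95)
print(f"({low:.2f}, {high:.2f})")  # (0.35, 0.41)
```

The SRS and population-size conditions still have to be checked from the study design; the code can only check the sample-size condition.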
The manager of the dairy section of a
large supermarket took a random
sample of 250 egg cartons and found
that 40 cartons had at least one broken
egg. Find a 90% confidence interval for
the true proportion of egg
cartons with at least one
broken egg.
Step 1: check assumptions!
• Have an SRS of egg cartons
• np = 250(.16) = 40 & n(1-p) = 250(.84) = 210
Since both are greater than 10, the distribution can be
approximated by a normal curve
• Population of cartons is at least 2500.
Step 2: make calculations
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .16 \pm 1.645 \sqrt{\frac{.16(.84)}{250}} = (.122, .198)$
Step 3: conclusion in context
We are 90% confident that the true proportion of egg
cartons with at least one broken egg is between
12.2% and 19.8%.
To find sample size:
$m = z^* \sqrt{\frac{p(1-p)}{n}}$
However, since we have not yet taken a sample, we do not
know a p-hat (or p) to use!
Another Gallup Poll is taken in order to measure the
proportion of adults who approve of attempts to clone
humans. What sample size is necessary to be within ±
0.04 of the true proportion of adults who approve of
attempts to clone humans with a 95% Confidence
Interval?
$m = z^* \sqrt{\frac{p(1-p)}{n}}$
$.04 = 1.96 \sqrt{\frac{.5(.5)}{n}}$   Use p-hat = .5
$\frac{.04}{1.96} = \sqrt{\frac{.25}{n}}$   Divide by 1.96
$\left(\frac{.04}{1.96}\right)^2 = \frac{.25}{n}$   Square both sides
$n = 600.25 \approx 601$   Round up on sample size
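Solving the margin-of-error formula for n gives n = (z*)² p(1-p) / m². A minimal Python sketch of that step follows; `sample_size_for_margin` is an illustrative name, and the default guess of p = .5 is the conservative choice used above.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(margin, confidence_level, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`.

    p_guess = .5 makes p(1-p) as large as possible, so the resulting n
    is big enough no matter what the true proportion turns out to be.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2
    return math.ceil(n)  # always round UP on sample size


print(sample_size_for_margin(0.04, 0.95))  # 601
```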
What are hypothesis tests?
Calculations that tell us if the sample
statistic (p-hat) occurs by random
chance or not OR . . . if it is statistically
significant
Is it . . .
– a random occurrence due to natural
variation? (Is it one of the sample proportions
that are likely to occur?)
– an occurrence due to some other
reason? (Is it one that isn’t likely to occur?)
These calculations (called the test statistic) will tell us
how many standard deviations a sample proportion is
from the population proportion!
Statistically significant means that it is NOT a random
chance occurrence!
Steps:
1) Assumptions
2) Hypothesis statements &
define parameters
3) Calculations
4) Conclusion, in context
Assumptions for z-test:
• Have an SRS of context
• Distribution is (approximately)
normal because both np > 10 and
n(1-p) > 10
• Population is at least 10n
YEA – These are the same assumptions as confidence
intervals!!
How to write hypothesis
statements
• Null hypothesis – is the statement
(claim) being tested; this is a statement
of “no effect” or “no difference”
H0:
• Alternative hypothesis – is the
statement that we suspect is true
Ha:
How to write hypotheses:
Null hypothesis
H0: parameter = hypothesized value
Alternative hypothesis
Ha: parameter > hypothesized value
Ha: parameter < hypothesized value
Ha: parameter ≠ hypothesized value
Facts to remember about hypotheses:
• Hypotheses ALWAYS refer to
populations (use parameters – never
statistics)
• The alternative hypothesis should be
what you are trying to prove!
• ALWAYS define your parameter in
context!
Activity: For each pair of hypotheses,
indicate which are not legitimate &
explain why
a) H0: $\mu = 15$; Ha: $\mu = 15$   (Must be NOT equal!)
b) H0: $\bar{x} = 123$; Ha: $\bar{x} \neq 123$   (Must use parameter (population); $\bar{x}$ is a statistic (sample))
c) H0: $\pi = .1$; Ha: $\pi \neq .1$   ($\pi$ is the population proportion!)
d) H0: $p = .4$; Ha: $p \neq .6$   (Must use same number as H0!)
e) H0: $\hat{p} = .1$; Ha: $\hat{p} \neq .1$   (P-hat is a statistic – Not a parameter!)
P-value -
• Assuming H0 is true, the
probability that the statistic
would have a value as extreme as
or more extreme than what is
actually observed
The statistic is our p-hat!
Notice that this is a conditional probability
Why not find the probability that the p-hat equals a
certain value? Remember that in continuous
distributions, we cannot find probabilities of a single
value!
In other words . . . What is
the probability of getting
values more (or less) than
our p-hat?
We can use normalcdf to find this probability.
Level of significance -
• Is the amount of evidence
necessary before we begin to doubt
that the null hypothesis is true
• Is the probability that we will
reject the null hypothesis, assuming
that it is true
• Denoted by α
– Can be any value
– Usual values: 0.1, 0.05, 0.01
– Most common is 0.05
Statistically significant –
• Our statistic (p-hat) is statistically
significant if the p-value is as small as or
smaller than the level of significance (α).
Decisions:
• If p-value < α, “reject” the null hypothesis
at the α level. (Our “guilty” verdict.)
• If p-value > α, “fail to reject” the null
hypothesis at the α level. (Our “not guilty” verdict.)
Remember that the verdict is never
“innocent” – so we can never decide
that the null is true!
Facts about p-values:
• ALWAYS make the decision about
the null hypothesis!
• Large p-values show support for the
null hypothesis, but never that it is
true!
• Small p-values show support that the
null is not true.
• Double the p-value for two-tail (≠)
tests
• Never accept the null hypothesis!
Calculating p-values
• For z-test statistic (z) –
– Use normalcdf(lb,ub) to find the
probability of the test statistic
or more extreme
– Remember the standard normal
curve is comprised of z’s where
μ = 0 and σ = 1
Since we are in the standard normal curve, we do not
need μ, σ here.
We will see how to compute this value tomorrow.
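Outside a calculator, the same tail areas can be found with Python's standard library. The sketch below is an illustration only: `p_value` and its `alternative` labels are hypothetical names, and z = 1.25 is just a sample input.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal curve: mu = 0, sigma = 1


def p_value(z, alternative):
    """Probability of a test statistic at least as extreme as z,
    assuming H0 is true.

    alternative: "greater" for Ha: >, "less" for Ha: <,
                 "two-sided" for Ha: ≠ (the one-tail area is doubled).
    """
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)   # area to the right of z
    if alternative == "less":
        return STD_NORMAL.cdf(z)       # area to the left of z
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


print(f"{p_value(1.25, 'greater'):.4f}")  # 0.1056
```

The "greater" case plays the role of normalcdf(z, very large number), and the "less" case plays the role of normalcdf(very small number, z).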
Writing Conclusions:
1) A statement of the decision being
made (reject or fail to reject H0) &
why (linkage)
AND
2) A statement of the results in
context. (state in terms of Ha)
“Since the p-value < (>) α,
I reject (fail to reject)
the H0. There is (is not)
sufficient evidence to
suggest that Ha.”
Be sure to write Ha in
context (words)!
Formula for hypothesis test:
Test statistic = (statistic - parameter) / (SD of statistic)
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
Example 5: A company is willing to renew its
advertising contract with a local radio
station only if the station can prove that
more than 20% of the residents of the city
have heard the ad and recognize the
company’s product. The radio station
conducts a random sample of 400 people
and finds that 90 have heard the ad and
recognize the product. Is this
sufficient evidence for the
company to renew its contract?
Assumptions:
•Have an SRS of people
•np = 400(.2) = 80 & n(1-p) = 400(.8) = 320 - Since both are
greater than 10, this distribution is approximately normal.
•Population of people is at least 4000.
Use the parameter in the null hypothesis to check
assumptions!
H0: p = .2
Ha: p > .2
where p is the true proportion of people who
heard the ad
$z = \frac{.225 - .2}{\sqrt{\frac{.2(.8)}{400}}} = 1.25$     p-value = .1056     α = .05
Use the parameter in the null hypothesis to calculate
standard deviation!
Since the p-value > α, I fail to reject the null hypothesis. There
is not sufficient evidence to suggest that the true proportion of
people who heard the ad is greater than .2. The company will not
renew their advertising contract with the radio station.
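The whole test can be sketched in a few lines of Python. This is a minimal illustration with the standard library, limited to the one-sided "greater than" alternative of this example; `one_proportion_z_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """z statistic and p-value for H0: p = p0 against Ha: p > p0."""
    # Check normality with the hypothesized p0, not with p-hat.
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("normal approximation is not justified")
    p_hat = successes / n
    # The standard deviation also uses p0, because H0 is assumed true.
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


z, p_value = one_proportion_z_test(successes=90, n=400, p0=0.2)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # z = 1.25, p-value = 0.1056
print("reject H0" if p_value < alpha else "fail to reject H0")
```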
Calculate the appropriate confidence
interval for the above problem.
CI = (.19066, .25934), a 90% interval (z* = 1.645), which
pairs with a one-sided test at α = .05
How do the results from the
confidence interval compare to the
results of the hypothesis test?
The confidence interval contains
the parameter of .2 thus providing
no evidence that more than 20%
had heard the ad.
Two-Sample
Proportions
Inference
Assumptions:
• Two, independent SRS’s from
populations (or randomly assigned
treatments)
• Populations at least 10n
• Normal approximation for both
$n_1 p_1 > 10$, $n_1(1 - p_1) > 10$
$n_2 p_2 > 10$, $n_2(1 - p_2) > 10$
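Formula for confidence interval:
CI = statistic ± critical value × SD of statistic
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Margin of error! $z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Standard error! $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Note: use p-hat when p is not known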
Example 1: At Community Hospital, the burn
center is experimenting with a new plasma
compress treatment. A random sample of 316
patients with minor burns received the plasma
compress treatment. Of these patients, it was
found that 259 had no visible scars after
treatment. Another random sample of 419
patients with minor burns received no plasma
compress treatment. For this group, it was
found that 94 had no visible scars after
treatment. What is the shape & standard error
of the sampling distribution of the difference in
the proportions of people with no visible scars
between the two groups?
Since n1p1=259, n1(1-p1)=57, n2p2=94,
n2(1-p2)=325 and all > 10, then the distribution of
difference in proportions is approximately normal.
$S.E. = \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = 0.0296$
Example 1: At Community Hospital, the burn
center is experimenting with a new plasma
compress treatment. A random sample of 316
patients with minor burns received the plasma
compress treatment. Of these patients, it was
found that 259 had no visible scars after
treatment. Another random sample of 419
patients with minor burns received no plasma
compress treatment. For this group, it was
found that 94 had no visible scars after
treatment. What is a 95% confidence interval
of the difference in proportion of people who
had no visible scars between the plasma
compress treatment & control group?
Assumptions:
• Have 2 independent randomly assigned treatment
groups
• Both distributions are approximately normal since n1p1=259,
n1(1-p1)=57, n2p2=94, n2(1-p2)=325 and all > 10
• Population of burn patients is at least 7350.
Since these are all burn patients, we can add 316
+ 419 = 735.
If not the same – you MUST list separately.
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
$= (.82 - .22) \pm 1.96 \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = (.537, .654)$
The endpoints come from the unrounded p-hats, 259/316 and 94/419;
the .82 and .22 shown in the formula are rounded.
We are 95% confident that the true difference in
proportion of people who had no visible scars between the
plasma compress treatment & control group is between 53.7%
and 65.4%.
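A short Python sketch of the two-proportion interval, working from the raw counts so that no rounding of the p-hats creeps in. The function name `two_proportion_interval` is illustrative and not from the slides.

```python
import math
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence_level):
    """Return (lower, upper, standard_error) for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error: each sample keeps its own p-hat.
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1
                               + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    difference = p1_hat - p2_hat
    margin = z_star * standard_error
    return difference - margin, difference + margin, standard_error


# Burn center example: 259 of 316 treated, 94 of 419 untreated.
low, high, se = two_proportion_interval(259, 316, 94, 419, 0.95)
print(f"SE = {se:.4f}, CI = ({low:.3f}, {high:.3f})")
```

With unrounded p-hats the standard error comes out near 0.0297 rather than the 0.0296 obtained from .82 and .22, and the interval rounds to (.537, .654).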
Example 2: Suppose that researchers
want to estimate the difference in
proportions of people who are against
the death penalty in Texas & in
California. If the two sample sizes
are the same, what size sample is
needed to be within 2% of the true
difference at 90% confidence?
$.02 = 1.645 \sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$
Since both n’s are the same size, you have common
denominators – so add!
$.02 = 1.645 \sqrt{\frac{.25 + .25}{n}}$
n = 3383
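Solving for n gives n = (z*)² (.25 + .25) / m². The sketch below takes z* as an input so that it can use the rounded table value 1.645, as the slide does; `equal_sample_size_for_difference` is a hypothetical name.

```python
import math


def equal_sample_size_for_difference(margin, z_star, p_guess=0.5):
    """Common sample size n for each group so that the margin of error
    for p1 - p2 is at most `margin`, using the same guess for both p's."""
    # Both groups share n, so the two variance terms add over one denominator.
    combined = 2 * p_guess * (1 - p_guess)
    return math.ceil(z_star ** 2 * combined / margin ** 2)


print(equal_sample_size_for_difference(margin=0.02, z_star=1.645))  # 3383
```

Because the result sits close to a whole number, an unrounded z* can give a slightly smaller n than the table value does.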
Hypothesis statements:
H0: p1 = p2   (or H0: p1 - p2 = 0)
Ha: p1 > p2   (or Ha: p1 - p2 > 0)
Ha: p1 < p2   (or Ha: p1 - p2 < 0)
Ha: p1 ≠ p2   (or Ha: p1 - p2 ≠ 0)
Be sure to
define both
p1 & p2!
Since we assume that the
population proportions are
equal in the null hypothesis,
the variances are equal.
Therefore, we pool
the variances!
$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$
Formula for Hypothesis test:
Test statistic = (statistic - parameter) / (SD of statistic)
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Usually p1 – p2 = 0
Example 4: A forest in Oregon has an
infestation of spruce moths. In an effort to
control the moth, one area has been
regularly sprayed from airplanes. In this
area, a random sample of 495 spruce trees
showed that 81 had been killed by moths. A
second nearby area receives no treatment.
In this area, a random sample of 518 spruce
trees showed that 92 had been killed by the
moth. Do these data indicate that the
proportion of spruce trees killed by the
moth is different for these areas?
Assumptions:
•Have 2 independent SRS of spruce trees
•Both distributions are approximately normal since n1p1=81,
n1(1-p1)=414, n2p2=92, n2(1-p2)=426 and all > 10
•Population of spruce trees is at least 10,130.
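H0: p1 = p2
Ha: p1 ≠ p2
where p1 is the true proportion of trees killed by moths
in the treated area and p2 is the true proportion of trees
killed by moths in the untreated area
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.16 - .18}{\sqrt{.17(.83)\left(\frac{1}{495} + \frac{1}{518}\right)}} = -0.59$
The displayed proportions are rounded; z is computed from the
unrounded values 81/495, 92/518, and 173/1013.
P-value = 0.5547
α = 0.05
Since p-value > α, I fail to reject H0. There is not sufficient
evidence to suggest that the proportion of spruce trees killed by
the moth is different for these areas.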
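The pooled two-proportion test can be sketched in Python as follows. This is an illustration for the two-sided alternative only, using the standard library; `two_proportion_z_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2 vs Ha: p1 ≠ p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 the proportions are equal, so pool the two samples.
    p_pooled = (x1 + x2) / (n1 + n2)
    sd = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / sd
    # Two-tail test: double the area beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Spruce moth example: 81 of 495 treated trees, 92 of 518 untreated trees.
z, p_value = two_proportion_z_test(81, 495, 92, 518)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The last decimal of the p-value depends on how much z is rounded before the tail area is looked up, so a hand calculation may differ slightly from the code.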