Confidence Intervals

Chapter 10
Rate your confidence
0 - 100
• Name my age within 10 years?
• within 5 years?
• within 1 year?
• Shooting a basketball at a wading pool, will I
make the basket?
• Shooting the ball at a large trash can, will I
make the basket?
• Shooting the ball at a carnival, will I make the
basket?
What happens to your
confidence as the interval
gets smaller?
The larger your confidence,
the wider the interval.
Point Estimate
• Use a single statistic based on
sample data to estimate a
population parameter
• Simplest approach
• But not always very precise due to
variation in the sampling
distribution
Confidence intervals
• Are used to estimate the
unknown population mean
• Formula:
estimate ± margin of error
Margin of error
• Shows how accurate we believe our
estimate is
• The smaller the margin of error, the
more precise our estimate of the true
parameter
• Formula:
$m = (\text{critical value}) \cdot (\text{standard deviation of the statistic})$
Confidence level
• Is the success rate of the method
used to construct the interval
• Using this method, ____% of the
time the intervals constructed will
contain or CAPTURE the true
population parameter
What does it mean to be 95%
confident?
• 95% chance that μ is contained in
the confidence interval
• The probability that the interval
contains μ is 95%
• The method used to construct the
interval will produce intervals that
contain μ 95% of the time.
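The statement about the method's long-run success rate can be checked with a small simulation. The sketch below is only an illustration: the true mean, σ, sample size, number of trials, and seed are arbitrary assumptions, and `coverage_rate` is a hypothetical helper name. It draws many samples from a normal population, builds a 95% z-interval from each, and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=3.2, sigma=0.2, n=3, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals that capture the true mean mu.

    All default values are illustrative assumptions, not data from the slides.
    """
    rng = random.Random(seed)
    # Critical value: the z-score with (1 - level) / 2 of the area to its right
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / n ** 0.5
    captured = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            captured += 1
    return captured / trials

rate = coverage_rate()
print(f"{rate:.3f} of the simulated intervals captured the true mean")
```

The fraction should land close to 0.95. Any single interval either captures the true mean or misses it; the 95% describes the method over many repetitions.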
Critical value (z*)
• Found from the confidence level
• The upper z-score with probability p lying to
its right under the standard normal curve
Confidence level    Tail area    z*
90%                 .05          1.645
95%                 .025         1.96
99%                 .005         2.576
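The critical values above can be reproduced from the confidence level. This is a minimal sketch using the standard library's `statistics.NormalDist`; the function name `z_star` is a hypothetical helper, not something from the slides.

```python
from statistics import NormalDist

def z_star(level):
    """Upper critical value z* for a confidence level given as a decimal."""
    tail = (1 - level) / 2                  # area to the right of z*
    return NormalDist().inv_cdf(1 - tail)   # z-score with that upper tail area

for level in (0.90, 0.95, 0.99):
    tail = (1 - level) / 2
    print(f"{level:.0%}: tail area {tail:.3f}, z* = {z_star(level):.3f}")
```

The tail area is half of what is left over after the confidence level, because the leftover probability is split between the two tails.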
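Confidence interval for a
population mean:
$\bar{x} \pm z^* \left(\frac{\sigma}{\sqrt{n}}\right)$
• $\bar{x}$ is the estimate
• $z^*$ is the critical value
• $\frac{\sigma}{\sqrt{n}}$ is the standard deviation of the statistic
• $z^* \left(\frac{\sigma}{\sqrt{n}}\right)$ is the margin of error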
The 4-Step Process
(from the Inference Toolbox)
Step 1 (Population and parameter)
Define the population and parameter you are
investigating
Step 2 (Conditions)
Do we have biased data?
• If SRS, we’re good. Otherwise PWC (proceed
with caution)
Do we have independent sampling?
• If pop>10n, we’re good. Otherwise PWC.
Do we have a normal distribution?
• If pop is normal or n>30 (CLT), we’re good.
Otherwise, PWC.
Step 3 (Calculations)
• Find z* based on your confidence level. If you
are not given a confidence level, use 95%
• Calculate CI.
Step 4 (Interpretation)
• “With ___% confidence, we believe that the true
mean is captured in the interval (lower, upper)”
The 4-Step Process
Confidence Interval
Step 1 (Population and parameter)
Define the population and parameter you are
investigating
Step 2 (Conditions)
• SRS from population?
• Pop>10n?
• Pop is normal or n>30 (CLT)? If raw data is
given, graph and see if distribution is normal
• σ is known
Step 3 (Calculations)
• Find z* based on your confidence level. If you
are not given a confidence level, use 95%
• Calculate CI using $\bar{x} \pm z^* \left(\frac{\sigma}{\sqrt{n}}\right)$
Step 4 (Interpretation)
• “With ___% confidence, we believe that the true
mean is captured in the interval (lower, upper)”
• Or…. “The methods used to construct the
interval will capture the true mean ___% of the
time”.
Statement: (memorize!!)
We are ________% confident
that the true mean [context] IS
CAPTURED within the interval
from ______ to ______.
(This means that using these methods,
____% of the time the intervals constructed
will capture the true population mean.)
A test for the level of potassium in the blood
is not perfectly precise. Suppose that
repeated measurements for the same
person on different days vary normally with
σ = 0.2. A random sample of three has a
mean of 3.2. What is a 90% confidence
interval for the mean potassium level?
Assumptions:
Have an SRS of blood measurements
Potassium level is normally distributed (given)
σ known
$3.2 \pm 1.645\left(\frac{0.2}{\sqrt{3}}\right) = (3.0101, 3.3899)$
We are 90% confident that the true mean
potassium level is captured in the interval
between 3.01 and 3.39.
95% confidence interval?
Assumptions:
Have an SRS of blood measurements
Potassium level is normally distributed
(given)
σ known
$3.2 \pm 1.96\left(\frac{0.2}{\sqrt{3}}\right) = (2.9737, 3.4263)$
We are 95% confident that the true mean
potassium level is captured in the
interval between 2.97 and 3.43.
99% confidence interval?
Assumptions:
Have an SRS of blood measurements
Potassium level is normally distributed
(given)
σ known
$3.2 \pm 2.576\left(\frac{0.2}{\sqrt{3}}\right) = (2.9026, 3.4974)$
We are 99% confident that the true mean
potassium level is captured in the interval
between 2.90 and 3.50.
What happens to the interval as the
confidence level increases?
The interval gets wider as the
confidence level increases.
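The three potassium intervals can be recomputed side by side to see the widening. A minimal sketch, assuming the values from the example (sample mean 3.2, σ = 0.2, n = 3); `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level):
    """z-interval for a population mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Potassium example: x-bar = 3.2, sigma = 0.2, n = 3
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(3.2, 0.2, 3, level)
    print(f"{level:.0%}: ({low:.4f}, {high:.4f}), width {high - low:.4f}")
```

Only z* changes between the three intervals, so the width grows in step with the critical value. The last digit can differ slightly from a hand calculation that uses a rounded table value such as 2.576.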
How can you make the margin of
error smaller?
• z* smaller
(lower confidence level)
• σ smaller
(less variation in the population)
Really cannot change!
• n larger
(to cut the margin of error
in half, n must
be 4 times as big)
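The "4 times as big" rule follows from the square root of n in the margin of error formula. A short sketch, reusing the potassium numbers (σ = 0.2, n = 3) purely as an illustration; `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt

def margin_of_error(z_star, sigma, n):
    """Margin of error m = z* * sigma / sqrt(n)."""
    return z_star * sigma / sqrt(n)

m_small_n = margin_of_error(1.96, 0.2, 3)    # original sample size
m_large_n = margin_of_error(1.96, 0.2, 12)   # sample size multiplied by 4
print(f"n = 3:  m = {m_small_n:.4f}")
print(f"n = 12: m = {m_large_n:.4f}")
print(f"ratio = {m_small_n / m_large_n:.2f}")
```

Multiplying n by 4 multiplies the square root of n by 2, which divides the margin of error by 2.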
A random sample of 50 SHS students
was taken and their mean SAT score
was 1250. (Assume σ = 105) What is a
95% confidence interval for the mean
SAT score of SHS students?
$1250 \pm 1.96\left(\frac{105}{\sqrt{50}}\right) = (1220.90, 1279.10)$
We are 95% confident that the true
mean SAT score for SHS students is
captured in the interval between
1220.9 and 1279.1
Suppose that we have this random sample
of SAT scores:
950 1130 1260 1090 1310 1420 1190
What is a 95% confidence interval for the
true mean SAT score? (Assume σ = 105)
We are 95% confident that the true
mean SAT score for SHS students is
captured in the interval between 1115.1
and 1270.6.
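The slide gives only the final interval, so here is a sketch of the intermediate work: compute the sample mean from the raw scores, then apply the z-interval formula with σ = 105 and the 95% table value z* = 1.96. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean

scores = [950, 1130, 1260, 1090, 1310, 1420, 1190]
sigma = 105      # population standard deviation, assumed known in the example
z_star = 1.96    # 95% critical value from the table

x_bar = mean(scores)                           # point estimate
margin = z_star * sigma / sqrt(len(scores))    # margin of error
low, high = x_bar - margin, x_bar + margin
print(f"x-bar = {x_bar:.1f}, margin = {margin:.1f}")
print(f"interval = ({low:.1f}, {high:.1f})")
```

With only 7 scores, the normality condition would need to be checked with a graph of the data, as described in Step 2.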
Find a sample size:
• If a certain margin of error is wanted,
then to find the sample size necessary
for that margin of error use:
$m = z^* \left(\frac{\sigma}{\sqrt{n}}\right)$
Always round up to the nearest person!
The heights of SHS male students are
normally distributed with σ = 2.5
inches. How large a sample is
necessary to be accurate within ± 0.75
inches with a 95% confidence
interval?
$0.75 = 1.96\left(\frac{2.5}{\sqrt{n}}\right)$
n = 42.68, so round up to 43 students
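Solving the margin of error formula for n gives n = (z*σ / m)², rounded up. A minimal sketch of that calculation with the numbers from the heights example; `required_sample_size` is a hypothetical helper name.

```python
from math import ceil

def required_sample_size(z_star, sigma, margin):
    """Smallest whole n whose margin of error is at most `margin`."""
    exact_n = (z_star * sigma / margin) ** 2
    return ceil(exact_n)   # always round up, never to the nearest integer

exact = (1.96 * 2.5 / 0.75) ** 2
n = required_sample_size(1.96, 2.5, 0.75)
print(f"exact n = {exact:.2f}, sample size needed = {n}")
```

Rounding down would leave the margin of error slightly larger than requested, which is why the result is always rounded up.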
In a randomized comparative experiment
on the effects of calcium on blood
pressure, researchers divided 54 healthy,
white males at random into two groups:
one takes calcium, the other a placebo. The paper
reports a mean seated systolic blood
pressure of 114.9 with standard deviation
of 9.3 for the placebo group. Assume
systolic blood pressure is normally
distributed.
Can you find a z-interval for this
problem? Why or why not?
Student’s t-distribution
• Developed by William Gosset
• Continuous distribution
• Unimodal, symmetrical, bell-shaped
density curve
• Above the horizontal axis
• Area under the curve equals 1
• Based on degrees of freedom
Graph examples of t-curves vs. the normal
curve
How does t compare to
normal?
• Shorter & more spread out
• More area under the tails
• As n increases, t-distributions
become more like a standard
normal distribution
How to find t*
• Use Table B for t distributions
• Look up confidence level at bottom &
df on the sides
• df = n – 1
Can also use invT on the calculator!
invT(p,df)
• Need upper t* value with 5% above it,
so 95% is below
Find these t*
90% confidence when n = 5 (df = 4): t* = 2.132
95% confidence when n = 15 (df = 14): t* = 2.145
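The slides find t* from Table B or the calculator's invT. For readers without either, the sketch below approximates t* in plain Python by integrating the t density with Simpson's rule and bisecting for the point with the right area below it. It is a numerical stand-in, not the table's or the calculator's method; `t_cdf` and `t_star` are hypothetical helper names, and the search assumes t* is below 100. Where SciPy is available, `scipy.stats.t.ppf` does the same job.

```python
from math import exp, lgamma, pi, sqrt

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on the t density over [0, x]."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(t):
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3         # half the area lies below 0 by symmetry

def t_star(level, df):
    """Critical value with (1 - level) / 2 of the area above it (assumes t* < 100)."""
    target = 1 - (1 - level) / 2       # area that must lie below t*
    lo, hi = 0.0, 100.0
    for _ in range(60):                # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"90% confidence, n = 5:  t* = {t_star(0.90, 4):.3f}")
print(f"95% confidence, n = 15: t* = {t_star(0.95, 14):.3f}")
```

Note that the area passed to the search is the area below t*, so 90% confidence corresponds to 0.95, matching the invT usage.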
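Formula:
Confidence interval: $\bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right)$
• $\bar{x}$ is the estimate
• $t^*$ is the critical value
• $\frac{s}{\sqrt{n}}$ is the standard deviation of the statistic
• $t^* \left(\frac{s}{\sqrt{n}}\right)$ is the margin of error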
Assumptions for t-inference
• Have an SRS from population
• σ unknown
• Normal distribution
– Given
– Large sample size
– Check graph of data
For Ex. 4 (the calcium experiment): Find a 95% confidence
interval for the true mean systolic
blood pressure of the placebo group.
Assumptions:
• Have an SRS of healthy, white males
• Systolic blood pressure is normally distributed
(given).
• σ is unknown
$114.9 \pm 2.056\left(\frac{9.3}{\sqrt{27}}\right) = (111.22, 118.58)$
We are 95% confident that the true mean systolic
blood pressure is between 111.22 and 118.58.
Robust
• An inference procedure is ROBUST if
the confidence level or p-value doesn’t
change much if the assumptions are
violated.
• t-procedures can be used with some
skewness, as long as there are no
outliers.
• With larger n, more skewness can be tolerated.
Ex. 5 – A medical researcher measured
the pulse rate of a random sample of 20
adults and found a mean pulse rate of
72.69 beats per minute with a standard
deviation of 3.86 beats per minute.
Assume pulse rate is normally
distributed. Compute a 95% confidence
interval for the true mean pulse rate of
adults.
(70.883, 74.497)
Another medical researcher claims that
the true mean pulse rate for adults is 72
beats per minute. Does the evidence
support or refute this? Explain.
The 95% confidence interval contains
the claim of 72 beats per minute.
Therefore, there is no evidence to doubt
the claim.
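The Ex. 5 interval and the check of the claim can be sketched as follows. The critical value 2.093 is the 95% table value for df = 20 − 1 = 19, entered by hand here; `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_star):
    """t-interval for a population mean from summary statistics."""
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Ex. 5: mean 72.69, s = 3.86, n = 20; t* = 2.093 for 95% confidence, df = 19
low, high = t_interval(72.69, 3.86, 20, 2.093)
claim = 72
verdict = "contains" if low <= claim <= high else "does not contain"
print(f"({low:.3f}, {high:.3f}) {verdict} the claimed mean of {claim}")
```

Because 72 falls inside the interval, it remains a plausible value for the true mean, which is why the data give no reason to doubt the claim.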
Ex. 6 – Consumer Reports tested 14
randomly selected brands of vanilla
yogurt and found the following
numbers of calories per serving:
160 200 220 230 120 180 140
130 170 190 80 120 100 170
Compute a 98% confidence interval for
the average calorie content per serving
of vanilla yogurt.
(126.16, 189.56)
A diet guide claims that you will get 120
calories from a serving of vanilla
yogurt. What does this evidence
indicate?
Since 120 calories is not contained
within the 98% confidence interval, the
evidence suggests that the average
calories per serving does not equal 120
calories.
Note: confidence intervals tell us
if something is NOT EQUAL,
never less or greater than!
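For Ex. 6 the sample mean and standard deviation have to be computed from the raw data first. A minimal sketch using the standard library; the critical value 2.650 is the 98% table value for df = 14 − 1 = 13, entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

calories = [160, 200, 220, 230, 120, 180, 140,
            130, 170, 190, 80, 120, 100, 170]
t_star = 2.650                 # 98% confidence, df = 13 (from the t table)

n = len(calories)
x_bar = mean(calories)         # sample mean
s = stdev(calories)            # sample standard deviation (divides by n - 1)
margin = t_star * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"98% interval = ({low:.2f}, {high:.2f})")
print("120 inside interval:", low <= 120 <= high)
```

The endpoints can differ from the slide's values in the last digit, since the table value of t* is rounded to three decimals.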
Some Cautions:
• The data MUST be an SRS from the
population
• The formula is not correct for more
complex sampling designs, i.e.,
stratified, etc.
• No way to correct for bias in data
Cautions continued:
• Outliers can have a large effect on
confidence interval
• Must know σ to do a z-interval,
which is unrealistic in practice