Confidence Intervals

Confidence Intervals with Means
Chapter 9
What is the purpose of a
confidence interval?
To estimate an unknown
population parameter
One-Sample z Confidence Interval for $\mu$
If
1. $\bar{x}$ is the sample mean from a random sample,
2. the sample size n is large (generally $n \ge 30$), and
3. $\sigma$, the population standard deviation, is known,
then the general formula for a confidence
interval for a population mean $\mu$ is given by
$$\bar{x} \pm (z \text{ critical value}) \frac{\sigma}{\sqrt{n}}$$
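Formula:
Confidence Interval: $\bar{x} \pm z^* \left(\frac{\sigma}{\sqrt{n}}\right)$
• $\bar{x}$: statistic
• $z^*$: critical value
• $\sigma/\sqrt{n}$: standard deviation of the statistic
• $z^* \left(\sigma/\sqrt{n}\right)$: margin of error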
Example
A certain filling machine has a true
population standard deviation $\sigma$ = 0.228
ounces when used to fill catsup bottles. A
random sample of 36 “6 ounce” bottles of
catsup was selected from the output from
this machine and the sample mean was
6.018 ounces.
Find a 90% confidence interval estimate for
the true mean fill of catsup from this
machine.
$\bar{x} = 6.018$, $\sigma = 0.228$, $n = 36$
The z critical value is 1.645
$$\bar{x} \pm (z \text{ critical value}) \frac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645 \left(\frac{0.228}{\sqrt{36}}\right) = 6.018 \pm 0.063$$
90% Confidence Interval: (5.955, 6.081)
Conclusion:
We are 90% confident that the true mean fill of
catsup from the machine is between 5.955 oz. and
6.081 oz.
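The catsup calculation above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper name `z_interval` is made up for illustration and is not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Return (lower, upper) for a one-sample z interval with known sigma."""
    # For 90% confidence, 5% sits in each tail, so we need the 0.95 quantile.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin

# Values from the catsup example: x-bar = 6.018, sigma = 0.228, n = 36.
lower, upper = z_interval(xbar=6.018, sigma=0.228, n=36, confidence=0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

The script computes the critical value itself instead of reading 1.645 from a table, so the endpoints can differ from a hand calculation in the last decimal place.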
In a randomized comparative experiment on
the effects of calcium on blood pressure,
researchers divided 54 healthy, white males
at random into two groups. The participants
either take calcium or a placebo. The paper
reports a mean seated systolic blood
pressure of 114.9 with standard deviation of
9.3 for the placebo group. Assume systolic
blood pressure is normally distributed.
Can you find a z-interval for this problem?
Why or why not?
We only know sample statistics! We do
not know population standard deviation!
William S. Gosset
Quality control engineer for
Guinness Brewery in Dublin, Ireland
Checked the stout’s quality by
performing hypothesis tests
Figured out a new family of models
Student’s t distributions
Student’s t-distribution
• Developed by William Gosset
• Continuous distribution
• Unimodal, symmetrical, bell-shaped
density curve
• Above the horizontal axis
• Area under the curve equals 1
• Based on degrees of freedom
df = n - 1
t Distributions
How do t-distributions
compare to the standard
normal distribution?
• Bell-shaped and centered at 0
• Shorter & more spread out
• More area in the tails
• As n increases, t-distributions become more
like a standard normal distribution
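Formula:
Confidence Interval: $\bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right)$
• $\bar{x}$: statistic
• $t^*$: critical value
• $s/\sqrt{n}$: standard deviation of the statistic (called the
standard error when you substitute s for $\sigma$)
• $t^* \left(s/\sqrt{n}\right)$: margin of error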
t Distributions
Since each t distribution would require a
table similar to the standard normal
table, we usually only create a table of
critical values for the t distributions.
Appendix Table 3 in BOB
Central area captured:    0.80    0.90    0.95    0.98    0.99   0.998   0.999
Degrees of freedom
  1                       3.08    6.31   12.71   31.82   63.66  318.29  636.58
  2                       1.89    2.92    4.30    6.96    9.92   22.33   31.60
  3                       1.64    2.35    3.18    4.54    5.84   10.21   12.92
  4                       1.53    2.13    2.78    3.75    4.60    7.17    8.61
  5                       1.48    2.02    2.57    3.36    4.03    5.89    6.87
  6                       1.44    1.94    2.45    3.14    3.71    5.21    5.96
  7                       1.41    1.89    2.36    3.00    3.50    4.79    5.41
  8                       1.40    1.86    2.31    2.90    3.36    4.50    5.04
  9                       1.38    1.83    2.26    2.82    3.25    4.30    4.78
  10                      1.37    1.81    2.23    2.76    3.17    4.14    4.59
  11                      1.36    1.80    2.20    2.72    3.11    4.02    4.44
  12                      1.36    1.78    2.18    2.68    3.05    3.93    4.32
  13                      1.35    1.77    2.16    2.65    3.01    3.85    4.22
  14                      1.35    1.76    2.14    2.62    2.98    3.79    4.14
  15                      1.34    1.75    2.13    2.60    2.95    3.73    4.07
  16                      1.34    1.75    2.12    2.58    2.92    3.69    4.01
  17                      1.33    1.74    2.11    2.57    2.90    3.65    3.97
  18                      1.33    1.73    2.10    2.55    2.88    3.61    3.92
  19                      1.33    1.73    2.09    2.54    2.86    3.58    3.88
  20                      1.33    1.72    2.09    2.53    2.85    3.55    3.85
  21                      1.32    1.72    2.08    2.52    2.83    3.53    3.82
  22                      1.32    1.72    2.07    2.51    2.82    3.50    3.79
  23                      1.32    1.71    2.07    2.50    2.81    3.48    3.77
  24                      1.32    1.71    2.06    2.49    2.80    3.47    3.75
  25                      1.32    1.71    2.06    2.49    2.79    3.45    3.73
  26                      1.31    1.71    2.06    2.48    2.78    3.43    3.71
  27                      1.31    1.70    2.05    2.47    2.77    3.42    3.69
  28                      1.31    1.70    2.05    2.47    2.76    3.41    3.67
  29                      1.31    1.70    2.05    2.46    2.76    3.40    3.66
  30                      1.31    1.70    2.04    2.46    2.75    3.39    3.65
  40                      1.30    1.68    2.02    2.42    2.70    3.31    3.55
  60                      1.30    1.67    2.00    2.39    2.66    3.23    3.46
  120                     1.29    1.66    1.98    2.36    2.62    3.16    3.37
z critical values         1.28   1.645    1.96    2.33    2.58    3.09    3.29
Confidence level:          80%     90%     95%     98%     99%   99.8%   99.9%
How to find t*
• Use Table B for t distributions
• Look up confidence level at bottom &
df on the sides
• df = n – 1
Can also use invT on the calculator!
Need upper t* value: for 90% confidence, 5% is above,
so 95% is below
invT(p, df)
Find these t*
90% confidence when n = 5: t* = 2.132
95% confidence when n = 15: t* = 2.145
Let’s do some comparing:
Finding probabilities
Normal model:
normalcdf(1.645, ∞) = .04998
Student’s t models:
tcdf(1.645, ∞, 4) = .08766
tcdf(1.645, ∞, 9) = .06719
tcdf(1.645, ∞, 14) = .06111
tcdf(1.645, ∞, 29) = .05538
tcdf(1.645, ∞, 99) = .05157
Student’s t models resemble normal models
as sample size gets bigger.
Finding critical values:
Normal model:
z critical value for 95%:
invNorm(.975) = 1.96
Student’s t models:
t critical values for 95%:
invT(.975, 4) = 2.776
invT(.975, 9) = 2.262
invT(.975, 14) = 2.145
invT(.975, 29) = 2.045
invT(.975, 99) = 1.984
t critical values approach z critical values
as sample size increases.
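The calculator comparisons above (tail areas beyond 1.645, and 95% critical values) can also be checked without a calculator. The sketch below is one possible approach, not the method used on the slides: it integrates the Student's t density numerically with Simpson's rule and inverts it by bisection, using only the Python standard library. The function names `t_pdf`, `t_upper_tail`, and `t_critical` are invented for this illustration, and the bisection assumes the critical value lies below 50, which holds for the degrees of freedom shown here.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))

def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson's-rule area over [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(p, df, lo=0.0, hi=50.0):
    """Bisection for t* with P(T <= t*) = p, assuming p > 0.5 and t* < hi."""
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > 1 - p:
            lo = mid  # too much area above mid, so t* lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist()
print(f"normal: tail {1 - z.cdf(1.645):.5f}, z* {z.inv_cdf(0.975):.3f}")
for df in (4, 9, 14, 29, 99):
    print(f"df={df:>2}: tail {t_upper_tail(1.645, df):.5f}, "
          f"t* {t_critical(0.975, df):.3f}")
```

Here `t_upper_tail(1.645, df)` plays the role of `tcdf(1.645, ∞, df)` and `t_critical(0.975, df)` the role of `invT(.975, df)`. As df grows, both columns drift toward the normal-model values.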
Steps for doing a confidence
interval:
1) Identify by name or formula
One-sample confidence interval for means
2) Assumptions
3) Calculate the interval
Confidence Interval: $\bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right)$
4) Write a statement about the interval
in the context of the problem.
Assumptions for t-inference
• Have an SRS from population (or
randomly assigned treatments)
• $\sigma$ unknown
• Normal (or approx. normal) distribution
Use only one of these methods to check normality:
– Given
– Large sample size
– Check graph of data
Statement: (memorize!!)
We are ________% confident
that the true mean context is
between ______ and ______.
Ex. 1) Find a 95% confidence interval for the
true mean systolic blood pressure of the
placebo group.
Assumptions:
• Have randomly assigned males to treatment
• Systolic blood pressure is normally distributed
(given).
• $\sigma$ is unknown
$$114.9 \pm 2.056 \left(\frac{9.3}{\sqrt{27}}\right) = (111.22, 118.58)$$
We are 95% confident that the true mean systolic
blood pressure is between 111.22 and 118.58.
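The arithmetic behind this interval can be spelled out step by step. The breakdown below assumes, as the slide's $\sqrt{27}$ implies, that the 54 participants were split evenly so the placebo group has n = 27 and df = 26, with t* = 2.056 as the 95% critical value.

```latex
\begin{aligned}
\text{standard error} &= \frac{s}{\sqrt{n}} = \frac{9.3}{\sqrt{27}} \approx 1.790 \\
\text{margin of error} &= t^* \cdot \frac{s}{\sqrt{n}} \approx 2.056 \times 1.790 \approx 3.68 \\
\bar{x} \pm \text{margin of error} &= 114.9 \pm 3.68 = (111.22,\ 118.58)
\end{aligned}
```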
Ex. 2) A medical researcher measured
the pulse rate of a random sample of 20
adults and found a mean pulse rate of
72.69 beats per minute with a standard
deviation of 3.86 beats per minute.
Assume pulse rate is normally
distributed. Compute a 95% confidence
interval for the true mean pulse rate of
adults.
One-sample confidence interval for means
Assumptions:
• random sample of adults
• Pulse rate is normally distributed (given).
• $\sigma$ is unknown
$$72.69 \pm 2.093 \left(\frac{3.86}{\sqrt{20}}\right) = (70.883, 74.497)$$
We are 95% confident that the true mean
pulse rate of adults is between 70.883 and
74.497 beats per minute.
Ex 2 continued) Another medical
researcher claims that the true mean
pulse rate for adults is 72 beats per
minute. Does the evidence support or
refute this? Explain.
The 95% confidence interval
contains the claim of 72 beats
per minute. Therefore, there is
no evidence to doubt the
claim.
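The reasoning in Ex. 2 (build the interval, then see whether the claimed value falls inside it) can be sketched in a few lines. This is an illustrative sketch only: the helper `t_interval` is a made-up name, and t* = 2.093 is entered by hand as the 95% critical value for df = 19 rather than computed.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """One-sample t interval from summary statistics and a table t*."""
    margin = t_star * s / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin

# Values from Ex. 2; t* = 2.093 is the 95% critical value for df = 20 - 1 = 19.
lower, upper = t_interval(xbar=72.69, s=3.86, n=20, t_star=2.093)

claimed_mean = 72
claim_is_plausible = lower <= claimed_mean <= upper
print(f"95% CI: ({lower:.3f}, {upper:.3f}); contains 72? {claim_is_plausible}")
```

A claimed value inside the interval is consistent with the data; it is not proof that the claim is exactly right.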
Ex. 3) Consumer Reports tested 14
randomly selected brands of vanilla
yogurt and found the following numbers
of calories per serving:
160 200 220 230 120 180 140
130 170 190 80 120 100 170
Compute a 98% confidence interval for
the average calorie content per serving
of vanilla yogurt.
We are 98% confident that the true mean
calorie content per serving of vanilla
yogurt is between 126.16 calories & 189.56
calories.
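The slide states only the final interval for Ex. 3, so the intermediate steps are sketched here from the raw data. This is a hedged reconstruction, not the slide's own working: it assumes the usual one-sample t interval with df = 14 - 1 = 13 and takes t* = 2.650 as the 98% critical value from a t table.

```python
from math import sqrt
from statistics import mean, stdev

# Calories per serving for the 14 brands listed in Ex. 3.
calories = [160, 200, 220, 230, 120, 180, 140,
            130, 170, 190, 80, 120, 100, 170]

n = len(calories)
xbar = mean(calories)   # sample mean
s = stdev(calories)     # sample standard deviation (divides by n - 1)
t_star = 2.650          # assumed table value: 98% confidence, df = 13

margin = t_star * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"98% CI: ({lower:.2f}, {upper:.2f})")
```

Because t* is a rounded table value, the endpoints may differ from the slide's (126.16, 189.56) by about 0.01.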
Ex 3 continued) A diet guide claims that
you will get 120 calories from a serving
of vanilla yogurt. What does this
evidence indicate?
Note: confidence intervals tell us if something
is NOT EQUAL, never less or greater than!
Since 120 calories is not contained
within the 98% confidence interval, the
evidence suggests that the average
calories per serving does not equal
120 calories.
Robust
CI & p-values deal with area in the tails: is the area
changed greatly when there is skewness?
• An inference procedure is ROBUST if
the confidence level or p-value doesn’t
change much when the normality
assumption is violated.
Since there is more area in the tails in t-distributions,
if a distribution has some skewness, the tail area is
not greatly affected.
• t-procedures can be used with some
skewness, as long as there are no
outliers.
• With larger n, more skewness can be tolerated.
Find a sample size:
• If a certain margin of error is wanted,
then to find the sample size necessary
for that margin of error use:
$$m = z^* \left(\frac{\sigma}{\sqrt{n}}\right)$$
Always round up to the nearest person!
Ex 4) The heights of SHS male
students are normally distributed with
$\sigma$ = 2.5 inches. How large a sample
is necessary to be accurate within ±
0.75 inches with a 95% confidence
interval?
n = 43
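Solving the margin-of-error formula for n gives n = (z* σ / m)², rounded up. The sketch below applies that to Ex. 4 using only the Python standard library. The function name `sample_size_for_margin` is hypothetical, chosen for this illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence):
    """Smallest n whose z-interval margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve margin = z* * sigma / sqrt(n) for n, then round up.
    return ceil((z_star * sigma / margin) ** 2)

# Values from Ex. 4: sigma = 2.5 inches, margin = 0.75 inches, 95% confidence.
n = sample_size_for_margin(sigma=2.5, margin=0.75, confidence=0.95)
print(f"Required sample size: {n}")
```

Rounding up rather than to the nearest integer matters: the unrounded value is about 42.7, and a sample of 42 would leave the margin slightly wider than 0.75 inches.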
Some Cautions:
• The data MUST be a SRS from the
population (or randomly assigned
treatment)
• The formula is not correct for more
complex sampling designs, i.e.,
stratified, etc.
• No way to correct for bias in data
Cautions continued:
• Outliers can have a large effect on
confidence interval
• Must know $\sigma$ to do a z-interval –
which is unrealistic in practice
Confidence Interval Example
Ten randomly selected shut-ins were each asked
to list how many hours of television they
watched per week. The results are
82
66
90
84
75
88
80
94
100
91
Find a 90% confidence interval estimate for the
true mean number of hours of television
watched per week by shut-ins.
Can we meet the assumptions for a t-confidence
interval?
Calculating the sample mean and standard
deviation, we have n = 10, $\bar{x} = 85$, s = 9.843.
The t critical value is 1.833, found by looking on the t table
at 90% confidence with df = 9 or using invT(.95, 9).
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 85 \pm 1.833 \left(\frac{9.843}{\sqrt{10}}\right) = 85 \pm 5.705 = (79.295, 90.705)$$
We are 90% confident that the true mean number of
hours of television watched per week is between
79.295 hours and 90.705 hours.